all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning
Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
Get back some of the compression gains by extending the search space for
GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not
helping much in yielding better compression density.
Before:
1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations)
Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s
After:
1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations)
Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s
Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.
This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.
Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove
one if condition inside the main/critial loop. Also separated TransformColor &
TransformColorInverse into separate functions and avoid one 'if condition'
inside this critical method.
This change speeds up lossless decoding for Lenna image about 5% and 1000 image
corpus by 3-4%.
Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
cwebp -z 5 in.png -o out_lossless.webp
There are 10 possible values for -z parameter:
0 (fastest, lowest compression)
to 9 (slowest, best compression)
A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.
This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.
Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)
Output is bitwise the same as before.
Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.
This speeds up lossless encoding by 1-2% and almost no impact on compression
density.
Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables
Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
incorporate non-last cost in per-level cost table
also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.
Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
log(1 + x) and increase the threshold for calling the approximate
version of log_2 (compared to costly call to log()).
Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
converts 2 s16 vectors to 2 u8 and store to uint8_t destination;
TransformAC3 can reuse this after a rework
Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0
Separate the C version from the MIPS32 version and have run-time
initialization during RescalerInit()
Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
-> remove the 'color_transform' multiplier, use more constants, etc.
This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)
Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.
Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
(instead of per-macroblock)
speed unchanged.
simplified the context-saving for incremental decoding
Change-Id: I301be581bab581ff68de14c4ffe5bc0ec63f34be
VP8GetThreadMethod() may be called with a NULL headers param; correct an
assert.
broken since:
8a2fa09 Add a second multi-thread method
Change-Id: If7b6d1b8f4ec874d343a806cee5f5e6bb6438620
This makes the segmentation overall less prone to
local-optimum or boundary effect.
(and overall, encoding is a little faster)
Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.
more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/
Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.
Original idea by David Eckel (dvdckl at gmail dot com)
Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
New API options: WebPDecoderOptions.flip and 'dwebp -flip ...'
it uses negative stride trick.
Also changed the decoder code to support user-supplied
buffers with negative stride, independently of the
WebPDecoderOptions.flip value.
Change-Id: I4dc0d06f0c87e51a3f3428be4fee2d6b5ad76053
this partially reverts
f626fe2 Detect canvas and image size mismatch in decoder.
the original change would cause calls to e.g., WebPGetInfo to fail until
a portion of the image chunk was available. With lossy+alpha this meant
waiting for the entire ALPH chunk to be received.
this change restores the original behavior -- reporting the values from
VP8X if available -- while retaining some of the added canvas/image size
checks if the image data is available
Change-Id: I6295b00a2e2d0d4d8847371756af347e4a80bc0e
add TransformDC special case, and make the switch function inlined.
Recovers a few of the CPU lost during the addition of TransformAC3
(only on ARM)
Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)
Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.
Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
This is in preparation for a future change where input will
be 16bit instead of 12bit
No speed diff observed.
Note that the NEON implementation was using 32bit calc already.
Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this
Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)
Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
Even at high quality setting, the U/V quantizer step is limited
to 4 which can lead to banding on gradient.
This option allows to selectively apply some randomness to
potentially flattened-out U/V blocks and attenuate the banding.
This option is off by default in 'dwebp', but set to -dither 50
by default in 'vwebp'.
Note: depending on the number of blocks selectively dithered,
we can have up to a 10% slow-down in decoding speed it seems.
Change-Id: Icc2446007f33ddacb60b3a80a9e63f2d5ad162de