+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual
Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.
Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
Tune HistogramCombineBin for hard images that are larger than 1-2 Mega
pixel and represent photographic images.
This speeds up lossless encoding on 1000 image corpus by 10-12% and compression
penalty of 0.1-0.2%.
Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.
The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.
Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).
Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
(and ~2-3% on ARM)
We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.
Also made the 'Node' structure tighter.
Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning
Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.
This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.
Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
cwebp -z 5 in.png -o out_lossless.webp
There are 10 possible values for -z parameter:
0 (fastest, lowest compression)
to 9 (slowest, best compression)
A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.
This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.
Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)
Output is bitwise the same as before.
Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.
This speeds up lossless encoding by 1-2% and almost no impact on compression
density.
Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables
Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
incorporate non-last cost in per-level cost table
also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.
Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
log(1 + x) and increase the threshold for calling the approximate
version of log_2 (compared to costly call to log()).
Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
This makes the segmentation overall less prone to
local-optimum or boundary effect.
(and overall, encoding is a little faster)
Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.
Original idea by David Eckel (dvdckl at gmail dot com)
Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)
Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.
Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this
Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)
Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
RGBToU/V calls expects two extra precision bits, they were only
given one by SUM2H and SUM2H macros.
For rounding coherency, also changed SUM1 macro.
Change-Id: I05f96a46f5d4f17b830d0420eaf79b066cdf78d4
this avoids local-minima that look bad, even if the distortion
looks low (e.g. gradients, sky,...). Mostly visible in the q=50-80 range.
Output size is mostly unchanged.
Change-Id: I425b600ec45420db409911367cda375870bc2c63
* raise U/V quantization bias to more neutral values
* also raise the non-zero AC bias for Y1/Y2 matrices
(we need all the precision we can for U/V leves, which are often empty)
This will increase quality in the higher range (q >= 90) mostly.
Files size is exacted to raise a little (5-7%). and SSIM accordingly of course.
Change-Id: I8a9ffdb6d8fb6dadb959e3fd392e66dc5aaed64e
kLevelsFromDelta[sharpness][delta] is an inverse look-up table
that tells the minimum filtering strength needed to trigger the
filtering of a step with amplitude 'delta'. We use this table
in various situations:
a) when computing the initial (/global) filtering
strength for each segment. We look at the quantization
step and deduce the proper filtering strength needed
to result this quantization noise (talking the -f option
into account).
b) during intra16 calculation, when a block ends up
very empty (only DC coeffs are non-zero, all ACs have
vanished). We'll rely on the in-loop filtering to
restore the smoothness (if the source was gradient-like
smooth. That's why we look at the distortion too before
triggering the filtering).
Step b) goes _in addition_ to a), potentially raising
the filtering strength if blockiness is likely.
Change-Id: Icaeca93ef21da195b079e6587a44d9edfc8e9efa
-> helps debanding (sky, gradients, etc.)
This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.
Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.
Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
"src\enc\frame.c(88) : warning C4244: '=' : conversion from 'const double' to 'float', possible loss of data"
Change-Id: I143cb0bb6b69e1b8befe9b4f24b71adbc28095c2
The convergence algo is noticeably faster and more accurate.
Try it with: 'cwebp -size xxxxx -pass 8 ...' or 'cwebp -psnr 39 -pass 8 ...'
for instance
Allow full-looping with TokenBuffer case, and make the non-TokenBuffer
case match too.
In case Partition0 is likely to overflow, retry encoding with harder
limits on max_i4_header_bits_.
This CL should make -partition_limit option somewhat useless,
since the fix made automatically (albeit in a non-optimal way yet).
Change-Id: I46fde3564188b13b89d4cb69f847a5f24b8c735b
* fix VP8FixedCostsI4ÆÅ table
(the constant cost '211' was erronenously included)
* use the rd-score for '211' correctly (calling SetRDScore() for good)
* count partition0 bits separately during rd-opt
No meaningful difference in rd-curve.
Change-Id: I6c49a150cf28928d9a92c32fff097600d7145ca4
When -mt is used, the analysis pass will be split in two
and each halves performed in parallel. This gives a 5%-9% speed-up.
This was a good occasion to revamp the iterator and analysis-loop
code. As a result, the default (non-mt) behaviour is a tad (~1%) faster.
Change-Id: Id0828c2ebe2e968db8ca227da80af591d6a4055f
-pass 2 can be useful sometimes. More passes usually don't help more.
This change is a step toward being able to re-code the whole picture
with varying parameter (when token buffer is used).
Change-Id: Ia2538e2069a53c080e2ad248c18a1e04623a9304
* move yuv_in_/out_* scratch buffers to iterator
* add y_top_/uv_top_ shortcuts in iterator
That's ~3k of stack size instead of heap.
But it allows having several iterators work in parallel.
Change-Id: I6a437c0f2ef1e5d398c1d6a2fd4974fa0869f0c1
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.
+ fix compilation in non-FANCY_UPSAMPLING mode
Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.
Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
Speed up HashChainFindCopy by optimizing on number of calls to
FindMatchLength method.
This change speeds up the lossless & lossy (Alpha) encoding by 20%
without loss of compression density.
At method=3, lossy (Alpha) compression speed (and density) remains
unchanged, as at that settings, the costly Backward Refs method is not
called
Change-Id: Ia1797148e9e4ee2787011837fa248afbae2242cb
Disable costly 'BackwardReferencesTraceBackwards' for encoding Alpha plane.
Increase the threshold for triggering 'BackwardReferencesTraceBackwards' to
quality 25 and above. Also lower the Alpha quality (at method 3) to be
lesser than this threshold (25).
Change-Id: Ic29fb2e6943472c564223df9fe099b19ccda0f31
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.
Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
+ some revamp and cleanup of the alpha-filter trial loop
+ EncodeAlphaInternal() now just takes a FilterTrial param
Change-Id: Ief84385083b1cba02678bbcd3dbf707245ee962f
* 0.3.0: (57 commits)
update ChangeLog
Regression fix for alpha channels using color cache:
wicdec: silence a format warning
muxedit: silence some uninitialized warnings
update ChangeLog
update NEWS
bump version to 0.3.1
Revert "add WebPBlendAlpha() function to blend colors against background"
Simplify forward-WHT + SSE2 version
probe input file and quick-check for WebP format.
configure: improve gl/glut library test
update copyright text
configure: remove use of AS_VAR_APPEND
fix EXIF parsing in PNG
add doc precision for WebPPictureCopy() and WebPPictureView()
remove datatype qualifier for vmnv
fix a memory leak in gif2webp
fix two minor memory leaks in webpmux
remove some cruft from swig/libwebp.jar
README: update swig notes
...
Conflicts:
NEWS
examples/gif2webp.c
src/dec/alpha.c
src/dec/idec.c
src/dec/vp8l.c
src/enc/alpha.c
src/enc/vp8l.c
Change-Id: Ib202fad7825a090c3b3a5169acd171369cface47
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
(cherry picked from commit d640614d54)
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.
-> switch generic use-LUT detection
Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
(cherry picked from commit afa3450c11)
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.
Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
(cherry picked from commit 742110ccce)
Start VP8EncLoop/VP8EncTokenLoop only if VP8EncStartAlpha succeeded.
Change-Id: Id1faca3e6def88102329ae2b4974bd4d6d4c4a7a
(cherry picked from commit 67708d6701)
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
(cherry picked from commit e7d9548c9b)
Tuned the cross_color transform parameter (step) for lower quality
levels. This change gives speedup of 20% at lower qualities (25) and 10% at
moderate quality level (50) with a loss of 0.25% in compression density.
Also removed TODO for cross_color transform. Observed good correlation of
this with the predict transform.
Change-Id: I8a1044e9f24e6a5f84295c030fd444d0eec7d154
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4