Even at high quality setting, the U/V quantizer step is limited
to 4 which can lead to banding on gradient.
This option allows to selectively apply some randomness to
potentially flattened-out U/V blocks and attenuate the banding.
This option is off by default in 'dwebp', but set to -dither 50
by default in 'vwebp'.
Note: depending on the number of blocks selectively dithered,
we can have up to a 10% slow-down in decoding speed it seems.
Change-Id: Icc2446007f33ddacb60b3a80a9e63f2d5ad162de
RGBToU/V calls expects two extra precision bits, they were only
given one by SUM2H and SUM2H macros.
For rounding coherency, also changed SUM1 macro.
Change-Id: I05f96a46f5d4f17b830d0420eaf79b066cdf78d4
otherwise make sure that all frames are marked as a fragment. there's
still some work to do with validation if fragments are expected to cover
the entire canvas.
Change-Id: Id59e95ac01b9340ba8c6039b0c3b65484b91c42f
this avoids local-minima that look bad, even if the distortion
looks low (e.g. gradients, sky,...). Mostly visible in the q=50-80 range.
Output size is mostly unchanged.
Change-Id: I425b600ec45420db409911367cda375870bc2c63
Earlier we were only testing for bit_pos == LBITS. But this is not
sufficient,
as bit_pos can jump from < LBITS to > LBITS.
This was resulting in some bit-stream truncation errors not being
caught.
Note: Not a security bug though, as br->pos wasn't incremented in such
cases
and so we weren't reading beyond the buffer.
Change-Id: Idadcdcbc6a5713f8fac3470f907fa37a63074836
* raise U/V quantization bias to more neutral values
* also raise the non-zero AC bias for Y1/Y2 matrices
(we need all the precision we can for U/V leves, which are often empty)
This will increase quality in the higher range (q >= 90) mostly.
Files size is exacted to raise a little (5-7%). and SSIM accordingly of course.
Change-Id: I8a9ffdb6d8fb6dadb959e3fd392e66dc5aaed64e
kLevelsFromDelta[sharpness][delta] is an inverse look-up table
that tells the minimum filtering strength needed to trigger the
filtering of a step with amplitude 'delta'. We use this table
in various situations:
a) when computing the initial (/global) filtering
strength for each segment. We look at the quantization
step and deduce the proper filtering strength needed
to result this quantization noise (talking the -f option
into account).
b) during intra16 calculation, when a block ends up
very empty (only DC coeffs are non-zero, all ACs have
vanished). We'll rely on the in-loop filtering to
restore the smoothness (if the source was gradient-like
smooth. That's why we look at the distortion too before
triggering the filtering).
Step b) goes _in addition_ to a), potentially raising
the filtering strength if blockiness is likely.
Change-Id: Icaeca93ef21da195b079e6587a44d9edfc8e9efa
Earlier "f = f->next_" was executing for both inner and outer loop, thus
skipping validation of some frames.
Change-Id: Ice5cdb4ff5da78384aa0573addd3a5e5efa0b10c
-> helps debanding (sky, gradients, etc.)
This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.
Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.
Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
method 1 grouping: [parse + reconstruction] // [filtering + output]
method 2 grouping: [parse] // [reconstruction+filtering + output]
Depending on some heuristics (see VP8ThreadMethod()), we
can pick one of the other when -mt flag (or option.use_threads)
is selected.
Conservatively, we always use method #2 for now until the heuristic
is refined (so, timing should be the same the before this patch)
+ replace 'use_threads' by 'mt_method'
+ define MIN_WIDTH_FOR_THREADS constant
+ fix comment alignment
Change-Id: I11a756dea9070d6e21b1a9481d357a1e8aa0663e
Mostly visible for large images.
Reconstruction+filtering is now done in parallel to bitstream-parsing.
Change-Id: I4cc4483d803b255f4d97a2fcd9158b1c291dd900
Needs more memory but allows for future parallelization.
Noticeably faster on ARM, slightly faster on x86
also: remove dec->filter_row_ unnecessary field
Change-Id: I044a808839b4e000c838a477e3e8688820436d9a
happens surprisingly often at low quality, so we might
as well hard-code a simplified TransformWHT() directly.
Change-Id: Ib7a858ef74e8f334bd59d6512bf5bd3e455c5459
happens when decoding is partial (past Partition0), without error and
interrupted by calling WebPIDelete()
WebPIDelete() needs to call VP8ExitCritical() to free in-flight resources
Change-Id: Id4faef1b92f7edd8c17d642c58860e70dd570506
"src\enc\frame.c(88) : warning C4244: '=' : conversion from 'const double' to 'float', possible loss of data"
Change-Id: I143cb0bb6b69e1b8befe9b4f24b71adbc28095c2
The convergence algo is noticeably faster and more accurate.
Try it with: 'cwebp -size xxxxx -pass 8 ...' or 'cwebp -psnr 39 -pass 8 ...'
for instance
Allow full-looping with TokenBuffer case, and make the non-TokenBuffer
case match too.
In case Partition0 is likely to overflow, retry encoding with harder
limits on max_i4_header_bits_.
This CL should make -partition_limit option somewhat useless,
since the fix made automatically (albeit in a non-optimal way yet).
Change-Id: I46fde3564188b13b89d4cb69f847a5f24b8c735b
* fix VP8FixedCostsI4ÆÅ table
(the constant cost '211' was erronenously included)
* use the rd-score for '211' correctly (calling SetRDScore() for good)
* count partition0 bits separately during rd-opt
No meaningful difference in rd-curve.
Change-Id: I6c49a150cf28928d9a92c32fff097600d7145ca4
use of uint8_t type was causing error like:
src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349
with gcc 4.6.3
Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da
When -mt is used, the analysis pass will be split in two
and each halves performed in parallel. This gives a 5%-9% speed-up.
This was a good occasion to revamp the iterator and analysis-loop
code. As a result, the default (non-mt) behaviour is a tad (~1%) faster.
Change-Id: Id0828c2ebe2e968db8ca227da80af591d6a4055f
-pass 2 can be useful sometimes. More passes usually don't help more.
This change is a step toward being able to re-code the whole picture
with varying parameter (when token buffer is used).
Change-Id: Ia2538e2069a53c080e2ad248c18a1e04623a9304
* move yuv_in_/out_* scratch buffers to iterator
* add y_top_/uv_top_ shortcuts in iterator
That's ~3k of stack size instead of heap.
But it allows having several iterators work in parallel.
Change-Id: I6a437c0f2ef1e5d398c1d6a2fd4974fa0869f0c1
in_bits is const. Trying to apply bswap on it, one gets the error message:
error: read-only variable 'in_bits' used as 'asm' output
Change-Id: I0bef494b822c83d8ea87b1938b0e486d94de4742
The C-version gets ~7-8% slower in order to match the SSE2
output exactly. The old (now off-by-1) code is kept under
the WEBP_YUV_USE_TABLE flag for reference.
(note that calc rounding precision is slightly better ~= +0.02dB)
on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking
the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610)
Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.
+ fix compilation in non-FANCY_UPSAMPLING mode
Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.
Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
WebPDemuxPartial() returns NULL for both of the following cases:
- There was a parsing error.
- It doesn't have enough data to start parsing.
Now, one can differentiate between these two cases by checking the value
of 'state' returned by WebPDemuxPartial().
Change-Id: Ia2377f0c516b3fcfae475c0662c4932d2eddcd0b
Earlier, all lossless images were assumed to contain alpha.
Now, we use the 'alpha_is_used' bit from the VP8L bitstream to determine
the
same.
Detecting an absence of alpha can sometimes lead to much more efficient
rendering, especially for animated images.
Related: refine mux code to read width/height/has_alpha information only
once
per frame/fragment. This avoid frequent calls to VP8(L)GetInfo().
Change-Id: I4e0eef4db7d94425396c7dff6ca5599d5bca8297
Speed up HashChainFindCopy by optimizing on number of calls to
FindMatchLength method.
This change speeds up the lossless & lossy (Alpha) encoding by 20%
without loss of compression density.
At method=3, lossy (Alpha) compression speed (and density) remains
unchanged, as at that settings, the costly Backward Refs method is not
called
Change-Id: Ia1797148e9e4ee2787011837fa248afbae2242cb
Disable costly 'BackwardReferencesTraceBackwards' for encoding Alpha plane.
Increase the threshold for triggering 'BackwardReferencesTraceBackwards' to
quality 25 and above. Also lower the Alpha quality (at method 3) to be
lesser than this threshold (25).
Change-Id: Ic29fb2e6943472c564223df9fe099b19ccda0f31
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.
Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
speeds up those codes that are not part of the main lookup.
This gives a 10 % speedup for a photographic image.
Change-Id: Ief54b0ad77db790a01314402ad351b40ac9a7be4
+ some revamp and cleanup of the alpha-filter trial loop
+ EncodeAlphaInternal() now just takes a FilterTrial param
Change-Id: Ief84385083b1cba02678bbcd3dbf707245ee962f
Specialize and simplify the alpha-decoding case, which is used when:
- no color-cache is use
- all red/blue/alpha values are the same (and hence their Huffman tree has
only 1 symbol. We don't need to consume any bits for reading these).
+ revamped the loop to use size_t and offsets instead of pointers.
~2-3% faster on Unix (gcc) but up to 25% faster lossy+alpha decoding
on Mac (llvm) and ARM.
Change-Id: I43c9688d1e4811cab0ecf0108a5b8f45781083e6
* 0.3.0: (57 commits)
update ChangeLog
Regression fix for alpha channels using color cache:
wicdec: silence a format warning
muxedit: silence some uninitialized warnings
update ChangeLog
update NEWS
bump version to 0.3.1
Revert "add WebPBlendAlpha() function to blend colors against background"
Simplify forward-WHT + SSE2 version
probe input file and quick-check for WebP format.
configure: improve gl/glut library test
update copyright text
configure: remove use of AS_VAR_APPEND
fix EXIF parsing in PNG
add doc precision for WebPPictureCopy() and WebPPictureView()
remove datatype qualifier for vmnv
fix a memory leak in gif2webp
fix two minor memory leaks in webpmux
remove some cruft from swig/libwebp.jar
README: update swig notes
...
Conflicts:
NEWS
examples/gif2webp.c
src/dec/alpha.c
src/dec/idec.c
src/dec/vp8l.c
src/enc/alpha.c
src/enc/vp8l.c
Change-Id: Ib202fad7825a090c3b3a5169acd171369cface47
+ split AllocateInternalBuffers() into two 32b/8b variants instead of
trying to do everything in one function.
Change-Id: I35cac9fcd990a2194c95da4b2a4046ca3a514343
Considering the fact that insert to/lookup from the color cache is always 32
bit, use DecodeImageData() variant in that case.
Conflicts:
src/dec/vp8l.c
Change-Id: I6c665a6cfbd9bd10651c1e82fa54e687cbd54a2b
(cherry picked from commit a37eff47d6)
src/mux/muxedit.c:490: warning: 'x_offset' may be used uninitialized in this function
src/mux/muxedit.c:490: warning: 'y_offset' may be used uninitialized in this function
Change-Id: I4fd27f717e59a556354d0560b633d0edafe7a4d8
(cherry picked from commit 14cd5c6c40)
Considering the fact that insert to/lookup from the color cache is always 32
bit, use DecodeImageData() variant in that case.
Change-Id: I6c665a6cfbd9bd10651c1e82fa54e687cbd54a2b
src/mux/muxedit.c:490: warning: 'x_offset' may be used uninitialized in this function
src/mux/muxedit.c:490: warning: 'y_offset' may be used uninitialized in this function
Change-Id: I4fd27f717e59a556354d0560b633d0edafe7a4d8
src\dec\vp8l.c(816) : warning C4244: '=' : conversion from '__int64' to
'int', possible loss of data
src\dec\vp8l.c(817) : warning C4244: '=' : conversion from '__int64' to
'int', possible loss of data
Change-Id: I1d376d5dea909395bff8741aba16e8eed83a6e8f