Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
-> remove the 'color_transform' multiplier, use more constants, etc.
This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)
Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.
Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
(instead of per-macroblock)
speed unchanged.
simplified the context-saving for incremental decoding
Change-Id: I301be581bab581ff68de14c4ffe5bc0ec63f34be
VP8GetThreadMethod() may be called with a NULL headers param; correct an
assert.
broken since:
8a2fa09 Add a second multi-thread method
Change-Id: If7b6d1b8f4ec874d343a806cee5f5e6bb6438620
This makes the segmentation overall less prone to
local-optimum or boundary effect.
(and overall, encoding is a little faster)
Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.
more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/
Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.
Original idea by David Eckel (dvdckl at gmail dot com)
Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
New API options: WebPDecoderOptions.flip and 'dwebp -flip ...'
it uses negative stride trick.
Also changed the decoder code to support user-supplied
buffers with negative stride, independently of the
WebPDecoderOptions.flip value.
Change-Id: I4dc0d06f0c87e51a3f3428be4fee2d6b5ad76053
this partially reverts
f626fe2 Detect canvas and image size mismatch in decoder.
the original change would cause calls to e.g., WebPGetInfo to fail until
a portion of the image chunk was available. With lossy+alpha this meant
waiting for the entire ALPH chunk to be received.
this change restores the original behavior -- reporting the values from
VP8X if available -- while retaining some of the added canvas/image size
checks if the image data is available
Change-Id: I6295b00a2e2d0d4d8847371756af347e4a80bc0e
add TransformDC special case, and make the switch function inlined.
Recovers a few of the CPU lost during the addition of TransformAC3
(only on ARM)
Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)
Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.
Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
This is in preparation for a future change where input will
be 16bit instead of 12bit
No speed diff observed.
Note that the NEON implementation was using 32bit calc already.
Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this
Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)
Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
Even at high quality setting, the U/V quantizer step is limited
to 4 which can lead to banding on gradient.
This option allows to selectively apply some randomness to
potentially flattened-out U/V blocks and attenuate the banding.
This option is off by default in 'dwebp', but set to -dither 50
by default in 'vwebp'.
Note: depending on the number of blocks selectively dithered,
we can have up to a 10% slow-down in decoding speed it seems.
Change-Id: Icc2446007f33ddacb60b3a80a9e63f2d5ad162de
RGBToU/V calls expects two extra precision bits, they were only
given one by SUM2H and SUM2H macros.
For rounding coherency, also changed SUM1 macro.
Change-Id: I05f96a46f5d4f17b830d0420eaf79b066cdf78d4
otherwise make sure that all frames are marked as a fragment. there's
still some work to do with validation if fragments are expected to cover
the entire canvas.
Change-Id: Id59e95ac01b9340ba8c6039b0c3b65484b91c42f
this avoids local-minima that look bad, even if the distortion
looks low (e.g. gradients, sky,...). Mostly visible in the q=50-80 range.
Output size is mostly unchanged.
Change-Id: I425b600ec45420db409911367cda375870bc2c63
Earlier we were only testing for bit_pos == LBITS. But this is not
sufficient,
as bit_pos can jump from < LBITS to > LBITS.
This was resulting in some bit-stream truncation errors not being
caught.
Note: Not a security bug though, as br->pos wasn't incremented in such
cases
and so we weren't reading beyond the buffer.
Change-Id: Idadcdcbc6a5713f8fac3470f907fa37a63074836
* raise U/V quantization bias to more neutral values
* also raise the non-zero AC bias for Y1/Y2 matrices
(we need all the precision we can for U/V leves, which are often empty)
This will increase quality in the higher range (q >= 90) mostly.
Files size is exacted to raise a little (5-7%). and SSIM accordingly of course.
Change-Id: I8a9ffdb6d8fb6dadb959e3fd392e66dc5aaed64e
kLevelsFromDelta[sharpness][delta] is an inverse look-up table
that tells the minimum filtering strength needed to trigger the
filtering of a step with amplitude 'delta'. We use this table
in various situations:
a) when computing the initial (/global) filtering
strength for each segment. We look at the quantization
step and deduce the proper filtering strength needed
to result this quantization noise (talking the -f option
into account).
b) during intra16 calculation, when a block ends up
very empty (only DC coeffs are non-zero, all ACs have
vanished). We'll rely on the in-loop filtering to
restore the smoothness (if the source was gradient-like
smooth. That's why we look at the distortion too before
triggering the filtering).
Step b) goes _in addition_ to a), potentially raising
the filtering strength if blockiness is likely.
Change-Id: Icaeca93ef21da195b079e6587a44d9edfc8e9efa
Earlier "f = f->next_" was executing for both inner and outer loop, thus
skipping validation of some frames.
Change-Id: Ice5cdb4ff5da78384aa0573addd3a5e5efa0b10c
-> helps debanding (sky, gradients, etc.)
This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.
Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.
Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
method 1 grouping: [parse + reconstruction] // [filtering + output]
method 2 grouping: [parse] // [reconstruction+filtering + output]
Depending on some heuristics (see VP8ThreadMethod()), we
can pick one of the other when -mt flag (or option.use_threads)
is selected.
Conservatively, we always use method #2 for now until the heuristic
is refined (so, timing should be the same the before this patch)
+ replace 'use_threads' by 'mt_method'
+ define MIN_WIDTH_FOR_THREADS constant
+ fix comment alignment
Change-Id: I11a756dea9070d6e21b1a9481d357a1e8aa0663e
Mostly visible for large images.
Reconstruction+filtering is now done in parallel to bitstream-parsing.
Change-Id: I4cc4483d803b255f4d97a2fcd9158b1c291dd900
Needs more memory but allows for future parallelization.
Noticeably faster on ARM, slightly faster on x86
also: remove dec->filter_row_ unnecessary field
Change-Id: I044a808839b4e000c838a477e3e8688820436d9a
happens surprisingly often at low quality, so we might
as well hard-code a simplified TransformWHT() directly.
Change-Id: Ib7a858ef74e8f334bd59d6512bf5bd3e455c5459
happens when decoding is partial (past Partition0), without error and
interrupted by calling WebPIDelete()
WebPIDelete() needs to call VP8ExitCritical() to free in-flight resources
Change-Id: Id4faef1b92f7edd8c17d642c58860e70dd570506
"src\enc\frame.c(88) : warning C4244: '=' : conversion from 'const double' to 'float', possible loss of data"
Change-Id: I143cb0bb6b69e1b8befe9b4f24b71adbc28095c2
The convergence algo is noticeably faster and more accurate.
Try it with: 'cwebp -size xxxxx -pass 8 ...' or 'cwebp -psnr 39 -pass 8 ...'
for instance
Allow full-looping with TokenBuffer case, and make the non-TokenBuffer
case match too.
In case Partition0 is likely to overflow, retry encoding with harder
limits on max_i4_header_bits_.
This CL should make -partition_limit option somewhat useless,
since the fix made automatically (albeit in a non-optimal way yet).
Change-Id: I46fde3564188b13b89d4cb69f847a5f24b8c735b
* fix VP8FixedCostsI4ÆÅ table
(the constant cost '211' was erronenously included)
* use the rd-score for '211' correctly (calling SetRDScore() for good)
* count partition0 bits separately during rd-opt
No meaningful difference in rd-curve.
Change-Id: I6c49a150cf28928d9a92c32fff097600d7145ca4
use of uint8_t type was causing error like:
src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349
with gcc 4.6.3
Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da
When -mt is used, the analysis pass will be split in two
and each halves performed in parallel. This gives a 5%-9% speed-up.
This was a good occasion to revamp the iterator and analysis-loop
code. As a result, the default (non-mt) behaviour is a tad (~1%) faster.
Change-Id: Id0828c2ebe2e968db8ca227da80af591d6a4055f
-pass 2 can be useful sometimes. More passes usually don't help more.
This change is a step toward being able to re-code the whole picture
with varying parameter (when token buffer is used).
Change-Id: Ia2538e2069a53c080e2ad248c18a1e04623a9304
* move yuv_in_/out_* scratch buffers to iterator
* add y_top_/uv_top_ shortcuts in iterator
That's ~3k of stack size instead of heap.
But it allows having several iterators work in parallel.
Change-Id: I6a437c0f2ef1e5d398c1d6a2fd4974fa0869f0c1
in_bits is const. Trying to apply bswap on it, one gets the error message:
error: read-only variable 'in_bits' used as 'asm' output
Change-Id: I0bef494b822c83d8ea87b1938b0e486d94de4742
The C-version gets ~7-8% slower in order to match the SSE2
output exactly. The old (now off-by-1) code is kept under
the WEBP_YUV_USE_TABLE flag for reference.
(note that calc rounding precision is slightly better ~= +0.02dB)
on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking
the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610)
Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.
+ fix compilation in non-FANCY_UPSAMPLING mode
Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.
Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
WebPDemuxPartial() returns NULL for both of the following cases:
- There was a parsing error.
- It doesn't have enough data to start parsing.
Now, one can differentiate between these two cases by checking the value
of 'state' returned by WebPDemuxPartial().
Change-Id: Ia2377f0c516b3fcfae475c0662c4932d2eddcd0b
Earlier, all lossless images were assumed to contain alpha.
Now, we use the 'alpha_is_used' bit from the VP8L bitstream to determine
the
same.
Detecting an absence of alpha can sometimes lead to much more efficient
rendering, especially for animated images.
Related: refine mux code to read width/height/has_alpha information only
once
per frame/fragment. This avoid frequent calls to VP8(L)GetInfo().
Change-Id: I4e0eef4db7d94425396c7dff6ca5599d5bca8297
Speed up HashChainFindCopy by optimizing on number of calls to
FindMatchLength method.
This change speeds up the lossless & lossy (Alpha) encoding by 20%
without loss of compression density.
At method=3, lossy (Alpha) compression speed (and density) remains
unchanged, as at that settings, the costly Backward Refs method is not
called
Change-Id: Ia1797148e9e4ee2787011837fa248afbae2242cb
Disable costly 'BackwardReferencesTraceBackwards' for encoding Alpha plane.
Increase the threshold for triggering 'BackwardReferencesTraceBackwards' to
quality 25 and above. Also lower the Alpha quality (at method 3) to be
lesser than this threshold (25).
Change-Id: Ic29fb2e6943472c564223df9fe099b19ccda0f31
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.
Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
speeds up those codes that are not part of the main lookup.
This gives a 10 % speedup for a photographic image.
Change-Id: Ief54b0ad77db790a01314402ad351b40ac9a7be4
+ some revamp and cleanup of the alpha-filter trial loop
+ EncodeAlphaInternal() now just takes a FilterTrial param
Change-Id: Ief84385083b1cba02678bbcd3dbf707245ee962f
Specialize and simplify the alpha-decoding case, which is used when:
- no color-cache is use
- all red/blue/alpha values are the same (and hence their Huffman tree has
only 1 symbol. We don't need to consume any bits for reading these).
+ revamped the loop to use size_t and offsets instead of pointers.
~2-3% faster on Unix (gcc) but up to 25% faster lossy+alpha decoding
on Mac (llvm) and ARM.
Change-Id: I43c9688d1e4811cab0ecf0108a5b8f45781083e6
* 0.3.0: (57 commits)
update ChangeLog
Regression fix for alpha channels using color cache:
wicdec: silence a format warning
muxedit: silence some uninitialized warnings
update ChangeLog
update NEWS
bump version to 0.3.1
Revert "add WebPBlendAlpha() function to blend colors against background"
Simplify forward-WHT + SSE2 version
probe input file and quick-check for WebP format.
configure: improve gl/glut library test
update copyright text
configure: remove use of AS_VAR_APPEND
fix EXIF parsing in PNG
add doc precision for WebPPictureCopy() and WebPPictureView()
remove datatype qualifier for vmnv
fix a memory leak in gif2webp
fix two minor memory leaks in webpmux
remove some cruft from swig/libwebp.jar
README: update swig notes
...
Conflicts:
NEWS
examples/gif2webp.c
src/dec/alpha.c
src/dec/idec.c
src/dec/vp8l.c
src/enc/alpha.c
src/enc/vp8l.c
Change-Id: Ib202fad7825a090c3b3a5169acd171369cface47
+ split AllocateInternalBuffers() into two 32b/8b variants instead of
trying to do everything in one function.
Change-Id: I35cac9fcd990a2194c95da4b2a4046ca3a514343
Considering the fact that insert to/lookup from the color cache is always 32
bit, use DecodeImageData() variant in that case.
Conflicts:
src/dec/vp8l.c
Change-Id: I6c665a6cfbd9bd10651c1e82fa54e687cbd54a2b
(cherry picked from commit a37eff47d6)
src/mux/muxedit.c:490: warning: 'x_offset' may be used uninitialized in this function
src/mux/muxedit.c:490: warning: 'y_offset' may be used uninitialized in this function
Change-Id: I4fd27f717e59a556354d0560b633d0edafe7a4d8
(cherry picked from commit 14cd5c6c40)
Considering the fact that insert to/lookup from the color cache is always 32
bit, use DecodeImageData() variant in that case.
Change-Id: I6c665a6cfbd9bd10651c1e82fa54e687cbd54a2b
src/mux/muxedit.c:490: warning: 'x_offset' may be used uninitialized in this function
src/mux/muxedit.c:490: warning: 'y_offset' may be used uninitialized in this function
Change-Id: I4fd27f717e59a556354d0560b633d0edafe7a4d8
src\dec\vp8l.c(816) : warning C4244: '=' : conversion from '__int64' to
'int', possible loss of data
src\dec\vp8l.c(817) : warning C4244: '=' : conversion from '__int64' to
'int', possible loss of data
Change-Id: I1d376d5dea909395bff8741aba16e8eed83a6e8f
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.
also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )
Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
(cherry picked from commit 9c4ce971a8)
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
(cherry picked from commit d640614d54)
output picture object is overwritten, not free'd or destroyed.
Change-Id: Ibb47ab444063e7ad90ff3d296260807ffe7ddbf9
(cherry picked from commit 23d28e216d)
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.
-> switch generic use-LUT detection
Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
(cherry picked from commit afa3450c11)
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.
Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
(cherry picked from commit 742110ccce)
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"
Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
(cherry picked from commit 87a4fca25f)
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.
Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
(cherry picked from commit 64c844863a)
Simply get rid of an intermediate buffer of size width x height, by
using the fact that stride == width in this case.
Change-Id: I92376a2561a3beb6e723e8bcf7340c7f348e02c2
(cherry picked from commit edccd19436)
doing so is not part of ISO C; removes some pedantic warnings
Change-Id: I739ad8c5cacc133e2546e9f45c0db9d92fb93d7e
(cherry picked from commit 96e948d7b0)
The output surface CAN be changed inbetween calls to
WebPIUpdate() or WebPIAppend(), but with precautions.
Change-Id: I899afbd95738a6a8e0e7000f8daef3e74c99ddd8
(cherry picked from commit ff885bfe1f)
This applies to images with optional chunks (e.g. images with ALPH
chunk,
ICCP chunk etc). Before this, the incremental decoding used to work like
non-incremental decoding for such files, that is, no rows were decoded
until
all data was available.
The change is in 2 parts:
- During optional chunk parsing, don't wait for the full VP8/VP8L chunk.
- Remap 'alpha_data' pointer whenever a new buffer is allocated/used in
WebPIAppend() and WebPIUpdate().
Change-Id: I6cfd6ca1f334b9c6610fcbf662cd85fa494f2a91
(cherry picked from commit ead4d47859)
Start VP8EncLoop/VP8EncTokenLoop only if VP8EncStartAlpha succeeded.
Change-Id: Id1faca3e6def88102329ae2b4974bd4d6d4c4a7a
(cherry picked from commit 67708d6701)
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
(cherry picked from commit e7d9548c9b)
Tuned the cross_color transform parameter (step) for lower quality
levels. This change gives speedup of 20% at lower qualities (25) and 10% at
moderate quality level (50) with a loss of 0.25% in compression density.
Also removed TODO for cross_color transform. Observed good correlation of
this with the predict transform.
Change-Id: I8a1044e9f24e6a5f84295c030fd444d0eec7d154
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.
-> switch generic use-LUT detection
Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.
Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
'mem' was being offset once by DO_ALIGN() then shifted 'nz_size' which
would end up accounting for more than ALIGN_CST and exceed the allocation.
broken since:
9bf3129 align VP8Encoder::nz_ allocation
Change-Id: I04a4e0bbf80d909253ce057f8550ed98e0cf1054
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"
Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.
Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
Simply get rid of an intermediate buffer of size width x height, by
using the fact that stride == width in this case.
Change-Id: I92376a2561a3beb6e723e8bcf7340c7f348e02c2
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.
also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )
Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
it's not often the case, but could happen, that chroma has non-zero
coeff but luma hasn't. In such case, we should skip luma right away
Change-Id: I9515573ffaec8aad8b069d2c02ffbda4a6eff97c
Removed a call to WebPParseHeaders() inside VP8GetHeaders(). This was not needed
anyway, as all call flows already call WebPParseHeaders() before calling
VP8GetHeaders().
This avoids duplicate calls to WebPParseHeaders().
Change-Id: Icb2d618bd26c44220d956c17a69c9c45a62d5237
The output surface CAN be changed inbetween calls to
WebPIUpdate() or WebPIAppend(), but with precautions.
Change-Id: I899afbd95738a6a8e0e7000f8daef3e74c99ddd8
This applies to images with optional chunks (e.g. images with ALPH
chunk,
ICCP chunk etc). Before this, the incremental decoding used to work like
non-incremental decoding for such files, that is, no rows were decoded
until
all data was available.
The change is in 2 parts:
- During optional chunk parsing, don't wait for the full VP8/VP8L chunk.
- Remap 'alpha_data' pointer whenever a new buffer is allocated/used in
WebPIAppend() and WebPIUpdate().
Change-Id: I6cfd6ca1f334b9c6610fcbf662cd85fa494f2a91
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
user can now call WebPEncode() with any YUVA or ARGB format, for
lossy or lossless compression
also: simplified error reporting, which is done in WebPPictureARGBToYUVA()
and WebPPictureYUVAToARGB()
Change-Id: Ifb68909217175bcf5a050e5c68d06de9849468f7
(cherry picked from commit 07d87bda1b)
transfer ownership of chunk passed in ChunkSetNth(). prevents freeing
the chunk in e.g., MuxImageParse() when a partial assignment (alpha but
no image) has occurred.
Change-Id: Ia69656b04fdf50f098f3816b54abd4e191248de3
Saturation was done on input coeff, not quantized one.
This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.
addresses issue #145
Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
* merge cost calculation functions (BitsEntropy() and HuffmanCost())
* have HistogramAdd() specialized into separate functions
* use threshold to bail-out early
* revamp code a bit
* also: save memory by freeing free(histogram_image)
Change-Id: I8ee5d2cfa1462d5d6ea6361f5c89925a3720ef55
* VP8L shouldn't have an alpha chunk
* expect an animation to only contain frames, not a mix of image chunks
* enforce ANIM/ANMF order
* expect a full frame in a complete file
Change-Id: I953a8b6058f9bc00f1d042635548f158abdf6fce
subdirectories with more than one target can have the install targets
run in parallel with make -jN. group the shared headers in one place to
produce a common install target.
Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451
Reported to eventually be 4% on ARM
(see https://code.google.com/p/webp/issues/detail?id=134 for details)
We might activate it selectively later...
Output values is not bitwise the same as the LUT-based
version, but difference is only +/-1 at max.
Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07
this allows DLLs to be built under mingw and sets up a more obvious
dependency in the shared objects
Change-Id: I33e8a6132a16ca49563492438a1b3b74be9ed6a1
expands to e.g., -lpthread/-pthread, etc. if --enable-threading is used
and additional libs/flags are required
Change-Id: I8e8a7b44450bee32ddc58097e1e309d972b1092a
This is required for WebP lossy+Alpha images, where Alpha channel is taking
60-70% of the compression (CPU) cycles.
Also evaluated on 1000 PNG corpus and overall compression speed
is 15-40% better for lossy (PNG+Alpha) compression.
The pure lossless compression numbers are almost same (or little
better) with this change.
Change-Id: I9e5ae7372ed6227a9a5b64cd9cff84c747195a57
using token-buffer (that is: slightly more memory. O(output_size))
This change is ON by default. To return to previous behaviour, use
'cwebp -low_memory' or set config.low_memory to true.
Side-effect of this new mode: it forces 1 partition only (which was
default anyway), and makes some statistics about the bitstream
no longer available. cwebp will no longer report 'intra4-coeffs', etc.
This mode also doesn't work (yet) with multi-pass, and -low_memory
is currently forced for multi-pass.
also: reversed the flag: USE_TOKEN_BUFFER -> DISABLE_TOKEN_BUFFER
also: fixed the kAverageBytesPerMB estimate
Change-Id: I4ea80382038d6df4309663e0cb7bd88d9bca9cf1
When a ANMF/FRGM chunk size (read from file) is smaller than ANMF/FRGM header
size (which is constant and implicit), the parser should report an error.
Change-Id: I91d71889937f5133a97f1e83d5254cb2d7f37028
broken since:
ad25032 Merge "multi-threaded alpha encoding for lossy"
this produced an error due to an empty VP8TBuffer struct.
Change-Id: I640809d07d20092c1d660e2b59b58a62a12e4371
Also tweak the code for single image and fragmented image cases, so that
dmux->num_frames_ is always correct.
Change-Id: I31e6904222e4d96a54a0d8c8aa73d43b7a9094e7
new option: 'cwebp -mt ...'
new config flag: config.thread_level
(allowed thread_level are 0 or 1 for now. Maybe more later...)
If -mt is activated (and WEBP_USE_THREAD is used for compile), the alpha-compression
will be done in parallel to RGB coding for lossy. Can save quite a bit of latency...
Has no effect for lossless encoding.
Change-Id: I769d0bf90e7380cf99344ad62cd77277f4df5a46
It now returns ALPHA_FLAG for lossless images with alpha, that don't
have a VP8X chunk.
This is consistent with similar methods WebPMuxGetFeatures() and
WebPGetFeatures().
Change-Id: Ia3a4ca8f3e0f102f478bd33e0727ca5be98593df
larger values are still dealt with in the .cc
~5% faster encoding
Output size is slightly different (variably), because of
different floating-point calculation ordering.
Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4
-> split libraries further into decoder / encoder
-> add libwebpdecoder.a in Makefile.unix
-> make dwebp link against libwebpdecoder.a in Makefile.unix
also: in makefile.unix, pass EXTRA_FLAGS to LDFLAGS too
(otherwise, -m32 wouldn't work, e.g.)
Change-Id: Ief3da02a729dd86bbaf949ed048836716941657f
signed integer overflow behavior is undefined, split PrefixEncode() to
two branches to avoid this.
Change-Id: I6e2761d0d77f0aaceafdc4e07232e089c22beb64
* add SSE2 variant for lossless
* speed-up TransformColor calls using specialized TransformColorBlue/Red
* Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously.
This latter changes the output size a little bit.
Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89
Simplify and re-organize the VP8L bit-reader functions
(e.g.: the 40-bit look-ahead code was helping much)
Speed-up with LBITS=64, on arm7-a:
=> before:
./dwebp_justify_24_neon -v bryce_ll.webp
Time to decode picture: 11.393s
File bryce_ll.webp can be decoded (dimensions: 11158 x 2156).
...
=> after (LBITS=64): Time to decode picture: 9.953s
making the VP8L bit-reader in 32 bit mode is going to be
harder (because we need to be able to read two symbols
at a time, each with max length 15 bits)
Change-Id: I89746fb103b87b5e2fd40a3208a6fbc584b88297
This flag will make the code use no uint64, no asm, and no fancy
trick, but instead aim at being as simple and straightforward as
possible.
Main use is to help emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are the typical timing for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
This option remaps internal parameters to better match
the expected compression curve of JPEG and produce output files
of similar size, but with better quality.
Change-Id: I96a1cbb480b1f6a0c6845a23c33dfd63f197b689
If it's not a partial file and parser returns PARSE_NEED_MORE_DATA, then
consider it to be PARSE_ERROR.
Change-Id: Id652a345bd2a9f574970272dd0a00517de113215
If a NULL pre-allocated buffer is passed, a buffer will be automatically
allocated.
+ add some parameter checks.
reported in http://code.google.com/p/webp/issues/detail?id=139
Change-Id: I9e14ed97db30ee12e46b5e92aac7eeaaeb99bfd5
store values to a temporary variable before calling functions that take
vector types.
removes non-standard constructs such as:
(uint8x8x2_t){{ a, b }}
fixing:
src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3
arguments, but takes just 2
Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc
Fixes: https://code.google.com/p/webp/issues/detail?id=140
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.
4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.
see http://code.google.com/p/webp/issues/detail?id=134
Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
When the config option '--enable-libwebpdecoder' is specified, the
lean decoder library 'libwebpdecoder' will be created in addition to
libwebp. Also dwebp binary will be linked to libwebpdecoder, if this
config option is specified.
Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a
Actually, it turns out we now should never call these functions
with a zero size, otherwise something is wrong in the logic.
Change-Id: Ie414fcbec95486c169190470a71f2cff0843782a
also change lossless encoder logic, which was relying on explicit
NULL return from WebPSafeMalloc(0)
renamed function to CheckSizeArgumentsOverflow() explicitly
addresses issue #138
Change-Id: Ibbd51cc0281e60e86dfd4c5496274399e4c0f7f3
- precompute filtering strength once for all at the beginning
instead of per-macroblock
- reduce size of VP8MB struct from 8 bytes to 4.
- removed VP8StoreBlock() accordingly
Change-Id: Icf3d329473e21c464770be3d72a04c9ee4c321f2
GetCoeffs is (by far) the most consuming function of the decoder.
No speed change (unfortunately), but the main loop is somehow clearer.
Change-Id: I78f1c10cadc2c8696c041f5cbda86cab92cc6598
* treat the last coeff as a special case
* re-arrange the inner code to be shorter
* replace some VP8EncBands[n] by n, for n = 0 or 1
Change-Id: I71e17b014cffad7b073e787fde06260905a6953f
The main advantage is that you can avoid the use of uint64_t
some times, sticking to 32bit only.
Default still is BITS=32, this is mainly "in case".
Change-Id: Id694028793117ba822c37d46ef6c52fa0afed4ac
- Separate out mux.h and demux.h
- muxtypes.h: new header for data types common to mux/demux
- Move some misc read/write utilities to utils/utils.h
- Remove some duplicate methods.
- Separate out mux/demux libraries
Change-Id: If9b9569b10d55d922ad9317ef51710544315d6de
We don't need to use the exact forward transform,
since it's only a rough evaluation.
-> Removed some shifts and rounding constants.
Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33
In particular, this removes any unnecessary FRGM/ANMF/ANIM chunks, and
indirectly leads to removal of unnecessary VP8X chunks as well.
This is especially useful for GIF to WebP conversion - it saves 56 bytes
(ANMF: 16+8 bytes, ANIM: 6+8 bytes, VP8X: 10+8 bytes) for non-animated GIFs.
Change-Id: I3b50a96ca585844c421b0fa4cd8593e52c3f95c5
This is a correction to the following change:
a00a3daf5b Use 'frgm' instead of 'tile' in
webpmux parameters
Change-Id: I8fa0bce98efdde38827fd25712017a98a6ea7388
- Allow a duration of 0
- Rename LOOP chunk to ANIM and add the background color field to it.
- Add a disposal method field for each animation frame.
- Modify webpmux.c binary interface to allow the input of background color
and disposal methods. Also make '-loop' and '-bgcolor' arguments optional
with some default values.
Change-Id: I807372a61cdb8a0d3080ae3552caf2848070bf4d
10-15% faster encoding.
Almost same output, binary wise. The main difference is
that we can't compute uv_alpha susceptibility, means there
can be subtle differences with different -sns values.
Change-Id: Id1b1a50929bf125b6372212fee1ed75a3bed975f
- Also, use the term 'fragments' instead of 'tiling' in code
- This makes code consistent with the spec.
Change-Id: Ibeccffc35db23bbedb88cc5e18e29e51621931f8
- Make ANMF and FRGM chunks hierarchical so that they encompass all chunks of
that frame.
- Use this in demuxer: stop parsing a frame if all image data for it isn't
available yet. Thus, we have a frame-level incremental support; that is,
all frames that are fully available can be parsed.
- Note: We still keep incremental support for single images - so that they can
be decoded with incremental decoding.
Change-Id: Id1585b16b06caee1d84009c42a25d2de29fa6135
Use separate fourCCs "XMP " and "EXIF" instead of a common "META"
Also, some refactorization in webpmux.c
Change-Id: Iad3337e5c1b81e785c60670ce28b1f536dd7ee31
Number of pairs selected are limited between 25% of histogram
images (at start) and number of histogram images left at any iteration.
Increase the range of iter_mult.
Removed min_cluster_size as parameter for tuning HistogramCombine.
Change-Id: Ia4068cd7af4d0f63c5af9001aceda8a40b9de740
* commit 'v0.2.1':
Update ChangeLog
update NEWS
bump version to 0.2.1
libwebp: validate chunk size in ParseOptionalChunks
cwebp (windows): fix alpha image import on XP
autoconf/libwebp: enable dll builds for mingw
[cd]webp: always output windows errors
fix double to float conversion warning
cwebp: fix jpg encodes on XP
VP8LAllocateHistogramSet: fix overflow in size calculation
GetHistoBits: fix integer overflow
EncodeImageInternal: fix uninitialized free
fix the -g/O3 discrepancy for 32bit compile
fix the BITS=8 case
Make *InitSSE2() functions be empty on non-SSE2 platform
make *InitSSE2() functions be empty on non-SSE2 platform
make VP8DspInitNEON() public
Conflicts:
src/Makefile.am
src/dsp/dec_neon.c
Change-Id: Iddc5152e4a6892db96c12d7c3f74adbc85fe6178
Contributed by Wayne Chen (datoudatou at gmail dot com)
+ some header cleanup
+ remove the NEON suffix in static functions
Change-Id: I75bf5e9b54cf5e1acc53764c6f081d61690f8e3d
(implements the backward and forward transforms in the encoder)
original patch by Wayne Chen (datoudatou at gmail dot com)
Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57
- Changed the dynamic range where more aggressive
(BackwardReferencesTraceBackward) heuristic is run from quality > 10
(instead of quality > 25).
- Limit the backward-ref Window size to 16*width & 256*width for lower
qualities ([0, 25[ & [25, 50[) respectively, instead of 1M window.
- Evaluate the params for HashChainFindCopy outside this function call
and pass it, instead of recomputing them for every call.
Change-Id: If9eedfc14b978e7632d7cf69c96186e2910b0554
the max wasn't checked leading to a rollover case, possibly exploitable.
additionally check the RIFF size early, to avoid similar issues.
pulled from chromium:
http://codereview.chromium.org/11229048/
Change-Id: I4050b13a7e61ec023c0ef50958c45f651cf34c49
the multiplications done for total_size would be done with integers,
possibly overflowing, before being promoted to 64-bit for the addition
Change-Id: I32c3a6400fc2ef120c38e01a8693f4cb1727234d
huff_image_size was a size_t (=32 bits with 32-bit builds) which could
rollover causing an incorrectly sized allocation and a crash in lossless
encoding.
fixes issue #128
Change-Id: I0f20cee98c29b2b40b02607930b6b7a7ca56996d
in debug mode, some float operations see their intermediate
values stored in memory rather than staying in the FPU (which
is 80bit precision).
Several fixes are possible (breaking long calculations into
atomic steps for instance), but simpler of all is just about
turning the cost[] array into float* instead of double*.
The code is a tad faster, and i didn't see any major output
size difference.
Change-Id: I053e6d340850f02761687e072b0782c6734d4bf8
this will avoid the "dec_neon.o has no symbol" warning
no change in binary size observed on linux.
Change-Id: Ifd83dfc6a0c61905481599b06cb5e711f55efa7d
the max wasn't checked leading to a rollover case, possibly exploitable.
additionally check the RIFF size early, to avoid similar issues.
pulled from chromium:
http://codereview.chromium.org/11229048/
Change-Id: Ifebc712bf3d3de0129b76ca4c57c68e062abc429
This is mostly for experimentation!
Need to define USE_YUVj flag in the code for that.
suggested by benwreder at hotmail dot com
Change-Id: If0b8e2c1863efc08ce097de6de20f4c7efc3f7e8
LSIM stands for "local similarity": before matching
a compressed pixel to the source, we search around in the source
and minimise the squared error. So, this is close to PSNR calculation,
but mitigates some of its limitations (pure translation and noise for instance).
There's a new -print_lsim option to cwebp too.
Change-Id: Ia38561034c7a90e71d2ea0f55bb1de527eda245b
Make the heuristic for combining Histograms a function of compression
quality. This change will speed-up compression time for compression
quality less than 75. The compression time/density remains unchanged
for compression quality 75 and higher.
Change-Id: I94513d51078340fbc0737d459fab2cebdd2d6082
the multiplications done for total_size would be done with integers,
possibly overflowing, before being promoted to 64-bit for the addition
Change-Id: Id5c127c8a497ce5de89a276c17f36b59eeb67c21
huff_image_size was a size_t (=32 bits with 32-bit builds) which could
rollover causing an incorrectly sized allocation and a crash in lossless
encoding.
fixes issue #128
Change-Id: I175c8c6132ba9792034807c5c1028dfddfeb4ea5
in debug mode, some float operations see their intermediate
values stored in memory rather than staying in the FPU (which
is 80bit precision).
Several fixes are possible (breaking long calculations into
atomic steps for instance), but simpler of all is just about
turning the cost[] array into float* instead of double*.
The code is a tad faster, and i didn't see any major output
size difference.
Change-Id: Icf1f833e15f8ee4ecc7f9a521d07fdc96ef711aa
Change-Id: I36d3765e94d2b5529b321c186ccee1744785c5b3
fixes:
error: ISO C++ forbids forward references to 'enum' types
since:
28d25c8 replace 'typedef struct {} X;" by "typedef struct X X; struct X {};"
Returning 0 (equal) can lead to undefined behaviour.
And, in our cases we'll never have equal keys (added asserts for that)
Change-Id: Ifaf202df321d3f877ad2a03de42e0d6cdd1b2388
SBITS=8 is reported 20-30% faster on ARM (where 64bit ops
are expensive).
Also use 32bits for i32.
Change-Id: Id6a7197d805061aeb8832f20432512d0d930ebfa
fixes the 'blocky sky problem' (saturation problem: when luma was flat,
chroma noise was taking over, resulting in random segment id assigned.
When just using a common uniform segment was better).
+ side clean-up and readibility/experimentability MACRO'ization
+ added '-map 7' option
Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71
this will avoid the "dec_neon.o has no symbol" warning
no change in binary size observed on linux.
Change-Id: Ia27ae2bc5a03d714afa7e46671fdcf4cb630784d
lossy was rounding with a bias toward opaque:
[232+, 8] -> [15, 1]
now both paths use the range:
[240+, 16] -> [15, 1]
Change-Id: I3da2063b4959b9e9f45bae09e640acc1f43470c5
With this, MODE_rgbA can safely be used without speed penalty
even in case of pure-lossy alpha-less input.
It's also an optimization when cropping a fully-opaque region from
an image with alpha: premultiply is then skipped
Change-Id: Ibee28c75744f193dacdfccd5a2e7cd1e44604db6
* green was not descaled properly
* alpha was over-dithered, making the value '0x0f' not be a fixed point
* alpha value was not restored ok.
Change-Id: Ia4a4d75bdad41257f7c07ef76a487065ac36fede
Fix the lossless decoder for the case when it has to apply other
inverse transforms before applying Color indexing inverse transform.
The main idea is to make ColorIndexingInverse virtually in-place: we
use the fact that the argb_cache is allocated to accommodate all
*unpacked* pixels of a macro-row, not just *packed* pixels.
Change-Id: I27f11f3043f863dfd753cc2580bc5b36376800c4
Added a threshold of MAX_COLORS_FOR_GRAPH for color-palettes, above
which the graph hint is ignored.
Change-Id: Ia5d7f45e52731b6eaf2806999d6be82861744fd3
spurious in this case, but addresses e.g.,
... potentially uninitialized local variable 'weighted_average' used
Change-Id: Ib99998bf49e4af7a82ee66f13fb850ca5b17dc71
Order-by-cost mostly unchanged (up to a scaling constant 1/log(2))
(except for few minor diff in < 2% of cases)
+ remove unused field cost_mode->cache_bits_
Change-Id: I714f8ab12f49a23f5d499a64c741382c9b489a3e
no speed diff observed by removing the test before calling BitWriterResize().
+ remove some unnecessary memset() in VP8LBitWriter
+ fix mixed code/variable-decl in BIG_ENDIAN mode
Change-Id: I36be61f83d10a43e4682b680c2dae0e494da4218
* ~1-4% faster
* if it's not used, don't use it
* remove the special handling of cache_bits = 0
* remove some tests in the loops
Change-Id: I19d87c3ca731052ff532ea8b2d8e89816507b75f
For low-color images, it may be better to not use color-palettes.
Users should treat this as one another hint (as with Photo &
Picture) and another parameter for tuning the compression density.
The optimum compression can still be obtained by running (outer loop)
compression with all possible tunable parameters.
Change-Id: Icb1a4face2a84774e16e801aee4a8ae97e232e8a
also relocate user_data from WebPAuxStats to the WebPPicture struct to
make clearing easier while placing it closer to the progress hook with
which it's used.
prior to this change some spurious lossless data could be reported in
the lossy (sans alpha) encoding case. additionally user_data could be
lost during lossless encoding.
Change-Id: I929fae3dfde4d445ff81bbaad51445ea586dd80b
* Extend AuxStats with new fields
it's slightly ABI-incompatible, but i guess it's ok for 0.1.99+
I expect to add more stats later, possibly (predictor stats, etc.)
* Have cwebp report the features used by lossless
compression (either for alpha or full lossless coding)
* Print the PSNR for alpha (useful in case of -alpha_q)
* clean-up alpha.c signatures
+ misc cleanup (added const '* const ptr', etc.)
Change-Id: I157a21581f1793cb0c6cc0882e7b0a2dde68a970
* commit 'v0.1.99': (39 commits)
Update ChangeLog
add extra precision about default values and behaviour
header/doc clean up
Makefile.vc: fix webpmux.exe *-dynamic builds
remove INAM, ICOP, ... chunks from the test webp file.
harmonize authors as "Name (mail@address)"
makefile.unix: provide examples/webpmux target
update NEWS
README: cosmetics
man/cwebp.1: wording, change the date
add a very crude progress report for lossless
rename 'use_argb_input' to 'use_argb'
add some padding bytes areas for later use
fixing the findings by Frederic Kayser to the bitstream spec
add missing ABI compatibility checks
Doc: container spec text tweaks
add ABI compatibility check
mux.h: remove '* const' from function parameters
encode.h: remove '* const' from function parameters
decode.h: remove '* const' from function parameters
...
Conflicts:
src/mux/muxinternal.c
Change-Id: I635d095c451742e878088464fe6232637a331511
Put emphasis on RGBA decoding instead of RGB/BGR
Clarify doc at some rough spots
Misc typo fix and cosmetics
Change-Id: Ic5fcfcc5bf4d612c5de23b0a5499f1fadde55bfe
- WEBP_MUX_INVALID_PARAMETER: was used only at one place, and that too
should actually be an assert().
- WEBP_MUX_ERROR: was never used.
Change-Id: I8883cb4dfae7a7918507501f21fced0c04dda36a
minor revision shouldn't matter, we only check major revision number.
Bumped all version numbers so that incompatibility starts *now*
Change-Id: Id06c20f03039845ae4cfb3fd121807b931d67ee4
the VP8 decode functions do not need to be public; only
GetInfo/CheckSignature need to be marked extern as they're used by
libwebpmux.
Change-Id: Id9ab4d6166b0271cf5d04563c6dac1fcc84adbdc
- Match offsets, duration, width/height for frames/tiles and enforce
some constraints.
- Note that this also means using 'int's instead of 'uint32_t's for
16-bit and 24-bit fields.
Change-Id: If0b229ad9fce296372d961104aa36731a3b1304b
could have led to some negative overflow on 32bit arch
(if it was not for the "total_size == (size_t)total_size" test)
Change-Id: I7640340b605b9c674d30dd58a1e2144707299683
the extended file format is still under development and related
libs/binaries are not fit for release
configure:
--enable/disable-libwebpmux; default is disabled.
makefile.unix:
src/mux/libwebpmux.a and examples/webpmux must be explicitly specified
Makefile.vc:
$(DIRLIB)\libwebpmux.lib and $(DIRBIN)\webpmux.exe must be explicitly
specified
Change-Id: I8246746b256010dd2a2e4de58291222d7eaf0457
The image_info_ was used only in GetImageCanvasWidthHeight(). So, now
we infer it from data there.
This removal fixes a bug: earlier, 'image_info' wasn't initialized in
the WebPMuxCreate() flow, and so the canvas width/height were being
calculated to be zero.
Also, a related refactoring: Combine CreateImageInfo() and
CreateDataFromImageInfo() into a single function CreateFrameTileData().
Change-Id: I7b0afb0d36dc6e13b9d6a1135fb027aa4e03716c
Evaluated the impact of this change over 1000 image corpus.
The compression density is up (on average) by 1.2% and encoding time has
gone down considerably from 716 ms (per file) to 146 ms (per file)
(4.9X improvement in encoding time).
Change-Id: Ida562cc0bfe18c9d6f5f00873c95f8396b480eab
Adds new methods WebPPictureARGBToYUVA() and WebPPictureYUVAToARGB()
Depending on the value of picture->use_argb_input,
the main call WebPEncode() will convert appropriately.
Note that both conversions are lossy, so it's recommended to:
* use YUVA input for lossy compression (picture->use_argb_input=0)
* use ARGB input for lossless compression (picture->use_argb_input=1)
Change-Id: I8269d607723ee8a1136b9f4999f7ff4e657bbb04
* add a real proper pointer for holding memory chunk pointer
(instead of using y and argb fields)
* polish the doc with details
* add a WebPPictureView() that extract a view from a picture
without any copy (kind of a fast-crop).
* properly snap the top-left corner for Crop/View. Previously,
the luma position was not snapped, and was off compared to the chroma.
Change-Id: I8a3620c7f5fc6f7d1f8dd89d9da167c91e237439
'Set' and 'Get' methods for images take/return a bitstream as input,
instead of separate 'image' and 'alpha' arguments.
Also,
- Make WebPDataCopy() a public API
- Use WebPData for storing data in WebPChunk.
- Fix a potential memleak.
Change-Id: I4bf5ee6b39971384cb124b5b43921c27e9aabf3e
Check for valid bounds on the 'dist' in backward reference case.
Clamp it to 1 in case of zero and negative values.
Change-Id: I78e956d4595955efa02b1f9628b475093f6ee001
Adds a test of enc->mb_h_ to VP8IteratorProgress() to avoid a division
by zero. Fixes issue #121.
Original patch from even rouault (even dot rouault at gmail dot com).
Change-Id: Ie5fcc1821860c0a9366d5aa11f3aded4f5b98ed7
This limit corresponds to default histo_bits=3 for images upto sizes 400x400.
Any image higher than this dimension will bump up the histo_bits to 4 internally.
Change-Id: Ic8ba3dcd50e9c588cbbc4a0457289086498ff4ee
Fix inequality assertion on number of palette colors.
Fix inequality assertion test in BackwardReferencesHashChainFollowChosenPath.
Change-Id: Ie3242f1bbeaf96db91b839b6732ccce2634cebf3
- Move TAG_ID to webp/mux.h
- Rename it to WebPChunkId
- Rename IDs to WEBP_CHUNK_<tag>
- Remove "name" param from ChunkInfo struct and related changes.
- Rename WebPMuxNumNamedElements to WebPMuxNumChunks().
- WebPMuxNumChunks() takes WebPChunkId as param.
Change-Id: Ic6546e4a9ab823b556cdbc600faa137076546a2b
moves the implementation to ParseHeadersInternal. this also allows
decoding to start at a VP8X sub-chunk, e.g. 'ALPH'.
Change-Id: I06791f87d90f888de32746ecb02705e4b0ff227a
Used when we don't code a Huffman Image.
-> Simplify the code quite some because we don't have to
deal with special cases of histo bits
Change-Id: I0c3f46cbf3b501e021c093e07253e7404c01ff4f
ParseVP8X was checking for presence of extra 20 bytes (after RIFF header).
This check should not be executed for non-mux (non-VP8X) images.
Change-Id: I3fc89fa098ac0a53102e7bbf6c291269817c8e47
Updated histo_bits to 3 from 4 and changed the quality threshold for inner loop for HashChainFindCopy.
Impact: 0.5%-0.8% better bpp with 15%-20% hit on encoding throughput
at default encoding settings.
Change-Id: I316ef88403148b1e19036fa0817d944eb0301255
Ensure that the lossless bit-stream doesn't allow for such cases and
safe-gaurd decoder against indefinite recursion.
Change-Id: Ia6d7f519291de8739f79a977a5800982872aae71
When importing BGRA or RGBA data for encoding, provide variants of
the WEBPImportPicture API for RGBX and BRGX data meaning the alpha
channel should be ignored.
Author: noel@chromium.org
from Chromium patch: https://chromiumcodereview.appspot.com/10496016/
Change-Id: I15fcaa4160c69a2b5549394204b6e6d7a1c5d333
VP8-lossy will now avoid writing an ALPH chunk if the
alpha values are trivial.
+ changed DumpPicture() accordingly in cwebp
+ prevented the -d option to be active with lossless
(DumpPicture wouldn't work).
Change-Id: I34fdb108a2b6207e93fa6cd00b1d2509a8e1dc4b
The new modes are
MODE_rgbA
MODE_bgrA
MODE_Argb
MODE_rgbA_4444
It's binary incompatible, since the enums changed.
While at it, i removed the now unneeded KeepAlpha methods.
-> Saved ~12k of code!
* made explicit mention that alpha_plane is persistent,
so we have access to the full alpha plane data at all time.
Incremental decoding of alpha was planned for, but not
implemented. So better not dragged this constaint for now
and make the code easier until we revisit that.
Change-Id: Idaba281a6ca819965ca062d1c23329f36d90c7ff
Change the lossless signature to 0x2f
Add 1 bit indicator for 'droppable (or trivial) alpha)'.
Add 3 bit lossless version (for future extension like yuv support).
Change the sub-resolution information to 3 bits implying range [2 .. 9]
Change-Id: Ic7b8c069240bbcd326cf5d5d4cd2dde8667851e2
Take picture and percent value storage location instead of VP8Encoder.
This will allow reuse by the lossless encoder.
Change-Id: Ic49dbc800cc3e2df60d20f4ebac277f68ed6031b
Previously, it used to assume any raw bitstream is a VP8 one.
Also,
- Factor out VP8CheckSignature() & VP8LCheckSignature().
- Use a local var for *data_ptr in ParseVP8Header() for
readability.
Change-Id: I0fa8aa177dad7865e00c8898f7e7ce76a9db19d5
frame's InitIo should not reset fancy_upsampling flag.
This flag (fancy_upsampling) is set via CustomSetup -> WebPIoInitFromOptions
and frame's InitIo() is resetting it to 0.
Change-Id: I64b54cdfba43799c0a5aa8e384575af5d6331674
CostModelBuild() and TrackBackwards() returns weren't checked
+ code clean-up
+ de-inline VP8LBackwardRefs non-critical methods
+ shuffle the .h around to group things together
+ extract some constants as #define's
+ fixed the "if (!(cc_init = ...)) {...}" constructs
+ removed some unneeded VP8L prefixes
Change-Id: Ic634cb87bc6b2033242d3e8e8731fab4c134f327
may be useful later for instance to bypass some code
if we know we don't use the Bundle+ColorMap transform.
Change-Id: I9dc70d18165b2363ad9ede763684ef3d8eba5903
we vary linearly lossless-method between 0 and 6,
and lossless-quality between 50 and 100, so that encoding
speed can go from 'quite fast' to 'rather slow'.
Impact on size is moderate, but visible.
Change-Id: I0b7917e7170eb50258afb1a4e248028cd9e9207d
- Make sure alpha flag is set in case of a lossless file with VP8X chunk.
The semantic of ALPHA_FLAG changes with this: it means the images
contain alpha (rather than ALPH chunk in particular).
- Update the mux container spec to add 1-line description of alpha
flag.
- Rename "HasLosslessImages()" to "MuxHasLosslessImages()", and other
similar function renames.
- Rename FeatureFlags to WebPFeatureFlags
- Elaborated a comment for a special case.
- A misc comment fix.
Change-Id: If212ccf4338c125b4c71c10bf281a51b3ba7ff45
Defining LOCAL_ARM_NEON = true can result in neon instructions being
used in portions unprotected by the cpu check.
This changes defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to
the SSE2 code and MSVC.
Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd
This saves ~26 bytes of headers.
* introduce new VP8LDecodeAlphaImageStream() for decoding
* use VP8LEncodeStream() for encoding
* refactor code a bit
still TODO: make the alpha-quality/enc-method user-configurable
Change-Id: I23e599bebe335cfb5868e746e076c3358ef12e71
Limit the overall number of transformations to 4 and disallow any
duplicate transform for decoding an image.
Change-Id: Ic4b0ecd553db96702e117fd073617237d95e45c0
this allows later customization of data output method.
No perf diff observed, even if ProcessRows is no longer inlined.
Change-Id: I6933a3612a9cf6c108cf2776dfde0ae80c6c07c0
+ small opportunistic fixes:
* allow NULL decoded_data to be passed to DecodeStream
and clarity (with assert()) when to do so
* AllocateAndInitRescaler() was already setting error status,
as it should. No need to do it at caller's site
Change-Id: I30867e596564a7f459a0d1ddbf6f5d312414b7fd
* RIFF header is omitted
* rename NewVP8LEncoder and DeleteVP8Encoder
* change the signature to take a "const WebPPicture*"
(it was non-const just because we were setting some error potentially)
* made the pic_ field const in VP8LEncoder too.
* trap the bitwriter::error_ too
* simplify some signatures to take WebPPicture* instead
of unneeded VP8LEncoder*
VP8LEncodeStream() will be called directly to compress alpha channel
header-less.
Change-Id: Ibceef63d2b3fbc412f0dffc38dc05c2dee6b6bbf
This is a border-case situation: the picture is not const, because
we're change its error status. But taking it non-const forces
the caller to carry a non-const picture all around the code just
in case (0.00001% of the time?) something bad happen.
This pretty much the same as making all objects non-const because
we'll eventually call delete or free() on them, which is quite a
non-const operation. Well... Better allow constness enforcement for
the remaining 99.9999% of the code.
Change-Id: I9b93892a189a50feaec1a3a518ebf488eb6ff22f
now, we only use 2 bits for the filtering method, and 2 bits
for the compression method.
There's two additional bits which are INFORMATIVE, to specify
whether the source has been pre-processed (level reduction)
during compression. This can be used at decompression time
for some post-processing (see DequantizeLevels()).
New relevant spec excerpt:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ChunkHeader('ALPH') |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Rsv| P | F | C | Alpha Bitstream... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Compression method (C): 2 bits
: The compression method used:
* `0`: No compression.
* `1`: Backward reference counts encoded with arithmetic encoder.
Filtering method (F): 2 bits
: The filtering method used:
* `0`: None.
* `1`: Horizontal filter.
* `2`: Vertical filter.
* `3`: Gradient filter.
Pre-processing (P): 2 bits
: These INFORMATIVE bits are used to signal the pre-processing that has
been performed during compression. The decoder can use this information to
e.g. dither the values or smooth the gradients prior to display.
* `0`: no pre-processing
* `1`: level reduction
Decoders are not required to use this information in any specified way.
Reserved (Rsv): 2 bits
: SHOULD be `0`.
Alpha bitstream: _Chunk Size_ - `1` bytes
: Encoded alpha bitstream.
This optional chunk contains encoded alpha data for a single tile.
Either **ALL or NONE** of the tiles must contain this chunk.
The alpha channel data is losslessly stored as raw data (when
compression method is '0') or compressed using the lossless format
(when the compression method is '1').
Change-Id: Ied8f5fb922707a953e6a2b601c69c73e552dda6b
will be called by alpha post-processing, although doing nothing for now.
Gradient smoothing would be nice-to-have here. Patch welcome!
Change-Id: I534cde866bdc75da22d0f0a6d1373c90e21366f3
* Method #1 is now calling the lossless encoder on the alpha plane.
Format is not final, it's just a first draft. We need ad-hoc functions.
* removed now useless utils/alpha.*
* added utils/quant_levels.h instead
* removed the TCoder code altogether
Change-Id: I636840b6129a43171b74860e0a0fc5bb1bcffc6a
- Separate out 'CHUNK_INDEX' from 'TAG_ID' (this is to help with the
situation where two different tags - "VP8 " and "VP8L" can have the
same TAG_ID -> IMAGE_ID).
- Some internal methods now take 'CHUNK_INDEX' param instea of 'TAG_ID'
as appropriate.
- Add kChunks[] entry for lossless.
- Rename WebPMuxImage.vp8_ --> WebPMuxImage.img_
- SetImage() and AddFrame/Tile() infer whether the bitstream is a
lossless one based on LOSSLESS_MAGIC_BYTE. The correct tag is stored
based on this.
Also, handle the case when GetVP8Info/GetVP8LInfo() fails.
Change-Id: I6b3bc9555cedb791b43f743b5a7770958864bb05
-> lot of simplifications ensue and we should be able to get rid of
ClearHuffmanTreeIfOnlyOneSymbol() too, in a subsequent patch.
Change-Id: Ic4c51d05e4b1970e37f94ffd85fae6a02e4a6422
Allocate big chunk of memory in GetHuffBitLengthsAndCodes, instead of allocating in a loop.
Also fixed the potential memleak.
Change-Id: Idc23ffa306f76100217304444191a8d2fef9c44a
The size written in VP8L header should be without padding.
(Also clarified this code using consts).
Change-Id: Ic6583d760c0f52ef61924ab0330c65c668a12fdc
When we are at end-of-stream, but haven't decoded all pixels, we should
return an error.
Also remove an obsolete TODO.
Change-Id: I3fb1646136e706da536d537a54d1fa487a890630
- Symbols added to the tree are valid inside HuffmanTreeBuildExplicit().
- In HuffmanTreeBuildImplicit(), make sure 'root_symbol' is
valid in case of a single symbol tree.
Change-Id: I7de5de71ff28f41e2d6228b29ed8dd4a20813e99
if histogram_image_size is reduced in when writing the histogram_image
the bit arrays would leak any remaining elements. store their element
count separately.
Change-Id: I710142a11ebd4325faec7bd65c2d2572aae19307
[Basically, the condition "src - dist < data" can be wrongly evaluated
to be false if "src < dist" due to underflow. Instead, "src - data <
dist" is the correct condition, as "src > data" is always true and so
there would never be an underflow].
Change-Id: Ic9f64bfe76a9acae97abc1fb7c1f4868e81f1eb8
CompareHuffmanTrees() and SetBitDepths()):
- Move 'tree_size' initialization and malloc for 'tree + tree_pool'
outside the loop.
- Some renames/tweaks for readability.
Change-Id: I5cb3cc942afac6e9f51a0b97c57ee897677a48a2
The current implementation doesn't take care one byte
signature and associated one byte padding (for odd sized chunk).
Change-Id: I35b81d0644818cdba38189aa48c75db5f92e68f4
The new method VP8LGetBackwardReferences hides internal
heuristics used for choosing RLE or LZ77 based refs.
- Tuned VP8LHashChainFindCopy for better compression at higher Q.
- Refactored code.
- Removed the unused method VP8LVerifyBackwardReferences.
Change-Id: Ibb7bb072bab5a49a001577a20d88226f52e6c663
arrays can be passed directly as only their members are being modified.
this also reduces the allocation for bit_codes[] by taking the
sizeof(type)=2 rather than sizeof(ptr)=4/8 in one case.
Change-Id: Idad20cead58c218b58d90b71699374fefd01cad9
add proper cpu-detection for Android targets
Fixes issue #118 (and is a better solution for #117).
based on patch by pepijn vaneeckhoudt
Change-Id: I6b00ea6d51ca658ccf6a3d55b87b99c01c6805be
* lossless_encoder: (46 commits)
split StoreHuffmanCode() into smaller functions
more consolidation: introduce VP8LHistogramSet
big code clean-up and refactoring and optimization
Some cosmetics in histogram.c
Approximate FastLog between value range [256, 8192]
Forgot to update out_bit_costs to symbol_bit_costs at one instance.
Evaluate output cluster's bit_costs once in HistogramRefine.
Simple Huffman code changes.
Lossless decoder: remove an unneeded param in ReadHuffmanCodeLengths().
Reducing emerging palette size from 11 to 9 bits.
Move GetHistImageSymbols to histogram.c
Improve predict vs no-predict heuristic.
code-moving and clean-up
reduce memory usage by allocating only one histo
Restrict histo_bits to ensure histo_image size is under 32MB
further simplification for the meta-Huffman coding
A quick pass of cleanup in backward reference code
Make transform bits a function of encode method (-m).
introduce -lossless option, protected by USE_LOSSLESS_ENCODER
Run TraceBackwards for higher qualities.
...
Conflicts:
src/enc/webpenc.c
Change-Id: I9a5d98cba0889ea91d10699466939cc283da345a
VP8LHistogramSet is container for pointers to histograms that
we can shuffle around. Allocation is one big chunk of memory.
Downside is that we don't de-allocate memory on-the-go during
HistogramRefine().
+ renamed HistogramRefine() into HistogramRemap(), so we don't
confuse with "HistogramCombine"
+ made VP8LHistogramClear() static.
Change-Id: Idf1a748a871c3b942cca5c8050072ccd82c7511d
* de-inline some function
* make VP8LBackwardRefs be more like a vectorwith max capacity
* add bit_cost_ field to VP8LHistogram
* general code simplifications
* remove some memmov() from HistogramRefine
* simplify HistogramDistance()
...
Change-Id: I16904d9fa2380e1cf4a3fdddf56ed1fcadfa25dc
Profiled data: Profiled few images and found that in the function VP8LFastLog,
90% of time table lookup is performed, while rest of time (10%) call to log
function is made. Typical lookup accounts for 10 CPU instructions and call to
log 200 instruction counts. The weighted average comes out to be 30
instructions per call. For mid qualities (25-75), this function (VP8LFastLog)
accounts for 30-50% of total CPU cycles (via call path: VP8LCOlorSpaceTransform
-> PredictionCostCrossColor -> ShannonEntropy). After this change, the log is
called less that 1% of time, with average instructions being 15 per call.
Measured the performance over 1000 files for various qualities and found
overall compression speedup between 10-15% (in quality range [0, 75]). The
compression density loss is around 0.5% (though at some qualities, compression
is little better as well).
Change-Id: I247bc6a8d4351819c871f19d65455dc23aea8650
Avoid bit_costs evaluated every time in function HistogramDistance. Also moved
VP8LInitBackwardRefs and VP8LClearBackwardRefs to backward_references.h
Change-Id: Id507f164d0fc64480aebc4a3ea3e6950ed377a60
No empty trees are codified with the simple Huffman code. The simple Huffman
code is simplified to be either a 1-bit code or 8-bit code for symbols.
Change-Id: I3e2813027b5a643862729339303d80197c497aff