and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
The previous optimization was performing dichotomy on a function that
is anything in practice, hence a bit of randomness.
Also, two magic constants were used, one for an extra constant cost,
one for an extra linear cost. Both values/models were empirical.
A brute force search for the best cache size is now performed.
To have less CPU impact, a speed optimization is also made by not
inserting a value again and again.
This makes sense but it's also the most common case of when LZ77 is
useful hence an overall improvement sometimes.
Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8
- 12/13/2016: version 0.5.2
This is a binary compatible release.
This release covers CVE-2016-8888 and CVE-2016-9085.
* further security related hardening in the tools; fixes to
gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
#312)
* full libwebp (encoder & decoder) iOS framework; libwebpdecoder
WebP.framework renamed to WebPDecoder.framework (issue #307)
* CMake support for Android Studio (2.2)
* miscellaneous build related fixes (issue #306, #313)
* miscellaneous documentation improvements (issue #225)
* minor lossy encoder fixes and improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYWfopAAoJEPnD1r24Iytd0gAQALhTSEjJVmKfHxyPNDduc3kn
QeiVaVwPiOS/a266+ZnWHzCvkR3zgqZxNlyKzRty378gM8/P7r2dMCmfdnVFbF4O
a7M1lld9yYldNpAxvHDnY9u2RzmRfVD1yYu27gv77uT7gR2IybQ81FHi1pn56tFA
2g4yHdrC2tXud22ZUb9Bgqe7YW06gWND4EmeJgxF38S98gdrtJla5rmlUcuEhbIl
SHpkbEgJX4nZxWggyCJ61/OxeEwwWBtI3kpSLkEqmCVSnFb7WBC7pITq59n8hg2U
SaYCfWGRJ/oQQvxUxuPYYtzq26dYOxd2vT9S1mcE1be9jMGxKp9vgE8jNflvtza1
wTPUajaPUjsTLAvFikQRo+34W9QxOKp9jCX9Be0V4wvBClfM13toBgKolzPGGUuo
zlcZ0/GgzwfQ+sD7bs/p/7ToiH+GejBUK7FUR8ZB7EHZrDynszSzEevx5SUzPWV3
1q4TyD5eclUOjb4S2yplcKp0kwkwtOA5ETboPzA+b8TQnfTFM3GP7fMoYvORbSZp
39/H5hi1bjlOE4m3mp3qqfR2DMWZlla7YNZiuuTEeY3ztrlqeakC2ma1Fhi6ZmbG
TrqmAaDTueRizry4E7Fr9sBw0mee14v/xcTFcDcSI1BRFclFc1KAw0ObzdaN2iEt
L5tjlqzH0XEH4fl5OnD3
=x+Y3
-----END PGP SIGNATURE-----
Merge tag 'v0.5.2'
libwebp-0.5.2
- 12/13/2016: version 0.5.2
This is a binary compatible release.
This release covers CVE-2016-8888 and CVE-2016-9085.
* further security related hardening in the tools; fixes to
gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
#312)
* full libwebp (encoder & decoder) iOS framework; libwebpdecoder
WebP.framework renamed to WebPDecoder.framework (issue #307)
* CMake support for Android Studio (2.2)
* miscellaneous build related fixes (issue #306, #313)
* miscellaneous documentation improvements (issue #225)
* minor lossy encoder fixes and improvements
* tag 'v0.5.2': (54 commits)
update ChangeLog
anim_util: quiet implicit conv warnings in 32-bit
jpegdec: correct ContextFill signature
Remove some errors when compiling the code as C++.
vwebp: clear canvas during resize w/o animation
tiffdec: restore libtiff 3.9.x compatibility
update NEWS
AnimEncoder: avoid freeing uninitialized memory pointer.
WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
fix a potential overflow with MALLOC_LIMIT
bump version to 0.5.2
update AUTHORS & .mailmap
iosbuild.sh: add WebPDecoder.framework + encoder
AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Fix assertions in WebPRescalerExportRow()
fix a typo in WebPPictureYUVAToARGB's doc
systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
doc: use two's complement explicitly for uint8->int8 conversion
Anim_encoder: correctly handle enc->prev_candidate_undecided_
WebPPictureDistortion(): free() -> WebPSafeFree()
...
Change-Id: I16bcf54af41ce8fad98d4fbc8aa1df58f338fc23
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().
BUG=webp:322
Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
(cherry picked from commit 3ebe1c0003)
This improves compression by ~5% at default quality.
If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.
Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
(cherry picked from commit 3f4042b52a)
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().
BUG=webp:322
Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...
Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
For some reason, gcc has hard time inlining this one...
Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
Set enc->prev_candidate_undecided_ as 0 when a frame is not chosen
as a possible keyframe, so that the dispose method can be
dispose-to-background.
Change-Id: If2899f5dbc06fb53705fb8240072ab6440a6de12
(cherry picked from commit 29fedbf58b)
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)
We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.
Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
(cherry picked from commit 0a3838ca77)
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).
Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.
Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
(cherry picked from commit e4cd4daf74)
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.
Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
(cherry picked from commit 1c36440094)
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.
Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
(cherry picked from commit 21e7537abe)
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.
Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
(cherry picked from commit 1377ac2ec1)
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.
Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.
Also: suffix explicitly the functions with _SSE2
Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).
Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
Benchmarks from vrabaud@:
8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged
skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged
images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged
images/lpcb corpus speed: unchanged, corpus size: unchanged
images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged
images/png_doc corpus speed: unchanged, corpus size: unchanged
images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged
images/jpeg_small corpus speed: unchanged, corpus size: unchanged
images/icip_core1 corpus speed: unchanged, corpus size: unchanged
images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged
images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged
images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged
images/gif corpus speed: faster: -1.4 % , corpus size: unchanged
images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged
images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged
images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged
images/all corpus speed: faster: -1.5 % , corpus size: unchanged
images/png_small corpus speed: unchanged, corpus size: unchanged
images/png corpus speed: unchanged, corpus size: unchanged
images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged
Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
avoiding triplets of data should make it easier to write SSE2 versions.
FilterRow() can now filter all input in one single pass
-> conversion is 15-20% faster (but still overall slow compared to -pre 0)
Change-Id: I14c3215e672fdecde7ec80394e814bdc7445019f
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)
We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.
Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I6ad9f93b6c4b665c559bff87716a7b847f66a20d
(cherry picked from commit 342e15f0ce)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I2881bec2884b550c966108beeff1bf0d8ef9f76b
(cherry picked from commit 1147ab4ee7)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I693cbb295df9cf94aa89294b19c0496bdbe84d18
(cherry picked from commit de9fa5074e)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I3d7b689be8d5751248a82d1021243d80d3f67203
(cherry picked from commit deb1b83199)
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).
Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.
Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
Roughly, if both the source and the reference areas are
darker too dark (R/G/B <= ~6), they are ignored.
One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.
Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
Make WebPPictureDistortion() only compute distortion on A/R/G/B planes, not Y/U/V(A).
(not just for SSIM, but PSNR too).
This is to avoid problems with using SSIM on U/V channels.
If Y/U/V distortion is needed, one can always use WebPPlaneDistortion() individually.
Change-Id: If8bc9c3ac12a8d2220f03224694fc389b16b7da9
When compiling as experimental, WEBP_EXPERIMENTAL_FEATURES
would not be defined because the header defining it would
not be included.
Hence runtime errors in debug mode when running:
./cwebp -lossles whatever
...
Error! Cannot encode picture as WebP
Error code: 4 (INVALID_CONFIGURATION: configuration is invalid)
(detail: WebPConfig would have a random value set for
delta_palettization as config.c does not consider
it to exist.)
Change-Id: I41761cffe81a971130ed514b195a73d1c6dac1b7
* prevent 64bit overflow by controlling the 32b->64b conversions
and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasis the dark
areas
* use a hat-like filter instead of box-filtering to avoid blockiness
during averaging
SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C-version is quite slower because still un-optimized.
Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90