this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
(cherry picked from commit d77bf512bd)
the 'accum' variable can be larger than 15b for large
rescale values.
Assert triggered:
src/dsp/rescaler_sse2.c:249: RescalerExportRowExpand_SSE2: Assertion `v >= 0 && v <= 255' failed.
src/dsp/rescaler_sse2.c:350: RescalerExportRowShrink_SSE2: Assertion `v >= 0 && v <= 255' failed.
-> fall back to C implementation in this case for now
Change-Id: I7ea1cb72301cafc1459be403f6a6f4e3cbc89bb1
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts that were causing cfi-icall
failures.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
BUG=chromium:827826
Change-Id: I5db021d06390a6cefd670fdd2f0d34c9e530465e
(cherry picked from commit 978eec2507)
Output is <.1% difference in size, randomly.
Speed is 30-50% faster (-m 0 -sharp_yuv).
It also gives the exact same output on ARM and x86, because floats
are no longer used.
Change-Id: Id0f0aa748cc4fc0b82bac1fc5ca954775a0a1b7c
do_copy is a loop invariant, but based on a variable parameter; it would
only be extracted if Import was inlined.
Change-Id: Id5b4a1a4a83a4f2083444da4934e4c994df65b44
for q<=98, we always enable error diffusion.
+ reduce storage 2x by using int8_t
+ make the error diffusion more robust
BUG=webp:340,308
Change-Id: I0608df839ff7b64d6843005a0f81d2577143af9e
* regarding alpha_data_ used for testing.
alpha_data_!=NULL is as close a good test as we'll get.
* regarding filter-strength / sharpness forcing
no practical use (can be done during encode cycles,
for experimentation)
* regarding a 'less-complex' filtering:
no practical use so far. Next version!
Change-Id: If2dfff5818552a7d3e7c23ac08d64fe6d270229c
For large images overflowing the partition0, we re-do a number
of passes but were forgetting to reset the block_count[].
This was leading to incorrect summary.
+ some cosmetic fixes here and there
BUG=webp:355
Change-Id: Ie87158d7f177f8efdca429b146cfcd0e81652d2f
We can't predict if the quality is going to be below the threshold
eventually, so we might as well enable it always.
Change-Id: I30aedecc8c6d4daf159f6ef152697df0206d1e93
This fixes some color smearing due to heavy quantization.
This is only enabled for q <= 30 (cf ERROR_DIFFUSION_QUALITY)
Change-Id: I07e83a4d38461357a32c9e214f7eadc6db73baa9
with WEBP_NEON_OMIT_C_CODE the default _C functions won't be set and
with WEBP_REDUCE_CSP the NEON functions won't be either triggering an
assert for an empty table member.
BUG=chromium:792627
Change-Id: I8d2d430eaa37bb92885b61a3dd39f961924a8def
RGB to YUV conversion was not using SSE to finish up the row.
End data is now copied to a buffer big enough to fit in a
SSE register.
(UPSAMPLE_LAST_BLOCK was already using that trick).
Change-Id: Ie539bcbe570a643a774aa88263503c0d2c41890f
alpha processing is still required when requesting premultiplied output
since:
1b27bf8b WEBP_REDUCE_SIZE: disable all rescaler code
Change-Id: Id1b03256c4c04b8db31527e60cd31dd20ce6f3ad
only supported ones are: RGBA/BGRA/rgbA/bgrA (decoder)
as well as: WebPPictureImportRGB/RGBX/RGBA (encoder).
(note: extras/get_disto is affected too)
Change-Id: If6c4f95054ca15759c4e289fb3b4c352b3521c2c
(cherry picked from commit 6de20df02c)
remove auto-filter (-af) support and make WebPPictureCopy,
WebPPictureIsView, WebPPictureView, WebPPictureCrop, and
WebPPictureRescale noops.
Change-Id: If39d512cc268a0015298a1138dbc94feb86575e5
with gcc-4.8, clang-4.0.1/5 this is no faster (actually up to 2x slower)
than the code generated for memset (0x01010... * dst[-1]). shuffles in
sse4 recover a bit, but performance is still down.
Change-Id: Ie85e8353f8ede559d0b05a1d388787fd18ecc80f
Rewrote WebPPictureHasTransparency() to use them (even for argb).
This is 10% faster, for some reasons.
SSE2 version should be straightforward.
Removes a TODO.
Change-Id: I7ad5848fc5e355e2df505dbcd5a0f42fb6cbab41
The WebPDemux and WebPAnimDecoder APIs are provided for the purpose of
animated webp parsing and decoding. No major changes are currently
planned for the libwebp API.
Change-Id: I2758ecda195b0c4091572d5731a0a85fa3716303
this can result in an alignment hint on arm causing a SIGBUS. casting
the input ptr to anything aside from its type is unnecessary for memcpy
and is contrary to the intent of this function.
Change-Id: I9a4d3f4be90f80cd8c3e96ccbe557e51e34cf7a5
when targeting NEON C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.
note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.
this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.
Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
* changes:
{dec,enc}_neon: harmonize function suffixes x2
upsampling_neon: harmonize function suffixes
yuv_neon: harmonize function suffixes
rescaler_neon: harmonize function suffixes
lossless_neon: harmonize function suffixes
lossless_enc_neon: harmonize function suffixes
filters_neon,cosmetics: fix indent
enc_neon: harmonize function suffixes
dec_neon: harmonize function suffixes
and all older versions.
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
extends the check add previously in:
637b3888 dsp/lossless: workaround gcc-4.9 bug on arm
BUG=webp:363
Change-Id: I1403b558f8660b764f3a570a3326822d5ef0be29
avoids over reading if the reported ANMF payload is < 8 bytes.
likely broken since:
81b8a741 Design change in ANMF and FRGM chunks:
Change-Id: I3e267bafea348a50545587dea8fafb2199c6b650
ssse3 is bit #9 in ecx, bit 1 is sse3. this only controls the check for
slow ssse3 and likely had no ill effect.
Change-Id: I84ce73dc480e1cdbd085e37be06f3f402116c201
calculate the file duration using unsigned math. this could still result
in an incorrect average duration calculation if there were multiple
rollovers. caching the duration is an option if it was desirable to
support such an extreme case.
Change-Id: I3875d94d081fec947c03a857055df6e27ff5351d
For a pixel, we look for the longest match starting in a window around it.
For the following pixel, the previous result can be used and smaller
search window is used.
Change-Id: Ice16f9a7c8754099d068380848f0d77de3f756ac
add a check for cpu-features.h and rework some of the ifdef's around
android + neon. for android builds with cpu-features enabled the
*_neon.c files will still need to be flagged correctly (with e.g.,
.c.neon in Android.mk) to properly build them.
BUG=webp:353
Change-Id: I905ce305af0a204e560b915d8665093a3edaceb9
including the type in the macro doesn't bring much benefit to ordering,
current platforms work with a prefix, this would be insufficient if the
attribute needed to follow the function prototype. this form makes it
easier to override on the command line.
BUG=webp:355
Change-Id: Iba41ec0bb319403054be0e899c4cc472dd932fd9
this check was relocated in:
b903b80c Split cost-based backward references in its own file.
quiets -Wundef
Change-Id: I7f7a4773fb8cc77ca9f671b11f50d5db2275d415
If "exact" is false, we can modify the luma samples in fully transparent
areas to facilitate lossy compression. Experiments on some PNG images
show compression improvement of more than 20%.
Change-Id: I1a728cfa920a6652bc1f600d87c01f7f648c4942
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.
Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
As backward references use the plane code when checking the cost
of a distance, statistics used to compute the cost should use it too.
This provides a small compression improvement at no speed cost.
Change-Id: Icade150929ee39ef6dc0d8b1fc85973086ecf41d
quiets undefined sanitizer warnings of the form:
left shift of 128 by 24 places cannot be represented in type 'int'
Change-Id: I8a389f2ac9238513517180f302f759425eeb7262
When re-initializing a bit writer, we could set invalid values because
the bit writer was not big enough.
Change-Id: Id25ab6712603245a5a12d5f4a86fe35a9a799a5d
This is to prepare the inclusion of <windows.h>
FrameRect => FrameRectangle
CLIP_MASK => CLIP_8b_MASK
Change-Id: Ia4b1fa4ac06137b4102c91e232206a1fb7159ce0
The patch 21735e0 introduced a bug where a goto path was not testing
the eos_ state. If this happened just before a row_sync, a SaveState()
would be called that would store the eos_ state as '1' till the end
of the loop. This usually was not a problem, except for the very last
chunk where we disable the incremental decoding altogether (we have all
the data). The termination tests were then going wrong.
The fix is to add a proper eos_ test and avoid falling in this inconsistent
state.
(21735e06f7)
BUG=webp:332
Change-Id: Ib16773aee26bfd068fbf4e9db3d2313bd978b269
Before, the color cache size was chosen optimally for LZ77 and
the same value was used for RLE. Now, we optimize its value
taking both LZ77 and RLE into account.
Unfortunately, that comes with a small CPU hit.
Change-Id: I6261f04af78cf0784bb8e8fc4b4af5f566a0e071
Between each iteration we keep track of the previously found
potential merge hence less work to do.
Change-Id: I2b6237447e79443516a6111727d96c24f10bd98a
It was a bad implementation of a Lehmer random number generator
(the saturation was done wrong and mostly & was used instead of % .....).
That lead to "for" loop stuck with the same values given a specific seed,
hence wasted "for" loops (e.g. seed getting at 374988608 and modulo of 64
later leads to 0 even when updating the seed with the old formula).
As the "for" loops now always return a proper pair of histograms, their
number can greatly be reduced, hence a speedup.
Change-Id: I9f5b44d66cc96fd4824189d92276c3756c8ead5b
This code is ultra-critical for lossless decoding, especially on ARM.
The extra call VP8LIsEndOfStream() was causing unnecessary slow-down.
Now, we check for bitstream-end separately in the main loop.
Change-Id: I739b5d74cc29578e2b712ba99b544fd995ef0e0d
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine this conditionals.
BUG=webp:261
Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
Previously, the stochastic method for histogram
combination could finish in a greedy way
if the number of iterations to perform so was smaller.
Except that another greedy combination was performed
afterwards ... hence wasted CPU in some cases.
Change-Id: Ic0f26873e6dc746679486b91cb35d73efee91931
The initial re-writing of this part of the code with intervals
had to be done with a complex logic (mostly intervals with a
lower and upper bound, not a constant value like now) to properly
deal with the inefficiencies of the then LZ77 algorithm.
The improvements made to LZ77 since, now allow for a simpler logic.
There were also small errors in the interval insertion logic
that lead to small inefficiencies (hence a slightly better
compression rate).
Change-Id: If079a0cafaae7be8e3f253485d9015a7177cf973
Documentation says: "if kmin == 0, then key-frame insertion is disabled;
and if kmax == 0, then all frames will be key-frames."
Reading this, you'd expect that if kmax == 0, then with any kmin <= 0
all frames will be key-frames. But actually the kmin <= 0 test is caught
first and you get the opposite (no keyframes but the first). You'd have
instead to set kmax == 0 and any value kmin > 0, which is absolutely
counter-intuitive (reversing order).
Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the
`kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`).
So it should be considered an exception too.
Instead I propose this new logic:
- kmax == 1 means that all frames are keyframes (you are explicitly
requesting a keyframe every 1 frame at most, i.e. all frames).
- kmax == 0 means no keyframes (you ask for a keyframe every 0 frames,
i.e. never).
This is more "logical" language-wise, and also does not involve any
conflicts about what if both kmax and kmin are 0, since now a single
property value is meaningful for the 2 exceptional cases.
Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd
(cherry picked from commit 2dc0bdcaee)
Compile with XCode, it appears quite slower than the C-version,
especially for arm64.
Change-Id: Ic46dba184a36be454fef674129d2f909003788fc
(cherry picked from commit 4f3e3bbd44)
Documentation says: "if kmin == 0, then key-frame insertion is disabled;
and if kmax == 0, then all frames will be key-frames."
Reading this, you'd expect that if kmax == 0, then with any kmin <= 0
all frames will be key-frames. But actually the kmin <= 0 test is caught
first and you get the opposite (no keyframes but the first). You'd have
instead to set kmax == 0 and any value kmin > 0, which is absolutely
counter-intuitive (reversing order).
Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the
`kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`).
So it should be considered an exception too.
Instead I propose this new logic:
- kmax == 1 means that all frames are keyframes (you are explicitly
requesting a keyframe every 1 frame at most, i.e. all frames).
- kmax == 0 means no keyframes (you ask for a keyframe every 0 frames,
i.e. never).
This is more "logical" language-wise, and also does not involve any
conflicts about what if both kmax and kmin are 0, since now a single
property value is meaningful for the 2 exceptional cases.
Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
We can switch at run-time between the standard GetCoeffs() critical
function, that uses a fast variant of VP8GetBit().
However, some platforms have slow instructions that make standard
VP8GetBit() slow. GetCoeffs() is the right level of branching to
switch to GetCoeffsAlt() that avoids these slow instructions in some
not-frequent cases.
Next patch will upgrade VP8GetBit() to use clz, after this one
is proved to be neutral speed-wise.
Change-Id: Ia6cef5de9de6131574d2202bbc0bea8559c9b693
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
This makes the structure more generic, without the hard-coded
internal structure.
This is a borderline incompatible ABI change, even if WebPIDecoder structure
is opaque.
Change-Id: I518765c3f76fc17a136cef045a5a8aa70ed70e85
30% faster on x86, 5% faster on N5.
New generic function: WebPLog2FloorC()
This function is called as fallback for BitsLog2Floor() when there's
no clz() available.
Change-Id: Ica15c6092112e514c0e200fab89c434de48d4b19
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.
Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731
Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
The previous optimization was performing dichotomy on a function that
is anything in practice, hence a bit of randomness.
Also, two magic constants were used, one for an extra constant cost,
one for an extra linear cost. Both values/models were empirical.
A brute force search for the best cache size is now performed.
To have less CPU impact, a speed optimization is also made by not
inserting a value again and again.
This makes sense but it's also the most common case of when LZ77 is
useful hence an overall improvement sometimes.
Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8
- 12/13/2016: version 0.5.2
This is a binary compatible release.
This release covers CVE-2016-8888 and CVE-2016-9085.
* further security related hardening in the tools; fixes to
gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
#312)
* full libwebp (encoder & decoder) iOS framework; libwebpdecoder
WebP.framework renamed to WebPDecoder.framework (issue #307)
* CMake support for Android Studio (2.2)
* miscellaneous build related fixes (issue #306, #313)
* miscellaneous documentation improvements (issue #225)
* minor lossy encoder fixes and improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYWfopAAoJEPnD1r24Iytd0gAQALhTSEjJVmKfHxyPNDduc3kn
QeiVaVwPiOS/a266+ZnWHzCvkR3zgqZxNlyKzRty378gM8/P7r2dMCmfdnVFbF4O
a7M1lld9yYldNpAxvHDnY9u2RzmRfVD1yYu27gv77uT7gR2IybQ81FHi1pn56tFA
2g4yHdrC2tXud22ZUb9Bgqe7YW06gWND4EmeJgxF38S98gdrtJla5rmlUcuEhbIl
SHpkbEgJX4nZxWggyCJ61/OxeEwwWBtI3kpSLkEqmCVSnFb7WBC7pITq59n8hg2U
SaYCfWGRJ/oQQvxUxuPYYtzq26dYOxd2vT9S1mcE1be9jMGxKp9vgE8jNflvtza1
wTPUajaPUjsTLAvFikQRo+34W9QxOKp9jCX9Be0V4wvBClfM13toBgKolzPGGUuo
zlcZ0/GgzwfQ+sD7bs/p/7ToiH+GejBUK7FUR8ZB7EHZrDynszSzEevx5SUzPWV3
1q4TyD5eclUOjb4S2yplcKp0kwkwtOA5ETboPzA+b8TQnfTFM3GP7fMoYvORbSZp
39/H5hi1bjlOE4m3mp3qqfR2DMWZlla7YNZiuuTEeY3ztrlqeakC2ma1Fhi6ZmbG
TrqmAaDTueRizry4E7Fr9sBw0mee14v/xcTFcDcSI1BRFclFc1KAw0ObzdaN2iEt
L5tjlqzH0XEH4fl5OnD3
=x+Y3
-----END PGP SIGNATURE-----
Merge tag 'v0.5.2'
libwebp-0.5.2
- 12/13/2016: version 0.5.2
This is a binary compatible release.
This release covers CVE-2016-8888 and CVE-2016-9085.
* further security related hardening in the tools; fixes to
gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
#312)
* full libwebp (encoder & decoder) iOS framework; libwebpdecoder
WebP.framework renamed to WebPDecoder.framework (issue #307)
* CMake support for Android Studio (2.2)
* miscellaneous build related fixes (issue #306, #313)
* miscellaneous documentation improvements (issue #225)
* minor lossy encoder fixes and improvements
* tag 'v0.5.2': (54 commits)
update ChangeLog
anim_util: quiet implicit conv warnings in 32-bit
jpegdec: correct ContextFill signature
Remove some errors when compiling the code as C++.
vwebp: clear canvas during resize w/o animation
tiffdec: restore libtiff 3.9.x compatibility
update NEWS
AnimEncoder: avoid freeing uninitialized memory pointer.
WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
fix a potential overflow with MALLOC_LIMIT
bump version to 0.5.2
update AUTHORS & .mailmap
iosbuild.sh: add WebPDecoder.framework + encoder
AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Fix assertions in WebPRescalerExportRow()
fix a typo in WebPPictureYUVAToARGB's doc
systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
doc: use two's complement explicitly for uint8->int8 conversion
Anim_encoder: correctly handle enc->prev_candidate_undecided_
WebPPictureDistortion(): free() -> WebPSafeFree()
...
Change-Id: I16bcf54af41ce8fad98d4fbc8aa1df58f338fc23