Fix another pessimization found by the pingo image compressor.
Refactoring is necessary to make LZ77 computation
common to cache or no-cache analysis.
Slower by 1.7x instead of 2x
Change-Id: I396701ea6e88543dbfe9471eb552877f6c8ce1e3
qmin / qmax are now using the pad[] spot at the end of the struct,
and we don't need to bump the ABI major number.
Change-Id: I41adcaf1600b29a5a05c9fe380bfd977cf425124
this was not giving a good alpha value, making the method 5/6 a little
blurrier than method 4 (!).
Change-Id: I69b9890dea21499c1af1753e87d9f7adf8b433de
This is particularly useful for multi-pass search (but not only),
to prevent the search from going over or below a reasonable threshold.
E.g.: 'cwebp -qrange 50 80 ...' will prevent any unreasonable degradation.
new cwebp option: -qrange min max
Change-Id: I59f394533535fc20b6996bc0895f4301476d5eff
PredictorSub0_SSE2 doesn't use 'upper' (neither does
VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing
pixels to avoid undefined behavior when offsetting a NULL pointer
BUG=chromium:1026858,oss-fuzz:19430
Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3
some toolchains may implement vcreate_u64 as an assignment to a vector
causing a type mismatch:
invalid conversion between vector type 'uint64x1_t' (vector of 1
'uint64_t' value) and integer type 'unsigned int' of different size
const uint64x1_t LKJI____ = vcreate_u64(L | (K << 8) | (J << 16) | (I << 24));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: I5c7b0076ad66d4b3fcdcb7ee9f59bbaa6f19b783
The workaround for GCC ARM must not be applied when another toolchain
(like MSVC) is used for the build.
Change-Id: I11ec4558902063ccb085d3f435e24b3a60739dd5
'upper' could be NULL and it would be increased.
But that is for predictor zero that does not use 'upper'.
Change-Id: Icd4ae6792cc55ea021b4f828c3dbdb5f03e120d8
Since people seem to write "2 ^ X" hoping that it means "1 << X", clang
recently added a warning for this pattern.
It incorrectly fires on this file. To suppress it, restructure the code
to be less clever. (Alternatively we could use "xor" instead of "^" or
write "0x2" instead of "2" but both seem worse.)
No intended behavior change.
Bug: chromium:995200
Change-Id: I64744345be5f5a8cd1f4aaeaf0982da239b378a7
sometimes, the last rows of the alpha plane contain more than NUM_ARGB_CACHE_ROWS
rows to process. But ExtractAlphaRows() was repeatedly calling ApplyInverseTransforms()
without updating the dec->last_row_ field, which is the starting row used as starting
point.
Fix would consist of either updating correctly dec->last_row_ before calling
ApplyInverseTransforms(). Or pass the starting row explicitly, which is simpler.
BUG=webp:439
Change-Id: Id99f2c28662d02b2b866cb79e666050be9d59e04
For some exact resonance the over-quantization was exactly
compensating the under-quantization, leading to resonance
and strange patterns.
-> we special-handle the very flat blocks, hopefully for the
greater good (and not just the bad-resonance case).
For 'fast mode' (-m 3 or less), we just pay special attention
to the border of the image, where the oscillation / instability
usually starts. For the inner part of the image, since we're not
doing rd-opt, it's harder to fix anything.
Overall, on 'regular' images, the change is written the noise,
often leading to overall faster encoding (because of the short-cut).
BUG=webp:432
Change-Id: Ifaa8286499add80fd77daecf8e347abbff7c3a15
missed in a788b49
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)
Change-Id: I52dcd9cd613107f5424177c277785b92430bffb7
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)
Change-Id: I7f08a836ddcf777454dfd5b877a81b62b2abac86
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -12 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 244 (8-bit,
unsigned)
Change-Id: I053c92301e55dcb0cae89a7733636283da942176
no change in object code
from clang-7 integer sanitizer:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
1955895199 (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char')
changed the value to 159 (8-bit, unsigned)
Change-Id: I0c3022339e34b9c9af03167ab827ade677973644
_mm_set1_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 65280 (32-bit, signed) to
type 'short' changed the value to -256 (16-bit, signed)
Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 189 (32-bit, signed) to
type 'char' changed the value to -67 (8-bit, signed)
implicit conversion from type 'int' of value 128 (32-bit, signed) to
type 'char' changed the value to -128 (8-bit, signed)
implicit conversion from type 'int' of value 33909 (32-bit, signed) to
type 'short' changed the value to -31627 (16-bit, signed)
Change-Id: Id6b191b2c06881e27d447eeb1ff5bb2c1857b6ba
holding the associated mutex while signaling a condition variable isn't
necessary and in some implementations will reduce performance as the
woken thread may test the mutex, fail and go back to sleep.
Change-Id: Id685a47b0c76fc4a1c5acedcb6623e8c55056415
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 255 (32-bit, signed) to
type 'char' changed the value to -1 (8-bit, signed)
implicit conversion from type 'int' of value 33153 (32-bit, signed) to
type 'short' changed the value to -32383 (16-bit, signed)
Change-Id: Ic88c8ef3d00146d34f53a560582db673f818370d
no change in object code
from clang-7 -fsanitize=implicit-integer-truncation
implicit conversion from type 'int' of value -16 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 240 (8-bit,
unsigned)
Change-Id: Ia7cbaad247ab22b505b7f98b1247219c024f6db0
no change in object code
from clang-7 -fsanitize=implicit-integer-truncation
implicit conversion from type 'int32_t' (aka 'int') of value 287
(32-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the
value to 31 (8-bit, unsigned)
Change-Id: I692368bcc2f41412697b8ae51e53078831072891
no change in object code
from clang-7 -fsanitize=implicit-integer-truncation
implicit conversion from type 'int' of value 39736 (32-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the value to 56 (8-bit, unsigned)
Change-Id: I0ecf24c5b1b11e056c58b3b85ea529c30cdadf57
"implicit conversion from type 'uint32_t' (aka 'unsigned int') of value xxxxx (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char') changed the value to xx (8-bit, unsigned)"
and same with signed -> unsigned conversion with truncation.
Change-Id: I50cae41a9ce7edcfcc814cc3ee2556b927064f43
We saturate the result to [0..255]
It's the easiest and safest, given the wide variety of scaling
range we cover: we're not using floats, so precision is always
an issue at one end or the other of the scaling spectrum.
we also use:
round(a - floor(b))
instead of:
floor(a - round(b))
to handle difficult cases (ratio ~= .99, e.g.)
MIPS code is still disabled (and wrong)
Change-Id: I18d3f5ddc4c524879c257b928329b1c648fa7fb5
previously if the mappings allocation failed histo_queue->queue would be
uninitialized; split the conditionals
Change-Id: I1b50b987e734393893dc8a83a3f314522ccd0c83
note that config.exact defaults to 0 and point users to WebPEncode() if
the default isn't acceptable.
BUG=webp:424
Change-Id: I179c34649834aeadc1606d0856f33e8255048ea1
This is to prevent resizing to dimension 0
+ added some safety checks about src_width > 0 and src_height > 0
BUG=webp:418
Change-Id: Ic04a53ad26455d80538bc8681882a554fca2a340
Taking the comments from the internal ParseHeadersInternal.
(which is called by GetFeatures).
BUG=webp:411
Change-Id: I9999b4a183805e2db1456610a30024a0d8be4d00
It's safer to clip the passed param instead of doing 32b arithmetic
and clipping afterward.
Output is unchanged, but code no longer rely on UB.
Change-Id: Ia5b4de6e8863981753f1d17f062965a6a5da5bed
dst->cur_ was not set.
The bug occurred only with several VP8LBitWriter instances
(thread_level > 0) and in 32-bit (in 64-bit, src->cur_ was
always 0 in VP8LBitWriterClone()).
BUG=chromium:917029
Change-Id: I0d94a3d8e62b247fd616eebe1009868dc8a5ed2e
Move IsFlat to its own header. This allows it to continue to be
inlined. Using the RTCD and creating a distinct function slows down arm
builds.
flower mug
C 3.59 2.12
NEON 3.47 2.01
BUG=b/118740850
Change-Id: Id77e8f76d9e9790c498806e7070bbe37c10bc2e9
thresh is defined by FLATNESS_LIMIT_* which ranges from 2-10.
score_t is int64 which is a touch overkill.
Change-Id: I308bd440bf11643665d3642fe361495a257b6e52
Direct copy of sse2. Slight improvement because neon has
abs().
flower.ppm had minimal improvement. Somewhat expected because
GetResidualCost_C is only ~3.6%
mug.ppm had a better improvement because GetResidualCost_C is
almost 9%.
C 2.150
NEON 2.130
BUG=b/118740850
Change-Id: Ibc0dd97a81596635f5599cf568205974b4fd2597
Much faster with aarch64. Still somewhat faster without vmaxv.
C: 3.700s
ArmV7: 3.675
aarch64: 3.600
BUG=b/118740850
Change-Id: I3be852da89633eca4bddce443c87f5e4a2f55868
The old code simply did not make sense.
The effect is that the pair would be popped from the
queue no matter what; as the queue is small, it does
not matter that much on the results.
But it will matter for a later CL.
Change-Id: If50c9fa9d7f3ac3c48bb7336d81479287d4944c4
(cherry picked from commit 485ff86fbb)
The old code simply did not make sense.
The effect is that the pair would be popped from the
queue no matter what; as the queue is small, it does
not matter that much on the results.
But it will matter for a later CL.
Change-Id: If50c9fa9d7f3ac3c48bb7336d81479287d4944c4
Also, histograms in a HistogramSet can be initialized all
at once.
Change-Id: Ibbfa6034dce58dca8bb9113487e2ae507222ce7d
(cherry picked from commit 6752904b2f)
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.
Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
(cherry picked from commit decf6f6b87)
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.
Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
We should be using 'floor' when doing the final divide.
-> new MACRO is MULT_FIX_FLOOR()
XXX*** Mips code is DISABLED for now ***XXX
I'll update and re-enable it in a later
patch, since this code needs some refactoring first.
BUG=oss-fuzz:9179
Change-Id: Ic0693cdca4e71f5beab1029475e35c4d06b12d13
* Assert chunklist
* fix potential memory leak and
* fix null pointer access
There should not be several alpha_ or img_ chunks in SynthesizeBitstream. Use ChunkListDelete in MuxImageRelease to be safe.
A null pointer accessed in WebPMuxPushFrame triggered a harmless runtime error.
Change-Id: I3027f8752093652bd41f55e667d041c0de77ab6e
The chunk list only has two operations: append and set
to one element. The two operations are split and the append
one is sped up by storing the last element.
Corrupted data could make a very long list to search through.
BUG=oss-fuzz:9190
Change-Id: I1aa813ca629df29efaa3b46dbd4c4c42dbeaa34c
The standard allows for Huffman images with any coefficients.
Hence potentially big memory allocations. The previous workaround
was "trying" things out, the new one is more rigorous and
only allocates what is needed, modifying the Huffman image
to contain the minimal set of coefficients.
BUG=oss-fuzz:8623,oss-fuzz:9111,oss-fuzz:9134
Change-Id: I6a972e90e4ae509c15cb41ee22c58b775fa3f4aa
idec_dec.c, DecodeRemaining: Set decoder state to ERROR to prevent VP8ExitCritical to be called again
Change-Id: Id5f893f45c348e1c529680d930e640f780a73d4c
treat an ANMF chunk containing multiple VP8/VP8L file as malformed.
fixes a WebPMuxImage::img_ leak.
Though the invalid free in #9106 was avoided in (ubsan):
be738c6d muxread,ChunkVerifyAndAssign: validate chunk_size
that file would still cause a leak similar to #9099.
BUG=oss-fuzz:9099,oss-fuzz:9106
Change-Id: Ib873446a1188afeeb2fe5d53a86b75e0c5de9573
(we also limit radius based on height too, for good measure, although it's not an asan bug)
fixes oss-fuzz issue #9105
Change-Id: Ie0d79dd81480dc4e2b653b7e992e5cdcd3dfa834
before accounting for padding which might overflow if chunk_size is >
MAX_CHUNK_PAYLOAD.
BUG=webp:387,webp:388
Change-Id: I3985b8817ed4faaec0629102c5333c228a0e9c98
previously when adjusting size down based on a smaller riff_size the
checks were insufficient to prevent 'size -= RIFF_HEADER_SIZE' from
rolling over causing ChunkVerifyAndAssign to over read. the new checks
are imported from demux.c.
BUG=webp:386
Change-Id: If863c4a9892977b9ade7dd894392a0ecae13775c
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
(cherry picked from commit d77bf512bd)
the 'accum' variable can be larger than 15b for large
rescale values.
Assert triggered:
src/dsp/rescaler_sse2.c:249: RescalerExportRowExpand_SSE2: Assertion `v >= 0 && v <= 255' failed.
src/dsp/rescaler_sse2.c:350: RescalerExportRowShrink_SSE2: Assertion `v >= 0 && v <= 255' failed.
-> fall back to C implementation in this case for now
Change-Id: I7ea1cb72301cafc1459be403f6a6f4e3cbc89bb1
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts that were causing cfi-icall
failures.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
BUG=chromium:827826
Change-Id: I5db021d06390a6cefd670fdd2f0d34c9e530465e
(cherry picked from commit 978eec2507)
Output is <.1% difference in size, randomly.
Speed is 30-50% faster (-m 0 -sharp_yuv).
It also gives the exact same output on ARM and x86, because floats
are no longer used.
Change-Id: Id0f0aa748cc4fc0b82bac1fc5ca954775a0a1b7c
do_copy is a loop invariant, but based on a variable parameter; it would
only be extracted if Import was inlined.
Change-Id: Id5b4a1a4a83a4f2083444da4934e4c994df65b44
for q<=98, we always enable error diffusion.
+ reduce storage 2x by using int8_t
+ make the error diffusion more robust
BUG=webp:340,308
Change-Id: I0608df839ff7b64d6843005a0f81d2577143af9e
* regarding alpha_data_ used for testing.
alpha_data_!=NULL is as close a good test as we'll get.
* regarding filter-strength / sharpness forcing
no practical use (can be done during encode cycles,
for experimentation)
* regarding a 'less-complex' filtering:
no practical use so far. Next version!
Change-Id: If2dfff5818552a7d3e7c23ac08d64fe6d270229c
For large images overflowing the partition0, we re-do a number
of passes but were forgetting to reset the block_count[].
This was leading to incorrect summary.
+ some cosmetic fixes here and there
BUG=webp:355
Change-Id: Ie87158d7f177f8efdca429b146cfcd0e81652d2f
We can't predict if the quality is going to be below the threshold
eventually, so we might as well enable it always.
Change-Id: I30aedecc8c6d4daf159f6ef152697df0206d1e93
This fixes some color smearing due to heavy quantization.
This is only enabled for q <= 30 (cf ERROR_DIFFUSION_QUALITY)
Change-Id: I07e83a4d38461357a32c9e214f7eadc6db73baa9
with WEBP_NEON_OMIT_C_CODE the default _C functions won't be set and
with WEBP_REDUCE_CSP the NEON functions won't be either triggering an
assert for an empty table member.
BUG=chromium:792627
Change-Id: I8d2d430eaa37bb92885b61a3dd39f961924a8def
RGB to YUV conversion was not using SSE to finish up the row.
End data is now copied to a buffer big enough to fit in a
SSE register.
(UPSAMPLE_LAST_BLOCK was already using that trick).
Change-Id: Ie539bcbe570a643a774aa88263503c0d2c41890f
alpha processing is still required when requesting premultiplied output
since:
1b27bf8b WEBP_REDUCE_SIZE: disable all rescaler code
Change-Id: Id1b03256c4c04b8db31527e60cd31dd20ce6f3ad
only supported ones are: RGBA/BGRA/rgbA/bgrA (decoder)
as well as: WebPPictureImportRGB/RGBX/RGBA (encoder).
(note: extras/get_disto is affected too)
Change-Id: If6c4f95054ca15759c4e289fb3b4c352b3521c2c
(cherry picked from commit 6de20df02c)
remove auto-filter (-af) support and make WebPPictureCopy,
WebPPictureIsView, WebPPictureView, WebPPictureCrop, and
WebPPictureRescale noops.
Change-Id: If39d512cc268a0015298a1138dbc94feb86575e5
with gcc-4.8, clang-4.0.1/5 this is no faster (actually up to 2x slower)
than the code generated for memset (0x01010... * dst[-1]). shuffles in
sse4 recover a bit, but performance is still down.
Change-Id: Ie85e8353f8ede559d0b05a1d388787fd18ecc80f
Rewrote WebPPictureHasTransparency() to use them (even for argb).
This is 10% faster, for some reasons.
SSE2 version should be straightforward.
Removes a TODO.
Change-Id: I7ad5848fc5e355e2df505dbcd5a0f42fb6cbab41
The WebPDemux and WebPAnimDecoder APIs are provided for the purpose of
animated webp parsing and decoding. No major changes are currently
planned for the libwebp API.
Change-Id: I2758ecda195b0c4091572d5731a0a85fa3716303
this can result in an alignment hint on arm causing a SIGBUS. casting
the input ptr to anything aside from its type is unnecessary for memcpy
and is contrary to the intent of this function.
Change-Id: I9a4d3f4be90f80cd8c3e96ccbe557e51e34cf7a5
when targeting NEON C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.
note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.
this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.
Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
* changes:
{dec,enc}_neon: harmonize function suffixes x2
upsampling_neon: harmonize function suffixes
yuv_neon: harmonize function suffixes
rescaler_neon: harmonize function suffixes
lossless_neon: harmonize function suffixes
lossless_enc_neon: harmonize function suffixes
filters_neon,cosmetics: fix indent
enc_neon: harmonize function suffixes
dec_neon: harmonize function suffixes
and all older versions.
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
extends the check add previously in:
637b3888 dsp/lossless: workaround gcc-4.9 bug on arm
BUG=webp:363
Change-Id: I1403b558f8660b764f3a570a3326822d5ef0be29
avoids over reading if the reported ANMF payload is < 8 bytes.
likely broken since:
81b8a741 Design change in ANMF and FRGM chunks:
Change-Id: I3e267bafea348a50545587dea8fafb2199c6b650
ssse3 is bit #9 in ecx, bit 1 is sse3. this only controls the check for
slow ssse3 and likely had no ill effect.
Change-Id: I84ce73dc480e1cdbd085e37be06f3f402116c201