When `WEBP_USE_THREAD` is not defined the assignments of *_SSE and their
unsuffixed counterparts may race. Assigning *_SSE directly rather than
relying on the unsuffixed values avoids a case where the *_SSE variants
may refer to the calling function (i.e., AVX2) resulting in infinite
recursion.
Defining `WEBP_USE_THREAD` is recommended when decode/encode calls can
be made from different threads.
Bug: 435213378
Change-Id: Id5549730cb72be99b3014ed8e4e355f3ea988659
(Debian clang-format version 19.1.7 (3+build4)) with `--style=Google`.
Manual changes:
* clang-format disabled around macros with stringification (mostly
assembly)
* some inline assembly strings were adjusted to avoid awkward line
breaks
* trailing commas, `//` or suffixes (`ull`) added to help array
formatting
* thread_utils.c: parameter comments were changed to the more common
/*...=*/ style to improve formatting
The automatically generated code under swig/ was skipped.
Bug: 433996651
Change-Id: Iea3f24160d78d2a2653971cdf13fa932e47ff1b3
- move HistogramAdd to histogram_enc.cc: it is too high level
- homogenize the argument naming (e.g. h for histogram, p for
population)
- separate a bit the data from the stats (only used within
VP8LGetHistoImageSymbols)
Change-Id: I274546e3ff96297383bcae0a95696c11f18decbf
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
This is the final change in this set. It is rather large due to the
shared dependencies between dec/enc.
Change-Id: I89de06b5653ae0bb627f904fa6060334831f7e3b
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
w/gcc-13
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
fixes integer sanitizer warnings of the form:
implicit conversion from type 'int' of value -2122283647 (32-bit,
signed) to type 'uint32_t' (aka 'unsigned int') changed the value to
2172683649 (32-bit, unsigned)
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
3724541952 (32-bit, unsigned) to type 'int' changed the value to
-570425344 (32-bit, signed)
Bug: b/229626362
Change-Id: I79f68e3e2fcab7cc0402477d2e88d629348c9ff4
fixes integer sanitizer warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
3724541952 (32-bit, unsigned) to type 'int' changed the value to
-570425344 (32-bit, signed)
Bug: b/229626362
Change-Id: Ie4d599aba88226e4e047250464ac37ca11d2cd3b
missed in:
83539239 (origin/main, main) dsp,x86: normalize types w/_mm_set* calls
fixes integer sanitizer warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
4292337446 (32-bit, unsigned) to type 'int' changed the value to
-2629850 (32-bit, signed)
runtime error: implicit conversion from type
'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type
'char' changed the value to -128 (8-bit, signed)
Bug: b/229626362
Change-Id: Ie904da8ded26725b4e0a9b82cc0679234f0a5388
fixes integer sanitizer warnings of the form:
runtime error: implicit conversion from type 'unsigned int' of value
4294967295 (32-bit, unsigned) to type 'int' changed the value to -1
(32-bit, signed)
runtime error: implicit conversion from type
'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type
'char' changed the value to -128 (8-bit, signed)
Bug: b/229626362
Change-Id: I6be3c40407cf7a27b79d31ee32d3829ecb78ed66
... when it's not available. Even if the value was discarded and
never used, some msan config were complaining about reading it
and passing it around.
Change-Id: Iab8d24676c5bb58e607a829121e36c2862da397c
PredictorSub0_SSE2 doesn't use 'upper' (neither does
VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing
pixels to avoid undefined behavior when offsetting a NULL pointer
BUG=chromium:1026858,oss-fuzz:19430
Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.
Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.
Also: suffix explicitly the functions with _SSE2
Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
The predictors based on Average2 are tad slower.
Following is the performance data for these predictors normalized to
number of instruction cycles (as per valgrind) per operation:
- Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11
instruction cycles for the C version.
- Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12
instruction cycles for the C version.
The predictors based on Average4 is faster and Average3 is tad slower:
- Predictor10 (Average4) now takes 23 instruction cycles compared to 25
instruction cycles for the C version.
- Predictor5 (Average3) now takes 20 instruction cycles compared to 18
instruction cycles for the C version.
Maybe SSE2 version of Average2 can be improved further. Otherwise, we can
remove the SSE2 version and always fallback to the C version.
Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029