lossless_enc: better vectorization, most benefits seen in AddVector/Eq
w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
w/gcc-13
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
Replace the 2d histograms with uint32_t 1d versions (to avoid
pointer casting and to use the optimized VP8LAddVectorEq).
Change-Id: I90b0fe98390b49e3fd03e3484289571cf7ae6eca
fixes integer sanitizer warnings of the form:
runtime error: implicit conversion from type 'unsigned int' of value
4294967295 (32-bit, unsigned) to type 'int' changed the value to -1
(32-bit, signed)
runtime error: implicit conversion from type
'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type
'char' changed the value to -128 (8-bit, signed)
Bug: b/229626362
Change-Id: I6be3c40407cf7a27b79d31ee32d3829ecb78ed66
this function produces different results from the C code due to
use of double/float resulting in output differences when compared to
-noasm.
Bug: webp:499
Change-Id: Ia039b168c0a66da723fb434656657ba1948db8ae
PredictorSub0_SSE2 doesn't use 'upper' (neither does
VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing
pixels to avoid undefined behavior when offsetting a NULL pointer
BUG=chromium:1026858,oss-fuzz:19430
Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3
_mm_set1_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 65280 (32-bit, signed) to
type 'short' changed the value to -256 (16-bit, signed)
Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.
Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147