2986 Commits

Author SHA1 Message Date
Henner Zeller
e015dcc0b9 Make histogram allocation and access more readable and type-safe.
This removes manual offsetting inside a large chunk of memory to
reach the right histogram, replacing it with types for the histogram
buckets and a Histograms container.

Change-Id: I1f80fcc2da38cadd9e4bc57d0693ed11dc5b3581
2025-06-12 15:55:20 +02:00
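The following is a minimal sketch of typed buckets plus a container used in place of manual offsets into one big allocation. `Histogram`, `Histograms`, `HistogramsAlloc()`, `HistogramsAt()` and the bucket sizes are hypothetical names chosen for the example. They are not libwebp's actual types.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bucket sizes, chosen only for illustration. */
#define NUM_LITERAL_BUCKETS 280
#define NUM_DISTANCE_BUCKETS 40

/* One histogram with typed buckets instead of raw offsets. */
typedef struct {
  uint32_t literal[NUM_LITERAL_BUCKETS];
  uint32_t distance[NUM_DISTANCE_BUCKETS];
} Histogram;

/* Container owning a contiguous array of histograms. */
typedef struct {
  size_t size;
  Histogram* items;
} Histograms;

/* Allocates 'count' zero-initialized histograms; returns 0 on failure. */
static int HistogramsAlloc(Histograms* h, size_t count) {
  h->items = (Histogram*)calloc(count, sizeof(*h->items));
  h->size = (h->items != NULL) ? count : 0;
  return h->items != NULL;
}

/* Type-safe access: no manual 'base + i * stride' arithmetic. */
static Histogram* HistogramsAt(const Histograms* h, size_t i) {
  return (i < h->size) ? &h->items[i] : NULL;
}

static void HistogramsFree(Histograms* h) {
  free(h->items);
  h->items = NULL;
  h->size = 0;
}
```

Indexing goes through `HistogramsAt()`, so an out-of-range index yields NULL instead of silently landing inside a neighboring histogram.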
James Zern
753ed11ef8 enc_neon.c: fix aarch64 compilation w/gcc < 8.5.0
Fixes:
dsp/enc_neon.c:1192:11: warning: implicit declaration of function
  'vld1_u8_x2'; did you mean 'vld1_u32'? [-Wimplicit-function-declaration]
   inner = vld1_u8_x2(top);
           ^~~~~~~~~~
           vld1_u32

Change-Id: I8d0175561efd69bc9614a68dca1d0fc19cdf91be
2025-05-30 10:25:38 -07:00
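A common way to handle an intrinsic that older compilers lack is a version guard with an equivalent fallback. Based on the commit title, the sketch below assumes that gcc before 8.5.0 lacks `vld1_u8_x2` on aarch64. `LoadU8x2()`, `LoadTop16()` and the `SKETCH_HAVE_VLD1_U8_X2` macro are hypothetical, and this is not the actual enc_neon.c change.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
/* Hypothetical guard: treat gcc (not clang) older than 8.5 as lacking
 * vld1_u8_x2. clang also defines __GNUC__, hence the extra check. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 8 || (__GNUC__ == 8 && __GNUC_MINOR__ < 5))
#define SKETCH_HAVE_VLD1_U8_X2 0
#else
#define SKETCH_HAVE_VLD1_U8_X2 1
#endif

static uint8x8x2_t LoadU8x2(const uint8_t* p) {
#if SKETCH_HAVE_VLD1_U8_X2
  return vld1_u8_x2(p);
#else
  /* Fallback: two plain 8-byte loads give the same register pair. */
  uint8x8x2_t r;
  r.val[0] = vld1_u8(p);
  r.val[1] = vld1_u8(p + 8);
  return r;
#endif
}
#endif

/* Copies 16 bytes from 'top' to 'out', through NEON when available. */
static void LoadTop16(const uint8_t* top, uint8_t out[16]) {
#if defined(__aarch64__) && defined(__ARM_NEON)
  const uint8x8x2_t v = LoadU8x2(top);
  vst1_u8(out, v.val[0]);
  vst1_u8(out + 8, v.val[1]);
#else
  memcpy(out, top, 16); /* portable path for non-aarch64 builds */
#endif
}
```

Two `vld1_u8()` loads of consecutive 8-byte halves produce the same `uint8x8x2_t` as `vld1_u8_x2()`, so the fallback changes only code generation, not results.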
James Zern
15e2e1ee3b analysis_enc.c: remove unused include
clears a clang-tidy warning

Change-Id: Ie17328dd624772806071fb8409fac4a9a78810bc
2025-05-16 12:40:51 -07:00
Henner Zeller
98c2780100 IWYU: Include all headers for symbols used in files.
Semi-automatically take the misc-include-cleaner warnings from
clang-tidy and fix files to be self-contained.

Change-Id: Iaaa2b2ec9d6dcce547fa5cb6b4f056dfc8c781ff
2025-05-15 14:53:57 +02:00
Vincent Rabaud
eb3ff78159 Only use valid histograms in VP8LHistogramSet
Empty histograms, or one of two merged histograms, were set to NULL,
which made the code harder to understand.

This changes the order of the histograms and therefore the goldens,
but only at the noise level.

Change-Id: I1702637bdcdbaaad1244a1345ca5297459f61132
2025-04-24 17:03:49 +02:00
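One common way to keep a set free of NULL holes is swap-with-last removal, which also reorders the entries. This is an assumption used for illustration. The log does not say how VP8LHistogramSet removes entries, and `Hist`, `HistSet` and both functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical histogram type; only 'total' matters for this sketch. */
typedef struct {
  int total; /* number of samples; 0 means empty */
} Hist;

typedef struct {
  Hist* items[8];
  int size;
} HistSet;

/* Removes entry 'i' by moving the last entry into its slot.
 * O(1) and leaves no NULL holes, but changes the order. */
static void HistSetRemove(HistSet* set, int i) {
  set->items[i] = set->items[set->size - 1];
  set->items[set->size - 1] = NULL;
  --set->size;
}

/* Drops empty histograms so every remaining entry is valid. */
static void HistSetRemoveEmpty(HistSet* set) {
  int i = 0;
  while (i < set->size) {
    if (set->items[i]->total == 0) {
      HistSetRemove(set, i); /* re-check slot i: it now holds another entry */
    } else {
      ++i;
    }
  }
}
```

For example, removing an empty first entry from `{e, f, g}` leaves `{g, f}`. The surviving histograms are the same, but their order differs, which is the kind of change that can shift golden outputs slightly.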
Vincent Rabaud
57e324e2eb Refactor VP8LHistogram histogram_enc.cc
- move HistogramAdd to histogram_enc.cc: it is too high level
- homogenize the argument naming (e.g. h for histogram, p for
population)
- partially separate the data from the stats (the stats are only used
within VP8LGetHistoImageSymbols)

Change-Id: I274546e3ff96297383bcae0a95696c11f18decbf
2025-04-23 19:12:21 +02:00
Vincent Rabaud
7191a602b0 Merge "Generalize trivial histograms" into main 2025-04-21 12:48:33 -07:00
James Zern
19696e0a6f Merge "alpha_processing_sse2: quiet signed conv warning" into main 2025-04-21 12:45:32 -07:00
Vincent Rabaud
52a430a7b6 Generalize trivial histograms
For now, this is used for histograms where A, R and B are
trivial. This can be done on a per-symbol basis for
speed-ups.
Only the entropy bin merge criterion is kept with
A, R, B, so that the change brings compression
improvements rather than speed regressions.

Change-Id: Iaff6f6d5f157066e481bf43553ea5edd01ff1cde
2025-04-21 20:56:33 +02:00
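The sketch below shows one possible meaning of "trivial". It assumes that a trivial channel uses at most one distinct symbol, so that under this assumption its symbols cost no bits. A check over the A, R and B histograms could then look like this. `IsTrivial()` and `IsARBTrivial()` are hypothetical helpers.

```c
#include <stdint.h>

/* Returns 1 if at most one of the 'n' counts is non-zero. */
static int IsTrivial(const uint32_t* counts, int n) {
  int used = 0;
  /* Stop early once a second used symbol is found. */
  for (int i = 0; i < n && used <= 1; ++i) used += (counts[i] != 0);
  return used <= 1;
}

/* Returns 1 if alpha, red and blue (256 entries each) are all trivial,
 * so only green carries varying information. */
static int IsARBTrivial(const uint32_t alpha[256], const uint32_t red[256],
                        const uint32_t blue[256]) {
  return IsTrivial(alpha, 256) && IsTrivial(red, 256) &&
         IsTrivial(blue, 256);
}
```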
Vincent Rabaud
e53e213091 Cache all costs in the histograms
This provides a small speed-up, but mostly it creates a
single entry point for computing costs.

Change-Id: I05d9eb3f01ae90d95bcd7b1e1e987ae729844a60
2025-04-20 18:18:38 +02:00
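The following is a minimal sketch of caching the cost behind a single entry point. To keep it self-contained, it uses a placeholder integer cost instead of a real entropy estimate. `CachedHistogram`, `ComputeCost()`, `HistogramAdd()` and `HistogramCost()` are hypothetical.

```c
#include <stdint.h>

#define NUM_SYMBOLS 4 /* tiny alphabet, for illustration only */

typedef struct {
  uint32_t counts[NUM_SYMBOLS];
  uint64_t cost;     /* cached result of ComputeCost() */
  int cost_is_valid; /* cleared whenever counts change */
} CachedHistogram;

static int g_num_cost_computations = 0; /* instrumentation for the demo */

/* Placeholder cost (NOT an entropy estimate): total population times
 * the number of used symbols. Integer-only to keep the sketch small. */
static uint64_t ComputeCost(const uint32_t counts[NUM_SYMBOLS]) {
  uint64_t total = 0;
  int used = 0;
  ++g_num_cost_computations;
  for (int i = 0; i < NUM_SYMBOLS; ++i) {
    total += counts[i];
    used += (counts[i] != 0);
  }
  return total * (uint64_t)used;
}

static void HistogramAdd(CachedHistogram* h, int symbol) {
  ++h->counts[symbol];
  h->cost_is_valid = 0; /* invalidate instead of recomputing eagerly */
}

/* The single entry point for costs: compute once, then reuse. */
static uint64_t HistogramCost(CachedHistogram* h) {
  if (!h->cost_is_valid) {
    h->cost = ComputeCost(h->counts);
    h->cost_is_valid = 1;
  }
  return h->cost;
}
```

Updates invalidate the cache and the cost is computed lazily, so repeated cost queries reuse the cached value. The counter shows that the cost is computed only once per change.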
James Zern
f8b360c419 alpha_processing_sse2: quiet signed conv warning
After:
44f91b0d Speed DispatchAlpha_SSE2 up

_mm_set1_epi8 takes a char argument; add a `char` cast for 0xff.

from clang-14 integer sanitizer:
  implicit conversion from type 'int' of value 255 (32-bit, signed) to
  type 'char' changed the value to -1 (8-bit, signed)

Change-Id: I0f4ed092eddc0beb311f44bf3d4b74a4d1177040
2025-04-17 12:21:34 -07:00
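The sketch below shows the fix pattern in isolation. The hypothetical `FillAllOnes()` helper makes the `int` to `char` narrowing explicit, so an implicit-conversion sanitizer has nothing to report. When SSE2 is unavailable, it falls back to `memset()`.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Fills 16 bytes with 0xff. */
static void FillAllOnes(uint8_t out[16]) {
#if defined(__SSE2__)
  /* _mm_set1_epi8() takes a char; the explicit cast makes the narrowing
   * of 0xff intentional (it becomes -1 when char is signed). */
  const __m128i ones = _mm_set1_epi8((char)0xff);
  _mm_storeu_si128((__m128i*)out, ones);
#else
  memset(out, 0xff, 16);
#endif
}
```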
James Zern
ad52d5fc7e dec/dsp/enc/utils,cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

This is the final change in this set. It is rather large due to the
shared dependencies between dec/enc.

Change-Id: I89de06b5653ae0bb627f904fa6060334831f7e3b
2025-04-16 13:23:42 -07:00
James Zern
ed7cd6a7f3 utils.c,cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: Ie2f82401e1ba28bd0575b6bb82d12ed55c71718f
2025-04-16 11:47:46 -07:00
James Zern
3a23b0f008 random_utils.[hc],cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: Ib58d676fa79c5a4a95c676a98b62b548097f3c48
2025-04-16 11:47:46 -07:00
James Zern
a99d0e6f04 quant_levels_dec_utils.c,cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: Ia4ce0fd0095f76f7edbc0fc6fe7f625e0d8bc6df
2025-04-16 11:47:46 -07:00
James Zern
1ed4654dc0 huffman_encode_utils.[hc],cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: Ice1edbbd98172a916be6b6d3cdaff80fe05a6e37
2025-04-16 11:47:46 -07:00
James Zern
f0689e48cb config_enc.c,cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: I23878bca2e14a898266704f3fec65d40f58fd0b2
2025-04-16 11:47:45 -07:00
James Zern
24262266d0 mux,cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: I9774ed6182ee4d872551aea56390fc0662cf0925
2025-04-16 11:47:41 -07:00
James Zern
3f54b1aa12 demux,cosmetics: rm struct member '_' suffix
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram

This better matches Google style and clears some clang-tidy warnings.

Change-Id: Ida41ca82445800552573ff5ebbde743cf8fa6eff
2025-04-15 19:27:17 -07:00
Vincent Rabaud
5225592f6b Refactor VP8LHistogram to hide initializations from the user.
This will make it easier to update some statistics in the future.

Change-Id: I3a3ec64d3c9c53ebcf491007e3a4d916e122c87f
2025-04-11 16:16:37 +02:00
Vincent Rabaud
00338240c1 Remove some computations in histogram clustering
- move the bin_id to the Histogram
- do not consider empty histograms

The speed-ups are negligible, as only linear-time work (on uint16_t
data) is removed, while the whole code is still O(N^2) in the number of
histograms.

Change-Id: Ie9c4831f0f3c64af9d9710a1dc2d817ba165389e
2025-04-11 08:55:24 +02:00
Vincent Rabaud
44f91b0ddd Speed DispatchAlpha_SSE2 up
On one dataset, this function was taking 2.5% of the time; 2% after
switching to _mm_maskmoveu_si128, and 1.7% when using _mm_loadu_si128.

Confirmed by IACA: the throughput went from 4.26 to 3.5 cycles, and then
to 6.26 cycles for twice the input.

Change-Id: I409f901aaad9d39bf55a1aac28cc25f126876b01
2025-04-10 11:53:19 +02:00
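For reference, here is a scalar sketch of what a function like DispatchAlpha computes. It assumes the function copies one alpha byte into every fourth destination byte and reports whether any alpha value is not 0xff. `DispatchAlphaRef()` and the alpha byte position are assumptions. The SSE2 versions discussed above process many pixels per instruction instead.

```c
#include <stdint.h>

/* Copies each alpha byte into byte 0 of a 4-byte pixel (an assumption)
 * and returns 1 if any alpha value differs from 0xff. Strides are in
 * bytes. */
static int DispatchAlphaRef(const uint8_t* alpha, int alpha_stride,
                            int width, int height, uint8_t* dst,
                            int dst_stride) {
  uint32_t alpha_mask = 0xff;
  for (int j = 0; j < height; ++j) {
    for (int i = 0; i < width; ++i) {
      const uint32_t a = alpha[i];
      dst[4 * i] = (uint8_t)a;
      alpha_mask &= a; /* stays 0xff only if every value is 0xff */
    }
    alpha += alpha_stride;
    dst += dst_stride;
  }
  return alpha_mask != 0xff;
}
```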
Vincent Rabaud
ee8e8c620f Fix member naming for VP8LHistogram
clang-tidy keeps complaining, and that typedef will evolve in the
future.

Change-Id: I734f2ae7dc0f4deac0dd391ae9f4b38c45507651
2025-04-10 09:54:57 +02:00
Vincent Rabaud
a1ad3f1e37 Merge "Remove now unused ExtraCostCombined" into main 2025-04-01 00:28:47 -07:00
Vincent Rabaud
321561b41f Remove now unused ExtraCostCombined
Change-Id: Ic9d1ccf5b10fed67f836aa19fa0f84238acbf4c1
2025-03-29 23:34:20 +01:00
James Zern
e0ae21d231 WebPMemoryWriterClear: use WebPMemoryWriterInit
Removes some common code between the two functions.

Change-Id: If9f42e580e34dad63f3806750d9d7571941026b5
2025-03-28 12:37:24 -07:00
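The pattern is simple to sketch: Clear releases the buffer and then delegates the field resetting to Init. `MemWriter`, `MemWriterInit()` and `MemWriterClear()` are hypothetical stand-ins, not the WebPMemoryWriter API itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable memory buffer. */
typedef struct {
  uint8_t* mem;
  size_t size;
  size_t max_size;
} MemWriter;

static void MemWriterInit(MemWriter* w) {
  w->mem = NULL;
  w->size = 0;
  w->max_size = 0;
}

/* Releases the buffer, then reuses Init to reset the fields instead of
 * duplicating the resetting code. */
static void MemWriterClear(MemWriter* w) {
  if (w == NULL) return;
  free(w->mem);
  MemWriterInit(w);
}
```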
Vincent Rabaud
a4183d94c7 Remove the computation of ExtraCost when comparing histograms
Entropy clustering merges symbol histograms to reduce the overall
entropy. The cost of the sum of 2 histograms is compared to the sum of
the costs of the 2 individual histograms; if it is smaller, a merge is
done.

For most symbols, the computed cost is the real final cost based on
the histogram. For some symbols (distance and length), it also
includes a constant cost (independent of the probabilities of the
symbols, and hence of the merge) because those symbols are encoded
Golomb-style.

This constant cost is identical on both sides of the comparison, so it
is useless and can be removed.

Change-Id: I6271e8c0e4111cdeff544cbdb7dec3c67be5309c
2025-03-28 15:00:41 +01:00
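A small sketch can check this reasoning. It assumes the extra-bit rule of WebP lossless prefix coding: no extra bits for prefix codes below 4, otherwise `(p - 2) >> 1`. Under that rule the extra cost is a weighted sum of the counts. It is therefore additive over histogram sums and appears identically on both sides of `cost(a + b) < cost(a) + cost(b)`. `ExtraBits()`, `ExtraCost()` and `AddHist()` are hypothetical helpers.

```c
#include <stdint.h>

/* Extra bits read after prefix symbol p: none for p < 4,
 * otherwise (p - 2) >> 1. */
static uint32_t ExtraBits(int p) {
  return (p < 4) ? 0u : (uint32_t)((p - 2) >> 1);
}

/* Total extra bits for a histogram: linear in the counts. */
static uint64_t ExtraCost(const uint32_t* counts, int n) {
  uint64_t cost = 0;
  for (int p = 0; p < n; ++p) cost += (uint64_t)counts[p] * ExtraBits(p);
  return cost;
}

/* Element-wise sum of two histograms, as done when merging. */
static void AddHist(const uint32_t* a, const uint32_t* b, uint32_t* sum,
                    int n) {
  for (int i = 0; i < n; ++i) sum[i] = a[i] + b[i];
}
```

Because `ExtraCost(a + b) == ExtraCost(a) + ExtraCost(b)` for any counts, subtracting it from both sides never changes the merge decision.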
Vincent Rabaud
f2b3f52733 Get AVX2 into WebP lossless
Change-Id: Ifad3102c9f899a46401985515cd98f3f7a21887f
2025-03-28 11:44:03 +01:00
Vincent Rabaud
7c70ff7a3b Clean dsp/lossless includes
Change-Id: I47a405a9c402095b440404fe57ac08b5293ea71b
2025-03-25 12:38:00 +01:00
Vincent Rabaud
9dd5ae819b Use the full register in PredictorSub13_SSE2
No more than 15 registers are used at a time

Change-Id: I40f77d9df8500e5e0d52ff6b206d765e8be62ae1
2025-03-25 11:07:15 +01:00
James Zern
743a5f092d enc_neon: enable vld1q_u8_x4 for clang & msvc
This restores the use of the function after
980b708e enc_neon: fix build w/aarch64 gcc < 9.4.0

The intrinsic was added to llvm for aarch64 in:
5e4ce1ae9dad Implement the newly added AArch64 ACLE functions for
             ld1/st1 with 2/3/4 vectors. The functions are like:
             vst1_s8_x2 ...
llvmorg-3.4.0-rc1~101
https://github.com/llvm/llvm-project/commit/5e4ce1ae9dad

Visual Studio 2019 and 2022 also support the function (2017 is still
disabled for this path due to it relying on arm64_neon.h).

Change-Id: I6ff10e22deb3968a48738a4458d2d3d55410b5ec
2025-03-05 16:56:20 -08:00
James Zern
980b708e2c enc_neon: fix build w/aarch64 gcc < 9.4.0
vld1q_u8_x4 was added for aarch64 in the gcc 9.4.0 release:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ChangeLog;h=7558c0a369ea8c74a2b9369049a2d1cc187dc050;hb=13c83c4cc679ad5383ed57f359e53e8d518b7842#l2100

fixes:
src/dsp/enc_neon.c: In function 'Intra4Preds_NEON':
src/dsp/enc_neon.c:974:37: warning: implicit declaration of function
  'vld1q_u8_x4'; did you mean 'vld1q_u8_x2'?
  [-Wimplicit-function-declaration]

Bug: webp:398288323
Change-Id: Ic6e408065a375c945cc8691bd16a9f5d5642cfa2
2025-02-27 19:07:50 -08:00
Vincent Rabaud
6a22b6709c Add a function to validate a WebPDecoderConfig
This echoes WebPValidateConfig for encoding.

Change-Id: Ib404d55c7af4d0755644879ec491e3998e6b5e8d
2025-01-30 10:10:08 +01:00
Vincent Rabaud
7ed2b10ef0 Use consistently signed stride types.
The stride can be negative when asked for a flipped image.

Change-Id: I049e8027c769186274a6a3049949f3fcaae7d2e9
2025-01-30 00:12:28 +01:00
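The following sketch shows why a signed stride matters. With a negative stride, writing starts at the last row and walks upwards, producing a vertically flipped image without a separate copy. With an unsigned stride, `y * stride` would wrap to a huge offset instead. `FillRows()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes 'height' rows of 'width' bytes, row y filled with value y.
 * A negative 'stride' makes consecutive rows go upwards in memory. */
static void FillRows(uint8_t* first_row, int stride, int width, int height) {
  for (int y = 0; y < height; ++y) {
    uint8_t* const row = first_row + (ptrdiff_t)y * stride;
    for (int x = 0; x < width; ++x) row[x] = (uint8_t)y;
  }
}
```

Called as `FillRows(buf + 2 * 4, -4, 4, 3)`, the three rows land in memory in reverse order, which is exactly the flipped layout.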
Vincent Rabaud
654bfb040c Avoid nullptr arithmetic in VP8BitReaderSetBuffer
When start is nullptr, the IO is not used afterwards
anyway, so there is no risk.

Change-Id: I0a828aec85c6e228e95dfed4a40d348275a7c577
2025-01-30 00:12:15 +01:00
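In C, pointer arithmetic on a null pointer is undefined behavior even with a zero offset. The usual fix is to form the end pointer only from a valid start. `Reader` and `ReaderSetBuffer()` are hypothetical, not the VP8BitReader code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state. */
typedef struct {
  const uint8_t* buf;
  const uint8_t* buf_end;
} Reader;

/* Avoids evaluating NULL + size: the end pointer is only derived from
 * a non-null start. */
static void ReaderSetBuffer(Reader* r, const uint8_t* start, size_t size) {
  r->buf = start;
  r->buf_end = (start != NULL) ? start + size : NULL;
}
```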
James Zern
369238461b bump version to 1.5.0
libwebp{,decoder} - 1.5.0
libwebp libtool - 8.10.1
libwebpdecoder libtool - 4.10.1

mux - 1.5.0
libtool - 4.1.1

demux - 1.5.0
libtool - 2.16.0

sharpyuv - 0.4.1
libtool - 1.1.1

Bug: b:336795049,webp:380121350
Change-Id: I53bdac2b0bd5ce30addf10e16776a16a07910e45
2024-12-12 17:43:51 -08:00
Vincent Rabaud
9e5ecfaf00 Properly check the data size against the end of the RIFF chunk
Bug: oss-fuzz:382816119

Change-Id: I629870246d8f1bd7c6cb0d66e89018600cecee3a
2024-12-10 09:09:08 +01:00
James Zern
306335198d muxread: fix reading of buffers > riff size
After:
  2c70ad76 muxread,CreateInternal: fix riff size checks (cl/200674839)

`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` (plus 1 byte of padding if
needed). A later check included `CHUNK_HEADER_SIZE` before capping the
size passed to `WebPMuxCreateInternal()`. This missed cases with a small
amount of extra data after the RIFF chunk (like a newline added when the
file is opened and saved in a text editor) and set the size to an
incorrect value, so larger sizes would also fail.

Another check of `riff_size < CHUNK_HEADER_SIZE` after the call to
`SizeWithPadding()` is removed because 1) it could not fail given
`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` to the value; and 2) it is
redundant as `size < RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE` is checked
earlier in the function.

Bug: webp:42340561
Change-Id: I58dc4f071b27c2841001b4012aabdb1869f64f97
2024-11-22 12:40:34 -08:00
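The following is a minimal sketch of the capping logic. As the message describes, it assumes that `SizeWithPadding()` returns `CHUNK_HEADER_SIZE` plus the payload rounded up to an even number of bytes. `CapToRiffChunk()` is hypothetical. Truncated input, meaning data shorter than the RIFF chunk, is left to other checks.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HEADER_SIZE 8 /* 4-byte tag + 4-byte size */

/* Size of a chunk on disk: header plus payload padded to an even size. */
static size_t SizeWithPadding(size_t payload_size) {
  return CHUNK_HEADER_SIZE + ((payload_size + 1) & ~(size_t)1);
}

/* Hypothetical helper: clamps 'data_size' to the end of the RIFF chunk
 * so trailing bytes are ignored instead of causing a failure.
 * 'riff_payload' is the value of the RIFF size field. */
static size_t CapToRiffChunk(size_t data_size, uint32_t riff_payload) {
  const size_t riff_end = SizeWithPadding(riff_payload);
  return (data_size > riff_end) ? riff_end : data_size;
}
```

With a RIFF size field of 100, a 109-byte file (108 bytes plus a trailing newline) is capped to 108 bytes instead of failing.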
James Zern
4c85d860ea yuv.h: update RGB<->YUV coefficients in comment
The values for the R/G/B floating point formulas resembled
https://fourcc.org/fccyvrgb.php and Video Demystified, but the fixed
point values are more closely aligned to rounded values from
https://en.wikipedia.org/wiki/YCbCr and BT.601.

The R/G/B formulas with the values prior to this change are added to
sharpyuv_csp.c as they align with the fixed values. The origin of those
coefficients is unclear. For consistency between library versions we'll
leave them as is.

Bug: webp:375011696
Change-Id: Id3f2a57530eee700cc52a899b32b25b5c015e89b
2024-11-21 16:21:45 -08:00
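The sketch below shows how floating point coefficients map to fixed point values, using BT.601 studio-range luma: Y = 16 + 219 * (0.299 R + 0.587 G + 0.114 B) / 255. The coefficients are rounded to 16 fractional bits. The precision, rounding and `RGBToYFixed()` are assumptions for illustration, and the resulting constants are not claimed to be libwebp's.

```c
#include <stdint.h>

#define SKETCH_YUV_FIX 16 /* fixed-point precision, an assumption */

/* BT.601 studio-range luma with coefficients rounded to fixed point.
 * Illustrative only; not libwebp's exact constants. */
static int RGBToYFixed(int r, int g, int b) {
  const double scale = (219.0 / 255.0) * (1 << SKETCH_YUV_FIX);
  const int kr = (int)(0.299 * scale + 0.5);
  const int kg = (int)(0.587 * scale + 0.5);
  const int kb = (int)(0.114 * scale + 0.5);
  const int half = 1 << (SKETCH_YUV_FIX - 1); /* round to nearest */
  return (kr * r + kg * g + kb * b + (16 << SKETCH_YUV_FIX) + half) >>
         SKETCH_YUV_FIX;
}
```

Black maps to 16 and white to 235, the nominal BT.601 luma range.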
James Zern
61e2cfdadd rework AddVectorEq_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and
remove the loop for the remainder. The loop was being auto-vectorized,
making the code larger and slower than the vectorized C code.

For larger sizes the new code is ~3-4.5% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set
to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8
case by 2 in any case; they both profile similarly.

Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a
2024-11-14 12:21:39 -08:00
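The following sketches the idea of dropping the remainder loop. It assumes that every caller passes a size that is a multiple of 8. This holds for the minimal size of 40 mentioned above, but is an assumption for larger sizes. The hypothetical `AddVectorEqSketch()` processes two 128-bit vectors per iteration, with a scalar fallback for non-SSE2 builds.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] += a[i]. Sketch assumption: 'size' is a multiple of 8, so no
 * scalar remainder loop is needed. */
static void AddVectorEqSketch(const uint32_t* a, uint32_t* out, int size) {
#if defined(__SSE2__)
  for (int i = 0; i < size; i += 8) {
    const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
    const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
    const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i + 0]);
    const __m128i b1 = _mm_loadu_si128((const __m128i*)&out[i + 4]);
    _mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
    _mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
  }
#else
  for (int i = 0; i < size; ++i) out[i] += a[i]; /* portable fallback */
#endif
}
```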
James Zern
7bda3deb89 rework AddVector_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and remove
the loop for the remainder. The loop was being auto-vectorized, making the
code larger and slower than the vectorized C code.

For larger sizes the new code is ~4-7% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set to
16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case
by 2 in any case; they both profile similarly.

Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e
2024-11-14 12:21:33 -08:00
James Zern
dfdcb7f95c Merge "lossless.h: fix function declaration mismatches" into main 2024-10-09 22:30:49 +00:00
James Zern
78ed683978 fix overread in Intra4Preds_NEON
Extend VP8EncIterator::i4_boundary_ by 3 bytes to avoid Intra4Preds_NEON
reading deeper into the struct (likely padding) when top is positioned
at offset 29. This data is memset when building with MSan to prevent a
warning caused by MSan's incorrect modeling of tbl instructions.

Prior to:
  169dfbf9 disable Intra4Preds_NEON
there was a mismatch in the preprocessor checks for enabling the
function in NEON and removing the C version; NEON used `BPS == 32` while
the C code was removed unconditionally when building for aarch64. This
patch also normalizes those checks to look for `BPS == 32` and `BPS !=
32` as appropriate.

Bug: b:366668849,webp:372109644
Change-Id: Ic9e6ad4b2d844cb446decd63aec0b2676a89c8d0
2024-10-08 16:55:12 -07:00
James Zern
d516a68e54 lossless.h: fix function declaration mismatches
These appear as warnings under VS15 (16 and 17 are silent) and were
missed in:
a32b436b dsp/lossless*: use WEBP_RESTRICT qualifier

Change-Id: Ia7cffafc166f2da93b51714363558798cda71b67
2024-10-08 13:41:16 -07:00
James Zern
fdb229ea3a Merge changes I07a7e36a,Ib29980f7,I2316122d,I2356e314,I32b53dd3, ... into main
* changes:
  dsp/yuv*: use WEBP_RESTRICT qualifier
  dsp/upsampling*: use WEBP_RESTRICT qualifier
  dsp/rescaler*: use WEBP_RESTRICT qualifier
  dsp/lossless*: use WEBP_RESTRICT qualifier
  dsp/filters*: use WEBP_RESTRICT qualifier
  dsp/enc*: use WEBP_RESTRICT qualifier
  dsp/dec*: use WEBP_RESTRICT qualifier
  dsp/cost*: use WEBP_RESTRICT qualifier
2024-10-03 17:01:02 +00:00
James Zern
169dfbf931 disable Intra4Preds_NEON
The load of the `top` parameter may overread, causing MSan errors:

==7373==WARNING: MemorySanitizer: use-of-uninitialized-value
  #0 0xfff891d52ad4 in Intra4Preds_NEON src/dsp/enc_neon.c:1003:12
  #1 0xfff892d87618 in MakeIntra4Preds src/enc/quant_enc.c:484:3

Bug: b:366668849
Change-Id: I29cf3b2f402ee79ea93c1ee2a4fdd95083aeed68
2024-10-02 15:42:19 -07:00
James Zern
2dd5eb9862 dsp/yuv*: use WEBP_RESTRICT qualifier
Better vectorization in the C code, fewer instructions / comparisons in
NEON, and fewer reloads in SSE2/SSE4 w/ndk r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I07a7e36a2dce8632c71c0fbbeef94dc51453eaf7
2024-10-02 14:55:15 -07:00
James Zern
23bbafbeb8 dsp/upsampling*: use WEBP_RESTRICT qualifier
Better vectorization in the C code, fewer instructions in NEON, and some
code reordering / better register usage in SSE2/SSE4 w/ndk
r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8
2024-10-02 14:55:15 -07:00
James Zern
35915b389e dsp/rescaler*: use WEBP_RESTRICT qualifier
Some improvement in the C code. No changes in NEON or SSE2 w/ndk
r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I2316122db893f48f0afda90a147c83cac7f07526
2024-10-02 14:55:14 -07:00
James Zern
a32b436bd5 dsp/lossless*: use WEBP_RESTRICT qualifier
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
              w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
          w/gcc-13

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
2024-10-02 14:55:14 -07:00
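The sketch below shows what the qualifier buys, using a local stand-in macro on an AddVector-style loop. `SKETCH_RESTRICT` is hypothetical, and the real WEBP_RESTRICT definition may differ.

```c
#include <stdint.h>

/* Local stand-in for a restrict macro such as WEBP_RESTRICT. */
#if defined(__GNUC__)
#define SKETCH_RESTRICT __restrict__
#elif defined(_MSC_VER)
#define SKETCH_RESTRICT __restrict
#else
#define SKETCH_RESTRICT
#endif

/* out[i] = a[i] + b[i]. The qualifiers promise that 'out' does not alias
 * 'a' or 'b', so the compiler may vectorize without runtime overlap
 * checks and without reloading inputs after each store. */
static void AddVectorRestrict(const uint32_t* SKETCH_RESTRICT a,
                              const uint32_t* SKETCH_RESTRICT b,
                              uint32_t* SKETCH_RESTRICT out, int size) {
  for (int i = 0; i < size; ++i) out[i] = a[i] + b[i];
}
```

Passing overlapping buffers to such a function is undefined behavior; that is the price of the no-aliasing promise.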