Commit Graph

2949 Commits

Author SHA1 Message Date
James Zern
306335198d muxread: fix reading of buffers > riff size
After:
  2c70ad76 muxread,CreateInternal: fix riff size checks (cl/200674839)

`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` (plus additional 1 byte
padding if needed). A later check included `CHUNK_HEADER_SIZE` before
capping the value of the size passed to `WebPMuxCreateInternal()`,
missing cases with a small amount of extra data after the RIFF chunk
(like a newline when the file is opened and saved in a text editor) and
setting size to an incorrect value, so larger sizes would also fail.

Another check of `riff_size < CHUNK_HEADER_SIZE` after the call to
`SizeWithPadding()` is removed because 1) it could not fail given
`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` to the value; and 2) it is
redundant as `size < RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE` is checked
earlier in the function.

Bug: webp:42340561
Change-Id: I58dc4f071b27c2841001b4012aabdb1869f64f97
2024-11-22 12:40:34 -08:00
James Zern
4c85d860ea yuv.h: update RGB<->YUV coefficients in comment
The values for the R/G/B floating point formulas resembled
https://fourcc.org/fccyvrgb.php and Video Demystified, but the fixed
point values are more closely aligned to rounded values from
https://en.wikipedia.org/wiki/YCbCr and BT.601.

The R/G/B formulas with the values prior to this change are added to
sharpyuv_csp.c as they align with the fixed values. The origin of those
coefficients is unclear. For consistency between library versions we'll
leave them as is.

Bug: webp:375011696
Change-Id: Id3f2a57530eee700cc52a899b32b25b5c015e89b
2024-11-21 16:21:45 -08:00
James Zern
61e2cfdadd rework AddVectorEq_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and
remove loop for the remainder. The loop was being auto-vectorized making
the code larger and slower than the vectorized C code.

For larger sizes the new code is ~3-4.5% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set
to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8
case by 2 in any case; they both profile similarly.

Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a
2024-11-14 12:21:39 -08:00
James Zern
7bda3deb89 rework AddVector_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and remove
loop for the remainder. The loop was being auto-vectorized making the
code larger and slower than the vectorized C code.

For larger sizes the new code is ~4-7% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set to
16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case
by 2 in any case; they both profile similarly.

Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e
2024-11-14 12:21:33 -08:00
James Zern
dfdcb7f95c Merge "lossless.h: fix function declaration mismatches" into main 2024-10-09 22:30:49 +00:00
James Zern
78ed683978 fix overread in Intra4Preds_NEON
Extend VP8EncIterator::i4_boundary_ by 3 bytes to avoid Intra4Preds_NEON
reading deeper into the struct (likely padding) when top is positioned
at offset 29. This data is memset with MSan to prevent a warning due to
its incorrect modeling of tbl instructions.

Prior to:
  169dfbf9 disable Intra4Preds_NEON
there was a mismatch in the preprocessor checks for enabling the
function in NEON and removing the C version; NEON used `BPS == 32` while
the C code was removed unconditionally when building for aarch64. This
patch also normalizes those checks to look for `BPS == 32` and `BPS !=
32` as appropriate.

Bug: b:366668849,webp:372109644
Change-Id: Ic9e6ad4b2d844cb446decd63aec0b2676a89c8d0
2024-10-08 16:55:12 -07:00
James Zern
d516a68e54 lossless.h: fix function declaration mismatches
These appear as warnings under VS15 (16 and 17 are silent) and were
missed in:
a32b436b dsp/lossless*: use WEBP_RESTRICT qualifier

Change-Id: Ia7cffafc166f2da93b51714363558798cda71b67
2024-10-08 13:41:16 -07:00
James Zern
fdb229ea3a Merge changes I07a7e36a,Ib29980f7,I2316122d,I2356e314,I32b53dd3, ... into main
* changes:
  dsp/yuv*: use WEBP_RESTRICT qualifier
  dsp/upsampling*: use WEBP_RESTRICT qualifier
  dsp/rescaler*: use WEBP_RESTRICT qualifier
  dsp/lossless*: use WEBP_RESTRICT qualifier
  dsp/filters*: use WEBP_RESTRICT qualifier
  dsp/enc*: use WEBP_RESTRICT qualifier
  dsp/dec*: use WEBP_RESTRICT qualifier
  dsp/cost*: use WEBP_RESTRICT qualifier
2024-10-03 17:01:02 +00:00
James Zern
169dfbf931 disable Intra4Preds_NEON
The load of the `top` parameter may over read causing MSan errors:

==7373==WARNING: MemorySanitizer: use-of-uninitialized-value
  #0 0xfff891d52ad4 in Intra4Preds_NEON src/dsp/enc_neon.c:1003:12
  #1 0xfff892d87618 in MakeIntra4Preds src/enc/quant_enc.c:484:3

Bug: b:366668849
Change-Id: I29cf3b2f402ee79ea93c1ee2a4fdd95083aeed68
2024-10-02 15:42:19 -07:00
James Zern
2dd5eb9862 dsp/yuv*: use WEBP_RESTRICT qualifier
Better vectorization in the C code, fewer instructions / comparisons in
NEON, and fewer reloads in SSE2/SSE4 w/ndk r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I07a7e36a2dce8632c71c0fbbeef94dc51453eaf7
2024-10-02 14:55:15 -07:00
James Zern
23bbafbeb8 dsp/upsampling*: use WEBP_RESTRICT qualifier
Better vectorization in the C code, fewer instructions in NEON, and some
code reordering / better register usage in SSE2/SSE4 w/ndk
r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8
2024-10-02 14:55:15 -07:00
James Zern
35915b389e dsp/rescaler*: use WEBP_RESTRICT qualifier
Some improvement in the C code. No changes in NEON or SSE2 w/ndk
r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I2316122db893f48f0afda90a147c83cac7f07526
2024-10-02 14:55:14 -07:00
James Zern
a32b436bd5 dsp/lossless*: use WEBP_RESTRICT qualifier
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
              w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
          w/gcc-13

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
2024-10-02 14:55:14 -07:00
James Zern
04d4b4f387 dsp/filters*: use WEBP_RESTRICT qualifier
Better stack/register usage in SSE2/NEON code and improved vectorization
of the C code with ndk r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I32b53dd38bfc7e2231d875409e7dfda7c513cfb6
2024-10-02 14:55:14 -07:00
James Zern
b1cb37e659 dsp/enc*: use WEBP_RESTRICT qualifier
This allows for better vectorization of the C code, inlining of
TrueMotion_SSE2, better load usage in aarch64 and other minor
reordering with ndk r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0
2024-10-02 14:55:14 -07:00
James Zern
201894ef24 dsp/dec*: use WEBP_RESTRICT qualifier
A minor improvement for arm targets with ndk r27/gcc-13 in H/VFilter8 (a
couple fewer moves w/aarch64) and much better vectorization of
DitherCombine8x8_C in most targets.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I03e73e6d6404261bb8408a9ae76a4b6ef142f8f0
2024-10-02 14:55:14 -07:00
James Zern
02eac8a741 dsp/cost*: use WEBP_RESTRICT qualifier
on SetResidualCoeffs_*. This results in some minor code reordering when
targeting arvm7 with ndk r27 and other recent versions of clang. No
changes in the x86 compilations with clang-16 / gcc-13.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I7c3554ece848fafbc5ac9c4944f1dc85129f6fd8
2024-10-02 14:55:14 -07:00
Vincent Rabaud
220ee52967 Search for best predictor transform bits
This is useful in cruncher mode.

Change-Id: I8586bdbf464daf85db381ab77a18bf63dd48f323
2024-09-24 10:44:22 +02:00
Vincent Rabaud
7861947813 Try to reduce the sampling for the entropy image
This offers minor compression improvements.

Change-Id: I4b3b1bb11ee83273c0e4c9f47e53b21cf7cd5f76
2024-09-24 10:28:43 +02:00
Vincent Rabaud
a78c5356ba Remove a useless malloc for entropy image
histogram_symbols is converted to uint32_t and <<8 into
histogram_argb.
Using a uint32_t buffer from the start prevents copying and
converting the data.

Change-Id: I245003a6a0f048c31519afa25a600d4479e762e3
2024-09-18 22:38:11 +02:00
Vincent Rabaud
367ca938f1 Refactor predictor finding
This is useful for a forward change that will improve compression.
It splits the residual computation and the best predictor
selection.

The only downside is that more memory is allocated: we had 2
histograms before, we now have 14, but this is necessary for the
later change. Still, this is nothing compared to what is done
later in the pipeline in HistogramSetTotalSize where the number of
histograms created is the number of pixels in the subsampled image.

Change-Id: If03501a26f00462dd1809daa6e9314abd180945d
2024-09-17 09:49:43 +02:00
James Zern
f888291359 anim_encode.c: fix function ref in comment
WebPCleanupTransparentAreaLossless() was renamed to
WebPReplaceTransparentPixels() in:
55a080e5 Add WebPReplaceTransparentPixels() in dsp

Change-Id: I91e32574e6add2748c0655146f100eb2b40498b2
2024-09-09 19:28:12 -07:00
Vincent Rabaud
2e81017c7a Convert predictor_enc.c to fixed point
Also remove the last float in histogram_enc.c

Change-Id: I6f647a5fc6dd34a19292820817472b4462c94f49
2024-08-30 09:22:48 +02:00
Vincent Rabaud
8e0cc14c3e Fix static overflow warning.
In practice, this can never happen because:
- 'streak' is at most as long as a histogram
- 'count' counts the number of streaks

'streak' and 'count' are therefore at most as big as the histogram
length which is at most the max of VP8LHistogramNumCodes,
which is 256+24+(1<<10).

Change-Id: I31c8834543479c8a9260732313ea26b045519515
2024-08-28 10:23:54 +02:00
James Zern
615e58744f Merge "make VP8LPredictor[01]_C() static" into main 2024-08-22 17:35:52 +00:00
James Zern
233e86b91f Merge changes Ie43dc5ef,I94cd8bab into main
* changes:
  Do*Filter_*: remove row & num_rows parameters
  Do*Filter_C: remove dead 'inverse' code paths
2024-08-19 18:51:06 +00:00
James Zern
1a29fd2fc3 make VP8LPredictor[01]_C() static
Only predictors 2-13 are reused in lossless_enc.c.

Change-Id: Ia3a7342fccfb44b9ad5297f48d6be2d96af68ec8
2024-08-16 10:58:45 -07:00
James Zern
dd9d3770d7 Do*Filter_*: remove row & num_rows parameters
The row parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently

num_rows is always equal to height.

Change-Id: Ie43dc5ef222e442ce8c92766da0b9824ccbca236
2024-08-12 19:36:31 -07:00
James Zern
ab451a495c Do*Filter_C: remove dead 'inverse' code paths
The inverse parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently

The row parameter to these functions is in a similar state; it will be
removed in a follow up.

Change-Id: I94cd8babe0e42474ff794ba5fa29dd48039de5f8
2024-08-08 18:13:48 -07:00
James Zern
f9a480f7c3 {TrueMotion,TM16}_NEON: remove zero extension
Replace vmovl_u8 -> s16 + signed vaddq with unsigned vaddw.
No change in assembly with clang-16 (armv7 & aarch64) and gcc-13
(aarch64). armv7 gcc-13 had kept the vmovl instructions, those are now
gone.

Change-Id: Ibb4fbdd5680d3e9dd06933c100528a6f363de472
2024-08-07 16:43:14 -07:00
James Zern
04834acae7 Merge changes I25c30a9e,I0a192fc6,I4cf89575 into main
* changes:
  WASM: Enable VP8L_USE_FAST_LOAD
  WASM: don't use USE_GENERIC_TREE
  WASM: Enable 64-bit BITS caching
2024-08-01 18:36:34 +00:00
Vincent Rabaud
74be8e22d9 Fix implicit conversion issues
Change-Id: If2cc8a137371ef365cf4a9c55f1b6ab131fba564
2024-07-25 22:30:15 +02:00
Vincent Rabaud
f2d6dc1eef Increase the transform bits if possible.
This brings minor size improvements because repetitive values in
the transform images are easily explainable through LZ77. Still,
it makes an upcoming pull request a bit more stable.

This is a rollforward of
7ec51c5916
ee26766a89

Change-Id: I254ab3ccd5053344f89099280e8d994ecd55aee0
2024-07-19 23:22:27 +02:00
wrv
8a7c8dc662 WASM: Enable VP8L_USE_FAST_LOAD
It is 2-5% faster to use VP8L fast load on WASM

Bug: webp:643
Change-Id: I25c30a9e6bcfc7cadd640122579eeebcb37e6fc0
2024-07-15 14:41:36 -05:00
wrv
f0c53cd966 WASM: don't use USE_GENERIC_TREE
It is 2-4% faster to use hard-coded tree on WASM

Bug: webp:643
Change-Id: I0a192fc6af210c79814a81084cd1f199714bf46c
2024-07-15 14:41:14 -05:00
wrv
eef903d04a WASM: Enable 64-bit BITS caching
Bug: webp:643
Change-Id: I4cf89575e0ebcfeaf9d84be8e188863657893a07
2024-07-15 14:40:45 -05:00
James Zern
6296cc8d0d iterator_enc: make VP8IteratorReset() static
This function is unused outside of iterator_enc.c.

Change-Id: I0f1ecedeb9ed4d9f51d0135f04b8ef00424f24cc
2024-07-12 15:23:10 -07:00
James Zern
fbd93896a6 histogram_enc: make VP8LGetHistogramSize static
This function is unused outside of histogram_enc.c.

Change-Id: I527f54408383d0bc9d04878ca397a3d044b350de
2024-07-12 15:23:10 -07:00
James Zern
cc7ff5459a cost_enc: make VP8CalculateLevelCosts[] static
This table is unused outside of cost_enc.c.

Change-Id: I0aa46554b8470fb09a7ffeae0e98d2356b40b671
2024-07-12 15:23:10 -07:00
James Zern
4e2828bae8 vp8l_dec: make VP8LClear() static
This function is unused outside of vp8l_dec.c.

Change-Id: I16733a44ea024ca9601c098641a3cd464bed2b53
2024-07-12 15:22:20 -07:00
James Zern
d742b24a88 Intra16Preds_NEON: fix truemotion saturation
This needs to be done with signed saturation as the sum may be negative.

fixes mismatch with C code after:
3bfb05e3 Add AArch64 Neon implementation of Intra16Preds

Change-Id: I017e939d7155cc3489ceb76fc8ad50ac9917f23d
2024-07-11 13:37:06 -07:00
James Zern
c7bb4cb585 Intra4Preds_NEON: fix truemotion saturation
This needs to be done with signed saturation as the sum may be negative.

fixes mismatch with C code after:
baa93808 Add AArch64 Neon implementation of Intra4Preds

Change-Id: I190c3d7f78cfd2c7ae83fb7059de41e307abda36
2024-07-11 13:37:06 -07:00
Vincent Rabaud
dde11574b0 Remove TODO now that log is using fixed point.
Bug: webp:499

Change-Id: I39ab340ec6b5932db7535c6b7f31843c28de8415
2024-07-11 20:11:03 +00:00
James Zern
3bd9420289 Merge changes Iff6e47ed,I24c67cd5,Id781e761 into main
* changes:
  Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm
  Add AArch64 Neon implementation of Intra16Preds
  Add AArch64 Neon implementation of Intra4Preds
2024-07-11 02:04:42 +00:00
Vincent Rabaud
d27d246e42 Merge "Convert VP8LFastSLog2 to fixed point" into main 2024-07-10 21:52:39 +00:00
Istvan Stefan
314a142a34 Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm
Use the Neon implementation instead of falling back to
QuantizeBlock_C.

Change-Id: Iff6e47eda353cbaa9766f75040fa63aa34607816
2024-07-10 14:48:38 +01:00
Istvan Stefan
3bfb05e38c Add AArch64 Neon implementation of Intra16Preds
Add a Neon implementation of Intra16Preds for use on 64-bit Arm
platforms. (This implementation cannot be used on 32-bit Arm
platforms as it makes use of a number of AArch64-only Neon
instructions.)

Change-Id: I24c67cd54b66307e3924fd332c2795fd7422f082
2024-07-10 14:48:38 +01:00
Istvan Stefan
baa93808d9 Add AArch64 Neon implementation of Intra4Preds
Add Neon implementation of Intra4Preds for use on 64-bit Arm
platforms. (The same implementation cannot be used for 32-bit Arm
platforms as it uses a number of AArch64-only Neon instructions.)

Change-Id: Id781e7614f4e8e876dfeecd95cfc85e04611d8c6
2024-07-10 14:48:26 +01:00
Vincent Rabaud
41a5e582c2 Fix errors when compiling code as C++
Change-Id: Iba94e24e764038640f39d61fb2bc9cfb3434cc8f
2024-07-10 10:30:48 +02:00
Vincent Rabaud
fb444b692b Convert VP8LFastSLog2 to fixed point
Speedups: 1% with '-lossless', 2% with '-lossless -q 100 -m6'

Change-Id: I1d79ea8e3e9e4bac7bcea4d7cbcc1bd56273988e
2024-07-09 16:42:21 +02:00