934 Commits

Author SHA1 Message Date
James Zern
23bbafbeb8 dsp/upsampling*: use WEBP_RESTRICT qualifier
Better vectorization in the C code, fewer instructions in NEON, and some
code reordering / better register usage in SSE2/SSE4 w/ndk
r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8
2024-10-02 14:55:15 -07:00
James Zern
35915b389e dsp/rescaler*: use WEBP_RESTRICT qualifier
Some improvement in the C code. No changes in NEON or SSE2 w/ndk
r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I2316122db893f48f0afda90a147c83cac7f07526
2024-10-02 14:55:14 -07:00
James Zern
a32b436bd5 dsp/lossless*: use WEBP_RESTRICT qualifier
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
              w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
          w/gcc-13

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
2024-10-02 14:55:14 -07:00
James Zern
04d4b4f387 dsp/filters*: use WEBP_RESTRICT qualifier
Better stack/register usage in SSE2/NEON code and improved vectorization
of the C code with ndk r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I32b53dd38bfc7e2231d875409e7dfda7c513cfb6
2024-10-02 14:55:14 -07:00
James Zern
b1cb37e659 dsp/enc*: use WEBP_RESTRICT qualifier
This allows for better vectorization of the C code, inlining of
TrueMotion_SSE2, better load usage in aarch64 and other minor
reordering with ndk r27/gcc-13/clang-16.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0
2024-10-02 14:55:14 -07:00
James Zern
201894ef24 dsp/dec*: use WEBP_RESTRICT qualifier
A minor improvement for arm targets with ndk r27/gcc-13 in H/VFilter8 (a
couple fewer moves w/aarch64) and much better vectorization of
DitherCombine8x8_C in most targets.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I03e73e6d6404261bb8408a9ae76a4b6ef142f8f0
2024-10-02 14:55:14 -07:00
James Zern
02eac8a741 dsp/cost*: use WEBP_RESTRICT qualifier
on SetResidualCoeffs_*. This results in some minor code reordering when
targeting arvm7 with ndk r27 and other recent versions of clang. No
changes in the x86 compilations with clang-16 / gcc-13.

This only affects non-vector pointers; any vector pointers are left as a
follow up.

Change-Id: I7c3554ece848fafbc5ac9c4944f1dc85129f6fd8
2024-10-02 14:55:14 -07:00
Vincent Rabaud
220ee52967 Search for best predictor transform bits
This is useful in cruncher mode.

Change-Id: I8586bdbf464daf85db381ab77a18bf63dd48f323
2024-09-24 10:44:22 +02:00
James Zern
615e58744f Merge "make VP8LPredictor[01]_C() static" into main 2024-08-22 17:35:52 +00:00
James Zern
233e86b91f Merge changes Ie43dc5ef,I94cd8bab into main
* changes:
  Do*Filter_*: remove row & num_rows parameters
  Do*Filter_C: remove dead 'inverse' code paths
2024-08-19 18:51:06 +00:00
James Zern
1a29fd2fc3 make VP8LPredictor[01]_C() static
Only predictors 2-13 are reused in lossless_enc.c.

Change-Id: Ia3a7342fccfb44b9ad5297f48d6be2d96af68ec8
2024-08-16 10:58:45 -07:00
James Zern
dd9d3770d7 Do*Filter_*: remove row & num_rows parameters
The row parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently

num_rows is always equal to height.

Change-Id: Ie43dc5ef222e442ce8c92766da0b9824ccbca236
2024-08-12 19:36:31 -07:00
James Zern
ab451a495c Do*Filter_C: remove dead 'inverse' code paths
The inverse parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently

The row parameter to these functions is in a similar state; it will be
removed in a follow up.

Change-Id: I94cd8babe0e42474ff794ba5fa29dd48039de5f8
2024-08-08 18:13:48 -07:00
James Zern
f9a480f7c3 {TrueMotion,TM16}_NEON: remove zero extension
Replace vmovl_u8 -> s16 + signed vaddq with unsigned vaddw.
No change in assembly with clang-16 (armv7 & aarch64) and gcc-13
(aarch64). armv7 gcc-13 had kept the vmovl instructions, those are now
gone.

Change-Id: Ibb4fbdd5680d3e9dd06933c100528a6f363de472
2024-08-07 16:43:14 -07:00
James Zern
d742b24a88 Intra16Preds_NEON: fix truemotion saturation
This needs to be done with signed saturation as the sum may be negative.

fixes mismatch with C code after:
3bfb05e3 Add AArch64 Neon implementation of Intra16Preds

Change-Id: I017e939d7155cc3489ceb76fc8ad50ac9917f23d
2024-07-11 13:37:06 -07:00
James Zern
c7bb4cb585 Intra4Preds_NEON: fix truemotion saturation
This needs to be done with signed saturation as the sum may be negative.

fixes mismatch with C code after:
baa93808 Add AArch64 Neon implementation of Intra4Preds

Change-Id: I190c3d7f78cfd2c7ae83fb7059de41e307abda36
2024-07-11 13:37:06 -07:00
Vincent Rabaud
dde11574b0 Remove TODO now that log is using fixed point.
Bug: webp:499

Change-Id: I39ab340ec6b5932db7535c6b7f31843c28de8415
2024-07-11 20:11:03 +00:00
James Zern
3bd9420289 Merge changes Iff6e47ed,I24c67cd5,Id781e761 into main
* changes:
  Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm
  Add AArch64 Neon implementation of Intra16Preds
  Add AArch64 Neon implementation of Intra4Preds
2024-07-11 02:04:42 +00:00
Istvan Stefan
314a142a34 Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm
Use the Neon implementation instead of falling back to
QuantizeBlock_C.

Change-Id: Iff6e47eda353cbaa9766f75040fa63aa34607816
2024-07-10 14:48:38 +01:00
Istvan Stefan
3bfb05e38c Add AArch64 Neon implementation of Intra16Preds
Add a Neon implementation of Intra16Preds for use on 64-bit Arm
platforms. (This implementation cannot be used on 32-bit Arm
platforms as it makes use of a number of AArch64-only Neon
instructions.)

Change-Id: I24c67cd54b66307e3924fd332c2795fd7422f082
2024-07-10 14:48:38 +01:00
Istvan Stefan
baa93808d9 Add AArch64 Neon implementation of Intra4Preds
Add Neon implementation of Intra4Preds for use on 64-bit Arm
platforms. (The same implementation cannot be used for 32-bit Arm
platforms as it uses a number of AArch64-only Neon instructions.)

Change-Id: Id781e7614f4e8e876dfeecd95cfc85e04611d8c6
2024-07-10 14:48:26 +01:00
Vincent Rabaud
fb444b692b Convert VP8LFastSLog2 to fixed point
Speedups: 1% with '-lossless', 2% with '-lossless -q 100 -m6'

Change-Id: I1d79ea8e3e9e4bac7bcea4d7cbcc1bd56273988e
2024-07-09 16:42:21 +02:00
Vincent Rabaud
66408c2c7c Switch the histogram_enc.h API to fixed point
Speedups: 4% with '-lossless', 8% with '-lossless -q 100 -m6'

Change-Id: I8f1c244b290d48132c1edc6a1c9fc3f79fef68ec
2024-07-09 13:39:45 +02:00
Vincent Rabaud
0a9f1c19f8 Convert VP8LFastLog2 to fixed point
The lossless encoding speed-ups are:
- up to 1% with default parameters
- up to 4% in cruncher mode: -q 100 -m 6

Change-Id: Id92d4bad0b0a2c28c8aa9ff5280eea5717017f30
2024-07-02 10:29:38 +02:00
wrv
5e5b8f0c95 Fix SSE2 Transform_AC3 function name
Change-Id: I5fda3221612beafc3548d2abfa7c1e3f686aaaf0
2024-05-29 21:41:14 -05:00
Vincent Rabaud
a90160e11a Refactor histograms in predictors.
Replace the 2d histograms with uint32_t 1d versions (to avoid
pointer casting and to use the optimized VP8LAddVectorEq).

Change-Id: I90b0fe98390b49e3fd03e3484289571cf7ae6eca
2024-05-03 22:09:38 +02:00
James Zern
edc289092a upsampling_{neon,sse41}: fix int sanitizer warning
fixes warnings of the form:
/src/dsp/upsampling_sse41.c:170:1: runtime error: implicit conversion
  from type 'int' of value -16 (32-bit, signed) to type 'uintptr_t' (aka
  'unsigned long') changed the value to 18446744073709551600 (64-bit,
  unsigned)

this is the same change as was done previously in upsampling_sse2.c:
2ee786c7 upsampling_sse2.c: clear int sanitizer warnings

Change-Id: I36064d837ad1a7a118918c16a5551fc732dec2ff
2024-04-30 13:06:09 -07:00
Vincent Rabaud
501d9274a7 Copy C code to not have multiplication overflow
Change-Id: I9375170ce1217921a334c5b93dc3e0084f976688
2024-03-07 09:22:20 +01:00
James Zern
4723db65bc cosmetics: s/SANITY_CHECK/DCHECK/
'sanity' is not an inclusive term. This is a debug check so DCHECK
better shows the intent.

https://source.android.com/docs/setup/contribute/respectful-code

Change-Id: I4cdad3ef9ddf0404d26e46b8430e3ad1d715c5b2
2024-02-16 11:57:16 -08:00
James Zern
f4b9bc9ea1 clear -Wextra-semi-stmt warnings
This is available with clang. Clears warnings of the form:
  warning: empty expression statement has no effect; remove unnecessary
    ';' to silence this warning [-Wextra-semi-stmt]

As a side-effect it also clear a few -Wpedantic warnings with gcc:
  warning: ISO C does not allow extra ';' outside of a function
    [-Wpedantic]

Change-Id: I9295c767aad475c68b1fbbdff855b0d6650a25f5
2024-02-15 18:55:22 -08:00
Arthur Eubanks
307071f144 Remove medium/large code model-specific inline asm
Initially added to workaround gcc implementation issues that clang
does not have. (gcc hardcodes rbx as the PIC register, clang uses a
virtual register)

Change-Id: I1a3277abf02b1ff437b4aea4d28f4cb1c0176b80
2023-10-24 09:48:12 +02:00
Vincent Rabaud
ac17ffffcb Fix non-C90 code.
Thank you OpenCV: https://github.com/opencv/opencv/blob/
1e54e5657927996e86b155d89f51c7b5a73461d2/3rdparty/libwebp/
CMakeLists.txt#L24

Change-Id: Ic79f6309951f96c380e44b3167c1a36aae6d8903
2023-09-14 22:43:45 +02:00
Vincent Rabaud
7ba44f80f3 Homogenize "__asm__ volatile" vs "asm volatile"
According to https://gcc.gnu.org/onlinedocs/gcc/extensions-to-the-c-language-family/how-to-use-inline-assembly-language-in-c-code.html

For the C language, the asm keyword is a GNU extension. When
writing C code that can be compiled with -ansi and the -std options
that select C dialects without GNU extensions, use __asm__ instead
of asm (see Alternate Keywords). For the C++ language, asm is a
standard keyword, but __asm__ can be used for code compiled with
-fno-asm.

Change-Id: I4af950e67c857c890290c1e3d9cc886da0748784
2023-09-06 17:15:05 +02:00
Vincent Rabaud
eac3bd5c53 Have the palette code be in its own file.
Change-Id: I099a342effedd9f451c94d00a14aead27079e6cc
2023-07-06 22:09:24 +02:00
James Zern
6b1c722a5b lossless_common.h,cosmetics: fix a typo
Change-Id: I22d2f96fa1fd2e3c9f059c26996fb11d998159a9
2023-06-22 18:31:36 -07:00
Vincent Rabaud
64819c7cf3 Implement ExtractGreen_SSE2
Change-Id: I74f50e0c01603a640aa8cf7c9658d477e696ea8a
2023-06-19 13:31:46 +02:00
skal
6b02f66015 treat FILTER_NONE as a regular Unfilter[] call
Removes the hard-coded memcpy() in alpha-decoding.

Change-Id: I1dfd98db206893d7715a79d05a1bd9272690471a
2023-06-07 15:42:16 +02:00
Vincent Rabaud
828b4ce062 Switch ExtraCost to ints and implement it in SSE.
The histograms count the occurrences of len/dist in entropy images.
Those (at most (1<<14) by (1<<14)) are sub-sampled by at least
MIN_HUFFMAN_BITS == 2, hence at most 24 bits in a histogram value.
At most, we multiply by 19 (because the longest histogram is of
size 40 and we do 40>>1, cf code) for the bit cost. So it all fits
in 32 bits.

Change-Id: Ife24b035f54794851ff31f2fac07901f724c6d7f
2023-06-01 10:17:13 +02:00
Nozomi Isozaki
ac42dde1c5 Specialize and optimize ITransform_SSE2 using do_two
Change-Id: I976eb4a0cc4e669a02b55012d4aba1536f193781
2023-05-16 12:07:58 +09:00
James Zern
ed27437160 neon.h,cosmetics: clear a couple lint warnings
Missing space after ,  [whitespace/comma] [3]

Change-Id: Ib8fc05c31cbef5318a752e98ab5106dad55d69e9
2023-05-02 17:32:14 -07:00
James Zern
3fb8294762 cpu.h,cosmetics: segment defines
Change-Id: Idc6dcd31e95de1c89b2a35b4c67fa66b92fe1a60
2023-05-02 12:28:50 -07:00
James Zern
0c496a4ff9 cpu.h: add WEBP_AARCH64
and define it to true for __aarch64__ and Win Arm64 + Visual Studio.

Microsoft's compiler (cl.exe) does not define __aarch64__, but relies on
_M_ARM64 & _M_ARM64EC

Bug: b/277254922
Change-Id: I20e4fa07a4031599db69e3d7ba9050345315ef51
2023-05-02 12:28:50 -07:00
James Zern
8151f388eb move VP8GetCPUInfo declaration to cpu.c
This avoids defining a version in each translation unit when using
__declspec(dllexport) which causes failures due to multiply defined
symbols with clang-cl:

lld-link: error: duplicate symbol: VP8GetCPUInfo
>>> defined at CMakeFiles\webpdecode.dir\Debug\src\dec\alpha_dec.c.obj
>>> defined at CMakeFiles\webpdsp.dir\Debug\src\dsp\dec_sse41.c.obj
...

Bug: webp:607
Change-Id: I6cd1ee75b3db984aa513263a05516e867a64925d
2023-04-27 12:39:13 -07:00
James Zern
0afbd97b45 cpu.h: enable NEON w/_M_ARM64EC
The Arm64EC (Emulation Compatible) ABI was added for Windows 11 [1].

[1] https://learn.microsoft.com/en-us/windows/arm/arm64ec

Bug: b/277254922
Change-Id: I3767e1b3db61fa9c33eef7a9ed7abee7c502e36f
2023-04-06 13:49:36 -07:00
James Zern
8f7513b7c0 upsampling_neon.c: fix WEBP_SWAP_16BIT_CSP check
this is always defined by default to 0 since:
v0.6.0-158-g663a6d9d unify the ALTERNATE_CODE flag usage

previously the !defined() check would cause a mismatch between C and
assembly.

Change-Id: Idca0b8e39ca90d63785fd4125aeb7af86c5aae61
2023-03-24 11:20:35 -07:00
James Zern
f853685e13 lossless: SUBTRACT_GREEN -> SUBTRACT_GREEN_TRANSFORM
this makes the name of the TransformType enum value match the other
members

Bug: webp:448
Change-Id: I85b2f615f97b40fc6d544197cccfb7189dcf4fc0
2022-11-21 16:48:51 -08:00
James Zern
769387c54a cpu.c,cosmetics: fix a typo
VP8DecGetCPUInfo -> VP8GetCPUInfo

Change-Id: Ic59d23a2964a881b853db62b3617117bf10ec66d
2022-10-25 16:24:07 -07:00
James Zern
e68765af42 dsp,neon: use vaddv in a few more places
SumToInt_NEON
horizontal_add_uint32x4

Change-Id: I881831a7b2bab35a1810b0d83fee761470f3e09f
2022-09-12 10:55:58 -07:00
James Zern
ef70ee06fa add a few missing <stddef.h> includes for NULL
and remove unused includes in sharpyuv/

Change-Id: If10538a994bd5dc55126f1485f2b163933ad8e91
2022-08-11 17:39:48 -07:00
James Zern
e2fecc22e1 dsp/lossless_enc.c: clear int sanitizer warnings
in TransformColorBlue; make new_blue an int to avoid:

implicit conversion from type 'int' of value 264 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 8 (8-bit,
unsigned)

Bug: b/229626362
Change-Id: Ife276a59231075788396204e1a192f3b0c6d9e21
2022-08-08 17:34:01 -07:00