Better vectorization in the C code, fewer instructions in NEON, and some
code reordering / better register usage in SSE2/SSE4 w/ndk
r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8
Some improvement in the C code. No changes in NEON or SSE2 w/ndk
r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2316122db893f48f0afda90a147c83cac7f07526
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
w/gcc-13
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
Better stack/register usage in SSE2/NEON code and improved vectorization
of the C code with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I32b53dd38bfc7e2231d875409e7dfda7c513cfb6
This allows for better vectorization of the C code, inlining of
TrueMotion_SSE2, better load usage in aarch64 and other minor
reordering with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0
A minor improvement for arm targets with ndk r27/gcc-13 in H/VFilter8 (a
couple fewer moves w/aarch64) and much better vectorization of
DitherCombine8x8_C in most targets.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I03e73e6d6404261bb8408a9ae76a4b6ef142f8f0
on SetResidualCoeffs_*. This results in some minor code reordering when
targeting armv7 with ndk r27 and other recent versions of clang. No
changes in the x86 compilations with clang-16 / gcc-13.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I7c3554ece848fafbc5ac9c4944f1dc85129f6fd8
The row parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently
num_rows is always equal to height.
Change-Id: Ie43dc5ef222e442ce8c92766da0b9824ccbca236
The inverse parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently
The row parameter to these functions is in a similar state; it will be
removed in a follow up.
Change-Id: I94cd8babe0e42474ff794ba5fa29dd48039de5f8
Replace vmovl_u8 -> s16 + signed vaddq with unsigned vaddw.
No change in assembly with clang-16 (armv7 & aarch64) and gcc-13
(aarch64). armv7 gcc-13 had kept the vmovl instructions; those are now
gone.
Change-Id: Ibb4fbdd5680d3e9dd06933c100528a6f363de472
This needs to be done with signed saturation as the sum may be negative.
fixes mismatch with C code after:
3bfb05e3 Add AArch64 Neon implementation of Intra16Preds
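
A scalar sketch of why the saturation must be signed. It assumes a
predictor of the form top + left - top_left (an assumption; the message
above only says the sum may be negative), and all function names are
hypothetical:

```c
#include <stdint.h>

// Assumed predictor sum; it can go below zero when top_left is large.
static int16_t PredictorSum(uint8_t top, uint8_t left, uint8_t top_left) {
  return (int16_t)(top + left - top_left);
}

// Signed saturation to [0, 255]: negative sums clamp to 0.
static uint8_t SaturateSigned(int16_t v) {
  return (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v;
}

// Models unsigned saturation on the same bits: a negative sum reinterpreted
// as uint16_t is a large value, so it wrongly clamps to 255.
static uint8_t SaturateUnsigned(int16_t v) {
  const uint16_t u = (uint16_t)v;
  return (u > 255) ? 255 : (uint8_t)u;
}
```

With top = 10, left = 20 and top_left = 200 the sum is -170; the signed
version yields 0 while the unsigned model yields 255.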
Change-Id: I017e939d7155cc3489ceb76fc8ad50ac9917f23d
This needs to be done with signed saturation as the sum may be negative.
fixes mismatch with C code after:
baa93808 Add AArch64 Neon implementation of Intra4Preds
Change-Id: I190c3d7f78cfd2c7ae83fb7059de41e307abda36
* changes:
Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm
Add AArch64 Neon implementation of Intra16Preds
Add AArch64 Neon implementation of Intra4Preds
Add a Neon implementation of Intra16Preds for use on 64-bit Arm
platforms. (This implementation cannot be used on 32-bit Arm
platforms as it makes use of a number of AArch64-only Neon
instructions.)
Change-Id: I24c67cd54b66307e3924fd332c2795fd7422f082
Add Neon implementation of Intra4Preds for use on 64-bit Arm
platforms. (The same implementation cannot be used for 32-bit Arm
platforms as it uses a number of AArch64-only Neon instructions.)
Change-Id: Id781e7614f4e8e876dfeecd95cfc85e04611d8c6
The lossless encoding speed-ups are:
- up to 1% with default parameters
- up to 4% in cruncher mode: -q 100 -m 6
Change-Id: Id92d4bad0b0a2c28c8aa9ff5280eea5717017f30
Replace the 2d histograms with uint32_t 1d versions (to avoid
pointer casting and to use the optimized VP8LAddVectorEq).
Change-Id: I90b0fe98390b49e3fd03e3484289571cf7ae6eca
fixes warnings of the form:
/src/dsp/upsampling_sse41.c:170:1: runtime error: implicit conversion
from type 'int' of value -16 (32-bit, signed) to type 'uintptr_t' (aka
'unsigned long') changed the value to 18446744073709551600 (64-bit,
unsigned)
this is the same change as was done previously in upsampling_sse2.c:
2ee786c7 upsampling_sse2.c: clear int sanitizer warnings
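
A minimal sketch of the kind of expression that triggers this warning and
one way to avoid it. It assumes the -16 comes from masking a pointer with
the int constant ~15 (the diff is not shown here); AlignUp16 is a
hypothetical helper, not a libwebp function:

```c
#include <stdint.h>

// Rounds ptr up to the next 16-byte boundary. The mask is built directly in
// uintptr_t, so no negative int (-16) is implicitly converted to an unsigned
// type, which is what the integer sanitizer reports for `& ~15`.
static uint8_t* AlignUp16(uint8_t* ptr) {
  const uintptr_t mask = ~(uintptr_t)15;
  return (uint8_t*)(((uintptr_t)ptr + 15) & mask);
}
```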
Change-Id: I36064d837ad1a7a118918c16a5551fc732dec2ff
This is available with clang. Clears warnings of the form:
warning: empty expression statement has no effect; remove unnecessary
';' to silence this warning [-Wextra-semi-stmt]
As a side-effect it also clears a few -Wpedantic warnings with gcc:
warning: ISO C does not allow extra ';' outside of a function
[-Wpedantic]
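
For illustration, one common source of the clang warning is a statement
macro whose body is a bare block, so that the ';' at the call site becomes
an empty statement. The sketch below uses a hypothetical macro (not taken
from the tree) wrapped in do/while(0) so the ';' is consumed:

```c
// Hypothetical macro. With a plain `{ ... }` body, `SWAP_INT(a, b);` would
// leave a stray `;` after the block.
#define SWAP_INT(a, b)    \
  do {                    \
    const int tmp_ = (a); \
    (a) = (b);            \
    (b) = tmp_;           \
  } while (0)

static int SwapAndDiff(int x, int y) {
  SWAP_INT(x, y);  // expands to exactly one statement
  return x - y;
}
```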
Change-Id: I9295c767aad475c68b1fbbdff855b0d6650a25f5
Initially added to work around gcc implementation issues that clang
does not have. (gcc hardcodes rbx as the PIC register, clang uses a
virtual register)
Change-Id: I1a3277abf02b1ff437b4aea4d28f4cb1c0176b80
According to https://gcc.gnu.org/onlinedocs/gcc/extensions-to-the-c-language-family/how-to-use-inline-assembly-language-in-c-code.html
For the C language, the asm keyword is a GNU extension. When
writing C code that can be compiled with -ansi and the -std options
that select C dialects without GNU extensions, use __asm__ instead
of asm (see Alternate Keywords). For the C++ language, asm is a
standard keyword, but __asm__ can be used for code compiled with
-fno-asm.
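
A small sketch of the alternate keyword in use, assuming a GNU-compatible
compiler (gcc or clang); PassThroughAsm is a hypothetical name. The empty
asm statement only ties v to a register, so the value is returned
unchanged:

```c
// Uses __asm__ rather than asm so the file still compiles with -ansi or
// -std=c99, where the plain asm keyword is not available in C.
static int PassThroughAsm(int v) {
  __asm__ volatile("" : "+r"(v));
  return v;
}
```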
Change-Id: I4af950e67c857c890290c1e3d9cc886da0748784
The histograms count the occurrences of len/dist in entropy images.
Those (at most (1<<14) by (1<<14)) are sub-sampled by at least
MIN_HUFFMAN_BITS == 2, hence at most 24 bits in a histogram value.
At most, we multiply by 19 (because the longest histogram is of
size 40 and we do 40>>1, cf code) for the bit cost. So it all fits
in 32 bits.
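
The bound above can be checked with a few lines of arithmetic. This is a
sketch only; MAX_DIM_BITS and the function names are hypothetical, while
the values 14, 2 and 19 come from the text above:

```c
#include <stdint.h>

#define MAX_DIM_BITS 14     // entropy image is at most (1 << 14) per side
#define MIN_HUFFMAN_BITS 2  // minimum sub-sampling, per the text above

// Largest possible count in one histogram entry: every sub-sampled pixel
// lands in the same bin.
static uint64_t MaxHistogramCount(void) {
  const uint64_t side = (uint64_t)1 << (MAX_DIM_BITS - MIN_HUFFMAN_BITS);
  return side * side;
}

// Largest product formed for the bit cost (multiplier of at most 19).
static uint64_t MaxBitCostTerm(void) {
  return MaxHistogramCount() * 19;
}
```

Computed in 64 bits, the largest term stays below 1 << 29, well inside the
range of uint32_t.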
Change-Id: Ife24b035f54794851ff31f2fac07901f724c6d7f
and define it to true for __aarch64__ and Win Arm64 + Visual Studio.
Microsoft's compiler (cl.exe) does not define __aarch64__, but relies on
_M_ARM64 & _M_ARM64EC
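
A sketch of such a check. EXAMPLE_AARCH64 and IsAArch64Build are
hypothetical names used only for illustration; the three predefined macros
are the ones named above:

```c
// 1 when targeting 64-bit Arm with gcc/clang (__aarch64__) or with
// Microsoft's compiler (_M_ARM64 / _M_ARM64EC), 0 otherwise.
#if defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
#define EXAMPLE_AARCH64 1
#else
#define EXAMPLE_AARCH64 0
#endif

static int IsAArch64Build(void) { return EXAMPLE_AARCH64; }
```

Defining the macro to 0 or 1 in every build lets later code use
#if EXAMPLE_AARCH64 without a defined() test.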
Bug: b/277254922
Change-Id: I20e4fa07a4031599db69e3d7ba9050345315ef51
This avoids defining a version in each translation unit when using
__declspec(dllexport) which causes failures due to multiply defined
symbols with clang-cl:
lld-link: error: duplicate symbol: VP8GetCPUInfo
>>> defined at CMakeFiles\webpdecode.dir\Debug\src\dec\alpha_dec.c.obj
>>> defined at CMakeFiles\webpdsp.dir\Debug\src\dsp\dec_sse41.c.obj
...
Bug: webp:607
Change-Id: I6cd1ee75b3db984aa513263a05516e867a64925d
this is always defined by default to 0 since:
v0.6.0-158-g663a6d9d unify the ALTERNATE_CODE flag usage
previously the !defined() check would cause a mismatch between C and
assembly.
Change-Id: Idca0b8e39ca90d63785fd4125aeb7af86c5aae61
in TransformColorBlue; make new_blue an int to avoid:
implicit conversion from type 'int' of value 264 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 8 (8-bit,
unsigned)
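
A reduced sketch of the pattern, not the actual function: the helper names
and the single-delta form are assumptions made for illustration. The
intermediate value is kept in an int and masked explicitly before the
narrowing, so no out-of-range value is implicitly converted to uint8_t:

```c
#include <stdint.h>

// Assumed form of the color transform delta.
static int Delta(int8_t multiplier, int8_t color) {
  return ((int)multiplier * color) >> 5;
}

static uint8_t BlueResidual(int8_t green_to_blue, int8_t green,
                            uint8_t blue) {
  int new_blue = blue;  // int, so 264 is representable
  new_blue -= Delta(green_to_blue, green);
  return (uint8_t)(new_blue & 0xff);  // explicit wrap to 8 bits
}
```

With blue = 250, green_to_blue = -32 and green = 14 the delta is -14, the
intermediate value is 264 and the masked result is 8, matching the values
in the sanitizer report.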
Bug: b/229626362
Change-Id: Ife276a59231075788396204e1a192f3b0c6d9e21