This allows for better vectorization of the C code, inlining of
TrueMotion_SSE2, better load usage in aarch64 and other minor
reordering with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0
This needs to be done with signed saturation as the sum may be negative.
fixes mismatch with C code after:
3bfb05e3 Add AArch64 Neon implementation of Intra16Preds
Change-Id: I017e939d7155cc3489ceb76fc8ad50ac9917f23d
This needs to be done with signed saturation as the sum may be negative.
fixes mismatch with C code after:
baa93808 Add AArch64 Neon implementation of Intra4Preds
Change-Id: I190c3d7f78cfd2c7ae83fb7059de41e307abda36
Add a Neon implementation of Intra16Preds for use on 64-bit Arm
platforms. (This implementation cannot be used on 32-bit Arm
platforms as it makes use of a number of AArch64-only Neon
instructions.)
Change-Id: I24c67cd54b66307e3924fd332c2795fd7422f082
Add Neon implementation of Intra4Preds for use on 64-bit Arm
platforms. (The same implementation cannot be used for 32-bit Arm
platforms as it uses a number of AArch64-only Neon instructions.)
Change-Id: Id781e7614f4e8e876dfeecd95cfc85e04611d8c6
and define it to true for __aarch64__ and Win Arm64 + Visual Studio.
Microsoft's compiler (cl.exe) does not define __aarch64__, but relies on
_M_ARM64 & _M_ARM64EC
Bug: b/277254922
Change-Id: I20e4fa07a4031599db69e3d7ba9050345315ef51
- prefer https
- metadataworkinggroup.org/com seem to be offline; the web archive link
was obtained from exiftool: https://exiftool.org/TagNames/MWG.html
- fix kramdown link, rubyforge has been gone a long time
- fix png/zlib links
Bug: webp:544
Bug: b/202302177
Change-Id: Id69de4553e7baf00393f12a2c1acb262443a1a93
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
based on the sse2 change in:
9960c31 Remove an unnecessary transposition in TTransform.
~9-10.5% faster at the function-level, < 1% overall
Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.
[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.
Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.
Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
we don't need to store the whole distribution in order to compute the alpha
Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.
Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
CollectHistogram / SSE* / QuantizeBlock have no inline equivalents,
enable them where possible and use USE_INTRINSICS to control borderline
cases: it's left undefined for now.
Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()
(also: fixed some #ifdef USE_INTRINSICS problems)
Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7