libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-04-19 23:36:45 +02:00

Author	SHA1	Message	Date
James Zern	b1cb37e659	dsp/enc*: use WEBP_RESTRICT qualifier This allows for better vectorization of the C code, inlining of TrueMotion_SSE2, better load usage in aarch64 and other minor reordering with ndk r27/gcc-13/clang-16. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0	2024-10-02 14:55:14 -07:00
James Zern	d742b24a88	Intra16Preds_NEON: fix truemotion saturation This needs to be done with signed saturation as the sum may be negative. fixes mismatch with C code after: 3bfb05e3 Add AArch64 Neon implementation of Intra16Preds Change-Id: I017e939d7155cc3489ceb76fc8ad50ac9917f23d	2024-07-11 13:37:06 -07:00
James Zern	c7bb4cb585	Intra4Preds_NEON: fix truemotion saturation This needs to be done with signed saturation as the sum may be negative. fixes mismatch with C code after: baa93808 Add AArch64 Neon implementation of Intra4Preds Change-Id: I190c3d7f78cfd2c7ae83fb7059de41e307abda36	2024-07-11 13:37:06 -07:00
Istvan Stefan	314a142a34	Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm Use the Neon implementation instead of falling back to QuantizeBlock_C. Change-Id: Iff6e47eda353cbaa9766f75040fa63aa34607816	2024-07-10 14:48:38 +01:00
Istvan Stefan	3bfb05e38c	Add AArch64 Neon implementation of Intra16Preds Add a Neon implementation of Intra16Preds for use on 64-bit Arm platforms. (This implementation cannot be used on 32-bit Arm platforms as it makes use of a number of AArch64-only Neon instructions.) Change-Id: I24c67cd54b66307e3924fd332c2795fd7422f082	2024-07-10 14:48:38 +01:00
Istvan Stefan	baa93808d9	Add AArch64 Neon implementation of Intra4Preds Add Neon implementation of Intra4Preds for use on 64-bit Arm platforms. (The same implementation cannot be used for 32-bit Arm platforms as it uses a number of AArch64-only Neon instructions.) Change-Id: Id781e7614f4e8e876dfeecd95cfc85e04611d8c6	2024-07-10 14:48:26 +01:00
Vincent Rabaud	501d9274a7	Copy C code to not have multiplication overflow Change-Id: I9375170ce1217921a334c5b93dc3e0084f976688	2024-03-07 09:22:20 +01:00
James Zern	0c496a4ff9	cpu.h: add WEBP_AARCH64 and define it to true for __aarch64__ and Win Arm64 + Visual Studio. Microsoft's compiler (cl.exe) does not define __aarch64__, but relies on _M_ARM64 & _M_ARM64EC Bug: b/277254922 Change-Id: I20e4fa07a4031599db69e3d7ba9050345315ef51	2023-05-02 12:28:50 -07:00
James Zern	e68765af42	dsp,neon: use vaddv in a few more places SumToInt_NEON horizontal_add_uint32x4 Change-Id: I881831a7b2bab35a1810b0d83fee761470f3e09f	2022-09-12 10:55:58 -07:00
James Zern	b6f756e82b	update http links - prefer https - metadataworkinggroup.org/com seem to be offline; the web archive link was obtained from exiftool: https://exiftool.org/TagNames/MWG.html - fix kramdown link, rubyforge has been gone a long time - fix png/zlib links Bug: webp:544 Bug: b/202302177 Change-Id: Id69de4553e7baf00393f12a2c1acb262443a1a93	2021-11-23 10:13:40 -08:00
James Zern	8d033b14d7	{dec,enc}_neon: harmonize function suffixes x2 + neon.h BUG=webp:355 Change-Id: Ia17c7dfc7d61742a4758823675a2d556a739c389	2017-10-20 19:00:53 -07:00
James Zern	785da7eadd	enc_neon: harmonize function suffixes BUG=webp:355 Change-Id: Ie59efd271d16f12d21f3c800667dfc0980dc2e68	2017-10-20 00:18:32 -07:00
James Zern	a439972175	WIP: list includes as descendants of the project dir #include "(.|..)/..." -> #include "src/..." Change-Id: I772880aa097a770722043c8a4393552ba38a89b6	2017-10-10 23:04:05 -07:00
skal	b09307dcde	Encoder: harmonize function suffixes BUG=webp:355 Change-Id: Ia2fe95db7dfb303f3f64e390d43bc41b8933256c	2017-08-09 02:41:01 +00:00
James Zern	668e1dd44f	src/{dec,enc,utils}: give filenames a unique suffix this avoids duplicates between these trees and dsp/, e.g., enc/tree.c, dec/tree.c, making pulling the whole library source tree into one target possible BUG=webp:279 Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775	2017-01-19 19:09:48 -08:00
Pascal Massimino	9b3aca404d	NEON: fix overflow in SSE NxN calculation vmlal_u8() is prone to overflow during the accumulation. There was a mismatch happening at low q mostly, because in this case the distortion is important and the accumulated sum was larger than 16bit-unsigned. Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8	2017-01-17 11:47:36 +01:00
James Zern	875aec7044	enc_neon,cosmetics: break long comment Change-Id: I88dff0271fef1cc6dd5888572bfe0f09f467b028	2016-03-08 23:33:21 -08:00
James Zern	0d40cc5ea3	enc_neon,Disto4x4: remove an unnecessary transpose based on the sse2 change in: 9960c31 Remove an unnecessary transposition in TTransform. ~9-10.5% faster at the function-level, < 1% overall Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7	2016-03-03 16:18:59 -08:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	182497993b	dsp: s/VP8LSetHistogramData/VP8SetHistogramData/ this function is for lossy encoding; the VP8L prefix is used by lossless Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb	2015-03-24 18:27:41 -07:00
James Zern	fbdcef2401	dsp/enc*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: I0cf40b500f9b3eed55a3211213db180c7c0dd43b	2015-03-20 19:19:46 -07:00
James Zern	602a00f93f	fix iOS arm64 build with Xcode 6.3 the standard vtbl functions are available there [1][2]. based on a patch from: aaroncrespo fixes issue #243. [1] http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf [2] Apple LLVM Compiler Version 6.1 - Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0. [...] Support for the arm64 architecture has been significantly revised to align with ARM's implementation, where the most visible impact is that a few of the vector intrinsics have changed to match ARM's specifications. Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447	2015-02-19 12:16:58 -08:00
James Zern	b969f5dfac	dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage the attribute is only necessary in one location; remove it from the prototypes. Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2	2015-02-13 15:23:40 -08:00
James Zern	f8740f0d6c	dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/ for consistency with other defines shared across modules Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee	2015-01-12 14:27:36 -08:00
James Zern	a3946b8956	enc_neon: fix building with non-Xcode clang (iOS) check for __apple_build_version__ to distinguish the two; a version check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear how upstream will deal with their versioning as they go 3.6+, so avoid it for now. Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063	2014-12-10 15:50:26 -08:00
Pascal Massimino	bad775715a	simplify the Histogram struct, to only store max_value and last_nz we don't need to store the whole distribution in order to compute the alpha Later, we can incorporate the max_value / last_non_zero bookkeeping in SSE2 directly. Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15	2014-12-10 10:44:57 +01:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
James Zern	7534d71640	enc_neon: initialize vectors w/vdup_n_u32 replaces {} initialization gnu-ism Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec	2014-10-09 12:35:41 +02:00
Yang Zhang	ab70794ddb	rewrite Disto4x4 in enc_neon.c with intrinsic Performance test: Platform: A9 Input data: bryce.yuv 11158x2156 Performance of assembly is the base; a lower ratio is better. toolchain / assembly / intrinsic: gcc4.6 / 100% / 97.15%; gcc4.8 / 100% / 95.51% Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879	2014-09-23 18:28:36 -07:00
skal	73d361dd5f	introduce VP8EncQuantize2Blocks to quantize two blocks at a time No speed diff for now. We might reorder better the instructions later, to speed things up. Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c	2014-08-25 20:21:42 -07:00
James Zern	953acd56a4	enc_neon: enable QuantizeBlock for aarch64 vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there with a corresponding change in the load. Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac	2014-08-20 11:48:25 -07:00
James Zern	e300c9d819	cosmetics fix some indent/whitespace, remove a few duplicate includes, extra semi-colons Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352	2014-08-06 12:10:59 -07:00
James Zern	e59f53600f	neon: normalize vdup_n_* usage with constants, prefer this over vmov_n_* or vcreate_* Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a	2014-07-01 00:55:05 -07:00
James Zern	bc03670f01	neon: add INIT_VECTOR4 used to initialize NxMx4 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e	2014-07-01 00:18:23 -07:00
James Zern	dc7687e51b	neon: add INIT_VECTOR2 used to initialize NxMx2 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139	2014-06-30 23:52:42 -07:00
James Zern	9251c2f6d2	(enc|dec)_neon: use vcreate_*() where appropriate this is more portable than {} initialization. more involved cases are left for a follow-up. Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17	2014-05-27 16:26:56 -07:00
James Zern	1ba61b09f9	enable NEON intrinsics in aarch64 builds avoids functions that use vtbl? as in iOS builds these are marked unavailable Change-Id: I17aedc3c7dc8f1d5be0941205de0b22c3772ef1b	2014-05-03 12:37:42 -07:00
James Zern	b9d2bb67d6	dsp/neon.h: coalesce intrinsics-related defines Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946	2014-05-03 11:34:07 -07:00
pascal massimino	3f3d717a6c	Merge "enc_neon: enable intrinsics-only functions"	2014-04-27 02:05:53 -07:00
James Zern	42b35e086b	enc_neon: enable intrinsics-only functions CollectHistogram / SSE* / QuantizeBlock have no inline equivalents, enable them where possible and use USE_INTRINSICS to control borderline cases: it's left undefined for now. Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155	2014-04-26 19:09:04 -07:00
James Zern	5e1a17ef4b	enc_neon: move Transpose4x4 to dsp/neon.h + reuse it in TransformWHT() Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989	2014-04-26 14:06:04 -07:00
James Zern	98519dd5c1	enc_neon: convert Disto4x4 to intrinsics Change-Id: I0f00d5af2de2301e8237c2a38a9612d3645abad6	2014-04-17 18:29:31 -07:00
Pascal Massimino	fe9317c9bf	cosmetics: * remove MIPS32 suffix from static function names * fix a long line in enc_neon.c Change-Id: Ia1294ae46f471b3eb1e9ba43c6aa1b29a7aeb447	2014-04-16 00:36:19 -07:00
James Zern	953b074677	enc_neon: cosmetics fix/remove incorrect comments + whitespace Change-Id: Id1b86beb23e5bf946e73c34ab7066b6ca177f33b	2014-04-15 23:57:03 -07:00
skal	3f84b5219d	Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"	2014-04-15 07:09:12 -07:00
skal	95203d2d1b	NEON intrinsics version of CollectHistogram apparently faster, but we might save some load/store to/from memory once we settle for the intrinsics-based FTransform() (also: fixed some #ifdef USE_INTRINSICS problems) Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7	2014-04-14 16:47:20 +02:00
skal	7ca2e74bb4	replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8) saves few instructions Change-Id: If8f464bb2894a209bba94825a4db9267df126d47	2014-04-14 15:14:45 +02:00
skal	8ff96a027a	NEON intrinsics version of FTransform a little bit slower than inlined asm, it seems. So disabled for now. Change-Id: I8c942846f9bedaed57275675ea9dbbcb8dfd9ccd	2014-04-14 09:58:35 +02:00
skal	869eaf6c60	~30% encoding speedup: use NEON for QuantizeBlock() also revamped the signature to avoid having to pass the 'first' parameter Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5	2014-04-08 03:08:22 -07:00


64 Commits
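
Commit 9b3aca404d above attributes an encoder mismatch to a sum of squared errors that no longer fit in 16 unsigned bits. The following is a minimal scalar sketch of that effect, not the libwebp code: the function names are hypothetical, and the 16-bit accumulator only stands in for a single uint16 lane of the kind a widening multiply-accumulate such as vmlal_u8() writes to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libwebp code): sum of squared differences kept in
 * a 16-bit accumulator. The sum wraps modulo 65536 once it exceeds 65535. */
static uint16_t SumSquaredError16(const uint8_t* a, const uint8_t* b,
                                  size_t n) {
  uint16_t acc = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    const int d = (int)a[i] - (int)b[i];
    acc = (uint16_t)(acc + (uint32_t)(d * d));  /* truncates to 16 bits */
  }
  return acc;
}

/* Same computation with a 32-bit accumulator, which has enough headroom
 * for the small block sizes used in this example. */
static uint32_t SumSquaredError32(const uint8_t* a, const uint8_t* b,
                                  size_t n) {
  uint32_t acc = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    const int d = (int)a[i] - (int)b[i];
    acc += (uint32_t)(d * d);
  }
  return acc;
}

/* Two maximal differences (255 vs 0) already overflow 16 bits:
 * 2 * 255 * 255 = 130050, which wraps to 130050 - 65536 = 64514. */
static void CheckOverflowExample(void) {
  const uint8_t a[2] = {255, 255};
  const uint8_t b[2] = {0, 0};
  assert(SumSquaredError32(a, b, 2) == 130050u);
  assert(SumSquaredError16(a, b, 2) == 64514u);
}
```

Calling CheckOverflowExample() from a test harness shows the two accumulators disagreeing after only two large differences, which is consistent with the commit's remark that the mismatch showed up mostly where the distortion is large.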