libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-29 15:22:12 +02:00

Author	SHA1	Message	Date
James Zern	2246828be3	dsp/lossless{,_enc}_sse2.c: reorder _SSE assignments When `WEBP_USE_THREAD` is not defined the assignments of _SSE and their unsuffixed counterparts may race. Assigning _SSE directly rather than relying on the unsuffixed values avoids a case where the _SSE variants may refer to the calling function (i.e., AVX2) resulting in infinite recursion. Defining `WEBP_USE_THREAD` is recommended when decode/encode calls can be made from different threads. Bug: 435213378 Change-Id: Id5549730cb72be99b3014ed8e4e355f3ea988659	2025-08-07 13:50:16 -07:00
clang-format	44257cb826	apply clang-format (Debian clang-format version 19.1.7 (3+build4)) with `--style=Google`. Manual changes: * clang-format disabled around macros with stringification (mostly assembly) * some inline assembly strings were adjusted to avoid awkward line breaks * trailing commas, `//` or suffixes (`ull`) added to help array formatting * thread_utils.c: parameter comments were changed to the more common /*...=*/ style to improve formatting The automatically generated code under swig/ was skipped. Bug: 433996651 Change-Id: Iea3f24160d78d2a2653971cdf13fa932e47ff1b3	2025-07-31 14:53:58 -07:00
Vincent Rabaud	57e324e2eb	Refactor VP8LHistogram histogram_enc.cc - move HistogramAdd to histogram_enc.cc: it is too high level - homogenize the argument naming (e.g. h for histogram, p for population) - separate a bit the data from the stats (only used within VP8LGetHistoImageSymbols) Change-Id: I274546e3ff96297383bcae0a95696c11f18decbf	2025-04-23 19:12:21 +02:00
James Zern	ad52d5fc7e	dec/dsp/enc/utils,cosmetics: rm struct member '_' suffix This is a follow up to: `ee8e8c62` Fix member naming for VP8LHistogram This better matches Google style and clears some clang-tidy warnings. This is the final change in this set. It is rather large due to the shared dependencies between dec/enc. Change-Id: I89de06b5653ae0bb627f904fa6060334831f7e3b	2025-04-16 13:23:42 -07:00
Vincent Rabaud	f2b3f52733	Get AVX2 into WebP lossless Change-Id: Ifad3102c9f899a46401985515cd98f3f7a21887f	2025-03-28 11:44:03 +01:00
Vincent Rabaud	7c70ff7a3b	Clean dsp/lossless includes Change-Id: I47a405a9c402095b440404fe57ac08b5293ea71b	2025-03-25 12:38:00 +01:00
James Zern	a32b436bd5	dsp/lossless*: use WEBP_RESTRICT qualifier lossless_enc: better vectorization, most benefits seen in AddVector/Eq w/ndk r27/gcc-13/clang-16 lossless: minor reordering and some improvement to PredictorAdd5_SSE2 w/gcc-13 This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7	2024-10-02 14:55:14 -07:00
James Zern	aff1c546ef	dsp,x86: normalize types w/_mm_cvtsi128_si32 calls fixes integer sanitizer warnings of the form: implicit conversion from type 'int' of value -2122283647 (32-bit, signed) to type 'uint32_t' (aka 'unsigned int') changed the value to 2172683649 (32-bit, unsigned) implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 3724541952 (32-bit, unsigned) to type 'int' changed the value to -570425344 (32-bit, signed) Bug: b/229626362 Change-Id: I79f68e3e2fcab7cc0402477d2e88d629348c9ff4	2022-08-04 11:26:23 -07:00
James Zern	ab540ae0c5	dsp,x86: normalize types w/_mm_cvtsi32_si128 calls fixes integer sanitizer warnings of the form: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 3724541952 (32-bit, unsigned) to type 'int' changed the value to -570425344 (32-bit, signed) Bug: b/229626362 Change-Id: Ie4d599aba88226e4e047250464ac37ca11d2cd3b	2022-08-04 11:26:23 -07:00
James Zern	8980362eed	dsp,x86: normalize types w/_mm_set* calls (2) missed in: `83539239` (origin/main, main) dsp,x86: normalize types w/_mm_set* calls fixes integer sanitizer warnings of the form: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 4292337446 (32-bit, unsigned) to type 'int' changed the value to -2629850 (32-bit, signed) runtime error: implicit conversion from type 'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type 'char' changed the value to -128 (8-bit, signed) Bug: b/229626362 Change-Id: Ie904da8ded26725b4e0a9b82cc0679234f0a5388	2022-08-04 11:26:23 -07:00
James Zern	835392393b	dsp,x86: normalize types w/_mm_set* calls fixes integer sanitizer warnings of the form: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed) runtime error: implicit conversion from type 'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type 'char' changed the value to -128 (8-bit, signed) Bug: b/229626362 Change-Id: I6be3c40407cf7a27b79d31ee32d3829ecb78ed66	2022-08-03 16:50:46 -07:00
Pascal Massimino	8ea81561d2	change VP8LPredictorFunc signature to avoid reading 'left' ... when it's not available. Even if the value was discarded and never used, some msan config were complaining about reading it and passing it around. Change-Id: Iab8d24676c5bb58e607a829121e36c2862da397c	2021-11-05 16:22:31 +01:00
James Zern	33ddb894b1	lossless_sse{2,41}: remove some unneeded includes Change-Id: Icd2cffd32b39c6bf017eee353ac04a4b6d337a11	2021-02-18 10:54:09 -08:00
James Zern	c6b75a1966	lossless_(enc_|)sse2: avoid offsetting a NULL pointer PredictorSub0_SSE2 doesn't use 'upper' (neither does VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing pixels to avoid undefined behavior when offsetting a NULL pointer BUG=chromium:1026858,oss-fuzz:19430 Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3	2019-12-13 18:33:10 +00:00
James Zern	8043504f95	lossless*sse2: improve non-const 16-bit vector creation use _mm_set1_epi32 instead of _mm_set_epi16 with non-const values; reduces shifts and ors. Change-Id: Ie2cb2ab815f642855d03c6f3001223bcac4bd35c	2018-02-17 17:59:20 -08:00
Vincent Rabaud	807b53c47e	Implement the upsampling/yuv functions in SSE41 Change-Id: If122da22b74a974262063d232f6ca0ab902ff64e	2017-12-04 22:29:43 +01:00
Pascal Massimino	0a17f4712c	Merge "WIP: list includes as descendants of the project dir"	2017-10-11 08:21:42 +00:00
James Zern	a439972175	WIP: list includes as descendants of the project dir #include "(.|..)/..." -> #include "src/..." Change-Id: I772880aa097a770722043c8a4393552ba38a89b6	2017-10-10 23:04:05 -07:00
James Zern	2c1b18ba2f	lossless_sse2: harmonize function suffixes BUG=webp:355 Change-Id: I59d828800c2ab2a36e0ea90f629b74bd57207411	2017-10-08 14:06:14 -07:00
skal	54f6a3cf3a	lossless_sse2.c: fix some missed suffix changes BUG=webp:355 Change-Id: If830e3169a4021899ed850aa7edfd94b81fa2cf9	2017-08-08 14:19:05 -07:00
skal	622242aaba	Lossess dec: harmonize the function suffixes BUG=webp:355 Change-Id: I445d64df6aa2e347f41e7af306be12a77e2ac6a5	2017-08-07 18:22:41 -07:00
James Zern	7beed2807b	add missing ()s to macro parameters BUG=webp:355 Change-Id: I616c6d3540d6551edd1b1cfdb5bffcf0a044c90f	2017-08-04 17:02:53 -07:00
skal	663a6d9d2e	unify the ALTERNATE_CODE flag usage Pattern is now: #if !defined(FLAG) #define FLAG 0 // ALTERNATE_CODE #endif ... #if (FLAG == 1) ... #else ... #endif // FLAG ... Removed some unused code / flags: WEBP_YUV_USE_TABLE, WEBP_REFERENCE_IMPLEMENTATION, experimental code, VP8YUVInit(), ... BUG=webp:355 Change-Id: I98deb9189446a4cfd665c13ea8aa1ce6a308c63f	2017-08-01 20:49:29 -07:00
Vincent Rabaud	7ca0df1363	Have the SSE2 version of PackARGB use common code. The common code actually got sped-up by 25% by using the code from PackARGB. Change-Id: I94be6ccff2bfe02fff13c8e2698669e6a0d8fc74	2017-06-20 17:41:14 +02:00
Vincent Rabaud	8f6df1d0b9	Unroll Predictors 10, 11 and 12. We see the following speed-ups: 10 -> 13% 11 -> 13% 12 -> 13% Change-Id: I4734fd388d0f4e508884d0b123976bf2cbe69d2f	2017-06-08 20:37:47 +02:00
Vincent Rabaud	1cb638010c	Call the C function to finish off lossless SSE loops only when necessary. Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad	2016-12-21 14:25:54 +01:00
Pascal Massimino	9ae0b3f65a	Merge "SSE2: slightly (~2%) faster Predictor #1 "	2016-12-12 14:46:21 +00:00
Pascal Massimino	c1f97bd758	SSE2: slightly (~2%) faster Predictor #1 by removing a load from memory Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0	2016-12-12 02:24:38 -08:00
Pascal Massimino	ea664b8995	SSE2: 10% faster Predictor #11 Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185	2016-12-12 02:20:41 -08:00
Vincent Rabaud	54ab2e758f	Revert Average3 and Average4 Average3 created a slowdown of 1-2% in lossless decoding. Average4 created a slowdown of 2-3% in lossless decoding. Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5	2016-12-07 15:32:33 +01:00
Pascal Massimino	d4b7d801db	lossless_sse2: use the local functions ...instead of the pointers stored in the array. Should be faster (inlined) and safer. Also: suffix explicitly the functions with _SSE2 Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90	2016-12-06 14:20:41 +01:00
Vincent Rabaud	a5e3b22574	Lossless decoder SSE2 improvements. Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706	2016-12-06 13:45:09 +01:00
Vincent Rabaud	2e6cb6f34e	Give more flexibility to the predictor generating macro. Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900	2016-12-02 12:33:12 -08:00
Vincent Rabaud	67879e6d48	SSE implementation of decoding predictors. Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c	2016-11-30 12:00:07 +01:00
Vincent Rabaud	4239a1489c	Make the lossless predictors work on a batch of pixels. Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1	2016-11-28 17:12:10 +01:00
Vincent Rabaud	71e2f5cadf	Remove memcpy in lossless decoding. Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c	2016-11-24 17:45:24 +01:00
Vincent Rabaud	7474d46e45	Do not use a register array in SSE. Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141	2016-11-24 13:06:44 +01:00
Vincent Rabaud	6540cd0eeb	Provide an SSE implementation of ConvertBGRAToRGB Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf	2016-11-23 16:25:51 +01:00
James Zern	4741fac42e	dsp/lossless_*sse2: remove some unnecessary inlines TransformColor / TransformColorInverse are the top-level function pointer calls Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b	2015-06-25 21:02:01 -07:00
Pascal Massimino	9e356d6b25	SSE2: slightly faster (~5%) AddGreenToBlueAndRed() Change-Id: Ie147010b66544c4e959f26966ad588394302d418	2015-06-24 09:36:44 +02:00
Pascal Massimino	fc6c75a2a2	SSE2: 53% faster TransformColor[Inverse] Changed the code (again) to process 4 pixels at a time. Loop is more involved, but overall it's faster. Removed the SSE4.1 implementation which is now slower than SSE2. Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1	2015-06-23 14:52:01 -07:00
Pascal Massimino	49073da6d6	SSE2: 46% speed-up of TransformColor[Inverse] Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e	2015-06-23 20:09:04 +02:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	553051f741	dsp/lossless: split enc/dec functions adds lossless_enc*.c; reduces the size of the decode-only so: ~78K w/gcc-4.8.2 on x86_64. Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092	2015-03-23 22:57:50 -07:00
James Zern	1d93ddec19	dsp/lossless*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2	2015-03-20 19:19:46 -07:00
James Zern	b969f5dfac	dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage the attribute is only necessary in one location; remove it from the prototypes. Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2	2015-02-13 15:23:40 -08:00
James Zern	1829c42c58	cosmetics: lossless_sse2: add const to some casts source pointers are often cast to __m128*, retain the const in those cases Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9	2015-02-05 23:51:44 -08:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
Vikas Arora	6d6865f0db	Added SSE2 variants for Average2/3/4 The predictors based on Average2 are tad slower. Following is the performance data for these predictors normalized to number of instruction cycles (as per valgrind) per operation: - Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11 instruction cycles for the C version. - Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12 instruction cycles for the C version. The predictors based on Average4 is faster and Average3 is tad slower: - Predictor10 (Average4) now takes 23 instruction cycles compared to 25 instruction cycles for the C version. - Predictor5 (Average3) now takes 20 instruction cycles compared to 18 instruction cycles for the C version. Maybe SSE2 version of Average2 can be improved further. Otherwise, we can remove the SSE2 version and always fallback to the C version. Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029	2014-04-28 14:47:30 -07:00
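
The most recent entry in the log (2246828be3) describes an initialization-order hazard: if a `_SSE` fallback pointer is filled in by copying the unsuffixed pointer, and the unsuffixed pointer has meanwhile been replaced by the AVX2 function, the AVX2 function's fallback ends up pointing at itself. The following is a minimal sketch of that situation in plain C, not libwebp's actual code. All names (`AddFunc`, `Add`, `Add_SSE`, `Add_C`, `Add_SSE2`, `Add_AVX2` and the two `Simulate...` helpers) are hypothetical, the "SIMD" functions are scalar stand-ins, and the racing threads are replaced by a fixed single-threaded interleaving so the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this only illustrates the hazard. */
typedef uint32_t (*AddFunc)(uint32_t a, uint32_t b);

static uint32_t Add_C(uint32_t a, uint32_t b) { return a + b; }

/* Scalar stand-in for an SSE2 implementation. */
static uint32_t Add_SSE2(uint32_t a, uint32_t b) { return a + b; }

AddFunc Add = Add_C;    /* unsuffixed pointer that callers use */
AddFunc Add_SSE = NULL; /* fallback that the AVX2 path defers to */

/* Stand-in for an AVX2 implementation: it would process the bulk of the
 * data itself and hand any leftover work to the SSE fallback. */
static uint32_t Add_AVX2(uint32_t a, uint32_t b) {
  assert(Add_SSE != NULL);
  return Add_SSE(a, b);
}

/* Fragile ordering: the fallback is copied from the unsuffixed pointer.
 * The middle statement plays the role of another thread's AVX2 init
 * running between the two halves of the SSE2 init. */
int SimulateFragileInterleaving(void) {
  Add = Add_SSE2;             /* SSE2 init, first half */
  Add = Add_AVX2;             /* AVX2 init overrides the unsuffixed pointer */
  Add_SSE = Add;              /* SSE2 init, second half: copies Add_AVX2 */
  return Add_SSE == Add_AVX2; /* nonzero: Add_AVX2 would call itself */
}

/* Direct assignment: the fallback never depends on the current value of
 * the unsuffixed pointer, so the same interleaving is harmless. */
int SimulateDirectInterleaving(void) {
  Add = Add_SSE2;
  Add = Add_AVX2;
  Add_SSE = Add_SSE2;
  return Add_SSE == Add_AVX2; /* zero: fallback is the real SSE2 function */
}
```

In the fragile variant, calling `Add` after the interleaving would recurse without end, so the sketch only inspects the pointers. The direct variant leaves `Add` safe to call. The commit message additionally recommends defining `WEBP_USE_THREAD` when decode/encode calls can come from different threads; this sketch does not model that.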


56 Commits