libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2024-11-20 12:28:26 +01:00

Author	SHA1	Message	Date
James Zern	33ddb894b1	lossless_sse{2,41}: remove some unneeded includes Change-Id: Icd2cffd32b39c6bf017eee353ac04a4b6d337a11	2021-02-18 10:54:09 -08:00
James Zern	c6b75a1966	lossless_(enc_|)sse2: avoid offsetting a NULL pointer PredictorSub0_SSE2 doesn't use 'upper' (neither does VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing pixels to avoid undefined behavior when offsetting a NULL pointer BUG=chromium:1026858,oss-fuzz:19430 Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3	2019-12-13 18:33:10 +00:00
James Zern	8043504f95	lossless*sse2: improve non-const 16-bit vector creation use _mm_set1_epi32 instead of _mm_set_epi16 with non-const values; reduces shifts and ors. Change-Id: Ie2cb2ab815f642855d03c6f3001223bcac4bd35c	2018-02-17 17:59:20 -08:00
Vincent Rabaud	807b53c47e	Implement the upsampling/yuv functions in SSE41 Change-Id: If122da22b74a974262063d232f6ca0ab902ff64e	2017-12-04 22:29:43 +01:00
Pascal Massimino	0a17f4712c	Merge "WIP: list includes as descendants of the project dir"	2017-10-11 08:21:42 +00:00
James Zern	a439972175	WIP: list includes as descendants of the project dir #include "(.|..)/..." -> #include "src/..." Change-Id: I772880aa097a770722043c8a4393552ba38a89b6	2017-10-10 23:04:05 -07:00
James Zern	2c1b18ba2f	lossless_sse2: harmonize function suffixes BUG=webp:355 Change-Id: I59d828800c2ab2a36e0ea90f629b74bd57207411	2017-10-08 14:06:14 -07:00
skal	54f6a3cf3a	lossless_sse2.c: fix some missed suffix changes BUG=webp:355 Change-Id: If830e3169a4021899ed850aa7edfd94b81fa2cf9	2017-08-08 14:19:05 -07:00
skal	622242aaba	Lossess dec: harmonize the function suffixes BUG=webp:355 Change-Id: I445d64df6aa2e347f41e7af306be12a77e2ac6a5	2017-08-07 18:22:41 -07:00
James Zern	7beed2807b	add missing ()s to macro parameters BUG=webp:355 Change-Id: I616c6d3540d6551edd1b1cfdb5bffcf0a044c90f	2017-08-04 17:02:53 -07:00
skal	663a6d9d2e	unify the ALTERNATE_CODE flag usage Pattern is now: #if !defined(FLAG) #define FLAG 0 // ALTERNATE_CODE #endif ... #if (FLAG == 1) ... #else ... #endif // FLAG ... Removed some unused code / flags: WEBP_YUV_USE_TABLE, WEBP_REFERENCE_IMPLEMENTATION, experimental code, VP8YUVInit(), ... BUG=webp:355 Change-Id: I98deb9189446a4cfd665c13ea8aa1ce6a308c63f	2017-08-01 20:49:29 -07:00
Vincent Rabaud	7ca0df1363	Have the SSE2 version of PackARGB use common code. The common code actually got sped-up by 25% by using the code from PackARGB. Change-Id: I94be6ccff2bfe02fff13c8e2698669e6a0d8fc74	2017-06-20 17:41:14 +02:00
Vincent Rabaud	8f6df1d0b9	Unroll Predictors 10, 11 and 12. We see the following speed-ups: 10 -> 13% 11 -> 13% 12 -> 13% Change-Id: I4734fd388d0f4e508884d0b123976bf2cbe69d2f	2017-06-08 20:37:47 +02:00
Vincent Rabaud	1cb638010c	Call the C function to finish off lossless SSE loops only when necessary. Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad	2016-12-21 14:25:54 +01:00
Pascal Massimino	9ae0b3f65a	Merge "SSE2: slightly (~2%) faster Predictor #1 "	2016-12-12 14:46:21 +00:00
Pascal Massimino	c1f97bd758	SSE2: slightly (~2%) faster Predictor #1 by removing a load from memory Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0	2016-12-12 02:24:38 -08:00
Pascal Massimino	ea664b8995	SSE2: 10% faster Predictor #11 Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185	2016-12-12 02:20:41 -08:00
Vincent Rabaud	54ab2e758f	Revert Average3 and Average4 Average3 created a slowdown of 1-2% in lossless decoding. Average4 created a slowdown of 2-3% in lossless decoding. Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5	2016-12-07 15:32:33 +01:00
Pascal Massimino	d4b7d801db	lossless_sse2: use the local functions ...instead of the pointers stored in the array. Should be faster (inlined) and safer. Also: suffix explicitly the functions with _SSE2 Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90	2016-12-06 14:20:41 +01:00
Vincent Rabaud	a5e3b22574	Lossless decoder SSE2 improvements. Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706	2016-12-06 13:45:09 +01:00
Vincent Rabaud	2e6cb6f34e	Give more flexibility to the predictor generating macro. Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900	2016-12-02 12:33:12 -08:00
Vincent Rabaud	67879e6d48	SSE implementation of decoding predictors. Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c	2016-11-30 12:00:07 +01:00
Vincent Rabaud	4239a1489c	Make the lossless predictors work on a batch of pixels. Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1	2016-11-28 17:12:10 +01:00
Vincent Rabaud	71e2f5cadf	Remove memcpy in lossless decoding. Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c	2016-11-24 17:45:24 +01:00
Vincent Rabaud	7474d46e45	Do not use a register array in SSE. Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141	2016-11-24 13:06:44 +01:00
Vincent Rabaud	6540cd0eeb	Provide an SSE implementation of ConvertBGRAToRGB Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf	2016-11-23 16:25:51 +01:00
James Zern	4741fac42e	dsp/lossless_*sse2: remove some unnecessary inlines TransformColor / TransformColorInverse are the top-level function pointer calls Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b	2015-06-25 21:02:01 -07:00
Pascal Massimino	9e356d6b25	SSE2: slightly faster (~5%) AddGreenToBlueAndRed() Change-Id: Ie147010b66544c4e959f26966ad588394302d418	2015-06-24 09:36:44 +02:00
Pascal Massimino	fc6c75a2a2	SSE2: 53% faster TransformColor[Inverse] Changed the code (again) to process 4 pixels at a time. Loop is more involved, but overall it's faster. Removed the SSE4.1 implementation which is now slower than SSE2. Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1	2015-06-23 14:52:01 -07:00
Pascal Massimino	49073da6d6	SSE2: 46% speed-up of TransformColor[Inverse] Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e	2015-06-23 20:09:04 +02:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	553051f741	dsp/lossless: split enc/dec functions adds lossless_enc*.c; reduces the size of the decode-only so: ~78K w/gcc-4.8.2 on x86_64. Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092	2015-03-23 22:57:50 -07:00
James Zern	1d93ddec19	dsp/lossless*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2	2015-03-20 19:19:46 -07:00
James Zern	b969f5dfac	dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage the attribute is only necessary in one location; remove it from the prototypes. Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2	2015-02-13 15:23:40 -08:00
James Zern	1829c42c58	cosmetics: lossless_sse2: add const to some casts source pointers are often cast to __m128*, retain the const in those cases Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9	2015-02-05 23:51:44 -08:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
Vikas Arora	6d6865f0db	Added SSE2 variants for Average2/3/4 The predictors based on Average2 are tad slower. Following is the performance data for these predictors normalized to number of instruction cycles (as per valgrind) per operation: - Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11 instruction cycles for the C version. - Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12 instruction cycles for the C version. The predictors based on Average4 is faster and Average3 is tad slower: - Predictor10 (Average4) now takes 23 instruction cycles compared to 25 instruction cycles for the C version. - Predictor5 (Average3) now takes 20 instruction cycles compared to 18 instruction cycles for the C version. Maybe SSE2 version of Average2 can be improved further. Otherwise, we can remove the SSE2 version and always fallback to the C version. Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029	2014-04-28 14:47:30 -07:00
Pascal Massimino	b3a616b356	make HistogramAdd() a pointer in dsp * merged the two HistogramAdd/AddEval() into a single call (with detection of special case when b==out) * added a SSE2 variant * harmonize the histogram type to 'uint32_t' instead of just 'int'. This has a lot of ripples on signatures. * 1-2% faster Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306	2014-04-28 10:09:34 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
Urvang Joshi	4fd7c82e6a	SSE2 variants of Subtract-Green: Rectify loop condition When 4 pixels are left, they should be processed with SSE2. Decoding is marginally faster (~0.4%). Encoding speed: No observable difference. Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e	2014-03-31 10:51:45 -07:00
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
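
Commit 663a6d9d2e above describes a unified pattern for compile-time alternate code paths: default the flag to 0 if the build did not define it, then branch on `(FLAG == 1)`. A minimal sketch of that pattern follows. The flag name `USE_SHIFT_AVERAGE` and the helper `AverageFloor` are hypothetical, chosen only for illustration; they are not taken from libwebp.

```c
#include <stdint.h>

// Hypothetical flag, for illustration only (not a libwebp flag).
// Default it to 0 unless the build already defined it.
#if !defined(USE_SHIFT_AVERAGE)
#define USE_SHIFT_AVERAGE 0   // ALTERNATE_CODE
#endif

// Hypothetical helper: average of two 8-bit values, rounded down.
static uint8_t AverageFloor(uint8_t a, uint8_t b) {
#if (USE_SHIFT_AVERAGE == 1)
  // Alternate path: bit trick that never needs more than 8 bits.
  return (uint8_t)((a & b) + ((a ^ b) >> 1));
#else
  // Default path: widen to int, then halve.
  return (uint8_t)(((int)a + (int)b) / 2);
#endif  // USE_SHIFT_AVERAGE
}
```

Comparing the flag against an explicit value, rather than testing whether it is defined, means a build can pass `-DUSE_SHIFT_AVERAGE=0` or `-DUSE_SHIFT_AVERAGE=1` and get a predictable result either way. The `// ALTERNATE_CODE` marker keeps all such switches easy to find with a text search.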

44 Commits
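
Commit 4fd7c82e6a in the list above corrects a loop bound so that a run of exactly 4 remaining pixels still goes through the 4-wide SSE2 path instead of the scalar tail. The sketch below illustrates that kind of bound in portable C. It is an assumption-laden illustration, not the libwebp code: the function names are hypothetical, and the inner 4-pixel loop is a scalar stand-in for the SIMD body.

```c
#include <stdint.h>

// Scalar subtract-green for one ARGB pixel: red and blue each have
// green subtracted, modulo 256. Alpha and green are left unchanged.
static uint32_t SubtractGreenPixel(uint32_t argb) {
  const uint32_t green = (argb >> 8) & 0xff;
  const uint32_t new_r = (((argb >> 16) & 0xff) - green) & 0xff;
  const uint32_t new_b = ((argb & 0xff) - green) & 0xff;
  return (argb & 0xff00ff00u) | (new_r << 16) | new_b;
}

// Hypothetical driver. Returns how many pixels took the 4-wide path.
static int SubtractGreenBatched(uint32_t* argb, int num_pixels) {
  int i = 0;
  int batched;
  // '<=' rather than '<': when exactly 4 pixels remain, they still
  // qualify for the batched path.
  for (; i + 4 <= num_pixels; i += 4) {
    int k;
    for (k = 0; k < 4; ++k) {  // scalar stand-in for a SIMD body
      argb[i + k] = SubtractGreenPixel(argb[i + k]);
    }
  }
  batched = i;
  // Scalar tail: at most 3 leftover pixels.
  for (; i < num_pixels; ++i) argb[i] = SubtractGreenPixel(argb[i]);
  return batched;
}
```

With `i + 4 < num_pixels` as the bound, a 4-pixel input would skip the batched loop entirely and be handled one pixel at a time, which is the kind of missed case the commit message describes.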