libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-18 23:09:52 +02:00

Author	SHA1	Message	Date
Vincent Rabaud	7c70ff7a3b	Clean dsp/lossless includes Change-Id: I47a405a9c402095b440404fe57ac08b5293ea71b	2025-03-25 12:38:00 +01:00
Vincent Rabaud	9dd5ae819b	Use the full register in PredictorSub13_SSE2 No more than 15 registers are used at a time Change-Id: I40f77d9df8500e5e0d52ff6b206d765e8be62ae1	2025-03-25 11:07:15 +01:00
James Zern	61e2cfdadd	rework AddVectorEq_SSE2 Take advantage of the known sizes used by VP8LHistogramAdd() and remove loop for the remainder. The loop was being auto-vectorized making the code larger and slower than the vectorized C code. For larger sizes the new code is ~3-4.5% faster than the old code with about the same improvement against the vectorized C code. For the minimal size (40), the new code is ~30% faster than the C and old SSE2 code. The LINE_SIZE==8 option is removed with this change. It had been set to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case by 2 in any case; they both profile similarly. Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a	2024-11-14 12:21:39 -08:00
James Zern	7bda3deb89	rework AddVector_SSE2 Take advantage of the known sizes used by VP8LHistogramAdd() and remove loop for the remainder. The loop was being auto-vectorized making the code larger and slower than the vectorized C code. For larger sizes the new code is ~4-7% faster than the old code with about the same improvement against the vectorized C code. For the minimal size (40), the new code is ~30% faster than the C and old SSE2 code. The LINE_SIZE==8 option is removed with this change. It had been set to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case by 2 in any case; they both profile similarly. Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e	2024-11-14 12:21:33 -08:00
James Zern	a32b436bd5	dsp/lossless*: use WEBP_RESTRICT qualifier lossless_enc: better vectorization, most benefits seen in AddVector/Eq w/ndk r27/gcc-13/clang-16 lossless: minor reordering and some improvement to PredictorAdd5_SSE2 w/gcc-13 This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7	2024-10-02 14:55:14 -07:00
Vincent Rabaud	dde11574b0	Remove TODO now that log is using fixed point. Bug: webp:499 Change-Id: I39ab340ec6b5932db7535c6b7f31843c28de8415	2024-07-11 20:11:03 +00:00
Vincent Rabaud	fb444b692b	Convert VP8LFastSLog2 to fixed point Speedups: 1% with '-lossless', 2% with '-lossless -q 100 -m6' Change-Id: I1d79ea8e3e9e4bac7bcea4d7cbcc1bd56273988e	2024-07-09 16:42:21 +02:00
Vincent Rabaud	a90160e11a	Refactor histograms in predictors. Replace the 2d histograms with uint32_t 1d versions (to avoid pointer casting and to use the optimized VP8LAddVectorEq). Change-Id: I90b0fe98390b49e3fd03e3484289571cf7ae6eca	2024-05-03 22:09:38 +02:00
James Zern	835392393b	dsp,x86: normalize types w/_mm_set* calls fixes integer sanitizer warnings of the form: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed) runtime error: implicit conversion from type 'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type 'char' changed the value to -128 (8-bit, signed) Bug: b/229626362 Change-Id: I6be3c40407cf7a27b79d31ee32d3829ecb78ed66	2022-08-03 16:50:46 -07:00
Vincent Rabaud	a19a25bb03	Replace doubles by floats in lossless misc cost estimations. Doubles are slower and use more RAM for no benefit. Change-Id: I05b313576f9b33388c7c39d7fed8de84170c3753	2022-04-17 21:07:54 +02:00
Ilya Kurdyukov	fae416179e	faster CombinedShannonEntropy_SSE2 optimized for sparse histograms Change-Id: I54412f5f8fc53d2598964a5be91f6c54ece3f21b	2021-02-19 13:14:46 +01:00
James Zern	8599571935	disable CombinedShannonEntropy_SSE2 on x86 this function produces different results from the C code due to use of double/float resulting in output differences when compared to -noasm. Bug: webp:499 Change-Id: Ia039b168c0a66da723fb434656657ba1948db8ae	2021-01-18 16:41:44 -08:00
Yannis Guyon	47309ef52d	webp: WEBP_OFFSET_PTR() Removes undefined behavior of offsetting NULL. Change-Id: I7c83d0c913c631c091a5fb128f6d6b46b1d116db	2020-03-20 11:39:06 +01:00
James Zern	c6b75a1966	lossless_(enc_|)sse2: avoid offsetting a NULL pointer PredictorSub0_SSE2 doesn't use 'upper' (neither does VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing pixels to avoid undefined behavior when offsetting a NULL pointer BUG=chromium:1026858,oss-fuzz:19430 Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3	2019-12-13 18:33:10 +00:00
James Zern	c84673a62f	lossless_enc_sse{2,41}: quiet signed conv warnings _mm_set1_epi16 takes a short argument from clang-7 integer sanitizer: implicit conversion from type 'int' of value 65280 (32-bit, signed) to type 'short' changed the value to -256 (16-bit, signed) Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702	2019-06-15 00:22:03 -07:00
Vincent Rabaud	dea3e89983	Split HistogramAdd to only have the high level logic in C. Change-Id: Ic9eaebf7128ca0215b49d2a13bde1f5b94a28061	2018-10-19 14:03:28 +02:00
James Zern	8043504f95	lossless*sse2: improve non-const 16-bit vector creation use _mm_set1_epi32 instead of _mm_set_epi16 with non-const values; reduces shifts and ors. Change-Id: Ie2cb2ab815f642855d03c6f3001223bcac4bd35c	2018-02-17 17:59:20 -08:00
Pascal Massimino	0a17f4712c	Merge "WIP: list includes as descendants of the project dir"	2017-10-11 08:21:42 +00:00
James Zern	a439972175	WIP: list includes as descendants of the project dir #include "(.|..)/..." -> #include "src/..." Change-Id: I772880aa097a770722043c8a4393552ba38a89b6	2017-10-10 23:04:05 -07:00
James Zern	0ac46e818b	lossless_enc_sse2: harmonize function suffixes BUG=webp:355 Change-Id: I06c64416103c3f3fc0519dd46d64b0a35f9798e4	2017-10-08 14:06:05 -07:00
skal	1411f02761	Lossless Enc: harmonize the function suffixes BUG=webp:355 Change-Id: I8baf506bd2a27095b956ef22a862b071f60c0d72	2017-08-07 18:02:07 -07:00
James Zern	7beed2807b	add missing ()s to macro parameters BUG=webp:355 Change-Id: I616c6d3540d6551edd1b1cfdb5bffcf0a044c90f	2017-08-04 17:02:53 -07:00
Vincent Rabaud	7c2779e95a	Get code to fully compile in C++. Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08	2017-01-12 18:03:55 +01:00
Vincent Rabaud	1cb638010c	Call the C function to finish off lossless SSE loops only when necessary. Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad	2016-12-21 14:25:54 +01:00
Vincent Rabaud	875fafc191	Implement BundleColorMap in SSE2. Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e	2016-12-21 10:44:31 +01:00
Pascal Massimino	a4bbe4b38b	fix indentation Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55	2016-12-13 22:50:29 -08:00
Pascal Massimino	9cc421675b	PredictorSub: implement fully-SSE2 version and inline the C-version too. Predictor #13 is still a hard one. Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331	2016-12-13 02:19:35 -08:00
Vincent Rabaud	c9b45863e2	Split off common lossless dsp inline functions. Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738	2016-09-12 17:35:08 +02:00
James Zern	b551e587b3	cosmetics: add {}s on continued control statements for consistency within the codebase. in some cases simply join the lines. Change-Id: I071f061052e274c8a69f651ed4305befb4414a40	2016-08-03 19:08:59 -07:00
Pascal Massimino	6c1d763119	avoid Yoda style for comparison Change-Id: I8ff9f96951e5e8a619f7132455dd281cbf91aa4d	2016-01-15 23:52:29 -08:00
Vincent Rabaud	8ce975ac82	SSE optimization for vector mismatch. Change-Id: I564b822033b59d86635230f29ed6197e306a2c4f	2016-01-07 18:23:45 +01:00
Vincent Rabaud	2835089d6a	Provide an SSE2 implementation of CombinedShannonEntropy. CombinedShannonEntropy takes 30% for lossless compression. This implementation speeds up the overall process by 2 to 3 %. Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f	2015-12-11 15:12:19 +01:00
James Zern	4741fac42e	dsp/lossless_*sse2: remove some unnecessary inlines TransformColor / TransformColorInverse are the top-level function pointer calls Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b	2015-06-25 21:02:01 -07:00
Pascal Massimino	1819965e0a	fix warning ("left shift of negative value") using a cast Change-Id: Ie99e8ff87924a1d15e2c5d83bd9adf07dab04e94	2015-06-24 23:46:09 -07:00
Pascal Massimino	7017001462	SSE2: speed-up some lossless-encoding functions optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed overall effect is sub-1% speed-up, though. Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6	2015-06-24 20:09:13 -07:00
Pascal Massimino	fc6c75a2a2	SSE2: 53% faster TransformColor[Inverse] Changed the code (again) to process 4 pixels at a time. Loop is more involved, but overall it's faster. Removed the SSE4.1 implementation which is now slower than SSE2. Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1	2015-06-23 14:52:01 -07:00
Pascal Massimino	49073da6d6	SSE2: 46% speed-up of TransformColor[Inverse] Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e	2015-06-23 20:09:04 +02:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	553051f741	dsp/lossless: split enc/dec functions adds lossless_enc*.c; reduces the size of the decode-only so: ~78K w/gcc-4.8.2 on x86_64. Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092	2015-03-23 22:57:50 -07:00

39 Commits
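
The integer sanitizer messages quoted in commits 835392393b and c84673a62f report value changes caused by implicit narrowing conversions in the arguments of _mm_set* calls. The following is a minimal sketch in plain C of those same conversions, made explicit with casts. The helper names are hypothetical and are not taken from libwebp, and the results assume a two's-complement target where an out-of-range conversion to a signed type wraps, as GCC and Clang do on x86.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only (not libwebp code).
 * Each one performs the narrowing conversion explicitly, so the
 * change of value is visible in the source rather than happening
 * implicitly at a call site. */

/* Reinterpret a 16-bit unsigned pattern as a signed 16-bit value. */
int16_t ToInt16(uint16_t v) { return (int16_t)v; }

/* Reinterpret an 8-bit unsigned pattern as a signed 8-bit value. */
int8_t ToInt8(uint8_t v) { return (int8_t)v; }

/* Reinterpret a 32-bit unsigned pattern as a signed 32-bit value. */
int32_t ToInt32(uint32_t v) { return (int32_t)v; }
```

Under the stated assumption, ToInt16(0xff00) gives -256, ToInt8(128) gives -128, and ToInt32(4294967295u) gives -1, which are the value changes listed in the two commit messages.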