libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-06-09 23:34:21 +02:00

Author	SHA1	Message	Date
James Zern	668e1dd44f	src/{dec,enc,utils}: give filenames a unique suffix. This avoids duplicates between these trees and dsp/, e.g., enc/tree.c, dec/tree.c, making pulling the whole library source tree into one target possible. BUG=webp:279 Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775	2017-01-19 19:09:48 -08:00
Pascal Massimino	71c53f1aeb	NEON: speed-up strong filtering The sub-expression trick removes two constants and two vmlal_s8 instructions. Change-Id: I200022573b4880871b528b13a11a8f3d95def113	2017-01-19 20:46:48 +00:00
Pascal Massimino	749a45a520	Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)"	2017-01-17 15:13:08 +00:00
Pascal Massimino	74c053b57d	Merge "NEON: fix overflow in SSE NxN calculation"	2017-01-17 15:10:54 +00:00
Pascal Massimino	1de931c669	NEON: implement alpha-filters (horizontal/vertical/gradient) gradient-filter code is not much faster, but maybe improvable in the future. Change-Id: Ia16070e409fe8703b02276166f19526917df6b35	2017-01-17 15:44:46 +01:00
Pascal Massimino	9b3aca404d	NEON: fix overflow in SSE NxN calculation. vmlal_u8() is prone to overflow during the accumulation. There was a mismatch happening mostly at low q, because in this case the distortion is large and the accumulated sum was larger than 16bit-unsigned. Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8	2017-01-17 11:47:36 +01:00
Pascal Massimino	1c07a3c639	dsp: WebPExtractGreen function for alpha decompression + NEON implementation Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c	2017-01-17 09:33:25 +00:00
Pascal Massimino	8fda56126e	Merge "add a kSlowSSSE3 feature for CPUInfo"	2017-01-13 07:01:48 +00:00
Pascal Massimino	86bbd24552	add a kSlowSSSE3 feature for CPUInfo This is meant to be used for run-time detection of slow platforms regarding instructions like pshufb and bsr. Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731 Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640	2017-01-13 06:19:27 +00:00
Vincent Rabaud	7c2779e95a	Get code to fully compile in C++. Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08	2017-01-12 18:03:55 +01:00
Vincent Rabaud	250c358662	Merge "When compiling as C++, avoid narrowing warnings."	2017-01-12 13:00:56 +00:00
Vincent Rabaud	c0648ac2ae	When compiling as C++, avoid narrowing warnings. The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’ Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb	2017-01-12 13:39:22 +01:00
Pascal Massimino	0d55f60c91	40% faster ApplyAlphaMultiply_SSE2 process four pixels at a time Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4	2017-01-12 02:33:09 -08:00
Pascal Massimino	49d0280df1	NEON: implement several alpha-processing functions: ApplyAlphaMultiply, DispatchAlpha, DispatchAlphaToGreen, ExtractAlpha. Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4). New file: alpha_processing_neon.c Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca	2017-01-11 17:39:29 +01:00
Pascal Massimino	48b1e85fbe	SSE2: 15% faster alpha-processing functions (ApplyAlphaMultiply / MultARGBRow / MultRow). We now use: x/255 = (x * 0x8081) >> (16 + 7) and x/255 + .5 = ((x + 128) * 0x0101) >> 16 Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971	2017-01-11 15:35:16 +01:00
Pascal Massimino	28fe054e73	SSE2: 30% faster ApplyAlphaMultiply() and 15% faster MultARGBRow() by switching to formulae: X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X. (X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128 Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81	2017-01-10 23:34:22 +01:00
Pascal Massimino	be0ef6395f	fix a comment typo Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e	2017-01-10 21:17:13 +01:00
Pascal Massimino	00b08c88c0	Merge "NEON: 5% faster conversion to RGB565 and RGBA4444"	2016-12-22 08:39:01 +00:00
Pascal Massimino	0e7f444702	Merge "NEON: faster fancy upsampling"	2016-12-21 14:53:24 +00:00
Pascal Massimino	b016cb91c5	NEON: faster fancy upsampling 2-3% faster decoding overall Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955	2016-12-21 15:23:54 +01:00
Vincent Rabaud	1cb638010c	Call the C function to finish off lossless SSE loops only when necessary. Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad	2016-12-21 14:25:54 +01:00
Vincent Rabaud	875fafc191	Implement BundleColorMap in SSE2. Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e	2016-12-21 10:44:31 +01:00
Pascal Massimino	341d711c43	NEON: 5% faster conversion to RGB565 and RGBA4444 We use the magic 'shift and insert' instruction instead of the multiple shifts and or's. Change-Id: I48df0320668b502a91792defc0423a9441669d19	2016-12-20 17:01:48 +01:00
Pascal Massimino	a4bbe4b38b	fix indentation Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55	2016-12-13 22:50:29 -08:00
Pascal Massimino	58fc507842	Merge "PredictorSub: implement fully-SSE2 version"	2016-12-13 11:03:13 +00:00
Pascal Massimino	9cc421675b	PredictorSub: implement fully-SSE2 version and inline the C-version too. Predictor #13 is still a hard one. Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331	2016-12-13 02:19:35 -08:00
James Zern	2423017a28	dsp/lossless.c,cosmetics: fix indent after: fbba5bc optimize predictor #1 in plain-C For some reason, gcc has a hard time inlining this one... Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b	2016-12-12 12:53:23 -08:00
Pascal Massimino	fbba5bc2c1	optimize predictor #1 in plain-C For some reason, gcc has a hard time inlining this one... Also optimize predictor #0 and #1 for encoding, so we don't have to call the generic pointers VP8LPredictors[...] Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9	2016-12-12 17:41:36 +01:00
Pascal Massimino	9ae0b3f65a	Merge "SSE2: slightly (~2%) faster Predictor #1 "	2016-12-12 14:46:21 +00:00
Pascal Massimino	c1f97bd758	SSE2: slightly (~2%) faster Predictor #1 by removing a load from memory Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0	2016-12-12 02:24:38 -08:00
Pascal Massimino	ea664b8995	SSE2: 10% faster Predictor #11 Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185	2016-12-12 02:20:41 -08:00
Pascal Massimino	b3fb8bb602	slightly faster Predictor #11 in NEON (+some slight modifications on Predictor #12) Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2	2016-12-08 07:32:51 -08:00
Pascal Massimino	76ebbfff28	NEON: implement predictor #13 ~5-7% faster Change-Id: I3361b0bbc978f3721168db15778a67337309c18a	2016-12-07 14:58:49 -08:00
Vincent Rabaud	95b12a08ae	Merge "Revert Average3 and Average4"	2016-12-07 15:38:56 +00:00
Vincent Rabaud	54ab2e758f	Revert Average3 and Average4 Average3 created a slowdown of 1-2% in lossless decoding. Average4 created a slowdown of 2-3% in lossless decoding. Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5	2016-12-07 15:32:33 +01:00
Pascal Massimino	fe12330c81	3-5% faster Predictor #5 , #6 , #7 and #10 for NEON Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2	2016-12-07 15:25:33 +01:00
Pascal Massimino	fbfb3bef7b	~2% faster predictor #10 for NEON Change-Id: Icd9cff90c227d702c3ba319131996c5475094520	2016-12-06 13:47:35 +00:00
Pascal Massimino	d4b7d801db	lossless_sse2: use the local functions ...instead of the pointers stored in the array. Should be faster (inlined) and safer. Also: suffix explicitly the functions with _SSE2 Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90	2016-12-06 14:20:41 +01:00
Vincent Rabaud	a5e3b22574	Lossless decoder SSE2 improvements. Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706	2016-12-06 13:45:09 +01:00
Pascal Massimino	58a1f124c2	~2% faster predictor #12 in NEON. Change-Id: I6772bb865d0f72720a65561eb55028e538df236d	2016-12-06 10:24:27 +01:00
Pascal Massimino	906c3b6392	Merge "Implement lossless transforms in NEON."	2016-12-03 16:55:14 +00:00
Vincent Rabaud	d23abe4e9f	Implement lossless transforms in NEON. Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55	2016-12-03 11:20:22 +00:00
Vincent Rabaud	2e6cb6f34e	Give more flexibility to the predictor generating macro. Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900	2016-12-02 12:33:12 -08:00
Vincent Rabaud	28e0bb7088	Merge "Fix race condition in multi-threading initialization."	2016-12-02 17:45:10 +00:00
Vincent Rabaud	647045305a	Fix race condition in multi-threading initialization. Before, a first thread could enter VP8LDspInitSSE2, set VP8LPredictorsAdd to an SSE2 version BEFORE another thread would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C thus leading to a C version actually being the SSE2 one (which would then create an infinite recursion in the SSE2 predictors at execution). Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a	2016-12-02 18:28:57 +01:00
Pascal Massimino	ea72cd60cb	add missing 'extern' keyword for predictor dcl Change-Id: Ibf3db9b6dae91e53524c31cdfccf4678b3fa1135	2016-12-01 08:15:14 +01:00
Vincent Rabaud	67879e6d48	SSE implementation of decoding predictors. Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c	2016-11-30 12:00:07 +01:00
Vincent Rabaud	4239a1489c	Make the lossless predictors work on a batch of pixels. Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1	2016-11-28 17:12:10 +01:00
Pascal Massimino	bc18ebad2e	fix extra 'const's in signatures Change-Id: Ie433d0defbc0c6feae2eb2f11e70082f1affada8	2016-11-25 09:45:52 +01:00
Vincent Rabaud	71e2f5cadf	Remove memcpy in lossless decoding. Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c	2016-11-24 17:45:24 +01:00


863 Commits
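
The division-by-255 formulae quoted in commits 48b1e85fbe and 28fe054e73 can be checked against plain integer division with a short scalar sketch. This is not the SSE2 code from those commits. The function names below are hypothetical, the arithmetic is done in 32 bits, and the checked range 0..255*255 (the largest product of two 8-bit values) is an assumption about the inputs that matter for alpha multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Reference results computed with ordinary integer division. */
uint32_t ref_floor_div255(uint32_t x) { return x / 255u; }
/* floor(x/255 + 1/2) written without floating point. */
uint32_t ref_round_div255(uint32_t x) { return (2u * x + 255u) / 510u; }

/* Formulae quoted in commit 48b1e85fbe (multiply, then shift). */
uint32_t floor_div255_mul(uint32_t x) { return (x * 0x8081u) >> (16 + 7); }
uint32_t round_div255_mul(uint32_t x) { return ((x + 128u) * 0x0101u) >> 16; }

/* Formulae quoted in commit 28fe054e73 (shifts and adds only). */
uint32_t floor_div255_shift(uint32_t x) { return (x + 1u + (x >> 8)) >> 8; }
uint32_t round_div255_shift(uint32_t x) {
  const uint32_t xx = x + 128u;
  return (xx + (xx >> 8)) >> 8;
}

/* Counts the inputs in [0, 255*255] for which any formula disagrees
 * with its reference. */
int count_div255_mismatches(void) {
  int bad = 0;
  uint32_t x;
  for (x = 0; x <= 255u * 255u; ++x) {
    if (floor_div255_mul(x) != ref_floor_div255(x)) ++bad;
    if (floor_div255_shift(x) != ref_floor_div255(x)) ++bad;
    if (round_div255_mul(x) != ref_round_div255(x)) ++bad;
    if (round_div255_shift(x) != ref_round_div255(x)) ++bad;
  }
  return bad;
}

/* Aborts if any formula disagrees with its reference on the range. */
void div255_self_check(void) { assert(count_div255_mismatches() == 0); }
```

Calling div255_self_check() from a test program runs the exhaustive comparison. Inputs above 255*255 are deliberately not checked here.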