libwebp

Mirror of https://github.com/webmproject/libwebp.git, synced 2025-07-28 19:59:47 +02:00

Author	SHA1	Message	Date
Pascal Massimino	0d55f60c91	40% faster ApplyAlphaMultiply_SSE2 process four pixels at a time Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4	2017-01-12 02:33:09 -08:00
Pascal Massimino	49d0280df1	NEON: implement several alpha-processing functions - ApplyAlphaMultiply - DispatchAlpha - DispatchAlphaToGreen - ExtractAlpha Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4) new file: alpha_processing_neon.c Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca	2017-01-11 17:39:29 +01:00
Pascal Massimino	48b1e85fbe	SSE2: 15% faster alpha-processing functions ApplyAlphaMultiply / MultARGBRow / MultRow we use now: x/255 = (x * 0x8081) >> (16 + 7) and x/255 + .5 = ((x + 128) * 0x0101) >> 16 Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971	2017-01-11 15:35:16 +01:00
Pascal Massimino	28fe054e73	SSE2: 30% faster ApplyAlphaMultiply() and 15% faster MultARGBRow() by switching to formulae: X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X. (X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128 Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81	2017-01-10 23:34:22 +01:00
Pascal Massimino	be0ef6395f	fix a comment typo Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e	2017-01-10 21:17:13 +01:00
Pascal Massimino	00b08c88c0	Merge "NEON: 5% faster conversion to RGB565 and RGBA4444"	2016-12-22 08:39:01 +00:00
Pascal Massimino	0e7f444702	Merge "NEON: faster fancy upsampling"	2016-12-21 14:53:24 +00:00
Pascal Massimino	b016cb91c5	NEON: faster fancy upsampling 2-3% faster decoding overall Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955	2016-12-21 15:23:54 +01:00
Vincent Rabaud	1cb638010c	Call the C function to finish off lossless SSE loops only when necessary. Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad	2016-12-21 14:25:54 +01:00
Vincent Rabaud	875fafc191	Implement BundleColorMap in SSE2. Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e	2016-12-21 10:44:31 +01:00
Pascal Massimino	341d711c43	NEON: 5% faster conversion to RGB565 and RGBA4444 We use the magic 'shift and insert' instruction instead of the multiple shifts and or's. Change-Id: I48df0320668b502a91792defc0423a9441669d19	2016-12-20 17:01:48 +01:00
Pascal Massimino	a4bbe4b38b	fix indentation Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55	2016-12-13 22:50:29 -08:00
Pascal Massimino	58fc507842	Merge "PredictorSub: implement fully-SSE2 version"	2016-12-13 11:03:13 +00:00
Pascal Massimino	9cc421675b	PredictorSub: implement fully-SSE2 version and inline the C-version too. Predictor #13 is still a hard one. Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331	2016-12-13 02:19:35 -08:00
James Zern	2423017a28	dsp/lossless.c,cosmetics: fix indent after: `fbba5bc` optimize predictor #1 in plain-C For some reason, gcc has a hard time inlining this one... Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b	2016-12-12 12:53:23 -08:00
Pascal Massimino	fbba5bc2c1	optimize predictor #1 in plain-C For some reason, gcc has a hard time inlining this one... Also optimize predictors #0 and #1 for encoding, so we don't have to call through the generic pointers VP8LPredictors[...] Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9	2016-12-12 17:41:36 +01:00
Pascal Massimino	9ae0b3f65a	Merge "SSE2: slightly (~2%) faster Predictor #1 "	2016-12-12 14:46:21 +00:00
Pascal Massimino	c1f97bd758	SSE2: slightly (~2%) faster Predictor #1 by removing a load from memory Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0	2016-12-12 02:24:38 -08:00
Pascal Massimino	ea664b8995	SSE2: 10% faster Predictor #11 Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185	2016-12-12 02:20:41 -08:00
Pascal Massimino	b3fb8bb602	slightly faster Predictor #11 in NEON (+some slight modifications on Predictor #12) Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2	2016-12-08 07:32:51 -08:00
Pascal Massimino	76ebbfff28	NEON: implement predictor #13 ~5-7% faster Change-Id: I3361b0bbc978f3721168db15778a67337309c18a	2016-12-07 14:58:49 -08:00
Vincent Rabaud	95b12a08ae	Merge "Revert Average3 and Average4"	2016-12-07 15:38:56 +00:00
Vincent Rabaud	54ab2e758f	Revert Average3 and Average4 Average3 created a slowdown of 1-2% in lossless decoding. Average4 created a slowdown of 2-3% in lossless decoding. Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5	2016-12-07 15:32:33 +01:00
Pascal Massimino	fe12330c81	3-5% faster Predictors #5, #6, #7 and #10 for NEON Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2	2016-12-07 15:25:33 +01:00
Pascal Massimino	fbfb3bef7b	~2% faster predictor #10 for NEON Change-Id: Icd9cff90c227d702c3ba319131996c5475094520	2016-12-06 13:47:35 +00:00
Pascal Massimino	d4b7d801db	lossless_sse2: use the local functions instead of the pointers stored in the array. This should be faster (inlined) and safer. Also: explicitly suffix the functions with _SSE2 Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90	2016-12-06 14:20:41 +01:00
Vincent Rabaud	a5e3b22574	Lossless decoder SSE2 improvements. Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706	2016-12-06 13:45:09 +01:00
Pascal Massimino	58a1f124c2	~2% faster predictor #12 in NEON. Change-Id: I6772bb865d0f72720a65561eb55028e538df236d	2016-12-06 10:24:27 +01:00
Pascal Massimino	906c3b6392	Merge "Implement lossless transforms in NEON."	2016-12-03 16:55:14 +00:00
Vincent Rabaud	d23abe4e9f	Implement lossless transforms in NEON. Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55	2016-12-03 11:20:22 +00:00
Vincent Rabaud	2e6cb6f34e	Give more flexibility to the predictor generating macro. Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900	2016-12-02 12:33:12 -08:00
Vincent Rabaud	28e0bb7088	Merge "Fix race condition in multi-threading initialization."	2016-12-02 17:45:10 +00:00
Vincent Rabaud	647045305a	Fix race condition in multi-threading initialization. Before, one thread could enter VP8LDspInitSSE2 and set VP8LPredictorsAdd to an SSE2 version BEFORE another thread did the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C, so the supposed C version was actually the SSE2 one (which then caused infinite recursion in the SSE2 predictors at execution time). Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a	2016-12-02 18:28:57 +01:00
Pascal Massimino	ea72cd60cb	add missing 'extern' keyword for predictor dcl Change-Id: Ibf3db9b6dae91e53524c31cdfccf4678b3fa1135	2016-12-01 08:15:14 +01:00
Vincent Rabaud	67879e6d48	SSE implementation of decoding predictors. Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c	2016-11-30 12:00:07 +01:00
Vincent Rabaud	4239a1489c	Make the lossless predictors work on a batch of pixels. Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1	2016-11-28 17:12:10 +01:00
Pascal Massimino	bc18ebad2e	fix extra 'const's in signatures Change-Id: Ie433d0defbc0c6feae2eb2f11e70082f1affada8	2016-11-25 09:45:52 +01:00
Vincent Rabaud	71e2f5cadf	Remove memcpy in lossless decoding. Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c	2016-11-24 17:45:24 +01:00
Vincent Rabaud	7474d46e45	Do not use a register array in SSE. Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141	2016-11-24 13:06:44 +01:00
Owen Rodley	67748b41db	Improve latency of FTransform2. Benchmarks from vrabaud@: 8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged images/lpcb corpus speed: unchanged, corpus size: unchanged images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged images/png_doc corpus speed: unchanged, corpus size: unchanged images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged images/jpeg_small corpus speed: unchanged, corpus size: unchanged images/icip_core1 corpus speed: unchanged, corpus size: unchanged images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged images/gif corpus speed: faster: -1.4 % , corpus size: unchanged images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged images/all corpus speed: faster: -1.5 % , corpus size: unchanged images/png_small corpus speed: unchanged, corpus size: unchanged images/png corpus speed: unchanged, corpus size: unchanged images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b	2016-11-24 07:09:50 +00:00
Vincent Rabaud	6540cd0eeb	Provide an SSE implementation of ConvertBGRAToRGB Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf	2016-11-23 16:25:51 +01:00
Pascal Massimino	3c2a61b099	remove some unneeded casts Change-Id: Ie68788c77f016ed11446a55142b1bd8d96261452	2016-11-16 22:54:40 -08:00
Pascal Massimino	9ac063c37f	add dsp functions for SmartYUV + SSE2 implementation Change-Id: I5cfdb62d68b5a95899241a097d3a2f697fbc590e	2016-11-16 14:23:06 +00:00
Pascal Massimino	31b1e34342	fix SSIM metric ... by ignoring too-dark areas. Roughly, if both the source and the reference areas are too dark (R/G/B <= ~6), they are ignored. One caveat: the SSIM calculation won't work for U/V planes, which are 128-centered and not related to luminance, but WebPPlaneDistortion() enforces the conversion to RGB if needed. Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591	2016-10-20 15:17:55 +02:00
Vincent Rabaud	28ce304344	Remove some errors when compiling the code as C++. This fixes some cases from https://bugs.chromium.org/p/webp/issues/detail?id=137 Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a	2016-10-05 09:39:08 +02:00
Pascal Massimino	ba843a92e7	fix some SSIM calculations * prevent 64-bit overflow by controlling the 32b->64b conversions and preventively descaling by 8 bits before the final multiply * adjust the threshold constants C1 and C2 to de-emphasize the dark areas * use a hat-like filter instead of box-filtering to avoid blockiness during averaging. The SSIM distortion calculation is actually faster now in SSE2, because of the unrolling done during the function rewrite. The C version is considerably slower because it is still unoptimized. Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90	2016-10-04 01:09:07 -07:00
Pascal Massimino	86a84b3598	2x faster SSE2 implementation of SSIMGet Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351	2016-09-23 23:23:06 -07:00
Pascal Massimino	7c1fb7d0ff	fix uint32_t initialization (0. -> 0) Change-Id: Ia4aae27f70c4e74ddeb5654cfabb21d785cea9cf	2016-09-14 20:26:05 +02:00
Pascal Massimino	bfff0bf329	speed-up SSIM calculation. SSIM results are incompatible with the previous version! We're now averaging the SSIM value over each pixel instead of printing a frame-level global SSIM value. * Got rid of some old code * switched to uint32_t for accumulation * refactoring. SSIM calculation is ~4x faster now. Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673	2016-09-14 16:15:43 +02:00
Vincent Rabaud	64577de8ae	De-VP8L-ize GetEntropUnrefinedHelper. Having it architecture-dependent resulted in an extra call to an extern function, hence no inlining and a 5-10% performance impact. Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450	2016-09-14 13:55:24 +02:00

701 Commits
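
Commit 48b1e85fbe quotes two division-by-255 identities: x/255 = (x * 0x8081) >> (16 + 7) and x/255 + .5 = ((x + 128) * 0x0101) >> 16. The scalar C sketch below checks them. It is not libwebp's SSE2 code. The (16 + 7) split hints at a high-half 16-bit multiply followed by a shift of 7, but that is an inference. PremultiplyChannel is a hypothetical helper, included only to show where a rounded x/255 is used when applying alpha.

```c
#include <stdint.h>

// floor(x / 255) for any 0 <= x <= 65535: (x * 0x8081) >> (16 + 7).
// The product stays below 2^32, so 32-bit arithmetic is enough.
static uint32_t Div255Floor(uint32_t x) {
  return (x * 0x8081u) >> 23;
}

// round(x / 255), i.e. x / 255 + .5 truncated, for products of two bytes
// (0 <= x <= 255 * 255): ((x + 128) * 0x0101) >> 16.
static uint32_t Div255Round(uint32_t x) {
  return ((x + 128u) * 0x0101u) >> 16;
}

// Hypothetical helper (not a libwebp function): premultiply one 8-bit
// color channel by an 8-bit alpha, rounding to nearest.
static uint8_t PremultiplyChannel(uint8_t c, uint8_t alpha) {
  return (uint8_t)Div255Round((uint32_t)c * alpha);
}
```

For example, PremultiplyChannel(255, 128) computes round(32640 / 255) = 128 with one add, one multiply and one shift, and no divide.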
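
Commit 28fe054e73 uses multiply-free forms instead: X / 255 = (X + 1 + (X >> 8)) >> 8 and (X / 255 + .5) = (XX + (XX >> 8)) >> 8 with XX = X + 128. The C sketch below checks them on 32-bit integers. Taken literally, "for any 16bit value X" does not hold at X = 65535, where the floor form yields 256 instead of 257. In 16-bit SIMD lanes the intermediate sums also overflow near the top of the range (for the floor form, once X reaches 65280). The safe domain is therefore products of two bytes, at most 255 * 255 = 65025, which is what alpha multiplication produces.

```c
#include <stdint.h>

// floor(X / 255) via (X + 1 + (X >> 8)) >> 8.
// Exact for 0 <= X <= 65534; X = 65535 gives 256 instead of 257.
static uint32_t Div255FloorShift(uint32_t x) {
  return (x + 1u + (x >> 8)) >> 8;
}

// round(X / 255) via (XX + (XX >> 8)) >> 8 with XX = X + 128.
// Algebraically the same as ((X + 128) * 0x0101) >> 16, since 0x0101 = 257.
static uint32_t Div255RoundShift(uint32_t x) {
  const uint32_t xx = x + 128u;
  return (xx + (xx >> 8)) >> 8;
}
```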
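
Commit 341d711c43 replaces "multiple shifts and or's" with NEON's shift-right-and-insert instruction (VSRI). The scalar C model below shows one plausible way two such inserts build an RGB565 value and an RGBA4444 value. It is not libwebp's NEON code. ShiftRightInsert8 is a hypothetical per-byte emulation of the instruction's semantics: the top n bits of the destination are kept and the rest comes from src >> n. The byte order used when storing to memory is left out.

```c
#include <stdint.h>

// Scalar model of VSRI on one byte (1 <= n <= 7): keep the top n bits
// of dst, fill the low (8 - n) bits with src >> n.
static uint8_t ShiftRightInsert8(uint8_t dst, uint8_t src, int n) {
  const uint8_t keep = (uint8_t)~(0xFFu >> n);
  return (uint8_t)((dst & keep) | (src >> n));
}

// RGB565 from two inserts: hi = rrrrrggg, lo = gggbbbbb.
static uint16_t PackRGB565_SRI(uint8_t r, uint8_t g, uint8_t b) {
  const uint8_t hi = ShiftRightInsert8(r, g, 5);
  const uint8_t lo = ShiftRightInsert8((uint8_t)(g << 3), b, 3);
  return (uint16_t)((hi << 8) | lo);
}

// RGBA4444 from two inserts: hi = rrrrgggg, lo = bbbbaaaa.
static uint16_t PackRGBA4444_SRI(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
  const uint8_t hi = ShiftRightInsert8(r, g, 4);
  const uint8_t lo = ShiftRightInsert8(b, a, 4);
  return (uint16_t)((hi << 8) | lo);
}

// Reference RGB565 packing with plain shifts, masks and ORs.
static uint16_t PackRGB565_Ref(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

Each output byte needs one mask-and-merge instead of separate shift, mask and OR steps, which is where the instruction saves work.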
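
Commits ea664b8995 and b3fb8bb602 speed up lossless predictor #11. The WebP lossless bitstream specification calls this predictor "Select". It forms the per-channel estimate L + T - TL (left, top, top-left) and returns whichever of L or T is closer to that estimate in Manhattan distance. The plain C sketch below follows that description. The names are hypothetical, and this is not libwebp's SSE2 or NEON code.

```c
#include <stdint.h>
#include <stdlib.h>

// Sum of absolute differences over the four 8-bit ARGB channels.
static int ManhattanARGB(uint32_t a, uint32_t b) {
  int sum = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    sum += abs((int)((a >> shift) & 0xffu) - (int)((b >> shift) & 0xffu));
  }
  return sum;
}

// Predictor #11: with estimate = L + T - TL per channel,
// |estimate - L| == |T - TL| and |estimate - T| == |L - TL|.
static uint32_t SelectPredictor(uint32_t left, uint32_t top, uint32_t top_left) {
  const int p_left = ManhattanARGB(top, top_left);
  const int p_top = ManhattanARGB(left, top_left);
  return (p_left < p_top) ? left : top;  // ties go to the top pixel
}
```

The per-channel absolute differences are independent of each other, so they map naturally onto byte-wise SIMD operations.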
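
Several commits (fe12330c81, fbfb3bef7b, 54ab2e758f) touch the averaging predictors. According to the WebP lossless specification, predictors #5 to #10 average neighboring pixels channel by channel. For example, #7 is Average2(L, T) and #10 is Average2(Average2(L, TL), Average2(T, TR)). The C sketch below uses a common SWAR formulation that averages all four packed channels at once. It is not copied from libwebp.

```c
#include <stdint.h>

// Per-channel floor((a + b) / 2) on four packed 8-bit channels.
// Uses a + b == 2 * (a & b) + (a ^ b); masking with 0xfe before the shift
// keeps each channel's low bit from leaking into its lower neighbor.
static uint32_t Average2(uint32_t a, uint32_t b) {
  return (((a ^ b) & 0xfefefefeu) >> 1) + (a & b);
}

// Predictor #5 in the spec: Average2(Average2(L, TR), T).
static uint32_t Average3(uint32_t l, uint32_t t, uint32_t tr) {
  return Average2(Average2(l, tr), t);
}

// Predictor #10 in the spec: Average2(Average2(L, TL), Average2(T, TR)).
static uint32_t Average4(uint32_t l, uint32_t tl, uint32_t t, uint32_t tr) {
  return Average2(Average2(l, tl), Average2(t, tr));
}
```

Because each channel result is at most 255, the final addition never carries across channel boundaries.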
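
Commit 647045305a describes how VP8LPredictorsAdd_C could end up holding an SSE2 pointer. An SSE2 predictor that hands leftover pixels to the "C" entry would then call itself forever. The hypothetical C model below (all names invented, no real SIMD) reproduces that structure. It shows one way to close the window: make the C fallback table a compile-time constant so it is never copied from the live table. This is not claimed to be the actual fix in that commit. Concurrent calls to DspInitSIMD still write the same pointer without synchronization, so a real implementation would also guard initialization, for example with pthread_once.

```c
#include <stdint.h>

typedef void (*PredictorAddFunc)(const uint32_t* in, const uint32_t* upper,
                                 int num_pixels, uint32_t* out);

// Per-channel add modulo 256 of two ARGB pixels.
static uint32_t AddPixels(uint32_t a, uint32_t b) {
  const uint32_t ag = (a & 0xff00ff00u) + (b & 0xff00ff00u);
  const uint32_t rb = (a & 0x00ff00ffu) + (b & 0x00ff00ffu);
  return (ag & 0xff00ff00u) | (rb & 0x00ff00ffu);
}

// Plain-C "top" predictor: out[i] = in[i] + upper[i].
static void PredictorAddTop_C(const uint32_t* in, const uint32_t* upper,
                              int num_pixels, uint32_t* out) {
  for (int i = 0; i < num_pixels; ++i) out[i] = AddPixels(in[i], upper[i]);
}

// Constant C fallback table: no thread can ever make it hold a SIMD pointer.
static const PredictorAddFunc kPredictorsAdd_C[1] = { PredictorAddTop_C };

// Live dispatch table, overwritten by the SIMD init.
static PredictorAddFunc g_predictors_add[1] = { PredictorAddTop_C };

// Stand-in for a SIMD version: blocks of 4 pixels, then the tail goes to
// the C table only when needed. If kPredictorsAdd_C[0] pointed back here,
// a tail of 1-3 pixels would recurse without end.
static void PredictorAddTop_SIMD(const uint32_t* in, const uint32_t* upper,
                                 int num_pixels, uint32_t* out) {
  int i = 0;
  for (; i + 4 <= num_pixels; i += 4) {
    for (int j = 0; j < 4; ++j) out[i + j] = AddPixels(in[i + j], upper[i + j]);
  }
  if (i != num_pixels) {
    kPredictorsAdd_C[0](in + i, upper + i, num_pixels - i, out + i);
  }
}

static void DspInitSIMD(void) { g_predictors_add[0] = PredictorAddTop_SIMD; }
```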