libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-30 07:42:27 +02:00

Author	SHA1	Message	Date
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
James Zern	d038e6193b	dec_sse2: drop SSE2 suffix from local functions Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882	2014-04-02 23:10:39 -07:00
James Zern	fa52d7525f	dec_neon: use vld?_lane instead of vset?_lane results in fewer instructions, small speed improvement Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a	2014-04-02 23:03:18 -07:00
Pascal Massimino	c520e77d94	cosmetic: fix long line Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea	2014-04-02 23:00:50 -07:00
James Zern	4b0f2dae6f	Merge "add intrinsics NEON code for chroma strong-filtering"	2014-04-02 22:57:44 -07:00
skal	e351ec0759	add intrinsics NEON code for chroma strong-filtering The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t register, and re-use the previous (luma) functions Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38	2014-04-03 06:58:21 +02:00
pascal massimino	aaf734b8b0	Merge "Add SSE2 version of forward cross-color transform"	2014-04-02 14:18:59 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
James Zern	2132992d47	Merge "Add strong filtering intrinsics (inner and outer edges)"	2014-04-02 00:10:01 -07:00
skal	5fbff3a646	Add strong filtering intrinsics (inner and outer edges) + added some work-around gcc-4.6 to make it compile (except one function). + lots of revamping All variants tested ok. Speed-up is ~5-7% Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3	2014-04-02 08:28:55 +02:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
James Zern	26029568b7	dec_neon: add strong loopfilter intrinsics vertical only currently, 2.5-3% faster placed under USE_INTRINSICS as this change depends on the simple loopfilter improves the simple loopfilter slightly thanks to some reorganization Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155	2014-04-01 01:13:50 -07:00
James Zern	cca7d7ef0f	Merge "add intrinsics version of SimpleHFilter16NEON()"	2014-04-01 00:57:11 -07:00
skal	d6c50d8ac2	Merge "add some colorspace conversion functions in NEON"	2014-03-31 13:15:18 -07:00
Urvang Joshi	4fd7c82e6a	SSE2 variants of Subtract-Green: Rectify loop condition When 4 pixels are left, they should be processed with SSE2. Decoding is marginally faster (~0.4%). Encoding speed: No observable difference. Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e	2014-03-31 10:51:45 -07:00
skal	97e5fac389	add some colorspace conversion functions in NEON new file: lossless_neon.c speedup is ~5% gcc 4.6.3 seems to be doing some sub-optimal things here, storing register on stack using 'vstmia' and such. Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509 I've tried adding -fno-split-wide-types and it does help the generated assembly. But the overall speed gets worse with this flag. We should only compile lossless_neon.c with it -> urk. Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0	2014-03-31 17:47:46 +02:00
skal	b9a7a45f1f	add intrinsics version of SimpleHFilter16NEON() It's disable for now, because it crashes gcc-4.6.3 during compilation with -O2 or -O3. It's been tested OK with -O1. Code is still globally disabled with USE_INTRINSICS, though. Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c	2014-03-31 16:31:31 +02:00
Pascal Massimino	daccbf400d	add light filtering NEON intrinsics disabled for now (but tested OK), thanks to the USE_INTRINSICS #define We'll activate the code when we're on par with non-intrinsics Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff	2014-03-30 22:15:55 -07:00
Pascal Massimino	af44460880	fix typo in STORE_WHT was working ok because dst == out Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd	2014-03-28 10:34:44 -07:00
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
James Zern	6d2f35273d	dsp/dec: TransformDCUV: use VP8TransformDC rather than forcing the C version; this is similar to TransformUV Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a	2014-03-26 16:43:47 -07:00
James Zern	fbed36433d	Merge "dsp: reuse wht transform from dec in encoder"	2014-03-26 15:13:07 -07:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
Vikas Arora	312e638f30	Extend the search space for GetBestGreenRedToBlue Get back some of the compression gains by extending the search space for GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not helping much in yielding better compression density. Before: 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations) Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s After: 1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations) Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d	2014-03-14 09:56:00 -07:00
Vikas Arora	1c58526fe1	Fix few nits Add/remove few casts, fixed indentation. Change-Id: Icd141694201843c04e476f09142ce4be6e502dff	2014-03-13 13:57:39 -07:00
Vikas Arora	068b14ac57	Optimize lossless decoding. Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove one if condition inside the main/critial loop. Also separated TransformColor & TransformColorInverse into separate functions and avoid one 'if condition' inside this critical method. This change speeds up lossless decoding for Lenna image about 5% and 1000 image corpus by 3-4%. Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d	2014-03-13 11:27:12 -07:00
skal	c60de26099	~3-4% faster lossless encoding by re-arranging some code from SkipRepeatedPixel() Change-Id: I6c1fd7cd9af22cd9be4234217ff67d7b94f44137	2014-03-04 08:12:59 +01:00
James Zern	82ae1bf299	cosmetics: normalize VP8GetCPUInfo checks - use '!= NULL' + dec_neon/STORE_WHT: align '\'s Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53	2014-02-28 18:47:41 -08:00
Vikas Arora	206cc1be5a	Refactor GetBestPredictorForTile for future tuning. This change doesn't impact compression gain or compression speed. Change-Id: Ia87d8a46c6f1ce0f8974178d75a6b0ba0a6e3696	2014-02-28 11:30:23 -08:00
James Zern	9a463c4a51	Merge "dec_neon: convert TransformWHT to intrinsics"	2014-02-25 14:36:44 -08:00
pascal massimino	e8605e9625	Merge "dec_neon: add ConvertU8ToS16"	2014-02-25 08:56:17 -08:00
Vikas Arora	c16cd99aba	Speed up lossless encoder. Speedup lossless encoder by 20-25% by optimizing: - GetBestColorTransformForTile: Use techniques like binary search and local minima search to reduce the search space. - VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for log(1 + x) and increase the threshold for calling the approximate version of log_2 (compared to costly call to log()). Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d	2014-02-21 22:13:50 -08:00
James Zern	9d6b5ff1e6	dec_neon: convert TransformWHT to intrinsics Change-Id: I34dc1d75ddebab131cfed031764117e3f7b75c6b	2014-02-21 11:23:46 -08:00
James Zern	2ff0aae2fe	dec_neon: add ConvertU8ToS16 Change-Id: Ifc4fb8e7f862e72154d2f2739811b1022d2b9416	2014-02-20 15:35:33 -08:00
skal	77a8f91981	fix compilation with USE_YUVj flag (not that we'll ever need it, but...) Change-Id: I9af993c62372097846c5ca6bae8362b59c3502dc	2014-02-20 13:23:18 +01:00
James Zern	2719bb7e98	dec_neon: TransformAC3: work on packed vectors pack 2 rows in 1 vector similar to TransformDC Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d	2014-02-18 19:47:20 -08:00
James Zern	b7b60ca16c	dec_neon: add SaturateAndStore4x4 converts 2 s16 vectors to 2 u8 and store to uint8_t destination; TransformAC3 can reuse this after a rework Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0	2014-02-18 19:42:35 -08:00
James Zern	e02f16ef45	dec_neon.c: convert TransformDC to intrinsics no noticeable difference in performance Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025	2014-02-14 12:11:58 -08:00
skal	9cba963f9a	add missing file Change-Id: I17eab2fedc64ee3bba941a592ecef765fcd2b402	2014-02-13 21:56:19 -08:00
skal	8992ddb756	use static clipping tables (shared with mips32) removed abs1[] table along the way sub-1% speed-up, but still... Change-Id: I8c29a8a0285076cb3423b01ffae9fcc465da6a81	2014-02-13 19:32:59 -08:00
skal	0235d5e44b	1-2% faster quantization in SSE2 C-version is a bit faster too (sub-1% faster on ARM) Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97	2014-02-13 15:55:30 -08:00
James Zern	228e4877ab	dec_neon.c: add TransformAC3 based on SSE2 version Change-Id: Icc6782955253c98e83d5984153b596ef5f1c0d34	2014-02-08 12:47:54 -08:00
skal	32aeaf115a	revamp VP8LColorSpaceTransform() a bit -> remove the 'color_transform' multiplier, use more constants, etc. This function is particularly critical, mostly because of GetBestColorTransformForTile(). Loop is a bit faster (maybe ~1%) Change-Id: I90c96a3437cafb184773acef55c77e40c224388f	2014-02-05 10:37:06 +01:00
skal	926ff40229	WEBP_SWAP_16BIT_CSP: remove code dup and prepare for potentially supporting both RGBA4444 and BARG4444 Change-Id: If5200289bc6338757a2ceb2df1a19de732595052	2014-02-03 13:24:33 -08:00
Vikas Arora	1d1cd3bbd6	Fix decode bug for rgbA_4444/RGBA_4444 color-modes. The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits) data in the destination buffer and while pre-multiplying the alpha to RGB colors. Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba	2014-02-03 09:20:54 -08:00
James Zern	8934a622ac	cosmetics: *_mips32.c indent, comments, unused includes Change-Id: Id0aabc52d05bb633f62aec022155ec27699cf5a0	2014-01-30 18:03:48 -08:00
Djordje Pesut	dd438c9a7d	MIPS: MIPS32r1: Optimization of some simple point-sampling functions. PATCH [6/6] Change-Id: I2020e71e9be5d17d4bf67cabf6c470ca43d5d838	2014-01-29 15:37:31 +01:00

1 2 3 4 5 ...

286 Commits