libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-06-28 15:14:34 +02:00

Author	SHA1	Message	Date
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
James Zern	6d2f35273d	dsp/dec: TransformDCUV: use VP8TransformDC rather than forcing the C version; this is similar to TransformUV Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a	2014-03-26 16:43:47 -07:00
James Zern	fbed36433d	Merge "dsp: reuse wht transform from dec in encoder"	2014-03-26 15:13:07 -07:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
Vikas Arora	312e638f30	Extend the search space for GetBestGreenRedToBlue Get back some of the compression gains by extending the search space for GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not helping much in yielding better compression density. Before: 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations) Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s After: 1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations) Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d	2014-03-14 09:56:00 -07:00
Vikas Arora	1c58526fe1	Fix few nits Add/remove few casts, fixed indentation. Change-Id: Icd141694201843c04e476f09142ce4be6e502dff	2014-03-13 13:57:39 -07:00
Vikas Arora	068b14ac57	Optimize lossless decoding. Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove one if condition inside the main/critical loop. Also separated TransformColor & TransformColorInverse into separate functions and avoid one 'if condition' inside this critical method. This change speeds up lossless decoding for Lenna image about 5% and 1000 image corpus by 3-4%. Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d	2014-03-13 11:27:12 -07:00
skal	c60de26099	~3-4% faster lossless encoding by re-arranging some code from SkipRepeatedPixel() Change-Id: I6c1fd7cd9af22cd9be4234217ff67d7b94f44137	2014-03-04 08:12:59 +01:00
James Zern	82ae1bf299	cosmetics: normalize VP8GetCPUInfo checks - use '!= NULL' + dec_neon/STORE_WHT: align '\'s Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53	2014-02-28 18:47:41 -08:00
Vikas Arora	206cc1be5a	Refactor GetBestPredictorForTile for future tuning. This change doesn't impact compression gain or compression speed. Change-Id: Ia87d8a46c6f1ce0f8974178d75a6b0ba0a6e3696	2014-02-28 11:30:23 -08:00
James Zern	9a463c4a51	Merge "dec_neon: convert TransformWHT to intrinsics"	2014-02-25 14:36:44 -08:00
pascal massimino	e8605e9625	Merge "dec_neon: add ConvertU8ToS16"	2014-02-25 08:56:17 -08:00
Vikas Arora	c16cd99aba	Speed up lossless encoder. Speedup lossless encoder by 20-25% by optimizing: - GetBestColorTransformForTile: Use techniques like binary search and local minima search to reduce the search space. - VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for log(1 + x) and increase the threshold for calling the approximate version of log_2 (compared to costly call to log()). Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d	2014-02-21 22:13:50 -08:00
James Zern	9d6b5ff1e6	dec_neon: convert TransformWHT to intrinsics Change-Id: I34dc1d75ddebab131cfed031764117e3f7b75c6b	2014-02-21 11:23:46 -08:00
James Zern	2ff0aae2fe	dec_neon: add ConvertU8ToS16 Change-Id: Ifc4fb8e7f862e72154d2f2739811b1022d2b9416	2014-02-20 15:35:33 -08:00
skal	77a8f91981	fix compilation with USE_YUVj flag (not that we'll ever need it, but...) Change-Id: I9af993c62372097846c5ca6bae8362b59c3502dc	2014-02-20 13:23:18 +01:00
James Zern	2719bb7e98	dec_neon: TransformAC3: work on packed vectors pack 2 rows in 1 vector similar to TransformDC Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d	2014-02-18 19:47:20 -08:00
James Zern	b7b60ca16c	dec_neon: add SaturateAndStore4x4 converts 2 s16 vectors to 2 u8 and store to uint8_t destination; TransformAC3 can reuse this after a rework Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0	2014-02-18 19:42:35 -08:00
James Zern	e02f16ef45	dec_neon.c: convert TransformDC to intrinsics no noticeable difference in performance Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025	2014-02-14 12:11:58 -08:00
skal	9cba963f9a	add missing file Change-Id: I17eab2fedc64ee3bba941a592ecef765fcd2b402	2014-02-13 21:56:19 -08:00
skal	8992ddb756	use static clipping tables (shared with mips32) removed abs1[] table along the way sub-1% speed-up, but still... Change-Id: I8c29a8a0285076cb3423b01ffae9fcc465da6a81	2014-02-13 19:32:59 -08:00
skal	0235d5e44b	1-2% faster quantization in SSE2 C-version is a bit faster too (sub-1% faster on ARM) Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97	2014-02-13 15:55:30 -08:00
James Zern	228e4877ab	dec_neon.c: add TransformAC3 based on SSE2 version Change-Id: Icc6782955253c98e83d5984153b596ef5f1c0d34	2014-02-08 12:47:54 -08:00
skal	32aeaf115a	revamp VP8LColorSpaceTransform() a bit -> remove the 'color_transform' multiplier, use more constants, etc. This function is particularly critical, mostly because of GetBestColorTransformForTile(). Loop is a bit faster (maybe ~1%) Change-Id: I90c96a3437cafb184773acef55c77e40c224388f	2014-02-05 10:37:06 +01:00
skal	926ff40229	WEBP_SWAP_16BIT_CSP: remove code dup and prepare for potentially supporting both RGBA4444 and BARG4444 Change-Id: If5200289bc6338757a2ceb2df1a19de732595052	2014-02-03 13:24:33 -08:00
Vikas Arora	1d1cd3bbd6	Fix decode bug for rgbA_4444/RGBA_4444 color-modes. The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits) data in the destination buffer and while pre-multiplying the alpha to RGB colors. Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba	2014-02-03 09:20:54 -08:00
James Zern	8934a622ac	cosmetics: *_mips32.c indent, comments, unused includes Change-Id: Id0aabc52d05bb633f62aec022155ec27699cf5a0	2014-01-30 18:03:48 -08:00
Djordje Pesut	dd438c9a7d	MIPS: MIPS32r1: Optimization of some simple point-sampling functions. PATCH [6/6] Change-Id: I2020e71e9be5d17d4bf67cabf6c470ca43d5d838	2014-01-29 15:37:31 +01:00
Djordje Pesut	53520911c3	Added support for calling sampling functions via pointers. Change-Id: Ic4d72e6b175a6b27bcdcc8cd97828e44ea93e743	2014-01-29 15:32:35 +01:00
Jovan Zelincevic	d16c69749b	MIPS: MIPS32r1: Optimization of filter functions. PATCH [5/6] Change-Id: Ifbd305e0514f09a587db02c3970f22190808503a	2014-01-29 15:03:45 +01:00
Djordje Pesut	04336fc7f8	MIPS: MIPS32r1: Optimization of function TransformOne. PATCH [4/6] Change-Id: I5b98e2de940977538cf91bfa2128f4d1daa5c170	2014-01-28 20:10:43 -08:00
Pascal Massimino	c1cb1933d5	disable NEON for arm64 platform The registers and instructions are quite different to 32bit and the assembly code needs a rewrite. more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/ Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f	2014-01-23 12:35:01 -08:00
skal	66a32af5e1	Merge "NEON speed up"	2013-12-18 14:17:19 -08:00
skal	26d842eb8f	NEON speed up add TransformDC special case, and make the switch function inlined. Recovers a few of the CPU lost during the addition of TransformAC3 (only on ARM) Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6	2013-12-18 22:32:58 +01:00
James Zern	605a712701	simplify __cplusplus ifdef drop c_plusplus which is from a quite ancient pre-standard compiler Change-Id: I9e357b3292a6b52b14c2641ba11f4f872c04b7fb	2013-12-16 20:16:02 -08:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
skal	73b731fb42	introduce a special quantization function for WHT WHT is somewhat a special case: no sharpen[] bias, etc. Will be useful in a later CL when precision of input is changed. Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299	2013-12-10 14:21:47 +01:00
skal	41c0cc4b9a	Make Forward WHT transform use 32bit fixed-point calculation This is in preparation for a future change where input will be 16bit instead of 12bit No speed diff observed. Note that the NEON implementation was using 32bit calc already. Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4	2013-12-10 06:10:52 +01:00
skal	d513bb62bc	* fix off-by-one zthresh calculation * remove the sharpening for non luma-AC coeffs * adjust the bias a little bit to compensate for this Using the multiply-by-reciprocal doesn't always give the same result as the exact divide, given the QFIX fixed-point precision we use. -> removed few now-unneeded SSE2 instructions (and checked for bit-exactness using -noasm) Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24	2013-12-09 13:56:04 +01:00
James Zern	4931c3294b	cosmetics: fix some typos Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c	2013-11-26 19:21:14 -08:00
Pascal Massimino	596a6d73ce	make use of 'extern' consistent in function declarations Change-Id: I18e050db3111e52acfe97da09cdf1860f3e15936	2013-10-30 03:23:21 -07:00
skal	0b2b05049f	Use deterministic random-dithering during RGB->YUV conversion -> helps debanding (sky, gradients, etc.) This dithering can only be triggered when using -preset photo or -pre 2 (as a preprocessing). Everything is unchanged otherwise. Note that this change is likely to make the perceived PSNR/SSIM drop since we're altering the input internally. Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813	2013-10-17 22:36:49 +02:00
James Zern	dca8a4d315	Merge "NEON/simple loopfilter: avoid q4-q7 registers"	2013-10-10 01:58:41 -07:00
pascal massimino	9e84d901d2	Merge "NEON/TransformWHT: avoid q4-q7 registers"	2013-10-09 09:32:59 -07:00
James Zern	fc10249b36	NEON/simple loopfilter: avoid q4-q7 registers very tiny speed improvement Change-Id: I3024f120feb7275ce20bfff21af31ea8650a5a03	2013-10-09 18:17:31 +02:00
James Zern	2f09d63e30	NEON/TransformWHT: avoid q4-q7 registers very tiny speed improvement Change-Id: Iace78b9038af412d0a794845ff19f54afa88ccdc	2013-10-09 18:17:23 +02:00
skal	f9bbc2a034	Special-case sparse transform If the number of non-zero coeffs is <= 3, use a simplified transform for luma. Change-Id: I78a1252704228d21720d4bc1221252c84338d9c8	2013-10-08 22:05:38 +02:00

367 Commits