libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-16 17:08:08 +02:00

Author	SHA1	Message	Date
Jovan Zelincevic	0214f4a908	Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"	2014-04-10 08:54:12 -07:00
Jovan Zelincevic	baabf1ea3a	MIPS: MIPS32r1: Added optimizations for FastLog2 Functions VP8LFastLog2Slow and VP8LFastSLog2Slow also: replaced some "% y" by "& (y-1)" in the C-version (since y is a power-of-two) Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0	2014-04-10 08:32:51 -07:00
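Commit baabf1ea3a replaces `% y` with `& (y-1)`. For an unsigned value x and a power-of-two y, the two are equal because the remainder is exactly the low log2(y) bits of x, and `y - 1` is a mask of those bits. The C sketch below assumes unsigned 32-bit operands; for negative signed values the two expressions differ. FastMod and IsPowerOfTwo are hypothetical names, not libwebp functions.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: nonzero iff y is a power of two.
static int IsPowerOfTwo(uint32_t y) {
  return y != 0 && (y & (y - 1)) == 0;
}

// Hypothetical helper: x % y computed as a bit mask.
// Valid only for unsigned x and power-of-two y: (y - 1) selects the
// low log2(y) bits of x, which are exactly the remainder.
static uint32_t FastMod(uint32_t x, uint32_t y) {
  assert(IsPowerOfTwo(y));
  return x & (y - 1);
}
```

For example, FastMod(37, 8) keeps the low 3 bits of 37 (binary 100101) and returns 5, the same result as 37 % 8.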
skal	3d49871dbe	NEON functions for lossless coding Verified OK, but right now they don't seem faster. So they are disabled behind a USE_INTRINSICS flag (off for now) Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c	2014-04-10 15:32:08 +02:00
Slobodan Prijic	3fe0291530	MIPS: MIPS32r1: Added optimizations for SSE functions. Change-Id: I1287fa65064192cc2edc5c4be2b1974be665b9b4	2014-04-09 11:02:13 +02:00
skal	c503b485b6	Merge "fix the gcc-4.6.0 bug by implementing alternative method"	2014-04-08 23:25:59 -07:00
skal	abe6f48709	fix the gcc-4.6.0 bug by implementing alternative method previous functions are a bit faster with gcc-4.8, so we keep them for now. Change-Id: I4081e5af66fbf606295d8a83875c1b889729b4dc	2014-04-09 07:53:55 +02:00
James Zern	5598bdecd8	enc_mips32.c: fix file mode Change-Id: I5a43320e2ea2eebc88c65398acb9ea59b63af1fd	2014-04-08 15:12:54 -07:00
Slobodan Prijic	2b1b4d5ae9	MIPS: MIPS32r1: Add optimization for GetResidualCost + reorganize the cost-evaluation code by moving some functions to cost.h/cost.c and exposing VP8Residual Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d	2014-04-08 15:28:49 +02:00
pascal massimino	f0a1f3cd51	Merge "MIPS: MIPS32r1: Added optimization for FTransform"	2014-04-08 04:17:27 -07:00
Djordje Pesut	7231f610aa	MIPS: MIPS32r1: Added optimization for FTransform Change-Id: I9384dac483e8f98bcfdd277a0a3d6ec7c7a7b297	2014-04-08 04:16:44 -07:00
skal	869eaf6c60	~30% encoding speedup: use NEON for QuantizeBlock() also revamped the signature to avoid having to pass the 'first' parameter Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5	2014-04-08 03:08:22 -07:00
James Zern	f758af6b73	enc_neon: convert FTransformWHT to intrinsics slightly faster than the inline asm in practice not much faster than the C-code in a full NEON build, but still better overall in an Android-like one that only enables NEON for certain files. Change-Id: I69534016186064fd92476d5eabc0f53462d53146	2014-04-08 00:20:19 -07:00
Djordje Pesut	7dad095bb4	MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform) Change-Id: Ieb20c5c52b964247cfe46f45f9a7415725bf7c02	2014-04-07 15:04:23 +02:00
Jovan Zelincevic	2298d5f301	MIPS: MIPS32r1: Added optimization for QuantizeBlock Change-Id: I6047ab107e4d474e35b5af1dac391d5b3d8c049b	2014-04-07 09:22:35 +02:00
Djordje Pesut	e88150c9b6	Merge "MIPS: MIPS32r1: Add optimization for ITransform"	2014-04-05 10:36:05 -07:00
James Zern	de693f2502	lossless_neon: disable VP8LConvert* functions due to breakage with NDK/gcc-4.6 builds Change-Id: Id96258e710ee33e08a023354b3227f27da986620	2014-04-04 20:38:29 -07:00
skal	4143332b22	NEON intrinsics for encoding * inverse transform is actually slower with intrinsics + gcc-4.6, so it is left disabled for now. With gcc-4.8, it's a bit faster than inlined assembly. * Sum of Square error functions provide a 2-3% speed up. They're enabled by default (since there's no inlined-asm equivalent) Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a	2014-04-04 15:02:56 -07:00
Djordje Pesut	0ca2914b23	MIPS: MIPS32r1: Add optimization for ITransform Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56	2014-04-04 10:50:35 +02:00
James Zern	71bca5ecf3	dec_neon: use vst_lane instead of vget_lane results in fewer instructions, small speed improvement Change-Id: I98de632d09ff09f295368c0d744cb4397b585084	2014-04-03 14:56:26 -07:00
skal	bf06105293	Intrinsics NEON version of TransformOne + misc cosmetics * seems 4% slower than inlined-asm with gcc-4.6 * is a tad faster (<1%) with gcc-4.8 (disabled for now) Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095	2014-04-03 14:41:56 -07:00
pascal massimino	19c6f1ba74	Merge "dec_neon: use vld?_lane instead of vset?_lane"	2014-04-03 01:16:29 -07:00
James Zern	7a94c0cf75	upsampling_neon: drop NEON suffix from local functions Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a	2014-04-02 23:24:39 -07:00
James Zern	d14669c83c	upsampling_sse2: drop SSE2 suffix from local functions Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297	2014-04-02 23:24:39 -07:00
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
James Zern	d038e6193b	dec_sse2: drop SSE2 suffix from local functions Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882	2014-04-02 23:10:39 -07:00
James Zern	fa52d7525f	dec_neon: use vld?_lane instead of vset?_lane results in fewer instructions, small speed improvement Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a	2014-04-02 23:03:18 -07:00
Pascal Massimino	c520e77d94	cosmetic: fix long line Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea	2014-04-02 23:00:50 -07:00
James Zern	4b0f2dae6f	Merge "add intrinsics NEON code for chroma strong-filtering"	2014-04-02 22:57:44 -07:00
skal	e351ec0759	add intrinsics NEON code for chroma strong-filtering The nice trick is to pack 8 u + 8 v samples into a single uint8x16_t register, and re-use the previous (luma) functions Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38	2014-04-03 06:58:21 +02:00
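The packing trick in e351ec0759 can be emulated in portable C. The 8 U samples fill lanes 0..7 and the 8 V samples fill lanes 8..15 of one 16-lane vector, so a routine written for 16 luma samples handles both chroma planes in one call. This is a scalar sketch under stated assumptions. Process16 and its per-lane rounding-halve operation are hypothetical placeholders, not the actual loop filter. On ARM the packing would typically be done with a NEON intrinsic such as `vcombine_u8(vld1_u8(u), vld1_u8(v))`.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for a 16-lane "luma" routine: it applies the
// same per-lane operation to all 16 bytes, like one NEON register op.
static void Process16(uint8_t lanes[16]) {
  for (int i = 0; i < 16; ++i) {
    lanes[i] = (uint8_t)((lanes[i] + 1) >> 1);  // placeholder per-lane op
  }
}

// Pack 8 U samples into lanes 0..7 and 8 V samples into lanes 8..15,
// run the 16-lane routine once, then unpack the results.
static void ProcessChroma8x2(uint8_t u[8], uint8_t v[8]) {
  uint8_t packed[16];
  memcpy(packed, u, 8);
  memcpy(packed + 8, v, 8);
  Process16(packed);
  memcpy(u, packed, 8);
  memcpy(v, packed + 8, 8);
}
```

The luma code needs no chroma-specific variant because it never knows which lanes came from which plane.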
pascal massimino	aaf734b8b0	Merge "Add SSE2 version of forward cross-color transform"	2014-04-02 14:18:59 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
James Zern	2132992d47	Merge "Add strong filtering intrinsics (inner and outer edges)"	2014-04-02 00:10:01 -07:00
skal	5fbff3a646	Add strong filtering intrinsics (inner and outer edges) + added some work-arounds for gcc-4.6 to make it compile (except one function). + lots of revamping All variants tested ok. Speed-up is ~5-7% Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3	2014-04-02 08:28:55 +02:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
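For reference, here is a scalar C sketch of the forward (c90a902eff) and inverse (d4813f0cb2) cross-color transforms. It follows the description in the WebP lossless bitstream specification: red is adjusted by green, and blue by both green and red, each through a signed 8-bit multiplier whose product with the sample is shifted right by 5. Type and function names are hypothetical. The code assumes two's-complement behavior for the int8_t conversions and an arithmetic right shift of negative products, as on common compilers.

```c
#include <stdint.h>

// Hypothetical per-tile multipliers (signed 8-bit, as in the spec).
typedef struct {
  int8_t green_to_red;
  int8_t green_to_blue;
  int8_t red_to_blue;
} ColorMultipliers;

// Signed product scaled down by 32 (assumes arithmetic shift).
static int ColorTransformDelta(int8_t t, int8_t c) {
  return ((int)t * (int)c) >> 5;
}

// Encoder side: subtract the predicted contributions from red and blue.
static uint32_t ForwardCrossColor(const ColorMultipliers* m, uint32_t argb) {
  const int8_t green = (int8_t)((argb >> 8) & 0xff);
  const int8_t red = (int8_t)((argb >> 16) & 0xff);
  int new_red = (int)((argb >> 16) & 0xff);
  int new_blue = (int)(argb & 0xff);
  new_red -= ColorTransformDelta(m->green_to_red, green);
  new_blue -= ColorTransformDelta(m->green_to_blue, green);
  new_blue -= ColorTransformDelta(m->red_to_blue, red);  // original red
  return (argb & 0xff00ff00u) | (((uint32_t)new_red & 0xff) << 16) |
         ((uint32_t)new_blue & 0xff);
}

// Decoder side: add the contributions back, in the same order.
static uint32_t InverseCrossColor(const ColorMultipliers* m, uint32_t argb) {
  const int8_t green = (int8_t)((argb >> 8) & 0xff);
  int new_red = (int)((argb >> 16) & 0xff);
  int new_blue = (int)(argb & 0xff);
  new_red += ColorTransformDelta(m->green_to_red, green);
  new_red = (int)((uint32_t)new_red & 0xff);  // wrap to 0..255
  new_blue += ColorTransformDelta(m->green_to_blue, green);
  // Undo red_to_blue using the red value restored just above.
  new_blue += ColorTransformDelta(m->red_to_blue, (int8_t)new_red);
  return (argb & 0xff00ff00u) | ((uint32_t)new_red << 16) |
         ((uint32_t)new_blue & 0xff);
}
```

Alpha and green pass through unchanged. The inverse has to restore red first, because the encoder's red_to_blue term was computed from the original red.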
James Zern	26029568b7	dec_neon: add strong loopfilter intrinsics vertical only currently, 2.5-3% faster placed under USE_INTRINSICS as this change depends on the simple loopfilter improves the simple loopfilter slightly thanks to some reorganization Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155	2014-04-01 01:13:50 -07:00
James Zern	cca7d7ef0f	Merge "add intrinsics version of SimpleHFilter16NEON()"	2014-04-01 00:57:11 -07:00
skal	d6c50d8ac2	Merge "add some colorspace conversion functions in NEON"	2014-03-31 13:15:18 -07:00
Urvang Joshi	4fd7c82e6a	SSE2 variants of Subtract-Green: Rectify loop condition When 4 pixels are left, they should be processed with SSE2. Decoding is marginally faster (~0.4%). Encoding speed: No observable difference. Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e	2014-03-31 10:51:45 -07:00
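The loop-condition fix in 4fd7c82e6a addresses a classic off-by-one at the end of a vectorized loop. Below is a minimal sketch in plain C. It assumes the subtract-green transform subtracts green from red and blue modulo 256, and it uses an inner 4-iteration loop as a stand-in for one 128-bit SSE2 register (4 ARGB pixels). SubtractGreen and SubtractGreenOne are hypothetical names.

```c
#include <stdint.h>

// Subtract green from red and blue, modulo 256; alpha and green unchanged.
static uint32_t SubtractGreenOne(uint32_t argb) {
  const uint32_t green = (argb >> 8) & 0xff;
  const uint32_t red = (((argb >> 16) & 0xff) - green) & 0xff;
  const uint32_t blue = ((argb & 0xff) - green) & 0xff;
  return (argb & 0xff00ff00u) | (red << 16) | blue;
}

// Processes groups of 4 pixels (stand-in for one SSE2 register), then a
// scalar tail. Returns how many pixels went through the 4-wide path.
static int SubtractGreen(uint32_t* argb, int num_pixels) {
  int i;
  // "i + 4 <= num_pixels": a final group of exactly 4 pixels still takes
  // the wide path; "i + 4 < num_pixels" would push it to the scalar tail.
  for (i = 0; i + 4 <= num_pixels; i += 4) {
    for (int k = 0; k < 4; ++k) argb[i + k] = SubtractGreenOne(argb[i + k]);
  }
  const int wide = i;
  for (; i < num_pixels; ++i) argb[i] = SubtractGreenOne(argb[i]);
  return wide;
}
```

With num_pixels == 4, all four pixels take the wide path. Under the stricter `<` condition they would all fall to the scalar tail, which is the missed case the commit describes.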
skal	97e5fac389	add some colorspace conversion functions in NEON new file: lossless_neon.c speedup is ~5% gcc 4.6.3 seems to be doing some sub-optimal things here, storing registers on the stack using 'vstmia' and such. Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509 I've tried adding -fno-split-wide-types and it does help the generated assembly. But the overall speed gets worse with this flag. We should only compile lossless_neon.c with it -> urk. Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0	2014-03-31 17:47:46 +02:00
skal	b9a7a45f1f	add intrinsics version of SimpleHFilter16NEON() It's disabled for now, because it crashes gcc-4.6.3 during compilation with -O2 or -O3. It's been tested OK with -O1. Code is still globally disabled with USE_INTRINSICS, though. Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c	2014-03-31 16:31:31 +02:00
Pascal Massimino	daccbf400d	add light filtering NEON intrinsics disabled for now (but tested OK), thanks to the USE_INTRINSICS #define We'll activate the code when we're on par with non-intrinsics Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff	2014-03-30 22:15:55 -07:00
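Several commits here (3d49871dbe, 26029568b7, b9a7a45f1f, daccbf400d) keep new intrinsics code compiled out behind a USE_INTRINSICS #define. Here is a sketch of that gating pattern. It assumes the intrinsics path also requires `__ARM_NEON`, although libwebp's exact guards may differ. AddSaturate16 is a hypothetical example function. Building with `-DUSE_INTRINSICS` on an ARM target selects the NEON branch, and every other build uses the C fallback.

```c
#include <stdint.h>

#if defined(USE_INTRINSICS) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical example: saturating add of two 16-byte vectors.
static void AddSaturate16(const uint8_t* a, const uint8_t* b, uint8_t* dst) {
#if defined(USE_INTRINSICS) && defined(__ARM_NEON)
  // Intrinsics path: compiled only when explicitly enabled.
  vst1q_u8(dst, vqaddq_u8(vld1q_u8(a), vld1q_u8(b)));
#else
  // Portable C fallback, the default until the intrinsics are on par.
  for (int i = 0; i < 16; ++i) {
    const int sum = a[i] + b[i];
    dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
  }
#endif
}
```

With this pattern, disabled code still lives in the tree and can be tested by flipping one flag, without affecting default builds.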
Pascal Massimino	af44460880	fix typo in STORE_WHT was working ok because dst == out Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd	2014-03-28 10:34:44 -07:00
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
James Zern	6d2f35273d	dsp/dec: TransformDCUV: use VP8TransformDC rather than forcing the C version; this is similar to TransformUV Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a	2014-03-26 16:43:47 -07:00
James Zern	fbed36433d	Merge "dsp: reuse wht transform from dec in encoder"	2014-03-26 15:13:07 -07:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
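Below is a scalar reference for the ARGB -> RGB/BGR conversions that 369c26dd3f speeds up with SSE2. It is combined with the function-pointer dispatch introduced in 514fc251df. This is a sketch only, assuming pixels packed as 0xAARRGGBB in a uint32_t. ConvertFunc, ConvertToRGB_C and the other names are hypothetical rather than libwebp's actual API.

```c
#include <stdint.h>

// Hypothetical signature: convert num_pixels packed 0xAARRGGBB pixels
// into 3 bytes per pixel at dst.
typedef void (*ConvertFunc)(const uint32_t* src, int num_pixels, uint8_t* dst);

// Plain C reference: ARGB -> R, G, B byte order (alpha dropped).
static void ConvertToRGB_C(const uint32_t* src, int num_pixels, uint8_t* dst) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t argb = src[i];
    *dst++ = (uint8_t)(argb >> 16);  // red
    *dst++ = (uint8_t)(argb >> 8);   // green
    *dst++ = (uint8_t)argb;          // blue
  }
}

// Plain C reference: ARGB -> B, G, R byte order (alpha dropped).
static void ConvertToBGR_C(const uint32_t* src, int num_pixels, uint8_t* dst) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t argb = src[i];
    *dst++ = (uint8_t)argb;          // blue
    *dst++ = (uint8_t)(argb >> 8);   // green
    *dst++ = (uint8_t)(argb >> 16);  // red
  }
}

// Dispatch pointers start out as the C versions; a SIMD-specific init
// function (not shown) could replace them after a CPU feature check.
static ConvertFunc ConvertToRGB = ConvertToRGB_C;
static ConvertFunc ConvertToBGR = ConvertToBGR_C;
```

Callers always go through ConvertToRGB/ConvertToBGR, so adding an SSE2 variant only changes which function the pointer holds.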
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
Vikas Arora	312e638f30	Extend the search space for GetBestGreenRedToBlue Get back some of the compression gains by extending the search space for GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not helping much in yielding better compression density. Before: 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations) Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s After: 1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations) Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d	2014-03-14 09:56:00 -07:00


209 Commits