libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-11 02:20:33 +02:00

Author	SHA1	Message	Date
skal	3f84b5219d	Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"	2014-04-15 07:09:12 -07:00
Djordje Pesut	4ae0533f39	MIPS: MIPS32r1: Added optimizations for ExtraCost functions. ExtraCost and ExtraCostCombined Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2	2014-04-15 15:37:06 +02:00
skal	b30a04cf11	WIP: extract the float-calculation of HuffmanCost from loop new function: VP8FinalHuffmanCost() Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9	2014-04-15 14:52:52 +02:00
skal	a8fe8ce231	Merge "NEON intrinsics version of CollectHistogram"	2014-04-15 03:00:45 -07:00
skal	95203d2d1b	NEON intrinsics version of CollectHistogram apparently faster, but we might save some load/store to/from memory once we settle for the intrinsics-based FTransform() (also: fixed some #ifdef USE_INTRINSICS problems) Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7	2014-04-14 16:47:20 +02:00
skal	7ca2e74bb4	replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8) saves few instructions Change-Id: If8f464bb2894a209bba94825a4db9267df126d47	2014-04-14 15:14:45 +02:00
skal	41c6efbdc5	fix lossless_neon.c * some extra {xx , 0 } in initializers * replaced by vget_lane_u32() where appropriate Change-Id: Iabcd8ec34d7c853920491fb147a10d4472280a36	2014-04-14 14:27:11 +02:00
skal	8ff96a027a	NEON intrinsics version of FTransform as little bit slower than inlined asm it seems. So disabled for now. Change-Id: I8c942846f9bedaed57275675ea9dbbcb8dfd9ccd	2014-04-14 09:58:35 +02:00
Jovan Zelincevic	0214f4a908	Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"	2014-04-10 08:54:12 -07:00
Jovan Zelincevic	baabf1ea3a	MIPS: MIPS32r1: Added optimizations for FastLog2 Functions VP8LFastLog2Slow and VP8LFastSLog2Slow also: replaced some "% y" by "& (y-1)" in the C-version (since y is a power-of-two) Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0	2014-04-10 08:32:51 -07:00
skal	3d49871dbe	NEON functions for lossless coding Verified OK, but right now they don't seem faster. So they are disabled behind a USE_INTRINSICS flag (off for now) Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c	2014-04-10 15:32:08 +02:00
Slobodan Prijic	3fe0291530	MIPS: MIPS32r1: Added optimizations for SSE functions. Change-Id: I1287fa65064192cc2edc5c4be2b1974be665b9b4	2014-04-09 11:02:13 +02:00
skal	c503b485b6	Merge "fix the gcc-4.6.0 bug by implementing alternative method"	2014-04-08 23:25:59 -07:00
skal	abe6f48709	fix the gcc-4.6.0 bug by implementing alternative method previous functions are a bit faster with gcc-4.8, so we keep them for now. Change-Id: I4081e5af66fbf606295d8a83875c1b889729b4dc	2014-04-09 07:53:55 +02:00
James Zern	5598bdecd8	enc_mips32.c: fix file mode Change-Id: I5a43320e2ea2eebc88c65398acb9ea59b63af1fd	2014-04-08 15:12:54 -07:00
Slobodan Prijic	2b1b4d5ae9	MIPS: MIPS32r1: Add optimization for GetResidualCost + reorganize the cost-evaluation code by moving some functions to cost.h/cost.c and exposing VP8Residual Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d	2014-04-08 15:28:49 +02:00
pascal massimino	f0a1f3cd51	Merge "MIPS: MIPS32r1: Added optimization for FTransform"	2014-04-08 04:17:27 -07:00
Djordje Pesut	7231f610aa	MIPS: MIPS32r1: Added optimization for FTransform Change-Id: I9384dac483e8f98bcfdd277a0a3d6ec7c7a7b297	2014-04-08 04:16:44 -07:00
skal	869eaf6c60	~30% encoding speedup: use NEON for QuantizeBlock() also revamped the signature to avoid having to pass the 'first' parameter Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5	2014-04-08 03:08:22 -07:00
James Zern	f758af6b73	enc_neon: convert FTransformWHT to intrinsics slightly faster than the inline asm in practice not much faster than the C-code in a full NEON build, but still better overall in an Android-like one that only enables NEON for certain files. Change-Id: I69534016186064fd92476d5eabc0f53462d53146	2014-04-08 00:20:19 -07:00
Djordje Pesut	7dad095bb4	MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform) Change-Id: Ieb20c5c52b964247cfe46f45f9a7415725bf7c02	2014-04-07 15:04:23 +02:00
Jovan Zelincevic	2298d5f301	MIPS: MIPS32r1: Added optimization for QuantizeBlock Change-Id: I6047ab107e4d474e35b5af1dac391d5b3d8c049b	2014-04-07 09:22:35 +02:00
Djordje Pesut	e88150c9b6	Merge "MIPS: MIPS32r1: Add optimization for ITransform"	2014-04-05 10:36:05 -07:00
James Zern	de693f2502	lossless_neon: disable VP8LConvert* functions due to breakage with NDK/gcc-4.6 builds Change-Id: Id96258e710ee33e08a023354b3227f27da986620	2014-04-04 20:38:29 -07:00
skal	4143332b22	NEON intrinsics for encoding * inverse transform is actually slower with intrinsics + gcc-4.6, so is left disabled for now. With gcc-4.8, it's a bit faster than inlined assembly. * Sum of Square error function provide a 2-3% speed up There's enabled by default (since there's no inlined-asm equivalent) Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a	2014-04-04 15:02:56 -07:00
Djordje Pesut	0ca2914b23	MIPS: MIPS32r1: Add optimization for ITransform Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56	2014-04-04 10:50:35 +02:00
James Zern	71bca5ecf3	dec_neon: use vst_lane instead of vget_lane results in fewer instructions, small speed improvement Change-Id: I98de632d09ff09f295368c0d744cb4397b585084	2014-04-03 14:56:26 -07:00
skal	bf06105293	Intrinsics NEON version of TransformOne + misc cosmetics * seems 4% slower than inlined-asm with gcc-4.6 * is a tad faster (<1%) with gcc-4.8 (disabled for now) Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095	2014-04-03 14:41:56 -07:00
pascal massimino	19c6f1ba74	Merge "dec_neon: use vld?_lane instead of vset?_lane"	2014-04-03 01:16:29 -07:00
James Zern	7a94c0cf75	upsampling_neon: drop NEON suffix from local functions Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a	2014-04-02 23:24:39 -07:00
James Zern	d14669c83c	upsampling_sse2: drop SSE2 suffix from local functions Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297	2014-04-02 23:24:39 -07:00
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
James Zern	d038e6193b	dec_sse2: drop SSE2 suffix from local functions Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882	2014-04-02 23:10:39 -07:00
James Zern	fa52d7525f	dec_neon: use vld?_lane instead of vset?_lane results in fewer instructions, small speed improvement Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a	2014-04-02 23:03:18 -07:00
Pascal Massimino	c520e77d94	cosmetic: fix long line Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea	2014-04-02 23:00:50 -07:00
James Zern	4b0f2dae6f	Merge "add intrinsics NEON code for chroma strong-filtering"	2014-04-02 22:57:44 -07:00
skal	e351ec0759	add intrinsics NEON code for chroma strong-filtering The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t register, and re-use the previous (luma) functions Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38	2014-04-03 06:58:21 +02:00
pascal massimino	aaf734b8b0	Merge "Add SSE2 version of forward cross-color transform"	2014-04-02 14:18:59 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
Vikas Arora	bc374ff39e	Use histogram_bits to initalize transform_bits. This change gains back 1% in compression density for method=3 and 0.5% for method=4, at the expense of 10% slower compression speed. Change-Id: I491aa1c726def934161d4a4377e009737fbeff82	2014-04-02 11:46:40 -07:00
James Zern	2132992d47	Merge "Add strong filtering intrinsics (inner and outer edges)"	2014-04-02 00:10:01 -07:00
skal	5fbff3a646	Add strong filtering intrinsics (inner and outer edges) + added some work-around gcc-4.6 to make it compile (except one function). + lots of revamping All variants tested ok. Speed-up is ~5-7% Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3	2014-04-02 08:28:55 +02:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
James Zern	26029568b7	dec_neon: add strong loopfilter intrinsics vertical only currently, 2.5-3% faster placed under USE_INTRINSICS as this change depends on the simple loopfilter improves the simple loopfilter slightly thanks to some reorganization Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155	2014-04-01 01:13:50 -07:00
James Zern	cca7d7ef0f	Merge "add intrinsics version of SimpleHFilter16NEON()"	2014-04-01 00:57:11 -07:00
James Zern	1a05dfa7f5	windows: fix dll builds WebPSafe* need to be marked external to allow mux/demux to access them through libwebp.dll Change-Id: Ib6620e00d376f7aa5a0550e1e244f759977f97a0	2014-03-31 17:46:12 -07:00
skal	d6c50d8ac2	Merge "add some colorspace conversion functions in NEON"	2014-03-31 13:15:18 -07:00
Urvang Joshi	4fd7c82e6a	SSE2 variants of Subtract-Green: Rectify loop condition When 4 pixels are left, they should be processed with SSE2. Decoding is marginally faster (~0.4%). Encoding speed: No observable difference. Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e	2014-03-31 10:51:45 -07:00
skal	97e5fac389	add some colorspace conversion functions in NEON new file: lossless_neon.c speedup is ~5% gcc 4.6.3 seems to be doing some sub-optimal things here, storing register on stack using 'vstmia' and such. Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509 I've tried adding -fno-split-wide-types and it does help the generated assembly. But the overall speed gets worse with this flag. We should only compile lossless_neon.c with it -> urk. Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0	2014-03-31 17:47:46 +02:00
skal	b9a7a45f1f	add intrinsics version of SimpleHFilter16NEON() It's disable for now, because it crashes gcc-4.6.3 during compilation with -O2 or -O3. It's been tested OK with -O1. Code is still globally disabled with USE_INTRINSICS, though. Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c	2014-03-31 16:31:31 +02:00

2020 Commits