libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-12-24 05:56:27 +01:00

Author	SHA1	Message	Date
Jovan Zelincevic	2298d5f301	MIPS: MIPS32r1: Added optimization for QuantizeBlock Change-Id: I6047ab107e4d474e35b5af1dac391d5b3d8c049b	2014-04-07 09:22:35 +02:00
Djordje Pesut	e88150c9b6	Merge "MIPS: MIPS32r1: Add optimization for ITransform"	2014-04-05 10:36:05 -07:00
James Zern	de693f2502	lossless_neon: disable VP8LConvert* functions due to breakage with NDK/gcc-4.6 builds Change-Id: Id96258e710ee33e08a023354b3227f27da986620	2014-04-04 20:38:29 -07:00
skal	4143332b22	NEON intrinsics for encoding * inverse transform is actually slower with intrinsics + gcc-4.6, so is left disabled for now. With gcc-4.8, it's a bit faster than inlined assembly. * Sum of Square error functions provide a 2-3% speed-up. These are enabled by default (since there's no inlined-asm equivalent). Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a	2014-04-04 15:02:56 -07:00
Djordje Pesut	0ca2914b23	MIPS: MIPS32r1: Add optimization for ITransform Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56	2014-04-04 10:50:35 +02:00
James Zern	71bca5ecf3	dec_neon: use vst_lane instead of vget_lane. Results in fewer instructions, small speed improvement. Change-Id: I98de632d09ff09f295368c0d744cb4397b585084	2014-04-03 14:56:26 -07:00
skal	bf06105293	Intrinsics NEON version of TransformOne + misc cosmetics * seems 4% slower than inlined-asm with gcc-4.6 * is a tad faster (<1%) with gcc-4.8 (disabled for now) Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095	2014-04-03 14:41:56 -07:00
pascal massimino	19c6f1ba74	Merge "dec_neon: use vld?_lane instead of vset?_lane"	2014-04-03 01:16:29 -07:00
James Zern	7a94c0cf75	upsampling_neon: drop NEON suffix from local functions Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a	2014-04-02 23:24:39 -07:00
James Zern	d14669c83c	upsampling_sse2: drop SSE2 suffix from local functions Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297	2014-04-02 23:24:39 -07:00
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
James Zern	d038e6193b	dec_sse2: drop SSE2 suffix from local functions Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882	2014-04-02 23:10:39 -07:00
James Zern	fa52d7525f	dec_neon: use vld?_lane instead of vset?_lane. Results in fewer instructions, small speed improvement. Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a	2014-04-02 23:03:18 -07:00
Pascal Massimino	c520e77d94	cosmetic: fix long line Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea	2014-04-02 23:00:50 -07:00
James Zern	4b0f2dae6f	Merge "add intrinsics NEON code for chroma strong-filtering"	2014-04-02 22:57:44 -07:00
skal	e351ec0759	add intrinsics NEON code for chroma strong-filtering The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t register, and re-use the previous (luma) functions Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38	2014-04-03 06:58:21 +02:00
pascal massimino	aaf734b8b0	Merge "Add SSE2 version of forward cross-color transform"	2014-04-02 14:18:59 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
Vikas Arora	bc374ff39e	Use histogram_bits to initialize transform_bits. This change gains back 1% in compression density for method=3 and 0.5% for method=4, at the expense of 10% slower compression speed. Change-Id: I491aa1c726def934161d4a4377e009737fbeff82	2014-04-02 11:46:40 -07:00
James Zern	2132992d47	Merge "Add strong filtering intrinsics (inner and outer edges)"	2014-04-02 00:10:01 -07:00
skal	5fbff3a646	Add strong filtering intrinsics (inner and outer edges) + added some workarounds for gcc-4.6 to make it compile (except one function). + lots of revamping. All variants tested OK. Speed-up is ~5-7%. Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3	2014-04-02 08:28:55 +02:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
James Zern	26029568b7	dec_neon: add strong loopfilter intrinsics. Vertical only currently, 2.5-3% faster; placed under USE_INTRINSICS as this change depends on the simple loopfilter; improves the simple loopfilter slightly thanks to some reorganization. Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155	2014-04-01 01:13:50 -07:00
James Zern	cca7d7ef0f	Merge "add intrinsics version of SimpleHFilter16NEON()"	2014-04-01 00:57:11 -07:00
James Zern	1a05dfa7f5	windows: fix dll builds. WebPSafe* need to be marked external to allow mux/demux to access them through libwebp.dll. Change-Id: Ib6620e00d376f7aa5a0550e1e244f759977f97a0	2014-03-31 17:46:12 -07:00
skal	d6c50d8ac2	Merge "add some colorspace conversion functions in NEON"	2014-03-31 13:15:18 -07:00
Urvang Joshi	4fd7c82e6a	SSE2 variants of Subtract-Green: Rectify loop condition. When 4 pixels are left, they should be processed with SSE2. Decoding is marginally faster (~0.4%). Encoding speed: No observable difference. Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e	2014-03-31 10:51:45 -07:00
skal	97e5fac389	add some colorspace conversion functions in NEON. New file: lossless_neon.c; speedup is ~5%. gcc 4.6.3 seems to be doing some sub-optimal things here, storing registers on the stack using 'vstmia' and such. Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509. I've tried adding -fno-split-wide-types and it does help the generated assembly. But the overall speed gets worse with this flag. We should only compile lossless_neon.c with it -> urk. Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0	2014-03-31 17:47:46 +02:00
skal	b9a7a45f1f	add intrinsics version of SimpleHFilter16NEON(). It's disabled for now, because it crashes gcc-4.6.3 during compilation with -O2 or -O3. It's been tested OK with -O1. Code is still globally disabled with USE_INTRINSICS, though. Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c	2014-03-31 16:31:31 +02:00
Pascal Massimino	daccbf400d	add light filtering NEON intrinsics. Disabled for now (but tested OK), thanks to the USE_INTRINSICS #define. We'll activate the code when we're on par with non-intrinsics. Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff	2014-03-30 22:15:55 -07:00
Pascal Massimino	af44460880	fix typo in STORE_WHT. Was working OK because dst == out. Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd	2014-03-28 10:34:44 -07:00
Vikas Arora	6af6b8e1b6	Tune HistogramCombineBin for large images. Tune HistogramCombineBin for hard images that are larger than 1-2 megapixels and represent photographic images. This speeds up lossless encoding on a 1000-image corpus by 10-12%, with a compression penalty of 0.1-0.2%. Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81	2014-03-28 07:09:59 -07:00
skal	af93bdd6bc	use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free. There's still some malloc/free in the external example. This is an encoder API change because of the introduction of WebPMemoryWriterClear() for symmetry reasons. The MemoryWriter object should probably go in examples/ instead of being in the main lib, though. mux_types.h still contains some inlined free()/malloc() that are harder to remove (we need to put them in the libwebputils lib and make sure link is ok). Left as a TODO for now. Also: WebPDecodeRGB*() functions are still returning a pointer that needs to be free()'d. We should call WebPSafeFree() on these, but it means exposing the whole mechanism. TODO(later). Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b	2014-03-27 15:50:59 -07:00
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto. This is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into their own file. Expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc". No speed diff observed. Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
James Zern	6d2f35273d	dsp/dec: TransformDCUV: use VP8TransformDC rather than forcing the C version; this is similar to TransformUV Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a	2014-03-26 16:43:47 -07:00
skal	defc8e1b01	Merge "fix out-of-bound read during alpha-plane decoding"	2014-03-26 15:22:42 -07:00
James Zern	fbed36433d	Merge "dsp: reuse wht transform from dec in encoder"	2014-03-26 15:13:07 -07:00
skal	d846708400	Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions"	2014-03-26 15:01:46 -07:00
skal	207d03b484	fix out-of-bound read during alpha-plane decoding. With -bypass_filter switched on, the lossless-compressed data is decoded ahead of time (before being transformed and displayed). Hence, the last row was called twice. http://code.google.com/p/webp/issues/detail?id=193 Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10	2014-03-26 22:45:03 +01:00
skal	d1b33ad58b	2-5% faster trellis with clang/MacOS (and ~2-3% on ARM) We don't need to store cost/score for each node, but only for the current and previous one -> simplify code and save some memory. Also made the 'Node' structure tighter. Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6	2014-03-26 22:33:01 +01:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
James Zern	80e218d43a	Android.mk: fix build with APP_ABI=armeabi-v7a-hard added in r9d; relax the check to build neon code Change-Id: Ic52b3fbd3bf53617ee52b07a55b0ed05f6f9b26f	2014-03-20 23:24:39 -07:00
Pascal Massimino	59daf08362	Merge "cosmetics:"	2014-03-18 04:02:33 -07:00
Pascal Massimino	536220084c	cosmetics: - use VP8ScanUV, separate from VP8Scan[] (for luma) - fix indentation - few missing consts - change TrellisQuantizeBlock() signature Change-Id: I94b437d791cbf887015772b5923feb83dd145530	2014-03-18 03:34:56 -07:00
James Zern	3e7f34a3fb	AssignSegments: quiet array-bounds warning. nb (enc->segment_hdr_.num_segments_) will be in the range [1, NUM_MB_SEGMENTS]. Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16	2014-03-14 18:47:52 -07:00
James Zern	3c2ebf58a4	Merge "UpdateHistogramCost: avoid implicit double->float"	2014-03-14 15:50:57 -07:00
James Zern	cf821c821f	UpdateHistogramCost: avoid implicit double->float. All the functions involved return double and later these locals are used in double calculations. Fixes a vs build warning. Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14	2014-03-14 11:18:52 -07:00
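
Commit 4fd7c82e6a above fixes the loop bound of the SSE2 Subtract-Green variants so that a trailing group of exactly 4 pixels takes the vector path instead of the scalar tail. The subtract-green transform itself subtracts the green value from red and blue (modulo 256), leaving alpha and green untouched. The following is a portable C sketch of that loop boundary, not libwebp's code: `SubtractGreenBlock4()` is a hypothetical scalar stand-in for the SSE2 body, and `SubtractGreen()` returning the count of pixels handled by the 4-wide path is an assumption added so the boundary can be checked.

```c
#include <stdint.h>

/* Scalar subtract-green on one ARGB pixel: red -= green and blue -= green,
 * both modulo 256; alpha and green are kept as-is. */
static uint32_t SubtractGreenPixel(uint32_t argb) {
  const uint32_t green = (argb >> 8) & 0xff;
  const uint32_t new_r = (((argb >> 16) & 0xff) - green) & 0xff;
  const uint32_t new_b = ((argb & 0xff) - green) & 0xff;
  return (argb & 0xff00ff00u) | (new_r << 16) | new_b;
}

/* Hypothetical stand-in for the SSE2 body: processes exactly 4 pixels. */
static void SubtractGreenBlock4(uint32_t* pixels) {
  for (int k = 0; k < 4; ++k) pixels[k] = SubtractGreenPixel(pixels[k]);
}

/* Applies subtract-green in place and returns how many pixels went through
 * the 4-wide path. Using '<=' (not '<') means a final group of exactly
 * 4 pixels is still handled by the wide path. */
static int SubtractGreen(uint32_t* argb, int num_pixels) {
  int i;
  for (i = 0; i + 4 <= num_pixels; i += 4) {
    SubtractGreenBlock4(argb + i);
  }
  const int wide_pixels = i;
  for (; i < num_pixels; ++i) {  /* scalar tail: at most 3 pixels */
    argb[i] = SubtractGreenPixel(argb[i]);
  }
  return wide_pixels;
}
```

With the stricter `i + 4 < num_pixels` bound, a 4-pixel row would return 0 here and fall entirely into the scalar tail, which is the case the commit message describes.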

2049 Commits
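
Commit d1b33ad58b in the list above notes that the trellis only needs the cost/score of the current and previous positions, not one per node. The following generic dynamic-programming sketch illustrates that rolling-row idea; it is not libwebp's `TrellisQuantizeBlock()`, and `NUM_NODES`, `Transition()`, the cost table and `BestPathCost()` are all hypothetical. It keeps two score rows and swaps them after each position, so score memory stays constant regardless of sequence length.

```c
#include <limits.h>

/* Hypothetical trellis width: candidate nodes per position. */
#define NUM_NODES 3

/* Hypothetical transition penalty between consecutive nodes. */
static int Transition(int prev_node, int cur_node) {
  return prev_node > cur_node ? prev_node - cur_node : cur_node - prev_node;
}

/* Minimum total cost of a path through `len` positions (len >= 1).
 * node_cost[pos][n] is the cost of picking node n at position pos.
 * Only two score rows are kept: `prev` (position pos - 1) and `cur`. */
static int BestPathCost(const int node_cost[][NUM_NODES], int len) {
  int rows[2][NUM_NODES];
  int* prev = rows[0];
  int* cur = rows[1];
  for (int n = 0; n < NUM_NODES; ++n) prev[n] = node_cost[0][n];

  for (int pos = 1; pos < len; ++pos) {
    for (int n = 0; n < NUM_NODES; ++n) {
      int best = INT_MAX;
      for (int p = 0; p < NUM_NODES; ++p) {
        const int score = prev[p] + Transition(p, n);
        if (score < best) best = score;
      }
      cur[n] = best + node_cost[pos][n];
    }
    /* Swap roles: the row just filled becomes the previous row. */
    int* tmp = prev;
    prev = cur;
    cur = tmp;
  }

  int best = INT_MAX;
  for (int n = 0; n < NUM_NODES; ++n) {
    if (prev[n] < best) best = prev[n];
  }
  return best;
}
```

This sketch only returns the best cost. Recovering the chosen path would still require a per-position back-pointer to the best predecessor; only the scores are rolled.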