libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-05 10:34:32 +02:00

Author	SHA1	Message	Date
James Zern	de693f2502	lossless_neon: disable VP8LConvert* functions due to breakage with NDK/gcc-4.6 builds Change-Id: Id96258e710ee33e08a023354b3227f27da986620	2014-04-04 20:38:29 -07:00
skal	4143332b22	NEON intrinsics for encoding * inverse transform is actually slower with intrinsics + gcc-4.6, so is left disabled for now. With gcc-4.8, it's a bit faster than inlined assembly. * Sum of Square error functions provide a 2-3% speed-up. They're enabled by default (since there's no inlined-asm equivalent) Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a	2014-04-04 15:02:56 -07:00
Djordje Pesut	0ca2914b23	MIPS: MIPS32r1: Add optimization for ITransform Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56	2014-04-04 10:50:35 +02:00
James Zern	71bca5ecf3	dec_neon: use vst_lane instead of vget_lane results in fewer instructions, small speed improvement Change-Id: I98de632d09ff09f295368c0d744cb4397b585084	2014-04-03 14:56:26 -07:00
skal	bf06105293	Intrinsics NEON version of TransformOne + misc cosmetics * seems 4% slower than inlined-asm with gcc-4.6 * is a tad faster (<1%) with gcc-4.8 (disabled for now) Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095	2014-04-03 14:41:56 -07:00
pascal massimino	19c6f1ba74	Merge "dec_neon: use vld?_lane instead of vset?_lane"	2014-04-03 01:16:29 -07:00
James Zern	7a94c0cf75	upsampling_neon: drop NEON suffix from local functions Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a	2014-04-02 23:24:39 -07:00
James Zern	d14669c83c	upsampling_sse2: drop SSE2 suffix from local functions Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297	2014-04-02 23:24:39 -07:00
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
James Zern	d038e6193b	dec_sse2: drop SSE2 suffix from local functions Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882	2014-04-02 23:10:39 -07:00
James Zern	fa52d7525f	dec_neon: use vld?_lane instead of vset?_lane results in fewer instructions, small speed improvement Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a	2014-04-02 23:03:18 -07:00
Pascal Massimino	c520e77d94	cosmetic: fix long line Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea	2014-04-02 23:00:50 -07:00
James Zern	4b0f2dae6f	Merge "add intrinsics NEON code for chroma strong-filtering"	2014-04-02 22:57:44 -07:00
skal	e351ec0759	add intrinsics NEON code for chroma strong-filtering The nice trick is to pack 8 u + 8 v samples into a single uint8x16_t register, and re-use the previous (luma) functions Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38	2014-04-03 06:58:21 +02:00
pascal massimino	aaf734b8b0	Merge "Add SSE2 version of forward cross-color transform"	2014-04-02 14:18:59 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
Vikas Arora	bc374ff39e	Use histogram_bits to initialize transform_bits. This change gains back 1% in compression density for method=3 and 0.5% for method=4, at the expense of 10% slower compression speed. Change-Id: I491aa1c726def934161d4a4377e009737fbeff82	2014-04-02 11:46:40 -07:00
James Zern	2132992d47	Merge "Add strong filtering intrinsics (inner and outer edges)"	2014-04-02 00:10:01 -07:00
skal	5fbff3a646	Add strong filtering intrinsics (inner and outer edges) + added some work-around gcc-4.6 to make it compile (except one function). + lots of revamping All variants tested ok. Speed-up is ~5-7% Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3	2014-04-02 08:28:55 +02:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
James Zern	26029568b7	dec_neon: add strong loopfilter intrinsics vertical only currently, 2.5-3% faster placed under USE_INTRINSICS as this change depends on the simple loopfilter improves the simple loopfilter slightly thanks to some reorganization Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155	2014-04-01 01:13:50 -07:00
James Zern	cca7d7ef0f	Merge "add intrinsics version of SimpleHFilter16NEON()"	2014-04-01 00:57:11 -07:00
James Zern	1a05dfa7f5	windows: fix dll builds WebPSafe* need to be marked external to allow mux/demux to access them through libwebp.dll Change-Id: Ib6620e00d376f7aa5a0550e1e244f759977f97a0	2014-03-31 17:46:12 -07:00
skal	d6c50d8ac2	Merge "add some colorspace conversion functions in NEON"	2014-03-31 13:15:18 -07:00
Urvang Joshi	4fd7c82e6a	SSE2 variants of Subtract-Green: Rectify loop condition When 4 pixels are left, they should be processed with SSE2. Decoding is marginally faster (~0.4%). Encoding speed: No observable difference. Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e	2014-03-31 10:51:45 -07:00
skal	97e5fac389	add some colorspace conversion functions in NEON new file: lossless_neon.c speedup is ~5% gcc 4.6.3 seems to be doing some sub-optimal things here, storing register on stack using 'vstmia' and such. Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509 I've tried adding -fno-split-wide-types and it does help the generated assembly. But the overall speed gets worse with this flag. We should only compile lossless_neon.c with it -> urk. Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0	2014-03-31 17:47:46 +02:00
skal	b9a7a45f1f	add intrinsics version of SimpleHFilter16NEON() It's disabled for now, because it crashes gcc-4.6.3 during compilation with -O2 or -O3. It's been tested OK with -O1. Code is still globally disabled with USE_INTRINSICS, though. Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c	2014-03-31 16:31:31 +02:00
Pascal Massimino	daccbf400d	add light filtering NEON intrinsics disabled for now (but tested OK), thanks to the USE_INTRINSICS #define We'll activate the code when we're on par with non-intrinsics Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff	2014-03-30 22:15:55 -07:00
Pascal Massimino	af44460880	fix typo in STORE_WHT was working ok because dst == out Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd	2014-03-28 10:34:44 -07:00
Vikas Arora	6af6b8e1b6	Tune HistogramCombineBin for large images. Tune HistogramCombineBin for hard images that are larger than 1-2 Mega pixel and represent photographic images. This speeds up lossless encoding on 1000 image corpus by 10-12% and compression penalty of 0.1-0.2%. Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81	2014-03-28 07:09:59 -07:00
skal	af93bdd6bc	use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free there's still some malloc/free in the external example This is an encoder API change because of the introduction of WebPMemoryWriterClear() for symmetry reasons. The MemoryWriter object should probably go in examples/ instead of being in the main lib, though. mux_types.h still contains some inlined free()/malloc() that are harder to remove (we need to put them in the libwebputils lib and make sure link is ok). Left as a TODO for now. Also: WebPDecodeRGB*() functions are still returning a pointer that needs to be free()'d. We should call WebPSafeFree() on these, but it means exposing the whole mechanism. TODO(later). Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b	2014-03-27 15:50:59 -07:00
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
James Zern	6d2f35273d	dsp/dec: TransformDCUV: use VP8TransformDC rather than forcing the C version; this is similar to TransformUV Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a	2014-03-26 16:43:47 -07:00
skal	defc8e1b01	Merge "fix out-of-bound read during alpha-plane decoding"	2014-03-26 15:22:42 -07:00
James Zern	fbed36433d	Merge "dsp: reuse wht transform from dec in encoder"	2014-03-26 15:13:07 -07:00
skal	d846708400	Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions"	2014-03-26 15:01:46 -07:00
skal	207d03b484	fix out-of-bound read during alpha-plane decoding With -bypass_filter switched on, the lossless-compressed data is decoded ahead of time (before being transformed and displayed). Hence, the last row was called twice. http://code.google.com/p/webp/issues/detail?id=193 Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10	2014-03-26 22:45:03 +01:00
skal	d1b33ad58b	2-5% faster trellis with clang/MacOS (and ~2-3% on ARM) We don't need to store cost/score for each node, but only for the current and previous one -> simplify code and save some memory. Also made the 'Node' structure tighter. Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6	2014-03-26 22:33:01 +01:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
Pascal Massimino	59daf08362	Merge "cosmetics:"	2014-03-18 04:02:33 -07:00
Pascal Massimino	536220084c	cosmetics: - use VP8ScanUV, separate from VP8Scan[] (for luma) - fix indentation - few missing consts - change TrellisQuantizeBlock() signature Change-Id: I94b437d791cbf887015772b5923feb83dd145530	2014-03-18 03:34:56 -07:00
James Zern	3e7f34a3fb	AssignSegments: quiet array-bounds warning nb (enc->segment_hdr_.num_segments_) will be in the range [1, NUM_MB_SEGMENTS]. Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16	2014-03-14 18:47:52 -07:00
James Zern	3c2ebf58a4	Merge "UpdateHistogramCost: avoid implicit double->float"	2014-03-14 15:50:57 -07:00
James Zern	cf821c821f	UpdateHistogramCost: avoid implicit double->float all the functions involved return double and later these locals are used in double calculations. fixes a vs build warning Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14	2014-03-14 11:18:52 -07:00
Vikas Arora	312e638f30	Extend the search space for GetBestGreenRedToBlue Get back some of the compression gains by extending the search space for GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not helping much in yielding better compression density. Before: 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations) Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s After: 1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations) Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d	2014-03-14 09:56:00 -07:00
Vikas Arora	1c58526fe1	Fix few nits Add/remove few casts, fixed indentation. Change-Id: Icd141694201843c04e476f09142ce4be6e502dff	2014-03-13 13:57:39 -07:00
Vikas Arora	fef22704ec	Optimize and re-structure VP8LGetHistoImageSymbols Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash for merging the Histograms more efficiently, instead of the randomized heuristic of existing method HistogramCombine. This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50) with 0.8% penalty in compression density. For lower method, the speed up is 25-30%, with 0.4% penalty in the compression density. Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce	2014-03-13 11:48:37 -07:00
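
Commit 4fd7c82e6a above says that when exactly 4 pixels are left, they should go through the SSE2 path. A minimal sketch of that loop-condition point follows, in portable C rather than SSE2. The names SubtractGreenBlock4 and SubtractGreen are hypothetical, the 4-pixel block is a scalar stand-in for one SSE2 step, and none of this is the actual libwebp code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for one 4-pixel SSE2 step.
 * Subtracts the green channel from red and blue (mod 256) in ARGB pixels. */
static void SubtractGreenBlock4(uint32_t* argb) {
  int k;
  for (k = 0; k < 4; ++k) {
    const uint32_t green = (argb[k] >> 8) & 0xff;
    const uint32_t new_r = (((argb[k] >> 16) & 0xff) - green) & 0xff;
    const uint32_t new_b = ((argb[k] & 0xff) - green) & 0xff;
    argb[k] = (argb[k] & 0xff00ff00u) | (new_r << 16) | new_b;
  }
}

/* Processes num_pixels pixels in place and returns how many of them
 * were handled by the 4-wide path. */
static int SubtractGreen(uint32_t* argb, int num_pixels) {
  int i = 0;
  int vectorized;
  assert(num_pixels >= 0);
  /* 'i + 4 <= num_pixels' lets a final group of exactly 4 pixels use the
   * wide path; 'i + 4 < num_pixels' would push it to the scalar tail. */
  for (; i + 4 <= num_pixels; i += 4) SubtractGreenBlock4(argb + i);
  vectorized = i;
  /* Scalar tail for the remaining 0..3 pixels. */
  for (; i < num_pixels; ++i) {
    const uint32_t green = (argb[i] >> 8) & 0xff;
    const uint32_t new_r = (((argb[i] >> 16) & 0xff) - green) & 0xff;
    const uint32_t new_b = ((argb[i] & 0xff) - green) & 0xff;
    argb[i] = (argb[i] & 0xff00ff00u) | (new_r << 16) | new_b;
  }
  return vectorized;
}
```

With this condition, a call on 4 pixels reports 4 pixels on the wide path, and a call on 7 pixels reports 4 with 3 left to the tail.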

1249 Commits
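
Commit d1b33ad58b above notes that the trellis only needs cost/score for the current and previous node. The general idea can be sketched with a toy dynamic program that keeps two rolling cost rows instead of one row per stage. Everything here (MinPathCost, NUM_STATES, the cost tables) is a hypothetical illustration, not libwebp's trellis quantizer, and it returns only the best cost; recovering the path itself would still require per-stage predecessor links.

```c
#include <limits.h>

#define NUM_STATES 3  /* hypothetical number of states per stage */

/* Minimal total cost of a path that picks one state per stage.
 * Only two cost rows are kept: the previous stage and the current one. */
static int MinPathCost(const int stage_cost[][NUM_STATES], int num_stages,
                       const int trans_cost[NUM_STATES][NUM_STATES]) {
  int buf[2][NUM_STATES];
  int* prev = buf[0];
  int* cur = buf[1];
  int n, s, p, best;

  if (num_stages <= 0) return 0;
  for (s = 0; s < NUM_STATES; ++s) prev[s] = stage_cost[0][s];

  for (n = 1; n < num_stages; ++n) {
    int* tmp;
    for (s = 0; s < NUM_STATES; ++s) {
      best = INT_MAX;
      for (p = 0; p < NUM_STATES; ++p) {
        const int c = prev[p] + trans_cost[p][s];
        if (c < best) best = c;
      }
      cur[s] = best + stage_cost[n][s];
    }
    /* Swap roles: the row just filled becomes 'prev' for the next stage. */
    tmp = prev;
    prev = cur;
    cur = tmp;
  }

  best = prev[0];
  for (s = 1; s < NUM_STATES; ++s) {
    if (prev[s] < best) best = prev[s];
  }
  return best;
}
```

Memory use is constant in the number of stages, which is the saving the commit message describes.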