libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-12 05:54:31 +02:00

Author	SHA1	Message	Date
Pascal Massimino	306ce4fde1	rescaler: move the 1x1 or 2x1 handling one level up => no need to handle it in the sub-functions. Change-Id: I4b0211ecfafbc9c80a73bf2206809a13c94e7911	2015-09-25 14:35:35 -07:00
Pascal Massimino	cced974bb2	remove _mm_set_epi64x(), which is too specific Change-Id: I4b1035f9c548b804f31c68a00b0a1aa8e13550bb	2015-09-25 14:35:33 -07:00
Pascal Massimino	56668c9fc5	fix warnings about uint64_t -> uint32_t conversion Change-Id: Iee027979b404d4b7edda506b844d354aa1026dae	2015-09-25 17:36:11 +02:00
Pascal Massimino	76a7dc39e5	rescaler: add some SSE2 code The rounding and arithmetic is not the same as previously, to prevent overflow cases for large upscale factors. We still rely on 32b x 32b -> 64b multiplies. Raised the fixed-point precision to 32b so that we have some nice shifts from epi64 to epi32. Changed rescaler_t type to 'uint32_t' in order to squeeze in all the precision required. The MIPS code has been disabled because it's now out-of-sync. Will be fixed in a subsequent CL when the dust settles. ~30-35% faster Change-Id: I32e4ddc00933f1b1aa3463403086199fd5dad07b	2015-09-25 15:07:13 +02:00
James Zern	1df1d0eedb	rescaler: harmonize function protos Change-Id: I13b5f9add83c1225c82a650f3ef717582b057247	2015-09-19 22:57:25 -07:00
Pascal Massimino	9ba1894b9b	rescaler: simplify ImportRow logic incorporates the loop over 'channel' and removes one parameter Change-Id: I4e3b33c111ca825fe96461583420413b17326409	2015-09-19 10:07:26 -07:00
Pascal Massimino	5ff0079ece	fix rescaler vertical interpolation * vertical expansion now uses bilinear interpolation * heavily assumes that the alpha plane is decoded in full, not row-by-row * split the RescalerExportRow and RescalerImportRow methods into Shrink and Expand variants. * MIPS implementation of ExportRowExpand is missing. There's room for extra speed optim and code re-org, but let's keep that for later patches. addresses https://code.google.com/p/webp/issues/detail?id=254 Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb	2015-09-18 17:32:11 -07:00
James Zern	d623a8706f	dec_neon: add whitespace around stringizing operator prevents unintentional side-effects (though unlikely in this case) with future compilers, cf: eebaf97 dsp/mips: add whitespace around stringizing operator Change-Id: I0537091fcc97b4f54d0a156c3c83a28c51456b17	2015-09-03 23:13:56 -07:00
James Zern	29377d55b6	dsp/mips: cosmetics: add whitespace around XSTR macro normalizes formatting after: eebaf97 dsp/mips: add whitespace around stringizing operator Change-Id: I1e3986b6d08195d79072747eb99d7e0549aece72	2015-09-03 23:09:13 -07:00
James Zern	eebaf97f5a	dsp/mips: add whitespace around stringizing operator fixes compile with gcc 5.1 BUG=259 Change-Id: Ideb39c6290ab8569b1b6cc835bea11c822d0286c	2015-09-02 23:21:13 -07:00
James Zern	14efabbf1c	Android: limit use of cpufeatures cpufeatures is only used with armeabi-v7a.* Change-Id: I80284061d71d9defa50d139c7f1bda67c00f567e	2015-08-19 18:44:33 -07:00
skal	bd55604d1b	SSE2: add yuv444 converters, re-using yuv_sse2.c Change-Id: I4d5c9df8a4c8e8cb8b5daa537af07382894503a8	2015-08-17 21:15:37 -07:00
James Zern	155c1b222b	Merge changes I76f4d6fe,I45434639 * changes: lossless_enc_neon: add VP8LTransformColor lossless_neon: add VP8LTransformColorInverse	2015-08-06 23:00:03 +00:00
Djordje Pesut	717e4d5a7c	mips32/mipsDSPr2: function ImportRow rebased Change-Id: Id58d266040fdb5fe1e507cd0f6370ea625156e4d	2015-08-06 17:09:10 +02:00
Pascal Massimino	7df93893dc	fix rescaling bug (uninitialized read, see bug #254). the x_add/x_sub increments were wrong for u/v in the upscaling case. They shouldn't be left to the caller's discretion, but set up by WebPRescalerInit to their exact necessary values. -> Cleaned-up WebPRescalerInit() param list. -> added safety asserts -> removed the mips32/mips_r2 variants of "ImportRow", which were buggy beforehand Change-Id: I347c75804d835811e7025de92a0758d7929dfc09	2015-08-05 23:00:00 -07:00
James Zern	5cdcd561e2	lossless_enc_neon: add VP8LTransformColor based on SSE2, ~32% faster Change-Id: I76f4d6fe456baceba46ffebf2f699e98691eefdf	2015-08-05 00:15:13 -07:00
James Zern	a53c336919	lossless_neon: add VP8LTransformColorInverse based on SSE2, only ~11% faster Change-Id: I45434639d81e153f01f77c1f5d2da510b542170e	2015-08-04 23:22:36 -07:00
James Zern	99131e7f8c	Merge changes I9fb25a89,Ibc648e9e * changes: lossless_neon: remove predictors 5-13 ll_enc_neon: enable VP8LSubtractGreenFromBlueAndRed	2015-08-04 02:24:15 +00:00
Pascal Massimino	c455676680	simplify the main loop for downscaling (part of bug #254 investigation) no speed change observed. Change-Id: Ie21b33171def367f37643fef6a0bd378e49468c7	2015-08-03 16:57:35 +02:00
James Zern	2a010f992a	lossless_neon: remove predictors 5-13 operating on single uint32's isn't helped by NEON. this improves aarch64 performance by ~4% Change-Id: I9fb25a8962de7b80e893e756ee7c76393cfd40c7	2015-07-28 19:44:58 -07:00
James Zern	ca221bbc48	ll_enc_neon: enable VP8LSubtractGreenFromBlueAndRed this moves the function outside the WEBP_USE_INTRINSICS check. there's no alternative version and it's ~54% faster at the function level and mildly faster overall Change-Id: Ibc648e9ee35021d48901e05aa596aa01067796a2	2015-07-28 19:44:45 -07:00
Jyrki Alakuijala	85b44d8a69	lossless: encoding, don't compute unnecessary histo share the computation between different modes 3-5 % speedup for lossless alpha 1 % for lossy alpha no change in compression density Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2	2015-07-07 20:24:26 -07:00
Pascal Massimino	0ae2c2e4b2	SSE2/SSE41: optimize SSE_16xN loops After several trials at re-organizing the main loop and accumulation scheme, this is apparently the fastest variant. Removed the SSE41 version, which is no longer faster now. For some reason, the AVX variant seems to benefit most from the change. Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5	2015-07-02 20:55:04 +02:00
James Zern	39216e59d9	cosmetics: fix indent after 32462a07 Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8	2015-07-01 23:49:20 -07:00
James Zern	559e54ca60	Merge "SSE2: slightly faster FTransformWHT"	2015-07-02 06:36:33 +00:00
Pascal Massimino	8ef9a63b45	SSE2: slightly faster FTransformWHT goes from 0.3% to 0.1% overall CPU time, but... Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84	2015-07-01 23:03:17 -07:00
James Zern	f27f773576	lossless_neon: enable VP8LAddGreenToBlueAndRed this moves the function outside the WEBP_USE_INTRINSICS check. there's no alternative version and it's ~70% faster at the function level and 1-2% faster overall Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb	2015-07-01 22:50:54 -07:00
Pascal Massimino	36e9c4bc50	SSE2: minor cosmetics on in-loop filter code Change-Id: Ic0e6502081d7063bb2841df74e05c450d708aaf2	2015-06-28 11:59:22 +02:00
James Zern	4741fac42e	dsp/lossless_*sse2: remove some unnecessary inlines TransformColor / TransformColorInverse are the top-level function pointer calls Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b	2015-06-25 21:02:01 -07:00
Pascal Massimino	1819965e0a	fix warning ("left shift of negative value") using a cast Change-Id: Ie99e8ff87924a1d15e2c5d83bd9adf07dab04e94	2015-06-24 23:46:09 -07:00
Pascal Massimino	7017001462	SSE2: speed-up some lossless-encoding functions optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed overall effect is sub-1% speed-up, though. Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6	2015-06-24 20:09:13 -07:00
Pascal Massimino	abcb012841	Merge "SSE2: slightly faster (~5%) AddGreenToBlueAndRed()"	2015-06-24 09:37:46 +00:00
Pascal Massimino	2df5bd30a6	Merge "Speedup to HuffmanCostCombinedCount"	2015-06-24 07:42:26 +00:00
Pascal Massimino	9e356d6b25	SSE2: slightly faster (~5%) AddGreenToBlueAndRed() Change-Id: Ie147010b66544c4e959f26966ad588394302d418	2015-06-24 09:36:44 +02:00
Pascal Massimino	fc6c75a2a2	SSE2: 53% faster TransformColor[Inverse] Changed the code (again) to process 4 pixels at a time. Loop is more involved, but overall it's faster. Removed the SSE4.1 implementation which is now slower than SSE2. Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1	2015-06-23 14:52:01 -07:00
Pascal Massimino	49073da6d6	SSE2: 46% speed-up of TransformColor[Inverse] Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e	2015-06-23 20:09:04 +02:00
Pascal Massimino	32462a072c	Speedup to HuffmanCostCombinedCount ~3% speedup for lossless encoding Improves compression ratio by ~0.03% Change-Id: Ic6d05fb0b1099b5ca56689b92b1c6515d54a5d6b	2015-06-23 16:41:03 +02:00
Pascal Massimino	f3d687e3fa	SSE4.1 implementation of some lossless encoding functions New implementations: SubtractGreenFromBlueAndRed and TransformColor around 1-2% faster lossless encoding. Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861	2015-06-23 08:46:57 +02:00
Pascal Massimino	bfc300c7ff	SSE4.1 implementation of some alpha-processing functions DispatchAlpha* functions are hard to speed up, compared to SSE2. ExtractAlpha sees a ~15% speed-up though. Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de	2015-06-19 14:17:39 -07:00
Pascal Massimino	7f9c98f21d	Merge "sse2 in-loop: simplify SignedShift8b() a bit"	2015-06-12 07:37:32 +00:00
James Zern	ef314a5d6c	dec_sse2/GetNotHEV: micro optimization trade 2 subtractions + logical or for 1 max + 1 subtraction Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c	2015-06-11 22:46:24 -07:00
Pascal Massimino	a729cff987	sse2 in-loop: simplify SignedShift8b() a bit Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c	2015-06-11 15:26:31 -07:00
Pascal Massimino	422ec9fb62	simplify Load8x4() a bit Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e	2015-06-10 12:35:50 -07:00
James Zern	8df238ec8a	Merge "remove some duplicate FlipSign()"	2015-06-06 05:25:04 +00:00
Pascal Massimino	751506c484	remove some duplicate FlipSign() ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip Change-Id: I2af54bd1499118c8321183e42251d265ba76219c	2015-06-05 17:20:29 +02:00
James Zern	65ef5afc27	Merge "lossless: 0.13% compression density gain"	2015-06-03 03:02:09 +00:00
Jyrki Alakuijala	2beef2f245	lossless: 0.13% compression density gain over a 1000 image corpus Single photograph benchmark: Before: Q=20: 2.560 MP/s Q=40: 2.593 MP/s Q=60: 1.795 MP/s Q=80: 1.603 MP/s Q=99: 1.122 MP/s After: Q=20: 3.334 MP/s Q=40: 2.464 MP/s Q=60: 2.009 MP/s Q=80: 1.871 MP/s Q=99: 1.163 MP/s This CL allows for some further improvements that would not be possible otherwise. Change-Id: I61ba154beca2266cb96469281cf96e84a4412586	2015-06-02 17:27:36 -07:00
Pascal Massimino	3033f24c26	lossless: 0.06 % compression density improvement Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790	2015-06-02 14:51:51 +02:00
James Zern	64960da9e1	dec_neon: add VE8uv / VE16 VE8uv/VE16: ~25%/~33% faster over 20M pixels Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14	2015-05-30 13:40:00 -07:00
James Zern	14dbd87bed	dec_neon: add HE8uv / HE16 HE8uv/HE16: ~91%/~83% faster over 20M pixels Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810	2015-05-30 13:39:24 -07:00
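
Several entries above (ca221bbc48, f27f773576, 7017001462, 9e356d6b25) concern SIMD versions of the subtract-green transform and its inverse. For orientation, here is a minimal scalar sketch in C of what such a pair of functions computes. It assumes pixels are packed as 0xAARRGGBB in a `uint32_t`; the function names `SubtractGreenSketch` and `AddGreenSketch` are hypothetical and are not the library's actual symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward transform (sketch): subtract the green channel from red and blue,
 * modulo 256. Alpha and green are left untouched.
 * Assumes 0xAARRGGBB packing; names are hypothetical. */
void SubtractGreenSketch(uint32_t* argb, int num_pixels) {
  int i;
  assert(argb != NULL || num_pixels == 0);
  for (i = 0; i < num_pixels; ++i) {
    const uint32_t px = argb[i];
    const uint32_t green = (px >> 8) & 0xff;
    const uint32_t new_r = (((px >> 16) & 0xff) - green) & 0xff;
    const uint32_t new_b = ((px & 0xff) - green) & 0xff;
    argb[i] = (px & 0xff00ff00u) | (new_r << 16) | new_b;
  }
}

/* Inverse transform (sketch): add green back to red and blue, modulo 256. */
void AddGreenSketch(uint32_t* argb, int num_pixels) {
  int i;
  assert(argb != NULL || num_pixels == 0);
  for (i = 0; i < num_pixels; ++i) {
    const uint32_t px = argb[i];
    const uint32_t green = (px >> 8) & 0xff;
    const uint32_t new_r = (((px >> 16) & 0xff) + green) & 0xff;
    const uint32_t new_b = ((px & 0xff) + green) & 0xff;
    argb[i] = (px & 0xff00ff00u) | (new_r << 16) | new_b;
  }
}
```

Because each pixel is processed independently with only shifts, masks and byte-wise wraparound arithmetic, the loop lends itself to handling several pixels per SIMD register, which is presumably why it shows up repeatedly in the NEON and SSE2 entries.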


579 Commits
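
The dec_sse2/GetNotHEV entry (ef314a5d6c) describes trading "2 subtractions + logical or for 1 max + 1 subtraction". One way such a trade can work is sketched below in scalar C. It assumes the two inputs are unsigned bytes tested against a shared threshold with saturating subtraction; the actual operands in the SSE2 code are not shown in the log, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating subtraction on one byte: max(a - b, 0). */
uint8_t SatSubU8(uint8_t a, uint8_t b) {
  return (a > b) ? (uint8_t)(a - b) : 0;
}

uint8_t MaxU8(uint8_t a, uint8_t b) {
  return (a > b) ? a : b;
}

/* Variant 1: two saturating subtractions combined with a logical OR.
 * Returns 1 if d0 or d1 is strictly above thresh. */
int ExceedsTwoSubs(uint8_t d0, uint8_t d1, uint8_t thresh) {
  return (SatSubU8(d0, thresh) | SatSubU8(d1, thresh)) != 0;
}

/* Variant 2: one max followed by a single saturating subtraction.
 * Same result, one fewer operation. */
int ExceedsMaxSub(uint8_t d0, uint8_t d1, uint8_t thresh) {
  return SatSubU8(MaxU8(d0, d1), thresh) != 0;
}

/* Exhaustively checks that both variants agree for every byte triple. */
void CheckVariantsAgree(void) {
  int d0, d1, t;
  for (d0 = 0; d0 < 256; ++d0) {
    for (d1 = 0; d1 < 256; ++d1) {
      for (t = 0; t < 256; ++t) {
        assert(ExceedsTwoSubs((uint8_t)d0, (uint8_t)d1, (uint8_t)t) ==
               ExceedsMaxSub((uint8_t)d0, (uint8_t)d1, (uint8_t)t));
      }
    }
  }
}
```

The equivalence holds because either input exceeds the threshold exactly when the larger of the two does. In a vectorized version the same reasoning would apply lane by lane.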