libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-03 09:34:30 +02:00

Author	SHA1	Message	Date
Jyrki Alakuijala	f3a7a5bf76	lossless: bit writer optimization. valgrind --tool=callgrind shows a 9 % speedup: 1021201984 ticks before vs. 927917709 after; -q 0 -m 0 -lossless ~/alpi/1.png; 22.040 MP/s before, 24.796 MP/s after. Change-Id: Iaab928167b3e20fb0d9401c6f8317a26c5a610b4	2015-07-20 16:18:40 -07:00
James Zern	d97b9ff755	Merge changes from topic 'lossless-enc-improvements' * changes: lossless: combine the Huffman code with extra bits; lossless: Inlining add literal; lossless: simplify HashChainFindCopy heuristics; lossless: 0.5 % compression density improvement; lossless: Add zeroes into the predicted histograms.; lossless: encoding, don't compute unnecessary histo; lossless: Remove about 25 % of the speed degradation; Faster alpha coding for webp; lossless: rle mode not to accept lengths smaller than 4.; lossless: Less code for the entropy selection; lossless: 0.37 % compression density improvement	2015-07-20 19:38:42 +00:00
James Zern	0250dfcc19	msvc: fix pointer type warning in BitsLog2Floor. _BitScanReverse() takes an unsigned long*: http://msdn.microsoft.com/en-us/library/fbxyd7zd.aspx. Fixes: C4057: 'function': 'unsigned long ' differs in indirection to slightly different base types from 'uint32_t '. Fixes issue #253. Change-Id: I0101ef7be18c7ed188b35e9b17e7f71290953786	2015-07-18 11:12:21 -07:00
Jyrki Alakuijala	52931fd548	lossless: combine the Huffman code with extra bits gives 2 % speedup 24.9 -> 25.5 MP/s for a photo with -q 0 -m 0 Change-Id: If9ae04683a86dd7b1fced2183cf79b9349a24a9e	2015-07-07 20:24:28 -07:00
Jyrki Alakuijala	c4855ca249	lossless: Inlining add literal this is a simple speedup of about 1-2 % Change-Id: I0c7b01c0a69f4aeaf363ffda05a28871f1def696	2015-07-07 20:24:28 -07:00
Jyrki Alakuijala	8e9c94dedb	lossless: simplify HashChainFindCopy heuristics for small speedup 0.0003 % worse compression Change-Id: Ic4b6b21e5279231c6321f2cec1c79f7e17e56afa	2015-07-07 20:24:27 -07:00
Jyrki Alakuijala	888429f409	lossless: 0.5 % compression density improvement. Do not do length 2 matches far away; speedup for non compressible data by inserting two literals at a time when no matches are found. Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3	2015-07-07 20:24:27 -07:00
Jyrki Alakuijala	7b23b19808	lossless: Add zeroes into the predicted histograms. Increases compression density by 0.03 % for lossy. Speeds up at least one of the lossy alpha images by 20 %. Palette entropy 'kludge' seems to save 1-2 % on alpha images. Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804	2015-07-07 20:24:27 -07:00
Jyrki Alakuijala	85b44d8a69	lossless: encoding, don't compute unnecessary histo. Share the computation between different modes: 3-5 % speedup for lossless alpha, 1 % for lossy alpha, no change in compression density. Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2	2015-07-07 20:24:26 -07:00
Jyrki Alakuijala	d92453f381	lossless: Remove about 25 % of the speed degradation introduced in: "lossless: 0.37 % compression density improvement" Uses the statistics of red and blue histograms to decide if to run cross color correction at all. Improves compression density by 0.02 % or so. Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f	2015-07-07 20:24:26 -07:00
Jyrki Alakuijala	2cce031704	Faster alpha coding for webp No significant size degradation (+0.001 %) for 1000 image corpus Fixes the 8 ms vs 2 ms degradation from: "lossless: 0.37 % compression density improvement" Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608	2015-07-07 20:24:25 -07:00
Jyrki Alakuijala	5e75642efd	lossless: rle mode not to accept lengths smaller than 4. Gives a compression gain of 0.22 % Change-Id: I0f3b8dad6b4c1bfb16eab095a467f34466b9e3b7	2015-07-07 20:24:25 -07:00
Jyrki Alakuijala	84326e4ab0	lossless: Less code for the entropy selection Tested: 1000 png corpus gives same results Change-Id: Ief5ea7727290743b9bd893b08af7aa7951f556cb	2015-07-07 20:24:25 -07:00
Jyrki Alakuijala	16ab951abf	lossless: 0.37 % compression density improvement. Counting the entropy expectation for five different configurations: palette; non-predicted; non-predicted with subtract green; predicted; predicted with subtract green; and choose the strategy with the smallest expected entropy. Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960	2015-07-07 20:24:24 -07:00
James Zern	822f113ebb	add WebPFree() to the API. This should be used in preference to free() for releasing memory returned from WebPDecode() / WebPEncode(). This simplifies memory management when working through language bindings. Change-Id: I15eb538a45390efc552fda8e5c251a3fbdc13c29	2015-07-06 23:27:51 -07:00
Pascal Massimino	0ae2c2e4b2	SSE2/SSE41: optimize SSE_16xN loops After several trials at re-organizing the main loop and accumulation scheme, this is apparently the faster variant. removed the SSE41 version, which is no longer faster now. For some reason, the AVX variant seems to benefit most for the change. Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5	2015-07-02 20:55:04 +02:00
James Zern	39216e59d9	cosmetics: fix indent after 32462a07 Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8	2015-07-01 23:49:20 -07:00
James Zern	559e54ca60	Merge "SSE2: slightly faster FTransformWHT"	2015-07-02 06:36:33 +00:00
Pascal Massimino	8ef9a63b45	SSE2: slightly faster FTransformWHT goes from 0.3% to 0.1% overall CPU time, but... Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84	2015-07-01 23:03:17 -07:00
James Zern	f27f773576	lossless_neon: enable VP8LAddGreenToBlueAndRed this moves the function outside the WEBP_USE_INTRINSICS check. there's no alternative version and it's ~70% faster at the function level and 1-2% faster overall Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb	2015-07-01 22:50:54 -07:00
Pascal Massimino	36e9c4bc50	SSE2: minor cosmetics on in-loop filter code Change-Id: Ic0e6502081d7063bb2841df74e05c450d708aaf2	2015-06-28 11:59:22 +02:00
James Zern	4741fac42e	dsp/lossless_*sse2: remove some unnecessary inlines TransformColor / TransformColorInverse are the top-level function pointer calls Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b	2015-06-25 21:02:01 -07:00
Pascal Massimino	1819965e0a	fix warning ("left shift of negative value") using a cast Change-Id: Ie99e8ff87924a1d15e2c5d83bd9adf07dab04e94	2015-06-24 23:46:09 -07:00
Pascal Massimino	7017001462	SSE2: speed-up some lossless-encoding functions optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed overall effect is sub-1% speed-up, though. Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6	2015-06-24 20:09:13 -07:00
Pascal Massimino	abcb012841	Merge "SSE2: slightly faster (~5%) AddGreenToBlueAndRed()"	2015-06-24 09:37:46 +00:00
Pascal Massimino	2df5bd30a6	Merge "Speedup to HuffmanCostCombinedCount"	2015-06-24 07:42:26 +00:00
Pascal Massimino	9e356d6b25	SSE2: slightly faster (~5%) AddGreenToBlueAndRed() Change-Id: Ie147010b66544c4e959f26966ad588394302d418	2015-06-24 09:36:44 +02:00
Pascal Massimino	fc6c75a2a2	SSE2: 53% faster TransformColor[Inverse] Changed the code (again) to process 4 pixels at a time. Loop is more involved, but overall it's faster. Removed the SSE4.1 implementation which is now slower than SSE2. Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1	2015-06-23 14:52:01 -07:00
Pascal Massimino	49073da6d6	SSE2: 46% speed-up of TransformColor[Inverse] Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e	2015-06-23 20:09:04 +02:00
Pascal Massimino	32462a072c	Speedup to HuffmanCostCombinedCount ~3% speedup for lossless encoding Improves compression ratio by ~0.03% Change-Id: Ic6d05fb0b1099b5ca56689b92b1c6515d54a5d6b	2015-06-23 16:41:03 +02:00
Pascal Massimino	f3d687e3fa	SSE4.1 implementation of some lossless encoding functions New implementations: SubtractGreenFromBlueAndRed and TransformColor around 1-2% faster lossless encoding. Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861	2015-06-23 08:46:57 +02:00
Pascal Massimino	bfc300c7ff	SSE4.1 implementation of some alpha-processing functions DispatchAlpha* functions are hard to speed up, compared to SSE2. ExtractAlpha sees a ~15% speed-up though. Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de	2015-06-19 14:17:39 -07:00
Pascal Massimino	7f9c98f21d	Merge "sse2 in-loop: simplify SignedShift8b() a bit"	2015-06-12 07:37:32 +00:00
James Zern	ef314a5d6c	dec_sse2/GetNotHEV: micro optimization trade 2 subtractions + logical or for 1 max + 1 subtraction Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c	2015-06-11 22:46:24 -07:00
Pascal Massimino	a729cff987	sse2 in-loop: simplify SignedShift8b() a bit Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c	2015-06-11 15:26:31 -07:00
Pascal Massimino	422ec9fb62	simplify Load8x4() a bit Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e	2015-06-10 12:35:50 -07:00
James Zern	8df238ec8a	Merge "remove some duplicate FlipSign()"	2015-06-06 05:25:04 +00:00
Pascal Massimino	751506c484	remove some duplicate FlipSign() ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip Change-Id: I2af54bd1499118c8321183e42251d265ba76219c	2015-06-05 17:20:29 +02:00
James Zern	65ef5afc27	Merge "lossless: 0.13% compression density gain"	2015-06-03 03:02:09 +00:00
Jyrki Alakuijala	2beef2f245	lossless: 0.13% compression density gain over a 1000 image corpus. Single photograph benchmark: Before: Q=20: 2.560 MP/s, Q=40: 2.593 MP/s, Q=60: 1.795 MP/s, Q=80: 1.603 MP/s, Q=99: 1.122 MP/s. After: Q=20: 3.334 MP/s, Q=40: 2.464 MP/s, Q=60: 2.009 MP/s, Q=80: 1.871 MP/s, Q=99: 1.163 MP/s. This CL allows for some further improvements that would not be possible otherwise. Change-Id: I61ba154beca2266cb96469281cf96e84a4412586	2015-06-02 17:27:36 -07:00
Pascal Massimino	3033f24c26	lossless: 0.06 % compression density improvement Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790	2015-06-02 14:51:51 +02:00
James Zern	64960da9e1	dec_neon: add VE8uv / VE16 VE8uv/VE16: ~25%/~33% faster over 20M pixels Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14	2015-05-30 13:40:00 -07:00
James Zern	14dbd87bed	dec_neon: add HE8uv / HE16 HE8uv/HE16: ~91%/~83% faster over 20M pixels Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810	2015-05-30 13:39:24 -07:00
skal	ac76801159	introduce FTransform2 to perform two transforms at a time. FTransform goes from ~12.0% to 11.5% total CPU time. Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624	2015-05-18 21:06:15 -07:00
James Zern	aa6065aedd	dec_neon: use vld1_dup(mem) rather than vdup(mem[0]) should result in slightly less general purpose register use Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726	2015-05-16 11:24:32 -07:00
Pascal Massimino	8b63ac78e0	Merge "dec_neon: add TM16"	2015-05-16 10:56:07 +00:00
Pascal Massimino	f51be09e1f	Merge "dec_neon/TrueMotion: simply left border load"	2015-05-16 10:54:05 +00:00
James Zern	dc48196bd9	dec_neon: add TM16 over 20M pixels ~78% faster Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde	2015-05-15 12:50:11 -07:00
James Zern	ea95b305ca	dec_neon/TrueMotion: simply left border load use vld1_dup_u8() rather than a separate ld+dup after the values were zero extended; mildly faster at the function level Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89	2015-05-15 12:48:13 -07:00
Pascal Massimino	f262d6120e	speed-up SetResidualSSE2 (was unnecessarily complicated). Before: VP8SetResidualCoeffs: checksum = 1127918 elapsed = 475 ms. After: VP8SetResidualCoeffs: checksum = 1127918 elapsed = 404 ms. Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd	2015-05-14 21:24:24 -07:00
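
Several entries above tune SubtractGreenFromBlueAndRed and AddGreenToBlueAndRed. As a rough illustration of what that transform pair computes, here is a minimal scalar sketch in C. It assumes 32-bit ARGB pixels with green in bits 8..15; the function names SubtractGreenSketch and AddGreenSketch are hypothetical and this is not the libwebp implementation, which uses SIMD paths the log describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical scalar sketch, not libwebp code.
// Forward transform: subtract the green channel from red and blue,
// modulo 256, leaving alpha and green untouched.
static void SubtractGreenSketch(uint32_t* argb, size_t num_pixels) {
  size_t i;
  assert(argb != NULL || num_pixels == 0);
  for (i = 0; i < num_pixels; ++i) {
    const uint32_t p = argb[i];
    const uint32_t green = (p >> 8) & 0xff;
    const uint32_t red = (((p >> 16) & 0xff) - green) & 0xff;
    const uint32_t blue = ((p & 0xff) - green) & 0xff;
    argb[i] = (p & 0xff00ff00u) | (red << 16) | blue;
  }
}

// Inverse transform: add green back to red and blue, modulo 256.
static void AddGreenSketch(uint32_t* argb, size_t num_pixels) {
  size_t i;
  assert(argb != NULL || num_pixels == 0);
  for (i = 0; i < num_pixels; ++i) {
    const uint32_t p = argb[i];
    const uint32_t green = (p >> 8) & 0xff;
    const uint32_t red = (((p >> 16) & 0xff) + green) & 0xff;
    const uint32_t blue = ((p & 0xff) + green) & 0xff;
    argb[i] = (p & 0xff00ff00u) | (red << 16) | blue;
  }
}
```

Applying AddGreenSketch after SubtractGreenSketch restores the original pixels, since both operate modulo 256 on each channel.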


2012 Commits
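
Commit 0250dfcc19 above notes that _BitScanReverse() takes an unsigned long* rather than a uint32_t*. The sketch below shows one way such a floor-log2 helper might be written so the MSVC path passes the expected pointer type. The name BitsLog2FloorSketch and the portable fallback loop are assumptions for illustration, not the libwebp source.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical sketch, not libwebp code.
// Returns floor(log2(n)); n must be non-zero.
static int BitsLog2FloorSketch(uint32_t n) {
  assert(n != 0);
#if defined(_MSC_VER)
  {
    // The index argument is declared as unsigned long so that the pointer
    // type matches the intrinsic's signature (avoids warning C4057).
    unsigned long first_set_bit;
    _BitScanReverse(&first_set_bit, n);
    return (int)first_set_bit;
  }
#else
  {
    // Portable fallback: count how many times n can be halved.
    int log = 0;
    while (n >>= 1) ++log;
    return log;
  }
#endif
}
```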