this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~54% faster at the
function level and mildly faster overall
Change-Id: Ibc648e9ee35021d48901e05aa596aa01067796a2
a total impact of 1 % on encoding speed
This allows for performance neutral removal of the binary search
in cache bits selection. This will give a small improvement in
compression density.
Change-Id: If5d4d59460fa1924ce71af977320834a47c2054a
0.21 % compression density improvement for 1000 png corpus in
lossless mode
0.50 % compression density improvement for 1000 png corpus in
lossy mode
Change-Id: I14ee8c427ae5d3e116b0ee6695fcdea3321a319d
valgrind --tool=callgrind shows a 9 % speedup: 1021201984 ticks before vs.
927917709 after
-q 0 -m 0 -lossless ~/alpi/1.png
22.040 MP/s before
24.796 MP/s after
Change-Id: Iaab928167b3e20fb0d9401c6f8317a26c5a610b4
* changes:
lossless: combine the Huffman code with extra bits
lossless: Inlining add literal
lossless: simplify HashChainFindCopy heuristics
lossless: 0.5 % compression density improvement
lossless: Add zeroes into the predicted histograms.
lossless: encoding, don't compute unnecessary histo
lossless: Remove about 25 % of the speed degradation
Faster alpha coding for webp
lossless: rle mode not to accept lengths smaller than 4.
lossless: Less code for the entropy selection
lossless: 0.37 % compression density improvement
_BitScanReverse() takes an unsigned long*
http://msdn.microsoft.com/en-us/library/fbxyd7zd.aspx
fixes:
C4057: 'function': 'unsigned long *' differs in indirection to slightly
different base types from 'uint32_t *'
fixes issue #253
Change-Id: I0101ef7be18c7ed188b35e9b17e7f71290953786
do not do length 2 matches far away
speedup for non compressible data by inserting two literals at a time
when no matches are found
Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3
Increases compression density by 0.03 % for lossy.
Speeds up at least one of the lossy alpha images by 20 %.
Palette entropy 'kludge' seems to save 1-2 % on alpha images.
Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
share the computation between different modes
3-5 % speedup for lossless alpha
1 % for lossy alpha
no change in compression density
Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
introduced in:
"lossless: 0.37 % compression density improvement"
Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.
Improves compression density by 0.02 % or so.
Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
No significant size degradation (+0.001 %) for 1000 image corpus
Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"
Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green
and choose the strategy with the smallest expected entropy
Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
this should be used in preference to free() for releasing memory
returned from WebPDecode*() / WebPEncode*(). this simplifies memory
management when working through language bindings
Change-Id: I15eb538a45390efc552fda8e5c251a3fbdc13c29
After several trials at re-organizing the main loop and accumulation scheme,
this is apparently the faster variant.
removed the SSE41 version, which is no longer faster now.
For some reason, the AVX variant seems to benefit most for the change.
Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall
Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
New implementations: SubtractGreenFromBlueAndRed and TransformColor
around 1-2% faster lossless encoding.
Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.
Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
over a 1000 image corpus
Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s
After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s
This CL allows for some further improvements that would not be possible
otherwise.
Change-Id: I61ba154beca2266cb96469281cf96e84a4412586