this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall
Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
New implementations: SubtractGreenFromBlueAndRed and TransformColor
around 1-2% faster lossless encoding.
Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.
Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
over a 1000 image corpus
Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s
After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s
This CL allows for some further improvements that would not be possible
otherwise.
Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
It can be used to test if given pair of animated images (GIF and/or
WebP) are identical in terms of pixel match and other animation
properties.
Change-Id: I84adea145e9d062be6ad06a0d4fcdc9658cf52d4
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
this target is out of date and there are better ways to make a clean
tree (the first 2 versioned, the last one not):
make distclean (when using autoconf)
git clean -fdx
git archive
Change-Id: I766b75e0adf566c6f7db1a087ff486020b031b3a
structured extended feature flags require eax = 7; avoids incorrectly
detecting avx2 on some older processors that support avx.
for completeness also check for value=1 support used by the other
checks.
from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String
[1]
http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html
Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046