Pascal Massimino
0ae2c2e4b2
SSE2/SSE41: optimize SSE_16xN loops
...
After several trials at re-organizing the main loop and accumulation scheme,
this is apparently the faster variant.
removed the SSE41 version, which is no longer faster now.
For some reason, the AVX variant seems to benefit most for the change.
Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5
2015-07-02 20:55:04 +02:00
James Zern
39216e59d9
cosmetics: fix indent after 32462a07
...
Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8
2015-07-01 23:49:20 -07:00
James Zern
559e54ca60
Merge "SSE2: slightly faster FTransformWHT"
2015-07-02 06:36:33 +00:00
Pascal Massimino
8ef9a63b45
SSE2: slightly faster FTransformWHT
...
goes from 0.3% to 0.1% overall CPU time, but...
Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84
2015-07-01 23:03:17 -07:00
James Zern
f27f773576
lossless_neon: enable VP8LAddGreenToBlueAndRed
...
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall
Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
2015-07-01 22:50:54 -07:00
Pascal Massimino
36e9c4bc50
SSE2: minor cosmetrics on in-loop filter code
...
Change-Id: Ic0e6502081d7063bb2841df74e05c450d708aaf2
2015-06-28 11:59:22 +02:00
James Zern
4741fac42e
dsp/lossless_*sse2: remove some unnecessary inlines
...
TransformColor / TransformColorInverse are the top-level function
pointer calls
Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b
2015-06-25 21:02:01 -07:00
Pascal Massimino
1819965e0a
fix warning ("left shift of negative value") using a cast
...
Change-Id: Ie99e8ff87924a1d15e2c5d83bd9adf07dab04e94
2015-06-24 23:46:09 -07:00
Pascal Massimino
7017001462
SSE2: speed-up some lossless-encoding functions
...
optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed
overall effect is sub-1% speed-up, though.
Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6
2015-06-24 20:09:13 -07:00
Pascal Massimino
abcb012841
Merge "SSE2: slightly faster (~5%) AddGreenToBlueAndRed()"
2015-06-24 09:37:46 +00:00
Pascal Massimino
2df5bd30a6
Merge "Speedup to HuffmanCostCombinedCount"
2015-06-24 07:42:26 +00:00
Pascal Massimino
9e356d6b25
SSE2: slightly faster (~5%) AddGreenToBlueAndRed()
...
Change-Id: Ie147010b66544c4e959f26966ad588394302d418
2015-06-24 09:36:44 +02:00
Pascal Massimino
fc6c75a2a2
SSE2: 53% faster TransformColor[Inverse]
...
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
2015-06-23 14:52:01 -07:00
Pascal Massimino
49073da6d6
SSE2: 46% speed-up of TransformColor[Inverse]
...
Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e
2015-06-23 20:09:04 +02:00
Pascal Massimino
32462a072c
Speedup to HuffmanCostCombinedCount
...
~3% speedup for lossless encoding
Improves compression ratio by ~0.03%
Change-Id: Ic6d05fb0b1099b5ca56689b92b1c6515d54a5d6b
2015-06-23 16:41:03 +02:00
Pascal Massimino
f3d687e3fa
SSE4.1 implementation of some lossless encoding functions
...
New implementations: SubtractGreenFromBlueAndRed and TransformColor
around 1-2% faster lossless encoding.
Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
2015-06-23 08:46:57 +02:00
Pascal Massimino
bfc300c7ff
SSE4.1 implementation of some alpha-processing functions
...
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.
Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
2015-06-19 14:17:39 -07:00
Pascal Massimino
7f9c98f21d
Merge "sse2 in-loop: simplify SignedShift8b() a bit"
2015-06-12 07:37:32 +00:00
James Zern
ef314a5d6c
dec_sse2/GetNotHEV: micro optimization
...
trade 2 subtractions + logical or for 1 max + 1 subtraction
Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c
2015-06-11 22:46:24 -07:00
Pascal Massimino
a729cff987
sse2 in-loop: simplify SignedShift8b() a bit
...
Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c
2015-06-11 15:26:31 -07:00
Pascal Massimino
422ec9fb62
simplify Load8x4() a bit
...
Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e
2015-06-10 12:35:50 -07:00
James Zern
8df238ec8a
Merge "remove some duplicate FlipSign()"
2015-06-06 05:25:04 +00:00
Pascal Massimino
751506c484
remove some duplicate FlipSign()
...
ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip
Change-Id: I2af54bd1499118c8321183e42251d265ba76219c
2015-06-05 17:20:29 +02:00
James Zern
65ef5afc27
Merge "lossless: 0.13% compression density gain"
2015-06-03 03:02:09 +00:00
Jyrki Alakuijala
2beef2f245
lossless: 0.13% compression density gain
...
over a 1000 image corpus
Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s
After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s
This CL allows for some further improvements that would not be possible
otherwise.
Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
2015-06-02 17:27:36 -07:00
Pascal Massimino
3033f24c26
lossless: 0.06 % compression density improvement
...
Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790
2015-06-02 14:51:51 +02:00
James Zern
64960da9e1
dec_neon: add VE8uv / VE16
...
VE8uv/VE16: ~25%/~33% faster over 20M pixels
Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14
2015-05-30 13:40:00 -07:00
James Zern
14dbd87bed
dec_neon: add HE8uv / HE16
...
HE8uv/HE16: ~91%/~83% faster over 20M pixels
Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810
2015-05-30 13:39:24 -07:00
skal
ac76801159
introduce FTransform2 to perform two transforms at a time.
...
FTransform goes from ~12.0% to 11.5% total CPU time.
Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
aa6065aedd
dec_neon: use vld1_dup(mem) rather than vdup(mem[0])
...
should result in slightly less general purpose register use
Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726
2015-05-16 11:24:32 -07:00
Pascal Massimino
8b63ac78e0
Merge "dec_neon: add TM16"
2015-05-16 10:56:07 +00:00
Pascal Massimino
f51be09e1f
Merge "dec_neon/TrueMotion: simply left border load"
2015-05-16 10:54:05 +00:00
James Zern
dc48196bd9
dec_neon: add TM16
...
over 20M pixels ~78% faster
Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde
2015-05-15 12:50:11 -07:00
James Zern
ea95b305ca
dec_neon/TrueMotion: simply left border load
...
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
2015-05-15 12:48:13 -07:00
Pascal Massimino
f262d6120e
speed-up SetResidualSSE2
...
(was unnecessarily complicated)
Before:
VP8SetResidualCoeffs: checksum = 1127918 elapsed = 475 ms.
Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd
After:
VP8SetResidualCoeffs: checksum = 1127918 elapsed = 404 ms.
2015-05-14 21:24:24 -07:00
James Zern
bf46d0acff
fix mips2 build target
...
tested with mips1 and mips2; this should cover 3/4 as well.
fixes an ftbfs reported on the debian issue tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000
Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c
2015-05-13 10:36:22 -07:00
James Zern
929a0fdccd
enc_sse2/TTransform: simplify abs calculation
...
max(b, 0 - b) works as well as (b ^ sign) - b
Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d
2015-05-08 19:50:29 -07:00
James Zern
17dbd05819
enc_sse2/CollectHistogram: simplify abs calculation
...
max(out, 0 - out) works as well as (out ^ sign) - out
Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909
2015-05-08 19:49:08 -07:00
James Zern
a6c1593645
dec_neon: add DC16 intra predictors
...
improvement over 20M pixels:
DC16: ~77%
DC16NoTop: ~78%
DC16NoLeft: ~83%
DC16NoTopLeft: ~83%
Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99
2015-05-08 00:12:48 -07:00
Urvang Joshi
03b4f50d39
Makefile.vc: add anim_diff build support.
...
Change-Id: Ib5efc5cffea2d906640c81348db26ae28d28d3f1
2015-05-07 12:00:47 -07:00
Pascal Massimino
1b989874a7
Merge changes I9cd84125,Iee7e387f,I7548be72
...
* changes:
dsp/enc_sse2: add luma4 intra predictors
dsp/enc_sse2: add chroma intra predictors
dsp/enc_sse2: add luma16 intra predictors
2015-05-07 11:19:12 +00:00
Urvang Joshi
acd7b5af0f
Introduce a test tool anim_diff.
...
It can be used to test if given pair of animated images (GIF and/or
WebP) are identical in terms of pixel match and other animation
properties.
Change-Id: I84adea145e9d062be6ad06a0d4fcdc9658cf52d4
2015-05-06 17:17:03 -07:00
James Zern
f274a96ce9
dsp/enc_sse2: add luma4 intra predictors
...
VP8EncPredLuma4 improvement over ~20M pixels: ~39%
Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20
2015-05-05 23:51:19 -07:00
James Zern
040b11bdf6
dsp/enc_sse2: add chroma intra predictors
...
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
2015-05-05 23:51:14 -07:00
James Zern
aee021bbb1
dsp/enc_sse2: add luma16 intra predictors
...
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
2015-05-05 23:51:07 -07:00
James Zern
9e00a499a6
makefile.unix: remove superclean target
...
this target is out of date and there are better ways to make a clean
tree (the first 2 versioned, the last one not):
make distclean (when using autoconf)
git clean -fdx
git archive
Change-Id: I766b75e0adf566c6f7db1a087ff486020b031b3a
2015-05-02 11:31:40 -07:00
James Zern
cefc9c0964
makefile.unix: clean up after extras target
...
Change-Id: I3e2d259473db9f3649d18120513f8edcba64c5e6
2015-05-02 11:31:40 -07:00
James Zern
4c9af02326
dec_neon: add DC8uvNoTopLeft
...
~93% faster
Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602
2015-05-01 20:03:57 -07:00
James Zern
dd55b8734a
Merge "doc/webp-container-spec: update repo browser link"
2015-04-30 07:20:58 +00:00
James Zern
f0486968ba
doc/webp-container-spec: update repo browser link
...
gerrit.chromium.org is deprecated, use chromium.googlesource.com.
Change-Id: Iaa6d6d18798dbd8cce908988287387f5cb8e8e64
2015-04-29 23:31:34 -07:00