525 Commits

Author SHA1 Message Date
James Zern
1df1d0eedb rescaler: harmonize function protos
Change-Id: I13b5f9add83c1225c82a650f3ef717582b057247
2015-09-19 22:57:25 -07:00
Pascal Massimino
9ba1894b9b rescaler: simplify ImportRow logic
incorporates the loop over 'channel' and removes one parameter

Change-Id: I4e3b33c111ca825fe96461583420413b17326409
2015-09-19 10:07:26 -07:00
Pascal Massimino
5ff0079ece fix rescaler vertical interpolation
* vertical expansion now uses bilinear interpolation
  * heavily assumes that the alpha plane is decoded in full, not row-by-row
  * split the RescalerExportRow and RescalerImportRow methods into Shrink
    and Expand variants.
  * MIPS implementation of ExportRowExpand is missing.

There's room for extra speed optim and code re-org, but let's keep that for later patches.

addresses https://code.google.com/p/webp/issues/detail?id=254

Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
2015-09-18 17:32:11 -07:00
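
The commit above says vertical expansion now uses bilinear interpolation. As a rough illustration only, here is a minimal single-channel sketch of that idea. The function name, the 16-bit fixed-point precision, and the row-mapping rule (first and last rows coincide) are assumptions made for this example; this is not libwebp's rescaler code or API.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 16  // fixed-point precision, chosen for this sketch

// Hypothetical helper: vertically upscale a single-channel image from
// src_height rows to dst_height rows, blending the two nearest source rows.
static void ExpandVerticalBilinear(const uint8_t* src, int width,
                                   int src_height, uint8_t* dst,
                                   int dst_height) {
  const uint32_t one = 1u << FRAC_BITS;
  int x, y;
  assert(width > 0 && src_height > 0 && src_height < 65536);
  assert(dst_height >= src_height);
  for (y = 0; y < dst_height; ++y) {
    // Fixed-point source position of destination row y.
    const uint32_t pos = (dst_height > 1)
        ? (uint32_t)((((uint64_t)y * (src_height - 1)) << FRAC_BITS) /
                     (dst_height - 1))
        : 0;
    const int y0 = (int)(pos >> FRAC_BITS);
    const int y1 = (y0 + 1 < src_height) ? y0 + 1 : y0;  // clamp at bottom
    const uint32_t frac = pos & (one - 1);
    const uint8_t* const row0 = src + y0 * width;
    const uint8_t* const row1 = src + y1 * width;
    for (x = 0; x < width; ++x) {
      // Weighted average of the two rows; the weights sum to 'one'.
      const uint64_t v = (uint64_t)row0[x] * (one - frac) +
                         (uint64_t)row1[x] * frac;
      dst[y * width + x] = (uint8_t)((v + (one >> 1)) >> FRAC_BITS);
    }
  }
}
```

Expanding a 2-row image to 3 rows with this sketch yields a middle row that is the average of the two source rows. Because every output row needs the row below it, the whole source plane has to be available, which is consistent with the commit's note that the alpha plane is assumed to be decoded in full.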
James Zern
d623a8706f dec_neon: add whitespace around stringizing operator
prevents unintentional side-effects (though unlikely in this case) with
future compilers, cf:
eebaf97 dsp/mips: add whitespace around stringizing operator

Change-Id: I0537091fcc97b4f54d0a156c3c83a28c51456b17
2015-09-03 23:13:56 -07:00
James Zern
29377d55b6 dsp/mips: cosmetics: add whitespace around XSTR macro
normalizes formatting after:
eebaf97 dsp/mips: add whitespace around stringizing operator

Change-Id: I1e3986b6d08195d79072747eb99d7e0549aece72
2015-09-03 23:09:13 -07:00
James Zern
eebaf97f5a dsp/mips: add whitespace around stringizing operator
fixes compile with gcc 5.1
BUG=259

Change-Id: Ideb39c6290ab8569b1b6cc835bea11c822d0286c
2015-09-02 23:21:13 -07:00
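
The commit above adds whitespace between string literals and a stringizing macro. The sketch below only shows the spacing convention. The macro pair, the constant, and the instruction text are hypothetical and are not taken from the dsp/mips sources. The commit does not spell out the compile failure; one plausible reading (an assumption here) is that newer language modes can treat an identifier glued to a string literal as a literal suffix rather than as a macro call.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x          // stringizes its argument as written
#define XSTR(x) STR(x)     // expands the argument first, then stringizes
#define ROW_STRIDE 32      // hypothetical constant for this example

// Returns an assembly-like line built by adjacent string literal
// concatenation. Note the spaces on both sides of XSTR(...): writing
// "lw $t0, "XSTR(ROW_STRIDE)"($a0)\n" without them is the pattern the
// commit moves away from.
static const char* BuildLoadInsn(void) {
  static const char kInsn[] = "lw $t0, " XSTR(ROW_STRIDE) "($a0)\n";
  assert(kInsn[sizeof(kInsn) - 2] == '\n');  // sanity check: newline last
  return kInsn;
}
```

With the definitions above, BuildLoadInsn() returns the text `lw $t0, 32($a0)` followed by a newline.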
James Zern
14efabbf1c Android: limit use of cpufeatures
cpufeatures is only used with armeabi-v7a.*

Change-Id: I80284061d71d9defa50d139c7f1bda67c00f567e
2015-08-19 18:44:33 -07:00
skal
bd55604d1b SSE2: add yuv444 converters, re-using yuv_sse2.c
Change-Id: I4d5c9df8a4c8e8cb8b5daa537af07382894503a8
2015-08-17 21:15:37 -07:00
James Zern
155c1b222b Merge changes I76f4d6fe,I45434639
* changes:
  lossless_enc_neon: add VP8LTransformColor
  lossless_neon: add VP8LTransformColorInverse
2015-08-06 23:00:03 +00:00
Djordje Pesut
717e4d5a7c mips32/mipsDSPr2: function ImportRow rebased
Change-Id: Id58d266040fdb5fe1e507cd0f6370ea625156e4d
2015-08-06 17:09:10 +02:00
Pascal Massimino
7df93893dc fix rescaling bug (uninitialized read, see bug #254).
the x_add/x_sub increments were wrong for u/v in the upscaling case.
They shouldn't be left to the caller's discretion, but set up by
WebPRescalerInit to their exact necessary values.

-> Cleaned-up WebPRescalerInit() param list.
-> added safety asserts
-> removed the mips32/mips_r2 variant of "ImportRow" which were buggy prior

Change-Id: I347c75804d835811e7025de92a0758d7929dfc09
2015-08-05 23:00:00 -07:00
James Zern
5cdcd561e2 lossless_enc_neon: add VP8LTransformColor
based on SSE2, ~32% faster

Change-Id: I76f4d6fe456baceba46ffebf2f699e98691eefdf
2015-08-05 00:15:13 -07:00
James Zern
a53c336919 lossless_neon: add VP8LTransformColorInverse
based on SSE2, only ~11% faster

Change-Id: I45434639d81e153f01f77c1f5d2da510b542170e
2015-08-04 23:22:36 -07:00
James Zern
99131e7f8c Merge changes I9fb25a89,Ibc648e9e
* changes:
  lossless_neon: remove predictors 5-13
  ll_enc_neon: enable VP8LSubtractGreenFromBlueAndRed
2015-08-04 02:24:15 +00:00
Pascal Massimino
c455676680 simplify the main loop for downscaling
(part of bug #254 investigation)

no speed change observed.

Change-Id: Ie21b33171def367f37643fef6a0bd378e49468c7
2015-08-03 16:57:35 +02:00
James Zern
2a010f992a lossless_neon: remove predictors 5-13
operating on single uint32's isn't helped by NEON.
this improves aarch64 performance by ~4%

Change-Id: I9fb25a8962de7b80e893e756ee7c76393cfd40c7
2015-07-28 19:44:58 -07:00
James Zern
ca221bbc48 ll_enc_neon: enable VP8LSubtractGreenFromBlueAndRed
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~54% faster at the
function level and mildly faster overall

Change-Id: Ibc648e9ee35021d48901e05aa596aa01067796a2
2015-07-28 19:44:45 -07:00
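
For readers unfamiliar with the function named in the commit above, here is a scalar sketch of what a subtract-green step and its inverse might look like on packed ARGB pixels. The function names are hypothetical, and the pixel layout (alpha in the top byte, then red, green, blue) is an assumption; the NEON version in the commit processes several pixels per instruction and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar reference: subtract the green channel from red and
// blue, modulo 256. Alpha and green are left untouched.
static void SubtractGreenSketch(uint32_t* argb, int num_pixels) {
  int i;
  assert(num_pixels >= 0);
  for (i = 0; i < num_pixels; ++i) {
    const uint32_t pix = argb[i];
    const uint32_t green = (pix >> 8) & 0xff;
    const uint32_t red = (((pix >> 16) & 0xff) - green) & 0xff;
    const uint32_t blue = ((pix & 0xff) - green) & 0xff;
    argb[i] = (pix & 0xff00ff00u) | (red << 16) | blue;
  }
}

// Inverse of the above: add green back to red and blue, modulo 256.
static void AddGreenSketch(uint32_t* argb, int num_pixels) {
  int i;
  assert(num_pixels >= 0);
  for (i = 0; i < num_pixels; ++i) {
    const uint32_t pix = argb[i];
    const uint32_t green = (pix >> 8) & 0xff;
    const uint32_t red = (((pix >> 16) & 0xff) + green) & 0xff;
    const uint32_t blue = ((pix & 0xff) + green) & 0xff;
    argb[i] = (pix & 0xff00ff00u) | (red << 16) | blue;
  }
}
```

Applying SubtractGreenSketch and then AddGreenSketch to the same buffer restores the original pixels, since both steps work modulo 256 on each channel.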
Jyrki Alakuijala
85b44d8a69 lossless: encoding, don't compute unnecessary histo
share the computation between different modes

3-5 % speedup for lossless alpha
1 % for lossy alpha

no change in compression density

Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
2015-07-07 20:24:26 -07:00
Pascal Massimino
0ae2c2e4b2 SSE2/SSE41: optimize SSE_16xN loops
After several trials at re-organizing the main loop and accumulation scheme,
this is apparently the faster variant.

removed the SSE41 version, which is no longer faster now.
For some reason, the AVX variant seems to benefit most for the change.

Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5
2015-07-02 20:55:04 +02:00
James Zern
39216e59d9 cosmetics: fix indent after 32462a07
Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8
2015-07-01 23:49:20 -07:00
James Zern
559e54ca60 Merge "SSE2: slightly faster FTransformWHT"
2015-07-02 06:36:33 +00:00
Pascal Massimino
8ef9a63b45 SSE2: slightly faster FTransformWHT
goes from 0.3% to 0.1% overall CPU time, but...

Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84
2015-07-01 23:03:17 -07:00
James Zern
f27f773576 lossless_neon: enable VP8LAddGreenToBlueAndRed
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall

Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
2015-07-01 22:50:54 -07:00
Pascal Massimino
36e9c4bc50 SSE2: minor cosmetics on in-loop filter code
Change-Id: Ic0e6502081d7063bb2841df74e05c450d708aaf2
2015-06-28 11:59:22 +02:00
James Zern
4741fac42e dsp/lossless_*sse2: remove some unnecessary inlines
TransformColor / TransformColorInverse are the top-level function
pointer calls

Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b
2015-06-25 21:02:01 -07:00
Pascal Massimino
1819965e0a fix warning ("left shift of negative value") using a cast
Change-Id: Ie99e8ff87924a1d15e2c5d83bd9adf07dab04e94
2015-06-24 23:46:09 -07:00
Pascal Massimino
7017001462 SSE2: speed-up some lossless-encoding functions
optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed

overall effect is sub-1% speed-up, though.

Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6
2015-06-24 20:09:13 -07:00
Pascal Massimino
abcb012841 Merge "SSE2: slightly faster (~5%) AddGreenToBlueAndRed()"
2015-06-24 09:37:46 +00:00
Pascal Massimino
2df5bd30a6 Merge "Speedup to HuffmanCostCombinedCount"
2015-06-24 07:42:26 +00:00
Pascal Massimino
9e356d6b25 SSE2: slightly faster (~5%) AddGreenToBlueAndRed()
Change-Id: Ie147010b66544c4e959f26966ad588394302d418
2015-06-24 09:36:44 +02:00
Pascal Massimino
fc6c75a2a2 SSE2: 53% faster TransformColor[Inverse]
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.

Removed the SSE4.1 implementation which is now slower than SSE2.

Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
2015-06-23 14:52:01 -07:00
Pascal Massimino
49073da6d6 SSE2: 46% speed-up of TransformColor[Inverse]
Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e
2015-06-23 20:09:04 +02:00
Pascal Massimino
32462a072c Speedup to HuffmanCostCombinedCount
~3% speedup for lossless encoding
Improves compression ratio by ~0.03%

Change-Id: Ic6d05fb0b1099b5ca56689b92b1c6515d54a5d6b
2015-06-23 16:41:03 +02:00
Pascal Massimino
f3d687e3fa SSE4.1 implementation of some lossless encoding functions
New implementations: SubtractGreenFromBlueAndRed and TransformColor

around 1-2% faster lossless encoding.

Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
2015-06-23 08:46:57 +02:00
Pascal Massimino
bfc300c7ff SSE4.1 implementation of some alpha-processing functions
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.

Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
2015-06-19 14:17:39 -07:00
Pascal Massimino
7f9c98f21d Merge "sse2 in-loop: simplify SignedShift8b() a bit"
2015-06-12 07:37:32 +00:00
James Zern
ef314a5d6c dec_sse2/GetNotHEV: micro optimization
trade 2 subtractions + logical or for 1 max + 1 subtraction

Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c
2015-06-11 22:46:24 -07:00
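
The trade described in the commit above (two subtractions plus a logical or, replaced by one max plus one subtraction) can be checked with a scalar model. This is a sketch under assumptions: it models one unsigned 8-bit lane with saturating subtraction, in the spirit of SSE2 operations such as _mm_subs_epu8 and _mm_max_epu8, and treats the test as "either absolute difference exceeds the threshold". The function names are hypothetical and the actual GetNotHEV code is not reproduced.

```c
#include <assert.h>

// Saturating subtraction on one unsigned 8-bit lane: max(a - b, 0).
static int SatSubU8(int a, int b) {
  assert(a >= 0 && a <= 255 && b >= 0 && b <= 255);
  return (a > b) ? a - b : 0;
}

// Variant 1: two subtractions and an or. Non-zero result means at least
// one of the two differences is above the threshold.
static int HevTwoSubs(int d_p, int d_q, int thresh) {
  return (SatSubU8(d_p, thresh) | SatSubU8(d_q, thresh)) != 0;
}

// Variant 2: one max and one subtraction. The larger difference exceeds
// the threshold exactly when at least one of them does.
static int HevMaxSub(int d_p, int d_q, int thresh) {
  const int d_max = (d_p > d_q) ? d_p : d_q;
  return SatSubU8(d_max, thresh) != 0;
}

// Exhaustively compares both variants over all 8-bit inputs.
static int CountMismatches(void) {
  int d_p, d_q, thresh, mismatches = 0;
  for (thresh = 0; thresh < 256; ++thresh) {
    for (d_p = 0; d_p < 256; ++d_p) {
      for (d_q = 0; d_q < 256; ++d_q) {
        mismatches += (HevTwoSubs(d_p, d_q, thresh) !=
                       HevMaxSub(d_p, d_q, thresh));
      }
    }
  }
  return mismatches;
}
```

CountMismatches() returns 0 because max(a, b) > t holds exactly when a > t or b > t, so the two variants agree on every input while the second uses one fewer operation per lane.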
Pascal Massimino
a729cff987 sse2 in-loop: simplify SignedShift8b() a bit
Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c
2015-06-11 15:26:31 -07:00
Pascal Massimino
422ec9fb62 simplify Load8x4() a bit
Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e
2015-06-10 12:35:50 -07:00
James Zern
8df238ec8a Merge "remove some duplicate FlipSign()"
2015-06-06 05:25:04 +00:00
Pascal Massimino
751506c484 remove some duplicate FlipSign()
ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip

Change-Id: I2af54bd1499118c8321183e42251d265ba76219c
2015-06-05 17:20:29 +02:00
James Zern
65ef5afc27 Merge "lossless: 0.13% compression density gain"
2015-06-03 03:02:09 +00:00
Jyrki Alakuijala
2beef2f245 lossless: 0.13% compression density gain
over a 1000 image corpus

Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s

After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s

This CL allows for some further improvements that would not be possible
otherwise.

Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
2015-06-02 17:27:36 -07:00
Pascal Massimino
3033f24c26 lossless: 0.06 % compression density improvement
Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790
2015-06-02 14:51:51 +02:00
James Zern
64960da9e1 dec_neon: add VE8uv / VE16
VE8uv/VE16: ~25%/~33% faster over 20M pixels

Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14
2015-05-30 13:40:00 -07:00
James Zern
14dbd87bed dec_neon: add HE8uv / HE16
HE8uv/HE16: ~91%/~83% faster over 20M pixels

Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810
2015-05-30 13:39:24 -07:00
skal
ac76801159 introduce FTransform2 to perform two transforms at a time.
FTransform goes from ~12.0% to 11.5% total CPU time.

Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
aa6065aedd dec_neon: use vld1_dup(mem) rather than vdup(mem[0])
should result in slightly less general purpose register use

Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726
2015-05-16 11:24:32 -07:00
Pascal Massimino
8b63ac78e0 Merge "dec_neon: add TM16"
2015-05-16 10:56:07 +00:00
Pascal Massimino
f51be09e1f Merge "dec_neon/TrueMotion: simply left border load"
2015-05-16 10:54:05 +00:00