James Zern
92982609bc
dsp.h: fix -Wundef w/__mips_dsp_rev
...
Change-Id: I552a543c7b039774041b43ace75b0cbea566b119
2017-07-11 16:12:32 -07:00
James Zern
4ea49f6b82
rescaler_sse2.c: fix WEBP_RESCALER_FIX -> _RFIX typo
...
quiets -Wundef
Change-Id: I8f1facf401b6f1ab393005c93086ac3e2ae354d5
2017-07-11 15:35:27 -07:00
James Zern
b34a9db1a1
cosmetics,dec_sse2: remove some redundant comments
...
Change-Id: I5a59d6dde9b6638b318f36d51d0d53870a3de273
2017-07-06 23:19:18 -07:00
Vincent Rabaud
8acb4942f7
Remove the argb* files.
...
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.
Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
2017-06-25 14:44:33 +02:00
Vincent Rabaud
7ca0df1363
Have the SSE2 version of PackARGB use common code.
...
The common code actually got sped-up by 25% by using the code
from PackARGB.
Change-Id: I94be6ccff2bfe02fff13c8e2698669e6a0d8fc74
2017-06-20 17:41:14 +02:00
Vincent Rabaud
8f6df1d0b9
Unroll Predictors 10, 11 and 12.
...
We see the following speed-ups:
10 -> 13%
11 -> 13%
12 -> 13%
Change-Id: I4734fd388d0f4e508884d0b123976bf2cbe69d2f
2017-06-08 20:37:47 +02:00
Vincent Rabaud
e4eb458741
lossless, VP8LTransformColor_C: make sure no overflow happens with colors.
...
Change-Id: Iec0d07cf1188ba96391cdb1b62131fc1469dfac6
2017-05-24 11:34:40 +02:00
Pascal Massimino
faf42213f4
NEON: implement ConvertRGB24ToY/BGR24/ARGB/RGBA32ToUV/ARGBToUV
...
Change-Id: Ie68aaed36d17f56d998c1b284514860cf5d28b8a
2017-05-09 15:57:20 +02:00
Pascal Massimino
f768218966
yuv: rationalize the C/SSE2 function naming
...
+ implement some easy missing targets in SSE2 (565/4444)
Change-Id: Ib575f7ada2a0ed7309cddd238f8bfc0e8999f145
2017-04-21 13:52:25 +02:00
Pascal Massimino
52245424b0
NEON implementation of some Sharp-YUV420 functions
...
Change-Id: I449ef9c76b06f971f6e2ad7f9db96bf906d8fe1f
new-file: dsp/yuv_neon.c
2017-04-18 19:22:37 +02:00
Pascal Massimino
28c37ebd5a
VP8LEnc: remove use of BitsLog2Ceiling()
...
It was only used once. Also provide a better fallback for Log2Floor.
Change-Id: Ibcc26505440971bffe62ba6aca3d179ca85791d4
2017-03-20 02:58:16 -07:00
James Zern
80a2218668
ssim.c: remove dead include
...
Change-Id: Ia4be534b3b95d5d9f712ff53e530c98b942df860
2017-02-21 20:17:19 -08:00
Pascal Massimino
693bf74ec0
move the SSIM calculation code in ssim.c / ssim_sse2.c
...
Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448
2017-02-21 12:53:35 +01:00
Pascal Massimino
4105d565d3
disable WEBP_USE_XXX optimisations when EMSCRIPTEN is defined
...
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine these conditionals.
BUG=webp:261
Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
2017-02-08 15:44:20 +00:00
Parag Salasakar
aa893914fc
Add clang build fix for MSA
...
Change-Id: If139f4ecbdce756c69ba4ae032a70f81179683f8
2017-02-01 17:45:17 +05:30
Pascal Massimino
4f3e3bbd44
disable GradientUnfilter_NEON
...
Compiled with Xcode, it appears to be considerably slower than the C version,
especially for arm64.
Change-Id: Ic46dba184a36be454fef674129d2f909003788fc
2017-01-25 16:33:26 -08:00
Pascal Massimino
79bf46f120
rename the pretentious SmartYUV into SharpYUV
...
Change-Id: Ifeeb9cb85896c5f3ba0cc1c2c821f8d00295f69e
2017-01-20 14:36:21 +01:00
James Zern
668e1dd44f
src/{dec,enc,utils}: give filenames a unique suffix
...
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
Pascal Massimino
71c53f1aeb
NEON: speed-up strong filtering
...
The sub-expression trick removes two constants and
two vmlal_s8 instructions.
Change-Id: I200022573b4880871b528b13a11a8f3d95def113
2017-01-19 20:46:48 +00:00
Pascal Massimino
749a45a520
Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)"
2017-01-17 15:13:08 +00:00
Pascal Massimino
74c053b57d
Merge "NEON: fix overflow in SSE NxN calculation"
2017-01-17 15:10:54 +00:00
Pascal Massimino
1de931c669
NEON: implement alpha-filters (horizontal/vertical/gradient)
...
gradient-filter code is not much faster, but maybe improvable in the future.
Change-Id: Ia16070e409fe8703b02276166f19526917df6b35
2017-01-17 15:44:46 +01:00
Pascal Massimino
9b3aca404d
NEON: fix overflow in SSE NxN calculation
...
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch, mostly at low q, because in this
case the distortion is large and the accumulated sum was
larger than a 16-bit unsigned value can hold.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
2017-01-17 11:47:36 +01:00
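The overflow described in 9b3aca404d can be illustrated with a scalar stand-in. This is only a sketch in plain C, not the NEON code: it assumes that the accumulator lanes fed by vmlal_u8() are 16 bits wide, and mimics that width with a uint16_t sum. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Sum of squared differences kept in a 16-bit accumulator. This only mimics
// the width of a widening multiply-accumulate lane; it is not NEON code.
static uint16_t SumSquaredDiff16(const uint8_t* a, const uint8_t* b, int n) {
  uint16_t sum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int d = (int)a[i] - (int)b[i];
    sum = (uint16_t)(sum + d * d);  // wraps modulo 65536
  }
  return sum;
}

// Same sum with a 32-bit accumulator, which has ample headroom here.
static uint32_t SumSquaredDiff32(const uint8_t* a, const uint8_t* b, int n) {
  uint32_t sum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int d = (int)a[i] - (int)b[i];
    sum += (uint32_t)(d * d);
  }
  return sum;
}

// Hypothetical helper: returns 1 if the 16-bit sum disagrees with the 32-bit
// one on n worst-case samples (255 against 0), 0 otherwise.
static int WorstCaseMismatch(int n) {
  uint8_t a[64], b[64];
  int i;
  assert(n >= 0 && n <= 64);
  for (i = 0; i < n; ++i) {
    a[i] = 255;
    b[i] = 0;
  }
  return SumSquaredDiff16(a, b, n) != SumSquaredDiff32(a, b, n);
}
```

A single worst-case sample (255 * 255 = 65025) still fits in 16 bits, so WorstCaseMismatch(1) reports no mismatch; two such samples already exceed 65535, which matches the observation that the problem shows up when the distortion is large.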
Pascal Massimino
1c07a3c639
dsp: WebPExtractGreen function for alpha decompression
...
+ NEON implementation
Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c
2017-01-17 09:33:25 +00:00
Pascal Massimino
8fda56126e
Merge "add a kSlowSSSE3 feature for CPUInfo"
2017-01-13 07:01:48 +00:00
Pascal Massimino
86bbd24552
add a kSlowSSSE3 feature for CPUInfo
...
This is meant for run-time detection of platforms where instructions
like pshufb and bsr are slow.
Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731
Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
2017-01-13 06:19:27 +00:00
Vincent Rabaud
7c2779e95a
Get code to fully compile in C++.
...
Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08
2017-01-12 18:03:55 +01:00
Vincent Rabaud
250c358662
Merge "When compiling as C++, avoid narrowing warnings."
2017-01-12 13:00:56 +00:00
Vincent Rabaud
c0648ac2ae
When compiling as C++, avoid narrowing warnings.
...
The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’
Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb
2017-01-12 13:39:22 +01:00
Pascal Massimino
0d55f60c91
40% faster ApplyAlphaMultiply_SSE2
...
process four pixels at a time
Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4
2017-01-12 02:33:09 -08:00
Pascal Massimino
49d0280df1
NEON: implement several alpha-processing functions
...
- ApplyAlphaMultiply
- DispatchAlpha
- DispatchAlphaToGreen
- ExtractAlpha
Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4)
new file: alpha_processing_neon.c
Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca
2017-01-11 17:39:29 +01:00
Pascal Massimino
48b1e85fbe
SSE2: 15% faster alpha-processing functions
...
ApplyAlphaMultiply / MultARGBRow / MultRow
we now use: x/255 = (x * 0x8081) >> (16 + 7)
and x/255 + .5 = ((x + 128) * 0x0101) >> 16
Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971
2017-01-11 15:35:16 +01:00
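The two identities quoted in 48b1e85fbe can be checked in scalar C. This is a sketch, not the SSE2 implementation: the function names are hypothetical, and the check is limited to x in [0, 255 * 255], on the assumption that the inputs are products of two 8-bit values.

```c
#include <assert.h>
#include <stdint.h>

// Truncating division: x / 255 == (x * 0x8081) >> (16 + 7).
static uint32_t Div255Floor(uint32_t x) {
  assert(x <= 255u * 255u);  // range assumed by this sketch
  return (x * 0x8081u) >> 23;
}

// Rounded division: x / 255 + .5 == ((x + 128) * 0x0101) >> 16.
static uint32_t Div255Round(uint32_t x) {
  assert(x <= 255u * 255u);  // range assumed by this sketch
  return ((x + 128u) * 0x0101u) >> 16;
}

// Returns 1 if both identities hold for every x in [0, 255 * 255].
static int CheckDiv255(void) {
  uint32_t x;
  for (x = 0; x <= 255u * 255u; ++x) {
    if (Div255Floor(x) != x / 255u) return 0;
    // (x + 127) / 255 is round-to-nearest of x / 255 in integer arithmetic.
    if (Div255Round(x) != (x + 127u) / 255u) return 0;
  }
  return 1;
}
```

Both forms replace a division with one multiply and one shift, which is the kind of operation that maps onto 16-bit SIMD multiplies.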
Pascal Massimino
28fe054e73
SSE2: 30% faster ApplyAlphaMultiply()
...
and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
2017-01-10 23:34:22 +01:00
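The shift-and-add formulae from 28fe054e73 can be checked the same way in scalar C. This is a sketch with hypothetical function names, not the SSE2 code. It deliberately limits itself to X in [0, 255 * 255], the largest product of two 8-bit values, and makes no claim beyond that range.

```c
#include <assert.h>
#include <stdint.h>

// Truncating division: X / 255 == (X + 1 + (X >> 8)) >> 8.
static uint32_t Div255FloorShift(uint32_t x) {
  assert(x <= 255u * 255u);  // range assumed by this sketch
  return (x + 1u + (x >> 8)) >> 8;
}

// Rounded division: (X / 255 + .5) == (XX + (XX >> 8)) >> 8, XX = X + 128.
static uint32_t Div255RoundShift(uint32_t x) {
  const uint32_t xx = x + 128u;
  assert(x <= 255u * 255u);  // range assumed by this sketch
  return (xx + (xx >> 8)) >> 8;
}

// Returns 1 if both formulae hold for every X in [0, 255 * 255].
static int CheckDiv255Shift(void) {
  uint32_t x;
  for (x = 0; x <= 255u * 255u; ++x) {
    if (Div255FloorShift(x) != x / 255u) return 0;
    // (x + 127) / 255 is round-to-nearest of x / 255 in integer arithmetic.
    if (Div255RoundShift(x) != (x + 127u) / 255u) return 0;
  }
  return 1;
}
```

These variants need only additions and shifts, with no multiply at all.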
Pascal Massimino
be0ef6395f
fix a comment typo
...
Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e
2017-01-10 21:17:13 +01:00
Pascal Massimino
00b08c88c0
Merge "NEON: 5% faster conversion to RGB565 and RGBA4444"
2016-12-22 08:39:01 +00:00
Pascal Massimino
0e7f444702
Merge "NEON: faster fancy upsampling"
2016-12-21 14:53:24 +00:00
Pascal Massimino
b016cb91c5
NEON: faster fancy upsampling
...
2-3% faster decoding overall
Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955
2016-12-21 15:23:54 +01:00
Vincent Rabaud
1cb638010c
Call the C function to finish off lossless SSE loops only when necessary.
...
Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad
2016-12-21 14:25:54 +01:00
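The pattern named in 1cb638010c can be sketched in plain C: a main loop handles pixels in groups of four (standing in for the SSE2 loop), and the C function is called for the leftover pixels only if there are any. The function names, the per-pixel operation and the call counter are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

static int g_c_calls = 0;  // hypothetical counter, for demonstration only

// Hypothetical per-pixel operation: add green to the red and blue channels.
static uint32_t AddGreenOnePixel(uint32_t argb) {
  const uint32_t green = (argb >> 8) & 0xff;
  uint32_t red_blue = argb & 0x00ff00ffu;
  red_blue += (green << 16) | green;
  red_blue &= 0x00ff00ffu;
  return (argb & 0xff00ff00u) | red_blue;
}

// Plain-C version, used here to finish off the leftover pixels.
static void AddGreen_C(uint32_t* argb, int num_pixels) {
  int i;
  ++g_c_calls;
  for (i = 0; i < num_pixels; ++i) argb[i] = AddGreenOnePixel(argb[i]);
}

// Stand-in for a SIMD version: four pixels per iteration, then the C
// function is called only when some pixels are left over.
static void AddGreen_Blocked(uint32_t* argb, int num_pixels) {
  int i;
  assert(num_pixels >= 0);
  for (i = 0; i + 4 <= num_pixels; i += 4) {
    int k;
    for (k = 0; k < 4; ++k) argb[i + k] = AddGreenOnePixel(argb[i + k]);
  }
  if (i != num_pixels) AddGreen_C(argb + i, num_pixels - i);
}
```

With a pixel count that is a multiple of four, the C function is never entered, so the cost of the extra call is avoided entirely.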
Vincent Rabaud
875fafc191
Implement BundleColorMap in SSE2.
...
Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e
2016-12-21 10:44:31 +01:00
Pascal Massimino
341d711c43
NEON: 5% faster conversion to RGB565 and RGBA4444
...
We use the magic 'shift and insert' instruction instead of
the multiple shifts and or's.
Change-Id: I48df0320668b502a91792defc0423a9441669d19
2016-12-20 17:01:48 +01:00
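The idea behind 341d711c43 can be sketched in scalar C. This is not the NEON code: ShiftRightInsert8() is a hypothetical per-byte emulation of a "shift right and insert" operation (keep the top n bits of the destination, fill the rest with the source shifted right by n), and the RGB565 layout used here (5 bits red, 6 bits green, 5 bits blue, red in the top bits) is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

// Scalar emulation of "shift right and insert" on one byte: the top n bits
// of dst are kept, the low (8 - n) bits are replaced by src >> n.
static uint8_t ShiftRightInsert8(uint8_t dst, uint8_t src, int n) {
  assert(n >= 1 && n <= 7);
  {
    const uint8_t keep = (uint8_t)(0xff << (8 - n));
    return (uint8_t)((dst & keep) | (src >> n));
  }
}

// Reference packing with multiple shifts and ors.
static uint16_t PackRGB565_ShiftOr(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Same packing, one "shift and insert" per output byte.
static uint16_t PackRGB565_Insert(uint8_t r, uint8_t g, uint8_t b) {
  const uint8_t hi = ShiftRightInsert8(r, g, 5);                  // rrrrrggg
  const uint8_t lo = ShiftRightInsert8((uint8_t)(g << 3), b, 3);  // gggbbbbb
  return (uint16_t)((hi << 8) | lo);
}

// Returns 1 if both packings agree for every (r, g, b) triple.
static int CheckPack565(void) {
  int r, g, b;
  for (r = 0; r < 256; ++r) {
    for (g = 0; g < 256; ++g) {
      for (b = 0; b < 256; ++b) {
        if (PackRGB565_ShiftOr((uint8_t)r, (uint8_t)g, (uint8_t)b) !=
            PackRGB565_Insert((uint8_t)r, (uint8_t)g, (uint8_t)b)) {
          return 0;
        }
      }
    }
  }
  return 1;
}
```

Each output byte takes a single insert step instead of a shift, a mask and an or, which is where a saving of the kind reported in the commit could come from.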
Pascal Massimino
a4bbe4b38b
fix indentation
...
Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55
2016-12-13 22:50:29 -08:00
Pascal Massimino
58fc507842
Merge "PredictorSub: implement fully-SSE2 version"
2016-12-13 11:03:13 +00:00
Pascal Massimino
9cc421675b
PredictorSub: implement fully-SSE2 version
...
and inline the C-version too.
Predictor #13 is still a hard one.
Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331
2016-12-13 02:19:35 -08:00
James Zern
2423017a28
dsp/lossless.c,cosmetics: fix indent
...
after:
fbba5bc optimize predictor #1 in plain-C
For some reason, gcc has a hard time inlining this one...
Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
2016-12-12 12:53:23 -08:00
Pascal Massimino
fbba5bc2c1
optimize predictor #1 in plain-C
...
For some reason, gcc has a hard time inlining this one...
Also optimize predictors #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
2016-12-12 17:41:36 +01:00
Pascal Massimino
9ae0b3f65a
Merge "SSE2: slightly (~2%) faster Predictor #1"
2016-12-12 14:46:21 +00:00
Pascal Massimino
c1f97bd758
SSE2: slightly (~2%) faster Predictor #1
...
by removing a load from memory
Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0
2016-12-12 02:24:38 -08:00
Pascal Massimino
ea664b8995
SSE2: 10% faster Predictor #11
...
Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185
2016-12-12 02:20:41 -08:00
Pascal Massimino
b3fb8bb602
slightly faster Predictor #11 in NEON
...
(+ some slight modifications on Predictor #12)
Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2
2016-12-08 07:32:51 -08:00
Pascal Massimino
76ebbfff28
NEON: implement predictor #13
...
~5-7% faster
Change-Id: I3361b0bbc978f3721168db15778a67337309c18a
2016-12-07 14:58:49 -08:00