James Zern
b9e734fd5c
dec,cosmetics: normalize function naming style
...
Change-Id: I33a2d1b4133db7a6d56d506f5c19670f0268cecd
2017-11-21 14:31:34 -08:00
James Zern
c188d546b3
dec: harmonize function suffixes
...
BUG=webp:355
Change-Id: Iabdfd3fbde906c2e35a7d7c080a8512425eb8ccb
2017-11-21 13:00:25 -08:00
James Zern
28c5ac8104
dec_sse41: harmonize function suffixes
...
BUG=webp:355
Change-Id: Id55f7b2e6288d1d0885d8451fbc59771222073d6
2017-11-21 12:47:06 -08:00
Pascal Massimino
e65b72a368
Merge "introduce WebPHasAlpha8b and WebPHasAlpha32b"
2017-11-21 06:21:44 +00:00
James Zern
b94cee98fb
dec_sse2: remove HE8uv_SSE2
...
with gcc-4.8, clang-4.0.1/5 this is no faster (actually up to 2x slower)
than the code generated for memset (0x01010... * dst[-1]). shuffles in
sse4 recover a bit, but performance is still down.
Change-Id: Ie85e8353f8ede559d0b05a1d388787fd18ecc80f
2017-11-20 20:34:05 -08:00
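For reference, the memset-based alternative that the message compares against can be sketched in plain C. This is only an illustration of horizontal 8x8 chroma prediction under assumed conventions: the function name `HE8uvSketch` and the explicit `stride` parameter are hypothetical, not taken from the library.

```c
#include <stdint.h>
#include <string.h>

/* Horizontal 8x8 chroma prediction: every pixel of a row takes the value of
 * the pixel immediately to the left of the block (dst[-1] on that row).
 * `stride` is the distance in bytes between two rows. Both the name and the
 * stride parameter are illustrative assumptions. */
static void HE8uvSketch(uint8_t* dst, int stride) {
  int j;
  for (j = 0; j < 8; ++j) {
    memset(dst, dst[-1], 8);  /* broadcast the left neighbour over 8 bytes */
    dst += stride;
  }
}
```

A compiler is free to lower each `memset` call to a single 64-bit store of the replicated byte, which is presumably what the "0x01010... * dst[-1]" remark refers to.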
Pascal Massimino
44a0ee3fa7
introduce WebPHasAlpha8b and WebPHasAlpha32b
...
Rewrote WebPPictureHasTransparency() to use them (even for argb).
This is 10% faster, for some reason.
SSE2 version should be straightforward.
Removes a TODO.
Change-Id: I7ad5848fc5e355e2df505dbcd5a0f42fb6cbab41
2017-11-20 15:20:29 +01:00
Vincent Rabaud
c462cd0065
Remove useless code.
...
The casts are to the same type and the #define not used.
Change-Id: I8d69c3b9dde7a1c53c2ba5a026a653d8c2e1d2a7
2017-11-08 10:52:49 +01:00
James Zern
b7971d0e22
dsp: avoid defining _C functions w/NEON builds
...
when targeting NEON, C functions with NEON equivalents won't be used, but
will still contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.
note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.
this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.
Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
2017-10-27 10:54:56 -07:00
James Zern
8d033b14d7
{dec,enc}_neon: harmonize function suffixes x2
...
+ neon.h
BUG=webp:355
Change-Id: Ia17c7dfc7d61742a4758823675a2d556a739c389
2017-10-20 19:00:53 -07:00
James Zern
0295e9815d
upsampling_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: I75423abbe0bcea3c98a42e412cc2116be81b5d08
2017-10-20 19:00:53 -07:00
James Zern
d572c4e52b
yuv_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: Ia2f716b459950c18717b062175197d1e6419bf2a
2017-10-20 19:00:53 -07:00
James Zern
ab9c2500db
rescaler_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: I161caa14f7ebbc3ae978b1722472625a77d0a4a4
2017-10-20 19:00:53 -07:00
James Zern
93e0ce27f4
lossless_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: I4210081a39800b5c2589c443da237269908af666
2017-10-20 19:00:53 -07:00
James Zern
22fbc50edd
lossless_enc_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: I462facaeade4f0f4fc1e96895493306d095a6a9a
2017-10-20 19:00:53 -07:00
James Zern
447875b47b
filters_neon,cosmetics: fix indent
...
BUG=webp:355
Change-Id: I9df1119f1ea94868f75253a92c2e878c9290f744
2017-10-20 19:00:29 -07:00
James Zern
785da7eadd
enc_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: Ie59efd271d16f12d21f3c800667dfc0980dc2e68
2017-10-20 00:18:32 -07:00
James Zern
bc1a251fcf
dec_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: I61c9a0c9e24515322955e04afd8c4ea6a44b9319
2017-10-20 00:14:18 -07:00
James Zern
61e535f1ac
dsp/lossless: workaround gcc-4.8 bug on arm
...
and all older versions.
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
extends the check added previously in:
637b3888 dsp/lossless: workaround gcc-4.9 bug on arm
BUG=webp:363
Change-Id: I1403b558f8660b764f3a570a3326822d5ef0be29
2017-10-19 13:05:48 -07:00
Pascal Massimino
0a17f4712c
Merge "WIP: list includes as descendants of the project dir"
2017-10-11 08:21:42 +00:00
James Zern
a439972175
WIP: list includes as descendants of the project dir
...
#include "(.|..)/..." -> #include "src/..."
Change-Id: I772880aa097a770722043c8a4393552ba38a89b6
2017-10-10 23:04:05 -07:00
James Zern
d361a6a733
yuv_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: I02a66f7446c75a10c3ce4766235e5767617d0dce
2017-10-08 14:06:34 -07:00
James Zern
6921aa6f0c
upsampling_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: I3a02cc717eb7506bd87511d6a17ab1691e84f72c
2017-10-08 14:06:30 -07:00
James Zern
08c67d3ed1
ssim_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: I1282559888118b8cb0a46b7f0aa627d26b8838f5
2017-10-08 14:06:24 -07:00
James Zern
582a1b572a
rescaler_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: I978fd826ff90149c0ffd9d7607dcc6f88082d3e6
2017-10-08 14:06:19 -07:00
James Zern
2c1b18ba2f
lossless_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: I59d828800c2ab2a36e0ea90f629b74bd57207411
2017-10-08 14:06:14 -07:00
James Zern
0ac46e818b
lossless_enc_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: I06c64416103c3f3fc0519dd46d64b0a35f9798e4
2017-10-08 14:06:05 -07:00
James Zern
bc634d57c2
enc_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: Idd2f289fcf99f12bf36494111b07a8906c99c826
2017-10-08 14:05:59 -07:00
James Zern
bcb7347c2b
dec_sse2: harmonize function suffixes
...
BUG=webp:355
Change-Id: Ic0390a4a24a5d8caff5b8af9fc9d59769ec533b1
2017-10-07 15:14:03 -07:00
James Zern
fb3daad604
cpu: fix ssse3 check
...
ssse3 is bit #9 in ecx, bit 1 is sse3. this only controls the check for
slow ssse3 and likely had no ill effect.
Change-Id: I84ce73dc480e1cdbd085e37be06f3f402116c201
2017-09-29 16:27:47 -07:00
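A small sketch of the bit test described above, written against the ECX value returned by CPUID leaf 1. The macro and function names are hypothetical. In the usual zero-based numbering SSSE3 is bit 9 and SSE3 is bit 0 (mask 0x1), so the "bit 1" in the message is best read as that mask value.

```c
#include <stdint.h>

/* Feature bits reported in ECX by CPUID leaf 1, counting from bit 0.
 * The macro names are illustrative. */
#define CPUID_ECX_SSE3  (1u << 0)
#define CPUID_ECX_SSSE3 (1u << 9)

/* Returns non-zero if the given ECX value advertises SSE3. */
static int HasSSE3(uint32_t ecx) { return (ecx & CPUID_ECX_SSE3) != 0; }

/* Returns non-zero if the given ECX value advertises SSSE3. */
static int HasSSSE3(uint32_t ecx) { return (ecx & CPUID_ECX_SSSE3) != 0; }
```

Keeping the two masks as named constants makes it harder to test one feature bit while meaning the other.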
Vincent Rabaud
a5216efc8c
Fix integer overflow warning.
...
Though the overflow could happen, it does not change the
end results.
Change-Id: I1b84e022a0776d35eab5c5c4fb7d3563f5667bfa
2017-09-25 11:02:22 +02:00
James Zern
f78da3dea6
add LOCAL_CLANG_PREREQ and avoid WORK_AROUND_GCC w/3.8+
...
this results in a 15-20% speedup for lossy decoding on a N5/S6/CM1
BUG=webp:339
Change-Id: Icdeb84c3e0b8908147ac276b4d8f76c3d565b735
2017-09-19 20:59:49 -07:00
James Zern
01c426f1e7
define WEBP_USE_INTRINSICS w/gcc-4.9+
...
32-bit builds are neutral to slightly faster using ndk r15c on a
N5/S6/CM1
BUG=webp:339
Change-Id: I94b9442e0ceaf2f5edb2b4026bc8b99cd77c918b
2017-09-19 20:59:43 -07:00
Pascal Massimino
3822762a6c
rationalize the Makefile.am
...
one library addition per line, etc...
BUG=webp:355
Change-Id: I95761dea598a382db5632c5187210937e129ff75
2017-08-29 00:00:14 -07:00
Pascal Massimino
42c79aa66b
Merge "Encoder: harmonize function suffixes"
2017-08-09 18:13:57 +00:00
skal
b09307dcde
Encoder: harmonize function suffixes
...
BUG=webp:355
Change-Id: Ia2fe95db7dfb303f3f64e390d43bc41b8933256c
2017-08-09 02:41:01 +00:00
James Zern
bed0456d58
Merge "SSIM: harmonize the function suffix"
2017-08-09 02:37:39 +00:00
skal
54f6a3cf3a
lossless_sse2.c: fix some missed suffix changes
...
BUG=webp:355
Change-Id: If830e3169a4021899ed850aa7edfd94b81fa2cf9
2017-08-08 14:19:05 -07:00
skal
088f1dcce8
SSIM: harmonize the function suffix
...
BUG=webp:355
Change-Id: I751852ddb2abb7319e41e6c7d022ac4f288b4d08
2017-08-08 08:52:06 -07:00
skal
a0f72a4fe0
VP8LTransformColorFunc: drop a non-respected 'const' from the signature.
...
BUG=webp:355
Change-Id: Ie99bf377a55db2950bfbac9423bfe0967623ea5d
2017-08-07 19:05:01 -07:00
Pascal Massimino
8c934902cd
Merge "Lossless dec: harmonize the function suffixes"
2017-08-08 02:04:10 +00:00
skal
622242aaba
Lossless dec: harmonize the function suffixes
...
BUG=webp:355
Change-Id: I445d64df6aa2e347f41e7af306be12a77e2ac6a5
2017-08-07 18:22:41 -07:00
skal
1411f02761
Lossless Enc: harmonize the function suffixes
...
BUG=webp:355
Change-Id: I8baf506bd2a27095b956ef22a862b071f60c0d72
2017-08-07 18:02:07 -07:00
James Zern
7beed2807b
add missing ()s to macro parameters
...
BUG=webp:355
Change-Id: I616c6d3540d6551edd1b1cfdb5bffcf0a044c90f
2017-08-04 17:02:53 -07:00
James Zern
6473d20b3e
Merge "fix Android standalone toolchain build"
2017-08-04 18:25:21 +00:00
James Zern
0c83a8bc69
Merge "yuv: harmonize suffix naming"
2017-08-02 06:35:36 +00:00
James Zern
c6d1db4b36
fix Android standalone toolchain build
...
add a check for cpu-features.h and rework some of the ifdef's around
android + neon. for android builds with cpu-features enabled the
*_neon.c files will still need to be flagged correctly (with e.g.,
.c.neon in Android.mk) to properly build them.
BUG=webp:353
Change-Id: I905ce305af0a204e560b915d8665093a3edaceb9
2017-08-01 22:59:03 -07:00
skal
663a6d9d2e
unify the ALTERNATE_CODE flag usage
...
Pattern is now:
#if !defined(FLAG)
#define FLAG 0 // ALTERNATE_CODE
#endif
...
#if (FLAG == 1)
...
#else
...
#endif // FLAG
...
Removed some unused code / flags:
WEBP_YUV_USE_TABLE, WEBP_REFERENCE_IMPLEMENTATION,
experimental code, VP8YUVInit(), ...
BUG=webp:355
Change-Id: I98deb9189446a4cfd665c13ea8aa1ce6a308c63f
2017-08-01 20:49:29 -07:00
skal
73ea9f2702
yuv: harmonize suffix naming
...
BUG=webp:355
Change-Id: I403c4b3cdfc55b3b1648f98a1d189326a3e660a3
2017-08-01 20:40:00 -07:00
skal
c4568b47fd
Rescaler: harmonize the suffix naming
...
BUG=webp:355
Change-Id: I7720502c62f96c780793d3d881eac7b3afae1418
2017-08-01 23:49:44 +00:00
Pascal Massimino
6cb13b0532
Merge "alpha_processing: harmonize the naming suffixes to be _C()"
2017-08-01 03:38:03 +00:00
James Zern
83a3e69a20
Merge "simplify WEBP_EXTERN macro"
2017-08-01 03:29:12 +00:00
Pascal Massimino
7295fde2e6
Merge "filters: harmonize the suffixes naming to _SSE2(), _C(), etc."
2017-08-01 01:55:48 +00:00
James Zern
8e42ba4c80
simplify WEBP_EXTERN macro
...
including the type in the macro doesn't bring much benefit to ordering:
current platforms work with a prefix, and the macro would be insufficient
anyway if the attribute needed to follow the function prototype. this form
makes it easier to override on the command line.
BUG=webp:355
Change-Id: Iba41ec0bb319403054be0e899c4cc472dd932fd9
2017-07-31 18:27:52 -07:00
skal
331ab34bcd
cost*.c: harmonize the suffix namings
...
BUG=webp:355
Change-Id: Ic2e60eaab71cdffe1ebf93fc36aaa3eb25bbf08d
2017-07-31 17:18:32 -07:00
skal
b161f670f8
filters: harmonize the suffixes naming to _SSE2(), _C(), etc.
...
BUG=webp:355
Change-Id: I28f464eb13444c3046332cdda3c547f81700ecf4
2017-08-01 00:09:05 +00:00
skal
dec5e4d330
alpha_processing: harmonize the naming suffixes to be _C()
...
BUG=webp:355
Change-Id: Iae8221cd34957764ead21aa46abfc320e5514a4b
2017-07-31 23:34:24 +00:00
James Zern
92982609bc
dsp.h: fix -Wundef w/__mips_dsp_rev
...
Change-Id: I552a543c7b039774041b43ace75b0cbea566b119
2017-07-11 16:12:32 -07:00
James Zern
4ea49f6b82
rescaler_sse2.c: fix WEBP_RESCALER_FIX -> _RFIX typo
...
quiets -Wundef
Change-Id: I8f1facf401b6f1ab393005c93086ac3e2ae354d5
2017-07-11 15:35:27 -07:00
James Zern
b34a9db1a1
cosmetics,dec_sse2: remove some redundant comments
...
Change-Id: I5a59d6dde9b6638b318f36d51d0d53870a3de273
2017-07-06 23:19:18 -07:00
Vincent Rabaud
8acb4942f7
Remove the argb* files.
...
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.
Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
2017-06-25 14:44:33 +02:00
Vincent Rabaud
7ca0df1363
Have the SSE2 version of PackARGB use common code.
...
The common code actually got sped-up by 25% by using the code
from PackARGB.
Change-Id: I94be6ccff2bfe02fff13c8e2698669e6a0d8fc74
2017-06-20 17:41:14 +02:00
Vincent Rabaud
8f6df1d0b9
Unroll Predictors 10, 11 and 12.
...
We see the following speed-ups:
10 -> 13%
11 -> 13%
12 -> 13%
Change-Id: I4734fd388d0f4e508884d0b123976bf2cbe69d2f
2017-06-08 20:37:47 +02:00
Vincent Rabaud
e4eb458741
lossless, VP8LTransformColor_C: make sure no overflow happens with colors.
...
Change-Id: Iec0d07cf1188ba96391cdb1b62131fc1469dfac6
2017-05-24 11:34:40 +02:00
Pascal Massimino
faf42213f4
NEON: implement ConvertRGB24ToY/BGR24/ARGB/RGBA32ToUV/ARGBToUV
...
Change-Id: Ie68aaed36d17f56d998c1b284514860cf5d28b8a
2017-05-09 15:57:20 +02:00
Pascal Massimino
f768218966
yuv: rationalize the C/SSE2 function naming
...
+ implement some easy missing targets in SSE2 (565/4444)
Change-Id: Ib575f7ada2a0ed7309cddd238f8bfc0e8999f145
2017-04-21 13:52:25 +02:00
Pascal Massimino
52245424b0
NEON implementation of some Sharp-YUV420 functions
...
Change-Id: I449ef9c76b06f971f6e2ad7f9db96bf906d8fe1f
new-file: dsp/yuv_neon.c
2017-04-18 19:22:37 +02:00
Pascal Massimino
28c37ebd5a
VP8LEnc: remove use of BitsLog2Ceiling()
...
was only used once. Better fall back for Log2Floor.
Change-Id: Ibcc26505440971bffe62ba6aca3d179ca85791d4
2017-03-20 02:58:16 -07:00
James Zern
80a2218668
ssim.c: remove dead include
...
Change-Id: Ia4be534b3b95d5d9f712ff53e530c98b942df860
2017-02-21 20:17:19 -08:00
Pascal Massimino
693bf74ec0
move the SSIM calculation code in ssim.c / ssim_sse2.c
...
Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448
2017-02-21 12:53:35 +01:00
Pascal Massimino
4105d565d3
disable WEBP_USE_XXX optimisations when EMSCRIPTEN is defined
...
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine these conditionals.
BUG=webp:261
Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
2017-02-08 15:44:20 +00:00
Parag Salasakar
aa893914fc
Add clang build fix for MSA
...
Change-Id: If139f4ecbdce756c69ba4ae032a70f81179683f8
2017-02-01 17:45:17 +05:30
Pascal Massimino
4f3e3bbd44
disable GradientUnfilter_NEON
...
Compiled with Xcode, it appears quite a bit slower than the C version,
especially for arm64.
Change-Id: Ic46dba184a36be454fef674129d2f909003788fc
2017-01-25 16:33:26 -08:00
Pascal Massimino
79bf46f120
rename the pretentious SmartYUV into SharpYUV
...
Change-Id: Ifeeb9cb85896c5f3ba0cc1c2c821f8d00295f69e
2017-01-20 14:36:21 +01:00
James Zern
668e1dd44f
src/{dec,enc,utils}: give filenames a unique suffix
...
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
Pascal Massimino
71c53f1aeb
NEON: speed-up strong filtering
...
The sub-expression trick removes two constants and
two vmlal_s8 instructions.
Change-Id: I200022573b4880871b528b13a11a8f3d95def113
2017-01-19 20:46:48 +00:00
Pascal Massimino
749a45a520
Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)"
2017-01-17 15:13:08 +00:00
Pascal Massimino
74c053b57d
Merge "NEON: fix overflow in SSE NxN calculation"
2017-01-17 15:10:54 +00:00
Pascal Massimino
1de931c669
NEON: implement alpha-filters (horizontal/vertical/gradient)
...
gradient-filter code is not much faster, but maybe improvable in the future.
Change-Id: Ia16070e409fe8703b02276166f19526917df6b35
2017-01-17 15:44:46 +01:00
Pascal Massimino
9b3aca404d
NEON: fix overflow in SSE NxN calculation
...
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch, mostly at low q, because in this
case the distortion is large and the accumulated sum was
larger than 16-bit unsigned.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
2017-01-17 11:47:36 +01:00
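The overflow can be reproduced with a scalar sketch. `vmlal_u8()` accumulates 8-bit products into 16-bit lanes, so a sum of squared differences wraps as soon as it exceeds 65535. The two functions below are hypothetical stand-ins (plain C, not NEON) that compare a 16-bit accumulator with a 32-bit one.

```c
#include <stdint.h>

/* Sum of squared errors over n bytes using a 16-bit accumulator, mimicking
 * a lane that accumulates into uint16_t: the result wraps modulo 65536. */
static uint16_t SumSquaredError16(const uint8_t* a, const uint8_t* b, int n) {
  uint16_t sum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sum = (uint16_t)(sum + d * d);  /* silently drops the high bits */
  }
  return sum;
}

/* Same computation with a 32-bit accumulator, which cannot overflow for
 * block sizes such as 16x16 (at most 256 * 255 * 255). */
static uint32_t SumSquaredError32(const uint8_t* a, const uint8_t* b, int n) {
  uint32_t sum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sum += (uint32_t)(d * d);
  }
  return sum;
}
```

With large differences, two samples are already enough to exceed 16 bits (255 * 255 = 65025 per sample), which matches the observation that the mismatch shows up when distortion is high.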
Pascal Massimino
1c07a3c639
dsp: WebPExtractGreen function for alpha decompression
...
+ NEON implementation
Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c
2017-01-17 09:33:25 +00:00
Pascal Massimino
8fda56126e
Merge "add a kSlowSSSE3 feature for CPUInfo"
2017-01-13 07:01:48 +00:00
Pascal Massimino
86bbd24552
add a kSlowSSSE3 feature for CPUInfo
...
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.
Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731
Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
2017-01-13 06:19:27 +00:00
Vincent Rabaud
7c2779e95a
Get code to fully compile in C++.
...
Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08
2017-01-12 18:03:55 +01:00
Vincent Rabaud
250c358662
Merge "When compiling as C++, avoid narrowing warnings."
2017-01-12 13:00:56 +00:00
Vincent Rabaud
c0648ac2ae
When compiling as C++, avoid narrowing warnings.
...
The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’
Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb
2017-01-12 13:39:22 +01:00
Pascal Massimino
0d55f60c91
40% faster ApplyAlphaMultiply_SSE2
...
process four pixels at a time
Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4
2017-01-12 02:33:09 -08:00
Pascal Massimino
49d0280df1
NEON: implement several alpha-processing functions
...
- ApplyAlphaMultiply
- DispatchAlpha
- DispatchAlphaToGreen
- ExtractAlpha
Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4)
new file: alpha_processing_neon.c
Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca
2017-01-11 17:39:29 +01:00
Pascal Massimino
48b1e85fbe
SSE2: 15% faster alpha-processing functions
...
ApplyAlphaMultiply / MultARGBRow / MultRow
we now use: x/255 = (x * 0x8081) >> (16 + 7)
and x/255 + .5 = ((x + 128) * 0x0101) >> 16
Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971
2017-01-11 15:35:16 +01:00
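The two multiply-and-shift identities quoted above can be checked with a scalar sketch. The function names are hypothetical, and the SSE2 code itself is not reproduced here, only the arithmetic.

```c
#include <stdint.h>

/* floor(x / 255) for a 16-bit x, using one multiply and one shift.
 * 0x8081 * 255 = 2^23 + 127, so the error stays below one unit for
 * every x below 65536. */
static uint32_t Div255Mul(uint32_t x) {
  return (x * 0x8081u) >> (16 + 7);
}

/* round(x / 255) for x in [0, 255 * 255], i.e. for the product of two
 * 8-bit values such as colour * alpha. */
static uint32_t Div255RoundMul(uint32_t x) {
  return ((x + 128u) * 0x0101u) >> 16;
}
```

For example, `Div255RoundMul(200 * 128)` gives 100, the same as rounding 25600 / 255 to the nearest integer.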
Pascal Massimino
28fe054e73
SSE2: 30% faster ApplyAlphaMultiply()
...
and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
2017-01-10 23:34:22 +01:00
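A scalar sketch of the two shift-and-add formulae, with hypothetical function names. One caveat on the stated range: the first identity holds for every X up to 65534, but at X = 65535 it yields 256 instead of 257. The products of two 8-bit values never exceed 255 * 255 = 65025, so that case does not arise in alpha multiplication.

```c
#include <stdint.h>

/* floor(x / 255) without a division, valid for 0 <= x <= 65534. */
static uint32_t Div255Shift(uint32_t x) {
  return (x + 1 + (x >> 8)) >> 8;
}

/* round(x / 255) for x in [0, 255 * 255], using XX = x + 128. */
static uint32_t Div255RoundShift(uint32_t x) {
  const uint32_t xx = x + 128;
  return (xx + (xx >> 8)) >> 8;
}
```

Both forms use only adds and shifts, which map directly onto 16-bit SIMD lanes.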
Pascal Massimino
be0ef6395f
fix a comment typo
...
Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e
2017-01-10 21:17:13 +01:00
Pascal Massimino
00b08c88c0
Merge "NEON: 5% faster conversion to RGB565 and RGBA4444"
2016-12-22 08:39:01 +00:00
Pascal Massimino
0e7f444702
Merge "NEON: faster fancy upsampling"
2016-12-21 14:53:24 +00:00
Pascal Massimino
b016cb91c5
NEON: faster fancy upsampling
...
2-3% faster decoding overall
Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955
2016-12-21 15:23:54 +01:00
Vincent Rabaud
1cb638010c
Call the C function to finish off lossless SSE loops only when necessary.
...
Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad
2016-12-21 14:25:54 +01:00
Vincent Rabaud
875fafc191
Implement BundleColorMap in SSE2.
...
Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e
2016-12-21 10:44:31 +01:00
Pascal Massimino
341d711c43
NEON: 5% faster conversion to RGB565 and RGBA4444
...
We use the magic 'shift and insert' instruction instead of
the multiple shifts and or's.
Change-Id: I48df0320668b502a91792defc0423a9441669d19
2016-12-20 17:01:48 +01:00
Pascal Massimino
a4bbe4b38b
fix indentation
...
Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55
2016-12-13 22:50:29 -08:00
Pascal Massimino
58fc507842
Merge "PredictorSub: implement fully-SSE2 version"
2016-12-13 11:03:13 +00:00
Pascal Massimino
9cc421675b
PredictorSub: implement fully-SSE2 version
...
and inline the C-version too.
Predictor #13 is still a hard one.
Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331
2016-12-13 02:19:35 -08:00
James Zern
2423017a28
dsp/lossless.c,cosmetics: fix indent
...
after:
fbba5bc optimize predictor #1 in plain-C
For some reason, gcc has hard time inlining this one...
Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
2016-12-12 12:53:23 -08:00
Pascal Massimino
fbba5bc2c1
optimize predictor #1 in plain-C
...
For some reason, gcc has a hard time inlining this one...
Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
2016-12-12 17:41:36 +01:00
Pascal Massimino
9ae0b3f65a
Merge "SSE2: slightly (~2%) faster Predictor #1"
2016-12-12 14:46:21 +00:00
Pascal Massimino
c1f97bd758
SSE2: slightly (~2%) faster Predictor #1
...
by removing a load from memory
Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0
2016-12-12 02:24:38 -08:00
Pascal Massimino
ea664b8995
SSE2: 10% faster Predictor #11
...
Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185
2016-12-12 02:20:41 -08:00
Pascal Massimino
b3fb8bb602
slightly faster Predictor #11 in NEON
...
(+ some slight modifications to Predictor #12)
Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2
2016-12-08 07:32:51 -08:00
Pascal Massimino
76ebbfff28
NEON: implement predictor #13
...
~5-7% faster
Change-Id: I3361b0bbc978f3721168db15778a67337309c18a
2016-12-07 14:58:49 -08:00
Vincent Rabaud
95b12a08ae
Merge "Revert Average3 and Average4"
2016-12-07 15:38:56 +00:00
Vincent Rabaud
54ab2e758f
Revert Average3 and Average4
...
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.
Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
2016-12-07 15:32:33 +01:00
Pascal Massimino
fe12330c81
3-5% faster Predictor #5, #6, #7 and #10 for NEON
...
Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2
2016-12-07 15:25:33 +01:00
Pascal Massimino
fbfb3bef7b
~2% faster predictor #10 for NEON
...
Change-Id: Icd9cff90c227d702c3ba319131996c5475094520
2016-12-06 13:47:35 +00:00
Pascal Massimino
d4b7d801db
lossless_sse2: use the local functions
...
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.
Also: suffix explicitly the functions with _SSE2
Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
2016-12-06 14:20:41 +01:00
Vincent Rabaud
a5e3b22574
Lossless decoder SSE2 improvements.
...
Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706
2016-12-06 13:45:09 +01:00
Pascal Massimino
58a1f124c2
~2% faster predictor #12 in NEON.
...
Change-Id: I6772bb865d0f72720a65561eb55028e538df236d
2016-12-06 10:24:27 +01:00
Pascal Massimino
906c3b6392
Merge "Implement lossless transforms in NEON."
2016-12-03 16:55:14 +00:00
Vincent Rabaud
d23abe4e9f
Implement lossless transforms in NEON.
...
Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55
2016-12-03 11:20:22 +00:00
Vincent Rabaud
2e6cb6f34e
Give more flexibility to the predictor generating macro.
...
Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900
2016-12-02 12:33:12 -08:00
Vincent Rabaud
28e0bb7088
Merge "Fix race condition in multi-threading initialization."
2016-12-02 17:45:10 +00:00
Vincent Rabaud
647045305a
Fix race condition in multi-threading initialization.
...
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).
Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
2016-12-02 18:28:57 +01:00
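The hazard described above comes from deriving the C table by copying from a dispatch table that another thread may already have overridden. One way to remove that dependency is sketched below. It is not necessarily what this commit did, and every name in it is hypothetical. The sketch also leaves out the synchronization that a real one-time init needs.

```c
#include <stdint.h>

typedef uint32_t (*PredictorFunc)(uint32_t left, uint32_t top);

static PredictorFunc Dispatch[1];    /* may be overridden by the SIMD init */
static PredictorFunc ReferenceC[1];  /* must always point at plain C code */

/* Plain C predictor: returns the left pixel. */
static uint32_t PredLeft_C(uint32_t left, uint32_t top) {
  (void)top;
  return left;
}

/* Stand-in for a SIMD predictor that hands leftover pixels to the C table.
 * If ReferenceC[0] pointed back at this function, the call would recurse
 * forever, which is the failure described in the message. */
static uint32_t PredLeft_Simd(uint32_t left, uint32_t top) {
  return ReferenceC[0](left, top);
}

/* Fill both tables from the C function itself, never via a copy of
 * Dispatch, so a concurrent SIMD override cannot leak into ReferenceC. */
static void InitC(void) {
  ReferenceC[0] = PredLeft_C;
  Dispatch[0] = PredLeft_C;
}

/* SIMD init only touches the dispatch table. */
static void InitSimd(void) {
  Dispatch[0] = PredLeft_Simd;
}
```

The point of the design is that `ReferenceC` is written from a constant source, so its content no longer depends on the order in which threads run the two init functions.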
Pascal Massimino
ea72cd60cb
add missing 'extern' keyword for predictor dcl
...
Change-Id: Ibf3db9b6dae91e53524c31cdfccf4678b3fa1135
2016-12-01 08:15:14 +01:00
Vincent Rabaud
67879e6d48
SSE implementation of decoding predictors.
...
Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c
2016-11-30 12:00:07 +01:00
Vincent Rabaud
4239a1489c
Make the lossless predictors work on a batch of pixels.
...
Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1
2016-11-28 17:12:10 +01:00
Pascal Massimino
bc18ebad2e
fix extra 'const's in signatures
...
Change-Id: Ie433d0defbc0c6feae2eb2f11e70082f1affada8
2016-11-25 09:45:52 +01:00
Vincent Rabaud
71e2f5cadf
Remove memcpy in lossless decoding.
...
Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c
2016-11-24 17:45:24 +01:00
Vincent Rabaud
7474d46e45
Do not use a register array in SSE.
...
Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141
2016-11-24 13:06:44 +01:00
Owen Rodley
67748b41db
Improve latency of FTransform2.
...
Benchmarks from vrabaud@:
8BIT/GRAY corpus speed: faster: -4.3%, corpus size: unchanged
skal/sources_png_skal corpus speed: faster: -5.2%, corpus size: unchanged
images/png_rgb corpus speed: faster: -5.1%, corpus size: unchanged
images/lpcb corpus speed: unchanged, corpus size: unchanged
images/png_big corpus speed: faster: -1.7%, corpus size: unchanged
images/png_doc corpus speed: unchanged, corpus size: unchanged
images/png_1bit corpus speed: faster: -1.2%, corpus size: unchanged
images/jpeg_small corpus speed: unchanged, corpus size: unchanged
images/icip_core1 corpus speed: unchanged, corpus size: unchanged
images/png_gray corpus speed: faster: -2.5%, corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0%, corpus size: unchanged
images/jpeg corpus speed: faster: -2.3%, corpus size: unchanged
images/png_translucent corpus speed: faster: -2.8%, corpus size: unchanged
images/gif corpus speed: faster: -1.4%, corpus size: unchanged
images/png_opaque corpus speed: faster: -2.8%, corpus size: unchanged
images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged
images/png_indexed corpus speed: faster: -2.0%, corpus size: unchanged
images/all corpus speed: faster: -1.5%, corpus size: unchanged
images/png_small corpus speed: unchanged, corpus size: unchanged
images/png corpus speed: unchanged, corpus size: unchanged
images/gif_still corpus speed: faster: -1.6%, corpus size: unchanged
Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
2016-11-24 07:09:50 +00:00
Vincent Rabaud
6540cd0eeb
Provide an SSE implementation of ConvertBGRAToRGB
...
Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf
2016-11-23 16:25:51 +01:00
Pascal Massimino
3c2a61b099
remove some unneeded casts
...
Change-Id: Ie68788c77f016ed11446a55142b1bd8d96261452
2016-11-16 22:54:40 -08:00
Pascal Massimino
9ac063c37f
add dsp functions for SmartYUV
...
+ SSE2 implementation
Change-Id: I5cfdb62d68b5a95899241a097d3a2f697fbc590e
2016-11-16 14:23:06 +00:00
Pascal Massimino
31b1e34342
fix SSIM metric ... by ignoring too-dark area
...
Roughly, if both the source and the reference areas are
too dark (R/G/B <= ~6), they are ignored.
One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.
Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
2016-10-20 15:17:55 +02:00
Vincent Rabaud
28ce304344
Remove some errors when compiling the code as C++.
...
This fixes some cases from
https://bugs.chromium.org/p/webp/issues/detail?id=137
Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a
2016-10-05 09:39:08 +02:00
Pascal Massimino
ba843a92e7
fix some SSIM calculations
...
* prevent 64bit overflow by controlling the 32b->64b conversions
and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasize the dark
areas
* use a hat-like filter instead of box-filtering to avoid blockiness
during averaging
SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C version is considerably slower because it is still unoptimized.
Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
2016-10-04 01:09:07 -07:00
Pascal Massimino
86a84b3598
2x faster SSE2 implementation of SSIMGet
...
Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351
2016-09-23 23:23:06 -07:00
Pascal Massimino
7c1fb7d0ff
fix uint32_t initialization (0. -> 0)
...
Change-Id: Ia4aae27f70c4e74ddeb5654cfabb21d785cea9cf
2016-09-14 20:26:05 +02:00
Pascal Massimino
bfff0bf329
speed-up SSIM calculation
...
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixel instead of
printing a frame-level global SSIM value.
* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring
SSIM calculation is ~4x faster now.
Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
2016-09-14 16:15:43 +02:00
Vincent Rabaud
64577de8ae
De-VP8L-ize GetEntropUnrefinedHelper.
...
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.
Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
2016-09-14 13:55:24 +02:00
Pascal Massimino
a7be73280b
Merge "refactor the PSNR / SSIM calculation code"
2016-09-14 06:37:56 +00:00
Pascal Massimino
50c3d7da9a
refactor the PSNR / SSIM calculation code
...
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.
Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
2016-09-14 06:13:24 +00:00
Vincent Rabaud
dd538b192d
Remove unused declaration.
...
Change-Id: I8ab19654df63e7ef8aad00e97d1428c7b53ee33f
2016-09-13 16:25:46 +02:00
Vincent Rabaud
6cc48b1728
Move some lossless logic out of dsp.
...
Change-Id: I4cfd60cd5497666a2e1c188ceada2e71b05f1505
2016-09-13 15:37:32 +02:00
Vincent Rabaud
c9b45863e2
Split off common lossless dsp inline functions.
...
Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738
2016-09-12 17:35:08 +02:00
Pascal Massimino
3884972e3f
remove WEBP_FORCE_ALIGNED and use memcpy() instead.
...
BUG=webp:297
Change-Id: I89a08debec7bb1b3f411c897260ab1bb63f77df2
2016-08-17 20:16:03 -07:00
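The memcpy() idiom referred to in the title can be sketched as follows. The helper names are hypothetical. The point is that copying through `memcpy` lets the compiler emit whatever load or store is legal on the target, instead of dereferencing a possibly misaligned `uint32_t*`.

```c
#include <stdint.h>
#include <string.h>

/* Reads 32 bits from an arbitrarily aligned address. */
static uint32_t LoadU32(const void* p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Writes 32 bits to an arbitrarily aligned address. */
static void StoreU32(void* p, uint32_t v) {
  memcpy(p, &v, sizeof(v));
}
```

On targets that allow unaligned access, compilers typically reduce each helper to a single move, so the portable form usually costs nothing.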
skal
6ab496ed22
fix some 'unsigned integer overflow' warnings in ubsan
...
I couldn't find a safe way of fixing VP8GetSigned() so i just
used the big-hammer.
Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7
2016-08-16 23:18:27 -07:00
James Zern
8a4ebc6ab0
Revert "fix 'unsigned integer overflow' warnings in ubsan"
...
This reverts commit e44f5248ff.
contains unintentional changes in quant.c
Change-Id: I1928f072566788b0c9ea80f6fbc9e571061f9b3e
2016-08-16 16:55:56 -07:00
Pascal Massimino
9d4f209f80
Merge changes I25711dd5,I43188fab
...
* changes:
Fix assertions in WebPRescalerExportRow()
Add descriptions of default configuration in help info.
2016-08-16 22:13:23 +00:00
skal
e44f5248ff
fix 'unsigned integer overflow' warnings in ubsan
...
I couldn't find a safe way of fixing VP8GetSigned() so i just
used the big-hammer.
Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7
2016-08-16 15:04:41 -07:00
Hui Su
27b5d991e2
Fix assertions in WebPRescalerExportRow()
...
Change-Id: I25711dd54e71c90a25f7b18e0ef9155e8151a15e
2016-08-16 14:32:48 -07:00
James Zern
40872fb2e6
dec_neon,NeedsHev: micro optimization
...
trade 2 compares + 1 logical or for max + compare
Change-Id: I785ad8efdc64db2d0609456d6e7af795ab2117d8
2016-08-08 20:12:30 -07:00
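The trade described above rests on a simple equivalence: "either difference exceeds the threshold" is the same as "the larger difference exceeds the threshold". The scalar sketch below uses hypothetical names and plain C rather than NEON intrinsics, where the same idea would replace two vector compares and an OR with one max and one compare.

```c
#include <stdlib.h>

/* High-edge-variance test with two compares and a logical OR. */
static int NeedsHevTwoCompares(int p1, int p0, int q0, int q1, int thresh) {
  return (abs(p1 - p0) > thresh) || (abs(q1 - q0) > thresh);
}

/* Same test with one max and a single compare. */
static int NeedsHevMax(int p1, int p0, int q0, int q1, int thresh) {
  const int a = abs(p1 - p0);
  const int b = abs(q1 - q0);
  const int m = (a > b) ? a : b;
  return m > thresh;
}
```

Both functions return the same result for every input, so the change only affects the instruction count.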
James Zern
b551e587b3
cosmetics: add {}s on continued control statements
...
for consistency within the codebase. in some cases simply join the
lines.
Change-Id: I071f061052e274c8a69f651ed4305befb4414a40
2016-08-03 19:08:59 -07:00
James Zern
d2e4484ef3
dsp/Makefile.am: put msa source in correct lib
...
upsampling_msa.c was incorrectly included in the neon convenience lib
+ sort msa sources
Change-Id: I7c4883f16a5c2fed12bfa0e8d8d6a7acd5d4fb84
2016-08-03 17:50:45 -07:00
Parag Salasakar
d3ddacb625
Add MSA optimized YUV to RGB upsampling functions
...
We add the following MSA optimized YUV to RGB upsampling functions:
- UpsampleRgbLinePair
- UpsampleBgrLinePair
- UpsampleRgbaLinePair
- UpsampleBgraLinePair
- UpsampleArgbLinePair
- UpsampleRgba4444LinePair
- UpsampleRgb565LinePair
Change-Id: I7264a615edc7eb376e443e9d38bd8e3c9a2cab1f
2016-07-22 14:28:30 +00:00