Commit Graph

863 Commits

Author SHA1 Message Date
James Zern
582a1b572a rescaler_sse2: harmonize function suffixes
BUG=webp:355

Change-Id: I978fd826ff90149c0ffd9d7607dcc6f88082d3e6
2017-10-08 14:06:19 -07:00
James Zern
2c1b18ba2f lossless_sse2: harmonize function suffixes
BUG=webp:355

Change-Id: I59d828800c2ab2a36e0ea90f629b74bd57207411
2017-10-08 14:06:14 -07:00
James Zern
0ac46e818b lossless_enc_sse2: harmonize function suffixes
BUG=webp:355

Change-Id: I06c64416103c3f3fc0519dd46d64b0a35f9798e4
2017-10-08 14:06:05 -07:00
James Zern
bc634d57c2 enc_sse2: harmonize function suffixes
BUG=webp:355

Change-Id: Idd2f289fcf99f12bf36494111b07a8906c99c826
2017-10-08 14:05:59 -07:00
James Zern
bcb7347c2b dec_sse2: harmonize function suffixes
BUG=webp:355

Change-Id: Ic0390a4a24a5d8caff5b8af9fc9d59769ec533b1
2017-10-07 15:14:03 -07:00
James Zern
fb3daad604 cpu: fix ssse3 check
ssse3 is bit #9 in ecx, bit 1 is sse3. this only controls the check for
slow ssse3 and likely had no ill effect.

Change-Id: I84ce73dc480e1cdbd085e37be06f3f402116c201
2017-09-29 16:27:47 -07:00
Vincent Rabaud
a5216efc8c Fix integer overflow warning.
Though the overflow could happen, it does not change the
end results.

Change-Id: I1b84e022a0776d35eab5c5c4fb7d3563f5667bfa
2017-09-25 11:02:22 +02:00
James Zern
f78da3dea6 add LOCAL_CLANG_PREREQ and avoid WORK_AROUND_GCC w/3.8+
this results in a 15-20% speedup for lossy decoding on a N5/S6/CM1

BUG=webp:339

Change-Id: Icdeb84c3e0b8908147ac276b4d8f76c3d565b735
2017-09-19 20:59:49 -07:00
James Zern
01c426f1e7 define WEBP_USE_INTRINSICS w/gcc-4.9+
32-bit builds are neutral to slightly faster using ndk r15c on a
N5/S6/CM1

BUG=webp:339

Change-Id: I94b9442e0ceaf2f5edb2b4026bc8b99cd77c918b
2017-09-19 20:59:43 -07:00
Pascal Massimino
3822762a6c rationalize the Makefile.am
one library addition per line, etc...

BUG=webp:355

Change-Id: I95761dea598a382db5632c5187210937e129ff75
2017-08-29 00:00:14 -07:00
Pascal Massimino
42c79aa66b Merge "Encoder: harmonize function suffixes" 2017-08-09 18:13:57 +00:00
skal
b09307dcde Encoder: harmonize function suffixes
BUG=webp:355

Change-Id: Ia2fe95db7dfb303f3f64e390d43bc41b8933256c
2017-08-09 02:41:01 +00:00
James Zern
bed0456d58 Merge "SSIM: harmonize the function suffix" 2017-08-09 02:37:39 +00:00
skal
54f6a3cf3a lossless_sse2.c: fix some missed suffix changes
BUG=webp:355

Change-Id: If830e3169a4021899ed850aa7edfd94b81fa2cf9
2017-08-08 14:19:05 -07:00
skal
088f1dcce8 SSIM: harmonize the function suffix
BUG=webp:355

Change-Id: I751852ddb2abb7319e41e6c7d022ac4f288b4d08
2017-08-08 08:52:06 -07:00
skal
a0f72a4fe0 VP8LTransformColorFunc: drop an non-respected 'const' from the signature.
BUG=webp:355

Change-Id: Ie99bf377a55db2950bfbac9423bfe0967623ea5d
2017-08-07 19:05:01 -07:00
Pascal Massimino
8c934902cd Merge "Lossess dec: harmonize the function suffixes" 2017-08-08 02:04:10 +00:00
skal
622242aaba Lossess dec: harmonize the function suffixes
BUG=webp:355

Change-Id: I445d64df6aa2e347f41e7af306be12a77e2ac6a5
2017-08-07 18:22:41 -07:00
skal
1411f02761 Lossless Enc: harmonize the function suffixes
BUG=webp:355

Change-Id: I8baf506bd2a27095b956ef22a862b071f60c0d72
2017-08-07 18:02:07 -07:00
James Zern
7beed2807b add missing ()s to macro parameters
BUG=webp:355

Change-Id: I616c6d3540d6551edd1b1cfdb5bffcf0a044c90f
2017-08-04 17:02:53 -07:00
James Zern
6473d20b3e Merge "fix Android standalone toolchain build" 2017-08-04 18:25:21 +00:00
James Zern
0c83a8bc69 Merge "yuv: harmonize suffix naming" 2017-08-02 06:35:36 +00:00
James Zern
c6d1db4b36 fix Android standalone toolchain build
add a check for cpu-features.h and rework some of the ifdef's around
android + neon. for android builds with cpu-features enabled the
*_neon.c files will still need to be flagged correctly (with e.g.,
.c.neon in Android.mk) to properly build them.

BUG=webp:353

Change-Id: I905ce305af0a204e560b915d8665093a3edaceb9
2017-08-01 22:59:03 -07:00
skal
663a6d9d2e unify the ALTERNATE_CODE flag usage
Pattern is now:
 #if !defined(FLAG)
 #define FLAG 0   // ALTERNATE_CODE
 #endif
...
 #if (FLAG == 1)
 ...
 #else
  ...
 #endif    // FLAG
...

Removed some unused code / flags:
  WEBP_YUV_USE_TABLE, WEBP_REFERENCE_IMPLEMENTATION,
  experimental code,  VP8YUVInit(), ...

BUG=webp:355

Change-Id: I98deb9189446a4cfd665c13ea8aa1ce6a308c63f
2017-08-01 20:49:29 -07:00
skal
73ea9f2702 yuv: harmonize suffix naming
BUG=webp:355

Change-Id: I403c4b3cdfc55b3b1648f98a1d189326a3e660a3
2017-08-01 20:40:00 -07:00
skal
c4568b47fd Rescaler: harmonize the suffix naming
BUG=webp:355

Change-Id: I7720502c62f96c780793d3d881eac7b3afae1418
2017-08-01 23:49:44 +00:00
Pascal Massimino
6cb13b0532 Merge "alpha_processing: harmonize the naming suffixes to be _C()" 2017-08-01 03:38:03 +00:00
James Zern
83a3e69a20 Merge "simplify WEBP_EXTERN macro" 2017-08-01 03:29:12 +00:00
Pascal Massimino
7295fde2e6 Merge "filters: harmonize the suffixes naming to _SSE2(), _C(), etc." 2017-08-01 01:55:48 +00:00
James Zern
8e42ba4c80 simplify WEBP_EXTERN macro
including the type in the macro doesn't bring much benefit to ordering,
current platforms work with a prefix, this would be insufficient if the
attribute needed to follow the function prototype. this form makes it
easier to override on the command line.

BUG=webp:355

Change-Id: Iba41ec0bb319403054be0e899c4cc472dd932fd9
2017-07-31 18:27:52 -07:00
skal
331ab34bcd cost*.c: harmonize the suffix namings
BUG=webp:355

Change-Id: Ic2e60eaab71cdffe1ebf93fc36aaa3eb25bbf08d
2017-07-31 17:18:32 -07:00
skal
b161f670f8 filters: harmonize the suffixes naming to _SSE2(), _C(), etc.
BUG=webp:355

Change-Id: I28f464eb13444c3046332cdda3c547f81700ecf4
2017-08-01 00:09:05 +00:00
skal
dec5e4d330 alpha_processing: harmonize the naming suffixes to be _C()
BUG=webp:355

Change-Id: Iae8221cd34957764ead21aa46abfc320e5514a4b
2017-07-31 23:34:24 +00:00
James Zern
92982609bc dsp.h: fix -Wundef w/__mips_dsp_rev
Change-Id: I552a543c7b039774041b43ace75b0cbea566b119
2017-07-11 16:12:32 -07:00
James Zern
4ea49f6b82 rescaler_sse2.c: fix WEBP_RESCALER_FIX -> _RFIX typo
quiets -Wundef

Change-Id: I8f1facf401b6f1ab393005c93086ac3e2ae354d5
2017-07-11 15:35:27 -07:00
James Zern
b34a9db1a1 cosmetics,dec_sse2: remove some redundant comments
Change-Id: I5a59d6dde9b6638b318f36d51d0d53870a3de273
2017-07-06 23:19:18 -07:00
Vincent Rabaud
8acb4942f7 Remove the argb* files.
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.

Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
2017-06-25 14:44:33 +02:00
Vincent Rabaud
7ca0df1363 Have the SSE2 version of PackARGB use common code.
The common code actually got sped-up by 25% by using the code
from PackARGB.

Change-Id: I94be6ccff2bfe02fff13c8e2698669e6a0d8fc74
2017-06-20 17:41:14 +02:00
Vincent Rabaud
8f6df1d0b9 Unroll Predictors 10, 11 and 12.
We see the following speed-ups:
10 -> 13%
11 -> 13%
12 -> 13%

Change-Id: I4734fd388d0f4e508884d0b123976bf2cbe69d2f
2017-06-08 20:37:47 +02:00
Vincent Rabaud
e4eb458741 lossless, VP8LTransformColor_C: make sure no overflow happens with colors.
Change-Id: Iec0d07cf1188ba96391cdb1b62131fc1469dfac6
2017-05-24 11:34:40 +02:00
Pascal Massimino
faf42213f4 NEON: implement ConvertRGB24ToY/BGR24/ARGB/RGBA32ToUV/ARGBToUV
Change-Id: Ie68aaed36d17f56d998c1b284514860cf5d28b8a
2017-05-09 15:57:20 +02:00
Pascal Massimino
f768218966 yuv: rationalize the C/SSE2 function naming
+ implement some easy missing targets in SSE2 (565/4444)

Change-Id: Ib575f7ada2a0ed7309cddd238f8bfc0e8999f145
2017-04-21 13:52:25 +02:00
Pascal Massimino
52245424b0 NEON implementation of some Sharp-YUV420 functions
Change-Id: I449ef9c76b06f971f6e2ad7f9db96bf906d8fe1f
new-file: dsp/yuv_neon.c
2017-04-18 19:22:37 +02:00
Pascal Massimino
28c37ebd5a VP8LEnc: remove use of BitsLog2Ceiling()
was only used once. Better fall back for Log2Floor.

Change-Id: Ibcc26505440971bffe62ba6aca3d179ca85791d4
2017-03-20 02:58:16 -07:00
James Zern
80a2218668 ssim.c: remove dead include
Change-Id: Ia4be534b3b95d5d9f712ff53e530c98b942df860
2017-02-21 20:17:19 -08:00
Pascal Massimino
693bf74ec0 move the SSIM calculation code in ssim.c / ssim_sse2.c
Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448
2017-02-21 12:53:35 +01:00
Pascal Massimino
4105d565d3 disable WEBP_USE_XXX optimisations when EMSCRIPTEN is defined
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine this conditionals.

BUG=webp:261

Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
2017-02-08 15:44:20 +00:00
Parag Salasakar
aa893914fc Add clang build fix for MSA
Change-Id: If139f4ecbdce756c69ba4ae032a70f81179683f8
2017-02-01 17:45:17 +05:30
Pascal Massimino
4f3e3bbd44 disable GradientUnfilter_NEON
Compile with XCode, it appears quite slower than the C-version,
especially for arm64.

Change-Id: Ic46dba184a36be454fef674129d2f909003788fc
2017-01-25 16:33:26 -08:00
Pascal Massimino
79bf46f120 rename the pretentious SmartYUV into SharpYUV
Change-Id: Ifeeb9cb85896c5f3ba0cc1c2c821f8d00295f69e
2017-01-20 14:36:21 +01:00
James Zern
668e1dd44f src/{dec,enc,utils}: give filenames a unique suffix
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible

BUG=webp:279

Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
Pascal Massimino
71c53f1aeb NEON: speed-up strong filtering
The sub-expression trick removes two constants and
two vmlal_s8 instructions.

Change-Id: I200022573b4880871b528b13a11a8f3d95def113
2017-01-19 20:46:48 +00:00
Pascal Massimino
749a45a520 Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)" 2017-01-17 15:13:08 +00:00
Pascal Massimino
74c053b57d Merge "NEON: fix overflow in SSE NxN calculation" 2017-01-17 15:10:54 +00:00
Pascal Massimino
1de931c669 NEON: implement alpha-filters (horizontal/vertical/gradient)
gradient-filter code is not much faster, but maybe improvable in the future.

Change-Id: Ia16070e409fe8703b02276166f19526917df6b35
2017-01-17 15:44:46 +01:00
Pascal Massimino
9b3aca404d NEON: fix overflow in SSE NxN calculation
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.

Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
2017-01-17 11:47:36 +01:00
Pascal Massimino
1c07a3c639 dsp: WebPExtractGreen function for alpha decompression
+ NEON implementation

Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c
2017-01-17 09:33:25 +00:00
Pascal Massimino
8fda56126e Merge "add a kSlowSSSE3 feature for CPUInfo" 2017-01-13 07:01:48 +00:00
Pascal Massimino
86bbd24552 add a kSlowSSSE3 feature for CPUInfo
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.

Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731

Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
2017-01-13 06:19:27 +00:00
Vincent Rabaud
7c2779e95a Get code to fully compile in C++.
Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08
2017-01-12 18:03:55 +01:00
Vincent Rabaud
250c358662 Merge "When compiling as C++, avoid narrowing warnings." 2017-01-12 13:00:56 +00:00
Vincent Rabaud
c0648ac2ae When compiling as C++, avoid narrowing warnings.
The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’

Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb
2017-01-12 13:39:22 +01:00
Pascal Massimino
0d55f60c91 40% faster ApplyAlphaMultiply_SSE2
process four pixels at a time

Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4
2017-01-12 02:33:09 -08:00
Pascal Massimino
49d0280df1 NEON: implement several alpha-processing functions
- ApplyAlphaMultiply
 - DispatchAlpha
 - DispatchAlphaToGreen
 - ExtractAlpha

Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4)

new file: alpha_processing_neon.c

Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca
2017-01-11 17:39:29 +01:00
Pascal Massimino
48b1e85fbe SSE2: 15% faster alpha-processing functions
ApplyAlphaMultiply / MultARGBRow / MultRow

we use now: x/255 = (x * 0x8081) >> (16 + 7)
and x/255 + .5 = ((x + 128) * 0x0101) >> 16

Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971
2017-01-11 15:35:16 +01:00
Pascal Massimino
28fe054e73 SSE2: 30% faster ApplyAlphaMultiply()
and 15% faster MultARGBRow()

by switching to formulae:
    X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
   (X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128

Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
2017-01-10 23:34:22 +01:00
Pascal Massimino
be0ef6395f fix a comment typo
Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e
2017-01-10 21:17:13 +01:00
Pascal Massimino
00b08c88c0 Merge "NEON: 5% faster conversion to RGB565 and RGBA4444" 2016-12-22 08:39:01 +00:00
Pascal Massimino
0e7f444702 Merge "NEON: faster fancy upsampling" 2016-12-21 14:53:24 +00:00
Pascal Massimino
b016cb91c5 NEON: faster fancy upsampling
2-3% faster decoding overall

Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955
2016-12-21 15:23:54 +01:00
Vincent Rabaud
1cb638010c Call the C function to finish off lossless SSE loops only when necessary.
Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad
2016-12-21 14:25:54 +01:00
Vincent Rabaud
875fafc191 Implement BundleColorMap in SSE2.
Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e
2016-12-21 10:44:31 +01:00
Pascal Massimino
341d711c43 NEON: 5% faster conversion to RGB565 and RGBA4444
We use the magic 'shift and insert' instruction instead of
the multiple shifts and or's.

Change-Id: I48df0320668b502a91792defc0423a9441669d19
2016-12-20 17:01:48 +01:00
Pascal Massimino
a4bbe4b38b fix indentation
Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55
2016-12-13 22:50:29 -08:00
Pascal Massimino
58fc507842 Merge "PredictorSub: implement fully-SSE2 version" 2016-12-13 11:03:13 +00:00
Pascal Massimino
9cc421675b PredictorSub: implement fully-SSE2 version
and inline the C-version too.

Predictor #13 is still a hard one.

Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331
2016-12-13 02:19:35 -08:00
James Zern
2423017a28 dsp/lossless.c,cosmetics: fix indent
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...

Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
2016-12-12 12:53:23 -08:00
Pascal Massimino
fbba5bc2c1 optimize predictor #1 in plain-C
For some reason, gcc has hard time inlining this one...

Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]

Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
2016-12-12 17:41:36 +01:00
Pascal Massimino
9ae0b3f65a Merge "SSE2: slightly (~2%) faster Predictor #1" 2016-12-12 14:46:21 +00:00
Pascal Massimino
c1f97bd758 SSE2: slightly (~2%) faster Predictor #1
by removing a load from memory

Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0
2016-12-12 02:24:38 -08:00
Pascal Massimino
ea664b8995 SSE2: 10% faster Predictor #11
Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185
2016-12-12 02:20:41 -08:00
Pascal Massimino
b3fb8bb602 slightly faster Predictor #11 in NEON
(+some slight modifications on Predictor #12)

Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2
2016-12-08 07:32:51 -08:00
Pascal Massimino
76ebbfff28 NEON: implement predictor #13
~5-7% faster

Change-Id: I3361b0bbc978f3721168db15778a67337309c18a
2016-12-07 14:58:49 -08:00
Vincent Rabaud
95b12a08ae Merge "Revert Average3 and Average4" 2016-12-07 15:38:56 +00:00
Vincent Rabaud
54ab2e758f Revert Average3 and Average4
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.

Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
2016-12-07 15:32:33 +01:00
Pascal Massimino
fe12330c81 3-5% faster Predictor #5, #6, #7 and #10 for NEON
Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2
2016-12-07 15:25:33 +01:00
Pascal Massimino
fbfb3bef7b ~2% faster predictor #10 for NEON
Change-Id: Icd9cff90c227d702c3ba319131996c5475094520
2016-12-06 13:47:35 +00:00
Pascal Massimino
d4b7d801db lossless_sse2: use the local functions
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.

Also: suffix explicitly the functions with _SSE2

Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
2016-12-06 14:20:41 +01:00
Vincent Rabaud
a5e3b22574 Lossless decoder SSE2 improvements.
Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706
2016-12-06 13:45:09 +01:00
Pascal Massimino
58a1f124c2 ~2% faster predictor #12 in NEON.
Change-Id: I6772bb865d0f72720a65561eb55028e538df236d
2016-12-06 10:24:27 +01:00
Pascal Massimino
906c3b6392 Merge "Implement lossless transforms in NEON." 2016-12-03 16:55:14 +00:00
Vincent Rabaud
d23abe4e9f Implement lossless transforms in NEON.
Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55
2016-12-03 11:20:22 +00:00
Vincent Rabaud
2e6cb6f34e Give more flexibility to the predictor generating macro.
Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900
2016-12-02 12:33:12 -08:00
Vincent Rabaud
28e0bb7088 Merge "Fix race condition in multi-threading initialization." 2016-12-02 17:45:10 +00:00
Vincent Rabaud
647045305a Fix race condition in multi-threading initialization.
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).

Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
2016-12-02 18:28:57 +01:00
Pascal Massimino
ea72cd60cb add missing 'extern' keyword for predictor dcl
Change-Id: Ibf3db9b6dae91e53524c31cdfccf4678b3fa1135
2016-12-01 08:15:14 +01:00
Vincent Rabaud
67879e6d48 SSE implementation of decoding predictors.
Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c
2016-11-30 12:00:07 +01:00
Vincent Rabaud
4239a1489c Make the lossless predictors work on a batch of pixels.
Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1
2016-11-28 17:12:10 +01:00
Pascal Massimino
bc18ebad2e fix extra 'const's in signatures
Change-Id: Ie433d0defbc0c6feae2eb2f11e70082f1affada8
2016-11-25 09:45:52 +01:00
Vincent Rabaud
71e2f5cadf Remove memcpy in lossless decoding.
Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c
2016-11-24 17:45:24 +01:00
Vincent Rabaud
7474d46e45 Do not use a register array in SSE.
Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141
2016-11-24 13:06:44 +01:00
Owen Rodley
67748b41db Improve latency of FTransform2.
Benchmarks from vrabaud@:
8BIT/GRAY                corpus speed: faster: -4.3 % , corpus size: unchanged
skal/sources_png_skal    corpus speed: faster: -5.2 % , corpus size: unchanged
images/png_rgb           corpus speed: faster: -5.1 % , corpus size: unchanged
images/lpcb              corpus speed: unchanged, corpus size: unchanged
images/png_big           corpus speed: faster: -1.7 % , corpus size: unchanged
images/png_doc           corpus speed: unchanged, corpus size: unchanged
images/png_1bit          corpus speed: faster: -1.2 % , corpus size: unchanged
images/jpeg_small        corpus speed: unchanged, corpus size: unchanged
images/icip_core1        corpus speed: unchanged, corpus size: unchanged
images/png_gray          corpus speed: faster: -2.5 % , corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged
images/jpeg              corpus speed: faster: -2.3 % , corpus size: unchanged
images/png_translucent   corpus speed: faster: -2.8 % , corpus size: unchanged
images/gif               corpus speed: faster: -1.4 % , corpus size: unchanged
images/png_opaque        corpus speed: faster: -2.8 % , corpus size: unchanged
images/png_rgb_opaque    corpus speed: unchanged, corpus size: unchanged
images/png_indexed       corpus speed: faster: -2.0 % , corpus size: unchanged
images/all               corpus speed: faster: -1.5 % , corpus size: unchanged
images/png_small         corpus speed: unchanged, corpus size: unchanged
images/png               corpus speed: unchanged, corpus size: unchanged
images/gif_still         corpus speed: faster: -1.6 % , corpus size: unchanged

Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
2016-11-24 07:09:50 +00:00
Vincent Rabaud
6540cd0eeb Provide an SSE implementation of ConvertBGRAToRGB
Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf
2016-11-23 16:25:51 +01:00
Pascal Massimino
3c2a61b099 remove some unneeded casts
Change-Id: Ie68788c77f016ed11446a55142b1bd8d96261452
2016-11-16 22:54:40 -08:00
Pascal Massimino
9ac063c37f add dsp functions for SmartYUV
+ SSE2 implementation

Change-Id: I5cfdb62d68b5a95899241a097d3a2f697fbc590e
2016-11-16 14:23:06 +00:00
Pascal Massimino
31b1e34342 fix SSIM metric ... by ignoring too-dark area
Roughly, if both the source and the reference areas are
darker too dark (R/G/B <= ~6), they are ignored.

One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.

Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
2016-10-20 15:17:55 +02:00
Vincent Rabaud
28ce304344 Remove some errors when compiling the code as C++.
This fixes some cases from
https://bugs.chromium.org/p/webp/issues/detail?id=137

Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a
2016-10-05 09:39:08 +02:00
Pascal Massimino
ba843a92e7 fix some SSIM calculations
* prevent 64bit overflow by controlling the 32b->64b conversions
  and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasis the dark
  areas
* use a hat-like filter instead of box-filtering to avoid blockiness
  during averaging

SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C-version is quite slower because still un-optimized.

Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
2016-10-04 01:09:07 -07:00
Pascal Massimino
86a84b3598 2x faster SSE2 implementation of SSIMGet
Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351
2016-09-23 23:23:06 -07:00
Pascal Massimino
7c1fb7d0ff fix uint32_t initialization (0. -> 0)
Change-Id: Ia4aae27f70c4e74ddeb5654cfabb21d785cea9cf
2016-09-14 20:26:05 +02:00
Pascal Massimino
bfff0bf329 speed-up SSIM calculation
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixels instead of
printing a frame-level global SSIM value.

* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring

SSIM calculation is ~4x faster now.

Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
2016-09-14 16:15:43 +02:00
Vincent Rabaud
64577de8ae De-VP8L-ize GetEntropUnrefinedHelper.
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.

Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
2016-09-14 13:55:24 +02:00
Pascal Massimino
a7be73280b Merge "refactor the PSNR / SSIM calculation code" 2016-09-14 06:37:56 +00:00
Pascal Massimino
50c3d7da9a refactor the PSNR / SSIM calculation code
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.

Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
2016-09-14 06:13:24 +00:00
Vincent Rabaud
dd538b192d Remove unused declaration.
Change-Id: I8ab19654df63e7ef8aad00e97d1428c7b53ee33f
2016-09-13 16:25:46 +02:00
Vincent Rabaud
6cc48b1728 Move some lossless logic out of dsp.
Change-Id: I4cfd60cd5497666a2e1c188ceada2e71b05f1505
2016-09-13 15:37:32 +02:00
Vincent Rabaud
c9b45863e2 Split off common lossless dsp inline functions.
Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738
2016-09-12 17:35:08 +02:00
Pascal Massimino
3884972e3f remove WEBP_FORCE_ALIGNED and use memcpy() instead.
BUG=webp:297

Change-Id: I89a08debec7bb1b3f411c897260ab1bb63f77df2
2016-08-17 20:16:03 -07:00
skal
6ab496ed22 fix some 'unsigned integer overflow' warnings in ubsan
I couldn't find a safe way of fixing VP8GetSigned() so i just
used the big-hammer.

Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7
2016-08-16 23:18:27 -07:00
James Zern
8a4ebc6ab0 Revert "fix 'unsigned integer overflow' warnings in ubsan"
This reverts commit e44f5248ff.

contains unintentional changes in quant.c

Change-Id: I1928f072566788b0c9ea80f6fbc9e571061f9b3e
2016-08-16 16:55:56 -07:00
Pascal Massimino
9d4f209f80 Merge changes I25711dd5,I43188fab
* changes:
  Fix assertions in WebPRescalerExportRow()
  Add descriptions of default configuration in help info.
2016-08-16 22:13:23 +00:00
skal
e44f5248ff fix 'unsigned integer overflow' warnings in ubsan
I couldn't find a safe way of fixing VP8GetSigned() so i just
used the big-hammer.

Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7
2016-08-16 15:04:41 -07:00
Hui Su
27b5d991e2 Fix assertions in WebPRescalerExportRow()
Change-Id: I25711dd54e71c90a25f7b18e0ef9155e8151a15e
2016-08-16 14:32:48 -07:00
James Zern
40872fb2e6 dec_neon,NeedsHev: micro optimization
trade 2 compares + 1 logical or for max + compare

Change-Id: I785ad8efdc64db2d0609456d6e7af795ab2117d8
2016-08-08 20:12:30 -07:00
James Zern
b551e587b3 cosmetics: add {}s on continued control statements
for consistency within the codebase. in some cases simply join the
lines.

Change-Id: I071f061052e274c8a69f651ed4305befb4414a40
2016-08-03 19:08:59 -07:00
James Zern
d2e4484ef3 dsp/Makefile.am: put msa source in correct lib
upsampling_msa.c was incorrectly included in the neon convenience lib
+ sort msa sources

Change-Id: I7c4883f16a5c2fed12bfa0e8d8d6a7acd5d4fb84
2016-08-03 17:50:45 -07:00
Parag Salasakar
d3ddacb625 Add MSA optimized YUV to RGB upsampling functions
We add the following MSA optimized YUV to RGB upsampling functions:
- UpsampleRgbLinePair
- UpsampleBgrLinePair
- UpsampleRgbaLinePair
- UpsampleBgraLinePair
- UpsampleArgbLinePair
- UpsampleRgba4444LinePair
- UpsampleRgb565LinePair

Change-Id: I7264a615edc7eb376e443e9d38bd8e3c9a2cab1f
2016-07-22 14:28:30 +00:00
Parag Salasakar
9ac74f922e Add MSA optimized rescaling functions
We add the following MSA optimized rescaling functions:
- RescalerExportRowExpand
- RescalerExportRowShrink

Change-Id: Ic1c76065423b02617db94cf0c22bb564219b36e6
2016-07-19 15:52:42 +00:00
Parag Salasakar
cb19dbc1a4 Add MSA optimized color transform functions
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed

Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
2016-07-18 13:49:24 +00:00
James Zern
5e2eb89e1f cosmetics,dsp/*msa.c: associate '*' with the type
not the variable

Change-Id: If5823e9731c406655eaf1dc1aaa2e6554ca7daad
2016-07-15 15:40:41 -07:00
skal
5b60db5c9d FastMBAnalyze() for quick i16/i4 decision
The decision is based on the variance between DC values of each
sub-4x4 block. This heuristic is rather ok for predicting whether
the 2nd transform (intra-16) is going to help or not.
The decision threshold varies with quality (=quantization).

It's only used for -m 0 and -m 1, where no full RD-opt is performed.
It actually makes these modes quite faster, with RD curve much
closer to the -m 2 mode.

Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8
2016-07-15 11:21:08 -07:00
Parag Salasakar
567e697776 Add MSA optimized CollectHistogram function
We add the following MSA optimized encoder Histogram function:
- CollectHistogram

Change-Id: I28415704ec62c3ad375de06eeef468d9f514bb2d
2016-07-15 22:51:33 +05:30
Parag Salasakar
c54ab8dd1a Add MSA optimized quantization functions
We add the following MSA optimized encoder quantization functions:
- QuantizeBlock
- Quantize2Blocks

Change-Id: Ie32b442afa99eee62d2ef48942b41116a4e157d3
2016-07-15 15:33:47 +00:00
hui su
91b59e886b Remove QuantizeBlockWHT() in enc.c
QuantizeBlockWHT() is basically identical to QuantizeBlock(),
no need to keep two copies.

Change-Id: I970cb6948da1c750c1339971a55e3b40765cdd01
2016-07-14 10:44:18 -07:00
Parag Salasakar
fe57273736 Add MSA optimized SSE functions
We add the following MSA optimized encoder SSE functions:
- SSE16x16
- SSE16x8
- SSE8x8
- SSE4x4

Change-Id: I9ef9e903019337d9975c83264a652a7282bf5d5b
2016-07-14 15:43:23 +05:30
James Zern
6b53ca876e cosmetics,(dec|enc)_sse2.c: fix indent
Change-Id: Ic3326136ddd325e911e96c2e5a7f06b3e1d60f66
2016-07-13 16:11:29 -07:00
Parag Salasakar
afe3cec813 Add MSA optimized encoder IntraChromaPreds function
We add the following MSA optimized intrapred chroma function:
- IntraChromaPreds

Change-Id: I051cd174f5ce675aeb94e648d52c5a340a133ed4
2016-07-13 18:13:51 +05:30
Parag Salasakar
7b4b05e0dc Add MSA optimized encoder Intra16Preds function
We add the following MSA optimized intrapred 16x16 function:
- Intra16Preds

Change-Id: I89a249e041fbed377cb6a328c0b973add335b980
2016-07-12 14:30:03 +05:30
Parag Salasakar
c18787a0e9 Add MSA optimized encoder Intra4Preds function
We add the following MSA optimized intrapred 4x4 function:
- Intra4Preds

Change-Id: Icf325f3dcbf98bb6210811b666ce632cae575b22
2016-07-12 04:36:56 +00:00
Parag Salasakar
7915396f40 Add MSA optimized distortion functions
We add the following MSA optimized distortion functions:
- Disto4x4
- Disto16x16

Change-Id: I0a545ed0182ea56a0d5f358639f6671c2c21b95c
2016-07-07 07:30:22 +00:00
James Zern
bfef6c9f82 libwebp-0.5.1
- 6/14/2016: version 0.5.1
   This is a binary compatible release.
   * miscellaneous bug fixes (issues #280, #289)
   * reverted alpha plane encoding with color cache for compatibility with
     libwebp 0.4.0->0.4.3 (issues #291, #298)
   * lossless encoding performance improvements
   * memory reduction in both lossless encoding and decoding
   * force mux output to be in the extended format (VP8X) when undefined chunks
     are present (issue #294)
   * gradle, cmake build support
   * workaround for compiler bug causing 64-bit decode failures on android
     devices using clang-3.8 in the r11c NDK
   * various WebPAnimEncoder improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXfb1vAAoJEPnD1r24IytdtbwP/iCCEEU9scepXgh9+ICUOm1D
 6ASfz6eTYIPP4s2E+kIJKrKeGUrk7U1j6BeehjKxS3vMQxQlJvkXvepk0mdJUO4C
 okttfLahLY6DOZSAETK9SI4haE2Uuz5WGfxMe8x+4uuZZTxSLHqOCFMvU2oxo6uM
 rhErJgH3jWE9vGV9OuI8YUa109qGi8PLtErrFjXqFmAvnxJS95kJHr3MHVoulH8g
 tXrSUYTq37BCfSsxudhZTCENLhYqlXHO5tydvQVAlVbXJfpOsNLQciWUrqFiPuB9
 qhUv3smRV9YBd4XuUgFWLQcbcecQVBzIqxJ7lv41R71vi17Lu4plLjNAc0Cx70qc
 cnfe/acH+9hX0EwBzpvOpN/Lzirx1tmBKPOqnSiFpFP48RZSngLMG0mwhUufyq1I
 y6T2rEcMLRbAX/85sGMRd1AwffoW6OvgPG2LdhW2bh8u9YbA/g3qGH98z2T1JKjy
 V/TNvpTjXAdZ5XQMY8zIunv83Wp/6AWmJIRWZ+mfhw29F/F80HQG2Ss7dulbe3m2
 zpBjxdsaLj+9iZpheewrGGImZ5mJQsG7nRovtQ0VARVaRSY3xpaYug2CqXlQQ2bc
 bjdmGS9u+a4fHdk+uKTMzJEbu4RbXcOeLrvpzA+PxhUQi9WRyLIucIWeVVEDiUI2
 p7OJop9JmPjkRvvqfi5y
 =Mchr
 -----END PGP SIGNATURE-----

Merge tag 'v0.5.1'

libwebp-0.5.1
- 6/14/2016: version 0.5.1
  This is a binary compatible release.
  * miscellaneous bug fixes (issues #280, #289)
  * reverted alpha plane encoding with color cache for compatibility with
    libwebp 0.4.0->0.4.3 (issues #291, #298)
  * lossless encoding performance improvements
  * memory reduction in both lossless encoding and decoding
  * force mux output to be in the extended format (VP8X) when undefined chunks
    are present (issue #294)
  * gradle, cmake build support
  * workaround for compiler bug causing 64-bit decode failures on android
    devices using clang-3.8 in the r11c NDK
  * various WebPAnimEncoder improvements

* tag 'v0.5.1': (30 commits)
  update ChangeLog
  Clarify the expected 'config' lifespan in WebPIDecode()
  update ChangeLog
  Fix corner case in CostManagerInit.
  gif2webp: normalize the number of .'s in the help message
  vwebp: normalize the number of .'s in the help message
  cwebp: normalize the number of .'s in the help message
  fix rescaling bug: alpha plane wasn't filled with 0xff
  Improve lossless compression.
  'our bug tracker' -> 'the bug tracker'
  normalize the number of .'s in the help message
  pngdec,ReadFunc: throw an error on invalid read
  decode.h,WebPGetInfo: normalize function comment
  Inline GetResidual for speed.
  Speed-up uniform-region processing.
  free -> WebPSafeFree()
  DecodeImageData(): change the incorrect assert
  Fix a boundary case in BackwardReferencesHashChainDistanceOnly.
  Make sure to consider small distances in LZ77.
  add some asserts to delimit the perimeter of CostManager's operation
  ...

Change-Id: I44cee79fddd43527062ea9d83be67da42484ebfc
2016-07-06 19:31:27 -07:00
Parag Salasakar
435308e029 Add MSA optimized encoder transform functions
We add the following MSA optimized encoder transform functions:
- ITransform
- FTransform
- FTransformWHT

Change-Id: Ia6b17556aba5aff2d7a88208905fb45293d080a8
2016-07-05 14:35:47 +00:00
Parag Salasakar
dce64bfa1b Add MSA optimized alpha filter functions
We add the following MSA optimized alpha filter functions:
- HorizontalFilter
- VerticalFilter
- GradientFilter

Change-Id: I71e2e04050e569b8c0bf086fadf210ee16d50924
2016-07-01 19:58:25 +00:00
Parag Salasakar
429120d0af Add MSA optimized color transform functions
We add the following MSA optimized color transform functions:
- AddGreenToBlueAndRed
- TransformColorInverse

Change-Id: Iceab3813905955aa8b811253df9188512fc7de3f
2016-06-28 20:36:21 +05:30
Pascal Massimino
55b2fede7f normalize the macros' "do {...} while (0)" constructs
(so we're no longer bitten by the extra ';' problem!)

Change-Id: Icf849c97df9a7af135ba15a7906fc28590d7ce77
2016-06-27 15:30:05 -07:00
Parag Salasakar
701c772eed Add MSA optimized colorspace conversion functions
We add the following MSA optimized colorspace conversion functions:
- ConvertBGRAToRGBA
- ConvertBGRAToBGR
- ConvertBGRAToRGB

Change-Id: I76db1c829d593a06d4975d54dbafa385c82b84fb
2016-06-27 21:19:06 +00:00
Parag Salasakar
6a19793777 Add MSA optimized intra pred chroma functions
We add the following MSA optimized intra pred chroma functions:
- DC8uv
- TM8uv
- VE8uv
- HE8uv
- DC8uvNoTop
- DC8uvNoLeft
- DC8uvNoTopLeft

Change-Id: I48ad2409f334371acd38f4a70626ebcf2e10f4fe
2016-06-24 08:45:29 +00:00
Parag Salasakar
293d786f31 Added MSA optimized intra prediction 16x16 functions
1. DC16
2. TM16
3. VE16
4. HE16
5. DC16NoTop
6. DC16NoLeft
7. DC16NoTopLeft

Change-Id: I53c57c27cee40973b7ee40a7b7a7fbf0df812d1a
2016-06-23 13:09:02 +00:00
Parag Salasakar
0afa0ce2ff Added MSA optimized intra prediction 4x4 functions
1. DC4
2. TM4
3. VE4
4. RD4
5. LD4

Change-Id: Ib73131f9174aac13443160d2c2add1af90a3bd45
2016-06-23 10:49:34 +00:00
Parag Salasakar
a6621bacf3 Added MSA optimized simple edge filtering functions
1. SimpleVFilter16
2. SimpleHFilter16
3. SimpleVFilter16i
4. SimpleHFilter16i

Change-Id: Ib330e01960623aeeed1bdb5bc8155cc6657556f9
2016-06-23 06:52:01 +00:00