2530 Commits

Author SHA1 Message Date
Pascal Massimino
749a45a520 Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)" 2017-01-17 15:13:08 +00:00
Pascal Massimino
74c053b57d Merge "NEON: fix overflow in SSE NxN calculation" 2017-01-17 15:10:54 +00:00
Pascal Massimino
0a3aeff75b Merge "dsp: WebPExtractGreen function for alpha decompression" 2017-01-17 15:08:20 +00:00
Pascal Massimino
1de931c669 NEON: implement alpha-filters (horizontal/vertical/gradient)
The gradient-filter code is not much faster, but may be improvable in the future.

Change-Id: Ia16070e409fe8703b02276166f19526917df6b35
2017-01-17 15:44:46 +01:00
Pascal Massimino
9b3aca404d NEON: fix overflow in SSE NxN calculation
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening mostly at low q, because in this
case the distortion is large and the accumulated sum was
larger than 16bit-unsigned.

Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
2017-01-17 11:47:36 +01:00
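The overflow described in this commit can be reproduced in scalar C. The sketch below is not the NEON code: it only models a single 16-bit accumulator lane next to a 32-bit one, and the function names are hypothetical. A single squared difference is at most 255 * 255 = 65025 and fits in 16 bits, but two of them already do not.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences kept in a 16-bit accumulator, as one
 * 16-bit SIMD lane would do. The sum wraps around modulo 65536. */
static uint16_t SumSquares16(const uint8_t* a, const uint8_t* b, int n) {
  uint16_t sum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sum = (uint16_t)(sum + d * d);
  }
  return sum;
}

/* Same computation with a 32-bit accumulator: no wrap-around for the
 * block sizes considered here. */
static uint32_t SumSquares32(const uint8_t* a, const uint8_t* b, int n) {
  uint32_t sum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sum += (uint32_t)(d * d);
  }
  return sum;
}

/* Two maximal differences are enough to make the two versions disagree. */
static void CheckAccumulatorOverflow(void) {
  static const uint8_t kA[2] = { 255, 255 };
  static const uint8_t kB[2] = { 0, 0 };
  assert(SumSquares32(kA, kB, 2) == 130050u);
  assert(SumSquares16(kA, kB, 2) == 130050u - 65536u);
}
```

Large differences are typical of low quality settings, which is consistent with the mismatch showing up mostly at low q.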
Pascal Massimino
1c07a3c639 dsp: WebPExtractGreen function for alpha decompression
+ NEON implementation

Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c
2017-01-17 09:33:25 +00:00
Pascal Massimino
9ed5e3e5dd use pointers for WebPRescaler's in WebPDecParams
This makes the structure more generic, without the hard-coded
internal structure.

This is a borderline incompatible ABI change, even if the WebPIDecoder
structure is opaque.

Change-Id: I518765c3f76fc17a136cef045a5a8aa70ed70e85
2017-01-16 22:30:29 -08:00
James Zern
db013a8d5c Merge "ARM: don't use USE_GENERIC_TREE" 2017-01-13 22:15:04 +00:00
Pascal Massimino
fcd4784dcd use an 8b table for the C-version of clz()
30% faster on x86, 5% faster on N5.

New generic function: WebPLog2FloorC()
This function is called as fallback for BitsLog2Floor() when there's
no clz() available.

Change-Id: Ica15c6092112e514c0e200fab89c434de48d4b19
2017-01-13 15:36:26 +01:00
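A table-driven floor(log2) fallback of the kind this commit describes could look roughly like the sketch below. This is an illustration under assumptions, not the actual WebPLog2FloorC() code: the table name, the initialization function and the two-step narrowing to 8 bits are all hypothetical choices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-entry table: kLog2Table[i] = floor(log2(i)) for i > 0. */
static uint8_t kLog2Table[256];

static void InitLog2Table(void) {
  int i;
  kLog2Table[0] = 0;  /* log2(0) is undefined; callers must pass n > 0 */
  for (i = 1; i < 256; ++i) {
    /* floor(log2(i)) = floor(log2(i / 2)) + 1 for i >= 2 */
    kLog2Table[i] = (uint8_t)(kLog2Table[i >> 1] + (i > 1));
  }
}

/* floor(log2(n)) for n > 0: narrow n down to 8 bits, then look it up. */
static int Log2FloorTable(uint32_t n) {
  int log = 0;
  if (n >= 0x10000u) { log += 16; n >>= 16; }
  if (n >= 0x100u)   { log += 8;  n >>= 8; }
  return log + kLog2Table[n];
}

/* Compares the table version with a plain shift loop on a few values. */
static void CheckLog2Floor(void) {
  static const uint32_t kValues[] = { 1u, 2u, 3u, 255u, 256u, 65535u,
                                      65536u, 0x80000000u, 0xffffffffu };
  unsigned int i;
  InitLog2Table();
  for (i = 0; i < sizeof(kValues) / sizeof(kValues[0]); ++i) {
    uint32_t v = kValues[i];
    int ref = 0;
    while (v >>= 1) ++ref;
    assert(Log2FloorTable(kValues[i]) == ref);
  }
}
```

The lookup replaces a bit-by-bit loop with at most two comparisons and one table read, which is the usual motivation for this kind of fallback when no clz() instruction is available.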
Pascal Massimino
fbb5c473b4 ARM: don't use USE_GENERIC_TREE
It's 1-2% faster to use a hard-coded tree on ARM

Change-Id: I54403a70f6c692e50148c33f36833588957c20ee
2017-01-13 10:05:21 +01:00
Pascal Massimino
8fda56126e Merge "add a kSlowSSSE3 feature for CPUInfo" 2017-01-13 07:01:48 +00:00
Pascal Massimino
86bbd24552 add a kSlowSSSE3 feature for CPUInfo
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.

Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731

Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
2017-01-13 06:19:27 +00:00
Vincent Rabaud
7c2779e95a Get code to fully compile in C++.
Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08
2017-01-12 18:03:55 +01:00
Vincent Rabaud
250c358662 Merge "When compiling as C++, avoid narrowing warnings." 2017-01-12 13:00:56 +00:00
Vincent Rabaud
c0648ac2ae When compiling as C++, avoid narrowing warnings.
The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’

Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb
2017-01-12 13:39:22 +01:00
Pascal Massimino
0d55f60c91 40% faster ApplyAlphaMultiply_SSE2
process four pixels at a time

Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4
2017-01-12 02:33:09 -08:00
Pascal Massimino
49d0280df1 NEON: implement several alpha-processing functions
- ApplyAlphaMultiply
- DispatchAlpha
- DispatchAlphaToGreen
- ExtractAlpha

Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4)

new file: alpha_processing_neon.c

Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca
2017-01-11 17:39:29 +01:00
Pascal Massimino
48b1e85fbe SSE2: 15% faster alpha-processing functions
ApplyAlphaMultiply / MultARGBRow / MultRow

we now use: x/255 = (x * 0x8081) >> (16 + 7)
and x/255 + .5 = ((x + 128) * 0x0101) >> 16

Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971
2017-01-11 15:35:16 +01:00
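The two identities quoted in this commit can be checked exhaustively in scalar C. The sketch below is not the SSE2 code from the commit; the function names are made up for illustration. The rounded form is only checked over 0..255*255, the range a product of two 8-bit values can take, which is an assumption about how the input arises.

```c
#include <assert.h>
#include <stdint.h>

/* Exact x / 255 using one multiply and one shift (23 = 16 + 7). */
static uint32_t Div255(uint32_t x) {
  return (x * 0x8081u) >> 23;
}

/* x / 255 rounded to nearest, for x in [0, 255 * 255]. */
static uint32_t Div255Round(uint32_t x) {
  return ((x + 128u) * 0x0101u) >> 16;
}

/* Compares both formulas against plain integer arithmetic. */
static void CheckDiv255(void) {
  uint32_t x;
  for (x = 0; x <= 0xffffu; ++x) {
    assert(Div255(x) == x / 255u);
  }
  for (x = 0; x <= 255u * 255u; ++x) {
    /* (x + 127) / 255 is x / 255 rounded to nearest for integer x */
    assert(Div255Round(x) == (x + 127u) / 255u);
  }
}
```

In 32-bit arithmetic the multiply cannot overflow for 16-bit inputs, since 65535 * 0x8081 is below 2^32.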
Pascal Massimino
e3b8abbc9b fix warning from static analysis.
"-1 cannot be represented in type 'unsigned int'"

Change-Id: I05abcb44af68f702ead5a7f24dc14aab31a2e4d9
2017-01-10 22:59:47 -08:00
Pascal Massimino
28fe054e73 SSE2: 30% faster ApplyAlphaMultiply()
and 15% faster MultARGBRow()

by switching to formulae:
    X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X < 65535.
    (X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128

Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
2017-01-10 23:34:22 +01:00
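The shift-only formulae from this commit can also be verified in scalar C. This is a sketch with hypothetical function names, not the SSE2 implementation. One detail worth noting: the first identity holds for every X up to 65534 but gives 256 instead of 257 at X = 65535, which does not matter for products of two 8-bit values (at most 255 * 255 = 65025).

```c
#include <assert.h>
#include <stdint.h>

/* X / 255 using only adds and shifts, valid for X in [0, 65534]. */
static uint32_t Div255Shift(uint32_t x) {
  return (x + 1u + (x >> 8)) >> 8;
}

/* X / 255 rounded to nearest, valid for X in [0, 255 * 255]. */
static uint32_t Div255RoundShift(uint32_t x) {
  const uint32_t xx = x + 128u;
  return (xx + (xx >> 8)) >> 8;
}

/* Compares both formulas against plain integer arithmetic. */
static void CheckDiv255Shift(void) {
  uint32_t x;
  for (x = 0; x < 0xffffu; ++x) {
    assert(Div255Shift(x) == x / 255u);
  }
  /* the single 16-bit exception: 65535 / 255 is 257, the formula gives 256 */
  assert(Div255Shift(0xffffu) == 256u);
  for (x = 0; x <= 255u * 255u; ++x) {
    assert(Div255RoundShift(x) == (x + 127u) / 255u);
  }
}
```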
Vincent Rabaud
f44acd253b Merge "Properly compute the optimal color cache size." 2017-01-10 21:14:16 +00:00
Vincent Rabaud
527844fee0 Properly compute the optimal color cache size.
The previous optimization performed a dichotomy on a cost function that
can have any shape in practice, hence a bit of randomness in the result.
Also, two magic constants were used, one for an extra constant cost,
one for an extra linear cost. Both values/models were empirical.

A brute force search for the best cache size is now performed.

To reduce the CPU impact, a speed optimization is also made by not
inserting the same value again and again.
This makes sense, but it is also the most common case where LZ77 is
useful, hence an overall improvement sometimes.

Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8
2017-01-10 21:44:53 +01:00
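Why a dichotomy can miss the best size is easy to show on a toy cost curve. The sketch below is a generic illustration, not libwebp's previous or current code: the bisection rule, the function names and the cost values are all made up. Bisection is only reliable when the cost has a single valley; a brute force scan has no such requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Bisection on neighboring costs. Only correct if 'cost' decreases and
 * then increases (a single valley). Requires n >= 1. */
static size_t BisectionArgMin(const int* cost, size_t n) {
  size_t lo = 0, hi = n - 1;
  while (lo < hi) {
    const size_t mid = lo + (hi - lo) / 2;
    if (cost[mid] <= cost[mid + 1]) {
      hi = mid;       /* assume the minimum is at mid or to its left */
    } else {
      lo = mid + 1;   /* assume the minimum is to the right of mid */
    }
  }
  return lo;
}

/* Brute force: evaluate every candidate, keep the cheapest one. */
static size_t BruteForceArgMin(const int* cost, size_t n) {
  size_t best = 0, i;
  for (i = 1; i < n; ++i) {
    if (cost[i] < cost[best]) best = i;
  }
  return best;
}

/* A cost curve with two valleys: bisection settles in the wrong one. */
static void CheckSearch(void) {
  static const int kCost[7] = { 5, 4, 6, 7, 8, 1, 9 };
  assert(BisectionArgMin(kCost, 7) == 1);   /* local minimum, cost 4 */
  assert(BruteForceArgMin(kCost, 7) == 5);  /* global minimum, cost 1 */
}
```

The brute force scan costs one evaluation per candidate, which is why the commit pairs it with a separate speed optimization.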
Pascal Massimino
be0ef6395f fix a comment typo
Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e
2017-01-10 21:17:13 +01:00
Vincent Rabaud
8874b16275 Fix a non-deterministic color cache size computation.
When an allocation failed, some value was still returned, whereas the
computation should have stopped.

Change-Id: I5f85e264575be825e4261ab6fa63840c157cf5c2
2017-01-10 18:53:19 +01:00
Vincent Rabaud
d712e20de0 Do not allow a color cache size bigger than the number of colors.
This is purely for speed optimization.

Change-Id: Ie4b4380df8a5afa90574012bacdb1ddad03f320e
2017-01-10 09:25:02 +01:00
Vincent Rabaud
ecff04f625 re-introduce some comments in Huffman Cost.
Change-Id: I2396bbc58628dd12a2d36068f7193e2a6eb4d166
2017-01-06 13:17:14 +01:00
Pascal Massimino
00b08c88c0 Merge "NEON: 5% faster conversion to RGB565 and RGBA4444" 2016-12-22 08:39:01 +00:00
Pascal Massimino
0e7f444702 Merge "NEON: faster fancy upsampling" 2016-12-21 14:53:24 +00:00
Pascal Massimino
b016cb91c5 NEON: faster fancy upsampling
2-3% faster decoding overall

Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955
2016-12-21 15:23:54 +01:00
Vincent Rabaud
1cb638010c Call the C function to finish off lossless SSE loops only when necessary.
Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad
2016-12-21 14:25:54 +01:00
Vincent Rabaud
875fafc191 Implement BundleColorMap in SSE2.
Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e
2016-12-21 10:44:31 +01:00
James Zern
f04eb37603 libwebp-0.5.2
- 12/13/2016: version 0.5.2
   This is a binary compatible release.
   This release covers CVE-2016-8888 and CVE-2016-9085.
   * further security related hardening in the tools; fixes to
     gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
     #312)
   * full libwebp (encoder & decoder) iOS framework; libwebpdecoder
     WebP.framework renamed to WebPDecoder.framework (issue #307)
   * CMake support for Android Studio (2.2)
   * miscellaneous build related fixes (issue #306, #313)
   * miscellaneous documentation improvements (issue #225)
   * minor lossy encoder fixes and improvements

Merge tag 'v0.5.2'

* tag 'v0.5.2': (54 commits)
  update ChangeLog
  anim_util: quiet implicit conv warnings in 32-bit
  jpegdec: correct ContextFill signature
  Remove some errors when compiling the code as C++.
  vwebp: clear canvas during resize w/o animation
  tiffdec: restore libtiff 3.9.x compatibility
  update NEWS
  AnimEncoder: avoid freeing uninitialized memory pointer.
  WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
  fix a potential overflow with MALLOC_LIMIT
  bump version to 0.5.2
  update AUTHORS & .mailmap
  iosbuild.sh: add WebPDecoder.framework + encoder
  AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
  Fix assertions in WebPRescalerExportRow()
  fix a typo in WebPPictureYUVAToARGB's doc
  systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
  doc: use two's complement explicitly for uint8->int8 conversion
  Anim_encoder: correctly handle enc->prev_candidate_undecided_
  WebPPictureDistortion(): free() -> WebPSafeFree()
  ...

Change-Id: I16bcf54af41ce8fad98d4fbc8aa1df58f338fc23
2016-12-20 20:14:55 -08:00
Pascal Massimino
341d711c43 NEON: 5% faster conversion to RGB565 and RGBA4444
We use the magic 'shift and insert' instruction instead of
the multiple shifts and or's.

Change-Id: I48df0320668b502a91792defc0423a9441669d19
2016-12-20 17:01:48 +01:00
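The 'shift and insert' idea can be modeled in scalar C. The sketch below emulates a per-byte shift-right-and-insert (the behavior of NEON's VSRI on 8-bit lanes) and checks that it packs RGB565 the same way as explicit shifts and or's. It is not the NEON code from the commit; the function names and the high-byte/low-byte output layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of 'shift right and insert' on one byte: the top n bits
 * of dst are kept, the remaining bits come from src >> n. n in [1, 7]. */
static uint8_t ShiftRightInsert8(uint8_t dst, uint8_t src, int n) {
  const uint8_t keep = (uint8_t)(0xffu << (8 - n));
  return (uint8_t)((dst & keep) | (src >> n));
}

/* Reference packing with masks, shifts and or's.
 * out[0] = rrrrrggg (high byte), out[1] = gggbbbbb (low byte). */
static void PackRGB565ShiftOr(uint8_t r, uint8_t g, uint8_t b,
                              uint8_t out[2]) {
  out[0] = (uint8_t)((r & 0xf8) | (g >> 5));
  out[1] = (uint8_t)(((g << 3) & 0xe0) | (b >> 3));
}

/* Same packing with one insert per output byte. */
static void PackRGB565Insert(uint8_t r, uint8_t g, uint8_t b,
                             uint8_t out[2]) {
  out[0] = ShiftRightInsert8(r, g, 5);
  out[1] = ShiftRightInsert8((uint8_t)(g << 3), b, 3);
}

/* Exhaustive comparison over all 8-bit r, g, b triples. */
static void CheckPackRGB565(void) {
  int r, g, b;
  for (r = 0; r < 256; ++r) {
    for (g = 0; g < 256; ++g) {
      for (b = 0; b < 256; ++b) {
        uint8_t ref[2], ins[2];
        PackRGB565ShiftOr((uint8_t)r, (uint8_t)g, (uint8_t)b, ref);
        PackRGB565Insert((uint8_t)r, (uint8_t)g, (uint8_t)b, ins);
        assert(ref[0] == ins[0] && ref[1] == ins[1]);
      }
    }
  }
}
```

Each insert folds a mask, a shift and an or into one operation, which is where the saving over the shifts-and-or's version comes from.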
Vincent Rabaud
24eb39401b Remove some errors when compiling the code as C++.
This fixes some cases from
https://bugs.chromium.org/p/webp/issues/detail?id=137

Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a
(cherry picked from commit 28ce3043448bd3a941989939521cd333b6a6ae39)
2016-12-15 11:50:44 -08:00
Pascal Massimino
a4bbe4b38b fix indentation
Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55
2016-12-13 22:50:29 -08:00
hui su
5ab6d9de1f AnimEncoder: avoid freeing uninitialized memory pointer.
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This causes an uninitialized memory pointer to be freed in
SetFrame().

BUG=webp:322

Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
(cherry picked from commit 3ebe1c0003287e1d9b65d99750f227ca7ed4dffc)
2016-12-13 18:18:57 -08:00
Urvang Joshi
f29bf582df WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
This improves compression by ~5% at default quality.

If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.

Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
(cherry picked from commit 3f4042b52a5d1a1c6ea41c192970d8b7e1a53118)
2016-12-13 18:18:48 -08:00
hui su
3ebe1c0003 AnimEncoder: avoid freeing uninitialized memory pointer.
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This causes an uninitialized memory pointer to be freed in
SetFrame().

BUG=webp:322

Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
2016-12-13 17:39:16 -08:00
Pascal Massimino
df780e0eac fix a potential overflow with MALLOC_LIMIT
BUG=webp:321

Change-Id: Iab89dfe167fb394fcdffd3b2732d4ac9bef764b0
(cherry picked from commit 76bbcf2ed61d326bae3e37e1941e2a8674840462)
2016-12-13 16:15:28 -08:00
Pascal Massimino
58fc507842 Merge "PredictorSub: implement fully-SSE2 version" 2016-12-13 11:03:13 +00:00
Pascal Massimino
9cc421675b PredictorSub: implement fully-SSE2 version
and inline the C-version too.

Predictor #13 is still a hard one.

Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331
2016-12-13 02:19:35 -08:00
Pascal Massimino
827d3c5038 Merge "fix a potential overflow with MALLOC_LIMIT" 2016-12-13 06:10:56 +00:00
James Zern
218460cdd7 bump version to 0.5.2
libwebp{,decoder} - 0.5.2
libwebp libtool - 6.2.0
libwebpdecoder libtool - 2.2.0

mux - 0.3.2
libtool - 2.2.0

demux - 0.3.1
libtool - 2.1.0

Change-Id: Idf199415c325e6e9d157459a4e016ebba88c3f34
2016-12-12 17:36:12 -08:00
Pascal Massimino
76bbcf2ed6 fix a potential overflow with MALLOC_LIMIT
BUG=webp:321

Change-Id: Iab89dfe167fb394fcdffd3b2732d4ac9bef764b0
2016-12-12 13:40:40 -08:00
James Zern
2423017a28 dsp/lossless.c,cosmetics: fix indent
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...

Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
2016-12-12 12:53:23 -08:00
Pascal Massimino
fbba5bc2c1 optimize predictor #1 in plain-C
For some reason, gcc has a hard time inlining this one...

Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]

Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
2016-12-12 17:41:36 +01:00
Pascal Massimino
9ae0b3f65a Merge "SSE2: slightly (~2%) faster Predictor #1" 2016-12-12 14:46:21 +00:00
Pascal Massimino
c1f97bd758 SSE2: slightly (~2%) faster Predictor #1
by removing a load from memory

Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0
2016-12-12 02:24:38 -08:00
Pascal Massimino
ea664b8995 SSE2: 10% faster Predictor #11
Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185
2016-12-12 02:20:41 -08:00
Hui Su
be7dcc088c AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Change-Id: I0d288bd9561b48cf5a1eae92a1b7106ba44c664e
(cherry picked from commit 1cc79e92ac74337aa4102a3128fa9451ef4b5fd0)
2016-12-09 20:22:31 -08:00