Commit Graph

2432 Commits

Author SHA1 Message Date
Pascal Massimino
1dc82a6bba Merge "introduce a generic GetCoeffs() function pointer" 2017-01-18 07:44:36 +00:00
Pascal Massimino
8074b89eb3 introduce a generic GetCoeffs() function pointer
We can switch at run-time between the standard GetCoeffs() critical
function, that uses a fast variant of VP8GetBit().
However, some platforms have slow instructions that make standard
VP8GetBit() slow. GetCoeffs() is the right level of branching to
switch to GetCoeffsAlt() that avoids these slow instructions in some
not-frequent cases.

Next patch will upgrade VP8GetBit() to use clz, after this one
is proved to be neutral speed-wise.

Change-Id: Ia6cef5de9de6131574d2202bbc0bea8559c9b693
2017-01-17 16:24:00 +01:00
Pascal Massimino
749a45a520 Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)" 2017-01-17 15:13:08 +00:00
Pascal Massimino
74c053b57d Merge "NEON: fix overflow in SSE NxN calculation" 2017-01-17 15:10:54 +00:00
Pascal Massimino
0a3aeff75b Merge "dsp: WebPExtractGreen function for alpha decompression" 2017-01-17 15:08:20 +00:00
Pascal Massimino
1de931c669 NEON: implement alpha-filters (horizontal/vertical/gradient)
gradient-filter code is not much faster, but maybe improvable in the future.

Change-Id: Ia16070e409fe8703b02276166f19526917df6b35
2017-01-17 15:44:46 +01:00
Pascal Massimino
9b3aca404d NEON: fix overflow in SSE NxN calculation
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.

Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
2017-01-17 11:47:36 +01:00
Pascal Massimino
1c07a3c639 dsp: WebPExtractGreen function for alpha decompression
+ NEON implementation

Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c
2017-01-17 09:33:25 +00:00
Pascal Massimino
9ed5e3e5dd use pointers for WebPRescaler's in WebPDecParams
This makes the structure more generic, without the hard-coded
internal structure.

This is a borderline incompatible ABI change, even if WebPIDecoder structure
is opaque.

Change-Id: I518765c3f76fc17a136cef045a5a8aa70ed70e85
2017-01-16 22:30:29 -08:00
James Zern
db013a8d5c Merge "ARM: don't use USE_GENERIC_TREE" 2017-01-13 22:15:04 +00:00
Pascal Massimino
fcd4784dcd use a 8b table for C-version for clz()
30% faster on x86, 5% faster on N5.

New generic function: WebPLog2FloorC()
This function is called as fallback for BitsLog2Floor() when there's
no clz() available.

Change-Id: Ica15c6092112e514c0e200fab89c434de48d4b19
2017-01-13 15:36:26 +01:00
Pascal Massimino
fbb5c473b4 ARM: don't use USE_GENERIC_TREE
It's 1-2% faster to use hard-coded tree on ARM

Change-Id: I54403a70f6c692e50148c33f36833588957c20ee
2017-01-13 10:05:21 +01:00
Pascal Massimino
8fda56126e Merge "add a kSlowSSSE3 feature for CPUInfo" 2017-01-13 07:01:48 +00:00
Pascal Massimino
86bbd24552 add a kSlowSSSE3 feature for CPUInfo
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.

Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731

Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
2017-01-13 06:19:27 +00:00
Vincent Rabaud
7c2779e95a Get code to fully compile in C++.
Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08
2017-01-12 18:03:55 +01:00
Vincent Rabaud
250c358662 Merge "When compiling as C++, avoid narrowing warnings." 2017-01-12 13:00:56 +00:00
Vincent Rabaud
c0648ac2ae When compiling as C++, avoid narrowing warnings.
The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’

Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb
2017-01-12 13:39:22 +01:00
Pascal Massimino
0d55f60c91 40% faster ApplyAlphaMultiply_SSE2
process four pixels at a time

Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4
2017-01-12 02:33:09 -08:00
Pascal Massimino
49d0280df1 NEON: implement several alpha-processing functions
- ApplyAlphaMultiply
 - DispatchAlpha
 - DispatchAlphaToGreen
 - ExtractAlpha

Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4)

new file: alpha_processing_neon.c

Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca
2017-01-11 17:39:29 +01:00
Pascal Massimino
48b1e85fbe SSE2: 15% faster alpha-processing functions
ApplyAlphaMultiply / MultARGBRow / MultRow

we use now: x/255 = (x * 0x8081) >> (16 + 7)
and x/255 + .5 = ((x + 128) * 0x0101) >> 16

Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971
2017-01-11 15:35:16 +01:00
Pascal Massimino
e3b8abbc9b fix warning from static analysis.
"-1 cannot be represented in type 'unsigned int'"

Change-Id: I05abcb44af68f702ead5a7f24dc14aab31a2e4d9
2017-01-10 22:59:47 -08:00
Pascal Massimino
28fe054e73 SSE2: 30% faster ApplyAlphaMultiply()
and 15% faster MultARGBRow()

by switching to formulae:
    X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
   (X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128

Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
2017-01-10 23:34:22 +01:00
Vincent Rabaud
f44acd253b Merge "Properly compute the optimal color cache size." 2017-01-10 21:14:16 +00:00
Vincent Rabaud
527844fee0 Properly compute the optimal color cache size.
The previous optimization was performing dichotomy on a function that
is anything in practice, hence a bit of randomness.
Also, two magic constants were used, one for an extra constant cost,
one for an extra linear cost. Both values/models were empirical.

A brute force search for the best cache size is now performed.

To have less CPU impact, a speed optimization is also made by not
inserting a value again and again.
This makes sense but it's also the most common case of when LZ77 is
useful hence an overall improvement sometimes.

Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8
2017-01-10 21:44:53 +01:00
Pascal Massimino
be0ef6395f fix a comment typo
Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e
2017-01-10 21:17:13 +01:00
Vincent Rabaud
8874b16275 Fix a non-deterministic color cache size computation.
In case of impossible allocation, some value was returned while
computation should be stopped.

Change-Id: I5f85e264575be825e4261ab6fa63840c157cf5c2
2017-01-10 18:53:19 +01:00
Vincent Rabaud
d712e20de0 Do not allow a color cache size bigger than the number of colors.
This is purely for speed optimization.

Change-Id: Ie4b4380df8a5afa90574012bacdb1ddad03f320e
2017-01-10 09:25:02 +01:00
Vincent Rabaud
ecff04f625 re-introduce some comments in Huffman Cost.
Change-Id: I2396bbc58628dd12a2d36068f7193e2a6eb4d166
2017-01-06 13:17:14 +01:00
Pascal Massimino
00b08c88c0 Merge "NEON: 5% faster conversion to RGB565 and RGBA4444" 2016-12-22 08:39:01 +00:00
Pascal Massimino
0e7f444702 Merge "NEON: faster fancy upsampling" 2016-12-21 14:53:24 +00:00
Pascal Massimino
b016cb91c5 NEON: faster fancy upsampling
2-3% faster decoding overall

Change-Id: I2c53e50dc7e0ade5245cff8cc5d7b96a14062955
2016-12-21 15:23:54 +01:00
Vincent Rabaud
1cb638010c Call the C function to finish off lossless SSE loops only when necessary.
Change-Id: I4e221d80879dc9c90c24d69a40bc5811d73787ad
2016-12-21 14:25:54 +01:00
Vincent Rabaud
875fafc191 Implement BundleColorMap in SSE2.
Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e
2016-12-21 10:44:31 +01:00
James Zern
f04eb37603 libwebp-0.5.2
- 12/13/2016: version 0.5.2
   This is a binary compatible release.
   This release covers CVE-2016-8888 and CVE-2016-9085.
   * further security related hardening in the tools; fixes to
     gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
     #312)
   * full libwebp (encoder & decoder) iOS framework; libwebpdecoder
     WebP.framework renamed to WebPDecoder.framework (issue #307)
   * CMake support for Android Studio (2.2)
   * miscellaneous build related fixes (issue #306, #313)
   * miscellaneous documentation improvements (issue #225)
   * minor lossy encoder fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYWfopAAoJEPnD1r24Iytd0gAQALhTSEjJVmKfHxyPNDduc3kn
 QeiVaVwPiOS/a266+ZnWHzCvkR3zgqZxNlyKzRty378gM8/P7r2dMCmfdnVFbF4O
 a7M1lld9yYldNpAxvHDnY9u2RzmRfVD1yYu27gv77uT7gR2IybQ81FHi1pn56tFA
 2g4yHdrC2tXud22ZUb9Bgqe7YW06gWND4EmeJgxF38S98gdrtJla5rmlUcuEhbIl
 SHpkbEgJX4nZxWggyCJ61/OxeEwwWBtI3kpSLkEqmCVSnFb7WBC7pITq59n8hg2U
 SaYCfWGRJ/oQQvxUxuPYYtzq26dYOxd2vT9S1mcE1be9jMGxKp9vgE8jNflvtza1
 wTPUajaPUjsTLAvFikQRo+34W9QxOKp9jCX9Be0V4wvBClfM13toBgKolzPGGUuo
 zlcZ0/GgzwfQ+sD7bs/p/7ToiH+GejBUK7FUR8ZB7EHZrDynszSzEevx5SUzPWV3
 1q4TyD5eclUOjb4S2yplcKp0kwkwtOA5ETboPzA+b8TQnfTFM3GP7fMoYvORbSZp
 39/H5hi1bjlOE4m3mp3qqfR2DMWZlla7YNZiuuTEeY3ztrlqeakC2ma1Fhi6ZmbG
 TrqmAaDTueRizry4E7Fr9sBw0mee14v/xcTFcDcSI1BRFclFc1KAw0ObzdaN2iEt
 L5tjlqzH0XEH4fl5OnD3
 =x+Y3
 -----END PGP SIGNATURE-----

Merge tag 'v0.5.2'

libwebp-0.5.2
- 12/13/2016: version 0.5.2
  This is a binary compatible release.
  This release covers CVE-2016-8888 and CVE-2016-9085.
  * further security related hardening in the tools; fixes to
    gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
    #312)
  * full libwebp (encoder & decoder) iOS framework; libwebpdecoder
    WebP.framework renamed to WebPDecoder.framework (issue #307)
  * CMake support for Android Studio (2.2)
  * miscellaneous build related fixes (issue #306, #313)
  * miscellaneous documentation improvements (issue #225)
  * minor lossy encoder fixes and improvements

* tag 'v0.5.2': (54 commits)
  update ChangeLog
  anim_util: quiet implicit conv warnings in 32-bit
  jpegdec: correct ContextFill signature
  Remove some errors when compiling the code as C++.
  vwebp: clear canvas during resize w/o animation
  tiffdec: restore libtiff 3.9.x compatibility
  update NEWS
  AnimEncoder: avoid freeing uninitialized memory pointer.
  WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
  fix a potential overflow with MALLOC_LIMIT
  bump version to 0.5.2
  update AUTHORS & .mailmap
  iosbuild.sh: add WebPDecoder.framework + encoder
  AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
  Fix assertions in WebPRescalerExportRow()
  fix a typo in WebPPictureYUVAToARGB's doc
  systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
  doc: use two's complement explicitly for uint8->int8 conversion
  Anim_encoder: correctly handle enc->prev_candidate_undecided_
  WebPPictureDistortion(): free() -> WebPSafeFree()
  ...

Change-Id: I16bcf54af41ce8fad98d4fbc8aa1df58f338fc23
2016-12-20 20:14:55 -08:00
Pascal Massimino
341d711c43 NEON: 5% faster conversion to RGB565 and RGBA4444
We use the magic 'shift and insert' instruction instead of
the multiple shifts and or's.

Change-Id: I48df0320668b502a91792defc0423a9441669d19
2016-12-20 17:01:48 +01:00
Vincent Rabaud
24eb39401b Remove some errors when compiling the code as C++.
This fixes some cases from
https://bugs.chromium.org/p/webp/issues/detail?id=137

Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a
(cherry picked from commit 28ce304344)
2016-12-15 11:50:44 -08:00
Pascal Massimino
a4bbe4b38b fix indentation
Change-Id: I5593fb2441f253c6b8cc43949c11909f19184b55
2016-12-13 22:50:29 -08:00
hui su
5ab6d9de1f AnimEncoder: avoid freeing uninitialized memory pointer.
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().

BUG=webp:322

Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
(cherry picked from commit 3ebe1c0003)
2016-12-13 18:18:57 -08:00
Urvang Joshi
f29bf582df WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
This improves compression by ~5% at default quality.

If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.

Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
(cherry picked from commit 3f4042b52a)
2016-12-13 18:18:48 -08:00
hui su
3ebe1c0003 AnimEncoder: avoid freeing uninitialized memory pointer.
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().

BUG=webp:322

Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
2016-12-13 17:39:16 -08:00
Pascal Massimino
df780e0eac fix a potential overflow with MALLOC_LIMIT
BUG=webp:321

Change-Id: Iab89dfe167fb394fcdffd3b2732d4ac9bef764b0
(cherry picked from commit 76bbcf2ed6)
2016-12-13 16:15:28 -08:00
Pascal Massimino
58fc507842 Merge "PredictorSub: implement fully-SSE2 version" 2016-12-13 11:03:13 +00:00
Pascal Massimino
9cc421675b PredictorSub: implement fully-SSE2 version
and inline the C-version too.

Predictor #13 is still a hard one.

Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331
2016-12-13 02:19:35 -08:00
Pascal Massimino
827d3c5038 Merge "fix a potential overflow with MALLOC_LIMIT" 2016-12-13 06:10:56 +00:00
James Zern
218460cdd7 bump version to 0.5.2
libwebp{,decoder} - 0.5.2
libwebp libtool - 6.2.0
libwebpdecoder libtool - 2.2.0

mux - 0.3.2
libtool - 2.2.0

demux - 0.3.1
libtool - 2.1.0

Change-Id: Idf199415c325e6e9d157459a4e016ebba88c3f34
2016-12-12 17:36:12 -08:00
Pascal Massimino
76bbcf2ed6 fix a potential overflow with MALLOC_LIMIT
BUG=webp:321

Change-Id: Iab89dfe167fb394fcdffd3b2732d4ac9bef764b0
2016-12-12 13:40:40 -08:00
James Zern
2423017a28 dsp/lossless.c,cosmetics: fix indent
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...

Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
2016-12-12 12:53:23 -08:00
Pascal Massimino
fbba5bc2c1 optimize predictor #1 in plain-C
For some reason, gcc has hard time inlining this one...

Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]

Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
2016-12-12 17:41:36 +01:00
Pascal Massimino
9ae0b3f65a Merge "SSE2: slightly (~2%) faster Predictor #1" 2016-12-12 14:46:21 +00:00
Pascal Massimino
c1f97bd758 SSE2: slightly (~2%) faster Predictor #1
by removing a load from memory

Change-Id: If6c4aa7fb99309d09f943393ec772891449971f0
2016-12-12 02:24:38 -08:00
Pascal Massimino
ea664b8995 SSE2: 10% faster Predictor #11
Change-Id: I14ae5f6603071b86dfdbe8e6f7dfdbe5d8510185
2016-12-12 02:20:41 -08:00
Hui Su
be7dcc088c AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Change-Id: I0d288bd9561b48cf5a1eae92a1b7106ba44c664e
(cherry picked from commit 1cc79e92ac)
2016-12-09 20:22:31 -08:00
Hui Su
408858308a Fix assertions in WebPRescalerExportRow()
Change-Id: I25711dd54e71c90a25f7b18e0ef9155e8151a15e
(cherry picked from commit 27b5d991e2)
2016-12-09 20:22:25 -08:00
Pascal Massimino
8f38c72e11 fix a typo in WebPPictureYUVAToARGB's doc
method -> colorspace

Change-Id: I5c9a2ccc909c967a936758dde2cfce92eb95462a
(cherry picked from commit dc789ada44)
2016-12-09 17:27:59 -08:00
Pascal Massimino
33ca93f909 systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
Change-Id: I4a767134dcc52a7ee7c3bc5deb91012eaf7b6512
(cherry picked from commit aaf2a6a698)
2016-12-09 17:27:54 -08:00
hui su
f91ba96306 Anim_encoder: correctly handle enc->prev_candidate_undecided_
Set enc->prev_candidate_undecided_ as 0 when a frame is not chosen
as a possible keyframe, so that the dispose method can be
dispose-to-background.

Change-Id: If2899f5dbc06fb53705fb8240072ab6440a6de12
(cherry picked from commit 29fedbf58b)
2016-12-09 16:58:28 -08:00
Pascal Massimino
25d74e652e WebPPictureDistortion(): free() -> WebPSafeFree()
missed one!

Change-Id: I643170451b3ac07c748b70a9abfe8af17a716b24
(cherry picked from commit 32dead4ee3)
2016-12-09 16:58:19 -08:00
James Zern
03f1c00877 mux/Makefile.am: add missing -lm
+ libwebpmux.pc

anim_encode.c relies on functions from math.h

BUG=webp:306

Change-Id: I3a8eb48febfd52bfbeb04f4dc615ccbed72926f7
(cherry picked from commit aaf2530cc3)
2016-12-09 15:03:08 -08:00
Pascal Massimino
58410cd6dc fix bug in RefineUsingDistortion()
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)

We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.

Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
(cherry picked from commit 0a3838ca77)
2016-12-08 15:48:16 -08:00
Pascal Massimino
e168af8c6c fix filtering auto-adjustment
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).

Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.

Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
(cherry picked from commit e4cd4daf74)
2016-12-08 15:48:08 -08:00
Pascal Massimino
ed9dec41a5 fix doc and code snippet for WebPINewDecoder() doc
Change-Id: I1a75fdf60f0b9f1816be28f22613438bfe21752b
(cherry picked from commit e715285611)
2016-12-08 15:48:04 -08:00
Pascal Massimino
3c49178f7d prevent 32b overflow for very large canvas_width / height
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.

Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
(cherry picked from commit 1c36440094)
2016-12-08 15:27:51 -08:00
Pascal Massimino
b3fb8bb602 slightly faster Predictor #11 in NEON
(+some slight modifications on Predictor #12)

Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2
2016-12-08 07:32:51 -08:00
James Zern
a0d2753fcb lower WEBP_MAX_ALLOCABLE_MEMORY default
restrict to 2^34 for 64-bit targets, < 2^32 for 32-bit

Change-Id: Iff4ce40ae2c3c7fc119f018c2128dbe8f744341f
(cherry picked from commit b8384b53d6)
2016-12-07 18:30:44 -08:00
Pascal Massimino
31fe11a57a fix infinite loop in case of PARTITION0 overflow
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.

Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
(cherry picked from commit 21e7537abe)
2016-12-07 18:30:39 -08:00
hui su
532215dd29 Change the rule of picking UV mode in MBAnalyzeBestUVMode()
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.

Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
(cherry picked from commit 1377ac2ec1)
2016-12-07 18:30:33 -08:00
hui su
7416280d75 Fix an unsigned integer overflow error in enc/cost.h
Change-Id: I9774b59c417c185f09a61a115364b9642976a100
(cherry picked from commit 0b2c58a91c)
2016-12-07 18:29:51 -08:00
hui su
13cf1d2e41 Do token recording and counting in a single loop
Change-Id: I8afd3c486b210bd67888de03e91dde7f78276f89
(cherry picked from commit 0c0fb83211)
2016-12-07 18:29:44 -08:00
hui su
eb9a4b97c5 Reset segment id if we decide not to update segment map
This avoids potential encoder and decoder mismatch.

Change-Id: I5282d3e168afc6193033ad3fce8fbc35618ab2f5
(cherry picked from commit 386e4ba2f0)
2016-12-07 18:25:06 -08:00
Pascal Massimino
76ebbfff28 NEON: implement predictor #13
~5-7% faster

Change-Id: I3361b0bbc978f3721168db15778a67337309c18a
2016-12-07 14:58:49 -08:00
Vincent Rabaud
95b12a08ae Merge "Revert Average3 and Average4" 2016-12-07 15:38:56 +00:00
Vincent Rabaud
54ab2e758f Revert Average3 and Average4
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.

Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
2016-12-07 15:32:33 +01:00
Pascal Massimino
fe12330c81 3-5% faster Predictor #5, #6, #7 and #10 for NEON
Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2
2016-12-07 15:25:33 +01:00
Pascal Massimino
fbfb3bef7b ~2% faster predictor #10 for NEON
Change-Id: Icd9cff90c227d702c3ba319131996c5475094520
2016-12-06 13:47:35 +00:00
Pascal Massimino
d4b7d801db lossless_sse2: use the local functions
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.

Also: suffix explicitly the functions with _SSE2

Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
2016-12-06 14:20:41 +01:00
Vincent Rabaud
a5e3b22574 Lossless decoder SSE2 improvements.
Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706
2016-12-06 13:45:09 +01:00
Pascal Massimino
58a1f124c2 ~2% faster predictor #12 in NEON.
Change-Id: I6772bb865d0f72720a65561eb55028e538df236d
2016-12-06 10:24:27 +01:00
Pascal Massimino
906c3b6392 Merge "Implement lossless transforms in NEON." 2016-12-03 16:55:14 +00:00
Vincent Rabaud
d23abe4e9f Implement lossless transforms in NEON.
Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55
2016-12-03 11:20:22 +00:00
Vincent Rabaud
2e6cb6f34e Give more flexibility to the predictor generating macro.
Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900
2016-12-02 12:33:12 -08:00
Vincent Rabaud
28e0bb7088 Merge "Fix race condition in multi-threading initialization." 2016-12-02 17:45:10 +00:00
Vincent Rabaud
647045305a Fix race condition in multi-threading initialization.
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).

Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
2016-12-02 18:28:57 +01:00
Hui Su
1cc79e92ac AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Change-Id: I0d288bd9561b48cf5a1eae92a1b7106ba44c664e
2016-12-02 11:50:13 +01:00
Pascal Massimino
ea72cd60cb add missing 'extern' keyword for predictor dcl
Change-Id: Ibf3db9b6dae91e53524c31cdfccf4678b3fa1135
2016-12-01 08:15:14 +01:00
Vincent Rabaud
67879e6d48 SSE implementation of decoding predictors.
Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c
2016-11-30 12:00:07 +01:00
Vincent Rabaud
a41296aef5 Fix potentially uninitialized value.
Change-Id: I721695e22474992db3094942b1ad4754ae7c0a02
2016-11-29 13:19:32 +01:00
Vincent Rabaud
4239a1489c Make the lossless predictors work on a batch of pixels.
Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1
2016-11-28 17:12:10 +01:00
Pascal Massimino
bc18ebad2e fix extra 'const's in signatures
Change-Id: Ie433d0defbc0c6feae2eb2f11e70082f1affada8
2016-11-25 09:45:52 +01:00
Vincent Rabaud
71e2f5cadf Remove memcpy in lossless decoding.
Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c
2016-11-24 17:45:24 +01:00
Vincent Rabaud
7474d46e45 Do not use a register array in SSE.
Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141
2016-11-24 13:06:44 +01:00
Owen Rodley
67748b41db Improve latency of FTransform2.
Benchmarks from vrabaud@:
8BIT/GRAY                corpus speed: faster: -4.3 % , corpus size: unchanged
skal/sources_png_skal    corpus speed: faster: -5.2 % , corpus size: unchanged
images/png_rgb           corpus speed: faster: -5.1 % , corpus size: unchanged
images/lpcb              corpus speed: unchanged, corpus size: unchanged
images/png_big           corpus speed: faster: -1.7 % , corpus size: unchanged
images/png_doc           corpus speed: unchanged, corpus size: unchanged
images/png_1bit          corpus speed: faster: -1.2 % , corpus size: unchanged
images/jpeg_small        corpus speed: unchanged, corpus size: unchanged
images/icip_core1        corpus speed: unchanged, corpus size: unchanged
images/png_gray          corpus speed: faster: -2.5 % , corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged
images/jpeg              corpus speed: faster: -2.3 % , corpus size: unchanged
images/png_translucent   corpus speed: faster: -2.8 % , corpus size: unchanged
images/gif               corpus speed: faster: -1.4 % , corpus size: unchanged
images/png_opaque        corpus speed: faster: -2.8 % , corpus size: unchanged
images/png_rgb_opaque    corpus speed: unchanged, corpus size: unchanged
images/png_indexed       corpus speed: faster: -2.0 % , corpus size: unchanged
images/all               corpus speed: faster: -1.5 % , corpus size: unchanged
images/png_small         corpus speed: unchanged, corpus size: unchanged
images/png               corpus speed: unchanged, corpus size: unchanged
images/gif_still         corpus speed: faster: -1.6 % , corpus size: unchanged

Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
2016-11-24 07:09:50 +00:00
Vincent Rabaud
6540cd0eeb Provide an SSE implementation of ConvertBGRAToRGB
Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf
2016-11-23 16:25:51 +01:00
Pascal Massimino
3c2a61b099 remove some unneeded casts
Change-Id: Ie68788c77f016ed11446a55142b1bd8d96261452
2016-11-16 22:54:40 -08:00
Pascal Massimino
9ac063c37f add dsp functions for SmartYUV
+ SSE2 implementation

Change-Id: I5cfdb62d68b5a95899241a097d3a2f697fbc590e
2016-11-16 14:23:06 +00:00
Pascal Massimino
22efabddb4 Merge "smart_yuv: switch to planar instead of packed r/g/b processing" 2016-11-15 14:55:17 +00:00
Pascal Massimino
1d6e7bf39f smart_yuv: switch to planar instead of packed r/g/b processing
avoiding triplets of data should make it easier to write SSE2 versions.

FilterRow() can now filter all input in one single pass
-> conversion is 15-20% faster (but still overall slow compared to -pre 0)

Change-Id: I14c3215e672fdecde7ec80394e814bdc7445019f
2016-11-15 14:51:34 +01:00
Pascal Massimino
0a3838ca77 fix bug in RefineUsingDistortion()
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)

We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.

Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
2016-11-12 02:15:28 -08:00
James Zern
83cbfa09a1 Import: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I6ad9f93b6c4b665c559bff87716a7b847f66a20d
(cherry picked from commit 342e15f0ce)
2016-11-09 15:50:57 -08:00
James Zern
a1ade40ed8 PreprocessARGB: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I2881bec2884b550c966108beeff1bf0d8ef9f76b
(cherry picked from commit 1147ab4ee7)
2016-11-09 15:24:16 -08:00
James Zern
fd4d090fd1 ConvertWRGBToYUV: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I693cbb295df9cf94aa89294b19c0496bdbe84d18
(cherry picked from commit de9fa5074e)
2016-11-09 12:57:03 -08:00
James Zern
9daad4598b ImportYUVAFromRGBA: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I3d7b689be8d5751248a82d1021243d80d3f67203
(cherry picked from commit deb1b83199)
2016-11-09 12:56:49 -08:00
James Zern
342e15f0ce Import: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I6ad9f93b6c4b665c559bff87716a7b847f66a20d
2016-11-07 17:08:13 -08:00
James Zern
1147ab4ee7 PreprocessARGB: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I2881bec2884b550c966108beeff1bf0d8ef9f76b
2016-11-07 17:08:06 -08:00
Pascal Massimino
e4cd4daf74 fix filtering auto-adjustment
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).

Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.

Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
2016-11-07 06:43:51 -08:00
Pascal Massimino
e715285611 fix doc and code snippet for WebPINewDecoder() doc
Change-Id: I1a75fdf60f0b9f1816be28f22613438bfe21752b
2016-11-04 12:07:54 +01:00
James Zern
de9fa5074e ConvertWRGBToYUV: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I693cbb295df9cf94aa89294b19c0496bdbe84d18
2016-11-04 00:35:04 -07:00
James Zern
deb1b83199 ImportYUVAFromRGBA: use relative pointer offsets
avoids int rollover when working with large input

BUG=webp:312

Change-Id: I3d7b689be8d5751248a82d1021243d80d3f67203
2016-11-04 00:34:58 -07:00
Pascal Massimino
31b1e34342 fix SSIM metric ... by ignoring too-dark area
Roughly, if both the source and the reference areas are
darker too dark (R/G/B <= ~6), they are ignored.

One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.

Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
2016-10-20 15:17:55 +02:00
Pascal Massimino
2f51b614b0 introduce WebPPlaneDistortion to compute plane distortion
Make WebPPictureDistortion() only compute distortion on A/R/G/B planes, not Y/U/V(A).
(not just for SSIM, but PSNR too).

This is to avoid problems with using SSIM on U/V channels.
If Y/U/V distortion is needed, one can always use WebPPlaneDistortion() individually.

Change-Id: If8bc9c3ac12a8d2220f03224694fc389b16b7da9
2016-10-19 09:12:13 +02:00
Pascal Massimino
4eb5df28d1 remove unused stride fields from VP8Iterator
Change-Id: I242aaa746dc53c456eb8f1a71a5a2378f26fa843
2016-10-10 18:08:47 +02:00
Vincent Rabaud
11bc423ae5 MIN_LENGTH cleanups.
No change in logic so no change in speed or compression.

Change-Id: I744161978c7d058c9b58450f330cba11731530c6
2016-10-10 15:37:45 +02:00
Pascal Massimino
273d035a44 Merge "fix a typo in WebPPictureYUVAToARGB's doc" 2016-10-10 13:30:20 +00:00
Pascal Massimino
dc789ada44 fix a typo in WebPPictureYUVAToARGB's doc
method -> colorspace

Change-Id: I5c9a2ccc909c967a936758dde2cfce92eb95462a
2016-10-10 04:50:10 -07:00
Vincent Rabaud
539f5a688f Fix non-included header in config.c.
When compiling as experimental, WEBP_EXPERIMENTAL_FEATURES
would not be defined because the header defining it would
not be included.
Hence runtime errors in debug mode when running:
./cwebp -lossles whatever
...
Error! Cannot encode picture as WebP
Error code: 4 (INVALID_CONFIGURATION: configuration is invalid)

(detail: WebPConfig would have a random value set for
delta_palettization as config.c does not consider
it to exist.)

Change-Id: I41761cffe81a971130ed514b195a73d1c6dac1b7
2016-10-10 13:39:17 +02:00
Pascal Massimino
aaf2a6a698 systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
Change-Id: I4a767134dcc52a7ee7c3bc5deb91012eaf7b6512
2016-10-07 17:30:58 -07:00
hui su
68ae5b671f Add libwebp/src/mux/animi.h
Change-Id: I80ca2070d419acf6e8355a295ee965d2df5a4d8f
2016-10-05 10:33:29 -07:00
Vincent Rabaud
28ce304344 Remove some errors when compiling the code as C++.
This fixes some cases from
https://bugs.chromium.org/p/webp/issues/detail?id=137

Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a
2016-10-05 09:39:08 +02:00
hui su
b34abcb8b1 Favor keeping the areas locally similar in spatial prediction mode selection
About 0.1% compression improvement.

Change-Id: If106ab209cc2671ef282b726e09ff2971c3e4abf
2016-10-04 16:28:24 -07:00
Pascal Massimino
ba843a92e7 fix some SSIM calculations
* prevent 64bit overflow by controlling the 32b->64b conversions
  and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasis the dark
  areas
* use a hat-like filter instead of box-filtering to avoid blockiness
  during averaging

SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C-version is quite slower because still un-optimized.

Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
2016-10-04 01:09:07 -07:00
Vincent Rabaud
f79450ca02 Speedup ApplyMap.
If a small hash map can be used, use it to avoid binary search.
This fist hash function that is tried works with the previous
use case of having indexed data in green.

Change-Id: I2f91cec5f3ca7e9c393fd829e69e09bab74f4e7c
2016-09-28 17:18:08 +02:00
Pascal Massimino
cfdda7c6bf Merge "prevent 32b overflow for very large canvas_width / height" 2016-09-28 15:09:58 +00:00
Vincent Rabaud
30d43706d3 Speed-up Combined entropy for palettized histograms.
Change-Id: Ie9bdebb26c726e5b44c2dbcc84d453f85a03f419
2016-09-28 13:22:13 +02:00
Pascal Massimino
86a84b3598 2x faster SSE2 implementation of SSIMGet
Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351
2016-09-23 23:23:06 -07:00
James Zern
b8384b53d6 lower WEBP_MAX_ALLOCABLE_MEMORY default
restrict to 2^34 for 64-bit targets, < 2^32 for 32-bit

Change-Id: Iff4ce40ae2c3c7fc119f018c2128dbe8f744341f
2016-09-22 23:13:33 -07:00
Pascal Massimino
1c36440094 prevent 32b overflow for very large canvas_width / height
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.

Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
2016-09-23 05:19:32 +00:00
Vincent Rabaud
5f1caf2987 Small LZ77 speedups.
The most common conditions are re-ordered and cached.

iter_min was recently introduced to make sure enough iterations
are made in cases where there are many matches (mostly uniform regions).
Now that those are properly analyzed, it becomes useless.

Change-Id: Id3010ee4ec66b84d602fcb926f91eb9155ad27f4
2016-09-22 14:03:25 +02:00
hui su
a2fe9bf404 Speedup TrellisQuantizeBlock().
-Skip examining quantized levels that are too high.
-Calculate last_pos_cost only when needed.

Encoding speed for m6 is increased by about 3%;
Compression performance is neutral.

Change-Id: I8af70b049587cca0375d9b3eb00479ec7c0c842a
2016-09-20 14:54:15 -07:00
Pascal Massimino
573cce270e smartYUV improvements
* switch to Rec709 transfer function in SmartYUV
* use Rec709 for Gray evaluation too.
* stop iterations if error is going up

See paragraph 1.2 and 3.2:
https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!!PDF-E.pdf

(digest: https://en.wikipedia.org/wiki/Rec._709#Transfer_characteristics)

suggested by pdknsk@gmail.com on the mailing list

Change-Id: I12b5f4d3e318dd5134984e1c0a4b244a620a57d7
2016-09-20 06:14:31 +00:00
Pascal Massimino
21e7537abe fix infinite loop in case of PARTITION0 overflow
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.

Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
2016-09-15 02:38:28 -07:00
Pascal Massimino
053a1565c8 Merge "Change the rule of picking UV mode in MBAnalyzeBestUVMode()" 2016-09-15 06:58:40 +00:00
hui su
1377ac2ec1 Change the rule of picking UV mode in MBAnalyzeBestUVMode()
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.

Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
2016-09-15 06:01:03 +00:00
Pascal Massimino
7c1fb7d0ff fix uint32_t initialization (0. -> 0)
Change-Id: Ia4aae27f70c4e74ddeb5654cfabb21d785cea9cf
2016-09-14 20:26:05 +02:00
Pascal Massimino
bfff0bf329 speed-up SSIM calculation
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixels instead of
printing a frame-level global SSIM value.

* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring

SSIM calculation is ~4x faster now.

Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
2016-09-14 16:15:43 +02:00
Vincent Rabaud
64577de8ae De-VP8L-ize GetEntropUnrefinedHelper.
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.

Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
2016-09-14 13:55:24 +02:00
Pascal Massimino
a7be73280b Merge "refactor the PSNR / SSIM calculation code" 2016-09-14 06:37:56 +00:00
Pascal Massimino
50c3d7da9a refactor the PSNR / SSIM calculation code
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.

Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
2016-09-14 06:13:24 +00:00
Pascal Massimino
d6228aed6a indentation fix after I7055d3ee3bd7ed5e78e94ae82cb858fa7db3ddc0
Change-Id: I2145815d778321b9ccc7ac2775aaf64cf2372a42
2016-09-13 22:49:42 -07:00
Vincent Rabaud
dd538b192d Remove unused declaration.
Change-Id: I8ab19654df63e7ef8aad00e97d1428c7b53ee33f
2016-09-13 16:25:46 +02:00
Vincent Rabaud
6cc48b1728 Move some lossless logic out of dsp.
Change-Id: I4cfd60cd5497666a2e1c188ceada2e71b05f1505
2016-09-13 15:37:32 +02:00
Pascal Massimino
78363e9e51 Merge "Remove a redundant call to InitLeft() in VP8IteratorReset()" 2016-09-13 04:47:51 +00:00
hui su
ffd01929f1 Refactor VP8IteratorNext().
Change-Id: I7055d3ee3bd7ed5e78e94ae82cb858fa7db3ddc0
2016-09-12 15:08:24 -07:00
hui su
c4f6d9c939 Remove a redundant call to InitLeft() in VP8IteratorReset()
VP8IteratorSetRow(it, 0) already did it.

Change-Id: I410e48b5205897a6a23301d8a98aa266787676c6
2016-09-12 14:41:15 -07:00
Pascal Massimino
c27d821096 Merge "smartYUV: simplify main loop" 2016-09-12 17:14:22 +00:00
Pascal Massimino
0779529616 smartYUV: simplify main loop
we don't need to centralize best_uv[] since target_uv[] and best_rgb_uv[]
are already centralized. The diff 'W' was just in the ~[-2,2] range, so
we can ignore the correction.

Overall speed-impact is not large, though. Around ~4% faster conversion.

Output with -pre 4 is expected to be slightly different

Change-Id: Ib59f033955577c49b084d0560108020f42d84102
also: remove the useless clipping in StoreGray()
2016-09-12 18:51:40 +02:00
Vincent Rabaud
c9b45863e2 Split off common lossless dsp inline functions.
Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738
2016-09-12 17:35:08 +02:00
Pascal Massimino
490ae5b13d smartYUV: improve initial state for faster convergence
For speed reason, the 'gray' plane was initialized with the same
value for 2x2 block. But in some cases (underlying camera noise, e.g.),
it could lead to instability during iteration, noise amplification,
and visible banding.
Using a precise (but slower) initialization solves the issue, and
since the convergence is faster, we might actually gain some speed.

Change-Id: I81c42101497e7096a8f60289d710f5a3bcb0ddea
2016-09-09 18:29:32 +02:00
Pascal Massimino
894232be56 smartYUV: fix and simplify the over-zealous stop criterion
We usually need at least 2 iterations to converge
(and usually not much more after that). Only 1 was not enough.

Change-Id: Iaf802ea81afa2596f4ba045c92f5eaff61623b7b
2016-09-09 10:10:13 +02:00
Hui Su
8de08483b6 Remove unused code in webpi.h
Change-Id: If927530034cefa0da62ddfde588833aafec51d4d
2016-09-08 22:46:32 -07:00
Pascal Massimino
545c147fd8 remove mention of fragment, frgm, FRGM, etc.
demux.h still has a 'fragment' field, for historical reasons.

bumped the ABI number in mux.h

Change-Id: I73299aa2e96b1f2e44f5cebfd84c597d1db12bff
2016-09-02 14:43:12 +02:00
hui su
9e478f808e dec/vp8l.c: add assertions in EmitRescaledRowsRGBA/YUVA
Add assertions to make sure the number of lines that are imported
into rescaler is as expected.

Change-Id: I3e9e4202e696bb09f7aa54519285b6b0deb84c1a
2016-09-01 20:58:37 -07:00