Commit Graph

2369 Commits

Author SHA1 Message Date
Vikas Arora
c8581b06e1 Optimize BackwardReferences for RLE encoding.
Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.

For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.

Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
2014-11-04 08:12:07 -08:00
Djordje Pesut
5798eee6be MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
Vikas Arora
4167a3f5f7 Optimize backwardreferences
Optimize backwardreferences (about 0.1% byte savings) with almost same
compression speed (3% faster on defaut compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
    - Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
    - Removed 'if' conditions for corner cases in the main loop.
    - Refactored the method(AddSingleLiteral) for adding one pixel.

Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
2014-10-31 18:08:38 -07:00
James Zern
d18554c30d Merge "webp/types.h: use inline for clang++/-std=c++11" 2014-10-31 03:53:06 -07:00
Urvang Joshi
7489b0e72b gif2webp: Add '-min-size' option to get best compression.
This disables key-frame insertion and tries both dispose methods for
each frame.

Change-Id: Ib167747198fd1d98cb93321f76826e91228f24d8
2014-10-30 15:17:09 -07:00
Vikas Arora
77bdddf016 Speed up BackwardReferences
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.

Overall compression speedup observed is around 5% (+/- noise).

Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
2014-10-30 10:58:24 -07:00
James Zern
6638710b9e webp/types.h: use inline for clang++/-std=c++11
at least clang 3.[45] in c++ mode with -std=c++11 define __STRICT_ANSI__
this change set WEBP_INLINE to inline for c++/non-strict-ansi/> c99

fixes crbug.com/428383

Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
2014-10-30 15:25:27 +01:00
Vikas Arora
abf04205b3 Enable entropy based merge histo for (q<100)
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).

This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.

Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
2014-10-30 03:59:36 -07:00
James Zern
572022a350 filters_mips_dsp_r2.c: disable unfilters
the output does not match the C-code.

Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
Djordje Pesut
a28e21b141 MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8 MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0 MIPS: dspr2: added optimization for ITransform
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
Urvang Joshi
c94ed49efd gif2webp: Use the default hint instead of WEBP_HINT_GRAPH.
This is much faster and the compression is slightly better too.

Change-Id: Ibf0d10eea83bfabfcc44ee497074767462ff41b1
2014-10-27 16:41:39 -07:00
Vikas Arora
653ace55c3 Increase the MAX_COLOR_CACHE_BITS from 9 to 10.
The Maximum allowed limit is 11.
The Q=25 and below is not impacted as cache bits are forced to 0.
This saves 0.05% - 0.1% bytes for other quality with almost same compression
speed (+/- 2-3%, that's more of a noise).

Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
2014-10-27 14:19:04 -07:00
Vikas Arora
919220c7e6 Change the logic adjusting the Histogram bits.
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).

The compression density remains the same with this change, with little faster
compression speed.

Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
2014-10-27 09:57:17 -07:00
pascal massimino
53b096c0d7 Merge "Fix bug in VP8LCalculateEstimateForCacheSize." 2014-10-27 02:31:10 -07:00
Vikas Arora
e912bd55be Fix bug in VP8LCalculateEstimateForCacheSize.
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.

This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.

Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
2014-10-26 20:05:48 -07:00
pascal massimino
541d783983 Merge "dec_neon: add RD4 intra predictor" 2014-10-26 13:55:03 -07:00
pascal massimino
f8cd0672bb Merge "Makefile.vc: add a 'legacy' RTLIBCFG option" 2014-10-26 13:54:36 -07:00
James Zern
22881c999e dec_neon: add RD4 intra predictor
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.

Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
613d281e87 update NEWS
Change-Id: Ib9b48e3611b77214bc875524249366ff62451c1b
(cherry picked from commit 0c1b98d28c)
2014-10-23 17:20:12 +02:00
James Zern
1304eb3418 Merge "dec_neon: DC4: use pair-wise adds for top row" 2014-10-23 08:08:34 -07:00
James Zern
34c20c06c8 Makefile.vc: add a 'legacy' RTLIBCFG option
disables buffer security checks (/GS-) and any machine optimizations
(e.g., sse2)

fixes issue #228

Change-Id: I81fa483dc1654199b2017626320383d2d63317dc
2014-10-23 07:57:34 -07:00
pascal massimino
7083006b61 Merge "dsp/dec_{neon,sse2}: VE4: normalize variable names" 2014-10-23 07:29:27 -07:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b dec_neon: DC4: use pair-wise adds for top row
reduces load count, slightly faster

Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
Pascal Massimino
5b90d8fe42 Unify the API between VP8BitWriter and VP8LBitWriter
BitReader will be next...

Change-Id: Icd9e7ab2e3890131e664c0523627d9b8c5399a74
2014-10-23 15:35:16 +02:00
pascal massimino
f7ada560ce Merge changes I2e06907b,Ia9ed4ca6,I782282ff
* changes:
  dec_neon: add DC4 intra predictor
  dec_neon: add TM4 intra predictor
  dec_neon: add LD4 intra predictor
2014-10-23 06:31:54 -07:00
pascal massimino
5beb6bf070 Merge "dec_neon: add VE4 intra predictor" 2014-10-23 05:38:41 -07:00
James Zern
eba6ce06c3 dec_neon: add DC4 intra predictor
~70% faster

Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
James Zern
79abfbd9df dec_neon: add TM4 intra predictor
~21% faster

Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d dec_neon: add LD4 intra predictor
based on SSE2 version, ~55% faster

Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca dec_neon: add VE4 intra predictor
based on SSE2 version, ~59% faster

Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
James Zern
72395ba977 Merge "Modify CostModel to allocate optimal memory." 2014-10-23 02:00:39 -07:00
Urvang Joshi
65e5eb8a62 gif2webp: Support GIF_DISPOSE_RESTORE_PREVIOUS
Tweaked the gif2webp_util API to support this.

Requested in: https://code.google.com/p/webp/issues/detail?id=144

Change-Id: I0e8c4edc39227355cd8d3acc55795186e25d0c3a
2014-10-22 17:02:07 -07:00
Urvang Joshi
e4c829efe9 gif2webp: Handle frames with odd offsets + disposal to background.
Snapping odd offsets in GIF to even offsets in WebP was causing extra row/column
being disposed in such cases.

Code is rewritten to maintain previous and current canvas (it used to maintain
previous canvas and current frame earlier). And we recompute change rectangles
as those from GIF may no longer apply.
Also, this renders methods like ReduceTransparency() and ConvertToKeyFrame()
redundant, as internally maintained current canvas is always independent of
previous canvases.

Disposal method choice: we pick the disposal method that results in the smallest
change rectangle.

Change-Id: Ic31186d98fe1a2a790a89d1571b17e3abd127e79
2014-10-22 15:43:26 -07:00
Vikas Arora
c2b5a0396a Modify CostModel to allocate optimal memory.
Change-Id: I7d52675d28bfc109d4e901581fc24cd36fcb79ee
2014-10-22 13:30:33 -07:00
Pascal Massimino
b7a33d7e91 implement VE4/HE4/RD4/... in SSE2
(30% faster prediction functions, but overall speed-up is ~1% only)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
Pascal Massimino
97c76f1f30 make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
also convert 'type *dst' to 'type* dst'

Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
pascal massimino
0ea8c6c219 Merge "PrintReg: output to stderr" 2014-10-22 08:55:10 -07:00
pascal massimino
d7ff2f976c Merge "stopwatch.h: fix includes" 2014-10-22 08:28:55 -07:00
James Zern
f85ec712b0 PrintReg: output to stderr
allows use of '-o -' while testing

Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191
2014-10-22 17:28:19 +02:00
James Zern
54edbf65ff stopwatch.h: fix includes
WEBP_INLINE -> webp/types.h
memcpy -> string.h

Change-Id: Iab2ea8b553dc98be75eede751de62ab0292d1f97
2014-10-22 17:25:41 +02:00
Vikas Arora
139142e440 Optimize BackwardReferenceHashChainFollowPath.
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.

The encoding is tad faster at default compression

                       Before              After
                     bpp/rate            bpp/rate
442 Palette     0.2720/5.270 MP/s      0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s      3.7607/0.816 MP/s

Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
2014-10-21 10:04:27 -07:00
James Zern
5f36b68d22 enc/backward_references.c: fix indent
reindent after c24f895

Change-Id: I55adcbef21ea3fdaded84b138745515596191a09
2014-10-20 11:35:20 +02:00
James Zern
e0e9960dd1 Merge "sync version numbers to 0.4.2 release" 2014-10-17 11:47:30 -07:00
James Zern
64ac51446d sync version numbers to 0.4.2 release
libwebp{,decoder} - 0.4.2
libwebp libtool - 5.2.0
libwebpdecoder libtool - 1.2.0

mux/demux - 0.2.2
libtool - 1.2.0

(cherry picked from commit eec5f5f121)
(cherry picked from commit 857578a811)

Change-Id: Ie9d10c68e28083674a8865ad8447b1a70dcea95d
2014-10-17 19:50:21 +02:00
Vikas Arora
c24f8954be Simplify and speedup Backward refs computation.
Updated VP8LGetBackwardReferences and HashChainFindCopy method with following:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
  again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
  The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the
  best_val eval computation to choose a different curr_length, as best_val was
  set to 'curr_length << 16'.

  Following is the impact on the compression speed/density at default & max
  quality, overall this speeds up compression by 5-15% (q=100 -> 75) with a tad
  drop (0.02-0.03%) in compression density for the non-palette images.

                  Before                After
                bpp/Rate(MP/s)        bpp/Rate(MP/s)
q=75 (def)
All 1000        2.4492/1.049 MP/s     2.4498/1.230 MP/s
Palette         0.2719/5.060 MP/s     0.2719/6.110 MP/s
non-Palette     3.7597/0.732 MP/s     3.7607/0.840 MP/s

q=100
All 1000        2.4134/0.125 MP/s     2.4142/0.131 MP/s
Palette         0.2692/2.585 MP/s     0.2692/2.885 MP/s
non-Palette     3.7040/0.079 MP/s     3.7053/0.083 MP/s

Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
2014-10-17 09:21:30 -07:00
James Zern
d1c359ef29 fix shared object build with -fvisibility=hidden
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples

Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00