1595 Commits

Author SHA1 Message Date
Vikas Arora
c8581b06e1 Optimize BackwardReferences for RLE encoding.
Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.

For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.

Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
2014-11-04 08:12:07 -08:00
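Here is a sketch of the run-length idea behind BackwardReferencesRle in c8581b06e1. It assumes a simplified token stream: whenever pixels repeat their left neighbour, one distance-1 copy replaces a run of literals. `Token`, `RleTokens()` and the minimum run length of 2 are hypothetical. The local color cache mentioned in the commit is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimum length for emitting a copy instead of literals. */
enum { kMinRunLength = 2 };

/* Hypothetical token: either one literal ARGB pixel or a distance-1 copy. */
typedef struct {
  int is_copy;    /* 1 = distance-1 backward copy, 0 = literal */
  uint32_t argb;  /* literal value (0 for copies) */
  size_t length;  /* number of pixels covered by this token */
} Token;

/* Writes RLE tokens for argb[0..n) into 'out' (room for n tokens).
   Returns the number of tokens produced. */
static size_t RleTokens(const uint32_t* argb, size_t n, Token* out) {
  size_t i = 0, num_tokens = 0;
  assert(n == 0 || (argb != NULL && out != NULL));
  while (i < n) {
    size_t run = 0;
    /* Count how many pixels starting at i repeat the pixel at i - 1. */
    if (i > 0) {
      while (i + run < n && argb[i + run] == argb[i - 1]) ++run;
    }
    if (run >= kMinRunLength) {
      out[num_tokens].is_copy = 1;
      out[num_tokens].argb = 0;
      out[num_tokens].length = run;
      i += run;
    } else {
      out[num_tokens].is_copy = 0;
      out[num_tokens].argb = argb[i];
      out[num_tokens].length = 1;
      ++i;
    }
    ++num_tokens;
  }
  return num_tokens;
}
```

For example, the pixels {5, 5, 5, 5, 7} become three tokens: a literal 5, a distance-1 copy of length 3, and a literal 7.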
Djordje Pesut
5798eee6be MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
Vikas Arora
4167a3f5f7 Optimize backwardreferences
Optimize backwardreferences (about 0.1% byte savings) with almost the same
compression speed (3% faster on default compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
    - Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
    - Removed 'if' conditions for corner cases in the main loop.
    - Refactored the method(AddSingleLiteral) for adding one pixel.

Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
2014-10-31 18:08:38 -07:00
James Zern
d18554c30d Merge "webp/types.h: use inline for clang++/-std=c++11" 2014-10-31 03:53:06 -07:00
Vikas Arora
77bdddf016 Speed up BackwardReferences
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handle the first pixel (i==0) outside the main loop, removing one 'if'
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.

Overall compression speedup observed is around 5% (+/- noise).

Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
2014-10-30 10:58:24 -07:00
James Zern
6638710b9e webp/types.h: use inline for clang++/-std=c++11
at least clang 3.[45] in c++ mode with -std=c++11 defines __STRICT_ANSI__;
this change sets WEBP_INLINE to inline for c++/non-strict-ansi/>= c99

fixes crbug.com/428383

Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
2014-10-30 15:25:27 +01:00
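Below is a sketch of the condition described in 6638710b9e. It is written from the commit message, not copied from webp/types.h, and the MSVC branch, for instance, is omitted. `Clip8()` is a hypothetical helper that only shows the macro in use.

```c
#include <assert.h>

/* Use 'inline' for C++, for non-strict-ANSI modes (e.g. gnu89), and for
   C99 or later; expand to nothing for strict C89. The outer guard keeps
   any pre-existing definition intact. */
#ifndef WEBP_INLINE
#if defined(__cplusplus) || !defined(__STRICT_ANSI__) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#define WEBP_INLINE inline
#else
#define WEBP_INLINE
#endif
#endif

/* Hypothetical helper: clamps an int to the 0..255 range. */
static WEBP_INLINE int Clip8(int v) {
  return (v < 0) ? 0 : (v > 255) ? 255 : v;
}
```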
Vikas Arora
abf04205b3 Enable entropy based merge histo for (q<100)
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).

This speeds up compression between Q=90-99 (method=4) by a factor of 5-7x,
with a loss of 0.5-0.8% in compression density.

Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
2014-10-30 03:59:36 -07:00
James Zern
572022a350 filters_mips_dsp_r2.c: disable unfilters
the output does not match the C-code.

Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
Djordje Pesut
a28e21b141 MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8 MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0 MIPS: dspr2: added optimization for ITransform
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
Vikas Arora
653ace55c3 Increase the MAX_COLOR_CACHE_BITS from 9 to 10.
The maximum allowed limit is 11.
Q=25 and below are not impacted, as cache bits are forced to 0 there.
This saves 0.05%-0.1% bytes at other qualities with almost the same compression
speed (+/- 2-3%, which is within noise).

Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
2014-10-27 14:19:04 -07:00
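The WebP lossless format indexes the color cache with the multiplicative hash (0x1e35a7bd * argb) >> (32 - cache_bits). Raising MAX_COLOR_CACHE_BITS from 9 to 10 therefore doubles the largest cache from 512 to 1024 entries. The format's limit of 11 bits would allow 2048. The minimal sketch below uses hypothetical names (`ColorCache` and its functions); they are not libwebp's internal API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical color cache: 1 << hash_bits slots of ARGB values. */
typedef struct {
  uint32_t* colors;
  int hash_bits;
} ColorCache;

/* Allocates a zeroed cache; returns 0 on allocation failure. */
static int ColorCacheInit(ColorCache* cc, int hash_bits) {
  assert(hash_bits >= 1 && hash_bits <= 11);  /* 11 = format maximum */
  cc->hash_bits = hash_bits;
  cc->colors = (uint32_t*)calloc((size_t)1 << hash_bits, sizeof(*cc->colors));
  return cc->colors != NULL;
}

/* Multiplicative hash; the result is always < 1 << hash_bits. */
static uint32_t ColorCacheIndex(const ColorCache* cc, uint32_t argb) {
  return (uint32_t)(argb * 0x1e35a7bdu) >> (32 - cc->hash_bits);
}

static void ColorCacheInsert(ColorCache* cc, uint32_t argb) {
  cc->colors[ColorCacheIndex(cc, argb)] = argb;
}

/* Returns 1 if 'argb' is currently stored in its slot. */
static int ColorCacheContains(const ColorCache* cc, uint32_t argb) {
  return cc->colors[ColorCacheIndex(cc, argb)] == argb;
}

static void ColorCacheClear(ColorCache* cc) {
  free(cc->colors);
  cc->colors = NULL;
}
```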
Vikas Arora
919220c7e6 Change the logic adjusting the Histogram bits.
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).

The compression density remains the same with this change, with slightly faster
compression speed.

Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
2014-10-27 09:57:17 -07:00
pascal massimino
53b096c0d7 Merge "Fix bug in VP8LCalculateEstimateForCacheSize." 2014-10-27 02:31:10 -07:00
Vikas Arora
e912bd55be Fix bug in VP8LCalculateEstimateForCacheSize.
The method VP8LCalculateEstimateForCacheSize was not evaluating the full possible
range for cache_bits.
Also added a small penalty for choosing a larger cache size. This is done to
strike a balance between the additional memory/CPU cost (of a larger cache size)
and the byte savings from smaller WebP lossless files.

This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's a small speedup at Q=50, along with byte savings as well.
Compression at Quality=25 is not affected by this change.

Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
2014-10-26 20:05:48 -07:00
James Zern
22881c999e dec_neon: add RD4 intra predictor
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.

Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418 Merge "dec_neon: DC4: use pair-wise adds for top row" 2014-10-23 08:08:34 -07:00
pascal massimino
7083006b61 Merge "dsp/dec_{neon,sse2}: VE4: normalize variable names" 2014-10-23 07:29:27 -07:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b dec_neon: DC4: use pair-wise adds for top row
reduces load count, slightly faster

Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
Pascal Massimino
5b90d8fe42 Unify the API between VP8BitWriter and VP8LBitWriter
BitReader will be next...

Change-Id: Icd9e7ab2e3890131e664c0523627d9b8c5399a74
2014-10-23 15:35:16 +02:00
pascal massimino
f7ada560ce Merge changes I2e06907b,Ia9ed4ca6,I782282ff
* changes:
  dec_neon: add DC4 intra predictor
  dec_neon: add TM4 intra predictor
  dec_neon: add LD4 intra predictor
2014-10-23 06:31:54 -07:00
pascal massimino
5beb6bf070 Merge "dec_neon: add VE4 intra predictor" 2014-10-23 05:38:41 -07:00
James Zern
eba6ce06c3 dec_neon: add DC4 intra predictor
~70% faster

Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
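For reference, the DC4 mode accelerated in eba6ce06c3 takes one rounded average of the four pixels above and the four pixels to the left. It then fills the 4x4 block with that value. This plain-C sketch takes an explicit `stride` parameter for illustration; the name and signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* DC prediction for the 4x4 block at 'dst'. Reads the row above
   (dst - stride) and the column to the left (dst - 1). */
static void DC4(uint8_t* dst, int stride) {
  uint32_t dc = 4;  /* rounding term for the final >> 3 */
  int i;
  assert(stride >= 5);  /* 4 block pixels + 1 left-column pixel per row */
  for (i = 0; i < 4; ++i) {
    dc += dst[i - stride];       /* top row */
    dc += dst[-1 + i * stride];  /* left column */
  }
  dc >>= 3;  /* average of the 8 neighbours */
  for (i = 0; i < 4; ++i) memset(dst + i * stride, (int)dc, 4);
}
```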
James Zern
79abfbd9df dec_neon: add TM4 intra predictor
~21% faster

Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d dec_neon: add LD4 intra predictor
based on SSE2 version, ~55% faster

Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca dec_neon: add VE4 intra predictor
based on SSE2 version, ~59% faster

Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
Vikas Arora
c2b5a0396a Modify CostModel to allocate optimal memory.
Change-Id: I7d52675d28bfc109d4e901581fc24cd36fcb79ee
2014-10-22 13:30:33 -07:00
Pascal Massimino
b7a33d7e91 implement VE4/HE4/RD4/... in SSE2
(30% faster prediction functions, but overall speed-up is ~1% only)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
Pascal Massimino
97c76f1f30 make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
also convert 'type *dst' to 'type* dst'

Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
pascal massimino
0ea8c6c219 Merge "PrintReg: output to stderr" 2014-10-22 08:55:10 -07:00
James Zern
f85ec712b0 PrintReg: output to stderr
allows use of '-o -' while testing

Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191
2014-10-22 17:28:19 +02:00
Vikas Arora
139142e440 Optimize BackwardReferenceHashChainFollowPath.
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.

Encoding is a tad faster at default compression settings:

                       Before              After
                     bpp/rate            bpp/rate
442 Palette     0.2720/5.270 MP/s      0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s      3.7607/0.816 MP/s

Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
2014-10-21 10:04:27 -07:00
James Zern
5f36b68d22 enc/backward_references.c: fix indent
reindent after c24f895

Change-Id: I55adcbef21ea3fdaded84b138745515596191a09
2014-10-20 11:35:20 +02:00
James Zern
e0e9960dd1 Merge "sync version numbers to 0.4.2 release" 2014-10-17 11:47:30 -07:00
James Zern
64ac51446d sync version numbers to 0.4.2 release
libwebp{,decoder} - 0.4.2
libwebp libtool - 5.2.0
libwebpdecoder libtool - 1.2.0

mux/demux - 0.2.2
libtool - 1.2.0

(cherry picked from commit eec5f5f121)
(cherry picked from commit 857578a811)

Change-Id: Ie9d10c68e28083674a8865ad8447b1a70dcea95d
2014-10-17 19:50:21 +02:00
Vikas Arora
c24f8954be Simplify and speedup Backward refs computation.
Updated the VP8LGetBackwardReferences and HashChainFindCopy methods as follows:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
  again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
  The additional bonus of at most 162 (+= 9 * 9 + 9 * 9 - y * y - x * x) can't
  make the best_val evaluation choose a different curr_length, as best_val was
  set to 'curr_length << 16'.

  Following is the impact on the compression speed/density at default & max
  quality. Overall, this speeds up compression by 5-15% (q=100 -> 75) with a
  slight drop (0.02-0.03%) in compression density for the non-palette images.

                  Before                After
                bpp/Rate(MP/s)        bpp/Rate(MP/s)
q=75 (def)
All 1000        2.4492/1.049 MP/s     2.4498/1.230 MP/s
Palette         0.2719/5.060 MP/s     0.2719/6.110 MP/s
non-Palette     3.7597/0.732 MP/s     3.7607/0.840 MP/s

q=100
All 1000        2.4134/0.125 MP/s     2.4142/0.131 MP/s
Palette         0.2692/2.585 MP/s     0.2692/2.885 MP/s
non-Palette     3.7040/0.079 MP/s     3.7053/0.083 MP/s

Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
2014-10-17 09:21:30 -07:00
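Two points in c24f8954be can be made concrete. First, a match-length helper in the spirit of FindMatchLength just counts equal leading pixels. Second, the removed 2D-distance bonus is bounded by 9 * 9 + 9 * 9 = 162, which is far below 1 << 16. Adding it to 'curr_length << 16' can therefore never outrank a longer match. The signature and `Score()` below are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many leading pixels of 'a' and 'b' are equal, up to 'max_len'. */
static int FindMatchLength(const uint32_t* a, const uint32_t* b, int max_len) {
  int len = 0;
  assert(max_len >= 0);
  while (len < max_len && a[len] == b[len]) ++len;
  return len;
}

/* Hypothetical reconstruction of the old scoring: length in the high bits,
   plus a bonus in [0, 162] for short 2D distances (|x|, |y| <= 9). */
static int Score(int len, int x, int y) {
  assert(x >= -9 && x <= 9 && y >= -9 && y <= 9);
  return (len << 16) + 9 * 9 + 9 * 9 - y * y - x * x;
}
```

Because the bonus never exceeds 162, comparing two scores always orders them by length first.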
James Zern
d1c359ef29 fix shared object build with -fvisibility=hidden
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples

Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
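One plausible shape for the WEBP_TSAN_IGNORE_FUNCTION macro is sketched below. It is reconstructed from the two commit messages above, not copied from libwebp. Under clang's ThreadSanitizer it expands to the no_sanitize_thread attribute; everywhere else it expands to nothing. It sits at the front of the definition, as a4c3a31b8f requires. `InitTables()` and its table are hypothetical.

```c
#include <assert.h>

/* Expand to the TSan opt-out attribute only when the compiler reports that
   ThreadSanitizer is enabled; expand to nothing everywhere else. The nested
   #if keeps compilers without __has_feature from parsing the call. */
#if defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define WEBP_TSAN_IGNORE_FUNCTION __attribute__((no_sanitize_thread))
#endif
#endif
#ifndef WEBP_TSAN_IGNORE_FUNCTION
#define WEBP_TSAN_IGNORE_FUNCTION
#endif

static int tables_ready = 0;
static int square_table[16];

void InitTables(void);

/* Hypothetical one-time init: concurrent callers only store identical
   values, which is the kind of race such an attribute is meant to silence. */
WEBP_TSAN_IGNORE_FUNCTION void InitTables(void) {
  int i;
  if (tables_ready) return;
  for (i = 0; i < 16; ++i) square_table[i] = i * i;
  tables_ready = 1;
}
```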
James Zern
79b5bdbfde bit_reader.h: cosmetics: fix a typo
Change-Id: I1ba09124700b3120f18eb3705eb5ba805feb2ca0
2014-10-16 10:52:47 +02:00
Pascal Massimino
6c6736816c Improved near-lossless mode.
Compared to the previous mode, it gives another 10-30% improvement in compression while keeping comparable PSNR at corresponding quality settings.

Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.

Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
2014-10-15 10:57:21 -07:00
James Zern
0ce27e715e enc_mips32: workaround gcc-4.9 bug
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9

In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault

Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
2014-10-15 19:14:04 +02:00
James Zern
aca1b98f52 enc/vp8l.c: fix indent
reindent after ca00502

Change-Id: I8c88dbc11dc96c117531b17682b764a235ef23bb
2014-10-13 11:33:23 +02:00
Vikas Arora
ca00502788 Evaluate non-palette compression for palette image
For palette images (num_colors <= 256), evaluate whether the non-palette
compression path (subtract green, predictor transform, etc.) yields better
compression density.

This change reduces the WebP file size (for palette images) by 0.4%, with a
drop of 3-5% in compression speed.

Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
2014-10-10 11:55:45 -07:00
James Zern
c8a87bb62d AssignSegments: quiet -Warray-bounds warning
the number of segments is validated earlier, but an explicit check
is needed to avoid a warning under gcc-4.9

Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
2014-10-10 17:18:39 +02:00
pascal massimino
32f67e309f Merge "enc_neon: initialize vectors w/vdup_n_u32" 2014-10-09 12:23:18 -07:00
Pascal Massimino
fabc65da32 1-3% faster encoding optimizing SSE_NxN functions
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.

quality | size(bytes) after/before | time (ms) after/before

Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
2014-10-09 07:20:00 -07:00
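In scalar form, the sum of squared errors that fabc65da32 returns to is simply the sum of (a-b)^2 over the block. The sign of the difference does not matter, so no absolute value is needed. The generic `SSE_NxN()` below, with width, height and a shared stride, is a hypothetical simplification of the fixed-size SSE functions.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over a w x h block; both blocks share 'stride'. */
static int SSE_NxN(const uint8_t* a, const uint8_t* b,
                   int w, int h, int stride) {
  int x, y, sum = 0;
  assert(w <= stride);
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int diff = (int)a[x + y * stride] - b[x + y * stride];
      sum += diff * diff;  /* (a - b)^2 */
    }
  }
  return sum;
}
```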
James Zern
7534d71640 enc_neon: initialize vectors w/vdup_n_u32
replaces {} initialization gnu-ism

Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec
2014-10-09 12:35:41 +02:00
Pascal Massimino
5f81391263 Merge "Fix return code of EncodeImageInternal()" 2014-10-07 23:49:29 -07:00
Pascal Massimino
e321abe43d Fix return code of EncodeImageInternal()
It was returning 'VP8_ENC_OK' in case of memory error.

Change-Id: I184a3e29c9f1b863637cacbe389b058d75c3dbf8
2014-10-08 08:48:53 +02:00
Pascal Massimino
f82cb06afb optimize palette ordering
We compact the palette by weighted distance, favoring the green channel.

Average gain on paletted files is ~0.5%, with gains of up to 6-7% in some favorable cases.
Encoding speed is unaffected.

Disabled for alpha (or any single-channel input)

Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.

Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
2014-10-08 08:42:36 +02:00
Pascal Massimino
f545feee64 don't set the alpha value for histogram index image
This leads to tiny extra compression (a few bytes per file) for free.

Change-Id: Ia4d8cef3de4365e32eacefd69a57689c80042a23
2014-10-08 08:24:19 +02:00
Pascal Massimino
2d9b0a4472 add WebPDispatchAlphaToGreen() to dsp
SSE2 version is 2.1x faster

This is used to transfer the alpha plane to green channel before lossless compression.

Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
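This plain-C sketch shows what transferring the alpha plane to the green channel in 2d9b0a4472 amounts to. Each 8-bit alpha sample becomes bits 8..15 of an ARGB word, and the other channels are left at zero. The strided parameter list is an assumption for illustration, not the documented dsp signature.

```c
#include <assert.h>
#include <stdint.h>

/* Copies an alpha plane into the green channel of 'dst' (ARGB words):
   each output word is 0x0000GG00 with GG taken from the alpha byte. */
static void DispatchAlphaToGreen(const uint8_t* alpha, int alpha_stride,
                                 int width, int height,
                                 uint32_t* dst, int dst_stride) {
  int x, y;
  assert(width <= alpha_stride && width <= dst_stride);
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      dst[x] = (uint32_t)alpha[x] << 8;
    }
    alpha += alpha_stride;
    dst += dst_stride;
  }
}
```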
Vikas Arora
d5e498d47f Change Entropy based Histogram Combine heuristic.
Don't combine the Histograms that have trivial (single valued A, R & B)
  symbols.
Following is the compression savings data along with compression time (before
& after) per image.
                     Before             After
                     bpp, rate(MP/s)    bpp, rate(MP/s)
Q=25, method = 4     2.508, 1.807       2.499, 1.916
Q=50, method = 4     2.460, 1.488       2.456, 1.512
Q=75, method = 4     2.452, 1.078       2.450, 1.092
Q=25, method = 5     2.505, 1.398       2.496, 1.383
Q=50, method = 5     2.458, 1.170       2.453, 1.143
Q=75, method = 5     2.453, 0.886       2.450, 0.855

This change provides 0.1-0.4% compression gains and speeds up the lossless
compression for the default method=4 (the drop in compression speed is between 1-3.5% for method=5).

Change-Id: Idfd88c2092f37afacd26a97097b3053f8183953a
2014-09-30 13:41:39 -07:00
Pascal Massimino
47a2d8e1d9 fix MSVC float->int conversion warning
+ add a clarifying comment

Change-Id: I8ac1df1de2e5277f2d968dec489546e680bb5e0c
2014-09-27 00:36:01 -07:00
James Zern
35ad48b848 HistoHeapInit: correct positions allocation size
Change-Id: I1879fd48bee3aea6f0504926d7030b504dd9be07
2014-09-26 11:21:19 -07:00
Pascal Massimino
45d9635fd3 lossless: entropy clustering for high qualities.
Tested on 1000 pngs corpus with quality 90-100 it gives ~0.15% improvement
in compression density and ~7% speed up.

Change-Id: I460f56c96707edb3c1f0b51a024e5122e10458df
2014-09-26 15:26:56 +02:00
Pascal Massimino
dc37df8c7a fix type warning for VS9_x64
Error report was:
src\utils\color_cache.c(48) : warning C4334: '<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)

Change-Id: I93463ba7cd94faf1cf04986acbfaa06b62700d26
2014-09-25 23:47:06 -07:00
Vikas Arora
fdd6528ba2 Remove unused VP8LDecoder member variable
Remove the unused VP8LDecoder member variable (last_cached_)

Change-Id: I4a7d2f1b72d166efb978850e061dc69c8509e224
2014-09-24 11:59:51 -07:00
James Zern
ea3bba5a66 Merge "rewrite Disto4x4 in enc_neon.c with intrinsic" 2014-09-24 10:51:47 -07:00
Pascal Massimino
f060dfc422 add lossless incremental decoding support
* We don't need to change DecodeAlpha, since incremental
decoding is not useful for Alpha (we already decode
progressively along the RGB)
* Similarly, we don't do incremental decoding for level>0 planes:
   their metadata doesn't turn into visible pixels (only level0 does), so...
(No visible speed change)

Change-Id: I2fd4b9ba227561a7dbede647686584b752be7baa
2014-09-24 09:55:01 +02:00
Yang Zhang
ab70794ddb rewrite Disto4x4 in enc_neon.c with intrinsic
Performance test:
Platform: A9
Input data: bryce.yuv  11158x2156
Assembly performance is the baseline (100%); a lower ratio is better.
|toolchain |assembly |intrinsic |
|gcc4.6    |100%     |97.15%    |
|gcc4.8    |100%     |95.51%    |

Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879
2014-09-23 18:28:36 -07:00
Djordje Pesut
d4471637ef MIPS: dspr2: added optimization for function FilterLoop24
affected functions: VFilter16i, HFilter16i, VFilter8i and HFilter8i

Change-Id: I5d2bc7716e60e048a33d630fe4a86011bfb6d42e
2014-09-23 10:32:55 +02:00
skal
2aef54d429 Merge "prepare VP8LDecodeImage for incremental decode" 2014-09-23 00:31:27 -07:00
pascal massimino
aed0f5a231 Merge "MIPS: dspr2: added optimization for function FilterLoop26" 2014-09-23 00:17:25 -07:00
skal
286306853e prepare VP8LDecodeImage for incremental decode
- don't call VP8LClear() when there's no error (and let the caller do it)
- only initialize output once if state_ is not READ_DATA
- don't over-set dec->status_ = READ_DATA
- don't re-set dec->status_ if DecodeImageStream() fails
- remove unneeded dec->action_ field
- make ReadImageInfo() check br->eos_
- use ErrorStatusLossless() more consistently

Change-Id: Ica6e4b1c82e3fce8b1ce0274def551a886b73b0b
2014-09-23 00:13:52 -07:00
skal
248f3aed22 remove br->error_ field
it's somewhat redundant with br->eos_

also make the status-check coherent.

Change-Id: I98e755e037d45acb0760baf2344bf11fb5fb5cda
2014-09-23 00:04:58 -07:00
Djordje Pesut
49e15044ef MIPS: dspr2: added optimization for function FilterLoop26
affected functions: VFilter16, HFilter16, VFilter8 and HFilter8

Change-Id: Ib2fc41aaa00b10c2906d689bdc5a10f4568e70a8
2014-09-23 08:46:05 +02:00
skal
c792d4129a Premultiply with alpha during U/V downsampling
This prevents the 'alpha-leak' reported in issue #220

Speed-diff is kept minimal.

Change-Id: I1976de5e6de7cfcec89a54df9233c1a6586a5846
2014-09-18 23:40:34 -07:00
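The idea behind c792d4129a can be illustrated with a single 2x2 chroma average. Weighting each sample by its alpha keeps the color of fully transparent pixels from leaking into visible ones. `AlphaWeightedAverage4()` is a hypothetical illustration with simple integer rounding, not the encoder's actual accumulation code.

```c
#include <assert.h>
#include <stdint.h>

/* Averages four samples 'c' weighted by their alpha 'a'.
   Falls back to a plain average when all four are fully transparent. */
static uint8_t AlphaWeightedAverage4(const uint8_t c[4], const uint8_t a[4]) {
  uint32_t sum = 0, total_alpha = 0;
  int i;
  for (i = 0; i < 4; ++i) {
    sum += (uint32_t)c[i] * a[i];
    total_alpha += a[i];
  }
  if (total_alpha == 0) {
    return (uint8_t)((c[0] + c[1] + c[2] + c[3] + 2) >> 2);
  }
  /* Round to nearest: add half the divisor before dividing. */
  return (uint8_t)((sum + total_alpha / 2) / total_alpha);
}
```

With one opaque sample of 200 and three transparent zeros, a plain average gives 50, while the alpha-weighted one keeps 200.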
Vikas Arora
b901416b90 Record the lossless size stats.
Record and show the lossless header and image data sizes in cwebp.

Change-Id: I08f19693cb7a756b6fdce5b55d71f5367b5f02fc
2014-09-17 15:16:05 -07:00
Pascal Massimino
cddd334050 Add a WebPExtractAlpha function to dsp
This is the opposite of WebPDispatchAlpha

+ Implement the SSE2 version

Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c
2014-09-15 08:12:03 +02:00
Pascal Massimino
0716a98eb3 fix indent after I0204949917836f74c0eb4ba5a7f4052a4797833b
Change-Id: I5d9e5d0a2ad2cefd8c539571d2eaee948da60ad5
2014-09-12 19:59:53 +02:00
Vikas Arora
f9ced95a9b Optimize lossless decoding for trivial(ARB) codes.
Optimize decoding for regions that have trivial literal codes.
A trivial literal is defined as a huffman image whose Red, Blue and Alpha
huffman trees each have only a single code value.
This speeds up lossless decoding by 3%.

Change-Id: I0204949917836f74c0eb4ba5a7f4052a4797833b
2014-09-12 09:08:08 -07:00
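When the red, blue and alpha trees each hold a single symbol, those three channels are constant for the region. Only green then needs decoding per pixel. A sketch of that shortcut, using the hypothetical helpers `MakeTrivialLiteral()` and `AssemblePixel()`:

```c
#include <assert.h>
#include <stdint.h>

/* Precomputes the constant A, R and B bytes of a trivial literal (green = 0). */
static uint32_t MakeTrivialLiteral(uint8_t a, uint8_t r, uint8_t b) {
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) | b;
}

/* Per pixel, only the decoded green symbol is ORed into place. */
static uint32_t AssemblePixel(uint32_t trivial_literal, int green) {
  assert(green >= 0 && green <= 255);
  assert((trivial_literal & 0x0000ff00u) == 0);
  return trivial_literal | ((uint32_t)green << 8);
}
```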
Pascal Massimino
690b491af1 fix loop bug in DispatchAlpha()
* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff

These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.

Now, the SSE2 version is really 2x faster than the C version.

Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
2014-09-11 22:35:08 +02:00
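This scalar sketch covers the two behaviours that 690b491af1 restores. First, alpha bytes are written into every fourth output byte. Second, a mask that starts at 0xff and is ANDed with every sample stays 0xff only if all pixels are opaque, which gives a correct has_alpha result. The signature is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Writes each alpha byte into dst[4 * x]; 'dst' should point at the alpha
   byte of the first pixel (e.g. rgba + 3). Returns 1 if at least one alpha
   value is not 0xff. */
static int DispatchAlpha(const uint8_t* alpha, int alpha_stride,
                         int width, int height,
                         uint8_t* dst, int dst_stride) {
  uint32_t alpha_mask = 0xff;  /* stays 0xff only if all samples are 0xff */
  int x, y;
  assert(4 * width <= dst_stride);
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      dst[4 * x] = alpha[x];
      alpha_mask &= alpha[x];
    }
    alpha += alpha_stride;
    dst += dst_stride;
  }
  return (alpha_mask != 0xff);
}
```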
Djordje Pesut
3101f53720 MIPS: dspr2: added optimization for TransformOne
added macros for TransformOne, TransformAC3 and TransformDC

Change-Id: I4341450f443cf46dcf91c0db17bde63c8fb8afee
2014-09-11 17:02:02 +02:00
Pascal Massimino
a6bb9b17d8 SSE2 for inverse Mult(ARGB)Row and ApplyAlphaMultiply
Change-Id: Iab5c0e4a4d2b31f86736a9b277e62b6e28c3d2b4
WebPMultRow: ~7x faster
WebPMultARGBRow: ~3x faster
ApplyAlphaMultiply: 60% faster
2014-09-11 07:58:42 +02:00
Vikas Arora
d84a8ffdf7 Remove default initialization of decoder status.
Remove the default initialization of decoder status in the method
VP8LDecodeImage().

Change-Id: Ie6b949606349f4e937c4c1dd2c02ff2a4f86870f
2014-09-10 14:55:46 -07:00
Vikas Arora
e0a9932161 Rectify bug in lossless incremental decoding.
Handle the corner case where the VP8LDecodeImage() method is called with invalid
header data. The lossless decoding doesn't support incremental mode yet.
Return the error status as BITSTREAM error in case not all pixels are decoded
with the provided bit-stream. Also added asserts in the VP8LDecodeImage() method
to validate the decoder header with appropriate/valid data for huffman trees
(htree_groups_ etc).

Change-Id: Ibac9fcfc4bd0a2c5f624bb9d4a2b9f6459aa19ea
2014-09-09 15:34:16 -07:00
Djordje Pesut
e2502a97c1 MIPS: dspr2: added optimization for TransformAC3
Change-Id: Icd789ee5f6d764297e7dc0a0f8a3bc47ab92ac65
2014-09-09 14:53:36 +02:00
Djordje Pesut
24e1072aac MIPS: dspr2: added optimization for TransformDC
Change-Id: Iee69758f6442ea9c80ddaa32cea8d00dda4c6252
2014-09-09 14:15:04 +02:00
Pascal Massimino
c0e84df8e8 Merge "Slightly faster lossless decoding (1%)" 2014-09-09 03:55:00 -07:00
Pascal Massimino
8dd28bb560 Slightly faster lossless decoding (1%)
-> introduce a special-case 64b pattern-copy, similar to the 8b one for alpha.
-> use memcpy() for non-overlapping areas
+ cosmetics and homogenization of the code

Change-Id: I0e65e04b96fec94c009a4614137dfba2a0f98561
2014-09-09 11:18:30 +02:00
Djordje Pesut
f0103595dd MIPS: dspr2: added optimization for ColorIndexInverseTransforms
Change-Id: I5b6094ce489d4f896bc4b8f575142eb3c5054beb
2014-09-08 17:22:59 +02:00
Pascal Massimino
d3242aee16 make VP8LSetBitPos() set br->eos_ flag
ReadSymbol() finishes with a VP8LSetBitPos() call only and could miss an eos_ during the decode loop.

Things are faster because of inlining too.

Change-Id: I2d2a275f38834ba005bc767d45c5de72d032103e
2014-09-06 08:40:20 +02:00
Pascal Massimino
a9decb5584 Lossless decoding: fix eos_ flag condition
eos_ needs to be set only when superfluous bits have actually
been requested.
Earlier, we were assuming premature end-of-stream to be an error.
Now, more precisely, we mark error when we have encountered end-of-stream *and*
we attempt to read more bits after that.

This handles cases where image data requires no bits to be read

Change-Id: I628e2c39c64f10c443fb51f86b1f5919cc9fd299
2014-09-05 20:21:50 +02:00
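The rule in a9decb5584 is that end-of-stream counts only when bits are actually requested past the end. It can be sketched with a small LSB-first bit reader. `BitReader` and `ReadBits()` are hypothetical, and a real reader would buffer bits rather than extract them one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  const uint8_t* buf;
  size_t len;      /* buffer size in bytes */
  size_t bit_pos;  /* index of the next bit to read */
  int eos;         /* set only when a read asks for bits past the end */
} BitReader;

/* Reads 'n_bits' (0..24), least significant bit first.
   Returns 0 and sets eos if fewer than 'n_bits' bits remain. */
static uint32_t ReadBits(BitReader* br, int n_bits) {
  uint32_t val = 0;
  int i;
  assert(n_bits >= 0 && n_bits <= 24);
  if (br->bit_pos + (size_t)n_bits > br->len * 8) {
    br->eos = 1;
    return 0;
  }
  for (i = 0; i < n_bits; ++i) {
    const size_t pos = br->bit_pos + (size_t)i;
    val |= (uint32_t)((br->buf[pos >> 3] >> (pos & 7)) & 1) << i;
  }
  br->bit_pos += (size_t)n_bits;
  return val;
}
```

Reading zero bits at the very end leaves eos clear, which covers image data that needs no bits at all.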
Pascal Massimino
3fea6a28da fix erroneous dec->status_ setting
We only need to set BITSTREAM_ERROR if !ok.

Change-Id: I5bd13e64797e8bc509477edb29158abb39cb0ee1
2014-09-05 19:48:11 +02:00
Djordje Pesut
80b8099fd8 MIPS: dspr2: add some specific mips code to commit I2c3f2b12f8df15b785fad5a9c56316e954ae0c53
added some C-code tuning also

Change-Id: I67ce70a063ef6b5821b9158a4defd6987eccbb9a
2014-09-04 13:42:39 +02:00
skal
e564062522 Merge "further refine the COPY_PATTERN optim for DecodeAlpha" 2014-09-04 03:43:55 -07:00
James Zern
854509fec0 enc/histogram.c: reindent after f4059d0
fixes indent in HistogramRemap after:
f4059d0 Code cleanup for HistogramRemap.

Change-Id: I9f53a088749e9100a70331bda1662488666c5156
2014-09-03 16:58:49 -07:00
skal
344219645b Merge "~3-5% faster encoding optimizing PickBestIntra*()" 2014-09-03 15:53:32 -07:00
skal
865069c12e further refine the COPY_PATTERN optim for DecodeAlpha
* use functions instead of MACRO
* adjust var's name

Overall, same speed, with more readable code.

Change-Id: I2c3f2b12f8df15b785fad5a9c56316e954ae0c53
2014-09-04 00:25:27 +02:00
Djordje Pesut
a59562283f added C-level optimization for DecodeAlphaData function
Copies with short distances of 1,2 and 4 are specialized.

Up to 10-14% faster alpha decoding.

Change-Id: I9708e98193910bfaf8ef43091f3fdea73b63896d
2014-09-03 16:49:17 +02:00
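Short distances need care because an LZ77 copy whose distance is smaller than its length overlaps its own output, so memcpy() cannot be used there. Below is a simplified 8-bit sketch using the hypothetical `CopyBlock8b()`:

- Distance 1 reduces to memset().
- Non-overlapping copies use memcpy().
- Other overlapping cases fall back to a byte loop.

The dedicated 2- and 4-byte pattern paths from a59562283f are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies 'length' bytes to 'dst' from 'dist' bytes back in the same buffer. */
static void CopyBlock8b(uint8_t* dst, size_t dist, size_t length) {
  const uint8_t* src = dst - dist;
  size_t i;
  assert(dist > 0);
  if (dist == 1) {
    memset(dst, src[0], length);    /* run of a single byte */
  } else if (dist >= length) {
    memcpy(dst, src, length);       /* source and destination are disjoint */
  } else {
    for (i = 0; i < length; ++i) {  /* overlapping: repeats the pattern */
      dst[i] = src[i];
    }
  }
}
```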
skal
187d379db6 add a fallback to ALPHA_NO_COMPRESSION
if ALPHA_LOSSLESS_COMPRESSION produces too big a file (very rare!),
we fall back to no-compression automatically.

Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
2014-09-02 21:55:13 +02:00
skal
a48a2d7635 ~3-5% faster encoding optimizing PickBestIntra*()
* Add early-out check for Intra16
* replace some memcpy() by pointer swap

Change-Id: I5edc5f7fbc8e39984deb48e6c045c97c61418589
2014-09-01 14:40:25 +02:00
skal
77d4c7e337 address cosmetic comments from patch #71380
Change-Id: Iaba301b9e77aa4febe0efe1e6016fab42d5589f3
2014-08-28 18:08:00 -07:00
skal
f75dfbf23d Speed up Huffman decoding for lossless
speed-up is ~1.6% for photographic image to 10% for graphical image
(the 1000-image corpus was sped up by 5.8%)

Code by akramarz@google.com and jyrki@google.com

Change-Id: Iceb2e50e6cc761b9315a3865d22ec9d19b8011c6
2014-08-28 12:28:04 -07:00
James Zern
637b388809 dsp/lossless: workaround gcc-4.9 bug on arm
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
https://android-review.googlesource.com/#/c/102511

Change-Id: I90ae58bf3e6cc92ca9897f69974733d562e29aaf
2014-08-27 20:31:21 -07:00
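Select(), the function affected by 637b388809, is the lossless "select" predictor. It returns either the left or the top pixel, whichever is closer to the gradient estimate left + top - top_left, measured as a summed per-channel distance. The per-channel absolute differences are the kind of quantity Sub3() helps compute. Below is a plain-C sketch following the WebP lossless specification's formulation; `ChannelDist()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Absolute difference of the 8-bit channels of 'x' and 'y' at 'shift'. */
static int ChannelDist(uint32_t x, uint32_t y, int shift) {
  return abs((int)((x >> shift) & 0xff) - (int)((y >> shift) & 0xff));
}

/* With per-channel estimate p = left + top - top_left:
   |p - left| = |top - top_left| and |p - top| = |left - top_left|.
   Returns 'left' if it is strictly closer to p, otherwise 'top'. */
static uint32_t Select(uint32_t left, uint32_t top, uint32_t top_left) {
  int shift, dist_left = 0, dist_top = 0;
  for (shift = 0; shift < 32; shift += 8) {
    dist_left += ChannelDist(top, top_left, shift);
    dist_top += ChannelDist(left, top_left, shift);
  }
  return (dist_left < dist_top) ? left : top;
}
```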
James Zern
8323a9038d dsp.h: collect gcc/clang version test macros
endian_inl.h already relies on dsp.h, grab the definitions from there.

Change-Id: I445f7d0631723043c55da1070498f89965bec7b1
2014-08-27 19:33:09 -07:00
skal
e6c4b52f28 move static initialization of WebPYUV444Converters[] to the Init function.
Split initialization of YUV444Converters[] out of Upsamplers init.

update test for NULL function pointers

Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
2014-08-27 11:36:37 -07:00
skal
49911d4df2 Merge "fix indentation" 2014-08-27 07:52:36 -07:00
Vikas Arora
f4059d0c7d Code cleanup for HistogramRemap.
Avoid call to HistogramAddThresh when there's only one Histogram image.
Change-Id: I43b09e8e2d218c95969567034224777dcce37ab9
2014-08-26 15:45:22 -07:00
skal
e632b0929b fix indentation
Change-Id: I2294a6c83e5f345f64bd5120b91532e00ed6c543
2014-08-25 23:52:09 -07:00
skal
f5c04d64b7 Merge "add a DispatchAlpha() for SSE2 that handles 8 pixels at a time" 2014-08-25 22:43:42 -07:00
skal
fc98edd936 add a DispatchAlpha() for SSE2 that handles 8 pixels at a time
Only slightly faster.

Change-Id: Ie2e57e6a0950166124cf1075c6c9b45b7abdad8c
2014-08-25 21:03:03 -07:00
skal
73d361dd5f introduce VP8EncQuantize2Blocks to quantize two blocks at a time
No speed diff for now. We might reorder the instructions better later
to speed things up.

Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c
2014-08-25 20:21:42 -07:00
Djordje Pesut
0b21c30b1a MIPS: dspr2: added optimization for EmitAlphaRGB
New dsp function: WebPDispatchAlpha()

Change-Id: I48e539d22471279ec75185759bc68d18b127f716
2014-08-21 20:39:35 -07:00
James Zern
953acd56a4 enc_neon: enable QuantizeBlock for aarch64
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.

Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
2014-08-20 11:48:25 -07:00
Djordje Pesut
f4ae143720 MIPS: mips32: code rebase
mips code rebased to be the same as the C code
from commit I8c29a8a0285076cb3423b01ffae9fcc465da6a81

Change-Id: I3848f4ce43387c3a62b336606498779f7b07ec44
2014-08-19 15:13:16 +02:00
Djordje Pesut
569771549a MIPS: dspr2: added optimizations for VP8YuvTo*
VP8YuvToRgb
VP8YuvToBgr
VP8YuvToRgb565
VP8YuvToRgba4444
VP8YuvToArgb
VP8YuvToBgra
VP8YuvToRgba

Change-Id: I22212a125d890e1fd28388fec906a1a5c07ff386
2014-08-19 14:29:32 +02:00
skal
2523aa73cb SmartRGBYUV: fix odd-width problem with pixel replication
The rightmost pixel was missing a copy, which could lead to an invalid read.

Also added a lower dimension of 4, below which we use the regular conversion.
This prevents corner cases, and avoids overkill for such tiny images.

Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
2014-08-18 15:58:36 -07:00
Pascal Massimino
ee52dc4e54 fix some MSVC64 warning about float conversion
Change-Id: I27ab27fc15033d27d0505729f6275fb542c8d473
2014-08-16 00:15:29 -07:00
James Zern
3fca851a20 cpu: check for _MSC_VER before using msvc inline asm
_M_IX86 will be defined in mingw builds after including windows.h. As
the gcc inline asm comes first, this missing check would only have caused
an error if the code was reorganized.

Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
2014-08-15 15:11:40 -07:00
skal
e2a83d7109 faster RGB->YUV conversion function (~7% speedup)
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.

Also, fewer macros.

Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
2014-08-15 11:13:25 -07:00
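To illustrate this kind of conversion, BT.601 studio-swing luma can be computed in 16-bit fixed point: Y = 16 + 0.257 R + 0.504 G + 0.098 B, with each coefficient scaled by 65536. The fixed rounding term plays the role of the dithering==0 case. `RGBToY()` and its constants are a sketch, not the library's code.

```c
#include <assert.h>
#include <stdint.h>

enum { kYuvFix = 16 };                   /* fixed-point precision */
enum { kYuvHalf = 1 << (kYuvFix - 1) };  /* rounding term, no dithering */

/* BT.601 limited-range luma: 16 + 0.257 R + 0.504 G + 0.098 B. */
static int RGBToY(int r, int g, int b) {
  const int luma = 16843 * r + 33030 * g + 6423 * b;
  assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
  return (luma + kYuvHalf + (16 << kYuvFix)) >> kYuvFix;
}
```

Black maps to 16 and white to 235, the usual studio-swing luma range.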
skal
de2d03e12f Merge "Add smart RGB->YUV conversion option -pre 4" 2014-08-15 11:07:49 -07:00
skal
3fc4c539aa Add smart RGB->YUV conversion option -pre 4
New function: WebPPictureSmartARGBToYUVA()
This implements smart RGB->YUV conversion.

This is rather undocumented for now, and is triggered using the '-pre 4'
preprocessing option.

This is slow-ish and uses a fair amount of memory, but should be improvable.
This is a somewhat usable beta version.

Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
2014-08-15 10:55:09 -07:00
Djordje Pesut
b4dc4069a2 MIPS: dspr2: added optimization for (un)filters
HorizontalFilter
VerticalFilter
GradientFilter
HorizontalUnfilter
VerticalUnfilter
GradientUnfilter

Change-Id: I54055b4767c37719691811072e95bf79c1f627b1
2014-08-14 11:55:19 -07:00
Djordje Pesut
b61c9ceca8 MIPS: dspr2: Optimization of some simple point-sampling functions
Change-Id: I6a4ab29bd0cc5a2951a8882cf9997032dc38bd79
2014-08-13 17:18:49 +02:00
Djordje Pesut
98c54107df MIPS: mips32r2: added optimization for BSwap32
gcc < 4.8.3 doesn't translate bswap optimally.
always use the optimized version

Change-Id: I979ea26ad6dc0166d3d2f39c4148eb8adfb7ddec
2014-08-12 09:29:13 +02:00
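For comparison with the hand-written mips32r2 version in 98c54107df, a portable C byte swap looks like the sketch below. Whether a compiler turns this pattern into one or two instructions depends on its version, which is the gap the commit addresses. `BSwap32Portable()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Reverses the byte order of a 32-bit word using shifts and masks. */
static uint32_t BSwap32Portable(uint32_t x) {
  return (x >> 24) |
         ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) |
         (x << 24);
}
```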
Djordje Pesut
b7e5a5c451 MIPS: detect mips32r6 and disable mips32r1 code
Change-Id: Id1325c789a990c9a8704e84e99a22d580303eb8a
2014-08-08 17:29:31 +02:00
pascal massimino
bb07022b66 Merge "cosmetics" 2014-08-06 12:30:08 -07:00
James Zern
e300c9d819 cosmetics
fix some indent/whitespace, remove a few duplicate includes, extra
semi-colons

Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352
2014-08-06 12:10:59 -07:00
pascal massimino
0e519eea8e Merge "cosmetics: remove some extraneous 'extern's" 2014-08-05 23:00:04 -07:00
pascal massimino
3ef0f08af5 Merge "vp8enci.h: cosmetics: fix '*' placement" 2014-08-05 22:34:13 -07:00
James Zern
4c6dde37b9 bit_writer: cosmetics: rename kFlush() -> Flush()
Change-Id: I8907927974188bee85ffade1d75d2e50817aa115
2014-08-05 22:14:29 -07:00
James Zern
f7b4c48bba cosmetics: remove some extraneous 'extern's
Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d
2014-08-05 22:14:24 -07:00
James Zern
b47fb00ac0 vp8enci.h: cosmetics: fix '*' placement
associate with the type

Change-Id: Icf94f11bf79f6ccee3150e27b228755f8f3f0f37
2014-08-05 22:14:12 -07:00
skal
b5a36cc9ad add -near_lossless [0..100] experimental option
This compresses the image using lossless compression and a controllable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.
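The commit does not describe the decimating pre-process itself. As a purely
illustrative sketch of one generic decimation technique (not libwebp's
near-lossless algorithm), each 8-bit channel can be rounded to a multiple of
`1 << bits` before lossless coding, which reduces the number of distinct values
the coder has to represent. `QuantizeChannel`, `DecimatePixel` and `bits` are
hypothetical names; how the `-near_lossless [0..100]` level would map to such a
parameter is not stated here.

```c
#include <stdint.h>

// Hypothetical helper: round an 8-bit channel value to the nearest multiple
// of (1 << bits), clamping to the largest such multiple that fits in 8 bits.
// 'bits' is expected in [0, 7].
static uint32_t QuantizeChannel(uint32_t v, int bits) {
  if (bits <= 0) return v;  // no decimation requested
  {
    const uint32_t mask = (1u << bits) - 1u;
    uint32_t r = (v + (1u << (bits - 1))) & ~mask;  // round to nearest
    if (r > 255u) r = 255u & ~mask;                 // e.g. 255, bits=2 -> 252
    return r;
  }
}

// Hypothetical helper: decimate the R, G and B channels of a 0xAARRGGBB
// pixel. Alpha is left unchanged in this sketch.
static uint32_t DecimatePixel(uint32_t argb, int bits) {
  const uint32_t a = (argb >> 24) & 0xffu;
  const uint32_t r = QuantizeChannel((argb >> 16) & 0xffu, bits);
  const uint32_t g = QuantizeChannel((argb >> 8) & 0xffu, bits);
  const uint32_t b = QuantizeChannel(argb & 0xffu, bits);
  return (a << 24) | (r << 16) | (g << 8) | b;
}
```

Larger `bits` values discard more precision per channel in exchange for more
repetition in the pixel data.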

Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
2014-08-05 19:17:10 +02:00
James Zern
0524d9e5e8 dsp: detect mips64 & disable mips32 code
Change-Id: Icf68dafd5cf0614ca25b36a0252caa1784ac8059
2014-08-01 21:18:53 -07:00
James Zern
29a9fe222a libwebp 0.4.1
- 7/24/14: version 0.4.1
   This is a binary compatible release.
   * AArch64 (arm64) & MIPS support/optimizations
   * NEON assembly additions:
     - ~25% faster lossy decode / encode (-m 4)
     - ~10% faster lossless decode
     - ~5-10% faster lossless encode (-m 3/4)
   * dwebp/vwebp can read from stdin
   * cwebp/gif2webp can write to stdout
   * cwebp can read webp files; useful if storing sources as webp lossless

Merge tag 'v0.4.1'

libwebp 0.4.1
- 7/24/14: version 0.4.1
  This is a binary compatible release.
  * AArch64 (arm64) & MIPS support/optimizations
  * NEON assembly additions:
    - ~25% faster lossy decode / encode (-m 4)
    - ~10% faster lossless decode
    - ~5-10% faster lossless encode (-m 3/4)
  * dwebp/vwebp can read from stdin
  * cwebp/gif2webp can write to stdout
  * cwebp can read webp files; useful if storing sources as webp lossless

* tag 'v0.4.1':
  update ChangeLog
  iosbuild.sh: specify optimization flags
  update ChangeLog
  makefile.unix: add vwebp.1 to the dist target
  update ChangeLog
  gif2webp: dust up the help message
  remove -noalphadither option from README/vwebp.1
  update NEWS for the next release
  update AUTHORS
  bump version to 0.4.1
  restore mux API compatibility
  remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
  restore encode API compatibility
  restore decode API compatibility
  gif2webp: fix compile with giflib 5.1.0
  gif2webp: simplify giflib version checking

Change-Id: Icf599f29bc6c0db757bc133aaddb3dbbbc316e08
2014-07-29 18:06:58 -07:00
James Zern
85213b9bbe bump version to 0.4.1
libwebp{,decoder} - 0.4.1
libwebp libtool - 5.1.0
libwebpdecoder libtool - 1.1.0

mux/demux - 0.2.1
libtool - 1.1.0

Change-Id: If593a198f802fd68c7dbbdbe0fc2612dbc44e2df
2014-07-23 17:17:25 -07:00
James Zern
695f80ae25 Merge "restore mux API compatibility" into 0.4.1 2014-07-23 17:11:33 -07:00
James Zern
862d296cf9 restore mux API compatibility
protect WebPMuxSetCanvasSize w/a WEBP_MUX_ABI_VERSION check

Change-Id: I6b01af55ebb4cc4c860d3cbf43be722077896748
2014-07-23 16:13:56 -07:00
skal
8f6f8c5dde remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
There's no speed difference, so it's better to remove it altogether.

Reported in https://code.google.com/p/webp/issues/detail?id=215

Change-Id: I991330de18bec340029d6df5fed0dfb4337e4662
2014-07-23 14:15:40 -07:00
James Zern
c2fc52e4ec restore encode API compatibility
protect WebPConfigLosslessPreset/WebPMemoryWriterClear w/a
WEBP_ENCODER_ABI_VERSION check

Change-Id: If4debc15fee172a3f18079bc2bd29eb8447bc14b
2014-07-22 22:19:55 -07:00
James Zern
793368e8c6 restore decode API compatibility
protect flip/alpha_dither w/a WEBP_DECODER_ABI_VERSION check

Change-Id: I437a5d5f78800f71b7e7e323faa321f946bf9515
2014-07-22 20:03:52 -07:00
Vikas Arora
d2cc61b7dd Extend MakeARGB32() to accept Alpha channel.
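A minimal sketch of packing four 8-bit channels, including alpha, into one
32-bit word, assuming the common 0xAARRGGBB layout. The name `MakeARGB32Sketch`
and its argument order are hypothetical and may not match the actual libwebp
helper's signature.

```c
#include <stdint.h>

// Hypothetical helper: pack alpha, red, green and blue (each 0..255)
// into a single 0xAARRGGBB value.
static uint32_t MakeARGB32Sketch(int a, int r, int g, int b) {
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
         ((uint32_t)g << 8) | (uint32_t)b;
}
```

With an explicit alpha argument, callers can build translucent pixels instead
of always getting an opaque (0xff) alpha byte.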
Change-Id: I31b8e2d085000e2e3687a373401e4f655f11fc42
2014-07-21 14:49:38 -07:00
skal
3398d81ac3 Actuate memory stats for PRINT_MEMORY_INFO
Change-Id: If7eac591b5205990ca452ca02b084a908482850a
2014-07-21 13:16:18 -07:00
skal
6c347bbb0c move WebPPictureInit to picture.c
Change-Id: I4b8c352cfd47256d0c3827334a6942c1caf742f6
2014-07-21 14:16:19 +02:00
Pascal Massimino
1549d62067 reorder the YUVA->ARGB and ARGB->YUVA functions correctly
+ rework a few loops
+ consolidate a few error-checks / error-reporting
+ don't modify picture->colorspace in Import() for ARGB output

Change-Id: Iae6da9b50acc738c59b85c3ee64efbaf6af8bffc
2014-07-18 07:15:54 -07:00
Pascal Massimino
736f2a175e extract colorspace code from picture.c into picture_csp.c
had to refactor a few functions here and there.

Change-Id: I86fde6fec7c2fc7eb48f0ecf327dbbd2bd40b9d4
2014-07-16 16:37:26 -07:00
Pascal Massimino
fbadb48026 split monolithic picture.c into picture_{tools,psnr,rescale}.c
Change-Id: Ia5eb5496e4337e5bac8203872c5b014cad21c4f9
2014-07-12 09:13:33 -07:00
James Zern
c76f07ecc2 dec_neon/TransformAC3: initialize vector w/vcreate
replaces {} initialization gnu-ism

Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
Urvang Joshi
bb4fc051bf gif2webp: Allow single-frame animations
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.

Change-Id: I3ebae4893f953984de4072fda0938411de787a29
2014-07-10 15:21:05 -07:00
James Zern
46fd44c104 thread: remove harmless race on status_ in End()
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.

Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
2014-07-08 20:32:29 -07:00
James Zern
6781423b7d configure: check for __builtin_bswapXX()
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available
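The configure check lets code use the compiler builtin only when it is known
to exist and fall back to plain shifts otherwise. A minimal sketch of that
pattern, assuming only the `HAVE_BUILTIN_BSWAP16` define named above (the
function name `BSwap16Sketch` is hypothetical):

```c
#include <stdint.h>

// Hypothetical helper: swap the two bytes of a 16-bit value (0xAABB -> 0xBBAA).
static uint16_t BSwap16Sketch(uint16_t x) {
#if defined(HAVE_BUILTIN_BSWAP16)
  return __builtin_bswap16(x);             // builtin detected by configure
#else
  return (uint16_t)((x >> 8) | (x << 8));  // portable shift-based fallback
#endif
}
```

Both branches produce the same result; the builtin simply gives the compiler
a direct hint to emit a single byte-swap instruction where one exists.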

Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
2014-07-05 12:35:13 -07:00
James Zern
6422e683af VP8LFillBitWindow: enable fast path for 32-bit builds
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.

the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.
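To illustrate why a 32-bit load is enough: in a bit reader that keeps a
64-bit window and consumes bits from its low end, a refill after 32 bits have
been consumed shifts the window right by 32 and ORs new bytes into the upper
half, so the top half of any wider load would be shifted out. The sketch below
is a simplified, hypothetical model (`BitReader`, `FillBitWindow` and
`ReadBits` are illustrative names, not libwebp's `VP8LBitReader`), assuming a
little-endian bitstream of at least 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t val;        // 64-bit window; bits are consumed from the LSB side
  int bit_pos;         // number of bits of 'val' already consumed
  const uint8_t* buf;  // input bitstream
  size_t pos;          // offset of the next byte to load
  size_t len;          // total input length in bytes
} BitReader;

// Assemble a little-endian 32-bit value from bytes (alignment-safe).
static uint32_t LoadLE32(const uint8_t* p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Prime the window with the first 8 bytes (assumes len >= 8).
static void InitBitReader(BitReader* br, const uint8_t* buf, size_t len) {
  br->val = (uint64_t)LoadLE32(buf) | ((uint64_t)LoadLE32(buf + 4) << 32);
  br->bit_pos = 0;
  br->buf = buf;
  br->pos = 8;
  br->len = len;
}

// Once the low 32 bits are consumed, drop them and load only 32 new bits
// into the upper half of the window.
static void FillBitWindow(BitReader* br) {
  if (br->bit_pos >= 32 && br->pos + 4 <= br->len) {
    br->val >>= 32;
    br->bit_pos -= 32;
    br->val |= (uint64_t)LoadLE32(br->buf + br->pos) << 32;
    br->pos += 4;
  }
}

// Read n bits (1 <= n <= 24). When no refill is possible, the caller must
// stop before consuming more than the 64 bits still held in the window.
static uint32_t ReadBits(BitReader* br, int n) {
  FillBitWindow(br);
  {
    const uint32_t v = (uint32_t)(br->val >> br->bit_pos) & ((1u << n) - 1u);
    br->bit_pos += n;
    return v;
  }
}
```

Because the window is 64 bits wide but each refill only needs 32 fresh bits,
the same code works on 32-bit targets without an (emulated) 64-bit load.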

Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
2014-07-04 14:42:47 -07:00
James Zern
4f7f52b2a1 VP8LFillBitWindow: respect WEBP_FORCE_ALIGNED
Change-Id: I23eddf01590de002efc21d8c7acc545a08fc3e48
2014-07-04 13:53:29 -07:00
James Zern
e458badcc3 endian_inl.h: implement htoleXX with BSwapXX
+ s/htole(16|32)/HToLE$1/ to avoid any name conflicts
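Host-to-little-endian conversion is a no-op on little-endian hosts and a byte
swap on big-endian ones, which is why an `HToLE32` can be built on a
`BSwap32`. A minimal sketch, assuming the gcc/clang predefined
`__BYTE_ORDER__` / `__ORDER_BIG_ENDIAN__` macros for endianness detection (the
`*Sketch` names are hypothetical):

```c
#include <stdint.h>

// Hypothetical helper: reverse byte order (0xAABBCCDD -> 0xDDCCBBAA).
static uint32_t BSwap32Sketch(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Hypothetical helper: host-to-little-endian conversion.
static uint32_t HToLE32Sketch(uint32_t x) {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  return BSwap32Sketch(x);  // big-endian host: swap into little-endian order
#else
  return x;                 // assumed little-endian host: already in order
#endif
}
```

Storing the result of `HToLE32Sketch()` in memory places the least
significant byte first on either kind of host (given the macro assumption).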

Change-Id: Ic1c84711557e50f73d83ca5aa2b3992ac6738216
2014-07-04 12:16:36 -07:00
James Zern
f2664d1aab endian_inl.h: add BSwap16
+ use it in VP8LoadNewBytes()

Change-Id: I701d3652dc0cbd553852978702ef68c2657bca1c
2014-07-04 12:16:28 -07:00