Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.
For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.
Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
Optimize backwardreferences (about 0.1% byte savings) with almost same
compression speed (3% faster on defaut compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
- Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
- Removed 'if' conditions for corner cases in the main loop.
- Refactored the method(AddSingleLiteral) for adding one pixel.
Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.
Overall compression speedup observed is around 5% (+/- noise).
Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
at least clang 3.[45] in c++ mode with -std=c++11 define __STRICT_ANSI__
this change set WEBP_INLINE to inline for c++/non-strict-ansi/> c99
fixes crbug.com/428383
Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).
This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.
Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
The Maximum allowed limit is 11.
The Q=25 and below is not impacted as cache bits are forced to 0.
This saves 0.05% - 0.1% bytes for other quality with almost same compression
speed (+/- 2-3%, that's more of a noise).
Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).
The compression density remains the same with this change, with little faster
compression speed.
Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.
This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.
Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.
The encoding is tad faster at default compression
Before After
bpp/rate bpp/rate
442 Palette 0.2720/5.270 MP/s 0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s 3.7607/0.816 MP/s
Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
Updated VP8LGetBackwardReferences and HashChainFindCopy method with following:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the
best_val eval computation to choose a different curr_length, as best_val was
set to 'curr_length << 16'.
Following is the impact on the compression speed/density at default & max
quality, overall this speeds up compression by 5-15% (q=100 -> 75) with a tad
drop (0.02-0.03%) in compression density for the non-palette images.
Before After
bpp/Rate(MP/s) bpp/Rate(MP/s)
q=75 (def)
All 1000 2.4492/1.049 MP/s 2.4498/1.230 MP/s
Palette 0.2719/5.060 MP/s 0.2719/6.110 MP/s
non-Palette 3.7597/0.732 MP/s 3.7607/0.840 MP/s
q=100
All 1000 2.4134/0.125 MP/s 2.4142/0.131 MP/s
Palette 0.2692/2.585 MP/s 0.2692/2.885 MP/s
non-Palette 3.7040/0.079 MP/s 3.7053/0.083 MP/s
Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples
Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
Compared to previous mode it gives another 10-30% improvement in compression keeping comparable PSNR on corresponding quality settings.
Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.
Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9
In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault
Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
Evaluate if for Palette images (num_colors <= 256), non-palette
compression path (Subtract green, predictor transform etc) yield an
optimal compression density.
This change reduces the WebP file (for palette images) size by 0.4% with
drop of 3-5% in compression speed.
Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.
quality | size(bytes) after/before | time (ms) after/before
Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
We compact the palette by weighted distance, favoring the green channel.
Average gain on paletted file is ~0.5%, with gain up to 6-7% on some favorable cases.
Encoding speed is unaffected.
Disabled for alpha (or any single-channel input)
Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.
Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
SSE2 version is 2.1x faster
This is used to transfer the alpha plane to green channel before lossless compression.
Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
Don't combine the Histograms that have trivial (single valued A, R & B)
symbols.
Following is the compression savings data along with compression time (before
& after) per image.
Before After
bpp, rate(MP/s) bpp, rate(MP/s)
Q=25, method = 4 2.508, 1.807 2.499, 1.916
Q=50, method = 4 2.460, 1.488 2.456, 1.512
Q=75, method = 4 2.452, 1.078 2.450, 1.092
Q=25, method = 5 2.505, 1.398 2.496, 1.383
Q=50, method = 5 2.458, 1.170 2.453, 1.143
Q=75, method = 5 2.453, 0.886 2.450, 0.855
This change provides 0.1-0.4% compression gains and speeds up the lossless
compression for the default method=4 (the drop in compression speed is between 1-3.5% for method=5).
Change-Id: Idfd88c2092f37afacd26a97097b3053f8183953a
Tested on 1000 pngs corpus with quality 90-100 it gives ~0.15% improvement
in compression density and ~7% speed up.
Change-Id: I460f56c96707edb3c1f0b51a024e5122e10458df
* We don't need to change DecodeAlpha, since incremental
decoding is not useful for Alpha (we already decode
progressively along the RGB)
* Similarly, we don't do incremental decoding for level>0 planes:
the metadata don't turn into visible pixel (only the ones in level0), so...
(No visible speed change)
Change-Id: I2fd4b9ba227561a7dbede647686584b752be7baa
- don't call VP8LClear() when there's no error (and let the caller do it)
- only initialize output once if state_ is not READ_DATA
- don't over-set dec->status_ = READ_DATA
- don't re-set dec->status_ if DecodeImageStream() fails
- remove unneeded dec->action_ field
- make ReadImageInfo() check br->eos_
- use ErrorStatusLossless() more consistently
Change-Id: Ica6e4b1c82e3fce8b1ce0274def551a886b73b0b
Optimize the decoding for region that have trivial literal codes.
The trivial literal is defined as huffman image with Red, Blue and Alpha
huffman trees with only single code values.
This speeds up lossless decoding by 3%
Change-Id: I0204949917836f74c0eb4ba5a7f4052a4797833b
* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff
These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.
Now, the SSE2 version is really 2x faster than the C version.
Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
Handle the corner case when VP8LDecodeImage() method is called with an invalid
header data. The lossless decoding doesn't support incremental mode yet.
Return the error status as BITSTREAM error in case not all pixels are decoded
with the provided bit-stream. Also added asserts in the VP8LDecodeImage() method
to validate the decoder header with appropriate/valid data for huffman trees
(htree_groups_ etc).
Change-Id: Ibac9fcfc4bd0a2c5f624bb9d4a2b9f6459aa19ea
-> introduce special case 64b pattern-copy, similar to the 8b one for alpha.
-> use mempcy() for non-overlapping areas
+ cosmetics and homogenezation of the code
Change-Id: I0e65e04b96fec94c009a4614137dfba2a0f98561
ReadSymbol() finishes with a VP8LSetBitPos() call only and could miss an eos_ during the decode loop.
Things are faster because of inlining too.
Change-Id: I2d2a275f38834ba005bc767d45c5de72d032103e
eos_ needs to be set only when superfluous bits have actually
been requested.
Earlier, we were assuming pre-mature end-of-stream to be an error.
Now, more precisely, we mark error when we have encountered end-of-stream *and*
we attempt to read more bits after that.
This handles cases where image data requires no bits to be read
Change-Id: I628e2c39c64f10c443fb51f86b1f5919cc9fd299
if ALPHA_LOSSLESS_COMPRESSION produces a too big file (very rare!),
we fall-back to no-compression automatically.
Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
speed-up is ~1.6% for photographic image to 10% for graphical image
(1000 images corpus was sped up by 5.8 %)
Code by akramarz@google.com and jyrki@google.com
Change-Id: Iceb2e50e6cc761b9315a3865d22ec9d19b8011c6
Split initialization of YUV444Converters[] out of Upsamplers init.
update test for NULL function pointers
Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
rightmost pixel was missing a copy, which could lead to invalid read.
Also added a lower dimension of 4, below which we use the regular conversion.
This is to prevent corner cases, in addition to not being overkill.
Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
_M_IX86 will be defined in mingw builds after including windows.h. as
the gcc inline asm is first, this missing check would only have caused
an error if the code was reorganized.
Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.
Also, less macros.
Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
New function: WebPPictureSmartARGBToYUVA()
This implement smart RGB->YUV conversion.
This is rather undocumented for now, and is triggered using '-pre 4'
preprocessing option.
This is slow-ish and use quite some memory, but should be improvable.
This is somehow a usable beta version.
Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.
Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.
Change-Id: I3ebae4893f953984de4072fda0938411de787a29
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.
Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available
Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.
the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.
Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48