Increases compression density by 0.03 % for lossy.
Speeds up at least one of the lossy alpha images by 20 %.
Palette entropy 'kludge' seems to save 1-2 % on alpha images.
Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
share the computation between different modes
3-5 % speedup for lossless alpha
1 % for lossy alpha
no change in compression density
Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
introduced in:
"lossless: 0.37 % compression density improvement"
Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.
Improves compression density by 0.02 % or so.
Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
No significant size degradation (+0.001 %) for 1000 image corpus
Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"
Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green
and choose the strategy with the smallest expected entropy
Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
Speedup method StoreImageToBitMask by replacing the code to find histogram
index and Huffman tree codes at every iteration to a more optimal code that
updates these only when the current pixel (to write) crosses the histogram
tile-row boundary.
This change speeds up the StoreImageToBitMask method by 5%.
Change-Id: If01a1ccd7820f9a3a3e5bc449d070defa51be14b
Updated the near-lossless level mapping and make it correlated to lossy
quality i.e 100 => minimum loss (in-fact no-loss) and the visual-quality loss
increases with decrease in near-lossless level (quality) till value 0.
The new mapping implies following (PSNR) loss-metric:
-near_lossless 100: No-loss (bit-stream same as -lossless).
-near_lossless 80: Very very high PSNR (around 54dB).
-near_lossless 60: Very high PSNR (around 48dB).
-near_lossless 40: High PSNR (around 42dB).
-near_lossless 20: Moderate PSNR (around 36dB).
-near_lossless 0: Low PSNR (around 30dB).
Change-Id: I930de4b18950faf2868c97d42e9e49ba0b642960
AnalyzeSubtractGreen constitutes about 8-10% of the comression CPU cycles.
Statistically, subtract-green is proved to be useful for most of the
non-palette compression. So instead of evaluating the entropy (by calling
AnalyzeSubtractGreen) apply subtract-green transform for the low-effort
compression.
This changes speeds up the compression at m=0 by 8-10% (with very slight loss
of 0.07% in the compression density).
Change-Id: I9797dc39437ae089716acb14631bbc77d367acf4
Speed up AnalyzeSubtractGreen by looping through the image pixel once to
compute the two histograms.
AnalyzeEntropy code cleanup.
Removed some 'if' conditions and pointer indirections inside pixel iterate loop.
Change-Id: Ia65e3033988ff67df8e3ecce19d6e34cfc76358e
check enc->argb_ to quiet an msvs /analyze warning:
C6387: 'enc->argb_+y*width' could be '0': this does not adhere to the
specification for the function 'memcpy'.
Change-Id: I87544e92ee0d3ea38942a475c30c6d552f9877b7
- Lower the threshold parameters for HashChainFindCopy.
For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at
lower quality range (0.25% drop in compression density) and about 10%
for higher quality range without any drop in the compression density.
Following is the compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8615/18.000 2.8651/18.631
q=5 2.8615/18.216 2.8650/20.517
q=10 2.8572/18.070 2.8650/21.992
q=15 2.8519/18.371 2.8584/21.747
q=20 2.8454/18.975 2.8515/20.448
q=25 2.8230/8.531 2.8253/9.585
// Compression density remains same for q-range [30-100]
q=30 2.7310/7.706 2.7310/8.028
q=35 2.7253/6.855 2.7253/7.184
q=40 2.7231/6.364 2.7231/6.604
q=45 2.7216/5.844 2.7216/6.223
q=50 2.7196/5.210 2.7196/5.731
q=55 2.7208/4.766 2.7208/4.970
q=60 2.7195/4.495 2.7195/4.602
q=65 2.7185/4.024 2.7185/4.236
q=70 2.7174/3.699 2.7174/3.861
q=75 2.7164/3.449 2.7164/3.605
q=80 2.7161/3.222 2.7161/3.038
q=85 2.7153/2.919 2.7153/2.946
q=90 2.7145/2.766 2.7145/2.771
q=95 2.7124/2.548 2.7124/2.575
q=100 2.6873/2.253 2.6873/2.335
Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1
- Do light weight entropy based histogram combine and leave out CPU
intensive stochastic and greedy heuristics for combining the
histograms.
For 1000 image PNG corpus (m=0), this change yields speedup of 10% at
lower quality range (1% drop in compression density) and about 5% for
higher quality range (1% drop in compression density). Following is the
compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8336/16.577 2.8615/18.000
q=5 2.8336/16.504 2.8615/18.216
q=10 2.8293/16.419 2.8572/18.070
q=15 2.8242/17.582 2.8519/18.371
q=20 2.8182/16.131 2.8454/18.975
q=25 2.7924/7.670 2.8230/8.531
q=30 2.7078/6.635 2.7310/7.706
q=35 2.7028/6.203 2.7253/6.855
q=40 2.7005/6.198 2.7231/6.364
q=45 2.6989/5.570 2.7216/5.844
q=50 2.6970/5.087 2.7196/5.210
q=55 2.6963/4.589 2.7208/4.766
q=60 2.6949/4.292 2.7195/4.495
q=65 2.6940/3.970 2.7185/4.024
q=70 2.6929/3.698 2.7174/3.699
q=75 2.6919/3.427 2.7164/3.449
q=80 2.6918/3.106 2.7161/3.222
q=85 2.6909/2.856 2.7153/2.919
q=90 2.6902/2.695 2.7145/2.766
q=95 2.6881/2.499 2.7124/2.548
q=100 2.6873/2.253 2.6873/2.285
Change-Id: I0567945068f8dc7888041e93d872f9def91f50ba
most of the time, we don't need to actually move the
data.
Compression is randomly slightly different, because HistogramCompactBins() changed.
Timing is about the same.
Change-Id: Ia6af8e9780581014d6860f2b546189ac817cfad1
Remove handling for WEBP_HINT_GRAPH w.r.t use_palette flag.
The WEBP_HINT_GRAPH is now used at one place, to set the initial size of the
Bit Writer as bpp for photo images are generally larger than the graphical
images.
Change-Id: I1b9c4436c85a8f69da74c0dbcd292397323f2696
- The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'.
- The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache
bits to use (passing 0 implies disabling the local color cache).
- The local color cache is disabled for lowerf (<= 25) quality levels (as before).
- Enabled local color cache for palette images as well. This saves additional
0.017% bytes with a slight (2-3%) improvement in the compression speed.
- Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as
this option is not an option now (after we freeze the lossless bit-stream).
Change-Id: I33430401e465474fa1be899f330387cd2b466280
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).
The compression density remains the same with this change, with little faster
compression speed.
Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.
This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.
Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
Compared to previous mode it gives another 10-30% improvement in compression keeping comparable PSNR on corresponding quality settings.
Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.
Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
Evaluate if for Palette images (num_colors <= 256), non-palette
compression path (Subtract green, predictor transform etc) yield an
optimal compression density.
This change reduces the WebP file (for palette images) size by 0.4% with
drop of 3-5% in compression speed.
Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
We compact the palette by weighted distance, favoring the green channel.
Average gain on paletted file is ~0.5%, with gain up to 6-7% on some favorable cases.
Encoding speed is unaffected.
Disabled for alpha (or any single-channel input)
Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.
Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)
Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)
Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.
This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)
Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.
Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
We use automatic int->uint64_t promotion where applicable.
(uint64_t should be kept only for overflow checking and memory alloc).
Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
* merged the two HistogramAdd/AddEval() into a single call
(with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster
Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.
Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.
Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.
Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.
Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.
The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.
Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).
Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.
This speeds up lossless encoding by 1-2% and almost no impact on compression
density.
Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
log(1 + x) and increase the threshold for calling the approximate
version of log_2 (compared to costly call to log()).
Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.
Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
Tuned the cross_color transform parameter (step) for lower quality
levels. This change gives speedup of 20% at lower qualities (25) and 10% at
moderate quality level (50) with a loss of 0.25% in compression density.
Also removed TODO for cross_color transform. Observed good correlation of
this with the predict transform.
Change-Id: I8a1044e9f24e6a5f84295c030fd444d0eec7d154