We were not updating the current_width_, which is usually
not a problem, unless we use Delta Palette with small number
of colors
-> Addressed this re-entrancy problem by checking we have
enough capacity for transform buffer.
The problem is not currently visible, until we restrict
the number of gradient used in delta-palette to less than 16.
Then the buffers have different current_width_ and the problem
surfaces.
Change-Id: Icd84b919905d7789014bb6668bfb6813c93fb36e
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992
Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.
Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.
Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s
Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s
No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.
Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
same functionality, but better code layout.
What changed:
* don't trash the palette_[] in EncodePalette(), so it can be re-used
* split generation of image from bit-stream coding
* move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette()
* minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c
* clarify the TransformBuffer stuff. more clean-up to come here...
This should make experimenting with delta-palettization easier and more compartimentalized.
Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610
Slightly faster on -m 0 -q 0, particularly for small images (50 x 75
image was 0.1 % faster on callgrind measurement).
Increases compression density by 0.005 % for the 1000 images, but small
images can improve even 0.5 % (about 4 bytes, depending on the
characteristics of the palette).
Change-Id: I94f568d396ac62a054a829abeeef3eb0af6b3f94
Increases compression density by 0.03 % for lossy.
Speeds up at least one of the lossy alpha images by 20 %.
Palette entropy 'kludge' seems to save 1-2 % on alpha images.
Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
share the computation between different modes
3-5 % speedup for lossless alpha
1 % for lossy alpha
no change in compression density
Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
introduced in:
"lossless: 0.37 % compression density improvement"
Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.
Improves compression density by 0.02 % or so.
Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
No significant size degradation (+0.001 %) for 1000 image corpus
Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"
Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green
and choose the strategy with the smallest expected entropy
Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
Speedup method StoreImageToBitMask by replacing the code to find histogram
index and Huffman tree codes at every iteration to a more optimal code that
updates these only when the current pixel (to write) crosses the histogram
tile-row boundary.
This change speeds up the StoreImageToBitMask method by 5%.
Change-Id: If01a1ccd7820f9a3a3e5bc449d070defa51be14b
Updated the near-lossless level mapping and make it correlated to lossy
quality i.e 100 => minimum loss (in-fact no-loss) and the visual-quality loss
increases with decrease in near-lossless level (quality) till value 0.
The new mapping implies following (PSNR) loss-metric:
-near_lossless 100: No-loss (bit-stream same as -lossless).
-near_lossless 80: Very very high PSNR (around 54dB).
-near_lossless 60: Very high PSNR (around 48dB).
-near_lossless 40: High PSNR (around 42dB).
-near_lossless 20: Moderate PSNR (around 36dB).
-near_lossless 0: Low PSNR (around 30dB).
Change-Id: I930de4b18950faf2868c97d42e9e49ba0b642960
AnalyzeSubtractGreen constitutes about 8-10% of the comression CPU cycles.
Statistically, subtract-green is proved to be useful for most of the
non-palette compression. So instead of evaluating the entropy (by calling
AnalyzeSubtractGreen) apply subtract-green transform for the low-effort
compression.
This changes speeds up the compression at m=0 by 8-10% (with very slight loss
of 0.07% in the compression density).
Change-Id: I9797dc39437ae089716acb14631bbc77d367acf4
Speed up AnalyzeSubtractGreen by looping through the image pixel once to
compute the two histograms.
AnalyzeEntropy code cleanup.
Removed some 'if' conditions and pointer indirections inside pixel iterate loop.
Change-Id: Ia65e3033988ff67df8e3ecce19d6e34cfc76358e
check enc->argb_ to quiet an msvs /analyze warning:
C6387: 'enc->argb_+y*width' could be '0': this does not adhere to the
specification for the function 'memcpy'.
Change-Id: I87544e92ee0d3ea38942a475c30c6d552f9877b7
- Lower the threshold parameters for HashChainFindCopy.
For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at
lower quality range (0.25% drop in compression density) and about 10%
for higher quality range without any drop in the compression density.
Following is the compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8615/18.000 2.8651/18.631
q=5 2.8615/18.216 2.8650/20.517
q=10 2.8572/18.070 2.8650/21.992
q=15 2.8519/18.371 2.8584/21.747
q=20 2.8454/18.975 2.8515/20.448
q=25 2.8230/8.531 2.8253/9.585
// Compression density remains same for q-range [30-100]
q=30 2.7310/7.706 2.7310/8.028
q=35 2.7253/6.855 2.7253/7.184
q=40 2.7231/6.364 2.7231/6.604
q=45 2.7216/5.844 2.7216/6.223
q=50 2.7196/5.210 2.7196/5.731
q=55 2.7208/4.766 2.7208/4.970
q=60 2.7195/4.495 2.7195/4.602
q=65 2.7185/4.024 2.7185/4.236
q=70 2.7174/3.699 2.7174/3.861
q=75 2.7164/3.449 2.7164/3.605
q=80 2.7161/3.222 2.7161/3.038
q=85 2.7153/2.919 2.7153/2.946
q=90 2.7145/2.766 2.7145/2.771
q=95 2.7124/2.548 2.7124/2.575
q=100 2.6873/2.253 2.6873/2.335
Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1
- Do light weight entropy based histogram combine and leave out CPU
intensive stochastic and greedy heuristics for combining the
histograms.
For 1000 image PNG corpus (m=0), this change yields speedup of 10% at
lower quality range (1% drop in compression density) and about 5% for
higher quality range (1% drop in compression density). Following is the
compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8336/16.577 2.8615/18.000
q=5 2.8336/16.504 2.8615/18.216
q=10 2.8293/16.419 2.8572/18.070
q=15 2.8242/17.582 2.8519/18.371
q=20 2.8182/16.131 2.8454/18.975
q=25 2.7924/7.670 2.8230/8.531
q=30 2.7078/6.635 2.7310/7.706
q=35 2.7028/6.203 2.7253/6.855
q=40 2.7005/6.198 2.7231/6.364
q=45 2.6989/5.570 2.7216/5.844
q=50 2.6970/5.087 2.7196/5.210
q=55 2.6963/4.589 2.7208/4.766
q=60 2.6949/4.292 2.7195/4.495
q=65 2.6940/3.970 2.7185/4.024
q=70 2.6929/3.698 2.7174/3.699
q=75 2.6919/3.427 2.7164/3.449
q=80 2.6918/3.106 2.7161/3.222
q=85 2.6909/2.856 2.7153/2.919
q=90 2.6902/2.695 2.7145/2.766
q=95 2.6881/2.499 2.7124/2.548
q=100 2.6873/2.253 2.6873/2.285
Change-Id: I0567945068f8dc7888041e93d872f9def91f50ba
most of the time, we don't need to actually move the
data.
Compression is randomly slightly different, because HistogramCompactBins() changed.
Timing is about the same.
Change-Id: Ia6af8e9780581014d6860f2b546189ac817cfad1
Remove handling for WEBP_HINT_GRAPH w.r.t use_palette flag.
The WEBP_HINT_GRAPH is now used at one place, to set the initial size of the
Bit Writer as bpp for photo images are generally larger than the graphical
images.
Change-Id: I1b9c4436c85a8f69da74c0dbcd292397323f2696
- The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'.
- The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache
bits to use (passing 0 implies disabling the local color cache).
- The local color cache is disabled for lowerf (<= 25) quality levels (as before).
- Enabled local color cache for palette images as well. This saves additional
0.017% bytes with a slight (2-3%) improvement in the compression speed.
- Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as
this option is not an option now (after we freeze the lossless bit-stream).
Change-Id: I33430401e465474fa1be899f330387cd2b466280
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).
The compression density remains the same with this change, with little faster
compression speed.
Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.
This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.
Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
Compared to previous mode it gives another 10-30% improvement in compression keeping comparable PSNR on corresponding quality settings.
Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.
Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
Evaluate if for Palette images (num_colors <= 256), non-palette
compression path (Subtract green, predictor transform etc) yield an
optimal compression density.
This change reduces the WebP file (for palette images) size by 0.4% with
drop of 3-5% in compression speed.
Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
We compact the palette by weighted distance, favoring the green channel.
Average gain on paletted file is ~0.5%, with gain up to 6-7% on some favorable cases.
Encoding speed is unaffected.
Disabled for alpha (or any single-channel input)
Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.
Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)
Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)
Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.
This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)
Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.
Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
We use automatic int->uint64_t promotion where applicable.
(uint64_t should be kept only for overflow checking and memory alloc).
Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
* merged the two HistogramAdd/AddEval() into a single call
(with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster
Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.
Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.
Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.
Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095