The code and logic is unified when computing bit entropy + Huffman cost.
Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.
Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
setting all transparent pixels to black rather than the "flatten" method.
0.3% smaller filesize on the 1000 PNGs if alpha cleanup is used (before: 18685774, after: 18622472)
Change-Id: Ib0db9e7ccde55b36e82de07855f2dbb630fe62b1
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
This implementation brings:
- an SSE implementation of packing / unpacking
- bigger buffers processed at the same time
The speedup is of 4% on lossy decoding (YUV to RGB), 0.5% on
lossy encoding (RGB to YUV was already optimized).
Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f
* changes:
demux: remove GetFragment()
demux: remove dead fragment related TODO
demux, Frame: remove is_fragment_ field
demux,WebPIterator: remove fragment_num/num_fragments
demux: remove WebPDemuxSelectFragment
this hasn't been set since parsing of the experimental chunk was
removed.
+ cleanup IsValidExtendedFormat(). is_fragmented has caused immediate
failure since:
4e2589f demux: restore strict fragment flag check
Change-Id: If9ecfc19556297100a6d5de1ba2cffdcbdc6c8fd
INFO: From Compiling src/dsp/cpu.c:
src/dsp/cpu.c: In function 'x86CPUInfo':
src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm'
With PIC and mcmodel=medium, the %rbx register must be saved and
restored which causes this problem. This was also solved in GCC-4.9 with
this patch:
https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html
Tested:
Builds fine with this change.
Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.
Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
The previous priority system used a heap which was too heavy to
maintain (what was gained from insertions / deletions was lost
due to a linear that still happened on the heap for invalidation).
The new structure is a priority queue where only the head is
ordered.
Change-Id: Id13f8694885a934fe2b2f115f8f84ada061b9016
SimpleQuantize()
it's now a single function, that reconstructs the intra4x4 block during the scan
The I4_PENALTY had to be adjusted.
Overall, result is better quality-wise (esp. at q < 50), and a tad faster too.
method #0, #1 and #3+ are unchanged
Change-Id: If262aeb552397860b3dd532df8df6b1357779222
* Precision is slightly different
* also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc.
* removing yuv_tables_sse2.h saved ~8k of binary size
* the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away
* the NEON code is somewhat tricky
Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992
Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.
Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
The 32-bit buffers are actually rarely 64-bit aligned.
The new solution uses memcmp and is alignment agnostic.
It is also slightly faster.
Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20
for more speed.
This gives a roughly a 1% speedup for low_effort. But actually this is a
preparation for the upcoming CL that changes RGB values of transparent pixels
based on prediction, which should not be done for low_effort because that would
slightly hurt its performance.
On 1000 PNGs, with quality 0, method 0:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s
Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d
Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.
Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.
MIPS32 code has been disabled for now, pending a code update.
Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.
Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s
Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s
No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.
Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
the problem was the incorporation of the extra constant 1<<16 in the kC1
constant, to emulate the addition. It's now removed and the addition is
performed explicitly.
No real speed difference observed.
cf. issue #278
Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d
* changes:
Makefile.vc: enable WEBP_USE_THREAD for windows phone
thread: use CreateThread for windows phone
thread: use WaitForSingleObjectEx if available
thread: use InitializeCriticalSectionEx if available
thread: use native windows cond var if available
if FANCY_UPSAMPLING was not defined but io->fancy_upsampling was set,
then the call to WebPInitSamplers() was skipped -> boom.
Change-Id: Id63e2ecc09f532fbe2ec9936d9ce4b502ba8fac5
Rename the flag to exact instead of the opposite cleanup_alpha. Add the flag to
WebPConfig. Do the cleanup in the webp encoder library rather than the cwebp
binary, this will be needed for the next stage: smarter alpha cleanup for
better compression which cannot be done as a preprocessing due to depending on
predictor choices in the encoder.
Change-Id: I2fbf57f918a35f2da6186ef0b5d85e5fd0020eef
the original change triggered several internal API modifs.
This is to ensure that we're never computing pointer that can
possibly wrap around, or differences between pointers that can
overflow.
no observed speed difference
Change-Id: I9c94dda38d94fecc010305e4ad12f13b8fda5380
We now consider 3 special cases:
* htree-group has only 1 code (no bit is read from bitstream)
* htree-group has few enough literal symbols, so that all the bit
codes can fit into a look-up table of less than 64 entries
* htree-group has a trivial arb literal (not GREEN!), like before
No overall speed change.
Change-Id: I6077fa0b7e5c31a6c67aa8aca859c22cc50ee254
We now get error string instead of printing it.
The verbose option is now only used to print info and warnings.
Change-Id: I985c5acd427a9d1973068e7b7a8af5dd0d6d2585
It was needed earlier for WebPAnimEncoder API when it was using structs
like WebPConfig, but it only uses pointers to those now.
Change-Id: Ic0c144966421c678e8ef54b3fa81574bb2c9cd08
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)
Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV
Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.
ConvertRowToY() should go into dsp/ too, at some point.
Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
also switch to using ExtractAlpha() instead of hard-coding the loop.
The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.
Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
use 'u' rather than the unnecessary 'l' as a suffix. this prevents a
conversion warning with some toolchains
Change-Id: I21c33ce08819b3c839c75e03a8f7f3a6041d0695
Note that ALIGN_CST is still kept different in dec/frame.c for now,
because the values is 31 there, not 15. We might re-unite these two
later.
Change-Id: Ibbee607fac4eef02f175b56f0bb0ba359fda3b87
same functionality, but better code layout.
What changed:
* don't trash the palette_[] in EncodePalette(), so it can be re-used
* split generation of image from bit-stream coding
* move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette()
* minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c
* clarify the TransformBuffer stuff. more clean-up to come here...
This should make experimenting with delta-palettization easier and more compartimentalized.
Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610
some limitations: only for RGBA output,
and if reduction factor is not too small (dst_width > src_width / 128)
20-25% faster, ~4-6% global improvement total decoding.
Change-Id: I95366ddaa4a38e0a96bed754dfe790126f7bb84a
It's better to stay with a 32b fixed-point precision overall, otherwise
the C-version on ARM gets *slower*.
Actually, gcc ARM compiler optimizes some instructions pretty
well when WEBP_RESCALER_FIX is exactly 32, even in C.
Change-Id: I0eea97f7db5947470f5af355dee098eca81e178d
New palette compresses more than 20% better with minimum quality loss.
Tested on set of wikipedia images with command line:
cwebp -delta_palettization
Change-Id: I82ec7d513136599cd70386f607f634502eb9095d
Earlier, we stored a 1x1 frame for such frames. Now, we drop every such
frame and increase the duration of its previous frame instead.
Also, modify the anim_diff tool to handle animated images that are
equivalent, but have different number of frames.
Change-Id: I2688b1771e1f5f9f6a78e48ec81b01c3cd495403
The rounding and arithmetic is not the same as previously, to prevent overflow cases for large upscale factors.
We still rely on 32b x 32b -> 64b multiplies. Raised the fixed-point precision to 32b
so that we have some nice shifts from epi64 to epi32.
Changed rescaler_t type to 'uint32_t' in order to squeeze in all the precision required.
The MIPS code has been disabled because it's now out-of-sync. Will be fixed in
a subsequent CL when the dust settles.
~30-35% faster
Change-Id: I32e4ddc00933f1b1aa3463403086199fd5dad07b
* vertical expansion now uses bilinear interpolation
* heavily assumes that the alpha plane is decoded in full, not row-by-row
* split the RescalerExportRow and RescalerImportRow methods into Shrink
and Expand variants.
* MIPS implementation of ExportRowExpand is missing.
There's room for extra speed optim and code re-org, but let's keep that for later patches.
addresses https://code.google.com/p/webp/issues/detail?id=254
Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
This is designed for the simple use-case where one wants to decode all
frames one-by-one in order.
Also, use this API in anim_util library, which is in turn used by
anim_diff tool.
Change-Id: Ie8b653c04e867d40fd23321b3dd41b87689656c7
This makes the chains more efficient and a larger variety of data is tested.
0.02 % compression gain at q 100, 0.05 % at default quality. 0.8 % speedup by
callgrind.
0.16 % compression gain for lossy alpha ?!
Change-Id: I888120133352799eb14f5f602c7f40ab404bd665