Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.
Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.
MIPS32 code has been disabled for now, pending a code update.
Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.
Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s
Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s
No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.
Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
the problem was the incorporation of the extra constant 1<<16 in the kC1
constant, to emulate the addition. It's now removed and the addition is
performed explicitly.
No real speed difference observed.
cf. issue #278
Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d
* changes:
Makefile.vc: enable WEBP_USE_THREAD for windows phone
thread: use CreateThread for windows phone
thread: use WaitForSingleObjectEx if available
thread: use InitializeCriticalSectionEx if available
thread: use native windows cond var if available
if FANCY_UPSAMPLING was not defined but io->fancy_upsampling was set,
then the call to WebPInitSamplers() was skipped -> boom.
Change-Id: Id63e2ecc09f532fbe2ec9936d9ce4b502ba8fac5
Rename the flag to exact instead of the opposite cleanup_alpha. Add the flag to
WebPConfig. Do the cleanup in the webp encoder library rather than the cwebp
binary, this will be needed for the next stage: smarter alpha cleanup for
better compression which cannot be done as a preprocessing due to depending on
predictor choices in the encoder.
Change-Id: I2fbf57f918a35f2da6186ef0b5d85e5fd0020eef
the original change triggered several internal API modifs.
This is to ensure that we're never computing pointer that can
possibly wrap around, or differences between pointers that can
overflow.
no observed speed difference
Change-Id: I9c94dda38d94fecc010305e4ad12f13b8fda5380
We now consider 3 special cases:
* htree-group has only 1 code (no bit is read from bitstream)
* htree-group has few enough literal symbols, so that all the bit
codes can fit into a look-up table of less than 64 entries
* htree-group has a trivial arb literal (not GREEN!), like before
No overall speed change.
Change-Id: I6077fa0b7e5c31a6c67aa8aca859c22cc50ee254
We now get error string instead of printing it.
The verbose option is now only used to print info and warnings.
Change-Id: I985c5acd427a9d1973068e7b7a8af5dd0d6d2585
+ jenkins fixes for native config (library order)
+ add a missing -lm
+ replace log10 by log, just in case
+ partially reverted configure.ac to remove the C++ part
Change-Id: Iee099c544451b23c6cfaca53d5a95d2d332e066e
It was needed earlier for WebPAnimEncoder API when it was using structs
like WebPConfig, but it only uses pointers to those now.
Change-Id: Ic0c144966421c678e8ef54b3fa81574bb2c9cd08
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)
Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV
Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.
ConvertRowToY() should go into dsp/ too, at some point.
Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425