larger values are still dealt with in the .cc
~5% faster encoding
The output size differs slightly (by a variable amount) because the
floating-point calculations are performed in a different order.
Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4
* add SSE2 variant for lossless
* speed-up TransformColor calls using specialized TransformColorBlue/Red
* Fuse the Shannon entropy calls to compute it for X and X+Y simultaneously.
This last change alters the output size slightly.
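The fusion is possible because entropy can be written in terms of a single sum over the histogram bins. Below is the textbook identity (an assumption about what the fused code evaluates, not a transcription of it), with counts $x_i$ for X and $z_i = x_i + y_i$ for X+Y; both sums can be accumulated in the same loop over $i$:

```latex
H(X) = -\sum_i \frac{x_i}{S_X}\log_2\frac{x_i}{S_X}
     = \log_2 S_X - \frac{1}{S_X}\sum_i x_i \log_2 x_i,
\qquad S_X = \sum_i x_i
\\[4pt]
H(X+Y) = \log_2 S_Z - \frac{1}{S_Z}\sum_i z_i \log_2 z_i,
\qquad z_i = x_i + y_i,\quad S_Z = \sum_i z_i
```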
Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89
This flag makes the code use no uint64, no asm and no fancy
tricks, but instead aims at being as simple and straightforward as
possible.
Its main use is to help Emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to 24 and make use of WEBP_RIGHT_JUSTIFY.
Here are typical timings for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
Fix the lossless decoder for the case where it has to apply other
inverse transforms before applying the color-indexing inverse transform.
The main idea is to make ColorIndexingInverse virtually in-place: we
use the fact that the argb_cache is allocated to accommodate all
*unpacked* pixels of a macro-row, not just the *packed* pixels.
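The in-place idea can be sketched as follows. This is a minimal illustration, not libwebp's code: the function name ColorIndexInverseInPlace is hypothetical, and it assumes the VP8L layout in which each packed pixel carries several palette indices in its green channel, lowest bits first, with bits_per_index one of 1, 2, 4 or 8. Because the buffer is sized for the unpacked row, walking from the last pixel backwards only overwrites packed entries that have already been consumed.

```c
#include <stdint.h>

/* Hypothetical sketch: expands packed palette indices into ARGB colors in
 * place. On entry, the first ceil(width / per) entries of argb hold the
 * packed pixels; the buffer itself has room for all `width` unpacked pixels. */
static void ColorIndexInverseInPlace(uint32_t* argb, int width,
                                     int bits_per_index,
                                     const uint32_t* color_map) {
  const int per = 8 / bits_per_index;               /* indices per packed pixel */
  const uint32_t mask = (1u << bits_per_index) - 1;
  /* Iterate backwards: packed entry k is only read for pixels x >= k, so it
   * has always been consumed before argb[k] is overwritten. */
  for (int x = width - 1; x >= 0; --x) {
    const uint32_t green = (argb[x / per] >> 8) & 0xff;  /* read first ... */
    const int shift = (x % per) * bits_per_index;
    argb[x] = color_map[(green >> shift) & mask];        /* ... then write */
  }
}
```

With bits_per_index = 2, one packed pixel holds four indices, so a 5-pixel row arrives as two packed entries and is expanded into all five slots of the same buffer.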
Change-Id: I27f11f3043f863dfd753cc2580bc5b36376800c4
The order-by-cost is mostly unchanged (up to a scaling constant of 1/log(2)),
except for a few minor differences in < 2% of cases.
+ remove unused field cost_mode->cache_bits_
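The 1/log(2) factor is a change of logarithm base. Assuming (from the stated constant) that the old costs used the natural logarithm and the new ones use log2, every cost is multiplied by the same positive constant, which cannot change their relative order:

```latex
\log_2 x = \frac{\ln x}{\ln 2}
\;\Longrightarrow\;
c_{\text{new}} = \frac{1}{\ln 2}\, c_{\text{old}},
\qquad
c_{\text{old}}(a) < c_{\text{old}}(b) \iff c_{\text{new}}(a) < c_{\text{new}}(b)
```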
Change-Id: I714f8ab12f49a23f5d499a64c741382c9b489a3e
The new modes are
MODE_rgbA
MODE_bgrA
MODE_Argb
MODE_rgbA_4444
It's binary incompatible, since the enums changed.
While at it, I removed the now-unneeded KeepAlpha methods.
-> Saved ~12k of code!
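In libwebp, the lowercase-letter variants (MODE_rgbA, MODE_bgrA, MODE_Argb, MODE_rgbA_4444) denote premultiplied-alpha output: each color channel is scaled by alpha while alpha itself is preserved. A minimal sketch of that operation on 8-bit RGBA data follows; the function name PremultiplyRGBA and the rounded division by 255 are illustrative assumptions, not libwebp's actual implementation.

```c
#include <stdint.h>

/* Hypothetical sketch: premultiplies R, G and B by alpha in an RGBA buffer.
 * Alpha (p[3]) is left unchanged. */
static void PremultiplyRGBA(uint8_t* rgba, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    uint8_t* const p = rgba + 4 * i;
    const uint32_t a = p[3];
    for (int c = 0; c < 3; ++c) {
      /* Rounded p[c] * a / 255. */
      p[c] = (uint8_t)((p[c] * a + 127) / 255);
    }
  }
}
```

Fully opaque pixels (alpha 255) pass through unchanged, and fully transparent ones (alpha 0) become all-zero.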
* made explicit mention that alpha_plane is persistent,
so we have access to the full alpha plane data at all times.
Incremental decoding of alpha was planned for, but not
implemented. So it's better not to drag this constraint along for now,
and to keep the code simpler until we revisit it.
Change-Id: Idaba281a6ca819965ca062d1c23329f36d90c7ff
* lossless_encoder: (46 commits)
split StoreHuffmanCode() into smaller functions
more consolidation: introduce VP8LHistogramSet
big code clean-up and refactoring and optimization
Some cosmetics in histogram.c
Approximate FastLog between value range [256, 8192]
Forgot to update out_bit_costs to symbol_bit_costs at one instance.
Evaluate output cluster's bit_costs once in HistogramRefine.
Simple Huffman code changes.
Lossless decoder: remove an unneeded param in ReadHuffmanCodeLengths().
Reducing emerging palette size from 11 to 9 bits.
Move GetHistImageSymbols to histogram.c
Improve predict vs no-predict heuristic.
code-moving and clean-up
reduce memory usage by allocating only one histo
Restrict histo_bits to ensure histo_image size is under 32MB
further simplification for the meta-Huffman coding
A quick pass of cleanup in backward reference code
Make transform bits a function of encode method (-m).
introduce -lossless option, protected by USE_LOSSLESS_ENCODER
Run TraceBackwards for higher qualities.
...
Conflicts:
src/enc/webpenc.c
Change-Id: I9a5d98cba0889ea91d10699466939cc283da345a
Profiled data: profiled a few images and found that in the function VP8LFastLog,
a table lookup is performed 90% of the time, while for the remaining 10% a call
to the log function is made. A typical lookup costs about 10 CPU instructions and
a call to log about 200. The weighted average comes out to about 30
instructions per call. For mid qualities (25-75), this function (VP8LFastLog)
accounts for 30-50% of total CPU cycles (via the call path VP8LColorSpaceTransform
-> PredictionCostCrossColor -> ShannonEntropy). After this change, log is
called less than 1% of the time, with an average of 15 instructions per call.
Measured the performance over 1000 files at various qualities and found an
overall compression speedup of 10-15% (in the quality range [0, 75]). The
compression density loss is around 0.5% (though at some qualities, compression
is slightly better).
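The weighted-average figure follows directly from the profile numbers above:

```latex
0.9 \times 10 + 0.1 \times 200 = 9 + 20 = 29 \approx 30 \ \text{instructions per call}
```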
Change-Id: I247bc6a8d4351819c871f19d65455dc23aea8650
make the elements of the "Multiplier" struct unsigned, so that any negative values
are automatically converted to "mod 256" values.
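A minimal sketch of the effect. The field names mirror the m.green_to_red_, m.green_to_blue_ and m.red_to_blue_ members that appear in the compiler warnings further down this log; the helpers MakeMultiplier and ToSigned are hypothetical. Converting a negative int to an unsigned 8-bit field is well defined in C and yields the value mod 256.

```c
#include <stdint.h>

/* Sketch of a Multiplier with unsigned 8-bit fields. */
typedef struct {
  uint8_t green_to_red_;
  uint8_t green_to_blue_;
  uint8_t red_to_blue_;
} Multiplier;

/* Hypothetical helper: assigning an int to a uint8_t reduces it mod 256,
 * so e.g. -3 is stored as 253. */
static Multiplier MakeMultiplier(int g2r, int g2b, int r2b) {
  Multiplier m;
  m.green_to_red_ = (uint8_t)g2r;
  m.green_to_blue_ = (uint8_t)g2b;
  m.red_to_blue_ = (uint8_t)r2b;
  return m;
}

/* Portable way to read a stored field back as a signed value in [-128, 127]. */
static int ToSigned(uint8_t v) {
  return (v >= 128) ? (int)v - 256 : (int)v;
}
```

ToSigned avoids casting to int8_t, whose result for out-of-range values is implementation-defined before C23.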
Change-Id: Iab4f9bacc50dcd94a557944727d9338dbb0982f7
so that it uses the original values of left, top, etc. for prediction rather than
their predicted values. Also, do some renaming there to make the code
more readable.
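The point of the fix can be illustrated with a single-row left predictor. In this hypothetical sketch (LeftPredictRow and SubPixels are illustrative names, and storing the first pixel unpredicted is a simplification), residuals are computed from the original neighbors in src and written to a separate buffer, so an already-computed residual is never reused as a neighbor. Subtraction is per 8-bit channel, modulo 256.

```c
#include <stdint.h>

/* Per-channel (A, R, G, B) subtraction modulo 256. */
static uint32_t SubPixels(uint32_t a, uint32_t b) {
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    const uint32_t ca = (a >> shift) & 0xff;
    const uint32_t cb = (b >> shift) & 0xff;
    out |= ((ca - cb) & 0xff) << shift;  /* unsigned wrap, then mod 256 */
  }
  return out;
}

/* Hypothetical left predictor: neighbors always come from the ORIGINAL
 * pixels in `src`, never from residuals already written. */
static void LeftPredictRow(const uint32_t* src, int width,
                           uint32_t* residuals) {
  residuals[0] = src[0];  /* simplification: first pixel stored as-is */
  for (int x = 1; x < width; ++x) {
    residuals[x] = SubPixels(src[x], src[x - 1]);
  }
}
```

Writing the residuals back into src inside the loop would make src[x - 1] a residual rather than an original pixel, which is the kind of mix-up the change above avoids.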
Change-Id: I2fe94e35a6700bd437f5c601e2af12323bf32445
- VP8LEncAnalyze, EvalAndApplySubtractGreen, ApplyPredictFilter,
ApplyCrossColorFilter
- Added palette handling and transform buffer management in VP8LEncodeImage()
- Add Transforms (subtract Green, Predict, cross_color) to dsp/lossless.c.
These are more-or-less copied from src/lossless code.
After this change, the EncodeImageInternal() method will be implemented.
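The subtract-green transform mentioned above is simple enough to show in full. This sketch follows the transform as defined in the WebP lossless format (green is subtracted from red and blue, modulo 256; alpha and green are unchanged); the function names are hypothetical and the code is not the one in dsp/lossless.c.

```c
#include <stdint.h>

/* Forward transform: R -= G, B -= G (mod 256); A and G untouched. */
static void SubtractGreenSketch(uint32_t* argb, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t g = (argb[i] >> 8) & 0xff;
    const uint32_t r = (((argb[i] >> 16) & 0xff) - g) & 0xff;
    const uint32_t b = ((argb[i] & 0xff) - g) & 0xff;
    argb[i] = (argb[i] & 0xff00ff00u) | (r << 16) | b;
  }
}

/* Inverse transform (decoder side): R += G, B += G (mod 256). */
static void AddGreenSketch(uint32_t* argb, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t g = (argb[i] >> 8) & 0xff;
    const uint32_t r = (((argb[i] >> 16) & 0xff) + g) & 0xff;
    const uint32_t b = ((argb[i] & 0xff) + g) & 0xff;
    argb[i] = (argb[i] & 0xff00ff00u) | (r << 16) | b;
  }
}
```

Because red and blue are often correlated with green, the residuals tend to cluster near zero, which is what makes the subsequent entropy coding cheaper.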
Change-Id: Idf71f803c24b3b5ae3b5079b15e019721784611d
src/dsp/lossless.c: In function 'VP8LInverseTransform':
src/dsp/lossless.c:312:23: warning: 'packed_pixels' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:304:16: note: 'packed_pixels' was declared here
src/dsp/lossless.c:258:34: warning: 'm.red_to_blue_' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:275:17: note: 'm.red_to_blue_' was declared here
src/dsp/lossless.c:257:34: warning: 'm.green_to_blue_' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:275:17: note: 'm.green_to_blue_' was declared here
src/dsp/lossless.c:255:33: warning: 'm.green_to_red_' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:275:17: note: 'm.green_to_red_' was declared here
patch by pepijn vaneeckhoudt
Change-Id: Iffa4764487a75479df45e772169325cd9ee60d94
Pulled from the current HEAD (218c32e).
The history of this and related files is a bit entangled, so rather than
trying to split the changes and introducing some noise in master's history,
we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: I40538799dbf999abb9408ac83f55b897d8e22498