Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.
Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.
Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.
Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
HuffmanCost and HuffmanCostCombined optimized and added
'const' to some variables from ExtraCost functions.
Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d
When remapping buffer, br->eos_ was wrongly being set to true for
certain
images.
Also, refactored the end-of-stream detection as a function.
Reported in http://crbug.com/364830
Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84
Some versions of compiler in debug build can't find
a register in class 'GR_REGS' while reloading 'asm'
Number of used registers is decreased in this fix.
Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)
There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.
Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
.set at - Indicates that macro expansions may clobber
the assembler temporary ($at or $28) register.
Some macros may not be expanded without this
and will generate an error message if noat
is in effect.
"at" also added to the clobber list.
Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d
avoids:
src/dsp/enc_mips32.c: In function 'ITransformOne':
src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm'
src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints
Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()
(also: fixed some #ifdef USE_INTRINSICS problems)
Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow
also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)
Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)
Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual
Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.
Change-Id: I69534016186064fd92476d5eabc0f53462d53146
* inverse transform is actually slower with intrinsics + gcc-4.6,
so is left disabled for now.
With gcc-4.8, it's a bit faster than inlined assembly.
* Sum of Square error function provide a 2-3% speed up
There's enabled by default (since there's no inlined-asm equivalent)
Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a
+ misc cosmetics
* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)
Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095