Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove
one if condition inside the main/critial loop. Also separated TransformColor &
TransformColorInverse into separate functions and avoid one 'if condition'
inside this critical method.
This change speeds up lossless decoding for Lenna image about 5% and 1000 image
corpus by 3-4%.
Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d
* allow reading from stdin for dwebp / vwebp
* allow writing to stdout for gif2webp
by introducing a new function ExUtilReadFromStdin()
Example use: cat in.webp | dwebp -o - -- - > out.png
Note that the '-- -' option must appear *last*
(as per general fashion for '--' option parsing)
Change-Id: I8df0f3a246cc325925d6b6f668ba060f7dd81d68
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
cwebp -z 5 in.png -o out_lossless.webp
There are 10 possible values for -z parameter:
0 (fastest, lowest compression)
to 9 (slowest, best compression)
A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.
This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.
Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)
Output is bitwise the same as before.
Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.
This speeds up lossless encoding by 1-2% and almost no impact on compression
density.
Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables
Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
incorporate non-last cost in per-level cost table
also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.
Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
log(1 + x) and increase the threshold for calling the approximate
version of log_2 (compared to costly call to log()).
Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
converts 2 s16 vectors to 2 u8 and store to uint8_t destination;
TransformAC3 can reuse this after a rework
Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0
Separate the C version from the MIPS32 version and have run-time
initialization during RescalerInit()
Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c