Commit Graph

582 Commits

Author SHA1 Message Date
skal
ca3d746e39 use block-based allocation for backward refs storage, and free-lists
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.

This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)

Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
2014-05-05 11:11:55 -07:00
skal
d3bcf72bf5 Don't allocate VP8LHashChain, but treat like automatic object
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.

Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
2014-04-30 14:10:48 -07:00
Pascal Massimino
cf5eb8ad19 remove some uint64_t casts and use.
We use automatic int->uint64_t promotion where applicable.

(uint64_t should be kept only for overflow checking and memory alloc).

Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
2014-04-29 09:08:25 -07:00
Pascal Massimino
b3a616b356 make HistogramAdd() a pointer in dsp
* merged the two HistogramAdd/AddEval() into a single call
  (with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
  of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster

Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
2014-04-28 10:09:34 -07:00
Pascal Massimino
4825b4360d fix warning about size_t -> int conversion
+ re-order and add some const

Change-Id: I3746520b75699e56e20835d10d1dd9cd9fd6d85d
2014-04-27 00:50:07 -07:00
Vikas Arora
0b896101b4 Reduce memory footprint for encoding WebP lossless.
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.

Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.

Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.

Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
2014-04-26 01:14:33 -07:00
skal
75b12006e3 Move the HuffmanCost() function to dsp lib
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)

There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.

Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
2014-04-18 11:59:48 -07:00
skal
a9fc697cb6 Merge "WIP: extract the float-calculation of HuffmanCost from loop" 2014-04-15 11:33:11 -07:00
Djordje Pesut
4ae0533f39 MIPS: MIPS32r1: Added optimizations for ExtraCost functions.
ExtraCost and ExtraCostCombined

Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2
2014-04-15 15:37:06 +02:00
skal
b30a04cf11 WIP: extract the float-calculation of HuffmanCost from loop
new function: VP8FinalHuffmanCost()

Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9
2014-04-15 14:52:52 +02:00
Slobodan Prijic
2b1b4d5ae9 MIPS: MIPS32r1: Add optimization for GetResidualCost
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual

Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
2014-04-08 15:28:49 +02:00
skal
869eaf6c60 ~30% encoding speedup: use NEON for QuantizeBlock()
also revamped the signature to avoid having to pass the 'first' parameter

Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5
2014-04-08 03:08:22 -07:00
Vikas Arora
bc374ff39e Use histogram_bits to initalize transform_bits.
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.

Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
2014-04-02 11:46:40 -07:00
Vikas Arora
6af6b8e1b6 Tune HistogramCombineBin for large images.
Tune HistogramCombineBin for hard images that are larger than 1-2 Mega
pixel and represent photographic images.

This speeds up lossless encoding on 1000 image corpus by 10-12% and compression
penalty of 0.1-0.2%.

Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81
2014-03-28 07:09:59 -07:00
skal
af93bdd6bc use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.

The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.

Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).

Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
2014-03-27 15:50:59 -07:00
James Zern
fbed36433d Merge "dsp: reuse wht transform from dec in encoder" 2014-03-26 15:13:07 -07:00
skal
d1b33ad58b 2-5% faster trellis with clang/MacOS
(and ~2-3% on ARM)

We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.

Also made the 'Node' structure tighter.

Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
2014-03-26 22:33:01 +01:00
James Zern
df230f2723 dsp: reuse wht transform from dec in encoder
Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717
2014-03-22 13:25:08 -07:00
Pascal Massimino
59daf08362 Merge "cosmetics:" 2014-03-18 04:02:33 -07:00
Pascal Massimino
536220084c cosmetics:
- use VP8ScanUV, separate from VP8Scan[] (for luma)
 - fix indentation
 - few missing consts
 - change TrellisQuantizeBlock() signature

Change-Id: I94b437d791cbf887015772b5923feb83dd145530
2014-03-18 03:34:56 -07:00
James Zern
3e7f34a3fb AssignSegments: quiet array-bounds warning
nb (enc->segment_hdr_.num_segments_) will be in the range
[1, NUM_MB_SEGMENTS].

Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16
2014-03-14 18:47:52 -07:00
James Zern
cf821c821f UpdateHistogramCost: avoid implicit double->float
all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning

Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
2014-03-14 11:18:52 -07:00
Vikas Arora
1c58526fe1 Fix few nits
Add/remove few casts, fixed indentation.

Change-Id: Icd141694201843c04e476f09142ce4be6e502dff
2014-03-13 13:57:39 -07:00
Vikas Arora
fef22704ec Optimize and re-structure VP8LGetHistoImageSymbols
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.

This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.

Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
2014-03-13 11:48:37 -07:00
Vikas Arora
5f0cfa80ff Do a binary search to get the optimum cache bits.
This speeds up the lossless encoder by a bit (1-2%), without impacting the
compression density.

Change-Id: Ied6fb38fab58eef9ded078697e0463fe7c560b26
2014-03-13 10:30:32 -07:00
skal
65b99f1c92 add a -z option to cwebp, and WebPConfigLosslessPreset() function
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
   cwebp -z 5 in.png -o out_lossless.webp

There are 10 possible values for -z parameter:
   0 (fastest, lowest compression)
to 9 (slowest, best compression)

A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.

This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.

Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
2014-03-11 23:25:35 +01:00
skal
30176619c6 4-5% faster trellis by removing some unneeded calculations.
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)

Output is bitwise the same as before.

Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
2014-03-06 15:57:25 +01:00
James Zern
687a58ecc3 histogram.c: reindent after b33e8a0
b33e8a0 Refactor code for HistogramCombine.

Change-Id: Ia1b4b545c5f4e29cc897339df2b58f18f83c15b3
2014-03-04 00:38:14 -08:00
James Zern
42eb06fc0e Merge "few cosmetics after patch #69079" 2014-03-03 15:13:25 -08:00
skal
82af82644b few cosmetics after patch #69079
Change-Id: Ifa758420421b5a05825a593f6b43504887603ee7
2014-03-03 23:53:08 +01:00
Vikas Arora
b33e8a05ee Refactor code for HistogramCombine.
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.

This speeds up lossless encoding by 1-2% and almost no impact on compression
density.

Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
2014-03-03 13:50:42 -08:00
skal
5aeeb087d6 5-10% encoding speedup with faster trellis (-m 6)
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables

Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
2014-03-03 22:07:06 +01:00
Pascal Massimino
4287d0d49b speed-up trellis quant (~5-10% overall speed-up)
store costs[] in node instead of context

Change-Id: I6aeb0fd94af9e48580106c41408900fe3467cc54
also: various cosmetics
2014-02-27 00:06:00 -08:00
Pascal Massimino
390c8b316d lossy encoding: ~3% speed-up
incorporate non-last cost in per-level cost table

also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.

Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
2014-02-26 05:52:24 -08:00
Vikas Arora
c16cd99aba Speed up lossless encoder.
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
  local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
  log(1 + x) and increase the threshold for calling the approximate
  version of log_2 (compared to costly call to log()).

Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
2014-02-21 22:13:50 -08:00
skal
0235d5e44b 1-2% faster quantization in SSE2
C-version is a bit faster too (sub-1% faster on ARM)

Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97
2014-02-13 15:55:30 -08:00
James Zern
a42ea9742a cosmetics: backward_references.c: reindent after a7d2ee3
a7d2ee3 Optimize cache estimate logic.

Change-Id: I81dd1eea49f603465dc5f3afae8a101e5205e963
2014-02-11 15:52:22 -08:00
Vikas Arora
fde2904b8a Increase initial buffer size for VP8L Bit Writer.
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.

Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
2014-02-11 11:13:21 -08:00
Vikas Arora
a7d2ee39be Optimize cache estimate logic.
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.

This speeds up the lossless encoding at lower qualities by 10-15%.

Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
2014-02-11 10:59:01 -08:00
Scott Talbot
391316fee2 Don't dereference NULL, ensure HashChain fully initialized
Found by clang's static analyzer, they look validly uninitialized
to me.

Change-Id: I650250f516cdf6081b35cdfe92288c20a3036ac8
2014-02-03 21:16:59 -08:00
skal
9882b2f953 always use fast-analysis for all methods.
This makes the segmentation overall less prone to
local-optimum  or boundary effect.

(and overall, encoding is a little faster)

Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
2014-01-24 18:17:04 +01:00
skal
c741183c10 make WebPCleanupTransparentArea work with argb picture
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.

Original idea by David Eckel (dvdckl at gmail dot com)

Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
2014-01-17 20:19:16 +01:00
James Zern
ea59a8e980 Merge "Merge tag 'v0.4.0'" 2014-01-10 17:09:19 -08:00
skal
7574bed43d fix comments related to array sizes
thanks to foivos_g4 at hotmail dot com for spotting this.

Change-Id: I8cd301bae58a6edbc3ef47607654bc21321721ca
2014-01-09 17:30:25 +01:00
James Zern
8c524db84c bump version to 0.4.0
libwebp{,decoder} - 0.4.0
libwebp libtool - 5.0.0
libwebpdecoder libtool - 1.0.0

mux/demux - 0.2.0
libtool - 1.0.0

Change-Id: Idbd067f95a6af2f0057d6a63ab43176fcdbb767d
2013-12-18 19:20:00 -08:00
Pascal Massimino
495bef413d fix bug in TrellisQuantize
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)

Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
2013-12-17 11:08:01 -08:00
James Zern
605a712701 simplify __cplusplus ifdef
drop c_plusplus which is from a quite ancient pre-standard compiler

Change-Id: I9e357b3292a6b52b14c2641ba11f4f872c04b7fb
2013-12-16 20:16:02 -08:00
James Zern
5227d99146 drop: ifdef __cplusplus checks from C files
the prototypes are already marked in the headers

Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094
2013-12-13 11:42:13 -08:00
skal
73b731fb42 introduce a special quantization function for WHT
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.

Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
2013-12-10 14:21:47 +01:00
skal
a3359f5d2c Only compute quantization params once
(all quantization params #1..#15 are the same)

Change-Id: If04058bd89fe2677b5b118ee4e1bcce88f0e4bf5
2013-12-10 05:36:23 +01:00
skal
d513bb62bc * fix off-by-one zthresh calculation
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this

Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)

Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
2013-12-09 13:56:04 +01:00
skal
6f104034a1 fix bug due to overzealous check in WebPPictureYUVAToARGB()
This tests prevented views to be converted to ARGB

https://code.google.com/p/webp/issues/detail?id=178

Change-Id: I5ba66da2791e6f1d2bfd8c55b5fffe6955263374
2013-12-02 15:23:12 +01:00
James Zern
cc55790e37 Merge changes I8bb7a4dc,I2c180051,I021a014f,I8a224a62
* changes:
  mux: add some missing casts
  enc/vp8l: add a missing cast
  idec: add some missing casts
  ErrorStatusLossless: correct return type
2013-11-27 17:06:35 -08:00
James Zern
4931c3294b cosmetics: fix some typos
Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c
2013-11-26 19:21:14 -08:00
James Zern
617d93480e enc/vp8l: add a missing cast
Change-Id: I2c1800516eb4573ae2599866ace10017b865f23a
2013-11-25 20:48:13 -08:00
skal
cb261f790f fix a descaling bug for vertical/horizontal U/V interpolation
RGBToU/V calls expects two extra precision bits, they were only
given one by SUM2H and SUM2H macros.

For rounding coherency, also changed SUM1 macro.

Change-Id: I05f96a46f5d4f17b830d0420eaf79b066cdf78d4
2013-11-19 11:24:09 +01:00
skal
ff379db317 few % speedup of lossless encoding
mostly visible for method 4 and up

Change-Id: I1561d871bc055ec5f7998eb193d927927d3f2add
2013-11-12 00:09:45 +01:00
James Zern
7708e60908 Merge "detect flatness in blocks and favor DC prediction" 2013-11-01 11:24:09 -07:00
James Zern
06b1503eff Merge "add comment about the kLevelsFromDelta[][] LUT generation" 2013-10-31 21:37:55 -07:00
skal
5935259cd0 add comment about the kLevelsFromDelta[][] LUT generation
Change-Id: Id1a91932bf05aeee421761667af351ef2200c334
2013-10-31 21:35:55 -07:00
skal
e3312ea681 detect flatness in blocks and favor DC prediction
this avoids local-minima that look bad, even if the distortion
looks low (e.g. gradients, sky,...). Mostly visible in the q=50-80 range.
Output size is mostly unchanged.

Change-Id: I425b600ec45420db409911367cda375870bc2c63
2013-11-01 00:47:04 +01:00
skal
a014e9c9cd tune quantization biases toward higher precision
* raise U/V quantization bias to more neutral values
* also raise the non-zero AC bias for Y1/Y2 matrices

(we need all the precision we can for U/V leves, which are often empty)
This will increase quality in the higher range (q >= 90) mostly.
Files size is exacted to raise a little (5-7%). and SSIM accordingly of course.

Change-Id: I8a9ffdb6d8fb6dadb959e3fd392e66dc5aaed64e
2013-10-30 23:57:23 +01:00
skal
1e898619cb add helpful PrintBlockInfo() function
(protected under a DEBUG_BLOCK compile flag)

Change-Id: Icb8da02dbd00e5cf856c314943c212f1c9578d9b
2013-10-30 19:25:27 +01:00
Pascal Massimino
98aa33cf1e extract random utils to their own file util/random.[ch]
they'll be used for decoding too, probably.

Change-Id: Id9cbb250c74fc0e876d4ea46b1b3dbf8356d6725
2013-10-30 02:00:33 -07:00
James Zern
d3408720d8 Merge "fast auto-determined filtering strength" 2013-10-29 12:47:58 -07:00
skal
f8bfd5cd1e fast auto-determined filtering strength
kLevelsFromDelta[sharpness][delta] is an inverse look-up table
that tells the minimum filtering strength needed to trigger the
filtering of a step with amplitude 'delta'. We use this table
in various situations:

a) when computing the initial (/global) filtering
strength for each segment. We look at the quantization
step and deduce the proper filtering strength needed
to result this quantization noise (talking the -f option
into account).

b) during intra16 calculation, when a block ends up
very empty (only DC coeffs are non-zero, all ACs have
vanished). We'll rely on the in-loop filtering to
restore the smoothness (if the source was gradient-like
smooth. That's why we look at the distortion too before
triggering the filtering).

Step b) goes _in addition_ to a), potentially raising
the filtering strength if blockiness is likely.

Change-Id: Icaeca93ef21da195b079e6587a44d9edfc8e9efa
2013-10-29 20:13:29 +01:00
Pascal Massimino
ac0bf951ca small clean-up in ExpandMatrix()
Change-Id: Ib06cb1658a6548f06bb7320310b3864881b606a7
2013-10-29 19:58:57 +01:00
skal
7c098d1814 Use some gamma-curve range compression when computing U/V average
This helps for discolorated chroma-subsampled edges.

Change-Id: I1d8ce87b66cb7e8b3572e6722905beabf0f50554
2013-10-18 21:28:05 +02:00
skal
0b2b05049f Use deterministic random-dithering during RGB->YUV conversion
-> helps debanding (sky, gradients, etc.)

This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.

Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.

Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
2013-10-17 22:36:49 +02:00
Pascal Massimino
f7146bc1e6 small optimization in segment-smoothing loop
probably not much of a speed difference

Change-Id: I08c41d82c3c2eb5ff9ec9ca9d81af2bb09b362de
2013-10-07 07:44:51 -07:00
skal
2b0a759335 Merge "fix some warnings from static analysis" 2013-09-13 13:00:12 -07:00
skal
d51f45f047 fix some warnings from static analysis
http://code.google.com/p/webp/issues/detail?id=138

Change-Id: I21470e965357cc14eab356e2c477c7846ff76ef2
2013-09-13 11:33:30 +02:00
Pascal Massimino
d134307b7f fix conversion warning on MSVC
"src\enc\frame.c(88) : warning C4244: '=' : conversion from 'const double' to 'float', possible loss of data"

Change-Id: I143cb0bb6b69e1b8befe9b4f24b71adbc28095c2
2013-09-12 18:57:51 -07:00
skal
80b54e1c69 allow search with token buffer loop and fix PARTITION0 problem
The convergence algo is noticeably faster and more accurate.

Try it with: 'cwebp -size xxxxx -pass 8 ...' or 'cwebp -psnr 39 -pass 8 ...'
for instance

Allow full-looping with TokenBuffer case, and make the non-TokenBuffer
case match too.

In case Partition0 is likely to overflow, retry encoding with harder
limits on max_i4_header_bits_.

This CL should make -partition_limit option somewhat useless,
since the fix made automatically (albeit in a non-optimal way yet).

Change-Id: I46fde3564188b13b89d4cb69f847a5f24b8c735b
2013-09-11 21:15:28 +02:00
skal
b7d4e04255 add VP8EstimateTokenSize()
estimates final size of coded tokens given a set of probabilities

Change-Id: Ia5a459422557d98b78b3cd8e1a88cb30835825b6
2013-09-11 10:08:49 +02:00
James Zern
10fddf53bb enc/quant.c: silence a warning
score_t -> int: rd_i4.H contains a value from a uint16_t
lookup

Change-Id: I7227de2dfab74b4f796abbc47955197ffa0e6110
2013-09-11 00:04:11 -07:00
Pascal Massimino
9f24519e82 encoder: misc rate-related fixes
* fix VP8FixedCostsI4ÆÅ table
   (the constant cost '211' was erronenously included)
* use the rd-score for '211' correctly (calling SetRDScore() for good)
* count partition0 bits separately during rd-opt

No meaningful difference in rd-curve.

Change-Id: I6c49a150cf28928d9a92c32fff097600d7145ca4
2013-09-10 00:25:32 -07:00
James Zern
c663bb214a Merge "simplify VP8IteratorSaveBoundary() arg passing" 2013-09-06 14:21:42 -07:00
Pascal Massimino
f691f0e461 simplify VP8IteratorSaveBoundary() arg passing
we only need to save yuv_out_, so no need for the arg

Change-Id: I7bad5d910e81ed2eda5c9787821fd1cfe905bd92
2013-09-06 02:11:16 -07:00
Pascal Massimino
42542be855 up to 6% faster encoding with clang compiler
mostly by revamping the main loop of GetResidualCost() and avoiding some branches

Change-Id: Ib05763e18a6bf46c82dc3d5d1d8eb65e99474207
2013-09-05 10:38:22 -07:00
skal
93402f02db multi-threaded segment analysis
When -mt is used, the analysis pass will be split in two
and each halves performed in parallel. This gives a 5%-9% speed-up.

This was a good occasion to revamp the iterator and analysis-loop
code. As a result, the default (non-mt) behaviour is a tad (~1%) faster.

Change-Id: Id0828c2ebe2e968db8ca227da80af591d6a4055f
2013-09-05 09:13:36 +02:00
Pascal Massimino
2fd091c9ae Merge "use NULL for lf_stats_ testing, not bool" 2013-09-04 07:18:42 -07:00
skal
cfb56b1707 make -pass option work with token buffers
-pass 2 can be useful sometimes. More passes usually don't help more.
This change is a step toward being able to re-code the whole picture
with varying parameter (when token buffer is used).

Change-Id: Ia2538e2069a53c080e2ad248c18a1e04623a9304
2013-09-03 23:37:42 +02:00
Pascal Massimino
35dba337a3 use NULL for lf_stats_ testing, not bool
Change-Id: I14c6341bce1c745b7f2d1b790f2d4f8441d01be6
2013-09-03 08:45:59 -07:00
skal
733a7faae4 enc->Iterator memory cleanup
* move yuv_in_/out_* scratch buffers to iterator
* add y_top_/uv_top_ shortcuts in iterator

That's ~3k of stack size instead of heap.
But it allows having several iterators work in parallel.

Change-Id: I6a437c0f2ef1e5d398c1d6a2fd4974fa0869f0c1
2013-08-31 23:38:11 +02:00
skal
ad6ac32d7c simplify upsampler calls: only allow 'bottom' to be NULL
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.

+ fix compilation in non-FANCY_UPSAMPLING mode

Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
2013-08-19 16:47:51 -07:00
Pascal Massimino
f72fab7045 cosmetics
Change-Id: Ib6fa2a7e1db8c5c48607c7097520ffca34a6cb66
2013-08-15 04:16:37 -07:00
Vikas Arora
e081f2f359 Pack code & extra_bits to Struct (VP8LPrefixCode).
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.

Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
2013-08-12 11:56:42 -07:00
Vikas Arora
69257f70df Create LUT for PrefixEncode.
This speeds up lossless compression by 5%.
Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d
2013-08-05 10:20:18 -07:00
skal
de4d4ad598 VP8EncIterator clean-up
- remove unused fields from iterator
- introduce VP8IteratorSetRow() too
- rename 'done_' to 'countdown_'
- bring y_left_/u_left_/v_left_ from VP8Encoder

Change-Id: Idc1c15743157936e4cbb7002ebb5cc3c90e7f92a
2013-08-01 23:05:54 -07:00
Pascal Massimino
8dcae8b3cf fix rescaling-with-alpha inaccuracy
(still missing YUVA decoding case for now)

https://code.google.com/p/webp/issues/detail?id=160

Change-Id: If723b4a5c0a303d0853ec9d839f995adce056095
2013-07-26 12:10:26 -07:00
Vikas Arora
7d60bbc6d9 Speed up HashChainFindCopy function.
Speed up HashChainFindCopy by optimizing on number of calls to
FindMatchLength method.
This change speeds up the lossless & lossy (Alpha) encoding by 20%
without loss of compression density.
At method=3, lossy (Alpha) compression speed (and density) remains
unchanged, as at that settings, the costly Backward Refs method is not
called

Change-Id: Ia1797148e9e4ee2787011837fa248afbae2242cb
2013-07-16 19:58:18 -07:00
Vikas Arora
6674014077 Speedup Alpha plane encoding.
Disable costly 'BackwardReferencesTraceBackwards' for encoding Alpha plane.
Increase the threshold for triggering 'BackwardReferencesTraceBackwards' to
quality 25 and above. Also lower the Alpha quality (at method 3) to be
lesser than this threshold (25).

Change-Id: Ic29fb2e6943472c564223df9fe099b19ccda0f31
2013-07-12 11:02:13 -07:00
Vikas Arora
8967b9f37e SSE2 for lossless decoding (critical) functions.
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.

Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
2013-06-27 11:42:45 -07:00
Pascal Massimino
2c07143b9d remove a malloc() in case we're using only FILTER_NONE for alpha
+ some revamp and cleanup of the alpha-filter trial loop
+ EncodeAlphaInternal() now just takes a FilterTrial param

Change-Id: Ief84385083b1cba02678bbcd3dbf707245ee962f
2013-06-24 02:07:52 -07:00
Urvang Joshi
fdb9177917 Rename a method
Change-Id: Idc5f4ea1019a828a5fa1fbd81804fa51cd7e8673
2013-06-21 00:35:39 -07:00
James Zern
30e77d0f66 Merge branch '0.3.0'
* 0.3.0: (57 commits)
  update ChangeLog
  Regression fix for alpha channels using color cache:
  wicdec: silence a format warning
  muxedit: silence some uninitialized warnings
  update ChangeLog
  update NEWS
  bump version to 0.3.1
  Revert "add WebPBlendAlpha() function to blend colors against background"
  Simplify forward-WHT + SSE2 version
  probe input file and quick-check for WebP format.
  configure: improve gl/glut library test
  update copyright text
  configure: remove use of AS_VAR_APPEND
  fix EXIF parsing in PNG
  add doc precision for WebPPictureCopy() and WebPPictureView()
  remove datatype qualifier for vmnv
  fix a memory leak in gif2webp
  fix two minor memory leaks in webpmux
  remove some cruft from swig/libwebp.jar
  README: update swig notes
  ...

Conflicts:
	NEWS
	examples/gif2webp.c
	src/dec/alpha.c
	src/dec/idec.c
	src/dec/vp8l.c
	src/enc/alpha.c
	src/enc/vp8l.c

Change-Id: Ib202fad7825a090c3b3a5169acd171369cface47
2013-06-20 14:47:48 -07:00
skal
3307c16327 Don't set alpha-channel to 0xff for alpha->green uplift
usually saves ~4 bytes on average (but, up to 10 -or even 16- sometimes).

Change-Id: Ib500e1a35471a2f3da453ffc8c7e95d28b8d34fe
2013-06-18 00:03:32 +02:00
Urvang Joshi
8cf0701eb0 Alpha encoding: never filter in case of NO_COMPRESSION
This is because, filtering would never affect compression density for
this case.

Change-Id: I4bb14d3eb7da0a3805fda140eb1dfbf9ccc134f5
2013-06-14 13:34:25 -07:00
James Zern
5a92c1a5e9 bump version to 0.3.1
libwebp{,decoder} - 0.3.1
libwebp libtool - 4.3.0 (compatible release)
libwebpdecoder libtool - 0.1.0 (compatible release)

mux/demux - 0.1.1
libtool - 0.1.0 (compatible release)

Change-Id: Icc8329a6bcd9eea5a715ea83f1535a66d6ba4b58
2013-06-12 23:19:13 -07:00
James Zern
67bc353e6d Revert "add WebPBlendAlpha() function to blend colors against background"
This reverts commit dcbb1ca54a.

Dropping this for now to maintain compatibility for 0.3.1.

Change-Id: I44e032a072d317bb67e1439c42cff923e603038f
2013-06-12 15:02:09 -07:00
James Zern
c7e89cbb02 update copyright text
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files

based on the discussion in:
https://codereview.chromium.org/12771026/

Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
(cherry picked from commit d640614d54)
2013-06-11 15:03:22 -07:00
skal
a2ea46439e Fix the bug in ApplyPalette.
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.

-> switch generic use-LUT detection

Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
(cherry picked from commit afa3450c11)
2013-06-11 15:00:47 -07:00
Vikas Arora
f036d4bfa6 Speed up ApplyPalette for ARGB pixels.
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.

Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
(cherry picked from commit 742110ccce)
2013-06-11 15:00:46 -07:00
Vikas Arora
498d4dd634 WebP-Lossless encoding improvements.
Lossy (with Alpha) image compression gets 2.3X speedup.
Compressing lossless images is 20%-40% faster now.

Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6
(cherry picked from commit 8eae188a62)
2013-06-11 15:00:45 -07:00
James Zern
7a650c6ad6 prevent signed int overflow in left shift ops
force unsigned when shifting by 24.

Change-Id: Ie229d252e2e4078107cd705b09397e686a321ffd
(cherry picked from commit f4f90880a8)
2013-06-11 15:00:43 -07:00
Urvang Joshi
f4c7b6547b WebPEncode: An additional check.
Start VP8EncLoop/VP8EncTokenLoop only if VP8EncStartAlpha succeeded.

Change-Id: Id1faca3e6def88102329ae2b4974bd4d6d4c4a7a
(cherry picked from commit 67708d6701)
2013-06-11 15:00:42 -07:00
Pascal Massimino
dcbb1ca54a add WebPBlendAlpha() function to blend colors against background
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.

Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
(cherry picked from commit e7d9548c9b)
2013-06-11 15:00:41 -07:00
Vikas Arora
bf867bf296 Tuned cross_color parameter (step) for lower qual
Tuned the cross_color transform parameter (step) for lower quality
levels. This change gives speedup of 20% at lower qualities (25) and 10% at
moderate quality level (50) with a loss of 0.25% in compression density.
Also removed TODO for cross_color transform. Observed good correlation of
this with the predict transform.

Change-Id: I8a1044e9f24e6a5f84295c030fd444d0eec7d154
2013-06-11 12:15:07 -07:00
James Zern
d640614d54 update copyright text
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files

based on the discussion in:
https://codereview.chromium.org/12771026/

Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
2013-06-06 23:09:14 -07:00
skal
ea63d61937 fix a type warning on VS9 x86
"warning C4244: 'function' : conversion from 'uint64_t' to 'size_t', possible loss of data"

Change-Id: Ibd9f6a24993518d658d08127d616a17d7b99e0e4
2013-06-05 10:14:04 +02:00
Vikas Arora
7413394e7f Fix the memory leak in ApplyFilters.
Change-Id: Iba1b1adf3088ea9c43e4f602a93e77450f6c6170
2013-05-21 14:00:26 -07:00
skal
2053c2cff2 simplify the alpha-filter testing loop
Change-Id: Iacebae749c37edc87a3c94c76cd589a2565ee642
2013-05-21 10:29:27 +02:00
skal
0d25876bad use uint8_t for inv_palette[]
Change-Id: I5005ce68d89bfb657d46ad8acc4368c29fa0c4fd
2013-05-18 17:13:58 +02:00
skal
afa3450c11 Fix the bug in ApplyPalette.
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.

-> switch generic use-LUT detection

Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
2013-05-18 17:03:18 +02:00
Vikas Arora
742110ccce Speed up ApplyPalette for ARGB pixels.
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.

Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
2013-05-16 15:44:21 -07:00
James Zern
d8edd83551 libwebp: fix vp8 encoder mem alloc offsetting
'mem' was being offset once by DO_ALIGN() then shifted 'nz_size' which
would end up accounting for more than ALIGN_CST and exceed the allocation.

broken since:
  9bf3129 align VP8Encoder::nz_ allocation

Change-Id: I04a4e0bbf80d909253ce057f8550ed98e0cf1054
2013-05-15 00:31:23 -07:00
Vikas Arora
8eae188a62 WebP-Lossless encoding improvements.
Lossy (with Alpha) image compression gets 2.3X speedup.
Compressing lossless images is 20%-40% faster now.

Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6
2013-05-08 17:22:11 -07:00
James Zern
9bf312938f align VP8Encoder::nz_ allocation
prevents unaligned uint32_t load/store

Change-Id: I3f5e1b434a7452f618009d5e4bbe4f3260e3e321
2013-04-25 02:55:39 -07:00
pascal massimino
5369a80fd4 Merge "prevent signed int overflow in left shift ops" 2013-04-19 00:13:16 -07:00
James Zern
f4f90880a8 prevent signed int overflow in left shift ops
force unsigned when shifting by 24.

Change-Id: Ie229d252e2e4078107cd705b09397e686a321ffd
2013-04-13 10:57:31 -07:00
James Zern
0261545e0b cosmetics: remove unnecessary ';'s
Change-Id: I5fefd9a5b2fe3795c2b5d785c30335b85bac0b43
2013-04-13 10:49:35 -07:00
Urvang Joshi
67708d6701 WebPEncode: An additional check.
Start VP8EncLoop/VP8EncTokenLoop only if VP8EncStartAlpha succeeded.

Change-Id: Id1faca3e6def88102329ae2b4974bd4d6d4c4a7a
2013-04-05 11:33:44 -07:00
Pascal Massimino
e7d9548c9b add WebPBlendAlpha() function to blend colors against background
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.

Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
2013-04-02 19:14:30 -07:00
Pascal Massimino
6cb4a61825 misc style fix
(cherry picked from commit 142c46291e)

Conflicts:
	src/webp/format_constants.h

Change-Id: Ib764cb09bd78ab6e72c60f495d55b752ad4dbe4d
2013-03-29 15:49:05 -07:00
Pascal Massimino
68111ab02f add missing YUVA->ARGB automatic conversion in WebPEncode()
user can now call WebPEncode() with any YUVA or ARGB format, for
lossy or lossless compression

also: simplified error reporting, which is done in WebPPictureARGBToYUVA()
and WebPPictureYUVAToARGB()

Change-Id: Ifb68909217175bcf5a050e5c68d06de9849468f7
(cherry picked from commit 07d87bda1b)
2013-03-29 15:33:14 -07:00
Pascal Massimino
3c8eb9a806 fix bad saturation order in QuantizeBlock
Saturation was done on input coeff, not quantized one.

This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.

addresses issue #145

Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
2013-03-25 14:53:29 -07:00
James Zern
f933fd2a27 move WebPFeatureFlags declaration
from private format_constants.h to public mux_types.h
also do the reverse for MKFOURCC()

Change-Id: I3aa86b007e9dbfed37a170989164ac3a77de2bd5
2013-03-19 16:10:41 -07:00
skal
f4ffb2d59a speed-up lossless (~3%) with ad-hoc histogram cost evaluation
* merge cost calculation functions (BitsEntropy() and HuffmanCost())
* have HistogramAdd() specialized into separate functions
* use threshold to bail-out early
* revamp code a bit

* also: save memory by freeing free(histogram_image)

Change-Id: I8ee5d2cfa1462d5d6ea6361f5c89925a3720ef55
2013-03-18 22:34:32 +01:00
James Zern
cef9388283 bump version to 0.3.0
libwebp{,decoder} - 0.3.0
libwebp libtool - 4.2.0 (compatible release)
libwebpdecoder libtool - 0.0.0 (new release)

mux/demux - 0.1.0
libtool - 0.0.0 (new release)

Change-Id: Ied6efa390b2f97f1f41fc8349a365613c639d6cc
2013-03-16 14:08:14 -07:00
skal
0aef3ebdea make alpha unfilter work in-place
* remove a malloc
* remove the unused 'bpp' argument from filter/unfilter functions

Change-Id: I28d78baaaddc20f1d5a3bb2bd0b4e96a12a920d8
2013-03-15 01:36:38 +01:00
skal
eef73d07a3 don't consolidate proba stats too often
adds a minimum count before FinalizeToken() is called for update.

Change-Id: I445be6a3e347620583d87c33067cefa656a25039
2013-03-13 13:50:08 +01:00
James Zern
69e2590642 Merge "Tune Lossless compression for lower qualities." 2013-03-12 22:08:29 -07:00
Vikas Arora
0478b5d214 Tune Lossless compression for lower qualities.
This is required for WebP lossy+Alpha images, where Alpha channel is taking
60-70% of the compression (CPU) cycles.

Also evaluated on 1000 PNG corpus and overall compression speed
is 15-40% better for lossy (PNG+Alpha) compression.
The pure lossless compression numbers are almost same (or little
better) with this change.

Change-Id: I9e5ae7372ed6227a9a5b64cd9cff84c747195a57
2013-03-12 14:17:28 -07:00
skal
9bfbdd144f 1.5x-2x faster encoding for method 3 and up
using token-buffer (that is: slightly more memory. O(output_size))

This change is ON by default. To return to previous behaviour, use
'cwebp -low_memory' or set config.low_memory to true.

Side-effect of this new mode: it forces 1 partition only (which was
default anyway), and makes some statistics about the bitstream
no longer available. cwebp will no longer report 'intra4-coeffs', etc.

This mode also doesn't work (yet) with multi-pass, and -low_memory
is currently forced for multi-pass.

also: reversed the flag: USE_TOKEN_BUFFER -> DISABLE_TOKEN_BUFFER
also: fixed the kAverageBytesPerMB estimate

Change-Id: I4ea80382038d6df4309663e0cb7bd88d9bca9cf1
2013-03-11 17:01:33 -07:00
James Zern
153f94e8b5 fix windows build
broken since:
ad25032 Merge "multi-threaded alpha encoding for lossy"

this produced an error due to an empty VP8TBuffer struct.

Change-Id: I640809d07d20092c1d660e2b59b58a62a12e4371
2013-03-08 15:50:04 -08:00
skal
ad2503203a Merge "multi-threaded alpha encoding for lossy" 2013-03-01 16:00:10 -08:00
skal
4e32d3e1e7 Merge "fix compilation of token.c" 2013-03-01 01:07:56 -08:00
skal
f817930a55 multi-threaded alpha encoding for lossy
new option: 'cwebp -mt ...'
new config flag: config.thread_level
(allowed thread_level are 0 or 1 for now. Maybe more later...)
If -mt is activated (and WEBP_USE_THREAD is used for compile), the alpha-compression
will be done in parallel to RGB coding for lossy. Can save quite a bit of latency...
Has no effect for lossless encoding.

Change-Id: I769d0bf90e7380cf99344ad62cd77277f4df5a46
2013-03-01 10:04:08 +01:00
skal
88050351f4 fix compilation of token.c
(TOKEN_BUFFER still disabled)
also: made VP8TBufferClear() always visible

Change-Id: Iff353fe70b2f3c5b0ab4ef7f143e1d65b0ab2b0d
2013-03-01 09:58:14 +01:00
skal
fc816219e1 code using the actual values for num_parts_, not the ones from config
Change-Id: Icb961c66fe62cb703a12c5ff8a9aa4fc884bac1c
2013-03-01 09:48:33 +01:00
Pascal Massimino
58ca6f65b7 rebalance method tools (-m) for methods [0..4]
(methods 5 and 6 are still untouched).

Methods #0 and #1 got much faster
Method #2 gets vastly improved in quality
Method #3 is noticeably faster for little lower quality
Method #4 (default) is 10-20% faster for comparable quality

+ update the internal doc about the methods' tools.

Example of speed difference:

Time to encode picture:
Method | Before | After
-m 0   | 1.272s | 0.517s
-m 1   | 1.295s | 0.623s
-m 2   | 2.217s | 0.834s
-m 3   | 2.816s | 2.243s
-m 4   | 3.235s | 3.014s
-m 5   | 3.668s | 3.654s
-m 6   | 8.296s | 8.235s

Change-Id: Ic41fda5de65066b3a6586cb8ae1ebb0206d47fe0
2013-02-27 02:19:20 -08:00
Pascal Massimino
5189957e07 describe rd-opt levels introduce VP8RDLevel enum
makes things somehow clearer compared to using magic constants

Change-Id: I9115cee71252511f722806427ee8a97f1a1cd95f
2013-02-26 02:20:59 -08:00
James Zern
c0ba090335 backward_references: avoid signed integer overflow
signed integer overflow behavior is undefined, split PrefixEncode() to
two branches to avoid this.

Change-Id: I6e2761d0d77f0aaceafdc4e07232e089c22beb64
2013-02-20 13:35:00 -08:00
Pascal Massimino
92668da6f2 change default filtering parameters:
* type is now 'strong'
  * strength is now '60'

These help with gradients and blocking

Change-Id: Ie1c8265c557306ef5e9ccefacf43e10946e55370
2013-02-15 01:09:32 -08:00
skal
5c3e381b2f Merge "add a -jpeg_like option" 2013-02-13 22:17:58 -08:00
Pascal Massimino
c23110467e remove unused declaration of VP8Zigzag
Change-Id: I80bdf4b692dcdad1fc2b0cfffcaebb5fef5dde34
2013-02-12 07:14:21 -08:00
skal
23c0f354a6 fix missing intptr_t->int cast for MSVC
Change-Id: I39acdfbe287ef7b7f615d59e0729aab161189bf9
2013-02-06 17:51:10 +01:00
skal
e895059a05 add a -jpeg_like option
This option remaps internal parameters to better match
the expected compression curve of JPEG and produce output files
of similar size, but with better quality.

Change-Id: I96a1cbb480b1f6a0c6845a23c33dfd63f197b689
2013-02-06 14:19:16 +01:00
pascal massimino
1f803f645d Merge "Tune alpha quality mapping to more reasonable values." 2013-02-05 10:50:28 -08:00
Urvang Joshi
1267d498dc Tune alpha quality mapping to more reasonable values.
This results in a significant speedup  with minimal increase in file sizes.

Change-Id: I6ecefe33eee219fba4099810d04a916f7efbd292
2013-02-05 19:49:35 +01:00
skal
043076e2ef Merge "speed-up lossless in BackwardTrace" 2013-02-05 10:46:06 -08:00
skal
f3a44dcd83 remove one malloc from TraceBackwards()
Change-Id: I4f77c0240ca2ece86e8beab11d02c74277409921
2013-02-05 19:43:43 +01:00
skal
0fc1a3a072 speed-up lossless in BackwardTrace
we special-case code=2 (with a later TODO to adapt this on quality)

Change-Id: I93d43f5b3f8f1ef9f211cce253bb4b415918ee57
2013-02-05 19:42:23 +01:00
skal
152ec3d2ee Merge "handle malloc(0) and calloc(0) uniformly on all platforms" 2013-01-23 04:41:36 -08:00
skal
42f8f9346c handle malloc(0) and calloc(0) uniformly on all platforms
also change lossless encoder logic, which was relying on explicit
NULL return from WebPSafeMalloc(0)

renamed function to CheckSizeArgumentsOverflow() explicitly

addresses issue #138

Change-Id: Ibbd51cc0281e60e86dfd4c5496274399e4c0f7f3
2013-01-22 23:40:16 +01:00
skal
0d19fbff51 remove some -Wshadow warnings
these are quite noisy, but it's not a big deal to remove
them.

Change-Id: I5deb08f10263feb77e2cc8a70be44ad4f725febd
2013-01-22 23:06:28 +01:00
James Zern
d6b88b7694 cosmetics: use '== 0' in size checks
Change-Id: I8ac18e2e570e4c6a8569a3955afa11fc943bee28
2012-12-10 23:27:34 -08:00
James Zern
d65ec6786a fix build, move token.c to src/enc/
broken in:
  657f5c9 move token buffer to its own file (token.c)

Change-Id: I8944a0b5760979bd43008c501b55df1d22d32180
2012-12-03 11:16:08 -08:00
skal
657f5c91b1 move token buffer to its own file (token.c)
Change-Id: Ib9791c52f48d98fad5ed3830f36894ef5ac362fa
2012-12-03 13:50:14 +01:00
skal
f76191f9db speed up GetResidualCost()
* treat the last coeff as a special case
* re-arrange the inner code to be shorter
* replace some VP8EncBands[n] by n, for n = 0 or 1

Change-Id: I71e17b014cffad7b073e787fde06260905a6953f
2012-11-26 23:50:37 +01:00
Urvang Joshi
23782f95b4 Separate out mux and demux code and libraries:
- Separate out mux.h and demux.h
- muxtypes.h: new header for data types common to mux/demux
- Move some misc read/write utilities to utils/utils.h
- Remove some duplicate methods.
- Separate out mux/demux libraries

Change-Id: If9b9569b10d55d922ad9317ef51710544315d6de
2012-11-19 11:40:18 -08:00
skal
0f923c3ffd make the bundling work in a tmp buffer
This avoids modifying the source picture.

Change-Id: I5b472859cda17fd3236a9e0fbedbb68977e09f85
2012-11-15 09:05:25 +01:00
Pascal Massimino
d9c5fbefa4 by-pass Analysis pass in case segments=1
10-15% faster encoding.

Almost same output, binary wise. The main difference is
that we can't compute uv_alpha susceptibility, means there
can be subtle differences with different -sns values.

Change-Id: Id1b1a50929bf125b6372212fee1ed75a3bed975f
2012-11-06 22:53:13 -08:00
vikas arora
812933d6ba Tune performance of HistogramCombine
Number of pairs selected are limited between 25% of histogram
images (at start) and number of histogram images left at any iteration.
Increase the range of iter_mult.
Removed min_cluster_size as parameter for tuning HistogramCombine.

Change-Id: Ia4068cd7af4d0f63c5af9001aceda8a40b9de740
2012-11-05 16:45:39 -08:00
James Zern
1c4609b1f8 Merge commit 'v0.2.1'
* commit 'v0.2.1':
  Update ChangeLog
  update NEWS
  bump version to 0.2.1
  libwebp: validate chunk size in ParseOptionalChunks
  cwebp (windows): fix alpha image import on XP
  autoconf/libwebp: enable dll builds for mingw
  [cd]webp: always output windows errors
  fix double to float conversion warning
  cwebp: fix jpg encodes on XP
  VP8LAllocateHistogramSet: fix overflow in size calculation
  GetHistoBits: fix integer overflow
  EncodeImageInternal: fix uninitialized free
  fix the -g/O3 discrepancy for 32bit compile
  fix the BITS=8 case
  Make *InitSSE2() functions be empty on non-SSE2 platform
  make *InitSSE2() functions be empty on non-SSE2 platform
  make VP8DspInitNEON() public

Conflicts:
	src/Makefile.am
	src/dsp/dec_neon.c

Change-Id: Iddc5152e4a6892db96c12d7c3f74adbc85fe6178
2012-11-02 12:20:19 -07:00
Pascal Massimino
0e8b7eedaa fix WebPPictureView() unassigned strides
y_stride/uv_stride/argb_stride were not set properly.

Change-Id: I001b8d46f873ca04b5c68eccd6f232061020f9ec
2012-10-31 16:01:34 -07:00
vikas arora
ab5ea217f7 Tune Lossless encoder
- Changed the dynamic range where more aggressive
  (BackwardReferencesTraceBackward) heuristic is run from quality > 10
  (instead of quality > 25).
- Limit the backward-ref Window size to 16*width & 256*width for lower
  qualities ([0, 25[ & [25, 50[) respectively, instead of 1M window.
- Evaluate the params for HashChainFindCopy outside this function call
  and pass it, instead of recomputing them for every call.

Change-Id: If9eedfc14b978e7632d7cf69c96186e2910b0554
2012-10-30 17:30:17 -07:00
James Zern
25f585c4f2 bump version to 0.2.1
lib - 0.2.1
libtool - 4.1.0 (compatible release)

Change-Id: Ib6ca47f2008d5c97d818422816ac60d0d0d8cffa
2012-10-29 19:17:17 -07:00
James Zern
d662158010 fix double to float conversion warning
introduced in:
 a792b91 fix the -g/O3 discrepancy for 32bit compile

Change-Id: Ic77d6170a5a91cf58ec10c68656ac61a7c0ee41d
2012-10-29 14:00:14 -07:00
James Zern
734f762a08 VP8LAllocateHistogramSet: fix overflow in size calculation
the multiplications done for total_size would be done with integers,
possibly overflowing, before being promoted to 64-bit for the addition

Change-Id: I32c3a6400fc2ef120c38e01a8693f4cb1727234d
2012-10-29 14:00:05 -07:00
James Zern
f9cb58fbce GetHistoBits: fix integer overflow
huff_image_size was a size_t (=32 bits with 32-bit builds) which could
rollover causing an incorrectly sized allocation and a crash in lossless
encoding.
fixes issue #128

Change-Id: I0f20cee98c29b2b40b02607930b6b7a7ca56996d
2012-10-29 14:00:01 -07:00
James Zern
b30add2017 EncodeImageInternal: fix uninitialized free
on allocation error refs.refs would be uninitialized and free'd, causing
a crash

Change-Id: Idb7daec7aec3e5d769d8103595c28a9d0d0b86f4
2012-10-29 13:59:57 -07:00
skal
3de58d7730 fix the -g/O3 discrepancy for 32bit compile
in debug mode, some float operations see their intermediate
values stored in memory rather than staying in the FPU (which
is 80bit precision).

Several fixes are possible (breaking long calculations into
atomic steps for instance), but simpler of all is just about
turning the cost[] array into float* instead of double*.

The code is a tad faster, and i didn't see any major output
size difference.

Change-Id: I053e6d340850f02761687e072b0782c6734d4bf8
2012-10-29 13:59:52 -07:00
James Zern
704818980f AccumulateLSIM: fix double -> float warnings
Change-Id: I234a5cd09b9351dbbbbc5076be35bb794d1bf890
2012-10-22 18:00:20 -07:00
Pascal Massimino
f86e6abe1f add LSIM metric to WebPPictureDistortion()
LSIM stands for "local similarity": before matching
a compressed pixel to the source, we search around in the source
and minimise the squared error. So, this is close to PSNR calculation,
but mitigates some of its limitations (pure translation and noise for instance).

There's a new -print_lsim option to cwebp too.

Change-Id: Ia38561034c7a90e71d2ea0f55bb1de527eda245b
2012-10-19 06:48:11 -07:00
Vikas Arora
c3aa215afa Speed up HistogramCombine for lower qualities.
Make the heuristic for combining Histograms a function of compression
quality. This change will speed-up compression time for compression
quality less than 75. The compression time/density remains unchanged
for compression quality 75 and higher.

Change-Id: I94513d51078340fbc0737d459fab2cebdd2d6082
2012-10-16 15:45:36 -07:00
James Zern
e855208c16 fix double to float conversion warning
introduced in:
 a792b91 fix the -g/O3 discrepancy for 32bit compile

Change-Id: I6a77223f237527eda4ee1d6eaa993351bd74f1d6
2012-10-08 18:27:27 -07:00
Vikas Arora
7b3eb372ad Tune lossless compression to get better gains.
Tune compression heuristics to get better gains across wide quality range.

Change-Id: Ic342d4dbcf83fe2086a34e5c184aef0714109430
2012-10-05 09:58:29 -07:00
pascal massimino
ce8bff45bc Merge "VP8LAllocateHistogramSet: fix overflow in size calculation" 2012-10-03 14:57:15 -07:00
pascal massimino
ab5b67a1d0 Merge "EncodeImageInternal: fix uninitialized free" 2012-10-03 14:53:35 -07:00
pascal massimino
7fee5d1231 Merge "GetHistoBits: fix integer overflow" 2012-10-03 14:51:22 -07:00
James Zern
a6ae04d455 VP8LAllocateHistogramSet: fix overflow in size calculation
the multiplications done for total_size would be done with integers,
possibly overflowing, before being promoted to 64-bit for the addition

Change-Id: Id5c127c8a497ce5de89a276c17f36b59eeb67c21
2012-10-03 12:18:00 -07:00
James Zern
80237c4371 GetHistoBits: fix integer overflow
huff_image_size was a size_t (=32 bits with 32-bit builds) which could
rollover causing an incorrectly sized allocation and a crash in lossless
encoding.
fixes issue #128

Change-Id: I175c8c6132ba9792034807c5c1028dfddfeb4ea5
2012-10-03 12:17:52 -07:00
James Zern
8a9972353d EncodeImageInternal: fix uninitialized free
on allocation error refs.refs would be uninitialized and free'd, causing
a crash

Change-Id: I8d77069aadc594758aaa79b2b73376c0107e57e4
2012-10-03 12:17:47 -07:00
skal
0b9e682934 minor cosmetics
spotted in patch #34187

Change-Id: Ia706af6ef7674ec7a1d7250da08f718ed7c09e72
2012-10-03 15:22:33 +02:00
skal
a792b913bd fix the -g/O3 discrepancy for 32bit compile
in debug mode, some float operations see their intermediate
values stored in memory rather than staying in the FPU (which
is 80bit precision).

Several fixes are possible (breaking long calculations into
atomic steps for instance), but simpler of all is just about
turning the cost[] array into float* instead of double*.

The code is a tad faster, and i didn't see any major output
size difference.

Change-Id: Icf1f833e15f8ee4ecc7f9a521d07fdc96ef711aa
2012-10-03 15:15:58 +02:00
Pascal Massimino
73ba4357fe Merge "detect and merge similar segments" 2012-10-02 06:22:23 -07:00
Pascal Massimino
fee6627538 detect and merge similar segments
similar = same quant and filter strength.
This save some bits in the segment map

Change-Id: I6f594474ad82bddf013278d47089e43a02e07e63
2012-10-01 21:18:03 +02:00
skal
8f216f7e60 remove cases of equal comparison for qsort()
Returning 0 (equal) can lead to undefined behaviour.
And, in our cases we'll never have equal keys (added asserts for that)

Change-Id: Ifaf202df321d3f877ad2a03de42e0d6cdd1b2388
2012-09-25 19:03:41 +02:00
skal
5725cabac0 new segmentation algorithm
fixes the 'blocky sky problem' (saturation problem: when luma was flat,
chroma noise was taking over, resulting in random segment id assigned.
When just using a common uniform segment was better).

+ side clean-up and readibility/experimentability MACRO'ization
+ added '-map 7' option

Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71
2012-09-04 23:09:15 +02:00
Vikas Arora
62dd9bb242 Update encoding heuristic w.r.t palette colors.
Added a threshold of MAX_COLORS_FOR_GRAPH for color-palettes, above
which the graph hint is ignored.

Change-Id: Ia5d7f45e52731b6eaf2806999d6be82861744fd3
2012-08-08 18:57:52 -07:00
Pascal Massimino
33705ca093 bump version to 0.2.0
Change-Id: I01cb50b9c4c8e9245aede3947481cbbd27d6a19d
2012-08-03 15:41:01 -07:00
James Zern
52a87dd7ff Merge "silence one more warning" into 0.2.0 2012-08-02 17:53:04 -07:00
James Zern
3b02309347 silence one more warning
inadvertently added in last warning roundup

Change-Id: I38e6bcfb18c133f2dc2b38cec81e12d2ff556011
2012-08-02 17:50:12 -07:00
Pascal Massimino
f94b04f045 move some RGB->YUV functions to yuv.h
will be needed later

Change-Id: I6b9e460db2d398b9fecd5d3c1bbdb3f2f3d4f5db
2012-08-02 17:23:02 -07:00
James Zern
292ec5cc7d quiet a few 'uninitialized' warnings
spurious in this case, but addresses e.g.,
... potentially uninitialized local variable 'weighted_average' used

Change-Id: Ib99998bf49e4af7a82ee66f13fb850ca5b17dc71
2012-08-02 14:03:30 -07:00
Pascal Massimino
323dc4d9b9 remove use of log2(). Use VP8LFastLog2() instead.
Order-by-cost mostly unchanged (up to a scaling constant 1/log(2))
(except for few minor diff in < 2% of cases)

+ remove unused field cost_mode->cache_bits_

Change-Id: I714f8ab12f49a23f5d499a64c741382c9b489a3e
2012-08-02 00:08:58 -07:00
Pascal Massimino
8c515d54ea Merge "harness some malloc/calloc to use WebPSafeMalloc and WebPSafeCalloc" into 0.2.0 2012-08-01 18:16:46 -07:00
James Zern
d4b4bb0248 Merge changes I46090628,I1a41b2ce into 0.2.0
* changes:
  check VP8LBitWriterInit return
  lossless: fix crash on user abort
2012-08-01 13:19:32 -07:00