* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned() and add some branch-less masking code. Much
faster on ARM (~13% speed-up); 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.
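
For illustration only, here is a minimal sketch of the kind of branch-less
masking mentioned above. It is not the actual VP8GetSigned() code:
ApplySign() is a hypothetical helper that negates a value from a 0/1 sign
bit using a mask instead of a conditional jump.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not the libwebp implementation: returns v when
// sign_bit is 0 and -v when sign_bit is 1, without a conditional branch.
static int32_t ApplySign(int32_t v, uint32_t sign_bit) {
  assert(sign_bit <= 1);
  // mask is 0x00000000 for sign_bit == 0 and 0xffffffff for sign_bit == 1.
  const uint32_t mask = 0u - sign_bit;
  // (v ^ mask) - mask is the two's complement negation when mask is all
  // ones, and leaves v untouched when mask is zero. The final cast assumes
  // the usual two's complement conversion back to a signed type.
  return (int32_t)(((uint32_t)v ^ mask) - mask);
}
```

The arithmetic is done on unsigned values so that the wrap-around is well
defined.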
Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.
Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()
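
A minimal sketch of why the pre- and post-multiplication matters when
averaging, assuming 8-bit samples. AverageLumaWithAlpha() is a hypothetical
helper, not a function from dsp/: it weights each luma sample by its alpha
before summing, then divides by the accumulated alpha.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: averages two luma samples weighted by their alpha.
// Returns 0 when both samples are fully transparent.
static uint8_t AverageLumaWithAlpha(uint8_t y0, uint8_t a0,
                                    uint8_t y1, uint8_t a1) {
  const uint32_t sum_alpha = (uint32_t)a0 + a1;
  // Pre-multiply: each sample contributes in proportion to its opacity.
  const uint32_t sum_premult = (uint32_t)y0 * a0 + (uint32_t)y1 * a1;
  uint32_t result;
  if (sum_alpha == 0) return 0;
  // Post-multiply (divide by the accumulated alpha), with rounding.
  result = (sum_premult + sum_alpha / 2) / sum_alpha;
  assert(result <= 255);
  return (uint8_t)result;
}
```

With a plain average, a transparent sample with luma 0 next to an opaque
sample with luma 200 would give 100; the weighted version keeps 200.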
Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
gcc was generating very complex code, with one variant for each value of
br->len_!
Also: prettify the mask constants.
Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
Sometimes we can write 18 bits or more at a time, which would
overflow the 32-bit accumulator.
Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)
fixes http://code.google.com/p/webp/issues/detail?id=200
Seems a bit faster (use of local fields for bits_ / used_)
also: added the __QNX__ bswap while at it.
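
A minimal sketch of a bit writer with a widened accumulator, to illustrate
the overflow issue. This is an assumption-laden example, not the VP8L
bit-writer: SketchBitWriter and SketchPutBits() are hypothetical names, and
the sketch flushes whole bytes rather than 16 or 32 bits at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t bits;   // pending bits, least-significant bit first
  int used;        // number of pending bits (always < 8 between calls)
  uint8_t* buf;    // output buffer
  size_t pos;      // bytes written so far
  size_t size;     // capacity of 'buf'
} SketchBitWriter;

// Appends the n_bits low bits of 'value'. Returns 0 if the buffer is full.
static int SketchPutBits(SketchBitWriter* bw, uint32_t value, int n_bits) {
  assert(n_bits >= 0 && n_bits <= 32);
  if (n_bits < 32) value &= (1u << n_bits) - 1;
  // At most 7 pending bits plus 32 new ones: always fits in 64 bits.
  bw->bits |= (uint64_t)value << bw->used;
  bw->used += n_bits;
  while (bw->used >= 8) {
    if (bw->pos >= bw->size) return 0;
    bw->buf[bw->pos++] = (uint8_t)(bw->bits & 0xff);
    bw->bits >>= 8;
    bw->used -= 8;
  }
  return 1;
}
```

The invariant to check is "maximum pending bits + maximum bits per call <=
accumulator width", which is the limit a 32-bit accumulator violated here.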
Change-Id: I876db93a931db15b083cf1d838c70105effa7167
MALLOC_FAIL_AT flag can be used to set-up a pre-determined failure
point during malloc calls. The counter value is retrieved using
getenv().
Example usage: export MALLOC_FAIL_AT=37 && cwebp input.png
will make 'cwebp' report a memory allocation error the 37th time
malloc() or calloc() is called.
MALLOC_MEM_LIMIT can be used similarly to prevent allocating more
than a given amount of memory. This is usually less convenient to
use than MALLOC_FAIL_AT since one has to know in advance the typical
memory size allocated.
Both these flags are meant to be used for debugging only!
Also: added a 'total_mem_allocated' to record the overall memory allocated
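
A minimal sketch of how such a failure point could be wired up, assuming
the counter is read once with getenv(). SketchMalloc() and the sketch_*
variables are hypothetical, not the code in utils/, and like the real flags
this is for debugging only (it is not thread-safe).

```c
#include <assert.h>
#include <stdlib.h>

static int sketch_initialized = 0;
static long sketch_fail_at = 0;     // 0 means "never fail on purpose"
static long sketch_num_calls = 0;

// Behaves like malloc(), except that the N-th call returns NULL when the
// environment variable MALLOC_FAIL_AT is set to N.
static void* SketchMalloc(size_t size) {
  if (!sketch_initialized) {
    const char* const s = getenv("MALLOC_FAIL_AT");
    if (s != NULL) sketch_fail_at = atol(s);
    sketch_initialized = 1;
  }
  ++sketch_num_calls;
  assert(sketch_num_calls > 0);   // the call counter must not wrap around
  if (sketch_fail_at > 0 && sketch_num_calls == sketch_fail_at) return NULL;
  return malloc(size);
}
```

Running a caller once per value of the counter gives a simple way to check
that every allocation failure is handled.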
Change-Id: I9d408095ee7d76acba0f3a31b1276fc36478720a
This change reduces the number of calls to WebPSafeMalloc from 200 to
100. The overall memory consumption is down 3% for Lenna image.
Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd
Some tracing code is activated by PRINT_MEM_INFO flag.
For debugging only! (not thread-safe, and slow).
Change-Id: I282c623c960f97d474a35b600981b761ef89ace9
* merged the two HistogramAdd/AddEval() into a single call
(with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster
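
A minimal sketch of a merged add function with the b == out special case,
using plain uint32_t arrays instead of the real VP8LHistogram struct.
SketchHistogramAdd() is a hypothetical name and the loop is the plain C
path only (no SSE2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Computes out[i] = a[i] + b[i]. 'out' may alias 'b', in which case the
// sum is accumulated in place.
static void SketchHistogramAdd(const uint32_t* a, const uint32_t* b,
                               uint32_t* out, int size) {
  int i;
  assert(a != NULL && b != NULL && out != NULL && size >= 0);
  if (b != out) {
    for (i = 0; i < size; ++i) out[i] = a[i] + b[i];
  } else {
    // Special case: accumulate 'a' into 'out' without re-reading 'b'.
    for (i = 0; i < size; ++i) out[i] += a[i];
  }
}
```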
Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.
Reduce the footprint of the VP8LHistogram struct by changing the
field 'literal_' from a fixed-size array to a buffer allocated
dynamically based on the input parameter cache_bits.
Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most WebP lossless images
by allocating some extra memory at encoder initialization.
Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
When remapping buffer, br->eos_ was wrongly being set to true for
certain images.
Also, refactored the end-of-stream detection as a function.
Reported in http://crbug.com/364830
Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84
.set at - Indicates that macro expansions may clobber
the assembler temporary ($at, i.e. $1 on MIPS) register.
Some macros may not be expanded without this
and will generate an error message if noat
is in effect.
"at" also added to the clobber list.
Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.
The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h still contains some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.
Also: WebPDecodeRGB*() functions are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).
Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
Separate the C version from the MIPS32 version and have run-time
initialization during RescalerInit()
Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
Even at high quality settings, the U/V quantizer step is limited
to 4, which can lead to banding on gradients.
This option allows selectively applying some randomness to
potentially flattened-out U/V blocks to attenuate the banding.
This option is off by default in 'dwebp', but set to -dither 50
by default in 'vwebp'.
Note: depending on the number of blocks selectively dithered,
we can have up to a 10% slow-down in decoding speed it seems.
Change-Id: Icc2446007f33ddacb60b3a80a9e63f2d5ad162de
Earlier we were only testing for bit_pos == LBITS. But this is not
sufficient, as bit_pos can jump from < LBITS to > LBITS.
This was resulting in some bit-stream truncation errors not being
caught.
Note: Not a security bug though, as br->pos wasn't incremented in such
cases and so we weren't reading beyond the buffer.
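
A minimal sketch of the stricter test, with hypothetical names and a
simplified reader struct (not the actual VP8LBitReader). The point is the
comparison operator: an equality test misses the case where bit_pos steps
over LBITS in a single read.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LBITS 64   // assumed width of the look-ahead window, in bits

typedef struct {
  size_t pos;    // bytes consumed from the buffer
  size_t len;    // total bytes in the buffer
  int bit_pos;   // bits consumed from the current window
} SketchBitReader;

// Returns 1 when all bytes were consumed and the window is exhausted.
static int SketchIsEndOfStream(const SketchBitReader* br) {
  assert(br != NULL && br->pos <= br->len);
  // '>=' rather than '==': bit_pos can jump from below LBITS to above it.
  return (br->pos == br->len) && (br->bit_pos >= SKETCH_LBITS);
}
```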
Change-Id: Idadcdcbc6a5713f8fac3470f907fa37a63074836
in_bits is const. Trying to apply bswap on it, one gets the error message:
error: read-only variable 'in_bits' used as 'asm' output
Change-Id: I0bef494b822c83d8ea87b1938b0e486d94de4742
speeds up those codes that are not part of the main lookup.
This gives a 10 % speedup for a photographic image.
Change-Id: Ief54b0ad77db790a01314402ad351b40ac9a7be4
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
subdirectories with more than one target can have the install targets
run in parallel with make -jN. group the shared headers in one place to
produce a common install target.
Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451
-> split libraries further into decoder / encoder
-> add libwebpdecoder.a in Makefile.unix
-> make dwebp link against libwebpdecoder.a in Makefile.unix
also: in Makefile.unix, pass EXTRA_FLAGS to LDFLAGS too
(otherwise, -m32 wouldn't work, e.g.)
Change-Id: Ief3da02a729dd86bbaf949ed048836716941657f
Simplify and re-organize the VP8L bit-reader functions
(e.g.: the 40-bit look-ahead code wasn't helping much)
Speed-up with LBITS=64, on arm7-a:
=> before:
./dwebp_justify_24_neon -v bryce_ll.webp
Time to decode picture: 11.393s
File bryce_ll.webp can be decoded (dimensions: 11158 x 2156).
...
=> after (LBITS=64): Time to decode picture: 9.953s
Making the VP8L bit-reader work in 32-bit mode is going to be
harder (because we need to be able to read two symbols
at a time, each with a max length of 15 bits).
Change-Id: I89746fb103b87b5e2fd40a3208a6fbc584b88297
This flag will make the code use no uint64, no asm, and no fancy
trick, but instead aim at being as simple and straightforward as
possible.
Main use is to help emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are the typical timings for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
When the config option '--enable-libwebpdecoder' is specified, the
lean decoder library 'libwebpdecoder' will be created in addition to
libwebp. Also, the dwebp binary will be linked against libwebpdecoder
if this config option is specified.
Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a
Actually, it turns out we now should never call these functions
with a zero size, otherwise something is wrong in the logic.
Change-Id: Ie414fcbec95486c169190470a71f2cff0843782a
also change lossless encoder logic, which was relying on explicit
NULL return from WebPSafeMalloc(0)
renamed function to CheckSizeArgumentsOverflow() explicitly
addresses issue #138
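
A minimal sketch of an explicit size-overflow check in front of malloc(),
under the assumption that a zero total size is a caller bug. SketchSizeFits()
and SketchSafeMalloc() are hypothetical names; this is not the body of
CheckSizeArgumentsOverflow() or WebPSafeMalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Returns 1 if nmemb * size can be represented in a size_t.
static int SketchSizeFits(uint64_t nmemb, size_t size) {
  if (size == 0) return 1;   // the product is 0, which always fits
  return nmemb <= (uint64_t)SIZE_MAX / size;
}

// Overflow-checked allocation: returns NULL if nmemb * size overflows.
static void* SketchSafeMalloc(uint64_t nmemb, size_t size) {
  if (!SketchSizeFits(nmemb, size)) return NULL;
  // Assumption in this sketch: callers never ask for zero bytes.
  assert(nmemb * size > 0);
  return malloc((size_t)(nmemb * size));
}
```

The check divides instead of multiplying so that the test itself cannot
overflow.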
Change-Id: Ibbd51cc0281e60e86dfd4c5496274399e4c0f7f3
The main advantage is that you can sometimes avoid the use of
uint64_t, sticking to 32-bit only.
The default is still BITS=32; this is mainly "in case".
Change-Id: Id694028793117ba822c37d46ef6c52fa0afed4ac
- Separate out mux.h and demux.h
- muxtypes.h: new header for data types common to mux/demux
- Move some misc read/write utilities to utils/utils.h
- Remove some duplicate methods.
- Separate out mux/demux libraries
Change-Id: If9b9569b10d55d922ad9317ef51710544315d6de
Returning 0 (equal) can lead to undefined behaviour.
And, in our cases we'll never have equal keys (added asserts for that)
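
A minimal sketch of a comparison function that never returns 0, assuming
every item carries a unique secondary key. SketchItem and SketchCompare()
are hypothetical and only mirror the idea of asserting that keys differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int count;   // primary sort key
  int id;      // secondary key, assumed unique across all items
} SketchItem;

// qsort()-style comparison: orders by 'count', then by 'id'.
// Never returns 0 for two distinct items.
static int SketchCompare(const void* pa, const void* pb) {
  const SketchItem* const a = (const SketchItem*)pa;
  const SketchItem* const b = (const SketchItem*)pb;
  if (a->count != b->count) return (a->count < b->count) ? -1 : 1;
  assert(a->id != b->id);   // equal keys are not expected
  return (a->id < b->id) ? -1 : 1;
}
```

Breaking ties on a unique key makes the resulting order independent of the
sorting implementation.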
Change-Id: Ifaf202df321d3f877ad2a03de42e0d6cdd1b2388
SBITS=8 is reported 20-30% faster on ARM (where 64bit ops
are expensive).
Also use 32bits for i32.
Change-Id: Id6a7197d805061aeb8832f20432512d0d930ebfa
no speed diff observed by removing the test before calling BitWriterResize().
+ remove some unnecessary memset() in VP8LBitWriter
+ fix mixed code/variable-decl in BIG_ENDIAN mode
Change-Id: I36be61f83d10a43e4682b680c2dae0e494da4218
* ~1-4% faster
* if it's not used, don't use it
* remove the special handling of cache_bits = 0
* remove some tests in the loops
Change-Id: I19d87c3ca731052ff532ea8b2d8e89816507b75f
will be called by alpha post-processing, although doing nothing for now.
Gradient smoothing would be nice-to-have here. Patch welcome!
Change-Id: I534cde866bdc75da22d0f0a6d1373c90e21366f3
* Method #1 is now calling the lossless encoder on the alpha plane.
Format is not final, it's just a first draft. We need ad-hoc functions.
* removed now useless utils/alpha.*
* added utils/quant_levels.h instead
* removed the TCoder code altogether
Change-Id: I636840b6129a43171b74860e0a0fc5bb1bcffc6a
- Symbols added to the tree are valid inside HuffmanTreeBuildExplicit().
- In HuffmanTreeBuildImplicit(), make sure 'root_symbol' is
valid in case of a single symbol tree.
Change-Id: I7de5de71ff28f41e2d6228b29ed8dd4a20813e99
CompareHuffmanTrees() and SetBitDepths()):
- Move 'tree_size' initialization and malloc for 'tree + tree_pool'
outside the loop.
- Some renames/tweaks for readability.
Change-Id: I5cb3cc942afac6e9f51a0b97c57ee897677a48a2
* lossless_encoder: (46 commits)
split StoreHuffmanCode() into smaller functions
more consolidation: introduce VP8LHistogramSet
big code clean-up and refactoring and optimization
Some cosmetics in histogram.c
Approximate FastLog between value range [256, 8192]
Forgot to update out_bit_costs to symbol_bit_costs at one instance.
Evaluate output cluster's bit_costs once in HistogramRefine.
Simple Huffman code changes.
Lossless decoder: remove an unneeded param in ReadHuffmanCodeLengths().
Reducing emerging palette size from 11 to 9 bits.
Move GetHistImageSymbols to histogram.c
Improve predict vs no-predict heuristic.
code-moving and clean-up
reduce memory usage by allocating only one histo
Restrict histo_bits to ensure histo_image size is under 32MB
further simplification for the meta-Huffman coding
A quick pass of cleanup in backward reference code
Make transform bits a function of encode method (-m).
introduce -lossless option, protected by USE_LOSSLESS_ENCODER
Run TraceBackwards for higher qualities.
...
Conflicts:
src/enc/webpenc.c
Change-Id: I9a5d98cba0889ea91d10699466939cc283da345a
no version of msvc currently implements log2(). unconditionally define
NOT_HAVE_LOG2 in this case to simplify building libwebp sources in other
projects.
Change-Id: Ia9d985b1125553c5a8271d7e539bc1b4f898d749
- use common file organization across subdir makefiles
- append lib/source/header list variables and sort
Change-Id: I0653e1c73a4552b0c43d21f321b22b4972d6e87b
* each with their own decoder instances.
* Refactor the incremental buffer-update code a lot.
* remove br_offset_ for VP8LDecoder along the way
* make VP8GetHeaders() be used only for VP8, not VP8L bitstream
* remove VP8LInitDecoder()
* rename VP8LBitReaderResize() to VP8LBitReaderSetBuffer()
(cherry picked from commit 5529a2e6d47212a721ca4ab003215f97bd88ebb4)
Change-Id: I58f0b8abe1ef31c8b0e1a6175d2d86b863793ead
import changes from experimental 5529a2e^
and enable build in autoconf and makefile.unix; windows will be treated
separately.
Change-Id: Ie2e177a99db63190b4cd647b3edee3b4e13719e9
Pulled from the parent of the current version (5529a2e^).
The history of this and related files is a bit entangled, so rather
than trying to split the changes and introduce some noise in master's
history, we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: I6dae97fc381cd6c1d1640c4c565b2084a41ec955
Pulled from the current HEAD (218c32e).
The history of this and related files is a bit entangled, so rather
than trying to split the changes and introduce some noise in master's
history, we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: Ie57be21bf50ad83808c72aeb5fc706d9954d01d8
Pulled from the current HEAD (218c32e).
The history of this and related files is a bit entangled, so rather
than trying to split the changes and introduce some noise in master's
history, we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: Id879be453a94d9f44ec8d47747823ca7297ae008
by packing the symbol map more efficiently.
This is mainly useful in making it harder to generate invalid bitstream:
before this change, one could code the same symbol twice. Now, it's
impossible, since we code the position using empty symbol slots, instead
of total position.
* Fix the HasOnlyLeftChild() naming while at it.
Change-Id: I63c56c80a4f04a86ac83eded1e3306329815b6c9
_byteswap_ulong is defined in stdlib.h, release builds seem to pull it
in through a different path.
Change-Id: I510d2624150f89a4a77734bf3dc5b4db60a4ba95
- remove some unused functions
- move global arrays from data to read only section
- explicitly cast malloc returns; not specifically necessary, but helps
show intent
- miscellaneous formatting
Change-Id: Ib15fe5b37fe6c29c369ad928bdc3a7290cd13c84
MAX_LEN -> max_len
This was sub-optimal at the end of the picture, when there's
less than MAX_LEN bytes left to match.
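
A minimal sketch of one way to bound a match by the bytes actually left,
under the assumption that the change replaces a fixed limit with a
per-position one. SKETCH_MAX_LEN and SketchMatchLength() are hypothetical
and do not come from the backward-references code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LEN 4096   // hypothetical upper bound on a match length

// Returns how many leading bytes of 'a' and 'b' are equal, looking at no
// more than min(remaining, SKETCH_MAX_LEN) bytes.
static size_t SketchMatchLength(const uint8_t* a, const uint8_t* b,
                                size_t remaining) {
  const size_t max_len =
      (remaining < SKETCH_MAX_LEN) ? remaining : SKETCH_MAX_LEN;
  size_t n = 0;
  assert(a != NULL && b != NULL);
  while (n < max_len && a[n] == b[n]) ++n;
  return n;
}
```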
Change-Id: I5ebe1fca4e7c112dcd34748a082d1c97f95eb099
often reduces compressed size by tens of bytes
+ refactored / sped-up the prediction code (gradient: ~30% faster)
Change-Id: I26bd983655dad4f85d5c5ddc20a1980f384c4dd6
.. where only 2 filtering modes are potentially
tried, instead of all of them. This is faster than the exhaustive 'best'
mode, and not much worse.
Options for cwebp are:
-alpha_filter none
-alpha_filter fast (<- default)
-alpha_filter best (<- slow)
Change-Id: I8cb90ee11b8f981811e013ea4ad5bf72ba3ea7d4