Simplify and re-organize the VP8L bit-reader functions
(e.g.: the 40-bit look-ahead code was helping much)
Speed-up with LBITS=64, on arm7-a:
=> before:
./dwebp_justify_24_neon -v bryce_ll.webp
Time to decode picture: 11.393s
File bryce_ll.webp can be decoded (dimensions: 11158 x 2156).
...
=> after (LBITS=64): Time to decode picture: 9.953s
making the VP8L bit-reader work in 32-bit mode is going to be
harder (because we need to be able to read two symbols
at a time, each with a max length of 15 bits)
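A minimal sketch of an LSB-first bit reader with a 64-bit window shows why the look-ahead is easier at 64 bits. `BitReader64`, `BrFill`, `BrTake` and `ReadTwo` are hypothetical names, not libwebp's `VP8LBitReader` API. The reader refills byte by byte while at most 56 bits are buffered, so each refill leaves at least 57 bits. That is enough to read two codes of up to 15 bits (30 bits) back to back. The same scheme on a 32-bit window only guarantees 25 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader with a 64-bit window (not VP8LBitReader). */
typedef struct {
  uint64_t val;        /* buffered bits; the next unread bit is bit 0 */
  int avail;           /* number of valid bits in val */
  const uint8_t* buf;  /* input bytes */
  size_t len;          /* input length */
  size_t pos;          /* next byte to load */
} BitReader64;

static void BrInit(BitReader64* br, const uint8_t* buf, size_t len) {
  br->val = 0;
  br->avail = 0;
  br->buf = buf;
  br->len = len;
  br->pos = 0;
}

/* Append whole bytes above the valid bits. On return, avail >= 57
   unless the input ran out first. */
static void BrFill(BitReader64* br) {
  while (br->avail <= 56 && br->pos < br->len) {
    br->val |= (uint64_t)br->buf[br->pos++] << br->avail;
    br->avail += 8;
  }
}

/* Consume and return the next n bits (1 <= n <= 15, like a VP8L code). */
static uint32_t BrTake(BitReader64* br, int n) {
  const uint32_t v = (uint32_t)(br->val & ((1u << n) - 1));
  assert(n >= 1 && n <= 15 && n <= br->avail);
  br->val >>= n;
  br->avail -= n;
  return v;
}

/* One refill covers two codes of up to 15 bits each (30 bits in total);
   the caller must supply enough input for both. */
static void ReadTwo(BitReader64* br, int n1, int n2, uint32_t* a, uint32_t* b) {
  BrFill(br);
  *a = BrTake(br, n1);
  *b = BrTake(br, n2);
}
```

With a 32-bit `val`, the refill condition would become `avail <= 24`. That leaves only 25 guaranteed bits, so a second refill (or a check) would be needed between the two symbols.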
Change-Id: I89746fb103b87b5e2fd40a3208a6fbc584b88297
This flag will make the code use no uint64, no asm, and no fancy
tricks, but instead aim at being as simple and straightforward as
possible.
Its main use is to help Emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are typical timings for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
This option remaps internal parameters to better match
the expected compression curve of JPEG and produce output files
of similar size, but with better quality.
Change-Id: I96a1cbb480b1f6a0c6845a23c33dfd63f197b689
If it's not a partial file and the parser returns PARSE_NEED_MORE_DATA, then
consider it to be PARSE_ERROR.
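The rule can be sketched as a small status-mapping function. `PARSE_NEED_MORE_DATA` and `PARSE_ERROR` come from the change. `PARSE_OK`, the `ParseStatus` type and `FinalizeStatus()` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* PARSE_NEED_MORE_DATA and PARSE_ERROR are from the change above;
   PARSE_OK and FinalizeStatus() are hypothetical. */
typedef enum { PARSE_OK, PARSE_NEED_MORE_DATA, PARSE_ERROR } ParseStatus;

/* A complete (non-partial) file can never deliver more bytes, so
   running out of data means it is truncated or corrupt. */
static ParseStatus FinalizeStatus(ParseStatus status, int is_partial) {
  assert(status == PARSE_OK || status == PARSE_NEED_MORE_DATA ||
         status == PARSE_ERROR);
  if (!is_partial && status == PARSE_NEED_MORE_DATA) return PARSE_ERROR;
  return status;
}
```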
Change-Id: Id652a345bd2a9f574970272dd0a00517de113215
If a NULL pre-allocated buffer is passed, a buffer will be automatically
allocated.
+ add some parameter checks.
reported in http://code.google.com/p/webp/issues/detail?id=139
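The "use the caller's buffer, else allocate" behavior with parameter checks can be sketched as follows. `GetOutputBuffer()` and its `is_owned` out-parameter are hypothetical, not the actual libwebp entry point touched by this change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: use the caller's buffer when one is given,
   otherwise allocate one. Returns NULL on bad parameters or OOM. */
static uint8_t* GetOutputBuffer(uint8_t* preallocated, size_t preallocated_size,
                                size_t needed, int* is_owned) {
  assert(is_owned != NULL);
  *is_owned = 0;
  if (needed == 0) return NULL;  /* parameter check */
  if (preallocated != NULL) {
    /* parameter check: a supplied buffer must be large enough */
    return (preallocated_size >= needed) ? preallocated : NULL;
  }
  {
    uint8_t* const buf = (uint8_t*)malloc(needed);
    *is_owned = (buf != NULL);  /* caller must free() an owned buffer */
    return buf;
  }
}
```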
Change-Id: I9e14ed97db30ee12e46b5e92aac7eeaaeb99bfd5
store values to a temporary variable before calling functions that take
vector types.
removes non-standard constructs such as:
(uint8x8x2_t){{ a, b }}
fixing:
src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3
arguments, but takes just 2
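The error arises because, on that toolchain, `vst2_u8` was a macro (as the error message shows). The preprocessor splits macro arguments at every comma not enclosed in parentheses, and braces do not protect commas. The portable sketch below reproduces the pattern without NEON. `Pair8x8`, `StorePairImpl`, `STORE_PAIR` and `StoreTwoRows` are hypothetical stand-ins for `uint8x8x2_t` and `vst2_u8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-ins (hypothetical) for uint8x8x2_t and vst2_u8. */
typedef struct { uint8_t val[2][8]; } Pair8x8;

static void StorePairImpl(uint8_t* dst, Pair8x8 v) {
  int i;
  assert(dst != NULL);
  for (i = 0; i < 8; ++i) {  /* interleave the two rows, as vst2_u8 does */
    dst[2 * i + 0] = v.val[0][i];
    dst[2 * i + 1] = v.val[1][i];
  }
}
#define STORE_PAIR(dst, v) StorePairImpl((dst), (v))

/* Would not compile: the comma inside the braces is not protected from
   the preprocessor, so the macro receives three arguments:
     STORE_PAIR(dst, (Pair8x8){{ ..., ... }});                          */

static void StoreTwoRows(uint8_t* dst, const uint8_t a[8], const uint8_t b[8]) {
  Pair8x8 tmp;  /* temporary variable: no comma ends up in the macro call */
  int i;
  for (i = 0; i < 8; ++i) {
    tmp.val[0][i] = a[i];
    tmp.val[1][i] = b[i];
  }
  STORE_PAIR(dst, tmp);
}
```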
Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc
Fixes: https://code.google.com/p/webp/issues/detail?id=140
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.
4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.
see http://code.google.com/p/webp/issues/detail?id=134
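Chroma "fancy" upsampling is commonly a bilinear 9-3-3-1 filter. Each output sample mixes the four nearest input chroma samples, weighted by distance. The scalar sketch below assumes that formula for the SSE/NEON code and shows only the weights. `Upsample9331()` is a hypothetical helper, and the vectorized code computes the same kind of result several samples at a time.

```c
#include <assert.h>
#include <stdint.h>

/* a: nearest input chroma sample; b, c: its horizontal and vertical
   neighbours; d: the diagonal one. Computes (9a + 3b + 3c + d + 8) / 16. */
static uint8_t Upsample9331(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  const unsigned sum = 9u * a + 3u * b + 3u * c + d + 8u;  /* +8 rounds */
  assert((sum >> 4) <= 255);  /* weights sum to 16, so this cannot overflow */
  return (uint8_t)(sum >> 4);
}
```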
Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
When the config option '--enable-libwebpdecoder' is specified, the
lean decoder library 'libwebpdecoder' will be created in addition to
libwebp. The dwebp binary will also be linked against libwebpdecoder
when this config option is specified.
Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a
Actually, it turns out we now should never call these functions
with a zero size, otherwise something is wrong in the logic.
Change-Id: Ie414fcbec95486c169190470a71f2cff0843782a
also change lossless encoder logic, which was relying on explicit
NULL return from WebPSafeMalloc(0)
renamed function to CheckSizeArgumentsOverflow() explicitly
addresses issue #138
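The size check can be sketched with a division-based overflow test that also treats a zero size as a caller bug. `SizeArgumentsOk()`, `SafeMallocSketch()` and the `MAX_ALLOCABLE_MEMORY` cap are hypothetical. They are not the real `CheckSizeArgumentsOverflow()` / `WebPSafeMalloc()`, and the cap value is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap on a single allocation (value chosen for the sketch). */
#define MAX_ALLOCABLE_MEMORY ((uint64_t)1 << 31)

/* Returns 1 if nmemb * size is non-zero and fits under the cap. */
static int SizeArgumentsOk(uint64_t nmemb, size_t size) {
  assert(nmemb > 0 && size > 0);          /* a zero size signals a logic bug */
  if (nmemb == 0 || size == 0) return 0;  /* still refuse in release builds */
  /* Dividing instead of multiplying avoids overflow in the check itself. */
  return (uint64_t)size <= MAX_ALLOCABLE_MEMORY / nmemb;
}

static void* SafeMallocSketch(uint64_t nmemb, size_t size) {
  if (!SizeArgumentsOk(nmemb, size)) return NULL;
  return malloc((size_t)(nmemb * size));
}
```

Because the sketch refuses zero sizes outright, a caller can no longer rely on a NULL return from a zero-byte request. That is the situation the lossless encoder had to be changed for.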
Change-Id: Ibbd51cc0281e60e86dfd4c5496274399e4c0f7f3
- precompute the filtering strength once and for all at the beginning,
instead of per-macroblock
- reduce size of VP8MB struct from 8 bytes to 4.
- removed VP8StoreBlock() accordingly
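Precomputing the filter strength means deriving per-segment values once per frame, so each macroblock only carries a small index into a table. The sketch below applies this to the VP8 interior limit, using the interior-limit rule described in RFC 6386. `InteriorLimit()`, `PrecomputeLimits()` and `NUM_SEGMENTS` are hypothetical names. The sketch also ignores mode and reference-frame deltas, which a real decoder would fold into the table as well.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SEGMENTS 4  /* VP8 defines up to 4 segments */

/* Interior limit from a filter level (0..63) and the frame's sharpness (0..7),
   following the rule described in RFC 6386. */
static int InteriorLimit(int level, int sharpness) {
  int limit = level;
  assert(level >= 0 && level <= 63 && sharpness >= 0 && sharpness <= 7);
  if (sharpness > 0) {
    limit >>= (sharpness > 4) ? 2 : 1;
    if (limit > 9 - sharpness) limit = 9 - sharpness;
  }
  return (limit < 1) ? 1 : limit;
}

/* Hypothetical per-frame table: computed once, then each macroblock only
   needs its segment id to look up its limit. */
static void PrecomputeLimits(const int levels[NUM_SEGMENTS], int sharpness,
                             uint8_t limits[NUM_SEGMENTS]) {
  int s;
  for (s = 0; s < NUM_SEGMENTS; ++s) {
    limits[s] = (uint8_t)InteriorLimit(levels[s], sharpness);
  }
}
```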
Change-Id: Icf3d329473e21c464770be3d72a04c9ee4c321f2
GetCoeffs is (by far) the most time-consuming function of the decoder.
No speed change (unfortunately), but the main loop is somewhat clearer.
Change-Id: I78f1c10cadc2c8696c041f5cbda86cab92cc6598
* treat the last coeff as a special case
* re-arrange the inner code to be shorter
* replace some VP8EncBands[n] by n, for n = 0 or 1
Change-Id: I71e17b014cffad7b073e787fde06260905a6953f
The main advantage is that you can sometimes avoid the use of uint64_t,
sticking to 32-bit only.
The default is still BITS=32; this is mainly "just in case".
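The sketch below shows how a BITS setting can pick the bit-buffer type at compile time. It assumes the reader's value word must hold BITS freshly loaded bits plus up to 8 not-yet-consumed ones. Under that assumption, BITS <= 24 fits in a `uint32_t`, while BITS=32 needs a `uint64_t`. `bit_t` and `LoadBits()` are illustrative names here and are not copied from the decoder.

```c
#include <assert.h>
#include <stdint.h>

#ifndef BITS
#define BITS 32  /* bits loaded per refill; the default named above */
#endif

/* Assumption for this sketch: the value word must hold BITS fresh bits
   plus up to 8 not-yet-consumed ones, so only BITS <= 24 fits in 32 bits. */
#if (BITS > 24)
typedef uint64_t bit_t;
#else
typedef uint32_t bit_t;
#endif

/* Hypothetical helper: read BITS / 8 bytes in big-endian order. */
static bit_t LoadBits(const uint8_t* p) {
  bit_t v = 0;
  int i;
  assert(BITS % 8 == 0 && BITS <= 56);
  for (i = 0; i < BITS / 8; ++i) v = (v << 8) | p[i];
  return v;
}
```

Building with `-DBITS=24` would select the 32-bit type and load three bytes per refill.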
Change-Id: Id694028793117ba822c37d46ef6c52fa0afed4ac
- Separate out mux.h and demux.h
- muxtypes.h: new header for data types common to mux/demux
- Move some misc read/write utilities to utils/utils.h
- Remove some duplicate methods.
- Separate out mux/demux libraries
Change-Id: If9b9569b10d55d922ad9317ef51710544315d6de
We don't need to use the exact forward transform,
since it's only a rough evaluation.
-> Removed some shifts and rounding constants.
Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33
In particular, this removes any unnecessary FRGM/ANMF/ANIM chunks, and
indirectly leads to removal of unnecessary VP8X chunks as well.
This is especially useful for GIF to WebP conversion - it saves 56 bytes
(ANMF: 16+8 bytes, ANIM: 6+8 bytes, VP8X: 10+8 bytes) for non-animated GIFs.
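The 56-byte figure follows from the RIFF chunk layout. Each chunk has an 8-byte header (4-byte fourCC plus 4-byte little-endian size) followed by its payload, padded to an even length. `ChunkDiskSize()` is a hypothetical helper that reproduces the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_HEADER_SIZE 8u  /* 4-byte fourCC + 4-byte little-endian size */

/* On-disk size of a RIFF chunk: header + payload, padded to an even length. */
static uint32_t ChunkDiskSize(uint32_t payload_size) {
  assert(payload_size <= UINT32_MAX - CHUNK_HEADER_SIZE - 1u);
  return CHUNK_HEADER_SIZE + payload_size + (payload_size & 1u);
}
```

ANMF (16 + 8) + ANIM (6 + 8) + VP8X (10 + 8) = 24 + 14 + 18 = 56 bytes.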
Change-Id: I3b50a96ca585844c421b0fa4cd8593e52c3f95c5
This is a correction to the following change:
a00a3daf5b Use 'frgm' instead of 'tile' in webpmux parameters
Change-Id: I8fa0bce98efdde38827fd25712017a98a6ea7388
- Allow a duration of 0
- Rename LOOP chunk to ANIM and add the background color field to it.
- Add a disposal method field for each animation frame.
- Modify webpmux.c binary interface to allow the input of background color
and disposal methods. Also make '-loop' and '-bgcolor' arguments optional
with some default values.
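Reading the ANIM chunk's 6-byte payload can be sketched as follows. The layout of a 4-byte background color followed by a 2-byte loop count matches the 6-byte ANIM payload counted elsewhere in this log. Little-endian storage and "0 means loop forever" follow the WebP container specification. `AnimParams` and `ParseAnimPayload()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for the ANIM payload: background color, then loop count. */
typedef struct {
  uint32_t bgcolor;     /* read as a little-endian 32-bit value */
  uint16_t loop_count;  /* 0 means loop forever */
} AnimParams;

static int ParseAnimPayload(const uint8_t* data, size_t size, AnimParams* out) {
  if (data == NULL || out == NULL || size < 6) return 0;  /* payload is 6 bytes */
  out->bgcolor = (uint32_t)data[0] | ((uint32_t)data[1] << 8) |
                 ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
  out->loop_count = (uint16_t)(data[4] | (data[5] << 8));
  return 1;
}
```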
Change-Id: I807372a61cdb8a0d3080ae3552caf2848070bf4d
10-15% faster encoding.
Almost the same output, binary-wise. The main difference is
that we can't compute the uv_alpha susceptibility, which means there
can be subtle differences with different -sns values.
Change-Id: Id1b1a50929bf125b6372212fee1ed75a3bed975f
- Also, use the term 'fragments' instead of 'tiling' in code
- This makes code consistent with the spec.
Change-Id: Ibeccffc35db23bbedb88cc5e18e29e51621931f8
- Make ANMF and FRGM chunks hierarchical so that they encompass all chunks of
that frame.
- Use this in demuxer: stop parsing a frame if all image data for it isn't
available yet. Thus, we have frame-level incremental support; that is,
all frames that are fully available can be parsed.
- Note: We still keep incremental support for single images - so that they can
be decoded with incremental decoding.
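With ANMF/FRGM chunks enclosing all of a frame's sub-chunks, a demuxer can decide from the chunk header alone whether a frame is fully available. The sketch below shows that check. `FrameIsComplete()` is a hypothetical helper, and it ignores the optional padding byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: 'chunk' points at an ANMF/FRGM chunk header and
   'available' bytes of it have arrived so far. Since the chunk encloses all
   of the frame's sub-chunks, the frame is parseable once its payload is in. */
static int FrameIsComplete(const uint8_t* chunk, size_t available) {
  uint32_t payload;
  assert(chunk != NULL);
  if (available < 8) return 0;  /* fourCC + size not complete yet */
  payload = (uint32_t)chunk[4] | ((uint32_t)chunk[5] << 8) |
            ((uint32_t)chunk[6] << 16) | ((uint32_t)chunk[7] << 24);
  return (uint64_t)(available - 8) >= payload;
}
```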
Change-Id: Id1585b16b06caee1d84009c42a25d2de29fa6135
Use separate fourCCs "XMP " and "EXIF" instead of a common "META".
Also, some refactoring in webpmux.c
Change-Id: Iad3337e5c1b81e785c60670ce28b1f536dd7ee31
The number of pairs selected is limited to between 25% of the histogram
images (at start) and the number of histogram images left at any iteration.
Increase the range of iter_mult.
Removed min_cluster_size as a parameter for tuning HistogramCombine.
Change-Id: Ia4068cd7af4d0f63c5af9001aceda8a40b9de740
* commit 'v0.2.1':
Update ChangeLog
update NEWS
bump version to 0.2.1
libwebp: validate chunk size in ParseOptionalChunks
cwebp (windows): fix alpha image import on XP
autoconf/libwebp: enable dll builds for mingw
[cd]webp: always output windows errors
fix double to float conversion warning
cwebp: fix jpg encodes on XP
VP8LAllocateHistogramSet: fix overflow in size calculation
GetHistoBits: fix integer overflow
EncodeImageInternal: fix uninitialized free
fix the -g/O3 discrepancy for 32bit compile
fix the BITS=8 case
Make *InitSSE2() functions be empty on non-SSE2 platform
make *InitSSE2() functions be empty on non-SSE2 platform
make VP8DspInitNEON() public
Conflicts:
src/Makefile.am
src/dsp/dec_neon.c
Change-Id: Iddc5152e4a6892db96c12d7c3f74adbc85fe6178
Contributed by Wayne Chen (datoudatou at gmail dot com)
+ some header cleanup
+ remove the NEON suffix in static functions
Change-Id: I75bf5e9b54cf5e1acc53764c6f081d61690f8e3d
(implements the backward and forward transforms in the encoder)
original patch by Wayne Chen (datoudatou at gmail dot com)
Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57
- Changed the quality range in which the more aggressive heuristic
(BackwardReferencesTraceBackward) is run to quality > 10
(instead of quality > 25).
- Limit the backward-ref window size to 16*width and 256*width for lower
qualities ([0, 25[ and [25, 50[ respectively), instead of a 1M window.
- Evaluate the params for HashChainFindCopy outside this function call
and pass them in, instead of recomputing them for every call.
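The window-size rule can be sketched as a function of quality and image width. `WindowSizeForQuality()` is hypothetical, and `MAX_WINDOW_SIZE` stands in for the "1M" window as `1 << 20`, which is an assumption. Clamping the lower-quality windows to that maximum is also an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the "1M" window mentioned above. */
#define MAX_WINDOW_SIZE (1 << 20)

static int WindowSizeForQuality(int quality, int width) {
  int window;
  assert(quality >= 0 && quality <= 100 && width > 0 && width <= 16384);
  if (quality < 25) {
    window = 16 * width;   /* [0, 25[ */
  } else if (quality < 50) {
    window = 256 * width;  /* [25, 50[ */
  } else {
    return MAX_WINDOW_SIZE;
  }
  /* assumption: never exceed the default window */
  return (window < MAX_WINDOW_SIZE) ? window : MAX_WINDOW_SIZE;
}
```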
Change-Id: If9eedfc14b978e7632d7cf69c96186e2910b0554
the max wasn't checked, leading to a rollover case, possibly exploitable.
Additionally, check the RIFF size early to avoid similar issues.
pulled from chromium:
http://codereview.chromium.org/11229048/
Change-Id: I4050b13a7e61ec023c0ef50958c45f651cf34c49
the multiplications for total_size were done with integers,
possibly overflowing, before being promoted to 64-bit for the addition
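The bug pattern and its fix can be shown in a few lines, assuming a 32-bit `int`. The sketch uses `uint32_t` operands so that the wraparound is well defined; with plain `int`, as described above, signed overflow is undefined behavior. `TotalSizeBuggy()` and `TotalSizeFixed()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: w * h is computed in 32-bit arithmetic and wraps
   before being widened to 64 bits for the addition. */
static uint64_t TotalSizeBuggy(uint32_t w, uint32_t h, uint64_t extra) {
  return w * h + extra;
}

/* Fixed: widen one operand first so the product is computed in 64 bits. */
static uint64_t TotalSizeFixed(uint32_t w, uint32_t h, uint64_t extra) {
  return (uint64_t)w * h + extra;
}
```

For 70000 x 70000 the fixed version yields 4900000000. The buggy one yields that value modulo 2^32, which is 605032704.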
Change-Id: I32c3a6400fc2ef120c38e01a8693f4cb1727234d
huff_image_size was a size_t (= 32 bits in 32-bit builds), which could
roll over, causing an incorrectly sized allocation and a crash in lossless
encoding.
fixes issue #128
Change-Id: I0f20cee98c29b2b40b02607930b6b7a7ca56996d
in debug mode, some float operations have their intermediate
values stored in memory rather than kept in the FPU (which
has 80-bit precision).
Several fixes are possible (breaking long calculations into
atomic steps, for instance), but the simplest of all is just
turning the cost[] array into float* instead of double*.
The code is a tad faster, and I didn't see any major output
size difference.
Change-Id: I053e6d340850f02761687e072b0782c6734d4bf8
this will avoid the "dec_neon.o has no symbol" warning.
No change in binary size observed on Linux.
Change-Id: Ifd83dfc6a0c61905481599b06cb5e711f55efa7d
the max wasn't checked, leading to a rollover case, possibly exploitable.
Additionally, check the RIFF size early to avoid similar issues.
pulled from chromium:
http://codereview.chromium.org/11229048/
Change-Id: Ifebc712bf3d3de0129b76ca4c57c68e062abc429