Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.
Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
(cherry picked from commit 64c844863a)
Simply get rid of an intermediate buffer of size width x height, by
using the fact that stride == width in this case.
Change-Id: I92376a2561a3beb6e723e8bcf7340c7f348e02c2
(cherry picked from commit edccd19436)
doing so is not part of ISO C; removes some pedantic warnings
Change-Id: I739ad8c5cacc133e2546e9f45c0db9d92fb93d7e
(cherry picked from commit 96e948d7b0)
The output surface CAN be changed inbetween calls to
WebPIUpdate() or WebPIAppend(), but with precautions.
Change-Id: I899afbd95738a6a8e0e7000f8daef3e74c99ddd8
(cherry picked from commit ff885bfe1f)
This applies to images with optional chunks (e.g. images with ALPH
chunk,
ICCP chunk etc). Before this, the incremental decoding used to work like
non-incremental decoding for such files, that is, no rows were decoded
until
all data was available.
The change is in 2 parts:
- During optional chunk parsing, don't wait for the full VP8/VP8L chunk.
- Remap 'alpha_data' pointer whenever a new buffer is allocated/used in
WebPIAppend() and WebPIUpdate().
Change-Id: I6cfd6ca1f334b9c6610fcbf662cd85fa494f2a91
(cherry picked from commit ead4d47859)
Start VP8EncLoop/VP8EncTokenLoop only if VP8EncStartAlpha succeeded.
Change-Id: Id1faca3e6def88102329ae2b4974bd4d6d4c4a7a
(cherry picked from commit 67708d6701)
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
(cherry picked from commit e7d9548c9b)
Tuned the cross_color transform parameter (step) for lower quality
levels. This change gives speedup of 20% at lower qualities (25) and 10% at
moderate quality level (50) with a loss of 0.25% in compression density.
Also removed TODO for cross_color transform. Observed good correlation of
this with the predict transform.
Change-Id: I8a1044e9f24e6a5f84295c030fd444d0eec7d154
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.
-> switch generic use-LUT detection
Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.
Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
'mem' was being offset once by DO_ALIGN() then shifted 'nz_size' which
would end up accounting for more than ALIGN_CST and exceed the allocation.
broken since:
9bf3129 align VP8Encoder::nz_ allocation
Change-Id: I04a4e0bbf80d909253ce057f8550ed98e0cf1054
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"
Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.
Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
Simply get rid of an intermediate buffer of size width x height, by
using the fact that stride == width in this case.
Change-Id: I92376a2561a3beb6e723e8bcf7340c7f348e02c2
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.
also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )
Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
it's not often the case, but could happen, that chroma has non-zero
coeff but luma hasn't. In such case, we should skip luma right away
Change-Id: I9515573ffaec8aad8b069d2c02ffbda4a6eff97c
Removed a call to WebPParseHeaders() inside VP8GetHeaders(). This was not needed
anyway, as all call flows already call WebPParseHeaders() before calling
VP8GetHeaders().
This avoids duplicate calls to WebPParseHeaders().
Change-Id: Icb2d618bd26c44220d956c17a69c9c45a62d5237
The output surface CAN be changed inbetween calls to
WebPIUpdate() or WebPIAppend(), but with precautions.
Change-Id: I899afbd95738a6a8e0e7000f8daef3e74c99ddd8
This applies to images with optional chunks (e.g. images with ALPH
chunk,
ICCP chunk etc). Before this, the incremental decoding used to work like
non-incremental decoding for such files, that is, no rows were decoded
until
all data was available.
The change is in 2 parts:
- During optional chunk parsing, don't wait for the full VP8/VP8L chunk.
- Remap 'alpha_data' pointer whenever a new buffer is allocated/used in
WebPIAppend() and WebPIUpdate().
Change-Id: I6cfd6ca1f334b9c6610fcbf662cd85fa494f2a91
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
user can now call WebPEncode() with any YUVA or ARGB format, for
lossy or lossless compression
also: simplified error reporting, which is done in WebPPictureARGBToYUVA()
and WebPPictureYUVAToARGB()
Change-Id: Ifb68909217175bcf5a050e5c68d06de9849468f7
(cherry picked from commit 07d87bda1b)
transfer ownership of chunk passed in ChunkSetNth(). prevents freeing
the chunk in e.g., MuxImageParse() when a partial assignment (alpha but
no image) has occurred.
Change-Id: Ia69656b04fdf50f098f3816b54abd4e191248de3
Saturation was done on input coeff, not quantized one.
This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.
addresses issue #145
Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
* merge cost calculation functions (BitsEntropy() and HuffmanCost())
* have HistogramAdd() specialized into separate functions
* use threshold to bail-out early
* revamp code a bit
* also: save memory by freeing free(histogram_image)
Change-Id: I8ee5d2cfa1462d5d6ea6361f5c89925a3720ef55
* VP8L shouldn't have an alpha chunk
* expect an animation to only contain frames, not a mix of image chunks
* enforce ANIM/ANMF order
* expect a full frame in a complete file
Change-Id: I953a8b6058f9bc00f1d042635548f158abdf6fce
subdirectories with more than one target can have the install targets
run in parallel with make -jN. group the shared headers in one place to
produce a common install target.
Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451
Reported to eventually be 4% on ARM
(see https://code.google.com/p/webp/issues/detail?id=134 for details)
We might activate it selectively later...
Output values is not bitwise the same as the LUT-based
version, but difference is only +/-1 at max.
Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07
this allows DLLs to be built under mingw and sets up a more obvious
dependency in the shared objects
Change-Id: I33e8a6132a16ca49563492438a1b3b74be9ed6a1
expands to e.g., -lpthread/-pthread, etc. if --enable-threading is used
and additional libs/flags are required
Change-Id: I8e8a7b44450bee32ddc58097e1e309d972b1092a
This is required for WebP lossy+Alpha images, where Alpha channel is taking
60-70% of the compression (CPU) cycles.
Also evaluated on 1000 PNG corpus and overall compression speed
is 15-40% better for lossy (PNG+Alpha) compression.
The pure lossless compression numbers are almost same (or little
better) with this change.
Change-Id: I9e5ae7372ed6227a9a5b64cd9cff84c747195a57
using token-buffer (that is: slightly more memory. O(output_size))
This change is ON by default. To return to previous behaviour, use
'cwebp -low_memory' or set config.low_memory to true.
Side-effect of this new mode: it forces 1 partition only (which was
default anyway), and makes some statistics about the bitstream
no longer available. cwebp will no longer report 'intra4-coeffs', etc.
This mode also doesn't work (yet) with multi-pass, and -low_memory
is currently forced for multi-pass.
also: reversed the flag: USE_TOKEN_BUFFER -> DISABLE_TOKEN_BUFFER
also: fixed the kAverageBytesPerMB estimate
Change-Id: I4ea80382038d6df4309663e0cb7bd88d9bca9cf1
When a ANMF/FRGM chunk size (read from file) is smaller than ANMF/FRGM header
size (which is constant and implicit), the parser should report an error.
Change-Id: I91d71889937f5133a97f1e83d5254cb2d7f37028
broken since:
ad25032 Merge "multi-threaded alpha encoding for lossy"
this produced an error due to an empty VP8TBuffer struct.
Change-Id: I640809d07d20092c1d660e2b59b58a62a12e4371
Also tweak the code for single image and fragmented image cases, so that
dmux->num_frames_ is always correct.
Change-Id: I31e6904222e4d96a54a0d8c8aa73d43b7a9094e7
new option: 'cwebp -mt ...'
new config flag: config.thread_level
(allowed thread_level are 0 or 1 for now. Maybe more later...)
If -mt is activated (and WEBP_USE_THREAD is used for compile), the alpha-compression
will be done in parallel to RGB coding for lossy. Can save quite a bit of latency...
Has no effect for lossless encoding.
Change-Id: I769d0bf90e7380cf99344ad62cd77277f4df5a46
It now returns ALPHA_FLAG for lossless images with alpha, that don't
have a VP8X chunk.
This is consistent with similar methods WebPMuxGetFeatures() and
WebPGetFeatures().
Change-Id: Ia3a4ca8f3e0f102f478bd33e0727ca5be98593df
larger values are still dealt with in the .cc
~5% faster encoding
Output size is slightly different (variably), because of
different floating-point calculation ordering.
Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4
-> split libraries further into decoder / encoder
-> add libwebpdecoder.a in Makefile.unix
-> make dwebp link against libwebpdecoder.a in Makefile.unix
also: in makefile.unix, pass EXTRA_FLAGS to LDFLAGS too
(otherwise, -m32 wouldn't work, e.g.)
Change-Id: Ief3da02a729dd86bbaf949ed048836716941657f
signed integer overflow behavior is undefined, split PrefixEncode() to
two branches to avoid this.
Change-Id: I6e2761d0d77f0aaceafdc4e07232e089c22beb64
* add SSE2 variant for lossless
* speed-up TransformColor calls using specialized TransformColorBlue/Red
* Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously.
This latter changes the output size a little bit.
Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89
Simplify and re-organize the VP8L bit-reader functions
(e.g.: the 40-bit look-ahead code was helping much)
Speed-up with LBITS=64, on arm7-a:
=> before:
./dwebp_justify_24_neon -v bryce_ll.webp
Time to decode picture: 11.393s
File bryce_ll.webp can be decoded (dimensions: 11158 x 2156).
...
=> after (LBITS=64): Time to decode picture: 9.953s
making the VP8L bit-reader in 32 bit mode is going to be
harder (because we need to be able to read two symbols
at a time, each with max length 15 bits)
Change-Id: I89746fb103b87b5e2fd40a3208a6fbc584b88297
This flag will make the code use no uint64, no asm, and no fancy
trick, but instead aim at being as simple and straightforward as
possible.
Main use is to help emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are the typical timing for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
This option remaps internal parameters to better match
the expected compression curve of JPEG and produce output files
of similar size, but with better quality.
Change-Id: I96a1cbb480b1f6a0c6845a23c33dfd63f197b689
If it's not a partial file and parser returns PARSE_NEED_MORE_DATA, then
consider it to be PARSE_ERROR.
Change-Id: Id652a345bd2a9f574970272dd0a00517de113215
If a NULL pre-allocated buffer is passed, a buffer will be automatically
allocated.
+ add some parameter checks.
reported in http://code.google.com/p/webp/issues/detail?id=139
Change-Id: I9e14ed97db30ee12e46b5e92aac7eeaaeb99bfd5
store values to a temporary variable before calling functions that take
vector types.
removes non-standard constructs such as:
(uint8x8x2_t){{ a, b }}
fixing:
src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3
arguments, but takes just 2
Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc
Fixes: https://code.google.com/p/webp/issues/detail?id=140
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.
4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.
see http://code.google.com/p/webp/issues/detail?id=134
Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
When the config option '--enable-libwebpdecoder' is specified, the
lean decoder library 'libwebpdecoder' will be created in addition to
libwebp. Also dwebp binary will be linked to libwebpdecoder, if this
config option is specified.
Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a
Actually, it turns out we now should never call these functions
with a zero size, otherwise something is wrong in the logic.
Change-Id: Ie414fcbec95486c169190470a71f2cff0843782a
also change lossless encoder logic, which was relying on explicit
NULL return from WebPSafeMalloc(0)
renamed function to CheckSizeArgumentsOverflow() explicitly
addresses issue #138
Change-Id: Ibbd51cc0281e60e86dfd4c5496274399e4c0f7f3
- precompute filtering strength once for all at the beginning
instead of per-macroblock
- reduce size of VP8MB struct from 8 bytes to 4.
- removed VP8StoreBlock() accordingly
Change-Id: Icf3d329473e21c464770be3d72a04c9ee4c321f2
GetCoeffs is (by far) the most consuming function of the decoder.
No speed change (unfortunately), but the main loop is somehow clearer.
Change-Id: I78f1c10cadc2c8696c041f5cbda86cab92cc6598
* treat the last coeff as a special case
* re-arrange the inner code to be shorter
* replace some VP8EncBands[n] by n, for n = 0 or 1
Change-Id: I71e17b014cffad7b073e787fde06260905a6953f
The main advantage is that you can avoid the use of uint64_t
some times, sticking to 32bit only.
Default still is BITS=32, this is mainly "in case".
Change-Id: Id694028793117ba822c37d46ef6c52fa0afed4ac
- Separate out mux.h and demux.h
- muxtypes.h: new header for data types common to mux/demux
- Move some misc read/write utilities to utils/utils.h
- Remove some duplicate methods.
- Separate out mux/demux libraries
Change-Id: If9b9569b10d55d922ad9317ef51710544315d6de