* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
-> remove the 'color_transform' multiplier, use more constants, etc.
This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)
Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.
Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
(instead of per-macroblock)
speed unchanged.
simplified the context-saving for incremental decoding
Change-Id: I301be581bab581ff68de14c4ffe5bc0ec63f34be
VP8GetThreadMethod() may be called with a NULL headers param; correct an
assert.
broken since:
8a2fa09 Add a second multi-thread method
Change-Id: If7b6d1b8f4ec874d343a806cee5f5e6bb6438620
This makes the segmentation overall less prone to
local-optimum or boundary effect.
(and overall, encoding is a little faster)
Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.
more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/
Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.
Original idea by David Eckel (dvdckl at gmail dot com)
Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
New API options: WebPDecoderOptions.flip and 'dwebp -flip ...'
it uses negative stride trick.
Also changed the decoder code to support user-supplied
buffers with negative stride, independently of the
WebPDecoderOptions.flip value.
Change-Id: I4dc0d06f0c87e51a3f3428be4fee2d6b5ad76053
this partially reverts
f626fe2 Detect canvas and image size mismatch in decoder.
the original change would cause calls to e.g., WebPGetInfo to fail until
a portion of the image chunk was available. With lossy+alpha this meant
waiting for the entire ALPH chunk to be received.
this change restores the original behavior -- reporting the values from
VP8X if available -- while retaining some of the added canvas/image size
checks if the image data is available
Change-Id: I6295b00a2e2d0d4d8847371756af347e4a80bc0e
add TransformDC special case, and make the switch function inlined.
Recovers a few of the CPU lost during the addition of TransformAC3
(only on ARM)
Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)
Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.
Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
This is in preparation for a future change where input will
be 16bit instead of 12bit
No speed diff observed.
Note that the NEON implementation was using 32bit calc already.
Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this
Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)
Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24