* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff
These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.
Now, the SSE2 version is really 2x faster than the C version.
Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
Handle the corner case when VP8LDecodeImage() method is called with an invalid
header data. The lossless decoding doesn't support incremental mode yet.
Return the error status as BITSTREAM error in case not all pixels are decoded
with the provided bit-stream. Also added asserts in the VP8LDecodeImage() method
to validate the decoder header with appropriate/valid data for huffman trees
(htree_groups_ etc).
Change-Id: Ibac9fcfc4bd0a2c5f624bb9d4a2b9f6459aa19ea
-> introduce special case 64b pattern-copy, similar to the 8b one for alpha.
-> use mempcy() for non-overlapping areas
+ cosmetics and homogenezation of the code
Change-Id: I0e65e04b96fec94c009a4614137dfba2a0f98561
ReadSymbol() finishes with a VP8LSetBitPos() call only and could miss an eos_ during the decode loop.
Things are faster because of inlining too.
Change-Id: I2d2a275f38834ba005bc767d45c5de72d032103e
eos_ needs to be set only when superfluous bits have actually
been requested.
Earlier, we were assuming pre-mature end-of-stream to be an error.
Now, more precisely, we mark error when we have encountered end-of-stream *and*
we attempt to read more bits after that.
This handles cases where image data requires no bits to be read
Change-Id: I628e2c39c64f10c443fb51f86b1f5919cc9fd299
if ALPHA_LOSSLESS_COMPRESSION produces a too big file (very rare!),
we fall-back to no-compression automatically.
Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
speed-up is ~1.6% for photographic image to 10% for graphical image
(1000 images corpus was sped up by 5.8 %)
Code by akramarz@google.com and jyrki@google.com
Change-Id: Iceb2e50e6cc761b9315a3865d22ec9d19b8011c6
Split initialization of YUV444Converters[] out of Upsamplers init.
update test for NULL function pointers
Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
rightmost pixel was missing a copy, which could lead to invalid read.
Also added a lower dimension of 4, below which we use the regular conversion.
This is to prevent corner cases, in addition to not being overkill.
Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
_M_IX86 will be defined in mingw builds after including windows.h. as
the gcc inline asm is first, this missing check would only have caused
an error if the code was reorganized.
Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.
Also, less macros.
Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
New function: WebPPictureSmartARGBToYUVA()
This implement smart RGB->YUV conversion.
This is rather undocumented for now, and is triggered using '-pre 4'
preprocessing option.
This is slow-ish and use quite some memory, but should be improvable.
This is somehow a usable beta version.
Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.
Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.
Change-Id: I3ebae4893f953984de4072fda0938411de787a29
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.
Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available
Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.
the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.
Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
forces aligned memory reads (via memcpy) in the VP8 bit reader, useful
for platforms that don't support unaligned loads.
Change-Id: Ifa44a9a1677fbdc6a929520f9340b7e3fcbd6692
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it
+ fixes lossless BGRA output with big-endian toolchains
that do not define __BIG_ENDIAN__ (codesourcery mips gcc)
Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
moves the following to this header:
- htole*() definitions from bit_writer.c
- __BIG_ENDIAN__ fallback define from bit_reader_inl.h
Change-Id: I7fff59543f08a70bf8f9ddac849b72ed290471b1
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.
Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
previously, the final canvas size was adjusted tightly from the
animation frames. Now, it can be specified separately (to be larger, in particular).
calling WebPMuxSetCanvasSize(mux, 0, 0) triggers the 'adjust tightly' behaviour.
This can be useful after calling WebPMuxCreate() if further image addition
is expected.
-> Fixed gif2webp accordingly.
also: made WebPMuxAssemble() more robust by systematically zero-ing WebPData.
Change-Id: Ib4f7eac372cf9dbf6e25cd686a77960e386a0b7f
this will remove a warning about the shift amount not being
an immediate (=constant).
Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
mostly to balance the use of bswap64, some gcc platforms are already
interpreting the default case the same
Change-Id: Icf860f55b3f16bea349a7d721e6d6abeeb4e5cf3
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.
Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.
Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f