Another store-to-load forwarding block was detected in the function
FTransform.
FTransform saves its output data as four stores of 8 bytes each. When this
data is later loaded by the QuantizeBlock function in one chunk of 16 bytes,
it causes a store-to-load forwarding block.
The fix was made in the FTransform function: each pair of consecutive 8-byte
values is merged into one 16-byte register and stored to memory at once.
This fix gives a ~21% function-level gain and a 1.6% user-level gain.
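A minimal sketch of the merge described above, not the actual FTransform
code: StoreTwoRows is a hypothetical helper, and the row layout (four
int16_t coefficients per 8-byte row) is an assumption made for illustration.
It packs two 8-byte halves into one SSE2 register and writes them with a
single 16-byte store, so a later 16-byte load can be forwarded from one store.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: writes two 8-byte rows to 'out' as one 16-byte store.
static void StoreTwoRows(const int16_t row0[4], const int16_t row1[4],
                         int16_t out[8]) {
#if defined(__SSE2__)
  // Each load fills the low 8 bytes of a register and zeroes the high 8.
  const __m128i r0 = _mm_loadl_epi64((const __m128i*)row0);
  const __m128i r1 = _mm_loadl_epi64((const __m128i*)row1);
  // r0 goes to the low half, r1 to the high half of a single register.
  const __m128i both = _mm_unpacklo_epi64(r0, r1);
  _mm_storeu_si128((__m128i*)out, both);  // one 16-byte store
#else
  // Portable fallback so the sketch still builds without SSE2.
  memcpy(out, row0, 4 * sizeof(*out));
  memcpy(out + 4, row1, 4 * sizeof(*out));
#endif
}
```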
Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.
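A sketch of the kind of size check involved, assuming the 8 bytes are a
RIFF-style chunk header (4-byte tag followed by a 4-byte little-endian
payload size). ReadChunkHeader and CHUNK_HEADER_SIZE are hypothetical names
and this is not the ChunkVerifyAndAssign() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_HEADER_SIZE 8  // assumed: 4-byte tag + 4-byte payload size

// Hypothetical reader: returns 1 and fills 'payload_size' when 'data' holds
// a complete header carrying the expected tag, 0 otherwise.
static int ReadChunkHeader(const uint8_t* data, size_t size,
                           const char tag[4], uint32_t* payload_size) {
  // Checking for only 4 bytes would let the size field be read out of bounds.
  if (data == NULL || size < CHUNK_HEADER_SIZE) return 0;
  if (memcmp(data, tag, 4) != 0) return 0;
  *payload_size = (uint32_t)data[4] | ((uint32_t)data[5] << 8) |
                  ((uint32_t)data[6] << 16) | ((uint32_t)data[7] << 24);
  return 1;
}
```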
Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f
new options:
dwebp -alpha_dither
vwebp -noalphadither
When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.
The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.
Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
Sometimes, the error code was not set correctly.
We now return OUT_OF_MEMORY every time it's appropriate
(tested using the MALLOC_FAIL_AT mechanism).
Took the opportunity to clean up the code and tidy the error
codes returned (some were erroneously set to INVALID_CONFIGURATION).
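A small sketch of the failure-injection idea, under stated assumptions:
Status, g_fail_at, TestMalloc and AllocBuffers are all hypothetical names
and do not reflect how MALLOC_FAIL_AT is implemented in the library. It
shows an allocation failure being reported as out-of-memory rather than as
a configuration error.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum {
  STATUS_OK = 0,
  STATUS_OUT_OF_MEMORY,
  STATUS_INVALID_CONFIGURATION
} Status;

// Hypothetical test hook: when set to N > 0, the N-th allocation fails.
static int g_fail_at = 0;

static void* TestMalloc(size_t size) {
  if (g_fail_at > 0 && --g_fail_at == 0) return NULL;  // injected failure
  return malloc(size);
}

// Allocates two buffers of 'n' bytes, mapping each failure to the right code.
static Status AllocBuffers(size_t n, unsigned char** a, unsigned char** b) {
  *a = NULL;
  *b = NULL;
  if (n == 0) return STATUS_INVALID_CONFIGURATION;  // bad input, not OOM
  *a = (unsigned char*)TestMalloc(n);
  *b = (unsigned char*)TestMalloc(n);
  if (*a == NULL || *b == NULL) {
    free(*a);  // free(NULL) is a no-op
    free(*b);
    *a = NULL;
    *b = NULL;
    return STATUS_OUT_OF_MEMORY;
  }
  return STATUS_OK;
}
```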
Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.
Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
this change has the side-effect of using directory names in the
include, silencing a lint warning.
Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.
also: fixes a warning for win64
+ slight speed-up
Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
+ add a WEBP_HAVE_SSE2 to dsp.h
not all 32-bit toolchain configurations will have sse2 enabled by
default
Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called
Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned(), and add some masking branch-less code. Much
faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.
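For illustration, the generic masking trick behind branch-less sign
handling can be sketched as below. ApplySign is a hypothetical helper and
not the VP8GetSigned() code, which also updates the bit-reader state.

```c
#include <stdint.h>

// Returns -v when 'negate' is 1 and v when 'negate' is 0, without branching.
static int32_t ApplySign(int32_t v, int negate) {
  const int32_t mask = -(int32_t)(negate & 1);  // 0, or -1 (all bits set)
  // With mask == -1: (v ^ mask) is ~v, and ~v - (-1) is ~v + 1, i.e. -v.
  // With mask == 0: both operations leave v unchanged.
  return (v ^ mask) - mask;
}
```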
Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).
Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
The luminance needs to be pre- and post-multiplied by
the alpha value when rescaling, for proper averaging.
Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()
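A sketch of why the multiplication matters, assuming 8-bit luma and alpha
samples and a simple two-sample average. AverageWithAlpha is a hypothetical
helper, not the rescaler code.

```c
#include <stdint.h>

// Averages two (luma, alpha) samples so that transparent samples do not
// bleed into the result.
static void AverageWithAlpha(uint8_t y0, uint8_t a0, uint8_t y1, uint8_t a1,
                             uint8_t* y_out, uint8_t* a_out) {
  const uint32_t sum_a = (uint32_t)a0 + a1;
  // Pre-multiply: weight each luma sample by its alpha before summing.
  const uint32_t sum_ya = (uint32_t)y0 * a0 + (uint32_t)y1 * a1;
  *a_out = (uint8_t)((sum_a + 1) / 2);
  // Post-multiply: divide the alpha weight back out, with rounding.
  // A fully transparent result has no meaningful luma, so 0 is used.
  *y_out = (sum_a > 0) ? (uint8_t)((sum_ya + sum_a / 2) / sum_a) : 0;
}
```

With an opaque sample of luma 200 next to a fully transparent sample of
luma 0, this yields luma 200, whereas a plain average of the two luma
values would darken the result to 100.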
Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
gcc was generating very complex code, with one path for each possible
br->len_ value!
also, prettify the mask constants
Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
VP8EncDspInitAVX2 is included in sse2 builds for now; later, a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled
Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
Sometimes, we can write 18 bits or more at a time, and it would
overflow the 32-bit accumulator.
Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)
fixes http://code.google.com/p/webp/issues/detail?id=200
Seems a bit faster (use of local fields for bits_ / used_)
also: added the __QNX__ bswap while at it.
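One way to picture the overflow: if bits are still pending in a 32-bit
accumulator when a wide write arrives, the shifted value can run past bit
31. The sketch below avoids this with a 64-bit accumulator and an explicit
per-call limit. BitWriter, PutBits and MAX_WRITE_BITS are hypothetical
names, and this is not necessarily how the fix itself was done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITE_BITS 32  // hypothetical per-call limit for this sketch

typedef struct {
  uint64_t bits;  // pending bits, least significant bit first
  int used;       // number of pending bits; below 8 after a successful call
  uint8_t* buf;   // output buffer
  size_t pos;     // next byte to write
  size_t cap;     // capacity of 'buf' in bytes
} BitWriter;

// Appends the low 'nbits' bits of 'value'. Returns 0 if the buffer is full.
static int PutBits(BitWriter* bw, uint32_t value, int nbits) {
  assert(nbits >= 0 && nbits <= MAX_WRITE_BITS);
  if (nbits == 0) return 1;
  if (nbits < 32) value &= (1u << nbits) - 1u;
  // At most 7 pending + 32 new bits: always fits in the 64-bit accumulator.
  bw->bits |= (uint64_t)value << bw->used;
  bw->used += nbits;
  while (bw->used >= 8) {  // flush whole bytes
    if (bw->pos >= bw->cap) return 0;
    bw->buf[bw->pos++] = (uint8_t)(bw->bits & 0xff);
    bw->bits >>= 8;
    bw->used -= 8;
  }
  return 1;
}
```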
Change-Id: I876db93a931db15b083cf1d838c70105effa7167