The lossless encoding speed-ups are:
- up to 1% with default parameters
- up to 4% in cruncher mode: -q 100 -m 6
Change-Id: Id92d4bad0b0a2c28c8aa9ff5280eea5717017f30
Replace the 2d histograms with uint32_t 1d versions (to avoid
pointer casting and to use the optimized VP8LAddVectorEq).
Change-Id: I90b0fe98390b49e3fd03e3484289571cf7ae6eca
The histograms count the occurrences of len/dist in entropy images.
Those (at most (1<<14) by (1<<14)) are sub-sampled by at least
MIN_HUFFMAN_BITS == 2, hence at most 24 bits in a histogram value.
At most, we multiply by 19 (because the longest histogram is of
size 40 and we do 40>>1, cf code) for the bit cost. So it all fits
in 32 bits.
Change-Id: Ife24b035f54794851ff31f2fac07901f724c6d7f
This avoids defining a version in each translation unit when using
__declspec(dllexport) which causes failures due to multiply defined
symbols with clang-cl:
lld-link: error: duplicate symbol: VP8GetCPUInfo
>>> defined at CMakeFiles\webpdecode.dir\Debug\src\dec\alpha_dec.c.obj
>>> defined at CMakeFiles\webpdsp.dir\Debug\src\dsp\dec_sse41.c.obj
...
Bug: webp:607
Change-Id: I6cd1ee75b3db984aa513263a05516e867a64925d
in TransformColorBlue; make new_blue an int to avoid:
implicit conversion from type 'int' of value 264 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 8 (8-bit,
unsigned)
Bug: b/229626362
Change-Id: Ife276a59231075788396204e1a192f3b0c6d9e21
add explicit casts in calls to ColorTransformDelta()
clears warnings of the form:
implicit conversion from type 'uint8_t' (aka 'unsigned char') of value
254 (8-bit, unsigned) to type 'int8_t' (aka 'signed char') changed the
value to -2 (8-bit, signed)
Bug: b/229626362
Change-Id: I40618209509508f56d8053f9daa29cf2e6999766
previously the types were changed to int to prevent unsigned overflow
warnings:
6ab496ed fix some 'unsigned integer overflow' warnings in ubsan
clears warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
3724541952 (32-bit, unsigned) to type 'int' changed the value to
-570425344 (32-bit, signed)
implicit conversion from type 'int' of value -3361661 (32-bit, signed)
to type 'unsigned int' changed the value to 4291605635 (32-bit,
unsigned)
Bug: b/229626362
Change-Id: If1eb39c5dd7218d686c3c47fb7df72431b873be4
... when it's not available. Even if the value was discarded and
never used, some msan config were complaining about reading it
and passing it around.
Change-Id: Iab8d24676c5bb58e607a829121e36c2862da397c
after:
ece18e55 dsp.h: respect --disable-sse2/sse4.1/neon
WEBP_USE_* will be set when a module is targeting a particular
instruction set, e.g., sse4.1, and not overridden if WEBP_HAVE_SSE41 is
set, as previously this would ignore the case where the instruction set
was disabled via config.h and the HAVE macro was unset.
dsp.h not ensures WEBP_HAVE_* are set when WEBP_USE_* to cover cases
where the files are built without config.h.
Change-Id: Ia1c2dcf4100cc1081d968acb6e085e2a1768ece6
no change in object code
from clang-7 integer sanitizer:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
1955895199 (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char')
changed the value to 159 (8-bit, unsigned)
Change-Id: I0c3022339e34b9c9af03167ab827ade677973644
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.
Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
when targeting NEON C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.
note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.
this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.
Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
For some reason, gcc has hard time inlining this one...
Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.
Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed
Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
The old implementation in enc/near_lossless.c performing a separate
preprocessing step is used only when a prediction filter is not used,
otherwise a new implementation integrated into lossless_enc.c is used.
It retains the same logic for converting near lossless quality into max
number of bits dropped, and for adjusting the number of bits based on
the smoothness of the image at a given pixel. As before, borders are not
changed.
Then, instead of quantizing raw component values, the residual after
subtract green and after prediction is quantized according to the
resulting number of bits, taking care to not cross the boundary between
255 and 0 after decoding. Ties are resolved by moving closer to the
prediction instead of by bankers’ rounding.
This results in about 15% size decrease for the same quality.
Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f
The code and logic is unified when computing bit entropy + Huffman cost.
Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.
Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.
Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992
Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.
Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
for more speed.
This gives a roughly a 1% speedup for low_effort. But actually this is a
preparation for the upcoming CL that changes RGB values of transparent pixels
based on prediction, which should not be done for low_effort because that would
slightly hurt its performance.
On 1000 PNGs, with quality 0, method 0:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s
Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.
MIPS32 code has been disabled for now, pending a code update.
Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d