The workaround for GCC ARM must not be applied when another toolchain
(like MSVC) is used for the build.
Change-Id: I11ec4558902063ccb085d3f435e24b3a60739dd5
'upper' could be NULL and it would be increased.
But that is for predictor zero that does not use 'upper'.
Change-Id: Icd4ae6792cc55ea021b4f828c3dbdb5f03e120d8
For some exact resonance the over-quantization was exactly
compensating the under-quantization, leading to resonance
and strange patterns.
-> we special-handle the very flat blocks, hopefully for the
greater good (and not just the bad-resonance case).
For 'fast mode' (-m 3 or less), we just pay special attention
to the border of the image, where the oscillation / instability
usually starts. For the inner part of the image, since we're not
doing rd-opt, it's harder to fix anything.
Overall, on 'regular' images, the change is written the noise,
often leading to overall faster encoding (because of the short-cut).
BUG=webp:432
Change-Id: Ifaa8286499add80fd77daecf8e347abbff7c3a15
missed in a788b49
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)
Change-Id: I52dcd9cd613107f5424177c277785b92430bffb7
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)
Change-Id: I7f08a836ddcf777454dfd5b877a81b62b2abac86
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -12 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 244 (8-bit,
unsigned)
Change-Id: I053c92301e55dcb0cae89a7733636283da942176
no change in object code
from clang-7 integer sanitizer:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
1955895199 (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char')
changed the value to 159 (8-bit, unsigned)
Change-Id: I0c3022339e34b9c9af03167ab827ade677973644
_mm_set1_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 65280 (32-bit, signed) to
type 'short' changed the value to -256 (16-bit, signed)
Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 189 (32-bit, signed) to
type 'char' changed the value to -67 (8-bit, signed)
implicit conversion from type 'int' of value 128 (32-bit, signed) to
type 'char' changed the value to -128 (8-bit, signed)
implicit conversion from type 'int' of value 33909 (32-bit, signed) to
type 'short' changed the value to -31627 (16-bit, signed)
Change-Id: Id6b191b2c06881e27d447eeb1ff5bb2c1857b6ba
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 255 (32-bit, signed) to
type 'char' changed the value to -1 (8-bit, signed)
implicit conversion from type 'int' of value 33153 (32-bit, signed) to
type 'short' changed the value to -32383 (16-bit, signed)
Change-Id: Ic88c8ef3d00146d34f53a560582db673f818370d
We saturate the result to [0..255]
It's the easiest and safest, given the wide variety of scaling
range we cover: we're not using floats, so precision is always
an issue at one end or the other of the scaling spectrum.
we also use:
round(a - floor(b))
instead of:
floor(a - round(b))
to handle difficult cases (ratio ~= .99, e.g.)
MIPS code is still disabled (and wrong)
Change-Id: I18d3f5ddc4c524879c257b928329b1c648fa7fb5
Move IsFlat to its own header. This allows it to continue to be
inlined. Using the RTCD and creating a distinct function slows down arm
builds.
flower mug
C 3.59 2.12
NEON 3.47 2.01
BUG=b/118740850
Change-Id: Id77e8f76d9e9790c498806e7070bbe37c10bc2e9
Direct copy of sse2. Slight improvement because neon has
abs().
flower.ppm had minimal improvement. Somewhat expected because
GetResidualCost_C is only ~3.6%
mug.ppm had a better improvement because GetResidualCost_C is
almost 9%.
C 2.150
NEON 2.130
BUG=b/118740850
Change-Id: Ibc0dd97a81596635f5599cf568205974b4fd2597
Much faster with aarch64. Still somewhat faster without vmaxv.
C: 3.700s
ArmV7: 3.675
aarch64: 3.600
BUG=b/118740850
Change-Id: I3be852da89633eca4bddce443c87f5e4a2f55868
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.
Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
We should be using 'floor' when doing the final divide.
-> new MACRO is MULT_FIX_FLOOR()
XXX*** Mips code is DISABLED for now ***XXX
I'll update and re-enable it in a later
patch, since this code needs some refactoring first.
BUG=oss-fuzz:9179
Change-Id: Ic0693cdca4e71f5beab1029475e35c4d06b12d13
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
(cherry picked from commit d77bf512bd)
the 'accum' variable can be larger than 15b for large
rescale values.
Assert triggered:
src/dsp/rescaler_sse2.c:249: RescalerExportRowExpand_SSE2: Assertion `v >= 0 && v <= 255' failed.
src/dsp/rescaler_sse2.c:350: RescalerExportRowShrink_SSE2: Assertion `v >= 0 && v <= 255' failed.
-> fall back to C implementation in this case for now
Change-Id: I7ea1cb72301cafc1459be403f6a6f4e3cbc89bb1
with WEBP_NEON_OMIT_C_CODE the default _C functions won't be set and
with WEBP_REDUCE_CSP the NEON functions won't be either triggering an
assert for an empty table member.
BUG=chromium:792627
Change-Id: I8d2d430eaa37bb92885b61a3dd39f961924a8def
RGB to YUV conversion was not using SSE to finish up the row.
End data is now copied to a buffer big enough to fit in a
SSE register.
(UPSAMPLE_LAST_BLOCK was already using that trick).
Change-Id: Ie539bcbe570a643a774aa88263503c0d2c41890f
only supported ones are: RGBA/BGRA/rgbA/bgrA (decoder)
as well as: WebPPictureImportRGB/RGBX/RGBA (encoder).
(note: extras/get_disto is affected too)
Change-Id: If6c4f95054ca15759c4e289fb3b4c352b3521c2c
remove auto-filter (-af) support and make WebPPictureCopy,
WebPPictureIsView, WebPPictureView, WebPPictureCrop, and
WebPPictureRescale noops.
Change-Id: If39d512cc268a0015298a1138dbc94feb86575e5
with gcc-4.8, clang-4.0.1/5 this is no faster (actually up to 2x slower)
than the code generated for memset (0x01010... * dst[-1]). shuffles in
sse4 recover a bit, but performance is still down.
Change-Id: Ie85e8353f8ede559d0b05a1d388787fd18ecc80f