Commit Graph

855 Commits

Author SHA1 Message Date
James Zern
a1e5dae0f0 alpha_processing*: use WEBP_RESTRICT qualifier
this helps both auto-vectorization in the C code and the optimized code
generation

Change-Id: Ide570d6be45125ffef7248bdc40e9eb08f00e832
2021-07-07 15:39:21 -07:00
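
A minimal sketch of why a restrict-style qualifier (commit a1e5dae0f0 above) can help code generation. DEMO_RESTRICT and DemoMultiplyRow are hypothetical names made up for illustration; the library's actual WEBP_RESTRICT definition and the functions it was applied to are not shown in this log.

```c
#include <stdint.h>

/* Hypothetical stand-in for a restrict-style qualifier. The real
 * WEBP_RESTRICT definition is not part of this log. */
#if defined(__GNUC__) || defined(_MSC_VER)
#define DEMO_RESTRICT __restrict
#else
#define DEMO_RESTRICT
#endif

/* Scales each source byte by alpha/255 (rounded) into dst.
 * Assumes alpha <= 255. The qualifiers promise the compiler that src
 * and dst never overlap, so it may process several pixels per
 * iteration without re-reading src after every store to dst. */
static void DemoMultiplyRow(const uint8_t* DEMO_RESTRICT src,
                            uint8_t* DEMO_RESTRICT dst,
                            int width, uint32_t alpha) {
  int i;
  for (i = 0; i < width; ++i) {
    dst[i] = (uint8_t)((src[i] * alpha + 127u) / 255u);
  }
}
```

Callers must actually honor the promise: passing overlapping buffers to a function declared this way is undefined behavior.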
James Zern
a2fce86744 WebPRescalerImportRowExpand_C: promote some vals before multiply
avoids integer overflow in extreme cases:
src/dsp/rescaler.c:45:32: runtime error: signed integer overflow: 129 *
16777215 cannot be represented in type 'int'
    #0 0x556bde3538e3 in WebPRescalerImportRowExpand_C src/dsp/rescaler.c:45:32
    #1 0x556bde357465 in RescalerImportRowExpand_SSE2 src/dsp/rescaler_sse2.c:56:5
    ...

Bug: chromium:1196850
Change-Id: I4f923807f106713e113f3eec62a1d1c346066345
2021-06-07 18:59:33 -07:00
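
The overflow reported in a2fce86744 above can be sketched as follows. DemoWeightedSum is a hypothetical helper, not the code in src/dsp/rescaler.c, and it assumes both inputs are non-negative. Widening one operand before the multiply makes the whole product 64-bit.

```c
#include <stdint.h>

/* Hypothetical helper: multiplies a small weight by an accumulated sum.
 * Assumes weight >= 0 and sum >= 0. Casting to a 64-bit type *before*
 * the multiply means 129 * 16777215 is computed in 64 bits instead of
 * overflowing a 32-bit int. */
static uint64_t DemoWeightedSum(int weight, int sum) {
  return (uint64_t)weight * (uint32_t)sum;
}
```

Writing `(uint64_t)(weight * sum)` would not help: the multiply would still happen in int and overflow before the cast is applied.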
James Zern
5d4ee4c3c0 cosmetics: remove use of the term 'dummy'
this is replaced with more inclusive / informative text

Bug: webp:507
Change-Id: Ib77f0c79dd548601bf2bc3169985af4b5edf0a62
2021-03-15 11:39:06 -07:00
Ilya Kurdyukov
01b38ee19a faster CollectColorXXXTransforms_SSE41
3/4% faster overall.

Change-Id: If555c5530238ca0342b8d97b0d708b1bdc888d3f
2021-02-19 20:45:07 +01:00
Ilya Kurdyukov
8886f620c0 Use BitCtz for FastSLog2Slow_C
Change-Id: Icc6068b8934e481e6f17efd30616392e68d504ad
2021-02-19 15:11:42 +01:00
Ilya Kurdyukov
fae416179e faster CombinedShannonEntropy_SSE2
optimized for sparse histograms

Change-Id: I54412f5f8fc53d2598964a5be91f6c54ece3f21b
2021-02-19 13:14:46 +01:00
James Zern
33ddb894b1 lossless_sse{2,41}: remove some unneeded includes
Change-Id: Icd2cffd32b39c6bf017eee353ac04a4b6d337a11
2021-02-18 10:54:09 -08:00
Pascal Massimino
b78494a933 Merge "Fix undefined signed shift."
2021-02-18 16:51:17 +00:00
Vincent Rabaud
e79974cd6a Fix undefined signed shift.
Using the fix from SSE2.

Change-Id: Ie53d0163d97322da5a722c3e49f9d5f057ee1d91
2021-02-18 16:56:22 +01:00
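
The log does not show which shift e79974cd6a changed, so the following is only a generic sketch of the usual remedy, with a hypothetical helper name. Left-shifting a signed int so that the result does not fit in int is undefined behavior; shifting an unsigned value is well defined.

```c
#include <stdint.h>

/* Hypothetical helper: places the low 8 bits of v in the top byte of a
 * 32-bit word. With a plain int, 255 << 24 exceeds INT_MAX and is
 * undefined; converting to uint32_t first makes the shift well defined. */
static uint32_t DemoPackTopByte(int v) {
  return ((uint32_t)v & 0xffu) << 24;
}
```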
Ilya Kurdyukov
a885339448 SSE4.1 versions of BGRA to RGB/BGR color-space conversions
Change-Id: Iacafd2f6402080b02fcbf75831e69c488f447454
2021-02-18 15:32:30 +01:00
Ilya Kurdyukov
a09a647241 SSE4.1 version of TransformColorInverse
Change-Id: I6ba5cb35917eef7a52152c4924eca205b4af7220
2021-02-18 12:42:39 +01:00
James Zern
47f64f6edd filters_sse2: import Chromium change
VerticalUnfilter_SSE2 has long been disabled due to a crash in an
Android emulator that hasn't reproduced elsewhere (crbug.com/654974).
this synchronizes the code for now to avoid needing to locally edit the
file on import.

Bug: 1141126
Change-Id: Ib61aeab93caaff1759606566b9e499eaac1576cf
2021-01-30 11:44:07 -08:00
James Zern
8599571935 disable CombinedShannonEntropy_SSE2 on x86
this function produces different results from the C code due to
use of double/float resulting in output differences when compared to
-noasm.

Bug: webp:499
Change-Id: Ia039b168c0a66da723fb434656657ba1948db8ae
2021-01-18 16:41:44 -08:00
James Zern
ae54553461 dsp.h: allow config.h to override MSVC SIMD autodetection
this fixes builds with cmake targeting visual studio that set
-DWEBP_ENABLE_SIMD=0

BUG=webp:478

Change-Id: I21b61b112c79ff9cbab9e4502a25d3f1fa096c8b
2020-12-03 10:22:04 -08:00
Vincent Rabaud
fc14fc038b Have C encoding predictors use decoding predictors.
libwebp.a in Release mode with no symbols size in bytes:
986430 -> 975114  (-1.1%)

Change-Id: Ia96192a6be2911779e359b72132bdba60b60a13d
2020-12-02 11:54:59 +01:00
Ingvar Stepanyan
52273943c6 Couple of fixes to allow SIMD on Emscripten
- Add `-msimd128` to flags to actually enable WebAssembly SIMD
   when performing SIMD detection. It's currently required in
   addition to `-msse*` / `-mfpu=neon` flags which only perform
   translation of corresponding intrinsics to Wasm SIMD ones.
   See a discussion at emscripten-core/emscripten#12714 for
   automating this and making it easier in the future.
 - Remove compilation branch that prevented definitions of
   `WEBP_USE_SSE` and `WEBP_USE_NEON` on Emscripten even when
   SIMD support was detected at compile-time.
 - Add an implementation of `VP8GetCPUInfo` for Emscripten which
   uses static `WEBP_USE_*` flags to determine if a corresponding
   SIMD instruction is supported. This is because Wasm doesn't
   have proper feature detection (yet) and requires making a separate
   build for the SIMD version anyway.

Change-Id: I77592081b91fd0e4cbc9242f5600ce905184f506
2020-11-18 21:51:41 +00:00
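
The third point of 52273943c6 above, answering feature queries from compile-time flags, might look roughly like this. All names here (DemoCPUFeature, DemoStaticCPUInfo, the DEMO_USE_* macros) are hypothetical; the actual `VP8GetCPUInfo` implementation for Emscripten is not shown in this log.

```c
/* Simulate a build in which compile-time detection enabled SSE2 only.
 * In a real build such macros would come from the build configuration. */
#define DEMO_USE_SSE2

typedef enum { kDemoSSE2, kDemoSSE4_1, kDemoNEON } DemoCPUFeature;

/* Returns 1 if the feature was compiled in, 0 otherwise. There is no
 * runtime probing: the answer is fixed when the module is built. */
static int DemoStaticCPUInfo(DemoCPUFeature feature) {
  switch (feature) {
#ifdef DEMO_USE_SSE2
    case kDemoSSE2: return 1;
#endif
#ifdef DEMO_USE_SSE41
    case kDemoSSE4_1: return 1;
#endif
#ifdef DEMO_USE_NEON
    case kDemoNEON: return 1;
#endif
    default: break;
  }
  return 0;
}
```

Because the result never changes at run time, SIMD and non-SIMD variants have to be shipped as separate builds, as the commit message notes.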
Skal
55a080e50a Add WebPReplaceTransparentPixels() in dsp
with SSE2 implementation.

(Extracted from side experiment)

Change-Id: I62d457fb6643645291cffd6d2d205d4a5ffa4517
2020-09-09 08:15:22 +02:00
Yannis Guyon
47309ef52d webp: WEBP_OFFSET_PTR()
Removes undefined behavior of offsetting NULL.

Change-Id: I7c83d0c913c631c091a5fb128f6d6b46b1d116db
2020-03-20 11:39:06 +01:00
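
One way a NULL-safe offset macro such as the one named in 47309ef52d could be written is sketched below. DEMO_OFFSET_PTR is a hypothetical name and this body is an assumption; the log does not show the definition of WEBP_OFFSET_PTR().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NULL-safe pointer offset: returns NULL when ptr is NULL,
 * otherwise ptr + off. Adding a non-zero offset to a null pointer is
 * undefined behavior in C, so the NULL case is handled explicitly.
 * Note that ptr is evaluated twice, so it must have no side effects. */
#define DEMO_OFFSET_PTR(ptr, off) (((ptr) == NULL) ? NULL : ((ptr) + (off)))
```

A caller that holds an optional row pointer can then advance it unconditionally, for example `row = DEMO_OFFSET_PTR(row, stride);`, without a separate NULL check at every call site.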
James Zern
687ab00e6e DC{4,8,16}_NEON: replace vmovl w/vaddl
4/8/16 fewer instructions

Change-Id: I38fe08722e7b839e3f3e0bf4df7e0fa8e7a0138f
2020-03-05 09:41:14 -08:00
James Zern
1b92fe75a1 DC16_NEON,aarch64: use vaddlv
saves 3 instructions, neutral to mildly faster on a pixel 3a

Change-Id: I6ae57e8e38d4149167ea14e27cd2b32113b4f8e7
2020-03-04 23:12:20 -08:00
James Zern
53f3d8cf7e dec_neon,DC8_NEON: use vaddlv instead of movl+vaddv
one fewer instruction

Change-Id: I2f599fd6f9eebbb0cab81ae9855244fc401d4323
2020-03-04 15:46:38 -08:00
James Zern
c6b75a1966 lossless_(enc_|)sse2: avoid offsetting a NULL pointer
PredictorSub0_SSE2 doesn't use 'upper' (neither does
VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing
pixels to avoid undefined behavior when offsetting a NULL pointer

BUG=chromium:1026858,oss-fuzz:19430

Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3
2019-12-13 18:33:10 +00:00
James Zern
e2575e05cb DC8_NEON,aarch64: use vaddv
results in one fewer instruction for both DC8uv_NEON and
DC8uvNoLeft_NEON

Change-Id: Ia4e6f4dbc070079cdc2496a698bd4b34198ea164
2019-12-06 09:38:48 -08:00
Cheng Yi
b0e09e346f dec_neon: Fix build failure under some toolchains
some toolchains may implement vcreate_u64 as an assignment to a vector
causing a type mismatch:
 invalid conversion between vector type 'uint64x1_t' (vector of 1
'uint64_t' value) and integer type 'unsigned int' of different size
  const uint64x1_t LKJI____ = vcreate_u64(L | (K << 8) | (J << 16) | (I << 24));
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change-Id: I5c7b0076ad66d4b3fcdcb7ee9f59bbaa6f19b783
2019-12-06 00:06:44 -08:00
Oliver Wolff
cf0e903c89 dsp/lossless: Fix non gcc ARM builds
The workaround for GCC ARM must not be applied when another toolchain
(like MSVC) is used for the build.

Change-Id: I11ec4558902063ccb085d3f435e24b3a60739dd5
2019-11-27 15:05:08 +01:00
Vincent Rabaud
bb7bc40b6d Remove ubsan errors.
'upper' could be NULL and it would be increased.
But that is for predictor zero that does not use 'upper'.

Change-Id: Icd4ae6792cc55ea021b4f828c3dbdb5f03e120d8
2019-11-06 14:08:14 +01:00
James Zern
fab8f9cfcf cosmetics: normalize '*' association
we associate '*' with types rather than variables

Change-Id: Id93ed65272a8a88e604278693e3850649639e9b6
2019-07-26 01:04:09 -07:00
Pascal Massimino
9d6988f44d Fix the oscillating prediction problem at low quality
For some exact resonance the over-quantization was exactly
compensating the under-quantization, leading to resonance
and strange patterns.

-> we special-handle the very flat blocks, hopefully for the
greater good (and not just the bad-resonance case).

For 'fast mode' (-m 3 or less), we just pay special attention
to the border of the image, where the oscillation / instability
usually starts. For the inner part of the image, since we're not
doing rd-opt, it's harder to fix anything.

Overall, on 'regular' images, the change is within the noise,
often leading to overall faster encoding (because of the short-cut).

BUG=webp:432

Change-Id: Ifaa8286499add80fd77daecf8e347abbff7c3a15
2019-07-03 08:40:41 -07:00
James Zern
92dbf23775 filters_sse2,cosmetics: shorten some long lines
Change-Id: Ifd8ddec50821aba175d41237df18e41b9ac6c7d4
2019-07-01 12:17:43 -07:00
James Zern
a277d197a2 filters_sse2.c: quiet integer sanitizer warnings
missed in a788b49

with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)

Change-Id: I52dcd9cd613107f5424177c277785b92430bffb7
2019-07-01 11:16:50 -07:00
James Zern
a788b49897 filters_sse2.c: quiet integer sanitizer warnings
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)

Change-Id: I7f08a836ddcf777454dfd5b877a81b62b2abac86
2019-06-28 23:22:49 -07:00
James Zern
e6a92c5e15 filters.c: quiet integer sanitizer warnings
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -12 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 244 (8-bit,
unsigned)

Change-Id: I053c92301e55dcb0cae89a7733636283da942176
2019-06-28 23:16:28 -07:00
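
The sanitizer warnings quoted in the filters commits above come from implicit narrowing. A common way to quiet them, sketched here with a hypothetical helper rather than the actual filters.c change, is to make the intended wraparound explicit with a cast.

```c
#include <stdint.h>

/* Hypothetical helper: residual between a pixel and its prediction,
 * stored modulo 256. The subtraction is done in int and may be
 * negative (e.g. -12); the explicit cast states that wrapping to
 * 244 is intended, which the implicit-conversion checks accept. */
static uint8_t DemoResidual(uint8_t value, uint8_t pred) {
  return (uint8_t)(value - pred);
}
```

The generated object code is typically identical to the implicit version; only the intent is spelled out.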
James Zern
ec1cc40a59 lossless.c: remove U32 -> S8 conversion warnings
Change-Id: Ica2664ea087254959391275654412141ed9472df
2019-06-28 01:34:55 -07:00
Pascal Massimino
1106478f42 remove conversion U32 -> S8 warnings
using an inline U32ToS8() function

Change-Id: I45f535c6c9b5de33d69acc17b466e183fcc19a63
2019-06-24 16:42:42 -07:00
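
Commit 1106478f42 names an inline U32ToS8() function but the log does not show its body. The following is one portable way such a helper could be written, under a different, hypothetical name to make clear it is an assumption.

```c
#include <stdint.h>

/* Hypothetical equivalent of a U32 -> S8 helper: keeps the low 8 bits
 * of v and reinterprets them as a signed two's-complement byte.
 * The explicit range adjustment avoids relying on the
 * implementation-defined result of converting 128..255 to int8_t. */
static inline int8_t DemoU32ToS8(uint32_t v) {
  const int low = (int)(v & 0xffu);
  return (int8_t)((low >= 128) ? low - 256 : low);
}
```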
Skal
812a6b49fc lossless_enc: fix some conversion warning
object code is unchanged.

Change-Id: I40fc16056c0ab44c5c57ef6b02af14be767abe87
2019-06-24 16:16:18 +02:00
James Zern
4627c1c91b lossless_enc,TransformColorBlue: quiet uint32_t conv warning
no change in object code

from clang-7 integer sanitizer:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
1955895199 (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char')
changed the value to 159 (8-bit, unsigned)

Change-Id: I0c3022339e34b9c9af03167ab827ade677973644
2019-06-20 23:06:13 -07:00
James Zern
c84673a62f lossless_enc_sse{2,41}: quiet signed conv warnings
_mm_set1_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 65280 (32-bit, signed) to
type 'short' changed the value to -256 (16-bit, signed)

Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702
2019-06-15 00:22:03 -07:00
James Zern
776a775709 dec_sse2: quiet signed conv warnings
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 189 (32-bit, signed) to
type 'char' changed the value to -67 (8-bit, signed)
implicit conversion from type 'int' of value 128 (32-bit, signed) to
type 'char' changed the value to -128 (8-bit, signed)
implicit conversion from type 'int' of value 33909 (32-bit, signed) to
type 'short' changed the value to -31627 (16-bit, signed)

Change-Id: Id6b191b2c06881e27d447eeb1ff5bb2c1857b6ba
2019-06-14 01:00:20 -07:00
James Zern
e78dea7587 {alpha_processing,enc}_sse2: quiet signed conv warnings
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 255 (32-bit, signed) to
type 'char' changed the value to -1 (8-bit, signed)
implicit conversion from type 'int' of value 33153 (32-bit, signed) to
type 'short' changed the value to -32383 (16-bit, signed)

Change-Id: Ic88c8ef3d00146d34f53a560582db673f818370d
2019-06-10 14:23:58 -07:00
Pascal Massimino
ab2dc8939f Rescaler: fix rounding error
We saturate the result to [0..255]
It's the easiest and safest, given the wide variety of scaling
range we cover: we're not using floats, so precision is always
an issue at one end or the other of the scaling spectrum.

we also use:
  round(a - floor(b))
instead of:
  floor(a - round(b))
to handle difficult cases (ratio ~= .99, e.g.)

MIPS code is still disabled (and wrong)

Change-Id: I18d3f5ddc4c524879c257b928329b1c648fa7fb5
2019-03-30 06:43:55 +00:00
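
The saturation step mentioned in ab2dc8939f can be sketched as a simple clamp. DemoSaturateToByte is a hypothetical helper, not the rescaler's own code, and the choice of a signed 64-bit input is an assumption made so that both underflow and overflow can be shown.

```c
#include <stdint.h>

/* Hypothetical clamp: maps any intermediate result to [0..255] so that
 * small fixed-point rounding errors at either end of the scaling range
 * cannot wrap around when stored in a byte. */
static uint8_t DemoSaturateToByte(int64_t v) {
  if (v < 0) return 0;
  if (v > 255) return 255;
  return (uint8_t)v;
}
```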
James Zern
8c3f04febb AndroidCPUInfo: reorder terms in conditional
'var != constant' is the preferred style for the library

Change-Id: I226e6d5d80dddd0469808136605f49205d238341
2019-03-15 18:12:04 -07:00
Johann
5173d4ee6f neon IsFlat
Move IsFlat to its own header. This allows it to continue to be
inlined. Using the RTCD and creating a distinct function slows down arm
builds.

   flower   mug
C    3.59  2.12
NEON 3.47  2.01

BUG=b/118740850

Change-Id: Id77e8f76d9e9790c498806e7070bbe37c10bc2e9
2018-12-03 22:59:12 +00:00
Johann
9f4d4a3f49 neon: GetResidualCost
Direct copy of sse2. Slight improvement because neon has
abs().

flower.ppm had minimal improvement. Somewhat expected because
GetResidualCost_C is only ~3.6%

mug.ppm had a better improvement because GetResidualCost_C is
almost 9%.

C    2.150
NEON 2.130

BUG=b/118740850

Change-Id: Ibc0dd97a81596635f5599cf568205974b4fd2597
2018-11-14 11:46:58 -08:00
Johann
0fd7514b55 neon: SetResidualCoeffs
Much faster with aarch64. Still somewhat faster without vmaxv.

C: 3.700s
ArmV7: 3.675
aarch64: 3.600

BUG=b/118740850

Change-Id: I3be852da89633eca4bddce443c87f5e4a2f55868
2018-11-14 11:46:40 -08:00
Vincent Rabaud
decf6f6b87 Speedups for empty histograms.
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.

Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
2018-10-20 13:23:50 +02:00
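
The shortcut described in decf6f6b87 can be illustrated with a small sketch. The DemoHistogram type, its is_used flag, and DemoHistogramAdd are hypothetical and much simpler than the library's histogram structures, which are not shown in this log.

```c
#include <stdint.h>

#define DEMO_BINS 8

typedef struct {
  uint32_t counts[DEMO_BINS];
  int is_used; /* 0 when every count is known to be zero */
} DemoHistogram;

/* Writes a + b into out. When at most one input is non-empty, the
 * per-bin additions are skipped and the result is a plain copy.
 * out may point to the same object as a or b. */
static void DemoHistogramAdd(const DemoHistogram* a, const DemoHistogram* b,
                             DemoHistogram* out) {
  if (a->is_used && b->is_used) {
    int i;
    for (i = 0; i < DEMO_BINS; ++i) {
      out->counts[i] = a->counts[i] + b->counts[i];
    }
    out->is_used = 1;
  } else {
    /* Pick the non-empty input; if both are empty, b is copied and the
     * result stays empty. */
    const DemoHistogram* const src = a->is_used ? a : b;
    if (out != src) *out = *src;
  }
}
```

The same flag could let a merge search skip empty histograms entirely, since combining with them cannot change any cost.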
Vincent Rabaud
dea3e89983 Split HistogramAdd to only have the high level logic in C.
Change-Id: Ic9eaebf7128ca0215b49d2a13bde1f5b94a28061
2018-10-19 14:03:28 +02:00
Vincent Rabaud
cbf82cc04d Remove AVX2 files.
There is only enc_avx2.c and we never managed to get
something fast enough.

Change-Id: I7465b5d8ccf47d9aa612173b8f80f96060cdb366
2018-10-16 14:12:03 +02:00
Vincent Rabaud
ac5433118a Remove a few more useless #defines
Change-Id: I211e9bcb1c37d0ebc108896f109b23ce915e22b4
2018-10-15 16:26:10 +02:00
Vincent Rabaud
3e13da7b4f Clean-up the common sources in dsp.
Change-Id: I1b995e6517e8437127a433dccbb5b2db63e7c3a3
2018-10-08 15:00:01 +02:00
James Zern
de08d72741 cosmetics: normalize include guard comment
Change-Id: I0e08ec604aad8412cfe3d3670d773f4ae5650375
2018-08-22 14:46:53 -07:00