Commit Graph

853 Commits

Author SHA1 Message Date
James Zern
5d4ee4c3c0 cosmetics: remove use of the term 'dummy'
this is replaced with more inclusive / informative text

Bug: webp:507
Change-Id: Ib77f0c79dd548601bf2bc3169985af4b5edf0a62
2021-03-15 11:39:06 -07:00
Ilya Kurdyukov
01b38ee19a faster CollectColorXXXTransforms_SSE41
3/4% faster overall.

Change-Id: If555c5530238ca0342b8d97b0d708b1bdc888d3f
2021-02-19 20:45:07 +01:00
Ilya Kurdyukov
8886f620c0 Use BitCtz for FastSLog2Slow_C
Change-Id: Icc6068b8934e481e6f17efd30616392e68d504ad
2021-02-19 15:11:42 +01:00
Ilya Kurdyukov
fae416179e faster CombinedShannonEntropy_SSE2
optimized for sparse histograms

Change-Id: I54412f5f8fc53d2598964a5be91f6c54ece3f21b
2021-02-19 13:14:46 +01:00
James Zern
33ddb894b1 lossless_sse{2,41}: remove some unneeded includes
Change-Id: Icd2cffd32b39c6bf017eee353ac04a4b6d337a11
2021-02-18 10:54:09 -08:00
Pascal Massimino
b78494a933 Merge "Fix undefined signed shift." 2021-02-18 16:51:17 +00:00
Vincent Rabaud
e79974cd6a Fix undefined signed shift.
Using the fix from SSE2.

Change-Id: Ie53d0163d97322da5a722c3e49f9d5f057ee1d91
2021-02-18 16:56:22 +01:00
Ilya Kurdyukov
a885339448 SSE4.1 versions of BGRA to RGB/BGR color-space conversions
Change-Id: Iacafd2f6402080b02fcbf75831e69c488f447454
2021-02-18 15:32:30 +01:00
Ilya Kurdyukov
a09a647241 SSE4.1 version of TransformColorInverse
Change-Id: I6ba5cb35917eef7a52152c4924eca205b4af7220
2021-02-18 12:42:39 +01:00
James Zern
47f64f6edd filters_sse2: import Chromium change
VerticalUnfilter_SSE2 has long been disabled due to a crash in an
Android emulator that hasn't reproduced elsewhere (crbug.com/654974).
this synchronizes the code for now to avoid needing to locally edit the
file on import.

Bug: 1141126
Change-Id: Ib61aeab93caaff1759606566b9e499eaac1576cf
2021-01-30 11:44:07 -08:00
James Zern
8599571935 disable CombinedShannonEntropy_SSE2 on x86
this function produces different results from the C code due to
use of double/float resulting in output differences when compared to
-noasm.

Bug: webp:499
Change-Id: Ia039b168c0a66da723fb434656657ba1948db8ae
2021-01-18 16:41:44 -08:00
James Zern
ae54553461 dsp.h: allow config.h to override MSVC SIMD autodetection
this fixes builds with cmake targeting visual studio that set
-DWEBP_ENABLE_SIMD=0

BUG=webp:478

Change-Id: I21b61b112c79ff9cbab9e4502a25d3f1fa096c8b
2020-12-03 10:22:04 -08:00
Vincent Rabaud
fc14fc038b Have C encoding predictors use decoding predictors.
libwebp.a in Release mode with no symbols size in bytes:
986430 -> 975114  (-1.1%)

Change-Id: Ia96192a6be2911779e359b72132bdba60b60a13d
2020-12-02 11:54:59 +01:00
Ingvar Stepanyan
52273943c6 Couple of fixes to allow SIMD on Emscripten
- Add `-msimd128` to flags to actually enable WebAssembly SIMD
   when performing SIMD detection. It's currently required in
   addition to `-msse*` / `-mfpu=neon` flags which only perform
   translation of corresponding intrinsics to Wasm SIMD ones.
   See a discussion at emscripten-core/emscripten#12714 for
   automating this and making easier in the future.
 - Remove compilation branch that prevented definitions of
   `WEBP_USE_SSE` and `WEBP_USE_NEON` on Emscripten even when
   SIMD support was detected at compile-time.
 - Add an implementation of `VP8GetCPUInfo` for Emscripten which
   uses static `WEBP_USE_*` flags to determine if a corresponding
   SIMD instruction is supported. This is because Wasm doesn't
   have proper feature detection (yet) and requires making separate
   build for SIMD version anyway.

Change-Id: I77592081b91fd0e4cbc9242f5600ce905184f506
2020-11-18 21:51:41 +00:00
Skal
55a080e50a Add WebPReplaceTransparentPixels() in dsp
with SSE2 implementation.

(Extracted from side experiment)

Change-Id: I62d457fb6643645291cffd6d2d205d4a5ffa4517
2020-09-09 08:15:22 +02:00
Yannis Guyon
47309ef52d webp: WEBP_OFFSET_PTR()
Removes undefined behavior of offsetting NULL.

Change-Id: I7c83d0c913c631c091a5fb128f6d6b46b1d116db
2020-03-20 11:39:06 +01:00
James Zern
687ab00e6e DC{4,8,16}_NEON: replace vmovl w/vaddl
4/8/16 fewer instructions

Change-Id: I38fe08722e7b839e3f3e0bf4df7e0fa8e7a0138f
2020-03-05 09:41:14 -08:00
James Zern
1b92fe75a1 DC16_NEON,aarch64: use vaddlv
saves 3 instructions, neutral to mildly faster on a pixel 3a

Change-Id: I6ae57e8e38d4149167ea14e27cd2b32113b4f8e7
2020-03-04 23:12:20 -08:00
James Zern
53f3d8cf7e dec_neon,DC8_NEON: use vaddlv instead of movl+vaddv
one fewer instruction

Change-Id: I2f599fd6f9eebbb0cab81ae9855244fc401d4323
2020-03-04 15:46:38 -08:00
James Zern
c6b75a1966 lossless_(enc_|)sse2: avoid offsetting a NULL pointer
PredictorSub0_SSE2 doesn't use 'upper' (neither does
VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing
pixels to avoid undefined behavior when offsetting a NULL pointer

BUG=chromium:1026858,oss-fuzz:19430

Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3
2019-12-13 18:33:10 +00:00
James Zern
e2575e05cb DC8_NEON,aarch64: use vaddv
results in one fewer instruction for both DC8uv_NEON and
DC8uvNoLeft_NEON

Change-Id: Ia4e6f4dbc070079cdc2496a698bd4b34198ea164
2019-12-06 09:38:48 -08:00
Cheng Yi
b0e09e346f dec_neon: Fix build failure under some toolchains
some toolchains may implement vcreate_u64 as an assignment to a vector
causing a type mismatch:
 invalid conversion between vector type 'uint64x1_t' (vector of 1
'uint64_t' value) and integer type 'unsigned int' of different size
  const uint64x1_t LKJI____ = vcreate_u64(L | (K << 8) | (J << 16) | (I << 24));
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change-Id: I5c7b0076ad66d4b3fcdcb7ee9f59bbaa6f19b783
2019-12-06 00:06:44 -08:00
Oliver Wolff
cf0e903c89 dsp/lossless: Fix non gcc ARM builds
The workaround for GCC ARM must not be applied when another toolchain
(like MSVC) is used for the build.

Change-Id: I11ec4558902063ccb085d3f435e24b3a60739dd5
2019-11-27 15:05:08 +01:00
Vincent Rabaud
bb7bc40b6d Remove ubsan errors.
'upper' could be NULL and it would be increased.
But that is for predictor zero that does not use 'upper'.

Change-Id: Icd4ae6792cc55ea021b4f828c3dbdb5f03e120d8
2019-11-06 14:08:14 +01:00
James Zern
fab8f9cfcf cosmetics: normalize '*' association
we associate '*' with types rather than variables

Change-Id: Id93ed65272a8a88e604278693e3850649639e9b6
2019-07-26 01:04:09 -07:00
Pascal Massimino
9d6988f44d Fix the oscillating prediction problem at low quality
For some exact resonance the over-quantization was exactly
compensating the under-quantization, leading to resonance
and strange patterns.

-> we special-handle the very flat blocks, hopefully for the
greater good (and not just the bad-resonance case).

For 'fast mode' (-m 3 or less), we just pay special attention
to the border of the image, where the oscillation / instability
usually starts. For the inner part of the image, since we're not
doing rd-opt, it's harder to fix anything.

Overall, on 'regular' images, the change is written the noise,
often leading to overall faster encoding (because of the short-cut).

BUG=webp:432

Change-Id: Ifaa8286499add80fd77daecf8e347abbff7c3a15
2019-07-03 08:40:41 -07:00
James Zern
92dbf23775 filters_sse2,cosmetics: shorten some long lines
Change-Id: Ifd8ddec50821aba175d41237df18e41b9ac6c7d4
2019-07-01 12:17:43 -07:00
James Zern
a277d197a2 filters_sse2.c: quiet integer sanitizer warnings
missed in a788b49

with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)

Change-Id: I52dcd9cd613107f5424177c277785b92430bffb7
2019-07-01 11:16:50 -07:00
James Zern
a788b49897 filters_sse2.c: quiet integer sanitizer warnings
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)

Change-Id: I7f08a836ddcf777454dfd5b877a81b62b2abac86
2019-06-28 23:22:49 -07:00
James Zern
e6a92c5e15 filters.c: quiet integer sanitizer warnings
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -12 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 244 (8-bit,
unsigned)

Change-Id: I053c92301e55dcb0cae89a7733636283da942176
2019-06-28 23:16:28 -07:00
James Zern
ec1cc40a59 lossless.c: remove U32 -> S8 conversion warnings
Change-Id: Ica2664ea087254959391275654412141ed9472df
2019-06-28 01:34:55 -07:00
Pascal Massimino
1106478f42 remove conversion U32 -> S8 warnings
using an inline U32ToS8() function

Change-Id: I45f535c6c9b5de33d69acc17b466e183fcc19a63
2019-06-24 16:42:42 -07:00
Skal
812a6b49fc lossless_enc: fix some conversion warning
object code is unchanged.

Change-Id: I40fc16056c0ab44c5c57ef6b02af14be767abe87
2019-06-24 16:16:18 +02:00
James Zern
4627c1c91b lossless_enc,TransformColorBlue: quiet uint32_t conv warning
no change in object code

from clang-7 integer sanitizer:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
1955895199 (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char')
changed the value to 159 (8-bit, unsigned)

Change-Id: I0c3022339e34b9c9af03167ab827ade677973644
2019-06-20 23:06:13 -07:00
James Zern
c84673a62f lossless_enc_sse{2,41}: quiet signed conv warnings
_mm_set1_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 65280 (32-bit, signed) to
type 'short' changed the value to -256 (16-bit, signed)

Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702
2019-06-15 00:22:03 -07:00
James Zern
776a775709 dec_sse2: quiet signed conv warnings
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 189 (32-bit, signed) to
type 'char' changed the value to -67 (8-bit, signed)
implicit conversion from type 'int' of value 128 (32-bit, signed) to
type 'char' changed the value to -128 (8-bit, signed)
implicit conversion from type 'int' of value 33909 (32-bit, signed) to
type 'short' changed the value to -31627 (16-bit, signed)

Change-Id: Id6b191b2c06881e27d447eeb1ff5bb2c1857b6ba
2019-06-14 01:00:20 -07:00
James Zern
e78dea7587 (alpha_processing,enc}_sse2: quiet signed conv warnings
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 255 (32-bit, signed) to
type 'char' changed the value to -1 (8-bit, signed)
implicit conversion from type 'int' of value 33153 (32-bit, signed) to
type 'short' changed the value to -32383 (16-bit, signed)

Change-Id: Ic88c8ef3d00146d34f53a560582db673f818370d
2019-06-10 14:23:58 -07:00
Pascal Massimino
ab2dc8939f Rescaler: fix rounding error
We saturate the result to [0..255]
It's the easiest and safest, given the wide variety of scaling
range we cover: we're not using floats, so precision is always
an issue at one end or the other of the scaling spectrum.

we also use:
  round(a - floor(b))
instead of:
  floor(a - round(b))
to handle difficult cases (ratio ~= .99, e.g.)

MIPS code is still disabled (and wrong)

Change-Id: I18d3f5ddc4c524879c257b928329b1c648fa7fb5
2019-03-30 06:43:55 +00:00
James Zern
8c3f04febb AndroidCPUInfo: reorder terms in conditional
'var != constant' is the preferred style for the library

Change-Id: I226e6d5d80dddd0469808136605f49205d238341
2019-03-15 18:12:04 -07:00
Johann
5173d4ee6f neon IsFlat
Move IsFlat to its own header. This allows it to continue to be
inlined. Using the RTCD and creating a distinct function slows down arm
builds.

   flower   mug
C    3.59  2.12
NEON 3.47  2.01

BUG=b/118740850

Change-Id: Id77e8f76d9e9790c498806e7070bbe37c10bc2e9
2018-12-03 22:59:12 +00:00
Johann
9f4d4a3f49 neon: GetResidualCost
Direct copy of sse2. Slight improvement because neon has
abs().

flower.ppm had minimal improvement. Somewhat expected because
GetResidualCost_C is only ~3.6%

mug.ppm had a better improvement because GetResidualCost_C is
almost 9%.

C    2.150
NEON 2.130

BUG=b/118740850

Change-Id: Ibc0dd97a81596635f5599cf568205974b4fd2597
2018-11-14 11:46:58 -08:00
Johann
0fd7514b55 neon: SetResidualCoeffs
Much faster with aarch64. Still somewhat faster without vmaxv.

C: 3.700s
ArmV7: 3.675
aarch64: 3.600

BUG=b/118740850

Change-Id: I3be852da89633eca4bddce443c87f5e4a2f55868
2018-11-14 11:46:40 -08:00
Vincent Rabaud
decf6f6b87 Speedups for empty histograms.
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.

Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
2018-10-20 13:23:50 +02:00
Vincent Rabaud
dea3e89983 Split HistogramAdd to only have the high level logic in C.
Change-Id: Ic9eaebf7128ca0215b49d2a13bde1f5b94a28061
2018-10-19 14:03:28 +02:00
Vincent Rabaud
cbf82cc04d Remove AVX2 files.
There is only enc_avx2.c and we never managed to get
something fast enough.

Change-Id: I7465b5d8ccf47d9aa612173b8f80f96060cdb366
2018-10-16 14:12:03 +02:00
Vincent Rabaud
ac5433118a Remove a few more useless #defines
Change-Id: I211e9bcb1c37d0ebc108896f109b23ce915e22b4
2018-10-15 16:26:10 +02:00
Vincent Rabaud
3e13da7b4f Clean-up the common sources in dsp.
Change-Id: I1b995e6517e8437127a433dccbb5b2db63e7c3a3
2018-10-08 15:00:01 +02:00
James Zern
de08d72741 cosmetics: normalize include guard comment
Change-Id: I0e08ec604aad8412cfe3d3670d773f4ae5650375
2018-08-22 14:46:53 -07:00
Pascal Massimino
2563db4759 fix rescaling rounding inaccuracy
We should be using 'floor' when doing the final divide.

-> new MACRO is MULT_FIX_FLOOR()

     XXX*** Mips code is DISABLED for now ***XXX

I'll update and re-enable it in a later
patch, since this code needs some refactoring first.

BUG=oss-fuzz:9179

Change-Id: Ic0693cdca4e71f5beab1029475e35c4d06b12d13
2018-07-10 22:45:50 -07:00
James Zern
0d5fad46cf add WEBP_DSP_INIT / WEBP_DSP_INIT_FUNC
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.

Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
(cherry picked from commit d77bf512bd)
2018-04-17 18:01:34 -07:00