Commit Graph

43 Commits

Author SHA1 Message Date
James Zern
776a775709 dec_sse2: quiet signed conv warnings
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument

from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 189 (32-bit, signed) to
type 'char' changed the value to -67 (8-bit, signed)
implicit conversion from type 'int' of value 128 (32-bit, signed) to
type 'char' changed the value to -128 (8-bit, signed)
implicit conversion from type 'int' of value 33909 (32-bit, signed) to
type 'short' changed the value to -31627 (16-bit, signed)

Change-Id: Id6b191b2c06881e27d447eeb1ff5bb2c1857b6ba
2019-06-14 01:00:20 -07:00
James Zern
b94cee98fb dec_sse2: remove HE8uv_SSE2
with gcc-4.8, clang-4.0.1/5 this is no faster (actually up to 2x slower)
than the code generated for memset (0x01010... * dst[-1]). shuffles in
sse4 recover a bit, but performance is still down.

Change-Id: Ie85e8353f8ede559d0b05a1d388787fd18ecc80f
2017-11-20 20:34:05 -08:00
Pascal Massimino
0a17f4712c Merge "WIP: list includes as descendants of the project dir" 2017-10-11 08:21:42 +00:00
James Zern
a439972175 WIP: list includes as descendants of the project dir
#include "(.|..)/..." -> #include "src/..."

Change-Id: I772880aa097a770722043c8a4393552ba38a89b6
2017-10-10 23:04:05 -07:00
James Zern
bcb7347c2b dec_sse2: harmonize function suffixes
BUG=webp:355

Change-Id: Ic0390a4a24a5d8caff5b8af9fc9d59769ec533b1
2017-10-07 15:14:03 -07:00
James Zern
7beed2807b add missing ()s to macro parameters
BUG=webp:355

Change-Id: I616c6d3540d6551edd1b1cfdb5bffcf0a044c90f
2017-08-04 17:02:53 -07:00
skal
663a6d9d2e unify the ALTERNATE_CODE flag usage
Pattern is now:
 #if !defined(FLAG)
 #define FLAG 0   // ALTERNATE_CODE
 #endif
...
 #if (FLAG == 1)
 ...
 #else
  ...
 #endif    // FLAG
...

Removed some unused code / flags:
  WEBP_YUV_USE_TABLE, WEBP_REFERENCE_IMPLEMENTATION,
  experimental code,  VP8YUVInit(), ...

BUG=webp:355

Change-Id: I98deb9189446a4cfd665c13ea8aa1ce6a308c63f
2017-08-01 20:49:29 -07:00
James Zern
b34a9db1a1 cosmetics,dec_sse2: remove some redundant comments
Change-Id: I5a59d6dde9b6638b318f36d51d0d53870a3de273
2017-07-06 23:19:18 -07:00
James Zern
668e1dd44f src/{dec,enc,utils}: give filenames a unique suffix
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible

BUG=webp:279

Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
James Zern
6b53ca876e cosmetics,(dec|enc)_sse2.c: fix indent
Change-Id: Ic3326136ddd325e911e96c2e5a7f06b3e1d60f66
2016-07-13 16:11:29 -07:00
James Zern
ea0be354a0 dsp.h: remove utils.h include
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.

Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
2016-05-11 23:17:21 -07:00
Vincent Rabaud
bf2b4f114f Regroup common SSE code + optimization.
The transpose refactoring will help removing a transpose in a
later CL.

The horizontal add function helps removing a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19

Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
2016-02-16 18:34:34 +01:00
Pascal Massimino
2c08aac81a introduce WebPMemToUint32 and WebPUint32ToMem for memory access
it uses memcpy() when unaligned memory write is tricky

Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874
2015-12-04 13:43:01 +00:00
Pascal Massimino
25bf2ce5cc fix some warning about unaligned 32b reads
on x86 + gcc, the assembly code is the same.

Change-Id: Ib0d23772ccf928f8d9ebcb0e157c0573d1f6a786
2015-10-28 15:51:55 -07:00
Pascal Massimino
36e9c4bc50 SSE2: minor cosmetrics on in-loop filter code
Change-Id: Ic0e6502081d7063bb2841df74e05c450d708aaf2
2015-06-28 11:59:22 +02:00
Pascal Massimino
7f9c98f21d Merge "sse2 in-loop: simplify SignedShift8b() a bit" 2015-06-12 07:37:32 +00:00
James Zern
ef314a5d6c dec_sse2/GetNotHEV: micro optimization
trade 2 subtractions + logical or for 1 max + 1 subtraction

Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c
2015-06-11 22:46:24 -07:00
Pascal Massimino
a729cff987 sse2 in-loop: simplify SignedShift8b() a bit
Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c
2015-06-11 15:26:31 -07:00
Pascal Massimino
422ec9fb62 simplify Load8x4() a bit
Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e
2015-06-10 12:35:50 -07:00
James Zern
9904e365a8 dsp/dec_sse2: DC8uv / DC8uvNoLeft speedup
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.

DC8uv: ~19% faster
DC8uvNoLeft: ~12% faster

Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2
2015-04-08 23:12:53 -07:00
James Zern
7df2049785 dsp/dec_sse2: DC16 / DC16NoLeft speedup
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.

DC16: ~20% faster
DC16NoLeft: ~14% faster

Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691
2015-04-08 23:10:39 -07:00
James Zern
b44eda3f60 dsp: add DSP_INIT_STUB
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning

Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
James Zern
66de69c1fe dsp/dec*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I319bc7714f36b8a3d8b35f6474e5592a439aaf24
2015-03-20 19:19:37 -07:00
James Zern
b969f5dfac dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
the attribute is only necessary in one location; remove it from the
prototypes.

Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
James Zern
860badcacc cosmetics: dec_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I77decb55f1382bea4b646a11b77dfa40bf1ef94d
2015-02-05 23:51:16 -08:00
Pascal Massimino
19f0ba0eb9 Implement true-motion prediction in SSE2
(along with DC/HE/VE for chroma/luma16)

Overall effect is ~1% faster decoding.

Change-Id: I90917e050d61874cbc8da0e88f26b5dd6131c265
2015-02-04 17:02:22 +01:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
Pascal Massimino
b7a33d7e91 implement VE4/HE4/RD4/... in SSE2
(30% faster prediction functions, but overall speed-up is ~1% only)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
skal
ea8b0a171d strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).

Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
Pascal Massimino
f60957bfd2 clean-up and slight speed-up in-loop filtering SSE2
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage

Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
2014-05-19 23:23:47 -07:00
James Zern
d038e6193b dec_sse2: drop SSE2 suffix from local functions
Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882
2014-04-02 23:10:39 -07:00
James Zern
5227d99146 drop: ifdef __cplusplus checks from C files
the prototypes are already marked in the headers

Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094
2013-12-13 11:42:13 -08:00
skal
f9bbc2a034 Special-case sparse transform
If the number of non-zero coeffs is <= 3, use a
simplified transform for luma.

Change-Id: I78a1252704228d21720d4bc1221252c84338d9c8
2013-10-08 22:05:38 +02:00
James Zern
d640614d54 update copyright text
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files

based on the discussion in:
https://codereview.chromium.org/12771026/

Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
2013-06-06 23:09:14 -07:00
skal
0d19fbff51 remove some -Wshadow warnings
these are quite noisy, but it's not a big deal to remove
them.

Change-Id: I5deb08f10263feb77e2cc8a70be44ad4f725febd
2013-01-22 23:06:28 +01:00
skal
35bfd4c08f add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case
+ replace mm_set1_ps(0) by _mm_setzero_si128()

Change-Id: I4601033c27466532373f5dabfaf349ce5e5039da
2012-11-14 06:16:49 +01:00
Pascal Massimino
7c6e60f4bd make *InitSSE2() functions be empty on non-SSE2 platform
this avoids the '*.o has no symbols' warning messages

Change-Id: Idbaa02f5c2f7c632997a26f9507926922d191b6e
2012-08-27 23:40:47 -07:00
James Zern
ad1e163a0d cosmetics: normalize copyright headers
Change-Id: I5e2462b101e0447a4f15a1455c07131bc97a52dd
2012-01-06 14:49:06 -08:00
James Zern
f06817aaea simplify checks for enabling SSE2 code
also fixes build issues under vs11 which has a native arm compiler for
windows 8 targets

Change-Id: Id76c2deae9fc9de147d13ad0d34edffcb5a726c4
2011-12-20 17:41:55 -08:00
James Zern
964387ed19 use WEBP_INLINE for inline function declarations
removes a #define inline, objectionable in certain projects

Change-Id: Iebe0ce0b25a030756304d402679ef769e5f854d1
2011-11-11 10:53:58 -08:00
Pascal Massimino
e06ac0887f create a separate libwebpdsp under src/dsp
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.

+ opportunistic clean-up:
  * remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
  * make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
  * change OUT macro to DST
2011-09-13 12:29:44 -07:00