Commit Graph

33 Commits

Author SHA1 Message Date
Pascal Massimino
8ea81561d2 change VP8LPredictorFunc signature to avoid reading 'left'
... when it's not available. Even if the value was discarded and
never used, some msan config were complaining about reading it
and passing it around.

Change-Id: Iab8d24676c5bb58e607a829121e36c2862da397c
2021-11-05 16:22:31 +01:00
James Zern
93e0ce27f4 lossless_neon: harmonize function suffixes
BUG=webp:355

Change-Id: I4210081a39800b5c2589c443da237269908af666
2017-10-20 19:00:53 -07:00
James Zern
a439972175 WIP: list includes as descendants of the project dir
#include "(.|..)/..." -> #include "src/..."

Change-Id: I772880aa097a770722043c8a4393552ba38a89b6
2017-10-10 23:04:05 -07:00
skal
622242aaba Lossess dec: harmonize the function suffixes
BUG=webp:355

Change-Id: I445d64df6aa2e347f41e7af306be12a77e2ac6a5
2017-08-07 18:22:41 -07:00
Pascal Massimino
b3fb8bb602 slightly faster Predictor #11 in NEON
(+some slight modifications on Predictor #12)

Change-Id: Ic2132dcd83d961cd069fa01ca1670e35e35274e2
2016-12-08 07:32:51 -08:00
Pascal Massimino
76ebbfff28 NEON: implement predictor #13
~5-7% faster

Change-Id: I3361b0bbc978f3721168db15778a67337309c18a
2016-12-07 14:58:49 -08:00
Pascal Massimino
fe12330c81 3-5% faster Predictor #5, #6, #7 and #10 for NEON
Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2
2016-12-07 15:25:33 +01:00
Pascal Massimino
fbfb3bef7b ~2% faster predictor #10 for NEON
Change-Id: Icd9cff90c227d702c3ba319131996c5475094520
2016-12-06 13:47:35 +00:00
Pascal Massimino
58a1f124c2 ~2% faster predictor #12 in NEON.
Change-Id: I6772bb865d0f72720a65561eb55028e538df236d
2016-12-06 10:24:27 +01:00
Vincent Rabaud
d23abe4e9f Implement lossless transforms in NEON.
Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55
2016-12-03 11:20:22 +00:00
Vincent Rabaud
71e2f5cadf Remove memcpy in lossless decoding.
Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c
2016-11-24 17:45:24 +01:00
James Zern
a53c336919 lossless_neon: add VP8LTransformColorInverse
based on SSE2, only ~11% faster

Change-Id: I45434639d81e153f01f77c1f5d2da510b542170e
2015-08-04 23:22:36 -07:00
James Zern
2a010f992a lossless_neon: remove predictors 5-13
operating on single uint32's isn't helped by NEON.
this improves aarch64 performance by ~4%

Change-Id: I9fb25a8962de7b80e893e756ee7c76393cfd40c7
2015-07-28 19:44:58 -07:00
James Zern
f27f773576 lossless_neon: enable VP8LAddGreenToBlueAndRed
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall

Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
2015-07-01 22:50:54 -07:00
James Zern
b44eda3f60 dsp: add DSP_INIT_STUB
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning

Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
James Zern
553051f741 dsp/lossless: split enc/dec functions
adds lossless_enc*.c; reduces the size of the decode-only so: ~78K
w/gcc-4.8.2 on x86_64.

Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092
2015-03-23 22:57:50 -07:00
James Zern
1d93ddec19 dsp/lossless*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2
2015-03-20 19:19:46 -07:00
James Zern
602a00f93f fix iOS arm64 build with Xcode 6.3
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.

[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.

Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
2015-02-19 12:16:58 -08:00
James Zern
b969f5dfac dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
the attribute is only necessary in one location; remove it from the
prototypes.

Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
James Zern
416e1cea9b lossless_neon: enable subtract green for aarch64
similar to:
1ba61b0 enable NEON intrinsics in aarch64 builds

vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.

performance varies based on the input, 1-3% on encode was observed

Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-01-31 11:32:05 -08:00
James Zern
f8740f0d6c dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
for consistency with other defines shared across modules

Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
bc03670f01 neon: add INIT_VECTOR4
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
bf0e003067 lossless_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If7e111864f287ea0a5de6311454aeda37afbb52a
2014-05-27 16:27:46 -07:00
James Zern
1ba61b09f9 enable NEON intrinsics in aarch64 builds
avoids functions that use vtbl? as in iOS builds these are marked
unavailable

Change-Id: I17aedc3c7dc8f1d5be0941205de0b22c3772ef1b
2014-05-03 12:37:42 -07:00
James Zern
b9d2bb67d6 dsp/neon.h: coalesce intrinsics-related defines
Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946
2014-05-03 11:34:07 -07:00
James Zern
f937e01261 move LOCAL_GCC_VERSION def to dsp.h
+ add LOCAL_GCC_PREREQ and use it in lossless_neon.c

Change-Id: Ic9fd99540bc3e19c027d1598e4530dfdc9b9de00
2014-04-26 19:09:04 -07:00
skal
41c6efbdc5 fix lossless_neon.c
* some extra {xx , 0 } in initializers
* replaced by vget_lane_u32() where appropriate

Change-Id: Iabcd8ec34d7c853920491fb147a10d4472280a36
2014-04-14 14:27:11 +02:00
skal
3d49871dbe NEON functions for lossless coding
Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)

Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c
2014-04-10 15:32:08 +02:00
skal
abe6f48709 fix the gcc-4.6.0 bug by implementing alternative method
previous functions are a bit faster with gcc-4.8, so we keep them
for now.

Change-Id: I4081e5af66fbf606295d8a83875c1b889729b4dc
2014-04-09 07:53:55 +02:00
James Zern
de693f2502 lossless_neon: disable VP8LConvert* functions
due to breakage with NDK/gcc-4.6 builds

Change-Id: Id96258e710ee33e08a023354b3227f27da986620
2014-04-04 20:38:29 -07:00
skal
97e5fac389 add some colorspace conversion functions in NEON
new file: lossless_neon.c
speedup is ~5%

gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

I've tried adding  -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.

Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
2014-03-31 17:47:46 +02:00