Commit Graph

65 Commits

Author SHA1 Message Date
James Zern
66de69c1fe dsp/dec*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I319bc7714f36b8a3d8b35f6474e5592a439aaf24
2015-03-20 19:19:37 -07:00
James Zern
b969f5dfac dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
the attribute is only necessary in one location; remove it from the
prototypes.

Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
James Zern
f8740f0d6c dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
for consistency with other defines shared across modules

Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
James Zern
e7c5954c10 dec_neon: remove returns from void functions
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
James Zern
14108d7878 dec_neon: add DC8uvNoTop / DC8uvNoLeft
adds do_top/do_left flags to DC8uv; ~88% / ~92% faster respectively
no change in DC8uv speed.

Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
James Zern
d8340da756 dec_neon: add DC8uv
~87% faster.

Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00
James Zern
9de9074c92 dec_neon: add TM8uv
~68% faster

reuses TM4() adding support for the additional rows, the columns were
already being done.

Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
22881c999e dec_neon: add RD4 intra predictor
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.

Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418 Merge "dec_neon: DC4: use pair-wise adds for top row" 2014-10-23 08:08:34 -07:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b dec_neon: DC4: use pair-wise adds for top row
reduces load count, slightly faster

Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
James Zern
eba6ce06c3 dec_neon: add DC4 intra predictor
~70% faster

Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
James Zern
79abfbd9df dec_neon: add TM4 intra predictor
~21% faster

Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d dec_neon: add LD4 intra predictor
based on SSE2 version, ~55% faster

Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca dec_neon: add VE4 intra predictor
based on SSE2 version, ~59% faster

Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
c76f07ecc2 dec_neon/TransformAC3: initialize vector w/vcreate
replaces {} initialization gnu-ism

Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
James Zern
e59f53600f neon: normalize vdup_n_* usage
with constants, prefer this over vmov_n_* or vcreate_*

Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
bc03670f01 neon: add INIT_VECTOR4
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
6c1c632b03 neon: add INIT_VECTOR3
used to initialize NxMx3 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4
2014-06-30 23:53:23 -07:00
James Zern
dc7687e51b neon: add INIT_VECTOR2
used to initialize NxMx2 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139
2014-06-30 23:52:42 -07:00
skal
ea8b0a171d strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).

Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
James Zern
9251c2f6d2 (enc|dec)_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17
2014-05-27 16:26:56 -07:00
James Zern
b9d2bb67d6 dsp/neon.h: coalesce intrinsics-related defines
Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946
2014-05-03 11:34:07 -07:00
James Zern
c8bbb636ea dec_neon: relocate some inline-asm defines
move simple loop filter defines closer to their use and LOAD* to a
location common with the intrinsics

Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3
2014-04-28 00:41:42 -07:00
James Zern
4e393bb9f1 dec_neon: enable intrinsics-only functions
the complex loop filter has no inline equivalent; the simple loop filter
remains conditional on USE_INTRINSICS: it's left undefined for now.

Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2
2014-04-28 00:39:46 -07:00
James Zern
ba99a922ab dec_neon: use positive tests for USE_INTRINSICS
makes Simple* layout consistent with the rest of the file

Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6
2014-04-28 00:38:47 -07:00
James Zern
a7828e8bdb dec_neon: make WORK_AROUND_GCC conditional on version
Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db
2014-04-28 00:01:19 -07:00
pascal massimino
ca49e7ad97 Merge "enc_neon: move Transpose4x4 to dsp/neon.h" 2014-04-27 01:11:05 -07:00
James Zern
5e1a17ef4b enc_neon: move Transpose4x4 to dsp/neon.h
+ reuse it in TransformWHT()

Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989
2014-04-26 14:06:04 -07:00
James Zern
c7b92a5a29 dec_neon: (WORK_AROUND_GCC) delete unused Load4x8
using this in Load4x16 was slightly slower and didn't help mitigate any
of the remaining build issues with 4.6.x.

Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51
2014-04-26 12:36:14 -07:00
James Zern
71bca5ecf3 dec_neon: use vst_lane instead of vget_lane
results in fewer instructions, small speed improvement

Change-Id: I98de632d09ff09f295368c0d744cb4397b585084
2014-04-03 14:56:26 -07:00
skal
bf06105293 Intrinsics NEON version of TransformOne
+ misc cosmetics

* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)

Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095
2014-04-03 14:41:56 -07:00
pascal massimino
19c6f1ba74 Merge "dec_neon: use vld?_lane instead of vset?_lane" 2014-04-03 01:16:29 -07:00
James Zern
fa52d7525f dec_neon: use vld?_lane instead of vset?_lane
results in fewer instructions, small speed improvement

Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a
2014-04-02 23:03:18 -07:00
Pascal Massimino
c520e77d94 cosmetic: fix long line
Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea
2014-04-02 23:00:50 -07:00
skal
e351ec0759 add intrinsics NEON code for chroma strong-filtering
The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t
register, and re-use the previous (luma) functions

Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38
2014-04-03 06:58:21 +02:00
skal
5fbff3a646 Add strong filtering intrinsics (inner and outer edges)
+ added some work-around gcc-4.6 to make it compile (except one function).
+ lots of revamping

All variants tested ok.
Speed-up is ~5-7%

Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3
2014-04-02 08:28:55 +02:00
James Zern
26029568b7 dec_neon: add strong loopfilter intrinsics
vertical only currently, 2.5-3% faster
placed under USE_INTRINSICS as this change depends on the simple
loopfilter
improves the simple loopfilter slightly thanks to some reorganization

Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155
2014-04-01 01:13:50 -07:00
skal
b9a7a45f1f add intrinsics version of SimpleHFilter16NEON()
It's disable for now, because it crashes gcc-4.6.3 during compilation
with -O2 or -O3. It's been tested OK with -O1.

Code is still globally disabled with USE_INTRINSICS, though.

Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c
2014-03-31 16:31:31 +02:00
Pascal Massimino
daccbf400d add light filtering NEON intrinsics
disabled for now (but tested OK), thanks to the USE_INTRINSICS #define
We'll activate the code when we're on par with non-intrinsics

Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff
2014-03-30 22:15:55 -07:00
Pascal Massimino
af44460880 fix typo in STORE_WHT
was working ok because dst == out

Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd
2014-03-28 10:34:44 -07:00
James Zern
82ae1bf299 cosmetics: normalize VP8GetCPUInfo checks
- use '!= NULL'
+ dec_neon/STORE_WHT: align '\'s

Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53
2014-02-28 18:47:41 -08:00
James Zern
9a463c4a51 Merge "dec_neon: convert TransformWHT to intrinsics" 2014-02-25 14:36:44 -08:00
James Zern
9d6b5ff1e6 dec_neon: convert TransformWHT to intrinsics
Change-Id: I34dc1d75ddebab131cfed031764117e3f7b75c6b
2014-02-21 11:23:46 -08:00
James Zern
2ff0aae2fe dec_neon: add ConvertU8ToS16
Change-Id: Ifc4fb8e7f862e72154d2f2739811b1022d2b9416
2014-02-20 15:35:33 -08:00
James Zern
2719bb7e98 dec_neon: TransformAC3: work on packed vectors
pack 2 rows in 1 vector similar to TransformDC

Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d
2014-02-18 19:47:20 -08:00
James Zern
b7b60ca16c dec_neon: add SaturateAndStore4x4
converts 2 s16 vectors to 2 u8 and store to uint8_t destination;
TransformAC3 can reuse this after a rework

Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0
2014-02-18 19:42:35 -08:00
James Zern
e02f16ef45 dec_neon.c: convert TransformDC to intrinsics
no noticeable difference in performance

Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025
2014-02-14 12:11:58 -08:00