James Zern
53f3d8cf7e
dec_neon,DC8_NEON: use vaddlv instead of movl+vaddv
...
one fewer instruction
Change-Id: I2f599fd6f9eebbb0cab81ae9855244fc401d4323
2020-03-04 15:46:38 -08:00
James Zern
e2575e05cb
DC8_NEON,aarch64: use vaddv
...
results in one fewer instruction for both DC8uv_NEON and
DC8uvNoLeft_NEON
Change-Id: Ia4e6f4dbc070079cdc2496a698bd4b34198ea164
2019-12-06 09:38:48 -08:00
Cheng Yi
b0e09e346f
dec_neon: Fix build failure under some toolchains
...
some toolchains may implement vcreate_u64 as an assignment to a vector
causing a type mismatch:
invalid conversion between vector type 'uint64x1_t' (vector of 1
'uint64_t' value) and integer type 'unsigned int' of different size
const uint64x1_t LKJI____ = vcreate_u64(L | (K << 8) | (J << 16) | (I << 24));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: I5c7b0076ad66d4b3fcdcb7ee9f59bbaa6f19b783
2019-12-06 00:06:44 -08:00
Pascal Massimino
c245343dcb
move LOAD8x4 and STORE8x2 closer to their use location
...
Change-Id: I674821732d3e607123070e4bbba87d9359c9a4ec
2017-11-21 23:44:39 -08:00
James Zern
8d033b14d7
{dec,enc}_neon: harmonize function suffixes x2
...
+ neon.h
BUG=webp:355
Change-Id: Ia17c7dfc7d61742a4758823675a2d556a739c389
2017-10-20 19:00:53 -07:00
James Zern
bc1a251fcf
dec_neon: harmonize function suffixes
...
BUG=webp:355
Change-Id: I61c9a0c9e24515322955e04afd8c4ea6a44b9319
2017-10-20 00:14:18 -07:00
James Zern
a439972175
WIP: list includes as descendants of the project dir
...
#include "(.|..)/..." -> #include "src/..."
Change-Id: I772880aa097a770722043c8a4393552ba38a89b6
2017-10-10 23:04:05 -07:00
James Zern
668e1dd44f
src/{dec,enc,utils}: give filenames a unique suffix
...
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
Pascal Massimino
71c53f1aeb
NEON: speed-up strong filtering
...
The sub-expression trick removes two constants and
two vmlal_s8 instructions.
Change-Id: I200022573b4880871b528b13a11a8f3d95def113
2017-01-19 20:46:48 +00:00
James Zern
40872fb2e6
dec_neon,NeedsHev: micro optimization
...
trade 2 compares + 1 logical or for max + compare
Change-Id: I785ad8efdc64db2d0609456d6e7af795ab2117d8
2016-08-08 20:12:30 -07:00
James Zern
d623a8706f
dec_neon: add whitespace around stringizing operator
...
prevents unintentional side-effects (though unlikely in this case) with
future compilers, cf:
eebaf97 dsp/mips: add whitespace around stringizing operator
Change-Id: I0537091fcc97b4f54d0a156c3c83a28c51456b17
2015-09-03 23:13:56 -07:00
Pascal Massimino
751506c484
remove some duplicate FlipSign()
...
ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip
Change-Id: I2af54bd1499118c8321183e42251d265ba76219c
2015-06-05 17:20:29 +02:00
James Zern
64960da9e1
dec_neon: add VE8uv / VE16
...
VE8uv/VE16: ~25%/~33% faster over 20M pixels
Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14
2015-05-30 13:40:00 -07:00
James Zern
14dbd87bed
dec_neon: add HE8uv / HE16
...
HE8uv/HE16: ~91%/~83% faster over 20M pixels
Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810
2015-05-30 13:39:24 -07:00
James Zern
aa6065aedd
dec_neon: use vld1_dup(mem) rather than vdup(mem[0])
...
should result in slightly less general purpose register use
Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726
2015-05-16 11:24:32 -07:00
James Zern
dc48196bd9
dec_neon: add TM16
...
over 20M pixels ~78% faster
Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde
2015-05-15 12:50:11 -07:00
James Zern
ea95b305ca
dec_neon/TrueMotion: simply left border load
...
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
2015-05-15 12:48:13 -07:00
James Zern
a6c1593645
dec_neon: add DC16 intra predictors
...
improvement over 20M pixels:
DC16: ~77%
DC16NoTop: ~78%
DC16NoLeft: ~83%
DC16NoTopLeft: ~83%
Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99
2015-05-08 00:12:48 -07:00
James Zern
4c9af02326
dec_neon: add DC8uvNoTopLeft
...
~93% faster
Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602
2015-05-01 20:03:57 -07:00
James Zern
b44eda3f60
dsp: add DSP_INIT_STUB
...
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
James Zern
66de69c1fe
dsp/dec*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I319bc7714f36b8a3d8b35f6474e5592a439aaf24
2015-03-20 19:19:37 -07:00
James Zern
b969f5dfac
dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
...
the attribute is only necessary in one location; remove it from the
prototypes.
Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
James Zern
f8740f0d6c
dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
...
for consistency with other defines shared across modules
Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
James Zern
e7c5954c10
dec_neon: remove returns from void functions
...
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
James Zern
14108d7878
dec_neon: add DC8uvNoTop / DC8uvNoLeft
...
adds do_top/do_left flags to DC8uv; ~88% / ~92% faster respectively
no change in DC8uv speed.
Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
James Zern
d8340da756
dec_neon: add DC8uv
...
~87% faster.
Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00
James Zern
9de9074c92
dec_neon: add TM8uv
...
~68% faster
reuses TM4() adding support for the additional rows, the columns were
already being done.
Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
22881c999e
dec_neon: add RD4 intra predictor
...
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.
Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418
Merge "dec_neon: DC4: use pair-wise adds for top row"
2014-10-23 08:08:34 -07:00
James Zern
0db9031c79
dsp/dec_{neon,sse2}: VE4: normalize variable names
...
use '0' rather than '_' when dealing with variables that result from a
shift
Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b
dec_neon: DC4: use pair-wise adds for top row
...
reduces load count, slightly faster
Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
James Zern
eba6ce06c3
dec_neon: add DC4 intra predictor
...
~70% faster
Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
James Zern
79abfbd9df
dec_neon: add TM4 intra predictor
...
~21% faster
Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d
dec_neon: add LD4 intra predictor
...
based on SSE2 version, ~55% faster
Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca
dec_neon: add VE4 intra predictor
...
based on SSE2 version, ~59% faster
Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
James Zern
a4c3a31b8f
WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
...
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6
mark some init function as being safe for thread_sanitizer.
...
introduces the macro WEBP_TSAN_IGNORE_FUNCTION
Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
c76f07ecc2
dec_neon/TransformAC3: initialize vector w/vcreate
...
replaces {} initialization gnu-ism
Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
James Zern
e59f53600f
neon: normalize vdup_n_* usage
...
with constants, prefer this over vmov_n_* or vcreate_*
Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
bc03670f01
neon: add INIT_VECTOR4
...
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.
Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
6c1c632b03
neon: add INIT_VECTOR3
...
used to initialize NxMx3 vector types
replaces initialization via '{{ }}' gnu-ism.
Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4
2014-06-30 23:53:23 -07:00
James Zern
dc7687e51b
neon: add INIT_VECTOR2
...
used to initialize NxMx2 vector types
replaces initialization via '{{ }}' gnu-ism.
Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139
2014-06-30 23:52:42 -07:00
skal
ea8b0a171d
strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
...
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).
Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
James Zern
9251c2f6d2
(enc|dec)_neon: use vcreate_*() where appropriate
...
this is more portable than {} initialization.
more involved cases are left for a follow-up.
Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17
2014-05-27 16:26:56 -07:00
James Zern
b9d2bb67d6
dsp/neon.h: coalesce intrinsics-related defines
...
Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946
2014-05-03 11:34:07 -07:00
James Zern
c8bbb636ea
dec_neon: relocate some inline-asm defines
...
move simple loop filter defines closer to their use and LOAD* to a
location common with the intrinsics
Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3
2014-04-28 00:41:42 -07:00
James Zern
4e393bb9f1
dec_neon: enable intrinsics-only functions
...
the complex loop filter has no inline equivalent; the simple loop filter
remains conditional on USE_INTRINSICS: it's left undefined for now.
Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2
2014-04-28 00:39:46 -07:00
James Zern
ba99a922ab
dec_neon: use positive tests for USE_INTRINSICS
...
makes Simple* layout consistent with the rest of the file
Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6
2014-04-28 00:38:47 -07:00
James Zern
a7828e8bdb
dec_neon: make WORK_AROUND_GCC conditional on version
...
Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db
2014-04-28 00:01:19 -07:00
pascal massimino
ca49e7ad97
Merge "enc_neon: move Transpose4x4 to dsp/neon.h"
2014-04-27 01:11:05 -07:00