James Zern
eba6ce06c3
dec_neon: add DC4 intra predictor
...
~70% faster
Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
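For reference, a DC predictor for a 4x4 block is commonly written in plain C as below. This is a minimal sketch, not the NEON code from the commit above; the name SketchDC4 and the explicit 'stride' parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: fill the 4x4 block at 'dst' with the rounded average
// of the 4 pixels above it and the 4 pixels to its left.
// 'stride' is the distance in bytes between two rows (assumed layout).
static void SketchDC4(uint8_t* dst, int stride) {
  uint32_t dc = 4;  // rounding term for the divide-by-8 below
  int i;
  assert(dst != NULL && stride >= 5);
  for (i = 0; i < 4; ++i) {
    dc += dst[i - stride];      // top neighbor
    dc += dst[i * stride - 1];  // left neighbor
  }
  dc >>= 3;
  for (i = 0; i < 4; ++i) memset(dst + i * stride, (int)dc, 4);
}
```

The caller is expected to provide one valid row above and one valid column to the left of 'dst'.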
James Zern
79abfbd9df
dec_neon: add TM4 intra predictor
...
~21% faster
Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
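A TrueMotion (TM) predictor is usually described as left + top - top_left, clipped to 8 bits. The scalar sketch below illustrates that formula only; SketchTM4, SketchClip8 and the 'stride' parameter are hypothetical names, and this is not the NEON implementation added by the commit above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Clamp an integer to the [0, 255] range.
static uint8_t SketchClip8(int v) {
  return (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v;
}

// Hypothetical helper: predict each pixel of the 4x4 block at 'dst' as
// left[y] + top[x] - top_left, clipped to 8 bits.
static void SketchTM4(uint8_t* dst, int stride) {
  const uint8_t* top;
  int top_left, x, y;
  assert(dst != NULL && stride >= 5);
  top = dst - stride;
  top_left = top[-1];
  for (y = 0; y < 4; ++y) {
    const int left = dst[y * stride - 1];
    for (x = 0; x < 4; ++x) {
      dst[y * stride + x] = SketchClip8(left + top[x] - top_left);
    }
  }
}
```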
James Zern
fe395f0e4d
dec_neon: add LD4 intra predictor
...
based on SSE2 version, ~55% faster
Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca
dec_neon: add VE4 intra predictor
...
based on SSE2 version, ~59% faster
Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
Pascal Massimino
b7a33d7e91
implement VE4/HE4/RD4/... in SSE2
...
(30% faster prediction functions, but overall speed-up is ~1% only)
Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
Pascal Massimino
97c76f1f30
make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
...
also convert 'type *dst' to 'type* dst'
Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
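The pattern described above (a non-const function-pointer table filled in by an init function) might look roughly like the sketch below. All names here (SketchPredFunc, sketch_pred, SketchDspInit, the Fill* functions) are hypothetical and only illustrate the idea; they are not the library's actual declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*SketchPredFunc)(uint8_t* dst);

static void FillZero_C(uint8_t* dst) { dst[0] = 0; }
static void FillOne_C(uint8_t* dst) { dst[0] = 1; }
// Stand-in for an optimized (e.g. SIMD) variant with the same result.
static void FillOne_Fast(uint8_t* dst) { dst[0] = 1; }

// Non-const table: entries are assigned at init time instead of statically.
static SketchPredFunc sketch_pred[2];

static void SketchDspInit(int have_fast_path) {
  sketch_pred[0] = FillZero_C;
  sketch_pred[1] = FillOne_C;
  // Override individual entries when a faster version is available.
  if (have_fast_path) sketch_pred[1] = FillOne_Fast;
}

static void SketchPredict(int mode, uint8_t* dst) {
  assert(mode >= 0 && mode < 2 && sketch_pred[mode] != NULL);
  sketch_pred[mode](dst);
}
```

Keeping the table writable lets the init function swap in platform-specific entries one at a time while the plain-C versions remain the default.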
James Zern
f85ec712b0
PrintReg: output to stderr
...
allows use of '-o -' while testing
Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191
2014-10-22 17:28:19 +02:00
James Zern
d1c359ef29
fix shared object build with -fvisibility=hidden
...
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples
Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
James Zern
a4c3a31b8f
WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
...
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
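The attribute placement described above can be illustrated as follows. This is only a sketch: SKETCH_TSAN_IGNORE_FUNCTION, sketch_table, SketchInit and SketchLookup are hypothetical names, and the macro expands to nothing unless the compiler reports ThreadSanitizer support through __has_feature.

```c
#include <assert.h>

#if defined(__has_feature)
# if __has_feature(thread_sanitizer)
#  define SKETCH_TSAN_IGNORE_FUNCTION __attribute__((no_sanitize_thread))
# endif
#endif
#if !defined(SKETCH_TSAN_IGNORE_FUNCTION)
# define SKETCH_TSAN_IGNORE_FUNCTION  // no-op when TSan is not in use
#endif

static int sketch_table[4];

// The attribute macro is placed in front of the return type, not between
// the declarator and the function body.
static SKETCH_TSAN_IGNORE_FUNCTION void SketchInit(void) {
  int i;
  for (i = 0; i < 4; ++i) sketch_table[i] = i * i;
}

static int SketchLookup(int i) {
  assert(i >= 0 && i < 4);
  return sketch_table[i];
}
```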
Pascal Massimino
80247291c6
mark some init functions as being safe for thread_sanitizer.
...
introduces the macro WEBP_TSAN_IGNORE_FUNCTION
Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
0ce27e715e
enc_mips32: workaround gcc-4.9 bug
...
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9
In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault
Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
2014-10-15 19:14:04 +02:00
pascal massimino
32f67e309f
Merge "enc_neon: initialize vectors w/vdup_n_u32"
2014-10-09 12:23:18 -07:00
Pascal Massimino
fabc65da32
1-3% faster encoding optimizing SSE_NxN functions
...
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.
quality | size(bytes) after/before | time (ms) after/before
Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
2014-10-09 07:20:00 -07:00
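The (a-b)^2 form mentioned above corresponds to a plain sum of squared differences. A scalar sketch follows; the function name and the generic width/height/stride parameters are assumptions for illustration, and the commit's SSE2 code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: sum of squared differences between two w x h blocks
// that share the same row stride.
static int SketchSumSquaredError(const uint8_t* a, const uint8_t* b,
                                 int w, int h, int stride) {
  int sum = 0;
  int x, y;
  assert(a != NULL && b != NULL && w >= 0 && h >= 0 && stride >= w);
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int diff = (int)a[x] - (int)b[x];
      sum += diff * diff;
    }
    a += stride;
    b += stride;
  }
  return sum;
}
```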
James Zern
7534d71640
enc_neon: initialize vectors w/vdup_n_u32
...
replaces {} initialization gnu-ism
Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec
2014-10-09 12:35:41 +02:00
Pascal Massimino
2d9b0a4472
add WebPDispatchAlphaToGreen() to dsp
...
SSE2 version is 2.1x faster
This is used to transfer the alpha plane to green channel before lossless compression.
Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
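Transferring an alpha plane to the green channel can be sketched in plain C as below. It assumes 32-bit ARGB pixels with green in bits 8..15 and leaves the other channels at zero; the function name and signature are hypothetical, not the library's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: write each 8-bit alpha value into the green channel
// of a 32-bit ARGB pixel (assumed layout: green occupies bits 8..15).
static void SketchAlphaToGreen(const uint8_t* alpha, int width,
                               uint32_t* dst) {
  int i;
  assert(alpha != NULL && dst != NULL && width >= 0);
  for (i = 0; i < width; ++i) {
    dst[i] = (uint32_t)alpha[i] << 8;  // A, R and B stay zero
  }
}
```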
Yang Zhang
ab70794ddb
rewrite Disto4x4 in enc_neon.c with intrinsic
...
Performance test:
Platform: A9
Input data: bryce.yuv 11158x2156
Performance of the assembly version is the baseline; a lower ratio is better.
| toolchain | assembly | intrinsic |
| gcc4.6    | 100%     | 97.15%    |
| gcc4.8    | 100%     | 95.51%    |
Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879
2014-09-23 18:28:36 -07:00
Djordje Pesut
d4471637ef
MIPS: dspr2: added optimization for function FilterLoop24
...
affected functions: VFilter16i, HFilter16i, VFilter8i and HFilter8i
Change-Id: I5d2bc7716e60e048a33d630fe4a86011bfb6d42e
2014-09-23 10:32:55 +02:00
Djordje Pesut
49e15044ef
MIPS: dspr2: added optimization for function FilterLoop26
...
affected functions: VFilter16, HFilter16, VFilter8 and HFilter8
Change-Id: Ib2fc41aaa00b10c2906d689bdc5a10f4568e70a8
2014-09-23 08:46:05 +02:00
Pascal Massimino
cddd334050
Add a WebPExtractAlpha function to dsp
...
This is the opposite of WebPDispatchAlpha
+ Implement the SSE2 version
Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c
2014-09-15 08:12:03 +02:00
Pascal Massimino
690b491af1
fix loop bug in DispatchAlpha()
...
* We were re-doing most of the work in plain-C as 'left-over'.
* We were always returning has_alpha = true because of a bad mask all_0xff.
These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.
Now, the SSE2 version is really 2x faster than the C version.
Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
2014-09-11 22:35:08 +02:00
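The has_alpha result discussed above can be computed by AND-ing every alpha value into a mask and comparing it against 0xff at the end. The single-row scalar sketch below illustrates that idea; the function name, the one-row signature and the choice of byte 0 of each 4-byte pixel as the alpha position are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: copy one row of alpha values into every 4th byte of
// 'dst' and report whether any value is not fully opaque.
static int SketchDispatchAlphaRow(const uint8_t* alpha, int width,
                                  uint8_t* dst) {
  uint32_t alpha_and = 0xff;  // stays 0xff only if every alpha is 0xff
  int i;
  assert(alpha != NULL && dst != NULL && width >= 0);
  for (i = 0; i < width; ++i) {
    dst[4 * i] = alpha[i];
    alpha_and &= alpha[i];
  }
  return (alpha_and != 0xff);
}
```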
Djordje Pesut
3101f53720
MIPS: dspr2: added optimization for TransformOne
...
added macros for TransformOne, TransformAC3 and TransformDC
Change-Id: I4341450f443cf46dcf91c0db17bde63c8fb8afee
2014-09-11 17:02:02 +02:00
Pascal Massimino
a6bb9b17d8
SSE2 for inverse Mult(ARGB)Row and ApplyAlphaMultiply
...
Change-Id: Iab5c0e4a4d2b31f86736a9b277e62b6e28c3d2b4
WebPMultRow: ~7x faster
WebPMultARGBRow: ~3x faster
ApplyAlphaMultiply: 60% faster
2014-09-11 07:58:42 +02:00
Djordje Pesut
e2502a97c1
MIPS: dspr2: added optimization for TransformAC3
...
Change-Id: Icd789ee5f6d764297e7dc0a0f8a3bc47ab92ac65
2014-09-09 14:53:36 +02:00
Djordje Pesut
24e1072aac
MIPS: dspr2: added optimization for TransformDC
...
Change-Id: Iee69758f6442ea9c80ddaa32cea8d00dda4c6252
2014-09-09 14:15:04 +02:00
Djordje Pesut
f0103595dd
MIPS: dspr2: added optimization for ColorIndexInverseTransforms
...
Change-Id: I5b6094ce489d4f896bc4b8f575142eb3c5054beb
2014-09-08 17:22:59 +02:00
James Zern
637b388809
dsp/lossless: workaround gcc-4.9 bug on arm
...
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
https://android-review.googlesource.com/#/c/102511
Change-Id: I90ae58bf3e6cc92ca9897f69974733d562e29aaf
2014-08-27 20:31:21 -07:00
James Zern
8323a9038d
dsp.h: collect gcc/clang version test macros
...
endian_inl.h already relies on dsp.h, grab the definitions from there.
Change-Id: I445f7d0631723043c55da1070498f89965bec7b1
2014-08-27 19:33:09 -07:00
skal
e6c4b52f28
move static initialization of WebPYUV444Converters[] to the Init function.
...
Split initialization of YUV444Converters[] out of Upsamplers init.
update test for NULL function pointers
Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
2014-08-27 11:36:37 -07:00
skal
f5c04d64b7
Merge "add a DispatchAlpha() for SSE2 that handles 8 pixels at a time"
2014-08-25 22:43:42 -07:00
skal
fc98edd936
add a DispatchAlpha() for SSE2 that handles 8 pixels at a time
...
Only slightly faster.
Change-Id: Ie2e57e6a0950166124cf1075c6c9b45b7abdad8c
2014-08-25 21:03:03 -07:00
skal
73d361dd5f
introduce VP8EncQuantize2Blocks to quantize two blocks at a time
...
No speed diff for now. We might reorder the instructions better later
to speed things up.
Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c
2014-08-25 20:21:42 -07:00
Djordje Pesut
0b21c30b1a
MIPS: dspr2: added optimization for EmitAlphaRGB
...
New dsp function: WebPDispatchAlpha()
Change-Id: I48e539d22471279ec75185759bc68d18b127f716
2014-08-21 20:39:35 -07:00
James Zern
953acd56a4
enc_neon: enable QuantizeBlock for aarch64
...
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
2014-08-20 11:48:25 -07:00
Djordje Pesut
f4ae143720
MIPS: mips32: code rebase
...
MIPS code rebased to match the C code
from commit I8c29a8a0285076cb3423b01ffae9fcc465da6a81
Change-Id: I3848f4ce43387c3a62b336606498779f7b07ec44
2014-08-19 15:13:16 +02:00
Djordje Pesut
569771549a
MIPS: dspr2: added optimizations for VP8YuvTo*
...
VP8YuvToRgb
VP8YuvToBgr
VP8YuvToRgb565
VP8YuvToRgba4444
VP8YuvToArgb
VP8YuvToBgra
VP8YuvToRgba
Change-Id: I22212a125d890e1fd28388fec906a1a5c07ff386
2014-08-19 14:29:32 +02:00
James Zern
3fca851a20
cpu: check for _MSC_VER before using msvc inline asm
...
_M_IX86 will be defined in mingw builds after including windows.h. As
the gcc inline asm comes first, the missing check would only have caused
an error if the code had been reorganized.
Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
2014-08-15 15:11:40 -07:00
Djordje Pesut
b4dc4069a2
MIPS: dspr2: added optimization for (un)filters
...
HorizontalFilter
VerticalFilter
GradientFilter
HorizontalUnfilter
VerticalUnfilter
GradientUnfilter
Change-Id: I54055b4767c37719691811072e95bf79c1f627b1
2014-08-14 11:55:19 -07:00
Djordje Pesut
b61c9ceca8
MIPS: dspr2: Optimization of some simple point-sampling functions
...
Change-Id: I6a4ab29bd0cc5a2951a8882cf9997032dc38bd79
2014-08-13 17:18:49 +02:00
Djordje Pesut
98c54107df
MIPS: mips32r2: added optimization for BSwap32
...
gcc < 4.8.3 doesn't translate bswap optimally.
use optimized version always
Change-Id: I979ea26ad6dc0166d3d2f39c4148eb8adfb7ddec
2014-08-12 09:29:13 +02:00
Djordje Pesut
b7e5a5c451
MIPS: detect mips32r6 and disable mips32r1 code
...
Change-Id: Id1325c789a990c9a8704e84e99a22d580303eb8a
2014-08-08 17:29:31 +02:00
pascal massimino
bb07022b66
Merge "cosmetics"
2014-08-06 12:30:08 -07:00
James Zern
e300c9d819
cosmetics
...
fix some indent/whitespace, remove a few duplicate includes, extra
semi-colons
Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352
2014-08-06 12:10:59 -07:00
James Zern
f7b4c48bba
cosmetics: remove some extraneous 'extern's
...
Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d
2014-08-05 22:14:24 -07:00
James Zern
0524d9e5e8
dsp: detect mips64 & disable mips32 code
...
Change-Id: Icf68dafd5cf0614ca25b36a0252caa1784ac8059
2014-08-01 21:18:53 -07:00
skal
8f6f8c5dde
remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
...
There's no speed diff, so better remove it altogether
Reported in https://code.google.com/p/webp/issues/detail?id=215
Change-Id: I991330de18bec340029d6df5fed0dfb4337e4662
2014-07-23 14:15:40 -07:00
James Zern
c76f07ecc2
dec_neon/TransformAC3: initialize vector w/vcreate
...
replaces {} initialization gnu-ism
Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
James Zern
380cca4f2c
configure.ac: add AC_C_BIGENDIAN
...
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it
+ fixes lossless BGRA output with big-endian toolchains
that do not define __BIG_ENDIAN__ (codesourcery mips gcc)
Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
2014-07-03 18:15:50 -07:00
James Zern
47779d46c8
endian_inl.h: add BSwap32
...
Change-Id: I96e3ae49659307024415d64587e6312888a0070f
2014-07-03 13:28:13 -07:00
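A portable 32-bit byte swap can be written with shifts and masks, as in the sketch below. The names are hypothetical, and the commit above may rely on compiler builtins where available; this only shows the generic fallback form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: reverse the byte order of a 32-bit value.
static uint32_t SketchBSwap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0xff00u) |
         ((x << 8) & 0xff0000u) | (x << 24);
}

// Hypothetical helper: swap every element of an array in place.
static void SketchBSwap32Array(uint32_t* data, size_t count) {
  size_t i;
  assert(data != NULL || count == 0);
  for (i = 0; i < count; ++i) data[i] = SketchBSwap32(data[i]);
}
```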
James Zern
e59f53600f
neon: normalize vdup_n_* usage
...
with constants, prefer this over vmov_n_* or vcreate_*
Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
bc03670f01
neon: add INIT_VECTOR4
...
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.
Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00