Pascal Massimino
3033f24c26
lossless: 0.06 % compression density improvement
...
Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790
2015-06-02 14:51:51 +02:00
James Zern
64960da9e1
dec_neon: add VE8uv / VE16
...
VE8uv/VE16: ~25%/~33% faster over 20M pixels
Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14
2015-05-30 13:40:00 -07:00
James Zern
14dbd87bed
dec_neon: add HE8uv / HE16
...
HE8uv/HE16: ~91%/~83% faster over 20M pixels
Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810
2015-05-30 13:39:24 -07:00
skal
ac76801159
introduce FTransform2 to perform two transforms at a time.
...
FTransform goes from ~12.0% to ~11.5% of total CPU time.
Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
aa6065aedd
dec_neon: use vld1_dup(mem) rather than vdup(mem[0])
...
should result in slightly lower general-purpose register usage
Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726
2015-05-16 11:24:32 -07:00
Pascal Massimino
8b63ac78e0
Merge "dec_neon: add TM16"
2015-05-16 10:56:07 +00:00
Pascal Massimino
f51be09e1f
Merge "dec_neon/TrueMotion: simplify left border load"
2015-05-16 10:54:05 +00:00
James Zern
dc48196bd9
dec_neon: add TM16
...
~78% faster over 20M pixels
Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde
2015-05-15 12:50:11 -07:00
James Zern
ea95b305ca
dec_neon/TrueMotion: simplify left border load
...
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero-extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
2015-05-15 12:48:13 -07:00
Pascal Massimino
f262d6120e
speed-up SetResidualSSE2
...
(was unnecessarily complicated)
Before:
VP8SetResidualCoeffs: checksum = 1127918 elapsed = 475 ms.
After:
VP8SetResidualCoeffs: checksum = 1127918 elapsed = 404 ms.
Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd
2015-05-14 21:24:24 -07:00
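Assuming the core of SetResidualCoeffs is locating the last non-zero coefficient among 16, one common SSE2 approach is sketched below: pack the 16-bit coefficients to bytes with signed saturation (which keeps any non-zero value non-zero), compare against zero, and take the highest set bit of the inverted movemask. LastNonZero is a hypothetical helper, not the libwebp code; a scalar loop is used when SSE2 is unavailable.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: index of the last non-zero coefficient among 16
 * int16 values, or -1 if all are zero. */
static int LastNonZero(const int16_t coeffs[16]) {
#if defined(__SSE2__)
  const __m128i zero = _mm_setzero_si128();
  const __m128i c0 = _mm_loadu_si128((const __m128i*)&coeffs[0]);
  const __m128i c1 = _mm_loadu_si128((const __m128i*)&coeffs[8]);
  /* Signed saturation maps every non-zero int16 to a non-zero int8;
   * byte i of 'packed' corresponds to coeffs[i]. */
  const __m128i packed = _mm_packs_epi16(c0, c1);
  /* Bit i is set when coeffs[i] == 0. */
  const int zero_mask = _mm_movemask_epi8(_mm_cmpeq_epi8(packed, zero));
  uint32_t nz_mask = ~(uint32_t)zero_mask & 0xffffu;
  int last = -1;
  while (nz_mask != 0) {  /* position of the highest set bit */
    ++last;
    nz_mask >>= 1;
  }
  return last;
#else
  int n;
  for (n = 15; n >= 0; --n) {
    if (coeffs[n] != 0) return n;
  }
  return -1;
#endif
}
```

The vector path replaces a data-dependent backwards scan with a fixed instruction sequence, which is the kind of simplification the timings above reflect.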
James Zern
bf46d0acff
fix mips2 build target
...
tested with mips1 and mips2; this should cover mips3/mips4 as well.
fixes an FTBFS (fails to build from source) reported on the Debian issue tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000
Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c
2015-05-13 10:36:22 -07:00
James Zern
929a0fdccd
enc_sse2/TTransform: simplify abs calculation
...
max(b, 0 - b) works as well as (b ^ sign) - sign
Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d
2015-05-08 19:50:29 -07:00
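The abs simplifications in this log (TTransform, CollectHistogram, GetResidualCost) rely on the identity |x| = max(x, 0 - x). The sketch below compares it with the sign-mask form (x ^ sign) - sign, where sign = x >> 15 is 0 or -1, and shows the SSE2 equivalent built on _mm_max_epi16(). The function names are hypothetical, and the scalar sign-mask form assumes an arithmetic right shift of negative values (implementation-defined in C, though GCC, Clang and MSVC behave this way).

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sign-mask form: sign is 0 for x >= 0 and -1 (all ones) for x < 0,
 * so (x ^ sign) - sign is either x or ~x + 1 == -x. */
static int16_t AbsXorSign(int16_t x) {
  const int16_t sign = (int16_t)(x >> 15);
  return (int16_t)((x ^ sign) - sign);
}

/* The simpler form: max(x, 0 - x). */
static int16_t AbsMax(int16_t x) {
  const int16_t neg = (int16_t)(0 - x);
  return x > neg ? x : neg;
}

#if defined(__SSE2__)
/* Eight 16-bit lanes at once: one subtraction and one signed max,
 * with no sign mask to compute. */
static inline __m128i Abs16x8(__m128i x) {
  return _mm_max_epi16(x, _mm_sub_epi16(_mm_setzero_si128(), x));
}
#endif
```

Neither form has a positive result for -32768, which is why the check below stops at -32767.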
James Zern
17dbd05819
enc_sse2/CollectHistogram: simplify abs calculation
...
max(out, 0 - out) works as well as (out ^ sign) - sign
Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909
2015-05-08 19:49:08 -07:00
James Zern
a6c1593645
dec_neon: add DC16 intra predictors
...
improvement over 20M pixels:
DC16: ~77%
DC16NoTop: ~78%
DC16NoLeft: ~83%
DC16NoTopLeft: ~83%
Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99
2015-05-08 00:12:48 -07:00
James Zern
f274a96ce9
dsp/enc_sse2: add luma4 intra predictors
...
VP8EncPredLuma4 improvement over ~20M pixels: ~39%
Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20
2015-05-05 23:51:19 -07:00
James Zern
040b11bdf6
dsp/enc_sse2: add chroma intra predictors
...
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
2015-05-05 23:51:14 -07:00
James Zern
aee021bbb1
dsp/enc_sse2: add luma16 intra predictors
...
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
2015-05-05 23:51:07 -07:00
James Zern
4c9af02326
dec_neon: add DC8uvNoTopLeft
...
~93% faster
Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602
2015-05-01 20:03:57 -07:00
Pascal Massimino
9287761d95
Merge "GetResidualCostSSE2: simplify abs calculation"
2015-04-30 06:30:58 +00:00
James Zern
0e009366f8
dsp/cpu.c(x86): check maximum supported cpuid feature
...
structured extended feature flags require eax = 7; checking the maximum
supported basic leaf first avoids incorrectly detecting avx2 on some
older processors that only support avx.
for completeness, also check for the eax = 1 support used by the other
checks.
from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String
[1]
http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html
Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046
2015-04-29 23:22:53 -07:00
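The reason for the maximum-leaf check can be shown without real hardware. Intel documents that a CPUID input above the maximum basic leaf returns the data of the highest basic leaf, so reading leaf 7 on a processor whose maximum is lower can produce a spurious AVX2 bit (leaf 7, EBX bit 5). The CpuidFn abstraction, HasAvx2 and the two fake CPUs below are hypothetical test doubles, not libwebp code.

```c
#include <stdint.h>

/* Hypothetical CPUID query: fills regs[0..3] = eax, ebx, ecx, edx for
 * 'leaf' (sub-leaf 0). Abstracted so the logic is testable anywhere. */
typedef void (*CpuidFn)(uint32_t leaf, uint32_t regs[4]);

/* Returns 1 if AVX2 is reported, after confirming leaf 7 exists. */
static int HasAvx2(CpuidFn cpuid) {
  uint32_t regs[4];
  cpuid(0, regs);                 /* eax = highest supported basic leaf */
  if (regs[0] < 7) return 0;      /* leaf 7 data would not be meaningful */
  cpuid(7, regs);
  return (int)((regs[1] >> 5) & 1u);  /* EBX bit 5: AVX2 */
}

/* Older CPU: max basic leaf is 6; an out-of-range leaf 7 query returns
 * stale data in which EBX bit 5 happens to be set. */
static void FakeOldCpu(uint32_t leaf, uint32_t regs[4]) {
  regs[0] = (leaf == 0) ? 6u : 0u;
  regs[1] = (leaf == 0) ? 0u : 0x20u;
  regs[2] = 0u;
  regs[3] = 0u;
}

/* Newer CPU: leaf 7 exists and reports AVX2. */
static void FakeAvx2Cpu(uint32_t leaf, uint32_t regs[4]) {
  regs[0] = (leaf == 0) ? 0xDu : 0u;
  regs[1] = (leaf == 7) ? 0x20u : 0u;
  regs[2] = 0u;
  regs[3] = 0u;
}
```

Without the `regs[0] < 7` test, HasAvx2(FakeOldCpu) would return 1, which is the misdetection the commit describes.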
James Zern
b243a4bc30
GetResidualCostSSE2: simplify abs calculation
...
max(coeff, 0 - coeff) works as well as min/max/sub or
(coeff ^ sign) - sign
Change-Id: I9b11715372e49cd83820677bf4beba4a1c04931c
2015-04-21 20:29:12 -07:00
James Zern
0768b252fa
dsp/enc.c: cosmetics: move DST() def closer to use
...
Change-Id: Iccbcf046412426c2893b71eced517f611d2ffc3f
2015-04-15 20:03:39 -07:00
James Zern
9904e365a8
dsp/dec_sse2: DC8uv / DC8uvNoLeft speedup
...
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC8uv: ~19% faster
DC8uvNoLeft: ~12% faster
Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2
2015-04-08 23:12:53 -07:00
James Zern
7df2049785
dsp/dec_sse2: DC16 / DC16NoLeft speedup
...
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC16: ~20% faster
DC16NoLeft: ~14% faster
Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691
2015-04-08 23:10:39 -07:00
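The psadbw idea from the DC8uv and DC16 commits can be sketched as follows: _mm_sad_epu8() against a zero vector sums the unsigned bytes of each 8-byte half, so one instruction plus a lane fold replaces a 16-element loop, while the strided left column stays a scalar loop. SumTop16, DC16Value, DC16NoLeftValue and the strided left layout are hypothetical; the rounding ((sum + 16) >> 5 for 32 pixels, (sum + 8) >> 4 for 16) is the usual rounded average, assumed here rather than quoted from dec_sse2. A scalar fallback keeps the sketch portable.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum of the 16 bytes in the row above the block. */
static uint32_t SumTop16(const uint8_t* top) {
#if defined(__SSE2__)
  const __m128i zero = _mm_setzero_si128();
  const __m128i row = _mm_loadu_si128((const __m128i*)top);
  /* psadbw: |row - 0| summed per 8-byte half, one result per 64-bit lane. */
  const __m128i sad = _mm_sad_epu8(row, zero);
  /* Fold the high lane onto the low lane and extract. */
  const __m128i sum = _mm_add_epi32(sad, _mm_unpackhi_epi64(sad, sad));
  return (uint32_t)_mm_cvtsi128_si32(sum);
#else
  uint32_t s = 0;
  int i;
  for (i = 0; i < 16; ++i) s += top[i];
  return s;
#endif
}

/* DC16: rounded average of 16 top and 16 left pixels. The left pixels are
 * strided in memory, so they are summed in plain C. */
static uint8_t DC16Value(const uint8_t* top, const uint8_t* left, int stride) {
  uint32_t sum = SumTop16(top);
  int j;
  for (j = 0; j < 16; ++j) sum += left[j * stride];
  return (uint8_t)((sum + 16) >> 5);
}

/* DC16NoLeft: rounded average of the 16 top pixels only. */
static uint8_t DC16NoLeftValue(const uint8_t* top) {
  return (uint8_t)((SumTop16(top) + 8) >> 4);
}
```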
James Zern
b44eda3f60
dsp: add DSP_INIT_STUB
...
generates a stub function when the specific architecture is not enabled;
this exposes a symbol in the module and avoids a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
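One possible shape for such a macro is sketched below. The macro body, EXAMPLE_USE_SSE41, ExampleDspInitSSE41 and g_sse41_installed are illustrative assumptions, not the definitions in libwebp, and the warning being avoided is assumed to be the one some compilers emit for an empty translation unit.

```c
/* Illustrative stub generator: declares and defines an empty init. */
#define DSP_INIT_STUB(func) \
  extern void func(void);   \
  void func(void) {}

/* Hypothetical stand-in for an architecture flag such as WEBP_USE_SSE41. */
#define EXAMPLE_USE_SSE41 0

int g_sse41_installed = 0;  /* set only by the real (enabled) init */

#if EXAMPLE_USE_SSE41
extern void ExampleDspInitSSE41(void);
void ExampleDspInitSSE41(void) { g_sse41_installed = 1; }
#else
/* Architecture disabled: the module still defines the symbol, so the
 * translation unit is not empty and callers link unconditionally. */
DSP_INIT_STUB(ExampleDspInitSSE41)
#endif
```

With the flag off, calling ExampleDspInitSSE41() is a harmless no-op.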
James Zern
1a338fb306
enc_sse41: add Disto4x4 / Disto16x16
...
direct translation from sse2; minor gain, fewer instructions
Change-Id: I60288a842fac1a686b82b5cab637931789fe29f2
2015-03-25 23:28:46 -07:00
Pascal Massimino
94055503e3
encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN
...
Visible speed-up, thanks to the use of pshufb, pabsw and psignw.
had to tweak configure.ac so that the presence of "smmintrin.h" is
correctly detected (we need to set CPPFLAGS instead of CFLAGS!)
Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080
2015-03-25 20:23:51 -07:00
Pascal Massimino
c64659e1b4
remove duplicate variables after the lossless{_enc}.c split
...
clang was giving "duplicate symbols" error messages at link time.
Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6
2015-03-25 11:10:21 +01:00
James Zern
67ba7c7acc
enc_sse2: call local FTransform in CollectHistogram
...
allows the former to be inlined; the speed-up is negligible in most
cases, however this structure is consistent with the rest of the
optimized modules
Change-Id: Ib080240b06f7a995b47f1906627850c355b82901
2015-03-24 20:22:24 -07:00
James Zern
182497993b
dsp: s/VP8LSetHistogramData/VP8SetHistogramData/
...
this function is for lossy encoding; the VP8L prefix is reserved for lossless
Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb
2015-03-24 18:27:41 -07:00
James Zern
ede5e1584c
cosmetics: dsp/lossless.h: reorder prototypes
...
group decoding / encoding functions together, followed by their
respective Init() function.
Change-Id: Ib4d22f8ec2369efec752faf733ecf53acc67b1ca
2015-03-24 17:52:42 -07:00
James Zern
553051f741
dsp/lossless: split enc/dec functions
...
adds lossless_enc*.c; reduces the size of the decode-only shared library
(.so): ~78K with gcc-4.8.2 on x86_64.
Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092
2015-03-23 22:57:50 -07:00
James Zern
cecf509662
dsp/yuv*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I42e621481be7305bb7c426b4d0b279619195611e
2015-03-20 19:19:46 -07:00
James Zern
6584d398eb
dsp/upsampling*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I3c753915eefe900987c9720733efb720ebe6bfa7
2015-03-20 19:19:46 -07:00
James Zern
808094228c
dsp/rescaler*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: Ife9c7cd363b3692b64a7ade1960cfce3a76c3ba2
2015-03-20 19:19:46 -07:00
James Zern
1d93ddec19
dsp/lossless*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2
2015-03-20 19:19:46 -07:00
James Zern
73805ff270
dsp/filters*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: Idf08ffeb2aef1392a6d69596d897a59deebb64cf
2015-03-20 19:19:46 -07:00
James Zern
fbdcef2401
dsp/enc*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I0cf40b500f9b3eed55a3211213db180c7c0dd43b
2015-03-20 19:19:46 -07:00
James Zern
66de69c1fe
dsp/dec*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I319bc7714f36b8a3d8b35f6474e5592a439aaf24
2015-03-20 19:19:37 -07:00
James Zern
48e4ffd15e
dsp/cost*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: Ie9bee5eaf9daebe0909ab1dda1cf1aa4ee1ef03e
2015-03-20 19:18:50 -07:00
James Zern
29fd6f90c0
dsp/argb*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I46b89909a0279172d37dbda70f731c7b9f052dad
2015-03-20 19:18:50 -07:00
James Zern
80ff38130e
dsp/alpha*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I9e7f187daffe1a3b1bc92953dce980c38d1a6269
2015-03-20 19:18:41 -07:00
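The "dummy init" pattern applied across dsp/ in the series above can be sketched as follows: each architecture file always defines its init function, empty when the architecture is not compiled in, so the generic init can call it without its own '#ifdef'. EXAMPLE_USE_SSE2, ExampleAdd, ExampleInitSSE2 and ExampleInit are hypothetical stand-ins, and the runtime CPU check is passed in as a parameter to keep the sketch self-contained.

```c
/* Hypothetical stand-in for WEBP_USE_SSE2. */
#define EXAMPLE_USE_SSE2 0

typedef int (*AddFunc)(int a, int b);
AddFunc ExampleAdd;  /* one-entry stand-in for a dsp function table */

static int AddC(int a, int b) { return a + b; }

void ExampleInitSSE2(void);

#if EXAMPLE_USE_SSE2
static int AddSSE2(int a, int b) { return a + b; /* vector code here */ }
void ExampleInitSSE2(void) { ExampleAdd = AddSSE2; }
#else
/* Dummy init: the symbol exists even when SSE2 is not compiled in. */
void ExampleInitSSE2(void) {}
#endif

/* Generic init: installs the C default, then lets the architecture init
 * override it. No '#ifdef EXAMPLE_USE_SSE2' is needed around the call. */
void ExampleInit(int cpu_has_sse2) {
  ExampleAdd = AddC;
  if (cpu_has_sse2) ExampleInitSSE2();
}
```

Here the SSE2 path is compiled out, so even with a positive CPU check the C version stays installed.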
Pascal Massimino
e9570dd987
stub for SSE4.1 support.
...
Change-Id: I0c845a98d2871cc8907ff7b914bab7747a92c7ed
2015-03-20 00:26:35 -07:00
James Zern
cabf4bd2bc
dsp: add sse4.1 detection
...
bit 19 in ecx (cpuid with eax = 1)
detection only; no targets or code use it yet
https://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets
Change-Id: Ie61b004dd5b6a3639b30bd9d2a09e6d7359b8040
2015-03-18 19:16:47 -07:00
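On GCC and Clang for x86, <cpuid.h> provides __get_cpuid(), which checks the maximum supported leaf and returns 0 if the requested one is unavailable. A minimal sketch of reading ECX bit 19 of leaf 1 is below; HasSse41 and HAVE_CPUID_H are illustrative names, and other compilers or architectures simply report no support.

```c
/* x86 GCC/Clang only: <cpuid.h> provides __get_cpuid(). */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Returns 1 if CPUID leaf 1 reports SSE4.1 (ECX bit 19), else 0. */
static int HasSse41(void) {
#if defined(HAVE_CPUID_H)
  unsigned int eax, ebx, ecx, edx;
  /* Returns 0 if leaf 1 is above the maximum supported basic leaf. */
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
  return (int)((ecx >> 19) & 1u);
#else
  return 0;  /* not x86 or no <cpuid.h>: report unsupported */
#endif
}
```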
Sam Clegg
ac4f5784a0
Disable NEON code on Native Client
...
The NEON assembler in libwebp has not yet been ported
to Native Client. This change disables it.
Related issue:
https://code.google.com/p/nativeclient/issues/detail?id=3205
Change-Id: I200291db7aa79d40c1f10cff7622c9b8599e6886
2015-03-10 16:17:25 -07:00
Djordje Pesut
241bb5d9d9
MIPS: dspr2: added optimization for TrueMotion
...
affected functions:
TM4 - TrueMotion4
TM8uv - TrueMotion8
TM16 - TrueMotion16
Change-Id: Iff4377c4b0ae94716789c03fe1cd5bfd91f79188
2015-02-26 10:22:19 +01:00
Djordje Pesut
b5e79422d5
MIPS: dspr2: Added optimization for some convert functions
...
affected functions:
VP8LConvertBGRAToRGBA4444_C
VP8LConvertBGRAToRGB565_C
VP8LConvertBGRAToBGR_C
Change-Id: I81513d242d33ebb9fef397ee6a2ca75d17f66e97
2015-02-24 10:51:34 +01:00
Djordje Pesut
0f595db60c
MIPS: dspr2: Added optimization for some convert functions
...
affected functions:
VP8LConvertBGRAToRGB_C
VP8LConvertBGRAToRGBA_C
Change-Id: I5f25795c385688f2432d0710296e589f3793cb2b
2015-02-23 17:44:06 +01:00
Djordje Pesut
8a218b4a96
MIPS: [mips32|dspr2]: GetResidualCost rebased
...
Change-Id: Ie15524c773f7a8c79e002097881a508187ca7cc6
2015-02-23 10:43:42 +01:00
James Zern
602a00f93f
fix iOS arm64 build with Xcode 6.3
...
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.
[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.
Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
2015-02-19 12:16:58 -08:00
Pascal Massimino
2382050748
1-2% faster encoding by removing an indirection in GetResidualCost()
...
The MIPS code for cost is not updated yet, which is why I keep Residual::*cost
around for now. It should be removed in favor of *costs later.
Change-Id: Id1d09a8c37ea8c5b34ad5eb8811d6a3ec6c4d89f
2015-02-19 08:44:35 +01:00
Djordje Pesut
eddb7e70be
MIPS: dspr2: added optimization for DC8uv, DC8uvNoTop and DC8uvNoLeft
...
added macros for load/store
Change-Id: I151d4d49bf1fab87fc3a82cb8e8e0835fe10b690
2015-02-18 18:24:10 +01:00
Djordje Pesut
73ba29158f
MIPS: dspr2: added optimization for functions RD4 and LD4
...
Change-Id: I71216c1300f4eb254de4ae940ea9dcdba50aa080
2015-02-18 15:11:34 +01:00
Pascal Massimino
c7129da5b6
Merge "4-5% faster encoding using SSE2 for GetResidualCost"
2015-02-18 04:46:53 -08:00
Djordje Pesut
94380d00d9
MIPS: dspr2: added optimization for functions VE4 and DC4
...
Change-Id: I118adc6d3872742d8b1f9dbac438cba6fc90b7a9
2015-02-18 11:25:08 +01:00
Pascal Massimino
2a407092ab
4-5% faster encoding using SSE2 for GetResidualCost
...
new file: cost_sse2.c
Change-Id: I4896c07f5ff2443ef743f4435fe2758d95a672ed
2015-02-18 09:41:02 +01:00
James Zern
17e1986214
Merge "MIPS: dspr2: added optimization for simple filtering functions"
2015-02-17 14:57:05 -08:00
pascal massimino
3ec404c47a
Merge "dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage"
2015-02-14 01:57:08 -08:00
James Zern
b969f5dfac
dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
...
the attribute is only necessary in one location; remove it from the
prototypes.
Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
Djordje Pesut
d7b8e71126
MIPS: dspr2: added optimization for simple filtering functions
...
affected functions: SimpleVFilter16, SimpleHFilter16,
SimpleVFilter16i and SimpleHFilter16i
noticed a bug in FilterLoop26 (the fix is included in this patch)
Change-Id: I72d9c1e45cbac6393eba52bb549b04924d463e30
2015-02-13 11:18:43 +01:00
Djordje Pesut
42a8a6280c
MIPS: dspr2: Added optimization for function VP8LTransformColorInverse_C
...
Change-Id: I8b60e22c9f6c0badab6267a33751dfc28750f457
2015-02-13 08:57:20 +01:00
James Zern
3030f11525
Merge "dsp/mips: add some missing TSan annotations"
2015-02-12 14:55:32 -08:00
pascal massimino
dfcf4593fe
Merge "MIPS: dspr2: Added optimization for function VP8LAddGreenToBlueAndRed_C"
2015-02-12 14:48:17 -08:00
James Zern
55c75a25f0
dsp/mips: add some missing TSan annotations
...
Change-Id: I3c832aefdeac26c6c75c35b19b45c1a2f67493c5
2015-02-12 14:36:33 -08:00
Djordje Pesut
2cb879f0c6
MIPS: dspr2: Added optimization for function VP8LAddGreenToBlueAndRed_C
...
Change-Id: If897c6c2f1c4b8405789298e135d6a1e4bf13012
2015-02-12 09:06:49 +01:00
James Zern
e15560107c
move some cost tables from enc/ to dsp/
...
removes circular dependency between dsp and enc.
since:
a987fae
MIPS: dspr2: added optimization for function GetResidualCost
Change-Id: Ifeb8fc02de89e2ba982ed7ffacd925d649bfec3c
2015-02-11 16:10:06 -08:00
pascal massimino
39537d7cfe
Merge "VP8LDspInitMIPSdspR2: add missing TSan annotation"
2015-02-10 00:02:41 -08:00
James Zern
43fd3543df
VP8LDspInitMIPSdspR2: add missing TSan annotation
...
Change-Id: Ic0d84e95daf063976b40fb5ba1e94d3547e2afba
2015-02-09 23:55:30 -08:00
pascal massimino
c7233dfcdc
Merge "VP8LDspInit: remove memcpy"
2015-02-09 23:48:44 -08:00
James Zern
35579a4902
VP8LDspInit: remove memcpy
...
without this change the TSan annotation is useless
Change-Id: Ief511379f3aad75889815d4fe8362aed5c1abac7
2015-02-09 23:41:24 -08:00
James Zern
97f6aff874
VP8YUVInit: add missing TSan annotation
...
Change-Id: I7f8868de425e1aac3721b3e328844725104d14db
2015-02-09 22:50:31 -08:00
James Zern
f9016d6662
dsp/enc::InitTables: add missing TSan annotation
...
Change-Id: I262b9071417a0ec502c7c0380f27da6413cc74e4
2015-02-09 22:40:45 -08:00
James Zern
e3d9771aa1
VP8EncDspCostInit*: add missing TSan annotations
...
Change-Id: I4cdb84bc8c9a8c6aa34b5773c8fb69e5810a9809
2015-02-09 22:39:14 -08:00
Djordje Pesut
309b790867
MIPS: mips32: Added optimization for function SetResidualCoeffs
...
Change-Id: If67c10285df71ba7dd1aff6c24c2145c280dd2bf
2015-02-09 13:17:49 +01:00
Pascal Massimino
a987faedfa
MIPS: dspr2: added optimization for function GetResidualCost
...
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file
Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
2015-02-07 02:13:26 -08:00
James Zern
c24d8f144f
cosmetics: upsampling_sse2: add const to some casts
...
source pointers are often cast to __m128*; retain the const in those
cases
Change-Id: Ia6df6690c85f580b20f19ce85cc6ec7b52620aee
2015-02-05 23:51:57 -08:00
James Zern
1829c42c58
cosmetics: lossless_sse2: add const to some casts
...
source pointers are often cast to __m128*; retain the const in those
cases
Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9
2015-02-05 23:51:44 -08:00
James Zern
183168f332
cosmetics: enc_sse2: add const to some casts
...
source pointers are often cast to __m128*; retain the const in those
cases
Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03
2015-02-05 23:51:29 -08:00
James Zern
860badcacc
cosmetics: dec_sse2: add const to some casts
...
source pointers are often cast to __m128*; retain the const in those
cases
Change-Id: I77decb55f1382bea4b646a11b77dfa40bf1ef94d
2015-02-05 23:51:16 -08:00
James Zern
0254db9793
cosmetics: argb_sse2: add const to some casts
...
source pointers are often cast to __m128*; retain the const in those
cases
Change-Id: I87fa2de11dafcc77767aab64e13b8c5585ebf5cd
2015-02-05 23:51:07 -08:00
James Zern
1aadf856c9
cosmetics: alpha_processing_sse2: add const to some casts
...
source pointers are often cast to __m128*; retain the const in those
cases
Change-Id: I00ba15af2f43d125ceb2620e82fd43d420fbb9d3
2015-02-05 23:50:39 -08:00
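A small sketch of the const-preserving cast these cosmetics commits describe. It uses the integer vector type __m128i, which is what _mm_loadu_si128() takes; Copy16 is a hypothetical helper. With GCC or Clang, -Wcast-qual warns when a cast drops const from the pointed-to type, which is one way such casts get caught.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copies 16 bytes. The read-only source is cast to 'const __m128i*',
 * keeping the qualifier that a plain '(__m128i*)' cast would drop. */
static void Copy16(const uint8_t* src, uint8_t* dst) {
#if defined(__SSE2__)
  const __m128i v = _mm_loadu_si128((const __m128i*)src);
  _mm_storeu_si128((__m128i*)dst, v);  /* writable destination: no const */
#else
  int i;
  for (i = 0; i < 16; ++i) dst[i] = src[i];
#endif
}
```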
Pascal Massimino
19f0ba0eb9
Implement true-motion prediction in SSE2
...
(along with DC/HE/VE for chroma/luma16)
Overall effect is ~1% faster decoding.
Change-Id: I90917e050d61874cbc8da0e88f26b5dd6131c265
2015-02-04 17:02:22 +01:00
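True-motion (TM) prediction shows up repeatedly in this log (SSE2 here, TM16 for NEON, TM4/TM8uv/TM16 for dspr2). A scalar sketch of the VP8 TM predictor, pred[y][x] = clip(left[y] + top[x] - top_left), is below. TrueMotion, Clip255 and the separate top/left arrays with a destination stride are a hypothetical layout, not any of the optimized versions.

```c
#include <stdint.h>

static uint8_t Clip255(int v) {
  return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Scalar true-motion prediction for a size x size block. */
static void TrueMotion(uint8_t* dst, int stride, const uint8_t* top,
                       const uint8_t* left, uint8_t top_left, int size) {
  int x, y;
  for (y = 0; y < size; ++y) {
    /* left[y] - top_left is constant along a row, so a vector version can
     * broadcast it once and add it to the whole top row. */
    const int row_delta = left[y] - top_left;
    for (x = 0; x < size; ++x) {
      dst[y * stride + x] = Clip255(top[x] + row_delta);
    }
  }
}
```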
Pascal Massimino
774d4cb758
make VP8PredLuma16[] array non-const
...
Change-Id: I0ce7e4e847f9fffefb6544db9636068442a2d264
2015-02-04 17:00:22 +01:00
Djordje Pesut
6ce296da12
MIPS: dspr2: Added optimization for function CollectHistogram
...
Change-Id: Id6b87ea1c9d21fee9494ad6c53ffc84ef60d5974
2015-02-03 14:11:20 +01:00
James Zern
b7de794622
Merge "lossless_neon: enable subtract green for aarch64"
2015-02-02 16:01:40 -08:00
Pascal Massimino
77724f70e9
SSE2 version of GradientUnfilter
...
roughly 1-2% faster decoding for lossy+alpha
Change-Id: Ib317e26e9fcb8d37af02668ffbfccc4664e659fe
2015-01-31 23:18:00 +01:00
James Zern
416e1cea9b
lossless_neon: enable subtract green for aarch64
...
similar to:
1ba61b0
enable NEON intrinsics in aarch64 builds
vtbl1_u8 is available everywhere except Xcode-based iOS arm64 builds; use
vtbl1q_u8 there.
performance varies with the input; 1-3% on encode was observed
Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-01-31 11:32:05 -08:00
Pascal Massimino
022d2f886c
add SSE2 variants for alpha filtering functions
...
The 'inverse' (unfiltering) variants are harder to parallelize, since
each reconstructed value is used to predict the next one.
The 'direct' (filtering) way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter().
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
2015-01-29 08:46:22 +01:00
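The asymmetry described in the alpha-filtering commit can be seen in a scalar sketch of one row of gradient filtering, using the predictor clip(left + top - top_left). The function names are hypothetical, and the border handling (the first pixel predicted from the pixel above) is a simplifying assumption rather than a copy of libwebp's code.

```c
#include <stdint.h>

/* Gradient predictor: clip(left + top - top_left) into [0, 255]. */
static uint8_t GradientPredictor(int left, int top, int top_left) {
  const int g = left + top - top_left;
  return (uint8_t)(g < 0 ? 0 : g > 255 ? 255 : g);
}

/* Direct filter: every prediction reads only input pixels, so all
 * columns of the row are independent and can be computed in parallel. */
static void GradientFilterRow(const uint8_t* prev, const uint8_t* cur,
                              uint8_t* out, int width) {
  int i;
  out[0] = (uint8_t)(cur[0] - prev[0]);
  for (i = 1; i < width; ++i) {
    out[i] = (uint8_t)(cur[i] -
                       GradientPredictor(cur[i - 1], prev[i], prev[i - 1]));
  }
}

/* Inverse filter: the prediction for column i needs the reconstructed
 * pixel out[i - 1], a serial dependency along the row. */
static void GradientUnfilterRow(const uint8_t* prev, const uint8_t* in,
                                uint8_t* out, int width) {
  int i;
  out[0] = (uint8_t)(in[0] + prev[0]);
  for (i = 1; i < width; ++i) {
    out[i] = (uint8_t)(in[i] +
                       GradientPredictor(out[i - 1], prev[i], prev[i - 1]));
  }
}
```

Residuals wrap modulo 256 through the uint8_t casts, so unfiltering a filtered row restores it exactly.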
Pascal Massimino
7afdaf8496
Alpha coding: reorganize the filter/unfiltering code
...
Move the filtering code to its own place in dsp/
New function: VP8FiltersInit()
Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Djordje Pesut
da0912126b
MIPS: dspr2: Added optimization for function FTransformWHT
...
Change-Id: I918366cd1908304068c66da9965efb0aa63320cd
2015-01-19 10:15:13 +01:00
pascal massimino
daeb276a2b
Merge "MIPS: dspr2: Added optimization for MultARGBRow function"
2015-01-17 08:56:01 -08:00
James Zern
0de5f33e31
dsp/cpu: (msvs) add include for __cpuidex
...
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.
Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
2015-01-16 18:16:10 -08:00
Djordje Pesut
7d850f7b9a
MIPS: dspr2: Added optimization for MultARGBRow function
...
Change-Id: Ide549ae0d80413bea8c19fe091d97bffe8b17985
2015-01-16 15:56:34 +01:00
Djordje Pesut
5487529368
MIPS: dspr2: added optimization for function QuantizeBlock
...
Change-Id: Id217116890b7408d23464216608ce67ae545688a
2015-01-16 12:51:13 +01:00
James Zern
4fbe9cf202
dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
...
_xgetbv() isn't relevant there anyway
broken since:
279e661
Merge "dsp/cpu: add include for _xgetbv() w/MSVS"
Change-Id: Iaa7bc0c5be9c06bfffab39e194c64c09bf5b5a27
2015-01-15 23:04:08 -08:00
Pascal Massimino
3fd59039bd
simplify/reorganize arguments for CollectColorBlueTransforms
...
and other various call sites too.
Change-Id: Icb8f828dfe25672662de18d0e48e7d3144b1f38d
2015-01-15 18:12:12 -08:00
Djordje Pesut
a7e7caa486
MIPS: dspr2: added optimization for function TransformColorRed
...
added a new C function, CollectColorRedTransforms, which calls
TransformColorRed and is exposed through a function pointer
Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
2015-01-15 09:32:09 +01:00
pascal massimino
2cb39180cc
Merge "MIPS: dspr2: added optimization for function TransformColorBlue"
2015-01-15 00:06:01 -08:00
pascal massimino
279e66138d
Merge "dsp/cpu: add include for _xgetbv() w/MSVS"
2015-01-15 00:05:35 -08:00
James Zern
b6c0428e8c
dsp/cpu: add include for _xgetbv() w/MSVS
...
explicitly include immintrin.h instead of (presumably) picking it up
transitively via windows.h; this makes the code easier to move around.
Change-Id: If70d5143ac94fc331da763ce034358858e460e06
2015-01-14 23:31:35 -08:00