skal
ac76801159
introduce FTransform2 to perform two transforms at a time.
...
FTransform goes from ~12.0% to 11.5% total CPU time.
Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
aa6065aedd
dec_neon: use vld1_dup(mem) rather than vdup(mem[0])
...
should result in slightly less general purpose register use
Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726
2015-05-16 11:24:32 -07:00
Pascal Massimino
8b63ac78e0
Merge "dec_neon: add TM16"
2015-05-16 10:56:07 +00:00
Pascal Massimino
f51be09e1f
Merge "dec_neon/TrueMotion: simplify left border load"
2015-05-16 10:54:05 +00:00
James Zern
dc48196bd9
dec_neon: add TM16
...
over 20M pixels ~78% faster
Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde
2015-05-15 12:50:11 -07:00
James Zern
ea95b305ca
dec_neon/TrueMotion: simplify left border load
...
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
2015-05-15 12:48:13 -07:00
Pascal Massimino
f262d6120e
speed-up SetResidualSSE2
...
(was unnecessarily complicated)
Before:
VP8SetResidualCoeffs: checksum = 1127918 elapsed = 475 ms.
After:
VP8SetResidualCoeffs: checksum = 1127918 elapsed = 404 ms.
Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd
2015-05-14 21:24:24 -07:00
James Zern
bf46d0acff
fix mips2 build target
...
tested with mips1 and mips2; this should cover 3/4 as well.
fixes an ftbfs reported on the debian issue tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000
Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c
2015-05-13 10:36:22 -07:00
James Zern
929a0fdccd
enc_sse2/TTransform: simplify abs calculation
...
max(b, 0 - b) works as well as (b ^ sign) - sign
Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d
2015-05-08 19:50:29 -07:00
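The equivalence claimed above can be checked in scalar form. This is only a sketch: the function names are made up for illustration, and it uses plain 16-bit integers rather than SSE2 registers (in SSE2 the two operations would presumably map to `_mm_sub_epi16` and `_mm_max_epi16`). It compares max(b, 0 - b) with the sign-mask identity (b ^ sign) - sign.

```c
#include <stdint.h>

/* abs via max(b, 0 - b): one subtract and one max.
 * AbsViaMax is a hypothetical name used only for this sketch. */
static int16_t AbsViaMax(int16_t b) {
  const int16_t neg = (int16_t)(0 - b);
  return (b > neg) ? b : neg;
}

/* abs via a sign mask: sign is 0 for b >= 0 and -1 (all ones) for b < 0,
 * so (b ^ sign) - sign flips the bits and adds one only for negative b. */
static int16_t AbsViaSign(int16_t b) {
  const int16_t sign = (int16_t)((b < 0) ? -1 : 0);
  return (int16_t)((b ^ sign) - sign);
}
```

Both forms agree over the 16-bit range; -32768 has no positive counterpart in 16 bits, so neither form can return a true absolute value for it.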
James Zern
17dbd05819
enc_sse2/CollectHistogram: simplify abs calculation
...
max(out, 0 - out) works as well as (out ^ sign) - sign
Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909
2015-05-08 19:49:08 -07:00
James Zern
a6c1593645
dec_neon: add DC16 intra predictors
...
improvement over 20M pixels:
DC16: ~77%
DC16NoTop: ~78%
DC16NoLeft: ~83%
DC16NoTopLeft: ~83%
Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99
2015-05-08 00:12:48 -07:00
James Zern
f274a96ce9
dsp/enc_sse2: add luma4 intra predictors
...
VP8EncPredLuma4 improvement over ~20M pixels: ~39%
Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20
2015-05-05 23:51:19 -07:00
James Zern
040b11bdf6
dsp/enc_sse2: add chroma intra predictors
...
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
2015-05-05 23:51:14 -07:00
James Zern
aee021bbb1
dsp/enc_sse2: add luma16 intra predictors
...
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
2015-05-05 23:51:07 -07:00
James Zern
4c9af02326
dec_neon: add DC8uvNoTopLeft
...
~93% faster
Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602
2015-05-01 20:03:57 -07:00
Pascal Massimino
9287761d95
Merge "GetResidualCostSSE2: simplify abs calculation"
2015-04-30 06:30:58 +00:00
James Zern
0e009366f8
dsp/cpu.c(x86): check maximum supported cpuid feature
...
structured extended feature flags require eax = 7; avoids incorrectly
detecting avx2 on some older processors that support avx.
for completeness also check for value=1 support used by the other
checks.
from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String
[1]
http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html
Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046
2015-04-29 23:22:53 -07:00
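A minimal sketch of the check described above, not the actual dsp/cpu.c code. The `CpuidFn` hook and the two fake CPUs are assumptions introduced so the logic can run on any host. The bit positions are the documented ones: AVX2 is EBX bit 5 of leaf 7 (subleaf 0) and SSE4.1 is ECX bit 19 of leaf 1. OS support checks (xgetbv) are left out.

```c
#include <stdint.h>

/* Hypothetical hook: fills regs[] = {eax, ebx, ecx, edx} for leaf/subleaf. */
typedef void (*CpuidFn)(uint32_t leaf, uint32_t subleaf, uint32_t regs[4]);

/* Leaf 0 returns the highest supported basic leaf in eax. */
static uint32_t MaxBasicLeaf(CpuidFn cpuid) {
  uint32_t regs[4] = {0, 0, 0, 0};
  cpuid(0, 0, regs);
  return regs[0];
}

static int HasSSE41(CpuidFn cpuid) {
  uint32_t regs[4] = {0, 0, 0, 0};
  if (MaxBasicLeaf(cpuid) < 1) return 0;
  cpuid(1, 0, regs);
  return (int)((regs[2] >> 19) & 1);  /* ECX bit 19 */
}

static int HasAVX2(CpuidFn cpuid) {
  uint32_t regs[4] = {0, 0, 0, 0};
  if (MaxBasicLeaf(cpuid) < 7) return 0;  /* leaf 7 not implemented */
  cpuid(7, 0, regs);                      /* subleaf (ecx) must be 0 */
  return (int)((regs[1] >> 5) & 1);       /* EBX bit 5 */
}

/* Made-up test double: max leaf is 4; other leaves return all ones. */
static void FakeOldCpu(uint32_t leaf, uint32_t subleaf, uint32_t regs[4]) {
  (void)subleaf;
  regs[0] = (leaf == 0) ? 4u : 0xffffffffu;
  regs[1] = regs[2] = regs[3] = 0xffffffffu;
}

/* Made-up test double: max leaf is 13, with SSE4.1 and AVX2 bits set. */
static void FakeNewCpu(uint32_t leaf, uint32_t subleaf, uint32_t regs[4]) {
  regs[0] = regs[1] = regs[2] = regs[3] = 0;
  if (leaf == 0) regs[0] = 13u;
  if (leaf == 1) regs[2] = 1u << 19;
  if (leaf == 7 && subleaf == 0) regs[1] = 1u << 5;
}
```

With the max-leaf guard, `HasAVX2(FakeOldCpu)` reports 0 even though the fake returns set bits for an unsupported leaf, which is the misdetection the commit avoids.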
James Zern
b243a4bc30
GetResidualCostSSE2: simplify abs calculation
...
max(coeff, 0 - coeff) works as well as min/max/sub or
(coeff ^ sign) - sign
Change-Id: I9b11715372e49cd83820677bf4beba4a1c04931c
2015-04-21 20:29:12 -07:00
James Zern
0768b252fa
dsp/enc.c: cosmetics: move DST() def closer to use
...
Change-Id: Iccbcf046412426c2893b71eced517f611d2ffc3f
2015-04-15 20:03:39 -07:00
James Zern
9904e365a8
dsp/dec_sse2: DC8uv / DC8uvNoLeft speedup
...
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC8uv: ~19% faster
DC8uvNoLeft: ~12% faster
Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2
2015-04-08 23:12:53 -07:00
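A sketch of the split described above, under stated assumptions: `SumTop8`, `DC8uvValue` and the `stride` parameter are illustrative names, not the dec_sse2 code. The top row is contiguous, so one psadbw (`_mm_sad_epu8` against zero) sums its 8 bytes. The left column is one byte per row, so it is summed in a plain C loop. A scalar fallback keeps the sketch buildable without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum of the 8 contiguous bytes above the block. */
static uint32_t SumTop8(const uint8_t* top) {
#if defined(__SSE2__)
  const __m128i zero = _mm_setzero_si128();
  const __m128i row = _mm_loadl_epi64((const __m128i*)top);
  /* psadbw against zero: sum of the 8 low bytes lands in the low lane. */
  const __m128i sum = _mm_sad_epu8(row, zero);
  return (uint32_t)_mm_cvtsi128_si32(sum);
#else
  uint32_t sum = 0;
  int i;
  for (i = 0; i < 8; ++i) sum += top[i];
  return sum;
#endif
}

/* DC value for an 8x8 block: rounded mean of 8 top and 8 left samples.
 * 'left' points at the first left sample; 'stride' is the row pitch. */
static uint8_t DC8uvValue(const uint8_t* top, const uint8_t* left,
                          int stride) {
  uint32_t dc = SumTop8(top) + 8;  /* +8 rounds the division by 16 */
  int j;
  for (j = 0; j < 8; ++j) dc += left[j * stride];
  return (uint8_t)(dc >> 4);
}
```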
James Zern
7df2049785
dsp/dec_sse2: DC16 / DC16NoLeft speedup
...
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC16: ~20% faster
DC16NoLeft: ~14% faster
Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691
2015-04-08 23:10:39 -07:00
James Zern
b44eda3f60
dsp: add DSP_INIT_STUB
...
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
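The stub idea can be sketched as follows. The macro body, the `USE_FAST_ARCH` switch and the function names are assumptions made for illustration; the macro in the tree may differ in detail. When the architecture is disabled, the macro still emits an empty init function, so the file is not an empty translation unit and the caller needs no #ifdef of its own.

```c
/* Hypothetical per-target switch; left undefined here on purpose. */
/* #define USE_FAST_ARCH */

/* Emits a prototype plus an empty body for 'func'. */
#define DSP_INIT_STUB(func) \
  extern void func(void);   \
  void func(void) {}

typedef int (*SumFn)(const int* values, int n);

/* Plain-C reference implementation. */
static int Sum_C(const int* values, int n) {
  int i, sum = 0;
  for (i = 0; i < n; ++i) sum += values[i];
  return sum;
}

static SumFn Sum = Sum_C;

#if defined(USE_FAST_ARCH)
static int Sum_Fast(const int* values, int n) { return Sum_C(values, n); }
void DspInitFastArch(void) { Sum = Sum_Fast; }
#else
DSP_INIT_STUB(DspInitFastArch)  /* symbol exists, does nothing */
#endif

void DspInit(void) {
  Sum = Sum_C;
  DspInitFastArch();  /* always callable, with or without the target */
}
```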
James Zern
1a338fb306
enc_sse41: add Disto4x4 / Disto16x16
...
direct translation from sse2; minor gain, fewer instructions
Change-Id: I60288a842fac1a686b82b5cab637931789fe29f2
2015-03-25 23:28:46 -07:00
Pascal Massimino
94055503e3
encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN
...
Visible speed-up, thanks to pshufb and pabsw and psignw use.
had to tweak configure.ac to make "smmintrin.h" presence correctly
detected (we need to set the CPPFLAGS instead of the CFLAGS!)
Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080
2015-03-25 20:23:51 -07:00
Pascal Massimino
c64659e1b4
remove duplicate variables after the lossless{_enc}.c split
...
clang was giving "duplicate symbols" error messages at link time.
Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6
2015-03-25 11:10:21 +01:00
James Zern
67ba7c7acc
enc_sse2: call local FTransform in CollectHistogram
...
allows the former to be inlined; negligible speed-up in most cases,
however this structure is consistent with the rest of the optimized
modules
Change-Id: Ib080240b06f7a995b47f1906627850c355b82901
2015-03-24 20:22:24 -07:00
James Zern
182497993b
dsp: s/VP8LSetHistogramData/VP8SetHistogramData/
...
this function is for lossy encoding; the VP8L prefix is used by lossless
Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb
2015-03-24 18:27:41 -07:00
James Zern
ede5e1584c
cosmetics: dsp/lossless.h: reorder prototypes
...
group decoding / encoding functions together, followed by their
respective Init() function.
Change-Id: Ib4d22f8ec2369efec752faf733ecf53acc67b1ca
2015-03-24 17:52:42 -07:00
James Zern
553051f741
dsp/lossless: split enc/dec functions
...
adds lossless_enc*.c; reduces the size of the decode-only so: ~78K
w/gcc-4.8.2 on x86_64.
Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092
2015-03-23 22:57:50 -07:00
James Zern
cecf509662
dsp/yuv*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I42e621481be7305bb7c426b4d0b279619195611e
2015-03-20 19:19:46 -07:00
James Zern
6584d398eb
dsp/upsampling*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I3c753915eefe900987c9720733efb720ebe6bfa7
2015-03-20 19:19:46 -07:00
James Zern
808094228c
dsp/rescaler*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: Ife9c7cd363b3692b64a7ade1960cfce3a76c3ba2
2015-03-20 19:19:46 -07:00
James Zern
1d93ddec19
dsp/lossless*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2
2015-03-20 19:19:46 -07:00
James Zern
73805ff270
dsp/filters*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: Idf08ffeb2aef1392a6d69596d897a59deebb64cf
2015-03-20 19:19:46 -07:00
James Zern
fbdcef2401
dsp/enc*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I0cf40b500f9b3eed55a3211213db180c7c0dd43b
2015-03-20 19:19:46 -07:00
James Zern
66de69c1fe
dsp/dec*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I319bc7714f36b8a3d8b35f6474e5592a439aaf24
2015-03-20 19:19:37 -07:00
James Zern
48e4ffd15e
dsp/cost*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: Ie9bee5eaf9daebe0909ab1dda1cf1aa4ee1ef03e
2015-03-20 19:18:50 -07:00
James Zern
29fd6f90c0
dsp/argb*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I46b89909a0279172d37dbda70f731c7b9f052dad
2015-03-20 19:18:50 -07:00
James Zern
80ff38130e
dsp/alpha*.c: rework WEBP_USE_<arch> ifdef
...
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.
Change-Id: I9e7f187daffe1a3b1bc92953dce980c38d1a6269
2015-03-20 19:18:41 -07:00
Pascal Massimino
e9570dd987
stub for SSE4.1 support.
...
Change-Id: I0c845a98d2871cc8907ff7b914bab7747a92c7ed
2015-03-20 00:26:35 -07:00
James Zern
cabf4bd2bc
dsp: add sse4.1 detection
...
bit 19 in ecx
no targets or code
https://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets
Change-Id: Ie61b004dd5b6a3639b30bd9d2a09e6d7359b8040
2015-03-18 19:16:47 -07:00
Sam Clegg
ac4f5784a0
Disable NEON code on Native Client
...
The NEON assembler in libwebp has not yet been ported
to Native Client. This change disables it.
Related issue:
https://code.google.com/p/nativeclient/issues/detail?id=3205
Change-Id: I200291db7aa79d40c1f10cff7622c9b8599e6886
2015-03-10 16:17:25 -07:00
Djordje Pesut
241bb5d9d9
MIPS: dspr2: added optimization for TrueMotion
...
affected functions:
TM4 - TrueMotion4
TM8uv - TrueMotion8
TM16 - TrueMotion16
Change-Id: Iff4377c4b0ae94716789c03fe1cd5bfd91f79188
2015-02-26 10:22:19 +01:00
Djordje Pesut
b5e79422d5
MIPS: dspr2: Added optimization for some convert functions
...
affected functions:
VP8LConvertBGRAToRGBA4444_C
VP8LConvertBGRAToRGB565_C
VP8LConvertBGRAToBGR_C
Change-Id: I81513d242d33ebb9fef397ee6a2ca75d17f66e97
2015-02-24 10:51:34 +01:00
Djordje Pesut
0f595db60c
MIPS: dspr2: Added optimization for some convert functions
...
affected functions:
VP8LConvertBGRAToRGB_C
VP8LConvertBGRAToRGBA_C
Change-Id: I5f25795c385688f2432d0710296e589f3793cb2b
2015-02-23 17:44:06 +01:00
Djordje Pesut
8a218b4a96
MIPS: [mips32|dspr2]: GetResidualCost rebased
...
Change-Id: Ie15524c773f7a8c79e002097881a508187ca7cc6
2015-02-23 10:43:42 +01:00
James Zern
602a00f93f
fix iOS arm64 build with Xcode 6.3
...
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.
[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.
Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
2015-02-19 12:16:58 -08:00
Pascal Massimino
2382050748
1-2% faster encoding by removing an indirection in GetResidualCost()
...
The MIPS code for cost is not updated yet, which is why I keep Residual::*cost
around for now. It should be removed in favor of *costs later.
Change-Id: Id1d09a8c37ea8c5b34ad5eb8811d6a3ec6c4d89f
2015-02-19 08:44:35 +01:00
Djordje Pesut
eddb7e70be
MIPS: dspr2: added optimization for DC8uv, DC8uvNoTop and DC8uvNoLeft
...
added macros for load/store
Change-Id: I151d4d49bf1fab87fc3a82cb8e8e0835fe10b690
2015-02-18 18:24:10 +01:00
Djordje Pesut
73ba29158f
MIPS: dspr2: added optimization for functions RD4 and LD4
...
Change-Id: I71216c1300f4eb254de4ae940ea9dcdba50aa080
2015-02-18 15:11:34 +01:00
Pascal Massimino
c7129da5b6
Merge "4-5% faster encoding using SSE2 for GetResidualCost"
2015-02-18 04:46:53 -08:00
Djordje Pesut
94380d00d9
MIPS: dspr2: added optimization for functions VE4 and DC4
...
Change-Id: I118adc6d3872742d8b1f9dbac438cba6fc90b7a9
2015-02-18 11:25:08 +01:00
Pascal Massimino
2a407092ab
4-5% faster encoding using SSE2 for GetResidualCost
...
new file: cost_sse2.c
Change-Id: I4896c07f5ff2443ef743f4435fe2758d95a672ed
2015-02-18 09:41:02 +01:00
James Zern
17e1986214
Merge "MIPS: dspr2: added optimization for simple filtering functions"
2015-02-17 14:57:05 -08:00
pascal massimino
3ec404c47a
Merge "dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage"
2015-02-14 01:57:08 -08:00
James Zern
b969f5dfac
dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
...
the attribute is only necessary in one location; remove it from the
prototypes.
Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
Djordje Pesut
d7b8e71126
MIPS: dspr2: added optimization for simple filtering functions
...
affected functions: SimpleVFilter16, SimpleHFilter16,
SimpleVFilter16i and SimpleHFilter16i
noticed bug in FilterLoop26 (fix included in this patch)
Change-Id: I72d9c1e45cbac6393eba52bb549b04924d463e30
2015-02-13 11:18:43 +01:00
Djordje Pesut
42a8a6280c
MIPS: dspr2: Added optimization for function VP8LTransformColorInverse_C
...
Change-Id: I8b60e22c9f6c0badab6267a33751dfc28750f457
2015-02-13 08:57:20 +01:00
James Zern
3030f11525
Merge "dsp/mips: add some missing TSan annotations"
2015-02-12 14:55:32 -08:00
pascal massimino
dfcf4593fe
Merge "MIPS: dspr2: Added optimization for function VP8LAddGreenToBlueAndRed_C"
2015-02-12 14:48:17 -08:00
James Zern
55c75a25f0
dsp/mips: add some missing TSan annotations
...
Change-Id: I3c832aefdeac26c6c75c35b19b45c1a2f67493c5
2015-02-12 14:36:33 -08:00
Djordje Pesut
2cb879f0c6
MIPS: dspr2: Added optimization for function VP8LAddGreenToBlueAndRed_C
...
Change-Id: If897c6c2f1c4b8405789298e135d6a1e4bf13012
2015-02-12 09:06:49 +01:00
James Zern
e15560107c
move some cost tables from enc/ to dsp/
...
removes circular dependency between dsp and enc.
since:
a987fae
MIPS: dspr2: added optimization for function GetResidualCost
Change-Id: Ifeb8fc02de89e2ba982ed7ffacd925d649bfec3c
2015-02-11 16:10:06 -08:00
pascal massimino
39537d7cfe
Merge "VP8LDspInitMIPSdspR2: add missing TSan annotation"
2015-02-10 00:02:41 -08:00
James Zern
43fd3543df
VP8LDspInitMIPSdspR2: add missing TSan annotation
...
Change-Id: Ic0d84e95daf063976b40fb5ba1e94d3547e2afba
2015-02-09 23:55:30 -08:00
pascal massimino
c7233dfcdc
Merge "VP8LDspInit: remove memcpy"
2015-02-09 23:48:44 -08:00
James Zern
35579a4902
VP8LDspInit: remove memcpy
...
without this change the TSan annotation is useless
Change-Id: Ief511379f3aad75889815d4fe8362aed5c1abac7
2015-02-09 23:41:24 -08:00
James Zern
97f6aff874
VP8YUVInit: add missing TSan annotation
...
Change-Id: I7f8868de425e1aac3721b3e328844725104d14db
2015-02-09 22:50:31 -08:00
James Zern
f9016d6662
dsp/enc::InitTables: add missing TSan annotation
...
Change-Id: I262b9071417a0ec502c7c0380f27da6413cc74e4
2015-02-09 22:40:45 -08:00
James Zern
e3d9771aa1
VP8EncDspCostInit*: add missing TSan annotations
...
Change-Id: I4cdb84bc8c9a8c6aa34b5773c8fb69e5810a9809
2015-02-09 22:39:14 -08:00
Djordje Pesut
309b790867
MIPS: mips32: Added optimization for function SetResidualCoeffs
...
Change-Id: If67c10285df71ba7dd1aff6c24c2145c280dd2bf
2015-02-09 13:17:49 +01:00
Pascal Massimino
a987faedfa
MIPS: dspr2: added optimization for function GetResidualCost
...
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file
Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
2015-02-07 02:13:26 -08:00
James Zern
c24d8f144f
cosmetics: upsampling_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: Ia6df6690c85f580b20f19ce85cc6ec7b52620aee
2015-02-05 23:51:57 -08:00
James Zern
1829c42c58
cosmetics: lossless_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9
2015-02-05 23:51:44 -08:00
James Zern
183168f332
cosmetics: enc_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03
2015-02-05 23:51:29 -08:00
James Zern
860badcacc
cosmetics: dec_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I77decb55f1382bea4b646a11b77dfa40bf1ef94d
2015-02-05 23:51:16 -08:00
James Zern
0254db9793
cosmetics: argb_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I87fa2de11dafcc77767aab64e13b8c5585ebf5cd
2015-02-05 23:51:07 -08:00
James Zern
1aadf856c9
cosmetics: alpha_processing_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I00ba15af2f43d125ceb2620e82fd43d420fbb9d3
2015-02-05 23:50:39 -08:00
Pascal Massimino
19f0ba0eb9
Implement true-motion prediction in SSE2
...
(along with DC/HE/VE for chroma/luma16)
Overall effect is ~1% faster decoding.
Change-Id: I90917e050d61874cbc8da0e88f26b5dd6131c265
2015-02-04 17:02:22 +01:00
Pascal Massimino
774d4cb758
make VP8PredLuma16[] array non-const
...
Change-Id: I0ce7e4e847f9fffefb6544db9636068442a2d264
2015-02-04 17:00:22 +01:00
Djordje Pesut
6ce296da12
MIPS: dspr2: Added optimization for function CollectHistogram
...
Change-Id: Id6b87ea1c9d21fee9494ad6c53ffc84ef60d5974
2015-02-03 14:11:20 +01:00
James Zern
b7de794622
Merge "lossless_neon: enable subtract green for aarch64"
2015-02-02 16:01:40 -08:00
Pascal Massimino
77724f70e9
SSE2 version of GradientUnfilter
...
roughly 1-2% faster decoder for lossy+alpha
Change-Id: Ib317e26e9fcb8d37af02668ffbfccc4664e659fe
2015-01-31 23:18:00 +01:00
James Zern
416e1cea9b
lossless_neon: enable subtract green for aarch64
...
similar to:
1ba61b0
enable NEON intrinsics in aarch64 builds
vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.
performance varies based on the input, 1-3% on encode was observed
Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-01-31 11:32:05 -08:00
Pascal Massimino
022d2f886c
add SSE2 variants for alpha filtering functions
...
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter()
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
2015-01-29 08:46:22 +01:00
Pascal Massimino
7afdaf8496
Alpha coding: reorganize the filter/unfiltering code
...
Move the filtering code to its own dsp/ spot
New function: VP8FiltersInit()
Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Djordje Pesut
da0912126b
MIPS: dspr2: Added optimization for function FTransformWHT
...
Change-Id: I918366cd1908304068c66da9965efb0aa63320cd
2015-01-19 10:15:13 +01:00
pascal massimino
daeb276a2b
Merge "MIPS: dspr2: Added optimization for MultARGBRow function"
2015-01-17 08:56:01 -08:00
James Zern
0de5f33e31
dsp/cpu: (msvs) add include for __cpuidex
...
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.
Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
2015-01-16 18:16:10 -08:00
Djordje Pesut
7d850f7b9a
MIPS: dspr2: Added optimization for MultARGBRow function
...
Change-Id: Ide549ae0d80413bea8c19fe091d97bffe8b17985
2015-01-16 15:56:34 +01:00
Djordje Pesut
5487529368
MIPS: dspr2: added optimization for function QuantizeBlock
...
Change-Id: Id217116890b7408d23464216608ce67ae545688a
2015-01-16 12:51:13 +01:00
James Zern
4fbe9cf202
dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
...
_xgetbv() isn't relevant there anyway
broken since:
279e661
Merge "dsp/cpu: add include for _xgetbv() w/MSVS"
Change-Id: Iaa7bc0c5be9c06bfffab39e194c64c09bf5b5a27
2015-01-15 23:04:08 -08:00
Pascal Massimino
3fd59039bd
simplify/reorganize arguments for CollectColorBlueTransforms
...
and other various call sites too.
Change-Id: Icb8f828dfe25672662de18d0e48e7d3144b1f38d
2015-01-15 18:12:12 -08:00
Djordje Pesut
a7e7caa486
MIPS: dspr2: added optimization for function TransformColorRed
...
added new function CollectColorRedTransforms to C, which calls
TransformColorRed and is invoked through a function pointer
Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
2015-01-15 09:32:09 +01:00
pascal massimino
2cb39180cc
Merge "MIPS: dspr2: added optimization for function TransformColorBlue"
2015-01-15 00:06:01 -08:00
pascal massimino
279e66138d
Merge "dsp/cpu: add include for _xgetbv() w/MSVS"
2015-01-15 00:05:35 -08:00
James Zern
b6c0428e8c
dsp/cpu: add include for _xgetbv() w/MSVS
...
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.
Change-Id: If70d5143ac94fc331da763ce034358858e460e06
2015-01-14 23:31:35 -08:00
Djordje Pesut
7b16197361
MIPS: dspr2: added optimization for function TransformColorBlue
...
added new function CollectColorBlueTransforms to C, which calls
TransformColorBlue and is invoked through a function pointer
Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced
2015-01-13 10:39:38 +01:00
James Zern
d7c4b02a57
cpu: fix AVX2 detection for gcc/clang targets
...
ecx needs to be set to 0; the visual studio builds were already doing
this.
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
Change-Id: I95efb115b4d50bbdb6b14fca2aa63d0a24974e55
2015-01-12 17:58:57 -08:00
Pascal Massimino
d581ba40ba
follow-up: clean up WebPRescalerXXX dsp function
...
by removing redundant RFIX macros and using a plain-C fallback.
Change-Id: I52436c672bf20780b6fe3bcf43fe73e1abac10ff
2015-01-12 15:26:55 -08:00
James Zern
f8740f0d6c
dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
...
for consistency with other defines shared across modules
Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
Pascal Massimino
ab66becaae
introduce a separate WebPRescalerDspInit to initialize pointers
...
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.
Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
pascal massimino
cbcdd5ffaf
Merge "move rescaler functions to rescaler* files in src/dsp/"
2015-01-10 05:41:45 -08:00
pascal massimino
bf586e8844
Merge changes I230b3532,Idf3057a7
...
* changes:
enable NEON for Windows ARM builds
Makefile.vc: add rudimentary Windows ARM support
2015-01-10 02:14:48 -08:00
James Zern
4f43d38ca8
enable NEON for Windows ARM builds
...
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
James Zern
e7c5954c10
dec_neon: remove returns from void functions
...
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
Djordje Pesut
cbcbedd0de
move rescaler functions to rescaler* files in src/dsp/
...
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
Djordje Pesut
a28c4b363d
MIPS: move WORK_AROUND_GCC define to appropriate place
...
Change-Id: I3055eca57dc4e9d39533a5b8170bbf7af9cd818f
2015-01-08 15:55:41 +01:00
Djordje Pesut
012d2c60fa
MIPS: dspr2: added optimization for functions SSEAxB
...
list of optimized functions: SSE16x16, SSE8x8, SSE16x8, SSE4x4
Change-Id: Ie99e7cdd73b0d4ff855977315a5d0db9ffaa5f04
2015-01-08 13:49:17 +01:00
Djordje Pesut
9241ecf45d
MIPS: dspr2: added optimization for function Average
...
Change-Id: I7ca316bc3f5fbdaf8dcaf9a2d2227a5134bf4f63
2015-01-08 11:46:15 +01:00
pascal massimino
c6d3292738
argb_sse2: cosmetics
...
clarify some variable names in PackARGB() + add some comments
Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
James Zern
67f601cd46
make the 'last_cpuinfo_used' variable names unique
...
allows the sources to be #include'd in some hackish builds (don't do
that!)
Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-01-07 23:38:53 -08:00
Pascal Massimino
9592053859
Merge "multi-thread fix: lock each entry point with a static var"
2015-01-07 00:03:51 -08:00
Pascal Massimino
4c1b300ada
Merge "SSE2 implementation of VP8PackARGB"
2015-01-06 23:53:50 -08:00
James Zern
04c20e75ea
Merge "MIPS: dspr2: added optimization for function Intra4Preds"
2015-01-06 16:15:10 -08:00
Pascal Massimino
a437694a17
multi-thread fix: lock each entry point with a static var
...
we compare the current VP8GetCPUInfo pointer to the last one used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
from separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
2015-01-05 07:48:49 -08:00
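The guard described above can be sketched like this. All names here (`CPUInfoFn`, `GetCPUInfo`, `init_count`, the two probe functions) are stand-ins for illustration, and the sketch assumes the probe pointer is never NULL. Note that this is a cheap re-entry guard built on a static variable, not a mutex: the expensive setup is skipped when the CPU-info pointer has not changed since the last call.

```c
#include <stddef.h>

/* Hypothetical signature standing in for the VP8GetCPUInfo pointer type. */
typedef int (*CPUInfoFn)(int feature);

static int DefaultCPUInfo(int feature) { (void)feature; return 0; }
static int OtherCPUInfo(int feature) { (void)feature; return 1; }

/* Stand-in for the global VP8GetCPUInfo; assumed non-NULL in this sketch. */
static CPUInfoFn GetCPUInfo = DefaultCPUInfo;

/* Remembers which probe the function pointers were last set up for. */
static volatile CPUInfoFn last_cpuinfo_used = NULL;

/* Counts how often the (simulated) setup actually ran. */
static int init_count = 0;

static void DspInit(void) {
  if (last_cpuinfo_used == GetCPUInfo) return;  /* already initialized */
  ++init_count;  /* stands in for installing the function pointers */
  last_cpuinfo_used = GetCPUInfo;
}
```

Swapping `GetCPUInfo` for another probe makes the next `DspInit()` call redo the setup, which is how each implementation stays testable on its own.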
Pascal Massimino
ca7f60db5f
SSE2 implementation of VP8PackARGB
...
Change-Id: I40c0e26a6a2701216e4ddebcf793aa535677f437
2015-01-05 05:17:51 -08:00
Pascal Massimino
72d573f693
simplify the PackARGB signature
...
Change-Id: I51570e362126b2681f93211a4f59a3fedb5fd4b5
2015-01-05 02:10:04 -08:00
James Zern
f8abb112f2
Merge changes I109ec4d9,I73fe7743
...
* changes:
dec_neon: add DC8uvNoTop / DC8uvNoLeft
dec_neon: add DC8uv
2014-12-23 09:11:22 -08:00
Djordje Pesut
ae2188a435
MIPS: dspr2: added optimization for function Intra4Preds
...
Change-Id: Ie2a23c356a8715817b020fbee2b40e878e2946de
2014-12-23 17:32:27 +01:00
James Zern
14108d7878
dec_neon: add DC8uvNoTop / DC8uvNoLeft
...
adds do_top/do_left flags to DC8uv; ~88% / ~92% faster respectively
no change in DC8uv speed.
Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
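A scalar sketch of what the do_top/do_left flags select; it is not the NEON code. The function name is made up, and the 0x80 result for the case with neither border is an assumption modeled on the usual plain-C fallback. With both borders, 16 samples are averaged; with one border, only 8.

```c
#include <stdint.h>

/* DC value for an 8x8 chroma block.
 * do_top / do_left select which borders contribute to the mean. */
static uint8_t DC8uvFlags(const uint8_t* top, const uint8_t* left,
                          int do_top, int do_left) {
  uint32_t sum = 0;
  int i;
  if (!do_top && !do_left) return 0x80;  /* no border: mid-gray (assumed) */
  if (do_top) {
    for (i = 0; i < 8; ++i) sum += top[i];
  }
  if (do_left) {
    for (i = 0; i < 8; ++i) sum += left[i];
  }
  if (do_top && do_left) return (uint8_t)((sum + 8) >> 4);  /* 16 samples */
  return (uint8_t)((sum + 4) >> 3);                         /* 8 samples */
}
```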
James Zern
d8340da756
dec_neon: add DC8uv
...
~87% faster.
Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00
Djordje Pesut
7ce8788b06
MIPS: dspr2: added optimization for function MakeARGB32
...
calls to the inline function MakeARGB32 are replaced by calls
through function pointers that build (a)rgb for an
entire row
Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
Pascal Massimino
87c3d53180
method=0: Don't evaluate any predictor
...
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.
For a 1000-image PNG corpus (m=0), this change yields a speedup of 25% in the
lower quality range and about 10% in the higher quality range.
Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
2014-12-17 18:41:08 +01:00
Pascal Massimino
31a9cf6417
Speed up WebP lossless compression for low effort (m=0) mode with the following:
...
- Disable Cross-Color transform.
- Evaluate predictors #11 (paeth), #12 and #13 only.
Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed
2014-12-17 11:52:11 +01:00
Djordje Pesut
9275d91c79
MIPS: dspr2: added optimization for function TrueMotion
...
Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56
2014-12-12 14:38:55 +01:00
James Zern
a3946b8956
enc_neon: fix building with non-Xcode clang (iOS)
...
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.
Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2014-12-10 15:50:26 -08:00
Pascal Massimino
8ed9c00d5e
Merge "simplify the Histogram struct, to only store max_value and last_nz"
2014-12-10 02:02:05 -08:00
Pascal Massimino
bad775715a
simplify the Histogram struct, to only store max_value and last_nz
...
we don't need to store the whole distribution in order to compute the alpha.
Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.
Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
Djordje Pesut
3cca0dc7f0
MIPS: dspr2: Added optimization for DCMode function
...
Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1
2014-12-09 13:58:39 +01:00
Djordje Pesut
37e395fd1c
MIPS: fix functions to use generic BPS instead of hardcoded value
...
Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd
2014-12-05 15:55:47 +01:00
Pascal Massimino
4a279a680e
cosmetics: add some missing != NULL comparisons
...
Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904
2014-12-04 14:54:11 +01:00
Pascal Massimino
66ad372500
factorize BPS definition in dsp.h and add VP8Copy16x8
...
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
Pascal Massimino
57606047ec
encoder: switch BPS to 32 instead of 16
...
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/
With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.
Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
Djordje Pesut
1b66bbe998
MIPS: dspr2: added optimization for function TransformColor_C
...
Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36
2014-12-01 15:46:06 +01:00
James Zern
9de9074c92
dec_neon: add TM8uv
...
~68% faster
reuses TM4() adding support for the additional rows, the columns were
already being done.
Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
e18571393d
dsp: initialize VP8PredChroma8 in VP8DspInit()
...
the table becomes non-const to allow for platform-specific optimizations
Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
Vikas Arora
e0c809ad23
Move Entropy methods to lossless.c
...
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's a slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge a few (static) methods and reduce the code size.
This change has no impact on the compression speed/density.
Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
2014-11-20 13:48:05 -08:00
Djordje Pesut
2f0e2ba826
MIPS: dspr2: added optimization for function Select
...
Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d
2014-11-07 09:44:16 +01:00
Djordje Pesut
54f2c14cce
MIPS: dspr2: added optimization for function FTransform
...
Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78
2014-11-06 14:21:33 +01:00
Djordje Pesut
aa42f4231f
MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed
...
Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918
2014-11-06 10:56:18 +01:00
Djordje Pesut
95ca44a718
MIPS: dspr2: added optimization for Disto4x4
...
enc/dec common macros moved to mips_macro.h
Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1
2014-11-05 12:06:15 +01:00
Djordje Pesut
5798eee6be
MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
...
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
James Zern
572022a350
filters_mips_dsp_r2.c: disable unfilters
...
the output does not match the C-code.
Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
Djordje Pesut
a28e21b141
MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
...
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8
MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
...
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0
MIPS: dspr2: added optimization for ITransform
...
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
James Zern
22881c999e
dec_neon: add RD4 intra predictor
...
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.
Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418
Merge "dec_neon: DC4: use pair-wise adds for top row"
2014-10-23 08:08:34 -07:00
James Zern
0db9031c79
dsp/dec_{neon,sse2}: VE4: normalize variable names
...
use '0' rather than '_' when dealing with variables that result from a
shift
Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00