James Zern
c24d8f144f
cosmetics: upsampling_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: Ia6df6690c85f580b20f19ce85cc6ec7b52620aee
2015-02-05 23:51:57 -08:00
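
The const-preserving cast described in this series of cosmetic commits can be sketched as below. This is an illustrative example only, not code from libwebp: the helper name `AddRow16` is hypothetical, and it assumes the integer vector type `__m128i` with a plain-C fallback when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper (not from libwebp): adds 16 bytes of 'src' into 'dst',
// wrapping around modulo 256.
static void AddRow16(const uint8_t* src, uint8_t* dst) {
  assert(src != NULL && dst != NULL);
#if defined(__SSE2__)
  // 'src' is read-only, so the cast retains the const qualifier.
  const __m128i a = _mm_loadu_si128((const __m128i*)src);
  // 'dst' is only read here too, so a const cast is equally appropriate.
  const __m128i b = _mm_loadu_si128((const __m128i*)dst);
  // The store writes to 'dst', so this cast must be non-const.
  _mm_storeu_si128((__m128i*)dst, _mm_add_epi8(a, b));
#else
  // Plain-C fallback with the same result.
  for (int i = 0; i < 16; ++i) dst[i] = (uint8_t)(dst[i] + src[i]);
#endif
}
```

Casting a `const uint8_t*` to a non-const vector pointer silently discards the qualifier; keeping it lets the compiler flag accidental writes through the source pointer.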
James Zern
1829c42c58
cosmetics: lossless_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9
2015-02-05 23:51:44 -08:00
James Zern
183168f332
cosmetics: enc_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03
2015-02-05 23:51:29 -08:00
James Zern
860badcacc
cosmetics: dec_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I77decb55f1382bea4b646a11b77dfa40bf1ef94d
2015-02-05 23:51:16 -08:00
James Zern
0254db9793
cosmetics: argb_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I87fa2de11dafcc77767aab64e13b8c5585ebf5cd
2015-02-05 23:51:07 -08:00
James Zern
1aadf856c9
cosmetics: alpha_processing_sse2: add const to some casts
...
source pointers are often cast to __m128*, retain the const in those
cases
Change-Id: I00ba15af2f43d125ceb2620e82fd43d420fbb9d3
2015-02-05 23:50:39 -08:00
Pascal Massimino
19f0ba0eb9
Implement true-motion prediction in SSE2
...
(along with DC/HE/VE for chroma/luma16)
Overall effect is ~1% faster decoding.
Change-Id: I90917e050d61874cbc8da0e88f26b5dd6131c265
2015-02-04 17:02:22 +01:00
Pascal Massimino
774d4cb758
make VP8PredLuma16[] array non-const
...
Change-Id: I0ce7e4e847f9fffefb6544db9636068442a2d264
2015-02-04 17:00:22 +01:00
Djordje Pesut
6ce296da12
MIPS: dspr2: Added optimization for function CollectHistogram
...
Change-Id: Id6b87ea1c9d21fee9494ad6c53ffc84ef60d5974
2015-02-03 14:11:20 +01:00
James Zern
b7de794622
Merge "lossless_neon: enable subtract green for aarch64"
2015-02-02 16:01:40 -08:00
Pascal Massimino
77724f70e9
SSE2 version of GradientUnfilter
...
roughly 1-2% faster decoding for lossy+alpha
Change-Id: Ib317e26e9fcb8d37af02668ffbfccc4664e659fe
2015-01-31 23:18:00 +01:00
James Zern
416e1cea9b
lossless_neon: enable subtract green for aarch64
...
similar to:
1ba61b0
enable NEON intrinsics in aarch64 builds
vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.
performance varies based on the input, 1-3% on encode was observed
Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-01-31 11:32:05 -08:00
Pascal Massimino
022d2f886c
add SSE2 variants for alpha filtering functions
...
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter()
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
2015-01-29 08:46:22 +01:00
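
The asymmetry between the 'direct' and 'inverse' filters noted in this commit can be illustrated with a simplified single-row horizontal filter. This is a sketch under assumptions, not the libwebp implementation: the function names are hypothetical and the real code also handles strides and prediction from the previous row.

```c
#include <assert.h>
#include <stdint.h>

// Direct (filtering) pass: out[i] = in[i] - in[i-1].
// Each output depends only on input samples, so all iterations are
// independent and map naturally onto SIMD lanes.
static void HorizontalFilterRow(const uint8_t* in, uint8_t* out, int width) {
  assert(width > 0);
  out[0] = in[0];
  for (int i = 1; i < width; ++i) out[i] = (uint8_t)(in[i] - in[i - 1]);
}

// Inverse (unfiltering) pass: out[i] = in[i] + out[i-1].
// Each output depends on the previously reconstructed output, which creates
// a serial dependency chain across the row.
static void HorizontalUnfilterRow(const uint8_t* in, uint8_t* out, int width) {
  assert(width > 0);
  out[0] = in[0];
  for (int i = 1; i < width; ++i) out[i] = (uint8_t)(in[i] + out[i - 1]);
}
```

The inverse loop is effectively a prefix sum, which is why it is harder to vectorize than the direct one.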
Pascal Massimino
7afdaf8496
Alpha coding: reorganize the filter/unfiltering code
...
Move the filtering code to their own dsp/ spot
New function: VP8FiltersInit()
Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Djordje Pesut
da0912126b
MIPS: dspr2: Added optimization for function FTransformWHT
...
Change-Id: I918366cd1908304068c66da9965efb0aa63320cd
2015-01-19 10:15:13 +01:00
pascal massimino
daeb276a2b
Merge "MIPS: dspr2: Added optimization for MultARGBRow function"
2015-01-17 08:56:01 -08:00
James Zern
0de5f33e31
dsp/cpu: (msvs) add include for __cpuidex
...
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.
Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
2015-01-16 18:16:10 -08:00
Djordje Pesut
7d850f7b9a
MIPS: dspr2: Added optimization for MultARGBRow function
...
Change-Id: Ide549ae0d80413bea8c19fe091d97bffe8b17985
2015-01-16 15:56:34 +01:00
Djordje Pesut
5487529368
MIPS: dspr2: added optimization for function QuantizeBlock
...
Change-Id: Id217116890b7408d23464216608ce67ae545688a
2015-01-16 12:51:13 +01:00
James Zern
4fbe9cf202
dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
...
_xgetbv() isn't relevant there anyway
broken since:
279e661
Merge "dsp/cpu: add include for _xgetbv() w/MSVS"
Change-Id: Iaa7bc0c5be9c06bfffab39e194c64c09bf5b5a27
2015-01-15 23:04:08 -08:00
Pascal Massimino
3fd59039bd
simplify/reorganize arguments for CollectColorBlueTransforms
...
and other various call sites too.
Change-Id: Icb8f828dfe25672662de18d0e48e7d3144b1f38d
2015-01-15 18:12:12 -08:00
Djordje Pesut
a7e7caa486
MIPS: dspr2: added optimization for function TransformColorRed
...
added a new C function, CollectColorRedTransforms, which calls
TransformColorRed through a function pointer
Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
2015-01-15 09:32:09 +01:00
pascal massimino
2cb39180cc
Merge "MIPS: dspr2: added optimization for function TransformColorBlue"
2015-01-15 00:06:01 -08:00
pascal massimino
279e66138d
Merge "dsp/cpu: add include for _xgetbv() w/MSVS"
2015-01-15 00:05:35 -08:00
James Zern
b6c0428e8c
dsp/cpu: add include for _xgetbv() w/MSVS
...
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.
Change-Id: If70d5143ac94fc331da763ce034358858e460e06
2015-01-14 23:31:35 -08:00
Djordje Pesut
7b16197361
MIPS: dspr2: added optimization for function TransformColorBlue
...
added a new C function, CollectColorBlueTransforms, which calls
TransformColorBlue through a function pointer
Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced
2015-01-13 10:39:38 +01:00
James Zern
d7c4b02a57
cpu: fix AVX2 detection for gcc/clang targets
...
ecx needs to be set to 0; the visual studio builds were already doing
this.
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
Change-Id: I95efb115b4d50bbdb6b14fca2aa63d0a24974e55
2015-01-12 17:58:57 -08:00
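
The ecx requirement mentioned in this fix can be sketched as follows for gcc/clang on x86. This is an illustration rather than the code from dsp/cpu: the helper name `CpuidReportsAVX2` and the `SKETCH_HAVE_CPUID` macro are hypothetical, and the sketch only reads the CPUID feature bit. A complete check would also need to confirm OS support for the AVX state (OSXSAVE plus XGETBV), which is omitted here.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical helper: returns 1 if CPUID reports the AVX2 feature bit,
// 0 otherwise (including on non-x86 targets).
static int CpuidReportsAVX2(void) {
#if defined(SKETCH_HAVE_CPUID)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // Leaf 7 must be supported before it can be queried.
  if (__get_cpuid_max(0, 0) < 7) return 0;
  // Leaf 7 is indexed by ecx (the sub-leaf); it must be 0 here.
  // Leaving ecx uninitialized may select a different sub-leaf.
  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  (void)eax; (void)ecx; (void)edx;
  // AVX2 is reported in bit 5 of ebx.
  return (int)((ebx >> 5) & 1);
#else
  return 0;
#endif
}
```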
Pascal Massimino
d581ba40ba
follow-up: clean up WebPRescalerXXX dsp function
...
by removing redundant RFIX macros and using a plain-C fallback.
Change-Id: I52436c672bf20780b6fe3bcf43fe73e1abac10ff
2015-01-12 15:26:55 -08:00
James Zern
f8740f0d6c
dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
...
for consistency with other defines shared across modules
Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
Pascal Massimino
ab66becaae
introduce a separate WebPRescalerDspInit to initialize pointers
...
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.
Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
pascal massimino
cbcdd5ffaf
Merge "move rescaler functions to rescaler* files in src/dsp/"
2015-01-10 05:41:45 -08:00
pascal massimino
bf586e8844
Merge changes I230b3532,Idf3057a7
...
* changes:
enable NEON for Windows ARM builds
Makefile.vc: add rudimentary Windows ARM support
2015-01-10 02:14:48 -08:00
James Zern
4f43d38ca8
enable NEON for Windows ARM builds
...
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
James Zern
e7c5954c10
dec_neon: remove returns from void functions
...
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
Djordje Pesut
cbcbedd0de
move rescaler functions to rescaler* files in src/dsp/
...
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
Djordje Pesut
a28c4b363d
MIPS: move WORK_AROUND_GCC define to appropriate place
...
Change-Id: I3055eca57dc4e9d39533a5b8170bbf7af9cd818f
2015-01-08 15:55:41 +01:00
Djordje Pesut
012d2c60fa
MIPS: dspr2: added optimization for functions SSEAxB
...
list of optimized functions: SSE16x16, SSE8x8, SSE16x8, SSE4x4
Change-Id: Ie99e7cdd73b0d4ff855977315a5d0db9ffaa5f04
2015-01-08 13:49:17 +01:00
Djordje Pesut
9241ecf45d
MIPS: dspr2: added optimization for function Average
...
Change-Id: I7ca316bc3f5fbdaf8dcaf9a2d2227a5134bf4f63
2015-01-08 11:46:15 +01:00
pascal massimino
c6d3292738
argb_sse2: cosmetics
...
clarify some variable names in PackARGB() + add some comments
Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
James Zern
67f601cd46
make the 'last_cpuinfo_used' variable names unique
...
allows the sources to be #include'd in some hackish builds (don't do
that!)
Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-01-07 23:38:53 -08:00
Pascal Massimino
9592053859
Merge "multi-thread fix: lock each entry points with a static var"
2015-01-07 00:03:51 -08:00
Pascal Massimino
4c1b300ada
Merge "SSE2 implementation of VP8PackARGB"
2015-01-06 23:53:50 -08:00
James Zern
04c20e75ea
Merge "MIPS: dspr2: added optimization for function Intra4Preds"
2015-01-06 16:15:10 -08:00
Pascal Massimino
a437694a17
multi-thread fix: lock each entry points with a static var
...
we compare the current VP8GetCPUInfo pointer to the last one used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
in separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
2015-01-05 07:48:49 -08:00
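
The pointer-comparison guard described in this commit can be sketched as below. All names are hypothetical stand-ins (`CPUInfoFunc` for the CPU-info callback type, `GetCPUInfo` for VP8GetCPUInfo, `MyDspInit` for an init entry point), and the sketch assumes the callback is never NULL so that NULL can serve as the "not yet initialized" value. It is not a real lock: as the commit message notes, swapping the callback from separate threads is not covered.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for the CPU-info callback type.
typedef int (*CPUInfoFunc)(int feature);

static int DefaultCPUInfo(int feature) { (void)feature; return 1; }
static int NoSimdCPUInfo(int feature) { (void)feature; return 0; }

// Hypothetical stand-in for the global VP8GetCPUInfo pointer.
static CPUInfoFunc GetCPUInfo = DefaultCPUInfo;

// Counts how often the expensive setup actually ran (illustration only).
static int init_count = 0;

// One static variable per entry point remembers the callback last used.
static volatile CPUInfoFunc my_dsp_init_last_cpuinfo_used = NULL;

static void MyDspInit(void) {
  assert(GetCPUInfo != NULL);
  // Same callback as last time: the function pointers are already set up.
  if (my_dsp_init_last_cpuinfo_used == GetCPUInfo) return;
  ++init_count;  // ... select C or SIMD implementations here ...
  my_dsp_init_last_cpuinfo_used = GetCPUInfo;
}
```

Changing `GetCPUInfo` and calling `MyDspInit()` again re-runs the setup, which is what keeps each implementation testable separately.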
Pascal Massimino
ca7f60db5f
SSE2 implementation of VP8PackARGB
...
Change-Id: I40c0e26a6a2701216e4ddebcf793aa535677f437
2015-01-05 05:17:51 -08:00
Pascal Massimino
72d573f693
simplify the PackARGB signature
...
Change-Id: I51570e362126b2681f93211a4f59a3fedb5fd4b5
2015-01-05 02:10:04 -08:00
James Zern
f8abb112f2
Merge changes I109ec4d9,I73fe7743
...
* changes:
dec_neon: add DC8uvNoTop / DC8uvNoLeft
dec_neon: add DC8uv
2014-12-23 09:11:22 -08:00
Djordje Pesut
ae2188a435
MIPS: dspr2: added optimization for function Intra4Preds
...
Change-Id: Ie2a23c356a8715817b020fbee2b40e878e2946de
2014-12-23 17:32:27 +01:00
James Zern
14108d7878
dec_neon: add DC8uvNoTop / DC8uvNoLeft
...
adds do_top/do_left flags to DC8uv; ~88% / ~92% faster respectively
no change in DC8uv speed.
Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
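
The do_top/do_left idea from this commit can be shown in plain C. This is a sketch under assumptions, not the NEON code: the function name `DC8uvGeneric` and the explicit `stride` parameter are hypothetical, and it assumes the DC value is the rounded average of whichever neighbouring samples are available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical plain-C DC predictor for an 8x8 chroma block.
// 'dst' points at the block's top-left sample; the row above starts at
// dst - stride and the left column is at dst[j * stride - 1].
static void DC8uvGeneric(uint8_t* dst, int stride, int do_top, int do_left) {
  assert(do_top || do_left);
  int sum = 0;
  int count = 0;
  if (do_top) {
    for (int i = 0; i < 8; ++i) sum += dst[i - stride];
    count += 8;
  }
  if (do_left) {
    for (int j = 0; j < 8; ++j) sum += dst[j * stride - 1];
    count += 8;
  }
  // Rounded average over the 8 or 16 samples that were summed.
  const uint8_t dc = (uint8_t)((sum + count / 2) / count);
  for (int j = 0; j < 8; ++j) memset(dst + j * stride, dc, 8);
}
```

With this shape, the NoTop and NoLeft variants are just calls with one flag cleared, so a single routine can serve all three predictors.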
James Zern
d8340da756
dec_neon: add DC8uv
...
~87% faster.
Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00