Commit Graph

457 Commits

Author SHA1 Message Date
James Zern
c24d8f144f cosmetics: upsampling_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: Ia6df6690c85f580b20f19ce85cc6ec7b52620aee
2015-02-05 23:51:57 -08:00
James Zern
1829c42c58 cosmetics: lossless_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9
2015-02-05 23:51:44 -08:00
James Zern
183168f332 cosmetics: enc_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03
2015-02-05 23:51:29 -08:00
James Zern
860badcacc cosmetics: dec_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I77decb55f1382bea4b646a11b77dfa40bf1ef94d
2015-02-05 23:51:16 -08:00
James Zern
0254db9793 cosmetics: argb_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I87fa2de11dafcc77767aab64e13b8c5585ebf5cd
2015-02-05 23:51:07 -08:00
James Zern
1aadf856c9 cosmetics: alpha_processing_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I00ba15af2f43d125ceb2620e82fd43d420fbb9d3
2015-02-05 23:50:39 -08:00
Pascal Massimino
19f0ba0eb9 Implement true-motion prediction in SSE2
(along with DC/HE/VE for chroma/luma16)

Overall effect is ~1% faster decoding.

Change-Id: I90917e050d61874cbc8da0e88f26b5dd6131c265
2015-02-04 17:02:22 +01:00
Pascal Massimino
774d4cb758 make VP8PredLuma16[] array non-const
Change-Id: I0ce7e4e847f9fffefb6544db9636068442a2d264
2015-02-04 17:00:22 +01:00
Djordje Pesut
6ce296da12 MIPS: dspr2: Added optimization for function CollectHistogram
Change-Id: Id6b87ea1c9d21fee9494ad6c53ffc84ef60d5974
2015-02-03 14:11:20 +01:00
James Zern
b7de794622 Merge "lossless_neon: enable subtract green for aarch64" 2015-02-02 16:01:40 -08:00
Pascal Massimino
77724f70e9 SSE2 version of GradientUnfilter
somewhat 1-2% faster decoder for lossy+alpha

Change-Id: Ib317e26e9fcb8d37af02668ffbfccc4664e659fe
2015-01-31 23:18:00 +01:00
James Zern
416e1cea9b lossless_neon: enable subtract green for aarch64
similar to:
1ba61b0 enable NEON intrinsics in aarch64 builds

vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.

performance varies based on the input, 1-3% on encode was observed

Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-01-31 11:32:05 -08:00
Pascal Massimino
022d2f886c add SSE2 variants for alpha filtering functions
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.

The heavy bottleneck left for optimization is still GradientUnfilter()

Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
2015-01-29 08:46:22 +01:00
Pascal Massimino
7afdaf8496 Alpha coding: reorganize the filter/unfiltering code
Move the filtering code to their own dsp/ spot
New function: VP8FiltersInit()

Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Djordje Pesut
da0912126b MIPS: dspr2: Added optimization for function FTransformWHT
Change-Id: I918366cd1908304068c66da9965efb0aa63320cd
2015-01-19 10:15:13 +01:00
pascal massimino
daeb276a2b Merge "MIPS: dspr2: Added optimization for MultARGBRow function" 2015-01-17 08:56:01 -08:00
James Zern
0de5f33e31 dsp/cpu: (msvs) add include for __cpuidex
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.

Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
2015-01-16 18:16:10 -08:00
Djordje Pesut
7d850f7b9a MIPS: dspr2: Added optimization for MultARGBRow function
Change-Id: Ide549ae0d80413bea8c19fe091d97bffe8b17985
2015-01-16 15:56:34 +01:00
Djordje Pesut
5487529368 MIPS: dspr2: added optimization for function QuantizeBlock
Change-Id: Id217116890b7408d23464216608ce67ae545688a
2015-01-16 12:51:13 +01:00
James Zern
4fbe9cf202 dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
_xgetgv() isn't relevant there anyway

broken since:
279e661 Merge "dsp/cpu: add include for _xgetbv() w/MSVS"

Change-Id: Iaa7bc0c5be9c06bfffab39e194c64c09bf5b5a27
2015-01-15 23:04:08 -08:00
Pascal Massimino
3fd59039bd simplify/reorganize arguments for CollectColorBlueTransforms
and other various call sites too.

Change-Id: Icb8f828dfe25672662de18d0e48e7d3144b1f38d
2015-01-15 18:12:12 -08:00
Djordje Pesut
a7e7caa486 MIPS: dspr2: added optimization for function TransformColorRed
added new function CollectColorRedTransforms to C, which calls
TransformColorRed and it is realized via pointer to function

Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
2015-01-15 09:32:09 +01:00
pascal massimino
2cb39180cc Merge "MIPS: dspr2: added optimization for function TransformColorBlue" 2015-01-15 00:06:01 -08:00
pascal massimino
279e66138d Merge "dsp/cpu: add include for _xgetbv() w/MSVS" 2015-01-15 00:05:35 -08:00
James Zern
b6c0428e8c dsp/cpu: add include for _xgetbv() w/MSVS
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.

Change-Id: If70d5143ac94fc331da763ce034358858e460e06
2015-01-14 23:31:35 -08:00
Djordje Pesut
7b16197361 MIPS: dspr2: added optimization for function TransformColorBlue
added new function CollectColorBlueTransforms to C, which calls
TransformColorBlue and it is realized via pointer to function

Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced
2015-01-13 10:39:38 +01:00
James Zern
d7c4b02a57 cpu: fix AVX2 detection for gcc/clang targets
ecx needs to be set to 0; the visual studio builds were already doing
this.

https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family

Change-Id: I95efb115b4d50bbdb6b14fca2aa63d0a24974e55
2015-01-12 17:58:57 -08:00
Pascal Massimino
d581ba40ba follow-up: clean up WebPRescalerXXX dsp function
by removing redundant RFIX macros and using a plain-C fallback.

Change-Id: I52436c672bf20780b6fe3bcf43fe73e1abac10ff
2015-01-12 15:26:55 -08:00
James Zern
f8740f0d6c dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
for consistency with other defines shared across modules

Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
Pascal Massimino
ab66becaae introduce a separate WebPRescalerDspInit to initialize pointers
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.

Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
pascal massimino
cbcdd5ffaf Merge "move rescaler functions to rescaler* files in src/dsp/" 2015-01-10 05:41:45 -08:00
pascal massimino
bf586e8844 Merge changes I230b3532,Idf3057a7
* changes:
  enable NEON for Windows ARM builds
  Makefile.vc: add rudimentary Windows ARM support
2015-01-10 02:14:48 -08:00
James Zern
4f43d38ca8 enable NEON for Windows ARM builds
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
James Zern
e7c5954c10 dec_neon: remove returns from void functions
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
Djordje Pesut
cbcbedd0de move rescaler functions to rescaler* files in src/dsp/
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
Djordje Pesut
a28c4b363d MIPS: move WORK_AROUND_GCC define to appropriate place
Change-Id: I3055eca57dc4e9d39533a5b8170bbf7af9cd818f
2015-01-08 15:55:41 +01:00
Djordje Pesut
012d2c60fa MIPS: dspr2: added optimization for functions SSEAxB
list of optimized functions: SSE16x16, SSE8x8, SSE16x8, SSE4x4

Change-Id: Ie99e7cdd73b0d4ff855977315a5d0db9ffaa5f04
2015-01-08 13:49:17 +01:00
Djordje Pesut
9241ecf45d MIPS: dspr2: added optimization for function Average
Change-Id: I7ca316bc3f5fbdaf8dcaf9a2d2227a5134bf4f63
2015-01-08 11:46:15 +01:00
pascal massimino
c6d3292738 argb_sse2: cosmetics
clarify some variable names in PackARGB() + add some comments

Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
James Zern
67f601cd46 make the 'last_cpuinfo_used' variable names unique
allows the sources to be #include'd in some hackish builds (don't do
that!)

Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-01-07 23:38:53 -08:00
Pascal Massimino
9592053859 Merge "multi-thread fix: lock each entry points with a static var" 2015-01-07 00:03:51 -08:00
Pascal Massimino
4c1b300ada Merge "SSE2 implementation of VP8PackARGB" 2015-01-06 23:53:50 -08:00
James Zern
04c20e75ea Merge "MIPS: dspr2: added optimization for function Intra4Preds" 2015-01-06 16:15:10 -08:00
Pascal Massimino
a437694a17 multi-thread fix: lock each entry points with a static var
we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
a separate threads!)

Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
2015-01-05 07:48:49 -08:00
Pascal Massimino
ca7f60db5f SSE2 implementation of VP8PackARGB
Change-Id: I40c0e26a6a2701216e4ddebcf793aa535677f437
2015-01-05 05:17:51 -08:00
Pascal Massimino
72d573f693 simplify the PackARGB signature
Change-Id: I51570e362126b2681f93211a4f59a3fedb5fd4b5
2015-01-05 02:10:04 -08:00
James Zern
f8abb112f2 Merge changes I109ec4d9,I73fe7743
* changes:
  dec_neon: add DC8uvNoTop / DC8uvNoLeft
  dec_neon: add DC8uv
2014-12-23 09:11:22 -08:00
Djordje Pesut
ae2188a435 MIPS: dspr2: added optimization for function Intra4Preds
Change-Id: Ie2a23c356a8715817b020fbee2b40e878e2946de
2014-12-23 17:32:27 +01:00
James Zern
14108d7878 dec_neon: add DC8uvNoTop / DC8uvNoLeft
adds do_top/do_left flags to DC8uv; ~88% / ~92% faster respectively
no change in DC8uv speed.

Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
James Zern
d8340da756 dec_neon: add DC8uv
~87% faster.

Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00
Djordje Pesut
7ce8788b06 MIPS: dspr2: added optimization for function MakeARGB32
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row

Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
Pascal Massimino
87c3d53180 method=0: Don't evaluate any predictor
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.

For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality
range and about 10% for higher quality range.

Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
2014-12-17 18:41:08 +01:00
Pascal Massimino
31a9cf6417 Speedup WebP lossless compression for low effort (m=0) mode with following:
- Disable Cross-Color transform.
- Evaluate predictors #11 (paeth), #12 and #13 only.

Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed
2014-12-17 11:52:11 +01:00
Djordje Pesut
9275d91c79 MIPS: dspr2: added optimization for function TrueMotion
Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56
2014-12-12 14:38:55 +01:00
James Zern
a3946b8956 enc_neon: fix building with non-Xcode clang (iOS)
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.

Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2014-12-10 15:50:26 -08:00
Pascal Massimino
8ed9c00d5e Merge "simplify the Histogram struct, to only store max_value and last_nz" 2014-12-10 02:02:05 -08:00
Pascal Massimino
bad775715a simplify the Histogram struct, to only store max_value and last_nz
we don't need to store the whole distribution in order to compute the alpha

Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.

Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
Djordje Pesut
3cca0dc7f0 MIPS: dspr2: Added optimization for DCMode function
Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1
2014-12-09 13:58:39 +01:00
Djordje Pesut
37e395fd1c MIPS: fix functions to use generic BPS istead of hardcoded value
Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd
2014-12-05 15:55:47 +01:00
Pascal Massimino
4a279a680e cosmetics: add some missing != NULL comparisons
Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904
2014-12-04 14:54:11 +01:00
Pascal Massimino
66ad372500 factorize BPS definition in dsp.h and add VP8Copy16x8
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
Pascal Massimino
57606047ec encoder: switch BPS to 32 instead of 16
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/

With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.

Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
Djordje Pesut
1b66bbe998 MIPS: dspr2: added optimization for function TransformColor_C
Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36
2014-12-01 15:46:06 +01:00
James Zern
9de9074c92 dec_neon: add TM8uv
~68% faster

reuses TM4() adding support for the additional rows, the columns were
already being done.

Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
e18571393d dsp: initialize VP8PredChroma8 in VP8DspInit()
the table becomes non-const to allow for platform-specific optimizations

Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
Vikas Arora
e0c809ad23 Move Entropy methods to lossless.c
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge few (static) methods and reduce the code size.

This change has no impact on the compression speed/density.

Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
2014-11-20 13:48:05 -08:00
Djordje Pesut
2f0e2ba826 MIPS: dspr2: added optimization for function Select
Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d
2014-11-07 09:44:16 +01:00
Djordje Pesut
54f2c14cce MIPS: dspr2: added optimization for function FTransform
Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78
2014-11-06 14:21:33 +01:00
Djordje Pesut
aa42f4231f MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed
Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918
2014-11-06 10:56:18 +01:00
Djordje Pesut
95ca44a718 MIPS: dspr2: added optimization for Disto4x4
enc/dec common macros moved to mips_macro.h

Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1
2014-11-05 12:06:15 +01:00
Djordje Pesut
5798eee6be MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
James Zern
572022a350 filters_mips_dsp_r2.c: disable unfilters
the output does not match the C-code.

Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
Djordje Pesut
a28e21b141 MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8 MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0 MIPS: dspr2: added optimization for ITransform
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
James Zern
22881c999e dec_neon: add RD4 intra predictor
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.

Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418 Merge "dec_neon: DC4: use pair-wise adds for top row" 2014-10-23 08:08:34 -07:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b dec_neon: DC4: use pair-wise adds for top row
reduces load count, slightly faster

Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
James Zern
eba6ce06c3 dec_neon: add DC4 intra predictor
~70% faster

Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
James Zern
79abfbd9df dec_neon: add TM4 intra predictor
~21% faster

Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d dec_neon: add LD4 intra predictor
based on SSE2 version, ~55% faster

Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca dec_neon: add VE4 intra predictor
based on SSE2 version, ~59% faster

Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
Pascal Massimino
b7a33d7e91 implement VE4/HE4/RD4/... in SSE2
(30% faster prediction functions, but overall speed-up is ~1% only)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
Pascal Massimino
97c76f1f30 make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
also convert 'type *dst' to 'type* dst'

Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
James Zern
f85ec712b0 PrintReg: output to stderr
allows use of '-o -' while testing

Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191
2014-10-22 17:28:19 +02:00
James Zern
d1c359ef29 fix shared object build with -fvisibility=hidden
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples

Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
0ce27e715e enc_mips32: workaround gcc-4.9 bug
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9

In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault

Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
2014-10-15 19:14:04 +02:00
pascal massimino
32f67e309f Merge "enc_neon: initialize vectors w/vdup_n_u32" 2014-10-09 12:23:18 -07:00
Pascal Massimino
fabc65da32 1-3% faster encoding optimizing SSE_NxN functions
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.

quality | size(bytes) after/before | time (ms) after/before

Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
2014-10-09 07:20:00 -07:00
James Zern
7534d71640 enc_neon: initialize vectors w/vdup_n_u32
replaces {} initialization gnu-ism

Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec
2014-10-09 12:35:41 +02:00
Pascal Massimino
2d9b0a4472 add WebPDispatchAlphaToGreen() to dsp
SSE2 version is 2.1x faster

This is used to transfer the alpha plane to green channel before lossless compression.

Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
Yang Zhang
ab70794ddb rewrite Disto4x4 in enc_neon.c with intrinsic
Performance test:
Platform: A9
Input data: bryce.yuv  11158x2156
performance of assembly is the base. Less ratio is better.
|toolchain |assembly |intrinsic |
|gcc4.6    |100%     |97.15%    |
|gcc4.8    |100%     |95.51     |

Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879
2014-09-23 18:28:36 -07:00
Djordje Pesut
d4471637ef MIPS: dspr2: added optimization for function FilterLoop24
affected functions: VFilter16i, HFilter16i, VFilter8i and HFilter8i

Change-Id: I5d2bc7716e60e048a33d630fe4a86011bfb6d42e
2014-09-23 10:32:55 +02:00
Djordje Pesut
49e15044ef MIPS: dspr2: added optimization for function FilterLoop26
affected functions: VFilter16, HFilter16, VFilter8 and HFilter8

Change-Id: Ib2fc41aaa00b10c2906d689bdc5a10f4568e70a8
2014-09-23 08:46:05 +02:00
Pascal Massimino
cddd334050 Add a WebPExtractAlpha function to dsp
This is the opposite of WebPDispatchAlpha

+ Implement the SSE2 version

Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c
2014-09-15 08:12:03 +02:00
Pascal Massimino
690b491af1 fix loop bug in DispatchAlpha()
* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff

These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.

Now, the SSE2 version is really 2x faster than the C version.

Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
2014-09-11 22:35:08 +02:00
Djordje Pesut
3101f53720 MIPS: dspr2: added optimization for TransformOne
added macros for TransformOne, TransformAC3 and TransfromDC

Change-Id: I4341450f443cf46dcf91c0db17bde63c8fb8afee
2014-09-11 17:02:02 +02:00