add some colorspace conversion functions in NEON

new file: lossless_neon.c
speedup is ~5%

gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

I've tried adding  -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.

Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
This commit is contained in:
skal
2014-03-31 16:36:33 +02:00
parent daccbf400d
commit 97e5fac389
6 changed files with 96 additions and 1 deletions

View File

@ -174,6 +174,7 @@ DSP_DEC_OBJS = \
$(DIROBJ)\dsp\dec_neon.obj \
$(DIROBJ)\dsp\dec_sse2.obj \
$(DIROBJ)\dsp\lossless.obj \
$(DIROBJ)\dsp\lossless_neon.obj \
$(DIROBJ)\dsp\lossless_sse2.obj \
$(DIROBJ)\dsp\upsampling.obj \
$(DIROBJ)\dsp\upsampling_mips32.obj \