add some colorspace conversion functions in NEON

new file: lossless_neon.c
speedup is ~5%
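
For illustration only, a conversion of this kind might look like the sketch below. It is not the code from lossless_neon.c: the `Sketch*` names and the byte-oriented signature are assumptions made for the example, and the NEON path is compiled only when the compiler advertises NEON support. The vector loop handles 8 pixels per iteration and leaves the remainder to the plain-C routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1
#endif

// Plain-C reference (hypothetical name): src holds B,G,R,A bytes per pixel,
// dst receives R,G,B bytes per pixel.
static void SketchConvertBGRAToRGB_C(const uint8_t* src, size_t num_pixels,
                                     uint8_t* dst) {
  size_t i;
  for (i = 0; i < num_pixels; ++i) {
    dst[3 * i + 0] = src[4 * i + 2];  // R
    dst[3 * i + 1] = src[4 * i + 1];  // G
    dst[3 * i + 2] = src[4 * i + 0];  // B
  }
}

#if defined(SKETCH_USE_NEON)
// NEON variant (hypothetical name): 8 pixels per iteration.
static void SketchConvertBGRAToRGB_NEON(const uint8_t* src, size_t num_pixels,
                                        uint8_t* dst) {
  size_t i = 0;
  for (; i + 8 <= num_pixels; i += 8) {
    // De-interleave 32 bytes into four 8-lane vectors: B, G, R, A.
    const uint8x8x4_t bgra = vld4_u8(src + 4 * i);
    uint8x8x3_t rgb;
    rgb.val[0] = bgra.val[2];  // R
    rgb.val[1] = bgra.val[1];  // G
    rgb.val[2] = bgra.val[0];  // B
    // Re-interleave as R,G,B and store 24 bytes.
    vst3_u8(dst + 3 * i, rgb);
  }
  // Left-over pixels (fewer than 8) go through the C routine.
  SketchConvertBGRAToRGB_C(src + 4 * i, num_pixels - i, dst + 3 * i);
}
#endif

// Entry point for the sketch: picks the NEON path at compile time if present.
static void SketchConvertBGRAToRGB(const uint8_t* src, size_t num_pixels,
                                   uint8_t* dst) {
  assert(src != NULL && dst != NULL);
#if defined(SKETCH_USE_NEON)
  SketchConvertBGRAToRGB_NEON(src, num_pixels, dst);
#else
  SketchConvertBGRAToRGB_C(src, num_pixels, dst);
#endif
}
```

Keeping the C routine as the tail handler means both paths produce identical output for any pixel count, which makes the vector version easy to check against the reference.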

gcc 4.6.3 seems to generate sub-optimal code here,
spilling registers to the stack with 'vstmia' and the like.
This looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

I tried adding -fno-split-wide-types and it does improve
the generated assembly, but the overall speed gets worse with
this flag. We would have to compile only lossless_neon.c with it -> urk.

Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
skal
2014-03-31 16:36:33 +02:00
parent daccbf400d
commit 97e5fac389
6 changed files with 96 additions and 1 deletions


@@ -1475,6 +1475,7 @@ VP8LConvertFunc VP8LConvertBGRAToRGB565;
 VP8LConvertFunc VP8LConvertBGRAToBGR;
 extern void VP8LDspInitSSE2(void);
+extern void VP8LDspInitNEON(void);
 void VP8LDspInit(void) {
   memcpy(VP8LPredictors, kPredictorsC, sizeof(VP8LPredictors));
@@ -1494,6 +1495,11 @@ void VP8LDspInit(void) {
     if (VP8GetCPUInfo(kSSE2)) {
       VP8LDspInitSSE2();
     }
 #endif
+#if defined(WEBP_USE_NEON)
+    if (VP8GetCPUInfo(kNEON)) {
+      VP8LDspInitNEON();
+    }
+#endif
   }
 }
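
The hunk above follows a common runtime-dispatch pattern: function pointers start out at portable C implementations and are overwritten when the CPU reports a feature. A minimal standalone sketch of that pattern follows. Every `Sketch*` name is hypothetical, the "fast" routine is only a stand-in for a real NEON implementation, and the NULL check on the CPU-info hook is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical feature enum and hook types, modelled loosely on the diff.
typedef enum { kSketchSSE2, kSketchNEON } SketchCPUFeature;
typedef int (*SketchCPUInfo)(SketchCPUFeature feature);
typedef void (*SketchConvertFunc)(const uint8_t* src, size_t num_pixels,
                                  uint8_t* dst);

// Portable default: drop the alpha byte of each B,G,R,A pixel.
static void SketchConvertBGRAToBGR_C(const uint8_t* src, size_t num_pixels,
                                     uint8_t* dst) {
  size_t i;
  for (i = 0; i < num_pixels; ++i) {
    dst[3 * i + 0] = src[4 * i + 0];
    dst[3 * i + 1] = src[4 * i + 1];
    dst[3 * i + 2] = src[4 * i + 2];
  }
}

// Stand-in for an optimized routine; a real one would use NEON intrinsics.
// It must produce exactly the same output as the C version.
static void SketchConvertBGRAToBGR_Fast(const uint8_t* src, size_t num_pixels,
                                        uint8_t* dst) {
  size_t i;
  for (i = 0; i < num_pixels; ++i) {
    memcpy(dst + 3 * i, src + 4 * i, 3);
  }
}

// Dispatch table entry and CPU-detection hook (NULL means "no detection").
SketchConvertFunc SketchConvertBGRAToBGR = NULL;
SketchCPUInfo SketchGetCPUInfo = NULL;

// Fake detector for experiments: reports NEON as available, nothing else.
int SketchFakeCPUInfo(SketchCPUFeature feature) {
  return feature == kSketchNEON;
}

void SketchDspInit(void) {
  // Always install the portable version first.
  SketchConvertBGRAToBGR = SketchConvertBGRAToBGR_C;
  // Then overwrite with a faster version if the CPU reports support.
  if (SketchGetCPUInfo != NULL) {
    if (SketchGetCPUInfo(kSketchNEON)) {
      SketchConvertBGRAToBGR = SketchConvertBGRAToBGR_Fast;
    }
  }
}
```

Callers only ever go through the pointer, so adding another instruction set amounts to one more guarded block in the init function.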