Regroup common SSE code + optimization.

The transpose refactoring will help remove a transpose in a
later CL.

The horizontal add function helps remove a _mm_sad_epu8 in DC8uv:
the latency/throughput went from 29/25 to 23/19.
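
To illustrate the kind of helper the message refers to, here is a minimal sketch of a horizontal add over 32-bit lanes and a byte sum that does not use _mm_sad_epu8. The names HorizontalAdd32 and SumBytes16 are hypothetical and the code is not taken from this commit; it only assumes SSE2 and shows one possible way to build such a reduction.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns the sum of the four 32-bit lanes of 'v'. */
static int HorizontalAdd32(__m128i v) {
  /* Swap the two 64-bit halves and add: lanes now hold {a+c, b+d, a+c, b+d}. */
  const __m128i swap64 = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
  const __m128i sum2 = _mm_add_epi32(v, swap64);
  /* Swap adjacent 32-bit lanes and add: every lane holds a+b+c+d. */
  const __m128i swap32 = _mm_shuffle_epi32(sum2, _MM_SHUFFLE(2, 3, 0, 1));
  const __m128i sum1 = _mm_add_epi32(sum2, swap32);
  return _mm_cvtsi128_si32(sum1);
}

/* Hypothetical example: sums 16 unsigned bytes without _mm_sad_epu8. */
static int SumBytes16(const uint8_t* src) {
  const __m128i zero = _mm_setzero_si128();
  const __m128i ones = _mm_set1_epi16(1);
  const __m128i in = _mm_loadu_si128((const __m128i*)src);
  /* Widen the bytes to 16 bits; each lane of the sum is at most 510. */
  const __m128i lo = _mm_unpacklo_epi8(in, zero);
  const __m128i hi = _mm_unpackhi_epi8(in, zero);
  const __m128i sum16 = _mm_add_epi16(lo, hi);
  /* Multiplying by 1 and adding pairs yields four 32-bit partial sums. */
  const __m128i sum32 = _mm_madd_epi16(sum16, ones);
  return HorizontalAdd32(sum32);
}
```

Whether a reduction like this beats _mm_sad_epu8 depends on the surrounding code and the target CPU, so the figures quoted above apply only to the actual change in DC8uv.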

Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
Vincent Rabaud
2016-02-15 15:17:11 +01:00
parent 4ed650a13d
commit bf2b4f114f
5 changed files with 126 additions and 207 deletions

@@ -12,6 +12,7 @@ commondir = $(includedir)/webp
COMMON_SOURCES =
COMMON_SOURCES += alpha_processing.c
COMMON_SOURCES += alpha_processing_mips_dsp_r2.c
COMMON_SOURCES += common_sse2.h
COMMON_SOURCES += cpu.c
COMMON_SOURCES += dec.c
COMMON_SOURCES += dec_clip_tables.c