As with the loop filter, implementing this with intrinsics is difficult
because we require subscript access for reading and writing 32 bits at a
time.
Approximately 5% decode speed improvement. This could be increased by
exposing TransformOne and rewriting TransformTwo to only handle dual
IDCTs.
Change-Id: Idd409264ab5d154a537107a1d54b419a48f7c1a8
gcc-4.0 defines __PIC__ but not __pic__. This leaves the test for
__pic__ should the inverse case exist.
Fixes issue #103; build failing with:
"error: can't find a register in class 'BREG' while reloading 'asm'"
Change-Id: Ia767a733de6ce0294146f9477ff9c46f0ebe13b0
- Fix the off-by-one diff when cropping with simple-filter.
- Fix a bug in incremental decoding in case of alpha.
- In VP8FinishRow(), do not decode alpha when y_start > y_end.
- Correct output of alpha channel for MODE_ARGB.
- Correct output of alpha channel for MODE_RGBA_4444.
Change-Id: I785763a2a704b973cc742ad93ffbb53699d1fc0a
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.
+ opportunistic clean-up:
* remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
* make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
* change OUT macro to DST