Defining LOCAL_ARM_NEON = true can result in neon instructions being
used in portions unprotected by the cpu check.
This changes defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to
the SSE2 code and MSVC.
Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd
As with the loop filter, implementing this with intrinsics is difficult
because we require subscript access for reading and writing 32 bits at a
time.
Approximately 5% decode speed improvement. This could be increased by
exposing TransformOne and rewriting TransformTwo to only handle dual
IDCTs.
Change-Id: Idd409264ab5d154a537107a1d54b419a48f7c1a8