this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
when targeting NEON C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.
note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.
this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.
Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
This will allow to work in-place on cropped area later.
Also sped up the inverse gradient filtering in SSE2 (~4%)
Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter()
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9