check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.
Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
we don't need to store the whole distribution in order to compute the alpha
Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.
Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
CollectHistogram / SSE* / QuantizeBlock have no inline equivalents,
enable them where possible and use USE_INTRINSICS to control borderline
cases: it's left undefined for now.
Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()
(also: fixed some #ifdef USE_INTRINSICS problems)
Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7
slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.
Change-Id: I69534016186064fd92476d5eabc0f53462d53146
* inverse transform is actually slower with intrinsics + gcc-4.6,
so is left disabled for now.
With gcc-4.8, it's a bit faster than inlined assembly.
* Sum of Square error function provide a 2-3% speed up
There's enabled by default (since there's no inlined-asm equivalent)
Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.
also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )
Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
Contributed by Wayne Chen (datoudatou at gmail dot com)
+ some header cleanup
+ remove the NEON suffix in static functions
Change-Id: I75bf5e9b54cf5e1acc53764c6f081d61690f8e3d
(implements the backward and forward transforms in the encoder)
original patch by Wayne Chen (datoudatou at gmail dot com)
Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57