we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
a separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage
Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
This flag will make the code use no uint64, no asm, and no fancy
trick, but instead aim at being as simple and straightforward as
possible.
Main use is to help emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are the typical timing for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
Defining LOCAL_ARM_NEON = true can result in neon instructions being
used in portions unprotected by the cpu check.
This changes defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to
the SSE2 code and MSVC.
Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd
- remove some unused functions
- move global arrays from data to read only section
- explicitly cast malloc returns; not specifically necessary, but helps
show intent
- miscellaneous formatting
Change-Id: Ib15fe5b37fe6c29c369ad928bdc3a7290cd13c84
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.
+ opportunistic clean-up:
* remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
* make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
* change OUT macro to DST