global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)
Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV
Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.
ConvertRowToY() should go into dsp/ too, at some point.
Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
also switch to using ExtractAlpha() instead of hard-coding the loop.
The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.
Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
a separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
The C-version gets ~7-8% slower in order to match the SSE2
output exactly. The old (now off-by-1) code is kept under
the WEBP_YUV_USE_TABLE flag for reference.
(note that calc rounding precision is slightly better ~= +0.02dB)
on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking
the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610)
Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
Reported to eventually be 4% on ARM
(see https://code.google.com/p/webp/issues/detail?id=134 for details)
We might activate it selectively later...
Output values is not bitwise the same as the LUT-based
version, but difference is only +/-1 at max.
Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07
This is mostly for experimentation!
Need to define USE_YUVj flag in the code for that.
suggested by benwreder at hotmail dot com
Change-Id: If0b8e2c1863efc08ce097de6de20f4c7efc3f7e8
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.
+ opportunistic clean-up:
* remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
* make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
* change OUT macro to DST