move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
The predictors based on Average2 are tad slower.
Following is the performance data for these predictors normalized to
number of instruction cycles (as per valgrind) per operation:
- Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11
instruction cycles for the C version.
- Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12
instruction cycles for the C version.
The predictors based on Average4 is faster and Average3 is tad slower:
- Predictor10 (Average4) now takes 23 instruction cycles compared to 25
instruction cycles for the C version.
- Predictor5 (Average3) now takes 20 instruction cycles compared to 18
instruction cycles for the C version.
Maybe SSE2 version of Average2 can be improved further. Otherwise, we can
remove the SSE2 version and always fallback to the C version.
Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029
* merged the two HistogramAdd/AddEval() into a single call
(with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster
Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
When 4 pixels are left, they should be processed with SSE2.
Decoding is marginally faster (~0.4%).
Encoding speed: No observable difference.
Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e
expose the predictor array as function pointers instead
of each individual sub-function
+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"
no speed diff observed
Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044