Commit Graph

407 Commits

Author SHA1 Message Date
Djordje Pesut
7ce8788b06 MIPS: dspr2: added optimization for function MakeARGB32
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row

Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
Pascal Massimino
87c3d53180 method=0: Don't evaluate any predictor
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.

For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality
range and about 10% for higher quality range.

Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
2014-12-17 18:41:08 +01:00
Pascal Massimino
31a9cf6417 Speedup WebP lossless compression for low effort (m=0) mode with following:
- Disable Cross-Color transform.
- Evaluate predictors #11 (paeth), #12 and #13 only.

Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed
2014-12-17 11:52:11 +01:00
Djordje Pesut
9275d91c79 MIPS: dspr2: added optimization for function TrueMotion
Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56
2014-12-12 14:38:55 +01:00
James Zern
a3946b8956 enc_neon: fix building with non-Xcode clang (iOS)
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.

Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2014-12-10 15:50:26 -08:00
Pascal Massimino
8ed9c00d5e Merge "simplify the Histogram struct, to only store max_value and last_nz" 2014-12-10 02:02:05 -08:00
Pascal Massimino
bad775715a simplify the Histogram struct, to only store max_value and last_nz
we don't need to store the whole distribution in order to compute the alpha

Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.

Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
Djordje Pesut
3cca0dc7f0 MIPS: dspr2: Added optimization for DCMode function
Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1
2014-12-09 13:58:39 +01:00
Djordje Pesut
37e395fd1c MIPS: fix functions to use generic BPS istead of hardcoded value
Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd
2014-12-05 15:55:47 +01:00
Pascal Massimino
4a279a680e cosmetics: add some missing != NULL comparisons
Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904
2014-12-04 14:54:11 +01:00
Pascal Massimino
66ad372500 factorize BPS definition in dsp.h and add VP8Copy16x8
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
Pascal Massimino
57606047ec encoder: switch BPS to 32 instead of 16
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/

With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.

Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
Djordje Pesut
1b66bbe998 MIPS: dspr2: added optimization for function TransformColor_C
Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36
2014-12-01 15:46:06 +01:00
James Zern
9de9074c92 dec_neon: add TM8uv
~68% faster

reuses TM4() adding support for the additional rows, the columns were
already being done.

Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
e18571393d dsp: initialize VP8PredChroma8 in VP8DspInit()
the table becomes non-const to allow for platform-specific optimizations

Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
Vikas Arora
e0c809ad23 Move Entropy methods to lossless.c
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge few (static) methods and reduce the code size.

This change has no impact on the compression speed/density.

Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
2014-11-20 13:48:05 -08:00
Djordje Pesut
2f0e2ba826 MIPS: dspr2: added optimization for function Select
Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d
2014-11-07 09:44:16 +01:00
Djordje Pesut
54f2c14cce MIPS: dspr2: added optimization for function FTransform
Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78
2014-11-06 14:21:33 +01:00
Djordje Pesut
aa42f4231f MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed
Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918
2014-11-06 10:56:18 +01:00
Djordje Pesut
95ca44a718 MIPS: dspr2: added optimization for Disto4x4
enc/dec common macros moved to mips_macro.h

Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1
2014-11-05 12:06:15 +01:00
Djordje Pesut
5798eee6be MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
James Zern
572022a350 filters_mips_dsp_r2.c: disable unfilters
the output does not match the C-code.

Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
Djordje Pesut
a28e21b141 MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8 MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0 MIPS: dspr2: added optimization for ITransform
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
James Zern
22881c999e dec_neon: add RD4 intra predictor
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.

Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418 Merge "dec_neon: DC4: use pair-wise adds for top row" 2014-10-23 08:08:34 -07:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b dec_neon: DC4: use pair-wise adds for top row
reduces load count, slightly faster

Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
James Zern
eba6ce06c3 dec_neon: add DC4 intra predictor
~70% faster

Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
James Zern
79abfbd9df dec_neon: add TM4 intra predictor
~21% faster

Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d dec_neon: add LD4 intra predictor
based on SSE2 version, ~55% faster

Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca dec_neon: add VE4 intra predictor
based on SSE2 version, ~59% faster

Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
Pascal Massimino
b7a33d7e91 implement VE4/HE4/RD4/... in SSE2
(30% faster prediction functions, but overall speed-up is ~1% only)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
Pascal Massimino
97c76f1f30 make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
also convert 'type *dst' to 'type* dst'

Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
James Zern
f85ec712b0 PrintReg: output to stderr
allows use of '-o -' while testing

Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191
2014-10-22 17:28:19 +02:00
James Zern
d1c359ef29 fix shared object build with -fvisibility=hidden
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples

Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
0ce27e715e enc_mips32: workaround gcc-4.9 bug
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9

In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault

Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
2014-10-15 19:14:04 +02:00
pascal massimino
32f67e309f Merge "enc_neon: initialize vectors w/vdup_n_u32" 2014-10-09 12:23:18 -07:00
Pascal Massimino
fabc65da32 1-3% faster encoding optimizing SSE_NxN functions
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.

quality | size(bytes) after/before | time (ms) after/before

Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
2014-10-09 07:20:00 -07:00
James Zern
7534d71640 enc_neon: initialize vectors w/vdup_n_u32
replaces {} initialization gnu-ism

Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec
2014-10-09 12:35:41 +02:00
Pascal Massimino
2d9b0a4472 add WebPDispatchAlphaToGreen() to dsp
SSE2 version is 2.1x faster

This is used to transfer the alpha plane to green channel before lossless compression.

Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
Yang Zhang
ab70794ddb rewrite Disto4x4 in enc_neon.c with intrinsic
Performance test:
Platform: A9
Input data: bryce.yuv  11158x2156
performance of assembly is the base. Less ratio is better.
|toolchain |assembly |intrinsic |
|gcc4.6    |100%     |97.15%    |
|gcc4.8    |100%     |95.51     |

Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879
2014-09-23 18:28:36 -07:00
Djordje Pesut
d4471637ef MIPS: dspr2: added optimization for function FilterLoop24
affected functions: VFilter16i, HFilter16i, VFilter8i and HFilter8i

Change-Id: I5d2bc7716e60e048a33d630fe4a86011bfb6d42e
2014-09-23 10:32:55 +02:00
Djordje Pesut
49e15044ef MIPS: dspr2: added optimization for function FilterLoop26
affected functions: VFilter16, HFilter16, VFilter8 and HFilter8

Change-Id: Ib2fc41aaa00b10c2906d689bdc5a10f4568e70a8
2014-09-23 08:46:05 +02:00
Pascal Massimino
cddd334050 Add a WebPExtractAlpha function to dsp
This is the opposite of WebPDispatchAlpha

+ Implement the SSE2 version

Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c
2014-09-15 08:12:03 +02:00
Pascal Massimino
690b491af1 fix loop bug in DispatchAlpha()
* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff

These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.

Now, the SSE2 version is really 2x faster than the C version.

Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
2014-09-11 22:35:08 +02:00
Djordje Pesut
3101f53720 MIPS: dspr2: added optimization for TransformOne
added macros for TransformOne, TransformAC3 and TransfromDC

Change-Id: I4341450f443cf46dcf91c0db17bde63c8fb8afee
2014-09-11 17:02:02 +02:00
Pascal Massimino
a6bb9b17d8 SSE2 for inverse Mult(ARGB)Row and ApplyAlphaMultiply
Change-Id: Iab5c0e4a4d2b31f86736a9b277e62b6e28c3d2b4
WebPMultRow: ~7x faster
WebPMultARGBRow: ~3x faster
ApplyAlphaMultiply: 60% faster
2014-09-11 07:58:42 +02:00
Djordje Pesut
e2502a97c1 MIPS: dspr2: added optimization for TransformAC3
Change-Id: Icd789ee5f6d764297e7dc0a0f8a3bc47ab92ac65
2014-09-09 14:53:36 +02:00
Djordje Pesut
24e1072aac MIPS: dspr2: added optimization for TransformDC
Change-Id: Iee69758f6442ea9c80ddaa32cea8d00dda4c6252
2014-09-09 14:15:04 +02:00
Djordje Pesut
f0103595dd MIPS: dspr2: added optimization for ColorIndexInverseTransforms
Change-Id: I5b6094ce489d4f896bc4b8f575142eb3c5054beb
2014-09-08 17:22:59 +02:00
James Zern
637b388809 dsp/lossless: workaround gcc-4.9 bug on arm
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
https://android-review.googlesource.com/#/c/102511

Change-Id: I90ae58bf3e6cc92ca9897f69974733d562e29aaf
2014-08-27 20:31:21 -07:00
James Zern
8323a9038d dsp.h: collect gcc/clang version test macros
endian_inl.h already relies on dsp.h, grab the definitions from there.

Change-Id: I445f7d0631723043c55da1070498f89965bec7b1
2014-08-27 19:33:09 -07:00
skal
e6c4b52f28 move static initialization of WebPYUV444Converters[] to the Init function.
Split initialization of YUV444Converters[] out of Upsamplers init.

update test for NULL function pointers

Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
2014-08-27 11:36:37 -07:00
skal
f5c04d64b7 Merge "add a DispatchAlpha() for SSE2 that handles 8 pixels at a time" 2014-08-25 22:43:42 -07:00
skal
fc98edd936 add a DispatchAlpha() for SSE2 that handles 8 pixels at a time
Only slightly faster.

Change-Id: Ie2e57e6a0950166124cf1075c6c9b45b7abdad8c
2014-08-25 21:03:03 -07:00
skal
73d361dd5f introduce VP8EncQuantize2Blocks to quantize two blocks at a time
No speed diff for now. We might reorder better the instructions later,
to speed things up.

Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c
2014-08-25 20:21:42 -07:00
Djordje Pesut
0b21c30b1a MIPS: dspr2: added optimization for EmitAlphaRGB
New dsp function: WebPDispatchAlpha()

Change-Id: I48e539d22471279ec75185759bc68d18b127f716
2014-08-21 20:39:35 -07:00
James Zern
953acd56a4 enc_neon: enable QuantizeBlock for aarch64
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.

Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
2014-08-20 11:48:25 -07:00
Djordje Pesut
f4ae143720 MIPS: mips32: code rebase
mips code rebased to be same as C code
from commit I8c29a8a0285076cb3423b01ffae9fcc465da6a81

Change-Id: I3848f4ce43387c3a62b336606498779f7b07ec44
2014-08-19 15:13:16 +02:00
Djordje Pesut
569771549a MIPS: dspr2: added optimizations for VP8YuvTo*
VP8YuvToRgb
VP8YuvToBgr
VP8YuvToRgb565
VP8YuvToRgba4444
VP8YuvToArgb
VP8YuvToBgra
VP8YuvToRgba

Change-Id: I22212a125d890e1fd28388fec906a1a5c07ff386
2014-08-19 14:29:32 +02:00
James Zern
3fca851a20 cpu: check for _MSC_VER before using msvc inline asm
_M_IX86 will be defined in mingw builds after including windows.h. as
the gcc inline asm is first, this missing check would only have caused
an error if the code was reorganized.

Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
2014-08-15 15:11:40 -07:00
Djordje Pesut
b4dc4069a2 MIPS: dspr2: added optimization for (un)filters
HorizontalFilter
VerticalFilter
GradientFilter
HorizontalUnfilter
VerticalUnfilter
GradientUnfilter

Change-Id: I54055b4767c37719691811072e95bf79c1f627b1
2014-08-14 11:55:19 -07:00
Djordje Pesut
b61c9ceca8 MIPS: dspr2: Optimization of some simple point-sampling functions
Change-Id: I6a4ab29bd0cc5a2951a8882cf9997032dc38bd79
2014-08-13 17:18:49 +02:00
Djordje Pesut
98c54107df MIPS: mips32r2: added optimization for BSwap32
gcc < 4.8.3 doesn't translate bswap optimally.
use optimized version always

Change-Id: I979ea26ad6dc0166d3d2f39c4148eb8adfb7ddec
2014-08-12 09:29:13 +02:00
Djordje Pesut
b7e5a5c451 MIPS: detect mips32r6 and disable mips32r1 code
Change-Id: Id1325c789a990c9a8704e84e99a22d580303eb8a
2014-08-08 17:29:31 +02:00
pascal massimino
bb07022b66 Merge "cosmetics" 2014-08-06 12:30:08 -07:00
James Zern
e300c9d819 cosmetics
fix some indent/whitespace, remove a few duplicate includes, extra
semi-colons

Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352
2014-08-06 12:10:59 -07:00
James Zern
f7b4c48bba cosmetics: remove some extraneous 'extern's
Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d
2014-08-05 22:14:24 -07:00
James Zern
0524d9e5e8 dsp: detect mips64 & disable mips32 code
Change-Id: Icf68dafd5cf0614ca25b36a0252caa1784ac8059
2014-08-01 21:18:53 -07:00
skal
8f6f8c5dde remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
There's no speed diff, so better remove it altogether

Reported in https://code.google.com/p/webp/issues/detail?id=215

Change-Id: I991330de18bec340029d6df5fed0dfb4337e4662
2014-07-23 14:15:40 -07:00
James Zern
c76f07ecc2 dec_neon/TransformAC3: initialize vector w/vcreate
replaces {} initialization gnu-ism

Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
James Zern
380cca4f2c configure.ac: add AC_C_BIGENDIAN
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it

+ fixes lossless BGRA output with big-endian toolchains
  that do not define __BIG_ENDIAN__ (codesourcery mips gcc)

Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
2014-07-03 18:15:50 -07:00
James Zern
47779d46c8 endian_inl.h: add BSwap32
Change-Id: I96e3ae49659307024415d64587e6312888a0070f
2014-07-03 13:28:13 -07:00
James Zern
e59f53600f neon: normalize vdup_n_* usage
with constants, prefer this over vmov_n_* or vcreate_*

Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
bc03670f01 neon: add INIT_VECTOR4
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
6c1c632b03 neon: add INIT_VECTOR3
used to initialize NxMx3 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4
2014-06-30 23:53:23 -07:00
James Zern
dc7687e51b neon: add INIT_VECTOR2
used to initialize NxMx2 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139
2014-06-30 23:52:42 -07:00
Pascal Massimino
1f3e5f1e60 remove unused 'shift' argument and QFIX2 define
this will remove a warning about the shift amount not being
an immediate (=constant).

Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
2014-06-26 00:44:12 -07:00
levytamar82
27bfeee43a QuantizeBlock SSE2 Optimization:
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.

Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
2014-06-18 16:22:00 -07:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
James Zern
32b3137936 configure: move config.h to src/webp/config.h
this change has the side-effect of using directory names in the
include, silencing a lint warning.

Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
2014-06-10 23:42:00 -07:00
James Zern
90090d99b5 Merge changes I7c675e51,I84f7d785
* changes:
  configure: test for -msse2
  rename upsampling_mips32.c to yuv_mips32.c
2014-06-10 16:15:21 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
James Zern
6e61a3a905 configure: test for -msse2
+ add a WEBP_HAVE_SSE2 to dsp.h

not all 32-bit toolchain configurations will have sse2 enabled by
default

Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
2014-06-07 19:44:08 -07:00
James Zern
b9d2efc629 rename upsampling_mips32.c to yuv_mips32.c
matches yuv_sse2 added in;
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c

Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3
2014-06-07 12:35:47 -07:00
James Zern
bdfeebaa01 dsp/yuv: move sse2 functions to yuv_sse2.c
Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000
2014-06-05 23:52:54 -07:00
pascal massimino
46b32e861a Merge "configure: set WEBP_HAVE_AVX2 when available" 2014-06-05 02:57:42 -07:00
James Zern
db4860b355 enc_sse2: prevent signed int overflow
_mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed
int.

Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda
2014-06-04 23:18:22 -07:00
James Zern
230a055501 configure: set WEBP_HAVE_AVX2 when available
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called

Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
2014-06-03 23:29:23 -07:00
James Zern
61362db57c remove libwebpdspdecode dep on libwebpdsp_avx2
it's encode only, libwebpdecoder doesn't need the symbols

Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83
2014-06-03 00:05:56 -07:00
James Zern
9754d39a4e Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)" 2014-06-02 23:06:18 -07:00
skal
ea8b0a171d strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).

Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
skal
6679f8996f Optimize VP8SetResidualCoeffs.
Brings down WebP lossy encoding timings by 5%

Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489
2014-06-03 06:44:04 +02:00
James Zern
4dfa86b29c dsp/cpu: NaCl has no support for xgetbv
or the raw opcode; fixes:
  934ed4: unrecognized instruction

Change-Id: I981870baf0e8b03bf40144ea8ec25eff140d5bc3
2014-05-29 23:02:23 -07:00
pascal massimino
57897bae09 Merge "lossless_neon: use vcreate_*() where appropriate" 2014-05-28 01:36:13 -07:00
pascal massimino
6aa4777b39 Merge "(enc|dec)_neon: use vcreate_*() where appropriate" 2014-05-28 01:34:56 -07:00