Pascal Massimino
fe9317c9bf
cosmetics:
...
* remove MIPS32 suffix from static function names
* fix a long line in enc_neon.c
Change-Id: Ia1294ae46f471b3eb1e9ba43c6aa1b29a7aeb447
2014-04-16 00:36:19 -07:00
James Zern
953b074677
enc_neon: cosmetics
...
fix/remove incorrect comments
+ whitespace
Change-Id: Id1b86beb23e5bf946e73c34ab7066b6ca177f33b
2014-04-15 23:57:03 -07:00
skal
a9fc697cb6
Merge "WIP: extract the float-calculation of HuffmanCost from loop"
2014-04-15 11:33:11 -07:00
skal
3f84b5219d
Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"
2014-04-15 07:09:12 -07:00
Djordje Pesut
4ae0533f39
MIPS: MIPS32r1: Added optimizations for ExtraCost functions.
...
ExtraCost and ExtraCostCombined
Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2
2014-04-15 15:37:06 +02:00
skal
b30a04cf11
WIP: extract the float-calculation of HuffmanCost from loop
...
new function: VP8FinalHuffmanCost()
Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9
2014-04-15 14:52:52 +02:00
skal
a8fe8ce231
Merge "NEON intrinsics version of CollectHistogram"
2014-04-15 03:00:45 -07:00
skal
95203d2d1b
NEON intrinsics version of CollectHistogram
...
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()
(also: fixed some #ifdef USE_INTRINSICS problems)
Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7
2014-04-14 16:47:20 +02:00
skal
7ca2e74bb4
replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)
...
saves a few instructions
Change-Id: If8f464bb2894a209bba94825a4db9267df126d47
2014-04-14 15:14:45 +02:00
skal
41c6efbdc5
fix lossless_neon.c
...
* some extra {xx, 0} in initializers
* replaced by vget_lane_u32() where appropriate
Change-Id: Iabcd8ec34d7c853920491fb147a10d4472280a36
2014-04-14 14:27:11 +02:00
skal
8ff96a027a
NEON intrinsics version of FTransform
...
a little bit slower than inlined asm, it seems,
so disabled for now.
Change-Id: I8c942846f9bedaed57275675ea9dbbcb8dfd9ccd
2014-04-14 09:58:35 +02:00
Jovan Zelincevic
0214f4a908
Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"
2014-04-10 08:54:12 -07:00
Jovan Zelincevic
baabf1ea3a
MIPS: MIPS32r1: Added optimizations for FastLog2
...
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow
also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power of two)
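The "% y" to "& (y-1)" rewrite relies on a standard identity: for an unsigned x and a nonzero power-of-two y, the low log2(y) bits of x are exactly the remainder. A minimal C sketch of that identity follows; the helper name ModPow2 is hypothetical and is not a libwebp function.

```c
#include <stdint.h>

/* Hypothetical helper illustrating the identity used in the C-version:
   for unsigned x and y a nonzero power of two, x % y == x & (y - 1).
   The mask form avoids a division. Not valid for signed negative x,
   nor for y that is not a power of two. */
static uint32_t ModPow2(uint32_t x, uint32_t y) {
  return x & (y - 1u);
}
```

For example, ModPow2(13, 8) keeps the low 3 bits of 13 (binary 1101), giving 5, the same as 13 % 8.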
Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
2014-04-10 08:32:51 -07:00
skal
3d49871dbe
NEON functions for lossless coding
...
Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)
Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c
2014-04-10 15:32:08 +02:00
Slobodan Prijic
3fe0291530
MIPS: MIPS32r1: Added optimizations for SSE functions.
...
Change-Id: I1287fa65064192cc2edc5c4be2b1974be665b9b4
2014-04-09 11:02:13 +02:00
skal
c503b485b6
Merge "fix the gcc-4.6.0 bug by implementing alternative method"
2014-04-08 23:25:59 -07:00
skal
abe6f48709
fix the gcc-4.6.0 bug by implementing alternative method
...
previous functions are a bit faster with gcc-4.8, so we keep them
for now.
Change-Id: I4081e5af66fbf606295d8a83875c1b889729b4dc
2014-04-09 07:53:55 +02:00
James Zern
5598bdecd8
enc_mips32.c: fix file mode
...
Change-Id: I5a43320e2ea2eebc88c65398acb9ea59b63af1fd
2014-04-08 15:12:54 -07:00
Slobodan Prijic
2b1b4d5ae9
MIPS: MIPS32r1: Add optimization for GetResidualCost
...
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual
Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
2014-04-08 15:28:49 +02:00
pascal massimino
f0a1f3cd51
Merge "MIPS: MIPS32r1: Added optimization for FTransform"
2014-04-08 04:17:27 -07:00
Djordje Pesut
7231f610aa
MIPS: MIPS32r1: Added optimization for FTransform
...
Change-Id: I9384dac483e8f98bcfdd277a0a3d6ec7c7a7b297
2014-04-08 04:16:44 -07:00
skal
869eaf6c60
~30% encoding speedup: use NEON for QuantizeBlock()
...
also revamped the signature to avoid having to pass the 'first' parameter
Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5
2014-04-08 03:08:22 -07:00
James Zern
f758af6b73
enc_neon: convert FTransformWHT to intrinsics
...
slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.
Change-Id: I69534016186064fd92476d5eabc0f53462d53146
2014-04-08 00:20:19 -07:00
Djordje Pesut
7dad095bb4
MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform)
...
Change-Id: Ieb20c5c52b964247cfe46f45f9a7415725bf7c02
2014-04-07 15:04:23 +02:00
Jovan Zelincevic
2298d5f301
MIPS: MIPS32r1: Added optimization for QuantizeBlock
...
Change-Id: I6047ab107e4d474e35b5af1dac391d5b3d8c049b
2014-04-07 09:22:35 +02:00
Djordje Pesut
e88150c9b6
Merge "MIPS: MIPS32r1: Add optimization for ITransform"
2014-04-05 10:36:05 -07:00
James Zern
de693f2502
lossless_neon: disable VP8LConvert* functions
...
due to breakage with NDK/gcc-4.6 builds
Change-Id: Id96258e710ee33e08a023354b3227f27da986620
2014-04-04 20:38:29 -07:00
skal
4143332b22
NEON intrinsics for encoding
...
* inverse transform is actually slower with intrinsics + gcc-4.6,
so is left disabled for now.
With gcc-4.8, it's a bit faster than inlined assembly.
* Sum of Squared Error functions provide a 2-3% speed-up.
They're enabled by default (since there's no inlined-asm equivalent)
Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a
2014-04-04 15:02:56 -07:00
Djordje Pesut
0ca2914b23
MIPS: MIPS32r1: Add optimization for ITransform
...
Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56
2014-04-04 10:50:35 +02:00
James Zern
71bca5ecf3
dec_neon: use vst_lane instead of vget_lane
...
results in fewer instructions, small speed improvement
Change-Id: I98de632d09ff09f295368c0d744cb4397b585084
2014-04-03 14:56:26 -07:00
skal
bf06105293
Intrinsics NEON version of TransformOne
...
+ misc cosmetics
* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)
Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095
2014-04-03 14:41:56 -07:00
pascal massimino
19c6f1ba74
Merge "dec_neon: use vld?_lane instead of vset?_lane"
2014-04-03 01:16:29 -07:00
James Zern
7a94c0cf75
upsampling_neon: drop NEON suffix from local functions
...
Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a
2014-04-02 23:24:39 -07:00
James Zern
d14669c83c
upsampling_sse2: drop SSE2 suffix from local functions
...
Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297
2014-04-02 23:24:39 -07:00
James Zern
2ca42a4fb7
enc_sse2: drop SSE2 suffix from local functions
...
Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e
2014-04-02 23:24:36 -07:00
James Zern
d038e6193b
dec_sse2: drop SSE2 suffix from local functions
...
Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882
2014-04-02 23:10:39 -07:00
James Zern
fa52d7525f
dec_neon: use vld?_lane instead of vset?_lane
...
results in fewer instructions, small speed improvement
Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a
2014-04-02 23:03:18 -07:00
Pascal Massimino
c520e77d94
cosmetic: fix long line
...
Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea
2014-04-02 23:00:50 -07:00
James Zern
4b0f2dae6f
Merge "add intrinsics NEON code for chroma strong-filtering"
2014-04-02 22:57:44 -07:00
skal
e351ec0759
add intrinsics NEON code for chroma strong-filtering
...
The nice trick is to pack 8 u + 8 v samples into a single uint8x16_t
register, and re-use the previous (luma) functions
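The packing trick can be sketched in scalar C: an 8-sample U row and an 8-sample V row are concatenated into one 16-lane buffer, so a kernel written for 16 luma lanes processes both chroma planes in a single call. This is an illustration under assumptions only: the actual commit uses NEON registers and the real strong filter, while Process16 and Process8x2 below are hypothetical names and the per-lane rounding-halve is a stand-in for the filter arithmetic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-lane "luma" kernel; the rounding halve
   is placeholder arithmetic, not the real loop-filter math. */
static void Process16(uint8_t lanes[16]) {
  int i;
  for (i = 0; i < 16; ++i) lanes[i] = (uint8_t)((lanes[i] + 1) >> 1);
}

/* Pack 8 u + 8 v samples into 16 lanes, reuse the 16-lane kernel once,
   then unpack the results back into the two chroma rows. */
static void Process8x2(uint8_t u[8], uint8_t v[8]) {
  uint8_t packed[16];
  memcpy(packed, u, 8);      /* lanes 0..7  <- u */
  memcpy(packed + 8, v, 8);  /* lanes 8..15 <- v */
  Process16(packed);
  memcpy(u, packed, 8);
  memcpy(v, packed + 8, 8);
}
```

The design point is that the chroma path needs no separate kernel: the cost is only the pack/unpack, which on NEON amounts to combining two 8-byte halves into one 16-byte register.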
Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38
2014-04-03 06:58:21 +02:00
pascal massimino
aaf734b8b0
Merge "Add SSE2 version of forward cross-color transform"
2014-04-02 14:18:59 -07:00
Urvang Joshi
c90a902eff
Add SSE2 version of forward cross-color transform
...
Encoding speed is roughly the same.
Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c
2014-04-02 12:21:20 -07:00
Vikas Arora
bc374ff39e
Use histogram_bits to initialize transform_bits.
...
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.
Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
2014-04-02 11:46:40 -07:00
James Zern
2132992d47
Merge "Add strong filtering intrinsics (inner and outer edges)"
2014-04-02 00:10:01 -07:00
skal
5fbff3a646
Add strong filtering intrinsics (inner and outer edges)
...
+ added some work-arounds for gcc-4.6 to make it compile (except for one function).
+ lots of revamping
All variants tested ok.
Speed-up is ~5-7%
Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3
2014-04-02 08:28:55 +02:00
Urvang Joshi
d4813f0cb2
Add SSE2 function for Inverse Cross-color Transform
...
Lossless decoding is now ~3% faster.
Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743
2014-04-01 15:52:25 -07:00
James Zern
26029568b7
dec_neon: add strong loopfilter intrinsics
...
vertical only currently, 2.5-3% faster
placed under USE_INTRINSICS as this change depends on the simple
loopfilter
improves the simple loopfilter slightly thanks to some reorganization
Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155
2014-04-01 01:13:50 -07:00
James Zern
cca7d7ef0f
Merge "add intrinsics version of SimpleHFilter16NEON()"
2014-04-01 00:57:11 -07:00
James Zern
1a05dfa7f5
windows: fix dll builds
...
WebPSafe* need to be marked external to allow mux/demux to access them
through libwebp.dll
Change-Id: Ib6620e00d376f7aa5a0550e1e244f759977f97a0
2014-03-31 17:46:12 -07:00
skal
d6c50d8ac2
Merge "add some colorspace conversion functions in NEON"
2014-03-31 13:15:18 -07:00