Pascal Massimino
ab66becaae
introduce a separate WebPRescalerDspInit to initialize pointers
...
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.
Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
pascal massimino
cbcdd5ffaf
Merge "move rescaler functions to rescaler* files in src/dsp/"
2015-01-10 05:41:45 -08:00
pascal massimino
bf586e8844
Merge changes I230b3532,Idf3057a7
...
* changes:
enable NEON for Windows ARM builds
Makefile.vc: add rudimentary Windows ARM support
2015-01-10 02:14:48 -08:00
James Zern
4f43d38ca8
enable NEON for Windows ARM builds
...
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
James Zern
e7c5954c10
dec_neon: remove returns from void functions
...
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
Djordje Pesut
cbcbedd0de
move rescaler functions to rescaler* files in src/dsp/
...
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
Djordje Pesut
a28c4b363d
MIPS: move WORK_AROUND_GCC define to appropriate place
...
Change-Id: I3055eca57dc4e9d39533a5b8170bbf7af9cd818f
2015-01-08 15:55:41 +01:00
Djordje Pesut
012d2c60fa
MIPS: dspr2: added optimization for functions SSEAxB
...
list of optimized functions: SSE16x16, SSE8x8, SSE16x8, SSE4x4
Change-Id: Ie99e7cdd73b0d4ff855977315a5d0db9ffaa5f04
2015-01-08 13:49:17 +01:00
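The SSEAxB family computes the sum of squared differences between two blocks. The plain-C sketch below shows that computation. The dspr2 assembly is not reproduced here. The explicit `stride` parameter and the `GetSSE` helper are assumptions made for illustration and are not the libwebp signatures.

```c
#include <stdint.h>

/* Sum of squared differences between two w x h blocks whose rows are
 * `stride` bytes apart (hypothetical generic helper). */
static int GetSSE(const uint8_t* a, const uint8_t* b,
                  int w, int h, int stride) {
  int count = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int diff = (int)a[x] - (int)b[x];
      count += diff * diff;  /* at most 16*16*255*255, fits in int */
    }
    a += stride;
    b += stride;
  }
  return count;
}

/* Fixed-size wrappers named after the functions listed in the commit. */
static int SSE16x16(const uint8_t* a, const uint8_t* b, int stride) {
  return GetSSE(a, b, 16, 16, stride);
}
static int SSE16x8(const uint8_t* a, const uint8_t* b, int stride) {
  return GetSSE(a, b, 16, 8, stride);
}
static int SSE8x8(const uint8_t* a, const uint8_t* b, int stride) {
  return GetSSE(a, b, 8, 8, stride);
}
static int SSE4x4(const uint8_t* a, const uint8_t* b, int stride) {
  return GetSSE(a, b, 4, 4, stride);
}
```

An optimized version such as the dspr2 one is expected to return exactly the same integer as this reference for every input.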
Djordje Pesut
9241ecf45d
MIPS: dspr2: added optimization for function Average
...
Change-Id: I7ca316bc3f5fbdaf8dcaf9a2d2227a5134bf4f63
2015-01-08 11:46:15 +01:00
pascal massimino
c6d3292738
argb_sse2: cosmetics
...
clarify some variable names in PackARGB() + add some comments
Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
James Zern
67f601cd46
make the 'last_cpuinfo_used' variable names unique
...
allows the sources to be #include'd in some hackish builds (don't do
that!)
Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-01-07 23:38:53 -08:00
Pascal Massimino
9592053859
Merge "multi-thread fix: lock each entry points with a static var"
2015-01-07 00:03:51 -08:00
Pascal Massimino
4c1b300ada
Merge "SSE2 implementation of VP8PackARGB"
2015-01-06 23:53:50 -08:00
James Zern
04c20e75ea
Merge "MIPS: dspr2: added optimization for function Intra4Preds"
2015-01-06 16:15:10 -08:00
Pascal Massimino
a437694a17
multi-thread fix: lock each entry points with a static var
...
we compare the current VP8GetCPUInfo pointer to the last one used.
This is less code overall, and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
from separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
2015-01-05 07:48:49 -08:00
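The guard described above can be sketched as follows. This is a minimal illustration that assumes a simplified hook type. `GetCPUInfo`, `Add`, `Add_C`, `Add_Fast` and `init_count` are hypothetical stand-ins for VP8GetCPUInfo and a real dsp function pointer, and only the `last_cpuinfo_used` name comes from a nearby commit. The pattern is not a formal lock. It relies on racing initializers all writing the same pointer values.

```c
#include <stddef.h>

typedef int (*CPUInfoFunc)(int feature);
static CPUInfoFunc GetCPUInfo = NULL;  /* plays the role of VP8GetCPUInfo */

typedef int (*AddFunc)(int, int);
static int Add_C(int a, int b) { return a + b; }
static int Add_Fast(int a, int b) { return a + b; }  /* pretend SIMD version */
AddFunc Add = NULL;

static int init_count = 0;  /* counts real (re)initializations, for illustration */

static void DspInit(void) {
  static int initialized = 0;
  static CPUInfoFunc last_cpuinfo_used = NULL;
  /* Skip the work if we already initialized with the same CPU hook. */
  if (initialized && last_cpuinfo_used == GetCPUInfo) return;
  Add = Add_C;                         /* portable default */
  if (GetCPUInfo != NULL && GetCPUInfo(0)) {
    Add = Add_Fast;                    /* override when the hook reports support */
  }
  ++init_count;
  last_cpuinfo_used = GetCPUInfo;
  initialized = 1;
}

static int ReportYes(int feature) { (void)feature; return 1; }
```

Swapping the hook (here, assigning `ReportYes` to `GetCPUInfo`) makes the next `DspInit()` call re-run, which is what keeps each implementation testable on its own.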
Pascal Massimino
ca7f60db5f
SSE2 implementation of VP8PackARGB
...
Change-Id: I40c0e26a6a2701216e4ddebcf793aa535677f437
2015-01-05 05:17:51 -08:00
Pascal Massimino
72d573f693
simplify the PackARGB signature
...
Change-Id: I51570e362126b2681f93211a4f59a3fedb5fd4b5
2015-01-05 02:10:04 -08:00
James Zern
f8abb112f2
Merge changes I109ec4d9,I73fe7743
...
* changes:
dec_neon: add DC8uvNoTop / DC8uvNoLeft
dec_neon: add DC8uv
2014-12-23 09:11:22 -08:00
Djordje Pesut
ae2188a435
MIPS: dspr2: added optimization for function Intra4Preds
...
Change-Id: Ie2a23c356a8715817b020fbee2b40e878e2946de
2014-12-23 17:32:27 +01:00
James Zern
14108d7878
dec_neon: add DC8uvNoTop / DC8uvNoLeft
...
adds do_top/do_left flags to DC8uv; DC8uvNoTop / DC8uvNoLeft are ~88% / ~92%
faster, respectively; no change in DC8uv speed.
Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
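The flag-driven variant works as follows. A scalar sketch of chroma DC prediction with `do_top`/`do_left` switches appears below, assuming a prediction buffer with stride `BPS = 32`. The NEON code is not shown. `DC8uvGeneric` and `Fill8x8` are hypothetical helper names. The 0x80 fallback when neither edge exists follows the usual VP8 convention and is not part of this commit.

```c
#include <stdint.h>
#include <string.h>

#define BPS 32  /* assumed prediction-buffer stride */

static void Fill8x8(uint8_t* dst, int value) {
  for (int y = 0; y < 8; ++y) memset(dst + y * BPS, value, 8);
}

/* dst[i - BPS] is the row above the block; dst[-1 + j * BPS] is the
 * column to its left. The average uses whichever edges are enabled. */
static void DC8uvGeneric(uint8_t* dst, int do_top, int do_left) {
  int sum = 0, count = 0;
  if (do_top) {
    for (int i = 0; i < 8; ++i) sum += dst[i - BPS];
    count += 8;
  }
  if (do_left) {
    for (int j = 0; j < 8; ++j) sum += dst[-1 + j * BPS];
    count += 8;
  }
  /* Rounded average: (sum + 8) >> 4 with both edges, (sum + 4) >> 3 with one. */
  Fill8x8(dst, count == 0 ? 0x80 : (sum + count / 2) / count);
}

static void DC8uv(uint8_t* dst)       { DC8uvGeneric(dst, 1, 1); }
static void DC8uvNoTop(uint8_t* dst)  { DC8uvGeneric(dst, 0, 1); }
static void DC8uvNoLeft(uint8_t* dst) { DC8uvGeneric(dst, 1, 0); }
```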
James Zern
d8340da756
dec_neon: add DC8uv
...
~87% faster.
Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00
Djordje Pesut
7ce8788b06
MIPS: dspr2: added optimization for function MakeARGB32
...
inline MakeARGB32() calls are replaced by calls through
function pointers that build the (a)rgb values for an
entire row
Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
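Moving from a per-pixel inline call to a per-row function pointer lets a platform supply one optimized loop for the whole row. The sketch below shows the idea. The `PackRowFunc` signature and the `PackARGBRow` names are assumptions for illustration and do not reproduce libwebp's exact declarations.

```c
#include <stdint.h>

/* Hypothetical row-level signature: one call packs `len` pixels. */
typedef void (*PackRowFunc)(const uint8_t* a, const uint8_t* r,
                            const uint8_t* g, const uint8_t* b,
                            int len, uint32_t* out);

static inline uint32_t MakeARGB32(int a, int r, int g, int b) {
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
         ((uint32_t)g << 8) | (uint32_t)b;
}

/* Portable C version: the per-pixel helper is now called inside the loop. */
static void PackARGBRow_C(const uint8_t* a, const uint8_t* r,
                          const uint8_t* g, const uint8_t* b,
                          int len, uint32_t* out) {
  for (int i = 0; i < len; ++i) out[i] = MakeARGB32(a[i], r[i], g[i], b[i]);
}

/* Platform init code could repoint this at a dspr2 or SSE2 version. */
static PackRowFunc PackARGBRow = PackARGBRow_C;
```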
Pascal Massimino
87c3d53180
method=0: Don't evaluate any predictor
...
and apply the Paeth predictor (predictor #11) for the low-effort (m=0) mode.
For a 1000-image PNG corpus (m=0), this change yields a speedup of 25% in the lower
quality range and about 10% in the higher quality range.
Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
2014-12-17 18:41:08 +01:00
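The WebP lossless bitstream specification names predictor #11 "Select". It is a Paeth-like choice between the left and top pixels. The sketch below follows the spec's description. The helper name `Channel` is an assumption made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

static int Channel(uint32_t argb, int shift) {
  return (int)((argb >> shift) & 0xff);
}

/* Predictor #11 ("Select"): estimate = L + T - TL per channel, then pick
 * whichever of L or T is closer to the estimate (Manhattan distance over
 * the four ARGB channels). Ties go to T. */
static uint32_t Select(uint32_t left, uint32_t top, uint32_t top_left) {
  int p_left = 0, p_top = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    const int l = Channel(left, shift);
    const int t = Channel(top, shift);
    const int tl = Channel(top_left, shift);
    const int estimate = l + t - tl;
    p_left += abs(estimate - l);
    p_top += abs(estimate - t);
  }
  return (p_left < p_top) ? left : top;
}
```

Because only one predictor is evaluated for m=0, the encoder skips the per-tile search over all candidates, which is where the reported speedup comes from.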
Pascal Massimino
31a9cf6417
Speed up WebP lossless compression for the low-effort (m=0) mode with the following:
...
- Disable Cross-Color transform.
- Evaluate predictors #11 (paeth), #12 and #13 only.
Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed
2014-12-17 11:52:11 +01:00
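Predictors #12 and #13 are defined in the WebP lossless specification as clamped add/subtract combinations of the left (L), top (T) and top-left (TL) pixels. The per-channel sketch below follows that description. `Predictor12`, `Predictor13`, `Clamp255` and `Chan` are hypothetical names. The libwebp helpers (ClampedAddSubtractFull/Half) take their arguments differently.

```c
#include <stdint.h>

static int Clamp255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }
static int Chan(uint32_t argb, int shift) { return (int)((argb >> shift) & 0xff); }

/* #12: per channel, Clamp(L + T - TL). */
static uint32_t Predictor12(uint32_t l, uint32_t t, uint32_t tl) {
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    const int v = Clamp255(Chan(l, shift) + Chan(t, shift) - Chan(tl, shift));
    out |= (uint32_t)v << shift;
  }
  return out;
}

/* #13: a = floor((L + T) / 2) per channel, then Clamp(a + (a - TL) / 2),
 * with C integer division (truncation toward zero). */
static uint32_t Predictor13(uint32_t l, uint32_t t, uint32_t tl) {
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    const int a = (Chan(l, shift) + Chan(t, shift)) / 2;
    const int v = Clamp255(a + (a - Chan(tl, shift)) / 2);
    out |= (uint32_t)v << shift;
  }
  return out;
}
```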
Djordje Pesut
9275d91c79
MIPS: dspr2: added optimization for function TrueMotion
...
Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56
2014-12-12 14:38:55 +01:00
James Zern
a3946b8956
enc_neon: fix building with non-Xcode clang (iOS)
...
check for __apple_build_version__ to distinguish the two; a version
check could work, as Apple bumped Xcode's clang version to 5.x/6.x, but
it's unclear how upstream will deal with their versioning as they go
3.6+, so avoid it for now.
Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2014-12-10 15:50:26 -08:00
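The detection described above is a preprocessor test. Apple's clang predefines `__apple_build_version__`, and upstream clang does not, so its presence separates Xcode clang from other clang builds. The function below is a hypothetical illustration of that check only. It does not reproduce the actual enc_neon.c change.

```c
/* Classify the compiler at preprocessing time. */
static const char* CompilerFlavor(void) {
#if defined(__clang__) && defined(__apple_build_version__)
  return "apple-clang";     /* Xcode clang: version numbers follow Xcode */
#elif defined(__clang__)
  return "upstream-clang";  /* e.g. a non-Xcode clang targeting iOS */
#else
  return "other";
#endif
}
```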
Pascal Massimino
8ed9c00d5e
Merge "simplify the Histogram struct, to only store max_value and last_nz"
2014-12-10 02:02:05 -08:00
Pascal Massimino
bad775715a
simplify the Histogram struct, to only store max_value and last_nz
...
we don't need to store the whole distribution in order to compute the alpha.
Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.
Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
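Only two numbers need to survive the histogram pass: the largest bin count and the index of the highest non-empty bin. The sketch below reduces a temporary distribution to those two fields. `HistoSummary` and `Summarize` are hypothetical names. The alpha formula itself is not given in the commit, so the sketch stops at the two stored values.

```c
#include <stdint.h>

/* Hypothetical reduced histogram: only what the alpha computation needs. */
typedef struct {
  int max_value;      /* largest bin count seen */
  int last_non_zero;  /* index of the highest non-empty bin */
} HistoSummary;

/* One pass over a distribution, keeping only the two summary values. */
static HistoSummary Summarize(const int* distribution, int num_bins) {
  HistoSummary s = { 0, 0 };
  for (int k = 0; k < num_bins; ++k) {
    if (distribution[k] > 0) {
      if (distribution[k] > s.max_value) s.max_value = distribution[k];
      s.last_non_zero = k;
    }
  }
  return s;
}
```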
Djordje Pesut
3cca0dc7f0
MIPS: dspr2: Added optimization for DCMode function
...
Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1
2014-12-09 13:58:39 +01:00
Djordje Pesut
37e395fd1c
MIPS: fix functions to use generic BPS instead of hardcoded value
...
Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd
2014-12-05 15:55:47 +01:00
Pascal Massimino
4a279a680e
cosmetics: add some missing != NULL comparisons
...
Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904
2014-12-04 14:54:11 +01:00
Pascal Massimino
66ad372500
factorize BPS definition in dsp.h and add VP8Copy16x8
...
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
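When every work buffer shares one stride (BPS), a block copy only needs source and destination pointers. The sketch below assumes BPS = 32, matching the encoder change in the next entry. `CopyBlock` and `Copy16x8` are hypothetical names written in the spirit of VP8Copy16x8. They are not its actual definition.

```c
#include <stdint.h>
#include <string.h>

#define BPS 32  /* assumed shared work-buffer stride (bytes per scanline) */

/* Copy a w x h block between two buffers that both use the BPS stride. */
static void CopyBlock(const uint8_t* src, uint8_t* dst, int w, int h) {
  for (int y = 0; y < h; ++y) {
    memcpy(dst, src, (size_t)w);
    src += BPS;
    dst += BPS;
  }
}

static void Copy16x8(const uint8_t* src, uint8_t* dst) {
  CopyBlock(src, dst, 16, 8);
}
```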
Pascal Massimino
57606047ec
encoder: switch BPS to 32 instead of 16
...
this is a first step toward unifying the encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/.
With this layout, a little space (~7%) is lost to unused samples,
but no speed change was observed.
Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
Djordje Pesut
1b66bbe998
MIPS: dspr2: added optimization for function TransformColor_C
...
Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36
2014-12-01 15:46:06 +01:00
James Zern
9de9074c92
dec_neon: add TM8uv
...
~68% faster
reuses TM4(), adding support for the additional rows; the columns were
already being handled.
Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
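TrueMotion prediction sets each pixel to `clip(left[y] + top[x] - top_left)`. The size-parameterized scalar sketch below shows why TM8uv can reuse the TM4 logic, since only the row and column counts change. The NEON code is not shown. `TrueMotion`, `Clip8` and the `BPS = 32` stride are assumptions made for illustration.

```c
#include <stdint.h>

#define BPS 32  /* assumed prediction-buffer stride */

static uint8_t Clip8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Scalar TrueMotion for a size x size block: the row above is dst - BPS,
 * the left column is dst[-1 + y * BPS], and the corner is dst[-BPS - 1]. */
static void TrueMotion(uint8_t* dst, int size) {
  const uint8_t* top = dst - BPS;
  const int top_left = top[-1];
  for (int y = 0; y < size; ++y) {
    const int left = dst[-1 + y * BPS];
    for (int x = 0; x < size; ++x) {
      dst[x + y * BPS] = Clip8(left + top[x] - top_left);
    }
  }
}

static void TM4(uint8_t* dst)   { TrueMotion(dst, 4); }
static void TM8uv(uint8_t* dst) { TrueMotion(dst, 8); }
```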
James Zern
e18571393d
dsp: initialize VP8PredChroma8 in VP8DspInit()
...
the table becomes non-const to allow for platform-specific optimizations
Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
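Making the table non-const lets the init function fill in portable defaults and then patch individual entries with platform-specific versions. The sketch below illustrates the idea. `PredFunc`, `PredChroma8`, `NUM_CHROMA_MODES` and the toy predictors are hypothetical. The real table is VP8PredChroma8 with libwebp's own mode count.

```c
#include <stdint.h>

typedef void (*PredFunc)(uint8_t* dst);

#define NUM_CHROMA_MODES 2  /* hypothetical; not libwebp's mode count */

static void PredA_C(uint8_t* dst)    { dst[0] = 1; }
static void PredB_C(uint8_t* dst)    { dst[0] = 2; }
static void PredB_Fast(uint8_t* dst) { dst[0] = 2; }  /* stand-in for a SIMD version */

/* Non-const so that init code can overwrite individual entries. */
PredFunc PredChroma8[NUM_CHROMA_MODES];

static void DspInit(int have_simd) {
  PredChroma8[0] = PredA_C;  /* portable defaults first */
  PredChroma8[1] = PredB_C;
  if (have_simd) PredChroma8[1] = PredB_Fast;  /* then platform overrides */
}
```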
Vikas Arora
e0c809ad23
Move Entropy methods to lossless.c
...
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's a slight difference between the way entropy is computed when
evaluating prediction methods and when evaluating histograms (literals) for Huffman trees.
The plan is to later merge a few (static) methods and reduce the code size.
This change has no impact on the compression speed/density.
Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
2014-11-20 13:48:05 -08:00
Djordje Pesut
2f0e2ba826
MIPS: dspr2: added optimization for function Select
...
Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d
2014-11-07 09:44:16 +01:00
Djordje Pesut
54f2c14cce
MIPS: dspr2: added optimization for function FTransform
...
Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78
2014-11-06 14:21:33 +01:00
Djordje Pesut
aa42f4231f
MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed
...
Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918
2014-11-06 10:56:18 +01:00
Djordje Pesut
95ca44a718
MIPS: dspr2: added optimization for Disto4x4
...
enc/dec common macros moved to mips_macro.h
Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1
2014-11-05 12:06:15 +01:00
Djordje Pesut
5798eee6be
MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
...
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
James Zern
572022a350
filters_mips_dsp_r2.c: disable unfilters
...
the output does not match the C-code.
Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
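An optimized unfilter must match the C reference byte for byte, which is why a mismatch was enough to disable it. The sketch below uses a simplified horizontal unfilter in which each output byte is the residual plus the previously reconstructed byte, wrapping modulo 256, together with a bit-exactness check. The signature, the starting predictor, and all names are assumptions for illustration. They do not reproduce the libwebp filters.

```c
#include <stdint.h>
#include <string.h>

typedef void (*UnfilterFunc)(const uint8_t* in, uint8_t* out, int width, uint8_t pred);

/* Reference: out[i] = out[i - 1] + in[i] (mod 256), starting from `pred`. */
static void HorizontalUnfilterRef(const uint8_t* in, uint8_t* out, int width, uint8_t pred) {
  for (int i = 0; i < width; ++i) {
    out[i] = (uint8_t)(pred + in[i]);
    pred = out[i];
  }
}

/* Deliberately wrong variant: never updates the running predictor. */
static void BrokenUnfilter(const uint8_t* in, uint8_t* out, int width, uint8_t pred) {
  for (int i = 0; i < width; ++i) out[i] = (uint8_t)(pred + in[i]);
}

/* Returns 1 if `candidate` reproduces the reference output bit-exactly. */
static int MatchesReference(UnfilterFunc candidate, const uint8_t* in, int width) {
  uint8_t ref[256], got[256];
  if (width <= 0 || width > 256) return 0;
  HorizontalUnfilterRef(in, ref, width, 0);
  candidate(in, got, width, 0);
  return memcmp(ref, got, (size_t)width) == 0;
}
```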
Djordje Pesut
a28e21b141
MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
...
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8
MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
...
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0
MIPS: dspr2: added optimization for ITransform
...
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
James Zern
22881c999e
dec_neon: add RD4 intra predictor
...
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.
Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
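RD4 (down-right diagonal) fills each diagonal `x - y = d` of the 4x4 block with a 3-tap smoothed edge value. The scalar sketch below lays the nine edge samples out in one array (left column bottom to top, corner, top row) so that each diagonal indexes a single center sample. The NEON/SSE2 load pattern is not shown. The argument layout and array indexing are choices made for illustration.

```c
#include <stdint.h>

#define AVG3(a, b, c) ((uint8_t)(((a) + 2 * (b) + (c) + 2) >> 2))

/* left[0..3] runs top to bottom, top[0..3] left to right.
 * e[] = { L, K, J, I, X, A, B, C, D } with X the top-left corner. */
static void RD4(const uint8_t left[4], uint8_t top_left,
                const uint8_t top[4], uint8_t out[4][4]) {
  const int e[9] = { left[3], left[2], left[1], left[0], top_left,
                     top[0], top[1], top[2], top[3] };
  for (int y = 0; y < 4; ++y) {
    for (int x = 0; x < 4; ++x) {
      const int c = 4 + x - y;  /* center index for diagonal x - y, 1..7 */
      out[y][x] = AVG3(e[c - 1], e[c], e[c + 1]);
    }
  }
}
```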
James Zern
1304eb3418
Merge "dec_neon: DC4: use pair-wise adds for top row"
2014-10-23 08:08:34 -07:00
James Zern
0db9031c79
dsp/dec_{neon,sse2}: VE4: normalize variable names
...
use '0' rather than '_' when dealing with variables that result from a
shift
Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b
dec_neon: DC4: use pair-wise adds for top row
...
reduces load count, slightly faster
Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
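DC4 averages the four top and four left neighbors with rounding. In the sketch below, the top row is reduced pair-wise, `(t0 + t1) + (t2 + t3)`. That is the shape a NEON pair-wise add such as `vpaddl` produces. In scalar C the grouping does not change the result. `BPS = 32` is an assumed stride, and this is not the NEON implementation.

```c
#include <stdint.h>
#include <string.h>

#define BPS 32  /* assumed prediction-buffer stride */

static void DC4(uint8_t* dst) {
  const uint8_t* top = dst - BPS;
  const int top_sum = (top[0] + top[1]) + (top[2] + top[3]);  /* pair-wise */
  int left_sum = 0;
  for (int j = 0; j < 4; ++j) left_sum += dst[-1 + j * BPS];
  const int dc = (top_sum + left_sum + 4) >> 3;  /* rounded mean of 8 samples */
  for (int y = 0; y < 4; ++y) memset(dst + y * BPS, dc, 4);
}
```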