Commit Graph

71 Commits

Author SHA1 Message Date
Vincent Rabaud
8acb4942f7 Remove the argb* files.
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.

Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
2017-06-25 14:44:33 +02:00
Pascal Massimino
52245424b0 NEON implementation of some Sharp-YUV420 functions
Change-Id: I449ef9c76b06f971f6e2ad7f9db96bf906d8fe1f
new-file: dsp/yuv_neon.c
2017-04-18 19:22:37 +02:00
Pascal Massimino
693bf74ec0 move the SSIM calculation code in ssim.c / ssim_sse2.c
Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448
2017-02-21 12:53:35 +01:00
James Zern
668e1dd44f src/{dec,enc,utils}: give filenames a unique suffix
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible

BUG=webp:279

Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
Pascal Massimino
1de931c669 NEON: implement alpha-filters (horizontal/vertical/gradient)
gradient-filter code is not much faster, but maybe improvable in the future.

Change-Id: Ia16070e409fe8703b02276166f19526917df6b35
2017-01-17 15:44:46 +01:00
Pascal Massimino
49d0280df1 NEON: implement several alpha-processing functions
- ApplyAlphaMultiply
 - DispatchAlpha
 - DispatchAlphaToGreen
 - ExtractAlpha

Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4)

new file: alpha_processing_neon.c

Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca
2017-01-11 17:39:29 +01:00
Vincent Rabaud
c9b45863e2 Split off common lossless dsp inline functions.
Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738
2016-09-12 17:35:08 +02:00
James Zern
d2e4484ef3 dsp/Makefile.am: put msa source in correct lib
upsampling_msa.c was incorrectly included in the neon convenience lib
+ sort msa sources

Change-Id: I7c4883f16a5c2fed12bfa0e8d8d6a7acd5d4fb84
2016-08-03 17:50:45 -07:00
Parag Salasakar
d3ddacb625 Add MSA optimized YUV to RGB upsampling functions
We add the following MSA optimized YUV to RGB upsampling functions:
- UpsampleRgbLinePair
- UpsampleBgrLinePair
- UpsampleRgbaLinePair
- UpsampleBgraLinePair
- UpsampleArgbLinePair
- UpsampleRgba4444LinePair
- UpsampleRgb565LinePair

Change-Id: I7264a615edc7eb376e443e9d38bd8e3c9a2cab1f
2016-07-22 14:28:30 +00:00
Parag Salasakar
9ac74f922e Add MSA optimized rescaling functions
We add the following MSA optimized rescaling functions:
- RescalerExportRowExpand
- RescalerExportRowShrink

Change-Id: Ic1c76065423b02617db94cf0c22bb564219b36e6
2016-07-19 15:52:42 +00:00
Parag Salasakar
cb19dbc1a4 Add MSA optimized color transform functions
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed

Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
2016-07-18 13:49:24 +00:00
Parag Salasakar
435308e029 Add MSA optimized encoder transform functions
We add the following MSA optimized encoder transform functions:
- ITransform
- FTransform
- FTransformWHT

Change-Id: Ia6b17556aba5aff2d7a88208905fb45293d080a8
2016-07-05 14:35:47 +00:00
Parag Salasakar
dce64bfa1b Add MSA optimized alpha filter functions
We add the following MSA optimized alpha filter functions:
- HorizontalFilter
- VerticalFilter
- GradientFilter

Change-Id: I71e2e04050e569b8c0bf086fadf210ee16d50924
2016-07-01 19:58:25 +00:00
Parag Salasakar
701c772eed Add MSA optimized colorspace conversion functions
We add the following MSA optimized colorspace conversion functions:
- ConvertBGRAToRGBA
- ConvertBGRAToBGR
- ConvertBGRAToRGB

Change-Id: I76db1c829d593a06d4975d54dbafa385c82b84fb
2016-06-27 21:19:06 +00:00
Parag Salasakar
5e60c42a76 Added MSA optimized transform functions
1. TransformWHT
2. TransformTwo
3. TransformDC
4. TransformAC3

Change-Id: Ia3624cb4aed215bcaffce542b28794e643207039
2016-06-16 09:04:27 +00:00
James Zern
74fb56fb5d add runtime NEON detection
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd

the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.

the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.

Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
2016-05-06 15:32:48 -07:00
Vincent Rabaud
bf2b4f114f Regroup common SSE code + optimization.
The transpose refactoring will help removing a transpose in a
later CL.

The horizontal add function helps removing a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19

Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
2016-02-16 18:34:34 +01:00
James Zern
b9d80fa4e8 configure: disable asserts by default
--enable-asserts can be used to avoid defining NDEBUG

Change-Id: I6216668e3f79f69bd8c453f0b36cecb3b585688e
2015-12-16 13:15:53 -08:00
Pascal Massimino
ac761a3738 10% faster table-less SSE2/NEON version of YUV->RGB conversion
* Precision is slightly different
* also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc.
* removing yuv_tables_sse2.h saved ~8k of binary size
* the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away
* the NEON code is somewhat tricky

Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052
2015-12-08 20:05:56 -08:00
skal
b4e731cd93 neon-implementation for rescaler code
It's better to stay with a 32b fixed-point precision overall, otherwise
the C-version on ARM gets *slower*.
Actually, gcc ARM compiler optimizes some instructions pretty
well when WEBP_RESCALER_FIX is exactly 32, even in C.

Change-Id: I0eea97f7db5947470f5af355dee098eca81e178d
2015-10-07 21:18:39 -07:00
Pascal Massimino
76a7dc39e5 rescaler: add some SSE2 code
The rounding and arithmetic is not the same as previously, to prevent overflow cases for large upscale factors.

We still rely on 32b x 32b -> 64b multiplies. Raised the fixed-point precision to 32b
so that we have some nice shifts from epi64 to epi32.
Changed rescaler_t type to 'uint32_t' in order to squeeze in all the precision required.

The MIPS code has been disabled because it's now out-of-sync. Will be fixed in
a subsequent CL when the dust settles.
~30-35% faster

Change-Id: I32e4ddc00933f1b1aa3463403086199fd5dad07b
2015-09-25 15:07:13 +02:00
Pascal Massimino
f3d687e3fa SSE4.1 implementation of some lossless encoding functions
New implementations: SubtractGreenFromBlueAndRed and TransformColor

around 1-2% faster lossless encoding.

Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
2015-06-23 08:46:57 +02:00
Pascal Massimino
bfc300c7ff SSE4.1 implementation of some alpha-processing functions
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.

Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
2015-06-19 14:17:39 -07:00
Pascal Massimino
94055503e3 encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN
Visible speed-up, thanks to pshufb and pabsw and psignw use.

had to tweak configure.ac to make "smmintri.h" presence correctly
detected (we need to set the CPPFLAGS instead of the CFLAGS!)

Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080
2015-03-25 20:23:51 -07:00
James Zern
553051f741 dsp/lossless: split enc/dec functions
adds lossless_enc*.c; reduces the size of the decode-only so: ~78K
w/gcc-4.8.2 on x86_64.

Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092
2015-03-23 22:57:50 -07:00
Pascal Massimino
e9570dd987 stub for SSE4.1 support.
Change-Id: I0c845a98d2871cc8907ff7b914bab7747a92c7ed
2015-03-20 00:26:35 -07:00
Pascal Massimino
2a407092ab 4-5% faster encoding using SSE2 for GetResidualCost
new file: cost_sse2.c

Change-Id: I4896c07f5ff2443ef743f4435fe2758d95a672ed
2015-02-18 09:41:02 +01:00
Pascal Massimino
a987faedfa MIPS: dspr2: added optimization for function GetResidualCost
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file

Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
2015-02-07 02:13:26 -08:00
Pascal Massimino
022d2f886c add SSE2 variants for alpha filtering functions
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.

The heavy bottleneck left for optimization is still GradientUnfilter()

Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
2015-01-29 08:46:22 +01:00
Pascal Massimino
7afdaf8496 Alpha coding: reorganize the filter/unfiltering code
Move the filtering code to their own dsp/ spot
New function: VP8FiltersInit()

Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Djordje Pesut
cbcbedd0de move rescaler functions to rescaler* files in src/dsp/
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
Pascal Massimino
ca7f60db5f SSE2 implementation of VP8PackARGB
Change-Id: I40c0e26a6a2701216e4ddebcf793aa535677f437
2015-01-05 05:17:51 -08:00
Djordje Pesut
7ce8788b06 MIPS: dspr2: added optimization for function MakeARGB32
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row

Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
Djordje Pesut
829a8c19a0 MIPS: dspr2: added optimization for ITransform
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
Djordje Pesut
24e1072aac MIPS: dspr2: added optimization for TransformDC
Change-Id: Iee69758f6442ea9c80ddaa32cea8d00dda4c6252
2014-09-09 14:15:04 +02:00
Djordje Pesut
f0103595dd MIPS: dspr2: added optimization for ColorIndexInverseTransforms
Change-Id: I5b6094ce489d4f896bc4b8f575142eb3c5054beb
2014-09-08 17:22:59 +02:00
skal
fc98edd936 add a DispatchAlpha() for SSE2 that handles 8 pixels at a time
Only slightly faster.

Change-Id: Ie2e57e6a0950166124cf1075c6c9b45b7abdad8c
2014-08-25 21:03:03 -07:00
Djordje Pesut
0b21c30b1a MIPS: dspr2: added optimization for EmitAlphaRGB
New dsp function: WebPDispatchAlpha()

Change-Id: I48e539d22471279ec75185759bc68d18b127f716
2014-08-21 20:39:35 -07:00
Djordje Pesut
569771549a MIPS: dspr2: added optimizations for VP8YuvTo*
VP8YuvToRgb
VP8YuvToBgr
VP8YuvToRgb565
VP8YuvToRgba4444
VP8YuvToArgb
VP8YuvToBgra
VP8YuvToRgba

Change-Id: I22212a125d890e1fd28388fec906a1a5c07ff386
2014-08-19 14:29:32 +02:00
Djordje Pesut
b4dc4069a2 MIPS: dspr2: added optimization for (un)filters
HorizontalFilter
VerticalFilter
GradientFilter
HorizontalUnfilter
VerticalUnfilter
GradientUnfilter

Change-Id: I54055b4767c37719691811072e95bf79c1f627b1
2014-08-14 11:55:19 -07:00
Djordje Pesut
b61c9ceca8 MIPS: dspr2: Optimization of some simple point-sampling functions
Change-Id: I6a4ab29bd0cc5a2951a8882cf9997032dc38bd79
2014-08-13 17:18:49 +02:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
James Zern
6e61a3a905 configure: test for -msse2
+ add a WEBP_HAVE_SSE2 to dsp.h

not all 32-bit toolchain configurations will have sse2 enabled by
default

Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
2014-06-07 19:44:08 -07:00
James Zern
b9d2efc629 rename upsampling_mips32.c to yuv_mips32.c
matches yuv_sse2 added in;
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c

Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3
2014-06-07 12:35:47 -07:00
James Zern
bdfeebaa01 dsp/yuv: move sse2 functions to yuv_sse2.c
Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000
2014-06-05 23:52:54 -07:00
James Zern
61362db57c remove libwebpdspdecode dep on libwebpdsp_avx2
it's encode only, libwebpdecoder doesn't need the symbols

Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83
2014-06-03 00:05:56 -07:00
skal
399b916d27 lossy decoding: correct alpha-rescaling for YUVA format
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.

Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()

Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
2014-05-27 15:27:13 -07:00
James Zern
515e35cfb1 Merge "add stub dsp/enc_avx2.c" 2014-05-22 18:28:38 -07:00
skal
a05dc1402c SSE2: yuv->rgb speed-up for point-sampling
- use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined)
 - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR
 - clean-up and harmonize the WebpUpsamplers[] usage.

Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d
2014-05-22 09:56:47 +02:00
James Zern
178e9a69ae add stub dsp/enc_avx2.c
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled

Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
2014-05-22 00:31:46 -07:00