97 Commits

Author SHA1 Message Date
Parag Salasakar
e11da081f9 mips msa webp configuration
Change-Id: I886164d6d3d560b1249603d47391fddf20b5a3d4
2016-06-07 23:49:41 -07:00
Pascal Massimino
77f21c9c39 Move DitherCombine8x8 to dsp/dec.c
To be later optimized in SSE2

Change-Id: I0de9c89eb5166f3319bb4b0500150de271ecac05
2016-05-24 23:14:41 -07:00
James Zern
e15afbce5d dsp.h: fix ubsan macro name
copy and paste error in the previous commit, change
no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF ->
WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW

Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32
2016-05-13 11:09:57 -07:00
James Zern
e53c9ccb24 dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
for suppressing expected failures with -fsanitize=integer

Change-Id: I954cba45f0c96478b770ed7a6ac7491359cae075
2016-05-12 23:51:23 -07:00
James Zern
ea0be354a0 dsp.h: remove utils.h include
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.

Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
2016-05-11 23:17:21 -07:00
James Zern
ea24e026aa Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF" 2016-05-11 06:21:45 +00:00
James Zern
369e264e2e dsp.h: add WEBP_UBSAN_IGNORE_UNDEF
only defined when WEBP_FORCE_ALIGNED isn't. use it to quiet alignment
warnings VP8LoadNewBytes().

Change-Id: I710a74bb9375285974e97022540551a3f4eda414
2016-05-10 22:45:13 -07:00
James Zern
74fb56fb5d add runtime NEON detection
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd

the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.

the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.

Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
2016-05-06 15:32:48 -07:00
Pascal Massimino
2102ccd091 update the Unfilter API in dsp to process one row independently
This will allow to work in-place on cropped area later.

Also sped up the inverse gradient filtering in SSE2 (~4%)

Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
2016-04-21 08:10:45 +00:00
Pascal Massimino
a90edffb7e fix missing 'extern' for SSIM function in dsp/
Change-Id: Id8143120f01065dc088f4e90bd930f8ea7c3ae5a
2016-03-08 10:27:46 -08:00
Pascal Massimino
423ecaf484 move some SSIM-accumulation function for dsp/
This is in preparation for some SSE2 code.

And generally speaking, the whole SSIM code needs some
revamp: we're not averaging the SSIM value at each pixels
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.

Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
2016-03-08 07:50:09 +01:00
Vincent Rabaud
9960c31685 Remove an unnecessary transposition in TTransform.
Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e
2016-02-17 21:41:28 +01:00
Pascal Massimino
2c08aac81a introduce WebPMemToUint32 and WebPUint32ToMem for memory access
it uses memcpy() when unaligned memory write is tricky

Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874
2015-12-04 13:43:01 +00:00
Pascal Massimino
bfd3fc02df ~2x faster SSE2 RGB24toY, BGR24toY, ARGBToY|UV
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)

Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV

Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
2015-11-06 15:02:01 -08:00
Pascal Massimino
52fdbdfe66 extract some RGB24 to Luma conversion function from enc/ to dsp/
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.

ConvertRowToY() should go into dsp/ too, at some point.

Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
2015-10-30 00:28:11 -07:00
Pascal Massimino
fa8927efe4 Move ARGB->YUV functions from dec/vp8l.c to dsp/yuv.c
also switch to using ExtractAlpha() instead of hard-coding the loop.

The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.

Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
2015-10-28 01:45:08 -07:00
Johann
d26d9def80 Use __has_builtin to check clang support
Older versions of Xcode with clang reporting versions 4.[012] and 5.0
did not include support for __builtin_bswap16. Checking in this manner
avoids using brittle version checks.

Matches a change to libvpx:
https://chromium-review.googlesource.com/305573
to fix:
https://code.google.com/p/webm/issues/detail?id=1082

Change-Id: I23ea466ee1b53b12cd3fb45f65a2186c8dda95a1
2015-10-14 17:48:08 -07:00
Pascal Massimino
9ba1894b9b rescaler: simplify ImportRow logic
incorporates the loop over 'channel' and removes one parameter

Change-Id: I4e3b33c111ca825fe96461583420413b17326409
2015-09-19 10:07:26 -07:00
Pascal Massimino
5ff0079ece fix rescaler vertical interpolation
* vertical expansion now uses bilinear interpolation
  * heavily assumes that the alpha plane is decoded in full, not row-by-row
  * split the RescalerExportRow and RescalerImportRow methods into Shrink
    and Expand variants.
  * MIPS implementation of ExportRowExpand is missing.

There's room for extra speed optim and code re-org, but let's keep that for later patches.

addresses https://code.google.com/p/webp/issues/detail?id=254

Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
2015-09-18 17:32:11 -07:00
skal
ac76801159 introduce FTransform2 to perform two transforms at a time.
FTransform goes from ~12.0% to 11.5% total CPU time.

Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
bf46d0acff fix mips2 build target
tested with mips1 and mips2; this should cover 3/4 as well.
fixes an ftbfs reported on the debian issue tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000

Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c
2015-05-13 10:36:22 -07:00
James Zern
b44eda3f60 dsp: add DSP_INIT_STUB
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning

Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
James Zern
182497993b dsp: s/VP8LSetHistogramData/VP8SetHistogramData/
this function is for lossy encoding; the VP8L prefix is used by lossless

Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb
2015-03-24 18:27:41 -07:00
Pascal Massimino
e9570dd987 stub for SSE4.1 support.
Change-Id: I0c845a98d2871cc8907ff7b914bab7747a92c7ed
2015-03-20 00:26:35 -07:00
James Zern
cabf4bd2bc dsp: add sse4.1 detection
bit 19 in ecx
no targets or code

https://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets

Change-Id: Ie61b004dd5b6a3639b30bd9d2a09e6d7359b8040
2015-03-18 19:16:47 -07:00
Sam Clegg
ac4f5784a0 Disable NEON code on Native Client
The NEON assember in libwebp has not yet been ported
to Native Client. This changes disables it.
Related issue:
https://code.google.com/p/nativeclient/issues/detail?id=3205

Change-Id: I200291db7aa79d40c1f10cff7622c9b8599e6886
2015-03-10 16:17:25 -07:00
James Zern
b969f5dfac dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
the attribute is only necessary in one location; remove it from the
prototypes.

Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
James Zern
e15560107c move some cost tables from enc/ to dsp/
removes circular dependency between dsp and enc.

since:
a987fae MIPS: dspr2: added optimization for function GetResidualCost

Change-Id: Ifeb8fc02de89e2ba982ed7ffacd925d649bfec3c
2015-02-11 16:10:06 -08:00
James Zern
e3d9771aa1 VP8EncDspCostInit*: add missing TSan annotations
Change-Id: I4cdb84bc8c9a8c6aa34b5773c8fb69e5810a9809
2015-02-09 22:39:14 -08:00
Pascal Massimino
a987faedfa MIPS: dspr2: added optimization for function GetResidualCost
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file

Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
2015-02-07 02:13:26 -08:00
Pascal Massimino
774d4cb758 make VP8PredLuma16[] array non-const
Change-Id: I0ce7e4e847f9fffefb6544db9636068442a2d264
2015-02-04 17:00:22 +01:00
Pascal Massimino
7afdaf8496 Alpha coding: reorganize the filter/unfiltering code
Move the filtering code to their own dsp/ spot
New function: VP8FiltersInit()

Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Pascal Massimino
d581ba40ba follow-up: clean up WebPRescalerXXX dsp function
by removing redundant RFIX macros and using a plain-C fallback.

Change-Id: I52436c672bf20780b6fe3bcf43fe73e1abac10ff
2015-01-12 15:26:55 -08:00
James Zern
f8740f0d6c dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
for consistency with other defines shared across modules

Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
Pascal Massimino
ab66becaae introduce a separate WebPRescalerDspInit to initialize pointers
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.

Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
James Zern
4f43d38ca8 enable NEON for Windows ARM builds
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
pascal massimino
c6d3292738 argb_sse2: cosmetics
clarify some variable names in PackARGB() + add some comments

Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
Pascal Massimino
72d573f693 simplify the PackARGB signature
Change-Id: I51570e362126b2681f93211a4f59a3fedb5fd4b5
2015-01-05 02:10:04 -08:00
Djordje Pesut
7ce8788b06 MIPS: dspr2: added optimization for function MakeARGB32
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row

Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
Pascal Massimino
bad775715a simplify the Histogram struct, to only store max_value and last_nz
we don't need to store the whole distribution in order to compute the alpha

Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.

Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
Pascal Massimino
66ad372500 factorize BPS definition in dsp.h and add VP8Copy16x8
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
James Zern
e18571393d dsp: initialize VP8PredChroma8 in VP8DspInit()
the table becomes non-const to allow for platform-specific optimizations

Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
Pascal Massimino
97c76f1f30 make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
also convert 'type *dst' to 'type* dst'

Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
James Zern
d1c359ef29 fix shared object build with -fvisibility=hidden
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples

Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
Pascal Massimino
80247291c6 mark some init function as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
Pascal Massimino
2d9b0a4472 add WebPDispatchAlphaToGreen() to dsp
SSE2 version is 2.1x faster

This is used to transfer the alpha plane to green channel before lossless compression.

Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
Pascal Massimino
cddd334050 Add a WebPExtractAlpha function to dsp
This is the opposite of WebPDispatchAlpha

+ Implement the SSE2 version

Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c
2014-09-15 08:12:03 +02:00
Pascal Massimino
a6bb9b17d8 SSE2 for inverse Mult(ARGB)Row and ApplyAlphaMultiply
Change-Id: Iab5c0e4a4d2b31f86736a9b277e62b6e28c3d2b4
WebPMultRow: ~7x faster
WebPMultARGBRow: ~3x faster
ApplyAlphaMultiply: 60% faster
2014-09-11 07:58:42 +02:00
James Zern
8323a9038d dsp.h: collect gcc/clang version test macros
endian_inl.h already relies on dsp.h, grab the definitions from there.

Change-Id: I445f7d0631723043c55da1070498f89965bec7b1
2014-08-27 19:33:09 -07:00