Commit Graph

126 Commits

Author SHA1 Message Date
James Zern
fab8f9cfcf cosmetics: normalize '*' association
we associate '*' with types rather than variables

Change-Id: Id93ed65272a8a88e604278693e3850649639e9b6
2019-07-26 01:04:09 -07:00
Vincent Rabaud
cbf82cc04d Remove AVX2 files.
There is only enc_avx2.c and we never managed to get
something fast enough.

Change-Id: I7465b5d8ccf47d9aa612173b8f80f96060cdb366
2018-10-16 14:12:03 +02:00
James Zern
de08d72741 cosmetics: normalize include guard comment
Change-Id: I0e08ec604aad8412cfe3d3670d773f4ae5650375
2018-08-22 14:46:53 -07:00
James Zern
d77bf512bd add WEBP_DSP_INIT / WEBP_DSP_INIT_FUNC
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.

Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
2018-04-17 11:45:34 +00:00
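
A minimal sketch of the kind of init wrapper d77bf512bd describes, assuming pthreads is available. All names here (SKETCH_DSP_INIT_FUNC, g_cpu_info, SketchInit) are hypothetical stand-ins, not the library's actual macros or symbols. The body runs once under a mutex and runs again only if the CPU-info hook has been swapped since the last call:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the library's actual declarations. */
typedef int (*CpuInfoFunc)(int feature);
static int DefaultCpuInfo(int feature) { (void)feature; return 0; }
static CpuInfoFunc g_cpu_info = DefaultCpuInfo;   /* may be swapped in tests */

/* Defines `void name(void)`, which runs `name_body()` on the first call and
 * again whenever g_cpu_info differs from the value seen on the previous run. */
#define SKETCH_DSP_INIT_FUNC(name)                                   \
  static void name##_body(void);                                     \
  void name(void) {                                                  \
    static CpuInfoFunc last_used = NULL;                             \
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;         \
    if (pthread_mutex_lock(&lock) != 0) return;                      \
    if (last_used != g_cpu_info) name##_body();                      \
    last_used = g_cpu_info;                                          \
    (void)pthread_mutex_unlock(&lock);                               \
  }                                                                  \
  static void name##_body(void)

static int g_init_count = 0;   /* only here to make the behavior observable */

SKETCH_DSP_INIT_FUNC(SketchInit) {
  ++g_init_count;   /* a real body would select function pointers here */
}
```

The statically initialized mutex needs no teardown. The commit message notes that Windows is left as is because a critical section or mutex would leak there.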
Pascal Massimino
3b07d32712 Import,RGBA: fix for BigEndian import
+ simplification of the logic

Change-Id: Ia20ce844793ed35ea03a17cef45838f3d0ae4afa
2018-02-17 13:07:58 -08:00
James Zern
b299c47eac add WEBP_REDUCE_SIZE
remove auto-filter (-af) support and make WebPPictureCopy,
WebPPictureIsView, WebPPictureView, WebPPictureCrop, and
WebPPictureRescale noops.

Change-Id: If39d512cc268a0015298a1138dbc94feb86575e5
2017-11-22 17:35:39 -08:00
James Zern
eab5bab74f add WEBP_DISABLE_STATS
use to make WebPPictureDistortion & WebPPlaneDistortion noops and
clear some ssim code.

Change-Id: I9b50b2318b7a114632e5a237a4002f64e95afbbc
2017-11-22 12:41:17 -08:00
Pascal Massimino
44a0ee3fa7 introduce WebPHasAlpha8b and WebPHasAlpha32b
Rewrote WebPPictureHasTransparency() to use them (even for argb).
This is 10% faster, for some reason.

SSE2 version should be straightforward.
Removes a TODO.

Change-Id: I7ad5848fc5e355e2df505dbcd5a0f42fb6cbab41
2017-11-20 15:20:29 +01:00
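
A rough sketch of what a pair of transparency checks like those named in 44a0ee3fa7 could look like in plain C. The function names, signatures and the "alpha byte comes first in each 4-byte pixel" layout are assumptions made for illustration, not the library's definitions:

```c
#include <stdint.h>

/* Returns 1 if any byte of an 8-bit alpha row is not fully opaque (0xff). */
static int SketchHasAlpha8b(const uint8_t* src, int length) {
  while (length-- > 0) {
    if (*src++ != 0xff) return 1;
  }
  return 0;
}

/* Same check for 4-byte pixels. `src` must point at the alpha byte of the
 * first pixel; `length` is the number of pixels. */
static int SketchHasAlpha32b(const uint8_t* src, int length) {
  int x;
  for (x = 0; length-- > 0; x += 4) {
    if (src[x] != 0xff) return 1;
  }
  return 0;
}
```

Both loops exit on the first non-opaque value, so fully transparent or mixed images return quickly.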
James Zern
b7971d0e22 dsp: avoid defining _C functions w/NEON builds
when targeting NEON, C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.

note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.

this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.

Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
2017-10-27 10:54:56 -07:00
James Zern
a439972175 WIP: list includes as descendants of the project dir
#include "(.|..)/..." -> #include "src/..."

Change-Id: I772880aa097a770722043c8a4393552ba38a89b6
2017-10-10 23:04:05 -07:00
James Zern
f78da3dea6 add LOCAL_CLANG_PREREQ and avoid WORK_AROUND_GCC w/3.8+
this results in a 15-20% speedup for lossy decoding on a N5/S6/CM1

BUG=webp:339

Change-Id: Icdeb84c3e0b8908147ac276b4d8f76c3d565b735
2017-09-19 20:59:49 -07:00
James Zern
6473d20b3e Merge "fix Android standalone toolchain build"
2017-08-04 18:25:21 +00:00
James Zern
c6d1db4b36 fix Android standalone toolchain build
add a check for cpu-features.h and rework some of the ifdef's around
android + neon. for android builds with cpu-features enabled the
*_neon.c files will still need to be flagged correctly (with e.g.,
.c.neon in Android.mk) to properly build them.

BUG=webp:353

Change-Id: I905ce305af0a204e560b915d8665093a3edaceb9
2017-08-01 22:59:03 -07:00
skal
663a6d9d2e unify the ALTERNATE_CODE flag usage
Pattern is now:
 #if !defined(FLAG)
 #define FLAG 0   // ALTERNATE_CODE
 #endif
...
 #if (FLAG == 1)
 ...
 #else
  ...
 #endif    // FLAG
...

Removed some unused code / flags:
  WEBP_YUV_USE_TABLE, WEBP_REFERENCE_IMPLEMENTATION,
  experimental code,  VP8YUVInit(), ...

BUG=webp:355

Change-Id: I98deb9189446a4cfd665c13ea8aa1ce6a308c63f
2017-08-01 20:49:29 -07:00
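
A concrete instance of the flag pattern from 663a6d9d2e, as a sketch. SKETCH_USE_TABLE and Square() are hypothetical names invented for this example:

```c
#if !defined(SKETCH_USE_TABLE)
#define SKETCH_USE_TABLE 0   /* ALTERNATE_CODE */
#endif

#if (SKETCH_USE_TABLE == 1)
/* Alternate path: table lookup. */
static const unsigned char kSquares[4] = { 0, 1, 4, 9 };
static int Square(int v) { return kSquares[v & 3]; }
#else
/* Default path: direct computation. */
static int Square(int v) { v &= 3; return v * v; }
#endif    /* SKETCH_USE_TABLE */
```

Because the flag is always defined to 0 or 1 before it is tested, the alternate path can be selected from the command line (for example with -DSKETCH_USE_TABLE=1) and the `#if` never evaluates an undefined macro.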
skal
c4568b47fd Rescaler: harmonize the suffix naming
BUG=webp:355

Change-Id: I7720502c62f96c780793d3d881eac7b3afae1418
2017-08-01 23:49:44 +00:00
Pascal Massimino
6cb13b0532 Merge "alpha_processing: harmonize the naming suffixes to be _C()"
2017-08-01 03:38:03 +00:00
James Zern
8e42ba4c80 simplify WEBP_EXTERN macro
including the type in the macro doesn't bring much benefit to ordering,
current platforms work with a prefix, this would be insufficient if the
attribute needed to follow the function prototype. this form makes it
easier to override on the command line.

BUG=webp:355

Change-Id: Iba41ec0bb319403054be0e899c4cc472dd932fd9
2017-07-31 18:27:52 -07:00
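
A sketch of the prefix-only export macro that 8e42ba4c80 describes. SKETCH_EXTERN and SketchGetVersion are hypothetical names, and the visibility attribute is one plausible default for GCC-compatible compilers, not necessarily the one the library uses:

```c
#ifndef SKETCH_EXTERN
# if defined(__GNUC__) && __GNUC__ >= 4
#  define SKETCH_EXTERN extern __attribute__((visibility("default")))
# else
#  define SKETCH_EXTERN extern
# endif
#endif

/* Prefix form: the macro precedes the whole prototype, return type included,
 * instead of wrapping the type as in MACRO(int) Func(void). */
SKETCH_EXTERN int SketchGetVersion(void);

int SketchGetVersion(void) { return 0x010203; }
```

The `#ifndef` guard is what makes the command-line override mentioned in the commit message possible: a build can pass its own definition of the macro with -D and the default is skipped.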
skal
dec5e4d330 alpha_processing: harmonize the naming suffixes to be _C()
BUG=webp:355

Change-Id: Iae8221cd34957764ead21aa46abfc320e5514a4b
2017-07-31 23:34:24 +00:00
James Zern
92982609bc dsp.h: fix -Wundef w/__mips_dsp_rev
Change-Id: I552a543c7b039774041b43ace75b0cbea566b119
2017-07-11 16:12:32 -07:00
Vincent Rabaud
8acb4942f7 Remove the argb* files.
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.

Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
2017-06-25 14:44:33 +02:00
Pascal Massimino
4105d565d3 disable WEBP_USE_XXX optimisations when EMSCRIPTEN is defined
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine these conditionals.

BUG=webp:261

Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
2017-02-08 15:44:20 +00:00
Pascal Massimino
79bf46f120 rename the pretentious SmartYUV into SharpYUV
Change-Id: Ifeeb9cb85896c5f3ba0cc1c2c821f8d00295f69e
2017-01-20 14:36:21 +01:00
Pascal Massimino
1c07a3c639 dsp: WebPExtractGreen function for alpha decompression
+ NEON implementation

Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c
2017-01-17 09:33:25 +00:00
Pascal Massimino
86bbd24552 add a kSlowSSSE3 feature for CPUInfo
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.

Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731

Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
2017-01-13 06:19:27 +00:00
Pascal Massimino
9ac063c37f add dsp functions for SmartYUV
+ SSE2 implementation

Change-Id: I5cfdb62d68b5a95899241a097d3a2f697fbc590e
2016-11-16 14:23:06 +00:00
Pascal Massimino
bfff0bf329 speed-up SSIM calculation
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixel instead of
printing a frame-level global SSIM value.

* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring

SSIM calculation is ~4x faster now.

Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
2016-09-14 16:15:43 +02:00
Pascal Massimino
50c3d7da9a refactor the PSNR / SSIM calculation code
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.

Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
2016-09-14 06:13:24 +00:00
Pascal Massimino
3884972e3f remove WEBP_FORCE_ALIGNED and use memcpy() instead.
BUG=webp:297

Change-Id: I89a08debec7bb1b3f411c897260ab1bb63f77df2
2016-08-17 20:16:03 -07:00
skal
5b60db5c9d FastMBAnalyze() for quick i16/i4 decision
The decision is based on the variance between DC values of each
sub-4x4 block. This heuristic is rather ok for predicting whether
the 2nd transform (intra-16) is going to help or not.
The decision threshold varies with quality (=quantization).

It's only used for -m 0 and -m 1, where no full RD-opt is performed.
It actually makes these modes noticeably faster, with an RD curve much
closer to that of the -m 2 mode.

Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8
2016-07-15 11:21:08 -07:00
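
A rough sketch of the heuristic outlined in 5b60db5c9d: compute one DC value per 4x4 sub-block of a 16x16 macroblock and compare the spread of those 16 values against a threshold. The function name, the threshold parameter and the direction of the final comparison (low spread favors intra-16) are assumptions; the actual FastMBAnalyze() may differ in all three:

```c
#include <stdint.h>

/* Returns 1 when the 16 per-4x4 DC values of a 16x16 block are close enough
 * that intra-16 looks promising, 0 to prefer intra-4.
 * `threshold` is a hypothetical tuning knob that would vary with quality. */
static int SketchPreferIntra16(const uint8_t* block, int stride,
                               uint32_t threshold) {
  uint32_t sum = 0, sum_sq = 0;
  int bx, by, x, y;
  for (by = 0; by < 16; by += 4) {
    for (bx = 0; bx < 16; bx += 4) {
      uint32_t dc = 0;   /* sum of the 16 pixels of this 4x4 sub-block */
      for (y = 0; y < 4; ++y) {
        for (x = 0; x < 4; ++x) dc += block[(by + y) * stride + bx + x];
      }
      sum += dc;
      sum_sq += dc * dc;
    }
  }
  /* 16 * sum_sq - sum^2 equals 256 times the variance of the 16 DC values;
   * with 8-bit pixels every term fits in 32 bits. */
  return (16u * sum_sq - sum * sum) <= threshold;
}
```

Working with the scaled variance avoids any division, which matters for a check that is meant to be cheaper than a full RD search.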
Parag Salasakar
e11da081f9 mips msa webp configuration
Change-Id: I886164d6d3d560b1249603d47391fddf20b5a3d4
2016-06-07 23:49:41 -07:00
Pascal Massimino
77f21c9c39 Move DitherCombine8x8 to dsp/dec.c
To be later optimized in SSE2

Change-Id: I0de9c89eb5166f3319bb4b0500150de271ecac05
2016-05-24 23:14:41 -07:00
James Zern
e15afbce5d dsp.h: fix ubsan macro name
copy and paste error in the previous commit, change
no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF ->
WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW

Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32
2016-05-13 11:09:57 -07:00
James Zern
e53c9ccb24 dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
for suppressing expected failures with -fsanitize=integer

Change-Id: I954cba45f0c96478b770ed7a6ac7491359cae075
2016-05-12 23:51:23 -07:00
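
A sketch of how a suppression macro like the one added in e53c9ccb24 can be built on clang's no_sanitize attribute. SKETCH_IGNORE_UNSIGNED_OVERFLOW and SketchHash are hypothetical names, and the guard conditions are an assumption rather than the ones used in dsp.h:

```c
#include <stdint.h>

#if defined(__clang__) && defined(__has_attribute)
# if __has_attribute(no_sanitize)
#  define SKETCH_IGNORE_UNSIGNED_OVERFLOW \
     __attribute__((no_sanitize("unsigned-integer-overflow")))
# endif
#endif
#if !defined(SKETCH_IGNORE_UNSIGNED_OVERFLOW)
# define SKETCH_IGNORE_UNSIGNED_OVERFLOW   /* expands to nothing elsewhere */
#endif

/* Multiplicative hash: wrap-around modulo 2^32 is intended here, so the
 * -fsanitize=integer report for this function would be a false alarm. */
static SKETCH_IGNORE_UNSIGNED_OVERFLOW uint32_t SketchHash(uint32_t v) {
  return v * 0x9e3779b1u;
}
```

Unsigned wrap-around is well defined in C, which is why the failures are "expected": the attribute only silences the sanitizer and does not change the generated arithmetic.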
James Zern
ea0be354a0 dsp.h: remove utils.h include
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.

Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
2016-05-11 23:17:21 -07:00
James Zern
ea24e026aa Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF"
2016-05-11 06:21:45 +00:00
James Zern
369e264e2e dsp.h: add WEBP_UBSAN_IGNORE_UNDEF
only defined when WEBP_FORCE_ALIGNED isn't. use it to quiet alignment
warnings in VP8LoadNewBytes().

Change-Id: I710a74bb9375285974e97022540551a3f4eda414
2016-05-10 22:45:13 -07:00
James Zern
74fb56fb5d add runtime NEON detection
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd

the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.

the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.

Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
2016-05-06 15:32:48 -07:00
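
A minimal sketch of a /proc/cpuinfo check of the kind 74fb56fb5d mentions, assuming the ARM Linux convention of a "Features" line listing a `neon` token. The function names and the fixed-size read buffer are illustrative choices, not the library's implementation:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a "Features" line in cpuinfo-style text lists `neon`. */
static int SketchTextHasNeon(const char* text) {
  const char* line = text;
  while (line != NULL && *line != '\0') {
    const char* const end = strchr(line, '\n');
    const size_t len = (end != NULL) ? (size_t)(end - line) : strlen(line);
    if (len >= 8 && strncmp(line, "Features", 8) == 0) {
      const char* p = line;
      while ((p = strstr(p, "neon")) != NULL && p < line + len) {
        const char before = p[-1];   /* safe: the line starts with "Features" */
        const char after = p[4];
        if ((before == ' ' || before == '\t' || before == ':') &&
            (after == ' ' || after == '\t' || after == '\n' || after == '\0')) {
          return 1;
        }
        p += 4;
      }
    }
    line = (end != NULL) ? end + 1 : NULL;
  }
  return 0;
}

/* Reads the start of /proc/cpuinfo (Linux only); returns 0 if unreadable.
 * Only the first 4095 bytes are examined, which is a simplification. */
static int SketchCpuHasNeon(void) {
  char buf[4096];
  size_t n;
  FILE* const f = fopen("/proc/cpuinfo", "r");
  if (f == NULL) return 0;
  n = fread(buf, 1, sizeof(buf) - 1, f);
  fclose(f);
  buf[n] = '\0';
  return SketchTextHasNeon(buf);
}
```

Keeping the text scan separate from the file read makes the parsing testable on any host, including ones without /proc.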
Pascal Massimino
2102ccd091 update the Unfilter API in dsp to process one row independently
This will allow working in-place on a cropped area later.

Also sped up the inverse gradient filtering in SSE2 (~4%)

Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
2016-04-21 08:10:45 +00:00
Pascal Massimino
a90edffb7e fix missing 'extern' for SSIM function in dsp/
Change-Id: Id8143120f01065dc088f4e90bd930f8ea7c3ae5a
2016-03-08 10:27:46 -08:00
Pascal Massimino
423ecaf484 move some SSIM-accumulation function for dsp/
This is in preparation for some SSE2 code.

And generally speaking, the whole SSIM code needs some
revamp: we're not averaging the SSIM value at each pixel
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.

Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
2016-03-08 07:50:09 +01:00
Vincent Rabaud
9960c31685 Remove an unnecessary transposition in TTransform.
Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e
2016-02-17 21:41:28 +01:00
Pascal Massimino
2c08aac81a introduce WebPMemToUint32 and WebPUint32ToMem for memory access
it uses memcpy() when unaligned memory write is tricky

Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874
2015-12-04 13:43:01 +00:00
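
A sketch of the memcpy()-based access that 2c08aac81a describes. The names are prefixed with Sketch to make clear they are illustrative, and only the always-memcpy variant is shown; the commit message implies the library may use a direct access where that is safe:

```c
#include <stdint.h>
#include <string.h>

/* Reads 32 bits from a possibly unaligned address, in native byte order. */
static uint32_t SketchMemToUint32(const uint8_t* ptr) {
  uint32_t v;
  memcpy(&v, ptr, sizeof(v));
  return v;
}

/* Writes 32 bits to a possibly unaligned address, in native byte order. */
static void SketchUint32ToMem(uint8_t* ptr, uint32_t val) {
  memcpy(ptr, &val, sizeof(val));
}
```

Compilers typically turn these fixed-size memcpy() calls into a single load or store on targets that allow unaligned access, so the portable form usually costs nothing there.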
Pascal Massimino
bfd3fc02df ~2x faster SSE2 RGB24toY, BGR24toY, ARGBToY|UV
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)

Also revamped the YUVA case to first accumulate R/G/B values into a 16b
temporary buffer, and then do the UV conversion.
-> New function: WebPConvertRGBA32ToUV

Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
2015-11-06 15:02:01 -08:00
Pascal Massimino
52fdbdfe66 extract some RGB24 to Luma conversion function from enc/ to dsp/
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.

ConvertRowToY() should go into dsp/ too, at some point.

Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
2015-10-30 00:28:11 -07:00
Pascal Massimino
fa8927efe4 Move ARGB->YUV functions from dec/vp8l.c to dsp/yuv.c
also switch to using ExtractAlpha() instead of hard-coding the loop.

The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.

Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
2015-10-28 01:45:08 -07:00
Johann
d26d9def80 Use __has_builtin to check clang support
Older versions of Xcode with clang reporting versions 4.[012] and 5.0
did not include support for __builtin_bswap16. Checking in this manner
avoids using brittle version checks.

Matches a change to libvpx:
https://chromium-review.googlesource.com/305573
to fix:
https://code.google.com/p/webm/issues/detail?id=1082

Change-Id: I23ea466ee1b53b12cd3fb45f65a2186c8dda95a1
2015-10-14 17:48:08 -07:00
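
A sketch of the feature check d26d9def80 describes: ask the compiler whether __builtin_bswap16 exists instead of comparing version numbers. SKETCH_HAVE_BUILTIN_BSWAP16 and SketchBSwap16 are hypothetical names, and the portable fallback is an addition for illustration:

```c
#include <stdint.h>

#if defined(__has_builtin)
# if __has_builtin(__builtin_bswap16)
#  define SKETCH_HAVE_BUILTIN_BSWAP16
# endif
#endif

/* Swaps the two bytes of a 16-bit value. */
static uint16_t SketchBSwap16(uint16_t x) {
#if defined(SKETCH_HAVE_BUILTIN_BSWAP16)
  return __builtin_bswap16(x);
#else
  return (uint16_t)((x >> 8) | ((x & 0xff) << 8));
#endif
}
```

The nested `#if` keeps the `__has_builtin(...)` expression away from preprocessors that do not know the operator, where it would otherwise be a syntax error.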
Pascal Massimino
9ba1894b9b rescaler: simplify ImportRow logic
incorporates the loop over 'channel' and removes one parameter

Change-Id: I4e3b33c111ca825fe96461583420413b17326409
2015-09-19 10:07:26 -07:00
Pascal Massimino
5ff0079ece fix rescaler vertical interpolation
* vertical expansion now uses bilinear interpolation
* heavily assumes that the alpha plane is decoded in full, not row-by-row
* split the RescalerExportRow and RescalerImportRow methods into Shrink
  and Expand variants.
* MIPS implementation of ExportRowExpand is missing.

There's room for extra speed optim and code re-org, but let's keep that for later patches.

addresses https://code.google.com/p/webp/issues/detail?id=254

Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
2015-09-18 17:32:11 -07:00
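
A minimal sketch of vertical bilinear interpolation between two rows, the technique 5ff0079ece names for vertical expansion. SketchBlendRows and its 1/256 fixed-point weight are assumptions for illustration; the library's rescaler keeps its own accumulators and is not reproduced here:

```c
#include <stdint.h>

/* Blends two source rows into one output row.
 * `frac` is the vertical position between them in 1/256 steps:
 * 0 selects `top`, 256 selects `bottom`. */
static void SketchBlendRows(const uint8_t* top, const uint8_t* bottom,
                            uint8_t* dst, int width, int frac) {
  int x;
  for (x = 0; x < width; ++x) {
    const int v = top[x] * (256 - frac) + bottom[x] * frac;
    dst[x] = (uint8_t)((v + 128) >> 8);   /* round to nearest */
  }
}
```

Each output row needs both neighboring source rows at once, which is consistent with the commit's note that the alpha plane is assumed to be decoded in full rather than row-by-row.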
skal
ac76801159 introduce FTransform2 to perform two transforms at a time.
FTransform goes from ~12.0% to 11.5% total CPU time.

Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
bf46d0acff fix mips2 build target
tested with mips1 and mips2; this should cover 3/4 as well.
fixes an ftbfs reported on the debian issue tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000

Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c
2015-05-13 10:36:22 -07:00