Commit Graph

133 Commits

Author SHA1 Message Date
Pascal Massimino
c1cb1933d5 disable NEON for arm64 platform
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.

more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/

Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
2014-01-23 12:35:01 -08:00
skal
66a32af5e1 Merge "NEON speed up" 2013-12-18 14:17:19 -08:00
skal
26d842eb8f NEON speed up
add TransformDC special case, and make the switch function inlined.
Recovers a few of the CPU lost during the addition of TransformAC3
(only on ARM)

Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6
2013-12-18 22:32:58 +01:00
James Zern
605a712701 simplify __cplusplus ifdef
drop c_plusplus which is from a quite ancient pre-standard compiler

Change-Id: I9e357b3292a6b52b14c2641ba11f4f872c04b7fb
2013-12-16 20:16:02 -08:00
James Zern
5227d99146 drop: ifdef __cplusplus checks from C files
the prototypes are already marked in the headers

Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094
2013-12-13 11:42:13 -08:00
skal
73b731fb42 introduce a special quantization function for WHT
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.

Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
2013-12-10 14:21:47 +01:00
skal
41c0cc4b9a Make Forward WHT transform use 32bit fixed-point calculation
This is in preparation for a future change where input will
be 16bit instead of 12bit

No speed diff observed.

Note that the NEON implementation was using 32bit calc already.

Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4
2013-12-10 06:10:52 +01:00
skal
d513bb62bc * fix off-by-one zthresh calculation
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this

Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)

Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
2013-12-09 13:56:04 +01:00
James Zern
4931c3294b cosmetics: fix some typos
Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c
2013-11-26 19:21:14 -08:00
Pascal Massimino
596a6d73ce make use of 'extern' consistent in function declarations
Change-Id: I18e050db3111e52acfe97da09cdf1860f3e15936
2013-10-30 03:23:21 -07:00
skal
0b2b05049f Use deterministic random-dithering during RGB->YUV conversion
-> helps debanding (sky, gradients, etc.)

This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.

Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.

Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
2013-10-17 22:36:49 +02:00
James Zern
dca8a4d315 Merge "NEON/simple loopfilter: avoid q4-q7 registers" 2013-10-10 01:58:41 -07:00
pascal massimino
9e84d901d2 Merge "NEON/TransformWHT: avoid q4-q7 registers" 2013-10-09 09:32:59 -07:00
James Zern
fc10249b36 NEON/simple loopfilter: avoid q4-q7 registers
very tiny speed improvement

Change-Id: I3024f120feb7275ce20bfff21af31ea8650a5a03
2013-10-09 18:17:31 +02:00
James Zern
2f09d63e30 NEON/TransformWHT: avoid q4-q7 registers
very tiny speed improvement

Change-Id: Iace78b9038af412d0a794845ff19f54afa88ccdc
2013-10-09 18:17:23 +02:00
skal
f9bbc2a034 Special-case sparse transform
If the number of non-zero coeffs is <= 3, use a
simplified transform for luma.

Change-Id: I78a1252704228d21720d4bc1221252c84338d9c8
2013-10-08 22:05:38 +02:00
Pascal Massimino
f8398c9dab fix compile error on ARM/gcc
use of uint8_t type was causing error like:
src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349

with gcc 4.6.3

Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da
2013-09-06 03:07:28 -07:00
James Zern
b25a6fbfdc yuv.h: fix indent
Change-Id: I0c0bd5f7f71bc44e10134bd4f788769ec25cec1f
2013-08-19 18:06:15 -07:00
James Zern
388a7249c9 cosmetics: fix indent
Change-Id: Iad0fce79886bed0d61ddf2510ce133a5355ebc1f
2013-08-19 17:51:04 -07:00
James Zern
4c7322c86f Merge "dsp: msvc compatibility" 2013-08-19 17:42:16 -07:00
skal
df6cebfa9e 5-7% faster SSE2 versions of YUV->RGB conversion functions
The C-version gets ~7-8% slower in order to match the SSE2
output exactly. The old (now off-by-1) code is kept under
the WEBP_YUV_USE_TABLE flag for reference.

(note that calc rounding precision is slightly better ~= +0.02dB)

on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking
the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610)

Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f
2013-08-19 17:05:58 -07:00
skal
ad6ac32d7c simplify upsampler calls: only allow 'bottom' to be NULL
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.

+ fix compilation in non-FANCY_UPSAMPLING mode

Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
2013-08-19 16:47:51 -07:00
James Zern
f358450feb dsp: msvc compatibility
intrin.h is available after VS2003

patch from the FreeImage project

Change-Id: I58a18a0db00e247f871d05e3ba99772704f0e079
2013-08-16 20:46:16 -07:00
Vikas Arora
e081f2f359 Pack code & extra_bits to Struct (VP8LPrefixCode).
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.

Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
2013-08-12 11:56:42 -07:00
Vikas Arora
69257f70df Create LUT for PrefixEncode.
This speeds up lossless compression by 5%.
Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d
2013-08-05 10:20:18 -07:00
Vikas Arora
8967b9f37e SSE2 for lossless decoding (critical) functions.
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.

Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
2013-06-27 11:42:45 -07:00
James Zern
d640614d54 update copyright text
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files

based on the discussion in:
https://codereview.chromium.org/12771026/

Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
2013-06-06 23:09:14 -07:00
skal
af358e68ed Merge "remove datatype qualifier for vmnv" 2013-05-23 06:12:06 -07:00
skal
3fe91635df remove datatype qualifier for vmnv
this fix is for clang (LLVM v4.2). gcc was fine.

Change-Id: Id4076cda84813f6f9548a01775b094cff22b4be9
2013-05-23 13:52:24 +02:00
James Zern
2ca83968ae webp/lossless: fix big endian BGRA output
Change-Id: I3d4b3d21f561cb526dbe7697a31ea847d3e8b2c1
2013-05-17 00:32:01 -07:00
skal
87a4fca25f remove some warnings:
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"

Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
2013-05-14 22:28:32 +02:00
Urvang Joshi
64c844863a Further reduce memory to decode lossy+alpha images
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.

Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
2013-05-13 16:24:49 -07:00
Vikas Arora
8eae188a62 WebP-Lossless encoding improvements.
Lossy (with Alpha) image compression gets 2.3X speedup.
Compressing lossless images is 20%-40% faster now.

Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6
2013-05-08 17:22:11 -07:00
skal
9c4ce971a8 Simplify forward-WHT + SSE2 version
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.

also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )

Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
2013-04-26 08:57:18 +02:00
Urvang Joshi
d52b405dbd Cosmetic fixes
Change-Id: Ia878115086edc3fdfee3f0ca76e5e74ea5906f21
(cherry picked from commit e9a7990bc5)
2013-03-29 15:49:15 -07:00
Pascal Massimino
6cb4a61825 misc style fix
(cherry picked from commit 142c46291e)

Conflicts:
	src/webp/format_constants.h

Change-Id: Ib764cb09bd78ab6e72c60f495d55b752ad4dbe4d
2013-03-29 15:49:05 -07:00
Pascal Massimino
3c8eb9a806 fix bad saturation order in QuantizeBlock
Saturation was done on input coeff, not quantized one.

This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.

addresses issue #145

Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
2013-03-25 14:53:29 -07:00
James Zern
9048494df6 build: fix install race on shared headers
subdirectories with more than one target can have the install targets
run in parallel with make -jN. group the shared headers in one place to
produce a common install target.

Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451
2013-03-16 13:29:49 -07:00
skal
126974b45b add LUT-free reference code for YUV->RGB conversion.
Reported to eventually be 4% on ARM
(see https://code.google.com/p/webp/issues/detail?id=134 for details)
We might activate it selectively later...

Output values is not bitwise the same as the LUT-based
version, but difference is only +/-1 at max.

Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07
2013-03-15 01:37:55 +01:00
skal
b7eaa85d6a inline VP8LFastLog2() and VP8LFastSLog2 for small values
larger values are still dealt with in the .cc

~5% faster encoding
Output size is slightly different (variably), because of
different floating-point calculation ordering.

Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4
2013-02-25 22:46:52 +01:00
skal
943386db4b disable SSE2 for now
(until proper run-time detection is ready)

Change-Id: I7b8eee52b23fce2f1612ad7d4ed603ffb02620a2
2013-02-20 08:20:47 +01:00
skal
9479fb7d2d lossless encoding speedup
* add SSE2 variant for lossless
* speed-up TransformColor calls using specialized TransformColorBlue/Red
* Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously.

This latter changes the output size a little bit.

Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89
2013-02-20 08:13:12 +01:00
skal
b7490f8553 introduce WEBP_REFERENCE_IMPLEMENTATION compile option
This flag will make the code use no uint64, no asm, and no fancy
trick, but instead aim at being as simple and straightforward as
possible.
Main use is to help emscripten generate proper JS code.
More code needs to be simplified later.

Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are the typical timing for decoding a large image:

        ARM7-a:
        dwebp_justify_32_neon Time to decode picture: 3.280s
        dwebp_justify_24_neon Time to decode picture: 2.640s
        dwebp_justify_16_neon Time to decode picture: 2.723s
        dwebp_justify_8_neon Time to decode picture: 2.802s
        dwebp_justify_32 Time to decode picture: 4.264s
        dwebp_justify_24 Time to decode picture: 3.696s
        dwebp_justify_16 Time to decode picture: 3.779s
        dwebp_justify_8 Time to decode picture: 3.834s
        dwebp_32_neon Time to decode picture: 4.010s
        dwebp_24_neon Time to decode picture: 2.725s
        dwebp_16_neon Time to decode picture: 2.852s
        dwebp_8_neon Time to decode picture: 2.778s
        dwebp_32 Time to decode picture: 4.587s
        dwebp_24 Time to decode picture: 3.800s
        dwebp_16 Time to decode picture: 3.902s
        dwebp_8 Time to decode picture: 3.815s
        REFERENCE (HEAD) Time to decode picture: 3.818s

        x86_64:
        dwebp_justify_32 Time to decode picture: 0.473s
        dwebp_justify_24 Time to decode picture: 0.434s
        dwebp_justify_16 Time to decode picture: 0.450s
        dwebp_justify_8 Time to decode picture: 0.467s
        dwebp_32 Time to decode picture: 0.474s
        dwebp_24 Time to decode picture: 0.468s
        dwebp_16 Time to decode picture: 0.468s
        dwebp_8 Time to decode picture: 0.481s
        REFERENCE (HEAD) Time to decode picture: 0.436s

        i386:
        dwebp_justify_32 Time to decode picture: 0.723s
        dwebp_justify_24 Time to decode picture: 0.618s
        dwebp_justify_16 Time to decode picture: 0.626s
        dwebp_justify_8 Time to decode picture: 0.651s
        dwebp_32 Time to decode picture: 0.744s
        dwebp_24 Time to decode picture: 0.627s
        dwebp_16 Time to decode picture: 0.642s
        dwebp_8 Time to decode picture: 0.642s

Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
2013-02-14 15:46:12 +01:00
pascal massimino
841a3ba5da Merge "Remove -Wshadow warnings." 2013-01-28 13:15:54 -08:00
Johann
6efed26865 Remove -Wshadow warnings.
Accidentally carried some bad habits from SSE code. Copy over fixes
from 0d19fbf

Change-Id: I763312c9d176c434ba41f95602bada1aeffebfb2
2013-01-28 12:29:12 -08:00
James Zern
27f8f7420e upsampling_neon.c: fix build
store values to a temporary variable before calling functions that take
vector types.
removes non-standard constructs such as:
  (uint8x8x2_t){{ a, b }}
fixing:
  src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3
arguments, but takes just 2

Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc
Fixes: https://code.google.com/p/webp/issues/detail?id=140
2013-01-25 19:42:50 -08:00
Mans Rullgard
090b708a00 NEON optimised yuv to rgb conversion
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.

4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.

see http://code.google.com/p/webp/issues/detail?id=134

Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
2013-01-25 15:46:40 -08:00
James Zern
be7c96b069 cosmetics: break a few long lines
Change-Id: I785763b974b4e7664ad8e9884251aa2d5274b456
2013-01-23 14:50:19 -08:00
Vikas Arora
0aeba52852 Provide an option to build decoder library.
When the config option '--enable-libwebpdecoder' is specified, the
lean decoder library 'libwebpdecoder' will be created in addition to
libwebp. Also dwebp binary will be linked to libwebpdecoder, if this
config option is specified.

Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a
2013-01-23 11:43:36 -08:00
James Zern
2b252a53a8 Merge "Provide option to swap bytes for 16 bit colormodes" 2013-01-22 15:00:39 -08:00