291 Commits

Author SHA1 Message Date
skal
32aeaf115a revamp VP8LColorSpaceTransform() a bit
-> remove the 'color_transform' multiplier, use more constants, etc.

This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)

Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
2014-02-05 10:37:06 +01:00
skal
926ff40229 WEBP_SWAP_16BIT_CSP: remove code dup
and prepare for potentially supporting both RGBA4444 and BARG4444

Change-Id: If5200289bc6338757a2ceb2df1a19de732595052
2014-02-03 13:24:33 -08:00
Vikas Arora
1d1cd3bbd6 Fix decode bug for rgbA_4444/RGBA_4444 color-modes.
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.

Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
2014-02-03 09:20:54 -08:00
James Zern
8934a622ac cosmetics: *_mips32.c
indent, comments, unused includes

Change-Id: Id0aabc52d05bb633f62aec022155ec27699cf5a0
2014-01-30 18:03:48 -08:00
Djordje Pesut
dd438c9a7d MIPS: MIPS32r1: Optimization of some simple point-sampling functions. PATCH [6/6]
Change-Id: I2020e71e9be5d17d4bf67cabf6c470ca43d5d838
2014-01-29 15:37:31 +01:00
Djordje Pesut
53520911c3 Added support for calling sampling functions via pointers.
Change-Id: Ic4d72e6b175a6b27bcdcc8cd97828e44ea93e743
2014-01-29 15:32:35 +01:00
Jovan Zelincevic
d16c69749b MIPS: MIPS32r1: Optimization of filter functions. PATCH [5/6]
Change-Id: Ifbd305e0514f09a587db02c3970f22190808503a
2014-01-29 15:03:45 +01:00
Djordje Pesut
04336fc7f8 MIPS: MIPS32r1: Optimization of function TransformOne. PATCH [4/6]
Change-Id: I5b98e2de940977538cf91bfa2128f4d1daa5c170
2014-01-28 20:10:43 -08:00
Pascal Massimino
c1cb1933d5 disable NEON for arm64 platform
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.

more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/

Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
2014-01-23 12:35:01 -08:00
skal
66a32af5e1 Merge "NEON speed up" 2013-12-18 14:17:19 -08:00
skal
26d842eb8f NEON speed up
add TransformDC special case, and make the switch function inlined.
Recovers a few of the CPU lost during the addition of TransformAC3
(only on ARM)

Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6
2013-12-18 22:32:58 +01:00
James Zern
605a712701 simplify __cplusplus ifdef
drop c_plusplus which is from a quite ancient pre-standard compiler

Change-Id: I9e357b3292a6b52b14c2641ba11f4f872c04b7fb
2013-12-16 20:16:02 -08:00
James Zern
5227d99146 drop: ifdef __cplusplus checks from C files
the prototypes are already marked in the headers

Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094
2013-12-13 11:42:13 -08:00
skal
73b731fb42 introduce a special quantization function for WHT
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.

Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
2013-12-10 14:21:47 +01:00
skal
41c0cc4b9a Make Forward WHT transform use 32bit fixed-point calculation
This is in preparation for a future change where input will
be 16bit instead of 12bit

No speed diff observed.

Note that the NEON implementation was using 32bit calc already.

Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4
2013-12-10 06:10:52 +01:00
skal
d513bb62bc * fix off-by-one zthresh calculation
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this

Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)

Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
2013-12-09 13:56:04 +01:00
James Zern
4931c3294b cosmetics: fix some typos
Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c
2013-11-26 19:21:14 -08:00
Pascal Massimino
596a6d73ce make use of 'extern' consistent in function declarations
Change-Id: I18e050db3111e52acfe97da09cdf1860f3e15936
2013-10-30 03:23:21 -07:00
skal
0b2b05049f Use deterministic random-dithering during RGB->YUV conversion
-> helps debanding (sky, gradients, etc.)

This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.

Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.

Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
2013-10-17 22:36:49 +02:00
James Zern
dca8a4d315 Merge "NEON/simple loopfilter: avoid q4-q7 registers" 2013-10-10 01:58:41 -07:00
pascal massimino
9e84d901d2 Merge "NEON/TransformWHT: avoid q4-q7 registers" 2013-10-09 09:32:59 -07:00
James Zern
fc10249b36 NEON/simple loopfilter: avoid q4-q7 registers
very tiny speed improvement

Change-Id: I3024f120feb7275ce20bfff21af31ea8650a5a03
2013-10-09 18:17:31 +02:00
James Zern
2f09d63e30 NEON/TransformWHT: avoid q4-q7 registers
very tiny speed improvement

Change-Id: Iace78b9038af412d0a794845ff19f54afa88ccdc
2013-10-09 18:17:23 +02:00
skal
f9bbc2a034 Special-case sparse transform
If the number of non-zero coeffs is <= 3, use a
simplified transform for luma.

Change-Id: I78a1252704228d21720d4bc1221252c84338d9c8
2013-10-08 22:05:38 +02:00
Pascal Massimino
f8398c9dab fix compile error on ARM/gcc
use of uint8_t type was causing error like:
src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349

with gcc 4.6.3

Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da
2013-09-06 03:07:28 -07:00
James Zern
b25a6fbfdc yuv.h: fix indent
Change-Id: I0c0bd5f7f71bc44e10134bd4f788769ec25cec1f
2013-08-19 18:06:15 -07:00
James Zern
388a7249c9 cosmetics: fix indent
Change-Id: Iad0fce79886bed0d61ddf2510ce133a5355ebc1f
2013-08-19 17:51:04 -07:00
James Zern
4c7322c86f Merge "dsp: msvc compatibility" 2013-08-19 17:42:16 -07:00
skal
df6cebfa9e 5-7% faster SSE2 versions of YUV->RGB conversion functions
The C-version gets ~7-8% slower in order to match the SSE2
output exactly. The old (now off-by-1) code is kept under
the WEBP_YUV_USE_TABLE flag for reference.

(note that calc rounding precision is slightly better ~= +0.02dB)

on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking
the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610)

Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f
2013-08-19 17:05:58 -07:00
skal
ad6ac32d7c simplify upsampler calls: only allow 'bottom' to be NULL
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.

+ fix compilation in non-FANCY_UPSAMPLING mode

Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
2013-08-19 16:47:51 -07:00
James Zern
f358450feb dsp: msvc compatibility
intrin.h is available after VS2003

patch from the FreeImage project

Change-Id: I58a18a0db00e247f871d05e3ba99772704f0e079
2013-08-16 20:46:16 -07:00
Vikas Arora
e081f2f359 Pack code & extra_bits to Struct (VP8LPrefixCode).
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.

Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
2013-08-12 11:56:42 -07:00
Vikas Arora
69257f70df Create LUT for PrefixEncode.
This speeds up lossless compression by 5%.
Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d
2013-08-05 10:20:18 -07:00
Vikas Arora
8967b9f37e SSE2 for lossless decoding (critical) functions.
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.

Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
2013-06-27 11:42:45 -07:00
James Zern
d640614d54 update copyright text
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files

based on the discussion in:
https://codereview.chromium.org/12771026/

Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
2013-06-06 23:09:14 -07:00
skal
af358e68ed Merge "remove datatype qualifier for vmnv" 2013-05-23 06:12:06 -07:00
skal
3fe91635df remove datatype qualifier for vmnv
this fix is for clang (LLVM v4.2). gcc was fine.

Change-Id: Id4076cda84813f6f9548a01775b094cff22b4be9
2013-05-23 13:52:24 +02:00
James Zern
2ca83968ae webp/lossless: fix big endian BGRA output
Change-Id: I3d4b3d21f561cb526dbe7697a31ea847d3e8b2c1
2013-05-17 00:32:01 -07:00
skal
87a4fca25f remove some warnings:
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"

Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
2013-05-14 22:28:32 +02:00
Urvang Joshi
64c844863a Further reduce memory to decode lossy+alpha images
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.

Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
2013-05-13 16:24:49 -07:00
Vikas Arora
8eae188a62 WebP-Lossless encoding improvements.
Lossy (with Alpha) image compression gets 2.3X speedup.
Compressing lossless images is 20%-40% faster now.

Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6
2013-05-08 17:22:11 -07:00
skal
9c4ce971a8 Simplify forward-WHT + SSE2 version
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.

also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )

Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
2013-04-26 08:57:18 +02:00
Urvang Joshi
d52b405dbd Cosmetic fixes
Change-Id: Ia878115086edc3fdfee3f0ca76e5e74ea5906f21
(cherry picked from commit e9a7990bc5a7698a29a9cac6d5447c16e9686c23)
2013-03-29 15:49:15 -07:00
Pascal Massimino
6cb4a61825 misc style fix
(cherry picked from commit 142c46291e2b8daaed2b7a9ef038f7eb39fbd503)

Conflicts:
	src/webp/format_constants.h

Change-Id: Ib764cb09bd78ab6e72c60f495d55b752ad4dbe4d
2013-03-29 15:49:05 -07:00
Pascal Massimino
3c8eb9a806 fix bad saturation order in QuantizeBlock
Saturation was done on input coeff, not quantized one.

This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.

addresses issue #145

Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
2013-03-25 14:53:29 -07:00
James Zern
9048494df6 build: fix install race on shared headers
subdirectories with more than one target can have the install targets
run in parallel with make -jN. group the shared headers in one place to
produce a common install target.

Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451
2013-03-16 13:29:49 -07:00
skal
126974b45b add LUT-free reference code for YUV->RGB conversion.
Reported to eventually be 4% on ARM
(see https://code.google.com/p/webp/issues/detail?id=134 for details)
We might activate it selectively later...

Output values is not bitwise the same as the LUT-based
version, but difference is only +/-1 at max.

Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07
2013-03-15 01:37:55 +01:00
skal
b7eaa85d6a inline VP8LFastLog2() and VP8LFastSLog2 for small values
larger values are still dealt with in the .cc

~5% faster encoding
Output size is slightly different (variably), because of
different floating-point calculation ordering.

Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4
2013-02-25 22:46:52 +01:00
skal
943386db4b disable SSE2 for now
(until proper run-time detection is ready)

Change-Id: I7b8eee52b23fce2f1612ad7d4ed603ffb02620a2
2013-02-20 08:20:47 +01:00
skal
9479fb7d2d lossless encoding speedup
* add SSE2 variant for lossless
* speed-up TransformColor calls using specialized TransformColorBlue/Red
* Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously.

This latter changes the output size a little bit.

Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89
2013-02-20 08:13:12 +01:00