libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2026-02-02 00:17:56 +01:00

Author	SHA1	Message	Date
Pascal Massimino	c1cb1933d5	disable NEON for arm64 platform The registers and instructions are quite different to 32bit and the assembly code needs a rewrite. more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/ Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f	2014-01-23 12:35:01 -08:00
skal	66a32af5e1	Merge "NEON speed up"	2013-12-18 14:17:19 -08:00
skal	26d842eb8f	NEON speed up add TransformDC special case, and make the switch function inlined. Recovers a few of the CPU lost during the addition of TransformAC3 (only on ARM) Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6	2013-12-18 22:32:58 +01:00
James Zern	605a712701	simplify __cplusplus ifdef drop c_plusplus which is from a quite ancient pre-standard compiler Change-Id: I9e357b3292a6b52b14c2641ba11f4f872c04b7fb	2013-12-16 20:16:02 -08:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
skal	73b731fb42	introduce a special quantization function for WHT WHT is somewhat a special case: no sharpen[] bias, etc. Will be useful in a later CL when precision of input is changed. Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299	2013-12-10 14:21:47 +01:00
skal	41c0cc4b9a	Make Forward WHT transform use 32bit fixed-point calculation This is in preparation for a future change where input will be 16bit instead of 12bit No speed diff observed. Note that the NEON implementation was using 32bit calc already. Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4	2013-12-10 06:10:52 +01:00
skal	d513bb62bc	* fix off-by-one zthresh calculation * remove the sharpening for non luma-AC coeffs * adjust the bias a little bit to compensate for this Using the multiply-by-reciprocal doesn't always give the same result as the exact divide, given the QFIX fixed-point precision we use. -> removed few now-unneeded SSE2 instructions (and checked for bit-exactness using -noasm) Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24	2013-12-09 13:56:04 +01:00
James Zern	4931c3294b	cosmetics: fix some typos Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c	2013-11-26 19:21:14 -08:00
Pascal Massimino	596a6d73ce	make use of 'extern' consistent in function declarations Change-Id: I18e050db3111e52acfe97da09cdf1860f3e15936	2013-10-30 03:23:21 -07:00
skal	0b2b05049f	Use deterministic random-dithering during RGB->YUV conversion -> helps debanding (sky, gradients, etc.) This dithering can only be triggered when using -preset photo or -pre 2 (as a preprocessing). Everything is unchanged otherwise. Note that this change is likely to make the perceived PSNR/SSIM drop since we're altering the input internally. Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813	2013-10-17 22:36:49 +02:00
James Zern	dca8a4d315	Merge "NEON/simple loopfilter: avoid q4-q7 registers"	2013-10-10 01:58:41 -07:00
pascal massimino	9e84d901d2	Merge "NEON/TransformWHT: avoid q4-q7 registers"	2013-10-09 09:32:59 -07:00
James Zern	fc10249b36	NEON/simple loopfilter: avoid q4-q7 registers very tiny speed improvement Change-Id: I3024f120feb7275ce20bfff21af31ea8650a5a03	2013-10-09 18:17:31 +02:00
James Zern	2f09d63e30	NEON/TransformWHT: avoid q4-q7 registers very tiny speed improvement Change-Id: Iace78b9038af412d0a794845ff19f54afa88ccdc	2013-10-09 18:17:23 +02:00
skal	f9bbc2a034	Special-case sparse transform If the number of non-zero coeffs is <= 3, use a simplified transform for luma. Change-Id: I78a1252704228d21720d4bc1221252c84338d9c8	2013-10-08 22:05:38 +02:00
Pascal Massimino	f8398c9dab	fix compile error on ARM/gcc use of uint8_t type was causing error like: src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349 with gcc 4.6.3 Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da	2013-09-06 03:07:28 -07:00
James Zern	b25a6fbfdc	yuv.h: fix indent Change-Id: I0c0bd5f7f71bc44e10134bd4f788769ec25cec1f	2013-08-19 18:06:15 -07:00
James Zern	388a7249c9	cosmetics: fix indent Change-Id: Iad0fce79886bed0d61ddf2510ce133a5355ebc1f	2013-08-19 17:51:04 -07:00
James Zern	4c7322c86f	Merge "dsp: msvc compatibility"	2013-08-19 17:42:16 -07:00
skal	df6cebfa9e	5-7% faster SSE2 versions of YUV->RGB conversion functions The C-version gets ~7-8% slower in order to match the SSE2 output exactly. The old (now off-by-1) code is kept under the WEBP_YUV_USE_TABLE flag for reference. (note that calc rounding precision is slightly better ~= +0.02dB) on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610) Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f	2013-08-19 17:05:58 -07:00
skal	ad6ac32d7c	simplify upsampler calls: only allow 'bottom' to be NULL If 'top' was meant to be NULL, then bottom and top can be swapped. Logic is simpler. + fix compilation in non-FANCY_UPSAMPLING mode Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff	2013-08-19 16:47:51 -07:00
James Zern	f358450feb	dsp: msvc compatibility intrin.h is available after VS2003 patch from the FreeImage project Change-Id: I58a18a0db00e247f871d05e3ba99772704f0e079	2013-08-16 20:46:16 -07:00
Vikas Arora	e081f2f359	Pack code & extra_bits to Struct (VP8LPrefixCode). Also created variant VP8LPrefixEncodeBits that returns the code & extra_bits only. There's no impact on compression density and compression speed. Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c	2013-08-12 11:56:42 -07:00
Vikas Arora	69257f70df	Create LUT for PrefixEncode. This speeds up lossless compression by 5%. Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d	2013-08-05 10:20:18 -07:00
Vikas Arora	8967b9f37e	SSE2 for lossless decoding (critical) functions. This speeds up WebP lossless decoding by 20%. In particular, the photographic images get 35% speedup. Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc	2013-06-27 11:42:45 -07:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
skal	af358e68ed	Merge "remove datatype qualifier for vmnv"	2013-05-23 06:12:06 -07:00
skal	3fe91635df	remove datatype qualifier for vmnv this fix is for clang (LLVM v4.2). gcc was fine. Change-Id: Id4076cda84813f6f9548a01775b094cff22b4be9	2013-05-23 13:52:24 +02:00
James Zern	2ca83968ae	webp/lossless: fix big endian BGRA output Change-Id: I3d4b3d21f561cb526dbe7697a31ea847d3e8b2c1	2013-05-17 00:32:01 -07:00
skal	87a4fca25f	remove some warnings: * "declaration of ‘index’ shadows a global declaration [-Wshadow]" * "signed and unsigned type in conditional expression [-Wsign-compare]" Change-Id: I891182d919b18b6c84048486e0385027bd93b57d	2013-05-14 22:28:32 +02:00
Urvang Joshi	64c844863a	Further reduce memory to decode lossy+alpha images Earlier such images were using roughly 9 * width * height bytes for decoding. Now, they take 6 * width * height memory. Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7	2013-05-13 16:24:49 -07:00
Vikas Arora	8eae188a62	WebP-Lossless encoding improvements. Lossy (with Alpha) image compression gets 2.3X speedup. Compressing lossless images is 20%-40% faster now. Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6	2013-05-08 17:22:11 -07:00
skal	9c4ce971a8	Simplify forward-WHT + SSE2 version no precision loss observed speed is not really faster (0.5% at max), as forward-WHT isn't called often. also: replaced a "int << 3" (undefined by C-spec) by a "int * 8" ( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ ) Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401	2013-04-26 08:57:18 +02:00
Urvang Joshi	d52b405dbd	Cosmetic fixes Change-Id: Ia878115086edc3fdfee3f0ca76e5e74ea5906f21 (cherry picked from commit `e9a7990bc5`)	2013-03-29 15:49:15 -07:00
Pascal Massimino	6cb4a61825	misc style fix (cherry picked from commit `142c46291e`) Conflicts: src/webp/format_constants.h Change-Id: Ib764cb09bd78ab6e72c60f495d55b752ad4dbe4d	2013-03-29 15:49:05 -07:00
Pascal Massimino	3c8eb9a806	fix bad saturation order in QuantizeBlock Saturation was done on the input coeff, not the quantized one. This saturation is not absolutely needed: output of FTransformWHT is in range [-16320, 16321]. At quality 100, the max quantization step is 8, so the maximal range used by QuantizeBlock() is [-2040, 2040]. But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so it's better to leave this saturation check for now. addresses issue #145 Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28	2013-03-25 14:53:29 -07:00
James Zern	9048494df6	build: fix install race on shared headers subdirectories with more than one target can have the install targets run in parallel with make -jN. group the shared headers in one place to produce a common install target. Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451	2013-03-16 13:29:49 -07:00
skal	126974b45b	add LUT-free reference code for YUV->RGB conversion. Reported to eventually be 4% faster on ARM (see https://code.google.com/p/webp/issues/detail?id=134 for details) We might activate it selectively later... Output values are not bitwise the same as the LUT-based version, but the difference is only +/-1 at max. Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07	2013-03-15 01:37:55 +01:00
skal	b7eaa85d6a	inline VP8LFastLog2() and VP8LFastSLog2 for small values larger values are still dealt with in the .cc ~5% faster encoding Output size is slightly different (variably), because of different floating-point calculation ordering. Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4	2013-02-25 22:46:52 +01:00
skal	943386db4b	disable SSE2 for now (until proper run-time detection is ready) Change-Id: I7b8eee52b23fce2f1612ad7d4ed603ffb02620a2	2013-02-20 08:20:47 +01:00
skal	9479fb7d2d	lossless encoding speedup * add SSE2 variant for lossless * speed-up TransformColor calls using specialized TransformColorBlue/Red * Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously. This latter changes the output size a little bit. Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89	2013-02-20 08:13:12 +01:00
skal	b7490f8553	introduce WEBP_REFERENCE_IMPLEMENTATION compile option This flag will make the code use no uint64, no asm, and no fancy trick, but instead aim at being as simple and straightforward as possible. Main use is to help emscripten generate proper JS code. More code needs to be simplified later. Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY Here are the typical timings (time to decode picture) for decoding a large image: ARM7-a: dwebp_justify_32_neon 3.280s, dwebp_justify_24_neon 2.640s, dwebp_justify_16_neon 2.723s, dwebp_justify_8_neon 2.802s, dwebp_justify_32 4.264s, dwebp_justify_24 3.696s, dwebp_justify_16 3.779s, dwebp_justify_8 3.834s, dwebp_32_neon 4.010s, dwebp_24_neon 2.725s, dwebp_16_neon 2.852s, dwebp_8_neon 2.778s, dwebp_32 4.587s, dwebp_24 3.800s, dwebp_16 3.902s, dwebp_8 3.815s, REFERENCE (HEAD) 3.818s; x86_64: dwebp_justify_32 0.473s, dwebp_justify_24 0.434s, dwebp_justify_16 0.450s, dwebp_justify_8 0.467s, dwebp_32 0.474s, dwebp_24 0.468s, dwebp_16 0.468s, dwebp_8 0.481s, REFERENCE (HEAD) 0.436s; i386: dwebp_justify_32 0.723s, dwebp_justify_24 0.618s, dwebp_justify_16 0.626s, dwebp_justify_8 0.651s, dwebp_32 0.744s, dwebp_24 0.627s, dwebp_16 0.642s, dwebp_8 0.642s Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225	2013-02-14 15:46:12 +01:00
pascal massimino	841a3ba5da	Merge "Remove -Wshadow warnings."	2013-01-28 13:15:54 -08:00
Johann	6efed26865	Remove -Wshadow warnings. Accidentally carried some bad habits from SSE code. Copy over fixes from `0d19fbf` Change-Id: I763312c9d176c434ba41f95602bada1aeffebfb2	2013-01-28 12:29:12 -08:00
James Zern	27f8f7420e	upsampling_neon.c: fix build store values to a temporary variable before calling functions that take vector types. removes non-standard constructs such as: (uint8x8x2_t){{ a, b }} fixing: src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3 arguments, but takes just 2 Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc Fixes: https://code.google.com/p/webp/issues/detail?id=140	2013-01-25 19:42:50 -08:00
Mans Rullgard	090b708a00	NEON optimised yuv to rgb conversion - along the lines of the SSE chroma upsampling. Total speedup is ~30%. 4% speed loss on YuvToRgbXX conversion using tables instead of 14-bit fixed precision. TODO(later): investigate, and compare to x86. see http://code.google.com/p/webp/issues/detail?id=134 Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009	2013-01-25 15:46:40 -08:00
James Zern	be7c96b069	cosmetics: break a few long lines Change-Id: I785763b974b4e7664ad8e9884251aa2d5274b456	2013-01-23 14:50:19 -08:00
Vikas Arora	0aeba52852	Provide an option to build decoder library. When the config option '--enable-libwebpdecoder' is specified, the lean decoder library 'libwebpdecoder' will be created in addition to libwebp. Also dwebp binary will be linked to libwebpdecoder, if this config option is specified. Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a	2013-01-23 11:43:36 -08:00
James Zern	2b252a53a8	Merge "Provide option to swap bytes for 16 bit colormodes"	2013-01-22 15:00:39 -08:00


133 Commits
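Commits d513bb62bc and 3c8eb9a806 touch two details of coefficient quantization: multiplying by a fixed-point reciprocal does not always match an exact divide, and saturation must apply to the quantized level rather than the input coefficient. The sketch below illustrates both points under stated assumptions: QFIX = 17 fractional bits and a clamp bound of MAX_LEVEL = 2047 are assumed values, and Reciprocal() and QuantizeCoeff() are hypothetical helpers, not libwebp's QuantizeBlock().

```c
#include <stdint.h>

/* Assumptions for illustration only: 17 fractional bits for the
   reciprocal and a saturation bound of 2047. Names are hypothetical. */
#define QFIX 17
#define MAX_LEVEL 2047

/* Truncated fixed-point reciprocal of the quantizer step q (q >= 1). */
static uint32_t Reciprocal(int q) {
  return (UINT32_C(1) << QFIX) / (uint32_t)q;
}

/* Quantizes one coefficient by multiplying with the reciprocal instead of
   dividing, then saturates the quantized level (not the input coeff).
   Assumes |coeff| <= 16321 and bias < (1 << QFIX), so the 32-bit
   unsigned arithmetic cannot overflow. */
static int QuantizeCoeff(int coeff, int q, uint32_t bias) {
  const int negative = coeff < 0;
  const uint32_t magnitude = (uint32_t)(negative ? -coeff : coeff);
  int level = (int)((magnitude * Reciprocal(q) + bias) >> QFIX);
  if (level > MAX_LEVEL) level = MAX_LEVEL;  /* clamp the output level */
  return negative ? -level : level;
}
```

With bias = 0, QuantizeCoeff(3, 3, 0) returns 0 while the exact 3 / 3 is 1, because (1 << 17) / 3 truncates to 43690 and 3 * 43690 = 131070 falls just short of 1 << 17. With q = 8, the largest WHT output 16320 quantizes to exactly 2040, matching the range quoted in 3c8eb9a806.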
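Commits 69257f70df and e081f2f359 describe precomputing the lossless prefix encoding in a lookup table and packing the code and extra-bit count into a struct (VP8LPrefixCode). The following is a minimal, hypothetical sketch of that idea. It assumes the prefix-coding rule of the WebP lossless bitstream (values 1 to 4 map directly to codes 0 to 3; for larger values, the position of the highest set bit of value - 1 and the bit just below it select the code, and the remaining low bits are sent as extra bits). PrefixCode, PREFIX_LUT_SIZE, InitPrefixLUT(), PrefixEncode(), and PrefixDecode() are illustrative names, not libwebp's API.

```c
#include <stdint.h>

/* Hypothetical packed entry: prefix code plus number of extra bits. */
typedef struct {
  int8_t code;
  int8_t extra_bits;
} PrefixCode;

#define PREFIX_LUT_SIZE 512  /* assumed table size */

static PrefixCode g_prefix_lut[PREFIX_LUT_SIZE];

static int Log2Floor(uint32_t n) {  /* requires n > 0 */
  int log = 0;
  while (n >>= 1) ++log;
  return log;
}

/* Table-free computation for a value >= 1. */
static void PrefixEncodeNoLUT(int value, int* code, int* extra_bits,
                              int* extra_value) {
  const int d = value - 1;
  if (d < 2) {  /* values 1 and 2 map directly to codes 0 and 1 */
    *code = d;
    *extra_bits = 0;
    *extra_value = 0;
    return;
  }
  const int hb = Log2Floor((uint32_t)d);
  const int second = (d >> (hb - 1)) & 1;
  *extra_bits = hb - 1;
  *extra_value = d & ((1 << *extra_bits) - 1);
  *code = 2 * hb + second;
}

/* Fills the table once; entry 0 is unused. */
static void InitPrefixLUT(void) {
  for (int v = 1; v < PREFIX_LUT_SIZE; ++v) {
    int code, bits, extra;
    PrefixEncodeNoLUT(v, &code, &bits, &extra);
    g_prefix_lut[v].code = (int8_t)code;
    g_prefix_lut[v].extra_bits = (int8_t)bits;
  }
}

/* Fast path: table lookup for small values, direct computation otherwise. */
static void PrefixEncode(int value, int* code, int* extra_bits,
                         int* extra_value) {
  if (value < PREFIX_LUT_SIZE) {
    const PrefixCode pc = g_prefix_lut[value];
    *code = pc.code;
    *extra_bits = pc.extra_bits;
    *extra_value = (value - 1) & ((1 << pc.extra_bits) - 1);
  } else {
    PrefixEncodeNoLUT(value, code, extra_bits, extra_value);
  }
}

/* Decoder-side rule, used here only to check the round trip. */
static int PrefixDecode(int code, int extra_value) {
  if (code < 4) return code + 1;
  const int extra_bits = (code - 2) >> 1;
  const int offset = (2 + (code & 1)) << extra_bits;
  return offset + extra_value + 1;
}
```

Call InitPrefixLUT() once before encoding. The table stores only the code and the extra-bit count, since the extra-bit value is a cheap mask of value - 1.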
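Commits 126974b45b and 090b708a00 contrast table-based YUV->RGB conversion with a LUT-free, 14-bit fixed-point version. The sketch below shows what such a LUT-free per-pixel conversion can look like. It assumes BT.601 limited-range coefficients (255/219 for luma, 1.596, 0.391, 0.813, and 2.017 for chroma) scaled by 2^14. These constants and the names YuvToRgbFixed() and ClipToByte() are assumptions for illustration, not the code from either commit.

```c
#include <stdint.h>

/* Assumed 14-bit fixed point; hypothetical names. */
#define YUV_FIX 14
#define YUV_HALF (1 << (YUV_FIX - 1))

/* Converts a 2^14-scaled value to a byte, clamping to [0, 255]. */
static uint8_t ClipToByte(int v) {
  if (v < 0) return 0;
  v >>= YUV_FIX;
  return (uint8_t)(v > 255 ? 255 : v);
}

/* One pixel, no lookup tables: multiply-adds with assumed BT.601
   limited-range coefficients, rounded by adding YUV_HALF. */
static void YuvToRgbFixed(int y, int u, int v, uint8_t rgb[3]) {
  const int yy = 19077 * (y - 16);  /* (255 / 219) * 2^14 */
  const int uu = u - 128;
  const int vv = v - 128;
  rgb[0] = ClipToByte(yy + 26149 * vv + YUV_HALF);              /* R */
  rgb[1] = ClipToByte(yy - 6419 * uu - 13320 * vv + YUV_HALF);  /* G */
  rgb[2] = ClipToByte(yy + 33050 * uu + YUV_HALF);              /* B */
}
```

Under these assumptions, video black (Y = 16, U = V = 128) maps to (0, 0, 0) and video white (Y = 235) maps to (255, 255, 255). Replacing the multiplies with precomputed tables trades memory traffic for arithmetic, which is the trade-off the ARM timings in those commits measure.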