libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-06-11 00:04:21 +02:00

Author	SHA1	Message	Date
skal	df6cebfa9e	5-7% faster SSE2 versions of YUV->RGB conversion functions The C-version gets ~7-8% slower in order to match the SSE2 output exactly. The old (now off-by-1) code is kept under the WEBP_YUV_USE_TABLE flag for reference. (note that calc rounding precision is slightly better ~= +0.02dB) on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610) Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f	2013-08-19 17:05:58 -07:00
skal	ad6ac32d7c	simplify upsampler calls: only allow 'bottom' to be NULL If 'top' was meant to be NULL, then bottom and top can be swapped. Logic is simpler. + fix compilation in non-FANCY_UPSAMPLING mode Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff	2013-08-19 16:47:51 -07:00
James Zern	f358450feb	dsp: msvc compatibility intrin.h is available after VS2003 patch from the FreeImage project Change-Id: I58a18a0db00e247f871d05e3ba99772704f0e079	2013-08-16 20:46:16 -07:00
Vikas Arora	e081f2f359	Pack code & extra_bits to Struct (VP8LPrefixCode). Also created variant VP8LPrefixEncodeBits that returns the code & extra_bits only. There's no impact on compression density and compression speed. Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c	2013-08-12 11:56:42 -07:00
Vikas Arora	69257f70df	Create LUT for PrefixEncode. This speeds up lossless compression by 5%. Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d	2013-08-05 10:20:18 -07:00
Vikas Arora	8967b9f37e	SSE2 for lossless decoding (critical) functions. This speeds up WebP lossless decoding by 20%. In particular, the photographic images get 35% speedup. Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc	2013-06-27 11:42:45 -07:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
skal	af358e68ed	Merge "remove datatype qualifier for vmnv"	2013-05-23 06:12:06 -07:00
skal	3fe91635df	remove datatype qualifier for vmnv this fix is for clang (LLVM v4.2). gcc was fine. Change-Id: Id4076cda84813f6f9548a01775b094cff22b4be9	2013-05-23 13:52:24 +02:00
James Zern	2ca83968ae	webp/lossless: fix big endian BGRA output Change-Id: I3d4b3d21f561cb526dbe7697a31ea847d3e8b2c1	2013-05-17 00:32:01 -07:00
skal	87a4fca25f	remove some warnings: * "declaration of ‘index’ shadows a global declaration [-Wshadow]" * "signed and unsigned type in conditional expression [-Wsign-compare]" Change-Id: I891182d919b18b6c84048486e0385027bd93b57d	2013-05-14 22:28:32 +02:00
Urvang Joshi	64c844863a	Further reduce memory to decode lossy+alpha images Earlier such images were using roughly 9 * width * height bytes for decoding. Now, they take 6 * width * height memory. Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7	2013-05-13 16:24:49 -07:00
Vikas Arora	8eae188a62	WebP-Lossless encoding improvements. Lossy (with Alpha) image compression gets 2.3X speedup. Compressing lossless images is 20%-40% faster now. Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6	2013-05-08 17:22:11 -07:00
skal	9c4ce971a8	Simplify forward-WHT + SSE2 version no precision loss observed speed is not really faster (0.5% at max), as forward-WHT isn't called often. also: replaced a "int << 3" (undefined by C-spec) by a "int * 8" ( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ ) Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401	2013-04-26 08:57:18 +02:00
Urvang Joshi	d52b405dbd	Cosmetic fixes Change-Id: Ia878115086edc3fdfee3f0ca76e5e74ea5906f21 (cherry picked from commit e9a7990bc5a7698a29a9cac6d5447c16e9686c23)	2013-03-29 15:49:15 -07:00
Pascal Massimino	6cb4a61825	misc style fix (cherry picked from commit 142c46291e2b8daaed2b7a9ef038f7eb39fbd503) Conflicts: src/webp/format_constants.h Change-Id: Ib764cb09bd78ab6e72c60f495d55b752ad4dbe4d	2013-03-29 15:49:05 -07:00
Pascal Massimino	3c8eb9a806	fix bad saturation order in QuantizeBlock Saturation was done on input coeff, not quantized one. This saturation is not absolutely needed: output of FTransformWHT is in range [-16320, 16321]. At quality 100, max quantization steps is 8, so the maximal range used by QuantizeBlock() is [-2040, 2040]. But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so it's better to leave this saturation check for now. addresses issue #145 Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28	2013-03-25 14:53:29 -07:00
James Zern	9048494df6	build: fix install race on shared headers subdirectories with more than one target can have the install targets run in parallel with make -jN. group the shared headers in one place to produce a common install target. Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451	2013-03-16 13:29:49 -07:00
skal	126974b45b	add LUT-free reference code for YUV->RGB conversion. Reported to eventually be 4% on ARM (see https://code.google.com/p/webp/issues/detail?id=134 for details) We might activate it selectively later... Output values is not bitwise the same as the LUT-based version, but difference is only +/-1 at max. Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07	2013-03-15 01:37:55 +01:00
skal	b7eaa85d6a	inline VP8LFastLog2() and VP8LFastSLog2 for small values larger values are still dealt with in the .cc ~5% faster encoding Output size is slightly different (variably), because of different floating-point calculation ordering. Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4	2013-02-25 22:46:52 +01:00
skal	943386db4b	disable SSE2 for now (until proper run-time detection is ready) Change-Id: I7b8eee52b23fce2f1612ad7d4ed603ffb02620a2	2013-02-20 08:20:47 +01:00
skal	9479fb7d2d	lossless encoding speedup * add SSE2 variant for lossless * speed-up TransformColor calls using specialized TransformColorBlue/Red * Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously. This latter changes the output size a little bit. Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89	2013-02-20 08:13:12 +01:00
skal	b7490f8553	introduce WEBP_REFERENCE_IMPLEMENTATION compile option This flag will make the code use no uint64, no asm, and no fancy trick, but instead aim at being as simple and straightforward as possible. Main use is to help emscripten generate proper JS code. More code needs to be simplified later. Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY Here are the typical timing for decoding a large image: ARM7-a: dwebp_justify_32_neon Time to decode picture: 3.280s dwebp_justify_24_neon Time to decode picture: 2.640s dwebp_justify_16_neon Time to decode picture: 2.723s dwebp_justify_8_neon Time to decode picture: 2.802s dwebp_justify_32 Time to decode picture: 4.264s dwebp_justify_24 Time to decode picture: 3.696s dwebp_justify_16 Time to decode picture: 3.779s dwebp_justify_8 Time to decode picture: 3.834s dwebp_32_neon Time to decode picture: 4.010s dwebp_24_neon Time to decode picture: 2.725s dwebp_16_neon Time to decode picture: 2.852s dwebp_8_neon Time to decode picture: 2.778s dwebp_32 Time to decode picture: 4.587s dwebp_24 Time to decode picture: 3.800s dwebp_16 Time to decode picture: 3.902s dwebp_8 Time to decode picture: 3.815s REFERENCE (HEAD) Time to decode picture: 3.818s x86_64: dwebp_justify_32 Time to decode picture: 0.473s dwebp_justify_24 Time to decode picture: 0.434s dwebp_justify_16 Time to decode picture: 0.450s dwebp_justify_8 Time to decode picture: 0.467s dwebp_32 Time to decode picture: 0.474s dwebp_24 Time to decode picture: 0.468s dwebp_16 Time to decode picture: 0.468s dwebp_8 Time to decode picture: 0.481s REFERENCE (HEAD) Time to decode picture: 0.436s i386: dwebp_justify_32 Time to decode picture: 0.723s dwebp_justify_24 Time to decode picture: 0.618s dwebp_justify_16 Time to decode picture: 0.626s dwebp_justify_8 Time to decode picture: 0.651s dwebp_32 Time to decode picture: 0.744s dwebp_24 Time to decode picture: 0.627s dwebp_16 Time to decode picture: 0.642s dwebp_8 Time to decode picture: 0.642s Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225	2013-02-14 15:46:12 +01:00
pascal massimino	841a3ba5da	Merge "Remove -Wshadow warnings."	2013-01-28 13:15:54 -08:00
Johann	6efed26865	Remove -Wshadow warnings. Accidentally carried some bad habits from SSE code. Copy over fixes from 0d19fbf Change-Id: I763312c9d176c434ba41f95602bada1aeffebfb2	2013-01-28 12:29:12 -08:00
James Zern	27f8f7420e	upsampling_neon.c: fix build store values to a temporary variable before calling functions that take vector types. removes non-standard constructs such as: (uint8x8x2_t){{ a, b }} fixing: src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3 arguments, but takes just 2 Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc Fixes: https://code.google.com/p/webp/issues/detail?id=140	2013-01-25 19:42:50 -08:00
Mans Rullgard	090b708a00	NEON optimised yuv to rgb conversion - along the lines of the SSE chroma upsampling. Total speedup is ~30%. 4% speed loss on YuvToRgbXX conversion using tables instead of 14-bit fixed precision. TODO(later): investigate, and compare to x86. see http://code.google.com/p/webp/issues/detail?id=134 Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009	2013-01-25 15:46:40 -08:00
James Zern	be7c96b069	cosmetics: break a few long lines Change-Id: I785763b974b4e7664ad8e9884251aa2d5274b456	2013-01-23 14:50:19 -08:00
Vikas Arora	0aeba52852	Provide an option to build decoder library. When the config option '--enable-libwebpdecoder' is specified, the lean decoder library 'libwebpdecoder' will be created in addition to libwebp. Also dwebp binary will be linked to libwebpdecoder, if this config option is specified. Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a	2013-01-23 11:43:36 -08:00
James Zern	2b252a53a8	Merge "Provide option to swap bytes for 16 bit colormodes"	2013-01-22 15:00:39 -08:00
Vikas Arora	94a48b4bc3	Provide option to swap bytes for 16 bit colormodes Color modes: RGB_565 & RGBA_4444 Change-Id: I571b6832b9848e5c4109272978f68623ca373383	2013-01-22 14:51:20 -08:00
skal	0d19fbff51	remove some -Wshadow warnings these are quite noisy, but it's not a big deal to remove them. Change-Id: I5deb08f10263feb77e2cc8a70be44ad4f725febd	2013-01-22 23:06:28 +01:00
skal	a556cb1ab4	Add details and reference about the YUV->RGB conversion Originated from the discussion at http://code.google.com/p/webp/issues/detail?id=134 Change-Id: I24384e2d2f5cf262d8632fc98303cba5e2d27224	2013-01-18 23:26:55 +01:00
pascal massimino	f4a97970de	Merge "Disto4x4 and Disto16x16 in NEON"	2013-01-17 11:07:20 -08:00
Johann	47b7b0ba47	Disto4x4 and Disto16x16 in NEON Change-Id: Ic6d9dbbc97b5025ce359332c33ae306d5d8925a5	2013-01-16 16:57:33 -08:00
vikas arora	e6409adc2e	Remove redundant include from dsp/lossless code. Change-Id: Ie8a497a486653f907c2a27f4027640a3308c6cc8	2013-01-10 15:09:19 -08:00
skal	d5838cd598	faster non-transposing SSE2 4x4 FTransform 1-2% faster. uses pmaddwd instead of transpose + pmullw. Can possibly be simplified further. Change-Id: I420e148816c4c6ab5e2080c9b1719dbbe6762d4e	2012-11-27 08:38:24 +01:00
skal	42c3b550ba	simplify the fwd transform -> remove two shifts Change-Id: Ibc55bca98588da30553a7870224ffd0e13d57f52	2012-11-15 09:51:35 +01:00
skal	118cb31270	Merge "add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case"	2012-11-15 00:07:44 -08:00
skal	e5c3b3f554	Simplify the texture evaluation Disto4x4() We don't need to use the exact forward transform, since it's only a rough evaluation. -> Removed some shifts and rounding constants. Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33	2012-11-14 07:49:31 +01:00
skal	35bfd4c08f	add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case + replace mm_set1_ps(0) by _mm_setzero_si128() Change-Id: I4601033c27466532373f5dabfaf349ce5e5039da	2012-11-14 06:16:49 +01:00
Urvang Joshi	7caab1d8f6	Some cosmetic/comment fixes. Change-Id: Id0613f84cc53fcbeceb913c835a262451687e27b	2012-11-09 10:46:38 -08:00
Pascal Massimino	22a0fd9d01	Add NEON version of FTransformWHT Contributed by Wayne Chen (datoudatou at gmail dot com) Change-Id: I007c21db4eeadbf82b89f0963256f965deda7d90	2012-11-08 08:28:51 -08:00
Pascal Massimino	e8b41ad136	add NEON asm version for WHT inverse transform Contributed by Wayne Chen (datoudatou at gmail dot com) + some header cleanup + remove the NEON suffix in static functions Change-Id: I75bf5e9b54cf5e1acc53764c6f081d61690f8e3d	2012-11-01 16:31:01 -07:00
Pascal Massimino	75e5f17e3b	ARM/NEON: 30% encoding speed-up (implements the backward and forward transforms in the encoder) original patch by Wayne Chen (datoudatou at gmail dot com) Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57	2012-10-31 14:00:20 -07:00
skal	f0360b4fcf	add EXPERIMENTAL code for YUV-JPEG colorspace This is mostly for experimentation! Need to define USE_YUVj flag in the code for that. suggested by benwreder at hotmail dot com Change-Id: If0b8e2c1863efc08ce097de6de20f4c7efc3f7e8	2012-10-19 20:15:58 +02:00
skal	5725cabac0	new segmentation algorithm fixes the 'blocky sky problem' (saturation problem: when luma was flat, chroma noise was taking over, resulting in random segment id assigned. When just using a common uniform segment was better). + side clean-up and readibility/experimentability MACRO'ization + added '-map 7' option Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71	2012-09-04 23:09:15 +02:00
Pascal Massimino	5c3a7231ca	Make InitSSE2() functions be empty on non-SSE2 platform this avoids the '.o has no symbols' warning messages Change-Id: I00cf527a9041a810d896bd24b993112af6276323	2012-08-28 11:02:38 -07:00
Pascal Massimino	7c6e60f4bd	make InitSSE2() functions be empty on non-SSE2 platform this avoids the '.o has no symbols' warning messages Change-Id: Idbaa02f5c2f7c632997a26f9507926922d191b6e	2012-08-27 23:40:47 -07:00
Pascal Massimino	c7eb45764f	make VP8DspInitNEON() public this will avoid the "dec_neon.o has no symbol" warning no change in binary size observed on linux. Change-Id: Ia27ae2bc5a03d714afa7e46671fdcf4cb630784d	2012-08-27 00:28:13 -07:00
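
The byte-swap option for 16-bit color modes listed above (commit 94a48b4bc3, "Color modes: RGB_565 & RGBA_4444") can be pictured with a small C sketch. This is not libwebp's code: the function name PackRGB565 and the swap_bytes parameter are hypothetical, and the bit layout (rrrrrggg gggbbbbb) is an assumption about a typical RGB_565 packing.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libwebp): pack 8-bit R, G, B samples
 * into two RGB_565 bytes. Assumed layout: first byte rrrrrggg, second byte
 * gggbbbbb. When swap_bytes is non-zero, the two bytes are written in the
 * opposite order, which is what a "swap bytes" option would amount to. */
void PackRGB565(uint8_t r, uint8_t g, uint8_t b, int swap_bytes,
                uint8_t out[2]) {
  /* Top 5 bits of red, then top 3 bits of green. */
  const uint8_t rg = (uint8_t)((r & 0xf8) | (g >> 5));
  /* Next 3 bits of green, then top 5 bits of blue. */
  const uint8_t gb = (uint8_t)(((g << 3) & 0xe0) | (b >> 3));
  out[0] = swap_bytes ? gb : rg;
  out[1] = swap_bytes ? rg : gb;
}
```

Under these assumptions, pure red (255, 0, 0) packs to the bytes 0xf8 0x00 without swapping and to 0x00 0xf8 with swapping. Only the order of the two output bytes changes; the 5-6-5 bit split is the same in both cases.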


863 Commits