libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-12 05:54:31 +02:00

Author	SHA1	Message	Date
James Zern	2719bb7e98	dec_neon: TransformAC3: work on packed vectors pack 2 rows in 1 vector similar to TransformDC Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d	2014-02-18 19:47:20 -08:00
James Zern	b7b60ca16c	dec_neon: add SaturateAndStore4x4 converts 2 s16 vectors to 2 u8 and store to uint8_t destination; TransformAC3 can reuse this after a rework Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0	2014-02-18 19:42:35 -08:00
James Zern	e02f16ef45	dec_neon.c: convert TransformDC to intrinsics no noticeable difference in performance Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025	2014-02-14 12:11:58 -08:00
skal	9cba963f9a	add missing file Change-Id: I17eab2fedc64ee3bba941a592ecef765fcd2b402	2014-02-13 21:56:19 -08:00
skal	8992ddb756	use static clipping tables (shared with mips32) removed abs1[] table along the way sub-1% speed-up, but still... Change-Id: I8c29a8a0285076cb3423b01ffae9fcc465da6a81	2014-02-13 19:32:59 -08:00
skal	0235d5e44b	1-2% faster quantization in SSE2 C-version is a bit faster too (sub-1% faster on ARM) Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97	2014-02-13 15:55:30 -08:00
James Zern	228e4877ab	dec_neon.c: add TransformAC3 based on SSE2 version Change-Id: Icc6782955253c98e83d5984153b596ef5f1c0d34	2014-02-08 12:47:54 -08:00
skal	32aeaf115a	revamp VP8LColorSpaceTransform() a bit -> remove the 'color_transform' multiplier, use more constants, etc. This function is particularly critical, mostly because of GetBestColorTransformForTile(). Loop is a bit faster (maybe ~1%) Change-Id: I90c96a3437cafb184773acef55c77e40c224388f	2014-02-05 10:37:06 +01:00
skal	926ff40229	WEBP_SWAP_16BIT_CSP: remove code dup and prepare for potentially supporting both RGBA4444 and BARG4444 Change-Id: If5200289bc6338757a2ceb2df1a19de732595052	2014-02-03 13:24:33 -08:00
Vikas Arora	1d1cd3bbd6	Fix decode bug for rgbA_4444/RGBA_4444 color-modes. The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits) data in the destination buffer and while pre-multiplying the alpha to RGB colors. Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba	2014-02-03 09:20:54 -08:00
James Zern	8934a622ac	cosmetics: *_mips32.c indent, comments, unused includes Change-Id: Id0aabc52d05bb633f62aec022155ec27699cf5a0	2014-01-30 18:03:48 -08:00
Djordje Pesut	dd438c9a7d	MIPS: MIPS32r1: Optimization of some simple point-sampling functions. PATCH [6/6] Change-Id: I2020e71e9be5d17d4bf67cabf6c470ca43d5d838	2014-01-29 15:37:31 +01:00
Djordje Pesut	53520911c3	Added support for calling sampling functions via pointers. Change-Id: Ic4d72e6b175a6b27bcdcc8cd97828e44ea93e743	2014-01-29 15:32:35 +01:00
Jovan Zelincevic	d16c69749b	MIPS: MIPS32r1: Optimization of filter functions. PATCH [5/6] Change-Id: Ifbd305e0514f09a587db02c3970f22190808503a	2014-01-29 15:03:45 +01:00
Djordje Pesut	04336fc7f8	MIPS: MIPS32r1: Optimization of function TransformOne. PATCH [4/6] Change-Id: I5b98e2de940977538cf91bfa2128f4d1daa5c170	2014-01-28 20:10:43 -08:00
Pascal Massimino	c1cb1933d5	disable NEON for arm64 platform The registers and instructions are quite different to 32bit and the assembly code needs a rewrite. more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/ Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f	2014-01-23 12:35:01 -08:00
skal	66a32af5e1	Merge "NEON speed up"	2013-12-18 14:17:19 -08:00
skal	26d842eb8f	NEON speed up add TransformDC special case, and make the switch function inlined. Recovers a few of the CPU lost during the addition of TransformAC3 (only on ARM) Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6	2013-12-18 22:32:58 +01:00
James Zern	605a712701	simplify __cplusplus ifdef drop c_plusplus which is from a quite ancient pre-standard compiler Change-Id: I9e357b3292a6b52b14c2641ba11f4f872c04b7fb	2013-12-16 20:16:02 -08:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
skal	73b731fb42	introduce a special quantization function for WHT WHT is somewhat a special case: no sharpen[] bias, etc. Will be useful in a later CL when precision of input is changed. Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299	2013-12-10 14:21:47 +01:00
skal	41c0cc4b9a	Make Forward WHT transform use 32bit fixed-point calculation This is in preparation for a future change where input will be 16bit instead of 12bit No speed diff observed. Note that the NEON implementation was using 32bit calc already. Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4	2013-12-10 06:10:52 +01:00
skal	d513bb62bc	fix off-by-one zthresh calculation; remove the sharpening for non luma-AC coeffs; adjust the bias a little bit to compensate for this. Using the multiply-by-reciprocal doesn't always give the same result as the exact divide, given the QFIX fixed-point precision we use. Removed a few now-unneeded SSE2 instructions (and checked for bit-exactness using -noasm). Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24	2013-12-09 13:56:04 +01:00
James Zern	4931c3294b	cosmetics: fix some typos Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c	2013-11-26 19:21:14 -08:00
Pascal Massimino	596a6d73ce	make use of 'extern' consistent in function declarations Change-Id: I18e050db3111e52acfe97da09cdf1860f3e15936	2013-10-30 03:23:21 -07:00
skal	0b2b05049f	Use deterministic random-dithering during RGB->YUV conversion -> helps debanding (sky, gradients, etc.) This dithering can only be triggered when using -preset photo or -pre 2 (as a preprocessing). Everything is unchanged otherwise. Note that this change is likely to make the perceived PSNR/SSIM drop since we're altering the input internally. Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813	2013-10-17 22:36:49 +02:00
James Zern	dca8a4d315	Merge "NEON/simple loopfilter: avoid q4-q7 registers"	2013-10-10 01:58:41 -07:00
pascal massimino	9e84d901d2	Merge "NEON/TransformWHT: avoid q4-q7 registers"	2013-10-09 09:32:59 -07:00
James Zern	fc10249b36	NEON/simple loopfilter: avoid q4-q7 registers very tiny speed improvement Change-Id: I3024f120feb7275ce20bfff21af31ea8650a5a03	2013-10-09 18:17:31 +02:00
James Zern	2f09d63e30	NEON/TransformWHT: avoid q4-q7 registers very tiny speed improvement Change-Id: Iace78b9038af412d0a794845ff19f54afa88ccdc	2013-10-09 18:17:23 +02:00
skal	f9bbc2a034	Special-case sparse transform If the number of non-zero coeffs is <= 3, use a simplified transform for luma. Change-Id: I78a1252704228d21720d4bc1221252c84338d9c8	2013-10-08 22:05:38 +02:00
Pascal Massimino	f8398c9dab	fix compile error on ARM/gcc use of uint8_t type was causing error like: src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349 with gcc 4.6.3 Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da	2013-09-06 03:07:28 -07:00
James Zern	b25a6fbfdc	yuv.h: fix indent Change-Id: I0c0bd5f7f71bc44e10134bd4f788769ec25cec1f	2013-08-19 18:06:15 -07:00
James Zern	388a7249c9	cosmetics: fix indent Change-Id: Iad0fce79886bed0d61ddf2510ce133a5355ebc1f	2013-08-19 17:51:04 -07:00
James Zern	4c7322c86f	Merge "dsp: msvc compatibility"	2013-08-19 17:42:16 -07:00
skal	df6cebfa9e	5-7% faster SSE2 versions of YUV->RGB conversion functions The C-version gets ~7-8% slower in order to match the SSE2 output exactly. The old (now off-by-1) code is kept under the WEBP_YUV_USE_TABLE flag for reference. (note that calc rounding precision is slightly better ~= +0.02dB) on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610) Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f	2013-08-19 17:05:58 -07:00
skal	ad6ac32d7c	simplify upsampler calls: only allow 'bottom' to be NULL If 'top' was meant to be NULL, then bottom and top can be swapped. Logic is simpler. + fix compilation in non-FANCY_UPSAMPLING mode Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff	2013-08-19 16:47:51 -07:00
James Zern	f358450feb	dsp: msvc compatibility intrin.h is available after VS2003 patch from the FreeImage project Change-Id: I58a18a0db00e247f871d05e3ba99772704f0e079	2013-08-16 20:46:16 -07:00
Vikas Arora	e081f2f359	Pack code & extra_bits to Struct (VP8LPrefixCode). Also created variant VP8LPrefixEncodeBits that returns the code & extra_bits only. There's no impact on compression density and compression speed. Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c	2013-08-12 11:56:42 -07:00
Vikas Arora	69257f70df	Create LUT for PrefixEncode. This speeds up lossless compression by 5%. Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d	2013-08-05 10:20:18 -07:00
Vikas Arora	8967b9f37e	SSE2 for lossless decoding (critical) functions. This speeds up WebP lossless decoding by 20%. In particular, the photographic images get 35% speedup. Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc	2013-06-27 11:42:45 -07:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
skal	af358e68ed	Merge "remove datatype qualifier for vmnv"	2013-05-23 06:12:06 -07:00
skal	3fe91635df	remove datatype qualifier for vmnv this fix is for clang (LLVM v4.2). gcc was fine. Change-Id: Id4076cda84813f6f9548a01775b094cff22b4be9	2013-05-23 13:52:24 +02:00
James Zern	2ca83968ae	webp/lossless: fix big endian BGRA output Change-Id: I3d4b3d21f561cb526dbe7697a31ea847d3e8b2c1	2013-05-17 00:32:01 -07:00
skal	87a4fca25f	remove some warnings: "declaration of ‘index’ shadows a global declaration [-Wshadow]" and "signed and unsigned type in conditional expression [-Wsign-compare]". Change-Id: I891182d919b18b6c84048486e0385027bd93b57d	2013-05-14 22:28:32 +02:00
Urvang Joshi	64c844863a	Further reduce memory to decode lossy+alpha images Earlier such images were using roughly 9 * width * height bytes for decoding. Now, they take 6 * width * height memory. Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7	2013-05-13 16:24:49 -07:00
Vikas Arora	8eae188a62	WebP-Lossless encoding improvements. Lossy (with Alpha) image compression gets 2.3X speedup. Compressing lossless images is 20%-40% faster now. Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6	2013-05-08 17:22:11 -07:00
skal	9c4ce971a8	Simplify forward-WHT + SSE2 version no precision loss observed speed is not really faster (0.5% at max), as forward-WHT isn't called often. also: replaced a "int << 3" (undefined by C-spec) by a "int * 8" ( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ ) Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401	2013-04-26 08:57:18 +02:00
Urvang Joshi	d52b405dbd	Cosmetic fixes Change-Id: Ia878115086edc3fdfee3f0ca76e5e74ea5906f21 (cherry picked from commit e9a7990bc5a7698a29a9cac6d5447c16e9686c23)	2013-03-29 15:49:15 -07:00
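
Commit d513bb62bc notes that multiply-by-reciprocal quantization does not always match an exact divide at the QFIX fixed-point precision. The sketch below illustrates that effect in isolation. It is not libwebp's quantizer: the function names are hypothetical, `QFIX` is assumed to be 17, and the rounding bias that a real quantizer adds is left out.

```c
#include <assert.h>

#define QFIX 17  /* assumed fixed-point precision, for illustration only */

/* Reference result: exact integer division of a non-negative coefficient. */
static int quantize_exact(int coeff, int q) {
  assert(coeff >= 0 && q > 0);
  return coeff / q;
}

/* Approximation: multiply by a precomputed reciprocal, then shift.
 * iq is truncated, so iq * q can be slightly less than 1 << QFIX. */
static int quantize_reciprocal(int coeff, int q) {
  assert(coeff >= 0 && q > 0);
  const int iq = (1 << QFIX) / q;
  return (int)(((long long)coeff * iq) >> QFIX);
}

/* Counts coefficients in [0, max_coeff] where the two methods disagree. */
static int count_mismatches(int q, int max_coeff) {
  int mismatches = 0;
  for (int c = 0; c <= max_coeff; ++c) {
    if (quantize_exact(c, q) != quantize_reciprocal(c, q)) ++mismatches;
  }
  return mismatches;
}
```

With a quantizer step of 3 the truncated reciprocal is 43690, so a coefficient of 3 maps to 0 instead of 1. A power-of-two step such as 4 has an exact reciprocal and shows no mismatch, which is why the discrepancy depends on the step value.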


198 Commits
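
Commit 8992ddb756 switches to static clipping tables. As a rough illustration of the general technique, the sketch below clamps a signed intermediate value to 8 bits with one table lookup instead of two comparisons. All names are hypothetical, the covered range of [-255, 510] is an assumption, and the table is filled at run time for brevity, whereas the commit refers to static tables.

```c
#include <assert.h>
#include <stdint.h>

#define CLIP_MIN (-255)  /* assumed lowest input value */
#define CLIP_MAX 510     /* assumed highest input value */

/* One entry per possible input value, offset so index 0 is CLIP_MIN. */
static uint8_t clip_table[CLIP_MAX - CLIP_MIN + 1];
static int clip_table_ready = 0;

/* Fills the table once: values below 0 map to 0, above 255 map to 255. */
static void init_clip_table(void) {
  for (int v = CLIP_MIN; v <= CLIP_MAX; ++v) {
    clip_table[v - CLIP_MIN] = (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v;
  }
  clip_table_ready = 1;
}

/* Clamps v to [0, 255] with a single lookup; v must be in the table range. */
static uint8_t clip_to_u8(int v) {
  assert(clip_table_ready);
  assert(v >= CLIP_MIN && v <= CLIP_MAX);
  return clip_table[v - CLIP_MIN];
}
```

Call `init_clip_table()` once before the first lookup. The trade-off is a small amount of memory in exchange for branch-free clamping in inner loops.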