libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-13 06:24:27 +02:00

Author	SHA1	Message	Date
Pascal Massimino	bad775715a	simplify the Histogram struct, to only store max_value and last_nz we don't need to store the whole distribution in order to compute the alpha Later, we can incorporate the max_value / last_non_zero bookkeeping in SSE2 directly. Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15	2014-12-10 10:44:57 +01:00
James Zern	f85ec712b0	PrintReg: output to stderr allows use of '-o -' while testing Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191	2014-10-22 17:28:19 +02:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
Pascal Massimino	fabc65da32	1-3% faster encoding optimizing SSE_NxN functions got rid of the |a-b|^|b-a| method and went back to just (a-b)^2 instead. quality | size(bytes) after/before | time (ms) after/before Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0	2014-10-09 07:20:00 -07:00
skal	73d361dd5f	introduce VP8EncQuantize2Blocks to quantize two blocks at a time No speed diff for now. We might reorder better the instructions later, to speed things up. Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c	2014-08-25 20:21:42 -07:00
Pascal Massimino	1f3e5f1e60	remove unused 'shift' argument and QFIX2 define this will remove a warning about the shift amount not being an immediate (=constant). Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a Also: fix the NULL sharpen argument mismatch.	2014-06-26 00:44:12 -07:00
levytamar82	27bfeee43a	QuantizeBlock SSE2 Optimization: Another store to load forward block was detected coming from the function FTransform. FTransform save the output data 4 times 8 bytes each. when this data is later being loaded by the QuantizeBlock function in one chunk of 16 bytes that caused a store to load forward block. The fix was done in the FTransform function where each two consecutive 8 bytes were merged into one 16 bytes register and saved into the memory. This fix gives ~21% function level gain and 1.6% user level gain. Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992	2014-06-18 16:22:00 -07:00
skal	69fce2ea78	remove the special casing for res->first in VP8SetResidualCoeffs if res->first = 1, coeffs[0]=0 because of quant.c:749 and line added at quant.c:744 So, no need for the extra case. Going forward, TrellisQuantizeBlock() should also be calling a variant of VP8SetResidualCoeffs() to set the 'last' field. also: fixes a warning for win64 + slight speed-up Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0	2014-06-08 06:40:22 +02:00
James Zern	db4860b355	enc_sse2: prevent signed int overflow _mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed int. Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda	2014-06-04 23:18:22 -07:00
skal	6679f8996f	Optimize VP8SetResidualCoeffs. Brings down WebP lossy encoding timings by 5% Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489	2014-06-03 06:44:04 +02:00
skal	869eaf6c60	~30% encoding speedup: use NEON for QuantizeBlock() also revamped the signature to avoid having to pass the 'first' parameter Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5	2014-04-08 03:08:22 -07:00
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
skal	0235d5e44b	1-2% faster quantization in SSE2 C-version is a bit faster too (sub-1% faster on ARM) Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97	2014-02-13 15:55:30 -08:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
skal	73b731fb42	introduce a special quantization function for WHT WHT is somewhat a special case: no sharpen[] bias, etc. Will be useful in a later CL when precision of input is changed. Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299	2013-12-10 14:21:47 +01:00
skal	41c0cc4b9a	Make Forward WHT transform use 32bit fixed-point calculation This is in preparation for a future change where input will be 16bit instead of 12bit No speed diff observed. Note that the NEON implementation was using 32bit calc already. Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4	2013-12-10 06:10:52 +01:00
skal	d513bb62bc	* fix off-by-one zthresh calculation * remove the sharpening for non luma-AC coeffs * adjust the bias a little bit to compensate for this Using the multiply-by-reciprocal doesn't always give the same result as the exact divide, given the QFIX fixed-point precision we use. -> removed few now-unneeded SSE2 instructions (and checked for bit-exactness using -noasm) Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24	2013-12-09 13:56:04 +01:00
James Zern	4931c3294b	cosmetics: fix some typos Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c	2013-11-26 19:21:14 -08:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
skal	9c4ce971a8	Simplify forward-WHT + SSE2 version no precision loss observed speed is not really faster (0.5% at max), as forward-WHT isn't called often. also: replaced a "int << 3" (undefined by C-spec) by a "int * 8" ( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ ) Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401	2013-04-26 08:57:18 +02:00
Pascal Massimino	3c8eb9a806	fix bad saturation order in QuantizeBlock Saturation was done on input coeff, not quantized one. This saturation is not absolutely needed: output of FTransformWHT is in range [-16320, 16321]. At quality 100, max quantization steps is 8, so the maximal range used by QuantizeBlock() is [-2040, 2040]. But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so it's better to leave this saturation check for now. addresses issue #145 Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28	2013-03-25 14:53:29 -07:00
James Zern	be7c96b069	cosmetics: break a few long lines Change-Id: I785763b974b4e7664ad8e9884251aa2d5274b456	2013-01-23 14:50:19 -08:00
skal	d5838cd598	faster non-transposing SSE2 4x4 FTransform 1-2% faster. uses pmaddwd instead of transpose + pmullw. Can possibly be simplified further. Change-Id: I420e148816c4c6ab5e2080c9b1719dbbe6762d4e	2012-11-27 08:38:24 +01:00
skal	42c3b550ba	simplify the fwd transform -> remove two shifts Change-Id: Ibc55bca98588da30553a7870224ffd0e13d57f52	2012-11-15 09:51:35 +01:00
skal	118cb31270	Merge "add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case"	2012-11-15 00:07:44 -08:00
skal	e5c3b3f554	Simplify the texture evaluation Disto4x4() We don't need to use the exact forward transform, since it's only a rough evaluation. -> Removed some shifts and rounding constants. Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33	2012-11-14 07:49:31 +01:00
skal	35bfd4c08f	add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case + replace mm_set1_ps(0) by _mm_setzero_si128() Change-Id: I4601033c27466532373f5dabfaf349ce5e5039da	2012-11-14 06:16:49 +01:00
skal	5725cabac0	new segmentation algorithm fixes the 'blocky sky problem' (saturation problem: when luma was flat, chroma noise was taking over, resulting in random segment id assigned. When just using a common uniform segment was better). + side clean-up and readability/experimentability MACRO'ization + added '-map 7' option Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71	2012-09-04 23:09:15 +02:00
Pascal Massimino	7c6e60f4bd	make InitSSE2() functions be empty on non-SSE2 platform this avoids the '.o has no symbols' warning messages Change-Id: Idbaa02f5c2f7c632997a26f9507926922d191b6e	2012-08-27 23:40:47 -07:00
James Zern	80256b8567	enc_sse2 add missing stdlib.h include lost in fbd82b5; most platforms were getting it indirectly through emmintrin.h. Change-Id: I310f8bc8e82d63cfbde74c34cd21b72514a16a01	2012-04-19 15:47:58 -07:00
James Zern	ad1e163a0d	cosmetics: normalize copyright headers Change-Id: I5e2462b101e0447a4f15a1455c07131bc97a52dd	2012-01-06 14:49:06 -08:00
James Zern	f06817aaea	simplify checks for enabling SSE2 code also fixes build issues under vs11 which has a native arm compiler for windows 8 targets Change-Id: Id76c2deae9fc9de147d13ad0d34edffcb5a726c4	2011-12-20 17:41:55 -08:00
Pascal Massimino	e06ac0887f	create a separate libwebpdsp under src/dsp Gathers all DSP-related function (and SSE2 implementations). Clean-up some unwanted symbolic dependencies so that webp_encode, webp_decode and webp_dsp are truly independent libraries. + opportunistic clean-up: * remove unneeded VP8DspInitTables(), now integrated in VP8DspInit() * make consistent use of VP8GetCPUInfo() in the various DspInit() funcs * change OUT macro to DST	2011-09-13 12:29:44 -07:00
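
Note on d513bb62bc: the message observes that multiply-by-reciprocal does not always match the exact divide at a given fixed-point precision. The sketch below illustrates that effect only; it is not the libwebp code. The names QFIX_SKETCH, QuantizeExact and QuantizeReciprocal are hypothetical, the precision of 17 bits is an assumption made for illustration, and coeff is assumed to lie in [0, 32767] so the 32-bit product cannot overflow.

```c
#include <stdint.h>

/* Assumed fixed-point precision, chosen for illustration only. */
#define QFIX_SKETCH 17

/* Reference result: exact integer division of a non-negative coefficient. */
static int QuantizeExact(int coeff, int q) {
  return coeff / q;
}

/* Same operation using a precomputed, truncated reciprocal of q.
 * Assumes 0 <= coeff <= 32767 and q >= 1. */
static int QuantizeReciprocal(int coeff, int q) {
  const uint32_t iq = (1u << QFIX_SKETCH) / (uint32_t)q;  /* floor(2^17 / q) */
  return (int)(((uint32_t)coeff * iq) >> QFIX_SKETCH);
}
```

With q = 3 the truncated reciprocal is 43690, so a coefficient of 3 quantizes to 0 through the reciprocal path but to 1 through the exact divide. With q = 4 the reciprocal is exact and both paths agree.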

34 Commits
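
Note on db4860b355: the message points out that _mm_movemask_epi8 returns a 16-bit mask and that shifting it left by 16 can overflow a signed int. A minimal portable sketch of the usual remedy follows, assuming a 32-bit int. CombineMasks is a hypothetical helper, not a libwebp function, and its two int arguments stand in for two movemask results in [0, 0xffff].

```c
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit lane masks into one 32-bit mask.
 * lo and hi stand in for two _mm_movemask_epi8() results, each in [0, 0xffff]. */
static uint32_t CombineMasks(int lo, int hi) {
  /* Convert to unsigned before shifting: with a 32-bit int, (hi << 16) is
   * undefined behavior whenever bit 15 of hi is set, because the result
   * does not fit in a signed int. */
  return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

Calling CombineMasks(0x0001, 0x8000) yields 0x80000001, a value that the signed expression (hi << 16) | lo could not produce without invoking undefined behavior.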