libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-12 05:54:31 +02:00

Author	SHA1	Message	Date
Pascal Massimino	693bf74ec0	move the SSIM calculation code in ssim.c / ssim_sse2.c Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448	2017-02-21 12:53:35 +01:00
James Zern	668e1dd44f	src/{dec,enc,utils}: give filenames a unique suffix this avoids duplicates between these trees and dsp/, e.g., enc/tree.c, dec/tree.c, making pulling the whole library source tree into one target possible BUG=webp:279 Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775	2017-01-19 19:09:48 -08:00
Owen Rodley	67748b41db	Improve latency of FTransform2. Benchmarks from vrabaud@: 8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged images/lpcb corpus speed: unchanged, corpus size: unchanged images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged images/png_doc corpus speed: unchanged, corpus size: unchanged images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged images/jpeg_small corpus speed: unchanged, corpus size: unchanged images/icip_core1 corpus speed: unchanged, corpus size: unchanged images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged images/gif corpus speed: faster: -1.4 % , corpus size: unchanged images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged images/all corpus speed: faster: -1.5 % , corpus size: unchanged images/png_small corpus speed: unchanged, corpus size: unchanged images/png corpus speed: unchanged, corpus size: unchanged images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b	2016-11-24 07:09:50 +00:00
Pascal Massimino	ba843a92e7	fix some SSIM calculations * prevent 64bit overflow by controlling the 32b->64b conversions and preventively descaling by 8bit before the final multiply * adjust the threshold constants C1 and C2 to de-emphasis the dark areas * use a hat-like filter instead of box-filtering to avoid blockiness during averaging SSIM distortion calc is actually faster now in SSE2, because of the unrolling during the function rewrite. The C-version is quite slower because still un-optimized. Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90	2016-10-04 01:09:07 -07:00
Pascal Massimino	86a84b3598	2x faster SSE2 implementation of SSIMGet Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351	2016-09-23 23:23:06 -07:00
Pascal Massimino	50c3d7da9a	refactor the PSNR / SSIM calculation code -print_psnr is now much faster because it doesn't use the SSIM code. The SSIM speed-up and re-write will come later. Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823	2016-09-14 06:13:24 +00:00
skal	5b60db5c9d	FastMBAnalyze() for quick i16/i4 decision The decision is based on the variance between DC values of each sub-4x4 block. This heuristic is rather ok for predicting whether the 2nd transform (intra-16) is going to help or not. The decision threshold varies with quality (=quantization). It's only used for -m 0 and -m 1, where no full RD-opt is performed. It actually makes these modes quite faster, with RD curve much closer to the -m 2 mode. Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8	2016-07-15 11:21:08 -07:00
James Zern	6b53ca876e	cosmetics,(dec\|enc)_sse2.c: fix indent Change-Id: Ic3326136ddd325e911e96c2e5a7f06b3e1d60f66	2016-07-13 16:11:29 -07:00
Vincent Rabaud	7561d0c338	FTransformWHT optimization. Data is packed sooner in the functions. Change-Id: I018cfeca43f015ac755c7f209f9a97984cc0517b	2016-02-18 17:44:05 +01:00
Vincent Rabaud	8aa352b256	Merge "Remove an unnecessary transposition in TTransform."	2016-02-18 08:15:10 +00:00
Vincent Rabaud	9960c31685	Remove an unnecessary transposition in TTransform. Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e	2016-02-17 21:41:28 +01:00
Vincent Rabaud	6e36b51188	Small speedup in FTransform. It removes two _mm_unpacklo_epi32 and two _mm_sub_epi16. Change-Id: Icdf86259f796ba855d1cda5e9c0e99cb396cb351	2016-02-17 21:26:36 +01:00
Vincent Rabaud	bf2b4f114f	Regroup common SSE code + optimization. The transpose refactoring will help removing a transpose in a later CL. The horizontal add function helps removing a _mm_sad_epu8 in DC8uv => the latency/throughput went from 29/25 to 23/19 Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766	2016-02-16 18:34:34 +01:00
Pascal Massimino	2dee2966df	remove few obsolete TODO about aligned loads in SSE2 Change-Id: I3628602942ea2ce34dbcb85975d15afc1041f76c	2015-12-15 23:00:41 -08:00
Pascal Massimino	2c08aac81a	introduce WebPMemToUint32 and WebPUint32ToMem for memory access it uses memcpy() when unaligned memory write is tricky Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874	2015-12-04 13:43:01 +00:00
Pascal Massimino	25bf2ce5cc	fix some warning about unaligned 32b reads on x86 + gcc, the assembly code is the same. Change-Id: Ib0d23772ccf928f8d9ebcb0e157c0573d1f6a786	2015-10-28 15:51:55 -07:00
Pascal Massimino	0ae2c2e4b2	SSE2/SSE41: optimize SSE_16xN loops After several trials at re-organizing the main loop and accumulation scheme, this is apparently the faster variant. removed the SSE41 version, which is no longer faster now. For some reason, the AVX variant seems to benefit most for the change. Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5	2015-07-02 20:55:04 +02:00
Pascal Massimino	8ef9a63b45	SSE2: slightly faster FTransformWHT goes from 0.3% to 0.1% overall CPU time, but... Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84	2015-07-01 23:03:17 -07:00
skal	ac76801159	introduce FTransform2 to perform two transforms at a time. FTransform goes from ~12.0% to 11.5% total CPU time. Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624	2015-05-18 21:06:15 -07:00
James Zern	929a0fdccd	enc_sse2/TTransform: simplify abs calculation max(b, 0 - b) works as well as (b ^ sign) - b Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d	2015-05-08 19:50:29 -07:00
James Zern	17dbd05819	enc_sse2/CollectHistogram: simplify abs calculation max(out, 0 - out) works as well as (out ^ sign) - out Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909	2015-05-08 19:49:08 -07:00
James Zern	f274a96ce9	dsp/enc_sse2: add luma4 intra predictors VP8EncPredLuma4 improvement over ~20M pixels: ~39% Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20	2015-05-05 23:51:19 -07:00
James Zern	040b11bdf6	dsp/enc_sse2: add chroma intra predictors VP8EncPredChroma8 improvements over ~20M pixels left/top: ~67% left-only: ~52% top-only: ~57% none: ~61% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76	2015-05-05 23:51:14 -07:00
James Zern	aee021bbb1	dsp/enc_sse2: add luma16 intra predictors VP8EncPredLuma16 improvements over ~20M pixels left/top: ~75% left-only: ~47% top-only: ~59% none: ~63% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d	2015-05-05 23:51:07 -07:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	67ba7c7acc	enc_sse2: call local FTransform in CollectHistogram allows the former to be inlined; negligible speed-up in most cases, however this is structure is consistent with the rest of the optimized modules Change-Id: Ib080240b06f7a995b47f1906627850c355b82901	2015-03-24 20:22:24 -07:00
James Zern	182497993b	dsp: s/VP8LSetHistogramData/VP8SetHistogramData/ this function is for lossy encoding; the VP8L prefix is used by lossless Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb	2015-03-24 18:27:41 -07:00
James Zern	fbdcef2401	dsp/enc*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: I0cf40b500f9b3eed55a3211213db180c7c0dd43b	2015-03-20 19:19:46 -07:00
Pascal Massimino	2a407092ab	4-5% faster encoding using SSE2 for GetResidualCost new file: cost_sse2.c Change-Id: I4896c07f5ff2443ef743f4435fe2758d95a672ed	2015-02-18 09:41:02 +01:00
James Zern	b969f5dfac	dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage the attribute is only necessary in one location; remove it from the prototypes. Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2	2015-02-13 15:23:40 -08:00
James Zern	183168f332	cosmetics: enc_sse2: add const to some casts source pointers are often cast to __m128*, retain the const in those cases Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03	2015-02-05 23:51:29 -08:00
Pascal Massimino	bad775715a	simplify the Histogram struct, to only store max_value and last_nz we don't need to store the whole distribution in order to compute the alpha Later, we can incorporate the max_value / last_non_zero bookkeeping in SSE2 directly. Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15	2014-12-10 10:44:57 +01:00
James Zern	f85ec712b0	PrintReg: output to stderr allows use of '-o -' while testing Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191	2014-10-22 17:28:19 +02:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
Pascal Massimino	fabc65da32	1-3% faster encoding optimizing SSE_NxN functions got rid of the \|a-b\|^\|b-a\| method and went back to just (a-b)^2 instead. quality \| size(bytes) after/before \| time (ms) after/before Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0	2014-10-09 07:20:00 -07:00
skal	73d361dd5f	introduce VP8EncQuantize2Blocks to quantize two blocks at a time No speed diff for now. We might reorder better the instructions later, to speed things up. Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c	2014-08-25 20:21:42 -07:00
Pascal Massimino	1f3e5f1e60	remove unused 'shift' argument and QFIX2 define this will remove a warning about the shift amount not being an immediate (=constant). Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a Also: fix the NULL sharpen argument mismatch.	2014-06-26 00:44:12 -07:00
levytamar82	27bfeee43a	QuantizeBlock SSE2 Optimization: Another store to load forward block was detected coming from the function FTransform. FTransform save the output data 4 times 8 bytes each. when this data is later being loaded by the QuantizeBlock function in one chunk of 16 bytes that caused a store to load forward block. The fix was done in the FTransform function where each two consecutive 8 bytes were merged into one 16 bytes register and saved into the memory. This fix gives ~21% function level gain and 1.6% user level gain. Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992	2014-06-18 16:22:00 -07:00
skal	69fce2ea78	remove the special casing for res->first in VP8SetResidualCoeffs if res->first = 1, coeffs[0]=0 because of quant.c:749 and line added at quant.c:744 So, no need for the extra case. Going forward, TrellisQuantizeBlock() should also be calling a variant of VP8SetResidualCoeffs() to set the 'last' field. also: fixes a warning for win64 + slight speed-up Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0	2014-06-08 06:40:22 +02:00
James Zern	db4860b355	enc_sse2: prevent signed int overflow _mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed int. Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda	2014-06-04 23:18:22 -07:00
skal	6679f8996f	Optimize VP8SetResidualCoeffs. Brings down WebP lossy encoding timings by 5% Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489	2014-06-03 06:44:04 +02:00
skal	869eaf6c60	~30% encoding speedup: use NEON for QuantizeBlock() also revamped the signature to avoid having to pass the 'first' parameter Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5	2014-04-08 03:08:22 -07:00
James Zern	2ca42a4fb7	enc_sse2: drop SSE2 suffix from local functions Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e	2014-04-02 23:24:36 -07:00
skal	0235d5e44b	1-2% faster quantization in SSE2 C-version is a bit faster too (sub-1% faster on ARM) Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97	2014-02-13 15:55:30 -08:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
skal	73b731fb42	introduce a special quantization function for WHT WHT is somewhat a special case: no sharpen[] bias, etc. Will be useful in a later CL when precision of input is changed. Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299	2013-12-10 14:21:47 +01:00
skal	41c0cc4b9a	Make Forward WHT transform use 32bit fixed-point calculation This is in preparation for a future change where input will be 16bit instead of 12bit No speed diff observed. Note that the NEON implementation was using 32bit calc already. Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4	2013-12-10 06:10:52 +01:00
skal	d513bb62bc	* fix off-by-one zthresh calculation * remove the sharpening for non luma-AC coeffs * adjust the bias a little bit to compensate for this Using the multiply-by-reciprocal doesn't always give the same result as the exact divide, given the QFIX fixed-point precision we use. -> removed few now-unneeded SSE2 instructions (and checked for bit-exactness using -noasm) Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24	2013-12-09 13:56:04 +01:00
James Zern	4931c3294b	cosmetics: fix some typos Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c	2013-11-26 19:21:14 -08:00

1 2

65 Commits