libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-18 23:09:52 +02:00

Author	SHA1	Message	Date
Nozomi Isozaki	ac42dde1c5	Specialize and optimize ITransform_SSE2 using do_two Change-Id: I976eb4a0cc4e669a02b55012d4aba1536f193781	2023-05-16 12:07:58 +09:00
James Zern	aff1c546ef	dsp,x86: normalize types w/_mm_cvtsi128_si32 calls fixes integer sanitizer warnings of the form: implicit conversion from type 'int' of value -2122283647 (32-bit, signed) to type 'uint32_t' (aka 'unsigned int') changed the value to 2172683649 (32-bit, unsigned) implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 3724541952 (32-bit, unsigned) to type 'int' changed the value to -570425344 (32-bit, signed) Bug: b/229626362 Change-Id: I79f68e3e2fcab7cc0402477d2e88d629348c9ff4	2022-08-04 11:26:23 -07:00
James Zern	835392393b	dsp,x86: normalize types w/_mm_set* calls fixes integer sanitizer warnings of the form: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed) runtime error: implicit conversion from type 'uint8_t' (aka 'unsigned char') of value 128 (8-bit, unsigned) to type 'char' changed the value to -128 (8-bit, signed) Bug: b/229626362 Change-Id: I6be3c40407cf7a27b79d31ee32d3829ecb78ed66	2022-08-03 16:50:46 -07:00
James Zern	748e92bbb9	add WebPInt32ToMem and use it in calls containing _mm_cvtsi32_si128; this calls WebPUint32ToMem, but corrects the type to avoid runtime warnings with clang -fsanitize=integer of the form: implicit conversion from type 'int' of value -1904123502 (32-bit, signed) to type 'uint32_t' (aka 'unsigned int') changed the value to 2390843794 (32-bit, unsigned) Bug: b/229626362 Change-Id: I20545e822d8045fa44f688241879206055a0a148	2022-08-01 13:44:20 -07:00
James Zern	4f402f34a1	add WebPMemToInt32 and use it with calls to _mm_cvtsi32_si128 and _mm_set_epi32; this calls WebPMemToUint32, but corrects the type to avoid runtime warnings with clang -fsanitize=integer of the form: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 2155905152 (32-bit, unsigned) to type 'int' changed the value to -2139062144 (32-bit, signed) Bug: b/229626362 Change-Id: I50101ba2b46dfaa852f02d46830f3511c80b02d9	2022-07-28 22:10:22 -07:00
James Zern	e78dea7587	{alpha_processing,enc}_sse2: quiet signed conv warnings _mm_set1_epi8() takes a char argument _mm_insert_epi16 takes a short argument from clang-7 integer sanitizer: implicit conversion from type 'int' of value 255 (32-bit, signed) to type 'char' changed the value to -1 (8-bit, signed) implicit conversion from type 'int' of value 33153 (32-bit, signed) to type 'short' changed the value to -32383 (16-bit, signed) Change-Id: Ic88c8ef3d00146d34f53a560582db673f818370d	2019-06-10 14:23:58 -07:00
Pascal Massimino	0a17f4712c	Merge "WIP: list includes as descendants of the project dir"	2017-10-11 08:21:42 +00:00
James Zern	a439972175	WIP: list includes as descendants of the project dir #include "(.|..)/..." -> #include "src/..." Change-Id: I772880aa097a770722043c8a4393552ba38a89b6	2017-10-10 23:04:05 -07:00
James Zern	bc634d57c2	enc_sse2: harmonize function suffixes BUG=webp:355 Change-Id: Idd2f289fcf99f12bf36494111b07a8906c99c826	2017-10-08 14:05:59 -07:00
skal	b09307dcde	Encoder: harmonize function suffixes BUG=webp:355 Change-Id: Ia2fe95db7dfb303f3f64e390d43bc41b8933256c	2017-08-09 02:41:01 +00:00
Pascal Massimino	693bf74ec0	move the SSIM calculation code in ssim.c / ssim_sse2.c Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448	2017-02-21 12:53:35 +01:00
James Zern	668e1dd44f	src/{dec,enc,utils}: give filenames a unique suffix this avoids duplicates between these trees and dsp/, e.g., enc/tree.c, dec/tree.c, making pulling the whole library source tree into one target possible BUG=webp:279 Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775	2017-01-19 19:09:48 -08:00
Owen Rodley	67748b41db	Improve latency of FTransform2. Benchmarks from vrabaud@: 8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged images/lpcb corpus speed: unchanged, corpus size: unchanged images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged images/png_doc corpus speed: unchanged, corpus size: unchanged images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged images/jpeg_small corpus speed: unchanged, corpus size: unchanged images/icip_core1 corpus speed: unchanged, corpus size: unchanged images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged images/gif corpus speed: faster: -1.4 % , corpus size: unchanged images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged images/all corpus speed: faster: -1.5 % , corpus size: unchanged images/png_small corpus speed: unchanged, corpus size: unchanged images/png corpus speed: unchanged, corpus size: unchanged images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b	2016-11-24 07:09:50 +00:00
Pascal Massimino	ba843a92e7	fix some SSIM calculations * prevent 64bit overflow by controlling the 32b->64b conversions and preventively descaling by 8bit before the final multiply * adjust the threshold constants C1 and C2 to de-emphasize the dark areas * use a hat-like filter instead of box-filtering to avoid blockiness during averaging SSIM distortion calc is actually faster now in SSE2, because of the unrolling during the function rewrite. The C-version is quite a bit slower because it is still un-optimized. Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90	2016-10-04 01:09:07 -07:00
Pascal Massimino	86a84b3598	2x faster SSE2 implementation of SSIMGet Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351	2016-09-23 23:23:06 -07:00
Pascal Massimino	50c3d7da9a	refactor the PSNR / SSIM calculation code -print_psnr is now much faster because it doesn't use the SSIM code. The SSIM speed-up and re-write will come later. Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823	2016-09-14 06:13:24 +00:00
skal	5b60db5c9d	FastMBAnalyze() for quick i16/i4 decision The decision is based on the variance between DC values of each sub-4x4 block. This heuristic is rather ok for predicting whether the 2nd transform (intra-16) is going to help or not. The decision threshold varies with quality (=quantization). It's only used for -m 0 and -m 1, where no full RD-opt is performed. It actually makes these modes quite faster, with RD curve much closer to the -m 2 mode. Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8	2016-07-15 11:21:08 -07:00
James Zern	6b53ca876e	cosmetics,(dec|enc)_sse2.c: fix indent Change-Id: Ic3326136ddd325e911e96c2e5a7f06b3e1d60f66	2016-07-13 16:11:29 -07:00
Vincent Rabaud	7561d0c338	FTransformWHT optimization. Data is packed sooner in the functions. Change-Id: I018cfeca43f015ac755c7f209f9a97984cc0517b	2016-02-18 17:44:05 +01:00
Vincent Rabaud	8aa352b256	Merge "Remove an unnecessary transposition in TTransform."	2016-02-18 08:15:10 +00:00
Vincent Rabaud	9960c31685	Remove an unnecessary transposition in TTransform. Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e	2016-02-17 21:41:28 +01:00
Vincent Rabaud	6e36b51188	Small speedup in FTransform. It removes two _mm_unpacklo_epi32 and two _mm_sub_epi16. Change-Id: Icdf86259f796ba855d1cda5e9c0e99cb396cb351	2016-02-17 21:26:36 +01:00
Vincent Rabaud	bf2b4f114f	Regroup common SSE code + optimization. The transpose refactoring will help removing a transpose in a later CL. The horizontal add function helps removing a _mm_sad_epu8 in DC8uv => the latency/throughput went from 29/25 to 23/19 Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766	2016-02-16 18:34:34 +01:00
Pascal Massimino	2dee2966df	remove few obsolete TODO about aligned loads in SSE2 Change-Id: I3628602942ea2ce34dbcb85975d15afc1041f76c	2015-12-15 23:00:41 -08:00
Pascal Massimino	2c08aac81a	introduce WebPMemToUint32 and WebPUint32ToMem for memory access it uses memcpy() when unaligned memory write is tricky Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874	2015-12-04 13:43:01 +00:00
Pascal Massimino	25bf2ce5cc	fix some warning about unaligned 32b reads on x86 + gcc, the assembly code is the same. Change-Id: Ib0d23772ccf928f8d9ebcb0e157c0573d1f6a786	2015-10-28 15:51:55 -07:00
Pascal Massimino	0ae2c2e4b2	SSE2/SSE41: optimize SSE_16xN loops After several trials at re-organizing the main loop and accumulation scheme, this is apparently the faster variant. removed the SSE41 version, which is no longer faster now. For some reason, the AVX variant seems to benefit most for the change. Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5	2015-07-02 20:55:04 +02:00
Pascal Massimino	8ef9a63b45	SSE2: slightly faster FTransformWHT goes from 0.3% to 0.1% overall CPU time, but... Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84	2015-07-01 23:03:17 -07:00
skal	ac76801159	introduce FTransform2 to perform two transforms at a time. FTransform goes from ~12.0% to 11.5% total CPU time. Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624	2015-05-18 21:06:15 -07:00
James Zern	929a0fdccd	enc_sse2/TTransform: simplify abs calculation max(b, 0 - b) works as well as (b ^ sign) - b Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d	2015-05-08 19:50:29 -07:00
James Zern	17dbd05819	enc_sse2/CollectHistogram: simplify abs calculation max(out, 0 - out) works as well as (out ^ sign) - out Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909	2015-05-08 19:49:08 -07:00
James Zern	f274a96ce9	dsp/enc_sse2: add luma4 intra predictors VP8EncPredLuma4 improvement over ~20M pixels: ~39% Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20	2015-05-05 23:51:19 -07:00
James Zern	040b11bdf6	dsp/enc_sse2: add chroma intra predictors VP8EncPredChroma8 improvements over ~20M pixels left/top: ~67% left-only: ~52% top-only: ~57% none: ~61% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76	2015-05-05 23:51:14 -07:00
James Zern	aee021bbb1	dsp/enc_sse2: add luma16 intra predictors VP8EncPredLuma16 improvements over ~20M pixels left/top: ~75% left-only: ~47% top-only: ~59% none: ~63% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d	2015-05-05 23:51:07 -07:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	67ba7c7acc	enc_sse2: call local FTransform in CollectHistogram allows the former to be inlined; negligible speed-up in most cases, however this structure is consistent with the rest of the optimized modules Change-Id: Ib080240b06f7a995b47f1906627850c355b82901	2015-03-24 20:22:24 -07:00
James Zern	182497993b	dsp: s/VP8LSetHistogramData/VP8SetHistogramData/ this function is for lossy encoding; the VP8L prefix is used by lossless Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb	2015-03-24 18:27:41 -07:00
James Zern	fbdcef2401	dsp/enc*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: I0cf40b500f9b3eed55a3211213db180c7c0dd43b	2015-03-20 19:19:46 -07:00
Pascal Massimino	2a407092ab	4-5% faster encoding using SSE2 for GetResidualCost new file: cost_sse2.c Change-Id: I4896c07f5ff2443ef743f4435fe2758d95a672ed	2015-02-18 09:41:02 +01:00
James Zern	b969f5dfac	dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage the attribute is only necessary in one location; remove it from the prototypes. Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2	2015-02-13 15:23:40 -08:00
James Zern	183168f332	cosmetics: enc_sse2: add const to some casts source pointers are often cast to __m128*, retain the const in those cases Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03	2015-02-05 23:51:29 -08:00
Pascal Massimino	bad775715a	simplify the Histogram struct, to only store max_value and last_nz we don't need to store the whole distribution in order to compute the alpha Later, we can incorporate the max_value / last_non_zero bookkeeping in SSE2 directly. Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15	2014-12-10 10:44:57 +01:00
James Zern	f85ec712b0	PrintReg: output to stderr allows use of '-o -' while testing Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191	2014-10-22 17:28:19 +02:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
Pascal Massimino	fabc65da32	1-3% faster encoding optimizing SSE_NxN functions got rid of the |a-b|^|b-a| method and went back to just (a-b)^2 instead. quality | size(bytes) after/before | time (ms) after/before Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0	2014-10-09 07:20:00 -07:00
skal	73d361dd5f	introduce VP8EncQuantize2Blocks to quantize two blocks at a time No speed diff for now. We might reorder better the instructions later, to speed things up. Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c	2014-08-25 20:21:42 -07:00
Pascal Massimino	1f3e5f1e60	remove unused 'shift' argument and QFIX2 define this will remove a warning about the shift amount not being an immediate (=constant). Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a Also: fix the NULL sharpen argument mismatch.	2014-06-26 00:44:12 -07:00
levytamar82	27bfeee43a	QuantizeBlock SSE2 Optimization: Another store-to-load forwarding block was detected coming from the function FTransform. FTransform saves the output data as four stores of 8 bytes each. When this data is later loaded by the QuantizeBlock function in one chunk of 16 bytes, it causes a store-to-load forwarding block. The fix was done in the FTransform function, where each two consecutive 8-byte values are merged into one 16-byte register and saved to memory. This fix gives ~21% function level gain and 1.6% user level gain. Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992	2014-06-18 16:22:00 -07:00
skal	69fce2ea78	remove the special casing for res->first in VP8SetResidualCoeffs if res->first = 1, coeffs[0]=0 because of quant.c:749 and line added at quant.c:744 So, no need for the extra case. Going forward, TrellisQuantizeBlock() should also be calling a variant of VP8SetResidualCoeffs() to set the 'last' field. also: fixes a warning for win64 + slight speed-up Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0	2014-06-08 06:40:22 +02:00
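
Note on commits 2c08aac81a, 4f402f34a1 and 748e92bbb9: these describe memory-access helpers that use memcpy() for unaligned 32-bit access, plus signed variants that call the unsigned ones while correcting the type so that clang -fsanitize=integer sees no implicit conversion. The following is a minimal sketch of that idea, not the library's code. The names LoadU32, StoreU32, LoadI32 and StoreI32 are hypothetical stand-ins for WebPMemToUint32, WebPUint32ToMem, WebPMemToInt32 and WebPInt32ToMem, and the explicit casts are an assumption about how the type is "corrected".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for WebPMemToUint32: read 4 bytes from a pointer
 * that may be unaligned. memcpy() avoids an unaligned dereference. */
static uint32_t LoadU32(const uint8_t* ptr) {
  uint32_t v;
  assert(ptr != NULL);
  memcpy(&v, ptr, sizeof(v));
  return v;
}

/* Hypothetical stand-in for WebPUint32ToMem: write 4 bytes to a pointer
 * that may be unaligned. */
static void StoreU32(uint8_t* ptr, uint32_t v) {
  assert(ptr != NULL);
  memcpy(ptr, &v, sizeof(v));
}

/* Signed wrappers (assumed shape): reuse the unsigned helpers and make the
 * signedness change explicit, so the sanitizer does not report an implicit
 * conversion at call sites that need an int. */
static int32_t LoadI32(const uint8_t* ptr) {
  return (int32_t)LoadU32(ptr);
}

static void StoreI32(uint8_t* ptr, int32_t v) {
  StoreU32(ptr, (uint32_t)v);
}
```

With helpers of this shape, a call site that feeds an intrinsic taking an int (the log mentions _mm_cvtsi32_si128 and _mm_set_epi32) would use the signed load, and one that stores the int result of _mm_cvtsi128_si32 would use the signed store. The out-of-range unsigned-to-signed cast is implementation-defined in C; the sketch assumes the usual two's-complement behavior.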

75 Commits
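
Note on commits 929a0fdccd and 17dbd05819: both replace a sign-mask absolute value with max(b, 0 - b). The scalar sketch below compares the two forms on 16-bit values. It is an illustration only, and the function names AbsSignMask and AbsMax are hypothetical. The commit messages write the older form as (b ^ sign) - b; the sketch uses the usual sign-mask identity (b ^ sign) - sign, which is an assumption about the intended expression.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-mask form: sign is 0 for b >= 0 and -1 for b < 0, so
 * (b ^ sign) - sign leaves b unchanged or computes ~b + 1 == -b.
 * Assumes >> on a negative value is an arithmetic shift, as on
 * mainstream compilers (it is implementation-defined in C). */
static int16_t AbsSignMask(int16_t b) {
  const int16_t sign = (int16_t)(b >> 15);
  return (int16_t)((b ^ sign) - sign);
}

/* Max form from the commit messages: max(b, 0 - b). */
static int16_t AbsMax(int16_t b) {
  const int16_t neg = (int16_t)(0 - b);
  return (b > neg) ? b : neg;
}
```

Both functions agree for every input from -32767 to 32767. For -32768 the negation does not fit in 16 bits, so neither form can return a positive result. In SSE2 the max form would presumably map to _mm_sub_epi16 followed by _mm_max_epi16, which removes the separate shift and xor steps.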