libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-02 17:14:30 +02:00

Author	SHA1	Message	Date
Pascal Massimino	9e356d6b25	SSE2: slightly faster (~5%) AddGreenToBlueAndRed() Change-Id: Ie147010b66544c4e959f26966ad588394302d418	2015-06-24 09:36:44 +02:00
Pascal Massimino	fc6c75a2a2	SSE2: 53% faster TransformColor[Inverse] Changed the code (again) to process 4 pixels at a time. Loop is more involved, but overall it's faster. Removed the SSE4.1 implementation which is now slower than SSE2. Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1	2015-06-23 14:52:01 -07:00
Pascal Massimino	49073da6d6	SSE2: 46% speed-up of TransformColor[Inverse] Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e	2015-06-23 20:09:04 +02:00
Pascal Massimino	f3d687e3fa	SSE4.1 implementation of some lossless encoding functions New implementations: SubtractGreenFromBlueAndRed and TransformColor around 1-2% faster lossless encoding. Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861	2015-06-23 08:46:57 +02:00
Pascal Massimino	bfc300c7ff	SSE4.1 implementation of some alpha-processing functions DispatchAlpha* functions are hard to speed up, compared to SSE2. ExtractAlpha sees a ~15% speed-up though. Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de	2015-06-19 14:17:39 -07:00
Pascal Massimino	7f9c98f21d	Merge "sse2 in-loop: simplify SignedShift8b() a bit"	2015-06-12 07:37:32 +00:00
James Zern	ef314a5d6c	dec_sse2/GetNotHEV: micro optimization trade 2 subtractions + logical or for 1 max + 1 subtraction Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c	2015-06-11 22:46:24 -07:00
Pascal Massimino	a729cff987	sse2 in-loop: simplify SignedShift8b() a bit Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c	2015-06-11 15:26:31 -07:00
Pascal Massimino	422ec9fb62	simplify Load8x4() a bit Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e	2015-06-10 12:35:50 -07:00
James Zern	8df238ec8a	Merge "remove some duplicate FlipSign()"	2015-06-06 05:25:04 +00:00
Pascal Massimino	751506c484	remove some duplicate FlipSign() ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip Change-Id: I2af54bd1499118c8321183e42251d265ba76219c	2015-06-05 17:20:29 +02:00
James Zern	65ef5afc27	Merge "lossless: 0.13% compression density gain"	2015-06-03 03:02:09 +00:00
Jyrki Alakuijala	2beef2f245	lossless: 0.13% compression density gain over a 1000 image corpus Single photograph benchmark: Before: Q=20: 2.560 MP/s Q=40: 2.593 MP/s Q=60: 1.795 MP/s Q=80: 1.603 MP/s Q=99: 1.122 MP/s After: Q=20: 3.334 MP/s Q=40: 2.464 MP/s Q=60: 2.009 MP/s Q=80: 1.871 MP/s Q=99: 1.163 MP/s This CL allows for some further improvements that would not be possible otherwise. Change-Id: I61ba154beca2266cb96469281cf96e84a4412586	2015-06-02 17:27:36 -07:00
Pascal Massimino	3033f24c26	lossless: 0.06 % compression density improvement Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790	2015-06-02 14:51:51 +02:00
James Zern	64960da9e1	dec_neon: add VE8uv / VE16 VE8uv/VE16: ~25%/~33% faster over 20M pixels Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14	2015-05-30 13:40:00 -07:00
James Zern	14dbd87bed	dec_neon: add HE8uv / HE16 HE8uv/HE16: ~91%/~83% faster over 20M pixels Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810	2015-05-30 13:39:24 -07:00
skal	ac76801159	introduce FTransform2 to perform two transforms at a time. FTransform goes from ~12.0% to 11.5% total CPU time. Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624	2015-05-18 21:06:15 -07:00
James Zern	aa6065aedd	dec_neon: use vld1_dup(mem) rather than vdup(mem[0]) should result in slightly less general purpose register use Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726	2015-05-16 11:24:32 -07:00
Pascal Massimino	8b63ac78e0	Merge "dec_neon: add TM16"	2015-05-16 10:56:07 +00:00
Pascal Massimino	f51be09e1f	Merge "dec_neon/TrueMotion: simply left border load"	2015-05-16 10:54:05 +00:00
James Zern	dc48196bd9	dec_neon: add TM16 over 20M pixels ~78% faster Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde	2015-05-15 12:50:11 -07:00
James Zern	ea95b305ca	dec_neon/TrueMotion: simply left border load use vld1_dup_u8() rather than a separate ld+dup after the values were zero extended; mildly faster at the function level Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89	2015-05-15 12:48:13 -07:00
Pascal Massimino	f262d6120e	speed-up SetResidualSSE2 (was unnecessarily complicated) Before: VP8SetResidualCoeffs: checksum = 1127918 elapsed = 475 ms. Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd After: VP8SetResidualCoeffs: checksum = 1127918 elapsed = 404 ms.	2015-05-14 21:24:24 -07:00
James Zern	bf46d0acff	fix mips2 build target tested with mips1 and mips2; this should cover 3/4 as well. fixes an ftbfs reported on the debian issue tracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000 Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c	2015-05-13 10:36:22 -07:00
James Zern	929a0fdccd	enc_sse2/TTransform: simplify abs calculation max(b, 0 - b) works as well as (b ^ sign) - b Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d	2015-05-08 19:50:29 -07:00
James Zern	17dbd05819	enc_sse2/CollectHistogram: simplify abs calculation max(out, 0 - out) works as well as (out ^ sign) - out Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909	2015-05-08 19:49:08 -07:00
James Zern	a6c1593645	dec_neon: add DC16 intra predictors improvement over 20M pixels: DC16: ~77% DC16NoTop: ~78% DC16NoLeft: ~83% DC16NoTopLeft: ~83% Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99	2015-05-08 00:12:48 -07:00
James Zern	f274a96ce9	dsp/enc_sse2: add luma4 intra predictors VP8EncPredLuma4 improvement over ~20M pixels: ~39% Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20	2015-05-05 23:51:19 -07:00
James Zern	040b11bdf6	dsp/enc_sse2: add chroma intra predictors VP8EncPredChroma8 improvements over ~20M pixels left/top: ~67% left-only: ~52% top-only: ~57% none: ~61% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76	2015-05-05 23:51:14 -07:00
James Zern	aee021bbb1	dsp/enc_sse2: add luma16 intra predictors VP8EncPredLuma16 improvements over ~20M pixels left/top: ~75% left-only: ~47% top-only: ~59% none: ~63% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d	2015-05-05 23:51:07 -07:00
James Zern	4c9af02326	dec_neon: add DC8uvNoTopLeft ~93% faster Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602	2015-05-01 20:03:57 -07:00
Pascal Massimino	9287761d95	Merge "GetResidualCostSSE2: simplify abs calculation"	2015-04-30 06:30:58 +00:00
James Zern	0e009366f8	dsp/cpu.c(x86): check maximum supported cpuid feature structured extended feature flags require eax = 7; avoids incorrectly detecting avx2 on some older processors that support avx. for completeness also check for value=1 support used by the other checks. from [1]: INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor Information and the Vendor Identification String [1] http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046	2015-04-29 23:22:53 -07:00
James Zern	b243a4bc30	GetResidualCostSSE2: simplify abs calculation max(coeff, 0 - coeff) works as well as min/max/sub or (coeff ^ sign) - coeff Change-Id: I9b11715372e49cd83820677bf4beba4a1c04931c	2015-04-21 20:29:12 -07:00
James Zern	0768b252fa	dsp/enc.c: cosmetics: move DST() def closer to use Change-Id: Iccbcf046412426c2893b71eced517f611d2ffc3f	2015-04-15 20:03:39 -07:00
James Zern	9904e365a8	dsp/dec_sse2: DC8uv / DC8uvNoLeft speedup use psadbw to perform top row summation; left remains in C as repacking it into a vector to apply the same operation is too costly. DC8uv: ~19% faster DC8uvNoLeft: ~12% faster Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2	2015-04-08 23:12:53 -07:00
James Zern	7df2049785	dsp/dec_sse2: DC16 / DC16NoLeft speedup use psadbw to perform top row summation; left remains in C as repacking it into a vector to apply the same operation is too costly. DC16: ~20% faster DC16NoLeft: ~14% faster Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691	2015-04-08 23:10:39 -07:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
James Zern	1a338fb306	enc_sse41: add Disto4x4 / Disto16x16 direct translation from sse2; minor gain, fewer instructions Change-Id: I60288a842fac1a686b82b5cab637931789fe29f2	2015-03-25 23:28:46 -07:00
Pascal Massimino	94055503e3	encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN Visible speed-up, thanks to pshufb and pabsw and psignw use. had to tweak configure.ac to make "smmintri.h" presence correctly detected (we need to set the CPPFLAGS instead of the CFLAGS!) Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080	2015-03-25 20:23:51 -07:00
Pascal Massimino	c64659e1b4	remove duplicate variables after the lossless{_enc}.c split clang was giving "duplicate symbols" error messages at link time. Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6	2015-03-25 11:10:21 +01:00
James Zern	67ba7c7acc	enc_sse2: call local FTransform in CollectHistogram allows the former to be inlined; negligible speed-up in most cases, however this is structure is consistent with the rest of the optimized modules Change-Id: Ib080240b06f7a995b47f1906627850c355b82901	2015-03-24 20:22:24 -07:00
James Zern	182497993b	dsp: s/VP8LSetHistogramData/VP8SetHistogramData/ this function is for lossy encoding; the VP8L prefix is used by lossless Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb	2015-03-24 18:27:41 -07:00
James Zern	ede5e1584c	cosmetics: dsp/lossless.h: reorder prototypes group decoding / encoding functions together, followed by their respective Init() function. Change-Id: Ib4d22f8ec2369efec752faf733ecf53acc67b1ca	2015-03-24 17:52:42 -07:00
James Zern	553051f741	dsp/lossless: split enc/dec functions adds lossless_enc*.c; reduces the size of the decode-only so: ~78K w/gcc-4.8.2 on x86_64. Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092	2015-03-23 22:57:50 -07:00
James Zern	cecf509662	dsp/yuv*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: I42e621481be7305bb7c426b4d0b279619195611e	2015-03-20 19:19:46 -07:00
James Zern	6584d398eb	dsp/upsampling*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: I3c753915eefe900987c9720733efb720ebe6bfa7	2015-03-20 19:19:46 -07:00
James Zern	808094228c	dsp/rescaler*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: Ife9c7cd363b3692b64a7ade1960cfce3a76c3ba2	2015-03-20 19:19:46 -07:00
James Zern	1d93ddec19	dsp/lossless*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2	2015-03-20 19:19:46 -07:00
James Zern	73805ff270	dsp/filters*.c: rework WEBP_USE_<arch> ifdef add a dummy init rather than repeating the '#ifdef WEBP_USE_...' pattern. Change-Id: Idf08ffeb2aef1392a6d69596d897a59deebb64cf	2015-03-20 19:19:46 -07:00

495 Commits
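Commits 9e356d6b25 and f3d687e3fa vectorize the lossless "subtract green" transform. The encoder subtracts the green channel from red and blue (modulo 256), and AddGreenToBlueAndRed() undoes this on decode. The following is a minimal scalar sketch. It assumes pixels are packed as 0xAARRGGBB in a uint32_t, and its function names are illustrative, not libwebp's symbols.

```c
#include <stdint.h>

/* Encoder side: red -= green, blue -= green (mod 256); alpha and green kept.
   Pixels are assumed to be packed as 0xAARRGGBB. */
static void SubtractGreenSketch(uint32_t* argb, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t p = argb[i];
    const uint32_t green = (p >> 8) & 0xff;
    const uint32_t red = ((p >> 16) - green) & 0xff;
    const uint32_t blue = (p - green) & 0xff;
    argb[i] = (p & 0xff00ff00u) | (red << 16) | blue;
  }
}

/* Decoder side: red += green, blue += green (mod 256). Red and blue are
   updated in one 32-bit add; the 0x00ff00ff mask drops each carry. */
static void AddGreenSketch(uint32_t* argb, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t p = argb[i];
    const uint32_t green = (p >> 8) & 0xff;
    uint32_t red_blue = p & 0x00ff00ffu;
    red_blue += (green << 16) | green;
    red_blue &= 0x00ff00ffu;
    argb[i] = (p & 0xff00ff00u) | red_blue;
  }
}
```

A blue or red sum never exceeds 510, so its carry lands in a byte that the mask clears, never in the neighbouring channel.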
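Commits fc6c75a2a2 and 49073da6d6 speed up TransformColor[Inverse], the lossless cross-color transform. As we understand the WebP lossless format, red is adjusted by a multiple of green, and blue by multiples of green and red. Each product is taken as (t * c) >> 5 on signed bytes. The scalar sketch below works under those assumptions, with illustrative names and pixels packed as 0xAARRGGBB. Any optimized version must preserve the round trip it checks.

```c
#include <stdint.h>

/* Per-tile multipliers of the color transform (illustrative type). */
typedef struct {
  int8_t green_to_red;
  int8_t green_to_blue;
  int8_t red_to_blue;
} ColorMultipliersSketch;

/* (t * c) >> 5 on signed 8-bit inputs; assumes >> is arithmetic on
   negative ints, as on mainstream compilers. */
static int ColorDelta(int8_t t, int8_t c) {
  return ((int)t * (int)c) >> 5;
}

/* Low byte read as a signed value (two's complement narrowing assumed). */
static int8_t AsS8(uint32_t v) { return (int8_t)(uint8_t)(v & 0xff); }

/* Forward transform (encoder). */
static void TransformColorSketch(const ColorMultipliersSketch* m,
                                 uint32_t* argb, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t p = argb[i];
    const int8_t green = AsS8(p >> 8);
    const int8_t red = AsS8(p >> 16);
    int new_red = (int)((p >> 16) & 0xff);
    int new_blue = (int)(p & 0xff);
    new_red -= ColorDelta(m->green_to_red, green);
    new_blue -= ColorDelta(m->green_to_blue, green);
    new_blue -= ColorDelta(m->red_to_blue, red);  /* original red */
    argb[i] = (p & 0xff00ff00u) | ((uint32_t)(new_red & 0xff) << 16) |
              (uint32_t)(new_blue & 0xff);
  }
}

/* Inverse transform (decoder): red is restored first, then used for blue. */
static void TransformColorInverseSketch(const ColorMultipliersSketch* m,
                                        uint32_t* argb, int num_pixels) {
  for (int i = 0; i < num_pixels; ++i) {
    const uint32_t p = argb[i];
    const int8_t green = AsS8(p >> 8);
    int new_red = (int)((p >> 16) & 0xff);
    int new_blue = (int)(p & 0xff);
    new_red = (new_red + ColorDelta(m->green_to_red, green)) & 0xff;
    new_blue += ColorDelta(m->green_to_blue, green);
    new_blue += ColorDelta(m->red_to_blue, AsS8((uint32_t)new_red));
    argb[i] = (p & 0xff00ff00u) | ((uint32_t)new_red << 16) |
              (uint32_t)(new_blue & 0xff);
  }
}
```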
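Commit bfc300c7ff reports a speed-up for ExtractAlpha. The sketch below assumes that function copies each pixel's alpha byte into a separate plane and reports whether every pixel is opaque. The signature and the BGRA byte order (alpha at offset 3) are assumptions for illustration.

```c
#include <stdint.h>

/* Copies alpha into its own plane; returns 1 if every alpha was 0xff.
   Hypothetical signature; BGRA byte order assumed. */
static int ExtractAlphaSketch(const uint8_t* bgra, int bgra_stride,
                              int width, int height,
                              uint8_t* alpha, int alpha_stride) {
  uint8_t alpha_and = 0xff;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const uint8_t a = bgra[4 * x + 3];
      alpha[x] = a;
      alpha_and &= a;  /* stays 0xff only if all values are 0xff */
    }
    bgra += bgra_stride;
    alpha += alpha_stride;
  }
  return alpha_and == 0xff;
}
```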
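Commit ef314a5d6c rewrites GetNotHEV in the SSE2 loop filter. An edge has high variance when |p1 - p0| or |q1 - q0| exceeds the threshold. With unsigned saturating arithmetic, "neither exceeds" can be tested in two ways. The first uses two saturating subtractions ORed together. The second uses one max followed by one saturating subtraction. The scalar sketch below implements both, with hypothetical names. SatSub() stands in for _mm_subs_epu8 and the max for _mm_max_epu8.

```c
#include <stdint.h>

/* Unsigned saturating subtraction: scalar stand-in for _mm_subs_epu8. */
static uint8_t SatSub(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* |a - b| on unsigned bytes: one of the two saturated differences is 0. */
static uint8_t AbsDiff(uint8_t a, uint8_t b) {
  return (uint8_t)(SatSub(a, b) | SatSub(b, a));
}

/* 1 when neither |p1 - p0| nor |q1 - q0| exceeds thresh:
   two saturating subtractions + OR. */
static int NotHevTwoSubs(uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1,
                         uint8_t thresh) {
  const uint8_t t1 = AbsDiff(p1, p0);
  const uint8_t t2 = AbsDiff(q1, q0);
  return (SatSub(t1, thresh) | SatSub(t2, thresh)) == 0;
}

/* Same result with one max + one saturating subtraction. */
static int NotHevMax(uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1,
                     uint8_t thresh) {
  const uint8_t t1 = AbsDiff(p1, p0);
  const uint8_t t2 = AbsDiff(q1, q0);
  const uint8_t m = t1 > t2 ? t1 : t2;
  return SatSub(m, thresh) == 0;
}
```

Both return 1 exactly when t1 <= thresh and t2 <= thresh, which is the same as max(t1, t2) <= thresh.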
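Commit a729cff987 touches SignedShift8b(), an arithmetic right shift of signed bytes by 3. SSE2 has no 8-bit arithmetic shift. A common workaround has three steps: place each byte in the high half of a 16-bit lane (e.g. _mm_unpacklo_epi8(zero, x)), shift with _mm_srai_epi16 by 3 + 8, and pack back with _mm_packs_epi16. The scalar sketch below emulates one lane. Whether libwebp uses exactly this instruction sequence is an assumption.

```c
#include <stdint.h>

/* One lane: the byte goes into the high half of a 16-bit word, then an
   arithmetic shift by 3 + 8 gives floor(x / 8). Assumes two's complement
   narrowing and arithmetic >> on negative values. */
static int8_t SignedShift3ViaWord(int8_t x) {
  const int16_t word = (int16_t)(uint16_t)((unsigned)(uint8_t)x << 8);
  return (int8_t)(word >> (3 + 8));
}

/* Reference: floor(x / 8) without relying on >> of negative numbers. */
static int FloorDiv8(int x) {
  return x >= 0 ? x / 8 : -((-x + 7) / 8);
}
```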
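The dec_neon commits 64960da9e1, 14dbd87bed, dc48196bd9 and ea95b305ca add vertical (VE), horizontal (HE) and TrueMotion (TM) intra predictors. In VP8, each predictor fills the block differently:

- VE copies the row above into every row.
- HE fills each row with its left neighbour.
- TM computes clip(left[y] + top[x] - top_left).

The following is a scalar reference sketch. Passing the borders as separate arrays is a simplification; libwebp's decoder reads them from the frame buffer next to dst.

```c
#include <stdint.h>
#include <string.h>

static uint8_t Clip8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* VE: every row is a copy of the `size` pixels above the block. */
static void PredVE(uint8_t* dst, int stride, const uint8_t* top, int size) {
  for (int y = 0; y < size; ++y) memcpy(dst + y * stride, top, (size_t)size);
}

/* HE: row y is filled with its left neighbour left[y]. */
static void PredHE(uint8_t* dst, int stride, const uint8_t* left, int size) {
  for (int y = 0; y < size; ++y) memset(dst + y * stride, left[y], (size_t)size);
}

/* TM: top[x] + left[y] - top_left, clipped to [0, 255]. */
static void PredTM(uint8_t* dst, int stride, const uint8_t* top,
                   const uint8_t* left, uint8_t top_left, int size) {
  for (int y = 0; y < size; ++y) {
    for (int x = 0; x < size; ++x) {
      dst[y * stride + x] = Clip8(top[x] + left[y] - top_left);
    }
  }
}
```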
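Commits 9904e365a8 and 7df2049785 compute the top-row sum of the DC predictors with psadbw (_mm_sad_epu8 against zero). The dec_neon commits a6c1593645 and 4c9af02326 add NEON variants. In VP8 the DC value is the rounded average of the available neighbours, or 0x80 when neither top nor left exists. The scalar sketch below covers the 16x16 case under those rules; its names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Sum of n bytes. For 16 bytes, _mm_sad_epu8(row, zero) yields two partial
   sums (one per 64-bit half) that are then added together. */
static int SumBytes(const uint8_t* p, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i) s += p[i];
  return s;
}

static void Fill16(uint8_t* dst, int stride, int value) {
  for (int y = 0; y < 16; ++y) memset(dst + y * stride, value, 16);
}

/* top or left is NULL when that border is unavailable. */
static void PredDC16(uint8_t* dst, int stride,
                     const uint8_t* top, const uint8_t* left) {
  int dc;
  if (top != NULL && left != NULL) {
    dc = (SumBytes(top, 16) + SumBytes(left, 16) + 16) >> 5;  /* 32 samples */
  } else if (top != NULL) {
    dc = (SumBytes(top, 16) + 8) >> 4;   /* DC16NoLeft */
  } else if (left != NULL) {
    dc = (SumBytes(left, 16) + 8) >> 4;  /* DC16NoTop */
  } else {
    dc = 0x80;                           /* DC16NoTopLeft */
  }
  Fill16(dst, stride, dc);
}
```

The 8x8 chroma variants follow the same pattern with 8 samples per border, e.g. (sum of 16 samples + 8) >> 4.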
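Commit f262d6120e simplifies SetResidualSSE2. As we understand it, this function records the position of the last non-zero coefficient of a 16-coefficient block. One branch-light method, assumed here rather than taken from libwebp, has two steps. First, build a 16-bit mask of non-zero positions (in SSE2: compare with zero, pack, then _mm_movemask_epi8). Second, take the mask's highest set bit. The following scalar sketch uses a hypothetical name.

```c
#include <stdint.h>

/* Index of the last non-zero coefficient among 16, or -1 if all are zero. */
static int LastNonZero(const int16_t coeffs[16]) {
  unsigned mask = 0;
  for (int n = 0; n < 16; ++n) mask |= (unsigned)(coeffs[n] != 0) << n;
  int last = -1;
  while (mask != 0) {  /* position of the highest set bit */
    ++last;
    mask >>= 1;
  }
  return last;
}
```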
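Commits 929a0fdccd, 17dbd05819 and b243a4bc30 compute the absolute value as max(x, 0 - x). For 16-bit lanes the sign-mask form is usually written (x ^ sign) - sign, with sign = x >> 15 (0 or all ones). That form needs _mm_srai_epi16 + _mm_xor_si128 + _mm_sub_epi16. The max form needs only _mm_sub_epi16 + _mm_max_epi16. The scalar sketch below shows that one lane agrees under both forms.

```c
#include <stdint.h>

/* Sign-mask form. Assumes >> on a negative value is arithmetic and that
   narrowing to int16_t wraps (two's complement). */
static int16_t AbsXorSign(int16_t x) {
  const int16_t sign = (int16_t)(x >> 15);  /* 0 or -1 */
  return (int16_t)((x ^ sign) - sign);
}

/* max(x, 0 - x) form. For x = -32768 both forms wrap to -32768,
   as the 16-bit SIMD instructions do. */
static int16_t AbsMax(int16_t x) {
  const int16_t neg = (int16_t)(0 - x);
  return x > neg ? x : neg;
}
```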
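Commit 0e009366f8 checks the highest supported CPUID leaf before reading leaf 7. The sketch below does this with the __get_cpuid_max() and __cpuid_count() helpers from GCC/Clang's <cpuid.h>. AVX2 is reported in CPUID.(EAX=7, ECX=0):EBX bit 5. This is not libwebp's code. A complete AVX2 test also needs an OS-support check (OSXSAVE and XGETBV), which is omitted here.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

/* 1 if the CPU reports AVX2, 0 otherwise (or when CPUID is unavailable). */
static int CpuReportsAvx2(void) {
#if defined(SKETCH_HAVE_CPUID)
  unsigned int eax, ebx, ecx, edx;
  /* CPUID with EAX = 0: highest basic leaf, returned in EAX. */
  const unsigned int max_leaf = __get_cpuid_max(0, 0);
  if (max_leaf < 7) return 0;  /* leaf 7 not defined on this CPU */
  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  (void)eax; (void)ecx; (void)edx;
  return (int)((ebx >> 5) & 1u);
#else
  return 0;  /* non-x86 or non-GCC/Clang build */
#endif
}
```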
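Commit b44eda3f60 adds DSP_INIT_STUB. The earlier "rework WEBP_USE_<arch> ifdef" commits add dummy init functions, so callers need not repeat the #ifdef. The following sketch illustrates the pattern. The macro body and every name except WEBP_USE_SSE2 are assumptions. The extern declaration gives the non-static stub a prototype.

```c
#include <stdint.h>

/* Default C implementation behind a function pointer (illustrative). */
static int SumC(const uint8_t* p, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i) s += p[i];
  return s;
}
int (*SketchSum)(const uint8_t* p, int n) = SumC;

/* Expands to an empty, externally visible init function. */
#define DSP_INIT_STUB(func) \
  extern void func(void);   \
  void func(void) {}

#if defined(WEBP_USE_SSE2)
extern void SketchDspInitSSE2(void);
void SketchDspInitSSE2(void) {
  /* An SSE2 build would install a faster SketchSum here. */
}
#else
DSP_INIT_STUB(SketchDspInitSSE2)
#endif

/* Generic init: no #ifdef needed, because the symbol always exists. */
static void SketchDspInit(void) {
  SketchSum = SumC;
  SketchDspInitSSE2();  /* a real dispatcher would check CPU support first */
}
```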