libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-11 02:20:33 +02:00

Author	SHA1	Message	Date
James Zern	32b3137936	configure: move config.h to src/webp/config.h this change has the side-effect of using directory names in the include, silencing a lint warning. Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02	2014-06-10 23:42:00 -07:00
James Zern	90090d99b5	Merge changes I7c675e51,I84f7d785 * changes: configure: test for -msse2 rename upsampling_mips32.c to yuv_mips32.c	2014-06-10 16:15:21 -07:00
skal	69fce2ea78	remove the special casing for res->first in VP8SetResidualCoeffs if res->first = 1, coeffs[0]=0 because of quant.c:749 and line added at quant.c:744 So, no need for the extra case. Going forward, TrellisQuantizeBlock() should also be calling a variant of VP8SetResidualCoeffs() to set the 'last' field. also: fixes a warning for win64 + slight speed-up Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0	2014-06-08 06:40:22 +02:00
James Zern	6e61a3a905	configure: test for -msse2 + add a WEBP_HAVE_SSE2 to dsp.h not all 32-bit toolchain configurations will have sse2 enabled by default Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e	2014-06-07 19:44:08 -07:00
James Zern	b9d2efc629	rename upsampling_mips32.c to yuv_mips32.c matches yuv_sse2 added in: `bdfeeba` dsp/yuv: move sse2 functions to yuv_sse2.c Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3	2014-06-07 12:35:47 -07:00
James Zern	bdfeebaa01	dsp/yuv: move sse2 functions to yuv_sse2.c Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000	2014-06-05 23:52:54 -07:00
pascal massimino	46b32e861a	Merge "configure: set WEBP_HAVE_AVX2 when available"	2014-06-05 02:57:42 -07:00
James Zern	db4860b355	enc_sse2: prevent signed int overflow _mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed int. Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda	2014-06-04 23:18:22 -07:00
James Zern	230a055501	configure: set WEBP_HAVE_AVX2 when available this is used to set WEBP_USE_AVX2 in files where the build flag won't be used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f	2014-06-03 23:29:23 -07:00
James Zern	61362db57c	remove libwebpdspdecode dep on libwebpdsp_avx2 it's encode only, libwebpdecoder doesn't need the symbols Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83	2014-06-03 00:05:56 -07:00
James Zern	9754d39a4e	Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)"	2014-06-02 23:06:18 -07:00
skal	ea8b0a171d	strong filtering speed-up (~2-3% x86, ~1-2% for NEON) Extract loop invariant and avoid storing/loading samples if they can be re-used. This is particularly interesting when a transpose is involved (HFilter16i). Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139	2014-06-03 07:14:23 +02:00
skal	6679f8996f	Optimize VP8SetResidualCoeffs. Brings down WebP lossy encoding timings by 5% Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489	2014-06-03 06:44:04 +02:00
James Zern	4dfa86b29c	dsp/cpu: NaCl has no support for xgetbv or the raw opcode; fixes: 934ed4: unrecognized instruction Change-Id: I981870baf0e8b03bf40144ea8ec25eff140d5bc3	2014-05-29 23:02:23 -07:00
pascal massimino	57897bae09	Merge "lossless_neon: use vcreate_*() where appropriate"	2014-05-28 01:36:13 -07:00
pascal massimino	6aa4777b39	Merge "(enc|dec)_neon: use vcreate_*() where appropriate"	2014-05-28 01:34:56 -07:00
skal	0d346e418d	Always reinit VP8TransformWHT instead of hard-coding Change-Id: I2012749ed29bd166d2a96555372f0d9baa784385	2014-05-28 10:21:07 +02:00
James Zern	bf0e003067	lossless_neon: use vcreate_*() where appropriate this is more portable than {} initialization. more involved cases are left for a follow-up. Change-Id: If7e111864f287ea0a5de6311454aeda37afbb52a	2014-05-27 16:27:46 -07:00
James Zern	9251c2f6d2	(enc|dec)_neon: use vcreate_*() where appropriate this is more portable than {} initialization. more involved cases are left for a follow-up. Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17	2014-05-27 16:26:56 -07:00
skal	399b916d27	lossy decoding: correct alpha-rescaling for YUVA format The luminance needs to be pre- and post- multiplied by the alpha value in case of rescaling, for proper averaging. Also: - removed util/alpha_processing and moved it to dsp/ - removed WebPInitPremultiply() which was mostly useless and merged it with the new function WebPInitAlphaProcessing() Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60	2014-05-27 15:27:13 -07:00
James Zern	515e35cfb1	Merge "add stub dsp/enc_avx2.c"	2014-05-22 18:28:38 -07:00
skal	a05dc1402c	SSE2: yuv->rgb speed-up for point-sampling - use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined) - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR - clean-up and harmonize the WebpUpsamplers[] usage. Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d	2014-05-22 09:56:47 +02:00
James Zern	178e9a69ae	add stub dsp/enc_avx2.c VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure flag should be added to avoid the stub when avx2 is unavailable/disabled Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108	2014-05-22 00:31:46 -07:00
James Zern	e46a247c87	cpu: fix check for __cpuidex availability __cpuidex was added in VS2008 /SP1/ Change-Id: Ie49b00b0246bd6537c0ed583412f17d6fd135baa	2014-05-21 22:59:47 -07:00
James Zern	541784c710	dsp.h: add a check for AVX2 / define WEBP_USE_AVX2 Change-Id: I90cc870f0bb4426af701779c367587dc2ae79c8b	2014-05-21 20:46:28 -07:00
James Zern	bdb151ee80	dsp/cpu: add AVX2 detection currently unused. https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf Change-Id: I314200f890c58b9a587b902b214f90deb95f0579	2014-05-20 22:48:54 -07:00
Pascal Massimino	a2f8b28905	revamp the point-sampling functions by processing a full plane -nofancy is slower than fancy upsampler, because the latter has SSE2 optim. Change-Id: Ibf22e5a8ea1de86a54248d4a4ecc63d514c01b88	2014-05-20 15:13:44 -07:00
pascal massimino	2b5cb32612	Merge "dsp/cpu: add AVX detection"	2014-05-20 01:10:18 -07:00
James Zern	df08e67e06	dsp/cpu: add AVX detection currently unused. https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions similar checks exist in ffmpeg, libyuv. the visual studio inline asm is based off of libyuv. Change-Id: I3e233de3492172434e482607a94b99c617f11aad	2014-05-20 00:25:12 -07:00
Pascal Massimino	e2f405c969	Merge "clean-up and slight speed-up in-loop filtering SSE2"	2014-05-20 00:08:40 -07:00
Pascal Massimino	f60957bfd2	clean-up and slight speed-up in-loop filtering SSE2 * remove some sign-bit flipping * turn some macro into inline functions * fix some 'const' in signatures * clarify the int8/uint8 usage Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d	2014-05-19 23:23:47 -07:00
James Zern	a577b23a0a	dsp/WEBP_USE_NEON: test for __aarch64__ __ARM_NEON__ is unset by current linux gcc/clang + android toolchains for aarch64/arm64 builds. Change-Id: Ib2ca172ea6fcf046e4ced19a431088674c99b7f6	2014-05-14 00:07:13 -07:00
James Zern	1ba61b09f9	enable NEON intrinsics in aarch64 builds avoids functions that use vtbl? as in iOS builds these are marked unavailable Change-Id: I17aedc3c7dc8f1d5be0941205de0b22c3772ef1b	2014-05-03 12:37:42 -07:00
James Zern	b9d2bb67d6	dsp/neon.h: coalesce intrinsics-related defines Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946	2014-05-03 11:34:07 -07:00
James Zern	bd6b8619dd	dsp/lossless: prevent signed int overflow in left shift ops force unsigned when shifting by 24. Change-Id: I453601f33fdf01c516ef66ad23399ae6cbe032b3	2014-04-30 00:10:49 -07:00
Djordje Pesut	38e2db3e16	MIPS: MIPS32r1: Added optimization for HistogramAdd. Change-Id: I39622a9c340c4090f64dd10e515c4ef2aa21d10a	2014-04-29 08:36:51 -07:00
Vikas Arora	6d6865f0db	Added SSE2 variants for Average2/3/4 The predictors based on Average2 are tad slower. Following is the performance data for these predictors normalized to number of instruction cycles (as per valgrind) per operation: - Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11 instruction cycles for the C version. - Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12 instruction cycles for the C version. The predictors based on Average4 is faster and Average3 is tad slower: - Predictor10 (Average4) now takes 23 instruction cycles compared to 25 instruction cycles for the C version. - Predictor5 (Average3) now takes 20 instruction cycles compared to 18 instruction cycles for the C version. Maybe SSE2 version of Average2 can be improved further. Otherwise, we can remove the SSE2 version and always fallback to the C version. Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029	2014-04-28 14:47:30 -07:00
Pascal Massimino	b3a616b356	make HistogramAdd() a pointer in dsp * merged the two HistogramAdd/AddEval() into a single call (with detection of special case when b==out) * added a SSE2 variant * harmonize the histogram type to 'uint32_t' instead of just 'int'. This has a lot of ripples on signatures. * 1-2% faster Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306	2014-04-28 10:09:34 -07:00
James Zern	c8bbb636ea	dec_neon: relocate some inline-asm defines move simple loop filter defines closer to their use and LOAD* to a location common with the intrinsics Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3	2014-04-28 00:41:42 -07:00
James Zern	4e393bb9f1	dec_neon: enable intrinsics-only functions the complex loop filter has no inline equivalent; the simple loop filter remains conditional on USE_INTRINSICS: it's left undefined for now. Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2	2014-04-28 00:39:46 -07:00
James Zern	ba99a922ab	dec_neon: use positive tests for USE_INTRINSICS makes Simple* layout consistent with the rest of the file Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6	2014-04-28 00:38:47 -07:00
James Zern	a7828e8bdb	dec_neon: make WORK_AROUND_GCC conditional on version Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db	2014-04-28 00:01:19 -07:00
pascal massimino	3f3d717a6c	Merge "enc_neon: enable intrinsics-only functions"	2014-04-27 02:05:53 -07:00
pascal massimino	de3cb6c820	Merge "move LOCAL_GCC_VERSION def to dsp.h"	2014-04-27 02:04:08 -07:00
pascal massimino	ca49e7ad97	Merge "enc_neon: move Transpose4x4 to dsp/neon.h"	2014-04-27 01:11:05 -07:00
James Zern	42b35e086b	enc_neon: enable intrinsics-only functions CollectHistogram / SSE* / QuantizeBlock have no inline equivalents, enable them where possible and use USE_INTRINSICS to control borderline cases: it's left undefined for now. Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155	2014-04-26 19:09:04 -07:00
James Zern	f937e01261	move LOCAL_GCC_VERSION def to dsp.h + add LOCAL_GCC_PREREQ and use it in lossless_neon.c Change-Id: Ic9fd99540bc3e19c027d1598e4530dfdc9b9de00	2014-04-26 19:09:04 -07:00
James Zern	5e1a17ef4b	enc_neon: move Transpose4x4 to dsp/neon.h + reuse it in TransformWHT() Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989	2014-04-26 14:06:04 -07:00
James Zern	c7b92a5a29	dec_neon: (WORK_AROUND_GCC) delete unused Load4x8 using this in Load4x16 was slightly slower and didn't help mitigate any of the remaining build issues with 4.6.x. Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51	2014-04-26 12:36:14 -07:00
Djordje Pesut	1d62acf6af	MIPS: MIPS32r1: Added optimization for HuffmanCost functions. HuffmanCost and HuffmanCostCombined optimized and added 'const' to some variables from ExtraCost functions. Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d	2014-04-24 11:14:57 +02:00
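
Two commits in the log above (db4860b355 and bd6b8619dd) fix the same class of bug: left-shifting a value of type int far enough that the result no longer fits in a signed int, which is undefined behavior in C. The sketch below illustrates the general fix of casting to an unsigned type before shifting. It is not the libwebp code: the helper names PackARGB and CombineMasks are hypothetical, and the int parameters of CombineMasks only stand in for the 16-bit masks that _mm_movemask_epi8 returns as int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs four 8-bit channels into one 32-bit ARGB word.
 * A uint8_t operand is promoted to (signed) int before shifting, so
 * 'a << 24' overflows when a >= 0x80. Casting first keeps the shift unsigned. */
static uint32_t PackARGB(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
         ((uint32_t)g << 8) | (uint32_t)b;
}

/* Hypothetical helper: joins two 16-bit masks held in ints into one
 * 32-bit mask. 'hi << 16' on an int can overflow when bit 15 of hi is set,
 * so the value is converted to uint32_t before the shift. */
static uint32_t CombineMasks(int lo, int hi) {
  assert(lo >= 0 && lo <= 0xffff);  /* assumed precondition: 16-bit mask */
  assert(hi >= 0 && hi <= 0xffff);  /* assumed precondition: 16-bit mask */
  return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

For example, PackARGB(0xff, 0, 0, 0) sets the top bit of the result, which is exactly the case where a signed shift by 24 would be undefined.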

273 Commits