libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2026-04-10 06:40:02 +02:00

Author	SHA1	Message	Date
James Zern	a9cf31913c	cosmetics: update thread.h comments WebPWorker*() are now part of WebPWorkerInterface; refer to them with unadorned names. Change-Id: Iae1dd59f1e545cba6dd8c18f26ba60eb9a84419b	2014-06-19 19:31:10 -07:00
levytamar82	27bfeee43a	QuantizeBlock SSE2 Optimization: Another store-to-load forward block was detected coming from the function FTransform. FTransform saves the output data 4 times, 8 bytes each. When this data is later loaded by the QuantizeBlock function in one chunk of 16 bytes, it causes a store-to-load forward block. The fix was done in the FTransform function, where each two consecutive 8-byte outputs were merged into one 16-byte register and saved to memory. This fix gives ~21% function level gain and 1.6% user level gain. Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992	2014-06-18 16:22:00 -07:00
skal	c5c6b408b1	Merge "add alpha dithering for lossy"	2014-06-18 07:32:24 -07:00
James Zern	6e93317f5b	muxread: fix out of bounds read ChunkVerifyAndAssign() expects to have at least 8 bytes to work with, but was only checking for the presence of 4. Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f	2014-06-17 12:53:28 -07:00
skal	bbe32df1e3	add alpha dithering for lossy new options: dwebp -alpha_dither vwebp -noalphadither When the source was marked as quantized, we use a threshold-averaging filter to smooth the decoded alpha plane. Note: this option forces the decoding of alpha data in one pass, and might slow the decoding a bit. The new field in WebPDecoderOptions struct is 'alpha_dithering_strength' (0 by default, means: off). Max strength value is '100'. Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a	2014-06-14 00:06:16 +02:00
skal	790207679d	Merge "make error-code reporting consistent upon malloc failure"	2014-06-13 00:25:30 -07:00
skal	77bf4410f7	make error-code reporting consistent upon malloc failure Sometimes, the error-code was not set correctly. We now return OUT_OF_MEMORY every time it's appropriate (tested using MALLOC_FAIL_AT mechanism). Took the opportunity to clean up the code and dust off the error codes returned (some were erroneously set to INVALID_CONFIGURATION) Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c	2014-06-13 08:45:12 +02:00
James Zern	7a93c000ee	**/Makefile.am: remove unused AM_CPPFLAGS only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former getting precedence when it's defined. configure's DEFAULT_INCLUDES is covering what's necessary given the include paths are all source relative. Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3	2014-06-12 11:59:05 -07:00
skal	24e3080571	Add an interface abstraction to the WebP worker thread implementation This allows custom implementations of the threading mechanism. Patch by Leonhard Gruenschloss. Change-Id: Id8ea5917acd2f24fa8bce79748d1747de2751614	2014-06-12 11:35:44 +02:00
James Zern	32b3137936	configure: move config.h to src/webp/config.h this change has the side-effect of using directory names in the include, silencing a lint warning. Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02	2014-06-10 23:42:00 -07:00
James Zern	90090d99b5	Merge changes I7c675e51,I84f7d785 * changes: configure: test for -msse2 rename upsampling_mips32.c to yuv_mips32.c	2014-06-10 16:15:21 -07:00
skal	69fce2ea78	remove the special casing for res->first in VP8SetResidualCoeffs if res->first = 1, coeffs[0]=0 because of quant.c:749 and line added at quant.c:744 So, no need for the extra case. Going forward, TrellisQuantizeBlock() should also be calling a variant of VP8SetResidualCoeffs() to set the 'last' field. also: fixes a warning for win64 + slight speed-up Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0	2014-06-08 06:40:22 +02:00
James Zern	6e61a3a905	configure: test for -msse2 + add a WEBP_HAVE_SSE2 to dsp.h not all 32-bit toolchain configurations will have sse2 enabled by default Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e	2014-06-07 19:44:08 -07:00
James Zern	b9d2efc629	rename upsampling_mips32.c to yuv_mips32.c matches yuv_sse2 added in; `bdfeeba` dsp/yuv: move sse2 functions to yuv_sse2.c Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3	2014-06-07 12:35:47 -07:00
James Zern	bdfeebaa01	dsp/yuv: move sse2 functions to yuv_sse2.c Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000	2014-06-05 23:52:54 -07:00
pascal massimino	46b32e861a	Merge "configure: set WEBP_HAVE_AVX2 when available"	2014-06-05 02:57:42 -07:00
pascal massimino	88305db4fc	Merge "VP8RandomBits2: prevent signed int overflow"	2014-06-05 01:46:42 -07:00
James Zern	73fee88c4a	VP8RandomBits2: prevent signed int overflow 'diff' at its largest may be INT_MAX; << 1 of anything at or above 1 << 30 will overflow. Change-Id: Idb2b5a9b55acc2f6d5e32be8baaebee3f89919ad	2014-06-04 23:19:03 -07:00
James Zern	db4860b355	enc_sse2: prevent signed int overflow _mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed int. Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda	2014-06-04 23:18:22 -07:00
James Zern	230a055501	configure: set WEBP_HAVE_AVX2 when available this is used to set WEBP_USE_AVX2 in files where the build flag won't be used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f	2014-06-03 23:29:23 -07:00
skal	a2ac8a420e	restore original value_/range_ field order no speed change, just for coherency Change-Id: Iaa395bca24f33a14b68ba6920b838ef87d0d0db6	2014-06-03 09:36:56 +02:00
James Zern	5e2ee56fdd	Merge "remove libwebpdspdecode dep on libwebpdsp_avx2"	2014-06-03 00:28:54 -07:00
James Zern	61362db57c	remove libwebpdspdecode dep on libwebpdsp_avx2 it's encode only, libwebpdecoder doesn't need the symbols Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83	2014-06-03 00:05:56 -07:00
Pascal Massimino	42c447aeb0	Merge "lossy bit-reader clean-up:"	2014-06-02 23:53:00 -07:00
James Zern	479ffd8b5d	Merge "remove unused #include's"	2014-06-02 23:07:20 -07:00
James Zern	9754d39a4e	Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)"	2014-06-02 23:06:18 -07:00
skal	158aff9bb9	remove unused #include's Change-Id: Icd91a4b6a0bde49145f57e3e74a997822c45792c	2014-06-03 08:01:05 +02:00
Pascal Massimino	09545eeadc	lossy bit-reader clean-up: * remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now. * simplify VP8GetSigned(), and add some masking branch-less code. Much faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook. * split critical implementation into separate bit_reader_inl.h file that is only included where needed (vp8.c / tree.c / bit_reader.c) * bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster. Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8	2014-06-03 07:46:55 +02:00
skal	ea8b0a171d	strong filtering speed-up (~2-3% x86, ~1-2% for NEON) Extract loop invariant and avoid storing/loading samples if they can be re-used. This is particularly interesting when a transpose is involved (HFilter16i). Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139	2014-06-03 07:14:23 +02:00
skal	6679f8996f	Optimize VP8SetResidualCoeffs. Brings down WebP lossy encoding timings by 5% Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489	2014-06-03 06:44:04 +02:00
James Zern	4dfa86b29c	dsp/cpu: NaCl has no support for xgetbv or the raw opcode; fixes: 934ed4: unrecognized instruction Change-Id: I981870baf0e8b03bf40144ea8ec25eff140d5bc3	2014-05-29 23:02:23 -07:00
skal	c9b340a279	fix missing WebPInitAlphaProcessing call for premultiplied colorspace output (lossless only) Change-Id: Ic2d01c8cf9bc1082f07f348733461eb2ee30288a	2014-05-28 10:44:05 +02:00
pascal massimino	57897bae09	Merge "lossless_neon: use vcreate_*() where appropriate"	2014-05-28 01:36:13 -07:00
pascal massimino	6aa4777b39	Merge "(enc\|dec)_neon: use vcreate_*() where appropriate"	2014-05-28 01:34:56 -07:00
skal	0d346e418d	Always reinit VP8TransformWHT instead of hard-coding Change-Id: I2012749ed29bd166d2a96555372f0d9baa784385	2014-05-28 10:21:07 +02:00
James Zern	bf0e003067	lossless_neon: use vcreate_*() where appropriate this is more portable than {} initialization. more involved cases are left for a follow-up. Change-Id: If7e111864f287ea0a5de6311454aeda37afbb52a	2014-05-27 16:27:46 -07:00
James Zern	9251c2f6d2	(enc\|dec)_neon: use vcreate_*() where appropriate this is more portable than {} initialization. more involved cases are left for a follow-up. Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17	2014-05-27 16:26:56 -07:00
skal	399b916d27	lossy decoding: correct alpha-rescaling for YUVA format The luminance needs to be pre- and post- multiplied by the alpha value in case of rescaling, for proper averaging. Also: - removed util/alpha_processing and moved it to dsp/ - removed WebPInitPremultiply() which was mostly useless and merged it with the new function WebPInitAlphaProcessing() Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60	2014-05-27 15:27:13 -07:00
Pascal Massimino	fdbcd44dd3	simplify VP8LInitBitReader() gcc was generating very complex code, one for each case of br->len_ values! also, pretty-fy the mask constants Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42	2014-05-22 21:44:16 -07:00
James Zern	515e35cfb1	Merge "add stub dsp/enc_avx2.c"	2014-05-22 18:28:38 -07:00
skal	a05dc1402c	SSE2: yuv->rgb speed-up for point-sampling - use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined) - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR - clean-up and harmonize the WebpUpsamplers[] usage. Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d	2014-05-22 09:56:47 +02:00
James Zern	178e9a69ae	add stub dsp/enc_avx2.c VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure flag should be added to avoid the stub when avx2 is unavailable/disabled Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108	2014-05-22 00:31:46 -07:00
James Zern	e46a247c87	cpu: fix check for __cpuidex availability __cpuidex was added in VS2008 /SP1/ Change-Id: Ie49b00b0246bd6537c0ed583412f17d6fd135baa	2014-05-21 22:59:47 -07:00
skal	176fda2650	fix the bit-writer for lossless in 32bit mode Sometimes, we can write 18 bits or more at a time, and it would overflow the 32-bit accumulator. Also clarified the num-bits limitations (and exposed VP8L_MAX_NUM_BIT_READ in bit_reader.h) fixes http://code.google.com/p/webp/issues/detail?id=200 Seems a bit faster (use of local fields for bits_ / used_) also: added the __QNX__ bswap while at it. Change-Id: I876db93a931db15b083cf1d838c70105effa7167	2014-05-22 07:19:22 +02:00
James Zern	541784c710	dsp.h: add a check for AVX2 / define WEBP_USE_AVX2 Change-Id: I90cc870f0bb4426af701779c367587dc2ae79c8b	2014-05-21 20:46:28 -07:00
James Zern	bdb151ee80	dsp/cpu: add AVX2 detection currently unused. https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf Change-Id: I314200f890c58b9a587b902b214f90deb95f0579	2014-05-20 22:48:54 -07:00
Pascal Massimino	ab9f2f8685	Merge "revamp the point-sampling functions by processing a full plane"	2014-05-20 15:21:31 -07:00
Pascal Massimino	a2f8b28905	revamp the point-sampling functions by processing a full plane -nofancy is slower than fancy upsampler, because the latter has SSE2 optim. Change-Id: Ibf22e5a8ea1de86a54248d4a4ecc63d514c01b88	2014-05-20 15:13:44 -07:00
Pascal Massimino	ef076026af	use decoder's DSP functions for autofilter -af is now faster (6-7%), since we're using the SSE2 variant Output is binary the same as before. Change-Id: If75694594c9501cd486b8f237a810ddcc145cadd	2014-05-20 14:55:05 -07:00
pascal massimino	2b5cb32612	Merge "dsp/cpu: add AVX detection"	2014-05-20 01:10:18 -07:00
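
The two signed-overflow fixes listed above (73fee88c4a and db4860b355) describe a common C pitfall: left-shifting a signed int so that a bit reaches or passes the sign bit is undefined behavior. The sketch below illustrates the general remedy of doing the shift in unsigned arithmetic. It is not the code from those commits; `CombineMasks` and `DoubleUnsigned` are hypothetical helpers invented for this illustration, and the 16-bit mask inputs only stand in for the kind of value `_mm_movemask_epi8` returns.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of libwebp): packs two 16-bit masks into
 * one 32-bit word. If hi_mask has bit 15 set, `hi_mask << 16` on a signed
 * int would overflow, so the value is converted to unsigned first. */
static uint32_t CombineMasks(int lo_mask, int hi_mask) {
  const uint32_t lo = (uint32_t)lo_mask & 0xffffu;
  const uint32_t hi = (uint32_t)hi_mask & 0xffffu;
  return (hi << 16) | lo;
}

/* Hypothetical helper: doubles a non-negative int. For any v >= 1 << 30,
 * `v << 1` on a signed int overflows; the unsigned shift is well defined. */
static uint32_t DoubleUnsigned(int v) {
  return (uint32_t)v << 1;
}

/* Small self-check exercising the boundary values mentioned above. */
static void SelfCheck(void) {
  assert(CombineMasks(0x0000, 0x8000) == 0x80000000u);
  assert(DoubleUnsigned(1 << 30) == 0x80000000u);
}
```

Calling `SelfCheck()` from a test driver exercises the two boundary cases. The design choice is to keep the value unsigned for as long as the high bit matters, and convert back to a signed type only where the range is known to fit.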


1325 Commits