libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2026-04-09 22:30:02 +02:00

Author	SHA1	Message	Date
skal	dc5b122f23	try to remove the spurious warning for static analysis Change-Id: Ib81f16c70a0bfad05021401c1cf6788c974b63bd	2014-05-26 18:31:00 +02:00
Pascal Massimino	a891164398	Merge "simplify VP8LInitBitReader()"	2014-05-22 22:36:41 -07:00
Pascal Massimino	fdbcd44dd3	simplify VP8LInitBitReader() gcc was generating very complex code, one for each case of br->len_ values! also, pretty-fy the mask constants Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42	2014-05-22 21:44:16 -07:00
James Zern	7c004287af	makefile.unix: add rudimentary avx2 support $ make -f makefile.unix HAVE_AVX2=1 will define -mavx2 for src/dsp/*_dsp.c Change-Id: Id9651bda54da057cb051dc70f7dcd008a3f803f4	2014-05-22 18:38:40 -07:00
James Zern	515e35cfb1	Merge "add stub dsp/enc_avx2.c"	2014-05-22 18:28:38 -07:00
skal	a05dc1402c	SSE2: yuv->rgb speed-up for point-sampling - use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined) - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR - clean-up and harmonize the WebpUpsamplers[] usage. Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d	2014-05-22 09:56:47 +02:00
James Zern	178e9a69ae	add stub dsp/enc_avx2.c VP8EncDspInitAVX2 is included in sse2 builds for now; later, a configure flag should be added to avoid the stub when avx2 is unavailable/disabled Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108	2014-05-22 00:31:46 -07:00
James Zern	1b99c09cdc	Merge "configure: add a test for -mavx2"	2014-05-22 00:30:10 -07:00
James Zern	fe72807112	configure: add a test for -mavx2 sets AVX2_FLAGS; currently unused Change-Id: Ie07ee6c2fa7c1f0748430010a9f207b1723b6def	2014-05-21 23:17:21 -07:00
James Zern	e46a247c87	cpu: fix check for __cpuidex availability __cpuidex was added in VS2008 /SP1/ Change-Id: Ie49b00b0246bd6537c0ed583412f17d6fd135baa	2014-05-21 22:59:47 -07:00
skal	176fda2650	fix the bit-writer for lossless in 32bit mode Sometimes, we can write 18 bits or more at a time, and it would overflow the 32-bit accumulator. Also clarified the num-bits limitations (and exposed VP8L_MAX_NUM_BIT_READ in bit_reader.h) fixes http://code.google.com/p/webp/issues/detail?id=200 Seems a bit faster (use of local fields for bits_ / used_) also: added the __QNX__ bswap while at it. Change-Id: I876db93a931db15b083cf1d838c70105effa7167	2014-05-22 07:19:22 +02:00
James Zern	541784c710	dsp.h: add a check for AVX2 / define WEBP_USE_AVX2 Change-Id: I90cc870f0bb4426af701779c367587dc2ae79c8b	2014-05-21 20:46:28 -07:00
James Zern	bdb151ee80	dsp/cpu: add AVX2 detection currently unused. https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf Change-Id: I314200f890c58b9a587b902b214f90deb95f0579	2014-05-20 22:48:54 -07:00
Pascal Massimino	ab9f2f8685	Merge "revamp the point-sampling functions by processing a full plane"	2014-05-20 15:21:31 -07:00
Pascal Massimino	a2f8b28905	revamp the point-sampling functions by processing a full plane -nofancy is slower than fancy upsampler, because the latter has SSE2 optim. Change-Id: Ibf22e5a8ea1de86a54248d4a4ecc63d514c01b88	2014-05-20 15:13:44 -07:00
Pascal Massimino	ef076026af	use decoder's DSP functions for autofilter -af is now faster (6-7%), since we're using the SSE2 variant. Output is binary-identical to before. Change-Id: If75694594c9501cd486b8f237a810ddcc145cadd	2014-05-20 14:55:05 -07:00
pascal massimino	2b5cb32612	Merge "dsp/cpu: add AVX detection"	2014-05-20 01:10:18 -07:00
James Zern	df08e67e06	dsp/cpu: add AVX detection currently unused. https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions similar checks exist in ffmpeg, libyuv. the visual studio inline asm is based off of libyuv. Change-Id: I3e233de3492172434e482607a94b99c617f11aad	2014-05-20 00:25:12 -07:00
Pascal Massimino	e2f405c969	Merge "clean-up and slight speed-up in-loop filtering SSE2"	2014-05-20 00:08:40 -07:00
Pascal Massimino	f60957bfd2	clean-up and slight speed-up in-loop filtering SSE2 * remove some sign-bit flipping * turn some macros into inline functions * fix some 'const' in signatures * clarify the int8/uint8 usage Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d	2014-05-19 23:23:47 -07:00
James Zern	9fc3ae469f	.gitattributes: treat .ppm as binary Change-Id: I4da7b846f6255078f0ce97fc7e8df9f29271f52a	2014-05-15 23:18:35 -07:00
James Zern	3da924b5b4	Merge "dsp/WEBP_USE_NEON: test for __aarch64__"	2014-05-14 20:16:18 -07:00
James Zern	c7164490da	Android.mk: always include *_neon.c in the build the inclusion of the files is harmless when NEON is not enabled and will allow them to be built with NEON for APP_ABI=arm64-v8a which currently does not use the '.neon' suffix Change-Id: I39377876b1b68822c38f4e2396da93c56145fc0f	2014-05-14 00:11:46 -07:00
James Zern	a577b23a0a	dsp/WEBP_USE_NEON: test for __aarch64__ __ARM_NEON__ is unset by current linux gcc/clang + android toolchains for aarch64/arm64 builds. Change-Id: Ib2ca172ea6fcf046e4ced19a431088674c99b7f6	2014-05-14 00:07:13 -07:00
skal	54bfffcabc	move RemapBitReader() from idec.c to bit_reader code mostly for coherency and later patch. Change-Id: Ica8352d67845b6c5b3153435edfb4646c6f24341	2014-05-14 07:07:08 +02:00
James Zern	34168ecbe4	Merge "remove all unused layer code"	2014-05-08 22:51:13 -07:00
Pascal Massimino	f1e771735a	remove all unused layer code Change-Id: I220590162b24c70f404fe3087f19dd3e6cac3608	2014-05-08 22:37:38 -07:00
Vikas Arora	b0757db7c6	Code cleanup for VP8LGetHistoImageSymbols. Fix comments and a few nits. Change-Id: I8fa25ed523f12c6a7bfe125f0e4d638466ba4304	2014-05-08 14:13:47 -07:00
skal	5fe628d35d	make the token page size be variable instead of fixed 8192 also changed the token-page layout a little bit to remove a not-needed field. This reduces the number of malloc()/free() calls substantially with minimal increase in memory consumption (~2%). For the tail of large sources, the number of malloc calls goes typically from ~10000 to ~100 (e.g.: bryce_big.jpg: 22711 -> 105) Change-Id: Ib847f41e618ed8c303d26b76da982fbc48de45b9	2014-05-05 14:26:14 -07:00
skal	f948d08c81	memory debug: allow setting pre-defined malloc failure points MALLOC_FAIL_AT flag can be used to set-up a pre-determined failure point during malloc calls. The counter value is retrieved using getenv(). Example usage: export MALLOC_FAIL_AT=37 && cwebp input.png will make 'cwebp' report a memory allocation error the 37th time malloc() or calloc() is called. MALLOC_MEM_LIMIT can be used similarly to prevent allocating more than a given amount of memory. This is usually less convenient to use than MALLOC_FAIL_AT since one has to know in advance the typical memory size allocated. Both these flags are meant to be used for debugging only! Also: added a 'total_mem_allocated' to record the overall memory allocated Change-Id: I9d408095ee7d76acba0f3a31b1276fc36478720a	2014-05-05 14:01:33 -07:00
skal	ca3d746e39	use block-based allocation for backward refs storage, and free-lists Non-photo sources produce far fewer literal references, and their buffer is usually much smaller than the picture size if it compresses well. Hence, use a block-based allocation (and recycling) to avoid pre-allocating a buffer with maximal size. This can reduce memory consumption by up to 50% for non-photographic content. Encode speed is also a little better (1-2%) Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11	2014-05-05 11:11:55 -07:00
James Zern	1ba61b09f9	enable NEON intrinsics in aarch64 builds avoids functions that use vtbl? as in iOS builds these are marked unavailable Change-Id: I17aedc3c7dc8f1d5be0941205de0b22c3772ef1b	2014-05-03 12:37:42 -07:00
James Zern	b9d2bb67d6	dsp/neon.h: coalesce intrinsics-related defines Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946	2014-05-03 11:34:07 -07:00
James Zern	b5c7525897	iosbuild: add support for iOSv7/aarch64 Change-Id: I3a51c77276e245cd871acb18d9d70d109aac000b	2014-05-03 11:14:37 -07:00
Vikas Arora	9383afd5c7	Reduce number of memory allocations while decoding lossless. This change reduces the number of calls to WebPSafeMalloc from 200 to 100. The overall memory consumption is down 3% for Lenna image. Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd	2014-05-02 01:01:43 -07:00
James Zern	888e63edc9	Merge "dsp/lossless: prevent signed int overflow in left shift ops"	2014-05-02 00:29:54 -07:00
Pascal Massimino	8137f3edbd	Merge "instrument memory allocation routines for debugging"	2014-05-02 00:23:48 -07:00
Pascal Massimino	2aa187360d	instrument memory allocation routines for debugging Some tracing code is activated by PRINT_MEM_INFO flag. For debugging only! (not thread-safe, and slow). Change-Id: I282c623c960f97d474a35b600981b761ef89ace9	2014-05-02 00:19:55 -07:00
skal	d3bcf72bf5	Don't allocate VP8LHashChain, but treat it like an automatic object the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_) is now wholly part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain in the encoder. Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677	2014-04-30 14:10:48 -07:00
James Zern	bd6b8619dd	dsp/lossless: prevent signed int overflow in left shift ops force unsigned when shifting by 24. Change-Id: I453601f33fdf01c516ef66ad23399ae6cbe032b3	2014-04-30 00:10:49 -07:00
James Zern	b7f19b8311	Merge "dec/vp8l: prevent signed int overflow in left shift ops"	2014-04-29 15:56:54 -07:00
Pascal Massimino	29059d5178	Merge "remove some uint64_t casts and use."	2014-04-29 14:15:40 -07:00
James Zern	e69a1df4b7	dec/vp8l: prevent signed int overflow in left shift ops force unsigned when shifting by 24. Change-Id: I6f9ca5fa2109e59b1d46a909136384fc6dc8ca0b	2014-04-29 14:12:38 -07:00
Pascal Massimino	cf5eb8ad19	remove some uint64_t casts and use. We use automatic int->uint64_t promotion where applicable. (uint64_t should be kept only for overflow checking and memory alloc). Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125	2014-04-29 09:08:25 -07:00
Djordje Pesut	38e2db3e16	MIPS: MIPS32r1: Added optimization for HistogramAdd. Change-Id: I39622a9c340c4090f64dd10e515c4ef2aa21d10a	2014-04-29 08:36:51 -07:00
James Zern	e0609ade15	dwebp: fix exit code on webp load failure on ExUtilLoadWebP() failure, no allocated memory will be returned, so it's safe to exit immediately. additionally, any webp specific problems will already have been reported as part of the call to WebPGetFeatures(). broken since: `4a0e739` dwebp: move webp decoding to example_util Change-Id: Ibc632015a1f52bae7f96d063252624123fa7c2da	2014-04-28 17:22:29 -07:00
James Zern	bbd358a8e7	Merge "example_util.h: avoid forward declaring enums"	2014-04-28 15:10:12 -07:00
James Zern	8955da2149	example_util.h: avoid forward declaring enums doing so is not part of ISO C; removes some pedantic warnings. use webp/decode.h to pickup VP8StatusCode instead. Change-Id: I19b35e0f8a36fb7c45944ae9ca86838e08b90548	2014-04-28 14:56:19 -07:00
Vikas Arora	6d6865f0db	Added SSE2 variants for Average2/3/4 The predictors based on Average2 are a tad slower. Following is the performance data for these predictors, normalized to the number of instruction cycles (as per valgrind) per operation: - Predictor6 & Predictor7 now take 15 instruction cycles compared to 11 instruction cycles for the C version. - Predictor8 & Predictor9 now take 15 instruction cycles compared to 12 instruction cycles for the C version. The predictor based on Average4 is faster, and the one based on Average3 is a tad slower: - Predictor10 (Average4) now takes 23 instruction cycles compared to 25 instruction cycles for the C version. - Predictor5 (Average3) now takes 20 instruction cycles compared to 18 instruction cycles for the C version. Maybe the SSE2 version of Average2 can be improved further. Otherwise, we can remove the SSE2 version and always fall back to the C version. Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029	2014-04-28 14:47:30 -07:00
Pascal Massimino	b3a616b356	make HistogramAdd() a pointer in dsp * merged the two HistogramAdd/AddEval() into a single call (with detection of special case when b==out) * added a SSE2 variant * harmonize the histogram type to 'uint32_t' instead of just 'int'. This has a lot of ripples on signatures. * 1-2% faster Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306	2014-04-28 10:09:34 -07:00


2006 Commits