libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-15 21:39:59 +02:00

Author	SHA1	Message	Date
James Zern	e15afbce5d	dsp.h: fix ubsan macro name. Copy and paste error in the previous commit: change no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF -> WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW. Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32	2016-05-13 11:09:57 -07:00
James Zern	e53c9ccb24	dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW for suppressing expected failures with -fsanitize=integer Change-Id: I954cba45f0c96478b770ed7a6ac7491359cae075	2016-05-12 23:51:23 -07:00
James Zern	ea0be354a0	dsp.h: remove utils.h include. Include utils.h directly where needed, to allow utils.h to rely on defines from dsp.h in a follow-up. Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d	2016-05-11 23:17:21 -07:00
James Zern	ea24e026aa	Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF"	2016-05-11 06:21:45 +00:00
James Zern	369e264e2e	dsp.h: add WEBP_UBSAN_IGNORE_UNDEF. Only defined when WEBP_FORCE_ALIGNED isn't. Use it to quiet alignment warnings in VP8LoadNewBytes(). Change-Id: I710a74bb9375285974e97022540551a3f4eda414	2016-05-10 22:45:13 -07:00
James Zern	0d020a7892	Merge "add runtime NEON detection"	2016-05-11 05:42:13 +00:00
James Zern	5ee2136a71	Merge "add VP8LAddPixels() to lossless.h"	2016-05-10 22:39:29 +00:00
Pascal Massimino	47435a6162	add VP8LAddPixels() to lossless.h Change-Id: I67f9118f875affa32c47adfedf9df28b0ac9957b	2016-05-10 20:30:30 +00:00
Pascal Massimino	8fa6ac68f0	remove two ubsan warnings (regarding uint overflow) Change-Id: I1a76e4b1268370b6b7d6a1aa93b99e57f55fd02e	2016-05-10 18:40:18 +00:00
James Zern	74fb56fb5d	add runtime NEON detection. configure gets 2 new options: --enable-neon / --enable-neon-rtcd. The NEON modules are split into their own convenience lib and built with auto-detected flags if none are given via CFLAGS. The /proc/cpuinfo check will only be used for armv7 targets whose toolchain does not enable NEON by default or didn't have NEON forced by the CFLAGS from the environment. Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5	2016-05-06 15:32:48 -07:00
Jovan Zelincevic	4154a8395d	MIPS update to new Unfilter API Change-Id: I2b5960812954dfcabc84663382b9e032fd1eeb43	2016-05-05 15:50:34 +02:00
Pascal Massimino	2102ccd091	update the Unfilter API in dsp to process one row independently. This will allow working in place on a cropped area later. Also sped up the inverse gradient filtering in SSE2 (~4%). Change-Id: I463149eee95d36984328f163a1e17f8cabd87441	2016-04-21 08:10:45 +00:00
James Zern	875aec7044	enc_neon,cosmetics: break long comment Change-Id: I88dff0271fef1cc6dd5888572bfe0f09f467b028	2016-03-08 23:33:21 -08:00
Pascal Massimino	a90edffb7e	fix missing 'extern' for SSIM function in dsp/ Change-Id: Id8143120f01065dc088f4e90bd930f8ea7c3ae5a	2016-03-08 10:27:46 -08:00
Pascal Massimino	423ecaf484	move some SSIM-accumulation functions to dsp/. This is in preparation for some SSE2 code. And generally speaking, the whole SSIM code needs some revamp: we're not averaging the SSIM value at each pixel but just computing the overall SSIM value once, for the whole plane. The former might be better than the latter. Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5	2016-03-08 07:50:09 +01:00
James Zern	0d40cc5ea3	enc_neon,Disto4x4: remove an unnecessary transpose based on the sse2 change in: `9960c31` Remove an unnecessary transposition in TTransform. ~9-10.5% faster at the function-level, < 1% overall Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7	2016-03-03 16:18:59 -08:00
Pascal Massimino	6753f35cac	Merge "FTransformWHT optimization."	2016-02-19 09:38:04 +00:00
Vincent Rabaud	6583bb1a42	Improve SSE4.1 implementation of TTransform. SSE4.1 is slower than the SSE2 implementation and this seems to be due to a slow _mm_loadl_epi64 implementation by gcc (hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both got confirmed by IACA and experiments. Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f	2016-02-19 09:11:53 +01:00
Vincent Rabaud	7561d0c338	FTransformWHT optimization. Data is packed sooner in the functions. Change-Id: I018cfeca43f015ac755c7f209f9a97984cc0517b	2016-02-18 17:44:05 +01:00
Vincent Rabaud	8aa352b256	Merge "Remove an unnecessary transposition in TTransform."	2016-02-18 08:15:10 +00:00
Vincent Rabaud	9960c31685	Remove an unnecessary transposition in TTransform. Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e	2016-02-17 21:41:28 +01:00
Vincent Rabaud	6e36b51188	Small speedup in FTransform. It removes two _mm_unpacklo_epi32 and two _mm_sub_epi16. Change-Id: Icdf86259f796ba855d1cda5e9c0e99cb396cb351	2016-02-17 21:26:36 +01:00
Vincent Rabaud	bf2b4f114f	Regroup common SSE code + optimization. The transpose refactoring will help removing a transpose in a later CL. The horizontal add function helps removing a _mm_sad_epu8 in DC8uv => the latency/throughput went from 29/25 to 23/19 Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766	2016-02-16 18:34:34 +01:00
Nico Weber	3ef1ce98b9	yuv_sse2: fix -Wconstant-conversion warning 'implicit conversion from 'int' to 'short' changes value from 33050 to -32486' original patch: https://codereview.chromium.org/1657313003/ Make libwebp build with -Wconstant-conversion from newer clangs. After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang points out that _mm_set1_epi16(33050) causes an overflow in the short argument to _mm_set1_epi16(). Since there's no version that takes an unsigned short, add an explicit cast to tell the compiler that this is intentional. No behavior change. Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b	2016-02-02 14:52:11 -08:00
Pascal Massimino	6c1d763119	avoid Yoda style for comparison Change-Id: I8ff9f96951e5e8a619f7132455dd281cbf91aa4d	2016-01-15 23:52:29 -08:00
Vincent Rabaud	8ce975ac82	SSE optimization for vector mismatch. Change-Id: I564b822033b59d86635230f29ed6197e306a2c4f	2016-01-07 18:23:45 +01:00
Pascal Massimino	7e7b6ccc7f	faster rgb565/rgb4444/argb output SSE2 and NEON implementation. Change-Id: I342a1c3d84937b8497f0aaecb7ce9bdb7f50296b	2015-12-17 23:38:58 -08:00
James Zern	99a01f4f8b	Merge "Unify some entropy functions."	2015-12-17 22:35:29 +00:00
James Zern	4b025f10f7	Merge "configure: disable asserts by default"	2015-12-17 22:28:37 +00:00
Vincent Rabaud	ca509a3362	Unify some entropy functions. The code and logic is unified when computing bit entropy + Huffman cost. Speed-wise, we gain 8% for lossless encoding. Logic-wise, the beginning/end of the distributions are handled properly and the compression ratio does not change much. Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633	2015-12-17 17:00:08 +01:00
Pascal Massimino	b0547ff0b4	move back common constants for lossless_enc*.c into the .h Change-Id: I11bc979db691f6518d85e2e1c3ac7f05d69681b0	2015-12-17 15:11:56 +01:00
Vincent Rabaud	47ddd5a4cc	Move some codec logic out of ./dsp. The functions containing magic constants are moved out of ./dsp: VP8LPopulationCost got put back in ./enc; VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc); VP8LBitsEntropy is now unrefined (refinement happening in ./enc); VP8LHistogramEstimateBits got put back in ./enc; VP8LHistogramEstimateBitsBulk got deleted. Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb	2015-12-17 07:03:25 +00:00
James Zern	357f455dec	yuv_sse2: fix 32-bit visual studio build src\dsp\yuv_sse2.c : C2719: 'in': formal parameter with __declspec(align('16')) won't be aligned src\dsp\yuv_sse2.c : C2719: 'out': formal parameter with __declspec(align('16')) won't be aligned Change-Id: Ifd79e33b35c70748faff19cd64eba4a8ffce5a5a	2015-12-16 15:04:36 -08:00
James Zern	b9d80fa4e8	configure: disable asserts by default --enable-asserts can be used to avoid defining NDEBUG Change-Id: I6216668e3f79f69bd8c453f0b36cecb3b585688e	2015-12-16 13:15:53 -08:00
Vincent Rabaud	80ce27d34e	Speed up 24-bit packing / unpacking in YUV / RGB conversions. This implementation brings: an SSE implementation of packing / unpacking; bigger buffers processed at the same time. The speedup is 4% on lossy decoding (YUV to RGB) and 0.5% on lossy encoding (RGB to YUV was already optimized). Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f	2015-12-16 11:06:42 +01:00
Pascal Massimino	2dee2966df	remove a few obsolete TODOs about aligned loads in SSE2 Change-Id: I3628602942ea2ce34dbcb85975d15afc1041f76c	2015-12-15 23:00:41 -08:00
James Zern	b105921c7d	yuv_sse2, cosmetics: fix indent + remove unneeded header Change-Id: I3247378fd3315d95bb3345625d3575aa9e05c1b8	2015-12-15 17:29:04 -08:00
Sriraman Tallam	b275e598b5	fix optimized build with -mcmodel=medium INFO: From Compiling src/dsp/cpu.c: src/dsp/cpu.c: In function 'x86CPUInfo': src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm' With PIC and mcmodel=medium, the %rbx register must be saved and restored which causes this problem. This was also solved in GCC-4.9 with this patch: https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html Tested: Builds fine with this change. Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf	2015-12-11 16:49:10 -08:00
Vincent Rabaud	2835089d6a	Provide an SSE2 implementation of CombinedShannonEntropy. CombinedShannonEntropy takes 30% for lossless compression. This implementation speeds up the overall process by 2 to 3 %. Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f	2015-12-11 15:12:19 +01:00
Pascal Massimino	202a710b26	fix undefined behaviour during shift, using a cast Change-Id: Ibca261d01092cecf8b37c54e9fcc920c9527c0a9	2015-12-10 08:09:23 +01:00
Pascal Massimino	cb1ce9969c	Merge "10% faster table-less SSE2/NEON version of YUV->RGB conversion"	2015-12-09 10:41:24 +00:00
Pascal Massimino	ac761a3738	10% faster table-less SSE2/NEON version of YUV->RGB conversion * Precision is slightly different * also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc. * removing yuv_tables_sse2.h saved ~8k of binary size * the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away * the NEON code is somewhat tricky Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052	2015-12-08 20:05:56 -08:00
Lode Vandevenne	6938111357	Improved alpha cleanup for the webp encoder when prediction transform is used. Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset: Alpha cleanup before: 18856614 Alpha cleanup after: 18685802 For reference, with no alpha cleanup: 19159992 Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps preprocessing in the encoder, and the cases when the prediction transform is not used. Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28	2015-12-04 13:50:56 +00:00
Pascal Massimino	2c08aac81a	introduce WebPMemToUint32 and WebPUint32ToMem for memory access it uses memcpy() when unaligned memory write is tricky Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874	2015-12-04 13:43:01 +00:00
James Zern	0837512964	Merge "Make a separate case for low_effort in CopyImageWithPrediction"	2015-12-03 08:46:31 +00:00
James Zern	aa2eb2d4a1	Merge "cosmetics: fix indent"	2015-12-03 08:44:54 +00:00
James Zern	b7551e90e1	cosmetics: fix indent Change-Id: I67e5a0308a964bc37b2314d96f3691fc0550e9bc	2015-12-03 00:34:15 -08:00
Lode Vandevenne	5bda52d4e8	Make a separate case for low_effort in CopyImageWithPrediction for more speed. This gives roughly a 1% speedup for low_effort. But actually this is a preparation for the upcoming CL that changes RGB values of transparent pixels based on prediction, which should not be done for low_effort because that would slightly hurt its performance. On 1000 PNGs, with quality 0, method 0: Before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s. After: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s. Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d	2015-12-03 00:22:50 -08:00
Pascal Massimino	363babe255	Merge "fix some warning about unaligned 32b reads"	2015-12-02 10:29:40 +00:00
Vincent Rabaud	829bd14145	Combine Huffman cost and bit entropy into one loop. The same computation was done for both values: go over two buffers, sum them up, and take a decision on the sum at each iteration. MIPS32 code has been disabled for now, pending a code update. Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d	2015-11-30 13:57:25 +01:00
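
Commit 2c08aac81a in the log above introduces WebPMemToUint32 and WebPUint32ToMem and says memcpy() is used where unaligned memory writes are tricky. A minimal sketch of that idea follows. The names LoadU32 and StoreU32 are hypothetical stand-ins chosen for illustration, not the library's definitions, and the values use the host's byte order.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not the libwebp function): read 32 bits from an
// address that may not be 4-byte aligned. memcpy() avoids the undefined
// behaviour of dereferencing a misaligned uint32_t pointer.
static uint32_t LoadU32(const uint8_t* ptr) {
  uint32_t value;
  memcpy(&value, ptr, sizeof(value));
  return value;
}

// Hypothetical helper: write 32 bits to a possibly unaligned address.
static void StoreU32(uint8_t* ptr, uint32_t value) {
  memcpy(ptr, &value, sizeof(value));
}
```

Compilers commonly lower a fixed-size memcpy() like this to a single load or store on targets that tolerate unaligned access, so the portable form usually costs nothing there.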


903 Commits
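
Commits e53c9ccb24 and e15afbce5d describe a macro that applies no_sanitize("unsigned-integer-overflow") to suppress expected failures under -fsanitize=integer. The sketch below shows one way such a macro could be defined and used. The macro name IGNORE_UNSIGNED_OVERFLOW and the function HashStep are hypothetical, and the compiler guard is an assumption rather than the condition used in dsp.h.

```c
#include <stdint.h>

// Assumption: clang accepts no_sanitize("unsigned-integer-overflow"), the
// attribute spelling quoted in commit e15afbce5d. Any compiler that does not
// report support for it gets an empty macro.
#if defined(__clang__) && defined(__has_attribute)
#if __has_attribute(no_sanitize)
#define IGNORE_UNSIGNED_OVERFLOW \
  __attribute__((no_sanitize("unsigned-integer-overflow")))
#endif
#endif
#ifndef IGNORE_UNSIGNED_OVERFLOW
#define IGNORE_UNSIGNED_OVERFLOW
#endif

// Hypothetical function that relies on unsigned wraparound, which is
// well-defined in C but is still flagged by -fsanitize=integer.
static IGNORE_UNSIGNED_OVERFLOW uint32_t HashStep(uint32_t h, uint32_t x) {
  return (h + x) * 0x9E3779B1u;
}
```

Annotating only the functions that wrap on purpose keeps the sanitizer active for the rest of the code.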