libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-04 10:04:30 +02:00

Author	SHA1	Message	Date
Parag Salasakar	1ebf193c2c	Added MSA optimized chroma edge filtering functions 1. VFilter8 2. HFilter8 3. VFilter8i 4. HFilter8i Change-Id: Iea5f0107178809dc31f3d9ba817e2474bd73fc0a	2016-06-22 13:51:29 +00:00
Parag Salasakar	607510967f	Added MSA optimized edge filtering functions 1. VFilter16 2. HFilter16 3. VFilter16i 4. HFilter16i Change-Id: I6a302c5ab40329c9e9bd1501a611d7267a983d81	2016-06-22 09:35:49 +00:00
Vincent Rabaud	9e8e1b7b2a	Inline GetResidual for speed. Change-Id: Ib4228e87dc448866229c0795ca68dabe777ef31c	2016-06-21 16:04:53 +02:00
Parag Salasakar	5e60c42a76	Added MSA optimized transform functions 1. TransformWHT 2. TransformTwo 3. TransformDC 4. TransformAC3 Change-Id: Ia3624cb4aed215bcaffce542b28794e643207039	2016-06-16 09:04:27 +00:00
James Zern	4c59aac0f9	Merge "mips msa webp configuration"	2016-06-09 05:39:53 +00:00
Parag Salasakar	e11da081f9	mips msa webp configuration Change-Id: I886164d6d3d560b1249603d47391fddf20b5a3d4	2016-06-07 23:49:41 -07:00
Jovan Zelincevic	50a486656d	Sync mips32 and dsp_r2 YUV->RGB code with C version Change-Id: Ibe12f5ef596b8922225b95c36b67955a3f8b9ae4	2016-06-03 10:42:11 +02:00
Pascal Massimino	ca8d951980	remove some obsolete TODOs Change-Id: Ied77b2dd7e3e5bb65524c0ac7b9a3fb6585cac57	2016-06-01 16:23:16 +02:00
Pascal Massimino	77f21c9c39	Move DitherCombine8x8 to dsp/dec.c To be later optimized in SSE2 Change-Id: I0de9c89eb5166f3319bb4b0500150de271ecac05	2016-05-24 23:14:41 -07:00
Marcin Kowalczyk	f2e1efbeb7	Improve near lossless compression when a prediction filter is used. The old implementation in enc/near_lossless.c performing a separate preprocessing step is used only when a prediction filter is not used, otherwise a new implementation integrated into lossless_enc.c is used. It retains the same logic for converting near lossless quality into max number of bits dropped, and for adjusting the number of bits based on the smoothness of the image at a given pixel. As before, borders are not changed. Then, instead of quantizing raw component values, the residual after subtract green and after prediction is quantized according to the resulting number of bits, taking care to not cross the boundary between 255 and 0 after decoding. Ties are resolved by moving closer to the prediction instead of by bankers’ rounding. This results in about 15% size decrease for the same quality. Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f	2016-05-18 20:59:02 +00:00
James Zern	e15afbce5d	dsp.h: fix ubsan macro name copy and paste error in the previous commit, change no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF -> WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32	2016-05-13 11:09:57 -07:00
James Zern	e53c9ccb24	dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW for suppressing expected failures with -fsanitize=integer Change-Id: I954cba45f0c96478b770ed7a6ac7491359cae075	2016-05-12 23:51:23 -07:00
James Zern	ea0be354a0	dsp.h: remove utils.h include include utils.h directly where needed to allow utils.h to rely on defines from dsp.h in a follow-up. Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d	2016-05-11 23:17:21 -07:00
James Zern	ea24e026aa	Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF"	2016-05-11 06:21:45 +00:00
James Zern	369e264e2e	dsp.h: add WEBP_UBSAN_IGNORE_UNDEF only defined when WEBP_FORCE_ALIGNED isn't. use it to quiet alignment warnings VP8LoadNewBytes(). Change-Id: I710a74bb9375285974e97022540551a3f4eda414	2016-05-10 22:45:13 -07:00
James Zern	0d020a7892	Merge "add runtime NEON detection"	2016-05-11 05:42:13 +00:00
James Zern	5ee2136a71	Merge "add VP8LAddPixels() to lossless.h"	2016-05-10 22:39:29 +00:00
Pascal Massimino	47435a6162	add VP8LAddPixels() to lossless.h Change-Id: I67f9118f875affa32c47adfedf9df28b0ac9957b	2016-05-10 20:30:30 +00:00
Pascal Massimino	8fa6ac68f0	remove two ubsan warnings (regarding uint overflow) Change-Id: I1a76e4b1268370b6b7d6a1aa93b99e57f55fd02e	2016-05-10 18:40:18 +00:00
James Zern	74fb56fb5d	add runtime NEON detection configure gets 2 new options: --enable-neon / --enable-neon-rtcd the NEON modules are split to their own convenience lib and built with auto-detected flags if none are given via CFLAGS. the /proc/cpuinfo check will only be used for armv7 targets whose toolchain does not enable NEON by default or didn't have NEON forced by the CFLAGS from the environment. Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5	2016-05-06 15:32:48 -07:00
Jovan Zelincevic	4154a8395d	MIPS update to new Unfilter API Change-Id: I2b5960812954dfcabc84663382b9e032fd1eeb43	2016-05-05 15:50:34 +02:00
Pascal Massimino	2102ccd091	update the Unfilter API in dsp to process one row independently This will allow to work in-place on cropped area later. Also sped up the inverse gradient filtering in SSE2 (~4%) Change-Id: I463149eee95d36984328f163a1e17f8cabd87441	2016-04-21 08:10:45 +00:00
James Zern	875aec7044	enc_neon,cosmetics: break long comment Change-Id: I88dff0271fef1cc6dd5888572bfe0f09f467b028	2016-03-08 23:33:21 -08:00
Pascal Massimino	a90edffb7e	fix missing 'extern' for SSIM function in dsp/ Change-Id: Id8143120f01065dc088f4e90bd930f8ea7c3ae5a	2016-03-08 10:27:46 -08:00
Pascal Massimino	423ecaf484	move some SSIM-accumulation function for dsp/ This is in preparation for some SSE2 code. And generally speaking, the whole SSIM code needs some revamp: we're not averaging the SSIM value at each pixel but just computing the overall SSIM value once, for the whole plane. The former might be better than the latter. Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5	2016-03-08 07:50:09 +01:00
James Zern	0d40cc5ea3	enc_neon,Disto4x4: remove an unnecessary transpose based on the sse2 change in: 9960c31 Remove an unnecessary transposition in TTransform. ~9-10.5% faster at the function-level, < 1% overall Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7	2016-03-03 16:18:59 -08:00
Pascal Massimino	6753f35cac	Merge "FTransformWHT optimization."	2016-02-19 09:38:04 +00:00
Vincent Rabaud	6583bb1a42	Improve SSE4.1 implementation of TTransform. SSE4.1 is slower than the SSE2 implementation and this seems to be due to a slow _mm_loadl_epi64 implementation by gcc (hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both got confirmed by IACA and experiments. Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f	2016-02-19 09:11:53 +01:00
Vincent Rabaud	7561d0c338	FTransformWHT optimization. Data is packed sooner in the functions. Change-Id: I018cfeca43f015ac755c7f209f9a97984cc0517b	2016-02-18 17:44:05 +01:00
Vincent Rabaud	8aa352b256	Merge "Remove an unnecessary transposition in TTransform."	2016-02-18 08:15:10 +00:00
Vincent Rabaud	9960c31685	Remove an unnecessary transposition in TTransform. Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e	2016-02-17 21:41:28 +01:00
Vincent Rabaud	6e36b51188	Small speedup in FTransform. It removes two _mm_unpacklo_epi32 and two _mm_sub_epi16. Change-Id: Icdf86259f796ba855d1cda5e9c0e99cb396cb351	2016-02-17 21:26:36 +01:00
Vincent Rabaud	bf2b4f114f	Regroup common SSE code + optimization. The transpose refactoring will help removing a transpose in a later CL. The horizontal add function helps removing a _mm_sad_epu8 in DC8uv => the latency/throughput went from 29/25 to 23/19 Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766	2016-02-16 18:34:34 +01:00
Nico Weber	3ef1ce98b9	yuv_sse2: fix -Wconstant-conversion warning 'implicit conversion from 'int' to 'short' changes value from 33050 to -32486' original patch: https://codereview.chromium.org/1657313003/ Make libwebp build with -Wconstant-conversion from newer clangs. After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang points out that _mm_set1_epi16(33050) causes an overflow in the short argument to _mm_set1_epi16(). Since there's no version that takes an unsigned short, add an explicit cast to tell the compiler that this is intentional. No behavior change. Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b	2016-02-02 14:52:11 -08:00
Pascal Massimino	6c1d763119	avoid Yoda style for comparison Change-Id: I8ff9f96951e5e8a619f7132455dd281cbf91aa4d	2016-01-15 23:52:29 -08:00
Vincent Rabaud	8ce975ac82	SSE optimization for vector mismatch. Change-Id: I564b822033b59d86635230f29ed6197e306a2c4f	2016-01-07 18:23:45 +01:00
Pascal Massimino	7e7b6ccc7f	faster rgb565/rgb4444/argb output SSE2 and NEON implementation. Change-Id: I342a1c3d84937b8497f0aaecb7ce9bdb7f50296b	2015-12-17 23:38:58 -08:00
James Zern	99a01f4f8b	Merge "Unify some entropy functions."	2015-12-17 22:35:29 +00:00
James Zern	4b025f10f7	Merge "configure: disable asserts by default"	2015-12-17 22:28:37 +00:00
Vincent Rabaud	ca509a3362	Unify some entropy functions. The code and logic is unified when computing bit entropy + Huffman cost. Speed-wise, we gain 8% for lossless encoding. Logic-wise, the beginning/end of the distributions are handled properly and the compression ratio does not change much. Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633	2015-12-17 17:00:08 +01:00
Pascal Massimino	b0547ff0b4	move back common constants for lossless_enc*.c into the .h Change-Id: I11bc979db691f6518d85e2e1c3ac7f05d69681b0	2015-12-17 15:11:56 +01:00
Vincent Rabaud	47ddd5a4cc	Move some codec logic out of ./dsp . The functions containing magic constants are moved out of ./dsp . VP8LPopulationCost got put back in ./enc VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc) VP8LBitsEntropy is now unrefined (refinement happening in ./enc) VP8LHistogramEstimateBits got put back in ./enc VP8LHistogramEstimateBitsBulk got deleted. Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb	2015-12-17 07:03:25 +00:00
James Zern	357f455dec	yuv_sse2: fix 32-bit visual studio build src\dsp\yuv_sse2.c : C2719: 'in': formal parameter with __declspec(align('16')) won't be aligned src\dsp\yuv_sse2.c : C2719: 'out': formal parameter with __declspec(align('16')) won't be aligned Change-Id: Ifd79e33b35c70748faff19cd64eba4a8ffce5a5a	2015-12-16 15:04:36 -08:00
James Zern	b9d80fa4e8	configure: disable asserts by default --enable-asserts can be used to avoid defining NDEBUG Change-Id: I6216668e3f79f69bd8c453f0b36cecb3b585688e	2015-12-16 13:15:53 -08:00
Vincent Rabaud	80ce27d34e	Speed up 24-bit packing / unpacking in YUV / RGB conversions. This implementation brings: - an SSE implementation of packing / unpacking - bigger buffers processed at the same time The speedup is of 4% on lossy decoding (YUV to RGB), 0.5% on lossy encoding (RGB to YUV was already optimized). Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f	2015-12-16 11:06:42 +01:00
Pascal Massimino	2dee2966df	remove few obsolete TODO about aligned loads in SSE2 Change-Id: I3628602942ea2ce34dbcb85975d15afc1041f76c	2015-12-15 23:00:41 -08:00
James Zern	b105921c7d	yuv_sse2, cosmetics: fix indent + remove unneeded header Change-Id: I3247378fd3315d95bb3345625d3575aa9e05c1b8	2015-12-15 17:29:04 -08:00
Sriraman Tallam	b275e598b5	fix optimized build with -mcmodel=medium INFO: From Compiling src/dsp/cpu.c: src/dsp/cpu.c: In function 'x86CPUInfo': src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm' With PIC and mcmodel=medium, the %rbx register must be saved and restored which causes this problem. This was also solved in GCC-4.9 with this patch: https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html Tested: Builds fine with this change. Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf	2015-12-11 16:49:10 -08:00
Vincent Rabaud	2835089d6a	Provide an SSE2 implementation of CombinedShannonEntropy. CombinedShannonEntropy takes 30% for lossless compression. This implementation speeds up the overall process by 2 to 3 %. Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f	2015-12-11 15:12:19 +01:00
Pascal Massimino	202a710b26	fix undefined behaviour during shift, using a cast Change-Id: Ibca261d01092cecf8b37c54e9fcc920c9527c0a9	2015-12-10 08:09:23 +01:00
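
Note on 202a710b26 ("fix undefined behaviour during shift, using a cast"): the log does not show the changed code, so the following is only a generic sketch of the kind of fix the message describes, not the actual libwebp change. The helper name PackARGB is hypothetical and the sketch assumes a 32-bit int. In C, a uint8_t operand is promoted to int before a shift, so shifting a value of 128 or more left by 24 overflows a signed int, which is undefined behaviour. Casting to an unsigned 32-bit type first makes the shift well defined.

```c
#include <stdint.h>

/* Hypothetical helper (not taken from libwebp): packs four 8-bit channels
 * into one 32-bit ARGB word. */
static uint32_t PackARGB(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  /* Writing `a << 24` would promote `a` to int; for a >= 128 the result
   * does not fit in a 32-bit int, which is undefined behaviour.
   * Casting first makes every shift happen in uint32_t. */
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) |
         (uint32_t)b;
}
```

For example, PackARGB(0xff, 0x80, 0x40, 0x20) yields 0xff804020. Building with -fsanitize=undefined on gcc or clang is one way to check that no shift warning is reported for such inputs.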


863 Commits