libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2026-04-10 06:40:02 +02:00

Author	SHA1	Message	Date
James Zern	ea24e026aa	Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF"	2016-05-11 06:21:45 +00:00
James Zern	369e264e2e	dsp.h: add WEBP_UBSAN_IGNORE_UNDEF only defined when WEBP_FORCE_ALIGNED isn't. use it to quiet alignment warnings VP8LoadNewBytes(). Change-Id: I710a74bb9375285974e97022540551a3f4eda414	2016-05-10 22:45:13 -07:00
James Zern	0d020a7892	Merge "add runtime NEON detection"	2016-05-11 05:42:13 +00:00
James Zern	5ee2136a71	Merge "add VP8LAddPixels() to lossless.h"	2016-05-10 22:39:29 +00:00
Pascal Massimino	47435a6162	add VP8LAddPixels() to lossless.h Change-Id: I67f9118f875affa32c47adfedf9df28b0ac9957b	2016-05-10 20:30:30 +00:00
Pascal Massimino	8fa6ac68f0	remove two ubsan warnings (regarding uint overflow) Change-Id: I1a76e4b1268370b6b7d6a1aa93b99e57f55fd02e	2016-05-10 18:40:18 +00:00
James Zern	74fb56fb5d	add runtime NEON detection configure gets 2 new options: --enable-neon / --enable-neon-rtcd the NEON modules are split to their own convenience lib and built with auto-detected flags if none are given via CFLAGS. the /proc/cpuinfo check will only be used for armv7 targets whose toolchain does not enable NEON by default or didn't have NEON forced by the CFLAGS from the environment. Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5	2016-05-06 15:32:48 -07:00
Jovan Zelincevic	4154a8395d	MIPS update to new Unfilter API Change-Id: I2b5960812954dfcabc84663382b9e032fd1eeb43	2016-05-05 15:50:34 +02:00
Pascal Massimino	2102ccd091	update the Unfilter API in dsp to process one row independently This will allow to work in-place on cropped area later. Also sped up the inverse gradient filtering in SSE2 (~4%) Change-Id: I463149eee95d36984328f163a1e17f8cabd87441	2016-04-21 08:10:45 +00:00
James Zern	875aec7044	enc_neon,cosmetics: break long comment Change-Id: I88dff0271fef1cc6dd5888572bfe0f09f467b028	2016-03-08 23:33:21 -08:00
Pascal Massimino	a90edffb7e	fix missing 'extern' for SSIM function in dsp/ Change-Id: Id8143120f01065dc088f4e90bd930f8ea7c3ae5a	2016-03-08 10:27:46 -08:00
Pascal Massimino	423ecaf484	move some SSIM-accumulation function for dsp/ This is in preparation for some SSE2 code. And generally speaking, the whole SSIM code needs some revamp: we're not averaging the SSIM value at each pixels but just computing the overall SSIM value once, for the whole plane. The former might be better than the latter. Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5	2016-03-08 07:50:09 +01:00
James Zern	0d40cc5ea3	enc_neon,Disto4x4: remove an unnecessary transpose based on the sse2 change in: `9960c31` Remove an unnecessary transposition in TTransform. ~9-10.5% faster at the function-level, < 1% overall Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7	2016-03-03 16:18:59 -08:00
Pascal Massimino	6753f35cac	Merge "FTransformWHT optimization."	2016-02-19 09:38:04 +00:00
Vincent Rabaud	6583bb1a42	Improve SSE4.1 implementation of TTransform. SSE4.1 is slower than the SSE2 implementation and this seems to be due to a slow _mm_loadl_epi64 implementation by gcc (hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both got confirmed by IACA and experiments. Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f	2016-02-19 09:11:53 +01:00
Vincent Rabaud	7561d0c338	FTransformWHT optimization. Data is packed sooner in the functions. Change-Id: I018cfeca43f015ac755c7f209f9a97984cc0517b	2016-02-18 17:44:05 +01:00
Vincent Rabaud	8aa352b256	Merge "Remove an unnecessary transposition in TTransform."	2016-02-18 08:15:10 +00:00
Vincent Rabaud	9960c31685	Remove an unnecessary transposition in TTransform. Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e	2016-02-17 21:41:28 +01:00
Vincent Rabaud	6e36b51188	Small speedup in FTransform. It removes two _mm_unpacklo_epi32 and two _mm_sub_epi16. Change-Id: Icdf86259f796ba855d1cda5e9c0e99cb396cb351	2016-02-17 21:26:36 +01:00
Vincent Rabaud	bf2b4f114f	Regroup common SSE code + optimization. The transpose refactoring will help removing a transpose in a later CL. The horizontal add function helps removing a _mm_sad_epu8 in DC8uv => the latency/throughput went from 29/25 to 23/19 Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766	2016-02-16 18:34:34 +01:00
Nico Weber	3ef1ce98b9	yuv_sse2: fix -Wconstant-conversion warning 'implicit conversion from 'int' to 'short' changes value from 33050 to -32486' original patch: https://codereview.chromium.org/1657313003/ Make libwebp build with -Wconstant-conversion from newer clangs. After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang points out that _mm_set1_epi16(33050) causes an overflow in the short argument to _mm_set1_epi16(). Since there's no version that takes an unsigned short, add an explicit cast to tell the compiler that this is intentional. No behavior change. Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b	2016-02-02 14:52:11 -08:00
Pascal Massimino	6c1d763119	avoid Yoda style for comparison Change-Id: I8ff9f96951e5e8a619f7132455dd281cbf91aa4d	2016-01-15 23:52:29 -08:00
Vincent Rabaud	8ce975ac82	SSE optimization for vector mismatch. Change-Id: I564b822033b59d86635230f29ed6197e306a2c4f	2016-01-07 18:23:45 +01:00
Pascal Massimino	7e7b6ccc7f	faster rgb565/rgb4444/argb output SSE2 and NEON implementation. Change-Id: I342a1c3d84937b8497f0aaecb7ce9bdb7f50296b	2015-12-17 23:38:58 -08:00
James Zern	99a01f4f8b	Merge "Unify some entropy functions."	2015-12-17 22:35:29 +00:00
James Zern	4b025f10f7	Merge "configure: disable asserts by default"	2015-12-17 22:28:37 +00:00
Vincent Rabaud	ca509a3362	Unify some entropy functions. The code and logic is unified when computing bit entropy + Huffman cost. Speed-wise, we gain 8% for lossless encoding. Logic-wise, the beginning/end of the distributions are handled properly and the compression ratio does not change much. Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633	2015-12-17 17:00:08 +01:00
Pascal Massimino	b0547ff0b4	move back common constants for lossless_enc*.c into the .h Change-Id: I11bc979db691f6518d85e2e1c3ac7f05d69681b0	2015-12-17 15:11:56 +01:00
Vincent Rabaud	47ddd5a4cc	Move some codec logic out of ./dsp . The functions containing magic constants are moved out of ./dsp . VP8LPopulationCost got put back in ./enc VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc) VP8LBitsEntropy is now unrefined (refinement happening in ./enc) VP8LHistogramEstimateBits got put back in ./enc VP8LHistogramEstimateBitsBulk got deleted. Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb	2015-12-17 07:03:25 +00:00
James Zern	357f455dec	yuv_sse2: fix 32-bit visual studio build src\dsp\yuv_sse2.c : C2719: 'in': formal parameter with __declspec(align('16')) won't be aligned src\dsp\yuv_sse2.c : C2719: 'out': formal parameter with __declspec(align('16')) won't be aligned Change-Id: Ifd79e33b35c70748faff19cd64eba4a8ffce5a5a	2015-12-16 15:04:36 -08:00
James Zern	b9d80fa4e8	configure: disable asserts by default --enable-asserts can be used to avoid defining NDEBUG Change-Id: I6216668e3f79f69bd8c453f0b36cecb3b585688e	2015-12-16 13:15:53 -08:00
Vincent Rabaud	80ce27d34e	Speed up 24-bit packing / unpacking in YUV / RGB conversions. This implementation brings: - an SSE implementation of packing / unpacking - bigger buffers processed at the same time The speedup is of 4% on lossy decoding (YUV to RGB), 0.5% on lossy encoding (RGB to YUV was already optimized). Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f	2015-12-16 11:06:42 +01:00
Pascal Massimino	2dee2966df	remove few obsolete TODO about aligned loads in SSE2 Change-Id: I3628602942ea2ce34dbcb85975d15afc1041f76c	2015-12-15 23:00:41 -08:00
James Zern	b105921c7d	yuv_sse2, cosmetics: fix indent + remove unneeded header Change-Id: I3247378fd3315d95bb3345625d3575aa9e05c1b8	2015-12-15 17:29:04 -08:00
Sriraman Tallam	b275e598b5	fix optimized build with -mcmodel=medium INFO: From Compiling src/dsp/cpu.c: src/dsp/cpu.c: In function 'x86CPUInfo': src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm' With PIC and mcmodel=medium, the %rbx register must be saved and restored which causes this problem. This was also solved in GCC-4.9 with this patch: https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html Tested: Builds fine with this change. Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf	2015-12-11 16:49:10 -08:00
Vincent Rabaud	2835089d6a	Provide an SSE2 implementation of CombinedShannonEntropy. CombinedShannonEntropy takes 30% for lossless compression. This implementation speeds up the overall process by 2 to 3 %. Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f	2015-12-11 15:12:19 +01:00
Pascal Massimino	202a710b26	fix undefined behaviour during shift, using a cast Change-Id: Ibca261d01092cecf8b37c54e9fcc920c9527c0a9	2015-12-10 08:09:23 +01:00
Pascal Massimino	cb1ce9969c	Merge "10% faster table-less SSE2/NEON version of YUV->RGB conversion"	2015-12-09 10:41:24 +00:00
Pascal Massimino	ac761a3738	10% faster table-less SSE2/NEON version of YUV->RGB conversion * Precision is slightly different * also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc. * removing yuv_tables_sse2.h saved ~8k of binary size * the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away * the NEON code is somewhat tricky Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052	2015-12-08 20:05:56 -08:00
Lode Vandevenne	6938111357	Improved alpha cleanup for the webp encoder when prediction transform is used. Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset: Alpha cleanup before: 18856614 Alpha cleanup after: 18685802 For reference, with no alpha cleanup: 19159992 Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps preprocessing in the encoder, and the cases when the prediction transform is not used. Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28	2015-12-04 13:50:56 +00:00
Pascal Massimino	2c08aac81a	introduce WebPMemToUint32 and WebPUint32ToMem for memory access it uses memcpy() when unaligned memory write is tricky Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874	2015-12-04 13:43:01 +00:00
James Zern	0837512964	Merge "Make a separate case for low_effort in CopyImageWithPrediction"	2015-12-03 08:46:31 +00:00
James Zern	aa2eb2d4a1	Merge "cosmetics: fix indent"	2015-12-03 08:44:54 +00:00
James Zern	b7551e90e1	cosmetics: fix indent Change-Id: I67e5a0308a964bc37b2314d96f3691fc0550e9bc	2015-12-03 00:34:15 -08:00
Lode Vandevenne	5bda52d4e8	Make a separate case for low_effort in CopyImageWithPrediction for more speed. This gives a roughly a 1% speedup for low_effort. But actually this is a preparation for the upcoming CL that changes RGB values of transparent pixels based on prediction, which should not be done for low_effort because that would slightly hurt its performance. On 1000 PNGs, with quality 0, method 0: Before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s After: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d	2015-12-03 00:22:50 -08:00
Pascal Massimino	363babe255	Merge "fix some warning about unaligned 32b reads"	2015-12-02 10:29:40 +00:00
Vincent Rabaud	829bd14145	Combine Huffman cost and bit entropy into one loop The same computation was done for both values: go over two buffers, sum them up, and take a decision on the sum at each iteration. MIPS32 code has been disabled for now, pending a code update. Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d	2015-11-30 13:57:25 +01:00
James Zern	a7a954c851	Merge "lossless: make prediction in encoder work per scanline"	2015-11-25 20:40:44 +00:00
Lode Vandevenne	239421c5ef	lossless: make prediction in encoder work per scanline instead of per block. This prepares for a next CL that can make the predictors alter RGB value behind transparent pixels for denser encoding. Some predictors depend on the top-right pixel, and it must have been already processed to know its new RGB value, so requires per scanline instead of per block. Running the encode speed test on 1000 PNGs 10 times with default settings: Before: Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s After: Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s Same but with quality 0, method 0 and 30 iterations: Before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s After: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s No effect on compressed size, this produces exactly same files. No significant measured effect on speed. Expected faster speed from better memory layout with scanline processing but slower speed due to needing to get predictor mode per pixel, may compensate each other. Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334	2015-11-25 00:38:27 -08:00
Pascal Massimino	f5ca40e05f	fix of undefined multiply (int32 overflow) the problem was the incorporation of the extra constant 1<<16 in the kC1 constant, to emulate the addition. It's now removed and the addition is performed explicitly. No real speed difference observed. cf. issue #278 Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d	2015-11-24 23:18:31 -08:00

1 2 3 4 5 ...

600 Commits