libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-02 17:14:30 +02:00

Author	SHA1	Message	Date
Vincent Rabaud	10d791ca70	Merge "Fix the random generator in HistogramCombineStochastic."	2017-02-19 10:11:48 +00:00
Vincent Rabaud	fa63a96603	Fix the random generator in HistogramCombineStochastic. It was a bad implementation of a Lehmer random number generator (the saturation was done wrong and mostly & was used instead of % .....). That led to "for" loops stuck with the same values given a specific seed, hence wasted "for" loops (e.g. seed getting at 374988608 and modulo of 64 later leads to 0 even when updating the seed with the old formula). As the "for" loops now always return a proper pair of histograms, their number can greatly be reduced, hence a speedup. Change-Id: I9f5b44d66cc96fd4824189d92276c3756c8ead5b	2017-02-19 10:49:16 +01:00
skal	16be192f47	VP8LSetBitPos: remove the eos_ setting This code is ultra-critical for lossless decoding, especially on ARM. The extra call VP8LIsEndOfStream() was causing unnecessary slow-down. Now, we check for bitstream-end separately in the main loop. Change-Id: I739b5d74cc29578e2b712ba99b544fd995ef0e0d	2017-02-11 02:35:02 -08:00
Pascal Massimino	4105d565d3	disable WEBP_USE_XXX optimisations when EMSCRIPTEN is defined Currently, none are available. If WEBP_HAVE_SSE2 eventually works, we'll have to refine these conditionals. BUG=webp:261 Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad	2017-02-08 15:44:20 +00:00
Vincent Rabaud	868aa6901f	Perform greedy histogram merge in a unified way. Previously, the stochastic method for histogram combination could finish in a greedy way if the number of iterations to perform so was smaller. Except that another greedy combination was performed afterwards ... hence wasted CPU in some cases. Change-Id: Ic0f26873e6dc746679486b91cb35d73efee91931	2017-02-07 11:57:20 +01:00
Pascal Massimino	b494fdec45	optimize the ARGB->ARGB Import to use memcpy (instead of the generic VP8PackARGB call) Change-Id: I86edeb5934e7c062593f0248de7607cca5f1027c	2017-02-03 16:54:52 +01:00
Vincent Rabaud	5cfd4ebc5e	LZ77 interval speedups. Faster, smaller, simpler. The initial re-writing of this part of the code with intervals had to be done with a complex logic (mostly intervals with a lower and upper bound, not a constant value like now) to properly deal with the inefficiencies of the then LZ77 algorithm. The improvements made to LZ77 since now allow for a simpler logic. There were also small errors in the interval insertion logic that led to small inefficiencies (hence a slightly better compression rate). Change-Id: If079a0cafaae7be8e3f253485d9015a7177cf973	2017-02-02 11:51:30 +01:00
Pascal Massimino	be73378684	Merge "Add clang build fix for MSA"	2017-02-01 12:43:09 +00:00
Parag Salasakar	aa893914fc	Add clang build fix for MSA Change-Id: If139f4ecbdce756c69ba4ae032a70f81179683f8	2017-02-01 17:45:17 +05:30
Jehan	32ed856f60	Fix "all\|no frames are keyframes" settings. Documentation says: "if kmin == 0, then key-frame insertion is disabled; and if kmax == 0, then all frames will be key-frames." Reading this, you'd expect that if kmax == 0, then with any kmin <= 0 all frames will be key-frames. But actually the kmin <= 0 test is caught first and you get the opposite (no keyframes but the first). You'd have instead to set kmax == 0 and any value kmin > 0, which is absolutely counter-intuitive (reversing order). Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the `kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`). So it should be considered an exception too. Instead I propose this new logic: - kmax == 1 means that all frames are keyframes (you are explicitly requesting a keyframe every 1 frame at most, i.e. all frames). - kmax == 0 means no keyframes (you ask for a keyframe every 0 frames, i.e. never). This is more "logical" language-wise, and also does not involve any conflicts about what if both kmax and kmin are 0, since now a single property value is meaningful for the 2 exceptional cases. Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd (cherry picked from commit 2dc0bdcaeee77ae8b40ff9eb82a9e03a7cecaf04)	2017-01-26 22:31:16 -08:00
James Zern	1c3190b6ed	Merge "Fix "all\|no frames are keyframes" settings."	2017-01-27 00:02:04 +00:00
Pascal Massimino	f4dc56fd77	disable GradientUnfilter_NEON Compile with XCode, it appears quite slower than the C-version, especially for arm64. Change-Id: Ic46dba184a36be454fef674129d2f909003788fc (cherry picked from commit 4f3e3bbd44ad2989916910ce4ef4e6f10d8f2145)	2017-01-25 20:30:15 -08:00
Pascal Massimino	4f3e3bbd44	disable GradientUnfilter_NEON Compile with XCode, it appears quite slower than the C-version, especially for arm64. Change-Id: Ic46dba184a36be454fef674129d2f909003788fc	2017-01-25 16:33:26 -08:00
Jehan	2dc0bdcaee	Fix "all\|no frames are keyframes" settings. Documentation says: "if kmin == 0, then key-frame insertion is disabled; and if kmax == 0, then all frames will be key-frames." Reading this, you'd expect that if kmax == 0, then with any kmin <= 0 all frames will be key-frames. But actually the kmin <= 0 test is caught first and you get the opposite (no keyframes but the first). You'd have instead to set kmax == 0 and any value kmin > 0, which is absolutely counter-intuitive (reversing order). Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the `kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`). So it should be considered an exception too. Instead I propose this new logic: - kmax == 1 means that all frames are keyframes (you are explicitly requesting a keyframe every 1 frame at most, i.e. all frames). - kmax == 0 means no keyframes (you ask for a keyframe every 0 frames, i.e. never). This is more "logical" language-wise, and also does not involve any conflicts about what if both kmax and kmin are 0, since now a single property value is meaningful for the 2 exceptional cases. Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd	2017-01-25 13:12:52 -08:00
James Zern	36c42ea415	bump version to 0.6.0 libwebp{,decoder} - 0.6.0 libwebp libtool - 7.0.0 libwebpdecoder libtool - 3.0.0 mux - 0.4.0 libtool - 3.0.0 demux - 0.3.2 libtool - 2.2.0 Change-Id: Ie46dc70df1e283df0ccef6eb07c5694feb4d4a2b	2017-01-23 18:07:00 -08:00
James Zern	919f9e2fd6	Merge "add .rc files for windows dll versioning"	2017-01-20 19:29:50 +00:00
Pascal Massimino	4689ce1635	cwebp: add a -sharp_yuv option for 'sharp' RGB->YUV conversion Change-Id: I6edd5b44d693da50f702fa8218f14872874d91ba	2017-01-20 16:54:54 +01:00
Pascal Massimino	79bf46f120	rename the pretentious SmartYUV into SharpYUV Change-Id: Ifeeb9cb85896c5f3ba0cc1c2c821f8d00295f69e	2017-01-20 14:36:21 +01:00
Pascal Massimino	eb1dc89a5f	silently expose use_delta_palette in the WebPConfig API is just a placeholder for now, unless WEBP_USE_EXPERIMENTAL_FEATURES is defined. Change-Id: I087cb49781560bc1a7fbb01b136d36115c97ef72	2017-01-20 10:25:19 +01:00
James Zern	43d3f01a2f	add .rc files for windows dll versioning BUG=webp:323 Change-Id: Id415a32b63618d39af2e599cec0d40f64c35bbce	2017-01-20 00:35:15 -08:00
James Zern	668e1dd44f	src/{dec,enc,utils}: give filenames a unique suffix this avoids duplicates between these trees and dsp/, e.g., enc/tree.c, dec/tree.c, making pulling the whole library source tree into one target possible BUG=webp:279 Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775	2017-01-19 19:09:48 -08:00
Pascal Massimino	71c53f1aeb	NEON: speed-up strong filtering The sub-expression trick removes two constants and two vmlal_s8 instructions. Change-Id: I200022573b4880871b528b13a11a8f3d95def113	2017-01-19 20:46:48 +00:00
Pascal Massimino	a345068aba	ARM: speed up bitreader by avoiding tables (and using BitsLog2Floor() from utils.h instead) 9-10% speed-up, apparently Change-Id: I9acae4a4dceb1ddcc99306f99b722079bb06f6f8	2017-01-17 23:52:37 -08:00
Pascal Massimino	1dc82a6bba	Merge "introduce a generic GetCoeffs() function pointer"	2017-01-18 07:44:36 +00:00
Pascal Massimino	8074b89eb3	introduce a generic GetCoeffs() function pointer We can switch at run-time between the standard GetCoeffs() critical function, that uses a fast variant of VP8GetBit(). However, some platforms have slow instructions that make standard VP8GetBit() slow. GetCoeffs() is the right level of branching to switch to GetCoeffsAlt() that avoids these slow instructions in some not-frequent cases. Next patch will upgrade VP8GetBit() to use clz, after this one is proved to be neutral speed-wise. Change-Id: Ia6cef5de9de6131574d2202bbc0bea8559c9b693	2017-01-17 16:24:00 +01:00
Pascal Massimino	749a45a520	Merge "NEON: implement alpha-filters (horizontal/vertical/gradient)"	2017-01-17 15:13:08 +00:00
Pascal Massimino	74c053b57d	Merge "NEON: fix overflow in SSE NxN calculation"	2017-01-17 15:10:54 +00:00
Pascal Massimino	0a3aeff75b	Merge "dsp: WebPExtractGreen function for alpha decompression"	2017-01-17 15:08:20 +00:00
Pascal Massimino	1de931c669	NEON: implement alpha-filters (horizontal/vertical/gradient) gradient-filter code is not much faster, but maybe improvable in the future. Change-Id: Ia16070e409fe8703b02276166f19526917df6b35	2017-01-17 15:44:46 +01:00
Pascal Massimino	9b3aca404d	NEON: fix overflow in SSE NxN calculation vmlal_u8() is prone to overflow during the accumulation. There was a mismatch happening at low q mostly, because in this case the distortion is important and the accumulated sum was larger than 16bit-unsigned. Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8	2017-01-17 11:47:36 +01:00
Pascal Massimino	1c07a3c639	dsp: WebPExtractGreen function for alpha decompression + NEON implementation Change-Id: I67204f99d6e4c5974718bdf21dad30381978f72c	2017-01-17 09:33:25 +00:00
Pascal Massimino	9ed5e3e5dd	use pointers for WebPRescaler's in WebPDecParams This makes the structure more generic, without the hard-coded internal structure. This is a borderline incompatible ABI change, even if WebPIDecoder structure is opaque. Change-Id: I518765c3f76fc17a136cef045a5a8aa70ed70e85	2017-01-16 22:30:29 -08:00
James Zern	db013a8d5c	Merge "ARM: don't use USE_GENERIC_TREE"	2017-01-13 22:15:04 +00:00
Pascal Massimino	fcd4784dcd	use a 8b table for C-version for clz() 30% faster on x86, 5% faster on N5. New generic function: WebPLog2FloorC() This function is called as fallback for BitsLog2Floor() when there's no clz() available. Change-Id: Ica15c6092112e514c0e200fab89c434de48d4b19	2017-01-13 15:36:26 +01:00
Pascal Massimino	fbb5c473b4	ARM: don't use USE_GENERIC_TREE It's 1-2% faster to use hard-coded tree on ARM Change-Id: I54403a70f6c692e50148c33f36833588957c20ee	2017-01-13 10:05:21 +01:00
Pascal Massimino	8fda56126e	Merge "add a kSlowSSSE3 feature for CPUInfo"	2017-01-13 07:01:48 +00:00
Pascal Massimino	86bbd24552	add a kSlowSSSE3 feature for CPUInfo This is meant to be used for run-time detection of slow platforms regarding instructions like pshufb and bsr. Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731 Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640	2017-01-13 06:19:27 +00:00
Vincent Rabaud	7c2779e95a	Get code to fully compile in C++. Change-Id: I6d8490c8c9b955d90dcc89ee8a9cf29ca0f93b08	2017-01-12 18:03:55 +01:00
Vincent Rabaud	250c358662	Merge "When compiling as C++, avoid narrowing warnings."	2017-01-12 13:00:56 +00:00
Vincent Rabaud	c0648ac2ae	When compiling as C++, avoid narrowing warnings. The gcc compilation warning was: narrowing conversion from ‘int’ to ‘int8_t’ Change-Id: I4803dd60ad04060cdb5d61a1aa98b25215b9d4eb	2017-01-12 13:39:22 +01:00
Pascal Massimino	0d55f60c91	40% faster ApplyAlphaMultiply_SSE2 process four pixels at a time Change-Id: I1dee7f70772be4915654fc6638ef4729a1a239d4	2017-01-12 02:33:09 -08:00
Pascal Massimino	49d0280df1	NEON: implement several alpha-processing functions - ApplyAlphaMultiply - DispatchAlpha - DispatchAlphaToGreen - ExtractAlpha Decoding to Argb / rgbA / ... is 10-15% faster (measured on N4) new file: alpha_processing_neon.c Change-Id: I40f1a809e9885d1031ff0bc886d8d001efa66bca	2017-01-11 17:39:29 +01:00
Pascal Massimino	48b1e85fbe	SSE2: 15% faster alpha-processing functions ApplyAlphaMultiply / MultARGBRow / MultRow we use now: x/255 = (x * 0x8081) >> (16 + 7) and x/255 + .5 = ((x + 128) * 0x0101) >> 16 Change-Id: I8931091316ffc8bbf65aa3402f2e7d2b800e1971	2017-01-11 15:35:16 +01:00
Pascal Massimino	e3b8abbc9b	fix warning from static analysis. "-1 cannot be represented in type 'unsigned int'" Change-Id: I05abcb44af68f702ead5a7f24dc14aab31a2e4d9	2017-01-10 22:59:47 -08:00
Pascal Massimino	28fe054e73	SSE2: 30% faster ApplyAlphaMultiply() and 15% faster MultARGBRow() by switching to formulae: X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X. (X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128 Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81	2017-01-10 23:34:22 +01:00
Vincent Rabaud	f44acd253b	Merge "Properly compute the optimal color cache size."	2017-01-10 21:14:16 +00:00
Vincent Rabaud	527844fee0	Properly compute the optimal color cache size. The previous optimization was performing dichotomy on a function that is anything in practice, hence a bit of randomness. Also, two magic constants were used, one for an extra constant cost, one for an extra linear cost. Both values/models were empirical. A brute force search for the best cache size is now performed. To have less CPU impact, a speed optimization is also made by not inserting a value again and again. This makes sense but it's also the most common case of when LZ77 is useful hence an overall improvement sometimes. Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8	2017-01-10 21:44:53 +01:00
Pascal Massimino	be0ef6395f	fix a comment typo Change-Id: I0fabd08cd8abd3cea7ddfd2e498507adb0d3c67e	2017-01-10 21:17:13 +01:00
Vincent Rabaud	8874b16275	Fix a non-deterministic color cache size computation. In case of impossible allocation, some value was returned while computation should be stopped. Change-Id: I5f85e264575be825e4261ab6fa63840c157cf5c2	2017-01-10 18:53:19 +01:00
Vincent Rabaud	d712e20de0	Do not allow a color cache size bigger than the number of colors. This is purely for speed optimization. Change-Id: Ie4b4380df8a5afa90574012bacdb1ddad03f320e	2017-01-10 09:25:02 +01:00
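
Note on fa63a96603 (Lehmer random generator): the commit message says the old generator used & where % was needed. The sketch below shows what a correct Lehmer step can look like. It is not the libwebp code: the names LehmerNext and LehmerIndex are hypothetical, and the multiplier 16807 with modulus 2^31 - 1 (the Park-Miller "minimal standard" parameters) is an assumption, since the log does not state which constants the fix uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters (Park-Miller "minimal standard"), not taken from the log. */
#define LEHMER_MULTIPLIER 16807u
#define LEHMER_MODULUS 2147483647u /* 2^31 - 1, a prime */

/* Hypothetical helper: advances the seed by one Lehmer step and returns it.
 * The seed must be in [1, LEHMER_MODULUS - 1]; it then stays in that range. */
static uint32_t LehmerNext(uint32_t* const seed) {
  assert(*seed > 0 && *seed < LEHMER_MODULUS);
  /* The 64-bit product cannot overflow, and a true modulo (%) is applied.
   * Masking with & would only be equivalent for a power-of-two modulus. */
  *seed = (uint32_t)(((uint64_t)(*seed) * LEHMER_MULTIPLIER) % LEHMER_MODULUS);
  return *seed;
}

/* Hypothetical helper: draws an index in [0, n), e.g. to pick a histogram. */
static uint32_t LehmerIndex(uint32_t* const seed, uint32_t n) {
  assert(n > 0);
  return LehmerNext(seed) % n;
}
```

Because the modulus is prime and the seed never reaches 0, the state cannot get stuck at a fixed value, which is the failure mode the commit message describes for the old formula.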

2355 Commits
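
Note on 48b1e85fbe and 28fe054e73 (division by 255): both commit messages quote integer formulas that replace a division by 255 in the alpha-multiply code. The sketch below writes the four formulas as scalar C and compares them with plain integer division. The function names are hypothetical, the code is scalar rather than SSE2, and the check only covers x in [0, 255 * 255], the range of a product of two 8-bit values.

```c
#include <stdint.h>

/* Truncating x / 255, multiply-and-shift form quoted in 48b1e85fbe. */
static uint32_t Div255MulShift(uint32_t x) { return (x * 0x8081u) >> (16 + 7); }

/* Truncating x / 255, shift-and-add form quoted in 28fe054e73. */
static uint32_t Div255ShiftAdd(uint32_t x) { return (x + 1 + (x >> 8)) >> 8; }

/* Rounded x / 255 (i.e. x / 255 + .5), form quoted in 48b1e85fbe. */
static uint32_t Div255RoundMul(uint32_t x) { return ((x + 128) * 0x0101u) >> 16; }

/* Rounded x / 255, form quoted in 28fe054e73, with XX = X + 128. */
static uint32_t Div255RoundShiftAdd(uint32_t x) {
  const uint32_t xx = x + 128;
  return (xx + (xx >> 8)) >> 8;
}

/* Counts disagreements with reference integer arithmetic over [0, 255 * 255]. */
static int CountDiv255Mismatches(void) {
  int mismatches = 0;
  uint32_t x;
  for (x = 0; x <= 255u * 255u; ++x) {
    const uint32_t truncated = x / 255;             /* floor(x / 255) */
    const uint32_t rounded = (2 * x + 255) / 510;   /* floor(x / 255 + 0.5) */
    if (Div255MulShift(x) != truncated) ++mismatches;
    if (Div255ShiftAdd(x) != truncated) ++mismatches;
    if (Div255RoundMul(x) != rounded) ++mismatches;
    if (Div255RoundShiftAdd(x) != rounded) ++mismatches;
  }
  return mismatches;
}
```

Calling CountDiv255Mismatches() is a quick way to confirm the formulas on the product range before relying on them in a vectorized version.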