libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-12 05:54:31 +02:00

Author	SHA1	Message	Date
James Zern	a7a954c851	Merge "lossless: make prediction in encoder work per scanline"	2015-11-25 20:40:44 +00:00
Lode Vandevenne	239421c5ef	lossless: make prediction in encoder work per scanline instead of per block. This prepares for a next CL that can make the predictors alter RGB value behind transparent pixels for denser encoding. Some predictors depend on the top-right pixel, and it must have been already processed to know its new RGB value, so requires per scanline instead of per block. Running the encode speed test on 1000 PNGs 10 times with default settings: Before: Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s After: Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s Same but with quality 0, method 0 and 30 iterations: Before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s After: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s No effect on compressed size, this produces exactly same files. No significant measured effect on speed. Expected faster speed from better memory layout with scanline processing but slower speed due to needing to get predictor mode per pixel, may compensate each other. Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334	2015-11-25 00:38:27 -08:00
Pascal Massimino	f5ca40e05f	fix of undefined multiply (int32 overflow) the problem was the incorporation of the extra constant 1<<16 in the kC1 constant, to emulate the addition. It's now removed and the addition is performed explicitly. No real speed difference observed. cf. issue #278 Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d	2015-11-24 23:18:31 -08:00
Lode Vandevenne	b8c44f1aa4	3% speed improvement for lossless webp encoder for low effort mode: prevent updating unused histogram. Benchmark on 1000 PNGs, 30 iterations, lossless, quality 0, method 0: before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 34.578 MP/s after: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.980 MP/s Change-Id: Id62759d4d111a6ba41c85c611a15d4f6ffc9f935	2015-11-22 09:12:54 +01:00
Pascal Massimino	bfd3fc02df	~2x faster SSE2 RGB24toY, BGR24toY, ARGBToY\|UV global effect is ~2% faster encoding from JPG source and ~8% faster lossless-webp source decoding to PGM (e.g.) Also revamped the YUVA case to first accumulate R/G/B value into 16b temporary buffer, and then doing the UV conversion. -> New function: WebPConvertRGBA32ToUV Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5	2015-11-06 15:02:01 -08:00
Pascal Massimino	52fdbdfe66	extract some RGB24 to Luma conversion function from enc/ to dsp/ Just for RGB24/BGR24 for now, which are the hard-to-optimize ones. SSE2 implementation coming next. ConvertRowToY() should go into dsp/ too, at some point. Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425	2015-10-30 00:28:11 -07:00
Pascal Massimino	ab8c2300b6	add missing \n Change-Id: I0c9236bbeef5868629d4dc02e3fae6e79ca55949	2015-10-30 00:02:27 -07:00
Pascal Massimino	8f1fcc15af	Merge "Move ARGB->YUV functions from dec/vp8l.c to dsp/yuv.c"	2015-10-29 06:38:52 +00:00
Pascal Massimino	25bf2ce5cc	fix some warning about unaligned 32b reads on x86 + gcc, the assembly code is the same. Change-Id: Ib0d23772ccf928f8d9ebcb0e157c0573d1f6a786	2015-10-28 15:51:55 -07:00
Pascal Massimino	fa8927efe4	Move ARGB->YUV functions from dec/vp8l.c to dsp/yuv.c also switch to using ExtractAlpha() instead of hard-coding the loop. The ARGBToY/UV functions are rather easy to port to SSE2 / NEON. Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7	2015-10-28 01:45:08 -07:00
Pascal Massimino	14e4043b67	remove unnecessary #include "yuv.h" Change-Id: I8b277433663e063e7a182f66818afec1654a39bd	2015-10-27 01:27:36 -07:00
Pascal Massimino	5aa8d61f75	Merge "MIPS: rescaler code synced with C implementation"	2015-10-17 07:52:36 +00:00
Djordje Pesut	e7fb267df7	MIPS: rescaler code synced with C implementation Change-Id: I4cec115d3fe6f3f825084d7388249694c500256a	2015-10-17 00:16:27 -07:00
James Zern	65726cd3a7	dsp/lossless: Average2, make a constant unsigned use 'u' rather than the unnecessary 'l' as a suffix. this prevents a conversion warning with some toolchains Change-Id: I21c33ce08819b3c839c75e03a8f7f3a6041d0695	2015-10-16 18:39:42 -07:00
Johann	d26d9def80	Use __has_builtin to check clang support Older versions of Xcode with clang reporting versions 4.[012] and 5.0 did not include support for __builtin_bswap16. Checking in this manner avoids using brittle version checks. Matches a change to libvpx: https://chromium-review.googlesource.com/305573 to fix: https://code.google.com/p/webm/issues/detail?id=1082 Change-Id: I23ea466ee1b53b12cd3fb45f65a2186c8dda95a1	2015-10-14 17:48:08 -07:00
Pascal Massimino	67c547fdcd	rescaler: ~20% faster SSE2 implementation for lossless ImportRowExpand lossy (1-channel) speed-up is more on the 5% side. Change-Id: Id19d97b9e9a34804b59604a5b48f94a37fdafd62	2015-10-14 07:32:12 +02:00
Pascal Massimino	99e3f8128a	Merge "large re-organization of the delta-palettization code"	2015-10-14 05:11:47 +00:00
Pascal Massimino	95509f9914	large re-organization of the delta-palettization code same functionality, but better code layout. What changed: * don't trash the palette_[] in EncodePalette(), so it can be re-used * split generation of image from bit-stream coding * move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette() * minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c * clarify the TransformBuffer stuff. more clean-up to come here... This should make experimenting with delta-palettization easier and more compartimentalized. Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610	2015-10-14 00:25:42 +02:00
Pascal Massimino	74fb458bbc	fix for weird msvc warning message " warning C4098: 'RescalerImportRowShrinkSSE2' : 'void' function returning a value" Change-Id: Ifa893502e3e4b394910e142d954393dda9d59d1a	2015-10-10 22:35:59 -07:00
Pascal Massimino	932fd4df61	SSE2 implementation of ImportRowShrink some limitations: only for RGBA output, and if reduction factor is not too small (dst_width > src_width / 128) 20-25% faster, ~4-6% global improvement total decoding. Change-Id: I95366ddaa4a38e0a96bed754dfe790126f7bb84a	2015-10-09 13:04:54 -07:00
skal	b4e731cd93	neon-implementation for rescaler code It's better to stay with a 32b fixed-point precision overall, otherwise the C-version on ARM gets slower. Actually, gcc ARM compiler optimizes some instructions pretty well when WEBP_RESCALER_FIX is exactly 32, even in C. Change-Id: I0eea97f7db5947470f5af355dee098eca81e178d	2015-10-07 21:18:39 -07:00
Mislav Bradac	48f66b6687	Add delta_palettization feature to WebP Change-Id: Ibaf4e49aa67d63d0eb11848cca4fd0c60815864a	2015-10-02 14:29:54 -07:00
James Zern	5a84460d6d	rescaler_mips_dsp_r2: cosmetics, fix indent Change-Id: I59a432a66a658a74f383bd81b6f9abb5e5bb409e	2015-09-25 18:35:16 -07:00
James Zern	acde0aae5a	rescaler: cosmetics, join two lines Change-Id: Ic231dd048c82a934122ce4884180a2339f7ce2f8	2015-09-25 18:34:45 -07:00
Pascal Massimino	306ce4fde1	rescaler: move the 1x1 or 2x1 handling one level up => no need to handle it in the sub-functions. Change-Id: I4b0211ecfafbc9c80a73bf2206809a13c94e7911	2015-09-25 14:35:35 -07:00
Pascal Massimino	cced974bb2	remove _mm_set_epi64x(), which is too specific Change-Id: I4b1035f9c548b804f31c68a00b0a1aa8e13550bb	2015-09-25 14:35:33 -07:00
Pascal Massimino	56668c9fc5	fix warnings about uint64_t -> uint32_t conversion Change-Id: Iee027979b404d4b7edda506b844d354aa1026dae	2015-09-25 17:36:11 +02:00
Pascal Massimino	76a7dc39e5	rescaler: add some SSE2 code The rounding and arithmetic is not the same as previously, to prevent overflow cases for large upscale factors. We still rely on 32b x 32b -> 64b multiplies. Raised the fixed-point precision to 32b so that we have some nice shifts from epi64 to epi32. Changed rescaler_t type to 'uint32_t' in order to squeeze in all the precision required. The MIPS code has been disabled because it's now out-of-sync. Will be fixed in a subsequent CL when the dust settles. ~30-35% faster Change-Id: I32e4ddc00933f1b1aa3463403086199fd5dad07b	2015-09-25 15:07:13 +02:00
James Zern	1df1d0eedb	rescaler: harmonize function protos Change-Id: I13b5f9add83c1225c82a650f3ef717582b057247	2015-09-19 22:57:25 -07:00
Pascal Massimino	9ba1894b9b	rescaler: simplify ImportRow logic incorporates the loop over 'channel' and removes one parameter Change-Id: I4e3b33c111ca825fe96461583420413b17326409	2015-09-19 10:07:26 -07:00
Pascal Massimino	5ff0079ece	fix rescaler vertical interpolation * vertical expansion now uses bilinear interpolation * heavily assumes that the alpha plane is decoded in full, not row-by-row * split the RescalerExportRow and RescalerImportRow methods into Shrink and Expand variants. * MIPS implementation of ExportRowExpand is missing. There's room for extra speed optim and code re-org, but let's keep that for later patches. addresses https://code.google.com/p/webp/issues/detail?id=254 Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb	2015-09-18 17:32:11 -07:00
James Zern	d623a8706f	dec_neon: add whitespace around stringizing operator prevents unintentional side-effects (though unlikely in this case) with future compilers, cf: eebaf97 dsp/mips: add whitespace around stringizing operator Change-Id: I0537091fcc97b4f54d0a156c3c83a28c51456b17	2015-09-03 23:13:56 -07:00
James Zern	29377d55b6	dsp/mips: cosmetics: add whitespace around XSTR macro normalizes formatting after: eebaf97 dsp/mips: add whitespace around stringizing operator Change-Id: I1e3986b6d08195d79072747eb99d7e0549aece72	2015-09-03 23:09:13 -07:00
James Zern	eebaf97f5a	dsp/mips: add whitespace around stringizing operator fixes compile with gcc 5.1 BUG=259 Change-Id: Ideb39c6290ab8569b1b6cc835bea11c822d0286c	2015-09-02 23:21:13 -07:00
James Zern	14efabbf1c	Android: limit use of cpufeatures cpufeatures is only used with armeabi-v7a.* Change-Id: I80284061d71d9defa50d139c7f1bda67c00f567e	2015-08-19 18:44:33 -07:00
skal	bd55604d1b	SSE2: add yuv444 converters, re-using yuv_sse2.c Change-Id: I4d5c9df8a4c8e8cb8b5daa537af07382894503a8	2015-08-17 21:15:37 -07:00
James Zern	155c1b222b	Merge changes I76f4d6fe,I45434639 * changes: lossless_enc_neon: add VP8LTransformColor lossless_neon: add VP8LTransformColorInverse	2015-08-06 23:00:03 +00:00
Djordje Pesut	717e4d5a7c	mips32/mipsDSPr2: function ImportRow rebased Change-Id: Id58d266040fdb5fe1e507cd0f6370ea625156e4d	2015-08-06 17:09:10 +02:00
Pascal Massimino	7df93893dc	fix rescaling bug (uninitialized read, see bug #254 ). the x_add/x_sub increments were wrong for u/v in the upscaling case. They shouldn't be left to the caller's discretion, but set up by WebPRescalerInit to their exact necessary values. -> Cleaned-up WebPRescalerInit() param list. -> added safety asserts -> removed the mips32/mips_r2 variant of "ImportRow" which were buggy prior Change-Id: I347c75804d835811e7025de92a0758d7929dfc09	2015-08-05 23:00:00 -07:00
James Zern	5cdcd561e2	lossless_enc_neon: add VP8LTransformColor based on SSE2, ~32% faster Change-Id: I76f4d6fe456baceba46ffebf2f699e98691eefdf	2015-08-05 00:15:13 -07:00
James Zern	a53c336919	lossless_neon: add VP8LTransformColorInverse based on SSE2, only ~11% faster Change-Id: I45434639d81e153f01f77c1f5d2da510b542170e	2015-08-04 23:22:36 -07:00
James Zern	99131e7f8c	Merge changes I9fb25a89,Ibc648e9e * changes: lossless_neon: remove predictors 5-13 ll_enc_neon: enable VP8LSubtractGreenFromBlueAndRed	2015-08-04 02:24:15 +00:00
Pascal Massimino	c455676680	simplify the main loop for downscaling (part of bug #254 investigation) no speed change observed. Change-Id: Ie21b33171def367f37643fef6a0bd378e49468c7	2015-08-03 16:57:35 +02:00
James Zern	2a010f992a	lossless_neon: remove predictors 5-13 operating on single uint32's isn't helped by NEON. this improves aarch64 performance by ~4% Change-Id: I9fb25a8962de7b80e893e756ee7c76393cfd40c7	2015-07-28 19:44:58 -07:00
James Zern	ca221bbc48	ll_enc_neon: enable VP8LSubtractGreenFromBlueAndRed this moves the function outside the WEBP_USE_INTRINSICS check. there's no alternative version and it's ~54% faster at the function level and mildly faster overall Change-Id: Ibc648e9ee35021d48901e05aa596aa01067796a2	2015-07-28 19:44:45 -07:00
Jyrki Alakuijala	85b44d8a69	lossless: encoding, don't compute unnecessary histo share the computation between different modes 3-5 % speedup for lossless alpha 1 % for lossy alpha no change in compression density Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2	2015-07-07 20:24:26 -07:00
Pascal Massimino	0ae2c2e4b2	SSE2/SSE41: optimize SSE_16xN loops After several trials at re-organizing the main loop and accumulation scheme, this is apparently the faster variant. removed the SSE41 version, which is no longer faster now. For some reason, the AVX variant seems to benefit most for the change. Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5	2015-07-02 20:55:04 +02:00
James Zern	39216e59d9	cosmetics: fix indent after 32462a07 Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8	2015-07-01 23:49:20 -07:00
James Zern	559e54ca60	Merge "SSE2: slightly faster FTransformWHT"	2015-07-02 06:36:33 +00:00
Pascal Massimino	8ef9a63b45	SSE2: slightly faster FTransformWHT goes from 0.3% to 0.1% overall CPU time, but... Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84	2015-07-01 23:03:17 -07:00
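
Note on commit d26d9def80 ("Use __has_builtin to check clang support"): the message describes replacing compiler version checks with a feature test for __builtin_bswap16. The following is a minimal sketch of that general technique, not the code from the commit. The names EXAMPLE_HAVE_BUILTIN_BSWAP16 and ExampleByteSwap16 are hypothetical and chosen only for illustration; the fallback branch is an assumed portable equivalent.

```c
#include <stdint.h>

/* Compilers that do not provide __has_builtin would otherwise fail to
 * parse the #if below, so treat every query as "not available" there. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Feature test instead of a version check: ask the compiler directly
 * whether the builtin exists. (Hypothetical macro name.) */
#if __has_builtin(__builtin_bswap16)
#define EXAMPLE_HAVE_BUILTIN_BSWAP16 1
#else
#define EXAMPLE_HAVE_BUILTIN_BSWAP16 0
#endif

/* Swaps the two bytes of a 16-bit value. (Hypothetical helper.) */
static uint16_t ExampleByteSwap16(uint16_t x) {
#if EXAMPLE_HAVE_BUILTIN_BSWAP16
  return __builtin_bswap16(x);
#else
  /* Portable fallback: move the high byte down and the low byte up. */
  return (uint16_t)((x >> 8) | ((x & 0xff) << 8));
#endif
}
```

Both branches return the same value for any input, so a caller such as ExampleByteSwap16(0x1234) does not need to know which path the compiler selected.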

903 Commits