libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-17 09:28:05 +02:00

Author	SHA1	Message	Date
Pascal Massimino	76ebbfff28	NEON: implement predictor #13 ~5-7% faster Change-Id: I3361b0bbc978f3721168db15778a67337309c18a	2016-12-07 14:58:49 -08:00
Vincent Rabaud	95b12a08ae	Merge "Revert Average3 and Average4"	2016-12-07 15:38:56 +00:00
Vincent Rabaud	54ab2e758f	Revert Average3 and Average4 Average3 created a slowdown of 1-2% in lossless decoding. Average4 created a slowdown of 2-3% in lossless decoding. Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5	2016-12-07 15:32:33 +01:00
Pascal Massimino	fe12330c81	3-5% faster Predictor #5 , #6 , #7 and #10 for NEON Change-Id: Ica48c7088d4384f0888dd171a47e68ebd25729b2	2016-12-07 15:25:33 +01:00
Pascal Massimino	fbfb3bef7b	~2% faster predictor #10 for NEON Change-Id: Icd9cff90c227d702c3ba319131996c5475094520	2016-12-06 13:47:35 +00:00
Pascal Massimino	d4b7d801db	lossless_sse2: use the local functions ...instead of the pointers stored in the array. Should be faster (inlined) and safer. Also: suffix explicitly the functions with _SSE2 Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90	2016-12-06 14:20:41 +01:00
Vincent Rabaud	a5e3b22574	Lossless decoder SSE2 improvements. Change-Id: Ia901014ac63156a2e278b81e035256c30bdf8706	2016-12-06 13:45:09 +01:00
Pascal Massimino	58a1f124c2	~2% faster predictor #12 in NEON. Change-Id: I6772bb865d0f72720a65561eb55028e538df236d	2016-12-06 10:24:27 +01:00
Pascal Massimino	906c3b6392	Merge "Implement lossless transforms in NEON."	2016-12-03 16:55:14 +00:00
Vincent Rabaud	d23abe4e9f	Implement lossless transforms in NEON. Change-Id: I2172b1a763eb9dfe25d2b9bf1fb6501d7e192e55	2016-12-03 11:20:22 +00:00
Vincent Rabaud	2e6cb6f34e	Give more flexibility to the predictor generating macro. Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900	2016-12-02 12:33:12 -08:00
Vincent Rabaud	28e0bb7088	Merge "Fix race condition in multi-threading initialization."	2016-12-02 17:45:10 +00:00
Vincent Rabaud	647045305a	Fix race condition in multi-threading initialization. Before, a first thread could enter VP8LDspInitSSE2, set VP8LPredictorsAdd to an SSE2 version BEFORE another thread would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C thus leading to a C version actually being the SSE2 one (which would then create an infinite recursion in the SSE2 predictors at execution). Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a	2016-12-02 18:28:57 +01:00
Pascal Massimino	ea72cd60cb	add missing 'extern' keyword for predictor dcl Change-Id: Ibf3db9b6dae91e53524c31cdfccf4678b3fa1135	2016-12-01 08:15:14 +01:00
Vincent Rabaud	67879e6d48	SSE implementation of decoding predictors. Change-Id: I5c9ae63afc98013cb45ce8a91f051203ac68402c	2016-11-30 12:00:07 +01:00
Vincent Rabaud	4239a1489c	Make the lossless predictors work on a batch of pixels. Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1	2016-11-28 17:12:10 +01:00
Pascal Massimino	bc18ebad2e	fix extra 'const's in signatures Change-Id: Ie433d0defbc0c6feae2eb2f11e70082f1affada8	2016-11-25 09:45:52 +01:00
Vincent Rabaud	71e2f5cadf	Remove memcpy in lossless decoding. Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c	2016-11-24 17:45:24 +01:00
Vincent Rabaud	7474d46e45	Do not use a register array in SSE. Change-Id: I79cf95bdac1164fc4de899828e9380c23df8d141	2016-11-24 13:06:44 +01:00
Owen Rodley	67748b41db	Improve latency of FTransform2. Benchmarks from vrabaud@: 8BIT/GRAY corpus speed: faster: -4.3%, corpus size: unchanged; skal/sources_png_skal corpus speed: faster: -5.2%, corpus size: unchanged; images/png_rgb corpus speed: faster: -5.1%, corpus size: unchanged; images/lpcb corpus speed: unchanged, corpus size: unchanged; images/png_big corpus speed: faster: -1.7%, corpus size: unchanged; images/png_doc corpus speed: unchanged, corpus size: unchanged; images/png_1bit corpus speed: faster: -1.2%, corpus size: unchanged; images/jpeg_small corpus speed: unchanged, corpus size: unchanged; images/icip_core1 corpus speed: unchanged, corpus size: unchanged; images/png_gray corpus speed: faster: -2.5%, corpus size: unchanged; images/jpeg_high_quality corpus speed: faster: -4.0%, corpus size: unchanged; images/jpeg corpus speed: faster: -2.3%, corpus size: unchanged; images/png_translucent corpus speed: faster: -2.8%, corpus size: unchanged; images/gif corpus speed: faster: -1.4%, corpus size: unchanged; images/png_opaque corpus speed: faster: -2.8%, corpus size: unchanged; images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged; images/png_indexed corpus speed: faster: -2.0%, corpus size: unchanged; images/all corpus speed: faster: -1.5%, corpus size: unchanged; images/png_small corpus speed: unchanged, corpus size: unchanged; images/png corpus speed: unchanged, corpus size: unchanged; images/gif_still corpus speed: faster: -1.6%, corpus size: unchanged. Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b	2016-11-24 07:09:50 +00:00
Vincent Rabaud	6540cd0eeb	Provide an SSE implementation of ConvertBGRAToRGB Change-Id: Ida11b079077a47fe3b92754f08aa30d81c301fcf	2016-11-23 16:25:51 +01:00
Pascal Massimino	3c2a61b099	remove some unneeded casts Change-Id: Ie68788c77f016ed11446a55142b1bd8d96261452	2016-11-16 22:54:40 -08:00
Pascal Massimino	9ac063c37f	add dsp functions for SmartYUV + SSE2 implementation Change-Id: I5cfdb62d68b5a95899241a097d3a2f697fbc590e	2016-11-16 14:23:06 +00:00
Pascal Massimino	31b1e34342	fix SSIM metric ... by ignoring too-dark area Roughly, if both the source and the reference areas are too dark (R/G/B <= ~6), they are ignored. One caveat: SSIM calculation won't work for U/V planes, which are 128-centered and not related to luminance. But WebPPlaneDistortion() enforces the conversion to RGB, if needed. Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591	2016-10-20 15:17:55 +02:00
Vincent Rabaud	28ce304344	Remove some errors when compiling the code as C++. This fixes some cases from https://bugs.chromium.org/p/webp/issues/detail?id=137 Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a	2016-10-05 09:39:08 +02:00
Pascal Massimino	ba843a92e7	fix some SSIM calculations * prevent 64bit overflow by controlling the 32b->64b conversions and preventively descaling by 8bit before the final multiply * adjust the threshold constants C1 and C2 to de-emphasize the dark areas * use a hat-like filter instead of box-filtering to avoid blockiness during averaging. SSIM distortion calc is actually faster now in SSE2, because of the unrolling during the function rewrite. The C version is noticeably slower because it is still unoptimized. Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90	2016-10-04 01:09:07 -07:00
Pascal Massimino	86a84b3598	2x faster SSE2 implementation of SSIMGet Change-Id: I53705d7ddfa595389ff2d542e5088f96f948d351	2016-09-23 23:23:06 -07:00
Pascal Massimino	7c1fb7d0ff	fix uint32_t initialization (0. -> 0) Change-Id: Ia4aae27f70c4e74ddeb5654cfabb21d785cea9cf	2016-09-14 20:26:05 +02:00
Pascal Massimino	bfff0bf329	speed-up SSIM calculation SSIM results are incompatible with previous version! We're now averaging the SSIM value for each pixel instead of printing a frame-level global SSIM value. * Got rid of some old code * switched to uint32_t for accumulation * refactoring. SSIM calculation is ~4x faster now. Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673	2016-09-14 16:15:43 +02:00
Vincent Rabaud	64577de8ae	De-VP8L-ize GetEntropUnrefinedHelper. Having it architecture dependent resulted in an extra function call of an extern function, hence no inlining and a 5-10% impact on performance. Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450	2016-09-14 13:55:24 +02:00
Pascal Massimino	a7be73280b	Merge "refactor the PSNR / SSIM calculation code"	2016-09-14 06:37:56 +00:00
Pascal Massimino	50c3d7da9a	refactor the PSNR / SSIM calculation code -print_psnr is now much faster because it doesn't use the SSIM code. The SSIM speed-up and re-write will come later. Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823	2016-09-14 06:13:24 +00:00
Vincent Rabaud	dd538b192d	Remove unused declaration. Change-Id: I8ab19654df63e7ef8aad00e97d1428c7b53ee33f	2016-09-13 16:25:46 +02:00
Vincent Rabaud	6cc48b1728	Move some lossless logic out of dsp. Change-Id: I4cfd60cd5497666a2e1c188ceada2e71b05f1505	2016-09-13 15:37:32 +02:00
Vincent Rabaud	c9b45863e2	Split off common lossless dsp inline functions. Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738	2016-09-12 17:35:08 +02:00
Pascal Massimino	3884972e3f	remove WEBP_FORCE_ALIGNED and use memcpy() instead. BUG=webp:297 Change-Id: I89a08debec7bb1b3f411c897260ab1bb63f77df2	2016-08-17 20:16:03 -07:00
skal	6ab496ed22	fix some 'unsigned integer overflow' warnings in ubsan I couldn't find a safe way of fixing VP8GetSigned() so i just used the big-hammer. Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7	2016-08-16 23:18:27 -07:00
James Zern	8a4ebc6ab0	Revert "fix 'unsigned integer overflow' warnings in ubsan" This reverts commit `e44f5248ff`. contains unintentional changes in quant.c Change-Id: I1928f072566788b0c9ea80f6fbc9e571061f9b3e	2016-08-16 16:55:56 -07:00
Pascal Massimino	9d4f209f80	Merge changes I25711dd5,I43188fab * changes: Fix assertions in WebPRescalerExportRow(); Add descriptions of default configuration in help info.	2016-08-16 22:13:23 +00:00
skal	e44f5248ff	fix 'unsigned integer overflow' warnings in ubsan I couldn't find a safe way of fixing VP8GetSigned() so i just used the big-hammer. Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7	2016-08-16 15:04:41 -07:00
Hui Su	27b5d991e2	Fix assertions in WebPRescalerExportRow() Change-Id: I25711dd54e71c90a25f7b18e0ef9155e8151a15e	2016-08-16 14:32:48 -07:00
James Zern	40872fb2e6	dec_neon,NeedsHev: micro optimization; trade 2 compares + 1 logical OR for max + compare Change-Id: I785ad8efdc64db2d0609456d6e7af795ab2117d8	2016-08-08 20:12:30 -07:00
James Zern	b551e587b3	cosmetics: add {}s on continued control statements for consistency within the codebase. in some cases simply join the lines. Change-Id: I071f061052e274c8a69f651ed4305befb4414a40	2016-08-03 19:08:59 -07:00
James Zern	d2e4484ef3	dsp/Makefile.am: put msa source in correct lib. upsampling_msa.c was incorrectly included in the neon convenience lib; also sort msa sources Change-Id: I7c4883f16a5c2fed12bfa0e8d8d6a7acd5d4fb84	2016-08-03 17:50:45 -07:00
Parag Salasakar	d3ddacb625	Add MSA optimized YUV to RGB upsampling functions We add the following MSA optimized YUV to RGB upsampling functions: - UpsampleRgbLinePair - UpsampleBgrLinePair - UpsampleRgbaLinePair - UpsampleBgraLinePair - UpsampleArgbLinePair - UpsampleRgba4444LinePair - UpsampleRgb565LinePair Change-Id: I7264a615edc7eb376e443e9d38bd8e3c9a2cab1f	2016-07-22 14:28:30 +00:00
Parag Salasakar	9ac74f922e	Add MSA optimized rescaling functions We add the following MSA optimized rescaling functions: - RescalerExportRowExpand - RescalerExportRowShrink Change-Id: Ic1c76065423b02617db94cf0c22bb564219b36e6	2016-07-19 15:52:42 +00:00
Parag Salasakar	cb19dbc1a4	Add MSA optimized color transform functions We add the following MSA optimized color transform functions: - TransformColor - SubtractGreenFromBlueAndRed Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3	2016-07-18 13:49:24 +00:00
James Zern	5e2eb89e1f	cosmetics,dsp/msa.c: associate '*' with the type not the variable Change-Id: If5823e9731c406655eaf1dc1aaa2e6554ca7daad	2016-07-15 15:40:41 -07:00
skal	5b60db5c9d	FastMBAnalyze() for quick i16/i4 decision The decision is based on the variance between DC values of each sub-4x4 block. This heuristic works reasonably well for predicting whether the 2nd transform (intra-16) is going to help or not. The decision threshold varies with quality (=quantization). It's only used for -m 0 and -m 1, where no full RD-opt is performed. It actually makes these modes noticeably faster, with an RD curve much closer to the -m 2 mode. Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8	2016-07-15 11:21:08 -07:00
Parag Salasakar	567e697776	Add MSA optimized CollectHistogram function We add the following MSA optimized encoder Histogram function: - CollectHistogram Change-Id: I28415704ec62c3ad375de06eeef468d9f514bb2d	2016-07-15 22:51:33 +05:30
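
The NeedsHev entry above (40872fb2e6) describes trading two compares and a logical OR for one max and one compare. The scalar sketch below illustrates why the two forms are equivalent; it is not the NEON code from the commit. The function names, parameter names, and the exhaustive check are hypothetical and chosen for illustration only, on the assumption that the inputs are unsigned 8-bit magnitudes compared against an unsigned 8-bit threshold.

```c
#include <stdint.h>

/* Hypothetical scalar model of the "before" form:
 * two compares joined by a logical OR. */
static int NeedsHevTwoCompares(uint8_t a_p1_p0, uint8_t a_q1_q0,
                               uint8_t hev_thresh) {
  return (a_p1_p0 > hev_thresh) || (a_q1_q0 > hev_thresh);
}

/* Hypothetical scalar model of the "after" form:
 * take the larger magnitude, then do a single compare. */
static int NeedsHevMaxCompare(uint8_t a_p1_p0, uint8_t a_q1_q0,
                              uint8_t hev_thresh) {
  const uint8_t a_max = (a_p1_p0 > a_q1_q0) ? a_p1_p0 : a_q1_q0;
  return a_max > hev_thresh;
}

/* Exhaustively compares both forms over every 8-bit input triple and
 * returns how many triples disagree. */
static int CountMismatches(void) {
  int mismatches = 0;
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      for (int t = 0; t < 256; ++t) {
        const int two = NeedsHevTwoCompares((uint8_t)a, (uint8_t)b, (uint8_t)t);
        const int one = NeedsHevMaxCompare((uint8_t)a, (uint8_t)b, (uint8_t)t);
        if (two != one) ++mismatches;
      }
    }
  }
  return mismatches;
}
```

The identity holds because max(a, b) exceeds a threshold exactly when at least one of a and b does. In a vectorized setting this would replace two compare instructions and an OR with one max and one compare per vector, which is presumably the saving the commit message refers to.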


831 Commits