libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2026-04-09 06:12:32 +02:00

Author	SHA1	Message	Date
James Zern	b7971d0e22	dsp: avoid defining _C functions w/NEON builds when targeting NEON C functions with NEON equivalents won't be used, but will contribute to binary size. the same goes for sse2, etc., but this change is primarily concerned with binary sizes for android arm targets. note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect on the use of NEON functions. this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0. Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface	2017-10-27 10:54:56 -07:00
James Zern	a439972175	WIP: list includes as descendants of the project dir #include "(.\|..)/..." -> #include "src/..." Change-Id: I772880aa097a770722043c8a4393552ba38a89b6	2017-10-10 23:04:05 -07:00
skal	b09307dcde	Encoder: harmonize function suffixes BUG=webp:355 Change-Id: Ia2fe95db7dfb303f3f64e390d43bc41b8933256c	2017-08-09 02:41:01 +00:00
Pascal Massimino	693bf74ec0	move the SSIM calculation code in ssim.c / ssim_sse2.c Change-Id: I63a63fa7f44f257f2e17e45358b206c23069c448	2017-02-21 12:53:35 +01:00
James Zern	668e1dd44f	src/{dec,enc,utils}: give filenames a unique suffix this avoids duplicates between these trees and dsp/, e.g., enc/tree.c, dec/tree.c, making pulling the whole library source tree into one target possible BUG=webp:279 Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775	2017-01-19 19:09:48 -08:00
Pascal Massimino	31b1e34342	fix SSIM metric ... by ignoring too-dark area Roughly, if both the source and the reference areas are too dark (R/G/B <= ~6), they are ignored. One caveat: SSIM calculation won't work for U/V planes, which are 128-centered and not related to luminance. But WebPPlaneDistortion() enforces the conversion to RGB, if needed. Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591	2016-10-20 15:17:55 +02:00
Vincent Rabaud	28ce304344	Remove some errors when compiling the code as C++. This fixes some cases from https://bugs.chromium.org/p/webp/issues/detail?id=137 Change-Id: I58f3a617bf973dbe4c5794004a01e2aea39ba53a	2016-10-05 09:39:08 +02:00
Pascal Massimino	ba843a92e7	fix some SSIM calculations * prevent 64bit overflow by controlling the 32b->64b conversions and preventively descaling by 8bit before the final multiply * adjust the threshold constants C1 and C2 to de-emphasize the dark areas * use a hat-like filter instead of box-filtering to avoid blockiness during averaging SSIM distortion calc is actually faster now in SSE2, because of the unrolling during the function rewrite. The C-version is considerably slower because it is still un-optimized. Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90	2016-10-04 01:09:07 -07:00
Pascal Massimino	7c1fb7d0ff	fix uint32_t initialization (0. -> 0) Change-Id: Ia4aae27f70c4e74ddeb5654cfabb21d785cea9cf	2016-09-14 20:26:05 +02:00
Pascal Massimino	bfff0bf329	speed-up SSIM calculation SSIM results are incompatible with previous version! We're now averaging the SSIM value for each pixel instead of printing a frame-level global SSIM value. * Got rid of some old code * switched to uint32_t for accumulation * refactoring SSIM calculation is ~4x faster now. Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673	2016-09-14 16:15:43 +02:00
Pascal Massimino	50c3d7da9a	refactor the PSNR / SSIM calculation code -print_psnr is now much faster because it doesn't use the SSIM code. The SSIM speed-up and re-write will come later. Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823	2016-09-14 06:13:24 +00:00
skal	5b60db5c9d	FastMBAnalyze() for quick i16/i4 decision The decision is based on the variance between DC values of each sub-4x4 block. This heuristic is rather ok for predicting whether the 2nd transform (intra-16) is going to help or not. The decision threshold varies with quality (=quantization). It's only used for -m 0 and -m 1, where no full RD-opt is performed. It actually makes these modes quite faster, with RD curve much closer to the -m 2 mode. Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8	2016-07-15 11:21:08 -07:00
hui su	91b59e886b	Remove QuantizeBlockWHT() in enc.c QuantizeBlockWHT() is basically identical to QuantizeBlock(), no need to keep two copies. Change-Id: I970cb6948da1c750c1339971a55e3b40765cdd01	2016-07-14 10:44:18 -07:00
Parag Salasakar	435308e029	Add MSA optimized encoder transform functions We add the following MSA optimized encoder transform functions: - ITransform - FTransform - FTransformWHT Change-Id: Ia6b17556aba5aff2d7a88208905fb45293d080a8	2016-07-05 14:35:47 +00:00
Pascal Massimino	ca8d951980	remove some obsolete TODOs Change-Id: Ied77b2dd7e3e5bb65524c0ac7b9a3fb6585cac57	2016-06-01 16:23:16 +02:00
Pascal Massimino	8fa6ac68f0	remove two ubsan warnings (regarding uint overflow) Change-Id: I1a76e4b1268370b6b7d6a1aa93b99e57f55fd02e	2016-05-10 18:40:18 +00:00
Pascal Massimino	423ecaf484	move some SSIM-accumulation function for dsp/ This is in preparation for some SSE2 code. And generally speaking, the whole SSIM code needs some revamp: we're not averaging the SSIM value at each pixel but just computing the overall SSIM value once, for the whole plane. The former might be better than the latter. Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5	2016-03-08 07:50:09 +01:00
Vincent Rabaud	9960c31685	Remove an unnecessary transposition in TTransform. Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e	2016-02-17 21:41:28 +01:00
Pascal Massimino	2c08aac81a	introduce WebPMemToUint32 and WebPUint32ToMem for memory access it uses memcpy() when unaligned memory write is tricky Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874	2015-12-04 13:43:01 +00:00
skal	ac76801159	introduce FTransform2 to perform two transforms at a time. FTransform goes from ~12.0% to 11.5% total CPU time. Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624	2015-05-18 21:06:15 -07:00
James Zern	0768b252fa	dsp/enc.c: cosmetics: move DST() def closer to use Change-Id: Iccbcf046412426c2893b71eced517f611d2ffc3f	2015-04-15 20:03:39 -07:00
Pascal Massimino	94055503e3	encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN Visible speed-up, thanks to pshufb and pabsw and psignw use. had to tweak configure.ac to make "smmintri.h" presence correctly detected (we need to set the CPPFLAGS instead of the CFLAGS!) Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080	2015-03-25 20:23:51 -07:00
James Zern	182497993b	dsp: s/VP8LSetHistogramData/VP8SetHistogramData/ this function is for lossy encoding; the VP8L prefix is used by lossless Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb	2015-03-24 18:27:41 -07:00
James Zern	f9016d6662	dsp/enc::InitTables: add missing TSan annotation Change-Id: I262b9071417a0ec502c7c0380f27da6413cc74e4	2015-02-09 22:40:45 -08:00
James Zern	67f601cd46	make the 'last_cpuinfo_used' variable names unique allows the sources to be #include'd in some hackish builds (don't do that!) Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91	2015-01-07 23:38:53 -08:00
Pascal Massimino	a437694a17	multi-thread fix: lock each entry points with a static var we compare the current VP8GetCPUInfo pointer to the last used. This is less code overall and each implementation is still testable separately (by just changing VP8GetCPUInfo, but not a separate threads!) Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b	2015-01-05 07:48:49 -08:00
Pascal Massimino	bad775715a	simplify the Histogram struct, to only store max_value and last_nz we don't need to store the whole distribution in order to compute the alpha Later, we can incorporate the max_value / last_non_zero bookkeeping in SSE2 directly. Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15	2014-12-10 10:44:57 +01:00
Pascal Massimino	4a279a680e	cosmetics: add some missing != NULL comparisons Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904	2014-12-04 14:54:11 +01:00
Pascal Massimino	66ad372500	factorize BPS definition in dsp.h and add VP8Copy16x8 Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d	2014-12-04 13:45:14 +01:00
Djordje Pesut	829a8c19a0	MIPS: dspr2: added optimization for ITransform Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463	2014-10-28 14:28:14 +01:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
skal	73d361dd5f	introduce VP8EncQuantize2Blocks to quantize two blocks at a time No speed diff for now. We might reorder better the instructions later, to speed things up. Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c	2014-08-25 20:21:42 -07:00
James Zern	230a055501	configure: set WEBP_HAVE_AVX2 when available this is used to set WEBP_USE_AVX2 in files where the build flag won't be used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f	2014-06-03 23:29:23 -07:00
James Zern	178e9a69ae	add stub dsp/enc_avx2.c VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure flag should be added to avoid the stub when avx2 is unavailable/disabled Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108	2014-05-22 00:31:46 -07:00
skal	869eaf6c60	~30% encoding speedup: use NEON for QuantizeBlock() also revamped the signature to avoid having to pass the 'first' parameter Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5	2014-04-08 03:08:22 -07:00
Djordje Pesut	0ca2914b23	MIPS: MIPS32r1: Add optimization for ITransform Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56	2014-04-04 10:50:35 +02:00
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
James Zern	82ae1bf299	cosmetics: normalize VP8GetCPUInfo checks - use '!= NULL' + dec_neon/STORE_WHT: align '\'s Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53	2014-02-28 18:47:41 -08:00
skal	0235d5e44b	1-2% faster quantization in SSE2 C-version is a bit faster too (sub-1% faster on ARM) Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97	2014-02-13 15:55:30 -08:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
skal	73b731fb42	introduce a special quantization function for WHT WHT is somewhat a special case: no sharpen[] bias, etc. Will be useful in a later CL when precision of input is changed. Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299	2013-12-10 14:21:47 +01:00
skal	41c0cc4b9a	Make Forward WHT transform use 32bit fixed-point calculation This is in preparation for a future change where input will be 16bit instead of 12bit No speed diff observed. Note that the NEON implementation was using 32bit calc already. Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4	2013-12-10 06:10:52 +01:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
skal	9c4ce971a8	Simplify forward-WHT + SSE2 version no precision loss observed speed is not really faster (0.5% at max), as forward-WHT isn't called often. also: replaced a "int << 3" (undefined by C-spec) by a "int * 8" ( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ ) Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401	2013-04-26 08:57:18 +02:00
Pascal Massimino	3c8eb9a806	fix bad saturation order in QuantizeBlock Saturation was done on input coeff, not quantized one. This saturation is not absolutely needed: output of FTransformWHT is in range [-16320, 16321]. At quality 100, max quantization steps is 8, so the maximal range used by QuantizeBlock() is [-2040, 2040]. But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so it's better to leave this saturation check for now. addresses issue #145 Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28	2013-03-25 14:53:29 -07:00
skal	42c3b550ba	simplify the fwd transform -> remove two shifts Change-Id: Ibc55bca98588da30553a7870224ffd0e13d57f52	2012-11-15 09:51:35 +01:00
skal	e5c3b3f554	Simplify the texture evaluation Disto4x4() We don't need to use the exact forward transform, since it's only a rough evaluation. -> Removed some shifts and rounding constants. Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33	2012-11-14 07:49:31 +01:00
Pascal Massimino	75e5f17e3b	ARM/NEON: 30% encoding speed-up (implements the backward and forward transforms in the encoder) original patch by Wayne Chen (datoudatou at gmail dot com) Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57	2012-10-31 14:00:20 -07:00
skal	5725cabac0	new segmentation algorithm fixes the 'blocky sky problem' (saturation problem: when luma was flat, chroma noise was taking over, resulting in random segment id assigned. When just using a common uniform segment was better). + side clean-up and readability/experimentability MACRO'ization + added '-map 7' option Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71	2012-09-04 23:09:15 +02:00


56 Commits