libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-11 02:20:33 +02:00

Author	SHA1	Message	Date
Yang Zhang	ab70794ddb	rewrite Disto4x4 in enc_neon.c with intrinsics. Performance test: Platform: A9; input data: bryce.yuv 11158x2156. Assembly performance is the baseline; a lower ratio is better. toolchain / assembly / intrinsic: gcc4.6 / 100% / 97.15%; gcc4.8 / 100% / 95.51%. Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879	2014-09-23 18:28:36 -07:00
Djordje Pesut	d4471637ef	MIPS: dspr2: added optimization for function FilterLoop24 affected functions: VFilter16i, HFilter16i, VFilter8i and HFilter8i Change-Id: I5d2bc7716e60e048a33d630fe4a86011bfb6d42e	2014-09-23 10:32:55 +02:00
Djordje Pesut	49e15044ef	MIPS: dspr2: added optimization for function FilterLoop26 affected functions: VFilter16, HFilter16, VFilter8 and HFilter8 Change-Id: Ib2fc41aaa00b10c2906d689bdc5a10f4568e70a8	2014-09-23 08:46:05 +02:00
Pascal Massimino	cddd334050	Add a WebPExtractAlpha function to dsp This is the opposite of WebPDispatchAlpha + Implement the SSE2 version Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c	2014-09-15 08:12:03 +02:00
Pascal Massimino	690b491af1	fix loop bug in DispatchAlpha() * We were re-doing most of the work in plain-C as 'left-over'. * we were always returning has_alpha = true because of a bad mask all_0xff These bugs were conservative and silent, in the sense that we were 'just' doing more work than necessary. Now, the SSE2 version is really 2x faster than the C version. Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa	2014-09-11 22:35:08 +02:00
Djordje Pesut	3101f53720	MIPS: dspr2: added optimization for TransformOne added macros for TransformOne, TransformAC3 and TransformDC Change-Id: I4341450f443cf46dcf91c0db17bde63c8fb8afee	2014-09-11 17:02:02 +02:00
Pascal Massimino	a6bb9b17d8	SSE2 for inverse Mult(ARGB)Row and ApplyAlphaMultiply Change-Id: Iab5c0e4a4d2b31f86736a9b277e62b6e28c3d2b4 WebPMultRow: ~7x faster WebPMultARGBRow: ~3x faster ApplyAlphaMultiply: 60% faster	2014-09-11 07:58:42 +02:00
Djordje Pesut	e2502a97c1	MIPS: dspr2: added optimization for TransformAC3 Change-Id: Icd789ee5f6d764297e7dc0a0f8a3bc47ab92ac65	2014-09-09 14:53:36 +02:00
Djordje Pesut	24e1072aac	MIPS: dspr2: added optimization for TransformDC Change-Id: Iee69758f6442ea9c80ddaa32cea8d00dda4c6252	2014-09-09 14:15:04 +02:00
Djordje Pesut	f0103595dd	MIPS: dspr2: added optimization for ColorIndexInverseTransforms Change-Id: I5b6094ce489d4f896bc4b8f575142eb3c5054beb	2014-09-08 17:22:59 +02:00
James Zern	637b388809	dsp/lossless: workaround gcc-4.9 bug on arm force Sub3() to not be inlined, otherwise the code in Select() will be incorrect. https://android-review.googlesource.com/#/c/102511 Change-Id: I90ae58bf3e6cc92ca9897f69974733d562e29aaf	2014-08-27 20:31:21 -07:00
James Zern	8323a9038d	dsp.h: collect gcc/clang version test macros endian_inl.h already relies on dsp.h, grab the definitions from there. Change-Id: I445f7d0631723043c55da1070498f89965bec7b1	2014-08-27 19:33:09 -07:00
skal	e6c4b52f28	move static initialization of WebPYUV444Converters[] to the Init function. Split initialization of YUV444Converters[] out of Upsamplers init. update test for NULL function pointers Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0	2014-08-27 11:36:37 -07:00
skal	f5c04d64b7	Merge "add a DispatchAlpha() for SSE2 that handles 8 pixels at a time"	2014-08-25 22:43:42 -07:00
skal	fc98edd936	add a DispatchAlpha() for SSE2 that handles 8 pixels at a time Only slightly faster. Change-Id: Ie2e57e6a0950166124cf1075c6c9b45b7abdad8c	2014-08-25 21:03:03 -07:00
skal	73d361dd5f	introduce VP8EncQuantize2Blocks to quantize two blocks at a time No speed diff for now. We might reorder the instructions better later to speed things up. Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c	2014-08-25 20:21:42 -07:00
Djordje Pesut	0b21c30b1a	MIPS: dspr2: added optimization for EmitAlphaRGB New dsp function: WebPDispatchAlpha() Change-Id: I48e539d22471279ec75185759bc68d18b127f716	2014-08-21 20:39:35 -07:00
James Zern	953acd56a4	enc_neon: enable QuantizeBlock for aarch64 vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there with a corresponding change in the load. Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac	2014-08-20 11:48:25 -07:00
Djordje Pesut	f4ae143720	MIPS: mips32: code rebase mips code rebased to be same as C code from commit I8c29a8a0285076cb3423b01ffae9fcc465da6a81 Change-Id: I3848f4ce43387c3a62b336606498779f7b07ec44	2014-08-19 15:13:16 +02:00
Djordje Pesut	569771549a	MIPS: dspr2: added optimizations for VP8YuvTo* VP8YuvToRgb VP8YuvToBgr VP8YuvToRgb565 VP8YuvToRgba4444 VP8YuvToArgb VP8YuvToBgra VP8YuvToRgba Change-Id: I22212a125d890e1fd28388fec906a1a5c07ff386	2014-08-19 14:29:32 +02:00
James Zern	3fca851a20	cpu: check for _MSC_VER before using msvc inline asm _M_IX86 will be defined in mingw builds after including windows.h. as the gcc inline asm is first, this missing check would only have caused an error if the code was reorganized. Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c	2014-08-15 15:11:40 -07:00
Djordje Pesut	b4dc4069a2	MIPS: dspr2: added optimization for (un)filters HorizontalFilter VerticalFilter GradientFilter HorizontalUnfilter VerticalUnfilter GradientUnfilter Change-Id: I54055b4767c37719691811072e95bf79c1f627b1	2014-08-14 11:55:19 -07:00
Djordje Pesut	b61c9ceca8	MIPS: dspr2: Optimization of some simple point-sampling functions Change-Id: I6a4ab29bd0cc5a2951a8882cf9997032dc38bd79	2014-08-13 17:18:49 +02:00
Djordje Pesut	98c54107df	MIPS: mips32r2: added optimization for BSwap32 gcc < 4.8.3 doesn't translate bswap optimally. use optimized version always Change-Id: I979ea26ad6dc0166d3d2f39c4148eb8adfb7ddec	2014-08-12 09:29:13 +02:00
Djordje Pesut	b7e5a5c451	MIPS: detect mips32r6 and disable mips32r1 code Change-Id: Id1325c789a990c9a8704e84e99a22d580303eb8a	2014-08-08 17:29:31 +02:00
pascal massimino	bb07022b66	Merge "cosmetics"	2014-08-06 12:30:08 -07:00
James Zern	e300c9d819	cosmetics fix some indent/whitespace, remove a few duplicate includes, extra semi-colons Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352	2014-08-06 12:10:59 -07:00
James Zern	f7b4c48bba	cosmetics: remove some extraneous 'extern's Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d	2014-08-05 22:14:24 -07:00
James Zern	0524d9e5e8	dsp: detect mips64 & disable mips32 code Change-Id: Icf68dafd5cf0614ca25b36a0252caa1784ac8059	2014-08-01 21:18:53 -07:00
skal	8f6f8c5dde	remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv There's no speed diff, so better remove it altogether Reported in https://code.google.com/p/webp/issues/detail?id=215 Change-Id: I991330de18bec340029d6df5fed0dfb4337e4662	2014-07-23 14:15:40 -07:00
James Zern	c76f07ecc2	dec_neon/TransformAC3: initialize vector w/vcreate replaces {} initialization gnu-ism Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0	2014-07-11 15:56:53 -07:00
James Zern	380cca4f2c	configure.ac: add AC_C_BIGENDIAN this defines WORDS_BIGENDIAN, replacing uses of __BIG_ENDIAN__/__BYTE_ORDER__ with it + fixes lossless BGRA output with big-endian toolchains that do not define __BIG_ENDIAN__ (codesourcery mips gcc) Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7	2014-07-03 18:15:50 -07:00
James Zern	47779d46c8	endian_inl.h: add BSwap32 Change-Id: I96e3ae49659307024415d64587e6312888a0070f	2014-07-03 13:28:13 -07:00
James Zern	e59f53600f	neon: normalize vdup_n_* usage with constants, prefer this over vmov_n_* or vcreate_* Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a	2014-07-01 00:55:05 -07:00
James Zern	bc03670f01	neon: add INIT_VECTOR4 used to initialize NxMx4 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e	2014-07-01 00:18:23 -07:00
James Zern	6c1c632b03	neon: add INIT_VECTOR3 used to initialize NxMx3 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4	2014-06-30 23:53:23 -07:00
James Zern	dc7687e51b	neon: add INIT_VECTOR2 used to initialize NxMx2 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139	2014-06-30 23:52:42 -07:00
Pascal Massimino	1f3e5f1e60	remove unused 'shift' argument and QFIX2 define this will remove a warning about the shift amount not being an immediate (=constant). Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a Also: fix the NULL sharpen argument mismatch.	2014-06-26 00:44:12 -07:00
levytamar82	27bfeee43a	QuantizeBlock SSE2 Optimization: Another store-to-load forwarding block was detected coming from the function FTransform. FTransform saves its output data in 4 stores of 8 bytes each. When this data is later loaded by the QuantizeBlock function in one chunk of 16 bytes, it causes a store-to-load forwarding block. The fix was made in the FTransform function, where each pair of consecutive 8-byte values is merged into one 16-byte register and saved to memory. This fix gives ~21% function-level gain and 1.6% user-level gain. Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992	2014-06-18 16:22:00 -07:00
James Zern	7a93c000ee	**/Makefile.am: remove unused AM_CPPFLAGS only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former getting precedence when it's defined. configure's DEFAULT_INCLUDES is covering what's necessary given the include paths are all source relative. Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3	2014-06-12 11:59:05 -07:00
James Zern	32b3137936	configure: move config.h to src/webp/config.h this change has the side-effect of using directory names in the include, silencing a lint warning. Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02	2014-06-10 23:42:00 -07:00
James Zern	90090d99b5	Merge changes I7c675e51,I84f7d785 * changes: configure: test for -msse2 rename upsampling_mips32.c to yuv_mips32.c	2014-06-10 16:15:21 -07:00
skal	69fce2ea78	remove the special casing for res->first in VP8SetResidualCoeffs if res->first = 1, coeffs[0]=0 because of quant.c:749 and line added at quant.c:744 So, no need for the extra case. Going forward, TrellisQuantizeBlock() should also be calling a variant of VP8SetResidualCoeffs() to set the 'last' field. also: fixes a warning for win64 + slight speed-up Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0	2014-06-08 06:40:22 +02:00
James Zern	6e61a3a905	configure: test for -msse2 + add a WEBP_HAVE_SSE2 to dsp.h not all 32-bit toolchain configurations will have sse2 enabled by default Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e	2014-06-07 19:44:08 -07:00
James Zern	b9d2efc629	rename upsampling_mips32.c to yuv_mips32.c matches yuv_sse2 added in; `bdfeeba` dsp/yuv: move sse2 functions to yuv_sse2.c Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3	2014-06-07 12:35:47 -07:00
James Zern	bdfeebaa01	dsp/yuv: move sse2 functions to yuv_sse2.c Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000	2014-06-05 23:52:54 -07:00
pascal massimino	46b32e861a	Merge "configure: set WEBP_HAVE_AVX2 when available"	2014-06-05 02:57:42 -07:00
James Zern	db4860b355	enc_sse2: prevent signed int overflow _mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed int. Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda	2014-06-04 23:18:22 -07:00
James Zern	230a055501	configure: set WEBP_HAVE_AVX2 when available this is used to set WEBP_USE_AVX2 in files where the build flag won't be used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f	2014-06-03 23:29:23 -07:00
James Zern	61362db57c	remove libwebpdspdecode dep on libwebpdsp_avx2 it's encode only, libwebpdecoder doesn't need the symbols Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83	2014-06-03 00:05:56 -07:00
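
The enc_sse2 entry above (db4860b355) notes that _mm_movemask_epi8 returns a 16-bit mask and that shifting it left by 16 can overflow a signed int. A minimal sketch of that kind of fix follows, assuming a 32-bit int; CombineMasks is a hypothetical helper written for illustration and is not taken from the libwebp sources. It uses plain integers so it builds without SSE2.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not from libwebp): packs two 16-bit masks, such as
// the results of two _mm_movemask_epi8() calls, into one 32-bit value.
static uint32_t CombineMasks(int lo_mask, int hi_mask) {
  // Both inputs are expected to fit in 16 bits.
  assert(lo_mask >= 0 && lo_mask <= 0xffff);
  assert(hi_mask >= 0 && hi_mask <= 0xffff);
  // Cast to unsigned before shifting: (hi_mask << 16) on a signed 32-bit
  // int is undefined behavior when bit 15 of hi_mask is set.
  return ((uint32_t)hi_mask << 16) | (uint32_t)lo_mask;
}
```

Doing the shift in an unsigned type keeps the result well defined for every 16-bit input, including masks with the top bit set.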


513 Commits