libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-06-29 07:34:37 +02:00

Author	SHA1	Message	Date
Djordje Pesut	7ce8788b06	MIPS: dspr2: added optimization for function MakeARGB32 inline function MakeARGB32 calls changed to call via pointers to functions which make (a)rgb for entire row Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9	2014-12-22 12:31:36 +01:00
Pascal Massimino	87c3d53180	method=0: Don't evaluate any predictor and apply Paeth predictor (predictor#11) for the low effort (m=0) mode. For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality range and about 10% for higher quality range. Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93	2014-12-17 18:41:08 +01:00
Pascal Massimino	31a9cf6417	Speedup WebP lossless compression for low effort (m=0) mode with following: - Disable Cross-Color transform. - Evaluate predictors #11 (paeth), #12 and #13 only. Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed	2014-12-17 11:52:11 +01:00
Djordje Pesut	9275d91c79	MIPS: dspr2: added optimization for function TrueMotion Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56	2014-12-12 14:38:55 +01:00
James Zern	a3946b8956	enc_neon: fix building with non-Xcode clang (iOS) check for __apple_build_version__ to distinguish the two; a version check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear how upstream will deal with their versioning as they go 3.6+, so avoid it for now. Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063	2014-12-10 15:50:26 -08:00
Pascal Massimino	8ed9c00d5e	Merge "simplify the Histogram struct, to only store max_value and last_nz"	2014-12-10 02:02:05 -08:00
Pascal Massimino	bad775715a	simplify the Histogram struct, to only store max_value and last_nz we don't need to store the whole distribution in order to compute the alpha Later, we can incorporate the max_value / last_non_zero bookkeeping in SSE2 directly. Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15	2014-12-10 10:44:57 +01:00
Djordje Pesut	3cca0dc7f0	MIPS: dspr2: Added optimization for DCMode function Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1	2014-12-09 13:58:39 +01:00
Djordje Pesut	37e395fd1c	MIPS: fix functions to use generic BPS instead of hardcoded value Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd	2014-12-05 15:55:47 +01:00
Pascal Massimino	4a279a680e	cosmetics: add some missing != NULL comparisons Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904	2014-12-04 14:54:11 +01:00
Pascal Massimino	66ad372500	factorize BPS definition in dsp.h and add VP8Copy16x8 Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d	2014-12-04 13:45:14 +01:00
Pascal Massimino	57606047ec	encoder: switch BPS to 32 instead of 16 this is a first step to unifying encoding/decoding cache stride and possibly sharing the prediction functions in dsp/ With this layout, there's a little (~7%) space lost with unused samples. But no speed change was observed. Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e	2014-12-04 09:17:18 +01:00
Djordje Pesut	1b66bbe998	MIPS: dspr2: added optimization for function TransformColor_C Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36	2014-12-01 15:46:06 +01:00
James Zern	9de9074c92	dec_neon: add TM8uv ~68% faster reuses TM4() adding support for the additional rows, the columns were already being done. Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b	2014-11-25 14:40:17 -08:00
James Zern	e18571393d	dsp: initialize VP8PredChroma8 in VP8DspInit() the table becomes non-const to allow for platform-specific optimizations Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f	2014-11-24 22:12:42 -08:00
Vikas Arora	e0c809ad23	Move Entropy methods to lossless.c Move all the Entropy evaluation methods to lossless.c (from histogram.c). There's slight difference in the way entropy is computed for evaluating entropy in prediction methods and histogram (literal) for huffman trees. Plan (later) to merge few (static) methods and reduce the code size. This change has no impact on the compression speed/density. Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15	2014-11-20 13:48:05 -08:00
Djordje Pesut	2f0e2ba826	MIPS: dspr2: added optimization for function Select Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d	2014-11-07 09:44:16 +01:00
Djordje Pesut	54f2c14cce	MIPS: dspr2: added optimization for function FTransform Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78	2014-11-06 14:21:33 +01:00
Djordje Pesut	aa42f4231f	MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918	2014-11-06 10:56:18 +01:00
Djordje Pesut	95ca44a718	MIPS: dspr2: added optimization for Disto4x4 enc/dec common macros moved to mips_macro.h Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1	2014-11-05 12:06:15 +01:00
Djordje Pesut	5798eee6be	MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899) Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140	2014-11-03 15:39:00 +01:00
James Zern	572022a350	filters_mips_dsp_r2.c: disable unfilters the output does not match the C-code. Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899	2014-10-30 11:10:11 +01:00
Djordje Pesut	a28e21b141	MIPS: dspr2: Added optimization for function ClampedAddSubtractFull Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380	2014-10-29 13:08:06 +01:00
Djordje Pesut	18d5a1efa8	MIPS: dspr2: added optimization for function ClampedAddSubtractHalf Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186	2014-10-29 11:08:37 +01:00
Djordje Pesut	829a8c19a0	MIPS: dspr2: added optimization for ITransform Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463	2014-10-28 14:28:14 +01:00
James Zern	22881c999e	dec_neon: add RD4 intra predictor based on the SSE2 version; a bit rough around the loads, but still ~38% faster. Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f	2014-10-24 21:22:07 +02:00
James Zern	1304eb3418	Merge "dec_neon: DC4: use pair-wise adds for top row"	2014-10-23 08:08:34 -07:00
James Zern	0db9031c79	dsp/dec_{neon,sse2}: VE4: normalize variable names use '0' rather than '_' when dealing with variables that result from a shift Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4	2014-10-23 16:04:13 +02:00
James Zern	b5bc15305b	dec_neon: DC4: use pair-wise adds for top row reduces load count, slightly faster Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08	2014-10-23 15:48:49 +02:00
James Zern	eba6ce06c3	dec_neon: add DC4 intra predictor ~70% faster Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0	2014-10-23 14:21:08 +02:00
James Zern	79abfbd9df	dec_neon: add TM4 intra predictor ~21% faster Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a	2014-10-23 14:21:08 +02:00
James Zern	fe395f0e4d	dec_neon: add LD4 intra predictor based on SSE2 version, ~55% faster Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0	2014-10-23 14:20:47 +02:00
James Zern	32de385eca	dec_neon: add VE4 intra predictor based on SSE2 version, ~59% faster Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b	2014-10-23 11:46:08 +02:00
Pascal Massimino	b7a33d7e91	implement VE4/HE4/RD4/... in SSE2 (30% faster prediction functions, but overall speed-up is ~1% only) Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765	2014-10-22 18:25:36 +02:00
Pascal Massimino	97c76f1f30	make VP8PredLuma4[] non-const and initialize array in VP8DspInit() also convert 'type *dst' to 'type* dst' Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae	2014-10-22 18:14:20 +02:00
James Zern	f85ec712b0	PrintReg: output to stderr allows use of '-o -' while testing Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191	2014-10-22 17:28:19 +02:00
James Zern	d1c359ef29	fix shared object build with -fvisibility=hidden set WEBP_EXTERN to visibility=default + explicitly mark VP8GetCPUInfo as it's referenced within the examples Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99	2014-10-17 11:50:52 +02:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
James Zern	0ce27e715e	enc_mips32: workaround gcc-4.9 bug avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9 In function 'SSE16x16': enc_mips32.c (684) internal compiler error: Segmentation fault Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473	2014-10-15 19:14:04 +02:00
pascal massimino	32f67e309f	Merge "enc_neon: initialize vectors w/vdup_n_u32"	2014-10-09 12:23:18 -07:00
Pascal Massimino	fabc65da32	1-3% faster encoding optimizing SSE_NxN functions got rid of the |a-b|^|b-a| method and went back to just (a-b)^2 instead. quality | size(bytes) after/before | time (ms) after/before Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0	2014-10-09 07:20:00 -07:00
James Zern	7534d71640	enc_neon: initialize vectors w/vdup_n_u32 replaces {} initialization gnu-ism Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec	2014-10-09 12:35:41 +02:00
Pascal Massimino	2d9b0a4472	add WebPDispatchAlphaToGreen() to dsp SSE2 version is 2.1x faster This is used to transfer the alpha plane to green channel before lossless compression. Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a	2014-10-06 23:15:44 +02:00
Yang Zhang	ab70794ddb	rewrite Disto4x4 in enc_neon.c with intrinsic Performance test: Platform: A9. Input data: bryce.yuv 11158x2156. Performance of assembly is the base; a lower ratio is better. toolchain | assembly | intrinsic; gcc4.6 | 100% | 97.15%; gcc4.8 | 100% | 95.51 Change-Id: Idc2446685acdeb58a4dbdcdae533c68a83a1b879	2014-09-23 18:28:36 -07:00
Djordje Pesut	d4471637ef	MIPS: dspr2: added optimization for function FilterLoop24 affected functions: VFilter16i, HFilter16i, VFilter8i and HFilter8i Change-Id: I5d2bc7716e60e048a33d630fe4a86011bfb6d42e	2014-09-23 10:32:55 +02:00
Djordje Pesut	49e15044ef	MIPS: dspr2: added optimization for function FilterLoop26 affected functions: VFilter16, HFilter16, VFilter8 and HFilter8 Change-Id: Ib2fc41aaa00b10c2906d689bdc5a10f4568e70a8	2014-09-23 08:46:05 +02:00
Pascal Massimino	cddd334050	Add a WebPExtractAlpha function to dsp This is the opposite of WebPDispatchAlpha + Implement the SSE2 version Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c	2014-09-15 08:12:03 +02:00
Pascal Massimino	690b491af1	fix loop bug in DispatchAlpha() * We were re-doing most of the work in plain-C as 'left-over'. * we were always returning has_alpha = true because of a bad mask all_0xff These bugs were conservative and silent, in the sense that we were 'just' doing more work than necessary. Now, the SSE2 version is really 2x faster than the C version. Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa	2014-09-11 22:35:08 +02:00
Djordje Pesut	3101f53720	MIPS: dspr2: added optimization for TransformOne added macros for TransformOne, TransformAC3 and TransformDC Change-Id: I4341450f443cf46dcf91c0db17bde63c8fb8afee	2014-09-11 17:02:02 +02:00
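
Note on commit fabc65da32 in the log: its message says the SSE_NxN functions went back to a plain (a-b)^2 accumulation. The sketch below only illustrates that idea and is not the libwebp source. The function name SumSquaredError and its parameters are hypothetical. The stride parameter stands in for the BPS row stride that other commits in the log mention (set to 32 in the encoder by commit 57606047ec).

```c
#include <stdint.h>

/* Hypothetical helper, not libwebp's actual code: sum of squared
 * differences between two w x h blocks of 8-bit samples. Both blocks
 * are assumed to share the same row stride (e.g. BPS). */
static int SumSquaredError(const uint8_t* a, const uint8_t* b,
                           int w, int h, int stride) {
  int sum = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      /* Widen to int before subtracting so the difference can be negative. */
      const int diff = (int)a[x] - (int)b[x];
      sum += diff * diff;
    }
    a += stride;  /* advance both pointers to the next row */
    b += stride;
  }
  return sum;
}
```

For a 16x16 block the largest possible result is 256 * 255^2, which fits comfortably in a 32-bit int, so no wider accumulator is needed in this sketch.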

457 Commits