libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-10-14 14:31:36 +02:00

Author	SHA1	Message	Date
James Zern	79abfbd9df	dec_neon: add TM4 intra predictor ~21% faster Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a	2014-10-23 14:21:08 +02:00
James Zern	fe395f0e4d	dec_neon: add LD4 intra predictor based on SSE2 version, ~55% faster Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0	2014-10-23 14:20:47 +02:00
James Zern	32de385eca	dec_neon: add VE4 intra predictor based on SSE2 version, ~59% faster Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b	2014-10-23 11:46:08 +02:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
James Zern	c76f07ecc2	dec_neon/TransformAC3: initialize vector w/vcreate replaces {} initialization gnu-ism Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0	2014-07-11 15:56:53 -07:00
James Zern	e59f53600f	neon: normalize vdup_n_* usage with constants, prefer this over vmov_n_* or vcreate_* Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a	2014-07-01 00:55:05 -07:00
James Zern	bc03670f01	neon: add INIT_VECTOR4 used to initialize NxMx4 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e	2014-07-01 00:18:23 -07:00
James Zern	6c1c632b03	neon: add INIT_VECTOR3 used to initialize NxMx3 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4	2014-06-30 23:53:23 -07:00
James Zern	dc7687e51b	neon: add INIT_VECTOR2 used to initialize NxMx2 vector types replaces initialization via '{{ }}' gnu-ism. Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139	2014-06-30 23:52:42 -07:00
skal	ea8b0a171d	strong filtering speed-up (~2-3% x86, ~1-2% for NEON) Extract loop invariant and avoid storing/loading samples if they can be re-used. This is particularly interesting when a transpose is involved (HFilter16i). Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139	2014-06-03 07:14:23 +02:00
James Zern	9251c2f6d2	(enc|dec)_neon: use vcreate_*() where appropriate this is more portable than {} initialization. more involved cases are left for a follow-up. Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17	2014-05-27 16:26:56 -07:00
James Zern	b9d2bb67d6	dsp/neon.h: coalesce intrinsics-related defines Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946	2014-05-03 11:34:07 -07:00
James Zern	c8bbb636ea	dec_neon: relocate some inline-asm defines move simple loop filter defines closer to their use and LOAD* to a location common with the intrinsics Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3	2014-04-28 00:41:42 -07:00
James Zern	4e393bb9f1	dec_neon: enable intrinsics-only functions the complex loop filter has no inline equivalent; the simple loop filter remains conditional on USE_INTRINSICS: it's left undefined for now. Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2	2014-04-28 00:39:46 -07:00
James Zern	ba99a922ab	dec_neon: use positive tests for USE_INTRINSICS makes Simple* layout consistent with the rest of the file Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6	2014-04-28 00:38:47 -07:00
James Zern	a7828e8bdb	dec_neon: make WORK_AROUND_GCC conditional on version Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db	2014-04-28 00:01:19 -07:00
pascal massimino	ca49e7ad97	Merge "enc_neon: move Transpose4x4 to dsp/neon.h"	2014-04-27 01:11:05 -07:00
James Zern	5e1a17ef4b	enc_neon: move Transpose4x4 to dsp/neon.h + reuse it in TransformWHT() Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989	2014-04-26 14:06:04 -07:00
James Zern	c7b92a5a29	dec_neon: (WORK_AROUND_GCC) delete unused Load4x8 using this in Load4x16 was slightly slower and didn't help mitigate any of the remaining build issues with 4.6.x. Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51	2014-04-26 12:36:14 -07:00
James Zern	71bca5ecf3	dec_neon: use vst_lane instead of vget_lane results in fewer instructions, small speed improvement Change-Id: I98de632d09ff09f295368c0d744cb4397b585084	2014-04-03 14:56:26 -07:00
skal	bf06105293	Intrinsics NEON version of TransformOne + misc cosmetics * seems 4% slower than inlined-asm with gcc-4.6 * is a tad faster (<1%) with gcc-4.8 (disabled for now) Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095	2014-04-03 14:41:56 -07:00
pascal massimino	19c6f1ba74	Merge "dec_neon: use vld?_lane instead of vset?_lane"	2014-04-03 01:16:29 -07:00
James Zern	fa52d7525f	dec_neon: use vld?_lane instead of vset?_lane results in fewer instructions, small speed improvement Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a	2014-04-02 23:03:18 -07:00
Pascal Massimino	c520e77d94	cosmetic: fix long line Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea	2014-04-02 23:00:50 -07:00
skal	e351ec0759	add intrinsics NEON code for chroma strong-filtering The nice trick is to pack 8 u + 8 v samples into a single uint8x16_t register, and re-use the previous (luma) functions Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38	2014-04-03 06:58:21 +02:00
skal	5fbff3a646	Add strong filtering intrinsics (inner and outer edges) + added some work-around gcc-4.6 to make it compile (except one function). + lots of revamping All variants tested ok. Speed-up is ~5-7% Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3	2014-04-02 08:28:55 +02:00
James Zern	26029568b7	dec_neon: add strong loopfilter intrinsics vertical only currently, 2.5-3% faster placed under USE_INTRINSICS as this change depends on the simple loopfilter improves the simple loopfilter slightly thanks to some reorganization Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155	2014-04-01 01:13:50 -07:00
skal	b9a7a45f1f	add intrinsics version of SimpleHFilter16NEON() It's disabled for now, because it crashes gcc-4.6.3 during compilation with -O2 or -O3. It's been tested OK with -O1. Code is still globally disabled with USE_INTRINSICS, though. Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c	2014-03-31 16:31:31 +02:00
Pascal Massimino	daccbf400d	add light filtering NEON intrinsics disabled for now (but tested OK), thanks to the USE_INTRINSICS #define We'll activate the code when we're on par with non-intrinsics Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff	2014-03-30 22:15:55 -07:00
Pascal Massimino	af44460880	fix typo in STORE_WHT was working ok because dst == out Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd	2014-03-28 10:34:44 -07:00
James Zern	82ae1bf299	cosmetics: normalize VP8GetCPUInfo checks - use '!= NULL' + dec_neon/STORE_WHT: align '\'s Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53	2014-02-28 18:47:41 -08:00
James Zern	9a463c4a51	Merge "dec_neon: convert TransformWHT to intrinsics"	2014-02-25 14:36:44 -08:00
James Zern	9d6b5ff1e6	dec_neon: convert TransformWHT to intrinsics Change-Id: I34dc1d75ddebab131cfed031764117e3f7b75c6b	2014-02-21 11:23:46 -08:00
James Zern	2ff0aae2fe	dec_neon: add ConvertU8ToS16 Change-Id: Ifc4fb8e7f862e72154d2f2739811b1022d2b9416	2014-02-20 15:35:33 -08:00
James Zern	2719bb7e98	dec_neon: TransformAC3: work on packed vectors pack 2 rows in 1 vector similar to TransformDC Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d	2014-02-18 19:47:20 -08:00
James Zern	b7b60ca16c	dec_neon: add SaturateAndStore4x4 converts 2 s16 vectors to 2 u8 and store to uint8_t destination; TransformAC3 can reuse this after a rework Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0	2014-02-18 19:42:35 -08:00
James Zern	e02f16ef45	dec_neon.c: convert TransformDC to intrinsics no noticeable difference in performance Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025	2014-02-14 12:11:58 -08:00
James Zern	228e4877ab	dec_neon.c: add TransformAC3 based on SSE2 version Change-Id: Icc6782955253c98e83d5984153b596ef5f1c0d34	2014-02-08 12:47:54 -08:00
skal	66a32af5e1	Merge "NEON speed up"	2013-12-18 14:17:19 -08:00
skal	26d842eb8f	NEON speed up add TransformDC special case, and make the switch function inlined. Recovers a few of the CPU lost during the addition of TransformAC3 (only on ARM) Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6	2013-12-18 22:32:58 +01:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
James Zern	fc10249b36	NEON/simple loopfilter: avoid q4-q7 registers very tiny speed improvement Change-Id: I3024f120feb7275ce20bfff21af31ea8650a5a03	2013-10-09 18:17:31 +02:00
James Zern	2f09d63e30	NEON/TransformWHT: avoid q4-q7 registers very tiny speed improvement Change-Id: Iace78b9038af412d0a794845ff19f54afa88ccdc	2013-10-09 18:17:23 +02:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
Urvang Joshi	e9a7990bc5	Cosmetic fixes Change-Id: Ia878115086edc3fdfee3f0ca76e5e74ea5906f21	2013-03-29 14:21:56 -07:00
Pascal Massimino	142c46291e	misc style fix Change-Id: Ib764cb09bd78ab6e72c60f495d55b752ad4dbe4d	2013-03-29 03:13:43 -07:00
Pascal Massimino	e8b41ad136	add NEON asm version for WHT inverse transform Contributed by Wayne Chen (datoudatou at gmail dot com) + some header cleanup + remove the NEON suffix in static functions Change-Id: I75bf5e9b54cf5e1acc53764c6f081d61690f8e3d	2012-11-01 16:31:01 -07:00
Pascal Massimino	c7eb45764f	make VP8DspInitNEON() public this will avoid the "dec_neon.o has no symbol" warning no change in binary size observed on linux. Change-Id: Ia27ae2bc5a03d714afa7e46671fdcf4cb630784d	2012-08-27 00:28:13 -07:00
James Zern	255c66b48f	Android: only build dec_neon with NEON support Defining LOCAL_ARM_NEON = true can result in neon instructions being used in portions unprotected by the cpu check. This change defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to the SSE2 code and MSVC. Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd	2012-05-23 22:21:10 -07:00

53 Commits