libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-18 23:09:52 +02:00

Author	SHA1	Message	Date
Pascal Massimino	c64659e1b4	remove duplicate variables after the lossless{_enc}.c split clang was giving "duplicate symbols" error messages at link time. Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6	2015-03-25 11:10:21 +01:00
James Zern	553051f741	dsp/lossless: split enc/dec functions adds lossless_enc*.c; reduces the size of the decode-only so: ~78K w/gcc-4.8.2 on x86_64. Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092	2015-03-23 22:57:50 -07:00
James Zern	35579a4902	VP8LDspInit: remove memcpy without this change the TSan annotation is useless Change-Id: Ief511379f3aad75889815d4fe8362aed5c1abac7	2015-02-09 23:41:24 -08:00
Pascal Massimino	3fd59039bd	simplify/reorganize arguments for CollectColorBlueTransforms and other various call sites too. Change-Id: Icb8f828dfe25672662de18d0e48e7d3144b1f38d	2015-01-15 18:12:12 -08:00
Djordje Pesut	a7e7caa486	MIPS: dspr2: added optimization for function TransformColorRed added new function CollectColorRedTransforms to C, which calls TransformColorRed and it is realized via pointer to function Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835	2015-01-15 09:32:09 +01:00
Djordje Pesut	7b16197361	MIPS: dspr2: added optimization for function TransformColorBlue added new function CollectColorBlueTransforms to C, which calls TransformColorBlue and it is realized via pointer to function Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced	2015-01-13 10:39:38 +01:00
James Zern	67f601cd46	make the 'last_cpuinfo_used' variable names unique allows the sources to be #include'd in some hackish builds (don't do that!) Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91	2015-01-07 23:38:53 -08:00
Pascal Massimino	a437694a17	multi-thread fix: lock each entry points with a static var we compare the current VP8GetCPUInfo pointer to the last used. This is less code overall and each implementation is still testable separately (by just changing VP8GetCPUInfo, but not a separate threads!) Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b	2015-01-05 07:48:49 -08:00
Pascal Massimino	87c3d53180	method=0: Don't evaluate any predictor and apply Paeth predictor (predictor#11) for the low effort (m=0) mode. For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality range and about 10% for higher quality range. Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93	2014-12-17 18:41:08 +01:00
Pascal Massimino	31a9cf6417	Speedup WebP lossless compression for low effort (m=0) mode with following: - Disable Cross-Color transform. - Evaluate predictors #11 (paeth), #12 and #13 only. Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed	2014-12-17 11:52:11 +01:00
Vikas Arora	e0c809ad23	Move Entropy methods to lossless.c Move all the Entropy evaluation methods to lossless.c (from histogram.c). There's slight difference in the way entropy is computed for evaluating entropy in prediction methods and histogram (literal) for huffman trees. Plan (later) to merge few (static) methods and reduce the code size. This change has no impact on the compression speed/density. Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15	2014-11-20 13:48:05 -08:00
James Zern	a4c3a31b8f	WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning move the attribute to the front of the function to quiet clang warning: GCC does not allow no_sanitize_thread attribute in this position on a function definition Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676	2014-10-16 18:06:43 +02:00
Pascal Massimino	80247291c6	mark some init function as being safe for thread_sanitizer. introduces the macro WEBP_TSAN_IGNORE_FUNCTION Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b	2014-10-16 16:34:07 +02:00
Djordje Pesut	f0103595dd	MIPS: dspr2: added optimization for ColorIndexInverseTransforms Change-Id: I5b6094ce489d4f896bc4b8f575142eb3c5054beb	2014-09-08 17:22:59 +02:00
James Zern	637b388809	dsp/lossless: workaround gcc-4.9 bug on arm force Sub3() to not be inlined, otherwise the code in Select() will be incorrect. https://android-review.googlesource.com/#/c/102511 Change-Id: I90ae58bf3e6cc92ca9897f69974733d562e29aaf	2014-08-27 20:31:21 -07:00
James Zern	e300c9d819	cosmetics fix some indent/whitespace, remove a few duplicate includes, extra semi-colons Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352	2014-08-06 12:10:59 -07:00
James Zern	380cca4f2c	configure.ac: add AC_C_BIGENDIAN this defines WORDS_BIGENDIAN, replacing uses of __BIG_ENDIAN__/__BYTE_ORDER__ with it + fixes lossless BGRA output with big-endian toolchains that do not define __BIG_ENDIAN__ (codesourcery mips gcc) Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7	2014-07-03 18:15:50 -07:00
James Zern	47779d46c8	endian_inl.h: add BSwap32 Change-Id: I96e3ae49659307024415d64587e6312888a0070f	2014-07-03 13:28:13 -07:00
James Zern	bd6b8619dd	dsp/lossless: prevent signed int overflow in left shift ops force unsigned when shifting by 24. Change-Id: I453601f33fdf01c516ef66ad23399ae6cbe032b3	2014-04-30 00:10:49 -07:00
Pascal Massimino	b3a616b356	make HistogramAdd() a pointer in dsp * merged the two HistogramAdd/AddEval() into a single call (with detection of special case when b==out) * added a SSE2 variant * harmonize the histogram type to 'uint32_t' instead of just 'int'. This has a lot of ripples on signatures. * 1-2% faster Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306	2014-04-28 10:09:34 -07:00
skal	75b12006e3	Move the HuffmanCost() function to dsp lib This is to help further optimizations. (like in https://gerrit.chromium.org/gerrit/#/c/69787/) There's a small slowdown (~0.5% at -z 9 quality) due to function pointer usage. Note that, for speed, it's important to return VP8LStreaks by value, and not pass a pointer. Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000	2014-04-18 11:59:48 -07:00
Djordje Pesut	4ae0533f39	MIPS: MIPS32r1: Added optimizations for ExtraCost functions. ExtraCost and ExtraCostCombined Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2	2014-04-15 15:37:06 +02:00
Jovan Zelincevic	baabf1ea3a	MIPS: MIPS32r1: Added optimizations for FastLog2 Functions VP8LFastLog2Slow and VP8LFastSLog2Slow also: replaced some "% y" by "& (y-1)" in the C-version (since y is a power-of-two) Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0	2014-04-10 08:32:51 -07:00
Urvang Joshi	c90a902eff	Add SSE2 version of forward cross-color transform Encoding speed is roughly the same. Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c	2014-04-02 12:21:20 -07:00
Urvang Joshi	d4813f0cb2	Add SSE2 function for Inverse Cross-color Transform Lossless decoding is now ~3% faster. Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743	2014-04-01 15:52:25 -07:00
skal	97e5fac389	add some colorspace conversion functions in NEON new file: lossless_neon.c speedup is ~5% gcc 4.6.3 seems to be doing some sub-optimal things here, storing register on stack using 'vstmia' and such. Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509 I've tried adding -fno-split-wide-types and it does help the generated assembly. But the overall speed gets worse with this flag. We should only compile lossless_neon.c with it -> urk. Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0	2014-03-31 17:47:46 +02:00
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
Vikas Arora	312e638f30	Extend the search space for GetBestGreenRedToBlue Get back some of the compression gains by extending the search space for GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not helping much in yielding better compression density. Before: 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations) Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s After: 1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations) Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d	2014-03-14 09:56:00 -07:00
Vikas Arora	1c58526fe1	Fix few nits Add/remove few casts, fixed indentation. Change-Id: Icd141694201843c04e476f09142ce4be6e502dff	2014-03-13 13:57:39 -07:00
Vikas Arora	068b14ac57	Optimize lossless decoding. Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove one if condition inside the main/critical loop. Also separated TransformColor & TransformColorInverse into separate functions and avoid one 'if condition' inside this critical method. This change speeds up lossless decoding for Lenna image about 5% and 1000 image corpus by 3-4%. Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d	2014-03-13 11:27:12 -07:00
skal	c60de26099	~3-4% faster lossless encoding by re-arranging some code from SkipRepeatedPixel() Change-Id: I6c1fd7cd9af22cd9be4234217ff67d7b94f44137	2014-03-04 08:12:59 +01:00
Vikas Arora	206cc1be5a	Refactor GetBestPredictorForTile for future tuning. This change doesn't impact compression gain or compression speed. Change-Id: Ia87d8a46c6f1ce0f8974178d75a6b0ba0a6e3696	2014-02-28 11:30:23 -08:00
Vikas Arora	c16cd99aba	Speed up lossless encoder. Speedup lossless encoder by 20-25% by optimizing: - GetBestColorTransformForTile: Use techniques like binary search and local minima search to reduce the search space. - VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for log(1 + x) and increase the threshold for calling the approximate version of log_2 (compared to costly call to log()). Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d	2014-02-21 22:13:50 -08:00
skal	32aeaf115a	revamp VP8LColorSpaceTransform() a bit -> remove the 'color_transform' multiplier, use more constants, etc. This function is particularly critical, mostly because of GetBestColorTransformForTile(). Loop is a bit faster (maybe ~1%) Change-Id: I90c96a3437cafb184773acef55c77e40c224388f	2014-02-05 10:37:06 +01:00
James Zern	5227d99146	drop: ifdef __cplusplus checks from C files the prototypes are already marked in the headers Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094	2013-12-13 11:42:13 -08:00
Vikas Arora	e081f2f359	Pack code & extra_bits to Struct (VP8LPrefixCode). Also created variant VP8LPrefixEncodeBits that returns the code & extra_bits only. There's no impact on compression density and compression speed. Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c	2013-08-12 11:56:42 -07:00
Vikas Arora	69257f70df	Create LUT for PrefixEncode. This speeds up lossless compression by 5%. Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d	2013-08-05 10:20:18 -07:00
Vikas Arora	8967b9f37e	SSE2 for lossless decoding (critical) functions. This speeds up WebP lossless decoding by 20%. In particular, the photographic images get 35% speedup. Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc	2013-06-27 11:42:45 -07:00
James Zern	d640614d54	update copyright text rather than symlink the webm/vpx terms, use the same header as libvpx to reference in-tree files based on the discussion in: https://codereview.chromium.org/12771026/ Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4	2013-06-06 23:09:14 -07:00
James Zern	2ca83968ae	webp/lossless: fix big endian BGRA output Change-Id: I3d4b3d21f561cb526dbe7697a31ea847d3e8b2c1	2013-05-17 00:32:01 -07:00
skal	87a4fca25f	remove some warnings: * "declaration of ‘index’ shadows a global declaration [-Wshadow]" * "signed and unsigned type in conditional expression [-Wsign-compare]" Change-Id: I891182d919b18b6c84048486e0385027bd93b57d	2013-05-14 22:28:32 +02:00
Urvang Joshi	64c844863a	Further reduce memory to decode lossy+alpha images Earlier such images were using roughly 9 * width * height bytes for decoding. Now, they take 6 * width * height memory. Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7	2013-05-13 16:24:49 -07:00
Vikas Arora	8eae188a62	WebP-Lossless encoding improvements. Lossy (with Alpha) image compression gets 2.3X speedup. Compressing lossless images is 20%-40% faster now. Change-Id: I41f0225838b48ae5c60b1effd1b0de72fecb3ae6	2013-05-08 17:22:11 -07:00
skal	b7eaa85d6a	inline VP8LFastLog2() and VP8LFastSLog2 for small values larger values are still dealt with in the .cc ~5% faster encoding Output size is slightly different (variably), because of different floating-point calculation ordering. Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4	2013-02-25 22:46:52 +01:00
skal	943386db4b	disable SSE2 for now (until proper run-time detection is ready) Change-Id: I7b8eee52b23fce2f1612ad7d4ed603ffb02620a2	2013-02-20 08:20:47 +01:00
skal	9479fb7d2d	lossless encoding speedup * add SSE2 variant for lossless * speed-up TransformColor calls using specialized TransformColorBlue/Red * Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously. This latter changes the output size a little bit. Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89	2013-02-20 08:13:12 +01:00
skal	b7490f8553	introduce WEBP_REFERENCE_IMPLEMENTATION compile option This flag will make the code use no uint64, no asm, and no fancy trick, but instead aim at being as simple and straightforward as possible. Main use is to help emscripten generate proper JS code. More code needs to be simplified later. Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY Here are the typical timing for decoding a large image: ARM7-a: dwebp_justify_32_neon Time to decode picture: 3.280s dwebp_justify_24_neon Time to decode picture: 2.640s dwebp_justify_16_neon Time to decode picture: 2.723s dwebp_justify_8_neon Time to decode picture: 2.802s dwebp_justify_32 Time to decode picture: 4.264s dwebp_justify_24 Time to decode picture: 3.696s dwebp_justify_16 Time to decode picture: 3.779s dwebp_justify_8 Time to decode picture: 3.834s dwebp_32_neon Time to decode picture: 4.010s dwebp_24_neon Time to decode picture: 2.725s dwebp_16_neon Time to decode picture: 2.852s dwebp_8_neon Time to decode picture: 2.778s dwebp_32 Time to decode picture: 4.587s dwebp_24 Time to decode picture: 3.800s dwebp_16 Time to decode picture: 3.902s dwebp_8 Time to decode picture: 3.815s REFERENCE (HEAD) Time to decode picture: 3.818s x86_64: dwebp_justify_32 Time to decode picture: 0.473s dwebp_justify_24 Time to decode picture: 0.434s dwebp_justify_16 Time to decode picture: 0.450s dwebp_justify_8 Time to decode picture: 0.467s dwebp_32 Time to decode picture: 0.474s dwebp_24 Time to decode picture: 0.468s dwebp_16 Time to decode picture: 0.468s dwebp_8 Time to decode picture: 0.481s REFERENCE (HEAD) Time to decode picture: 0.436s i386: dwebp_justify_32 Time to decode picture: 0.723s dwebp_justify_24 Time to decode picture: 0.618s dwebp_justify_16 Time to decode picture: 0.626s dwebp_justify_8 Time to decode picture: 0.651s dwebp_32 Time to decode picture: 0.744s dwebp_24 Time to decode picture: 0.627s dwebp_16 Time to decode picture: 0.642s dwebp_8 Time to decode picture: 0.642s Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225	2013-02-14 15:46:12 +01:00

78 Commits