libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-12 05:54:31 +02:00

Author	SHA1	Message	Date
Vikas Arora	9383afd5c7	Reduce number of memory allocations while decoding lossless. This change reduces the number of calls to WebPSafeMalloc from 200 to 100. The overall memory consumption is down 3% for Lenna image. Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd	2014-05-02 01:01:43 -07:00
James Zern	888e63edc9	Merge "dsp/lossless: prevent signed int overflow in left shift ops"	2014-05-02 00:29:54 -07:00
Pascal Massimino	8137f3edbd	Merge "instrument memory allocation routines for debugging"	2014-05-02 00:23:48 -07:00
Pascal Massimino	2aa187360d	instrument memory allocation routines for debugging Some tracing code is activated by PRINT_MEM_INFO flag. For debugging only! (not thread-safe, and slow). Change-Id: I282c623c960f97d474a35b600981b761ef89ace9	2014-05-02 00:19:55 -07:00
skal	d3bcf72bf5	Don't allocate VP8LHashChain, but treat like automatic object the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_) is now wholly part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain in the encoder. Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677	2014-04-30 14:10:48 -07:00
James Zern	bd6b8619dd	dsp/lossless: prevent signed int overflow in left shift ops force unsigned when shifting by 24. Change-Id: I453601f33fdf01c516ef66ad23399ae6cbe032b3	2014-04-30 00:10:49 -07:00
James Zern	b7f19b8311	Merge "dec/vp8l: prevent signed int overflow in left shift ops"	2014-04-29 15:56:54 -07:00
Pascal Massimino	29059d5178	Merge "remove some uint64_t casts and use."	2014-04-29 14:15:40 -07:00
James Zern	e69a1df4b7	dec/vp8l: prevent signed int overflow in left shift ops force unsigned when shifting by 24. Change-Id: I6f9ca5fa2109e59b1d46a909136384fc6dc8ca0b	2014-04-29 14:12:38 -07:00
Pascal Massimino	cf5eb8ad19	remove some uint64_t casts and use. We use automatic int->uint64_t promotion where applicable. (uint64_t should be kept only for overflow checking and memory alloc). Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125	2014-04-29 09:08:25 -07:00
Djordje Pesut	38e2db3e16	MIPS: MIPS32r1: Added optimization for HistogramAdd. Change-Id: I39622a9c340c4090f64dd10e515c4ef2aa21d10a	2014-04-29 08:36:51 -07:00
Vikas Arora	6d6865f0db	Added SSE2 variants for Average2/3/4 The predictors based on Average2 are a tad slower. Following is the performance data for these predictors normalized to number of instruction cycles (as per valgrind) per operation: - Predictor6 & Predictor7 now take 15 instruction cycles compared to 11 instruction cycles for the C version. - Predictor8 & Predictor9 now take 15 instruction cycles compared to 12 instruction cycles for the C version. The predictors based on Average4 are faster and those based on Average3 are a tad slower: - Predictor10 (Average4) now takes 23 instruction cycles compared to 25 instruction cycles for the C version. - Predictor5 (Average3) now takes 20 instruction cycles compared to 18 instruction cycles for the C version. Maybe the SSE2 version of Average2 can be improved further. Otherwise, we can remove the SSE2 version and always fall back to the C version. Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029	2014-04-28 14:47:30 -07:00
Pascal Massimino	b3a616b356	make HistogramAdd() a pointer in dsp * merged the two HistogramAdd/AddEval() into a single call (with detection of special case when b==out) * added a SSE2 variant * harmonize the histogram type to 'uint32_t' instead of just 'int'. This has a lot of ripples on signatures. * 1-2% faster Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306	2014-04-28 10:09:34 -07:00
James Zern	c8bbb636ea	dec_neon: relocate some inline-asm defines move simple loop filter defines closer to their use and LOAD* to a location common with the intrinsics Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3	2014-04-28 00:41:42 -07:00
James Zern	4e393bb9f1	dec_neon: enable intrinsics-only functions the complex loop filter has no inline equivalent; the simple loop filter remains conditional on USE_INTRINSICS: it's left undefined for now. Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2	2014-04-28 00:39:46 -07:00
James Zern	ba99a922ab	dec_neon: use positive tests for USE_INTRINSICS makes Simple* layout consistent with the rest of the file Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6	2014-04-28 00:38:47 -07:00
James Zern	a7828e8bdb	dec_neon: make WORK_AROUND_GCC conditional on version Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db	2014-04-28 00:01:19 -07:00
pascal massimino	3f3d717a6c	Merge "enc_neon: enable intrinsics-only functions"	2014-04-27 02:05:53 -07:00
pascal massimino	de3cb6c820	Merge "move LOCAL_GCC_VERSION def to dsp.h"	2014-04-27 02:04:08 -07:00
pascal massimino	ca49e7ad97	Merge "enc_neon: move Transpose4x4 to dsp/neon.h"	2014-04-27 01:11:05 -07:00
Pascal Massimino	ad900abddd	Merge "fix warning about size_t -> int conversion"	2014-04-27 01:07:03 -07:00
Pascal Massimino	4825b4360d	fix warning about size_t -> int conversion + re-order and add some const Change-Id: I3746520b75699e56e20835d10d1dd9cd9fd6d85d	2014-04-27 00:50:07 -07:00
James Zern	42b35e086b	enc_neon: enable intrinsics-only functions CollectHistogram / SSE* / QuantizeBlock have no inline equivalents, enable them where possible and use USE_INTRINSICS to control borderline cases: it's left undefined for now. Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155	2014-04-26 19:09:04 -07:00
James Zern	f937e01261	move LOCAL_GCC_VERSION def to dsp.h + add LOCAL_GCC_PREREQ and use it in lossless_neon.c Change-Id: Ic9fd99540bc3e19c027d1598e4530dfdc9b9de00	2014-04-26 19:09:04 -07:00
James Zern	5e1a17ef4b	enc_neon: move Transpose4x4 to dsp/neon.h + reuse it in TransformWHT() Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989	2014-04-26 14:06:04 -07:00
James Zern	c7b92a5a29	dec_neon: (WORK_AROUND_GCC) delete unused Load4x8 using this in Load4x16 was slightly slower and didn't help mitigate any of the remaining build issues with 4.6.x. Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51	2014-04-26 12:36:14 -07:00
Vikas Arora	0b896101b4	Reduce memory footprint for encoding WebP lossless. Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for: - Building HashChain data-structure used in creating the backward references. - Creating Backward references for LZ77 or RLE coding. - Creating Huffman tree for encoding the image. For the above mentioned code-paths, allocate memory once and re-use it subsequently. Reduce the footprint of VP8LHistogram struct by changing the Struct field 'literal_' from an array of constant size to dynamically allocated buffer based on the input parameter cache_bits. Initialize BitWriter buffer corresponding to 16bpp (2WH). There are some hard-files that are compressed at 12 bpp or more. The realloc is costly and can be avoided for most of the WebP lossless images by allocating some extra memory at the encoder initialization. Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095	2014-04-26 01:14:33 -07:00
Djordje Pesut	1d62acf6af	MIPS: MIPS32r1: Added optimization for HuffmanCost functions. HuffmanCost and HuffmanCostCombined optimized and added 'const' to some variables from ExtraCost functions. Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d	2014-04-24 11:14:57 +02:00
James Zern	c0220460e9	Merge "Bugfix: Incremental decode of lossy-alpha"	2014-04-22 16:33:12 -07:00
Urvang Joshi	8c7cd722f6	Bugfix: Incremental decode of lossy-alpha When remapping buffer, br->eos_ was wrongly being set to true for certain images. Also, refactored the end-of-stream detection as a function. Reported in http://crbug.com/364830 Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84	2014-04-22 16:06:32 -07:00
Djordje Pesut	7955152d58	MIPS: fix error with number of registers. Some versions of compiler in debug build can't find a register in class 'GR_REGS' while reloading 'asm'. The number of used registers is decreased in this fix. Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065	2014-04-22 12:06:45 +02:00
skal	b1dabe3767	Merge "Move the HuffmanCost() function to dsp lib"	2014-04-18 12:08:22 -07:00
skal	75b12006e3	Move the HuffmanCost() function to dsp lib This is to help further optimizations. (like in https://gerrit.chromium.org/gerrit/#/c/69787/) There's a small slowdown (~0.5% at -z 9 quality) due to function pointer usage. Note that, for speed, it's important to return VP8LStreaks by value, and not pass a pointer. Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000	2014-04-18 11:59:48 -07:00
Djordje Pesut	2772b8bd98	MIPS: fix assembler error revealed by clang's debug build .set at - Indicates that macro expansions may clobber the assembler temporary ($at or $28) register. Some macros may not be expanded without this and will generate an error message if noat is in effect. "at" also added to the clobber list. Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d	2014-04-18 18:10:52 +02:00
James Zern	6653b601ef	enc_mips32: fix unused symbol warning in debug move kC1 / kC2 under __OPTIMIZE__ missed in: 8dec120 enc_mips32: disable ITransform(One) in debug builds Change-Id: Ic9a12e6d73090c8c06b0e7a4bc56dd9c76b8e596	2014-04-17 23:35:36 -07:00
James Zern	8dec120975	enc_mips32: disable ITransform(One) in debug builds avoids: src/dsp/enc_mips32.c: In function 'ITransformOne': src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm' src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d	2014-04-17 20:10:31 -07:00
James Zern	98519dd5c1	enc_neon: convert Disto4x4 to intrinsics Change-Id: I0f00d5af2de2301e8237c2a38a9612d3645abad6	2014-04-17 18:29:31 -07:00
Pascal Massimino	fe9317c9bf	cosmetics: * remove MIPS32 suffix from static function names * fix a long line in enc_neon.c Change-Id: Ia1294ae46f471b3eb1e9ba43c6aa1b29a7aeb447	2014-04-16 00:36:19 -07:00
James Zern	953b074677	enc_neon: cosmetics fix/remove incorrect comments + whitespace Change-Id: Id1b86beb23e5bf946e73c34ab7066b6ca177f33b	2014-04-15 23:57:03 -07:00
skal	a9fc697cb6	Merge "WIP: extract the float-calculation of HuffmanCost from loop"	2014-04-15 11:33:11 -07:00
skal	3f84b5219d	Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"	2014-04-15 07:09:12 -07:00
Djordje Pesut	4ae0533f39	MIPS: MIPS32r1: Added optimizations for ExtraCost functions. ExtraCost and ExtraCostCombined Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2	2014-04-15 15:37:06 +02:00
skal	b30a04cf11	WIP: extract the float-calculation of HuffmanCost from loop new function: VP8FinalHuffmanCost() Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9	2014-04-15 14:52:52 +02:00
skal	a8fe8ce231	Merge "NEON intrinsics version of CollectHistogram"	2014-04-15 03:00:45 -07:00
skal	95203d2d1b	NEON intrinsics version of CollectHistogram apparently faster, but we might save some load/store to/from memory once we settle for the intrinsics-based FTransform() (also: fixed some #ifdef USE_INTRINSICS problems) Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7	2014-04-14 16:47:20 +02:00
skal	7ca2e74bb4	replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8) saves few instructions Change-Id: If8f464bb2894a209bba94825a4db9267df126d47	2014-04-14 15:14:45 +02:00
skal	41c6efbdc5	fix lossless_neon.c * some extra {xx , 0 } in initializers * replaced by vget_lane_u32() where appropriate Change-Id: Iabcd8ec34d7c853920491fb147a10d4472280a36	2014-04-14 14:27:11 +02:00
skal	8ff96a027a	NEON intrinsics version of FTransform a little bit slower than inlined asm it seems. So disabled for now. Change-Id: I8c942846f9bedaed57275675ea9dbbcb8dfd9ccd	2014-04-14 09:58:35 +02:00
Jovan Zelincevic	0214f4a908	Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"	2014-04-10 08:54:12 -07:00
Jovan Zelincevic	baabf1ea3a	MIPS: MIPS32r1: Added optimizations for FastLog2 Functions VP8LFastLog2Slow and VP8LFastSLog2Slow also: replaced some "% y" by "& (y-1)" in the C-version (since y is a power-of-two) Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0	2014-04-10 08:32:51 -07:00
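
The last entry above (baabf1ea3a) mentions replacing "% y" by "& (y-1)" where y is a power of two. A minimal sketch of that identity follows. The helper names IsPowerOfTwo and ModPowerOfTwo are hypothetical and chosen for illustration only; they are not libwebp functions, and the actual change in the commit may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of libwebp): returns nonzero if y is a
 * power of two. A power of two has exactly one bit set, so clearing the
 * lowest set bit with y & (y - 1) must leave zero. */
static int IsPowerOfTwo(uint32_t y) {
  return y != 0 && (y & (y - 1)) == 0;
}

/* Hypothetical helper (not part of libwebp): computes x % y for unsigned x
 * when y is a power of two. In that case y - 1 is a mask of the low bits,
 * so the remainder is simply those low bits of x. */
static uint32_t ModPowerOfTwo(uint32_t x, uint32_t y) {
  assert(IsPowerOfTwo(y));  /* the identity only holds for powers of two */
  return x & (y - 1);
}
```

For example, ModPowerOfTwo(13, 8) keeps the low three bits of 13 and gives the same result as 13 % 8. The identity is stated here for unsigned operands only; for negative signed values, "%" and "&" generally disagree in C.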


1562 Commits