libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-13 06:24:27 +02:00

Author	SHA1	Message	Date
Vincent Rabaud	010ca3d10d	Fix FindMatchLength with non-aligned buffers. The 32-bit buffers are actually rarely 64-bit aligned. The new solution uses memcmp and is alignment agnostic. It is also slightly faster. Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20	2015-12-04 10:19:58 +01:00
Scott Hancher	5ae220bef6	backward_references.c: Fixed compiler warning "Implicit conversion loses integer precision: 'long' to 'int'." Change-Id: I1aec7431f84123e5280447883eb80b84a3821d91	2015-12-02 23:51:06 -08:00
Vincent Rabaud	a141178255	Optimization in hash chain comparison for 64 bit Arrays were compared 32 bits at a time, it is now done 64 bits at a time. Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus. It is of 3% on my initial 1.3 Mp desktop screenshot image. Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1	2015-12-01 13:01:57 +01:00
Jyrki Alakuijala	90fcfcd905	Insert less hash chain entries from the beginnings of long copies. This makes the chains more efficient and a larger variety of data is tested. 0.02 % compression gain at q 100, 0.05 % at default quality. 0.8 % speedup by callgrind. 0.16 % compression gain for lossy alpha ?! Change-Id: I888120133352799eb14f5f602c7f40ab404bd665	2015-08-18 18:44:03 -07:00
Jyrki Alakuijala	01d61fd9c6	lossless: ~20 % speedup 0.28 % byte size increase on lossless, 0.18 % increase on lossy alpha Change-Id: I1e001a56831a8f996ac522aa646f9ae587c80d12	2015-07-20 17:13:44 -07:00
Jyrki Alakuijala	f722c8f0bd	lossless: Speed up ComputeCacheEntropy by 40 % a total impact of 1 % on encoding speed This allows for performance neutral removal of the binary search in cache bits selection. This will give a small improvement in compression density. Change-Id: If5d4d59460fa1924ce71af977320834a47c2054a	2015-07-20 17:13:44 -07:00
Jyrki Alakuijala	17eb609916	lossless: Allow copying from prev row in rle-mode. 0.21 % compression density improvement for 1000 png corpus in lossless mode 0.50 % compression density improvement for 1000 png corpus in lossy mode Change-Id: I14ee8c427ae5d3e116b0ee6695fcdea3321a319d	2015-07-20 17:13:43 -07:00
Jyrki Alakuijala	c4855ca249	lossless: Inlining add literal this is a simple speedup of about 1-2 % Change-Id: I0c7b01c0a69f4aeaf363ffda05a28871f1def696	2015-07-07 20:24:28 -07:00
Jyrki Alakuijala	8e9c94dedb	lossless: simplify HashChainFindCopy heuristics for small speedup 0.0003 % worse compression Change-Id: Ic4b6b21e5279231c6321f2cec1c79f7e17e56afa	2015-07-07 20:24:27 -07:00
Jyrki Alakuijala	888429f409	lossless: 0.5 % compression density improvement do not do length 2 matches far away speedup for non compressible data by inserting two literals at a time when no matches are found Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3	2015-07-07 20:24:27 -07:00
Jyrki Alakuijala	5e75642efd	lossless: rle mode not to accept lengths smaller than 4. Gives a compression gain of 0.22 % Change-Id: I0f3b8dad6b4c1bfb16eab095a467f34466b9e3b7	2015-07-07 20:24:25 -07:00
Pascal Massimino	7fa67c9b9e	change GetPixPairHash64() return type to uint32_t Change-Id: Ibb61c1631d7a4bcda5417b5a85864d5e2c3f3858	2015-04-16 00:55:25 -07:00
Pascal Massimino	7fe357b8c0	split 64-mult hashing into two 32-bit multiplies Speed-wise equivalent on x86 and ARM (maybe a tad faster, hard to tell). Note that the two 32-bit multiples are not strictly equivalent to the 64-bit one, since we're missing one carry propagation. In practice, no observable difference was seen because of this slightly different hashing result. Change-Id: I8f2381175eae1cb20dabf149e6b27e1768fba6ab	2015-04-15 17:45:19 +02:00
Vikas Arora	4d6d7285b0	Simplify backward refs calculation for low-effort. Simplify and speedup backward references for low-effort settings by evaluating LZ77 references only. This change speeds up compression by 10-25% at lower (q <= 25) quality range with a slight drop (0.2%) in the compression density. Change-Id: Ibd6f03b1a062d8ab9191786c2a425e9132e4779f	2015-01-27 09:36:14 -08:00
Pascal Massimino	0d5b334ee8	BackwardReferencesHashChainFollowChosenPath: remove unused variable Change-Id: I8dc4622dbacca03a7876f8856a0db5b9b9ec2fbd	2015-01-22 23:22:58 -08:00
Pascal Massimino	cb4a18a7ba	rename HashChainInit into HashChainReset this avoids the confusion with "VP8LHashChainInit" Change-Id: Ia1686828c138729e5bda3cc5c8246d99c80915ef	2015-01-20 00:38:07 -08:00
Pascal Massimino	f079e487ae	use uint16_t for chosen_path[] len is MAX_LENGTH (4096) at max. This reduces memory for path by a half. Change-Id: I399fda4093d93b1e9d956397b7b210956c5b948f	2015-01-20 00:34:09 -08:00
Vikas Arora	b9e356b998	Disable costly TraceBackwards for method=0. Disable costly TraceBackwards heuristic for computing the backward references for low_effort (method=0) compression. The TraceBackwards heuristic is already disabled for lower (q < 25) quality range. Following is the compression data for 1000 image corpus for q >= 25. This speeds up compression (q >= 25) by a factor of 2.5-3X with slight loss of compression density (0.7% for lower quality range and 1.2% for higher qualities). Change-Id: I256c9e2137c7de4083f423ea32ee12d3b0f46253	2015-01-15 09:01:40 -08:00
Vikas Arora	ea08466d34	Tune BackwardReferencesLz77 for low_effort (m=0). - Lower the threshold parameters for HashChainFindCopy. For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at lower quality range (0.25% drop in compression density) and about 10% for higher quality range without any drop in the compression density. Following is the compression stats (before/after) for method = 0: Before After bpp/MPs bpp/MPs q=0 2.8615/18.000 2.8651/18.631 q=5 2.8615/18.216 2.8650/20.517 q=10 2.8572/18.070 2.8650/21.992 q=15 2.8519/18.371 2.8584/21.747 q=20 2.8454/18.975 2.8515/20.448 q=25 2.8230/8.531 2.8253/9.585 // Compression density remains same for q-range [30-100] q=30 2.7310/7.706 2.7310/8.028 q=35 2.7253/6.855 2.7253/7.184 q=40 2.7231/6.364 2.7231/6.604 q=45 2.7216/5.844 2.7216/6.223 q=50 2.7196/5.210 2.7196/5.731 q=55 2.7208/4.766 2.7208/4.970 q=60 2.7195/4.495 2.7195/4.602 q=65 2.7185/4.024 2.7185/4.236 q=70 2.7174/3.699 2.7174/3.861 q=75 2.7164/3.449 2.7164/3.605 q=80 2.7161/3.222 2.7161/3.038 q=85 2.7153/2.919 2.7153/2.946 q=90 2.7145/2.766 2.7145/2.771 q=95 2.7124/2.548 2.7124/2.575 q=100 2.6873/2.253 2.6873/2.335 Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1	2015-01-08 00:30:21 -08:00
Vikas Arora	413dfc0c4b	Move static method definition before its usage. Change-Id: Id766c2bea92e7ebf0de65046f73429b74b4fdda4	2014-11-13 13:18:30 -08:00
Vikas Arora	0f23566558	Update BackwardRefsWithLocalCache. Update BackwardRefsWithLocalCache to do in-place update of backward references w.r.t local color cache index. No impact on the compression density or compression speed. Change-Id: Ie066251464c3928c044e037b43df3af28b48ca30	2014-11-13 11:54:26 -08:00
Vikas Arora	fdaac8e0ca	Optimize VP8LGetBackwardReferences LZ77 references. Use the refs_lz77 computed (with cache_bits=0) in the method 'CalculateBestCacheSize' to regenerate the LZ77 references corresponding to the optimum cache_bits and avoid calling costly 'BackwardReferencesLz77' one extra time. This change leaves the compression density unchanged and speeds up compression by 10-15%. Change-Id: I5a92e11788d3c3f656aa7e1fba54fb5d96ee0027	2014-11-12 14:50:04 -08:00
Vikas Arora	95a9bd85c4	Updated VP8LGetBackwardReferences and color cache. - The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'. - The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache bits to use (passing 0 implies disabling the local color cache). - The local color cache is disabled for lower (<= 25) quality levels (as before). - Enabled local color cache for palette images as well. This saves additional 0.017% bytes with a slight (2-3%) improvement in the compression speed. - Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as this option is not an option now (after we freeze the lossless bit-stream). Change-Id: I33430401e465474fa1be899f330387cd2b466280	2014-11-06 13:14:05 -08:00
James Zern	4171b6724e	backward_references.c: reindent after `c8581b0` Change-Id: Icfc0fe8e266c0f67a70b8cb095e5aaee155290b6	2014-11-04 17:40:04 +01:00
Vikas Arora	c8581b06e1	Optimize BackwardReferences for RLE encoding. Updated BackwardReferencesRle method by utilizing the local color cache. Also changed the name of method BackwardReferencesHashChain to BackwardReferencesLz77 to reflect the LZ77 coding. For the 1000 image corpus, this change saves 0.2% bytes (at default settings) and is 2-5% faster to encode. Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b	2014-11-04 08:12:07 -08:00
Vikas Arora	4167a3f5f7	Optimize backwardreferences Optimize backwardreferences (about 0.1% byte savings) with almost same compression speed (3% faster on default compression settings). 1.) Simplified iteration logic for HashChainFindCopy. - Remapped the iter_max constant. 2.) Simplified main for loop for BackwardReferencesHashChain - Removed 'if' conditions for corner cases in the main loop. - Refactored the method(AddSingleLiteral) for adding one pixel. Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64	2014-10-31 18:08:38 -07:00
Vikas Arora	77bdddf016	Speed up BackwardReferences Speed up BackwardReferencesHashChainDistanceOnly method by: 1.) Remove for loop for shortmax code path. 2.) Execute the shortmax code path after regular call to HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH). 3.) Also for shortmax, call method HashChainFindOffset (for length = 2), instead of expensive method HashChainFindCopy(). 4.) Handling first pixel (i==0) outside main loop and removing one if condition (i > 0) per pixel. 5.) Handle the last pixel outside the main 'for' loop. Overall compression speedup observed is around 5% (+/- noise). Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b	2014-10-30 10:58:24 -07:00
Vikas Arora	e912bd55be	Fix bug in VP8LCalculateEstimateForCacheSize. The method VP8LCalculateEstimateForCacheSize is not evaluating all of the possible range for cache_bits. Also added a small penalty for choosing the larger cache-size. This is done to strike a balance between additional memory/CPU cost (with larger cache-size) and byte savings from smaller WebP lossless files. This change saves about 0.07% bytes and speeds up compression by 8% (default settings). There's a small speedup at Q=50 along with byte savings as well. Compression at Quality=25 is not affected by this change. Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77	2014-10-26 20:05:48 -07:00
Vikas Arora	c2b5a0396a	Modify CostModel to allocate optimal memory. Change-Id: I7d52675d28bfc109d4e901581fc24cd36fcb79ee	2014-10-22 13:30:33 -07:00
Vikas Arora	139142e440	Optimize BackwardReferenceHashChainFollowPath. Instead of calling HashChainFindMethod, call a new (subset) method HashChainFindOffset to get the offset/distance for a given length. The encoding is tad faster at default compression Before After bpp/rate bpp/rate 442 Palette 0.2720/5.270 MP/s 0.2720/5.790 MP/s 558 non-palette 3.7607/0.797 MP/s 3.7607/0.816 MP/s Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf	2014-10-21 10:04:27 -07:00
James Zern	5f36b68d22	enc/backward_references.c: fix indent reindent after `c24f895` Change-Id: I55adcbef21ea3fdaded84b138745515596191a09	2014-10-20 11:35:20 +02:00
Vikas Arora	c24f8954be	Simplify and speedup Backward refs computation. Updated VP8LGetBackwardReferences and HashChainFindCopy method with following: - Remove the recursive CostModelBuild. - Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it again (as it was done for recursion_level=0). - Consolidated the Match-length logic inside FindMatchLength method. - Removed the logic for altering best_length/val based on the 2D distance. The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the best_val eval computation to choose a different curr_length, as best_val was set to 'curr_length << 16'. Following is the impact on the compression speed/density at default & max quality, overall this speeds up compression by 5-15% (q=100 -> 75) with a tad drop (0.02-0.03%) in compression density for the non-palette images. Before After bpp/Rate(MP/s) bpp/Rate(MP/s) q=75 (def) All 1000 2.4492/1.049 MP/s 2.4498/1.230 MP/s Palette 0.2719/5.060 MP/s 0.2719/6.110 MP/s non-Palette 3.7597/0.732 MP/s 3.7607/0.840 MP/s q=100 All 1000 2.4134/0.125 MP/s 2.4142/0.131 MP/s Palette 0.2692/2.585 MP/s 0.2692/2.885 MP/s non-Palette 3.7040/0.079 MP/s 3.7053/0.083 MP/s Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a	2014-10-17 09:21:30 -07:00
Vikas Arora	516971b136	lossless: Remove unaligned read warning (typecast uint32 pointer to uint64). The proposed change is little (0.05%) slower but avoids uint32 to uint64 pointer conversion. Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250	2014-07-02 20:55:27 -07:00
skal	77bf4410f7	make error-code reporting consistent upon malloc failure Sometimes, the error-code was not set correctly. We now return OUT_OF_MEMORY every time it's appropriate (tested using MALLOC_FAIL_AT mechanism) Took the opportunity to clean-up the code and dust the error code returned (some were erroneously set to INVALID_CONFIGURATION) Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c	2014-06-13 08:45:12 +02:00
skal	ca3d746e39	use block-based allocation for backward refs storage, and free-lists Non-photo sources produce far fewer literal references and their buffer is usually much smaller than the picture size if it compresses well. Hence, use a block-based allocation (and recycling) to avoid pre-allocating a buffer with maximal size. This can reduce memory consumption up to 50% for non-photographic content. Encode speed is also a little better (1-2%) Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11	2014-05-05 11:11:55 -07:00
skal	d3bcf72bf5	Don't allocate VP8LHashChain, but treat like automatic object the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_) is now wholly part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain in the encoder. Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677	2014-04-30 14:10:48 -07:00
Pascal Massimino	cf5eb8ad19	remove some uint64_t casts and use. We use automatic int->uint64_t promotion where applicable. (uint64_t should be kept only for overflow checking and memory alloc). Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125	2014-04-29 09:08:25 -07:00
Pascal Massimino	b3a616b356	make HistogramAdd() a pointer in dsp * merged the two HistogramAdd/AddEval() into a single call (with detection of special case when b==out) * added a SSE2 variant * harmonize the histogram type to 'uint32_t' instead of just 'int'. This has a lot of ripples on signatures. * 1-2% faster Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306	2014-04-28 10:09:34 -07:00
Vikas Arora	0b896101b4	Reduce memory footprint for encoding WebP lossless. Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for: - Building HashChain data-structure used in creating the backward references. - Creating Backward references for LZ77 or RLE coding. - Creating Huffman tree for encoding the image. For the above mentioned code-paths, allocate memory once and re-use it subsequently. Reduce the footprint of VP8LHistogram struct by changing the Struct field 'literal_' from an array of constant size to dynamically allocated buffer based on the input parameter cache_bits. Initialize BitWriter buffer corresponding to 16bpp (2WH). There are some hard-files that are compressed at 12 bpp or more. The realloc is costly and can be avoided for most of the WebP lossless images by allocating some extra memory at the encoder initialization. Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095	2014-04-26 01:14:33 -07:00
skal	af93bdd6bc	use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free there's still some malloc/free in the external example This is an encoder API change because of the introduction of WebPMemoryWriterClear() for symmetry reasons. The MemoryWriter object should probably go in examples/ instead of being in the main lib, though. mux_types.h still contains some inlined free()/malloc() that are harder to remove (we need to put them in the libwebputils lib and make sure link is ok). Left as a TODO for now. Also: WebPDecodeRGB*() functions are still returning a pointer that needs to be free()'d. We should call WebPSafeFree() on these, but it means exposing the whole mechanism. TODO(later). Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b	2014-03-27 15:50:59 -07:00
Vikas Arora	5f0cfa80ff	Do a binary search to get the optimum cache bits. This speeds up the lossless encoder by a bit (1-2%), without impacting the compression density. Change-Id: Ied6fb38fab58eef9ded078697e0463fe7c560b26	2014-03-13 10:30:32 -07:00
Vikas Arora	b33e8a05ee	Refactor code for HistogramCombine. Refactor code for HistogramCombine and optimize the code by calculating the combined entropy and avoid un-necessary Histogram merges. This speeds up lossless encoding by 1-2% and almost no impact on compression density. Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b	2014-03-03 13:50:42 -08:00
James Zern	a42ea9742a	cosmetics: backward_references.c: reindent after `a7d2ee3` `a7d2ee3` Optimize cache estimate logic. Change-Id: I81dd1eea49f603465dc5f3afae8a101e5205e963	2014-02-11 15:52:22 -08:00
Vikas Arora	a7d2ee39be	Optimize cache estimate logic. Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50). The entropy is generally lower for higher cache_bits, so start searching from higher cache_bits and settle for a local minima, instead of evaluating all values. This speeds up the lossless encoding at lower qualities by 10-15%. Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec	2014-02-11 10:59:01 -08:00
Scott Talbot	391316fee2	Don't dereference NULL, ensure HashChain fully initialized Found by clang's static analyzer, they look validly uninitialized to me. Change-Id: I650250f516cdf6081b35cdfe92288c20a3036ac8	2014-02-03 21:16:59 -08:00
James Zern	4931c3294b	cosmetics: fix some typos Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c	2013-11-26 19:21:14 -08:00
skal	ff379db317	few % speedup of lossless encoding mostly visible for method 4 and up Change-Id: I1561d871bc055ec5f7998eb193d927927d3f2add	2013-11-12 00:09:45 +01:00
Vikas Arora	e081f2f359	Pack code & extra_bits to Struct (VP8LPrefixCode). Also created variant VP8LPrefixEncodeBits that returns the code & extra_bits only. There's no impact on compression density and compression speed. Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c	2013-08-12 11:56:42 -07:00
Vikas Arora	69257f70df	Create LUT for PrefixEncode. This speeds up lossless compression by 5%. Change-Id: Ifd114b1d9850dc3aac74593809e7d48529d35e3d	2013-08-05 10:20:18 -07:00
Vikas Arora	7d60bbc6d9	Speed up HashChainFindCopy function. Speed up HashChainFindCopy by optimizing on number of calls to FindMatchLength method. This change speeds up the lossless & lossy (Alpha) encoding by 20% without loss of compression density. At method=3, lossy (Alpha) compression speed (and density) remains unchanged, as at that settings, the costly Backward Refs method is not called Change-Id: Ia1797148e9e4ee2787011837fa248afbae2242cb	2013-07-16 19:58:18 -07:00

82 Commits
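
Commit 010ca3d10d above describes replacing the word-wise comparison in FindMatchLength with a memcmp-based, alignment-agnostic one. The idea can be sketched as follows. This is an illustrative sketch only, not the libwebp code: the function name `match_length` and the two-pixels-per-step layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the libwebp implementation): returns the number
 * of leading 32-bit pixels that are equal in a and b, up to max_len. */
static int match_length(const uint32_t* a, const uint32_t* b, int max_len) {
  int len = 0;
  assert(max_len >= 0);
  /* Compare two pixels (8 bytes) per step. memcmp has no alignment
   * requirement, so the buffers need not be 64-bit aligned. */
  while (len + 2 <= max_len && memcmp(a + len, b + len, 8) == 0) {
    len += 2;
  }
  /* Finish one pixel at a time: handles the mismatching pair and an odd tail. */
  while (len < max_len && a[len] == b[len]) {
    ++len;
  }
  return len;
}
```

Casting a `uint32_t*` to `uint64_t*` and dereferencing it, by contrast, assumes 64-bit alignment that the commit message says these buffers rarely have. That is the unaligned-read concern also mentioned in commit 516971b136.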