libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-03 01:24:30 +02:00

Author	SHA1	Message	Date
James Zern	51f406a5d7	lossless_sse2: relocate VP8LDspInitSSE2 proto this is in line with the other dsp files and silences a build warning. Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2	2014-03-27 15:07:43 -07:00
skal	0f4f721b12	separate SSE2 lossless functions into its own file expose the predictor array as function pointers instead of each individual sub-function + merged Average2() into ClampedAddSubtractHalf directly + unified the signature as "VP8LProcessBlueAndRedFunc" no speed diff observed Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044	2014-03-27 21:43:55 +01:00
skal	514fc251df	VP8LConvertFromBGRA: use conversion function pointers Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c	2014-03-27 09:00:35 +01:00
James Zern	6d2f35273d	dsp/dec: TransformDCUV: use VP8TransformDC rather than forcing the C version; this is similar to TransformUV Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a	2014-03-26 16:43:47 -07:00
skal	defc8e1b01	Merge "fix out-of-bound read during alpha-plane decoding"	2014-03-26 15:22:42 -07:00
James Zern	fbed36433d	Merge "dsp: reuse wht transform from dec in encoder"	2014-03-26 15:13:07 -07:00
skal	d846708400	Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions"	2014-03-26 15:01:46 -07:00
skal	207d03b484	fix out-of-bound read during alpha-plane decoding With -bypass_filter switched on, the lossless-compressed data is decoded ahead of time (before being transformed and displayed). Hence, the last row was called twice. http://code.google.com/p/webp/issues/detail?id=193 Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10	2014-03-26 22:45:03 +01:00
skal	d1b33ad58b	2-5% faster trellis with clang/MacOS (and ~2-3% on ARM) We don't need to store cost/score for each node, but only for the current and previous one -> simplify code and save some memory. Also made the 'Node' structure tighter. Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6	2014-03-26 22:33:01 +01:00
skal	369c26dd3f	Add SSE2 version of ARGB -> BGR/RGB/... conversion functions ~4-6% faster lossless decoding Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361	2014-03-26 22:19:00 +01:00
James Zern	df230f2723	dsp: reuse wht transform from dec in encoder Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717	2014-03-22 13:25:08 -07:00
Pascal Massimino	59daf08362	Merge "cosmetics:"	2014-03-18 04:02:33 -07:00
Pascal Massimino	536220084c	cosmetics: - use VP8ScanUV, separate from VP8Scan[] (for luma) - fix indentation - few missing consts - change TrellisQuantizeBlock() signature Change-Id: I94b437d791cbf887015772b5923feb83dd145530	2014-03-18 03:34:56 -07:00
James Zern	3e7f34a3fb	AssignSegments: quiet array-bounds warning nb (enc->segment_hdr_.num_segments_) will be in the range [1, NUM_MB_SEGMENTS]. Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16	2014-03-14 18:47:52 -07:00
James Zern	3c2ebf58a4	Merge "UpdateHistogramCost: avoid implicit double->float"	2014-03-14 15:50:57 -07:00
James Zern	cf821c821f	UpdateHistogramCost: avoid implicit double->float all the functions involved return double and later these locals are used in double calculations. fixes a vs build warning Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14	2014-03-14 11:18:52 -07:00
Vikas Arora	312e638f30	Extend the search space for GetBestGreenRedToBlue Get back some of the compression gains by extending the search space for GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not helping much in yielding better compression density. Before: 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations) Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s After: 1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations) Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d	2014-03-14 09:56:00 -07:00
Vikas Arora	1c58526fe1	Fix few nits Add/remove few casts, fixed indentation. Change-Id: Icd141694201843c04e476f09142ce4be6e502dff	2014-03-13 13:57:39 -07:00
Vikas Arora	fef22704ec	Optimize and re-structure VP8LGetHistoImageSymbols Optimize and restructure the VP8LGetHistoImageSymbols method by using the bin-hash to merge the histograms more efficiently, instead of the randomized heuristic of the existing HistogramCombine method. This change speeds up lossless encoding by 40-50% (for method=4 and Q > 50) with a 0.8% penalty in compression density. For lower methods, the speed-up is 25-30%, with a 0.4% penalty in compression density. Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce	2014-03-13 11:48:37 -07:00
Vikas Arora	068b14ac57	Optimize lossless decoding. Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove one 'if condition' inside the main/critical loop. Also split TransformColor & TransformColorInverse into separate functions to avoid one 'if condition' inside this critical method. This change speeds up lossless decoding by about 5% for the Lenna image and by 3-4% for the 1000-image corpus. Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d	2014-03-13 11:27:12 -07:00
Vikas Arora	5f0cfa80ff	Do a binary search to get the optimum cache bits. This speeds up the lossless encoder by a bit (1-2%), without impacting the compression density. Change-Id: Ied6fb38fab58eef9ded078697e0463fe7c560b26	2014-03-13 10:30:32 -07:00
skal	65b99f1c92	add a -z option to cwebp, and WebPConfigLosslessPreset() function These are presets for lossless coding, similar to zlib. The shortcut for lossless coding is now, e.g.: cwebp -z 5 in.png -o out_lossless.webp There are 10 possible values for the -z parameter: 0 (fastest, lowest compression) to 9 (slowest, best compression). A reasonable tradeoff is, e.g., -z 6; -z 9 can be quite slow, so use with care. This -z option is just a shortcut for some pre-defined '-lossless -m xx -q yy' combinations. Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04	2014-03-11 23:25:35 +01:00
skal	30176619c6	4-5% faster trellis by removing some unneeded calculations. (We didn't need the exact value of the max_error properly. We can work with relative values instead of absolute) Output is bitwise the same as before. Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649	2014-03-06 15:57:25 +01:00
James Zern	687a58ecc3	histogram.c: reindent after b33e8a0 b33e8a0 Refactor code for HistogramCombine. Change-Id: Ia1b4b545c5f4e29cc897339df2b58f18f83c15b3	2014-03-04 00:38:14 -08:00
skal	06d456f685	Merge "~3-4% faster lossless encoding"	2014-03-04 00:17:52 -08:00
skal	c60de26099	~3-4% faster lossless encoding by re-arranging some code from SkipRepeatedPixel() Change-Id: I6c1fd7cd9af22cd9be4234217ff67d7b94f44137	2014-03-04 08:12:59 +01:00
James Zern	42eb06fc0e	Merge "few cosmetics after patch #69079 "	2014-03-03 15:13:25 -08:00
skal	82af82644b	few cosmetics after patch #69079 Change-Id: Ifa758420421b5a05825a593f6b43504887603ee7	2014-03-03 23:53:08 +01:00
Vikas Arora	b33e8a05ee	Refactor code for HistogramCombine. Refactor code for HistogramCombine and optimize it by calculating the combined entropy and avoiding unnecessary Histogram merges. This speeds up lossless encoding by 1-2% with almost no impact on compression density. Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b	2014-03-03 13:50:42 -08:00
skal	ca1bfff53f	Merge "5-10% encoding speedup with faster trellis (-m 6)"	2014-03-03 13:09:17 -08:00
skal	5aeeb087d6	5-10% encoding speedup with faster trellis (-m 6) mostly by: - storing a single rd-score instead of cost / distortion separately - evaluating terminal cost only once - getting some invariants out of the loops - more consts behind fewer variables Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c	2014-03-03 22:07:06 +01:00
James Zern	82ae1bf299	cosmetics: normalize VP8GetCPUInfo checks - use '!= NULL' + dec_neon/STORE_WHT: align '\'s Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53	2014-02-28 18:47:41 -08:00
James Zern	e3dd9243cb	Merge "Refactor GetBestPredictorForTile for future tuning."	2014-02-28 18:39:27 -08:00
Vikas Arora	206cc1be5a	Refactor GetBestPredictorForTile for future tuning. This change doesn't impact compression gain or compression speed. Change-Id: Ia87d8a46c6f1ce0f8974178d75a6b0ba0a6e3696	2014-02-28 11:30:23 -08:00
James Zern	3cb8406262	Merge "speed-up trellis quant (~5-10% overall speed-up)"	2014-02-27 14:34:01 -08:00
Pascal Massimino	b66f2227c1	Merge "lossy encoding: ~3% speed-up"	2014-02-27 11:42:16 -08:00
Pascal Massimino	4287d0d49b	speed-up trellis quant (~5-10% overall speed-up) store costs[] in node instead of context Change-Id: I6aeb0fd94af9e48580106c41408900fe3467cc54 also: various cosmetics	2014-02-27 00:06:00 -08:00
Pascal Massimino	390c8b316d	lossy encoding: ~3% speed-up incorporate non-last cost in per-level cost table also: correct trellis-quant cost evaluation at nodes (output a little bit different now). Method 6 is ~4% faster. Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8	2014-02-26 05:52:24 -08:00
James Zern	9a463c4a51	Merge "dec_neon: convert TransformWHT to intrinsics"	2014-02-25 14:36:44 -08:00
pascal massimino	e8605e9625	Merge "dec_neon: add ConvertU8ToS16"	2014-02-25 08:56:17 -08:00
Djordje Pesut	4aa3e4122b	MIPS: MIPS32r1: rescaler bugfix Change-Id: I6de6e2488bd5bd58c1f705739e4467feb211f8b4	2014-02-25 14:36:48 +01:00
Vikas Arora	c16cd99aba	Speed up lossless encoder. Speedup lossless encoder by 20-25% by optimizing: - GetBestColorTransformForTile: Use techniques like binary search and local minima search to reduce the search space. - VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for log(1 + x) and increase the threshold for calling the approximate version of log_2 (compared to costly call to log()). Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d	2014-02-21 22:13:50 -08:00
James Zern	9d6b5ff1e6	dec_neon: convert TransformWHT to intrinsics Change-Id: I34dc1d75ddebab131cfed031764117e3f7b75c6b	2014-02-21 11:23:46 -08:00
James Zern	2ff0aae2fe	dec_neon: add ConvertU8ToS16 Change-Id: Ifc4fb8e7f862e72154d2f2739811b1022d2b9416	2014-02-20 15:35:33 -08:00
skal	77a8f91981	fix compilation with USE_YUVj flag (not that we'll ever need it, but...) Change-Id: I9af993c62372097846c5ca6bae8362b59c3502dc	2014-02-20 13:23:18 +01:00
James Zern	4acbec1bef	Merge changes I3b240ffb,Ia9370283,Ia2d28728 * changes: dec_neon: TransformAC3: work on packed vectors dec_neon: add SaturateAndStore4x4 dec_neon.c: convert TransformDC to intrinsics	2014-02-19 14:47:33 -08:00
James Zern	2719bb7e98	dec_neon: TransformAC3: work on packed vectors pack 2 rows in 1 vector similar to TransformDC Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d	2014-02-18 19:47:20 -08:00
James Zern	b7b60ca16c	dec_neon: add SaturateAndStore4x4 converts 2 s16 vectors to 2 u8 and store to uint8_t destination; TransformAC3 can reuse this after a rework Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0	2014-02-18 19:42:35 -08:00
Pascal Massimino	b7685d73fe	Rescale: let ImportRow / ExportRow be pointer-to-function Separate the C version from the MIPS32 version and have run-time initialization during RescalerInit() Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b	2014-02-17 00:58:17 -08:00
James Zern	e02f16ef45	dec_neon.c: convert TransformDC to intrinsics no noticeable difference in performance Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025	2014-02-14 12:11:58 -08:00
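
Several commits above (0f4f721b12, 514fc251df, b7685d73fe) describe exposing routines as function pointers that are set during run-time initialization, with the C version kept separate from a platform-specific one. The following is a minimal sketch of that general pattern only. All names here (ImportRowFunc, ImportRowC, ImportRowUnrolled, DspInit) are hypothetical illustrations and are not libwebp's actual API, and the "fast" variant is a plain unrolled loop standing in for an SSE2 or MIPS32 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical signature shared by every implementation of the routine.
typedef void (*ImportRowFunc)(const uint8_t* src, int32_t* dst, size_t len);

// Portable C version: widens each 8-bit sample to 32 bits.
static void ImportRowC(const uint8_t* src, int32_t* dst, size_t len) {
  size_t i;
  assert(src != NULL && dst != NULL);
  for (i = 0; i < len; ++i) dst[i] = src[i];
}

// Stand-in for a platform-specific version. It must produce exactly the
// same output as the C version; here it is simply unrolled by two.
static void ImportRowUnrolled(const uint8_t* src, int32_t* dst, size_t len) {
  size_t i = 0;
  assert(src != NULL && dst != NULL);
  for (; i + 2 <= len; i += 2) {
    dst[i + 0] = src[i + 0];
    dst[i + 1] = src[i + 1];
  }
  if (i < len) dst[i] = src[i];  // odd-length tail
}

// Callers go through this pointer instead of naming an implementation.
ImportRowFunc ImportRow = NULL;

// Run-time initialization: pick the implementation once, up front.
// 'use_fast_path' stands in for a CPU-capability check (an assumption).
void DspInit(int use_fast_path) {
  ImportRow = use_fast_path ? ImportRowUnrolled : ImportRowC;
}
```

A caller would invoke DspInit() once and then call ImportRow(src, dst, len) without knowing which version was selected. The design choice this illustrates is that the selection cost is paid once at initialization rather than on every call.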

1318 Commits