Commit Graph

47 Commits

Author SHA1 Message Date
Vincent Rabaud
decf6f6b87 Speedups for empty histograms.
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.

Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
2018-10-20 13:23:50 +02:00
Vincent Rabaud
dea3e89983 Split HistogramAdd to only have the high level logic in C.
Change-Id: Ic9eaebf7128ca0215b49d2a13bde1f5b94a28061
2018-10-19 14:03:28 +02:00
James Zern
d77bf512bd add WEBP_DSP_INIT / WEBP_DSP_INIT_FUNC
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.

Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
2018-04-17 11:45:34 +00:00
James Zern
b7971d0e22 dsp: avoid defining _C functions w/NEON builds
when targeting NEON C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.

note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.

this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.

Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
2017-10-27 10:54:56 -07:00
James Zern
a439972175 WIP: list includes as descendants of the project dir
#include "(.|..)/..." -> #include "src/..."

Change-Id: I772880aa097a770722043c8a4393552ba38a89b6
2017-10-10 23:04:05 -07:00
skal
1411f02761 Lossless Enc: harmonize the function suffixes
BUG=webp:355

Change-Id: I8baf506bd2a27095b956ef22a862b071f60c0d72
2017-08-07 18:02:07 -07:00
Vincent Rabaud
e4eb458741 lossless, VP8LTransformColor_C: make sure no overflow happens with colors.
Change-Id: Iec0d07cf1188ba96391cdb1b62131fc1469dfac6
2017-05-24 11:34:40 +02:00
James Zern
668e1dd44f src/{dec,enc,utils}: give filenames a unique suffix
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible

BUG=webp:279

Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
2017-01-19 19:09:48 -08:00
Vincent Rabaud
875fafc191 Implement BundleColorMap in SSE2.
Change-Id: I44cd23647bd0a49330b6b2b3ed08050a5500e58e
2016-12-21 10:44:31 +01:00
Pascal Massimino
9cc421675b PredictorSub: implement fully-SSE2 version
and inline the C-version too.

Predictor #13 is still a hard one.

Change-Id: Iedecfb5cbf216da4e28ccfdd0810286133f42331
2016-12-13 02:19:35 -08:00
Pascal Massimino
fbba5bc2c1 optimize predictor #1 in plain-C
For some reason, gcc has hard time inlining this one...

Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]

Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
2016-12-12 17:41:36 +01:00
Vincent Rabaud
2e6cb6f34e Give more flexibility to the predictor generating macro.
Change-Id: Ia651afa8322cb5c5ae87128340d05245c0f6a900
2016-12-02 12:33:12 -08:00
Vincent Rabaud
4239a1489c Make the lossless predictors work on a batch of pixels.
Change-Id: Ieaee34f1f97c375b9e97ef7e9df60aed353dffa1
2016-11-28 17:12:10 +01:00
Vincent Rabaud
71e2f5cadf Remove memcpy in lossless decoding.
Change-Id: Iba694b306486d67764e2fc5576c98a974c9b886c
2016-11-24 17:45:24 +01:00
Vincent Rabaud
64577de8ae De-VP8L-ize GetEntropUnrefinedHelper.
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.

Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
2016-09-14 13:55:24 +02:00
Vincent Rabaud
6cc48b1728 Move some lossless logic out of dsp.
Change-Id: I4cfd60cd5497666a2e1c188ceada2e71b05f1505
2016-09-13 15:37:32 +02:00
Vincent Rabaud
c9b45863e2 Split off common lossless dsp inline functions.
Change-Id: I64f96897b11d1c21f033c7e47b21edccb5c68738
2016-09-12 17:35:08 +02:00
skal
6ab496ed22 fix some 'unsigned integer overflow' warnings in ubsan
I couldn't find a safe way of fixing VP8GetSigned() so i just
used the big-hammer.

Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7
2016-08-16 23:18:27 -07:00
James Zern
8a4ebc6ab0 Revert "fix 'unsigned integer overflow' warnings in ubsan"
This reverts commit e44f5248ff.

contains unintentional changes in quant.c

Change-Id: I1928f072566788b0c9ea80f6fbc9e571061f9b3e
2016-08-16 16:55:56 -07:00
skal
e44f5248ff fix 'unsigned integer overflow' warnings in ubsan
I couldn't find a safe way of fixing VP8GetSigned() so i just
used the big-hammer.

Change-Id: I1039bc00307d1c90c85909a458a4bc70670e48b7
2016-08-16 15:04:41 -07:00
Parag Salasakar
cb19dbc1a4 Add MSA optimized color transform functions
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed

Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
2016-07-18 13:49:24 +00:00
Vincent Rabaud
9e8e1b7b2a Inline GetResidual for speed.
Change-Id: Ib4228e87dc448866229c0795ca68dabe777ef31c
2016-06-21 16:04:53 +02:00
Marcin Kowalczyk
f2e1efbeb7 Improve near lossless compression when a prediction filter is used.
The old implementation in enc/near_lossless.c performing a separate
preprocessing step is used only when a prediction filter is not used,
otherwise a new implementation integrated into lossless_enc.c is used.

It retains the same logic for converting near lossless quality into max
number of bits dropped, and for adjusting the number of bits based on
the smoothness of the image at a given pixel. As before, borders are not
changed.

Then, instead of quantizing raw component values, the residual after
subtract green and after prediction is quantized according to the
resulting number of bits, taking care to not cross the boundary between
255 and 0 after decoding. Ties are resolved by moving closer to the
prediction instead of by bankers’ rounding.

This results in about 15% size decrease for the same quality.

Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f
2016-05-18 20:59:02 +00:00
Vincent Rabaud
8ce975ac82 SSE optimization for vector mismatch.
Change-Id: I564b822033b59d86635230f29ed6197e306a2c4f
2016-01-07 18:23:45 +01:00
James Zern
99a01f4f8b Merge "Unify some entropy functions." 2015-12-17 22:35:29 +00:00
Vincent Rabaud
ca509a3362 Unify some entropy functions.
The code and logic is unified when computing bit entropy + Huffman cost.

Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.

Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
2015-12-17 17:00:08 +01:00
Pascal Massimino
b0547ff0b4 move back common constants for lossless_enc*.c into the .h
Change-Id: I11bc979db691f6518d85e2e1c3ac7f05d69681b0
2015-12-17 15:11:56 +01:00
Vincent Rabaud
47ddd5a4cc Move some codec logic out of ./dsp .
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.

Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
2015-12-17 07:03:25 +00:00
Vincent Rabaud
2835089d6a Provide an SSE2 implementation of CombinedShannonEntropy.
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.

Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
2015-12-11 15:12:19 +01:00
Lode Vandevenne
6938111357 Improved alpha cleanup for the webp encoder when prediction transform is used.
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992

Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.

Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
2015-12-04 13:50:56 +00:00
James Zern
0837512964 Merge "Make a separate case for low_effort in CopyImageWithPrediction" 2015-12-03 08:46:31 +00:00
James Zern
aa2eb2d4a1 Merge "cosmetics: fix indent" 2015-12-03 08:44:54 +00:00
James Zern
b7551e90e1 cosmetics: fix indent
Change-Id: I67e5a0308a964bc37b2314d96f3691fc0550e9bc
2015-12-03 00:34:15 -08:00
Lode Vandevenne
5bda52d4e8 Make a separate case for low_effort in CopyImageWithPrediction
for more speed.

This gives a roughly a 1% speedup for low_effort. But actually this is a
preparation for the upcoming CL that changes RGB values of transparent pixels
based on prediction, which should not be done for low_effort because that would
slightly hurt its performance.

On 1000 PNGs, with quality 0, method 0:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s

Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d
2015-12-03 00:22:50 -08:00
Vincent Rabaud
829bd14145 Combine Huffman cost and bit entropy into one loop
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.

MIPS32 code has been disabled for now, pending a code update.

Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d
2015-11-30 13:57:25 +01:00
Lode Vandevenne
239421c5ef lossless: make prediction in encoder work per scanline
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.

Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s

Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s

No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.

Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
2015-11-25 00:38:27 -08:00
Lode Vandevenne
b8c44f1aa4 3% speed improvement for lossless webp encoder for low effort mode:
prevent updating unused histogram.

Benchmark on 1000 PNGs, 30 iterations, lossless, quality 0, method 0:
before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 34.578 MP/s
after: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.980 MP/s

Change-Id: Id62759d4d111a6ba41c85c611a15d4f6ffc9f935
2015-11-22 09:12:54 +01:00
Jyrki Alakuijala
85b44d8a69 lossless: encoding, don't compute unnecessary histo
share the computation between different modes

3-5 % speedup for lossless alpha
1 % for lossy alpha

no change in compression density

Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
2015-07-07 20:24:26 -07:00
James Zern
39216e59d9 cosmetics: fix indent after 32462a07
Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8
2015-07-01 23:49:20 -07:00
Pascal Massimino
7017001462 SSE2: speed-up some lossless-encoding functions
optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed

overall effect is sub-1% speed-up, though.

Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6
2015-06-24 20:09:13 -07:00
Pascal Massimino
32462a072c Speedup to HuffmanCostCombinedCount
~3% speedup for lossless encoding
Improves compression ratio by ~0.03%

Change-Id: Ic6d05fb0b1099b5ca56689b92b1c6515d54a5d6b
2015-06-23 16:41:03 +02:00
Pascal Massimino
f3d687e3fa SSE4.1 implementation of some lossless encoding functions
New implementations: SubtractGreenFromBlueAndRed and TransformColor

around 1-2% faster lossless encoding.

Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
2015-06-23 08:46:57 +02:00
James Zern
65ef5afc27 Merge "lossless: 0.13% compression density gain" 2015-06-03 03:02:09 +00:00
Jyrki Alakuijala
2beef2f245 lossless: 0.13% compression density gain
over a 1000 image corpus

Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s

After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s

This CL allows for some further improvements that would not be possible
otherwise.

Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
2015-06-02 17:27:36 -07:00
Pascal Massimino
3033f24c26 lossless: 0.06 % compression density improvement
Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790
2015-06-02 14:51:51 +02:00
Pascal Massimino
c64659e1b4 remove duplicate variables after the lossless{_enc}.c split
clang was giving "duplicate symbols" error messages at link time.

Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6
2015-03-25 11:10:21 +01:00
James Zern
553051f741 dsp/lossless: split enc/dec functions
adds lossless_enc*.c; reduces the size of the decode-only so: ~78K
w/gcc-4.8.2 on x86_64.

Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092
2015-03-23 22:57:50 -07:00