Commit Graph

616 Commits

Author SHA1 Message Date
Vincent Rabaud
d3d163972f Optimize the heap usage in HistogramCombineGreedy.
The previous priority system used a heap which was too heavy to
maintain (what was gained from insertions / deletions was lost
due to a linear that still happened on the heap for invalidation).
The new structure is a priority queue where only the head is
ordered.

Change-Id: Id13f8694885a934fe2b2f115f8f84ada061b9016
2015-12-10 12:44:11 +01:00
Pascal Massimino
14d27a46be improve method #2 by merging DistoRefine() and
SimpleQuantize()

it's now a single function, that reconstructs the intra4x4 block during the scan
The I4_PENALTY had to be adjusted.

Overall, result is better quality-wise (esp. at q < 50), and a tad faster too.

method #0, #1 and #3+ are unchanged

Change-Id: If262aeb552397860b3dd532df8df6b1357779222
2015-12-10 08:04:04 +01:00
Pascal Massimino
7eb01ff3e8 Merge "Improved alpha cleanup for the webp encoder when prediction transform is used." 2015-12-08 11:32:37 +00:00
Pascal Massimino
fb8c9106c7 Merge "introduce WebPMemToUint32 and WebPUint32ToMem for memory access" 2015-12-08 11:32:05 +00:00
Vincent Rabaud
6c702b81ac Speed up hash chain initialization using memset.
That gains 1% on lossy compression.

Change-Id: Ib9aa210194ed2f17eaff85b499b55cc4eb99ff11
2015-12-07 11:54:50 +01:00
Lode Vandevenne
6938111357 Improved alpha cleanup for the webp encoder when prediction transform is used.
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992

Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.

Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
2015-12-04 13:50:56 +00:00
Pascal Massimino
2c08aac81a introduce WebPMemToUint32 and WebPUint32ToMem for memory access
it uses memcpy() when unaligned memory write is tricky

Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874
2015-12-04 13:43:01 +00:00
Vincent Rabaud
010ca3d10d Fix FindMatchLength with non-aligned buffers.
The 32-bit buffers are actually rarely 64-bit aligned.
The new solution uses memcmp and is alignment agnostic.
It is also slightly faster.

Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20
2015-12-04 10:19:58 +01:00
Scott Hancher
5ae220bef6 backward_references.c: Fixed compiler warning
"Implicit conversion loses integer precision: 'long' to 'int'."

Change-Id: I1aec7431f84123e5280447883eb80b84a3821d91
2015-12-02 23:51:06 -08:00
Vincent Rabaud
a141178255 Optimization in hash chain comparison for 64 bit
Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.

Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
2015-12-01 13:01:57 +01:00
Lode Vandevenne
239421c5ef lossless: make prediction in encoder work per scanline
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.

Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s

Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s

No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.

Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
2015-11-25 00:38:27 -08:00
Pascal Massimino
3770f3bbb6 Merge "cleanup the YFIX/TFIX difference by removing some code and #define" 2015-11-23 20:47:42 +00:00
Pascal Massimino
997e103871 cleanup the YFIX/TFIX difference by removing some code and #define
no speed or output difference

Change-Id: I50bfb44f357e19431457b1cf9504a5a6bcce1945
2015-11-21 23:51:58 -08:00
Lode Vandevenne
1f9be97c22 Make discarding invisible RGB values (cleanup alpha) the default.
Rename the flag to exact instead of the opposite cleanup_alpha. Add the flag to
WebPConfig. Do the cleanup in the webp encoder library rather than the cwebp
binary, this will be needed for the next stage: smarter alpha cleanup for
better compression which cannot be done as a preprocessing due to depending on
predictor choices in the encoder.

Change-Id: I2fbf57f918a35f2da6186ef0b5d85e5fd0020eef
2015-11-21 12:32:32 -08:00
Urvang Joshi
397863bd66 Refactor CopyPlane() and CopyPixels() methods: put them in utils.
Change-Id: I0e1533df557a0fa42c670e3b826fc0675c36e0a5
2015-11-13 11:39:22 -08:00
Urvang Joshi
6ecd72f845 Re-enable encoding of alpha plane with color cache for next release.
This is a revert of: https://chromium-review.googlesource.com/#/c/73607/

Change-Id: I7ec45277d73608d77d5e873290c6c185caa30c32
2015-11-13 07:15:19 +00:00
Pascal Massimino
bfd3fc02df ~2x faster SSE2 RGB24toY, BGR24toY, ARGBToY|UV
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)

Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV

Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
2015-11-06 15:02:01 -08:00
Pascal Massimino
52fdbdfe66 extract some RGB24 to Luma conversion function from enc/ to dsp/
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.

ConvertRowToY() should go into dsp/ too, at some point.

Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
2015-10-30 00:28:11 -07:00
James Zern
5bd04a087c sync versions with 0.4.4
libwebp{,decoder} - 0.4.4
libwebp libtool - 5.4.0
libwebpdecoder libtool - 1.4.0

mux/demux - 0.2.2 (unchanged)
libtool - 1.2.0 (unchanged)

(cherry picked from commit 62864042c0)

Change-Id: I7d421dc47ad4d25a17450ce1b04562c5d58c596b
2015-10-28 23:43:40 -07:00
James Zern
f717b82864 vp8l.c, cosmetics: fix indent after 95509f9
95509f9 large re-organization of the delta-palettization code

Change-Id: I9d27f15cb6072a2bd1dd593d53db5b2dd3c30133
2015-10-19 12:28:57 -07:00
Pascal Massimino
fea94b2b36 fix alignment of allocated memory in AllocateTransformBuffer
likely to avoid unaligned reads in the future

Change-Id: I434ba17c139ad6e190ebd9b909b241c6c6f1e7f8
2015-10-18 13:09:22 -07:00
Pascal Massimino
12ec204ec7 moved ALIGN_CST into util/utils.h and renamed WEBP_ALIGN_xxx
Note that ALIGN_CST is still kept different in dec/frame.c for now,
because the values is 31 there, not 15. We might re-unite these two
later.

Change-Id: Ibbee607fac4eef02f175b56f0bb0ba359fda3b87
2015-10-14 00:03:14 -07:00
Pascal Massimino
95509f9914 large re-organization of the delta-palettization code
same functionality, but better code layout.

What changed:
  * don't trash the palette_[] in EncodePalette(), so it can be re-used
  * split generation of image from bit-stream coding
  * move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette()
  * minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c
  * clarify the TransformBuffer stuff. more clean-up to come here...

This should make experimenting with delta-palettization easier and more compartimentalized.

Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610
2015-10-14 00:25:42 +02:00
James Zern
f7b8f90740 delta_palettization.*: add copyright
Change-Id: I5dc0ae0de88968d2c73b7025ce18319897219630
2015-10-03 10:05:09 -07:00
Mislav Bradac
c1e1b7104c Changed delta palette to compress better
New palette compresses more than 20% better with minimum quality loss.
Tested on set of wikipedia images with command line:
cwebp -delta_palettization

Change-Id: I82ec7d513136599cd70386f607f634502eb9095d
2015-10-03 08:48:42 +00:00
Mislav Bradac
48f66b6687 Add delta_palettization feature to WebP
Change-Id: Ibaf4e49aa67d63d0eb11848cca4fd0c60815864a
2015-10-02 14:29:54 -07:00
Pascal Massimino
5ff0079ece fix rescaler vertical interpolation
* vertical expansion now uses bilinear interpolation
  * heavily assumes that the alpha plane is decoded in full, not row-by-row
  * split the RescalerExportRow and RescalerImportRow methods into Shrink
    and Expand variants.
  * MIPS implementation of ExportRowExpand is missing.

There's room for extra speed optim and code re-org, but let's keep that for later patches.

addresses https://code.google.com/p/webp/issues/detail?id=254

Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
2015-09-18 17:32:11 -07:00
James Zern
cd82440ec7 VP8LAllocateHistogramSet: align histogram[] entries
fixes issue #262: a SIGBUS when accessing a misaligned double in
VP8LHistogram

Change-Id: Ic78cc5366d7e43d892c375b6a69dce2379db931b
2015-09-17 22:59:01 -07:00
Jyrki Alakuijala
90fcfcd905 Insert less hash chain entries from the beginnings of long copies.
This makes the chains more efficient and a larger variety of data is tested.

0.02 % compression gain at q 100, 0.05 % at default quality. 0.8 % speedup by
callgrind.

0.16 % compression gain for lossy alpha ?!

Change-Id: I888120133352799eb14f5f602c7f40ab404bd665
2015-08-18 18:44:03 -07:00
skal
c5f00621c7 incorporate bzero() into WebPRescalerInit() instead of call site
Change-Id: I9ebb83e643e24bc685a1a1cb6836cb54e34a0ec8
2015-08-14 19:37:22 -07:00
James Zern
b918724280 utils/rescaler: add WebPRescalerGetScaledDimensions
+ use it in WebPPictureRescale()

Change-Id: I491bea8cd56f0eb1ac8bf0829b9f36c77804219a
2015-08-13 20:50:38 -07:00
skal
56a2e9f5e7 WebPPictureDistortion: support ARGB format for 'pic' when computing distortion.
using a *tmp_plane buffer to split a/r/g/b planes up appeared to
be the easiest route, compared to copy-pasting the whole code and
making it x_stride aware...

Change-Id: I0898ef1df62bd3e1713b77187b31b5eeef3832fe
2015-08-11 17:28:29 -07:00
Jyrki Alakuijala
b969f888ab Reduce magic in palette reordering
Slightly faster on -m 0 -q 0, particularly for small images (50 x 75
image was 0.1 % faster on callgrind measurement).

Increases compression density by 0.005 % for the 1000 images, but small
images can improve even 0.5 % (about 4 bytes, depending on the
characteristics of the palette).

Change-Id: I94f568d396ac62a054a829abeeef3eb0af6b3f94
2015-08-10 19:06:07 -07:00
Pascal Massimino
7df93893dc fix rescaling bug (uninitialized read, see bug #254).
the x_add/x_sub increments were wrong for u/v in the upscaling case.
They shouldn't be left to the caller's discretion, but set up by
WebPRescalerInit to their exact necessary values.

-> Cleaned-up WebPRescalerInit() param list.
-> added safety asserts
-> removed the mips32/mips_r2 variant of "ImportRow" which were buggy prior

Change-Id: I347c75804d835811e7025de92a0758d7929dfc09
2015-08-05 23:00:00 -07:00
Jyrki Alakuijala
01d61fd9c6 lossless: ~20 % speedup
0.28 % byte size increase on lossless, 0.18 % increase on lossy alpha

Change-Id: I1e001a56831a8f996ac522aa646f9ae587c80d12
2015-07-20 17:13:44 -07:00
Jyrki Alakuijala
f722c8f0bd lossless: Speed up ComputeCacheEntropy by 40 %
a total impact of 1 % on encoding speed

This allows for performance neutral removal of the binary search
in cache bits selection. This will give a small improvement in
compression density.

Change-Id: If5d4d59460fa1924ce71af977320834a47c2054a
2015-07-20 17:13:44 -07:00
Jyrki Alakuijala
17eb609916 lossless: Allow copying from prev row in rle-mode.
0.21 % compression density improvement for 1000 png corpus in
lossless mode

0.50 % compression density improvement for 1000 png corpus in
lossy mode

Change-Id: I14ee8c427ae5d3e116b0ee6695fcdea3321a319d
2015-07-20 17:13:43 -07:00
Jyrki Alakuijala
52931fd548 lossless: combine the Huffman code with extra bits
gives 2 % speedup
24.9 -> 25.5 MP/s for a photo with -q 0 -m 0

Change-Id: If9ae04683a86dd7b1fced2183cf79b9349a24a9e
2015-07-07 20:24:28 -07:00
Jyrki Alakuijala
c4855ca249 lossless: Inlining add literal
this is a simple speedup of about 1-2 %

Change-Id: I0c7b01c0a69f4aeaf363ffda05a28871f1def696
2015-07-07 20:24:28 -07:00
Jyrki Alakuijala
8e9c94dedb lossless: simplify HashChainFindCopy heuristics
for small speedup
0.0003 % worse compression

Change-Id: Ic4b6b21e5279231c6321f2cec1c79f7e17e56afa
2015-07-07 20:24:27 -07:00
Jyrki Alakuijala
888429f409 lossless: 0.5 % compression density improvement
do not do length 2 matches far away

speedup for non compressible data by inserting two literals at a time
when no matches are found

Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3
2015-07-07 20:24:27 -07:00
Jyrki Alakuijala
7b23b19808 lossless: Add zeroes into the predicted histograms.
Increases compression density by 0.03 % for lossy.

Speeds up at least one of the lossy alpha images by 20 %.

Palette entropy 'kludge' seems to save 1-2 % on alpha images.

Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
2015-07-07 20:24:27 -07:00
Jyrki Alakuijala
85b44d8a69 lossless: encoding, don't compute unnecessary histo
share the computation between different modes

3-5 % speedup for lossless alpha
1 % for lossy alpha

no change in compression density

Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
2015-07-07 20:24:26 -07:00
Jyrki Alakuijala
d92453f381 lossless: Remove about 25 % of the speed degradation
introduced in:
"lossless: 0.37 % compression density improvement"

Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.

Improves compression density by 0.02 % or so.

Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
2015-07-07 20:24:26 -07:00
Jyrki Alakuijala
2cce031704 Faster alpha coding for webp
No significant size degradation (+0.001 %) for 1000 image corpus

Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"

Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
2015-07-07 20:24:25 -07:00
Jyrki Alakuijala
5e75642efd lossless: rle mode not to accept lengths smaller than 4.
Gives a compression gain of 0.22 %

Change-Id: I0f3b8dad6b4c1bfb16eab095a467f34466b9e3b7
2015-07-07 20:24:25 -07:00
Jyrki Alakuijala
84326e4ab0 lossless: Less code for the entropy selection
Tested:
	1000 png corpus gives same results

Change-Id: Ief5ea7727290743b9bd893b08af7aa7951f556cb
2015-07-07 20:24:25 -07:00
Jyrki Alakuijala
16ab951abf lossless: 0.37 % compression density improvement
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green

and choose the strategy with the smallest expected entropy

Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
2015-07-07 20:24:24 -07:00
skal
ac76801159 introduce FTransform2 to perform two transforms at a time.
FTransform goes from ~12.0% to 11.5% total CPU time.

Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
dbba67d1e7 histogram.h: cosmetics: remove unnecessary includes
Change-Id: Ia8277d3587534c2a1af05d3df57a6973a68be16d
2015-04-17 12:23:06 -07:00