Commit Graph

553 Commits

Author SHA1 Message Date
Vikas Arora
77bdddf016 Speed up BackwardReferences
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.

Overall compression speedup observed is around 5% (+/- noise).

Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
2014-10-30 10:58:24 -07:00
Vikas Arora
abf04205b3 Enable entropy based merge histo for (q<100)
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).

This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.

Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
2014-10-30 03:59:36 -07:00
Vikas Arora
653ace55c3 Increase the MAX_COLOR_CACHE_BITS from 9 to 10.
The Maximum allowed limit is 11.
The Q=25 and below is not impacted as cache bits are forced to 0.
This saves 0.05% - 0.1% bytes for other quality with almost same compression
speed (+/- 2-3%, that's more of a noise).

Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
2014-10-27 14:19:04 -07:00
Vikas Arora
919220c7e6 Change the logic adjusting the Histogram bits.
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).

The compression density remains the same with this change, with little faster
compression speed.

Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
2014-10-27 09:57:17 -07:00
Vikas Arora
e912bd55be Fix bug in VP8LCalculateEstimateForCacheSize.
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.

This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.

Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
2014-10-26 20:05:48 -07:00
Pascal Massimino
5b90d8fe42 Unify the API between VP8BitWriter and VP8LBitWriter
BitReader will be next...

Change-Id: Icd9e7ab2e3890131e664c0523627d9b8c5399a74
2014-10-23 15:35:16 +02:00
Vikas Arora
c2b5a0396a Modify CostModel to allocate optimal memory.
Change-Id: I7d52675d28bfc109d4e901581fc24cd36fcb79ee
2014-10-22 13:30:33 -07:00
Vikas Arora
139142e440 Optimize BackwardReferenceHashChainFollowPath.
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.

The encoding is tad faster at default compression

                       Before              After
                     bpp/rate            bpp/rate
442 Palette     0.2720/5.270 MP/s      0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s      3.7607/0.816 MP/s

Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
2014-10-21 10:04:27 -07:00
James Zern
5f36b68d22 enc/backward_references.c: fix indent
reindent after c24f895

Change-Id: I55adcbef21ea3fdaded84b138745515596191a09
2014-10-20 11:35:20 +02:00
James Zern
e0e9960dd1 Merge "sync version numbers to 0.4.2 release" 2014-10-17 11:47:30 -07:00
James Zern
64ac51446d sync version numbers to 0.4.2 release
libwebp{,decoder} - 0.4.2
libwebp libtool - 5.2.0
libwebpdecoder libtool - 1.2.0

mux/demux - 0.2.2
libtool - 1.2.0

(cherry picked from commit eec5f5f121)
(cherry picked from commit 857578a811)

Change-Id: Ie9d10c68e28083674a8865ad8447b1a70dcea95d
2014-10-17 19:50:21 +02:00
Vikas Arora
c24f8954be Simplify and speedup Backward refs computation.
Updated VP8LGetBackwardReferences and HashChainFindCopy method with following:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
  again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
  The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the
  best_val eval computation to choose a different curr_length, as best_val was
  set to 'curr_length << 16'.

  Following is the impact on the compression speed/density at default & max
  quality, overall this speeds up compression by 5-15% (q=100 -> 75) with a tad
  drop (0.02-0.03%) in compression density for the non-palette images.

                  Before                After
                bpp/Rate(MP/s)        bpp/Rate(MP/s)
q=75 (def)
All 1000        2.4492/1.049 MP/s     2.4498/1.230 MP/s
Palette         0.2719/5.060 MP/s     0.2719/6.110 MP/s
non-Palette     3.7597/0.732 MP/s     3.7607/0.840 MP/s

q=100
All 1000        2.4134/0.125 MP/s     2.4142/0.131 MP/s
Palette         0.2692/2.585 MP/s     0.2692/2.885 MP/s
non-Palette     3.7040/0.079 MP/s     3.7053/0.083 MP/s

Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
2014-10-17 09:21:30 -07:00
Pascal Massimino
6c6736816c Improved near-lossless mode.
Compared to previous mode it gives another 10-30% improvement in compression keeping comparable PSNR on corresponding quality settings.

Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.

Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
2014-10-15 10:57:21 -07:00
James Zern
aca1b98f52 enc/vp8l.c: fix indent
reindent after ca00502

Change-Id: I8c88dbc11dc96c117531b17682b764a235ef23bb
2014-10-13 11:33:23 +02:00
Vikas Arora
ca00502788 Evaluate non-palette compression for palette image
Evaluate if for Palette images (num_colors <= 256), non-palette
compression path (Subtract green, predictor transform etc) yield an
optimal compression density.

This change reduces the WebP file (for palette images) size by 0.4% with
drop of 3-5% in compression speed.

Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
2014-10-10 11:55:45 -07:00
James Zern
c8a87bb62d AssignSegments: quiet -Warray-bounds warning
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9

Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
2014-10-10 17:18:39 +02:00
Pascal Massimino
5f81391263 Merge "Fix return code of EncodeImageInternal()" 2014-10-07 23:49:29 -07:00
Pascal Massimino
e321abe43d Fix return code of EncodeImageInternal()
It was returning 'VP8_ENC_OK' in case of memory error.

Change-Id: I184a3e29c9f1b863637cacbe389b058d75c3dbf8
2014-10-08 08:48:53 +02:00
Pascal Massimino
f82cb06afb optimize palette ordering
We compact the palette by weighted distance, favoring the green channel.

Average gain on paletted file is ~0.5%, with gain up to 6-7% on some favorable cases.
Encoding speed is unaffected.

Disabled for alpha (or any single-channel input)

Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.

Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
2014-10-08 08:42:36 +02:00
Pascal Massimino
f545feee64 don't set the alpha value for histogram index image
This leads to tiny extra compression (~few bytes per file) for free

Change-Id: Ia4d8cef3de4365e32eacefd69a57689c80042a23
2014-10-08 08:24:19 +02:00
Pascal Massimino
2d9b0a4472 add WebPDispatchAlphaToGreen() to dsp
SSE2 version is 2.1x faster

This is used to transfer the alpha plane to green channel before lossless compression.

Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
Vikas Arora
d5e498d47f Change Entropy based Histogram Combine heuristic.
Don't combine the Histograms that have trivial (single valued A, R & B)
  symbols.
Following is the compression savings data along with compression time (before
& after) per image.
                     Before             After
                     bpp, rate(MP/s)    bpp, rate(MP/s)
Q=25, method = 4     2.508, 1.807       2.499, 1.916
Q=50, method = 4     2.460, 1.488       2.456, 1.512
Q=75, method = 4     2.452, 1.078       2.450, 1.092
Q=25, method = 5     2.505, 1.398       2.496, 1.383
Q=50, method = 5     2.458, 1.170       2.453, 1.143
Q=75, method = 5     2.453, 0.886       2.450, 0.855

This change provides 0.1-0.4% compression gains and speeds up the lossless
compression for the default method=4 (the drop in compression speed is between 1-3.5% for method=5).

Change-Id: Idfd88c2092f37afacd26a97097b3053f8183953a
2014-09-30 13:41:39 -07:00
Pascal Massimino
47a2d8e1d9 fix MSVC float->int conversion warning
+ add a clarifying comment

Change-Id: I8ac1df1de2e5277f2d968dec489546e680bb5e0c
2014-09-27 00:36:01 -07:00
James Zern
35ad48b848 HistoHeapInit: correct positions allocation size
Change-Id: I1879fd48bee3aea6f0504926d7030b504dd9be07
2014-09-26 11:21:19 -07:00
Pascal Massimino
45d9635fd3 lossless: entropy clustering for high qualities.
Tested on 1000 pngs corpus with quality 90-100 it gives ~0.15% improvement
in compression density and ~7% speed up.

Change-Id: I460f56c96707edb3c1f0b51a024e5122e10458df
2014-09-26 15:26:56 +02:00
skal
c792d4129a Premultiply with alpha during U/V downsampling
This prevents the 'alpha-leak' reported in issue #220

Speed-diff is kept minimal.

Change-Id: I1976de5e6de7cfcec89a54df9233c1a6586a5846
2014-09-18 23:40:34 -07:00
Vikas Arora
b901416b90 Record the lossless size stats.
Record and show the lossless header and image data sizes in the cwebp.

Change-Id: I08f19693cb7a756b6fdce5b55d71f5367b5f02fc
2014-09-17 15:16:05 -07:00
James Zern
854509fec0 enc/histogram.c: reindent after f4059d0
fixes indent in HistogramRemap after:
f4059d0 Code cleanup for HistogramRemap.

Change-Id: I9f53a088749e9100a70331bda1662488666c5156
2014-09-03 16:58:49 -07:00
skal
344219645b Merge "~3-5% faster encoding optimizing PickBestIntra*()" 2014-09-03 15:53:32 -07:00
skal
187d379db6 add a fallback to ALPHA_NO_COMPRESSION
if ALPHA_LOSSLESS_COMPRESSION produces a too big file (very rare!),
we fall-back to no-compression automatically.

Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
2014-09-02 21:55:13 +02:00
skal
a48a2d7635 ~3-5% faster encoding optimizing PickBestIntra*()
* Add early-out check for Intra16
* replace some memcpy() by pointer swap

Change-Id: I5edc5f7fbc8e39984deb48e6c045c97c61418589
2014-09-01 14:40:25 +02:00
skal
49911d4df2 Merge "fix indentation" 2014-08-27 07:52:36 -07:00
Vikas Arora
f4059d0c7d Code cleanup for HistogramRemap.
Avoid call to HistogramAddThresh when there's only one Histogram image.
Change-Id: I43b09e8e2d218c95969567034224777dcce37ab9
2014-08-26 15:45:22 -07:00
skal
e632b0929b fix indentation
Change-Id: I2294a6c83e5f345f64bd5120b91532e00ed6c543
2014-08-25 23:52:09 -07:00
skal
73d361dd5f introduce VP8EncQuantize2Blocks to quantize two blocks at a time
No speed diff for now. We might reorder better the instructions later,
to speed things up.

Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c
2014-08-25 20:21:42 -07:00
skal
2523aa73cb SmartRGBYUV: fix odd-width problem with pixel replication
rightmost pixel was missing a copy, which could lead to invalid read.

Also added a lower dimension of 4, below which we use the regular conversion.
This is to prevent corner cases, in addition to not being overkill.

Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
2014-08-18 15:58:36 -07:00
Pascal Massimino
ee52dc4e54 fix some MSVC64 warning about float conversion
Change-Id: I27ab27fc15033d27d0505729f6275fb542c8d473
2014-08-16 00:15:29 -07:00
skal
e2a83d7109 faster RGB->YUV conversion function (~7% speedup)
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.

Also, less macros.

Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
2014-08-15 11:13:25 -07:00
skal
3fc4c539aa Add smart RGB->YUV conversion option -pre 4
New function: WebPPictureSmartARGBToYUVA()
This implement smart RGB->YUV conversion.

This is rather undocumented for now, and is triggered using '-pre 4'
preprocessing option.

This is slow-ish and use quite some memory, but should be improvable.
This is somehow a usable beta version.

Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
2014-08-15 10:55:09 -07:00
pascal massimino
bb07022b66 Merge "cosmetics" 2014-08-06 12:30:08 -07:00
James Zern
e300c9d819 cosmetics
fix some indent/whitespace, remove a few duplicate includes, extra
semi-colons

Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352
2014-08-06 12:10:59 -07:00
pascal massimino
0e519eea8e Merge "cosmetics: remove some extraneous 'extern's" 2014-08-05 23:00:04 -07:00
James Zern
f7b4c48bba cosmetics: remove some extraneous 'extern's
Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d
2014-08-05 22:14:24 -07:00
James Zern
b47fb00ac0 vp8enci.h: cosmetics: fix '*' placement
associate with the type

Change-Id: Icf94f11bf79f6ccee3150e27b228755f8f3f0f37
2014-08-05 22:14:12 -07:00
skal
b5a36cc9ad add -near_lossless [0..100] experimental option
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.

Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
2014-08-05 19:17:10 +02:00
James Zern
29a9fe222a libwebp 0.4.1
- 7/24/14: version 0.4.1
   This is a binary compatible release.
   * AArch64 (arm64) & MIPS support/optimizations
   * NEON assembly additions:
     - ~25% faster lossy decode / encode (-m 4)
     - ~10% faster lossless decode
     - ~5-10% faster lossless encode (-m 3/4)
   * dwebp/vwebp can read from stdin
   * cwebp/gif2webp can write to stdout
   * cwebp can read webp files; useful if storing sources as webp lossless
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJT1xp9AAoJEPnD1r24IytdjDEP/3ZOnrWG0OIThlGE6bqgO3oy
 Y5O7RrvzFuPdGEZ1Kl9jDXjzsYY018/+HJmOD3kf+Qt/+F/8hpGH520VuEiJdVIW
 UcvoYaYq9xrmKNqEJx910Vh8TP7wE2T62OJcqKWg2JEczfUWn8WOKjmM5c8N1kJ2
 q6EbpCdWlxcD49L/MavJ5Yfw9jSZAjKzOIxxz0C294iMTK4IcSmeVvdqhkdyh96E
 CABw3o8sJfqB6p+KXjweXcE2KOhvzAWqTRcIogDC0jV/PgOlindf6k0am2FJHvMM
 A+sf/pmD0YKI1vEaXW+Vs6cz6LzvwbIkJSwuzBA7FYHAG5yqTSkQDxTSttw/RwiW
 fUScqHjQVBUqkM5bdOsdYBSDutQKDF2+WfcK5jXFdnydkQi59HKHV2R0K5cXYqfN
 Tu7aMBqFcfGunLlzfKCJcz8SElEmUjG6oAzRZYcdM9dmnR7ypQK17A/GbaysKKOE
 HMmep7uNX25w+6AL7zExnmPPPtSz+kj1SXt9fgldkelDhg1faAgfwXb/N4E+00lA
 1+aJD3gHcR4QnDI4gnKBKHyIktQPfNKMQ6xuL0oyvsalQ/loz08wu0aACcGDFrg4
 uOVVxTqU+pEITuwGcNk228+O2EbMWzzi3+Vhi1v3Gg3jJ3TRB3QN6NohmrsIackL
 4W2V5NoX5i2VizGfLy2g
 =GWd5
 -----END PGP SIGNATURE-----

Merge tag 'v0.4.1'

libwebp 0.4.1
- 7/24/14: version 0.4.1
  This is a binary compatible release.
  * AArch64 (arm64) & MIPS support/optimizations
  * NEON assembly additions:
    - ~25% faster lossy decode / encode (-m 4)
    - ~10% faster lossless decode
    - ~5-10% faster lossless encode (-m 3/4)
  * dwebp/vwebp can read from stdin
  * cwebp/gif2webp can write to stdout
  * cwebp can read webp files; useful if storing sources as webp lossless

* tag 'v0.4.1':
  update ChangeLog
  iosbuild.sh: specify optimization flags
  update ChangeLog
  makefile.unix: add vwebp.1 to the dist target
  update ChangeLog
  gif2webp: dust up the help message
  remove -noalphadither option from README/vwebp.1
  update NEWS for the next release
  update AUTHORS
  bump version to 0.4.1
  restore mux API compatibility
  remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
  restore encode API compatibility
  restore decode API compatibility
  gif2webp: fix compile with giflib 5.1.0
  gif2webp: simplify giflib version checking

Change-Id: Icf599f29bc6c0db757bc133aaddb3dbbbc316e08
2014-07-29 18:06:58 -07:00
James Zern
85213b9bbe bump version to 0.4.1
libwebp{,decoder} - 0.4.1
libwebp libtool - 5.1.0
libwebpdecoder libtool - 1.1.0

mux/demux - 0.2.1
libtool - 1.1.0

Change-Id: If593a198f802fd68c7dbbdbe0fc2612dbc44e2df
2014-07-23 17:17:25 -07:00
James Zern
c2fc52e4ec restore encode API compatibility
protect WebPConfigLosslessPreset/WebPMemoryWriterClear w/a
WEBP_ENCODER_ABI_VERSION check

Change-Id: If4debc15fee172a3f18079bc2bd29eb8447bc14b
2014-07-22 22:19:55 -07:00
Vikas Arora
d2cc61b7dd Extend MakeARGB32() to accept Alpha channel.
Change-Id: I31b8e2d085000e2e3687a373401e4f655f11fc42
2014-07-21 14:49:38 -07:00
skal
3398d81ac3 Actuate memory stats for PRINT_MEMORY_INFO
Change-Id: If7eac591b5205990ca452ca02b084a908482850a
2014-07-21 13:16:18 -07:00
skal
6c347bbb0c move WebPPictureInit to picture.c
Change-Id: I4b8c352cfd47256d0c3827334a6942c1caf742f6
2014-07-21 14:16:19 +02:00
Pascal Massimino
1549d62067 reorder the YUVA->ARGB and ARGB->YUVA functions correctly
+ rework few loops
+ consolidate few error-checks / error-reporting
+ don't modify picture->colorspace in Import() for ARGB output

Change-Id: Iae6da9b50acc738c59b85c3ee64efbaf6af8bffc
2014-07-18 07:15:54 -07:00
Pascal Massimino
736f2a175e extract colorspace code from picture.c into picture_csp.c
had to refactor few functions here and there.

Change-Id: I86fde6fec7c2fc7eb48f0ecf327dbbd2bd40b9d4
2014-07-16 16:37:26 -07:00
Pascal Massimino
fbadb48026 split monolithic picture.c into picture_{tools,psnr,rescale}.c
Change-Id: Ia5eb5496e4337e5bac8203872c5b014cad21c4f9
2014-07-12 09:13:33 -07:00
Pascal Massimino
257adfb0be remove experimental YUV444 YUV422 and YUV400 code
(never used)

Change-Id: I12a886703592133939607df05132e9b498f916c1
2014-07-03 22:26:25 -07:00
Vikas Arora
516971b136 lossless: Remove unaligned read warning
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.

Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
2014-07-02 20:55:27 -07:00
skal
824eab1086 fix (uncompiled) typo
output was not properly descaled when compiled
without USE_GAMMA_COMPRESSION

Change-Id: I5282c9573cfcfbef4be22b6b3cdec7e16cad9c1c
2014-06-27 17:10:25 +02:00
skal
790207679d Merge "make error-code reporting consistent upon malloc failure" 2014-06-13 00:25:30 -07:00
skal
77bf4410f7 make error-code reporting consistent upon malloc failure
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)

Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)

Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
2014-06-13 08:45:12 +02:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
skal
24e3080571 Add an interface abstraction to the WebP worker thread implementation
This allows custom implementations of threading mecanism.

Patch by Leonhard Gruenschloss.

Change-Id: Id8ea5917acd2f24fa8bce79748d1747de2751614
2014-06-12 11:35:44 +02:00
James Zern
32b3137936 configure: move config.h to src/webp/config.h
this change has the side-effect of using directory names in the
include, silencing a lint warning.

Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
2014-06-10 23:42:00 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
skal
158aff9bb9 remove unused #include's
Change-Id: Icd91a4b6a0bde49145f57e3e74a997822c45792c
2014-06-03 08:01:05 +02:00
skal
6679f8996f Optimize VP8SetResidualCoeffs.
Brings down WebP lossy encoding timings by 5%

Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489
2014-06-03 06:44:04 +02:00
skal
399b916d27 lossy decoding: correct alpha-rescaling for YUVA format
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.

Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()

Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
2014-05-27 15:27:13 -07:00
Pascal Massimino
ef076026af use decoder's DSP functions for autofilter
-af is now faster (6-7%), since we're using the SSE2 variant
Output is binary the same as before.

Change-Id: If75694594c9501cd486b8f237a810ddcc145cadd
2014-05-20 14:55:05 -07:00
James Zern
34168ecbe4 Merge "remove all unused layer code" 2014-05-08 22:51:13 -07:00
Pascal Massimino
f1e771735a remove all unused layer code
Change-Id: I220590162b24c70f404fe3087f19dd3e6cac3608
2014-05-08 22:37:38 -07:00
Vikas Arora
b0757db7c6 Code cleanup for VP8LGetHistoImageSymbols.
Fix comments and few nits.
Change-Id: I8fa25ed523f12c6a7bfe125f0e4d638466ba4304
2014-05-08 14:13:47 -07:00
skal
5fe628d35d make the token page size be variable instead of fixed 8192
also changed the token-page layout a little bit to remove
a not-needed field.

This reduces the number of malloc()/free() calls substantially
with minimal increase in memory consumption (~2%).
For the tail of large sources, the number of malloc calls goes
typically from ~10000 to ~100 (e.g.: bryce_big.jpg: 22711 -> 105)

Change-Id: Ib847f41e618ed8c303d26b76da982fbc48de45b9
2014-05-05 14:26:14 -07:00
skal
ca3d746e39 use block-based allocation for backward refs storage, and free-lists
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.

This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)

Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
2014-05-05 11:11:55 -07:00
skal
d3bcf72bf5 Don't allocate VP8LHashChain, but treat like automatic object
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.

Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
2014-04-30 14:10:48 -07:00
Pascal Massimino
cf5eb8ad19 remove some uint64_t casts and use.
We use automatic int->uint64_t promotion where applicable.

(uint64_t should be kept only for overflow checking and memory alloc).

Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
2014-04-29 09:08:25 -07:00
Pascal Massimino
b3a616b356 make HistogramAdd() a pointer in dsp
* merged the two HistogramAdd/AddEval() into a single call
  (with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
  of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster

Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
2014-04-28 10:09:34 -07:00
Pascal Massimino
4825b4360d fix warning about size_t -> int conversion
+ re-order and add some const

Change-Id: I3746520b75699e56e20835d10d1dd9cd9fd6d85d
2014-04-27 00:50:07 -07:00
Vikas Arora
0b896101b4 Reduce memory footprint for encoding WebP lossless.
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.

Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.

Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.

Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
2014-04-26 01:14:33 -07:00
skal
75b12006e3 Move the HuffmanCost() function to dsp lib
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)

There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.

Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
2014-04-18 11:59:48 -07:00
skal
a9fc697cb6 Merge "WIP: extract the float-calculation of HuffmanCost from loop" 2014-04-15 11:33:11 -07:00
Djordje Pesut
4ae0533f39 MIPS: MIPS32r1: Added optimizations for ExtraCost functions.
ExtraCost and ExtraCostCombined

Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2
2014-04-15 15:37:06 +02:00
skal
b30a04cf11 WIP: extract the float-calculation of HuffmanCost from loop
new function: VP8FinalHuffmanCost()

Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9
2014-04-15 14:52:52 +02:00
Slobodan Prijic
2b1b4d5ae9 MIPS: MIPS32r1: Add optimization for GetResidualCost
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual

Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
2014-04-08 15:28:49 +02:00
skal
869eaf6c60 ~30% encoding speedup: use NEON for QuantizeBlock()
also revamped the signature to avoid having to pass the 'first' parameter

Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5
2014-04-08 03:08:22 -07:00
Vikas Arora
bc374ff39e Use histogram_bits to initalize transform_bits.
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.

Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
2014-04-02 11:46:40 -07:00
Vikas Arora
6af6b8e1b6 Tune HistogramCombineBin for large images.
Tune HistogramCombineBin for hard images that are larger than 1-2 Mega
pixel and represent photographic images.

This speeds up lossless encoding on 1000 image corpus by 10-12% and compression
penalty of 0.1-0.2%.

Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81
2014-03-28 07:09:59 -07:00
skal
af93bdd6bc use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.

The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.

Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).

Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
2014-03-27 15:50:59 -07:00
James Zern
fbed36433d Merge "dsp: reuse wht transform from dec in encoder" 2014-03-26 15:13:07 -07:00
skal
d1b33ad58b 2-5% faster trellis with clang/MacOS
(and ~2-3% on ARM)

We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.

Also made the 'Node' structure tighter.

Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
2014-03-26 22:33:01 +01:00
James Zern
df230f2723 dsp: reuse wht transform from dec in encoder
Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717
2014-03-22 13:25:08 -07:00
Pascal Massimino
59daf08362 Merge "cosmetics:" 2014-03-18 04:02:33 -07:00
Pascal Massimino
536220084c cosmetics:
- use VP8ScanUV, separate from VP8Scan[] (for luma)
 - fix indentation
 - few missing consts
 - change TrellisQuantizeBlock() signature

Change-Id: I94b437d791cbf887015772b5923feb83dd145530
2014-03-18 03:34:56 -07:00
James Zern
3e7f34a3fb AssignSegments: quiet array-bounds warning
nb (enc->segment_hdr_.num_segments_) will be in the range
[1, NUM_MB_SEGMENTS].

Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16
2014-03-14 18:47:52 -07:00
James Zern
cf821c821f UpdateHistogramCost: avoid implicit double->float
all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning

Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
2014-03-14 11:18:52 -07:00
Vikas Arora
1c58526fe1 Fix few nits
Add/remove few casts, fixed indentation.

Change-Id: Icd141694201843c04e476f09142ce4be6e502dff
2014-03-13 13:57:39 -07:00
Vikas Arora
fef22704ec Optimize and re-structure VP8LGetHistoImageSymbols
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.

This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.

Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
2014-03-13 11:48:37 -07:00
Vikas Arora
5f0cfa80ff Do a binary search to get the optimum cache bits.
This speeds up the lossless encoder by a bit (1-2%), without impacting the
compression density.

Change-Id: Ied6fb38fab58eef9ded078697e0463fe7c560b26
2014-03-13 10:30:32 -07:00
skal
65b99f1c92 add a -z option to cwebp, and WebPConfigLosslessPreset() function
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
   cwebp -z 5 in.png -o out_lossless.webp

There are 10 possible values for -z parameter:
   0 (fastest, lowest compression)
to 9 (slowest, best compression)

A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.

This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.

Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
2014-03-11 23:25:35 +01:00
skal
30176619c6 4-5% faster trellis by removing some unneeded calculations.
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)

Output is bitwise the same as before.

Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
2014-03-06 15:57:25 +01:00
James Zern
687a58ecc3 histogram.c: reindent after b33e8a0
b33e8a0 Refactor code for HistogramCombine.

Change-Id: Ia1b4b545c5f4e29cc897339df2b58f18f83c15b3
2014-03-04 00:38:14 -08:00
James Zern
42eb06fc0e Merge "few cosmetics after patch #69079" 2014-03-03 15:13:25 -08:00