Commit Graph

2216 Commits

Author SHA1 Message Date
Pascal Massimino
0209d7e6d6 Merge "speed-up MapToPalette() with binary search" 2016-05-23 14:09:34 +00:00
Pascal Massimino
fdd29a3d3f speed-up MapToPalette() with binary search
goes from ~2% CPU to ~0.7% for large images
Change-Id: Ibd8a0fde9ba553f93157a49dcb7da0426209e404
2016-05-23 15:39:54 +02:00
James Zern
cf4a651bb8 Revert "Refactor GetColorPalette method."
This reverts commit 169004b1d5.

this changes the ABI, so should bump versions and add a note to NEWS
when we're ready to expose it

Change-Id: Ic5bbd0aee2b6fd0f9d438a9effedf22fe0cec4bf
2016-05-20 17:05:33 -07:00
James Zern
0a27aca3f8 Merge changes Idfa8ce83,I19adc9c4
* changes:
  WebPAnimEncoder: Restore original canvas between multiple encodes.
  Refactor GetColorPalette method.
2016-05-21 00:03:15 +00:00
Urvang Joshi
f25c4406e6 WebPAnimEncoder: Restore original canvas between multiple encodes.
tl;dr
We do the following:
- Start with transparent value of 0x00000000 instead of 0x00ffffff, so that
WebPCleanupTransparentAreaLossless() is a no-op.
- Restore the original canvas after lossy encoding, to discard changes made by
WebPCleanupTransparentArea() before the next encode.

Explanation of why:
In the mixed mode, anim_encoder tries to encode using both lossless and lossy
compression. In fact, when "min_size" option is enabled, there are at most 4
encodes that can happen in this order:
- lossless with dispose none
- lossy with dispose none
- lossless with dispose background
- lossy with dispose background

But both lossless and lossy both potentially modify the canvas during encode
(for better compression):
- Lossless: WebPCleanupTransparentAreaLossless() turns all transparent pixels
to 0x00000000
- Lossy: WebPCleanupTransparentArea() flattens some transparent pixels

So, the result is that, sometimes we feed the modified canvas to the encoder
instead of the original one, which isn't the right thing to do.

This also applies to just lossless or just lossy encoding, as multiple encodes
happen (with the two dispose methods) in those cases too.

Change-Id: Idfa8ce831a1627014785ba7d0316c42f72594455
2016-05-20 15:07:50 -07:00
Urvang Joshi
169004b1d5 Refactor GetColorPalette method.
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].

Change-Id: I19adc9c48f2a4e2ec9d995e78add6f25172774c2
2016-05-20 15:06:38 -07:00
James Zern
576362abd7 VP8LDoFillBitWindow: support big-endian in fast path
Change-Id: I577944fe0b85505766050dba5ab5aec48b30f541
2016-05-20 00:31:05 -07:00
James Zern
ac49e4e4dc bit_reader.c: s/VP8L_USE_UNALIGNED_LOAD/VP8L_USE_FAST_LOAD/
no longer gate this on WEBP_FORCE_ALIGNED as WebPMemToUint32() provides
this service. replace that check with WORDS_BIGENDIAN as the block is
currently little-endian specific.

Change-Id: Ie04ec0179022d20dab53da878008ae049837782f
2016-05-19 23:14:02 -07:00
James Zern
d39ceb58ac VP8LDoFillBitWindow: remove stale TODO
the read size may be fixed, but the offsets into buf_ are not. forcing
an aligned read then shifting or using a temporary would be costly. this
is less important now that WebPMemToUint32() is being used.

Change-Id: I357fec8f750969cce91987abebed2f95e27a835f
2016-05-19 23:07:10 -07:00
Pascal Massimino
2ec2de1450 Merge "Speed-up BackwardReferencesHashChainDistanceOnly." 2016-05-19 05:17:03 +00:00
Vincent Rabaud
3e023c17cd Speed-up BackwardReferencesHashChainDistanceOnly.
Instead of comparing all the following pixels over len (which can
frequently reach the maximum MAX_LENGTH=4096 for some images),
intervals are stored and compared.

Change-Id: I0dafef6cc988dde3c1c03ae07305ac48901d60ee
2016-05-19 04:51:13 +00:00
Marcin Kowalczyk
f2e1efbeb7 Improve near lossless compression when a prediction filter is used.
The old implementation in enc/near_lossless.c performing a separate
preprocessing step is used only when a prediction filter is not used,
otherwise a new implementation integrated into lossless_enc.c is used.

It retains the same logic for converting near lossless quality into max
number of bits dropped, and for adjusting the number of bits based on
the smoothness of the image at a given pixel. As before, borders are not
changed.

Then, instead of quantizing raw component values, the residual after
subtract green and after prediction is quantized according to the
resulting number of bits, taking care to not cross the boundary between
255 and 0 after decoding. Ties are resolved by moving closer to the
prediction instead of by bankers’ rounding.

This results in about 15% size decrease for the same quality.

Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f
2016-05-18 20:59:02 +00:00
James Zern
e15afbce5d dsp.h: fix ubsan macro name
copy and paste error in the previous commit, change
no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF ->
WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW

Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32
2016-05-13 11:09:57 -07:00
James Zern
e53c9ccb24 dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
for suppressing expected failures with -fsanitize=integer

Change-Id: I954cba45f0c96478b770ed7a6ac7491359cae075
2016-05-12 23:51:23 -07:00
James Zern
af81fdb772 utils.h: quiet -fsanitize=undefined warnings
add WEBP_UBSAN_IGNORE_UNDEF to WebPMemToUint32() / WebPUint32ToMem()
when WEBP_FORCE_ALIGNED is unset

Change-Id: I726b2e708ce29681584eb10c8874d5cf1e798756
2016-05-11 23:47:52 -07:00
James Zern
ea0be354a0 dsp.h: remove utils.h include
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.

Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
2016-05-11 23:17:21 -07:00
James Zern
cd276aecd0 utils/*.c: ../utils/utils.h -> ./utils.h
Change-Id: I0154e788c864f77d15d6c287df59c0a02e6db5e9
2016-05-11 23:08:32 -07:00
James Zern
c892713104 utils/Makefile.am: add some missing headers
Change-Id: Id5b81337b92819038bc7006de8c30bfc8eb9def5
2016-05-11 23:03:52 -07:00
James Zern
ea24e026aa Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF" 2016-05-11 06:21:45 +00:00
James Zern
369e264e2e dsp.h: add WEBP_UBSAN_IGNORE_UNDEF
only defined when WEBP_FORCE_ALIGNED isn't. use it to quiet alignment
warnings VP8LoadNewBytes().

Change-Id: I710a74bb9375285974e97022540551a3f4eda414
2016-05-10 22:45:13 -07:00
James Zern
0d020a7892 Merge "add runtime NEON detection" 2016-05-11 05:42:13 +00:00
James Zern
5ee2136a71 Merge "add VP8LAddPixels() to lossless.h" 2016-05-10 22:39:29 +00:00
Pascal Massimino
47435a6162 add VP8LAddPixels() to lossless.h
Change-Id: I67f9118f875affa32c47adfedf9df28b0ac9957b
2016-05-10 20:30:30 +00:00
Pascal Massimino
8fa6ac68f0 remove two ubsan warnings
(regarding uint overflow)

Change-Id: I1a76e4b1268370b6b7d6a1aa93b99e57f55fd02e
2016-05-10 18:40:18 +00:00
James Zern
74fb56fb5d add runtime NEON detection
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd

the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.

the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.

Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
2016-05-06 15:32:48 -07:00
Jovan Zelincevic
4154a8395d MIPS update to new Unfilter API
Change-Id: I2b5960812954dfcabc84663382b9e032fd1eeb43
2016-05-05 15:50:34 +02:00
James Zern
6235147ed5 cherry-pick decoder fix for 64-bit android devices
This fixes decoders built against clang-3.8 (r11c). Without this change
bad conditional code would be generated causing all calls to
WebPParseHeaders() to return 4 (UNSUPPORTED_FEATURE).

Original fix:
https://android-review.googlesource.com/#/c/196123

Change-Id: Id4b4d84048d347cea110b6cf297ef9ef4fbed323
2016-05-02 22:27:42 -07:00
Marcin Kowalczyk
5f95589fae Fix WEBP_ALIGN in case the argument is a pointer to a type larger than a byte.
Change-Id: Id74a70aa02be813cdf3ab4407ce207bcd647a6a4
2016-05-02 12:40:37 -07:00
Pascal Massimino
2309fd5c1d replace num_parts_ by num_parts_minus_one_ (unsigned)
This avoids spurious warnings and extra calculations.

Change-Id: Ibbe1b893cf89ee6b31a04d7bcbf92b4b357c8628
2016-05-02 12:19:01 -07:00
James Zern
9629f4bcda SimplifySegments: quiet -Warray-bounds warning
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
this is similar to the changes made in:
c8a87bb AssignSegments: quiet -Warray-bounds warning
3e7f34a AssignSegments: quiet array-bounds warning

Change-Id: Iec7d470be424390c66f769a19576021d0cd9a2fd
2016-05-02 12:17:49 -07:00
Pascal Massimino
de47492e91 Merge "update the Unfilter API in dsp to process one row independently" 2016-04-22 17:33:41 +00:00
Pascal Massimino
2102ccd091 update the Unfilter API in dsp to process one row independently
This will allow to work in-place on cropped area later.

Also sped up the inverse gradient filtering in SSE2 (~4%)

Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
2016-04-21 08:10:45 +00:00
Urvang Joshi
e3912d560b WebPAnimEncoder: Restore canvas before evaluating blending possibility.
Without this, the canvas may have been modified in-between two
GenerateCandidate() calls.

Change-Id: I0364a269caabdf282c30a8fa3c61896f0247342e
2016-04-19 16:13:45 -07:00
Urvang Joshi
6e12e1e3d2 WebPAnimEncoder: Fix for single-frame optimization.
Change-Id: I2da8dc34ab9589b58cde0ecb05d71357d77f6b25
2016-04-15 12:10:07 -07:00
James Zern
602f344a36 Merge changes I1d03acac,Ifcb64219
* changes:
  anim_diff: Add an experimental option for max inter-frame diff.
  WebPAnimEncoder: FlattenSimilarPixels(): look for similar
2016-04-07 19:13:56 +00:00
Pascal Massimino
95ecccf6dc only apply color-mapping for alpha on the cropped area
This is only possible if the filtering is not VERTICAL or GRADIENT.
Otherwise, we need the spatial predictors and hence need the un-visible
part above crop_top row.

COLOR_INDEX transform is the only transform that is not predicted
from previous row. Applying the same for other transform (spatial
predict, ...) is going to be more involve and use an extra temporary row.

+ remove ApplyInverseTransformsAlpha()
(work is done directly within ExtractPalettedAlphaRows())

+ change back to using filter_ instead of unfilter_func_

Change-Id: I09e57efae4a4af00bde35f21ca6e3d73b35d7d43
2016-04-07 10:21:02 +02:00
Pascal Massimino
aa809cfeb3 only allocate alpha_plane_ up to crop_bottom row
-> reduces memory consumption in case of cropping

work for bug #288

Change-Id: Icab6eb8e67e3c5c184a9363fffcd9f6acd0b311c
2016-04-06 14:07:16 +02:00
Urvang Joshi
31f2b8d8e1 WebPAnimEncoder: FlattenSimilarPixels(): look for similar
not exactly same.

Based on lossy WebP quality setting, ignore minor differences when
flattening
similar blocks.

For 6k set, at default quality with '-min_size' option, improves
compression by 0.3%

Change-Id: Ifcb64219f941e869eb2643e231220b278aad4cd4
2016-04-06 00:17:19 -07:00
Pascal Massimino
774dfbdc34 perform alpha filtering within the decoding loop
Change-Id: I221ae20ba879cd4f7b48d012db1bc5443d968e17
2016-04-06 00:13:14 -07:00
Pascal Massimino
a4cae68de0 lossless decoding: only process decoded row up to last_row
there's some subtle changes:
  - DecodeAlphaData() may be called with pos==end because we don't want
    to decode more data (there's none left), but because we want to apply
    process_func() to all the unprocessed pixels already decoded
  - last_row is exclusive and should be understood as 'up to last_row'. Can be misleading.
  - VP8LDecodeAlphaImageStream() was testing dec->last_pixel_ for completion,
    which was wrong because last_pixel_ is the last *decoded* pixel, not the
    last *processed* one. -> test now uses last_row_, as expected

Change-Id: I1fb04ba25cd7a4775db9e3deee3e2ae80f9c0a75
2016-04-05 23:26:50 -07:00
Pascal Massimino
238cdcdbe1 Only call WebPDequantizeLevels() on cropped area
this might change some crc slightly, since WebPDequantizeLevels()
performs an analysis pass, counting levels, which impacts the smoothing.
Now, the cropping area is not the same, so minor diffs are expected here
and there.

Change-Id: I3cce1e40c6f11c25b7c841044d637685c5740352
2016-04-05 13:12:47 +00:00
Pascal Massimino
cf6c713a0b alpha: preparatory cleanup
* make ALPHNew/Delete static
* properly init ALPHDec::io_
* introduce AllocateAlphaPlane() and WebPDeallocateAlphaMemory()
* reorganize VP8DecompressAlphaRows()

but we're still allocate the full alpha-plane. Optim will come
in another patch since it's tricky

Change-Id: Ib6f190a40abb7926a71535b0ed67c39d0974e06a
2016-04-04 16:30:29 +02:00
Vincent Rabaud
b95ac0a221 Merge "VP8GetHeaders(): initialize VP8Io with sane value for crop/scale dimensions" 2016-04-04 13:50:05 +00:00
Pascal Massimino
8923139414 VP8GetHeaders(): initialize VP8Io with sane value for crop/scale dimensions
so we can use crop_width/height elsewhere, instead of testing use_cropping

Change-Id: I41705db9f6889b036ecc167d73d1a1536dc33552
2016-04-04 09:21:55 +00:00
Pascal Massimino
5828e1995e use_8b_decode -> use_8b_decode_
Change-Id: I78e2b26c9bbe1364e5d048794528f9290606b545
2016-04-04 09:00:14 +00:00
Pascal Massimino
8dca0247d2 fix bug in alpha.c that was triggering a memory error in incremental mode
this change will be superseded by patch #335160 eventually, but until then
let's fix the problem temporarily.

Change-Id: Iafd979c2ff6801e3f1de4614870ca854a4747b04
2016-04-01 00:59:39 -07:00
Urvang Joshi
9a950c5375 WebPAnimEncoder: Disable filtering when blending is used with lossy encoding.
When FlattenSimilarBlocks() was making some blocks transparent with
averaged RGB values, filtering in lossy compression was causing
blockiness just outside the edge of these blocks.
Disabling filtering for that particular case avoid these block
artifacts.

The total encoded size of the 6k GIF set remains roughly the same (in
fact, reduces a bit).

Change-Id: Ida71cbabd59d851e16d871f53d19473312b3cc77
2016-03-31 23:07:47 -07:00
Urvang Joshi
eb423903a4 WebPAnimEncoder: choose max diff for framerect based on quality.
We pick a mapping with quality 0 mapping to max diff 32, to quality 100
mapping to max_diff 1.

For 6k GIF image set, this improves compression by:
4% at quality 0
0.05% at quality 75

Benefits the MovingThumbnailer test videos too.

Change-Id: I6838ce864d41e1e65311d26b9b8115a12390a253
2016-03-31 23:07:41 -07:00
Urvang Joshi
ff0a94beda WebPAnimEncoder lossy: ignore small pixel differences for frame rectangles.
This way we can ignore some noisy pixels and get tighter frame
rectangles.

Some results:
- Correctness:
Tested that anim_diff reports all images are identical for lossless, and
similar min_psnr value for lossy and mixed modes.

Also checked output images visually to make sure there weren't any
obvious kinks.

- Compression:
A very tiny improvement for 6000 image GIF set we have (0.03%) for lossy
and mixed mode. For some of these images, frames get dropped
automatically as they have a very small diff from previous frame.

10 images from test_video_frames_png show a clear improvement in
compression though. This CL leads to 7 out of 9 lossy WebPs getting
smaller -- for one of them, this leads to a higher quality being picked
(as that’s still < 150 KB).

Change-Id: If539b9e77e1375aa15edc8f926933593a9865f1c
2016-03-31 20:58:42 -07:00
Pascal Massimino
6d8c07d375 Merge "WebPDequantizeLevels(): use stride in CountLevels()" 2016-03-31 07:49:56 +00:00
Pascal Massimino
d96fe5e079 WebPDequantizeLevels(): use stride in CountLevels()
follow-up for patch #333712

Change-Id: I3f85e3fce9e1c9d1a862a14ba56134882807718f
2016-03-31 00:08:51 -07:00
James Zern
ec1b2407a4 WebPPictureImport*: check output pointer
fixes crash with NULL output pointer in calls to simple encode api
(WebPEncodeRGB, etc.)

Change-Id: I91e7a1c0e070ea842b0a2a4ac54e981cac8629bf
2016-03-25 18:21:13 -07:00
Pascal Massimino
c07687699b Merge "Revert "Re-enable encoding of alpha plane with color cache for next release."" 2016-03-25 07:29:07 +00:00
James Zern
41f14bcbc5 WebPPictureImport*: check src pointer
fixes crash with NULL source pointer in calls to simple encode api
(WebPEncodeRGB, etc.)

Change-Id: I706d670c80298da5176aaa5ba0eb2238dd71a8f0
2016-03-24 22:52:01 -07:00
Pascal Massimino
64eed38779 Pass stride parameter to WebPDequantizeLevels()
and also pass 'VP8Io* io' extra param to VP8DecompressAlphaRows()

This is somehow in preparation for some memory optimizations in
the 'cropping' case. For now, only the easy crop_bottom case is
optimized.

Change-Id: Ib54531ba057bf62b98422dbb6c181dda626c72c2
2016-03-18 15:36:58 +01:00
Pascal Massimino
97934e2447 Revert "Re-enable encoding of alpha plane with color cache for next release."
This avoids generating file that would trigger a decoding bug
found in 0.4.0 -> 0.4.3 libwebp versions.

This reverts commit 6ecd72f845.

Change-Id: I4667cc8f7b851ba44479e3fe2b9d844b2c56fcf4
2016-03-18 11:01:54 +01:00
Pascal Massimino
e88c4ca013 fix -m 2 mode-cost evaluation (causing partition0 overflow)
The mode's bits were not taken into account, which is ok for most of cases.
But in case of super large image, with 'easy' content, their overhead starts
mattering a lot and we were omitting to optimize for these.
Now, these mode bits have their own lambda values associated, limiting
the jerkiness. We also limit (for -m 2 only) the individual number of bits
to something that will prevent the partition 0 overflow.

removed the I4_PENALTY constant, which was a rather crude approximation.
Replaced by some q-dependent expression.

fixes issue #289

Change-Id: I956ae2d2308c339adc4706d52722f0bb61ccf18c
2016-03-11 20:34:45 +01:00
Pascal Massimino
4562e83dc2 Merge "add extra meaning to WebPDecBuffer::is_external_memory" 2016-03-09 08:24:19 +00:00
Pascal Massimino
abdb109f3b add extra meaning to WebPDecBuffer::is_external_memory
If value is '2', it means the buffer is a 'slow' one, like GPU-mapped memory.

This change is backward compatible (setting is_external_memory to 2
will be a no-op in previous libraries)

dwebp:  add flags to force a particular colorspace format
new flags is:
   -pixel_format {RGB,RGBA,BGR,BGRA,ARGB,RGBA_4444,RGB_565,
                  rgbA,bgrA,Argb,rgbA_4444,YUV,YUVA}
and also,external_memory {0,1,2}
These flags are mostly for debuggging purpose, and hence are not documented.

Change-Id: Iac88ce1e10b35163dd7af57f9660f062f5d8ed5e
2016-03-09 09:00:13 +01:00
James Zern
875aec7044 enc_neon,cosmetics: break long comment
Change-Id: I88dff0271fef1cc6dd5888572bfe0f09f467b028
2016-03-08 23:33:21 -08:00
James Zern
71e856cf84 GetMBSSIM,cosmetics: fix alignment
Change-Id: I884b3361484b48917fa4cba33cd1217ac51685f9
2016-03-08 23:26:10 -08:00
Pascal Massimino
a90edffb7e fix missing 'extern' for SSIM function in dsp/
Change-Id: Id8143120f01065dc088f4e90bd930f8ea7c3ae5a
2016-03-08 10:27:46 -08:00
Pascal Massimino
423ecaf484 move some SSIM-accumulation function for dsp/
This is in preparation for some SSE2 code.

And generally speaking, the whole SSIM code needs some
revamp: we're not averaging the SSIM value at each pixels
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.

Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
2016-03-08 07:50:09 +01:00
Pascal Massimino
f08e66245a Merge "Fix FindClosestDiscretized in near lossless:" 2016-03-04 09:34:16 +00:00
James Zern
0d40cc5ea3 enc_neon,Disto4x4: remove an unnecessary transpose
based on the sse2 change in:
9960c31 Remove an unnecessary transposition in TTransform.

~9-10.5% faster at the function-level, < 1% overall

Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7
2016-03-03 16:18:59 -08:00
Marcin Kowalczyk
e8feb20e39 Fix FindClosestDiscretized in near lossless:
- The result is now indeed closest among possible results for all inputs, which
  was not the case for bits>4, where the mapping was not even monotonic because
  GetValAndDistance was correct only if the significant part of initial fit in
  a byte at most twice.

- The set of results for a larger number of bits dropped is a subset of values
  for a smaller number of bits dropped. This implies that subsequent
  discretizations for a smaller number of bits dropped do not change already
  discretized pixels, which improves the quality (changes do not accumulate)
  and compression density (values tend to repeat more often).

- Errors are more fairly distributed between upwards and downwards thanks to
  bankers’ rounding, which avoids images getting darker or lighter in overall.

- Deltas between discretized values are more repetitive. This improves
  compression density if delta encoding is used.

Also, the implementation is much shorter now.

Change-Id: I0a98e7d5255e91a7b9c193a156cf5405d9701f16
2016-03-02 12:47:22 +01:00
James Zern
a6f23c49b2 Merge "AnimEncoder: Support progress hook and user data." 2016-02-20 04:25:57 +00:00
James Zern
a5193774b0 Merge "Near lossless feature: fix some comments." 2016-02-20 03:51:15 +00:00
Urvang Joshi
da98d31ced AnimEncoder: Support progress hook and user data.
Pass them along to internal 'pic' object, so that progress can be reported back
and user data can also be inspected.

Change-Id: Idb5d0d4a76d07283d704a86c5892e1ad7bda09fa
2016-02-19 19:50:30 -08:00
Urvang Joshi
3335713169 Near lossless feature: fix some comments.
Change-Id: I2c5fc2a3b3fe5123d66b42bf148e361b4862dfb9
2016-02-19 19:26:39 -08:00
James Zern
0beed01aa5 cosmetics: fix indent after 2f5e898
2f5e898 fix multiple allocation for transform buffer

Change-Id: Ied5c89c0040671e2eddf23c8b7a78e0d817dd18e
2016-02-19 19:22:34 -08:00
Pascal Massimino
6753f35cac Merge "FTransformWHT optimization." 2016-02-19 09:38:04 +00:00
Vincent Rabaud
6583bb1a42 Improve SSE4.1 implementation of TTransform.
SSE4.1 is slower than the SSE2 implementation and this seems to
be due to a slow _mm_loadl_epi64 implementation by gcc
(hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both
got confirmed by IACA and experiments.

Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f
2016-02-19 09:11:53 +01:00
Vincent Rabaud
7561d0c338 FTransformWHT optimization.
Data is packed sooner in the functions.

Change-Id: I018cfeca43f015ac755c7f209f9a97984cc0517b
2016-02-18 17:44:05 +01:00
Pascal Massimino
7ccdb734c2 fix indentation after patch #328220
Change-Id: Iccfcf4deaed6b383b9f80ae84b5b0575a4e94b5f
2016-02-18 15:14:41 +01:00
Pascal Massimino
6ec0d2a946 clarify the logic of the error path when decoding fails.
Change-Id: I2f86751ddafa4708dac3ffc9d6ec1f156e027a83
2016-02-18 14:58:18 +01:00
Vincent Rabaud
8aa352b256 Merge "Remove an unnecessary transposition in TTransform." 2016-02-18 08:15:10 +00:00
James Zern
db86088426 Merge "remove useless #include" 2016-02-17 23:17:12 +00:00
Vincent Rabaud
9960c31685 Remove an unnecessary transposition in TTransform.
Change-Id: Ib715c2d5ba659cb2db9c6832875ba508cc2fca3e
2016-02-17 21:41:28 +01:00
Vincent Rabaud
6e36b51188 Small speedup in FTransform.
It removes two _mm_unpacklo_epi32 and two _mm_sub_epi16.

Change-Id: Icdf86259f796ba855d1cda5e9c0e99cb396cb351
2016-02-17 21:26:36 +01:00
Pascal Massimino
2b4fe33e00 Merge "fix multiple allocation for transform buffer" 2016-02-17 07:27:23 +00:00
Pascal Massimino
2f5e8986cf fix multiple allocation for transform buffer
We were not updating the current_width_, which is usually
not a problem, unless we use Delta Palette with small number
of colors
-> Addressed this re-entrancy problem by checking we have
enough capacity for transform buffer.

The problem is not currently visible, until we restrict
the number of gradient used in delta-palette to less than 16.
Then the buffers have different current_width_ and the problem
surfaces.

Change-Id: Icd84b919905d7789014bb6668bfb6813c93fb36e
2016-02-17 06:14:39 +01:00
Vincent Rabaud
bf2b4f114f Regroup common SSE code + optimization.
The transpose refactoring will help removing a transpose in a
later CL.

The horizontal add function helps removing a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19

Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
2016-02-16 18:34:34 +01:00
Nico Weber
3ef1ce98b9 yuv_sse2: fix -Wconstant-conversion warning
'implicit conversion from 'int' to 'short' changes value from 33050 to
-32486'

original patch:

https://codereview.chromium.org/1657313003/

Make libwebp build with -Wconstant-conversion from newer clangs.

After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang
points out that _mm_set1_epi16(33050) causes an overflow in the short
argument to _mm_set1_epi16().  Since there's no version that takes an
unsigned short, add an explicit cast to tell the compiler that this is
intentional.

No behavior change.

Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b
2016-02-02 14:52:11 -08:00
James Zern
ab3c2583aa anim_encode,DefaultEncoderOptions: init verbose
default to disabled

broken since:
c13245c AnimEncoder: Add a GetError() method.

Change-Id: I51ccb85d5df338570512ec1d7430ad3229f93a9f
2016-02-01 17:00:19 -08:00
Pascal Massimino
75f4af4d54 remove useless #include
Change-Id: Id34f12ec94d8be6853fabd67609a6006ac99f152
2016-01-25 09:34:10 -08:00
Pascal Massimino
6c1d763119 avoid Yoda style for comparison
Change-Id: I8ff9f96951e5e8a619f7132455dd281cbf91aa4d
2016-01-15 23:52:29 -08:00
Vincent Rabaud
8ce975ac82 SSE optimization for vector mismatch.
Change-Id: I564b822033b59d86635230f29ed6197e306a2c4f
2016-01-07 18:23:45 +01:00
Pascal Massimino
7e7b6ccc7f faster rgb565/rgb4444/argb output
SSE2 and NEON implementation.

Change-Id: I342a1c3d84937b8497f0aaecb7ce9bdb7f50296b
2015-12-17 23:38:58 -08:00
James Zern
71100500a8 bump version to 0.5.0
libwebp{,decoder} - 0.5.0
libwebp libtool - 6.0.0
libwebpdecoder libtool - 2.0.0

mux/demux - 0.3.0
libtool - 2.0.0

Change-Id: I5346d13eb827fb5890efbb63ff3f28cea9d0c55f
2015-12-17 19:45:14 -08:00
James Zern
d48e427b1d Merge "demux: accept raw bitstreams" 2015-12-17 22:52:10 +00:00
James Zern
99a01f4f8b Merge "Unify some entropy functions." 2015-12-17 22:35:29 +00:00
James Zern
4b025f10f7 Merge "configure: disable asserts by default" 2015-12-17 22:28:37 +00:00
James Zern
92cbddf89c Merge "fix PrintBlockInfo()" 2015-12-17 21:00:57 +00:00
Vincent Rabaud
ca509a3362 Unify some entropy functions.
The code and logic is unified when computing bit entropy + Huffman cost.

Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.

Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
2015-12-17 17:00:08 +01:00
Pascal Massimino
367bf903b3 fix PrintBlockInfo()
... which has gone out of sync since the last block-cache layout change.

Change-Id: Ic441ec07b0198b508ce3fd34ab582cb60b1daabc
2015-12-17 15:47:25 +01:00
Pascal Massimino
b0547ff0b4 move back common constants for lossless_enc*.c into the .h
Change-Id: I11bc979db691f6518d85e2e1c3ac7f05d69681b0
2015-12-17 15:11:56 +01:00
Lode Vandevenne
fb4c7832f1 lossless: simpler alpha cleanup preprocessing
setting all transparent pixels to black rather than the "flatten" method.

0.3% smaller filesize on the 1000 PNGs if alpha cleanup is used (before: 18685774, after: 18622472)

Change-Id: Ib0db9e7ccde55b36e82de07855f2dbb630fe62b1
2015-12-17 15:04:50 +01:00
Vincent Rabaud
47ddd5a4cc Move some codec logic out of ./dsp .
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.

Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
2015-12-17 07:03:25 +00:00
James Zern
357f455dec yuv_sse2: fix 32-bit visual studio build
src\dsp\yuv_sse2.c : C2719: 'in': formal parameter with
  __declspec(align('16')) won't be aligned
src\dsp\yuv_sse2.c : C2719: 'out': formal parameter with
  __declspec(align('16')) won't be aligned

Change-Id: Ifd79e33b35c70748faff19cd64eba4a8ffce5a5a
2015-12-16 15:04:36 -08:00
James Zern
b9d80fa4e8 configure: disable asserts by default
--enable-asserts can be used to avoid defining NDEBUG

Change-Id: I6216668e3f79f69bd8c453f0b36cecb3b585688e
2015-12-16 13:15:53 -08:00
Pascal Massimino
7badd3da4a cosmetic fix: sizeof(type) -> sizeof(*var)
Change-Id: I1a39fccfdcb9f0a4b9b025d3c9b522e8edfe7fd6
2015-12-16 18:29:14 +01:00
Vincent Rabaud
80ce27d34e Speed up 24-bit packing / unpacking in YUV / RGB conversions.
This implementation brings:
- an SSE implementation of packing / unpacking
- bigger buffers processed at the same time
The speedup is of 4% on lossy decoding (YUV to RGB), 0.5% on
lossy encoding (RGB to YUV was already optimized).

Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f
2015-12-16 11:06:42 +01:00
Pascal Massimino
68eebcb0ff remove a TODO about rotation
(won't happen yet)

Change-Id: Ibb4ceccd1d7af0f76594e71062983dc311ba9aa2
2015-12-15 23:36:12 -08:00
Pascal Massimino
2dee2966df remove few obsolete TODO about aligned loads in SSE2
Change-Id: I3628602942ea2ce34dbcb85975d15afc1041f76c
2015-12-15 23:00:41 -08:00
Pascal Massimino
e0c0bb3480 remove TODO about unused ref_lf_delta[]
Change-Id: I54983c0dfc6927564143bad56bd2e4c4cdfefc0e
2015-12-15 22:57:53 -08:00
Pascal Massimino
9cf1cc2bd6 remove few TODO:
* 256 -> RD_DISTO_MULT
  * don't use TDisto for UV mode picking

Change-Id: I243148c716fe688b5c1b1fb9b7a6e58d0b5e6835
2015-12-15 22:52:12 -08:00
James Zern
791896455a Merge changes from topic 'demux-fragment-cleanup'
* changes:
  demux: remove GetFragment()
  demux: remove dead fragment related TODO
  demux, Frame: remove is_fragment_ field
  demux,WebPIterator: remove fragment_num/num_fragments
  demux: remove WebPDemuxSelectFragment
2015-12-16 06:45:00 +00:00
James Zern
47399f92b0 demux: remove GetFragment()
Change-Id: Ibea117b64ca91ccafde80411c10e0035dc3247f3
2015-12-15 19:26:13 -08:00
James Zern
d3cfb79ad6 demux: remove dead fragment related TODO
Change-Id: Iea6bf4742f803af46cd18f5d26843548e1b5cf00
2015-12-15 17:44:17 -08:00
James Zern
ab714b8ac4 demux, Frame: remove is_fragment_ field
this hasn't been set since parsing of the experimental chunk was
removed.
+ cleanup IsValidExtendedFormat(). is_fragmented has caused immediate
  failure since:
  4e2589f demux: restore strict fragment flag check

Change-Id: If9ecfc19556297100a6d5de1ba2cffdcbdc6c8fd
2015-12-15 17:43:23 -08:00
James Zern
b105921c7d yuv_sse2, cosmetics: fix indent
+ remove unneeded header

Change-Id: I3247378fd3315d95bb3345625d3575aa9e05c1b8
2015-12-15 17:29:04 -08:00
James Zern
466c92e829 demux,WebPIterator: remove fragment_num/num_fragments
these are remnants of an unused experiment

Change-Id: Ia08f9e6a895d5afff41a49f6e680fd76f024a5ee
2015-12-15 14:06:34 -08:00
James Zern
11714ff158 demux: remove WebPDemuxSelectFragment
this never had any affect, fragments were an abandoned experiment

Change-Id: Ifef15486a04cdb58f89f7faf56c31fd0a06e44ab
2015-12-15 13:02:32 -08:00
Pascal Massimino
c0f7cc47f2 fix for bug #280: UMR in next->bits
exit as early as possible upon error.

Change-Id: I4f7702228a146c31cab3c3d21079fa1fe6904cb2
2015-12-15 14:05:13 +01:00
James Zern
d4f9c2efd4 enc/Makefile.am: add missing headers
Change-Id: Ic29497f425909eda1a7f23e6c8e92bd4ca17d44b
2015-12-14 23:07:54 -08:00
James Zern
3f3ea2c539 demux: accept raw bitstreams
i.e., allow the WebP file header and leading chunk header to be
stripped.

Change-Id: I43818ed56a14881d9ad11aaddcd0bf5c0ca6aef3
2015-12-14 20:14:54 -08:00
Sriraman Tallam
b275e598b5 fix optimized build with -mcmodel=medium
INFO: From Compiling src/dsp/cpu.c:
src/dsp/cpu.c: In function 'x86CPUInfo':
src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm'

With PIC and mcmodel=medium, the %rbx register must be saved and
restored which causes this problem.  This was also solved in GCC-4.9 with
this patch:
https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html

Tested:
Builds fine with this change.

Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf
2015-12-11 16:49:10 -08:00
Pascal Massimino
038a060dfc Merge "add disto-based refinement for UV mode (if method = 1 or 2)" 2015-12-11 23:57:25 +00:00
Vincent Rabaud
2835089d6a Provide an SSE2 implementation of CombinedShannonEntropy.
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.

Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
2015-12-11 15:12:19 +01:00
Pascal Massimino
e6c9351918 add disto-based refinement for UV mode (if method = 1 or 2)
This doesn't slow down much and give some quality improvement.

Change-Id: I5afbe62b9c3922b3ec1bf6538c68dcdb0f25d2e4
2015-12-11 03:15:59 -08:00
James Zern
04507dc91f Merge "fix undefined behaviour during shift, using a cast" 2015-12-11 06:32:45 +00:00
Vincent Rabaud
d3d163972f Optimize the heap usage in HistogramCombineGreedy.
The previous priority system used a heap which was too heavy to
maintain (what was gained from insertions / deletions was lost
due to a linear that still happened on the heap for invalidation).
The new structure is a priority queue where only the head is
ordered.

Change-Id: Id13f8694885a934fe2b2f115f8f84ada061b9016
2015-12-10 12:44:11 +01:00
Pascal Massimino
202a710b26 fix undefined behaviour during shift, using a cast
Change-Id: Ibca261d01092cecf8b37c54e9fcc920c9527c0a9
2015-12-10 08:09:23 +01:00
Pascal Massimino
14d27a46be improve method #2 by merging DistoRefine() and
SimpleQuantize()

it's now a single function, that reconstructs the intra4x4 block during the scan
The I4_PENALTY had to be adjusted.

Overall, result is better quality-wise (esp. at q < 50), and a tad faster too.

method #0, #1 and #3+ are unchanged

Change-Id: If262aeb552397860b3dd532df8df6b1357779222
2015-12-10 08:04:04 +01:00
Pascal Massimino
cb1ce9969c Merge "10% faster table-less SSE2/NEON version of YUV->RGB conversion" 2015-12-09 10:41:24 +00:00
Pascal Massimino
ac761a3738 10% faster table-less SSE2/NEON version of YUV->RGB conversion
* Precision is slightly different
* also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc.
* removing yuv_tables_sse2.h saved ~8k of binary size
* the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away
* the NEON code is somewhat tricky

Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052
2015-12-08 20:05:56 -08:00
Pascal Massimino
7eb01ff3e8 Merge "Improved alpha cleanup for the webp encoder when prediction transform is used." 2015-12-08 11:32:37 +00:00
Pascal Massimino
fb8c9106c7 Merge "introduce WebPMemToUint32 and WebPUint32ToMem for memory access" 2015-12-08 11:32:05 +00:00
James Zern
bd91af200a Merge "bit_reader: remove aarch64 BITS TODO" 2015-12-08 07:39:11 +00:00
Vincent Rabaud
6c702b81ac Speed up hash chain initialization using memset.
That gains 1% on lossy compression.

Change-Id: Ib9aa210194ed2f17eaff85b499b55cc4eb99ff11
2015-12-07 11:54:50 +01:00
James Zern
464ed10fa9 bit_reader: remove aarch64 BITS TODO
set BITS=56 in this case as it's mildly better on iOS (Xcode 7) and
Android (r10e + gcc-4.9)

Change-Id: I3265021a3572987d01edfafd5c1431207f07a170
2015-12-04 19:58:23 -08:00
Lode Vandevenne
6938111357 Improved alpha cleanup for the webp encoder when prediction transform is used.
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992

Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.

Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
2015-12-04 13:50:56 +00:00
Pascal Massimino
2c08aac81a introduce WebPMemToUint32 and WebPUint32ToMem for memory access
it uses memcpy() when unaligned memory write is tricky

Change-Id: I5d966ca9d19e9b43ac90140fa487824116982874
2015-12-04 13:43:01 +00:00
Vincent Rabaud
010ca3d10d Fix FindMatchLength with non-aligned buffers.
The 32-bit buffers are actually rarely 64-bit aligned.
The new solution uses memcmp and is alignment agnostic.
It is also slightly faster.

Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20
2015-12-04 10:19:58 +01:00
James Zern
e4a7eed49d cosmetics: fix indent
Change-Id: I8be5152115618016e1e2a59fbfec78d5282ce57e
2015-12-03 00:53:59 -08:00
James Zern
0837512964 Merge "Make a separate case for low_effort in CopyImageWithPrediction" 2015-12-03 08:46:31 +00:00
James Zern
aa2eb2d4a1 Merge "cosmetics: fix indent" 2015-12-03 08:44:54 +00:00
James Zern
b7551e90e1 cosmetics: fix indent
Change-Id: I67e5a0308a964bc37b2314d96f3691fc0550e9bc
2015-12-03 00:34:15 -08:00
Lode Vandevenne
5bda52d4e8 Make a separate case for low_effort in CopyImageWithPrediction
for more speed.

This gives a roughly a 1% speedup for low_effort. But actually this is a
preparation for the upcoming CL that changes RGB values of transparent pixels
based on prediction, which should not be done for low_effort because that would
slightly hurt its performance.

On 1000 PNGs, with quality 0, method 0:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s

Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d
2015-12-03 00:22:50 -08:00
Scott Hancher
5ae220bef6 backward_references.c: Fixed compiler warning
"Implicit conversion loses integer precision: 'long' to 'int'."

Change-Id: I1aec7431f84123e5280447883eb80b84a3821d91
2015-12-02 23:51:06 -08:00
Pascal Massimino
363babe255 Merge "fix some warning about unaligned 32b reads" 2015-12-02 10:29:40 +00:00
Vincent Rabaud
a141178255 Optimization in hash chain comparison for 64 bit
Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.

Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
2015-12-01 13:01:57 +01:00
Vincent Rabaud
829bd14145 Combine Huffman cost and bit entropy into one loop
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.

MIPS32 code has been disabled for now, pending a code update.

Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d
2015-11-30 13:57:25 +01:00
James Zern
a7a954c851 Merge "lossless: make prediction in encoder work per scanline" 2015-11-25 20:40:44 +00:00
Pascal Massimino
61b605b407 Merge "fix of undefined multiply (int32 overflow)" 2015-11-25 08:39:33 +00:00
Lode Vandevenne
239421c5ef lossless: make prediction in encoder work per scanline
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.

Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s

Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s

No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.

Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
2015-11-25 00:38:27 -08:00
Pascal Massimino
f5ca40e05f fix of undefined multiply (int32 overflow)
the problem was the incorporation of the extra constant 1<<16 in the kC1
constant, to emulate the addition. It's now removed and the addition is
performed explicitly.

No real speed difference observed.

cf. issue #278

Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d
2015-11-24 23:18:31 -08:00
James Zern
5cd2ef4c4a Merge changes from topic 'win-threading-compat'
* changes:
  Makefile.vc: enable WEBP_USE_THREAD for windows phone
  thread: use CreateThread for windows phone
  thread: use WaitForSingleObjectEx if available
  thread: use InitializeCriticalSectionEx if available
  thread: use native windows cond var if available
2015-11-25 00:41:02 +00:00
James Zern
d2afe974f9 thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp

Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
2015-11-23 23:00:40 -08:00
James Zern
0fd0e12bfe thread: use WaitForSingleObjectEx if available
Windows XP and up

Change-Id: Ie1a46a82722b8624437c8aba0aa4566a4b0b3f57
2015-11-23 23:00:05 -08:00
James Zern
63fadc9ffa thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up

Change-Id: I32c5b4e5384d614c5a821ef511293ff014c67966
2015-11-23 22:58:28 -08:00
James Zern
110ad5835e thread: use native windows cond var if available
Vista / Server 2008 and up. no speed difference observed.

Change-Id: Ice19704777cb679b290dc107a751a0f36dd0c0a9
2015-11-23 22:58:11 -08:00
James Zern
912c9fdf0c dec/webp: use GetLE(24|32) from utils
picks up the undefined behavior fix from the previous commit

BUG=278

Change-Id: Ie17bf7db827b1dc564194aadcf6c5e47f61681f7
2015-11-23 22:53:27 -08:00
James Zern
f1694481a9 utils/GetLE32: correct uint32 promotion
avoids undefined behavior when shifting an int by 24.

BUG=278

Change-Id: I7b5ad96715002c8f425d81789bb75f22c176ab76
2015-11-23 22:51:33 -08:00
James Zern
158763dea3 Merge "always call WebPInitSamplers(), don't try to be smart" 2015-11-23 22:23:21 +00:00
Pascal Massimino
3770f3bbb6 Merge "cleanup the YFIX/TFIX difference by removing some code and #define" 2015-11-23 20:47:42 +00:00
James Zern
a40f60a9b4 Merge "3% speed improvement for lossless webp encoder for low effort mode:" 2015-11-23 20:44:15 +00:00
Pascal Massimino
ed1c2bc655 always call WebPInitSamplers(), don't try to be smart
if FANCY_UPSAMPLING was not defined but io->fancy_upsampling was set,
then the call to WebPInitSamplers() was skipped -> boom.

Change-Id: Id63e2ecc09f532fbe2ec9936d9ce4b502ba8fac5
2015-11-23 09:53:52 -08:00
Lode Vandevenne
b8c44f1aa4 3% speed improvement for lossless webp encoder for low effort mode:
prevent updating unused histogram.

Benchmark on 1000 PNGs, 30 iterations, lossless, quality 0, method 0:
before: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 34.578 MP/s
after: Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.980 MP/s

Change-Id: Id62759d4d111a6ba41c85c611a15d4f6ffc9f935
2015-11-22 09:12:54 +01:00
Pascal Massimino
997e103871 cleanup the YFIX/TFIX difference by removing some code and #define
no speed or output difference

Change-Id: I50bfb44f357e19431457b1cf9504a5a6bcce1945
2015-11-21 23:51:58 -08:00
Lode Vandevenne
1f9be97c22 Make discarding invisible RGB values (cleanup alpha) the default.
Rename the flag to exact instead of the opposite cleanup_alpha. Add the flag to
WebPConfig. Do the cleanup in the webp encoder library rather than the cwebp
binary, this will be needed for the next stage: smarter alpha cleanup for
better compression which cannot be done as a preprocessing due to depending on
predictor choices in the encoder.

Change-Id: I2fbf57f918a35f2da6186ef0b5d85e5fd0020eef
2015-11-21 12:32:32 -08:00
Pascal Massimino
b37b0179c5 fix for issue #275: don't compare to out-of-bound pointers
the original change triggered several internal API modifs.
This is to ensure that we're never computing pointer that can
possibly wrap around, or differences between pointers that can
overflow.

no observed speed difference

Change-Id: I9c94dda38d94fecc010305e4ad12f13b8fda5380
2015-11-20 16:25:17 -08:00
Pascal Massimino
21735e06f7 speed-up trivial one-symbol decoding case for lossless
We now consider 3 special cases:
 * htree-group has only 1 code (no bit is read from bitstream)
 * htree-group has few enough literal symbols, so that all the bit
   codes can fit into a look-up table of less than 64 entries
 * htree-group has a trivial arb literal (not GREEN!), like before

No overall speed change.

Change-Id: I6077fa0b7e5c31a6c67aa8aca859c22cc50ee254
2015-11-16 14:04:51 -08:00
Urvang Joshi
397863bd66 Refactor CopyPlane() and CopyPixels() methods: put them in utils.
Change-Id: I0e1533df557a0fa42c670e3b826fc0675c36e0a5
2015-11-13 11:39:22 -08:00
Urvang Joshi
6ecd72f845 Re-enable encoding of alpha plane with color cache for next release.
This is a revert of: https://chromium-review.googlesource.com/#/c/73607/

Change-Id: I7ec45277d73608d77d5e873290c6c185caa30c32
2015-11-13 07:15:19 +00:00
Pascal Massimino
775d3a373c remove unused fields from WebPDecoderOptions and WebPBitstreamFeatures
Change-Id: I92692d2975644dba10a7ac54f5c0f63ebd1580e6
2015-11-13 00:16:29 +01:00
Urvang Joshi
c13245c7d8 AnimEncoder: Add a GetError() method.
We now get error string instead of printing it.
The verbose option is now only used to print info and warnings.

Change-Id: I985c5acd427a9d1973068e7b7a8af5dd0d6d2585
2015-11-11 16:14:09 -08:00
Urvang Joshi
688b265d5e AnimDecoder API: Add a GetDemuxer() method.
Change-Id: Ic6a86e8788f1a3e21d1287ece36d80d1153b8f5a
2015-11-11 10:36:17 -08:00
Urvang Joshi
1aa4e3d6ba WebPAnimDecoder: add an option to enable multi-threaded decoding.
Change-Id: I3ff12bc07fc5a1b57a6950afa0e5f54a12985e75
2015-11-11 10:34:42 -08:00
Urvang Joshi
3584abca16 AnimDecoder: option to decode to common color modes.
Change-Id: I77ddab9abe3c4b35a9bcfe4c90b3e43d3aef166d
2015-11-10 09:27:59 -08:00
Urvang Joshi
945cfa3b7c mux.h does NOT need to include encode.h
It was needed earlier for WebPAnimEncoder API when it was using structs
like WebPConfig, but it only uses pointers to those now.

Change-Id: Ic0c144966421c678e8ef54b3fa81574bb2c9cd08
2015-11-09 15:40:09 -08:00
Pascal Massimino
bfd3fc02df ~2x faster SSE2 RGB24toY, BGR24toY, ARGBToY|UV
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)

Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV

Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
2015-11-06 15:02:01 -08:00
Pascal Massimino
52fdbdfe66 extract some RGB24 to Luma conversion function from enc/ to dsp/
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.

ConvertRowToY() should go into dsp/ too, at some point.

Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
2015-10-30 00:28:11 -07:00
Pascal Massimino
ab8c2300b6 add missing \n
Change-Id: I0c9236bbeef5868629d4dc02e3fae6e79ca55949
2015-10-30 00:02:27 -07:00
James Zern
5bd04a087c sync versions with 0.4.4
libwebp{,decoder} - 0.4.4
libwebp libtool - 5.4.0
libwebpdecoder libtool - 1.4.0

mux/demux - 0.2.2 (unchanged)
libtool - 1.2.0 (unchanged)

(cherry picked from commit 62864042c0)

Change-Id: I7d421dc47ad4d25a17450ce1b04562c5d58c596b
2015-10-28 23:43:40 -07:00
Pascal Massimino
8f1fcc15af Merge "Move ARGB->YUV functions from dec/vp8l.c to dsp/yuv.c" 2015-10-29 06:38:52 +00:00
Pascal Massimino
25bf2ce5cc fix some warning about unaligned 32b reads
on x86 + gcc, the assembly code is the same.

Change-Id: Ib0d23772ccf928f8d9ebcb0e157c0573d1f6a786
2015-10-28 15:51:55 -07:00
Pascal Massimino
fa8927efe4 Move ARGB->YUV functions from dec/vp8l.c to dsp/yuv.c
also switch to using ExtractAlpha() instead of hard-coding the loop.

The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.

Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
2015-10-28 01:45:08 -07:00
James Zern
f7c507a5f8 Merge "remove unnecessary #include "yuv.h"" 2015-10-27 21:54:21 +00:00
Pascal Massimino
14e4043b67 remove unnecessary #include "yuv.h"
Change-Id: I8b277433663e063e7a182f66818afec1654a39bd
2015-10-27 01:27:36 -07:00
Pascal Massimino
d64d376c2a change WEBP_ALIGN_CST value to 31
(and make dec/frame.c use the common macros too)

Change-Id: Ie44dbd82e067934b17ca3ffba4dd45ab0d61d3f6
2015-10-19 21:39:55 +00:00
James Zern
f717b82864 vp8l.c, cosmetics: fix indent after 95509f9
95509f9 large re-organization of the delta-palettization code

Change-Id: I9d27f15cb6072a2bd1dd593d53db5b2dd3c30133
2015-10-19 12:28:57 -07:00
James Zern
927ccdc43b Merge "fix alignment of allocated memory in AllocateTransformBuffer" 2015-10-19 19:15:04 +00:00
Pascal Massimino
fea94b2b36 fix alignment of allocated memory in AllocateTransformBuffer
likely to avoid unaligned reads in the future

Change-Id: I434ba17c139ad6e190ebd9b909b241c6c6f1e7f8
2015-10-18 13:09:22 -07:00
Pascal Massimino
5aa8d61f75 Merge "MIPS: rescaler code synced with C implementation" 2015-10-17 07:52:36 +00:00
Djordje Pesut
e7fb267df7 MIPS: rescaler code synced with C implementation
Change-Id: I4cec115d3fe6f3f825084d7388249694c500256a
2015-10-17 00:16:27 -07:00
Pascal Massimino
93c86ed5b9 Merge "format_constants.h: MKFOURCC, correct cast" 2015-10-17 05:45:05 +00:00
James Zern
5d791d2603 format_constants.h: MKFOURCC, correct cast
'd' should be promoted to uint32 before shifting by 24

Change-Id: I6212661af3802709b0098af8402ed73a0d9373ee
2015-10-16 18:43:40 -07:00
James Zern
65726cd3a7 dsp/lossless: Average2, make a constant unsigned
use 'u' rather than the unnecessary 'l' as a suffix. this prevents a
conversion warning with some toolchains

Change-Id: I21c33ce08819b3c839c75e03a8f7f3a6041d0695
2015-10-16 18:39:42 -07:00
Johann
d26d9def80 Use __has_builtin to check clang support
Older versions of Xcode with clang reporting versions 4.[012] and 5.0
did not include support for __builtin_bswap16. Checking in this manner
avoids using brittle version checks.

Matches a change to libvpx:
https://chromium-review.googlesource.com/305573
to fix:
https://code.google.com/p/webm/issues/detail?id=1082

Change-Id: I23ea466ee1b53b12cd3fb45f65a2186c8dda95a1
2015-10-14 17:48:08 -07:00
Pascal Massimino
12ec204ec7 moved ALIGN_CST into util/utils.h and renamed WEBP_ALIGN_xxx
Note that ALIGN_CST is still kept different in dec/frame.c for now,
because the values is 31 there, not 15. We might re-unite these two
later.

Change-Id: Ibbee607fac4eef02f175b56f0bb0ba359fda3b87
2015-10-14 00:03:14 -07:00
Pascal Massimino
67c547fdcd rescaler: ~20% faster SSE2 implementation for lossless ImportRowExpand
lossy (1-channel) speed-up is more on the 5% side.

Change-Id: Id19d97b9e9a34804b59604a5b48f94a37fdafd62
2015-10-14 07:32:12 +02:00
Pascal Massimino
99e3f8128a Merge "large re-organization of the delta-palettization code" 2015-10-14 05:11:47 +00:00
Pascal Massimino
95509f9914 large re-organization of the delta-palettization code
same functionality, but better code layout.

What changed:
  * don't trash the palette_[] in EncodePalette(), so it can be re-used
  * split generation of image from bit-stream coding
  * move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette()
  * minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c
  * clarify the TransformBuffer stuff. more clean-up to come here...

This should make experimenting with delta-palettization easier and more compartimentalized.

Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610
2015-10-14 00:25:42 +02:00
Pascal Massimino
74fb458bbc fix for weird msvc warning message
" warning C4098: 'RescalerImportRowShrinkSSE2' : 'void' function returning a value"

Change-Id: Ifa893502e3e4b394910e142d954393dda9d59d1a
2015-10-10 22:35:59 -07:00
Pascal Massimino
ae49ad8641 Merge "SSE2 implementation of ImportRowShrink" 2015-10-10 06:02:24 +00:00
Pascal Massimino
932fd4df61 SSE2 implementation of ImportRowShrink
some limitations: only for RGBA output,
and if reduction factor is not too small (dst_width > src_width / 128)

20-25% faster, ~4-6% global improvement total decoding.

Change-Id: I95366ddaa4a38e0a96bed754dfe790126f7bb84a
2015-10-09 13:04:54 -07:00
Pascal Massimino
b0c9d8af32 label rename: NO_CHANGE -> NoChange
Change-Id: I5b2beb93169d7c2bc95e6cdeb57770fc44b4963f
2015-10-07 22:53:34 -07:00
skal
b4e731cd93 neon-implementation for rescaler code
It's better to stay with a 32b fixed-point precision overall, otherwise
the C-version on ARM gets *slower*.
Actually, gcc ARM compiler optimizes some instructions pretty
well when WEBP_RESCALER_FIX is exactly 32, even in C.

Change-Id: I0eea97f7db5947470f5af355dee098eca81e178d
2015-10-07 21:18:39 -07:00