On the non-fast path (use_8b_decode_=0) for decoding the alpha-mask,
we could end up requesting ApplyInverseTransform() with more rows
to process than NUM_ARGB_CACHE_ROWS. This could only happen on the
very last bottom rows of the image.
* ProcessRows() doesn't need to be fixed, since we never request more
than NUM_ARGB_CACHE_ROWS rows. Added an assert for that.
* the use_8b_decode_=1 case doesn't use argb_cache_, but rather does
the palette-decoding call directly. So, no problem here either.
Only the generic (and rather rare) case of calling ExtractAlphaRows()
was affected.
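A minimal sketch of the batching idea (names are illustrative, not the actual
libwebp functions; NUM_ARGB_CACHE_ROWS is assumed to be the cache height used
by the decoder):

  #define NUM_ARGB_CACHE_ROWS 16

  /* Process decoded rows in chunks that never exceed the argb cache, so the
   * inverse transform is never asked to handle more rows than the cache can
   * hold, even on the very last rows of the image. */
  static void ProcessRowsInBatches(int first_row, int last_row,
                                   void (*apply_rows)(int num_rows)) {
    int row = first_row;
    while (row < last_row) {
      const int num_rows = (last_row - row > NUM_ARGB_CACHE_ROWS)
                               ? NUM_ARGB_CACHE_ROWS
                               : (last_row - row);
      apply_rows(num_rows);
      row += num_rows;
    }
  }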
Change-Id: I58e28d590dcc08c24d237429b79614abcef1db7c
This goes back to the old behavior, which is actually better for
compression and speed with the latest patches.
Change-Id: I35884bab02589297c25d6e1e66dc5f13e05f7aa7
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].
Change-Id: I66c3ac6dea24e0cd2c0eaa5440f3142b4dbbe23b
we don't need to store the resulting histogram, so no need to
call HistogramAddEval().
Allows some signature simplifications...
Change-Id: I3fff6c45f4a7c6179499c6078ff159df4ca0ac53
In case where the same offset is found in consecutive pixels,
the cost computation from one pixel can be re-used for the next.
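A sketch of the idea with hypothetical names (the real cost model and hash
chain structures differ): the distance part of the cost depends only on the
offset, so it is computed once and reused while the offset stays the same:

  /* costs[i] = distance cost of offsets[i] + length cost of lens[i];
   * the distance part is recomputed only when the offset changes. */
  static void FillCosts(const int* offsets, const int* lens, int num_pixels,
                        double (*DistanceCost)(int offset),
                        double (*LengthCost)(int len), double* costs) {
    double cached_cost = 0.;
    int cached_offset = -1;
    int i;
    for (i = 0; i < num_pixels; ++i) {
      if (offsets[i] != cached_offset) {
        cached_offset = offsets[i];
        cached_cost = DistanceCost(cached_offset);  /* the expensive part */
      }
      costs[i] = cached_cost + LengthCost(lens[i]);
    }
  }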
Change-Id: Ic03c7d4ab95f3612eafc703349cfefd75273c3d7
and also recycle the malloc'd intervals
This avoids quite a few malloc/free cycles during interval management.
Change-Id: Ic2892e7c0260d0fca0e455d4728f261fb4c3800e
In a lot of cases, only one interval is used. This can cause
a lot of malloc/free cycles for only 56 bytes. By caching this
single interval and re-using it, we remove this cycle in most
frequent cases.
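A minimal sketch of such a one-slot cache (structure and function names are
hypothetical; the pool is assumed to start zero-initialized):

  #include <stdlib.h>

  typedef struct CostInterval {
    int start_, end_;
    struct CostInterval* next_;
  } CostInterval;

  typedef struct {
    CostInterval cached_;     /* the single recycled interval */
    int cached_in_use_;
  } IntervalPool;

  static CostInterval* IntervalNew(IntervalPool* const pool) {
    if (!pool->cached_in_use_) {
      pool->cached_in_use_ = 1;
      return &pool->cached_;      /* no malloc in the common case */
    }
    return (CostInterval*)malloc(sizeof(CostInterval));
  }

  static void IntervalDelete(IntervalPool* const pool,
                             CostInterval* const interval) {
    if (interval == &pool->cached_) {
      pool->cached_in_use_ = 0;   /* just mark it free for re-use */
    } else {
      free(interval);
    }
  }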
Change-Id: Ia22d583f60ae438c216612062316b20ecb34f029
As per the spec
(https://developers.google.com/speed/webp/docs/riff_container), only the
extended file format can contain an unknown chunk. So, when assembling a
WebP file with muxer, whenever there is an unknown chunk present, we
should create a VP8X chunk (even though none of the features are
present).
BUG=webp:294
Change-Id: I5da52d311e1853d40063d0f5026100d4325effaa
In some cases, the hash chain for a function is filled several
times:
- GetBackwardReferences -> CalculateBestCacheSize ->
BackwardReferencesLz77 that computes the hash chain
- GetBackwardReferences ->
(not always) BackwardReferencesTraceBackwards ->
BackwardReferencesHashChainDistanceOnly that computes the hash
chain in a slightly different way
Speed and compression performance are slightly changed (+ or -)
but will be homogenized in a later patch.
Change-Id: I43f0ecc7a9312c2ed6cdba1c0fabc6c5ad91c953
This reverts commit 169004b1d5.
this changes the ABI, so we should bump versions and add a note to NEWS
when we're ready to expose it
Change-Id: Ic5bbd0aee2b6fd0f9d438a9effedf22fe0cec4bf
tl;dr
We do the following:
- Start with transparent value of 0x00000000 instead of 0x00ffffff, so that
WebPCleanupTransparentAreaLossless() is a no-op.
- Restore the original canvas after lossy encoding, to discard changes made by
WebPCleanupTransparentArea() before the next encode.
Explanation of why:
In the mixed mode, anim_encoder tries to encode using both lossless and lossy
compression. In fact, when "min_size" option is enabled, there are at most 4
encodes that can happen in this order:
- lossless with dispose none
- lossy with dispose none
- lossless with dispose background
- lossy with dispose background
But lossless and lossy both potentially modify the canvas during encode
(for better compression):
- Lossless: WebPCleanupTransparentAreaLossless() turns all transparent pixels
to 0x00000000
- Lossy: WebPCleanupTransparentArea() flattens some transparent pixels
So, the result is that sometimes we feed the modified canvas to the encoder
instead of the original one, which isn't the right thing to do.
This also applies to just lossless or just lossy encoding, as multiple encodes
happen (with the two dispose methods) in those cases too.
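A rough sketch of the restore step (the encode calls themselves are elided;
only the WebPPicture calls below are the real API):

  #include "webp/encode.h"

  /* Snapshot the canvas before encoding and restore it afterwards, so the
   * cleanup done by one encoder never leaks into the next encode attempt. */
  static int EncodeBothWithRestore(WebPPicture* const canvas) {
    WebPPicture saved;
    if (!WebPPictureInit(&saved)) return 0;
    if (!WebPPictureCopy(canvas, &saved)) return 0;  /* keep original pixels */
    /* ... lossless encode of 'canvas' (may set transparent pixels to
     *     0x00000000) ... */
    if (!WebPPictureCopy(&saved, canvas)) goto Err;  /* undo modifications */
    /* ... lossy encode of 'canvas' (may flatten transparent areas) ... */
    if (!WebPPictureCopy(&saved, canvas)) goto Err;  /* undo again */
    WebPPictureFree(&saved);
    return 1;
   Err:
    WebPPictureFree(&saved);
    return 0;
  }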
Change-Id: Idfa8ce831a1627014785ba7d0316c42f72594455
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].
Change-Id: I19adc9c48f2a4e2ec9d995e78add6f25172774c2
no longer gate this on WEBP_FORCE_ALIGNED as WebPMemToUint32() provides
this service. replace that check with WORDS_BIGENDIAN as the block is
currently little-endian specific.
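For reference, a memcpy-based load is the usual way such a helper is written;
a sketch (not necessarily identical to the utils.h implementation):

  #include <stdint.h>
  #include <string.h>

  /* Alignment-safe 32-bit load: memcpy lets the compiler emit a plain load
   * on targets that allow unaligned access, without relying on undefined
   * behavior on those that don't. */
  static uint32_t MemToUint32(const uint8_t* const ptr) {
    uint32_t v;
    memcpy(&v, ptr, sizeof(v));
    return v;   /* host-endian, hence the separate WORDS_BIGENDIAN check */
  }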
Change-Id: Ie04ec0179022d20dab53da878008ae049837782f
the read size may be fixed, but the offsets into buf_ are not. forcing
an aligned read then shifting or using a temporary would be costly. this
is less important now that WebPMemToUint32() is being used.
Change-Id: I357fec8f750969cce91987abebed2f95e27a835f
Instead of comparing all the following pixels over len (which can
frequently reach the maximum MAX_LENGTH=4096 for some images),
intervals are stored and compared.
Change-Id: I0dafef6cc988dde3c1c03ae07305ac48901d60ee
The old implementation in enc/near_lossless.c, performing a separate
preprocessing step, is used only when a prediction filter is not used;
otherwise a new implementation integrated into lossless_enc.c is used.
It retains the same logic for converting near lossless quality into max
number of bits dropped, and for adjusting the number of bits based on
the smoothness of the image at a given pixel. As before, borders are not
changed.
Then, instead of quantizing raw component values, the residual after
subtract green and after prediction is quantized according to the
resulting number of bits, taking care to not cross the boundary between
255 and 0 after decoding. Ties are resolved by moving closer to the
prediction instead of by bankers’ rounding.
This results in about 15% size decrease for the same quality.
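A simplified sketch of the residual quantization described above (hypothetical
helper; the actual encoder handles the full prediction pipeline and more edge
cases):

  /* Round 'residual' to a multiple of 2^num_bits, resolving ties toward 0
   * (i.e. toward the prediction), and shift the result if the reconstructed
   * value 'pred + q' would fall outside [0, 255]. */
  static int QuantizeResidual(int residual, int pred, int num_bits) {
    const int step = 1 << num_bits;
    const int half = step >> 1;
    int q;
    if (num_bits == 0) return residual;          /* nothing to drop */
    q = (residual >= 0) ? ((residual + half - 1) / step) * step
                        : -(((-residual) + half - 1) / step) * step;
    if (pred + q < 0) q += step;                 /* avoid crossing 0 */
    else if (pred + q > 255) q -= step;          /* avoid crossing 255 */
    return q;
  }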
Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f
copy and paste error in the previous commit, change
no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF ->
WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32
add WEBP_UBSAN_IGNORE_UNDEF to WebPMemToUint32() / WebPUint32ToMem()
when WEBP_FORCE_ALIGNED is unset
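For context, such annotations are typically defined along these lines (a
sketch, not necessarily the exact dsp.h definitions):

  #if defined(__clang__) && defined(__has_attribute)
  #if __has_attribute(no_sanitize)
  #define WEBP_UBSAN_IGNORE_UNDEF \
      __attribute__((no_sanitize("undefined")))
  #define WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW \
      __attribute__((no_sanitize("unsigned-integer-overflow")))
  #endif
  #endif
  #ifndef WEBP_UBSAN_IGNORE_UNDEF
  #define WEBP_UBSAN_IGNORE_UNDEF
  #endif
  #ifndef WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
  #define WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
  #endif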
Change-Id: I726b2e708ce29681584eb10c8874d5cf1e798756
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.
Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd
the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.
the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.
Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
This fixes decoders built against clang-3.8 (r11c). Without this change
bad conditional code would be generated causing all calls to
WebPParseHeaders() to return 4 (UNSUPPORTED_FEATURE).
Original fix:
https://android-review.googlesource.com/#/c/196123
Change-Id: Id4b4d84048d347cea110b6cf297ef9ef4fbed323
the number of segments is previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
this is similar to the changes made in:
c8a87bb AssignSegments: quiet -Warray-bounds warning
3e7f34a AssignSegments: quiet array-bounds warning
Change-Id: Iec7d470be424390c66f769a19576021d0cd9a2fd
This will allow working in-place on the cropped area later.
Also sped up the inverse gradient filtering in SSE2 (~4%)
Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
This is only possible if the filtering is not VERTICAL or GRADIENT.
Otherwise, we need the spatial predictors and hence need the invisible
part above the crop_top row.
COLOR_INDEX is the only transform that is not predicted
from the previous row. Applying the same to the other transforms (spatial
prediction, ...) is going to be more involved and would use an extra temporary row.
+ remove ApplyInverseTransformsAlpha()
(work is done directly within ExtractPalettedAlphaRows())
+ change back to using filter_ instead of unfilter_func_
Change-Id: I09e57efae4a4af00bde35f21ca6e3d73b35d7d43
Not exactly the same: based on the lossy WebP quality setting, ignore
minor differences when flattening similar blocks.
For 6k set, at default quality with '-min_size' option, improves
compression by 0.3%
Change-Id: Ifcb64219f941e869eb2643e231220b278aad4cd4
there are some subtle changes:
- DecodeAlphaData() may be called with pos==end, not to decode more data
  (there's none left), but to apply process_func() to all the unprocessed
  pixels already decoded
- last_row is exclusive and should be understood as 'up to last_row'. Can be misleading.
- VP8LDecodeAlphaImageStream() was testing dec->last_pixel_ for completion,
which was wrong because last_pixel_ is the last *decoded* pixel, not the
last *processed* one. -> test now uses last_row_, as expected
Change-Id: I1fb04ba25cd7a4775db9e3deee3e2ae80f9c0a75
this might change some crc slightly, since WebPDequantizeLevels()
performs an analysis pass, counting levels, which impacts the smoothing.
Now, the cropping area is not the same, so minor diffs are expected here
and there.
Change-Id: I3cce1e40c6f11c25b7c841044d637685c5740352
* make ALPHNew/Delete static
* properly init ALPHDec::io_
* introduce AllocateAlphaPlane() and WebPDeallocateAlphaMemory()
* reorganize VP8DecompressAlphaRows()
but we still allocate the full alpha-plane. Optimization will come
in another patch since it's tricky.
Change-Id: Ib6f190a40abb7926a71535b0ed67c39d0974e06a
this change will be superseded by patch #335160 eventually, but until then
let's fix the problem temporarily.
Change-Id: Iafd979c2ff6801e3f1de4614870ca854a4747b04
When FlattenSimilarBlocks() was making some blocks transparent with
averaged RGB values, filtering in lossy compression was causing
blockiness just outside the edge of these blocks.
Disabling filtering for that particular case avoids these block
artifacts.
The total encoded size of the 6k GIF set remains roughly the same (in
fact, reduces a bit).
Change-Id: Ida71cbabd59d851e16d871f53d19473312b3cc77
We pick a mapping with quality 0 mapping to max_diff 32 and quality 100
mapping to max_diff 1 (a possible mapping is sketched below).
For 6k GIF image set, this improves compression by:
4% at quality 0
0.05% at quality 75
Benefits the MovingThumbnailer test videos too.
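For illustration, one possible mapping with those endpoints (a plain linear
interpolation; the curve actually used by the encoder may differ):

  /* quality 0 -> 32, quality 100 -> 1, linearly in between (rounded). */
  static int QualityToMaxDiff(int quality) {
    return 32 - (31 * quality + 50) / 100;
  }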
Change-Id: I6838ce864d41e1e65311d26b9b8115a12390a253
This way we can ignore some noisy pixels and get tighter frame
rectangles.
Some results:
- Correctness:
Tested that anim_diff reports all images are identical for lossless, and
similar min_psnr value for lossy and mixed modes.
Also checked output images visually to make sure there weren't any
obvious kinks.
- Compression:
A very tiny improvement for the 6000-image GIF set we have (0.03%) for lossy
and mixed mode. For some of these images, frames get dropped
automatically as they have a very small diff from previous frame.
10 images from test_video_frames_png show a clear improvement in
compression though. This CL leads to 7 out of 9 lossy WebPs getting
smaller -- for one of them, this leads to a higher quality being picked
(as that’s still < 150 KB).
Change-Id: If539b9e77e1375aa15edc8f926933593a9865f1c
and also pass an extra 'VP8Io* io' param to VP8DecompressAlphaRows()
This is in preparation for some memory optimizations in
the 'cropping' case. For now, only the easy crop_bottom case is
optimized.
Change-Id: Ib54531ba057bf62b98422dbb6c181dda626c72c2
This avoids generating files that would trigger a decoding bug
found in 0.4.0 -> 0.4.3 libwebp versions.
This reverts commit 6ecd72f845.
Change-Id: I4667cc8f7b851ba44479e3fe2b9d844b2c56fcf4
The mode's bits were not taken into account, which is ok for most cases.
But in the case of super large images with 'easy' content, their overhead starts
mattering a lot and we were failing to optimize for these.
Now, these mode bits have their own lambda values associated, limiting
the jerkiness. We also limit (for -m 2 only) the individual number of bits
to something that will prevent the partition 0 overflow.
removed the I4_PENALTY constant, which was a rather crude approximation.
Replaced by some q-dependent expression.
fixes issue #289
Change-Id: I956ae2d2308c339adc4706d52722f0bb61ccf18c
If the value is '2', it means the buffer is a 'slow' one, like GPU-mapped memory.
This change is backward compatible (setting is_external_memory to 2
will be a no-op in previous libraries).
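A usage sketch with the public decoder API (buffer layout assumes tightly
packed RGBA; 'slow_buf' stands in for the caller's GPU-mapped pointer):

  #include "webp/decode.h"

  static int DecodeIntoSlowBuffer(const uint8_t* data, size_t data_size,
                                  uint8_t* slow_buf, int width, int height) {
    WebPDecoderConfig config;
    if (!WebPInitDecoderConfig(&config)) return 0;
    config.output.colorspace = MODE_RGBA;
    config.output.is_external_memory = 2;   /* external *and* slow memory */
    config.output.u.RGBA.rgba = slow_buf;
    config.output.u.RGBA.stride = width * 4;
    config.output.u.RGBA.size = (size_t)width * 4 * height;
    return WebPDecode(data, data_size, &config) == VP8_STATUS_OK;
  }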
dwebp: add flags to force a particular colorspace format
new flags are:
-pixel_format {RGB,RGBA,BGR,BGRA,ARGB,RGBA_4444,RGB_565,
rgbA,bgrA,Argb,rgbA_4444,YUV,YUVA}
and also external_memory {0,1,2}
These flags are mostly for debugging purposes, and hence are not documented.
Change-Id: Iac88ce1e10b35163dd7af57f9660f062f5d8ed5e
This is in preparation for some SSE2 code.
And generally speaking, the whole SSIM code needs some
revamp: we're not averaging the SSIM value at each pixel
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.
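For reference, the standard per-window SSIM that could be averaged over pixel
positions instead of being computed once (a generic sketch, not the libwebp
code; constants follow the usual K1=0.01, K2=0.03, L=255 convention):

  /* SSIM from accumulated window statistics. */
  static double SSIMFromStats(double mu_x, double mu_y, double var_x,
                              double var_y, double cov_xy) {
    const double C1 = 6.5025;    /* (0.01 * 255)^2 */
    const double C2 = 58.5225;   /* (0.03 * 255)^2 */
    return ((2. * mu_x * mu_y + C1) * (2. * cov_xy + C2)) /
           ((mu_x * mu_x + mu_y * mu_y + C1) * (var_x + var_y + C2));
  }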
Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
based on the sse2 change in:
9960c31 Remove an unnecessary transposition in TTransform.
~9-10.5% faster at the function-level, < 1% overall
Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7
- The result is now indeed closest among possible results for all inputs, which
was not the case for bits>4, where the mapping was not even monotonic because
GetValAndDistance was correct only if the significant part of the initial
value fit in a byte at most twice.
- The set of results for a larger number of bits dropped is a subset of values
for a smaller number of bits dropped. This implies that subsequent
discretizations for a smaller number of bits dropped do not change already
discretized pixels, which improves the quality (changes do not accumulate)
and compression density (values tend to repeat more often).
- Errors are more fairly distributed between upwards and downwards thanks to
bankers’ rounding, which avoids images getting darker or lighter overall.
- Deltas between discretized values are more repetitive. This improves
compression density if delta encoding is used.
Also, the implementation is much shorter now.
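One way to write such a discretization (a sketch in the spirit of the above,
not necessarily the exact library code; 'bits' is assumed to be > 0):

  #include <stdint.h>

  /* Map 'a' to the nearest value with the low 'bits' bits cleared, breaking
   * ties with bankers' rounding on the quantized index and saturating at
   * 0xff so the result never wraps. */
  static uint32_t FindClosestDiscretized(uint32_t a, int bits) {
    const uint32_t mask = (1u << bits) - 1;
    const uint32_t biased = a + (mask >> 1) + ((a >> bits) & 1);
    if (biased > 0xff) return 0xff;
    return biased & ~mask;
  }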
Change-Id: I0a98e7d5255e91a7b9c193a156cf5405d9701f16
Pass them along to internal 'pic' object, so that progress can be reported back
and user data can also be inspected.
Change-Id: Idb5d0d4a76d07283d704a86c5892e1ad7bda09fa
SSE4.1 is slower than the SSE2 implementation and this seems to
be due to a slow _mm_loadl_epi64 implementation by gcc
(hence possibly a bug in my gcc 4.8) and a very slow _mm_hadd_epi32. Both
were confirmed by IACA and experiments.
Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f
We were not updating the current_width_, which is usually
not a problem, unless we use Delta Palette with a small number
of colors.
-> Addressed this re-entrancy problem by checking that we have
enough capacity in the transform buffer.
The problem is not currently visible until we restrict
the number of gradients used in delta-palette to less than 16.
Then the buffers have different current_width_ and the problem
surfaces.
Change-Id: Icd84b919905d7789014bb6668bfb6813c93fb36e
The transpose refactoring will help removing a transpose in a
later CL.
The horizontal add function helps remove a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19
Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
'implicit conversion from 'int' to 'short' changes value from 33050 to
-32486'
original patch:
https://codereview.chromium.org/1657313003/
Make libwebp build with -Wconstant-conversion from newer clangs.
After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang
points out that _mm_set1_epi16(33050) causes an overflow in the short
argument to _mm_set1_epi16(). Since there's no version that takes an
unsigned short, add an explicit cast to tell the compiler that this is
intentional.
No behavior change.
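The fix described above amounts to making the narrowing explicit, roughly:

  #include <emmintrin.h>

  /* 33050 does not fit in a signed 16-bit value; the cast documents that
   * the wrap-around to -32486 (same bit pattern, 0x811A) is intentional. */
  static __m128i Make33050(void) {
    return _mm_set1_epi16((short)33050);
  }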
Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b
The code and logic is unified when computing bit entropy + Huffman cost.
Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.
Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
setting all transparent pixels to black rather than the "flatten" method.
0.3% smaller filesize on the 1000 PNGs if alpha cleanup is used (before: 18685774, after: 18622472)
Change-Id: Ib0db9e7ccde55b36e82de07855f2dbb630fe62b1
The functions containing magic constants are moved out of ./dsp.
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
This implementation brings:
- an SSE implementation of packing / unpacking
- bigger buffers processed at the same time
The speedup is 4% on lossy decoding (YUV to RGB), 0.5% on
lossy encoding (RGB to YUV was already optimized).
Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f
* changes:
demux: remove GetFragment()
demux: remove dead fragment related TODO
demux, Frame: remove is_fragment_ field
demux,WebPIterator: remove fragment_num/num_fragments
demux: remove WebPDemuxSelectFragment
this hasn't been set since parsing of the experimental chunk was
removed.
+ cleanup IsValidExtendedFormat(). is_fragmented has caused immediate
failure since:
4e2589f demux: restore strict fragment flag check
Change-Id: If9ecfc19556297100a6d5de1ba2cffdcbdc6c8fd