and also pass 'VP8Io* io' extra param to VP8DecompressAlphaRows()
This is in preparation for some memory optimizations in
the 'cropping' case. For now, only the easy crop_bottom case is
optimized.
Change-Id: Ib54531ba057bf62b98422dbb6c181dda626c72c2
The mode bits were not taken into account, which is OK in most cases.
But for very large images with 'easy' content, their overhead starts to
matter a lot, and we were failing to optimize for it.
Now, these mode bits have their own lambda values associated, limiting
the jerkiness. We also limit (for -m 2 only) the individual number of bits
to something that will prevent the partition 0 overflow.
Removed the I4_PENALTY constant, which was a rather crude approximation,
and replaced it with a q-dependent expression.
fixes issue #289
Change-Id: I956ae2d2308c339adc4706d52722f0bb61ccf18c
If value is '2', it means the buffer is a 'slow' one, like GPU-mapped memory.
This change is backward compatible (setting is_external_memory to 2
will be a no-op in previous libraries)
dwebp: add flags to force a particular colorspace format
new flags are:
-pixel_format {RGB,RGBA,BGR,BGRA,ARGB,RGBA_4444,RGB_565,
rgbA,bgrA,Argb,rgbA_4444,YUV,YUVA}
and also -external_memory {0,1,2}
These flags are mostly for debugging purposes, and hence are not documented.
Change-Id: Iac88ce1e10b35163dd7af57f9660f062f5d8ed5e
This is in preparation for some SSE2 code.
And generally speaking, the whole SSIM code needs a
revamp: we're not averaging the SSIM value at each pixel
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.
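
The "overall SSIM value once, for the whole plane" approach can be sketched as
below. This is a minimal illustration, not the libwebp code: the PlaneStats
struct, the function names and the constants C1 and C2 (the usual 8-bit SSIM
stabilizers) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical accumulator: raw sums gathered once over the whole plane.
typedef struct {
  double n;              // number of samples
  double sx, sy;         // sum of x, sum of y
  double sxx, syy, sxy;  // sum of x*x, y*y and x*y
} PlaneStats;

// Adds 'count' co-located samples of planes 'x' and 'y' to the sums.
static void AccumulatePlane(const uint8_t* x, const uint8_t* y, size_t count,
                            PlaneStats* s) {
  size_t i;
  for (i = 0; i < count; ++i) {
    const double a = x[i];
    const double b = y[i];
    s->n += 1.;
    s->sx += a;
    s->sy += b;
    s->sxx += a * a;
    s->syy += b * b;
    s->sxy += a * b;
  }
}

// Evaluates the SSIM formula a single time, from the plane-wide sums.
// Assumed constants: C1 = (0.01 * 255)^2 and C2 = (0.03 * 255)^2.
static double OverallSSIM(const PlaneStats* s) {
  const double C1 = 6.5025;
  const double C2 = 58.5225;
  double mx, my, var_x, var_y, cov;
  assert(s->n > 0.);
  mx = s->sx / s->n;
  my = s->sy / s->n;
  var_x = s->sxx / s->n - mx * mx;
  var_y = s->syy / s->n - my * my;
  cov = s->sxy / s->n - mx * my;
  return ((2. * mx * my + C1) * (2. * cov + C2)) /
         ((mx * mx + my * my + C1) * (var_x + var_y + C2));
}
```

The per-pixel alternative would evaluate the same formula over a small window
around each pixel and average the results.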
Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
based on the sse2 change in:
9960c31 Remove an unnecessary transposition in TTransform.
~9-10.5% faster at the function-level, < 1% overall
Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7
- The result is now the closest among the possible results for all inputs. This
was not the case for bits>4, where the mapping was not even monotonic, because
GetValAndDistance was correct only if the significant part of the initial
value fit in a byte at most twice.
- The set of results for a larger number of bits dropped is a subset of values
for a smaller number of bits dropped. This implies that subsequent
discretizations for a smaller number of bits dropped do not change already
discretized pixels, which improves the quality (changes do not accumulate)
and compression density (values tend to repeat more often).
- Errors are more fairly distributed between upwards and downwards thanks to
banker's rounding, which keeps images from getting darker or lighter overall.
- Deltas between discretized values are more repetitive. This improves
compression density if delta encoding is used.
Also, the implementation is much shorter now.
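
The properties listed above can be illustrated with a minimal sketch. It
assumes an 8-bit channel value rounded to a multiple of (1 << bits), with ties
going to the even multiple and saturation at 0xff; the name DiscretizeChannel
is hypothetical and the code is not claimed to match the committed function.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: rounds the 8-bit value 'a' to the nearest multiple of
// (1 << bits). A value exactly halfway between two multiples goes to the even
// one (banker's rounding). Results that would exceed 0xff saturate to 0xff.
static uint32_t DiscretizeChannel(uint32_t a, int bits) {
  const uint32_t mask = (1u << bits) - 1;
  // (mask >> 1) is "half minus one"; the extra 1 is added only when the
  // multiple just below 'a' is odd, so that ties round up in that case only.
  const uint32_t biased = a + (mask >> 1) + ((a >> bits) & 1);
  assert(bits > 0 && bits < 8);
  assert(a <= 0xff);
  if (biased > 0xff) return 0xff;
  return biased & ~mask;
}
```

With bits = 4, the tie at 8 goes down to 0 and the tie at 24 goes up to 32, so
rounding errors alternate in sign. Every multiple of 16 is also a multiple of
4, so a value discretized with bits = 4 is left unchanged by a later pass with
bits = 2.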
Change-Id: I0a98e7d5255e91a7b9c193a156cf5405d9701f16
Pass them along to internal 'pic' object, so that progress can be reported back
and user data can also be inspected.
Change-Id: Idb5d0d4a76d07283d704a86c5892e1ad7bda09fa
SSE4.1 is slower than the SSE2 implementation and this seems to
be due to a slow _mm_loadl_epi64 implementation by gcc
(hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both
were confirmed by IACA and experiments.
Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f
We were not updating current_width_, which is usually
not a problem unless we use Delta Palette with a small number
of colors.
-> Addressed this re-entrancy problem by checking that we have
enough capacity for the transform buffer.
The problem is not currently visible; it only surfaces once we
restrict the number of gradients used in delta-palette to fewer
than 16. Then the buffers have different current_width_ values.
Change-Id: Icd84b919905d7789014bb6668bfb6813c93fb36e
The transpose refactoring will help remove a transpose in a
later CL.
The horizontal add function helps remove a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19
Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
'implicit conversion from 'int' to 'short' changes value from 33050 to
-32486'
original patch:
https://codereview.chromium.org/1657313003/
Make libwebp build with -Wconstant-conversion from newer clangs.
After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang
points out that _mm_set1_epi16(33050) causes an overflow in the short
argument to _mm_set1_epi16(). Since there's no version that takes an
unsigned short, add an explicit cast to tell the compiler that this is
intentional.
No behavior change.
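
The arithmetic behind the warning can be checked in plain C. This is a minimal
sketch: AsSigned16 is a hypothetical helper, not part of libwebp, and it
assumes the usual two's complement wrap-around (the conversion is
implementation-defined in C). The explicit cast plays the same role as the one
added to the argument of _mm_set1_epi16().

```c
#include <stdint.h>

// Hypothetical helper: reinterprets an unsigned 16-bit constant as a signed
// one. The 16 stored bits are unchanged; only their interpretation differs,
// which is why the explicit cast does not change behavior.
static int16_t AsSigned16(uint16_t v) {
  return (int16_t)v;  // explicit cast: tells the compiler this is intended
}
```

Here 33050 maps to 33050 - 65536 = -32486, the value quoted in the warning,
and converting back to uint16_t gives 33050 again.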
Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b
The code and logic are unified when computing bit entropy + Huffman cost.
Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.
Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
setting all transparent pixels to black rather than the "flatten" method.
0.3% smaller filesize on the 1000 PNGs if alpha cleanup is used (before: 18685774, after: 18622472)
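
A minimal sketch of the "set transparent pixels to black" cleanup, assuming
32-bit ARGB samples with alpha in the top byte. The function name is
hypothetical and the code only illustrates the idea; it is not the encoder's
actual routine.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: zeroes every fully transparent pixel in place, so
// that invisible color values all become the same, cheap-to-code value.
// Pixels with a non-zero alpha are left untouched.
static void ClearTransparentPixels(uint32_t* argb, size_t count) {
  size_t i;
  for (i = 0; i < count; ++i) {
    if ((argb[i] >> 24) == 0) argb[i] = 0x00000000u;
  }
}
```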
Change-Id: Ib0db9e7ccde55b36e82de07855f2dbb630fe62b1
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
This implementation brings:
- an SSE implementation of packing / unpacking
- bigger buffers processed at the same time
The speedup is 4% on lossy decoding (YUV to RGB) and 0.5% on
lossy encoding (RGB to YUV was already optimized).
Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f