This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
Change-Id: Ib58d676fa79c5a4a95c676a98b62b548097f3c48
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
Change-Id: Ia4ce0fd0095f76f7edbc0fc6fe7f625e0d8bc6df
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
Change-Id: Ice1edbbd98172a916be6b6d3cdaff80fe05a6e37
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
Change-Id: I23878bca2e14a898266704f3fec65d40f58fd0b2
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
Change-Id: I9774ed6182ee4d872551aea56390fc0662cf0925
This is a follow up to:
ee8e8c62 Fix member naming for VP8LHistogram
This better matches Google style and clears some clang-tidy warnings.
Change-Id: Ida41ca82445800552573ff5ebbde743cf8fa6eff
- move the bin_id to the Histogram
- do not consider empty histograms
The speed-ups are negligible: linear algorithms on uint16_t data are
removed, but the whole code is still O(N^2) in the number of histograms.
Change-Id: Ie9c4831f0f3c64af9d9710a1dc2d817ba165389e
On some dataset, this was taking 2.5%; 2% after switching to
_mm_maskmoveu_si128, and 1.7% when using _mm_loadu_si128.
Confirmed by IACA: the throughput went from 4.26 to 3.5, and then
to 6.26 for twice the input.
Change-Id: I409f901aaad9d39bf55a1aac28cc25f126876b01
Entropy clustering merges symbol histograms to reduce the overall
entropy. The cost of the sum of 2 histograms is compared to the sum of
the costs of the 2 individual histograms and, if it is smaller, a merge
is done.
For most symbols, the computed cost is the real final cost based on the
histogram. For some symbols (distance and length), it also includes a
constant cost (independent of the probabilities of the symbols, and
hence of the merge) because these symbols are encoded as Golomb.
This constant cost is useless and can be removed.
Change-Id: I6271e8c0e4111cdeff544cbdb7dec3c67be5309c
This restores the use of the function after
980b708e enc_neon: fix build w/aarch64 gcc < 9.4.0
The intrinsic was added to llvm for aarch64 in:
5e4ce1ae9dad Implement the newly added AArch64 ACLE functions for
ld1/st1 with 2/3/4 vectors. The functions are like:
vst1_s8_x2 ...
llvmorg-3.4.0-rc1~101
https://github.com/llvm/llvm-project/commit/5e4ce1ae9dad
Visual Studio 2019 and 2022 also support the function (2017 is still
disabled for this path due to it relying on arm64_neon.h).
Change-Id: I6ff10e22deb3968a48738a4458d2d3d55410b5ec
After:
2c70ad76 muxread,CreateInternal: fix riff size checks (cl/200674839)
`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` (plus 1 byte of padding
if needed). A later check included `CHUNK_HEADER_SIZE` again before
capping the value of the size passed to `WebPMuxCreateInternal()`. This
missed cases with a small amount of extra data after the RIFF chunk
(like a newline when the file is opened and saved in a text editor) and
set size to an incorrect value, so larger sizes would also fail.
Another check of `riff_size < CHUNK_HEADER_SIZE` after the call to
`SizeWithPadding()` is removed because 1) it could not fail given
`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` to the value; and 2) it is
redundant as `size < RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE` is checked
earlier in the function.
Bug: webp:42340561
Change-Id: I58dc4f071b27c2841001b4012aabdb1869f64f97
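A minimal sketch of the size arithmetic described above, modeled on the text rather than copied from libwebp: `SizeWithPadding()` here just adds `CHUNK_HEADER_SIZE` to the payload rounded up to an even size, and `ClampToRiffChunk()` is a hypothetical helper that caps the buffer size at the end of the RIFF chunk without adding the header a second time.

```c
#include <stddef.h>

#define RIFF_HEADER_SIZE 12  /* "RIFF" + 32-bit size + "WEBP" */
#define CHUNK_HEADER_SIZE 8  /* FourCC + 32-bit size */

/* On-disk size of a chunk: header + payload rounded up to an even size. */
static size_t SizeWithPadding(size_t chunk_size) {
  return CHUNK_HEADER_SIZE + ((chunk_size + 1) & ~(size_t)1);
}

/* Hypothetical helper: riff_size is the value of the RIFF size field.
 * Caps data_size at the end of the RIFF chunk so that trailing bytes
 * (e.g. a newline added by a text editor) are ignored. Returns 0 if
 * the data is too small to hold the headers. */
static size_t ClampToRiffChunk(size_t riff_size, size_t data_size) {
  size_t total;
  if (data_size < RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE) return 0;
  /* SizeWithPadding() already accounts for CHUNK_HEADER_SIZE; adding it
   * again here would over-estimate the end of the chunk. Its result is
   * always >= CHUNK_HEADER_SIZE, so a "< CHUNK_HEADER_SIZE" check on it
   * could never fail. */
  total = SizeWithPadding(riff_size);
  return (total < data_size) ? total : data_size;
}
```

With a RIFF size field of 100 and a 109-byte file (108 bytes of WebP data plus a trailing newline), the helper returns 108.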
The values for the R/G/B floating point formulas resembled
https://fourcc.org/fccyvrgb.php and Video Demystified, but the fixed
point values are more closely aligned to rounded values from
https://en.wikipedia.org/wiki/YCbCr and BT.601.
The R/G/B formulas with the values prior to this change are added to
sharpyuv_csp.c as they align with the fixed values. The origin of those
coefficients is unclear. For consistency between library versions we'll
leave them as is.
Bug: webp:375011696
Change-Id: Id3f2a57530eee700cc52a899b32b25b5c015e89b
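To illustrate what "aligned to rounded values from BT.601" can mean, the sketch below derives the 8-bit studio-swing YUV to RGB coefficients from the BT.601 luma weights (Kr = 0.299, Kb = 0.114) and rounds them to fixed point. The 14-bit precision, `FixedCoeffs` and `Bt601Fixed()` are assumptions for illustration; this is not the code in sharpyuv_csp.c or libwebp.

```c
/* BT.601 luma weights. */
#define KR 0.299
#define KB 0.114
#define KG (1.0 - KR - KB)

/* Assumed fixed-point precision for this sketch: 14 fractional bits. */
#define FIX_BITS 14

/* Rounds a positive floating point coefficient to fixed point. */
static long ToFixed(double coeff) {
  return (long)(coeff * (1 << FIX_BITS) + 0.5);
}

/* 8-bit studio swing (Y in [16, 235], U/V in [16, 240]):
 *   R = y_scale * (Y - 16) + r_v * (V - 128)
 *   G = y_scale * (Y - 16) - g_u * (U - 128) - g_v * (V - 128)
 *   B = y_scale * (Y - 16) + b_u * (U - 128) */
typedef struct {
  long y_scale, r_v, g_u, g_v, b_u;
} FixedCoeffs;

static FixedCoeffs Bt601Fixed(void) {
  const double y_scale = 255.0 / 219.0;  /* expands [16, 235] to [0, 255] */
  const double c_scale = 255.0 / 112.0;  /* expands chroma offsets */
  FixedCoeffs c;
  c.y_scale = ToFixed(y_scale);
  c.r_v = ToFixed(c_scale * (1.0 - KR));
  c.b_u = ToFixed(c_scale * (1.0 - KB));
  c.g_u = ToFixed(c_scale * (1.0 - KB) * KB / KG);
  c.g_v = ToFixed(c_scale * (1.0 - KR) * KR / KG);
  return c;
}
```

At this precision the rounded values are 19077, 26149, 6419, 13320 and 33050 (for y_scale, r_v, g_u, g_v and b_u respectively).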
Take advantage of the known sizes used by VP8LHistogramAdd() and
remove the loop for the remainder. The loop was being auto-vectorized,
making the code larger and slower than the vectorized C code.
For larger sizes the new code is ~3-4.5% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.
The LINE_SIZE==8 option is removed with this change. It had been set
to 16 for its entire life, and clang-16 was unrolling the LINE_SIZE==8
case by 2 anyway; they both profile similarly.
Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a
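The idea of dropping the remainder loop can be sketched in plain C: if every size the caller passes is a multiple of the block width, the block loop covers the whole array and no scalar tail is needed. `BLOCK` (a stand-in for the lanes of one SIMD register) and `AddVector()` are hypothetical; the real code uses SSE2/AVX2 intrinsics, and the assumption that sizes are multiples of `BLOCK` holds for the minimal size of 40 mentioned above but must be checked for any other caller.

```c
#include <assert.h>
#include <stdint.h>

/* Block width in 32-bit counters, e.g. the 4 lanes of a 128-bit register. */
#define BLOCK 4

/* b[i] += a[i]. Relies on size being a multiple of BLOCK, so there is no
 * remainder loop for the compiler to auto-vectorize separately. */
static void AddVector(const uint32_t* a, uint32_t* b, int size) {
  int i, j;
  assert(size % BLOCK == 0);
  for (i = 0; i < size; i += BLOCK) {
    for (j = 0; j < BLOCK; ++j) b[i + j] += a[i + j];
  }
}
```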
Take advantage of the known sizes used by VP8LHistogramAdd() and remove
the loop for the remainder. The loop was being auto-vectorized, making
the code larger and slower than the vectorized C code.
For larger sizes the new code is ~4-7% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.
The LINE_SIZE==8 option is removed with this change. It had been set to
16 for its entire life, and clang-16 was unrolling the LINE_SIZE==8 case
by 2 anyway; they both profile similarly.
Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e
Extend VP8EncIterator::i4_boundary_ by 3 bytes to avoid Intra4Preds_NEON
reading deeper into the struct (likely into padding) when top is
positioned at offset 29. When building with MSan, this data is memset to
prevent a warning caused by MSan's incorrect modeling of tbl
instructions.
Prior to:
169dfbf9 disable Intra4Preds_NEON
there was a mismatch in the preprocessor checks for enabling the
function in NEON and removing the C version; NEON used `BPS == 32` while
the C code was removed unconditionally when building for aarch64. This
patch also normalizes those checks to look for `BPS == 32` and `BPS !=
32` as appropriate.
Bug: b:366668849,webp:372109644
Change-Id: Ic9e6ad4b2d844cb446decd63aec0b2676a89c8d0
These appear as warnings under VS15 (16 and 17 are silent) and were
missed in:
a32b436b dsp/lossless*: use WEBP_RESTRICT qualifier
Change-Id: Ia7cffafc166f2da93b51714363558798cda71b67
* changes:
dsp/yuv*: use WEBP_RESTRICT qualifier
dsp/upsampling*: use WEBP_RESTRICT qualifier
dsp/rescaler*: use WEBP_RESTRICT qualifier
dsp/lossless*: use WEBP_RESTRICT qualifier
dsp/filters*: use WEBP_RESTRICT qualifier
dsp/enc*: use WEBP_RESTRICT qualifier
dsp/dec*: use WEBP_RESTRICT qualifier
dsp/cost*: use WEBP_RESTRICT qualifier
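For context, a restrict qualifier tells the compiler that two pointers never alias, which is what enables the better vectorization and fewer reloads described in these commits. The sketch below shows a typical way such a macro maps onto compiler-specific spellings; `MY_RESTRICT` and `AddPixels()` are hypothetical, and libwebp's own `WEBP_RESTRICT` definition may differ. Calling a restrict-qualified function with overlapping buffers is undefined behavior.

```c
#include <stdint.h>

/* C99 has `restrict`; GCC/Clang also accept `__restrict__` and MSVC
 * accepts `__restrict`, which also work in C++. */
#if defined(__GNUC__)
#define MY_RESTRICT __restrict__
#elif defined(_MSC_VER)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT
#endif

/* Because src and dst are promised not to overlap, the compiler can
 * vectorize this loop without runtime overlap checks and without
 * reloading src after each store to dst. */
static void AddPixels(const uint8_t* MY_RESTRICT src,
                      uint8_t* MY_RESTRICT dst, int len) {
  int i;
  for (i = 0; i < len; ++i) dst[i] = (uint8_t)(dst[i] + src[i]);
}
```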
The load of the `top` parameter may over read causing MSan errors:
==7373==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0xfff891d52ad4 in Intra4Preds_NEON src/dsp/enc_neon.c:1003:12
#1 0xfff892d87618 in MakeIntra4Preds src/enc/quant_enc.c:484:3
Bug: b:366668849
Change-Id: I29cf3b2f402ee79ea93c1ee2a4fdd95083aeed68
Better vectorization in the C code, fewer instructions / comparisons in
NEON, and fewer reloads in SSE2/SSE4 w/ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I07a7e36a2dce8632c71c0fbbeef94dc51453eaf7
Better vectorization in the C code, fewer instructions in NEON, and some
code reordering / better register usage in SSE2/SSE4 w/ndk
r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8
Some improvement in the C code. No changes in NEON or SSE2 w/ndk
r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2316122db893f48f0afda90a147c83cac7f07526
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
w/gcc-13
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
Better stack/register usage in SSE2/NEON code and improved vectorization
of the C code with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I32b53dd38bfc7e2231d875409e7dfda7c513cfb6
This allows for better vectorization of the C code, inlining of
TrueMotion_SSE2, better load usage in aarch64 and other minor
reordering with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0
A minor improvement for arm targets with ndk r27/gcc-13 in H/VFilter8 (a
couple fewer moves w/aarch64) and much better vectorization of
DitherCombine8x8_C in most targets.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I03e73e6d6404261bb8408a9ae76a4b6ef142f8f0
on SetResidualCoeffs_*. This results in some minor code reordering when
targeting armv7 with ndk r27 and other recent versions of clang. No
changes in the x86 compilations with clang-16 / gcc-13.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I7c3554ece848fafbc5ac9c4944f1dc85129f6fd8
Previously, histogram_symbols was converted to uint32_t and shifted
left by 8 into histogram_argb.
Using a uint32_t buffer from the start avoids copying and
converting the data.
Change-Id: I245003a6a0f048c31519afa25a600d4479e762e3
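A small sketch of why the two-step conversion is unnecessary: writing `index << 8` straight into a uint32_t buffer gives the same ARGB words as filling a uint16_t buffer and converting it afterwards. The shift places the low byte of the histogram index in the green channel and the high byte in the red channel, matching how the VP8L entropy image is read back as `(argb >> 8) & 0xffff`. Both function names are hypothetical, not libwebp's.

```c
#include <stdint.h>

/* Two-step path (hypothetical): uint16_t symbols converted and shifted
 * into a separate ARGB buffer. */
static void ConvertSymbols(const uint16_t* symbols, uint32_t* argb, int n) {
  int i;
  for (i = 0; i < n; ++i) argb[i] = (uint32_t)symbols[i] << 8;
}

/* Direct path (hypothetical): the index is stored in its final position,
 * low byte in green (bits 8-15), high byte in red (bits 16-23). */
static void StoreSymbol(uint32_t* argb, int pos, uint32_t histogram_index) {
  argb[pos] = histogram_index << 8;
}
```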
This is useful for an upcoming change that will improve compression.
It separates the residual computation from the best predictor
selection.
The only downside is that more memory is allocated: we had 2
histograms before and now have 14, but this is necessary for the
later change. Still, this is nothing compared to what is done
later in the pipeline in HistogramSetTotalSize, where the number of
histograms created is the number of pixels in the subsampled image.
Change-Id: If03501a26f00462dd1809daa6e9314abd180945d
WebPCleanupTransparentAreaLossless() was renamed to
WebPReplaceTransparentPixels() in:
55a080e5 Add WebPReplaceTransparentPixels() in dsp
Change-Id: I91e32574e6add2748c0655146f100eb2b40498b2
In practice, this can never happen because:
- 'streak' is at most as long as a histogram
- 'count' counts the number of streaks, each at least 1 long
'streak' and 'count' are therefore at most as big as the histogram
length, which is at most the maximum value of VP8LHistogramNumCodes,
i.e. 256 + 24 + (1 << 10) = 1304.
Change-Id: I31c8834543479c8a9260732313ea26b045519515
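The bound can be checked with a toy streak counter: streaks are maximal runs of equal values, they partition the histogram, and each has length at least 1, so both the longest streak and the number of streaks are at most the histogram length. `CountStreaks()` and `MAX_HISTO_LEN` are hypothetical names; libwebp's entropy code gathers its streak statistics differently.

```c
#include <assert.h>

/* Largest histogram length: 256 + 24 + (1 << 10). */
#define MAX_HISTO_LEN (256 + 24 + (1 << 10))

/* Returns the number of streaks (maximal runs of equal values) and
 * stores the length of the longest one in *longest. */
static int CountStreaks(const unsigned* histo, int len, int* longest) {
  int i, count = 0, streak = 0;
  *longest = 0;
  for (i = 0; i < len; ++i) {
    ++streak;
    if (i + 1 == len || histo[i + 1] != histo[i]) {
      ++count;
      if (streak > *longest) *longest = streak;
      streak = 0;
    }
  }
  /* Streaks partition the histogram and are non-empty, so both values
   * are bounded by len, itself at most MAX_HISTO_LEN. */
  assert(*longest <= len && count <= len);
  return count;
}
```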