Don't use `-fembed-bitcode`, fixes:
ld: warning: -bitcode_bundle is no longer supported and will be ignored
ld: -mllvm and -bitcode_bundle (Xcode setting ENABLE_BITCODE=YES) cannot
be used together
Change-Id: I4ead0fc71da39bb5ec92c1f5ba467b95ad8b7461
Take advantage of the known sizes used by VP8LHistogramAdd() and
remove loop for the remainder. The loop was being auto-vectorized making
the code larger and slower than the vectorized C code.
For larger sizes the new code is ~3-4.5% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.
The LINE_SIZE==8 option is removed with this change. It had been set
to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8
case by 2 in any case; they both profile similarly.
Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a
Take advantage of the known sizes used by VP8LHistogramAdd() and remove
loop for the remainder. The loop was being auto-vectorized making the
code larger and slower than the vectorized C code.
For larger sizes the new code is ~4-7% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.
The LINE_SIZE==8 option is removed with this change. It had been set to
16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case
by 2 in any case; they both profile similarly.
Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e
This change is the same as the one that introduced the options to
img2webp:
0825faa4 img2webp: add -sharp_yuv/-near_lossless
Change-Id: Id380d159299c38dd6440f833d487e00c0976afec
Extend VP8EncIterator::i4_boundary_ by 3 bytes to avoid Intra4Preds_NEON
reading deeper into the struct (likely padding) when top is positioned
at offset 29. This data is memset with MSan to prevent a warning due to
its incorrect modeling of tbl instructions.
Prior to:
169dfbf9 disable Intra4Preds_NEON
there was a mismatch in the preprocessor checks for enabling the
function in NEON and removing the C version; NEON used `BPS == 32` while
the C code was removed unconditionally when building for aarch64. This
patch also normalizes those checks to look for `BPS == 32` and `BPS !=
32` as appropriate.
Bug: b:366668849,webp:372109644
Change-Id: Ic9e6ad4b2d844cb446decd63aec0b2676a89c8d0
These appear as warnings under VS15 (16 and 17 are silent) and were
missed in:
a32b436b dsp/lossless*: use WEBP_RESTRICT qualifier
Change-Id: Ia7cffafc166f2da93b51714363558798cda71b67
* changes:
dsp/yuv*: use WEBP_RESTRICT qualifier
dsp/upsampling*: use WEBP_RESTRICT qualifier
dsp/rescaler*: use WEBP_RESTRICT qualifier
dsp/lossless*: use WEBP_RESTRICT qualifier
dsp/filters*: use WEBP_RESTRICT qualifier
dsp/enc*: use WEBP_RESTRICT qualifier
dsp/dec*: use WEBP_RESTRICT qualifier
dsp/cost*: use WEBP_RESTRICT qualifier
The load of the `top` parameter may over read causing MSan errors:
==7373==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0xfff891d52ad4 in Intra4Preds_NEON src/dsp/enc_neon.c:1003:12
#1 0xfff892d87618 in MakeIntra4Preds src/enc/quant_enc.c:484:3
Bug: b:366668849
Change-Id: I29cf3b2f402ee79ea93c1ee2a4fdd95083aeed68
Better vectorization in the C code, fewer instructions / comparisons in
NEON, and fewer reloads in SSE2/SSE4 w/ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I07a7e36a2dce8632c71c0fbbeef94dc51453eaf7
Better vectorization in the C code, fewer instructions in NEON, and some
code reordering / better register usage in SSE2/SSE4 w/ndk
r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8
Some improvement in the C code. No changes in NEON or SSE2 w/ndk
r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2316122db893f48f0afda90a147c83cac7f07526
lossless_enc: better vectorization, most benefits seen in AddVector/Eq
w/ndk r27/gcc-13/clang-16
lossless: minor reordering and some improvement to PredictorAdd5_SSE2
w/gcc-13
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7
Better stack/register usage in SSE2/NEON code and improved vectorization
of the C code with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I32b53dd38bfc7e2231d875409e7dfda7c513cfb6
This allows for better vectorization of the C code, inlining of
TrueMotion_SSE2, better load usage in aarch64 and other minor
reordering with ndk r27/gcc-13/clang-16.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0
A minor improvement for arm targets with ndk r27/gcc-13 in H/VFilter8 (a
couple fewer moves w/aarch64) and much better vectorization of
DitherCombine8x8_C in most targets.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I03e73e6d6404261bb8408a9ae76a4b6ef142f8f0
on SetResidualCoeffs_*. This results in some minor code reordering when
targeting arvm7 with ndk r27 and other recent versions of clang. No
changes in the x86 compilations with clang-16 / gcc-13.
This only affects non-vector pointers; any vector pointers are left as a
follow up.
Change-Id: I7c3554ece848fafbc5ac9c4944f1dc85129f6fd8
- use plural 'Notes' in the description of 'Background Color' to match
the formatting of the notes describing 'Disposal method';
- fix one unknown chunks link to contain both words to match others
+ add a couple of missing commas
These changes are based on editor changes in AUTH48:
https://datatracker.ietf.org/doc/draft-zern-webp/
Change-Id: Ibbed0459d42944099e295f492dc21bde4e107658
Use 'Chunk Size bytes - N' to avoid singular/plural confusion in the
case of 'Chunk Size - 1 bytes' case.
These changes are based on editor comments in AUTH48:
https://datatracker.ietf.org/doc/draft-zern-webp/
Change-Id: I898113033fd53d744fe9289f971887b8cfe278b9
histogram_symbols is converted to uint32_t and <<8 into
histogram_argb.
Using a uint32_t buffer from the start prevents copying and
converting the data.
Change-Id: I245003a6a0f048c31519afa25a600d4479e762e3
The wording might have implied that the library would optionally use
sharpyuv, though this option forces its use. The riskiness score
computed by SharpYuvEstimate420Risk() (extras/extras.c) is not used by
the library.
Change-Id: I56ea3262d7985215570809a4a629a2a7760e936a
This is useful for a forward change that will improve compression.
It splits the residual computation and the best predictor
selection.
The only downside is that more memory is allocated: we had 2
histograms before, we now have 14, but this is necessary for the
later change. Still, this is nothing compared to what is done
later in the pipeline in HistogramSetTotalSize where the number of
histograms created is the number of pixels in the subsampled image.
Change-Id: If03501a26f00462dd1809daa6e9314abd180945d
WebPCleanupTransparentAreaLossless() was renamed to
WebPReplaceTransparentPixels() in:
55a080e5 Add WebPReplaceTransparentPixels() in dsp
Change-Id: I91e32574e6add2748c0655146f100eb2b40498b2
In practice, this can never happen because:
- 'streak' is at most as long as a histogram
- 'count' counts the number of streaks
'streak' and 'count' are therefore at most as big as the histogram
length which is at most the max of VP8LHistogramNumCodes,
which is 256+24+(1<<10).
Change-Id: I31c8834543479c8a9260732313ea26b045519515
The default template for https://issues.webmproject.org/ is a public bug
report. Security issues can be reported securely using the 'Security
report' template.
Change-Id: Id489253c0def8a4d6d26327ea93ef4c796703ff1
The row parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently
num_rows is always equal to height.
Change-Id: Ie43dc5ef222e442ce8c92766da0b9824ccbca236
The inverse parameter became a constant in:
2102ccd update the Unfilter API in dsp to process one row independently
The row parameter to these functions is in a similar state; it will be
removed in a follow up.
Change-Id: I94cd8babe0e42474ff794ba5fa29dd48039de5f8
Replace vmovl_u8 -> s16 + signed vaddq with unsigned vaddw.
No change in assembly with clang-16 (armv7 & aarch64) and gcc-13
(aarch64). armv7 gcc-13 had kept the vmovl instructions, those are now
gone.
Change-Id: Ibb4fbdd5680d3e9dd06933c100528a6f363de472