Compare commits

...

44 Commits

Author SHA1 Message Date
Vincent Rabaud
a1ad3f1e37 Merge "Remove now unused ExtraCostCombined" into main 2025-04-01 00:28:47 -07:00
Vincent Rabaud
321561b41f Remove now unused ExtraCostCombined
Change-Id: Ic9d1ccf5b10fed67f836aa19fa0f84238acbf4c1
2025-03-29 23:34:20 +01:00
James Zern
e0ae21d231 WebPMemoryWriterClear: use WebPMemoryWriterInit
Removes some common code between the two functions.

Change-Id: If9f42e580e34dad63f3806750d9d7571941026b5
2025-03-28 12:37:24 -07:00
Vincent Rabaud
a4183d94c7 Remove the computation of ExtraCost when comparing histograms
Entropy clustering merges symbol histograms to reduce the overall
entropy. The cost of 2 added histograms is compared to the 2 costs
of the individual histograms and if it is smaller, a merge is done.

Except for some symbols (distance and length), the computed cost is
 the real final cost based on the histogram, and some constant cost
(independent from the probabilities of the symbols and hence the
merge) because the symbol is encode as Golomb.

This constant cost is useless and can be removed.

Change-Id: I6271e8c0e4111cdeff544cbdb7dec3c67be5309c
2025-03-28 15:00:41 +01:00
Vincent Rabaud
f2b3f52733 Get AVX2 into WebP lossless
Change-Id: Ifad3102c9f899a46401985515cd98f3f7a21887f
2025-03-28 11:44:03 +01:00
Vincent Rabaud
7c70ff7a3b Clean dsp/lossless includes
Change-Id: I47a405a9c402095b440404fe57ac08b5293ea71b
2025-03-25 12:38:00 +01:00
Vincent Rabaud
9dd5ae819b Use the full register in PredictorSub13_SSE2
No more than 15 registers are used at a time

Change-Id: I40f77d9df8500e5e0d52ff6b206d765e8be62ae1
2025-03-25 11:07:15 +01:00
James Zern
613be8fc61 Makefile.vc: add /MP to CFLAGS
This speeds up the batch rules by compiling source files in parallel.

Change-Id: If5076e9c245d82df957b05711a74e2569f4ba086
2025-03-17 16:33:51 -07:00
James Zern
1d86819f49 Merge changes I1437390a,I10a20de5,I1ac777d1 into main
* changes:
  pngdec.c: add support for 'eXIf' tag
  pngdec.c: support ImageMagick app1 exif text data
  pngdec.c: add missing #ifdef for png_get_iCCP
2025-03-06 14:00:07 -08:00
James Zern
743a5f092d enc_neon: enable vld1q_u8_x4 for clang & msvc
This restores the use of the function after
980b708e enc_neon: fix build w/aarch64 gcc < 9.4.0

The intrinsic was added to llvm for aarch64 in:
5e4ce1ae9dad Implement the newly added AArch64 ACLE functions for
             ld1/st1 with 2/3/4 vectors. The functions are like:
             vst1_s8_x2 ...
llvmorg-3.4.0-rc1~101
https://github.com/llvm/llvm-project/commit/5e4ce1ae9dad

Visual Studio 2019 and 2022 also support the function (2017 is still
disabled for this path due to it relying on arm64_neon.h).

Change-Id: I6ff10e22deb3968a48738a4458d2d3d55410b5ec
2025-03-05 16:56:20 -08:00
James Zern
565da14882 pngdec.c: add support for 'eXIf' tag
Test file created with exiftool 12.76:

```
exiftool test_app1_exif.png -exif:all \
  -exif:DocumentName=test_multi_exif.png -o test_multi_exif.png
```

Bug: webp:398066379
Change-Id: I1437390a70f5708421683eb69c588624bb376baa
2025-03-05 13:54:09 -08:00
James Zern
319860e919 pngdec.c: support ImageMagick app1 exif text data
Test file created with ImageMagick 6.9.13-12:

```
convert test_exif.png test_app1_exif.png
```

Bug: webp:398066379
Change-Id: I10a20de5699fabb0906045994d7d1f4b9e951973
2025-03-05 13:54:07 -08:00
James Zern
815fc1e110 pngdec.c: add missing #ifdef for png_get_iCCP
png_get_iCCP is an optional part of the API. Protect its usage with
PNG_iCCP_SUPPORTED.

Change-Id: I1ac777d1c2a200bb3e1303b3d095cc0d67633bd4
2025-03-05 13:54:04 -08:00
James Zern
980b708e2c enc_neon: fix build w/aarch64 gcc < 9.4.0
vld1q_u8_x4 was added for aarch64 in the gcc 9.4.0 release:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ChangeLog;h=7558c0a369ea8c74a2b9369049a2d1cc187dc050;hb=13c83c4cc679ad5383ed57f359e53e8d518b7842#l2100

fixes:
src/dsp/enc_neon.c: In function 'Intra4Preds_NEON':
src/dsp/enc_neon.c:974:37: warning: implicit declaration of function
  'vld1q_u8_x4'; did you mean 'vld1q_u8_x2'?
  [-Wimplicit-function-declaration]

Bug: webp:398288323
Change-Id: Ic6e408065a375c945cc8691bd16a9f5d5642cfa2
2025-02-27 19:07:50 -08:00
James Zern
73b728cbb9 cmake: bump minimum version to 3.16
This matches the current support matrix (from 2024-12-17) [1] and quiets
a warning from recent (3.31.5) versions of cmake:

CMake Deprecation Warning at CMakeLists.txt:12 (cmake_minimum_required):
  Compatibility with CMake < 3.10 will be removed from a future version
  of CMake.

Explicit setting of CMP0072 is also removed; it was added in 3.11.

[1]: https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md

Bug: webp:397130631
Change-Id: Ic844dadf983a82674990edbddbfc54329df12eb7
Fixed: webp:397130631
2025-02-20 12:32:08 -08:00
Vincent Rabaud
6a22b6709c Add a function to validate a WebPDecoderConfig
This echoes WebPValidateConfig for encoding.

Change-Id: Ib404d55c7af4d0755644879ec491e3998e6b5e8d
2025-01-30 10:10:08 +01:00
Vincent Rabaud
7ed2b10ef0 Use consistently signed stride types.
The stride can be negative when asked for a flipped image.

Change-Id: I049e8027c769186274a6a3049949f3fcaae7d2e9
2025-01-30 00:12:28 +01:00
Vincent Rabaud
654bfb040c Avoid nullptr arithmetic in VP8BitReaderSetBuffer
When start is nullptr, the IO is not used afterwards
anyway, so there is not risk.

Change-Id: I0a828aec85c6e228e95dfed4a40d348275a7c577
2025-01-30 00:12:15 +01:00
Vincent Rabaud
f8f2410710 Fix potential "divide by zero" in examples found by coverity
Change-Id: Ic41f9cb2ac24450986cd061db718953276eee080
2025-01-16 18:02:41 +01:00
James Zern
2af6c034ac libwebp-1.5.0
- 12/19/2024 version 1.5.0
   This is a binary compatible release.
   API changes:
     - `cross_color_transform_bits` added to WebPAuxStats
   * minor lossless encoder speed and compression improvements
   * lossless encoding does not use floats anymore
   * additional Arm optimizations for lossy & lossless + general code generation
     improvements
   * improvements to WASM performance (#643)
   * improvements and corrections in webp-container-spec.txt and
     webp-lossless-bitstream-spec.txt (#646, #355607636)
   * further security related hardening and increased fuzzing coverage w/fuzztest
     (oss-fuzz: #382816119, #70112, #70102, #69873, #69825, #69508, #69208)
   * miscellaneous warning, bug & build fixes (#499, #562, #381372617,
     #381109771, #42340561, #375011696, #372109644, chromium: #334120888)
   Tool updates:
     * gif2webp: add -sharp_yuv & -near_lossless
     * img2webp: add -exact & -noexact
     * exit codes normalized; running an example program with no
       arguments will output its help and exit with an error (#42340557,
       #381372617)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEaw5rcJdt4wPt8vYB+cPWvbgjK10FAmdk0soACgkQ+cPWvbgj
 K11UhhAAl5LtmIDz5uQE5ZlAADpIAuAC5nIikQUVY9up4RqAaw734atTh5JRzbpL
 QoQvAUPQ6YBdiH2GSF47THGHHQZfsV+f3yb0MICI3l6NOBJhHHFmG2Dt3IVVmO1l
 LZGM1CxaSZP7gpvSa/eNwvXEWxLezith7I3fyY0oIEf+JKdWan7uyeUPvc+iFrpo
 xTSpcAdWbHKGaC6zvH5gPJPlW64D2MZ31To+26s44uSgwpB6JrICXpxwn51cOClc
 1YzGJZ/aTQBphwY0W2yYFa5rBs6VxhCGHJAY5dSTmkeMUiQpIz0kugkKmyfVBmfN
 2tLC1suE0WgXzHUVwlZdorM5EIjXK0Orht/Fn5EphjmUXPr+2S+ENwiOnI2HPrEB
 Fn3Cy64uOHsuW58JVm+yUeNtPqB8uXunzQrteO7nd5aKXNxgth83QJwzv0T80tMI
 NltAfse+QPrbwA/GS250hh+8WfFzOr8i9W/3V0OYZXqLD/ooJA2hxy3gAU6Zr2qa
 GowRvbCZs18w1ormXoDEC3tBnBBPi8ktRfYd2wGHRl0VUFo1Ntyj+tr8NOuylpxO
 hE3WFn/Ao6Xs3WRSx1LppbPvWnH+j2UAm0QCeAcUN2A766XpKyupGDAg09fZUZe6
 korrahZni3I0uOpqyZX0W2FmCQYIIRHTwcCNLD/yTqlqhyiQtDg=
 =HZLr
 -----END PGP SIGNATURE-----

Merge tag 'v1.5.0'

libwebp-1.5.0

- 12/19/2024 version 1.5.0
  This is a binary compatible release.
  API changes:
    - `cross_color_transform_bits` added to WebPAuxStats
  * minor lossless encoder speed and compression improvements
  * lossless encoding does not use floats anymore
  * additional Arm optimizations for lossy & lossless + general code generation
    improvements
  * improvements to WASM performance (#643)
  * improvements and corrections in webp-container-spec.txt and
    webp-lossless-bitstream-spec.txt (#646, #355607636)
  * further security related hardening and increased fuzzing coverage w/fuzztest
    (oss-fuzz: #382816119, #70112, #70102, #69873, #69825, #69508, #69208)
  * miscellaneous warning, bug & build fixes (#499, #562, #381372617,
    #381109771, #42340561, #375011696, #372109644, chromium: #334120888)
  Tool updates:
    * gif2webp: add -sharp_yuv & -near_lossless
    * img2webp: add -exact & -noexact
    * exit codes normalized; running an example program with no
      arguments will output its help and exit with an error (#42340557,
      #381372617)

Bug: b:336795049,webp:380121350

* tag 'v1.5.0':
  update ChangeLog
  update NEWS
  tests/fuzzer/*: add missing <string_view> include
  fuzz_utils.cc: fix build error w/WEBP_REDUCE_SIZE
  mux_demux_api_fuzzer.cc: fix -Wshadow warning
  update ChangeLog
  update NEWS
  bump version to 1.5.0
  update AUTHORS

Change-Id: I076b197fac29230bc61bc5f06e950d83d058a737
2024-12-19 18:19:10 -08:00
James Zern
a4d7a71533 update ChangeLog
Bug: b:336795049,webp:380121350
Change-Id: I011bf6c44f89e475f58c6e96f5b68c6ed75a1e22
2024-12-19 17:17:50 -08:00
James Zern
c3d85ce4cf update NEWS
Bug: b:336795049,webp:380121350
Change-Id: Icb6f2f046647591a318f12853f265b2115060488
2024-12-19 12:32:05 -08:00
devtools-clrobot@google.com
ad14e811cf tests/fuzzer/*: add missing <string_view> include
Bug: webp:380121350
Change-Id: Ie0910165600317ed8c94305f0a793282e02e1c99
2024-12-19 12:31:52 -08:00
James Zern
74cd026edb fuzz_utils.cc: fix build error w/WEBP_REDUCE_SIZE
Correct (void) variable names.

Bug: webp:380121350
Change-Id: I3ce8a4d34f60f9ec0a467fcb5958b0b7a2edabc9
2024-12-17 09:58:20 -08:00
James Zern
a027aa93de mux_demux_api_fuzzer.cc: fix -Wshadow warning
`bool mux` -> `use_mux_api` to avoid conflicting with WebPMux variable.

Bug: webp:380121350
Change-Id: Ie3f8176efc296fae804c36ee0b27bf8e3034c6e8
2024-12-17 09:57:47 -08:00
James Zern
25e17c686f update ChangeLog
Bug: b:336795049,webp:380121350
Change-Id: I0ae2d1d3a812b6bc9c4cd889eae03308e344190b
2024-12-13 12:25:59 -08:00
James Zern
aa2684fccc update NEWS
Bug: b:336795049,webp:380121350
Change-Id: Ieb2c7961f6eef39813399855027d007e88ca7891
2024-12-13 09:27:03 -08:00
James Zern
369238461b bump version to 1.5.0
libwebp{,decoder} - 1.5.0
libwebp libtool - 8.10.1
libwebpdecoder libtool - 4.10.1

mux - 1.5.0
libtool - 4.1.1

demux - 1.5.0
libtool - 2.16.0

sharpyuv - 0.4.1
libtool - 1.1.1

Bug: b:336795049,webp:380121350
Change-Id: I53bdac2b0bd5ce30addf10e16776a16a07910e45
2024-12-12 17:43:51 -08:00
James Zern
ceea8ff6b3 update AUTHORS
Bug: b:336795049,webp:380121350
Change-Id: I45a2aff8d9f04d996e3391d111d18d0872f45b41
2024-12-12 17:43:41 -08:00
James Zern
e4f7a9f0c7 img2webp: add a warning for unused options
This may help prevent confusion when placing frame options after the
target frame.

Note for compatibility this isn't fatal, but the behavior may change in
the future.

Bug: webp:381372617
Change-Id: I9f3b51e60ff650ccc6fd29b8f5f607c3771a8a55
2024-12-10 17:32:54 -08:00
Vincent Rabaud
1b4c967fbb Merge "Properly check the data size against the end of the RIFF chunk" into main 2024-12-10 08:10:16 +00:00
Vincent Rabaud
9e5ecfaf00 Properly check the data size against the end of the RIFF chunk
Bug: oss-fuzz:382816119

Change-Id: I629870246d8f1bd7c6cb0d66e89018600cecee3a
2024-12-10 09:09:08 +01:00
James Zern
da0d9c7d4e examples: exit w/failure w/no args
cwebp, gif2webp, img2webp, vwebp and webpinfo are modified in this
change to align with the other examples. When given no arguments, the
examples print their help output and exit with failure.

Bug: webp:42340557, webp:381372617
Change-Id: Ifed4eb79e98233f7aa780c42e489636d0cf4a035
2024-12-06 14:53:06 -08:00
James Zern
fcff86c71b {gif,img}2webp: sync -m help w/cwebp
These were both missing the mention of the default value (4).

Bug: webm:381109771
Change-Id: Ibb0d822310af443c5ff5219fc0334008de0a0c60
2024-11-26 13:08:29 -08:00
James Zern
b76c4a8416 man/img2webp.1: sync -m text w/cwebp.1 & gif2webp.1
Bug: webp:381109771
Change-Id: I3df400305255ba74a913601cf7aa04f392814370
2024-11-26 13:03:20 -08:00
James Zern
306335198d muxread: fix reading of buffers > riff size
After:
  2c70ad76 muxread,CreateInternal: fix riff size checks (cl/200674839)

`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` (plus additional 1 byte
padding if needed). A later check included `CHUNK_HEADER_SIZE` before
capping the value of the size passed to `WebPMuxCreateInternal()`,
missing cases with a small amount of extra data after the RIFF chunk
(like a newline when the file is opened and saved in a text editor) and
setting size to an incorrect value, so larger sizes would also fail.

Another check of `riff_size < CHUNK_HEADER_SIZE` after the call to
`SizeWithPadding()` is removed because 1) it could not fail given
`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` to the value; and 2) it is
redundant as `size < RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE` is checked
earlier in the function.

Bug: webp:42340561
Change-Id: I58dc4f071b27c2841001b4012aabdb1869f64f97
2024-11-22 12:40:34 -08:00
James Zern
4c85d860ea yuv.h: update RGB<->YUV coefficients in comment
The values for the R/G/B floating point formulas resembled
https://fourcc.org/fccyvrgb.php and Video Demystified, but the fixed
point values are more closely aligned to rounded values from
https://en.wikipedia.org/wiki/YCbCr and BT.601.

The R/G/B formulas with the values prior to this change are added to
sharpyuv_csp.c as they align with the fixed values. The origin of those
coefficients is unclear. For consistency between library versions we'll
leave them as is.

Bug: webp:375011696
Change-Id: Id3f2a57530eee700cc52a899b32b25b5c015e89b
2024-11-21 16:21:45 -08:00
James Zern
0ab789e067 Merge changes I6dfedfd5,I2376e2dc into main
* changes:
  rework AddVectorEq_SSE2
  rework AddVector_SSE2
2024-11-15 02:58:10 +00:00
James Zern
0323645066 {ios,xcframework}build.sh: fix compilation w/Xcode 16
Don't use `-fembed-bitcode`, fixes:
ld: warning: -bitcode_bundle is no longer supported and will be ignored
ld: -mllvm and -bitcode_bundle (Xcode setting ENABLE_BITCODE=YES) cannot
    be used together

Change-Id: I4ead0fc71da39bb5ec92c1f5ba467b95ad8b7461
2024-11-14 20:26:57 +00:00
James Zern
61e2cfdadd rework AddVectorEq_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and
remove loop for the remainder. The loop was being auto-vectorized making
the code larger and slower than the vectorized C code.

For larger sizes the new code is ~3-4.5% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set
to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8
case by 2 in any case; they both profile similarly.

Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a
2024-11-14 12:21:39 -08:00
James Zern
7bda3deb89 rework AddVector_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and remove
loop for the remainder. The loop was being auto-vectorized making the
code larger and slower than the vectorized C code.

For larger sizes the new code is ~4-7% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set to
16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case
by 2 in any case; they both profile similarly.

Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e
2024-11-14 12:21:33 -08:00
Maryla
2ddaaf0aa5 Fix variable names in SharpYuvComputeConversionMatrix
Change-Id: Ia07e71aae42396100a4f50dc104e828239522d77
2024-11-07 09:37:40 +01:00
James Zern
a3ba6f19e9 Makefile.vc: fix gif2webp link error
Add missing dependency on libsharpyuv.

needed after:
f999d94f gif2webp: add -sharp_yuv/-near_lossless

Change-Id: I8bdd5c0fd4622f9c8ec6ffdf4ac11399f86350da
2024-11-06 10:14:05 -08:00
James Zern
f999d94f4a gif2webp: add -sharp_yuv/-near_lossless
This change is the same as the one that introduced the options to
img2webp:
0825faa4 img2webp: add -sharp_yuv/-near_lossless

Change-Id: Id380d159299c38dd6440f833d487e00c0976afec
2024-11-04 12:29:24 -08:00
67 changed files with 1870 additions and 285 deletions

View File

@ -11,11 +11,13 @@ Contributors:
- Christopher Degawa (ccom at randomderp dot com)
- Clement Courbet (courbet at google dot com)
- Djordje Pesut (djordje dot pesut at imgtec dot com)
- Frank (1433351828 at qq dot com)
- Frank Barchard (fbarchard at google dot com)
- Hui Su (huisu at google dot com)
- H. Vetinari (h dot vetinari at gmx dot com)
- Ilya Kurdyukov (jpegqs at gmail dot com)
- Ingvar Stepanyan (rreverser at google dot com)
- Istvan Stefan (Istvan dot Stefan at arm dot com)
- James Zern (jzern at google dot com)
- Jan Engelhardt (jengelh at medozas dot de)
- Jehan (jehan at girinstud dot io)
@ -62,6 +64,7 @@ Contributors:
- Vincent Rabaud (vrabaud at google dot com)
- Vlad Tsyrklevich (vtsyrklevich at chromium dot org)
- Wan-Teh Chang (wtc at google dot com)
- wrv (wrv at utexas dot edu)
- Yang Zhang (yang dot zhang at arm dot com)
- Yannis Guyon (yguyon at google dot com)
- Zhi An Ng (zhin at chromium dot org)

View File

@ -9,11 +9,7 @@
if(APPLE)
cmake_minimum_required(VERSION 3.17)
else()
cmake_minimum_required(VERSION 3.7)
endif()
if(POLICY CMP0072)
cmake_policy(SET CMP0072 NEW)
cmake_minimum_required(VERSION 3.16)
endif()
project(WebP C)
@ -567,7 +563,8 @@ if(WEBP_BUILD_GIF2WEBP)
add_executable(gif2webp ${GIF2WEBP_SRCS})
target_link_libraries(gif2webp exampleutil imageioutil webp libwebpmux
${WEBP_DEP_GIF_LIBRARIES})
target_include_directories(gif2webp PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src)
target_include_directories(gif2webp PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src
${CMAKE_CURRENT_SOURCE_DIR})
install(TARGETS gif2webp RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
endif()

138
ChangeLog
View File

@ -1,3 +1,141 @@
c3d85ce4 update NEWS
ad14e811 tests/fuzzer/*: add missing <string_view> include
74cd026e fuzz_utils.cc: fix build error w/WEBP_REDUCE_SIZE
a027aa93 mux_demux_api_fuzzer.cc: fix -Wshadow warning
25e17c68 update ChangeLog (tag: v1.5.0-rc1)
aa2684fc update NEWS
36923846 bump version to 1.5.0
ceea8ff6 update AUTHORS
e4f7a9f0 img2webp: add a warning for unused options
1b4c967f Merge "Properly check the data size against the end of the RIFF chunk" into main
9e5ecfaf Properly check the data size against the end of the RIFF chunk
da0d9c7d examples: exit w/failure w/no args
fcff86c7 {gif,img}2webp: sync -m help w/cwebp
b76c4a84 man/img2webp.1: sync -m text w/cwebp.1 & gif2webp.1
30633519 muxread: fix reading of buffers > riff size
4c85d860 yuv.h: update RGB<->YUV coefficients in comment
0ab789e0 Merge changes I6dfedfd5,I2376e2dc into main
03236450 {ios,xcframework}build.sh: fix compilation w/Xcode 16
61e2cfda rework AddVectorEq_SSE2
7bda3deb rework AddVector_SSE2
2ddaaf0a Fix variable names in SharpYuvComputeConversionMatrix
a3ba6f19 Makefile.vc: fix gif2webp link error
f999d94f gif2webp: add -sharp_yuv/-near_lossless
dfdcb7f9 Merge "lossless.h: fix function declaration mismatches" into main (tag: webp-rfc9649)
78ed6839 fix overread in Intra4Preds_NEON
d516a68e lossless.h: fix function declaration mismatches
87406904 Merge "Improve documentation of SharpYuvConversionMatrix." into main
fdb229ea Merge changes I07a7e36a,Ib29980f7,I2316122d,I2356e314,I32b53dd3, ... into main
0c3cd9cc Improve documentation of SharpYuvConversionMatrix.
169dfbf9 disable Intra4Preds_NEON
2dd5eb98 dsp/yuv*: use WEBP_RESTRICT qualifier
23bbafbe dsp/upsampling*: use WEBP_RESTRICT qualifier
35915b38 dsp/rescaler*: use WEBP_RESTRICT qualifier
a32b436b dsp/lossless*: use WEBP_RESTRICT qualifier
04d4b4f3 dsp/filters*: use WEBP_RESTRICT qualifier
b1cb37e6 dsp/enc*: use WEBP_RESTRICT qualifier
201894ef dsp/dec*: use WEBP_RESTRICT qualifier
02eac8a7 dsp/cost*: use WEBP_RESTRICT qualifier
84b118c9 Merge "webp-container-spec: normalize notes & unknown chunk link" into main
052cf42f webp-container-spec: normalize notes & unknown chunk link
220ee529 Search for best predictor transform bits
78619478 Try to reduce the sampling for the entropy image
14f09ab7 webp-container-spec: reorder chunk size - N text
a78c5356 Remove a useless malloc for entropy image
bc491763 Merge "Refactor predictor finding" into main
34f92238 man/{cwebp,img2webp}.1: rm 'if needed' from -sharp_yuv
367ca938 Refactor predictor finding
a582b53b webp-lossless-bitstream-spec: clarify some text
0fd25d84 Merge "anim_encode.c: fix function ref in comment" into main
f8882913 anim_encode.c: fix function ref in comment
40e4ca60 specs_generation.md: update kramdown command line
57883c78 img2webp: add -exact/-noexact per-frame options
1c8eba97 img2webp,cosmetics: add missing '.' spacers to help
2e81017c Convert predictor_enc.c to fixed point
94de6c7f Merge "Fix fuzztest link errors w/-DBUILD_SHARED_LIBS=1" into main
51d9832a Fix fuzztest link errors w/-DBUILD_SHARED_LIBS=1
7bcb36b8 Merge "Fix static overflow warning." into main
8e0cc14c Fix static overflow warning.
cea68462 README.md: add security report note
615e5874 Merge "make VP8LPredictor[01]_C() static" into main
233e86b9 Merge changes Ie43dc5ef,I94cd8bab into main
1a29fd2f make VP8LPredictor[01]_C() static
dd9d3770 Do*Filter_*: remove row & num_rows parameters
ab451a49 Do*Filter_C: remove dead 'inverse' code paths
f9a480f7 {TrueMotion,TM16}_NEON: remove zero extension
04834aca Merge changes I25c30a9e,I0a192fc6,I4cf89575 into main
39a602af webp-lossless-bitstream-spec: normalize predictor transform ref
f28c837d Merge "webp-container-spec: align anim pseudocode w/prose" into main
74be8e22 Fix implicit conversion issues
0c01db7c Merge "Increase the transform bits if possible." into main
f2d6dc1e Increase the transform bits if possible.
caa19e5b update link to issue tracker
c9dd9bd4 webp-container-spec: align anim pseudocode w/prose
8a7c8dc6 WASM: Enable VP8L_USE_FAST_LOAD
f0c53cd9 WASM: don't use USE_GENERIC_TREE
eef903d0 WASM: Enable 64-bit BITS caching
6296cc8d iterator_enc: make VP8IteratorReset() static
fbd93896 histogram_enc: make VP8LGetHistogramSize static
cc7ff545 cost_enc: make VP8CalculateLevelCosts[] static
4e2828ba vp8l_dec: make VP8LClear() static
d742b24a Intra16Preds_NEON: fix truemotion saturation
c7bb4cb5 Intra4Preds_NEON: fix truemotion saturation
952a989b Merge "Remove TODO now that log is using fixed point." into main
dde11574 Remove TODO now that log is using fixed point.
a1ca153d Fix hidden myerr in my_error_exit
3bd94202 Merge changes Iff6e47ed,I24c67cd5,Id781e761 into main
d27d246e Merge "Convert VP8LFastSLog2 to fixed point" into main
4838611f Disable msg_code use in fuzzing mode
314a142a Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm
3bfb05e3 Add AArch64 Neon implementation of Intra16Preds
baa93808 Add AArch64 Neon implementation of Intra4Preds
41a5e582 Fix errors when compiling code as C++
fb444b69 Convert VP8LFastSLog2 to fixed point
c1c89f51 Fix WEBP_NODISCARD comment and C++ version
66408c2c Switch the histogram_enc.h API to fixed point
ac1e410d Remove leftover tiff dep
b78d3957 Disable TIFF on fuzztest.
cff21a7d Do not build statically on oss-fuzz.
6853a8e5 Merge "Move more internal fuzzers to public." into main
9bc09db4 Merge "Convert VP8LFastLog2 to fixed point" into main
0a9f1c19 Convert VP8LFastLog2 to fixed point
db0cb9c2 Move more internal fuzzers to public.
ff2b5b15 Merge "advanced_api_fuzzer.cc: use crop dims in OOM check" into main
c4af79d0 Put 0 at the end of a palette and do not store it.
0ec80aef Delete last references to delta palettization
96d79f84 advanced_api_fuzzer.cc: use crop dims in OOM check
c35c7e02 Fix huffman fuzzer to not leak.
f2fe8dec Bump fuzztest dependency.
9ce982fd Fix fuzz tests to work on oss-fuzz
3ba8af1a Do not escape quotes anymore in build.sh
ea0e121b Allow centipede to be used as a fuzzing engine.
27731afd make VP8I4ModeOffsets & VP8MakeIntra4Preds static
ddd6245e oss-fuzz/build.sh: use heredoc for script creation
50074930 oss-fuzz/build.sh,cosmetics: fix indent
20e92f7d Limit the possible fuzz engines.
4f200de5 Switch public fuzz tests to fuzztest.
64186bb3 Add huffman_fuzzer to .gitignore
0905f61c Move build script from oss-fuzz repo to here.
e8678758 Fix link to Javascript documentation
5e5b8f0c Fix SSE2 Transform_AC3 function name
45129ee0 Revert "Check all the rows."
ee26766a Check all the rows.
7ec51c59 Increase the transform bits if possible.
3cd16fd3 Revert "Increase the transform bits if possible."
971a03d8 Increase the transform bits if possible.
1bf198a2 Allow transform_bits to be different during encoding.
1e462ca8 Define MAX_TRANSFORM_BITS according to the specification.
64d1ec23 Use (MIN/NUM)_(TRANSFORM/HUFFMAN)_BITS where appropriate
a90160e1 Refactor histograms in predictors.
a7aa7525 Fix some function declarations
68ff4e1e Merge "jpegdec: add a hint for EOF/READ errors" into main
79e7968a jpegdec: add a hint for EOF/READ errors
d33455cd man/*: s/BUGS/REPORTING BUGS/
a67ff735 normalize example exit status
edc28909 upsampling_{neon,sse41}: fix int sanitizer warning
3cada4ce ImgIoUtilReadFile: check ftell() return
dc950585 Merge tag 'v1.4.0'
845d5476 update ChangeLog (tag: v1.4.0, origin/1.4.0)
8a6a55bb update NEWS
cf7c5a5d provide a way to opt-out/override WEBP_NODISCARD
cc34288a update ChangeLog (tag: v1.4.0-rc1)

View File

@ -32,7 +32,7 @@ PLATFORM_LDFLAGS = /SAFESEH
NOLOGO = /nologo
CCNODBG = cl.exe $(NOLOGO) /O2 /DNDEBUG
CCDEBUG = cl.exe $(NOLOGO) /Od /Zi /D_DEBUG /RTC1
CFLAGS = /I. /Isrc $(NOLOGO) /W3 /EHsc /c
CFLAGS = /I. /Isrc $(NOLOGO) /MP /W3 /EHsc /c
CFLAGS = $(CFLAGS) /DWIN32 /D_CRT_SECURE_NO_WARNINGS /DWIN32_LEAN_AND_MEAN
LDFLAGS = /LARGEADDRESSAWARE /MANIFEST:EMBED /NXCOMPAT /DYNAMICBASE
LDFLAGS = $(LDFLAGS) $(PLATFORM_LDFLAGS)
@ -231,6 +231,7 @@ DSP_DEC_OBJS = \
$(DIROBJ)\dsp\lossless_neon.obj \
$(DIROBJ)\dsp\lossless_sse2.obj \
$(DIROBJ)\dsp\lossless_sse41.obj \
$(DIROBJ)\dsp\lossless_avx2.obj \
$(DIROBJ)\dsp\rescaler.obj \
$(DIROBJ)\dsp\rescaler_mips32.obj \
$(DIROBJ)\dsp\rescaler_mips_dsp_r2.obj \
@ -270,6 +271,7 @@ DSP_ENC_OBJS = \
$(DIROBJ)\dsp\lossless_enc_neon.obj \
$(DIROBJ)\dsp\lossless_enc_sse2.obj \
$(DIROBJ)\dsp\lossless_enc_sse41.obj \
$(DIROBJ)\dsp\lossless_enc_avx2.obj \
$(DIROBJ)\dsp\ssim.obj \
$(DIROBJ)\dsp\ssim_sse2.obj \
@ -393,7 +395,7 @@ $(DIRBIN)\dwebp.exe: $(IMAGEIO_UTIL_OBJS)
$(DIRBIN)\dwebp.exe: $(LIBWEBPDEMUX)
$(DIRBIN)\gif2webp.exe: $(DIROBJ)\examples\gif2webp.obj $(EX_GIF_DEC_OBJS)
$(DIRBIN)\gif2webp.exe: $(EX_UTIL_OBJS) $(IMAGEIO_UTIL_OBJS) $(LIBWEBPMUX)
$(DIRBIN)\gif2webp.exe: $(LIBWEBP)
$(DIRBIN)\gif2webp.exe: $(LIBWEBP) $(LIBSHARPYUV)
$(DIRBIN)\vwebp.exe: $(DIROBJ)\examples\vwebp.obj $(EX_UTIL_OBJS)
$(DIRBIN)\vwebp.exe: $(IMAGEIO_UTIL_OBJS) $(LIBWEBPDEMUX) $(LIBWEBP)
$(DIRBIN)\vwebp_sdl.exe: $(DIROBJ)\extras\vwebp_sdl.obj

22
NEWS
View File

@ -1,3 +1,25 @@
- 12/19/2024 version 1.5.0
This is a binary compatible release.
API changes:
- `cross_color_transform_bits` added to WebPAuxStats
* minor lossless encoder speed and compression improvements
* lossless encoding does not use floats anymore
* additional Arm optimizations for lossy & lossless + general code generation
improvements
* improvements to WASM performance (#643)
* improvements and corrections in webp-container-spec.txt and
webp-lossless-bitstream-spec.txt (#646, #355607636)
* further security related hardening and increased fuzzing coverage w/fuzztest
(oss-fuzz: #382816119, #70112, #70102, #69873, #69825, #69508, #69208)
* miscellaneous warning, bug & build fixes (#499, #562, #381372617,
#381109771, #42340561, #375011696, #372109644, chromium: #334120888)
Tool updates:
* gif2webp: add -sharp_yuv & -near_lossless
* img2webp: add -exact & -noexact
* exit codes normalized; running an example program with no
arguments will output its help and exit with an error (#42340557,
#381372617)
- 4/12/2024: version 1.4.0
This is a binary compatible release.
* API changes:

View File

@ -7,7 +7,7 @@
\__\__/\____/\_____/__/ ____ ___
/ _/ / \ \ / _ \/ _/
/ \_/ / / \ \ __/ \__
\____/____/\_____/_____/____/v1.4.0
\____/____/\_____/_____/____/v1.5.0
```
WebP codec is a library to encode and decode images in WebP format. This package

View File

@ -94,6 +94,9 @@
/* Set to 1 if SSE4.1 is supported */
#cmakedefine WEBP_HAVE_SSE41 1
/* Set to 1 if AVX2 is supported */
#cmakedefine WEBP_HAVE_AVX2 1
/* Set to 1 if TIFF library is installed */
#cmakedefine WEBP_HAVE_TIFF 1

View File

@ -38,9 +38,9 @@ function(webp_check_compiler_flag WEBP_SIMD_FLAG ENABLE_SIMD)
endfunction()
# those are included in the names of WEBP_USE_* in c++ code.
set(WEBP_SIMD_FLAGS "SSE41;SSE2;MIPS32;MIPS_DSP_R2;NEON;MSA")
set(WEBP_SIMD_FLAGS "AVX2;SSE41;SSE2;MIPS32;MIPS_DSP_R2;NEON;MSA")
set(WEBP_SIMD_FILE_EXTENSIONS
"_sse41.c;_sse2.c;_mips32.c;_mips_dsp_r2.c;_neon.c;_msa.c")
"_avx2.c;_sse41.c;_sse2.c;_mips32.c;_mips_dsp_r2.c;_neon.c;_msa.c")
if(MSVC AND CMAKE_C_COMPILER_ID STREQUAL "MSVC")
# With at least Visual Studio 12 (2013)+ /arch is not necessary to build SSE2
# or SSE4 code unless a lesser /arch is forced. MSVC does not have a SSE4
@ -50,12 +50,12 @@ if(MSVC AND CMAKE_C_COMPILER_ID STREQUAL "MSVC")
if(MSVC_VERSION GREATER_EQUAL 1800 AND NOT CMAKE_C_FLAGS MATCHES "/arch:")
set(SIMD_ENABLE_FLAGS)
else()
set(SIMD_ENABLE_FLAGS "/arch:AVX;/arch:SSE2;;;;")
set(SIMD_ENABLE_FLAGS "/arch:AVX2;/arch:AVX;/arch:SSE2;;;;")
endif()
set(SIMD_DISABLE_FLAGS)
else()
set(SIMD_ENABLE_FLAGS "-msse4.1;-msse2;-mips32;-mdspr2;-mfpu=neon;-mmsa")
set(SIMD_DISABLE_FLAGS "-mno-sse4.1;-mno-sse2;;-mno-dspr2;;-mno-msa")
set(SIMD_ENABLE_FLAGS "-mavx2;-msse4.1;-msse2;-mips32;-mdspr2;-mfpu=neon;-mmsa")
set(SIMD_DISABLE_FLAGS "-mno-avx2;-mno-sse4.1;-mno-sse2;;-mno-dspr2;;-mno-msa")
endif()
set(WEBP_SIMD_FILES_TO_INCLUDE)

View File

@ -1,4 +1,4 @@
AC_INIT([libwebp], [1.4.0],
AC_INIT([libwebp], [1.5.0],
[https://issues.webmproject.org],,
[https://developers.google.com/speed/webp])
AC_CANONICAL_HOST
@ -161,6 +161,25 @@ AS_IF([test "$GCC" = "yes" ], [
AC_SUBST([AM_CFLAGS])
dnl === Check for machine specific flags
AC_ARG_ENABLE([avx2],
AS_HELP_STRING([--disable-avx2],
[Disable detection of AVX2 support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_avx2" != "xno" -a "x$enable_sse4_1" != "xno"
-a "x$enable_sse2" != "xno"], [
AVX2_FLAGS="$INTRINSICS_CFLAGS $AVX2_FLAGS"
TEST_AND_ADD_CFLAGS([AVX2_FLAGS], [-mavx2])
AS_IF([test -n "$AVX2_FLAGS"], [
SAVED_CFLAGS=$CFLAGS
CFLAGS="$CFLAGS $AVX2_FLAGS"
AC_CHECK_HEADER([immintrin.h],
[AC_DEFINE(WEBP_HAVE_AVX2, [1],
[Set to 1 if AVX2 is supported])],
[AVX2_FLAGS=""])
CFLAGS=$SAVED_CFLAGS])
AC_SUBST([AVX2_FLAGS])])
AC_ARG_ENABLE([sse4.1],
AS_HELP_STRING([--disable-sse4.1],
[Disable detection of SSE4.1 support

View File

@ -324,7 +324,7 @@ Per-frame options (only used for subsequent images input):
-lossless ............ use lossless mode (default)
-lossy ............... use lossy mode
-q <float> ........... quality
-m <int> ............. method to use
-m <int> ............. compression method (0=fast, 6=slowest), default=4
-exact, -noexact ..... preserve or alter RGB values in transparent area
(default: -noexact, may cause artifacts
with lossy animations)
@ -354,8 +354,12 @@ Options:
-lossy ................. encode image using lossy compression
-mixed ................. for each frame in the image, pick lossy
or lossless compression heuristically
-near_lossless <int> ... use near-lossless image preprocessing
(0..100=off), default=100
-sharp_yuv ............. use sharper (and slower) RGB->YUV conversion
(lossy only)
-q <float> ............. quality factor (0:small..100:big)
-m <int> ............... compression method (0=fast, 6=slowest)
-m <int> ............... compression method (0=fast, 6=slowest), default=4
-min_size .............. minimize output size (default:off)
lossless compression by default; can be
combined with -q, -m, -lossy or -mixed

View File

@ -67,7 +67,7 @@ dwebp_LDADD += ../src/libwebp.la
dwebp_LDADD +=$(PNG_LIBS) $(JPEG_LIBS)
gif2webp_SOURCES = gif2webp.c gifdec.c gifdec.h
gif2webp_CPPFLAGS = $(AM_CPPFLAGS) $(GIF_INCLUDES)
gif2webp_CPPFLAGS = $(AM_CPPFLAGS) $(GIF_INCLUDES) -I$(top_srcdir)
gif2webp_LDADD =
gif2webp_LDADD += libexample_util.la
gif2webp_LDADD += ../imageio/libimageio_util.la

View File

@ -771,6 +771,7 @@ void GetDiffAndPSNR(const uint8_t rgba1[], const uint8_t rgba2[],
*psnr = 99.; // PSNR when images are identical.
} else {
sse /= stride * height;
assert(sse != 0.0);
*psnr = 4.3429448 * log(255. * 255. / sse);
}
}

View File

@ -698,7 +698,7 @@ int main(int argc, const char* argv[]) {
if (argc == 1) {
HelpShort();
FREE_WARGV_AND_RETURN(EXIT_SUCCESS);
FREE_WARGV_AND_RETURN(EXIT_FAILURE);
}
for (c = 1; c < argc; ++c) {

View File

@ -28,6 +28,7 @@
#endif
#include <gif_lib.h>
#include "sharpyuv/sharpyuv.h"
#include "webp/encode.h"
#include "webp/mux.h"
#include "../examples/example_util.h"
@ -70,8 +71,14 @@ static void Help(void) {
printf(" -lossy ................. encode image using lossy compression\n");
printf(" -mixed ................. for each frame in the image, pick lossy\n"
" or lossless compression heuristically\n");
printf(" -near_lossless <int> ... use near-lossless image preprocessing\n"
" (0..100=off), default=100\n");
printf(" -sharp_yuv ............. use sharper (and slower) RGB->YUV "
"conversion\n"
" (lossy only)\n");
printf(" -q <float> ............. quality factor (0:small..100:big)\n");
printf(" -m <int> ............... compression method (0=fast, 6=slowest)\n");
printf(" -m <int> ............... compression method (0=fast, 6=slowest), "
"default=4\n");
printf(" -min_size .............. minimize output size (default:off)\n"
" lossless compression by default; can be\n"
" combined with -q, -m, -lossy or -mixed\n"
@ -151,7 +158,7 @@ int main(int argc, const char* argv[]) {
if (argc == 1) {
Help();
FREE_WARGV_AND_RETURN(EXIT_SUCCESS);
FREE_WARGV_AND_RETURN(EXIT_FAILURE);
}
for (c = 1; c < argc; ++c) {
@ -166,6 +173,10 @@ int main(int argc, const char* argv[]) {
} else if (!strcmp(argv[c], "-mixed")) {
enc_options.allow_mixed = 1;
config.lossless = 0;
} else if (!strcmp(argv[c], "-near_lossless") && c < argc - 1) {
config.near_lossless = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-sharp_yuv")) {
config.use_sharp_yuv = 1;
} else if (!strcmp(argv[c], "-loop_compatibility")) {
loop_compatibility = 1;
} else if (!strcmp(argv[c], "-q") && c < argc - 1) {
@ -226,10 +237,13 @@ int main(int argc, const char* argv[]) {
} else if (!strcmp(argv[c], "-version")) {
const int enc_version = WebPGetEncoderVersion();
const int mux_version = WebPGetMuxVersion();
const int sharpyuv_version = SharpYuvGetVersion();
printf("WebP Encoder version: %d.%d.%d\nWebP Mux version: %d.%d.%d\n",
(enc_version >> 16) & 0xff, (enc_version >> 8) & 0xff,
enc_version & 0xff, (mux_version >> 16) & 0xff,
(mux_version >> 8) & 0xff, mux_version & 0xff);
printf("libsharpyuv: %d.%d.%d\n", (sharpyuv_version >> 24) & 0xff,
(sharpyuv_version >> 16) & 0xffff, sharpyuv_version & 0xff);
FREE_WARGV_AND_RETURN(EXIT_SUCCESS);
} else if (!strcmp(argv[c], "-quiet")) {
quiet = 1;

View File

@ -62,7 +62,8 @@ static void Help(void) {
printf(" -lossless ............ use lossless mode (default)\n");
printf(" -lossy ............... use lossy mode\n");
printf(" -q <float> ........... quality\n");
printf(" -m <int> ............. method to use\n");
printf(" -m <int> ............. compression method (0=fast, 6=slowest), "
"default=4\n");
printf(" -exact, -noexact ..... preserve or alter RGB values in transparent "
"area\n"
" (default: -noexact, may cause artifacts\n"
@ -150,6 +151,7 @@ int main(int argc, const char* argv[]) {
WebPData webp_data;
int c;
int have_input = 0;
int last_input_index = 0;
CommandLineArguments cmd_args;
int ok;
@ -228,6 +230,8 @@ int main(int argc, const char* argv[]) {
}
if (!have_input) {
fprintf(stderr, "No input file(s) for generating animation!\n");
ok = 0;
Help();
goto End;
}
@ -276,6 +280,7 @@ int main(int argc, const char* argv[]) {
// read next input image
pic.use_argb = 1;
ok = ReadImage((const char*)GET_WARGV_SHIFTED(argv, c), &pic);
last_input_index = c;
if (!ok) goto End;
if (enc == NULL) {
@ -314,6 +319,13 @@ int main(int argc, const char* argv[]) {
++pic_num;
}
for (c = last_input_index + 1; c < argc; ++c) {
if (argv[c] != NULL) {
fprintf(stderr, "Warning: unused option [%s]!"
" Frame options go before the input frame.\n", argv[c]);
}
}
// add a last fake frame to signal the last duration
ok = ok && WebPAnimEncoderAdd(enc, NULL, timestamp_ms, NULL);
ok = ok && WebPAnimEncoderAssemble(enc, &webp_data);

View File

@ -568,7 +568,7 @@ int main(int argc, char* argv[]) {
if (kParams.file_name == NULL) {
printf("missing input file!!\n");
Help();
FREE_WARGV_AND_RETURN(EXIT_SUCCESS);
FREE_WARGV_AND_RETURN(EXIT_FAILURE);
}
if (!ImgIoUtilReadFile(kParams.file_name,

View File

@ -1132,7 +1132,7 @@ int main(int argc, const char* argv[]) {
if (argc == 1) {
Help();
FREE_WARGV_AND_RETURN(EXIT_SUCCESS);
FREE_WARGV_AND_RETURN(EXIT_FAILURE);
}
// Parse command-line input.

View File

@ -24,7 +24,7 @@
#include "webp/types.h"
#define XTRA_MAJ_VERSION 1
#define XTRA_MIN_VERSION 4
#define XTRA_MIN_VERSION 5
#define XTRA_REV_VERSION 0
//------------------------------------------------------------------------------

View File

@ -57,6 +57,12 @@ int main(int argc, char* argv[]) {
INIT_WARGV(argc, argv);
if (argc == 1) {
fprintf(stderr, "Usage: %s [-h] image.webp [more_files.webp...]\n",
argv[0]);
goto Error;
}
for (c = 1; c < argc; ++c) {
const char* file = NULL;
const uint8_t* webp = NULL;

View File

@ -139,6 +139,8 @@ static const struct {
{ "Raw profile type xmp", ProcessRawProfile, METADATA_OFFSET(xmp) },
// Exiftool puts exif data in APP1 chunk, too.
{ "Raw profile type APP1", ProcessRawProfile, METADATA_OFFSET(exif) },
// ImageMagick uses lowercase app1.
{ "Raw profile type app1", ProcessRawProfile, METADATA_OFFSET(exif) },
// XMP Specification Part 3, Section 3 #PNG
{ "XML:com.adobe.xmp", MetadataCopy, METADATA_OFFSET(xmp) },
{ NULL, NULL, 0 },
@ -159,6 +161,20 @@ static int ExtractMetadataFromPNG(png_structp png,
png_textp text = NULL;
const png_uint_32 num = png_get_text(png, info, &text, NULL);
png_uint_32 i;
#ifdef PNG_eXIf_SUPPORTED
// Look for an 'eXIf' tag. Preference is given to this tag as it's newer
// than the TextualData tags.
{
png_bytep exif;
png_uint_32 len;
if (png_get_eXIf_1(png, info, &len, &exif) == PNG_INFO_eXIf) {
if (!MetadataCopy((const char*)exif, len, &metadata->exif)) return 0;
}
}
#endif // PNG_eXIf_SUPPORTED
// Look for EXIF / XMP metadata.
for (i = 0; i < num; ++i, ++text) {
int j;
@ -192,6 +208,7 @@ static int ExtractMetadataFromPNG(png_structp png,
}
}
}
#ifdef PNG_iCCP_SUPPORTED
// Look for an ICC profile.
{
png_charp name;
@ -208,6 +225,7 @@ static int ExtractMetadataFromPNG(png_structp png,
if (!MetadataCopy((const char*)profile, len, &metadata->iccp)) return 0;
}
}
#endif // PNG_iCCP_SUPPORTED
}
return 1;
}

View File

@ -53,7 +53,7 @@ DEMUXLIBLIST=''
if [[ -z "${SDK}" ]]; then
echo "iOS SDK not available"
exit 1
elif [[ ${SDK%%.*} -gt 8 ]]; then
elif [[ ${SDK%%.*} -gt 8 && "${XCODE%%.*}" -lt 16 ]]; then
EXTRA_CFLAGS="-fembed-bitcode"
elif [[ ${SDK%%.*} -le 6 ]]; then
echo "You need iOS SDK version 6.0 or above"

View File

@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*-
.TH GIF2WEBP 1 "July 18, 2024"
.TH GIF2WEBP 1 "November 4, 2024"
.SH NAME
gif2webp \- Convert a GIF image to WebP
.SH SYNOPSIS
@ -39,6 +39,18 @@ Encode the image using lossy compression.
Mixed compression mode: optimize compression of the image by picking either
lossy or lossless compression for each frame heuristically.
.TP
.BI \-near_lossless " int
Specify the level of near\-lossless image preprocessing. This option adjusts
pixel values to help compressibility, but has minimal impact on the visual
quality. It triggers lossless compression mode automatically. The range is 0
(maximum preprocessing) to 100 (no preprocessing, the default). The typical
value is around 60. Note that lossy with \fB\-q 100\fP can at times yield
better results.
.TP
.B \-sharp_yuv
Use more accurate and sharper RGB->YUV conversion. Note that this process is
slower than the default 'fast' RGB->YUV conversion.
.TP
.BI \-q " float
Specify the compression factor for RGB channels between 0 and 100. The default
is 75.

View File

@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*-
.TH IMG2WEBP 1 "September 17, 2024"
.TH IMG2WEBP 1 "November 26, 2024"
.SH NAME
img2webp \- create animated WebP file from a sequence of input images.
.SH SYNOPSIS
@ -88,6 +88,10 @@ Specify the compression factor between 0 and 100. The default is 75.
Specify the compression method to use. This parameter controls the
trade off between encoding speed and the compressed file size and quality.
Possible values range from 0 to 6. Default value is 4.
When higher values are used, the encoder will spend more time inspecting
additional encoding possibilities and decide on the quality gain.
Lower value can result in faster processing time at the expense of
larger file size and lower compression quality.
.TP
.B \-exact, \-noexact
Preserve or alter RGB values in transparent area. The default is

View File

@ -33,7 +33,7 @@ libsharpyuv_la_SOURCES += sharpyuv_gamma.c sharpyuv_gamma.h
libsharpyuv_la_SOURCES += sharpyuv.c sharpyuv.h
libsharpyuv_la_CPPFLAGS = $(AM_CPPFLAGS)
libsharpyuv_la_LDFLAGS = -no-undefined -version-info 1:0:1 -lm
libsharpyuv_la_LDFLAGS = -no-undefined -version-info 1:1:1 -lm
libsharpyuv_la_LIBADD =
libsharpyuv_la_LIBADD += libsharpyuv_sse2.la
libsharpyuv_la_LIBADD += libsharpyuv_neon.la

View File

@ -6,8 +6,8 @@
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
VS_VERSION_INFO VERSIONINFO
FILEVERSION 0,0,4,0
PRODUCTVERSION 0,0,4,0
FILEVERSION 0,0,4,1
PRODUCTVERSION 0,0,4,1
FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
FILEFLAGS 0x1L
@ -24,12 +24,12 @@ BEGIN
BEGIN
VALUE "CompanyName", "Google, Inc."
VALUE "FileDescription", "libsharpyuv DLL"
VALUE "FileVersion", "0.4.0"
VALUE "FileVersion", "0.4.1"
VALUE "InternalName", "libsharpyuv.dll"
VALUE "LegalCopyright", "Copyright (C) 2024"
VALUE "OriginalFilename", "libsharpyuv.dll"
VALUE "ProductName", "SharpYuv Library"
VALUE "ProductVersion", "0.4.0"
VALUE "ProductVersion", "0.4.1"
END
END
BLOCK "VarFileInfo"

View File

@ -52,7 +52,7 @@ extern "C" {
// SharpYUV API version following the convention from semver.org
#define SHARPYUV_VERSION_MAJOR 0
#define SHARPYUV_VERSION_MINOR 4
#define SHARPYUV_VERSION_PATCH 0
#define SHARPYUV_VERSION_PATCH 1
// Version as a uint32_t. The major number is the high 8 bits.
// The minor number is the middle 8 bits. The patch number is the low 16 bits.
#define SHARPYUV_MAKE_VERSION(MAJOR, MINOR, PATCH) \

View File

@ -22,16 +22,16 @@ void SharpYuvComputeConversionMatrix(const SharpYuvColorSpace* yuv_color_space,
const float kr = yuv_color_space->kr;
const float kb = yuv_color_space->kb;
const float kg = 1.0f - kr - kb;
const float cr = 0.5f / (1.0f - kb);
const float cb = 0.5f / (1.0f - kr);
const float cb = 0.5f / (1.0f - kb);
const float cr = 0.5f / (1.0f - kr);
const int shift = yuv_color_space->bit_depth - 8;
const float denom = (float)((1 << yuv_color_space->bit_depth) - 1);
float scale_y = 1.0f;
float add_y = 0.0f;
float scale_u = cr;
float scale_v = cb;
float scale_u = cb;
float scale_v = cr;
float add_uv = (float)(128 << shift);
assert(yuv_color_space->bit_depth >= 8);
@ -60,6 +60,10 @@ void SharpYuvComputeConversionMatrix(const SharpYuvColorSpace* yuv_color_space,
// Matrices are in YUV_FIX fixed point precision.
// WebP's matrix, similar but not identical to kRec601LimitedMatrix
// Derived using the following formulas:
// Y = 0.2569 * R + 0.5044 * G + 0.0979 * B + 16
// U = -0.1483 * R - 0.2911 * G + 0.4394 * B + 128
// V = 0.4394 * R - 0.3679 * G - 0.0715 * B + 128
static const SharpYuvConversionMatrix kWebpMatrix = {
{16839, 33059, 6420, 16 << 16},
{-9719, -19081, 28800, 128 << 16},

View File

@ -36,7 +36,7 @@ libwebp_la_LIBADD += utils/libwebputils.la
# other than the ones listed on the command line, i.e., after linking, it will
# not have unresolved symbols. Some platforms (Windows among them) require all
# symbols in shared libraries to be resolved at library creation.
libwebp_la_LDFLAGS = -no-undefined -version-info 8:9:1
libwebp_la_LDFLAGS = -no-undefined -version-info 8:10:1
libwebpincludedir = $(includedir)/webp
pkgconfig_DATA = libwebp.pc
@ -48,7 +48,7 @@ if BUILD_LIBWEBPDECODER
libwebpdecoder_la_LIBADD += dsp/libwebpdspdecode.la
libwebpdecoder_la_LIBADD += utils/libwebputilsdecode.la
libwebpdecoder_la_LDFLAGS = -no-undefined -version-info 4:9:1
libwebpdecoder_la_LDFLAGS = -no-undefined -version-info 4:10:1
pkgconfig_DATA += libwebpdecoder.pc
endif

View File

@ -26,10 +26,9 @@ static const uint8_t kModeBpp[MODE_LAST] = {
4, 4, 4, 2, // pre-multiplied modes
1, 1 };
// Check that webp_csp_mode is within the bounds of WEBP_CSP_MODE.
// Convert to an integer to handle both the unsigned/signed enum cases
// without the need for casting to remove type limit warnings.
static int IsValidColorspace(int webp_csp_mode) {
int IsValidColorspace(int webp_csp_mode) {
return (webp_csp_mode >= MODE_RGB && webp_csp_mode < MODE_LAST);
}

View File

@ -51,4 +51,7 @@ enum { MB_FEATURE_TREE_PROBS = 3,
NUM_PROBAS = 11
};
// Check that webp_csp_mode is within the bounds of WEBP_CSP_MODE.
int IsValidColorspace(int webp_csp_mode);
#endif // WEBP_DEC_COMMON_DEC_H_

View File

@ -12,7 +12,9 @@
// Author: Skal (pascal.massimino@gmail.com)
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include "src/dec/vp8i_dec.h"
#include "src/dec/webpi_dec.h"
#include "src/dsp/dsp.h"
@ -25,9 +27,9 @@
static int EmitYUV(const VP8Io* const io, WebPDecParams* const p) {
WebPDecBuffer* output = p->output;
const WebPYUVABuffer* const buf = &output->u.YUVA;
uint8_t* const y_dst = buf->y + (size_t)io->mb_y * buf->y_stride;
uint8_t* const u_dst = buf->u + (size_t)(io->mb_y >> 1) * buf->u_stride;
uint8_t* const v_dst = buf->v + (size_t)(io->mb_y >> 1) * buf->v_stride;
uint8_t* const y_dst = buf->y + (ptrdiff_t)io->mb_y * buf->y_stride;
uint8_t* const u_dst = buf->u + (ptrdiff_t)(io->mb_y >> 1) * buf->u_stride;
uint8_t* const v_dst = buf->v + (ptrdiff_t)(io->mb_y >> 1) * buf->v_stride;
const int mb_w = io->mb_w;
const int mb_h = io->mb_h;
const int uv_w = (mb_w + 1) / 2;
@ -42,7 +44,7 @@ static int EmitYUV(const VP8Io* const io, WebPDecParams* const p) {
static int EmitSampledRGB(const VP8Io* const io, WebPDecParams* const p) {
WebPDecBuffer* const output = p->output;
WebPRGBABuffer* const buf = &output->u.RGBA;
uint8_t* const dst = buf->rgba + (size_t)io->mb_y * buf->stride;
uint8_t* const dst = buf->rgba + (ptrdiff_t)io->mb_y * buf->stride;
WebPSamplerProcessPlane(io->y, io->y_stride,
io->u, io->v, io->uv_stride,
dst, buf->stride, io->mb_w, io->mb_h,
@ -57,7 +59,7 @@ static int EmitSampledRGB(const VP8Io* const io, WebPDecParams* const p) {
static int EmitFancyRGB(const VP8Io* const io, WebPDecParams* const p) {
int num_lines_out = io->mb_h; // a priori guess
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
uint8_t* dst = buf->rgba + (size_t)io->mb_y * buf->stride;
uint8_t* dst = buf->rgba + (ptrdiff_t)io->mb_y * buf->stride;
WebPUpsampleLinePairFunc upsample = WebPUpsamplers[p->output->colorspace];
const uint8_t* cur_y = io->y;
const uint8_t* cur_u = io->u;
@ -128,7 +130,7 @@ static int EmitAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
const WebPYUVABuffer* const buf = &p->output->u.YUVA;
const int mb_w = io->mb_w;
const int mb_h = io->mb_h;
uint8_t* dst = buf->a + (size_t)io->mb_y * buf->a_stride;
uint8_t* dst = buf->a + (ptrdiff_t)io->mb_y * buf->a_stride;
int j;
(void)expected_num_lines_out;
assert(expected_num_lines_out == mb_h);
@ -181,8 +183,8 @@ static int EmitAlphaRGB(const VP8Io* const io, WebPDecParams* const p,
(colorspace == MODE_ARGB || colorspace == MODE_Argb);
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
int num_rows;
const size_t start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
uint8_t* const base_rgba = buf->rgba + start_y * buf->stride;
const int start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)start_y * buf->stride;
uint8_t* const dst = base_rgba + (alpha_first ? 0 : 3);
const int has_alpha = WebPDispatchAlpha(alpha, io->width, mb_w,
num_rows, dst, buf->stride);
@ -205,8 +207,8 @@ static int EmitAlphaRGBA4444(const VP8Io* const io, WebPDecParams* const p,
const WEBP_CSP_MODE colorspace = p->output->colorspace;
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
int num_rows;
const size_t start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
uint8_t* const base_rgba = buf->rgba + start_y * buf->stride;
const int start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)start_y * buf->stride;
#if (WEBP_SWAP_16BIT_CSP == 1)
uint8_t* alpha_dst = base_rgba;
#else
@ -271,9 +273,9 @@ static int EmitRescaledYUV(const VP8Io* const io, WebPDecParams* const p) {
static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
int expected_num_lines_out) {
const WebPYUVABuffer* const buf = &p->output->u.YUVA;
uint8_t* const dst_a = buf->a + (size_t)p->last_y * buf->a_stride;
uint8_t* const dst_a = buf->a + (ptrdiff_t)p->last_y * buf->a_stride;
if (io->a != NULL) {
uint8_t* const dst_y = buf->y + (size_t)p->last_y * buf->y_stride;
uint8_t* const dst_y = buf->y + (ptrdiff_t)p->last_y * buf->y_stride;
const int num_lines_out = Rescale(io->a, io->width, io->mb_h, p->scaler_a);
assert(expected_num_lines_out == num_lines_out);
if (num_lines_out > 0) { // unmultiply the Y
@ -362,7 +364,7 @@ static int ExportRGB(WebPDecParams* const p, int y_pos) {
const WebPYUV444Converter convert =
WebPYUV444Converters[p->output->colorspace];
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
uint8_t* dst = buf->rgba + (size_t)y_pos * buf->stride;
uint8_t* dst = buf->rgba + (ptrdiff_t)y_pos * buf->stride;
int num_lines_out = 0;
// For RGB rescaling, because of the YUV420, current scan position
// U/V can be +1/-1 line from the Y one. Hence the double test.
@ -389,14 +391,14 @@ static int EmitRescaledRGB(const VP8Io* const io, WebPDecParams* const p) {
while (j < mb_h) {
const int y_lines_in =
WebPRescalerImport(p->scaler_y, mb_h - j,
io->y + (size_t)j * io->y_stride, io->y_stride);
io->y + (ptrdiff_t)j * io->y_stride, io->y_stride);
j += y_lines_in;
if (WebPRescaleNeededLines(p->scaler_u, uv_mb_h - uv_j)) {
const int u_lines_in = WebPRescalerImport(
p->scaler_u, uv_mb_h - uv_j, io->u + (size_t)uv_j * io->uv_stride,
p->scaler_u, uv_mb_h - uv_j, io->u + (ptrdiff_t)uv_j * io->uv_stride,
io->uv_stride);
const int v_lines_in = WebPRescalerImport(
p->scaler_v, uv_mb_h - uv_j, io->v + (size_t)uv_j * io->uv_stride,
p->scaler_v, uv_mb_h - uv_j, io->v + (ptrdiff_t)uv_j * io->uv_stride,
io->uv_stride);
(void)v_lines_in; // remove a gcc warning
assert(u_lines_in == v_lines_in);
@ -409,7 +411,7 @@ static int EmitRescaledRGB(const VP8Io* const io, WebPDecParams* const p) {
static int ExportAlpha(WebPDecParams* const p, int y_pos, int max_lines_out) {
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
uint8_t* const base_rgba = buf->rgba + (size_t)y_pos * buf->stride;
uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)y_pos * buf->stride;
const WEBP_CSP_MODE colorspace = p->output->colorspace;
const int alpha_first =
(colorspace == MODE_ARGB || colorspace == MODE_Argb);
@ -437,7 +439,7 @@ static int ExportAlpha(WebPDecParams* const p, int y_pos, int max_lines_out) {
static int ExportAlphaRGBA4444(WebPDecParams* const p, int y_pos,
int max_lines_out) {
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
uint8_t* const base_rgba = buf->rgba + (size_t)y_pos * buf->stride;
uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)y_pos * buf->stride;
#if (WEBP_SWAP_16BIT_CSP == 1)
uint8_t* alpha_dst = base_rgba;
#else
@ -476,7 +478,7 @@ static int EmitRescaledAlphaRGB(const VP8Io* const io, WebPDecParams* const p,
int lines_left = expected_num_out_lines;
const int y_end = p->last_y + lines_left;
while (lines_left > 0) {
const int64_t row_offset = (int64_t)scaler->src_y - io->mb_y;
const int64_t row_offset = (ptrdiff_t)scaler->src_y - io->mb_y;
WebPRescalerImport(scaler, io->mb_h + io->mb_y - scaler->src_y,
io->a + row_offset * io->width, io->width);
lines_left -= p->emit_alpha_row(p, y_end - lines_left, lines_left);

View File

@ -32,7 +32,7 @@ extern "C" {
// version numbers
#define DEC_MAJ_VERSION 1
#define DEC_MIN_VERSION 4
#define DEC_MIN_VERSION 5
#define DEC_REV_VERSION 0
// YUV-cache parameters. Cache is 32-bytes wide (= one cacheline).

View File

@ -13,6 +13,7 @@
// Jyrki Alakuijala (jyrki@google.com)
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include "src/dec/alphai_dec.h"
@ -624,8 +625,8 @@ static int EmitRescaledRowsRGBA(const VP8LDecoder* const dec,
int num_lines_in = 0;
int num_lines_out = 0;
while (num_lines_in < mb_h) {
uint8_t* const row_in = in + (uint64_t)num_lines_in * in_stride;
uint8_t* const row_out = out + (uint64_t)num_lines_out * out_stride;
uint8_t* const row_in = in + (ptrdiff_t)num_lines_in * in_stride;
uint8_t* const row_out = out + (ptrdiff_t)num_lines_out * out_stride;
const int lines_left = mb_h - num_lines_in;
const int needed_lines = WebPRescaleNeededLines(dec->rescaler, lines_left);
int lines_imported;
@ -827,7 +828,7 @@ static void ProcessRows(VP8LDecoder* const dec, int row) {
if (WebPIsRGBMode(output->colorspace)) { // convert to RGBA
const WebPRGBABuffer* const buf = &output->u.RGBA;
uint8_t* const rgba =
buf->rgba + (int64_t)dec->last_out_row_ * buf->stride;
buf->rgba + (ptrdiff_t)dec->last_out_row_ * buf->stride;
const int num_rows_out =
#if !defined(WEBP_REDUCE_SIZE)
io->use_scaling ?

View File

@ -13,13 +13,15 @@
#include <stdlib.h>
#include "src/dec/common_dec.h"
#include "src/dec/vp8_dec.h"
#include "src/dec/vp8i_dec.h"
#include "src/dec/vp8li_dec.h"
#include "src/dec/webpi_dec.h"
#include "src/utils/rescaler_utils.h"
#include "src/utils/utils.h"
#include "src/webp/mux_types.h" // ALPHA_FLAG
#include "src/webp/decode.h"
#include "src/webp/mux_types.h" // ALPHA_FLAG
#include "src/webp/types.h"
//------------------------------------------------------------------------------
@ -747,6 +749,61 @@ int WebPInitDecoderConfigInternal(WebPDecoderConfig* config,
return 1;
}
static int WebPCheckCropDimensionsBasic(int x, int y, int w, int h) {
return !(x < 0 || y < 0 || w <= 0 || h <= 0);
}
int WebPValidateDecoderConfig(const WebPDecoderConfig* config) {
const WebPDecoderOptions* options;
if (config == NULL) return 0;
if (!IsValidColorspace(config->output.colorspace)) {
return 0;
}
options = &config->options;
// bypass_filtering, no_fancy_upsampling, use_cropping, use_scaling,
// use_threads, flip can be any integer and are interpreted as boolean.
// Check for cropping.
if (options->use_cropping && !WebPCheckCropDimensionsBasic(
options->crop_left, options->crop_top,
options->crop_width, options->crop_height)) {
return 0;
}
// Check for scaling.
if (options->use_scaling &&
(options->scaled_width < 0 || options->scaled_height < 0 ||
(options->scaled_width == 0 && options->scaled_height == 0))) {
return 0;
}
// In case the WebPBitstreamFeatures has been filled in, check further.
if (config->input.width > 0 || config->input.height > 0) {
int scaled_width = options->scaled_width;
int scaled_height = options->scaled_height;
if (options->use_cropping &&
!WebPCheckCropDimensions(config->input.width, config->input.height,
options->crop_left, options->crop_top,
options->crop_width, options->crop_height)) {
return 0;
}
if (options->use_scaling && !WebPRescalerGetScaledDimensions(
config->input.width, config->input.height,
&scaled_width, &scaled_height)) {
return 0;
}
}
// Check for dithering.
if (options->dithering_strength < 0 || options->dithering_strength > 100 ||
options->alpha_dithering_strength < 0 ||
options->alpha_dithering_strength > 100) {
return 0;
}
return 1;
}
VP8StatusCode WebPGetFeaturesInternal(const uint8_t* data, size_t data_size,
WebPBitstreamFeatures* features,
int version) {
@ -806,8 +863,8 @@ VP8StatusCode WebPDecode(const uint8_t* data, size_t data_size,
int WebPCheckCropDimensions(int image_width, int image_height,
int x, int y, int w, int h) {
return !(x < 0 || y < 0 || w <= 0 || h <= 0 ||
x >= image_width || w > image_width || w > image_width - x ||
return WebPCheckCropDimensionsBasic(x, y, w, h) &&
!(x >= image_width || w > image_width || w > image_width - x ||
y >= image_height || h > image_height || h > image_height - y);
}

View File

@ -13,6 +13,6 @@ noinst_HEADERS =
noinst_HEADERS += ../webp/format_constants.h
libwebpdemux_la_LIBADD = ../libwebp.la
libwebpdemux_la_LDFLAGS = -no-undefined -version-info 2:15:0
libwebpdemux_la_LDFLAGS = -no-undefined -version-info 2:16:0
libwebpdemuxincludedir = $(includedir)/webp
pkgconfig_DATA = libwebpdemux.pc

View File

@ -24,7 +24,7 @@
#include "src/webp/format_constants.h"
#define DMUX_MAJ_VERSION 1
#define DMUX_MIN_VERSION 4
#define DMUX_MIN_VERSION 5
#define DMUX_REV_VERSION 0
typedef struct {

View File

@ -6,8 +6,8 @@
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
VS_VERSION_INFO VERSIONINFO
FILEVERSION 1,0,4,0
PRODUCTVERSION 1,0,4,0
FILEVERSION 1,0,5,0
PRODUCTVERSION 1,0,5,0
FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
FILEFLAGS 0x1L
@ -24,12 +24,12 @@ BEGIN
BEGIN
VALUE "CompanyName", "Google, Inc."
VALUE "FileDescription", "libwebpdemux DLL"
VALUE "FileVersion", "1.4.0"
VALUE "FileVersion", "1.5.0"
VALUE "InternalName", "libwebpdemux.dll"
VALUE "LegalCopyright", "Copyright (C) 2024"
VALUE "OriginalFilename", "libwebpdemux.dll"
VALUE "ProductName", "WebP Image Demuxer"
VALUE "ProductVersion", "1.4.0"
VALUE "ProductVersion", "1.5.0"
END
END
BLOCK "VarFileInfo"

View File

@ -5,6 +5,8 @@ noinst_LTLIBRARIES += libwebpdsp_sse2.la
noinst_LTLIBRARIES += libwebpdspdecode_sse2.la
noinst_LTLIBRARIES += libwebpdsp_sse41.la
noinst_LTLIBRARIES += libwebpdspdecode_sse41.la
noinst_LTLIBRARIES += libwebpdsp_avx2.la
noinst_LTLIBRARIES += libwebpdspdecode_avx2.la
noinst_LTLIBRARIES += libwebpdsp_neon.la
noinst_LTLIBRARIES += libwebpdspdecode_neon.la
noinst_LTLIBRARIES += libwebpdsp_msa.la
@ -44,6 +46,11 @@ ENC_SOURCES += lossless_enc.c
ENC_SOURCES += quant.h
ENC_SOURCES += ssim.c
libwebpdspdecode_avx2_la_SOURCES =
libwebpdspdecode_avx2_la_SOURCES += lossless_avx2.c
libwebpdspdecode_avx2_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdspdecode_avx2_la_CFLAGS = $(AM_CFLAGS) $(AVX2_FLAGS)
libwebpdspdecode_sse41_la_SOURCES =
libwebpdspdecode_sse41_la_SOURCES += alpha_processing_sse41.c
libwebpdspdecode_sse41_la_SOURCES += dec_sse41.c
@ -123,6 +130,12 @@ libwebpdsp_sse41_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdsp_sse41_la_CFLAGS = $(AM_CFLAGS) $(SSE41_FLAGS)
libwebpdsp_sse41_la_LIBADD = libwebpdspdecode_sse41.la
libwebpdsp_avx2_la_SOURCES =
libwebpdsp_avx2_la_SOURCES += lossless_enc_avx2.c
libwebpdsp_avx2_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdsp_avx2_la_CFLAGS = $(AM_CFLAGS) $(AVX2_FLAGS)
libwebpdsp_avx2_la_LIBADD = libwebpdspdecode_avx2.la
libwebpdsp_neon_la_SOURCES =
libwebpdsp_neon_la_SOURCES += cost_neon.c
libwebpdsp_neon_la_SOURCES += enc_neon.c
@ -167,6 +180,7 @@ libwebpdsp_la_LDFLAGS = -lm
libwebpdsp_la_LIBADD =
libwebpdsp_la_LIBADD += libwebpdsp_sse2.la
libwebpdsp_la_LIBADD += libwebpdsp_sse41.la
libwebpdsp_la_LIBADD += libwebpdsp_avx2.la
libwebpdsp_la_LIBADD += libwebpdsp_neon.la
libwebpdsp_la_LIBADD += libwebpdsp_msa.la
libwebpdsp_la_LIBADD += libwebpdsp_mips32.la
@ -180,6 +194,7 @@ if BUILD_LIBWEBPDECODER
libwebpdspdecode_la_LIBADD =
libwebpdspdecode_la_LIBADD += libwebpdspdecode_sse2.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_sse41.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_avx2.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_neon.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_msa.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_mips32.la

View File

@ -56,6 +56,11 @@
(defined(_M_X64) || defined(_M_IX86))
#define WEBP_MSC_SSE41 // Visual C++ SSE4.1 targets
#endif
#if defined(_MSC_VER) && _MSC_VER >= 1700 && \
(defined(_M_X64) || defined(_M_IX86))
#define WEBP_MSC_AVX2 // Visual C++ AVX2 targets
#endif
#endif
// WEBP_HAVE_* are used to indicate the presence of the instruction set in dsp
@ -80,6 +85,16 @@
#define WEBP_HAVE_SSE41
#endif
#if (defined(__AVX2__) || defined(WEBP_MSC_AVX2)) && \
(!defined(HAVE_CONFIG_H) || defined(WEBP_HAVE_AVX2))
#define WEBP_USE_AVX2
#endif
#if defined(WEBP_USE_AVX2) && !defined(WEBP_HAVE_AVX2)
#define WEBP_HAVE_AVX2
#endif
#undef WEBP_MSC_AVX2
#undef WEBP_MSC_SSE41
#undef WEBP_MSC_SSE2

View File

@ -945,6 +945,18 @@ static int Quantize2Blocks_NEON(int16_t in[32], int16_t out[32],
vst1q_u8(dst, r); \
} while (0)
static WEBP_INLINE uint8x16x4_t Vld1qU8x4(const uint8_t* ptr) {
#if LOCAL_CLANG_PREREQ(3, 4) || LOCAL_GCC_PREREQ(9, 4) || defined(_MSC_VER)
return vld1q_u8_x4(ptr);
#else
uint8x16x4_t res;
INIT_VECTOR4(res,
vld1q_u8(ptr + 0 * 16), vld1q_u8(ptr + 1 * 16),
vld1q_u8(ptr + 2 * 16), vld1q_u8(ptr + 3 * 16));
return res;
#endif
}
static void Intra4Preds_NEON(uint8_t* WEBP_RESTRICT dst,
const uint8_t* WEBP_RESTRICT top) {
// 0 1 2 3 4 5 6 7 8 9 10 11 12 13
@ -971,9 +983,9 @@ static void Intra4Preds_NEON(uint8_t* WEBP_RESTRICT dst,
30, 30, 30, 30, 0, 0, 0, 0, 21, 22, 23, 24, 16, 16, 16, 16
};
const uint8x16x4_t lookup_avgs1 = vld1q_u8_x4(kLookupTbl1);
const uint8x16x4_t lookup_avgs2 = vld1q_u8_x4(kLookupTbl2);
const uint8x16x4_t lookup_avgs3 = vld1q_u8_x4(kLookupTbl3);
const uint8x16x4_t lookup_avgs1 = Vld1qU8x4(kLookupTbl1);
const uint8x16x4_t lookup_avgs2 = Vld1qU8x4(kLookupTbl2);
const uint8x16x4_t lookup_avgs3 = Vld1qU8x4(kLookupTbl3);
const uint8x16_t preload = vld1q_u8(top - 5);
uint8x16x2_t qcombined;

View File

@ -13,15 +13,21 @@
// Jyrki Alakuijala (jyrki@google.com)
// Urvang Joshi (urvang@google.com)
#include "src/dsp/dsp.h"
#include "src/dsp/lossless.h"
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "src/dec/vp8li_dec.h"
#include "src/utils/endian_inl_utils.h"
#include "src/dsp/lossless.h"
#include "src/dsp/cpu.h"
#include "src/dsp/dsp.h"
#include "src/dsp/lossless_common.h"
#include "src/utils/endian_inl_utils.h"
#include "src/utils/utils.h"
#include "src/webp/decode.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Image transforms.
@ -571,16 +577,21 @@ void VP8LConvertFromBGRA(const uint32_t* const in_data, int num_pixels,
//------------------------------------------------------------------------------
VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed;
VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed_SSE;
VP8LPredictorAddSubFunc VP8LPredictorsAdd[16];
VP8LPredictorAddSubFunc VP8LPredictorsAdd_SSE[16];
VP8LPredictorFunc VP8LPredictors[16];
// exposed plain-C implementations
VP8LPredictorAddSubFunc VP8LPredictorsAdd_C[16];
VP8LTransformColorInverseFunc VP8LTransformColorInverse;
VP8LTransformColorInverseFunc VP8LTransformColorInverse_SSE;
VP8LConvertFunc VP8LConvertBGRAToRGB;
VP8LConvertFunc VP8LConvertBGRAToRGB_SSE;
VP8LConvertFunc VP8LConvertBGRAToRGBA;
VP8LConvertFunc VP8LConvertBGRAToRGBA_SSE;
VP8LConvertFunc VP8LConvertBGRAToRGBA4444;
VP8LConvertFunc VP8LConvertBGRAToRGB565;
VP8LConvertFunc VP8LConvertBGRAToBGR;
@ -591,6 +602,7 @@ VP8LMapAlphaFunc VP8LMapColor8b;
extern VP8CPUInfo VP8GetCPUInfo;
extern void VP8LDspInitSSE2(void);
extern void VP8LDspInitSSE41(void);
extern void VP8LDspInitAVX2(void);
extern void VP8LDspInitNEON(void);
extern void VP8LDspInitMIPSdspR2(void);
extern void VP8LDspInitMSA(void);
@ -643,6 +655,11 @@ WEBP_DSP_INIT_FUNC(VP8LDspInit) {
#if defined(WEBP_HAVE_SSE41)
if (VP8GetCPUInfo(kSSE4_1)) {
VP8LDspInitSSE41();
#if defined(WEBP_HAVE_AVX2)
if (VP8GetCPUInfo(kAVX2)) {
VP8LDspInitAVX2();
}
#endif
}
#endif
}

View File

@ -64,10 +64,12 @@ typedef void (*VP8LPredictorAddSubFunc)(const uint32_t* in,
uint32_t* WEBP_RESTRICT out);
extern VP8LPredictorAddSubFunc VP8LPredictorsAdd[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsAdd_C[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsAdd_SSE[16];
typedef void (*VP8LProcessDecBlueAndRedFunc)(const uint32_t* src,
int num_pixels, uint32_t* dst);
extern VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed;
extern VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed_SSE;
typedef struct {
// Note: the members are uint8_t, so that any negative values are
@ -80,6 +82,7 @@ typedef void (*VP8LTransformColorInverseFunc)(const VP8LMultipliers* const m,
const uint32_t* src,
int num_pixels, uint32_t* dst);
extern VP8LTransformColorInverseFunc VP8LTransformColorInverse;
extern VP8LTransformColorInverseFunc VP8LTransformColorInverse_SSE;
struct VP8LTransform; // Defined in dec/vp8li.h.
@ -99,6 +102,8 @@ extern VP8LConvertFunc VP8LConvertBGRAToRGBA;
extern VP8LConvertFunc VP8LConvertBGRAToRGBA4444;
extern VP8LConvertFunc VP8LConvertBGRAToRGB565;
extern VP8LConvertFunc VP8LConvertBGRAToBGR;
extern VP8LConvertFunc VP8LConvertBGRAToRGB_SSE;
extern VP8LConvertFunc VP8LConvertBGRAToRGBA_SSE;
// Converts from BGRA to other color spaces.
void VP8LConvertFromBGRA(const uint32_t* const in_data, int num_pixels,
@ -149,21 +154,25 @@ void VP8LDspInit(void);
typedef void (*VP8LProcessEncBlueAndRedFunc)(uint32_t* dst, int num_pixels);
extern VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed;
extern VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed_SSE;
typedef void (*VP8LTransformColorFunc)(
const VP8LMultipliers* WEBP_RESTRICT const m, uint32_t* WEBP_RESTRICT dst,
int num_pixels);
extern VP8LTransformColorFunc VP8LTransformColor;
extern VP8LTransformColorFunc VP8LTransformColor_SSE;
typedef void (*VP8LCollectColorBlueTransformsFunc)(
const uint32_t* WEBP_RESTRICT argb, int stride,
int tile_width, int tile_height,
int green_to_blue, int red_to_blue, uint32_t histo[]);
extern VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms;
extern VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms_SSE;
typedef void (*VP8LCollectColorRedTransformsFunc)(
const uint32_t* WEBP_RESTRICT argb, int stride,
int tile_width, int tile_height,
int green_to_red, uint32_t histo[]);
extern VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms;
extern VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms_SSE;
// Expose some C-only fallback functions
void VP8LTransformColor_C(const VP8LMultipliers* WEBP_RESTRICT const m,
@ -181,20 +190,17 @@ void VP8LCollectColorBlueTransforms_C(const uint32_t* WEBP_RESTRICT argb,
extern VP8LPredictorAddSubFunc VP8LPredictorsSub[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsSub_C[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsSub_SSE[16];
// -----------------------------------------------------------------------------
// Huffman-cost related functions.
typedef uint32_t (*VP8LCostFunc)(const uint32_t* population, int length);
typedef uint32_t (*VP8LCostCombinedFunc)(const uint32_t* WEBP_RESTRICT X,
const uint32_t* WEBP_RESTRICT Y,
int length);
typedef uint64_t (*VP8LCombinedShannonEntropyFunc)(const uint32_t X[256],
const uint32_t Y[256]);
typedef uint64_t (*VP8LShannonEntropyFunc)(const uint32_t* X, int length);
extern VP8LCostFunc VP8LExtraCost;
extern VP8LCostCombinedFunc VP8LExtraCostCombined;
extern VP8LCombinedShannonEntropyFunc VP8LCombinedShannonEntropy;
extern VP8LShannonEntropyFunc VP8LShannonEntropy;
@ -255,6 +261,7 @@ typedef void (*VP8LBundleColorMapFunc)(const uint8_t* WEBP_RESTRICT const row,
int width, int xbits,
uint32_t* WEBP_RESTRICT dst);
extern VP8LBundleColorMapFunc VP8LBundleColorMap;
extern VP8LBundleColorMapFunc VP8LBundleColorMap_SSE;
void VP8LBundleColorMap_C(const uint8_t* WEBP_RESTRICT const row,
int width, int xbits, uint32_t* WEBP_RESTRICT dst);

442
src/dsp/lossless_avx2.c Normal file
View File

@ -0,0 +1,442 @@
// Copyright 2025 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// AVX2 variant of methods for lossless decoder
//
// Author: Vincent Rabaud (vrabaud@google.com)
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_AVX2)
#include <immintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Predictor Transform
static WEBP_INLINE void Average2_m256i(const __m256i* const a0,
const __m256i* const a1,
__m256i* const avg) {
// (a + b) >> 1 = ((a + b + 1) >> 1) - ((a ^ b) & 1)
const __m256i ones = _mm256_set1_epi8(1);
const __m256i avg1 = _mm256_avg_epu8(*a0, *a1);
const __m256i one = _mm256_and_si256(_mm256_xor_si256(*a0, *a1), ones);
*avg = _mm256_sub_epi8(avg1, one);
}
// Batch versions of those functions.
// Predictor0: ARGB_BLACK.
static void PredictorAdd0_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i black = _mm256_set1_epi32((int)ARGB_BLACK);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i res = _mm256_add_epi8(src, black);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[0](in + i, NULL, num_pixels - i, out + i);
}
(void)upper;
}
// Predictor1: left.
static void PredictorAdd1_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
__m256i prev = _mm256_set1_epi32((int)out[-1]);
for (i = 0; i + 8 <= num_pixels; i += 8) {
// h | g | f | e | d | c | b | a
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
// g | f | e | 0 | c | b | a | 0
const __m256i shift0 = _mm256_slli_si256(src, 4);
// g + h | f + g | e + f | e | c + d | b + c | a + b | a
const __m256i sum0 = _mm256_add_epi8(src, shift0);
// e + f | e | 0 | 0 | a + b | a | 0 | 0
const __m256i shift1 = _mm256_slli_si256(sum0, 8);
// e + f + g + h | e + f + g | e + f | e | a + b + c + d | a + b + c | a + b
// | a
const __m256i sum1 = _mm256_add_epi8(sum0, shift1);
// Add a + b + c + d to the upper lane.
const int32_t sum_abcd = _mm256_extract_epi32(sum1, 3);
const __m256i sum2 = _mm256_add_epi8(
sum1,
_mm256_set_epi32(sum_abcd, sum_abcd, sum_abcd, sum_abcd, 0, 0, 0, 0));
const __m256i res = _mm256_add_epi8(sum2, prev);
_mm256_storeu_si256((__m256i*)&out[i], res);
// replicate last res output in prev.
prev = _mm256_permutevar8x32_epi32(
res, _mm256_set_epi32(7, 7, 7, 7, 7, 7, 7, 7));
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[1](in + i, upper + i, num_pixels - i, out + i);
}
}
// Macro that adds 32-bit integers from IN using mod 256 arithmetic
// per 8 bit channel.
#define GENERATE_PREDICTOR_1(X, IN) \
static void PredictorAdd##X##_AVX2(const uint32_t* in, \
const uint32_t* upper, int num_pixels, \
uint32_t* WEBP_RESTRICT out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
const __m256i other = _mm256_loadu_si256((const __m256i*)&(IN)); \
const __m256i res = _mm256_add_epi8(src, other); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsAdd_SSE[(X)](in + i, upper + i, num_pixels - i, out + i); \
} \
}
// Predictor2: Top.
GENERATE_PREDICTOR_1(2, upper[i])
// Predictor3: Top-right.
GENERATE_PREDICTOR_1(3, upper[i + 1])
// Predictor4: Top-left.
GENERATE_PREDICTOR_1(4, upper[i - 1])
#undef GENERATE_PREDICTOR_1
// Due to averages with integers, values cannot be accumulated in parallel for
// predictors 5 to 7.
#define GENERATE_PREDICTOR_2(X, IN) \
static void PredictorAdd##X##_AVX2(const uint32_t* in, \
const uint32_t* upper, int num_pixels, \
uint32_t* WEBP_RESTRICT out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i Tother = _mm256_loadu_si256((const __m256i*)&(IN)); \
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]); \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
__m256i avg, res; \
Average2_m256i(&T, &Tother, &avg); \
res = _mm256_add_epi8(avg, src); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsAdd_SSE[(X)](in + i, upper + i, num_pixels - i, out + i); \
} \
}
// Predictor8: average TL T.
GENERATE_PREDICTOR_2(8, upper[i - 1])
// Predictor9: average T TR.
GENERATE_PREDICTOR_2(9, upper[i + 1])
#undef GENERATE_PREDICTOR_2
// Predictor10: average of (average of (L,TL), average of (T, TR)).
#define DO_PRED10(OUT) \
do { \
__m256i avgLTL, avg; \
Average2_m256i(&L, &TL, &avgLTL); \
Average2_m256i(&avgTTR, &avgLTL, &avg); \
L = _mm256_add_epi8(avg, src); \
out[i + (OUT)] = (uint32_t)_mm256_cvtsi256_si32(L); \
} while (0)
#define DO_PRED10_SHIFT \
do { \
/* Rotate the pre-computed values for the next iteration.*/ \
avgTTR = _mm256_srli_si256(avgTTR, 4); \
TL = _mm256_srli_si256(TL, 4); \
src = _mm256_srli_si256(src, 4); \
} while (0)
static void PredictorAdd10_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i, j;
__m256i L = _mm256_setr_epi32((int)out[-1], 0, 0, 0, 0, 0, 0, 0);
for (i = 0; i + 8 <= num_pixels; i += 8) {
__m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
__m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TR = _mm256_loadu_si256((const __m256i*)&upper[i + 1]);
__m256i avgTTR;
Average2_m256i(&T, &TR, &avgTTR);
{
const __m256i avgTTR_bak = avgTTR;
const __m256i TL_bak = TL;
const __m256i src_bak = src;
for (j = 0; j < 4; ++j) {
DO_PRED10(j);
DO_PRED10_SHIFT;
}
avgTTR = _mm256_permute2x128_si256(avgTTR_bak, avgTTR_bak, 1);
TL = _mm256_permute2x128_si256(TL_bak, TL_bak, 1);
src = _mm256_permute2x128_si256(src_bak, src_bak, 1);
for (; j < 8; ++j) {
DO_PRED10(j);
DO_PRED10_SHIFT;
}
}
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[10](in + i, upper + i, num_pixels - i, out + i);
}
}
#undef DO_PRED10
#undef DO_PRED10_SHIFT
// Predictor11: select.
#define DO_PRED11(OUT) \
do { \
const __m256i L_lo = _mm256_unpacklo_epi32(L, T); \
const __m256i TL_lo = _mm256_unpacklo_epi32(TL, T); \
const __m256i pb = _mm256_sad_epu8(L_lo, TL_lo); /* pb = sum |L-TL|*/ \
const __m256i mask = _mm256_cmpgt_epi32(pb, pa); \
const __m256i A = _mm256_and_si256(mask, L); \
const __m256i B = _mm256_andnot_si256(mask, T); \
const __m256i pred = _mm256_or_si256(A, B); /* pred = (pa > b)? L : T*/ \
L = _mm256_add_epi8(src, pred); \
out[i + (OUT)] = (uint32_t)_mm256_cvtsi256_si32(L); \
} while (0)
#define DO_PRED11_SHIFT \
do { \
/* Shift the pre-computed value for the next iteration.*/ \
T = _mm256_srli_si256(T, 4); \
TL = _mm256_srli_si256(TL, 4); \
src = _mm256_srli_si256(src, 4); \
pa = _mm256_srli_si256(pa, 4); \
} while (0)
static void PredictorAdd11_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i, j;
__m256i pa;
__m256i L = _mm256_setr_epi32((int)out[-1], 0, 0, 0, 0, 0, 0, 0);
for (i = 0; i + 8 <= num_pixels; i += 8) {
__m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
__m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
__m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
{
// We can unpack with any value on the upper 32 bits, provided it's the
// same on both operands (so that their sum of abs diff is zero). Here we
// use T.
const __m256i T_lo = _mm256_unpacklo_epi32(T, T);
const __m256i TL_lo = _mm256_unpacklo_epi32(TL, T);
const __m256i T_hi = _mm256_unpackhi_epi32(T, T);
const __m256i TL_hi = _mm256_unpackhi_epi32(TL, T);
const __m256i s_lo = _mm256_sad_epu8(T_lo, TL_lo);
const __m256i s_hi = _mm256_sad_epu8(T_hi, TL_hi);
pa = _mm256_packs_epi32(s_lo, s_hi); // pa = sum |T-TL|
}
{
const __m256i T_bak = T;
const __m256i TL_bak = TL;
const __m256i src_bak = src;
const __m256i pa_bak = pa;
for (j = 0; j < 4; ++j) {
DO_PRED11(j);
DO_PRED11_SHIFT;
}
T = _mm256_permute2x128_si256(T_bak, T_bak, 1);
TL = _mm256_permute2x128_si256(TL_bak, TL_bak, 1);
src = _mm256_permute2x128_si256(src_bak, src_bak, 1);
pa = _mm256_permute2x128_si256(pa_bak, pa_bak, 1);
for (; j < 8; ++j) {
DO_PRED11(j);
DO_PRED11_SHIFT;
}
}
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[11](in + i, upper + i, num_pixels - i, out + i);
}
}
#undef DO_PRED11
#undef DO_PRED11_SHIFT
// Predictor12: ClampedAddSubtractFull.
#define DO_PRED12(DIFF, OUT) \
do { \
const __m256i all = _mm256_add_epi16(L, (DIFF)); \
const __m256i alls = _mm256_packus_epi16(all, all); \
const __m256i res = _mm256_add_epi8(src, alls); \
out[i + (OUT)] = (uint32_t)_mm256_cvtsi256_si32(res); \
L = _mm256_unpacklo_epi8(res, zero); \
} while (0)
#define DO_PRED12_SHIFT(DIFF, LANE) \
do { \
/* Shift the pre-computed value for the next iteration.*/ \
if ((LANE) == 0) (DIFF) = _mm256_srli_si256(DIFF, 8); \
src = _mm256_srli_si256(src, 4); \
} while (0)
static void PredictorAdd12_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i zero = _mm256_setzero_si256();
const __m256i L8 = _mm256_setr_epi32((int)out[-1], 0, 0, 0, 0, 0, 0, 0);
__m256i L = _mm256_unpacklo_epi8(L8, zero);
for (i = 0; i + 8 <= num_pixels; i += 8) {
// Load 8 pixels at a time.
__m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i T_lo = _mm256_unpacklo_epi8(T, zero);
const __m256i T_hi = _mm256_unpackhi_epi8(T, zero);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i TL_lo = _mm256_unpacklo_epi8(TL, zero);
const __m256i TL_hi = _mm256_unpackhi_epi8(TL, zero);
__m256i diff_lo = _mm256_sub_epi16(T_lo, TL_lo);
__m256i diff_hi = _mm256_sub_epi16(T_hi, TL_hi);
const __m256i diff_lo_bak = diff_lo;
const __m256i diff_hi_bak = diff_hi;
const __m256i src_bak = src;
DO_PRED12(diff_lo, 0);
DO_PRED12_SHIFT(diff_lo, 0);
DO_PRED12(diff_lo, 1);
DO_PRED12_SHIFT(diff_lo, 0);
DO_PRED12(diff_hi, 2);
DO_PRED12_SHIFT(diff_hi, 0);
DO_PRED12(diff_hi, 3);
DO_PRED12_SHIFT(diff_hi, 0);
// Process the upper lane.
diff_lo = _mm256_permute2x128_si256(diff_lo_bak, diff_lo_bak, 1);
diff_hi = _mm256_permute2x128_si256(diff_hi_bak, diff_hi_bak, 1);
src = _mm256_permute2x128_si256(src_bak, src_bak, 1);
DO_PRED12(diff_lo, 4);
DO_PRED12_SHIFT(diff_lo, 0);
DO_PRED12(diff_lo, 5);
DO_PRED12_SHIFT(diff_lo, 1);
DO_PRED12(diff_hi, 6);
DO_PRED12_SHIFT(diff_hi, 0);
DO_PRED12(diff_hi, 7);
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[12](in + i, upper + i, num_pixels - i, out + i);
}
}
#undef DO_PRED12
#undef DO_PRED12_SHIFT
// Due to averages with integers, values cannot be accumulated in parallel for
// predictors 13.
//------------------------------------------------------------------------------
// Subtract-Green Transform
static void AddGreenToBlueAndRed_AVX2(const uint32_t* const src, int num_pixels,
uint32_t* dst) {
int i;
const __m256i kCstShuffle = _mm256_set_epi8(
-1, 29, -1, 29, -1, 25, -1, 25, -1, 21, -1, 21, -1, 17, -1, 17, -1, 13,
-1, 13, -1, 9, -1, 9, -1, 5, -1, 5, -1, 1, -1, 1);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i in = _mm256_loadu_si256((const __m256i*)&src[i]); // argb
const __m256i in_0g0g = _mm256_shuffle_epi8(in, kCstShuffle); // 0g0g
const __m256i out = _mm256_add_epi8(in, in_0g0g);
_mm256_storeu_si256((__m256i*)&dst[i], out);
}
// fallthrough and finish off with SSE.
if (i != num_pixels) {
VP8LAddGreenToBlueAndRed_SSE(src + i, num_pixels - i, dst + i);
}
}
//------------------------------------------------------------------------------
// Color Transform
static void TransformColorInverse_AVX2(const VP8LMultipliers* const m,
const uint32_t* const src,
int num_pixels, uint32_t* dst) {
// sign-extended multiplying constants, pre-shifted by 5.
#define CST(X) (((int16_t)(m->X << 8)) >> 5) // sign-extend
const __m256i mults_rb =
_mm256_set1_epi32((int)((uint32_t)CST(green_to_red_) << 16 |
(CST(green_to_blue_) & 0xffff)));
const __m256i mults_b2 = _mm256_set1_epi32(CST(red_to_blue_));
#undef CST
const __m256i mask_ag = _mm256_set1_epi32((int)0xff00ff00);
const __m256i perm1 = _mm256_setr_epi8(
-1, 1, -1, 1, -1, 5, -1, 5, -1, 9, -1, 9, -1, 13, -1, 13, -1, 17, -1, 17,
-1, 21, -1, 21, -1, 25, -1, 25, -1, 29, -1, 29);
const __m256i perm2 = _mm256_setr_epi8(
-1, 2, -1, -1, -1, 6, -1, -1, -1, 10, -1, -1, -1, 14, -1, -1, -1, 18, -1,
-1, -1, 22, -1, -1, -1, 26, -1, -1, -1, 30, -1, -1);
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i A = _mm256_loadu_si256((const __m256i*)(src + i));
const __m256i B = _mm256_shuffle_epi8(A, perm1); // argb -> g0g0
const __m256i C = _mm256_mulhi_epi16(B, mults_rb);
const __m256i D = _mm256_add_epi8(A, C);
const __m256i E = _mm256_shuffle_epi8(D, perm2);
const __m256i F = _mm256_mulhi_epi16(E, mults_b2);
const __m256i G = _mm256_add_epi8(D, F);
const __m256i out = _mm256_blendv_epi8(G, A, mask_ag);
_mm256_storeu_si256((__m256i*)&dst[i], out);
}
// Fall-back to SSE-version for left-overs.
if (i != num_pixels) {
VP8LTransformColorInverse_SSE(m, src + i, num_pixels - i, dst + i);
}
}
//------------------------------------------------------------------------------
// Color-space conversion functions
static void ConvertBGRAToRGBA_AVX2(const uint32_t* WEBP_RESTRICT src,
int num_pixels, uint8_t* WEBP_RESTRICT dst) {
const __m256i* in = (const __m256i*)src;
__m256i* out = (__m256i*)dst;
while (num_pixels >= 8) {
const __m256i A = _mm256_loadu_si256(in++);
const __m256i B = _mm256_shuffle_epi8(
A,
_mm256_set_epi8(15, 12, 13, 14, 11, 8, 9, 10, 7, 4, 5, 6, 3, 0, 1, 2,
15, 12, 13, 14, 11, 8, 9, 10, 7, 4, 5, 6, 3, 0, 1, 2));
_mm256_storeu_si256(out++, B);
num_pixels -= 8;
}
// left-overs
if (num_pixels > 0) {
VP8LConvertBGRAToRGBA_SSE((const uint32_t*)in, num_pixels, (uint8_t*)out);
}
}
//------------------------------------------------------------------------------
// Entry point
extern void VP8LDspInitAVX2(void);
WEBP_TSAN_IGNORE_FUNCTION void VP8LDspInitAVX2(void) {
VP8LPredictorsAdd[0] = PredictorAdd0_AVX2;
VP8LPredictorsAdd[1] = PredictorAdd1_AVX2;
VP8LPredictorsAdd[2] = PredictorAdd2_AVX2;
VP8LPredictorsAdd[3] = PredictorAdd3_AVX2;
VP8LPredictorsAdd[4] = PredictorAdd4_AVX2;
VP8LPredictorsAdd[8] = PredictorAdd8_AVX2;
VP8LPredictorsAdd[9] = PredictorAdd9_AVX2;
VP8LPredictorsAdd[10] = PredictorAdd10_AVX2;
VP8LPredictorsAdd[11] = PredictorAdd11_AVX2;
VP8LPredictorsAdd[12] = PredictorAdd12_AVX2;
VP8LAddGreenToBlueAndRed = AddGreenToBlueAndRed_AVX2;
VP8LTransformColorInverse = TransformColorInverse_AVX2;
VP8LConvertBGRAToRGBA = ConvertBGRAToRGBA_AVX2;
}
#else // !WEBP_USE_AVX2
WEBP_DSP_INIT_STUB(VP8LDspInitAVX2)
#endif // WEBP_USE_AVX2

View File

@ -13,16 +13,19 @@
// Jyrki Alakuijala (jyrki@google.com)
// Urvang Joshi (urvang@google.com)
#include "src/dsp/dsp.h"
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include "src/dec/vp8li_dec.h"
#include "src/utils/endian_inl_utils.h"
#include <string.h>
#include "src/dsp/cpu.h"
#include "src/dsp/dsp.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
#include "src/dsp/yuv.h"
#include "src/enc/histogram_enc.h"
#include "src/utils/utils.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
// lookup table for small values of log2(int) * (1 << LOG_2_PRECISION_BITS).
// Obtained in Python with:
@ -580,20 +583,6 @@ static uint32_t ExtraCost_C(const uint32_t* population, int length) {
return cost;
}
static uint32_t ExtraCostCombined_C(const uint32_t* WEBP_RESTRICT X,
const uint32_t* WEBP_RESTRICT Y,
int length) {
int i;
uint32_t cost = X[4] + Y[4] + X[5] + Y[5];
assert(length % 2 == 0);
for (i = 2; i < length / 2 - 1; ++i) {
const int xy0 = X[2 * i + 2] + Y[2 * i + 2];
const int xy1 = X[2 * i + 3] + Y[2 * i + 3];
cost += i * (xy0 + xy1);
}
return cost;
}
//------------------------------------------------------------------------------
static void AddVector_C(const uint32_t* WEBP_RESTRICT a,
@ -710,17 +699,20 @@ GENERATE_PREDICTOR_SUB(13)
//------------------------------------------------------------------------------
VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed;
VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed_SSE;
VP8LTransformColorFunc VP8LTransformColor;
VP8LTransformColorFunc VP8LTransformColor_SSE;
VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms;
VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms_SSE;
VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms;
VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms_SSE;
VP8LFastLog2SlowFunc VP8LFastLog2Slow;
VP8LFastSLog2SlowFunc VP8LFastSLog2Slow;
VP8LCostFunc VP8LExtraCost;
VP8LCostCombinedFunc VP8LExtraCostCombined;
VP8LCombinedShannonEntropyFunc VP8LCombinedShannonEntropy;
VP8LShannonEntropyFunc VP8LShannonEntropy;
@ -732,13 +724,16 @@ VP8LAddVectorEqFunc VP8LAddVectorEq;
VP8LVectorMismatchFunc VP8LVectorMismatch;
VP8LBundleColorMapFunc VP8LBundleColorMap;
VP8LBundleColorMapFunc VP8LBundleColorMap_SSE;
VP8LPredictorAddSubFunc VP8LPredictorsSub[16];
VP8LPredictorAddSubFunc VP8LPredictorsSub_C[16];
VP8LPredictorAddSubFunc VP8LPredictorsSub_SSE[16];
extern VP8CPUInfo VP8GetCPUInfo;
extern void VP8LEncDspInitSSE2(void);
extern void VP8LEncDspInitSSE41(void);
extern void VP8LEncDspInitAVX2(void);
extern void VP8LEncDspInitNEON(void);
extern void VP8LEncDspInitMIPS32(void);
extern void VP8LEncDspInitMIPSdspR2(void);
@ -760,7 +755,6 @@ WEBP_DSP_INIT_FUNC(VP8LEncDspInit) {
VP8LFastSLog2Slow = FastSLog2Slow_C;
VP8LExtraCost = ExtraCost_C;
VP8LExtraCostCombined = ExtraCostCombined_C;
VP8LCombinedShannonEntropy = CombinedShannonEntropy_C;
VP8LShannonEntropy = ShannonEntropy_C;
@ -815,6 +809,11 @@ WEBP_DSP_INIT_FUNC(VP8LEncDspInit) {
#if defined(WEBP_HAVE_SSE41)
if (VP8GetCPUInfo(kSSE4_1)) {
VP8LEncDspInitSSE41();
#if defined(WEBP_HAVE_AVX2)
if (VP8GetCPUInfo(kAVX2)) {
VP8LEncDspInitAVX2();
}
#endif
}
#endif
}
@ -850,7 +849,6 @@ WEBP_DSP_INIT_FUNC(VP8LEncDspInit) {
assert(VP8LFastLog2Slow != NULL);
assert(VP8LFastSLog2Slow != NULL);
assert(VP8LExtraCost != NULL);
assert(VP8LExtraCostCombined != NULL);
assert(VP8LCombinedShannonEntropy != NULL);
assert(VP8LShannonEntropy != NULL);
assert(VP8LGetEntropyUnrefined != NULL);

733
src/dsp/lossless_enc_avx2.c Normal file
View File

@ -0,0 +1,733 @@
// Copyright 2025 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// AVX2 variant of methods for lossless encoder
//
// Author: Vincent Rabaud (vrabaud@google.com)
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_AVX2)
#include <assert.h>
#include <immintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
#include "src/utils/utils.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Subtract-Green Transform
static void SubtractGreenFromBlueAndRed_AVX2(uint32_t* argb_data,
int num_pixels) {
int i;
const __m256i kCstShuffle = _mm256_set_epi8(
-1, 29, -1, 29, -1, 25, -1, 25, -1, 21, -1, 21, -1, 17, -1, 17, -1, 13,
-1, 13, -1, 9, -1, 9, -1, 5, -1, 5, -1, 1, -1, 1);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i in = _mm256_loadu_si256((__m256i*)&argb_data[i]); // argb
const __m256i in_0g0g = _mm256_shuffle_epi8(in, kCstShuffle);
const __m256i out = _mm256_sub_epi8(in, in_0g0g);
_mm256_storeu_si256((__m256i*)&argb_data[i], out);
}
// fallthrough and finish off with plain-SSE
if (i != num_pixels) {
VP8LSubtractGreenFromBlueAndRed_SSE(argb_data + i, num_pixels - i);
}
}
//------------------------------------------------------------------------------
// Color Transform
// For sign-extended multiplying constants, pre-shifted by 5:
#define CST_5b(X) (((int16_t)((uint16_t)(X) << 8)) >> 5)
#define MK_CST_16(HI, LO) \
_mm256_set1_epi32((int)(((uint32_t)(HI) << 16) | ((LO) & 0xffff)))
static void TransformColor_AVX2(const VP8LMultipliers* WEBP_RESTRICT const m,
uint32_t* WEBP_RESTRICT argb_data,
int num_pixels) {
const __m256i mults_rb =
MK_CST_16(CST_5b(m->green_to_red_), CST_5b(m->green_to_blue_));
const __m256i mults_b2 = MK_CST_16(CST_5b(m->red_to_blue_), 0);
const __m256i mask_rb = _mm256_set1_epi32(0x00ff00ff); // red-blue masks
const __m256i kCstShuffle = _mm256_set_epi8(
29, -1, 29, -1, 25, -1, 25, -1, 21, -1, 21, -1, 17, -1, 17, -1, 13, -1,
13, -1, 9, -1, 9, -1, 5, -1, 5, -1, 1, -1, 1, -1);
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i in = _mm256_loadu_si256((__m256i*)&argb_data[i]); // argb
const __m256i A = _mm256_shuffle_epi8(in, kCstShuffle); // g0g0
const __m256i B = _mm256_mulhi_epi16(A, mults_rb); // x dr x db1
const __m256i C = _mm256_slli_epi16(in, 8); // r 0 b 0
const __m256i D = _mm256_mulhi_epi16(C, mults_b2); // x db2 0 0
const __m256i E = _mm256_srli_epi32(D, 16); // 0 0 x db2
const __m256i F = _mm256_add_epi8(E, B); // x dr x db
const __m256i G = _mm256_and_si256(F, mask_rb); // 0 dr 0 db
const __m256i out = _mm256_sub_epi8(in, G);
_mm256_storeu_si256((__m256i*)&argb_data[i], out);
}
// fallthrough and finish off with plain-C
if (i != num_pixels) {
VP8LTransformColor_SSE(m, argb_data + i, num_pixels - i);
}
}
//------------------------------------------------------------------------------
#define SPAN 16
static void CollectColorBlueTransforms_AVX2(const uint32_t* WEBP_RESTRICT argb,
int stride, int tile_width,
int tile_height, int green_to_blue,
int red_to_blue, uint32_t histo[]) {
const __m256i mult =
MK_CST_16(CST_5b(red_to_blue) + 256, CST_5b(green_to_blue));
const __m256i perm = _mm256_setr_epi8(
-1, 1, -1, 2, -1, 5, -1, 6, -1, 9, -1, 10, -1, 13, -1, 14, -1, 17, -1, 18,
-1, 21, -1, 22, -1, 25, -1, 26, -1, 29, -1, 30);
if (tile_width >= 8) {
int y, i;
for (y = 0; y < tile_height; ++y) {
uint8_t values[32];
const uint32_t* const src = argb + y * stride;
const __m256i A1 = _mm256_loadu_si256((const __m256i*)src);
const __m256i B1 = _mm256_shuffle_epi8(A1, perm);
const __m256i C1 = _mm256_mulhi_epi16(B1, mult);
const __m256i D1 = _mm256_sub_epi16(A1, C1);
__m256i E = _mm256_add_epi16(_mm256_srli_epi32(D1, 16), D1);
int x;
for (x = 8; x + 8 <= tile_width; x += 8) {
const __m256i A2 = _mm256_loadu_si256((const __m256i*)(src + x));
__m256i B2, C2, D2;
_mm256_storeu_si256((__m256i*)values, E);
for (i = 0; i < 32; i += 4) ++histo[values[i]];
B2 = _mm256_shuffle_epi8(A2, perm);
C2 = _mm256_mulhi_epi16(B2, mult);
D2 = _mm256_sub_epi16(A2, C2);
E = _mm256_add_epi16(_mm256_srli_epi32(D2, 16), D2);
}
_mm256_storeu_si256((__m256i*)values, E);
for (i = 0; i < 32; i += 4) ++histo[values[i]];
}
}
{
const int left_over = tile_width & 7;
if (left_over > 0) {
VP8LCollectColorBlueTransforms_SSE(argb + tile_width - left_over, stride,
left_over, tile_height, green_to_blue,
red_to_blue, histo);
}
}
}
static void CollectColorRedTransforms_AVX2(const uint32_t* WEBP_RESTRICT argb,
int stride, int tile_width,
int tile_height, int green_to_red,
uint32_t histo[]) {
const __m256i mult = MK_CST_16(0, CST_5b(green_to_red));
const __m256i mask_g = _mm256_set1_epi32(0x0000ff00);
if (tile_width >= 8) {
int y, i;
for (y = 0; y < tile_height; ++y) {
uint8_t values[32];
const uint32_t* const src = argb + y * stride;
const __m256i A1 = _mm256_loadu_si256((const __m256i*)src);
const __m256i B1 = _mm256_and_si256(A1, mask_g);
const __m256i C1 = _mm256_madd_epi16(B1, mult);
__m256i D = _mm256_sub_epi16(A1, C1);
int x;
for (x = 8; x + 8 <= tile_width; x += 8) {
const __m256i A2 = _mm256_loadu_si256((const __m256i*)(src + x));
__m256i B2, C2;
_mm256_storeu_si256((__m256i*)values, D);
for (i = 2; i < 32; i += 4) ++histo[values[i]];
B2 = _mm256_and_si256(A2, mask_g);
C2 = _mm256_madd_epi16(B2, mult);
D = _mm256_sub_epi16(A2, C2);
}
_mm256_storeu_si256((__m256i*)values, D);
for (i = 2; i < 32; i += 4) ++histo[values[i]];
}
}
{
const int left_over = tile_width & 7;
if (left_over > 0) {
VP8LCollectColorRedTransforms_SSE(argb + tile_width - left_over, stride,
left_over, tile_height, green_to_red,
histo);
}
}
}
#undef SPAN
#undef MK_CST_16
//------------------------------------------------------------------------------
// Note we are adding uint32_t's as *signed* int32's (using _mm256_add_epi32).
// But that's ok since the histogram values are less than 1<<28 (max picture
// size).
static void AddVector_AVX2(const uint32_t* WEBP_RESTRICT a,
const uint32_t* WEBP_RESTRICT b,
uint32_t* WEBP_RESTRICT out, int size) {
int i = 0;
int aligned_size = size & ~31;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 32);
assert(size % 2 == 0);
do {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i a2 = _mm256_loadu_si256((const __m256i*)&a[i + 16]);
const __m256i a3 = _mm256_loadu_si256((const __m256i*)&a[i + 24]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&b[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&b[i + 8]);
const __m256i b2 = _mm256_loadu_si256((const __m256i*)&b[i + 16]);
const __m256i b3 = _mm256_loadu_si256((const __m256i*)&b[i + 24]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
_mm256_storeu_si256((__m256i*)&out[i + 16], _mm256_add_epi32(a2, b2));
_mm256_storeu_si256((__m256i*)&out[i + 24], _mm256_add_epi32(a3, b3));
i += 32;
} while (i != aligned_size);
if ((size & 16) != 0) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&b[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&b[i + 8]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
i += 16;
}
size &= 15;
if (size == 8) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&b[i]);
_mm256_storeu_si256((__m256i*)&out[i], _mm256_add_epi32(a0, b0));
} else {
for (; size--; ++i) {
out[i] = a[i] + b[i];
}
}
}
static void AddVectorEq_AVX2(const uint32_t* WEBP_RESTRICT a,
uint32_t* WEBP_RESTRICT out, int size) {
int i = 0;
int aligned_size = size & ~31;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 32);
assert(size % 2 == 0);
do {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i a2 = _mm256_loadu_si256((const __m256i*)&a[i + 16]);
const __m256i a3 = _mm256_loadu_si256((const __m256i*)&a[i + 24]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&out[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&out[i + 8]);
const __m256i b2 = _mm256_loadu_si256((const __m256i*)&out[i + 16]);
const __m256i b3 = _mm256_loadu_si256((const __m256i*)&out[i + 24]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
_mm256_storeu_si256((__m256i*)&out[i + 16], _mm256_add_epi32(a2, b2));
_mm256_storeu_si256((__m256i*)&out[i + 24], _mm256_add_epi32(a3, b3));
i += 32;
} while (i != aligned_size);
if ((size & 16) != 0) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&out[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&out[i + 8]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
i += 16;
}
size &= 15;
if (size == 8) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&out[i]);
_mm256_storeu_si256((__m256i*)&out[i], _mm256_add_epi32(a0, b0));
} else {
for (; size--; ++i) {
out[i] += a[i];
}
}
}
//------------------------------------------------------------------------------
// Entropy
#if !defined(WEBP_HAVE_SLOW_CLZ_CTZ)
static uint64_t CombinedShannonEntropy_AVX2(const uint32_t X[256],
const uint32_t Y[256]) {
int i;
uint64_t retval = 0;
uint32_t sumX = 0, sumXY = 0;
const __m256i zero = _mm256_setzero_si256();
for (i = 0; i < 256; i += 32) {
const __m256i x0 = _mm256_loadu_si256((const __m256i*)(X + i + 0));
const __m256i y0 = _mm256_loadu_si256((const __m256i*)(Y + i + 0));
const __m256i x1 = _mm256_loadu_si256((const __m256i*)(X + i + 8));
const __m256i y1 = _mm256_loadu_si256((const __m256i*)(Y + i + 8));
const __m256i x2 = _mm256_loadu_si256((const __m256i*)(X + i + 16));
const __m256i y2 = _mm256_loadu_si256((const __m256i*)(Y + i + 16));
const __m256i x3 = _mm256_loadu_si256((const __m256i*)(X + i + 24));
const __m256i y3 = _mm256_loadu_si256((const __m256i*)(Y + i + 24));
const __m256i x4 = _mm256_packs_epi16(_mm256_packs_epi32(x0, x1),
_mm256_packs_epi32(x2, x3));
const __m256i y4 = _mm256_packs_epi16(_mm256_packs_epi32(y0, y1),
_mm256_packs_epi32(y2, y3));
// Packed pixels are actually in order: ... 17 16 12 11 10 9 8 3 2 1 0
const __m256i x5 = _mm256_permutevar8x32_epi32(
x4, _mm256_set_epi32(7, 3, 6, 2, 5, 1, 4, 0));
const __m256i y5 = _mm256_permutevar8x32_epi32(
y4, _mm256_set_epi32(7, 3, 6, 2, 5, 1, 4, 0));
const uint32_t mx =
(uint32_t)_mm256_movemask_epi8(_mm256_cmpgt_epi8(x5, zero));
uint32_t my =
(uint32_t)_mm256_movemask_epi8(_mm256_cmpgt_epi8(y5, zero)) | mx;
while (my) {
const int32_t j = BitsCtz(my);
uint32_t xy;
if ((mx >> j) & 1) {
const int x = X[i + j];
sumXY += x;
retval += VP8LFastSLog2(x);
}
xy = X[i + j] + Y[i + j];
sumX += xy;
retval += VP8LFastSLog2(xy);
my &= my - 1;
}
}
retval = VP8LFastSLog2(sumX) + VP8LFastSLog2(sumXY) - retval;
return retval;
}
#else
#define DONT_USE_COMBINED_SHANNON_ENTROPY_SSE2_FUNC // won't be faster
#endif
//------------------------------------------------------------------------------
static int VectorMismatch_AVX2(const uint32_t* const array1,
const uint32_t* const array2, int length) {
int match_len;
if (length >= 24) {
__m256i A0 = _mm256_loadu_si256((const __m256i*)&array1[0]);
__m256i A1 = _mm256_loadu_si256((const __m256i*)&array2[0]);
match_len = 0;
do {
// Loop unrolling and early load both provide a speedup of 10% for the
// current function. Also, max_limit can be MAX_LENGTH=4096 at most.
const __m256i cmpA = _mm256_cmpeq_epi32(A0, A1);
const __m256i B0 =
_mm256_loadu_si256((const __m256i*)&array1[match_len + 8]);
const __m256i B1 =
_mm256_loadu_si256((const __m256i*)&array2[match_len + 8]);
if ((uint32_t)_mm256_movemask_epi8(cmpA) != 0xffffffff) break;
match_len += 8;
{
const __m256i cmpB = _mm256_cmpeq_epi32(B0, B1);
A0 = _mm256_loadu_si256((const __m256i*)&array1[match_len + 8]);
A1 = _mm256_loadu_si256((const __m256i*)&array2[match_len + 8]);
if ((uint32_t)_mm256_movemask_epi8(cmpB) != 0xffffffff) break;
match_len += 8;
}
} while (match_len + 24 < length);
} else {
match_len = 0;
// Unroll the potential first two loops.
if (length >= 8 &&
(uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi32(
_mm256_loadu_si256((const __m256i*)&array1[0]),
_mm256_loadu_si256((const __m256i*)&array2[0]))) == 0xffffffff) {
match_len = 8;
if (length >= 16 &&
(uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi32(
_mm256_loadu_si256((const __m256i*)&array1[8]),
_mm256_loadu_si256((const __m256i*)&array2[8]))) == 0xffffffff) {
match_len = 16;
}
}
}
while (match_len < length && array1[match_len] == array2[match_len]) {
++match_len;
}
return match_len;
}
// Bundles multiple (1, 2, 4 or 8) pixels into a single pixel.
static void BundleColorMap_AVX2(const uint8_t* WEBP_RESTRICT const row,
int width, int xbits,
uint32_t* WEBP_RESTRICT dst) {
int x = 0;
assert(xbits >= 0);
assert(xbits <= 3);
switch (xbits) {
case 0: {
const __m256i ff = _mm256_set1_epi16((short)0xff00);
const __m256i zero = _mm256_setzero_si256();
// Store 0xff000000 | (row[x] << 8).
for (x = 0; x + 32 <= width; x += 32, dst += 32) {
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i in_lo = _mm256_unpacklo_epi8(zero, in);
const __m256i dst0 = _mm256_unpacklo_epi16(in_lo, ff);
const __m256i dst1 = _mm256_unpackhi_epi16(in_lo, ff);
const __m256i in_hi = _mm256_unpackhi_epi8(zero, in);
const __m256i dst2 = _mm256_unpacklo_epi16(in_hi, ff);
const __m256i dst3 = _mm256_unpackhi_epi16(in_hi, ff);
_mm256_storeu2_m128i((__m128i*)&dst[16], (__m128i*)&dst[0], dst0);
_mm256_storeu2_m128i((__m128i*)&dst[20], (__m128i*)&dst[4], dst1);
_mm256_storeu2_m128i((__m128i*)&dst[24], (__m128i*)&dst[8], dst2);
_mm256_storeu2_m128i((__m128i*)&dst[28], (__m128i*)&dst[12], dst3);
}
break;
}
case 1: {
const __m256i ff = _mm256_set1_epi16((short)0xff00);
const __m256i mul = _mm256_set1_epi16(0x110);
for (x = 0; x + 32 <= width; x += 32, dst += 16) {
// 0a0b | (where a/b are 4 bits).
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i tmp = _mm256_mullo_epi16(in, mul); // aba0
const __m256i pack = _mm256_and_si256(tmp, ff); // ab00
const __m256i dst0 = _mm256_unpacklo_epi16(pack, ff);
const __m256i dst1 = _mm256_unpackhi_epi16(pack, ff);
_mm256_storeu2_m128i((__m128i*)&dst[8], (__m128i*)&dst[0], dst0);
_mm256_storeu2_m128i((__m128i*)&dst[12], (__m128i*)&dst[4], dst1);
}
break;
}
case 2: {
const __m256i mask_or = _mm256_set1_epi32((int)0xff000000);
const __m256i mul_cst = _mm256_set1_epi16(0x0104);
const __m256i mask_mul = _mm256_set1_epi16(0x0f00);
for (x = 0; x + 32 <= width; x += 32, dst += 8) {
// 000a000b000c000d | (where a/b/c/d are 2 bits).
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i mul =
_mm256_mullo_epi16(in, mul_cst); // 00ab00b000cd00d0
const __m256i tmp =
_mm256_and_si256(mul, mask_mul); // 00ab000000cd0000
const __m256i shift = _mm256_srli_epi32(tmp, 12); // 00000000ab000000
const __m256i pack = _mm256_or_si256(shift, tmp); // 00000000abcd0000
// Convert to 0xff00**00.
const __m256i res = _mm256_or_si256(pack, mask_or);
_mm256_storeu_si256((__m256i*)dst, res);
}
break;
}
default: {
assert(xbits == 3);
for (x = 0; x + 32 <= width; x += 32, dst += 4) {
// 0000000a00000000b... | (where a/b are 1 bit).
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i shift = _mm256_slli_epi64(in, 7);
const uint32_t move = _mm256_movemask_epi8(shift);
dst[0] = 0xff000000 | ((move & 0xff) << 8);
dst[1] = 0xff000000 | (move & 0xff00);
dst[2] = 0xff000000 | ((move & 0xff0000) >> 8);
dst[3] = 0xff000000 | ((move & 0xff000000) >> 16);
}
break;
}
}
if (x != width) {
VP8LBundleColorMap_SSE(row + x, width - x, xbits, dst);
}
}
//------------------------------------------------------------------------------
// Batch version of Predictor Transform subtraction
static WEBP_INLINE void Average2_m256i(const __m256i* const a0,
const __m256i* const a1,
__m256i* const avg) {
// (a + b) >> 1 = ((a + b + 1) >> 1) - ((a ^ b) & 1)
const __m256i ones = _mm256_set1_epi8(1);
const __m256i avg1 = _mm256_avg_epu8(*a0, *a1);
const __m256i one = _mm256_and_si256(_mm256_xor_si256(*a0, *a1), ones);
*avg = _mm256_sub_epi8(avg1, one);
}
// Predictor0: ARGB_BLACK.
static void PredictorSub0_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i black = _mm256_set1_epi32((int)ARGB_BLACK);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i res = _mm256_sub_epi8(src, black);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[0](in + i, NULL, num_pixels - i, out + i);
}
(void)upper;
}
#define GENERATE_PREDICTOR_1(X, IN) \
static void PredictorSub##X##_AVX2( \
const uint32_t* const in, const uint32_t* const upper, int num_pixels, \
uint32_t* WEBP_RESTRICT const out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
const __m256i pred = _mm256_loadu_si256((const __m256i*)&(IN)); \
const __m256i res = _mm256_sub_epi8(src, pred); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsSub_SSE[(X)](in + i, WEBP_OFFSET_PTR(upper, i), \
num_pixels - i, out + i); \
} \
}
GENERATE_PREDICTOR_1(1, in[i - 1]) // Predictor1: L
GENERATE_PREDICTOR_1(2, upper[i]) // Predictor2: T
GENERATE_PREDICTOR_1(3, upper[i + 1]) // Predictor3: TR
GENERATE_PREDICTOR_1(4, upper[i - 1]) // Predictor4: TL
#undef GENERATE_PREDICTOR_1
// Predictor5: avg2(avg2(L, TR), T)
static void PredictorSub5_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TR = _mm256_loadu_si256((const __m256i*)&upper[i + 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
__m256i avg, pred, res;
Average2_m256i(&L, &TR, &avg);
Average2_m256i(&avg, &T, &pred);
res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[5](in + i, upper + i, num_pixels - i, out + i);
}
}
#define GENERATE_PREDICTOR_2(X, A, B) \
static void PredictorSub##X##_AVX2(const uint32_t* in, \
const uint32_t* upper, int num_pixels, \
uint32_t* WEBP_RESTRICT out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i tA = _mm256_loadu_si256((const __m256i*)&(A)); \
const __m256i tB = _mm256_loadu_si256((const __m256i*)&(B)); \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
__m256i pred, res; \
Average2_m256i(&tA, &tB, &pred); \
res = _mm256_sub_epi8(src, pred); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsSub_SSE[(X)](in + i, upper + i, num_pixels - i, out + i); \
} \
}
GENERATE_PREDICTOR_2(6, in[i - 1], upper[i - 1]) // Predictor6: avg(L, TL)
GENERATE_PREDICTOR_2(7, in[i - 1], upper[i]) // Predictor7: avg(L, T)
GENERATE_PREDICTOR_2(8, upper[i - 1], upper[i]) // Predictor8: avg(TL, T)
GENERATE_PREDICTOR_2(9, upper[i], upper[i + 1]) // Predictor9: average(T, TR)
#undef GENERATE_PREDICTOR_2
// Predictor10: avg(avg(L,TL), avg(T, TR)).
static void PredictorSub10_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TR = _mm256_loadu_si256((const __m256i*)&upper[i + 1]);
__m256i avgTTR, avgLTL, avg, res;
Average2_m256i(&T, &TR, &avgTTR);
Average2_m256i(&L, &TL, &avgLTL);
Average2_m256i(&avgTTR, &avgLTL, &avg);
res = _mm256_sub_epi8(src, avg);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[10](in + i, upper + i, num_pixels - i, out + i);
}
}
// Predictor11: select.
static void GetSumAbsDiff32_AVX2(const __m256i* const A, const __m256i* const B,
__m256i* const out) {
// We can unpack with any value on the upper 32 bits, provided it's the same
// on both operands (to that their sum of abs diff is zero). Here we use *A.
const __m256i A_lo = _mm256_unpacklo_epi32(*A, *A);
const __m256i B_lo = _mm256_unpacklo_epi32(*B, *A);
const __m256i A_hi = _mm256_unpackhi_epi32(*A, *A);
const __m256i B_hi = _mm256_unpackhi_epi32(*B, *A);
const __m256i s_lo = _mm256_sad_epu8(A_lo, B_lo);
const __m256i s_hi = _mm256_sad_epu8(A_hi, B_hi);
*out = _mm256_packs_epi32(s_lo, s_hi);
}
static void PredictorSub11_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
__m256i pa, pb;
GetSumAbsDiff32_AVX2(&T, &TL, &pa); // pa = sum |T-TL|
GetSumAbsDiff32_AVX2(&L, &TL, &pb); // pb = sum |L-TL|
{
const __m256i mask = _mm256_cmpgt_epi32(pb, pa);
const __m256i A = _mm256_and_si256(mask, L);
const __m256i B = _mm256_andnot_si256(mask, T);
const __m256i pred = _mm256_or_si256(A, B); // pred = (L > T)? L : T
const __m256i res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[11](in + i, upper + i, num_pixels - i, out + i);
}
}
// Predictor12: ClampedSubSubtractFull.
static void PredictorSub12_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i zero = _mm256_setzero_si256();
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i L_lo = _mm256_unpacklo_epi8(L, zero);
const __m256i L_hi = _mm256_unpackhi_epi8(L, zero);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i T_lo = _mm256_unpacklo_epi8(T, zero);
const __m256i T_hi = _mm256_unpackhi_epi8(T, zero);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i TL_lo = _mm256_unpacklo_epi8(TL, zero);
const __m256i TL_hi = _mm256_unpackhi_epi8(TL, zero);
const __m256i diff_lo = _mm256_sub_epi16(T_lo, TL_lo);
const __m256i diff_hi = _mm256_sub_epi16(T_hi, TL_hi);
const __m256i pred_lo = _mm256_add_epi16(L_lo, diff_lo);
const __m256i pred_hi = _mm256_add_epi16(L_hi, diff_hi);
const __m256i pred = _mm256_packus_epi16(pred_lo, pred_hi);
const __m256i res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[12](in + i, upper + i, num_pixels - i, out + i);
}
}
// Predictors13: ClampedAddSubtractHalf
static void PredictorSub13_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i zero = _mm256_setzero_si256();
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
// lo.
const __m256i L_lo = _mm256_unpacklo_epi8(L, zero);
const __m256i T_lo = _mm256_unpacklo_epi8(T, zero);
const __m256i TL_lo = _mm256_unpacklo_epi8(TL, zero);
const __m256i sum_lo = _mm256_add_epi16(T_lo, L_lo);
const __m256i avg_lo = _mm256_srli_epi16(sum_lo, 1);
const __m256i A1_lo = _mm256_sub_epi16(avg_lo, TL_lo);
const __m256i bit_fix_lo = _mm256_cmpgt_epi16(TL_lo, avg_lo);
const __m256i A2_lo = _mm256_sub_epi16(A1_lo, bit_fix_lo);
const __m256i A3_lo = _mm256_srai_epi16(A2_lo, 1);
const __m256i A4_lo = _mm256_add_epi16(avg_lo, A3_lo);
// hi.
const __m256i L_hi = _mm256_unpackhi_epi8(L, zero);
const __m256i T_hi = _mm256_unpackhi_epi8(T, zero);
const __m256i TL_hi = _mm256_unpackhi_epi8(TL, zero);
const __m256i sum_hi = _mm256_add_epi16(T_hi, L_hi);
const __m256i avg_hi = _mm256_srli_epi16(sum_hi, 1);
const __m256i A1_hi = _mm256_sub_epi16(avg_hi, TL_hi);
const __m256i bit_fix_hi = _mm256_cmpgt_epi16(TL_hi, avg_hi);
const __m256i A2_hi = _mm256_sub_epi16(A1_hi, bit_fix_hi);
const __m256i A3_hi = _mm256_srai_epi16(A2_hi, 1);
const __m256i A4_hi = _mm256_add_epi16(avg_hi, A3_hi);
const __m256i pred = _mm256_packus_epi16(A4_lo, A4_hi);
const __m256i res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[13](in + i, upper + i, num_pixels - i, out + i);
}
}
//------------------------------------------------------------------------------
// Entry point
extern void VP8LEncDspInitAVX2(void);
WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitAVX2(void) {
VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed_AVX2;
VP8LTransformColor = TransformColor_AVX2;
VP8LCollectColorBlueTransforms = CollectColorBlueTransforms_AVX2;
VP8LCollectColorRedTransforms = CollectColorRedTransforms_AVX2;
VP8LAddVector = AddVector_AVX2;
VP8LAddVectorEq = AddVectorEq_AVX2;
VP8LCombinedShannonEntropy = CombinedShannonEntropy_AVX2;
VP8LVectorMismatch = VectorMismatch_AVX2;
VP8LBundleColorMap = BundleColorMap_AVX2;
VP8LPredictorsSub[0] = PredictorSub0_AVX2;
VP8LPredictorsSub[1] = PredictorSub1_AVX2;
VP8LPredictorsSub[2] = PredictorSub2_AVX2;
VP8LPredictorsSub[3] = PredictorSub3_AVX2;
VP8LPredictorsSub[4] = PredictorSub4_AVX2;
VP8LPredictorsSub[5] = PredictorSub5_AVX2;
VP8LPredictorsSub[6] = PredictorSub6_AVX2;
VP8LPredictorsSub[7] = PredictorSub7_AVX2;
VP8LPredictorsSub[8] = PredictorSub8_AVX2;
VP8LPredictorsSub[9] = PredictorSub9_AVX2;
VP8LPredictorsSub[10] = PredictorSub10_AVX2;
VP8LPredictorsSub[11] = PredictorSub11_AVX2;
VP8LPredictorsSub[12] = PredictorSub12_AVX2;
VP8LPredictorsSub[13] = PredictorSub13_AVX2;
VP8LPredictorsSub[14] = PredictorSub0_AVX2; // <- padding security sentinels
VP8LPredictorsSub[15] = PredictorSub0_AVX2;
}
#else // !WEBP_USE_AVX2
WEBP_DSP_INIT_STUB(VP8LEncDspInitAVX2)
#endif // WEBP_USE_AVX2

View File

@ -133,60 +133,6 @@ static uint32_t ExtraCost_MIPS32(const uint32_t* const population, int length) {
return ((int64_t)temp0 << 32 | temp1);
}
// C version of this function:
// int i = 0;
// int64_t cost = 0;
// const uint32_t* pX = &X[4];
// const uint32_t* pY = &Y[4];
// const uint32_t* LoopEnd = &X[length];
// while (pX != LoopEnd) {
// const uint32_t xy0 = *pX + *pY;
// const uint32_t xy1 = *(pX + 1) + *(pY + 1);
// ++i;
// cost += i * xy0;
// cost += i * xy1;
// pX += 2;
// pY += 2;
// }
// return cost;
static uint32_t ExtraCostCombined_MIPS32(const uint32_t* WEBP_RESTRICT const X,
const uint32_t* WEBP_RESTRICT const Y,
int length) {
int i, temp0, temp1, temp2, temp3;
const uint32_t* pX = &X[4];
const uint32_t* pY = &Y[4];
const uint32_t* const LoopEnd = &X[length];
__asm__ volatile(
"mult $zero, $zero \n\t"
"xor %[i], %[i], %[i] \n\t"
"beq %[pX], %[LoopEnd], 2f \n\t"
"1: \n\t"
"lw %[temp0], 0(%[pX]) \n\t"
"lw %[temp1], 0(%[pY]) \n\t"
"lw %[temp2], 4(%[pX]) \n\t"
"lw %[temp3], 4(%[pY]) \n\t"
"addiu %[i], %[i], 1 \n\t"
"addu %[temp0], %[temp0], %[temp1] \n\t"
"addu %[temp2], %[temp2], %[temp3] \n\t"
"addiu %[pX], %[pX], 8 \n\t"
"addiu %[pY], %[pY], 8 \n\t"
"madd %[i], %[temp0] \n\t"
"madd %[i], %[temp2] \n\t"
"bne %[pX], %[LoopEnd], 1b \n\t"
"2: \n\t"
"mfhi %[temp0] \n\t"
"mflo %[temp1] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1),
[temp2]"=&r"(temp2), [temp3]"=&r"(temp3),
[i]"=&r"(i), [pX]"+r"(pX), [pY]"+r"(pY)
: [LoopEnd]"r"(LoopEnd)
: "memory", "hi", "lo"
);
return ((int64_t)temp0 << 32 | temp1);
}
#define HUFFMAN_COST_PASS \
__asm__ volatile( \
"sll %[temp1], %[temp0], 3 \n\t" \
@ -388,7 +334,6 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitMIPS32(void) {
VP8LFastSLog2Slow = FastSLog2Slow_MIPS32;
VP8LFastLog2Slow = FastLog2Slow_MIPS32;
VP8LExtraCost = ExtraCost_MIPS32;
VP8LExtraCostCombined = ExtraCostCombined_MIPS32;
VP8LGetEntropyUnrefined = GetEntropyUnrefined_MIPS32;
VP8LGetCombinedEntropyUnrefined = GetCombinedEntropyUnrefined_MIPS32;
VP8LAddVector = AddVector_MIPS32;

View File

@ -14,11 +14,16 @@
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_SSE2)
#include <assert.h>
#include <emmintrin.h>
#include <string.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/common_sse2.h"
#include "src/dsp/lossless_common.h"
#include "src/utils/utils.h"
#include "src/webp/types.h"
// For sign-extended multiplying constants, pre-shifted by 5:
#define CST_5b(X) (((int16_t)((uint16_t)(X) << 8)) >> 5)
@ -175,64 +180,102 @@ static void CollectColorRedTransforms_SSE2(const uint32_t* WEBP_RESTRICT argb,
// Note we are adding uint32_t's as *signed* int32's (using _mm_add_epi32). But
// that's ok since the histogram values are less than 1<<28 (max picture size).
#define LINE_SIZE 16 // 8 or 16
static void AddVector_SSE2(const uint32_t* WEBP_RESTRICT a,
const uint32_t* WEBP_RESTRICT b,
uint32_t* WEBP_RESTRICT out, int size) {
int i;
for (i = 0; i + LINE_SIZE <= size; i += LINE_SIZE) {
int i = 0;
int aligned_size = size & ~15;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 16);
assert(size % 2 == 0);
do {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
#if (LINE_SIZE == 16)
const __m128i a2 = _mm_loadu_si128((const __m128i*)&a[i + 8]);
const __m128i a3 = _mm_loadu_si128((const __m128i*)&a[i + 12]);
#endif
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
#if (LINE_SIZE == 16)
const __m128i b2 = _mm_loadu_si128((const __m128i*)&b[i + 8]);
const __m128i b3 = _mm_loadu_si128((const __m128i*)&b[i + 12]);
#endif
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
#if (LINE_SIZE == 16)
_mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2));
_mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3));
#endif
i += 16;
} while (i != aligned_size);
if ((size & 8) != 0) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
i += 8;
}
for (; i < size; ++i) {
out[i] = a[i] + b[i];
size &= 7;
if (size == 4) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i]);
_mm_storeu_si128((__m128i*)&out[i], _mm_add_epi32(a0, b0));
} else if (size == 2) {
const __m128i a0 = _mm_loadl_epi64((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadl_epi64((const __m128i*)&b[i]);
_mm_storel_epi64((__m128i*)&out[i], _mm_add_epi32(a0, b0));
}
}
static void AddVectorEq_SSE2(const uint32_t* WEBP_RESTRICT a,
uint32_t* WEBP_RESTRICT out, int size) {
int i;
for (i = 0; i + LINE_SIZE <= size; i += LINE_SIZE) {
int i = 0;
int aligned_size = size & ~15;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 16);
assert(size % 2 == 0);
do {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
#if (LINE_SIZE == 16)
const __m128i a2 = _mm_loadu_si128((const __m128i*)&a[i + 8]);
const __m128i a3 = _mm_loadu_si128((const __m128i*)&a[i + 12]);
#endif
const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&out[i + 4]);
#if (LINE_SIZE == 16)
const __m128i b2 = _mm_loadu_si128((const __m128i*)&out[i + 8]);
const __m128i b3 = _mm_loadu_si128((const __m128i*)&out[i + 12]);
#endif
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
#if (LINE_SIZE == 16)
_mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2));
_mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3));
#endif
i += 16;
} while (i != aligned_size);
if ((size & 8) != 0) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&out[i + 4]);
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
i += 8;
}
for (; i < size; ++i) {
out[i] += a[i];
size &= 7;
if (size == 4) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i]);
_mm_storeu_si128((__m128i*)&out[i], _mm_add_epi32(a0, b0));
} else if (size == 2) {
const __m128i a0 = _mm_loadl_epi64((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadl_epi64((const __m128i*)&out[i]);
_mm_storel_epi64((__m128i*)&out[i], _mm_add_epi32(a0, b0));
}
}
#undef LINE_SIZE
//------------------------------------------------------------------------------
// Entropy
@ -607,25 +650,43 @@ static void PredictorSub13_SSE2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m128i zero = _mm_setzero_si128();
for (i = 0; i + 2 <= num_pixels; i += 2) {
// we can only process two pixels at a time
const __m128i L = _mm_loadl_epi64((const __m128i*)&in[i - 1]);
const __m128i src = _mm_loadl_epi64((const __m128i*)&in[i]);
const __m128i T = _mm_loadl_epi64((const __m128i*)&upper[i]);
const __m128i TL = _mm_loadl_epi64((const __m128i*)&upper[i - 1]);
const __m128i L_lo = _mm_unpacklo_epi8(L, zero);
const __m128i T_lo = _mm_unpacklo_epi8(T, zero);
const __m128i TL_lo = _mm_unpacklo_epi8(TL, zero);
const __m128i sum = _mm_add_epi16(T_lo, L_lo);
const __m128i avg = _mm_srli_epi16(sum, 1);
const __m128i A1 = _mm_sub_epi16(avg, TL_lo);
const __m128i bit_fix = _mm_cmpgt_epi16(TL_lo, avg);
const __m128i A2 = _mm_sub_epi16(A1, bit_fix);
const __m128i A3 = _mm_srai_epi16(A2, 1);
const __m128i A4 = _mm_add_epi16(avg, A3);
const __m128i pred = _mm_packus_epi16(A4, A4);
const __m128i res = _mm_sub_epi8(src, pred);
_mm_storel_epi64((__m128i*)&out[i], res);
for (i = 0; i + 4 <= num_pixels; i += 4) {
const __m128i L = _mm_loadu_si128((const __m128i*)&in[i - 1]);
const __m128i src = _mm_loadu_si128((const __m128i*)&in[i]);
const __m128i T = _mm_loadu_si128((const __m128i*)&upper[i]);
const __m128i TL = _mm_loadu_si128((const __m128i*)&upper[i - 1]);
__m128i A4_lo, A4_hi;
// lo.
{
const __m128i L_lo = _mm_unpacklo_epi8(L, zero);
const __m128i T_lo = _mm_unpacklo_epi8(T, zero);
const __m128i TL_lo = _mm_unpacklo_epi8(TL, zero);
const __m128i sum_lo = _mm_add_epi16(T_lo, L_lo);
const __m128i avg_lo = _mm_srli_epi16(sum_lo, 1);
const __m128i A1_lo = _mm_sub_epi16(avg_lo, TL_lo);
const __m128i bit_fix_lo = _mm_cmpgt_epi16(TL_lo, avg_lo);
const __m128i A2_lo = _mm_sub_epi16(A1_lo, bit_fix_lo);
const __m128i A3_lo = _mm_srai_epi16(A2_lo, 1);
A4_lo = _mm_add_epi16(avg_lo, A3_lo);
}
// hi.
{
const __m128i L_hi = _mm_unpackhi_epi8(L, zero);
const __m128i T_hi = _mm_unpackhi_epi8(T, zero);
const __m128i TL_hi = _mm_unpackhi_epi8(TL, zero);
const __m128i sum_hi = _mm_add_epi16(T_hi, L_hi);
const __m128i avg_hi = _mm_srli_epi16(sum_hi, 1);
const __m128i A1_hi = _mm_sub_epi16(avg_hi, TL_hi);
const __m128i bit_fix_hi = _mm_cmpgt_epi16(TL_hi, avg_hi);
const __m128i A2_hi = _mm_sub_epi16(A1_hi, bit_fix_hi);
const __m128i A3_hi = _mm_srai_epi16(A2_hi, 1);
A4_hi = _mm_add_epi16(avg_hi, A3_hi);
}
{
const __m128i pred = _mm_packus_epi16(A4_lo, A4_hi);
const __m128i res = _mm_sub_epi8(src, pred);
_mm_storeu_si128((__m128i*)&out[i], res);
}
}
if (i != num_pixels) {
VP8LPredictorsSub_C[13](in + i, upper + i, num_pixels - i, out + i);
@ -666,6 +727,15 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE2(void) {
VP8LPredictorsSub[13] = PredictorSub13_SSE2;
VP8LPredictorsSub[14] = PredictorSub0_SSE2; // <- padding security sentinels
VP8LPredictorsSub[15] = PredictorSub0_SSE2;
// SSE exports for AVX and above.
VP8LSubtractGreenFromBlueAndRed_SSE = SubtractGreenFromBlueAndRed_SSE2;
VP8LTransformColor_SSE = TransformColor_SSE2;
VP8LCollectColorBlueTransforms_SSE = CollectColorBlueTransforms_SSE2;
VP8LCollectColorRedTransforms_SSE = CollectColorRedTransforms_SSE2;
VP8LBundleColorMap_SSE = BundleColorMap_SSE2;
memcpy(VP8LPredictorsSub_SSE, VP8LPredictorsSub, sizeof(VP8LPredictorsSub));
}
#else // !WEBP_USE_SSE2

View File

@ -14,9 +14,13 @@
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_SSE41)
#include <assert.h>
#include <smmintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Cost operations.
@ -44,29 +48,6 @@ static uint32_t ExtraCost_SSE41(const uint32_t* const a, int length) {
return HorizontalSum_SSE41(cost);
}
static uint32_t ExtraCostCombined_SSE41(const uint32_t* WEBP_RESTRICT const a,
const uint32_t* WEBP_RESTRICT const b,
int length) {
int i;
__m128i cost = _mm_add_epi32(_mm_set_epi32(2 * a[7], 2 * a[6], a[5], a[4]),
_mm_set_epi32(2 * b[7], 2 * b[6], b[5], b[4]));
assert(length % 8 == 0);
for (i = 8; i + 8 <= length; i += 8) {
const int j = (i - 2) >> 1;
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
const __m128i w = _mm_set_epi32(j + 3, j + 2, j + 1, j);
const __m128i a2 = _mm_hadd_epi32(a0, a1);
const __m128i b2 = _mm_hadd_epi32(b0, b1);
const __m128i mul = _mm_mullo_epi32(_mm_add_epi32(a2, b2), w);
cost = _mm_add_epi32(mul, cost);
}
return HorizontalSum_SSE41(cost);
}
//------------------------------------------------------------------------------
// Subtract-Green Transform
@ -195,10 +176,14 @@ extern void VP8LEncDspInitSSE41(void);
WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE41(void) {
VP8LExtraCost = ExtraCost_SSE41;
VP8LExtraCostCombined = ExtraCostCombined_SSE41;
VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed_SSE41;
VP8LCollectColorBlueTransforms = CollectColorBlueTransforms_SSE41;
VP8LCollectColorRedTransforms = CollectColorRedTransforms_SSE41;
// SSE exports for AVX and above.
VP8LSubtractGreenFromBlueAndRed_SSE = SubtractGreenFromBlueAndRed_SSE41;
VP8LCollectColorBlueTransforms_SSE = CollectColorBlueTransforms_SSE41;
VP8LCollectColorRedTransforms_SSE = CollectColorRedTransforms_SSE41;
}
#else // !WEBP_USE_SSE41

View File

@ -15,10 +15,14 @@
#if defined(WEBP_USE_SSE2)
#include <emmintrin.h>
#include <string.h>
#include "src/dsp/common_sse2.h"
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
#include <emmintrin.h>
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Predictor Transform
@ -707,6 +711,15 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LDspInitSSE2(void) {
VP8LConvertBGRAToRGBA4444 = ConvertBGRAToRGBA4444_SSE2;
VP8LConvertBGRAToRGB565 = ConvertBGRAToRGB565_SSE2;
VP8LConvertBGRAToBGR = ConvertBGRAToBGR_SSE2;
// SSE exports for AVX and above.
memcpy(VP8LPredictorsAdd_SSE, VP8LPredictorsAdd, sizeof(VP8LPredictorsAdd));
VP8LAddGreenToBlueAndRed_SSE = AddGreenToBlueAndRed_SSE2;
VP8LTransformColorInverse_SSE = TransformColorInverse_SSE2;
VP8LConvertBGRAToRGB_SSE = ConvertBGRAToRGB_SSE2;
VP8LConvertBGRAToRGBA_SSE = ConvertBGRAToRGBA_SSE2;
}
#else // !WEBP_USE_SSE2

View File

@ -13,9 +13,10 @@
#if defined(WEBP_USE_SSE41)
#include "src/dsp/common_sse41.h"
#include <smmintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
//------------------------------------------------------------------------------
// Color-space conversion functions
@ -124,6 +125,10 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LDspInitSSE41(void) {
VP8LTransformColorInverse = TransformColorInverse_SSE41;
VP8LConvertBGRAToRGB = ConvertBGRAToRGB_SSE41;
VP8LConvertBGRAToBGR = ConvertBGRAToBGR_SSE41;
// SSE exports for AVX and above.
VP8LTransformColorInverse_SSE = TransformColorInverse_SSE41;
VP8LConvertBGRAToRGB_SSE = ConvertBGRAToRGB_SSE41;
}
#else // !WEBP_USE_SSE41

View File

@ -11,15 +11,15 @@
//
// The exact naming is Y'CbCr, following the ITU-R BT.601 standard.
// More information at: https://en.wikipedia.org/wiki/YCbCr
// Y = 0.2569 * R + 0.5044 * G + 0.0979 * B + 16
// U = -0.1483 * R - 0.2911 * G + 0.4394 * B + 128
// V = 0.4394 * R - 0.3679 * G - 0.0715 * B + 128
// Y = 0.2568 * R + 0.5041 * G + 0.0979 * B + 16
// U = -0.1482 * R - 0.2910 * G + 0.4392 * B + 128
// V = 0.4392 * R - 0.3678 * G - 0.0714 * B + 128
// We use 16bit fixed point operations for RGB->YUV conversion (YUV_FIX).
//
// For the Y'CbCr to RGB conversion, the BT.601 specification reads:
// R = 1.164 * (Y-16) + 1.596 * (V-128)
// G = 1.164 * (Y-16) - 0.813 * (V-128) - 0.391 * (U-128)
// B = 1.164 * (Y-16) + 2.018 * (U-128)
// G = 1.164 * (Y-16) - 0.813 * (V-128) - 0.392 * (U-128)
// B = 1.164 * (Y-16) + 2.017 * (U-128)
// where Y is in the [16,235] range, and U/V in the [16,240] range.
//
// The fixed-point implementation used here is:

View File

@ -401,10 +401,8 @@ WEBP_NODISCARD static int GetCombinedHistogramEntropy(
*cost = GetCombinedEntropy(a->literal_, b->literal_,
VP8LHistogramNumCodes(palette_code_bits),
a->is_used_[0], b->is_used_[0], 0);
*cost += (uint64_t)VP8LExtraCostCombined(a->literal_ + NUM_LITERAL_CODES,
b->literal_ + NUM_LITERAL_CODES,
NUM_LENGTH_CODES)
<< LOG_2_PRECISION_BITS;
// No need to add the extra cost as it is a constant that does not influence
// the histograms.
if (*cost >= cost_threshold) return 0;
if (a->trivial_symbol_ != VP8L_NON_TRIVIAL_SYM &&
@ -434,9 +432,8 @@ WEBP_NODISCARD static int GetCombinedHistogramEntropy(
*cost += GetCombinedEntropy(a->distance_, b->distance_, NUM_DISTANCE_CODES,
a->is_used_[4], b->is_used_[4], 0);
*cost += (uint64_t)VP8LExtraCostCombined(a->distance_, b->distance_,
NUM_DISTANCE_CODES)
<< LOG_2_PRECISION_BITS;
// No need to add the extra cost as it is a constant that does not influence
// the histograms.
if (*cost >= cost_threshold) return 0;
return 1;
@ -528,16 +525,13 @@ static void UpdateHistogramCost(VP8LHistogram* const h) {
uint32_t alpha_sym, red_sym, blue_sym;
const uint64_t alpha_cost =
PopulationCost(h->alpha_, NUM_LITERAL_CODES, &alpha_sym, &h->is_used_[3]);
// No need to add the extra cost as it is a constant that does not influence
// the histograms.
const uint64_t distance_cost =
PopulationCost(h->distance_, NUM_DISTANCE_CODES, NULL, &h->is_used_[4]) +
((uint64_t)VP8LExtraCost(h->distance_, NUM_DISTANCE_CODES)
<< LOG_2_PRECISION_BITS);
PopulationCost(h->distance_, NUM_DISTANCE_CODES, NULL, &h->is_used_[4]);
const int num_codes = VP8LHistogramNumCodes(h->palette_code_bits_);
h->literal_cost_ =
PopulationCost(h->literal_, num_codes, NULL, &h->is_used_[0]) +
((uint64_t)VP8LExtraCost(h->literal_ + NUM_LITERAL_CODES,
NUM_LENGTH_CODES)
<< LOG_2_PRECISION_BITS);
PopulationCost(h->literal_, num_codes, NULL, &h->is_used_[0]);
h->red_cost_ =
PopulationCost(h->red_, NUM_LITERAL_CODES, &red_sym, &h->is_used_[1]);
h->blue_cost_ =

View File

@ -226,9 +226,7 @@ int WebPMemoryWrite(const uint8_t* data, size_t data_size,
void WebPMemoryWriterClear(WebPMemoryWriter* writer) {
if (writer != NULL) {
WebPSafeFree(writer->mem);
writer->mem = NULL;
writer->size = 0;
writer->max_size = 0;
WebPMemoryWriterInit(writer);
}
}

View File

@ -32,7 +32,7 @@ extern "C" {
// version numbers
#define ENC_MAJ_VERSION 1
#define ENC_MIN_VERSION 4
#define ENC_MIN_VERSION 5
#define ENC_REV_VERSION 0
enum { MAX_LF_LEVELS = 64, // Maximum loop filter level

View File

@ -6,8 +6,8 @@
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
VS_VERSION_INFO VERSIONINFO
FILEVERSION 1,0,4,0
PRODUCTVERSION 1,0,4,0
FILEVERSION 1,0,5,0
PRODUCTVERSION 1,0,5,0
FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
FILEFLAGS 0x1L
@ -24,12 +24,12 @@ BEGIN
BEGIN
VALUE "CompanyName", "Google, Inc."
VALUE "FileDescription", "libwebp DLL"
VALUE "FileVersion", "1.4.0"
VALUE "FileVersion", "1.5.0"
VALUE "InternalName", "libwebp.dll"
VALUE "LegalCopyright", "Copyright (C) 2024"
VALUE "OriginalFilename", "libwebp.dll"
VALUE "ProductName", "WebP Image Codec"
VALUE "ProductVersion", "1.4.0"
VALUE "ProductVersion", "1.5.0"
END
END
BLOCK "VarFileInfo"

View File

@ -6,8 +6,8 @@
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
VS_VERSION_INFO VERSIONINFO
FILEVERSION 1,0,4,0
PRODUCTVERSION 1,0,4,0
FILEVERSION 1,0,5,0
PRODUCTVERSION 1,0,5,0
FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
FILEFLAGS 0x1L
@ -24,12 +24,12 @@ BEGIN
BEGIN
VALUE "CompanyName", "Google, Inc."
VALUE "FileDescription", "libwebpdecoder DLL"
VALUE "FileVersion", "1.4.0"
VALUE "FileVersion", "1.5.0"
VALUE "InternalName", "libwebpdecoder.dll"
VALUE "LegalCopyright", "Copyright (C) 2024"
VALUE "OriginalFilename", "libwebpdecoder.dll"
VALUE "ProductName", "WebP Image Decoder"
VALUE "ProductVersion", "1.4.0"
VALUE "ProductVersion", "1.5.0"
END
END
BLOCK "VarFileInfo"

View File

@ -17,6 +17,6 @@ noinst_HEADERS =
noinst_HEADERS += ../webp/format_constants.h
libwebpmux_la_LIBADD = ../libwebp.la
libwebpmux_la_LDFLAGS = -no-undefined -version-info 4:0:1 -lm
libwebpmux_la_LDFLAGS = -no-undefined -version-info 4:1:1 -lm
libwebpmuxincludedir = $(includedir)/webp
pkgconfig_DATA = libwebpmux.pc

View File

@ -6,8 +6,8 @@
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
VS_VERSION_INFO VERSIONINFO
FILEVERSION 1,0,4,0
PRODUCTVERSION 1,0,4,0
FILEVERSION 1,0,5,0
PRODUCTVERSION 1,0,5,0
FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
FILEFLAGS 0x1L
@ -24,12 +24,12 @@ BEGIN
BEGIN
VALUE "CompanyName", "Google, Inc."
VALUE "FileDescription", "libwebpmux DLL"
VALUE "FileVersion", "1.4.0"
VALUE "FileVersion", "1.5.0"
VALUE "InternalName", "libwebpmux.dll"
VALUE "LegalCopyright", "Copyright (C) 2024"
VALUE "OriginalFilename", "libwebpmux.dll"
VALUE "ProductName", "WebP Image Muxer"
VALUE "ProductVersion", "1.4.0"
VALUE "ProductVersion", "1.5.0"
END
END
BLOCK "VarFileInfo"

View File

@ -28,7 +28,7 @@ extern "C" {
// Defines and constants.
#define MUX_MAJ_VERSION 1
#define MUX_MIN_VERSION 4
#define MUX_MIN_VERSION 5
#define MUX_REV_VERSION 0
// Chunk object.

View File

@ -223,11 +223,13 @@ WebPMux* WebPMuxCreateInternal(const WebPData* bitstream, int copy_data,
// Note this padding is historical and differs from demux.c which does not
// pad the file size.
riff_size = SizeWithPadding(riff_size);
if (riff_size < CHUNK_HEADER_SIZE) goto Err;
// Make sure the whole RIFF header is available.
if (riff_size < RIFF_HEADER_SIZE) goto Err;
if (riff_size > size) goto Err;
// There's no point in reading past the end of the RIFF chunk.
if (size > riff_size + CHUNK_HEADER_SIZE) {
size = riff_size + CHUNK_HEADER_SIZE;
// There's no point in reading past the end of the RIFF chunk. Note riff_size
// includes CHUNK_HEADER_SIZE after SizeWithPadding().
if (size > riff_size) {
size = riff_size;
}
end = data + size;

View File

@ -15,6 +15,8 @@
#include "src/webp/config.h"
#endif
#include <stddef.h>
#include "src/dsp/cpu.h"
#include "src/utils/bit_reader_inl_utils.h"
#include "src/utils/utils.h"
@ -25,11 +27,12 @@
void VP8BitReaderSetBuffer(VP8BitReader* const br,
const uint8_t* const start,
size_t size) {
br->buf_ = start;
br->buf_end_ = start + size;
br->buf_max_ =
(size >= sizeof(lbit_t)) ? start + size - sizeof(lbit_t) + 1
: start;
if (start != NULL) {
br->buf_ = start;
br->buf_end_ = start + size;
br->buf_max_ =
(size >= sizeof(lbit_t)) ? start + size - sizeof(lbit_t) + 1 : start;
}
}
void VP8InitBitReader(VP8BitReader* const br,

View File

@ -20,7 +20,7 @@
extern "C" {
#endif
#define WEBP_DECODER_ABI_VERSION 0x0209 // MAJOR(8b) + MINOR(8b)
#define WEBP_DECODER_ABI_VERSION 0x0210 // MAJOR(8b) + MINOR(8b)
// Note: forward declaring enumerations is not allowed in (strict) C and C++,
// the types are left here for reference.
@ -451,7 +451,9 @@ struct WebPDecoderOptions {
// Will be snapped to even values.
int crop_width, crop_height; // dimension of the cropping area
int use_scaling; // if true, scaling is applied _afterward_
int scaled_width, scaled_height; // final resolution
int scaled_width, scaled_height; // final resolution. if one is 0, it is
// guessed from the other one to keep the
// original ratio.
int use_threads; // if true, use multi-threaded decoding
int dithering_strength; // dithering strength (0=Off, 100=full)
int flip; // if true, flip output vertically
@ -479,6 +481,11 @@ WEBP_NODISCARD static WEBP_INLINE int WebPInitDecoderConfig(
return WebPInitDecoderConfigInternal(config, WEBP_DECODER_ABI_VERSION);
}
// Returns true if 'config' is non-NULL and all configuration parameters are
// within their valid ranges.
WEBP_NODISCARD WEBP_EXTERN int WebPValidateDecoderConfig(
const WebPDecoderConfig* config);
// Instantiate a new incremental decoder object with the requested
// configuration. The bitstream can be passed using 'data' and 'data_size'
// parameter, in which case the features will be parsed and stored into

View File

@ -76,9 +76,8 @@ int CropOrScale(WebPPicture* const pic, const CropOrScaleParams& params) {
}
}
#else // defined(WEBP_REDUCE_SIZE)
(void)data;
(void)size;
(void)bit_pos;
(void)pic;
(void)params;
#endif // !defined(WEBP_REDUCE_SIZE)
return 1;
}

View File

@ -20,6 +20,7 @@
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string_view>
#include "imageio/image_dec.h"
#include "imageio/metadata.h"

View File

@ -24,7 +24,7 @@
namespace {
void MuxDemuxApiTest(std::string_view data_in, bool mux) {
void MuxDemuxApiTest(std::string_view data_in, bool use_mux_api) {
const size_t size = data_in.size();
WebPData webp_data;
WebPDataInit(&webp_data);
@ -34,7 +34,7 @@ void MuxDemuxApiTest(std::string_view data_in, bool mux) {
// Extracted chunks and frames are not processed or decoded,
// which is already covered extensively by the other fuzz targets.
if (mux) {
if (use_mux_api) {
// Mux API
WebPMux* mux = WebPMuxCreate(&webp_data, size & 2);
if (!mux) return;

View File

@ -15,6 +15,7 @@
////////////////////////////////////////////////////////////////////////////////
#include <cstdint>
#include <string_view>
#include "src/webp/mux_types.h"
#include "tests/fuzzer/fuzz_utils.h"

View File

@ -172,7 +172,9 @@ for (( i = 0; i < $NUM_PLATFORMS; ++i )); do
CFLAGS="-pipe -isysroot ${SDKROOT} -O3 -DNDEBUG"
case "${PLATFORM}" in
iPhone*)
CFLAGS+=" -fembed-bitcode"
if [[ "${XCODE%%.*}" -lt 16 ]]; then
CFLAGS+=" -fembed-bitcode"
fi
CFLAGS+=" -target ${ARCH}-apple-ios${IOS_MIN_VERSION}"
[[ "${PLATFORM}" == *Simulator* ]] && CFLAGS+="-simulator"
;;