Compare commits

20 Commits
v1.5.0 ... main

Author SHA1 Message Date
Vincent Rabaud
a1ad3f1e37 Merge "Remove now unused ExtraCostCombined" into main 2025-04-01 00:28:47 -07:00
Vincent Rabaud
321561b41f Remove now unused ExtraCostCombined
Change-Id: Ic9d1ccf5b10fed67f836aa19fa0f84238acbf4c1
2025-03-29 23:34:20 +01:00
James Zern
e0ae21d231 WebPMemoryWriterClear: use WebPMemoryWriterInit
Removes some common code between the two functions.

Change-Id: If9f42e580e34dad63f3806750d9d7571941026b5
2025-03-28 12:37:24 -07:00
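
The refactoring in this commit can be pictured with a minimal sketch. The `DemoWriter` type and both functions are hypothetical stand-ins, not the libwebp definitions; the sketch only shows a clear routine that releases its buffer and then delegates the field reset to the init routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for a growable memory writer (not the libwebp type).
typedef struct {
  unsigned char* mem;  // owned buffer, or NULL
  size_t size;         // bytes written so far
  size_t max_size;     // bytes allocated
} DemoWriter;

// Puts the writer in its empty state without touching any previous buffer.
static void DemoWriterInit(DemoWriter* writer) {
  assert(writer != NULL);
  writer->mem = NULL;
  writer->size = 0;
  writer->max_size = 0;
}

// Releases the buffer, then reuses Init instead of repeating the resets.
static void DemoWriterClear(DemoWriter* writer) {
  if (writer == NULL) return;
  free(writer->mem);  // free(NULL) is a no-op
  DemoWriterInit(writer);
}
```

Keeping the reset in one place means a field added to the struct later only needs to be handled in the init routine.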
Vincent Rabaud
a4183d94c7 Remove the computation of ExtraCost when comparing histograms
Entropy clustering merges symbol histograms to reduce the overall
entropy. The cost of 2 added histograms is compared to the 2 costs
of the individual histograms and if it is smaller, a merge is done.

For most symbols, the computed cost is the real final cost based on the
histogram. For some symbols (distance and length), a constant cost is
added on top (independent of the symbol probabilities, and hence of the
merge) because the symbol is encoded as Golomb.

This constant cost is useless for the comparison and can be removed.

Change-Id: I6271e8c0e4111cdeff544cbdb7dec3c67be5309c
2025-03-28 15:00:41 +01:00
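
One way to see why such a term cannot change a merge decision: if the extra cost has the form "count times extra bits per symbol", it is linear in the counts, so the merged histogram pays exactly what the two separate histograms pay. The sketch below is an illustration under that assumption; the four-symbol histogram, the `kExtraBits` table and the function names are hypothetical, not libwebp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_SYMBOLS 4

// Hypothetical number of raw extra bits emitted after each prefix symbol.
static const uint32_t kExtraBits[DEMO_NUM_SYMBOLS] = { 0, 1, 2, 3 };

// Total extra bits for a histogram: sum of counts[i] * kExtraBits[i].
// This depends only on the counts, not on the symbol probabilities.
static uint64_t DemoExtraBitsCost(const uint32_t counts[DEMO_NUM_SYMBOLS]) {
  uint64_t cost = 0;
  int i;
  assert(counts != NULL);
  for (i = 0; i < DEMO_NUM_SYMBOLS; ++i) {
    cost += (uint64_t)counts[i] * kExtraBits[i];
  }
  return cost;
}

// Returns cost(a + b) - (cost(a) + cost(b)) for the extra-bits term only.
// Because the term is linear in the counts, the result is always 0.
static int64_t DemoExtraBitsMergeDelta(const uint32_t a[DEMO_NUM_SYMBOLS],
                                       const uint32_t b[DEMO_NUM_SYMBOLS]) {
  uint32_t sum[DEMO_NUM_SYMBOLS];
  int i;
  for (i = 0; i < DEMO_NUM_SYMBOLS; ++i) sum[i] = a[i] + b[i];
  return (int64_t)DemoExtraBitsCost(sum) -
         (int64_t)(DemoExtraBitsCost(a) + DemoExtraBitsCost(b));
}
```

Since the delta is zero for any pair of histograms, dropping the term leaves every "merge or not" comparison unchanged.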
Vincent Rabaud
f2b3f52733 Get AVX2 into WebP lossless
Change-Id: Ifad3102c9f899a46401985515cd98f3f7a21887f
2025-03-28 11:44:03 +01:00
Vincent Rabaud
7c70ff7a3b Clean dsp/lossless includes
Change-Id: I47a405a9c402095b440404fe57ac08b5293ea71b
2025-03-25 12:38:00 +01:00
Vincent Rabaud
9dd5ae819b Use the full register in PredictorSub13_SSE2
No more than 15 registers are used at a time

Change-Id: I40f77d9df8500e5e0d52ff6b206d765e8be62ae1
2025-03-25 11:07:15 +01:00
James Zern
613be8fc61 Makefile.vc: add /MP to CFLAGS
This speeds up the batch rules by compiling source files in parallel.

Change-Id: If5076e9c245d82df957b05711a74e2569f4ba086
2025-03-17 16:33:51 -07:00
James Zern
1d86819f49 Merge changes I1437390a,I10a20de5,I1ac777d1 into main
* changes:
  pngdec.c: add support for 'eXIf' tag
  pngdec.c: support ImageMagick app1 exif text data
  pngdec.c: add missing #ifdef for png_get_iCCP
2025-03-06 14:00:07 -08:00
James Zern
743a5f092d enc_neon: enable vld1q_u8_x4 for clang & msvc
This restores the use of the function after
980b708e enc_neon: fix build w/aarch64 gcc < 9.4.0

The intrinsic was added to llvm for aarch64 in:
5e4ce1ae9dad Implement the newly added AArch64 ACLE functions for
             ld1/st1 with 2/3/4 vectors. The functions are like:
             vst1_s8_x2 ...
llvmorg-3.4.0-rc1~101
https://github.com/llvm/llvm-project/commit/5e4ce1ae9dad

Visual Studio 2019 and 2022 also support the function (2017 is still
disabled for this path due to it relying on arm64_neon.h).

Change-Id: I6ff10e22deb3968a48738a4458d2d3d55410b5ec
2025-03-05 16:56:20 -08:00
James Zern
565da14882 pngdec.c: add support for 'eXIf' tag
Test file created with exiftool 12.76:

```
exiftool test_app1_exif.png -exif:all \
  -exif:DocumentName=test_multi_exif.png -o test_multi_exif.png
```

Bug: webp:398066379
Change-Id: I1437390a70f5708421683eb69c588624bb376baa
2025-03-05 13:54:09 -08:00
James Zern
319860e919 pngdec.c: support ImageMagick app1 exif text data
Test file created with ImageMagick 6.9.13-12:

```
convert test_exif.png test_app1_exif.png
```

Bug: webp:398066379
Change-Id: I10a20de5699fabb0906045994d7d1f4b9e951973
2025-03-05 13:54:07 -08:00
James Zern
815fc1e110 pngdec.c: add missing #ifdef for png_get_iCCP
png_get_iCCP is an optional part of the API. Protect its usage with
PNG_iCCP_SUPPORTED.

Change-Id: I1ac777d1c2a200bb3e1303b3d095cc0d67633bd4
2025-03-05 13:54:04 -08:00
James Zern
980b708e2c enc_neon: fix build w/aarch64 gcc < 9.4.0
vld1q_u8_x4 was added for aarch64 in the gcc 9.4.0 release:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ChangeLog;h=7558c0a369ea8c74a2b9369049a2d1cc187dc050;hb=13c83c4cc679ad5383ed57f359e53e8d518b7842#l2100

fixes:
src/dsp/enc_neon.c: In function 'Intra4Preds_NEON':
src/dsp/enc_neon.c:974:37: warning: implicit declaration of function
  'vld1q_u8_x4'; did you mean 'vld1q_u8_x2'?
  [-Wimplicit-function-declaration]

Bug: webp:398288323
Change-Id: Ic6e408065a375c945cc8691bd16a9f5d5642cfa2
2025-02-27 19:07:50 -08:00
James Zern
73b728cbb9 cmake: bump minimum version to 3.16
This matches the current support matrix (from 2024-12-17) [1] and quiets
a warning from recent (3.31.5) versions of cmake:

CMake Deprecation Warning at CMakeLists.txt:12 (cmake_minimum_required):
  Compatibility with CMake < 3.10 will be removed from a future version
  of CMake.

Explicit setting of CMP0072 is also removed; it was added in 3.11.

[1]: https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md

Bug: webp:397130631
Change-Id: Ic844dadf983a82674990edbddbfc54329df12eb7
Fixed: webp:397130631
2025-02-20 12:32:08 -08:00
Vincent Rabaud
6a22b6709c Add a function to validate a WebPDecoderConfig
This echoes WebPValidateConfig for encoding.

Change-Id: Ib404d55c7af4d0755644879ec491e3998e6b5e8d
2025-01-30 10:10:08 +01:00
Vincent Rabaud
7ed2b10ef0 Use consistently signed stride types.
The stride can be negative when asked for a flipped image.

Change-Id: I049e8027c769186274a6a3049949f3fcaae7d2e9
2025-01-30 00:12:28 +01:00
Vincent Rabaud
654bfb040c Avoid nullptr arithmetic in VP8BitReaderSetBuffer
When start is nullptr, the IO is not used afterwards
anyway, so there is no risk.

Change-Id: I0a828aec85c6e228e95dfed4a40d348275a7c577
2025-01-30 00:12:15 +01:00
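
The commit message does not show the fix itself, so the following is only a sketch of one common way to avoid arithmetic on a null pointer. The `DemoBitReader` struct and `DemoBitReaderSetBuffer` are hypothetical and much smaller than the real bit reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, reduced bit-reader state: only the buffer bounds.
typedef struct {
  const uint8_t* buf;      // next byte to read, or NULL
  const uint8_t* buf_end;  // one past the last byte, or NULL
} DemoBitReader;

// Sets the bounds without ever evaluating `NULL + size`. Pointer arithmetic
// on a null pointer is undefined behavior in C, even if the result is never
// dereferenced.
static void DemoBitReaderSetBuffer(DemoBitReader* br, const uint8_t* start,
                                   size_t size) {
  assert(br != NULL);
  br->buf = start;
  br->buf_end = (start == NULL) ? NULL : start + size;
}
```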
Vincent Rabaud
f8f2410710 Fix potential "divide by zero" in examples found by coverity
Change-Id: Ic41f9cb2ac24450986cd061db718953276eee080
2025-01-16 18:02:41 +01:00
James Zern
2af6c034ac libwebp-1.5.0
- 12/19/2024 version 1.5.0
   This is a binary compatible release.
   API changes:
     - `cross_color_transform_bits` added to WebPAuxStats
   * minor lossless encoder speed and compression improvements
   * lossless encoding does not use floats anymore
   * additional Arm optimizations for lossy & lossless + general code generation
     improvements
   * improvements to WASM performance (#643)
   * improvements and corrections in webp-container-spec.txt and
     webp-lossless-bitstream-spec.txt (#646, #355607636)
   * further security related hardening and increased fuzzing coverage w/fuzztest
     (oss-fuzz: #382816119, #70112, #70102, #69873, #69825, #69508, #69208)
   * miscellaneous warning, bug & build fixes (#499, #562, #381372617,
     #381109771, #42340561, #375011696, #372109644, chromium: #334120888)
   Tool updates:
     * gif2webp: add -sharp_yuv & -near_lossless
     * img2webp: add -exact & -noexact
     * exit codes normalized; running an example program with no
       arguments will output its help and exit with an error (#42340557,
       #381372617)

Merge tag 'v1.5.0'

Bug: b:336795049,webp:380121350

* tag 'v1.5.0':
  update ChangeLog
  update NEWS
  tests/fuzzer/*: add missing <string_view> include
  fuzz_utils.cc: fix build error w/WEBP_REDUCE_SIZE
  mux_demux_api_fuzzer.cc: fix -Wshadow warning
  update ChangeLog
  update NEWS
  bump version to 1.5.0
  update AUTHORS

Change-Id: I076b197fac29230bc61bc5f06e950d83d058a737
2024-12-19 18:19:10 -08:00
29 changed files with 1519 additions and 197 deletions

View File

@@ -9,11 +9,7 @@
 if(APPLE)
 cmake_minimum_required(VERSION 3.17)
 else()
-cmake_minimum_required(VERSION 3.7)
-endif()
-if(POLICY CMP0072)
-cmake_policy(SET CMP0072 NEW)
+cmake_minimum_required(VERSION 3.16)
 endif()
 project(WebP C)

View File

@@ -32,7 +32,7 @@ PLATFORM_LDFLAGS = /SAFESEH
 NOLOGO = /nologo
 CCNODBG = cl.exe $(NOLOGO) /O2 /DNDEBUG
 CCDEBUG = cl.exe $(NOLOGO) /Od /Zi /D_DEBUG /RTC1
-CFLAGS = /I. /Isrc $(NOLOGO) /W3 /EHsc /c
+CFLAGS = /I. /Isrc $(NOLOGO) /MP /W3 /EHsc /c
 CFLAGS = $(CFLAGS) /DWIN32 /D_CRT_SECURE_NO_WARNINGS /DWIN32_LEAN_AND_MEAN
 LDFLAGS = /LARGEADDRESSAWARE /MANIFEST:EMBED /NXCOMPAT /DYNAMICBASE
 LDFLAGS = $(LDFLAGS) $(PLATFORM_LDFLAGS)
@@ -231,6 +231,7 @@ DSP_DEC_OBJS = \
 $(DIROBJ)\dsp\lossless_neon.obj \
 $(DIROBJ)\dsp\lossless_sse2.obj \
 $(DIROBJ)\dsp\lossless_sse41.obj \
+$(DIROBJ)\dsp\lossless_avx2.obj \
 $(DIROBJ)\dsp\rescaler.obj \
 $(DIROBJ)\dsp\rescaler_mips32.obj \
 $(DIROBJ)\dsp\rescaler_mips_dsp_r2.obj \
@@ -270,6 +271,7 @@ DSP_ENC_OBJS = \
 $(DIROBJ)\dsp\lossless_enc_neon.obj \
 $(DIROBJ)\dsp\lossless_enc_sse2.obj \
 $(DIROBJ)\dsp\lossless_enc_sse41.obj \
+$(DIROBJ)\dsp\lossless_enc_avx2.obj \
 $(DIROBJ)\dsp\ssim.obj \
 $(DIROBJ)\dsp\ssim_sse2.obj \

View File

@@ -94,6 +94,9 @@
 /* Set to 1 if SSE4.1 is supported */
 #cmakedefine WEBP_HAVE_SSE41 1
+/* Set to 1 if AVX2 is supported */
+#cmakedefine WEBP_HAVE_AVX2 1
 /* Set to 1 if TIFF library is installed */
 #cmakedefine WEBP_HAVE_TIFF 1

View File

@@ -38,9 +38,9 @@ function(webp_check_compiler_flag WEBP_SIMD_FLAG ENABLE_SIMD)
 endfunction()
 # those are included in the names of WEBP_USE_* in c++ code.
-set(WEBP_SIMD_FLAGS "SSE41;SSE2;MIPS32;MIPS_DSP_R2;NEON;MSA")
+set(WEBP_SIMD_FLAGS "AVX2;SSE41;SSE2;MIPS32;MIPS_DSP_R2;NEON;MSA")
 set(WEBP_SIMD_FILE_EXTENSIONS
-"_sse41.c;_sse2.c;_mips32.c;_mips_dsp_r2.c;_neon.c;_msa.c")
+"_avx2.c;_sse41.c;_sse2.c;_mips32.c;_mips_dsp_r2.c;_neon.c;_msa.c")
 if(MSVC AND CMAKE_C_COMPILER_ID STREQUAL "MSVC")
 # With at least Visual Studio 12 (2013)+ /arch is not necessary to build SSE2
 # or SSE4 code unless a lesser /arch is forced. MSVC does not have a SSE4
@@ -50,12 +50,12 @@ if(MSVC AND CMAKE_C_COMPILER_ID STREQUAL "MSVC")
 if(MSVC_VERSION GREATER_EQUAL 1800 AND NOT CMAKE_C_FLAGS MATCHES "/arch:")
 set(SIMD_ENABLE_FLAGS)
 else()
-set(SIMD_ENABLE_FLAGS "/arch:AVX;/arch:SSE2;;;;")
+set(SIMD_ENABLE_FLAGS "/arch:AVX2;/arch:AVX;/arch:SSE2;;;;")
 endif()
 set(SIMD_DISABLE_FLAGS)
 else()
-set(SIMD_ENABLE_FLAGS "-msse4.1;-msse2;-mips32;-mdspr2;-mfpu=neon;-mmsa")
-set(SIMD_DISABLE_FLAGS "-mno-sse4.1;-mno-sse2;;-mno-dspr2;;-mno-msa")
+set(SIMD_ENABLE_FLAGS "-mavx2;-msse4.1;-msse2;-mips32;-mdspr2;-mfpu=neon;-mmsa")
+set(SIMD_DISABLE_FLAGS "-mno-avx2;-mno-sse4.1;-mno-sse2;;-mno-dspr2;;-mno-msa")
 endif()
 set(WEBP_SIMD_FILES_TO_INCLUDE)

View File

@ -161,6 +161,25 @@ AS_IF([test "$GCC" = "yes" ], [
AC_SUBST([AM_CFLAGS])
dnl === Check for machine specific flags
AC_ARG_ENABLE([avx2],
AS_HELP_STRING([--disable-avx2],
[Disable detection of AVX2 support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_avx2" != "xno" -a "x$enable_sse4_1" != "xno"
-a "x$enable_sse2" != "xno"], [
AVX2_FLAGS="$INTRINSICS_CFLAGS $AVX2_FLAGS"
TEST_AND_ADD_CFLAGS([AVX2_FLAGS], [-mavx2])
AS_IF([test -n "$AVX2_FLAGS"], [
SAVED_CFLAGS=$CFLAGS
CFLAGS="$CFLAGS $AVX2_FLAGS"
AC_CHECK_HEADER([immintrin.h],
[AC_DEFINE(WEBP_HAVE_AVX2, [1],
[Set to 1 if AVX2 is supported])],
[AVX2_FLAGS=""])
CFLAGS=$SAVED_CFLAGS])
AC_SUBST([AVX2_FLAGS])])
AC_ARG_ENABLE([sse4.1],
AS_HELP_STRING([--disable-sse4.1],
[Disable detection of SSE4.1 support

View File

@@ -771,6 +771,7 @@ void GetDiffAndPSNR(const uint8_t rgba1[], const uint8_t rgba2[],
 *psnr = 99.; // PSNR when images are identical.
 } else {
 sse /= stride * height;
+assert(sse != 0.0);
 *psnr = 4.3429448 * log(255. * 255. / sse);
 }
 }

View File

@ -139,6 +139,8 @@ static const struct {
{ "Raw profile type xmp", ProcessRawProfile, METADATA_OFFSET(xmp) },
// Exiftool puts exif data in APP1 chunk, too.
{ "Raw profile type APP1", ProcessRawProfile, METADATA_OFFSET(exif) },
// ImageMagick uses lowercase app1.
{ "Raw profile type app1", ProcessRawProfile, METADATA_OFFSET(exif) },
// XMP Specification Part 3, Section 3 #PNG
{ "XML:com.adobe.xmp", MetadataCopy, METADATA_OFFSET(xmp) },
{ NULL, NULL, 0 },
@ -159,6 +161,20 @@ static int ExtractMetadataFromPNG(png_structp png,
png_textp text = NULL;
const png_uint_32 num = png_get_text(png, info, &text, NULL);
png_uint_32 i;
#ifdef PNG_eXIf_SUPPORTED
// Look for an 'eXIf' tag. Preference is given to this tag as it's newer
// than the TextualData tags.
{
png_bytep exif;
png_uint_32 len;
if (png_get_eXIf_1(png, info, &len, &exif) == PNG_INFO_eXIf) {
if (!MetadataCopy((const char*)exif, len, &metadata->exif)) return 0;
}
}
#endif // PNG_eXIf_SUPPORTED
// Look for EXIF / XMP metadata.
for (i = 0; i < num; ++i, ++text) {
int j;
@ -192,6 +208,7 @@ static int ExtractMetadataFromPNG(png_structp png,
}
}
}
#ifdef PNG_iCCP_SUPPORTED
// Look for an ICC profile.
{
png_charp name;
@ -208,6 +225,7 @@ static int ExtractMetadataFromPNG(png_structp png,
if (!MetadataCopy((const char*)profile, len, &metadata->iccp)) return 0;
}
}
#endif // PNG_iCCP_SUPPORTED
}
return 1;
}

View File

@ -26,10 +26,9 @@ static const uint8_t kModeBpp[MODE_LAST] = {
4, 4, 4, 2, // pre-multiplied modes
1, 1 };
// Check that webp_csp_mode is within the bounds of WEBP_CSP_MODE.
// Convert to an integer to handle both the unsigned/signed enum cases
// without the need for casting to remove type limit warnings.
static int IsValidColorspace(int webp_csp_mode) {
int IsValidColorspace(int webp_csp_mode) {
return (webp_csp_mode >= MODE_RGB && webp_csp_mode < MODE_LAST);
}

View File

@ -51,4 +51,7 @@ enum { MB_FEATURE_TREE_PROBS = 3,
NUM_PROBAS = 11
};
// Check that webp_csp_mode is within the bounds of WEBP_CSP_MODE.
int IsValidColorspace(int webp_csp_mode);
#endif // WEBP_DEC_COMMON_DEC_H_

View File

@@ -12,7 +12,9 @@
 // Author: Skal (pascal.massimino@gmail.com)
 #include <assert.h>
+#include <stddef.h>
 #include <stdlib.h>
 #include "src/dec/vp8i_dec.h"
 #include "src/dec/webpi_dec.h"
 #include "src/dsp/dsp.h"
@@ -25,9 +27,9 @@
 static int EmitYUV(const VP8Io* const io, WebPDecParams* const p) {
 WebPDecBuffer* output = p->output;
 const WebPYUVABuffer* const buf = &output->u.YUVA;
-uint8_t* const y_dst = buf->y + (size_t)io->mb_y * buf->y_stride;
-uint8_t* const u_dst = buf->u + (size_t)(io->mb_y >> 1) * buf->u_stride;
-uint8_t* const v_dst = buf->v + (size_t)(io->mb_y >> 1) * buf->v_stride;
+uint8_t* const y_dst = buf->y + (ptrdiff_t)io->mb_y * buf->y_stride;
+uint8_t* const u_dst = buf->u + (ptrdiff_t)(io->mb_y >> 1) * buf->u_stride;
+uint8_t* const v_dst = buf->v + (ptrdiff_t)(io->mb_y >> 1) * buf->v_stride;
 const int mb_w = io->mb_w;
 const int mb_h = io->mb_h;
 const int uv_w = (mb_w + 1) / 2;
@@ -42,7 +44,7 @@ static int EmitYUV(const VP8Io* const io, WebPDecParams* const p) {
 static int EmitSampledRGB(const VP8Io* const io, WebPDecParams* const p) {
 WebPDecBuffer* const output = p->output;
 WebPRGBABuffer* const buf = &output->u.RGBA;
-uint8_t* const dst = buf->rgba + (size_t)io->mb_y * buf->stride;
+uint8_t* const dst = buf->rgba + (ptrdiff_t)io->mb_y * buf->stride;
 WebPSamplerProcessPlane(io->y, io->y_stride,
 io->u, io->v, io->uv_stride,
 dst, buf->stride, io->mb_w, io->mb_h,
@@ -57,7 +59,7 @@ static int EmitSampledRGB(const VP8Io* const io, WebPDecParams* const p) {
 static int EmitFancyRGB(const VP8Io* const io, WebPDecParams* const p) {
 int num_lines_out = io->mb_h; // a priori guess
 const WebPRGBABuffer* const buf = &p->output->u.RGBA;
-uint8_t* dst = buf->rgba + (size_t)io->mb_y * buf->stride;
+uint8_t* dst = buf->rgba + (ptrdiff_t)io->mb_y * buf->stride;
 WebPUpsampleLinePairFunc upsample = WebPUpsamplers[p->output->colorspace];
 const uint8_t* cur_y = io->y;
 const uint8_t* cur_u = io->u;
@@ -128,7 +130,7 @@ static int EmitAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
 const WebPYUVABuffer* const buf = &p->output->u.YUVA;
 const int mb_w = io->mb_w;
 const int mb_h = io->mb_h;
-uint8_t* dst = buf->a + (size_t)io->mb_y * buf->a_stride;
+uint8_t* dst = buf->a + (ptrdiff_t)io->mb_y * buf->a_stride;
 int j;
 (void)expected_num_lines_out;
 assert(expected_num_lines_out == mb_h);
@@ -181,8 +183,8 @@ static int EmitAlphaRGB(const VP8Io* const io, WebPDecParams* const p,
 (colorspace == MODE_ARGB || colorspace == MODE_Argb);
 const WebPRGBABuffer* const buf = &p->output->u.RGBA;
 int num_rows;
-const size_t start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
-uint8_t* const base_rgba = buf->rgba + start_y * buf->stride;
+const int start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
+uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)start_y * buf->stride;
 uint8_t* const dst = base_rgba + (alpha_first ? 0 : 3);
 const int has_alpha = WebPDispatchAlpha(alpha, io->width, mb_w,
 num_rows, dst, buf->stride);
@@ -205,8 +207,8 @@ static int EmitAlphaRGBA4444(const VP8Io* const io, WebPDecParams* const p,
 const WEBP_CSP_MODE colorspace = p->output->colorspace;
 const WebPRGBABuffer* const buf = &p->output->u.RGBA;
 int num_rows;
-const size_t start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
-uint8_t* const base_rgba = buf->rgba + start_y * buf->stride;
+const int start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
+uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)start_y * buf->stride;
 #if (WEBP_SWAP_16BIT_CSP == 1)
 uint8_t* alpha_dst = base_rgba;
 #else
@@ -271,9 +273,9 @@ static int EmitRescaledYUV(const VP8Io* const io, WebPDecParams* const p) {
 static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
 int expected_num_lines_out) {
 const WebPYUVABuffer* const buf = &p->output->u.YUVA;
-uint8_t* const dst_a = buf->a + (size_t)p->last_y * buf->a_stride;
+uint8_t* const dst_a = buf->a + (ptrdiff_t)p->last_y * buf->a_stride;
 if (io->a != NULL) {
-uint8_t* const dst_y = buf->y + (size_t)p->last_y * buf->y_stride;
+uint8_t* const dst_y = buf->y + (ptrdiff_t)p->last_y * buf->y_stride;
 const int num_lines_out = Rescale(io->a, io->width, io->mb_h, p->scaler_a);
 assert(expected_num_lines_out == num_lines_out);
 if (num_lines_out > 0) { // unmultiply the Y
@@ -362,7 +364,7 @@ static int ExportRGB(WebPDecParams* const p, int y_pos) {
 const WebPYUV444Converter convert =
 WebPYUV444Converters[p->output->colorspace];
 const WebPRGBABuffer* const buf = &p->output->u.RGBA;
-uint8_t* dst = buf->rgba + (size_t)y_pos * buf->stride;
+uint8_t* dst = buf->rgba + (ptrdiff_t)y_pos * buf->stride;
 int num_lines_out = 0;
 // For RGB rescaling, because of the YUV420, current scan position
 // U/V can be +1/-1 line from the Y one. Hence the double test.
@@ -389,14 +391,14 @@ static int EmitRescaledRGB(const VP8Io* const io, WebPDecParams* const p) {
 while (j < mb_h) {
 const int y_lines_in =
 WebPRescalerImport(p->scaler_y, mb_h - j,
-io->y + (size_t)j * io->y_stride, io->y_stride);
+io->y + (ptrdiff_t)j * io->y_stride, io->y_stride);
 j += y_lines_in;
 if (WebPRescaleNeededLines(p->scaler_u, uv_mb_h - uv_j)) {
 const int u_lines_in = WebPRescalerImport(
-p->scaler_u, uv_mb_h - uv_j, io->u + (size_t)uv_j * io->uv_stride,
+p->scaler_u, uv_mb_h - uv_j, io->u + (ptrdiff_t)uv_j * io->uv_stride,
 io->uv_stride);
 const int v_lines_in = WebPRescalerImport(
-p->scaler_v, uv_mb_h - uv_j, io->v + (size_t)uv_j * io->uv_stride,
+p->scaler_v, uv_mb_h - uv_j, io->v + (ptrdiff_t)uv_j * io->uv_stride,
 io->uv_stride);
 (void)v_lines_in; // remove a gcc warning
 assert(u_lines_in == v_lines_in);
@@ -409,7 +411,7 @@ static int EmitRescaledRGB(const VP8Io* const io, WebPDecParams* const p) {
 static int ExportAlpha(WebPDecParams* const p, int y_pos, int max_lines_out) {
 const WebPRGBABuffer* const buf = &p->output->u.RGBA;
-uint8_t* const base_rgba = buf->rgba + (size_t)y_pos * buf->stride;
+uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)y_pos * buf->stride;
 const WEBP_CSP_MODE colorspace = p->output->colorspace;
 const int alpha_first =
 (colorspace == MODE_ARGB || colorspace == MODE_Argb);
@@ -437,7 +439,7 @@ static int ExportAlpha(WebPDecParams* const p, int y_pos, int max_lines_out) {
 static int ExportAlphaRGBA4444(WebPDecParams* const p, int y_pos,
 int max_lines_out) {
 const WebPRGBABuffer* const buf = &p->output->u.RGBA;
-uint8_t* const base_rgba = buf->rgba + (size_t)y_pos * buf->stride;
+uint8_t* const base_rgba = buf->rgba + (ptrdiff_t)y_pos * buf->stride;
 #if (WEBP_SWAP_16BIT_CSP == 1)
 uint8_t* alpha_dst = base_rgba;
 #else
@@ -476,7 +478,7 @@ static int EmitRescaledAlphaRGB(const VP8Io* const io, WebPDecParams* const p,
 int lines_left = expected_num_out_lines;
 const int y_end = p->last_y + lines_left;
 while (lines_left > 0) {
-const int64_t row_offset = (int64_t)scaler->src_y - io->mb_y;
+const int64_t row_offset = (ptrdiff_t)scaler->src_y - io->mb_y;
 WebPRescalerImport(scaler, io->mb_h + io->mb_y - scaler->src_y,
 io->a + row_offset * io->width, io->width);
 lines_left -= p->emit_alpha_row(p, y_end - lines_left, lines_left);
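
The reason for casting to `ptrdiff_t` rather than `size_t` can be tried in isolation. With a `size_t` cast, a negative stride would be converted to a very large unsigned value before the multiplication; a signed cast keeps its sign. The helper below is a hypothetical sketch (`DemoRowPtr` is not a libwebp function) that addresses rows of a small buffer top-down and bottom-up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the start of row `y` for a buffer whose rows are `stride` bytes
// apart. A negative stride moves toward lower addresses from `base`, which
// is how a vertically flipped image can be addressed.
static uint8_t* DemoRowPtr(uint8_t* base, int y, int stride) {
  assert(base != NULL && y >= 0);
  // Widen to a signed type before multiplying so that the sign of the
  // stride is preserved in the byte offset.
  return base + (ptrdiff_t)y * stride;
}
```

For a flipped 3-row image with 4-byte rows, `base` would point at the last row and `stride` would be `-4`, so row 2 lands on the first byte of the buffer.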

View File

@@ -13,6 +13,7 @@
 // Jyrki Alakuijala (jyrki@google.com)
 #include <assert.h>
+#include <stddef.h>
 #include <stdlib.h>
 #include "src/dec/alphai_dec.h"
@@ -624,8 +625,8 @@ static int EmitRescaledRowsRGBA(const VP8LDecoder* const dec,
 int num_lines_in = 0;
 int num_lines_out = 0;
 while (num_lines_in < mb_h) {
-uint8_t* const row_in = in + (uint64_t)num_lines_in * in_stride;
-uint8_t* const row_out = out + (uint64_t)num_lines_out * out_stride;
+uint8_t* const row_in = in + (ptrdiff_t)num_lines_in * in_stride;
+uint8_t* const row_out = out + (ptrdiff_t)num_lines_out * out_stride;
 const int lines_left = mb_h - num_lines_in;
 const int needed_lines = WebPRescaleNeededLines(dec->rescaler, lines_left);
 int lines_imported;
@@ -827,7 +828,7 @@ static void ProcessRows(VP8LDecoder* const dec, int row) {
 if (WebPIsRGBMode(output->colorspace)) { // convert to RGBA
 const WebPRGBABuffer* const buf = &output->u.RGBA;
 uint8_t* const rgba =
-buf->rgba + (int64_t)dec->last_out_row_ * buf->stride;
+buf->rgba + (ptrdiff_t)dec->last_out_row_ * buf->stride;
 const int num_rows_out =
 #if !defined(WEBP_REDUCE_SIZE)
 io->use_scaling ?

View File

@ -13,13 +13,15 @@
#include <stdlib.h>
#include "src/dec/common_dec.h"
#include "src/dec/vp8_dec.h"
#include "src/dec/vp8i_dec.h"
#include "src/dec/vp8li_dec.h"
#include "src/dec/webpi_dec.h"
#include "src/utils/rescaler_utils.h"
#include "src/utils/utils.h"
#include "src/webp/mux_types.h" // ALPHA_FLAG
#include "src/webp/decode.h"
#include "src/webp/mux_types.h" // ALPHA_FLAG
#include "src/webp/types.h"
//------------------------------------------------------------------------------
@ -747,6 +749,61 @@ int WebPInitDecoderConfigInternal(WebPDecoderConfig* config,
return 1;
}
static int WebPCheckCropDimensionsBasic(int x, int y, int w, int h) {
return !(x < 0 || y < 0 || w <= 0 || h <= 0);
}
int WebPValidateDecoderConfig(const WebPDecoderConfig* config) {
const WebPDecoderOptions* options;
if (config == NULL) return 0;
if (!IsValidColorspace(config->output.colorspace)) {
return 0;
}
options = &config->options;
// bypass_filtering, no_fancy_upsampling, use_cropping, use_scaling,
// use_threads, flip can be any integer and are interpreted as boolean.
// Check for cropping.
if (options->use_cropping && !WebPCheckCropDimensionsBasic(
options->crop_left, options->crop_top,
options->crop_width, options->crop_height)) {
return 0;
}
// Check for scaling.
if (options->use_scaling &&
(options->scaled_width < 0 || options->scaled_height < 0 ||
(options->scaled_width == 0 && options->scaled_height == 0))) {
return 0;
}
// In case the WebPBitstreamFeatures has been filled in, check further.
if (config->input.width > 0 || config->input.height > 0) {
int scaled_width = options->scaled_width;
int scaled_height = options->scaled_height;
if (options->use_cropping &&
!WebPCheckCropDimensions(config->input.width, config->input.height,
options->crop_left, options->crop_top,
options->crop_width, options->crop_height)) {
return 0;
}
if (options->use_scaling && !WebPRescalerGetScaledDimensions(
config->input.width, config->input.height,
&scaled_width, &scaled_height)) {
return 0;
}
}
// Check for dithering.
if (options->dithering_strength < 0 || options->dithering_strength > 100 ||
options->alpha_dithering_strength < 0 ||
options->alpha_dithering_strength > 100) {
return 0;
}
return 1;
}
VP8StatusCode WebPGetFeaturesInternal(const uint8_t* data, size_t data_size,
WebPBitstreamFeatures* features,
int version) {
@@ -806,8 +863,8 @@ VP8StatusCode WebPDecode(const uint8_t* data, size_t data_size,
 int WebPCheckCropDimensions(int image_width, int image_height,
 int x, int y, int w, int h) {
-return !(x < 0 || y < 0 || w <= 0 || h <= 0 ||
-x >= image_width || w > image_width || w > image_width - x ||
+return WebPCheckCropDimensionsBasic(x, y, w, h) &&
+!(x >= image_width || w > image_width || w > image_width - x ||
 y >= image_height || h > image_height || h > image_height - y);
 }

View File

@ -5,6 +5,8 @@ noinst_LTLIBRARIES += libwebpdsp_sse2.la
noinst_LTLIBRARIES += libwebpdspdecode_sse2.la
noinst_LTLIBRARIES += libwebpdsp_sse41.la
noinst_LTLIBRARIES += libwebpdspdecode_sse41.la
noinst_LTLIBRARIES += libwebpdsp_avx2.la
noinst_LTLIBRARIES += libwebpdspdecode_avx2.la
noinst_LTLIBRARIES += libwebpdsp_neon.la
noinst_LTLIBRARIES += libwebpdspdecode_neon.la
noinst_LTLIBRARIES += libwebpdsp_msa.la
@ -44,6 +46,11 @@ ENC_SOURCES += lossless_enc.c
ENC_SOURCES += quant.h
ENC_SOURCES += ssim.c
libwebpdspdecode_avx2_la_SOURCES =
libwebpdspdecode_avx2_la_SOURCES += lossless_avx2.c
libwebpdspdecode_avx2_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdspdecode_avx2_la_CFLAGS = $(AM_CFLAGS) $(AVX2_FLAGS)
libwebpdspdecode_sse41_la_SOURCES =
libwebpdspdecode_sse41_la_SOURCES += alpha_processing_sse41.c
libwebpdspdecode_sse41_la_SOURCES += dec_sse41.c
@ -123,6 +130,12 @@ libwebpdsp_sse41_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdsp_sse41_la_CFLAGS = $(AM_CFLAGS) $(SSE41_FLAGS)
libwebpdsp_sse41_la_LIBADD = libwebpdspdecode_sse41.la
libwebpdsp_avx2_la_SOURCES =
libwebpdsp_avx2_la_SOURCES += lossless_enc_avx2.c
libwebpdsp_avx2_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdsp_avx2_la_CFLAGS = $(AM_CFLAGS) $(AVX2_FLAGS)
libwebpdsp_avx2_la_LIBADD = libwebpdspdecode_avx2.la
libwebpdsp_neon_la_SOURCES =
libwebpdsp_neon_la_SOURCES += cost_neon.c
libwebpdsp_neon_la_SOURCES += enc_neon.c
@ -167,6 +180,7 @@ libwebpdsp_la_LDFLAGS = -lm
libwebpdsp_la_LIBADD =
libwebpdsp_la_LIBADD += libwebpdsp_sse2.la
libwebpdsp_la_LIBADD += libwebpdsp_sse41.la
libwebpdsp_la_LIBADD += libwebpdsp_avx2.la
libwebpdsp_la_LIBADD += libwebpdsp_neon.la
libwebpdsp_la_LIBADD += libwebpdsp_msa.la
libwebpdsp_la_LIBADD += libwebpdsp_mips32.la
@ -180,6 +194,7 @@ if BUILD_LIBWEBPDECODER
libwebpdspdecode_la_LIBADD =
libwebpdspdecode_la_LIBADD += libwebpdspdecode_sse2.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_sse41.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_avx2.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_neon.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_msa.la
libwebpdspdecode_la_LIBADD += libwebpdspdecode_mips32.la

View File

@ -56,6 +56,11 @@
(defined(_M_X64) || defined(_M_IX86))
#define WEBP_MSC_SSE41 // Visual C++ SSE4.1 targets
#endif
#if defined(_MSC_VER) && _MSC_VER >= 1700 && \
(defined(_M_X64) || defined(_M_IX86))
#define WEBP_MSC_AVX2 // Visual C++ AVX2 targets
#endif
#endif
// WEBP_HAVE_* are used to indicate the presence of the instruction set in dsp
@ -80,6 +85,16 @@
#define WEBP_HAVE_SSE41
#endif
#if (defined(__AVX2__) || defined(WEBP_MSC_AVX2)) && \
(!defined(HAVE_CONFIG_H) || defined(WEBP_HAVE_AVX2))
#define WEBP_USE_AVX2
#endif
#if defined(WEBP_USE_AVX2) && !defined(WEBP_HAVE_AVX2)
#define WEBP_HAVE_AVX2
#endif
#undef WEBP_MSC_AVX2
#undef WEBP_MSC_SSE41
#undef WEBP_MSC_SSE2

View File

@ -945,6 +945,18 @@ static int Quantize2Blocks_NEON(int16_t in[32], int16_t out[32],
vst1q_u8(dst, r); \
} while (0)
static WEBP_INLINE uint8x16x4_t Vld1qU8x4(const uint8_t* ptr) {
#if LOCAL_CLANG_PREREQ(3, 4) || LOCAL_GCC_PREREQ(9, 4) || defined(_MSC_VER)
return vld1q_u8_x4(ptr);
#else
uint8x16x4_t res;
INIT_VECTOR4(res,
vld1q_u8(ptr + 0 * 16), vld1q_u8(ptr + 1 * 16),
vld1q_u8(ptr + 2 * 16), vld1q_u8(ptr + 3 * 16));
return res;
#endif
}
static void Intra4Preds_NEON(uint8_t* WEBP_RESTRICT dst,
const uint8_t* WEBP_RESTRICT top) {
// 0 1 2 3 4 5 6 7 8 9 10 11 12 13
@ -971,9 +983,9 @@ static void Intra4Preds_NEON(uint8_t* WEBP_RESTRICT dst,
30, 30, 30, 30, 0, 0, 0, 0, 21, 22, 23, 24, 16, 16, 16, 16
};
const uint8x16x4_t lookup_avgs1 = vld1q_u8_x4(kLookupTbl1);
const uint8x16x4_t lookup_avgs2 = vld1q_u8_x4(kLookupTbl2);
const uint8x16x4_t lookup_avgs3 = vld1q_u8_x4(kLookupTbl3);
const uint8x16x4_t lookup_avgs1 = Vld1qU8x4(kLookupTbl1);
const uint8x16x4_t lookup_avgs2 = Vld1qU8x4(kLookupTbl2);
const uint8x16x4_t lookup_avgs3 = Vld1qU8x4(kLookupTbl3);
const uint8x16_t preload = vld1q_u8(top - 5);
uint8x16x2_t qcombined;
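
The pattern used by `Vld1qU8x4` (one wrapper that picks an intrinsic when the compiler is recent enough and falls back to portable code otherwise) can be sketched without NEON. The example below is an analogy, not libwebp code: `DEMO_GCC_PREREQ` and `DemoPopCount` are hypothetical names, and `__builtin_popcount` stands in for the NEON intrinsic on the assumption that it is available from GCC 3.4 onward and in clang.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical version gate: true when the compiler reports a GCC-compatible
// version of at least maj.min, false for any other compiler.
#if defined(__GNUC__)
#define DEMO_GCC_PREREQ(maj, min) \
  (((__GNUC__ << 8) | __GNUC_MINOR__) >= (((maj) << 8) | (min)))
#else
#define DEMO_GCC_PREREQ(maj, min) 0
#endif

// Counts the set bits in `v`. Callers never see which path was compiled in.
static int DemoPopCount(uint32_t v) {
#if DEMO_GCC_PREREQ(3, 4)
  return __builtin_popcount(v);
#else
  int n = 0;
  for (; v != 0; v &= v - 1) ++n;  // clears the lowest set bit each round
  return n;
#endif
}
```

Both branches return the same value, so call sites stay identical whichever compiler builds the file.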

View File

@ -13,15 +13,21 @@
// Jyrki Alakuijala (jyrki@google.com)
// Urvang Joshi (urvang@google.com)
#include "src/dsp/dsp.h"
#include "src/dsp/lossless.h"
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "src/dec/vp8li_dec.h"
#include "src/utils/endian_inl_utils.h"
#include "src/dsp/lossless.h"
#include "src/dsp/cpu.h"
#include "src/dsp/dsp.h"
#include "src/dsp/lossless_common.h"
#include "src/utils/endian_inl_utils.h"
#include "src/utils/utils.h"
#include "src/webp/decode.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Image transforms.
@ -571,16 +577,21 @@ void VP8LConvertFromBGRA(const uint32_t* const in_data, int num_pixels,
//------------------------------------------------------------------------------
VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed;
VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed_SSE;
VP8LPredictorAddSubFunc VP8LPredictorsAdd[16];
VP8LPredictorAddSubFunc VP8LPredictorsAdd_SSE[16];
VP8LPredictorFunc VP8LPredictors[16];
// exposed plain-C implementations
VP8LPredictorAddSubFunc VP8LPredictorsAdd_C[16];
VP8LTransformColorInverseFunc VP8LTransformColorInverse;
VP8LTransformColorInverseFunc VP8LTransformColorInverse_SSE;
VP8LConvertFunc VP8LConvertBGRAToRGB;
VP8LConvertFunc VP8LConvertBGRAToRGB_SSE;
VP8LConvertFunc VP8LConvertBGRAToRGBA;
VP8LConvertFunc VP8LConvertBGRAToRGBA_SSE;
VP8LConvertFunc VP8LConvertBGRAToRGBA4444;
VP8LConvertFunc VP8LConvertBGRAToRGB565;
VP8LConvertFunc VP8LConvertBGRAToBGR;
@ -591,6 +602,7 @@ VP8LMapAlphaFunc VP8LMapColor8b;
extern VP8CPUInfo VP8GetCPUInfo;
extern void VP8LDspInitSSE2(void);
extern void VP8LDspInitSSE41(void);
extern void VP8LDspInitAVX2(void);
extern void VP8LDspInitNEON(void);
extern void VP8LDspInitMIPSdspR2(void);
extern void VP8LDspInitMSA(void);
@ -643,6 +655,11 @@ WEBP_DSP_INIT_FUNC(VP8LDspInit) {
#if defined(WEBP_HAVE_SSE41)
if (VP8GetCPUInfo(kSSE4_1)) {
VP8LDspInitSSE41();
#if defined(WEBP_HAVE_AVX2)
if (VP8GetCPUInfo(kAVX2)) {
VP8LDspInitAVX2();
}
#endif
}
#endif
}
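
Note on the hunk above: the AVX2 init is only reached after the SSE4.1 init has run, and the AVX2 kernels elsewhere in this change hand their last few pixels to the `_SSE` pointers. The following is a scalar sketch of that pattern, not libwebp code. `AddBlack`, `AddBlack_Fallback`, `AddBlack_Wide`, `InitDispatch` and `SKETCH_ARGB_BLACK` are hypothetical names, and the inner 8-pixel loop only stands in for one 256-bit operation.

```c
#include <assert.h>
#include <stdint.h>

// All names here are hypothetical; this is a scalar model, not the libwebp API.
typedef void (*AddBlackFunc)(const uint32_t* in, int num_pixels, uint32_t* out);

// Only the alpha byte is non-zero, so a plain 32-bit add matches a per-byte add.
#define SKETCH_ARGB_BLACK 0xff000000u

// Narrow version: one pixel per step. Stands in for the narrower SIMD level.
static void AddBlack_Narrow(const uint32_t* in, int num_pixels, uint32_t* out) {
  int i;
  for (i = 0; i < num_pixels; ++i) out[i] = in[i] + SKETCH_ARGB_BLACK;
}

static AddBlackFunc AddBlack = AddBlack_Narrow;           // pointer callers use
static AddBlackFunc AddBlack_Fallback = AddBlack_Narrow;  // kept for tails

// Wide version: full groups of 8 pixels, then the fallback for the remainder.
static void AddBlack_Wide(const uint32_t* in, int num_pixels, uint32_t* out) {
  int i, k;
  for (i = 0; i + 8 <= num_pixels; i += 8) {
    // Stand-in for a single 256-bit load/add/store.
    for (k = 0; k < 8; ++k) out[i + k] = in[i + k] + SKETCH_ARGB_BLACK;
  }
  if (i != num_pixels) AddBlack_Fallback(in + i, num_pixels - i, out + i);
}

// Nested checks: the wide version is only considered once the narrow level is
// known to be available, and it overrides the active pointer last.
static void InitDispatch(int has_narrow_simd, int has_wide_simd) {
  AddBlack = AddBlack_Narrow;
  AddBlack_Fallback = AddBlack_Narrow;
  if (has_narrow_simd) {
    if (has_wide_simd) AddBlack = AddBlack_Wide;
  }
}
```

Calling `InitDispatch(1, 1)` and then `AddBlack(in, 11, out)` processes 8 pixels in the wide loop and 3 through the fallback. Keeping the fallback in its own pointer is what lets the wide kernel avoid calling itself once the active pointer has been overridden.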


@ -64,10 +64,12 @@ typedef void (*VP8LPredictorAddSubFunc)(const uint32_t* in,
uint32_t* WEBP_RESTRICT out);
extern VP8LPredictorAddSubFunc VP8LPredictorsAdd[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsAdd_C[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsAdd_SSE[16];
typedef void (*VP8LProcessDecBlueAndRedFunc)(const uint32_t* src,
int num_pixels, uint32_t* dst);
extern VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed;
extern VP8LProcessDecBlueAndRedFunc VP8LAddGreenToBlueAndRed_SSE;
typedef struct {
// Note: the members are uint8_t, so that any negative values are
@ -80,6 +82,7 @@ typedef void (*VP8LTransformColorInverseFunc)(const VP8LMultipliers* const m,
const uint32_t* src,
int num_pixels, uint32_t* dst);
extern VP8LTransformColorInverseFunc VP8LTransformColorInverse;
extern VP8LTransformColorInverseFunc VP8LTransformColorInverse_SSE;
struct VP8LTransform; // Defined in dec/vp8li.h.
@ -99,6 +102,8 @@ extern VP8LConvertFunc VP8LConvertBGRAToRGBA;
extern VP8LConvertFunc VP8LConvertBGRAToRGBA4444;
extern VP8LConvertFunc VP8LConvertBGRAToRGB565;
extern VP8LConvertFunc VP8LConvertBGRAToBGR;
extern VP8LConvertFunc VP8LConvertBGRAToRGB_SSE;
extern VP8LConvertFunc VP8LConvertBGRAToRGBA_SSE;
// Converts from BGRA to other color spaces.
void VP8LConvertFromBGRA(const uint32_t* const in_data, int num_pixels,
@ -149,21 +154,25 @@ void VP8LDspInit(void);
typedef void (*VP8LProcessEncBlueAndRedFunc)(uint32_t* dst, int num_pixels);
extern VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed;
extern VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed_SSE;
typedef void (*VP8LTransformColorFunc)(
const VP8LMultipliers* WEBP_RESTRICT const m, uint32_t* WEBP_RESTRICT dst,
int num_pixels);
extern VP8LTransformColorFunc VP8LTransformColor;
extern VP8LTransformColorFunc VP8LTransformColor_SSE;
typedef void (*VP8LCollectColorBlueTransformsFunc)(
const uint32_t* WEBP_RESTRICT argb, int stride,
int tile_width, int tile_height,
int green_to_blue, int red_to_blue, uint32_t histo[]);
extern VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms;
extern VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms_SSE;
typedef void (*VP8LCollectColorRedTransformsFunc)(
const uint32_t* WEBP_RESTRICT argb, int stride,
int tile_width, int tile_height,
int green_to_red, uint32_t histo[]);
extern VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms;
extern VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms_SSE;
// Expose some C-only fallback functions
void VP8LTransformColor_C(const VP8LMultipliers* WEBP_RESTRICT const m,
@ -181,20 +190,17 @@ void VP8LCollectColorBlueTransforms_C(const uint32_t* WEBP_RESTRICT argb,
extern VP8LPredictorAddSubFunc VP8LPredictorsSub[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsSub_C[16];
extern VP8LPredictorAddSubFunc VP8LPredictorsSub_SSE[16];
// -----------------------------------------------------------------------------
// Huffman-cost related functions.
typedef uint32_t (*VP8LCostFunc)(const uint32_t* population, int length);
typedef uint32_t (*VP8LCostCombinedFunc)(const uint32_t* WEBP_RESTRICT X,
const uint32_t* WEBP_RESTRICT Y,
int length);
typedef uint64_t (*VP8LCombinedShannonEntropyFunc)(const uint32_t X[256],
const uint32_t Y[256]);
typedef uint64_t (*VP8LShannonEntropyFunc)(const uint32_t* X, int length);
extern VP8LCostFunc VP8LExtraCost;
extern VP8LCostCombinedFunc VP8LExtraCostCombined;
extern VP8LCombinedShannonEntropyFunc VP8LCombinedShannonEntropy;
extern VP8LShannonEntropyFunc VP8LShannonEntropy;
@ -255,6 +261,7 @@ typedef void (*VP8LBundleColorMapFunc)(const uint8_t* WEBP_RESTRICT const row,
int width, int xbits,
uint32_t* WEBP_RESTRICT dst);
extern VP8LBundleColorMapFunc VP8LBundleColorMap;
extern VP8LBundleColorMapFunc VP8LBundleColorMap_SSE;
void VP8LBundleColorMap_C(const uint8_t* WEBP_RESTRICT const row,
int width, int xbits, uint32_t* WEBP_RESTRICT dst);

src/dsp/lossless_avx2.c Normal file

@ -0,0 +1,442 @@
// Copyright 2025 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// AVX2 variant of methods for lossless decoder
//
// Author: Vincent Rabaud (vrabaud@google.com)
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_AVX2)
#include <immintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Predictor Transform
static WEBP_INLINE void Average2_m256i(const __m256i* const a0,
const __m256i* const a1,
__m256i* const avg) {
// (a + b) >> 1 = ((a + b + 1) >> 1) - ((a ^ b) & 1)
const __m256i ones = _mm256_set1_epi8(1);
const __m256i avg1 = _mm256_avg_epu8(*a0, *a1);
const __m256i one = _mm256_and_si256(_mm256_xor_si256(*a0, *a1), ones);
*avg = _mm256_sub_epi8(avg1, one);
}
// Batch versions of those functions.
// Predictor0: ARGB_BLACK.
static void PredictorAdd0_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i black = _mm256_set1_epi32((int)ARGB_BLACK);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i res = _mm256_add_epi8(src, black);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[0](in + i, NULL, num_pixels - i, out + i);
}
(void)upper;
}
// Predictor1: left.
static void PredictorAdd1_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
__m256i prev = _mm256_set1_epi32((int)out[-1]);
for (i = 0; i + 8 <= num_pixels; i += 8) {
// h | g | f | e | d | c | b | a
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
// g | f | e | 0 | c | b | a | 0
const __m256i shift0 = _mm256_slli_si256(src, 4);
// g + h | f + g | e + f | e | c + d | b + c | a + b | a
const __m256i sum0 = _mm256_add_epi8(src, shift0);
// e + f | e | 0 | 0 | a + b | a | 0 | 0
const __m256i shift1 = _mm256_slli_si256(sum0, 8);
// e + f + g + h | e + f + g | e + f | e | a + b + c + d | a + b + c | a + b
// | a
const __m256i sum1 = _mm256_add_epi8(sum0, shift1);
// Add a + b + c + d to the upper lane.
const int32_t sum_abcd = _mm256_extract_epi32(sum1, 3);
const __m256i sum2 = _mm256_add_epi8(
sum1,
_mm256_set_epi32(sum_abcd, sum_abcd, sum_abcd, sum_abcd, 0, 0, 0, 0));
const __m256i res = _mm256_add_epi8(sum2, prev);
_mm256_storeu_si256((__m256i*)&out[i], res);
// replicate last res output in prev.
prev = _mm256_permutevar8x32_epi32(
res, _mm256_set_epi32(7, 7, 7, 7, 7, 7, 7, 7));
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[1](in + i, upper + i, num_pixels - i, out + i);
}
}
// Macro that adds 32-bit integers from IN using mod 256 arithmetic
// per 8 bit channel.
#define GENERATE_PREDICTOR_1(X, IN) \
static void PredictorAdd##X##_AVX2(const uint32_t* in, \
const uint32_t* upper, int num_pixels, \
uint32_t* WEBP_RESTRICT out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
const __m256i other = _mm256_loadu_si256((const __m256i*)&(IN)); \
const __m256i res = _mm256_add_epi8(src, other); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsAdd_SSE[(X)](in + i, upper + i, num_pixels - i, out + i); \
} \
}
// Predictor2: Top.
GENERATE_PREDICTOR_1(2, upper[i])
// Predictor3: Top-right.
GENERATE_PREDICTOR_1(3, upper[i + 1])
// Predictor4: Top-left.
GENERATE_PREDICTOR_1(4, upper[i - 1])
#undef GENERATE_PREDICTOR_1
// Due to averages with integers, values cannot be accumulated in parallel for
// predictors 5 to 7, so no AVX2 version is defined for them here.
#define GENERATE_PREDICTOR_2(X, IN) \
static void PredictorAdd##X##_AVX2(const uint32_t* in, \
const uint32_t* upper, int num_pixels, \
uint32_t* WEBP_RESTRICT out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i Tother = _mm256_loadu_si256((const __m256i*)&(IN)); \
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]); \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
__m256i avg, res; \
Average2_m256i(&T, &Tother, &avg); \
res = _mm256_add_epi8(avg, src); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsAdd_SSE[(X)](in + i, upper + i, num_pixels - i, out + i); \
} \
}
// Predictor8: average TL T.
GENERATE_PREDICTOR_2(8, upper[i - 1])
// Predictor9: average T TR.
GENERATE_PREDICTOR_2(9, upper[i + 1])
#undef GENERATE_PREDICTOR_2
// Predictor10: average of (average of (L,TL), average of (T, TR)).
#define DO_PRED10(OUT) \
do { \
__m256i avgLTL, avg; \
Average2_m256i(&L, &TL, &avgLTL); \
Average2_m256i(&avgTTR, &avgLTL, &avg); \
L = _mm256_add_epi8(avg, src); \
out[i + (OUT)] = (uint32_t)_mm256_cvtsi256_si32(L); \
} while (0)
#define DO_PRED10_SHIFT \
do { \
/* Rotate the pre-computed values for the next iteration.*/ \
avgTTR = _mm256_srli_si256(avgTTR, 4); \
TL = _mm256_srli_si256(TL, 4); \
src = _mm256_srli_si256(src, 4); \
} while (0)
static void PredictorAdd10_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i, j;
__m256i L = _mm256_setr_epi32((int)out[-1], 0, 0, 0, 0, 0, 0, 0);
for (i = 0; i + 8 <= num_pixels; i += 8) {
__m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
__m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TR = _mm256_loadu_si256((const __m256i*)&upper[i + 1]);
__m256i avgTTR;
Average2_m256i(&T, &TR, &avgTTR);
{
const __m256i avgTTR_bak = avgTTR;
const __m256i TL_bak = TL;
const __m256i src_bak = src;
for (j = 0; j < 4; ++j) {
DO_PRED10(j);
DO_PRED10_SHIFT;
}
avgTTR = _mm256_permute2x128_si256(avgTTR_bak, avgTTR_bak, 1);
TL = _mm256_permute2x128_si256(TL_bak, TL_bak, 1);
src = _mm256_permute2x128_si256(src_bak, src_bak, 1);
for (; j < 8; ++j) {
DO_PRED10(j);
DO_PRED10_SHIFT;
}
}
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[10](in + i, upper + i, num_pixels - i, out + i);
}
}
#undef DO_PRED10
#undef DO_PRED10_SHIFT
// Predictor11: select.
#define DO_PRED11(OUT) \
do { \
const __m256i L_lo = _mm256_unpacklo_epi32(L, T); \
const __m256i TL_lo = _mm256_unpacklo_epi32(TL, T); \
const __m256i pb = _mm256_sad_epu8(L_lo, TL_lo); /* pb = sum |L-TL|*/ \
const __m256i mask = _mm256_cmpgt_epi32(pb, pa); \
const __m256i A = _mm256_and_si256(mask, L); \
const __m256i B = _mm256_andnot_si256(mask, T); \
const __m256i pred = _mm256_or_si256(A, B); /* pred = (pb > pa) ? L : T */ \
L = _mm256_add_epi8(src, pred); \
out[i + (OUT)] = (uint32_t)_mm256_cvtsi256_si32(L); \
} while (0)
#define DO_PRED11_SHIFT \
do { \
/* Shift the pre-computed value for the next iteration.*/ \
T = _mm256_srli_si256(T, 4); \
TL = _mm256_srli_si256(TL, 4); \
src = _mm256_srli_si256(src, 4); \
pa = _mm256_srli_si256(pa, 4); \
} while (0)
static void PredictorAdd11_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i, j;
__m256i pa;
__m256i L = _mm256_setr_epi32((int)out[-1], 0, 0, 0, 0, 0, 0, 0);
for (i = 0; i + 8 <= num_pixels; i += 8) {
__m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
__m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
__m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
{
// We can unpack with any value on the upper 32 bits, provided it's the
// same on both operands (so that their sum of abs diff is zero). Here we
// use T.
const __m256i T_lo = _mm256_unpacklo_epi32(T, T);
const __m256i TL_lo = _mm256_unpacklo_epi32(TL, T);
const __m256i T_hi = _mm256_unpackhi_epi32(T, T);
const __m256i TL_hi = _mm256_unpackhi_epi32(TL, T);
const __m256i s_lo = _mm256_sad_epu8(T_lo, TL_lo);
const __m256i s_hi = _mm256_sad_epu8(T_hi, TL_hi);
pa = _mm256_packs_epi32(s_lo, s_hi); // pa = sum |T-TL|
}
{
const __m256i T_bak = T;
const __m256i TL_bak = TL;
const __m256i src_bak = src;
const __m256i pa_bak = pa;
for (j = 0; j < 4; ++j) {
DO_PRED11(j);
DO_PRED11_SHIFT;
}
T = _mm256_permute2x128_si256(T_bak, T_bak, 1);
TL = _mm256_permute2x128_si256(TL_bak, TL_bak, 1);
src = _mm256_permute2x128_si256(src_bak, src_bak, 1);
pa = _mm256_permute2x128_si256(pa_bak, pa_bak, 1);
for (; j < 8; ++j) {
DO_PRED11(j);
DO_PRED11_SHIFT;
}
}
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[11](in + i, upper + i, num_pixels - i, out + i);
}
}
#undef DO_PRED11
#undef DO_PRED11_SHIFT
// Predictor12: ClampedAddSubtractFull.
#define DO_PRED12(DIFF, OUT) \
do { \
const __m256i all = _mm256_add_epi16(L, (DIFF)); \
const __m256i alls = _mm256_packus_epi16(all, all); \
const __m256i res = _mm256_add_epi8(src, alls); \
out[i + (OUT)] = (uint32_t)_mm256_cvtsi256_si32(res); \
L = _mm256_unpacklo_epi8(res, zero); \
} while (0)
#define DO_PRED12_SHIFT(DIFF, LANE) \
do { \
/* Shift the pre-computed value for the next iteration.*/ \
if ((LANE) == 0) (DIFF) = _mm256_srli_si256(DIFF, 8); \
src = _mm256_srli_si256(src, 4); \
} while (0)
static void PredictorAdd12_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i zero = _mm256_setzero_si256();
const __m256i L8 = _mm256_setr_epi32((int)out[-1], 0, 0, 0, 0, 0, 0, 0);
__m256i L = _mm256_unpacklo_epi8(L8, zero);
for (i = 0; i + 8 <= num_pixels; i += 8) {
// Load 8 pixels at a time.
__m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i T_lo = _mm256_unpacklo_epi8(T, zero);
const __m256i T_hi = _mm256_unpackhi_epi8(T, zero);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i TL_lo = _mm256_unpacklo_epi8(TL, zero);
const __m256i TL_hi = _mm256_unpackhi_epi8(TL, zero);
__m256i diff_lo = _mm256_sub_epi16(T_lo, TL_lo);
__m256i diff_hi = _mm256_sub_epi16(T_hi, TL_hi);
const __m256i diff_lo_bak = diff_lo;
const __m256i diff_hi_bak = diff_hi;
const __m256i src_bak = src;
DO_PRED12(diff_lo, 0);
DO_PRED12_SHIFT(diff_lo, 0);
DO_PRED12(diff_lo, 1);
DO_PRED12_SHIFT(diff_lo, 0);
DO_PRED12(diff_hi, 2);
DO_PRED12_SHIFT(diff_hi, 0);
DO_PRED12(diff_hi, 3);
DO_PRED12_SHIFT(diff_hi, 0);
// Process the upper lane.
diff_lo = _mm256_permute2x128_si256(diff_lo_bak, diff_lo_bak, 1);
diff_hi = _mm256_permute2x128_si256(diff_hi_bak, diff_hi_bak, 1);
src = _mm256_permute2x128_si256(src_bak, src_bak, 1);
DO_PRED12(diff_lo, 4);
DO_PRED12_SHIFT(diff_lo, 0);
DO_PRED12(diff_lo, 5);
DO_PRED12_SHIFT(diff_lo, 1);
DO_PRED12(diff_hi, 6);
DO_PRED12_SHIFT(diff_hi, 0);
DO_PRED12(diff_hi, 7);
}
if (i != num_pixels) {
VP8LPredictorsAdd_SSE[12](in + i, upper + i, num_pixels - i, out + i);
}
}
#undef DO_PRED12
#undef DO_PRED12_SHIFT
// Due to averages with integers, values cannot be accumulated in parallel for
// predictor 13.
//------------------------------------------------------------------------------
// Subtract-Green Transform
static void AddGreenToBlueAndRed_AVX2(const uint32_t* const src, int num_pixels,
uint32_t* dst) {
int i;
const __m256i kCstShuffle = _mm256_set_epi8(
-1, 29, -1, 29, -1, 25, -1, 25, -1, 21, -1, 21, -1, 17, -1, 17, -1, 13,
-1, 13, -1, 9, -1, 9, -1, 5, -1, 5, -1, 1, -1, 1);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i in = _mm256_loadu_si256((const __m256i*)&src[i]); // argb
const __m256i in_0g0g = _mm256_shuffle_epi8(in, kCstShuffle); // 0g0g
const __m256i out = _mm256_add_epi8(in, in_0g0g);
_mm256_storeu_si256((__m256i*)&dst[i], out);
}
// fallthrough and finish off with SSE.
if (i != num_pixels) {
VP8LAddGreenToBlueAndRed_SSE(src + i, num_pixels - i, dst + i);
}
}
//------------------------------------------------------------------------------
// Color Transform
static void TransformColorInverse_AVX2(const VP8LMultipliers* const m,
const uint32_t* const src,
int num_pixels, uint32_t* dst) {
// sign-extended multiplying constants, pre-shifted by 5.
#define CST(X) (((int16_t)(m->X << 8)) >> 5) // sign-extend
const __m256i mults_rb =
_mm256_set1_epi32((int)((uint32_t)CST(green_to_red_) << 16 |
(CST(green_to_blue_) & 0xffff)));
const __m256i mults_b2 = _mm256_set1_epi32(CST(red_to_blue_));
#undef CST
const __m256i mask_ag = _mm256_set1_epi32((int)0xff00ff00);
const __m256i perm1 = _mm256_setr_epi8(
-1, 1, -1, 1, -1, 5, -1, 5, -1, 9, -1, 9, -1, 13, -1, 13, -1, 17, -1, 17,
-1, 21, -1, 21, -1, 25, -1, 25, -1, 29, -1, 29);
const __m256i perm2 = _mm256_setr_epi8(
-1, 2, -1, -1, -1, 6, -1, -1, -1, 10, -1, -1, -1, 14, -1, -1, -1, 18, -1,
-1, -1, 22, -1, -1, -1, 26, -1, -1, -1, 30, -1, -1);
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i A = _mm256_loadu_si256((const __m256i*)(src + i));
const __m256i B = _mm256_shuffle_epi8(A, perm1); // argb -> g0g0
const __m256i C = _mm256_mulhi_epi16(B, mults_rb);
const __m256i D = _mm256_add_epi8(A, C);
const __m256i E = _mm256_shuffle_epi8(D, perm2);
const __m256i F = _mm256_mulhi_epi16(E, mults_b2);
const __m256i G = _mm256_add_epi8(D, F);
const __m256i out = _mm256_blendv_epi8(G, A, mask_ag);
_mm256_storeu_si256((__m256i*)&dst[i], out);
}
// Fall-back to SSE-version for left-overs.
if (i != num_pixels) {
VP8LTransformColorInverse_SSE(m, src + i, num_pixels - i, dst + i);
}
}
//------------------------------------------------------------------------------
// Color-space conversion functions
static void ConvertBGRAToRGBA_AVX2(const uint32_t* WEBP_RESTRICT src,
int num_pixels, uint8_t* WEBP_RESTRICT dst) {
const __m256i* in = (const __m256i*)src;
__m256i* out = (__m256i*)dst;
while (num_pixels >= 8) {
const __m256i A = _mm256_loadu_si256(in++);
const __m256i B = _mm256_shuffle_epi8(
A,
_mm256_set_epi8(15, 12, 13, 14, 11, 8, 9, 10, 7, 4, 5, 6, 3, 0, 1, 2,
15, 12, 13, 14, 11, 8, 9, 10, 7, 4, 5, 6, 3, 0, 1, 2));
_mm256_storeu_si256(out++, B);
num_pixels -= 8;
}
// left-overs
if (num_pixels > 0) {
VP8LConvertBGRAToRGBA_SSE((const uint32_t*)in, num_pixels, (uint8_t*)out);
}
}
//------------------------------------------------------------------------------
// Entry point
extern void VP8LDspInitAVX2(void);
WEBP_TSAN_IGNORE_FUNCTION void VP8LDspInitAVX2(void) {
VP8LPredictorsAdd[0] = PredictorAdd0_AVX2;
VP8LPredictorsAdd[1] = PredictorAdd1_AVX2;
VP8LPredictorsAdd[2] = PredictorAdd2_AVX2;
VP8LPredictorsAdd[3] = PredictorAdd3_AVX2;
VP8LPredictorsAdd[4] = PredictorAdd4_AVX2;
VP8LPredictorsAdd[8] = PredictorAdd8_AVX2;
VP8LPredictorsAdd[9] = PredictorAdd9_AVX2;
VP8LPredictorsAdd[10] = PredictorAdd10_AVX2;
VP8LPredictorsAdd[11] = PredictorAdd11_AVX2;
VP8LPredictorsAdd[12] = PredictorAdd12_AVX2;
VP8LAddGreenToBlueAndRed = AddGreenToBlueAndRed_AVX2;
VP8LTransformColorInverse = TransformColorInverse_AVX2;
VP8LConvertBGRAToRGBA = ConvertBGRAToRGBA_AVX2;
}
#else // !WEBP_USE_AVX2
WEBP_DSP_INIT_STUB(VP8LDspInitAVX2)
#endif // WEBP_USE_AVX2
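
Note on `Average2_m256i` in the file above: its comment relies on the identity `(a + b) >> 1 = ((a + b + 1) >> 1) - ((a ^ b) & 1)`, where the rounded-up average is what `_mm256_avg_epu8` produces per unsigned byte. A minimal scalar sketch that checks the identity over every byte pair could look like this. `Average2Floor` and `CheckAverage2Identity` are hypothetical helper names and are not part of the change.

```c
#include <assert.h>
#include <stdint.h>

// Scalar model of one byte lane of Average2_m256i (hypothetical helper).
static uint8_t Average2Floor(uint8_t a, uint8_t b) {
  // Per-byte result of a rounding-up unsigned average: (a + b + 1) >> 1.
  const uint8_t rounded_up = (uint8_t)((a + b + 1) >> 1);
  // The rounding only adds 1 when a + b is odd, i.e. when the low bits differ.
  const uint8_t correction = (uint8_t)((a ^ b) & 1);
  return (uint8_t)(rounded_up - correction);
}

// Returns 1 if the identity holds for all 256 * 256 byte pairs, 0 otherwise.
static int CheckAverage2Identity(void) {
  int a, b;
  for (a = 0; a < 256; ++a) {
    for (b = 0; b < 256; ++b) {
      const int expected = (a + b) >> 1;  // truncating average
      if (Average2Floor((uint8_t)a, (uint8_t)b) != expected) return 0;
    }
  }
  return 1;
}
```

The subtraction never underflows: when the correction is 1 the sum is odd, so the rounded-up average is at least 1.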


@ -13,16 +13,19 @@
// Jyrki Alakuijala (jyrki@google.com)
// Urvang Joshi (urvang@google.com)
#include "src/dsp/dsp.h"
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include "src/dec/vp8li_dec.h"
#include "src/utils/endian_inl_utils.h"
#include <string.h>
#include "src/dsp/cpu.h"
#include "src/dsp/dsp.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
#include "src/dsp/yuv.h"
#include "src/enc/histogram_enc.h"
#include "src/utils/utils.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
// lookup table for small values of log2(int) * (1 << LOG_2_PRECISION_BITS).
// Obtained in Python with:
@ -580,20 +583,6 @@ static uint32_t ExtraCost_C(const uint32_t* population, int length) {
return cost;
}
static uint32_t ExtraCostCombined_C(const uint32_t* WEBP_RESTRICT X,
const uint32_t* WEBP_RESTRICT Y,
int length) {
int i;
uint32_t cost = X[4] + Y[4] + X[5] + Y[5];
assert(length % 2 == 0);
for (i = 2; i < length / 2 - 1; ++i) {
const int xy0 = X[2 * i + 2] + Y[2 * i + 2];
const int xy1 = X[2 * i + 3] + Y[2 * i + 3];
cost += i * (xy0 + xy1);
}
return cost;
}
//------------------------------------------------------------------------------
static void AddVector_C(const uint32_t* WEBP_RESTRICT a,
@ -710,17 +699,20 @@ GENERATE_PREDICTOR_SUB(13)
//------------------------------------------------------------------------------
VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed;
VP8LProcessEncBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed_SSE;
VP8LTransformColorFunc VP8LTransformColor;
VP8LTransformColorFunc VP8LTransformColor_SSE;
VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms;
VP8LCollectColorBlueTransformsFunc VP8LCollectColorBlueTransforms_SSE;
VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms;
VP8LCollectColorRedTransformsFunc VP8LCollectColorRedTransforms_SSE;
VP8LFastLog2SlowFunc VP8LFastLog2Slow;
VP8LFastSLog2SlowFunc VP8LFastSLog2Slow;
VP8LCostFunc VP8LExtraCost;
VP8LCostCombinedFunc VP8LExtraCostCombined;
VP8LCombinedShannonEntropyFunc VP8LCombinedShannonEntropy;
VP8LShannonEntropyFunc VP8LShannonEntropy;
@ -732,13 +724,16 @@ VP8LAddVectorEqFunc VP8LAddVectorEq;
VP8LVectorMismatchFunc VP8LVectorMismatch;
VP8LBundleColorMapFunc VP8LBundleColorMap;
VP8LBundleColorMapFunc VP8LBundleColorMap_SSE;
VP8LPredictorAddSubFunc VP8LPredictorsSub[16];
VP8LPredictorAddSubFunc VP8LPredictorsSub_C[16];
VP8LPredictorAddSubFunc VP8LPredictorsSub_SSE[16];
extern VP8CPUInfo VP8GetCPUInfo;
extern void VP8LEncDspInitSSE2(void);
extern void VP8LEncDspInitSSE41(void);
extern void VP8LEncDspInitAVX2(void);
extern void VP8LEncDspInitNEON(void);
extern void VP8LEncDspInitMIPS32(void);
extern void VP8LEncDspInitMIPSdspR2(void);
@ -760,7 +755,6 @@ WEBP_DSP_INIT_FUNC(VP8LEncDspInit) {
VP8LFastSLog2Slow = FastSLog2Slow_C;
VP8LExtraCost = ExtraCost_C;
VP8LExtraCostCombined = ExtraCostCombined_C;
VP8LCombinedShannonEntropy = CombinedShannonEntropy_C;
VP8LShannonEntropy = ShannonEntropy_C;
@ -815,6 +809,11 @@ WEBP_DSP_INIT_FUNC(VP8LEncDspInit) {
#if defined(WEBP_HAVE_SSE41)
if (VP8GetCPUInfo(kSSE4_1)) {
VP8LEncDspInitSSE41();
#if defined(WEBP_HAVE_AVX2)
if (VP8GetCPUInfo(kAVX2)) {
VP8LEncDspInitAVX2();
}
#endif
}
#endif
}
@ -850,7 +849,6 @@ WEBP_DSP_INIT_FUNC(VP8LEncDspInit) {
assert(VP8LFastLog2Slow != NULL);
assert(VP8LFastSLog2Slow != NULL);
assert(VP8LExtraCost != NULL);
assert(VP8LExtraCostCombined != NULL);
assert(VP8LCombinedShannonEntropy != NULL);
assert(VP8LShannonEntropy != NULL);
assert(VP8LGetEntropyUnrefined != NULL);
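
Note on the removal of `ExtraCostCombined_C`: the removed function is a weighted sum of the counts, so its value for two histograms equals the sum of the per-histogram values and cancels out when a merged cost is compared with the two separate costs. The sketch below illustrates this under one assumption: `ExtraCostSingle` is a hypothetical single-histogram counterpart that uses the same weights, since the body of `ExtraCost_C` is not shown in this hunk. `ExtraCostMergeDelta` is also a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

// Same arithmetic as the removed ExtraCostCombined_C shown in the hunk.
static uint32_t ExtraCostCombined(const uint32_t* X, const uint32_t* Y,
                                  int length) {
  int i;
  uint32_t cost = X[4] + Y[4] + X[5] + Y[5];
  assert(length % 2 == 0);
  for (i = 2; i < length / 2 - 1; ++i) {
    const uint32_t xy0 = X[2 * i + 2] + Y[2 * i + 2];
    const uint32_t xy1 = X[2 * i + 3] + Y[2 * i + 3];
    cost += (uint32_t)i * (xy0 + xy1);
  }
  return cost;
}

// Assumed single-histogram version with the same weights (hypothetical).
static uint32_t ExtraCostSingle(const uint32_t* population, int length) {
  int i;
  uint32_t cost = population[4] + population[5];
  assert(length % 2 == 0);
  for (i = 2; i < length / 2 - 1; ++i) {
    cost += (uint32_t)i * (population[2 * i + 2] + population[2 * i + 3]);
  }
  return cost;
}

// Merged cost minus the sum of the separate costs; 0 when the term is linear.
static int64_t ExtraCostMergeDelta(const uint32_t* X, const uint32_t* Y,
                                   int length) {
  const int64_t merged = (int64_t)ExtraCostCombined(X, Y, length);
  const int64_t separate =
      (int64_t)ExtraCostSingle(X, length) + (int64_t)ExtraCostSingle(Y, length);
  return merged - separate;
}
```

For any two histograms small enough not to overflow 32 bits, `ExtraCostMergeDelta` returns 0, which is why dropping the term from the comparison should not change which merges are chosen.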

src/dsp/lossless_enc_avx2.c Normal file

@ -0,0 +1,733 @@
// Copyright 2025 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// AVX2 variant of methods for lossless encoder
//
// Author: Vincent Rabaud (vrabaud@google.com)
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_AVX2)
#include <assert.h>
#include <immintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
#include "src/utils/utils.h"
#include "src/webp/format_constants.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Subtract-Green Transform
static void SubtractGreenFromBlueAndRed_AVX2(uint32_t* argb_data,
int num_pixels) {
int i;
const __m256i kCstShuffle = _mm256_set_epi8(
-1, 29, -1, 29, -1, 25, -1, 25, -1, 21, -1, 21, -1, 17, -1, 17, -1, 13,
-1, 13, -1, 9, -1, 9, -1, 5, -1, 5, -1, 1, -1, 1);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i in = _mm256_loadu_si256((__m256i*)&argb_data[i]); // argb
const __m256i in_0g0g = _mm256_shuffle_epi8(in, kCstShuffle);
const __m256i out = _mm256_sub_epi8(in, in_0g0g);
_mm256_storeu_si256((__m256i*)&argb_data[i], out);
}
// fallthrough and finish off with plain-SSE
if (i != num_pixels) {
VP8LSubtractGreenFromBlueAndRed_SSE(argb_data + i, num_pixels - i);
}
}
//------------------------------------------------------------------------------
// Color Transform
// For sign-extended multiplying constants, pre-shifted by 5:
#define CST_5b(X) (((int16_t)((uint16_t)(X) << 8)) >> 5)
#define MK_CST_16(HI, LO) \
_mm256_set1_epi32((int)(((uint32_t)(HI) << 16) | ((LO) & 0xffff)))
static void TransformColor_AVX2(const VP8LMultipliers* WEBP_RESTRICT const m,
uint32_t* WEBP_RESTRICT argb_data,
int num_pixels) {
const __m256i mults_rb =
MK_CST_16(CST_5b(m->green_to_red_), CST_5b(m->green_to_blue_));
const __m256i mults_b2 = MK_CST_16(CST_5b(m->red_to_blue_), 0);
const __m256i mask_rb = _mm256_set1_epi32(0x00ff00ff); // red-blue masks
const __m256i kCstShuffle = _mm256_set_epi8(
29, -1, 29, -1, 25, -1, 25, -1, 21, -1, 21, -1, 17, -1, 17, -1, 13, -1,
13, -1, 9, -1, 9, -1, 5, -1, 5, -1, 1, -1, 1, -1);
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i in = _mm256_loadu_si256((__m256i*)&argb_data[i]); // argb
const __m256i A = _mm256_shuffle_epi8(in, kCstShuffle); // g0g0
const __m256i B = _mm256_mulhi_epi16(A, mults_rb); // x dr x db1
const __m256i C = _mm256_slli_epi16(in, 8); // r 0 b 0
const __m256i D = _mm256_mulhi_epi16(C, mults_b2); // x db2 0 0
const __m256i E = _mm256_srli_epi32(D, 16); // 0 0 x db2
const __m256i F = _mm256_add_epi8(E, B); // x dr x db
const __m256i G = _mm256_and_si256(F, mask_rb); // 0 dr 0 db
const __m256i out = _mm256_sub_epi8(in, G);
_mm256_storeu_si256((__m256i*)&argb_data[i], out);
}
// fallthrough and finish off with plain-SSE
if (i != num_pixels) {
VP8LTransformColor_SSE(m, argb_data + i, num_pixels - i);
}
}
//------------------------------------------------------------------------------
#define SPAN 16
static void CollectColorBlueTransforms_AVX2(const uint32_t* WEBP_RESTRICT argb,
int stride, int tile_width,
int tile_height, int green_to_blue,
int red_to_blue, uint32_t histo[]) {
const __m256i mult =
MK_CST_16(CST_5b(red_to_blue) + 256, CST_5b(green_to_blue));
const __m256i perm = _mm256_setr_epi8(
-1, 1, -1, 2, -1, 5, -1, 6, -1, 9, -1, 10, -1, 13, -1, 14, -1, 17, -1, 18,
-1, 21, -1, 22, -1, 25, -1, 26, -1, 29, -1, 30);
if (tile_width >= 8) {
int y, i;
for (y = 0; y < tile_height; ++y) {
uint8_t values[32];
const uint32_t* const src = argb + y * stride;
const __m256i A1 = _mm256_loadu_si256((const __m256i*)src);
const __m256i B1 = _mm256_shuffle_epi8(A1, perm);
const __m256i C1 = _mm256_mulhi_epi16(B1, mult);
const __m256i D1 = _mm256_sub_epi16(A1, C1);
__m256i E = _mm256_add_epi16(_mm256_srli_epi32(D1, 16), D1);
int x;
for (x = 8; x + 8 <= tile_width; x += 8) {
const __m256i A2 = _mm256_loadu_si256((const __m256i*)(src + x));
__m256i B2, C2, D2;
_mm256_storeu_si256((__m256i*)values, E);
for (i = 0; i < 32; i += 4) ++histo[values[i]];
B2 = _mm256_shuffle_epi8(A2, perm);
C2 = _mm256_mulhi_epi16(B2, mult);
D2 = _mm256_sub_epi16(A2, C2);
E = _mm256_add_epi16(_mm256_srli_epi32(D2, 16), D2);
}
_mm256_storeu_si256((__m256i*)values, E);
for (i = 0; i < 32; i += 4) ++histo[values[i]];
}
}
{
const int left_over = tile_width & 7;
if (left_over > 0) {
VP8LCollectColorBlueTransforms_SSE(argb + tile_width - left_over, stride,
left_over, tile_height, green_to_blue,
red_to_blue, histo);
}
}
}
static void CollectColorRedTransforms_AVX2(const uint32_t* WEBP_RESTRICT argb,
int stride, int tile_width,
int tile_height, int green_to_red,
uint32_t histo[]) {
const __m256i mult = MK_CST_16(0, CST_5b(green_to_red));
const __m256i mask_g = _mm256_set1_epi32(0x0000ff00);
if (tile_width >= 8) {
int y, i;
for (y = 0; y < tile_height; ++y) {
uint8_t values[32];
const uint32_t* const src = argb + y * stride;
const __m256i A1 = _mm256_loadu_si256((const __m256i*)src);
const __m256i B1 = _mm256_and_si256(A1, mask_g);
const __m256i C1 = _mm256_madd_epi16(B1, mult);
__m256i D = _mm256_sub_epi16(A1, C1);
int x;
for (x = 8; x + 8 <= tile_width; x += 8) {
const __m256i A2 = _mm256_loadu_si256((const __m256i*)(src + x));
__m256i B2, C2;
_mm256_storeu_si256((__m256i*)values, D);
for (i = 2; i < 32; i += 4) ++histo[values[i]];
B2 = _mm256_and_si256(A2, mask_g);
C2 = _mm256_madd_epi16(B2, mult);
D = _mm256_sub_epi16(A2, C2);
}
_mm256_storeu_si256((__m256i*)values, D);
for (i = 2; i < 32; i += 4) ++histo[values[i]];
}
}
{
const int left_over = tile_width & 7;
if (left_over > 0) {
VP8LCollectColorRedTransforms_SSE(argb + tile_width - left_over, stride,
left_over, tile_height, green_to_red,
histo);
}
}
}
#undef SPAN
#undef MK_CST_16
//------------------------------------------------------------------------------
// Note we are adding uint32_t's as *signed* int32's (using _mm256_add_epi32).
// But that's ok since the histogram values are less than 1<<28 (max picture
// size).
static void AddVector_AVX2(const uint32_t* WEBP_RESTRICT a,
const uint32_t* WEBP_RESTRICT b,
uint32_t* WEBP_RESTRICT out, int size) {
int i = 0;
int aligned_size = size & ~31;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 32);
assert(size % 2 == 0);
do {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i a2 = _mm256_loadu_si256((const __m256i*)&a[i + 16]);
const __m256i a3 = _mm256_loadu_si256((const __m256i*)&a[i + 24]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&b[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&b[i + 8]);
const __m256i b2 = _mm256_loadu_si256((const __m256i*)&b[i + 16]);
const __m256i b3 = _mm256_loadu_si256((const __m256i*)&b[i + 24]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
_mm256_storeu_si256((__m256i*)&out[i + 16], _mm256_add_epi32(a2, b2));
_mm256_storeu_si256((__m256i*)&out[i + 24], _mm256_add_epi32(a3, b3));
i += 32;
} while (i != aligned_size);
if ((size & 16) != 0) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&b[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&b[i + 8]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
i += 16;
}
size &= 15;
if (size == 8) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&b[i]);
_mm256_storeu_si256((__m256i*)&out[i], _mm256_add_epi32(a0, b0));
} else {
for (; size--; ++i) {
out[i] = a[i] + b[i];
}
}
}
static void AddVectorEq_AVX2(const uint32_t* WEBP_RESTRICT a,
uint32_t* WEBP_RESTRICT out, int size) {
int i = 0;
int aligned_size = size & ~31;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 32);
assert(size % 2 == 0);
do {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i a2 = _mm256_loadu_si256((const __m256i*)&a[i + 16]);
const __m256i a3 = _mm256_loadu_si256((const __m256i*)&a[i + 24]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&out[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&out[i + 8]);
const __m256i b2 = _mm256_loadu_si256((const __m256i*)&out[i + 16]);
const __m256i b3 = _mm256_loadu_si256((const __m256i*)&out[i + 24]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
_mm256_storeu_si256((__m256i*)&out[i + 16], _mm256_add_epi32(a2, b2));
_mm256_storeu_si256((__m256i*)&out[i + 24], _mm256_add_epi32(a3, b3));
i += 32;
} while (i != aligned_size);
if ((size & 16) != 0) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i + 0]);
const __m256i a1 = _mm256_loadu_si256((const __m256i*)&a[i + 8]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&out[i + 0]);
const __m256i b1 = _mm256_loadu_si256((const __m256i*)&out[i + 8]);
_mm256_storeu_si256((__m256i*)&out[i + 0], _mm256_add_epi32(a0, b0));
_mm256_storeu_si256((__m256i*)&out[i + 8], _mm256_add_epi32(a1, b1));
i += 16;
}
size &= 15;
if (size == 8) {
const __m256i a0 = _mm256_loadu_si256((const __m256i*)&a[i]);
const __m256i b0 = _mm256_loadu_si256((const __m256i*)&out[i]);
_mm256_storeu_si256((__m256i*)&out[i], _mm256_add_epi32(a0, b0));
} else {
for (; size--; ++i) {
out[i] += a[i];
}
}
}
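The size handling in the two vector-add routines above can be read as a decomposition of `size` into an unrolled 32-element loop, an optional 16-element step, and then either one 8-lane vector or a scalar tail. The sketch below only illustrates that arithmetic; `SizeSplit` and `SplitSize` are hypothetical names that do not exist in libwebp.

```c
#include <assert.h>

// Hypothetical helper (not part of libwebp): reports which code paths an
// AddVectorEq_AVX2-style routine would take for a given 'size'.
typedef struct {
  int blocks32;  // iterations of the unrolled loop (4 vectors of 8 lanes)
  int block16;   // 1 if the 16-element step runs
  int block8;    // 1 if the single 8-lane vector step runs
  int scalar;    // number of elements left to the scalar loop
} SizeSplit;

static SizeSplit SplitSize(int size) {
  SizeSplit s;
  assert(size >= 32);  // same precondition as the routines above
  s.blocks32 = (size & ~31) / 32;
  s.block16 = (size & 16) != 0;
  size &= 15;
  s.block8 = (size == 8);
  s.scalar = s.block8 ? 0 : size;
  return s;
}
```

With the minimum size quoted in the comment (40), this yields one unrolled iteration plus one 8-lane vector and no scalar work.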
//------------------------------------------------------------------------------
// Entropy
#if !defined(WEBP_HAVE_SLOW_CLZ_CTZ)
static uint64_t CombinedShannonEntropy_AVX2(const uint32_t X[256],
const uint32_t Y[256]) {
int i;
uint64_t retval = 0;
uint32_t sumX = 0, sumXY = 0;
const __m256i zero = _mm256_setzero_si256();
for (i = 0; i < 256; i += 32) {
const __m256i x0 = _mm256_loadu_si256((const __m256i*)(X + i + 0));
const __m256i y0 = _mm256_loadu_si256((const __m256i*)(Y + i + 0));
const __m256i x1 = _mm256_loadu_si256((const __m256i*)(X + i + 8));
const __m256i y1 = _mm256_loadu_si256((const __m256i*)(Y + i + 8));
const __m256i x2 = _mm256_loadu_si256((const __m256i*)(X + i + 16));
const __m256i y2 = _mm256_loadu_si256((const __m256i*)(Y + i + 16));
const __m256i x3 = _mm256_loadu_si256((const __m256i*)(X + i + 24));
const __m256i y3 = _mm256_loadu_si256((const __m256i*)(Y + i + 24));
const __m256i x4 = _mm256_packs_epi16(_mm256_packs_epi32(x0, x1),
_mm256_packs_epi32(x2, x3));
const __m256i y4 = _mm256_packs_epi16(_mm256_packs_epi32(y0, y1),
_mm256_packs_epi32(y2, y3));
// Packed pixels are actually in order: ... 17 16 12 11 10 9 8 3 2 1 0
const __m256i x5 = _mm256_permutevar8x32_epi32(
x4, _mm256_set_epi32(7, 3, 6, 2, 5, 1, 4, 0));
const __m256i y5 = _mm256_permutevar8x32_epi32(
y4, _mm256_set_epi32(7, 3, 6, 2, 5, 1, 4, 0));
const uint32_t mx =
(uint32_t)_mm256_movemask_epi8(_mm256_cmpgt_epi8(x5, zero));
uint32_t my =
(uint32_t)_mm256_movemask_epi8(_mm256_cmpgt_epi8(y5, zero)) | mx;
while (my) {
const int32_t j = BitsCtz(my);
uint32_t xy;
if ((mx >> j) & 1) {
const int x = X[i + j];
sumXY += x;
retval += VP8LFastSLog2(x);
}
xy = X[i + j] + Y[i + j];
sumX += xy;
retval += VP8LFastSLog2(xy);
my &= my - 1;
}
}
retval = VP8LFastSLog2(sumX) + VP8LFastSLog2(sumXY) - retval;
return retval;
}
#else
#define DONT_USE_COMBINED_SHANNON_ENTROPY_SSE2_FUNC // won't be faster
#endif
//------------------------------------------------------------------------------
static int VectorMismatch_AVX2(const uint32_t* const array1,
const uint32_t* const array2, int length) {
int match_len;
if (length >= 24) {
__m256i A0 = _mm256_loadu_si256((const __m256i*)&array1[0]);
__m256i A1 = _mm256_loadu_si256((const __m256i*)&array2[0]);
match_len = 0;
do {
// Loop unrolling and early load both provide a speedup of 10% for the
// current function. Also, max_limit can be MAX_LENGTH=4096 at most.
const __m256i cmpA = _mm256_cmpeq_epi32(A0, A1);
const __m256i B0 =
_mm256_loadu_si256((const __m256i*)&array1[match_len + 8]);
const __m256i B1 =
_mm256_loadu_si256((const __m256i*)&array2[match_len + 8]);
if ((uint32_t)_mm256_movemask_epi8(cmpA) != 0xffffffff) break;
match_len += 8;
{
const __m256i cmpB = _mm256_cmpeq_epi32(B0, B1);
A0 = _mm256_loadu_si256((const __m256i*)&array1[match_len + 8]);
A1 = _mm256_loadu_si256((const __m256i*)&array2[match_len + 8]);
if ((uint32_t)_mm256_movemask_epi8(cmpB) != 0xffffffff) break;
match_len += 8;
}
} while (match_len + 24 < length);
} else {
match_len = 0;
// Unroll the potential first two loops.
if (length >= 8 &&
(uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi32(
_mm256_loadu_si256((const __m256i*)&array1[0]),
_mm256_loadu_si256((const __m256i*)&array2[0]))) == 0xffffffff) {
match_len = 8;
if (length >= 16 &&
(uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi32(
_mm256_loadu_si256((const __m256i*)&array1[8]),
_mm256_loadu_si256((const __m256i*)&array2[8]))) == 0xffffffff) {
match_len = 16;
}
}
}
while (match_len < length && array1[match_len] == array2[match_len]) {
++match_len;
}
return match_len;
}
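For reference, the value `VectorMismatch_AVX2` returns is the length of the common prefix of the two arrays; the vector code only speeds up the search. A minimal scalar sketch of the same contract follows; `VectorMismatchRef` is a hypothetical name used here for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar reference: number of leading elements that are equal
// in both arrays, capped at 'length'.
static int VectorMismatchRef(const uint32_t* a, const uint32_t* b,
                             int length) {
  int match_len = 0;
  assert(length >= 0);
  while (match_len < length && a[match_len] == b[match_len]) {
    ++match_len;
  }
  return match_len;
}
```

The final `while` loop of the AVX2 version is this same loop, started from the position the vector comparisons already cleared.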
// Bundles multiple (1, 2, 4 or 8) pixels into a single pixel.
static void BundleColorMap_AVX2(const uint8_t* WEBP_RESTRICT const row,
int width, int xbits,
uint32_t* WEBP_RESTRICT dst) {
int x = 0;
assert(xbits >= 0);
assert(xbits <= 3);
switch (xbits) {
case 0: {
const __m256i ff = _mm256_set1_epi16((short)0xff00);
const __m256i zero = _mm256_setzero_si256();
// Store 0xff000000 | (row[x] << 8).
for (x = 0; x + 32 <= width; x += 32, dst += 32) {
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i in_lo = _mm256_unpacklo_epi8(zero, in);
const __m256i dst0 = _mm256_unpacklo_epi16(in_lo, ff);
const __m256i dst1 = _mm256_unpackhi_epi16(in_lo, ff);
const __m256i in_hi = _mm256_unpackhi_epi8(zero, in);
const __m256i dst2 = _mm256_unpacklo_epi16(in_hi, ff);
const __m256i dst3 = _mm256_unpackhi_epi16(in_hi, ff);
_mm256_storeu2_m128i((__m128i*)&dst[16], (__m128i*)&dst[0], dst0);
_mm256_storeu2_m128i((__m128i*)&dst[20], (__m128i*)&dst[4], dst1);
_mm256_storeu2_m128i((__m128i*)&dst[24], (__m128i*)&dst[8], dst2);
_mm256_storeu2_m128i((__m128i*)&dst[28], (__m128i*)&dst[12], dst3);
}
break;
}
case 1: {
const __m256i ff = _mm256_set1_epi16((short)0xff00);
const __m256i mul = _mm256_set1_epi16(0x110);
for (x = 0; x + 32 <= width; x += 32, dst += 16) {
// 0a0b | (where a/b are 4 bits).
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i tmp = _mm256_mullo_epi16(in, mul); // aba0
const __m256i pack = _mm256_and_si256(tmp, ff); // ab00
const __m256i dst0 = _mm256_unpacklo_epi16(pack, ff);
const __m256i dst1 = _mm256_unpackhi_epi16(pack, ff);
_mm256_storeu2_m128i((__m128i*)&dst[8], (__m128i*)&dst[0], dst0);
_mm256_storeu2_m128i((__m128i*)&dst[12], (__m128i*)&dst[4], dst1);
}
break;
}
case 2: {
const __m256i mask_or = _mm256_set1_epi32((int)0xff000000);
const __m256i mul_cst = _mm256_set1_epi16(0x0104);
const __m256i mask_mul = _mm256_set1_epi16(0x0f00);
for (x = 0; x + 32 <= width; x += 32, dst += 8) {
// 000a000b000c000d | (where a/b/c/d are 2 bits).
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i mul =
_mm256_mullo_epi16(in, mul_cst); // 00ab00b000cd00d0
const __m256i tmp =
_mm256_and_si256(mul, mask_mul); // 00ab000000cd0000
const __m256i shift = _mm256_srli_epi32(tmp, 12); // 00000000ab000000
const __m256i pack = _mm256_or_si256(shift, tmp); // 00000000abcd0000
// Convert to 0xff00**00.
const __m256i res = _mm256_or_si256(pack, mask_or);
_mm256_storeu_si256((__m256i*)dst, res);
}
break;
}
default: {
assert(xbits == 3);
for (x = 0; x + 32 <= width; x += 32, dst += 4) {
// 0000000a00000000b... | (where a/b are 1 bit).
const __m256i in = _mm256_loadu_si256((const __m256i*)&row[x]);
const __m256i shift = _mm256_slli_epi64(in, 7);
const uint32_t move = _mm256_movemask_epi8(shift);
dst[0] = 0xff000000 | ((move & 0xff) << 8);
dst[1] = 0xff000000 | (move & 0xff00);
dst[2] = 0xff000000 | ((move & 0xff0000) >> 8);
dst[3] = 0xff000000 | ((move & 0xff000000) >> 16);
}
break;
}
}
if (x != width) {
VP8LBundleColorMap_SSE(row + x, width - x, xbits, dst);
}
}
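The four `xbits` cases above all pack `1 << xbits` palette indices into the green byte of one ARGB pixel. The scalar sketch below shows that packing in one generic loop, following the bit layouts given in the comments above; `BundleColorMapRef` is a hypothetical name and is not claimed to be the library's fallback.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar reference: packs (1 << xbits) indices from 'row' into
// bits 8..15 of each output pixel, with alpha forced to 0xff.
static void BundleColorMapRef(const uint8_t* row, int width, int xbits,
                              uint32_t* dst) {
  const int bit_depth = 1 << (3 - xbits);  // bits used per index
  const int mask = (1 << xbits) - 1;       // position within one bundle
  uint32_t code = 0xff000000u;
  int x;
  assert(xbits >= 0 && xbits <= 3);
  for (x = 0; x < width; ++x) {
    const int xsub = x & mask;
    if (xsub == 0) code = 0xff000000u;  // start a new output pixel
    code |= (uint32_t)row[x] << (8 + bit_depth * xsub);
    dst[x >> xbits] = code;
  }
}
```

For `xbits == 1` and the indices 1, 2 this produces `0xff002100`, which matches the `ab00` layout noted in the corresponding case.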
//------------------------------------------------------------------------------
// Batch version of Predictor Transform subtraction
static WEBP_INLINE void Average2_m256i(const __m256i* const a0,
const __m256i* const a1,
__m256i* const avg) {
// (a + b) >> 1 = ((a + b + 1) >> 1) - ((a ^ b) & 1)
const __m256i ones = _mm256_set1_epi8(1);
const __m256i avg1 = _mm256_avg_epu8(*a0, *a1);
const __m256i one = _mm256_and_si256(_mm256_xor_si256(*a0, *a1), ones);
*avg = _mm256_sub_epi8(avg1, one);
}
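The identity in the comment of `Average2_m256i` can be checked exhaustively on bytes. The sketch below does so in scalar C; the helper names are hypothetical, and `AvgRoundUp` stands in for the per-byte result of `_mm256_avg_epu8`, which rounds up.

```c
#include <assert.h>
#include <stdint.h>

// Rounded-up byte average, i.e. what _mm256_avg_epu8 computes per lane.
static uint8_t AvgRoundUp(uint8_t a, uint8_t b) {
  return (uint8_t)(((unsigned)a + b + 1) >> 1);
}

// Floor average recovered by subtracting the rounding bit (a ^ b) & 1.
static uint8_t AvgFloor(uint8_t a, uint8_t b) {
  return (uint8_t)(AvgRoundUp(a, b) - ((a ^ b) & 1));
}

// Returns 1 if AvgFloor(a, b) == (a + b) >> 1 for all 65536 byte pairs.
static int CheckAverageIdentity(void) {
  unsigned a, b;
  for (a = 0; a < 256; ++a) {
    for (b = 0; b < 256; ++b) {
      if (AvgFloor((uint8_t)a, (uint8_t)b) != ((a + b) >> 1)) return 0;
    }
  }
  return 1;
}
```

The correction works because `a + b` is odd exactly when the low bits of `a` and `b` differ, which is the only case where rounding up and rounding down disagree.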
// Predictor0: ARGB_BLACK.
static void PredictorSub0_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i black = _mm256_set1_epi32((int)ARGB_BLACK);
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i res = _mm256_sub_epi8(src, black);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[0](in + i, NULL, num_pixels - i, out + i);
}
(void)upper;
}
#define GENERATE_PREDICTOR_1(X, IN) \
static void PredictorSub##X##_AVX2( \
const uint32_t* const in, const uint32_t* const upper, int num_pixels, \
uint32_t* WEBP_RESTRICT const out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
const __m256i pred = _mm256_loadu_si256((const __m256i*)&(IN)); \
const __m256i res = _mm256_sub_epi8(src, pred); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsSub_SSE[(X)](in + i, WEBP_OFFSET_PTR(upper, i), \
num_pixels - i, out + i); \
} \
}
GENERATE_PREDICTOR_1(1, in[i - 1]) // Predictor1: L
GENERATE_PREDICTOR_1(2, upper[i]) // Predictor2: T
GENERATE_PREDICTOR_1(3, upper[i + 1]) // Predictor3: TR
GENERATE_PREDICTOR_1(4, upper[i - 1]) // Predictor4: TL
#undef GENERATE_PREDICTOR_1
// Predictor5: avg2(avg2(L, TR), T)
static void PredictorSub5_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TR = _mm256_loadu_si256((const __m256i*)&upper[i + 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
__m256i avg, pred, res;
Average2_m256i(&L, &TR, &avg);
Average2_m256i(&avg, &T, &pred);
res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[5](in + i, upper + i, num_pixels - i, out + i);
}
}
#define GENERATE_PREDICTOR_2(X, A, B) \
static void PredictorSub##X##_AVX2(const uint32_t* in, \
const uint32_t* upper, int num_pixels, \
uint32_t* WEBP_RESTRICT out) { \
int i; \
for (i = 0; i + 8 <= num_pixels; i += 8) { \
const __m256i tA = _mm256_loadu_si256((const __m256i*)&(A)); \
const __m256i tB = _mm256_loadu_si256((const __m256i*)&(B)); \
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]); \
__m256i pred, res; \
Average2_m256i(&tA, &tB, &pred); \
res = _mm256_sub_epi8(src, pred); \
_mm256_storeu_si256((__m256i*)&out[i], res); \
} \
if (i != num_pixels) { \
VP8LPredictorsSub_SSE[(X)](in + i, upper + i, num_pixels - i, out + i); \
} \
}
GENERATE_PREDICTOR_2(6, in[i - 1], upper[i - 1]) // Predictor6: avg(L, TL)
GENERATE_PREDICTOR_2(7, in[i - 1], upper[i]) // Predictor7: avg(L, T)
GENERATE_PREDICTOR_2(8, upper[i - 1], upper[i]) // Predictor8: avg(TL, T)
GENERATE_PREDICTOR_2(9, upper[i], upper[i + 1]) // Predictor9: avg(T, TR)
#undef GENERATE_PREDICTOR_2
// Predictor10: avg(avg(L,TL), avg(T, TR)).
static void PredictorSub10_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TR = _mm256_loadu_si256((const __m256i*)&upper[i + 1]);
__m256i avgTTR, avgLTL, avg, res;
Average2_m256i(&T, &TR, &avgTTR);
Average2_m256i(&L, &TL, &avgLTL);
Average2_m256i(&avgTTR, &avgLTL, &avg);
res = _mm256_sub_epi8(src, avg);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[10](in + i, upper + i, num_pixels - i, out + i);
}
}
// Predictor11: select.
static void GetSumAbsDiff32_AVX2(const __m256i* const A, const __m256i* const B,
__m256i* const out) {
// We can unpack with any value on the upper 32 bits, provided it's the same
// on both operands (so that their sum of abs diff is zero). Here we use *A.
const __m256i A_lo = _mm256_unpacklo_epi32(*A, *A);
const __m256i B_lo = _mm256_unpacklo_epi32(*B, *A);
const __m256i A_hi = _mm256_unpackhi_epi32(*A, *A);
const __m256i B_hi = _mm256_unpackhi_epi32(*B, *A);
const __m256i s_lo = _mm256_sad_epu8(A_lo, B_lo);
const __m256i s_hi = _mm256_sad_epu8(A_hi, B_hi);
*out = _mm256_packs_epi32(s_lo, s_hi);
}
static void PredictorSub11_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
__m256i pa, pb;
GetSumAbsDiff32_AVX2(&T, &TL, &pa); // pa = sum |T-TL|
GetSumAbsDiff32_AVX2(&L, &TL, &pb); // pb = sum |L-TL|
{
const __m256i mask = _mm256_cmpgt_epi32(pb, pa);
const __m256i A = _mm256_and_si256(mask, L);
const __m256i B = _mm256_andnot_si256(mask, T);
const __m256i pred = _mm256_or_si256(A, B); // pred = (pb > pa) ? L : T
const __m256i res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[11](in + i, upper + i, num_pixels - i, out + i);
}
}
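The per-pixel decision made inside `PredictorSub11_AVX2` can be written out in scalar form. The sketch below mirrors the comparison used in the loop above (`pb > pa` selects L, otherwise T); `SumAbsDiff` and `SelectRef` are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stdint.h>

// Sum of absolute differences over the four 8-bit channels of two pixels.
static int SumAbsDiff(uint32_t a, uint32_t b) {
  int sum = 0;
  int shift;
  for (shift = 0; shift < 32; shift += 8) {
    const int ca = (int)((a >> shift) & 0xff);
    const int cb = (int)((b >> shift) & 0xff);
    sum += (ca > cb) ? ca - cb : cb - ca;
  }
  return sum;
}

// Hypothetical scalar form of the select step: pick L when L is further
// from TL than T is, otherwise pick T (ties go to T).
static uint32_t SelectRef(uint32_t L, uint32_t T, uint32_t TL) {
  const int pa = SumAbsDiff(T, TL);  // pa = sum |T - TL|
  const int pb = SumAbsDiff(L, TL);  // pb = sum |L - TL|
  return (pb > pa) ? L : T;
}
```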
// Predictor12: ClampedAddSubtractFull.
static void PredictorSub12_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i zero = _mm256_setzero_si256();
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i L_lo = _mm256_unpacklo_epi8(L, zero);
const __m256i L_hi = _mm256_unpackhi_epi8(L, zero);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i T_lo = _mm256_unpacklo_epi8(T, zero);
const __m256i T_hi = _mm256_unpackhi_epi8(T, zero);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
const __m256i TL_lo = _mm256_unpacklo_epi8(TL, zero);
const __m256i TL_hi = _mm256_unpackhi_epi8(TL, zero);
const __m256i diff_lo = _mm256_sub_epi16(T_lo, TL_lo);
const __m256i diff_hi = _mm256_sub_epi16(T_hi, TL_hi);
const __m256i pred_lo = _mm256_add_epi16(L_lo, diff_lo);
const __m256i pred_hi = _mm256_add_epi16(L_hi, diff_hi);
const __m256i pred = _mm256_packus_epi16(pred_lo, pred_hi);
const __m256i res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[12](in + i, upper + i, num_pixels - i, out + i);
}
}
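In `PredictorSub12_AVX2` the clamp is implicit: the channels are widened to 16 bits, `L + T - TL` is computed there, and `_mm256_packus_epi16` saturates the result back to 0..255. A scalar sketch of the same per-channel prediction follows; `Clip255` and `ClampedAddSubtractFullRef` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Clamps an integer to the 0..255 range, as unsigned saturation would.
static uint8_t Clip255(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Hypothetical scalar reference: per channel, predict clamp(L + T - TL).
static uint32_t ClampedAddSubtractFullRef(uint32_t L, uint32_t T,
                                          uint32_t TL) {
  uint32_t out = 0;
  int shift;
  for (shift = 0; shift < 32; shift += 8) {
    const int l = (int)((L >> shift) & 0xff);
    const int t = (int)((T >> shift) & 0xff);
    const int tl = (int)((TL >> shift) & 0xff);
    out |= (uint32_t)Clip255(l + t - tl) << shift;
  }
  return out;
}
```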
// Predictor13: ClampedAddSubtractHalf
static void PredictorSub13_AVX2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m256i zero = _mm256_setzero_si256();
for (i = 0; i + 8 <= num_pixels; i += 8) {
const __m256i L = _mm256_loadu_si256((const __m256i*)&in[i - 1]);
const __m256i src = _mm256_loadu_si256((const __m256i*)&in[i]);
const __m256i T = _mm256_loadu_si256((const __m256i*)&upper[i]);
const __m256i TL = _mm256_loadu_si256((const __m256i*)&upper[i - 1]);
// lo.
const __m256i L_lo = _mm256_unpacklo_epi8(L, zero);
const __m256i T_lo = _mm256_unpacklo_epi8(T, zero);
const __m256i TL_lo = _mm256_unpacklo_epi8(TL, zero);
const __m256i sum_lo = _mm256_add_epi16(T_lo, L_lo);
const __m256i avg_lo = _mm256_srli_epi16(sum_lo, 1);
const __m256i A1_lo = _mm256_sub_epi16(avg_lo, TL_lo);
const __m256i bit_fix_lo = _mm256_cmpgt_epi16(TL_lo, avg_lo);
const __m256i A2_lo = _mm256_sub_epi16(A1_lo, bit_fix_lo);
const __m256i A3_lo = _mm256_srai_epi16(A2_lo, 1);
const __m256i A4_lo = _mm256_add_epi16(avg_lo, A3_lo);
// hi.
const __m256i L_hi = _mm256_unpackhi_epi8(L, zero);
const __m256i T_hi = _mm256_unpackhi_epi8(T, zero);
const __m256i TL_hi = _mm256_unpackhi_epi8(TL, zero);
const __m256i sum_hi = _mm256_add_epi16(T_hi, L_hi);
const __m256i avg_hi = _mm256_srli_epi16(sum_hi, 1);
const __m256i A1_hi = _mm256_sub_epi16(avg_hi, TL_hi);
const __m256i bit_fix_hi = _mm256_cmpgt_epi16(TL_hi, avg_hi);
const __m256i A2_hi = _mm256_sub_epi16(A1_hi, bit_fix_hi);
const __m256i A3_hi = _mm256_srai_epi16(A2_hi, 1);
const __m256i A4_hi = _mm256_add_epi16(avg_hi, A3_hi);
const __m256i pred = _mm256_packus_epi16(A4_lo, A4_hi);
const __m256i res = _mm256_sub_epi8(src, pred);
_mm256_storeu_si256((__m256i*)&out[i], res);
}
if (i != num_pixels) {
VP8LPredictorsSub_SSE[13](in + i, upper + i, num_pixels - i, out + i);
}
}
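The `bit_fix` step in `PredictorSub13_AVX2` is what turns the arithmetic right shift (which rounds toward negative infinity) into a division by two that rounds toward zero. The sketch below checks that equivalence for all byte inputs. It assumes the scalar formulation is `avg + (avg - TL) / 2` with C's truncating division; all names are hypothetical.

```c
#include <assert.h>

// Floor division by two, emulating an arithmetic shift right by one without
// relying on implementation-defined shifts of negative values.
static int FloorHalf(int v) {
  return (v >= 0) ? v / 2 : -((-v + 1) / 2);
}

// Mirrors the SIMD sequence: A1 = avg - TL, A2 = A1 - bit_fix,
// A3 = A2 >> 1 (arithmetic), A4 = avg + A3.
static int HalfStepSimdStyle(int avg, int tl) {
  const int a1 = avg - tl;
  const int bit_fix = (tl > avg) ? -1 : 0;  // cmpgt gives all-ones, i.e. -1
  return avg + FloorHalf(a1 - bit_fix);
}

// Assumed scalar formulation using truncating division.
static int HalfStepCStyle(int avg, int tl) {
  return avg + (avg - tl) / 2;
}

// Returns 1 if both formulations agree for every pair of byte values.
static int CheckHalfStep(void) {
  int avg, tl;
  for (avg = 0; avg < 256; ++avg) {
    for (tl = 0; tl < 256; ++tl) {
      if (HalfStepSimdStyle(avg, tl) != HalfStepCStyle(avg, tl)) return 0;
    }
  }
  return 1;
}
```

The result still has to be clamped to 0..255, which the vector code gets from `_mm256_packus_epi16`.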
//------------------------------------------------------------------------------
// Entry point
extern void VP8LEncDspInitAVX2(void);
WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitAVX2(void) {
VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed_AVX2;
VP8LTransformColor = TransformColor_AVX2;
VP8LCollectColorBlueTransforms = CollectColorBlueTransforms_AVX2;
VP8LCollectColorRedTransforms = CollectColorRedTransforms_AVX2;
VP8LAddVector = AddVector_AVX2;
VP8LAddVectorEq = AddVectorEq_AVX2;
VP8LCombinedShannonEntropy = CombinedShannonEntropy_AVX2;
VP8LVectorMismatch = VectorMismatch_AVX2;
VP8LBundleColorMap = BundleColorMap_AVX2;
VP8LPredictorsSub[0] = PredictorSub0_AVX2;
VP8LPredictorsSub[1] = PredictorSub1_AVX2;
VP8LPredictorsSub[2] = PredictorSub2_AVX2;
VP8LPredictorsSub[3] = PredictorSub3_AVX2;
VP8LPredictorsSub[4] = PredictorSub4_AVX2;
VP8LPredictorsSub[5] = PredictorSub5_AVX2;
VP8LPredictorsSub[6] = PredictorSub6_AVX2;
VP8LPredictorsSub[7] = PredictorSub7_AVX2;
VP8LPredictorsSub[8] = PredictorSub8_AVX2;
VP8LPredictorsSub[9] = PredictorSub9_AVX2;
VP8LPredictorsSub[10] = PredictorSub10_AVX2;
VP8LPredictorsSub[11] = PredictorSub11_AVX2;
VP8LPredictorsSub[12] = PredictorSub12_AVX2;
VP8LPredictorsSub[13] = PredictorSub13_AVX2;
VP8LPredictorsSub[14] = PredictorSub0_AVX2; // <- padding security sentinels
VP8LPredictorsSub[15] = PredictorSub0_AVX2;
}
#else // !WEBP_USE_AVX2
WEBP_DSP_INIT_STUB(VP8LEncDspInitAVX2)
#endif // WEBP_USE_AVX2

@ -133,60 +133,6 @@ static uint32_t ExtraCost_MIPS32(const uint32_t* const population, int length) {
return ((int64_t)temp0 << 32 | temp1);
}
// C version of this function:
// int i = 0;
// int64_t cost = 0;
// const uint32_t* pX = &X[4];
// const uint32_t* pY = &Y[4];
// const uint32_t* LoopEnd = &X[length];
// while (pX != LoopEnd) {
// const uint32_t xy0 = *pX + *pY;
// const uint32_t xy1 = *(pX + 1) + *(pY + 1);
// ++i;
// cost += i * xy0;
// cost += i * xy1;
// pX += 2;
// pY += 2;
// }
// return cost;
static uint32_t ExtraCostCombined_MIPS32(const uint32_t* WEBP_RESTRICT const X,
const uint32_t* WEBP_RESTRICT const Y,
int length) {
int i, temp0, temp1, temp2, temp3;
const uint32_t* pX = &X[4];
const uint32_t* pY = &Y[4];
const uint32_t* const LoopEnd = &X[length];
__asm__ volatile(
"mult $zero, $zero \n\t"
"xor %[i], %[i], %[i] \n\t"
"beq %[pX], %[LoopEnd], 2f \n\t"
"1: \n\t"
"lw %[temp0], 0(%[pX]) \n\t"
"lw %[temp1], 0(%[pY]) \n\t"
"lw %[temp2], 4(%[pX]) \n\t"
"lw %[temp3], 4(%[pY]) \n\t"
"addiu %[i], %[i], 1 \n\t"
"addu %[temp0], %[temp0], %[temp1] \n\t"
"addu %[temp2], %[temp2], %[temp3] \n\t"
"addiu %[pX], %[pX], 8 \n\t"
"addiu %[pY], %[pY], 8 \n\t"
"madd %[i], %[temp0] \n\t"
"madd %[i], %[temp2] \n\t"
"bne %[pX], %[LoopEnd], 1b \n\t"
"2: \n\t"
"mfhi %[temp0] \n\t"
"mflo %[temp1] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1),
[temp2]"=&r"(temp2), [temp3]"=&r"(temp3),
[i]"=&r"(i), [pX]"+r"(pX), [pY]"+r"(pY)
: [LoopEnd]"r"(LoopEnd)
: "memory", "hi", "lo"
);
return ((int64_t)temp0 << 32 | temp1);
}
#define HUFFMAN_COST_PASS \
__asm__ volatile( \
"sll %[temp1], %[temp0], 3 \n\t" \
@ -388,7 +334,6 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitMIPS32(void) {
VP8LFastSLog2Slow = FastSLog2Slow_MIPS32;
VP8LFastLog2Slow = FastLog2Slow_MIPS32;
VP8LExtraCost = ExtraCost_MIPS32;
VP8LExtraCostCombined = ExtraCostCombined_MIPS32;
VP8LGetEntropyUnrefined = GetEntropyUnrefined_MIPS32;
VP8LGetCombinedEntropyUnrefined = GetCombinedEntropyUnrefined_MIPS32;
VP8LAddVector = AddVector_MIPS32;

@ -14,11 +14,16 @@
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_SSE2)
#include <assert.h>
#include <emmintrin.h>
#include <string.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/common_sse2.h"
#include "src/dsp/lossless_common.h"
#include "src/utils/utils.h"
#include "src/webp/types.h"
// For sign-extended multiplying constants, pre-shifted by 5:
#define CST_5b(X) (((int16_t)((uint16_t)(X) << 8)) >> 5)
@ -645,25 +650,43 @@ static void PredictorSub13_SSE2(const uint32_t* in, const uint32_t* upper,
int num_pixels, uint32_t* WEBP_RESTRICT out) {
int i;
const __m128i zero = _mm_setzero_si128();
for (i = 0; i + 2 <= num_pixels; i += 2) {
// we can only process two pixels at a time
const __m128i L = _mm_loadl_epi64((const __m128i*)&in[i - 1]);
const __m128i src = _mm_loadl_epi64((const __m128i*)&in[i]);
const __m128i T = _mm_loadl_epi64((const __m128i*)&upper[i]);
const __m128i TL = _mm_loadl_epi64((const __m128i*)&upper[i - 1]);
const __m128i L_lo = _mm_unpacklo_epi8(L, zero);
const __m128i T_lo = _mm_unpacklo_epi8(T, zero);
const __m128i TL_lo = _mm_unpacklo_epi8(TL, zero);
const __m128i sum = _mm_add_epi16(T_lo, L_lo);
const __m128i avg = _mm_srli_epi16(sum, 1);
const __m128i A1 = _mm_sub_epi16(avg, TL_lo);
const __m128i bit_fix = _mm_cmpgt_epi16(TL_lo, avg);
const __m128i A2 = _mm_sub_epi16(A1, bit_fix);
const __m128i A3 = _mm_srai_epi16(A2, 1);
const __m128i A4 = _mm_add_epi16(avg, A3);
const __m128i pred = _mm_packus_epi16(A4, A4);
const __m128i res = _mm_sub_epi8(src, pred);
_mm_storel_epi64((__m128i*)&out[i], res);
for (i = 0; i + 4 <= num_pixels; i += 4) {
const __m128i L = _mm_loadu_si128((const __m128i*)&in[i - 1]);
const __m128i src = _mm_loadu_si128((const __m128i*)&in[i]);
const __m128i T = _mm_loadu_si128((const __m128i*)&upper[i]);
const __m128i TL = _mm_loadu_si128((const __m128i*)&upper[i - 1]);
__m128i A4_lo, A4_hi;
// lo.
{
const __m128i L_lo = _mm_unpacklo_epi8(L, zero);
const __m128i T_lo = _mm_unpacklo_epi8(T, zero);
const __m128i TL_lo = _mm_unpacklo_epi8(TL, zero);
const __m128i sum_lo = _mm_add_epi16(T_lo, L_lo);
const __m128i avg_lo = _mm_srli_epi16(sum_lo, 1);
const __m128i A1_lo = _mm_sub_epi16(avg_lo, TL_lo);
const __m128i bit_fix_lo = _mm_cmpgt_epi16(TL_lo, avg_lo);
const __m128i A2_lo = _mm_sub_epi16(A1_lo, bit_fix_lo);
const __m128i A3_lo = _mm_srai_epi16(A2_lo, 1);
A4_lo = _mm_add_epi16(avg_lo, A3_lo);
}
// hi.
{
const __m128i L_hi = _mm_unpackhi_epi8(L, zero);
const __m128i T_hi = _mm_unpackhi_epi8(T, zero);
const __m128i TL_hi = _mm_unpackhi_epi8(TL, zero);
const __m128i sum_hi = _mm_add_epi16(T_hi, L_hi);
const __m128i avg_hi = _mm_srli_epi16(sum_hi, 1);
const __m128i A1_hi = _mm_sub_epi16(avg_hi, TL_hi);
const __m128i bit_fix_hi = _mm_cmpgt_epi16(TL_hi, avg_hi);
const __m128i A2_hi = _mm_sub_epi16(A1_hi, bit_fix_hi);
const __m128i A3_hi = _mm_srai_epi16(A2_hi, 1);
A4_hi = _mm_add_epi16(avg_hi, A3_hi);
}
{
const __m128i pred = _mm_packus_epi16(A4_lo, A4_hi);
const __m128i res = _mm_sub_epi8(src, pred);
_mm_storeu_si128((__m128i*)&out[i], res);
}
}
if (i != num_pixels) {
VP8LPredictorsSub_C[13](in + i, upper + i, num_pixels - i, out + i);
@ -704,6 +727,15 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE2(void) {
VP8LPredictorsSub[13] = PredictorSub13_SSE2;
VP8LPredictorsSub[14] = PredictorSub0_SSE2; // <- padding security sentinels
VP8LPredictorsSub[15] = PredictorSub0_SSE2;
// SSE exports for AVX and above.
VP8LSubtractGreenFromBlueAndRed_SSE = SubtractGreenFromBlueAndRed_SSE2;
VP8LTransformColor_SSE = TransformColor_SSE2;
VP8LCollectColorBlueTransforms_SSE = CollectColorBlueTransforms_SSE2;
VP8LCollectColorRedTransforms_SSE = CollectColorRedTransforms_SSE2;
VP8LBundleColorMap_SSE = BundleColorMap_SSE2;
memcpy(VP8LPredictorsSub_SSE, VP8LPredictorsSub, sizeof(VP8LPredictorsSub));
}
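The "SSE exports" block above snapshots the narrower kernels so that wider kernels can hand their leftover pixels to them, as the `VP8LPredictorsSub_SSE[...]` calls in the AVX2 code do. The sketch below illustrates that pattern with stand-in tables and a toy operation; every name in it is hypothetical, and the inner loop only simulates an 8-lane vector step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*SubFunc)(const uint32_t* in, int num_pixels, uint32_t* out);

static SubFunc g_active[2];  // hypothetical table callers dispatch through
static SubFunc g_narrow[2];  // hypothetical snapshot kept for tail handling

// Narrow kernel: handles any number of pixels, one at a time.
static void SubOneNarrow(const uint32_t* in, int num_pixels, uint32_t* out) {
  int i;
  for (i = 0; i < num_pixels; ++i) out[i] = in[i] - 1;
}

// Wide kernel: handles blocks of 8 and delegates the remainder.
static void SubOneWide(const uint32_t* in, int num_pixels, uint32_t* out) {
  int i;
  for (i = 0; i + 8 <= num_pixels; i += 8) {
    int k;
    for (k = 0; k < 8; ++k) out[i + k] = in[i + k] - 1;  // one "vector" op
  }
  if (i != num_pixels) g_narrow[0](in + i, num_pixels - i, out + i);
}

// Installs the narrow kernels, then snapshots the table with memcpy.
static void InitNarrow(void) {
  g_active[0] = SubOneNarrow;
  g_active[1] = SubOneNarrow;
  memcpy(g_narrow, g_active, sizeof(g_active));
}

// Overrides only the entries that have a wider implementation.
static void InitWide(void) { g_active[0] = SubOneWide; }
```

The snapshot must be taken before the wider init runs; otherwise the wide kernel would delegate its tail to itself.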
#else // !WEBP_USE_SSE2

@ -14,9 +14,13 @@
#include "src/dsp/dsp.h"
#if defined(WEBP_USE_SSE41)
#include <assert.h>
#include <smmintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Cost operations.
@ -44,29 +48,6 @@ static uint32_t ExtraCost_SSE41(const uint32_t* const a, int length) {
return HorizontalSum_SSE41(cost);
}
static uint32_t ExtraCostCombined_SSE41(const uint32_t* WEBP_RESTRICT const a,
const uint32_t* WEBP_RESTRICT const b,
int length) {
int i;
__m128i cost = _mm_add_epi32(_mm_set_epi32(2 * a[7], 2 * a[6], a[5], a[4]),
_mm_set_epi32(2 * b[7], 2 * b[6], b[5], b[4]));
assert(length % 8 == 0);
for (i = 8; i + 8 <= length; i += 8) {
const int j = (i - 2) >> 1;
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
const __m128i w = _mm_set_epi32(j + 3, j + 2, j + 1, j);
const __m128i a2 = _mm_hadd_epi32(a0, a1);
const __m128i b2 = _mm_hadd_epi32(b0, b1);
const __m128i mul = _mm_mullo_epi32(_mm_add_epi32(a2, b2), w);
cost = _mm_add_epi32(mul, cost);
}
return HorizontalSum_SSE41(cost);
}
//------------------------------------------------------------------------------
// Subtract-Green Transform
@ -195,10 +176,14 @@ extern void VP8LEncDspInitSSE41(void);
WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE41(void) {
VP8LExtraCost = ExtraCost_SSE41;
VP8LExtraCostCombined = ExtraCostCombined_SSE41;
VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed_SSE41;
VP8LCollectColorBlueTransforms = CollectColorBlueTransforms_SSE41;
VP8LCollectColorRedTransforms = CollectColorRedTransforms_SSE41;
// SSE exports for AVX and above.
VP8LSubtractGreenFromBlueAndRed_SSE = SubtractGreenFromBlueAndRed_SSE41;
VP8LCollectColorBlueTransforms_SSE = CollectColorBlueTransforms_SSE41;
VP8LCollectColorRedTransforms_SSE = CollectColorRedTransforms_SSE41;
}
#else // !WEBP_USE_SSE41

@ -15,10 +15,14 @@
#if defined(WEBP_USE_SSE2)
#include <emmintrin.h>
#include <string.h>
#include "src/dsp/common_sse2.h"
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
#include <emmintrin.h>
#include "src/webp/types.h"
//------------------------------------------------------------------------------
// Predictor Transform
@ -707,6 +711,15 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LDspInitSSE2(void) {
VP8LConvertBGRAToRGBA4444 = ConvertBGRAToRGBA4444_SSE2;
VP8LConvertBGRAToRGB565 = ConvertBGRAToRGB565_SSE2;
VP8LConvertBGRAToBGR = ConvertBGRAToBGR_SSE2;
// SSE exports for AVX and above.
memcpy(VP8LPredictorsAdd_SSE, VP8LPredictorsAdd, sizeof(VP8LPredictorsAdd));
VP8LAddGreenToBlueAndRed_SSE = AddGreenToBlueAndRed_SSE2;
VP8LTransformColorInverse_SSE = TransformColorInverse_SSE2;
VP8LConvertBGRAToRGB_SSE = ConvertBGRAToRGB_SSE2;
VP8LConvertBGRAToRGBA_SSE = ConvertBGRAToRGBA_SSE2;
}
#else // !WEBP_USE_SSE2

@ -13,9 +13,10 @@
#if defined(WEBP_USE_SSE41)
#include "src/dsp/common_sse41.h"
#include <smmintrin.h>
#include "src/dsp/cpu.h"
#include "src/dsp/lossless.h"
#include "src/dsp/lossless_common.h"
//------------------------------------------------------------------------------
// Color-space conversion functions
@ -124,6 +125,10 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LDspInitSSE41(void) {
VP8LTransformColorInverse = TransformColorInverse_SSE41;
VP8LConvertBGRAToRGB = ConvertBGRAToRGB_SSE41;
VP8LConvertBGRAToBGR = ConvertBGRAToBGR_SSE41;
// SSE exports for AVX and above.
VP8LTransformColorInverse_SSE = TransformColorInverse_SSE41;
VP8LConvertBGRAToRGB_SSE = ConvertBGRAToRGB_SSE41;
}
#else // !WEBP_USE_SSE41

@ -401,10 +401,8 @@ WEBP_NODISCARD static int GetCombinedHistogramEntropy(
*cost = GetCombinedEntropy(a->literal_, b->literal_,
VP8LHistogramNumCodes(palette_code_bits),
a->is_used_[0], b->is_used_[0], 0);
*cost += (uint64_t)VP8LExtraCostCombined(a->literal_ + NUM_LITERAL_CODES,
b->literal_ + NUM_LITERAL_CODES,
NUM_LENGTH_CODES)
<< LOG_2_PRECISION_BITS;
// No need to add the extra cost as it is a constant that does not influence
// the histograms.
if (*cost >= cost_threshold) return 0;
if (a->trivial_symbol_ != VP8L_NON_TRIVIAL_SYM &&
@ -434,9 +432,8 @@ WEBP_NODISCARD static int GetCombinedHistogramEntropy(
*cost += GetCombinedEntropy(a->distance_, b->distance_, NUM_DISTANCE_CODES,
a->is_used_[4], b->is_used_[4], 0);
*cost += (uint64_t)VP8LExtraCostCombined(a->distance_, b->distance_,
NUM_DISTANCE_CODES)
<< LOG_2_PRECISION_BITS;
// No need to add the extra cost as it is a constant that does not influence
// the histograms.
if (*cost >= cost_threshold) return 0;
return 1;
@ -528,16 +525,13 @@ static void UpdateHistogramCost(VP8LHistogram* const h) {
uint32_t alpha_sym, red_sym, blue_sym;
const uint64_t alpha_cost =
PopulationCost(h->alpha_, NUM_LITERAL_CODES, &alpha_sym, &h->is_used_[3]);
// No need to add the extra cost as it is a constant that does not influence
// the histograms.
const uint64_t distance_cost =
PopulationCost(h->distance_, NUM_DISTANCE_CODES, NULL, &h->is_used_[4]) +
((uint64_t)VP8LExtraCost(h->distance_, NUM_DISTANCE_CODES)
<< LOG_2_PRECISION_BITS);
PopulationCost(h->distance_, NUM_DISTANCE_CODES, NULL, &h->is_used_[4]);
const int num_codes = VP8LHistogramNumCodes(h->palette_code_bits_);
h->literal_cost_ =
PopulationCost(h->literal_, num_codes, NULL, &h->is_used_[0]) +
((uint64_t)VP8LExtraCost(h->literal_ + NUM_LITERAL_CODES,
NUM_LENGTH_CODES)
<< LOG_2_PRECISION_BITS);
PopulationCost(h->literal_, num_codes, NULL, &h->is_used_[0]);
h->red_cost_ =
PopulationCost(h->red_, NUM_LITERAL_CODES, &red_sym, &h->is_used_[1]);
h->blue_cost_ =

@ -226,9 +226,7 @@ int WebPMemoryWrite(const uint8_t* data, size_t data_size,
void WebPMemoryWriterClear(WebPMemoryWriter* writer) {
if (writer != NULL) {
WebPSafeFree(writer->mem);
writer->mem = NULL;
writer->size = 0;
writer->max_size = 0;
WebPMemoryWriterInit(writer);
}
}

@ -15,6 +15,8 @@
#include "src/webp/config.h"
#endif
#include <stddef.h>
#include "src/dsp/cpu.h"
#include "src/utils/bit_reader_inl_utils.h"
#include "src/utils/utils.h"
@ -25,11 +27,12 @@
void VP8BitReaderSetBuffer(VP8BitReader* const br,
const uint8_t* const start,
size_t size) {
br->buf_ = start;
br->buf_end_ = start + size;
br->buf_max_ =
(size >= sizeof(lbit_t)) ? start + size - sizeof(lbit_t) + 1
: start;
if (start != NULL) {
br->buf_ = start;
br->buf_end_ = start + size;
br->buf_max_ =
(size >= sizeof(lbit_t)) ? start + size - sizeof(lbit_t) + 1 : start;
}
}
void VP8InitBitReader(VP8BitReader* const br,

@ -20,7 +20,7 @@
extern "C" {
#endif
#define WEBP_DECODER_ABI_VERSION 0x0209 // MAJOR(8b) + MINOR(8b)
#define WEBP_DECODER_ABI_VERSION 0x0210 // MAJOR(8b) + MINOR(8b)
// Note: forward declaring enumerations is not allowed in (strict) C and C++,
// the types are left here for reference.
@ -451,7 +451,9 @@ struct WebPDecoderOptions {
// Will be snapped to even values.
int crop_width, crop_height; // dimension of the cropping area
int use_scaling; // if true, scaling is applied _afterward_
int scaled_width, scaled_height; // final resolution
int scaled_width, scaled_height; // final resolution. if one is 0, it is
// guessed from the other one to keep the
// original ratio.
int use_threads; // if true, use multi-threaded decoding
int dithering_strength; // dithering strength (0=Off, 100=full)
int flip; // if true, flip output vertically
@ -479,6 +481,11 @@ WEBP_NODISCARD static WEBP_INLINE int WebPInitDecoderConfig(
return WebPInitDecoderConfigInternal(config, WEBP_DECODER_ABI_VERSION);
}
// Returns true if 'config' is non-NULL and all configuration parameters are
// within their valid ranges.
WEBP_NODISCARD WEBP_EXTERN int WebPValidateDecoderConfig(
const WebPDecoderConfig* config);
// Instantiate a new incremental decoder object with the requested
// configuration. The bitstream can be passed using 'data' and 'data_size'
// parameter, in which case the features will be parsed and stored into