Compare commits

...

11 Commits

Author SHA1 Message Date
fcff86c71b {gif,img}2webp: sync -m help w/cwebp
These were both missing the mention of the default value (4).

Bug: webm:381109771
Change-Id: Ibb0d822310af443c5ff5219fc0334008de0a0c60
2024-11-26 13:08:29 -08:00
b76c4a8416 man/img2webp.1: sync -m text w/cwebp.1 & gif2webp.1
Bug: webp:381109771
Change-Id: I3df400305255ba74a913601cf7aa04f392814370
2024-11-26 13:03:20 -08:00
306335198d muxread: fix reading of buffers > riff size
After:
  2c70ad76 muxread,CreateInternal: fix riff size checks (cl/200674839)

`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` (plus additional 1 byte
padding if needed). A later check included `CHUNK_HEADER_SIZE` before
capping the value of the size passed to `WebPMuxCreateInternal()`,
missing cases with a small amount of extra data after the RIFF chunk
(like a newline when the file is opened and saved in a text editor) and
setting size to an incorrect value, so larger sizes would also fail.

Another check of `riff_size < CHUNK_HEADER_SIZE` after the call to
`SizeWithPadding()` is removed because 1) it could not fail given
`SizeWithPadding()` adds `CHUNK_HEADER_SIZE` to the value; and 2) it is
redundant as `size < RIFF_HEADER_SIZE + CHUNK_HEADER_SIZE` is checked
earlier in the function.

Bug: webp:42340561
Change-Id: I58dc4f071b27c2841001b4012aabdb1869f64f97
2024-11-22 12:40:34 -08:00
4c85d860ea yuv.h: update RGB<->YUV coefficients in comment
The values for the R/G/B floating point formulas resembled
https://fourcc.org/fccyvrgb.php and Video Demystified, but the fixed
point values are more closely aligned to rounded values from
https://en.wikipedia.org/wiki/YCbCr and BT.601.

The R/G/B formulas with the values prior to this change are added to
sharpyuv_csp.c as they align with the fixed values. The origin of those
coefficients is unclear. For consistency between library versions we'll
leave them as is.

Bug: webp:375011696
Change-Id: Id3f2a57530eee700cc52a899b32b25b5c015e89b
2024-11-21 16:21:45 -08:00
0ab789e067 Merge changes I6dfedfd5,I2376e2dc into main
* changes:
  rework AddVectorEq_SSE2
  rework AddVector_SSE2
2024-11-15 02:58:10 +00:00
0323645066 {ios,xcframework}build.sh: fix compilation w/Xcode 16
Don't use `-fembed-bitcode`, fixes:
ld: warning: -bitcode_bundle is no longer supported and will be ignored
ld: -mllvm and -bitcode_bundle (Xcode setting ENABLE_BITCODE=YES) cannot
    be used together

Change-Id: I4ead0fc71da39bb5ec92c1f5ba467b95ad8b7461
2024-11-14 20:26:57 +00:00
61e2cfdadd rework AddVectorEq_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and
remove loop for the remainder. The loop was being auto-vectorized making
the code larger and slower than the vectorized C code.

For larger sizes the new code is ~3-4.5% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set
to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8
case by 2 in any case; they both profile similarly.

Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a
2024-11-14 12:21:39 -08:00
7bda3deb89 rework AddVector_SSE2
Take advantage of the known sizes used by VP8LHistogramAdd() and remove
loop for the remainder. The loop was being auto-vectorized making the
code larger and slower than the vectorized C code.

For larger sizes the new code is ~4-7% faster than the old code with
about the same improvement against the vectorized C code. For the
minimal size (40), the new code is ~30% faster than the C and old SSE2
code.

The LINE_SIZE==8 option is removed with this change. It had been set to
16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case
by 2 in any case; they both profile similarly.

Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e
2024-11-14 12:21:33 -08:00
2ddaaf0aa5 Fix variable names in SharpYuvComputeConversionMatrix
Change-Id: Ia07e71aae42396100a4f50dc104e828239522d77
2024-11-07 09:37:40 +01:00
a3ba6f19e9 Makefile.vc: fix gif2webp link error
Add missing dependency on libsharpyuv.

needed after:
f999d94f gif2webp: add -sharp_yuv/-near_lossless

Change-Id: I8bdd5c0fd4622f9c8ec6ffdf4ac11399f86350da
2024-11-06 10:14:05 -08:00
f999d94f4a gif2webp: add -sharp_yuv/-near_lossless
This change is the same as the one that introduced the options to
img2webp:
0825faa4 img2webp: add -sharp_yuv/-near_lossless

Change-Id: Id380d159299c38dd6440f833d487e00c0976afec
2024-11-04 12:29:24 -08:00
14 changed files with 126 additions and 46 deletions

View File

@ -567,7 +567,8 @@ if(WEBP_BUILD_GIF2WEBP)
add_executable(gif2webp ${GIF2WEBP_SRCS}) add_executable(gif2webp ${GIF2WEBP_SRCS})
target_link_libraries(gif2webp exampleutil imageioutil webp libwebpmux target_link_libraries(gif2webp exampleutil imageioutil webp libwebpmux
${WEBP_DEP_GIF_LIBRARIES}) ${WEBP_DEP_GIF_LIBRARIES})
target_include_directories(gif2webp PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src) target_include_directories(gif2webp PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src
${CMAKE_CURRENT_SOURCE_DIR})
install(TARGETS gif2webp RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}) install(TARGETS gif2webp RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
endif() endif()

View File

@ -393,7 +393,7 @@ $(DIRBIN)\dwebp.exe: $(IMAGEIO_UTIL_OBJS)
$(DIRBIN)\dwebp.exe: $(LIBWEBPDEMUX) $(DIRBIN)\dwebp.exe: $(LIBWEBPDEMUX)
$(DIRBIN)\gif2webp.exe: $(DIROBJ)\examples\gif2webp.obj $(EX_GIF_DEC_OBJS) $(DIRBIN)\gif2webp.exe: $(DIROBJ)\examples\gif2webp.obj $(EX_GIF_DEC_OBJS)
$(DIRBIN)\gif2webp.exe: $(EX_UTIL_OBJS) $(IMAGEIO_UTIL_OBJS) $(LIBWEBPMUX) $(DIRBIN)\gif2webp.exe: $(EX_UTIL_OBJS) $(IMAGEIO_UTIL_OBJS) $(LIBWEBPMUX)
$(DIRBIN)\gif2webp.exe: $(LIBWEBP) $(DIRBIN)\gif2webp.exe: $(LIBWEBP) $(LIBSHARPYUV)
$(DIRBIN)\vwebp.exe: $(DIROBJ)\examples\vwebp.obj $(EX_UTIL_OBJS) $(DIRBIN)\vwebp.exe: $(DIROBJ)\examples\vwebp.obj $(EX_UTIL_OBJS)
$(DIRBIN)\vwebp.exe: $(IMAGEIO_UTIL_OBJS) $(LIBWEBPDEMUX) $(LIBWEBP) $(DIRBIN)\vwebp.exe: $(IMAGEIO_UTIL_OBJS) $(LIBWEBPDEMUX) $(LIBWEBP)
$(DIRBIN)\vwebp_sdl.exe: $(DIROBJ)\extras\vwebp_sdl.obj $(DIRBIN)\vwebp_sdl.exe: $(DIROBJ)\extras\vwebp_sdl.obj

View File

@ -324,7 +324,7 @@ Per-frame options (only used for subsequent images input):
-lossless ............ use lossless mode (default) -lossless ............ use lossless mode (default)
-lossy ............... use lossy mode -lossy ............... use lossy mode
-q <float> ........... quality -q <float> ........... quality
-m <int> ............. method to use -m <int> ............. compression method (0=fast, 6=slowest), default=4
-exact, -noexact ..... preserve or alter RGB values in transparent area -exact, -noexact ..... preserve or alter RGB values in transparent area
(default: -noexact, may cause artifacts (default: -noexact, may cause artifacts
with lossy animations) with lossy animations)
@ -354,8 +354,12 @@ Options:
-lossy ................. encode image using lossy compression -lossy ................. encode image using lossy compression
-mixed ................. for each frame in the image, pick lossy -mixed ................. for each frame in the image, pick lossy
or lossless compression heuristically or lossless compression heuristically
-near_lossless <int> ... use near-lossless image preprocessing
(0..100=off), default=100
-sharp_yuv ............. use sharper (and slower) RGB->YUV conversion
(lossy only)
-q <float> ............. quality factor (0:small..100:big) -q <float> ............. quality factor (0:small..100:big)
-m <int> ............... compression method (0=fast, 6=slowest) -m <int> ............... compression method (0=fast, 6=slowest), default=4
-min_size .............. minimize output size (default:off) -min_size .............. minimize output size (default:off)
lossless compression by default; can be lossless compression by default; can be
combined with -q, -m, -lossy or -mixed combined with -q, -m, -lossy or -mixed

View File

@ -67,7 +67,7 @@ dwebp_LDADD += ../src/libwebp.la
dwebp_LDADD +=$(PNG_LIBS) $(JPEG_LIBS) dwebp_LDADD +=$(PNG_LIBS) $(JPEG_LIBS)
gif2webp_SOURCES = gif2webp.c gifdec.c gifdec.h gif2webp_SOURCES = gif2webp.c gifdec.c gifdec.h
gif2webp_CPPFLAGS = $(AM_CPPFLAGS) $(GIF_INCLUDES) gif2webp_CPPFLAGS = $(AM_CPPFLAGS) $(GIF_INCLUDES) -I$(top_srcdir)
gif2webp_LDADD = gif2webp_LDADD =
gif2webp_LDADD += libexample_util.la gif2webp_LDADD += libexample_util.la
gif2webp_LDADD += ../imageio/libimageio_util.la gif2webp_LDADD += ../imageio/libimageio_util.la

View File

@ -28,6 +28,7 @@
#endif #endif
#include <gif_lib.h> #include <gif_lib.h>
#include "sharpyuv/sharpyuv.h"
#include "webp/encode.h" #include "webp/encode.h"
#include "webp/mux.h" #include "webp/mux.h"
#include "../examples/example_util.h" #include "../examples/example_util.h"
@ -70,8 +71,14 @@ static void Help(void) {
printf(" -lossy ................. encode image using lossy compression\n"); printf(" -lossy ................. encode image using lossy compression\n");
printf(" -mixed ................. for each frame in the image, pick lossy\n" printf(" -mixed ................. for each frame in the image, pick lossy\n"
" or lossless compression heuristically\n"); " or lossless compression heuristically\n");
printf(" -near_lossless <int> ... use near-lossless image preprocessing\n"
" (0..100=off), default=100\n");
printf(" -sharp_yuv ............. use sharper (and slower) RGB->YUV "
"conversion\n"
" (lossy only)\n");
printf(" -q <float> ............. quality factor (0:small..100:big)\n"); printf(" -q <float> ............. quality factor (0:small..100:big)\n");
printf(" -m <int> ............... compression method (0=fast, 6=slowest)\n"); printf(" -m <int> ............... compression method (0=fast, 6=slowest), "
"default=4\n");
printf(" -min_size .............. minimize output size (default:off)\n" printf(" -min_size .............. minimize output size (default:off)\n"
" lossless compression by default; can be\n" " lossless compression by default; can be\n"
" combined with -q, -m, -lossy or -mixed\n" " combined with -q, -m, -lossy or -mixed\n"
@ -166,6 +173,10 @@ int main(int argc, const char* argv[]) {
} else if (!strcmp(argv[c], "-mixed")) { } else if (!strcmp(argv[c], "-mixed")) {
enc_options.allow_mixed = 1; enc_options.allow_mixed = 1;
config.lossless = 0; config.lossless = 0;
} else if (!strcmp(argv[c], "-near_lossless") && c < argc - 1) {
config.near_lossless = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-sharp_yuv")) {
config.use_sharp_yuv = 1;
} else if (!strcmp(argv[c], "-loop_compatibility")) { } else if (!strcmp(argv[c], "-loop_compatibility")) {
loop_compatibility = 1; loop_compatibility = 1;
} else if (!strcmp(argv[c], "-q") && c < argc - 1) { } else if (!strcmp(argv[c], "-q") && c < argc - 1) {
@ -226,10 +237,13 @@ int main(int argc, const char* argv[]) {
} else if (!strcmp(argv[c], "-version")) { } else if (!strcmp(argv[c], "-version")) {
const int enc_version = WebPGetEncoderVersion(); const int enc_version = WebPGetEncoderVersion();
const int mux_version = WebPGetMuxVersion(); const int mux_version = WebPGetMuxVersion();
const int sharpyuv_version = SharpYuvGetVersion();
printf("WebP Encoder version: %d.%d.%d\nWebP Mux version: %d.%d.%d\n", printf("WebP Encoder version: %d.%d.%d\nWebP Mux version: %d.%d.%d\n",
(enc_version >> 16) & 0xff, (enc_version >> 8) & 0xff, (enc_version >> 16) & 0xff, (enc_version >> 8) & 0xff,
enc_version & 0xff, (mux_version >> 16) & 0xff, enc_version & 0xff, (mux_version >> 16) & 0xff,
(mux_version >> 8) & 0xff, mux_version & 0xff); (mux_version >> 8) & 0xff, mux_version & 0xff);
printf("libsharpyuv: %d.%d.%d\n", (sharpyuv_version >> 24) & 0xff,
(sharpyuv_version >> 16) & 0xffff, sharpyuv_version & 0xff);
FREE_WARGV_AND_RETURN(EXIT_SUCCESS); FREE_WARGV_AND_RETURN(EXIT_SUCCESS);
} else if (!strcmp(argv[c], "-quiet")) { } else if (!strcmp(argv[c], "-quiet")) {
quiet = 1; quiet = 1;

View File

@ -62,7 +62,8 @@ static void Help(void) {
printf(" -lossless ............ use lossless mode (default)\n"); printf(" -lossless ............ use lossless mode (default)\n");
printf(" -lossy ............... use lossy mode\n"); printf(" -lossy ............... use lossy mode\n");
printf(" -q <float> ........... quality\n"); printf(" -q <float> ........... quality\n");
printf(" -m <int> ............. method to use\n"); printf(" -m <int> ............. compression method (0=fast, 6=slowest), "
"default=4\n");
printf(" -exact, -noexact ..... preserve or alter RGB values in transparent " printf(" -exact, -noexact ..... preserve or alter RGB values in transparent "
"area\n" "area\n"
" (default: -noexact, may cause artifacts\n" " (default: -noexact, may cause artifacts\n"

View File

@ -53,7 +53,7 @@ DEMUXLIBLIST=''
if [[ -z "${SDK}" ]]; then if [[ -z "${SDK}" ]]; then
echo "iOS SDK not available" echo "iOS SDK not available"
exit 1 exit 1
elif [[ ${SDK%%.*} -gt 8 ]]; then elif [[ ${SDK%%.*} -gt 8 && "${XCODE%%.*}" -lt 16 ]]; then
EXTRA_CFLAGS="-fembed-bitcode" EXTRA_CFLAGS="-fembed-bitcode"
elif [[ ${SDK%%.*} -le 6 ]]; then elif [[ ${SDK%%.*} -le 6 ]]; then
echo "You need iOS SDK version 6.0 or above" echo "You need iOS SDK version 6.0 or above"

View File

@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*- .\" Hey, EMACS: -*- nroff -*-
.TH GIF2WEBP 1 "July 18, 2024" .TH GIF2WEBP 1 "November 4, 2024"
.SH NAME .SH NAME
gif2webp \- Convert a GIF image to WebP gif2webp \- Convert a GIF image to WebP
.SH SYNOPSIS .SH SYNOPSIS
@ -39,6 +39,18 @@ Encode the image using lossy compression.
Mixed compression mode: optimize compression of the image by picking either Mixed compression mode: optimize compression of the image by picking either
lossy or lossless compression for each frame heuristically. lossy or lossless compression for each frame heuristically.
.TP .TP
.BI \-near_lossless " int
Specify the level of near\-lossless image preprocessing. This option adjusts
pixel values to help compressibility, but has minimal impact on the visual
quality. It triggers lossless compression mode automatically. The range is 0
(maximum preprocessing) to 100 (no preprocessing, the default). The typical
value is around 60. Note that lossy with \fB\-q 100\fP can at times yield
better results.
.TP
.B \-sharp_yuv
Use more accurate and sharper RGB->YUV conversion. Note that this process is
slower than the default 'fast' RGB->YUV conversion.
.TP
.BI \-q " float .BI \-q " float
Specify the compression factor for RGB channels between 0 and 100. The default Specify the compression factor for RGB channels between 0 and 100. The default
is 75. is 75.

View File

@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*- .\" Hey, EMACS: -*- nroff -*-
.TH IMG2WEBP 1 "September 17, 2024" .TH IMG2WEBP 1 "November 26, 2024"
.SH NAME .SH NAME
img2webp \- create animated WebP file from a sequence of input images. img2webp \- create animated WebP file from a sequence of input images.
.SH SYNOPSIS .SH SYNOPSIS
@ -88,6 +88,10 @@ Specify the compression factor between 0 and 100. The default is 75.
Specify the compression method to use. This parameter controls the Specify the compression method to use. This parameter controls the
trade off between encoding speed and the compressed file size and quality. trade off between encoding speed and the compressed file size and quality.
Possible values range from 0 to 6. Default value is 4. Possible values range from 0 to 6. Default value is 4.
When higher values are used, the encoder will spend more time inspecting
additional encoding possibilities and decide on the quality gain.
Lower value can result in faster processing time at the expense of
larger file size and lower compression quality.
.TP .TP
.B \-exact, \-noexact .B \-exact, \-noexact
Preserve or alter RGB values in transparent area. The default is Preserve or alter RGB values in transparent area. The default is

View File

@ -22,16 +22,16 @@ void SharpYuvComputeConversionMatrix(const SharpYuvColorSpace* yuv_color_space,
const float kr = yuv_color_space->kr; const float kr = yuv_color_space->kr;
const float kb = yuv_color_space->kb; const float kb = yuv_color_space->kb;
const float kg = 1.0f - kr - kb; const float kg = 1.0f - kr - kb;
const float cr = 0.5f / (1.0f - kb); const float cb = 0.5f / (1.0f - kb);
const float cb = 0.5f / (1.0f - kr); const float cr = 0.5f / (1.0f - kr);
const int shift = yuv_color_space->bit_depth - 8; const int shift = yuv_color_space->bit_depth - 8;
const float denom = (float)((1 << yuv_color_space->bit_depth) - 1); const float denom = (float)((1 << yuv_color_space->bit_depth) - 1);
float scale_y = 1.0f; float scale_y = 1.0f;
float add_y = 0.0f; float add_y = 0.0f;
float scale_u = cr; float scale_u = cb;
float scale_v = cb; float scale_v = cr;
float add_uv = (float)(128 << shift); float add_uv = (float)(128 << shift);
assert(yuv_color_space->bit_depth >= 8); assert(yuv_color_space->bit_depth >= 8);
@ -60,6 +60,10 @@ void SharpYuvComputeConversionMatrix(const SharpYuvColorSpace* yuv_color_space,
// Matrices are in YUV_FIX fixed point precision. // Matrices are in YUV_FIX fixed point precision.
// WebP's matrix, similar but not identical to kRec601LimitedMatrix // WebP's matrix, similar but not identical to kRec601LimitedMatrix
// Derived using the following formulas:
// Y = 0.2569 * R + 0.5044 * G + 0.0979 * B + 16
// U = -0.1483 * R - 0.2911 * G + 0.4394 * B + 128
// V = 0.4394 * R - 0.3679 * G - 0.0715 * B + 128
static const SharpYuvConversionMatrix kWebpMatrix = { static const SharpYuvConversionMatrix kWebpMatrix = {
{16839, 33059, 6420, 16 << 16}, {16839, 33059, 6420, 16 << 16},
{-9719, -19081, 28800, 128 << 16}, {-9719, -19081, 28800, 128 << 16},

View File

@ -175,64 +175,102 @@ static void CollectColorRedTransforms_SSE2(const uint32_t* WEBP_RESTRICT argb,
// Note we are adding uint32_t's as *signed* int32's (using _mm_add_epi32). But // Note we are adding uint32_t's as *signed* int32's (using _mm_add_epi32). But
// that's ok since the histogram values are less than 1<<28 (max picture size). // that's ok since the histogram values are less than 1<<28 (max picture size).
#define LINE_SIZE 16 // 8 or 16
static void AddVector_SSE2(const uint32_t* WEBP_RESTRICT a, static void AddVector_SSE2(const uint32_t* WEBP_RESTRICT a,
const uint32_t* WEBP_RESTRICT b, const uint32_t* WEBP_RESTRICT b,
uint32_t* WEBP_RESTRICT out, int size) { uint32_t* WEBP_RESTRICT out, int size) {
int i; int i = 0;
for (i = 0; i + LINE_SIZE <= size; i += LINE_SIZE) { int aligned_size = size & ~15;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 16);
assert(size % 2 == 0);
do {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]); const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]); const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
#if (LINE_SIZE == 16)
const __m128i a2 = _mm_loadu_si128((const __m128i*)&a[i + 8]); const __m128i a2 = _mm_loadu_si128((const __m128i*)&a[i + 8]);
const __m128i a3 = _mm_loadu_si128((const __m128i*)&a[i + 12]); const __m128i a3 = _mm_loadu_si128((const __m128i*)&a[i + 12]);
#endif
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i + 0]); const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]); const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
#if (LINE_SIZE == 16)
const __m128i b2 = _mm_loadu_si128((const __m128i*)&b[i + 8]); const __m128i b2 = _mm_loadu_si128((const __m128i*)&b[i + 8]);
const __m128i b3 = _mm_loadu_si128((const __m128i*)&b[i + 12]); const __m128i b3 = _mm_loadu_si128((const __m128i*)&b[i + 12]);
#endif
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0)); _mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1)); _mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
#if (LINE_SIZE == 16)
_mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2)); _mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2));
_mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3)); _mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3));
#endif i += 16;
} while (i != aligned_size);
if ((size & 8) != 0) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
i += 8;
} }
for (; i < size; ++i) {
out[i] = a[i] + b[i]; size &= 7;
if (size == 4) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i]);
_mm_storeu_si128((__m128i*)&out[i], _mm_add_epi32(a0, b0));
} else if (size == 2) {
const __m128i a0 = _mm_loadl_epi64((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadl_epi64((const __m128i*)&b[i]);
_mm_storel_epi64((__m128i*)&out[i], _mm_add_epi32(a0, b0));
} }
} }
static void AddVectorEq_SSE2(const uint32_t* WEBP_RESTRICT a, static void AddVectorEq_SSE2(const uint32_t* WEBP_RESTRICT a,
uint32_t* WEBP_RESTRICT out, int size) { uint32_t* WEBP_RESTRICT out, int size) {
int i; int i = 0;
for (i = 0; i + LINE_SIZE <= size; i += LINE_SIZE) { int aligned_size = size & ~15;
// Size is, at minimum, NUM_DISTANCE_CODES (40) and may be as large as
// NUM_LITERAL_CODES (256) + NUM_LENGTH_CODES (24) + (0 or a non-zero power of
// 2). See the usage in VP8LHistogramAdd().
assert(size >= 16);
assert(size % 2 == 0);
do {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]); const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]); const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
#if (LINE_SIZE == 16)
const __m128i a2 = _mm_loadu_si128((const __m128i*)&a[i + 8]); const __m128i a2 = _mm_loadu_si128((const __m128i*)&a[i + 8]);
const __m128i a3 = _mm_loadu_si128((const __m128i*)&a[i + 12]); const __m128i a3 = _mm_loadu_si128((const __m128i*)&a[i + 12]);
#endif
const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i + 0]); const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&out[i + 4]); const __m128i b1 = _mm_loadu_si128((const __m128i*)&out[i + 4]);
#if (LINE_SIZE == 16)
const __m128i b2 = _mm_loadu_si128((const __m128i*)&out[i + 8]); const __m128i b2 = _mm_loadu_si128((const __m128i*)&out[i + 8]);
const __m128i b3 = _mm_loadu_si128((const __m128i*)&out[i + 12]); const __m128i b3 = _mm_loadu_si128((const __m128i*)&out[i + 12]);
#endif
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0)); _mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1)); _mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
#if (LINE_SIZE == 16)
_mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2)); _mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2));
_mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3)); _mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3));
#endif i += 16;
} while (i != aligned_size);
if ((size & 8) != 0) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i + 0]);
const __m128i b1 = _mm_loadu_si128((const __m128i*)&out[i + 4]);
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
i += 8;
} }
for (; i < size; ++i) {
out[i] += a[i]; size &= 7;
if (size == 4) {
const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadu_si128((const __m128i*)&out[i]);
_mm_storeu_si128((__m128i*)&out[i], _mm_add_epi32(a0, b0));
} else if (size == 2) {
const __m128i a0 = _mm_loadl_epi64((const __m128i*)&a[i]);
const __m128i b0 = _mm_loadl_epi64((const __m128i*)&out[i]);
_mm_storel_epi64((__m128i*)&out[i], _mm_add_epi32(a0, b0));
} }
} }
#undef LINE_SIZE
//------------------------------------------------------------------------------ //------------------------------------------------------------------------------
// Entropy // Entropy

View File

@ -11,15 +11,15 @@
// //
// The exact naming is Y'CbCr, following the ITU-R BT.601 standard. // The exact naming is Y'CbCr, following the ITU-R BT.601 standard.
// More information at: https://en.wikipedia.org/wiki/YCbCr // More information at: https://en.wikipedia.org/wiki/YCbCr
// Y = 0.2569 * R + 0.5044 * G + 0.0979 * B + 16 // Y = 0.2568 * R + 0.5041 * G + 0.0979 * B + 16
// U = -0.1483 * R - 0.2911 * G + 0.4394 * B + 128 // U = -0.1482 * R - 0.2910 * G + 0.4392 * B + 128
// V = 0.4394 * R - 0.3679 * G - 0.0715 * B + 128 // V = 0.4392 * R - 0.3678 * G - 0.0714 * B + 128
// We use 16bit fixed point operations for RGB->YUV conversion (YUV_FIX). // We use 16bit fixed point operations for RGB->YUV conversion (YUV_FIX).
// //
// For the Y'CbCr to RGB conversion, the BT.601 specification reads: // For the Y'CbCr to RGB conversion, the BT.601 specification reads:
// R = 1.164 * (Y-16) + 1.596 * (V-128) // R = 1.164 * (Y-16) + 1.596 * (V-128)
// G = 1.164 * (Y-16) - 0.813 * (V-128) - 0.391 * (U-128) // G = 1.164 * (Y-16) - 0.813 * (V-128) - 0.392 * (U-128)
// B = 1.164 * (Y-16) + 2.018 * (U-128) // B = 1.164 * (Y-16) + 2.017 * (U-128)
// where Y is in the [16,235] range, and U/V in the [16,240] range. // where Y is in the [16,235] range, and U/V in the [16,240] range.
// //
// The fixed-point implementation used here is: // The fixed-point implementation used here is:

View File

@ -223,11 +223,11 @@ WebPMux* WebPMuxCreateInternal(const WebPData* bitstream, int copy_data,
// Note this padding is historical and differs from demux.c which does not // Note this padding is historical and differs from demux.c which does not
// pad the file size. // pad the file size.
riff_size = SizeWithPadding(riff_size); riff_size = SizeWithPadding(riff_size);
if (riff_size < CHUNK_HEADER_SIZE) goto Err;
if (riff_size > size) goto Err; if (riff_size > size) goto Err;
// There's no point in reading past the end of the RIFF chunk. // There's no point in reading past the end of the RIFF chunk. Note riff_size
if (size > riff_size + CHUNK_HEADER_SIZE) { // includes CHUNK_HEADER_SIZE after SizeWithPadding().
size = riff_size + CHUNK_HEADER_SIZE; if (size > riff_size) {
size = riff_size;
} }
end = data + size; end = data + size;

View File

@ -172,7 +172,9 @@ for (( i = 0; i < $NUM_PLATFORMS; ++i )); do
CFLAGS="-pipe -isysroot ${SDKROOT} -O3 -DNDEBUG" CFLAGS="-pipe -isysroot ${SDKROOT} -O3 -DNDEBUG"
case "${PLATFORM}" in case "${PLATFORM}" in
iPhone*) iPhone*)
CFLAGS+=" -fembed-bitcode" if [[ "${XCODE%%.*}" -lt 16 ]]; then
CFLAGS+=" -fembed-bitcode"
fi
CFLAGS+=" -target ${ARCH}-apple-ios${IOS_MIN_VERSION}" CFLAGS+=" -target ${ARCH}-apple-ios${IOS_MIN_VERSION}"
[[ "${PLATFORM}" == *Simulator* ]] && CFLAGS+="-simulator" [[ "${PLATFORM}" == *Simulator* ]] && CFLAGS+="-simulator"
;; ;;