Fix OOB write in BuildHuffmanTable.

First, BuildHuffmanTable is called to check if the data is valid.
If it is and the table is not big enough, more memory is allocated.

This will make sure that valid (but unoptimized because of unbalanced
codes) streams are still decodable.

Bug: chromium:1479274
Change-Id: I31c36dbf3aa78d35ecf38706b50464fd3d375741
(cherry picked from commit 902bc91903)
(cherry picked from commit 2af26267cd)
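
The validate-then-grow approach described in the commit message can be sketched roughly as below. This is only an illustration, not the libwebp code: `build_table`, `build_checked`, and the `Table` struct are hypothetical names, and the sketch uses a simplified single-level lookup table instead of libwebp's two-level layout. The idea it shows is that a dry run (a NULL table) validates the code lengths and reports the size needed, so the caller can enlarge the buffer before anything is written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CODE_LENGTH 15

/* Hypothetical growable table; not a libwebp type. */
typedef struct {
  uint16_t* data;
  size_t capacity; /* number of entries allocated in data */
} Table;

/* Simplified builder (assumption: single-level table indexed MSB-first).
 * With table == NULL nothing is written: the call only validates the code
 * lengths and returns the number of entries a real build needs.
 * Returns 0 if the code lengths do not describe a complete prefix code. */
static size_t build_table(uint16_t* table, const int* code_lengths, int n) {
  int max_len = 0;
  size_t used = 0, size, next = 0;
  int i, len;

  for (i = 0; i < n; ++i) {
    if (code_lengths[i] < 0 || code_lengths[i] > MAX_CODE_LENGTH) return 0;
    if (code_lengths[i] > max_len) max_len = code_lengths[i];
  }
  if (max_len == 0) return 0; /* no symbol is coded */
  size = (size_t)1 << max_len;

  /* Kraft check: the codes must fill the table exactly. */
  for (i = 0; i < n; ++i) {
    if (code_lengths[i] > 0) used += (size_t)1 << (max_len - code_lengths[i]);
  }
  if (used != size) return 0;
  if (table == NULL) return size; /* dry run: report the required size */

  /* Canonical assignment: shorter codes first, each filling a run of slots. */
  for (len = 1; len <= max_len; ++len) {
    for (i = 0; i < n; ++i) {
      size_t span, k;
      if (code_lengths[i] != len) continue;
      span = (size_t)1 << (max_len - len);
      for (k = 0; k < span; ++k) table[next++] = (uint16_t)i;
    }
  }
  assert(next == size);
  return size;
}

/* Validate first, grow the buffer if it is too small, then build.
 * Returns 1 on success, 0 on invalid input or allocation failure. */
static int build_checked(Table* t, const int* code_lengths, int n) {
  const size_t needed = build_table(NULL, code_lengths, n);
  if (needed == 0) return 0;
  if (needed > t->capacity) {
    uint16_t* const p = (uint16_t*)realloc(t->data, needed * sizeof(*p));
    if (p == NULL) return 0;
    t->data = p;
    t->capacity = needed;
  }
  return build_table(t->data, code_lengths, n) == needed;
}
```

In this toy model, a balanced code such as lengths {2, 2, 2, 2} needs 4 entries while the unbalanced but still valid {1, 2, 3, 3} needs 8, so a buffer sized for the balanced case is enlarged before the second build instead of being overrun.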
vp8l_enc,WriteImage: add missing error check
2026-04-11 07:10:01 +02:00 · 2023-09-07 18:12:56 -07:00 · 2023-06-17 04:49:53 +00:00 · 2023-03-01 01:02:38 +00:00 · 2023-02-28 00:25:46 +00:00 · 2023-02-28 00:24:49 +00:00
161 changed files with 2133 additions and 10439 deletions
--- a/.cmake-format.py
+++ b/.cmake-format.py
@@ -1,240 +0,0 @@
-# ----------------------------------
-# Options affecting listfile parsing
-# ----------------------------------
-with section("parse"):
-
-  # Specify structure for custom cmake functions
-  additional_commands = { 'foo': { 'flags': ['BAR', 'BAZ'],
-             'kwargs': {'DEPENDS': '*', 'HEADERS': '*', 'SOURCES': '*'}}}
-
-  # Override configurations per-command where available
-  override_spec = {}
-
-  # Specify variable tags.
-  vartags = []
-
-  # Specify property tags.
-  proptags = []
-
-# -----------------------------
-# Options affecting formatting.
-# -----------------------------
-with section("format"):
-
-  # Disable formatting entirely, making cmake-format a no-op
-  disable = False
-
-  # How wide to allow formatted cmake files
-  line_width = 80
-
-  # How many spaces to tab for indent
-  tab_size = 2
-
-  # If true, lines are indented using tab characters (utf-8 0x09) instead of
-  # <tab_size> space characters (utf-8 0x20). In cases where the layout would
-  # require a fractional tab character, the behavior of the  fractional
-  # indentation is governed by <fractional_tab_policy>
-  use_tabchars = False
-
-  # If <use_tabchars> is True, then the value of this variable indicates how
-  # fractional indentions are handled during whitespace replacement. If set to
-  # 'use-space', fractional indentation is left as spaces (utf-8 0x20). If set
-  # to `round-up` fractional indentation is replaced with a single tab character
-  # (utf-8 0x09) effectively shifting the column to the next tabstop
-  fractional_tab_policy = 'use-space'
-
-  # If an argument group contains more than this many sub-groups (parg or kwarg
-  # groups) then force it to a vertical layout.
-  max_subgroups_hwrap = 3
-
-  # If a positional argument group contains more than this many arguments, then
-  # force it to a vertical layout.
-  max_pargs_hwrap = 6
-
-  # If a cmdline positional group consumes more than this many lines without
-  # nesting, then invalidate the layout (and nest)
-  max_rows_cmdline = 2
-
-  # If true, separate flow control names from their parentheses with a space
-  separate_ctrl_name_with_space = False
-
-  # If true, separate function names from parentheses with a space
-  separate_fn_name_with_space = False
-
-  # If a statement is wrapped to more than one line, than dangle the closing
-  # parenthesis on its own line.
-  dangle_parens = False
-
-  # If the trailing parenthesis must be 'dangled' on its on line, then align it
-  # to this reference: `prefix`: the start of the statement,  `prefix-indent`:
-  # the start of the statement, plus one indentation  level, `child`: align to
-  # the column of the arguments
-  dangle_align = 'prefix'
-
-  # If the statement spelling length (including space and parenthesis) is
-  # smaller than this amount, then force reject nested layouts.
-  min_prefix_chars = 4
-
-  # If the statement spelling length (including space and parenthesis) is larger
-  # than the tab width by more than this amount, then force reject un-nested
-  # layouts.
-  max_prefix_chars = 10
-
-  # If a candidate layout is wrapped horizontally but it exceeds this many
-  # lines, then reject the layout.
-  max_lines_hwrap = 2
-
-  # What style line endings to use in the output.
-  line_ending = 'unix'
-
-  # Format command names consistently as 'lower' or 'upper' case
-  command_case = 'canonical'
-
-  # Format keywords consistently as 'lower' or 'upper' case
-  keyword_case = 'unchanged'
-
-  # A list of command names which should always be wrapped
-  always_wrap = []
-
-  # If true, the argument lists which are known to be sortable will be sorted
-  # lexicographicall
-  enable_sort = True
-
-  # If true, the parsers may infer whether or not an argument list is sortable
-  # (without annotation).
-  autosort = False
-
-  # By default, if cmake-format cannot successfully fit everything into the
-  # desired linewidth it will apply the last, most agressive attempt that it
-  # made. If this flag is True, however, cmake-format will print error, exit
-  # with non-zero status code, and write-out nothing
-  require_valid_layout = False
-
-  # A dictionary mapping layout nodes to a list of wrap decisions. See the
-  # documentation for more information.
-  layout_passes = {}
-
-# ------------------------------------------------
-# Options affecting comment reflow and formatting.
-# ------------------------------------------------
-with section("markup"):
-
-  # What character to use for bulleted lists
-  bullet_char = '*'
-
-  # What character to use as punctuation after numerals in an enumerated list
-  enum_char = '.'
-
-  # If comment markup is enabled, don't reflow the first comment block in each
-  # listfile. Use this to preserve formatting of your copyright/license
-  # statements.
-  first_comment_is_literal = True
-
-  # If comment markup is enabled, don't reflow any comment block which matches
-  # this (regex) pattern. Default is `None` (disabled).
-  literal_comment_pattern = None
-
-  # Regular expression to match preformat fences in comments default=
-  # ``r'^\s*([`~]{3}[`~]*)(.*)$'``
-  fence_pattern = '^\\s*([`~]{3}[`~]*)(.*)$'
-
-  # Regular expression to match rulers in comments default=
-  # ``r'^\s*[^\w\s]{3}.*[^\w\s]{3}$'``
-  ruler_pattern = '^\\s*[^\\w\\s]{3}.*[^\\w\\s]{3}$'
-
-  # If a comment line matches starts with this pattern then it is explicitly a
-  # trailing comment for the preceeding argument. Default is '#<'
-  explicit_trailing_pattern = '#<'
-
-  # If a comment line starts with at least this many consecutive hash
-  # characters, then don't lstrip() them off. This allows for lazy hash rulers
-  # where the first hash char is not separated by space
-  hashruler_min_length = 10
-
-  # If true, then insert a space between the first hash char and remaining hash
-  # chars in a hash ruler, and normalize its length to fill the column
-  canonicalize_hashrulers = True
-
-  # enable comment markup parsing and reflow
-  enable_markup = True
-
-# ----------------------------
-# Options affecting the linter
-# ----------------------------
-with section("lint"):
-
-  # a list of lint codes to disable
-  disabled_codes = []
-
-  # regular expression pattern describing valid function names
-  function_pattern = '[0-9a-z_]+'
-
-  # regular expression pattern describing valid macro names
-  macro_pattern = '[0-9A-Z_]+'
-
-  # regular expression pattern describing valid names for variables with global
-  # (cache) scope
-  global_var_pattern = '[A-Z][0-9A-Z_]+'
-
-  # regular expression pattern describing valid names for variables with global
-  # scope (but internal semantic)
-  internal_var_pattern = '_[A-Z][0-9A-Z_]+'
-
-  # regular expression pattern describing valid names for variables with local
-  # scope
-  local_var_pattern = '[a-z][a-z0-9_]+'
-
-  # regular expression pattern describing valid names for privatedirectory
-  # variables
-  private_var_pattern = '_[0-9a-z_]+'
-
-  # regular expression pattern describing valid names for public directory
-  # variables
-  public_var_pattern = '[A-Z][0-9A-Z_]+'
-
-  # regular expression pattern describing valid names for function/macro
-  # arguments and loop variables.
-  argument_var_pattern = '[a-z][a-z0-9_]+'
-
-  # regular expression pattern describing valid names for keywords used in
-  # functions or macros
-  keyword_pattern = '[A-Z][0-9A-Z_]+'
-
-  # In the heuristic for C0201, how many conditionals to match within a loop in
-  # before considering the loop a parser.
-  max_conditionals_custom_parser = 2
-
-  # Require at least this many newlines between statements
-  min_statement_spacing = 1
-
-  # Require no more than this many newlines between statements
-  max_statement_spacing = 2
-  max_returns = 6
-  max_branches = 12
-  max_arguments = 5
-  max_localvars = 15
-  max_statements = 50
-
-# -------------------------------
-# Options affecting file encoding
-# -------------------------------
-with section("encode"):
-
-  # If true, emit the unicode byte-order mark (BOM) at the start of the file
-  emit_byteorder_mark = False
-
-  # Specify the encoding of the input file. Defaults to utf-8
-  input_encoding = 'utf-8'
-
-  # Specify the encoding of the output file. Defaults to utf-8. Note that cmake
-  # only claims to support utf-8 so be careful when using anything else
-  output_encoding = 'utf-8'
-
-# -------------------------------------
-# Miscellaneous configurations options.
-# -------------------------------------
-with section("misc"):
-
-  # A dictionary containing any per-command configuration overrides. Currently
-  # only `command_case` is supported.
-  per_command = {}
--- a/.mailmap
+++ b/.mailmap
@@ -16,4 +16,3 @@ James Zern <jzern@google.com>
 Roberto Alanis <alanisbaez@google.com>
 Brian Ledger <brianpl@google.com>
 Maryla Ustarroz-Calonge <maryla@google.com>
-Yannis Guyon <yguyon@google.com>
--- a/.style.yapf
+++ b/.style.yapf
@@ -1,2 +1,2 @@
 [style]
-based_on_style = yapf
+based_on_style = chromium
--- a/9
+++ b/9
@@ -2,8 +2,6 @@ Contributors:
 - Aidan O'Loan (aidanol at gmail dot com)
 - Alan Browning (browning at google dot com)
 - Alexandru Ardelean (ardeleanalex at gmail dot com)
- Anuraag Agrawal (anuraaga at gmail dot com)
- Arthur Eubanks (aeubanks at google dot com)
 - Brian Ledger (brianpl at google dot com)
 - Charles Munger (clm at google dot com)
 - Cheng Yi (cyi at google dot com)
@@ -21,8 +19,6 @@ Contributors:
 - Jehan (jehan at girinstud dot io)
 - Jeremy Maitin-Shepard (jbms at google dot com)
 - Johann Koenig (johann dot koenig at duck dot com)
- Jonathan Grant (jgrantinfotech at gmail dot com)
- Jonliu1993 (13720414433 at 163 dot com)
 - Jovan Zelincevic (jovan dot zelincevic at imgtec dot com)
 - Jyrki Alakuijala (jyrki at google dot com)
 - Konstantin Ivlev (tomskside at gmail dot com)
@@ -32,16 +28,12 @@ Contributors:
 - Marcin Kowalczyk (qrczak at google dot com)
 - Martin Olsson (mnemo at minimum dot se)
 - Maryla Ustarroz-Calonge (maryla at google dot com)
- Masahiro Hanada (hanada at atmark-techno dot com)
 - Mikołaj Zalewski (mikolajz at google dot com)
 - Mislav Bradac (mislavm at google dot com)
- natewood (natewood at fb dot com)
 - Nico Weber (thakis at chromium dot org)
 - Noel Chromium (noel at chromium dot org)
- Nozomi Isozaki (nontan at pixiv dot co dot jp)
 - Oliver Wolff (oliver dot wolff at qt dot io)
 - Owen Rodley (orodley at google dot com)
- Ozkan Sezer (sezeroz at gmail dot com)
 - Parag Salasakar (img dot mips1 at gmail dot com)
 - Pascal Massimino (pascal dot massimino at gmail dot com)
 - Paweł Hajdan, Jr (phajdan dot jr at chromium dot org)
@@ -55,7 +47,6 @@ Contributors:
 - Somnath Banerjee (somnath dot banerjee at gmail dot com)
 - Sriraman Tallam (tmsriram at google dot com)
 - Tamar Levy (tamar dot levy at intel dot com)
- Thiago Perrotta (tperrotta at google dot com)
 - Timothy Gu (timothygu99 at gmail dot com)
 - Urvang Joshi (urvang at google dot com)
 - Vikas Arora (vikasa at google dot com)
--- a/Android.mk
+++ b/Android.mk
@@ -164,7 +164,6 @@ utils_dec_srcs := \
    src/utils/color_cache_utils.c \
    src/utils/filters_utils.c \
    src/utils/huffman_utils.c \
-    src/utils/palette.c \
    src/utils/quant_levels_dec_utils.c \
    src/utils/random_utils.c \
    src/utils/rescaler_utils.c \
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -51,8 +51,6 @@ option(WEBP_ENABLE_SWAP_16BIT_CSP "Enable byte swap for 16 bit colorspaces."
       OFF)
 set(WEBP_BITTRACE "0" CACHE STRING "Bit trace mode (0=none, 1=bit, 2=bytes)")
 set_property(CACHE WEBP_BITTRACE PROPERTY STRINGS 0 1 2)
-option(WEBP_ENABLE_WUNUSED_RESULT "Add [[nodiscard]] to some functions. \
-       CMake must be at least 3.21 to force C23" OFF)

 if(WEBP_LINK_STATIC)
  if(WIN32)
@@ -84,8 +82,7 @@ if(WEBP_BUILD_WEBP_JS)
  set(WEBP_USE_THREAD OFF)

  if(WEBP_ENABLE_SIMD)
-    message(NOTICE
-            "wasm2js does not support SIMD, disabling webp.js generation.")
+    message("wasm2js does not support SIMD, disabling webp.js generation.")
  endif()
 endif()

@@ -101,20 +98,10 @@ if(NOT CMAKE_BUILD_TYPE)
 endif()

 # Include dependencies.
-if(WEBP_BUILD_ANIM_UTILS
-   OR WEBP_BUILD_CWEBP
-   OR WEBP_BUILD_DWEBP
-   OR WEBP_BUILD_EXTRAS
-   OR WEBP_BUILD_GIF2WEBP
-   OR WEBP_BUILD_IMG2WEBP)
-  set(WEBP_FIND_IMG_LIBS TRUE)
-else()
-  set(WEBP_FIND_IMG_LIBS FALSE)
-endif()
 include(cmake/deps.cmake)
 include(GNUInstallDirs)

-if(BUILD_SHARED_LIBS AND NOT DEFINED CMAKE_INSTALL_RPATH)
+if(BUILD_SHARED_LIBS AND NOT CMAKE_INSTALL_RPATH)
  # Set the rpath to match autoconf/libtool behavior. Note this must be set
  # before target creation.
  set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}")
@@ -135,7 +122,7 @@ if(WEBP_UNICODE)
  add_definitions(-DUNICODE -D_UNICODE)
 endif()

-if(WIN32 AND BUILD_SHARED_LIBS)
+if(MSVC AND BUILD_SHARED_LIBS)
  add_definitions(-DWEBP_DLL)
 endif()

@@ -163,20 +150,7 @@ if(MSVC)
  set(CMAKE_STATIC_LIBRARY_PREFIX "${webp_libname_prefix}")
 endif()

-if(NOT WIN32)
-  set(CMAKE_C_VISIBILITY_PRESET hidden)
-endif()
-
-if(WEBP_ENABLE_WUNUSED_RESULT)
-  if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.21.0)
-    set(CMAKE_C_STANDARD 23)
-  else()
-    unset(CMAKE_C_STANDARD)
-    add_compile_options($<$<COMPILE_LANGUAGE:C>:-std=gnu2x>)
-  endif()
-  add_compile_options(-Wunused-result)
-  add_definitions(-DWEBP_ENABLE_NODISCARD=1)
-endif()
+set(CMAKE_C_VISIBILITY_PRESET hidden)

 # ##############################################################################
 # Android only.
@@ -478,7 +452,6 @@ endif()
 if(WEBP_BUILD_ANIM_UTILS
   OR WEBP_BUILD_CWEBP
   OR WEBP_BUILD_DWEBP
-   OR WEBP_BUILD_EXTRAS
   OR WEBP_BUILD_GIF2WEBP
   OR WEBP_BUILD_IMG2WEBP
   OR WEBP_BUILD_VWEBP
@@ -515,8 +488,6 @@ if(WEBP_BUILD_ANIM_UTILS
    TARGET exampleutil imageioutil imagedec imageenc
    PROPERTY INCLUDE_DIRECTORIES ${CMAKE_CURRENT_SOURCE_DIR}/src
             ${CMAKE_CURRENT_BINARY_DIR}/src)
-  target_include_directories(imagedec PRIVATE ${WEBP_DEP_IMG_INCLUDE_DIRS})
-  target_include_directories(imageenc PRIVATE ${WEBP_DEP_IMG_INCLUDE_DIRS})
 endif()

 if(WEBP_BUILD_DWEBP)
@@ -575,8 +546,7 @@ if(WEBP_BUILD_IMG2WEBP)
  add_executable(img2webp ${IMG2WEBP_SRCS})
  target_link_libraries(img2webp exampleutil imagedec imageioutil webp
                        libwebpmux)
-  target_include_directories(img2webp PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src
-                                              ${CMAKE_CURRENT_SOURCE_DIR})
+  target_include_directories(img2webp PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src)
  install(TARGETS img2webp RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
 endif()

@@ -656,82 +626,44 @@ if(WEBP_BUILD_EXTRAS)
                                                  ${CMAKE_CURRENT_BINARY_DIR})

  # vwebp_sdl
-  find_package(SDL2 QUIET)
-  if(WEBP_BUILD_VWEBP AND SDL2_FOUND)
+  find_package(SDL)
+  if(WEBP_BUILD_VWEBP AND SDL_FOUND)
    add_executable(vwebp_sdl ${VWEBP_SDL_SRCS})
-    target_link_libraries(vwebp_sdl ${SDL2_LIBRARIES} imageioutil webp)
+    target_link_libraries(vwebp_sdl ${SDL_LIBRARY} imageioutil webp)
    target_include_directories(
      vwebp_sdl PRIVATE ${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR}
-                        ${CMAKE_CURRENT_BINARY_DIR}/src ${SDL2_INCLUDE_DIRS})
+                        ${CMAKE_CURRENT_BINARY_DIR}/src ${SDL_INCLUDE_DIR})
    set(WEBP_HAVE_SDL 1)
    target_compile_definitions(vwebp_sdl PUBLIC WEBP_HAVE_SDL)
-
-    set(CMAKE_REQUIRED_INCLUDES "${SDL2_INCLUDE_DIRS}")
-    check_c_source_compiles(
-      "
-        #define SDL_MAIN_HANDLED
-        #include \"SDL.h\"
-        int main(void) {
-          return 0;
-        }
-      "
-      HAVE_JUST_SDL_H)
-    set(CMAKE_REQUIRED_INCLUDES)
-    if(HAVE_JUST_SDL_H)
-      target_compile_definitions(vwebp_sdl PRIVATE WEBP_HAVE_JUST_SDL_H)
-    endif()
  endif()
 endif()

 if(WEBP_BUILD_WEBP_JS)
-  # The default stack size changed from 5MB to 64KB in 3.1.27. See
-  # https://crbug.com/webp/614.
-  if(EMSCRIPTEN_VERSION VERSION_GREATER_EQUAL "3.1.27")
-    # TOTAL_STACK size was renamed to STACK_SIZE in 3.1.27. The old name was
-    # kept for compatibility, but prefer the new one in case it is removed in
-    # the future.
-    set(emscripten_stack_size "-sSTACK_SIZE=5MB")
-  else()
-    set(emscripten_stack_size "-sTOTAL_STACK=5MB")
-  endif()
-  find_package(SDL2 REQUIRED)
  # wasm2js does not support SIMD.
  if(NOT WEBP_ENABLE_SIMD)
    # JavaScript version
    add_executable(webp_js ${CMAKE_CURRENT_SOURCE_DIR}/extras/webp_to_sdl.c)
-    target_link_libraries(webp_js webpdecoder SDL2)
+    target_link_libraries(webp_js webpdecoder SDL)
    target_include_directories(webp_js PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
    set(WEBP_HAVE_SDL 1)
    set_target_properties(
      webp_js
-      PROPERTIES
-        # Emscripten puts -sUSE_SDL2=1 in this variable, though it's needed at
-        # compile time to ensure the headers are downloaded.
-        COMPILE_OPTIONS "${SDL2_LIBRARIES}"
-        LINK_FLAGS
-        "-sWASM=0 ${emscripten_stack_size} \
+      PROPERTIES LINK_FLAGS "-sWASM=0 \
         -sEXPORTED_FUNCTIONS=_WebPToSDL -sINVOKE_RUN=0 \
-         -sEXPORTED_RUNTIME_METHODS=cwrap ${SDL2_LIBRARIES} \
-         -sALLOW_MEMORY_GROWTH")
+         -sEXPORTED_RUNTIME_METHODS=cwrap")
    set_target_properties(webp_js PROPERTIES OUTPUT_NAME webp)
    target_compile_definitions(webp_js PUBLIC EMSCRIPTEN WEBP_HAVE_SDL)
  endif()

  # WASM version
  add_executable(webp_wasm ${CMAKE_CURRENT_SOURCE_DIR}/extras/webp_to_sdl.c)
-  target_link_libraries(webp_wasm webpdecoder SDL2)
+  target_link_libraries(webp_wasm webpdecoder SDL)
  target_include_directories(webp_wasm PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
  set_target_properties(
    webp_wasm
-    PROPERTIES
-      # Emscripten puts -sUSE_SDL2=1 in this variable, though it's needed at
-      # compile time to ensure the headers are downloaded.
-      COMPILE_OPTIONS "${SDL2_LIBRARIES}"
-      LINK_FLAGS
-      "-sWASM=1 ${emscripten_stack_size} \
+    PROPERTIES LINK_FLAGS "-sWASM=1 \
       -sEXPORTED_FUNCTIONS=_WebPToSDL -sINVOKE_RUN=0 \
-       -sEXPORTED_RUNTIME_METHODS=cwrap ${SDL2_LIBRARIES} \
-       -sALLOW_MEMORY_GROWTH")
+       -sEXPORTED_RUNTIME_METHODS=cwrap")
  target_compile_definitions(webp_wasm PUBLIC EMSCRIPTEN WEBP_HAVE_SDL)

  target_compile_definitions(webpdspdecode PUBLIC EMSCRIPTEN)
@@ -810,8 +742,7 @@ endif()
 configure_package_config_file(
  ${CMAKE_CURRENT_SOURCE_DIR}/cmake/WebPConfig.cmake.in
  ${CMAKE_CURRENT_BINARY_DIR}/WebPConfig.cmake
-  INSTALL_DESTINATION ${ConfigPackageLocation}
-  PATH_VARS CMAKE_INSTALL_INCLUDEDIR)
+  INSTALL_DESTINATION ${ConfigPackageLocation})

 # Install the generated CMake files.
 install(FILES "${CMAKE_CURRENT_BINARY_DIR}/WebPConfigVersion.cmake"
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -19,59 +19,10 @@ again.

 All submissions, including submissions by project members, require review. We
 use a [Gerrit](https://www.gerritcodereview.com) instance hosted at
-https://chromium-review.googlesource.com for this purpose.
-
-## Sending patches
-
-The basic git workflow for modifying libwebp code and sending for review is:
-
-1.  Get the latest version of the repository locally:
-
-    ```sh
-    git clone https://chromium.googlesource.com/webm/libwebp && cd libwebp
-    ```
-
-2.  Copy the commit-msg script into ./git/hooks (this will add an ID to all of
-    your commits):
-
-    ```sh
-    curl -Lo .git/hooks/commit-msg https://chromium-review.googlesource.com/tools/hooks/commit-msg && chmod u+x .git/hooks/commit-msg
-    ```
-
-3.  Modify the local copy of libwebp. Make sure the code
-    [builds successfully](https://chromium.googlesource.com/webm/libwebp/+/HEAD/doc/building.md#cmake).
-
-4.  Choose a short and representative commit message:
-
-    ```sh
-    git commit -a -m "Set commit message here"
-    ```
-
-5.  Send the patch for review:
-
-    ```sh
-    git push https://chromium-review.googlesource.com/webm/libwebp HEAD:refs/for/main
-    ```
-
-    Go to https://chromium-review.googlesource.com to view your patch and
-    request a review from the maintainers.
-
-See the
+https://chromium-review.googlesource.com for this purpose. See the
 [WebM Project page](https://www.webmproject.org/code/contribute/submitting-patches/)
 for additional details.

-## Code Style
-
-The C code style is based on the
-[Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) and
-`clang-format --style=Google`, though this project doesn't use the tool to
-enforce the formatting.
-
-CMake files are formatted with
-[cmake-format](https://cmake-format.readthedocs.io/en/latest/). `cmake-format
-i` can be used to format individual files, it will use the settings from
-`.cmake-format.py`.
-
 ## Community Guidelines

 This project follows
--- a/232
+++ b/232
@@ -1,229 +1,3 @@
-f13c0886 NEWS: fix date
-74555950 Merge "vwebp: fix window title when options are given" into 1.4.0
-d781646c vwebp: fix window title when options are given
-c2e394de update NEWS
-f6d15cb7 bump version to 1.4.0
-57c388b8 update AUTHORS
-b3d1b2cb Merge changes I26f4aa22,I83386b6c,I320ed1a2 into main
-07216886 webp-container-spec: fix VP8 chunk ref ('VP8'->'VP8 ')
-f88666eb webp_js/*.html: fix canvas mapping
-e2c8f233 cmake,wasm: simplify SDL2 related flags
-d537cd37 cmake: fix vwebp_sdl compile w/libsdl-org release
-6c484cbf CMakeLists.txt: add missing WEBP_BUILD_EXTRAS check
-7b0bc235 man/cwebp.1: add more detail to -partition_limit
-3c0011bb WebPMuxGetChunk: add an assert
-955a3d14 Merge "muxread,MuxGet: add an assert" into main
-00abc000 muxread,MuxGet: add an assert
-40e85a0b Have the window title reflect the filename.
-1bf46358 man/cwebp.1: clarify -pass > 1 behavior w/o -size/-psnr
-eba03acb webp-container-spec: replace 'above' with 'earlier'
-a16d30cb webp-container-spec: clarify chunk order requirements
-8a7e9112 Merge "CMakeLists.txt: apply cmake-format" into main
-7fac6c1b Merge "Copy C code to not have multiplication overflow" into main
-e2922e43 Merge "Check for the presence of the ANDROID_ABI variable" into main
-501d9274 Copy C code to not have multiplication overflow
-fba7d62e CMakeLists.txt: apply cmake-format
-661c1b66 Merge "windows exports: use dllexport attribute, instead of visibility." into main
-8487860a windows exports: use dllexport attribute, instead of visibility.
-8ea678b9 webp/mux.h: data lifetime note w/copy_data=0
-79e05c7f Check for the presence of the ANDROID_ABI variable
-45f995a3 Expose functions for managing non-image chunks on WebPAnimEncoder
-1fb9f3dc gifdec: fix ErrorGIFNotAvailable() declaration
-4723db65 cosmetics: s/SANITY_CHECK/DCHECK/
-f4b9bc9e clear -Wextra-semi-stmt warnings
-713982b8 Limit animdecoder_fuzzer to 320MB
-cbe825e4 cmake: fix sharpyuv simd files' build
-f99305e9 Makefile.vc: add ARM64 support
-5efd6300 mv SharpYuvEstimate420Risk to extras/
-e78e924f Makefile.vc: add sharpyuv_risk_table.obj
-d7a0506d Add YUV420 riskiness metric.
-89c5b917 Merge "BuildHuffmanTable check sorted[] array bounds before writing" into main
-34c80749 Remove alpha encoding pessimization.
-13d9c30b Add a WEBP_NODISCARD
-24d7f9cb Switch code to SDL2.
-0b56dedc BuildHuffmanTable check sorted[] array bounds before writing
-a429c0de sharpyuv: convert some for() to do/while
-f0cd7861 DoSharpArgbToYuv: remove constant from loop
-339231cc SharpYuvConvertWithOptions,cosmetics: fix formatting
-307071f1 Remove medium/large code model-specific inline asm
-deadc339 Fix transfer functions where toGamma and toLinear are swapped.
-e7b78d43 Merge "Fix bug in FromLinearLog100." into main
-15a1309e Merge "webp-lossless-bitstream-spec: delete extra blank line" into main
-54ca9752 Fix bug in FromLinearLog100.
-d2cb2d8c Dereference after NULL check.
-e9d50107 webp-lossless-bitstream-spec: delete extra blank line
-78657971 Merge changes Ief442c90,Ie6e9c9a5 into main
-e30a5884 webp-lossless-bitstream-spec: update variable names
-09ca1368 Merge "webp-container-spec: change assert to MUST be TRUE" into main
-38cb4fc0 iosbuild,xcframeworkbuild: add SharpYuv framework
-40afa926 webp-lossless-bitstream-spec: simplify abstract
-9db21143 webp-container-spec: change assert to MUST be TRUE
-cdbf88ae Fix typo in API docs for incremental decoding
-05c46984 Reformat vcpkg build instructions.
-8534f539 Merge "Never send VP8_STATUS_SUSPENDED back in non-incremental." into main
-35e197bd Never send VP8_STATUS_SUSPENDED back in non-incremental.
-61441425 Add vcpkg installation instructions
-dce8397f Fix next is invalid pointer when WebPSafeMalloc fails
-57c58105 Cmake: wrong public macro WEBP_INCLUDE_DIRS
-c1ffd9ac Merge "vp8l_enc: fix non-C90 code" into main
-a3965948 Merge changes If628bb93,Ic79f6309,I45f0db23 into main
-f80e9b7e vp8l_enc: fix non-C90 code
-accd141d Update lossless spec for two simple codes.
-ac17ffff Fix non-C90 code.
-433c7dca Fix static analyzer warnings.
-5fac76cf Merge tag 'v1.3.2'
-ca332209 update ChangeLog (tag: v1.3.2)
-1ace578c update NEWS
-63234c42 bump version to 1.3.2
-a35ea50d Add a fuzzer for ReadHuffmanCodes
-95ea5226 Fix invalid incremental decoding check.
-2af26267 Fix OOB write in BuildHuffmanTable.
-902bc919 Fix OOB write in BuildHuffmanTable.
-7ba44f80 Homogenize "__asm__ volatile" vs "asm volatile"
-68e27135 webp-container-spec: reorder example chunk layout
-943b932a Merge changes I6a4d0a04,Ibc37b91e into main
-1cc94f95 decode.h: wrap idec example in /* */
-63acdd1e decode.h: fix decode example
-aac5c5d0 ReadHuffmanCode: rm redundant num code lengths check
-a2de25f6 webp-lossless-bitstream-spec: normalize list item case
-68820f0e webp-lossless-bitstream-spec: normalize pixel ref
-cdb31aa8 webp-lossless-bitstream-spec: add missing periods
-0535a8cf webp-lossless-bitstream-spec: fix grammar
-b6c4ce26 normalize numbered list item format
-dd7364c3 Merge "palette.c: fix msvc warnings" into main
-c63c5df6 palette.c: fix msvc warnings
-0a2cad51 webp-container-spec: move terms from intro section
-dd88d2ff webp-lossless-bitstream-spec: color_cache -> color cache
-6e750547 Merge changes I644d7d39,Icf05491e,Ic02e6652,I63b11258 into main
-67a7cc2b webp-lossless-bitstream-spec: fix code blocks
-1432ebba Refactor palette sorting computation.
-cd436142 webp-lossless-bitstream-spec: block -> chunk
-3cb66f64 webp-lossless-bitstream-spec: add some missing commas
-56471a53 webp-lossless-bitstream-spec: normalize item text in 5.1
-af7fbfd2 vp8l_dec,ReadTransform: improve error status reporting
-7d8e0896 vp8l_dec: add VP8LSetError()
-a71ce1cf animencoder_fuzzer: fix error check w/Nallocfuzz
-e94b36d6 webp-lossless-bitstream-spec: relocate details from 5.1
-84628e56 webp-lossless-bitstream-spec: clarify image width changes
-ee722997 alpha_dec: add missing VP8SetError()
-0081693d enc_dec_fuzzer: use WebPDecode()
-0fcb311c enc_dec_fuzzer: fix WebPEncode/pic.error_code check
-982c177c webp-lossless-bitstream-spec: fix struct member refs
-56cf5625 webp-lossless-bitstream-spec: use RFC 7405 for ABNF
-6c6b3fd3 webp-lossless-bitstream-spec,cosmetics: delete blank lines
-29b9eb15 Merge changes Id56ca4fd,I662bd1d7 into main
-47c0af8d ReadHuffmanCodes: rm max_alphabet_size calc
-b92deba3 animencoder_fuzzer: no WebPAnimEncoderAssemble check w/nallocfuzz
-6be9bf8b animencoder_fuzzer: fix leak on alloc failure
-5c965e55 vp8l_dec,cosmetics: add some /*param=*/ comments
-e4fc2f78 webp-lossless-bitstream-spec: add validity note for max_symbol
-71916726 webp-lossless-bitstream-spec: fix max_symbol definition
-eac3bd5c Have the palette code be in its own file.
-e2c85878 Add an initializer for the SharpYuvOptions struct.
-4222b006 Merge tag 'v1.3.1'
-25d94f47 Implement more transfer functions in libsharpyuv
-2153a679 Merge changes Id0300937,I5dba5ccf,I57bb68e0,I2dba7b4e,I172aca36, ... into main
-4298e976 webp-lossless-bitstream-spec: add PredictorTransformOutput
-cd7e02be webp-lossless-bitstream-spec: fix RIFF-header ABNF
-6c3845f9 webp-lossless-bitstream-spec: split LZ77 Backward Ref section
-7f1b6799 webp-lossless-bitstream-spec: split Meta Prefix Codes section
-7b634d8f webp-lossless-bitstream-spec: note transform order
-6d6d4915 webp-lossless-bitstream-spec: update transformations text
-fd7bb21c update ChangeLog (tag: v1.3.1-rc2, tag: v1.3.1)
-e1adea50 update NEWS
-6b1c722a lossless_common.h,cosmetics: fix a typo
-08d60d60 webp-lossless-bitstream-spec: split code length section
-7a12afcc webp-lossless-bitstream-spec: rm unused anchor
-43393320 enc/*: normalize WebPEncodingSetError() calls
-287fdefe enc/*: add missing WebPEncodingSetError() calls
-c3bd7cff EncodeAlphaInternal: add missing error check
-14a9dbfb webp-lossless-bitstream-spec: refine single node text
-64819c7c Implement ExtractGreen_SSE2
-d49cfbb3 vp8l_enc,WriteImage: add missing error check
-2e5a9ec3 muxread,MuxImageParse: add missing error checks
-ebb6f949 cmake,emscripten: explicitly set stack size
-59a2b1f9 WebPDecodeYUV: check u/v/stride/uv_stride ptrs
-8e965ccb Call png_get_channels() to see if image has alpha
-fe80fbbd webp-container-spec: add some missing commas
-e8ed3176 Merge "treat FILTER_NONE as a regular Unfilter[] call" into main
-03a7a048 webp-lossless-bitstream-spec: rm redundant statement
-c437c7aa webp-lossless-bitstream-spec: mv up prefix code group def
-e4f17a31 webp-lossless-bitstream-spec: fix section reference
-e2ecd5e9 webp-lossless-bitstream-spec: clarify ABNF syntax
-8b55425a webp-lossless-bitstream-spec: refine pixel copy text
-29c9f2d4 webp-lossless-bitstream-spec: minor wording updates
-6b02f660 treat FILTER_NONE as a regular Unfilter[] call
-7f75c91c webp-container-spec: fix location of informative msg
-f6499943 webp-container-spec: consistently quote FourCCs
-49918af3 webp-container-spec: minor wording updates
-7f0a3419 update ChangeLog (tag: v1.3.1-rc1)
-bab7efbe update NEWS
-7138bf8f bump version to 1.3.1
-435b4ded update AUTHORS
-47351229 update .mailmap
-46bc4fc9 Merge "Switch ExtraCost to ints and implement it in SSE." into main
-828b4ce0 Switch ExtraCost to ints and implement it in SSE.
-ff6c7f4e CONTRIBUTING.md: add C style / cmake-format notes
-dd530437 add .cmake-format.py
-adbe2cb1 cmake,cosmetics: apply cmake-format
-15b36508 doc/webp-container-spec: rm future codec comment
-c369c4bf doc/webp-lossless-bitstream-spec: improve link text
-1de35f47 doc/webp-container-spec: don't use 'currently'
-bb06a16e doc/webp-container-spec: prefer present tense
-9f38b71e doc/webp-lossless-bitstream-spec: prefer present tense
-7acb6b82 doc/webp-container-spec: avoid i.e. & e.g.
-4967e7cd doc/webp-lossless-bitstream-spec: avoid i.e. & e.g.
-e3366659 Merge "Do not find_package image libraries if not needed." into main
-428588ef clarify single leaf node trees and use of canonical prefix coding
-709ec152 Do not find_package image libraries if not needed.
-8dd80ef8 fuzz_utils.h: lower kFuzzPxLimit w/ASan
-8f187b9f Clean message calls in CMake
-cba30078 WebPConfig.cmake.in: use calculated include path
-6cf9a76a Merge "webp-lossless-bitstream-spec: remove use of 'dynamics'" into main
-740943b2 Merge "Specialize and optimize ITransform_SSE2 using do_two" into main
-2d547e24 Compare kFuzzPxLimit to max_num_operations
-ac42dde1 Specialize and optimize ITransform_SSE2 using do_two
-17e0ef1d webp-lossless-bitstream-spec: remove use of 'dynamics'
-ed274371 neon.h,cosmetics: clear a couple lint warnings
-3fb82947 cpu.h,cosmetics: segment defines
-0c496a4f cpu.h: add WEBP_AARCH64
-8151f388 move VP8GetCPUInfo declaration to cpu.c
-916548c2 Make kFuzzPxLimit sanitizer dependent
-4070b271 advanced_api_fuzzer: reduce scaling limit
-761f49c3 Merge "webp-lossless-bitstream-spec: add missing bits to ABNF" into main
-84d04c48 webp-lossless-bitstream-spec: add missing bits to ABNF
-0696e1a7 advanced_api_fuzzer: reduce scaling limit
-93d88aa2 Merge "deps.cmake: remove unneeded header checks" into main
-118e0035 deps.cmake: remove unneeded header checks
-4c3d7018 webp-lossless-bitstream-spec: condense normal-prefix-code
-a6a09b32 webp-lossless-bitstream-spec: fix 2 code typos
-50ac4f7c Merge "cpu.h: enable NEON w/_M_ARM64EC" into main
-4b7d7b4f Add contribution instructions
-0afbd97b cpu.h: enable NEON w/_M_ARM64EC
-349f4353 Merge changes Ibd89e56b,Ic57e7f84,I89096614 into main
-8f7513b7 upsampling_neon.c: fix WEBP_SWAP_16BIT_CSP check
-cbf624b5 advanced_api_fuzzer: reduce scaling limit
-89edfdd1 Skip slow scaling in libwebp advanced_api_fuzzer
-859f19f7 Reduce libwebp advanced_api_fuzzer threshold
-a4f04835 Merge changes Ic389aaa2,I329ccd79 into main
-1275fac8 Makefile.vc: fix img2webp link w/dynamic cfg
-2fe27bb9 img2webp: normalize help output
-24bed3d9 cwebp: reflow -near_lossless help text
-0825faa4 img2webp: add -sharp_yuv/-near_lossless
-d64e6d7d Merge "PaletteSortModifiedZeng: fix leak on error" into main
-0e12a22d Merge "EncodeAlphaInternal: clear result->bw on error" into main
-0edbb6ea PaletteSortModifiedZeng: fix leak on error
-41ffe04e Merge "Update yapf style from "chromium" to "yapf"" into main
-2d9d9265 Update yapf style from "chromium" to "yapf"
-a486d800 EncodeAlphaInternal: clear result->bw on error
-1347a32d Skip big scaled advanced_api_fuzzer
-52b6f067 Fix scaling limit in advanced_api_fuzzer.c
-73618428 Limit scaling in libwebp advanced_api_fuzzer.c
-b54d21a0 Merge "CMakeLists.txt: allow CMAKE_INSTALL_RPATH to be set empty" into main
-31c28db5 libwebp{,demux,mux}.pc.in: Requires -> Requires.private
-d9a505ff CMakeLists.txt: allow CMAKE_INSTALL_RPATH to be set empty
-bdf33d03 Merge tag 'v1.3.0'
-b5577769 update ChangeLog (tag: v1.3.0-rc1, tag: v1.3.0)
 0ba77244 update NEWS
 e763eb1e bump version to 1.3.0
 2a8686fc update AUTHORS
@@ -329,7 +103,7 @@ c626e7d5 cwebp: fix WebPPictureHasTransparency call
 866e349c Merge tag 'v1.2.4'
 c170df38 Merge "Create libsharpyuv.a in makefile.unix." into main
 9d7ff74a Create libsharpyuv.a in makefile.unix.
-0d1f1254 update ChangeLog (tag: v1.2.4)
+0d1f1254 update ChangeLog (tag: v1.2.4, origin/1.2.4)
 fcbc2d78 Merge "doc/*.txt: restrict code to 69 columns" into main
 4ad0e189 Merge "webp-container-spec.txt: normalize fourcc spelling" into main
 980d2488 update NEWS
@@ -1360,7 +1134,7 @@ b016cb91 NEON: faster fancy upsampling
 f04eb376 Merge tag 'v0.5.2'
 341d711c NEON: 5% faster conversion to RGB565 and RGBA4444
 abb54827 remove Clang warnings with unused arch arguments.
-ece9684f update ChangeLog (tag: v0.5.2-rc2, tag: v0.5.2)
+ece9684f update ChangeLog (tag: v0.5.2-rc2, tag: v0.5.2, origin/0.5.2)
 aa7744ca anim_util: quiet implicit conv warnings in 32-bit
 d9120271 jpegdec: correct ContextFill signature
 24eb3940 Remove some errors when compiling the code as C++.
@@ -1647,7 +1421,7 @@ bbb6ecd9 Merge "Add MSA optimized distortion functions"
 c0991a14 io,EmitRescaledAlphaYUV: factor out a common expr
 48bf5ed1 build.gradle: remove tab
 bfef6c9f Merge tag 'v0.5.1'
-3d97bb75 update ChangeLog (tag: v0.5.1)
+3d97bb75 update ChangeLog (tag: v0.5.1, origin/0.5.1)
 deb54d91 Clarify the expected 'config' lifespan in WebPIDecode()
 435308e0 Add MSA optimized encoder transform functions
 dce64bfa Add MSA optimized alpha filter functions
--- a/Makefile.vc
+++ b/Makefile.vc
@@ -12,8 +12,6 @@ LIBSHARPYUV_BASENAME = libsharpyuv
 ARCH = x86
 !ELSE IF ! [ cl 2>&1 | find "x64" > NUL ]
 ARCH = x64
-!ELSE IF ! [ cl 2>&1 | find "ARM64" > NUL ]
-ARCH = ARM64
 !ELSE IF ! [ cl 2>&1 | find "ARM" > NUL ]
 ARCH = ARM
 !ELSE
@@ -323,7 +321,6 @@ ENC_OBJS = \
 EXTRAS_OBJS = \
    $(DIROBJ)\extras\extras.obj \
    $(DIROBJ)\extras\quality_estimate.obj \
-    $(DIROBJ)\extras\sharpyuv_risk_table.obj \

 IMAGEIO_UTIL_OBJS = \
    $(DIROBJ)\imageio\imageio_util.obj \
@@ -339,7 +336,6 @@ UTILS_DEC_OBJS = \
    $(DIROBJ)\utils\color_cache_utils.obj \
    $(DIROBJ)\utils\filters_utils.obj \
    $(DIROBJ)\utils\huffman_utils.obj \
-    $(DIROBJ)\utils\palette.obj \
    $(DIROBJ)\utils\quant_levels_dec_utils.obj \
    $(DIROBJ)\utils\rescaler_utils.obj \
    $(DIROBJ)\utils\random_utils.obj \
@@ -404,7 +400,7 @@ $(DIRBIN)\webpmux.exe: $(EX_UTIL_OBJS) $(IMAGEIO_UTIL_OBJS) $(LIBWEBP)
 $(DIRBIN)\img2webp.exe: $(DIROBJ)\examples\img2webp.obj $(LIBWEBPMUX)
 $(DIRBIN)\img2webp.exe: $(IMAGEIO_DEC_OBJS)
 $(DIRBIN)\img2webp.exe: $(EX_UTIL_OBJS) $(IMAGEIO_UTIL_OBJS)
-$(DIRBIN)\img2webp.exe: $(LIBWEBPDEMUX) $(LIBWEBP) $(LIBSHARPYUV)
+$(DIRBIN)\img2webp.exe: $(LIBWEBPDEMUX) $(LIBWEBP)
 $(DIRBIN)\get_disto.exe: $(DIROBJ)\extras\get_disto.obj
 $(DIRBIN)\get_disto.exe: $(IMAGEIO_DEC_OBJS) $(IMAGEIO_UTIL_OBJS)
 $(DIRBIN)\get_disto.exe: $(LIBWEBPDEMUX) $(LIBWEBP)
--- a/NEWS
+++ b/NEWS
@@ -1,35 +1,3 @@
-- 4/2/2024: version 1.4.0
-  This is a binary compatible release.
-  * API changes:
-    - libwebpmux: WebPAnimEncoderSetChunk, WebPAnimEncoderGetChunk,
-                  WebPAnimEncoderDeleteChunk
-    - libsharpyuv: SharpYuvOptionsInit, SharpYuvConvertWithOptions
-    - extras: SharpYuvEstimate420Risk
-  * further security related hardening in libwebp & examples
-  * some minor optimizations in the lossless encoder
-  * improvements and corrections in webp-container-spec.txt and
-    webp-lossless-bitstream-spec.txt (#611)
-  * miscellaneous warning, bug & build fixes (#615, #619, #632, #635)
-
-- 9/13/2023: version 1.3.2
-  This is a binary compatible release.
-  * security fix for lossless decoder (chromium: #1479274, CVE-2023-4863)
-
-- 6/23/2023: version 1.3.1
-  This is a binary compatible release.
-  * security fixes for lossless encoder (#603, chromium: #1420107, #1455619,
-    CVE-2023-1999)
-  * improve error reporting through WebPPicture error codes
-  * fix upsampling for RGB565 and RGBA4444 in NEON builds
-  * img2webp: add -sharp_yuv & -near_lossless
-  * Windows builds:
-    - fix compatibility with clang-cl (#607)
-    - improve Arm64 performance with cl.exe
-    - add Arm64EC support
-  * fix webp_js with emcc >= 3.1.27 (stack size change, #614)
-  * CMake fixes (#592, #610, #612)
-  * further updates to the container and lossless bitstream docs (#581, #611)
-
 - 12/16/2022: version 1.3.0
  This is a binary compatible release.
  * add libsharpyuv, which exposes -sharp_yuv/config.use_sharp_yuv
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@
      \__\__/\____/\_____/__/ ____  ___
            / _/ /    \    \ /  _ \/ _/
           /  \_/   / /   \ \   __/  \__
-           \____/____/\_____/_____/____/v1.4.0
+           \____/____/\_____/_____/____/v1.3.0
 ```

 WebP codec is a library to encode and decode images in WebP format. This package
--- a/build.gradle
+++ b/build.gradle
@@ -173,7 +173,6 @@ model {
            include "color_cache_utils.c"
            include "filters_utils.c"
            include "huffman_utils.c"
-            include "palette.c"
            include "quant_levels_dec_utils.c"
            include "random_utils.c"
            include "rescaler_utils.c"
--- a/cmake/WebPConfig.cmake.in
+++ b/cmake/WebPConfig.cmake.in
@@ -8,12 +8,9 @@ if(@WEBP_USE_THREAD@)
  find_dependency(Threads REQUIRED)
 endif()

-include("${CMAKE_CURRENT_LIST_DIR}/@PROJECT_NAME@Targets.cmake")
+include ("${CMAKE_CURRENT_LIST_DIR}/@PROJECT_NAME@Targets.cmake")

-set_and_check(WebP_INCLUDE_DIR "@PACKAGE_CMAKE_INSTALL_INCLUDEDIR@")
-set(WebP_INCLUDE_DIRS ${WebP_INCLUDE_DIR})
-set(WEBP_INCLUDE_DIRS ${WebP_INCLUDE_DIR})
+set(WebP_INCLUDE_DIRS "@CMAKE_INSTALL_FULL_INCLUDEDIR@")
+set(WEBP_INCLUDE_DIRS ${WebP_INCLUDE_DIRS})
 set(WebP_LIBRARIES "@INSTALLED_LIBRARIES@")
 set(WEBP_LIBRARIES "${WebP_LIBRARIES}")
-
-check_required_components(WebP)
--- a/cmake/config.h.in
+++ b/cmake/config.h.in
@@ -16,18 +16,48 @@
 /* Define to 1 if you have the <cpu-features.h> header file. */
 #cmakedefine HAVE_CPU_FEATURES_H 1

+/* Define to 1 if you have the <dlfcn.h> header file. */
+#cmakedefine HAVE_DLFCN_H 1
+
 /* Define to 1 if you have the <GLUT/glut.h> header file. */
 #cmakedefine HAVE_GLUT_GLUT_H 1

 /* Define to 1 if you have the <GL/glut.h> header file. */
 #cmakedefine HAVE_GL_GLUT_H 1

+/* Define to 1 if you have the <inttypes.h> header file. */
+#cmakedefine HAVE_INTTYPES_H 1
+
+/* Define to 1 if you have the <memory.h> header file. */
+#cmakedefine HAVE_MEMORY_H 1
+
 /* Define to 1 if you have the <OpenGL/glut.h> header file. */
 #cmakedefine HAVE_OPENGL_GLUT_H 1

+/* Have PTHREAD_PRIO_INHERIT. */
+#cmakedefine HAVE_PTHREAD_PRIO_INHERIT @HAVE_PTHREAD_PRIO_INHERIT@
+
 /* Define to 1 if you have the <shlwapi.h> header file. */
 #cmakedefine HAVE_SHLWAPI_H 1

+/* Define to 1 if you have the <stdint.h> header file. */
+#cmakedefine HAVE_STDINT_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#cmakedefine HAVE_STDLIB_H 1
+
+/* Define to 1 if you have the <strings.h> header file. */
+#cmakedefine HAVE_STRINGS_H 1
+
+/* Define to 1 if you have the <string.h> header file. */
+#cmakedefine HAVE_STRING_H 1
+
+/* Define to 1 if you have the <sys/stat.h> header file. */
+#cmakedefine HAVE_SYS_STAT_H 1
+
+/* Define to 1 if you have the <sys/types.h> header file. */
+#cmakedefine HAVE_SYS_TYPES_H 1
+
 /* Define to 1 if you have the <unistd.h> header file. */
 #cmakedefine HAVE_UNISTD_H 1

@@ -63,6 +93,9 @@
 /* Define to the version of this package. */
 #cmakedefine PACKAGE_VERSION "@PACKAGE_VERSION@"

+/* Define to 1 if you have the ANSI C header files. */
+#cmakedefine STDC_HEADERS 1
+
 /* Version number of package */
 #cmakedefine VERSION "@VERSION@"

--- a/cmake/cpu.cmake
+++ b/cmake/cpu.cmake
@@ -61,7 +61,7 @@ endif()
 set(WEBP_SIMD_FILES_TO_INCLUDE)
 set(WEBP_SIMD_FLAGS_TO_INCLUDE)

-if(ANDROID AND ANDROID_ABI)
+if(${ANDROID})
  if(${ANDROID_ABI} STREQUAL "armeabi-v7a")
    # This is because Android studio uses the configuration "-march=armv7-a
    # -mfloat-abi=softfp -mfpu=vfpv3-d16" that does not trigger neon
@@ -106,9 +106,8 @@ foreach(I_SIMD RANGE ${WEBP_SIMD_FLAGS_RANGE})
  endif()
  # Check which files we should include or not.
  list(GET WEBP_SIMD_FILE_EXTENSIONS ${I_SIMD} WEBP_SIMD_FILE_EXTENSION)
-  file(GLOB SIMD_FILES
-       "${CMAKE_CURRENT_LIST_DIR}/../sharpyuv/*${WEBP_SIMD_FILE_EXTENSION}"
-       "${CMAKE_CURRENT_LIST_DIR}/../src/dsp/*${WEBP_SIMD_FILE_EXTENSION}")
+  file(GLOB SIMD_FILES "${CMAKE_CURRENT_LIST_DIR}/../"
+       "src/dsp/*${WEBP_SIMD_FILE_EXTENSION}")
  if(WEBP_HAVE_${WEBP_SIMD_FLAG})
    # Memorize the file and flags.
    foreach(FILE ${SIMD_FILES})
--- a/cmake/deps.cmake
+++ b/cmake/deps.cmake
@@ -43,6 +43,16 @@ if(WEBP_USE_THREAD)
    if(CMAKE_USE_PTHREADS_INIT AND NOT CMAKE_SYSTEM_NAME STREQUAL "QNX")
      set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
    endif()
+    check_c_source_compiles(
+      "
+        #include <pthread.h>
+        int main (void) {
+          int attr = PTHREAD_PRIO_INHERIT;
+          return attr;
+        }
+      "
+      FLAG_HAVE_PTHREAD_PRIO_INHERIT)
+    set(HAVE_PTHREAD_PRIO_INHERIT ${FLAG_HAVE_PTHREAD_PRIO_INHERIT})
    list(APPEND WEBP_DEP_LIBRARIES Threads::Threads)
  endif()
  set(WEBP_USE_THREAD ${Threads_FOUND})
@@ -74,64 +84,72 @@ endif()
 # Find the standard image libraries.
 set(WEBP_DEP_IMG_LIBRARIES)
 set(WEBP_DEP_IMG_INCLUDE_DIRS)
-if(WEBP_FIND_IMG_LIBS)
-  foreach(I_LIB PNG JPEG TIFF)
-    # Disable tiff when compiling in static mode as it is failing on Ubuntu.
-    if(WEBP_LINK_STATIC AND ${I_LIB} STREQUAL "TIFF")
-      message(STATUS "TIFF is disabled when statically linking.")
-      continue()
-    endif()
-    find_package(${I_LIB})
-    set(WEBP_HAVE_${I_LIB} ${${I_LIB}_FOUND})
-    if(${I_LIB}_FOUND)
-      list(APPEND WEBP_DEP_IMG_LIBRARIES ${${I_LIB}_LIBRARIES})
-      list(APPEND WEBP_DEP_IMG_INCLUDE_DIRS ${${I_LIB}_INCLUDE_DIR}
-           ${${I_LIB}_INCLUDE_DIRS})
-    endif()
-  endforeach()
-  if(WEBP_DEP_IMG_INCLUDE_DIRS)
-    list(REMOVE_DUPLICATES WEBP_DEP_IMG_INCLUDE_DIRS)
+foreach(I_LIB PNG JPEG TIFF)
+  # Disable tiff when compiling in static mode as it is failing on Ubuntu.
+  if(WEBP_LINK_STATIC AND ${I_LIB} STREQUAL "TIFF")
+    message("TIFF is disabled when statically linking.")
+    continue()
  endif()
+  find_package(${I_LIB})
+  set(WEBP_HAVE_${I_LIB} ${${I_LIB}_FOUND})
+  if(${I_LIB}_FOUND)
+    list(APPEND WEBP_DEP_IMG_LIBRARIES ${${I_LIB}_LIBRARIES})
+    list(APPEND WEBP_DEP_IMG_INCLUDE_DIRS ${${I_LIB}_INCLUDE_DIR}
+         ${${I_LIB}_INCLUDE_DIRS})
+  endif()
+endforeach()
+if(WEBP_DEP_IMG_INCLUDE_DIRS)
+  list(REMOVE_DUPLICATES WEBP_DEP_IMG_INCLUDE_DIRS)
+endif()

-  # GIF detection, gifdec isn't part of the imageio lib.
-  include(CMakePushCheckState)
-  set(WEBP_DEP_GIF_LIBRARIES)
-  set(WEBP_DEP_GIF_INCLUDE_DIRS)
-  find_package(GIF)
-  set(WEBP_HAVE_GIF ${GIF_FOUND})
-  if(GIF_FOUND)
-    # GIF find_package only locates the header and library, it doesn't fail
-    # compile tests when detecting the version, but falls back to 3 (as of at
-    # least cmake 3.7.2). Make sure the library links to avoid incorrect
-    # detection when cross compiling.
-    cmake_push_check_state()
-    set(CMAKE_REQUIRED_LIBRARIES ${GIF_LIBRARIES})
-    set(CMAKE_REQUIRED_INCLUDES ${GIF_INCLUDE_DIR})
-    check_c_source_compiles(
-      "
+# GIF detection, gifdec isn't part of the imageio lib.
+include(CMakePushCheckState)
+set(WEBP_DEP_GIF_LIBRARIES)
+set(WEBP_DEP_GIF_INCLUDE_DIRS)
+find_package(GIF)
+set(WEBP_HAVE_GIF ${GIF_FOUND})
+if(GIF_FOUND)
+  # GIF find_package only locates the header and library, it doesn't fail
+  # compile tests when detecting the version, but falls back to 3 (as of at
+  # least cmake 3.7.2). Make sure the library links to avoid incorrect detection
+  # when cross compiling.
+  cmake_push_check_state()
+  set(CMAKE_REQUIRED_LIBRARIES ${GIF_LIBRARIES})
+  set(CMAKE_REQUIRED_INCLUDES ${GIF_INCLUDE_DIR})
+  check_c_source_compiles(
+    "
      #include <gif_lib.h>
      int main(void) {
        (void)DGifOpenFileHandle;
        return 0;
      }
      "
-      GIF_COMPILES)
-    cmake_pop_check_state()
-    if(GIF_COMPILES)
-      list(APPEND WEBP_DEP_GIF_LIBRARIES ${GIF_LIBRARIES})
-      list(APPEND WEBP_DEP_GIF_INCLUDE_DIRS ${GIF_INCLUDE_DIR})
-    else()
-      unset(GIF_FOUND)
-    endif()
+    GIF_COMPILES)
+  cmake_pop_check_state()
+  if(GIF_COMPILES)
+    list(APPEND WEBP_DEP_GIF_LIBRARIES ${GIF_LIBRARIES})
+    list(APPEND WEBP_DEP_GIF_INCLUDE_DIRS ${GIF_INCLUDE_DIR})
+  else()
+    unset(GIF_FOUND)
  endif()
 endif()

 # Check for specific headers.
 include(CheckIncludeFiles)
+check_include_files("stdlib.h;stdarg.h;string.h;float.h" STDC_HEADERS)
+check_include_files(dlfcn.h HAVE_DLFCN_H)
 check_include_files(GLUT/glut.h HAVE_GLUT_GLUT_H)
 check_include_files(GL/glut.h HAVE_GL_GLUT_H)
+check_include_files(inttypes.h HAVE_INTTYPES_H)
+check_include_files(memory.h HAVE_MEMORY_H)
 check_include_files(OpenGL/glut.h HAVE_OPENGL_GLUT_H)
 check_include_files(shlwapi.h HAVE_SHLWAPI_H)
+check_include_files(stdint.h HAVE_STDINT_H)
+check_include_files(stdlib.h HAVE_STDLIB_H)
+check_include_files(strings.h HAVE_STRINGS_H)
+check_include_files(string.h HAVE_STRING_H)
+check_include_files(sys/stat.h HAVE_SYS_STAT_H)
+check_include_files(sys/types.h HAVE_SYS_TYPES_H)
 check_include_files(unistd.h HAVE_UNISTD_H)
 check_include_files(wincodec.h HAVE_WINCODEC_H)
 check_include_files(windows.h HAVE_WINDOWS_H)
--- a/configure.ac
+++ b/configure.ac
@@ -1,4 +1,4 @@
-AC_INIT([libwebp], [1.4.0],
+AC_INIT([libwebp], [1.3.0],
        [https://bugs.chromium.org/p/webp],,
        [https://developers.google.com/speed/webp])
 AC_CANONICAL_HOST
@@ -106,7 +106,6 @@ TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wall])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wconstant-conversion])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wdeclaration-after-statement])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wextra])
-TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wextra-semi-stmt])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wfloat-conversion])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wformat -Wformat-nonliteral])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wformat -Wformat-security])
@@ -116,7 +115,6 @@ TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wold-style-definition])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wparentheses-equality])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wshadow])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wshorten-64-to-32])
-TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wstrict-prototypes])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wundef])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wunreachable-code-aggressive])
 TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wunreachable-code])
@@ -466,7 +464,7 @@ AC_ARG_ENABLE([sdl],
                              @<:@default=auto@:>@]))
 AS_IF([test "x$enable_sdl" != "xno"], [
  CLEAR_LIBVARS([SDL])
-  AC_PATH_PROGS([LIBSDL_CONFIG], [sdl2-config])
+  AC_PATH_PROGS([LIBSDL_CONFIG], [sdl-config])
  if test -n "$LIBSDL_CONFIG"; then
    SDL_INCLUDES=`$LIBSDL_CONFIG --cflags`
    SDL_LIBS="`$LIBSDL_CONFIG --libs`"
@@ -476,12 +474,13 @@ AS_IF([test "x$enable_sdl" != "xno"], [

  sdl_header="no"
  LIBCHECK_PROLOGUE([SDL])
-  AC_CHECK_HEADER([SDL2/SDL.h], [sdl_header="SDL2/SDL.h"],
-                  [AC_MSG_WARN(SDL2 library not available - no SDL.h)])
+  AC_CHECK_HEADER([SDL/SDL.h], [sdl_header="SDL/SDL.h"],
+                  [AC_CHECK_HEADER([SDL.h], [sdl_header="SDL.h"],
+                  [AC_MSG_WARN(SDL library not available - no sdl.h)])])
  if test x"$sdl_header" != "xno"; then
    AC_LANG_PUSH(C)
    SDL_SAVED_LIBS="$LIBS"
-    for lib in "" "-lSDL2" "-lSDL2main -lSDL2"; do
+    for lib in "" "-lSDL" "-lSDLmain -lSDL"; do
      LIBS="$SDL_SAVED_LIBS $lib"
      # Perform a full link to ensure SDL_main is resolved if needed.
      AC_LINK_IFELSE(
@@ -763,8 +762,7 @@ AC_CONFIG_FILES([Makefile src/Makefile man/Makefile \
                 src/libwebp.pc src/libwebpdecoder.pc \
                 src/demux/libwebpdemux.pc src/mux/libwebpmux.pc])

-dnl fix exports from MinGW builds
-AC_CONFIG_COMMANDS_POST([$SED -i 's/-DDLL_EXPORT/-DWEBP_DLL/' config.status])
+
 AC_OUTPUT

 AC_MSG_NOTICE([
--- a/doc/api.md
+++ b/doc/api.md
@@ -157,7 +157,7 @@ decoding is not finished yet or VP8_STATUS_OK when decoding is done. Any other
 status is an error condition.

 The 'idec' object must always be released (even upon an error condition) by
-calling: WebPIDelete(idec).
+calling: WebPDelete(idec).

 To retrieve partially decoded picture samples, one must use the corresponding
 method: WebPIDecGetRGB or WebPIDecGetYUVA. It will return the last displayable
--- a/doc/building.md
+++ b/doc/building.md
@@ -96,24 +96,6 @@ make
 make install
 ```

-## Building libwebp - Using vcpkg
-
-You can download and install libwebp using the
-[vcpkg](https://github.com/Microsoft/vcpkg) dependency manager:
-
-```shell
-git clone https://github.com/Microsoft/vcpkg.git
-cd vcpkg
-./bootstrap-vcpkg.sh
-./vcpkg integrate install
-./vcpkg install libwebp
-```
-
-The libwebp port in vcpkg is kept up to date by Microsoft team members and
-community contributors. If the version is out of date, please
-[create an issue or pull request](https://github.com/Microsoft/vcpkg) on the
-vcpkg repository.
-
 ## CMake

 With CMake, you can compile libwebp, cwebp, dwebp, gif2webp, img2webp, webpinfo
--- a/doc/tools.md
+++ b/doc/tools.md
@@ -82,8 +82,8 @@ Options:
                         green=0xe0 and blue=0xd0
 -noalpha ............... discard any transparency information
 -lossless .............. encode image losslessly, default=off
--near_lossless <int> ... use near-lossless image preprocessing
-                         (0..100=off), default=100
+-near_lossless <int> ... use near-lossless image
+                         preprocessing (0..100=off), default=100
 -hint <string> ......... specify image characteristics hint,
                         one of: photo, picture or graph

@@ -295,23 +295,19 @@ etc.
 Usage:

 ```shell
-img2webp [file_options] [[frame_options] frame_file]... [-o webp_file]
+img2webp [file_options] [[frame_options] frame_file]...
 ```

 File-level options (only used at the start of compression):

 ```
 -min_size ............ minimize size
+-loop <int> .......... loop count (default: 0, = infinite loop)
 -kmax <int> .......... maximum number of frame between key-frames
                        (0=only keyframes)
 -kmin <int> .......... minimum number of frame between key-frames
                        (0=disable key-frames altogether)
 -mixed ............... use mixed lossy/lossless automatic mode
--near_lossless <int> . use near-lossless image preprocessing
-                       (0..100=off), default=100
--sharp_yuv ........... use sharper (and slower) RGB->YUV conversion
-                       (lossy only)
--loop <int> .......... loop count (default: 0, = infinite loop)
 -v ................... verbose mode
 -h ................... this help
 -version ............. print version number and exit
--- a/doc/webp-container-spec.txt
+++ b/doc/webp-container-spec.txt
@@ -20,48 +20,47 @@ WebP Container Specification
 Introduction
 ------------

-WebP is an image format that uses either (i) the VP8 key frame encoding to
-compress image data in a lossy way or (ii) the WebP lossless encoding. These
-encoding schemes should make it more efficient than older formats, such as JPEG,
-GIF, and PNG. It is optimized for fast image transfer over the network (for
-example, for websites). The WebP format has feature parity (color profile,
-metadata, animation, etc.) with other formats as well. This document describes
-the structure of a WebP file.
+WebP is an image format that uses either (i) the VP8 key frame encoding
+to compress image data in a lossy way, or (ii) the WebP lossless encoding
+(and possibly other encodings in the future). These encoding schemes should
+make it more efficient than currently used formats. It is optimized for fast
+image transfer over the network (e.g., for websites). The WebP format has
+feature parity (color profile, metadata, animation, etc.) with other formats as
+well. This document describes the structure of a WebP file.

-The WebP container (that is, the RIFF container for WebP) allows feature support
-over and above the basic use case of WebP (that is, a file containing a single
-image encoded as a VP8 key frame). The WebP container provides additional
-support for the following:
+The WebP container (i.e., RIFF container for WebP) allows feature support over
+and above the basic use case of WebP (i.e., a file containing a single image
+encoded as a VP8 key frame). The WebP container provides additional support
+for:

-  * Lossless Compression: An image can be losslessly compressed, using the
+  * **Lossless compression.** An image can be losslessly compressed, using the
    WebP Lossless Format.

-  * Metadata: An image may have metadata stored in Exchangeable Image File
-    Format (Exif) or Extensible Metadata Platform (XMP) format.
+  * **Metadata.** An image may have metadata stored in Exif or XMP formats.

-  * Transparency: An image may have transparency, that is, an alpha channel.
+  * **Transparency.** An image may have transparency, i.e., an alpha channel.

-  * Color Profile: An image may have an embedded ICC profile as described
+  * **Color Profile.** An image may have an embedded ICC profile as described
    by the [International Color Consortium][iccspec].

-  * Animation: An image may have multiple frames with pauses between them,
+  * **Animation.** An image may have multiple frames with pauses between them,
    making it an animation.

-Terminology & Basics
---------------------
-
 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
 "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
 document are to be interpreted as described in BCP 14 [RFC 2119][] [RFC 8174][]
 when, and only when, they appear in all capitals, as shown here.

-A WebP file contains either a still image (that is, an encoded matrix of pixels)
-or an [animation](#animation). Optionally, it can also contain transparency
-information, a color profile and metadata. We refer to the matrix of pixels as
-the _canvas_ of the image.
-
 Bit numbering in chunk diagrams starts at `0` for the most significant bit
-('MSB 0'), as described in [RFC 1166][].
+('MSB 0') as described in [RFC 1166][].
+
+Terminology & Basics
+--------------------
+
+A WebP file contains either a still image (i.e., an encoded matrix of pixels)
+or an [animation](#animation). Optionally, it can also contain transparency
+information, color profile and metadata. In case we need to refer only to the
+matrix of pixels, we will call it the _canvas_ of the image.

 Below are additional terms used throughout this document:

@@ -84,19 +83,20 @@ _uint32_

 _FourCC_

-: A four-character code (FourCC) is a _uint32_ created by concatenating four
+: A _FourCC_ (four-character code) is a _uint32_ created by concatenating four
  ASCII characters in little-endian order. This means 'aaaa' (0x61616161) and
 'AAAA' (0x41414141) are treated as different _FourCCs_.

 _1-based_

-: An unsigned integer field storing values offset by `-1`, for example, such a
-  field would store value _25_ as _24_.
+: An unsigned integer field storing values offset by `-1`. e.g., Such a field
+  would store value _25_ as _24_.

 _ChunkHeader('ABCD')_

-: Used to describe the _FourCC_ and _Chunk Size_ header of individual chunks,
-  where 'ABCD' is the FourCC for the chunk. This element's size is 8 bytes.
+: This is used to describe the _FourCC_ and _Chunk Size_ header of individual
+  chunks, where 'ABCD' is the FourCC for the chunk. This element's size is 8
+  bytes.


 RIFF File Format
@@ -124,11 +124,11 @@ Chunk FourCC: 32 bits
 Chunk Size: 32 bits (_uint32_)

 : The size of the chunk in bytes, not including this field, the chunk
-  identifier, or padding.
+  identifier or padding.

 Chunk Payload: _Chunk Size_ bytes

-: The data payload. If _Chunk Size_ is odd, a single padding byte -- which MUST
+: The data payload. If _Chunk Size_ is odd, a single padding byte -- that MUST
  be `0` to conform with RIFF -- is added.

 **Note:** RIFF has a convention that all-uppercase chunk FourCCs are standard
@@ -151,24 +151,24 @@ WebP File Header

 'RIFF': 32 bits

-: The ASCII characters 'R', 'I', 'F', 'F'.
+: The ASCII characters 'R' 'I' 'F' 'F'.

 File Size: 32 bits (_uint32_)

-: The size of the file in bytes, starting at offset 8. The maximum value of
+: The size of the file in bytes starting at offset 8. The maximum value of
  this field is 2^32 minus 10 bytes and thus the size of the whole file is at
-  most 4 GiB minus 2 bytes.
+  most 4GiB minus 2 bytes.

 'WEBP': 32 bits

-: The ASCII characters 'W', 'E', 'B', 'P'.
+: The ASCII characters 'W' 'E' 'B' 'P'.

 A WebP file MUST begin with a RIFF header with the FourCC 'WEBP'. The file size
 in the header is the total size of the chunks that follow plus `4` bytes for
 the 'WEBP' FourCC. The file SHOULD NOT contain any data after the data
 specified by _File Size_. Readers MAY parse such files, ignoring the trailing
 data. As the size of any chunk is even, the size given by the RIFF header is
-also even. The contents of individual chunks are described in the following
+also even. The contents of individual chunks will be described in the following
 sections.


@@ -188,10 +188,10 @@ Simple WebP (lossy) file format:
    |                    WebP file header (12 bytes)                |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-    :                        'VP8 ' Chunk                           :
+    :                          VP8 chunk                            :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

-'VP8 ' Chunk:
+VP8 chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
@@ -206,15 +206,15 @@ VP8 data: _Chunk Size_ bytes

 : VP8 bitstream data.

-Note that the fourth character in the 'VP8 ' FourCC is an ASCII space (0x20).
+Note the fourth character in the 'VP8 ' FourCC is an ASCII space (0x20).

-The VP8 bitstream format specification is described in [VP8 Data Format and
-Decoding Guide][rfc 6386]. Note that the VP8 frame header contains the VP8 frame
+The VP8 bitstream format specification can be found at [VP8 Data Format and
+Decoding Guide][vp8spec]. Note that the VP8 frame header contains the VP8 frame
 width and height. That is assumed to be the width and height of the canvas.

 The VP8 specification describes how to decode the image into Y'CbCr format. To
-convert to RGB, [Recommendation BT.601][rec601] SHOULD be used. Applications MAY
-use another conversion method, but visual results may differ among decoders.
+convert to RGB, Rec. 601 SHOULD be used. Applications MAY use another
+conversion method, but visual results may differ among decoders.


 Simple File Format (Lossless)
@@ -235,10 +235,10 @@ Simple WebP (lossless) file format:
    |                    WebP file header (12 bytes)                |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-    :                         'VP8L' Chunk                          :
+    :                          VP8L chunk                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

-'VP8L' Chunk:
+VP8L chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
@@ -266,17 +266,17 @@ Extended File Format

 An extended format file consists of:

-  * A 'VP8X' Chunk with information about features used in the file.
+  * A 'VP8X' chunk with information about features used in the file.

-  * An optional 'ICCP' Chunk with a color profile.
+  * An optional 'ICCP' chunk with color profile.

-  * An optional 'ANIM' Chunk with animation control data.
+  * An optional 'ANIM' chunk with animation control data.

  * Image data.

-  * An optional 'EXIF' Chunk with Exif metadata.
+  * An optional 'EXIF' chunk with Exif metadata.

-  * An optional 'XMP ' Chunk with XMP metadata.
+  * An optional 'XMP ' chunk with XMP metadata.

  * An optional list of [unknown chunks](#unknown-chunks).

@@ -290,18 +290,15 @@ up of:
 For an _animated image_, the _image data_ consists of multiple frames. More
 details about frames can be found in the [Animation](#animation) section.

-All chunks necessary for reconstruction and color correction, that is 'VP8X',
-'ICCP', 'ANIM', 'ANMF', 'ALPH', 'VP8 ' and 'VP8L', MUST appear in the order
-described earlier. Readers SHOULD fail when chunks necessary for reconstruction
-and color correction are out of order.
+All chunks SHOULD be placed in the same order as listed above. If a chunk
+appears in the wrong place, the file is invalid, but readers MAY parse the
+file, ignoring the chunks that are out of order.

-[Metadata](#metadata) and [unknown](#unknown-chunks) chunks MAY appear out of
-order.
-
-**Rationale:** The chunks necessary for reconstruction should appear first in
-the file to allow a reader to begin decoding an image before receiving all of
-the data. An application may benefit from varying the order of metadata and
-custom chunks to suit the implementation.
+**Rationale:** Setting the order of chunks should allow quicker file
+parsing. For example, if an 'ALPH' chunk does not appear in its required
+position, a decoder can choose to stop searching for it. The rule of
+ignoring late chunks should make programs that need to do a full search
+give the same results as the ones stopping early.

 Extended WebP file header:
 {:#extended_header}
@@ -329,7 +326,7 @@ Reserved (Rsv): 2 bits

 ICC profile (I): 1 bit

-: Set if the file contains an 'ICCP' Chunk.
+: Set if the file contains an ICC profile.

 Alpha (L): 1 bit

@@ -346,7 +343,7 @@ XMP metadata (X): 1 bit

 Animation (A): 1 bit

-: Set if this is an animated image. Data in 'ANIM' and 'ANMF' Chunks should be
+: Set if this is an animated image. Data in 'ANIM' and 'ANMF' chunks should be
  used to control the animation.

 Reserved (R): 1 bit
@@ -375,9 +372,9 @@ Future specifications may add more fields. Unknown fields MUST be ignored.

 #### Animation

-An animation is controlled by 'ANIM' and 'ANMF' Chunks.
+An animation is controlled by ANIM and ANMF chunks.

-'ANIM' Chunk:
+ANIM Chunk:
 {:#anim_chunk}

 For an animated image, this chunk contains the _global parameters_ of the
@@ -399,14 +396,14 @@ Background Color: 32 bits (_uint32_)
 : The default background color of the canvas in \[Blue, Green, Red, Alpha\]
  byte order. This color MAY be used to fill the unused space on the canvas
  around the frames, as well as the transparent pixels of the first frame.
-  The background color is also used when the Disposal method is `1`.
+  Background color is also used when disposal method is `1`.

 **Note**:

-  * The background color MAY contain a non-opaque alpha value, even if the
-    _Alpha_ flag in the ['VP8X' Chunk](#extended_header) is unset.
+  * Background color MAY contain a non-opaque alpha value, even if the _Alpha_
+    flag in [VP8X chunk](#extended_header) is unset.

-  * Viewer applications SHOULD treat the background color value as a hint and
+  * Viewer applications SHOULD treat the background color value as a hint, and
    are not required to use it.

  * The canvas is cleared at the start of each loop. The background color MAY be
@@ -414,14 +411,13 @@ Background Color: 32 bits (_uint32_)

 Loop Count: 16 bits (_uint16_)

-: The number of times to loop the animation. If it is `0`, this means
-  infinitely.
+: The number of times to loop the animation. `0` means infinitely.

-This chunk MUST appear if the _Animation_ flag in the 'VP8X' Chunk is set.
+This chunk MUST appear if the _Animation_ flag in the VP8X chunk is set.
 If the _Animation_ flag is not set and this chunk is present, it MUST be
 ignored.

-'ANMF' Chunk:
+ANMF chunk:

 For animated images, this chunk contains information about a _single_ frame.
 If the _Animation flag_ is not set, then this chunk SHOULD NOT be present.
@@ -463,10 +459,10 @@ Frame Height Minus One: 24 bits (_uint24_)

 Frame Duration: 24 bits (_uint24_)

-: The time to wait before displaying the next frame, in 1-millisecond units.
-  Note that the interpretation of the Frame Duration of 0 (and often <= 10) is
-  defined by the implementation. Many tools and browsers assign a minimum
-  duration similar to GIF.
+: The time to wait before displaying the next frame, in 1 millisecond units.
+  Note the interpretation of frame duration of 0 (and often <= 10) is
+  implementation defined. Many tools and browsers assign a minimum duration
+  similar to GIF.

 Reserved: 6 bits

@@ -477,10 +473,10 @@ Blending method (B): 1 bit
 : Indicates how transparent pixels of _the current frame_ are to be blended
  with corresponding pixels of the previous canvas:

-    * `0`: Use alpha-blending. After disposing of the previous frame, render the
+    * `0`: Use alpha blending. After disposing of the previous frame, render the
      current frame on the canvas using [alpha-blending](#alpha-blending). If
-      the current frame does not have an alpha channel, assume the alpha value
-      is 255, effectively replacing the rectangle.
+      the current frame does not have an alpha channel, assume alpha value of
+      255, effectively replacing the rectangle.

    * `1`: Do not blend. After disposing of the previous frame, render the
      current frame on the canvas by overwriting the rectangle covered by the
@@ -493,20 +489,20 @@ Disposal method (D): 1 bit

    * `0`: Do not dispose. Leave the canvas as is.

-    * `1`: Dispose to the background color. Fill the _rectangle_ on the canvas
-      covered by the _current frame_ with the background color specified in the
-      ['ANIM' Chunk](#anim_chunk).
+    * `1`: Dispose to background color. Fill the _rectangle_ on the canvas
+      covered by the _current frame_ with background color specified in the
+      [ANIM chunk](#anim_chunk).

 **Notes**:

  * The frame disposal only applies to the _frame rectangle_, that is, the
-    rectangle defined by _Frame X_, _Frame Y_, _frame width_, and _frame
-    height_. It may or may not cover the whole canvas.
+    rectangle defined by _Frame X_, _Frame Y_, _frame width_ and _frame height_.
+    It may or may not cover the whole canvas.

 {:#alpha-blending}
-  * Alpha-blending:
+  * **Alpha-blending**:

-    Given that each of the R, G, B, and A channels is 8 bits, and the RGB
+    Given that each of the R, G, B and A channels is 8-bit, and the RGB
    channels are _not premultiplied_ by alpha, the formula for blending
    'dst' onto 'src' is:

@@ -522,8 +518,8 @@ Disposal method (D): 1 bit

  * Alpha-blending SHOULD be done in linear color space, by taking into account
    the [color profile](#color-profile) of the image. If the color profile is
-    not present, standard RGB (sRGB) is to be assumed. (Note that sRGB also
-    needs to be linearized due to a gamma of ~2.2.)
+    not present, sRGB is to be assumed. (Note that sRGB also needs to be
+    linearized due to a gamma of ~2.2).

 Frame Data: _Chunk Size_ - `16` bytes

@@ -535,8 +531,8 @@ Frame Data: _Chunk Size_ - `16` bytes

  * An optional list of [unknown chunks](#unknown-chunks).

-**Note**: The 'ANMF' payload, _Frame Data_, consists of individual
-_padded_ chunks, as described by the [RIFF file format](#riff-file-format).
+**Note**: The 'ANMF' payload, _Frame Data_ above, consists of individual
+_padded_ chunks as described by the [RIFF file format](#riff-file-format).

 #### Alpha

@@ -553,20 +549,18 @@ Reserved (Rsv): 2 bits

 : MUST be `0`. Readers MUST ignore this field.

-Preprocessing (P): 2 bits
+Pre-processing (P): 2 bits

-: These _informative_ bits are used to signal the preprocessing that has
+: These _informative_ bits are used to signal the pre-processing that has
  been performed during compression. The decoder can use this information to
-  for example, dither the values or smooth the gradients prior to display.
+  e.g. dither the values or smooth the gradients prior to display.

-    * `0`: No preprocessing.
+    * `0`: No pre-processing.
    * `1`: Level reduction.

-Decoders are not required to use this information in any specified way.
-
 Filtering method (F): 2 bits

-: The filtering methods used are described as follows:
+: The filtering method used:

    * `0`: None.
    * `1`: Horizontal filter.
@@ -590,8 +584,8 @@ made depending on the filtering method:

 where `clip(v)` is equal to:

-  * 0    if v < 0,
-  * 255  if v > 255, or
+  * 0    if v < 0
+  * 255  if v > 255
  * v    otherwise

 The final value is derived by adding the decompressed value `X` to the
@@ -600,15 +594,17 @@ into the \[0..255\] one:

 `alpha = (predictor + X) % 256`

-There are special cases for the left-most and top-most pixel positions. For
-example, the top-left value at location (0, 0) uses 0 as the predictor value.
-Otherwise:
+There are special cases for the left-most and top-most pixel positions:

+  * The top-left value at location (0, 0) uses 0 as predictor value. Otherwise,
  * For horizontal or gradient filtering methods, the left-most pixels at
    location (0, y) are predicted using the location (0, y-1) just above.
  * For vertical or gradient filtering methods, the top-most pixels at
    location (x, 0) are predicted using the location (x-1, 0) on the left.

+
+Decoders are not required to use this information in any specified way.
+
 Compression method (C): 2 bits

 : The compression method used:
@@ -621,32 +617,32 @@ Alpha bitstream: _Chunk Size_ - `1` bytes
 : Encoded alpha bitstream.

 This optional chunk contains encoded alpha data for this frame. A frame
-containing a 'VP8L' Chunk SHOULD NOT contain this chunk.
+containing a 'VP8L' chunk SHOULD NOT contain this chunk.

 **Rationale**: The transparency information is already part of the 'VP8L'
-Chunk.
+chunk.

-The alpha channel data is stored as uncompressed raw data (when the
+The alpha channel data is stored as uncompressed raw data (when
 compression method is '0') or compressed using the lossless format
 (when the compression method is '1').

-  * Raw data: This consists of a byte sequence of length = width * height,
+  * Raw data: consists of a byte sequence of length width * height,
    containing all the 8-bit transparency values in scan order.

-  * Lossless format compression: The byte sequence is a compressed
-    image-stream (as described in ["WebP Lossless Bitstream Format"]
-    [webpllspec]) of implicit dimensions width x height. That is, this
-    image-stream does NOT contain any headers describing the image dimensions.
+  * Lossless format compression: the byte sequence is a compressed
+    image-stream (as described in the [WebP Lossless Bitstream Format]
+    [webpllspec]) of implicit dimension width x height. That is, this
+    image-stream does NOT contain any headers describing the image dimension.

-    **Rationale**: The dimensions are already known from other sources,
-    so storing them again would be redundant and prone to error.
+    **Rationale**: the dimension is already known from other sources,
+    so storing it again would be redundant and error-prone.

-    Once the image-stream is decoded into Alpha, Red, Green, Blue (ARGB) color
-    values, following the process described in the lossless format
-    specification, the transparency information must be extracted from the
-    *green* channel of the ARGB quadruplet.
+    Once the image-stream is decoded into ARGB color values, following
+    the process described in the lossless format specification, the
+    transparency information must be extracted from the *green* channel
+    of the ARGB quadruplet.

-    **Rationale**: The green channel is allowed extra transformation
+    **Rationale**: the green channel is allowed extra transformation
    steps in the specification -- unlike the other channels -- that can
    improve compression.

@@ -654,13 +650,13 @@ compression method is '0') or compressed using the lossless format

 This chunk contains compressed bitstream data for a single frame.

-A bitstream chunk may be either (i) a 'VP8 ' Chunk, using 'VP8 ' (note the
-significant fourth-character space) as its FourCC, _or_ (ii) a 'VP8L' Chunk,
-using 'VP8L' as its FourCC.
+A bitstream chunk may be either (i) a VP8 chunk, using "VP8 " (note the
+significant fourth-character space) as its tag _or_ (ii) a VP8L chunk, using
+"VP8L" as its tag.

-The formats of 'VP8 ' and 'VP8L' Chunks are as described in sections
+The formats of VP8 and VP8L chunks are as described in sections
 [Simple File Format (Lossy)](#simple-file-format-lossy)
-and [Simple File Format (Lossless)](#simple-file-format-lossless), respectively.
+and [Simple File Format (Lossless)](#simple-file-format-lossless) respectively.

 #### Color Profile

@@ -687,14 +683,14 @@ If this chunk is not present, sRGB SHOULD be assumed.

 #### Metadata

-Metadata can be stored in 'EXIF' or 'XMP ' Chunks.
+Metadata can be stored in 'EXIF' or 'XMP ' chunks.

 There SHOULD be at most one chunk of each type ('EXIF' and 'XMP '). If there
 are more such chunks, readers MAY ignore all except the first one.

 The chunks are defined as follows:

-'EXIF' Chunk:
+EXIF chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
@@ -709,7 +705,7 @@ Exif Metadata: _Chunk Size_ bytes

 : Image metadata in Exif format.

-'XMP ' Chunk:
+XMP chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
@@ -724,62 +720,62 @@ XMP Metadata: _Chunk Size_ bytes

 : Image metadata in XMP format.

-Note that the fourth character in the 'XMP ' FourCC is an ASCII space (0x20).
+Note the fourth character in the 'XMP ' FourCC is an ASCII space (0x20).

 Additional guidance about handling metadata can be found in the
-Metadata Working Group's ["Guidelines for Handling Metadata"][metadata].
+Metadata Working Group's [Guidelines for Handling Metadata][metadata].

 #### Unknown Chunks

-A RIFF chunk (described in the [RIFF File Format](#riff-file-format) section)
-whose FourCC is different from any of the chunks described in this document, is
+A RIFF chunk (described in [this](#terminology-amp-basics) section) whose _chunk
+tag_ is different from any of the chunks described in this document, is
 considered an _unknown chunk_.

 **Rationale**: Allowing unknown chunks gives a provision for future extension
-of the format and also allows storage of any application-specific data.
+of the format, and also allows storage of any application-specific data.

 A file MAY contain unknown chunks:

-  * at the end of the file, as described in [Extended WebP file
-    header](#extended_header) section, or
-  * at the end of 'ANMF' Chunks, as described in the
+  * At the end of the file as described in [Extended WebP file
+    header](#extended_header) section.
+  * At the end of ANMF chunks as described in the
    [Animation](#animation) section.

 Readers SHOULD ignore these chunks. Writers SHOULD preserve them in their
 original order (unless they specifically intend to modify these chunks).

-### Canvas Assembly from Frames
+### Assembling the Canvas From Frames

 Here we provide an overview of how a reader MUST assemble a canvas in the case
 of an animated image.

 The process begins with creating a canvas using the dimensions given in the
-'VP8X' Chunk, `Canvas Width Minus One + 1` pixels wide by `Canvas Height Minus
-One + 1` pixels high. The `Loop Count` field from the 'ANIM' Chunk controls how
+'VP8X' chunk, `Canvas Width Minus One + 1` pixels wide by `Canvas Height Minus
+One + 1` pixels high. The `Loop Count` field from the 'ANIM' chunk controls how
 many times the animation process is repeated. This is `Loop Count - 1` for
-nonzero `Loop Count` values or infinite if the `Loop Count` is zero.
+non-zero `Loop Count` values or infinitely if `Loop Count` is zero.

-At the beginning of each loop iteration, the canvas is filled using the
-background color from the 'ANIM' Chunk or an application-defined color.
+At the beginning of each loop iteration the canvas is filled using the
+background color from the 'ANIM' chunk or an application defined color.

-'ANMF' Chunks contain individual frames given in display order. Before rendering
+'ANMF' chunks contain individual frames given in display order. Before rendering
 each frame, the previous frame's `Disposal method` is applied.

 The rendering of the decoded frame begins at the Cartesian coordinates (`2 *
-Frame X`, `2 * Frame Y`), using the top-left corner of the canvas as the origin.
+Frame X`, `2 * Frame Y`) using the top-left corner of the canvas as the origin.
 `Frame Width Minus One + 1` pixels wide by `Frame Height Minus One + 1` pixels
 high are rendered onto the canvas using the `Blending method`.

 The canvas is displayed for `Frame Duration` milliseconds. This continues until
-all frames given by 'ANMF' Chunks have been displayed. A new loop iteration is
-then begun, or the canvas is left in its final state if all iterations have been
+all frames given by 'ANMF' chunks have been displayed. A new loop iteration is
+then begun or the canvas is left in its final state if all iterations have been
 completed.

 The following pseudocode illustrates the rendering process. The notation
-_VP8X.field_ means the field in the 'VP8X' Chunk with the same description.
+_VP8X.field_ means the field in the 'VP8X' chunk with the same description.

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-VP8X.flags.hasAnimation MUST be TRUE
+assert VP8X.flags.hasAnimation
 canvas ← new image of size VP8X.canvasWidth x VP8X.canvasHeight with
         background color ANIM.background_color.
 loop_count ← ANIM.loopCount
@@ -787,9 +783,9 @@ dispose_method ← Dispose to background color
 if loop_count == 0:
  loop_count = ∞
 frame_params ← nil
-next chunk in image_data is ANMF MUST be TRUE
+assert next chunk in image_data is ANMF
 for loop = 0..loop_count - 1
-  clear canvas to ANIM.background_color or application-defined color
+  clear canvas to ANIM.background_color or application defined color
  until eof or non-ANMF chunk
    frame_params.frameX = Frame X
    frame_params.frameY = Frame Y
@@ -798,24 +794,22 @@ for loop = 0..loop_count - 1
    frame_params.frameDuration = Frame Duration
    frame_right = frame_params.frameX + frame_params.frameWidth
    frame_bottom = frame_params.frameY + frame_params.frameHeight
-    VP8X.canvasWidth >= frame_right MUST be TRUE
-    VP8X.canvasHeight >= frame_bottom MUST be TRUE
+    assert VP8X.canvasWidth >= frame_right
+    assert VP8X.canvasHeight >= frame_bottom
    for subchunk in 'Frame Data':
      if subchunk.tag == "ALPH":
-        alpha subchunks not found in 'Frame Data' earlier MUST be
-          TRUE
+        assert alpha subchunks not found in 'Frame Data' earlier
        frame_params.alpha = alpha_data
      else if subchunk.tag == "VP8 " OR subchunk.tag == "VP8L":
-        bitstream subchunks not found in 'Frame Data' earlier MUST
-          be TRUE
+        assert bitstream subchunks not found in 'Frame Data' earlier
        frame_params.bitstream = bitstream_data
    render frame with frame_params.alpha and frame_params.bitstream
      on canvas with top-left corner at (frame_params.frameX,
-      frame_params.frameY), using Blending method
+      frame_params.frameY), using blending method
      frame_params.blendingMethod.
    canvas contains the decoded image.
    Show the contents of the canvas for
-    frame_params.frameDuration * 1 ms.
+    frame_params.frameDuration * 1ms.
    dispose_method = frame_params.disposeMethod
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
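
Annotation (not part of the patch): the bounds checks in the pseudocode above can be sketched in C as follows. This is a minimal sketch, assuming the raw 24-bit ANMF fields have already been parsed; the `AnmfFields` and `FrameRect` types and the `FrameRectOnCanvas` name are hypothetical and do not come from libwebp. It applies the `2 * Frame X` / `2 * Frame Y` offsets and the "Minus One + 1" sizes described in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the raw ANMF header fields (each 24 bits wide). */
typedef struct {
  uint32_t frame_x;                /* Frame X: stored as pixel offset / 2 */
  uint32_t frame_y;                /* Frame Y: stored as pixel offset / 2 */
  uint32_t frame_width_minus_one;  /* Frame Width Minus One */
  uint32_t frame_height_minus_one; /* Frame Height Minus One */
} AnmfFields;

/* Frame rectangle in canvas pixels; right and bottom are exclusive. */
typedef struct {
  uint32_t left, top, right, bottom;
} FrameRect;

/* Fills 'rect' and returns 1 if the frame lies entirely on the canvas,
 * 0 otherwise. The canvas size is passed as the raw VP8X fields. */
static int FrameRectOnCanvas(const AnmfFields* f,
                             uint32_t canvas_width_minus_one,
                             uint32_t canvas_height_minus_one,
                             FrameRect* rect) {
  const uint32_t canvas_width = canvas_width_minus_one + 1;
  const uint32_t canvas_height = canvas_height_minus_one + 1;
  /* 24-bit inputs keep every sum below well inside the uint32_t range. */
  assert(f->frame_x < (1u << 24) && f->frame_y < (1u << 24));
  assert(f->frame_width_minus_one < (1u << 24));
  assert(f->frame_height_minus_one < (1u << 24));
  rect->left = 2 * f->frame_x;
  rect->top = 2 * f->frame_y;
  rect->right = rect->left + f->frame_width_minus_one + 1;
  rect->bottom = rect->top + f->frame_height_minus_one + 1;
  return rect->right <= canvas_width && rect->bottom <= canvas_height;
}
```

A reader following the pseudocode would presumably reject the file when this returns 0, since both canvas assertions must hold for every frame.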

@@ -823,7 +817,7 @@ for loop = 0..loop_count - 1
 Example File Layouts
 --------------------

-A lossy-encoded image with alpha may look as follows:
+A lossy encoded image with alpha may look as follows:

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 RIFF/WEBP
@@ -832,16 +826,16 @@ RIFF/WEBP
 +- VP8 (bitstream)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A lossless-encoded image may look as follows:
+A losslessly encoded image may look as follows:

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 RIFF/WEBP
 +- VP8X (descriptions of features used)
-+- VP8L (lossless bitstream)
 +- XYZW (unknown chunk)
+- VP8L (lossless bitstream)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A lossless image with an ICC profile and XMP metadata may
+A lossless image with ICC profile and XMP metadata may
 look as follows:

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -865,11 +859,10 @@ RIFF/WEBP
 +- EXIF (metadata)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+[vp8spec]:  https://datatracker.ietf.org/doc/html/rfc6386
 [webpllspec]: https://chromium.googlesource.com/webm/libwebp/+/HEAD/doc/webp-lossless-bitstream-spec.txt
 [iccspec]: https://www.color.org/icc_specs2.xalter
 [metadata]: https://web.archive.org/web/20180919181934/http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
-[rec601]: https://www.itu.int/rec/R-REC-BT.601
 [rfc 1166]: https://datatracker.ietf.org/doc/html/rfc1166
 [rfc 2119]: https://datatracker.ietf.org/doc/html/rfc2119
-[rfc 6386]: https://datatracker.ietf.org/doc/html/rfc6386
 [rfc 8174]: https://datatracker.ietf.org/doc/html/rfc8174
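
Annotation (not part of the patch): the alpha filtering rules touched by the 'ALPH' hunks above (`clip(v)`, `alpha = (predictor + X) % 256`, and the special cases for the left-most and top-most pixels) can be sketched in C as below. This is an illustrative sketch, not libwebp's implementation. The per-method predictors (left value A, top value B, and `clip(A + B - C)` for the gradient filter) come from a part of the specification that the hunks do not show, so treat them as an assumption here. The function name `UnfilterAlpha` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* clip(v) as defined in the specification text. */
static int Clip(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Reverses alpha filtering in place. On entry 'data' holds the decompressed
 * values X in scan order; on return it holds the alpha values.
 * method: 0 = none, 1 = horizontal, 2 = vertical, 3 = gradient. */
static void UnfilterAlpha(uint8_t* data, int width, int height, int method) {
  int x, y;
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      const size_t i = (size_t)y * (size_t)width + (size_t)x;
      int pred;
      if (method == 0 || (x == 0 && y == 0)) {
        pred = 0;                        /* no filter, or top-left pixel */
      } else if (y == 0) {
        pred = data[i - 1];              /* top row: use the left pixel */
      } else if (x == 0) {
        pred = data[i - (size_t)width];  /* left column: use the pixel above */
      } else {
        const int a = data[i - 1];                  /* left */
        const int b = data[i - (size_t)width];      /* top */
        const int c = data[i - (size_t)width - 1];  /* top-left */
        pred = (method == 1) ? a : (method == 2) ? b : Clip(a + b - c);
      }
      data[i] = (uint8_t)((pred + data[i]) % 256);
    }
  }
}
```

Because each output value only depends on pixels to the left and above, the buffer can be unfiltered in place in scan order, which is the design choice this sketch relies on.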
--- a/doc/webp-lossless-bitstream-spec.txt
+++ b/doc/webp-lossless-bitstream-spec.txt
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -92,7 +92,7 @@ webpmux_LDADD += ../src/mux/libwebpmux.la
 webpmux_LDADD += ../src/libwebp.la

 img2webp_SOURCES = img2webp.c
-img2webp_CPPFLAGS = $(AM_CPPFLAGS) -I$(top_srcdir)
+img2webp_CPPFLAGS = $(AM_CPPFLAGS)
 img2webp_LDADD  =
 img2webp_LDADD += libexample_util.la
 img2webp_LDADD += ../imageio/libimageio_util.la
--- a/examples/anim_dump.c
+++ b/examples/anim_dump.c
@@ -98,11 +98,7 @@ int main(int argc, const char* argv[]) {
      for (i = 0; !error && i < image.num_frames; ++i) {
        W_CHAR out_file[1024];
        WebPDecBuffer buffer;
-        if (!WebPInitDecBuffer(&buffer)) {
-          fprintf(stderr, "Cannot init dec buffer\n");
-          error = 1;
-          continue;
-        }
+        WebPInitDecBuffer(&buffer);
        buffer.colorspace = MODE_RGBA;
        buffer.is_external_memory = 1;
        buffer.width = image.canvas_width;
--- a/examples/cwebp.c
+++ b/examples/cwebp.c
@@ -306,7 +306,6 @@ static int MyWriter(const uint8_t* data, size_t data_size,
 // Dumps a picture as a PGM file using the IMC4 layout.
 static int DumpPicture(const WebPPicture* const picture, const char* PGM_name) {
  int y;
-  int ok = 0;
  const int uv_width = (picture->width + 1) / 2;
  const int uv_height = (picture->height + 1) / 2;
  const int stride = (picture->width + 1) & ~1;
@@ -321,26 +320,23 @@ static int DumpPicture(const WebPPicture* const picture, const char* PGM_name) {
  if (f == NULL) return 0;
  fprintf(f, "P5\n%d %d\n255\n", stride, height);
  for (y = 0; y < picture->height; ++y) {
-    if (fwrite(src_y, picture->width, 1, f) != 1) goto Error;
+    if (fwrite(src_y, picture->width, 1, f) != 1) return 0;
    if (picture->width & 1) fputc(0, f);  // pad
    src_y += picture->y_stride;
  }
  for (y = 0; y < uv_height; ++y) {
-    if (fwrite(src_u, uv_width, 1, f) != 1) goto Error;
-    if (fwrite(src_v, uv_width, 1, f) != 1) goto Error;
+    if (fwrite(src_u, uv_width, 1, f) != 1) return 0;
+    if (fwrite(src_v, uv_width, 1, f) != 1) return 0;
    src_u += picture->uv_stride;
    src_v += picture->uv_stride;
  }
  for (y = 0; y < alpha_height; ++y) {
-    if (fwrite(src_a, picture->width, 1, f) != 1) goto Error;
+    if (fwrite(src_a, picture->width, 1, f) != 1) return 0;
    if (picture->width & 1) fputc(0, f);  // pad
    src_a += picture->a_stride;
  }
-  ok = 1;
-
- Error:
  fclose(f);
-  return ok;
+  return 1;
 }

 // -----------------------------------------------------------------------------
@@ -596,8 +592,9 @@ static void HelpLong(void) {
         "                           green=0xe0 and blue=0xd0\n");
  printf("  -noalpha ............... discard any transparency information\n");
  printf("  -lossless .............. encode image losslessly, default=off\n");
-  printf("  -near_lossless <int> ... use near-lossless image preprocessing\n"
-         "                           (0..100=off), default=100\n");
+  printf("  -near_lossless <int> ... use near-lossless image\n"
+         "                           preprocessing (0..100=off), "
+         "default=100\n");
  printf("  -hint <string> ......... specify image characteristics hint,\n");
  printf("                           one of: photo, picture or graph\n");

--- a/examples/gifdec.c
+++ b/examples/gifdec.c
@@ -317,7 +317,7 @@ void GIFDisplayError(const GifFileType* const gif, int gif_error) {

 #else  // !WEBP_HAVE_GIF

-static void ErrorGIFNotAvailable(void) {
+static void ErrorGIFNotAvailable() {
  fprintf(stderr, "GIF support not compiled. Please install the libgif-dev "
          "package before building.\n");
 }
--- a/examples/img2webp.c
+++ b/examples/img2webp.c
@@ -28,7 +28,6 @@
 #include "../imageio/imageio_util.h"
 #include "./stopwatch.h"
 #include "./unicode.h"
-#include "sharpyuv/sharpyuv.h"
 #include "webp/encode.h"
 #include "webp/mux.h"

@@ -36,22 +35,17 @@

 static void Help(void) {
  printf("Usage:\n\n");
-  printf("  img2webp [file_options] [[frame_options] frame_file]...");
-  printf(" [-o webp_file]\n\n");
+  printf("  img2webp [file_options] [[frame_options] frame_file]...\n");
+  printf("\n");

  printf("File-level options (only used at the start of compression):\n");
  printf(" -min_size ............ minimize size\n");
+  printf(" -loop <int> .......... loop count (default: 0, = infinite loop)\n");
  printf(" -kmax <int> .......... maximum number of frame between key-frames\n"
         "                        (0=only keyframes)\n");
  printf(" -kmin <int> .......... minimum number of frame between key-frames\n"
         "                        (0=disable key-frames altogether)\n");
  printf(" -mixed ............... use mixed lossy/lossless automatic mode\n");
-  printf(" -near_lossless <int> . use near-lossless image preprocessing\n"
-         "                        (0..100=off), default=100\n");
-  printf(" -sharp_yuv ........... use sharper (and slower) RGB->YUV "
-                                  "conversion\n                        "
-                                  "(lossy only)\n");
-  printf(" -loop <int> .......... loop count (default: 0, = infinite loop)\n");
  printf(" -v ................... verbose mode\n");
  printf(" -h ................... this help\n");
  printf(" -version ............. print version number and exit\n");
@@ -190,11 +184,6 @@ int main(int argc, const char* argv[]) {
      } else if (!strcmp(argv[c], "-mixed")) {
        anim_config.allow_mixed = 1;
        config.lossless = 0;
-      } else if (!strcmp(argv[c], "-near_lossless") && c + 1 < argc) {
-        argv[c] = NULL;
-        config.near_lossless = ExUtilGetInt(argv[++c], 0, &parse_error);
-      } else if (!strcmp(argv[c], "-sharp_yuv")) {
-        config.use_sharp_yuv = 1;
      } else if (!strcmp(argv[c], "-v")) {
        verbose = 1;
      } else if (!strcmp(argv[c], "-h") || !strcmp(argv[c], "-help")) {
@@ -203,13 +192,10 @@ int main(int argc, const char* argv[]) {
      } else if (!strcmp(argv[c], "-version")) {
        const int enc_version = WebPGetEncoderVersion();
        const int mux_version = WebPGetMuxVersion();
-        const int sharpyuv_version = SharpYuvGetVersion();
        printf("WebP Encoder version: %d.%d.%d\nWebP Mux version: %d.%d.%d\n",
               (enc_version >> 16) & 0xff, (enc_version >> 8) & 0xff,
               enc_version & 0xff, (mux_version >> 16) & 0xff,
               (mux_version >> 8) & 0xff, mux_version & 0xff);
-        printf("libsharpyuv: %d.%d.%d\n", (sharpyuv_version >> 24) & 0xff,
-               (sharpyuv_version >> 16) & 0xffff, sharpyuv_version & 0xff);
        goto End;
      } else {
        continue;
--- a/examples/vwebp.c
+++ b/examples/vwebp.c
@@ -18,7 +18,6 @@
 #define _POSIX_C_SOURCE 200112L  // for setenv
 #endif

-#include <assert.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -431,13 +430,10 @@ static void HandleDisplay(void) {
 #endif
 }

-static void StartDisplay(const char* filename) {
+static void StartDisplay(void) {
  int width = kParams.canvas_width;
  int height = kParams.canvas_height;
  int screen_width, screen_height;
-  const char viewername[] = " - WebP viewer";
-  // max linux file len + viewername string
-  char title[4096 + sizeof(viewername)] = "";
  // TODO(webp:365) GLUT_DOUBLE results in flickering / old frames to be
  // partially displayed with animated webp + alpha.
 #if defined(__APPLE__) || defined(_WIN32)
@@ -457,9 +453,8 @@ static void StartDisplay(const char* filename) {
      height = screen_height;
    }
  }
-  snprintf(title, sizeof(title), "%s%s", filename, viewername);
  glutInitWindowSize(width, height);
-  glutCreateWindow(title);
+  glutCreateWindow("WebP viewer");
  glutDisplayFunc(HandleDisplay);
  glutReshapeFunc(HandleReshape);
  glutIdleFunc(NULL);
@@ -498,7 +493,7 @@ static void Help(void) {
 }

 int main(int argc, char* argv[]) {
-  int c, file_name_argv_index = 1;
+  int c;
  WebPDecoderConfig* const config = &kParams.config;
  WebPIterator* const curr = &kParams.curr_frame;

@@ -545,10 +540,7 @@ int main(int argc, char* argv[]) {
    } else if (!strcmp(argv[c], "-mt")) {
      config->options.use_threads = 1;
    } else if (!strcmp(argv[c], "--")) {
-      if (c < argc - 1) {
-        kParams.file_name = (const char*)GET_WARGV(argv, ++c);
-        file_name_argv_index = c;
-      }
+      if (c < argc - 1) kParams.file_name = (const char*)GET_WARGV(argv, ++c);
      break;
    } else if (argv[c][0] == '-') {
      printf("Unknown option '%s'\n", argv[c]);
@@ -556,7 +548,6 @@ int main(int argc, char* argv[]) {
      FREE_WARGV_AND_RETURN(-1);
    } else {
      kParams.file_name = (const char*)GET_WARGV(argv, c);
-      file_name_argv_index = c;
    }

    if (parse_error) {
@@ -622,7 +613,7 @@ int main(int argc, char* argv[]) {

  // Position iterator to last frame. Next call to HandleDisplay will wrap over.
  // We take this into account by bumping up loop_count.
-  if (!WebPDemuxGetFrame(kParams.dmux, 0, curr)) goto Error;
+  WebPDemuxGetFrame(kParams.dmux, 0, curr);
  if (kParams.loop_count) ++kParams.loop_count;

 #if defined(__unix__) || defined(__CYGWIN__)
@@ -636,7 +627,7 @@ int main(int argc, char* argv[]) {
 #ifdef FREEGLUT
  glutSetOption(GLUT_ACTION_ON_WINDOW_CLOSE, GLUT_ACTION_CONTINUE_EXECUTION);
 #endif
-  StartDisplay(argv[file_name_argv_index]);
+  StartDisplay();

  if (kParams.has_animation) glutTimerFunc(0, decode_callback, 0);
  glutMainLoop();
--- a/examples/webpinfo.c
+++ b/examples/webpinfo.c
@@ -357,12 +357,12 @@ static WebPInfoStatus ParseLossyHeader(const ChunkData* const chunk_data,
  }
  data += 3;
  data_size -= 3;
-  printf(
-      "  Key frame:        %s\n"
-      "  Profile:          %d\n"
-      "  Display:          Yes\n"
-      "  Part. 0 length:   %d\n",
-      key_frame ? "Yes" : "No", profile, partition0_length);
+  printf("  Key frame:        %s\n"
+         "  Profile:          %d\n"
+         "  Display:          %s\n"
+         "  Part. 0 length:   %d\n",
+         key_frame ? "Yes" : "No", profile,
+         display ? "Yes" : "No", partition0_length);
  if (key_frame) {
    if (!(data[0] == 0x9d && data[1] == 0x01 && data[2] == 0x2a)) {
      LOG_ERROR("Invalid lossy bitstream signature.");
--- a/examples/webpmux.c
+++ b/examples/webpmux.c
@@ -150,20 +150,16 @@ static const char* ErrorString(WebPMuxError err) {
 }

 #define RETURN_IF_ERROR(ERR_MSG)                                     \
-  do {                                                               \
-    if (err != WEBP_MUX_OK) {                                        \
-      fprintf(stderr, ERR_MSG);                                      \
-      return err;                                                    \
-    }                                                                \
-  } while (0)
+  if (err != WEBP_MUX_OK) {                                          \
+    fprintf(stderr, ERR_MSG);                                        \
+    return err;                                                      \
+  }

 #define RETURN_IF_ERROR3(ERR_MSG, FORMAT_STR1, FORMAT_STR2)          \
-  do {                                                               \
-    if (err != WEBP_MUX_OK) {                                        \
-      fprintf(stderr, ERR_MSG, FORMAT_STR1, FORMAT_STR2);            \
-      return err;                                                    \
-    }                                                                \
-  } while (0)
+  if (err != WEBP_MUX_OK) {                                          \
+    fprintf(stderr, ERR_MSG, FORMAT_STR1, FORMAT_STR2);              \
+    return err;                                                      \
+  }

 #define ERROR_GOTO1(ERR_MSG, LABEL)                                  \
  do {                                                               \
@@ -609,26 +605,20 @@ static int ValidateCommandLine(const CommandLineArguments* const cmd_args,
 #define FEATURETYPE_IS_NIL (config->type_ == NIL_FEATURE)

 #define CHECK_NUM_ARGS_AT_LEAST(NUM, LABEL)                              \
-  do {                                                                   \
-    if (argc < i + (NUM)) {                                              \
-      fprintf(stderr, "ERROR: Too few arguments for '%s'.\n", argv[i]);  \
-      goto LABEL;                                                        \
-    }                                                                    \
-  } while (0)
+  if (argc < i + (NUM)) {                                                \
+    fprintf(stderr, "ERROR: Too few arguments for '%s'.\n", argv[i]);    \
+    goto LABEL;                                                          \
+  }

 #define CHECK_NUM_ARGS_AT_MOST(NUM, LABEL)                               \
-  do {                                                                   \
-    if (argc > i + (NUM)) {                                              \
-      fprintf(stderr, "ERROR: Too many arguments for '%s'.\n", argv[i]); \
-      goto LABEL;                                                        \
-    }                                                                    \
-  } while (0)
+  if (argc > i + (NUM)) {                                                \
+    fprintf(stderr, "ERROR: Too many arguments for '%s'.\n", argv[i]);   \
+    goto LABEL;                                                          \
+  }

 #define CHECK_NUM_ARGS_EXACTLY(NUM, LABEL)                               \
-  do {                                                                   \
-    CHECK_NUM_ARGS_AT_LEAST(NUM, LABEL);                                 \
-    CHECK_NUM_ARGS_AT_MOST(NUM, LABEL);                                  \
-  } while (0)
+  CHECK_NUM_ARGS_AT_LEAST(NUM, LABEL);                                   \
+  CHECK_NUM_ARGS_AT_MOST(NUM, LABEL);

 // Parses command-line arguments to fill up config object. Also performs some
 // semantic checks. unicode_argv contains wchar_t arguments or is null.
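
Review note on the webpmux.c hunks above: after this change `CHECK_NUM_ARGS_EXACTLY` expands to two separate statements instead of one `do { ... } while (0)` statement. A minimal sketch of why that matters, using hypothetical macros (`BUMP_BOTH_BARE` and `BUMP_BOTH_WRAPPED` are illustrative names, not from webpmux.c): under an unbraced `if`, only the first statement of a bare multi-statement macro is guarded.

```c
/* Hypothetical two-statement macros, named for this sketch only. */
#define BUMP_BOTH_BARE(A, B) (A)++; (B)++
#define BUMP_BOTH_WRAPPED(A, B) \
  do {                          \
    (A)++;                      \
    (B)++;                      \
  } while (0)

/* Returns b after the bare macro is used under an unbraced `if`.
 * The call expands to: if (guard) (a)++; (b)++;
 * so (b)++ runs regardless of guard. */
static int BareUnderIf(int guard) {
  int a = 0, b = 0;
  if (guard) BUMP_BOTH_BARE(a, b);
  (void)a;
  return b;
}

/* Same call shape with the wrapped macro: both increments are guarded. */
static int WrappedUnderIf(int guard) {
  int a = 0, b = 0;
  if (guard) BUMP_BOTH_WRAPPED(a, b);
  (void)a;
  return b;
}
```

With `guard == 0`, `BareUnderIf` still increments `b` while `WrappedUnderIf` leaves it at zero. Call sites that always brace their `if` bodies are not affected, which may be why the bare form works in practice here.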
--- a/extras/Makefile.am
+++ b/extras/Makefile.am
@@ -7,7 +7,6 @@ noinst_HEADERS += ../src/webp/types.h

 libwebpextras_la_SOURCES =
 libwebpextras_la_SOURCES += extras.c extras.h quality_estimate.c
-libwebpextras_la_SOURCES += sharpyuv_risk_table.c sharpyuv_risk_table.h

 libwebpextras_la_CPPFLAGS = $(AM_CPPFLAGS)
 libwebpextras_la_LDFLAGS = -lm
--- a/extras/extras.c
+++ b/extras/extras.c
@@ -11,20 +11,14 @@
 //

 #include "extras/extras.h"
+#include "webp/format_constants.h"
+#include "src/dsp/dsp.h"

 #include <assert.h>
-#include <limits.h>
 #include <string.h>

-#include "extras/sharpyuv_risk_table.h"
-#include "sharpyuv/sharpyuv.h"
-#include "src/dsp/dsp.h"
-#include "src/utils/utils.h"
-#include "webp/format_constants.h"
-#include "webp/types.h"
-
 #define XTRA_MAJ_VERSION 1
-#define XTRA_MIN_VERSION 4
+#define XTRA_MIN_VERSION 3
 #define XTRA_REV_VERSION 0

 //------------------------------------------------------------------------------
@@ -166,159 +160,3 @@ int WebPUnmultiplyARGB(WebPPicture* pic) {
 }

 //------------------------------------------------------------------------------
-// 420 risk metric
-
-#define YUV_FIX 16  // fixed-point precision for RGB->YUV
-static const int kYuvHalf = 1 << (YUV_FIX - 1);
-
-// Maps a value in [0, (256 << YUV_FIX) - 1] to [0,
-// precomputed_scores_table_sampling - 1]. It is important that the extremal
-// values are preserved and 1:1 mapped:
-//  ConvertValue(0) = 0
-//  ConvertValue((256 << 16) - 1) = rgb_sampling_size - 1
-static int SharpYuvConvertValueToSampledIdx(int v, int rgb_sampling_size) {
-  v = (v + kYuvHalf) >> YUV_FIX;
-  v = (v < 0) ? 0 : (v > 255) ? 255 : v;
-  return (v * (rgb_sampling_size - 1)) / 255;
-}
-
-#undef YUV_FIX
-
-// For each pixel, computes the index to look up that color in a precomputed
-// risk score table where the YUV space is subsampled to a size of
-// precomputed_scores_table_sampling^3 (see sharpyuv_risk_table.h)
-static int SharpYuvConvertToYuvSharpnessIndex(
-    int r, int g, int b, const SharpYuvConversionMatrix* matrix,
-    int precomputed_scores_table_sampling) {
-  const int y = SharpYuvConvertValueToSampledIdx(
-      matrix->rgb_to_y[0] * r + matrix->rgb_to_y[1] * g +
-          matrix->rgb_to_y[2] * b + matrix->rgb_to_y[3],
-      precomputed_scores_table_sampling);
-  const int u = SharpYuvConvertValueToSampledIdx(
-      matrix->rgb_to_u[0] * r + matrix->rgb_to_u[1] * g +
-          matrix->rgb_to_u[2] * b + matrix->rgb_to_u[3],
-      precomputed_scores_table_sampling);
-  const int v = SharpYuvConvertValueToSampledIdx(
-      matrix->rgb_to_v[0] * r + matrix->rgb_to_v[1] * g +
-          matrix->rgb_to_v[2] * b + matrix->rgb_to_v[3],
-      precomputed_scores_table_sampling);
-  return y + u * precomputed_scores_table_sampling +
-         v * precomputed_scores_table_sampling *
-             precomputed_scores_table_sampling;
-}
-
-static void SharpYuvRowToYuvSharpnessIndex(
-    const uint8_t* r_ptr, const uint8_t* g_ptr, const uint8_t* b_ptr,
-    int rgb_step, int rgb_bit_depth, int width, uint16_t* dst,
-    const SharpYuvConversionMatrix* matrix,
-    int precomputed_scores_table_sampling) {
-  int i;
-  assert(rgb_bit_depth == 8);
-  (void)rgb_bit_depth;  // Unused for now.
-  for (i = 0; i < width;
-       ++i, r_ptr += rgb_step, g_ptr += rgb_step, b_ptr += rgb_step) {
-    dst[i] =
-        SharpYuvConvertToYuvSharpnessIndex(r_ptr[0], g_ptr[0], b_ptr[0], matrix,
-                                           precomputed_scores_table_sampling);
-  }
-}
-
-#define SAFE_ALLOC(W, H, T) ((T*)WebPSafeMalloc((uint64_t)(W) * (H), sizeof(T)))
-
-static int DoEstimateRisk(const uint8_t* r_ptr, const uint8_t* g_ptr,
-                          const uint8_t* b_ptr, int rgb_step, int rgb_stride,
-                          int rgb_bit_depth, int width, int height,
-                          const SharpYuvOptions* options,
-                          const uint8_t precomputed_scores_table[],
-                          int precomputed_scores_table_sampling,
-                          float* score_out) {
-  const int sampling3 = precomputed_scores_table_sampling *
-                        precomputed_scores_table_sampling *
-                        precomputed_scores_table_sampling;
-  const int kNoiseLevel = 4;
-  double total_score = 0;
-  double count = 0;
-  // Rows of indices in
-  uint16_t* row1 = SAFE_ALLOC(width, 1, uint16_t);
-  uint16_t* row2 = SAFE_ALLOC(width, 1, uint16_t);
-  uint16_t* tmp;
-  int i, j;
-
-  if (row1 == NULL || row2 == NULL) {
-    WebPFree(row1);
-    WebPFree(row2);
-    return 0;
-  }
-
-  // Convert the first row ahead.
-  SharpYuvRowToYuvSharpnessIndex(r_ptr, g_ptr, b_ptr, rgb_step, rgb_bit_depth,
-                                 width, row2, options->yuv_matrix,
-                                 precomputed_scores_table_sampling);
-
-  for (j = 1; j < height; ++j) {
-    r_ptr += rgb_stride;
-    g_ptr += rgb_stride;
-    b_ptr += rgb_stride;
-    // Swap row 1 and row 2.
-    tmp = row1;
-    row1 = row2;
-    row2 = tmp;
-    // Convert the row below.
-    SharpYuvRowToYuvSharpnessIndex(r_ptr, g_ptr, b_ptr, rgb_step, rgb_bit_depth,
-                                   width, row2, options->yuv_matrix,
-                                   precomputed_scores_table_sampling);
-    for (i = 0; i < width - 1; ++i) {
-      const int idx0 = row1[i + 0];
-      const int idx1 = row1[i + 1];
-      const int idx2 = row2[i + 0];
-      const int score = precomputed_scores_table[idx0 + sampling3 * idx1] +
-                        precomputed_scores_table[idx0 + sampling3 * idx2] +
-                        precomputed_scores_table[idx1 + sampling3 * idx2];
-      if (score > kNoiseLevel) {
-        total_score += score;
-        count += 1.0;
-      }
-    }
-  }
-  if (count > 0.) total_score /= count;
-
-  // If less than 1% of pixels were evaluated -> below noise level.
-  if (100. * count / (width * height) < 1.) total_score = 0.;
-
-  // Rescale to [0:100]
-  total_score = (total_score > 25.) ? 100. : total_score * 100. / 25.;
-
-  WebPFree(row1);
-  WebPFree(row2);
-
-  *score_out = (float)total_score;
-  return 1;
-}
-
-#undef SAFE_ALLOC
-
-int SharpYuvEstimate420Risk(const void* r_ptr, const void* g_ptr,
-                            const void* b_ptr, int rgb_step, int rgb_stride,
-                            int rgb_bit_depth, int width, int height,
-                            const SharpYuvOptions* options, float* score) {
-  if (width < 1 || height < 1 || width == INT_MAX || height == INT_MAX ||
-      r_ptr == NULL || g_ptr == NULL || b_ptr == NULL || options == NULL ||
-      score == NULL) {
-    return 0;
-  }
-  if (rgb_bit_depth != 8) {
-    return 0;
-  }
-
-  if (width <= 4 || height <= 4) {
-    *score = 0.0f;  // too small, no real risk.
-    return 1;
-  }
-
-  return DoEstimateRisk(
-      (const uint8_t*)r_ptr, (const uint8_t*)g_ptr, (const uint8_t*)b_ptr,
-      rgb_step, rgb_stride, rgb_bit_depth, width, height, options,
-      kSharpYuvPrecomputedRisk, kSharpYuvPrecomputedRiskYuvSampling, score);
-}
-
-//------------------------------------------------------------------------------
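
Review note on the removed 420 risk code above: the comment on `SharpYuvConvertValueToSampledIdx` claims the extremal values map 1:1. The sketch below restates the two index helpers under shortened, hypothetical names (`ToSampledIdx`, `CombineIdx`) so that the claim can be checked in isolation. The sampling size of 7 used in the usage note is an assumption, inferred only from the "114 KiB" figure in sharpyuv_risk_table.h; the actual constant is not shown in this diff.

```c
#define YUV_FIX 16 /* fixed-point precision, as in the removed code */

/* Rounds a fixed-point value to 8 bits, clamps to [0, 255], then rescales
 * to [0, sampling - 1]. Mirrors the removed helper for non-negative input. */
static int ToSampledIdx(int v, int sampling) {
  v = (v + (1 << (YUV_FIX - 1))) >> YUV_FIX;
  v = (v < 0) ? 0 : (v > 255) ? 255 : v;
  return (v * (sampling - 1)) / 255;
}

/* Flattens sampled (y, u, v) coordinates into one table index. */
static int CombineIdx(int y, int u, int v, int sampling) {
  return y + u * sampling + v * sampling * sampling;
}
```

With a sampling size of 7, `ToSampledIdx(0, 7)` gives 0 and `ToSampledIdx((256 << 16) - 1, 7)` gives 6, so the largest combined index is `7 * 7 * 7 - 1`.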
--- a/extras/extras.h
+++ b/extras/extras.h
@@ -17,10 +17,9 @@
 extern "C" {
 #endif

-#include "sharpyuv/sharpyuv.h"
 #include "webp/encode.h"

-#define WEBP_EXTRAS_ABI_VERSION 0x0003    // MAJOR(8b) + MINOR(8b)
+#define WEBP_EXTRAS_ABI_VERSION 0x0002    // MAJOR(8b) + MINOR(8b)

 //------------------------------------------------------------------------------

@@ -71,38 +70,6 @@ WEBP_EXTERN int VP8EstimateQuality(const uint8_t* const data, size_t size);

 //------------------------------------------------------------------------------

-// Computes a score between 0 and 100 which represents the risk of having visual
-// quality loss from converting an RGB image to YUV420.
-// A low score, typically < 40, means there is a low risk of artifacts from
-// chroma subsampling and a simple averaging algorithm can be used instead of
-// the more expensive SharpYuvConvert function.
-// A medium score, typically >= 40 and < 70, means that simple chroma
-// subsampling will produce artifacts and it may be advisable to use the more
-// costly SharpYuvConvert for YUV420 conversion.
-// A high score, typically >= 70, means there is a very high risk of artifacts
-// from chroma subsampling even with SharpYuvConvert, and best results might be
-// achieved by using YUV444.
-// If not using SharpYuvConvert, a threshold of about 50 can be used to decide
-// between (simple averaging) 420 and 444.
-// r_ptr, g_ptr, b_ptr: pointers to the source r, g and b channels. Should point
-//     to uint8_t buffers if rgb_bit_depth is 8, or uint16_t buffers otherwise.
-// rgb_step: distance in bytes between two horizontally adjacent pixels on the
-//     r, g and b channels. If rgb_bit_depth is > 8, it should be a
-//     multiple of 2.
-// rgb_stride: distance in bytes between two vertically adjacent pixels on the
-//     r, g, and b channels. If rgb_bit_depth is > 8, it should be a
-//     multiple of 2.
-// rgb_bit_depth: number of bits for each r/g/b value. Only a value of 8 is
-//     currently supported.
-// width, height: width and height of the image in pixels
-// Returns 0 on failure.
-WEBP_EXTERN int SharpYuvEstimate420Risk(
-    const void* r_ptr, const void* g_ptr, const void* b_ptr, int rgb_step,
-    int rgb_stride, int rgb_bit_depth, int width, int height,
-    const SharpYuvOptions* options, float* score);
-
-//------------------------------------------------------------------------------
-
 #ifdef __cplusplus
 }    // extern "C"
 #endif
--- a/extras/sharpyuv_risk_table.c
+++ b/extras/sharpyuv_risk_table.c
--- a/extras/sharpyuv_risk_table.h
+++ b/extras/sharpyuv_risk_table.h
@@ -1,27 +0,0 @@
-// Copyright 2023 Google Inc. All Rights Reserved.
-//
-// Use of this source code is governed by a BSD-style license
-// that can be found in the COPYING file in the root of the source
-// tree. An additional intellectual property rights grant can be found
-// in the file PATENTS. All contributing project authors may
-// be found in the AUTHORS file in the root of the source tree.
-// -----------------------------------------------------------------------------
-//
-// Precomputed data for 420 risk estimation.
-
-#ifndef WEBP_EXTRAS_SHARPYUV_RISK_TABLE_H_
-#define WEBP_EXTRAS_SHARPYUV_RISK_TABLE_H_
-
-#include "src/webp/types.h"
-
-extern const int kSharpYuvPrecomputedRiskYuvSampling;
-// Table of precomputed risk scores when chroma subsampling images with two
-// given colors.
-// Since precomputing values for all possible YUV colors would create a huge
-// table, the YUV space (i.e. [0, 255]^3) is reduced to
-// [0, kSharpYuvPrecomputedRiskYuvSampling-1]^3
-// where 255 maps to kSharpYuvPrecomputedRiskYuvSampling-1.
-// Table size: kSharpYuvPrecomputedRiskYuvSampling^6 bytes or 114 KiB
-extern const uint8_t kSharpYuvPrecomputedRisk[];
-
-#endif  // WEBP_EXTRAS_SHARPYUV_RISK_TABLE_H_
--- a/extras/vwebp_sdl.c
+++ b/extras/vwebp_sdl.c
@@ -30,7 +30,7 @@
 #if defined(WEBP_HAVE_JUST_SDL_H)
 #include <SDL.h>
 #else
-#include <SDL2/SDL.h>
+#include <SDL/SDL.h>
 #endif

 static void ProcessEvents(void) {
--- a/extras/webp_to_sdl.c
+++ b/extras/webp_to_sdl.c
@@ -20,75 +20,88 @@
 #include "webp_to_sdl.h"

 #include <stdio.h>
-
 #include "src/webp/decode.h"

 #if defined(WEBP_HAVE_JUST_SDL_H)
 #include <SDL.h>
 #else
-#include <SDL2/SDL.h>
+#include <SDL/SDL.h>
 #endif

 static int init_ok = 0;
 int WebPToSDL(const char* data, unsigned int data_size) {
  int ok = 0;
  VP8StatusCode status;
-  WebPBitstreamFeatures input;
-  uint8_t* output = NULL;
-  SDL_Window* window = NULL;
-  SDL_Renderer* renderer = NULL;
-  SDL_Texture* texture = NULL;
-  int width, height;
+  WebPDecoderConfig config;
+  WebPBitstreamFeatures* const input = &config.input;
+  WebPDecBuffer* const output = &config.output;
+  SDL_Surface* screen = NULL;
+  SDL_Surface* surface = NULL;
+
+  if (!WebPInitDecoderConfig(&config)) {
+    fprintf(stderr, "Library version mismatch!\n");
+    return 0;
+  }

  if (!init_ok) {
    SDL_Init(SDL_INIT_VIDEO);
    init_ok = 1;
  }

-  status = WebPGetFeatures((uint8_t*)data, (size_t)data_size, &input);
+  status = WebPGetFeatures((uint8_t*)data, (size_t)data_size, &config.input);
  if (status != VP8_STATUS_OK) goto Error;
-  width = input.width;
-  height = input.height;

-  SDL_CreateWindowAndRenderer(width, height, 0, &window, &renderer);
-  if (window == NULL || renderer == NULL) {
-    fprintf(stderr, "Unable to create window or renderer!\n");
+  screen = SDL_SetVideoMode(input->width, input->height, 32, SDL_SWSURFACE);
+  if (screen == NULL) {
+    fprintf(stderr, "Unable to set video mode (32bpp %dx%d)!\n",
+            input->width, input->height);
    goto Error;
  }
-  SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY,
-              "linear");  // make the scaled rendering look smoother.
-  SDL_RenderSetLogicalSize(renderer, width, height);

-  texture = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ABGR8888,
-                              SDL_TEXTUREACCESS_STREAMING, width, height);
-  if (texture == NULL) {
-    fprintf(stderr, "Unable to create %dx%d RGBA texture!\n", width, height);
+  surface = SDL_CreateRGBSurface(SDL_SWSURFACE,
+                                 input->width, input->height, 32,
+                                 0x000000ffu,   // R mask
+                                 0x0000ff00u,   // G mask
+                                 0x00ff0000u,   // B mask
+                                 0xff000000u);  // A mask
+
+  if (surface == NULL) {
+    fprintf(stderr, "Unable to create %dx%d RGBA surface!\n",
+            input->width, input->height);
    goto Error;
  }
+  if (SDL_MUSTLOCK(surface)) SDL_LockSurface(surface);

 #if SDL_BYTEORDER == SDL_BIG_ENDIAN
-  output = WebPDecodeBGRA((const uint8_t*)data, (size_t)data_size, &width,
-                          &height);
+  output->colorspace = MODE_BGRA;
 #else
-  output = WebPDecodeRGBA((const uint8_t*)data, (size_t)data_size, &width,
-                          &height);
+  output->colorspace = MODE_RGBA;
 #endif
-  if (output == NULL) {
+  output->width  = surface->w;
+  output->height = surface->h;
+  output->u.RGBA.rgba   = surface->pixels;
+  output->u.RGBA.stride = surface->pitch;
+  output->u.RGBA.size   = surface->pitch * surface->h;
+  output->is_external_memory = 1;
+
+  status = WebPDecode((const uint8_t*)data, (size_t)data_size, &config);
+  if (status != VP8_STATUS_OK) {
    fprintf(stderr, "Error decoding image (%d)\n", status);
    goto Error;
  }

-  SDL_UpdateTexture(texture, NULL, output, width * sizeof(uint32_t));
-  SDL_RenderClear(renderer);
-  SDL_RenderCopy(renderer, texture, NULL, NULL);
-  SDL_RenderPresent(renderer);
+  if (SDL_MUSTLOCK(surface)) SDL_UnlockSurface(surface);
+  if (SDL_BlitSurface(surface, NULL, screen, NULL) ||
+      SDL_Flip(screen)) {
+    goto Error;
+  }
+
  ok = 1;

 Error:
-  // We should call SDL_DestroyWindow(window) but that makes .js fail.
-  SDL_DestroyRenderer(renderer);
-  SDL_DestroyTexture(texture);
-  WebPFree(output);
+  SDL_FreeSurface(surface);
+  SDL_FreeSurface(screen);
+  WebPFreeDecBuffer(output);
  return ok;
 }

--- a/imageio/image_enc.c
+++ b/imageio/image_enc.c
@@ -260,20 +260,14 @@ int WebPWritePAM(FILE* fout, const WebPDecBuffer* const buffer) {

 // Save 16b mode (RGBA4444, RGB565, ...) for debugging purpose.
 int WebPWrite16bAsPGM(FILE* fout, const WebPDecBuffer* const buffer) {
-  uint32_t width, height;
-  uint8_t* rgba;
-  int stride;
+  const uint32_t width = buffer->width;
+  const uint32_t height = buffer->height;
+  const uint8_t* rgba = buffer->u.RGBA.rgba;
+  const int stride = buffer->u.RGBA.stride;
  const uint32_t bytes_per_px = 2;
  uint32_t y;

-  if (fout == NULL || buffer == NULL) return 0;
-
-  width = buffer->width;
-  height = buffer->height;
-  rgba = buffer->u.RGBA.rgba;
-  stride = buffer->u.RGBA.stride;
-
-  if (rgba == NULL) return 0;
+  if (fout == NULL || buffer == NULL || rgba == NULL) return 0;

  fprintf(fout, "P5\n%u %u\n255\n", width * bytes_per_px, height);
  for (y = 0; y < height; ++y) {
@@ -301,29 +295,22 @@ static void PutLE32(uint8_t* const dst, uint32_t value) {
 #define BMP_HEADER_SIZE 54
 #define BMP_HEADER_ALPHA_EXTRA_SIZE 16  // for alpha info
 int WebPWriteBMP(FILE* fout, const WebPDecBuffer* const buffer) {
-  int has_alpha, header_size;
-  uint32_t width, height;
-  uint8_t* rgba;
-  int stride;
+  const int has_alpha = WebPIsAlphaMode(buffer->colorspace);
+  const int header_size =
+      BMP_HEADER_SIZE + (has_alpha ? BMP_HEADER_ALPHA_EXTRA_SIZE : 0);
+  const uint32_t width = buffer->width;
+  const uint32_t height = buffer->height;
+  const uint8_t* rgba = buffer->u.RGBA.rgba;
+  const int stride = buffer->u.RGBA.stride;
+  const uint32_t bytes_per_px = has_alpha ? 4 : 3;
  uint32_t y;
-  uint32_t bytes_per_px, line_size, image_size, bmp_stride, total_size;
+  const uint32_t line_size = bytes_per_px * width;
+  const uint32_t bmp_stride = (line_size + 3) & ~3;   // pad to 4
+  const uint32_t image_size = bmp_stride * height;
+  const uint32_t total_size =  image_size + header_size;
  uint8_t bmp_header[BMP_HEADER_SIZE + BMP_HEADER_ALPHA_EXTRA_SIZE] = { 0 };

-  if (fout == NULL || buffer == NULL) return 0;
-
-  has_alpha = WebPIsAlphaMode(buffer->colorspace);
-  header_size = BMP_HEADER_SIZE + (has_alpha ? BMP_HEADER_ALPHA_EXTRA_SIZE : 0);
-  width = buffer->width;
-  height = buffer->height;
-  rgba = buffer->u.RGBA.rgba;
-  stride = buffer->u.RGBA.stride;
-  bytes_per_px = has_alpha ? 4 : 3;
-  line_size = bytes_per_px * width;
-  bmp_stride = (line_size + 3) & ~3;  // pad to 4
-  image_size = bmp_stride * height;
-  total_size = image_size + header_size;
-
-  if (rgba == NULL) return 0;
+  if (fout == NULL || buffer == NULL || rgba == NULL) return 0;

  // bitmap file header
  PutLE16(bmp_header + 0, 0x4d42);                // signature 'BM'
@@ -385,14 +372,17 @@ int WebPWriteBMP(FILE* fout, const WebPDecBuffer* const buffer) {
 #define TIFF_HEADER_SIZE (EXTRA_DATA_OFFSET + EXTRA_DATA_SIZE)

 int WebPWriteTIFF(FILE* fout, const WebPDecBuffer* const buffer) {
-  int has_alpha;
-  uint32_t width, height;
-  uint8_t* rgba;
-  int stride;
-  uint8_t bytes_per_px = 0;
-  const uint8_t assoc_alpha = 0;
+  const int has_alpha = WebPIsAlphaMode(buffer->colorspace);
+  const uint32_t width = buffer->width;
+  const uint32_t height = buffer->height;
+  const uint8_t* rgba = buffer->u.RGBA.rgba;
+  const int stride = buffer->u.RGBA.stride;
+  const uint8_t bytes_per_px = has_alpha ? 4 : 3;
+  const uint8_t assoc_alpha =
+      WebPIsPremultipliedMode(buffer->colorspace) ? 1 : 2;
  // For non-alpha case, we omit tag 0x152 (ExtraSamples).
-  const uint8_t num_ifd_entries = 0;
+  const uint8_t num_ifd_entries = has_alpha ? NUM_IFD_ENTRIES
+                                            : NUM_IFD_ENTRIES - 1;
  uint8_t tiff_header[TIFF_HEADER_SIZE] = {
    0x49, 0x49, 0x2a, 0x00,   // little endian signature
    8, 0, 0, 0,               // offset to the unique IFD that follows
@@ -426,20 +416,7 @@ int WebPWriteTIFF(FILE* fout, const WebPDecBuffer* const buffer) {
  };
  uint32_t y;

-  if (fout == NULL || buffer == NULL) return 0;
-
-  has_alpha = WebPIsAlphaMode(buffer->colorspace);
-  width = buffer->width;
-  height = buffer->height;
-  rgba = buffer->u.RGBA.rgba;
-  stride = buffer->u.RGBA.stride;
-
-  if (rgba == NULL) return 0;
-
-  // Update bytes_per_px, num_ifd_entries and assoc_alpha.
-  tiff_header[38] = tiff_header[102] = bytes_per_px = has_alpha ? 4 : 3;
-  tiff_header[8] = has_alpha ? NUM_IFD_ENTRIES : NUM_IFD_ENTRIES - 1;
-  tiff_header[186] = WebPIsPremultipliedMode(buffer->colorspace) ? 1 : 2;
+  if (fout == NULL || buffer == NULL || rgba == NULL) return 0;

  // Fill placeholders in IFD:
  PutLE32(tiff_header + 10 + 8, width);
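
Review note on the `WebPWriteBMP` hunk in imageio/image_enc.c above: both sides of the diff compute the same row padding and file size, only the point of initialization moves. A minimal sketch of that arithmetic, with hypothetical helper names (`BmpRowStride`, `BmpTotalSize`); the constants 54 and 16 are `BMP_HEADER_SIZE` and `BMP_HEADER_ALPHA_EXTRA_SIZE` from the hunk context. Overflow for very large dimensions is ignored here.

```c
#include <stdint.h>

/* Bytes per BMP row: pixel bytes rounded up to the next multiple of 4. */
static uint32_t BmpRowStride(uint32_t width, int has_alpha) {
  const uint32_t bytes_per_px = has_alpha ? 4u : 3u;
  const uint32_t line_size = bytes_per_px * width;
  return (line_size + 3u) & ~3u; /* pad to 4 */
}

/* Total file size: padded pixel rows plus the header, which is 16 bytes
 * longer when alpha information is written. */
static uint32_t BmpTotalSize(uint32_t width, uint32_t height, int has_alpha) {
  const uint32_t header_size = 54u + (has_alpha ? 16u : 0u);
  return BmpRowStride(width, has_alpha) * height + header_size;
}
```

For example, a 5-pixel-wide RGB row holds 15 bytes of pixel data and is padded to 16. Note that on the `+` side of these hunks the initializers read `buffer->width` and friends before the `buffer == NULL` test runs, so that test no longer protects against a null `buffer`.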
--- a/imageio/pngdec.c
+++ b/imageio/pngdec.c
@@ -235,7 +235,7 @@ int ReadPNG(const uint8_t* const data, size_t data_size,
  volatile png_infop end_info = NULL;
  PNGReadContext context = { NULL, 0, 0 };
  int color_type, bit_depth, interlaced;
-  int num_channels;
+  int has_alpha;
  int num_passes;
  int p;
  volatile int ok = 0;
@@ -293,6 +293,9 @@ int ReadPNG(const uint8_t* const data, size_t data_size,
  }
  if (png_get_valid(png, info, PNG_INFO_tRNS)) {
    png_set_tRNS_to_alpha(png);
+    has_alpha = 1;
+  } else {
+    has_alpha = !!(color_type & PNG_COLOR_MASK_ALPHA);
  }

  // Apply gamma correction if needed.
@@ -307,16 +310,13 @@ int ReadPNG(const uint8_t* const data, size_t data_size,

  if (!keep_alpha) {
    png_set_strip_alpha(png);
+    has_alpha = 0;
  }

  num_passes = png_set_interlace_handling(png);
  png_read_update_info(png, info);

-  num_channels = png_get_channels(png, info);
-  if (num_channels != 3 && num_channels != 4) {
-    goto Error;
-  }
-  stride = (int64_t)num_channels * width * sizeof(*rgb);
+  stride = (int64_t)(has_alpha ? 4 : 3) * width * sizeof(*rgb);
  if (stride != (int)stride ||
      !ImgIoUtilCheckSizeArgumentsOverflow(stride, height)) {
    goto Error;
@@ -341,8 +341,8 @@ int ReadPNG(const uint8_t* const data, size_t data_size,

  pic->width = (int)width;
  pic->height = (int)height;
-  ok = (num_channels == 4) ? WebPPictureImportRGBA(pic, rgb, (int)stride)
-                           : WebPPictureImportRGB(pic, rgb, (int)stride);
+  ok = has_alpha ? WebPPictureImportRGBA(pic, rgb, (int)stride)
+                 : WebPPictureImportRGB(pic, rgb, (int)stride);

  if (!ok) {
    goto Error;
--- a/iosbuild.sh
+++ b/iosbuild.sh
@@ -41,7 +41,6 @@ readonly TARGETDIR="${TOPDIR}/WebP.framework"
 readonly DECTARGETDIR="${TOPDIR}/WebPDecoder.framework"
 readonly MUXTARGETDIR="${TOPDIR}/WebPMux.framework"
 readonly DEMUXTARGETDIR="${TOPDIR}/WebPDemux.framework"
-readonly SHARPYUVTARGETDIR="${TOPDIR}/SharpYuv.framework"
 readonly DEVELOPER=$(xcode-select --print-path)
 readonly PLATFORMSROOT="${DEVELOPER}/Platforms"
 readonly LIPO=$(xcrun -sdk iphoneos${SDK} -find lipo)
@@ -64,8 +63,7 @@ echo "Xcode Version: ${XCODE}"
 echo "iOS SDK Version: ${SDK}"

 if [[ -e "${BUILDDIR}" || -e "${TARGETDIR}" || -e "${DECTARGETDIR}" \
-      || -e "${MUXTARGETDIR}" || -e "${DEMUXTARGETDIR}" \
-      || -e "${SHARPYUVTARGETDIR}" ]]; then
+      || -e "${MUXTARGETDIR}" || -e "${DEMUXTARGETDIR}" ]]; then
  cat << EOF
 WARNING: The following directories will be deleted:
 WARNING:   ${BUILDDIR}
@@ -73,16 +71,14 @@ WARNING:   ${TARGETDIR}
 WARNING:   ${DECTARGETDIR}
 WARNING:   ${MUXTARGETDIR}
 WARNING:   ${DEMUXTARGETDIR}
-WARNING:   ${SHARPYUVTARGETDIR}
 WARNING: The build will continue in 5 seconds...
 EOF
  sleep 5
 fi
 rm -rf ${BUILDDIR} ${TARGETDIR} ${DECTARGETDIR} \
-    ${MUXTARGETDIR} ${DEMUXTARGETDIR} ${SHARPYUVTARGETDIR}
+    ${MUXTARGETDIR} ${DEMUXTARGETDIR}
 mkdir -p ${BUILDDIR} ${TARGETDIR}/Headers/ ${DECTARGETDIR}/Headers/ \
-    ${MUXTARGETDIR}/Headers/ ${DEMUXTARGETDIR}/Headers/ \
-    ${SHARPYUVTARGETDIR}/Headers/
+    ${MUXTARGETDIR}/Headers/ ${DEMUXTARGETDIR}/Headers/

 if [[ ! -e ${SRCDIR}/configure ]]; then
  if ! (cd ${SRCDIR} && sh autogen.sh); then
@@ -138,14 +134,13 @@ for PLATFORM in ${PLATFORMS}; do
  set +x

  # Build only the libraries, skip the examples.
-  make V=0 -C sharpyuv install
+  make V=0 -C sharpyuv
  make V=0 -C src install

  LIBLIST+=" ${ROOTDIR}/lib/libwebp.a"
  DECLIBLIST+=" ${ROOTDIR}/lib/libwebpdecoder.a"
  MUXLIBLIST+=" ${ROOTDIR}/lib/libwebpmux.a"
  DEMUXLIBLIST+=" ${ROOTDIR}/lib/libwebpdemux.a"
-  SHARPYUVLIBLIST+=" ${ROOTDIR}/lib/libsharpyuv.a"

  make clean

@@ -170,9 +165,4 @@ cp -a ${SRCDIR}/src/webp/{decode,types,mux_types,demux}.h \
    ${DEMUXTARGETDIR}/Headers/
 ${LIPO} -create ${DEMUXLIBLIST} -output ${DEMUXTARGETDIR}/WebPDemux

-echo "SHARPYUVLIBLIST = ${SHARPYUVLIBLIST}"
-cp -a ${SRCDIR}/sharpyuv/{sharpyuv,sharpyuv_csp}.h \
-    ${SHARPYUVTARGETDIR}/Headers/
-${LIPO} -create ${SHARPYUVLIBLIST} -output ${SHARPYUVTARGETDIR}/SharpYuv
-
 echo  "SUCCESS"
--- a/makefile.unix
+++ b/makefile.unix
@@ -37,13 +37,13 @@ else
 endif

 # SDL flags: use sdl-config if it exists
-SDL_CONFIG = $(shell sdl2-config --version 2> /dev/null)
+SDL_CONFIG = $(shell sdl-config --version 2> /dev/null)
 ifneq ($(SDL_CONFIG),)
-  SDL_LIBS = $(shell sdl2-config --libs)
-  SDL_FLAGS = $(shell sdl2-config --cflags)
+  SDL_LIBS = $(shell sdl-config --libs)
+  SDL_FLAGS = $(shell sdl-config --cflags)
 else
  # use best-guess
-  SDL_LIBS = -lSDL2
+  SDL_LIBS = -lSDL
  SDL_FLAGS =
 endif

@@ -276,7 +276,6 @@ UTILS_DEC_OBJS = \
    src/utils/color_cache_utils.o \
    src/utils/filters_utils.o \
    src/utils/huffman_utils.o \
-    src/utils/palette.o \
    src/utils/quant_levels_dec_utils.o \
    src/utils/random_utils.o \
    src/utils/rescaler_utils.o \
@@ -291,7 +290,6 @@ UTILS_ENC_OBJS = \
 EXTRA_OBJS = \
    extras/extras.o \
    extras/quality_estimate.o \
-    extras/sharpyuv_risk_table.o \

 LIBWEBPDECODER_OBJS = $(DEC_OBJS) $(DSP_DEC_OBJS) $(UTILS_DEC_OBJS)
 LIBWEBP_OBJS = $(LIBWEBPDECODER_OBJS) $(ENC_OBJS) \
@@ -345,7 +343,6 @@ HDRS = \
    src/utils/filters_utils.h \
    src/utils/huffman_utils.h \
    src/utils/huffman_encode_utils.h \
-    src/utils/palette.h \
    src/utils/quant_levels_utils.h \
    src/utils/quant_levels_dec_utils.h \
    src/utils/random_utils.h \
--- a/man/cwebp.1
+++ b/man/cwebp.1
@@ -1,5 +1,5 @@
 .\"                                      Hey, EMACS: -*- nroff -*-
-.TH CWEBP 1 "March 26, 2024"
+.TH CWEBP 1 "March 17, 2022"
 .SH NAME
 cwebp \- compress an image file to a WebP file
 .SH SYNOPSIS
@@ -135,9 +135,7 @@ are used, \fB\-size\fP value will prevail.
 Set a maximum number of passes to use during the dichotomy used by
 options \fB\-size\fP or \fB\-psnr\fP. Maximum value is 10, default is 1.
 If options \fB\-size\fP or \fB\-psnr\fP were used, but \fB\-pass\fP wasn't
-specified, a default value of '6' passes will be used. If \fB\-pass\fP is
-specified, but neither \fB-size\fP nor \fB-psnr\fP are, a target PSNR of 40dB
-will be used.
+specified, a default value of '6' passes will be used.
 .TP
 .BI \-qrange " int int
 Specifies the permissible interval for the quality factor. This is particularly
@@ -204,8 +202,7 @@ In the VP8 format, the so\-called control partition has a limit of 512k and
 is used to store the following information: whether the macroblock is skipped,
 which segment it belongs to, whether it is coded as intra 4x4 or intra 16x16
 mode, and finally the prediction modes to use for each of the sub\-blocks.
-For a very large image, 512k only leaves room for a few bits per 16x16
-macroblock.
+For a very large image, 512k only leaves room to few bits per 16x16 macroblock.
 The absolute minimum is 4 bits per macroblock. Skip, segment, and mode
 information can use up almost all these 4 bits (although the case is unlikely),
 which is problematic for very large images. The partition_limit factor controls
@@ -214,8 +211,7 @@ useful in case the 512k limit is reached and the following message is displayed:
 \fIError code: 6 (PARTITION0_OVERFLOW: Partition #0 is too big to fit 512k)\fP.
 If using \fB\-partition_limit\fP is not enough to meet the 512k constraint, one
 should use less segments in order to save more header bits per macroblock.
-See the \fB\-segments\fP option. Note the \fB-m\fP and \fB-q\fP options also
-influence the encoder's decisions and ability to hit this limit.
+See the \fB\-segments\fP option.

 .SS LOGGING OPTIONS
 These options control the level of output:
--- a/man/img2webp.1
+++ b/man/img2webp.1
@@ -1,10 +1,10 @@
 .\"                                      Hey, EMACS: -*- nroff -*-
-.TH IMG2WEBP 1 "March 17, 2023"
+.TH IMG2WEBP 1 "January 5, 2022"
 .SH NAME
 img2webp \- create animated WebP file from a sequence of input images.
 .SH SYNOPSIS
 .B img2webp
-[file_options] [[frame_options] frame_file]... [\-o webp_file]
+[file_options] [[frame_options] frame_file]...
 .br
 .B img2webp argument_file_name
 .br
@@ -44,18 +44,6 @@ Mixed compression mode: optimize compression of the image by picking either
 lossy or lossless compression for each frame heuristically. This global
 option disables the local option \fB-lossy\fP and \fB-lossless\fP .
 .TP
-.BI \-near_lossless " int
-Specify the level of near\-lossless image preprocessing. This option adjusts
-pixel values to help compressibility, but has minimal impact on the visual
-quality. It triggers lossless compression mode automatically. The range is 0
-(maximum preprocessing) to 100 (no preprocessing, the default). The typical
-value is around 60. Note that lossy with \fB\-q 100\fP can at times yield
-better results.
-.TP
-.B \-sharp_yuv
-Use more accurate and sharper RGB->YUV conversion if needed. Note that this
-process is slower than the default 'fast' RGB->YUV conversion.
-.TP
 .BI \-loop " int
 Specifies the number of times the animation should loop. Using '0'
 means 'loop indefinitely'.
--- a/sharpyuv/Makefile.am
+++ b/sharpyuv/Makefile.am
@@ -33,7 +33,7 @@ libsharpyuv_la_SOURCES += sharpyuv_gamma.c sharpyuv_gamma.h
 libsharpyuv_la_SOURCES += sharpyuv.c sharpyuv.h

 libsharpyuv_la_CPPFLAGS = $(AM_CPPFLAGS)
-libsharpyuv_la_LDFLAGS = -no-undefined -version-info 1:0:1 -lm
+libsharpyuv_la_LDFLAGS = -no-undefined -version-info 0:0:0 -lm
 libsharpyuv_la_LIBADD =
 libsharpyuv_la_LIBADD += libsharpyuv_sse2.la
 libsharpyuv_la_LIBADD += libsharpyuv_neon.la
--- a/sharpyuv/libsharpyuv.rc
+++ b/sharpyuv/libsharpyuv.rc
@@ -6,8 +6,8 @@
 LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US

 VS_VERSION_INFO VERSIONINFO
- FILEVERSION 0,0,4,0
- PRODUCTVERSION 0,0,4,0
+ FILEVERSION 0,0,2,0
+ PRODUCTVERSION 0,0,2,0
 FILEFLAGSMASK 0x3fL
 #ifdef _DEBUG
 FILEFLAGS 0x1L
@@ -24,12 +24,12 @@ BEGIN
        BEGIN
            VALUE "CompanyName", "Google, Inc."
            VALUE "FileDescription", "libsharpyuv DLL"
-            VALUE "FileVersion", "0.4.0"
+            VALUE "FileVersion", "0.2.0"
            VALUE "InternalName", "libsharpyuv.dll"
-            VALUE "LegalCopyright", "Copyright (C) 2024"
+            VALUE "LegalCopyright", "Copyright (C) 2022"
            VALUE "OriginalFilename", "libsharpyuv.dll"
            VALUE "ProductName", "SharpYuv Library"
-            VALUE "ProductVersion", "0.4.0"
+            VALUE "ProductVersion", "0.2.0"
        END
    END
    BLOCK "VarFileInfo"
--- a/sharpyuv/sharpyuv.c
+++ b/sharpyuv/sharpyuv.c
@@ -75,48 +75,41 @@ static int RGBToGray(int64_t r, int64_t g, int64_t b) {
 }

 static uint32_t ScaleDown(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
-                          int rgb_bit_depth,
-                          SharpYuvTransferFunctionType transfer_type) {
+                          int rgb_bit_depth) {
  const int bit_depth = rgb_bit_depth + GetPrecisionShift(rgb_bit_depth);
-  const uint32_t A = SharpYuvGammaToLinear(a, bit_depth, transfer_type);
-  const uint32_t B = SharpYuvGammaToLinear(b, bit_depth, transfer_type);
-  const uint32_t C = SharpYuvGammaToLinear(c, bit_depth, transfer_type);
-  const uint32_t D = SharpYuvGammaToLinear(d, bit_depth, transfer_type);
-  return SharpYuvLinearToGamma((A + B + C + D + 2) >> 2, bit_depth,
-                               transfer_type);
+  const uint32_t A = SharpYuvGammaToLinear(a, bit_depth);
+  const uint32_t B = SharpYuvGammaToLinear(b, bit_depth);
+  const uint32_t C = SharpYuvGammaToLinear(c, bit_depth);
+  const uint32_t D = SharpYuvGammaToLinear(d, bit_depth);
+  return SharpYuvLinearToGamma((A + B + C + D + 2) >> 2, bit_depth);
 }

 static WEBP_INLINE void UpdateW(const fixed_y_t* src, fixed_y_t* dst, int w,
-                                int rgb_bit_depth,
-                                SharpYuvTransferFunctionType transfer_type) {
+                                int rgb_bit_depth) {
  const int bit_depth = rgb_bit_depth + GetPrecisionShift(rgb_bit_depth);
-  int i = 0;
-  do {
-    const uint32_t R =
-        SharpYuvGammaToLinear(src[0 * w + i], bit_depth, transfer_type);
-    const uint32_t G =
-        SharpYuvGammaToLinear(src[1 * w + i], bit_depth, transfer_type);
-    const uint32_t B =
-        SharpYuvGammaToLinear(src[2 * w + i], bit_depth, transfer_type);
+  int i;
+  for (i = 0; i < w; ++i) {
+    const uint32_t R = SharpYuvGammaToLinear(src[0 * w + i], bit_depth);
+    const uint32_t G = SharpYuvGammaToLinear(src[1 * w + i], bit_depth);
+    const uint32_t B = SharpYuvGammaToLinear(src[2 * w + i], bit_depth);
    const uint32_t Y = RGBToGray(R, G, B);
-    dst[i] = (fixed_y_t)SharpYuvLinearToGamma(Y, bit_depth, transfer_type);
-  } while (++i < w);
+    dst[i] = (fixed_y_t)SharpYuvLinearToGamma(Y, bit_depth);
+  }
 }

 static void UpdateChroma(const fixed_y_t* src1, const fixed_y_t* src2,
-                         fixed_t* dst, int uv_w, int rgb_bit_depth,
-                         SharpYuvTransferFunctionType transfer_type) {
-  int i = 0;
-  do {
+                         fixed_t* dst, int uv_w, int rgb_bit_depth) {
+  int i;
+  for (i = 0; i < uv_w; ++i) {
    const int r =
        ScaleDown(src1[0 * uv_w + 0], src1[0 * uv_w + 1], src2[0 * uv_w + 0],
-                  src2[0 * uv_w + 1], rgb_bit_depth, transfer_type);
+                  src2[0 * uv_w + 1], rgb_bit_depth);
    const int g =
        ScaleDown(src1[2 * uv_w + 0], src1[2 * uv_w + 1], src2[2 * uv_w + 0],
-                  src2[2 * uv_w + 1], rgb_bit_depth, transfer_type);
+                  src2[2 * uv_w + 1], rgb_bit_depth);
    const int b =
        ScaleDown(src1[4 * uv_w + 0], src1[4 * uv_w + 1], src2[4 * uv_w + 0],
-                  src2[4 * uv_w + 1], rgb_bit_depth, transfer_type);
+                  src2[4 * uv_w + 1], rgb_bit_depth);
    const int W = RGBToGray(r, g, b);
    dst[0 * uv_w] = (fixed_t)(r - W);
    dst[1 * uv_w] = (fixed_t)(g - W);
@@ -124,15 +117,15 @@ static void UpdateChroma(const fixed_y_t* src1, const fixed_y_t* src2,
    dst  += 1;
    src1 += 2;
    src2 += 2;
-  } while (++i < uv_w);
+  }
 }

 static void StoreGray(const fixed_y_t* rgb, fixed_y_t* y, int w) {
-  int i = 0;
+  int i;
  assert(w > 0);
-  do {
+  for (i = 0; i < w; ++i) {
    y[i] = RGBToGray(rgb[0 * w + i], rgb[1 * w + i], rgb[2 * w + i]);
-  } while (++i < w);
+  }
 }

 //------------------------------------------------------------------------------
@@ -158,9 +151,9 @@ static void ImportOneRow(const uint8_t* const r_ptr,
  // Convert the rgb_step from a number of bytes to a number of uint8_t or
  // uint16_t values depending the bit depth.
  const int step = (rgb_bit_depth > 8) ? rgb_step / 2 : rgb_step;
-  int i = 0;
+  int i;
  const int w = (pic_width + 1) & ~1;
-  do {
+  for (i = 0; i < pic_width; ++i) {
    const int off = i * step;
    const int shift = GetPrecisionShift(rgb_bit_depth);
    if (rgb_bit_depth == 8) {
@@ -172,7 +165,7 @@ static void ImportOneRow(const uint8_t* const r_ptr,
      dst[i + 1 * w] = Shift(((uint16_t*)g_ptr)[off], shift);
      dst[i + 2 * w] = Shift(((uint16_t*)b_ptr)[off], shift);
    }
-  } while (++i < pic_width);
+  }
  if (pic_width & 1) {  // replicate rightmost pixel
    dst[pic_width + 0 * w] = dst[pic_width + 0 * w - 1];
    dst[pic_width + 1 * w] = dst[pic_width + 1 * w - 1];
@@ -240,11 +233,8 @@ static int ConvertWRGBToYUV(const fixed_y_t* best_y, const fixed_t* best_uv,
  const int sfix = GetPrecisionShift(rgb_bit_depth);
  const int yuv_max = (1 << yuv_bit_depth) - 1;

-  best_uv = best_uv_base;
-  j = 0;
-  do {
-    i = 0;
-    do {
+  for (best_uv = best_uv_base, j = 0; j < height; ++j) {
+    for (i = 0; i < width; ++i) {
      const int off = (i >> 1);
      const int W = best_y[i];
      const int r = best_uv[off + 0 * uv_w] + W;
@@ -256,22 +246,19 @@ static int ConvertWRGBToYUV(const fixed_y_t* best_y, const fixed_t* best_uv,
      } else {
        ((uint16_t*)y_ptr)[i] = clip(y, yuv_max);
      }
-    } while (++i < width);
+    }
    best_y += w;
    best_uv += (j & 1) * 3 * uv_w;
    y_ptr += y_stride;
-  } while (++j < height);
-
-  best_uv = best_uv_base;
-  j = 0;
-  do {
-    i = 0;
-    do {
+  }
+  for (best_uv = best_uv_base, j = 0; j < uv_h; ++j) {
+    for (i = 0; i < uv_w; ++i) {
+      const int off = i;
      // Note r, g and b values here are off by W, but a constant offset on all
      // 3 components doesn't change the value of u and v with a YCbCr matrix.
-      const int r = best_uv[i + 0 * uv_w];
-      const int g = best_uv[i + 1 * uv_w];
-      const int b = best_uv[i + 2 * uv_w];
+      const int r = best_uv[off + 0 * uv_w];
+      const int g = best_uv[off + 1 * uv_w];
+      const int b = best_uv[off + 2 * uv_w];
      const int u = RGBToYUVComponent(r, g, b, yuv_matrix->rgb_to_u, sfix);
      const int v = RGBToYUVComponent(r, g, b, yuv_matrix->rgb_to_v, sfix);
      if (yuv_bit_depth <= 8) {
@@ -281,11 +268,11 @@ static int ConvertWRGBToYUV(const fixed_y_t* best_y, const fixed_t* best_uv,
        ((uint16_t*)u_ptr)[i] = clip(u, yuv_max);
        ((uint16_t*)v_ptr)[i] = clip(v, yuv_max);
      }
-    } while (++i < uv_w);
+    }
    best_uv += 3 * uv_w;
    u_ptr += u_stride;
    v_ptr += v_stride;
-  } while (++j < uv_h);
+  }
  return 1;
 }

@@ -298,7 +285,7 @@ static void* SafeMalloc(uint64_t nmemb, size_t size) {
  return malloc((size_t)total_size);
 }

-#define SAFE_ALLOC(W, H, T) ((T*)SafeMalloc((uint64_t)(W) * (H), sizeof(T)))
+#define SAFE_ALLOC(W, H, T) ((T*)SafeMalloc((W) * (H), sizeof(T)))

 static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
                            const uint8_t* b_ptr, int rgb_step, int rgb_stride,
@@ -306,14 +293,12 @@ static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
                            uint8_t* u_ptr, int u_stride, uint8_t* v_ptr,
                            int v_stride, int yuv_bit_depth, int width,
                            int height,
-                            const SharpYuvConversionMatrix* yuv_matrix,
-                            SharpYuvTransferFunctionType transfer_type) {
+                            const SharpYuvConversionMatrix* yuv_matrix) {
  // we expand the right/bottom border if needed
  const int w = (width + 1) & ~1;
  const int h = (height + 1) & ~1;
  const int uv_w = w >> 1;
  const int uv_h = h >> 1;
-  const int y_bit_depth = rgb_bit_depth + GetPrecisionShift(rgb_bit_depth);
  uint64_t prev_diff_y_sum = ~0;
  int j, iter;

@@ -361,9 +346,9 @@ static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
    StoreGray(src1, best_y + 0, w);
    StoreGray(src2, best_y + w, w);

-    UpdateW(src1, target_y, w, rgb_bit_depth, transfer_type);
-    UpdateW(src2, target_y + w, w, rgb_bit_depth, transfer_type);
-    UpdateChroma(src1, src2, target_uv, uv_w, rgb_bit_depth, transfer_type);
+    UpdateW(src1, target_y, w, rgb_bit_depth);
+    UpdateW(src2, target_y + w, w, rgb_bit_depth);
+    UpdateChroma(src1, src2, target_uv, uv_w, rgb_bit_depth);
    memcpy(best_uv, target_uv, 3 * uv_w * sizeof(*best_uv));
    best_y += 2 * w;
    best_uv += 3 * uv_w;
@@ -384,8 +369,7 @@ static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
    best_uv = best_uv_base;
    target_y = target_y_base;
    target_uv = target_uv_base;
-    j = 0;
-    do {
+    for (j = 0; j < h; j += 2) {
      fixed_y_t* const src1 = tmp_buffer + 0 * w;
      fixed_y_t* const src2 = tmp_buffer + 3 * w;
      {
@@ -396,21 +380,21 @@ static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
        cur_uv = next_uv;
      }

-      UpdateW(src1, best_rgb_y + 0 * w, w, rgb_bit_depth, transfer_type);
-      UpdateW(src2, best_rgb_y + 1 * w, w, rgb_bit_depth, transfer_type);
-      UpdateChroma(src1, src2, best_rgb_uv, uv_w, rgb_bit_depth, transfer_type);
+      UpdateW(src1, best_rgb_y + 0 * w, w, rgb_bit_depth);
+      UpdateW(src2, best_rgb_y + 1 * w, w, rgb_bit_depth);
+      UpdateChroma(src1, src2, best_rgb_uv, uv_w, rgb_bit_depth);

      // update two rows of Y and one row of RGB
      diff_y_sum +=
-          SharpYuvUpdateY(target_y, best_rgb_y, best_y, 2 * w, y_bit_depth);
+          SharpYuvUpdateY(target_y, best_rgb_y, best_y, 2 * w,
+                          rgb_bit_depth + GetPrecisionShift(rgb_bit_depth));
      SharpYuvUpdateRGB(target_uv, best_rgb_uv, best_uv, 3 * uv_w);

      best_y += 2 * w;
      best_uv += 3 * uv_w;
      target_y += 2 * w;
      target_uv += 3 * uv_w;
-      j += 2;
-    } while (j < h);
+    }
    // test exit condition
    if (iter > 0) {
      if (diff_y_sum < diff_y_threshold) break;
@@ -434,7 +418,6 @@ static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
  free(tmp_buffer);
  return ok;
 }
-
 #undef SAFE_ALLOC

 #if defined(WEBP_USE_THREAD) && !defined(_WIN32)
@@ -457,7 +440,6 @@ static int DoSharpArgbToYuv(const uint8_t* r_ptr, const uint8_t* g_ptr,
 // By default SharpYuvConvert calls it with SharpYuvGetCPUInfo. If needed,
 // users can declare it as extern and call it with an alternate VP8CPUInfo
 // function.
-extern VP8CPUInfo SharpYuvGetCPUInfo;
 SHARPYUV_EXTERN void SharpYuvInit(VP8CPUInfo cpu_info_func);
 void SharpYuvInit(VP8CPUInfo cpu_info_func) {
  static volatile VP8CPUInfo sharpyuv_last_cpuinfo_used =
@@ -479,42 +461,12 @@ void SharpYuvInit(VP8CPUInfo cpu_info_func) {
  UNLOCK_ACCESS_AND_RETURN;
 }

-int SharpYuvConvert(const void* r_ptr, const void* g_ptr, const void* b_ptr,
-                    int rgb_step, int rgb_stride, int rgb_bit_depth,
-                    void* y_ptr, int y_stride, void* u_ptr, int u_stride,
-                    void* v_ptr, int v_stride, int yuv_bit_depth, int width,
+int SharpYuvConvert(const void* r_ptr, const void* g_ptr,
+                    const void* b_ptr, int rgb_step, int rgb_stride,
+                    int rgb_bit_depth, void* y_ptr, int y_stride,
+                    void* u_ptr, int u_stride, void* v_ptr,
+                    int v_stride, int yuv_bit_depth, int width,
                    int height, const SharpYuvConversionMatrix* yuv_matrix) {
-  SharpYuvOptions options;
-  options.yuv_matrix = yuv_matrix;
-  options.transfer_type = kSharpYuvTransferFunctionSrgb;
-  return SharpYuvConvertWithOptions(
-      r_ptr, g_ptr, b_ptr, rgb_step, rgb_stride, rgb_bit_depth, y_ptr, y_stride,
-      u_ptr, u_stride, v_ptr, v_stride, yuv_bit_depth, width, height, &options);
-}
-
-int SharpYuvOptionsInitInternal(const SharpYuvConversionMatrix* yuv_matrix,
-                                SharpYuvOptions* options, int version) {
-  const int major = (version >> 24);
-  const int minor = (version >> 16) & 0xff;
-  if (options == NULL || yuv_matrix == NULL ||
-      (major == SHARPYUV_VERSION_MAJOR && major == 0 &&
-       minor != SHARPYUV_VERSION_MINOR) ||
-      (major != SHARPYUV_VERSION_MAJOR)) {
-    return 0;
-  }
-  options->yuv_matrix = yuv_matrix;
-  options->transfer_type = kSharpYuvTransferFunctionSrgb;
-  return 1;
-}
-
-int SharpYuvConvertWithOptions(const void* r_ptr, const void* g_ptr,
-                               const void* b_ptr, int rgb_step, int rgb_stride,
-                               int rgb_bit_depth, void* y_ptr, int y_stride,
-                               void* u_ptr, int u_stride, void* v_ptr,
-                               int v_stride, int yuv_bit_depth, int width,
-                               int height, const SharpYuvOptions* options) {
-  const SharpYuvConversionMatrix* yuv_matrix = options->yuv_matrix;
-  SharpYuvTransferFunctionType transfer_type = options->transfer_type;
  SharpYuvConversionMatrix scaled_matrix;
  const int rgb_max = (1 << rgb_bit_depth) - 1;
  const int rgb_round = 1 << (rgb_bit_depth - 1);
@@ -533,7 +485,7 @@ int SharpYuvConvertWithOptions(const void* r_ptr, const void* g_ptr,
  if (yuv_bit_depth != 8 && yuv_bit_depth != 10 && yuv_bit_depth != 12) {
    return 0;
  }
-  if (rgb_bit_depth > 8 && (rgb_step % 2 != 0 || rgb_stride % 2 != 0)) {
+  if (rgb_bit_depth > 8 && (rgb_step % 2 != 0 || rgb_stride %2 != 0)) {
    // Step/stride should be even for uint16_t buffers.
    return 0;
  }
@@ -568,7 +520,7 @@ int SharpYuvConvertWithOptions(const void* r_ptr, const void* g_ptr,
  return DoSharpArgbToYuv(r_ptr, g_ptr, b_ptr, rgb_step, rgb_stride,
                          rgb_bit_depth, y_ptr, y_stride, u_ptr, u_stride,
                          v_ptr, v_stride, yuv_bit_depth, width, height,
-                          &scaled_matrix, transfer_type);
+                          &scaled_matrix);
 }

 //------------------------------------------------------------------------------
--- a/sharpyuv/sharpyuv.h
+++ b/sharpyuv/sharpyuv.h
@@ -22,36 +22,21 @@ extern "C" {
 #else
 // This explicitly marks library functions and allows for changing the
 // signature for e.g., Windows DLL builds.
-#if defined(_WIN32) && defined(WEBP_DLL)
-#define SHARPYUV_EXTERN __declspec(dllexport)
-#elif defined(__GNUC__) && __GNUC__ >= 4
+#if defined(__GNUC__) && __GNUC__ >= 4
 #define SHARPYUV_EXTERN extern __attribute__((visibility("default")))
 #else
+#if defined(_MSC_VER) && defined(WEBP_DLL)
+#define SHARPYUV_EXTERN __declspec(dllexport)
+#else
 #define SHARPYUV_EXTERN extern
-#endif /* defined(_WIN32) && defined(WEBP_DLL) */
+#endif /* _MSC_VER && WEBP_DLL */
+#endif /* __GNUC__ >= 4 */
 #endif /* WEBP_EXTERN */
 #endif /* SHARPYUV_EXTERN */

-#ifndef SHARPYUV_INLINE
-#ifdef WEBP_INLINE
-#define SHARPYUV_INLINE WEBP_INLINE
-#else
-#ifndef _MSC_VER
-#if defined(__cplusplus) || !defined(__STRICT_ANSI__) || \
-    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
-#define SHARPYUV_INLINE inline
-#else
-#define SHARPYUV_INLINE
-#endif
-#else
-#define SHARPYUV_INLINE __forceinline
-#endif /* _MSC_VER */
-#endif /* WEBP_INLINE */
-#endif /* SHARPYUV_INLINE */
-
 // SharpYUV API version following the convention from semver.org
 #define SHARPYUV_VERSION_MAJOR 0
-#define SHARPYUV_VERSION_MINOR 4
+#define SHARPYUV_VERSION_MINOR 2
 #define SHARPYUV_VERSION_PATCH 0
 // Version as a uint32_t. The major number is the high 8 bits.
 // The minor number is the middle 8 bits. The patch number is the low 16 bits.
@@ -76,33 +61,6 @@ typedef struct {
  int rgb_to_v[4];
 } SharpYuvConversionMatrix;

-typedef struct SharpYuvOptions SharpYuvOptions;
-
-// Enums for transfer functions, as defined in H.273,
-// https://www.itu.int/rec/T-REC-H.273-202107-I/en
-typedef enum SharpYuvTransferFunctionType {
-  // 0 is reserved
-  kSharpYuvTransferFunctionBt709 = 1,
-  // 2 is unspecified
-  // 3 is reserved
-  kSharpYuvTransferFunctionBt470M = 4,
-  kSharpYuvTransferFunctionBt470Bg = 5,
-  kSharpYuvTransferFunctionBt601 = 6,
-  kSharpYuvTransferFunctionSmpte240 = 7,
-  kSharpYuvTransferFunctionLinear = 8,
-  kSharpYuvTransferFunctionLog100 = 9,
-  kSharpYuvTransferFunctionLog100_Sqrt10 = 10,
-  kSharpYuvTransferFunctionIec61966 = 11,
-  kSharpYuvTransferFunctionBt1361 = 12,
-  kSharpYuvTransferFunctionSrgb = 13,
-  kSharpYuvTransferFunctionBt2020_10Bit = 14,
-  kSharpYuvTransferFunctionBt2020_12Bit = 15,
-  kSharpYuvTransferFunctionSmpte2084 = 16,  // PQ
-  kSharpYuvTransferFunctionSmpte428 = 17,
-  kSharpYuvTransferFunctionHlg = 18,
-  kSharpYuvTransferFunctionNum
-} SharpYuvTransferFunctionType;
-
 // Converts RGB to YUV420 using a downsampling algorithm that minimizes
 // artefacts caused by chroma subsampling.
 // This is slower than standard downsampling (averaging of 4 UV values).
@@ -127,8 +85,6 @@ typedef enum SharpYuvTransferFunctionType {
 //     adjacent pixels on the y, u and v channels. If yuv_bit_depth > 8, they
 //     should be multiples of 2.
 // width, height: width and height of the image in pixels
-// This function calls SharpYuvConvertWithOptions with a default transfer
-// function of kSharpYuvTransferFunctionSrgb.
 SHARPYUV_EXTERN int SharpYuvConvert(const void* r_ptr, const void* g_ptr,
                                    const void* b_ptr, int rgb_step,
                                    int rgb_stride, int rgb_bit_depth,
@@ -137,31 +93,6 @@ SHARPYUV_EXTERN int SharpYuvConvert(const void* r_ptr, const void* g_ptr,
                                    int yuv_bit_depth, int width, int height,
                                    const SharpYuvConversionMatrix* yuv_matrix);

-struct SharpYuvOptions {
-  // This matrix cannot be NULL and can be initialized by
-  // SharpYuvComputeConversionMatrix.
-  const SharpYuvConversionMatrix* yuv_matrix;
-  SharpYuvTransferFunctionType transfer_type;
-};
-
-// Internal, version-checked, entry point
-SHARPYUV_EXTERN int SharpYuvOptionsInitInternal(const SharpYuvConversionMatrix*,
-                                                SharpYuvOptions*, int);
-
-// Should always be called, to initialize a fresh SharpYuvOptions
-// structure before modification. SharpYuvOptionsInit() must have succeeded
-// before using the 'options' object.
-static SHARPYUV_INLINE int SharpYuvOptionsInit(
-    const SharpYuvConversionMatrix* yuv_matrix, SharpYuvOptions* options) {
-  return SharpYuvOptionsInitInternal(yuv_matrix, options, SHARPYUV_VERSION);
-}
-
-SHARPYUV_EXTERN int SharpYuvConvertWithOptions(
-    const void* r_ptr, const void* g_ptr, const void* b_ptr, int rgb_step,
-    int rgb_stride, int rgb_bit_depth, void* y_ptr, int y_stride, void* u_ptr,
-    int u_stride, void* v_ptr, int v_stride, int yuv_bit_depth, int width,
-    int height, const SharpYuvOptions* options);
-
 // TODO(b/194336375): Add YUV444 to YUV420 conversion. Maybe also add 422
 // support (it's rarely used in practice, especially for images).

--- a/sharpyuv/sharpyuv_dsp.c
+++ b/sharpyuv/sharpyuv_dsp.c
@@ -17,7 +17,6 @@
 #include <stdlib.h>

 #include "sharpyuv/sharpyuv_cpu.h"
-#include "src/webp/types.h"

 //-----------------------------------------------------------------------------

@@ -70,9 +69,9 @@ uint64_t (*SharpYuvUpdateY)(const uint16_t* src, const uint16_t* ref,
 void (*SharpYuvUpdateRGB)(const int16_t* src, const int16_t* ref, int16_t* dst,
                          int len);
 void (*SharpYuvFilterRow)(const int16_t* A, const int16_t* B, int len,
-                          const uint16_t* best_y, uint16_t* out, int bit_depth);
+                          const uint16_t* best_y, uint16_t* out,
+                          int bit_depth);

-extern VP8CPUInfo SharpYuvGetCPUInfo;
 extern void InitSharpYuvSSE2(void);
 extern void InitSharpYuvNEON(void);

--- a/sharpyuv/sharpyuv_gamma.c
+++ b/sharpyuv/sharpyuv_gamma.c
@@ -12,7 +12,6 @@
 #include "sharpyuv/sharpyuv_gamma.h"

 #include <assert.h>
-#include <float.h>
 #include <math.h>

 #include "src/webp/types.h"
@@ -98,7 +97,7 @@ static WEBP_INLINE uint32_t FixedPointInterpolation(int v, uint32_t* tab,
  return result;
 }

-static uint32_t ToLinearSrgb(uint16_t v, int bit_depth) {
+uint32_t SharpYuvGammaToLinear(uint16_t v, int bit_depth) {
  const int shift = GAMMA_TO_LINEAR_TAB_BITS - bit_depth;
  if (shift > 0) {
    return kGammaToLinearTabS[v << shift];
@@ -106,314 +105,9 @@ static uint32_t ToLinearSrgb(uint16_t v, int bit_depth) {
  return FixedPointInterpolation(v, kGammaToLinearTabS, -shift, 0);
 }

-static uint16_t FromLinearSrgb(uint32_t value, int bit_depth) {
+uint16_t SharpYuvLinearToGamma(uint32_t value, int bit_depth) {
  return FixedPointInterpolation(
      value, kLinearToGammaTabS,
      (GAMMA_TO_LINEAR_BITS - LINEAR_TO_GAMMA_TAB_BITS),
      bit_depth - GAMMA_TO_LINEAR_BITS);
 }
-
-////////////////////////////////////////////////////////////////////////////////
-
-#define CLAMP(x, low, high) \
-  (((x) < (low)) ? (low) : (((high) < (x)) ? (high) : (x)))
-#define MIN(a, b) (((a) < (b)) ? (a) : (b))
-#define MAX(a, b) (((a) > (b)) ? (a) : (b))
-
-static WEBP_INLINE float Roundf(float x) {
-  if (x < 0)
-    return (float)ceil((double)(x - 0.5f));
-  else
-    return (float)floor((double)(x + 0.5f));
-}
-
-static WEBP_INLINE float Powf(float base, float exp) {
-  return (float)pow((double)base, (double)exp);
-}
-
-static WEBP_INLINE float Log10f(float x) { return (float)log10((double)x); }
-
-static float ToLinear709(float gamma) {
-  if (gamma < 0.f) {
-    return 0.f;
-  } else if (gamma < 4.5f * 0.018053968510807f) {
-    return gamma / 4.5f;
-  } else if (gamma < 1.f) {
-    return Powf((gamma + 0.09929682680944f) / 1.09929682680944f, 1.f / 0.45f);
-  }
-  return 1.f;
-}
-
-static float FromLinear709(float linear) {
-  if (linear < 0.f) {
-    return 0.f;
-  } else if (linear < 0.018053968510807f) {
-    return linear * 4.5f;
-  } else if (linear < 1.f) {
-    return 1.09929682680944f * Powf(linear, 0.45f) - 0.09929682680944f;
-  }
-  return 1.f;
-}
-
-static float ToLinear470M(float gamma) {
-  return Powf(CLAMP(gamma, 0.f, 1.f), 2.2f);
-}
-
-static float FromLinear470M(float linear) {
-  return Powf(CLAMP(linear, 0.f, 1.f), 1.f / 2.2f);
-}
-
-static float ToLinear470Bg(float gamma) {
-  return Powf(CLAMP(gamma, 0.f, 1.f), 2.8f);
-}
-
-static float FromLinear470Bg(float linear) {
-  return Powf(CLAMP(linear, 0.f, 1.f), 1.f / 2.8f);
-}
-
-static float ToLinearSmpte240(float gamma) {
-  if (gamma < 0.f) {
-    return 0.f;
-  } else if (gamma < 4.f * 0.022821585529445f) {
-    return gamma / 4.f;
-  } else if (gamma < 1.f) {
-    return Powf((gamma + 0.111572195921731f) / 1.111572195921731f, 1.f / 0.45f);
-  }
-  return 1.f;
-}
-
-static float FromLinearSmpte240(float linear) {
-  if (linear < 0.f) {
-    return 0.f;
-  } else if (linear < 0.022821585529445f) {
-    return linear * 4.f;
-  } else if (linear < 1.f) {
-    return 1.111572195921731f * Powf(linear, 0.45f) - 0.111572195921731f;
-  }
-  return 1.f;
-}
-
-static float ToLinearLog100(float gamma) {
-  // The function is non-bijective so choose the middle of [0, 0.01].
-  const float mid_interval = 0.01f / 2.f;
-  return (gamma <= 0.0f) ? mid_interval
-                          : Powf(10.0f, 2.f * (MIN(gamma, 1.f) - 1.0f));
-}
-
-static float FromLinearLog100(float linear) {
-  return (linear < 0.01f) ? 0.0f : 1.0f + Log10f(MIN(linear, 1.f)) / 2.0f;
-}
-
-static float ToLinearLog100Sqrt10(float gamma) {
-  // The function is non-bijective so choose the middle of [0, 0.00316227766f[.
-  const float mid_interval = 0.00316227766f / 2.f;
-  return (gamma <= 0.0f) ? mid_interval
-                          : Powf(10.0f, 2.5f * (MIN(gamma, 1.f) - 1.0f));
-}
-
-static float FromLinearLog100Sqrt10(float linear) {
-  return (linear < 0.00316227766f) ? 0.0f
-                                  : 1.0f + Log10f(MIN(linear, 1.f)) / 2.5f;
-}
-
-static float ToLinearIec61966(float gamma) {
-  if (gamma <= -4.5f * 0.018053968510807f) {
-    return Powf((-gamma + 0.09929682680944f) / -1.09929682680944f, 1.f / 0.45f);
-  } else if (gamma < 4.5f * 0.018053968510807f) {
-    return gamma / 4.5f;
-  }
-  return Powf((gamma + 0.09929682680944f) / 1.09929682680944f, 1.f / 0.45f);
-}
-
-static float FromLinearIec61966(float linear) {
-  if (linear <= -0.018053968510807f) {
-    return -1.09929682680944f * Powf(-linear, 0.45f) + 0.09929682680944f;
-  } else if (linear < 0.018053968510807f) {
-    return linear * 4.5f;
-  }
-  return 1.09929682680944f * Powf(linear, 0.45f) - 0.09929682680944f;
-}
-
-static float ToLinearBt1361(float gamma) {
-  if (gamma < -0.25f) {
-    return -0.25f;
-  } else if (gamma < 0.f) {
-    return Powf((gamma - 0.02482420670236f) / -0.27482420670236f, 1.f / 0.45f) /
-           -4.f;
-  } else if (gamma < 4.5f * 0.018053968510807f) {
-    return gamma / 4.5f;
-  } else if (gamma < 1.f) {
-    return Powf((gamma + 0.09929682680944f) / 1.09929682680944f, 1.f / 0.45f);
-  }
-  return 1.f;
-}
-
-static float FromLinearBt1361(float linear) {
-  if (linear < -0.25f) {
-    return -0.25f;
-  } else if (linear < 0.f) {
-    return -0.27482420670236f * Powf(-4.f * linear, 0.45f) + 0.02482420670236f;
-  } else if (linear < 0.018053968510807f) {
-    return linear * 4.5f;
-  } else if (linear < 1.f) {
-    return 1.09929682680944f * Powf(linear, 0.45f) - 0.09929682680944f;
-  }
-  return 1.f;
-}
-
-static float ToLinearPq(float gamma) {
-  if (gamma > 0.f) {
-    const float pow_gamma = Powf(gamma, 32.f / 2523.f);
-    const float num = MAX(pow_gamma - 107.f / 128.f, 0.0f);
-    const float den = MAX(2413.f / 128.f - 2392.f / 128.f * pow_gamma, FLT_MIN);
-    return Powf(num / den, 4096.f / 653.f);
-  }
-  return 0.f;
-}
-
-static float FromLinearPq(float linear) {
-  if (linear > 0.f) {
-    const float pow_linear = Powf(linear, 653.f / 4096.f);
-    const float num = 107.f / 128.f + 2413.f / 128.f * pow_linear;
-    const float den = 1.0f + 2392.f / 128.f * pow_linear;
-    return Powf(num / den, 2523.f / 32.f);
-  }
-  return 0.f;
-}
-
-static float ToLinearSmpte428(float gamma) {
-  return Powf(MAX(gamma, 0.f), 2.6f) / 0.91655527974030934f;
-}
-
-static float FromLinearSmpte428(float linear) {
-  return Powf(0.91655527974030934f * MAX(linear, 0.f), 1.f / 2.6f);
-}
-
-// Conversion in BT.2100 requires RGB info. Simplify to gamma correction here.
-static float ToLinearHlg(float gamma) {
-  if (gamma < 0.f) {
-    return 0.f;
-  } else if (gamma <= 0.5f) {
-    return Powf((gamma * gamma) * (1.f / 3.f), 1.2f);
-  }
-  return Powf((expf((gamma - 0.55991073f) / 0.17883277f) + 0.28466892f) / 12.0f,
-              1.2f);
-}
-
-static float FromLinearHlg(float linear) {
-  linear = Powf(linear, 1.f / 1.2f);
-  if (linear < 0.f) {
-    return 0.f;
-  } else if (linear <= (1.f / 12.f)) {
-    return sqrtf(3.f * linear);
-  }
-  return 0.17883277f * logf(12.f * linear - 0.28466892f) + 0.55991073f;
-}
-
-uint32_t SharpYuvGammaToLinear(uint16_t v, int bit_depth,
-                               SharpYuvTransferFunctionType transfer_type) {
-  float v_float, linear;
-  if (transfer_type == kSharpYuvTransferFunctionSrgb) {
-    return ToLinearSrgb(v, bit_depth);
-  }
-  v_float = (float)v / ((1 << bit_depth) - 1);
-  switch (transfer_type) {
-    case kSharpYuvTransferFunctionBt709:
-    case kSharpYuvTransferFunctionBt601:
-    case kSharpYuvTransferFunctionBt2020_10Bit:
-    case kSharpYuvTransferFunctionBt2020_12Bit:
-      linear = ToLinear709(v_float);
-      break;
-    case kSharpYuvTransferFunctionBt470M:
-      linear = ToLinear470M(v_float);
-      break;
-    case kSharpYuvTransferFunctionBt470Bg:
-      linear = ToLinear470Bg(v_float);
-      break;
-    case kSharpYuvTransferFunctionSmpte240:
-      linear = ToLinearSmpte240(v_float);
-      break;
-    case kSharpYuvTransferFunctionLinear:
-      return v;
-    case kSharpYuvTransferFunctionLog100:
-      linear = ToLinearLog100(v_float);
-      break;
-    case kSharpYuvTransferFunctionLog100_Sqrt10:
-      linear = ToLinearLog100Sqrt10(v_float);
-      break;
-    case kSharpYuvTransferFunctionIec61966:
-      linear = ToLinearIec61966(v_float);
-      break;
-    case kSharpYuvTransferFunctionBt1361:
-      linear = ToLinearBt1361(v_float);
-      break;
-    case kSharpYuvTransferFunctionSmpte2084:
-      linear = ToLinearPq(v_float);
-      break;
-    case kSharpYuvTransferFunctionSmpte428:
-      linear = ToLinearSmpte428(v_float);
-      break;
-    case kSharpYuvTransferFunctionHlg:
-      linear = ToLinearHlg(v_float);
-      break;
-    default:
-      assert(0);
-      linear = 0;
-      break;
-  }
-  return (uint32_t)Roundf(linear * ((1 << 16) - 1));
-}
-
-uint16_t SharpYuvLinearToGamma(uint32_t v, int bit_depth,
-                               SharpYuvTransferFunctionType transfer_type) {
-  float v_float, linear;
-  if (transfer_type == kSharpYuvTransferFunctionSrgb) {
-    return FromLinearSrgb(v, bit_depth);
-  }
-  v_float = (float)v / ((1 << 16) - 1);
-  switch (transfer_type) {
-    case kSharpYuvTransferFunctionBt709:
-    case kSharpYuvTransferFunctionBt601:
-    case kSharpYuvTransferFunctionBt2020_10Bit:
-    case kSharpYuvTransferFunctionBt2020_12Bit:
-      linear = FromLinear709(v_float);
-      break;
-    case kSharpYuvTransferFunctionBt470M:
-      linear = FromLinear470M(v_float);
-      break;
-    case kSharpYuvTransferFunctionBt470Bg:
-      linear = FromLinear470Bg(v_float);
-      break;
-    case kSharpYuvTransferFunctionSmpte240:
-      linear = FromLinearSmpte240(v_float);
-      break;
-    case kSharpYuvTransferFunctionLinear:
-      return v;
-    case kSharpYuvTransferFunctionLog100:
-      linear = FromLinearLog100(v_float);
-      break;
-    case kSharpYuvTransferFunctionLog100_Sqrt10:
-      linear = FromLinearLog100Sqrt10(v_float);
-      break;
-    case kSharpYuvTransferFunctionIec61966:
-      linear = FromLinearIec61966(v_float);
-      break;
-    case kSharpYuvTransferFunctionBt1361:
-      linear = FromLinearBt1361(v_float);
-      break;
-    case kSharpYuvTransferFunctionSmpte2084:
-      linear = FromLinearPq(v_float);
-      break;
-    case kSharpYuvTransferFunctionSmpte428:
-      linear = FromLinearSmpte428(v_float);
-      break;
-    case kSharpYuvTransferFunctionHlg:
-      linear = FromLinearHlg(v_float);
-      break;
-    default:
-      assert(0);
-      linear = 0;
-      break;
-  }
-  return (uint16_t)Roundf(linear * ((1 << bit_depth) - 1));
-}
--- a/sharpyuv/sharpyuv_gamma.h
+++ b/sharpyuv/sharpyuv_gamma.h
@@ -12,7 +12,6 @@
 #ifndef WEBP_SHARPYUV_SHARPYUV_GAMMA_H_
 #define WEBP_SHARPYUV_SHARPYUV_GAMMA_H_

-#include "sharpyuv/sharpyuv.h"
 #include "src/webp/types.h"

 #ifdef __cplusplus
@@ -23,13 +22,11 @@ extern "C" {
 // SharpYuvGammaToLinear or SharpYuvLinearToGamma.
 void SharpYuvInitGammaTables(void);

-// Converts a 'bit_depth'-bit gamma color value to a 16-bit linear value.
-uint32_t SharpYuvGammaToLinear(uint16_t v, int bit_depth,
-                               SharpYuvTransferFunctionType transfer_type);
+// Converts a gamma color value on 'bit_depth' bits to a 16 bit linear value.
+uint32_t SharpYuvGammaToLinear(uint16_t v, int bit_depth);

-// Converts a 16-bit linear color value to a 'bit_depth'-bit gamma value.
-uint16_t SharpYuvLinearToGamma(uint32_t value, int bit_depth,
-                               SharpYuvTransferFunctionType transfer_type);
+// Converts a 16 bit linear color value to a gamma value on 'bit_depth' bits.
+uint16_t SharpYuvLinearToGamma(uint32_t value, int bit_depth);

 #ifdef __cplusplus
 }  // extern "C"
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -36,7 +36,7 @@ libwebp_la_LIBADD += utils/libwebputils.la
 # other than the ones listed on the command line, i.e., after linking, it will
 # not have unresolved symbols. Some platforms (Windows among them) require all
 # symbols in shared libraries to be resolved at library creation.
-libwebp_la_LDFLAGS = -no-undefined -version-info 8:9:1
+libwebp_la_LDFLAGS = -no-undefined -version-info 8:6:1
 libwebpincludedir = $(includedir)/webp
 pkgconfig_DATA = libwebp.pc

@@ -48,7 +48,7 @@ if BUILD_LIBWEBPDECODER
  libwebpdecoder_la_LIBADD += dsp/libwebpdspdecode.la
  libwebpdecoder_la_LIBADD += utils/libwebputilsdecode.la

-  libwebpdecoder_la_LDFLAGS = -no-undefined -version-info 4:9:1
+  libwebpdecoder_la_LDFLAGS = -no-undefined -version-info 4:6:1
  pkgconfig_DATA += libwebpdecoder.pc
 endif

--- a/src/dec/alpha_dec.c
+++ b/src/dec/alpha_dec.c
@@ -13,20 +13,18 @@

 #include <stdlib.h>
 #include "src/dec/alphai_dec.h"
-#include "src/dec/vp8_dec.h"
 #include "src/dec/vp8i_dec.h"
 #include "src/dec/vp8li_dec.h"
 #include "src/dsp/dsp.h"
 #include "src/utils/quant_levels_dec_utils.h"
 #include "src/utils/utils.h"
 #include "src/webp/format_constants.h"
-#include "src/webp/types.h"

 //------------------------------------------------------------------------------
 // ALPHDecoder object.

 // Allocates a new alpha decoder instance.
-WEBP_NODISCARD static ALPHDecoder* ALPHNew(void) {
+static ALPHDecoder* ALPHNew(void) {
  ALPHDecoder* const dec = (ALPHDecoder*)WebPSafeCalloc(1ULL, sizeof(*dec));
  return dec;
 }
@@ -47,9 +45,9 @@ static void ALPHDelete(ALPHDecoder* const dec) {
 // header for alpha data stored using lossless compression.
 // Returns false in case of error in alpha header (data too short, invalid
 // compression method or filter, error in lossless header data etc).
-WEBP_NODISCARD static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
-                                   size_t data_size, const VP8Io* const src_io,
-                                   uint8_t* output) {
+static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
+                    size_t data_size, const VP8Io* const src_io,
+                    uint8_t* output) {
  int ok = 0;
  const uint8_t* const alpha_data = data + ALPHA_HEADER_LEN;
  const size_t alpha_data_size = data_size - ALPHA_HEADER_LEN;
@@ -81,9 +79,7 @@ WEBP_NODISCARD static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
  }

  // Copy the necessary parameters from src_io to io
-  if (!VP8InitIo(io)) {
-    return 0;
-  }
+  VP8InitIo(io);
  WebPInitCustomIo(NULL, io);
  io->opaque = dec;
  io->width = src_io->width;
@@ -111,8 +107,7 @@ WEBP_NODISCARD static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
 // starting from row number 'row'. It assumes that rows up to (row - 1) have
 // already been decoded.
 // Returns false in case of bitstream error.
-WEBP_NODISCARD static int ALPHDecode(VP8Decoder* const dec, int row,
-                                     int num_rows) {
+static int ALPHDecode(VP8Decoder* const dec, int row, int num_rows) {
  ALPHDecoder* const alph_dec = dec->alph_dec_;
  const int width = alph_dec->width_;
  const int height = alph_dec->io_.crop_bottom;
@@ -122,12 +117,21 @@ WEBP_NODISCARD static int ALPHDecode(VP8Decoder* const dec, int row,
    const uint8_t* deltas = dec->alpha_data_ + ALPHA_HEADER_LEN + row * width;
    uint8_t* dst = dec->alpha_plane_ + row * width;
    assert(deltas <= &dec->alpha_data_[dec->alpha_data_size_]);
-    assert(WebPUnfilters[alph_dec->filter_] != NULL);
-    for (y = 0; y < num_rows; ++y) {
-      WebPUnfilters[alph_dec->filter_](prev_line, deltas, dst, width);
-      prev_line = dst;
-      dst += width;
-      deltas += width;
+    if (alph_dec->filter_ != WEBP_FILTER_NONE) {
+      assert(WebPUnfilters[alph_dec->filter_] != NULL);
+      for (y = 0; y < num_rows; ++y) {
+        WebPUnfilters[alph_dec->filter_](prev_line, deltas, dst, width);
+        prev_line = dst;
+        dst += width;
+        deltas += width;
+      }
+    } else {
+      for (y = 0; y < num_rows; ++y) {
+        memcpy(dst, deltas, width * sizeof(*dst));
+        prev_line = dst;
+        dst += width;
+        deltas += width;
+      }
    }
    dec->alpha_prev_line_ = prev_line;
  } else {  // alph_dec->method_ == ALPHA_LOSSLESS_COMPRESSION
@@ -143,8 +147,7 @@ WEBP_NODISCARD static int ALPHDecode(VP8Decoder* const dec, int row,
  return 1;
 }

-WEBP_NODISCARD static int AllocateAlphaPlane(VP8Decoder* const dec,
-                                             const VP8Io* const io) {
+static int AllocateAlphaPlane(VP8Decoder* const dec, const VP8Io* const io) {
  const int stride = io->width;
  const int height = io->crop_bottom;
  const uint64_t alpha_size = (uint64_t)stride * height;
@@ -152,8 +155,7 @@ WEBP_NODISCARD static int AllocateAlphaPlane(VP8Decoder* const dec,
  dec->alpha_plane_mem_ =
      (uint8_t*)WebPSafeMalloc(alpha_size, sizeof(*dec->alpha_plane_));
  if (dec->alpha_plane_mem_ == NULL) {
-    return VP8SetError(dec, VP8_STATUS_OUT_OF_MEMORY,
-                       "Alpha decoder initialization failed.");
+    return 0;
  }
  dec->alpha_plane_ = dec->alpha_plane_mem_;
  dec->alpha_prev_line_ = NULL;
@@ -172,9 +174,9 @@ void WebPDeallocateAlphaMemory(VP8Decoder* const dec) {
 //------------------------------------------------------------------------------
 // Main entry point.

-WEBP_NODISCARD const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
-                                                     const VP8Io* const io,
-                                                     int row, int num_rows) {
+const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
+                                      const VP8Io* const io,
+                                      int row, int num_rows) {
  const int width = io->width;
  const int height = io->crop_bottom;

@@ -187,19 +189,10 @@ WEBP_NODISCARD const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
  if (!dec->is_alpha_decoded_) {
    if (dec->alph_dec_ == NULL) {    // Initialize decoder.
      dec->alph_dec_ = ALPHNew();
-      if (dec->alph_dec_ == NULL) {
-        VP8SetError(dec, VP8_STATUS_OUT_OF_MEMORY,
-                    "Alpha decoder initialization failed.");
-        return NULL;
-      }
+      if (dec->alph_dec_ == NULL) return NULL;
      if (!AllocateAlphaPlane(dec, io)) goto Error;
      if (!ALPHInit(dec->alph_dec_, dec->alpha_data_, dec->alpha_data_size_,
                    io, dec->alpha_plane_)) {
-        VP8LDecoder* const vp8l_dec = dec->alph_dec_->vp8l_dec_;
-        VP8SetError(dec,
-                    (vp8l_dec == NULL) ? VP8_STATUS_OUT_OF_MEMORY
-                                       : vp8l_dec->status_,
-                    "Alpha decoder initialization failed.");
        goto Error;
      }
      // if we allowed use of alpha dithering, check whether it's needed at all
--- a/src/dec/buffer_dec.c
+++ b/src/dec/buffer_dec.c
@@ -75,7 +75,7 @@ static VP8StatusCode CheckDecBuffer(const WebPDecBuffer* const buffer) {
    const WebPRGBABuffer* const buf = &buffer->u.RGBA;
    const int stride = abs(buf->stride);
    const uint64_t size =
-        MIN_BUFFER_SIZE((uint64_t)width * kModeBpp[mode], height, stride);
+        MIN_BUFFER_SIZE(width * kModeBpp[mode], height, stride);
    ok &= (size <= buf->size);
    ok &= (stride >= width * kModeBpp[mode]);
    ok &= (buf->rgba != NULL);
--- a/src/dec/idec_dec.c
+++ b/src/dec/idec_dec.c
@@ -17,10 +17,8 @@

 #include "src/dec/alphai_dec.h"
 #include "src/dec/webpi_dec.h"
-#include "src/dec/vp8_dec.h"
 #include "src/dec/vp8i_dec.h"
 #include "src/utils/utils.h"
-#include "src/webp/decode.h"

 // In append mode, buffer allocations increase as multiples of this value.
 // Needs to be a power of 2.
@@ -163,9 +161,8 @@ static void DoRemap(WebPIDecoder* const idec, ptrdiff_t offset) {

 // Appends data to the end of MemBuffer->buf_. It expands the allocated memory
 // size if required and also updates VP8BitReader's if new memory is allocated.
-WEBP_NODISCARD static int AppendToMemBuffer(WebPIDecoder* const idec,
-                                            const uint8_t* const data,
-                                            size_t data_size) {
+static int AppendToMemBuffer(WebPIDecoder* const idec,
+                             const uint8_t* const data, size_t data_size) {
  VP8Decoder* const dec = (VP8Decoder*)idec->dec_;
  MemBuffer* const mem = &idec->mem_;
  const int need_compressed_alpha = NeedCompressedAlpha(idec);
@@ -206,9 +203,8 @@ WEBP_NODISCARD static int AppendToMemBuffer(WebPIDecoder* const idec,
  return 1;
 }

-WEBP_NODISCARD static int RemapMemBuffer(WebPIDecoder* const idec,
-                                         const uint8_t* const data,
-                                         size_t data_size) {
+static int RemapMemBuffer(WebPIDecoder* const idec,
+                          const uint8_t* const data, size_t data_size) {
  MemBuffer* const mem = &idec->mem_;
  const uint8_t* const old_buf = mem->buf_;
  const uint8_t* const old_start =
@@ -241,8 +237,7 @@ static void ClearMemBuffer(MemBuffer* const mem) {
  }
 }

-WEBP_NODISCARD static int CheckMemBufferMode(MemBuffer* const mem,
-                                             MemBufferMode expected) {
+static int CheckMemBufferMode(MemBuffer* const mem, MemBufferMode expected) {
  if (mem->mode_ == MEM_MODE_NONE) {
    mem->mode_ = expected;    // switch to the expected mode
  } else if (mem->mode_ != expected) {
@@ -253,7 +248,7 @@ WEBP_NODISCARD static int CheckMemBufferMode(MemBuffer* const mem,
 }

 // To be called last.
-WEBP_NODISCARD static VP8StatusCode FinishDecoding(WebPIDecoder* const idec) {
+static VP8StatusCode FinishDecoding(WebPIDecoder* const idec) {
  const WebPDecoderOptions* const options = idec->params_.options;
  WebPDecBuffer* const output = idec->params_.output;

@@ -263,10 +258,8 @@ WEBP_NODISCARD static VP8StatusCode FinishDecoding(WebPIDecoder* const idec) {
    if (status != VP8_STATUS_OK) return status;
  }
  if (idec->final_output_ != NULL) {
-    const VP8StatusCode status = WebPCopyDecBufferPixels(
-        output, idec->final_output_);  // do the slow-copy
+    WebPCopyDecBufferPixels(output, idec->final_output_);  // do the slow-copy
    WebPFreeDecBuffer(&idec->output_);
-    if (status != VP8_STATUS_OK) return status;
    *output = *idec->final_output_;
    idec->final_output_ = NULL;
  }
@@ -295,7 +288,7 @@ static void RestoreContext(const MBContext* context, VP8Decoder* const dec,
 static VP8StatusCode IDecError(WebPIDecoder* const idec, VP8StatusCode error) {
  if (idec->state_ == STATE_VP8_DATA) {
    // Synchronize the thread, clean-up and check for errors.
-    (void)VP8ExitCritical((VP8Decoder*)idec->dec_, &idec->io_);
+    VP8ExitCritical((VP8Decoder*)idec->dec_, &idec->io_);
  }
  idec->state_ = STATE_ERROR;
  return error;
@@ -336,7 +329,6 @@ static VP8StatusCode DecodeWebPHeaders(WebPIDecoder* const idec) {
    if (dec == NULL) {
      return VP8_STATUS_OUT_OF_MEMORY;
    }
-    dec->incremental_ = 1;
    idec->dec_ = dec;
    dec->alpha_data_ = headers.alpha_data;
    dec->alpha_data_size_ = headers.alpha_data_size;
@@ -609,9 +601,8 @@ static VP8StatusCode IDecode(WebPIDecoder* idec) {
 //------------------------------------------------------------------------------
 // Internal constructor

-WEBP_NODISCARD static WebPIDecoder* NewDecoder(
-    WebPDecBuffer* const output_buffer,
-    const WebPBitstreamFeatures* const features) {
+static WebPIDecoder* NewDecoder(WebPDecBuffer* const output_buffer,
+                                const WebPBitstreamFeatures* const features) {
  WebPIDecoder* idec = (WebPIDecoder*)WebPSafeCalloc(1ULL, sizeof(*idec));
  if (idec == NULL) {
    return NULL;
@@ -623,10 +614,8 @@ WEBP_NODISCARD static WebPIDecoder* NewDecoder(
  idec->last_mb_y_ = -1;

  InitMemBuffer(&idec->mem_);
-  if (!WebPInitDecBuffer(&idec->output_) || !VP8InitIo(&idec->io_)) {
-    WebPSafeFree(idec);
-    return NULL;
-  }
+  WebPInitDecBuffer(&idec->output_);
+  VP8InitIo(&idec->io_);

  WebPResetDecParams(&idec->params_);
  if (output_buffer == NULL || WebPAvoidSlowMemory(output_buffer, features)) {
@@ -685,8 +674,7 @@ void WebPIDelete(WebPIDecoder* idec) {
    if (!idec->is_lossless_) {
      if (idec->state_ == STATE_VP8_DATA) {
        // Synchronize the thread, clean-up and check for errors.
-        // TODO(vrabaud) do we care about the return result?
-        (void)VP8ExitCritical((VP8Decoder*)idec->dec_, &idec->io_);
+        VP8ExitCritical((VP8Decoder*)idec->dec_, &idec->io_);
      }
      VP8Delete((VP8Decoder*)idec->dec_);
    } else {
@@ -863,8 +851,8 @@ const WebPDecBuffer* WebPIDecodedArea(const WebPIDecoder* idec,
  return src;
 }

-WEBP_NODISCARD uint8_t* WebPIDecGetRGB(const WebPIDecoder* idec, int* last_y,
-                                       int* width, int* height, int* stride) {
+uint8_t* WebPIDecGetRGB(const WebPIDecoder* idec, int* last_y,
+                        int* width, int* height, int* stride) {
  const WebPDecBuffer* const src = GetOutputBuffer(idec);
  if (src == NULL) return NULL;
  if (src->colorspace >= MODE_YUV) {
@@ -879,10 +867,10 @@ WEBP_NODISCARD uint8_t* WebPIDecGetRGB(const WebPIDecoder* idec, int* last_y,
  return src->u.RGBA.rgba;
 }

-WEBP_NODISCARD uint8_t* WebPIDecGetYUVA(const WebPIDecoder* idec, int* last_y,
-                                        uint8_t** u, uint8_t** v, uint8_t** a,
-                                        int* width, int* height, int* stride,
-                                        int* uv_stride, int* a_stride) {
+uint8_t* WebPIDecGetYUVA(const WebPIDecoder* idec, int* last_y,
+                         uint8_t** u, uint8_t** v, uint8_t** a,
+                         int* width, int* height,
+                         int* stride, int* uv_stride, int* a_stride) {
  const WebPDecBuffer* const src = GetOutputBuffer(idec);
  if (src == NULL) return NULL;
  if (src->colorspace < MODE_YUV) {
--- a/src/dec/tree_dec.c
+++ b/src/dec/tree_dec.c
@@ -12,11 +12,10 @@
 // Author: Skal (pascal.massimino@gmail.com)

 #include "src/dec/vp8i_dec.h"
-#include "src/dsp/cpu.h"
 #include "src/utils/bit_reader_inl_utils.h"

 #if !defined(USE_GENERIC_TREE)
-#if !defined(__arm__) && !defined(_M_ARM) && !WEBP_AARCH64
+#if !defined(__arm__) && !defined(_M_ARM) && !defined(__aarch64__)
 // using a table is ~1-2% slower on ARM. Prefer the coded-tree approach then.
 #define USE_GENERIC_TREE 1   // ALTERNATE_CODE
 #else
--- a/src/dec/vp8_dec.c
+++ b/src/dec/vp8_dec.c
@@ -86,8 +86,6 @@ void VP8Delete(VP8Decoder* const dec) {

 int VP8SetError(VP8Decoder* const dec,
                VP8StatusCode error, const char* const msg) {
-  // VP8_STATUS_SUSPENDED is only meaningful in incremental decoding.
-  assert(dec->incremental_ || error != VP8_STATUS_SUSPENDED);
  // The oldest error reported takes precedence over the new one.
  if (dec->status_ == VP8_STATUS_OK) {
    dec->status_ = error;
@@ -192,12 +190,12 @@ static int ParseSegmentHeader(VP8BitReader* br,
 }

 // Paragraph 9.5
-// If we don't have all the necessary data in 'buf', this function returns
-// VP8_STATUS_SUSPENDED in incremental decoding, VP8_STATUS_NOT_ENOUGH_DATA
-// otherwise.
-// In incremental decoding, this case is not necessarily an error. Still, no
-// bitreader is ever initialized to make it possible to read unavailable memory.
-// If we don't even have the partitions' sizes, then VP8_STATUS_NOT_ENOUGH_DATA
+// This function returns VP8_STATUS_SUSPENDED if we don't have all the
+// necessary data in 'buf'.
+// This case is not necessarily an error (for incremental decoding).
+// Still, no bitreader is ever initialized to make it possible to read
+// unavailable memory.
+// If we don't even have the partitions' sizes, then VP8_STATUS_NOT_ENOUGH_DATA
 // is returned, and this is an unrecoverable error.
 // If the partitions were positioned ok, VP8_STATUS_OK is returned.
 static VP8StatusCode ParsePartitions(VP8Decoder* const dec,
@@ -227,10 +225,8 @@ static VP8StatusCode ParsePartitions(VP8Decoder* const dec,
    sz += 3;
  }
  VP8InitBitReader(dec->parts_ + last_part, part_start, size_left);
-  if (part_start < buf_end) return VP8_STATUS_OK;
-  return dec->incremental_
-             ? VP8_STATUS_SUSPENDED  // Init is ok, but there's not enough data
-             : VP8_STATUS_NOT_ENOUGH_DATA;
+  return (part_start < buf_end) ? VP8_STATUS_OK :
+           VP8_STATUS_SUSPENDED;   // Init is ok, but there's not enough data
 }

 // Paragraph 9.4
@@ -498,8 +494,6 @@ static int GetCoeffsAlt(VP8BitReader* const br,
  return 16;
 }

-extern VP8CPUInfo VP8GetCPUInfo;
-
 WEBP_DSP_INIT_FUNC(InitGetCoeffs) {
  if (VP8GetCPUInfo != NULL && VP8GetCPUInfo(kSlowSSSE3)) {
    GetCoeffs = GetCoeffsAlt;
--- a/src/dec/vp8_dec.h
+++ b/src/dec/vp8_dec.h
@@ -15,7 +15,6 @@
 #define WEBP_DEC_VP8_DEC_H_

 #include "src/webp/decode.h"
-#include "src/webp/types.h"

 #ifdef __cplusplus
 extern "C" {
@@ -109,14 +108,16 @@ struct VP8Io {
 };

 // Internal, version-checked, entry point
-WEBP_NODISCARD int VP8InitIoInternal(VP8Io* const, int);
+int VP8InitIoInternal(VP8Io* const, int);

 // Set the custom IO function pointers and user-data. The setter for IO hooks
 // should be called before initiating incremental decoding. Returns true if
 // WebPIDecoder object is successfully modified, false otherwise.
-WEBP_NODISCARD int WebPISetIOHooks(WebPIDecoder* const idec, VP8IoPutHook put,
-                                   VP8IoSetupHook setup,
-                                   VP8IoTeardownHook teardown, void* user_data);
+int WebPISetIOHooks(WebPIDecoder* const idec,
+                    VP8IoPutHook put,
+                    VP8IoSetupHook setup,
+                    VP8IoTeardownHook teardown,
+                    void* user_data);

 // Main decoding object. This is an opaque structure.
 typedef struct VP8Decoder VP8Decoder;
@@ -127,17 +128,17 @@ VP8Decoder* VP8New(void);
 // Must be called to make sure 'io' is initialized properly.
 // Returns false in case of version mismatch. Upon such failure, no other
 // decoding function should be called (VP8Decode, VP8GetHeaders, ...)
-WEBP_NODISCARD static WEBP_INLINE int VP8InitIo(VP8Io* const io) {
+static WEBP_INLINE int VP8InitIo(VP8Io* const io) {
  return VP8InitIoInternal(io, WEBP_DECODER_ABI_VERSION);
 }

 // Decode the VP8 frame header. Returns true if ok.
 // Note: 'io->data' must be pointing to the start of the VP8 frame header.
-WEBP_NODISCARD int VP8GetHeaders(VP8Decoder* const dec, VP8Io* const io);
+int VP8GetHeaders(VP8Decoder* const dec, VP8Io* const io);

 // Decode a picture. Will call VP8GetHeaders() if it wasn't done already.
 // Returns false in case of error.
-WEBP_NODISCARD int VP8Decode(VP8Decoder* const dec, VP8Io* const io);
+int VP8Decode(VP8Decoder* const dec, VP8Io* const io);

 // Return current status of the decoder:
 VP8StatusCode VP8Status(VP8Decoder* const dec);
--- a/src/dec/vp8i_dec.h
+++ b/src/dec/vp8i_dec.h
@@ -21,7 +21,6 @@
 #include "src/utils/random_utils.h"
 #include "src/utils/thread_utils.h"
 #include "src/dsp/dsp.h"
-#include "src/webp/types.h"

 #ifdef __cplusplus
 extern "C" {
@@ -32,7 +31,7 @@ extern "C" {

 // version numbers
 #define DEC_MAJ_VERSION 1
-#define DEC_MIN_VERSION 4
+#define DEC_MIN_VERSION 3
 #define DEC_REV_VERSION 0

 // YUV-cache parameters. Cache is 32-bytes wide (= one cacheline).
@@ -187,7 +186,6 @@ struct VP8Decoder {

  // Main data source
  VP8BitReader br_;
-  int incremental_;  // if true, incremental decoding is expected

  // headers
  VP8FrameHeader   frm_hdr_;
@@ -283,7 +281,7 @@ int VP8ParseIntraModeRow(VP8BitReader* const br, VP8Decoder* const dec);
 void VP8ParseQuant(VP8Decoder* const dec);

 // in frame.c
-WEBP_NODISCARD int VP8InitFrame(VP8Decoder* const dec, VP8Io* const io);
+int VP8InitFrame(VP8Decoder* const dec, VP8Io* const io);
 // Call io->setup() and finish setting up scan parameters.
 // After this call returns, one must always call VP8ExitCritical() with the
 // same parameters. Both functions should be used in pair. Returns VP8_STATUS_OK
@@ -291,7 +289,7 @@ WEBP_NODISCARD int VP8InitFrame(VP8Decoder* const dec, VP8Io* const io);
 VP8StatusCode VP8EnterCritical(VP8Decoder* const dec, VP8Io* const io);
 // Must always be called in pair with VP8EnterCritical().
 // Returns false in case of error.
-WEBP_NODISCARD int VP8ExitCritical(VP8Decoder* const dec, VP8Io* const io);
+int VP8ExitCritical(VP8Decoder* const dec, VP8Io* const io);
 // Return the multi-threading method to use (0=off), depending
 // on options and bitstream size. Only for lossy decoding.
 int VP8GetThreadMethod(const WebPDecoderOptions* const options,
@@ -301,12 +299,11 @@ int VP8GetThreadMethod(const WebPDecoderOptions* const options,
 void VP8InitDithering(const WebPDecoderOptions* const options,
                      VP8Decoder* const dec);
 // Process the last decoded row (filtering + output).
-WEBP_NODISCARD int VP8ProcessRow(VP8Decoder* const dec, VP8Io* const io);
+int VP8ProcessRow(VP8Decoder* const dec, VP8Io* const io);
 // To be called at the start of a new scanline, to initialize predictors.
 void VP8InitScanline(VP8Decoder* const dec);
 // Decode one macroblock. Returns false if there is not enough data.
-WEBP_NODISCARD int VP8DecodeMB(VP8Decoder* const dec,
-                               VP8BitReader* const token_br);
+int VP8DecodeMB(VP8Decoder* const dec, VP8BitReader* const token_br);

 // in alpha.c
 const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
--- a/src/dec/vp8l_dec.c
+++ b/src/dec/vp8l_dec.c
@@ -12,7 +12,6 @@
 // Authors: Vikas Arora (vikaas.arora@gmail.com)
 //          Jyrki Alakuijala (jyrki@google.com)

-#include <assert.h>
 #include <stdlib.h>

 #include "src/dec/alphai_dec.h"
@@ -102,14 +101,6 @@ static const uint16_t kTableSize[12] = {
  FIXED_TABLE_SIZE + 2704
 };

-static int VP8LSetError(VP8LDecoder* const dec, VP8StatusCode error) {
-  // The oldest error reported takes precedence over the new one.
-  if (dec->status_ == VP8_STATUS_OK || dec->status_ == VP8_STATUS_SUSPENDED) {
-    dec->status_ = error;
-  }
-  return 0;
-}
-
 static int DecodeImageStream(int xsize, int ysize,
                             int is_level0,
                             VP8LDecoder* const dec,
@@ -310,7 +301,7 @@ static int ReadHuffmanCodeLengths(

 End:
  VP8LHuffmanTablesDeallocate(&tables);
-  if (!ok) return VP8LSetError(dec, VP8_STATUS_BITSTREAM_ERROR);
+  if (!ok) dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
  return ok;
 }

@@ -342,7 +333,10 @@ static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
    int i;
    int code_length_code_lengths[NUM_CODE_LENGTH_CODES] = { 0 };
    const int num_codes = VP8LReadBits(br, 4) + 4;
-    assert(num_codes <= NUM_CODE_LENGTH_CODES);
+    if (num_codes > NUM_CODE_LENGTH_CODES) {
+      dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
+      return 0;
+    }

    for (i = 0; i < num_codes; ++i) {
      code_length_code_lengths[kCodeLengthCodeOrder[i]] = VP8LReadBits(br, 3);
@@ -357,14 +351,15 @@ static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
                                 code_lengths, alphabet_size);
  }
  if (!ok || size == 0) {
-    return VP8LSetError(dec, VP8_STATUS_BITSTREAM_ERROR);
+    dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
+    return 0;
  }
  return size;
 }

 static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
                            int color_cache_bits, int allow_recursion) {
-  int i;
+  int i, j;
  VP8LBitReader* const br = &dec->br_;
  VP8LMetadata* const hdr = &dec->hdr_;
  uint32_t* huffman_image = NULL;
@@ -372,6 +367,9 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
  HuffmanTables* huffman_tables = &hdr->huffman_tables_;
  int num_htree_groups = 1;
  int num_htree_groups_max = 1;
+  int max_alphabet_size = 0;
+  int* code_lengths = NULL;
+  const int table_size = kTableSize[color_cache_bits];
  int* mapping = NULL;
  int ok = 0;

@@ -385,7 +383,7 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
    const int huffman_xsize = VP8LSubSampleSize(xsize, huffman_precision);
    const int huffman_ysize = VP8LSubSampleSize(ysize, huffman_precision);
    const int huffman_pixs = huffman_xsize * huffman_ysize;
-    if (!DecodeImageStream(huffman_xsize, huffman_ysize, /*is_level0=*/0, dec,
+    if (!DecodeImageStream(huffman_xsize, huffman_ysize, 0, dec,
                           &huffman_image)) {
      goto Error;
    }
@@ -409,7 +407,7 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
      // values [0, num_htree_groups)
      mapping = (int*)WebPSafeMalloc(num_htree_groups_max, sizeof(*mapping));
      if (mapping == NULL) {
-        VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+        dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
        goto Error;
      }
      // -1 means a value is unmapped, and therefore unused in the Huffman
@@ -428,52 +426,25 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,

  if (br->eos_) goto Error;

-  if (!ReadHuffmanCodesHelper(color_cache_bits, num_htree_groups,
-                              num_htree_groups_max, mapping, dec,
-                              huffman_tables, &htree_groups)) {
-    goto Error;
-  }
-  ok = 1;
-
-  // All OK. Finalize pointers.
-  hdr->huffman_image_ = huffman_image;
-  hdr->num_htree_groups_ = num_htree_groups;
-  hdr->htree_groups_ = htree_groups;
-
- Error:
-  WebPSafeFree(mapping);
-  if (!ok) {
-    WebPSafeFree(huffman_image);
-    VP8LHuffmanTablesDeallocate(huffman_tables);
-    VP8LHtreeGroupsFree(htree_groups);
-  }
-  return ok;
-}
-
-int ReadHuffmanCodesHelper(int color_cache_bits, int num_htree_groups,
-                           int num_htree_groups_max, const int* const mapping,
-                           VP8LDecoder* const dec,
-                           HuffmanTables* const huffman_tables,
-                           HTreeGroup** const htree_groups) {
-  int i, j, ok = 0;
-  const int max_alphabet_size =
-      kAlphabetSize[0] + ((color_cache_bits > 0) ? 1 << color_cache_bits : 0);
-  const int table_size = kTableSize[color_cache_bits];
-  int* code_lengths = NULL;
-
-  if ((mapping == NULL && num_htree_groups != num_htree_groups_max) ||
-      num_htree_groups > num_htree_groups_max) {
-    goto Error;
+  // Find maximum alphabet size for the htree group.
+  for (j = 0; j < HUFFMAN_CODES_PER_META_CODE; ++j) {
+    int alphabet_size = kAlphabetSize[j];
+    if (j == 0 && color_cache_bits > 0) {
+      alphabet_size += 1 << color_cache_bits;
+    }
+    if (max_alphabet_size < alphabet_size) {
+      max_alphabet_size = alphabet_size;
+    }
  }

-  code_lengths =
-      (int*)WebPSafeCalloc((uint64_t)max_alphabet_size, sizeof(*code_lengths));
-  *htree_groups = VP8LHtreeGroupsNew(num_htree_groups);
+  code_lengths = (int*)WebPSafeCalloc((uint64_t)max_alphabet_size,
+                                      sizeof(*code_lengths));
+  htree_groups = VP8LHtreeGroupsNew(num_htree_groups);

-  if (*htree_groups == NULL || code_lengths == NULL ||
+  if (htree_groups == NULL || code_lengths == NULL ||
      !VP8LHuffmanTablesAllocate(num_htree_groups * table_size,
                                 huffman_tables)) {
-    VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+    dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
    goto Error;
  }

@@ -493,7 +464,7 @@ int ReadHuffmanCodesHelper(int color_cache_bits, int num_htree_groups,
      }
    } else {
      HTreeGroup* const htree_group =
-          &(*htree_groups)[(mapping == NULL) ? i : mapping[i]];
+          &htree_groups[(mapping == NULL) ? i : mapping[i]];
      HuffmanCode** const htrees = htree_group->htrees;
      int size;
      int total_size = 0;
@@ -545,12 +516,18 @@ int ReadHuffmanCodesHelper(int color_cache_bits, int num_htree_groups,
  }
  ok = 1;

+  // All OK. Finalize pointers.
+  hdr->huffman_image_ = huffman_image;
+  hdr->num_htree_groups_ = num_htree_groups;
+  hdr->htree_groups_ = htree_groups;
+
 Error:
  WebPSafeFree(code_lengths);
+  WebPSafeFree(mapping);
  if (!ok) {
+    WebPSafeFree(huffman_image);
    VP8LHuffmanTablesDeallocate(huffman_tables);
-    VP8LHtreeGroupsFree(*htree_groups);
-    *htree_groups = NULL;
+    VP8LHtreeGroupsFree(htree_groups);
  }
  return ok;
 }
@@ -574,7 +551,8 @@ static int AllocateAndInitRescaler(VP8LDecoder* const dec, VP8Io* const io) {
                               scaled_data_size * sizeof(*scaled_data);
  uint8_t* memory = (uint8_t*)WebPSafeMalloc(memory_size, sizeof(*memory));
  if (memory == NULL) {
-    return VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+    dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
+    return 0;
  }
  assert(dec->rescaler_memory == NULL);
  dec->rescaler_memory = memory;
@@ -1108,10 +1086,12 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
 End:
  br->eos_ = VP8LIsEndOfStream(br);
  if (!ok || (br->eos_ && pos < end)) {
-    return VP8LSetError(
-        dec, br->eos_ ? VP8_STATUS_SUSPENDED : VP8_STATUS_BITSTREAM_ERROR);
+    ok = 0;
+    dec->status_ = br->eos_ ? VP8_STATUS_SUSPENDED
+                            : VP8_STATUS_BITSTREAM_ERROR;
+  } else {
+    dec->last_pixel_ = pos;
  }
-  dec->last_pixel_ = pos;
  return ok;
 }

@@ -1261,20 +1241,9 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
  }

  br->eos_ = VP8LIsEndOfStream(br);
-  // In incremental decoding:
-  // br->eos_ && src < src_last: if 'br' reached the end of the buffer and
-  // 'src_last' has not been reached yet, there is not enough data. 'dec' has to
-  // be reset until there is more data.
-  // !br->eos_ && src < src_last: this cannot happen as either the buffer is
-  // fully read, either enough has been read to reach 'src_last'.
-  // src >= src_last: 'src_last' is reached, all is fine. 'src' can actually go
-  // beyond 'src_last' in case the image is cropped and an LZ77 goes further.
-  // The buffer might have been enough or there is some left. 'br->eos_' does
-  // not matter.
-  assert(!dec->incremental_ || (br->eos_ && src < src_last) || src >= src_last);
-  if (dec->incremental_ && br->eos_ && src < src_last) {
+  if (dec->incremental_ && br->eos_ && src < src_end) {
    RestoreState(dec);
-  } else if ((dec->incremental_ && src >= src_last) || !br->eos_) {
+  } else if (!br->eos_) {
    // Process the remaining rows corresponding to last row-block.
    if (process_func != NULL) {
      process_func(dec, row > last_row ? last_row : row);
@@ -1289,7 +1258,8 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
  return 1;

 Error:
-  return VP8LSetError(dec, VP8_STATUS_BITSTREAM_ERROR);
+  dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
+  return 0;
 }

 // -----------------------------------------------------------------------------
@@ -1356,7 +1326,7 @@ static int ReadTransform(int* const xsize, int const* ysize,
                                               transform->bits_),
                             VP8LSubSampleSize(transform->ysize_,
                                               transform->bits_),
-                             /*is_level0=*/0, dec, &transform->data_);
+                             0, dec, &transform->data_);
      break;
    case COLOR_INDEXING_TRANSFORM: {
       const int num_colors = VP8LReadBits(br, 8) + 1;
@@ -1366,11 +1336,8 @@ static int ReadTransform(int* const xsize, int const* ysize,
                      : 3;
       *xsize = VP8LSubSampleSize(transform->xsize_, bits);
       transform->bits_ = bits;
-       ok = DecodeImageStream(num_colors, /*ysize=*/1, /*is_level0=*/0, dec,
-                              &transform->data_);
-       if (ok && !ExpandColorMap(num_colors, transform)) {
-         return VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
-       }
+       ok = DecodeImageStream(num_colors, 1, 0, dec, &transform->data_);
+       ok = ok && ExpandColorMap(num_colors, transform);
      break;
    }
    case SUBTRACT_GREEN_TRANSFORM:
@@ -1476,7 +1443,7 @@ static int DecodeImageStream(int xsize, int ysize,
    color_cache_bits = VP8LReadBits(br, 4);
    ok = (color_cache_bits >= 1 && color_cache_bits <= MAX_CACHE_BITS);
    if (!ok) {
-      VP8LSetError(dec, VP8_STATUS_BITSTREAM_ERROR);
+      dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
      goto End;
    }
  }
@@ -1485,7 +1452,7 @@ static int DecodeImageStream(int xsize, int ysize,
  ok = ok && ReadHuffmanCodes(dec, transform_xsize, transform_ysize,
                              color_cache_bits, is_level0);
  if (!ok) {
-    VP8LSetError(dec, VP8_STATUS_BITSTREAM_ERROR);
+    dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
    goto End;
  }

@@ -1493,7 +1460,8 @@ static int DecodeImageStream(int xsize, int ysize,
  if (color_cache_bits > 0) {
    hdr->color_cache_size_ = 1 << color_cache_bits;
    if (!VP8LColorCacheInit(&hdr->color_cache_, color_cache_bits)) {
-      ok = VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+      dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
+      ok = 0;
      goto End;
    }
  } else {
@@ -1510,7 +1478,8 @@ static int DecodeImageStream(int xsize, int ysize,
    const uint64_t total_size = (uint64_t)transform_xsize * transform_ysize;
    data = (uint32_t*)WebPSafeMalloc(total_size, sizeof(*data));
    if (data == NULL) {
-      ok = VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+      dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
+      ok = 0;
      goto End;
    }
  }
@@ -1555,7 +1524,8 @@ static int AllocateInternalBuffers32b(VP8LDecoder* const dec, int final_width) {
  dec->pixels_ = (uint32_t*)WebPSafeMalloc(total_num_pixels, sizeof(uint32_t));
  if (dec->pixels_ == NULL) {
    dec->argb_cache_ = NULL;    // for soundness
-    return VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+    dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
+    return 0;
  }
  dec->argb_cache_ = dec->pixels_ + num_pixels + cache_top_pixels;
  return 1;
@@ -1566,7 +1536,8 @@ static int AllocateInternalBuffers8b(VP8LDecoder* const dec) {
  dec->argb_cache_ = NULL;    // for soundness
  dec->pixels_ = (uint32_t*)WebPSafeMalloc(total_num_pixels, sizeof(uint8_t));
  if (dec->pixels_ == NULL) {
-    return VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+    dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
+    return 0;
  }
  return 1;
 }
@@ -1621,8 +1592,7 @@ int VP8LDecodeAlphaHeader(ALPHDecoder* const alph_dec,
  dec->status_ = VP8_STATUS_OK;
  VP8LInitBitReader(&dec->br_, data, data_size);

-  if (!DecodeImageStream(alph_dec->width_, alph_dec->height_, /*is_level0=*/1,
-                         dec, /*decoded_data=*/NULL)) {
+  if (!DecodeImageStream(alph_dec->width_, alph_dec->height_, 1, dec, NULL)) {
    goto Err;
  }

@@ -1677,24 +1647,22 @@ int VP8LDecodeHeader(VP8LDecoder* const dec, VP8Io* const io) {

  if (dec == NULL) return 0;
  if (io == NULL) {
-    return VP8LSetError(dec, VP8_STATUS_INVALID_PARAM);
+    dec->status_ = VP8_STATUS_INVALID_PARAM;
+    return 0;
  }

  dec->io_ = io;
  dec->status_ = VP8_STATUS_OK;
  VP8LInitBitReader(&dec->br_, io->data, io->data_size);
  if (!ReadImageInfo(&dec->br_, &width, &height, &has_alpha)) {
-    VP8LSetError(dec, VP8_STATUS_BITSTREAM_ERROR);
+    dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
    goto Error;
  }
  dec->state_ = READ_DIM;
  io->width = width;
  io->height = height;

-  if (!DecodeImageStream(width, height, /*is_level0=*/1, dec,
-                         /*decoded_data=*/NULL)) {
-    goto Error;
-  }
+  if (!DecodeImageStream(width, height, 1, dec, NULL)) goto Error;
  return 1;

 Error:
@@ -1724,7 +1692,7 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
    assert(dec->output_ != NULL);

    if (!WebPIoInitFromOptions(params->options, io, MODE_BGRA)) {
-      VP8LSetError(dec, VP8_STATUS_INVALID_PARAM);
+      dec->status_ = VP8_STATUS_INVALID_PARAM;
      goto Err;
    }

@@ -1734,7 +1702,7 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
    if (io->use_scaling && !AllocateAndInitRescaler(dec, io)) goto Err;
 #else
    if (io->use_scaling) {
-      VP8LSetError(dec, VP8_STATUS_INVALID_PARAM);
+      dec->status_ = VP8_STATUS_INVALID_PARAM;
      goto Err;
    }
 #endif
@@ -1752,7 +1720,7 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
          dec->hdr_.saved_color_cache_.colors_ == NULL) {
        if (!VP8LColorCacheInit(&dec->hdr_.saved_color_cache_,
                                dec->hdr_.color_cache_.hash_bits_)) {
-          VP8LSetError(dec, VP8_STATUS_OUT_OF_MEMORY);
+          dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
          goto Err;
        }
      }
--- a/src/dec/vp8li_dec.h
+++ b/src/dec/vp8li_dec.h
@@ -20,7 +20,6 @@
 #include "src/utils/bit_reader_utils.h"
 #include "src/utils/color_cache_utils.h"
 #include "src/utils/huffman_utils.h"
-#include "src/webp/types.h"

 #ifdef __cplusplus
 extern "C" {
@@ -100,26 +99,25 @@ struct ALPHDecoder;  // Defined in dec/alphai.h.

 // Decodes image header for alpha data stored using lossless compression.
 // Returns false in case of error.
-WEBP_NODISCARD int VP8LDecodeAlphaHeader(struct ALPHDecoder* const alph_dec,
-                                         const uint8_t* const data,
-                                         size_t data_size);
+int VP8LDecodeAlphaHeader(struct ALPHDecoder* const alph_dec,
+                          const uint8_t* const data, size_t data_size);

 // Decodes *at least* 'last_row' rows of alpha. If some of the initial rows are
 // already decoded in previous call(s), it will resume decoding from where it
 // was paused.
 // Returns false in case of bitstream error.
-WEBP_NODISCARD int VP8LDecodeAlphaImageStream(
-    struct ALPHDecoder* const alph_dec, int last_row);
+int VP8LDecodeAlphaImageStream(struct ALPHDecoder* const alph_dec,
+                               int last_row);

 // Allocates and initialize a new lossless decoder instance.
-WEBP_NODISCARD VP8LDecoder* VP8LNew(void);
+VP8LDecoder* VP8LNew(void);

 // Decodes the image header. Returns false in case of error.
-WEBP_NODISCARD int VP8LDecodeHeader(VP8LDecoder* const dec, VP8Io* const io);
+int VP8LDecodeHeader(VP8LDecoder* const dec, VP8Io* const io);

 // Decodes an image. It's required to decode the lossless header before calling
 // this function. Returns false in case of error, with updated dec->status_.
-WEBP_NODISCARD int VP8LDecodeImage(VP8LDecoder* const dec);
+int VP8LDecodeImage(VP8LDecoder* const dec);

 // Resets the decoder in its initial state, reclaiming memory.
 // Preserves the dec->status_ value.
@@ -128,18 +126,6 @@ void VP8LClear(VP8LDecoder* const dec);
 // Clears and deallocate a lossless decoder instance.
 void VP8LDelete(VP8LDecoder* const dec);

-// Helper function for reading the different Huffman codes and storing them in
-// 'huffman_tables' and 'htree_groups'.
-// If mapping is NULL 'num_htree_groups_max' must equal 'num_htree_groups'.
-// If it is not NULL, it maps 'num_htree_groups_max' indices to the
-// 'num_htree_groups' groups. If 'num_htree_groups_max' > 'num_htree_groups',
-// some of those indices map to -1. This is used for non-balanced codes to
-// limit memory usage.
-WEBP_NODISCARD int ReadHuffmanCodesHelper(
-    int color_cache_bits, int num_htree_groups, int num_htree_groups_max,
-    const int* const mapping, VP8LDecoder* const dec,
-    HuffmanTables* const huffman_tables, HTreeGroup** const htree_groups);
-
 //------------------------------------------------------------------------------

 #ifdef __cplusplus
--- a/src/dec/webp_dec.c
+++ b/src/dec/webp_dec.c
@@ -13,14 +13,11 @@

 #include <stdlib.h>

-#include "src/dec/vp8_dec.h"
 #include "src/dec/vp8i_dec.h"
 #include "src/dec/vp8li_dec.h"
 #include "src/dec/webpi_dec.h"
 #include "src/utils/utils.h"
 #include "src/webp/mux_types.h"  // ALPHA_FLAG
-#include "src/webp/decode.h"
-#include "src/webp/types.h"

 //------------------------------------------------------------------------------
 // RIFF layout is:
@@ -447,9 +444,8 @@ void WebPResetDecParams(WebPDecParams* const params) {
 // "Into" decoding variants

 // Main flow
-WEBP_NODISCARD static VP8StatusCode DecodeInto(const uint8_t* const data,
-                                               size_t data_size,
-                                               WebPDecParams* const params) {
+static VP8StatusCode DecodeInto(const uint8_t* const data, size_t data_size,
+                                WebPDecParams* const params) {
  VP8StatusCode status;
  VP8Io io;
  WebPHeaderStructure headers;
@@ -463,9 +459,7 @@ WEBP_NODISCARD static VP8StatusCode DecodeInto(const uint8_t* const data,
  }

  assert(params != NULL);
-  if (!VP8InitIo(&io)) {
-    return VP8_STATUS_INVALID_PARAM;
-  }
+  VP8InitIo(&io);
  io.data = headers.data + headers.offset;
  io.data_size = headers.data_size - headers.offset;
  WebPInitCustomIo(params, &io);  // Plug the I/O functions.
@@ -529,16 +523,17 @@ WEBP_NODISCARD static VP8StatusCode DecodeInto(const uint8_t* const data,
 }

 // Helpers
-WEBP_NODISCARD static uint8_t* DecodeIntoRGBABuffer(WEBP_CSP_MODE colorspace,
-                                                    const uint8_t* const data,
-                                                    size_t data_size,
-                                                    uint8_t* const rgba,
-                                                    int stride, size_t size) {
+static uint8_t* DecodeIntoRGBABuffer(WEBP_CSP_MODE colorspace,
+                                     const uint8_t* const data,
+                                     size_t data_size,
+                                     uint8_t* const rgba,
+                                     int stride, size_t size) {
  WebPDecParams params;
  WebPDecBuffer buf;
-  if (rgba == NULL || !WebPInitDecBuffer(&buf)) {
+  if (rgba == NULL) {
    return NULL;
  }
+  WebPInitDecBuffer(&buf);
  WebPResetDecParams(&params);
  params.output = &buf;
  buf.colorspace    = colorspace;
@@ -583,7 +578,8 @@ uint8_t* WebPDecodeYUVInto(const uint8_t* data, size_t data_size,
                           uint8_t* v, size_t v_size, int v_stride) {
  WebPDecParams params;
  WebPDecBuffer output;
-  if (luma == NULL || !WebPInitDecBuffer(&output)) return NULL;
+  if (luma == NULL) return NULL;
+  WebPInitDecBuffer(&output);
  WebPResetDecParams(&params);
  params.output = &output;
  output.colorspace      = MODE_YUV;
@@ -605,17 +601,13 @@ uint8_t* WebPDecodeYUVInto(const uint8_t* data, size_t data_size,

 //------------------------------------------------------------------------------

-WEBP_NODISCARD static uint8_t* Decode(WEBP_CSP_MODE mode,
-                                      const uint8_t* const data,
-                                      size_t data_size, int* const width,
-                                      int* const height,
-                                      WebPDecBuffer* const keep_info) {
+static uint8_t* Decode(WEBP_CSP_MODE mode, const uint8_t* const data,
+                       size_t data_size, int* const width, int* const height,
+                       WebPDecBuffer* const keep_info) {
  WebPDecParams params;
  WebPDecBuffer output;

-  if (!WebPInitDecBuffer(&output)) {
-    return NULL;
-  }
+  WebPInitDecBuffer(&output);
  WebPResetDecParams(&params);
  params.output = &output;
  output.colorspace = mode;
@@ -666,26 +658,19 @@ uint8_t* WebPDecodeBGRA(const uint8_t* data, size_t data_size,
 uint8_t* WebPDecodeYUV(const uint8_t* data, size_t data_size,
                       int* width, int* height, uint8_t** u, uint8_t** v,
                       int* stride, int* uv_stride) {
-  // data, width and height are checked by Decode().
-  if (u == NULL || v == NULL || stride == NULL || uv_stride == NULL) {
-    return NULL;
-  }
+  WebPDecBuffer output;   // only to preserve the side-infos
+  uint8_t* const out = Decode(MODE_YUV, data, data_size,
+                              width, height, &output);

-  {
-    WebPDecBuffer output;   // only to preserve the side-infos
-    uint8_t* const out = Decode(MODE_YUV, data, data_size,
-                                width, height, &output);
-
-    if (out != NULL) {
-      const WebPYUVABuffer* const buf = &output.u.YUVA;
-      *u = buf->u;
-      *v = buf->v;
-      *stride = buf->y_stride;
-      *uv_stride = buf->u_stride;
-      assert(buf->u_stride == buf->v_stride);
-    }
-    return out;
+  if (out != NULL) {
+    const WebPYUVABuffer* const buf = &output.u.YUVA;
+    *u = buf->u;
+    *v = buf->v;
+    *stride = buf->y_stride;
+    *uv_stride = buf->u_stride;
+    assert(buf->u_stride == buf->v_stride);
  }
+  return out;
 }

 static void DefaultFeatures(WebPBitstreamFeatures* const features) {
@@ -741,9 +726,7 @@ int WebPInitDecoderConfigInternal(WebPDecoderConfig* config,
  }
  memset(config, 0, sizeof(*config));
  DefaultFeatures(&config->input);
-  if (!WebPInitDecBuffer(&config->output)) {
-    return 0;
-  }
+  WebPInitDecBuffer(&config->output);
  return 1;
 }

@@ -782,9 +765,7 @@ VP8StatusCode WebPDecode(const uint8_t* data, size_t data_size,
  if (WebPAvoidSlowMemory(params.output, &config->input)) {
    // decoding to slow memory: use a temporary in-mem buffer to decode into.
    WebPDecBuffer in_mem_buffer;
-    if (!WebPInitDecBuffer(&in_mem_buffer)) {
-      return VP8_STATUS_INVALID_PARAM;
-    }
+    WebPInitDecBuffer(&in_mem_buffer);
    in_mem_buffer.colorspace = config->output.colorspace;
    in_mem_buffer.width = config->input.width;
    in_mem_buffer.height = config->input.height;
--- a/src/dec/webpi_dec.h
+++ b/src/dec/webpi_dec.h
@@ -20,7 +20,6 @@ extern "C" {

 #include "src/utils/rescaler_utils.h"
 #include "src/dec/vp8_dec.h"
-#include "src/webp/decode.h"

 //------------------------------------------------------------------------------
 // WebPDecParams: Decoding output parameters. Transient internal object.
@@ -88,9 +87,8 @@ void WebPInitCustomIo(WebPDecParams* const params, VP8Io* const io);

 // Setup crop_xxx fields, mb_w and mb_h in io. 'src_colorspace' refers
 // to the *compressed* format, not the output one.
-WEBP_NODISCARD int WebPIoInitFromOptions(
-    const WebPDecoderOptions* const options, VP8Io* const io,
-    WEBP_CSP_MODE src_colorspace);
+int WebPIoInitFromOptions(const WebPDecoderOptions* const options,
+                          VP8Io* const io, WEBP_CSP_MODE src_colorspace);

 //------------------------------------------------------------------------------
 // Internal functions regarding WebPDecBuffer memory (in buffer.c).
--- a/src/demux/Makefile.am
+++ b/src/demux/Makefile.am
@@ -13,6 +13,6 @@ noinst_HEADERS =
 noinst_HEADERS += ../webp/format_constants.h

 libwebpdemux_la_LIBADD = ../libwebp.la
-libwebpdemux_la_LDFLAGS = -no-undefined -version-info 2:15:0
+libwebpdemux_la_LDFLAGS = -no-undefined -version-info 2:12:0
 libwebpdemuxincludedir = $(includedir)/webp
 pkgconfig_DATA = libwebpdemux.pc
--- a/src/demux/anim_decode.c
+++ b/src/demux/anim_decode.c
@@ -20,7 +20,6 @@
 #include "src/utils/utils.h"
 #include "src/webp/decode.h"
 #include "src/webp/demux.h"
-#include "src/webp/types.h"

 #define NUM_CHANNELS 4

@@ -69,9 +68,8 @@ int WebPAnimDecoderOptionsInitInternal(WebPAnimDecoderOptions* dec_options,
  return 1;
 }

-WEBP_NODISCARD static int ApplyDecoderOptions(
-    const WebPAnimDecoderOptions* const dec_options,
-    WebPAnimDecoder* const dec) {
+static int ApplyDecoderOptions(const WebPAnimDecoderOptions* const dec_options,
+                               WebPAnimDecoder* const dec) {
  WEBP_CSP_MODE mode;
  WebPDecoderConfig* config = &dec->config_;
  assert(dec_options != NULL);
@@ -84,9 +82,7 @@ WEBP_NODISCARD static int ApplyDecoderOptions(
  dec->blend_func_ = (mode == MODE_RGBA || mode == MODE_BGRA)
                         ? &BlendPixelRowNonPremult
                         : &BlendPixelRowPremult;
-  if (!WebPInitDecoderConfig(config)) {
-    return 0;
-  }
+  WebPInitDecoderConfig(config);
  config->output.colorspace = mode;
  config->output.is_external_memory = 1;
  config->options.use_threads = dec_options->use_threads;
@@ -161,8 +157,8 @@ static int IsFullFrame(int width, int height, int canvas_width,
 }

 // Clear the canvas to transparent.
-WEBP_NODISCARD static int ZeroFillCanvas(uint8_t* buf, uint32_t canvas_width,
-                                         uint32_t canvas_height) {
+static int ZeroFillCanvas(uint8_t* buf, uint32_t canvas_width,
+                          uint32_t canvas_height) {
  const uint64_t size =
      (uint64_t)canvas_width * canvas_height * NUM_CHANNELS * sizeof(*buf);
  if (!CheckSizeOverflow(size)) return 0;
@@ -183,8 +179,8 @@ static void ZeroFillFrameRect(uint8_t* buf, int buf_stride, int x_offset,
 }

 // Copy width * height pixels from 'src' to 'dst'.
-WEBP_NODISCARD static int CopyCanvas(const uint8_t* src, uint8_t* dst,
-                                     uint32_t width, uint32_t height) {
+static int CopyCanvas(const uint8_t* src, uint8_t* dst,
+                      uint32_t width, uint32_t height) {
  const uint64_t size = (uint64_t)width * height * NUM_CHANNELS;
  if (!CheckSizeOverflow(size)) return 0;
  assert(src != NULL && dst != NULL);
@@ -428,9 +424,7 @@ int WebPAnimDecoderGetNext(WebPAnimDecoder* dec,
  WebPDemuxReleaseIterator(&dec->prev_iter_);
  dec->prev_iter_ = iter;
  dec->prev_frame_was_keyframe_ = is_key_frame;
-  if (!CopyCanvas(dec->curr_frame_, dec->prev_frame_disposed_, width, height)) {
-    goto Error;
-  }
+  CopyCanvas(dec->curr_frame_, dec->prev_frame_disposed_, width, height);
  if (dec->prev_iter_.dispose_method == WEBP_MUX_DISPOSE_BACKGROUND) {
    ZeroFillFrameRect(dec->prev_frame_disposed_, width * NUM_CHANNELS,
                      dec->prev_iter_.x_offset, dec->prev_iter_.y_offset,
--- a/src/demux/demux.c
+++ b/src/demux/demux.c
@@ -24,7 +24,7 @@
 #include "src/webp/format_constants.h"

 #define DMUX_MAJ_VERSION 1
-#define DMUX_MIN_VERSION 4
+#define DMUX_MIN_VERSION 3
 #define DMUX_REV_VERSION 0

 typedef struct {
--- a/src/demux/libwebpdemux.pc.in
+++ b/src/demux/libwebpdemux.pc.in
@@ -6,6 +6,6 @@ includedir=@includedir@
 Name: libwebpdemux
 Description: Library for parsing the WebP graphics format container
 Version: @PACKAGE_VERSION@
-Requires.private: libwebp >= 0.2.0
+Requires: libwebp >= 0.2.0
 Cflags: -I${includedir}
 Libs: -L${libdir} -l@webp_libname_prefix@webpdemux
--- a/src/demux/libwebpdemux.rc
+++ b/src/demux/libwebpdemux.rc
@@ -6,8 +6,8 @@
 LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US

 VS_VERSION_INFO VERSIONINFO
- FILEVERSION 1,0,4,0
- PRODUCTVERSION 1,0,4,0
+ FILEVERSION 1,0,3,0
+ PRODUCTVERSION 1,0,3,0
 FILEFLAGSMASK 0x3fL
 #ifdef _DEBUG
 FILEFLAGS 0x1L
@@ -24,12 +24,12 @@ BEGIN
        BEGIN
            VALUE "CompanyName", "Google, Inc."
            VALUE "FileDescription", "libwebpdemux DLL"
-            VALUE "FileVersion", "1.4.0"
+            VALUE "FileVersion", "1.3.0"
            VALUE "InternalName", "libwebpdemux.dll"
-            VALUE "LegalCopyright", "Copyright (C) 2024"
+            VALUE "LegalCopyright", "Copyright (C) 2022"
            VALUE "OriginalFilename", "libwebpdemux.dll"
            VALUE "ProductName", "WebP Image Demuxer"
-            VALUE "ProductVersion", "1.4.0"
+            VALUE "ProductVersion", "1.3.0"
        END
    END
    BLOCK "VarFileInfo"
--- a/src/dsp/alpha_processing.c
+++ b/src/dsp/alpha_processing.c
@@ -425,7 +425,6 @@ void (*WebPAlphaReplace)(uint32_t* src, int length, uint32_t color);
 //------------------------------------------------------------------------------
 // Init function

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void WebPInitAlphaProcessingMIPSdspR2(void);
 extern void WebPInitAlphaProcessingSSE2(void);
 extern void WebPInitAlphaProcessingSSE41(void);
--- a/src/dsp/alpha_processing_sse2.c
+++ b/src/dsp/alpha_processing_sse2.c
@@ -144,46 +144,6 @@ static int ExtractAlpha_SSE2(const uint8_t* WEBP_RESTRICT argb, int argb_stride,
  return (alpha_and == 0xff);
 }

-static void ExtractGreen_SSE2(const uint32_t* WEBP_RESTRICT argb,
-                              uint8_t* WEBP_RESTRICT alpha, int size) {
-  int i;
-  const __m128i mask = _mm_set1_epi32(0xff);
-  const __m128i* src = (const __m128i*)argb;
-
-  for (i = 0; i + 16 <= size; i += 16, src += 4) {
-    const __m128i a0 = _mm_loadu_si128(src + 0);
-    const __m128i a1 = _mm_loadu_si128(src + 1);
-    const __m128i a2 = _mm_loadu_si128(src + 2);
-    const __m128i a3 = _mm_loadu_si128(src + 3);
-    const __m128i b0 = _mm_srli_epi32(a0, 8);
-    const __m128i b1 = _mm_srli_epi32(a1, 8);
-    const __m128i b2 = _mm_srli_epi32(a2, 8);
-    const __m128i b3 = _mm_srli_epi32(a3, 8);
-    const __m128i c0 = _mm_and_si128(b0, mask);
-    const __m128i c1 = _mm_and_si128(b1, mask);
-    const __m128i c2 = _mm_and_si128(b2, mask);
-    const __m128i c3 = _mm_and_si128(b3, mask);
-    const __m128i d0 = _mm_packs_epi32(c0, c1);
-    const __m128i d1 = _mm_packs_epi32(c2, c3);
-    const __m128i e = _mm_packus_epi16(d0, d1);
-    // store
-    _mm_storeu_si128((__m128i*)&alpha[i], e);
-  }
-  if (i + 8 <= size) {
-    const __m128i a0 = _mm_loadu_si128(src + 0);
-    const __m128i a1 = _mm_loadu_si128(src + 1);
-    const __m128i b0 = _mm_srli_epi32(a0, 8);
-    const __m128i b1 = _mm_srli_epi32(a1, 8);
-    const __m128i c0 = _mm_and_si128(b0, mask);
-    const __m128i c1 = _mm_and_si128(b1, mask);
-    const __m128i d = _mm_packs_epi32(c0, c1);
-    const __m128i e = _mm_packus_epi16(d, d);
-    _mm_storel_epi64((__m128i*)&alpha[i], e);
-    i += 8;
-  }
-  for (; i < size; ++i) alpha[i] = argb[i] >> 8;
-}
-
 //------------------------------------------------------------------------------
 // Non-dither premultiplied modes

@@ -394,7 +354,6 @@ WEBP_TSAN_IGNORE_FUNCTION void WebPInitAlphaProcessingSSE2(void) {
  WebPDispatchAlpha = DispatchAlpha_SSE2;
  WebPDispatchAlphaToGreen = DispatchAlphaToGreen_SSE2;
  WebPExtractAlpha = ExtractAlpha_SSE2;
-  WebPExtractGreen = ExtractGreen_SSE2;

  WebPHasAlpha8b = HasAlpha8b_SSE2;
  WebPHasAlpha32b = HasAlpha32b_SSE2;
--- a/src/dsp/cost.c
+++ b/src/dsp/cost.c
@@ -374,7 +374,6 @@ static void SetResidualCoeffs_C(const int16_t* const coeffs,
 VP8GetResidualCostFunc VP8GetResidualCost;
 VP8SetResidualCoeffsFunc VP8SetResidualCoeffs;

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void VP8EncDspCostInitMIPS32(void);
 extern void VP8EncDspCostInitMIPSdspR2(void);
 extern void VP8EncDspCostInitSSE2(void);
--- a/src/dsp/cost_neon.c
+++ b/src/dsp/cost_neon.c
@@ -29,7 +29,7 @@ static void SetResidualCoeffs_NEON(const int16_t* const coeffs,
  const uint8x16_t eob = vcombine_u8(vqmovn_u16(eob_0), vqmovn_u16(eob_1));
  const uint8x16_t masked = vandq_u8(eob, vld1q_u8(position));

-#if WEBP_AARCH64
+#ifdef __aarch64__
  res->last = vmaxvq_u8(masked) - 1;
 #else
  const uint8x8_t eob_8x8 = vmax_u8(vget_low_u8(masked), vget_high_u8(masked));
@@ -43,7 +43,7 @@ static void SetResidualCoeffs_NEON(const int16_t* const coeffs,

  vst1_lane_s32(&res->last, vreinterpret_s32_u32(eob_32x2), 0);
  --res->last;
-#endif  // WEBP_AARCH64
+#endif  // __aarch64__

  res->coeffs = coeffs;
 }
--- a/src/dsp/cpu.c
+++ b/src/dsp/cpu.c
@@ -36,6 +36,18 @@ static WEBP_INLINE void GetCPUInfo(int cpu_info[4], int info_type) {
    : "=a"(cpu_info[0]), "=D"(cpu_info[1]), "=c"(cpu_info[2]), "=d"(cpu_info[3])
    : "a"(info_type), "c"(0));
 }
+#elif defined(__x86_64__) && \
+      (defined(__code_model_medium__) || defined(__code_model_large__)) && \
+      defined(__PIC__)
+static WEBP_INLINE void GetCPUInfo(int cpu_info[4], int info_type) {
+  __asm__ volatile (
+    "xchg{q}\t{%%rbx}, %q1\n"
+    "cpuid\n"
+    "xchg{q}\t{%%rbx}, %q1\n"
+    : "=a"(cpu_info[0]), "=&r"(cpu_info[1]), "=c"(cpu_info[2]),
+      "=d"(cpu_info[3])
+    : "a"(info_type), "c"(0));
+}
 #elif defined(__i386__) || defined(__x86_64__)
 static WEBP_INLINE void GetCPUInfo(int cpu_info[4], int info_type) {
  __asm__ volatile (
@@ -161,7 +173,6 @@ static int x86CPUInfo(CPUFeature feature) {
  }
  return 0;
 }
-WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
 VP8CPUInfo VP8GetCPUInfo = x86CPUInfo;
 #elif defined(WEBP_ANDROID_NEON)  // NB: needs to be before generic NEON test.
 static int AndroidCPUInfo(CPUFeature feature) {
@@ -173,7 +184,6 @@ static int AndroidCPUInfo(CPUFeature feature) {
  }
  return 0;
 }
-WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
 VP8CPUInfo VP8GetCPUInfo = AndroidCPUInfo;
 #elif defined(EMSCRIPTEN) // also needs to be before generic NEON test
 // Use compile flags as an indicator of SIMD support instead of a runtime check.
@@ -198,7 +208,6 @@ static int wasmCPUInfo(CPUFeature feature) {
  }
  return 0;
 }
-WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
 VP8CPUInfo VP8GetCPUInfo = wasmCPUInfo;
 #elif defined(WEBP_HAVE_NEON)
 // In most cases this function doesn't check for NEON support (it's assumed by
@@ -227,7 +236,6 @@ static int armCPUInfo(CPUFeature feature) {
  return 1;
 #endif
 }
-WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
 VP8CPUInfo VP8GetCPUInfo = armCPUInfo;
 #elif defined(WEBP_USE_MIPS32) || defined(WEBP_USE_MIPS_DSP_R2) || \
      defined(WEBP_USE_MSA)
@@ -239,9 +247,7 @@ static int mipsCPUInfo(CPUFeature feature) {
  }

 }
-WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
 VP8CPUInfo VP8GetCPUInfo = mipsCPUInfo;
 #else
-WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
 VP8CPUInfo VP8GetCPUInfo = NULL;
 #endif
--- a/src/dsp/cpu.h
+++ b/src/dsp/cpu.h
@@ -43,9 +43,6 @@
 #define __has_builtin(x) 0
 #endif

-//------------------------------------------------------------------------------
-// x86 defines.
-
 #if !defined(HAVE_CONFIG_H)
 #if defined(_MSC_VER) && _MSC_VER > 1310 && \
    (defined(_M_X64) || defined(_M_IX86))
@@ -83,9 +80,6 @@
 #undef WEBP_MSC_SSE41
 #undef WEBP_MSC_SSE2

-//------------------------------------------------------------------------------
-// Arm defines.
-
 // The intrinsics currently cause compiler errors with arm-nacl-gcc and the
 // inline assembly would need to be modified for use with Native Client.
 #if ((defined(__ARM_NEON__) || defined(__aarch64__)) &&       \
@@ -104,26 +98,16 @@
 // inclusion of arm64_neon.h; Visual Studio 2019 includes this file in
 // arm_neon.h. Compile errors were seen with Visual Studio 2019 16.4 with
 // vtbl4_u8(); a fix was made in 16.6.
-#if defined(_MSC_VER) && \
-    ((_MSC_VER >= 1700 && defined(_M_ARM)) || \
-     (_MSC_VER >= 1926 && (defined(_M_ARM64) || defined(_M_ARM64EC))))
+#if defined(_MSC_VER) && ((_MSC_VER >= 1700 && defined(_M_ARM)) || \
+                          (_MSC_VER >= 1926 && defined(_M_ARM64)))
 #define WEBP_USE_NEON
 #define WEBP_USE_INTRINSICS
 #endif

-#if defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
-#define WEBP_AARCH64 1
-#else
-#define WEBP_AARCH64 0
-#endif
-
 #if defined(WEBP_USE_NEON) && !defined(WEBP_HAVE_NEON)
 #define WEBP_HAVE_NEON
 #endif

-//------------------------------------------------------------------------------
-// MIPS defines.
-
 #if defined(__mips__) && !defined(__mips64) && defined(__mips_isa_rev) && \
    (__mips_isa_rev >= 1) && (__mips_isa_rev < 6)
 #define WEBP_USE_MIPS32
@@ -139,8 +123,6 @@
 #define WEBP_USE_MSA
 #endif

-//------------------------------------------------------------------------------
-
 #ifndef WEBP_DSP_OMIT_C_CODE
 #define WEBP_DSP_OMIT_C_CODE 1
 #endif
@@ -151,14 +133,13 @@
 #define WEBP_NEON_OMIT_C_CODE 0
 #endif

-#if !(LOCAL_CLANG_PREREQ(3, 8) || LOCAL_GCC_PREREQ(4, 8) || WEBP_AARCH64)
+#if !(LOCAL_CLANG_PREREQ(3, 8) || LOCAL_GCC_PREREQ(4, 8) || \
+      defined(__aarch64__))
 #define WEBP_NEON_WORK_AROUND_GCC 1
 #else
 #define WEBP_NEON_WORK_AROUND_GCC 0
 #endif

-//------------------------------------------------------------------------------
-
 // This macro prevents thread_sanitizer from reporting known concurrent writes.
 #define WEBP_TSAN_IGNORE_FUNCTION
 #if defined(__has_feature)
@@ -260,7 +241,16 @@ typedef enum {
  kMSA
 } CPUFeature;

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 // returns true if the CPU supports the feature.
 typedef int (*VP8CPUInfo)(CPUFeature feature);
+WEBP_EXTERN VP8CPUInfo VP8GetCPUInfo;
+
+#ifdef __cplusplus
+}    // extern "C"
+#endif

 #endif  // WEBP_DSP_CPU_H_
--- a/src/dsp/dec.c
+++ b/src/dsp/dec.c
@@ -37,6 +37,9 @@ static WEBP_INLINE uint8_t clip_8b(int v) {
  STORE(3, y, DC - (d));            \
 } while (0)

+#define MUL1(a) ((((a) * 20091) >> 16) + (a))
+#define MUL2(a) (((a) * 35468) >> 16)
+
 #if !WEBP_NEON_OMIT_C_CODE
 static void TransformOne_C(const int16_t* in, uint8_t* dst) {
  int C[4 * 4], *tmp;
@@ -45,10 +48,8 @@ static void TransformOne_C(const int16_t* in, uint8_t* dst) {
  for (i = 0; i < 4; ++i) {    // vertical pass
    const int a = in[0] + in[8];    // [-4096, 4094]
    const int b = in[0] - in[8];    // [-4095, 4095]
-    const int c = WEBP_TRANSFORM_AC3_MUL2(in[4]) -
-                  WEBP_TRANSFORM_AC3_MUL1(in[12]);  // [-3783, 3783]
-    const int d = WEBP_TRANSFORM_AC3_MUL1(in[4]) +
-                  WEBP_TRANSFORM_AC3_MUL2(in[12]);  // [-3785, 3781]
+    const int c = MUL2(in[4]) - MUL1(in[12]);   // [-3783, 3783]
+    const int d = MUL1(in[4]) + MUL2(in[12]);   // [-3785, 3781]
    tmp[0] = a + d;   // [-7881, 7875]
    tmp[1] = b + c;   // [-7878, 7878]
    tmp[2] = b - c;   // [-7878, 7878]
@@ -68,10 +69,8 @@ static void TransformOne_C(const int16_t* in, uint8_t* dst) {
    const int dc = tmp[0] + 4;
    const int a =  dc +  tmp[8];
    const int b =  dc -  tmp[8];
-    const int c =
-        WEBP_TRANSFORM_AC3_MUL2(tmp[4]) - WEBP_TRANSFORM_AC3_MUL1(tmp[12]);
-    const int d =
-        WEBP_TRANSFORM_AC3_MUL1(tmp[4]) + WEBP_TRANSFORM_AC3_MUL2(tmp[12]);
+    const int c = MUL2(tmp[4]) - MUL1(tmp[12]);
+    const int d = MUL1(tmp[4]) + MUL2(tmp[12]);
    STORE(0, 0, a + d);
    STORE(1, 0, b + c);
    STORE(2, 0, b - c);
@@ -84,15 +83,17 @@ static void TransformOne_C(const int16_t* in, uint8_t* dst) {
 // Simplified transform when only in[0], in[1] and in[4] are non-zero
 static void TransformAC3_C(const int16_t* in, uint8_t* dst) {
  const int a = in[0] + 4;
-  const int c4 = WEBP_TRANSFORM_AC3_MUL2(in[4]);
-  const int d4 = WEBP_TRANSFORM_AC3_MUL1(in[4]);
-  const int c1 = WEBP_TRANSFORM_AC3_MUL2(in[1]);
-  const int d1 = WEBP_TRANSFORM_AC3_MUL1(in[1]);
+  const int c4 = MUL2(in[4]);
+  const int d4 = MUL1(in[4]);
+  const int c1 = MUL2(in[1]);
+  const int d1 = MUL1(in[1]);
  STORE2(0, a + d4, d1, c1);
  STORE2(1, a + c4, d1, c1);
  STORE2(2, a - c4, d1, c1);
  STORE2(3, a - d4, d1, c1);
 }
+#undef MUL1
+#undef MUL2
 #undef STORE2

 static void TransformTwo_C(const int16_t* in, uint8_t* dst, int do_two) {
@@ -733,7 +734,6 @@ VP8SimpleFilterFunc VP8SimpleHFilter16i;
 void (*VP8DitherCombine8x8)(const uint8_t* dither, uint8_t* dst,
                            int dst_stride);

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void VP8DspInitSSE2(void);
 extern void VP8DspInitSSE41(void);
 extern void VP8DspInitNEON(void);
--- a/src/dsp/dec_mips32.c
+++ b/src/dsp/dec_mips32.c
@@ -18,8 +18,8 @@

 #include "src/dsp/mips_macro.h"

-static const int kC1 = WEBP_TRANSFORM_AC3_C1;
-static const int kC2 = WEBP_TRANSFORM_AC3_C2;
+static const int kC1 = 20091 + (1 << 16);
+static const int kC2 = 35468;

 static WEBP_INLINE int abs_mips32(int x) {
  const int sign = x >> 31;
@@ -219,7 +219,7 @@ static void TransformOne(const int16_t* in, uint8_t* dst) {
  int temp0, temp1, temp2, temp3, temp4;
  int temp5, temp6, temp7, temp8, temp9;
  int temp10, temp11, temp12, temp13, temp14;
-  int temp15, temp16, temp17, temp18, temp19;
+  int temp15, temp16, temp17, temp18;
  int16_t* p_in = (int16_t*)in;

  // loops unrolled and merged to avoid usage of tmp buffer
@@ -233,14 +233,16 @@ static void TransformOne(const int16_t* in, uint8_t* dst) {
    "addu     %[temp16], %[temp0],  %[temp8]           \n\t"
    "subu     %[temp0],  %[temp0],  %[temp8]           \n\t"
    "mul      %[temp8],  %[temp4],  %[kC2]             \n\t"
-    MUL_SHIFT_C1(temp17, temp12)
-    MUL_SHIFT_C1_IO(temp4, temp19)
+    "mul      %[temp17], %[temp12], %[kC1]             \n\t"
+    "mul      %[temp4],  %[temp4],  %[kC1]             \n\t"
    "mul      %[temp12], %[temp12], %[kC2]             \n\t"
    "lh       %[temp1],  2(%[in])                      \n\t"
    "lh       %[temp5],  10(%[in])                     \n\t"
    "lh       %[temp9],  18(%[in])                     \n\t"
    "lh       %[temp13], 26(%[in])                     \n\t"
    "sra      %[temp8],  %[temp8],  16                 \n\t"
+    "sra      %[temp17], %[temp17], 16                 \n\t"
+    "sra      %[temp4],  %[temp4],  16                 \n\t"
    "sra      %[temp12], %[temp12], 16                 \n\t"
    "lh       %[temp2],  4(%[in])                      \n\t"
    "lh       %[temp6],  12(%[in])                     \n\t"
@@ -259,43 +261,49 @@ static void TransformOne(const int16_t* in, uint8_t* dst) {
    "addu     %[temp12], %[temp0],  %[temp17]          \n\t"
    "subu     %[temp0],  %[temp0],  %[temp17]          \n\t"
    "mul      %[temp9],  %[temp5],  %[kC2]             \n\t"
-    MUL_SHIFT_C1(temp17, temp13)
-    MUL_SHIFT_C1_IO(temp5, temp19)
+    "mul      %[temp17], %[temp13], %[kC1]             \n\t"
+    "mul      %[temp5],  %[temp5],  %[kC1]             \n\t"
    "mul      %[temp13], %[temp13], %[kC2]             \n\t"
    "sra      %[temp9],  %[temp9],  16                 \n\t"
+    "sra      %[temp17], %[temp17], 16                 \n\t"
    "subu     %[temp17], %[temp9],  %[temp17]          \n\t"
+    "sra      %[temp5],  %[temp5],  16                 \n\t"
    "sra      %[temp13], %[temp13], 16                 \n\t"
    "addu     %[temp5],  %[temp5],  %[temp13]          \n\t"
    "addu     %[temp13], %[temp1],  %[temp17]          \n\t"
    "subu     %[temp1],  %[temp1],  %[temp17]          \n\t"
-    MUL_SHIFT_C1(temp17, temp14)
+    "mul      %[temp17], %[temp14], %[kC1]             \n\t"
    "mul      %[temp14], %[temp14], %[kC2]             \n\t"
    "addu     %[temp9],  %[temp16], %[temp5]           \n\t"
    "subu     %[temp5],  %[temp16], %[temp5]           \n\t"
    "addu     %[temp16], %[temp2],  %[temp10]          \n\t"
    "subu     %[temp2],  %[temp2],  %[temp10]          \n\t"
    "mul      %[temp10], %[temp6],  %[kC2]             \n\t"
-    MUL_SHIFT_C1_IO(temp6, temp19)
+    "mul      %[temp6],  %[temp6],  %[kC1]             \n\t"
+    "sra      %[temp17], %[temp17], 16                 \n\t"
    "sra      %[temp14], %[temp14], 16                 \n\t"
    "sra      %[temp10], %[temp10], 16                 \n\t"
+    "sra      %[temp6],  %[temp6],  16                 \n\t"
    "subu     %[temp17], %[temp10], %[temp17]          \n\t"
    "addu     %[temp6],  %[temp6],  %[temp14]          \n\t"
    "addu     %[temp10], %[temp16], %[temp6]           \n\t"
    "subu     %[temp6],  %[temp16], %[temp6]           \n\t"
    "addu     %[temp14], %[temp2],  %[temp17]          \n\t"
    "subu     %[temp2],  %[temp2],  %[temp17]          \n\t"
-    MUL_SHIFT_C1(temp17, temp15)
+    "mul      %[temp17], %[temp15], %[kC1]             \n\t"
    "mul      %[temp15], %[temp15], %[kC2]             \n\t"
    "addu     %[temp16], %[temp3],  %[temp11]          \n\t"
    "subu     %[temp3],  %[temp3],  %[temp11]          \n\t"
    "mul      %[temp11], %[temp7],  %[kC2]             \n\t"
-    MUL_SHIFT_C1_IO(temp7, temp19)
+    "mul      %[temp7],  %[temp7],  %[kC1]             \n\t"
    "addiu    %[temp8],  %[temp8],  4                  \n\t"
    "addiu    %[temp12], %[temp12], 4                  \n\t"
    "addiu    %[temp0],  %[temp0],  4                  \n\t"
    "addiu    %[temp4],  %[temp4],  4                  \n\t"
+    "sra      %[temp17], %[temp17], 16                 \n\t"
    "sra      %[temp15], %[temp15], 16                 \n\t"
    "sra      %[temp11], %[temp11], 16                 \n\t"
+    "sra      %[temp7],  %[temp7],  16                 \n\t"
    "subu     %[temp17], %[temp11], %[temp17]          \n\t"
    "addu     %[temp7],  %[temp7],  %[temp15]          \n\t"
    "addu     %[temp15], %[temp3],  %[temp17]          \n\t"
@@ -305,40 +313,48 @@ static void TransformOne(const int16_t* in, uint8_t* dst) {
    "addu     %[temp16], %[temp8],  %[temp10]          \n\t"
    "subu     %[temp8],  %[temp8],  %[temp10]          \n\t"
    "mul      %[temp10], %[temp9],  %[kC2]             \n\t"
-    MUL_SHIFT_C1(temp17, temp11)
-    MUL_SHIFT_C1_IO(temp9, temp19)
+    "mul      %[temp17], %[temp11], %[kC1]             \n\t"
+    "mul      %[temp9],  %[temp9],  %[kC1]             \n\t"
    "mul      %[temp11], %[temp11], %[kC2]             \n\t"
    "sra      %[temp10], %[temp10], 16                 \n\t"
+    "sra      %[temp17], %[temp17], 16                 \n\t"
+    "sra      %[temp9],  %[temp9],  16                 \n\t"
    "sra      %[temp11], %[temp11], 16                 \n\t"
    "subu     %[temp17], %[temp10], %[temp17]          \n\t"
    "addu     %[temp11], %[temp9],  %[temp11]          \n\t"
    "addu     %[temp10], %[temp12], %[temp14]          \n\t"
    "subu     %[temp12], %[temp12], %[temp14]          \n\t"
    "mul      %[temp14], %[temp13], %[kC2]             \n\t"
-    MUL_SHIFT_C1(temp9, temp15)
-    MUL_SHIFT_C1_IO(temp13, temp19)
+    "mul      %[temp9],  %[temp15], %[kC1]             \n\t"
+    "mul      %[temp13], %[temp13], %[kC1]             \n\t"
    "mul      %[temp15], %[temp15], %[kC2]             \n\t"
    "sra      %[temp14], %[temp14], 16                 \n\t"
+    "sra      %[temp9],  %[temp9],  16                 \n\t"
+    "sra      %[temp13], %[temp13], 16                 \n\t"
    "sra      %[temp15], %[temp15], 16                 \n\t"
    "subu     %[temp9],  %[temp14], %[temp9]           \n\t"
    "addu     %[temp15], %[temp13], %[temp15]          \n\t"
    "addu     %[temp14], %[temp0],  %[temp2]           \n\t"
    "subu     %[temp0],  %[temp0],  %[temp2]           \n\t"
    "mul      %[temp2],  %[temp1],  %[kC2]             \n\t"
-    MUL_SHIFT_C1(temp13, temp3)
-    MUL_SHIFT_C1_IO(temp1, temp19)
+    "mul      %[temp13], %[temp3],  %[kC1]             \n\t"
+    "mul      %[temp1],  %[temp1],  %[kC1]             \n\t"
    "mul      %[temp3],  %[temp3],  %[kC2]             \n\t"
    "sra      %[temp2],  %[temp2],  16                 \n\t"
+    "sra      %[temp13], %[temp13], 16                 \n\t"
+    "sra      %[temp1],  %[temp1],  16                 \n\t"
    "sra      %[temp3],  %[temp3],  16                 \n\t"
    "subu     %[temp13], %[temp2],  %[temp13]          \n\t"
    "addu     %[temp3],  %[temp1],  %[temp3]           \n\t"
    "addu     %[temp2],  %[temp4],  %[temp6]           \n\t"
    "subu     %[temp4],  %[temp4],  %[temp6]           \n\t"
    "mul      %[temp6],  %[temp5],  %[kC2]             \n\t"
-    MUL_SHIFT_C1(temp1, temp7)
-    MUL_SHIFT_C1_IO(temp5, temp19)
+    "mul      %[temp1],  %[temp7],  %[kC1]             \n\t"
+    "mul      %[temp5],  %[temp5],  %[kC1]             \n\t"
    "mul      %[temp7],  %[temp7],  %[kC2]             \n\t"
    "sra      %[temp6],  %[temp6],  16                 \n\t"
+    "sra      %[temp1],  %[temp1],  16                 \n\t"
+    "sra      %[temp5],  %[temp5],  16                 \n\t"
    "sra      %[temp7],  %[temp7],  16                 \n\t"
    "subu     %[temp1],  %[temp6],  %[temp1]           \n\t"
    "addu     %[temp7],  %[temp5],  %[temp7]           \n\t"
@@ -526,7 +542,7 @@ static void TransformOne(const int16_t* in, uint8_t* dst) {
      [temp9]"=&r"(temp9), [temp10]"=&r"(temp10), [temp11]"=&r"(temp11),
      [temp12]"=&r"(temp12), [temp13]"=&r"(temp13), [temp14]"=&r"(temp14),
      [temp15]"=&r"(temp15), [temp16]"=&r"(temp16), [temp17]"=&r"(temp17),
-      [temp18]"=&r"(temp18), [temp19]"=&r"(temp19)
+      [temp18]"=&r"(temp18)
    : [in]"r"(p_in), [kC1]"r"(kC1), [kC2]"r"(kC2), [dst]"r"(dst)
    : "memory", "hi", "lo"
  );
--- a/src/dsp/dec_mips_dsp_r2.c
+++ b/src/dsp/dec_mips_dsp_r2.c
@@ -18,8 +18,10 @@

 #include "src/dsp/mips_macro.h"

-static const int kC1 = WEBP_TRANSFORM_AC3_C1;
-static const int kC2 = WEBP_TRANSFORM_AC3_C2;
+static const int kC1 = 20091 + (1 << 16);
+static const int kC2 = 35468;
+
+#define MUL(a, b) (((a) * (b)) >> 16)

 static void TransformDC(const int16_t* in, uint8_t* dst) {
  int temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8, temp9, temp10;
@@ -47,10 +49,10 @@ static void TransformDC(const int16_t* in, uint8_t* dst) {

 static void TransformAC3(const int16_t* in, uint8_t* dst) {
  const int a = in[0] + 4;
-  int c4 = WEBP_TRANSFORM_AC3_MUL2(in[4]);
-  const int d4 = WEBP_TRANSFORM_AC3_MUL1(in[4]);
-  const int c1 = WEBP_TRANSFORM_AC3_MUL2(in[1]);
-  const int d1 = WEBP_TRANSFORM_AC3_MUL1(in[1]);
+  int c4 = MUL(in[4], kC2);
+  const int d4 = MUL(in[4], kC1);
+  const int c1 = MUL(in[1], kC2);
+  const int d1 = MUL(in[1], kC1);
  int temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8, temp9;
  int temp10, temp11, temp12, temp13, temp14, temp15, temp16, temp17, temp18;

@@ -477,6 +479,8 @@ static void HFilter8i(uint8_t* u, uint8_t* v, int stride,
  FilterLoop24(v + 4, 1, stride, 8, thresh, ithresh, hev_thresh);
 }

+#undef MUL
+
 //------------------------------------------------------------------------------
 // Simple In-loop filtering (Paragraph 15.2)

--- a/src/dsp/dec_msa.c
+++ b/src/dsp/dec_msa.c
@@ -37,6 +37,8 @@
  d1_m = d_tmp1_m + d_tmp2_m;                                    \
  BUTTERFLY_4(a1_m, b1_m, c1_m, d1_m, out0, out1, out2, out3);   \
 }
+#define MULT1(a) ((((a) * 20091) >> 16) + (a))
+#define MULT2(a) (((a) * 35468) >> 16)

 static void TransformOne(const int16_t* in, uint8_t* dst) {
  v8i16 input0, input1;
@@ -122,10 +124,10 @@ static void TransformDC(const int16_t* in, uint8_t* dst) {

 static void TransformAC3(const int16_t* in, uint8_t* dst) {
  const int a = in[0] + 4;
-  const int c4 = WEBP_TRANSFORM_AC3_MUL2(in[4]);
-  const int d4 = WEBP_TRANSFORM_AC3_MUL1(in[4]);
-  const int in2 = WEBP_TRANSFORM_AC3_MUL2(in[1]);
-  const int in3 = WEBP_TRANSFORM_AC3_MUL1(in[1]);
+  const int c4 = MULT2(in[4]);
+  const int d4 = MULT1(in[4]);
+  const int in2 = MULT2(in[1]);
+  const int in3 = MULT1(in[1]);
  v4i32 tmp0 = { 0 };
  v4i32 out0 = __msa_fill_w(a + d4);
  v4i32 out1 = __msa_fill_w(a + c4);
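
Note on the constants in these hunks: the inlined `MULT1(a)` form, `(((a) * 20091) >> 16) + (a)`, and the `MUL(a, kC1)` form with `kC1 = 20091 + (1 << 16)` used in the other files are meant to produce the same value, since (x * (k + (1 << 16))) >> 16 = ((x * k) >> 16) + x. The sketch below checks that identity over a bounded input range. The helper names (`mul1_split`, `mul1_full`, `count_mismatches`) and the range limit are illustrative assumptions, not part of libwebp, and the code assumes that `>>` on a negative int is an arithmetic shift, as it is on mainstream compilers.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of libwebp. */

/* Split form: multiply by 20091, shift, then add the input back. */
static int mul1_split(int a) { return ((a * 20091) >> 16) + a; }

/* Full-constant form: multiply by 20091 + (1 << 16), then shift. */
static int mul1_full(int a) { return (a * (20091 + (1 << 16))) >> 16; }

/* Counts the inputs in [lo, hi] for which the two forms disagree. */
static int count_mismatches(int lo, int hi) {
  int x, mismatches = 0;
  /* Assumed bound: keeps x * 85627 inside a 32-bit int. */
  assert(lo <= hi && lo >= -20000 && hi <= 20000);
  for (x = lo; x <= hi; ++x) {
    if (mul1_split(x) != mul1_full(x)) ++mismatches;
  }
  return mismatches;
}
```

Calling `count_mismatches(-4096, 4095)` should return 0 under the stated assumptions. The range is bounded because the full-constant product no longer fits in 32 bits for inputs near the int16_t limits.
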
--- a/src/dsp/dec_neon.c
+++ b/src/dsp/dec_neon.c
@@ -1000,9 +1000,8 @@ static void HFilter8i_NEON(uint8_t* u, uint8_t* v, int stride,
 // libwebp adds 1 << 16 to cospi8sqrt2minus1 (kC1). However, this causes the
 // same issue with kC1 and vqdmulh that we work around by down shifting kC2

-static const int16_t kC1 = WEBP_TRANSFORM_AC3_C1;
-static const int16_t kC2 =
-    WEBP_TRANSFORM_AC3_C2 / 2;  // half of kC2, actually. See comment above.
+static const int16_t kC1 = 20091;
+static const int16_t kC2 = 17734;  // half of kC2, actually. See comment above.

 #if defined(WEBP_USE_INTRINSICS)
 static WEBP_INLINE void Transpose8x2_NEON(const int16x8_t in0,
@@ -1256,12 +1255,15 @@ static void TransformWHT_NEON(const int16_t* in, int16_t* out) {

 //------------------------------------------------------------------------------

+#define MUL(a, b) (((a) * (b)) >> 16)
 static void TransformAC3_NEON(const int16_t* in, uint8_t* dst) {
+  static const int kC1_full = 20091 + (1 << 16);
+  static const int kC2_full = 35468;
  const int16x4_t A = vld1_dup_s16(in);
-  const int16x4_t c4 = vdup_n_s16(WEBP_TRANSFORM_AC3_MUL2(in[4]));
-  const int16x4_t d4 = vdup_n_s16(WEBP_TRANSFORM_AC3_MUL1(in[4]));
-  const int c1 = WEBP_TRANSFORM_AC3_MUL2(in[1]);
-  const int d1 = WEBP_TRANSFORM_AC3_MUL1(in[1]);
+  const int16x4_t c4 = vdup_n_s16(MUL(in[4], kC2_full));
+  const int16x4_t d4 = vdup_n_s16(MUL(in[4], kC1_full));
+  const int c1 = MUL(in[1], kC2_full);
+  const int d1 = MUL(in[1], kC1_full);
  const uint64_t cd = (uint64_t)( d1 & 0xffff) <<  0 |
                      (uint64_t)( c1 & 0xffff) << 16 |
                      (uint64_t)(-c1 & 0xffff) << 32 |
@@ -1272,6 +1274,7 @@ static void TransformAC3_NEON(const int16_t* in, uint8_t* dst) {
  const int16x8_t m2_m3 = vcombine_s16(vqsub_s16(B, c4), vqsub_s16(B, d4));
  Add4x4_NEON(m0_m1, m2_m3, dst);
 }
+#undef MUL

 //------------------------------------------------------------------------------
 // 4x4
@@ -1425,7 +1428,7 @@ static WEBP_INLINE void DC8_NEON(uint8_t* dst, int do_top, int do_left) {

  if (do_top) {
    const uint8x8_t A = vld1_u8(dst - BPS);  // top row
-#if WEBP_AARCH64
+#if defined(__aarch64__)
    const uint16_t p2 = vaddlv_u8(A);
    sum_top = vdupq_n_u16(p2);
 #else
@@ -1508,7 +1511,7 @@ static WEBP_INLINE void DC16_NEON(uint8_t* dst, int do_top, int do_left) {

  if (do_top) {
    const uint8x16_t A = vld1q_u8(dst - BPS);  // top row
-#if WEBP_AARCH64
+#if defined(__aarch64__)
    const uint16_t p3 = vaddlvq_u8(A);
    sum_top = vdupq_n_u16(p3);
 #else
--- a/src/dsp/dec_sse2.c
+++ b/src/dsp/dec_sse2.c
@@ -196,13 +196,15 @@ static void Transform_SSE2(const int16_t* in, uint8_t* dst, int do_two) {
 }

 #if (USE_TRANSFORM_AC3 == 1)
-
+#define MUL(a, b) (((a) * (b)) >> 16)
 static void TransformAC3(const int16_t* in, uint8_t* dst) {
+  static const int kC1 = 20091 + (1 << 16);
+  static const int kC2 = 35468;
  const __m128i A = _mm_set1_epi16(in[0] + 4);
-  const __m128i c4 = _mm_set1_epi16(WEBP_TRANSFORM_AC3_MUL2(in[4]));
-  const __m128i d4 = _mm_set1_epi16(WEBP_TRANSFORM_AC3_MUL1(in[4]));
-  const int c1 = WEBP_TRANSFORM_AC3_MUL2(in[1]);
-  const int d1 = WEBP_TRANSFORM_AC3_MUL1(in[1]);
+  const __m128i c4 = _mm_set1_epi16(MUL(in[4], kC2));
+  const __m128i d4 = _mm_set1_epi16(MUL(in[4], kC1));
+  const int c1 = MUL(in[1], kC2);
+  const int d1 = MUL(in[1], kC1);
  const __m128i CD = _mm_set_epi16(0, 0, 0, 0, -d1, -c1, c1, d1);
  const __m128i B = _mm_adds_epi16(A, CD);
  const __m128i m0 = _mm_adds_epi16(B, d4);
@@ -236,7 +238,7 @@ static void TransformAC3(const int16_t* in, uint8_t* dst) {
  WebPInt32ToMem(dst + 2 * BPS, _mm_cvtsi128_si32(dst2));
  WebPInt32ToMem(dst + 3 * BPS, _mm_cvtsi128_si32(dst3));
 }
-
+#undef MUL
 #endif   // USE_TRANSFORM_AC3

 //------------------------------------------------------------------------------
@@ -257,15 +259,15 @@ static WEBP_INLINE void SignedShift8b_SSE2(__m128i* const x) {
  *x = _mm_packs_epi16(lo_1, hi_1);
 }

-#define FLIP_SIGN_BIT2(a, b) do {                                              \
+#define FLIP_SIGN_BIT2(a, b) {                                                 \
  (a) = _mm_xor_si128(a, sign_bit);                                            \
  (b) = _mm_xor_si128(b, sign_bit);                                            \
-} while (0)
+}

-#define FLIP_SIGN_BIT4(a, b, c, d) do {                                        \
+#define FLIP_SIGN_BIT4(a, b, c, d) {                                           \
  FLIP_SIGN_BIT2(a, b);                                                        \
  FLIP_SIGN_BIT2(c, d);                                                        \
-} while (0)
+}

 // input/output is uint8_t
 static WEBP_INLINE void GetNotHEV_SSE2(const __m128i* const p1,
@@ -643,12 +645,12 @@ static void SimpleHFilter16i_SSE2(uint8_t* p, int stride, int thresh) {
  (m) = _mm_max_epu8(m, MM_ABS(p2, p1));                                       \
 } while (0)

-#define LOAD_H_EDGES4(p, stride, e1, e2, e3, e4) do {                          \
+#define LOAD_H_EDGES4(p, stride, e1, e2, e3, e4) {                             \
  (e1) = _mm_loadu_si128((__m128i*)&(p)[0 * (stride)]);                        \
  (e2) = _mm_loadu_si128((__m128i*)&(p)[1 * (stride)]);                        \
  (e3) = _mm_loadu_si128((__m128i*)&(p)[2 * (stride)]);                        \
  (e4) = _mm_loadu_si128((__m128i*)&(p)[3 * (stride)]);                        \
-} while (0)
+}

 #define LOADUV_H_EDGE(p, u, v, stride) do {                                    \
  const __m128i U = _mm_loadl_epi64((__m128i*)&(u)[(stride)]);                 \
@@ -656,18 +658,18 @@ static void SimpleHFilter16i_SSE2(uint8_t* p, int stride, int thresh) {
  (p) = _mm_unpacklo_epi64(U, V);                                              \
 } while (0)

-#define LOADUV_H_EDGES4(u, v, stride, e1, e2, e3, e4) do {                     \
+#define LOADUV_H_EDGES4(u, v, stride, e1, e2, e3, e4) {                        \
  LOADUV_H_EDGE(e1, u, v, 0 * (stride));                                       \
  LOADUV_H_EDGE(e2, u, v, 1 * (stride));                                       \
  LOADUV_H_EDGE(e3, u, v, 2 * (stride));                                       \
  LOADUV_H_EDGE(e4, u, v, 3 * (stride));                                       \
-} while (0)
+}

-#define STOREUV(p, u, v, stride) do {                                          \
+#define STOREUV(p, u, v, stride) {                                             \
  _mm_storel_epi64((__m128i*)&(u)[(stride)], p);                               \
  (p) = _mm_srli_si128(p, 8);                                                  \
  _mm_storel_epi64((__m128i*)&(v)[(stride)], p);                               \
-} while (0)
+}

 static WEBP_INLINE void ComplexMask_SSE2(const __m128i* const p1,
                                         const __m128i* const p0,
--- a/src/dsp/dsp.h
+++ b/src/dsp/dsp.h
@@ -203,11 +203,6 @@ extern VP8DecIdct VP8TransformDC;
 extern VP8DecIdct VP8TransformDCUV;
 extern VP8WHT VP8TransformWHT;

-#define WEBP_TRANSFORM_AC3_C1 20091
-#define WEBP_TRANSFORM_AC3_C2 35468
-#define WEBP_TRANSFORM_AC3_MUL1(a) ((((a) * WEBP_TRANSFORM_AC3_C1) >> 16) + (a))
-#define WEBP_TRANSFORM_AC3_MUL2(a) (((a) * WEBP_TRANSFORM_AC3_C2) >> 16)
-
 // *dst is the destination block, with stride BPS. Boundary samples are
 // assumed accessible when needed.
 typedef void (*VP8PredFunc)(uint8_t* dst);
--- a/src/dsp/enc.c
+++ b/src/dsp/enc.c
@@ -109,6 +109,10 @@ static WEBP_TSAN_IGNORE_FUNCTION void InitTables(void) {
 #define STORE(x, y, v) \
  dst[(x) + (y) * BPS] = clip_8b(ref[(x) + (y) * BPS] + ((v) >> 3))

+static const int kC1 = 20091 + (1 << 16);
+static const int kC2 = 35468;
+#define MUL(a, b) (((a) * (b)) >> 16)
+
 static WEBP_INLINE void ITransformOne(const uint8_t* ref, const int16_t* in,
                                      uint8_t* dst) {
  int C[4 * 4], *tmp;
@@ -117,10 +121,8 @@ static WEBP_INLINE void ITransformOne(const uint8_t* ref, const int16_t* in,
  for (i = 0; i < 4; ++i) {    // vertical pass
    const int a = in[0] + in[8];
    const int b = in[0] - in[8];
-    const int c =
-        WEBP_TRANSFORM_AC3_MUL2(in[4]) - WEBP_TRANSFORM_AC3_MUL1(in[12]);
-    const int d =
-        WEBP_TRANSFORM_AC3_MUL1(in[4]) + WEBP_TRANSFORM_AC3_MUL2(in[12]);
+    const int c = MUL(in[4], kC2) - MUL(in[12], kC1);
+    const int d = MUL(in[4], kC1) + MUL(in[12], kC2);
    tmp[0] = a + d;
    tmp[1] = b + c;
    tmp[2] = b - c;
@@ -132,12 +134,10 @@ static WEBP_INLINE void ITransformOne(const uint8_t* ref, const int16_t* in,
  tmp = C;
  for (i = 0; i < 4; ++i) {    // horizontal pass
    const int dc = tmp[0] + 4;
-    const int a = dc + tmp[8];
-    const int b = dc - tmp[8];
-    const int c =
-        WEBP_TRANSFORM_AC3_MUL2(tmp[4]) - WEBP_TRANSFORM_AC3_MUL1(tmp[12]);
-    const int d =
-        WEBP_TRANSFORM_AC3_MUL1(tmp[4]) + WEBP_TRANSFORM_AC3_MUL2(tmp[12]);
+    const int a =  dc +  tmp[8];
+    const int b =  dc -  tmp[8];
+    const int c = MUL(tmp[4], kC2) - MUL(tmp[12], kC1);
+    const int d = MUL(tmp[4], kC1) + MUL(tmp[12], kC2);
    STORE(0, i, a + d);
    STORE(1, i, b + c);
    STORE(2, i, b - c);
@@ -222,6 +222,7 @@ static void FTransformWHT_C(const int16_t* in, int16_t* out) {
 }
 #endif  // !WEBP_NEON_OMIT_C_CODE

+#undef MUL
 #undef STORE

 //------------------------------------------------------------------------------
@@ -731,7 +732,6 @@ VP8QuantizeBlockWHT VP8EncQuantizeBlockWHT;
 VP8BlockCopy VP8Copy4x4;
 VP8BlockCopy VP8Copy16x8;

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void VP8EncDspInitSSE2(void);
 extern void VP8EncDspInitSSE41(void);
 extern void VP8EncDspInitNEON(void);
--- a/src/dsp/enc_mips32.c
+++ b/src/dsp/enc_mips32.c
@@ -21,8 +21,8 @@
 #include "src/enc/vp8i_enc.h"
 #include "src/enc/cost_enc.h"

-static const int kC1 = WEBP_TRANSFORM_AC3_C1;
-static const int kC2 = WEBP_TRANSFORM_AC3_C2;
+static const int kC1 = 20091 + (1 << 16);
+static const int kC2 = 35468;

 // macro for one vertical pass in ITransformOne
 // MUL macro inlined
@@ -30,7 +30,7 @@ static const int kC2 = WEBP_TRANSFORM_AC3_C2;
 // A..D - offsets in bytes to load from in buffer
 // TEMP0..TEMP3 - registers for corresponding tmp elements
 // TEMP4..TEMP5 - temporary registers
-#define VERTICAL_PASS(A, B, C, D, TEMP4, TEMP0, TEMP1, TEMP2, TEMP3) \
+#define VERTICAL_PASS(A, B, C, D, TEMP4, TEMP0, TEMP1, TEMP2, TEMP3)        \
  "lh      %[temp16],      " #A "(%[temp20])                 \n\t"          \
  "lh      %[temp18],      " #B "(%[temp20])                 \n\t"          \
  "lh      %[temp17],      " #C "(%[temp20])                 \n\t"          \
@@ -38,10 +38,12 @@ static const int kC2 = WEBP_TRANSFORM_AC3_C2;
  "addu    %[" #TEMP4 "],    %[temp16],      %[temp18]       \n\t"          \
  "subu    %[temp16],      %[temp16],      %[temp18]         \n\t"          \
  "mul     %[" #TEMP0 "],    %[temp17],      %[kC2]          \n\t"          \
-  MUL_SHIFT_C1_IO(temp17, temp18)                                           \
-  MUL_SHIFT_C1(temp18, temp19)                                              \
+  "mul     %[temp18],      %[temp19],      %[kC1]            \n\t"          \
+  "mul     %[temp17],      %[temp17],      %[kC1]            \n\t"          \
  "mul     %[temp19],      %[temp19],      %[kC2]            \n\t"          \
  "sra     %[" #TEMP0 "],    %[" #TEMP0 "],    16            \n\n"          \
+  "sra     %[temp18],      %[temp18],      16                \n\n"          \
+  "sra     %[temp17],      %[temp17],      16                \n\n"          \
  "sra     %[temp19],      %[temp19],      16                \n\n"          \
  "subu    %[" #TEMP2 "],    %[" #TEMP0 "],    %[temp18]     \n\t"          \
  "addu    %[" #TEMP3 "],    %[temp17],      %[temp19]       \n\t"          \
@@ -56,15 +58,17 @@ static const int kC2 = WEBP_TRANSFORM_AC3_C2;
 // temp0..temp15 holds tmp[0]..tmp[15]
 // A - offset in bytes to load from ref and store to dst buffer
 // TEMP0, TEMP4, TEMP8 and TEMP12 - registers for corresponding tmp elements
-#define HORIZONTAL_PASS(A, TEMP0, TEMP4, TEMP8, TEMP12) \
+#define HORIZONTAL_PASS(A, TEMP0, TEMP4, TEMP8, TEMP12)                       \
  "addiu   %[" #TEMP0 "],    %[" #TEMP0 "],    4               \n\t"          \
  "addu    %[temp16],      %[" #TEMP0 "],    %[" #TEMP8 "]     \n\t"          \
  "subu    %[temp17],      %[" #TEMP0 "],    %[" #TEMP8 "]     \n\t"          \
  "mul     %[" #TEMP0 "],    %[" #TEMP4 "],    %[kC2]          \n\t"          \
-  MUL_SHIFT_C1_IO(TEMP4, TEMP8)                                               \
-  MUL_SHIFT_C1(TEMP8, TEMP12)                                                 \
+  "mul     %[" #TEMP8 "],    %[" #TEMP12 "],   %[kC1]          \n\t"          \
+  "mul     %[" #TEMP4 "],    %[" #TEMP4 "],    %[kC1]          \n\t"          \
  "mul     %[" #TEMP12 "],   %[" #TEMP12 "],   %[kC2]          \n\t"          \
  "sra     %[" #TEMP0 "],    %[" #TEMP0 "],    16              \n\t"          \
+  "sra     %[" #TEMP8 "],    %[" #TEMP8 "],    16              \n\t"          \
+  "sra     %[" #TEMP4 "],    %[" #TEMP4 "],    16              \n\t"          \
  "sra     %[" #TEMP12 "],   %[" #TEMP12 "],   16              \n\t"          \
  "subu    %[temp18],      %[" #TEMP0 "],    %[" #TEMP8 "]     \n\t"          \
  "addu    %[temp19],      %[" #TEMP4 "],    %[" #TEMP12 "]    \n\t"          \
--- a/src/dsp/enc_mips_dsp_r2.c
+++ b/src/dsp/enc_mips_dsp_r2.c
@@ -20,8 +20,8 @@
 #include "src/enc/cost_enc.h"
 #include "src/enc/vp8i_enc.h"

-static const int kC1 = WEBP_TRANSFORM_AC3_C1;
-static const int kC2 = WEBP_TRANSFORM_AC3_C2;
+static const int kC1 = 20091 + (1 << 16);
+static const int kC2 = 35468;

 // O - output
 // I - input (macro doesn't change it)
--- a/src/dsp/enc_neon.c
+++ b/src/dsp/enc_neon.c
@@ -27,9 +27,8 @@
 // This code is pretty much the same as TransformOne in the dec_neon.c, except
 // for subtraction to *ref. See the comments there for algorithmic explanations.

-static const int16_t kC1 = WEBP_TRANSFORM_AC3_C1;
-static const int16_t kC2 =
-    WEBP_TRANSFORM_AC3_C2 / 2;  // half of kC2, actually. See comment above.
+static const int16_t kC1 = 20091;
+static const int16_t kC2 = 17734;  // half of kC2, actually. See comment above.

 // This code works but is *slower* than the inlined-asm version below
 // (with gcc-4.6). So we disable it for now. Later, it'll be conditional to
@@ -765,7 +764,7 @@ static WEBP_INLINE void AccumulateSSE16_NEON(const uint8_t* const a,

 // Horizontal sum of all four uint32_t values in 'sum'.
 static int SumToInt_NEON(uint32x4_t sum) {
-#if WEBP_AARCH64
+#if defined(__aarch64__)
  return (int)vaddvq_u32(sum);
 #else
  const uint64x2_t sum2 = vpaddlq_u32(sum);
@@ -866,7 +865,7 @@ static int QuantizeBlock_NEON(int16_t in[16], int16_t out[16],
  uint8x8x4_t shuffles;
  // vtbl?_u8 are marked unavailable for iOS arm64 with Xcode < 6.3, use
  // non-standard versions there.
-#if defined(__APPLE__) && WEBP_AARCH64 && \
+#if defined(__APPLE__) && defined(__aarch64__) && \
    defined(__apple_build_version__) && (__apple_build_version__< 6020037)
  uint8x16x2_t all_out;
  INIT_VECTOR2(all_out, vreinterpretq_u8_s16(out0), vreinterpretq_u8_s16(out1));
--- a/src/dsp/enc_sse2.c
+++ b/src/dsp/enc_sse2.c
@@ -25,160 +25,9 @@
 //------------------------------------------------------------------------------
 // Transforms (Paragraph 14.4)

-// Does one inverse transform.
-static void ITransform_One_SSE2(const uint8_t* ref, const int16_t* in,
-                                uint8_t* dst) {
-  // This implementation makes use of 16-bit fixed point versions of two
-  // multiply constants:
-  //    K1 = sqrt(2) * cos (pi/8) ~= 85627 / 2^16
-  //    K2 = sqrt(2) * sin (pi/8) ~= 35468 / 2^16
-  //
-  // To be able to use signed 16-bit integers, we use the following trick to
-  // have constants within range:
-  // - Associated constants are obtained by subtracting the 16-bit fixed point
-  //   version of one:
-  //      k = K - (1 << 16)  =>  K = k + (1 << 16)
-  //      K1 = 85267  =>  k1 =  20091
-  //      K2 = 35468  =>  k2 = -30068
-  // - The multiplication of a variable by a constant become the sum of the
-  //   variable and the multiplication of that variable by the associated
-  //   constant:
-  //      (x * K) >> 16 = (x * (k + (1 << 16))) >> 16 = ((x * k ) >> 16) + x
-  const __m128i k1k2 = _mm_set_epi16(-30068, -30068, -30068, -30068,
-                                     20091, 20091, 20091, 20091);
-  const __m128i k2k1 = _mm_set_epi16(20091, 20091, 20091, 20091,
-                                     -30068, -30068, -30068, -30068);
-  const __m128i zero = _mm_setzero_si128();
-  const __m128i zero_four = _mm_set_epi16(0, 0, 0, 0, 4, 4, 4, 4);
-  __m128i T01, T23;
-
-  // Load and concatenate the transform coefficients.
-  const __m128i in01 = _mm_loadu_si128((const __m128i*)&in[0]);
-  const __m128i in23 = _mm_loadu_si128((const __m128i*)&in[8]);
-  // a00 a10 a20 a30   a01 a11 a21 a31
-  // a02 a12 a22 a32   a03 a13 a23 a33
-
-  // Vertical pass and subsequent transpose.
-  {
-    const __m128i in1 = _mm_unpackhi_epi64(in01, in01);
-    const __m128i in3 = _mm_unpackhi_epi64(in23, in23);
-
-    // First pass, c and d calculations are longer because of the "trick"
-    // multiplications.
-    // c = MUL(in1, K2) - MUL(in3, K1) = MUL(in1, k2) - MUL(in3, k1) + in1 - in3
-    // d = MUL(in1, K1) + MUL(in3, K2) = MUL(in1, k1) + MUL(in3, k2) + in1 + in3
-    const __m128i a_d3 = _mm_add_epi16(in01, in23);
-    const __m128i b_c3 = _mm_sub_epi16(in01, in23);
-    const __m128i c1d1 = _mm_mulhi_epi16(in1, k2k1);
-    const __m128i c2d2 = _mm_mulhi_epi16(in3, k1k2);
-    const __m128i c3 = _mm_unpackhi_epi64(b_c3, b_c3);
-    const __m128i c4 = _mm_sub_epi16(c1d1, c2d2);
-    const __m128i c = _mm_add_epi16(c3, c4);
-    const __m128i d4u = _mm_add_epi16(c1d1, c2d2);
-    const __m128i du = _mm_add_epi16(a_d3, d4u);
-    const __m128i d = _mm_unpackhi_epi64(du, du);
-
-    // Second pass.
-    const __m128i comb_ab = _mm_unpacklo_epi64(a_d3, b_c3);
-    const __m128i comb_dc = _mm_unpacklo_epi64(d, c);
-
-    const __m128i tmp01 = _mm_add_epi16(comb_ab, comb_dc);
-    const __m128i tmp32 = _mm_sub_epi16(comb_ab, comb_dc);
-    const __m128i tmp23 = _mm_shuffle_epi32(tmp32, _MM_SHUFFLE(1, 0, 3, 2));
-
-    const __m128i transpose_0 = _mm_unpacklo_epi16(tmp01, tmp23);
-    const __m128i transpose_1 = _mm_unpackhi_epi16(tmp01, tmp23);
-    // a00 a20 a01 a21   a02 a22 a03 a23
-    // a10 a30 a11 a31   a12 a32 a13 a33
-
-    T01 = _mm_unpacklo_epi16(transpose_0, transpose_1);
-    T23 = _mm_unpackhi_epi16(transpose_0, transpose_1);
-    // a00 a10 a20 a30   a01 a11 a21 a31
-    // a02 a12 a22 a32   a03 a13 a23 a33
-  }
-
-  // Horizontal pass and subsequent transpose.
-  {
-    const __m128i T1 = _mm_unpackhi_epi64(T01, T01);
-    const __m128i T3 = _mm_unpackhi_epi64(T23, T23);
-
-    // First pass, c and d calculations are longer because of the "trick"
-    // multiplications.
-    const __m128i dc = _mm_add_epi16(T01, zero_four);
-
-    // c = MUL(T1, K2) - MUL(T3, K1) = MUL(T1, k2) - MUL(T3, k1) + T1 - T3
-    // d = MUL(T1, K1) + MUL(T3, K2) = MUL(T1, k1) + MUL(T3, k2) + T1 + T3
-    const __m128i a_d3 = _mm_add_epi16(dc, T23);
-    const __m128i b_c3 = _mm_sub_epi16(dc, T23);
-    const __m128i c1d1 = _mm_mulhi_epi16(T1, k2k1);
-    const __m128i c2d2 = _mm_mulhi_epi16(T3, k1k2);
-    const __m128i c3 = _mm_unpackhi_epi64(b_c3, b_c3);
-    const __m128i c4 = _mm_sub_epi16(c1d1, c2d2);
-    const __m128i c = _mm_add_epi16(c3, c4);
-    const __m128i d4u = _mm_add_epi16(c1d1, c2d2);
-    const __m128i du = _mm_add_epi16(a_d3, d4u);
-    const __m128i d = _mm_unpackhi_epi64(du, du);
-
-    // Second pass.
-    const __m128i comb_ab = _mm_unpacklo_epi64(a_d3, b_c3);
-    const __m128i comb_dc = _mm_unpacklo_epi64(d, c);
-
-    const __m128i tmp01 = _mm_add_epi16(comb_ab, comb_dc);
-    const __m128i tmp32 = _mm_sub_epi16(comb_ab, comb_dc);
-    const __m128i tmp23 = _mm_shuffle_epi32(tmp32, _MM_SHUFFLE(1, 0, 3, 2));
-
-    const __m128i shifted01 = _mm_srai_epi16(tmp01, 3);
-    const __m128i shifted23 = _mm_srai_epi16(tmp23, 3);
-    // a00 a01 a02 a03   a10 a11 a12 a13
-    // a20 a21 a22 a23   a30 a31 a32 a33
-
-    const __m128i transpose_0 = _mm_unpacklo_epi16(shifted01, shifted23);
-    const __m128i transpose_1 = _mm_unpackhi_epi16(shifted01, shifted23);
-    // a00 a20 a01 a21   a02 a22 a03 a23
-    // a10 a30 a11 a31   a12 a32 a13 a33
-
-    T01 = _mm_unpacklo_epi16(transpose_0, transpose_1);
-    T23 = _mm_unpackhi_epi16(transpose_0, transpose_1);
-    // a00 a10 a20 a30   a01 a11 a21 a31
-    // a02 a12 a22 a32   a03 a13 a23 a33
-  }
-
-  // Add inverse transform to 'ref' and store.
-  {
-    // Load the reference(s).
-    __m128i ref01, ref23, ref0123;
-    int32_t buf[4];
-
-    // Load four bytes/pixels per line.
-    const __m128i ref0 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[0 * BPS]));
-    const __m128i ref1 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[1 * BPS]));
-    const __m128i ref2 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[2 * BPS]));
-    const __m128i ref3 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[3 * BPS]));
-    ref01 = _mm_unpacklo_epi32(ref0, ref1);
-    ref23 = _mm_unpacklo_epi32(ref2, ref3);
-
-    // Convert to 16b.
-    ref01 = _mm_unpacklo_epi8(ref01, zero);
-    ref23 = _mm_unpacklo_epi8(ref23, zero);
-    // Add the inverse transform(s).
-    ref01 = _mm_add_epi16(ref01, T01);
-    ref23 = _mm_add_epi16(ref23, T23);
-    // Unsigned saturate to 8b.
-    ref0123 = _mm_packus_epi16(ref01, ref23);
-
-    _mm_storeu_si128((__m128i *)buf, ref0123);
-
-    // Store four bytes/pixels per line.
-    WebPInt32ToMem(&dst[0 * BPS], buf[0]);
-    WebPInt32ToMem(&dst[1 * BPS], buf[1]);
-    WebPInt32ToMem(&dst[2 * BPS], buf[2]);
-    WebPInt32ToMem(&dst[3 * BPS], buf[3]);
-  }
-}
-
-// Does two inverse transforms.
-static void ITransform_Two_SSE2(const uint8_t* ref, const int16_t* in,
-                                uint8_t* dst) {
+// Does one or two inverse transforms.
+static void ITransform_SSE2(const uint8_t* ref, const int16_t* in, uint8_t* dst,
+                            int do_two) {
  // This implementation makes use of 16-bit fixed point versions of two
  // multiply constants:
  //    K1 = sqrt(2) * cos (pi/8) ~= 85627 / 2^16
@@ -200,21 +49,33 @@ static void ITransform_Two_SSE2(const uint8_t* ref, const int16_t* in,
  __m128i T0, T1, T2, T3;

  // Load and concatenate the transform coefficients (we'll do two inverse
-  // transforms in parallel).
+  // transforms in parallel). In the case of only one inverse transform, the
+  // second half of the vectors will just contain values that we never use
+  // or store.
  __m128i in0, in1, in2, in3;
  {
-    const __m128i tmp0 = _mm_loadu_si128((const __m128i*)&in[0]);
-    const __m128i tmp1 = _mm_loadu_si128((const __m128i*)&in[8]);
-    const __m128i tmp2 = _mm_loadu_si128((const __m128i*)&in[16]);
-    const __m128i tmp3 = _mm_loadu_si128((const __m128i*)&in[24]);
-    in0 = _mm_unpacklo_epi64(tmp0, tmp2);
-    in1 = _mm_unpackhi_epi64(tmp0, tmp2);
-    in2 = _mm_unpacklo_epi64(tmp1, tmp3);
-    in3 = _mm_unpackhi_epi64(tmp1, tmp3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    in0 = _mm_loadl_epi64((const __m128i*)&in[0]);
+    in1 = _mm_loadl_epi64((const __m128i*)&in[4]);
+    in2 = _mm_loadl_epi64((const __m128i*)&in[8]);
+    in3 = _mm_loadl_epi64((const __m128i*)&in[12]);
+    // a00 a10 a20 a30   x x x x
+    // a01 a11 a21 a31   x x x x
+    // a02 a12 a22 a32   x x x x
+    // a03 a13 a23 a33   x x x x
+    if (do_two) {
+      const __m128i inB0 = _mm_loadl_epi64((const __m128i*)&in[16]);
+      const __m128i inB1 = _mm_loadl_epi64((const __m128i*)&in[20]);
+      const __m128i inB2 = _mm_loadl_epi64((const __m128i*)&in[24]);
+      const __m128i inB3 = _mm_loadl_epi64((const __m128i*)&in[28]);
+      in0 = _mm_unpacklo_epi64(in0, inB0);
+      in1 = _mm_unpacklo_epi64(in1, inB1);
+      in2 = _mm_unpacklo_epi64(in2, inB2);
+      in3 = _mm_unpacklo_epi64(in3, inB3);
+      // a00 a10 a20 a30   b00 b10 b20 b30
+      // a01 a11 a21 a31   b01 b11 b21 b31
+      // a02 a12 a22 a32   b02 b12 b22 b32
+      // a03 a13 a23 a33   b03 b13 b23 b33
+    }
  }

  // Vertical pass and subsequent transpose.
@@ -287,11 +148,19 @@ static void ITransform_Two_SSE2(const uint8_t* ref, const int16_t* in,
    const __m128i zero = _mm_setzero_si128();
    // Load the reference(s).
    __m128i ref0, ref1, ref2, ref3;
-    // Load eight bytes/pixels per line.
-    ref0 = _mm_loadl_epi64((const __m128i*)&ref[0 * BPS]);
-    ref1 = _mm_loadl_epi64((const __m128i*)&ref[1 * BPS]);
-    ref2 = _mm_loadl_epi64((const __m128i*)&ref[2 * BPS]);
-    ref3 = _mm_loadl_epi64((const __m128i*)&ref[3 * BPS]);
+    if (do_two) {
+      // Load eight bytes/pixels per line.
+      ref0 = _mm_loadl_epi64((const __m128i*)&ref[0 * BPS]);
+      ref1 = _mm_loadl_epi64((const __m128i*)&ref[1 * BPS]);
+      ref2 = _mm_loadl_epi64((const __m128i*)&ref[2 * BPS]);
+      ref3 = _mm_loadl_epi64((const __m128i*)&ref[3 * BPS]);
+    } else {
+      // Load four bytes/pixels per line.
+      ref0 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[0 * BPS]));
+      ref1 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[1 * BPS]));
+      ref2 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[2 * BPS]));
+      ref3 = _mm_cvtsi32_si128(WebPMemToInt32(&ref[3 * BPS]));
+    }
    // Convert to 16b.
    ref0 = _mm_unpacklo_epi8(ref0, zero);
    ref1 = _mm_unpacklo_epi8(ref1, zero);
@@ -307,21 +176,20 @@ static void ITransform_Two_SSE2(const uint8_t* ref, const int16_t* in,
    ref1 = _mm_packus_epi16(ref1, ref1);
    ref2 = _mm_packus_epi16(ref2, ref2);
    ref3 = _mm_packus_epi16(ref3, ref3);
-    // Store eight bytes/pixels per line.
-    _mm_storel_epi64((__m128i*)&dst[0 * BPS], ref0);
-    _mm_storel_epi64((__m128i*)&dst[1 * BPS], ref1);
-    _mm_storel_epi64((__m128i*)&dst[2 * BPS], ref2);
-    _mm_storel_epi64((__m128i*)&dst[3 * BPS], ref3);
-  }
-}
-
-// Does one or two inverse transforms.
-static void ITransform_SSE2(const uint8_t* ref, const int16_t* in, uint8_t* dst,
-                            int do_two) {
-  if (do_two) {
-    ITransform_Two_SSE2(ref, in, dst);
-  } else {
-    ITransform_One_SSE2(ref, in, dst);
+    // Store the results.
+    if (do_two) {
+      // Store eight bytes/pixels per line.
+      _mm_storel_epi64((__m128i*)&dst[0 * BPS], ref0);
+      _mm_storel_epi64((__m128i*)&dst[1 * BPS], ref1);
+      _mm_storel_epi64((__m128i*)&dst[2 * BPS], ref2);
+      _mm_storel_epi64((__m128i*)&dst[3 * BPS], ref3);
+    } else {
+      // Store four bytes/pixels per line.
+      WebPInt32ToMem(&dst[0 * BPS], _mm_cvtsi128_si32(ref0));
+      WebPInt32ToMem(&dst[1 * BPS], _mm_cvtsi128_si32(ref1));
+      WebPInt32ToMem(&dst[2 * BPS], _mm_cvtsi128_si32(ref2));
+      WebPInt32ToMem(&dst[3 * BPS], _mm_cvtsi128_si32(ref3));
+    }
  }
 }
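
Note on the fixed-point constant K1 ~= 85627 / 2^16 mentioned in the function comment above: 85627 does not fit in a signed 16-bit lane, so a common trick is to multiply by K1 - 2^16 = 20091 and add the input back afterwards. The sketch below is a scalar illustration only, not the SSE2 code from this file. The names kK1Full, kK1Frac, MulK1Direct and MulK1Split are hypothetical, and it assumes that right-shifting a negative signed integer is an arithmetic shift (true for the usual compilers, but implementation-defined in C).

```c
#include <stdint.h>

// Illustrative constants: 85627 is the 16-bit fixed-point K1 from the comment
// above, and 20091 is K1 minus (1 << 16). Both names are hypothetical.
enum { kK1Full = 85627, kK1Frac = 20091 };

// Direct form: widen to 64 bits so the product cannot overflow.
static int32_t MulK1Direct(int16_t x) {
  return (int32_t)(((int64_t)x * kK1Full) >> 16);
}

// Split form: multiply by the fractional part only, then add x back.
// Since x * 65536 is an exact multiple of 2^16, both forms agree
// (assuming an arithmetic right shift for negative values).
static int32_t MulK1Split(int16_t x) {
  return (((int32_t)x * kK1Frac) >> 16) + x;
}
```

For any 16-bit input the two helpers should return the same value; the split form keeps every intermediate within 32 bits.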

--- a/src/dsp/filters.c
+++ b/src/dsp/filters.c
@@ -19,16 +19,14 @@
 //------------------------------------------------------------------------------
 // Helpful macro.

-#define DCHECK(in, out)                                                        \
-  do {                                                                         \
-    assert((in) != NULL);                                                      \
-    assert((out) != NULL);                                                     \
-    assert(width > 0);                                                         \
-    assert(height > 0);                                                        \
-    assert(stride >= width);                                                   \
-    assert(row >= 0 && num_rows > 0 && row + num_rows <= height);              \
-    (void)height;  /* Silence unused warning. */                               \
-  } while (0)
+# define SANITY_CHECK(in, out)                                                 \
+  assert((in) != NULL);                                                        \
+  assert((out) != NULL);                                                       \
+  assert(width > 0);                                                           \
+  assert(height > 0);                                                          \
+  assert(stride >= width);                                                     \
+  assert(row >= 0 && num_rows > 0 && row + num_rows <= height);                \
+  (void)height;  // Silence unused warning.

 #if !WEBP_NEON_OMIT_C_CODE
 static WEBP_INLINE void PredictLine_C(const uint8_t* src, const uint8_t* pred,
@@ -51,7 +49,7 @@ static WEBP_INLINE void DoHorizontalFilter_C(const uint8_t* in,
  const uint8_t* preds;
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;
  preds = inverse ? out : in;
@@ -88,7 +86,7 @@ static WEBP_INLINE void DoVerticalFilter_C(const uint8_t* in,
  const uint8_t* preds;
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;
  preds = inverse ? out : in;
@@ -133,7 +131,7 @@ static WEBP_INLINE void DoGradientFilter_C(const uint8_t* in,
  const uint8_t* preds;
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;
  preds = inverse ? out : in;
@@ -167,7 +165,7 @@ static WEBP_INLINE void DoGradientFilter_C(const uint8_t* in,
 }
 #endif  // !WEBP_NEON_OMIT_C_CODE

-#undef DCHECK
+#undef SANITY_CHECK

 //------------------------------------------------------------------------------

@@ -191,12 +189,6 @@ static void GradientFilter_C(const uint8_t* data, int width, int height,

 //------------------------------------------------------------------------------

-static void NoneUnfilter_C(const uint8_t* prev, const uint8_t* in,
-                           uint8_t* out, int width) {
-  (void)prev;
-  if (out != in) memcpy(out, in, width * sizeof(*out));
-}
-
 static void HorizontalUnfilter_C(const uint8_t* prev, const uint8_t* in,
                                 uint8_t* out, int width) {
  uint8_t pred = (prev == NULL) ? 0 : prev[0];
@@ -241,14 +233,13 @@ static void GradientUnfilter_C(const uint8_t* prev, const uint8_t* in,
 WebPFilterFunc WebPFilters[WEBP_FILTER_LAST];
 WebPUnfilterFunc WebPUnfilters[WEBP_FILTER_LAST];

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void VP8FiltersInitMIPSdspR2(void);
 extern void VP8FiltersInitMSA(void);
 extern void VP8FiltersInitNEON(void);
 extern void VP8FiltersInitSSE2(void);

 WEBP_DSP_INIT_FUNC(VP8FiltersInit) {
-  WebPUnfilters[WEBP_FILTER_NONE] = NoneUnfilter_C;
+  WebPUnfilters[WEBP_FILTER_NONE] = NULL;
 #if !WEBP_NEON_OMIT_C_CODE
  WebPUnfilters[WEBP_FILTER_HORIZONTAL] = HorizontalUnfilter_C;
  WebPUnfilters[WEBP_FILTER_VERTICAL] = VerticalUnfilter_C;
@@ -287,7 +278,6 @@ WEBP_DSP_INIT_FUNC(VP8FiltersInit) {
  }
 #endif

-  assert(WebPUnfilters[WEBP_FILTER_NONE] != NULL);
  assert(WebPUnfilters[WEBP_FILTER_HORIZONTAL] != NULL);
  assert(WebPUnfilters[WEBP_FILTER_VERTICAL] != NULL);
  assert(WebPUnfilters[WEBP_FILTER_GRADIENT] != NULL);
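
Note on the WEBP_FILTER_NONE slot in the hunk above: when a table entry is NULL rather than a copy function, every caller has to test the pointer before calling through it. The sketch below shows one way a caller might handle that. UnfilterFunc, HorizontalUnfilterSketch and ApplyUnfilter are hypothetical names for illustration, not libwebp API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical signature, modelled on the unfilter functions in this diff.
typedef void (*UnfilterFunc)(const uint8_t* prev, const uint8_t* in,
                             uint8_t* out, int width);

// Simplified horizontal unfilter: each output is the running sum of inputs.
static void HorizontalUnfilterSketch(const uint8_t* prev, const uint8_t* in,
                                     uint8_t* out, int width) {
  uint8_t pred = (prev == NULL) ? 0 : prev[0];
  int i;
  for (i = 0; i < width; ++i) {
    out[i] = (uint8_t)(pred + in[i]);
    pred = out[i];
  }
}

// Caller-side guard: a NULL entry means "no filter", so copy the row as-is.
static void ApplyUnfilter(UnfilterFunc f, const uint8_t* prev,
                          const uint8_t* in, uint8_t* out, int width) {
  if (f != NULL) {
    f(prev, in, out, width);
  } else if (out != in) {
    memcpy(out, in, (size_t)width * sizeof(*out));
  }
}
```

Calling ApplyUnfilter(NULL, ...) copies the row unchanged, which matches what the removed NoneUnfilter_C did.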
--- a/src/dsp/filters_mips_dsp_r2.c
+++ b/src/dsp/filters_mips_dsp_r2.c
@@ -24,16 +24,14 @@
 //------------------------------------------------------------------------------
 // Helpful macro.

-#define DCHECK(in, out)                                                        \
-  do {                                                                         \
-    assert(in != NULL);                                                        \
-    assert(out != NULL);                                                       \
-    assert(width > 0);                                                         \
-    assert(height > 0);                                                        \
-    assert(stride >= width);                                                   \
-    assert(row >= 0 && num_rows > 0 && row + num_rows <= height);              \
-    (void)height;  /* Silence unused warning. */                               \
-  } while (0)
+# define SANITY_CHECK(in, out)                                                 \
+  assert(in != NULL);                                                          \
+  assert(out != NULL);                                                         \
+  assert(width > 0);                                                           \
+  assert(height > 0);                                                          \
+  assert(stride >= width);                                                     \
+  assert(row >= 0 && num_rows > 0 && row + num_rows <= height);                \
+  (void)height;  // Silence unused warning.

 #define DO_PREDICT_LINE(SRC, DST, LENGTH, INVERSE) do {                        \
    const uint8_t* psrc = (uint8_t*)(SRC);                                     \
@@ -202,7 +200,7 @@ static WEBP_INLINE void DoHorizontalFilter_MIPSdspR2(const uint8_t* in,
  const uint8_t* preds;
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;
  preds = in;
@@ -250,7 +248,7 @@ static WEBP_INLINE void DoVerticalFilter_MIPSdspR2(const uint8_t* in,
  const uint8_t* preds;
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;
  preds = in;
@@ -318,7 +316,7 @@ static void DoGradientFilter_MIPSdspR2(const uint8_t* in,
  const uint8_t* preds;
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;
  preds = in;
@@ -380,7 +378,7 @@ static void GradientUnfilter_MIPSdspR2(const uint8_t* prev, const uint8_t* in,
 #undef DO_PREDICT_LINE_VERTICAL
 #undef PREDICT_LINE_ONE_PASS
 #undef DO_PREDICT_LINE
-#undef DCHECK
+#undef SANITY_CHECK

 //------------------------------------------------------------------------------
 // Entry point
--- a/src/dsp/filters_msa.c
+++ b/src/dsp/filters_msa.c
@@ -56,14 +56,12 @@ static WEBP_INLINE void PredictLineInverse0(const uint8_t* src,
 //------------------------------------------------------------------------------
 // Helpful macro.

-#define DCHECK(in, out)        \
-  do {                         \
-    assert(in != NULL);        \
-    assert(out != NULL);       \
-    assert(width > 0);         \
-    assert(height > 0);        \
-    assert(stride >= width);   \
-  } while (0)
+#define SANITY_CHECK(in, out)  \
+  assert(in != NULL);          \
+  assert(out != NULL);         \
+  assert(width > 0);           \
+  assert(height > 0);          \
+  assert(stride >= width);

 //------------------------------------------------------------------------------
 // Horizontal filter
@@ -74,7 +72,7 @@ static void HorizontalFilter_MSA(const uint8_t* data, int width, int height,
  const uint8_t* in = data;
  uint8_t* out = filtered_data;
  int row = 1;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);

  // Leftmost pixel is the same as input for topmost scanline.
  out[0] = in[0];
@@ -137,7 +135,7 @@ static void GradientFilter_MSA(const uint8_t* data, int width, int height,
  const uint8_t* preds = data;
  uint8_t* out = filtered_data;
  int row = 1;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);

  // left prediction for top scan-line
  out[0] = in[0];
@@ -165,7 +163,7 @@ static void VerticalFilter_MSA(const uint8_t* data, int width, int height,
  const uint8_t* preds = data;
  uint8_t* out = filtered_data;
  int row = 1;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);

  // Very first top-left pixel is copied.
  out[0] = in[0];
@@ -184,7 +182,7 @@ static void VerticalFilter_MSA(const uint8_t* data, int width, int height,
  }
 }

-#undef DCHECK
+#undef SANITY_CHECK

 //------------------------------------------------------------------------------
 // Entry point
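
Note on the DCHECK / SANITY_CHECK macro shapes in the filter files above: a multi-statement macro with a do { ... } while (0) wrapper behaves like a single statement, while a bare sequence of statements does not. The sketch below illustrates the difference under an unbraced if. All names are hypothetical and the counter stands in for the assert() calls.

```c
static int g_checks = 0;

// Bare multi-statement macro, the same shape as SANITY_CHECK.
#define COUNT_TWICE_BARE() \
  ++g_checks;              \
  ++g_checks;

// Wrapped variant, the same shape as DCHECK.
#define COUNT_TWICE_WRAPPED() \
  do {                        \
    ++g_checks;               \
    ++g_checks;               \
  } while (0)

// Only the first increment is guarded by the 'if'; the second always runs.
static int RunBare(int enabled) {
  g_checks = 0;
  if (enabled) COUNT_TWICE_BARE()
  return g_checks;
}

// Both increments are guarded, because the macro expands to one statement.
static int RunWrapped(int enabled) {
  g_checks = 0;
  if (enabled) COUNT_TWICE_WRAPPED();
  return g_checks;
}
```

In the files above the macro is always used as a standalone statement at the top of a function body, where both shapes behave the same.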
--- a/src/dsp/filters_neon.c
+++ b/src/dsp/filters_neon.c
@@ -21,16 +21,14 @@
 //------------------------------------------------------------------------------
 // Helpful macros.

-#define DCHECK(in, out)                                                        \
-  do {                                                                         \
-    assert(in != NULL);                                                        \
-    assert(out != NULL);                                                       \
-    assert(width > 0);                                                         \
-    assert(height > 0);                                                        \
-    assert(stride >= width);                                                   \
-    assert(row >= 0 && num_rows > 0 && row + num_rows <= height);              \
-    (void)height;  /* Silence unused warning. */                               \
-  } while (0)
+# define SANITY_CHECK(in, out)                                                 \
+  assert(in != NULL);                                                          \
+  assert(out != NULL);                                                         \
+  assert(width > 0);                                                           \
+  assert(height > 0);                                                          \
+  assert(stride >= width);                                                     \
+  assert(row >= 0 && num_rows > 0 && row + num_rows <= height);                \
+  (void)height;  // Silence unused warning.

 // load eight u8 and widen to s16
 #define U8_TO_S16(A) vreinterpretq_s16_u16(vmovl_u8(A))
@@ -73,7 +71,7 @@ static WEBP_INLINE void DoHorizontalFilter_NEON(const uint8_t* in,
                                                uint8_t* out) {
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;

@@ -112,7 +110,7 @@ static WEBP_INLINE void DoVerticalFilter_NEON(const uint8_t* in,
                                              uint8_t* out) {
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;

@@ -174,7 +172,7 @@ static WEBP_INLINE void DoGradientFilter_NEON(const uint8_t* in,
                                              uint8_t* out) {
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;

@@ -203,7 +201,7 @@ static void GradientFilter_NEON(const uint8_t* data, int width, int height,
                        filtered_data);
 }

-#undef DCHECK
+#undef SANITY_CHECK

 //------------------------------------------------------------------------------
 // Inverse transforms
--- a/src/dsp/filters_sse2.c
+++ b/src/dsp/filters_sse2.c
@@ -23,16 +23,14 @@
 //------------------------------------------------------------------------------
 // Helpful macro.

-#define DCHECK(in, out)                                                        \
-  do {                                                                         \
-    assert((in) != NULL);                                                      \
-    assert((out) != NULL);                                                     \
-    assert(width > 0);                                                         \
-    assert(height > 0);                                                        \
-    assert(stride >= width);                                                   \
-    assert(row >= 0 && num_rows > 0 && row + num_rows <= height);              \
-    (void)height;  /* Silence unused warning. */                               \
-  } while (0)
+# define SANITY_CHECK(in, out)                                                 \
+  assert((in) != NULL);                                                        \
+  assert((out) != NULL);                                                       \
+  assert(width > 0);                                                           \
+  assert(height > 0);                                                          \
+  assert(stride >= width);                                                     \
+  assert(row >= 0 && num_rows > 0 && row + num_rows <= height);                \
+  (void)height;  // Silence unused warning.

 static void PredictLineTop_SSE2(const uint8_t* src, const uint8_t* pred,
                                uint8_t* dst, int length) {
@@ -80,7 +78,7 @@ static WEBP_INLINE void DoHorizontalFilter_SSE2(const uint8_t* in,
                                                uint8_t* out) {
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;

@@ -113,7 +111,7 @@ static WEBP_INLINE void DoVerticalFilter_SSE2(const uint8_t* in,
                                              uint8_t* out) {
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;

@@ -176,7 +174,7 @@ static WEBP_INLINE void DoGradientFilter_SSE2(const uint8_t* in,
                                              uint8_t* out) {
  const size_t start_offset = row * stride;
  const int last_row = row + num_rows;
-  DCHECK(in, out);
+  SANITY_CHECK(in, out);
  in += start_offset;
  out += start_offset;

@@ -199,7 +197,7 @@ static WEBP_INLINE void DoGradientFilter_SSE2(const uint8_t* in,
  }
 }

-#undef DCHECK
+#undef SANITY_CHECK

 //------------------------------------------------------------------------------

--- a/src/dsp/lossless.c
+++ b/src/dsp/lossless.c
@@ -588,7 +588,6 @@ VP8LConvertFunc VP8LConvertBGRAToBGR;
 VP8LMapARGBFunc VP8LMapColor32b;
 VP8LMapAlphaFunc VP8LMapColor8b;

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void VP8LDspInitSSE2(void);
 extern void VP8LDspInitSSE41(void);
 extern void VP8LDspInitNEON(void);
--- a/src/dsp/lossless.h
+++ b/src/dsp/lossless.h
@@ -182,9 +182,9 @@ extern VP8LPredictorAddSubFunc VP8LPredictorsSub_C[16];
 // -----------------------------------------------------------------------------
 // Huffman-cost related functions.

-typedef uint32_t (*VP8LCostFunc)(const uint32_t* population, int length);
-typedef uint32_t (*VP8LCostCombinedFunc)(const uint32_t* X, const uint32_t* Y,
-                                         int length);
+typedef float (*VP8LCostFunc)(const uint32_t* population, int length);
+typedef float (*VP8LCostCombinedFunc)(const uint32_t* X, const uint32_t* Y,
+                                      int length);
 typedef float (*VP8LCombinedShannonEntropyFunc)(const int X[256],
                                                const int Y[256]);

--- a/src/dsp/lossless_common.h
+++ b/src/dsp/lossless_common.h
@@ -16,10 +16,10 @@
 #ifndef WEBP_DSP_LOSSLESS_COMMON_H_
 #define WEBP_DSP_LOSSLESS_COMMON_H_

-#include "src/dsp/cpu.h"
-#include "src/utils/utils.h"
 #include "src/webp/types.h"

+#include "src/utils/utils.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -166,7 +166,7 @@ uint32_t VP8LSubPixels(uint32_t a, uint32_t b) {
 }

 //------------------------------------------------------------------------------
-// Transform-related functions used in both encoding and decoding.
+// Transform-related functions use din both encoding and decoding.

 // Macros used to create a batch predictor that iteratively uses a
 // one-pixel predictor.
--- a/src/dsp/lossless_enc.c
+++ b/src/dsp/lossless_enc.c
@@ -636,25 +636,20 @@ void VP8LBundleColorMap_C(const uint8_t* const row, int width, int xbits,

 //------------------------------------------------------------------------------

-static uint32_t ExtraCost_C(const uint32_t* population, int length) {
+static float ExtraCost_C(const uint32_t* population, int length) {
  int i;
-  uint32_t cost = population[4] + population[5];
-  assert(length % 2 == 0);
-  for (i = 2; i < length / 2 - 1; ++i) {
-    cost += i * (population[2 * i + 2] + population[2 * i + 3]);
-  }
+  float cost = 0.f;
+  for (i = 2; i < length - 2; ++i) cost += (i >> 1) * population[i + 2];
  return cost;
 }

-static uint32_t ExtraCostCombined_C(const uint32_t* X, const uint32_t* Y,
-                                    int length) {
+static float ExtraCostCombined_C(const uint32_t* X, const uint32_t* Y,
+                                  int length) {
  int i;
-  uint32_t cost = X[4] + Y[4] + X[5] + Y[5];
-  assert(length % 2 == 0);
-  for (i = 2; i < length / 2 - 1; ++i) {
-    const int xy0 = X[2 * i + 2] + Y[2 * i + 2];
-    const int xy1 = X[2 * i + 3] + Y[2 * i + 3];
-    cost += i * (xy0 + xy1);
+  float cost = 0.f;
+  for (i = 2; i < length - 2; ++i) {
+    const int xy = X[i + 2] + Y[i + 2];
+    cost += (i >> 1) * xy;
  }
  return cost;
 }
@@ -796,7 +791,6 @@ VP8LBundleColorMapFunc VP8LBundleColorMap;
 VP8LPredictorAddSubFunc VP8LPredictorsSub[16];
 VP8LPredictorAddSubFunc VP8LPredictorsSub_C[16];

-extern VP8CPUInfo VP8GetCPUInfo;
 extern void VP8LEncDspInitSSE2(void);
 extern void VP8LEncDspInitSSE41(void);
 extern void VP8LEncDspInitNEON(void);
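
Note on the two ExtraCost_C loop shapes in the hunk above: for an even length, the pairwise loop and the per-symbol loop weight each population entry identically, so they should differ only in return type. The sketch below restates both as standalone functions for comparison. ExtraCostPairwise and ExtraCostPerSymbol are hypothetical names, and the float version is exact only while the running total stays below 2^24.

```c
#include <stdint.h>

// Pairwise form: entries 2*i+2 and 2*i+3 share weight i (length must be even).
static uint32_t ExtraCostPairwise(const uint32_t* population, int length) {
  int i;
  uint32_t cost = population[4] + population[5];
  for (i = 2; i < length / 2 - 1; ++i) {
    cost += i * (population[2 * i + 2] + population[2 * i + 3]);
  }
  return cost;
}

// Per-symbol form: entry i+2 gets weight i >> 1, accumulated in a float.
static float ExtraCostPerSymbol(const uint32_t* population, int length) {
  int i;
  float cost = 0.f;
  for (i = 2; i < length - 2; ++i) cost += (i >> 1) * population[i + 2];
  return cost;
}
```

With population = {9, 9, 9, 9, 1, 2, 3, 4} and length 8, both forms weight entries 4 and 5 by one and entries 6 and 7 by two; the first four entries are ignored.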
--- a/src/dsp/lossless_enc_mips32.c
+++ b/src/dsp/lossless_enc_mips32.c
@@ -103,8 +103,8 @@ static float FastLog2Slow_MIPS32(uint32_t v) {
 //     cost += i * *(pop + 1);
 //     pop += 2;
 //   }
-//   return cost;
-static uint32_t ExtraCost_MIPS32(const uint32_t* const population, int length) {
+//   return (float)cost;
+static float ExtraCost_MIPS32(const uint32_t* const population, int length) {
  int i, temp0, temp1;
  const uint32_t* pop = &population[4];
  const uint32_t* const LoopEnd = &population[length];
@@ -130,7 +130,7 @@ static uint32_t ExtraCost_MIPS32(const uint32_t* const population, int length) {
    : "memory", "hi", "lo"
  );

-  return ((int64_t)temp0 << 32 | temp1);
+  return (float)((int64_t)temp0 << 32 | temp1);
 }

 // C version of this function:
@@ -148,9 +148,9 @@ static uint32_t ExtraCost_MIPS32(const uint32_t* const population, int length) {
 //     pX += 2;
 //     pY += 2;
 //   }
-//   return cost;
-static uint32_t ExtraCostCombined_MIPS32(const uint32_t* const X,
-                                         const uint32_t* const Y, int length) {
+//   return (float)cost;
+static float ExtraCostCombined_MIPS32(const uint32_t* const X,
+                                      const uint32_t* const Y, int length) {
  int i, temp0, temp1, temp2, temp3;
  const uint32_t* pX = &X[4];
  const uint32_t* pY = &Y[4];
@@ -183,7 +183,7 @@ static uint32_t ExtraCostCombined_MIPS32(const uint32_t* const X,
    : "memory", "hi", "lo"
  );

-  return ((int64_t)temp0 << 32 | temp1);
+  return (float)((int64_t)temp0 << 32 | temp1);
 }

 #define HUFFMAN_COST_PASS                                 \
--- a/src/dsp/lossless_enc_neon.c
+++ b/src/dsp/lossless_enc_neon.c
@@ -25,7 +25,7 @@

 // vtbl?_u8 are marked unavailable for iOS arm64 with Xcode < 6.3, use
 // non-standard versions there.
-#if defined(__APPLE__) && WEBP_AARCH64 && \
+#if defined(__APPLE__) && defined(__aarch64__) && \
    defined(__apple_build_version__) && (__apple_build_version__< 6020037)
 #define USE_VTBLQ
 #endif
--- a/src/dsp/lossless_enc_sse41.c
+++ b/src/dsp/lossless_enc_sse41.c
@@ -18,53 +18,8 @@
 #include <smmintrin.h>
 #include "src/dsp/lossless.h"

-//------------------------------------------------------------------------------
-// Cost operations.
-
-static WEBP_INLINE uint32_t HorizontalSum_SSE41(__m128i cost) {
-  cost = _mm_add_epi32(cost, _mm_srli_si128(cost, 8));
-  cost = _mm_add_epi32(cost, _mm_srli_si128(cost, 4));
-  return _mm_cvtsi128_si32(cost);
-}
-
-static uint32_t ExtraCost_SSE41(const uint32_t* const a, int length) {
-  int i;
-  __m128i cost = _mm_set_epi32(2 * a[7], 2 * a[6], a[5], a[4]);
-  assert(length % 8 == 0);
-
-  for (i = 8; i + 8 <= length; i += 8) {
-    const int j = (i - 2) >> 1;
-    const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
-    const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
-    const __m128i w = _mm_set_epi32(j + 3, j + 2, j + 1, j);
-    const __m128i a2 = _mm_hadd_epi32(a0, a1);
-    const __m128i mul = _mm_mullo_epi32(a2, w);
-    cost = _mm_add_epi32(mul, cost);
-  }
-  return HorizontalSum_SSE41(cost);
-}
-
-static uint32_t ExtraCostCombined_SSE41(const uint32_t* const a,
-                                        const uint32_t* const b, int length) {
-  int i;
-  __m128i cost = _mm_add_epi32(_mm_set_epi32(2 * a[7], 2 * a[6], a[5], a[4]),
-                               _mm_set_epi32(2 * b[7], 2 * b[6], b[5], b[4]));
-  assert(length % 8 == 0);
-
-  for (i = 8; i + 8 <= length; i += 8) {
-    const int j = (i - 2) >> 1;
-    const __m128i a0 = _mm_loadu_si128((const __m128i*)&a[i]);
-    const __m128i a1 = _mm_loadu_si128((const __m128i*)&a[i + 4]);
-    const __m128i b0 = _mm_loadu_si128((const __m128i*)&b[i]);
-    const __m128i b1 = _mm_loadu_si128((const __m128i*)&b[i + 4]);
-    const __m128i w = _mm_set_epi32(j + 3, j + 2, j + 1, j);
-    const __m128i a2 = _mm_hadd_epi32(a0, a1);
-    const __m128i b2 = _mm_hadd_epi32(b0, b1);
-    const __m128i mul = _mm_mullo_epi32(_mm_add_epi32(a2, b2), w);
-    cost = _mm_add_epi32(mul, cost);
-  }
-  return HorizontalSum_SSE41(cost);
-}
+// For sign-extended multiplying constants, pre-shifted by 5:
+#define CST_5b(X)  (((int16_t)((uint16_t)(X) << 8)) >> 5)

 //------------------------------------------------------------------------------
 // Subtract-Green Transform
@@ -89,9 +44,6 @@ static void SubtractGreenFromBlueAndRed_SSE41(uint32_t* argb_data,
 //------------------------------------------------------------------------------
 // Color Transform

-// For sign-extended multiplying constants, pre-shifted by 5:
-#define CST_5b(X) (((int16_t)((uint16_t)(X) << 8)) >> 5)
-
 #define MK_CST_16(HI, LO) \
  _mm_set1_epi32((int)(((uint32_t)(HI) << 16) | ((LO) & 0xffff)))

@@ -191,8 +143,6 @@ static void CollectColorRedTransforms_SSE41(const uint32_t* argb, int stride,
 extern void VP8LEncDspInitSSE41(void);

 WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE41(void) {
-  VP8LExtraCost = ExtraCost_SSE41;
-  VP8LExtraCostCombined = ExtraCostCombined_SSE41;
  VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed_SSE41;
  VP8LCollectColorBlueTransforms = CollectColorBlueTransforms_SSE41;
  VP8LCollectColorRedTransforms = CollectColorRedTransforms_SSE41;
--- a/src/dsp/lossless_neon.c
+++ b/src/dsp/lossless_neon.c
@@ -146,9 +146,9 @@ static void ConvertBGRAToRGB_NEON(const uint32_t* src,
 #define LOAD_U32P_AS_U8(IN) vreinterpret_u8_u32(vld1_u32((IN)))
 #define LOADQ_U32_AS_U8(IN) vreinterpretq_u8_u32(vdupq_n_u32((IN)))
 #define LOADQ_U32P_AS_U8(IN) vreinterpretq_u8_u32(vld1q_u32((IN)))
-#define GET_U8_AS_U32(IN) vget_lane_u32(vreinterpret_u32_u8((IN)), 0)
-#define GETQ_U8_AS_U32(IN) vgetq_lane_u32(vreinterpretq_u32_u8((IN)), 0)
-#define STOREQ_U8_AS_U32P(OUT, IN) vst1q_u32((OUT), vreinterpretq_u32_u8((IN)))
+#define GET_U8_AS_U32(IN) vget_lane_u32(vreinterpret_u32_u8((IN)), 0);
+#define GETQ_U8_AS_U32(IN) vgetq_lane_u32(vreinterpretq_u32_u8((IN)), 0);
+#define STOREQ_U8_AS_U32P(OUT, IN) vst1q_u32((OUT), vreinterpretq_u32_u8((IN)));
 #define ROTATE32_LEFT(L) vextq_u8((L), (L), 12)    // D|C|B|A -> C|B|A|D

 static WEBP_INLINE uint8x8_t Average2_u8_NEON(uint32_t a0, uint32_t a1) {
@@ -498,7 +498,7 @@ static void PredictorAdd13_NEON(const uint32_t* in, const uint32_t* upper,

 // vtbl?_u8 are marked unavailable for iOS arm64 with Xcode < 6.3, use
 // non-standard versions there.
-#if defined(__APPLE__) && WEBP_AARCH64 && \
+#if defined(__APPLE__) && defined(__aarch64__) && \
    defined(__apple_build_version__) && (__apple_build_version__< 6020037)
 #define USE_VTBLQ
 #endif
--- a/src/dsp/mips_macro.h
+++ b/src/dsp/mips_macro.h
@@ -45,38 +45,28 @@
  "ulw    %[" #O2 "],    " #I3 "+" XSTR(I9) "*" #I7 "(%[" #I0 "])       \n\t"  \
  "ulw    %[" #O3 "],    " #I4 "+" XSTR(I9) "*" #I8 "(%[" #I0 "])       \n\t"

-
-// O - output
-// I - input (macro doesn't change it so it should be different from I)
-#define MUL_SHIFT_C1(O, I)                                                     \
-  "mul              %[" #O "],    %[" #I "],    %[kC1]        \n\t"            \
-  "sra              %[" #O "],    %[" #O "],    16            \n\t"            \
-  "addu             %[" #O "],    %[" #O "],    %[" #I "]     \n\t"
-#define MUL_SHIFT_C2(O, I) \
-  "mul              %[" #O "],    %[" #I "],    %[kC2]        \n\t"            \
-  "sra              %[" #O "],    %[" #O "],    16            \n\t"
-
-// Same as #define MUL_SHIFT_C1 but I and O are the same. It stores the
-// intermediary result in TMP.
-#define MUL_SHIFT_C1_IO(IO, TMP)                                               \
-  "mul              %[" #TMP "],  %[" #IO  "], %[kC1]     \n\t"                \
-  "sra              %[" #TMP "],  %[" #TMP "], 16         \n\t"                \
-  "addu             %[" #IO  "],  %[" #TMP "], %[" #IO "] \n\t"
-
 // O - output
 // IO - input/output
 // I - input (macro doesn't change it)
 #define MUL_SHIFT_SUM(O0, O1, O2, O3, O4, O5, O6, O7,                          \
                      IO0, IO1, IO2, IO3,                                      \
                      I0, I1, I2, I3, I4, I5, I6, I7)                          \
-  MUL_SHIFT_C2(O0, I0)                                                         \
-  MUL_SHIFT_C1(O1, I0)                                                         \
-  MUL_SHIFT_C2(O2, I1)                                                         \
-  MUL_SHIFT_C1(O3, I1)                                                         \
-  MUL_SHIFT_C2(O4, I2)                                                         \
-  MUL_SHIFT_C1(O5, I2)                                                         \
-  MUL_SHIFT_C2(O6, I3)                                                         \
-  MUL_SHIFT_C1(O7, I3)                                                         \
+  "mul              %[" #O0 "],   %[" #I0 "],   %[kC2]        \n\t"            \
+  "mul              %[" #O1 "],   %[" #I0 "],   %[kC1]        \n\t"            \
+  "mul              %[" #O2 "],   %[" #I1 "],   %[kC2]        \n\t"            \
+  "mul              %[" #O3 "],   %[" #I1 "],   %[kC1]        \n\t"            \
+  "mul              %[" #O4 "],   %[" #I2 "],   %[kC2]        \n\t"            \
+  "mul              %[" #O5 "],   %[" #I2 "],   %[kC1]        \n\t"            \
+  "mul              %[" #O6 "],   %[" #I3 "],   %[kC2]        \n\t"            \
+  "mul              %[" #O7 "],   %[" #I3 "],   %[kC1]        \n\t"            \
+  "sra              %[" #O0 "],   %[" #O0 "],   16            \n\t"            \
+  "sra              %[" #O1 "],   %[" #O1 "],   16            \n\t"            \
+  "sra              %[" #O2 "],   %[" #O2 "],   16            \n\t"            \
+  "sra              %[" #O3 "],   %[" #O3 "],   16            \n\t"            \
+  "sra              %[" #O4 "],   %[" #O4 "],   16            \n\t"            \
+  "sra              %[" #O5 "],   %[" #O5 "],   16            \n\t"            \
+  "sra              %[" #O6 "],   %[" #O6 "],   16            \n\t"            \
+  "sra              %[" #O7 "],   %[" #O7 "],   16            \n\t"            \
  "addu             %[" #IO0 "],  %[" #IO0 "],  %[" #I4 "]    \n\t"            \
  "addu             %[" #IO1 "],  %[" #IO1 "],  %[" #I5 "]    \n\t"            \
  "subu             %[" #IO2 "],  %[" #IO2 "],  %[" #I6 "]    \n\t"            \
--- a/src/dsp/msa_macro.h
+++ b/src/dsp/msa_macro.h
@@ -73,25 +73,27 @@
 #define ST_UW(...) ST_W(v4u32, __VA_ARGS__)
 #define ST_SW(...) ST_W(v4i32, __VA_ARGS__)

-#define MSA_LOAD_FUNC(TYPE, INSTR, FUNC_NAME)               \
-  static inline TYPE FUNC_NAME(const void* const psrc) {    \
-    const uint8_t* const psrc_m = (const uint8_t*)psrc;     \
-    TYPE val_m;                                             \
-    __asm__ volatile("" #INSTR " %[val_m], %[psrc_m]  \n\t" \
-                     : [val_m] "=r"(val_m)                  \
-                     : [psrc_m] "m"(*psrc_m));              \
-    return val_m;                                           \
+#define MSA_LOAD_FUNC(TYPE, INSTR, FUNC_NAME)             \
+  static inline TYPE FUNC_NAME(const void* const psrc) {  \
+    const uint8_t* const psrc_m = (const uint8_t*)psrc;   \
+    TYPE val_m;                                           \
+    asm volatile (                                        \
+      "" #INSTR " %[val_m], %[psrc_m]  \n\t"              \
+      : [val_m] "=r" (val_m)                              \
+      : [psrc_m] "m" (*psrc_m));                          \
+    return val_m;                                         \
  }

 #define MSA_LOAD(psrc, FUNC_NAME)  FUNC_NAME(psrc)

-#define MSA_STORE_FUNC(TYPE, INSTR, FUNC_NAME)                 \
-  static inline void FUNC_NAME(TYPE val, void* const pdst) {   \
-    uint8_t* const pdst_m = (uint8_t*)pdst;                    \
-    TYPE val_m = val;                                          \
-    __asm__ volatile(" " #INSTR "  %[val_m],  %[pdst_m]  \n\t" \
-                     : [pdst_m] "=m"(*pdst_m)                  \
-                     : [val_m] "r"(val_m));                    \
+#define MSA_STORE_FUNC(TYPE, INSTR, FUNC_NAME)               \
+  static inline void FUNC_NAME(TYPE val, void* const pdst) { \
+    uint8_t* const pdst_m = (uint8_t*)pdst;                  \
+    TYPE val_m = val;                                        \
+    asm volatile (                                           \
+      " " #INSTR "  %[val_m],  %[pdst_m]  \n\t"              \
+      : [pdst_m] "=m" (*pdst_m)                              \
+      : [val_m] "r" (val_m));                                \
  }

 #define MSA_STORE(val, pdst, FUNC_NAME)  FUNC_NAME(val, pdst)
Author	SHA1	Message	Date
Vincent Rabaud	4619a48fc3	Fix OOB write in BuildHuffmanTable. First, BuildHuffmanTable is called to check if the data is valid. If it is and the table is not big enough, more memory is allocated. This will make sure that valid (but unoptimized because of unbalanced codes) streams are still decodable. Bug: chromium:1479274 Change-Id: I31c36dbf3aa78d35ecf38706b50464fd3d375741 (cherry picked from commit `902bc91903`) (cherry picked from commit `2af26267cd`)	2023-09-07 18:12:56 -07:00
James Zern	6a319d4da3	vp8l_enc,WriteImage: add missing error check VP8LBitWriterFinish() may cause the VP8LBitWriter's buffer to be grown. If that allocation fails, VP8LBitWriterNumBytes() will return a size larger than the current allocation resulting in a heap overwrite of the missing bytes. ==3531848==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61d000000880 at pc 0x556eddfa1007 bp 0x7ffe434c7a90 sp 0x7ffe434c7260 READ of size 2052 at 0x61d000000880 thread T0 #0 0x556eddfa1006 in __asan_memcpy #1 0x556eddfeeccf in WebPMemoryWrite src/enc/picture_enc.c:220:5 #2 0x556ede0f9f87 in WriteImage src/enc/vp8l_enc.c:1454:8 Found by Nallocfuzz (https://github.com/catenacyber/nallocfuzz). Change-Id: Ib1c9454c2c51849b0ba58c5347e6bd5b02a12fbe (cherry picked from commit `d49cfbb348`)	2023-06-17 04:49:53 +00:00
James Zern	fd7b5d4846	Merge "PaletteSortModifiedZeng: fix leak on error" into 1.3.0	2023-03-01 01:02:38 +00:00
James Zern	4654e1e738	EncodeAlphaInternal: clear result->bw on error This avoids a double free should the function fail prior to VP8BitWriterInit() and a previous trial result's buffer carried over. Previously in ApplyFiltersAndEncode() trial.bw (with a previous iteration's buffer) would be freed, followed by best.bw pointing to the same buffer. Since: `187d379d` add a fallback to ALPHA_NO_COMPRESSION In addition, check the return value of VP8BitWriterInit() in this function. Bug: webp:603 Change-Id: Ic258381ee26c8c16bc211d157c8153831c8c6910 (cherry picked from commit `a486d800b6`)	2023-02-28 00:25:46 +00:00
James Zern	d23169349f	PaletteSortModifiedZeng: fix leak on error Change-Id: I462bd9a3bc4670efdf251c295f6771a38c08a6ce (cherry picked from commit `0edbb6ea71`)	2023-02-28 00:24:49 +00:00
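
The first commit in the table above (4619a48fc3) describes a two-step approach: check that the data is valid first, and only then allocate more memory if the table is too small. The pattern might look roughly like the C sketch below. It is not libwebp's code: `Table`, `required_table_size`, `ensure_capacity`, and `prepare_table` are hypothetical names, the table is a simplified single-level one, and validity is reduced to a Kraft-sum check for a complete prefix code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CODE_LENGTH 15  /* assumed upper bound on a code length */

/* Hypothetical lookup-table container (not a libwebp type). */
typedef struct {
  uint16_t* entries;  /* table storage */
  size_t capacity;    /* number of entries currently allocated */
} Table;

/* Step 1: validate the code lengths without writing to any table.
 * Returns the number of entries a single-level table needs, or 0 if the
 * lengths do not describe a complete prefix code. */
static size_t required_table_size(const int* lengths, int num_symbols) {
  int max_len = 0;
  uint64_t kraft = 0;
  int i;
  for (i = 0; i < num_symbols; ++i) {
    if (lengths[i] < 0 || lengths[i] > MAX_CODE_LENGTH) return 0;
    if (lengths[i] > max_len) max_len = lengths[i];
  }
  if (max_len == 0) return 0;  /* no symbol is used */
  /* Kraft sum scaled by 2^max_len; a complete code sums to exactly 2^max_len. */
  for (i = 0; i < num_symbols; ++i) {
    if (lengths[i] > 0) kraft += (uint64_t)1 << (max_len - lengths[i]);
  }
  if (kraft != ((uint64_t)1 << max_len)) return 0;
  return (size_t)1 << max_len;
}

/* Step 2: grow the table, called only once the data is known to be valid.
 * Returns 1 on success, 0 if the allocation fails. */
static int ensure_capacity(Table* t, size_t needed) {
  uint16_t* p;
  assert(t != NULL);
  if (needed <= t->capacity) return 1;
  p = (uint16_t*)realloc(t->entries, needed * sizeof(*p));
  if (p == NULL) return 0;  /* the old buffer is untouched and still owned by t */
  t->entries = p;
  t->capacity = needed;
  return 1;
}

/* Returns the table size to fill, or 0 on invalid data or allocation failure.
 * The caller fills the table afterwards, so no write can go out of bounds. */
static size_t prepare_table(Table* t, const int* lengths, int num_symbols) {
  const size_t needed = required_table_size(lengths, num_symbols);
  if (needed == 0) return 0;
  if (!ensure_capacity(t, needed)) return 0;
  return needed;
}
```

In this sketch an unbalanced but valid set of lengths such as `{1, 2, 3, 3}` simply asks for a larger table than a balanced `{2, 2, 2, 2}`, while an invalid set leaves the existing allocation unchanged.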