First, BuildHuffmanTable is called to check if the data is valid.
If it is and the table is not big enough, more memory is allocated.
This will make sure that valid (but unoptimized because of unbalanced
codes) streams are still decodable.
Bug: chromium:1479274
Change-Id: I31c36dbf3aa78d35ecf38706b50464fd3d375741
(cherry picked from commit 902bc91903)
(cherry picked from commit 2af26267cd)
WEBP_REDUCE_SIZE was introduced to bring down the library size by
removing cropping and scaling support. Previously WebPPictureView() was
only used with these two, but in
ec178f2c Add progress hook granularity in lossless
an additional use was added in VP8LEncodeStream() when extra side
configurations are used in crunch mode (-mt, quality == 100 & method ==
6 or quality >= 75 & method == 5 with a palette present currently).
WebPPictureView() and, for coherency, WebPPictureIsView() are
restored in this configuration to avoid affecting the general encode
path.
Previously WebPPictureView() was assumed to always succeed in these
cases which could result in crashes with WEBP_REDUCE_SIZE defined.
Bug: chromium:1345547
Bug: chromium:1345595
Bug: chromium:1345772
Bug: chromium:1345804
Change-Id: Ifecde36a726a434510478a764514b1469942c684
libwebp-1.2.3
- 6/30/2022: version 1.2.3
This is a binary compatible release.
* security fix for lossless encoder (#565, chromium:1313709)
* improved progress granularity in WebPReportProgress() when using lossless
* improved precision in Sharp YUV (-sharp_yuv) conversion
* many corrections to webp-lossless-bitstream-spec.txt (#551)
* crash/leak fixes on error/OOM and other bug fixes (#558, #563, #569, #573)
Bug: webp:568
Bug: b/230421671
* tag 'v1.2.3':
update ChangeLog
dsp/cpu.h: add missing extern "C"
update ChangeLog
vwebp: fix file name display in windows unicode build
webpmux: fix -frame option in windows unicode build
makefile.unix: add sharpyuv objects to clean target
update NEWS
bump version to 1.2.3
update AUTHORS
update .mailmap
Change-Id: I148041862bc930472e849d278471aa17ea12258b
This fixes incorrect character display in the file name
in e.g., stats output from cwebp.exe in visual studio builds.
In mingw further changes will be needed as using %s in wfprintf() for
wchar_t pointers is a Microsoft extension that doesn't seem to be
supported with gcc -fms-extensions. %ls works, but will require updating
format strings and wrapping concatenations with TO_W_CHAR() as visual
studio will reject merging string constants of different width.
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode
Change-Id: I57d24c3815f7b5aa3971ac315c65c4458258d706
when building for windows with _UNICODE defined
unicode_gif.h|49 col 3| warning: ISO C90 forbids mixed declarations and
code [-Wdeclaration-after-statement]
Change-Id: Ib9f0cc0eba036d6cd4221a4f70a078770dde01d0
... since no guarantee of Huffman coding can be introduced at decoding
time and encoding didn't actually use Huffman coding in the first place
Bug: webp:551
Change-Id: I400466bb3b4a1d5506353eb3f287d658603164ee
This way, only functions marked with WEBP_EXTERN get exposed by the library.
This flag is already set in other build configs: configure.ac, makefile.unix,
Android.mk
Change-Id: Iafd8f160e539290e7ea46ceec764a090c6e626a9
and use it to suppress a false positive related to data that passes
through RGBA32PackedToPlanar_16b_SSE41(). Current versions (tested with
clang 13.0.1, using -O0 and the build from oss-fuzz of enc_dec_fuzzer)
model shuffles incorrectly reporting use of uninitialized
data related to the alpha change that's removed when converting to YUV.
valgrind behaves correctly, however.
Bug: webp:573
Change-Id: If76997668dcdd436adf280a2e6dcffba766a2875
libwebp still has a size threshold of 4 pixels in picture_csp_enc.c
but the sharpyuv library itself can handle any image size.
Change-Id: Ic1c78c8f8fae528395fa46a74f717f47cb13098e
* changes:
CMake: add src to webpinfo includes
CMake: add WEBP_BUILD_WEBPINFO to list of checks for exampleutil
CMake: add WEBP_BUILD_WEBPMUX to list of checks for exampleutil
CMake: link imageioutil to exampleutil after defined
WEBP_DEP_LIBRARIES: use Threads::Threads
quiets a warning under visual studio:
src\enc\picture_enc.c(48) : warning C4028: formal parameter 1 different
from declaration
Change-Id: Ic3affbbb0e22ac8c43fa183e13506eee72e180dc
- Remove SHALF constant so that we get back the original value when
calling UpScale then downscale with rounding
- Make the linear->gamma precomputed table bigger (8 bits rather than 5)
- Round values in kLinearToGammaTabS
Change-Id: Ic9634d32cf93de321d2f6e9e4cad7f156d8d07df
- pic->picture in public header
- match implementation to declaration in PictureImport, WebPPictureRescale, WebpBlendAlpha
Change-Id: Ibf3771af22d671bba6fd657684add618c6f32978
for recent changes to color transform, simple code length code, image
structure and details of decoding prefix codes
Bug: webp:551
Change-Id: I36a4d1ee3e25b8a56f5d926f4770374ace99bc2f
and the corresponding InverseTransform(); the operations were swapped.
The forward transform subtracts the deltas, the inverse adds them.
Bug: webp:551
Change-Id: Ieb90a24f843cee7cdfe46cbb15ec7d716ca9d269
the value of the TR-pixel on the right border is the leftmost pixel on
the current row, not the previous one
Bug: webp:551
Change-Id: I157e8c16ce43a9f52c36aa880be1be3bef19c063
Remove unused constants.
Use ALL_CAPS for defines and kCamelCase for static const values.
Change some defines into static constants if they are not used in array sizes.
Change-Id: I036b0f99215fd0414a33f099bd6b809ff8ee4541
under some versions of visual studio:
sharpyuv\sharpyuv_csp.c(43): warning C4244: '=': conversion from 'int' to
'float', possible loss of data
sharpyuv\sharpyuv_csp.c(35): warning C4244: 'initializing': conversion from
'int' to 'float', possible loss of data
Change-Id: I452d24857b43b8a80fae0f339163b4b3965f2d9f
quiets -Wunused-but-set-variable
frame_count has been unused in this function since:
ab714b8a demux, Frame: remove is_fragment_ field
Change-Id: Ie6afda915c6b82736e05e7490eba0165c3dd37e4
1 space is most common in the source; this fixes some mixed cases within
lossless files, likely from clang-format
Change-Id: I504206d5bf418781d4131ee73570ecee4e0a69a1
several calls to ChunkSetHead() were unchecked, causing the chunk to
leak should the call fail due to OOM
Tested:
for i in `seq 1 1125`; do
export MALLOC_FAIL_AT=$i
./examples/gif2webp gif_file
./examples/gif2webp -mixed gif_file
done
for i in `seq 1 171`; do
export MALLOC_FAIL_AT=$i
./examples/img2webp jpeg_file -o /dev/null
./examples/img2webp -mixed jpeg_file -o /dev/null
done
Change-Id: I479bc487f61b493e5ce033872d353007055c172a
argv_data would leak if the argv_ allocation failed or the MAX_ARGC cap
was hit
Tested:
for i in `seq 1 171`; do
export MALLOC_FAIL_AT=$i
./examples/img2webp jpeg_file -o /dev/null
./examples/img2webp -mixed jpeg_file -o /dev/null
done
Change-Id: I39864691e96d5456f324d95a3653bba0f6d6a7be
in ReadAnimatedWebP() & ReadAnimatedGIF() when AllocateFrames() fails
due to OOM
Tested:
for i in `seq 1 278`; do
export MALLOC_FAIL_AT=$i
./examples/anim_diff webp1 webp2
./examples/anim_diff gif1 gif2
done
Change-Id: I5d486c2c2982ae088e34a25d8e3e0188df1a9b51
previously calls to WebPPictureCopy() weren't checked, causing a crash
when the canvas copies were accessed later in the loop
Tested:
for i in `seq 1 1125`; do
export MALLOC_FAIL_AT=$i
./examples/gif2webp gif_file
./examples/gif2webp -mixed gif_file
done
Change-Id: I186d13be0ec8f3b72b7d247a8482590c1b1be541
previously failures in the call to
VP8LBackwardReferencesTraceBackwards() would be ignored which, though it
wouldn't result in a crash, would produce non-deterministic output
Change-Id: Id9890a60883c3270ec75e968506d46eea32b76d4
change CostManager to calloc to avoid frees on undefined pointer
values in CostManagerClear() should the cost_model allocation succeed,
but the cost_manager allocation fail
since:
v0.5.0-93-g3e023c17 Speed-up BackwardReferencesHashChainDistanceOnly.
Tested:
for i in `seq 1 639`; do
export MALLOC_FAIL_AT=$i
./examples/cwebp -m 6 -q 100 -lossless jpeg_file
done
Bug: webp:565
Change-Id: I376d81e6f41eb73529053e9e30c142b4b4f6b45b
initialize bw_side before calling EncoderAnalyze() & EncoderInit() which
may fail; previously this would cause a free of an invalid pointer in
VP8LBitWriterWipeOut().
since at least:
v0.6.0-120-gf8c2ac15 Multi-thread the lossless cruncher.
Tested:
for i in `seq 1 639`; do
export MALLOC_FAIL_AT=$i
./examples/cwebp -m 6 -q 100 -lossless jpeg_file
done
Bug: webp:565
Change-Id: I1c95883834b6e4b13aee890568ce3bad0f4266f0
AC_PROG_LIBTOOL is deprecated. quiets a warning from autoconf 2.71:
configure.ac:12: warning: The macro `AC_PROG_LIBTOOL' is obsolete.
Change-Id: I3b131f5ee636a8df48057862e600759f25ad7289
the trailing width % 8 bytes would clear the upper bytes of
alpha_mask as they're done one at a time
since:
49d0280d NEON: implement several alpha-processing functions
Change-Id: Iff76c0af3094597285a6aa6ed032b345f9856aae
this removes the mt.exe step and may fix some build failures if the
executable is locked after linking, e.g., by an antivirus program
Change-Id: I9a8184f3aaecb62a133a13d9ad3383667bc93fd3
It's self contained apart from a dependency on src/webp/types.h and src/dsp/cpu.h
For now it's only set up as an internal library, not an installable one.
Webp doesn't depend on it yet, the code is only duplicated.
Change-Id: I752799894f9d4105d0d296ddebd9f9641181a1ec
Break the main README into into multiple pages in the doc/ directory,
except for the tests, swig and webp_js docs which are in the corresponding
directories.
The webp mux doc is merged into the API doc and the tools doc.
Change-Id: Ia407617dd88094f4662841d37947cfef80799914
results in code layout changes, a couple fewer instructions; some of the
smaller functions were unaffected as they were inlined, but are updated
for consistency. this mostly affects VP8Decimate(), ReconstructIntra16()
and ReconstructUV().
Change-Id: Icc2582278987a66ad1110bab683d1e0c21e6591a
A WebPPicture instance is necessary to call WebPReportProgress() which
sets WebPPicture::error_code so as well use WebPEncodingSetError() to
record errors too, instead of functions returning a WebPEncodingError.
However there must be one WebPPicture instance per thread, with error
codes merged at sync time. A mutex could simplify that but it is not
the objective of this change.
https://groups.google.com/a/webmproject.org/g/webp-discuss/c/yOiP8APubgc/m/vCTvxl6ODgAJ
Change-Id: Ia1a8f9d1199202e1c88484ce719b0180a80447ce
When using CMake the `<prefix>/usr/share/WebP/cmake/WebPConfig.cmake` gets
used to get the names of the libraries to link against.
Since version 1.2.1 of libwebp, libwebpmux is on by default.
This causes a linker error because the linker arg should be `-lwebpmux`
instead of `-llibwebpmux`.
This is fixable by renaming `libwebpmux` to `webpmux` in a few places.
This was identified after an update to 1.2.1 of libwebp in the OpenWrt
project:
https://github.com/openwrt/packages/pull/16766
Internally in OpenWrt, this was patched here:
https://github.com/openwrt/packages/pull/16784
Change-Id: I97501c62b1e97b133a62b04d6516261d6d7c7fd0
Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
- prefer Exif to EXIF when not referring to the chunk
- normalize use of code format for canvas / frame width & height
Change-Id: I7fc9e87baeb5d8e719274680ce621dcb078a985f
libwebp-1.2.2
- 1/11/2022: version 1.2.2
This is a binary compatible release.
* webpmux: add "-set bgcolor A,R,G,B"
* add ARM64 NEON support for MSVC builds (#539)
* fix duplicate include error in Xcode when using multiple XCFrameworks in a
project (#542)
* doc updates and bug fixes (#538, #544, #548, #550)
* tag 'v1.2.2':
update ChangeLog
libwebp: Fix VP8EncTokenLoop() progress
BMP enc: fix the transparency case
libwebp: do not destroy jpeg codec twice on error
update ChangeLog
update NEWS
man/img2webp.1: update date
Reword img2webp synopsis command line
anim_decode: fix alpha blending with big-endian
webpinfo: fix fourcc comparison w/big-endian
update ChangeLog
update NEWS
bump version to 1.2.2
update AUTHORS
Bug: webp:541, b/202302177
Change-Id: Iae875b6ec3084157837cc774c94088ca72e8dd91
When transparency is present, it's not enough to just write 32bpp samples.
One need to use the full BITMAPV3INFOHEADER syntax and specify the
masks for BGRA.
see https://en.wikipedia.org/wiki/BMP_file_format#Pixel_storage
Also remove the height-flip trick and write samples bottom-to-top instead.
Change-Id: If5d92c11453b96764b5bfbf19e9678e632bc911f
(cherry picked from commit 480cd51de6)
WebPPictureImportRGB() can fail on memory allocation. In this case,
jpeg_destroy_decompress() was already called, so do not go to Error.
Free metadata as an error is returned.
Change-Id: I045b072090e9063d3ad10369ad18b0f08bdffe9f
(cherry picked from commit 6e8a4126f2)
WebPPictureImportRGB() can fail on memory allocation. In this case,
jpeg_destroy_decompress() was already called, so do not go to Error.
Free metadata as an error is returned.
Change-Id: I045b072090e9063d3ad10369ad18b0f08bdffe9f
When transparency is present, it's not enough to just write 32bpp samples.
One need to use the full BITMAPV3INFOHEADER syntax and specify the
masks for BGRA.
see https://en.wikipedia.org/wiki/BMP_file_format#Pixel_storage
Also remove the height-flip trick and write samples bottom-to-top instead.
Change-Id: If5d92c11453b96764b5bfbf19e9678e632bc911f
images are decoded in RGBA/BGRA, but represented as uint32_t during the
blend process; this fixes the channel extraction
Bug: webp:548
Change-Id: Ie74aa43d8f87d3552d5afc0abba466335f5d1617
(cherry picked from commit e4886716d3)
store the recognized fourccs in little-endian order to match how the
fourcc is being read from the file
Bug: webp:548
Change-Id: I9de77db92208709d5e711846908a51e563102fa5
(cherry picked from commit e3cb052ca5)
images are decoded in RGBA/BGRA, but represented as uint32_t during the
blend process; this fixes the channel extraction
Bug: webp:548
Change-Id: Ie74aa43d8f87d3552d5afc0abba466335f5d1617
store the recognized fourccs in little-endian order to match how the
fourcc is being read from the file
Bug: webp:548
Change-Id: I9de77db92208709d5e711846908a51e563102fa5
msa_macro.h
neon.h
allows the headers to be built / analyzed under different target
configurations
Change-Id: Ibbcfada210b54988aa5279674d53af8e21fd4a97
- prefer https
- metadataworkinggroup.org/com seem to be offline; the web archive link
was obtained from exiftool: https://exiftool.org/TagNames/MWG.html
- fix kramdown link, rubyforge has been gone a long time
- fix png/zlib links
Bug: webp:544
Bug: b/202302177
Change-Id: Id69de4553e7baf00393f12a2c1acb262443a1a93
to https://datatracker.ietf.org/doc/html/... the http tools.ietf.org
links redirect here sometimes, in other cases they 404.
Bug: webp:544
Change-Id: I900972070d6c5659c45a86a89e78b870f42fe5bc
... when it's not available. Even if the value was discarded and
never used, some msan config were complaining about reading it
and passing it around.
Change-Id: Iab8d24676c5bb58e607a829121e36c2862da397c
Similarly to "-set loop <>", this is useful to set
the parameter at global level and not just while assembling
the frames separately.
Change-Id: I79bcbe37f8ff50b9904e78d14011ccbac72f17a1
Detects whether test block should be run in the current test shard.
if shard_should_run; then
...
fi
Bug: b/185520507
Change-Id: I408d226651630a01c5e447dbff7d26b73426ce42
unknown chunks as a group are well defined in the text
+ remove the Note about stable features as it matched the list given;
there are no WIP features for the format
Change-Id: I11a4de39f9d203b815879f0e9dea9c45b2ae6ff6
shfmt is expected to be installed along with go.
GO111MODULE=on go install mvdan.cc/sh/v3/cmd/shfmt@latest
Change-Id: I27004e72f79f71df9405158f77df485f43b028bb
Bug: b:185520494
Shellcheck is expected to be installed because depot_tools does not
provide the tool.
Change-Id: I9c8793cf7e2adedcb1e9f8244b886d6f2a54e2ae
Bug: b:185520494
Visual Studio added ARM64 support, but requires arm64_neon.h to be
included rather than arm_neon.h. Visual Studio 2019 addressed this so
we'll start with that version and leave a local adapter include for a
follow up.
Bug: webp:539
Change-Id: If975c029dafffba99210b3bb2d670035a83e8105
- only initialize variable when needed
- perform first loop outside the for loop
- perform computation only if know we are not already worse
- not adding base_score every time
Change-Id: I2cb8231fcaec1113b5902ed61b685f0ae3c78823
libwebp-1.2.1
- 7/20/2021: version 1.2.1
This is a binary compatible release.
* minor lossless encoder improvements and x86 color conversion speed up
* add ARM64 simulator support to xcframeworkbuild.sh (#510)
* further security related hardening in libwebp & examples
(issues: #497, #508, #518)
(chromium: #1196480, #1196773, #1196775, #1196777, #1196778, #1196850)
(oss-fuzz: #28658, #28978)
* toolchain updates and bug fixes (#498, #501, #502, #504, #505, #506, #509,
#533)
* use more inclusive language within the source (#507)
* tag 'v1.2.1':
update ChangeLog
fuzzer/*: normalize src/ includes
fix indent
update ChangeLog
dsp/*: use WEBP_HAVE_* to determine Init availability
update NEWS
bump version to 1.2.1
update AUTHORS
Bug: 521, b/188435220
Change-Id: Id6e6781d8da878413c2e57dcae5e2f2841174181
this uses the format introduced to some files in:
cc3577e9 fuzzer/*: use src/ based include paths
Change-Id: I9b5cbeadbb9d54d1e89f474a6e479a5eb3175ed7
(cherry picked from commit c5bc36243a)
after:
ece18e55 dsp.h: respect --disable-sse2/sse4.1/neon
WEBP_USE_* will be set when a module is targeting a particular
instruction set, e.g., sse4.1, and not overridden if WEBP_HAVE_SSE41 is
set, as previously this would ignore the case where the instruction set
was disabled via config.h and the HAVE macro was unset.
dsp.h not ensures WEBP_HAVE_* are set when WEBP_USE_* to cover cases
where the files are built without config.h.
Change-Id: Ia1c2dcf4100cc1081d968acb6e085e2a1768ece6
(cherry picked from commit 1fe3162541)
after:
ece18e55 dsp.h: respect --disable-sse2/sse4.1/neon
WEBP_USE_* will be set when a module is targeting a particular
instruction set, e.g., sse4.1, and not overridden if WEBP_HAVE_SSE41 is
set, as previously this would ignore the case where the instruction set
was disabled via config.h and the HAVE macro was unset.
dsp.h not ensures WEBP_HAVE_* are set when WEBP_USE_* to cover cases
where the files are built without config.h.
Change-Id: Ia1c2dcf4100cc1081d968acb6e085e2a1768ece6
This way target_link_libraries(my_target PUBLIC webp) is enough,
no need to add include directories explicitly for the public headers
such as src/webp/decode.h. This matches the /usr/include/webp/decode.h
installation folder path prefix (#include "webp/decode.h").
See https://bugs.chromium.org/p/webp/issues/detail?id=532
The target_include_directories() calls for cwebp, dwebp etc. may be
unnecessary now and may be removed in another patch.
Change-Id: I2c85218920ffc35ebec23e0dd239f71e29add23b
previously this would be overridden if the instruction set was enabled
via -msse4.1, __aarch64__, etc.
Change-Id: I51e87a7da7589c6093d260b848ab41d89ec7b990
the container check may not be strictly necessary now that the alpha
check has matured and the library will detect usable alpha, but it's
safer to fix this case first before making larger changes.
Bug: webp:533
Change-Id: I2e1ba42156970d579a52bd183707a037e65fd900
similar to '* const', __restrict needs to be included in the
declaration to avoid warnings like:
src\dsp\alpha_processing.c(429): warning C4028: formal parameter 1
different from declaration
this change also moves WEBP_RESTRICT to dsp.h to avoid a circular
dependency between it and utils.h which already includes dsp.h
Change-Id: Ib070d358a372e76fae4be5445ab288940b9baac0
Between 3.17.0 and 3.18.2 check_cxx_compiler_flag() sets a normal
variable at parent scope while check_cxx_source_compiles() continues to
set an internal cache variable, so we unset both to avoid the failure /
success state persisting between checks.
This was fixed in 3.18.3 [1], but regressed shortly afterward [2];
currently seen with 3.20.0 installed via homebrew.
[1] https://gitlab.kitware.com/cmake/cmake/-/issues/21207
[2]
90dead024c CheckCompilerFlag: unified way to check compiler flags per language
Change-Id: I57282f89b07a9cfd85aca7569380f7d115c0b3cf
this will provide the main libraries and their supporting examples by
default if external library requirements (e.g., gif2webp+libgif) are
met.
Bug: webp:501
Change-Id: I593adf9222698e2dc5a2199909949c7fea1273b2
this ensures the defaults are set properly before testing conditions for
enabling certain binaries. previously anim_diff would need
--enable-libwebpdemux to be explicitly added though it defaults to
enabled.
Bug: webp:501
Change-Id: Ifac68eac7096b39e98d0025e07a37b0be3d32c0a
this can help with some aliasing issues with some versions of clang/gcc,
similar to:
3e265136 Add WEBP_RESTRICT & use it in VP8BitReader
Change-Id: I863e53cc9d707c9a4b21373ca743c3089aed012e
this should be the full path to the installed headers to be added to the
client code's search path, e.g., /usr/local/include when including
<webp/decode.h>
Bug: webp:509
Change-Id: I756249876d8de421c9a33513221fb635157560ef
this enables cases that might trigger overflows, but increases the risk
of OOM and timeouts
Bug: chromium:1196850
Change-Id: I317b5109525646731e762faa3c34ed28a27595dc
Check encoded_frames_ count and call FlushFrames if necessary after
IncreasePreviousDuration. Avoids an overflow in encoded_frames_[] with
-kmax 0 and an assertion failure related to the previous and keyframe
durations when a frame is forced in this way.
Based on patch by tomwei7g <at> gmail
Bug: webp:518
Change-Id: Idef685e6c06a67d48fcdc048265ca0e672a01263
Silences CMake's warning and uses GLVND, which still works for linux
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Change-Id: Iadd173b41c4fbcdff9fe512f9a7dbf70a6d95bcd
Marking the `VP8BitReader` as `__restrict__` helps the compiler generate
better code avoiding issues related to aliasing (re-loads/stores).
Change-Id: Ib7178f57e27e5f40572efc3e567cdf994ea6d928
use 64-bit math in calculating the offsets as they may exceed 32-bits
when scaling
Bug: chromium:1196850
Change-Id: I6a484fc4dded6f6c4b82346ef145eb69c1477b3c
Adds an additional option similar to configure's --enable-libwebpmux
to toggle building libwebpmux separate from the binaries
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Change-Id: I0443b84eea36d86791e2e421a6fc0070879a7bef
promote out_width to size_t before multiplying
src/dec/io_dec.c:301:30: runtime error: signed integer overflow: 2 *
1224167500 cannot be represented in type 'int'
#0 0x55fd9e8de2bd in InitYUVRescaler src/dec/io_dec.c:301:30
#1 0x55fd9e8de2bd in CustomSetup src/dec/io_dec.c:571:54
Bug: chromium:1196850
Change-Id: I70d0aac1b5eef163a3f353b721fb9ab561e02040
this will avoid the potential for some integer overflows in rescaler
calculations
Bug: chromium:1196850
Change-Id: Iaa09f5d6b888b39aaeb2154d470279620362d6eb
in Export increment the dst pointer, but in EmitRescaledRowsRGBA use
64-bit math as the number of output lines is variable and may still
overflow when incrementing.
Bug: chromium:1196850
Change-Id: I5c65b875894ee9da0fef1e24d27e507171800c4a
with large sizes the intermediate calculations may exceed 32-bits
src/dec/io_dec.c:491:17: runtime error: signed integer overflow: 3 *
788529152 cannot be represented in type 'int'
#0 0x557a3ad972b2 in InitRGBRescaler src/dec/io_dec.c:491:17
#1 0x557a3ad972b2 in CustomSetup src/dec/io_dec.c:563:29
Bug: chromium:1196850
Change-Id: Iaf2e8a6de9481dfea31dcd7fccb2d4eca767bddf
with large scale values the offset to the end of the buffer may exceed
32-bits range.
src/dec/buffer_dec.c:158:39: runtime error: signed integer overflow: 2 *
1275068416 cannot be represented in type 'int'
#0 0x56444802bea5 in WebPFlipBuffer src/dec/buffer_dec.c:158:39
Bug: chromium:1196850
Change-Id: I08c8b69ada5d5dd3e9bf2b9006dffa0c5f2103a5
in addition to checking the environment for "MALLOC_LIMIT"; the
environment will still take precedence.
this is in preparation for adding extreme config value coverage to
advanced_api_fuzzer
Bug: chromium:1196850
Change-Id: Ibe22f5e39e030a422fd6e383269bde35252d3fae
avoids integer overflow in extreme cases:
src/dsp/rescaler.c:45:32: runtime error: signed integer overflow: 129 *
16777215 cannot be represented in type 'int'
#0 0x556bde3538e3 in WebPRescalerImportRowExpand_C src/dsp/rescaler.c:45:32
#1 0x556bde357465 in RescalerImportRowExpand_SSE2 src/dsp/rescaler_sse2.c:56:5
...
Bug: chromium:1196850
Change-Id: I4f923807f106713e113f3eec62a1d1c346066345
after the check using 64-bit math we used a signed integer in the
multiplication. previously unsigned integer max was tested.
fixes cases like:
src/dec/buffer_dec.c:108:16: runtime error: signed integer overflow:
944731466 * 4 cannot be represented in type 'int'
#0 0x55e56187dc1d in AllocateBuffer src/dec/buffer_dec.c:108:16
#1 0x55e56187dc1d in WebPAllocateDecBuffer src/dec/buffer_dec.c:216:12
...
Bug: chromium:1196850
Change-Id: I6e5b3e5d1d5b50b5c98c39bbf9813a63fedc5ca7
replace with more inclusive terms or remove the comment entirely if the
meaning was already clear.
Bug: webp:507
Change-Id: Ica3bbf751ebf79f6668df6e6209af770248ff4ca
enc is allocated with WebPSafeCalloc so there's no need to clear the
pointers afterward.
this has the side-effect of removing a non-inclusive term.
Bug: webp:507
Change-Id: I82f82954936638c4c15d33b2d6f0497a6a13571f
This reverts commit b6513fbaa8.
This change can produce files that can cause decode failures in some
versions of chrome and safari/ios/macos.
https://chromium-review.googlesource.com/c/chromium/src/+/2876279
The chrome fix will be available in M92. This change can be revisited
after it and the mac updates are more widely deployed.
Bug: b/186640109,b/188702956
Change-Id: I296b8fe88c6c48219e3edf532226c4f972f1605b
though the max chunk/payload sizes were checked and would fail the
padded size was being calculated beforehand which could result in a
(harmless) unsigned int overflow warning.
Bug: webp:508
Change-Id: I4fa30ded2b027c1577b03049a2deeb7bf75e5472
fixes conversion warnings in visual studio after:
b1674240 Add modified Zeng's method to palette sorting.
src\enc\vp8l_enc.c(296) : warning C4244: '=' : conversion from 'const
uint16_t' to 'uint8_t', possible loss of data
src\enc\vp8l_enc.c(299) : warning C4244: '=' : conversion from 'const
uint16_t' to 'uint8_t', possible loss of data
Change-Id: I981b1ba4912edbbafbd49f1f5b1043bf12266920
large values of x_add and y_add may rollover an int causing a later
assertion to fail in WebPRescalerExportRow due to fxy_scale incorrectly
being set to 0.
fixes:
src/dsp/rescaler.c:178: void WebPRescalerExportRow(WebPRescaler *const):
Assertion `wrk->src_height == wrk->dst_height && wrk->x_add == 1'
failed.
Bug: chromium:1196480
Change-Id: I2c00f015d61a1257033d8edb1edd4d060d6878b7
this matches the description in WebPDecoderOptions and prevents a
mismatch between the user supplied options and the ones used by io.
Bug: chromium:1196773, chromium:1196775, chromium:1196480
Change-Id: I3603b806884cfc6969b093d06b7980b0cc13199b
this matches the description in WebPDecoderOptions and prevents a
mismatch between the user supplied options and the ones used by io.
Bug: chromium:1196480
Change-Id: Id464f999d737078078f9d21afe25b349317f5ab4
and avoid integer overflow in test of x/width and y/height parameters
against the image width/height
Bug: chromium:1196778, chromium:1196777, chromium:1196480
Change-Id: I7b8f1f4dbebfe073b1ba260b8317979488655dcc
With palette+predictors, cross-color was forced (because of predictors).
No need for cross-color for palettes as R/B==0.
This saves 10 bytes per image that uses palette+predictors.
Change-Id: If2184d16cdabe1e8498009062284ad3e37ef1342
if bypass_filtering was set to 1 in the user provided options it
shouldn't be reset in the use_scaling pass even if the image satisfies
the scaling requirements.
Change-Id: I036029907886acb63748872d5f8763954a7c607b
ICC extraction via GetColorContexts may fail due to the operation not
being supported with e.g., bitmaps.
Bug: webp:506
Change-Id: I587220e688ac90d77c84af21c9e7bc8bf178f3aa
this controls the config.h define of the same name and matches the
behavior of configure's --disable-threading
Bug: webp:502
Change-Id: Id135bbf4e6b3c9900e2b4f4f7ab744b3457ce41c
VerticalUnfilter_SSE2 has long been disabled due to a crash in an
Android emulator that hasn't reproduced elsewhere (crbug.com/654974).
this synchronizes the code for now to avoid needing to locally edit the
file on import.
Bug: 1141126
Change-Id: Ib61aeab93caaff1759606566b9e499eaac1576cf
this synchronizes the code with chrome, where this format allows the
code to pass buildtools/checkdeps/checkdeps.py
Bug: 1141126
Change-Id: I25361b1a43cd95730814302f02aa16af8fdb5fd2
Also fix the variables: we need to check for PTHREAD_PRIO_INHERIT
and PTHREAD_CREATE_JOINABLE (not PTHREAD_CREATE_UNDETACHED) and
internally use HAVE_PTHREAD_PRIO_INHERIT and PTHREAD_CREATE_JOINABLE
(and not HAVE_PTHREAD_CREATE_JOINABLE).
cmake/config.h.in actually had the right variables.
BUG=webp:498
Change-Id: Ibf6cf854337cea5781a74316024f8ff4960366d7
(cherry picked from commit ceddb5fc8d)
Also fix the variables: we need to check for PTHREAD_PRIO_INHERIT
and PTHREAD_CREATE_JOINABLE (not PTHREAD_CREATE_UNDETACHED) and
internally use HAVE_PTHREAD_PRIO_INHERIT and PTHREAD_CREATE_JOINABLE
(and not HAVE_PTHREAD_CREATE_JOINABLE).
cmake/config.h.in actually had the right variables.
BUG=webp:498
Change-Id: Ibf6cf854337cea5781a74316024f8ff4960366d7
this function produces different results from the C code due to
use of double/float resulting in output differences when compared to
-noasm.
Bug: webp:499
Change-Id: Ia039b168c0a66da723fb434656657ba1948db8ae
Some PNG input contain chunks larger than libpng's default
memory-alloc limit (8M).
Raise this limit reasonably if it looks like the input bitstream
is larger than libpng's default limit.
BUG=webp:497
Change-Id: I2c9fbed727424042444b82cbf15e0781cefb38dc
(cherry picked from commit 8696147da4)
Some PNG input contain chunks larger than libpng's default
memory-alloc limit (8M).
Raise this limit reasonably if it looks like the input bitstream
is larger than libpng's default limit.
BUG=webp:497
Change-Id: I2c9fbed727424042444b82cbf15e0781cefb38dc
add IOS_MIN_VERSION, MACOSX_MIN_VERSION and MACOSX_CATALYST_MIN_VERSION
to allow control of the minimum versions supported based on the
deployment target; based on feedback from:
e8e8db98 add xcframeworkbuild.sh
e8e8db985a
Change-Id: I9fbca072bf00c4cb8e59143371a2d3522d22808b
somewhat arbitrary version selection, but this matches what's currently
in use in ExoPlayer in the dev-v2 branch
fixes:
FAILURE: Build failed with an exception.
* What went wrong:
Could not determine java version from '11.0.9.1'.
and removes the wrapper task as this is predefined in later versions:
Build file '/home/jzern/external/libwebp/build.gradle' line: 437
* What went wrong:
A problem occurred evaluating root project 'libwebp'.
> Cannot add task 'wrapper' as a task with that name already exists.
Change-Id: Ib9ef757d874cd6d4b08da3e7cef3ac5316935966
It's explicitly safe (and recommended!) to plug external data into
the pic->y/u/v/argb fields. They are guaranteed to be preserved
by the encoding process if no conversion is needed.
Change-Id: I325ca41a6a834f7f028431c605dddef67e9542cc
this is similar to iosbuild.sh, but will create .xcframework variants to
support Catalyst targets. currently it includes libs for:
ios-arm64_armv7_armv7s
ios-arm64_x86_64-maccatalyst
ios-i386_x86_64-simulator
macos-arm64_x86_64
this script requires Xcode 12+ to target arm64 for mac/catalyst, Xcode
11 builds will create x86_64 libraries only. iosbuild.sh remains for
compatibility purposes
Change-Id: I289c54c4b85848392a99bea698d45d54a9781f49
Since
a376e7b9 Fix compilation on windows and clang-cl+ninja.
the highest level assembly flag (/arch:AVX / -msse4.1) would be applied
to all assembly files. This could result in higher level instructions
being used which would defeat the runtime cpu detection check and could
result in a crash.
With this change -m style flags are used with clang-cl as it supports
them and allows the correct level to be set. With at least Visual
Studio 12 (2013)+ /arch is not necessary to build SSE2 or SSE4 code
unless a lesser /arch is forced so these flags are avoided. This matches
the nmake build.
For emscripten only sse2 and sse4.1 are tested (NEON will succeed in
being enabled, but fail to build). This is consistent with
the current behavior added in:
commit 36c81ff6a9
WASM-SIMD: port 2 patches from rreverser@'s tree
* Stop on first found SIMD version
* Imply lower SSE version when higher is found
Bug: webp:488
Change-Id: I34d01274e5204a477b6b9f35ed566048a62b4c57
Tested: msvc 2013-2019 debug win32/x64 builds; clang-cl under 2019
Add a stub file to object only library targets.
https://cmake.org/cmake/help/v3.18/command/add_library.html#object-libraries
Some native build systems (such as Xcode) may not like targets that have
only object files, so consider adding at least one real source file to
any target that references $<TARGET_OBJECTS:objlib>.
Bug: webp:477
Change-Id: I5d404c921d84433c7a0b22e0736ab0dbe699d264
- Add `-msimd128` to flags to actually enable WebAssembly SIMD
when performing SIMD detection. It's currently required in
addition to `-msse*` / `-mfpu=neon` flags which only perform
translation of corresponding intrinsics to Wasm SIMD ones.
See a discussion at emscripten-core/emscripten#12714 for
automating this and making easier in the future.
- Remove compilation branch that prevented definitions of
`WEBP_USE_SSE` and `WEBP_USE_NEON` on Emscripten even when
SIMD support was detected at compile-time.
- Add an implementation of `VP8GetCPUInfo` for Emscripten which
uses static `WEBP_USE_*` flags to determine if a corresponding
SIMD instruction is supported. This is because Wasm doesn't
have proper feature detection (yet) and requires making separate
build for SIMD version anyway.
Change-Id: I77592081b91fd0e4cbc9242f5600ce905184f506
The offset *can* be negative, but the sanitizer reports strange
address behaviour when row_offset is unsigned size_t.
For safety, use int64_t instead (probably overkill. int32_t is probably ok).
Change-Id: I1bd424bfdb5447b3839f40679581d6bdea075320
After ParseAnimationFrame() calls StoreFrame(), check if StoreFrame() reads
more than anmf_payload_size bytes from dmux->mem_. Treat that as PARSE_ERROR.
Change-Id: I0d03885c19d32792af78de7bed1a944ca01f1dc6
with functions that can legitimately fail when under memory pressure the
fuzzer should exit gracefully rather than abort().
+ add some more error detail to output
Bug: chromium:1140448
Change-Id: I1a8582a939e0a5b2b8631c95c0464658c99063e2
this was only used in debug builds, the build should be fast enough in
either case.
https://docs.microsoft.com/en-us/cpp/build/reference/gm-enable-minimal-rebuild?view=vs-2019
/Gm is deprecated. It may not trigger a build for certain kinds of
header file changes. You may safely remove this option from your
projects. To improve build times, we recommend you use precompiled
headers and incremental and parallel build options instead.
Change-Id: I8e3c0e7cc82e10bd1d2b0904d290fe4e050ebe8b
with WebPReplaceTransparentPixels() function signature:
src\enc\picture_tools_enc.c(86): warning C4028: formal parameter 1
different from declaration
Change-Id: I0140d61b0dfebcbb4189707e8f2f4b1af802a4d7
this provides stronger synchronization when pthreads are available as
was done in 'd77bf512 add WEBP_DSP_INIT / WEBP_DSP_INIT_FUNC' for the
other init functions.
Change-Id: I2ffe4e24454d276c2411ece34dca38d23d4756d5
The output was always 99dB because the lossless pipeline is not
modifying the 'picture'. Changing that is not that simple because
near_lossless impacts both VP8ApplyNearLossless() and
ApplyPredictFilter(); the latter cannot be applied as is to the input
and thus the final modified 'picture' cannot be easily retrieved
without decoding the encoded bitstream. Hence ReadWebP() is called in
cwebp.c on the encoded bitstream kept in memory to get the correct
distortion.
However -get_psnr returns a different distortion than get_disto for
lossy encoding configurations because cwebp loads the source as YUV
while get_disto directly reads it as RGB without conversion loss.
Change-Id: I5c32cf8f89eb137973dc7eebda747682d921b8e2
Fix another pessimization found by the pingo image compressor.
Refactoring is necessary to make LZ77 computation
common to cache or no-cache analysis.
Slower by 1.7x instead of 2x
Change-Id: I396701ea6e88543dbfe9471eb552877f6c8ce1e3
qmin / qmax are now using the pad[] spot at the end of the struct,
and we don't need to bump the ABI major number.
Change-Id: I41adcaf1600b29a5a05c9fe380bfd977cf425124
keep the minimum, but recommend the latest so we don't need to update
this in the future.
BUG=webp:463
Change-Id: I0615039323d3c77684c6be2e890514f6973cf535
remove EMSCRIPTEN_GENERATE_BITCODE_STATIC_LIBRARIES, this option is only
supported with the deprecated fastcomp backend; the wasm version will
begin to throw an error if it's used. 1.39.12 already fails to build when
using this option.
+ fix a typo
BUG=webp:463
Change-Id: I7175df4d5b2b6cb6aaf20eb41063c09c03b9b69a
this was not giving a good alpha value, making the method 5/6 a little
blurrier than method 4 (!).
Change-Id: I69b9890dea21499c1af1753e87d9f7adf8b433de
This is particularly useful for multi-pass search (but not only),
to prevent the search from going over or below a reasonable threshold.
E.g.: 'cwebp -qrange 50 80 ...' will prevent any unreasonable degradation.
new cwebp option: -qrange min max
Change-Id: I59f394533535fc20b6996bc0895f4301476d5eff
when building a dll based libwebp include the dsp private symbols that
WebPUnmultiplyARGB requires
Change-Id: I7cf7da0b20d6cf6740219c8562380926a0abd93c
(cherry picked from commit cf047e8347)
when building a dll based libwebp include the dsp private symbols that
WebPUnmultiplyARGB requires
Change-Id: I7cf7da0b20d6cf6740219c8562380926a0abd93c
this is defined to 0 by dsp.h if it wasn't defined previously, since:
47178dbd extras: add WebPUnmultiplyARGB() convenience function
Change-Id: If4dd48360a95b2786410670cff5ac655227fb6dd
PredictorSub0_SSE2 doesn't use 'upper' (neither does
VP8LPredictorsSub_C[0]); just pass NULL when dealing with trailing
pixels to avoid undefined behavior when offsetting a NULL pointer
BUG=chromium:1026858,oss-fuzz:19430
Change-Id: I08be8899ed2e34f26aaee34defe68dbd0fe216d3
some toolchains may implement vcreate_u64 as an assignment to a vector
causing a type mismatch:
invalid conversion between vector type 'uint64x1_t' (vector of 1
'uint64_t' value) and integer type 'unsigned int' of different size
const uint64x1_t LKJI____ = vcreate_u64(L | (K << 8) | (J << 16) | (I << 24));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: I5c7b0076ad66d4b3fcdcb7ee9f59bbaa6f19b783
The workaround for GCC ARM must not be applied when another toolchain
(like MSVC) is used for the build.
Change-Id: I11ec4558902063ccb085d3f435e24b3a60739dd5
'upper' could be NULL and it would be increased.
But that is for predictor zero that does not use 'upper'.
Change-Id: Icd4ae6792cc55ea021b4f828c3dbdb5f03e120d8
add some missing dependencies and convert utility libraries to
static only libraries to avoid creating unnecessary shared object
libraries which may fail to link due to missing symbols.
Change-Id: Iaa91a3d97fa5af6ada4b2a851cc7fc2879d871da
Since people seem to write "2 ^ X" hoping that it means "1 << X", clang
recently added a warning for this pattern.
It incorrectly fires on this file. To suppress it, restructure the code
to be less clever. (Alternatively we could use "xor" instead of "^" or
write "0x2" instead of "2" but both seem worse.)
No intended behavior change.
Bug: chromium:995200
Change-Id: I64744345be5f5a8cd1f4aaeaf0982da239b378a7
sometimes, the last rows of the alpha plane contain more than NUM_ARGB_CACHE_ROWS
rows to process. But ExtractAlphaRows() was repeatedly calling ApplyInverseTransforms()
without updating the dec->last_row_ field, which is the starting row used as starting
point.
Fix would consist of either updating correctly dec->last_row_ before calling
ApplyInverseTransforms(). Or pass the starting row explicitly, which is simpler.
BUG=webp:439
Change-Id: Id99f2c28662d02b2b866cb79e666050be9d59e04
For some exact resonance the over-quantization was exactly
compensating the under-quantization, leading to resonance
and strange patterns.
-> we special-handle the very flat blocks, hopefully for the
greater good (and not just the bad-resonance case).
For 'fast mode' (-m 3 or less), we just pay special attention
to the border of the image, where the oscillation / instability
usually starts. For the inner part of the image, since we're not
doing rd-opt, it's harder to fix anything.
Overall, on 'regular' images, the change is written the noise,
often leading to overall faster encoding (because of the short-cut).
BUG=webp:432
Change-Id: Ifaa8286499add80fd77daecf8e347abbff7c3a15
previously setting EXTRA_LIBS on the command line would skip any
per-target updates. this allows both EXTRA_LIBS and [CD]WEBP_LIBS,
GL_LIBS to be set together avoiding the need to add unnecessary
dependencies to EXTRA_LIBS which affects all targets.
Change-Id: I63bbc14f0ac0cd04aa50c5d5047060ac57d0c9dc
missed in a788b49
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)
Change-Id: I52dcd9cd613107f5424177c277785b92430bffb7
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -114 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 142 (8-bit,
unsigned)
Change-Id: I7f08a836ddcf777454dfd5b877a81b62b2abac86
with clang7+ quiets conversion warnings like:
implicit conversion from type 'int' of value -12 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 244 (8-bit,
unsigned)
Change-Id: I053c92301e55dcb0cae89a7733636283da942176
no change in object code
from clang-7 integer sanitizer:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
1955895199 (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char')
changed the value to 159 (8-bit, unsigned)
Change-Id: I0c3022339e34b9c9af03167ab827ade677973644
_mm_set1_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 65280 (32-bit, signed) to
type 'short' changed the value to -256 (16-bit, signed)
Change-Id: Iad64f6209a8c130a7df67515451ded45b3f91702
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 189 (32-bit, signed) to
type 'char' changed the value to -67 (8-bit, signed)
implicit conversion from type 'int' of value 128 (32-bit, signed) to
type 'char' changed the value to -128 (8-bit, signed)
implicit conversion from type 'int' of value 33909 (32-bit, signed) to
type 'short' changed the value to -31627 (16-bit, signed)
Change-Id: Id6b191b2c06881e27d447eeb1ff5bb2c1857b6ba
holding the associated mutex while signaling a condition variable isn't
necessary and in some implementations will reduce performance as the
woken thread may test the mutex, fail and go back to sleep.
Change-Id: Id685a47b0c76fc4a1c5acedcb6623e8c55056415
_mm_set1_epi8() takes a char argument
_mm_insert_epi16 takes a short argument
from clang-7 integer sanitizer:
implicit conversion from type 'int' of value 255 (32-bit, signed) to
type 'char' changed the value to -1 (8-bit, signed)
implicit conversion from type 'int' of value 33153 (32-bit, signed) to
type 'short' changed the value to -32383 (16-bit, signed)
Change-Id: Ic88c8ef3d00146d34f53a560582db673f818370d
no change in object code
from clang-7 -fsanitize=implicit-integer-truncation
implicit conversion from type 'int' of value -16 (32-bit, signed) to
type 'uint8_t' (aka 'unsigned char') changed the value to 240 (8-bit,
unsigned)
Change-Id: Ia7cbaad247ab22b505b7f98b1247219c024f6db0
This is to make the initial window be rescaled in case the image
dimension is too large to fit the display.
BUG=webp:433
Change-Id: Ib04c12962bc8c26e74c8a6193829214da636ebde
no change in object code
from clang-7 -fsanitize=implicit-integer-truncation
implicit conversion from type 'int32_t' (aka 'int') of value 287
(32-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the
value to 31 (8-bit, unsigned)
Change-Id: I692368bcc2f41412697b8ae51e53078831072891
regenerated with swig 3.0.10
matches the version used to update libwebp.py in:
0e7f8548 update generated swig files
Change-Id: I976a1a6b673c8174db5e97e8316c59970582e565
no change in object code
from clang-7 -fsanitize=implicit-integer-truncation
implicit conversion from type 'int' of value 39736 (32-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the value to 56 (8-bit, unsigned)
Change-Id: I0ecf24c5b1b11e056c58b3b85ea529c30cdadf57
"implicit conversion from type 'uint32_t' (aka 'unsigned int') of value xxxxx (32-bit, unsigned) to type 'uint8_t' (aka 'unsigned char') changed the value to xx (8-bit, unsigned)"
and same with signed -> unsigned conversion with truncation.
Change-Id: I50cae41a9ce7edcfcc814cc3ee2556b927064f43
We saturate the result to [0..255]
It's the easiest and safest, given the wide variety of scaling
range we cover: we're not using floats, so precision is always
an issue at one end or the other of the scaling spectrum.
we also use:
round(a - floor(b))
instead of:
floor(a - round(b))
to handle difficult cases (ratio ~= .99, e.g.)
MIPS code is still disabled (and wrong)
Change-Id: I18d3f5ddc4c524879c257b928329b1c648fa7fb5
previously if the mappings allocation failed histo_queue->queue would be
uninitialized; split the conditionals
Change-Id: I1b50b987e734393893dc8a83a3f314522ccd0c83
note that config.exact defaults to 0 and point users to WebPEncode() if
the default isn't acceptable.
BUG=webp:424
Change-Id: I179c34649834aeadc1606d0856f33e8255048ea1
This is to prevent resizing to dimension 0
+ added some safety checks about src_width > 0 and src_height > 0
BUG=webp:418
Change-Id: Ic04a53ad26455d80538bc8681882a554fca2a340
CommandLineToArgvW() returns a LPWSTR*, storing to const LPWSTR* is
incorrect without a cast; fixes gcc -Wincompatible-pointer-types
and clang(-cl) -Wincompatible-pointer-types-discads-qualifiers warnings
Change-Id: Iad5b49c4862c7be68251272e50d3c751099559bc
Taking the comments from the internal ParseHeadersInternal.
(which is called by GetFeatures).
BUG=webp:411
Change-Id: I9999b4a183805e2db1456610a30024a0d8be4d00
It's safer to clip the passed param instead of doing 32b arithmetic
and clipping afterward.
Output is unchanged, but code no longer rely on UB.
Change-Id: Ia5b4de6e8863981753f1d17f062965a6a5da5bed
dst->cur_ was not set.
The bug occurred only with several VP8LBitWriter instances
(thread_level > 0) and in 32-bit (in 64-bit, src->cur_ was
always 0 in VP8LBitWriterClone()).
BUG=chromium:917029
Change-Id: I0d94a3d8e62b247fd616eebe1009868dc8a5ed2e
Move IsFlat to its own header. This allows it to continue to be
inlined. Using the RTCD and creating a distinct function slows down arm
builds.
flower mug
C 3.59 2.12
NEON 3.47 2.01
BUG=b/118740850
Change-Id: Id77e8f76d9e9790c498806e7070bbe37c10bc2e9
thresh is defined by FLATNESS_LIMIT_* which ranges from 2-10.
score_t is int64 which is a touch overkill.
Change-Id: I308bd440bf11643665d3642fe361495a257b6e52
libwebp-1.0.1
- 11/2/2018: version 1.0.1
This is a binary compatible release.
* lossless encoder speedups
* big-endian fix for alpha decoding (issue #393)
* gif2webp fix for loop count=65535 transcode (issue #382)
* further security related hardening in libwebp & libwebpmux
(issues #383, #385, #386, #387, #388, #391)
(oss-fuzz #9099, #9100, #9105, #9106, #9111, #9112, #9119, #9123, #9170,
#9178, #9179, #9183, #9186, #9191, #9364, #9417, #9496, #10349,
#10423, #10634, #10700, #10838, #10922, #11021, #11088, #11152)
* miscellaneous bug & build fixes (issues #381, #394, #396, #397, #400)
* tag 'v1.0.1':
update ChangeLog
Fix pair update in stochastic entropy merging.
README.mux: add a reference to the AnimDecoder API
CMake: fix webp_js compilation
update NEWS
bump version to 1.0.1
Speed-up: Make sure we only initialize histograms when needed.
update AUTHORS
img2webp: add help note about arguments from a file
Speedups for empty histograms.
Split HistogramAdd to only have the high level logic in C.
Fix compilation on windows and clang-cl+ninja.
Change-Id: I4b58eee66b25da184ac4bf4c70e43e43682b3a23
Direct copy of sse2. Slight improvement because neon has
abs().
flower.ppm had minimal improvement. Somewhat expected because
GetResidualCost_C is only ~3.6%
mug.ppm had a better improvement because GetResidualCost_C is
almost 9%.
C 2.150
NEON 2.130
BUG=b/118740850
Change-Id: Ibc0dd97a81596635f5599cf568205974b4fd2597
Much faster with aarch64. Still somewhat faster without vmaxv.
C: 3.700s
ArmV7: 3.675
aarch64: 3.600
BUG=b/118740850
Change-Id: I3be852da89633eca4bddce443c87f5e4a2f55868
The old code simply did not make sense.
The effect is that the pair would be popped from the
queue no matter what; as the queue is small, it does
not matter that much on the results.
But it will matter for a later CL.
Change-Id: If50c9fa9d7f3ac3c48bb7336d81479287d4944c4
(cherry picked from commit 485ff86fbb)
Stick to the strict necessary for running webp_js,
and avoid building sub-lib or examples with heavy dependencies.
Change-Id: Ife4170a7839fb3201b2cf158d98d17bebe10008f
(cherry picked from commit 4cd0582d50)
The old code simply did not make sense.
The effect is that the pair would be popped from the
queue no matter what; as the queue is small, it does
not matter that much on the results.
But it will matter for a later CL.
Change-Id: If50c9fa9d7f3ac3c48bb7336d81479287d4944c4
Stick to the strict necessary for running webp_js,
and avoid building sub-lib or examples with heavy dependencies.
Change-Id: Ife4170a7839fb3201b2cf158d98d17bebe10008f
Also, histograms in a HistogramSet can be initialized all
at once.
Change-Id: Ibbfa6034dce58dca8bb9113487e2ae507222ce7d
(cherry picked from commit 6752904b2f)
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.
Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
(cherry picked from commit decf6f6b87)
When histograms are empty, it is easy to add them.
They should also not be considered when merging histograms
(it is a waste of CPU).
This does not change the compression performance,
just the speed.
Change-Id: I42c721ca0f9c5ea067e73b792aa3db6d5e71d01f
Define macros in examples/unicode.h to use Unicode argv
on Windows. Keep char everywhere on Unix since it handles
UTF-8 without any change.
Impact:
- All fopen () and SHCreateStreamOnFile(),
- All fprintf() printing file paths,
- All strcmp() used with "-",
- File path parsing,
- Gif reading.
Concerned executables from examples/ and extras/:
anim_diff, anim_dump, vwebp, vwebp_sdl,
cwebp, dwebp, gif2webp, img2webp,
webpmux, webpinfo, webp_quality, get_disto
When compiled on Windows with Unicode enabled, webpmux and
img2webp will not work when used with an argument file and
will print "Reading arguments from a file is a feature
unavailable with Unicode binaries."
BUG=webp:398
Change-Id: Ic55d222a3ce1a715f9c4cce57ecbe2705d5ce317
GIFDisplayError() is already defined exactly the same way
in gifdec. Remove it from anim_util.c and add dependency.
Change-Id: Iec01b41c44d0b61b3a279b8cd754d9917d64f804
https://github.com/cheshirekow/cmake_format
A complete row of # is replaced by only one,
so add a space after the first one to keep it.
Change-Id: I367749353a555c89c717f1939220699e43ecab2c
To be compiled with Visual Studio on Windows through CMake.
Targets are get_disto, vwebp_sdl, webp_quality.
Change-Id: Idec1e19b61b6a661011effee42f7440264afd3ed
Instead of unsetting WEBP_BUILD_GIF2WEBP,
disable it to keep it displayed in ccmake.
Disable WEBP_BUILD_GIF2WEBP before it's tested.
Do the same for WEBP_BUILD_ANIM_UTILS.
Change-Id: I91920814fbda4825ced9a801bb005f8d44bfe09c
Fix issue where color data is discarded in fully transparent
areas when -resize -exact.
BUG=webp:397
Change-Id: I58ce8d5ae172d5d0f0138e07c7df3a3c6cbd0019
after:
bc5092b1 pngdec: set memory functions
png_alloc_size_t was added in 1.4 use png_size_t in earlier versions
Change-Id: If65ac1c501e2d497b1be480095bf21f06ea7026a
use png_create_read_struct_2 to set a malloc function allowing the code
to fail on large allocations while fuzzing
Change-Id: Iaca1b93ecc6570067708f3ae2db07fbca74386ee
The graphical bug happens when there is a frame disposed to background
color, followed by another frame that does not blend, and their areas
don't fully overlap. Only the previous frame clears its part of the
viewport. The fix consists in clearing the screen for the previous and
the current frame if needed.
Change-Id: I3425cf7297f0c7b2cf13a3a61b517cc0b1c031d8
Option -usebgcolor may be used to display ANIM background color (or white if no ANIM chunk), blended on top of checkerboard. By default this is disabled (old behavior) to easily see transparent areas. Spec says that "background color MAY be used", so it's an option.
Key b may be pressed to toggle ANIM background color display. There are visual artifacts (leftovers) when toggling during an animation. This is already the case for rescaling, toggling info etc. (fixing it implies storing viewport render or rendering whole animation from start till current frame).
BUG=webp:394
Change-Id: If9ab898b2eac77226f30f062d522f9861789ef8f
Container spec indicates that background color is written
in BGRA byte order, but glClearColor() parameters are RGBA.
Anyway the checkerboard is displayed right after glClear()
calls so it replaces background color.
In response to webp-discuss/TkLHALGaHaM
Change-Id: Ief5435fadfd6a422b881a9dc240b5e8dc6546e19
We should be using 'floor' when doing the final divide.
-> new MACRO is MULT_FIX_FLOOR()
XXX*** Mips code is DISABLED for now ***XXX
I'll update and re-enable it in a later
patch, since this code needs some refactoring first.
BUG=oss-fuzz:9179
Change-Id: Ic0693cdca4e71f5beab1029475e35c4d06b12d13
* Assert chunklist
* fix potential memory leak and
* fix null pointer access
There should not be several alpha_ or img_ chunks in SynthesizeBitstream. Use ChunkListDelete in MuxImageRelease to be safe.
A null pointer accessed in WebPMuxPushFrame triggered a harmless runtime error.
Change-Id: I3027f8752093652bd41f55e667d041c0de77ab6e
The chunk list only has two operations: append and set
to one element. The two operations are split and the append
one is sped up by storing the last element.
Corrupted data could make a very long list to search through.
BUG=oss-fuzz:9190
Change-Id: I1aa813ca629df29efaa3b46dbd4c4c42dbeaa34c
The standard allows for Huffman images with any coefficients.
Hence potentially big memory allocations. The previous workaround
was "trying" things out, the new one is more rigorous and
only allocates what is needed, modifying the Huffman image
to contain the minimal set of coefficients.
BUG=oss-fuzz:8623,oss-fuzz:9111,oss-fuzz:9134
Change-Id: I6a972e90e4ae509c15cb41ee22c58b775fa3f4aa
idec_dec.c, DecodeRemaining: Set decoder state to ERROR to prevent VP8ExitCritical to be called again
Change-Id: Id5f893f45c348e1c529680d930e640f780a73d4c
treat an ANMF chunk containing multiple VP8/VP8L file as malformed.
fixes a WebPMuxImage::img_ leak.
Though the invalid free in #9106 was avoided in (ubsan):
be738c6d muxread,ChunkVerifyAndAssign: validate chunk_size
that file would still cause a leak similar to #9099.
BUG=oss-fuzz:9099,oss-fuzz:9106
Change-Id: Ib873446a1188afeeb2fe5d53a86b75e0c5de9573
(we also limit radius based on height too, for good measure, although it's not an asan bug)
fixes oss-fuzz issue #9105
Change-Id: Ie0d79dd81480dc4e2b653b7e992e5cdcd3dfa834
before accounting for padding which might overflow if chunk_size is >
MAX_CHUNK_PAYLOAD.
BUG=webp:387,webp:388
Change-Id: I3985b8817ed4faaec0629102c5333c228a0e9c98
previously when adjusting size down based on a smaller riff_size the
checks were insufficient to prevent 'size -= RIFF_HEADER_SIZE' from
rolling over causing ChunkVerifyAndAssign to over read. the new checks
are imported from demux.c.
BUG=webp:386
Change-Id: If863c4a9892977b9ade7dd894392a0ecae13775c
with loop_compatibility disabled (the default), non-zero loop counts
will be incremented by 1 for browser rendering compatibility. the max,
65535, is a special case as the muxer will fail if it is exceeded; avoid
increasing the limit in this case. this isn't 100% correct, but should
be close enough given the high number of iterations.
BUG=webp:382
Change-Id: Icde3e98a58e9ee89604a72fafda30ab71060dec5
libwebp-1.0.0
- 4/2/2018: version 1.0.0
This is a binary compatible release.
* lossy encoder improvements to avoid chroma shifts in various circumstances
(issues #308, #340)
* big-endian fixes for decode, RGBA import and WebPPictureDistortion
Tool updates:
gifwebp, anim_diff - default duration behavior (<= 10ms) changed to match
web browsers, transcoding tools (issue #379)
img2webp, webpmux - allow options to be passed in via a file (issue #355)
* tag 'v1.0.0': (23 commits)
update ChangeLog
webp-container-spec: correct frame duration=0 note
vwebp: Copy Chrome's behavior w/frame duration == 0
update ChangeLog
add WEBP_DSP_INIT / WEBP_DSP_INIT_FUNC
fix 16b overflow in SSE2
makefile.unix: add DEBUG flag for compiling w/ debug-symbol
cwebp,get_disto: fix bpp output
cmake: Make sure we use near-lossless by default.
fix bug in WebPImport565: alpha value was not set
update ChangeLog
Revert "Use proper targets for CMake."
Use proper targets for CMake.
Remove some very hard TODOs.
{de,}mux/Makefile.am: add missing headers
makefile.unix,dist: use ascii for text output
add -version option to anim_dump,anim_diff and img2webp
webp_js: fix webp_js demo html
update ChangeLog
update AUTHORS
...
Change-Id: I5659406c022a0964f728ce2eb35338fd9c195466
the interpretation of a 0 duration depends on the implementation;
merging of multiple frames isn't guaranteed, some may enforce a minimum
duration.
BUG=webp:380
Change-Id: Idf592049d2092e4cc5cfb2e4c59ddbc91bd52f9c
(cherry picked from commit 71c39a06c8)
the interpretation of a 0 duration depends on the implementation;
merging of multiple frames isn't guaranteed, some may enforce a minimum
duration.
BUG=webp:380
Change-Id: Idf592049d2092e4cc5cfb2e4c59ddbc91bd52f9c
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
(cherry picked from commit d77bf512bd)
this internalizes the init checks and provides stronger synchronization
with pthreads when available while still allowing VP8GetCPUInfo to be
modified (mostly for testing purposes). windows is left as is since a
critical section or mutex would cause a leak.
Change-Id: Ieb997e014f2805c0ae39c16f13337663521356f4
the 'accum' variable can be larger than 15b for large
rescale values.
Assert triggered:
src/dsp/rescaler_sse2.c:249: RescalerExportRowExpand_SSE2: Assertion `v >= 0 && v <= 255' failed.
src/dsp/rescaler_sse2.c:350: RescalerExportRowShrink_SSE2: Assertion `v >= 0 && v <= 255' failed.
-> fall back to C implementation in this case for now
Change-Id: I7ea1cb72301cafc1459be403f6a6f4e3cbc89bb1
Also fix the bug where near lossless was not used
and allow examples to be built by default.
Change-Id: Ieb5ef77fafe83f3776ff4fd27a6d26534c7a51f3
(cherry picked from commit e155dda0cc)
this prevents unknown escapes containing '-'s getting stripped on OS X
when a tty targeted font is used
Change-Id: I11d77f2984d9fd67a8b22948fb21e4c11396aec4
This is to harmonize the -h/-version options on all our examples.
+ added GetAnimatedImageVersions() method to anim_util.*
Change-Id: I2304a1c29e310682e97f236d3867274a192a7a09
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts that were causing cfi-icall
failures.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
BUG=chromium:827826
Change-Id: I5db021d06390a6cefd670fdd2f0d34c9e530465e
(cherry picked from commit 978eec2507)
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts that were causing cfi-icall
failures.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
BUG=chromium:827826
Change-Id: I5db021d06390a6cefd670fdd2f0d34c9e530465e
this is consistent with web browser behavior as well as various
transcoding tools (ffmpeg, gif2apng, etc).
also: update anim_diff to account for this new behaviour.
BUG=webp:379
Change-Id: I70cc72a6b401ef32b73cd182a3f12d993d495bf4
Output is <.1% difference in size, randomly.
Speed is 30-50% faster (-m 0 -sharp_yuv).
It also gives the exact same output on ARM and x86, because floats
are no longer used.
Change-Id: Id0f0aa748cc4fc0b82bac1fc5ca954775a0a1b7c
this quiets a -Wclobbered warning on const has_alpha under gcc-7 and
brings the variables closer to their first use.
Change-Id: I8a24f275b7ff34a94d47b576bcf276dbedac2121
do_copy is a loop invariant, but based on a variable parameter; it would
only be extracted if Import was inlined.
Change-Id: Id5b4a1a4a83a4f2083444da4934e4c994df65b44
Apply gamma correction to the decoded RGB values.
This handles corner cases where the PNG file doesn't have
a standard 1/2.2 gamma value.
BUG=webp:369
Change-Id: I9907b6e2c458002de7c26d0b9e416278cca33990
for q<=98, we always enable error diffusion.
+ reduce storage 2x by using int8_t
+ make the error diffusion more robust
BUG=webp:340,308
Change-Id: I0608df839ff7b64d6843005a0f81d2577143af9e
* regarding alpha_data_ used for testing.
alpha_data_!=NULL is as close a good test as we'll get.
* regarding filter-strength / sharpness forcing
no practical use (can be done during encode cycles,
for experimentation)
* regarding a 'less-complex' filtering:
no practical use so far. Next version!
Change-Id: If2dfff5818552a7d3e7c23ac08d64fe6d270229c
For large images overflowing the partition0, we re-do a number
of passes but were forgetting to reset the block_count[].
This was leading to incorrect summary.
+ some cosmetic fixes here and there
BUG=webp:355
Change-Id: Ie87158d7f177f8efdca429b146cfcd0e81652d2f
We can't predict if the quality is going to be below the threshold
eventually, so we might as well enable it always.
Change-Id: I30aedecc8c6d4daf159f6ef152697df0206d1e93
This fixes some color smearing due to heavy quantization.
This is only enabled for q <= 30 (cf ERROR_DIFFUSION_QUALITY)
Change-Id: I07e83a4d38461357a32c9e214f7eadc6db73baa9
with WEBP_NEON_OMIT_C_CODE the default _C functions won't be set and
with WEBP_REDUCE_CSP the NEON functions won't be either triggering an
assert for an empty table member.
BUG=chromium:792627
Change-Id: I8d2d430eaa37bb92885b61a3dd39f961924a8def
upsampling_sse41.c and yuv_sse41.c added in:
807b53c4 Implement the upsampling/yuv functions in SSE41
Change-Id: I186cb6f6c296ba26b8e9b42d88da7f58c55710a9
RGB to YUV conversion was not using SSE to finish up the row.
End data is now copied to a buffer big enough to fit in a
SSE register.
(UPSAMPLE_LAST_BLOCK was already using that trick).
Change-Id: Ie539bcbe570a643a774aa88263503c0d2c41890f
if a single text file name is supplied as argument
(e.g.: 'webpmux my_long_list_of_frames.txt'), the command
line arguments are actually parsed from this file.
Tokenizer will remove space, tabs, LF, CR, returns, etc.
+ changed ImgIoUtilReadFile() to return a null-terminated
data, for convenience.
+ misc clean-up in the code
BUG=webp:355
Change-Id: I76796305641d660933de5881763d723006712fa9
---
alpha processing is still required when requesting premultiplied output
since:
1b27bf8b WEBP_REDUCE_SIZE: disable all rescaler code
Change-Id: Id1b03256c4c04b8db31527e60cd31dd20ce6f3ad
libwebp-0.6.1
- 11/24/2017: version 0.6.1
This is a binary compatible release.
* lossless performance and compression improvements + a new 'cruncher' mode
(-m 6 -q 100)
* ARM performance improvements with clang (15-20% w/ndk r15c, issue #339)
* webp-js: emscripten/webassembly based javascript decoder
* miscellaneous bug & build fixes (issue #329, #332, #343, #353, #360, #361,
#363)
Tool updates / additions:
added webpinfo - prints file format information (issue #330)
gif2webp - loop behavior modified to match Chrome M63+ (crbug.com/649264);
'-loop_compatibility' can be used for the old behavior
* tag 'v0.6.1':
update ChangeLog
WEBP_REDUCE_CSP: restrict colorspace support
update ChangeLog
vwebp: disable double buffering on windows & mac
webp_to_sdl.c: fix file mode
WEBP_REDUCE_SIZE: disable all rescaler code
webpinfo: add -version option
bump version to 0.6.1
update NEWS
README: add webpinfo section
Change-Id: Iab2153fae38da3c99daccdf57fec816e07b7909a
only supported ones are: RGBA/BGRA/rgbA/bgrA (decoder)
as well as: WebPPictureImportRGB/RGBX/RGBA (encoder).
(note: extras/get_disto is affected too)
Change-Id: If6c4f95054ca15759c4e289fb3b4c352b3521c2c
(cherry picked from commit 6de20df02c)
only supported ones are: RGBA/BGRA/rgbA/bgrA (decoder)
as well as: WebPPictureImportRGB/RGBX/RGBA (encoder).
(note: extras/get_disto is affected too)
Change-Id: If6c4f95054ca15759c4e289fb3b4c352b3521c2c
this results in flickering with animated webp + alpha. disabling the
option is a workaround to restore the behavior to the previous release.
BUG=webp:365
Change-Id: Iac7fcc0d483837e76cc54ad3f26c4e0e5511e31d
remove auto-filter (-af) support and make WebPPictureCopy,
WebPPictureIsView, WebPPictureView, WebPPictureCrop, and
WebPPictureRescale noops.
Change-Id: If39d512cc268a0015298a1138dbc94feb86575e5
with gcc-4.8, clang-4.0.1/5 this is no faster (actually up to 2x slower)
than the code generated for memset (0x01010... * dst[-1]). shuffles in
sse4 recover a bit, but performance is still down.
Change-Id: Ie85e8353f8ede559d0b05a1d388787fd18ecc80f
Rewrote WebPPictureHasTransparency() to use them (even for argb).
This is 10% faster, for some reasons.
SSE2 version should be straightforward.
Removes a TODO.
Change-Id: I7ad5848fc5e355e2df505dbcd5a0f42fb6cbab41
The WebPDemux and WebPAnimDecoder APIs are provided for the purpose of
animated webp parsing and decoding. No major changes are currently
planned for the libwebp API.
Change-Id: I2758ecda195b0c4091572d5731a0a85fa3716303
this can result in an alignment hint on arm causing a SIGBUS. casting
the input ptr to anything aside from its type is unnecessary for memcpy
and is contrary to the intent of this function.
Change-Id: I9a4d3f4be90f80cd8c3e96ccbe557e51e34cf7a5
when targeting NEON C functions with NEON equivalents won't be used, but
will contribute to binary size. the same goes for sse2, etc., but this
change is primarily concerned with binary sizes for android arm targets.
note '-noasm' or otherwise modifying VP8GetCPUInfo will have no effect
on the use of NEON functions.
this decision can be overridden by defining WEBP_DSP_OMIT_C_CODE to 0.
Change-Id: I47bd453c84a3d341ca39bc986a39eb9c785aface
* changes:
{dec,enc}_neon: harmonize function suffixes x2
upsampling_neon: harmonize function suffixes
yuv_neon: harmonize function suffixes
rescaler_neon: harmonize function suffixes
lossless_neon: harmonize function suffixes
lossless_enc_neon: harmonize function suffixes
filters_neon,cosmetics: fix indent
enc_neon: harmonize function suffixes
dec_neon: harmonize function suffixes
and all older versions.
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
extends the check add previously in:
637b3888 dsp/lossless: workaround gcc-4.9 bug on arm
BUG=webp:363
Change-Id: I1403b558f8660b764f3a570a3326822d5ef0be29
the incorrect bit was being extracted from the lossless bitstream header
causing the alpha flag in VP8X to be misreported. previously the
signature byte was ignored in the calculation of the offset.
since:
8ba1bf61 Stricter check for presence of alpha when writing lossless
images
BUG=webp:361
Change-Id: I7c618b5f01a37f5e4b799dee11a7949efaf88046
avoids over reading if the reported ANMF payload is < 8 bytes.
likely broken since:
81b8a741 Design change in ANMF and FRGM chunks:
Change-Id: I3e267bafea348a50545587dea8fafb2199c6b650
ssse3 is bit #9 in ecx, bit 1 is sse3. this only controls the check for
slow ssse3 and likely had no ill effect.
Change-Id: I84ce73dc480e1cdbd085e37be06f3f402116c201
It is still redefined, but to the same constant,
hence no more warning as mentioned on BUG=webp:358
Change-Id: I80a834c139d5d60cd693d468b0e2ea399729ab3e
This patch fixes the compatibility for loop-count handling.
This aims at addressing the change in Chrome handling of loop-count
prior to M63.
Before M63: loop-count interpretation was aligned to GIF's behaviour in
Chrome, but incompatible with WebP's spec. In particular, you couldn't
loop exactly once.
Post-M63: loop-count in WebP is really the total number of loops. Gif2webp
will convert correctly from a GIF source by adjusting the loop count.
Note: The Chrome version can be retrieved from the User-Agent
string (chrome://version).
An M63 version will contain the pattern:
Chrome/63.x.xxxx.xx
for instance.
Change-Id: Ie6dc13227e6498f4d7af2f09247913648997648a
calculate the file duration using unsigned math. this could still result
in an incorrect average duration calculation if there were multiple
rollovers. caching the duration is an option if it was desirable to
support such an extreme case.
Change-Id: I3875d94d081fec947c03a857055df6e27ff5351d
...prior to allocating a Picture. this is consistent with the other
readers and allows the allocation size to be limited at compile time
BUG=webp:355
Change-Id: Ib8e027ef863489b1e0f9e2a1403c3836da3ef48d
WEBP_MAX_IMAGE_SIZE can be defined to control this limit.
Set it to 1.5GiB w/--config=asan-fuzzer to avoid OOM with large resolution
images. This limit leaves some headroom over the single image max of 2^14 *
2^14 * 4
BUG=webp:355
Change-Id: I4d48eb0a063638297a842582e0229dfd5a54df5f
everything post jpeg decoder creation should go through the error path
to ensure it's cleaned up properly
Change-Id: If78b4529e40797c67c3d0e624af1c036badea674
For a pixel, we look for the longest match starting in a window around it.
For the following pixel, the previous result can be used and smaller
search window is used.
Change-Id: Ice16f9a7c8754099d068380848f0d77de3f756ac
add a check for cpu-features.h and rework some of the ifdef's around
android + neon. for android builds with cpu-features enabled the
*_neon.c files will still need to be flagged correctly (with e.g.,
.c.neon in Android.mk) to properly build them.
BUG=webp:353
Change-Id: I905ce305af0a204e560b915d8665093a3edaceb9
including the type in the macro doesn't bring much benefit to ordering,
current platforms work with a prefix, this would be insufficient if the
attribute needed to follow the function prototype. this form makes it
easier to override on the command line.
BUG=webp:355
Change-Id: Iba41ec0bb319403054be0e899c4cc472dd932fd9
we use a static guard to only call SDL_Init() once.
Another option would be to call SDL_Quit(), but it doesn't seem to be
doing anything.
Also, fix the HTML code and add 'use strict' directive.
BUG=webp:354
Change-Id: I3c6421e2c1c8cc200556cd4092a0ead9a8b054ef
this check was relocated in:
b903b80c Split cost-based backward references in its own file.
quiets -Wundef
Change-Id: I7f7a4773fb8cc77ca9f671b11f50d5db2275d415
- some image libraries define _INCLUDE_DIR and not INCLUDE_DIRS.
- CMakePushCheckState is a non-destructive way of changing the
CMAKE_REQUIRED_* variables.
Change-Id: I7d318a0ddf9754a5c047f47ba1713f9f94e1ec1c
GIF find_package only locates the header and library, it doesn't fail
compile tests when detecting the version, but falls back to 3 (as of at
least cmake 3.7.2). Make sure the library links to avoid incorrect
detection when cross compiling.
Change-Id: I0224180bbdf800ab261a9333236730b23c006e69
If "exact" is false, we can modify the luma samples in fully transparent
areas to facilitate lossy compression. Experiments on some PNG images
show compression improvement of more than 20%.
Change-Id: I1a728cfa920a6652bc1f600d87c01f7f648c4942
this avoids the result being cached in a common variable causing
potentially incorrect results in future tests and improves the logging
output
BUG=webp:351
Change-Id: I4b3110beec38f505fc8f835c3ea689a7d81a36bb
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.
Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
As backward references use the plane code when checking the cost
of a distance, statistics used to compute the cost should use it too.
This provides a small compression improvement at no speed cost.
Change-Id: Icade150929ee39ef6dc0d8b1fc85973086ecf41d
quiets undefined sanitizer warnings of the form:
left shift of 128 by 24 places cannot be represented in type 'int'
Change-Id: I8a389f2ac9238513517180f302f759425eeb7262
this avoids setting bytes_per_px < depth causing an undersized
allocation for rgb import resulting in a crash.
BUG=b/37930872
Change-Id: I32a86f91528acc084a53d08c9fde9f2f1270a603
When re-initializing a bit writer, we could set invalid values because
the bit writer was not big enough.
Change-Id: Id25ab6712603245a5a12d5f4a86fe35a9a799a5d
the TIFFGetField() return for TIFFTAG_EXTRASAMPLES is defined as (count,
types array) [1]. previously the count was being checked rather than the
first element of the array to determine whether the alpha was associated
(pre-multiplied) and the result needed to be unmultiplied.
since:
9273e441 fix TIFF encoder regarding rgbA/RGBA
[1] http://www.libtiff.org/man/TIFFSetField.3t.html
Change-Id: I6e41be9d038fe8afb6d0aa3c8048925dc901113b
This is to prepare the inclusion of <windows.h>
FrameRect => FrameRectangle
CLIP_MASK => CLIP_8b_MASK
Change-Id: Ia4b1fa4ac06137b4102c91e232206a1fb7159ce0
A command line tool to print out the chunk level structure of
WebP files along with basic integrity checks.
BUG=webp:330
Change-Id: Ic69f646f649abb655b1854621d99afedeed158d7
Encoder:
We were always using ExtraSamples=1, which means associated-alpha.
But we don't need the (lossy) excursion to rgbA format. We can save
the samples as RGBA directly, by changing ExtraSamples to '2'.
The TIFF encoder now checks the colorspace properly, to handle
premultiplied format as well as non-premultiplied.
Decoder:
The result of TIFFReadRGBAImageOriented() is always pre-multiply.
So, in case an alpha channel is present, we need to unmultiply it before
calling WebPPictureImportRGBA().
See:
https://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pdf (page 31)
and also http://www.asmail.be/msg0055469184.html
Change-Id: I3258bfdb0eb2e1a53d6c04414f55edb2926c938c
The patch 21735e0 introduced a bug where a goto path was not testing
the eos_ state. If this happened just before a row_sync, a SaveState()
would be called that would store the eos_ state as '1' till the end
of the loop. This usually was not a problem, except for the very last
chunk where we disable the incremental decoding altogether (we have all
the data). The termination tests were then going wrong.
The fix is to add a proper eos_ test and avoid falling in this inconsistent
state.
(21735e06f7)
BUG=webp:332
Change-Id: Ib16773aee26bfd068fbf4e9db3d2313bd978b269
The FLAGS_LIBS was dropped at some point.
There is still an SDL_Main / main problem though, but it's a
separate topic.
Change-Id: Ief4d79624e94e7904134df8d84a4ac7c77c48ce7
Before, the color cache size was chosen optimally for LZ77 and
the same value was used for RLE. Now, we optimize its value
taking both LZ77 and RLE into account.
Unfortunately, that comes with a small CPU hit.
Change-Id: I6261f04af78cf0784bb8e8fc4b4af5f566a0e071
Between each iteration we keep track of the previously found
potential merge hence less work to do.
Change-Id: I2b6237447e79443516a6111727d96c24f10bd98a
It was a bad implementation of a Lehmer random number generator
(the saturation was done wrong and mostly & was used instead of % .....).
That lead to "for" loop stuck with the same values given a specific seed,
hence wasted "for" loops (e.g. seed getting at 374988608 and modulo of 64
later leads to 0 even when updating the seed with the old formula).
As the "for" loops now always return a proper pair of histograms, their
number can greatly be reduced, hence a speedup.
Change-Id: I9f5b44d66cc96fd4824189d92276c3756c8ead5b
This code is ultra-critical for lossless decoding, especially on ARM.
The extra call VP8LIsEndOfStream() was causing unnecessary slow-down.
Now, we check for bitstream-end separately in the main loop.
Change-Id: I739b5d74cc29578e2b712ba99b544fd995ef0e0d
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine this conditionals.
BUG=webp:261
Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
The build is based on CMake.
There is a demo HTML page under webp_js/index.html
See README.webp_js file.
BUG=webp:261
Change-Id: I6612378b89907efd7b863720c6becf98385fc406
Previously, the stochastic method for histogram
combination could finish in a greedy way
if the number of iterations to perform so was smaller.
Except that another greedy combination was performed
afterwards ... hence wasted CPU in some cases.
Change-Id: Ic0f26873e6dc746679486b91cb35d73efee91931
The initial re-writing of this part of the code with intervals
had to be done with a complex logic (mostly intervals with a
lower and upper bound, not a constant value like now) to properly
deal with the inefficiencies of the then LZ77 algorithm.
The improvements made to LZ77 since, now allow for a simpler logic.
There were also small errors in the interval insertion logic
that lead to small inefficiencies (hence a slightly better
compression rate).
Change-Id: If079a0cafaae7be8e3f253485d9015a7177cf973
Other libraries are also cleaned to automatically read the Makefile.am.
Dependencies also got cleaned.
Change-Id: I5d1ff0a4010d59e8c929ed0c9c30c05d83c271f8
Uses WebPToSDL() generic function defined in vwebp_sdl.[ch].
This function is not included in the libextras library, because it
would bring in an SDL dependency. Probably too heavy for now.
WebPToSDL() is separate, because it will be called used by the javascript
version of libwebp (through emscripten build rules)
Change-Id: Ic85b36f8ce4784f46023656278f6480be6802834
previously this was reading the first file a second time, since:
b0450139 ReadImage(): restore size reporting
BUG=webp:329
Change-Id: Ie75192e36a06102b7617841768a18d4dfb02d1f5
drop './' from the reference in webp_quality_LDADD.. this form is used
in the other makefiles. this fixes a parallel build failure seen under
freebsd:
make[1]: don't know how to make ./libwebpextras.la. Stop
Change-Id: I59635a0c747d402cd990f6379eb1948c3e40f278
Documentation says: "if kmin == 0, then key-frame insertion is disabled;
and if kmax == 0, then all frames will be key-frames."
Reading this, you'd expect that if kmax == 0, then with any kmin <= 0
all frames will be key-frames. But actually the kmin <= 0 test is caught
first and you get the opposite (no keyframes but the first). You'd have
instead to set kmax == 0 and any value kmin > 0, which is absolutely
counter-intuitive (reversing order).
Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the
`kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`).
So it should be considered an exception too.
Instead I propose this new logic:
- kmax == 1 means that all frames are keyframes (you are explicitly
requesting a keyframe every 1 frame at most, i.e. all frames).
- kmax == 0 means no keyframes (you ask for a keyframe every 0 frames,
i.e. never).
This is more "logical" language-wise, and also does not involve any
conflicts about what if both kmax and kmin are 0, since now a single
property value is meaningful for the 2 exceptional cases.
Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd
(cherry picked from commit 2dc0bdcaee)
Compile with XCode, it appears quite slower than the C-version,
especially for arm64.
Change-Id: Ic46dba184a36be454fef674129d2f909003788fc
(cherry picked from commit 4f3e3bbd44)
Documentation says: "if kmin == 0, then key-frame insertion is disabled;
and if kmax == 0, then all frames will be key-frames."
Reading this, you'd expect that if kmax == 0, then with any kmin <= 0
all frames will be key-frames. But actually the kmin <= 0 test is caught
first and you get the opposite (no keyframes but the first). You'd have
instead to set kmax == 0 and any value kmin > 0, which is absolutely
counter-intuitive (reversing order).
Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the
`kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`).
So it should be considered an exception too.
Instead I propose this new logic:
- kmax == 1 means that all frames are keyframes (you are explicitly
requesting a keyframe every 1 frame at most, i.e. all frames).
- kmax == 0 means no keyframes (you ask for a keyframe every 0 frames,
i.e. never).
This is more "logical" language-wise, and also does not involve any
conflicts about what if both kmax and kmin are 0, since now a single
property value is meaningful for the 2 exceptional cases.
Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd
the default is format is roman, fixes:
`R' is a string (producing the registered sign), not a macro.
Change-Id: If2bce714eff1237cd1702ae1143323249d85b93b
strip '_debug' from the library basename to form the resource target.
since:
919f9e2f Merge "add .rc files for windows dll versioning"
BUG=webp:323
Change-Id: I97cfa48afa846211385720034a40aa452c68134c
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
remove libimagedec from webp_quality, only the webp headers are
inspected. libwebpextras isn't needed by get_disto.
Change-Id: Ib85f97e2c4a9edd97392fd20ef294d1ccc76dda5
We can switch at run-time between the standard GetCoeffs() critical
function, that uses a fast variant of VP8GetBit().
However, some platforms have slow instructions that make standard
VP8GetBit() slow. GetCoeffs() is the right level of branching to
switch to GetCoeffsAlt() that avoids these slow instructions in some
not-frequent cases.
Next patch will upgrade VP8GetBit() to use clz, after this one
is proved to be neutral speed-wise.
Change-Id: Ia6cef5de9de6131574d2202bbc0bea8559c9b693
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
This makes the structure more generic, without the hard-coded
internal structure.
This is a borderline incompatible ABI change, even if WebPIDecoder structure
is opaque.
Change-Id: I518765c3f76fc17a136cef045a5a8aa70ed70e85
30% faster on x86, 5% faster on N5.
New generic function: WebPLog2FloorC()
This function is called as fallback for BitsLog2Floor() when there's
no clz() available.
Change-Id: Ica15c6092112e514c0e200fab89c434de48d4b19
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.
Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731
Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
The previous optimization was performing dichotomy on a function that
is anything in practice, hence a bit of randomness.
Also, two magic constants were used, one for an extra constant cost,
one for an extra linear cost. Both values/models were empirical.
A brute force search for the best cache size is now performed.
To have less CPU impact, a speed optimization is also made by not
inserting a value again and again.
This makes sense but it's also the most common case of when LZ77 is
useful hence an overall improvement sometimes.
Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8
libwebp-0.5.2
- 12/13/2016: version 0.5.2
This is a binary compatible release.
This release covers CVE-2016-8888 and CVE-2016-9085.
* further security related hardening in the tools; fixes to
gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
#312)
* full libwebp (encoder & decoder) iOS framework; libwebpdecoder
WebP.framework renamed to WebPDecoder.framework (issue #307)
* CMake support for Android Studio (2.2)
* miscellaneous build related fixes (issue #306, #313)
* miscellaneous documentation improvements (issue #225)
* minor lossy encoder fixes and improvements
* tag 'v0.5.2': (54 commits)
update ChangeLog
anim_util: quiet implicit conv warnings in 32-bit
jpegdec: correct ContextFill signature
Remove some errors when compiling the code as C++.
vwebp: clear canvas during resize w/o animation
tiffdec: restore libtiff 3.9.x compatibility
update NEWS
AnimEncoder: avoid freeing uninitialized memory pointer.
WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
fix a potential overflow with MALLOC_LIMIT
bump version to 0.5.2
update AUTHORS & .mailmap
iosbuild.sh: add WebPDecoder.framework + encoder
AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Fix assertions in WebPRescalerExportRow()
fix a typo in WebPPictureYUVAToARGB's doc
systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
doc: use two's complement explicitly for uint8->int8 conversion
Anim_encoder: correctly handle enc->prev_candidate_undecided_
WebPPictureDistortion(): free() -> WebPSafeFree()
...
Change-Id: I16bcf54af41ce8fad98d4fbc8aa1df58f338fc23
This is duplicating code from compiler flag checking that was once
added (then reverted to CMake) as the problem seem to be on the
clang side as detailed in:
http://public.kitware.com/Bug/view.php?id=13194
I also tried to remove a similar warning with pthreads but there
is also an issue on the clang side:
https://llvm.org/bugs/show_bug.cgi?id=7798
Change-Id: I5b0061f0f71e49b493c5ee0c98f70533c28164bd
the sizes are already validated by CheckSizeForOverflow(), add casts to
size_t to avoid -Wshorten-64-to-32
Change-Id: Ida9102c2104f4a334a0ad16d6e01a12bedfd4eec
(cherry picked from commit 1e2e25b0d4)
this corrects the checkboard pattern displayed with transparent images
Change-Id: I5f46dbc9fa3893d61f5f1d4fda643ac030238f94
(cherry picked from commit 67c25ad5b4)
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().
BUG=webp:322
Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
(cherry picked from commit 3ebe1c0003)
This improves compression by ~5% at default quality.
If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.
Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
(cherry picked from commit 3f4042b52a)
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().
BUG=webp:322
Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
the sizes are already validated by CheckSizeForOverflow(), add casts to
size_t to avoid -Wshorten-64-to-32
Change-Id: Ida9102c2104f4a334a0ad16d6e01a12bedfd4eec
Passing the 'verbose' flag to DecodeWebP() wasn't mandated,
and was creating a forced dependency between imageio/ and examples/
Change-Id: Ib3d3f381a7b699df369a97cfb44360580422df11
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...
Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
WebPDecoder.framework replaces WebP.framework as the decode-only
framework.
WebP.framework now includes the full library allowing for use of the
encoder.
BUG=webp:307
Change-Id: Ic8139f201576bf94b0d4a31cb7cad0655cd8ba97
(cherry picked from commit 1d5046d1f9)
For some reason, gcc has hard time inlining this one...
Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
WebPDecoder.framework replaces WebP.framework as the decode-only
framework.
WebP.framework now includes the full library allowing for use of the
encoder.
BUG=webp:307
Change-Id: Ic8139f201576bf94b0d4a31cb7cad0655cd8ba97
Set enc->prev_candidate_undecided_ as 0 when a frame is not chosen
as a possible keyframe, so that the dispose method can be
dispose-to-background.
Change-Id: If2899f5dbc06fb53705fb8240072ab6440a6de12
(cherry picked from commit 29fedbf58b)
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)
We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.
Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
(cherry picked from commit 0a3838ca77)
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).
Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.
Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
(cherry picked from commit e4cd4daf74)
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.
Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
(cherry picked from commit 1c36440094)
Change it from transparent white to transparent black, which matches
the transparent color assumed in Webp dispose-to-background method.
Also pre-multiply background colors before comparison in anim_diff,
just as what is done with regular pixel values.
Change-Id: I5a790522df21619c666ce499f73e42294ed276f2
(cherry picked from commit 43bd895879)
WEBP_HAVE_FLAG_LOCAL -> WEBP_HAVE_FLAG_${WEBP_SIMD_FLAG} this will
include the flag being tested in the output:
-- Performing Test WEBP_HAVE_FLAG_NEON
Change-Id: I1c0a143a857b16e4eb1fcf8b23c176380a5fef29
(cherry picked from commit ee1057e3b1)
This will well isolate contributions for original code,
generated code and SIMD (especially for Android).
Change-Id: Ie47664decc7f43c2f57260a72cab951c347281a7
(cherry picked from commit 6c62841076)
brings the final libwebp.so size down 16/20K with arm64/armv7 builds
using ndk-r13
Change-Id: I20d8aba61d6b692b0fc32f4b271e2f9872f03c28
(cherry picked from commit de568abfdb)
quiets a -Wimplicit-function-declaration with some configurations of gcc
(-std=c99).
_POSIX_C_SOURCE is preferred over _BSD_SOURCE with newer versions of
glibc
Change-Id: I378bffb13ba52ff5c4bad1433090dcc387e5d507
(cherry picked from commit bfab894739)
The image is scaled to fit the whole viewport.
Avoid some oddities with offsets, etc.
removes some TODO.
Change-Id: I52fae9ca80a2feed234f32261c7f6358d7594e21
(cherry picked from commit 9310d19258)
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.
Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
(cherry picked from commit 21e7537abe)
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.
Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
(cherry picked from commit 1377ac2ec1)
use a compile check on a separate file to avoid assuming using
arm_neon.h is safe to use without flags when just the file itself is
self-contained with GCC target pragmas.
BUG=webp:313
Change-Id: I48f92ae3e6e4a9468ea5b937c80a89ee40b2dcfd
(cherry picked from commit 0104d730bf)
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.
Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.
Also: suffix explicitly the functions with _SSE2
Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).
Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
it actually disables the disposal / blending method
and just displays the raw delta values.
Useful for debugging.
TODO: Outline the refreshed area with a drawn rectangle?
Change-Id: I6f8cddd0aad8b953cff78a693ae7e8c31def010c
Benchmarks from vrabaud@:
8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged
skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged
images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged
images/lpcb corpus speed: unchanged, corpus size: unchanged
images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged
images/png_doc corpus speed: unchanged, corpus size: unchanged
images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged
images/jpeg_small corpus speed: unchanged, corpus size: unchanged
images/icip_core1 corpus speed: unchanged, corpus size: unchanged
images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged
images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged
images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged
images/gif corpus speed: faster: -1.4 % , corpus size: unchanged
images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged
images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged
images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged
images/all corpus speed: faster: -1.5 % , corpus size: unchanged
images/png_small corpus speed: unchanged, corpus size: unchanged
images/png corpus speed: unchanged, corpus size: unchanged
images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged
Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
avoiding triplets of data should make it easier to write SSE2 versions.
FilterRow() can now filter all input in one single pass
-> conversion is 15-20% faster (but still overall slow compared to -pre 0)
Change-Id: I14c3215e672fdecde7ec80394e814bdc7445019f
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)
We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.
Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
The options are now:
-duration d -> set the whole animation to duration 'd'
-duration d,s -> set only frame 's' to duration 'd'
-duration d,s,e -> set only interval [s,d] to duration 'd'
+ style fix
Change-Id: I72e95282d520146f76696666f44280ad9506affa
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I6ad9f93b6c4b665c559bff87716a7b847f66a20d
(cherry picked from commit 342e15f0ce)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I2881bec2884b550c966108beeff1bf0d8ef9f76b
(cherry picked from commit 1147ab4ee7)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I693cbb295df9cf94aa89294b19c0496bdbe84d18
(cherry picked from commit de9fa5074e)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I3d7b689be8d5751248a82d1021243d80d3f67203
(cherry picked from commit deb1b83199)
this will force a constant duration for an interval of frames
in an animation.
Notes:
a) '-duration [...]' can be repeated as many times as needed.
b) intervals are taken into account in option order. If they overlap, values will be overwritten.
c) 'start' and 'end' can be omitted, but not the duration value.
d) 'end' can be equal to '0', in which case it means 'last frame'
e) single-image files are untouched (ie. not turned into an animation file).
Some example usage:
webpmux -duration 150 in.webp -o out.webp
webpmux -duration 33,10,0 in.webp -o out.webp
webpmux -duration 200,2 -duration 150,0,50 in.webp -o out.webp
Change-Id: I9b595dafa77f9221bacd080be7858b1457f54636
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).
Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.
Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
and use it to validate decoder allocations. fixes a crash in jpegdec at
least.
BUG=webp:312
Change-Id: Ia940590098f29510add6aad10a8dfe9e9ea46bf4
(cherry picked from commit bc86b7a8a1)
make this function return success/failure.
an empty map or out of bounds read is treated as an error.
BUG=webp:316
Change-Id: Ic8651836915ea4dd8e0dc81ca8d5d3f247be1ff8
(cherry picked from commit cac9a36a23)
- remove the inclusion of format_constants.h
- use incremental update of pointer, instead of arithmetic
(follow-up to e2affacc35)
Change-Id: I48420c8defc8d47339f54bc00e9da9617f08ab32
(cherry picked from commit 9f5c8eca79)
Mostly: avoid doing calculation like: ptr + j * stride
when stride is 'int'. Rather use size_t, or pointer increments (ptr += stride)
when possible.
BUG=webp:314
Change-Id: I81c684b515dd1ec4f601f32d50a6e821c4e46e20
(cherry picked from commit e2affacc35)
DGifGetExtension() may successfully return, but the data pointer should
still be validated
BUG=webp:310
Change-Id: I6cfe617871fef2fe07887e5f48bb20f7ab7cfb35
(cherry picked from commit 806f6279ae)
make this function return success/failure.
an empty map or out of bounds read is treated as an error.
BUG=webp:316
Change-Id: Ic8651836915ea4dd8e0dc81ca8d5d3f247be1ff8
This reverts commit f048d38d38.
the 'len' in Remap refers to the src[] not the colormap; this change
breaks valid files
BUG=webp:316
Change-Id: I1ed40075c2194df91d345cb6f29619b1f5cc96fc
Roughly, if both the source and the reference areas are
darker too dark (R/G/B <= ~6), they are ignored.
One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.
Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
Make WebPPictureDistortion() only compute distortion on A/R/G/B planes, not Y/U/V(A).
(not just for SSIM, but PSNR too).
This is to avoid problems with using SSIM on U/V channels.
If Y/U/V distortion is needed, one can always use WebPPlaneDistortion() individually.
Change-Id: If8bc9c3ac12a8d2220f03224694fc389b16b7da9
use a compile check on a separate file to avoid assuming using
arm_neon.h is safe to use without flags when just the file itself is
self-contained with GCC target pragmas.
BUG=webp:313
Change-Id: I48f92ae3e6e4a9468ea5b937c80a89ee40b2dcfd
Also introduce an always-failing 'reader' for unknown formats.
So we don't have to check reader==NULL, code is more regular.
-> We can get read of specific ReadPNG(), ReadJPEG(), ... declaration and use.
Change-Id: I290759705420878f00c7223c726d4ad404afd9c4
- remove the inclusion of format_constants.h
- use incremental update of pointer, instead of arithmetic
(follow-up to e2affacc35)
Change-Id: I48420c8defc8d47339f54bc00e9da9617f08ab32
Mostly: avoid doing calculation like: ptr + j * stride
when stride is 'int'. Rather use size_t, or pointer increments (ptr += stride)
when possible.
BUG=webp:314
Change-Id: I81c684b515dd1ec4f601f32d50a6e821c4e46e20
When compiling as experimental, WEBP_EXPERIMENTAL_FEATURES
would not be defined because the header defining it would
not be included.
Hence runtime errors in debug mode when running:
./cwebp -lossles whatever
...
Error! Cannot encode picture as WebP
Error code: 4 (INVALID_CONFIGURATION: configuration is invalid)
(detail: WebPConfig would have a random value set for
delta_palettization as config.c does not consider
it to exist.)
Change-Id: I41761cffe81a971130ed514b195a73d1c6dac1b7
DGifGetExtension() may successfully return, but the data pointer should
still be validated
BUG=webp:310
Change-Id: I6cfe617871fef2fe07887e5f48bb20f7ab7cfb35
* prevent 64bit overflow by controlling the 32b->64b conversions
and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasis the dark
areas
* use a hat-like filter instead of box-filtering to avoid blockiness
during averaging
SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C-version is quite slower because still un-optimized.
Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
quiets a -Wimplicit-function-declaration with some configurations of gcc
(-std=c99).
_POSIX_C_SOURCE is preferred over _BSD_SOURCE with newer versions of
glibc
Change-Id: I378bffb13ba52ff5c4bad1433090dcc387e5d507
The image is scaled to fit the whole viewport.
Avoid some oddities with offsets, etc.
removes some TODO.
Change-Id: I52fae9ca80a2feed234f32261c7f6358d7594e21
If a small hash map can be used, use it to avoid binary search.
This fist hash function that is tried works with the previous
use case of having indexed data in green.
Change-Id: I2f91cec5f3ca7e9c393fd829e69e09bab74f4e7c
using -ssim -o will trigger SSIM map calculation instead of add-diff map.
-gray converts the error map to intensity instead of having each channels' error separated.
Change-Id: I4bdb88880a252e5562aa4e0e3c2353ad93aef20e
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.
Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
The most common conditions are re-ordered and cached.
iter_min was recently introduced to make sure enough iterations
are made in cases where there are many matches (mostly uniform regions).
Now that those are properly analyzed, it becomes useless.
Change-Id: Id3010ee4ec66b84d602fcb926f91eb9155ad27f4
-Skip examining quantized levels that are too high.
-Calculate last_pos_cost only when needed.
Encoding speed for m6 is increased by about 3%;
Compression performance is neutral.
Change-Id: I8af70b049587cca0375d9b3eb00479ec7c0c842a
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.
Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.
Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixels instead of
printing a frame-level global SSIM value.
* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring
SSIM calculation is ~4x faster now.
Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.
Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.
Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
we don't need to centralize best_uv[] since target_uv[] and best_rgb_uv[]
are already centralized. The diff 'W' was just in the ~[-2,2] range, so
we can ignore the correction.
Overall speed-impact is not large, though. Around ~4% faster conversion.
Output with -pre 4 is expected to be slightly different
Change-Id: Ib59f033955577c49b084d0560108020f42d84102
also: remove the useless clipping in StoreGray()
For speed reason, the 'gray' plane was initialized with the same
value for 2x2 block. But in some cases (underlying camera noise, e.g.),
it could lead to instability during iteration, noise amplification,
and visible banding.
Using a precise (but slower) initialization solves the issue, and
since the convergence is faster, we might actually gain some speed.
Change-Id: I81c42101497e7096a8f60289d710f5a3bcb0ddea
We usually need at least 2 iterations to converge
(and usually not much more after that). Only 1 was not enough.
Change-Id: Iaf802ea81afa2596f4ba045c92f5eaff61623b7b
Change it from transparent white to transparent black, which matches
the transparent color assumed in Webp dispose-to-background method.
Also pre-multiply background colors before comparison in anim_diff,
just as what is done with regular pixel values.
Change-Id: I5a790522df21619c666ce499f73e42294ed276f2
Set enc->prev_candidate_undecided_ as 0 when a frame is not chosen
as a possible keyframe, so that the dispose method can be
dispose-to-background.
Change-Id: If2899f5dbc06fb53705fb8240072ab6440a6de12
No need to find backward references for pixels in uniform regions
by looking at all pixels.
Only pixels at the same distance from the end need to be compared to.
Change-Id: I4f187e965f0667d3a929775726a412f7e69f6473
if src_{r,g,b} = 0, any value of src_a or dst_a such that
abs(src_a - dst_a) <= max_allowed_diff
was making the test pass, despite being very different-looking pixels.
The fix is to require same values for src_a and dst_a before attempting
an r/g/b comparison.
Change-Id: If3a55a229eab3904ed454f20065e49e35c39f25c
Constants are such that brute force is sometimes faster for some
data (mostly big images it seems).
Change-Id: I90aef536408683535e3b09ddfa2e77a9834038f6
gcc-4.8.3 was giving the undefined reference to ImgIoUtilReadFile and
ImgIoUtilCopyPlane when linking libimagedec.a
Change-Id: I8b85ab3a860602535629d0bb541a39754413728e
Return key/index if the query is found, and -1 otherwise.
The benefit of this is to save a hashing computation.
Change-Id: Iff056be330f5fb8204011259ac814f7677dd40fe
WEBP_HAVE_FLAG_LOCAL -> WEBP_HAVE_FLAG_${WEBP_SIMD_FLAG} this will
include the flag being tested in the output:
-- Performing Test WEBP_HAVE_FLAG_NEON
Change-Id: I1c0a143a857b16e4eb1fcf8b23c176380a5fef29
decoding and file i/o have been split to imageio, all that remains is
some string routines used for parameter parsing in the examples
Change-Id: I77386cd8aa39124b9e14c95fdbaa17ea4ab5bb24
This function returns a rough estimation of the quality factor used
for compressing the WebP bitstream.
Simple command line: 'webp_quality [-quiet] in.webp'
should print the estimated quality factor.
Change-Id: Ifba3e489461f5a587003ac9f08cc7556e9b24ac2
reserve src/ for the code of the main libraries: libwebp, libwebpdemux
and libwebpmux
+ demote this library to internal only; i.e., don't install the header +
lib with make install
Change-Id: I8c9844db8f494be0fa0a2549a5b75b5cebcf666d
We add the following MSA optimized rescaling functions:
- RescalerExportRowExpand
- RescalerExportRowShrink
Change-Id: Ic1c76065423b02617db94cf0c22bb564219b36e6
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed
Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
This improves compression by ~5% at default quality.
If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.
Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
The decision is based on the variance between DC values of each
sub-4x4 block. This heuristic is rather ok for predicting whether
the 2nd transform (intra-16) is going to help or not.
The decision threshold varies with quality (=quantization).
It's only used for -m 0 and -m 1, where no full RD-opt is performed.
It actually makes these modes quite faster, with RD curve much
closer to the -m 2 mode.
Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8
We add the following MSA optimized encoder quantization functions:
- QuantizeBlock
- Quantize2Blocks
Change-Id: Ie32b442afa99eee62d2ef48942b41116a4e157d3
pre-allocating a sorted[] array for most common cases of small
alphabet size cuts a lot of traffic.
Change-Id: I73ff2f6e507f81b0b0bb7d9801a344aa4bcb038a
This will well isolate contributions for original code,
generated code and SIMD (especially for Android).
Change-Id: Ie47664decc7f43c2f57260a72cab951c347281a7
+ s/src_a/dst_a/
+ remove unnecessary (void) as expected_num_lines_out is used within the
function
Change-Id: Ic45f798ef22bd19eaabf1a0512d1cf8a201bb4b5
libwebp-0.5.1
- 6/14/2016: version 0.5.1
This is a binary compatible release.
* miscellaneous bug fixes (issues #280, #289)
* reverted alpha plane encoding with color cache for compatibility with
libwebp 0.4.0->0.4.3 (issues #291, #298)
* lossless encoding performance improvements
* memory reduction in both lossless encoding and decoding
* force mux output to be in the extended format (VP8X) when undefined chunks
are present (issue #294)
* gradle, cmake build support
* workaround for compiler bug causing 64-bit decode failures on android
devices using clang-3.8 in the r11c NDK
* various WebPAnimEncoder improvements
* tag 'v0.5.1': (30 commits)
update ChangeLog
Clarify the expected 'config' lifespan in WebPIDecode()
update ChangeLog
Fix corner case in CostManagerInit.
gif2webp: normalize the number of .'s in the help message
vwebp: normalize the number of .'s in the help message
cwebp: normalize the number of .'s in the help message
fix rescaling bug: alpha plane wasn't filled with 0xff
Improve lossless compression.
'our bug tracker' -> 'the bug tracker'
normalize the number of .'s in the help message
pngdec,ReadFunc: throw an error on invalid read
decode.h,WebPGetInfo: normalize function comment
Inline GetResidual for speed.
Speed-up uniform-region processing.
free -> WebPSafeFree()
DecodeImageData(): change the incorrect assert
Fix a boundary case in BackwardReferencesHashChainDistanceOnly.
Make sure to consider small distances in LZ77.
add some asserts to delimit the perimeter of CostManager's operation
...
Change-Id: I44cee79fddd43527062ea9d83be67da42484ebfc
We add the following MSA optimized color transform functions:
- AddGreenToBlueAndRed
- TransformColorInverse
Change-Id: Iceab3813905955aa8b811253df9188512fc7de3f
(in case no alpha was present in the source .webp,
but user requested some)
Change-Id: I9011d38237907c60d6796a86bd2c72166aa80f27
(cherry picked from commit 06a38c7b1c)
This is essentially a revert of a3611513d2
and cfbcc5ece0.
Here is what happened: there was a corruption bug that eventually
got fixed by 0174d18d8b.
But before finding the root, a3611513d2
and cfbcc5ece0 hid the bug
by not imposing length of 1 when it was actually 2 or 3 (which does help
compression as a litteral is more efficient than an offset and a length
of size 2 or 3).
Change-Id: I6f18fc1f583a51ac9d8aab2508458264047cd493
convert the assert() to an error check to avoid crashing when reading
malformed files.
BUG=webp:302
Change-Id: I25eed9cab5c0a439bd3411beacc83f3a27af2bbf
We only perform a single pass, and swap the final histograms
into the beginning of the array as we go. Therefore, they are
already at the correct place at the end of the pass.
-> HistogramCompactBins() is removed, we just truncate the array.
output is bitwise the same.
Change-Id: I9508c96dda0f8903c927a71b06af4e6490c3249c
output should bit-write the same as before, in both
low_effort and non low_effort modes.
if anything, speed is a tad faster, probably because of the
reduced memory traffic.
Change-Id: Iaa2ddcfda2aaffefe7e5b7bc89216373d1ddb194
MAX_COLOR_COUNT was just a synonym and its use in the header was a bit
strange given the visibility of that define.
Change-Id: I536964ddc14a0c48263191b6afb80695b5a038e6
this function can be called not to decode pixels, but simply
to finish processing (through process_func()) the already decoded
pixels.
Change-Id: I80485e92e3c47f0aa3389476dcb82745a243fc4a
The optimization for (len != MIN_LENGTH) actually only holds for
(len > MIN_LENGTH) but (len < MIN_LENGTH) can now happen as len can
be changed in the loop before.
Change-Id: I3f9f91a540206c80385c5fba96c3d64ab9536752
On the non-fast path (use_8b_decode_=0) for decoding the alpha-mask,
we could end up requesting ApplyInverseTransform() with more rows
to process than NUM_ARGB_CACHE_ROWS. This could only happen on the
very last bottom rows of the image.
* ProcessRows() doesn't need to be fixed, since we never request more
than NUM_ARGB_CACHE_ROWS rows. Added an assert for that.
* the use_8b_decode_=1 case doesn't use argb_cache_, but rather does
the palette-decoding call directly. So, no problem here too.
Only the generic (and rather rare) case of calling ExtractAlphaRows()
was affected.
Change-Id: I58e28d590dcc08c24d237429b79614abcef1db7c
This is getting back to the old behavior which is actually better for
compression and speed with the latest patches.
Change-Id: I35884bab02589297c25d6e1e66dc5f13e05f7aa7
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].
Change-Id: I66c3ac6dea24e0cd2c0eaa5440f3142b4dbbe23b
we don't need to store the resulting histogram, so no need to
call HistogramAddEval().
Allows some signature simplifications...
Change-Id: I3fff6c45f4a7c6179499c6078ff159df4ca0ac53
In case where the same offset is found in consecutive pixels,
the cost computation from one pixel can be re-used for the next.
Change-Id: Ic03c7d4ab95f3612eafc703349cfefd75273c3d7
and also recycle the malloc'd intervals
This avoids quite some malloc/free cycles during interval managment.
Change-Id: Ic2892e7c0260d0fca0e455d4728f261fb4c3800e
In a lot of cases, only one interval is used. This can cause
a lot of malloc/free cycles for only 56 bytes. By caching this
single interval and re-using it, we remove this cycle in most
frequent cases.
Change-Id: Ia22d583f60ae438c216612062316b20ecb34f029
As per the spec
(https://developers.google.com/speed/webp/docs/riff_container), only the
extended file format can contain an unknown chunk. So, when assembling a
WebP file with muxer, whenever there is an unknown chunk present, we
should create a VP8X chunk (even though none of the features are
present).
BUG=webp:294
Change-Id: I5da52d311e1853d40063d0f5026100d4325effaa
In some cases, the hash chain for a function is filled several
times:
- GetBackwardReferences -> CalculateBestCacheSize ->
BackwardReferencesLz77 that computes the hash chain
- GetBackwardReferences ->
(not always) BackwardReferencesTraceBackwards ->
BackwardReferencesHashChainDistanceOnly that computes the hash
chain in a slightly different way
Speed and compression performance are slightly changed (+ or -)
but will be homogneized in a later patch.
Change-Id: I43f0ecc7a9312c2ed6cdba1c0fabc6c5ad91c953
-> WebPImageReader
Introduce a variant of image-guessing function that returns a reader
directly: WebPGuessImageReader()
Change-Id: I5ddc53024fcf941e33d997b2be6aa1a963d939ab
adds a generic examples/image_dec.[ch] entry point too.
WebPGuessImageType() can be used to infer image type.
Change-Id: I8337e7b6ad91863c9cf118e4791668d2d175079b
This reverts commit 169004b1d5.
this changes the ABI, so should bump versions and add a note to NEWS
when we're ready to expose it
Change-Id: Ic5bbd0aee2b6fd0f9d438a9effedf22fe0cec4bf
tl;dr
We do the following:
- Start with transparent value of 0x00000000 instead of 0x00ffffff, so that
WebPCleanupTransparentAreaLossless() is a no-op.
- Restore the original canvas after lossy encoding, to discard changes made by
WebPCleanupTransparentArea() before the next encode.
Explanation of why:
In the mixed mode, anim_encoder tries to encode using both lossless and lossy
compression. In fact, when "min_size" option is enabled, there are at most 4
encodes that can happen in this order:
- lossless with dispose none
- lossy with dispose none
- lossless with dispose background
- lossy with dispose background
But both lossless and lossy both potentially modify the canvas during encode
(for better compression):
- Lossless: WebPCleanupTransparentAreaLossless() turns all transparent pixels
to 0x00000000
- Lossy: WebPCleanupTransparentArea() flattens some transparent pixels
So, the result is that, sometimes we feed the modified canvas to the encoder
instead of the original one, which isn't the right thing to do.
This also applies to just lossless or just lossy encoding, as multiple encodes
happen (with the two dispose methods) in those cases too.
Change-Id: Idfa8ce831a1627014785ba7d0316c42f72594455
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].
Change-Id: I19adc9c48f2a4e2ec9d995e78add6f25172774c2
no longer gate this on WEBP_FORCE_ALIGNED as WebPMemToUint32() provides
this service. replace that check with WORDS_BIGENDIAN as the block is
currently little-endian specific.
Change-Id: Ie04ec0179022d20dab53da878008ae049837782f
the read size may be fixed, but the offsets into buf_ are not. forcing
an aligned read then shifting or using a temporary would be costly. this
is less important now that WebPMemToUint32() is being used.
Change-Id: I357fec8f750969cce91987abebed2f95e27a835f
Instead of comparing all the following pixels over len (which can
frequently reach the maximum MAX_LENGTH=4096 for some images),
intervals are stored and compared.
Change-Id: I0dafef6cc988dde3c1c03ae07305ac48901d60ee
The old implementation in enc/near_lossless.c performing a separate
preprocessing step is used only when a prediction filter is not used,
otherwise a new implementation integrated into lossless_enc.c is used.
It retains the same logic for converting near lossless quality into max
number of bits dropped, and for adjusting the number of bits based on
the smoothness of the image at a given pixel. As before, borders are not
changed.
Then, instead of quantizing raw component values, the residual after
subtract green and after prediction is quantized according to the
resulting number of bits, taking care to not cross the boundary between
255 and 0 after decoding. Ties are resolved by moving closer to the
prediction instead of by bankers’ rounding.
This results in about 15% size decrease for the same quality.
Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f
copy and paste error in the previous commit, change
no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF ->
WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32
add WEBP_UBSAN_IGNORE_UNDEF to WebPMemToUint32() / WebPUint32ToMem()
when WEBP_FORCE_ALIGNED is unset
Change-Id: I726b2e708ce29681584eb10c8874d5cf1e798756
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.
Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd
the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.
the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.
Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
This fixes decoders built against clang-3.8 (r11c). Without this change
bad conditional code would be generated causing all calls to
WebPParseHeaders() to return 4 (UNSUPPORTED_FEATURE).
Original fix:
https://android-review.googlesource.com/#/c/196123
Change-Id: Id4b4d84048d347cea110b6cf297ef9ef4fbed323
TEST_AND_ADD_CFLAGS tests flags independently from AM_CFLAGS. The lack
of -Wformat may cause -Wformat-* checks to fail (gcc 4.9), though in the
end they would have succeeded with -Wall present.
Change-Id: Ic8f0d5b8a3f75a0b105bdc4ddd16690878a30222
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
this is similar to the changes made in:
c8a87bb AssignSegments: quiet -Warray-bounds warning
3e7f34a AssignSegments: quiet array-bounds warning
Change-Id: Iec7d470be424390c66f769a19576021d0cd9a2fd
This will allow to work in-place on cropped area later.
Also sped up the inverse gradient filtering in SSE2 (~4%)
Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
This is only possible if the filtering is not VERTICAL or GRADIENT.
Otherwise, we need the spatial predictors and hence need the un-visible
part above crop_top row.
COLOR_INDEX transform is the only transform that is not predicted
from previous row. Applying the same for other transform (spatial
predict, ...) is going to be more involve and use an extra temporary row.
+ remove ApplyInverseTransformsAlpha()
(work is done directly within ExtractPalettedAlphaRows())
+ change back to using filter_ instead of unfilter_func_
Change-Id: I09e57efae4a4af00bde35f21ca6e3d73b35d7d43
After the introduction of lossy frame rectangles
we need equivalent option in anim_diff for merging similar frames.
Change-Id: I1d03acace396ec4cb0212586c6e8b8ec5b0b0bfc
not exactly same.
Based on lossy WebP quality setting, ignore minor differences when
flattening
similar blocks.
For 6k set, at default quality with '-min_size' option, improves
compression by 0.3%
Change-Id: Ifcb64219f941e869eb2643e231220b278aad4cd4
there's some subtle changes:
- DecodeAlphaData() may be called with pos==end because we don't want
to decode more data (there's none left), but because we want to apply
process_func() to all the unprocessed pixels already decoded
- last_row is exclusive and should be understood as 'up to last_row'. Can be misleading.
- VP8LDecodeAlphaImageStream() was testing dec->last_pixel_ for completion,
which was wrong because last_pixel_ is the last *decoded* pixel, not the
last *processed* one. -> test now uses last_row_, as expected
Change-Id: I1fb04ba25cd7a4775db9e3deee3e2ae80f9c0a75
this might change some crc slightly, since WebPDequantizeLevels()
performs an analysis pass, counting levels, which impacts the smoothing.
Now, the cropping area is not the same, so minor diffs are expected here
and there.
Change-Id: I3cce1e40c6f11c25b7c841044d637685c5740352
* make ALPHNew/Delete static
* properly init ALPHDec::io_
* introduce AllocateAlphaPlane() and WebPDeallocateAlphaMemory()
* reorganize VP8DecompressAlphaRows()
but we're still allocate the full alpha-plane. Optim will come
in another patch since it's tricky
Change-Id: Ib6f190a40abb7926a71535b0ed67c39d0974e06a
this change will be superseded by patch #335160 eventually, but until then
let's fix the problem temporarily.
Change-Id: Iafd979c2ff6801e3f1de4614870ca854a4747b04
When FlattenSimilarBlocks() was making some blocks transparent with
averaged RGB values, filtering in lossy compression was causing
blockiness just outside the edge of these blocks.
Disabling filtering for that particular case avoid these block
artifacts.
The total encoded size of the 6k GIF set remains roughly the same (in
fact, reduces a bit).
Change-Id: Ida71cbabd59d851e16d871f53d19473312b3cc77
We pick a mapping with quality 0 mapping to max diff 32, to quality 100
mapping to max_diff 1.
For 6k GIF image set, this improves compression by:
4% at quality 0
0.05% at quality 75
Benefits the MovingThumbnailer test videos too.
Change-Id: I6838ce864d41e1e65311d26b9b8115a12390a253
This way we can ignore some noisy pixels and get tighter frame
rectangles.
Some results:
- Correctness:
Tested that anim_diff reports all images are identical for lossless, and
similar min_psnr value for lossy and mixed modes.
Also checked output images visually to make sure there weren't any
obvious kinks.
- Compression:
A very tiny improvement for 6000 image GIF set we have (0.03%) for lossy
and mixed mode. For some of these images, frames get dropped
automatically as they have a very small diff from previous frame.
10 images from test_video_frames_png show a clear improvement in
compression though. This CL leads to 7 out of 9 lossy WebPs getting
smaller -- for one of them, this leads to a higher quality being picked
(as that’s still < 150 KB).
Change-Id: If539b9e77e1375aa15edc8f926933593a9865f1c
and also pass 'VP8Io* io' extra param to VP8DecompressAlphaRows()
This is somehow in preparation for some memory optimizations in
the 'cropping' case. For now, only the easy crop_bottom case is
optimized.
Change-Id: Ib54531ba057bf62b98422dbb6c181dda626c72c2
This avoids generating file that would trigger a decoding bug
found in 0.4.0 -> 0.4.3 libwebp versions.
This reverts commit 6ecd72f845.
Change-Id: I4667cc8f7b851ba44479e3fe2b9d844b2c56fcf4
The mode's bits were not taken into account, which is ok for most of cases.
But in case of super large image, with 'easy' content, their overhead starts
mattering a lot and we were omitting to optimize for these.
Now, these mode bits have their own lambda values associated, limiting
the jerkiness. We also limit (for -m 2 only) the individual number of bits
to something that will prevent the partition 0 overflow.
removed the I4_PENALTY constant, which was a rather crude approximation.
Replaced by some q-dependent expression.
fixes issue #289
Change-Id: I956ae2d2308c339adc4706d52722f0bb61ccf18c
If value is '2', it means the buffer is a 'slow' one, like GPU-mapped memory.
This change is backward compatible (setting is_external_memory to 2
will be a no-op in previous libraries)
dwebp: add flags to force a particular colorspace format
new flags is:
-pixel_format {RGB,RGBA,BGR,BGRA,ARGB,RGBA_4444,RGB_565,
rgbA,bgrA,Argb,rgbA_4444,YUV,YUVA}
and also,external_memory {0,1,2}
These flags are mostly for debuggging purpose, and hence are not documented.
Change-Id: Iac88ce1e10b35163dd7af57f9660f062f5d8ed5e
This is in preparation for some SSE2 code.
And generally speaking, the whole SSIM code needs some
revamp: we're not averaging the SSIM value at each pixels
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.
Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
based on the sse2 change in:
9960c31 Remove an unnecessary transposition in TTransform.
~9-10.5% faster at the function-level, < 1% overall
Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7
- The result is now indeed closest among possible results for all inputs, which
was not the case for bits>4, where the mapping was not even monotonic because
GetValAndDistance was correct only if the significant part of initial fit in
a byte at most twice.
- The set of results for a larger number of bits dropped is a subset of values
for a smaller number of bits dropped. This implies that subsequent
discretizations for a smaller number of bits dropped do not change already
discretized pixels, which improves the quality (changes do not accumulate)
and compression density (values tend to repeat more often).
- Errors are more fairly distributed between upwards and downwards thanks to
bankers’ rounding, which avoids images getting darker or lighter in overall.
- Deltas between discretized values are more repetitive. This improves
compression density if delta encoding is used.
Also, the implementation is much shorter now.
Change-Id: I0a98e7d5255e91a7b9c193a156cf5405d9701f16
Pass them along to internal 'pic' object, so that progress can be reported back
and user data can also be inspected.
Change-Id: Idb5d0d4a76d07283d704a86c5892e1ad7bda09fa
SSE4.1 is slower than the SSE2 implementation and this seems to
be due to a slow _mm_loadl_epi64 implementation by gcc
(hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both
got confirmed by IACA and experiments.
Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f
We were not updating the current_width_, which is usually
not a problem, unless we use Delta Palette with small number
of colors
-> Addressed this re-entrancy problem by checking we have
enough capacity for transform buffer.
The problem is not currently visible, until we restrict
the number of gradient used in delta-palette to less than 16.
Then the buffers have different current_width_ and the problem
surfaces.
Change-Id: Icd84b919905d7789014bb6668bfb6813c93fb36e
The transpose refactoring will help removing a transpose in a
later CL.
The horizontal add function helps removing a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19
Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
This is to prevent users shooting in the foot using -psnr or
-size alone and not getting the expected result.
Change-Id: I67a3289e4ec0a2a813c98807f2ec5e600f52dc63
'implicit conversion from 'int' to 'short' changes value from 33050 to
-32486'
original patch:
https://codereview.chromium.org/1657313003/
Make libwebp build with -Wconstant-conversion from newer clangs.
After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang
points out that _mm_set1_epi16(33050) causes an overflow in the short
argument to _mm_set1_epi16(). Since there's no version that takes an
unsigned short, add an explicit cast to tell the compiler that this is
intentional.
No behavior change.
Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b
The goal is not to replace our autotools configuration, but to
provide a minimal CMake file that can build the lib, cwebp and
dwebp.
Change-Id: I3be343bd698d118c5f00172449d232d87e868f23
libwebp-0.5.0
- 12/17/2015: version 0.5.0
* miscellaneous bug & build fixes (issues #234, #258, #274, #275, #278)
* encoder & decoder speed-ups on x86/ARM/MIPS for lossy & lossless
- note! YUV->RGB conversion was sped-up, but the results will be slightly
different from previous releases
* various lossless encoder improvements
* gif2webp improvements, -min_size option added
* tools fully support input from stdin and output to stdout (issue #168)
* New WebPAnimEncoder API for creating animations
* New WebPAnimDecoder API for decoding animations
* other API changes:
- libwebp:
WebPPictureSmartARGBToYUVA() (-pre 4 in cwebp)
WebPConfig::exact (-exact in cwebp; -alpha_cleanup is now the default)
WebPConfig::near_lossless (-near_lossless in cwebp)
WebPFree() (free'ing webp allocated memory in other languages)
WebPConfigLosslessPreset()
WebPMemoryWriterClear()
- libwebpdemux: removed experimental fragment related fields and functions
- libwebpmux: WebPMuxSetCanvasSize()
* new libwebpextras library with some uncommon import functions:
WebPImportGray/WebPImportRGB565/WebPImportRGB4444
* tag 'v0.5.0':
update ChangeLog
faster rgb565/rgb4444/argb output
update NEWS
update AUTHORS
update mailmap
bump version to 0.5.0
README: update help text, repo link
Change-Id: I21dc611cfd2a3cb6ed6ba5c455a5049fe615f7a1
The code and logic is unified when computing bit entropy + Huffman cost.
Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.
Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
setting all transparent pixels to black rather than the "flatten" method.
0.3% smaller filesize on the 1000 PNGs if alpha cleanup is used (before: 18685774, after: 18622472)
Change-Id: Ib0db9e7ccde55b36e82de07855f2dbb630fe62b1
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
This implementation brings:
- an SSE implementation of packing / unpacking
- bigger buffers processed at the same time
The speedup is of 4% on lossy decoding (YUV to RGB), 0.5% on
lossy encoding (RGB to YUV was already optimized).
Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f
* changes:
demux: remove GetFragment()
demux: remove dead fragment related TODO
demux, Frame: remove is_fragment_ field
demux,WebPIterator: remove fragment_num/num_fragments
demux: remove WebPDemuxSelectFragment
this hasn't been set since parsing of the experimental chunk was
removed.
+ cleanup IsValidExtendedFormat(). is_fragmented has caused immediate
failure since:
4e2589f demux: restore strict fragment flag check
Change-Id: If9ecfc19556297100a6d5de1ba2cffdcbdc6c8fd
and make it the default if available. this limits symbol visibility to
those marked with WEBP_EXTERN.
Change-Id: I529bf6d143a3008d9385044c87ad7dce72dce9a0
though visible WebPCopyPlane & WebPCopyPixels are not part of the public
api; avoid using a private header within this public module.
Change-Id: I5c8615fcc07090ffaa8933b00af418d8431936eb
INFO: From Compiling src/dsp/cpu.c:
src/dsp/cpu.c: In function 'x86CPUInfo':
src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm'
With PIC and mcmodel=medium, the %rbx register must be saved and
restored which causes this problem. This was also solved in GCC-4.9 with
this patch:
https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html
Tested:
Builds fine with this change.
Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.
Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
The previous priority system used a heap which was too heavy to
maintain (what was gained from insertions / deletions was lost
due to a linear that still happened on the heap for invalidation).
The new structure is a priority queue where only the head is
ordered.
Change-Id: Id13f8694885a934fe2b2f115f8f84ada061b9016
SimpleQuantize()
it's now a single function, that reconstructs the intra4x4 block during the scan
The I4_PENALTY had to be adjusted.
Overall, result is better quality-wise (esp. at q < 50), and a tad faster too.
method #0, #1 and #3+ are unchanged
Change-Id: If262aeb552397860b3dd532df8df6b1357779222
* Precision is slightly different
* also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc.
* removing yuv_tables_sse2.h saved ~8k of binary size
* the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away
* the NEON code is somewhat tricky
Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052
we map the input file into memory, even in the non-stdin case.
This is less efficient than letting the png/jpeg/... decoding libraries
use fread()'s, but more general.
Change-Id: I4501cb9a1daf69593eb8e3326c115cd8cbdf92fd
-> read is a bit slower (memory allocation and such) than reading directly from disk.
-> we're not yet ready to accept stdin as input (-- -) because we still need to guess
the file type with GetImageType(). And since we can't rewind on stdin, this will need
a bit more work before being able to read from stdin.
Change-Id: I6491fac4b07d28d1854518325ead88669656ddbf
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992
Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.
Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
The 32-bit buffers are actually rarely 64-bit aligned.
The new solution uses memcmp and is alignment agnostic.
It is also slightly faster.
Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20
for more speed.
This gives a roughly a 1% speedup for low_effort. But actually this is a
preparation for the upcoming CL that changes RGB values of transparent pixels
based on prediction, which should not be done for low_effort because that would
slightly hurt its performance.
On 1000 PNGs, with quality 0, method 0:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s
Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d
-Wshorten-64-to-32 and -Wunreachable-code were lost in:
96201e5 migrate anim_diff tool from C++ to C89
Change-Id: Ic7161908b1aa88094563f0c7bcad1d508101d015
add -flax-vector-conversions as a workaround when building the
intrinsics files. this was made available in 4.3.x, but may have been
backported. issue observed with:
cc (GCC) 4.2.1 20070831 patched [FreeBSD] (9.3).
src/dsp/alpha_processing_sse2.c:237 incompatible type for argument 1
of '__builtin_ia32_psrlqi128'
BUG=274
Change-Id: I996076643158ecc581facd22f9fb87bec257ccea
Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.
Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.
MIPS32 code has been disabled for now, pending a code update.
Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.
Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s
Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s
No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.
Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
the problem was the incorporation of the extra constant 1<<16 in the kC1
constant, to emulate the addition. It's now removed and the addition is
performed explicitly.
No real speed difference observed.
cf. issue #278
Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d
* changes:
Makefile.vc: enable WEBP_USE_THREAD for windows phone
thread: use CreateThread for windows phone
thread: use WaitForSingleObjectEx if available
thread: use InitializeCriticalSectionEx if available
thread: use native windows cond var if available
if FANCY_UPSAMPLING was not defined but io->fancy_upsampling was set,
then the call to WebPInitSamplers() was skipped -> boom.
Change-Id: Id63e2ecc09f532fbe2ec9936d9ce4b502ba8fac5
Rename the flag to exact instead of the opposite cleanup_alpha. Add the flag to
WebPConfig. Do the cleanup in the webp encoder library rather than the cwebp
binary, this will be needed for the next stage: smarter alpha cleanup for
better compression which cannot be done as a preprocessing due to depending on
predictor choices in the encoder.
Change-Id: I2fbf57f918a35f2da6186ef0b5d85e5fd0020eef
the original change triggered several internal API modifs.
This is to ensure that we're never computing pointer that can
possibly wrap around, or differences between pointers that can
overflow.
no observed speed difference
Change-Id: I9c94dda38d94fecc010305e4ad12f13b8fda5380
We now consider 3 special cases:
* htree-group has only 1 code (no bit is read from bitstream)
* htree-group has few enough literal symbols, so that all the bit
codes can fit into a look-up table of less than 64 entries
* htree-group has a trivial arb literal (not GREEN!), like before
No overall speed change.
Change-Id: I6077fa0b7e5c31a6c67aa8aca859c22cc50ee254
We now get error string instead of printing it.
The verbose option is now only used to print info and warnings.
Change-Id: I985c5acd427a9d1973068e7b7a8af5dd0d6d2585
+ jenkins fixes for native config (library order)
+ add a missing -lm
+ replace log10 by log, just in case
+ partially reverted configure.ac to remove the C++ part
Change-Id: Iee099c544451b23c6cfaca53d5a95d2d332e066e
It was needed earlier for WebPAnimEncoder API when it was using structs
like WebPConfig, but it only uses pointers to those now.
Change-Id: Ic0c144966421c678e8ef54b3fa81574bb2c9cd08
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)
Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV
Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.
ConvertRowToY() should go into dsp/ too, at some point.
Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
also switch to using ExtractAlpha() instead of hard-coding the loop.
The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.
Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
This is to infer the needed conversion to YUV(A) or RGB(A).
This is useful to avoid some conversion steps between ARGB and YUVA.
For instance, if the input file is a JPEG, we decode to RGB and
convert to YUV right away, without the intermediate step to ARGB.
The only caveat is that cropping/scaling might give slightly different result,
because of YUV420 downsampling. Therefore, we omit this feature
at cwebp level, when -crop or -rescale is used.
Change-Id: I5a3abe5108982f2a4570e841e3d9baffc73f5bee
when the previous frame does not specify dispose to background only the
current frame's rectangle should be cleared
related to bug #245
Change-Id: I2fc4f5be99057e0bf87d8fedec57b06859b070bd
use 'u' rather than the unnecessary 'l' as a suffix. this prevents a
conversion warning with some toolchains
Change-Id: I21c33ce08819b3c839c75e03a8f7f3a6041d0695
Note that ALIGN_CST is still kept different in dec/frame.c for now,
because the values is 31 there, not 15. We might re-unite these two
later.
Change-Id: Ibbee607fac4eef02f175b56f0bb0ba359fda3b87
same functionality, but better code layout.
What changed:
* don't trash the palette_[] in EncodePalette(), so it can be re-used
* split generation of image from bit-stream coding
* move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette()
* minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c
* clarify the TransformBuffer stuff. more clean-up to come here...
This should make experimenting with delta-palettization easier and more compartimentalized.
Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610
some limitations: only for RGBA output,
and if reduction factor is not too small (dst_width > src_width / 128)
20-25% faster, ~4-6% global improvement total decoding.
Change-Id: I95366ddaa4a38e0a96bed754dfe790126f7bb84a
It's better to stay with a 32b fixed-point precision overall, otherwise
the C-version on ARM gets *slower*.
Actually, gcc ARM compiler optimizes some instructions pretty
well when WEBP_RESCALER_FIX is exactly 32, even in C.
Change-Id: I0eea97f7db5947470f5af355dee098eca81e178d
-fembed-bitcode is the default, a framework built without this flag will
fail to link against an application using it.
BUG=267
Change-Id: I83461cb058b1866ac99b3f0bdfa890933e88ed26
New palette compresses more than 20% better with minimum quality loss.
Tested on set of wikipedia images with command line:
cwebp -delta_palettization
Change-Id: I82ec7d513136599cd70386f607f634502eb9095d
Earlier, we stored a 1x1 frame for such frames. Now, we drop every such
frame and increase the duration of its previous frame instead.
Also, modify the anim_diff tool to handle animated images that are
equivalent, but have different number of frames.
Change-Id: I2688b1771e1f5f9f6a78e48ec81b01c3cd495403
The rounding and arithmetic is not the same as previously, to prevent overflow cases for large upscale factors.
We still rely on 32b x 32b -> 64b multiplies. Raised the fixed-point precision to 32b
so that we have some nice shifts from epi64 to epi32.
Changed rescaler_t type to 'uint32_t' in order to squeeze in all the precision required.
The MIPS code has been disabled because it's now out-of-sync. Will be fixed in
a subsequent CL when the dust settles.
~30-35% faster
Change-Id: I32e4ddc00933f1b1aa3463403086199fd5dad07b
* vertical expansion now uses bilinear interpolation
* heavily assumes that the alpha plane is decoded in full, not row-by-row
* split the RescalerExportRow and RescalerImportRow methods into Shrink
and Expand variants.
* MIPS implementation of ExportRowExpand is missing.
There's room for extra speed optim and code re-org, but let's keep that for later patches.
addresses https://code.google.com/p/webp/issues/detail?id=254
Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
This is designed for the simple use-case where one wants to decode all
frames one-by-one in order.
Also, use this API in anim_util library, which is in turn used by
anim_diff tool.
Change-Id: Ie8b653c04e867d40fd23321b3dd41b87689656c7
This makes the chains more efficient and a larger variety of data is tested.
0.02 % compression gain at q 100, 0.05 % at default quality. 0.8 % speedup by
callgrind.
0.16 % compression gain for lossy alpha ?!
Change-Id: I888120133352799eb14f5f602c7f40ab404bd665
this allows scaling to a particular width/height while preserving the
source aspect ratio using WebPRescalerGetScaledDimensions().
Change-Id: I77b11528753290c1e9bb942ac761c215ccfb8701
using a *tmp_plane buffer to split a/r/g/b planes up appeared to
be the easiest route, compared to copy-pasting the whole code and
making it x_stride aware...
Change-Id: I0898ef1df62bd3e1713b77187b31b5eeef3832fe
allows the values to be used in preproc checks, fixing a
-Wunreachable-code warning in 64-bit builds where VP8L_WRITER_BITS != 16
Change-Id: Ie98dff4e8ef896436557c64d5da2c5d70228a730
Slightly faster on -m 0 -q 0, particularly for small images (50 x 75
image was 0.1 % faster on callgrind measurement).
Increases compression density by 0.005 % for the 1000 images, but small
images can improve even 0.5 % (about 4 bytes, depending on the
characteristics of the palette).
Change-Id: I94f568d396ac62a054a829abeeef3eb0af6b3f94
If this flag is not used, RGB is premultiplied before comparison.
Otherwise, the raw R/G/B values are compared, which can be a problem
in transparent area (alpha=0 R/G/B=anything)
Change-Id: I131cc10ec92414ad508b81f599a60d0097cac470
the x_add/x_sub increments were wrong for u/v in the upscaling case.
They shouldn't be left to the caller's discretion, but set up by
WebPRescalerInit to their exact necessary values.
-> Cleaned-up WebPRescalerInit() param list.
-> added safety asserts
-> removed the mips32/mips_r2 variant of "ImportRow" which were buggy prior
Change-Id: I347c75804d835811e7025de92a0758d7929dfc09
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~54% faster at the
function level and mildly faster overall
Change-Id: Ibc648e9ee35021d48901e05aa596aa01067796a2
a total impact of 1 % on encoding speed
This allows for performance neutral removal of the binary search
in cache bits selection. This will give a small improvement in
compression density.
Change-Id: If5d4d59460fa1924ce71af977320834a47c2054a
0.21 % compression density improvement for 1000 png corpus in
lossless mode
0.50 % compression density improvement for 1000 png corpus in
lossy mode
Change-Id: I14ee8c427ae5d3e116b0ee6695fcdea3321a319d
valgrind --tool=callgrind shows a 9 % speedup: 1021201984 ticks before vs.
927917709 after
-q 0 -m 0 -lossless ~/alpi/1.png
22.040 MP/s before
24.796 MP/s after
Change-Id: Iaab928167b3e20fb0d9401c6f8317a26c5a610b4
* changes:
lossless: combine the Huffman code with extra bits
lossless: Inlining add literal
lossless: simplify HashChainFindCopy heuristics
lossless: 0.5 % compression density improvement
lossless: Add zeroes into the predicted histograms.
lossless: encoding, don't compute unnecessary histo
lossless: Remove about 25 % of the speed degradation
Faster alpha coding for webp
lossless: rle mode not to accept lengths smaller than 4.
lossless: Less code for the entropy selection
lossless: 0.37 % compression density improvement
_BitScanReverse() takes an unsigned long*
http://msdn.microsoft.com/en-us/library/fbxyd7zd.aspx
fixes:
C4057: 'function': 'unsigned long *' differs in indirection to slightly
different base types from 'uint32_t *'
fixes issue #253
Change-Id: I0101ef7be18c7ed188b35e9b17e7f71290953786
do not do length 2 matches far away
speedup for non compressible data by inserting two literals at a time
when no matches are found
Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3
Increases compression density by 0.03 % for lossy.
Speeds up at least one of the lossy alpha images by 20 %.
Palette entropy 'kludge' seems to save 1-2 % on alpha images.
Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
share the computation between different modes
3-5 % speedup for lossless alpha
1 % for lossy alpha
no change in compression density
Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
introduced in:
"lossless: 0.37 % compression density improvement"
Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.
Improves compression density by 0.02 % or so.
Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
No significant size degradation (+0.001 %) for 1000 image corpus
Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"
Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green
and choose the strategy with the smallest expected entropy
Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
this should be used in preference to free() for releasing memory
returned from WebPDecode*() / WebPEncode*(). this simplifies memory
management when working through language bindings
Change-Id: I15eb538a45390efc552fda8e5c251a3fbdc13c29
After several trials at re-organizing the main loop and accumulation scheme,
this is apparently the faster variant.
removed the SSE41 version, which is no longer faster now.
For some reason, the AVX variant seems to benefit most for the change.
Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall
Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
New implementations: SubtractGreenFromBlueAndRed and TransformColor
around 1-2% faster lossless encoding.
Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.
Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
over a 1000 image corpus
Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s
After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s
This CL allows for some further improvements that would not be possible
otherwise.
Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
It can be used to test if given pair of animated images (GIF and/or
WebP) are identical in terms of pixel match and other animation
properties.
Change-Id: I84adea145e9d062be6ad06a0d4fcdc9658cf52d4
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
this target is out of date and there are better ways to make a clean
tree (the first 2 versioned, the last one not):
make distclean (when using autoconf)
git clean -fdx
git archive
Change-Id: I766b75e0adf566c6f7db1a087ff486020b031b3a
structured extended feature flags require eax = 7; avoids incorrectly
detecting avx2 on some older processors that support avx.
for completeness also check for value=1 support used by the other
checks.
from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String
[1]
http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html
Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046
* make VP8LPrefetchBits() safe wrt past-EOS reads
* set 'BitReader::bits_" to a safe shifting value upon EOS
no visible performance difference on x86
Change-Id: I0a4177928cfa81d5dfc9054b36a686eaa1bf8c65
Speed-wise equivalent on x86 and ARM (maybe a tad faster, hard to tell).
Note that the two 32-bit multiples are not strictly equivalent
to the 64-bit one, since we're missing one carry propagation.
In practice, no observable difference was seen because of this
slightly different hashing result.
Change-Id: I8f2381175eae1cb20dabf149e6b27e1768fba6ab
update makefile.unix to provide 'make extras' building instructions.
note: input ordering depends on WEBP_SWAP_16BIT_CSP for rgb565 and rgb4444
Change-Id: I6f22d32189d9ba2619146a9714cedabfe28e2ad0
When converting from video sources, the duration of current frame
is often unavailable until the next frame. So, we internally convert
timestamps to durations.
Change-Id: I20ad86361c22e014be7eb91f00d5d40108281351
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC8uv: ~19% faster
DC8uvNoLeft: ~12% faster
Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC16: ~20% faster
DC16NoLeft: ~14% faster
Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
had to rename few structs.
-> we can now include both vp8i.h and vp8enci.h without naming
conflicts.
Change-Id: Ib41b498f1b57aab3d6b796361afc45210ec75174
Visible speed-up, thanks to pshufb and pabsw and psignw use.
had to tweak configure.ac to make "smmintri.h" presence correctly
detected (we need to set the CPPFLAGS instead of the CFLAGS!)
Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080
allows the former to be inlined; negligible speed-up in most cases,
however this is structure is consistent with the rest of the optimized
modules
Change-Id: Ib080240b06f7a995b47f1906627850c355b82901
we look at average global improvement and stop when things are
moving slow, or when we had a quite good first iteration already
(means: the picture is "not difficult")
Change-Id: I8ab7d100353039b5b32bb5fac3fe03c8440c78d5
Speedup method StoreImageToBitMask by replacing the code to find histogram
index and Huffman tree codes at every iteration to a more optimal code that
updates these only when the current pixel (to write) crosses the histogram
tile-row boundary.
This change speeds up the StoreImageToBitMask method by 5%.
Change-Id: If01a1ccd7820f9a3a3e5bc449d070defa51be14b
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.
[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.
Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
The MIPS code for cost is not updated yet, that's why i keep Residual::*cost
around for now. Should be removed in favor of *costs later.
Change-Id: Id1d09a8c37ea8c5b34ad5eb8811d6a3ec6c4d89f
affected functions: SimpleVFilter16, SimpleHFilter16,
SimpleVFilter16i and SimpleHFilter16i
noticed bug in FilterLoop26 (fix included in this patch)
Change-Id: I72d9c1e45cbac6393eba52bb549b04924d463e30
removes circular dependency between dsp and enc.
since:
a987fae MIPS: dspr2: added optimization for function GetResidualCost
Change-Id: Ifeb8fc02de89e2ba982ed7ffacd925d649bfec3c
kGammaFix is now only defined with USE_GAMMA_COMPRESSION;
fixes:
use of undeclared identifier 'kGammaFix'
Change-Id: Ib1e2f410eff9b83be065894f88181f91dd2776e1
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file
Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
partially normalize indent, vertical whitespace and capitalization with
the copy used on developers.google.com/speed/webp
Change-Id: I8044418eeb9eaf5bd5c799675c74f6f845d503d6
the input to the function is non-const and the pointer being operated is
being free'd; removes an unnecessary cast in the process
Change-Id: Ic515ed672ddf7f8e4e36eeac696ff7aa8a3652f7
this is in line with the recommendation in the spec, cf.,
5603947 webp-container-spec: clarify background clear on loop
Change-Id: Id3910395b05a1a1f2804be841b61f97bd4bac593
Updated the near-lossless level mapping and make it correlated to lossy
quality i.e 100 => minimum loss (in-fact no-loss) and the visual-quality loss
increases with decrease in near-lossless level (quality) till value 0.
The new mapping implies following (PSNR) loss-metric:
-near_lossless 100: No-loss (bit-stream same as -lossless).
-near_lossless 80: Very very high PSNR (around 54dB).
-near_lossless 60: Very high PSNR (around 48dB).
-near_lossless 40: High PSNR (around 42dB).
-near_lossless 20: Moderate PSNR (around 36dB).
-near_lossless 0: Low PSNR (around 30dB).
Change-Id: I930de4b18950faf2868c97d42e9e49ba0b642960
at the beginning of the loop there's an implicit clear of the entire
canvas to the background (or application defined) color. this avoids
adding the final composited frame to the first.
Change-Id: Ia3a52cf4482c6176334a5c9c99a0ddd07d1776e7
previously the first frame would be redisplayed, which might be
unexpected if the final frame was meant to be a composite, for example.
Change-Id: I4da795623c71501e2fa426e8fba8fb2ffcbab58a
DitherRow() only checks this value, not 'skip_' so previously it was
uninitialized for these blocks.
Change-Id: I0f698b81854ee9d91edacb51c1e3bdab9cba96f2
similar to:
1ba61b0 enable NEON intrinsics in aarch64 builds
vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.
performance varies based on the input, 1-3% on encode was observed
Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
AnalyzeSubtractGreen constitutes about 8-10% of the comression CPU cycles.
Statistically, subtract-green is proved to be useful for most of the
non-palette compression. So instead of evaluating the entropy (by calling
AnalyzeSubtractGreen) apply subtract-green transform for the low-effort
compression.
This changes speeds up the compression at m=0 by 8-10% (with very slight loss
of 0.07% in the compression density).
Change-Id: I9797dc39437ae089716acb14631bbc77d367acf4
Speed up AnalyzeSubtractGreen by looping through the image pixel once to
compute the two histograms.
AnalyzeEntropy code cleanup.
Removed some 'if' conditions and pointer indirections inside pixel iterate loop.
Change-Id: Ia65e3033988ff67df8e3ecce19d6e34cfc76358e
Enable the WebP near-lossless feature by pre-processing the image to smoothen
the pixels.
On a 1000 PNG image corpus, for which WebP lossless (default settings) gets
25% compression gains, following is the performance of near-lossless feature
at various '-near_lossless' levels:
-near_lossless 90: 30% (very very high PSNR 54-60dB)
-near_lossless 75: 38% (very high PSNR 48-54dB)
-near_lossless 50: 45% (high PSNR 42-48dB)
-near_lossless 25: 48% (moderate PSNR 36-42dB)
-near_lossless 10: 50% (PSNR 30-36dB)
WebP near-lossless is specifically useful for discrete-tone images like
line-art, icons etc.
Change-Id: I7d12a2c9362ccd076d09710ea05c85fa64664c38
Some frames that were previously selected as key-frames were incorrectly
being reset to sub-frames.
Change-Id: Iee342dbb9a9aec144b8185c3b54ca56aa7038bfb
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter()
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
Simplify and speedup backward references for low-effort settings by evaluating
LZ77 references only. This change speeds up compression by 10-25% at lower
(q <= 25) quality range with a slight drop (0.2%) in the compression density.
Change-Id: Ibd6f03b1a062d8ab9191786c2a425e9132e4779f
Cleaup Near-lossless code
- Simplified and refactored the code.
- Removed the requirement (TODO) to allocate the buffer of size WxH and work
with buffer of size 3xW.
- Disabled the Near-lossless prr-processing for small icon images (W < 64 and H < 64).
Change-Id: Id7ee90c90622368d5528de4dd14fd5ead593bb1b
Reported here: https://code.google.com/p/webp/issues/detail?id=239
At the beginning of method 'DecodeImageData', pixels up to
'dec->last_pixel_' are assumed to be already cached. So, at the end of
previous call to that method also, that assumption should hold true.
Hence, we should cache all pixels up to 'src' regardless of 'src_last'.
This affects lossless incremental decoding only, as that is when
src_last and src_end differ.
Note: alpha decoding is implicitly incremental, as alpha decoding of
only the rows 'y_end - y_start' happens during FinishRow() call. So, this bug
affects alpha decoding in non-incremental decoding flow as well.
This bug was introduced in: https://gerrit.chromium.org/gerrit/#/c/59716.
Change-Id: Ide6edfeb2609b02aff701e1bd9fd776da0a16be0
SanitizeEncoderOptions() was changing kmin to 2 too, which resulted in a
bad state with kmin == kmax.
Change-Id: Ie7273f1949bac469e7e6c8efbc98b154caf6de0f
* use the same TFIX == YFIX precision (2bits)
* use int instead of float in LinearToGammaF()
output is visually equivalent. Code is a little faster.
Change-Id: Ie3cfebca351dbcbd924b3d00801d6523dca6981f
add additional return checks and asserts to avoid:
C6102: Using 'XXX' from failed function call ...
Change-Id: I51f5fa630324e0cd7b2d9fceefecb4f4021474b1
width / height are unsigned; fixes a warning with msvs /analyze:
C6340: Mismatch on sign: 'const unsigned int' passed as _Param_(4) when
some signed type is required in call to 'fprintf'.
Change-Id: I5f1fad4c93745baf17d70178a5e66579ccd2b155
check enc->argb_ to quiet an msvs /analyze warning:
C6387: 'enc->argb_+y*width' could be '0': this does not adhere to the
specification for the function 'memcpy'.
Change-Id: I87544e92ee0d3ea38942a475c30c6d552f9877b7
quiets an msvs /analyze warning, the cast is due to an earlier msvs
build warning.
C6297: Arithmetic overflow: 32-bit value is shifted, then cast to
64-bit value. Results might not be an expected value.
Change-Id: I0bb6cda57879f2fbd1e3515f6753a11bc08d14ac
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.
Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
Disable costly TraceBackwards heuristic for computing the backward references
for low_effort (method=0) compression.
The TraceBackwards heuristic is already disabled for lower (q < 25) quality
range. Following is the compression data for 1000 image corpus for q >= 25.
This speeds up compression (q >= 25) by a factor of 2.5-3X with slight loss of
compression density (0.7% for lower quality range and 1.2% for higher qualities).
Change-Id: I256c9e2137c7de4083f423ea32ee12d3b0f46253
added new function CollectColorRedTransforms to C, which calls
TransformColorRed and it is realized via pointer to function
Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.
Change-Id: If70d5143ac94fc331da763ce034358858e460e06
added new function CollectColorBlueTransforms to C, which calls
TransformColorBlue and it is realized via pointer to function
Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced
reported in https://code.google.com/p/webp/issues/detail?id=237
An empty partition #0 should be indicative of a bitstream error.
The previous code was correct, only an assert was triggered in debug mode.
But we might as well handle the case properly right away...
Change-Id: I4dc31a46191fa9e65659c9a5bf5de9605e93f2f5
gif2webp.exe is excluded from 'all' as libgif requires C99 support which
is only available from VS2013 onward. additional include paths/libraries
can be added with CL=/I... + LINK=... for this target.
Change-Id: If9f6be1c4029f486a9d6cdc0e862376cbd1e1be0
untested
builds only libwebp (targeting windows phone), with threading disabled
for now due to the use of desktop specific threading API
Change-Id: Idf3057a73c88e1a672ffe0a7c72d2626d701b706
- Lower the threshold parameters for HashChainFindCopy.
For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at
lower quality range (0.25% drop in compression density) and about 10%
for higher quality range without any drop in the compression density.
Following is the compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8615/18.000 2.8651/18.631
q=5 2.8615/18.216 2.8650/20.517
q=10 2.8572/18.070 2.8650/21.992
q=15 2.8519/18.371 2.8584/21.747
q=20 2.8454/18.975 2.8515/20.448
q=25 2.8230/8.531 2.8253/9.585
// Compression density remains same for q-range [30-100]
q=30 2.7310/7.706 2.7310/8.028
q=35 2.7253/6.855 2.7253/7.184
q=40 2.7231/6.364 2.7231/6.604
q=45 2.7216/5.844 2.7216/6.223
q=50 2.7196/5.210 2.7196/5.731
q=55 2.7208/4.766 2.7208/4.970
q=60 2.7195/4.495 2.7195/4.602
q=65 2.7185/4.024 2.7185/4.236
q=70 2.7174/3.699 2.7174/3.861
q=75 2.7164/3.449 2.7164/3.605
q=80 2.7161/3.222 2.7161/3.038
q=85 2.7153/2.919 2.7153/2.946
q=90 2.7145/2.766 2.7145/2.771
q=95 2.7124/2.548 2.7124/2.575
q=100 2.6873/2.253 2.6873/2.335
Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1
- Do light weight entropy based histogram combine and leave out CPU
intensive stochastic and greedy heuristics for combining the
histograms.
For 1000 image PNG corpus (m=0), this change yields speedup of 10% at
lower quality range (1% drop in compression density) and about 5% for
higher quality range (1% drop in compression density). Following is the
compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8336/16.577 2.8615/18.000
q=5 2.8336/16.504 2.8615/18.216
q=10 2.8293/16.419 2.8572/18.070
q=15 2.8242/17.582 2.8519/18.371
q=20 2.8182/16.131 2.8454/18.975
q=25 2.7924/7.670 2.8230/8.531
q=30 2.7078/6.635 2.7310/7.706
q=35 2.7028/6.203 2.7253/6.855
q=40 2.7005/6.198 2.7231/6.364
q=45 2.6989/5.570 2.7216/5.844
q=50 2.6970/5.087 2.7196/5.210
q=55 2.6963/4.589 2.7208/4.766
q=60 2.6949/4.292 2.7195/4.495
q=65 2.6940/3.970 2.7185/4.024
q=70 2.6929/3.698 2.7174/3.699
q=75 2.6919/3.427 2.7164/3.449
q=80 2.6918/3.106 2.7161/3.222
q=85 2.6909/2.856 2.7153/2.919
q=90 2.6902/2.695 2.7145/2.766
q=95 2.6881/2.499 2.7124/2.548
q=100 2.6873/2.253 2.6873/2.285
Change-Id: I0567945068f8dc7888041e93d872f9def91f50ba
As 'curr_canvas_mod' is being modified during calls to IncreaseTransparency()
and FlattenSimilarBlocks(), GetSubRect() should get the sub-frame from
'curr_canvas_mod' as well.
Earlier, GetSubRect() was computed from 'curr_canvas', so modifying
'curr_canvas_mod' had no effect on encoding.
Change-Id: Ia847503007b66364817fe57def5a9e3c37d1b3cc
We set the parameters even if there is just one frame. This is to make sure
assembly is correct even if single frame animation is NOT converted to a full
frame later.
Change-Id: If79e6aa5e2575cb0f3cd229f16c655b3663c35b0
Given that we decided to only handle frame disposal for output WebP
internally,
only current and previous canvas need to be maintained.
Change-Id: I625293bed5aeb5aabf4eca779f6ec3ee84c9ff2a
A separate API to generate animated WebP images.
It will eventually replace the internal gif2webp_util methods.
Also: update makefiles.
Change-Id: Idf61dfc1016c10b24fea70425d1a2323cffba515
we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
a separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row
Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
references to fragments remain, along with some superfluous checks; these
will be removed in a future commit.
Change-Id: I39fe9314900ecbc5d60e5065b65fa1b4c668af63
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.
For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality
range and about 10% for higher quality range.
Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
most of the time, we don't need to actually move the
data.
Compression is randomly slightly different, because HistogramCompactBins() changed.
Timing is about the same.
Change-Id: Ia6af8e9780581014d6860f2b546189ac817cfad1
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.
Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
we don't need to store the whole distribution in order to compute the alpha
Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.
Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/
With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.
Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
~68% faster
reuses TM4() adding support for the additional rows, the columns were
already being done.
Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge few (static) methods and reduce the code size.
This change has no impact on the compression speed/density.
Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
Remove handling for WEBP_HINT_GRAPH w.r.t use_palette flag.
The WEBP_HINT_GRAPH is now used at one place, to set the initial size of the
Bit Writer as bpp for photo images are generally larger than the graphical
images.
Change-Id: I1b9c4436c85a8f69da74c0dbcd292397323f2696
Update BackwardRefsWithLocalCache to do in-place update of backward
references w.r.t local color cache index.
No impact on the compression density or compression speed.
Change-Id: Ie066251464c3928c044e037b43df3af28b48ca30
histogram.c:
- Verified (earlier) that there's low correlation between Red & Blue colors
(particularly after applying Cross-color transform). The Bin based histogram
merge, bins on three entropies viz literal, red & blue symbols. Removing
either of blue or red increases the compression density. So keeping the bins
for red & blue sybmols.
- Keeping the compact bins method as-is. This way it's simpler to read.
huffman_encode.h: Added field comments for struct HuffmanTree and removed the TODO.
Change-Id: Ia76f7bc730079d1b3b644038c5d9931db3797f0e
Use the refs_lz77 computed (with cache_bits=0) in the method 'CalculateBestCacheSize'
to regenerate the LZ77 references corresponding to the optimum cache_bits and avoid
calling costly 'BackwardReferencesLz77' one extra time.
This change leaves the compression density unchanged and speeds up compression
by 10-15%.
Change-Id: I5a92e11788d3c3f656aa7e1fba54fb5d96ee0027
This wasn't working for this specific scenario:
- Encode an RGBA 'pic' (with trivial alpha) using lossy encoding.
(so that pic->a == NULL after import happens).
- Modify the 'pic->argb' so that it has non-trivial alpha.
- Encode the same 'pic' again.
This used to fail to encode alpha data as pic->a == NULL.
Change-Id: Ieaaa7bd09825c42f54fbd99e6781d98f0b19cc0c
- The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'.
- The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache
bits to use (passing 0 implies disabling the local color cache).
- The local color cache is disabled for lowerf (<= 25) quality levels (as before).
- Enabled local color cache for palette images as well. This saves additional
0.017% bytes with a slight (2-3%) improvement in the compression speed.
- Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as
this option is not an option now (after we freeze the lossless bit-stream).
Change-Id: I33430401e465474fa1be899f330387cd2b466280
This is because, FlattenSimilarBlocks() replaces some opaque pixels by
transparent ones. This results in an equivalent output only if blending
is turned on for the current frame.
Change-Id: I05612c952fdbd4b3a6e0ac9f3a7d49822f0cfb9b
Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.
For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.
Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
Optimize backwardreferences (about 0.1% byte savings) with almost same
compression speed (3% faster on defaut compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
- Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
- Removed 'if' conditions for corner cases in the main loop.
- Refactored the method(AddSingleLiteral) for adding one pixel.
Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.
Overall compression speedup observed is around 5% (+/- noise).
Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
at least clang 3.[45] in c++ mode with -std=c++11 define __STRICT_ANSI__
this change set WEBP_INLINE to inline for c++/non-strict-ansi/> c99
fixes crbug.com/428383
Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).
This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.
Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
The Maximum allowed limit is 11.
The Q=25 and below is not impacted as cache bits are forced to 0.
This saves 0.05% - 0.1% bytes for other quality with almost same compression
speed (+/- 2-3%, that's more of a noise).
Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).
The compression density remains the same with this change, with little faster
compression speed.
Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.
This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.
Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
Snapping odd offsets in GIF to even offsets in WebP was causing extra row/column
being disposed in such cases.
Code is rewritten to maintain previous and current canvas (it used to maintain
previous canvas and current frame earlier). And we recompute change rectangles
as those from GIF may no longer apply.
Also, this renders methods like ReduceTransparency() and ConvertToKeyFrame()
redundant, as internally maintained current canvas is always independent of
previous canvases.
Disposal method choice: we pick the disposal method that results in the smallest
change rectangle.
Change-Id: Ic31186d98fe1a2a790a89d1571b17e3abd127e79
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.
The encoding is tad faster at default compression
Before After
bpp/rate bpp/rate
442 Palette 0.2720/5.270 MP/s 0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s 3.7607/0.816 MP/s
Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
Updated VP8LGetBackwardReferences and HashChainFindCopy method with following:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the
best_val eval computation to choose a different curr_length, as best_val was
set to 'curr_length << 16'.
Following is the impact on the compression speed/density at default & max
quality, overall this speeds up compression by 5-15% (q=100 -> 75) with a tad
drop (0.02-0.03%) in compression density for the non-palette images.
Before After
bpp/Rate(MP/s) bpp/Rate(MP/s)
q=75 (def)
All 1000 2.4492/1.049 MP/s 2.4498/1.230 MP/s
Palette 0.2719/5.060 MP/s 0.2719/6.110 MP/s
non-Palette 3.7597/0.732 MP/s 3.7607/0.840 MP/s
q=100
All 1000 2.4134/0.125 MP/s 2.4142/0.131 MP/s
Palette 0.2692/2.585 MP/s 0.2692/2.885 MP/s
non-Palette 3.7040/0.079 MP/s 3.7053/0.083 MP/s
Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples
Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
Compared to previous mode it gives another 10-30% improvement in compression keeping comparable PSNR on corresponding quality settings.
Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.
Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9
In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault
Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
Evaluate if for Palette images (num_colors <= 256), non-palette
compression path (Subtract green, predictor transform etc) yield an
optimal compression density.
This change reduces the WebP file (for palette images) size by 0.4% with
drop of 3-5% in compression speed.
Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.
quality | size(bytes) after/before | time (ms) after/before
Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
We compact the palette by weighted distance, favoring the green channel.
Average gain on paletted file is ~0.5%, with gain up to 6-7% on some favorable cases.
Encoding speed is unaffected.
Disabled for alpha (or any single-channel input)
Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.
Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
SSE2 version is 2.1x faster
This is used to transfer the alpha plane to green channel before lossless compression.
Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
Don't combine the Histograms that have trivial (single valued A, R & B)
symbols.
Following is the compression savings data along with compression time (before
& after) per image.
Before After
bpp, rate(MP/s) bpp, rate(MP/s)
Q=25, method = 4 2.508, 1.807 2.499, 1.916
Q=50, method = 4 2.460, 1.488 2.456, 1.512
Q=75, method = 4 2.452, 1.078 2.450, 1.092
Q=25, method = 5 2.505, 1.398 2.496, 1.383
Q=50, method = 5 2.458, 1.170 2.453, 1.143
Q=75, method = 5 2.453, 0.886 2.450, 0.855
This change provides 0.1-0.4% compression gains and speeds up the lossless
compression for the default method=4 (the drop in compression speed is between 1-3.5% for method=5).
Change-Id: Idfd88c2092f37afacd26a97097b3053f8183953a
Tested on 1000 pngs corpus with quality 90-100 it gives ~0.15% improvement
in compression density and ~7% speed up.
Change-Id: I460f56c96707edb3c1f0b51a024e5122e10458df
iOS 5 support isn't available in the Xcode 6 install; iOS 6 covers
phones starting at the 3GS, so should be a reasonable base line
Change-Id: Ie5603c9e30cb52114b372509e183febbf679a69a
* We don't need to change DecodeAlpha, since incremental
decoding is not useful for Alpha (we already decode
progressively along the RGB)
* Similarly, we don't do incremental decoding for level>0 planes:
the metadata don't turn into visible pixel (only the ones in level0), so...
(No visible speed change)
Change-Id: I2fd4b9ba227561a7dbede647686584b752be7baa
- don't call VP8LClear() when there's no error (and let the caller do it)
- only initialize output once if state_ is not READ_DATA
- don't over-set dec->status_ = READ_DATA
- don't re-set dec->status_ if DecodeImageStream() fails
- remove unneeded dec->action_ field
- make ReadImageInfo() check br->eos_
- use ErrorStatusLossless() more consistently
Change-Id: Ica6e4b1c82e3fce8b1ce0274def551a886b73b0b
For some GIF images, the first frame is missing the corresponding
graphic control extension. For such cases, we were never calling
GetBackgroundColor(), and default background color value (white) was being used
incorrectly.
So, we call GetBackgroundColor() when we encounter the first image
descriptor instead, to make sure that it is always called.
Change-Id: I00fc8e943d8a0c1578dcd718f3e74dec7de4ed61
Optimize the decoding for region that have trivial literal codes.
The trivial literal is defined as huffman image with Red, Blue and Alpha
huffman trees with only single code values.
This speeds up lossless decoding by 3%
Change-Id: I0204949917836f74c0eb4ba5a7f4052a4797833b
put WebPMuxConfig on the stack in main() rather than allocating it in
InitializeConfig(); removes a level of indirection there.
Change-Id: I81d386f7472ebbd322dd3fdbfda9d78dbeb62a66
* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff
These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.
Now, the SSE2 version is really 2x faster than the C version.
Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
Handle the corner case when VP8LDecodeImage() method is called with an invalid
header data. The lossless decoding doesn't support incremental mode yet.
Return the error status as BITSTREAM error in case not all pixels are decoded
with the provided bit-stream. Also added asserts in the VP8LDecodeImage() method
to validate the decoder header with appropriate/valid data for huffman trees
(htree_groups_ etc).
Change-Id: Ibac9fcfc4bd0a2c5f624bb9d4a2b9f6459aa19ea
-> introduce special case 64b pattern-copy, similar to the 8b one for alpha.
-> use mempcy() for non-overlapping areas
+ cosmetics and homogenezation of the code
Change-Id: I0e65e04b96fec94c009a4614137dfba2a0f98561
ReadSymbol() finishes with a VP8LSetBitPos() call only and could miss an eos_ during the decode loop.
Things are faster because of inlining too.
Change-Id: I2d2a275f38834ba005bc767d45c5de72d032103e
eos_ needs to be set only when superfluous bits have actually
been requested.
Earlier, we were assuming pre-mature end-of-stream to be an error.
Now, more precisely, we mark error when we have encountered end-of-stream *and*
we attempt to read more bits after that.
This handles cases where image data requires no bits to be read
Change-Id: I628e2c39c64f10c443fb51f86b1f5919cc9fd299
if ALPHA_LOSSLESS_COMPRESSION produces a too big file (very rare!),
we fall-back to no-compression automatically.
Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
speed-up is ~1.6% for photographic image to 10% for graphical image
(1000 images corpus was sped up by 5.8 %)
Code by akramarz@google.com and jyrki@google.com
Change-Id: Iceb2e50e6cc761b9315a3865d22ec9d19b8011c6
Split initialization of YUV444Converters[] out of Upsamplers init.
update test for NULL function pointers
Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
rightmost pixel was missing a copy, which could lead to invalid read.
Also added a lower dimension of 4, below which we use the regular conversion.
This is to prevent corner cases, in addition to not being overkill.
Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
_M_IX86 will be defined in mingw builds after including windows.h. as
the gcc inline asm is first, this missing check would only have caused
an error if the code was reorganized.
Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.
Also, less macros.
Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
New function: WebPPictureSmartARGBToYUVA()
This implement smart RGB->YUV conversion.
This is rather undocumented for now, and is triggered using '-pre 4'
preprocessing option.
This is slow-ish and use quite some memory, but should be improvable.
This is somehow a usable beta version.
Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.
Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
libwebp 0.4.1
- 7/24/14: version 0.4.1
This is a binary compatible release.
* AArch64 (arm64) & MIPS support/optimizations
* NEON assembly additions:
- ~25% faster lossy decode / encode (-m 4)
- ~10% faster lossless decode
- ~5-10% faster lossless encode (-m 3/4)
* dwebp/vwebp can read from stdin
* cwebp/gif2webp can write to stdout
* cwebp can read webp files; useful if storing sources as webp lossless
* tag 'v0.4.1':
update ChangeLog
iosbuild.sh: specify optimization flags
update ChangeLog
makefile.unix: add vwebp.1 to the dist target
update ChangeLog
gif2webp: dust up the help message
remove -noalphadither option from README/vwebp.1
update NEWS for the next release
update AUTHORS
bump version to 0.4.1
restore mux API compatibility
remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
restore encode API compatibility
restore decode API compatibility
gif2webp: fix compile with giflib 5.1.0
gif2webp: simplify giflib version checking
Change-Id: Icf599f29bc6c0db757bc133aaddb3dbbbc316e08
explicitly set '-O3 -DNDEBUG'. setting CFLAGS on the command line
overrides the default, resulting in -O0.
Change-Id: I213979f646b1444b1d8e0eb0bb58e9b2c3cc4dd3
* try to avoid trailing '.'
* rationalize capitalization
missed in:
0a8b886 dust up the help message
Change-Id: I6f80736cc8a2ff4f185f63d463a57d5bbf88a0db
+ vwebp's -help output
this is a future option; missed in:
793368e restore decode API compatibility
Change-Id: If920df2cf8de57ebad93a6b98830562149396d8d
We store the raw RGB samples decoded from JPEG, and avoid precision loss.
Note that this may increase the encoding time reported by
cwebp -v, since RGB->YUV now occur during WebPEncode call
(in case of lossy), instead of ReadJPEG().
This also increases the memory use, since we're carying the
source ARGB samples around.
Change-Id: Ic2180206cfc9f5574f391e91c3b89b9d81695d01
fixes thread support detection in e.g., cross-compiles/mingw installs
without libpthread, the behavior is unchanged in mingw installs with
libpthread. in the latter case although libpthread is detected the
windows api (_beginthreadex) path is being used.
Change-Id: Ie5f39f15f380731a1006a2497496d627f08fa103
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.
Change-Id: I3ebae4893f953984de4072fda0938411de787a29
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.
Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available
Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
possibly other cross-compiles; avoids misusing -mavx2/-msse2 in these
cases by checking the usability of the associated intrinsics header
files.
iosbuild.sh has been broken since at least:
6e61a3a configure: test for -msse2
Change-Id: Ic650d9eec70f6a5de7d89997b3b425a4aa50ccd9
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.
the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.
Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
forces aligned memory reads (via memcpy) in the VP8 bit reader, useful
for platforms that don't support unaligned loads.
Change-Id: Ifa44a9a1677fbdc6a929520f9340b7e3fcbd6692
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it
+ fixes lossless BGRA output with big-endian toolchains
that do not define __BIG_ENDIAN__ (codesourcery mips gcc)
Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
moves the following to this header:
- htole*() definitions from bit_writer.c
- __BIG_ENDIAN__ fallback define from bit_reader_inl.h
Change-Id: I7fff59543f08a70bf8f9ddac849b72ed290471b1
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.
Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
previously, the final canvas size was adjusted tightly from the
animation frames. Now, it can be specified separately (to be larger, in particular).
calling WebPMuxSetCanvasSize(mux, 0, 0) triggers the 'adjust tightly' behaviour.
This can be useful after calling WebPMuxCreate() if further image addition
is expected.
-> Fixed gif2webp accordingly.
also: made WebPMuxAssemble() more robust by systematically zero-ing WebPData.
Change-Id: Ib4f7eac372cf9dbf6e25cd686a77960e386a0b7f
this will remove a warning about the shift amount not being
an immediate (=constant).
Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
mostly to balance the use of bswap64, some gcc platforms are already
interpreting the default case the same
Change-Id: Icf860f55b3f16bea349a7d721e6d6abeeb4e5cf3
Similarly to Chrome, we then use the first sub-rectangle
to set the canvas size.
Also: add check for too-large GIF dimensions (>MAX_CANVAS_SIZE)
Change-Id: Idce55f1e6f6982a8f0e082aac540e16b530e023e
We align with Blink/Chromium code by:
- checking for ANIMEXTS1.0 signature too
- using ByteCount >= 3 instead of requiring ByteCount==3
Change-Id: Idc484ca62878517df3dccb1fdb3bb45104a5e066
see: http://odur.let.rug.nl/kleiweg/gif/netscape.html
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.
Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.
Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f
new options:
dwebp -alpha_dither
vwebp -noalphadither
When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.
The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.
Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)
Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)
Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.
Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
User-hook can fail but error was not propagated back.
Change-Id: Ic79f9543bf767634a127eccfef90af855ff15c34
Also: some ad-hoc clean-up and API dusting. More to come later...
this change has the side-effect of using directory names in the
include, silencing a lint warning.
Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.
also: fixes a warning for win64
+ slight speed-up
Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
+ add a WEBP_HAVE_SSE2 to dsp.h
not all 32-bit toolchain configurations will have sse2 enabled by
default
Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called
Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned(), and add some masking branch-less code. Much
faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.
Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).
Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
Needed to add 'volatile' and some casts.
Relevant excerpt from the 'man longjmp':
===============
The values of automatic variables are unspecified after a call to longjmp() if they meet all the following criteria:
· they are local to the function that made the corresponding setjmp(3) call;
· their values are changed between the calls to setjmp(3) and longjmp(); and
· they are not declared as volatile.
===============
Change-Id: Ic72dc92669513a820369ca52a038afa9ec88091f
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.
Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()
Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
similar to makefile.unix:
> nmake /f Makefile.vc CFG=release-static HAVE_AVX2=1
from the msdn:
The /arch:AVX2 option and __AVX2__ macro were introduced in Visual
Studio 2013 Update 2, version 12.0.34567.1
(Update 2, version 12.0.30501.00 seems to work)
Change-Id: I649ee47c9fdc399fc71a8ac8464728608d9b6412
gcc was generating very complex code, one for each case of br->len_ values!
also, pretty-fy the mask constants
Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled
Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
Sometimes, we can write 18bit or more at time, and it would
overflow the 32bit accumulator.
Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)
fixes http://code.google.com/p/webp/issues/detail?id=200
Seems a bit faster (use of local fields for bits_ / used_)
also: added the __QNX__ bswap while at it.
Change-Id: I876db93a931db15b083cf1d838c70105effa7167
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage
Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
the inclusion of the files is harmless when NEON is not enabled and will
allow them to be built with NEON for APP_ABI=arm64-v8a which currently
does not use the '.neon' suffix
Change-Id: I39377876b1b68822c38f4e2396da93c56145fc0f
also changed the token-page layout a little bit to remove
a not-needed field.
This reduces the number of malloc()/free() calls substantially
with minimal increase in memory consumption (~2%).
For the tail of large sources, the number of malloc calls goes
typically from ~10000 to ~100 (e.g.: bryce_big.jpg: 22711 -> 105)
Change-Id: Ib847f41e618ed8c303d26b76da982fbc48de45b9
MALLOC_FAIL_AT flag can be used to set-up a pre-determined failure
point during malloc calls. The counter value is retrieved using
getenv().
Example usage: export MALLOC_FAIL_AT=37 && cwebp input.png
will make 'cwebp' report a memory allocation error the 37th time
malloc() or calloc() is called.
MALLOC_MEM_LIMIT can be used similarly to prevent allocating more
than a given amount of memory. This is usually less convenient to
use than MALLOC_FAIL_AT since one has to know in advance the typical
memory size allocated.
Both these flags are meant to be used for debugging only!
Also: added a 'total_mem_allocated' to record the overall memory allocated
Change-Id: I9d408095ee7d76acba0f3a31b1276fc36478720a
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.
This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)
Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
This change reduces the number of calls to WebPSafeMalloc from 200 to
100. The overall memory consumption is down 3% for Lenna image.
Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd
Some tracing code is activated by PRINT_MEM_INFO flag.
For debugging only! (not thread-safe, and slow).
Change-Id: I282c623c960f97d474a35b600981b761ef89ace9
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.
Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
We use automatic int->uint64_t promotion where applicable.
(uint64_t should be kept only for overflow checking and memory alloc).
Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
on ExUtilLoadWebP() failure no allocated memory will be returned, so
it's safe to exit immediately. additionally, any webp specific problems
will already have been reported as part of the call to
WebPGetFeatures().
broken since:
4a0e739 dwebp: move webp decoding to example_util
Change-Id: Ibc632015a1f52bae7f96d063252624123fa7c2da
doing so is not part of ISO C; removes some pedantic warnings.
use webp/decode.h to pickup VP8StatusCode instead.
Change-Id: I19b35e0f8a36fb7c45944ae9ca86838e08b90548
The predictors based on Average2 are tad slower.
Following is the performance data for these predictors normalized to
number of instruction cycles (as per valgrind) per operation:
- Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11
instruction cycles for the C version.
- Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12
instruction cycles for the C version.
The predictors based on Average4 is faster and Average3 is tad slower:
- Predictor10 (Average4) now takes 23 instruction cycles compared to 25
instruction cycles for the C version.
- Predictor5 (Average3) now takes 20 instruction cycles compared to 18
instruction cycles for the C version.
Maybe SSE2 version of Average2 can be improved further. Otherwise, we can
remove the SSE2 version and always fallback to the C version.
Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029
* merged the two HistogramAdd/AddEval() into a single call
(with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster
Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
move simple loop filter defines closer to their use and LOAD* to a
location common with the intrinsics
Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3
the complex loop filter has no inline equivalent; the simple loop filter
remains conditional on USE_INTRINSICS: it's left undefined for now.
Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2
CollectHistogram / SSE* / QuantizeBlock have no inline equivalents,
enable them where possible and use USE_INTRINSICS to control borderline
cases: it's left undefined for now.
Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155
using this in Load4x16 was slightly slower and didn't help mitigate any
of the remaining build issues with 4.6.x.
Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.
Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.
Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.
Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
HuffmanCost and HuffmanCostCombined optimized and added
'const' to some variables from ExtraCost functions.
Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d
When remapping buffer, br->eos_ was wrongly being set to true for
certain
images.
Also, refactored the end-of-stream detection as a function.
Reported in http://crbug.com/364830
Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84
Some versions of compiler in debug build can't find
a register in class 'GR_REGS' while reloading 'asm'
Number of used registers is decreased in this fix.
Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)
There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.
Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
.set at - Indicates that macro expansions may clobber
the assembler temporary ($at or $28) register.
Some macros may not be expanded without this
and will generate an error message if noat
is in effect.
"at" also added to the clobber list.
Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d
avoids:
src/dsp/enc_mips32.c: In function 'ITransformOne':
src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm'
src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints
Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()
(also: fixed some #ifdef USE_INTRINSICS problems)
Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow
also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)
Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)
Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual
Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.
Change-Id: I69534016186064fd92476d5eabc0f53462d53146
* inverse transform is actually slower with intrinsics + gcc-4.6,
so is left disabled for now.
With gcc-4.8, it's a bit faster than inlined assembly.
* Sum of Square error function provide a 2-3% speed up
There's enabled by default (since there's no inlined-asm equivalent)
Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a
+ misc cosmetics
* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)
Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095
The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t
register, and re-use the previous (luma) functions
Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.
Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
+ added some work-around gcc-4.6 to make it compile (except one function).
+ lots of revamping
All variants tested ok.
Speed-up is ~5-7%
Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3
vertical only currently, 2.5-3% faster
placed under USE_INTRINSICS as this change depends on the simple
loopfilter
improves the simple loopfilter slightly thanks to some reorganization
Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155
When 4 pixels are left, they should be processed with SSE2.
Decoding is marginally faster (~0.4%).
Encoding speed: No observable difference.
Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e
new file: lossless_neon.c
speedup is ~5%
gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509
I've tried adding -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.
Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
It's disable for now, because it crashes gcc-4.6.3 during compilation
with -O2 or -O3. It's been tested OK with -O1.
Code is still globally disabled with USE_INTRINSICS, though.
Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c
disabled for now (but tested OK), thanks to the USE_INTRINSICS #define
We'll activate the code when we're on par with non-intrinsics
Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff
Tune HistogramCombineBin for hard images that are larger than 1-2 Mega
pixel and represent photographic images.
This speeds up lossless encoding on 1000 image corpus by 10-12% and compression
penalty of 0.1-0.2%.
Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.
The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.
Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).
Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
expose the predictor array as function pointers instead
of each individual sub-function
+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"
no speed diff observed
Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044
With -bypass_filter switched on, the lossless-compressed
data is decoded ahead of time (before being transformed and
display). Hence, the last row was called twice.
http://code.google.com/p/webp/issues/detail?id=193
Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10
(and ~2-3% on ARM)
We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.
Also made the 'Node' structure tighter.
Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning
Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
Get back some of the compression gains by extending the search space for
GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not
helping much in yielding better compression density.
Before:
1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations)
Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s
After:
1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations)
Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s
Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.
This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.
Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove
one if condition inside the main/critial loop. Also separated TransformColor &
TransformColorInverse into separate functions and avoid one 'if condition'
inside this critical method.
This change speeds up lossless decoding for Lenna image about 5% and 1000 image
corpus by 3-4%.
Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d
* allow reading from stdin for dwebp / vwebp
* allow writing to stdout for gif2webp
by introducing a new function ExUtilReadFromStdin()
Example use: cat in.webp | dwebp -o - -- - > out.png
Note that the '-- -' option must appear *last*
(as per general fashion for '--' option parsing)
Change-Id: I8df0f3a246cc325925d6b6f668ba060f7dd81d68
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
cwebp -z 5 in.png -o out_lossless.webp
There are 10 possible values for -z parameter:
0 (fastest, lowest compression)
to 9 (slowest, best compression)
A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.
This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.
Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)
Output is bitwise the same as before.
Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.
This speeds up lossless encoding by 1-2% and almost no impact on compression
density.
Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables
Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
incorporate non-last cost in per-level cost table
also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.
Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
log(1 + x) and increase the threshold for calling the approximate
version of log_2 (compared to costly call to log()).
Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
converts 2 s16 vectors to 2 u8 and store to uint8_t destination;
TransformAC3 can reuse this after a rework
Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0
Separate the C version from the MIPS32 version and have run-time
initialization during RescalerInit()
Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
-> remove the 'color_transform' multiplier, use more constants, etc.
This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)
Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.
Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
(instead of per-macroblock)
speed unchanged.
simplified the context-saving for incremental decoding
Change-Id: I301be581bab581ff68de14c4ffe5bc0ec63f34be
VP8GetThreadMethod() may be called with a NULL headers param; correct an
assert.
broken since:
8a2fa09 Add a second multi-thread method
Change-Id: If7b6d1b8f4ec874d343a806cee5f5e6bb6438620
This makes the segmentation overall less prone to
local-optimum or boundary effect.
(and overall, encoding is a little faster)
Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.
more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/
Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.
Original idea by David Eckel (dvdckl at gmail dot com)
Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
New API options: WebPDecoderOptions.flip and 'dwebp -flip ...'
it uses negative stride trick.
Also changed the decoder code to support user-supplied
buffers with negative stride, independently of the
WebPDecoderOptions.flip value.
Change-Id: I4dc0d06f0c87e51a3f3428be4fee2d6b5ad76053
libwebp 0.4.0
- 12/19/13: version 0.4.0
* improved gif2webp tool
* numerous fixes, compression improvement and speed-up
* dither option added to decoder (dwebp -dither 50 ...)
* improved multi-threaded modes (-mt option)
* improved filtering strength determination
* New function: WebPMuxGetCanvasSize
* BMP and TIFF format output added to 'dwebp'
* Significant memory reduction for decoding lossy images with alpha.
* Intertwined decoding of RGB and alpha for a shorter
time-to-first-decoded-pixel.
* WebPIterator has a new member 'has_alpha' denoting whether the frame
contains transparency.
* Container spec amended with new 'blending method' for animation.
* tag 'v0.4.0':
update ChangeLog
update NEWS description with new general features
gif2webp: don't use C99 %zu
cwebp: fix metadata output w/lossy+alpha
makefile.unix: clean up libgif2webp_util.a
update Changelog
bump version to 0.4.0
update AUTHORS & .mailmap
update NEWS for 0.4.0
Change-Id: I6df1b512fe0b697192f1f9b29431cd59aa6063d1
from git://git.savannah.gnu.org/autoconf-archive.git
100644 blob d383ad5c6d m4/ax_pthread.m4
dd94691 AX_PTHREAD: add support for Clang
386b290 AX_PTHREAD: fix support for AIX
e40cfd6 AX_PTHREAD: improve support for AIX
Change-Id: Ic1f1edc656e4e917b88f151493147f650b94242f
For example, distance code `1` indicates offset of `(0, 1)` for the neighboring
pixel, that is, the pixel above the current pixel (0-pixel difference in
X-direction and 1 pixel difference in Y-direction). Similarly, distance code
`3` indicates left-top pixel.
For example, distance code `1` indicates an offset of `(0, 1)` for the
neighboring pixel, that is, the pixel above the current pixel (0pixel
difference in X-direction and 1 pixel difference in Y-direction). Similarly,
distance code `3` indicates left-top pixel.
The decoder can convert a distances code 'i' to a scan-line order distance
'dist' as follows:
The decoder can convert a distance code `i` to a scan-line order distance
`dist` as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(xi, yi) = distance_map[i]
@@ -760,21 +777,22 @@ if (dist < 1) {
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
where 'distance_map' is the mapping noted above and `xsize` is the width of the
where `distance_map` is the mapping noted above and `xsize` is the width of the
image in pixels.
#### 4.2.3 Color Cache Coding
{:#color-cache-code}
Color cache stores a set of colors that have been recently used in the image.
**Rationale:** This way, the recently used colors can sometimes be referred to
more efficiently than emitting them using other two methods (described in
[4.2.1](#huffman-coded-literals) and [4.2.2](#lz77-backward-reference)).
more efficiently than emitting them using the other two methods (described in
[4.2.1](#prefix-coded-literals) and [4.2.2](#lz77-backward-reference)).
Color cache codes are stored as follows. First, there is a 1-bit value that
indicates if the color cache is used. If this bit is 0, no color cache codes
exist, and they are not transmitted in the Huffman code that decodes the green
exist, and they are not transmitted in the prefix code that decodes the green
symbols and the length prefix codes. However, if this bit is 1, the color cache
size is read next:
@@ -805,130 +823,245 @@ literals, into the cache in the order they appear in the stream.
### 5.1 Overview
Most of the data is coded using [canonical Huffman code][canonical_huff]. Hence,
the codes are transmitted by sending the _Huffman code lengths_, as opposed to
the actual _Huffman codes_.
Most of the data is coded using a [canonical prefix code][canonical_huff].
Hence, the codes are transmitted by sending the _prefix code lengths_, as
opposed to the actual _prefix codes_.
In particular, the format uses **spatially-variant Huffman coding**. In other
In particular, the format uses **spatially-variant prefix coding**. In other
words, different blocks of the image can potentially use different entropy
codes.
**Rationale**: Different areas of the image may have different characteristics. So, allowing them to use different entropy codes provides more flexibility and
potentially a better compression.
**Rationale**: Different areas of the image may have different characteristics.
So, allowing them to use different entropy codes provides more flexibility and
potentially better compression.
### 5.2 Details
The encoded image data consists of two parts:
The encoded image data consists of several parts:
1. Meta Huffman codes
1. Decoding and building the prefix codes \[AMENDED2\]
1. Meta prefix codes
1. Entropy-coded image data
#### 5.2.1 Decoding of Meta Huffman Codes
#### 5.2.1 Decoding and Building the Prefix Codes
As noted earlier, the format allows the use of different Huffman codes for
different blocks of the image. _Meta Huffman codes_ are indexes identifying
which Huffman codes to use in different parts of the image.
There are several steps in decoding the prefix codes.
Meta Huffman codes may be used _only_ when the image is being used in the
**Decoding the Code Lengths:**
{:#decoding-the-code-lengths}
This section describes how to read the prefix code lengths from the bitstream.
The prefix code lengths can be coded in two ways. The method used is specified
by a 1-bit value.
* If this bit is 1, it is a _simple code length code_, and
* If this bit is 0, it is a _normal code length code_.
In both cases, there can be unused code lengths that are still part of the
stream. This may be inefficient, but it is allowed by the format.
**(i) Simple Code Length Code:**
\[AMENDED2\]
This variant is used in the special case when only 1 or 2 prefix symbols are
in the range \[0..255\] with code length `1`. All other prefix code lengths
are implicitly zeros.
The first bit indicates the number of symbols:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_symbols = ReadBits(1) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Following are the symbol values.
This first symbol is coded using 1 or 8 bits depending on the value of
`is_first_8bits`. The range is \[0..1\] or \[0..255\], respectively.
The second symbol, if present, is always assumed to be in the range \[0..255\]
and coded using 8 bits.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int is_first_8bits = ReadBits(1);
symbol0 = ReadBits(1 + 7 * is_first_8bits);
code_lengths[symbol0] = 1;
if (num_symbols == 2) {
symbol1 = ReadBits(8);
code_lengths[symbol1] = 1;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Note:** Another special case is when _all_ prefix code lengths are _zeros_
(an empty prefix code). For example, a prefix code for distance can be empty
if there are no backward references. Similarly, prefix codes for alpha, red,
and blue can be empty if all pixels within the same meta prefix code are
produced using the color cache. However, this case doesn't need a special
handling, as empty prefix codes can be coded as those containing a single
symbol `0`.
**(ii) Normal Code Length Code:**
The code lengths of the prefix code fit in 8 bits and are read as follows.
First, `num_code_lengths` specifies the number of code lengths.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_code_lengths = 4 + ReadBits(4);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If `num_code_lengths` is > 18, the bitstream is invalid.
The code lengths are themselves encoded using prefix codes: lower level code
lengths `code_length_code_lengths` first have to be read. The rest of those
`code_length_code_lengths` (according to the order in `kCodeLengthCodeOrder`)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.