libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-08-29 15:22:12 +02:00

Author	SHA1	Message	Date
James Zern	de6aee46e1	Reapply "dsp/lossless{,_enc}_sse2.c: reorder _SSE assignments" This reverts commit `61e5c391d6`. When `WEBP_USE_THREAD` is not defined the assignments of _SSE and their unsuffixed counterparts may race. Assigning _SSE directly rather than relying on the unsuffixed values avoids a case where the _SSE variants may refer to the calling function (i.e., AVX2) resulting in infinite recursion. Defining `WEBP_USE_THREAD` is recommended when decode/encode calls can be made from different threads. In the previous commit (`2246828b`) not all indices of `VP8LPredictorsAdd_SSE[]` were assigned a value, namely 14 and 15. These are used in handling invalid bitstreams to avoid a branch in a hot function. The indices are now assigned to `PredictorAdd0_SSE2` which mimics `VP8LPredictorsSub[]` in lossless_enc_sse2.c. Bug: 435213378, 438295348, 438294044, 438264629, 438294033 Change-Id: I3623717597f0ac6b0d60429adfbb20c611fe6742	2025-08-14 13:22:56 -07:00
James Zern	61e5c391d6	Revert "dsp/lossless{,_enc}_sse2.c: reorder _SSE assignments" This reverts commit `2246828be3`. Reason for revert: NULL dereferences in the fuzzers. The `VP8LPredictorsAdd_SSE` table is not completely initialized (index 14 and 15) which may be accessed with an invalid bitstream. Bug: 435213378 Original change's description: > dsp/lossless{,_enc}_sse2.c: reorder _SSE assignments > > When `WEBP_USE_THREAD` is not defined the assignments of _SSE and their > unsuffixed counterparts may race. Assigning _SSE directly rather than > relying on the unsuffixed values avoids a case where the *_SSE variants > may refer to the calling function (i.e., AVX2) resulting in infinite > recursion. > > Defining `WEBP_USE_THREAD` is recommended when decode/encode calls can > be made from different threads. > > Bug: 435213378 > Change-Id: Id5549730cb72be99b3014ed8e4e355f3ea988659 Bug: 435213378, 438295348, 438294044, 438264629, 438294033 Change-Id: I3299d6fbb29c45872e2ea1f8f1c3d0ebbda64a69	2025-08-13 17:10:20 -07:00
James Zern	2246828be3	dsp/lossless{,_enc}_sse2.c: reorder _SSE assignments When `WEBP_USE_THREAD` is not defined the assignments of _SSE and their unsuffixed counterparts may race. Assigning _SSE directly rather than relying on the unsuffixed values avoids a case where the _SSE variants may refer to the calling function (i.e., AVX2) resulting in infinite recursion. Defining `WEBP_USE_THREAD` is recommended when decode/encode calls can be made from different threads. Bug: 435213378 Change-Id: Id5549730cb72be99b3014ed8e4e355f3ea988659	2025-08-07 13:50:16 -07:00
Wan-Teh Chang	ac2795e904	Remove pthread code for Windows older than Vista Assume the CONDITION_VARIABLE added in Windows Vista is available. Remove an unneeded WaitForSingleObject() macro that converts WaitForSingleObject() calls to WaitForSingleObjectEx() calls with bAlertable=FALSE. The WaitForSingleObject() function does not enter an alertable wait state, so it is equivalent to WaitForSingleObjectEx() with bAlertable=FALSE. Remove code for Windows older than Vista in src/dsp/cpu.h. Change-Id: I7df95557713923e05a7bfb62e095ec6172cfd708	2025-08-05 12:01:55 -07:00
James Zern	c41d168d25	Merge "Apply "default unsafe" annotation across webputils" into main	2025-08-05 11:23:47 -07:00
mxms	ff87eeecc9	Apply "default unsafe" annotation across webputils Import bounds_safety.h across all of webputils, with one exception being dsp.h, since it's imported by webputils.h in one place. Also prepend WEBP_ASSUME_UNSAFE_INDEXABLE_ABI to every webputil file to indicate to the compiler that every pointer should be treated as __unsafe_indexable. We also need to replace memcpy/memset/memmove with the unsafe variants WEBP_UNSAFE_*, as memcpy/memset/memmove require bounded/sized pointers. With this change, all of libwebputils (and libwebp) should build with -DWEBP_ENABLE_FBOUNDS_SAFETY=true Change-Id: Iad87be0455182d534c074ef6dc1a30fa66b74b6c	2025-08-04 18:56:57 -07:00
Wan-Teh Chang	54f23b049e	Implement WEBP_DSP_INIT with SRWLOCK for Windows A slim reader/writer (SRW) lock can be initialized statically with the constant SRWLOCK_INIT. It is the only Windows synchronization object I can find with this property. Note: On old Windows versions that don't have SRWLOCK, use the fallback, thread-unsafe implementation. Change a NOLINT comment to a NOLINTNEXTLINE comment to prevent clang-format from aligning the #else and #endif comments in an undesired way. Bug: 435213378 Change-Id: Iecff615a14a1905aedd2c05ad9444889f711cc17	2025-08-04 15:01:09 -07:00
Wan-Teh Chang	313692d51e	Use file static variables in WEBP_DSP_INIT_FUNC() Function static variables are initialized on the first call to the function. In C the initialization of function static variables is not thread-safe. Use file static variables instead in the WEBP_DSP_INIT_FUNC() macro. Remove the volatile qualifier for the pthread version of the func##_last_cpuinfo_used variable because the variable is only accessed while holding the mutex. Change-Id: I1237904a49d2467d7ce79fc53f9e7f966aa7a5c1	2025-08-01 19:22:17 -07:00
clang-format	44257cb826	apply clang-format (Debian clang-format version 19.1.7 (3+build4)) with `--style=Google`. Manual changes: * clang-format disabled around macros with stringification (mostly assembly) * some inline assembly strings were adjusted to avoid awkward line breaks * trailing commas, `//` or suffixes (`ull`) added to help array formatting * thread_utils.c: parameter comments were changed to the more common /*...=*/ style to improve formatting The automatically generated code under swig/ was skipped. Bug: 433996651 Change-Id: Iea3f24160d78d2a2653971cdf13fa932e47ff1b3	2025-07-31 14:53:58 -07:00
Vincent Rabaud	8c815d82d7	Add ARGB/ABGR support to WebPConvertRGB24ToY/WebPConvertBGR24ToY Rename them to WebPConvertRGBToY/WebPConvertBGRToY and accept the 'step' parameter (3 for RGB, 4 for ARGB). Change-Id: I930a23894e4135a34fff2174e6a5bbee1eac2ba0	2025-07-24 14:14:20 +02:00
James Zern	753ed11ef8	enc_neon.c: fix aarch64 compilation w/gcc < 8.5.0 Fixes: dsp/enc_neon.c:1192:11: warning: implicit declaration of function 'vld1_u8_x2'; did you mean 'vld1_u32'? [-Wimplicit-function-declaration] inner = vld1_u8_x2(top); ^~~~~~~~~~ vld1_u32 Change-Id: I8d0175561efd69bc9614a68dca1d0fc19cdf91be	2025-05-30 10:25:38 -07:00
Henner Zeller	98c2780100	IWYU: Include all headers for symbols used in files. Semi-automatically taking the misc-include-cleaner warnings by clang-tidy and fixing files to be self-contained. Change-Id: Iaaa2b2ec9d6dcce547fa5cb6b4f056dfc8c781ff	2025-05-15 14:53:57 +02:00
Vincent Rabaud	57e324e2eb	Refactor VP8LHistogram histogram_enc.cc - move HistogramAdd to histogram_enc.cc: it is too high level - homogenize the argument naming (e.g. h for histogram, p for population) - separate a bit the data from the stats (only used within VP8LGetHistoImageSymbols) Change-Id: I274546e3ff96297383bcae0a95696c11f18decbf	2025-04-23 19:12:21 +02:00
James Zern	f8b360c419	alpha_processing_sse2: quiet signed conv warning After: `44f91b0d` Speed DispatchAlpha_SSE2 up _mm_set1_epi8 takes a char argument; add a `char` cast for 0xff. from clang-14 integer sanitizer: implicit conversion from type 'int' of value 255 (32-bit, signed) to type 'char' changed the value to -1 (8-bit, signed) Change-Id: I0f4ed092eddc0beb311f44bf3d4b74a4d1177040	2025-04-17 12:21:34 -07:00
James Zern	ad52d5fc7e	dec/dsp/enc/utils,cosmetics: rm struct member '_' suffix This is a follow up to: `ee8e8c62` Fix member naming for VP8LHistogram This better matches Google style and clears some clang-tidy warnings. This is the final change in this set. It is rather large due to the shared dependencies between dec/enc. Change-Id: I89de06b5653ae0bb627f904fa6060334831f7e3b	2025-04-16 13:23:42 -07:00
Vincent Rabaud	44f91b0ddd	Speed DispatchAlpha_SSE2 up On some dataset, this was taking 2.5%. 2% when switching to _mm_maskmoveu_si128. 1.7% when using _mm_loadu_si128 Confirmed by IACA: going from throughput of 4.26 to 3.5 and then to 6.26 for twice the input. Change-Id: I409f901aaad9d39bf55a1aac28cc25f126876b01	2025-04-10 11:53:19 +02:00
Vincent Rabaud	ee8e8c620f	Fix member naming for VP8LHistogram clang-tidy keeps complaining and that typedef will evolve in the future Change-Id: I734f2ae7dc0f4deac0dd391ae9f4b38c45507651	2025-04-10 09:54:57 +02:00
Vincent Rabaud	321561b41f	Remove now unused ExtraCostCombined Change-Id: Ic9d1ccf5b10fed67f836aa19fa0f84238acbf4c1	2025-03-29 23:34:20 +01:00
Vincent Rabaud	f2b3f52733	Get AVX2 into WebP lossless Change-Id: Ifad3102c9f899a46401985515cd98f3f7a21887f	2025-03-28 11:44:03 +01:00
Vincent Rabaud	7c70ff7a3b	Clean dsp/lossless includes Change-Id: I47a405a9c402095b440404fe57ac08b5293ea71b	2025-03-25 12:38:00 +01:00
Vincent Rabaud	9dd5ae819b	Use the full register in PredictorSub13_SSE2 No more than 15 registers are used at a time Change-Id: I40f77d9df8500e5e0d52ff6b206d765e8be62ae1	2025-03-25 11:07:15 +01:00
James Zern	743a5f092d	enc_neon: enable vld1q_u8_x4 for clang & msvc This restores the use of the function after `980b708e` enc_neon: fix build w/aarch64 gcc < 9.4.0 The intrinsic was added to llvm for aarch64 in: 5e4ce1ae9dad Implement the newly added AArch64 ACLE functions for ld1/st1 with 2/3/4 vectors. The functions are like: vst1_s8_x2 ... llvmorg-3.4.0-rc1~101 https://github.com/llvm/llvm-project/commit/5e4ce1ae9dad Visual Studio 2019 and 2022 also support the function (2017 is still disabled for this path due to it relying on arm64_neon.h). Change-Id: I6ff10e22deb3968a48738a4458d2d3d55410b5ec	2025-03-05 16:56:20 -08:00
James Zern	980b708e2c	enc_neon: fix build w/aarch64 gcc < 9.4.0 vld1q_u8_x4 was added for aarch64 in the gcc 9.4.0 release: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ChangeLog;h=7558c0a369ea8c74a2b9369049a2d1cc187dc050;hb=13c83c4cc679ad5383ed57f359e53e8d518b7842#l2100 fixes: src/dsp/enc_neon.c: In function 'Intra4Preds_NEON': src/dsp/enc_neon.c:974:37: warning: implicit declaration of function 'vld1q_u8_x4'; did you mean 'vld1q_u8_x2'? [-Wimplicit-function-declaration] Bug: webp:398288323 Change-Id: Ic6e408065a375c945cc8691bd16a9f5d5642cfa2	2025-02-27 19:07:50 -08:00
James Zern	4c85d860ea	yuv.h: update RGB<->YUV coefficients in comment The values for the R/G/B floating point formulas resembled https://fourcc.org/fccyvrgb.php and Video Demystified, but the fixed point values are more closely aligned to rounded values from https://en.wikipedia.org/wiki/YCbCr and BT.601. The R/G/B formulas with the values prior to this change are added to sharpyuv_csp.c as they align with the fixed values. The origin of those coefficients is unclear. For consistency between library versions we'll leave them as is. Bug: webp:375011696 Change-Id: Id3f2a57530eee700cc52a899b32b25b5c015e89b	2024-11-21 16:21:45 -08:00
James Zern	61e2cfdadd	rework AddVectorEq_SSE2 Take advantage of the known sizes used by VP8LHistogramAdd() and remove loop for the remainder. The loop was being auto-vectorized making the code larger and slower than the vectorized C code. For larger sizes the new code is ~3-4.5% faster than the old code with about the same improvement against the vectorized C code. For the minimal size (40), the new code is ~30% faster than the C and old SSE2 code. The LINE_SIZE==8 option is removed with this change. It had been set to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case by 2 in any case; they both profile similarly. Change-Id: I6dfedfd57474f44d15e2ce510a48e5252221077a	2024-11-14 12:21:39 -08:00
James Zern	7bda3deb89	rework AddVector_SSE2 Take advantage of the known sizes used by VP8LHistogramAdd() and remove loop for the remainder. The loop was being auto-vectorized making the code larger and slower than the vectorized C code. For larger sizes the new code is ~4-7% faster than the old code with about the same improvement against the vectorized C code. For the minimal size (40), the new code is ~30% faster than the C and old SSE2 code. The LINE_SIZE==8 option is removed with this change. It had been set to 16 for its entire life and clang-16 was unrolling the LINE_SIZE==8 case by 2 in any case; they both profile similarly. Change-Id: I2376e2dca3bffa38477b4a432f4c533419e3be0e	2024-11-14 12:21:33 -08:00
James Zern	dfdcb7f95c	Merge "lossless.h: fix function declaration mismatches" into main	2024-10-09 22:30:49 +00:00
James Zern	78ed683978	fix overread in Intra4Preds_NEON Extend VP8EncIterator::i4_boundary_ by 3 bytes to avoid Intra4Preds_NEON reading deeper into the struct (likely padding) when top is positioned at offset 29. This data is memset with MSan to prevent a warning due to its incorrect modeling of tbl instructions. Prior to: `169dfbf9` disable Intra4Preds_NEON there was a mismatch in the preprocessor checks for enabling the function in NEON and removing the C version; NEON used `BPS == 32` while the C code was removed unconditionally when building for aarch64. This patch also normalizes those checks to look for `BPS == 32` and `BPS != 32` as appropriate. Bug: b:366668849,webp:372109644 Change-Id: Ic9e6ad4b2d844cb446decd63aec0b2676a89c8d0	2024-10-08 16:55:12 -07:00
James Zern	d516a68e54	lossless.h: fix function declaration mismatches These appear as warnings under VS15 (16 and 17 are silent) and were missed in: `a32b436b` dsp/lossless*: use WEBP_RESTRICT qualifier Change-Id: Ia7cffafc166f2da93b51714363558798cda71b67	2024-10-08 13:41:16 -07:00
James Zern	fdb229ea3a	Merge changes I07a7e36a,Ib29980f7,I2316122d,I2356e314,I32b53dd3, ... into main * changes: dsp/yuv: use WEBP_RESTRICT qualifier dsp/upsampling: use WEBP_RESTRICT qualifier dsp/rescaler: use WEBP_RESTRICT qualifier dsp/lossless: use WEBP_RESTRICT qualifier dsp/filters: use WEBP_RESTRICT qualifier dsp/enc: use WEBP_RESTRICT qualifier dsp/dec: use WEBP_RESTRICT qualifier dsp/cost: use WEBP_RESTRICT qualifier	2024-10-03 17:01:02 +00:00
James Zern	169dfbf931	disable Intra4Preds_NEON The load of the `top` parameter may over read causing MSan errors: ==7373==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0xfff891d52ad4 in Intra4Preds_NEON src/dsp/enc_neon.c:1003:12 #1 0xfff892d87618 in MakeIntra4Preds src/enc/quant_enc.c:484:3 Bug: b:366668849 Change-Id: I29cf3b2f402ee79ea93c1ee2a4fdd95083aeed68	2024-10-02 15:42:19 -07:00
James Zern	2dd5eb9862	dsp/yuv*: use WEBP_RESTRICT qualifier Better vectorization in the C code, fewer instructions / comparisons in NEON, and fewer reloads in SSE2/SSE4 w/ndk r27/gcc-13/clang-16. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I07a7e36a2dce8632c71c0fbbeef94dc51453eaf7	2024-10-02 14:55:15 -07:00
James Zern	23bbafbeb8	dsp/upsampling*: use WEBP_RESTRICT qualifier Better vectorization in the C code, fewer instructions in NEON, and some code reordering / better register usage in SSE2/SSE4 w/ndk r27/gcc-13/clang-16. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: Ib29980f778ad3dbb952178ad8dee39b8673c4ff8	2024-10-02 14:55:15 -07:00
James Zern	35915b389e	dsp/rescaler*: use WEBP_RESTRICT qualifier Some improvement in the C code. No changes in NEON or SSE2 w/ndk r27/gcc-13/clang-16. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I2316122db893f48f0afda90a147c83cac7f07526	2024-10-02 14:55:14 -07:00
James Zern	a32b436bd5	dsp/lossless*: use WEBP_RESTRICT qualifier lossless_enc: better vectorization, most benefits seen in AddVector/Eq w/ndk r27/gcc-13/clang-16 lossless: minor reordering and some improvement to PredictorAdd5_SSE2 w/gcc-13 This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I2356e314f391ee2f2c71f00bc6ee10097d3881e7	2024-10-02 14:55:14 -07:00
James Zern	04d4b4f387	dsp/filters*: use WEBP_RESTRICT qualifier Better stack/register usage in SSE2/NEON code and improved vectorization of the C code with ndk r27/gcc-13/clang-16. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I32b53dd38bfc7e2231d875409e7dfda7c513cfb6	2024-10-02 14:55:14 -07:00
James Zern	b1cb37e659	dsp/enc*: use WEBP_RESTRICT qualifier This allows for better vectorization of the C code, inlining of TrueMotion_SSE2, better load usage in aarch64 and other minor reordering with ndk r27/gcc-13/clang-16. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I07e9944d5c0aa5a079b22883ac5a2d649695e4a0	2024-10-02 14:55:14 -07:00
James Zern	201894ef24	dsp/dec*: use WEBP_RESTRICT qualifier A minor improvement for arm targets with ndk r27/gcc-13 in H/VFilter8 (a couple fewer moves w/aarch64) and much better vectorization of DitherCombine8x8_C in most targets. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I03e73e6d6404261bb8408a9ae76a4b6ef142f8f0	2024-10-02 14:55:14 -07:00
James Zern	02eac8a741	dsp/cost: use WEBP_RESTRICT qualifier on SetResidualCoeffs_. This results in some minor code reordering when targeting armv7 with ndk r27 and other recent versions of clang. No changes in the x86 compilations with clang-16 / gcc-13. This only affects non-vector pointers; any vector pointers are left as a follow up. Change-Id: I7c3554ece848fafbc5ac9c4944f1dc85129f6fd8	2024-10-02 14:55:14 -07:00
Vincent Rabaud	220ee52967	Search for best predictor transform bits This is useful in cruncher mode. Change-Id: I8586bdbf464daf85db381ab77a18bf63dd48f323	2024-09-24 10:44:22 +02:00
James Zern	615e58744f	Merge "make VP8LPredictor[01]_C() static" into main	2024-08-22 17:35:52 +00:00
James Zern	233e86b91f	Merge changes Ie43dc5ef,I94cd8bab into main * changes: DoFilter_: remove row & num_rows parameters Do*Filter_C: remove dead 'inverse' code paths	2024-08-19 18:51:06 +00:00
James Zern	1a29fd2fc3	make VP8LPredictor[01]_C() static Only predictors 2-13 are reused in lossless_enc.c. Change-Id: Ia3a7342fccfb44b9ad5297f48d6be2d96af68ec8	2024-08-16 10:58:45 -07:00
James Zern	dd9d3770d7	DoFilter_: remove row & num_rows parameters The row parameter became a constant in: `2102ccd` update the Unfilter API in dsp to process one row independently num_rows is always equal to height. Change-Id: Ie43dc5ef222e442ce8c92766da0b9824ccbca236	2024-08-12 19:36:31 -07:00
James Zern	ab451a495c	Do*Filter_C: remove dead 'inverse' code paths The inverse parameter became a constant in: `2102ccd` update the Unfilter API in dsp to process one row independently The row parameter to these functions is in a similar state; it will be removed in a follow up. Change-Id: I94cd8babe0e42474ff794ba5fa29dd48039de5f8	2024-08-08 18:13:48 -07:00
James Zern	f9a480f7c3	{TrueMotion,TM16}_NEON: remove zero extension Replace vmovl_u8 -> s16 + signed vaddq with unsigned vaddw. No change in assembly with clang-16 (armv7 & aarch64) and gcc-13 (aarch64). armv7 gcc-13 had kept the vmovl instructions, those are now gone. Change-Id: Ibb4fbdd5680d3e9dd06933c100528a6f363de472	2024-08-07 16:43:14 -07:00
James Zern	d742b24a88	Intra16Preds_NEON: fix truemotion saturation This needs to be done with signed saturation as the sum may be negative. fixes mismatch with C code after: `3bfb05e3` Add AArch64 Neon implementation of Intra16Preds Change-Id: I017e939d7155cc3489ceb76fc8ad50ac9917f23d	2024-07-11 13:37:06 -07:00
James Zern	c7bb4cb585	Intra4Preds_NEON: fix truemotion saturation This needs to be done with signed saturation as the sum may be negative. fixes mismatch with C code after: `baa93808` Add AArch64 Neon implementation of Intra4Preds Change-Id: I190c3d7f78cfd2c7ae83fb7059de41e307abda36	2024-07-11 13:37:06 -07:00
Vincent Rabaud	dde11574b0	Remove TODO now that log is using fixed point. Bug: webp:499 Change-Id: I39ab340ec6b5932db7535c6b7f31843c28de8415	2024-07-11 20:11:03 +00:00
James Zern	3bd9420289	Merge changes Iff6e47ed,I24c67cd5,Id781e761 into main * changes: Use QuantizeBlock_NEON for VP8EncQuantizeBlockWHT on Arm Add AArch64 Neon implementation of Intra16Preds Add AArch64 Neon implementation of Intra4Preds	2024-07-11 02:04:42 +00:00
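The de6aee46e1 / 61e5c391d6 pair in the log above turns on a dispatch table whose indices 14 and 15 were left unassigned, so an invalid bitstream could call through a NULL pointer. The following is a minimal sketch of that pattern, not libwebp's actual code: `PredictorFunc`, `kPredictors`, `InitPredictors`, `Predict`, and the three toy predictors are hypothetical names chosen for illustration, and only three of the real predictor modes are modelled. It shows how pre-filling every slot with a safe fallback lets the hot path index the table with a 4-bit mode and no validity branch.

```c
#include <stddef.h>
#include <stdint.h>

/* A 4-bit mode field can address 16 slots even though only modes 0..13 are
 * valid, so the table needs 16 entries. (Hypothetical, simplified types.) */
#define NUM_PREDICTOR_SLOTS 16

typedef uint32_t (*PredictorFunc)(uint32_t left, uint32_t top);

/* Toy stand-in for predictor 0: opaque black in ARGB. */
static uint32_t PredictBlack(uint32_t left, uint32_t top) {
  (void)left;
  (void)top;
  return 0xff000000u;
}

/* Toy stand-ins for two more modes; the real set is larger. */
static uint32_t PredictLeft(uint32_t left, uint32_t top) {
  (void)top;
  return left;
}

static uint32_t PredictTop(uint32_t left, uint32_t top) {
  (void)left;
  return top;
}

static PredictorFunc kPredictors[NUM_PREDICTOR_SLOTS];

/* Fill every slot with the fallback first, then override the modes that
 * have a dedicated implementation. No slot can be left NULL this way. */
void InitPredictors(void) {
  int i;
  for (i = 0; i < NUM_PREDICTOR_SLOTS; ++i) kPredictors[i] = PredictBlack;
  kPredictors[1] = PredictLeft;
  kPredictors[2] = PredictTop;
}

/* Hot path: mask the mode to 4 bits and call through the table without
 * checking whether the mode is valid. */
uint32_t Predict(int mode, uint32_t left, uint32_t top) {
  return kPredictors[mode & (NUM_PREDICTOR_SLOTS - 1)](left, top);
}

/* Returns 1 if every slot holds a function pointer, 0 otherwise. */
int AllPredictorsAssigned(void) {
  int i;
  for (i = 0; i < NUM_PREDICTOR_SLOTS; ++i) {
    if (kPredictors[i] == NULL) return 0;
  }
  return 1;
}
```

A check like `AllPredictorsAssigned()` is the kind of assertion that would have flagged the partially initialized table before the fuzzers did; whether libwebp carries an equivalent check is not stated in the log.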
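Commits d742b24a88 and c7bb4cb585 above note that TrueMotion saturation must be signed because the sum may be negative. The scalar sketch below illustrates why, assuming the usual TrueMotion form `top + left - top_left` clamped to 8 bits. The function names are hypothetical and the code models the arithmetic only; it is not the NEON implementation from those commits.

```c
#include <stdint.h>

/* Clamp a signed intermediate to the 8-bit pixel range [0, 255]. */
static uint8_t Clip8(int v) {
  return (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v;
}

/* Signed handling: the intermediate lies in [-255, 510], so negative sums
 * clamp to 0 and large sums clamp to 255. */
uint8_t TrueMotionPixel(uint8_t top, uint8_t left, uint8_t top_left) {
  const int sum = (int)top + (int)left - (int)top_left;
  return Clip8(sum);
}

/* Deliberately wrong variant for comparison: the sum is reinterpreted as an
 * unsigned 16-bit value before saturating, so a negative sum wraps to a
 * large positive number and saturates to 255 instead of 0. */
uint8_t TrueMotionPixelUnsignedSat(uint8_t top, uint8_t left,
                                   uint8_t top_left) {
  const uint16_t sum = (uint16_t)((int)top + (int)left - (int)top_left);
  return (sum > 255) ? 255 : (uint8_t)sum;
}
```

With `top = 10`, `left = 20`, `top_left = 100` the sum is -70: the signed version yields 0, while the unsigned-saturating variant yields 255. A mismatch of this kind against the C reference is what those two commits describe fixing.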

966 Commits