The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter()
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row
Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.
Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.
Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()
Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled
Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
the inclusion of the files is harmless when NEON is not enabled and will
allow them to be built with NEON for APP_ABI=arm64-v8a which currently
does not use the '.neon' suffix
Change-Id: I39377876b1b68822c38f4e2396da93c56145fc0f
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow
also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)
Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
new file: lossless_neon.c
speedup is ~5%
gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509
I've tried adding -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.
Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
expose the predictor array as function pointers instead
of each individual sub-function
+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"
no speed diff observed
Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044
-ffunction-sections / -fdata-sections
can improve final binary size when used with --gc-sections, speed impact
untested
Change-Id: I37f4b5da2f34acede7965c2da2e1b97125473adc
-> split libraries further into decoder / encoder
-> add libwebpdecoder.a in Makefile.unix
-> make dwebp link against libwebpdecoder.a in Makefile.unix
also: in makefile.unix, pass EXTRA_FLAGS to LDFLAGS too
(otherwise, -m32 wouldn't work, e.g.)
Change-Id: Ief3da02a729dd86bbaf949ed048836716941657f
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.
4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.
see http://code.google.com/p/webp/issues/detail?id=134
Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
(implements the backward and forward transforms in the encoder)
original patch by Wayne Chen (datoudatou at gmail dot com)
Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57
the extended file format is still under development and related
libs/binaries are not fit for release
configure:
--enable/disable-libwebpmux; default is disabled.
makefile.unix:
src/mux/libwebpmux.a and examples/webpmux must be explicitly specified
Makefile.vc:
$(DIRLIB)\libwebpmux.lib and $(DIRBIN)\webpmux.exe must be explicitly
specified
Change-Id: I8246746b256010dd2a2e4de58291222d7eaf0457
Defining LOCAL_ARM_NEON = true can result in neon instructions being
used in portions unprotected by the cpu check.
This changes defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to
the SSE2 code and MSVC.
Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd
* Method #1 is now calling the lossless encoder on the alpha plane.
Format is not final, it's just a first draft. We need ad-hoc functions.
* removed now useless utils/alpha.*
* added utils/quant_levels.h instead
* removed the TCoder code altogether
Change-Id: I636840b6129a43171b74860e0a0fc5bb1bcffc6a