Commit Graph

1350 Commits

Author SHA1 Message Date
James Zern
6781423b7d configure: check for __builtin_bswapXX()
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available

Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
2014-07-05 12:35:13 -07:00
James Zern
6422e683af VP8LFillBitWindow: enable fast path for 32-bit builds
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.

the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.

Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
2014-07-04 14:42:47 -07:00
James Zern
4f7f52b2a1 VP8LFillBitWindow: respect WEBP_FORCE_ALIGNED
Change-Id: I23eddf01590de002efc21d8c7acc545a08fc3e48
2014-07-04 13:53:29 -07:00
James Zern
e458badcc3 endian_inl.h: implement htoleXX with BSwapXX
+ s/htole(16|32)/HToLE$1/ to avoid any name conflicts

Change-Id: Ic1c84711557e50f73d83ca5aa2b3992ac6738216
2014-07-04 12:16:36 -07:00
James Zern
f2664d1aab endian_inl.h: add BSwap16
+ use it in VP8LoadNewBytes()

Change-Id: I701d3652dc0cbd553852978702ef68c2657bca1c
2014-07-04 12:16:28 -07:00
James Zern
6fbf5345e3 Merge "configure: add --enable-aligned" 2014-07-04 00:12:28 -07:00
James Zern
dc0f479d6a configure: add --enable-aligned
forces aligned memory reads (via memcpy) in the VP8 bit reader, useful
for platforms that don't support unaligned loads.

Change-Id: Ifa44a9a1677fbdc6a929520f9340b7e3fcbd6692
2014-07-03 23:30:09 -07:00
Pascal Massimino
257adfb0be remove experimental YUV444 YUV422 and YUV400 code
(never used)

Change-Id: I12a886703592133939607df05132e9b498f916c1
2014-07-03 22:26:25 -07:00
James Zern
380cca4f2c configure.ac: add AC_C_BIGENDIAN
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it

+ fixes lossless BGRA output with big-endian toolchains
  that do not define __BIG_ENDIAN__ (codesourcery mips gcc)

Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
2014-07-03 18:15:50 -07:00
James Zern
ee70a90187 endian_inl.h: add BSwap64
Change-Id: I66672b770500294b8f4ee8fa4bf1dfff1119dbe6
2014-07-03 13:35:34 -07:00
James Zern
47779d46c8 endian_inl.h: add BSwap32
Change-Id: I96e3ae49659307024415d64587e6312888a0070f
2014-07-03 13:28:13 -07:00
James Zern
d5104b1ff6 utils: add endian_inl.h
moves the following to this header:
- htole*() definitions from bit_writer.c
- __BIG_ENDIAN__ fallback define from bit_reader_inl.h

Change-Id: I7fff59543f08a70bf8f9ddac849b72ed290471b1
2014-07-03 13:07:14 -07:00
Vikas Arora
516971b136 lossless: Remove unaligned read warning
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.

Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
2014-07-02 20:55:27 -07:00
James Zern
e59f53600f neon: normalize vdup_n_* usage
with constants, prefer this over vmov_n_* or vcreate_*

Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
6ee7160dd2 Merge changes I0da7b3d3,Idad2f278,I4accc305
* changes:
  neon: add INIT_VECTOR4
  neon: add INIT_VECTOR3
  neon: add INIT_VECTOR2
2014-07-01 00:48:38 -07:00
skal
abc02f2492 Merge "fix (uncompiled) typo" 2014-07-01 00:29:19 -07:00
James Zern
bc03670f01 neon: add INIT_VECTOR4
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
6c1c632b03 neon: add INIT_VECTOR3
used to initialize NxMx3 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4
2014-06-30 23:53:23 -07:00
James Zern
dc7687e51b neon: add INIT_VECTOR2
used to initialize NxMx2 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139
2014-06-30 23:52:42 -07:00
skal
4536e7c49c add WebPMuxSetCanvasSize() to the mux API
previously, the final canvas size was adjusted tightly from the
animation frames. Now, it can be specified separately (to be larger, in particular).

calling WebPMuxSetCanvasSize(mux, 0, 0) triggers the 'adjust tightly' behaviour.
This can be useful after calling WebPMuxCreate() if further image addition
is expected.

-> Fixed gif2webp accordingly.

also: made WebPMuxAssemble() more robust by systematically zero-ing WebPData.

Change-Id: Ib4f7eac372cf9dbf6e25cd686a77960e386a0b7f
2014-06-30 07:00:49 +02:00
skal
824eab1086 fix (uncompiled) typo
output was not properly descaled when compiled
without USE_GAMMA_COMPRESSION

Change-Id: I5282c9573cfcfbef4be22b6b3cdec7e16cad9c1c
2014-06-27 17:10:25 +02:00
Pascal Massimino
1f3e5f1e60 remove unused 'shift' argument and QFIX2 define
this will remove a warning about the shift amount not being
an immediate (=constant).

Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
2014-06-26 00:44:12 -07:00
James Zern
1da3d46138 VP8LoadNewBytes: use __builtin_bswap32 if available
mostly to balance the use of bswap64, some gcc platforms are already
interpreting the default case the same

Change-Id: Icf860f55b3f16bea349a7d721e6d6abeeb4e5cf3
2014-06-24 23:24:36 -07:00
Pascal Massimino
25aaddc84c rename interface -> winterface
to avoid name clash on win32

Change-Id: Ia4ad3d4528df4652bab803f9a9544e21e6c4b177
2014-06-23 14:21:22 -07:00
skal
5584d9d2fc make WebPSetWorkerInterface() check its arguments
Change-Id: I522c58cfe05e864a50cacb58bdfa14d5369c6d60
2014-06-23 07:39:56 +02:00
James Zern
a9cf31913c cosmetics: update thread.h comments
WebPWorker*() are now part of WebPWorkerInterface; refer to them with
unadorned names.

Change-Id: Iae1dd59f1e545cba6dd8c18f26ba60eb9a84419b
2014-06-19 19:31:10 -07:00
levytamar82
27bfeee43a QuantizeBlock SSE2 Optimization:
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.

Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
2014-06-18 16:22:00 -07:00
skal
c5c6b408b1 Merge "add alpha dithering for lossy" 2014-06-18 07:32:24 -07:00
James Zern
6e93317f5b muxread: fix out of bounds read
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.

Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f
2014-06-17 12:53:28 -07:00
skal
bbe32df1e3 add alpha dithering for lossy
new options:
 dwebp -alpha_dither
 vwebp -noalphadither

When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.

The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.

Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
2014-06-14 00:06:16 +02:00
skal
790207679d Merge "make error-code reporting consistent upon malloc failure" 2014-06-13 00:25:30 -07:00
skal
77bf4410f7 make error-code reporting consistent upon malloc failure
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)

Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)

Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
2014-06-13 08:45:12 +02:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
skal
24e3080571 Add an interface abstraction to the WebP worker thread implementation
This allows custom implementations of threading mecanism.

Patch by Leonhard Gruenschloss.

Change-Id: Id8ea5917acd2f24fa8bce79748d1747de2751614
2014-06-12 11:35:44 +02:00
James Zern
32b3137936 configure: move config.h to src/webp/config.h
this change has the side-effect of using directory names in the
include, silencing a lint warning.

Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
2014-06-10 23:42:00 -07:00
James Zern
90090d99b5 Merge changes I7c675e51,I84f7d785
* changes:
  configure: test for -msse2
  rename upsampling_mips32.c to yuv_mips32.c
2014-06-10 16:15:21 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
James Zern
6e61a3a905 configure: test for -msse2
+ add a WEBP_HAVE_SSE2 to dsp.h

not all 32-bit toolchain configurations will have sse2 enabled by
default

Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
2014-06-07 19:44:08 -07:00
James Zern
b9d2efc629 rename upsampling_mips32.c to yuv_mips32.c
matches yuv_sse2 added in;
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c

Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3
2014-06-07 12:35:47 -07:00
James Zern
bdfeebaa01 dsp/yuv: move sse2 functions to yuv_sse2.c
Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000
2014-06-05 23:52:54 -07:00
pascal massimino
46b32e861a Merge "configure: set WEBP_HAVE_AVX2 when available" 2014-06-05 02:57:42 -07:00
pascal massimino
88305db4fc Merge "VP8RandomBits2: prevent signed int overflow" 2014-06-05 01:46:42 -07:00
James Zern
73fee88c4a VP8RandomBits2: prevent signed int overflow
'diff' at its largest may be INT_MAX; << 1 of anything at or above
1 << 30 will overflow.

Change-Id: Idb2b5a9b55acc2f6d5e32be8baaebee3f89919ad
2014-06-04 23:19:03 -07:00
James Zern
db4860b355 enc_sse2: prevent signed int overflow
_mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed
int.

Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda
2014-06-04 23:18:22 -07:00
James Zern
230a055501 configure: set WEBP_HAVE_AVX2 when available
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called

Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
2014-06-03 23:29:23 -07:00
skal
a2ac8a420e restore original value_/range_ field order
no speed change, just for coherency

Change-Id: Iaa395bca24f33a14b68ba6920b838ef87d0d0db6
2014-06-03 09:36:56 +02:00
James Zern
5e2ee56fdd Merge "remove libwebpdspdecode dep on libwebpdsp_avx2" 2014-06-03 00:28:54 -07:00
James Zern
61362db57c remove libwebpdspdecode dep on libwebpdsp_avx2
it's encode only, libwebpdecoder doesn't need the symbols

Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83
2014-06-03 00:05:56 -07:00
Pascal Massimino
42c447aeb0 Merge "lossy bit-reader clean-up:" 2014-06-02 23:53:00 -07:00
James Zern
479ffd8b5d Merge "remove unused #include's" 2014-06-02 23:07:20 -07:00