Commit Graph

1421 Commits

Author SHA1 Message Date
James Zern
4c6dde37b9 bit_writer: cosmetics: rename kFlush() -> Flush()
Change-Id: I8907927974188bee85ffade1d75d2e50817aa115
2014-08-05 22:14:29 -07:00
James Zern
f7b4c48bba cosmetics: remove some extraneous 'extern's
Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d
2014-08-05 22:14:24 -07:00
James Zern
b47fb00ac0 vp8enci.h: cosmetics: fix '*' placement
associate with the type

Change-Id: Icf94f11bf79f6ccee3150e27b228755f8f3f0f37
2014-08-05 22:14:12 -07:00
skal
b5a36cc9ad add -near_lossless [0..100] experimental option
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.

Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
2014-08-05 19:17:10 +02:00
James Zern
0524d9e5e8 dsp: detect mips64 & disable mips32 code
Change-Id: Icf68dafd5cf0614ca25b36a0252caa1784ac8059
2014-08-01 21:18:53 -07:00
James Zern
29a9fe222a libwebp 0.4.1
- 7/24/14: version 0.4.1
   This is a binary compatible release.
   * AArch64 (arm64) & MIPS support/optimizations
   * NEON assembly additions:
     - ~25% faster lossy decode / encode (-m 4)
     - ~10% faster lossless decode
     - ~5-10% faster lossless encode (-m 3/4)
   * dwebp/vwebp can read from stdin
   * cwebp/gif2webp can write to stdout
   * cwebp can read webp files; useful if storing sources as webp lossless
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJT1xp9AAoJEPnD1r24IytdjDEP/3ZOnrWG0OIThlGE6bqgO3oy
 Y5O7RrvzFuPdGEZ1Kl9jDXjzsYY018/+HJmOD3kf+Qt/+F/8hpGH520VuEiJdVIW
 UcvoYaYq9xrmKNqEJx910Vh8TP7wE2T62OJcqKWg2JEczfUWn8WOKjmM5c8N1kJ2
 q6EbpCdWlxcD49L/MavJ5Yfw9jSZAjKzOIxxz0C294iMTK4IcSmeVvdqhkdyh96E
 CABw3o8sJfqB6p+KXjweXcE2KOhvzAWqTRcIogDC0jV/PgOlindf6k0am2FJHvMM
 A+sf/pmD0YKI1vEaXW+Vs6cz6LzvwbIkJSwuzBA7FYHAG5yqTSkQDxTSttw/RwiW
 fUScqHjQVBUqkM5bdOsdYBSDutQKDF2+WfcK5jXFdnydkQi59HKHV2R0K5cXYqfN
 Tu7aMBqFcfGunLlzfKCJcz8SElEmUjG6oAzRZYcdM9dmnR7ypQK17A/GbaysKKOE
 HMmep7uNX25w+6AL7zExnmPPPtSz+kj1SXt9fgldkelDhg1faAgfwXb/N4E+00lA
 1+aJD3gHcR4QnDI4gnKBKHyIktQPfNKMQ6xuL0oyvsalQ/loz08wu0aACcGDFrg4
 uOVVxTqU+pEITuwGcNk228+O2EbMWzzi3+Vhi1v3Gg3jJ3TRB3QN6NohmrsIackL
 4W2V5NoX5i2VizGfLy2g
 =GWd5
 -----END PGP SIGNATURE-----

Merge tag 'v0.4.1'

libwebp 0.4.1
- 7/24/14: version 0.4.1
  This is a binary compatible release.
  * AArch64 (arm64) & MIPS support/optimizations
  * NEON assembly additions:
    - ~25% faster lossy decode / encode (-m 4)
    - ~10% faster lossless decode
    - ~5-10% faster lossless encode (-m 3/4)
  * dwebp/vwebp can read from stdin
  * cwebp/gif2webp can write to stdout
  * cwebp can read webp files; useful if storing sources as webp lossless

* tag 'v0.4.1':
  update ChangeLog
  iosbuild.sh: specify optimization flags
  update ChangeLog
  makefile.unix: add vwebp.1 to the dist target
  update ChangeLog
  gif2webp: dust up the help message
  remove -noalphadither option from README/vwebp.1
  update NEWS for the next release
  update AUTHORS
  bump version to 0.4.1
  restore mux API compatibility
  remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
  restore encode API compatibility
  restore decode API compatibility
  gif2webp: fix compile with giflib 5.1.0
  gif2webp: simplify giflib version checking

Change-Id: Icf599f29bc6c0db757bc133aaddb3dbbbc316e08
2014-07-29 18:06:58 -07:00
James Zern
85213b9bbe bump version to 0.4.1
libwebp{,decoder} - 0.4.1
libwebp libtool - 5.1.0
libwebpdecoder libtool - 1.1.0

mux/demux - 0.2.1
libtool - 1.1.0

Change-Id: If593a198f802fd68c7dbbdbe0fc2612dbc44e2df
2014-07-23 17:17:25 -07:00
James Zern
695f80ae25 Merge "restore mux API compatibility" into 0.4.1 2014-07-23 17:11:33 -07:00
James Zern
862d296cf9 restore mux API compatibility
protect WebPMuxSetCanvasSize w/a WEBP_MUX_ABI_VERSION check

Change-Id: I6b01af55ebb4cc4c860d3cbf43be722077896748
2014-07-23 16:13:56 -07:00
skal
8f6f8c5dde remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
There's no speed diff, so better remove it altogether

Reported in https://code.google.com/p/webp/issues/detail?id=215

Change-Id: I991330de18bec340029d6df5fed0dfb4337e4662
2014-07-23 14:15:40 -07:00
James Zern
c2fc52e4ec restore encode API compatibility
protect WebPConfigLosslessPreset/WebPMemoryWriterClear w/a
WEBP_ENCODER_ABI_VERSION check

Change-Id: If4debc15fee172a3f18079bc2bd29eb8447bc14b
2014-07-22 22:19:55 -07:00
James Zern
793368e8c6 restore decode API compatibility
protect flip/alpha_dither w/a WEBP_DECODER_ABI_VERSION check

Change-Id: I437a5d5f78800f71b7e7e323faa321f946bf9515
2014-07-22 20:03:52 -07:00
Vikas Arora
d2cc61b7dd Extend MakeARGB32() to accept Alpha channel.
Change-Id: I31b8e2d085000e2e3687a373401e4f655f11fc42
2014-07-21 14:49:38 -07:00
skal
3398d81ac3 Actuate memory stats for PRINT_MEMORY_INFO
Change-Id: If7eac591b5205990ca452ca02b084a908482850a
2014-07-21 13:16:18 -07:00
skal
6c347bbb0c move WebPPictureInit to picture.c
Change-Id: I4b8c352cfd47256d0c3827334a6942c1caf742f6
2014-07-21 14:16:19 +02:00
Pascal Massimino
1549d62067 reorder the YUVA->ARGB and ARGB->YUVA functions correctly
+ rework few loops
+ consolidate few error-checks / error-reporting
+ don't modify picture->colorspace in Import() for ARGB output

Change-Id: Iae6da9b50acc738c59b85c3ee64efbaf6af8bffc
2014-07-18 07:15:54 -07:00
Pascal Massimino
736f2a175e extract colorspace code from picture.c into picture_csp.c
had to refactor few functions here and there.

Change-Id: I86fde6fec7c2fc7eb48f0ecf327dbbd2bd40b9d4
2014-07-16 16:37:26 -07:00
Pascal Massimino
fbadb48026 split monolithic picture.c into picture_{tools,psnr,rescale}.c
Change-Id: Ia5eb5496e4337e5bac8203872c5b014cad21c4f9
2014-07-12 09:13:33 -07:00
James Zern
c76f07ecc2 dec_neon/TransformAC3: initialize vector w/vcreate
replaces {} initialization gnu-ism

Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
Urvang Joshi
bb4fc051bf gif2webp: Allow single-frame animations
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.

Change-Id: I3ebae4893f953984de4072fda0938411de787a29
2014-07-10 15:21:05 -07:00
James Zern
46fd44c104 thread: remove harmless race on status_ in End()
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.

Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
2014-07-08 20:32:29 -07:00
James Zern
6781423b7d configure: check for __builtin_bswapXX()
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available

Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
2014-07-05 12:35:13 -07:00
James Zern
6422e683af VP8LFillBitWindow: enable fast path for 32-bit builds
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.

the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.

Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
2014-07-04 14:42:47 -07:00
James Zern
4f7f52b2a1 VP8LFillBitWindow: respect WEBP_FORCE_ALIGNED
Change-Id: I23eddf01590de002efc21d8c7acc545a08fc3e48
2014-07-04 13:53:29 -07:00
James Zern
e458badcc3 endian_inl.h: implement htoleXX with BSwapXX
+ s/htole(16|32)/HToLE$1/ to avoid any name conflicts

Change-Id: Ic1c84711557e50f73d83ca5aa2b3992ac6738216
2014-07-04 12:16:36 -07:00
James Zern
f2664d1aab endian_inl.h: add BSwap16
+ use it in VP8LoadNewBytes()

Change-Id: I701d3652dc0cbd553852978702ef68c2657bca1c
2014-07-04 12:16:28 -07:00
James Zern
6fbf5345e3 Merge "configure: add --enable-aligned" 2014-07-04 00:12:28 -07:00
James Zern
dc0f479d6a configure: add --enable-aligned
forces aligned memory reads (via memcpy) in the VP8 bit reader, useful
for platforms that don't support unaligned loads.

Change-Id: Ifa44a9a1677fbdc6a929520f9340b7e3fcbd6692
2014-07-03 23:30:09 -07:00
Pascal Massimino
257adfb0be remove experimental YUV444 YUV422 and YUV400 code
(never used)

Change-Id: I12a886703592133939607df05132e9b498f916c1
2014-07-03 22:26:25 -07:00
James Zern
380cca4f2c configure.ac: add AC_C_BIGENDIAN
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it

+ fixes lossless BGRA output with big-endian toolchains
  that do not define __BIG_ENDIAN__ (codesourcery mips gcc)

Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
2014-07-03 18:15:50 -07:00
James Zern
ee70a90187 endian_inl.h: add BSwap64
Change-Id: I66672b770500294b8f4ee8fa4bf1dfff1119dbe6
2014-07-03 13:35:34 -07:00
James Zern
47779d46c8 endian_inl.h: add BSwap32
Change-Id: I96e3ae49659307024415d64587e6312888a0070f
2014-07-03 13:28:13 -07:00
James Zern
d5104b1ff6 utils: add endian_inl.h
moves the following to this header:
- htole*() definitions from bit_writer.c
- __BIG_ENDIAN__ fallback define from bit_reader_inl.h

Change-Id: I7fff59543f08a70bf8f9ddac849b72ed290471b1
2014-07-03 13:07:14 -07:00
Vikas Arora
516971b136 lossless: Remove unaligned read warning
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.

Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
2014-07-02 20:55:27 -07:00
James Zern
e59f53600f neon: normalize vdup_n_* usage
with constants, prefer this over vmov_n_* or vcreate_*

Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
6ee7160dd2 Merge changes I0da7b3d3,Idad2f278,I4accc305
* changes:
  neon: add INIT_VECTOR4
  neon: add INIT_VECTOR3
  neon: add INIT_VECTOR2
2014-07-01 00:48:38 -07:00
skal
abc02f2492 Merge "fix (uncompiled) typo" 2014-07-01 00:29:19 -07:00
James Zern
bc03670f01 neon: add INIT_VECTOR4
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
6c1c632b03 neon: add INIT_VECTOR3
used to initialize NxMx3 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4
2014-06-30 23:53:23 -07:00
James Zern
dc7687e51b neon: add INIT_VECTOR2
used to initialize NxMx2 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139
2014-06-30 23:52:42 -07:00
skal
4536e7c49c add WebPMuxSetCanvasSize() to the mux API
previously, the final canvas size was adjusted tightly from the
animation frames. Now, it can be specified separately (to be larger, in particular).

calling WebPMuxSetCanvasSize(mux, 0, 0) triggers the 'adjust tightly' behaviour.
This can be useful after calling WebPMuxCreate() if further image addition
is expected.

-> Fixed gif2webp accordingly.

also: made WebPMuxAssemble() more robust by systematically zero-ing WebPData.

Change-Id: Ib4f7eac372cf9dbf6e25cd686a77960e386a0b7f
2014-06-30 07:00:49 +02:00
skal
824eab1086 fix (uncompiled) typo
output was not properly descaled when compiled
without USE_GAMMA_COMPRESSION

Change-Id: I5282c9573cfcfbef4be22b6b3cdec7e16cad9c1c
2014-06-27 17:10:25 +02:00
Pascal Massimino
1f3e5f1e60 remove unused 'shift' argument and QFIX2 define
this will remove a warning about the shift amount not being
an immediate (=constant).

Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
2014-06-26 00:44:12 -07:00
James Zern
1da3d46138 VP8LoadNewBytes: use __builtin_bswap32 if available
mostly to balance the use of bswap64, some gcc platforms are already
interpreting the default case the same

Change-Id: Icf860f55b3f16bea349a7d721e6d6abeeb4e5cf3
2014-06-24 23:24:36 -07:00
Pascal Massimino
25aaddc84c rename interface -> winterface
to avoid name clash on win32

Change-Id: Ia4ad3d4528df4652bab803f9a9544e21e6c4b177
2014-06-23 14:21:22 -07:00
skal
5584d9d2fc make WebPSetWorkerInterface() check its arguments
Change-Id: I522c58cfe05e864a50cacb58bdfa14d5369c6d60
2014-06-23 07:39:56 +02:00
James Zern
a9cf31913c cosmetics: update thread.h comments
WebPWorker*() are now part of WebPWorkerInterface; refer to them with
unadorned names.

Change-Id: Iae1dd59f1e545cba6dd8c18f26ba60eb9a84419b
2014-06-19 19:31:10 -07:00
levytamar82
27bfeee43a QuantizeBlock SSE2 Optimization:
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.

Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
2014-06-18 16:22:00 -07:00
skal
c5c6b408b1 Merge "add alpha dithering for lossy" 2014-06-18 07:32:24 -07:00
James Zern
6e93317f5b muxread: fix out of bounds read
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.

Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f
2014-06-17 12:53:28 -07:00
skal
bbe32df1e3 add alpha dithering for lossy
new options:
 dwebp -alpha_dither
 vwebp -noalphadither

When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.

The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.

Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
2014-06-14 00:06:16 +02:00
skal
790207679d Merge "make error-code reporting consistent upon malloc failure" 2014-06-13 00:25:30 -07:00
skal
77bf4410f7 make error-code reporting consistent upon malloc failure
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)

Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)

Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
2014-06-13 08:45:12 +02:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
skal
24e3080571 Add an interface abstraction to the WebP worker thread implementation
This allows custom implementations of threading mecanism.

Patch by Leonhard Gruenschloss.

Change-Id: Id8ea5917acd2f24fa8bce79748d1747de2751614
2014-06-12 11:35:44 +02:00
James Zern
32b3137936 configure: move config.h to src/webp/config.h
this change has the side-effect of using directory names in the
include, silencing a lint warning.

Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
2014-06-10 23:42:00 -07:00
James Zern
90090d99b5 Merge changes I7c675e51,I84f7d785
* changes:
  configure: test for -msse2
  rename upsampling_mips32.c to yuv_mips32.c
2014-06-10 16:15:21 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
James Zern
6e61a3a905 configure: test for -msse2
+ add a WEBP_HAVE_SSE2 to dsp.h

not all 32-bit toolchain configurations will have sse2 enabled by
default

Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
2014-06-07 19:44:08 -07:00
James Zern
b9d2efc629 rename upsampling_mips32.c to yuv_mips32.c
matches yuv_sse2 added in;
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c

Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3
2014-06-07 12:35:47 -07:00
James Zern
bdfeebaa01 dsp/yuv: move sse2 functions to yuv_sse2.c
Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000
2014-06-05 23:52:54 -07:00
pascal massimino
46b32e861a Merge "configure: set WEBP_HAVE_AVX2 when available" 2014-06-05 02:57:42 -07:00
pascal massimino
88305db4fc Merge "VP8RandomBits2: prevent signed int overflow" 2014-06-05 01:46:42 -07:00
James Zern
73fee88c4a VP8RandomBits2: prevent signed int overflow
'diff' at its largest may be INT_MAX; << 1 of anything at or above
1 << 30 will overflow.

Change-Id: Idb2b5a9b55acc2f6d5e32be8baaebee3f89919ad
2014-06-04 23:19:03 -07:00
James Zern
db4860b355 enc_sse2: prevent signed int overflow
_mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed
int.

Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda
2014-06-04 23:18:22 -07:00
James Zern
230a055501 configure: set WEBP_HAVE_AVX2 when available
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called

Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
2014-06-03 23:29:23 -07:00
skal
a2ac8a420e restore original value_/range_ field order
no speed change, just for coherency

Change-Id: Iaa395bca24f33a14b68ba6920b838ef87d0d0db6
2014-06-03 09:36:56 +02:00
James Zern
5e2ee56fdd Merge "remove libwebpdspdecode dep on libwebpdsp_avx2" 2014-06-03 00:28:54 -07:00
James Zern
61362db57c remove libwebpdspdecode dep on libwebpdsp_avx2
it's encode only, libwebpdecoder doesn't need the symbols

Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83
2014-06-03 00:05:56 -07:00
Pascal Massimino
42c447aeb0 Merge "lossy bit-reader clean-up:" 2014-06-02 23:53:00 -07:00
James Zern
479ffd8b5d Merge "remove unused #include's" 2014-06-02 23:07:20 -07:00
James Zern
9754d39a4e Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)" 2014-06-02 23:06:18 -07:00
skal
158aff9bb9 remove unused #include's
Change-Id: Icd91a4b6a0bde49145f57e3e74a997822c45792c
2014-06-03 08:01:05 +02:00
Pascal Massimino
09545eeadc lossy bit-reader clean-up:
* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned(), and add some masking branch-less code. Much
  faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
  is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.

Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
2014-06-03 07:46:55 +02:00
skal
ea8b0a171d strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).

Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
skal
6679f8996f Optimize VP8SetResidualCoeffs.
Brings down WebP lossy encoding timings by 5%

Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489
2014-06-03 06:44:04 +02:00
James Zern
4dfa86b29c dsp/cpu: NaCl has no support for xgetbv
or the raw opcode; fixes:
  934ed4: unrecognized instruction

Change-Id: I981870baf0e8b03bf40144ea8ec25eff140d5bc3
2014-05-29 23:02:23 -07:00
skal
c9b340a279 fix missing WebPInitAlphaProcessing call for premultiplied colorspace output
(lossless only)

Change-Id: Ic2d01c8cf9bc1082f07f348733461eb2ee30288a
2014-05-28 10:44:05 +02:00
pascal massimino
57897bae09 Merge "lossless_neon: use vcreate_*() where appropriate" 2014-05-28 01:36:13 -07:00
pascal massimino
6aa4777b39 Merge "(enc|dec)_neon: use vcreate_*() where appropriate" 2014-05-28 01:34:56 -07:00
skal
0d346e418d Always reinit VP8TransformWHT instead of hard-coding
Change-Id: I2012749ed29bd166d2a96555372f0d9baa784385
2014-05-28 10:21:07 +02:00
James Zern
bf0e003067 lossless_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If7e111864f287ea0a5de6311454aeda37afbb52a
2014-05-27 16:27:46 -07:00
James Zern
9251c2f6d2 (enc|dec)_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17
2014-05-27 16:26:56 -07:00
skal
399b916d27 lossy decoding: correct alpha-rescaling for YUVA format
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.

Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()

Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
2014-05-27 15:27:13 -07:00
Pascal Massimino
fdbcd44dd3 simplify VP8LInitBitReader()
gcc was generating very complex code, one for each case of br->len_ values!

also, pretty-fy the mask constants

Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
2014-05-22 21:44:16 -07:00
James Zern
515e35cfb1 Merge "add stub dsp/enc_avx2.c" 2014-05-22 18:28:38 -07:00
skal
a05dc1402c SSE2: yuv->rgb speed-up for point-sampling
- use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined)
 - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR
 - clean-up and harmonize the WebpUpsamplers[] usage.

Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d
2014-05-22 09:56:47 +02:00
James Zern
178e9a69ae add stub dsp/enc_avx2.c
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled

Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
2014-05-22 00:31:46 -07:00
James Zern
e46a247c87 cpu: fix check for __cpuidex availability
__cpuidex was added in VS2008 /SP1/

Change-Id: Ie49b00b0246bd6537c0ed583412f17d6fd135baa
2014-05-21 22:59:47 -07:00
skal
176fda2650 fix the bit-writer for lossless in 32bit mode
Sometimes, we can write 18bit or more at time, and it would
overflow the 32bit accumulator.

Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)

fixes http://code.google.com/p/webp/issues/detail?id=200

Seems a bit faster (use of local fields for bits_ / used_)

also: added the __QNX__ bswap while at it.

Change-Id: I876db93a931db15b083cf1d838c70105effa7167
2014-05-22 07:19:22 +02:00
James Zern
541784c710 dsp.h: add a check for AVX2 / define WEBP_USE_AVX2
Change-Id: I90cc870f0bb4426af701779c367587dc2ae79c8b
2014-05-21 20:46:28 -07:00
James Zern
bdb151ee80 dsp/cpu: add AVX2 detection
currently unused.
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Change-Id: I314200f890c58b9a587b902b214f90deb95f0579
2014-05-20 22:48:54 -07:00
Pascal Massimino
ab9f2f8685 Merge "revamp the point-sampling functions by processing a full plane" 2014-05-20 15:21:31 -07:00
Pascal Massimino
a2f8b28905 revamp the point-sampling functions by processing a full plane
-nofancy is slower than fancy upsampler, because the latter has SSE2 optim.

Change-Id: Ibf22e5a8ea1de86a54248d4a4ecc63d514c01b88
2014-05-20 15:13:44 -07:00
Pascal Massimino
ef076026af use decoder's DSP functions for autofilter
-af is now faster (6-7%), since we're using the SSE2 variant
Output is binary the same as before.

Change-Id: If75694594c9501cd486b8f237a810ddcc145cadd
2014-05-20 14:55:05 -07:00
pascal massimino
2b5cb32612 Merge "dsp/cpu: add AVX detection" 2014-05-20 01:10:18 -07:00
James Zern
df08e67e06 dsp/cpu: add AVX detection
currently unused.
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

similar checks exist in ffmpeg, libyuv. the visual studio inline asm is
based off of libyuv.

Change-Id: I3e233de3492172434e482607a94b99c617f11aad
2014-05-20 00:25:12 -07:00
Pascal Massimino
e2f405c969 Merge "clean-up and slight speed-up in-loop filtering SSE2" 2014-05-20 00:08:40 -07:00
Pascal Massimino
f60957bfd2 clean-up and slight speed-up in-loop filtering SSE2
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage

Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
2014-05-19 23:23:47 -07:00
James Zern
a577b23a0a dsp/WEBP_USE_NEON: test for __aarch64__
__ARM_NEON__ is unset by current linux gcc/clang + android toolchains
for aarch64/arm64 builds.

Change-Id: Ib2ca172ea6fcf046e4ced19a431088674c99b7f6
2014-05-14 00:07:13 -07:00