Commit Graph

1371 Commits

Author SHA1 Message Date
skal
bbe32df1e3 add alpha dithering for lossy
new options:
 dwebp -alpha_dither
 vwebp -noalphadither

When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.

The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.

Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
2014-06-14 00:06:16 +02:00
skal
790207679d Merge "make error-code reporting consistent upon malloc failure" 2014-06-13 00:25:30 -07:00
skal
77bf4410f7 make error-code reporting consistent upon malloc failure
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)

Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)

Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
2014-06-13 08:45:12 +02:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
skal
24e3080571 Add an interface abstraction to the WebP worker thread implementation
This allows custom implementations of threading mecanism.

Patch by Leonhard Gruenschloss.

Change-Id: Id8ea5917acd2f24fa8bce79748d1747de2751614
2014-06-12 11:35:44 +02:00
James Zern
32b3137936 configure: move config.h to src/webp/config.h
this change has the side-effect of using directory names in the
include, silencing a lint warning.

Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
2014-06-10 23:42:00 -07:00
James Zern
90090d99b5 Merge changes I7c675e51,I84f7d785
* changes:
  configure: test for -msse2
  rename upsampling_mips32.c to yuv_mips32.c
2014-06-10 16:15:21 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
James Zern
6e61a3a905 configure: test for -msse2
+ add a WEBP_HAVE_SSE2 to dsp.h

not all 32-bit toolchain configurations will have sse2 enabled by
default

Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
2014-06-07 19:44:08 -07:00
James Zern
b9d2efc629 rename upsampling_mips32.c to yuv_mips32.c
matches yuv_sse2 added in;
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c

Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3
2014-06-07 12:35:47 -07:00
James Zern
bdfeebaa01 dsp/yuv: move sse2 functions to yuv_sse2.c
Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000
2014-06-05 23:52:54 -07:00
pascal massimino
46b32e861a Merge "configure: set WEBP_HAVE_AVX2 when available" 2014-06-05 02:57:42 -07:00
pascal massimino
88305db4fc Merge "VP8RandomBits2: prevent signed int overflow" 2014-06-05 01:46:42 -07:00
James Zern
73fee88c4a VP8RandomBits2: prevent signed int overflow
'diff' at its largest may be INT_MAX; << 1 of anything at or above
1 << 30 will overflow.

Change-Id: Idb2b5a9b55acc2f6d5e32be8baaebee3f89919ad
2014-06-04 23:19:03 -07:00
James Zern
db4860b355 enc_sse2: prevent signed int overflow
_mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed
int.

Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda
2014-06-04 23:18:22 -07:00
James Zern
230a055501 configure: set WEBP_HAVE_AVX2 when available
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called

Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
2014-06-03 23:29:23 -07:00
skal
a2ac8a420e restore original value_/range_ field order
no speed change, just for coherency

Change-Id: Iaa395bca24f33a14b68ba6920b838ef87d0d0db6
2014-06-03 09:36:56 +02:00
James Zern
5e2ee56fdd Merge "remove libwebpdspdecode dep on libwebpdsp_avx2" 2014-06-03 00:28:54 -07:00
James Zern
61362db57c remove libwebpdspdecode dep on libwebpdsp_avx2
it's encode only, libwebpdecoder doesn't need the symbols

Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83
2014-06-03 00:05:56 -07:00
Pascal Massimino
42c447aeb0 Merge "lossy bit-reader clean-up:" 2014-06-02 23:53:00 -07:00
James Zern
479ffd8b5d Merge "remove unused #include's" 2014-06-02 23:07:20 -07:00
James Zern
9754d39a4e Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)" 2014-06-02 23:06:18 -07:00
skal
158aff9bb9 remove unused #include's
Change-Id: Icd91a4b6a0bde49145f57e3e74a997822c45792c
2014-06-03 08:01:05 +02:00
Pascal Massimino
09545eeadc lossy bit-reader clean-up:
* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned(), and add some masking branch-less code. Much
  faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
  is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.

Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
2014-06-03 07:46:55 +02:00
skal
ea8b0a171d strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).

Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
skal
6679f8996f Optimize VP8SetResidualCoeffs.
Brings down WebP lossy encoding timings by 5%

Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489
2014-06-03 06:44:04 +02:00
James Zern
4dfa86b29c dsp/cpu: NaCl has no support for xgetbv
or the raw opcode; fixes:
  934ed4: unrecognized instruction

Change-Id: I981870baf0e8b03bf40144ea8ec25eff140d5bc3
2014-05-29 23:02:23 -07:00
skal
c9b340a279 fix missing WebPInitAlphaProcessing call for premultiplied colorspace output
(lossless only)

Change-Id: Ic2d01c8cf9bc1082f07f348733461eb2ee30288a
2014-05-28 10:44:05 +02:00
pascal massimino
57897bae09 Merge "lossless_neon: use vcreate_*() where appropriate" 2014-05-28 01:36:13 -07:00
pascal massimino
6aa4777b39 Merge "(enc|dec)_neon: use vcreate_*() where appropriate" 2014-05-28 01:34:56 -07:00
skal
0d346e418d Always reinit VP8TransformWHT instead of hard-coding
Change-Id: I2012749ed29bd166d2a96555372f0d9baa784385
2014-05-28 10:21:07 +02:00
James Zern
bf0e003067 lossless_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If7e111864f287ea0a5de6311454aeda37afbb52a
2014-05-27 16:27:46 -07:00
James Zern
9251c2f6d2 (enc|dec)_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17
2014-05-27 16:26:56 -07:00
skal
399b916d27 lossy decoding: correct alpha-rescaling for YUVA format
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.

Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()

Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
2014-05-27 15:27:13 -07:00
Pascal Massimino
fdbcd44dd3 simplify VP8LInitBitReader()
gcc was generating very complex code, one for each case of br->len_ values!

also, pretty-fy the mask constants

Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
2014-05-22 21:44:16 -07:00
James Zern
515e35cfb1 Merge "add stub dsp/enc_avx2.c" 2014-05-22 18:28:38 -07:00
skal
a05dc1402c SSE2: yuv->rgb speed-up for point-sampling
- use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined)
 - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR
 - clean-up and harmonize the WebpUpsamplers[] usage.

Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d
2014-05-22 09:56:47 +02:00
James Zern
178e9a69ae add stub dsp/enc_avx2.c
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled

Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
2014-05-22 00:31:46 -07:00
James Zern
e46a247c87 cpu: fix check for __cpuidex availability
__cpuidex was added in VS2008 /SP1/

Change-Id: Ie49b00b0246bd6537c0ed583412f17d6fd135baa
2014-05-21 22:59:47 -07:00
skal
176fda2650 fix the bit-writer for lossless in 32bit mode
Sometimes, we can write 18bit or more at time, and it would
overflow the 32bit accumulator.

Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)

fixes http://code.google.com/p/webp/issues/detail?id=200

Seems a bit faster (use of local fields for bits_ / used_)

also: added the __QNX__ bswap while at it.

Change-Id: I876db93a931db15b083cf1d838c70105effa7167
2014-05-22 07:19:22 +02:00
James Zern
541784c710 dsp.h: add a check for AVX2 / define WEBP_USE_AVX2
Change-Id: I90cc870f0bb4426af701779c367587dc2ae79c8b
2014-05-21 20:46:28 -07:00
James Zern
bdb151ee80 dsp/cpu: add AVX2 detection
currently unused.
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Change-Id: I314200f890c58b9a587b902b214f90deb95f0579
2014-05-20 22:48:54 -07:00
Pascal Massimino
ab9f2f8685 Merge "revamp the point-sampling functions by processing a full plane" 2014-05-20 15:21:31 -07:00
Pascal Massimino
a2f8b28905 revamp the point-sampling functions by processing a full plane
-nofancy is slower than fancy upsampler, because the latter has SSE2 optim.

Change-Id: Ibf22e5a8ea1de86a54248d4a4ecc63d514c01b88
2014-05-20 15:13:44 -07:00
Pascal Massimino
ef076026af use decoder's DSP functions for autofilter
-af is now faster (6-7%), since we're using the SSE2 variant
Output is binary the same as before.

Change-Id: If75694594c9501cd486b8f237a810ddcc145cadd
2014-05-20 14:55:05 -07:00
pascal massimino
2b5cb32612 Merge "dsp/cpu: add AVX detection" 2014-05-20 01:10:18 -07:00
James Zern
df08e67e06 dsp/cpu: add AVX detection
currently unused.
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

similar checks exist in ffmpeg, libyuv. the visual studio inline asm is
based off of libyuv.

Change-Id: I3e233de3492172434e482607a94b99c617f11aad
2014-05-20 00:25:12 -07:00
Pascal Massimino
e2f405c969 Merge "clean-up and slight speed-up in-loop filtering SSE2" 2014-05-20 00:08:40 -07:00
Pascal Massimino
f60957bfd2 clean-up and slight speed-up in-loop filtering SSE2
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage

Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
2014-05-19 23:23:47 -07:00
James Zern
a577b23a0a dsp/WEBP_USE_NEON: test for __aarch64__
__ARM_NEON__ is unset by current linux gcc/clang + android toolchains
for aarch64/arm64 builds.

Change-Id: Ib2ca172ea6fcf046e4ced19a431088674c99b7f6
2014-05-14 00:07:13 -07:00
skal
54bfffcabc move RemapBitReader() from idec.c to bit_reader code
mostly for coherency and later patch.

Change-Id: Ica8352d67845b6c5b3153435edfb4646c6f24341
2014-05-14 07:07:08 +02:00
James Zern
34168ecbe4 Merge "remove all unused layer code" 2014-05-08 22:51:13 -07:00
Pascal Massimino
f1e771735a remove all unused layer code
Change-Id: I220590162b24c70f404fe3087f19dd3e6cac3608
2014-05-08 22:37:38 -07:00
Vikas Arora
b0757db7c6 Code cleanup for VP8LGetHistoImageSymbols.
Fix comments and few nits.
Change-Id: I8fa25ed523f12c6a7bfe125f0e4d638466ba4304
2014-05-08 14:13:47 -07:00
skal
5fe628d35d make the token page size be variable instead of fixed 8192
also changed the token-page layout a little bit to remove
a not-needed field.

This reduces the number of malloc()/free() calls substantially
with minimal increase in memory consumption (~2%).
For the tail of large sources, the number of malloc calls goes
typically from ~10000 to ~100 (e.g.: bryce_big.jpg: 22711 -> 105)

Change-Id: Ib847f41e618ed8c303d26b76da982fbc48de45b9
2014-05-05 14:26:14 -07:00
skal
f948d08c81 memory debug: allow setting pre-defined malloc failure points
MALLOC_FAIL_AT flag can be used to set-up a pre-determined failure
point during malloc calls. The counter value is retrieved using
getenv().
Example usage: export MALLOC_FAIL_AT=37 && cwebp input.png
will make 'cwebp' report a memory allocation error the 37th time
malloc() or calloc() is called.

MALLOC_MEM_LIMIT can be used similarly to prevent allocating more
than a given amount of memory. This is usually less convenient to
use than MALLOC_FAIL_AT since one has to know in advance the typical
memory size allocated.

Both these flags are meant to be used for debugging only!

Also: added a 'total_mem_allocated' to record the overall memory allocated

Change-Id: I9d408095ee7d76acba0f3a31b1276fc36478720a
2014-05-05 14:01:33 -07:00
skal
ca3d746e39 use block-based allocation for backward refs storage, and free-lists
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.

This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)

Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
2014-05-05 11:11:55 -07:00
James Zern
1ba61b09f9 enable NEON intrinsics in aarch64 builds
avoids functions that use vtbl? as in iOS builds these are marked
unavailable

Change-Id: I17aedc3c7dc8f1d5be0941205de0b22c3772ef1b
2014-05-03 12:37:42 -07:00
James Zern
b9d2bb67d6 dsp/neon.h: coalesce intrinsics-related defines
Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946
2014-05-03 11:34:07 -07:00
Vikas Arora
9383afd5c7 Reduce number of memory allocations while decoding lossless.
This change reduces the number of calls to WebPSafeMalloc from 200 to
100. The overall memory consumption is down 3% for Lenna image.

Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd
2014-05-02 01:01:43 -07:00
James Zern
888e63edc9 Merge "dsp/lossless: prevent signed int overflow in left shift ops" 2014-05-02 00:29:54 -07:00
Pascal Massimino
8137f3edbd Merge "instrument memory allocation routines for debugging" 2014-05-02 00:23:48 -07:00
Pascal Massimino
2aa187360d instrument memory allocation routines for debugging
Some tracing code is activated by PRINT_MEM_INFO flag.
For debugging only! (not thread-safe, and slow).

Change-Id: I282c623c960f97d474a35b600981b761ef89ace9
2014-05-02 00:19:55 -07:00
skal
d3bcf72bf5 Don't allocate VP8LHashChain, but treat like automatic object
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.

Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
2014-04-30 14:10:48 -07:00
James Zern
bd6b8619dd dsp/lossless: prevent signed int overflow in left shift ops
force unsigned when shifting by 24.

Change-Id: I453601f33fdf01c516ef66ad23399ae6cbe032b3
2014-04-30 00:10:49 -07:00
James Zern
b7f19b8311 Merge "dec/vp8l: prevent signed int overflow in left shift ops" 2014-04-29 15:56:54 -07:00
Pascal Massimino
29059d5178 Merge "remove some uint64_t casts and use." 2014-04-29 14:15:40 -07:00
James Zern
e69a1df4b7 dec/vp8l: prevent signed int overflow in left shift ops
force unsigned when shifting by 24.

Change-Id: I6f9ca5fa2109e59b1d46a909136384fc6dc8ca0b
2014-04-29 14:12:38 -07:00
Pascal Massimino
cf5eb8ad19 remove some uint64_t casts and use.
We use automatic int->uint64_t promotion where applicable.

(uint64_t should be kept only for overflow checking and memory alloc).

Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
2014-04-29 09:08:25 -07:00
Djordje Pesut
38e2db3e16 MIPS: MIPS32r1: Added optimization for HistogramAdd.
Change-Id: I39622a9c340c4090f64dd10e515c4ef2aa21d10a
2014-04-29 08:36:51 -07:00
Vikas Arora
6d6865f0db Added SSE2 variants for Average2/3/4
The predictors based on Average2 are tad slower.

Following is the performance data for these predictors normalized to
number of instruction cycles (as per valgrind) per operation:
- Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11
  instruction cycles for the C version.
- Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12
  instruction cycles for the C version.

The predictors based on Average4 is faster and Average3 is tad slower:
- Predictor10 (Average4) now takes 23 instruction cycles compared to 25
  instruction cycles for the C version.
- Predictor5 (Average3) now takes 20 instruction cycles compared to 18
  instruction cycles for the C version.

Maybe SSE2 version of Average2 can be improved further. Otherwise, we can
remove the SSE2 version and always fallback to the C version.

Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029
2014-04-28 14:47:30 -07:00
Pascal Massimino
b3a616b356 make HistogramAdd() a pointer in dsp
* merged the two HistogramAdd/AddEval() into a single call
  (with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
  of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster

Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
2014-04-28 10:09:34 -07:00
James Zern
c8bbb636ea dec_neon: relocate some inline-asm defines
move simple loop filter defines closer to their use and LOAD* to a
location common with the intrinsics

Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3
2014-04-28 00:41:42 -07:00
James Zern
4e393bb9f1 dec_neon: enable intrinsics-only functions
the complex loop filter has no inline equivalent; the simple loop filter
remains conditional on USE_INTRINSICS: it's left undefined for now.

Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2
2014-04-28 00:39:46 -07:00
James Zern
ba99a922ab dec_neon: use positive tests for USE_INTRINSICS
makes Simple* layout consistent with the rest of the file

Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6
2014-04-28 00:38:47 -07:00
James Zern
a7828e8bdb dec_neon: make WORK_AROUND_GCC conditional on version
Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db
2014-04-28 00:01:19 -07:00
pascal massimino
3f3d717a6c Merge "enc_neon: enable intrinsics-only functions" 2014-04-27 02:05:53 -07:00
pascal massimino
de3cb6c820 Merge "move LOCAL_GCC_VERSION def to dsp.h" 2014-04-27 02:04:08 -07:00
pascal massimino
ca49e7ad97 Merge "enc_neon: move Transpose4x4 to dsp/neon.h" 2014-04-27 01:11:05 -07:00
Pascal Massimino
ad900abddd Merge "fix warning about size_t -> int conversion" 2014-04-27 01:07:03 -07:00
Pascal Massimino
4825b4360d fix warning about size_t -> int conversion
+ re-order and add some const

Change-Id: I3746520b75699e56e20835d10d1dd9cd9fd6d85d
2014-04-27 00:50:07 -07:00
James Zern
42b35e086b enc_neon: enable intrinsics-only functions
CollectHistogram / SSE* / QuantizeBlock have no inline equivalents,
enable them where possible and use USE_INTRINSICS to control borderline
cases: it's left undefined for now.

Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155
2014-04-26 19:09:04 -07:00
James Zern
f937e01261 move LOCAL_GCC_VERSION def to dsp.h
+ add LOCAL_GCC_PREREQ and use it in lossless_neon.c

Change-Id: Ic9fd99540bc3e19c027d1598e4530dfdc9b9de00
2014-04-26 19:09:04 -07:00
James Zern
5e1a17ef4b enc_neon: move Transpose4x4 to dsp/neon.h
+ reuse it in TransformWHT()

Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989
2014-04-26 14:06:04 -07:00
James Zern
c7b92a5a29 dec_neon: (WORK_AROUND_GCC) delete unused Load4x8
using this in Load4x16 was slightly slower and didn't help mitigate any
of the remaining build issues with 4.6.x.

Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51
2014-04-26 12:36:14 -07:00
Vikas Arora
0b896101b4 Reduce memory footprint for encoding WebP lossless.
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.

Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.

Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.

Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
2014-04-26 01:14:33 -07:00
Djordje Pesut
1d62acf6af MIPS: MIPS32r1: Added optimization for HuffmanCost functions.
HuffmanCost and HuffmanCostCombined optimized and added
'const' to some variables from ExtraCost functions.

Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d
2014-04-24 11:14:57 +02:00
James Zern
c0220460e9 Merge "Bugfix: Incremental decode of lossy-alpha" 2014-04-22 16:33:12 -07:00
Urvang Joshi
8c7cd722f6 Bugfix: Incremental decode of lossy-alpha
When remapping buffer, br->eos_ was wrongly being set to true for
certain
images.

Also, refactored the end-of-stream detection as a function.

Reported in http://crbug.com/364830

Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84
2014-04-22 16:06:32 -07:00
Djordje Pesut
7955152d58 MIPS: fix error with number of registers.
Some versions of compiler in debug build can't find
a register in class 'GR_REGS' while reloading 'asm'

Number of used registers is decreased in this fix.

Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065
2014-04-22 12:06:45 +02:00
skal
b1dabe3767 Merge "Move the HuffmanCost() function to dsp lib" 2014-04-18 12:08:22 -07:00
skal
75b12006e3 Move the HuffmanCost() function to dsp lib
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)

There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.

Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
2014-04-18 11:59:48 -07:00
Djordje Pesut
2772b8bd98 MIPS: fix assembler error revealed by clang's debug build
.set at -  Indicates that macro expansions may clobber
           the assembler temporary ($at or $28) register.
           Some macros may not be expanded without this
           and will generate an error message if noat
           is in effect.

"at" also added to the clobber list.

Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d
2014-04-18 18:10:52 +02:00
James Zern
6653b601ef enc_mips32: fix unused symbol warning in debug
move kC1 / kC2 under __OPTIMIZE__
missed in:
8dec120 enc_mips32: disable ITransform(One) in debug builds

Change-Id: Ic9a12e6d73090c8c06b0e7a4bc56dd9c76b8e596
2014-04-17 23:35:36 -07:00
James Zern
8dec120975 enc_mips32: disable ITransform(One) in debug builds
avoids:
src/dsp/enc_mips32.c: In function 'ITransformOne':
src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm'
src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints

Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d
2014-04-17 20:10:31 -07:00
James Zern
98519dd5c1 enc_neon: convert Disto4x4 to intrinsics
Change-Id: I0f00d5af2de2301e8237c2a38a9612d3645abad6
2014-04-17 18:29:31 -07:00
Pascal Massimino
fe9317c9bf cosmetics:
* remove MIPS32 suffix from static function names
* fix a long line in enc_neon.c

Change-Id: Ia1294ae46f471b3eb1e9ba43c6aa1b29a7aeb447
2014-04-16 00:36:19 -07:00
James Zern
953b074677 enc_neon: cosmetics
fix/remove incorrect comments
+ whitespace

Change-Id: Id1b86beb23e5bf946e73c34ab7066b6ca177f33b
2014-04-15 23:57:03 -07:00
skal
a9fc697cb6 Merge "WIP: extract the float-calculation of HuffmanCost from loop" 2014-04-15 11:33:11 -07:00
skal
3f84b5219d Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)" 2014-04-15 07:09:12 -07:00