Commit Graph

1538 Commits

Author SHA1 Message Date
Djordje Pesut
7ce8788b06 MIPS: dspr2: added optimization for function MakeARGB32
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row

Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
Pascal Massimino
87c3d53180 method=0: Don't evaluate any predictor
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.

For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality
range and about 10% for higher quality range.

Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
2014-12-17 18:41:08 +01:00
Djordje Pesut
6f4fcb983e Merge "MIPS: dspr2: added optimization for function ImportRow" 2014-12-17 09:36:02 -08:00
Pascal Massimino
24284459c7 replace unneeded calls to HistogramCopy() by swaps
most of the time, we don't need to actually move the
data.

Compression is randomly slightly different, because HistogramCompactBins() changed.
Timing is about the same.

Change-Id: Ia6af8e9780581014d6860f2b546189ac817cfad1
2014-12-17 15:32:36 +01:00
Djordje Pesut
bdf7b40c5c MIPS: dspr2: added optimization for function ImportRow
Change-Id: I8205b551755ee51f5efd0c54d64c8b09771786b1
2014-12-17 15:24:41 +01:00
pascal massimino
e66a9225f3 Merge "MIPS: dspr2: added optimization for function ExportRowC" 2014-12-17 04:36:25 -08:00
Djordje Pesut
c279fec192 MIPS: dspr2: added optimization for function ExportRowC
Change-Id: Ie1a303089eb64736f8bc7573819a8219aa7528a3
2014-12-17 12:01:48 +01:00
Pascal Massimino
31a9cf6417 Speedup WebP lossless compression for low effort (m=0) mode with following:
- Disable Cross-Color transform.
- Evaluate predictors #11 (paeth), #12 and #13 only.

Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed
2014-12-17 11:52:11 +01:00
Djordje Pesut
9275d91c79 MIPS: dspr2: added optimization for function TrueMotion
Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56
2014-12-12 14:38:55 +01:00
pascal massimino
26106d662e Merge "enc_neon: fix building with non-Xcode clang (iOS)" 2014-12-11 02:25:25 -08:00
Pascal Massimino
1c4e3efea0 unroll the kBands[] indirection to remove a dereference in GetCoeffs()
speed-up is small but visible.

Change-Id: Icff546adc3276f3c3d46b147c4a735b5eb8ff22e
2014-12-11 08:06:20 +01:00
James Zern
a3946b8956 enc_neon: fix building with non-Xcode clang (iOS)
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.

Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2014-12-10 15:50:26 -08:00
Pascal Massimino
8ed9c00d5e Merge "simplify the Histogram struct, to only store max_value and last_nz" 2014-12-10 02:02:05 -08:00
Pascal Massimino
bad775715a simplify the Histogram struct, to only store max_value and last_nz
we don't need to store the whole distribution in order to compute the alpha

Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.

Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
Djordje Pesut
3cca0dc7f0 MIPS: dspr2: Added optimization for DCMode function
Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1
2014-12-09 13:58:39 +01:00
Djordje Pesut
37e395fd1c MIPS: fix functions to use generic BPS istead of hardcoded value
Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd
2014-12-05 15:55:47 +01:00
James Zern
9475bef4d7 PickBestUV: fix VP8Copy16x8 invocation
param order is src, dst

broken in:
66ad372 factorize BPS definition in dsp.h and add VP8Copy16x8

Change-Id: I761f618e3fe31ae7f58953256381f4f16bdb238e
2014-12-04 23:12:30 -08:00
James Zern
441f273f19 Merge changes I55f8da52,Id73a1e96
* changes:
  cosmetics: add some missing != NULL comparisons
  factorize BPS definition in dsp.h and add VP8Copy16x8
2014-12-04 20:46:29 -08:00
Pascal Massimino
4a279a680e cosmetics: add some missing != NULL comparisons
Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904
2014-12-04 14:54:11 +01:00
Pascal Massimino
66ad372500 factorize BPS definition in dsp.h and add VP8Copy16x8
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
Pascal Massimino
432e5b550e make ALIGN_xxx naming consistent
(potentially for future factorization between enc/ and dec/)

Change-Id: Ibf6670e21433a6a6a7202dcbe76f7efc8493b8cf
2014-12-04 13:32:10 +01:00
Pascal Massimino
57606047ec encoder: switch BPS to 32 instead of 16
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/

With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.

Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
Djordje Pesut
1b66bbe998 MIPS: dspr2: added optimization for function TransformColor_C
Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36
2014-12-01 15:46:06 +01:00
James Zern
c6d0f9e758 histogram: cosmetics
fix indent + other minor spelling / whitespace changes

Change-Id: I6e4462b75c98994e3c53c115de07047dbe71ce3c
2014-11-25 15:53:19 -08:00
James Zern
f399d30764 Merge changes I6eac17e5,I32d2b514
* changes:
  dec_neon: add TM8uv
  dsp: initialize VP8PredChroma8 in VP8DspInit()
2014-11-25 15:32:14 -08:00
James Zern
9de9074c92 dec_neon: add TM8uv
~68% faster

reuses TM4() adding support for the additional rows, the columns were
already being done.

Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
8e517eca68 bit_reader/kVP8NewRange: range_t -> uint8_t
decreases the size of each entry from 4 bytes to 1.

Change-Id: I3e6a50bcbc279e5edfa411edb97b04300dedc7ae
2014-11-24 22:16:26 -08:00
James Zern
e18571393d dsp: initialize VP8PredChroma8 in VP8DspInit()
the table becomes non-const to allow for platform-specific optimizations

Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
Vikas Arora
e0c809ad23 Move Entropy methods to lossless.c
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge few (static) methods and reduce the code size.

This change has no impact on the compression speed/density.

Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
2014-11-20 13:48:05 -08:00
Vikas Arora
a0df55104e Remove handling for WEBP_HINT_GRAPH
Remove handling for WEBP_HINT_GRAPH w.r.t use_palette flag.

The WEBP_HINT_GRAPH is now used at one place, to set the initial size of the
Bit Writer as bpp for photo images are generally larger than the graphical
images.

Change-Id: I1b9c4436c85a8f69da74c0dbcd292397323f2696
2014-11-13 15:49:23 -08:00
Vikas Arora
413dfc0c4b Move static method definition before its usage.
Change-Id: Id766c2bea92e7ebf0de65046f73429b74b4fdda4
2014-11-13 13:18:30 -08:00
Vikas Arora
0f23566558 Update BackwardRefsWithLocalCache.
Update BackwardRefsWithLocalCache to do in-place update of backward
references w.r.t local color cache index.

No impact on the compression density or compression speed.

Change-Id: Ie066251464c3928c044e037b43df3af28b48ca30
2014-11-13 11:54:26 -08:00
Vikas Arora
d69e36ec59 Remove TODOs from lossless encoder code.
histogram.c:
 - Verified (earlier) that there's low correlation between Red & Blue colors
   (particularly after applying Cross-color transform). The Bin based histogram
   merge, bins on three entropies viz literal, red & blue symbols. Removing
   either of blue or red increases the compression density. So keeping the bins
   for red & blue sybmols.
 - Keeping the compact bins method as-is. This way it's simpler to read.
huffman_encode.h: Added field comments for struct HuffmanTree and removed the TODO.

Change-Id: Ia76f7bc730079d1b3b644038c5d9931db3797f0e
2014-11-12 16:10:16 -08:00
Vikas Arora
fdaac8e0ca Optmize VP8LGetBackwardReferences LZ77 references.
Use the refs_lz77 computed (with cache_bits=0) in the method 'CalculateBestCacheSize'
to regenerate the LZ77 references corresponding to the optimum cache_bits and avoid
calling costly 'BackwardReferencesLz77' one extra time.

This change leaves the compression density unchanged and speeds up compression
by 10-15%.

Change-Id: I5a92e11788d3c3f656aa7e1fba54fb5d96ee0027
2014-11-12 14:50:04 -08:00
Djordje Pesut
2f0e2ba826 MIPS: dspr2: added optimization for function Select
Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d
2014-11-07 09:44:16 +01:00
pascal massimino
a3e79a46f6 Merge "WebPEncode: Support encoding same pic twice (even if modified)" 2014-11-06 22:20:01 -08:00
Urvang Joshi
e4f4dddba3 WebPEncode: Support encoding same pic twice (even if modified)
This wasn't working for this specific scenario:
- Encode an RGBA 'pic' (with trivial alpha) using lossy encoding.
(so that pic->a == NULL after import happens).
- Modify the 'pic->argb' so that it has non-trivial alpha.
- Encode the same 'pic' again.
This used to fail to encode alpha data as pic->a == NULL.

Change-Id: Ieaaa7bd09825c42f54fbd99e6781d98f0b19cc0c
2014-11-06 13:52:48 -08:00
pascal massimino
cbc3fbb4d7 Merge "Updated VP8LGetBackwardReferences and color cache." 2014-11-06 13:47:21 -08:00
Vikas Arora
95a9bd85c4 Updated VP8LGetBackwardReferences and color cache.
- The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'.
- The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache
  bits to use (passing 0 implies disabling the local color cache).
- The local color cache is disabled for lowerf (<= 25) quality levels (as before).
- Enabled local color cache for palette images as well. This saves additional
  0.017% bytes with a slight (2-3%) improvement in the compression speed.
- Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as
  this option is not an option now (after we freeze the lossless bit-stream).

Change-Id: I33430401e465474fa1be899f330387cd2b466280
2014-11-06 13:14:05 -08:00
Djordje Pesut
54f2c14cce MIPS: dspr2: added optimization for function FTransform
Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78
2014-11-06 14:21:33 +01:00
Djordje Pesut
aa42f4231f MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed
Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918
2014-11-06 10:56:18 +01:00
Djordje Pesut
95ca44a718 MIPS: dspr2: added optimization for Disto4x4
enc/dec common macros moved to mips_macro.h

Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1
2014-11-05 12:06:15 +01:00
James Zern
4171b6724e backward_references.c: reindent after c8581b0
Change-Id: Icfc0fe8e266c0f67a70b8cb095e5aaee155290b6
2014-11-04 17:40:04 +01:00
Vikas Arora
c8581b06e1 Optimize BackwardReferences for RLE encoding.
Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.

For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.

Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
2014-11-04 08:12:07 -08:00
Djordje Pesut
5798eee6be MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
Vikas Arora
4167a3f5f7 Optimize backwardreferences
Optimize backwardreferences (about 0.1% byte savings) with almost same
compression speed (3% faster on defaut compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
    - Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
    - Removed 'if' conditions for corner cases in the main loop.
    - Refactored the method(AddSingleLiteral) for adding one pixel.

Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
2014-10-31 18:08:38 -07:00
James Zern
d18554c30d Merge "webp/types.h: use inline for clang++/-std=c++11" 2014-10-31 03:53:06 -07:00
Vikas Arora
77bdddf016 Speed up BackwardReferences
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.

Overall compression speedup observed is around 5% (+/- noise).

Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
2014-10-30 10:58:24 -07:00
James Zern
6638710b9e webp/types.h: use inline for clang++/-std=c++11
at least clang 3.[45] in c++ mode with -std=c++11 define __STRICT_ANSI__
this change set WEBP_INLINE to inline for c++/non-strict-ansi/> c99

fixes crbug.com/428383

Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
2014-10-30 15:25:27 +01:00
Vikas Arora
abf04205b3 Enable entropy based merge histo for (q<100)
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).

This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.

Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
2014-10-30 03:59:36 -07:00