67 Commits

Author SHA1 Message Date
Pascal Massimino
58410cd6dc fix bug in RefineUsingDistortion()
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)

We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.

Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
(cherry picked from commit 0a3838ca77c515ace2c49738f6976dc8aa3e136c)
2016-12-08 15:48:16 -08:00
Pascal Massimino
e168af8c6c fix filtering auto-adjustment
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).

Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.

Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
(cherry picked from commit e4cd4daf746a03aec9fd709ece756e6d39740aff)
2016-12-08 15:48:08 -08:00
James Zern
9629f4bcda SimplifySegments: quiet -Warray-bounds warning
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
this is similar to the changes made in:
c8a87bb AssignSegments: quiet -Warray-bounds warning
3e7f34a AssignSegments: quiet array-bounds warning

Change-Id: Iec7d470be424390c66f769a19576021d0cd9a2fd
2016-05-02 12:17:49 -07:00
Pascal Massimino
e88c4ca013 fix -m 2 mode-cost evaluation (causing partition0 overflow)
The mode's bits were not taken into account, which is ok for most of cases.
But in case of super large image, with 'easy' content, their overhead starts
mattering a lot and we were omitting to optimize for these.
Now, these mode bits have their own lambda values associated, limiting
the jerkiness. We also limit (for -m 2 only) the individual number of bits
to something that will prevent the partition 0 overflow.

removed the I4_PENALTY constant, which was a rather crude approximation.
Replaced by some q-dependent expression.

fixes issue #289

Change-Id: I956ae2d2308c339adc4706d52722f0bb61ccf18c
2016-03-11 20:34:45 +01:00
Pascal Massimino
367bf903b3 fix PrintBlockInfo()
... which has gone out of sync since the last block-cache layout change.

Change-Id: Ic441ec07b0198b508ce3fd34ab582cb60b1daabc
2015-12-17 15:47:25 +01:00
Pascal Massimino
9cf1cc2bd6 remove few TODO:
* 256 -> RD_DISTO_MULT
  * don't use TDisto for UV mode picking

Change-Id: I243148c716fe688b5c1b1fb9b7a6e58d0b5e6835
2015-12-15 22:52:12 -08:00
Pascal Massimino
e6c9351918 add disto-based refinement for UV mode (if method = 1 or 2)
This doesn't slow down much and give some quality improvement.

Change-Id: I5afbe62b9c3922b3ec1bf6538c68dcdb0f25d2e4
2015-12-11 03:15:59 -08:00
Pascal Massimino
14d27a46be improve method #2 by merging DistoRefine() and
SimpleQuantize()

it's now a single function, that reconstructs the intra4x4 block during the scan
The I4_PENALTY had to be adjusted.

Overall, result is better quality-wise (esp. at q < 50), and a tad faster too.

method #0, #1 and #3+ are unchanged

Change-Id: If262aeb552397860b3dd532df8df6b1357779222
2015-12-10 08:04:04 +01:00
skal
ac76801159 introduce FTransform2 to perform two transforms at a time.
FTransform goes from ~12.0% to 11.5% total CPU time.

Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
Pascal Massimino
82d980209b add a dec/common.h header to collect common enc/dec #defines
had to rename few structs.

-> we can now include both vp8i.h and vp8enci.h without naming
conflicts.

Change-Id: Ib41b498f1b57aab3d6b796361afc45210ec75174
2015-03-31 22:17:58 -07:00
Pascal Massimino
2382050748 1-2% faster encoding by removing an indirection in GetResidualCost()
The MIPS code for cost is not updated yet, that's why i keep Residual::*cost
around for now. Should be removed in favor of *costs later.

Change-Id: Id1d09a8c37ea8c5b34ad5eb8811d6a3ec6c4d89f
2015-02-19 08:44:35 +01:00
James Zern
9475bef4d7 PickBestUV: fix VP8Copy16x8 invocation
param order is src, dst

broken in:
66ad372 factorize BPS definition in dsp.h and add VP8Copy16x8

Change-Id: I761f618e3fe31ae7f58953256381f4f16bdb238e
2014-12-04 23:12:30 -08:00
Pascal Massimino
66ad372500 factorize BPS definition in dsp.h and add VP8Copy16x8
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
Pascal Massimino
57606047ec encoder: switch BPS to 32 instead of 16
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/

With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.

Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
skal
a48a2d7635 ~3-5% faster encoding optimizing PickBestIntra*()
* Add early-out check for Intra16
* replace some memcpy() by pointer swap

Change-Id: I5edc5f7fbc8e39984deb48e6c045c97c61418589
2014-09-01 14:40:25 +02:00
skal
e632b0929b fix indentation
Change-Id: I2294a6c83e5f345f64bd5120b91532e00ed6c543
2014-08-25 23:52:09 -07:00
skal
73d361dd5f introduce VP8EncQuantize2Blocks to quantize two blocks at a time
No speed diff for now. We might reorder better the instructions later,
to speed things up.

Change-Id: I1949525a0b329c7fd861b8dbea7db4b23d37709c
2014-08-25 20:21:42 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
skal
869eaf6c60 ~30% encoding speedup: use NEON for QuantizeBlock()
also revamped the signature to avoid having to pass the 'first' parameter

Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5
2014-04-08 03:08:22 -07:00
James Zern
fbed36433d Merge "dsp: reuse wht transform from dec in encoder" 2014-03-26 15:13:07 -07:00
skal
d1b33ad58b 2-5% faster trellis with clang/MacOS
(and ~2-3% on ARM)

We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.

Also made the 'Node' structure tighter.

Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
2014-03-26 22:33:01 +01:00
James Zern
df230f2723 dsp: reuse wht transform from dec in encoder
Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717
2014-03-22 13:25:08 -07:00
Pascal Massimino
59daf08362 Merge "cosmetics:" 2014-03-18 04:02:33 -07:00
Pascal Massimino
536220084c cosmetics:
- use VP8ScanUV, separate from VP8Scan[] (for luma)
 - fix indentation
 - few missing consts
 - change TrellisQuantizeBlock() signature

Change-Id: I94b437d791cbf887015772b5923feb83dd145530
2014-03-18 03:34:56 -07:00
skal
30176619c6 4-5% faster trellis by removing some unneeded calculations.
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)

Output is bitwise the same as before.

Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
2014-03-06 15:57:25 +01:00
skal
82af82644b few cosmetics after patch #69079
Change-Id: Ifa758420421b5a05825a593f6b43504887603ee7
2014-03-03 23:53:08 +01:00
skal
5aeeb087d6 5-10% encoding speedup with faster trellis (-m 6)
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables

Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
2014-03-03 22:07:06 +01:00
Pascal Massimino
4287d0d49b speed-up trellis quant (~5-10% overall speed-up)
store costs[] in node instead of context

Change-Id: I6aeb0fd94af9e48580106c41408900fe3467cc54
also: various cosmetics
2014-02-27 00:06:00 -08:00
Pascal Massimino
390c8b316d lossy encoding: ~3% speed-up
incorporate non-last cost in per-level cost table

also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.

Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
2014-02-26 05:52:24 -08:00
skal
0235d5e44b 1-2% faster quantization in SSE2
C-version is a bit faster too (sub-1% faster on ARM)

Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97
2014-02-13 15:55:30 -08:00
Pascal Massimino
495bef413d fix bug in TrellisQuantize
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)

Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
2013-12-17 11:08:01 -08:00
James Zern
5227d99146 drop: ifdef __cplusplus checks from C files
the prototypes are already marked in the headers

Change-Id: I172fe742200c939ca32a70a2299809b8baf9b094
2013-12-13 11:42:13 -08:00
skal
73b731fb42 introduce a special quantization function for WHT
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.

Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
2013-12-10 14:21:47 +01:00
skal
a3359f5d2c Only compute quantization params once
(all quantization params #1..#15 are the same)

Change-Id: If04058bd89fe2677b5b118ee4e1bcce88f0e4bf5
2013-12-10 05:36:23 +01:00
skal
d513bb62bc * fix off-by-one zthresh calculation
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this

Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)

Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
2013-12-09 13:56:04 +01:00
James Zern
4931c3294b cosmetics: fix some typos
Change-Id: I0d6efebd817815139db5ae87236fd8911df4d53c
2013-11-26 19:21:14 -08:00
skal
e3312ea681 detect flatness in blocks and favor DC prediction
this avoids local-minima that look bad, even if the distortion
looks low (e.g. gradients, sky,...). Mostly visible in the q=50-80 range.
Output size is mostly unchanged.

Change-Id: I425b600ec45420db409911367cda375870bc2c63
2013-11-01 00:47:04 +01:00
skal
a014e9c9cd tune quantization biases toward higher precision
* raise U/V quantization bias to more neutral values
* also raise the non-zero AC bias for Y1/Y2 matrices

(we need all the precision we can for U/V leves, which are often empty)
This will increase quality in the higher range (q >= 90) mostly.
Files size is exacted to raise a little (5-7%). and SSIM accordingly of course.

Change-Id: I8a9ffdb6d8fb6dadb959e3fd392e66dc5aaed64e
2013-10-30 23:57:23 +01:00
skal
1e898619cb add helpful PrintBlockInfo() function
(protected under a DEBUG_BLOCK compile flag)

Change-Id: Icb8da02dbd00e5cf856c314943c212f1c9578d9b
2013-10-30 19:25:27 +01:00
James Zern
d3408720d8 Merge "fast auto-determined filtering strength" 2013-10-29 12:47:58 -07:00
skal
f8bfd5cd1e fast auto-determined filtering strength
kLevelsFromDelta[sharpness][delta] is an inverse look-up table
that tells the minimum filtering strength needed to trigger the
filtering of a step with amplitude 'delta'. We use this table
in various situations:

a) when computing the initial (/global) filtering
strength for each segment. We look at the quantization
step and deduce the proper filtering strength needed
to result this quantization noise (talking the -f option
into account).

b) during intra16 calculation, when a block ends up
very empty (only DC coeffs are non-zero, all ACs have
vanished). We'll rely on the in-loop filtering to
restore the smoothness (if the source was gradient-like
smooth. That's why we look at the distortion too before
triggering the filtering).

Step b) goes _in addition_ to a), potentially raising
the filtering strength if blockiness is likely.

Change-Id: Icaeca93ef21da195b079e6587a44d9edfc8e9efa
2013-10-29 20:13:29 +01:00
Pascal Massimino
ac0bf951ca small clean-up in ExpandMatrix()
Change-Id: Ib06cb1658a6548f06bb7320310b3864881b606a7
2013-10-29 19:58:57 +01:00
James Zern
10fddf53bb enc/quant.c: silence a warning
score_t -> int: rd_i4.H contains a value from a uint16_t
lookup

Change-Id: I7227de2dfab74b4f796abbc47955197ffa0e6110
2013-09-11 00:04:11 -07:00
Pascal Massimino
9f24519e82 encoder: misc rate-related fixes
* fix VP8FixedCostsI4ÆÅ table
   (the constant cost '211' was erronenously included)
* use the rd-score for '211' correctly (calling SetRDScore() for good)
* count partition0 bits separately during rd-opt

No meaningful difference in rd-curve.

Change-Id: I6c49a150cf28928d9a92c32fff097600d7145ca4
2013-09-10 00:25:32 -07:00
skal
93402f02db multi-threaded segment analysis
When -mt is used, the analysis pass will be split in two
and each halves performed in parallel. This gives a 5%-9% speed-up.

This was a good occasion to revamp the iterator and analysis-loop
code. As a result, the default (non-mt) behaviour is a tad (~1%) faster.

Change-Id: Id0828c2ebe2e968db8ca227da80af591d6a4055f
2013-09-05 09:13:36 +02:00
skal
de4d4ad598 VP8EncIterator clean-up
- remove unused fields from iterator
- introduce VP8IteratorSetRow() too
- rename 'done_' to 'countdown_'
- bring y_left_/u_left_/v_left_ from VP8Encoder

Change-Id: Idc1c15743157936e4cbb7002ebb5cc3c90e7f92a
2013-08-01 23:05:54 -07:00
James Zern
d640614d54 update copyright text
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files

based on the discussion in:
https://codereview.chromium.org/12771026/

Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
2013-06-06 23:09:14 -07:00
Pascal Massimino
58ca6f65b7 rebalance method tools (-m) for methods [0..4]
(methods 5 and 6 are still untouched).

Methods #0 and #1 got much faster
Method #2 gets vastly improved in quality
Method #3 is noticeably faster for little lower quality
Method #4 (default) is 10-20% faster for comparable quality

+ update the internal doc about the methods' tools.

Example of speed difference:

Time to encode picture:
Method | Before | After
-m 0   | 1.272s | 0.517s
-m 1   | 1.295s | 0.623s
-m 2   | 2.217s | 0.834s
-m 3   | 2.816s | 2.243s
-m 4   | 3.235s | 3.014s
-m 5   | 3.668s | 3.654s
-m 6   | 8.296s | 8.235s

Change-Id: Ic41fda5de65066b3a6586cb8ae1ebb0206d47fe0
2013-02-27 02:19:20 -08:00
Pascal Massimino
5189957e07 describe rd-opt levels introduce VP8RDLevel enum
makes things somehow clearer compared to using magic constants

Change-Id: I9115cee71252511f722806427ee8a97f1a1cd95f
2013-02-26 02:20:59 -08:00
skal
e895059a05 add a -jpeg_like option
This option remaps internal parameters to better match
the expected compression curve of JPEG and produce output files
of similar size, but with better quality.

Change-Id: I96a1cbb480b1f6a0c6845a23c33dfd63f197b689
2013-02-06 14:19:16 +01:00