1687 Commits

Author SHA1 Message Date
James Zern
f8740f0d6c dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
for consistency with other defines shared across modules

Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
James Zern
ce73abe054 Merge "introduce a separate WebPRescalerDspInit to initialize pointers" 2015-01-12 14:25:37 -08:00
Pascal Massimino
ab66becaae introduce a separate WebPRescalerDspInit to initialize pointers
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.

Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
Pascal Massimino
205c7f26af fix handling of zero-sized partition #0 corner case
reported in https://code.google.com/p/webp/issues/detail?id=237

An empty partition #0 should be indicative of a bitstream error.
The previous code was correct, only an assert was triggered in debug mode.
But we might as well handle the case properly right away...

Change-Id: I4dc31a46191fa9e65659c9a5bf5de9605e93f2f5
2015-01-12 20:30:53 +01:00
pascal massimino
cbcdd5ffaf Merge "move rescaler functions to rescaler* files in src/dsp/" 2015-01-10 05:41:45 -08:00
pascal massimino
bf586e8844 Merge changes I230b3532,Idf3057a7
* changes:
  enable NEON for Windows ARM builds
  Makefile.vc: add rudimentary Windows ARM support
2015-01-10 02:14:48 -08:00
pascal massimino
6dc79dc226 Merge "anim_encode: fix type conversion warnings" 2015-01-10 02:12:25 -08:00
James Zern
4f43d38ca8 enable NEON for Windows ARM builds
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
James Zern
e7c5954c10 dec_neon: remove returns from void functions
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
James Zern
f79c163bbf anim_encode: fix type conversion warnings
fixes:
C4267: '=' : conversion from 'size_t' to 'int', possible loss of data

Change-Id: Ie8e0bbd6f19fde21b2dbbd2a92cc99e76502dfed
2015-01-09 17:12:06 -08:00
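Note on f79c163bbf: C4267 is the MSVC warning for an implicit narrowing from size_t to a smaller integer type. A minimal sketch of one way to make such a conversion explicit and range-checked follows. SizeToInt is a hypothetical helper written for illustration; it is not taken from anim_encode, and the commit itself may simply have changed a variable type or added a cast.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of libwebp): narrow a size_t to int.
 * Returns 1 and stores the value in *out when it fits, 0 otherwise. */
static int SizeToInt(size_t size, int* const out) {
  assert(out != NULL);
  if (size > (size_t)INT_MAX) return 0;  /* would lose data: refuse */
  *out = (int)size;  /* explicit cast, so no implicit-conversion warning */
  return 1;
}
```

A caller would check the return value before using the narrowed result, rather than assigning the size_t directly to an int.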
Djordje Pesut
cbcbedd0de move rescaler functions to rescaler* files in src/dsp/
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
James Zern
e8694d4dc3 mux: remove experimental FRGM parsing
fragment references remain: to be removed in a future commit

Change-Id: I02974c8a709cfe16dce72568639c8b912859de8e
2015-01-08 20:02:40 -08:00
Urvang Joshi
9e92b6eac6 AnimEncoder API: Optimize single-frame animated images
Try converting them to a non-animated image and pick that one if it's smaller
in size.

Change-Id: Ib97438fd2a95b1bfa9b7526a0938a9d85df33a57
2015-01-08 12:30:46 -08:00
Djordje Pesut
a28c4b363d MIPS: move WORK_AROUND_GCC define to appropriate place
Change-Id: I3055eca57dc4e9d39533a5b8170bbf7af9cd818f
2015-01-08 15:55:41 +01:00
Djordje Pesut
012d2c60fa MIPS: dspr2: added optimization for functions SSEAxB
list of optimized functions: SSE16x16, SSE8x8, SSE16x8, SSE4x4

Change-Id: Ie99e7cdd73b0d4ff855977315a5d0db9ffaa5f04
2015-01-08 13:49:17 +01:00
Djordje Pesut
9241ecf45d MIPS: dspr2: added optimization for function Average
Change-Id: I7ca316bc3f5fbdaf8dcaf9a2d2227a5134bf4f63
2015-01-08 11:46:15 +01:00
pascal massimino
9422211d5f Merge "Tune BackwardReferencesLz77 for low_effort (m=0)." 2015-01-08 00:46:51 -08:00
pascal massimino
df40057b21 Merge "Speedup VP8LGetHistoImageSymbols for low effort (m=0) mode." 2015-01-08 00:46:43 -08:00
Vikas Arora
ea08466d34 Tune BackwardReferencesLz77 for low_effort (m=0).
- Lower the threshold parameters for HashChainFindCopy.

For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at
lower quality range (0.25% drop in compression density) and about 10%
for higher quality range without any drop in the compression density.
Following is the compression stats (before/after) for method = 0:
         Before           After
         bpp/MPs          bpp/MPs
q=0      2.8615/18.000    2.8651/18.631
q=5      2.8615/18.216    2.8650/20.517
q=10     2.8572/18.070    2.8650/21.992
q=15     2.8519/18.371    2.8584/21.747
q=20     2.8454/18.975    2.8515/20.448
q=25     2.8230/8.531     2.8253/9.585
// Compression density remains same for q-range [30-100]
q=30     2.7310/7.706     2.7310/8.028
q=35     2.7253/6.855     2.7253/7.184
q=40     2.7231/6.364     2.7231/6.604
q=45     2.7216/5.844     2.7216/6.223
q=50     2.7196/5.210     2.7196/5.731
q=55     2.7208/4.766     2.7208/4.970
q=60     2.7195/4.495     2.7195/4.602
q=65     2.7185/4.024     2.7185/4.236
q=70     2.7174/3.699     2.7174/3.861
q=75     2.7164/3.449     2.7164/3.605
q=80     2.7161/3.222     2.7161/3.038
q=85     2.7153/2.919     2.7153/2.946
q=90     2.7145/2.766     2.7145/2.771
q=95     2.7124/2.548     2.7124/2.575
q=100    2.6873/2.253     2.6873/2.335

Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1
2015-01-08 00:30:21 -08:00
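Note on ea08466d34: the percentages quoted in the message can be checked row by row against the table. A small sketch, assuming each cell is "bits per pixel / megapixels per second" (the message only labels the columns "bpp/MPs"); PercentChange is a hypothetical helper, not a libwebp function.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent.
 * Positive means 'after' is larger. */
static double PercentChange(double before, double after) {
  assert(before != 0.0);
  return 100.0 * (after - before) / before;
}
```

For the q=10 row, PercentChange(18.070, 21.992) is about +21.7 (faster), while PercentChange(2.8572, 2.8650) is about +0.27 (slightly more bits per pixel, i.e. slightly lower density).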
Vikas Arora
b0b973c39b Speedup VP8LGetHistoImageSymbols for low effort (m=0) mode.
- Do light weight entropy based histogram combine and leave out CPU
  intensive stochastic and greedy heuristics for combining the
  histograms.

For 1000 image PNG corpus (m=0), this change yields speedup of 10% at
lower quality range (1% drop in compression density) and about 5% for
higher quality range (1% drop in compression density). Following is the
compression stats (before/after) for method = 0:
         Before           After
         bpp/MPs          bpp/MPs
q=0      2.8336/16.577    2.8615/18.000
q=5      2.8336/16.504    2.8615/18.216
q=10     2.8293/16.419    2.8572/18.070
q=15     2.8242/17.582    2.8519/18.371
q=20     2.8182/16.131    2.8454/18.975
q=25     2.7924/7.670     2.8230/8.531
q=30     2.7078/6.635     2.7310/7.706
q=35     2.7028/6.203     2.7253/6.855
q=40     2.7005/6.198     2.7231/6.364
q=45     2.6989/5.570     2.7216/5.844
q=50     2.6970/5.087     2.7196/5.210
q=55     2.6963/4.589     2.7208/4.766
q=60     2.6949/4.292     2.7195/4.495
q=65     2.6940/3.970     2.7185/4.024
q=70     2.6929/3.698     2.7174/3.699
q=75     2.6919/3.427     2.7164/3.449
q=80     2.6918/3.106     2.7161/3.222
q=85     2.6909/2.856     2.7153/2.919
q=90     2.6902/2.695     2.7145/2.766
q=95     2.6881/2.499     2.7124/2.548
q=100    2.6873/2.253     2.6873/2.285

Change-Id: I0567945068f8dc7888041e93d872f9def91f50ba
2015-01-08 00:29:57 -08:00
pascal massimino
c6d3292738 argb_sse2: cosmetics
clarify some variable names in PackARGB() + add some comments

Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
James Zern
67f601cd46 make the 'last_cpuinfo_used' variable names unique
allows the sources to be #include'd in some hackish builds (don't do
that!)

Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-01-07 23:38:53 -08:00
Urvang Joshi
b9489861a3 AnimEncoder API: Init method for default options.
Change-Id: I3ccd7fe782e10c51986b55fc1a515d958ff70752
2015-01-07 14:32:11 -08:00
pascal massimino
856f8ec1fd Merge "AnimEncoder API: Remove AnimEncoderFrameOptions." 2015-01-07 13:39:10 -08:00
pascal massimino
c537514d46 Merge "AnimEncoder API: GenerateCandidates bugfix." 2015-01-07 13:38:24 -08:00
pascal massimino
dc0ce039f3 Merge "AnimEncoder API: Compute change rectangle for first frame too." 2015-01-07 13:37:27 -08:00
pascal massimino
f00b639b96 Merge "AnimEncoder API: In Assemble(), always set animation parameters." 2015-01-07 13:36:53 -08:00
pascal massimino
29ed796c17 Merge "AnimEncoder lib cleanup: prev to prev canvas not needed." 2015-01-07 13:36:21 -08:00
pascal massimino
9f0dd6e539 Merge "WebPAnimEncoder API: Header and implementation" 2015-01-07 13:35:49 -08:00
Urvang Joshi
5e56bbe09a AnimEncoder API: Remove AnimEncoderFrameOptions.
We only need config now, so this struct is not needed.

Change-Id: I5139956d13c36ceb4871d52122f248fe70f40c4b
2015-01-07 13:34:02 -08:00
Urvang Joshi
b902c3ea50 AnimEncoder API: GenerateCandidates bugfix.
As 'curr_canvas_mod' is being modified during calls to IncreaseTransparency()
and FlattenSimilarBlocks(), GetSubRect() should get the sub-frame from
'curr_canvas_mod' as well.

Earlier, GetSubRect() was computed from 'curr_canvas', so modifying
'curr_canvas_mod' had no effect on encoding.

Change-Id: Ia847503007b66364817fe57def5a9e3c37d1b3cc
2015-01-07 11:48:55 -08:00
Urvang Joshi
ef3c39bbd2 AnimEncoder API: Compute change rectangle for first frame too.
Earlier, we were always using full canvas for first frame.

Change-Id: Ib8d32961682c4b07010ea559a71dd59ab9ec0157
2015-01-07 11:26:27 -08:00
Urvang Joshi
eec423abe9 AnimEncoder API: In Assemble(), always set animation parameters.
We set the parameters even if there is just one frame. This is to make sure
assembly is correct even if single frame animation is NOT converted to a full
frame later.

Change-Id: If79e6aa5e2575cb0f3cd229f16c655b3663c35b0
2015-01-07 11:20:24 -08:00
Urvang Joshi
ae1c046e12 AnimEncoder lib cleanup: prev to prev canvas not needed.
Given that we decided to only handle frame disposal for output WebP
internally, only current and previous canvas need to be maintained.

Change-Id: I625293bed5aeb5aabf4eca779f6ec3ee84c9ff2a
2015-01-07 11:17:40 -08:00
Urvang Joshi
4b997ae46d WebPAnimEncoder API: Header and implementation
A separate API to generate animated WebP images.
It will eventually replace the internal gif2webp_util methods.

Also: update makefiles.

Change-Id: Idf61dfc1016c10b24fea70425d1a2323cffba515
2015-01-07 10:42:02 -08:00
Pascal Massimino
9592053859 Merge "multi-thread fix: lock each entry points with a static var" 2015-01-07 00:03:51 -08:00
Pascal Massimino
4c1b300ada Merge "SSE2 implementation of VP8PackARGB" 2015-01-06 23:53:50 -08:00
James Zern
04c20e75ea Merge "MIPS: dspr2: added optimization for function Intra4Preds" 2015-01-06 16:15:10 -08:00
Pascal Massimino
a437694a17 multi-thread fix: lock each entry points with a static var
we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
in separate threads!)

Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
2015-01-05 07:48:49 -08:00
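Note on a437694a17: the idea described above (remember which CPU-info function was used for the last initialization and return early when it has not changed) can be sketched as below. All names here (CPUInfoFunc, GetCPUInfo, DspInit, Transform, the Sentinel value) are stand-ins chosen for illustration, not the actual libwebp symbols, and the guard is not a real lock: it only skips redundant re-initialization.

```c
#include <stddef.h>

typedef int (*CPUInfoFunc)(int feature);

static int NoFeatures(int feature) { (void)feature; return 0; }
static int AllFeatures(int feature) { (void)feature; return 1; }
/* Never installed as GetCPUInfo, so the first DspInit() call always runs. */
static int Sentinel(int feature) { (void)feature; return 0; }

static CPUInfoFunc GetCPUInfo = NoFeatures;  /* stand-in for VP8GetCPUInfo */
static int (*Transform)(int) = NULL;         /* pointer set by the init */
static int init_count = 0;                   /* for demonstration only */

static int TransformC(int x) { return x + 1; }
static int TransformFast(int x) { return x + 1; }  /* "optimized" variant */

static void DspInit(void) {
  static volatile CPUInfoFunc last_cpuinfo_used = Sentinel;
  if (last_cpuinfo_used == GetCPUInfo) return;  /* already set up */
  Transform = TransformC;
  if (GetCPUInfo != NULL && GetCPUInfo(0)) Transform = TransformFast;
  ++init_count;
  last_cpuinfo_used = GetCPUInfo;  /* written last, once pointers are set */
}
```

Swapping GetCPUInfo and calling DspInit() again re-runs the setup, which is what keeps each implementation testable separately.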
Pascal Massimino
ca7f60db5f SSE2 implementation of VP8PackARGB
Change-Id: I40c0e26a6a2701216e4ddebcf793aa535677f437
2015-01-05 05:17:51 -08:00
Pascal Massimino
72d573f693 simplify the PackARGB signature
Change-Id: I51570e362126b2681f93211a4f59a3fedb5fd4b5
2015-01-05 02:10:04 -08:00
James Zern
4e2589ff81 demux: restore strict fragment flag check
inadvertently removed in:
demux: remove experimental FRGM parsing

Change-Id: Ia9bb8211e2153df51e7a01cabe8552524b8ed218
2014-12-23 12:47:06 -05:00
James Zern
e752f0a673 Merge "demux: remove experimental FRGM parsing" 2014-12-23 09:12:15 -08:00
James Zern
f8abb112f2 Merge changes I109ec4d9,I73fe7743
* changes:
  dec_neon: add DC8uvNoTop / DC8uvNoLeft
  dec_neon: add DC8uv
2014-12-23 09:11:22 -08:00
Djordje Pesut
ae2188a435 MIPS: dspr2: added optimization for function Intra4Preds
Change-Id: Ie2a23c356a8715817b020fbee2b40e878e2946de
2014-12-23 17:32:27 +01:00
Pascal Massimino
1f4b8642e8 move VP8EncDspARGBInit() call closer to where it's needed
Change-Id: I0d5121b456918f0ee6646903a8d71d4384deafe3
2014-12-23 16:04:14 +01:00
James Zern
14108d7878 dec_neon: add DC8uvNoTop / DC8uvNoLeft
adds do_top/do_left flags to DC8uv; ~88% / ~92% faster respectively
no change in DC8uv speed.

Change-Id: I109ec4d9ad13c9db64516e98ed4693a21a3e9b54
2014-12-22 15:47:38 -05:00
James Zern
d8340da756 dec_neon: add DC8uv
~87% faster.

Change-Id: I73fe77437792f1361ce8ab0b411132c6ec0fa021
2014-12-22 14:36:45 -05:00
Djordje Pesut
7ce8788b06 MIPS: dspr2: added optimization for function MakeARGB32
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row

Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
2014-12-22 12:31:36 +01:00
James Zern
012e623ddd demux: remove experimental FRGM parsing
references to fragments remain, along with some superfluous checks; these
will be removed in a future commit.

Change-Id: I39fe9314900ecbc5d60e5065b65fa1b4c668af63
2014-12-19 19:03:17 -08:00
Pascal Massimino
87c3d53180 method=0: Don't evaluate any predictor
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.

For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality
range and about 10% for higher quality range.

Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
2014-12-17 18:41:08 +01:00
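Note on 87c3d53180: in the WebP lossless format, predictor #11 is the "Select" predictor, a Paeth-style choice between the left and top neighbours. A minimal sketch of that selection on packed ARGB pixels follows; it is written from the format description rather than copied from libwebp, and the tie-breaking rule (top wins on equal distance) should be treated as an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Extract one 8-bit channel from a packed ARGB value. */
static int Channel(uint32_t argb, int shift) {
  return (int)((argb >> shift) & 0xff);
}

/* Paeth-style select: per channel, estimate = left + top - top_left.
 * Return whichever of left/top is closer (Manhattan distance over the
 * four channels) to that estimate. */
static uint32_t Select(uint32_t left, uint32_t top, uint32_t top_left) {
  int dist_left = 0;
  int dist_top = 0;
  int shift;
  for (shift = 0; shift < 32; shift += 8) {
    const int l = Channel(left, shift);
    const int t = Channel(top, shift);
    const int tl = Channel(top_left, shift);
    const int estimate = l + t - tl;
    dist_left += abs(estimate - l);
    dist_top += abs(estimate - t);
  }
  return (dist_left < dist_top) ? left : top;
}
```

Always applying one fixed predictor like this avoids the per-block evaluation of all predictor modes, which is where the speedup reported above comes from.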
Djordje Pesut
6f4fcb983e Merge "MIPS: dspr2: added optimization for function ImportRow" 2014-12-17 09:36:02 -08:00
Pascal Massimino
24284459c7 replace unneeded calls to HistogramCopy() by swaps
most of the time, we don't need to actually move the
data.

Compression is randomly slightly different, because HistogramCompactBins() changed.
Timing is about the same.

Change-Id: Ia6af8e9780581014d6860f2b546189ac817cfad1
2014-12-17 15:32:36 +01:00
Djordje Pesut
bdf7b40c5c MIPS: dspr2: added optimization for function ImportRow
Change-Id: I8205b551755ee51f5efd0c54d64c8b09771786b1
2014-12-17 15:24:41 +01:00
pascal massimino
e66a9225f3 Merge "MIPS: dspr2: added optimization for function ExportRowC" 2014-12-17 04:36:25 -08:00
Djordje Pesut
c279fec192 MIPS: dspr2: added optimization for function ExportRowC
Change-Id: Ie1a303089eb64736f8bc7573819a8219aa7528a3
2014-12-17 12:01:48 +01:00
Pascal Massimino
31a9cf6417 Speedup WebP lossless compression for low effort (m=0) mode with following:
- Disable Cross-Color transform.
- Evaluate predictors #11 (paeth), #12 and #13 only.

Change-Id: I857264c85c61c3957d4fb45ae32d261d947c8bed
2014-12-17 11:52:11 +01:00
Djordje Pesut
9275d91c79 MIPS: dspr2: added optimization for function TrueMotion
Change-Id: Id006d9591c0c922e28f7f4c01e4006f0f07bdd56
2014-12-12 14:38:55 +01:00
pascal massimino
26106d662e Merge "enc_neon: fix building with non-Xcode clang (iOS)" 2014-12-11 02:25:25 -08:00
Pascal Massimino
1c4e3efea0 unroll the kBands[] indirection to remove a dereference in GetCoeffs()
speed-up is small but visible.

Change-Id: Icff546adc3276f3c3d46b147c4a735b5eb8ff22e
2014-12-11 08:06:20 +01:00
James Zern
a3946b8956 enc_neon: fix building with non-Xcode clang (iOS)
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.

Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2014-12-10 15:50:26 -08:00
Pascal Massimino
8ed9c00d5e Merge "simplify the Histogram struct, to only store max_value and last_nz" 2014-12-10 02:02:05 -08:00
Pascal Massimino
bad775715a simplify the Histogram struct, to only store max_value and last_nz
we don't need to store the whole distribution in order to compute the alpha

Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.

Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
2014-12-10 10:44:57 +01:00
Djordje Pesut
3cca0dc7f0 MIPS: dspr2: Added optimization for DCMode function
Change-Id: I8ea31907c1ea1259ec4db8cee1a479bd13a025a1
2014-12-09 13:58:39 +01:00
Djordje Pesut
37e395fd1c MIPS: fix functions to use generic BPS instead of hardcoded value
Change-Id: I2d68abef886eff7f8df230f155b758dccd7d04fd
2014-12-05 15:55:47 +01:00
James Zern
9475bef4d7 PickBestUV: fix VP8Copy16x8 invocation
param order is src, dst

broken in:
66ad372 factorize BPS definition in dsp.h and add VP8Copy16x8

Change-Id: I761f618e3fe31ae7f58953256381f4f16bdb238e
2014-12-04 23:12:30 -08:00
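Note on 9475bef4d7: a sketch of a strided block copy with the (src, dst) parameter order mentioned in the fix. The function names and the stride value BPS = 32 are assumptions for illustration (the stride is taken from the "switch BPS to 32" commit); this is not the libwebp implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BPS 32  /* assumed row stride in bytes */

/* Copy a w x h block; both buffers use a row stride of BPS bytes. */
static void Copy(const uint8_t* src, uint8_t* dst, int w, int h) {
  int y;
  assert(src != NULL && dst != NULL);
  for (y = 0; y < h; ++y) {
    memcpy(dst, src, (size_t)w);
    src += BPS;
    dst += BPS;
  }
}

/* Parameter order is source first, destination second. */
static void Copy16x8(const uint8_t* src, uint8_t* dst) {
  Copy(src, dst, 16, 8);
}
```

Because both parameters are byte pointers, swapping them still compiles unless the const qualifier on src catches it, which is how this kind of bug can slip in.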
James Zern
441f273f19 Merge changes I55f8da52,Id73a1e96
* changes:
  cosmetics: add some missing != NULL comparisons
  factorize BPS definition in dsp.h and add VP8Copy16x8
2014-12-04 20:46:29 -08:00
Pascal Massimino
4a279a680e cosmetics: add some missing != NULL comparisons
Change-Id: I55f8da527e5e8ee4b49c7e7aa0d61ea4a6c80904
2014-12-04 14:54:11 +01:00
Pascal Massimino
66ad372500 factorize BPS definition in dsp.h and add VP8Copy16x8
Change-Id: Id73a1e968c96455808755df4d131d74e3e2e135d
2014-12-04 13:45:14 +01:00
Pascal Massimino
432e5b550e make ALIGN_xxx naming consistent
(potentially for future factorization between enc/ and dec/)

Change-Id: Ibf6670e21433a6a6a7202dcbe76f7efc8493b8cf
2014-12-04 13:32:10 +01:00
Pascal Massimino
57606047ec encoder: switch BPS to 32 instead of 16
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/

With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.

Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
2014-12-04 09:17:18 +01:00
Djordje Pesut
1b66bbe998 MIPS: dspr2: added optimization for function TransformColor_C
Change-Id: Idbf5cecf6775340585b0fd7e6ddcb29c2fcbea36
2014-12-01 15:46:06 +01:00
James Zern
c6d0f9e758 histogram: cosmetics
fix indent + other minor spelling / whitespace changes

Change-Id: I6e4462b75c98994e3c53c115de07047dbe71ce3c
2014-11-25 15:53:19 -08:00
James Zern
f399d30764 Merge changes I6eac17e5,I32d2b514
* changes:
  dec_neon: add TM8uv
  dsp: initialize VP8PredChroma8 in VP8DspInit()
2014-11-25 15:32:14 -08:00
James Zern
9de9074c92 dec_neon: add TM8uv
~68% faster

reuses TM4() adding support for the additional rows, the columns were
already being done.

Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
2014-11-25 14:40:17 -08:00
James Zern
8e517eca68 bit_reader/kVP8NewRange: range_t -> uint8_t
decreases the size of each entry from 4 bytes to 1.

Change-Id: I3e6a50bcbc279e5edfa411edb97b04300dedc7ae
2014-11-24 22:16:26 -08:00
James Zern
e18571393d dsp: initialize VP8PredChroma8 in VP8DspInit()
the table becomes non-const to allow for platform-specific optimizations

Change-Id: I32d2b51480020dc653ecfafd20b6b0f096af349f
2014-11-24 22:12:42 -08:00
Vikas Arora
e0c809ad23 Move Entropy methods to lossless.c
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge few (static) methods and reduce the code size.

This change has no impact on the compression speed/density.

Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
2014-11-20 13:48:05 -08:00
Vikas Arora
a0df55104e Remove handling for WEBP_HINT_GRAPH
Remove handling for WEBP_HINT_GRAPH w.r.t use_palette flag.

The WEBP_HINT_GRAPH is now used at one place, to set the initial size of the
Bit Writer as bpp for photo images are generally larger than the graphical
images.

Change-Id: I1b9c4436c85a8f69da74c0dbcd292397323f2696
2014-11-13 15:49:23 -08:00
Vikas Arora
413dfc0c4b Move static method definition before its usage.
Change-Id: Id766c2bea92e7ebf0de65046f73429b74b4fdda4
2014-11-13 13:18:30 -08:00
Vikas Arora
0f23566558 Update BackwardRefsWithLocalCache.
Update BackwardRefsWithLocalCache to do in-place update of backward
references w.r.t local color cache index.

No impact on the compression density or compression speed.

Change-Id: Ie066251464c3928c044e037b43df3af28b48ca30
2014-11-13 11:54:26 -08:00
Vikas Arora
d69e36ec59 Remove TODOs from lossless encoder code.
histogram.c:
 - Verified (earlier) that there's low correlation between Red & Blue colors
   (particularly after applying Cross-color transform). The Bin based histogram
   merge, bins on three entropies viz literal, red & blue symbols. Removing
   either of blue or red increases the compression density. So keeping the bins
   for red & blue symbols.
 - Keeping the compact bins method as-is. This way it's simpler to read.
huffman_encode.h: Added field comments for struct HuffmanTree and removed the TODO.

Change-Id: Ia76f7bc730079d1b3b644038c5d9931db3797f0e
2014-11-12 16:10:16 -08:00
Vikas Arora
fdaac8e0ca Optimize VP8LGetBackwardReferences LZ77 references.
Use the refs_lz77 computed (with cache_bits=0) in the method 'CalculateBestCacheSize'
to regenerate the LZ77 references corresponding to the optimum cache_bits and avoid
calling costly 'BackwardReferencesLz77' one extra time.

This change leaves the compression density unchanged and speeds up compression
by 10-15%.

Change-Id: I5a92e11788d3c3f656aa7e1fba54fb5d96ee0027
2014-11-12 14:50:04 -08:00
Djordje Pesut
2f0e2ba826 MIPS: dspr2: added optimization for function Select
Change-Id: I22470d8b9ab8c5e90c5330ff12c9852676da1a3d
2014-11-07 09:44:16 +01:00
pascal massimino
a3e79a46f6 Merge "WebPEncode: Support encoding same pic twice (even if modified)" 2014-11-06 22:20:01 -08:00
Urvang Joshi
e4f4dddba3 WebPEncode: Support encoding same pic twice (even if modified)
This wasn't working for this specific scenario:
- Encode an RGBA 'pic' (with trivial alpha) using lossy encoding.
(so that pic->a == NULL after import happens).
- Modify the 'pic->argb' so that it has non-trivial alpha.
- Encode the same 'pic' again.
This used to fail to encode alpha data as pic->a == NULL.

Change-Id: Ieaaa7bd09825c42f54fbd99e6781d98f0b19cc0c
2014-11-06 13:52:48 -08:00
pascal massimino
cbc3fbb4d7 Merge "Updated VP8LGetBackwardReferences and color cache." 2014-11-06 13:47:21 -08:00
Vikas Arora
95a9bd85c4 Updated VP8LGetBackwardReferences and color cache.
- The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'.
- The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache
  bits to use (passing 0 implies disabling the local color cache).
- The local color cache is disabled for lower (<= 25) quality levels (as before).
- Enabled local color cache for palette images as well. This saves additional
  0.017% bytes with a slight (2-3%) improvement in the compression speed.
- Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as
  this is no longer an option (after we froze the lossless bit-stream).

Change-Id: I33430401e465474fa1be899f330387cd2b466280
2014-11-06 13:14:05 -08:00
Djordje Pesut
54f2c14cce MIPS: dspr2: added optimization for function FTransform
Change-Id: Ib5850edbc2a586ec9781f494b2337f024e22af78
2014-11-06 14:21:33 +01:00
Djordje Pesut
aa42f4231f MIPS: dspr2: Added optimization for function VP8LSubtractGreenFromBlueAndRed
Change-Id: I683c73cceee4a40ca810deba15e54fbf7dbe8918
2014-11-06 10:56:18 +01:00
Djordje Pesut
95ca44a718 MIPS: dspr2: added optimization for Disto4x4
enc/dec common macros moved to mips_macro.h

Change-Id: I38d491e772554ac663dd5eb4d15485c0343f23b1
2014-11-05 12:06:15 +01:00
James Zern
4171b6724e backward_references.c: reindent after c8581b0
Change-Id: Icfc0fe8e266c0f67a70b8cb095e5aaee155290b6
2014-11-04 17:40:04 +01:00
Vikas Arora
c8581b06e1 Optimize BackwardReferences for RLE encoding.
Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.

For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.

Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
2014-11-04 08:12:07 -08:00
Djordje Pesut
5798eee6be MIPS: dspr2: unfilters bugfix (Ie7b7387478a6b5c3f08691628ae00f059cf6d899)
Change-Id: I78d97960efbd1ec1af51a5426e38dc01bdb48140
2014-11-03 15:39:00 +01:00
Vikas Arora
4167a3f5f7 Optimize backwardreferences
Optimize backwardreferences (about 0.1% byte savings) with almost same
compression speed (3% faster on default compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
    - Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
    - Removed 'if' conditions for corner cases in the main loop.
    - Refactored the method(AddSingleLiteral) for adding one pixel.

Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
2014-10-31 18:08:38 -07:00
James Zern
d18554c30d Merge "webp/types.h: use inline for clang++/-std=c++11" 2014-10-31 03:53:06 -07:00
Vikas Arora
77bdddf016 Speed up BackwardReferences
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.

Overall compression speedup observed is around 5% (+/- noise).

Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
2014-10-30 10:58:24 -07:00
James Zern
6638710b9e webp/types.h: use inline for clang++/-std=c++11
at least clang 3.[45] in c++ mode with -std=c++11 defines __STRICT_ANSI__
this change sets WEBP_INLINE to inline for c++/non-strict-ansi/> c99

fixes crbug.com/428383

Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
2014-10-30 15:25:27 +01:00
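Note on 6638710b9e: the condition described above ("inline for c++/non-strict-ansi/> c99") can be sketched as a preprocessor test. MY_INLINE and Clip255 are hypothetical names used for illustration, and the exact condition in webp/types.h may differ from this reading of the message.

```c
/* Use 'inline' when compiling as C++, when not in strict ANSI mode,
 * or when the C standard in effect is C99 or newer. */
#if defined(__cplusplus) || !defined(__STRICT_ANSI__) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#define MY_INLINE inline
#else
#define MY_INLINE /* strict C89: no inline keyword available */
#endif

static MY_INLINE int Clip255(int v) {
  return (v < 0) ? 0 : (v > 255) ? 255 : v;
}
```

Checking __cplusplus first means a C++ compiler that also defines __STRICT_ANSI__ still gets the inline keyword, which is the case the commit addresses.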
Vikas Arora
abf04205b3 Enable entropy based merge histo for (q<100)
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).

This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.

Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
2014-10-30 03:59:36 -07:00
James Zern
572022a350 filters_mips_dsp_r2.c: disable unfilters
the output does not match the C-code.

Change-Id: Ie7b7387478a6b5c3f08691628ae00f059cf6d899
2014-10-30 11:10:11 +01:00
Djordje Pesut
a28e21b141 MIPS: dspr2: Added optimization for function ClampedAddSubtractFull
Change-Id: Iee98eaf007158f44a299dd5ba8d972d0d4108380
2014-10-29 13:08:06 +01:00
Djordje Pesut
18d5a1efa8 MIPS: dspr2: added optimization for function ClampedAddSubtractHalf
Change-Id: Iec22e897a4f56e79c18ec00f8caa9cefac67f186
2014-10-29 11:08:37 +01:00
Djordje Pesut
829a8c19a0 MIPS: dspr2: added optimization for ITransform
Change-Id: I3534fca143535c53d18a3749b3a1b0c8a7563463
2014-10-28 14:28:14 +01:00
Vikas Arora
653ace55c3 Increase the MAX_COLOR_CACHE_BITS from 9 to 10.
The Maximum allowed limit is 11.
The Q=25 and below is not impacted as cache bits are forced to 0.
This saves 0.05% - 0.1% bytes for other quality with almost same compression
speed (+/- 2-3%, that's more of a noise).

Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
2014-10-27 14:19:04 -07:00
Vikas Arora
919220c7e6 Change the logic adjusting the Histogram bits.
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).

The compression density remains the same with this change, with little faster
compression speed.

Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
2014-10-27 09:57:17 -07:00
pascal massimino
53b096c0d7 Merge "Fix bug in VP8LCalculateEstimateForCacheSize." 2014-10-27 02:31:10 -07:00
Vikas Arora
e912bd55be Fix bug in VP8LCalculateEstimateForCacheSize.
The method VP8LCalculateEstimateForCacheSize was not evaluating the full
range of possible cache_bits values.
Also added a small penalty for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.

This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's a small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not affected by this change.

Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
2014-10-26 20:05:48 -07:00
James Zern
22881c999e dec_neon: add RD4 intra predictor
based on the SSE2 version; a bit rough around the loads, but still ~38%
faster.

Change-Id: I22426d939a7354cbc9a85ca8c68235d6081b882f
2014-10-24 21:22:07 +02:00
James Zern
1304eb3418 Merge "dec_neon: DC4: use pair-wise adds for top row" 2014-10-23 08:08:34 -07:00
pascal massimino
7083006b61 Merge "dsp/dec_{neon,sse2}: VE4: normalize variable names" 2014-10-23 07:29:27 -07:00
James Zern
0db9031c79 dsp/dec_{neon,sse2}: VE4: normalize variable names
use '0' rather than '_' when dealing with variables that result from a
shift

Change-Id: I29280c0dead645ce39dc4bb42c3e19929b302fd4
2014-10-23 16:04:13 +02:00
James Zern
b5bc15305b dec_neon: DC4: use pair-wise adds for top row
reduces load count, slightly faster

Change-Id: I880340ef8ef75ce4ce321c330f56f86b758bda08
2014-10-23 15:48:49 +02:00
Pascal Massimino
5b90d8fe42 Unify the API between VP8BitWriter and VP8LBitWriter
BitReader will be next...

Change-Id: Icd9e7ab2e3890131e664c0523627d9b8c5399a74
2014-10-23 15:35:16 +02:00
pascal massimino
f7ada560ce Merge changes I2e06907b,Ia9ed4ca6,I782282ff
* changes:
  dec_neon: add DC4 intra predictor
  dec_neon: add TM4 intra predictor
  dec_neon: add LD4 intra predictor
2014-10-23 06:31:54 -07:00
pascal massimino
5beb6bf070 Merge "dec_neon: add VE4 intra predictor" 2014-10-23 05:38:41 -07:00
James Zern
eba6ce06c3 dec_neon: add DC4 intra predictor
~70% faster

Change-Id: I2e06907b8d69be71a8c5581832c931923c24bab0
2014-10-23 14:21:08 +02:00
James Zern
79abfbd9df dec_neon: add TM4 intra predictor
~21% faster

Change-Id: Ia9ed4ca650f9d544821fa1faf3173611806a272a
2014-10-23 14:21:08 +02:00
James Zern
fe395f0e4d dec_neon: add LD4 intra predictor
based on SSE2 version, ~55% faster

Change-Id: I782282ffc31dcf238890b3ba0decccf1d793dad0
2014-10-23 14:20:47 +02:00
James Zern
32de385eca dec_neon: add VE4 intra predictor
based on SSE2 version, ~59% faster

Change-Id: Iaa2181eb51bd975de0e9fe5c7b66ed18188f0e3b
2014-10-23 11:46:08 +02:00
Vikas Arora
c2b5a0396a Modify CostModel to allocate optimal memory.
Change-Id: I7d52675d28bfc109d4e901581fc24cd36fcb79ee
2014-10-22 13:30:33 -07:00
Pascal Massimino
b7a33d7e91 implement VE4/HE4/RD4/... in SSE2
(30% faster prediction functions, but overall speed-up is ~1% only)

Change-Id: I2c6e7074aa26a2359c9198a9015e5cbe143c2765
2014-10-22 18:25:36 +02:00
Pascal Massimino
97c76f1f30 make VP8PredLuma4[] non-const and initialize array in VP8DspInit()
also convert 'type *dst' to 'type* dst'

Change-Id: I41ab66ad15b548cc45d1cb8b10bbca4fe1528cae
2014-10-22 18:14:20 +02:00
pascal massimino
0ea8c6c219 Merge "PrintReg: output to stderr" 2014-10-22 08:55:10 -07:00
James Zern
f85ec712b0 PrintReg: output to stderr
allows use of '-o -' while testing

Change-Id: Ibc02d7cede2df4eb8be0a28c0ca4bf5e91864191
2014-10-22 17:28:19 +02:00
Vikas Arora
139142e440 Optimize BackwardReferenceHashChainFollowPath.
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.

The encoding is a tad faster at default compression:

                       Before              After
                     bpp/rate            bpp/rate
442 Palette     0.2720/5.270 MP/s      0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s      3.7607/0.816 MP/s

Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
2014-10-21 10:04:27 -07:00
James Zern
5f36b68d22 enc/backward_references.c: fix indent
reindent after c24f895

Change-Id: I55adcbef21ea3fdaded84b138745515596191a09
2014-10-20 11:35:20 +02:00
James Zern
e0e9960dd1 Merge "sync version numbers to 0.4.2 release" 2014-10-17 11:47:30 -07:00
James Zern
64ac51446d sync version numbers to 0.4.2 release
libwebp{,decoder} - 0.4.2
libwebp libtool - 5.2.0
libwebpdecoder libtool - 1.2.0

mux/demux - 0.2.2
libtool - 1.2.0

(cherry picked from commit eec5f5f121)
(cherry picked from commit 857578a811)

Change-Id: Ie9d10c68e28083674a8865ad8447b1a70dcea95d
2014-10-17 19:50:21 +02:00
Vikas Arora
c24f8954be Simplify and speedup Backward refs computation.
Updated VP8LGetBackwardReferences and HashChainFindCopy method with following:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
  again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
  The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the
  best_val eval computation to choose a different curr_length, as best_val was
  set to 'curr_length << 16'.

  The impact on compression speed/density at default and max quality is shown
  below. Overall, this speeds up compression by 5-15% (q=100 -> 75) with a
  slight drop (0.02-0.03%) in compression density for the non-palette images.

                  Before                After
                bpp/Rate(MP/s)        bpp/Rate(MP/s)
q=75 (def)
All 1000        2.4492/1.049 MP/s     2.4498/1.230 MP/s
Palette         0.2719/5.060 MP/s     0.2719/6.110 MP/s
non-Palette     3.7597/0.732 MP/s     3.7607/0.840 MP/s

q=100
All 1000        2.4134/0.125 MP/s     2.4142/0.131 MP/s
Palette         0.2692/2.585 MP/s     0.2692/2.885 MP/s
non-Palette     3.7040/0.079 MP/s     3.7053/0.083 MP/s

Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
2014-10-17 09:21:30 -07:00
James Zern
d1c359ef29 fix shared object build with -fvisibility=hidden
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples

Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
2014-10-17 11:50:52 +02:00
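To illustrate the visibility fix in the commit above, here is a minimal sketch assuming a GCC- or Clang-compatible compiler. `MY_EXTERN`, `MyGetVersion` and `MyInternalHelper` are hypothetical names used only for illustration; they are not the library's actual definitions.

```c
#include <assert.h>

/* Hypothetical export macro. When a shared object is built with
 * -fvisibility=hidden, only symbols explicitly marked with
 * visibility("default") remain exported. */
#if defined(__GNUC__) && __GNUC__ >= 4
#define MY_EXTERN extern __attribute__((visibility("default")))
#else
#define MY_EXTERN extern
#endif

/* Public entry point: stays visible to users of the shared object. */
MY_EXTERN int MyGetVersion(void);

/* Internal helper: hidden when building with -fvisibility=hidden. */
int MyInternalHelper(void);

int MyInternalHelper(void) {
  return 0x000402; /* illustrative packed version value */
}

int MyGetVersion(void) {
  const int version = MyInternalHelper();
  assert(version > 0);
  return version;
}
```

Any function referenced from outside the library, such as from example programs, would need the export macro on its declaration, or the link fails once default visibility is hidden.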
James Zern
a4c3a31b8f WEBP_TSAN_IGNORE_FUNCTION: fix gcc compat warning
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition

Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
2014-10-16 18:06:43 +02:00
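The attribute placement described in the commit above can be sketched as follows. This is only an illustration: `MY_TSAN_IGNORE_FUNCTION`, `MyInitTable` and `MyTableAt` are hypothetical names, and the macro expands to nothing unless clang is building with thread sanitizer enabled.

```c
#include <assert.h>

/* Hypothetical macro: active only for clang with -fsanitize=thread. */
#if defined(__clang__) && defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define MY_TSAN_IGNORE_FUNCTION __attribute__((no_sanitize_thread))
#endif
#endif
#ifndef MY_TSAN_IGNORE_FUNCTION
#define MY_TSAN_IGNORE_FUNCTION
#endif

static int my_table[4];
static int my_init_done = 0;

/* The attribute goes in front of the function definition. Placing it after
 * the parameter list of a definition is what triggers the compatibility
 * warning quoted in the commit message. */
MY_TSAN_IGNORE_FUNCTION void MyInitTable(void) {
  if (!my_init_done) {
    int i;
    for (i = 0; i < 4; ++i) my_table[i] = i * i;
    my_init_done = 1;
  }
}

int MyTableAt(int i) {
  assert(i >= 0 && i < 4);
  return my_table[i];
}
```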
Pascal Massimino
80247291c6 mark some init functions as being safe for thread_sanitizer.
introduces the macro WEBP_TSAN_IGNORE_FUNCTION

Change-Id: I3de2b6c1a2076fba4da7ae50322551e026b2082b
2014-10-16 16:34:07 +02:00
James Zern
79b5bdbfde bit_reader.h: cosmetics: fix a typo
Change-Id: I1ba09124700b3120f18eb3705eb5ba805feb2ca0
2014-10-16 10:52:47 +02:00
Pascal Massimino
6c6736816c Improved near-lossless mode.
Compared to the previous mode, it gives another 10-30% improvement in compression while keeping comparable PSNR at the corresponding quality settings.

Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.

Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
2014-10-15 10:57:21 -07:00
James Zern
0ce27e715e enc_mips32: workaround gcc-4.9 bug
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9

In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault

Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
2014-10-15 19:14:04 +02:00
James Zern
aca1b98f52 enc/vp8l.c: fix indent
reindent after ca00502

Change-Id: I8c88dbc11dc96c117531b17682b764a235ef23bb
2014-10-13 11:33:23 +02:00
Vikas Arora
ca00502788 Evaluate non-palette compression for palette image
Evaluate whether, for palette images (num_colors <= 256), the non-palette
compression path (subtract green, predictor transform, etc.) yields a
better compression density.

This change reduces the WebP file size (for palette images) by 0.4%, with
a drop of 3-5% in compression speed.

Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
2014-10-10 11:55:45 -07:00
James Zern
c8a87bb62d AssignSegments: quiet -Warray-bounds warning
the number of segments is validated earlier, but an explicit check
is needed to avoid a warning under gcc-4.9

Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
2014-10-10 17:18:39 +02:00
pascal massimino
32f67e309f Merge "enc_neon: initialize vectors w/vdup_n_u32" 2014-10-09 12:23:18 -07:00
Pascal Massimino
fabc65da32 1-3% faster encoding optimizing SSE_NxN functions
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.

quality | size(bytes) after/before | time (ms) after/before

Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
2014-10-09 07:20:00 -07:00
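As a rough scalar illustration of the plain (a-b)^2 formulation the commit above returns to, here is a minimal sketch. `MySumSquaredError` and its parameters are hypothetical and do not reproduce the library's SSE_NxN functions or their SIMD versions.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over a w x h block.
 * Both blocks are assumed to share the same row stride. */
int MySumSquaredError(const uint8_t* a, const uint8_t* b,
                      int stride, int w, int h) {
  int sum = 0;
  int x, y;
  assert(a != 0 && b != 0 && w > 0 && h > 0 && stride >= w);
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int diff = (int)a[x] - (int)b[x]; /* signed difference */
      sum += diff * diff;                     /* (a - b)^2 */
    }
    a += stride;
    b += stride;
  }
  return sum;
}
```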
James Zern
7534d71640 enc_neon: initialize vectors w/vdup_n_u32
replaces {} initialization gnu-ism

Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec
2014-10-09 12:35:41 +02:00
Pascal Massimino
5f81391263 Merge "Fix return code of EncodeImageInternal()" 2014-10-07 23:49:29 -07:00
Pascal Massimino
e321abe43d Fix return code of EncodeImageInternal()
It was returning 'VP8_ENC_OK' in case of memory error.

Change-Id: I184a3e29c9f1b863637cacbe389b058d75c3dbf8
2014-10-08 08:48:53 +02:00
Pascal Massimino
f82cb06afb optimize palette ordering
We compact the palette by weighted distance, favoring the green channel.

Average gain on paletted files is ~0.5%, with gains of up to 6-7% in some favorable cases.
Encoding speed is unaffected.

Disabled for alpha (or any single-channel input)

Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.

Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
2014-10-08 08:42:36 +02:00
Pascal Massimino
f545feee64 don't set the alpha value for histogram index image
This gives a tiny bit of extra compression (a few bytes per file) for free

Change-Id: Ia4d8cef3de4365e32eacefd69a57689c80042a23
2014-10-08 08:24:19 +02:00
Pascal Massimino
2d9b0a4472 add WebPDispatchAlphaToGreen() to dsp
SSE2 version is 2.1x faster

This is used to transfer the alpha plane to green channel before lossless compression.

Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
2014-10-06 23:15:44 +02:00
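The alpha-to-green transfer described in the commit above can be sketched in plain C as follows. This assumes 32-bit ARGB pixels with green in bits 8..15; `MyAlphaToGreen` and its parameter list are hypothetical and may differ from the actual dsp function.

```c
#include <assert.h>
#include <stdint.h>

/* Copies an 8-bit alpha plane into the green channel of a 32-bit ARGB
 * image, leaving the A, R and B channels at zero (assumed pixel layout). */
void MyAlphaToGreen(const uint8_t* alpha, int alpha_stride,
                    int width, int height,
                    uint32_t* dst, int dst_stride) {
  int i, j;
  assert(alpha != 0 && dst != 0 && width > 0 && height > 0);
  for (j = 0; j < height; ++j) {
    for (i = 0; i < width; ++i) {
      dst[i] = (uint32_t)alpha[i] << 8; /* green occupies bits 8..15 */
    }
    alpha += alpha_stride;
    dst += dst_stride;
  }
}
```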
Vikas Arora
d5e498d47f Change Entropy based Histogram Combine heuristic.
Don't combine histograms that have trivial (single-valued A, R & B)
  symbols.
The following table gives the compression density and compression rate (before
& after) per image.
                     Before             After
                     bpp, rate(MP/s)    bpp, rate(MP/s)
Q=25, method = 4     2.508, 1.807       2.499, 1.916
Q=50, method = 4     2.460, 1.488       2.456, 1.512
Q=75, method = 4     2.452, 1.078       2.450, 1.092
Q=25, method = 5     2.505, 1.398       2.496, 1.383
Q=50, method = 5     2.458, 1.170       2.453, 1.143
Q=75, method = 5     2.453, 0.886       2.450, 0.855

This change provides 0.1-0.4% compression gains and speeds up lossless
compression for the default method=4 (for method=5, compression speed drops by 1-3.5%).

Change-Id: Idfd88c2092f37afacd26a97097b3053f8183953a
2014-09-30 13:41:39 -07:00
Pascal Massimino
47a2d8e1d9 fix MSVC float->int conversion warning
+ add a clarifying comment

Change-Id: I8ac1df1de2e5277f2d968dec489546e680bb5e0c
2014-09-27 00:36:01 -07:00
James Zern
35ad48b848 HistoHeapInit: correct positions allocation size
Change-Id: I1879fd48bee3aea6f0504926d7030b504dd9be07
2014-09-26 11:21:19 -07:00
Pascal Massimino
45d9635fd3 lossless: entropy clustering for high qualities.
Tested on a corpus of 1000 PNGs at quality 90-100, it gives a ~0.15% improvement
in compression density and a ~7% speed-up.

Change-Id: I460f56c96707edb3c1f0b51a024e5122e10458df
2014-09-26 15:26:56 +02:00