Commit Graph

2012 Commits

Author SHA1 Message Date
Jyrki Alakuijala
f3a7a5bf76 lossless: bit writer optimization
valgrind --tool=callgrind shows a 9 % speedup: 1021201984 ticks before vs.
927917709 after

-q 0 -m 0 -lossless ~/alpi/1.png
22.040 MP/s before
24.796 MP/s after

Change-Id: Iaab928167b3e20fb0d9401c6f8317a26c5a610b4
2015-07-20 16:18:40 -07:00
James Zern
d97b9ff755 Merge changes from topic 'lossless-enc-improvements'
* changes:
  lossless: combine the Huffman code with extra bits
  lossless: Inlining add literal
  lossless: simplify HashChainFindCopy heuristics
  lossless: 0.5 % compression density improvement
  lossless: Add zeroes into the predicted histograms.
  lossless: encoding, don't compute unnecessary histo
  lossless: Remove about 25 % of the speed degradation
  Faster alpha coding for webp
  lossless: rle mode not to accept lengths smaller than 4.
  lossless: Less code for the entropy selection
  lossless: 0.37 % compression density improvement
2015-07-20 19:38:42 +00:00
James Zern
0250dfcc19 msvc: fix pointer type warning in BitsLog2Floor
_BitScanReverse() takes an unsigned long*
http://msdn.microsoft.com/en-us/library/fbxyd7zd.aspx

fixes:
C4057: 'function': 'unsigned long *' differs in indirection to slightly
different base types from 'uint32_t *'

fixes issue #253

Change-Id: I0101ef7be18c7ed188b35e9b17e7f71290953786
2015-07-18 11:12:21 -07:00
Jyrki Alakuijala
52931fd548 lossless: combine the Huffman code with extra bits
gives 2 % speedup
24.9 -> 25.5 MP/s for a photo with -q 0 -m 0

Change-Id: If9ae04683a86dd7b1fced2183cf79b9349a24a9e
2015-07-07 20:24:28 -07:00
Jyrki Alakuijala
c4855ca249 lossless: Inlining add literal
this is a simple speedup of about 1-2 %

Change-Id: I0c7b01c0a69f4aeaf363ffda05a28871f1def696
2015-07-07 20:24:28 -07:00
Jyrki Alakuijala
8e9c94dedb lossless: simplify HashChainFindCopy heuristics
for small speedup
0.0003 % worse compression

Change-Id: Ic4b6b21e5279231c6321f2cec1c79f7e17e56afa
2015-07-07 20:24:27 -07:00
Jyrki Alakuijala
888429f409 lossless: 0.5 % compression density improvement
do not do length 2 matches far away

speedup for non compressible data by inserting two literals at a time
when no matches are found

Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3
2015-07-07 20:24:27 -07:00
Jyrki Alakuijala
7b23b19808 lossless: Add zeroes into the predicted histograms.
Increases compression density by 0.03 % for lossy.

Speeds up at least one of the lossy alpha images by 20 %.

Palette entropy 'kludge' seems to save 1-2 % on alpha images.

Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
2015-07-07 20:24:27 -07:00
Jyrki Alakuijala
85b44d8a69 lossless: encoding, don't compute unnecessary histo
share the computation between different modes

3-5 % speedup for lossless alpha
1 % for lossy alpha

no change in compression density

Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
2015-07-07 20:24:26 -07:00
Jyrki Alakuijala
d92453f381 lossless: Remove about 25 % of the speed degradation
introduced in:
"lossless: 0.37 % compression density improvement"

Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.

Improves compression density by 0.02 % or so.

Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
2015-07-07 20:24:26 -07:00
Jyrki Alakuijala
2cce031704 Faster alpha coding for webp
No significant size degradation (+0.001 %) for 1000 image corpus

Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"

Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
2015-07-07 20:24:25 -07:00
Jyrki Alakuijala
5e75642efd lossless: rle mode not to accept lengths smaller than 4.
Gives a compression gain of 0.22 %

Change-Id: I0f3b8dad6b4c1bfb16eab095a467f34466b9e3b7
2015-07-07 20:24:25 -07:00
Jyrki Alakuijala
84326e4ab0 lossless: Less code for the entropy selection
Tested:
	1000 png corpus gives same results

Change-Id: Ief5ea7727290743b9bd893b08af7aa7951f556cb
2015-07-07 20:24:25 -07:00
Jyrki Alakuijala
16ab951abf lossless: 0.37 % compression density improvement
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green

and choose the strategy with the smallest expected entropy

Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
2015-07-07 20:24:24 -07:00
James Zern
822f113ebb add WebPFree() to the API
this should be used in preference to free() for releasing memory
returned from WebPDecode*() / WebPEncode*(). this simplifies memory
management when working through language bindings

Change-Id: I15eb538a45390efc552fda8e5c251a3fbdc13c29
2015-07-06 23:27:51 -07:00
Pascal Massimino
0ae2c2e4b2 SSE2/SSE41: optimize SSE_16xN loops
After several trials at re-organizing the main loop and accumulation scheme,
this is apparently the faster variant.

removed the SSE41 version, which is no longer faster now.
For some reason, the AVX variant seems to benefit most for the change.

Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5
2015-07-02 20:55:04 +02:00
James Zern
39216e59d9 cosmetics: fix indent after 32462a07
Change-Id: If9a5d91c25e981bc4cd81adb476244e63fc7c3c8
2015-07-01 23:49:20 -07:00
James Zern
559e54ca60 Merge "SSE2: slightly faster FTransformWHT" 2015-07-02 06:36:33 +00:00
Pascal Massimino
8ef9a63b45 SSE2: slightly faster FTransformWHT
goes from 0.3% to 0.1% overall CPU time, but...

Change-Id: I4c9a92b1e1d6b58ed57c6b890366f1dbeaf84f84
2015-07-01 23:03:17 -07:00
James Zern
f27f773576 lossless_neon: enable VP8LAddGreenToBlueAndRed
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall

Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
2015-07-01 22:50:54 -07:00
Pascal Massimino
36e9c4bc50 SSE2: minor cosmetrics on in-loop filter code
Change-Id: Ic0e6502081d7063bb2841df74e05c450d708aaf2
2015-06-28 11:59:22 +02:00
James Zern
4741fac42e dsp/lossless_*sse2: remove some unnecessary inlines
TransformColor / TransformColorInverse are the top-level function
pointer calls

Change-Id: Ieabdb4005ff3e4f9bb3ebcb140ccb6bef5d28f8b
2015-06-25 21:02:01 -07:00
Pascal Massimino
1819965e0a fix warning ("left shift of negative value") using a cast
Change-Id: Ie99e8ff87924a1d15e2c5d83bd9adf07dab04e94
2015-06-24 23:46:09 -07:00
Pascal Massimino
7017001462 SSE2: speed-up some lossless-encoding functions
optimized: CollectColorRedTransforms, CollectColorBlueTransforms, SubtractGreenFromBlueAndRed

overall effect is sub-1% speed-up, though.

Change-Id: I9cb49af5c56e4c03db417929b0a2cf575d60a5c6
2015-06-24 20:09:13 -07:00
Pascal Massimino
abcb012841 Merge "SSE2: slightly faster (~5%) AddGreenToBlueAndRed()" 2015-06-24 09:37:46 +00:00
Pascal Massimino
2df5bd30a6 Merge "Speedup to HuffmanCostCombinedCount" 2015-06-24 07:42:26 +00:00
Pascal Massimino
9e356d6b25 SSE2: slightly faster (~5%) AddGreenToBlueAndRed()
Change-Id: Ie147010b66544c4e959f26966ad588394302d418
2015-06-24 09:36:44 +02:00
Pascal Massimino
fc6c75a2a2 SSE2: 53% faster TransformColor[Inverse]
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.

Removed the SSE4.1 implementation which is now slower than SSE2.

Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
2015-06-23 14:52:01 -07:00
Pascal Massimino
49073da6d6 SSE2: 46% speed-up of TransformColor[Inverse]
Change-Id: If3bf26dc8ed32a7c03cb438e5d5fc996e2e96b5e
2015-06-23 20:09:04 +02:00
Pascal Massimino
32462a072c Speedup to HuffmanCostCombinedCount
~3% speedup for lossless encoding
Improves compression ratio by ~0.03%

Change-Id: Ic6d05fb0b1099b5ca56689b92b1c6515d54a5d6b
2015-06-23 16:41:03 +02:00
Pascal Massimino
f3d687e3fa SSE4.1 implementation of some lossless encoding functions
New implementations: SubtractGreenFromBlueAndRed and TransformColor

around 1-2% faster lossless encoding.

Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
2015-06-23 08:46:57 +02:00
Pascal Massimino
bfc300c7ff SSE4.1 implementation of some alpha-processing functions
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.

Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
2015-06-19 14:17:39 -07:00
Pascal Massimino
7f9c98f21d Merge "sse2 in-loop: simplify SignedShift8b() a bit" 2015-06-12 07:37:32 +00:00
James Zern
ef314a5d6c dec_sse2/GetNotHEV: micro optimization
trade 2 subtractions + logical or for 1 max + 1 subtraction

Change-Id: I7d1f25f7cda2a89bc8247f3d3d5417f6b0e3d96c
2015-06-11 22:46:24 -07:00
Pascal Massimino
a729cff987 sse2 in-loop: simplify SignedShift8b() a bit
Change-Id: Ida3e096bb41451194d03dc7a97753a222ff0135c
2015-06-11 15:26:31 -07:00
Pascal Massimino
422ec9fb62 simplify Load8x4() a bit
Change-Id: I68cf09c432f48e34bbe1d47dd091417cfd40cf4e
2015-06-10 12:35:50 -07:00
James Zern
8df238ec8a Merge "remove some duplicate FlipSign()" 2015-06-06 05:25:04 +00:00
Pascal Massimino
751506c484 remove some duplicate FlipSign()
ApplyFilter2NoFlip is the new variant of ApplyFilter2 without the sign-flip

Change-Id: I2af54bd1499118c8321183e42251d265ba76219c
2015-06-05 17:20:29 +02:00
James Zern
65ef5afc27 Merge "lossless: 0.13% compression density gain" 2015-06-03 03:02:09 +00:00
Jyrki Alakuijala
2beef2f245 lossless: 0.13% compression density gain
over a 1000 image corpus

Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s

After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s

This CL allows for some further improvements that would not be possible
otherwise.

Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
2015-06-02 17:27:36 -07:00
Pascal Massimino
3033f24c26 lossless: 0.06 % compression density improvement
Change-Id: Ib662e6aec53b40d6bc736d3ecfd6475bb005c790
2015-06-02 14:51:51 +02:00
James Zern
64960da9e1 dec_neon: add VE8uv / VE16
VE8uv/VE16: ~25%/~33% faster over 20M pixels

Change-Id: Ifac1114091527a05ed10edfcc43852edff012d14
2015-05-30 13:40:00 -07:00
James Zern
14dbd87bed dec_neon: add HE8uv / HE16
HE8uv/HE16: ~91%/~83% faster over 20M pixels

Change-Id: Ib0a776f7c193593ea0993e92cfa6e6be000fb810
2015-05-30 13:39:24 -07:00
skal
ac76801159 introduce FTransform2 to perform two transforms at a time.
FTransform goes from ~12.0% to 11.5% total CPU time.

Change-Id: Ibcb23155324f4fd8b235563f80668531c781f624
2015-05-18 21:06:15 -07:00
James Zern
aa6065aedd dec_neon: use vld1_dup(mem) rather than vdup(mem[0])
should result in slightly less general purpose register use

Change-Id: I6069f49541392e56c8db2c28c8d1fdf88c1a1726
2015-05-16 11:24:32 -07:00
Pascal Massimino
8b63ac78e0 Merge "dec_neon: add TM16" 2015-05-16 10:56:07 +00:00
Pascal Massimino
f51be09e1f Merge "dec_neon/TrueMotion: simply left border load" 2015-05-16 10:54:05 +00:00
James Zern
dc48196bd9 dec_neon: add TM16
over 20M pixels ~78% faster

Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde
2015-05-15 12:50:11 -07:00
James Zern
ea95b305ca dec_neon/TrueMotion: simply left border load
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level

Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
2015-05-15 12:48:13 -07:00
Pascal Massimino
f262d6120e speed-up SetResidualSSE2
(was unnecessarily complicated)

Before:
VP8SetResidualCoeffs: checksum = 1127918   elapsed = 475 ms.

Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd
After:
VP8SetResidualCoeffs: checksum = 1127918   elapsed = 404 ms.
2015-05-14 21:24:24 -07:00
James Zern
bf46d0acff fix mips2 build target
tested with mips1 and mips2; this should cover 3/4 as well.
fixes an ftbfs reported on the debian issue tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000

Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c
2015-05-13 10:36:22 -07:00
James Zern
929a0fdccd enc_sse2/TTransform: simplify abs calculation
max(b, 0 - b) works as well as (b ^ sign) - b

Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d
2015-05-08 19:50:29 -07:00
James Zern
17dbd05819 enc_sse2/CollectHistogram: simplify abs calculation
max(out, 0 - out) works as well as (out ^ sign) - out

Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909
2015-05-08 19:49:08 -07:00
James Zern
a6c1593645 dec_neon: add DC16 intra predictors
improvement over 20M pixels:
DC16: ~77%
DC16NoTop: ~78%
DC16NoLeft: ~83%
DC16NoTopLeft: ~83%

Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99
2015-05-08 00:12:48 -07:00
James Zern
f274a96ce9 dsp/enc_sse2: add luma4 intra predictors
VP8EncPredLuma4 improvement over ~20M pixels: ~39%

Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20
2015-05-05 23:51:19 -07:00
James Zern
040b11bdf6 dsp/enc_sse2: add chroma intra predictors
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%

based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary

Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
2015-05-05 23:51:14 -07:00
James Zern
aee021bbb1 dsp/enc_sse2: add luma16 intra predictors
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%

based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary

Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
2015-05-05 23:51:07 -07:00
James Zern
4c9af02326 dec_neon: add DC8uvNoTopLeft
~93% faster

Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602
2015-05-01 20:03:57 -07:00
Pascal Massimino
9287761d95 Merge "GetResidualCostSSE2: simplify abs calculation" 2015-04-30 06:30:58 +00:00
James Zern
0e009366f8 dsp/cpu.c(x86): check maximum supported cpuid feature
structured extended feature flags require eax = 7; avoids incorrectly
detecting avx2 on some older processors that support avx.
for completeness also check for value=1 support used by the other
checks.

from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String

[1]
http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html

Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046
2015-04-29 23:22:53 -07:00
James Zern
b243a4bc30 GetResidualCostSSE2: simplify abs calculation
max(coeff, 0 - coeff) works as well as min/max/sub or
(coeff ^ sign) - coeff

Change-Id: I9b11715372e49cd83820677bf4beba4a1c04931c
2015-04-21 20:29:12 -07:00
Pascal Massimino
b83bd7c4ea Merge "populate 'libwebpextras' with: import gray, rgb565 and rgb4444 functions" 2015-04-17 15:30:52 -07:00
James Zern
dbba67d1e7 histogram.h: cosmetics: remove unnecessary includes
Change-Id: Ia8277d3587534c2a1af05d3df57a6973a68be16d
2015-04-17 12:23:06 -07:00
James Zern
e978fec61a Merge "VP8LBitReader: fix remaining ubsan error with large shifts" 2015-04-17 00:30:05 -07:00
James Zern
d6fe588469 Merge "ReconstructRow: move some one-time inits out of the main loop" 2015-04-16 14:51:36 -07:00
Pascal Massimino
a21d647c11 ReconstructRow: move some one-time inits out of the main loop
+ some cosmetics clean-up

Change-Id: Ifb34b914844bb7734137bacd61fcfc4a13971665
2015-04-16 14:31:19 -07:00
Pascal Massimino
7a01c3c3ec VP8LBitReader: fix remaining ubsan error with large shifts
* make VP8LPrefetchBits() safe wrt past-EOS reads
* set 'BitReader::bits_" to a safe shifting value upon EOS

no visible performance difference on x86

Change-Id: I0a4177928cfa81d5dfc9054b36a686eaa1bf8c65
2015-04-16 00:57:42 -07:00
Pascal Massimino
7fa67c9b9e change GetPixPairHash64() return type to uint32_t
Change-Id: Ibb61c1631d7a4bcda5417b5a85864d5e2c3f3858
2015-04-16 00:55:25 -07:00
pascal massimino
ec1fb9f8dd Merge "dsp/enc.c: cosmetics: move DST() def closer to use" 2015-04-16 00:17:37 -07:00
Pascal Massimino
7073bfb3ee Merge "split 64-mult hashing into two 32-bit multiplies" 2015-04-15 23:04:47 -07:00
James Zern
0768b252fa dsp/enc.c: cosmetics: move DST() def closer to use
Change-Id: Iccbcf046412426c2893b71eced517f611d2ffc3f
2015-04-15 20:03:39 -07:00
James Zern
6a48b8f003 Merge "fix MSVC size_t->int conversion warning" 2015-04-15 19:54:18 -07:00
James Zern
1db07cdeef Merge "anim_encode: cosmetics: fix alignment" 2015-04-15 15:32:12 -07:00
James Zern
e28271a394 anim_encode: cosmetics: fix alignment
Change-Id: I0a746421f5cceebbbecfb75d11d11ec5d86a1900
2015-04-15 15:03:17 -07:00
Pascal Massimino
7fe357b8c0 split 64-mult hashing into two 32-bit multiplies
Speed-wise equivalent on x86 and ARM (maybe a tad faster, hard to tell).

Note that the two 32-bit multiples are not strictly equivalent
to the 64-bit one, since we're missing one carry propagation.
In practice, no observable difference was seen because of this
slightly different hashing result.

Change-Id: I8f2381175eae1cb20dabf149e6b27e1768fba6ab
2015-04-15 17:45:19 +02:00
Pascal Massimino
af74c1453b populate 'libwebpextras' with: import gray, rgb565 and rgb4444 functions
update makefile.unix to provide 'make extras' building instructions.

note: input ordering depends on WEBP_SWAP_16BIT_CSP for rgb565 and rgb4444

Change-Id: I6f22d32189d9ba2619146a9714cedabfe28e2ad0
2015-04-15 02:54:44 -07:00
Pascal Massimino
6121413415 remove VP8Residual::cost unused field
Change-Id: Id494475b05c540b40fd104594acbcaa783b88d77
2015-04-15 01:56:31 -07:00
Pascal Massimino
e25448235a fix MSVC size_t->int conversion warning
use size_t for 'total_frames' and compute average with float arith.

Change-Id: Ibf16edb38405b0d525bec38c246cf874668c994e
2015-04-14 23:55:00 -07:00
Urvang Joshi
0ac29c5190 AnimEncoder API: Consistent use of trailing underscores in struct.
Change-Id: Ica361eee0059250a6800c6c43264e3bd5e5aa3e0
2015-04-14 15:44:41 -07:00
Urvang Joshi
d484555024 AnimEncoder API: Use timestamp instead of duration as input to Add().
When converting from video sources, the duration of current frame
is often unavailable until the next frame. So, we internally convert
timestamps to durations.

Change-Id: I20ad86361c22e014be7eb91f00d5d40108281351
2015-04-14 12:00:57 -07:00
James Zern
9904e365a8 dsp/dec_sse2: DC8uv / DC8uvNoLeft speedup
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.

DC8uv: ~19% faster
DC8uvNoLeft: ~12% faster

Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2
2015-04-08 23:12:53 -07:00
James Zern
7df2049785 dsp/dec_sse2: DC16 / DC16NoLeft speedup
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.

DC16: ~20% faster
DC16NoLeft: ~14% faster

Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691
2015-04-08 23:10:39 -07:00
James Zern
db12250fd1 cosmetics: vp8enci.h: break long line
Change-Id: Ib7c7ef6171506e826ed5f7df20c5644f240fd645
2015-04-06 16:11:02 -07:00
James Zern
b44eda3f60 dsp: add DSP_INIT_STUB
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning

Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
2015-04-02 23:55:35 -07:00
Pascal Massimino
03e76e962e clarify the comment about double-setting the status in SetError()
Change-Id: I67107220b7a84459592c726dab95483acd4f59f2
2015-04-01 15:27:55 -07:00
Pascal Massimino
9fecdd713e remove unused EmitRGB()
Change-Id: If4d3d775b051206abdab8c603cd3887e9f25d102
2015-04-01 15:27:55 -07:00
Pascal Massimino
43f010dd6d move ReconstructRow to top
(one less TODO)

Change-Id: Iaf36d28ab10633faaaa25f2c37ac799747456adc
2015-04-01 15:27:36 -07:00
Pascal Massimino
82d980209b add a dec/common.h header to collect common enc/dec #defines
had to rename few structs.

-> we can now include both vp8i.h and vp8enci.h without naming
conflicts.

Change-Id: Ib41b498f1b57aab3d6b796361afc45210ec75174
2015-03-31 22:17:58 -07:00
pascal massimino
5d4744a253 Merge "enc_sse41: add Disto4x4 / Disto16x16" 2015-03-27 01:12:45 -07:00
Urvang Joshi
e38886a771 mux.h: Bump up ABI version
This was not bumped up after some recent changes; e.g.
WebPAnimEncoderOptionsInit() method.

Change-Id: Ia473b83ddd7a3d8c227d8eeb126809a97e327475
2015-03-26 23:52:22 -07:00
James Zern
1a338fb306 enc_sse41: add Disto4x4 / Disto16x16
direct translation from sse2; minor gain, fewer instructions

Change-Id: I60288a842fac1a686b82b5cab637931789fe29f2
2015-03-25 23:28:46 -07:00
Pascal Massimino
94055503e3 encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN
Visible speed-up, thanks to pshufb and pabsw and psignw use.

had to tweak configure.ac to make "smmintri.h" presence correctly
detected (we need to set the CPPFLAGS instead of the CFLAGS!)

Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080
2015-03-25 20:23:51 -07:00
Pascal Massimino
c64659e1b4 remove duplicate variables after the lossless{_enc}.c split
clang was giving "duplicate symbols" error messages at link time.

Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6
2015-03-25 11:10:21 +01:00
James Zern
67ba7c7acc enc_sse2: call local FTransform in CollectHistogram
allows the former to be inlined; negligible speed-up in most cases,
however this is structure is consistent with the rest of the optimized
modules

Change-Id: Ib080240b06f7a995b47f1906627850c355b82901
2015-03-24 20:22:24 -07:00
James Zern
182497993b dsp: s/VP8LSetHistogramData/VP8SetHistogramData/
this function is for lossy encoding; the VP8L prefix is used by lossless

Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb
2015-03-24 18:27:41 -07:00
James Zern
ede5e1584c cosmetics: dsp/lossless.h: reorder prototypes
group decoding / encoding functions together, followed by their
respective Init() function.

Change-Id: Ib4d22f8ec2369efec752faf733ecf53acc67b1ca
2015-03-24 17:52:42 -07:00
James Zern
553051f741 dsp/lossless: split enc/dec functions
adds lossless_enc*.c; reduces the size of the decode-only so: ~78K
w/gcc-4.8.2 on x86_64.

Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092
2015-03-23 22:57:50 -07:00
James Zern
cecf509662 dsp/yuv*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I42e621481be7305bb7c426b4d0b279619195611e
2015-03-20 19:19:46 -07:00
James Zern
6584d398eb dsp/upsampling*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I3c753915eefe900987c9720733efb720ebe6bfa7
2015-03-20 19:19:46 -07:00
James Zern
808094228c dsp/rescaler*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: Ife9c7cd363b3692b64a7ade1960cfce3a76c3ba2
2015-03-20 19:19:46 -07:00
James Zern
1d93ddec19 dsp/lossless*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: If8b4459556e6bfaa36ef046f66520558b9444fc2
2015-03-20 19:19:46 -07:00
James Zern
73805ff270 dsp/filters*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: Idf08ffeb2aef1392a6d69596d897a59deebb64cf
2015-03-20 19:19:46 -07:00
James Zern
fbdcef2401 dsp/enc*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I0cf40b500f9b3eed55a3211213db180c7c0dd43b
2015-03-20 19:19:46 -07:00
James Zern
66de69c1fe dsp/dec*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I319bc7714f36b8a3d8b35f6474e5592a439aaf24
2015-03-20 19:19:37 -07:00
James Zern
48e4ffd15e dsp/cost*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: Ie9bee5eaf9daebe0909ab1dda1cf1aa4ee1ef03e
2015-03-20 19:18:50 -07:00
James Zern
29fd6f90c0 dsp/argb*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I46b89909a0279172d37dbda70f731c7b9f052dad
2015-03-20 19:18:50 -07:00
James Zern
80ff38130e dsp/alpha*.c: rework WEBP_USE_<arch> ifdef
add a dummy init rather than repeating the '#ifdef WEBP_USE_...'
pattern.

Change-Id: I9e7f187daffe1a3b1bc92953dce980c38d1a6269
2015-03-20 19:18:41 -07:00
Pascal Massimino
e9570dd987 stub for SSE4.1 support.
Change-Id: I0c845a98d2871cc8907ff7b914bab7747a92c7ed
2015-03-20 00:26:35 -07:00
pascal massimino
4a95384b34 Merge "dsp: add sse4.1 detection" 2015-03-19 00:08:02 -07:00
James Zern
cabf4bd2bc dsp: add sse4.1 detection
bit 19 in ecx
no targets or code

https://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets

Change-Id: Ie61b004dd5b6a3639b30bd9d2a09e6d7359b8040
2015-03-18 19:16:47 -07:00
James Zern
4ecba1ab97 thread.h: rename interface param
this matches the code in thread.c; interface is a reserved word in some
windows configurations.

Change-Id: I9570b14171023214a51263211693f1a858a13acf
2015-03-18 19:07:41 -07:00
pascal massimino
b8d706c8c2 Merge "sync versions with 0.4.3" 2015-03-12 00:51:44 -07:00
James Zern
ae64a7117e Merge "add shell for libwebpextras" 2015-03-11 18:53:50 -07:00
James Zern
92a5da9c8c sync versions with 0.4.3
libwebp{,decoder} - 0.4.3
libwebp libtool - 5.3.0
libwebpdecoder libtool - 1.3.0

mux/demux - 0.2.2 (unchanged)
libtool - 1.2.0 (unchanged)

(cherry picked from commit bd852f5d81)

Change-Id: Ie8c35ffc20c1bfd782bdafd99da6c6b1373022c1
2015-03-11 17:29:23 -07:00
Pascal Massimino
9d4e2d1697 Merge "~30% faster smart-yuv (-pre 4) with early-out criterion" 2015-03-11 02:16:27 -07:00
Pascal Massimino
b1bdbbabfb ~30% faster smart-yuv (-pre 4) with early-out criterion
we look at average global improvement and stop when things are
moving slow, or when we had a quite good first iteration already
(means: the picture is "not difficult")

Change-Id: I8ab7d100353039b5b32bb5fac3fe03c8440c78d5
2015-03-11 00:42:12 -07:00
James Zern
7efb97483f Merge "Disable NEON code on Native Client" 2015-03-10 17:52:51 -07:00
Sam Clegg
ac4f5784a0 Disable NEON code on Native Client
The NEON assember in libwebp has not yet been ported
to Native Client. This changes disables it.
Related issue:
https://code.google.com/p/nativeclient/issues/detail?id=3205

Change-Id: I200291db7aa79d40c1f10cff7622c9b8599e6886
2015-03-10 16:17:25 -07:00
Urvang Joshi
0873f85b54 AnimEncoder API: Support input frames in YUV(A) format.
We automatically convert them to ARGB format.

Change-Id: Ia21f07e08c746e16a318cb035af375c81d9af0de
2015-03-10 11:27:09 -07:00
James Zern
5c176d2d9b add shell for libwebpextras
meant to contain additional utility functions useful in processing webp
input/output.

Change-Id: I014ae6b917d62e826aa23a3bbe99aac4462a97c2
2015-03-06 00:15:41 -08:00
Pascal Massimino
44bd95612e fix signature for VP8RecordCoeffTokens()
Change-Id: Ia2fe764b7280931335237ced8190604129fae565
2015-03-02 23:38:20 -08:00
Pascal Massimino
c9b8ea0eef small cosmetics on TokenBuffer.
Change-Id: I7c33651ed8e3a151aef44247db5fb1e8bf41f8ba
2015-03-03 00:48:28 +01:00
James Zern
76394c09d4 Merge "MIPS: dspr2: added optimization for TrueMotion" 2015-02-26 14:07:00 -08:00
James Zern
0f773693bf WebPPictureRescale: add a note about 0 width/height
Change-Id: I3890bb3fd32a148d7dd24c714546160c6c59d4ea
2015-02-26 10:49:48 -08:00
Djordje Pesut
241bb5d9d9 MIPS: dspr2: added optimization for TrueMotion
affected functions:
      TM4 - TrueMotion4
      TM8uv - TrueMotion8
      TM16 - TrueMotion16

Change-Id: Iff4377c4b0ae94716789c03fe1cd5bfd91f79188
2015-02-26 10:22:19 +01:00
Djordje Pesut
b5e79422d5 MIPS: dspr2: Added optimization for some convert functions
affected functions:
      VP8LConvertBGRAToRGBA4444_C
      VP8LConvertBGRAToRGB565_C
      VP8LConvertBGRAToBGR_C

Change-Id: I81513d242d33ebb9fef397ee6a2ca75d17f66e97
2015-02-24 10:51:34 +01:00
Djordje Pesut
0f595db60c MIPS: dspr2: Added optimization for some convert functions
affected functions:
  VP8LConvertBGRAToRGB_C
  VP8LConvertBGRAToRGBA_C

Change-Id: I5f25795c385688f2432d0710296e589f3793cb2b
2015-02-23 17:44:06 +01:00
Djordje Pesut
8a218b4a96 MIPS: [mips32|dspr2]: GetResidualCost rebased
Change-Id: Ie15524c773f7a8c79e002097881a508187ca7cc6
2015-02-23 10:43:42 +01:00
Vikas Arora
ef98750027 Speedup method StoreImageToBitMask by 5%.
Speedup method StoreImageToBitMask by replacing the code to find histogram
index and Huffman tree codes at every iteration to a more optimal code that
updates these only when the current pixel (to write) crosses the histogram
tile-row boundary.

This change speeds up the StoreImageToBitMask method by 5%.

Change-Id: If01a1ccd7820f9a3a3e5bc449d070defa51be14b
2015-02-20 09:46:19 -08:00
James Zern
602a00f93f fix iOS arm64 build with Xcode 6.3
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.

[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.

Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
2015-02-19 12:16:58 -08:00
Pascal Massimino
2382050748 1-2% faster encoding by removing an indirection in GetResidualCost()
The MIPS code for cost is not updated yet, that's why i keep Residual::*cost
around for now. Should be removed in favor of *costs later.

Change-Id: Id1d09a8c37ea8c5b34ad5eb8811d6a3ec6c4d89f
2015-02-19 08:44:35 +01:00
Djordje Pesut
eddb7e70be MIPS: dspr2: added otpimization for DC8uv, DC8uvNoTop and DC8uvNoLeft
added macros for load/store

Change-Id: I151d4d49bf1fab87fc3a82cb8e8e0835fe10b690
2015-02-18 18:24:10 +01:00
Djordje Pesut
73ba29158f MIPS: dspr2: added optimization for functions RD4 and LD4
Change-Id: I71216c1300f4eb254de4ae940ea9dcdba50aa080
2015-02-18 15:11:34 +01:00
Pascal Massimino
c7129da5b6 Merge "4-5% faster encoding using SSE2 for GetResidualCost" 2015-02-18 04:46:53 -08:00
Djordje Pesut
94380d00d9 MIPS: dspr2: added optimizaton for functions VE4 and DC4
Change-Id: I118adc6d3872742d8b1f9dbac438cba6fc90b7a9
2015-02-18 11:25:08 +01:00
Pascal Massimino
2a407092ab 4-5% faster encoding using SSE2 for GetResidualCost
new file: cost_sse2.c

Change-Id: I4896c07f5ff2443ef743f4435fe2758d95a672ed
2015-02-18 09:41:02 +01:00
James Zern
17e1986214 Merge "MIPS: dspr2: added optimization for simple filtering functions" 2015-02-17 14:57:05 -08:00
pascal massimino
3ec404c47a Merge "dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage" 2015-02-14 01:57:08 -08:00
James Zern
b969f5dfac dsp: normalize WEBP_TSAN_IGNORE_FUNCTION usage
the attribute is only necessary in one location; remove it from the
prototypes.

Change-Id: I3820a3c34fbb029fd7ac69a1b0a9b76091bdbde2
2015-02-13 15:23:40 -08:00
Djordje Pesut
d7b8e71126 MIPS: dspr2: added optimization for simple filtering functions
affected functions: SimpleVFilter16, SimpleHFilter16,
                    SimpleVFilter16i and SimpleHFilter16i

noticed bug in FilterLoop26 (fix included in this patch)

Change-Id: I72d9c1e45cbac6393eba52bb549b04924d463e30
2015-02-13 11:18:43 +01:00
pascal massimino
235f774e5f Merge "MIPS: dspr2: Added optimization for function VP8LTransformColorInverse_C" 2015-02-13 00:12:52 -08:00
Djordje Pesut
42a8a6280c MIPS: dspr2: Added optimization for function VP8LTransformColorInverse_C
Change-Id: I8b60e22c9f6c0badab6267a33751dfc28750f457
2015-02-13 08:57:20 +01:00
James Zern
9bc0f922aa ApplyFiltersAndEncode: only copy lossless stats
this avoids a race with multi-threaded lossy + alpha compression

Change-Id: Ie437105f5a899ed28b9c8885b6ca5431092ce8f5
2015-02-12 19:44:25 -08:00
James Zern
3030f11525 Merge "dsp/mips: add some missing TSan annotations" 2015-02-12 14:55:32 -08:00
pascal massimino
dfcf4593fe Merge "MIPS: dspr2: Added optimization for function VP8LAddGreenToBlueAndRed_C" 2015-02-12 14:48:17 -08:00
James Zern
55c75a25f0 dsp/mips: add some missing TSan annotations
Change-Id: I3c832aefdeac26c6c75c35b19b45c1a2f67493c5
2015-02-12 14:36:33 -08:00
Djordje Pesut
2cb879f0c6 MIPS: dspr2: Added optimization for function VP8LAddGreenToBlueAndRed_C
Change-Id: If897c6c2f1c4b8405789298e135d6a1e4bf13012
2015-02-12 09:06:49 +01:00
James Zern
e15560107c move some cost tables from enc/ to dsp/
removes circular dependency between dsp and enc.

since:
a987fae MIPS: dspr2: added optimization for function GetResidualCost

Change-Id: Ifeb8fc02de89e2ba982ed7ffacd925d649bfec3c
2015-02-11 16:10:06 -08:00
pascal massimino
c3a031686a Merge "picture_csp: fix build w/USE_GAMMA_COMPRESSION undefined" 2015-02-10 00:09:03 -08:00
pascal massimino
39537d7cfe Merge "VP8LDspInitMIPSdspR2: add missing TSan annotation" 2015-02-10 00:02:41 -08:00
James Zern
1dd419ced5 picture_csp: fix build w/USE_GAMMA_COMPRESSION undefined
kGammaFix is now only defined with USE_GAMMA_COMPRESSION;

fixes:
use of undeclared identifier 'kGammaFix'

Change-Id: Ib1e2f410eff9b83be065894f88181f91dd2776e1
2015-02-09 23:57:14 -08:00
James Zern
43fd3543df VP8LDspInitMIPSdspR2: add missing TSan annotation
Change-Id: Ic0d84e95daf063976b40fb5ba1e94d3547e2afba
2015-02-09 23:55:30 -08:00
pascal massimino
c7233dfcdc Merge "VP8LDspInit: remove memcpy" 2015-02-09 23:48:44 -08:00
James Zern
0ec4da960d picture_csp::InitGammaTables*: add missing TSan annotations
Change-Id: I66ca5b3e7b1614f861a9b68bd437f58b24cb1ebb
2015-02-09 23:44:47 -08:00
James Zern
35579a4902 VP8LDspInit: remove memcpy
without this change the TSan annotation is useless

Change-Id: Ief511379f3aad75889815d4fe8362aed5c1abac7
2015-02-09 23:41:24 -08:00
James Zern
97f6aff874 VP8YUVInit: add missing TSan annotation
Change-Id: I7f8868de425e1aac3721b3e328844725104d14db
2015-02-09 22:50:31 -08:00
James Zern
f9016d6662 dsp/enc::InitTables: add missing TSan annotation
Change-Id: I262b9071417a0ec502c7c0380f27da6413cc74e4
2015-02-09 22:40:45 -08:00
James Zern
e3d9771aa1 VP8EncDspCostInit*: add missing TSan annotations
Change-Id: I4cdb84bc8c9a8c6aa34b5773c8fb69e5810a9809
2015-02-09 22:39:14 -08:00
Djordje Pesut
309b790867 MIPS: mips32: Added optimization for function SetResidualCoeffs
Change-Id: If67c10285df71ba7dd1aff6c24c2145c280dd2bf
2015-02-09 13:17:49 +01:00
Pascal Massimino
a987faedfa MIPS: dspr2: added optimization for function GetResidualCost
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file

Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
2015-02-07 02:13:26 -08:00
pascal massimino
be6635e91d Merge "VP8TBufferClear: remove some misleading const's" 2015-02-06 02:01:15 -08:00
pascal massimino
02971e7228 Merge "VP8EmitTokens: remove unnecessary param void cast" 2015-02-06 02:00:16 -08:00
James Zern
3b77e5a735 VP8TBufferClear: remove some misleading const's
the input to the function is non-const and the pointer being operated is
being free'd; removes an unnecessary cast in the process

Change-Id: Ic515ed672ddf7f8e4e36eeac696ff7aa8a3652f7
2015-02-05 23:56:26 -08:00
James Zern
aa139c8f1a VP8EmitTokens: remove unnecessary param void cast
'final_pass' is used within the function

Change-Id: I81be1a6e18cafaa6ae685ed8ad2b107fa7ed29cf
2015-02-05 23:56:26 -08:00
James Zern
c24d8f144f cosmetics: upsampling_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: Ia6df6690c85f580b20f19ce85cc6ec7b52620aee
2015-02-05 23:51:57 -08:00
James Zern
1829c42c58 cosmetics: lossless_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I2405b18c6bb829b76c3a9814057ccbe6e14220d9
2015-02-05 23:51:44 -08:00
James Zern
183168f332 cosmetics: enc_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: Ib85d63abbb9fc33096f893c2524d3ce8ae3ebd03
2015-02-05 23:51:29 -08:00
James Zern
860badcacc cosmetics: dec_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I77decb55f1382bea4b646a11b77dfa40bf1ef94d
2015-02-05 23:51:16 -08:00
James Zern
0254db9793 cosmetics: argb_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I87fa2de11dafcc77767aab64e13b8c5585ebf5cd
2015-02-05 23:51:07 -08:00
James Zern
1aadf856c9 cosmetics: alpha_processing_sse2: add const to some casts
source pointers are often cast to __m128*, retain the const in those
cases

Change-Id: I00ba15af2f43d125ceb2620e82fd43d420fbb9d3
2015-02-05 23:50:39 -08:00
Vikas Arora
4c82284d2e Updated the near-lossless level mapping.
Updated the near-lossless level mapping and make it correlated to lossy
quality i.e 100 => minimum loss (in-fact no-loss) and the visual-quality loss
increases with decrease in near-lossless level (quality) till value 0.

The new mapping implies following (PSNR) loss-metric:
-near_lossless 100: No-loss (bit-stream same as -lossless).
-near_lossless  80: Very very high PSNR (around 54dB).
-near_lossless  60: Very high PSNR (around 48dB).
-near_lossless  40: High PSNR (around 42dB).
-near_lossless  20: Moderate PSNR (around 36dB).
-near_lossless   0: Low PSNR (around 30dB).

Change-Id: I930de4b18950faf2868c97d42e9e49ba0b642960
2015-02-05 11:17:14 -08:00
Pascal Massimino
19f0ba0eb9 Implement true-motion prediction in SSE2
(along with DC/HE/VE for chroma/luma16)

Overall effect is ~1% faster decoding.

Change-Id: I90917e050d61874cbc8da0e88f26b5dd6131c265
2015-02-04 17:02:22 +01:00
Pascal Massimino
774d4cb758 make VP8PredLuma16[] array non-const
Change-Id: I0ce7e4e847f9fffefb6544db9636068442a2d264
2015-02-04 17:00:22 +01:00
Djordje Pesut
d7eabb8031 Merge "MIPS: dspr2: Added optimization for function CollectHistogram" 2015-02-03 22:42:32 -08:00
Urvang Joshi
fe42739cc8 Use integers for kmin/kmax for simplicity.
Change-Id: I62237975d663641552107759af8d2d329b70a7c4
2015-02-03 13:57:52 -08:00
Urvang Joshi
b9df35f714 AnimEncode API: kmax=0 should imply all keyframes.
Earlier, it wasn't adding any keyframes at all.

Change-Id: If3824fc8e57548b8610a52e875fb9279f862fa57
2015-02-03 11:42:13 -08:00
Djordje Pesut
6ce296da12 MIPS: dspr2: Added optimization for function CollectHistogram
Change-Id: Id6b87ea1c9d21fee9494ad6c53ffc84ef60d5974
2015-02-03 14:11:20 +01:00
pascal massimino
be0fd1d52d Merge "dec/vp8: clear 'dither_' on skipped blocks" 2015-02-02 23:04:40 -08:00
James Zern
c86b40cca0 enc/near_lossless.c: fix alignment
Change-Id: Ifd1b1b88c375abf655d94e2ba7d52087110294a5
2015-02-02 19:35:12 -08:00
James Zern
66935fb9ee dec/vp8: clear 'dither_' on skipped blocks
DitherRow() only checks this value, not 'skip_' so previously it was
uninitialized for these blocks.

Change-Id: I0f698b81854ee9d91edacb51c1e3bdab9cba96f2
2015-02-02 19:28:34 -08:00
James Zern
b7de794622 Merge "lossless_neon: enable subtract green for aarch64" 2015-02-02 16:01:40 -08:00
Pascal Massimino
77724f70e9 SSE2 version of GradientUnfilter
somewhat 1-2% faster decoder for lossy+alpha

Change-Id: Ib317e26e9fcb8d37af02668ffbfccc4664e659fe
2015-01-31 23:18:00 +01:00
James Zern
416e1cea9b lossless_neon: enable subtract green for aarch64
similar to:
1ba61b0 enable NEON intrinsics in aarch64 builds

vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.

performance varies based on the input, 1-3% on encode was observed

Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-01-31 11:32:05 -08:00
Vikas Arora
72831f6b28 Speedup AnalyzeAndInit for low effort compression.
AnalyzeSubtractGreen constitutes about 8-10% of the comression CPU cycles.
Statistically, subtract-green is proved to be useful for most of the
non-palette compression. So instead of evaluating the entropy (by calling
AnalyzeSubtractGreen) apply subtract-green transform for the low-effort
compression.

This changes speeds up the compression at m=0 by 8-10% (with very slight loss
of 0.07% in the compression density).

Change-Id: I9797dc39437ae089716acb14631bbc77d367acf4
2015-01-30 10:37:31 -08:00
Vikas Arora
a6597483af Speedup Analyze methods for lossless compression.
Speed up AnalyzeSubtractGreen by looping through the image pixel once to
compute the two histograms.

AnalyzeEntropy code cleanup.
Removed some 'if' conditions and pointer indirections inside pixel iterate loop.

Change-Id: Ia65e3033988ff67df8e3ecce19d6e34cfc76358e
2015-01-30 09:16:31 -08:00
Vikas Arora
98c8138663 Enable Near-lossless feature.
Enable the WebP near-lossless feature by pre-processing the image to smoothen
the pixels.

On a 1000 PNG image corpus, for which WebP lossless (default settings) gets
25% compression gains, following is the performance of near-lossless feature
at various '-near_lossless' levels:
-near_lossless 90: 30% (very very high PSNR 54-60dB)
-near_lossless 75: 38% (very high PSNR 48-54dB)
-near_lossless 50: 45% (high PSNR 42-48dB)
-near_lossless 25: 48% (moderate PSNR 36-42dB)
-near_lossless 10: 50% (PSNR 30-36dB)

WebP near-lossless is specifically useful for discrete-tone images like
line-art, icons etc.

Change-Id: I7d12a2c9362ccd076d09710ea05c85fa64664c38
2015-01-29 16:10:20 -08:00
Urvang Joshi
c6b24543fc AnimEncoder API: Fix for kmax=1 and default kmin case.
Some frames that were previously selected as key-frames were incorrectly
being reset to sub-frames.

Change-Id: Iee342dbb9a9aec144b8185c3b54ca56aa7038bfb
2015-01-29 13:26:01 -08:00
Pascal Massimino
022d2f886c add SSE2 variants for alpha filtering functions
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.

The heavy bottleneck left for optimization is still GradientUnfilter()

Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
2015-01-29 08:46:22 +01:00
Urvang Joshi
2db15a9583 Temporarily disable encoding of alpha plane with color cache.
This is to avoid triggering the related decoder bug.

Change-Id: I8fa074a5393bcd62aa4a2232cd4e02935e927a89
2015-01-28 15:28:02 -08:00
James Zern
1d575ccd4e Merge "Lossless decoding: Remove an unnecessary if condition." 2015-01-27 23:35:10 -08:00
James Zern
cafa1d882f Merge "Simplify backward refs calculation for low-effort." 2015-01-27 23:32:21 -08:00
Pascal Massimino
7afdaf8496 Alpha coding: reorganize the filter/unfiltering code
Move the filtering code to their own dsp/ spot
New function: VP8FiltersInit()

Change-Id: I0b2041eab42346c59b972f2575b05509e6a8f7b1
2015-01-28 08:02:41 +01:00
Vikas Arora
4d6d7285b0 Simplify backward refs calculation for low-effort.
Simplify and speedup backward references for low-effort settings by evaluating
LZ77 references only. This change speeds up compression by 10-25% at lower
(q <= 25) quality range with a slight drop (0.2%) in the compression density.

Change-Id: Ibd6f03b1a062d8ab9191786c2a425e9132e4779f
2015-01-27 09:36:14 -08:00
Vikas Arora
ec0d1be577 Cleaup Near-lossless code.
Cleaup Near-lossless code
- Simplified and refactored the code.
- Removed the requirement (TODO) to allocate the buffer of size WxH and work
  with buffer of size 3xW.
- Disabled the Near-lossless prr-processing for small icon images (W < 64 and H < 64).

Change-Id: Id7ee90c90622368d5528de4dd14fd5ead593bb1b
2015-01-26 15:29:59 -08:00
Vikas Arora
9814ddb601 Remove the post-transform near-lossless heuristic.
Remove the post-transform (prediction, subtract green & cross-color)
near-lossless heuristic, that's not ready yet and produces unacceptable visual
(banding) artifacts.

Change-Id: I9b606a790ce0344c588f2ef83a09c57ac19c2fc1
2015-01-26 14:19:00 -08:00
Urvang Joshi
4509e32e63 Lossless decoding: Remove an unnecessary if condition.
Change-Id: I4e32da538d7b8563305124fb5faa1f7ce8a976d1
2015-01-23 15:10:49 -08:00
pascal massimino
f2ebc4a836 Merge "Regression fix for lossless decoding" 2015-01-23 13:51:21 -08:00
Urvang Joshi
783a8cda24 Regression fix for lossless decoding
Reported here: https://code.google.com/p/webp/issues/detail?id=239

At the beginning of method 'DecodeImageData', pixels up to
'dec->last_pixel_' are assumed to be already cached. So, at the end of
previous call to that method also, that assumption should hold true.

Hence, we should cache all pixels up to 'src' regardless of 'src_last'.

This affects lossless incremental decoding only, as that is when
src_last and src_end differ.
Note: alpha decoding is implicitly incremental, as alpha decoding of
only the rows 'y_end - y_start' happens during FinishRow() call. So, this bug
affects alpha decoding in non-incremental decoding flow as well.

This bug was introduced in: https://gerrit.chromium.org/gerrit/#/c/59716.

Change-Id: Ide6edfeb2609b02aff701e1bd9fd776da0a16be0
2015-01-23 12:12:27 -08:00
Urvang Joshi
9a062b8ea6 AnimEncoder: Bugfix for kmin = 1 and kmax = 2.
SanitizeEncoderOptions() was changing kmin to 2 too, which resulted in a
bad state with kmin == kmax.

Change-Id: Ie7273f1949bac469e7e6c8efbc98b154caf6de0f
2015-01-23 10:48:59 -08:00
Pascal Massimino
0f027a72bf simplify smart RGB->YUV conversion code
* use the same TFIX == YFIX precision (2bits)
* use int instead of float in LinearToGammaF()

output is visually equivalent. Code is a little faster.

Change-Id: Ie3cfebca351dbcbd924b3d00801d6523dca6981f
2015-01-23 14:42:32 +01:00
Pascal Massimino
0d5b334ee8 BackwardReferencesHashChainFollowChosenPath: remove unused variable
Change-Id: I8dc4622dbacca03a7876f8856a0db5b9b9ec2fbd
2015-01-22 23:22:58 -08:00
Pascal Massimino
f480d1a7ef Fix to near lossless artefacts on palettized images.
Don't rely on palette not being there before the palette colors are counted.

Change-Id: I988286675d3398f2da8f6d2fb6462db08af8028d
2015-01-22 17:47:50 +01:00
Pascal Massimino
d4615d0889 Merge changes Ia1686828,I399fda40
* changes:
  rename HashChainInit into HashChainReset
  use uint16_t for chosen_path[]
2015-01-22 00:08:27 -08:00
Pascal Massimino
cb4a18a7ba rename HashChainInit into HashChainReset
this avoids the confusion with "VP8LHashChainInit"

Change-Id: Ia1686828c138729e5bda3cc5c8246d99c80915ef
2015-01-20 00:38:07 -08:00
Pascal Massimino
f079e487ae use uint16_t for chosen_path[]
len is MAX_LENGTH (4096) at max. This reduce memory for path by a half.

Change-Id: I399fda4093d93b1e9d956397b7b210956c5b948f
2015-01-20 00:34:09 -08:00
Djordje Pesut
da0912126b MIPS: dspr2: Added optimization for function FTransformWHT
Change-Id: I918366cd1908304068c66da9965efb0aa63320cd
2015-01-19 10:15:13 +01:00
pascal massimino
daeb276a2b Merge "MIPS: dspr2: Added optimization for MultARGBRow function" 2015-01-17 08:56:01 -08:00
pascal massimino
cc08742454 Merge "dsp/cpu: (msvs) add include for __cpuidex" 2015-01-17 08:55:31 -08:00
James Zern
f0e0677b87 VP8LEncodeStream: add an assert
check enc->argb_ to quiet an msvs /analyze warning:
C6387: 'enc->argb_+y*width' could be '0':  this does not adhere to the
specification for the function 'memcpy'.

Change-Id: I87544e92ee0d3ea38942a475c30c6d552f9877b7
2015-01-16 18:16:40 -08:00
James Zern
c5f7747fc5 VP8LColorCacheCopy: promote an int before shifting
quiets an msvs /analyze warning, the cast is due to an earlier msvs
build warning.
C6297: Arithmetic overflow:  32-bit value is shifted, then cast to
64-bit value.  Results might not be an expected value.

Change-Id: I0bb6cda57879f2fbd1e3515f6753a11bc08d14ac
2015-01-16 18:16:39 -08:00
James Zern
0de5f33e31 dsp/cpu: (msvs) add include for __cpuidex
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.

Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
2015-01-16 18:16:10 -08:00
Djordje Pesut
7d850f7b9a MIPS: dspr2: Added optimization for MultARGBRow function
Change-Id: Ide549ae0d80413bea8c19fe091d97bffe8b17985
2015-01-16 15:56:34 +01:00
Djordje Pesut
5487529368 MIPS: dspr2: added optimization for function QuantizeBlock
Change-Id: Id217116890b7408d23464216608ce67ae545688a
2015-01-16 12:51:13 +01:00
James Zern
4fbe9cf202 dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
_xgetgv() isn't relevant there anyway

broken since:
279e661 Merge "dsp/cpu: add include for _xgetbv() w/MSVS"

Change-Id: Iaa7bc0c5be9c06bfffab39e194c64c09bf5b5a27
2015-01-15 23:04:08 -08:00
Pascal Massimino
3fd59039bd simplify/reorganize arguments for CollectColorBlueTransforms
and other various call sites too.

Change-Id: Icb8f828dfe25672662de18d0e48e7d3144b1f38d
2015-01-15 18:12:12 -08:00
Vikas Arora
b9e356b998 Disable costly TraceBackwards for method=0.
Disable costly TraceBackwards heuristic for computing the backward references
for low_effort (method=0) compression.

The TraceBackwards heuristic is already disabled for lower (q < 25) quality
range. Following is the compression data for 1000 image corpus for q >= 25.

This speeds up compression (q >= 25) by a factor of 2.5-3X with slight loss of
compression density (0.7% for lower quality range and 1.2% for higher qualities).

Change-Id: I256c9e2137c7de4083f423ea32ee12d3b0f46253
2015-01-15 09:01:40 -08:00
Djordje Pesut
a7e7caa486 MIPS: dspr2: added optimization for function TransformColorRed
added new function CollectColorRedTransforms to C, which calls
TransformColorRed and it is realized via pointer to function

Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
2015-01-15 09:32:09 +01:00
pascal massimino
2cb39180cc Merge "MIPS: dspr2: added optimization for function TransformColorBlue" 2015-01-15 00:06:01 -08:00
pascal massimino
279e66138d Merge "dsp/cpu: add include for _xgetbv() w/MSVS" 2015-01-15 00:05:35 -08:00
James Zern
b6c0428e8c dsp/cpu: add include for _xgetbv() w/MSVS
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.

Change-Id: If70d5143ac94fc331da763ce034358858e460e06
2015-01-14 23:31:35 -08:00
Pascal Massimino
07c39559ea Merge "AnimEncoder API: Add info in README.mux" 2015-01-13 14:14:38 -08:00
Djordje Pesut
7b16197361 MIPS: dspr2: added optimization for function TransformColorBlue
added new function CollectColorBlueTransforms to C, which calls
TransformColorBlue and it is realized via pointer to function

Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced
2015-01-13 10:39:38 +01:00
James Zern
d7c4b02a57 cpu: fix AVX2 detection for gcc/clang targets
ecx needs to be set to 0; the visual studio builds were already doing
this.

https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family

Change-Id: I95efb115b4d50bbdb6b14fca2aa63d0a24974e55
2015-01-12 17:58:57 -08:00
Urvang Joshi
9d299469d2 AnimEncoder API: Add info in README.mux
Also, add code snippet for WebPConfig in the example.

Change-Id: Ia50222690d0e2a84bdc5e9bf362675902a810f22
2015-01-12 15:48:18 -08:00
Pascal Massimino
d581ba40ba follow-up: clean up WebPRescalerXXX dsp function
by removing redundant RFIX macros and using a plain-C fallback.

Change-Id: I52436c672bf20780b6fe3bcf43fe73e1abac10ff
2015-01-12 15:26:55 -08:00
James Zern
f8740f0d6c dsp: s/USE_INTRINSICS/WEBP_USE_INTRINSICS/
for consistency with other defines shared across modules

Change-Id: I30cdb9f892e9ea48265883f560500ffb1d6799ee
2015-01-12 14:27:36 -08:00
James Zern
ce73abe054 Merge "introduce a separate WebPRescalerDspInit to initialize pointers" 2015-01-12 14:25:37 -08:00
Pascal Massimino
ab66becaae introduce a separate WebPRescalerDspInit to initialize pointers
so that we keep the details of WebPRescaler in utils/rescaler.c
when possible.

Change-Id: Ib6c1029a09b84cbc7a7d2f70dafa4d4d9132cecc
2015-01-12 13:58:30 -08:00
Pascal Massimino
205c7f26af fix handling of zero-sized partition #0 corner case
reported in https://code.google.com/p/webp/issues/detail?id=237

An empty partition #0 should be indicative of a bitstream error.
The previous code was correct, only an assert was triggered in debug mode.
But we might as well handle the case properly right away...

Change-Id: I4dc31a46191fa9e65659c9a5bf5de9605e93f2f5
2015-01-12 20:30:53 +01:00
pascal massimino
cbcdd5ffaf Merge "move rescaler functions to rescaler* files in src/dsp/" 2015-01-10 05:41:45 -08:00
pascal massimino
bf586e8844 Merge changes I230b3532,Idf3057a7
* changes:
  enable NEON for Windows ARM builds
  Makefile.vc: add rudimentary Windows ARM support
2015-01-10 02:14:48 -08:00
pascal massimino
6dc79dc226 Merge "anim_encode: fix type conversion warnings" 2015-01-10 02:12:25 -08:00
James Zern
4f43d38ca8 enable NEON for Windows ARM builds
Change-Id: I230b353214ce44ab29ffd2df6ccd14345d6578e8
2015-01-09 19:11:55 -08:00
James Zern
e7c5954c10 dec_neon: remove returns from void functions
Change-Id: I3c66a5dfe3de2bb3653cbbf1b92b0328aba62881
2015-01-09 18:08:05 -08:00
James Zern
f79c163bbf anim_encode: fix type conversion warnings
fixes:
C4267: '=' : conversion from 'size_t' to 'int', possible loss of data

Change-Id: Ie8e0bbd6f19fde21b2dbbd2a92cc99e76502dfed
2015-01-09 17:12:06 -08:00
Djordje Pesut
cbcbedd0de move rescaler functions to rescaler* files in src/dsp/
Change-Id: I906add1b1010a59ebfcc2dd81e15745433cc206b
2015-01-09 16:47:09 +01:00
James Zern
e8694d4dc3 mux: remove experimental FRGM parsing
fragment references remain: to be removed in a future commit

Change-Id: I02974c8a709cfe16dce72568639c8b912859de8e
2015-01-08 20:02:40 -08:00
Urvang Joshi
9e92b6eac6 AnimEncoder API: Optimize single-frame animated images
Try converting them to a non-animated image and pick that one if it's smaller
in size.

Change-Id: Ib97438fd2a95b1bfa9b7526a0938a9d85df33a57
2015-01-08 12:30:46 -08:00
Djordje Pesut
a28c4b363d MIPS: move WORK_AROUND_GCC define to appropriate place
Change-Id: I3055eca57dc4e9d39533a5b8170bbf7af9cd818f
2015-01-08 15:55:41 +01:00
Djordje Pesut
012d2c60fa MIPS: dspr2: added optimization for functions SSEAxB
list of optimized functions: SSE16x16, SSE8x8, SSE16x8, SSE4x4

Change-Id: Ie99e7cdd73b0d4ff855977315a5d0db9ffaa5f04
2015-01-08 13:49:17 +01:00
Djordje Pesut
9241ecf45d MIPS: dspr2: added optimization for function Average
Change-Id: I7ca316bc3f5fbdaf8dcaf9a2d2227a5134bf4f63
2015-01-08 11:46:15 +01:00
pascal massimino
9422211d5f Merge "Tune BackwardReferencesLz77 for low_effort (m=0)." 2015-01-08 00:46:51 -08:00
pascal massimino
df40057b21 Merge "Speedup VP8LGetHistoImageSymbols for low effort (m=0) mode." 2015-01-08 00:46:43 -08:00
Vikas Arora
ea08466d34 Tune BackwardReferencesLz77 for low_effort (m=0).
- Lower the threshold parameters for HashChainFindCopy.

For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at
lower quality range (0.25% drop in compression density) and about 10%
for higher quality range without any drop in the compression density.
Following is the compression stats (before/after) for method = 0:
         Before           After
         bpp/MPs          bpp/MPs
q=0      2.8615/18.000    2.8651/18.631
q=5      2.8615/18.216    2.8650/20.517
q=10     2.8572/18.070    2.8650/21.992
q=15     2.8519/18.371    2.8584/21.747
q=20     2.8454/18.975    2.8515/20.448
q=25     2.8230/8.531     2.8253/9.585
// Compression density remains same for q-range [30-100]
q=30     2.7310/7.706     2.7310/8.028
q=35     2.7253/6.855     2.7253/7.184
q=40     2.7231/6.364     2.7231/6.604
q=45     2.7216/5.844     2.7216/6.223
q=50     2.7196/5.210     2.7196/5.731
q=55     2.7208/4.766     2.7208/4.970
q=60     2.7195/4.495     2.7195/4.602
q=65     2.7185/4.024     2.7185/4.236
q=70     2.7174/3.699     2.7174/3.861
q=75     2.7164/3.449     2.7164/3.605
q=80     2.7161/3.222     2.7161/3.038
q=85     2.7153/2.919     2.7153/2.946
q=90     2.7145/2.766     2.7145/2.771
q=95     2.7124/2.548     2.7124/2.575
q=100    2.6873/2.253     2.6873/2.335

Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1
2015-01-08 00:30:21 -08:00
Vikas Arora
b0b973c39b Speedup VP8LGetHistoImageSymbols for low effort (m=0) mode.
- Do light weight entropy based histogram combine and leave out CPU
  intensive stochastic and greedy heuristics for combining the
  histograms.

For 1000 image PNG corpus (m=0), this change yields speedup of 10% at
lower quality range (1% drop in compression density) and about 5% for
higher quality range (1% drop in compression density). Following is the
compression stats (before/after) for method = 0:
         Before           After
         bpp/MPs          bpp/MPs
q=0      2.8336/16.577    2.8615/18.000
q=5      2.8336/16.504    2.8615/18.216
q=10     2.8293/16.419    2.8572/18.070
q=15     2.8242/17.582    2.8519/18.371
q=20     2.8182/16.131    2.8454/18.975
q=25     2.7924/7.670     2.8230/8.531
q=30     2.7078/6.635     2.7310/7.706
q=35     2.7028/6.203     2.7253/6.855
q=40     2.7005/6.198     2.7231/6.364
q=45     2.6989/5.570     2.7216/5.844
q=50     2.6970/5.087     2.7196/5.210
q=55     2.6963/4.589     2.7208/4.766
q=60     2.6949/4.292     2.7195/4.495
q=65     2.6940/3.970     2.7185/4.024
q=70     2.6929/3.698     2.7174/3.699
q=75     2.6919/3.427     2.7164/3.449
q=80     2.6918/3.106     2.7161/3.222
q=85     2.6909/2.856     2.7153/2.919
q=90     2.6902/2.695     2.7145/2.766
q=95     2.6881/2.499     2.7124/2.548
q=100    2.6873/2.253     2.6873/2.285

Change-Id: I0567945068f8dc7888041e93d872f9def91f50ba
2015-01-08 00:29:57 -08:00
pascal massimino
c6d3292738 argb_sse2: cosmetics
clarify some variable names in PackARGB() + add some comments

Change-Id: I2bb91d6c52dcbcdebe0f92d5f2136c2d7d11af2a
2015-01-08 00:18:54 -08:00
James Zern
67f601cd46 make the 'last_cpuinfo_used' variable names unique
allows the sources to be #include'd in some hackish builds (don't do
that!)

Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-01-07 23:38:53 -08:00
Urvang Joshi
b9489861a3 AnimEncoder API: Init method for default options.
Change-Id: I3ccd7fe782e10c51986b55fc1a515d958ff70752
2015-01-07 14:32:11 -08:00
pascal massimino
856f8ec1fd Merge "AnimEncoder API: Remove AnimEncoderFrameOptions." 2015-01-07 13:39:10 -08:00
pascal massimino
c537514d46 Merge "AnimEncoder API: GenerateCandidates bugfix." 2015-01-07 13:38:24 -08:00