64 Commits

Author SHA1 Message Date
skal
9c4ce971a8 Simplify forward-WHT + SSE2 version
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.

also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )

Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
2013-04-26 08:57:18 +02:00
Pascal Massimino
3c8eb9a806 fix bad saturation order in QuantizeBlock
Saturation was done on input coeff, not quantized one.

This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.

addresses issue #145

Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
2013-03-25 14:53:29 -07:00
James Zern
be7c96b069 cosmetics: break a few long lines
Change-Id: I785763b974b4e7664ad8e9884251aa2d5274b456
2013-01-23 14:50:19 -08:00
skal
d5838cd598 faster non-transposing SSE2 4x4 FTransform
1-2% faster.
uses pmaddwd instead of transpose + pmullw.
Can possibly be simplified further.

Change-Id: I420e148816c4c6ab5e2080c9b1719dbbe6762d4e
2012-11-27 08:38:24 +01:00
skal
42c3b550ba simplify the fwd transform
-> remove two shifts

Change-Id: Ibc55bca98588da30553a7870224ffd0e13d57f52
2012-11-15 09:51:35 +01:00
skal
118cb31270 Merge "add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case" 2012-11-15 00:07:44 -08:00
skal
e5c3b3f554 Simplify the texture evaluation Disto4x4()
We don't need to use the exact forward transform,
since it's only a rough evaluation.
-> Removed some shifts and rounding constants.

Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33
2012-11-14 07:49:31 +01:00
skal
35bfd4c08f add SSE2 version of Sum of Square error for 16x16, 16x8 and 8x8 case
+ replace mm_set1_ps(0) by _mm_setzero_si128()

Change-Id: I4601033c27466532373f5dabfaf349ce5e5039da
2012-11-14 06:16:49 +01:00
skal
5725cabac0 new segmentation algorithm
fixes the 'blocky sky problem' (saturation problem: when luma was flat,
chroma noise was taking over, resulting in random segment id assigned.
When just using a common uniform segment was better).

+ side clean-up and readibility/experimentability MACRO'ization
+ added '-map 7' option

Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71
2012-09-04 23:09:15 +02:00
Pascal Massimino
7c6e60f4bd make *InitSSE2() functions be empty on non-SSE2 platform
this avoids the '*.o has no symbols' warning messages

Change-Id: Idbaa02f5c2f7c632997a26f9507926922d191b6e
2012-08-27 23:40:47 -07:00
James Zern
80256b8567 enc_sse2 add missing stdlib.h include
lost in fbd82b5; most platforms were getting it indirectly through
emmintrin.h.

Change-Id: I310f8bc8e82d63cfbde74c34cd21b72514a16a01
2012-04-19 15:47:58 -07:00
James Zern
ad1e163a0d cosmetics: normalize copyright headers
Change-Id: I5e2462b101e0447a4f15a1455c07131bc97a52dd
2012-01-06 14:49:06 -08:00
James Zern
f06817aaea simplify checks for enabling SSE2 code
also fixes build issues under vs11 which has a native arm compiler for
windows 8 targets

Change-Id: Id76c2deae9fc9de147d13ad0d34edffcb5a726c4
2011-12-20 17:41:55 -08:00
Pascal Massimino
e06ac0887f create a separate libwebpdsp under src/dsp
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.

+ opportunistic clean-up:
  * remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
  * make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
  * change OUT macro to DST
2011-09-13 12:29:44 -07:00