* prevent 64-bit overflow by controlling the 32b->64b conversions
and pre-emptively descaling by 8 bits before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasize the dark
areas
* use a hat-like filter instead of box-filtering to avoid blockiness
during averaging
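
A minimal sketch of the three points above, not the code from this change.
The 7x7 window, the triangular taps, the C1/C2 values and the names
HatStats, HatAccumulate and HatSSIM are all assumptions made for
illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HAT_RADIUS 3                     /* assumed window radius */
#define HAT_SIZE (2 * HAT_RADIUS + 1)    /* 7x7 window */

/* Triangular ("hat") taps; a box filter would use all ones instead. */
static const uint32_t kHat[HAT_SIZE] = { 1, 2, 3, 4, 3, 2, 1 };

typedef struct {
  uint32_t w;               /* sum of weights (256 for these taps) */
  uint32_t xm, ym;          /* weighted sums of a and b */
  uint32_t xxm, xym, yym;   /* weighted sums of a*a, a*b and b*b */
} HatStats;

static void HatAccumulate(const uint8_t* a, int a_stride,
                          const uint8_t* b, int b_stride, HatStats* s) {
  int x, y;
  s->w = s->xm = s->ym = s->xxm = s->xym = s->yym = 0;
  for (y = 0; y < HAT_SIZE; ++y) {
    for (x = 0; x < HAT_SIZE; ++x) {
      const uint32_t w = kHat[x] * kHat[y];   /* separable 2-D weight */
      const uint32_t va = a[x + y * a_stride];
      const uint32_t vb = b[x + y * b_stride];
      s->w += w;
      s->xm += w * va;
      s->ym += w * vb;
      s->xxm += w * va * va;   /* at most 256 * 255 * 255: fits in 32 bits */
      s->xym += w * va * vb;
      s->yym += w * vb * vb;
    }
  }
}

static double HatSSIM(const uint8_t* a, int a_stride,
                      const uint8_t* b, int b_stride) {
  HatStats s;
  assert(a != NULL && b != NULL);
  HatAccumulate(a, a_stride, b, b_stride, &s);
  {
    /* Widen to 64 bits *before* each multiply, not after. */
    const uint64_t N = s.w;
    const uint64_t C1 = 20 * N * N;   /* hypothetical constant */
    const uint64_t C2 = 60 * N * N;   /* hypothetical constant */
    const uint64_t xmxm = (uint64_t)s.xm * s.xm;
    const uint64_t ymym = (uint64_t)s.ym * s.ym;
    const uint64_t xmym = (uint64_t)s.xm * s.ym;
    const int64_t sxy = (int64_t)((uint64_t)s.xym * N) - (int64_t)xmym;
    const uint64_t sxx = (uint64_t)s.xxm * N - xmxm;
    const uint64_t syy = (uint64_t)s.yym * N - ymym;
    /* Negative covariance is clamped to 0 (a simplification of this sketch).
     * Both structure terms are descaled by 8 bits: without the shift, a
     * ~2^33 luminance term times a ~2^31 structure term could wrap. */
    const uint64_t num_s = (2 * (uint64_t)(sxy < 0 ? 0 : sxy) + C2) >> 8;
    const uint64_t den_s = (sxx + syy + C2) >> 8;
    const uint64_t fnum = (2 * xmym + C1) * num_s;
    const uint64_t fden = (xmxm + ymym + C1) * den_s;
    return (double)fnum / (double)fden;
  }
}
```

The same shift is applied to numerator and denominator, so the ratio is
unaffected apart from rounding. Larger C1/C2 values pull the score toward
1.0 where the local mean and variance are small, which is how the constants
can de-emphasize dark areas.
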
SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C version is noticeably slower because it is still unoptimized.
Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
Using -ssim -o triggers SSIM map calculation instead of the add-diff map.
-gray converts the error map to intensity instead of keeping each channel's error separate.
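
One way to picture the -gray conversion, as a sketch only: the helper name
ErrorMapToGray, the packed 3-byte RGB layout and the Rec. 601 luma weights
are assumptions, not necessarily what the tool does.

```c
#include <stddef.h>
#include <stdint.h>

/* Collapses per-channel error values into a single intensity per pixel.
 * Assumes a packed buffer of 3 bytes (R, G, B error) per pixel. */
static void ErrorMapToGray(uint8_t* rgb, size_t num_pixels) {
  size_t i;
  for (i = 0; i < num_pixels; ++i) {
    uint8_t* const p = rgb + 3 * i;
    /* Rec. 601 weights (0.299, 0.587, 0.114) in 16-bit fixed point;
     * the three factors sum to exactly 65536. */
    const uint32_t y =
        (19595u * p[0] + 38470u * p[1] + 7471u * p[2] + (1u << 15)) >> 16;
    p[0] = p[1] = p[2] = (uint8_t)y;
  }
}
```
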
Change-Id: I4bdb88880a252e5562aa4e0e3c2353ad93aef20e