ssse3 is bit #9 in ecx, bit 1 is sse3. this only controls the check for
slow ssse3 and likely had no ill effect.
Change-Id: I84ce73dc480e1cdbd085e37be06f3f402116c201
add a check for cpu-features.h and rework some of the ifdef's around
android + neon. for android builds with cpu-features enabled the
*_neon.c files will still need to be flagged correctly (with e.g.,
.c.neon in Android.mk) to properly build them.
BUG=webp:353
Change-Id: I905ce305af0a204e560b915d8665093a3edaceb9
including the type in the macro doesn't bring much benefit to ordering,
current platforms work with a prefix, this would be insufficient if the
attribute needed to follow the function prototype. this form makes it
easier to override on the command line.
BUG=webp:355
Change-Id: Iba41ec0bb319403054be0e899c4cc472dd932fd9
Half of the functionality was duplicated.
The rest is about the alpha channel handling so we
might as well put it in the appropriate file.
Change-Id: I8d5ef0afce82cc4842ab7132fd97995c42e6140a
Currently, none are available. If WEBP_HAVE_SSE2 eventually works,
we'll have to refine this conditionals.
BUG=webp:261
Change-Id: Ibc63ee1c013f2a4169eeb85cc8b6317b6420c2ad
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.
Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731
Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...
Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
For some reason, gcc has hard time inlining this one...
Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.
Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.
Also: suffix explicitly the functions with _SSE2
Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).
Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
Benchmarks from vrabaud@:
8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged
skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged
images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged
images/lpcb corpus speed: unchanged, corpus size: unchanged
images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged
images/png_doc corpus speed: unchanged, corpus size: unchanged
images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged
images/jpeg_small corpus speed: unchanged, corpus size: unchanged
images/icip_core1 corpus speed: unchanged, corpus size: unchanged
images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged
images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged
images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged
images/gif corpus speed: faster: -1.4 % , corpus size: unchanged
images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged
images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged
images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged
images/all corpus speed: faster: -1.5 % , corpus size: unchanged
images/png_small corpus speed: unchanged, corpus size: unchanged
images/png corpus speed: unchanged, corpus size: unchanged
images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged
Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
Roughly, if both the source and the reference areas are
darker too dark (R/G/B <= ~6), they are ignored.
One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.
Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
* prevent 64bit overflow by controlling the 32b->64b conversions
and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasis the dark
areas
* use a hat-like filter instead of box-filtering to avoid blockiness
during averaging
SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C-version is quite slower because still un-optimized.
Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixels instead of
printing a frame-level global SSIM value.
* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring
SSIM calculation is ~4x faster now.
Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.
Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.
Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
We add the following MSA optimized rescaling functions:
- RescalerExportRowExpand
- RescalerExportRowShrink
Change-Id: Ic1c76065423b02617db94cf0c22bb564219b36e6
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed
Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
The decision is based on the variance between DC values of each
sub-4x4 block. This heuristic is rather ok for predicting whether
the 2nd transform (intra-16) is going to help or not.
The decision threshold varies with quality (=quantization).
It's only used for -m 0 and -m 1, where no full RD-opt is performed.
It actually makes these modes quite faster, with RD curve much
closer to the -m 2 mode.
Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8
We add the following MSA optimized encoder quantization functions:
- QuantizeBlock
- Quantize2Blocks
Change-Id: Ie32b442afa99eee62d2ef48942b41116a4e157d3
- 6/14/2016: version 0.5.1
This is a binary compatible release.
* miscellaneous bug fixes (issues #280, #289)
* reverted alpha plane encoding with color cache for compatibility with
libwebp 0.4.0->0.4.3 (issues #291, #298)
* lossless encoding performance improvements
* memory reduction in both lossless encoding and decoding
* force mux output to be in the extended format (VP8X) when undefined chunks
are present (issue #294)
* gradle, cmake build support
* workaround for compiler bug causing 64-bit decode failures on android
devices using clang-3.8 in the r11c NDK
* various WebPAnimEncoder improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXfb1vAAoJEPnD1r24IytdtbwP/iCCEEU9scepXgh9+ICUOm1D
6ASfz6eTYIPP4s2E+kIJKrKeGUrk7U1j6BeehjKxS3vMQxQlJvkXvepk0mdJUO4C
okttfLahLY6DOZSAETK9SI4haE2Uuz5WGfxMe8x+4uuZZTxSLHqOCFMvU2oxo6uM
rhErJgH3jWE9vGV9OuI8YUa109qGi8PLtErrFjXqFmAvnxJS95kJHr3MHVoulH8g
tXrSUYTq37BCfSsxudhZTCENLhYqlXHO5tydvQVAlVbXJfpOsNLQciWUrqFiPuB9
qhUv3smRV9YBd4XuUgFWLQcbcecQVBzIqxJ7lv41R71vi17Lu4plLjNAc0Cx70qc
cnfe/acH+9hX0EwBzpvOpN/Lzirx1tmBKPOqnSiFpFP48RZSngLMG0mwhUufyq1I
y6T2rEcMLRbAX/85sGMRd1AwffoW6OvgPG2LdhW2bh8u9YbA/g3qGH98z2T1JKjy
V/TNvpTjXAdZ5XQMY8zIunv83Wp/6AWmJIRWZ+mfhw29F/F80HQG2Ss7dulbe3m2
zpBjxdsaLj+9iZpheewrGGImZ5mJQsG7nRovtQ0VARVaRSY3xpaYug2CqXlQQ2bc
bjdmGS9u+a4fHdk+uKTMzJEbu4RbXcOeLrvpzA+PxhUQi9WRyLIucIWeVVEDiUI2
p7OJop9JmPjkRvvqfi5y
=Mchr
-----END PGP SIGNATURE-----
Merge tag 'v0.5.1'
libwebp-0.5.1
- 6/14/2016: version 0.5.1
This is a binary compatible release.
* miscellaneous bug fixes (issues #280, #289)
* reverted alpha plane encoding with color cache for compatibility with
libwebp 0.4.0->0.4.3 (issues #291, #298)
* lossless encoding performance improvements
* memory reduction in both lossless encoding and decoding
* force mux output to be in the extended format (VP8X) when undefined chunks
are present (issue #294)
* gradle, cmake build support
* workaround for compiler bug causing 64-bit decode failures on android
devices using clang-3.8 in the r11c NDK
* various WebPAnimEncoder improvements
* tag 'v0.5.1': (30 commits)
update ChangeLog
Clarify the expected 'config' lifespan in WebPIDecode()
update ChangeLog
Fix corner case in CostManagerInit.
gif2webp: normalize the number of .'s in the help message
vwebp: normalize the number of .'s in the help message
cwebp: normalize the number of .'s in the help message
fix rescaling bug: alpha plane wasn't filled with 0xff
Improve lossless compression.
'our bug tracker' -> 'the bug tracker'
normalize the number of .'s in the help message
pngdec,ReadFunc: throw an error on invalid read
decode.h,WebPGetInfo: normalize function comment
Inline GetResidual for speed.
Speed-up uniform-region processing.
free -> WebPSafeFree()
DecodeImageData(): change the incorrect assert
Fix a boundary case in BackwardReferencesHashChainDistanceOnly.
Make sure to consider small distances in LZ77.
add some asserts to delimit the perimeter of CostManager's operation
...
Change-Id: I44cee79fddd43527062ea9d83be67da42484ebfc
We add the following MSA optimized color transform functions:
- AddGreenToBlueAndRed
- TransformColorInverse
Change-Id: Iceab3813905955aa8b811253df9188512fc7de3f