libwebp

mirror of https://github.com/webmproject/libwebp.git synced 2025-07-15 13:29:54 +02:00

Author	SHA1	Message	Date
James Zern	dc48196bd9	dec_neon: add TM16 over 20M pixels ~78% faster Change-Id: I420d5d590f275f19e08f86df1d1caa6b82fffbde	2015-05-15 12:50:11 -07:00
James Zern	ea95b305ca	dec_neon/TrueMotion: simplify left border load use vld1_dup_u8() rather than a separate ld+dup after the values were zero extended; mildly faster at the function level Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89	2015-05-15 12:48:13 -07:00
Pascal Massimino	f262d6120e	speed-up SetResidualSSE2 (was unnecessarily complicated) Before: VP8SetResidualCoeffs: checksum = 1127918 elapsed = 475 ms. After: VP8SetResidualCoeffs: checksum = 1127918 elapsed = 404 ms. Change-Id: Ia54bef86c45f9f474622ff16e594bf1da4f67ebd	2015-05-14 21:24:24 -07:00
James Zern	bf46d0acff	fix mips2 build target tested with mips1 and mips2; this should cover 3/4 as well. fixes an ftbfs reported on the debian issue tracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785000 Change-Id: I2458487c92bd638589fdfec5adb4f22102a5960c	2015-05-13 10:36:22 -07:00
James Zern	929a0fdccd	enc_sse2/TTransform: simplify abs calculation max(b, 0 - b) works as well as (b ^ sign) - b Change-Id: Iad923236fd70db85ff58a64d3c8e25e4f42a525d	2015-05-08 19:50:29 -07:00
James Zern	17dbd05819	enc_sse2/CollectHistogram: simplify abs calculation max(out, 0 - out) works as well as (out ^ sign) - out Change-Id: Id820ab9b296512cb0d56c8026b986bf98e3d3909	2015-05-08 19:49:08 -07:00
James Zern	a6c1593645	dec_neon: add DC16 intra predictors improvement over 20M pixels: DC16: ~77% DC16NoTop: ~78% DC16NoLeft: ~83% DC16NoTopLeft: ~83% Change-Id: I4c4ee16a8fa0eb466eee45dfa6f6bbce5ce64b99	2015-05-08 00:12:48 -07:00
James Zern	f274a96ce9	dsp/enc_sse2: add luma4 intra predictors VP8EncPredLuma4 improvement over ~20M pixels: ~39% Change-Id: I9cd841250771276d2d1bef3991215a56e83f7f20	2015-05-05 23:51:19 -07:00
James Zern	040b11bdf6	dsp/enc_sse2: add chroma intra predictors VP8EncPredChroma8 improvements over ~20M pixels left/top: ~67% left-only: ~52% top-only: ~57% none: ~61% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76	2015-05-05 23:51:14 -07:00
James Zern	aee021bbb1	dsp/enc_sse2: add luma16 intra predictors VP8EncPredLuma16 improvements over ~20M pixels left/top: ~75% left-only: ~47% top-only: ~59% none: ~63% based on dec_sse2 versions with minor changes to benefit from the linear storage of the left boundary Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d	2015-05-05 23:51:07 -07:00
James Zern	4c9af02326	dec_neon: add DC8uvNoTopLeft ~93% faster Change-Id: Icf0fd5f85ac53c306a1b69d84275023e5b24a602	2015-05-01 20:03:57 -07:00
Pascal Massimino	9287761d95	Merge "GetResidualCostSSE2: simplify abs calculation"	2015-04-30 06:30:58 +00:00
James Zern	0e009366f8	dsp/cpu.c(x86): check maximum supported cpuid feature structured extended feature flags require eax = 7; avoids incorrectly detecting avx2 on some older processors that support avx. for completeness also check for value=1 support used by the other checks. from [1]: INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor Information and the Vendor Identification String [1] http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046	2015-04-29 23:22:53 -07:00
James Zern	b243a4bc30	GetResidualCostSSE2: simplify abs calculation max(coeff, 0 - coeff) works as well as min/max/sub or (coeff ^ sign) - coeff Change-Id: I9b11715372e49cd83820677bf4beba4a1c04931c	2015-04-21 20:29:12 -07:00
Pascal Massimino	b83bd7c4ea	Merge "populate 'libwebpextras' with: import gray, rgb565 and rgb4444 functions"	2015-04-17 15:30:52 -07:00
James Zern	dbba67d1e7	histogram.h: cosmetics: remove unnecessary includes Change-Id: Ia8277d3587534c2a1af05d3df57a6973a68be16d	2015-04-17 12:23:06 -07:00
James Zern	e978fec61a	Merge "VP8LBitReader: fix remaining ubsan error with large shifts"	2015-04-17 00:30:05 -07:00
James Zern	d6fe588469	Merge "ReconstructRow: move some one-time inits out of the main loop"	2015-04-16 14:51:36 -07:00
Pascal Massimino	a21d647c11	ReconstructRow: move some one-time inits out of the main loop + some cosmetics clean-up Change-Id: Ifb34b914844bb7734137bacd61fcfc4a13971665	2015-04-16 14:31:19 -07:00
Pascal Massimino	7a01c3c3ec	VP8LBitReader: fix remaining ubsan error with large shifts * make VP8LPrefetchBits() safe wrt past-EOS reads * set 'BitReader::bits_' to a safe shifting value upon EOS no visible performance difference on x86 Change-Id: I0a4177928cfa81d5dfc9054b36a686eaa1bf8c65	2015-04-16 00:57:42 -07:00
Pascal Massimino	7fa67c9b9e	change GetPixPairHash64() return type to uint32_t Change-Id: Ibb61c1631d7a4bcda5417b5a85864d5e2c3f3858	2015-04-16 00:55:25 -07:00
pascal massimino	ec1fb9f8dd	Merge "dsp/enc.c: cosmetics: move DST() def closer to use"	2015-04-16 00:17:37 -07:00
Pascal Massimino	7073bfb3ee	Merge "split 64-mult hashing into two 32-bit multiplies"	2015-04-15 23:04:47 -07:00
James Zern	0768b252fa	dsp/enc.c: cosmetics: move DST() def closer to use Change-Id: Iccbcf046412426c2893b71eced517f611d2ffc3f	2015-04-15 20:03:39 -07:00
James Zern	6a48b8f003	Merge "fix MSVC size_t->int conversion warning"	2015-04-15 19:54:18 -07:00
James Zern	1db07cdeef	Merge "anim_encode: cosmetics: fix alignment"	2015-04-15 15:32:12 -07:00
James Zern	e28271a394	anim_encode: cosmetics: fix alignment Change-Id: I0a746421f5cceebbbecfb75d11d11ec5d86a1900	2015-04-15 15:03:17 -07:00
Pascal Massimino	7fe357b8c0	split 64-mult hashing into two 32-bit multiplies Speed-wise equivalent on x86 and ARM (maybe a tad faster, hard to tell). Note that the two 32-bit multiplies are not strictly equivalent to the 64-bit one, since we're missing one carry propagation. In practice, no observable difference was seen because of this slightly different hashing result. Change-Id: I8f2381175eae1cb20dabf149e6b27e1768fba6ab	2015-04-15 17:45:19 +02:00
Pascal Massimino	af74c1453b	populate 'libwebpextras' with: import gray, rgb565 and rgb4444 functions update makefile.unix to provide 'make extras' building instructions. note: input ordering depends on WEBP_SWAP_16BIT_CSP for rgb565 and rgb4444 Change-Id: I6f22d32189d9ba2619146a9714cedabfe28e2ad0	2015-04-15 02:54:44 -07:00
Pascal Massimino	6121413415	remove VP8Residual::cost unused field Change-Id: Id494475b05c540b40fd104594acbcaa783b88d77	2015-04-15 01:56:31 -07:00
Pascal Massimino	e25448235a	fix MSVC size_t->int conversion warning use size_t for 'total_frames' and compute average with float arith. Change-Id: Ibf16edb38405b0d525bec38c246cf874668c994e	2015-04-14 23:55:00 -07:00
Urvang Joshi	0ac29c5190	AnimEncoder API: Consistent use of trailing underscores in struct. Change-Id: Ica361eee0059250a6800c6c43264e3bd5e5aa3e0	2015-04-14 15:44:41 -07:00
Urvang Joshi	d484555024	AnimEncoder API: Use timestamp instead of duration as input to Add(). When converting from video sources, the duration of current frame is often unavailable until the next frame. So, we internally convert timestamps to durations. Change-Id: I20ad86361c22e014be7eb91f00d5d40108281351	2015-04-14 12:00:57 -07:00
James Zern	9904e365a8	dsp/dec_sse2: DC8uv / DC8uvNoLeft speedup use psadbw to perform top row summation; left remains in C as repacking it into a vector to apply the same operation is too costly. DC8uv: ~19% faster DC8uvNoLeft: ~12% faster Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2	2015-04-08 23:12:53 -07:00
James Zern	7df2049785	dsp/dec_sse2: DC16 / DC16NoLeft speedup use psadbw to perform top row summation; left remains in C as repacking it into a vector to apply the same operation is too costly. DC16: ~20% faster DC16NoLeft: ~14% faster Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691	2015-04-08 23:10:39 -07:00
James Zern	db12250fd1	cosmetics: vp8enci.h: break long line Change-Id: Ib7c7ef6171506e826ed5f7df20c5644f240fd645	2015-04-06 16:11:02 -07:00
James Zern	b44eda3f60	dsp: add DSP_INIT_STUB generates a stub function when the specific architecture is not enabled, exposing a symbol in the module, avoiding a compiler warning Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147	2015-04-02 23:55:35 -07:00
Pascal Massimino	03e76e962e	clarify the comment about double-setting the status in SetError() Change-Id: I67107220b7a84459592c726dab95483acd4f59f2	2015-04-01 15:27:55 -07:00
Pascal Massimino	9fecdd713e	remove unused EmitRGB() Change-Id: If4d3d775b051206abdab8c603cd3887e9f25d102	2015-04-01 15:27:55 -07:00
Pascal Massimino	43f010dd6d	move ReconstructRow to top (one less TODO) Change-Id: Iaf36d28ab10633faaaa25f2c37ac799747456adc	2015-04-01 15:27:36 -07:00
Pascal Massimino	82d980209b	add a dec/common.h header to collect common enc/dec #defines had to rename a few structs. -> we can now include both vp8i.h and vp8enci.h without naming conflicts. Change-Id: Ib41b498f1b57aab3d6b796361afc45210ec75174	2015-03-31 22:17:58 -07:00
pascal massimino	5d4744a253	Merge "enc_sse41: add Disto4x4 / Disto16x16"	2015-03-27 01:12:45 -07:00
Urvang Joshi	e38886a771	mux.h: Bump up ABI version This was not bumped up after some recent changes; e.g. WebPAnimEncoderOptionsInit() method. Change-Id: Ia473b83ddd7a3d8c227d8eeb126809a97e327475	2015-03-26 23:52:22 -07:00
James Zern	1a338fb306	enc_sse41: add Disto4x4 / Disto16x16 direct translation from sse2; minor gain, fewer instructions Change-Id: I60288a842fac1a686b82b5cab637931789fe29f2	2015-03-25 23:28:46 -07:00
Pascal Massimino	94055503e3	encoding SSE4.1 stub for StoreHistogram + Quantize + SSE_16xN Visible speed-up, thanks to pshufb and pabsw and psignw use. had to tweak configure.ac to make "smmintrin.h" presence correctly detected (we need to set the CPPFLAGS instead of the CFLAGS!) Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080	2015-03-25 20:23:51 -07:00
Pascal Massimino	c64659e1b4	remove duplicate variables after the lossless{_enc}.c split clang was giving "duplicate symbols" error messages at link time. Change-Id: I2b77b55222fe033cc1d4636567902e80d814aab6	2015-03-25 11:10:21 +01:00
James Zern	67ba7c7acc	enc_sse2: call local FTransform in CollectHistogram allows the former to be inlined; negligible speed-up in most cases, however this structure is consistent with the rest of the optimized modules Change-Id: Ib080240b06f7a995b47f1906627850c355b82901	2015-03-24 20:22:24 -07:00
James Zern	182497993b	dsp: s/VP8LSetHistogramData/VP8SetHistogramData/ this function is for lossy encoding; the VP8L prefix is used by lossless Change-Id: I147590a91477a77af51ed79cc640546dfe53abdb	2015-03-24 18:27:41 -07:00
James Zern	ede5e1584c	cosmetics: dsp/lossless.h: reorder prototypes group decoding / encoding functions together, followed by their respective Init() function. Change-Id: Ib4d22f8ec2369efec752faf733ecf53acc67b1ca	2015-03-24 17:52:42 -07:00
James Zern	553051f741	dsp/lossless: split enc/dec functions adds lossless_enc*.c; reduces the size of the decode-only so: ~78K w/gcc-4.8.2 on x86_64. Change-Id: If5e4610b67d05eba5896bc64bab79e9df92b2092	2015-03-23 22:57:50 -07:00
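
Note on the "simplify abs calculation" commits (929a0fdccd, 17dbd05819, b243a4bc30): the messages state that max(x, 0 - x) gives the same absolute value as the sign-mask form. The scalar C sketch below only illustrates that equivalence for 16-bit values; it is not the SSE2 code from those commits, and the function names are hypothetical. It assumes the sign-mask form in its usual spelling, (x ^ sign) - sign, and skips -32768, whose absolute value does not fit in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-mask form: sign is 0 for x >= 0 and -1 (all ones) for x < 0. */
static int16_t AbsSignMask(int16_t x) {
  const int16_t sign = (int16_t)-(x < 0);
  return (int16_t)((x ^ sign) - sign);
}

/* Max form: one subtraction and one max, no sign mask required. */
static int16_t AbsMax(int16_t x) {
  const int16_t neg = (int16_t)(0 - x);
  return (x > neg) ? x : neg;
}

/* Asserts that both forms agree for every value in [-32767, 32767]. */
static void CheckAbsFormsAgree(void) {
  int v;
  for (v = -32767; v <= 32767; ++v) {
    assert(AbsSignMask((int16_t)v) == AbsMax((int16_t)v));
  }
}
```

Calling CheckAbsFormsAgree() from a test harness exercises the whole range. In a vectorized version the max form would presumably save building the sign mask, which is the simplification the commit messages describe.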

1865 Commits