tendermint

mirror of https://github.com/tendermint/tendermint.git synced 2026-06-10 00:03:04 +00:00

Author	SHA1	Message	Date
William Banfield	70624e8d27	more lock instrumentation	2022-08-03 18:20:12 -04:00
William Banfield	6b16cf6d68	Revert "update stats queue to be smaller" This reverts commit `d176124aa0`.	2022-08-03 18:15:46 -04:00
William Banfield	d392a07b99	no peer status	2022-08-03 18:02:11 -04:00
William Banfield	d176124aa0	update stats queue to be smaller	2022-08-03 17:27:04 -04:00
William Banfield	705316442a	More metrics	2022-08-03 16:58:12 -04:00
William Banfield	83dea898fb	add metrics	2022-08-03 16:45:44 -04:00
William Banfield	c764cebbe7	add unlock	2022-08-03 16:20:06 -04:00
William Banfield	f859f5ef6e	add intermediate lock log	2022-08-03 13:51:39 -04:00
William Banfield	92a8e74fdf	add more logs	2022-08-03 11:26:08 -04:00
William Banfield	5d2593c6ee	add lock logs	2022-07-29 15:49:08 -04:00
mergify[bot]	0d2bf39c23	indexer: work around indexing problem for duplicate transactions (forward port: #8625 ) (#8950 )	2022-07-21 19:33:08 +02:00
Callum Waters	3e96a376b0	spec: merge v0.35 spec into tendermint (#9018 )	2022-07-20 12:37:46 +02:00
M. J. Fromberger	22ed610083	mempool: rework lock discipline to mitigate callback deadlocks (#9030 ) The priority mempool has a stricter synchronization requirement than the legacy mempool. Under sufficiently-heavy load, exclusive access can lead to deadlocks when processing a large batch of transaction rechecks through an out-of-process application using the socket client. By design, a socket client stalls when its send buffer fills, during which time it holds a lock shared with the receive thread. While blocked in this state, a response read by the receive thread waits for the shared lock so the callback can be invoked. If we're lucky, the server will then read the next request and make enough room in the buffer for the sender to proceed. If not however (e.g., if the next request is bigger than the one just consumed), the receive thread is blocked: It is waiting on the lock and cannot read a response. Once the server's output buffer fills, the system deadlocks. This can happen with any sufficiently-busy workload, but is more likely during a large recheck in the v1 mempool, where the callbacks need exclusive access to mempool state. As a workaround, process rechecks for the priority mempool in their own goroutines outside the mempool mutex. Responses still head-of-line block, but will no longer get pushback due to contention on the mempool itself.	2022-07-19 13:28:46 -07:00
mergify[bot]	6b18dfcea1	Extract a library from the confix command-line tool. (backport #9012 ) (#9025 ) (cherry picked from commit `18b5a500da`) Pull out the library functionality from scripts/confix and move it to internal/libs/confix. Replace scripts/confix with a simple stub that has the same command-line API, but uses the library instead. Related: - Move and update unit tests. - Move scripts/confix/condiff to scripts/condiff. - Update test data for v34, v35, and v36. - Update reference diffs. - Update testdata README. Co-authored-by: M. J. Fromberger <fromberger@interchain.io>	2022-07-15 08:46:28 -07:00
M. J. Fromberger	b94470a6a4	mempool: ensure evicted transactions are removed from the cache (#9000 ) In the original implementation transactions evicted for priority were also removed from the cache. In addition, remove expired transactions from the cache. Related: - Add Has method to cache implementations. - Update tests to exercise this condition.	2022-07-14 06:51:54 -07:00
M. J. Fromberger	3790968156	mempool: release lock during app connection flush (#8984 ) This case is symmetric to what we did for CheckTx calls, where we release the mempool mutex to ensure callbacks can fire during call setup. We also need this behaviour for application flush, for the same reason: The caller holds the lock by contract from the Mempool interface.	2022-07-12 10:28:51 -07:00
M. J. Fromberger	9e64c95e56	mempool: reduce lock contention during CheckTx (cleanup) (#8983 ) The way this was originally structured, we reacquired the lock after issuing the initial ABCI CheckTx call, only to immediately release it. Restructure the code so that this redundant acquire is no longer necessary.	2022-07-12 08:00:29 -07:00
M. J. Fromberger	cb93d3b587	mempool: don't log message type mismatch in the default callback (#8969 )	2022-07-11 18:06:49 -07:00
M. J. Fromberger	f98de20f7e	p2p: ensure closed channels stop receiving service (#8979 ) Once these channels are closed, we should not continue to service them, as they will never again deliver nonzero values.	2022-07-11 16:34:05 -07:00
M. J. Fromberger	451e697331	Update generated mocks after upgrade of Mockery v2. (#8973 )	2022-07-11 09:18:36 -04:00
mergify[bot]	e3292a48e3	p2p: simpler priority queue (backport #8929 ) (#8956 )	2022-07-08 13:29:42 -04:00
mergify[bot]	1daf7b939d	p2p: make peer gossiping coinflip safer (#8949 ) (#8963 ) Closes #8948 (cherry picked from commit `61ce384d75`) Co-authored-by: Sam Kleinman <garen@tychoish.com>	2022-07-08 12:32:12 -04:00
mergify[bot]	156c305b08	p2p: delete cruft (#8958 ) (#8959 ) I think the decision in #8806 is that we shouldn't do this yet, so I think it's best to just drop this. (cherry picked from commit `636320f901`) Co-authored-by: Sam Kleinman <garen@tychoish.com>	2022-07-08 09:59:57 -04:00
M. J. Fromberger	bc49f66c35	Add more unit tests for the priority mempool. (#8961 ) - Add a test for time-based (TTL) expiration. - Add tests for eviction based on size and priority.	2022-07-07 14:56:34 -07:00
M. J. Fromberger	9b02094827	Fix unbounded heap growth in the priority mempool. (#8944 ) The primary effect of this change is to simplify the implementation of the priority mempool to eliminate an unbounded heap growth observed by Vega team when it was enabled in their testnet. It updates and fixes #8775. The main body of this change is to remove the auxiliary indexing structures, and use only the concurrent list structure (the same as the legacy mempool) to maintain both gossip order and priority. This means that operations that require priority information, such as block updates and insert-time evictions, require a linear scan over the mempool. This tradeoff greatly simplifies the code and eliminates the long-term heap load, at the cost of some extra CPU and short-lived working memory during CheckTx and Update calls. Rough benchmark results: - This PR: BenchmarkTxMempool_CheckTx-10 486373 2271 ns/op - Original priority mempool implementation: BenchmarkTxMempool_CheckTx-10 500302 2113 ns/op - Legacy (v0) mempool: BenchmarkCheckTx-10 364591 3571 ns/op These benchmarks are not a good proxy for production load, but at least suggest that the overhead of the implementation changes are not cause for concern. In addition: - Rework synchronization so that access to shared data structures is safe. Previously shared locks were used to exclude block updates during calls that update mempool state. Now access is properly exclusive where necessary. - Fix a bug in the recheck flow, where priority updates from the application were not correctly reflected in the index structures. - Eliminate the need for separate recheck cursors during block update. This avoids the need to explicitly invalidate elements of the concurrent list, which averts the dependency cycle that led to objects being pinned. - Clean up, clarify, and fix inaccuracies in documentation comments throughout the package. Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>	2022-07-07 07:15:08 -07:00
William Banfield	da83edc588	p2p: return from conn send on stopped mconn (#8904 ) Co-authored-by: Sam Kleinman <garen@tychoish.com>	2022-07-06 10:41:55 -04:00
mergify[bot]	047d7c927b	p2p: fix flakey test due to disconnect cooldown (#8917 ) (#8918 ) This test was made flakey by #8839. The cooldown period means that the node in the test will not try to reconnect as quickly as the test expects. This change makes the cooldown shorter in the test so that the node quickly reconnects. (cherry picked from commit `5274f80de4`) Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> Co-authored-by: Sam Kleinman <garen@tychoish.com>	2022-07-05 19:11:38 -04:00
mergify[bot]	49788adde5	p2p: use correct context error (#8916 ) (#8920 ) handshakeCtx is the internal context carrying the timeout. Its error should be used for the error return. (cherry picked from commit `921530c352`) Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> Co-authored-by: Sam Kleinman <garen@tychoish.com> Co-authored-by: Callum Waters <cmwaters19@gmail.com>	2022-07-05 13:36:26 -04:00
William Banfield	978f754ad3	p2p: set empty timeouts to configed values. (manual backport of #8847 ) (#8869 ) * regenerate mocks using newer style * p2p: set empty timeouts to small values. (#8847) These timeouts default to 'do not time out' if they are not set. This ties up resources, potentially indefinitely. If the node on the other side of the handshake is up but unresponsive, the [handshake call](https://github.com/tendermint/tendermint/blob/edec79448aa1d62b84683b1b22e12e145dbdda7c/internal/p2p/router.go#L720) will _never_ return. * fix light client select statement	2022-06-28 16:07:15 -04:00
mergify[bot]	c4ef566071	p2p: remove dial sleep and provide disconnect cooldown (backport #8839 ) (#8875 ) (cherry picked from commit `52b6dc19ba`)	2022-06-27 10:49:51 -04:00
mergify[bot]	826f224c2d	p2p: add eviction metrics and cleanup dialing error handling (backport #8819 ) (#8820 )	2022-06-24 10:42:58 -04:00
mergify[bot]	6f4ef72964	p2p: track peers by address (#8841 ) (#8855 ) (cherry picked from commit `436a38f876`) Co-authored-by: Sam Kleinman <garen@tychoish.com>	2022-06-23 13:21:46 -04:00
mergify[bot]	24701cd587	p2p: more dial routines (#8827 ) (#8828 )	2022-06-21 21:27:28 -04:00
William Banfield	e9c87a3c49	remove dial wake change (#8824 )	2022-06-21 20:20:04 -04:00
Callum Waters	4322f7d0b9	mempool: make error throwing for CheckTx consistent (#8817 )	2022-06-21 18:51:50 +02:00
Sam Kleinman	83526cacbc	p2p: peer store and dialing changes (0.35.x backport) (#8740 ) * p2p: peer store and dialing changes (cherry picked from commit `9dbb135152`) * reduce persistent peer max (cherry picked from commit `b213a2766f`) * don't gossip inactive peers (cherry picked from commit `cc28ce298f`) * fix small case (cherry picked from commit `56a91642dc`) * fix error message (cherry picked from commit `86db59f53b`) * remove seed flag (cherry picked from commit `000aa05485`) * reduce logging level (cherry picked from commit `4e2bc8f51e`) * make const (cherry picked from commit `e3068b50b2`) * update comment (cherry picked from commit `31bd396c88`) * cleanup (cherry picked from commit `eddb23b5af`) * oops * overflows (cherry picked from commit `4c8651026a`) * Update internal/p2p/peermanager.go Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com> (cherry picked from commit `f23f6e1089`) * Update internal/p2p/peermanager.go Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com> (cherry picked from commit `1c02758eaf`) * comment (cherry picked from commit `9f604fd2ef`) * test: new scoring (cherry picked from commit `930fd7f2be`) * fix scoring test (cherry picked from commit `9abc55f3a0`) * cleanup peer manager * fix panic * add metrics * fix compile * fix test * default metrics to noop * noop metrics * update metrics (cherry picked from commit `720600ef62`) * rename metrics * actually shuffle peers more * fix up advertise (cherry picked from commit `8195c97590`) * add max dialing attempts * connection tracking * comments mostly (cherry picked from commit `053ecd9b8c`) * Apply suggestions from code review Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com> * comments * fix lint * cr feedback * fixup cherrypick * make wb happy * more comments * fixup * fix lint * iota fix * add skip * cleanup * remove comment * fix rand * fix rand * use numaddresses correctly * advertise fixes * remove some things * cleanup comment * more fixes * toml * fix comment * fix spell * dec limit * fixes * up the attmept max * cr feedback * probablistic test * fix spell * add metrics for peers stored on startup * p2p: peer score should not wrap around (#8790) (cherry picked from commit `4d820ff4f5`) # Conflicts: # internal/p2p/peermanager.go * fix * wake more * wake if we need to Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	2022-06-20 13:13:21 -04:00
mergify[bot]	74c6d8100d	p2p: fix typo (#8793 ) (#8794 )	2022-06-19 11:52:43 -07:00
mergify[bot]	ce8284c027	p2p: accept should not abort on first error (backport #8759 ) (#8760 )	2022-06-15 07:56:15 -04:00
Callum Waters	28c38522e0	do not log an error for duplicate txs (#8732 )	2022-06-10 11:56:00 +02:00
mergify[bot]	af0590a819	consensus: switch timeout message to be debug and clarify meaning (#8694 ) (#8696 ) (cherry picked from commit `75a12ea0c6`) Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com> Co-authored-by: Sam Kleinman <garen@tychoish.com> Co-authored-by: Callum Waters <cmwaters19@gmail.com>	2022-06-09 09:45:58 -04:00
mergify[bot]	0e3a3fe58b	p2p: shed peers from store from other networks (backport #8678 ) (#8681 )	2022-06-02 12:15:55 -04:00
Callum Waters	e8ac37223f	pex: align max address thresholds (#8657 )	2022-05-31 14:07:25 -04:00
Sam Kleinman	a889f17e51	consensus: restructure peer catchup sleep (#8651 )	2022-05-31 11:31:51 -04:00
mergify[bot]	4ee91663da	p2p: reduce ability of SendError to disconnect peers (backport #8597 ) (#8603 )	2022-05-25 04:12:43 -04:00
mergify[bot]	2f8483aa85	p2p: remove unused get height methods (backport #8569 ) (#8571 )	2022-05-17 11:32:13 -04:00
mergify[bot]	12fed0ed53	blocksync: validate block before persisting it (backport #8493 ) (#8496 )	2022-05-12 10:36:48 +02:00
Sam Kleinman	bdd59c892c	statesync: avoid potential race (#8494 )	2022-05-11 15:09:41 -04:00
mergify[bot]	14f0d60f24	p2p: fix setting in con-tracker (#8370 ) (#8371 ) (cherry picked from commit `889341152a`)	2022-04-19 23:32:54 -07:00
mergify[bot]	04c1f76569	rpc: avoid leaking threads during checktx (backport #8328 ) (#8333 )	2022-04-17 09:17:03 -04:00
Ethan Reesor	226bc94c5f	node: always close database engine (#7113 ) (#8330 )	2022-04-15 14:37:34 -07:00
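
Commit f98de20f7e notes that a closed channel should no longer be serviced, because it will only ever deliver zero values. A minimal sketch of the usual Go idiom for this follows. It is not the code from that commit: `mergeUntilClosed` and its parameters are hypothetical names chosen for illustration. The idea is to set the channel variable to nil once a receive reports it closed, since a nil channel is never ready in a `select`.

```go
package main

import "fmt"

// mergeUntilClosed is a hypothetical helper (not from the tendermint source).
// It receives from a and b until both are closed and returns every value seen.
func mergeUntilClosed(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				// A closed channel is always ready and yields zero values.
				// Setting the variable to nil disables this case for good.
				a = nil
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a := make(chan int, 2)
	b := make(chan int, 2)
	a <- 1
	a <- 2
	close(a)
	b <- 3
	close(b)
	// Three values were sent in total; their relative order across a and b
	// is not deterministic, so only the count is printed.
	fmt.Println(len(mergeUntilClosed(a, b)))
}
```

Without the nil assignment, the loop would keep selecting the closed channel and appending zeros, which is the kind of wasted servicing the commit message describes.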


197 Commits