This case is symmetric to what we did for CheckTx calls, where we release the
mempool mutex to ensure callbacks can fire during call setup. We also need
this behaviour for application flush, for the same reason: The caller holds the
lock by contract from the Mempool interface.
The way this was originally structured, we reacquired the lock after issuing
the initial ABCI CheckTx call, only to immediately release it. Restructure the
code so that this redundant acquire is no longer necessary.
I think the decision in #8806 is that we shouldn't do this yet, so I think it's best to just drop this.
(cherry picked from commit 636320f901)
Co-authored-by: Sam Kleinman <garen@tychoish.com>
The primary effect of this change is to simplify the implementation of the
priority mempool to eliminate an unbounded heap growth observed by Vega team
when it was enabled in their testnet. It updates and fixes#8775.
The main body of this change is to remove the auxiliary indexing structures,
and use only the concurrent list structure (the same as the legacy mempool) to
maintain both gossip order and priority.
This means that operations that require priority information, such as block
updates and insert-time evictions, require a linear scan over the mempool.
This tradeoff greatly simplifies the code and eliminates the long-term heap
load, at the cost of some extra CPU and short-lived working memory during
CheckTx and Update calls.
Rough benchmark results:
- This PR:
BenchmarkTxMempool_CheckTx-10 486373 2271 ns/op
- Original priority mempool implementation:
BenchmarkTxMempool_CheckTx-10 500302 2113 ns/op
- Legacy (v0) mempool:
BenchmarkCheckTx-10 364591 3571 ns/op
These benchmarks are not a good proxy for production load, but at least suggest
that the overhead of the implementation changes are not cause for concern.
In addition:
- Rework synchronization so that access to shared data structures is safe.
Previously shared locks were used to exclude block updates during calls that
update mempool state. Now access is properly exclusive where necessary.
- Fix a bug in the recheck flow, where priority updates from the application
were not correctly reflected in the index structures.
- Eliminate the need for separate recheck cursors during block update. This
avoids the need to explicitly invalidate elements of the concurrent list,
which averts the dependency cycle that led to objects being pinned.
- Clean up, clarify, and fix inaccuracies in documentation comments throughout
the package.
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
This test was made flakey by #8839. The cooldown period means that the node in the test will not try to reconnect as quickly as the test expects. This change makes the cooldown shorter in the test so that the node quickly reconnects.
(cherry picked from commit 5274f80de4)
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: Sam Kleinman <garen@tychoish.com>
* regenerate mocks using newer style
* p2p: set empty timeouts to small values. (#8847)
These timeouts default to 'do not time out' if they are not set. This times up resources, potentially indefinitely. If node on the other side of the the handshake is up but unresponsive, the[ handshake call](edec79448a/internal/p2p/router.go (L720)) will _never_ return.
* fix light client select statement
When testing rollback feature in the Cosmos SDK, we found that the app hash
in Tendermint after rollback was the value after the latest block, rather than
before it.
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
(cherry picked from commit 8a238fdcb4)
Co-authored-by: yihuang <huang@crypto.com>