The priority mempool has a stricter synchronization requirement than the legacy
mempool. Under sufficiently-heavy load, exclusive access can lead to deadlocks
when processing a large batch of transaction rechecks through an out-of-process
application using the socket client.
By design, a socket client stalls when its send buffer fills, during which time
it holds a lock shared with the receive thread. While blocked in this state, a
response read by the receive thread waits for the shared lock so the callback
can be invoked.
If we're lucky, the server will then read the next request and make enough room
in the buffer for the sender to proceed. If not however (e.g., if the next
request is bigger than the one just consumed), the receive thread is blocked:
It is waiting on the lock and cannot read a response. Once the server's output
buffer fills, the system deadlocks.
This can happen with any sufficiently-busy workload, but is more likely during
a large recheck in the v1 mempool, where the callbacks need exclusive access to
mempool state. As a workaround, process rechecks for the priority mempool in
their own goroutines outside the mempool mutex. Responses still head-of-line
block, but will no longer get pushback due to contention on the mempool itself.
In the original implementation transactions evicted for priority were also
removed from the cache. In addition, remove expired transactions from
the cache.
Related:
- Add Has method to cache implementations.
- Update tests to exercise this condition.
This case is symmetric to what we did for CheckTx calls, where we release the
mempool mutex to ensure callbacks can fire during call setup. We also need
this behaviour for application flush, for the same reason: The caller holds the
lock by contract from the Mempool interface.
The way this was originally structured, we reacquired the lock after issuing
the initial ABCI CheckTx call, only to immediately release it. Restructure the
code so that this redundant acquire is no longer necessary.
The primary effect of this change is to simplify the implementation of the
priority mempool to eliminate an unbounded heap growth observed by Vega team
when it was enabled in their testnet. It updates and fixes#8775.
The main body of this change is to remove the auxiliary indexing structures,
and use only the concurrent list structure (the same as the legacy mempool) to
maintain both gossip order and priority.
This means that operations that require priority information, such as block
updates and insert-time evictions, require a linear scan over the mempool.
This tradeoff greatly simplifies the code and eliminates the long-term heap
load, at the cost of some extra CPU and short-lived working memory during
CheckTx and Update calls.
Rough benchmark results:
- This PR:
BenchmarkTxMempool_CheckTx-10 486373 2271 ns/op
- Original priority mempool implementation:
BenchmarkTxMempool_CheckTx-10 500302 2113 ns/op
- Legacy (v0) mempool:
BenchmarkCheckTx-10 364591 3571 ns/op
These benchmarks are not a good proxy for production load, but at least suggest
that the overhead of the implementation changes are not cause for concern.
In addition:
- Rework synchronization so that access to shared data structures is safe.
Previously shared locks were used to exclude block updates during calls that
update mempool state. Now access is properly exclusive where necessary.
- Fix a bug in the recheck flow, where priority updates from the application
were not correctly reflected in the index structures.
- Eliminate the need for separate recheck cursors during block update. This
avoids the need to explicitly invalidate elements of the concurrent list,
which averts the dependency cycle that led to objects being pinned.
- Clean up, clarify, and fix inaccuracies in documentation comments throughout
the package.
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
This pull request fixes a panic that exists in both mempools. The panic occurs when the ABCI client misses a response from the ABCI application. This happen when the ABCI client drops the request as a result of a full client queue. The fix here was to loop through the ordered list of recheck-tx in the callback until one matches the currently observed recheck request.
(cherry picked from commit b0130c88fb)
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Addresses one of the concerns with #7041.
Provides a mechanism (via the RPC interface) to delete a single transaction, described by its hash, from the mempool. The method returns an error if the transaction cannot be found. Once the transaction is removed it remains in the cache and cannot be resubmitted until the cache is cleared or it expires from the cache.
(cherry picked from commit 851d2e3bde)
Co-authored-by: Sam Kleinman <garen@tychoish.com>
The code in the Tendermint repository makes heavy use of import aliasing.
This is made necessary by our extensive reuse of common base package names, and
by repetition of similar names across different subdirectories.
Unfortunately we have not been very consistent about which packages we alias in
various circumstances, and the aliases we use vary. In the spirit of the advice
in the style guide and https://github.com/golang/go/wiki/CodeReviewComments#imports,
his change makes an effort to clean up and normalize import aliasing.
This change makes no API or behavioral changes. It is a pure cleanup intended
o help make the code more readable to developers (including myself) trying to
understand what is being imported where.
Only unexported names have been modified, and the changes were generated and
applied mechanically with gofmt -r and comby, respecting the lexical and
syntactic rules of Go. Even so, I did not fix every inconsistency. Where the
changes would be too disruptive, I left it alone.
The principles I followed in this cleanup are:
- Remove aliases that restate the package name.
- Remove aliases where the base package name is unambiguous.
- Move overly-terse abbreviations from the import to the usage site.
- Fix lexical issues (remove underscores, remove capitalization).
- Fix import groupings to more closely match the style guide.
- Group blank (side-effecting) imports and ensure they are commented.
- Add aliases to multiple imports with the same base package name.
This changes adds a failing test for issue #6660. It achieves this by adding a transaction, starting the `broadcastTxRoutine` in a goroutine and then adding another transaction to the mempool. The `broadcastTxRoutine` can receive the second inserted transaction before `insertTx` returns. In that case, `broadcastTxRoutine` will derefence a nil pointer when referencing the `gossipEl` and panic.
This changes adds an `MempoolError` field to the `ResponseCheckTx`. This will allow clients to understand that their transaction was rejected from the mempool despite passing the ABCI check.
This change also updates the code to make use of early returns to prevent highly nested code blocks. Namely, it returns when the type assertion fails at the beginning of the method, instead of wrapping the entire method in a large if statement. This has a somewhat large effect on the diff as rendered by github.
addresses: #3546