adr 067: mempool refactor (#6368)
# ADR 067: Mempool Refactor

- [ADR 067: Mempool Refactor](#adr-067-mempool-refactor)
  - [Changelog](#changelog)
  - [Status](#status)
  - [Context](#context)
    - [Current Design](#current-design)
  - [Alternative Approaches](#alternative-approaches)
  - [Prior Art](#prior-art)
    - [Ethereum](#ethereum)
    - [Diem](#diem)
  - [Decision](#decision)
  - [Detailed Design](#detailed-design)
    - [CheckTx](#checktx)
    - [Mempool](#mempool)
    - [Eviction](#eviction)
    - [Gossiping](#gossiping)
    - [Performance](#performance)
  - [Future Improvements](#future-improvements)
  - [Consequences](#consequences)
    - [Positive](#positive)
    - [Negative](#negative)
    - [Neutral](#neutral)
  - [References](#references)

## Changelog

- April 19, 2021: Initial Draft (@alexanderbez)

## Status

Proposed

## Context

Tendermint Core has a reactor and data structure, the mempool, that facilitates the
ephemeral storage of uncommitted transactions. Honest nodes participating in a
Tendermint network gossip these uncommitted transactions to one another if they
pass the application's `CheckTx`. In addition, block proposers select from the
mempool a subset of uncommitted transactions to include in the next block.

Currently, the mempool in Tendermint Core is designed as a FIFO queue: transactions
are included in blocks in the order in which a node receives them. There is
currently no explicit, prioritized ordering of these uncommitted transactions,
which presents a few technical and UX challenges for operators and applications.

Namely, validators are not able to prioritize transactions by their fees or any
other incentive-aligned mechanism. In addition, the lack of prioritization leads
to cascading effects in terms of DoS and various attack vectors on networks,
e.g. [cosmos/cosmos-sdk#8224](https://github.com/cosmos/cosmos-sdk/discussions/8224).

Thus, Tendermint Core needs the ability for an application and its users to
prioritize transactions in a flexible and performant manner. Specifically, we're
aiming to improve, maintain, or add the following properties in the Tendermint
mempool:

- Allow application-determined transaction priority.
- Allow efficient concurrent reads and writes.
- Allow block proposers to reap transactions efficiently by priority.
- Maintain a fixed mempool capacity by transaction size and evict lower-priority
  transactions to make room for higher-priority transactions.
- Allow transactions to be gossiped by priority efficiently.
- Allow operators to specify a maximum TTL for transactions in the mempool before
  they're automatically evicted if not selected for a block proposal in time.
- Ensure the design allows for future extensions, such as replace-by-priority and
  allowing multiple pending transactions per sender, to be incorporated easily.

Note, not all of these properties will be addressed by the proposed changes in
this ADR. However, this proposal will ensure that any unaddressed properties
can be addressed in an easy and extensible manner in the future.

### Current Design

![mempool](./img/mempool-v0.jpeg)

At the core of the `v0` mempool reactor is a concurrent linked list. This is the
primary data structure that contains `Tx` objects that have passed `CheckTx`.
When a node receives a transaction from another peer, it executes `CheckTx`, which
obtains a read-lock on the `*CListMempool`. If the transaction passes `CheckTx`
locally on the node, it is added to the `*CList` by obtaining a write-lock. It
is also added to the `cache` and `txsMap`, each of which obtains its own
write-lock and maps a reference from the transaction hash to the `Tx` itself.

Transactions are continuously gossiped to peers whenever a new transaction is added
to a local node's `*CList`, where the transaction at the front of the `*CList` is
selected. Another transaction will not be gossiped until the `*CList` notifies the
reader that there are more transactions to gossip.

When a proposer attempts to propose a block, they will execute `ReapMaxBytesMaxGas`
on the reactor's `*CListMempool`. This call obtains a read-lock on the `*CListMempool`
and selects as many transactions as possible, starting from the front of the `*CList`
and moving toward the back of the list.
When a block is finally committed, a caller invokes `Update` on the reactor's
`*CListMempool` with all the selected transactions. Note, the caller must also
explicitly obtain a write-lock on the reactor's `*CListMempool`. This call
removes all the supplied transactions from the `txsMap` and the `*CList`, both
of which obtain their own respective write-locks. In addition, the transaction
may also be removed from the `cache`, which obtains its own write-lock.

## Alternative Approaches

When considering which approach to take for a priority-based, flexible, and
performant mempool, there are two core candidates. The first candidate is less
invasive in the required set of protocol and implementation changes: it simply
extends the existing `CheckTx` ABCI method. The second candidate involves
introducing new ABCI method(s) and would require a higher degree of complexity
in protocol and implementation changes, some of which may either overlap or
conflict with the upcoming introduction of [ABCI++](https://github.com/tendermint/spec/blob/master/rfc/004-abci%2B%2B.md).

For more information on the various approaches and proposals, please see the
[mempool discussion](https://github.com/tendermint/tendermint/discussions/6295).

## Prior Art

### Ethereum

The Ethereum mempool, specifically [Geth](https://github.com/ethereum/go-ethereum),
contains a mempool, `*TxPool`, that contains various mappings indexed by account,
such as `pending`, which contains all processable transactions for accounts
prioritized by nonce. It also contains a `queue`, the exact same mapping except
that it contains transactions that are not currently processable. The mempool also
contains a `priced` index of type `*txPricedList`, a priority queue ordered by
transaction price.

### Diem

The [Diem mempool](https://github.com/diem/diem/blob/master/mempool/README.md#implementation-details)
takes an approach similar to the one we propose. Specifically, the Diem mempool
contains a mapping from `Account:[]Tx`. On top of this primary mapping from account
to a list of transactions are various indexes used to perform certain actions.

The main index, `PriorityIndex`, is an ordered queue of transactions that are
“consensus-ready” (i.e., they have a sequence number which is sequential to the
current sequence number for the account). This queue is ordered by gas price so
that if a client is willing to pay more (than other clients) per unit of
execution, then they can enter consensus earlier.

## Decision

To incorporate a priority-based, flexible, and performant mempool in Tendermint Core,
we will introduce new fields, `priority` and `sender`, into the `ResponseCheckTx`
type.

We will introduce a new versioned mempool reactor, `v1`, and assume an implicit
version of the current mempool reactor as `v0`. In the new `v1` mempool reactor,
we largely keep the functionality the same as `v0`, except we augment the underlying
data structures. Specifically, we keep a mapping of senders to transaction objects.
On top of this mapping, we index transactions to provide the ability to efficiently
gossip and reap transactions by priority.

## Detailed Design
### CheckTx

We introduce the following new fields into the `ResponseCheckTx` type:

```diff
message ResponseCheckTx {
  uint32 code = 1;
  bytes data = 2;
  string log = 3; // nondeterministic
  string info = 4; // nondeterministic
  int64 gas_wanted = 5 [json_name = "gas_wanted"];
  int64 gas_used = 6 [json_name = "gas_used"];
  repeated Event events = 7 [(gogoproto.nullable) = false, (gogoproto.jsontag) = "events,omitempty"];
  string codespace = 8;
+ int64 priority = 9;
+ string sender = 10;
}
```

It is entirely up to the application to determine how these fields are populated
and with what values; e.g., the `sender` could be the signer and fee payer
of the transaction, and the `priority` could be the cumulative sum of the fee(s).

Only `sender` is required, while `priority` can be omitted, in which case the
default value of zero is used.
### Mempool

The existing concurrent-safe linked list will be replaced by a thread-safe map
of `<sender:*Tx>`, i.e., a mapping from `sender` to a single `*Tx` object, where
each `*Tx` is the next valid and processable transaction from the given `sender`.

On top of this mapping, we index all transactions by priority using a thread-safe
priority queue, i.e. a [max heap](https://en.wikipedia.org/wiki/Min-max_heap).
When a proposer is ready to select transactions for the next block proposal,
transactions are selected from this priority index in highest-priority order.
When a transaction is selected and reaped, it is removed from this index and
from the `<sender:*Tx>` mapping.

We define `Tx` as the following data structure:

```go
type Tx struct {
	// Tx represents the raw binary transaction data.
	Tx []byte

	// Priority defines the transaction's priority as specified by the application
	// in the ResponseCheckTx response.
	Priority int64

	// Sender defines the transaction's sender as specified by the application in
	// the ResponseCheckTx response.
	Sender string

	// Index defines the current index in the priority queue index. Note, if
	// multiple Tx indexes are needed, this field will be removed and each Tx
	// index will have its own wrapped Tx type.
	Index int
}
```
### Eviction

When `CheckTx` successfully executes for a new `Tx` but the mempool is currently
full, we must check whether there exists a `Tx` of lower priority that can be
evicted to make room for the new, higher-priority `Tx`, leaving sufficient size
capacity for it.

If such a `Tx` exists, we find it by obtaining a read-lock and sorting the
priority queue index. Once sorted, we find the first `Tx` with lower priority and
sufficient size such that the new `Tx` would fit within the mempool's size limit.
We then remove this `Tx` from the priority queue index as well as from the
`<sender:*Tx>` mapping.

This will require additional `O(n)` space and `O(n*log(n))` runtime complexity.
Note that the space complexity does not depend on the size of the transactions.
### Gossiping

We keep the existing thread-safe linked list as an additional index. Using this
index, we can efficiently gossip transactions in the same manner as they are
gossiped now (FIFO).

Gossiping transactions will not require locking any other indexes.

### Performance

Performance should largely remain unaffected apart from the space overhead of
keeping an additional priority queue index and the case where we need to evict
transactions from the priority queue index. There should be no reads that block
writes on any index.

## Future Improvements

There are a few notable ways in which the proposed design can be improved or
expanded upon, namely transaction gossiping and the ability to support multiple
transactions from the same `sender`.

With regard to transaction gossiping, we need to empirically validate whether we
need to gossip by priority. In addition, the current method of gossiping, namely
broadcasting all the transactions in a node's mempool to its peers, may not be
the most efficient. Rather, we should explore the ability to gossip transactions
on a request/response basis, similar to Ethereum and other protocols. Not only
would this reduce bandwidth and complexity, but it would also allow us to explore
gossiping by priority or other dimensions more efficiently.

Allowing multiple transactions from the same `sender` is important and will
most likely be a needed feature in the future development of the mempool, but for
now it suffices to have the preliminary design agreed upon. Having the ability
to support multiple transactions per `sender` will require careful thought with
regard to the interplay with the corresponding ABCI application. Regardless, the
proposed design should allow for adaptations to support this feature in a
non-contentious and backwards-compatible manner.

## Consequences

### Positive

- Transactions are allowed to be prioritized by the application.

### Negative

- Increased size of the `ResponseCheckTx` Protocol Buffer type.
- Causal ordering is NOT maintained.
- It is possible that certain transactions broadcast in a particular order may
  pass `CheckTx` but not end up being committed in a block because they fail
  `CheckTx` later. E.g., consider Tx<sub>1</sub> that sends funds from existing
  account Alice to a _new_ account Bob with priority P<sub>1</sub>, and then later
  Bob's _new_ account sends funds back to Alice in Tx<sub>2</sub> with P<sub>2</sub>,
  such that P<sub>2</sub> > P<sub>1</sub>. If executed in this order, both
  transactions will pass `CheckTx`. However, when a proposer is ready to select
  transactions for the next block proposal, they will select Tx<sub>2</sub> before
  Tx<sub>1</sub>, and thus Tx<sub>2</sub> will _fail_ because Tx<sub>1</sub> must
  be executed first. This is because there is a _causal ordering_,
  Tx<sub>1</sub> ➝ Tx<sub>2</sub>. Such situations should be rare, as most
  transactions are not causally ordered, and they can be circumvented by simply
  trying again at a later point in time or by ensuring the "child" priority is
  lower than the "parent" priority. In other words, if parents always have
  priorities that are higher than their children's, then the new mempool design
  will maintain causal ordering.

### Neutral

- A transaction that passed `CheckTx` and entered the mempool can later be evicted
  if a higher-priority transaction enters while the mempool is full.

## References

- [ABCI++](https://github.com/tendermint/spec/blob/master/rfc/004-abci%2B%2B.md)
- [Mempool Discussion](https://github.com/tendermint/tendermint/discussions/6295)