applied PR comments

This commit is contained in:
Jasmina Malicevic
2022-07-04 16:22:22 +02:00
parent 79476dead0
commit e9fef1763e


@@ -7,42 +7,50 @@ parent:
# Blocksync
In a proof of work blockchain, syncing with the chain is the same process as staying up-to-date with the consensus: download blocks, and look for the one with the most total work. In proof-of-stake, the consensus process is more complex, as it involves rounds of communication between the nodes to determine what block should be committed next. Using this process to sync up with the blockchain from scratch can take a very long time. It's much faster to just download blocks and check the merkle tree of validators than to run the real-time consensus gossip protocol.
This section describes the blocksync service provided by Tendermint. Its main function is to enable new or recovering nodes to sync up to the head of the chain quickly. Normally, in a proof-of-stake blockchain, a block is decided when multiple nodes reach consensus on it, which requires many rounds of communication. Once a block is decided on, it is executed against the application and stored, together with the proof that at least 2/3 of the nodes have voted for it. This proof takes the shape of a `Commit`, which stores the signatures of the 2/3+ validators that approved the block.
The Blocksync Reactor's high level responsibility is to enable peers who are
far behind the current state of the consensus to quickly catch up by downloading
many blocks (that have already been decided) in parallel, verifying their commits, and executing them against the
ABCI application.
It is sufficient for a syncing node to contact its peers, download old blocks in parallel, verify their commit signatures and execute them sequentially (in order) against the application. As the signatures that validated a block at height `H` are stored with the block at height `H + 1`, blocksync needs to download both blocks in order to verify the former.
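The dependency between heights `H` and `H + 1` can be illustrated with a minimal Go sketch. The `Block` and `Commit` types, the `verifyWithNext` function, and the plain signature count are hypothetical simplifications for illustration only; they are not the Tendermint types, and real verification checks cryptographic signatures weighted by voting power.

```go
package main

import (
	"errors"
	"fmt"
)

// Commit is a hypothetical, simplified stand-in for a commit:
// it records which height was approved and how many validators signed.
type Commit struct {
	Height     int64 // height of the block these signatures approve
	Signatures int   // number of signatures (stand-in for real signatures)
}

// Block is a hypothetical, simplified block that carries the commit
// for the previous height, as described in the text.
type Block struct {
	Height     int64
	LastCommit *Commit // commit for the block at Height-1
}

// verifyWithNext checks the block `first` using the commit stored in `second`.
// Assumption: all validators count equally and more than 2/3 must have signed.
func verifyWithNext(first, second *Block, totalValidators int) error {
	if second.Height != first.Height+1 {
		return errors.New("blocks are not consecutive")
	}
	c := second.LastCommit
	if c == nil || c.Height != first.Height {
		return errors.New("next block does not carry a commit for this height")
	}
	if 3*c.Signatures <= 2*totalValidators {
		return errors.New("commit has insufficient signatures")
	}
	return nil
}

func main() {
	h := &Block{Height: 10}
	next := &Block{Height: 11, LastCommit: &Commit{Height: 10, Signatures: 3}}
	// The block at height 10 can only be verified once height 11 is available.
	fmt.Println("verified:", verifyWithNext(h, next, 4) == nil)
}
```

The sketch makes the ordering constraint explicit: the block at height `H` cannot be accepted until the block at `H + 1` has been downloaded as well.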
Tendermint full nodes run the Blocksync Reactor as a service to provide blocks
to new or recovering nodes. The nodes run the Blocksync Reactor in "fast_sync" mode,
where they actively make requests for more blocks until they sync up.
Once caught up, "fast_sync" mode is disabled and the node switches to
using the Consensus Reactor.
We can think of the blocksync service as having two modes of operation: one as a *server*, providing syncing nodes with blocks and commits, and the other as a *client*, requesting blocks from peers. Each full node runs a blocksync reactor, which launches both the server and client routines.
*Note* It is currently assumed that the Consensus reactor is already running. It is therefore not turned on by the Blocksync reactor. In case it has not been started, the Blocksync reactor simply returns.
Blocksync is implemented within a Reactor whose internal data structures keep track of peers connected to a node and keep a pool of blocks downloaded from them if the node is currently syncing. Blocks are applied one by one until a node is considered caught up (more details on the conditions can be found in the sections below).
### Conditions to start Blocksync
Once caught up, the node switches to consensus and stops running blocksync as a client.
A node can switch to blocksync directly on start-up or after completing `state-sync`. Currently, switching back to blocksync from consensus is not possible. It is expected to be handled in [Issue #129](https://github.com/tendermint/tendermint/issues/129).
### Starting Blocksync
A node can switch to blocksync directly on start-up or after completing `state-sync`. [State sync](ToDo) downloads only snapshots of application state along with commits. This further speeds up the process of catching up to the latest height, as a node does not download the actual blocks.
The blocksync reactor service is started at the same time as all the other services in Tendermint. However, blocksyncing itself is initially disabled (the `blockSync` boolean flag is false), so the block pool and the routine that processes blocks from the pool are not launched until the reactor is actually activated.
The reactor is activated after state sync, where the pool and request processing routines are launched.
However, receiving messages via the p2p channel and sending status updates to other nodes is enabled regardless of whether the blocksync reactor is started. This makes sense as a node should be able to send updates to other peers regardless of whether it itself is blocksyncing.
However, receiving messages via the p2p channel and sending status updates to other nodes is enabled regardless of whether the blocksync reactor is started. Essentially every node is running as a blocksync server as soon as it is started up.
**Note**. In the current version, if we start from state sync and block sync is not launched before as a service, the internal channels used by the reactor will not be created. We need to be careful to launch the blocksync *service* before we call the function to switch from statesync to blocksync.
**Note**. In the current version, if we start from state sync and blocksync is not launched before as a service, the internal channels used by the reactor will not be created. We need to be careful to launch the blocksync *service* before we call the function to switch from statesync to blocksync.
### Switching from blocksync to consensus
Ideally, the switch to consensus is done once the node considers itself caught up or we have not advanced our height for more than 60s.
The former is checked by calling `isCaughtUp` inside `poolRoutine` periodically. This period is set with `switchToConsensusTicker`. We consider a node to be caught up if it is 1 height away from the maximum height reported by its peers. The reason we **do not catch up until the maximum height** (`pool.maxPeerHeight`) is that we cannot verify the block at `pool.maxPeerHeight` without the `lastCommit` of the block at `pool.maxPeerHeight + 1`.
Ideally, the switch to consensus is done once the node considers itself caught up or it has not advanced its height for more than a predefined amount of time (currently 60s).
If the node is not starting from genesis, blocksync **does not** switch to consensus until we have synced at least one block. We need to have vote extensions in order to participate in consensus and they are not provided to the blocksync reactor after state sync. We therefore need to receive them from one of our peers.
We consider a node to be caught up if it is 1 height away from the maximum height reported by its peers. The reason we **do not catch up until the maximum height** (`pool.maxPeerHeight`) is that we cannot verify the block at `pool.maxPeerHeight` without the `lastCommit` of the block at `pool.maxPeerHeight + 1`.
This check is performed periodically by calling `isCaughtUp` inside `poolRoutine`.
When the node is starting from genesis, the first block does not need vote extensions, so the node is able to switch directly to consensus.
**Note** It is currently assumed that the Consensus reactor is already running. It is therefore not turned on by the Blocksync reactor. In case it has not been started, the Blocksync reactor simply returns.
#### **Vote extensions in v0.36**
In v0.36, Tendermint introduced vote extensions, which are application-defined data attached to the votes deciding on a block. In order to actively participate in consensus, a node has to have vote extensions from the previous height. Therefore, in addition to storing and downloading commits, nodes download and send vote extensions to their peers as well.
For this reason, if the node is not starting from genesis but only after state sync, blocksync **does not** switch to consensus until it syncs at least one block. As state sync only stores a state snapshot (without vote extensions), we need to receive them from one of our peers.
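The additional guard introduced by vote extensions can be expressed as a small predicate. This is a sketch under stated assumptions: `canSwitchToConsensus` and its parameters (`readyToSwitch`, `startedFromGenesis`, `blocksSynced`) are hypothetical names chosen for illustration and do not necessarily match the reactor's code.

```go
package main

import "fmt"

// canSwitchToConsensus (hypothetical) applies the vote-extension rule on top
// of the usual caught-up or timed-out condition.
//
//   - readyToSwitch: the node is caught up or has stalled past the timeout.
//   - startedFromGenesis: the node began syncing from the initial height,
//     so no vote extensions from a previous height are needed.
//   - blocksSynced: number of blocks applied by blocksync since it started.
func canSwitchToConsensus(readyToSwitch, startedFromGenesis bool, blocksSynced uint64) bool {
	if !readyToSwitch {
		return false
	}
	// After state sync there are no stored vote extensions, so at least one
	// block (with its extensions) must be received from a peer first.
	return startedFromGenesis || blocksSynced > 0
}

func main() {
	fmt.Println(canSwitchToConsensus(true, true, 0))  // from genesis: may switch
	fmt.Println(canSwitchToConsensus(true, false, 0)) // after state sync, nothing synced yet
	fmt.Println(canSwitchToConsensus(true, false, 1)) // after state sync, one block synced
}
```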
### Switching to blocksync from consensus
Ideally, a node should be able to switch to blocksync even if it does not crash, but falls behind in consensus. Unfortunately, switching back to blocksync from consensus is not possible at the moment. It is expected to be handled in [Issue #129](https://github.com/tendermint/tendermint/issues/129).
## Architecture and algorithm
The Blocksync reactor is organised as a set of concurrent tasks: