mirror of
https://github.com/tendermint/tendermint.git
synced 2025-12-23 06:15:19 +00:00
This reverts commit 9842b4b0fb.
@@ -5,22 +5,17 @@
The ABCI message types are defined in a [protobuf
file](https://github.com/tendermint/tendermint/blob/master/abci/types/types.proto).

ABCI methods are split across four separate ABCI _connections_:
ABCI methods are split across 3 separate ABCI _connections_:

- Consensus connection: `InitChain`, `BeginBlock`, `DeliverTx`, `EndBlock`, `Commit`
- Mempool connection: `CheckTx`
- Info connection: `Info`, `SetOption`, `Query`
- Snapshot connection: `ListSnapshots`, `LoadSnapshotChunk`, `OfferSnapshot`, `ApplySnapshotChunk`
- `Consensus Connection`: `InitChain, BeginBlock, DeliverTx, EndBlock, Commit`
- `Mempool Connection`: `CheckTx`
- `Info Connection`: `Info, SetOption, Query`

The consensus connection is driven by a consensus protocol and is responsible
The `Consensus Connection` is driven by a consensus protocol and is responsible
for block execution.

The mempool connection is for validating new transactions, before they're
The `Mempool Connection` is for validating new transactions, before they're
shared or included in a block.

The info connection is for initialization and for queries from the user.

The snapshot connection is for serving and restoring [state sync snapshots](apps.md#state-sync).
The `Info Connection` is for initialization and for queries from the user.

Additionally, there is a `Flush` method that is called on every connection,
and an `Echo` method that is just for debugging.
@@ -156,22 +151,6 @@ The result is an updated application state.
Cryptographic commitments to the results of DeliverTx, EndBlock, and
Commit are included in the header of the next block.

## State Sync

State sync allows new nodes to rapidly bootstrap by discovering, fetching, and applying
state machine snapshots instead of replaying historical blocks. For more details, see the
[state sync section](apps.md#state-sync).

When a new node is discovering snapshots in the P2P network, existing nodes will call
`ListSnapshots` on the application to retrieve any local state snapshots. The new node will
offer these snapshots to its local application via `OfferSnapshot`.

Once the application accepts a snapshot and begins restoring it, Tendermint will fetch snapshot
chunks from existing nodes via `LoadSnapshotChunk` and apply them sequentially to the local
application with `ApplySnapshotChunk`. When all chunks have been applied, the application
`AppHash` is retrieved via an `Info` query and compared to the blockchain's `AppHash` verified
via light client.

## Messages

### Echo
@@ -409,83 +388,6 @@ via light client.
other purposes, e.g. auditing, replay of non-persisted heights, light client
verification, and so on.

### ListSnapshots

- **Response**:
  - `Snapshots ([]Snapshot)`: List of local state snapshots.
- **Usage**:
  - Used during state sync to discover available snapshots on peers.
  - See `Snapshot` data type for details.

### LoadSnapshotChunk

- **Request**:
  - `Height (uint64)`: The height of the snapshot the chunk belongs to.
  - `Format (uint32)`: The application-specific format of the snapshot the chunk belongs to.
  - `Chunk (uint32)`: The chunk index, starting from `0` for the initial chunk.
- **Response**:
  - `Chunk ([]byte)`: The binary chunk contents, in an arbitrary format. Chunk messages cannot be
    larger than 16 MB _including metadata_, so 10 MB is a good starting point.
- **Usage**:
  - Used during state sync to retrieve snapshot chunks from peers.

### OfferSnapshot

- **Request**:
  - `Snapshot (Snapshot)`: The snapshot offered for restoration.
  - `AppHash ([]byte)`: The light client-verified app hash for this height, from the blockchain.
- **Response**:
  - `Result (Result)`: The result of the snapshot offer.
    - `accept`: Snapshot is accepted, start applying chunks.
    - `abort`: Abort snapshot restoration, and don't try any other snapshots.
    - `reject`: Reject this specific snapshot, try others.
    - `reject_format`: Reject all snapshots with this `format`, try others.
    - `reject_senders`: Reject all snapshots from all senders of this snapshot, try others.
- **Usage**:
  - `OfferSnapshot` is called when bootstrapping a node using state sync. The application may
    accept or reject snapshots as appropriate. Upon accepting, Tendermint will retrieve and
    apply snapshot chunks via `ApplySnapshotChunk`. The application may also choose to reject a
    snapshot in the chunk response, in which case it should be prepared to accept further
    `OfferSnapshot` calls.
  - Only `AppHash` can be trusted, as it has been verified by the light client. Any other data
    can be spoofed by adversaries, so applications should employ additional verification schemes
    to avoid denial-of-service attacks. The verified `AppHash` is automatically checked against
    the restored application at the end of snapshot restoration.
  - For more information, see the `Snapshot` data type or the [state sync section](apps.md#state-sync).
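
As a rough illustration of the `OfferSnapshot` results listed above, the following Go sketch shows how an application might choose between them. It does not use the generated ABCI types: `OfferResult`, `Snapshot`, `supportedFormat`, and `decideOffer` are hypothetical names, and the acceptance policy (one supported format, at least one chunk) is an assumption made only for the example.

```go
package main

// OfferResult mirrors the result values documented for OfferSnapshot.
// The Go names are hypothetical stand-ins, not the generated ABCI types.
type OfferResult int

const (
	OfferAccept OfferResult = iota
	OfferAbort
	OfferReject
	OfferRejectFormat
	OfferRejectSenders
)

// Snapshot carries the fields of the Snapshot data type.
type Snapshot struct {
	Height   uint64
	Format   uint32
	Chunks   uint32
	Hash     []byte
	Metadata []byte
}

// supportedFormat is an assumed application-specific format version.
const supportedFormat uint32 = 1

// decideOffer picks a result for an offered snapshot under an assumed policy.
func decideOffer(s Snapshot) OfferResult {
	switch {
	case s.Format != supportedFormat:
		// The application cannot restore this format at all.
		return OfferRejectFormat
	case s.Chunks == 0:
		// A snapshot must have at least one chunk, so this one is malformed.
		return OfferReject
	default:
		return OfferAccept
	}
}
```

Since everything except `AppHash` can be spoofed, a real application would likely add further checks before accepting.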

### ApplySnapshotChunk

- **Request**:
  - `Index (uint32)`: The chunk index, starting from `0`. Tendermint applies chunks sequentially.
  - `Chunk ([]byte)`: The binary chunk contents, as returned by `LoadSnapshotChunk`.
  - `Sender (string)`: The P2P ID of the node who sent this chunk.
- **Response**:
  - `Result (Result)`: The result of applying this chunk.
    - `accept`: The chunk was accepted.
    - `abort`: Abort snapshot restoration, and don't try any other snapshots.
    - `retry`: Reapply this chunk, combine with `RefetchChunks` and `RejectSenders` as appropriate.
    - `retry_snapshot`: Restart this snapshot from `OfferSnapshot`, reusing chunks unless
      instructed otherwise.
    - `reject_snapshot`: Reject this snapshot, try a different one.
  - `RefetchChunks ([]uint32)`: Refetch and reapply the given chunks, regardless of `Result`. Only
    the listed chunks will be refetched, and reapplied in sequential order.
  - `RejectSenders ([]string)`: Reject the given P2P senders, regardless of `Result`. Any chunks
    already applied will not be refetched unless explicitly requested, but queued chunks from these
    senders will be discarded, and new chunks or other snapshots rejected.
- **Usage**:
  - The application can choose to refetch chunks and/or ban P2P peers as appropriate. Tendermint
    will not do this unless instructed by the application.
  - The application may want to verify each chunk, e.g. by attaching chunk hashes in
    `Snapshot.Metadata` and/or incrementally verifying contents against `AppHash`.
  - When all chunks have been accepted, Tendermint will make an ABCI `Info` call to verify that
    `LastBlockAppHash` and `LastBlockHeight` match the expected values, and record the
    `AppVersion` in the node state. It then switches to fast sync or consensus and joins the
    network.
  - If Tendermint is unable to retrieve the next chunk after some time (e.g. because no suitable
    peers are available), it will reject the snapshot and try a different one via `OfferSnapshot`.
    The application should be prepared to reset and accept it or abort as appropriate.
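
The chunk-hash idea mentioned under Usage could look roughly like the Go sketch below. It assumes, purely for illustration, that `Snapshot.Metadata` is the concatenation of one SHA-256 digest per chunk. The types `ApplyResult` and `ApplyResponse` and the functions `chunkHashes` and `applyChunk` are hypothetical and stand in for the generated ABCI types.

```go
package main

import (
	"bytes"
	"crypto/sha256"
)

// ApplyResult mirrors the documented ApplySnapshotChunk results (hypothetical names).
type ApplyResult int

const (
	ApplyAccept ApplyResult = iota
	ApplyAbort
	ApplyRetry
	ApplyRetrySnapshot
	ApplyRejectSnapshot
)

// ApplyResponse mirrors the documented response fields.
type ApplyResponse struct {
	Result        ApplyResult
	RefetchChunks []uint32
	RejectSenders []string
}

// chunkHashes builds the assumed metadata layout: one SHA-256 digest per chunk.
func chunkHashes(chunks [][]byte) []byte {
	var meta []byte
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		meta = append(meta, sum[:]...)
	}
	return meta
}

// applyChunk checks a chunk against the digest stored in the metadata.
func applyChunk(metadata []byte, index uint32, chunk []byte, sender string) ApplyResponse {
	start := int(index) * sha256.Size
	end := start + sha256.Size
	if end > len(metadata) {
		// No digest exists for this index, so the snapshot cannot be verified.
		return ApplyResponse{Result: ApplyRejectSnapshot}
	}
	sum := sha256.Sum256(chunk)
	if !bytes.Equal(sum[:], metadata[start:end]) {
		// Ask for the chunk again and stop using the peer that sent it.
		return ApplyResponse{
			Result:        ApplyRetry,
			RefetchChunks: []uint32{index},
			RejectSenders: []string{sender},
		}
	}
	// A real application would write the chunk to its state store here.
	return ApplyResponse{Result: ApplyAccept}
}
```

Because the metadata itself is untrusted, a check like this only detects corrupted or mismatched chunks early. The final `AppHash` comparison is still what establishes that the restored state is correct.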

###

## Data Types

### Header
@@ -638,22 +540,3 @@ via light client.
- `Type (string)`: Type of Merkle proof and how it's encoded.
- `Key ([]byte)`: Key in the Merkle tree that this proof is for.
- `Data ([]byte)`: Encoded Merkle proof for the key.

### Snapshot

- **Fields**:
  - `Height (uint64)`: The height at which the snapshot was taken (after commit).
  - `Format (uint32)`: An application-specific snapshot format, allowing applications to version
    their snapshot data format and make backwards-incompatible changes. Tendermint does not
    interpret this.
  - `Chunks (uint32)`: The number of chunks in the snapshot. Must be at least 1 (even if empty).
  - `Hash (bytes)`: An arbitrary snapshot hash. Must be equal only for identical snapshots across
    nodes. Tendermint does not interpret the hash; it only compares hashes.
  - `Metadata (bytes)`: Arbitrary application metadata, for example chunk hashes or other
    verification data.

- **Usage**:
  - Used for state sync snapshots, see [separate section](apps.md#state-sync) for details.
  - A snapshot is considered identical across nodes only if _all_ fields are equal (including
    `Metadata`). Chunks may be retrieved from all nodes that have the same snapshot.
  - When sent across the network, a snapshot message can be at most 4 MB.

@@ -14,11 +14,10 @@ Here we cover the following components of ABCI applications:
  application state
- [Crash Recovery](#crash-recovery) - handshake protocol to synchronize
  Tendermint and the application on startup.
- [State Sync](#state-sync) - rapid bootstrapping of new nodes by restoring state machine snapshots

## State

Since Tendermint maintains four concurrent ABCI connections, it is typical
Since Tendermint maintains three concurrent ABCI connections, it is typical
for an application to maintain a distinct state for each, and for the states to
be synchronized during `Commit`.

@@ -93,11 +92,6 @@ QueryState should be set to the latest `DeliverTxState` at the end of every `Com
i.e. after the full block has been processed and the state committed to disk.
Otherwise it should never be modified.

### Snapshot Connection

The Snapshot Connection is optional, and is only used to serve state sync snapshots for other nodes
and/or restore state sync snapshots to a local node being bootstrapped.

## Transaction Results

`ResponseCheckTx` and `ResponseDeliverTx` contain the same fields.
@@ -470,147 +464,3 @@ If `appBlockHeight == storeBlockHeight`
update the state using the saved ABCI responses but don't run the block against the real app.
This happens if we crashed after the app finished Commit but before Tendermint saved the state.

## State Sync

A new node joining the network can simply join consensus at the genesis height and replay all
historical blocks until it is caught up. However, for large chains this can take a significant
amount of time, often on the order of weeks or months.

State sync is an alternative mechanism for bootstrapping a new node, where it fetches a snapshot
of the state machine at a given height and restores it. Depending on the application, this can
be several orders of magnitude faster than replaying blocks.

Note that state sync does not currently backfill historical blocks, so the node will have a
truncated block history - users are advised to consider the broader network implications of this in
terms of block availability and auditability. This functionality may be added in the future.

For details on the specific ABCI calls and types, see the [methods and types section](abci.md).

### Taking Snapshots

Applications that want to support state syncing must take state snapshots at regular intervals. How
this is accomplished is entirely up to the application. A snapshot consists of some metadata and
a set of binary chunks in an arbitrary format:

* `Height (uint64)`: The height at which the snapshot is taken. It must be taken after the given
  height has been committed, and must not contain data from any later heights.

* `Format (uint32)`: An arbitrary snapshot format identifier. This can be used to version snapshot
  formats, e.g. to switch from Protobuf to MessagePack for serialization. The application can use
  this when restoring to choose whether to accept or reject a snapshot.

* `Chunks (uint32)`: The number of chunks in the snapshot. Each chunk contains arbitrary binary
  data, and should be less than 16 MB; 10 MB is a good starting point.

* `Hash ([]byte)`: An arbitrary hash of the snapshot. This is used to check whether a snapshot is
  the same across nodes when downloading chunks.

* `Metadata ([]byte)`: Arbitrary snapshot metadata, e.g. chunk hashes for verification or any other
  necessary info.

For a snapshot to be considered the same across nodes, all of these fields must be identical. When
sent across the network, snapshot metadata messages are limited to 4 MB.

When a new node is running state sync and discovering snapshots, Tendermint will query an existing
application via the ABCI `ListSnapshots` method to discover available snapshots, and load binary
snapshot chunks via `LoadSnapshotChunk`. The application is free to choose how to implement this
and which formats to use, but should provide the following guarantees:

* **Consistent:** A snapshot should be taken at a single isolated height, unaffected by
  concurrent writes. This can e.g. be accomplished by using a data store that supports ACID
  transactions with snapshot isolation.

* **Asynchronous:** Taking a snapshot can be time-consuming, so it should not halt chain progress,
  for example by running in a separate thread.

* **Deterministic:** A snapshot taken at the same height in the same format should be identical
  (at the byte level) across nodes, including all metadata. This ensures good availability of
  chunks, and that they fit together across nodes.

A very basic approach might be to use a datastore with MVCC transactions (such as RocksDB),
start a transaction immediately after block commit, and spawn a new thread which is passed the
transaction handle. This thread can then export all data items, serialize them using e.g.
Protobuf, hash the byte stream, split it into chunks, and store the chunks in the file system
along with some metadata - all while the blockchain is applying new blocks in parallel.
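
The hashing and chunking steps of this basic approach can be sketched in Go as follows. This is a minimal illustration rather than Tendermint or application code: `splitChunks`, `snapshotHash`, and `defaultChunkSize` are hypothetical names, SHA-256 is an arbitrary choice of hash, and the export and serialization steps are assumed to have already produced the `data` byte stream.

```go
package main

import "crypto/sha256"

// defaultChunkSize follows the suggested 10 MB starting point.
const defaultChunkSize = 10 * 1024 * 1024

// splitChunks cuts the serialized state into chunks of at most size bytes.
// It always returns at least one chunk, even for empty input, because a
// snapshot must contain at least one chunk.
func splitChunks(data []byte, size int) [][]byte {
	if size <= 0 {
		size = defaultChunkSize
	}
	if len(data) == 0 {
		return [][]byte{{}}
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// snapshotHash hashes the whole byte stream so that nodes can compare snapshots.
func snapshotHash(data []byte) []byte {
	sum := sha256.Sum256(data)
	return sum[:]
}
```

For the snapshot to be deterministic, the serialization that produces `data` must itself be byte-for-byte identical across nodes. The sketch takes that as given.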

A more advanced approach might include incremental verification of individual chunks against the
chain app hash, parallel or batched exports, compression, and so on.

Old snapshots should be removed after some time - generally only the last two snapshots are needed
(to prevent the last one from being removed while a node is restoring it).
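
A retention policy along these lines might be sketched as below. The function `pruneSnapshots` is a hypothetical helper, and identifying snapshots by height alone is an assumption made for brevity.

```go
package main

import "sort"

// pruneSnapshots returns the heights to keep (newest first) and the heights
// that can be deleted, keeping at most keep snapshots.
func pruneSnapshots(heights []uint64, keep int) (kept, removed []uint64) {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]uint64(nil), heights...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	if keep < 0 {
		keep = 0
	}
	if keep > len(sorted) {
		keep = len(sorted)
	}
	return sorted[:keep], sorted[keep:]
}
```

Calling `pruneSnapshots(heights, 2)` after each new snapshot would implement the "last two" rule described above.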

### Bootstrapping a Node

An empty node can be state synced by setting the configuration option
`statesync.enabled = true`. The node also needs the chain genesis file for basic chain info, and
configuration for light client verification of the restored snapshot: a set of Tendermint RPC
servers, and a trusted header hash and corresponding height from a trusted source, via the
`statesync` configuration section.

Once started, the node will connect to the P2P network and begin discovering snapshots. These
will be offered to the local application, and once a snapshot is accepted Tendermint will fetch
and apply the snapshot chunks. After all chunks have been successfully applied, Tendermint verifies
the app's `AppHash` against the chain using the light client, then switches the node to normal
consensus operation.

#### Snapshot Discovery

When the empty node joins the P2P network, it asks all peers to report snapshots via the
`ListSnapshots` ABCI call (limited to 10 per node). After some time, the node picks the most
suitable snapshot (generally prioritized by height, format, and number of peers), and offers it
to the application via `OfferSnapshot`. The application can choose a number of responses,
including accepting or rejecting it, rejecting the offered format, rejecting the peer who sent
it, and so on. Tendermint will keep discovering and offering snapshots until one is accepted or
the application aborts.
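
To make the prioritization concrete, here is one way such a ranking could be written in Go. The text only says snapshots are "generally prioritized by height, format, and number of peers", so the strict ordering below (higher height first, then higher format, then more peers) is an assumption, and `snapshotCandidate` and `rankSnapshots` are hypothetical names rather than Tendermint's own.

```go
package main

import "sort"

// snapshotCandidate is a discovered snapshot plus the number of peers
// that reported it.
type snapshotCandidate struct {
	Height uint64
	Format uint32
	Peers  int
}

// rankSnapshots returns the candidates ordered from most to least preferred.
func rankSnapshots(cs []snapshotCandidate) []snapshotCandidate {
	out := append([]snapshotCandidate(nil), cs...)
	sort.SliceStable(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.Height != b.Height {
			return a.Height > b.Height
		}
		if a.Format != b.Format {
			return a.Format > b.Format
		}
		return a.Peers > b.Peers
	})
	return out
}
```

The node would then offer `rankSnapshots(candidates)[0]` first and move down the list as the application rejects snapshots.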

#### Snapshot Restoration

Once a snapshot has been accepted via `OfferSnapshot`, Tendermint begins downloading chunks from
any peers that have the same snapshot (i.e. that have identical metadata fields). Chunks are
spooled in a temporary directory, and then given to the application in sequential order via
`ApplySnapshotChunk` until all chunks have been accepted.

As with taking snapshots, the method for restoring them is entirely up to the application, but will
generally be the inverse of how they are taken.

During restoration, the application can respond to `ApplySnapshotChunk` with instructions for how
to continue. This will typically be to accept the chunk and await the next one, but it can also
ask for chunks to be refetched (either the current one or any number of previous ones), P2P peers
to be banned, snapshots to be rejected or retried, and a number of other responses - see the ABCI
reference for details.

If Tendermint fails to fetch a chunk after some time, it will reject the snapshot and try a
different one via `OfferSnapshot` - the application can choose whether it wants to support
restarting restoration, or simply abort with an error.

#### Snapshot Verification

Once all chunks have been accepted, Tendermint issues an `Info` ABCI call to retrieve the
`LastBlockAppHash`. This is compared with the trusted app hash from the chain, retrieved and
verified using the light client. Tendermint also checks that `LastBlockHeight` corresponds to the
height of the snapshot.
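
The two checks described here amount to something like the following Go sketch. `InfoResponse` is a hypothetical stand-in for the fields returned by the `Info` call, and `verifyRestored` is an illustrative helper, not the function Tendermint uses.

```go
package main

import (
	"bytes"
	"fmt"
)

// InfoResponse holds the two Info fields used for verification
// (a hypothetical stand-in for the real response type).
type InfoResponse struct {
	LastBlockHeight  int64
	LastBlockAppHash []byte
}

// verifyRestored compares the restored application state with the
// light-client-verified app hash and the height of the snapshot.
func verifyRestored(info InfoResponse, snapshotHeight uint64, trustedAppHash []byte) error {
	if !bytes.Equal(info.LastBlockAppHash, trustedAppHash) {
		return fmt.Errorf("app hash mismatch: got %X, want %X",
			info.LastBlockAppHash, trustedAppHash)
	}
	if info.LastBlockHeight < 0 || uint64(info.LastBlockHeight) != snapshotHeight {
		return fmt.Errorf("height mismatch: got %d, want %d",
			info.LastBlockHeight, snapshotHeight)
	}
	return nil
}
```

A non-nil error here would mean the restored state cannot be trusted and the node should not join consensus with it.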

This verification ensures that an application is valid before joining the network. However, the
snapshot restoration may take a long time to complete, so applications may want to employ additional
verification during the restore to detect failures early. This might e.g. include incremental
verification of each chunk against the app hash (using bundled Merkle proofs), checksums to
protect against data corruption by the disk or network, and so on. However, it is important to
note that the only trusted information available is the app hash, and all other snapshot metadata
can be spoofed by adversaries.

Apps may also want to consider state sync denial-of-service vectors, where adversaries provide
invalid or harmful snapshots to prevent nodes from joining the network. The application can
counteract this by asking Tendermint to ban peers. As a last resort, node operators can use
P2P configuration options to whitelist a set of trusted peers that can provide valid snapshots.

#### Transition to Consensus

Once the snapshot has been restored, Tendermint gathers additional information necessary for
bootstrapping the node (e.g. chain ID, consensus parameters, validator sets, and block headers)
from the genesis file and light client RPC servers. It also fetches and records the `AppVersion`
from the ABCI application.

Once the node is bootstrapped with this information and the restored state machine, it transitions
to fast sync (if enabled) to fetch any remaining blocks up to the chain head, and then
transitions to regular consensus operation. At this point the node operates like any other node,
apart from having a truncated block history at the height of the restored snapshot.

@@ -85,7 +85,7 @@ it is the standard way to encode integers in Protobuf. It is also generally shor
As noted above, this prefixing does not apply for gRPC.

An ABCI server must also be able to support multiple connections, as
Tendermint uses four connections.
Tendermint uses three connections.

### Async vs Sync

@@ -1,77 +0,0 @@
# State Sync Reactor

State sync allows new nodes to rapidly bootstrap and join the network by discovering, fetching,
and restoring state machine snapshots. For more information, see the [state sync ABCI section](../abci/apps.md#state-sync).

The state sync reactor has two main responsibilities:

* Serving state machine snapshots taken by the local ABCI application to new nodes joining the
  network.

* Discovering existing snapshots and fetching snapshot chunks for an empty local application
  being bootstrapped.

The state sync process for bootstrapping a new node is described in detail in the section linked
above. While technically part of the reactor (see `statesync/syncer.go` and related components),
this document will only cover the P2P reactor component.

For details on the ABCI methods and data types, see the [ABCI documentation](../abci/abci.md).

## State Sync P2P Protocol

When a new node begins state syncing, it will ask all peers it encounters if they have any
available snapshots:

```go
type snapshotsRequestMessage struct{}
```

The receiver will query the local ABCI application via `ListSnapshots`, and send a message
containing snapshot metadata (limited to 4 MB) for each of the 10 most recent snapshots:

```go
type snapshotsResponseMessage struct {
	Height   uint64
	Format   uint32
	Chunks   uint32
	Hash     []byte
	Metadata []byte
}
```

The node running state sync will offer these snapshots to the local ABCI application via
`OfferSnapshot` ABCI calls, and keep track of which peers contain which snapshots. Once a snapshot
is accepted, the state syncer will request snapshot chunks from appropriate peers:

```go
type chunkRequestMessage struct {
	Height uint64
	Format uint32
	Index  uint32
}
```

The receiver will load the requested chunk from its local application via `LoadSnapshotChunk`,
and respond with it (limited to 16 MB):

```go
type chunkResponseMessage struct {
	Height  uint64
	Format  uint32
	Index   uint32
	Chunk   []byte
	Missing bool
}
```

Here, `Missing` is used to signify that the chunk was not found on the peer, since an empty
chunk is a valid (although unlikely) response.

The returned chunk is given to the ABCI application via `ApplySnapshotChunk` until the snapshot
is restored. If a chunk response is not returned within some time, it will be re-requested,
possibly from a different peer.

The ABCI application is able to request peer bans and chunk refetching as part of the ABCI protocol.

If no state sync is in progress (i.e. during normal operation), any unsolicited response messages
are discarded.