diff --git a/rust-spec/fastsync/README.md b/rust-spec/fastsync/README.md new file mode 100644 index 000000000..9e2f91796 --- /dev/null +++ b/rust-spec/fastsync/README.md @@ -0,0 +1,50 @@ +# Fast Sync Subprotocol Specification + +This directory contains English and TLA+ specifications for the FastSync +protocol as it is currently implemented in the Tendermint Core codebase. + +## English Specification + +The [English Specification](fastsync.md) provides a detailed description of the +fast sync problem and the properties a correct protocol must satisfy. It also +includes a detailed description of the protocol as currently implemented in Go, +and an anlaysis of the implementation with respect to the properties. + +It was found that the current implementation does not satisfy certain +properties, and is therefore not a correct solution to the fast sync problem. +The issue discovered holds for all previous implementations of the protocol. A +fix is described which is straight forward to implement. + +## TLA+ Specification + +Two TLA+ specifications are provided: a high level [specification +of the protocol](fastsync.tla) and a low level [specification of the scheduler +component of the implementation](scheduler.tla). Both specifications contain +properties that may be checked by the TLC model checker, though only for small +values of the relevant parameters. + +We will continue to refine these specifications in our research work, +to deduplicate +the redundancies between them, improve their utility to researchers and +engineers, and to improve their verifiability. For now, they provide a complete +description of the fast sync protocol in TLA+; especially the +[scheduler.tla](scheduler.tla), which maps very closely to the current +implementation of the [scheduler in Go](https://github.com/tendermint/tendermint/blob/master/blockchain/v2/scheduler.go). + +The [scheduler.tla](scheduler.tla) can be model checked in TLC with the following +parameters: + +- Constants: + - numRequests <- 2 + - PeerIDs <- 0..2 + - ultimateHeight <- 3 +- Invariants: + - TypeOK +- Properties: + - TerminationWhenNoAdvance + - TerminationGoodPeers + - TerminationAllCases +- Proofs that properties are not vacuously true: + - TerminationGoodPeersPre + - TerminationAllCases + - SchedulerIncreasePre diff --git a/rust-spec/fastsync/Tinychain.tla b/rust-spec/fastsync/Tinychain.tla new file mode 100644 index 000000000..26200c4f6 --- /dev/null +++ b/rust-spec/fastsync/Tinychain.tla @@ -0,0 +1,113 @@ +-------------------------- MODULE Tinychain ---------------------------------- +(* A very abstract model of Tendermint blockchain. Its only purpose is to highlight + the relation between validator sets, next validator sets, and last commits. + *) + +EXTENDS Integers + +\* type annotation +a <: b == a + +\* the type of validator sets, e.g., STRING +VST == STRING + +\* LastCommit type. +\* It contains: +\* 1) the flag of whether the block id equals to the hash of the previous block, and +\* 2) the set of the validators who have committed the block. +\* In the implementation, blockId is the hash of the previous block, which cannot be forged. +\* We abstract block id into whether it equals to the hash of the previous block or not. +LCT == [blockIdEqRef |-> BOOLEAN, committers |-> VST] + +\* Block type. +\* A block contains its height, validator set, next validator set, and last commit. +\* Moreover, it contains the flag that tells us whether the block is equal to the one +\* on the reference chain (this is an abstraction of hash). +BT == [height |-> Int, hashEqRef |-> BOOLEAN, wellFormed |-> BOOLEAN, + VS |-> VST, NextVS |-> VST, lastCommit |-> LCT] + +CONSTANTS + (* + A set of abstract values, each value representing a set of validators. + For the purposes of this specification, they can be any values, + e.g., "s1", "s2", etc. + *) + VALIDATOR_SETS, + (* a nil validator set that is outside of VALIDATOR_SETS *) + NIL_VS, + (* The maximal height, up to which the blockchain may grow *) + MAX_HEIGHT + +Heights == 1..MAX_HEIGHT + +\* the set of all potential commits +Commits == [blockIdEqRef: BOOLEAN, committers: VALIDATOR_SETS] + +\* the set of all potential blocks, not necessarily coming from the blockchain +Blocks == + [height: Heights, hashEqRef: BOOLEAN, wellFormed: BOOLEAN, + VS: VALIDATOR_SETS, NextVS: VALIDATOR_SETS, lastCommit: Commits] + +\* Does the chain contain a sound sequence of blocks that could be produced by +\* a 2/3 of faulty validators. This operator can be used to initialise the chain! +\* Since we are abstracting validator sets with VALIDATOR_SETS, which are +\* 2/3 quorums, we just compare committers to those sets. In a more detailed +\* specification, one would write the \subseteq operator instead of equality. +IsCorrectChain(chain) == + \* restrict the structure of the blocks, to decrease the TLC search space + LET OkCommits == [blockIdEqRef: {TRUE}, committers: VALIDATOR_SETS] + OkBlocks == [height: Heights, hashEqRef: {TRUE}, wellFormed: {TRUE}, + VS: VALIDATOR_SETS, NextVS: VALIDATOR_SETS, lastCommit: OkCommits] + IN + /\ chain \in [1..MAX_HEIGHT -> OkBlocks] + /\ \A h \in 1..MAX_HEIGHT: + LET b == chain[h] IN + /\ b.height = h \* the height is correct + /\ h > 1 => + LET p == chain[h - 1] IN + /\ b.VS = p.NextVS \* the validators propagate from the previous block + /\ b.lastCommit.committers = p.VS \* and they are the committers + + +\* The basic properties of blocks on the blockchain: +\* They should pass the validity check and they may verify the next block. + +\* Does the block pass the consistency check against the next validators of the previous block +IsMatchingValidators(block, nextVS) == + \* simply check that the validator set is propagated correctly. + \* (the implementation tests hashes and the application state) + block.VS = nextVS + +\* Does the block verify the commit (of the next block) +PossibleCommit(block, commit) == + \* the commits are signed by the block validators + /\ commit.committers = block.VS + \* The block id in the commit matches the block hash (abstract comparison). + \* (The implementation has extensive tests for that.) + \* this is an abstraction of: commit.blockId = hash(block) + \* + \* These are possible scenarios on the concrete hashes: + \* + \* scenario 1: commit.blockId = 10 /\ hash(block) = 10 /\ hash(ref) = 10 + \* scenario 2: commit.blockId = 20 /\ hash(block) = 20 /\ block.VS /= ref.VS + \* scenario 3: commit.blockId = 50 /\ hash(block) = 100 + \* scenario 4: commit.blockId = 10 /\ hash(block) = 100 + \* scenario 5: commit.blockId = 100 /\ hash(block) = 10 + /\ commit.blockIdEqRef = block.hashEqRef + \* the following test would be cheating, as we do not have access to the + \* reference chain: + \* /\ commit.blockIdEqRef + +\* Basic invariants + +\* every block has the validator set that is chosen by its predecessor +ValidBlockInv(chain) == + \A h \in 2..MAX_HEIGHT: + IsMatchingValidators(chain[h], chain[h - 1].NextVS) + +\* last commit of every block is signed by the validators of the predecessor +VerifiedBlockInv(chain) == + \A h \in 2..MAX_HEIGHT: + PossibleCommit(chain[h - 1], chain[h].lastCommit) + +================================================================================== diff --git a/rust-spec/fastsync/fastsync.md b/rust-spec/fastsync/fastsync.md new file mode 100644 index 000000000..aaf0e7334 --- /dev/null +++ b/rust-spec/fastsync/fastsync.md @@ -0,0 +1,1216 @@ +# Fastsync + +Fastsync is a protocol that is used by a node to catch-up to the +current state of a Tendermint blockchain. Its typical use case is a +node that was disconnected from the system for some time. The +recovering node locally has a copy of a prefix of the blockchain, +and the corresponding application state that is slightly outdated. It +then queries its peers for the blocks that were decided on by the +Tendermint blockchain during the period the full node was +disconnected. After receiving these blocks, it executes the +transactions in the blocks in order to catch-up to the current height +of the blockchain and the corresponding application state. + +In practice it is sufficient to catch-up only close to the current +height: The Tendermint consensus reactor implements its own catch-up +functionality and can synchronize a node that is close to the current height, +perhaps within 10 blocks away from the current height of the blockchain. +Fastsync should bring a node within this range. + +## Outline + +- [Part I](#part-i---tendermint-blockchain): Introduction of Tendermint +blockchain terms that are relevant for FastSync protocol. + +- [Part II](#part-ii---sequential-definition-of-fastsync-problem): Introduction +of the problem addressed by the Fastsync protocol. + - [Fastsync Informal Problem + statement](#Fastsync-Informal-Problem-statement): For the general + audience, that is, engineers who want to get an overview over what + the component is doing from a bird's eye view. + + - [Sequential Problem statement](#Sequential-Problem-statement): + Provides a mathematical definition of the problem statement in + its sequential form, that is, ignoring the distributed aspect of + the implementation of the blockchain. + +- [Part III](#part-iii---fastsync-as-distributed-system): Distributed + aspects of the fast sync problem, system assumptions and temporal + logic specifications. + + - [Computational Model](#Computational-Model): + timing and correctness assumptions. + + - [Distributed Problem Statement](#Distributed-Problem-Statement): + temporal properties that formalize safety and liveness + properties of fast sync in distributed setting. + +- [Part IV](#part-iv---fastsync-protocol): Specification of Fastsync V2 + (the protocol underlying the current Golang implementation). + + - [Definitions](#Definitions): Describes inputs, outputs, + variables used by the protocol, auxiliary functions + + - [FastSync V2](#FastSync-V2): gives an outline of the solution, + and details of the functions used (with preconditions, + postconditions, error conditions). + + - [Algorithm Invariants](#Algorithm-Invariants): invariants over + the protocol variables that the implementation should maintain. + +- [Part V](#part-v---analysis-and-improvements): Analysis + of Fastsync V2 that highlights several issues that prevent achieving + some of the desired fault-tolerance properties. We also give some + suggestions on how to address the issues in the future. + + - [Analysis of Fastsync V2](#Analysis-of-Fastsync-V2): describes + undesirable scenarios of Fastsync V2, and why they violate + desirable temporal logic specification in an unreliable + distributed system. + + - [Suggestions](#Suggestions-for-an-Improved-Fastsync-Implementation) to address the issues discussed in the analysis. + +In this document we quite extensively use tags in order to be able to +reference assumptions, invariants, etc. in future communication. In +these tags we frequently use the following short forms: + +- TMBC: Tendermint blockchain +- SEQ: for sequential specifications +- FS: Fastsync +- LIVE: liveness +- SAFE: safety +- INV: invariant +- A: assumption +- V2: refers to specifics of Fastsync V2 +- FS-VAR: refers to properties of Fastsync protocol variables +- NewFS: refers to improved future Fastsync implementations + +# Part I - Tendermint Blockchain + +We will briefly list some of the notions of Tendermint blockchains that are +required for this specification. More details can be found [here][block]. + +#### **[TMBC-HEADER]** + +A set of blockchain transactions is stored in a data structure called +*block*, which contains a field called *header*. (The data structure +*block* is defined [here][block]). As the header contains hashes to +the relevant fields of the block, for the purpose of this +specification, we will assume that the blockchain is a list of +headers, rather than a list of blocks. + +#### **[TMBC-SEQ]** + +The Tendermint blockchain is a list *chain* of headers. + +#### **[TMBC-SEQ-GROW]** + +During operation, new headers may be appended to the list one by one. + +> In the following, *ETIME* is a lower bound +> on the time interval between the times at which two +> successor blocks are added. + +#### **[TMBC-SEQ-APPEND-E]** + +If a header is appended at time *t* then no additional header will be +appended before time *t + ETIME*. + +#### **[TMBC-AUTH-BYZ]** + +We assume the authenticated Byzantine fault model in which no node (faulty or +correct) may break digital signatures, but otherwise, no additional +assumption is made about the internal behavior of faulty +nodes. That is, faulty nodes are only limited in that they cannot forge +messages. + + + + + +> We observed that in the existing documentation the term +> *validator* refers to both a data structure and a full node that +> participates in the distributed computation. Therefore, we introduce +> the notions *validator pair* and *validator node*, respectively, to +> distinguish these notions in the cases where they are not clear from +> the context. + +#### **[TMBC-VALIDATOR-PAIR]** + +Given a full node, a +*validator pair* is a pair *(address, voting_power)*, where + +- *address* is the address (public key) of a full node, +- *voting_power* is an integer (representing the full node's + voting power in a given consensus instance). + +> In the Golang implementation the data type for *validator +> pair* is called `Validator`. + +#### **[TMBC-VALIDATOR-SET]** + +A *validator set* is a set of validator pairs. For a validator set +*vs*, we write *TotalVotingPower(vs)* for the sum of the voting powers +of its validator pairs. + +#### **[TMBC-CORRECT]** + +We define a predicate *correctUntil(n, t)*, where *n* is a node and *t* is a +time point. +The predicate *correctUntil(n, t)* is true if and only if the node *n* +follows all the protocols (at least) until time *t*. + +#### **[TMBC-TIME-PARAMS]** + +A blockchain has the following configuration parameters: + +- *unbondingPeriod*: a time duration. +- *trustingPeriod*: a time duration smaller than *unbondingPeriod*. + +#### **[TMBC-FM-2THIRDS]** + +If a block *h* is in the chain, +then there exists a subset *CorrV* +of *h.NextValidators*, such that: + +- *TotalVotingPower(CorrV) > 2/3 + TotalVotingPower(h.NextValidators)*; +- For every validator pair *(n,p)* in *CorrV*, it holds *correctUntil(n, + h.Time + trustingPeriod)*. + +#### **[TMBC-CORR-FULL]** + +Every correct full node locally stores a prefix of the +current list of headers from [**[TMBC-SEQ]**][TMBC-SEQ-link]. + +# Part II - Sequential Definition of Fastsync Problem + +## Fastsync Informal Problem statement + +A full node has as input a block of the blockchain at height *h* and +the corresponding application state (or the prefix of the current +blockchain until height *h*). It has access to a set *peerIDs* of full +nodes called *peers* that it knows of. The full node uses the peers +to read blocks of the Tendermint blockchain (in a safe way, that is, +it checks the soundness conditions), until it has read the most recent +block and then terminates. + +## Sequential Problem statement + +*Fastsync* gets as input a block of height *h* and the corresponding +application state *s* that corresponds to the block and state of that +height of the blockchain, and produces +as output (i) a list *L* of blocks starting at height *h* to some height +*terminationHeight*, and (ii) the application state when applying the +transactions of the list *L* to *s*. + +> In Tendermint, the commit for block of height *h* is contained in block *h + 1*, +> and thus the block of height *h + 1* is needed to verify the block of +> height *h*. Let us therefore clarify the following on the +> termination height: +> The returned value *terminationHeight* is the height of the block with the largest +> height that could be verified. In order to do so, *Fastsync* needs the +> block at height *terminationHeight + 1* of the blockchain. + +Fastsync has to satisfy the following properties: + +#### **[FS-SEQ-SAFE-START]** + +Let *bh* be the height of the blockchain at the time *Fastsync* +starts. By assumption we have *bh >= h*. +When *Fastsync* terminates, it outputs a list of all blocks from +height *h* to some height *terminationHeight >= bh - 1*. + +> The above property is independent of how many blocks are added to the +> blockchain while Fastsync is running. It links the target height to the +> initial state. If Fastsync has to catch-up many blocks, it would be +> better to link the target height to a time close to the +> termination. This is captured by the following specification: + +#### **[FS-SEQ-SAFE-SYNC]** + +Let *eh* be the height of the blockchain at the time *Fastsync* +terminates. There is a constant *D >= 1* such that when *Fastsync* +terminates, it outputs a list of all blocks from height *h* to some +height *terminationHeight >= eh - D*. + +#### **[FS-SEQ-SAFE-STATE]** + +Upon termination, the application state is the one that corresponds to +the blockchain at height *terminationHeight*. + +#### **[FS-SEQ-LIVE]** + +*Fastsync* eventually terminates. + +# Part III - FastSync as Distributed System + +## Computational Model + +#### **[FS-A-NODE]** + +We consider a node *FS* that performs *Fastsync*. + +#### **[FS-A-PEER-IDS]** + +*FS* has access to a set *peerIDs* of IDs (public keys) of peers + . During the execution of *Fastsync*, another protocol (outside + of this specification) may add new IDs to *peerIDs*. + +#### **[FS-A-PEER]** + +Peers can be faulty, and we do not make any assumptions about the number or +ratio of correct/faulty nodes. Faulty processes may be Byzantine +according to [**[TMBC-AUTH-BYZ]**][TMBC-Auth-Byz-link]. + +#### **[FS-A-VAL]** + +The system satisfies [**[TMBC-AUTH-BYZ]**][TMBC-Auth-Byz-link] and +[**[TMBC-FM-2THIRDS]**][TMBC-FM-2THIRDS-link]. Thus, there is a +blockchain that satisfies the soundness requirements (that is, the +validation rules in [[block]]). + +#### **[FS-A-COMM]** + +Communication between the node *FS* and all correct peers is reliable and +bounded in time: there is a message end-to-end delay *Delta* such that +if a message is sent at time *t* by a correct process to a correct +process, then it will be received and processed by time *t + +Delta*. This implies that we need a timeout of at least *2 Delta* for +remote procedure calls to ensure that the response of a correct peer +arrives before the timeout expires. + +## Distributed Problem Statement + +### Two Kinds of Termination + +We do not assume that there is a correct full node in +*peerIDs*. Under this assumption no protocol can guarantee the combination +of the properties [FS-SEQ-LIVE] and +[FS-SEQ-SAFE-START] and [FS-SEQ-SAFE-SYNC] described in the sequential +specification above. Thus, in the (unreliable) distributed setting, we +consider two kinds of termination (successful and failure) and we will +specify below under what (favorable) conditions *Fastsync* ensures to +terminate successfully, and satisfy the requirements of the sequential +problem statement: + +#### **[FS-DIST-LIVE]** + +*Fastsync* eventually terminates: it either *terminates successfully* or +it *terminates with failure*. + +### Fairness + +As mentioned above, without assumptions on the correctness of some +peers, no protocol can achieve the required specifications. Therefore, +we consider the following (fairness) constraint in the +safety and liveness properties below: + +#### **[FS-SOME-CORR-PEER]** + +Initially, the set *peerIDs* contains at least one correct full node. + +> While in principle the above condition [FS-SOME-CORR-PEER] +> can be part of a sufficient +> condition to solve [FS-SEQ-LIVE] and +> [FS-SEQ-SAFE-START] and [FS-SEQ-SAFE-SYNC] in the distributed +> setting (their corresponding properties are given below), we will discuss in +> [Part V](#part-v---analysis-and-improvements) that the +> current implementation of Fastsync (V2) requires the much +> stronger requirement [**[FS-ALL-CORR-PEER]**](#FS-ALL-CORR-PEER) +> given in Part V. + +### Safety + +> As this specification does +> not assume that a correct peer is at the most recent height +> of the blockchain (it might lag behind), the property [FS-SEQ-SAFE-START] +> cannot be ensured in an unreliable distributed setting. We consider +> the following relaxation. (Which is typically sufficient for +> Tendermint, as the consensus reactor then synchronizes from that +> height.) + +#### **[FS-DIST-SAFE-START]** + +Let *maxh* be the maximum +height of a correct peer [**[TMBC-CORR-FULL]**][TMBC-CORR-FULL-link] +in *peerIDs* at the time *Fastsync* starts. If *FastSync* terminates +successfully, it is at some height *terminationHeight >= maxh - 1*. + +> To address [FS-SEQ-SAFE-SYNC] we consider the following property in +> the distributed setting. See the comments below on the relation to +> the sequential version. + +#### **[FS-DIST-SAFE-SYNC]** + +Under [FS-SOME-CORR-PEER], there exists a constant time interval *TD*, such +that if *term* is the time *Fastsync* terminates and +*maxh* is the maximum height of a correct peer +[**[TMBC-CORR-FULL]**][TMBC-CORR-FULL-link] in *peerIDs* at the time +*term - TD*, then if *FastSync* terminates successfully, it is at +some height *terminationHeight >= maxh - 1*. + +> *TD* might depend on timeouts etc. We suggest that an acceptable +> value for *TD* is in the range of approx. 10 sec., that is the +> interval between two calls `QueryStatus()`; see below. +> We use *term - TD* as reference time, as we have to account +> for communication delay between the peer and *FS*. After the peer sent +> the last message to *FS*, the peer and *FS* run concurrently and +> independently. There is no assumption on the rate at which a peer can +> add blocks (e.g., it might be in the process of catching up +> itself). Hence, without additional assumption we cannot link +> [FS-DIST-SAFE-SYNC] to +> [**[FS-SEQ-SAFE-SYNC]**](#FS-SEQ-SAFE-SYNC), in particular to the +> parameter *D*. We discuss a +> way to achieve this below: +> **Relation to [FS-SEQ-SAFE-SYNC]:** +> Under [FS-SOME-CORR-PEER], if *peerIDs* contains a full node that is +> "synchronized with the blockchain", and *blockchainheight* is the height +> of the blockchain at time *term*, then *terminationHeight* may even +> achieve +> *blockchainheight - TD / ETIME*; +> cf. [**[TMBC-SEQ-APPEND-E]**][TMBC-SEQ-APPEND-E-link], that is, +> the parameter *D* from [FS-SEQ-SAFE-SYNC] is in the range of *TD / ETIME*. + +#### **[FS-DIST-SAFE-STATE]** + +It is the same as the sequential version +[**[FS-SEQ-SAFE-STATE]**](#FS-SEQ-SAFE-STATE). + +#### **[FS-DIST-NONABORT]** + +If there is one correct process in *peerIDs* [FS-SOME-CORR-PEER], +*Fastsync* never terminates with failure. (Together with [FS-DIST-LIVE] + that means it will terminate successfully.) + +# Part IV - Fastsync protocol + +Here we provide a specification of the FastSync V2 protocol as it is currently +implemented. The V2 design is the result of significant refactoring to improve +the testability and determinism in the implementation. The architecture is +detailed in +[ADR-43](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-043-blockchain-riri-org.md). + +In the original design, a go-routine (thread of execution) was spawned for each block requested, and +was responsible for both protocol logic and IO. In the V2 design, protocol logic +is decoupled from IO by using three total threads of execution: a scheduler, a +processer, and a demuxer. + +The scheduler contains the business logic for managing +peers and requesting blocks from them, while the processor handles the +computationally expensive block execution. Both the scheduler and processor +are structured as finite state machines that receive input events and emit +output events. The demuxer is responsible for all IO, including translating +between internal events and IO messages, and routing events between components. + +Protocols in Tendermint can be considered to consist of two +components: a "core" state machine and a "peer" state machine. The core state +machine refers to the internal state managed by the node, while the peer state +machine determines what messages to send to peers. In the FastSync design, the +core and peer state machines correspond to the processor and scheduler, +respectively. + +In the case of FastSync, the core state machine (the processor) is effectively +just the Tendermint block execution function, while virtually all protocol logic +is contained in the peer state machine (the scheduler). The processor is +only implemented as a separate component due to the computationally expensive nature +of block execution. We therefore focus our specification here on the peer state machine +(the scheduler component), capturing the core state machine (the processor component) +in the single `Execute` function, defined below. + +While the internal details of the `Execute` function are not relevant for the +FastSync protocol and are thus not part of this specification, they will be +defined in detail at a later date in a separate Block Execution specification. + +## Definitions + +> We now introduce variables and auxiliary functions used by the protocol. + +### Inputs + +- *startBlock*: the block Fastsync starts from +- *startState*: application state corresponding to *startBlock.Height* + +#### **[FS-A-V2-INIT]** + +- *startBlock* is from the blockchain +- *startState* is the application state of the blockchain at Height *startBlock.Height*. + +### Variables + +- *height*: kinitially *startBlock.Height + 1* + > height should be thought of the "height of the next block we need to download" +- *state*: initially *startState* +- *peerIDs*: peer addresses [FS-A-PEER-IDS](#fs-a-peer-ids) +- *peerHeights*: stores for each peer the height it reported. initially 0 +- *pendingBlocks*: stores for each height which peer was + queried. initially nil for each height +- *receivedBlocks*: stores for each height which peer returned + it. initially nil +- *blockstore*: stores for each height greater than + *startBlock.Height*, the block of that height. initially nil for + all heights +- *peerTimeStamp*: stores for each peer the last time a block was + received + +- *pendingTime*: stores for a given height the time a block was requested +- *peerRate*: stores for each peer the rate of received data in Bytes/second + +### Auxiliary Functions + +#### **[FS-FUNC-TARGET]** + +- *TargetHeight = max {peerHeigts(addr): addr in peerIDs} union {height}* + +#### **[FS-FUNC-MATCH]** + +```go +func VerifyCommit(b Block, c Commit) Boolean +``` + +- Comment + - Corresponds to `verifyCommit(chainID string, blockID + types.BlockID, height int64, commit *types.Commit) error` in the + current Golang implementation, which expects blockID and height + (from the first block) and the + corresponding commit from the following block. We use the + simplified form for ease in presentation. + +- Implementation remark + + + + - implements the check that *c* is a valid commit for block *b* +- Expected precondition + - *c* is a valid commit for block *b* +- Expected postcondition + - *true* if precondition holds + - *false* if precondition is violated +- Error condition + - none + +---- + +### Messages + +Peers participating in FastSync exchange the following set of messages. Messages are +encoded using the Amino serialization protocol. We define each message here +using Go syntax, annoted with the Amino type name. The prefix `bc` refers to +`blockchain`, which is the name of the FastSync reactor in the Go +implementation. + +#### bcBlockRequestMessage + +```go +// type: "tendermint/blockchain/BlockRequest" +type bcBlockRequestMessage struct { + Height int64 +} +``` + +Remark: + +- `msg.Height` > 0 + +#### bcNoBlockResponseMessage + +```go +// type: "tendermint/blockchain/NoBlockResponse" +type bcNoBlockResponseMessage struct { + Height int64 +} +``` + +Remark: + +- `msg.Height` > 0 +- This message type is included in the protocol for convenience and is not expected to be sent between two correct peers + +#### bcBlockResponseMessage + +```go +// type: "tendermint/blockchain/BlockResponse" +type bcBlockResponseMessage struct { + Block *types.Block +} +``` + +Remark: + +- `msg.Block` is a Tendermint block as defined in [[block]]. +- `msg.Block` != nil + +#### bcStatusRequestMessage + +```go +// type: "tendermint/blockchain/StatusRequest" +type bcStatusRequestMessage struct { + Height int64 +} +``` + +Remark: + +- `msg.Height` > 0 + +#### bcStatusResponseMessage + +```go +// type: "tendermint/blockchain/StatusResponse" +type bcStatusResponseMessage struct { + Height int64 +} +``` + +Remark: + +- `msg.Height` > 0 + +### Remote Functions + +Peers expose the following functions over +remote procedure calls. The "Expected precondition" are only expected for +correct peers (as no assumption is made on internals of faulty +processes [FS-A-PEER]). These functions are implemented using the above defined message types. + +> In this document we describe the communication with peers +via asynchronous RPCs. + +```go +func Status(addr Address) (int64, error) +``` + +- Implementation remark + - RPC to full node *addr* + - Request message: `bcStatusRequestMessage`. + - Response message: `bcStatusResponseMessage`. +- Expected precondition + - none +- Expected postcondition + - if *addr* is correct: Returns the current height `height` of the + peer. [FS-A-COMM] + - if *addr* is faulty: Returns an arbitrary height. [**[TMBC-AUTH-BYZ]**][TMBC-Auth-Byz-link] +- Error condition + - if *addr* is correct: none. By [FS-A-COMM] we assume communication is reliable and timely. + - if *addr* is faulty: arbitrary error (including timeout). [**[TMBC-AUTH-BYZ]**][TMBC-Auth-Byz-link] + +---- + + ```go +func Block(addr Address, height int64) (Block, error) +``` + +- Implementation remark + - RPC to full node *addr* + - Request message: `bcBlockRequestMessage`. + - Response message: `bcBlockResponseMessage` or `bcNoBlockResponseMessage`. +- Expected precondition + - 'height` is less than or equal to height of the peer +- Expected postcondition + - if *addr* is correct: Returns the block of height `height` + from the blockchain. [FS-A-COMM] + - if *addr* is faulty: Returns arbitrary or no block [**[TMBC-AUTH-BYZ]**][TMBC-Auth-Byz-link] +- Error condition + - if *addr* is correct: precondition violated (returns `bcNoBlockResponseMessage`). [FS-A-COMM] + - if *addr* is faulty: arbitrary error (including timeout). [**[TMBC-AUTH-BYZ]**][TMBC-Auth-Byz-link] + +---- + +## FastSync V2 + +### Outline + +The protocol is described in terms of functions that are triggered by +(external) events. The implementation uses a scheduler and a +de-multiplexer to deal with communicating with peers and to +trigger the execution of these functions: + +- `QueryStatus()`: regularly (currently every 10sec; necessarily + interval greater than *2 Delta*) queries all peers from *peerIDs* + for their current height [TMBC-CORR-FULL]. It does so + by calling `Status(n)` remotely on all peers *n*. + +- `CreateRequest`: regularly checks whether certain blocks have no + open request. If a block does not have an open request, it requests + one from a peer. It does so by calling `Block(n,h)` remotely on one + peer *n* for a missing height *h*. + +> We have left the strategy how peers are selected unspecified, and +> the currently existing different implementations of Fastsync differ +> in this aspect. In V2, a peer *p* is selected with the minimum number of +> pending requests that can serve the required height *h*, that is +> with *peerHeight(p) >= h*. + +The functions `Status` and `Block` are called by asynchronous +RPC. When they return, the following functions are called: + +- `OnStatusResponse(addr Address, height int64)`: The full node with + address *addr* returns its current height. The function updates the height + information about *addr*, and may also increase *TargetHeight*. + +- `OnBlockResponse(addr Address, b Block)`. The full node with + address *addr* returns a block. It is added to *blockstore*. Then + the auxiliary function `Execute` is called. + +- `Execute()`: Iterates over the *blockstore*. Checks soundness of + the blocks, and + executes the transactions of a sound block and updates *state*. + +> In addition to the functions above, the following two features are +> implemented in Fastsync V2 + +#### **[FS-V2-PEER-REMOVE]** + +Periodically, *peerTimeStamp* and *peerRate* and *pendingTime* are +analyzed. +If a peer *p* +has not provided a block recently (check of *peerTimeStamp[p]*) or it +has not provided sufficiently many data (check of *peerRate[p]*), then +*p* is removed from *peerIDs*. In addition, *pendingTime* is used to +estimate whether the peer that is responsible for the current height +has provided the corresponding block on time. + +#### **[FS-V2-TIMEOUT]** + +*Fastsync V2* starts a timeout whenever a block is +executed (that is, when the height is incremented). If the timeout expires +before the next block is executed, *Fastsync* terminates. +If this happens, then *Fastsync* terminates +with failure. + +### Details + + + +```go +func QueryStatus() +``` + +- Expected precondition + - peerIDs initialized and non-empty +- Expected postcondition + - call asynchronously `Status(n)` at each peer *n* in *peerIDs*. +- Error condition + - fails if precondition is violated + +---- + +```go +func OnStatusResponse(addr Address, ht int64) +``` + +- Comment + - *ht* is a height + - peers can provide the status without being called +- Expected precondition + - *peerHeights(addr) <= ht* +- Expected postcondition + - *peerHeights(addr) = ht* + - *TargetHeight* is updated +- Error condition + - if precondition is violated: *addr* not in *peerIDs* (that is, + *addr* is removed from *peerIDs*) +- Timeout condition + - if `OnStatusResponse(addr, ht)` was not invoked within *2 Delta* after + `Status(addr)` was called: *addr* not in *peerIDs* + +---- + +```go +func CreateRequest +``` + +- Expected precondition + - *height < TargetHeight* + - *peerIDs* nonempty +- Expected postcondition + - Function `Block` is called remotely at a peer *addr* in peerIDs + for a missing height *h* + *Remark:* different implementations may have different + strategies to balance the load over the peers + - *pendingblocks(h) = addr* + +---- + +```go +func OnBlockResponse(addr Address, b Block) +``` + +- Comment + - if after adding block *b*, blocks of heights *height* and + *height + 1* are in *blockstore*, then `Execute` is called +- Expected precondition + - *pendingblocks(b.Height) = addr* + - *b* satisfies basic soundness +- Expected postcondition + - if function `Execute` has been executed without error or was not + executed: + - *receivedBlocks(b.Height) = addr* + - *blockstore(b.Height) = b* + - *peerTimeStamp[addr]* is set to a time between invocation and + return of the function. + - *peerRate[addr]* is updated according to size of received + block and time it has passed between current time and last block received from this peer (addr) +- Error condition + - if precondition is violated: *addr* not in *peerIDs*; reset + *pendingblocks(b.Height)* to nil; +- Timeout condition + - if `OnBlockResponse(addr, b)` was not invoked within *2 Delta* after + `Block(addr,h)` was called for *b.Height = h*: *addr* not in *peerIDs* + +---- + +```go +func Execute() +``` + +- Comments + - none +- Expected precondition + - application state is the one of the blockchain at height + *height - 1* + - **[FS-V2-Verif]** for any two blocks *a* and *b* from + *receivedBlocks*: if + *a.Height + 1 = b.Height* then *VerifyCommit (a,b.Commit) = true* +- Expected postcondition + - Any two blocks *a* and *b* violating [FS-V2-Verif]: + *a* and *b* not in *blockstore*; nodes with Address + receivedBlocks(a.Height) and receivedBlocks(b.Height) not in peerIDs + - height is updated height of complete prefix that matches the blockchain + - state is the one of the blockchain at height *height - 1* + - if the new value of *height* is equal to *TargetHeight*, then + Fastsync + **terminates + successfully**. +- Error condition + - none + +---- + +## Algorithm Invariants + +> In contrast to the temporal properties above that define the problem +> statement, the following are invariants on the solution to the +> problem, that is on the algorithm. These invariants are useful for +> the verification, but can also guide the implementation. + +#### **[FS-VAR-STATE-INV]** + +It is always the case that *state* corresponds to the application state of the +blockchain of that height, that is, *state = chain[height - +1].AppState*; *chain* is defined in +[**[TMBC-SEQ]**][TMBC-SEQ-link]. + +#### **[FS-VAR-PEER-INV]** + +It is always the case that the set *peerIDs* only contains nodes that +have not yet misbehaved (by sending wrong data or timing out). + +#### **[FS-VAR-BLOCK-INV]** + +For *startBlock.Height <= i < height - 1*, let *b(i)* be the block with +height *i* in *blockstore*, it always holds that +*VerifyCommit(b(i), b(i+1).Commit) = true*. This means that *height* +can only be incremented if all blocks with lower height have been verified. + +# Part V - Analysis and Improvements + +## Analysis of Fastsync V2 + +#### **[FS-ISSUE-KILL]** + +If two blocks are not matching [FS-V2-Verif], `Execute` dismisses both +blocks and removes the peers that provided these blocks from +*peerIDs*. If block *a* was correct and provided by a correct peer *p*, +and block b was faulty and provided by a faulty peer, the protocol + +- removes the correct peer *p*, although it might be useful to + download blocks from it in the future +- removes the block *a*, so that a fresh copy of *a* needs to be downloaded + again from another peer + +By [FS-A-PEER] we do not put a restriction on the number + of faulty peers, so that faulty peers can make *FS* to remove all + correct peers from *peerIDs*. As a result, this version of + *Fastsync* violates [FS-DIST-SAFE-SYNC]. + +#### **[FS-ISSUE-NON-TERM]** + +Due to [**[FS-ISSUE-KILL]**](#fs-issue-kill), from some point on, only +faulty peers may be in *peerIDs*. They can thus control at which rate +*Fastsync* gets blocks. If the timeout duration from [FS-V2-TIMEOUT] +is greater than the time it takes to add a block to the blockchain +(LTIME in [**[TMBC-SEQ-APPEND-E]**][TMBC-SEQ-APPEND-E-link]), the +protocol may never terminate and thus violate [FS-DIST-LIVE]. This +scenario is even possible if a correct peer is always in *peerIDs*, +but faulty peers are regularly asked for blocks. + +### Consequence + +The issues [FS-ISSUE-KILL] and [FS-ISSUE-NON-TERM] explain why +does not satisfy the property [FS-DIST-LIVE] relevant for termination. +As a result, V2 only solves the specifications in a restricted form, +namely, when all peers are correct: + +#### **[FS-ALL-CORR-PEER]** + +At all times, the set *peerIDs* contains only correct full nodes. + +With this restriction we can give the achieved properties: + +#### **[FS-VC-ALL-CORR-NONABORT]** + +Under [FS-ALL-CORR-PEER], *Fastsync* never terminates with failure. + +#### **[FS-VC-ALL-CORR-LIVE]** + +Under [FS-ALL-CORR-PEER], *Fastsync* eventually terminates successfully. + +> In a fault tolerance context this is problematic, +> as it means that faulty peers can prevent *FastSync* from termination. +> We observe that this also touches other properties, namely, +> [FS-DIST-SAFE-START] and [FS-DIST-SAFE-SYNC]: +> Termination at an acceptable height are all conditional under +> "successful termination". The properties above severely restrict +> under which circumstances FastSync (V2) terminates successfully. +> As a result, large parts of the current +> implementation of are not fault-tolerant. We will +> discuss this, and suggestions how to solve this after the +> description of the current protocol. + +## Suggestions for an Improved Fastsync Implementation + +### Solution for [FS-ISSUE-KILL] + +To avoid [FS-ISSUE-KILL], we observe that +[**[TMBC-FM-2THIRDS]**][TMBC-FM-2THIRDS-link] ensures that from the +point a block was created, we assume that more than two thirds of the +validator nodes are correct until the *trustingPeriod* expires. Under +this assumption, assume the trusting period of *startBlock* is not +expired by the time *FastSync* checks a block *b1* with height +*startBlock.Height + 1*. To do so, we first need to check whether the +Commit in the block *b2* with *startBlock.Height + 2* contains more +than 2/3 of the voting power in *startBlock.NextValidators*. If this +is the case we can check *VerifyCommit (b1,b2.Commit)*. If we perform +checks in this order we observe: + +- By assumption, *startBlock* is OK, +- If the first check (2/3 of voting power) fails, + the peer that provided block *b2* is faulty, +- If the first check passes and the second check + fails (*VerifyCommit*), then the peer that provided *b1* is + faulty. +- If both checks pass, we can trust *b1* + +Based on this reasoning, we can ensure to only remove faulty peers +from *peerIDs*. That is, if +we sequentially verify blocks starting with *startBlock*, we will +never remove a correct peer from *peerIDs* and we will be able to +ensure the following invariant: + +#### **[NewFS-VAR-PEER-INV]** + +If a peer never misbehaves, it is never removed from *peerIDs*. It +follows that under [FS-SOME-CORR-PEER], *peerIDs* is always non-empty. + +> To ensure this, we suggest to change the protocol as follows: + +#### Fastsync has the following configuration parameters + +- *trustingPeriod*: a time duration; cf. + [**[TMBC-TIME-PARAMS]**][TMBC-TIME-PARAMS-link]. + +> [NewFS-A-INIT] is the suggested replacement of [FS-A-V2-INIT]. This will +> allow us to use the established trust to understand precisely which +> peer reported an invalid block in order to ensure the +> invariant [NewFS-VAR-TRUST-INV] below: + +#### **[NewFS-A-INIT]** + +- *startBlock* is from the blockchain, and within *trustingPeriod* +(possible with some extra margin to ensure termination before +*trustingPeriod* expired) +- *startState* is the application state of the blockchain at Height + *startBlock.Height*. +- *startHeight = startBlock.Height* + +#### Additional Variables + +- *trustedBlockstore*: stores for each height greater than or equal to + *startBlock.Height*, the block of that height. Initially it + contains only *startBlock* + +#### **[NewFS-VAR-TRUST-INV]** + +Let *b(i)* be the block in *trustedBlockstore* +with b(i).Height = i. It holds that +for *startHeight < i < height - 1*, +*VerifyCommit (b(i),b(i+1).Commit) = true*. + +> We propose to update the function `Execute`. To do so, we first +> define the following helper functions: + +```go +func ValidCommit(VS ValidatorSet, C Commit) Boolean +``` + +- Comments + - checks validator set based on [**[TMBC-FM-2THIRDS]**][TMBC-FM-2THIRDS-link] +- Expected precondition + - The validators in *C* + - are a subset of VS + - have more than 2/3 of the voting power in VS +- Expected postcondition + - returns *true* if precondition holds, and *false* otherwise +- Error condition + - none + +---- + +```go +func SequentialVerify { + while (true) { + b1 = blockstore[height]; + b2 = blockstore[height+1]; + if b1 == nil or b2 == nil { + exit; + } + if ValidCommit(trustedBlockstore[height - 1].NextValidators, b2.commit) { + // we trust b2 + if VerifyCommit(b1, b2.commit) { + trustedBlockstore.Add(b1); + height = height + 1; + } + else { + // as we trust b2, b1 must be faulty + blockstore.RemoveFromPeer(receivedBlocks[height]); + // we remove all blocks received from the faulty peer + peerIDs.Remove(receivedBlocks(bnew.Height)); + exit; + + } + } else { + // b2 is faulty + blockstore.RemoveFromPeer(receivedBlocks[height + 1]); + // we remove all blocks received from the faulty peer + peerIDs.Remove(receivedBlocks(bnew.Height)); + exit; } + } +} +``` + +- Comments + - none +- Expected precondition + - [NewFS-VAR-TRUST-INV] +- Expected postcondition + - [NewFS-VAR-TRUST-INV] + - there is no block *bnew* with *bnew.Height = height + 1* in + *blockstore* +- Error condition + - none + +---- + +> Then `Execute` just consists in calling `SequentialVerify` and then +> updating the application state to the (new) height. + +```go +func Execute() +``` + +- Comments + - first `SequentialVerify` is executed +- Expected precondition + - application state is the one of the blockchain at height + *height - 1* + - [NewFS-NOT-EXP] *trustedBlockstore[height-1].Time > now - trustingPeriod* +- Expected postcondition + - there is no block *bnew* with *bnew.Height = height + 1* in + *blockstore* + - state is the one of the blockchain at height *height - 1* + - if height = TargetHeight: **terminate successfully** +- Error condition + - fails if [NewFS-NOT-EXP] is violated + +---- + +### Solution for [FS-ISSUE-NON-TERM] + +As discussed above, the advantageous termination requirement is the +combination of [FS-DIST-LIVE] and [FS-DIST-NONABORT], that is, *Fastsync* +should terminate successfully in case there is at least one correct +peer in *peerIDs*. For this we have to ensure that faulty processes +cannot slow us down and provide blocks at a lower rate than the +blockchain may grow. To ensure that we will have to add an assumption +on message delays. + +#### **[NewFS-A-DELTA]** + +*2 Delta < ETIME*; cf. [**[TMBC-SEQ-APPEND-E]**][TMBC-SEQ-APPEND-E-link]. + +> This assumption implies that the timeouts for `OnBlockResponse` and +> `OnStatusResponse` are such that a faulty peer that tries to respond +> slower than *2 Delta* will be removed. In the following we will +> provide a rough estimate on termination time in a fault-prone +> scenario. +> In the following +> we assume that during a "long enough" finite good period no new +> faulty peers are added to *peerIDs*. Below we will sketch how "long +> enough" can be estimated based on the timing assumption in this +> specification. + +#### **[NewFS-A-STATUS-INTERVAL]** + +Let Sigma be the (upper bound on the) +time between two calls of `QueryStatus()`. + +#### **[NewFS-A-GOOD-PERIOD]** + +A time interval *[begin,end]* is *good period* if: + +- *fmax* is the number of faulty peers in *peerIDs* at time *begin* +- *end >= begin + 2 Delta (fmax + 3)* +- no faulty peer is added before time *end* + +> In the analysis below we assume that the termination condition of +> *Fastsync* is +> *height = TargetHeight* in the postcondition of +> `Execute`. Therefore, [NewFS-A-STATUS-INTERVAL] does not interfere +> with this analysis. If a correct peer reports a new height "shortly +> before termination" this leads to an additional round trip to +> request and add the new block. Then [NewFS-A-DELTA] ensures that +> *Fastsync* catches up. + +Arguments: + +1. If a faulty peer *p* reports a faulty block, `SequentialVerify` will + eventually remove *p* from *peerIDs* + +2. By `SequentialVerify`, if a faulty peer *p* reports multiple faulty + blocks, *p* will be removed upon trying to check the block with the + smallest height received from *p*. + +3. Assume whenever a block does not have an open request, `CreateRequest` is + called immediately, which calls `Block(n)` on a peer. Say this + happens at time *t*. There are two cases: + + - by t + 2 Delta a block is added to *blockStore* + - at t + 2 Delta `Block(n)` timed out and *n* is removed from + peer. + +4. Let *f(t)* be the number of faulty peers in *peerIDs* at time *t*; + *f(begin) = fmax*. + +5. Let t_i be the sequence of times `OnBlockResponse(addr,b)` is + invoked or times out with *b.Height = height + 1*. + +6. By 3., + - (a). *t_1 <= begin + 2 Delta* + - (b). *t_{i+1} <= t_i + 2 Delta* + +7. By an inductive argument we prove for *i > 0* that + + - (a). *height(t_{i+1}) > height(t_i)*, or + - (b). *f(t_{i+1}) < f(t_i))* and *height(t_{i+1}) = height(t_i)* + + Argument: if the peer is faulty and does not return a block, the + peer is removed, if it is faulty and returns a faulty block + `SequentialVerify` removes the peer (b). If the returned block is OK, + height is increased (a). + +8. By 2. and 7., faulty peers can delay incrementing the height at + most *fmax* times, where each time "costs" *2 Delta* seconds. We + have additional *2 Delta* initial offset (3a) plus *2 Delta* to get + all missing blocks after the last fault showed itself. (This + assumes that an arbitrary number of blocks can be obtained and + checked within one round-trip 2 Delta; which either needs + conservative estimation of Delta, or a more refined analysis). Thus + we reach the *targetHeight* and terminate by time *end*. + +# References + + + +[[block]] Specification of the block data structure. + + + +[block]: https://github.com/tendermint/spec/blob/d46cd7f573a2c6a2399fcab2cde981330aa63f37/spec/core/data_structures.md + + + +[TMBC-HEADER-link]: #tmbc-header + +[TMBC-SEQ-link]: #tmbc-seq + +[TMBC-CORR-FULL-link]: #tmbc-corrfull + +[TMBC-CORRECT-link]: #tmbc-correct + +[TMBC-Sign-link]: #tmbc-sign + +[TMBC-FaultyFull-link]: #tmbc-faultyfull + +[TMBC-TIME-PARAMS-link]: #tmbc-time-params + +[TMBC-SEQ-APPEND-E-link]: #tmbc-seq-append-e + +[TMBC-FM-2THIRDS-link]: #tmbc-fm-2thirds + +[TMBC-Auth-Byz-link]: #tmbc-auth-byz + +[TMBC-INV-SIGN-link]: #tmbc-inv-sign + +[TMBC-SOUND-DISTR-PossCommit--link]: #tmbc-sound-distr-posscommit + +[TMBC-INV-VALID-link]: #tmbc-inv-valid + + + + + + + + + + + + + + + + + + + + + + + + + +[LCV-VC-LIVE-link]: https://github.com/informalsystems/VDD/tree/master/lightclient/verification.md#lcv-vc-live + +[lightclient]: https://github.com/interchainio/tendermint-rs/blob/e2cb9aca0b95430fca2eac154edddc9588038982/docs/architecture/adr-002-lite-client.md + +[failuredetector]: https://github.com/informalsystems/VDD/blob/master/liteclient/failuredetector.md + +[fullnode]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md + +[FN-LuckyCase-link]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#fn-luckycase + +[blockchain-validator-set]: https://github.com/tendermint/spec/blob/d46cd7f573a2c6a2399fcab2cde981330aa63f37/spec/core/data_structures.md#data-structures + +[fullnode-data-structures]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#data-structures + +[FN-ManifestFaulty-link]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#fn-manifestfaulty diff --git a/rust-spec/fastsync/fastsync.tla b/rust-spec/fastsync/fastsync.tla new file mode 100644 index 000000000..e1b7b812b --- /dev/null +++ b/rust-spec/fastsync/fastsync.tla @@ -0,0 +1,825 @@ +----------------------------- MODULE fastsync ----------------------------- +(* + In this document we give the high level specification of the fast sync + protocol as implemented here: + https://github.com/tendermint/tendermint/tree/master/blockchain/v2. + +We assume a system in which one node is trying to sync with the blockchain +(replicated state machine) by downloading blocks from the set of full nodes +(we call them peers) that are block providers, and executing transactions +(part of the block) against the application. + +Peers can be faulty, and we don't make any assumption about the rate of correct/faulty +nodes in the node peerset (i.e., they can all be faulty). Correct peers are part +of the replicated state machine, i.e., they manage blockchain and execute +transactions against the same deterministic application. We don't make any +assumptions about the behavior of faulty processes. Processes (client and peers) +communicate by message passing. + + In this specification, we model this system with two parties: + - the node (state machine) that is doing fastsync and + - the environment with which node interacts. + +The environment consists of the set of (correct and faulty) peers with +which node interacts as part of fast sync protocol, but also contains some +aspects (adding/removing peers, timeout mechanism) that are part of the node +local environment (could be seen as part of the runtime in which node +executes). + +As part of the fast sync protocol a node and the peers exchange the following messages: + +- StatusRequest +- StatusResponse +- BlockRequest +- BlockResponse +- NoBlockResponse. + +A node is periodically issuing StatusRequests to query peers for their current height (to decide what +blocks to ask from what peers). Based on StatusResponses (that are sent by peers), the node queries +blocks for some height(s) by sending peers BlockRequest messages. A peer provides a requested block by +BlockResponse message. If a peer does not want to provide a requested block, then it sends NoBlockResponse message. +In addition to those messages, a node in this spec receives additional input messages (events): + +- AddPeer +- RemovePeer +- SyncTimeout. + +These are the control messages that are provided to the node by its execution enviornment. AddPeer +is for the case when a connection is established with a peer; similarly RemovePeer is for the case +a connection with the peer is terminated. Finally SyncTimeout is used to model a timeout trigger. + +We assume that fast sync protocol starts when connections with some number of peers +are established. Therefore, peer set is initialised with non-empty set of peer ids. Note however +that node does not know initially the peer heights. +*) + +EXTENDS Integers, FiniteSets, Sequences + + +CONSTANTS MAX_HEIGHT, \* the maximal height of blockchain + VALIDATOR_SETS, \* abstract set of validators + NIL_VS, \* a nil validator set + CORRECT, \* set of correct peers + FAULTY, \* set of faulty peers + TARGET_PENDING, \* maximum number of pending requests + downloaded blocks that are not yet processed + PEER_MAX_REQUESTS \* maximum number of pending requests per peer + +ASSUME CORRECT \intersect FAULTY = {} +ASSUME TARGET_PENDING > 0 +ASSUME PEER_MAX_REQUESTS > 0 + +\* the blockchain, see Tinychain +VARIABLE chain + +\* introduce tiny chain as the source of blocks for the correct nodes +INSTANCE Tinychain + +\* a special value for an undefined height +NilHeight == 0 + +\* the height of the genesis block +TrustedHeight == 1 + +\* the set of all peer ids the node can receive a message from +AllPeerIds == CORRECT \union FAULTY + +\* Correct last commit have enough voting power, i.e., +2/3 of the voting power of +\* the corresponding validator set (enoughVotingPower = TRUE) that signs blockId. +\* BlockId defines correct previous block (in the implementation it is the hash of the block). +\* Instead of blockId, we encode blockIdEqRef, which is true, if the block id is equal +\* to the hash of the previous block, see Tinychain. +CorrectLastCommit(h) == chain[h].lastCommit + +NilCommit == [blockIdEqRef |-> FALSE, committers |-> NIL_VS] + +\* correct node always supplies the blocks from the blockchain +CorrectBlock(h) == chain[h] + +NilBlock == + [height |-> 0, hashEqRef |-> FALSE, wellFormed |-> FALSE, + lastCommit |-> NilCommit, VS |-> NIL_VS, NextVS |-> NIL_VS] + +\* a special value for an undefined peer +NilPeer == "Nil" \* STRING for apalache efficiency + +\* control the state of the syncing node +States == { "running", "finished"} + +NoMsg == [type |-> "None"] + +\* the variables of the node running fastsync +VARIABLES + state, \* running or finished + (* + blockPool [ + height, \* current height we are trying to sync. Last block executed is height - 1 + peerIds, \* set of peers node is connected to + peerHeights, \* map of peer ids to its (stated) height + blockStore, \* map of heights to (received) blocks + receivedBlocks, \* map of heights to peer that has sent us the block (stored in blockStore) + pendingBlocks, \* map of heights to peer to which block request has been sent + syncHeight, \* height at the point syncTimeout was triggered last time + syncedBlocks \* number of blocks synced since last syncTimeout. If it is 0 when the next timeout occurs, then protocol terminates. + ] + *) + blockPool + + +\* the variables of the peers providing blocks +VARIABLES + (* + peersState [ + peerHeights, \* track peer heights + statusRequested, \* boolean set to true when StatusRequest is received. Models periodic sending of StatusRequests. + blocksRequested \* set of BlockRequests received that are not answered yet + ] + *) + peersState + + \* the variables for the network and scheduler +VARIABLES + turn, \* who is taking the turn: "Peers" or "Node" + inMsg, \* a node receives message by this variable + outMsg \* a node sends a message by this variable + + +(* the variables of the node *) +nvars == <> + +(*************** Type definitions for Apalache (model checker) **********************) +AsIntSet(S) == S <: {Int} + +\* type of process ids +PIDT == STRING +AsPidSet(S) == S <: {PIDT} + +\* ControlMessage type +CMT == [type |-> STRING, peerId |-> PIDT] \* type of control messages + +\* InMsg type +IMT == [type |-> STRING, peerId |-> PIDT, height |-> Int, block |-> BT] +AsInMsg(m) == m <: IMT +AsInMsgSet(S) == S <: {IMT} + +\* OutMsg type +OMT == [type |-> STRING, peerId |-> PIDT, height |-> Int] +AsOutMsg(m) == m <: OMT +AsOutMsgSet(S) == S <: {OMT} + +\* block pool type +BPT == [height |-> Int, peerIds |-> {PIDT}, peerHeights |-> [PIDT -> Int], + blockStore |-> [Int -> BT], receivedBlocks |-> [Int -> PIDT], + pendingBlocks |-> [Int -> PIDT], syncedBlocks |-> Int, syncHeight |-> Int] + +AsBlockPool(bp) == bp <: BPT + +(******************** Sets of messages ********************************) + +\* Control messages +ControlMsgs == + AsInMsgSet([type: {"addPeer"}, peerId: AllPeerIds]) + \union + AsInMsgSet([type: {"removePeer"}, peerId: AllPeerIds]) + \union + AsInMsgSet([type: {"syncTimeout"}]) + +\* All messages (and events) received by a node +InMsgs == + AsInMsgSet({NoMsg}) + \union + AsInMsgSet([type: {"blockResponse"}, peerId: AllPeerIds, block: Blocks]) + \union + AsInMsgSet([type: {"noBlockResponse"}, peerId: AllPeerIds, height: Heights]) + \union + AsInMsgSet([type: {"statusResponse"}, peerId: AllPeerIds, height: Heights]) + \union + ControlMsgs + +\* Messages sent by a node and received by peers (environment in our case) +OutMsgs == + AsOutMsgSet({NoMsg}) + \union + AsOutMsgSet([type: {"statusRequest"}]) \* StatusRequest is broadcast to the set of connected peers. + \union + AsOutMsgSet([type: {"blockRequest"}, peerId: AllPeerIds, height: Heights]) + + +(********************************** NODE ***********************************) + +InitNode == + \E pIds \in SUBSET AllPeerIds: \* set of peers node established initial connections with + /\ pIds \subseteq CORRECT \* this line is not necessary + /\ pIds /= AsPidSet({}) \* apalache better checks non-emptiness than subtracts from SUBSET + /\ blockPool = AsBlockPool([ + height |-> TrustedHeight + 1, \* the genesis block is at height 1 + syncHeight |-> TrustedHeight + 1, \* and we are synchronized to it + peerIds |-> pIds, + peerHeights |-> [p \in AllPeerIds |-> NilHeight], \* no peer height is known + blockStore |-> + [h \in Heights |-> + IF h > TrustedHeight THEN NilBlock ELSE chain[1]], + receivedBlocks |-> [h \in Heights |-> NilPeer], + pendingBlocks |-> [h \in Heights |-> NilPeer], + syncedBlocks |-> -1 + ]) + /\ state = "running" + +\* Remove faulty peers. +\* Returns new block pool. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L222 +RemovePeers(rmPeers, bPool) == + LET keepPeers == bPool.peerIds \ rmPeers IN + LET pHeights == + [p \in AllPeerIds |-> IF p \in rmPeers THEN NilHeight ELSE bPool.peerHeights[p]] IN + + LET failedRequests == + {h \in Heights: /\ h >= bPool.height + /\ \/ bPool.pendingBlocks[h] \in rmPeers + \/ bPool.receivedBlocks[h] \in rmPeers} IN + LET pBlocks == + [h \in Heights |-> IF h \in failedRequests THEN NilPeer ELSE bPool.pendingBlocks[h]] IN + LET rBlocks == + [h \in Heights |-> IF h \in failedRequests THEN NilPeer ELSE bPool.receivedBlocks[h]] IN + LET bStore == + [h \in Heights |-> IF h \in failedRequests THEN NilBlock ELSE bPool.blockStore[h]] IN + + IF keepPeers /= bPool.peerIds + THEN [bPool EXCEPT + !.peerIds = keepPeers, + !.peerHeights = pHeights, + !.pendingBlocks = pBlocks, + !.receivedBlocks = rBlocks, + !.blockStore = bStore + ] + ELSE bPool + +\* Add a peer. +\* see https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L198 +AddPeer(peer, bPool) == + [bPool EXCEPT !.peerIds = bPool.peerIds \union {peer}] + + +(* +Handle StatusResponse message. +If valid status response, update peerHeights. +If invalid (height is smaller than the current), then remove peer. +Returns new block pool. +See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L667 +*) +HandleStatusResponse(msg, bPool) == + LET peerHeight == bPool.peerHeights[msg.peerId] IN + + IF /\ msg.peerId \in bPool.peerIds + /\ msg.height >= peerHeight + THEN \* a correct response + LET pHeights == [bPool.peerHeights EXCEPT ![msg.peerId] = msg.height] IN + [bPool EXCEPT !.peerHeights = pHeights] + ELSE RemovePeers({msg.peerId}, bPool) \* the peer has sent us message with smaller height or peer is not in our peer list + + +(* +Handle BlockResponse message. +If valid block response, update blockStore, pendingBlocks and receivedBlocks. +If invalid (unsolicited response or malformed block), then remove peer. +Returns new block pool. +See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L522 +*) +HandleBlockResponse(msg, bPool) == + LET h == msg.block.height IN + + IF /\ msg.peerId \in bPool.peerIds + /\ bPool.blockStore[h] = NilBlock + /\ bPool.pendingBlocks[h] = msg.peerId + /\ msg.block.wellFormed + THEN + [bPool EXCEPT + !.blockStore = [bPool.blockStore EXCEPT ![h] = msg.block], + !.receivedBlocks = [bPool.receivedBlocks EXCEPT![h] = msg.peerId], + !.pendingBlocks = [bPool.pendingBlocks EXCEPT![h] = NilPeer] + ] + ELSE RemovePeers({msg.peerId}, bPool) + + HandleNoBlockResponse(msg, bPool) == + RemovePeers({msg.peerId}, bPool) + + +\* Compute max peer height. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L440 +MaxPeerHeight(bPool) == + IF bPool.peerIds = AsPidSet({}) + THEN 0 \* no peers, just return 0 + ELSE LET Hts == {bPool.peerHeights[p] : p \in bPool.peerIds} IN + CHOOSE max \in Hts: \A h \in Hts: h <= max + +(* Returns next height for which request should be sent. + Returns NilHeight in case there is no height for which request can be sent. + See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L454 *) +FindNextRequestHeight(bPool) == + LET S == {i \in Heights: + /\ i >= bPool.height + /\ i <= MaxPeerHeight(bPool) + /\ bPool.blockStore[i] = NilBlock + /\ bPool.pendingBlocks[i] = NilPeer} IN + IF S = AsIntSet({}) + THEN NilHeight + ELSE + CHOOSE min \in S: \A h \in S: h >= min + +\* Returns number of pending requests for a given peer. +NumOfPendingRequests(bPool, peer) == + LET peerPendingRequests == + {h \in Heights: + /\ h >= bPool.height + /\ bPool.pendingBlocks[h] = peer + } + IN + Cardinality(peerPendingRequests) + +(* Returns peer that can serve block for a given height. + Returns NilPeer in case there are no such peer. + See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L477 *) +FindPeerToServe(bPool, h) == + LET peersThatCanServe == { p \in bPool.peerIds: + /\ bPool.peerHeights[p] >= h + /\ NumOfPendingRequests(bPool, p) < PEER_MAX_REQUESTS } IN + + LET pendingBlocks == + {i \in Heights: + /\ i >= bPool.height + /\ \/ bPool.pendingBlocks[i] /= NilPeer + \/ bPool.blockStore[i] /= NilBlock + } IN + + IF \/ peersThatCanServe = AsPidSet({}) + \/ Cardinality(pendingBlocks) >= TARGET_PENDING + THEN NilPeer + \* pick a peer that can serve request for height h that has minimum number of pending requests + ELSE CHOOSE p \in peersThatCanServe: \A q \in peersThatCanServe: + /\ NumOfPendingRequests(bPool, p) <= NumOfPendingRequests(bPool, q) + + +\* Make a request for a block (if possible) and return a request message and block poool. +CreateRequest(bPool) == + LET nextHeight == FindNextRequestHeight(bPool) IN + + IF nextHeight = NilHeight THEN [msg |-> AsOutMsg(NoMsg), pool |-> bPool] + ELSE + LET peer == FindPeerToServe(bPool, nextHeight) IN + IF peer = NilPeer THEN [msg |-> AsOutMsg(NoMsg), pool |-> bPool] + ELSE + LET m == [type |-> "blockRequest", peerId |-> peer, height |-> nextHeight] IN + LET newPool == [bPool EXCEPT + !.pendingBlocks = [bPool.pendingBlocks EXCEPT ![nextHeight] = peer] + ] IN + [msg |-> m, pool |-> newPool] + + +\* Returns node state, i.e., defines termination condition. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L432 +ComputeNextState(bPool) == + IF bPool.syncedBlocks = 0 \* corresponds to the syncTimeout in case no progress has been made for a period of time. + THEN "finished" + ELSE IF /\ bPool.height > 1 + /\ bPool.height >= MaxPeerHeight(bPool) \* see https://github.com/tendermint/tendermint/blob/61057a8b0af2beadee106e47c4616b279e83c920/blockchain/v2/scheduler.go#L566 + THEN "finished" + ELSE "running" + +(* Verify if commit is for the given block id and if commit has enough voting power. + See https://github.com/tendermint/tendermint/blob/61057a8b0af2beadee106e47c4616b279e83c920/blockchain/v2/processor_context.go#L12 *) +VerifyCommit(block, lastCommit) == + PossibleCommit(block, lastCommit) + +(* Tries to execute next block in the pool, i.e., defines block validation logic. + Returns new block pool (peers that has send invalid blocks are removed). + See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/processor.go#L135 *) +ExecuteBlocks(bPool) == + LET bStore == bPool.blockStore IN + LET block0 == bStore[bPool.height - 1] IN + \* blockPool is initialized with height = TrustedHeight + 1, + \* so bStore[bPool.height - 1] is well defined + LET block1 == bStore[bPool.height] IN + LET block2 == bStore[bPool.height + 1] IN + + IF block1 = NilBlock \/ block2 = NilBlock + THEN bPool \* we don't have two next consecutive blocks + + ELSE IF ~IsMatchingValidators(block1, block0.NextVS) + \* Check that block1.VS = block0.Next. + \* Otherwise, CorrectBlocksInv fails. + \* In the implementation NextVS is part of the application state, + \* so a mismatch can be found without access to block0.NextVS. + THEN \* the block does not have the expected validator set + RemovePeers({bPool.receivedBlocks[bPool.height]}, bPool) + ELSE IF ~VerifyCommit(block1, block2.lastCommit) + \* Verify commit of block2 based on block1. + \* Interestingly, we do not have to call IsMatchingValidators. + THEN \* remove the peers of block1 and block2, as they are considered faulty + RemovePeers({bPool.receivedBlocks[bPool.height], + bPool.receivedBlocks[bPool.height + 1]}, + bPool) + ELSE \* all good, execute block at position height + [bPool EXCEPT !.height = bPool.height + 1] + + +\* Defines logic for pruning peers. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L613 +TryPrunePeer(bPool, suspectedSet, isTimedOut) == + (* -----------------------------------------------------------------------------------------------------------------------*) + (* Corresponds to function prunablePeers in scheduler.go file. Note that this function only checks if block has been *) + (* received from a peer during peerTimeout period. *) + (* Note that in case no request has been scheduled to a correct peer, or a request has been scheduled *) + (* recently, so the peer hasn't responded yet, a peer will be removed as no block is received within peerTimeout. *) + (* In case of faulty peers, we don't have any guarantee that they will respond. *) + (* Therefore, we model this with nondeterministic behavior as it could lead to peer removal, for both correct and faulty. *) + (* See scheduler.go *) + (* https://github.com/tendermint/tendermint/blob/4298bbcc4e25be78e3c4f21979d6aa01aede6e87/blockchain/v2/scheduler.go#L335 *) + LET toRemovePeers == bPool.peerIds \intersect suspectedSet IN + + (* + Corresponds to logic for pruning a peer that is responsible for delivering block for the next height. + The pruning logic for the next height is based on the time when a BlockRequest is sent. Therefore, if a request is sent + to a correct peer for the next height (blockPool.height), it should never be removed by this check as we assume that + correct peers respond timely and reliably. However, if a request is sent to a faulty peer then we + might get response on time or not, which is modelled with nondeterministic isTimedOut flag. + See scheduler.go + https://github.com/tendermint/tendermint/blob/4298bbcc4e25be78e3c4f21979d6aa01aede6e87/blockchain/v2/scheduler.go#L617 + *) + LET nextHeightPeer == bPool.pendingBlocks[bPool.height] IN + LET prunablePeers == + IF /\ nextHeightPeer /= NilPeer + /\ nextHeightPeer \in FAULTY + /\ isTimedOut + THEN toRemovePeers \union {nextHeightPeer} + ELSE toRemovePeers + IN + RemovePeers(prunablePeers, bPool) + + +\* Handle SyncTimeout. It models if progress has been made (height has increased) since the last SyncTimeout event. +HandleSyncTimeout(bPool) == + [bPool EXCEPT + !.syncedBlocks = bPool.height - bPool.syncHeight, + !.syncHeight = bPool.height + ] + +HandleResponse(msg, bPool) == + IF msg.type = "blockResponse" THEN + HandleBlockResponse(msg, bPool) + ELSE IF msg.type = "noBlockResponse" THEN + HandleNoBlockResponse(msg, bPool) + ELSE IF msg.type = "statusResponse" THEN + HandleStatusResponse(msg, bPool) + ELSE IF msg.type = "addPeer" THEN + AddPeer(msg.peerId, bPool) + ELSE IF msg.type = "removePeer" THEN + RemovePeers({msg.peerId}, bPool) + ELSE IF msg.type = "syncTimeout" THEN + HandleSyncTimeout(bPool) + ELSE + bPool + + +(* + At every node step we executed the following steps (atomically): + 1) input message is consumed and the corresponding handler is called, + 2) pruning logic is called + 3) block execution is triggered (we try to execute block at next height) + 4) a request to a peer is made (if possible) and + 5) we decide if termination condition is satisifed so we stop. +*) +NodeStep == + \E suspectedSet \in SUBSET AllPeerIds: \* suspectedSet is a nondeterministic set of peers + \E isTimedOut \in BOOLEAN: + LET bPool == HandleResponse(inMsg, blockPool) IN + LET bp == TryPrunePeer(bPool, suspectedSet, isTimedOut) IN + LET nbPool == ExecuteBlocks(bp) IN + LET msgAndPool == CreateRequest(nbPool) IN + LET nstate == ComputeNextState(msgAndPool.pool) IN + + /\ state' = nstate + /\ blockPool' = msgAndPool.pool + /\ outMsg' = msgAndPool.msg + /\ inMsg' = AsInMsg(NoMsg) + + +\* If node is running, then in every step we try to create blockRequest. +\* In addition, input message (if exists) is consumed and processed. +NextNode == + \/ /\ state = "running" + /\ NodeStep + + \/ /\ state = "finished" + /\ UNCHANGED <> + + +(********************************** Peers ***********************************) + +InitPeers == + \E pHeights \in [AllPeerIds -> Heights]: + peersState = [ + peerHeights |-> pHeights, + statusRequested |-> FALSE, + blocksRequested |-> AsOutMsgSet({}) + ] + +HandleStatusRequest(msg, pState) == + [pState EXCEPT + !.statusRequested = TRUE + ] + +HandleBlockRequest(msg, pState) == + [pState EXCEPT + !.blocksRequested = pState.blocksRequested \union AsOutMsgSet({msg}) + ] + +HandleRequest(msg, pState) == + IF msg = AsOutMsg(NoMsg) + THEN pState + ELSE IF msg.type = "statusRequest" + THEN HandleStatusRequest(msg, pState) + ELSE HandleBlockRequest(msg, pState) + +CreateStatusResponse(peer, pState, anyHeight) == + LET m == + IF peer \in CORRECT + THEN AsInMsg([type |-> "statusResponse", peerId |-> peer, height |-> pState.peerHeights[peer]]) + ELSE AsInMsg([type |-> "statusResponse", peerId |-> peer, height |-> anyHeight]) IN + + [msg |-> m, peers |-> pState] + +CreateBlockResponse(msg, pState, arbitraryBlock) == + LET m == + IF msg.peerId \in CORRECT + THEN AsInMsg([type |-> "blockResponse", peerId |-> msg.peerId, block |-> CorrectBlock(msg.height)]) + ELSE AsInMsg([type |-> "blockResponse", peerId |-> msg.peerId, block |-> arbitraryBlock]) IN + LET npState == + [pState EXCEPT + !.blocksRequested = pState.blocksRequested \ {msg} + ] IN + [msg |-> m, peers |-> npState] + +GrowPeerHeight(pState) == + \E p \in CORRECT: + /\ pState.peerHeights[p] < MAX_HEIGHT + /\ peersState' = [pState EXCEPT !.peerHeights[p] = @ + 1] + /\ inMsg' = AsInMsg(NoMsg) + +SendStatusResponseMessage(pState) == + /\ \E arbitraryHeight \in Heights: + \E peer \in AllPeerIds: + LET msgAndPeers == CreateStatusResponse(peer, pState, arbitraryHeight) IN + /\ peersState' = msgAndPeers.peers + /\ inMsg' = msgAndPeers.msg + + +SendAddPeerMessage == + \E peer \in AllPeerIds: + inMsg' = AsInMsg([type |-> "addPeer", peerId |-> peer]) + +SendRemovePeerMessage == + \E peer \in AllPeerIds: + inMsg' = AsInMsg([type |-> "removePeer", peerId |-> peer]) + +SendSyncTimeoutMessage == + inMsg' = AsInMsg([type |-> "syncTimeout"]) + + +SendControlMessage == + \/ SendAddPeerMessage + \/ SendRemovePeerMessage + \/ SendSyncTimeoutMessage + +\* An extremely important property of block hashes (blockId): +\* If the block hash coincides with the hash of the reference block, +\* then the blocks should be equal. +UnforgeableBlockId(height, block) == + block.hashEqRef => block = chain[height] + +\* A faulty peer cannot forge enough of the validators signatures. +\* In other words: If a commit contains enough signatures from the validators (in reality 2/3, in the model all), +\* then the blockID points to the block on the chain, encoded as block.lastCommit.blockIdEqRef being true +\* A more precise rule should have checked that the commiters have over 2/3 of the VS's voting power. +NoFork(height, block) == + (height > 1 /\ block.lastCommit.committers = chain[height - 1].VS) + => block.lastCommit.blockIdEqRef + +\* Can be block produced by a faulty peer, assuming it cannot generate forks (basic assumption of the protocol) +IsBlockByFaulty(height, block) == + /\ block.height = height + /\ UnforgeableBlockId(height, block) + /\ NoFork(height, block) + +SendBlockResponseMessage(pState) == + \* a response to a requested block: either by a correct, or by a faulty peer + \/ /\ pState.blocksRequested /= AsOutMsgSet({}) + /\ \E msg \in pState.blocksRequested: + \E block \in Blocks: + /\ IsBlockByFaulty(msg.height, block) + /\ LET msgAndPeers == CreateBlockResponse(msg, pState, block) IN + /\ peersState' = msgAndPeers.peers + /\ inMsg' = msgAndPeers.msg + + \* a faulty peer can always send an unsolicited block + \/ \E peerId \in FAULTY: + \E block \in Blocks: + /\ IsBlockByFaulty(block.height, block) + /\ peersState' = pState + /\ inMsg' = AsInMsg([type |-> "blockResponse", + peerId |-> peerId, block |-> block]) + +SendNoBlockResponseMessage(pState) == + /\ peersState' = pState + /\ inMsg' \in AsInMsgSet([type: {"noBlockResponse"}, peerId: FAULTY, height: Heights]) + + +SendResponseMessage(pState) == + \/ SendBlockResponseMessage(pState) + \/ SendNoBlockResponseMessage(pState) + \/ SendStatusResponseMessage(pState) + + +NextEnvStep(pState) == + \/ SendResponseMessage(pState) + \/ GrowPeerHeight(pState) + \/ SendControlMessage /\ peersState' = pState + \* note that we propagate pState that was missing in the previous version + + +\* Peers consume a message and update it's local state. It then makes a single step, i.e., it sends at most single message. +\* Message sent could be either a response to a request or faulty message (sent by faulty processes). +NextPeers == + LET pState == HandleRequest(outMsg, peersState) IN + /\ outMsg' = AsOutMsg(NoMsg) + /\ NextEnvStep(pState) + + +\* the composition of the node, the peers, the network and scheduler +Init == + /\ IsCorrectChain(chain) \* initialize the blockchain + /\ InitNode + /\ InitPeers + /\ turn = "Peers" + /\ inMsg = AsInMsg(NoMsg) + /\ outMsg = AsOutMsg([type |-> "statusRequest"]) + +Next == + IF turn = "Peers" + THEN + /\ NextPeers + /\ turn' = "Node" + /\ UNCHANGED <> + ELSE + /\ NextNode + /\ turn' = "Peers" + /\ UNCHANGED <> + + +FlipTurn == + turn' = + IF turn = "Peers" THEN + "Node" + ELSE + "Peers" + +\* Compute max peer height. Used as a helper operator in properties. +MaxCorrectPeerHeight(bPool) == + LET correctPeers == {p \in bPool.peerIds: p \in CORRECT} IN + IF correctPeers = AsPidSet({}) + THEN 0 \* no peers, just return 0 + ELSE LET Hts == {bPool.peerHeights[p] : p \in correctPeers} IN + CHOOSE max \in Hts: \A h \in Hts: h <= max + +\* properties to check +TypeOK == + /\ state \in States + /\ inMsg \in InMsgs + /\ outMsg \in OutMsgs + /\ turn \in {"Peers", "Node"} + /\ peersState \in [ + peerHeights: [AllPeerIds -> Heights \union {NilHeight}], + statusRequested: BOOLEAN, + blocksRequested: + SUBSET + [type: {"blockRequest"}, peerId: AllPeerIds, height: Heights] + + ] + /\ blockPool \in [ + height: Heights, + peerIds: SUBSET AllPeerIds, + peerHeights: [AllPeerIds -> Heights \union {NilHeight}], + blockStore: [Heights -> Blocks \union {NilBlock}], + receivedBlocks: [Heights -> AllPeerIds \union {NilPeer}], + pendingBlocks: [Heights -> AllPeerIds \union {NilPeer}], + syncedBlocks: Heights \union {NilHeight, -1}, + syncHeight: Heights + ] + +(* Incorrect synchronization: The last block may be never received *) +Sync1 == + [](state = "finished" => + blockPool.height >= MaxCorrectPeerHeight(blockPool)) + +Sync1AsInv == + state = "finished" => blockPool.height >= MaxCorrectPeerHeight(blockPool) + +(* Incorrect synchronization, as there may be a timeout *) +Sync2 == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ [] (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +Sync2AsInv == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +(* Correct synchronization *) +Sync3 == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ blockPool.syncedBlocks <= 0 \* timeout + \/ [] (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +Sync3AsInv == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ blockPool.syncedBlocks <= 0 \* timeout + \/ (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +(* Naive termination *) +\* This property is violated, as the faulty peers may produce infinitely many responses +Termination == + WF_turn(FlipTurn) => <>(state = "finished") + +(* Termination by timeout: the protocol terminates, if there is a timeout *) +\* the precondition: fair flip turn and eventual timeout when no new blocks were synchronized +TerminationByTOPre == + /\ WF_turn(FlipTurn) + /\ <>(inMsg.type = "syncTimeout" /\ blockPool.height <= blockPool.syncHeight) + +TerminationByTO == + TerminationByTOPre => <>(state = "finished") + +(* The termination property when we only have correct peers *) +\* as correct peers may spam the node with addPeer, removePeer, and statusResponse, +\* we have to enforce eventual response (there are no queues in our spec) +CorrBlockResponse == + \A h \in Heights: + [](outMsg.type = "blockRequest" /\ outMsg.height = h + => <>(inMsg.type = "blockResponse" /\ inMsg.block.height = h)) + +\* a precondition for termination in presence of only correct processes +TerminationCorrPre == + /\ FAULTY = AsPidSet({}) + /\ WF_turn(FlipTurn) + /\ CorrBlockResponse + +\* termination when there are only correct processes +TerminationCorr == + TerminationCorrPre => <>(state = "finished") + +\* All synchronized blocks (but the last one) are exactly like in the reference chain +CorrectBlocksInv == + \/ state /= "finished" + \/ \A h \in 1..(blockPool.height - 1): + blockPool.blockStore[h] = chain[h] + +\* A false expectation that the protocol only finishes with the blocks +\* from the processes that had not been suspected in being faulty +SyncFromCorrectInv == + \/ state /= "finished" + \/ \A h \in 1..blockPool.height: + blockPool.receivedBlocks[h] \in blockPool.peerIds \union {NilPeer} + +\* A false expectation that a correct process is never removed from the set of peer ids. +\* A correct process may reply too late and then gets evicted. +CorrectNeverSuspectedInv == + CORRECT \subseteq blockPool.peerIds + +BlockPoolInvariant == + \A h \in Heights: + \* waiting for a block to arrive + \/ /\ blockPool.receivedBlocks[h] = NilPeer + /\ blockPool.blockStore[h] = NilBlock + \* valid block is received and is present in the store + \/ /\ blockPool.receivedBlocks[h] /= NilPeer + /\ blockPool.blockStore[h] /= NilBlock + /\ blockPool.pendingBlocks[h] = NilPeer + +(* a few simple properties that trigger counterexamples *) + +\* Shows execution in which peer set is empty +PeerSetIsNeverEmpty == blockPool.peerIds /= AsPidSet({}) + +\* Shows execution in which state = "finished" and MaxPeerHeight is not equal to 1 +StateNotFinished == + state /= "finished" \/ MaxPeerHeight(blockPool) = 1 + + +============================================================================= + +\*============================================================================= +\* Modification History +\* Last modified Fri May 29 20:41:53 CEST 2020 by igor +\* Last modified Thu Apr 16 16:57:22 CEST 2020 by zarkomilosevic +\* Created Tue Feb 04 10:36:18 CET 2020 by zarkomilosevic diff --git a/rust-spec/fastsync/scheduler.tla b/rust-spec/fastsync/scheduler.tla new file mode 100644 index 000000000..1de22e997 --- /dev/null +++ b/rust-spec/fastsync/scheduler.tla @@ -0,0 +1,606 @@ +------------------------------- MODULE scheduler ------------------------------- +(* + A specification of the fast sync scheduler that is introduced in blockchain/v2: + + https://github.com/tendermint/tendermint/tree/brapse/blockchain-v2-riri-reactor-2 + + The model includes: + - a scheduler that maintains the peers and blocks that it receives from the peers, and + - one environment simulating a correct peer + + This specification focuses on the events that are received and produced by the scheduler. + Communication between the scheduler and the other fastsync components is not specified. +*) + +EXTENDS Integers, FiniteSets + +\* the protocol parameters +CONSTANTS + PeerIDs, \* potential peer ids, a set of integers, e.g. 0..2 + ultimateHeight, \* the maximum height of the blockchain, an integer, e.g. 3 + numRequests \* the maximum number of requests made when scheduling new blocks, e.g. 2 + +\* a few definitions +None == -1 \* an undefined value +Heights == 0..ultimateHeight \* potential heights +noErr == "errNone" +Errors == { + noErr, "errPeerNotFound", "errDelRemovedPeer", "errAddDuplicatePeer", + "errUpdateRemovedPeer", "errAfterPeerRemove", "errBadPeer", "errBadPeerState", + "errProcessedBlockEv", "finished", "timeout", "errAddRemovedPeer", "errPeerNoBlock", "errBadBlockState"} + +PeerStates == {"peerStateUnknown", "peerStateNew", "peerStateReady", "peerStateRemoved"} + +BlockStates == { + "blockStateUnknown", "blockStateNew", "blockStatePending", + "blockStateReceived", "blockStateProcessed"} + +\* basic stuff +Min(a, b) == IF a < b THEN a ELSE b +Max(a, b) == IF a > b THEN a ELSE b + +\* the state of the scheduler: +VARIABLE turn \* who makes a step: the scheduler or the environment + +\* the state of the reactor: +VARIABLES inEvent, \* an event from the environment to the scheduler + envRunning \* a Boolean, negation of stopProcessing in the implementation + +\* the state of the scheduler: +VARIABLES outEvent, \* an event from the scheduler to the environment + scRunning + +\* the block pool: +VARIABLE scheduler + (* + scheduler is a record that contains: + height: Int, + height of the next block to collect + peers: PeerIDs, + the set of peers that have connected in the past, may include removed peers + peerHeights: [PeerIDs -> Heights], + a map to collect the peer heights, >0 if peer in ready state, None (-1) otherwise + peerStates: [PeerIDs -> PeerStates], + a map to record the peer states + blockStates: [Heights -> BlockStates] + a set of heights for which blocks are to be scheduled, pending or received + pendingBlocks: [Heights -> PeerIDs], + a set of heights for which blocks are to be scheduled or pending + receivedBlocks: [Heights -> PeerIDs], + a set of heights for which blocks were received but not yet processed + blocks: Heights, + the full set of blocks requested or downloaded by the scheduler + *) + +vars == <> + +\* for now just keep the height in the block +Blocks == [ height: Heights ] + +noEvent == [type |-> "NoEvent"] + +InEvents == + {noEvent} \cup + [type: {"rTrySchedule", "tNoAdvanceExp"}] \cup + [type: {"bcStatusResponse"}, peerID: PeerIDs, height: Heights] \cup + [type: {"bcBlockResponse"}, peerID: PeerIDs, height: Heights, block: Blocks] \cup + [type: {"bcNoBlockResponse"}, peerID: PeerIDs, height: Heights] \cup + [type: {"pcBlockProcessed"}, peerID: PeerIDs, height: Heights] \cup + [type: {"pcBlockVerificationFailure"}, height: Heights, firstPeerID: PeerIDs, secondPeerID: PeerIDs] \cup + [type: {"bcAddNewPeer"}, peerID: PeerIDs] \cup + [type: {"bcRemovePeer"}, peerID: PeerIDs] + +\* Output events produced by the scheduler. +\* Note: in v2 the status request is done by the reactor/ environment +OutEvents == + {noEvent} \cup + [type: {"scPeerError"}, peerID: PeerIDs, error: Errors] \cup + [type: {"scSchedulerFail"}, error: Errors] \cup + [type: {"scBlockRequest"}, peerID: PeerIDs, height: Heights] \cup + [type: {"scBlockReceived"}, peerID: PeerIDs, block: Blocks] \cup + [type: {"scPeersPruned"}, pruned: SUBSET [peerID: PeerIDs]] \cup + [type: {"scFinishedEv"}, error: Errors] + +(* ----------------------------------------------------------------------------------------------*) +(* The behavior of the scheduler that keeps track of peers, block requests and responses, etc. *) +(* See scheduler.go *) +(* https://github.com/tendermint/tendermint/blob/v0.33.3/blockchain/v2/scheduler.go *) +(* ----------------------------------------------------------------------------------------------*) + +addPeer(sc, peerID) == + IF peerID \in sc.peers THEN + [err |-> "errAddDuplicatePeer", val |-> sc] + ELSE IF sc.peerStates[peerID] = "peerStateRemoved" THEN + [err |-> "errAddRemovedPeer", val |-> sc] + ELSE + LET newPeers == sc.peers \cup { peerID } IN + LET newPeerHeights == [sc.peerHeights EXCEPT ![peerID] = None] IN + LET newPeerStates == [sc.peerStates EXCEPT ![peerID] = "peerStateNew"] IN + LET newSc == [sc EXCEPT + !.peers = newPeers, + !.peerHeights = newPeerHeights, + !.peerStates = newPeerStates] IN + [err |-> noErr, val |-> newSc] + +maxHeight(states, heights) == + LET activePeers == {p \in DOMAIN states: states[p] = "peerStateReady"} IN + IF activePeers = {} THEN + 0 \* no peers, just return 0 + ELSE + CHOOSE max \in { heights[p] : p \in activePeers }: + \A p \in activePeers: heights[p] <= max \* max is the maximum + +maxHeightScheduler(sc) == + maxHeight(sc.peerStates, sc.peerHeights) + +removePeer(sc, peerID) == + IF peerID \notin sc.peers THEN + [err |-> "errPeerNotFound", val |-> sc] + ELSE IF sc.peerStates[peerID] = "peerStateRemoved" THEN + [err |-> "errDelRemovedPeer", val |-> sc] + ELSE + LET newSc == [sc EXCEPT + !.peerHeights[peerID] = None, + !.peerStates[peerID] = "peerStateRemoved", + \* remove all blocks from peerID and block requests to peerID, see scheduler.removePeer + !.blockStates = [h \in Heights |-> + IF sc.pendingBlocks[h] = peerID \/ sc.receivedBlocks[h] = peerID THEN "blockStateNew" + ELSE sc.blockStates[h]], + !.pendingBlocks = [h \in Heights |-> IF sc.pendingBlocks[h] = peerID THEN None ELSE sc.pendingBlocks[h]], + !.receivedBlocks = [h \in Heights |-> IF sc.receivedBlocks[h] = peerID THEN None ELSE sc.receivedBlocks[h]] + ] IN + [err |-> noErr, val |-> newSc] + +addNewBlocks(sc, newMph) == + \* add new blocks to be requested (e.g. when overall max peer height has changed) + IF Cardinality(sc.blocks) >= numRequests THEN + [newBlocks |-> sc.blocks, newBlockStates |-> sc.blockStates] + ELSE + LET requestSet == sc.height.. Min(numRequests+sc.height, newMph) IN + LET heightsToRequest == {h \in requestSet: sc.blockStates[h] = "blockStateUnknown"} IN + LET newBlockStates == [ + h \in Heights |-> IF h \in heightsToRequest THEN "blockStateNew" + ELSE sc.blockStates[h]] IN + [newBlocks |-> heightsToRequest, newBlockStates |-> newBlockStates] + +\* Update the peer height (the peer should have been previously added) +setPeerHeight(sc, peerID, height) == + IF peerID \notin sc.peers THEN + [err |-> "errPeerNotFound", val |-> sc] + ELSE IF sc.peerStates[peerID] = "peerStateRemoved" THEN + [err |-> "errUpdateRemovedPeer", val |-> sc] + ELSE IF height < sc.peerHeights[peerID] THEN (* The peer is corrupt? Remove the peer. *) + removePeer(sc, peerID) + ELSE + LET newPeerHeights == [sc.peerHeights EXCEPT ![peerID] = height] IN \* set the peer's height + LET newPeerStates == [sc.peerStates EXCEPT ![peerID] = "peerStateReady"] IN \* set the peer's state + LET newMph == maxHeight(newPeerStates, newPeerHeights) IN + LET res == addNewBlocks(sc, newMph) IN + LET newSc == [sc EXCEPT + !.peerHeights = newPeerHeights, !.peerStates = newPeerStates, + !.blocks = res.newBlocks, !.blockStates = res.newBlockStates] IN + [err |-> noErr, val |-> newSc] + +nextHeightToSchedule(sc) == + LET toBeScheduled == {h \in DOMAIN sc.blockStates: sc.blockStates[h] = "blockStateNew"} \cup {ultimateHeight+1} IN + CHOOSE minH \in toBeScheduled: \A h \in toBeScheduled: h >= minH + +getStateAtHeight(sc, h) == + IF h < sc.height THEN + "blockStateProcessed" + ELSE IF h \in DOMAIN sc.blockStates THEN + sc.blockStates[h] + ELSE + "blockStateUnknown" + +markPending(sc, peerID, h) == + IF getStateAtHeight(sc, h) /= "blockStateNew" THEN + [err |-> "errBadBlockState", val |-> sc] + ELSE IF peerID \notin sc.peers \/ sc.peerStates[peerID] /= "peerStateReady" THEN + [err |-> "errBadPeerState", val |-> sc] + ELSE IF h > sc.peerHeights[peerID] THEN + [err |-> "errPeerTooShort", val |-> sc] + ELSE + LET newSc == [sc EXCEPT + !.blockStates = [sc.blockStates EXCEPT ![h] = "blockStatePending"], + !.pendingBlocks = [sc.pendingBlocks EXCEPT ![h] = peerID]] IN + [err |-> noErr, val |-> newSc] + +markReceived(sc, peerID, h) == + IF peerID \notin sc.peers \/ sc.peerStates[peerID] /= "peerStateReady" THEN + [err |-> "errBadPeerState", val |-> sc] + ELSE IF getStateAtHeight(sc, h) /= "blockStatePending" \/ sc.pendingBlocks[h] /= peerID THEN + [err |-> "errBadPeer", val |-> sc] + ELSE + LET newSc == [sc EXCEPT + !.blockStates = [sc.blockStates EXCEPT ![h] = "blockStateReceived"], + !.pendingBlocks = [sc.pendingBlocks EXCEPT ![h] = None], + !.receivedBlocks = [sc.receivedBlocks EXCEPT ![h] = peerID]] IN + [err |-> noErr, val |-> newSc] + +markProcessed(sc, h) == + IF getStateAtHeight(sc, h) /= "blockStateReceived" THEN + [err |-> "errProcessedBlockEv", val |-> sc] + ELSE + LET newSc == [sc EXCEPT + !.blockStates = [sc.blockStates EXCEPT ![h] = "blockStateProcessed"], + !.receivedBlocks = [sc.receivedBlocks EXCEPT ![h] = None], + !.height = sc.height + 1] IN + [err |-> noErr, val |-> newSc] + +reachedMaxHeight(sc) == + IF sc.peers = {} THEN + FALSE + ELSE + LET maxH == maxHeightScheduler(sc) IN + maxH > 0 /\ (sc.height >= maxH) + +highPeers(sc, minH) == {p \in sc.peers: sc.peerHeights[p] >= minH} + +(* ----------------------------------------------------------------------------------------------*) +(* The behavior of the scheduler state machine *) +(* See scheduler.go *) +(* https://github.com/tendermint/tendermint/tree/brapse/blockchain-v2-riri-reactor-2/scheduler.go*) +(* ----------------------------------------------------------------------------------------------*) +blStateInit(h, start) == + IF h <= start THEN + "blockStateProcessed" + ELSE "blockStateUnknown" + +InitSc == + /\ scRunning = TRUE + /\ outEvent = noEvent + /\ \E startHeight \in Heights: + scheduler = [ + initHeight |-> startHeight, + height |-> startHeight + 1, + peers |-> {}, + peerHeights |-> [p \in PeerIDs |-> None], + peerStates |-> [p \in PeerIDs |-> "peerStateUnknown"], + blocks |-> {}, + blockStates |-> [h \in Heights |-> blStateInit(h, startHeight)], + pendingBlocks |-> [h \in Heights |-> None], + receivedBlocks |-> [h \in Heights |-> None] + ] + +handleAddNewPeer == + /\ inEvent.type = "bcAddNewPeer" + /\ LET res == addPeer(scheduler, inEvent.peerID) IN + IF res.err /= noErr THEN + /\ outEvent' = [type |-> "scSchedulerFail", error |-> res.err] + /\ UNCHANGED <> + ELSE + /\ scheduler' = res.val + /\ UNCHANGED outEvent + /\ UNCHANGED scRunning + +finishSc(event) == + event.type = "scFinishedEv" /\ event.error = "finished" + +handleRemovePeer == + /\ inEvent.type = "bcRemovePeer" + /\ LET res == removePeer(scheduler, inEvent.peerID) IN + IF res.err /= noErr THEN + /\ outEvent' = [type |-> "scSchedulerFail", error |-> res.err] + /\ UNCHANGED scheduler + ELSE + /\ scheduler' = res.val + /\ IF reachedMaxHeight(scheduler') THEN + outEvent' = [type |-> "scFinishedEv", error |-> "errAfterPeerRemove"] + ELSE + UNCHANGED outEvent + /\ IF finishSc(outEvent') THEN + scRunning' = FALSE + ELSE UNCHANGED scRunning + +handleStatusResponse == + /\ inEvent.type = "bcStatusResponse" + /\ LET res == setPeerHeight(scheduler, inEvent.peerID, inEvent.height) IN + IF res.err /= noErr THEN + /\ outEvent' = [type |-> "scPeerError", peerID |-> inEvent.peerID, error |-> res.err] + /\ UNCHANGED scheduler + ELSE + /\ scheduler' = res.val + /\ UNCHANGED outEvent + /\ UNCHANGED scRunning + +handleTrySchedule == \* every 10 ms, but our spec is asynchronous + /\ inEvent.type = "rTrySchedule" + /\ LET minH == nextHeightToSchedule(scheduler) IN + IF minH = ultimateHeight+1 THEN + /\ outEvent' = noEvent + /\ UNCHANGED scheduler + ELSE IF minH = ultimateHeight+1 THEN + /\ outEvent' = noEvent + /\ UNCHANGED scheduler + ELSE + /\ LET hp == highPeers(scheduler, minH) IN + IF hp = {} THEN + /\ outEvent' = noEvent + /\ UNCHANGED scheduler + ELSE \E bestPeerID \in hp: + /\ LET res == markPending(scheduler, bestPeerID, minH) IN + /\ IF res.err /= noErr THEN + outEvent' = [type |-> "scSchedulerFail", error|-> res.err] + ELSE + outEvent' = [type |-> "scBlockRequest", peerID |-> bestPeerID, height |-> minH] + /\ scheduler' = res.val + /\ UNCHANGED scRunning + +handleBlockResponse == + /\ inEvent.type = "bcBlockResponse" + /\ LET res == markReceived(scheduler, inEvent.peerID, inEvent.height) IN + IF res.err /= noErr THEN + LET res1 == removePeer(scheduler, inEvent.peerID) IN + /\ outEvent' = [type |-> "scPeerError", peerID |-> inEvent.peerID, error |-> res.err] + /\ scheduler' = res1.val + ELSE + /\ outEvent' = [type |-> "scBlockReceived", peerID |-> inEvent.peerID, block |-> inEvent.block] + /\ scheduler' = res.val + /\ UNCHANGED scRunning + +handleNoBlockResponse == + /\ inEvent.type = "bcNoBlockResponse" + /\ IF (scheduler.peers = {} \/ scheduler.peerStates[inEvent.peerID] = "peerStateRemoved") THEN + /\ outEvent' = noEvent + /\ UNCHANGED scheduler + ELSE + LET res == removePeer(scheduler, inEvent.peerID) IN + /\ outEvent' = [type |-> "scPeerError", peerID |-> inEvent.peerID, error |-> "errPeerNoBlock"] + /\ scheduler' = res.val + /\ UNCHANGED scRunning + +handleBlockProcessed == + /\ inEvent.type = "pcBlockProcessed" + /\ IF inEvent.height /= scheduler.height THEN + /\ outEvent' = [type |-> "scSchedulerFail", error |-> "errProcessedBlockEv"] + /\ UNCHANGED scheduler + ELSE + LET res == markProcessed(scheduler, inEvent.height) IN + IF res.err /= noErr THEN + /\ outEvent' = [type |-> "scSchedulerFail", error |-> res.err] + /\ UNCHANGED scheduler + ELSE + /\ scheduler' = res.val + /\ IF reachedMaxHeight(scheduler') THEN + outEvent' = [type |-> "scFinishedEv", error |-> "finished"] + ELSE + outEvent' = noEvent + /\ IF finishSc(outEvent') THEN + scRunning' = FALSE + ELSE UNCHANGED scRunning + +handleBlockProcessError == + /\ inEvent.type = "pcBlockVerificationFailure" + /\ IF scheduler.peers = {} THEN + /\ outEvent' = noEvent + /\ UNCHANGED scheduler + ELSE + LET res1 == removePeer(scheduler, inEvent.firstPeerID) IN + LET res2 == removePeer(res1.val, inEvent.secondPeerID) IN + /\ IF reachedMaxHeight(res2.val) THEN + outEvent' = [type |-> "scFinishedEv", error |-> "finished"] + ELSE + outEvent' = noEvent + /\ scheduler' = res2.val + /\ IF finishSc(outEvent') THEN + scRunning' = FALSE + ELSE UNCHANGED scRunning + +handleNoAdvanceExp == + /\ inEvent.type = "tNoAdvanceExp" + /\ outEvent' = [type |-> "scFinishedEv", error |-> "timeout"] + /\ scRunning' = FALSE + /\ UNCHANGED <> + +NextSc == + IF ~scRunning THEN + UNCHANGED <> + ELSE + \/ handleStatusResponse + \/ handleAddNewPeer + \/ handleRemovePeer + \/ handleTrySchedule + \/ handleBlockResponse + \/ handleNoBlockResponse + \/ handleBlockProcessed + \/ handleBlockProcessError + \/ handleNoAdvanceExp + +(* ----------------------------------------------------------------------------------------------*) +(* The behavior of the environment. *) +(* ----------------------------------------------------------------------------------------------*) + +InitEnv == + /\ inEvent = noEvent + /\ envRunning = TRUE + +OnGlobalTimeoutTicker == + /\ inEvent' = [type |-> "tNoAdvanceExp"] + /\ envRunning' = FALSE + +OnTrySchedule == + /\ inEvent' = [type |-> "rTrySchedule"] + /\ UNCHANGED envRunning + +OnAddPeerEv == + /\ inEvent' \in [type: {"bcAddNewPeer"}, peerID: PeerIDs] + /\ UNCHANGED envRunning + +OnStatusResponseEv == + \* any status response can come from the blockchain, pick one non-deterministically + /\ inEvent' \in [type: {"bcStatusResponse"}, peerID: PeerIDs, height: Heights] + /\ UNCHANGED envRunning + +OnBlockResponseEv == + \* any block response can come from the blockchain, pick one non-deterministically + /\ inEvent' \in [type: {"bcBlockResponse"}, peerID: PeerIDs, height: Heights, block: Blocks] + /\ UNCHANGED envRunning + +OnNoBlockResponseEv == + \* any no block response can come from the blockchain, pick one non-deterministically + /\ inEvent' \in [type: {"bcNoBlockResponse"}, peerID: PeerIDs, height: Heights] + /\ UNCHANGED envRunning + +OnRemovePeerEv == + \* although bcRemovePeer admits an arbitrary set, we produce just a singleton + /\ inEvent' \in [type: {"bcRemovePeer"}, peerID: PeerIDs] + /\ UNCHANGED envRunning + +OnPcBlockProcessed == + /\ inEvent' \in [type: {"pcBlockProcessed"}, peerID: PeerIDs, height: Heights] + /\ UNCHANGED envRunning + +OnPcBlockVerificationFailure == + /\ inEvent' \in [type: {"pcBlockVerificationFailure"}, firstPeerID: PeerIDs, secondPeerID: PeerIDs, height: Heights] + /\ UNCHANGED envRunning + +\* messages from scheduler +OnScFinishedEv == + /\ outEvent.type = "scFinishedEv" + /\ envRunning' = FALSE \* stop the env + /\ UNCHANGED inEvent + +NextEnv == + IF ~envRunning THEN + UNCHANGED <> + ELSE + \/ OnScFinishedEv + \/ OnGlobalTimeoutTicker + \/ OnAddPeerEv + \/ OnTrySchedule + \/ OnStatusResponseEv + \/ OnBlockResponseEv + \/ OnNoBlockResponseEv + \/ OnRemovePeerEv + \/ OnPcBlockProcessed + \/ OnPcBlockVerificationFailure + +(* ----------------------------------------------------------------------------------------------*) +(* The system is the composition of the environment and the schedule *) +(* ----------------------------------------------------------------------------------------------*) +Init == turn = "environment" /\ InitEnv /\ InitSc + +FlipTurn == + turn' = ( + IF turn = "scheduler" THEN + "environment" + ELSE + "scheduler" + ) + +\* scheduler and environment alternate their steps (synchronous composition introduces more states) +Next == +/\ FlipTurn +/\ IF turn = "scheduler" THEN + /\ NextSc + /\ inEvent' = noEvent + /\ UNCHANGED envRunning + ELSE + /\ NextEnv + /\ outEvent' = noEvent + /\ UNCHANGED <> + +Spec == Init /\ [][Next]_vars /\ WF_turn(FlipTurn) + +(* ----------------------------------------------------------------------------------------------*) +(* Invariants *) +(* ----------------------------------------------------------------------------------------------*) +TypeOK == + /\ turn \in {"scheduler", "environment"} + /\ inEvent \in InEvents + /\ envRunning \in BOOLEAN + /\ outEvent \in OutEvents + /\ scheduler \in [ + initHeight: Heights, + height: Heights \cup {ultimateHeight + 1}, + peers: SUBSET PeerIDs, + peerHeights: [PeerIDs -> Heights \cup {None}], + peerStates: [PeerIDs -> PeerStates], + blocks: SUBSET Heights, + blockStates: [Heights -> BlockStates], + pendingBlocks: [Heights -> PeerIDs \cup {None}], + receivedBlocks: [Heights -> PeerIDs \cup {None}] + ] + +(* ----------------------------------------------------------------------------------------------*) +(* Helpers for Properties *) +(* ----------------------------------------------------------------------------------------------*) +NoFailuresAndTimeouts == + /\ inEvent.type /= "bcRemovePeer" + /\ inEvent.type /= "bcNoBlockResponse" + /\ inEvent.type /= "pcBlockVerificationFailure" + +\* simulate good peer behavior using this formula. Useful to show termination in the presence of good peers. +GoodResponse == + \/ inEvent.type + \in {"bcAddNewPeer", "bcStatusResponse", "bcBlockResponse", "pcBlockProcessed", "rTrySchedule"} + \/ ~envRunning + +\* all blocks from initHeight up to max peer height have been processed +AllRequiredBlocksProcessed == + LET maxH == Max(scheduler.height, maxHeightScheduler(scheduler)) IN + LET processedBlocks == {h \in scheduler.initHeight.. maxH-1: scheduler.blockStates[h] = "blockStateProcessed"} IN + scheduler.height >= maxH /\ Cardinality(processedBlocks) = scheduler.height - scheduler.initHeight + +IncreaseHeight == + (scheduler'.height > scheduler.height) \/ (scheduler.height >= ultimateHeight) + +(* ----------------------------------------------------------------------------------------------*) +(* Expected properties *) +(* ----------------------------------------------------------------------------------------------*) +(* *) +(* 1. TerminationWhenNoAdvance - termination if there are no correct peers. *) +(* The model assumes the "noAdvance" timer expires and the "tNoAdvanceExp" event is received *) +(* *) +TerminationWhenNoAdvance == + (inEvent.type = "tNoAdvanceExp") + => <>(outEvent.type = "scFinishedEv" /\ outEvent.error = "timeout") + +(* ----------------------------------------------------------------------------------------------*) +(* *) +(* 2. TerminationGoodPeers - *) +(* termination when IncreaseHeight holds true, fastsync is progressing, all blocks processed *) +(* *) +TerminationGoodPeers == + (/\ scheduler.height < ultimateHeight + /\ <>[]GoodResponse + /\[]<>(<>_<>) + ) + => <>(outEvent.type = "scFinishedEv" /\ AllRequiredBlocksProcessed) + +(* This property is violated. It shows that the precondition of TerminationGoodPeers is not *) +(* always FALSE *) +TerminationGoodPeersPre == + (/\ scheduler.height < ultimateHeight + /\ <>[]GoodResponse + /\[]<>(<>_<>) + ) + => FALSE + +(* ----------------------------------------------------------------------------------------------*) +(* 3. TerminationAllCases - *) +(* any peer behavior, either terminates with all blocks processed or times out *) +TerminationAllCases == + (/\ scheduler.height < ultimateHeight + /\(([]<> (<>_<>)) \/ <>(inEvent.type = "tNoAdvanceExp")) + ) + => <>(outEvent.type = "scFinishedEv" /\ (AllRequiredBlocksProcessed \/ outEvent.error = "timeout")) + +(* This property is violated. It shows that the precondition of TerminationAllCases is not *) +(* always FALSE *) +TerminationAllCasesPre == + (/\ scheduler.height < ultimateHeight + /\(([]<> (<>_<>)) \/ <>(inEvent.type = "tNoAdvanceExp")) + ) + => FALSE + +(* This property is violated. TLC output shows an example of increasing heights in the scheduler *) +SchedulerIncreasePre == +[]<>(<>_<>) + => FALSE + +============================================================================= +\* Modification History +\* Last modified Thu Apr 16 13:21:33 CEST 2020 by ancaz +\* Created Sat Feb 08 13:12:30 CET 2020 by ancaz diff --git a/rust-spec/fastsync/verification/001bmc-apalache.csv b/rust-spec/fastsync/verification/001bmc-apalache.csv new file mode 100644 index 000000000..e83c4c96f --- /dev/null +++ b/rust-spec/fastsync/verification/001bmc-apalache.csv @@ -0,0 +1,13 @@ +no,filename,tool,timeout,init,inv,next,args +1,MC_1_1_4.tla,apalache,1h,,Sync1AsInv,,--length=20 +2,MC_1_1_4.tla,apalache,1h,,Sync2AsInv,,--length=20 +3,MC_1_1_4.tla,apalache,4h,,Sync3AsInv,,--length=20 +4,MC_1_1_4.tla,apalache,8h,,CorrectBlocksInv,,--length=16 +5,MC_1_1_4.tla,apalache,1h,,SyncFromCorrectInv,,--length=20 +6,MC_1_1_4.tla,apalache,1h,,CorrectNeverSuspectedInv,,--length=20 +7,MC_1_2_4.tla,apalache,1h,,Sync1AsInv,,--length=20 +8,MC_1_2_4.tla,apalache,1h,,Sync2AsInv,,--length=20 +9,MC_1_2_4.tla,apalache,4h,,Sync3AsInv,,--length=20 +10,MC_1_2_4.tla,apalache,8h,,CorrectBlocksInv,,--length=16 +11,MC_1_2_4.tla,apalache,1h,,SyncFromCorrectInv,,--length=20 +12,MC_1_2_4.tla,apalache,1h,,CorrectNeverSuspectedInv,,--length=20 diff --git a/rust-spec/fastsync/verification/002tlc-tlc.csv b/rust-spec/fastsync/verification/002tlc-tlc.csv new file mode 100644 index 000000000..2e5429d48 --- /dev/null +++ b/rust-spec/fastsync/verification/002tlc-tlc.csv @@ -0,0 +1,9 @@ +no,filename,tool,timeout,init,inv,next,args +1,MC_1_1_4.tla,tlc,1h,,,,-workers 4 -config MC-Sync1AsInv.cfg +2,MC_1_1_4.tla,tlc,1h,,,,-workers 4 -config MC-Sync2AsInv.cfg +3,MC_1_1_4.tla,tlc,24h,,,,-workers 4 -config MC-Sync3AsInv.cfg +4,MC_1_1_4.tla,tlc,2h,,Termination,,-workers 4 -config MC-Termination.cfg +5,MC_1_1_4.tla,tlc,24h,,TerminationByTO,,-workers 4 -config MC-TerminationByTO.cfg +6,MC_1_1_4.tla,tlc,24h,,,,-workers 4 -config MC-CorrectBlocksInv.cfg +7,MC_1_1_4.tla,tlc,1h,,,,-workers 4 -config MC-SyncFromCorrectInv.cfg +8,MC_1_1_4.tla,tlc,1h,,,,-workers 4 -config MC-CorrectNeverSuspectedInv.cfg diff --git a/rust-spec/fastsync/verification/MC-CorrectBlocksInv.cfg b/rust-spec/fastsync/verification/MC-CorrectBlocksInv.cfg new file mode 100644 index 000000000..3e80ef165 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-CorrectBlocksInv.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* INVARIANT definition +INVARIANT +CorrectBlocksInv +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-CorrectNeverSuspectedInv.cfg b/rust-spec/fastsync/verification/MC-CorrectNeverSuspectedInv.cfg new file mode 100644 index 000000000..8a4606a86 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-CorrectNeverSuspectedInv.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* INVARIANT definition +INVARIANT +CorrectNeverSuspectedInv +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-Sync1AsInv.cfg b/rust-spec/fastsync/verification/MC-Sync1AsInv.cfg new file mode 100644 index 000000000..b557fb8b1 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-Sync1AsInv.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* INVARIANT definition +INVARIANT +Sync1AsInv +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-Sync2AsInv.cfg b/rust-spec/fastsync/verification/MC-Sync2AsInv.cfg new file mode 100644 index 000000000..3308dc805 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-Sync2AsInv.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* INVARIANT definition +INVARIANT +Sync2AsInv +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-Sync3AsInv.cfg b/rust-spec/fastsync/verification/MC-Sync3AsInv.cfg new file mode 100644 index 000000000..e2d3db6e4 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-Sync3AsInv.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* INVARIANT definition +INVARIANT +Sync3AsInv +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-SyncFromCorrectInv.cfg b/rust-spec/fastsync/verification/MC-SyncFromCorrectInv.cfg new file mode 100644 index 000000000..b17317633 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-SyncFromCorrectInv.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* INVARIANT definition +INVARIANT +SyncFromCorrectInv +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-Termination.cfg b/rust-spec/fastsync/verification/MC-Termination.cfg new file mode 100644 index 000000000..00e8236e7 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-Termination.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* PROPERTY definition +PROPERTY +Termination +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-TerminationByTO.cfg b/rust-spec/fastsync/verification/MC-TerminationByTO.cfg new file mode 100644 index 000000000..3a871ccf9 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-TerminationByTO.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* PROPERTY definition +PROPERTY +TerminationByTO +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC-TerminationCorr.cfg b/rust-spec/fastsync/verification/MC-TerminationCorr.cfg new file mode 100644 index 000000000..bea493637 --- /dev/null +++ b/rust-spec/fastsync/verification/MC-TerminationCorr.cfg @@ -0,0 +1,10 @@ +\* INIT definition +INIT +Init +\* NEXT definition +NEXT +Next +\* PROPERTY definition +PROPERTY +TerminationCorr +\* Generated on Wed May 27 18:45:18 CEST 2020 diff --git a/rust-spec/fastsync/verification/MC_1_0_4.tla b/rust-spec/fastsync/verification/MC_1_0_4.tla new file mode 100644 index 000000000..b4ab71228 --- /dev/null +++ b/rust-spec/fastsync/verification/MC_1_0_4.tla @@ -0,0 +1,17 @@ +--------------------------- MODULE MC_1_0_4 ------------------------------------ + +\*a <: b == a \* type annotation + +VALIDATOR_SETS == {"vs1", "vs2"} +NIL_VS == "NilVS" +CORRECT == {"c1"} +FAULTY == {"f2"} \ {"f2"} +MAX_HEIGHT == 4 +PEER_MAX_REQUESTS == 2 +TARGET_PENDING == 3 + +VARIABLES + state, blockPool, peersState, chain, turn, inMsg, outMsg + +INSTANCE fastsync_apalache +================================================================================ diff --git a/rust-spec/fastsync/verification/MC_1_1_4.tla b/rust-spec/fastsync/verification/MC_1_1_4.tla new file mode 100644 index 000000000..9797fabfc --- /dev/null +++ b/rust-spec/fastsync/verification/MC_1_1_4.tla @@ -0,0 +1,15 @@ +--------------------------- MODULE MC_1_1_4 ------------------------------------ + +VALIDATOR_SETS == {"vs1", "vs2"} +NIL_VS == "NilVS" +CORRECT == {"c1"} +FAULTY == {"f2"} +MAX_HEIGHT == 4 +PEER_MAX_REQUESTS == 2 +TARGET_PENDING == 3 + +VARIABLES + state, blockPool, peersState, chain, turn, inMsg, outMsg + +INSTANCE fastsync_apalache +================================================================================ diff --git a/rust-spec/fastsync/verification/MC_1_2_4.tla b/rust-spec/fastsync/verification/MC_1_2_4.tla new file mode 100644 index 000000000..8e9dc2cc9 --- /dev/null +++ b/rust-spec/fastsync/verification/MC_1_2_4.tla @@ -0,0 +1,15 @@ +-------------------------- MODULE MC_1_2_4 ------------------------------------ + +VALIDATOR_SETS == {"vs1", "vs2"} +NIL_VS == "NilVS" +CORRECT == {"c1"} +FAULTY == {"f2", "f3"} +MAX_HEIGHT == 4 +PEER_MAX_REQUESTS == 2 +TARGET_PENDING == 3 + +VARIABLES + state, blockPool, peersState, chain, turn, inMsg, outMsg + +INSTANCE fastsync_apalache +=============================================================================== diff --git a/rust-spec/fastsync/verification/MC_1_2_5.tla b/rust-spec/fastsync/verification/MC_1_2_5.tla new file mode 100644 index 000000000..1bd335082 --- /dev/null +++ b/rust-spec/fastsync/verification/MC_1_2_5.tla @@ -0,0 +1,15 @@ +-------------------------- MODULE MC_1_2_5 ------------------------------------ + +VALIDATOR_SETS == {"vs1", "vs2"} +NIL_VS == "NilVS" +CORRECT == {"c1"} +FAULTY == {"f2", "f3"} +MAX_HEIGHT == 5 +PEER_MAX_REQUESTS == 2 +TARGET_PENDING == 3 + +VARIABLES + state, blockPool, peersState, chain, turn, inMsg, outMsg + +INSTANCE fastsync_apalache +=============================================================================== diff --git a/rust-spec/fastsync/verification/MC_1_3_5.tla b/rust-spec/fastsync/verification/MC_1_3_5.tla new file mode 100644 index 000000000..81dadb470 --- /dev/null +++ b/rust-spec/fastsync/verification/MC_1_3_5.tla @@ -0,0 +1,15 @@ +------------------------------- MODULE MC_1_3_5 ------------------------------------ + +MAX_HEIGHT == 5 +VALIDATOR_SETS == {"vs1", "vs2"} +NIL_VS == "NilVS" +CORRECT == {"c1"} +FAULTY == {"f2", "f3", "f4"} +PEER_MAX_REQUESTS == 2 +TARGET_PENDING == 3 + +VARIABLES + state, blockPool, peersState, chain, turn, inMsg, outMsg + +INSTANCE fastsync_apalache +======================================================================================== diff --git a/rust-spec/fastsync/verification/MC_2_0_4.tla b/rust-spec/fastsync/verification/MC_2_0_4.tla new file mode 100644 index 000000000..b39dcd608 --- /dev/null +++ b/rust-spec/fastsync/verification/MC_2_0_4.tla @@ -0,0 +1,17 @@ +--------------------------- MODULE MC_2_0_4 ------------------------------------ + +a <: b == a \* type annotation + +VALIDATOR_SETS == {"vs1", "vs2"} +NIL_VS == "NilVS" +CORRECT == {"c1", "c2"} +FAULTY == {} <: {STRING} +MAX_HEIGHT == 4 +PEER_MAX_REQUESTS == 2 +TARGET_PENDING == 3 + +VARIABLES + state, blockPool, peersState, chain, turn, inMsg, outMsg + +INSTANCE fastsync_apalache +================================================================================ diff --git a/rust-spec/fastsync/verification/Tinychain.tla b/rust-spec/fastsync/verification/Tinychain.tla new file mode 100644 index 000000000..b8a212996 --- /dev/null +++ b/rust-spec/fastsync/verification/Tinychain.tla @@ -0,0 +1,110 @@ +-------------------------- MODULE Tinychain ---------------------------------- +(* A very abstract model of Tendermint blockchain. Its only purpose is to highlight + the relation between validator sets, next validator sets, and last commits. + *) + +EXTENDS Integers + +\* type annotation +a <: b == a + +\* the type of validator sets, e.g., STRING +VST == STRING + +\* LastCommit type. +\* It contains: +\* 1) the flag of whether the block id equals to the hash of the previous block, and +\* 2) the set of the validators who have committed the block. +\* In the implementation, blockId is the hash of the previous block, which cannot be forged. +\* We abstract block id into whether it equals to the hash of the previous block or not. +LCT == [blockIdEqRef |-> BOOLEAN, committers |-> VST] + +\* Block type. +\* A block contains its height, validator set, next validator set, and last commit. +\* Moreover, it contains the flag that tells us whether the block equals to the one +\* on the reference chain (this is an abstraction of hash). +BT == [height |-> Int, hashEqRef |-> BOOLEAN, wellFormed |-> BOOLEAN, + VS |-> VST, NextVS |-> VST, lastCommit |-> LCT] + +CONSTANTS + (* + A set of abstract values, each value representing a set of validators. + For the purposes of this specification, they can be any values, + e.g., "s1", "s2", etc. + *) + VALIDATOR_SETS, + (* a nil validator set that is outside of VALIDATOR_SETS *) + NIL_VS, + (* The maximal height, up to which the blockchain may grow *) + MAX_HEIGHT + +Heights == 1..MAX_HEIGHT + +\* the set of all possible commits +Commits == [blockIdEqRef: BOOLEAN, committers: VALIDATOR_SETS] + +\* the set of all possible blocks, not necessarily valid ones +Blocks == + [height: Heights, hashEqRef: BOOLEAN, wellFormed: BOOLEAN, + VS: VALIDATOR_SETS, NextVS: VALIDATOR_SETS, lastCommit: Commits] + +\* Does the chain contain a sound sequence of blocks that could be produced by +\* a 2/3 of faulty validators. This operator can be used to initialise the chain! +IsCorrectChain(chain) == + \* restrict the structure of the blocks, to decrease the TLC search space + LET OkCommits == [blockIdEqRef: {TRUE}, committers: VALIDATOR_SETS] + OkBlocks == [height: Heights, hashEqRef: {TRUE}, wellFormed: {TRUE}, + VS: VALIDATOR_SETS, NextVS: VALIDATOR_SETS, lastCommit: OkCommits] + IN + /\ chain \in [1..MAX_HEIGHT -> OkBlocks] + /\ \A h \in 1..MAX_HEIGHT: + LET b == chain[h] IN + /\ b.height = h \* the height is correct + /\ h > 1 => + LET p == chain[h - 1] IN + /\ b.VS = p.NextVS \* the validators propagate from the previous block + /\ b.lastCommit.committers = p.VS \* and they are the committers + + +\* The basic properties of blocks on the blockchain: +\* They should pass the validity check and they may verify the next block. + +\* Does the block pass the consistency check against the next validators of the previous block +IsMatchingValidators(block, nextVS) == + \* simply check that the validator set is propagated correctly. + \* (the implementation tests hashes and the application state) + block.VS = nextVS + +\* Does the block verify the commit (of the next block) +PossibleCommit(block, commit) == + \* the commits are signed by the block validators + /\ commit.committers = block.VS + \* The block id in the commit matches the block hash (abstract comparison). + \* (The implementation has extensive tests for that.) + \* this is an abstraction of: commit.blockId = hash(block) + \* + \* These are possible scenarios on the concrete hashes: + \* + \* scenario 1: commit.blockId = 10 /\ hash(block) = 10 /\ hash(ref) = 10 + \* scenario 2: commit.blockId = 20 /\ hash(block) = 20 /\ block.VS /= ref.VS + \* scenario 3: commit.blockId = 50 /\ hash(block) = 100 + \* scenario 4: commit.blockId = 10 /\ hash(block) = 100 + \* scenario 5: commit.blockId = 100 /\ hash(block) = 10 + /\ commit.blockIdEqRef = block.hashEqRef + \* the following test would be cheating, as we do not have access to the + \* reference chain: + \* /\ commit.blockIdEqRef + +\* Basic invariants + +\* every block has the validator set that is chosen by its predecessor +ValidBlockInv(chain) == + \A h \in 2..MAX_HEIGHT: + IsMatchingValidators(chain[h], chain[h - 1].NextVS) + +\* last commit of every block is signed by the validators of the predecessor +VerifiedBlockInv(chain) == + \A h \in 2..MAX_HEIGHT: + PossibleCommit(chain[h - 1], chain[h].lastCommit) + +================================================================================== diff --git a/rust-spec/fastsync/verification/fastsync_apalache.tla b/rust-spec/fastsync/verification/fastsync_apalache.tla new file mode 100644 index 000000000..51dd54e57 --- /dev/null +++ b/rust-spec/fastsync/verification/fastsync_apalache.tla @@ -0,0 +1,822 @@ +----------------------------- MODULE fastsync_apalache ----------------------------- +(* + In this document we give the high level specification of the fast sync + protocol as implemented here: + https://github.com/tendermint/tendermint/tree/master/blockchain/v2. + +We assume a system in which one node is trying to sync with the blockchain +(replicated state machine) by downloading blocks from the set of full nodes +(we call them peers) that are block providers, and executing transactions +(part of the block) against the application. + +Peers can be faulty, and we don't make any assumption about the rate of correct/faulty +nodes in the node peerset (i.e., they can all be faulty). Correct peers are part +of the replicated state machine, i.e., they manage blockchain and execute +transactions against the same deterministic application. We don't make any +assumptions about the behavior of faulty processes. Processes (client and peers) +communicate by message passing. + + In this specification, we model this system with two parties: + - the node (state machine) that is doing fastsync and + - the environment with which node interacts. + +The environment consists of the set of (correct and faulty) peers with +which node interacts as part of fast sync protocol, but also contains some +aspects (adding/removing peers, timeout mechanism) that are part of the node +local environment (could be seen as part of the runtime in which node +executes). + +As part of the fast sync protocol a node and the peers exchange the following messages: + +- StatusRequest +- StatusResponse +- BlockRequest +- BlockResponse +- NoBlockResponse. + +A node is periodically issuing StatusRequests to query peers for their current height (to decide what +blocks to ask from what peers). Based on StatusResponses (that are sent by peers), the node queries +blocks for some height(s) by sending peers BlockRequest messages. A peer provides a requested block by +BlockResponse message. If a peer does not want to provide a requested block, then it sends NoBlockResponse message. +In addition to those messages, a node in this spec receives additional input messages (events): + +- AddPeer +- RemovePeer +- SyncTimeout. + +These are the control messages that are provided to the node by its execution enviornment. AddPeer +is for the case when a connection is established with a peer; similarly RemovePeer is for the case +a connection with the peer is terminated. Finally SyncTimeout is used to model a timeout trigger. + +We assume that fast sync protocol starts when connections with some number of peers +are established. Therefore, peer set is initialised with non-empty set of peer ids. Note however +that node does not know initially the peer heights. +*) + +EXTENDS Integers, FiniteSets, Sequences + + +CONSTANTS MAX_HEIGHT, \* the maximal height of blockchain + VALIDATOR_SETS, \* abstract set of validators + NIL_VS, \* a nil validator set + CORRECT, \* set of correct peers + FAULTY, \* set of faulty peers + TARGET_PENDING, \* maximum number of pending requests + downloaded blocks that are not yet processed + PEER_MAX_REQUESTS \* maximum number of pending requests per peer + +ASSUME CORRECT \intersect FAULTY = {} +ASSUME TARGET_PENDING > 0 +ASSUME PEER_MAX_REQUESTS > 0 + +\* the blockchain, see Tinychain +VARIABLE chain + +\* introduce tiny chain as the source of blocks for the correct nodes +INSTANCE Tinychain + +\* a special value for an undefined height +NilHeight == 0 + +\* the height of the genesis block +TrustedHeight == 1 + +\* the set of all peer ids the node can receive a message from +AllPeerIds == CORRECT \union FAULTY + +\* Correct last commit have enough voting power, i.e., +2/3 of the voting power of +\* the corresponding validator set (enoughVotingPower = TRUE) that signs blockId. +\* BlockId defines correct previous block (in the implementation it is the hash of the block). +\* Instead of blockId, we encode blockIdEqRef, which is true, if the block id is equal +\* to the hash of the previous block, see Tinychain. +CorrectLastCommit(h) == chain[h].lastCommit + +NilCommit == [blockIdEqRef |-> FALSE, committers |-> NIL_VS] + +\* correct node always supplies the blocks from the blockchain +CorrectBlock(h) == chain[h] + +NilBlock == + [height |-> 0, hashEqRef |-> FALSE, wellFormed |-> FALSE, + lastCommit |-> NilCommit, VS |-> NIL_VS, NextVS |-> NIL_VS] + +\* a special value for an undefined peer +NilPeer == "Nil" \* STRING for apalache efficiency + +\* control the state of the syncing node +States == { "running", "finished"} + +NoMsg == [type |-> "None"] + +\* the variables of the node running fastsync +VARIABLES + state, \* running or finished + (* + blockPool [ + height, \* current height we are trying to sync. Last block executed is height - 1 + peerIds, \* set of peers node is connected to + peerHeights, \* map of peer ids to its (stated) height + blockStore, \* map of heights to (received) blocks + receivedBlocks, \* map of heights to peer that has sent us the block (stored in blockStore) + pendingBlocks, \* map of heights to peer to which block request has been sent + syncHeight, \* height at the point syncTimeout was triggered last time + syncedBlocks \* number of blocks synced since last syncTimeout. If it is 0 when the next timeout occurs, then protocol terminates. + ] + *) + blockPool + + +\* the variables of the peers providing blocks +VARIABLES + (* + peersState [ + peerHeights, \* track peer heights + statusRequested, \* boolean set to true when StatusRequest is received. Models periodic sending of StatusRequests. + blocksRequested \* set of BlockRequests received that are not answered yet + ] + *) + peersState + + \* the variables for the network and scheduler +VARIABLES + turn, \* who is taking the turn: "Peers" or "Node" + inMsg, \* a node receives message by this variable + outMsg \* a node sends a message by this variable + + +(* the variables of the node *) +nvars == <> + +(*************** Type definitions for Apalache (model checker) **********************) +AsIntSet(S) == S <: {Int} + +\* type of process ids +PIDT == STRING +AsPidSet(S) == S <: {PIDT} + +\* ControlMessage type +CMT == [type |-> STRING, peerId |-> PIDT] \* type of control messages + +\* InMsg type +IMT == [type |-> STRING, peerId |-> PIDT, height |-> Int, block |-> BT] +AsInMsg(m) == m <: IMT +AsInMsgSet(S) == S <: {IMT} + +\* OutMsg type +OMT == [type |-> STRING, peerId |-> PIDT, height |-> Int] +AsOutMsg(m) == m <: OMT +AsOutMsgSet(S) == S <: {OMT} + +\* block pool type +BPT == [height |-> Int, peerIds |-> {PIDT}, peerHeights |-> [PIDT -> Int], + blockStore |-> [Int -> BT], receivedBlocks |-> [Int -> PIDT], + pendingBlocks |-> [Int -> PIDT], syncedBlocks |-> Int, syncHeight |-> Int] + +AsBlockPool(bp) == bp <: BPT + +(******************** Sets of messages ********************************) + +\* Control messages +ControlMsgs == + AsInMsgSet([type: {"addPeer"}, peerId: AllPeerIds]) + \union + AsInMsgSet([type: {"removePeer"}, peerId: AllPeerIds]) + \union + AsInMsgSet([type: {"syncTimeout"}]) + +\* All messages (and events) received by a node +InMsgs == + AsInMsgSet({NoMsg}) + \union + AsInMsgSet([type: {"blockResponse"}, peerId: AllPeerIds, block: Blocks]) + \union + AsInMsgSet([type: {"noBlockResponse"}, peerId: AllPeerIds, height: Heights]) + \union + AsInMsgSet([type: {"statusResponse"}, peerId: AllPeerIds, height: Heights]) + \union + ControlMsgs + +\* Messages sent by a node and received by peers (environment in our case) +OutMsgs == + AsOutMsgSet({NoMsg}) + \union + AsOutMsgSet([type: {"statusRequest"}]) \* StatusRequest is broadcast to the set of connected peers. + \union + AsOutMsgSet([type: {"blockRequest"}, peerId: AllPeerIds, height: Heights]) + + +(********************************** NODE ***********************************) + +InitNode == + \E pIds \in SUBSET AllPeerIds: \* set of peers node established initial connections with + /\ pIds \subseteq CORRECT + /\ pIds /= AsPidSet({}) \* apalache better checks non-emptiness than subtracts from SUBSET + /\ blockPool = AsBlockPool([ + height |-> TrustedHeight + 1, \* the genesis block is at height 1 + syncHeight |-> TrustedHeight + 1, \* and we are synchronized to it + peerIds |-> pIds, + peerHeights |-> [p \in AllPeerIds |-> NilHeight], \* no peer height is known + blockStore |-> + [h \in Heights |-> + IF h > TrustedHeight THEN NilBlock ELSE chain[1]], + receivedBlocks |-> [h \in Heights |-> NilPeer], + pendingBlocks |-> [h \in Heights |-> NilPeer], + syncedBlocks |-> -1 + ]) + /\ state = "running" + +\* Remove faulty peers. +\* Returns new block pool. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L222 +RemovePeers(rmPeers, bPool) == + LET keepPeers == bPool.peerIds \ rmPeers IN + LET pHeights == + [p \in AllPeerIds |-> IF p \in rmPeers THEN NilHeight ELSE bPool.peerHeights[p]] IN + + LET failedRequests == + {h \in Heights: /\ h >= bPool.height + /\ \/ bPool.pendingBlocks[h] \in rmPeers + \/ bPool.receivedBlocks[h] \in rmPeers} IN + LET pBlocks == + [h \in Heights |-> IF h \in failedRequests THEN NilPeer ELSE bPool.pendingBlocks[h]] IN + LET rBlocks == + [h \in Heights |-> IF h \in failedRequests THEN NilPeer ELSE bPool.receivedBlocks[h]] IN + LET bStore == + [h \in Heights |-> IF h \in failedRequests THEN NilBlock ELSE bPool.blockStore[h]] IN + + IF keepPeers /= bPool.peerIds + THEN [bPool EXCEPT + !.peerIds = keepPeers, + !.peerHeights = pHeights, + !.pendingBlocks = pBlocks, + !.receivedBlocks = rBlocks, + !.blockStore = bStore + ] + ELSE bPool + +\* Add a peer. +\* see https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L198 +AddPeer(peer, bPool) == + [bPool EXCEPT !.peerIds = bPool.peerIds \union {peer}] + + +(* +Handle StatusResponse message. +If valid status response, update peerHeights. +If invalid (height is smaller than the current), then remove peer. +Returns new block pool. +See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L667 +*) +HandleStatusResponse(msg, bPool) == + LET peerHeight == bPool.peerHeights[msg.peerId] IN + + IF /\ msg.peerId \in bPool.peerIds + /\ msg.height >= peerHeight + THEN \* a correct response + LET pHeights == [bPool.peerHeights EXCEPT ![msg.peerId] = msg.height] IN + [bPool EXCEPT !.peerHeights = pHeights] + ELSE RemovePeers({msg.peerId}, bPool) \* the peer has sent us message with smaller height or peer is not in our peer list + + +(* +Handle BlockResponse message. +If valid block response, update blockStore, pendingBlocks and receivedBlocks. +If invalid (unsolicited response or malformed block), then remove peer. +Returns new block pool. +See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L522 +*) +HandleBlockResponse(msg, bPool) == + LET h == msg.block.height IN + + IF /\ msg.peerId \in bPool.peerIds + /\ bPool.blockStore[h] = NilBlock + /\ bPool.pendingBlocks[h] = msg.peerId + /\ msg.block.wellFormed + THEN + [bPool EXCEPT + !.blockStore = [bPool.blockStore EXCEPT ![h] = msg.block], + !.receivedBlocks = [bPool.receivedBlocks EXCEPT![h] = msg.peerId], + !.pendingBlocks = [bPool.pendingBlocks EXCEPT![h] = NilPeer] + ] + ELSE RemovePeers({msg.peerId}, bPool) + + HandleNoBlockResponse(msg, bPool) == + RemovePeers({msg.peerId}, bPool) + + +\* Compute max peer height. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L440 +MaxPeerHeight(bPool) == + IF bPool.peerIds = AsPidSet({}) + THEN 0 \* no peers, just return 0 + ELSE LET Hts == {bPool.peerHeights[p] : p \in bPool.peerIds} IN + CHOOSE max \in Hts: \A h \in Hts: h <= max + +(* Returns next height for which request should be sent. + Returns NilHeight in case there is no height for which request can be sent. + See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L454 *) +FindNextRequestHeight(bPool) == + LET S == {i \in Heights: + /\ i >= bPool.height + /\ i <= MaxPeerHeight(bPool) + /\ bPool.blockStore[i] = NilBlock + /\ bPool.pendingBlocks[i] = NilPeer} IN + IF S = AsIntSet({}) + THEN NilHeight + ELSE + CHOOSE min \in S: \A h \in S: h >= min + +\* Returns number of pending requests for a given peer. +NumOfPendingRequests(bPool, peer) == + LET peerPendingRequests == + {h \in Heights: + /\ h >= bPool.height + /\ bPool.pendingBlocks[h] = peer + } + IN + Cardinality(peerPendingRequests) + +(* Returns peer that can serve block for a given height. + Returns NilPeer in case there are no such peer. + See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L477 *) +FindPeerToServe(bPool, h) == + LET peersThatCanServe == { p \in bPool.peerIds: + /\ bPool.peerHeights[p] >= h + /\ NumOfPendingRequests(bPool, p) < PEER_MAX_REQUESTS } IN + + LET pendingBlocks == + {i \in Heights: + /\ i >= bPool.height + /\ \/ bPool.pendingBlocks[i] /= NilPeer + \/ bPool.blockStore[i] /= NilBlock + } IN + + IF \/ peersThatCanServe = AsPidSet({}) + \/ Cardinality(pendingBlocks) >= TARGET_PENDING + THEN NilPeer + \* pick a peer that can serve request for height h that has minimum number of pending requests + ELSE CHOOSE p \in peersThatCanServe: \A q \in peersThatCanServe: + /\ NumOfPendingRequests(bPool, p) <= NumOfPendingRequests(bPool, q) + + +\* Make a request for a block (if possible) and return a request message and block poool. +CreateRequest(bPool) == + LET nextHeight == FindNextRequestHeight(bPool) IN + + IF nextHeight = NilHeight THEN [msg |-> AsOutMsg(NoMsg), pool |-> bPool] + ELSE + LET peer == FindPeerToServe(bPool, nextHeight) IN + IF peer = NilPeer THEN [msg |-> AsOutMsg(NoMsg), pool |-> bPool] + ELSE + LET m == [type |-> "blockRequest", peerId |-> peer, height |-> nextHeight] IN + LET newPool == [bPool EXCEPT + !.pendingBlocks = [bPool.pendingBlocks EXCEPT ![nextHeight] = peer] + ] IN + [msg |-> m, pool |-> newPool] + + +\* Returns node state, i.e., defines termination condition. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L432 +ComputeNextState(bPool) == + IF bPool.syncedBlocks = 0 \* corresponds to the syncTimeout in case no progress has been made for a period of time. + THEN "finished" + ELSE IF /\ bPool.height > 1 + /\ bPool.height >= MaxPeerHeight(bPool) \* see https://github.com/tendermint/tendermint/blob/61057a8b0af2beadee106e47c4616b279e83c920/blockchain/v2/scheduler.go#L566 + THEN "finished" + ELSE "running" + +(* Verify if commit is for the given block id and if commit has enough voting power. + See https://github.com/tendermint/tendermint/blob/61057a8b0af2beadee106e47c4616b279e83c920/blockchain/v2/processor_context.go#L12 *) +VerifyCommit(block, lastCommit) == + PossibleCommit(block, lastCommit) + +(* Tries to execute next block in the pool, i.e., defines block validation logic. + Returns new block pool (peers that has send invalid blocks are removed). + See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/processor.go#L135 *) +ExecuteBlocks(bPool) == + LET bStore == bPool.blockStore IN + LET block0 == bStore[bPool.height - 1] IN + \* blockPool is initialized with height = TrustedHeight + 1, + \* so bStore[bPool.height - 1] is well defined + LET block1 == bStore[bPool.height] IN + LET block2 == bStore[bPool.height + 1] IN + + IF block1 = NilBlock \/ block2 = NilBlock + THEN bPool \* we don't have two next consecutive blocks + + ELSE IF ~IsMatchingValidators(block1, block0.NextVS) + \* Check that block1.VS = block0.Next. + \* Otherwise, CorrectBlocksInv fails. + \* In the implementation NextVS is part of the application state, + \* so a mismatch can be found without access to block0.NextVS. + THEN \* the block does not have the expected validator set + RemovePeers({bPool.receivedBlocks[bPool.height]}, bPool) + ELSE IF ~VerifyCommit(block1, block2.lastCommit) + \* Verify commit of block2 based on block1. + \* Interestingly, we do not have to call IsMatchingValidators. + THEN \* remove the peers of block1 and block2, as they are considered faulty + RemovePeers({bPool.receivedBlocks[bPool.height], + bPool.receivedBlocks[bPool.height + 1]}, + bPool) + ELSE \* all good, execute block at position height + [bPool EXCEPT !.height = bPool.height + 1] + + +\* Defines logic for pruning peers. +\* See https://github.com/tendermint/tendermint/blob/dac030d6daf4d3e066d84275911128856838af4e/blockchain/v2/scheduler.go#L613 +TryPrunePeer(bPool, suspectedSet, isTimedOut) == + (* -----------------------------------------------------------------------------------------------------------------------*) + (* Corresponds to function prunablePeers in scheduler.go file. Note that this function only checks if block has been *) + (* received from a peer during peerTimeout period. *) + (* Note that in case no request has been scheduled to a correct peer, or a request has been scheduled *) + (* recently, so the peer hasn't responded yet, a peer will be removed as no block is received within peerTimeout. *) + (* In case of faulty peers, we don't have any guarantee that they will respond. *) + (* Therefore, we model this with nondeterministic behavior as it could lead to peer removal, for both correct and faulty. *) + (* See scheduler.go *) + (* https://github.com/tendermint/tendermint/blob/4298bbcc4e25be78e3c4f21979d6aa01aede6e87/blockchain/v2/scheduler.go#L335 *) + LET toRemovePeers == bPool.peerIds \intersect suspectedSet IN + + (* + Corresponds to logic for pruning a peer that is responsible for delivering block for the next height. + The pruning logic for the next height is based on the time when a BlockRequest is sent. Therefore, if a request is sent + to a correct peer for the next height (blockPool.height), it should never be removed by this check as we assume that + correct peers respond timely and reliably. However, if a request is sent to a faulty peer then we + might get response on time or not, which is modelled with nondeterministic isTimedOut flag. + See scheduler.go + https://github.com/tendermint/tendermint/blob/4298bbcc4e25be78e3c4f21979d6aa01aede6e87/blockchain/v2/scheduler.go#L617 + *) + LET nextHeightPeer == bPool.pendingBlocks[bPool.height] IN + LET prunablePeers == + IF /\ nextHeightPeer /= NilPeer + /\ nextHeightPeer \in FAULTY + /\ isTimedOut + THEN toRemovePeers \union {nextHeightPeer} + ELSE toRemovePeers + IN + RemovePeers(prunablePeers, bPool) + + +\* Handle SyncTimeout. It models if progress has been made (height has increased) since the last SyncTimeout event. +HandleSyncTimeout(bPool) == + [bPool EXCEPT + !.syncedBlocks = bPool.height - bPool.syncHeight, + !.syncHeight = bPool.height + ] + +HandleResponse(msg, bPool) == + IF msg.type = "blockResponse" THEN + HandleBlockResponse(msg, bPool) + ELSE IF msg.type = "noBlockResponse" THEN + HandleNoBlockResponse(msg, bPool) + ELSE IF msg.type = "statusResponse" THEN + HandleStatusResponse(msg, bPool) + ELSE IF msg.type = "addPeer" THEN + AddPeer(msg.peerId, bPool) + ELSE IF msg.type = "removePeer" THEN + RemovePeers({msg.peerId}, bPool) + ELSE IF msg.type = "syncTimeout" THEN + HandleSyncTimeout(bPool) + ELSE + bPool + + +(* + At every node step we executed the following steps (atomically): + 1) input message is consumed and the corresponding handler is called, + 2) pruning logic is called + 3) block execution is triggered (we try to execute block at next height) + 4) a request to a peer is made (if possible) and + 5) we decide if termination condition is satisifed so we stop. +*) +NodeStep == + \E suspectedSet \in SUBSET AllPeerIds: \* suspectedSet is a nondeterministic set of peers + \E isTimedOut \in BOOLEAN: + LET bPool == HandleResponse(inMsg, blockPool) IN + LET bp == TryPrunePeer(bPool, suspectedSet, isTimedOut) IN + LET nbPool == ExecuteBlocks(bp) IN + LET msgAndPool == CreateRequest(nbPool) IN + LET nstate == ComputeNextState(msgAndPool.pool) IN + + /\ state' = nstate + /\ blockPool' = msgAndPool.pool + /\ outMsg' = msgAndPool.msg + /\ inMsg' = AsInMsg(NoMsg) + + +\* If node is running, then in every step we try to create blockRequest. +\* In addition, input message (if exists) is consumed and processed. +NextNode == + \/ /\ state = "running" + /\ NodeStep + + \/ /\ state = "finished" + /\ UNCHANGED <> + + +(********************************** Peers ***********************************) + +InitPeers == + \E pHeights \in [AllPeerIds -> Heights]: + peersState = [ + peerHeights |-> pHeights, + statusRequested |-> FALSE, + blocksRequested |-> AsOutMsgSet({}) + ] + +HandleStatusRequest(msg, pState) == + [pState EXCEPT + !.statusRequested = TRUE + ] + +HandleBlockRequest(msg, pState) == + [pState EXCEPT + !.blocksRequested = pState.blocksRequested \union AsOutMsgSet({msg}) + ] + +HandleRequest(msg, pState) == + IF msg = AsOutMsg(NoMsg) + THEN pState + ELSE IF msg.type = "statusRequest" + THEN HandleStatusRequest(msg, pState) + ELSE HandleBlockRequest(msg, pState) + +CreateStatusResponse(peer, pState, anyHeight) == + LET m == + IF peer \in CORRECT + THEN AsInMsg([type |-> "statusResponse", peerId |-> peer, height |-> pState.peerHeights[peer]]) + ELSE AsInMsg([type |-> "statusResponse", peerId |-> peer, height |-> anyHeight]) IN + + [msg |-> m, peers |-> pState] + +CreateBlockResponse(msg, pState, arbitraryBlock) == + LET m == + IF msg.peerId \in CORRECT + THEN AsInMsg([type |-> "blockResponse", peerId |-> msg.peerId, block |-> CorrectBlock(msg.height)]) + ELSE AsInMsg([type |-> "blockResponse", peerId |-> msg.peerId, block |-> arbitraryBlock]) IN + LET npState == + [pState EXCEPT + !.blocksRequested = pState.blocksRequested \ {msg} + ] IN + [msg |-> m, peers |-> npState] + +GrowPeerHeight(pState) == + \E p \in CORRECT: + /\ pState.peerHeights[p] < MAX_HEIGHT + /\ peersState' = [pState EXCEPT !.peerHeights[p] = @ + 1] + /\ inMsg' = AsInMsg(NoMsg) + +SendStatusResponseMessage(pState) == + /\ \E arbitraryHeight \in Heights: + \E peer \in AllPeerIds: + LET msgAndPeers == CreateStatusResponse(peer, pState, arbitraryHeight) IN + /\ peersState' = msgAndPeers.peers + /\ inMsg' = msgAndPeers.msg + + +SendAddPeerMessage == + \E peer \in AllPeerIds: + inMsg' = AsInMsg([type |-> "addPeer", peerId |-> peer]) + +SendRemovePeerMessage == + \E peer \in AllPeerIds: + inMsg' = AsInMsg([type |-> "removePeer", peerId |-> peer]) + +SendSyncTimeoutMessage == + inMsg' = AsInMsg([type |-> "syncTimeout"]) + + +SendControlMessage == + \/ SendAddPeerMessage + \/ SendRemovePeerMessage + \/ SendSyncTimeoutMessage + +\* An extremely important property of block hashes (blockId): +\* If a commit is correct, then the previous block in the block store must be also correct. +UnforgeableBlockId(height, block) == + block.hashEqRef => block = chain[height] + +\* A faulty peer cannot forge enough of the validators signatures. +\* A more precise rule should have checked that the commiters have over 2/3 of the VS's voting power. +NoFork(height, block) == + (height > 1 /\ block.lastCommit.committers = chain[height - 1].VS) + => block.lastCommit.blockIdEqRef + +\* Can be block produced by a faulty peer +IsBlockByFaulty(height, block) == + /\ block.height = height + /\ UnforgeableBlockId(height, block) + /\ NoFork(height, block) + +SendBlockResponseMessage(pState) == + \* a response to a requested block: either by a correct, or by a faulty peer + \/ /\ pState.blocksRequested /= AsOutMsgSet({}) + /\ \E msg \in pState.blocksRequested: + \E block \in Blocks: + /\ IsBlockByFaulty(msg.height, block) + /\ LET msgAndPeers == CreateBlockResponse(msg, pState, block) IN + /\ peersState' = msgAndPeers.peers + /\ inMsg' = msgAndPeers.msg + + \* a faulty peer can always send an unsolicited block + \/ \E peerId \in FAULTY: + \E block \in Blocks: + /\ IsBlockByFaulty(block.height, block) + /\ peersState' = pState + /\ inMsg' = AsInMsg([type |-> "blockResponse", + peerId |-> peerId, block |-> block]) + +SendNoBlockResponseMessage(pState) == + /\ peersState' = pState + /\ inMsg' \in AsInMsgSet([type: {"noBlockResponse"}, peerId: FAULTY, height: Heights]) + + +SendResponseMessage(pState) == + \/ SendBlockResponseMessage(pState) + \/ SendNoBlockResponseMessage(pState) + \/ SendStatusResponseMessage(pState) + + +NextEnvStep(pState) == + \/ SendResponseMessage(pState) + \/ GrowPeerHeight(pState) + \/ SendControlMessage /\ peersState' = pState + \* note that we propagate pState that was missing in the previous version + + +\* Peers consume a message and update it's local state. It then makes a single step, i.e., it sends at most single message. +\* Message sent could be either a response to a request or faulty message (sent by faulty processes). +NextPeers == + LET pState == HandleRequest(outMsg, peersState) IN + /\ outMsg' = AsOutMsg(NoMsg) + /\ NextEnvStep(pState) + + +\* the composition of the node, the peers, the network and scheduler +Init == + /\ IsCorrectChain(chain) \* initialize the blockchain + /\ InitNode + /\ InitPeers + /\ turn = "Peers" + /\ inMsg = AsInMsg(NoMsg) + /\ outMsg = AsOutMsg([type |-> "statusRequest"]) + +Next == + IF turn = "Peers" + THEN + /\ NextPeers + /\ turn' = "Node" + /\ UNCHANGED <> + ELSE + /\ NextNode + /\ turn' = "Peers" + /\ UNCHANGED <> + + +FlipTurn == + turn' = + IF turn = "Peers" THEN + "Node" + ELSE + "Peers" + +\* Compute max peer height. Used as a helper operator in properties. +MaxCorrectPeerHeight(bPool) == + LET correctPeers == {p \in bPool.peerIds: p \in CORRECT} IN + IF correctPeers = AsPidSet({}) + THEN 0 \* no peers, just return 0 + ELSE LET Hts == {bPool.peerHeights[p] : p \in correctPeers} IN + CHOOSE max \in Hts: \A h \in Hts: h <= max + +\* properties to check +TypeOK == + /\ state \in States + /\ inMsg \in InMsgs + /\ outMsg \in OutMsgs + /\ turn \in {"Peers", "Node"} + /\ peersState \in [ + peerHeights: [AllPeerIds -> Heights \union {NilHeight}], + statusRequested: BOOLEAN, + blocksRequested: + SUBSET + [type: {"blockRequest"}, peerId: AllPeerIds, height: Heights] + + ] + /\ blockPool \in [ + height: Heights, + peerIds: SUBSET AllPeerIds, + peerHeights: [AllPeerIds -> Heights \union {NilHeight}], + blockStore: [Heights -> Blocks \union {NilBlock}], + receivedBlocks: [Heights -> AllPeerIds \union {NilPeer}], + pendingBlocks: [Heights -> AllPeerIds \union {NilPeer}], + syncedBlocks: Heights \union {NilHeight, -1}, + syncHeight: Heights + ] + +(* Incorrect synchronization: The last block may be never received *) +Sync1 == + [](state = "finished" => + blockPool.height >= MaxCorrectPeerHeight(blockPool)) + +Sync1AsInv == + state = "finished" => blockPool.height >= MaxCorrectPeerHeight(blockPool) + +(* Incorrect synchronization, as there may be a timeout *) +Sync2 == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ [] (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +Sync2AsInv == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +(* Correct synchronization *) +Sync3 == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ blockPool.syncedBlocks <= 0 \* timeout + \/ [] (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +Sync3AsInv == + \A p \in CORRECT: + \/ p \notin blockPool.peerIds + \/ blockPool.syncedBlocks <= 0 \* timeout + \/ (state = "finished" => blockPool.height >= blockPool.peerHeights[p] - 1) + +(* Naive termination *) +\* This property is violated, as the faulty peers may produce infinitely many responses +Termination == + WF_turn(FlipTurn) => <>(state = "finished") + +(* Termination by timeout: the protocol terminates, if there is a timeout *) +\* the precondition: fair flip turn and eventual timeout when no new blocks were synchronized +TerminationByTOPre == + /\ WF_turn(FlipTurn) + /\ <>(inMsg.type = "syncTimeout" /\ blockPool.height <= blockPool.syncHeight) + +TerminationByTO == + TerminationByTOPre => <>(state = "finished") + +(* The termination property when we only have correct peers *) +\* as correct peers may spam the node with addPeer, removePeer, and statusResponse, +\* we have to enforce eventual response (there are no queues in our spec) +CorrBlockResponse == + \A h \in Heights: + [](outMsg.type = "blockRequest" /\ outMsg.height = h + => <>(inMsg.type = "blockResponse" /\ inMsg.block.height = h)) + +\* a precondition for termination in presence of only correct processes +TerminationCorrPre == + /\ FAULTY = AsPidSet({}) + /\ WF_turn(FlipTurn) + /\ CorrBlockResponse + +\* termination when there are only correct processes +TerminationCorr == + TerminationCorrPre => <>(state = "finished") + +\* All synchronized blocks (but the last one) are exactly like in the reference chain +CorrectBlocksInv == + \/ state /= "finished" + \/ \A h \in 1..(blockPool.height - 1): + blockPool.blockStore[h] = chain[h] + +\* A false expectation that the protocol only finishes with the blocks +\* from the processes that had not been suspected in being faulty +SyncFromCorrectInv == + \/ state /= "finished" + \/ \A h \in 1..blockPool.height: + blockPool.receivedBlocks[h] \in blockPool.peerIds \union {NilPeer} + +\* A false expectation that a correct process is never removed from the set of peer ids. +\* A correct process may reply too late and then gets evicted. +CorrectNeverSuspectedInv == + CORRECT \subseteq blockPool.peerIds + +BlockPoolInvariant == + \A h \in Heights: + \* waiting for a block to arrive + \/ /\ blockPool.receivedBlocks[h] = NilPeer + /\ blockPool.blockStore[h] = NilBlock + \* valid block is received and is present in the store + \/ /\ blockPool.receivedBlocks[h] /= NilPeer + /\ blockPool.blockStore[h] /= NilBlock + /\ blockPool.pendingBlocks[h] = NilPeer + +(* a few simple properties that trigger counterexamples *) + +\* Shows execution in which peer set is empty +PeerSetIsNeverEmpty == blockPool.peerIds /= AsPidSet({}) + +\* Shows execution in which state = "finished" and MaxPeerHeight is not equal to 1 +StateNotFinished == + state /= "finished" \/ MaxPeerHeight(blockPool) = 1 + + +============================================================================= + +\*============================================================================= +\* Modification History +\* Last modified Fri May 29 20:41:53 CEST 2020 by igor +\* Last modified Thu Apr 16 16:57:22 CEST 2020 by zarkomilosevic +\* Created Tue Feb 04 10:36:18 CET 2020 by zarkomilosevic