Files
tendermint/blocksync/reactor.go
Thane Thomson b8c0a8b6ac [cherry-picked] abci++: Propagate vote extensions (RFC 017) (#8433)
* Add protos for ExtendedCommit

Cherry-pick from e73f0178b72a16ee81f8e856aadf651f2c62ec6e just the
changes to the .proto files, since we have deleted the .intermediate
files.

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* make proto-gen

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* BlockStore holds extended commit

Cherry-pick 8d504d4b50ec6afbdffe2df7ababbef30e15053d and fix conflicts.

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Reshuffle ExtendedCommit and ExtendedCommitSig

Separate the data structures and functions from their Commit-oriented
counterparts to adhere to the current coding style.

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix exit condition in blocksync

* Add note to remove TxResult proto

As Sergio pointed out in 3e31aa6f583cdc71e208ed03a82f1d804ec0de49, this
proto message can probably be removed. We should do this in a separate
PR.

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Lift termination condition into for loop

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Enforce vote extension signature requirement

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Expand on comment for PeekTwoBlocks for posterity

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Isolate TODO more clearly

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* make mockery

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix comment

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Make panic output from BlockStore.SaveBlock more readable

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Add helper methods to ExtendedCommitSig and ExtendedCommit

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix most tests except TestHandshake*

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix store prefix collision

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix TestBlockFetchAtHeight

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove global state from store tests

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Apply suggestions from code review

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
Co-authored-by: Sergio Mena <sergio@informal.systems>

* blocksync: Just return error

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* make format

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Remove unused/commented-out code

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* blocksync: Change pool AddBlock function signature to return errors

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Improve legibility of switch statements

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* blocksync: Expand on extended commit requirement in AddBlock description

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* blocksync: Return error without also logging it

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* consensus: Rename short-lived local variable

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* consensus: Allocate TODO to Sergio

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* evidence/pool_test: Inline slice construction

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* state: Rename LoadBlockExtCommit to LoadBlockExtendedCommit

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* proto: Remove TODO on TxResult

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Minor format

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Reformat ExtendedCommitSig.BlockID

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Remove NewExtendedCommit constructor

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Remove NewCommit constructor

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Shorten receiver names for ExtendedCommit

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Convert ExtendedCommit.Copy to a deep clone

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Assign TODO to Sergio

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Fix legibility nits

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Improve legibility

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* store/state: Add TODO to move prefixes to common package

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Propagate validator info to PrepareProposal

In order to propagate validator voting power through to PrepareProposal,
we need to load the validator set info from the height corresponding to
the extended commit that we're passing through to PrepareProposal as the
"LocalLastCommit".

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Rename local var for clarity

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix TestMaxProposalBlockSize

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Rename local var for clarity

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove debug log

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove CommigSig.ForBlock helper

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove CommigSig.Absent helper

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove ExtendedCommitSig.ForBlock helper

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove ExtendedCommitSig.Absent helper

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* There are no extended commits below the initial height

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Fix comment grammar

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Remove JSON encoding from ExtendedCommit

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Embed CommitSig into ExtendedCommitSig instead of duplicating fields

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Rename ExtendedCommit vote_extension field to extension for consistency with domain types

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* blocksync: Panic if we peek a block without an extended commit

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Apply suggestions from code review

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Remove Sergio from TODO

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Increase hard-coded vote extension max size to 1MB

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* state: Remove unnecessary comment

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* state: Ensure no of commit sigs equals validator set length

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* make format

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Minor legibility improvements

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Improve legibility

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Remove unused GetVotes function on VoteSet

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Refactor TestMaxProposalBlockSize to construct more realistic extended commit

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Refactor buildExtendedCommitInfo to resemble buildLastCommitInfo

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Apply suggestions from code review

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* abci++: Disable VerifyVoteExtension call on nil precommits (#8491)

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* types: Require vote extensions on non-nil precommits and not otherwise

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Disable lint

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Increase timeout for TestReactorVotingPowerChange to counter flakiness

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Only sign and verify vote extensions in non-nil precommits

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Revert "Disable lint"

This reverts commit 6fffbf9402.

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Add missing non-nil check uncovered non-deterministically in TestHandshakeReplayAll

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Expand error message for accuracy

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Only call ExtendVote when we make non-nil precommits

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Revert "Increase timeout for TestReactorVotingPowerChange to counter flakiness"

This reverts commit af514939db.

Signed-off-by: Thane Thomson <connect@thanethomson.com>

* Refactor ValidateBasic for ExtendedCommitSig for legibility

Signed-off-by: Thane Thomson <connect@thanethomson.com>

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
2022-11-30 21:19:14 +01:00

459 lines
13 KiB
Go

package blocksync
import (
"fmt"
"reflect"
"time"
"github.com/tendermint/tendermint/libs/log"
"github.com/tendermint/tendermint/p2p"
bcproto "github.com/tendermint/tendermint/proto/tendermint/blocksync"
sm "github.com/tendermint/tendermint/state"
"github.com/tendermint/tendermint/store"
"github.com/tendermint/tendermint/types"
)
const (
// BlocksyncChannel is a channel for blocks and status updates (`BlockStore` height)
BlocksyncChannel = byte(0x40)
trySyncIntervalMS = 10
// stop syncing when last block's time is
// within this much of the system time.
// stopSyncingDurationMinutes = 10
// ask for best height every 10s
statusUpdateIntervalSeconds = 10
// check if we should switch to consensus reactor
switchToConsensusIntervalSeconds = 1
)
type consensusReactor interface {
// for when we switch from blocksync reactor and block sync to
// the consensus machine
SwitchToConsensus(state sm.State, skipWAL bool)
}
type peerError struct {
err error
peerID p2p.ID
}
func (e peerError) Error() string {
return fmt.Sprintf("error with peer %v: %s", e.peerID, e.err.Error())
}
// Reactor handles long-term catchup syncing.
type Reactor struct {
p2p.BaseReactor
// immutable
initialState sm.State
blockExec *sm.BlockExecutor
store sm.BlockStore
pool *BlockPool
blockSync bool
requestsCh <-chan BlockRequest
errorsCh <-chan peerError
metrics *Metrics
}
// NewReactor returns new reactor instance.
func NewReactor(state sm.State, blockExec *sm.BlockExecutor, store *store.BlockStore,
blockSync bool, metrics *Metrics) *Reactor {
if state.LastBlockHeight != store.Height() {
panic(fmt.Sprintf("state (%v) and store (%v) height mismatch", state.LastBlockHeight,
store.Height()))
}
requestsCh := make(chan BlockRequest, maxTotalRequesters)
const capacity = 1000 // must be bigger than peers count
errorsCh := make(chan peerError, capacity) // so we don't block in #Receive#pool.AddBlock
startHeight := store.Height() + 1
if startHeight == 1 {
startHeight = state.InitialHeight
}
pool := NewBlockPool(startHeight, requestsCh, errorsCh)
bcR := &Reactor{
initialState: state,
blockExec: blockExec,
store: store,
pool: pool,
blockSync: blockSync,
requestsCh: requestsCh,
errorsCh: errorsCh,
metrics: metrics,
}
bcR.BaseReactor = *p2p.NewBaseReactor("Reactor", bcR)
return bcR
}
// SetLogger implements service.Service by setting the logger on reactor and pool.
func (bcR *Reactor) SetLogger(l log.Logger) {
bcR.BaseService.Logger = l
bcR.pool.Logger = l
}
// OnStart implements service.Service.
func (bcR *Reactor) OnStart() error {
if bcR.blockSync {
err := bcR.pool.Start()
if err != nil {
return err
}
go bcR.poolRoutine(false)
}
return nil
}
// SwitchToBlockSync is called by the state sync reactor when switching to block sync.
func (bcR *Reactor) SwitchToBlockSync(state sm.State) error {
bcR.blockSync = true
bcR.initialState = state
bcR.pool.height = state.LastBlockHeight + 1
err := bcR.pool.Start()
if err != nil {
return err
}
go bcR.poolRoutine(true)
return nil
}
// OnStop implements service.Service.
func (bcR *Reactor) OnStop() {
if bcR.blockSync {
if err := bcR.pool.Stop(); err != nil {
bcR.Logger.Error("Error stopping pool", "err", err)
}
}
}
// GetChannels implements Reactor
func (bcR *Reactor) GetChannels() []*p2p.ChannelDescriptor {
return []*p2p.ChannelDescriptor{
{
ID: BlocksyncChannel,
Priority: 5,
SendQueueCapacity: 1000,
RecvBufferCapacity: 50 * 4096,
RecvMessageCapacity: MaxMsgSize,
MessageType: &bcproto.Message{},
},
}
}
// AddPeer implements Reactor by sending our state to peer.
func (bcR *Reactor) AddPeer(peer p2p.Peer) {
peer.Send(p2p.Envelope{
ChannelID: BlocksyncChannel,
Message: &bcproto.StatusResponse{
Base: bcR.store.Base(),
Height: bcR.store.Height(),
},
})
// it's OK if send fails. will try later in poolRoutine
// peer is added to the pool once we receive the first
// bcStatusResponseMessage from the peer and call pool.SetPeerRange
}
// RemovePeer implements Reactor by removing peer from the pool.
func (bcR *Reactor) RemovePeer(peer p2p.Peer, reason interface{}) {
bcR.pool.RemovePeer(peer.ID())
}
// respondToPeer loads a block and sends it to the requesting peer,
// if we have it. Otherwise, we'll respond saying we don't have it.
func (bcR *Reactor) respondToPeer(msg *bcproto.BlockRequest,
src p2p.Peer) (queued bool) {
block := bcR.store.LoadBlock(msg.Height)
if block != nil {
extCommit := bcR.store.LoadBlockExtendedCommit(msg.Height)
if extCommit == nil {
bcR.Logger.Error("found block in store without extended commit", "block", block)
return false
}
bl, err := block.ToProto()
if err != nil {
bcR.Logger.Error("could not convert msg to protobuf", "err", err)
return false
}
return src.TrySend(p2p.Envelope{
ChannelID: BlocksyncChannel,
Message: &bcproto.BlockResponse{
Block: bl,
ExtCommit: extCommit.ToProto(),
},
})
}
bcR.Logger.Info("Peer asking for a block we don't have", "src", src, "height", msg.Height)
return src.TrySend(p2p.Envelope{
ChannelID: BlocksyncChannel,
Message: &bcproto.NoBlockResponse{Height: msg.Height},
})
}
// Receive implements Reactor by handling 4 types of messages (look below).
func (bcR *Reactor) Receive(e p2p.Envelope) {
if err := ValidateMsg(e.Message); err != nil {
bcR.Logger.Error("Peer sent us invalid msg", "peer", e.Src, "msg", e.Message, "err", err)
bcR.Switch.StopPeerForError(e.Src, err)
return
}
bcR.Logger.Debug("Receive", "e.Src", e.Src, "chID", e.ChannelID, "msg", e.Message)
switch msg := e.Message.(type) {
case *bcproto.BlockRequest:
bcR.respondToPeer(msg, e.Src)
case *bcproto.BlockResponse:
bi, err := types.BlockFromProto(msg.Block)
if err != nil {
bcR.Logger.Error("Block content is invalid", "err", err)
return
}
extCommit, err := types.ExtendedCommitFromProto(msg.ExtCommit)
if err != nil {
bcR.Logger.Error("failed to convert extended commit from proto",
"peer", src,
"err", err)
return
}
if err := bcR.pool.AddBlock(e.Src.ID(), bi, extCommit, msg.Block.Size()); err != nil {
bcR.Logger.Error("failed to add block", "err", err)
}
case *bcproto.StatusRequest:
// Send peer our state.
e.Src.TrySend(p2p.Envelope{
ChannelID: BlocksyncChannel,
Message: &bcproto.StatusResponse{
Height: bcR.store.Height(),
Base: bcR.store.Base(),
},
})
case *bcproto.StatusResponse:
// Got a peer status. Unverified.
bcR.pool.SetPeerRange(e.Src.ID(), msg.Base, msg.Height)
case *bcproto.NoBlockResponse:
bcR.Logger.Debug("Peer does not have requested block", "peer", e.Src, "height", msg.Height)
default:
bcR.Logger.Error(fmt.Sprintf("Unknown message type %v", reflect.TypeOf(msg)))
}
}
// Handle messages from the poolReactor telling the reactor what to do.
// NOTE: Don't sleep in the FOR_LOOP or otherwise slow it down!
func (bcR *Reactor) poolRoutine(stateSynced bool) {
bcR.metrics.Syncing.Set(1)
defer bcR.metrics.Syncing.Set(0)
trySyncTicker := time.NewTicker(trySyncIntervalMS * time.Millisecond)
defer trySyncTicker.Stop()
statusUpdateTicker := time.NewTicker(statusUpdateIntervalSeconds * time.Second)
defer statusUpdateTicker.Stop()
switchToConsensusTicker := time.NewTicker(switchToConsensusIntervalSeconds * time.Second)
defer switchToConsensusTicker.Stop()
blocksSynced := uint64(0)
chainID := bcR.initialState.ChainID
state := bcR.initialState
lastHundred := time.Now()
lastRate := 0.0
didProcessCh := make(chan struct{}, 1)
go func() {
for {
select {
case <-bcR.Quit():
return
case <-bcR.pool.Quit():
return
case request := <-bcR.requestsCh:
peer := bcR.Switch.Peers().Get(request.PeerID)
if peer == nil {
continue
}
queued := peer.TrySend(p2p.Envelope{
ChannelID: BlocksyncChannel,
Message: &bcproto.BlockRequest{Height: request.Height},
})
if !queued {
bcR.Logger.Debug("Send queue is full, drop block request", "peer", peer.ID(), "height", request.Height)
}
case err := <-bcR.errorsCh:
peer := bcR.Switch.Peers().Get(err.peerID)
if peer != nil {
bcR.Switch.StopPeerForError(peer, err)
}
case <-statusUpdateTicker.C:
// ask for status updates
go bcR.BroadcastStatusRequest()
}
}
}()
FOR_LOOP:
for {
select {
case <-switchToConsensusTicker.C:
height, numPending, lenRequesters := bcR.pool.GetStatus()
outbound, inbound, _ := bcR.Switch.NumPeers()
bcR.Logger.Debug("Consensus ticker", "numPending", numPending, "total", lenRequesters,
"outbound", outbound, "inbound", inbound)
// TODO(sergio) Might be needed for implementing the upgrading solution. Remove after that
if state.LastBlockHeight > 0 && blocksSynced == 0 {
// Having state-synced, we need to blocksync at least one block
bcR.Logger.Info(
"no seen commit yet",
"height", height,
"last_block_height", state.LastBlockHeight,
"initial_height", state.InitialHeight,
"max_peer_height", bcR.pool.MaxPeerHeight(),
)
continue FOR_LOOP
}
if bcR.pool.IsCaughtUp() {
bcR.Logger.Info("Time to switch to consensus reactor!", "height", height)
if err := bcR.pool.Stop(); err != nil {
bcR.Logger.Error("Error stopping pool", "err", err)
}
conR, ok := bcR.Switch.Reactor("CONSENSUS").(consensusReactor)
if ok {
conR.SwitchToConsensus(state, blocksSynced > 0 || stateSynced)
}
// else {
// should only happen during testing
// }
break FOR_LOOP
}
case <-trySyncTicker.C: // chan time
select {
case didProcessCh <- struct{}{}:
default:
}
case <-didProcessCh:
// NOTE: It is a subtle mistake to process more than a single block
// at a time (e.g. 10) here, because we only TrySend 1 request per
// loop. The ratio mismatch can result in starving of blocks, a
// sudden burst of requests and responses, and repeat.
// Consequently, it is better to split these routines rather than
// coupling them as it's written here. TODO uncouple from request
// routine.
// See if there are any blocks to sync.
first, second, extCommit := bcR.pool.PeekTwoBlocks()
// bcR.Logger.Info("TrySync peeked", "first", first, "second", second)
if first == nil || second == nil || extCommit == nil {
if first != nil && extCommit == nil {
// See https://github.com/tendermint/tendermint/pull/8433#discussion_r866790631
panic(fmt.Errorf("peeked first block without extended commit at height %d - possible node store corruption", first.Height))
}
// we need all to sync the first block
continue FOR_LOOP
} else {
// Try again quickly next loop.
didProcessCh <- struct{}{}
}
firstParts, err := first.MakePartSet(types.BlockPartSizeBytes)
if err != nil {
bcR.Logger.Error("failed to make ",
"height", first.Height,
"err", err.Error())
break FOR_LOOP
}
firstPartSetHeader := firstParts.Header()
firstID := types.BlockID{Hash: first.Hash(), PartSetHeader: firstPartSetHeader}
// Finally, verify the first block using the second's commit
// NOTE: we can probably make this more efficient, but note that calling
// first.Hash() doesn't verify the tx contents, so MakePartSet() is
// currently necessary.
// TODO(sergio): Should we also validate against the extended commit?
err = state.Validators.VerifyCommitLight(
chainID, firstID, first.Height, second.LastCommit)
if err == nil {
// validate the block before we persist it
err = bcR.blockExec.ValidateBlock(state, first)
}
if err != nil {
bcR.Logger.Error("Error in validation", "err", err)
peerID := bcR.pool.RedoRequest(first.Height)
peer := bcR.Switch.Peers().Get(peerID)
if peer != nil {
// NOTE: we've already removed the peer's request, but we
// still need to clean up the rest.
bcR.Switch.StopPeerForError(peer, fmt.Errorf("Reactor validation error: %v", err))
}
peerID2 := bcR.pool.RedoRequest(second.Height)
peer2 := bcR.Switch.Peers().Get(peerID2)
if peer2 != nil && peer2 != peer {
// NOTE: we've already removed the peer's request, but we
// still need to clean up the rest.
bcR.Switch.StopPeerForError(peer2, fmt.Errorf("Reactor validation error: %v", err))
}
continue FOR_LOOP
}
bcR.pool.PopRequest()
// TODO: batch saves so we dont persist to disk every block
bcR.store.SaveBlock(first, firstParts, extCommit)
// TODO: same thing for app - but we would need a way to
// get the hash without persisting the state
state, err = bcR.blockExec.ApplyBlock(state, firstID, first)
if err != nil {
// TODO This is bad, are we zombie?
panic(fmt.Sprintf("Failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err))
}
blocksSynced++
if blocksSynced%100 == 0 {
lastRate = 0.9*lastRate + 0.1*(100/time.Since(lastHundred).Seconds())
bcR.Logger.Info("Block Sync Rate", "height", bcR.pool.height,
"max_peer_height", bcR.pool.MaxPeerHeight(), "blocks/s", lastRate)
lastHundred = time.Now()
}
continue FOR_LOOP
case <-bcR.Quit():
break FOR_LOOP
}
}
}
// BroadcastStatusRequest broadcasts `BlockStore` base and height.
func (bcR *Reactor) BroadcastStatusRequest() {
bcR.Switch.Broadcast(p2p.Envelope{
ChannelID: BlocksyncChannel,
Message: &bcproto.StatusRequest{},
})
}