Compare commits

..

14 Commits

Author SHA1 Message Date
William Banfield
2fbb304237 nolint lll 2021-09-21 17:25:01 -04:00
William Banfield
c3805078ec remove ensureLock 2021-09-21 17:19:43 -04:00
William Banfield
78ef22f49d cleanup after rebase 2021-09-21 16:51:59 -04:00
William Banfield
e712ec4137 mark decide and validate methods as helpers 2021-09-21 16:40:27 -04:00
William Banfield
b96e4b6738 remove panics from main helper functions in consensus tests 2021-09-21 16:40:25 -04:00
Sam Kleinman
b0423e2445 e2e: allow load generator to succed for short tests (#6952)
This should address last night's failure. We've taken the perspective
of "the load generator shouldn't cause tests to fail" in recent
days/weeks, and I think this is just a next step along that line. The
e2e tests shouldn't test performance. 

I included some comments indicating the ways that this isn't ideal (it
is perhaps not), and I think that if test networks could make
assertions about the required rate, that might be a cool future
improvement (and good, perhaps, for system benchmarking.)
2021-09-16 15:45:51 +00:00
dependabot[bot]
b0684bd300 build(deps): Bump github.com/vektra/mockery/v2 from 2.9.0 to 2.9.3 (#6951)
Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.9.0 to 2.9.3.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/.goreleaser.yml)
- [Commits](https://github.com/vektra/mockery/compare/v2.9.0...v2.9.3)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-16 08:41:46 -04:00
William Banfield
382947ce93 rfc: add performance taxonomy rfc (#6921)
This document attempts to capture and discuss some of the areas of Tendermint that seem to be cited as causing performance issue. I'm hoping to continue to gather feedback and input on this document to better understand what issues Tendermint performance may cause for our users. 

The overall goal of this document is to allow the maintainers and community to get a better sense of these issues and to be more capably able to discuss them and weight trade-offs about any proposed performance-focused changes. This document does not aim to propose any performance improvements. It does suggest useful places for benchmarks and places where additional metrics would be useful for diagnosing and further understanding Tendermint performance.

Please comment with areas where my reasoning seems off or with additional areas that Tendermint performance may be causing user pain.
2021-09-16 06:13:27 +00:00
Callum Waters
9a7ce08e3e statesync: shut down node when statesync fails (#6944) 2021-09-16 07:43:23 +02:00
Sam Kleinman
55f6d20977 e2e: skip broadcastTxCommit check (#6949)
I think the `Sync` check covers our primary use case, and perhaps we
can turn this back on in the future after some kind of event-system
rewrite, or RPC rewrite that will avoid the serverside timeout.
2021-09-15 21:24:35 +00:00
Sam Kleinman
b9c35c1263 docs: fix openapi yaml lint (#6948)
saw this in the super lint.
2021-09-15 19:29:25 +00:00
Sam Kleinman
f08f72e334 rfc: e2e improvements (#6941) 2021-09-15 15:26:39 -04:00
Callum Waters
e932b469ed e2e: tweak semantics of waitForHeight (#6943) 2021-09-15 20:49:24 +02:00
Callum Waters
5db2a39643 docs: add documentation of unsafe_flush_mempool to openapi (#6947) 2021-09-15 17:28:01 +02:00
20 changed files with 793 additions and 347 deletions

View File

@@ -40,6 +40,7 @@ sections.
- [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
- [RFC-001: Storage Engines](./rfc-001-storage-engine.rst)
- [RFC-002: Interprocess Communication](./rfc-002-ipc-ecosystem.md)
- [RFC-003: Performance Taxonomy](./rfc-003-performance-questions.md)
- [RFC-004: E2E Test Framework Enhancements](./rfc-004-e2e-framework.md)
<!-- - [RFC-NNN: Title](./rfc-NNN-title.md) -->

View File

@@ -0,0 +1,283 @@
# RFC 003: Taxonomy of potential performance issues in Tendermint
## Changelog
- 2021-09-02: Created initial draft (@wbanfield)
- 2021-09-14: Add discussion of the event system (@wbanfield)
## Abstract
This document discusses the various sources of performance issues in Tendermint and
attempts to clarify what work may be required to understand and address them.
## Background
Performance, loosely defined as the ability of a software process to perform its work
quickly and efficiently under load and within reasonable resource limits, is a frequent
topic of discussion in the Tendermint project.
To effectively address any issues with Tendermint performance we need to
categorize the various issues, understand their potential sources, and gauge their
impact on users.
Categorizing the different known performance issues will allow us to discuss and fix them
more systematically. This document proposes a rough taxonomy of performance issues
and highlights areas where more research into potential performance problems is required.
Understanding Tendermint's performance limitations will also be critically important
as we make changes to many of its subsystems. Performance is a central concern for
upcoming decisions regarding the `p2p` protocol, RPC message encoding and structure,
database usage and selection, and consensus protocol updates.
## Discussion
This section attempts to delineate the different sections of Tendermint functionality
that are often cited as having performance issues. It raises questions and suggests
lines of inquiry that may be valuable for better understanding Tendermint's performance issues.
As a note: We should avoid quickly adding many microbenchmarks or package level benchmarks.
These are prone to being worse than useless as they can obscure what _should_ be
focused on: performance of the system from the perspective of a user. We should,
instead, tune performance with an eye towards user needs and actions users make. These users comprise
both operators of Tendermint chains and the people generating transactions for
Tendermint chains. Both of these sets of users are largely aligned in wanting an end-to-end
system that operates quickly and efficiently.
REQUEST: The list below may be incomplete, if there are additional sections that are often
cited as creating poor performance, please comment so that they may be included.
### P2P
#### Claim: Tendermint cannot scale to large numbers of nodes
A complaint has been reported that Tendermint networks cannot scale to large numbers of nodes.
The listed number of nodes a user reported as causing issue was in the thousands.
We don't currently have evidence about what the upper-limit of nodes that Tendermint's
P2P stack can scale to.
We need to more concretely understand the source of issues and determine what layer
is causing a problem. It's possible that the P2P layer, in the absence of any reactors
sending data, is perfectly capable of managing thousands of peer connections. For
a reasonable networking and application setup, thousands of connections should not present any
issue for the application.
We need more data to understand the problem directly. We want to drive the popularity
and adoption of Tendermint and this will mean allowing for chains with more validators.
We should follow up with users experiencing this issue. We may then want to add
a series of metrics to the P2P layer to better understand the inefficiencies it produces.
The following metrics can help us understand the sources of latency in the Tendermint P2P stack:
* Number of messages sent and received per second
* Time of a message spent on the P2P layer send and receive queues
The following metrics exist and should be leveraged in addition to those added:
* Number of peers node's connected to
* Number of bytes per channel sent and received from each peer
### Sync
#### Claim: Block Syncing is slow
Bootstrapping a new node in a network to the height of the rest of the network is believed to
take longer than users would like. Block sync requires fetching all of the blocks from
peers and placing them into the local disk for storage. A useful line of inquiry
is understanding how quickly a perfectly tuned system _could_ fetch all of the state
over a network so that we understand how much overhead Tendermint actually adds.
The operation is likely to be _incredibly_ dependent on the environment in which
the node is being run. The factors that will influence syncing include:
1. Number of peers that a syncing node may fetch from.
2. Speed of the disk that a validator is writing to.
3. Speed of the network connection between the different peers that node is
syncing from.
We should calculate how quickly this operation _could possibly_ complete for common chains and nodes.
To calculate how quickly this operation could possibly complete, we should assume that
a node is reading at line-rate of the NIC and writing at the full drive speed to its
local storage. Comparing this theoretical upper-limit to the actual sync times
observed by node operators will give us a good point of comparison for understanding
how much overhead Tendermint incurs.
We should additionally add metrics to the blocksync operation to more clearly pinpoint
slow operations. The following metrics should be added to the block syncing operation:
* Time to fetch and validate each block
* Time to execute a block
* Blocks sync'd per unit time
### Application
Applications performing complex state transitions have the potential to bottleneck
the Tendermint node.
#### Claim: ABCI block delivery could cause slowdown
ABCI delivers blocks in several methods: `BeginBlock`, `DeliverTx`, `EndBlock`, `Commit`.
Tendermint delivers transactions one-by-one via the `DeliverTx` call. Most of the
transaction delivery in Tendermint occurs asynchronously and therefore appears unlikely to
form a bottleneck in ABCI.
After delivering all transactions, Tendermint then calls the `Commit` ABCI method.
Tendermint [locks all access to the mempool][abci-commit-description] while `Commit`
proceeds. This means that an application that is slow to execute all of its
transactions or finalize state during the `Commit` method will prevent any new
transactions from being added to the mempool. Apps that are slow to commit will
prevent consensus from proceeded to the next consensus height since Tendermint
cannot validate block proposals or produce block proposals without the
AppHash obtained from the `Commit` method. We should add a metric for each
step in the ABCI protocol to track the amount of time that a node spends communicating
with the application at each step.
#### Claim: ABCI serialization overhead causes slowdown
The most common way to run a Tendermint application is using the Cosmos-SDK.
The Cosmos-SDK runs the ABCI application within the same process as Tendermint.
When an application is run in the same process as Tendermint, a serialization penalty
is not paid. This is because the local ABCI client does not serialize method calls
and instead passes the protobuf type through directly. This can be seen
in [local_client.go][abci-local-client-code].
Serialization and deserialization in the gRPC and socket protocol ABCI methods
may cause slowdown. While these may cause issue, they are not part of the primary
usecase of Tendermint and do not necessarily need to be addressed at this time.
### RPC
#### Claim: The Query API is slow.
The query API locks a mutex across the ABCI connections. This causes consensus to
slow during queries, as ABCI is no longer able to make progress. This is known
to be causing issue in the cosmos-sdk and is being addressed [in the sdk][sdk-query-fix]
but a more robust solution may be required. Adding metrics to each ABCI client connection
and message as described in the Application section of this document would allow us
to further introspect the issue here.
#### Claim: RPC Serialization may cause slowdown
The Tendermint RPC uses a modified version of JSON-RPC. This RPC powers the `broadcast_tx_*` methods,
which is a critical method for adding transactions to Tendermint at the moment. This method is
likely invoked quite frequently on popular networks. Being able to perform efficiently
on this common and critical operation is very important. The current JSON-RPC implementation
relies heavily on type introspection via reflection, which is known to be very slow in
Go. We should therefore produce benchmarks of this method to determine how much overhead
we are adding to what, is likely to be, a very common operation.
The other JSON-RPC methods are much less critical to the core functionality of Tendermint.
While there may other points of performance consideration within the RPC, methods that do not
receive high volumes of requests should not be prioritized for performance consideration.
NOTE: Previous discussion of the RPC framework was done in [ADR 57][adr-57] and
there is ongoing work to inspect and alter the JSON-RPC framework in [RFC 002][rfc-002].
Much of these RPC-related performance considerations can either wait until the work of RFC 002 work is done or be
considered concordantly with the in-flight changes to the JSON-RPC.
### Protocol
#### Claim: Gossiping messages is a slow process
Currently, for any validator to successfully vote in a consensus _step_, it must
receive votes from greater than 2/3 of the validators on the network. In many cases,
it's preferable to receive as many votes as possible from correct validators.
This produces a quadratic increase in messages that are communicated as more validators join the network.
(Each of the N validators must communicate with all other N-1 validators).
This large number of messages communicated per step has been identified to impact
performance of the protocol. Given that the number of messages communicated has been
identified as a bottleneck, it would be extremely valuable to gather data on how long
it takes for popular chains with many validators to gather all votes within a step.
Metrics that would improve visibility into this include:
* Amount of time for a node to gather votes in a step.
* Amount of time for a node to gather all block parts.
* Number of votes each node sends to gossip (i.e. not its own votes, but votes it is
transmitting for a peer).
* Total number of votes each node sends to receives (A node may receive duplicate votes
so understanding how frequently this occurs will be valuable in evaluating the performance
of the gossip system).
#### Claim: Hashing Txs causes slowdown in Tendermint
Using a faster hash algorithm for Tx hashes is currently a point of discussion
in Tendermint. Namely, it is being considered as part of the [modular hashing proposal][modular-hashing].
It is currently unknown if hashing transactions in the Mempool forms a significant bottleneck.
Although it does not appear to be documented as slow, there are a few open github
issues that indicate a possible user preference for a faster hashing algorithm,
including [issue 2187][issue-2187] and [issue 2186][issue-2186].
It is likely worth investigating what order of magnitude Tx hashing takes in comparison to other
aspects of adding a Tx to the mempool. It is not currently clear if the rate of adding Tx
to the mempool is a source of user pain. We should not endeavor to make large changes to
consensus critical components without first being certain that the change is highly
valuable and impactful.
### Digital Signatures
#### Claim: Verification of digital signatures may cause slowdown in Tendermint
Working with cryptographic signatures can be computationally expensive. The cosmos
hub uses [ed25519 signatures][hub-signature]. The library performing signature
verification in Tendermint on votes is [benchmarked][ed25519-bench] to be able to perform an `ed25519`
signature in 75μs on a decently fast CPU. A validator in the Cosmos Hub performs
3 sets of verifications on the signatures of the 140 validators in the Hub
in a consensus round, during block verification, when verifying the prevotes, and
when verifying the precommits. With no batching, this would be roughly `3ms` per
round. It is quite unlikely, therefore, that this accounts for any serious amount
of the ~7 seconds of block time per height in the Hub.
This may cause slowdown when syncing, since the process needs to constantly verify
signatures. It's possible that improved signature aggregation will lead to improved
light client or other syncing performance. In general, a metric should be added
to track block rate while blocksyncing.
#### Claim: Our use of digital signatures in the consensus protocol contributes to performance issue
Currently, Tendermint's digital signature verification requires that all validators
receive all vote messages. Each validator must receive the complete digital signature
along with the vote message that it corresponds to. This means that all N validators
must receive messages from at least 2/3 of the N validators in each consensus
round. Given the potential for oddly shaped network topologies and the expected
variable network roundtrip times of a few hundred milliseconds in a blockchain,
it is highly likely that this amount of gossiping is leading to a significant amount
of the slowdown in the Cosmos Hub and in Tendermint consensus.
### Tendermint Event System
#### Claim: The event system is a bottleneck in Tendermint
The Tendermint Event system is used to communicate and store information about
internal Tendermint execution. The system uses channels internally to send messages
to different subscribers. Sending an event [blocks on the internal channel][event-send].
The default configuration is to [use an unbuffered channel for event publishes][event-buffer-capacity].
Several consumers of the event system also use an unbuffered channel for reads.
An example of this is the [event indexer][event-indexer-unbuffered], which takes an
unbuffered subscription to the event system. The result is that these unbuffered readers
can cause writes to the event system to block or slow down depending on contention in the
event system. This has implications for the consensus system, which [publishes events][consensus-event-send].
To better understand the performance of the event system, we should add metrics to track the timing of
event sends. The following metrics would be a good start for tracking this performance:
* Time in event send, labeled by Event Type
* Time in event receive, labeled by subscriber
* Event throughput, measured in events per unit time.
### References
[modular-hashing]: https://github.com/tendermint/tendermint/pull/6773
[issue-2186]: https://github.com/tendermint/tendermint/issues/2186
[issue-2187]: https://github.com/tendermint/tendermint/issues/2187
[rfc-002]: https://github.com/tendermint/tendermint/pull/6913
[adr-57]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-057-RPC.md
[issue-1319]: https://github.com/tendermint/tendermint/issues/1319
[abci-commit-description]: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#commit
[abci-local-client-code]: https://github.com/tendermint/tendermint/blob/511bd3eb7f037855a793a27ff4c53c12f085b570/abci/client/local_client.go#L84
[hub-signature]: https://github.com/cosmos/gaia/blob/0ecb6ed8a244d835807f1ced49217d54a9ca2070/docs/resources/genesis.md#consensus-parameters
[ed25519-bench]: https://github.com/oasisprotocol/curve25519-voi/blob/d2e7fc59fe38c18ca990c84c4186cba2cc45b1f9/PERFORMANCE.md
[event-send]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/libs/pubsub/pubsub.go#L338
[event-buffer-capacity]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/types/event_bus.go#L14
[event-indexer-unbuffered]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/state/indexer/indexer_service.go#L39
[consensus-event-send]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/internal/consensus/state.go#L1573
[sdk-query-fix]: https://github.com/cosmos/cosmos-sdk/pull/10045

2
go.mod
View File

@@ -34,7 +34,7 @@ require (
github.com/spf13/viper v1.8.1
github.com/stretchr/testify v1.7.0
github.com/tendermint/tm-db v0.6.4
github.com/vektra/mockery/v2 v2.9.0
github.com/vektra/mockery/v2 v2.9.3
golang.org/x/crypto v0.0.0-20210513164829-c07d793c2f9a
golang.org/x/net v0.0.0-20210428140749-89ef3d95e781
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c

4
go.sum
View File

@@ -895,8 +895,8 @@ github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyC
github.com/valyala/fasthttp v1.16.0/go.mod h1:YOKImeEosDdBPnxc0gy7INqi3m1zK6A+xl6TwOBhHCA=
github.com/valyala/quicktemplate v1.6.3/go.mod h1:fwPzK2fHuYEODzJ9pkw0ipCPNHZ2tD5KW4lOuSdPKzY=
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a/go.mod h1:v3UYOV9WzVtRmSR+PDvWpU/qWl4Wa5LApYYX4ZtKbio=
github.com/vektra/mockery/v2 v2.9.0 h1:+3FhCL3EviR779mTzXwUuhPNnqFUA7sDnt9OFkXaFd4=
github.com/vektra/mockery/v2 v2.9.0/go.mod h1:2gU4Cf/f8YyC8oEaSXfCnZBMxMjMl/Ko205rlP0fO90=
github.com/vektra/mockery/v2 v2.9.3 h1:ma6hcGQw4q/lhFUTJ+E9V8/5tsIcht9i2Q4d1qo26SQ=
github.com/vektra/mockery/v2 v2.9.3/go.mod h1:2gU4Cf/f8YyC8oEaSXfCnZBMxMjMl/Ko205rlP0fO90=
github.com/viki-org/dnscache v0.0.0-20130720023526-c70c1f23c5d8/go.mod h1:dniwbG03GafCjFohMDmz6Zc6oCuiqgH6tGNyXTkHzXE=
github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2/go.mod h1:UETIi67q53MR2AWcXfiuqkDkRtnGDLqkBTpCHuJHxtU=
github.com/xo/terminfo v0.0.0-20210125001918-ca9a967f8778/go.mod h1:2MuV+tbUrU1zIOPMxZ5EncGwgmMJsa+9ucAQZXxsObs=

View File

@@ -52,7 +52,7 @@ func TestByzantinePrevoteEquivocation(t *testing.T) {
thisConfig := ResetConfig(fmt.Sprintf("%s_%d", testName, i))
defer os.RemoveAll(thisConfig.RootDir)
ensureDir(path.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
ensureDir(t, path.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
app := appFunc()
vals := types.TM2PB.ValidatorUpdates(state.Validators)
app.InitChain(abci.RequestInitChain{Validators: vals})

View File

@@ -69,9 +69,10 @@ func configSetup(t *testing.T) *cfg.Config {
return config
}
func ensureDir(dir string, mode os.FileMode) {
func ensureDir(t *testing.T, dir string, mode os.FileMode) {
t.Helper()
if err := tmos.EnsureDir(dir, mode); err != nil {
panic(err)
t.Fatalf("error opening directory: %s", err)
}
}
@@ -221,18 +222,20 @@ func startTestRound(cs *State, height int64, round int32) {
// Create proposal block from cs1 but sign it with vs.
func decideProposal(
t *testing.T,
cs1 *State,
vs *validatorStub,
height int64,
round int32,
) (proposal *types.Proposal, block *types.Block) {
t.Helper()
cs1.mtx.Lock()
block, blockParts := cs1.createProposalBlock()
validRound := cs1.ValidRound
chainID := cs1.state.ChainID
cs1.mtx.Unlock()
if block == nil {
panic("Failed to createProposalBlock. Did you forget to add commit for previous block?")
t.Fatal("Failed to createProposalBlock. Did you forget to add commit for previous block?")
}
// Make proposal
@@ -240,7 +243,7 @@ func decideProposal(
proposal = types.NewProposal(height, round, polRound, propBlockID)
p := proposal.ToProto()
if err := vs.SignProposal(context.Background(), chainID, p); err != nil {
panic(err)
t.Fatalf("error signing proposal: %s", err)
}
proposal.Signature = p.Signature
@@ -267,36 +270,38 @@ func signAddVotes(
}
func validatePrevote(t *testing.T, cs *State, round int32, privVal *validatorStub, blockHash []byte) {
t.Helper()
prevotes := cs.Votes.Prevotes(round)
pubKey, err := privVal.GetPubKey(context.Background())
require.NoError(t, err)
address := pubKey.Address()
var vote *types.Vote
if vote = prevotes.GetByAddress(address); vote == nil {
panic("Failed to find prevote from validator")
t.Fatalf("Failed to find prevote from validator")
}
if blockHash == nil {
if vote.BlockID.Hash != nil {
panic(fmt.Sprintf("Expected prevote to be for nil, got %X", vote.BlockID.Hash))
t.Fatalf("Expected prevote to be for nil, got %X", vote.BlockID.Hash)
}
} else {
if !bytes.Equal(vote.BlockID.Hash, blockHash) {
panic(fmt.Sprintf("Expected prevote to be for %X, got %X", blockHash, vote.BlockID.Hash))
t.Fatalf("Expected prevote to be for %X, got %X", blockHash, vote.BlockID.Hash)
}
}
}
func validateLastPrecommit(t *testing.T, cs *State, privVal *validatorStub, blockHash []byte) {
t.Helper()
votes := cs.LastCommit
pv, err := privVal.GetPubKey(context.Background())
require.NoError(t, err)
address := pv.Address()
var vote *types.Vote
if vote = votes.GetByAddress(address); vote == nil {
panic("Failed to find precommit from validator")
t.Fatalf("Failed to find precommit from validator")
}
if !bytes.Equal(vote.BlockID.Hash, blockHash) {
panic(fmt.Sprintf("Expected precommit to be for %X, got %X", blockHash, vote.BlockID.Hash))
t.Fatalf("Expected precommit to be for %X, got %X", blockHash, vote.BlockID.Hash)
}
}
@@ -309,41 +314,42 @@ func validatePrecommit(
votedBlockHash,
lockedBlockHash []byte,
) {
t.Helper()
precommits := cs.Votes.Precommits(thisRound)
pv, err := privVal.GetPubKey(context.Background())
require.NoError(t, err)
address := pv.Address()
var vote *types.Vote
if vote = precommits.GetByAddress(address); vote == nil {
panic("Failed to find precommit from validator")
t.Fatalf("Failed to find precommit from validator")
}
if votedBlockHash == nil {
if vote.BlockID.Hash != nil {
panic("Expected precommit to be for nil")
t.Fatalf("Expected precommit to be for nil")
}
} else {
if !bytes.Equal(vote.BlockID.Hash, votedBlockHash) {
panic("Expected precommit to be for proposal block")
t.Fatalf("Expected precommit to be for proposal block")
}
}
if lockedBlockHash == nil {
if cs.LockedRound != lockRound || cs.LockedBlock != nil {
panic(fmt.Sprintf(
t.Fatalf(
"Expected to be locked on nil at round %d. Got locked at round %d with block %v",
lockRound,
cs.LockedRound,
cs.LockedBlock))
cs.LockedBlock)
}
} else {
if cs.LockedRound != lockRound || !bytes.Equal(cs.LockedBlock.Hash(), lockedBlockHash) {
panic(fmt.Sprintf(
t.Fatalf(
"Expected block to be locked on round %d, got %d. Got locked block %X, expected %X",
lockRound,
cs.LockedRound,
cs.LockedBlock.Hash(),
lockedBlockHash))
lockedBlockHash)
}
}
}
@@ -357,6 +363,7 @@ func validatePrevoteAndPrecommit(
votedBlockHash,
lockedBlockHash []byte,
) {
t.Helper()
// verify the prevote
validatePrevote(t, cs, thisRound, privVal, votedBlockHash)
// verify precommit
@@ -444,13 +451,14 @@ func newStateWithConfigAndBlockStore(
return cs
}
func loadPrivValidator(config *cfg.Config) *privval.FilePV {
func loadPrivValidator(t *testing.T, config *cfg.Config) *privval.FilePV {
t.Helper()
privValidatorKeyFile := config.PrivValidator.KeyFile()
ensureDir(filepath.Dir(privValidatorKeyFile), 0700)
ensureDir(t, filepath.Dir(privValidatorKeyFile), 0700)
privValidatorStateFile := config.PrivValidator.StateFile()
privValidator, err := privval.LoadOrGenFilePV(privValidatorKeyFile, privValidatorStateFile)
if err != nil {
panic(err)
t.Fatalf("error generating validator file: %s", err)
}
privValidator.Reset()
return privValidator
@@ -475,220 +483,241 @@ func randState(config *cfg.Config, nValidators int) (*State, []*validatorStub) {
//-------------------------------------------------------------------------------
func ensureNoNewEvent(ch <-chan tmpubsub.Message, timeout time.Duration,
func ensureNoNewEvent(t *testing.T, ch <-chan tmpubsub.Message, timeout time.Duration,
errorMessage string) {
t.Helper()
select {
case <-time.After(timeout):
break
case <-ch:
panic(errorMessage)
t.Fatalf("unexpected event: %s", errorMessage)
}
}
func ensureNoNewEventOnChannel(ch <-chan tmpubsub.Message) {
func ensureNoNewEventOnChannel(t *testing.T, ch <-chan tmpubsub.Message) {
t.Helper()
ensureNoNewEvent(
t,
ch,
ensureTimeout,
"We should be stuck waiting, not receiving new event on the channel")
}
func ensureNoNewRoundStep(stepCh <-chan tmpubsub.Message) {
func ensureNoNewRoundStep(t *testing.T, stepCh <-chan tmpubsub.Message) {
t.Helper()
ensureNoNewEvent(
t,
stepCh,
ensureTimeout,
"We should be stuck waiting, not receiving NewRoundStep event")
}
func ensureNoNewUnlock(unlockCh <-chan tmpubsub.Message) {
func ensureNoNewUnlock(t *testing.T, unlockCh <-chan tmpubsub.Message) {
t.Helper()
ensureNoNewEvent(
t,
unlockCh,
ensureTimeout,
"We should be stuck waiting, not receiving Unlock event")
}
func ensureNoNewTimeout(stepCh <-chan tmpubsub.Message, timeout int64) {
func ensureNoNewTimeout(t *testing.T, stepCh <-chan tmpubsub.Message, timeout int64) {
t.Helper()
timeoutDuration := time.Duration(timeout*10) * time.Nanosecond
ensureNoNewEvent(
t,
stepCh,
timeoutDuration,
"We should be stuck waiting, not receiving NewTimeout event")
}
func ensureNewEvent(ch <-chan tmpubsub.Message, height int64, round int32, timeout time.Duration, errorMessage string) {
func ensureNewEvent(t *testing.T, ch <-chan tmpubsub.Message, height int64, round int32, timeout time.Duration, errorMessage string) { //nolint: lll
t.Helper()
select {
case <-time.After(timeout):
panic(errorMessage)
t.Fatalf("timed out waiting for new event: %s", errorMessage)
case msg := <-ch:
roundStateEvent, ok := msg.Data().(types.EventDataRoundState)
if !ok {
panic(fmt.Sprintf("expected a EventDataRoundState, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataRoundState, got %T. Wrong subscription channel?", msg.Data())
}
if roundStateEvent.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, roundStateEvent.Height))
t.Fatalf("expected height %v, got %v", height, roundStateEvent.Height)
}
if roundStateEvent.Round != round {
panic(fmt.Sprintf("expected round %v, got %v", round, roundStateEvent.Round))
t.Fatalf("expected round %v, got %v", round, roundStateEvent.Round)
}
// TODO: We could check also for a step at this point!
}
}
func ensureNewRound(roundCh <-chan tmpubsub.Message, height int64, round int32) {
func ensureNewRound(t *testing.T, roundCh <-chan tmpubsub.Message, height int64, round int32) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for NewRound event")
t.Fatal("Timeout expired while waiting for NewRound event")
case msg := <-roundCh:
newRoundEvent, ok := msg.Data().(types.EventDataNewRound)
if !ok {
panic(fmt.Sprintf("expected a EventDataNewRound, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataNewRound, got %T. Wrong subscription channel?", msg.Data())
}
if newRoundEvent.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, newRoundEvent.Height))
t.Fatalf("expected height %v, got %v", height, newRoundEvent.Height)
}
if newRoundEvent.Round != round {
panic(fmt.Sprintf("expected round %v, got %v", round, newRoundEvent.Round))
t.Fatalf("expected round %v, got %v", round, newRoundEvent.Round)
}
}
}
func ensureNewTimeout(timeoutCh <-chan tmpubsub.Message, height int64, round int32, timeout int64) {
func ensureNewTimeout(t *testing.T, timeoutCh <-chan tmpubsub.Message, height int64, round int32, timeout int64) {
t.Helper()
timeoutDuration := time.Duration(timeout*10) * time.Nanosecond
ensureNewEvent(timeoutCh, height, round, timeoutDuration,
ensureNewEvent(t, timeoutCh, height, round, timeoutDuration,
"Timeout expired while waiting for NewTimeout event")
}
func ensureNewProposal(proposalCh <-chan tmpubsub.Message, height int64, round int32) {
func ensureNewProposal(t *testing.T, proposalCh <-chan tmpubsub.Message, height int64, round int32) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for NewProposal event")
t.Fatalf("Timeout expired while waiting for NewProposal event")
case msg := <-proposalCh:
proposalEvent, ok := msg.Data().(types.EventDataCompleteProposal)
if !ok {
panic(fmt.Sprintf("expected a EventDataCompleteProposal, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataCompleteProposal, got %T. Wrong subscription channel?",
msg.Data())
}
if proposalEvent.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, proposalEvent.Height))
t.Fatalf("expected height %v, got %v", height, proposalEvent.Height)
}
if proposalEvent.Round != round {
panic(fmt.Sprintf("expected round %v, got %v", round, proposalEvent.Round))
t.Fatalf("expected round %v, got %v", round, proposalEvent.Round)
}
}
}
func ensureNewValidBlock(validBlockCh <-chan tmpubsub.Message, height int64, round int32) {
ensureNewEvent(validBlockCh, height, round, ensureTimeout,
func ensureNewValidBlock(t *testing.T, validBlockCh <-chan tmpubsub.Message, height int64, round int32) {
t.Helper()
ensureNewEvent(t, validBlockCh, height, round, ensureTimeout,
"Timeout expired while waiting for NewValidBlock event")
}
func ensureNewBlock(blockCh <-chan tmpubsub.Message, height int64) {
func ensureNewBlock(t *testing.T, blockCh <-chan tmpubsub.Message, height int64) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for NewBlock event")
t.Fatalf("Timeout expired while waiting for NewBlock event")
case msg := <-blockCh:
blockEvent, ok := msg.Data().(types.EventDataNewBlock)
if !ok {
panic(fmt.Sprintf("expected a EventDataNewBlock, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataNewBlock, got %T. Wrong subscription channel?",
msg.Data())
}
if blockEvent.Block.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, blockEvent.Block.Height))
t.Fatalf("expected height %v, got %v", height, blockEvent.Block.Height)
}
}
}
func ensureNewBlockHeader(blockCh <-chan tmpubsub.Message, height int64, blockHash tmbytes.HexBytes) {
func ensureNewBlockHeader(t *testing.T, blockCh <-chan tmpubsub.Message, height int64, blockHash tmbytes.HexBytes) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for NewBlockHeader event")
t.Fatalf("Timeout expired while waiting for NewBlockHeader event")
case msg := <-blockCh:
blockHeaderEvent, ok := msg.Data().(types.EventDataNewBlockHeader)
if !ok {
panic(fmt.Sprintf("expected a EventDataNewBlockHeader, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataNewBlockHeader, got %T. Wrong subscription channel?",
msg.Data())
}
if blockHeaderEvent.Header.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, blockHeaderEvent.Header.Height))
t.Fatalf("expected height %v, got %v", height, blockHeaderEvent.Header.Height)
}
if !bytes.Equal(blockHeaderEvent.Header.Hash(), blockHash) {
panic(fmt.Sprintf("expected header %X, got %X", blockHash, blockHeaderEvent.Header.Hash()))
t.Fatalf("expected header %X, got %X", blockHash, blockHeaderEvent.Header.Hash())
}
}
}
func ensureNewUnlock(unlockCh <-chan tmpubsub.Message, height int64, round int32) {
ensureNewEvent(unlockCh, height, round, ensureTimeout,
func ensureNewUnlock(t *testing.T, unlockCh <-chan tmpubsub.Message, height int64, round int32) {
t.Helper()
ensureNewEvent(t, unlockCh, height, round, ensureTimeout,
"Timeout expired while waiting for NewUnlock event")
}
func ensureProposal(proposalCh <-chan tmpubsub.Message, height int64, round int32, propID types.BlockID) {
func ensureProposal(t *testing.T, proposalCh <-chan tmpubsub.Message, height int64, round int32, propID types.BlockID) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for NewProposal event")
t.Fatalf("Timeout expired while waiting for NewProposal event")
case msg := <-proposalCh:
proposalEvent, ok := msg.Data().(types.EventDataCompleteProposal)
if !ok {
panic(fmt.Sprintf("expected a EventDataCompleteProposal, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataCompleteProposal, got %T. Wrong subscription channel?",
msg.Data())
}
if proposalEvent.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, proposalEvent.Height))
t.Fatalf("expected height %v, got %v", height, proposalEvent.Height)
}
if proposalEvent.Round != round {
panic(fmt.Sprintf("expected round %v, got %v", round, proposalEvent.Round))
t.Fatalf("expected round %v, got %v", round, proposalEvent.Round)
}
if !proposalEvent.BlockID.Equals(propID) {
panic(fmt.Sprintf("Proposed block does not match expected block (%v != %v)", proposalEvent.BlockID, propID))
t.Fatalf("Proposed block does not match expected block (%v != %v)", proposalEvent.BlockID, propID)
}
}
}
func ensurePrecommit(voteCh <-chan tmpubsub.Message, height int64, round int32) {
ensureVote(voteCh, height, round, tmproto.PrecommitType)
func ensurePrecommit(t *testing.T, voteCh <-chan tmpubsub.Message, height int64, round int32) {
t.Helper()
ensureVote(t, voteCh, height, round, tmproto.PrecommitType)
}
func ensurePrevote(voteCh <-chan tmpubsub.Message, height int64, round int32) {
ensureVote(voteCh, height, round, tmproto.PrevoteType)
func ensurePrevote(t *testing.T, voteCh <-chan tmpubsub.Message, height int64, round int32) {
t.Helper()
ensureVote(t, voteCh, height, round, tmproto.PrevoteType)
}
func ensureVote(voteCh <-chan tmpubsub.Message, height int64, round int32,
func ensureVote(t *testing.T, voteCh <-chan tmpubsub.Message, height int64, round int32,
voteType tmproto.SignedMsgType) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for NewVote event")
t.Fatalf("Timeout expired while waiting for NewVote event")
case msg := <-voteCh:
voteEvent, ok := msg.Data().(types.EventDataVote)
if !ok {
panic(fmt.Sprintf("expected a EventDataVote, got %T. Wrong subscription channel?",
msg.Data()))
t.Fatalf("expected a EventDataVote, got %T. Wrong subscription channel?",
msg.Data())
}
vote := voteEvent.Vote
if vote.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, vote.Height))
t.Fatalf("expected height %v, got %v", height, vote.Height)
}
if vote.Round != round {
panic(fmt.Sprintf("expected round %v, got %v", round, vote.Round))
t.Fatalf("expected round %v, got %v", round, vote.Round)
}
if vote.Type != voteType {
panic(fmt.Sprintf("expected type %v, got %v", voteType, vote.Type))
t.Fatalf("expected type %v, got %v", voteType, vote.Type)
}
}
}
func ensurePrecommitTimeout(ch <-chan tmpubsub.Message) {
func ensurePrecommitTimeout(t *testing.T, ch <-chan tmpubsub.Message) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for the Precommit to Timeout")
t.Fatalf("Timeout expired while waiting for the Precommit to Timeout")
case <-ch:
}
}
func ensureNewEventOnChannel(ch <-chan tmpubsub.Message) {
func ensureNewEventOnChannel(t *testing.T, ch <-chan tmpubsub.Message) {
t.Helper()
select {
case <-time.After(ensureTimeout):
panic("Timeout expired while waiting for new activity on the channel")
t.Fatalf("Timeout expired while waiting for new activity on the channel")
case <-ch:
}
}
@@ -711,6 +740,7 @@ func randConsensusState(
appFunc func() abci.Application,
configOpts ...func(*cfg.Config),
) ([]*State, cleanupFunc) {
t.Helper()
genDoc, privVals := factory.RandGenesisDoc(config, nValidators, false, 30)
css := make([]*State, nValidators)
@@ -731,7 +761,7 @@ func randConsensusState(
opt(thisConfig)
}
ensureDir(filepath.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
ensureDir(t, filepath.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
app := appFunc()
@@ -759,6 +789,7 @@ func randConsensusState(
// nPeers = nValidators + nNotValidator
func randConsensusNetWithPeers(
t *testing.T,
config *cfg.Config,
nValidators,
nPeers int,
@@ -768,6 +799,7 @@ func randConsensusNetWithPeers(
) ([]*State, *types.GenesisDoc, *cfg.Config, cleanupFunc) {
genDoc, privVals := factory.RandGenesisDoc(config, nValidators, false, testMinPower)
css := make([]*State, nPeers)
t.Helper()
logger := consensusLogger()
var peer0Config *cfg.Config
@@ -776,7 +808,7 @@ func randConsensusNetWithPeers(
state, _ := sm.MakeGenesisState(genDoc)
thisConfig := ResetConfig(fmt.Sprintf("%s_%d", testName, i))
configRootDirs = append(configRootDirs, thisConfig.RootDir)
ensureDir(filepath.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
ensureDir(t, filepath.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
if i == 0 {
peer0Config = thisConfig
}
@@ -786,16 +818,16 @@ func randConsensusNetWithPeers(
} else {
tempKeyFile, err := ioutil.TempFile("", "priv_validator_key_")
if err != nil {
panic(err)
t.Fatalf("error creating temp file for validator key: %s", err)
}
tempStateFile, err := ioutil.TempFile("", "priv_validator_state_")
if err != nil {
panic(err)
t.Fatalf("error loading validator state: %s", err)
}
privVal, err = privval.GenFilePV(tempKeyFile.Name(), tempStateFile.Name(), "")
if err != nil {
panic(err)
t.Fatalf("error generating validator key: %s", err)
}
}

View File

@@ -40,12 +40,12 @@ func TestMempoolNoProgressUntilTxsAvailable(t *testing.T) {
newBlockCh := subscribe(cs.eventBus, types.EventQueryNewBlock)
startTestRound(cs, height, round)
ensureNewEventOnChannel(newBlockCh) // first block gets committed
ensureNoNewEventOnChannel(newBlockCh)
ensureNewEventOnChannel(t, newBlockCh) // first block gets committed
ensureNoNewEventOnChannel(t, newBlockCh)
deliverTxsRange(cs, 0, 1)
ensureNewEventOnChannel(newBlockCh) // commit txs
ensureNewEventOnChannel(newBlockCh) // commit updated app hash
ensureNoNewEventOnChannel(newBlockCh)
ensureNewEventOnChannel(t, newBlockCh) // commit txs
ensureNewEventOnChannel(t, newBlockCh) // commit updated app hash
ensureNoNewEventOnChannel(t, newBlockCh)
}
func TestMempoolProgressAfterCreateEmptyBlocksInterval(t *testing.T) {
@@ -63,9 +63,9 @@ func TestMempoolProgressAfterCreateEmptyBlocksInterval(t *testing.T) {
newBlockCh := subscribe(cs.eventBus, types.EventQueryNewBlock)
startTestRound(cs, cs.Height, cs.Round)
ensureNewEventOnChannel(newBlockCh) // first block gets committed
ensureNoNewEventOnChannel(newBlockCh) // then we dont make a block ...
ensureNewEventOnChannel(newBlockCh) // until the CreateEmptyBlocksInterval has passed
ensureNewEventOnChannel(t, newBlockCh) // first block gets committed
ensureNoNewEventOnChannel(t, newBlockCh) // then we dont make a block ...
ensureNewEventOnChannel(t, newBlockCh) // until the CreateEmptyBlocksInterval has passed
}
func TestMempoolProgressInHigherRound(t *testing.T) {
@@ -93,19 +93,19 @@ func TestMempoolProgressInHigherRound(t *testing.T) {
}
startTestRound(cs, height, round)
ensureNewRound(newRoundCh, height, round) // first round at first height
ensureNewEventOnChannel(newBlockCh) // first block gets committed
ensureNewRound(t, newRoundCh, height, round) // first round at first height
ensureNewEventOnChannel(t, newBlockCh) // first block gets committed
height++ // moving to the next height
round = 0
ensureNewRound(newRoundCh, height, round) // first round at next height
deliverTxsRange(cs, 0, 1) // we deliver txs, but dont set a proposal so we get the next round
ensureNewTimeout(timeoutCh, height, round, cs.config.TimeoutPropose.Nanoseconds())
ensureNewRound(t, newRoundCh, height, round) // first round at next height
deliverTxsRange(cs, 0, 1) // we deliver txs, but dont set a proposal so we get the next round
ensureNewTimeout(t, timeoutCh, height, round, cs.config.TimeoutPropose.Nanoseconds())
round++ // moving to the next round
ensureNewRound(newRoundCh, height, round) // wait for the next round
ensureNewEventOnChannel(newBlockCh) // now we can commit the block
round++ // moving to the next round
ensureNewRound(t, newRoundCh, height, round) // wait for the next round
ensureNewEventOnChannel(t, newBlockCh) // now we can commit the block
}
func deliverTxsRange(cs *State, start, end int) {

View File

@@ -336,7 +336,7 @@ func TestReactorWithEvidence(t *testing.T) {
defer os.RemoveAll(thisConfig.RootDir)
ensureDir(path.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
ensureDir(t, path.Dir(thisConfig.Consensus.WalFile()), 0700) // dir for wal
app := appFunc()
vals := types.TM2PB.ValidatorUpdates(state.Validators)
app.InitChain(abci.RequestInitChain{Validators: vals})
@@ -627,6 +627,7 @@ func TestReactorValidatorSetChanges(t *testing.T) {
nPeers := 7
nVals := 4
states, _, _, cleanup := randConsensusNetWithPeers(
t,
config,
nVals,
nPeers,

View File

@@ -58,7 +58,7 @@ func startNewStateAndWaitForBlock(t *testing.T, consensusReplayConfig *cfg.Confi
logger := log.TestingLogger()
state, err := sm.MakeGenesisStateFromFile(consensusReplayConfig.GenesisFile())
require.NoError(t, err)
privValidator := loadPrivValidator(consensusReplayConfig)
privValidator := loadPrivValidator(t, consensusReplayConfig)
blockStore := store.NewBlockStore(dbm.NewMemDB())
cs := newStateWithConfigAndBlockStore(
consensusReplayConfig,
@@ -154,7 +154,7 @@ LOOP:
blockStore := store.NewBlockStore(blockDB)
state, err := sm.MakeGenesisStateFromFile(consensusReplayConfig.GenesisFile())
require.NoError(t, err)
privValidator := loadPrivValidator(consensusReplayConfig)
privValidator := loadPrivValidator(t, consensusReplayConfig)
cs := newStateWithConfigAndBlockStore(
consensusReplayConfig,
state,
@@ -321,6 +321,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
nVals := 4
css, genDoc, config, cleanup := randConsensusNetWithPeers(
t,
config,
nVals,
nPeers,
@@ -345,15 +346,15 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
// start the machine
startTestRound(css[0], height, round)
incrementHeight(vss...)
ensureNewRound(newRoundCh, height, 0)
ensureNewProposal(proposalCh, height, round)
ensureNewRound(t, newRoundCh, height, 0)
ensureNewProposal(t, proposalCh, height, round)
rs := css[0].GetRoundState()
signAddVotes(sim.Config, css[0], tmproto.PrecommitType,
rs.ProposalBlock.Hash(), rs.ProposalBlockParts.Header(),
vss[1:nVals]...)
ensureNewRound(newRoundCh, height+1, 0)
ensureNewRound(t, newRoundCh, height+1, 0)
// HEIGHT 2
height++
@@ -380,12 +381,12 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
if err := css[0].SetProposalAndBlock(proposal, propBlock, propBlockParts, "some peer"); err != nil {
t.Fatal(err)
}
ensureNewProposal(proposalCh, height, round)
ensureNewProposal(t, proposalCh, height, round)
rs = css[0].GetRoundState()
signAddVotes(sim.Config, css[0], tmproto.PrecommitType,
rs.ProposalBlock.Hash(), rs.ProposalBlockParts.Header(),
vss[1:nVals]...)
ensureNewRound(newRoundCh, height+1, 0)
ensureNewRound(t, newRoundCh, height+1, 0)
// HEIGHT 3
height++
@@ -412,12 +413,12 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
if err := css[0].SetProposalAndBlock(proposal, propBlock, propBlockParts, "some peer"); err != nil {
t.Fatal(err)
}
ensureNewProposal(proposalCh, height, round)
ensureNewProposal(t, proposalCh, height, round)
rs = css[0].GetRoundState()
signAddVotes(sim.Config, css[0], tmproto.PrecommitType,
rs.ProposalBlock.Hash(), rs.ProposalBlockParts.Header(),
vss[1:nVals]...)
ensureNewRound(newRoundCh, height+1, 0)
ensureNewRound(t, newRoundCh, height+1, 0)
// HEIGHT 4
height++
@@ -471,7 +472,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
if err := css[0].SetProposalAndBlock(proposal, propBlock, propBlockParts, "some peer"); err != nil {
t.Fatal(err)
}
ensureNewProposal(proposalCh, height, round)
ensureNewProposal(t, proposalCh, height, round)
removeValidatorTx2 := kvstore.MakeValSetChangeTx(newVal2ABCI, 0)
err = assertMempool(css[0].txNotifier).CheckTx(context.Background(), removeValidatorTx2, nil, mempl.TxInfo{})
@@ -487,7 +488,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
rs.ProposalBlockParts.Header(), newVss[i])
}
ensureNewRound(newRoundCh, height+1, 0)
ensureNewRound(t, newRoundCh, height+1, 0)
// HEIGHT 5
height++
@@ -497,7 +498,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
newVss[newVssIdx].VotingPower = 25
sort.Sort(ValidatorStubsByPower(newVss))
selfIndex = valIndexFn(0)
ensureNewProposal(proposalCh, height, round)
ensureNewProposal(t, proposalCh, height, round)
rs = css[0].GetRoundState()
for i := 0; i < nVals+1; i++ {
if i == selfIndex {
@@ -507,7 +508,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
tmproto.PrecommitType, rs.ProposalBlock.Hash(),
rs.ProposalBlockParts.Header(), newVss[i])
}
ensureNewRound(newRoundCh, height+1, 0)
ensureNewRound(t, newRoundCh, height+1, 0)
// HEIGHT 6
height++
@@ -534,7 +535,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
if err := css[0].SetProposalAndBlock(proposal, propBlock, propBlockParts, "some peer"); err != nil {
t.Fatal(err)
}
ensureNewProposal(proposalCh, height, round)
ensureNewProposal(t, proposalCh, height, round)
rs = css[0].GetRoundState()
for i := 0; i < nVals+3; i++ {
if i == selfIndex {
@@ -544,7 +545,7 @@ func setupSimulator(t *testing.T) *simulatorTestSuite {
tmproto.PrecommitType, rs.ProposalBlock.Hash(),
rs.ProposalBlockParts.Header(), newVss[i])
}
ensureNewRound(newRoundCh, height+1, 0)
ensureNewRound(t, newRoundCh, height+1, 0)
sim.Chain = make([]*types.Block, 0)
sim.Commits = make([]*types.Commit, 0)

View File

@@ -137,7 +137,7 @@ type State struct {
done chan struct{}
// synchronous pubsub between consensus state and reactor.
// state only emits EventNewRoundStep and EventVote
// state only emits EventNewRoundStep, EventValidBlock, and EventVote
evsw tmevents.EventSwitch
// for reporting metrics

File diff suppressed because it is too large Load Diff

View File

@@ -379,6 +379,7 @@ func (c *Client) Update(ctx context.Context, now time.Time) (*types.LightBlock,
return nil, err
}
// If there is a new light block then verify it
if latestBlock.Height > lastTrustedHeight {
err = c.verifyLightBlock(ctx, latestBlock, now)
if err != nil {
@@ -388,7 +389,8 @@ func (c *Client) Update(ctx context.Context, now time.Time) (*types.LightBlock,
return latestBlock, nil
}
return nil, nil
// else return the latestTrustedBlock
return c.latestTrustedBlock, nil
}
// VerifyLightBlockAtHeight fetches the light block at the given height

View File

@@ -644,7 +644,7 @@ func TestClientReplacesPrimaryWithWitnessIfPrimaryIsUnavailable(t *testing.T) {
chainID,
trustOptions,
mockDeadNode,
[]provider.Provider{mockFullNode, mockFullNode},
[]provider.Provider{mockDeadNode, mockFullNode},
dbs.New(dbm.NewMemDB()),
light.Logger(log.TestingLogger()),
)
@@ -663,6 +663,32 @@ func TestClientReplacesPrimaryWithWitnessIfPrimaryIsUnavailable(t *testing.T) {
mockFullNode.AssertExpectations(t)
}
func TestClientReplacesPrimaryWithWitnessIfPrimaryDoesntHaveBlock(t *testing.T) {
mockFullNode := &provider_mocks.Provider{}
mockFullNode.On("LightBlock", mock.Anything, mock.Anything).Return(l1, nil)
mockDeadNode := &provider_mocks.Provider{}
mockDeadNode.On("LightBlock", mock.Anything, mock.Anything).Return(nil, provider.ErrLightBlockNotFound)
c, err := light.NewClient(
ctx,
chainID,
trustOptions,
mockDeadNode,
[]provider.Provider{mockDeadNode, mockFullNode},
dbs.New(dbm.NewMemDB()),
light.Logger(log.TestingLogger()),
)
require.NoError(t, err)
_, err = c.Update(ctx, bTime.Add(2*time.Hour))
require.NoError(t, err)
// we should still have the dead node as a witness because it
// hasn't repeatedly been unresponsive yet
assert.Equal(t, 2, len(c.Witnesses()))
mockDeadNode.AssertExpectations(t)
mockFullNode.AssertExpectations(t)
}
func TestClient_BackwardsVerification(t *testing.T) {
{
headers, vals, _ := genLightBlocksWithKeys(chainID, 9, 3, 0, bTime)

View File

@@ -702,7 +702,11 @@ func (n *nodeImpl) OnStart() error {
n.Logger.Info("starting state sync")
state, err := n.stateSyncReactor.Sync(context.TODO())
if err != nil {
n.Logger.Error("state sync failed", "err", err)
n.Logger.Error("state sync failed; shutting down this node", "err", err)
// stop the node
if err := n.Stop(); err != nil {
n.Logger.Error("failed to shut down node", "err", err)
}
return
}

View File

@@ -601,6 +601,32 @@ paths:
application/json:
schema:
$ref: "#/components/schemas/ErrorResponse"
/unsafe_flush_mempool:
get:
summary: Flush mempool of all unconfirmed transactions
operationId: unsafe_flush_mempool
tags:
- Unsafe
description: |
Flush flushes out the mempool. It acquires a read-lock, fetches all the
transactions currently in the transaction store and removes each transaction
from the store and all indexes and finally resets the cache.
Note, flushing the mempool may leave the mempool in an inconsistent state.
responses:
"200":
description: empty answer
content:
application/json:
schema:
$ref: "#/components/schemas/EmptyResponse"
"500":
description: empty error
content:
application/json:
schema:
$ref: "#/components/schemas/ErrorResponse"
/blockchain:
get:
summary: "Get block headers (max: 20) for minHeight <= height <= maxHeight."

View File

@@ -22,7 +22,7 @@ import (
// Metrics are based of the `benchmarkLength`, the amount of consecutive blocks
// sampled from in the testnet
func Benchmark(ctx context.Context, testnet *e2e.Testnet, benchmarkLength int64) error {
block, _, err := waitForHeight(ctx, testnet, 0)
block, err := getLatestBlock(ctx, testnet)
if err != nil {
return err
}

View File

@@ -54,9 +54,25 @@ func Load(ctx context.Context, testnet *e2e.Testnet) error {
case numSeen := <-chSuccess:
success += numSeen
case <-ctx.Done():
if success == 0 {
// if we couldn't submit any transactions,
// that's probably a problem and the test
// should error; however, for very short tests
// we shouldn't abort.
//
// The 2s cut off, is a rough guess based on
// the expected value of
// loadGenerateWaitTime. If the implementation
// of that function changes, then this might
// also need to change without more
// refactoring.
if success == 0 && time.Since(started) > 2*time.Second {
return errors.New("failed to submit any transactions")
}
// TODO perhaps allow test networks to
// declare required transaction rates, which
// might allow us to avoid the special case
// around 0 txs above.
rate := float64(success) / time.Since(started).Seconds()
logger.Info("ending transaction load",

View File

@@ -13,17 +13,22 @@ import (
)
// waitForHeight waits for the network to reach a certain height (or above),
// returning the highest height seen. Errors if the network is not making
// returning the block at the height seen. Errors if the network is not making
// progress at all.
// If height == 0, the initial height of the test network is used as the target.
func waitForHeight(ctx context.Context, testnet *e2e.Testnet, height int64) (*types.Block, *types.BlockID, error) {
var (
err error
maxResult *rpctypes.ResultBlock
clients = map[string]*rpchttp.HTTP{}
lastHeight int64
lastIncrease = time.Now()
nodesAtHeight = map[string]struct{}{}
numRunningNodes int
)
if height == 0 {
height = testnet.InitialHeight
}
for _, node := range testnet.Nodes {
if node.Stateless() {
continue
@@ -47,10 +52,10 @@ func waitForHeight(ctx context.Context, testnet *e2e.Testnet, height int64) (*ty
continue
}
// skip nodes that don't have state or haven't started yet
if node.Stateless() {
continue
}
if !node.HasStarted {
continue
}
@@ -67,16 +72,16 @@ func waitForHeight(ctx context.Context, testnet *e2e.Testnet, height int64) (*ty
wctx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
result, err := client.Block(wctx, nil)
result, err := client.Status(wctx)
if err != nil {
continue
}
if result.Block != nil && (maxResult == nil || result.Block.Height > maxResult.Block.Height) {
maxResult = result
if result.SyncInfo.LatestBlockHeight > lastHeight {
lastHeight = result.SyncInfo.LatestBlockHeight
lastIncrease = time.Now()
}
if maxResult != nil && maxResult.Block.Height >= height {
if result.SyncInfo.LatestBlockHeight >= height {
// the node has achieved the target height!
// add this node to the set of target
@@ -90,9 +95,16 @@ func waitForHeight(ctx context.Context, testnet *e2e.Testnet, height int64) (*ty
continue
}
// return once all nodes have reached
// the target height.
return maxResult.Block, &maxResult.BlockID, nil
// All nodes are at or above the target height. Now fetch the block for that target height
// and return it. We loop again through all clients because some may have pruning set but
// at least two of them should be archive nodes.
for _, c := range clients {
result, err := c.Block(ctx, &height)
if err != nil || result == nil || result.Block == nil {
continue
}
return result.Block, &result.BlockID, err
}
}
}
@@ -100,12 +112,12 @@ func waitForHeight(ctx context.Context, testnet *e2e.Testnet, height int64) (*ty
return nil, nil, errors.New("unable to connect to any network nodes")
}
if time.Since(lastIncrease) >= time.Minute {
if maxResult == nil {
return nil, nil, errors.New("chain stalled at unknown height")
if lastHeight == 0 {
return nil, nil, errors.New("chain stalled at unknown height (most likely upon starting)")
}
return nil, nil, fmt.Errorf("chain stalled at height %v [%d of %d nodes %+v]",
maxResult.Block.Height,
lastHeight,
len(nodesAtHeight),
numRunningNodes,
nodesAtHeight)
@@ -182,3 +194,35 @@ func waitForNode(ctx context.Context, node *e2e.Node, height int64) (*rpctypes.R
}
}
}
// getLatestBlock returns the last block that all active nodes in the network have
// agreed upon i.e. the earlist of each nodes latest block
func getLatestBlock(ctx context.Context, testnet *e2e.Testnet) (*types.Block, error) {
var earliestBlock *types.Block
for _, node := range testnet.Nodes {
// skip nodes that don't have state or haven't started yet
if node.Stateless() {
continue
}
if !node.HasStarted {
continue
}
client, err := node.Client()
if err != nil {
return nil, err
}
wctx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
result, err := client.Block(wctx, nil)
if err != nil {
return nil, err
}
if result.Block != nil && (earliestBlock == nil || earliestBlock.Height > result.Block.Height) {
earliestBlock = result.Block
}
}
return earliestBlock, nil
}

View File

@@ -10,7 +10,7 @@ import (
// Wait waits for a number of blocks to be produced, and for all nodes to catch
// up with it.
func Wait(ctx context.Context, testnet *e2e.Testnet, blocks int64) error {
block, _, err := waitForHeight(ctx, testnet, 0)
block, err := getLatestBlock(ctx, testnet)
if err != nil {
return err
}

View File

@@ -68,7 +68,7 @@ func TestApp_Tx(t *testing.T) {
}{
{
Name: "Sync",
WaitTime: 30 * time.Second,
WaitTime: time.Minute,
BroadcastTx: func(client *http.HTTP) broadcastFunc {
return func(ctx context.Context, tx types.Tx) error {
_, err := client.BroadcastTxSync(ctx, tx)
@@ -78,7 +78,13 @@ func TestApp_Tx(t *testing.T) {
},
{
Name: "Commit",
WaitTime: time.Minute,
WaitTime: 15 * time.Second,
// TODO: turn this check back on if it can
// return reliably. Currently these calls have
// a hard timeout of 10s (server side
// configured). The Sync check is probably
// safe.
ShouldSkip: true,
BroadcastTx: func(client *http.HTTP) broadcastFunc {
return func(ctx context.Context, tx types.Tx) error {
_, err := client.BroadcastTxCommit(ctx, tx)
@@ -87,8 +93,12 @@ func TestApp_Tx(t *testing.T) {
},
},
{
Name: "Async",
WaitTime: time.Minute,
Name: "Async",
WaitTime: 90 * time.Second,
// TODO: turn this check back on if there's a
// way to avoid failures in the case that the
// transaction doesn't make it into the
// mempool. (retries?)
ShouldSkip: true,
BroadcastTx: func(client *http.HTTP) broadcastFunc {
return func(ctx context.Context, tx types.Tx) error {