At first extendedCommitInfo expected votes to be in the same order as
their corresponding validators in the supplied CommitInfo struct, but
this proved to be rather difficult since when a validator set's loaded
from state it's first sorted by voting power and then by address.
Instead of sorting the votes in the same way, this approach simply maps
votes to their corresponding validator's address prior to constructing
the extended commit info. This way it's easy to look up the
corresponding vote and we don't need to care about vote order.
Signed-off-by: Thane Thomson <connect@thanethomson.com>
Replaces the set of timeout parameters in the config.toml file with unsafe-*override versions of the corresponding ConsensusParams.Timeout field. These fields can be used for the duration of v0.36 to override the consensus param in case of emergency.
Adds a set to the ./internal/consensus/State type for correctly calculating the value of each timeout based on the set of overrides specified.
closes: #8039
This pull request updates the new ABCI++ protos to use `enum`s in place of `bool`s. `enums` may be preferred over `bool` because an `enum` can be udpated to include new statuses in the future, whereas a `bool` cannot and is fixed as just `true` or `false` over the whole lifecycle of the API.
We can reduce the size of test fixtures (which will improve test
reliability) without impacting these tests' primary role (which is
correctness.)
Also reducing these test logging will make the tests easier to read,
which whill be a good quality of life improvement for devs.
This is the first step towards implementing vote extensions: generating
the relevant proto stubs and getting the build and linter to pass.
Signed-off-by: Thane Thomson <connect@thanethomson.com>
I see this panic in tests occasionally, and I don't think there's any
need to close this channel:
- it's only sent to in one place which has a select case with a
default clause, so there's no chance of deadlocks.
- the only place we recieve from it thas a timeout.
This contains two major changes:
- Remove the legacy test logging method, and just explicitly call the
noop logger. This is just to make the test logging behavior more
coherent and clear.
- Move the logging in the light package from the testing.T logger to
the noop logger. It's really the case that we very rarely need/want
to consider test logs unless we're doing reproductions and running a
narrow set of tests.
In most cases, I (for one) prefer to run in verbose mode so I can
watch progress of tests, but I basically never need to consider
logs. If I do want to see logs, then I can edit in the testing.T
logger locally (which is what you have to do today, anyway.)
This is failing intermittently, but it's a really simple test, and I
suspect that we're just running into thread scheduling issues on CI
nodes. I don't think making the test smaller reduces the utility of
this test.
closes: #7756
# What does this pull request change?
This pull request adds a new runbook for operators enountering errors related to the new Proposer-Based Timestamps algorithm. The goal of this runbook is to give operators a set of clear steps that they can follow if they are having issues producing blocks because of clock synchronization problems.
This pull request also renames the `*PrevoteDelay` metrics to drop the term `MessageDelay`. These metrics provide a combined view of `message_delay` + `synchrony` so the name may be confusing.
# Questions to reviewers
* Are there ways to make the set of steps clearer or are there any pieces that seem confusing?
This is minor, but I was trying to write a test and realized that the
application reference in the harness isn't actually used, which is
quite confusing.
The events switch code is largely vestigal and is responsible for
wiring between the consensus state machine and the consensus
reactor. While there might have been a need, historicallly to managed
these subscriptions at runtime, it's nolonger used: subscriptions are
registered during startup, and then the switch shuts down at at the
end.
Eventually the EventSwitch should be replaced by a much smaller
implementation of an eventloop in the consensus state machine, but
cutting down on the scope of the event switch will help clarify the
requirements from the consensus side.
This change implements the logic for the PrepareProposal ABCI++ method call. The main logic for creating and issuing the PrepareProposal request lives in execution.go and is tested in a set of new tests in execution_test.go. This change also updates the mempool mock to use a mockery generated version and removes much of the plumbing for the no longer used ABCIResponses.
Updates #8077. The panic handler for consensus currently attempts to effect a
clean shutdown, but this can leave a failed node running in an unknown state
for an arbitrary amount of time after the failure.
Since a panic at this point means consensus is already irrecoverably broken, we
should not allow the node to continue executing. After making a best effort to
shut down the writeahead log, re-panic to ensure the node will terminate before
any further state transitions are processed.
Even with this change, it is possible some transitions may occur while the
cleanup is happening. It might be preferable to abort unconditionally without
any attempt at cleanup.
Related changes:
- Clean up the creation of WAL directories.
- Filter WAL close errors at rethrow.
The message handling in this reactor is all under control of the reactor
itself, and does not call out to callbacks or other externally-supplied code.
It doesn't need to check for panics.
- Remove an irrelevant channel ID check.
- Remove an unnecessary panic recovery wrapper.