Updates #8077. The panic handler for consensus currently attempts to effect a
clean shutdown, but this can leave a failed node running in an unknown state
for an arbitrary amount of time after the failure.
Since a panic at this point means consensus is already irrecoverably broken, we
should not allow the node to continue executing. After making a best effort to
shut down the write-ahead log, re-panic to ensure the node will terminate before
any further state transitions are processed.
Even with this change, it is possible that some transitions will occur while the
cleanup is happening. It might be preferable to abort unconditionally without
any attempt at cleanup.
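A minimal sketch of the re-panic pattern, assuming a hypothetical `wal` handle with a `Stop` method and a structured logger (names are illustrative, not the exact consensus code):

```go
defer func() {
	if r := recover(); r != nil {
		// Best effort: stop the write-ahead log so buffered entries are
		// flushed before the process dies.
		if err := wal.Stop(); err != nil {
			logger.Error("error stopping WAL during panic", "err", err)
		}
		// Re-panic so the node terminates rather than continuing to
		// process state transitions in an unknown state.
		panic(r)
	}
}()
```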
Related changes:
- Clean up the creation of WAL directories.
- Filter WAL close errors at rethrow.
This change set implements the most recent version of `FinalizeBlock`.
# What does this change actually contain?
* This change set is rather large, but fear not! The majority of the files touched simply rename `ResponseDeliverTx` to `ExecTxResult`. This should be a pretty inoffensive change since they're effectively the same type under a different name (see the sketch after this list).
* The `execBlockOnProxyApp` function was removed entirely, since it was just a wrapper around logic that is now mostly encapsulated within `FinalizeBlock`.
* The `updateState` helper function has been made a public method on `State`. It was being exposed as a shim through the testing infrastructure, so this seemed innocuous.
* Tests already existed to ensure that the application received the `ByzantineValidators` and the `ValidatorUpdates`, but one was fixed up to ensure that `LastCommitInfo` was being sent across.
* Tests that searched the `psql` indexer for an event that was never created were removed.
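As a rough sketch of the consolidated call shape (types and field names approximate the new ABCI interface and may not match it exactly): a single `FinalizeBlock` request carries the whole block, and the per-transaction results come back as `ExecTxResult` values where `ResponseDeliverTx` values used to appear.

```go
resp, err := appConn.FinalizeBlock(ctx, abci.RequestFinalizeBlock{
	Hash:   block.Hash(),
	Height: block.Height,
	Txs:    block.Txs.ToSliceOfBytes(),
})
if err != nil {
	return err
}
// resp.TxResults holds one *abci.ExecTxResult per transaction, in the
// order the transactions appear in the block.
for _, txResult := range resp.TxResults {
	if txResult.Code != abci.CodeTypeOK {
		logger.Info("tx failed", "code", txResult.Code, "log", txResult.Log)
	}
}
```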
# Questions for reviewers
* We store this [ABCIResponses](5721a13ab1/proto/tendermint/state/types.pb.go (L37)) type in the database as the block results. This type has changed since v0.35 to contain the `FinalizeBlock` response. Do we need to do any shimming to keep the old data retrievable?
* Similarly, this change is exposed via the RPC through a change to [ResultBlockResults](5721a13ab1/rpc/coretypes/responses.go (L69)). Should we somehow shim or notify users of this change?
closes: #7658
This is mostly an extremely small change: I double a somewhat
arbitrarily set timeout from 1m to 2m for an entire test. When I put
these timeouts in the test, they were arbitrary, based on my local
performance (which is quite fast), and I expected that they'd need to
be tweaked in the future.
A big chunk of this PR is reworking a collection of helper functions
that produce somewhat intractable messages when a test fails, so that
the error messages take up less vertical space, hopefully without
losing any debuggability.
This change implements the spec for `ProcessProposal`. It first calls the Tendermint block validation logic to check that all of the proposed block fields are well formed and do not violate any of Tendermint's rules for considering a block valid, and then passes the validated block to `ProcessProposal`.
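A rough sketch of that ordering, with hypothetical names standing in for the real consensus and proxy-app plumbing:

```go
func processProposal(ctx context.Context, block *types.Block) (bool, error) {
	// 1. Run Tendermint's own block validation rules first.
	if err := validateBlock(block); err != nil {
		return false, fmt.Errorf("proposed block is invalid: %w", err)
	}
	// 2. Only a block Tendermint itself considers valid is offered to
	//    the application.
	resp, err := appConn.ProcessProposal(ctx, abci.RequestProcessProposal{
		Txs: block.Txs.ToSliceOfBytes(),
	})
	if err != nil {
		return false, err
	}
	// How acceptance is signaled on the response is also illustrative.
	return resp.IsAccepted(), nil
}
```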
This change also adds additional fixtures to test the change. It adds the `baseMock` type, which holds a mock as well as a reference to `BaseApplication`. If a method was not set up on the contained mock Application by the test, the type delegates to `BaseApplication` and returns whatever `BaseApplication` returns.
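A minimal sketch of that delegation, using a named return so a panic from an unconfigured testify mock can be recovered and replaced by the `BaseApplication` default (names are illustrative):

```go
type baseMock struct {
	base *abci.BaseApplication
	*mocks.Application
}

func (m baseMock) ProcessProposal(req abci.RequestProcessProposal) (resp abci.ResponseProcessProposal) {
	defer func() {
		// An unconfigured testify mock panics; fall back to the default.
		if r := recover(); r != nil {
			resp = m.base.ProcessProposal(req)
		}
	}()
	return m.Application.ProcessProposal(req)
}
```

The same wrapper shape is repeated for each ABCI method the tests exercise.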
The change also switches the `makeState` helper to take an arg struct so that an ABCI application can be plumbed through when needed.
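The arg-struct pattern itself is simple; a hypothetical sketch (field names are illustrative):

```go
type makeStateArgs struct {
	validators  int
	application abci.Application // nil selects a default application
}

// applyDefaults lets callers set only the fields they care about, which
// is what makes plumbing an ABCI application through optional.
func (args *makeStateArgs) applyDefaults() {
	if args.validators == 0 {
		args.validators = 1
	}
	if args.application == nil {
		args.application = &abci.BaseApplication{}
	}
}
```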
closes: #7656
This is the first step in removing the mutex from ABCI applications:
making our test applications hold mutexes, which this does, hopefully
with zero impact. If this lands well, then we can explore deleting the
other mutexes (in the ABCI server and the clients). While this change
is not user-impacting at all, removing the other mutexes *will* be.
In pursuit of this, I've changed the KV app somewhat, to put almost
all of the logic in the base application and make the persistent
application mostly be a wrapper on top of that with a different
storage layer.
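A minimal sketch of the direction, with the base application owning its own mutex (types and the tx format here are illustrative, not the exact kvstore code):

```go
type Application struct {
	abci.BaseApplication

	mtx   sync.Mutex
	state map[string]string
}

func NewApplication() *Application {
	return &Application{state: map[string]string{}}
}

func (app *Application) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
	// The application serializes its own access, so the ABCI server and
	// clients no longer need to hold a lock around every call.
	app.mtx.Lock()
	defer app.mtx.Unlock()

	parts := strings.SplitN(string(req.Tx), "=", 2)
	key, value := parts[0], parts[len(parts)-1]
	app.state[key] = value
	return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}
```

The persistent application can then embed this type and swap in a durable storage layer.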
* event: Added Events after evidence validation; evidence: refactored AddEvidence
Added context and Metrics as parameters to the pool constructor
* evidence: pushed event firing into the evidence pool and added metrics to represent the size of the evpool (see the sketch after this list)
* state: fixed parameters of evpool mock functions
* evidence: added test to confirm events are generated
* Removed obsolete EvidenceEventPublisher interface
* evidence: the pool no longer errors on a missing event bus
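A rough sketch, with hypothetical method and field names, of what `AddEvidence` looks like once event firing and the size metric live inside the pool:

```go
func (evpool *Pool) AddEvidence(ctx context.Context, ev types.Evidence) error {
	// Validation and persistence as before (details elided).
	if err := evpool.verify(ev); err != nil {
		return err
	}
	evpool.addPendingEvidence(ev)

	// The pool fires the event itself now; a missing event bus is
	// tolerated rather than treated as an error.
	if evpool.eventBus != nil {
		if err := evpool.eventBus.PublishEventNewEvidence(types.EventDataNewEvidence{
			Evidence: ev,
			Height:   ev.Height(),
		}); err != nil {
			evpool.logger.Error("failed to publish evidence event", "err", err)
		}
	}

	// A gauge tracks the current size of the evidence pool.
	evpool.metrics.NumEvidence.Set(float64(evpool.numPending()))
	return nil
}
```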
Went through #2871; there are several issues there. This PR tackles a `HasVoteMessage` with an invalid validator index sent by a bad peer, and prevents the bad vote from reaching the `peerMsgQueue`.
Future work: check the other bad-message cases, and plumb reactor errors through to the peer manager so that a peer sending bad messages can be disconnected.
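A minimal sketch of the check, assuming the reactor can see the current validator set (names approximate the consensus reactor):

```go
// Reject the message before it is enqueued, so a malicious peer cannot
// inject a HasVote for a validator index that does not exist.
if msg.Index < 0 || int(msg.Index) >= valSet.Size() {
	return fmt.Errorf("invalid validator index %d in HasVote from peer %v (validator set size %d)",
		msg.Index, peerID, valSet.Size())
}
```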
This change adds logic to double the message delay bound after every 10 rounds. Alternatives to this somewhat magic number were discussed, specifically whether to make '10' configurable as a parameter. Since this behavior only exists to ensure liveness in the case that these values were poorly chosen to begin with, a method to configure this value was not created. Chains that notice many 'untimely' rounds per the [relevant metric](https://github.com/tendermint/tendermint/pull/7709) are expected to increase the configured message delay to more accurately match the conditions of the network.
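A minimal sketch of the doubling rule (the real synchrony-parameter code may differ; a production version would also cap the shift to avoid overflow):

```go
// messageDelayForRound doubles the configured bound once for every ten
// rounds that have elapsed, so a bound that was configured too low
// cannot block liveness indefinitely.
func messageDelayForRound(base time.Duration, round int32) time.Duration {
	return base * time.Duration(1<<(uint(round)/10))
}
```

For example, with a configured delay of 500ms, rounds 0 through 9 use 500ms, rounds 10 through 19 use 1s, and rounds 20 through 29 use 2s.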
closes: https://github.com/tendermint/spec/issues/371
This is clearly a cobweb in the code, and may point toward a solution to #7729, though it is difficult to backport because we don't have contexts in 0.35.