inspect: add inspect mode for debugging crashed tendermint node (#6785)

EDIT: Updated, see [comment below](https://github.com/tendermint/tendermint/pull/6785#issuecomment-897793175)

This change adds a sketch of the `Debug` mode. 

This change adds a `Debug` struct to the node package. This `Debug` struct is intended to be created and started by a command in the `cmd` directory. The `Debug` struct runs the RPC server on the data directories: both the state store and the block store.

This change required a good deal of refactoring. Namely, a new `rpc.go` file was added to the `node` package. This file encapsulates functions for starting RPC servers used by nodes. A potential additional change is to further factor this code into shared code _in_ the `rpc` package. 

Minor API tweaks that seemed appropriate were also made, such as the mechanism for fetching routes from the `rpc/core` package.

Additional work is required to register the `Debug` service as a command in the `cmd` directory, but I am looking for feedback on whether this direction seems appropriate before diving much further.

closes: #5908
William Banfield
2021-08-24 14:12:06 -04:00
committed by GitHub
parent 6d5ff590c3
commit bc2b529b95
21 changed files with 1200 additions and 116 deletions

@@ -62,3 +62,30 @@ given destination directory. Each archive will contain:
Note: goroutine.out and heap.out will only be written if a profile address is
provided and is operational. This command is blocking and will log any error.
## Tendermint Inspect
Tendermint includes an `inspect` command for querying Tendermint's state store and block
store over Tendermint RPC.
When the Tendermint consensus engine detects inconsistent state, it will crash the
entire Tendermint process.
While in this inconsistent state, a node running Tendermint's consensus engine will not start up.
The `inspect` command runs only a subset of Tendermint's RPC endpoints for querying the block store
and state store.
`inspect` allows operators to query a read-only view of the state.
`inspect` does not run the consensus engine at all and can therefore be used to debug
processes that have crashed due to inconsistent state.
To start the `inspect` process, run
```bash
tendermint inspect
```
### RPC endpoints
The list of available RPC endpoints can be found by making a request to the RPC port.
For an `inspect` process running on `127.0.0.1:26657`, navigate your browser to
`http://127.0.0.1:26657/` to retrieve the list of enabled RPC endpoints.
Additional information on the Tendermint RPC endpoints can be found in the [rpc documentation](https://docs.tendermint.com/master/rpc).
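The same listing can also be fetched from a terminal, which helps on a remote machine with no browser. The following is a minimal sketch, assuming `curl` is installed and an `inspect` process is listening on `127.0.0.1:26657` as in the example above. The `INSPECT_ADDR` variable and the helper names `inspect_url` and `list_endpoints` are hypothetical and exist only for this illustration.

```bash
# Assumption: the inspect RPC server listens on this address.
# Override INSPECT_ADDR if your configuration uses a different one.
INSPECT_ADDR="${INSPECT_ADDR:-127.0.0.1:26657}"

# Build the URL for an RPC route. An empty route yields the root URL,
# which is the page that lists the enabled endpoints.
inspect_url() {
  printf 'http://%s/%s' "$INSPECT_ADDR" "${1:-}"
}

# Fetch the endpoint listing. -sS hides the progress meter but still
# prints an error if the server cannot be reached.
list_endpoints() {
  curl -sS "$(inspect_url)"
}
```

Running `list_endpoints` after sourcing the snippet prints whatever the server returns at its root URL.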

@@ -64,13 +64,42 @@ It wont kill the node, but it will gather all of the above data and package i
At this point, depending on how severe the degradation is, you may want to restart the process.
## Tendermint Inspect
What if the Tendermint node will not start up due to inconsistent consensus state?
When a node running the Tendermint consensus engine detects an inconsistent state,
it will crash the entire Tendermint process.
The Tendermint consensus engine cannot be run in this inconsistent state, and so the node
will fail to start up as a result.
The Tendermint RPC server can provide valuable information for debugging in this situation.
The Tendermint `inspect` command will run a subset of the Tendermint RPC server
that is useful for debugging inconsistent state.
### Running inspect
Start up the `inspect` tool on the machine where Tendermint crashed using:
```bash
tendermint inspect --home=</path/to/app.d>
```
`inspect` will use the data directory specified in your Tendermint configuration file.
`inspect` will also run the RPC server at the address specified in your Tendermint configuration file.
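To check which address the RPC server will bind to before starting `inspect`, you can read it from the configuration file. The sketch below assumes the usual layout in which the configuration lives at `config/config.toml` under the home directory and the RPC listen address is the `laddr` key of the `[rpc]` section, written with spaces around `=`. The helper name `rpc_laddr` is hypothetical.

```bash
# Print the laddr value from the [rpc] section of a config.toml.
# Assumption: the key is written as: laddr = "tcp://host:port"
rpc_laddr() {
  awk '
    # Track whether the current section header is exactly [rpc].
    /^\[/ { in_rpc = ($0 == "[rpc]") }
    # Inside [rpc], print the laddr value without its quotes and stop.
    in_rpc && /^laddr *=/ { gsub(/"/, "", $3); print $3; exit }
  ' "$1"
}
```

For example, `rpc_laddr </path/to/app.d>/config/config.toml` prints the address that requests should be sent to, using the same home directory passed to `--home`.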
### Using inspect
With the `inspect` server running, you can access RPC endpoints that are critically important
for debugging.
Calling the `/status`, `/consensus_state` and `/dump_consensus_state` RPC endpoints
will return useful information about the Tendermint consensus state.
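When debugging a crash, it is often handy to save these responses to files so they can be attached to a bug report. The following is a rough sketch, assuming `curl` is installed, the server listens on `127.0.0.1:26657`, and the three routes named above are enabled on your `inspect` server. The variables `INSPECT_ADDR` and `OUT_DIR` and the helpers `route_file` and `collect_routes` are hypothetical names used only here.

```bash
# Assumptions: server address and output directory; override as needed.
INSPECT_ADDR="${INSPECT_ADDR:-127.0.0.1:26657}"
OUT_DIR="${OUT_DIR:-./inspect-output}"

# The routes named in the text above.
ROUTES="status consensus_state dump_consensus_state"

# Path of the file that stores the response for one route.
route_file() {
  printf '%s/%s.json' "$OUT_DIR" "$1"
}

# Request each route and write the response body to its own file.
# A failed request is reported on stderr and the loop continues.
collect_routes() {
  local route
  mkdir -p "$OUT_DIR"
  for route in $ROUTES; do
    curl -sS "http://$INSPECT_ADDR/$route" -o "$(route_file "$route")" \
      || echo "request failed: $route" >&2
  done
}
```

Calling `collect_routes` leaves one file per route in `OUT_DIR`. If a route is not served, the saved file will contain the server's error response instead, so it is worth checking the contents before sharing them.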
## Outro
We're hoping that the `tendermint debug` subcommand will become de facto the first response to any accidents.
We're hoping that these Tendermint tools will become de facto the first response for any accidents.
Let us know what your experience has been so far! Have you had a chance to try `tendermint debug` yet?
Let us know what your experience has been so far! Have you had a chance to try `tendermint debug` or `tendermint inspect` yet?
Join our chat, where we discuss the current issues and future improvements.
Join our [discord chat](https://discord.gg/vcExX9T), where we discuss the current issues and future improvements.