From a60d032b07fd6f5ee5687b0f7d206fb4ab16228d Mon Sep 17 00:00:00 2001
From: Anton Kaliaev
Date: Tue, 3 Mar 2020 12:37:29 +0400
Subject: [PATCH] docs: write about debug kill and dump (#4516)

* docs: write about debug kill and dump

Closes #4325

* wrap file tree in code blocks
---
 docs/tendermint-core/running-in-production.md | 62 +++++++++----------
 docs/tools/README.md                          | 13 ++--
 docs/tools/debugging.md                       | 57 +++++++++++++++++
 3 files changed, 94 insertions(+), 38 deletions(-)
 create mode 100644 docs/tools/debugging.md

diff --git a/docs/tendermint-core/running-in-production.md b/docs/tendermint-core/running-in-production.md
index 7a436ec95..02a0eae58 100644
--- a/docs/tendermint-core/running-in-production.md
+++ b/docs/tendermint-core/running-in-production.md
@@ -111,54 +111,44 @@ to achieve the same things.

 ## Debugging Tendermint

-If you ever have to debug Tendermint, the first thing you should
-probably do is to check out the logs. See [How to read
-logs](./how-to-read-logs.md), where we explain what certain log
-statements mean.
+If you ever have to debug Tendermint, the first thing you should probably do is
+check out the logs. See [How to read logs](./how-to-read-logs.md), where we
+explain what certain log statements mean.

-If, after skimming through the logs, things are not clear still, the
-next thing to try is query the /status RPC endpoint. It provides the
-necessary info: whenever the node is syncing or not, what height it is
-on, etc.
+If, after skimming through the logs, things are not clear still, the next thing
+to try is querying the `/status` RPC endpoint. It provides the necessary info:
+whenever the node is syncing or not, what height it is on, etc.

-```
+```sh
 curl http(s)://{ip}:{rpcPort}/status
 ```

-`dump_consensus_state` will give you a detailed overview of the
-consensus state (proposer, lastest validators, peers states). From it,
-you should be able to figure out why, for example, the network had
-halted.
+`/dump_consensus_state` will give you a detailed overview of the consensus
+state (proposer, latest validators, peers states). From it, you should be able
+to figure out why, for example, the network had halted.

-```
+```sh
 curl http(s)://{ip}:{rpcPort}/dump_consensus_state
 ```

-There is a reduced version of this endpoint - `consensus_state`, which
-returns just the votes seen at the current height.
+There is a reduced version of this endpoint - `/consensus_state`, which returns
+just the votes seen at the current height.

-- [Github Issues](https://github.com/tendermint/tendermint/issues)
-- [StackOverflow
-  questions](https://stackoverflow.com/questions/tagged/tendermint)
+If, after consulting with the logs and above endpoints, you still have no idea
+what's happening, consider using `tendermint debug kill` sub-command. This
+command will scrap all the available info and kill the process. See
+[Debugging](../tools/debugging.md) for the exact format.

-### Debug Utility
-
-Tendermint also ships with a `debug` sub-command that allows you to kill a live
-Tendermint process while collecting useful information in a compressed archive
-such as the configuration used, consensus state, network state, the node' status,
-the WAL, and even the stacktrace of the process before exit. These files can be
-useful to examine when debugging a faulty Tendermint process.
-
-In addition, the `debug` sub-command also allows you to dump debugging data into
-compressed archives at a regular interval. These archives contain the goroutine
-and heap profiles in addition to the consensus state, network info, node status,
-and even the WAL.
+You can inspect the resulting archive yourself or create an issue on
+[Github](https://github.com/tendermint/tendermint). Before opening an issue
+however, be sure to check if there's [no existing
+issue](https://github.com/tendermint/tendermint/issues) already.

 ## Monitoring Tendermint

-Each Tendermint instance has a standard `/health` RPC endpoint, which
-responds with 200 (OK) if everything is fine and 500 (or no response) -
-if something is wrong.
+Each Tendermint instance has a standard `/health` RPC endpoint, which responds
+with 200 (OK) if everything is fine and 500 (or no response) - if something is
+wrong.

 Other useful endpoints include mentioned earlier `/status`, `/net_info` and
 `/validators`.
@@ -166,6 +156,10 @@ Other useful endpoints include mentioned earlier `/status`, `/net_info` and
 Tendermint also can report and serve Prometheus metrics. See
 [Metrics](./metrics.md).

+`tendermint debug dump` sub-command can be used to periodically dump useful
+information into an archive. See [Debugging](../tools/debugging.md) for more
+information.
+
 ## What happens when my app dies?

 You are supposed to run Tendermint under a [process
diff --git a/docs/tools/README.md b/docs/tools/README.md
index c326cde5b..bf9dd1f97 100644
--- a/docs/tools/README.md
+++ b/docs/tools/README.md
@@ -9,16 +9,21 @@ parent:

 Tendermint has some tools that are associated with it for:

+- [Debugging](./debugging.md)
 - [Benchmarking](#benchmarking)
-- [Validation of remote signers](./remote-signer-validation.md)
 - [Testnets](#testnets)
-
+- [Validation of remote signers](./remote-signer-validation.md)

 ## Benchmarking

-Benchmarking is done with tm-load-test, for information on how to use the tool please visit the docs: https://github.com/interchainio/tm-load-test
+- https://github.com/interchainio/tm-load-test

+`tm-load-test` is a distributed load testing tool (and framework) for load
+testing Tendermint networks.

 ## Testnets

-The testnets tool is aimed at testing Tendermint with different configurations. For more information please visit: https://github.com/interchainio/testnets.
+- https://github.com/interchainio/testnets
+
+This repository contains various different configurations of test networks for,
+and relating to, Tendermint.
diff --git a/docs/tools/debugging.md b/docs/tools/debugging.md
new file mode 100644
index 000000000..50961dd3b
--- /dev/null
+++ b/docs/tools/debugging.md
@@ -0,0 +1,57 @@
+# Debugging
+
+## tendermint debug kill
+
+Tendermint comes with a `debug` sub-command that allows you to kill a live
+Tendermint process while collecting useful information in a compressed archive.
+The information includes the configuration used, consensus state, network
+state, the node' status, the WAL, and even the stack trace of the process
+before exit. These files can be useful to examine when debugging a faulty
+Tendermint process.
+
+```sh
+tendermint debug kill --home=
+```
+
+will write debug info into a compressed archive. The archive will contain the
+following:
+
+```
+├── config.toml
+├── consensus_state.json
+├── net_info.json
+├── stacktrace.out
+├── status.json
+└── wal
+```
+
+Under the hood, `debug kill` fetches info from `/status`, `/net_info`, and
+`/dump_consensus_state` HTTP endpoints, and kills the process with `-6`, which
+catches the go-routine dump.
+
+## tendermint debug dump
+
+Also, the `debug dump` sub-command allows you to dump debugging data into
+compressed archives at a regular interval. These archives contain the goroutine
+and heap profiles in addition to the consensus state, network info, node
+status, and even the WAL.
+
+```sh
+tendermint debug dump --home=
+```
+
+will perform similarly to `kill` except it only polls the node and
+dumps debugging data every frequency seconds to a compressed archive under a
+given destination directory. Each archive will contain:
+
+```
+├── consensus_state.json
+├── goroutine.out
+├── heap.out
+├── net_info.json
+├── status.json
+└── wal
+```
+
+Note: goroutine.out and heap.out will only be written if a profile address is
+provided and is operational. This command is blocking and will log any error.