This pull request completes the change to the `metricsgen` metrics. It adds `go generate` directives to all of the files containing the `Metrics` structs.
Using the outputs of `metricsdiff` between these generated metrics and `main`, we can see that there is a minimal diff between the two sets of metrics when run locally. The diff here stems from removal of the word 'message' which was done in v0.36+ and is ultimately a better phrasing. This metric has not yet been released, so this phrasing is preferred.
```
./metricsdiff old new
Metric changes:
+++ tendermint_consensus_full_prevote_delay
+++ tendermint_consensus_quorum_prevote_delay
--- tendermint_consensus_full_prevote_message_delay
--- tendermint_consensus_quorum_prevote_message_delay
```
This change also adds parsing for a `metrics:` key in a field comment. If a comment line begins with `//metrics:` the rest of the line is interpreted to be the metric help text. Additionally, a bug where lists of labels were not properly quoted in the `metricsgen` rendered output was fixed.
In my view, docs and tests are not needed for this internal only change.
---
#### PR checklist
- [ ] Tests written/updated, or no tests needed
- [ ] `CHANGELOG_PENDING.md` updated, or no changelog entry needed
- [ ] Updated relevant documentation (`docs/`) and code comments, or no
documentation updates needed
Ports #8488 to main
* internal/proxy: add initial set of abci metrics (#7115)
This PR adds an initial set of metrics for use ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to determine if the method call was performed using ABCI's `*Async` methods.
An example of these metrics is included here for reference:
```
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13
tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001
tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13
```
These metrics can easily be graphed using prometheus's `histogram_quantile(...)` method to pick out a particular quantile to graph or examine. I chose buckets that were somewhat of an estimate of expected range of times for ABCI operations. They start at .0001 seconds and range to 25 seconds. The hope is that this range captures enough possible times to be useful for us and operators.
* fixup
* fixups
* docs: add abci timing metrics to the metrics docs (#7311)
* format table
This change updates the lock handling in the consensus reactor. The consensus reactor now periodically fetches the RoundState and the gossip routines operate on this fetched copy instead of fetching the latest copy in each iteration of the gossip routine.
* change lock handling in consensus state file
* add comment explaining the unlock
* comment fix
* Update consensus/state.go
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* spelling fix
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* consensus: calculate prevote message delay metric (#7551)
## What does this pull request do?
This pull requests adds two metrics intended for use in calculating an experimental value for `MessageDelay`.
The metrics are as follows:
```
# HELP tendermint_consensus_complete_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved 100% of the voting power in the prevote step.
# TYPE tendermint_consensus_complete_prevote_message_delay gauge
tendermint_consensus_complete_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
# HELP tendermint_consensus_quorum_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved a quorum in the prevote step.
# TYPE tendermint_consensus_quorum_prevote_message_delay gauge
tendermint_consensus_quorum_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
```
## Why this change?
For more information on what these metrics are calculating, see #7202. The aim is to merge to backport these metrics to v0.34 and run nodes on a few popular chains with these metrics to determine the experimental values for `MessageDelay` on these popular chains and use these to select our default `SynchronyParams.MessageDelay` value.
## Why Gauges for the metrics?
Gauges allow us to overwrite the metric on each successive observation. We can then capture these metrics over time to track the highest and lowest observed value.
(cherry picked from commit 0c82ceaa5f)
# Conflicts:
# consensus/metrics.go
# consensus/state.go
* fix merge conflicts
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: William Banfield <wbanfield@gmail.com>
Issues reported in Osmosis, where the message is extremely long. Also, there is absolutely no reason to log the message IMO. If we must, we can make the message log DEBUG.
(cherry picked from commit 58a6cfff9a)
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
This is an attempt to clean up the logging message as requested in #6269.
(cherry picked from commit 3f9066b290)
Co-authored-by: Sam Kleinman <garen@tychoish.com>
Executed a local network using simapp and looked for logs that seemed superfluous. This isn't by any means an exhaustive grooming, but should drastically help legibility of logs.
ref: #5912
Conflicting votes are now sent to the evidence pool to form duplicate vote evidence only once
the height of the evidence is finished and the time of the block finalised.
blockchain/vX reactor priority was decreased because during the normal operation
(i.e. when the node is not fast syncing) blockchain priority can't be
the same as consensus reactor priority. Otherwise, it's theoretically possible to
slow down consensus by constantly requesting blocks from the node.
NOTE: ideally blockchain/vX reactor priority would be dynamic. e.g. when
the node is fast syncing, the priority is 10 (max), but when it's done
fast syncing - the priority gets decreased to 5 (only to serve blocks
for other nodes). But it's not possible now, therefore I decided to
focus on the normal operation (priority = 5).
evidence and consensus critical messages are more important than
the mempool ones, hence priorities are bumped by 1 (from 5 to 6).
statesync reactor priority was changed from 1 to 5 to be the same as
blockchain/vX priority.
Refs https://github.com/tendermint/tendermint/issues/5816
After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)
This fixes spurious `TestByzantinePrevoteEquivocation` failures by extending the block range and time spent waiting for evidence. I've seen many runs where the evidence isn't committed until e.g. height 27. Haven't looked into _why_ this happens, but as long as the evidence is committed eventually and the test doesn't spuriously fail I'm (mostly) happy. WDYT @cmwaters?