This change adds the new TimingParams proto messages. These new messages were build using the wb/proposer-based-timestamps branch on the spec repo.
This change also adds validation that these values are positive when parsed and adds the new parameters into the existing tests.
* internal/consensus: refactor the common_test functions to use a single timeout function
* remove ensurePrecommit
* Update internal/consensus/common_test.go
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* join lines for fatal messages
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* initial proposerWaitsUntil implementation
* switch to duration for easier use with timeout scheduling
* add proposal step waiting time with tests
* minor aesthetic change to IsTimely
* minor language fix
* Update internal/consensus/state.go
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* reword comment
* change accuracy to precision
* move tests to separate pbts test file
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* add failing test
* tweak comments in failing test
* failing test comment
* initial attempt at removing prevote locked block logic
* comment out broken function
* undo reset on prevotes
* fixing TestProposeValidBlock test
* update test for completed POL update
* comment updates
* further unlock testing
* update comments
* Update internal/consensus/state.go
* spacing nit
* comment cleanup
* nil check in addVote
* update unlock description
* update precommit on relock comment
* add ensure new timeout back
* rename IsZero to IsNil and replace uses of block len check with helper
* add testing.T to new assertions
* begin removing unlock condition
* fix TestStateProposerSelection2 to precommit for nil correctly
* remove erroneous sleep
* update TestStatePOL comment
* update relock test to be more clear
* add _ into test names
* rename slashing
* udpate no relock function to be cleaner
* do not relock on old proposal test cleanup
* con state name update
* remove all references to unlock
* update test comments to include new
* add relock test
* add ensureRelock to common_test
* remove all event unlock
* remove unlock checks
* no lint add space
* lint ++
* add test for nil prevote on different proposal
* fix prevote nil condition
* fix defaultDoPrevote
* state_test.go fixes to accomodate prevoting for nil
* add failing test for POL from previous round case
* update prevote logic to prevote POL from previous round
* state.go comment fixes
* update validatePrevotes to correctly look for nil
* update new test name and comment
* update POLFromPreviousRound test
* fixes post merge
* fix spacing
* make the linter happy
* change prevote log message
* update prevote nil debug line
* update enterPrevote comment
* lint
* Update internal/consensus/state.go
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
* Update internal/consensus/state.go
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
* add english description of alg rules
* Update internal/consensus/state.go
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
* comment fixes from review
* fix comment
* fix comment
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
This is a very small change, but removes a method from the
`service.Service` interface (a win!) and forces callers to explicitly
pass loggers in to objects during construction rather than (later)
injecting them. There's not a real need for this kind of lazy
construction of loggers, and I think a decent potential for confusion
for mutable loggers.
The main concern I have is that this changes the constructor API for
ABCI clients. I think this is fine, and I suspect that as we plumb
contexts through, and make changes to the RPC services there'll be a
number of similar sorts of changes to various (quasi) public
interfaces, which I think we should welcome.
I think calling os.Exit at arbitrary points is _bad_ and is good to
delete. I think panics in the case of data courruption have a chance
of providing useful information.
When dialing fails to succeed we should reduce the score of the peer,
which puts the peer at (potentially) greater chances of being removed
from the peer manager, and reduces the chance of the peer being
gossiped by the PEX reactor.
The evidence test produces a set of mock evidence in the evidence pool of the 'Primary' node. The test then fills the evidence pools of secondaries with half of this mock evidence. Finally, the test waits until the secondary has an evidence pool as full as the primary.
The assertions that are removed here were checking that the primary and secondaries' evidence channels were empty. However, nothing in the test actually ensures that the channels are empty. The test only waits for the secondaries to have received the complete set of evidence, and the secondaries already received half of the evidence at the beginning. It's more than possible that the secondaries can receive the complete set of evidence and not finish reading the duplicate evidence off the channels.
As a safety measure, don't allow a query string to be unreasonably
long. The query filter is not especially efficient, so a query that
needs more than basic detail should filter coarsely in the subscriber
and refine on the client side.
This affects Subscribe and TxSearch queries.
This follows the same model as we did in the p2p package.
Rework the indexer service constructor to take a struct of arguments,
that makes it easier to construct the optional settings.
Deprecate but do not remove the existing constructor.
Clean up node initialization a little bit.
This is part of the work described by #7156.
Remove "unbuffered subscriptions" from the pubsub service.
Replace them with a dedicated blocking "observer" mechanism.
Use the observer mechanism for indexing.
Add a SubscribeWithArgs method and deprecate the old Subscribe
method. Remove SubscribeUnbuffered entirely (breaking).
Rework the Subscription interface to eliminate exposed channels.
Subscriptions now use a context to manage lifecycle notifications.
Internalize the eventbus package.
Updates #7156, and a follow-up to #7070.
Event subscriptions in Tendermint currently use a fixed-length Go
channel as a queue. When the channel fills up, the publisher
immediately terminates the subscription. This prevents slow
subscribers from creating memory pressure on the node by not
servicing their queue fast enough.
Replace the buffered channel used to deliver events to buffered
subscribers with an explicit queue. The queue provides a soft
quota and burst credit mechanism: Clients that usually keep up
can survive occasional bursts, without allowing truly slow
clients to hog resources indefinitely.
The main (and minor) win of this PR is that the transport is fully the
responsibility of the router and the node doesn't need to be responsible for its lifecylce.
I saw one of these tests fail and it looks like it was using code that
wasn't being called anywhere, so I deleted it, and avoided the package
name aliasing.
This pull request adds a new "mesage_type" label to the send/recv bytes metrics calculated in the p2p code.
Below is a snippet of the updated metrics that includes the updated label:
```
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 652
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="4b1068420ef739db63377250553562b9a978708a"} 631
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 631
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 393
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="4b1068420ef739db63377250553562b9a978708a"} 357
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 386
```
This pull request fixes a panic that exists in both mempools. The panic occurs when the ABCI client misses a response from the ABCI application. This happen when the ABCI client drops the request as a result of a full client queue. The fix here was to loop through the ordered list of recheck-tx in the callback until one matches the currently observed recheck request.
This is, perhaps, the trival final piece of #7075 that I've been
working on.
There's more work to be done:
- push more of the setup into the pacakges themselves
- move channel-based sending/filtering out of the
- simplify the buffering throuhgout the p2p stack.
This change removes the partial gRPC interface to the RPC service, which was
deprecated in resolution of #6718.
Details:
- rpc: Remove the client and server interfaces and proto definitions.
- Remove the gRPC settings from the config library.
- Remove gRPC setup for the RPC service in the node startup.
- Fix various test helpers to remove gRPC bits.
- Remove the --rpc.grpc-laddr flag from the CLI.
Note that to satisfy the protobuf interface check, this change also includes a
temporary edit to buf.yaml, that I will revert after this is merged.
This metric describes itself as 'pending' but never actual decrements when the messages are removed from the queue.
This change fixes that by decrementing the metric when the data is removed from the queue.
This PR adds an initial set of metrics for use ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to determine if the method call was performed using ABCI's `*Async` methods.
An example of these metrics is included here for reference:
```
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13
tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001
tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13
```
These metrics can easily be graphed using prometheus's `histogram_quantile(...)` method to pick out a particular quantile to graph or examine. I chose buckets that were somewhat of an estimate of expected range of times for ABCI operations. They start at .0001 seconds and range to 25 seconds. The hope is that this range captures enough possible times to be useful for us and operators.