As a small developer quality of life improvement, I found many individual unit tests that take longer than around a second to complete, and set them to skip when run under `go test -short`.
On my machine, the wall timings for tests (with `go test -count=1 ./...` and optionally `-short` and `-race`) are roughly:
- Long tests, no race detector: about 1m42s
- Short tests, no race detector: about 17s
- Long tests, race detector enabled: about 2m1s
- Short tests, race detector enabled: about 28s
This PR is split into many commits each touching a single package, with commit messages detailing the approximate timing change per package.
shout out to @joeabbey for the inspiration. This makes the lazy
functions internal by default to prevent potential misuse by external
callers.
Should backport cleanly into 0.36 and I'll handle a messy merge into 0.35
This is a manual forward-port of #8944 and related fixes from v0.35.x.
One difference of note is that the CheckTx response messages no longer have a
field to record an error from the ABCI application. The code is set up so that
these could be reported directly to the CheckTx caller, but it would be an API
change, and right now a bunch of test plumbing depends on the existing semantics.
Also fix up tests relying on implementation-specific mempool behavior.
- Commit was setting the expected mempool size incorrectly.
- Fix sequence test not to depend on the incorrect size.
Pull out the library functionality from scripts/confix and move it to
internal/libs/confix. Replace scripts/confix with a simple stub that has the
same command-line API, but uses the library instead.
Related:
- Move and update unit tests.
- Move scripts/confix/condiff to scripts/condiff.
- Update test data for v34, v35, and v36.
- Update reference diffs.
- Update testdata README.
This is (#8446) pulled from the `main/libp2p` branch but without any
of the libp2p content, and is perhaps the easiest first step to enable
pluggability at the peer layer, and makes it possible hoist shims
(including for, say 0.34) into tendermint without touching the reactors.
Port the bug fix terra-money#76 to upstream. This is critical for ethermint json-rpc to work.
fix: prevent duplicate tx index if it succeeded before
fix: use CodeTypeOk instead of 0
fix: handle duplicate txs within the same block
Co-authored-by: jess jesse@soob.co
ref: #5281
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
This test was made flakey by #8839. The cooldown period means that the node in the test will not try to reconnect as quickly as the test expects. This change makes the cooldown shorter in the test so that the node quickly reconnects.
I think we were leaving this library public because the SDK dependend
upon it, but the function the SDK was using was one that we'd removed
because *we* weren't using it any more, and I made a PR agasint the
SDK to clean that up.
ref: https://github.com/cosmos/cosmos-sdk/pull/12368
These timeouts default to 'do not time out' if they are not set. This times up resources, potentially indefinitely. If node on the other side of the the handshake is up but unresponsive, the[ handshake call](edec79448a/internal/p2p/router.go (L720)) will _never_ return.
These are proposed values that have not been validated. I intend to validate them in a production setting.
The dial routines perform network i/o, which is a blocking call into the kernel. These routines are completely unable to do anything else while the dial occurs, so for most of their lifecycle they are sitting idle waiting for the tcp stack to hand them data. We should increase this value by _a lot_ to enable more concurrent dials. This is unlikely to cause CPU starvation because these routines sit idle most of the time. The current value causes dials to occur _way_ too slowly.
Below is a graph demonstrating the before and after of this change in a testnetwork with many dead peers. You can observe that the rate that we connect to new, valid peers, is _much_ higher than previously. Change was deployed around the 31 minute mark on the graph.

Closes#8069
* Type `ABCIResponses` was just wrapping type `ResponseFinalizeBlock`. This patch removes the former.
* Did some renaming to avoid confusion on the data structure we are working with.
* We also remove any stale ABCIResponses we may have in the state store at the time of pruning
**IMPORTANT**: There is an undesirable side-effect of the unwrapping. An empty `ResponseFinalizeBlock` yields a 0-length proto-buf serialized buffer. This was not the case with `ABCIResponses`. I have added an interim solution, but open for suggestions on more elegant ones.
* abci:mempoolError from ResponseCheckTx
* responseCheckTx returns an error if Tendermint decides not to accept an app after CheckTx
*updated spec, upgrading.md and changelog.md
This pull requests adds the protocol buffer field for the `ABCI.VoteExtensionsEnableHeight` parameter. This proto field is threaded throughout all of the relevant places where consensus params are used and referenced.
This PR also adds validation of the consensus param updates. Previous consensus param changes didn't depend on _previous_ versions of the params, so this change adds a method for validating against the old params as well.
closes: #8453
This PR makes vote extensions optional within Tendermint. A new ConsensusParams field, called ABCIParams.VoteExtensionsEnableHeight, has been added to toggle whether or not extensions should be enabled or disabled depending on the current height of the consensus engine. Related to: #8453
Closes: #8575
This PR aims to fix the `LastCommitRound can only be negative for initial height 0` issue we see in the e2e tests by initializing the `state` object before starting the receive routines in the consensus reactor. This is somewhat inelegant, but it should fix the issue.
* Fix lock sequencing in socket client request tracking.
It is not safe to check base service state (IsRunning) while holding the lock
for the client state. If we do, then during shutdown we may deadlock with the
invocation of the OnStop handler, which the base service executes while holding
the service lock.
* Enqueue pending requests before sending them to the server.
If we don't do this, the server can reply before the request lands in the
queue. That will cause the receiver to terminate early for an unsolicited
response. So enqueue first: This is safe because we're doing it in the same
routine as services the channel, so we won't take another message till we are
safely past that point.
* Document what we did.
* Fix socket paths in tests.