Commit Graph

107 Commits

Author SHA1 Message Date
Sam Kleinman
e40a8468a4 config: backport file writing changes (#7182) 2021-10-29 06:38:52 -04:00
M. J. Fromberger
85086d7452 Fix metric cardinality left over from backport (#7180)
One of the patched uses in #7161 missed the message type field,
triggering panic failures from Prometheus.
2021-10-28 15:29:53 -07:00
mergify[bot]
dd1471da91 p2p: add message type into the send/recv bytes metrics (backport #7155) (#7161)
* p2p: add message type into the send/recv bytes metrics (#7155)

This pull request adds a new "message_type" label to the send/recv bytes metrics calculated in the p2p code.

Below is a snippet of the updated metrics that includes the updated label:
```
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 652
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="4b1068420ef739db63377250553562b9a978708a"} 631
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 631
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 393
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="4b1068420ef739db63377250553562b9a978708a"} 357
tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 386
```

(cherry picked from commit b4bc6bb4e8)
2021-10-27 07:34:24 -04:00
mergify[bot]
e62a75b627 state: add height assertion to rollback function (#7143) (#7148)
(cherry picked from commit a8ff617773)

Co-authored-by: Callum Waters <cmwaters19@gmail.com>
2021-10-21 18:07:51 +02:00
mergify[bot]
dbc72e0d69 mempool: remove panic when recheck-tx was not sent to ABCI application (#7134) (#7142)
This pull request fixes a panic that exists in both mempools. The panic occurs when the ABCI client misses a response from the ABCI application. This happens when the ABCI client drops the request as a result of a full client queue. The fix is to loop through the ordered list of recheck-tx in the callback until one matches the currently observed recheck request.

(cherry picked from commit b0130c88fb)

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
2021-10-19 10:21:47 -04:00
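The matching loop described in dbc72e0d69 can be sketched roughly as follows. This is a minimal illustration, not the mempool code: `nextRecheck`, the string transaction IDs, and the `cursor` bookkeeping are hypothetical names chosen for the example.

```go
package main

import "fmt"

// nextRecheck scans the ordered pending list from cursor onward and returns
// the index of the entry matching tx, or -1 if none matches. Entries that are
// skipped over correspond to requests whose responses never arrived.
// (Hypothetical helper; the real mempool tracks rechecks differently.)
func nextRecheck(pending []string, cursor int, tx string) int {
	for i := cursor; i < len(pending); i++ {
		if pending[i] == tx {
			return i
		}
	}
	return -1
}

func main() {
	pending := []string{"tx1", "tx2", "tx3"}
	// Assume the request for tx2 was dropped, so the callback only ever
	// observes responses for tx1 and tx3.
	cursor := 0
	for _, observed := range []string{"tx1", "tx3"} {
		i := nextRecheck(pending, cursor, observed)
		if i < 0 {
			fmt.Println("no match for", observed)
			continue
		}
		fmt.Printf("matched %s, skipped %d\n", observed, i-cursor)
		cursor = i + 1
	}
}
```

Instead of assuming the next pending entry always corresponds to the observed response, the loop advances past entries that were never answered.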
mergify[bot]
b7fe214b81 Revert "abci: change client to use multi-reader mutexes (#6306)" (backport #7106) (#7110)
* Revert "abci: change client to use multi-reader mutexes (#6306)" (#7106)

This reverts commit 1c4dbe30d4.

(cherry picked from commit 34a3fcd8fc)
2021-10-12 12:03:00 -04:00
mergify[bot]
f0cd54825f cli: allow node operator to rollback last state (backport #7033) (#7081) 2021-10-08 09:56:18 +02:00
mergify[bot]
bff85fc07b mempool,rpc: add removetx rpc method (#7047) (#7065)
Addresses one of the concerns with #7041.

Provides a mechanism (via the RPC interface) to delete a single transaction, described by its hash, from the mempool. The method returns an error if the transaction cannot be found. Once the transaction is removed it remains in the cache and cannot be resubmitted until the cache is cleared or it expires from the cache.

(cherry picked from commit 851d2e3bde)

Co-authored-by: Sam Kleinman <garen@tychoish.com>
2021-10-05 16:36:21 -04:00
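The remove-then-resubmit behavior described in bff85fc07b might look like the sketch below. All types and names here (`mempool`, `txKey`, `keyOf`, the error values) are hypothetical stand-ins, and SHA-256 is assumed as the hash only for the sake of the example.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

var (
	errTxNotFound = errors.New("transaction not found in mempool")
	errTxInCache  = errors.New("transaction still in cache")
)

// txKey identifies a transaction by its hash (SHA-256 is an assumption).
type txKey [sha256.Size]byte

func keyOf(tx []byte) txKey { return txKey(sha256.Sum256(tx)) }

// mempool is a toy model: txs holds pending transactions, cache remembers
// every transaction that has been seen.
type mempool struct {
	txs   map[txKey][]byte
	cache map[txKey]struct{}
}

func newMempool() *mempool {
	return &mempool{txs: map[txKey][]byte{}, cache: map[txKey]struct{}{}}
}

// add rejects any transaction whose key is already cached.
func (m *mempool) add(tx []byte) error {
	k := keyOf(tx)
	if _, seen := m.cache[k]; seen {
		return errTxInCache
	}
	m.cache[k] = struct{}{}
	m.txs[k] = tx
	return nil
}

// remove deletes a pending transaction by key and reports an error if it is
// absent. The cache entry is deliberately left in place.
func (m *mempool) remove(k txKey) error {
	if _, ok := m.txs[k]; !ok {
		return errTxNotFound
	}
	delete(m.txs, k)
	return nil
}

func main() {
	m := newMempool()
	tx := []byte("k=v")
	fmt.Println(m.add(tx))           // <nil>
	fmt.Println(m.remove(keyOf(tx))) // <nil>
	fmt.Println(m.remove(keyOf(tx))) // transaction not found in mempool
	fmt.Println(m.add(tx))           // transaction still in cache
}
```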
William Banfield
42ed5d75a5 consensus: wait until peerUpdates channel is closed to close remaining peers (#7058) (#7060)
The race occurred as a result of a goroutine launched by `processPeerUpdate` racing with the `OnStop` method. The `processPeerUpdates` goroutine deletes from the map as `OnStop` is reading from it. This change updates the `OnStop` method to wait for the peer updates channel to be done before closing the peers. It also copies the map contents to a new map so that it will not conflict with the view of the map that the goroutine created in `processPeerUpdate` sees.
2021-10-05 10:49:26 -04:00
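A rough sketch of the ordering described in 42ed5d75a5: wait for the update-processing goroutine to finish, then copy the peer map before touching the peers. The `reactor` type, its fields, and the string peer IDs are hypothetical and much simpler than the consensus reactor.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// reactor is a toy stand-in for a reactor that tracks peers.
type reactor struct {
	mtx         sync.Mutex
	peers       map[string]struct{}
	updates     chan string   // IDs of peers that went down
	updatesDone chan struct{} // closed when processUpdates returns
}

func newReactor(ids ...string) *reactor {
	r := &reactor{
		peers:       make(map[string]struct{}),
		updates:     make(chan string),
		updatesDone: make(chan struct{}),
	}
	for _, id := range ids {
		r.peers[id] = struct{}{}
	}
	return r
}

// processUpdates deletes peers from the map until updates is closed.
func (r *reactor) processUpdates() {
	defer close(r.updatesDone)
	for id := range r.updates {
		r.mtx.Lock()
		delete(r.peers, id)
		r.mtx.Unlock()
	}
}

// stop waits for processUpdates to finish, copies the map, and returns the
// sorted IDs of the peers it would close.
func (r *reactor) stop() []string {
	close(r.updates)
	<-r.updatesDone // no goroutine mutates r.peers after this point

	r.mtx.Lock()
	snapshot := make(map[string]struct{}, len(r.peers))
	for id := range r.peers {
		snapshot[id] = struct{}{}
	}
	r.mtx.Unlock()

	ids := make([]string, 0, len(snapshot))
	for id := range snapshot {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	r := newReactor("a", "b", "c")
	go r.processUpdates()
	r.updates <- "b"
	fmt.Println(r.stop()) // [a c]
}
```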
William Banfield
243c62cc68 statesync: improve rare p2p race condition (#7042)
This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out waiting for the consensus params response. 

This can occur because the statesync reactor may attempt to respond to the params request before the state provider is ready to read it. The reactor then hits the `default` case seen here and never sends on the channel. The state provider then blocks waiting for a response and never receives one, because the reactor opted not to send it.
2021-10-01 20:33:12 +00:00
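The failure mode described in 243c62cc68 comes from a non-blocking send: when no receiver is ready, the `default` branch runs and the value is dropped. The sketch below only demonstrates that behavior; `trySend` is a hypothetical helper, and the buffered channel is shown as one possible way to avoid the drop, not necessarily what the commit does.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the value was
// delivered. If no receiver (or buffer slot) is ready, the value is dropped.
func trySend(ch chan<- string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	// Unbuffered channel with no receiver waiting: the response is dropped,
	// and a reader that starts later would block forever.
	unbuffered := make(chan string)
	fmt.Println(trySend(unbuffered, "params")) // false

	// With one buffer slot the send succeeds even though nobody is reading
	// yet, and the reader picks the value up afterwards.
	buffered := make(chan string, 1)
	fmt.Println(trySend(buffered, "params")) // true
	fmt.Println(<-buffered)                  // params
}
```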
William Banfield
177850a2c9 statesync: remove deadlock on init fail (#7029)
When statesync is stopped during shutdown, it has the possibility of deadlocking. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. As this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in hopes of eradicating the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag.

```
goroutine 36 [chan receive]:
github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240)
        github.com/tendermint/tendermint/node/node.go:769 +0x132
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6

goroutine 188 [semacquire]:
sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1)
        runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00026b1c8)
        sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
        sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00026b1c8)
        sync/rwmutex.go:111 +0x90
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab
created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd
```
2021-09-30 19:19:10 +00:00
M. J. Fromberger
bdd815ebc9 Align atomic struct field for compatibility in 32-bit ABIs. (#7037)
The layout of struct fields means that interior fields may not be properly
aligned for 64-bit access.

Fixes #7000.
2021-09-30 10:53:05 -07:00
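The alignment concern in bdd815ebc9 can be illustrated with the sketch below. The `sync/atomic` documentation states that on 32-bit platforms only the first word of an allocated struct can be relied upon to be 64-bit aligned, so one common remedy is to put the atomically accessed 64-bit field first. The struct names are hypothetical, and whether the commit used this remedy or another is not stated in the message.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// interior places the 64-bit counter after a 32-bit field. On a 32-bit ABI
// the counter can land at offset 4, which is not 64-bit aligned.
type interior struct {
	ready uint32
	count int64
}

// leading places the 64-bit counter first, where alignment is guaranteed
// for an allocated struct.
type leading struct {
	count int64
	ready uint32
}

// inc atomically increments the counter and returns the new value.
func (l *leading) inc() int64 { return atomic.AddInt64(&l.count, 1) }

// leadingOffset reports the offset of count within leading.
func leadingOffset() uintptr {
	var l leading
	return unsafe.Offsetof(l.count)
}

func main() {
	var i interior
	var l leading
	// The interior offset depends on the target architecture.
	fmt.Println("interior offset:", unsafe.Offsetof(i.count))
	fmt.Println("leading offset:", leadingOffset())
	fmt.Println("count:", l.inc())
}
```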
William Banfield
6a0d9c832a blocksync: fix shutdown deadlock issue (#7030)
When shutting down blocksync, it is observed that the process can hang completely. A dump of running goroutines reveals that this is due to goroutines not listening on the correct shutdown signal. Namely, the `poolRoutine` goroutine does not wait on `pool.Quit`. The `poolRoutine` does not receive any other shutdown signal during `OnStop` because it must stop before the `r.closeCh` is closed. Currently the `poolRoutine` listens on the `closeCh`, which will not close until the `poolRoutine` stops and calls `poolWG.Done()`.

This change also puts the `requestRoutine()` in the `OnStart` method to make it more visible since it does not rely on anything that is spawned in the `poolRoutine`.

```
goroutine 183 [semacquire]:
sync.runtime_Semacquire(0xc0000d3bd8)
        runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc0000d3bd0)
        sync/waitgroup.go:130 +0x65
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStop(0xc0000d3a00)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:193 +0x47
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0000d3a00, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc00052c000)
        github.com/tendermint/tendermint/node/node.go:758 +0xc62
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00052c000, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000df6d20, 0x7f04a68da900, 0xc0004a8930, 0xc0005a72d8)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6


goroutine 161 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).poolRoutine(0xc0000d3a00, 0x0)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:464 +0x2b3
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:174 +0xf1

goroutine 162 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processBlockSyncCh(0xc0000d3a00)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:310 +0x151
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:177 +0x54

goroutine 163 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processPeerUpdates(0xc0000d3a00)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:363 +0x12b
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:178 +0x76
```
2021-09-30 16:19:18 +00:00
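The circular wait described in 6a0d9c832a can be sketched as below: the worker must listen on a signal that is raised before the stop routine waits on the wait group. The `reactor` type and its field names are hypothetical and only loosely mirror `pool.Quit`, `closeCh`, and `poolWG` from the message.

```go
package main

import (
	"fmt"
	"sync"
)

// reactor is a toy model of a service with one worker goroutine.
type reactor struct {
	poolQuit chan struct{} // closed first; tells the worker to stop
	closeCh  chan struct{} // closed last, after the worker has stopped
	wg       sync.WaitGroup
}

func newReactor() *reactor {
	return &reactor{
		poolQuit: make(chan struct{}),
		closeCh:  make(chan struct{}),
	}
}

func (r *reactor) start() {
	r.wg.Add(1)
	go r.worker()
}

// worker exits when poolQuit is closed. If it listened only on closeCh it
// would never exit, because stop closes closeCh only after wg.Wait returns.
func (r *reactor) worker() {
	defer r.wg.Done()
	select {
	case <-r.poolQuit:
	case <-r.closeCh:
	}
}

// stop signals the worker, waits for it, and then closes closeCh.
func (r *reactor) stop() {
	close(r.poolQuit)
	r.wg.Wait()
	close(r.closeCh)
}

// isClosed reports whether ch has been closed, without blocking.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	r := newReactor()
	r.start()
	r.stop()
	fmt.Println("stopped:", isClosed(r.closeCh)) // stopped: true
}
```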
Sam Kleinman
23fe6fd2f9 statesync: ensure test network properly configured (#7026)
This test reliably gets hung up on network configuration (which may
be a real issue), but its network setup is handcranked and we should
ensure that the test focuses on its core assertions and doesn't fail for
test architecture reasons.
2021-09-29 16:38:27 +00:00
Sam Kleinman
8758078786 consensus: avoid unbuffered channel in state test (#7025) 2021-09-29 11:19:34 -04:00
lklimek
1bd1593f20 fix: race condition in p2p_switch and pex_reactor (#7015)
Closes https://github.com/tendermint/tendermint/issues/7014
2021-09-28 09:32:14 -04:00
Sam Kleinman
9a16d930c6 statesync: add logging while waiting for peers (#7007) 2021-09-27 16:46:40 -04:00
Callum Waters
60a6c6fb1a e2e: allow running of single node using the e2e app (#6982) 2021-09-27 15:43:07 +02:00
Sam Kleinman
71c6682b57 statesync: clean up reactor/syncer lifecycle (#6995)
I've been noticing that there are a number of situations where the
statesync reactor blocks waiting for peers (or similar). I've moved
things around to improve outcomes in local tests.
2021-09-24 21:40:12 +00:00
Sam Kleinman
b203c91799 rpc: implement BroadcastTxCommit without event subscriptions (#6984) 2021-09-24 13:01:35 -04:00
Sam Kleinman
bb8ffcb95b store: move package to internal (#6978) 2021-09-23 11:46:42 -04:00
M. J. Fromberger
cf7537ea5f cleanup: Reduce and normalize import path aliasing. (#6975)
The code in the Tendermint repository makes heavy use of import aliasing.
This is made necessary by our extensive reuse of common base package names, and
by repetition of similar names across different subdirectories.

Unfortunately we have not been very consistent about which packages we alias in
various circumstances, and the aliases we use vary. In the spirit of the advice
in the style guide and https://github.com/golang/go/wiki/CodeReviewComments#imports,
this change makes an effort to clean up and normalize import aliasing.

This change makes no API or behavioral changes. It is a pure cleanup intended
to help make the code more readable to developers (including myself) trying to
understand what is being imported where.

Only unexported names have been modified, and the changes were generated and
applied mechanically with gofmt -r and comby, respecting the lexical and
syntactic rules of Go.  Even so, I did not fix every inconsistency. Where the
changes would be too disruptive, I left it alone.

The principles I followed in this cleanup are:

- Remove aliases that restate the package name.
- Remove aliases where the base package name is unambiguous.
- Move overly-terse abbreviations from the import to the usage site.
- Fix lexical issues (remove underscores, remove capitalization).
- Fix import groupings to more closely match the style guide.
- Group blank (side-effecting) imports and ensure they are commented.
- Add aliases to multiple imports with the same base package name.
2021-09-23 07:52:07 -07:00
M. J. Fromberger
41ac5b90c5 Fix script paths in go:generate directives. (#6973)
We moved some files further down in the directory structure in #6964, which
caused the relative paths to the mockery wrapper to stop working.

There does not seem to be an obvious way to get the module root as a default
environment variable, so for now I just added the extra up-slashes.
2021-09-23 01:02:24 +00:00
Sam Kleinman
1c4950dbd2 state: move package to internal (#6964) 2021-09-22 13:04:25 -04:00
Sam Kleinman
07d10184a1 inspect: remove duplicated construction path (#6966) 2021-09-22 12:32:49 -04:00
JayT106
84ffaaaf37 statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795) 2021-09-21 09:22:16 +02:00
Sam Kleinman
9dfdc62eb7 proxy: move proxy package to internal (#6953) 2021-09-20 15:18:48 -04:00
William Banfield
382947ce93 rfc: add performance taxonomy rfc (#6921)
This document attempts to capture and discuss some of the areas of Tendermint that seem to be cited as causing performance issues. I'm hoping to continue to gather feedback and input on this document to better understand what issues Tendermint performance may cause for our users.

The overall goal of this document is to allow the maintainers and community to get a better sense of these issues and to be better able to discuss them and weigh the trade-offs of any proposed performance-focused changes. This document does not aim to propose any performance improvements. It does suggest useful places for benchmarks and places where additional metrics would be useful for diagnosing and further understanding Tendermint performance.

Please comment with areas where my reasoning seems off or with additional areas that Tendermint performance may be causing user pain.
2021-09-16 06:13:27 +00:00
William Banfield
63aeb50665 upgrading: add information into the UPGRADING.md for users of the codebase wishing to upgrade (#6898)
* add information on upgrading to the new p2p library

* clarify p2p backwards compatibility

* reorder p2p queue list

* add demo for p2p selection

* fix spacing in upgrading
2021-09-08 09:41:12 -04:00
Callum Waters
8fe651ba30 e2e: clean up generation of evidence (#6904) 2021-09-07 12:20:43 +02:00
Callum Waters
bda948e814 statesync: implement p2p state provider (#6807) 2021-09-02 13:19:18 +02:00
Sam Kleinman
c4df8a3840 types: move mempool error for consistency (#6875)
This is a little change just to make things more consistent ahead of
the 0.35 release.
2021-08-30 17:42:58 +00:00
Aleksandr Bezobchuk
58a6cfff9a internal/consensus: update error log (#6863)
Issues reported in Osmosis, where the message is extremely long. Also, there is absolutely no reason to log the message IMO. If we must, we can make the message log DEBUG.
2021-08-25 22:43:21 +00:00
Hlib Kanunnikov
d0e33b4292 blocksync: complete transition from Blockchain to BlockSync (#6847) 2021-08-23 16:45:08 +02:00
William Banfield
e801328128 clist: add simple property tests (#6791)
Adds a simple property test to the `clist` package. This test uses the [rapid](https://github.com/flyingmutant/rapid) library and works by modeling the internal clist as a simple array.

Follow-up from this morning's workshop with the Regen team.
2021-08-03 19:30:05 +00:00
William Banfield
dc7c212c41 mempool/v1: test reactor does not panic on broadcast (#6772)
This change adds a failing test for issue #6660. It achieves this by adding a transaction, starting the `broadcastTxRoutine` in a goroutine and then adding another transaction to the mempool. The `broadcastTxRoutine` can receive the second inserted transaction before `insertTx` returns. In that case, `broadcastTxRoutine` will dereference a nil pointer when referencing the `gossipEl` and panic.
2021-08-02 13:02:43 +00:00
William Banfield
4e96c6b234 tools: add mockery to tools.go and remove mockery version strings (#6787)
This change aims to keep versions of mockery consistent across developer laptops.

This change adds mockery to the `tools.go` file so that its version can be managed consistently in the `go.mod` file.

Additionally, this change temporarily disables adding mockery's version number to generated files. There is an outstanding issue against the mockery project related to the version string behavior when running from `go get`. I have created a pull request to fix this issue in the mockery project.
see: https://github.com/vektra/mockery/issues/397
2021-07-30 20:47:15 +00:00
Callum Waters
02f8e4c0bd blockstore: fix problem with seen commit (#6782) 2021-07-30 17:37:04 +02:00
M. J. Fromberger
6dd8984fef Fix and clarify breaks from select cases. (#6781)
Update those break statements inside case clauses that are intended to reach an
enclosing for loop, so that they correctly exit the loop.

The candidate files for this change were located using:

    % staticcheck -checks SA4011 ./... | cut -d: -f-2

This change is intended to preserve the intended semantics of the code, but
since the code as-written did not have its intended effect, some behaviour may
change. Specifically: Some loops may have run longer than they were supposed
to, prior to this change.

In one case I was not able to clearly determine the intended outcome. That case
has been commented but otherwise left as-written.

Fixes #6780.
2021-07-29 22:28:32 -04:00
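The pattern fixed in 6dd8984fef is a bare `break` inside a `select` case, which leaves only the `select` and not the enclosing `for` loop. A labeled `break` is one standard way to exit the loop. The sketch below is a generic illustration; `drainUntilDone` is a hypothetical function, not code from the repository.

```go
package main

import "fmt"

// drainUntilDone reads values from in until done is closed and returns how
// many values it saw. The labeled break exits the for loop; a bare break in
// the done case would only leave the select, and the loop would keep running.
func drainUntilDone(in <-chan int, done <-chan struct{}) int {
	count := 0
loop:
	for {
		select {
		case <-in:
			count++
		case <-done:
			break loop
		}
	}
	return count
}

func main() {
	in := make(chan int)
	done := make(chan struct{})
	go func() {
		for i := 0; i < 3; i++ {
			in <- i // unbuffered: each send completes before the next
		}
		close(done)
	}()
	fmt.Println(drainUntilDone(in, done)) // 3
}
```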
JayT106
9a2a7d4307 state/privval: vote timestamp fix (#6748) 2021-07-29 12:52:53 +02:00
Callum Waters
6ff4c3139c blockchain: rename to blocksync service (#6755) 2021-07-28 17:25:42 +02:00
JayT106
e87b0391cb cli/indexer: Reindex events (#6676) 2021-07-28 00:04:54 +02:00
Aleksandr Bezobchuk
4f73748bc8 mempool v1: tweak broadcastTxRoutine (#6771) 2021-07-27 15:34:06 -04:00
William Banfield
a751eee7f2 p2p: add test for pqueue dequeue full error (#6760)
This adds a test for closing the `pqueue` while the `pqueue` contains data that has not yet been dequeued.

This issue was found while debugging #6705 

This test will fail until @cmwaters's fix for this condition is merged.
2021-07-26 19:22:32 +00:00
Callum Waters
a341a626e0 p2p: avoid blocking on the dequeCh (#6765) 2021-07-26 09:09:07 -04:00
William Banfield
c3ae6f5b58 p2p: add coverage for mConnConnection.TrySendMessage (#6754)
This change adds additional coverage to the `mConnConnection.TrySendMessage` code path. Adds test to ensure it returns `io.EOF` when closed.

Addresses: #6570
2021-07-23 17:29:19 +00:00
Aleksandr Bezobchuk
a393cf8bab internal: update blockchain reactor godoc (#6749) 2021-07-23 08:15:57 -04:00
Callum Waters
0e2752ae42 light: improve error handling and allow providers to be added (#6733) 2021-07-22 18:12:34 +02:00
William Banfield
84c15857e4 mempool: return mempool errors to the abci client (#6740)
This change adds a `MempoolError` field to the `ResponseCheckTx`. This will allow clients to understand that their transaction was rejected from the mempool despite passing the ABCI check.

This change also updates the code to make use of early returns to prevent highly nested code blocks. Namely, it returns when the type assertion fails at the beginning of the method, instead of wrapping the entire method in a large if statement. This has a somewhat large effect on the diff as rendered by github.

addresses: #3546
2021-07-22 14:52:29 +00:00
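The early-return restructuring mentioned in 84c15857e4 follows a common Go idiom: check the type assertion first and return on failure, so the rest of the function is not nested in a large `if` block. The sketch below uses a hypothetical `checkTxResponse` type and `summarize` function; it does not reproduce the real `ResponseCheckTx` or the mempool callback.

```go
package main

import "fmt"

// checkTxResponse is a hypothetical stand-in for an ABCI CheckTx response
// carrying an application code and a mempool-level error string.
type checkTxResponse struct {
	code         uint32
	mempoolError string
}

// summarize returns early when the type assertion fails, which keeps the
// main logic at a single indentation level.
func summarize(res interface{}) string {
	r, ok := res.(*checkTxResponse)
	if !ok {
		return "ignored: unexpected response type"
	}
	if r.code != 0 {
		return "rejected by application"
	}
	if r.mempoolError != "" {
		// The application accepted the transaction, but the mempool did not.
		return "rejected by mempool: " + r.mempoolError
	}
	return "accepted"
}

func main() {
	fmt.Println(summarize("not a response"))
	fmt.Println(summarize(&checkTxResponse{code: 1}))
	fmt.Println(summarize(&checkTxResponse{mempoolError: "mempool is full"}))
	fmt.Println(summarize(&checkTxResponse{}))
}
```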
JayT106
e70445f942 statesync/event: emit statesync start/end event (#6700) 2021-07-22 08:16:50 +02:00