Commit Graph

47 Commits

Author SHA1 Message Date
M. J. Fromberger
c8e8a62084 abci/client: simplify client interface (#7607)
This change has two main effects:

1. Remove most of the Async methods from the abci.Client interface.
   Remaining are FlushAsync, CommitTxAsync, and DeliverTxAsync.

2. Rename the synchronous methods to remove the "Sync" suffix.

The rest of the change is updating the implementations, subsets, and mocks of
the interface, along with the call sites that point to them.

* Fix stringly-typed mock stubs.
* Rename helper method.
2022-01-19 10:58:56 -08:00
Sam Kleinman
7ed57ef5f9 statesync: more orderly dispatcher shutdown (#7601) 2022-01-14 16:34:12 -05:00
Sam Kleinman
e07c4cdcf2 node: collapse initialization internals (#7567) 2022-01-12 15:32:22 -05:00
Sam Kleinman
5bf1bdcfb4 reactors: skip log on some routine cancels (#7556) 2022-01-11 12:56:52 -05:00
Sam Kleinman
fc36c7782f statesync: reactor and channel construction (#7529) 2022-01-07 12:04:17 -05:00
Sam Kleinman
bef120dadf contexts: remove all TODO instances (#7466) 2021-12-16 15:15:26 -05:00
Sam Kleinman
d0e03f01fc sync: remove special mutexes (#7438) 2021-12-13 13:35:32 -05:00
Sam Kleinman
65c0aaee5e p2p: use recieve for channel iteration (#7425) 2021-12-13 11:04:44 -05:00
Sam Kleinman
bd6dc3ca88 p2p: refactor channel Send/out (#7414) 2021-12-09 14:03:41 -05:00
Sam Kleinman
cb88bd3941 p2p: migrate to use new interface for channel errors (#7403)
* p2p: migrate to use new interface for channel errors

* Update internal/p2p/p2ptest/require.go

Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>

* rename

* feedback

Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2021-12-08 14:05:01 +00:00
Sam Kleinman
892f5d9524 service: cleanup mempool and peer update shutdown (#7401) 2021-12-08 08:44:32 -05:00
Sam Kleinman
0ff3d4b89d service: cleanup close channel in reactors (#7399) 2021-12-07 11:40:59 -05:00
Sam Kleinman
a62ac27047 service: remove exported logger from base implemenation (#7381) 2021-12-06 10:16:42 -05:00
Sam Kleinman
8a991e288c service: plumb contexts to all (most) threads (#7363)
This continues the push of plumbing contexts through tendermint. I
attempted to find all goroutines in the production code (non-test) and
made sure that these threads would exit when their contexts were
canceled, and I believe this PR does that.
2021-12-02 21:38:38 +00:00
Sam Kleinman
6ab62fe7b6 service: remove stop method and use contexts (#7292) 2021-11-18 17:56:21 -05:00
Sam Kleinman
ca8f004112 p2p: remove final shims from p2p package (#7136)
This is, perhaps, the trival final piece of #7075 that I've been
working on.

There's more work to be done: 
- push more of the setup into the pacakges themselves
- move channel-based sending/filtering out of the 
- simplify the buffering throuhgout the p2p stack.
2021-10-15 20:08:09 +00:00
Sam Kleinman
cbe6ad6cd5 p2p: flatten channel descriptor (#7132) 2021-10-15 13:03:10 -04:00
Sam Kleinman
0900ea8396 p2p: channel shim cleanup (#7129) 2021-10-15 12:31:33 -04:00
Sam Kleinman
f4a56f4034 p2p: refactor channel description (#7130)
This is another small sliver of #7075, with the intention of removing
the legacy shim layer related to channel registration.
2021-10-15 15:45:12 +00:00
Sam Kleinman
ded310093e lint: fix collection of stale errors (#7090)
Few things that had been annoying.
2021-10-09 15:33:54 +00:00
Sam Kleinman
1b5bb5348f p2p: cleanup unused arguments (#7079)
This is mostly just reading through the output of uparam, after
noticing that there were a few places where we were ignoring some arguments.
2021-10-08 12:49:17 +00:00
William Banfield
243c62cc68 statesync: improve rare p2p race condition (#7042)
This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out waiting for the consensus params response. 

The reason that this can occur is because the statesync reactor has the possibility of attempting to respond to the params request before the state provider is ready to read it. This results in the reactor hitting the `default` case seen here and then never sending on the channel. The stateprovider will then block waiting for a response and never receive one because the reactor opted not to send it.
2021-10-01 20:33:12 +00:00
William Banfield
177850a2c9 statesync: remove deadlock on init fail (#7029)
When statesync is stopped during shutdown, it has the possibility of deadlocking. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. As this is occuring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in hopes to eradicate the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag.

```
goroutine 36 [chan receive]:
github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240)
        github.com/tendermint/tendermint/node/node.go:769 +0x132
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6

goroutine 188 [semacquire]:
sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1)
        runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00026b1c8)
        sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
        sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00026b1c8)
        sync/rwmutex.go:111 +0x90
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab
created by github.com/tendermint/tendermint/internal/statesync.(*Reactor.OnStart
        github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd)
```
2021-09-30 19:19:10 +00:00
Sam Kleinman
23fe6fd2f9 statesync: ensure test network properly configured (#7026)
This test reliably gets hung up on network configuration, (which may
be a real issue,) but it's network setup is handcranked and we should
ensure that the test focuses on it's core assertions and doesn't fail for 
test architecture reasons.
2021-09-29 16:38:27 +00:00
Sam Kleinman
9a16d930c6 statesync: add logging while waiting for peers (#7007) 2021-09-27 16:46:40 -04:00
Sam Kleinman
71c6682b57 statesync: clean up reactor/syncer lifecylce (#6995)
I've been noticing that there are a number of situations where the
statesync reactor blocks waiting for peers (or similar,) I've moved
things around to improve outcomes in local tests.
2021-09-24 21:40:12 +00:00
Sam Kleinman
bb8ffcb95b store: move pacakge to internal (#6978) 2021-09-23 11:46:42 -04:00
Sam Kleinman
1c4950dbd2 state: move package to internal (#6964) 2021-09-22 13:04:25 -04:00
JayT106
84ffaaaf37 statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795) 2021-09-21 09:22:16 +02:00
Sam Kleinman
9dfdc62eb7 proxy: move proxy package to internal (#6953) 2021-09-20 15:18:48 -04:00
Callum Waters
bda948e814 statesync: implement p2p state provider (#6807) 2021-09-02 13:19:18 +02:00
JayT106
e70445f942 statesync/event: emit statesync start/end event (#6700) 2021-07-22 08:16:50 +02:00
Callum Waters
6dd0cf92c8 router/statesync: add helpful log messages (#6724) 2021-07-15 19:26:35 +02:00
Sam Kleinman
ab5c63eff3 statesync: increase dispatcher timeout (#6714) 2021-07-13 13:04:18 -04:00
Callum Waters
a12e2bbb60 statesync: use initial height as a floor to backfilling (#6709) 2021-07-13 16:36:16 +02:00
William Banfield
cabd916517 Revert "statesync: keep peer despite lightblock query fail (#6692)" (#6696)
* Revert "statesync: keep peer despite lightblock query fail (#6692)"

This reverts commit 50b00dff71.
2021-07-12 15:20:02 -04:00
William Banfield
50b00dff71 statesync: keep peer despite lightblock query fail (#6692)
When a peer responds with no lightblock for the height we queried, we call the [removePeer method](https://github.com/tendermint/tendermint/blob/master/internal/statesync/reactor.go#L339). This removes the peer from the [dispatcher's list of called peer's](ad65883152/internal/statesync/dispatcher.go (L159)). When the dispatcher then receives responses from the removed peer, it [drops their responses](ad65883152/internal/statesync/dispatcher.go (L130)). These responses may be meaningful or contain a block or data that will help statesync proceed.

[The logs](https://gist.github.com/tychoish/34a1f61eaae3c36c23efc7d0001e805c), when this change is applied, show an additional 3 networking testnets passing. 

addresses:  #6691
2021-07-09 21:20:25 +00:00
Callum Waters
051e127d38 light: correctly handle contexts (#6687) 2021-07-09 18:48:18 +02:00
Aleksandr Bezobchuk
1dec3e139a add stacktrace to panic logs (#6662) 2021-07-06 14:26:18 -04:00
Callum Waters
a1e1e6c290 test: fix non-deterministic backfill test (#6648) 2021-07-05 16:42:36 +02:00
Sam Kleinman
917180dfd2 p2p: reduce buffering on channels (#6609)
Having smaller buffers in each reactor/channel will mean that there will be fewer stale messages.
2021-06-24 20:38:35 +00:00
Callum Waters
6e238b5b9d statesync: make fetching chunks more robust (#6587) 2021-06-21 10:14:15 -04:00
Callum Waters
25bb556fee p2p: increase queue size to 16MB (#6588) 2021-06-16 17:27:41 +02:00
Aleksandr Bezobchuk
7d961b55b2 state sync: tune request timeout and chunkers (#6566) 2021-06-15 14:33:26 -04:00
Callum Waters
74af343f28 statesync: tune backfill process (#6565)
This PR make some tweaks to backfill after running e2e tests:
- Separates sync and backfill as two distinct processes that the node calls. The reason is because if sync fails then the node should fail but if backfill fails it is still possible to proceed.
- Removes peers who don't have the block at a height from the local peer list. As the process goes backwards if a node doesn't have a block at a height they're likely pruning blocks and thus they won't have any prior ones either. 
- Sleep when we've run out of peers, then try again.
2021-06-11 15:26:18 +00:00
Callum Waters
6f6ac5c04e state sync: reverse sync implementation (#6463) 2021-06-08 19:23:52 +02:00
Sam Kleinman
a855f96946 p2p: renames for reactors and routing layer internal moves (#6547) 2021-06-08 08:17:09 -04:00