Commit Graph

56 Commits

Author SHA1 Message Date
Sam Kleinman
cf2a00b398 p2p: avoid using p2p.Channel internals (#8444) 2022-04-29 17:21:36 -04:00
Sam Kleinman
2e5d53ea9a abci: interface should take pointers to arguments (#8404) 2022-04-24 18:37:07 +00:00
Sam Kleinman
c372390fea eventbus: publish without contexts (#8369) 2022-04-18 16:28:31 -04:00
Sam Kleinman
90b951af72 node+statesync: normalize initialization (#8275) 2022-04-08 11:00:58 -04:00
Sam Kleinman
9d20e06900 statesync+blocksync: move event publications into the sync operations (#8274) 2022-04-07 16:23:36 -04:00
Sam Kleinman
9d1e8eaad4 node: remove channel and peer update initialization from construction (#8238) 2022-04-05 13:26:53 +00:00
Sam Kleinman
97f7021712 statesync: merge channel processing (#8240) 2022-04-04 12:31:15 -04:00
Sam Kleinman
8df7b6103f proxy: collapse triforcated abci.Client (#8067) 2022-03-06 18:25:50 -05:00
Sam Kleinman
58dc172611 p2p: plumb rudamentary service discovery to rectors and update statesync (#8030)
This is a little coarse, but the idea is that we'll send information
about the channels a peer has upon the peer-up event that we send to
reactors that we can then use to reject peers (if neeeded) from reactors.

This solves the problem where statesync would hang in test networks
(and presumably real) where we would attempt to statesync from seed
nodes, thereby hanging silently forever.
2022-02-28 20:02:54 +00:00
M. J. Fromberger
c8e8a62084 abci/client: simplify client interface (#7607)
This change has two main effects:

1. Remove most of the Async methods from the abci.Client interface.
   Remaining are FlushAsync, CommitTxAsync, and DeliverTxAsync.

2. Rename the synchronous methods to remove the "Sync" suffix.

The rest of the change is updating the implementations, subsets, and mocks of
the interface, along with the call sites that point to them.

* Fix stringly-typed mock stubs.
* Rename helper method.
2022-01-19 10:58:56 -08:00
Sam Kleinman
7ed57ef5f9 statesync: more orderly dispatcher shutdown (#7601) 2022-01-14 16:34:12 -05:00
Sam Kleinman
e07c4cdcf2 node: collapse initialization internals (#7567) 2022-01-12 15:32:22 -05:00
Sam Kleinman
5bf1bdcfb4 reactors: skip log on some routine cancels (#7556) 2022-01-11 12:56:52 -05:00
Sam Kleinman
fc36c7782f statesync: reactor and channel construction (#7529) 2022-01-07 12:04:17 -05:00
Sam Kleinman
bef120dadf contexts: remove all TODO instances (#7466) 2021-12-16 15:15:26 -05:00
Sam Kleinman
d0e03f01fc sync: remove special mutexes (#7438) 2021-12-13 13:35:32 -05:00
Sam Kleinman
65c0aaee5e p2p: use recieve for channel iteration (#7425) 2021-12-13 11:04:44 -05:00
Sam Kleinman
bd6dc3ca88 p2p: refactor channel Send/out (#7414) 2021-12-09 14:03:41 -05:00
Sam Kleinman
cb88bd3941 p2p: migrate to use new interface for channel errors (#7403)
* p2p: migrate to use new interface for channel errors

* Update internal/p2p/p2ptest/require.go

Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>

* rename

* feedback

Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2021-12-08 14:05:01 +00:00
Sam Kleinman
892f5d9524 service: cleanup mempool and peer update shutdown (#7401) 2021-12-08 08:44:32 -05:00
Sam Kleinman
0ff3d4b89d service: cleanup close channel in reactors (#7399) 2021-12-07 11:40:59 -05:00
Sam Kleinman
a62ac27047 service: remove exported logger from base implemenation (#7381) 2021-12-06 10:16:42 -05:00
Sam Kleinman
8a991e288c service: plumb contexts to all (most) threads (#7363)
This continues the push of plumbing contexts through tendermint. I
attempted to find all goroutines in the production code (non-test) and
made sure that these threads would exit when their contexts were
canceled, and I believe this PR does that.
2021-12-02 21:38:38 +00:00
Sam Kleinman
6ab62fe7b6 service: remove stop method and use contexts (#7292) 2021-11-18 17:56:21 -05:00
Sam Kleinman
ca8f004112 p2p: remove final shims from p2p package (#7136)
This is, perhaps, the trival final piece of #7075 that I've been
working on.

There's more work to be done: 
- push more of the setup into the pacakges themselves
- move channel-based sending/filtering out of the 
- simplify the buffering throuhgout the p2p stack.
2021-10-15 20:08:09 +00:00
Sam Kleinman
cbe6ad6cd5 p2p: flatten channel descriptor (#7132) 2021-10-15 13:03:10 -04:00
Sam Kleinman
0900ea8396 p2p: channel shim cleanup (#7129) 2021-10-15 12:31:33 -04:00
Sam Kleinman
f4a56f4034 p2p: refactor channel description (#7130)
This is another small sliver of #7075, with the intention of removing
the legacy shim layer related to channel registration.
2021-10-15 15:45:12 +00:00
Sam Kleinman
ded310093e lint: fix collection of stale errors (#7090)
Few things that had been annoying.
2021-10-09 15:33:54 +00:00
Sam Kleinman
1b5bb5348f p2p: cleanup unused arguments (#7079)
This is mostly just reading through the output of uparam, after
noticing that there were a few places where we were ignoring some arguments.
2021-10-08 12:49:17 +00:00
William Banfield
243c62cc68 statesync: improve rare p2p race condition (#7042)
This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out waiting for the consensus params response. 

The reason that this can occur is because the statesync reactor has the possibility of attempting to respond to the params request before the state provider is ready to read it. This results in the reactor hitting the `default` case seen here and then never sending on the channel. The stateprovider will then block waiting for a response and never receive one because the reactor opted not to send it.
2021-10-01 20:33:12 +00:00
William Banfield
177850a2c9 statesync: remove deadlock on init fail (#7029)
When statesync is stopped during shutdown, it has the possibility of deadlocking. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. As this is occuring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in hopes to eradicate the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag.

```
goroutine 36 [chan receive]:
github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240)
        github.com/tendermint/tendermint/node/node.go:769 +0x132
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6

goroutine 188 [semacquire]:
sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1)
        runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00026b1c8)
        sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
        sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00026b1c8)
        sync/rwmutex.go:111 +0x90
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab
created by github.com/tendermint/tendermint/internal/statesync.(*Reactor.OnStart
        github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd)
```
2021-09-30 19:19:10 +00:00
Sam Kleinman
23fe6fd2f9 statesync: ensure test network properly configured (#7026)
This test reliably gets hung up on network configuration, (which may
be a real issue,) but it's network setup is handcranked and we should
ensure that the test focuses on it's core assertions and doesn't fail for 
test architecture reasons.
2021-09-29 16:38:27 +00:00
Sam Kleinman
9a16d930c6 statesync: add logging while waiting for peers (#7007) 2021-09-27 16:46:40 -04:00
Sam Kleinman
71c6682b57 statesync: clean up reactor/syncer lifecylce (#6995)
I've been noticing that there are a number of situations where the
statesync reactor blocks waiting for peers (or similar,) I've moved
things around to improve outcomes in local tests.
2021-09-24 21:40:12 +00:00
Sam Kleinman
bb8ffcb95b store: move pacakge to internal (#6978) 2021-09-23 11:46:42 -04:00
Sam Kleinman
1c4950dbd2 state: move package to internal (#6964) 2021-09-22 13:04:25 -04:00
JayT106
84ffaaaf37 statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795) 2021-09-21 09:22:16 +02:00
Sam Kleinman
9dfdc62eb7 proxy: move proxy package to internal (#6953) 2021-09-20 15:18:48 -04:00
Callum Waters
bda948e814 statesync: implement p2p state provider (#6807) 2021-09-02 13:19:18 +02:00
JayT106
e70445f942 statesync/event: emit statesync start/end event (#6700) 2021-07-22 08:16:50 +02:00
Callum Waters
6dd0cf92c8 router/statesync: add helpful log messages (#6724) 2021-07-15 19:26:35 +02:00
Sam Kleinman
ab5c63eff3 statesync: increase dispatcher timeout (#6714) 2021-07-13 13:04:18 -04:00
Callum Waters
a12e2bbb60 statesync: use initial height as a floor to backfilling (#6709) 2021-07-13 16:36:16 +02:00
William Banfield
cabd916517 Revert "statesync: keep peer despite lightblock query fail (#6692)" (#6696)
* Revert "statesync: keep peer despite lightblock query fail (#6692)"

This reverts commit 50b00dff71.
2021-07-12 15:20:02 -04:00
William Banfield
50b00dff71 statesync: keep peer despite lightblock query fail (#6692)
When a peer responds with no lightblock for the height we queried, we call the [removePeer method](https://github.com/tendermint/tendermint/blob/master/internal/statesync/reactor.go#L339). This removes the peer from the [dispatcher's list of called peer's](ad65883152/internal/statesync/dispatcher.go (L159)). When the dispatcher then receives responses from the removed peer, it [drops their responses](ad65883152/internal/statesync/dispatcher.go (L130)). These responses may be meaningful or contain a block or data that will help statesync proceed.

[The logs](https://gist.github.com/tychoish/34a1f61eaae3c36c23efc7d0001e805c), when this change is applied, show an additional 3 networking testnets passing. 

addresses:  #6691
2021-07-09 21:20:25 +00:00
Callum Waters
051e127d38 light: correctly handle contexts (#6687) 2021-07-09 18:48:18 +02:00
Aleksandr Bezobchuk
1dec3e139a add stacktrace to panic logs (#6662) 2021-07-06 14:26:18 -04:00
Callum Waters
a1e1e6c290 test: fix non-deterministic backfill test (#6648) 2021-07-05 16:42:36 +02:00
Sam Kleinman
917180dfd2 p2p: reduce buffering on channels (#6609)
Having smaller buffers in each reactor/channel will mean that there will be fewer stale messages.
2021-06-24 20:38:35 +00:00