44 Commits

Author SHA1 Message Date
M. J. Fromberger
451e697331 Update generated mocks after upgrade of Mockery v2. (#8973) 2022-07-11 09:18:36 -04:00
William Banfield
978f754ad3 p2p: set empty timeouts to configured values. (manual backport of #8847) (#8869)
* regenerate mocks using newer style

* p2p: set empty timeouts to small values. (#8847)

These timeouts default to 'do not time out' if they are not set. This ties up resources, potentially indefinitely. If the node on the other side of the handshake is up but unresponsive, the [handshake call](edec79448a/internal/p2p/router.go (L720)) will _never_ return.

* fix light client select statement
2022-06-28 16:07:15 -04:00
Sam Kleinman
bdd59c892c statesync: avoid potential race (#8494) 2022-05-11 15:09:41 -04:00
mergify[bot]
ca69da7533 statesync: avoid compounding retry logic for fetching consensus parameters (backport #8032) (#8041)
(cherry picked from commit a965f03c15)
2022-03-01 18:59:29 -05:00
mergify[bot]
0e08c66926 p2p: plumb rudimentary service discovery to reactors and update statesync (backport #8030) (#8036) 2022-02-28 20:51:47 -08:00
mergify[bot]
cf7e440e68 statesync: assert app version matches (backport #7856) (#7886) 2022-02-21 11:19:17 +01:00
mergify[bot]
caddddcb3a light: remove legacy timeout scheme (backport #7776) (#7786) 2022-02-09 08:35:55 -05:00
Sam Kleinman
710407e9b2 consensus: delay start of peer routines (backport of #7753) (#7760) 2022-02-04 10:18:19 -05:00
Sam Kleinman
003d15fa4b lint: cleanup branch lint errors (#7238) 2021-11-04 07:44:13 -04:00
William Banfield
243c62cc68 statesync: improve rare p2p race condition (#7042)
This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out waiting for the consensus params response. 

This can occur because the statesync reactor may attempt to respond to the params request before the state provider is ready to read it. The reactor then hits the `default` case seen here and never sends on the channel. The state provider blocks waiting for a response that never arrives, because the reactor opted not to send it.
2021-10-01 20:33:12 +00:00
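The race described in 243c62cc68 comes from a non-blocking send. The sketch below uses a hypothetical `trySend` helper to show the failure mode: with an unbuffered channel and no receiver waiting, the `default` case runs and the value is silently dropped. A one-slot buffer is shown as one possible remedy; it is an assumption for illustration and not necessarily the change made in #7042.

```go
package main

import "fmt"

// trySend performs a non-blocking send and reports whether the value was delivered.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		// No receiver is ready and there is no buffer space: the value is dropped.
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	fmt.Println(trySend(unbuffered, 1)) // false: nobody is reading yet

	buffered := make(chan int, 1)
	fmt.Println(trySend(buffered, 1)) // true: the buffer holds the response
	fmt.Println(<-buffered)           // 1: a late reader still receives it
}
```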
William Banfield
177850a2c9 statesync: remove deadlock on init fail (#7029)
When statesync is stopped during shutdown, it can deadlock. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. While this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in the hope of eradicating the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag.

```
goroutine 36 [chan receive]:
github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240)
        github.com/tendermint/tendermint/node/node.go:769 +0x132
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6

goroutine 188 [semacquire]:
sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1)
        runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00026b1c8)
        sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
        sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00026b1c8)
        sync/rwmutex.go:111 +0x90
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab
created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd
```
2021-09-30 19:19:10 +00:00
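The kind of accidental lock retention that 177850a2c9 looks for can be sketched as follows. The `reactor` type and both methods here are hypothetical and much simpler than the real statesync reactor. The point is only that an early return on an error path leaves the mutex held, while `defer` releases it on every path. `TryLock` requires Go 1.18 or later and is used purely to observe the lock state without blocking.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reactor is a hypothetical, minimal stand-in for a reactor guarded by a lock.
type reactor struct {
	mtx   sync.RWMutex
	ready bool
}

// leakyInit returns on the error path without releasing the lock.
func (r *reactor) leakyInit(fail bool) error {
	r.mtx.Lock()
	if fail {
		return errors.New("init failed") // the lock is still held here
	}
	r.ready = true
	r.mtx.Unlock()
	return nil
}

// safeInit releases the lock on every return path.
func (r *reactor) safeInit(fail bool) error {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	if fail {
		return errors.New("init failed")
	}
	r.ready = true
	return nil
}

func main() {
	leaky := &reactor{}
	_ = leaky.leakyInit(true)
	fmt.Println(leaky.mtx.TryLock()) // false: a later Lock() would block forever

	safe := &reactor{}
	_ = safe.safeInit(true)
	fmt.Println(safe.mtx.TryLock()) // true: the lock was released
}
```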
Sam Kleinman
23fe6fd2f9 statesync: ensure test network properly configured (#7026)
This test reliably gets hung up on network configuration (which may
be a real issue), but its network setup is handcranked and we should
ensure that the test focuses on its core assertions and doesn't fail for
test architecture reasons.
2021-09-29 16:38:27 +00:00
Sam Kleinman
9a16d930c6 statesync: add logging while waiting for peers (#7007) 2021-09-27 16:46:40 -04:00
Sam Kleinman
71c6682b57 statesync: clean up reactor/syncer lifecycle (#6995)
I've been noticing that there are a number of situations where the
statesync reactor blocks waiting for peers (or similar). I've moved
things around to improve outcomes in local tests.
2021-09-24 21:40:12 +00:00
Sam Kleinman
bb8ffcb95b store: move package to internal (#6978) 2021-09-23 11:46:42 -04:00
M. J. Fromberger
cf7537ea5f cleanup: Reduce and normalize import path aliasing. (#6975)
The code in the Tendermint repository makes heavy use of import aliasing.
This is made necessary by our extensive reuse of common base package names, and
by repetition of similar names across different subdirectories.

Unfortunately we have not been very consistent about which packages we alias in
various circumstances, and the aliases we use vary. In the spirit of the advice
in the style guide and https://github.com/golang/go/wiki/CodeReviewComments#imports,
this change makes an effort to clean up and normalize import aliasing.

This change makes no API or behavioral changes. It is a pure cleanup intended
to help make the code more readable to developers (including myself) trying to
understand what is being imported where.

Only unexported names have been modified, and the changes were generated and
applied mechanically with gofmt -r and comby, respecting the lexical and
syntactic rules of Go.  Even so, I did not fix every inconsistency. Where the
changes would be too disruptive, I left it alone.

The principles I followed in this cleanup are:

- Remove aliases that restate the package name.
- Remove aliases where the base package name is unambiguous.
- Move overly-terse abbreviations from the import to the usage site.
- Fix lexical issues (remove underscores, remove capitalization).
- Fix import groupings to more closely match the style guide.
- Group blank (side-effecting) imports and ensure they are commented.
- Add aliases to multiple imports with the same base package name.
2021-09-23 07:52:07 -07:00
Sam Kleinman
1c4950dbd2 state: move package to internal (#6964) 2021-09-22 13:04:25 -04:00
JayT106
84ffaaaf37 statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795) 2021-09-21 09:22:16 +02:00
Sam Kleinman
9dfdc62eb7 proxy: move proxy package to internal (#6953) 2021-09-20 15:18:48 -04:00
Callum Waters
bda948e814 statesync: implement p2p state provider (#6807) 2021-09-02 13:19:18 +02:00
William Banfield
4e96c6b234 tools: add mockery to tools.go and remove mockery version strings (#6787)
This change aims to keep versions of mockery consistent across developer laptops.

This change adds mockery to the `tools.go` file so that its version can be managed consistently in the `go.mod` file.

Additionally, this change temporarily disables adding mockery's version number to generated files. There is an outstanding issue against the mockery project related to the version string behavior when running from `go get`. I have created a pull request to fix this issue in the mockery project.
see: https://github.com/vektra/mockery/issues/397
2021-07-30 20:47:15 +00:00
Callum Waters
02f8e4c0bd blockstore: fix problem with seen commit (#6782) 2021-07-30 17:37:04 +02:00
Callum Waters
0e2752ae42 light: improve error handling and allow providers to be added (#6733) 2021-07-22 18:12:34 +02:00
JayT106
e70445f942 statesync/event: emit statesync start/end event (#6700) 2021-07-22 08:16:50 +02:00
Callum Waters
6dd0cf92c8 router/statesync: add helpful log messages (#6724) 2021-07-15 19:26:35 +02:00
William Banfield
a46724e4f6 statesync: dispatcher test uses internal channel for timing (#6713)
This code change amends the dispatcher tests to read from the dispatcher's `requestCh`. This ensures that a request is waiting when the test calls `dispatcher.respond`. 
addresses: #6711
2021-07-14 14:16:09 +00:00
Sam Kleinman
ab5c63eff3 statesync: increase dispatcher timeout (#6714) 2021-07-13 13:04:18 -04:00
Sam Kleinman
8228936155 e2e: extend timeouts in test harness (#6694) 2021-07-13 11:28:07 -04:00
Callum Waters
a12e2bbb60 statesync: use initial height as a floor to backfilling (#6709) 2021-07-13 16:36:16 +02:00
William Banfield
4009102e2b statesync: remove outgoingCalls race condition in dispatcher (#6699)
* statesync: remove outgoing calls race condition
2021-07-12 19:05:47 -04:00
William Banfield
cabd916517 Revert "statesync: keep peer despite lightblock query fail (#6692)" (#6696)
* Revert "statesync: keep peer despite lightblock query fail (#6692)"

This reverts commit 50b00dff71.
2021-07-12 15:20:02 -04:00
William Banfield
50b00dff71 statesync: keep peer despite lightblock query fail (#6692)
When a peer responds with no lightblock for the height we queried, we call the [removePeer method](https://github.com/tendermint/tendermint/blob/master/internal/statesync/reactor.go#L339). This removes the peer from the [dispatcher's list of called peers](ad65883152/internal/statesync/dispatcher.go (L159)). When the dispatcher then receives responses from the removed peer, it [drops their responses](ad65883152/internal/statesync/dispatcher.go (L130)). These responses may be meaningful or contain a block or data that will help statesync proceed.

[The logs](https://gist.github.com/tychoish/34a1f61eaae3c36c23efc7d0001e805c), when this change is applied, show an additional 3 networking testnets passing. 

addresses: #6691
2021-07-09 21:20:25 +00:00
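The interaction described in 50b00dff71 can be modelled with a small sketch. The `dispatcher` type below is hypothetical and far simpler than the one in `internal/statesync`; it only shows why a response that arrives after `removePeer` is dropped, even though it might have carried a useful light block.

```go
package main

import "fmt"

// dispatcher is a hypothetical, simplified model: it tracks which peers have
// an outstanding call and only accepts responses from those peers.
type dispatcher struct {
	calls map[string]chan string
}

func newDispatcher() *dispatcher {
	return &dispatcher{calls: make(map[string]chan string)}
}

// call registers an outstanding request and returns the channel for its response.
func (d *dispatcher) call(peer string) <-chan string {
	ch := make(chan string, 1)
	d.calls[peer] = ch
	return ch
}

// removePeer forgets the peer, including any outstanding call.
func (d *dispatcher) removePeer(peer string) {
	delete(d.calls, peer)
}

// respond delivers a response, or reports false when the peer is unknown.
func (d *dispatcher) respond(peer, block string) bool {
	ch, ok := d.calls[peer]
	if !ok {
		return false // response dropped
	}
	ch <- block
	delete(d.calls, peer)
	return true
}

func main() {
	d := newDispatcher()
	pending := d.call("peer1")
	d.removePeer("peer1")
	fmt.Println(d.respond("peer1", "light block at height 5")) // false: dropped
	fmt.Println(len(pending))                                   // 0: the caller never sees it
}
```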
Callum Waters
051e127d38 light: correctly handle contexts (#6687) 2021-07-09 18:48:18 +02:00
Callum Waters
2c14d491f6 fix leaking statesync test (#6680) 2021-07-08 15:26:35 +02:00
Aleksandr Bezobchuk
1dec3e139a add stacktrace to panic logs (#6662) 2021-07-06 14:26:18 -04:00
Callum Waters
a1e1e6c290 test: fix non-deterministic backfill test (#6648) 2021-07-05 16:42:36 +02:00
Sam Kleinman
917180dfd2 p2p: reduce buffering on channels (#6609)
Having smaller buffers in each reactor/channel will mean that there will be fewer stale messages.
2021-06-24 20:38:35 +00:00
Sam Kleinman
ae5f98881b p2p: make NodeID and NetAddress public (#6583) 2021-06-24 09:59:14 -04:00
Callum Waters
6e238b5b9d statesync: make fetching chunks more robust (#6587) 2021-06-21 10:14:15 -04:00
Callum Waters
25bb556fee p2p: increase queue size to 16MB (#6588) 2021-06-16 17:27:41 +02:00
Aleksandr Bezobchuk
7d961b55b2 state sync: tune request timeout and chunkers (#6566) 2021-06-15 14:33:26 -04:00
Callum Waters
74af343f28 statesync: tune backfill process (#6565)
This PR makes some tweaks to backfill after running e2e tests:
- Separates sync and backfill as two distinct processes that the node calls. The reason is that if sync fails then the node should fail, but if backfill fails it is still possible to proceed.
- Removes peers who don't have the block at a height from the local peer list. As the process goes backwards, if a node doesn't have a block at a height it is likely pruning blocks and thus won't have any prior ones either.
- Sleeps when we've run out of peers, then tries again.
2021-06-11 15:26:18 +00:00
Callum Waters
6f6ac5c04e state sync: reverse sync implementation (#6463) 2021-06-08 19:23:52 +02:00
Sam Kleinman
a855f96946 p2p: renames for reactors and routing layer internal moves (#6547) 2021-06-08 08:17:09 -04:00