Commit Graph

131 Commits

Author SHA1 Message Date
William Banfield
738457a63f p2p: make queue read and write simultaneously 2022-07-01 18:55:46 -04:00
Ian Jungyong Um
1d96faa35a p2p: fix typo (#8922)
Fix minor typos in peermanager
2022-07-01 12:44:39 +00:00
William Banfield
921530c352 p2p: use correct context error (#8916)
handshakeCtx is the internal context carrying the timeout. Its error should be used for the error return.
2022-06-30 22:02:59 +00:00
William Banfield
5274f80de4 p2p: fix flakey test due to disconnect cooldown (#8917)
This test was made flakey by #8839. The cooldown period means that the node in the test will not try to reconnect as quickly as the test expects. This change makes the cooldown shorter in the test so that the node quickly reconnects.
2022-06-30 21:48:10 +00:00
Sam Kleinman
60881f1d06 p2p: stop mconn channel sends without timeout (#8906) 2022-06-30 09:01:02 -04:00
Sam Kleinman
52b6dc19ba p2p: remove dial sleep and provide disconnect cooldown (#8839)
Alternative proposal for #8826
2022-06-24 17:57:49 +00:00
William Banfield
5f5e74798b p2p: set empty timeouts to small values. (#8847)
These timeouts default to 'do not time out' if they are not set. This ties up resources, potentially indefinitely. If the node on the other side of the handshake is up but unresponsive, the [handshake call](edec79448a/internal/p2p/router.go (L720)) will _never_ return.

These are proposed values that have not been validated. I intend to validate them in a production setting.
2022-06-23 22:33:21 +00:00
Sam Kleinman
436a38f876 p2p: track peers by address (#8841) 2022-06-23 10:03:10 -04:00
Ian Jungyong Um
2e11760fbe p2p: fix typo (#8836) 2022-06-22 09:30:11 +02:00
William Banfield
8860e027a8 p2p: more dial routines (#8827)
The dial routines perform network I/O, which is a blocking call into the kernel. These routines are completely unable to do anything else while the dial occurs, so for most of their lifecycle they sit idle waiting for the TCP stack to hand them data. We should increase the number of dial routines by _a lot_ to enable more concurrent dials. This is unlikely to cause CPU starvation because these routines sit idle most of the time. The current value causes dials to occur _way_ too slowly.

Below is a graph demonstrating the before and after of this change in a test network with many dead peers. You can observe that the rate at which we connect to new, valid peers is _much_ higher than before. The change was deployed around the 31-minute mark on the graph.

![image](https://user-images.githubusercontent.com/4561443/174919007-50e4453a-edd8-41d0-97ee-dea8853d57f7.png)
2022-06-22 00:51:09 +00:00
Sam Kleinman
cfd13825e2 p2p: add eviction metrics and cleanup dialing error handling (#8819) 2022-06-21 20:44:14 +00:00
Sam Kleinman
28d3239958 p2p: wake dialing thread after sleep (#8803) 2022-06-20 15:47:56 +00:00
Ian Jungyong Um
e3e162ff10 p2p: fix typo (#8793)
Fix the typo in router
2022-06-19 15:26:25 +00:00
Sam Kleinman
4d820ff4f5 p2p: peer score should not wrap around (#8790) 2022-06-17 22:27:38 +00:00
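Wrap-around is easy to see with a small unsigned score type. This sketch assumes the score is an 8-bit unsigned integer, so plain arithmetic would turn 255+1 into 0 and 0-1 into 255. `peerScore` and `addScore` are hypothetical names. The sketch clamps to the type's range instead of wrapping.

```go
package main

import (
	"fmt"
	"math"
)

// peerScore is assumed here to be an 8-bit unsigned score.
type peerScore uint8

// addScore adjusts s by delta, clamping to [0, math.MaxUint8] rather than
// letting the uint8 arithmetic wrap around.
func addScore(s peerScore, delta int) peerScore {
	v := int(s) + delta
	switch {
	case v < 0:
		return 0
	case v > math.MaxUint8:
		return math.MaxUint8
	default:
		return peerScore(v)
	}
}

func main() {
	fmt.Println(addScore(250, 10), addScore(3, -10), addScore(100, 5)) // 255 0 105
}
```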
Sam Kleinman
0ac03468d8 p2p: track peers stored on startup (#8787) 2022-06-17 18:10:10 +00:00
Sam Kleinman
9e5b13725d p2p: peer store and dialing changes (#8737) 2022-06-17 08:02:10 -04:00
Sam Kleinman
51b3f111dc p2p: fix mconn transport accept test (#8762)
Fix minor test incongruency missed earlier.
2022-06-14 23:48:48 +00:00
Sam Kleinman
979a6a1b13 p2p: accept should not abort on first error (#8759) 2022-06-14 19:12:53 -04:00
Sam Kleinman
bf1cb89bb7 Revert "p2p: self-add node should not error (tendermint#8753)" (#8757) 2022-06-14 20:55:10 +00:00
Sam Kleinman
7971f4a2fc p2p: self-add node should not error (#8753) 2022-06-14 16:45:05 +00:00
Sam Kleinman
666d93338a p2p: shed peers from store from other networks (#8678) 2022-06-02 11:14:25 -04:00
M. J. Fromberger
a988cefe5d Update generated mocks after #8607. (#8612) 2022-05-25 15:48:56 +00:00
Sam Kleinman
d59a53be01 p2p: reduce ability of SendError to disconnect peers (#8597) 2022-05-24 11:19:32 -04:00
Sam Kleinman
2897b75853 p2p: remove unused get height methods (#8569) 2022-05-17 10:56:26 -04:00
William Banfield
92811b9153 metrics: transition all metrics to using metricsgen generated constructors. (#8488)
## What does this change do?

This pull request completes the change to the `metricsgen` metrics. It adds `go generate` directives to all of the files containing the `Metrics` structs.

Running `metricsdiff` on the output of these generated metrics and on the output of `master` shows no difference between the two sets of metrics when run locally.
```
[william@sidewinder] tendermint[wb/metrics-gen-transition]:. ◆ ./scripts/metricsgen/metricsdiff/metricsdiff metrics_master metrics_generated
[william@sidewinder] tendermint[wb/metrics-gen-transition]:. ◆
```

This change also adds parsing for a `metrics:` key in a field comment: if a comment line begins with `//metrics:`, the rest of the line is interpreted as the metric help text. It also fixes a bug where lists of labels were not properly quoted in the rendered `metricsgen` output.
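Extracting help text from `//metrics:` field comments can be sketched with the standard `go/parser` and `go/ast` packages. This is not `metricsgen`'s actual code. `helpTexts` is a hypothetical helper, and `Gauge`/`Counter` in the sample source are never type-checked.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const src = `package p

type Metrics struct {
	//metrics: Number of peers currently connected.
	Peers Gauge
	// An ordinary comment is ignored.
	Other Counter
}
`

// helpTexts maps each field of the named struct to the text following a
// "//metrics:" prefix in that field's doc comment.
func helpTexts(source, structName string) (map[string]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), "metrics.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	ast.Inspect(f, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok || ts.Name.Name != structName {
			return true // keep searching
		}
		if st, ok := ts.Type.(*ast.StructType); ok {
			for _, field := range st.Fields.List {
				if field.Doc == nil {
					continue
				}
				for _, c := range field.Doc.List { // c.Text includes the leading "//"
					if strings.HasPrefix(c.Text, "//metrics:") {
						help := strings.TrimSpace(strings.TrimPrefix(c.Text, "//metrics:"))
						for _, name := range field.Names {
							out[name.Name] = help
						}
					}
				}
			}
		}
		return false
	})
	return out, nil
}

func main() {
	help, err := helpTexts(src, "Metrics")
	if err != nil {
		panic(err)
	}
	fmt.Println(help["Peers"]) // Number of peers currently connected.
}
```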
2022-05-12 18:39:12 +00:00
Sam Kleinman
37287ead94 p2p: remove message type from channel implementation (#8452) 2022-05-02 10:52:57 -04:00
Sam Kleinman
cf2a00b398 p2p: avoid using p2p.Channel internals (#8444) 2022-04-29 17:21:36 -04:00
Sam Kleinman
2a58ea3ab2 p2p: use nodeinfo less often (#8427) 2022-04-27 21:13:38 +00:00
Sam Kleinman
8670678291 p2p: remove support for multiple transports and endpoints (#8420) 2022-04-27 14:29:19 -04:00
M. J. Fromberger
001449d536 Remove obsolete build tagged patch for net.Pipe. (#8399)
The p2p/conn library was using a build patch to work around an old issue with
the net.Conn type that was fixed in Go 1.10. Remove the workaround and
update callers to use the standard net.Pipe directly.
2022-04-23 09:57:42 -07:00
Sam Kleinman
b5e6cf50d1 abci: Application should return errors and nilable response objects (#8396) 2022-04-22 20:40:42 -04:00
Sam Kleinman
8345dc4f7c abci: application type should take contexts (#8388) 2022-04-22 10:58:01 -04:00
Sam Kleinman
efd4f4a40b cleanup: unused parameters (#8372) 2022-04-18 16:45:21 -04:00
Sam Kleinman
889341152a p2p: fix setting in con-tracker (#8370) 2022-04-18 15:49:11 -04:00
M. J. Fromberger
14f41ac5e3 Fix more broken Markdown links. (#8271) 2022-04-07 00:15:20 -07:00
Sam Kleinman
d153388446 p2p: inject nodeinfo into router (#8261) 2022-04-06 14:02:07 -04:00
Sam Kleinman
9d1e8eaad4 node: remove channel and peer update initialization from construction (#8238) 2022-04-05 13:26:53 +00:00
Sam Kleinman
97f7021712 statesync: merge channel processing (#8240) 2022-04-04 12:31:15 -04:00
Sam Kleinman
0bded371c5 testing: logger cleanup (#8153)
This contains two major changes:

- Remove the legacy test logging method, and just explicitly call the
  noop logger. This is just to make the test logging behavior more
  coherent and clear. 
  
- Move the logging in the light package from the testing.T logger to
  the noop logger. It's really the case that we very rarely need/want
  to consider test logs unless we're doing reproductions and running a
  narrow set of tests.
  
In most cases, I (for one) prefer to run in verbose mode so I can
watch progress of tests, but I basically never need to consider
logs. If I do want to see logs, then I can edit in the testing.T
logger locally (which is what you have to do today, anyway.)
2022-03-18 17:39:38 +00:00
JayT106
4400b0f6d3 p2p: adjust max non-persistent peer score (#8137)
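Guarantee that persistent peers have the highest connection priority.
peerStore.Ranked returns peers with equal scores in arbitrary order, so a non-persistent peer at the maximum score could otherwise tie with, and be ranked ahead of, a persistent peer.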
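The effect of capping non-persistent scores can be sketched as follows. The sketch assumes 8-bit scores, with persistent peers always at the maximum and non-persistent peers capped one below it. No tie between the two groups is then possible, and the unspecified order among equal scores no longer matters for persistent peers. `peer`, `raw`, and `ranked` are hypothetical names, not the peer store's API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type peerScore uint8

// Assumed scheme: persistent peers get the maximum score; non-persistent
// peers are capped strictly below it.
const (
	scorePersistent       peerScore = math.MaxUint8
	maxScoreNotPersistent peerScore = scorePersistent - 1
)

type peer struct {
	id         string
	persistent bool
	raw        int // hypothetical accumulated score
}

func (p peer) score() peerScore {
	if p.persistent {
		return scorePersistent
	}
	switch {
	case p.raw <= 0:
		return 0
	case p.raw >= int(maxScoreNotPersistent):
		return maxScoreNotPersistent
	default:
		return peerScore(p.raw)
	}
}

// ranked orders peers by descending score. sort.Slice is not stable, so the
// order among equal scores is unspecified, much like an arbitrary ranking.
func ranked(peers []peer) []peer {
	out := append([]peer(nil), peers...)
	sort.Slice(out, func(i, j int) bool { return out[i].score() > out[j].score() })
	return out
}

func main() {
	peers := []peer{{id: "a", raw: 1000}, {id: "p", persistent: true}, {id: "b", raw: 10}}
	for _, p := range ranked(peers) {
		fmt.Println(p.id, p.score())
	}
	// p 255
	// a 254
	// b 10
}
```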
2022-03-17 14:30:45 -07:00
M. J. Fromberger
658a7661c5 p2p: remove unnecessary panic handling in PEX reactor (#8110)
The message handling in this reactor is all under control of the reactor
itself, and does not call out to callbacks or other externally-supplied code.
It doesn't need to check for panics.

- Remove an irrelevant channel ID check.
- Remove an unnecessary panic recovery wrapper.
2022-03-11 08:23:03 -08:00
M. J. Fromberger
89b4321af2 p2p: update polling interval calculation for PEX requests (#8106)
The PEX reactor has a simple feedback control mechanism to decide how often to
poll peers for peer address updates. The idea is to poll more frequently when
little is known about the network, and decrease frequency as knowledge grows.

This change solves two problems:

1. In some cases, we may poll a peer "too often" and get dropped
   by that peer for spamming.

2. The first successful peer update with any content resets the polling timer
   to a very long time (10m), meaning if we are unlucky in getting an
   incomplete reply while the network is small, we may not try again for a very
   long time. This may contribute to difficulties bootstrapping sync.

The main change here is to only update the interval when new information is
added to the system, and not (as before) whenever a request is sent out to a
peer. The rate computation is essentially the same as before, although the code
has been a bit simplified, and I consolidated some of the error handling so
that we don't have to check in multiple places for the same conditions.
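The update rule described above can be sketched as follows. The interval is recomputed only when a reply adds new information, and it is pinned to a minimum. The bounds, `targetAddrs`, and the linear formula are hypothetical placeholders, not the reactor's actual computation. Only the 10-minute ceiling echoes the figure mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical bounds; the real reactor's constants and formula may differ.
const (
	minPollInterval = 1 * time.Second  // pinned floor so a peer is never polled too often
	maxPollInterval = 10 * time.Minute // ceiling once the network is well known
	targetAddrs     = 100              // address count treated as "knowing the network"
)

type pollState struct {
	known    int           // distinct addresses learned so far
	interval time.Duration // delay before the next PEX request
}

func newPollState() *pollState { return &pollState{interval: minPollInterval} }

// onResponse is called for each PEX reply with the number of new addresses it
// added. Sending a request, or receiving a reply with nothing new, leaves the
// interval unchanged. Otherwise it grows linearly with known/targetAddrs.
func (s *pollState) onResponse(added int) {
	if added <= 0 {
		return
	}
	s.known += added
	frac := float64(s.known) / float64(targetAddrs)
	if frac > 1 {
		frac = 1
	}
	s.interval = minPollInterval + time.Duration(frac*float64(maxPollInterval-minPollInterval))
}

func main() {
	s := newPollState()
	for _, added := range []int{0, 10, 0, 40, 200} {
		s.onResponse(added)
		fmt.Printf("added=%d known=%d interval=%v\n", added, s.known, s.interval)
	}
}
```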

Related changes:

- Improve error diagnostics for too-soon and overflow conditions.
- Clean up state handling in the poll interval computation.
- Pin the minimum interval to avert the chance of PEX spamming a peer.
2022-03-11 07:49:33 -08:00
JayT106
d9c9675e2a p2p+flowrate: rate control refactor (#7828)
Add `CurrentTransferRate` to the flowrate package, because the transfer rate was the only part of the status that was being used.
2022-03-10 13:48:23 +00:00
M. J. Fromberger
2df5c85a8d Fix govet errors for %w use in test errors. (#8083)
The %w verb is specific to fmt.Errorf; the testing package's formatting methods (such as t.Errorf) do not support it.
2022-03-07 18:17:37 +00:00
M. J. Fromberger
a22942504c p2p: re-enable tests previously disabled (#8049) 2022-03-01 12:25:11 -08:00
Sam Kleinman
58dc172611 p2p: plumb rudimentary service discovery to reactors and update statesync (#8030)
This is a little coarse, but the idea is that the peer-up event we send
to reactors will include information about the channels the peer has,
which reactors can then use to reject peers (if needed).

This solves the problem where statesync in test networks (and
presumably in real ones) would attempt to sync from seed nodes and
then hang silently forever.
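The filtering idea can be sketched as follows. The peer-up event carries the set of channels the remote node advertised. A statesync-like reactor ignores peers that do not serve its snapshot channel, such as a seed node. `peerUpdate`, `ChannelID`, `stateSyncReactor`, and the `0x60` channel value are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

type ChannelID uint16 // assumed channel identifier type

// peerUpdate is a hypothetical peer-up event that carries the channels
// advertised by the remote peer.
type peerUpdate struct {
	NodeID   string
	Channels map[ChannelID]struct{}
}

const snapshotChannel ChannelID = 0x60 // illustrative value

type stateSyncReactor struct {
	peers map[string]bool // peers usable as snapshot sources
}

// onPeerUp accepts the peer as a sync source only if it serves the snapshot
// channel. A seed node without it is skipped instead of being waited on forever.
func (r *stateSyncReactor) onPeerUp(u peerUpdate) bool {
	if _, ok := u.Channels[snapshotChannel]; !ok {
		return false
	}
	r.peers[u.NodeID] = true
	return true
}

func main() {
	r := &stateSyncReactor{peers: make(map[string]bool)}
	seed := peerUpdate{NodeID: "seed", Channels: map[ChannelID]struct{}{}}
	full := peerUpdate{NodeID: "full", Channels: map[ChannelID]struct{}{snapshotChannel: {}}}
	fmt.Println(r.onPeerUp(seed), r.onPeerUp(full)) // false true
}
```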
2022-02-28 20:02:54 +00:00
Sam Kleinman
a153f82433 p2p: ignore transport close error during cleanup (#8011)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-25 14:48:41 -05:00
Sam Kleinman
89dbebd1c5 p2p: retry failed connections slightly more aggressively (#8010)
* p2p: retry failed connections slightly more aggressively

* fix dial interval test
2022-02-25 18:05:29 +00:00
Sam Kleinman
c8ae5db50e p2p: relax pong timeout (#8007) 2022-02-25 10:58:29 -05:00
Sam Kleinman
c85e3e4ba8 p2p: mconn track last message for pongs (#7995)
* p2p: mconn track last message for pongs

* fix spell

* cr feedback

* test fix part one

* cleanup tests

* fix comment

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
2022-02-25 15:15:13 +00:00