Commit Graph

822 Commits

Author SHA1 Message Date
Marko
8c539f7c2b logs: cleanup (#6198) 2021-03-04 10:09:08 +00:00
Anton Kaliaev
089b314bdb test/fuzz: fix PEX reactor test (#6188)
* test/fuzz: fix PEX reactor test

* .github: [fuzz] increase retention period for crashers
2021-03-01 20:18:51 +04:00
Callum Waters
42f6c40751 p2p: enable scheme-less parsing of IPv6 strings (#6158) 2021-02-22 16:24:56 +01:00
Erik Grinaker
b6be889b97 node: feature flag for legacy p2p support (#6056) 2021-02-16 11:57:16 -05:00
Aleksandr Bezobchuk
16bbe8c862 consensus: p2p refactor (#5969) 2021-02-16 11:02:52 -05:00
Callum Waters
162f67cf26 correct spelling to US english (#6077) 2021-02-11 18:59:18 +01:00
Erik Grinaker
9b6d6a3ad0 p2p: tighten up Router and add tests (#6044)
This cleans up the `Router` code and adds a bunch of tests. These sorts of systems are a real pain to test, since they have a bunch of asynchronous goroutines living their own lives, so the test coverage is decent but not fantastic. Luckily we've been able to move all of the complex peer management and transport logic outside of the router, as synchronous components that are much easier to test, so the core router logic is fairly small and simple.

This also provides some initial test tooling in `p2p/p2ptest` that automatically sets up in-memory networks and channels for use in integration tests. It also includes channel-oriented test asserters in `p2p/p2ptest/require.go`, but these have primarily been written for router testing and should probably be adapted or extended for reactor testing.
2021-02-03 23:03:02 +00:00
Erik Grinaker
2aad26e2f1 p2p: tighten up and test PeerManager (#6034)
This tightens up the `PeerManager` and related code, adds a ton of tests, and fixes a bunch of inconsistencies and bugs.
2021-02-03 06:15:23 +00:00
Erik Grinaker
fc71882f74 p2p: add tests and fix bugs for NodeAddress and NodeID (#6021)
This renames `PeerAddress` to `NodeAddress`, moves it and `NodeID` into a separate file `address.go`, adds tests for them, and fixes a bunch of bugs and inconsistencies.
2021-02-01 09:03:41 +00:00
Erik Grinaker
1f39f808e1 p2p: tighten up and test Transport API (#6020)
This tightens up the new P2P `Transport` API and infrastructure, fixes a bunch of bugs and inconsistencies, and adds tests.
2021-02-01 08:24:31 +00:00
Erik Grinaker
50b8907581 p2p: clean up new Transport infrastructure (#6017)
This revises the new P2P `Transport` interface and does some preliminary code cleanups and simplifications.

The major change here is to add `Connection.Handshake()` for performing node handshakes (once the stream transport API is implemented, this can be done entirely independent of the transport).  This moves most of the handshaking logic into the `Router`, such as prevention of head-of-line blocking, validation of peer's `NodeInfo`, controlling timeouts, and so on. This significantly simplifies transports, completely removes the need for internal goroutines, and shares common logic across all transports. This also allows varying the handshake `NodeInfo` across peers, e.g. to vary `ListenAddr`. Similarly, connection filtering is also moved into the switch/router so that it can be shared between transports.
2021-01-30 10:51:22 +00:00
Marko
1f01e5d726 params: remove blockTimeIota (#5987)
## Description

- removes blocktimeiota 
- merges block params in abci and core state
- spec change: https://github.com/tendermint/spec/pull/248


Closes: #5939
2021-01-28 13:47:24 +00:00
Erik Grinaker
c900303ac6 test: fix flaky router broadcast test (#6006)
Fixes #6004 by reordering test to avoid race condition. Will redesign router tests to be resistant to this later.
2021-01-28 11:26:43 +00:00
Erik Grinaker
363804ac21 test: fix TestRouter to take into account PeerManager reconnects (#6002)
Fixes #5981, which was caused by changes in Router behavior after the introduction of the peer manager, leading to a race condition that could halt the test.

This is a temporary measure, I'll start tightening up the new P2P core tomorrow and write "real" tests with better test infrastructure.
2021-01-27 21:40:28 +00:00
Erik Grinaker
5a9b740acb test: fix TestSwitchAcceptRoutine by ignoring spurious error (#6001)
Another fix for `TestSwitchAcceptRoutine` following from #6000, since the `SetDeadline()` call also errors when the connection has been closed.
2021-01-27 21:31:31 +00:00
Erik Grinaker
aead4ab555 test: fix test data race in p2p.MemoryTransport with logger (#5995)
This patches over a test data race where the logger would try to read struct internals via `reflect` while these were concurrently modified (specifically `MemoryTransport.closeOnce`).
2021-01-27 21:05:48 +00:00
Erik Grinaker
4dca066aab test: disable TestPEXReactorSeedModeFlushStop due to flake (#5996)
This test occasionally fails because the peer is already stopped. It is unclear to me exactly what this test is supposed to do, since calling `FlushStop()` will stop the peer, but the test asserts that the peer shouldn't have been stopped by `FlushStop()` since calling `Stop()` afterwards will error in that case.

The current PEX reactor will be removed in the new P2P stack anyway.
2021-01-27 20:05:37 +00:00
Erik Grinaker
6e3c58204a test: fix TestSwitchAcceptRoutine flake by ignoring error type (#6000)
Fixes #5998. Sometimes the connection returns "use of closed network connection" instead, so for now we just accept any error. The switch is not long for this world anyway.
2021-01-27 19:53:07 +00:00
Erik Grinaker
06de7459c9 p2p: use stopCtx when dialing peers in Router (#5983)
This ensures we don't leak dial goroutines when shutting down the router.
2021-01-26 19:47:03 +01:00
Erik Grinaker
937a18468a test/p2p: close transports to avoid goroutine leak failures (#5982) 2021-01-26 17:49:37 +01:00
Erik Grinaker
fe5b312337 p2p: resolve PEX addresses in PEX reactor (#5980)
This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.

I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:

1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.

2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.

I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.
2021-01-26 15:58:33 +00:00
Erik Grinaker
7ea8746ed1 proto/p2p: rename PEX messages and fields (#5974)
Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):

* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).
2021-01-26 16:37:36 +01:00
Erik Grinaker
51aca684b8 p2p: add prototype PEX reactor for new stack (#5971)
This adds a prototype PEX reactor for the new P2P stack.
2021-01-26 15:10:41 +00:00
Erik Grinaker
13e772c916 p2p: add PeerManager.Advertise() (#5957)
Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.
2021-01-25 18:56:35 +00:00
Erik Grinaker
81daaacae9 p2p: simplify PeerManager upgrade logic (#5962)
Follow-up from #5947, branched off of #5954.

This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:

* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.

This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simplier, and it should generally do the right thing in all cases I can think of.
2021-01-25 17:51:14 +00:00
Erik Grinaker
a741314c97 p2p: improve peerStore prototype (#5954)
This improves the `peerStore` prototype by e.g.:

* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.
2021-01-25 17:27:44 +00:00
Erik Grinaker
7e0436c6e6 p2p: make PeerManager.DialNext() and EvictNext() block (#5947)
See #5936 and #5938 for background.

The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
2021-01-25 11:11:20 +00:00
Erik Grinaker
9c98af4277 test: fix TestPEXReactorRunning data race (#5955)
Fixes #5941.

Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway.
2021-01-24 12:45:55 +00:00
Erik Grinaker
670e9b427b p2p: improve PeerManager prototype (#5936)
This improves the prototype peer manager by:

* Exporting `PeerManager`, making it accessible by e.g. reactors.
* Replacing `Router.SubscribePeerUpdates()` with `PeerManager.Subscribe()`.
* Tracking address/peer connection statistics, and retrying dial failures with exponential backoff.
* Prioritizing peers, with persistent peers configuration.
* Limiting simultaneous connections.
* Evicting peers and upgrading to higher-priority peers.
* Tracking peer heights, as a workaround for legacy shared peer state APIs.

This is getting to a point where we need to determine precise semantics and implement tests, so we should figure out whether it's a reasonable abstraction that we want to use. The main questions are around the API model (i.e. synchronous method calls with the router polling the manager, vs. an event-driven model using channels, vs. the peer manager calling methods on the router to connect/disconnect peers), and who should have the responsibility of managing actual connections (currently the router, while the manager only tracks peer state).
2021-01-21 18:07:54 +00:00
Aleksandr Bezobchuk
15c1936b85 p2p: revise shim log levels (#5940)
Downgrade some noisy logs to DEBUG.
2021-01-21 14:18:27 +00:00
Erik Grinaker
96215a06ed p2p: add prototype peer lifecycle manager (#5882)
This adds a prototype peer lifecycle manager, `peerManager`, which stores peer data in an internal `peerStore`. The overall idea here is to have methods for peer lifecycle events which exchange a very narrow subset of peer data, and to keep all of the peer metadata (i.e. the `peerInfo` struct) internal, to decouple this from the router and simplify concurrency control. See `peerManager` GoDoc for more information.

The router is still responsible for actually dialing and accepting peer connections, and routing messages across them, but the peer manager is responsible for determining which peers to dial next, preventing multiple connections being established for the same peer (e.g. both inbound and outbound), and making sure we don't dial the same peer several times in parallel. Later it will also track retries and exponential backoff, as well as peer and address quality. It also assumes responsibility for peer updates subscriptions.

It's a bit unclear to me whether we want the peer manager to take on the responsibility of actually dialing and accepting connections as well, or if it should only be tracking peer state for the router while the router is responsible for all transport concerns. Let's revisit this later.
2021-01-18 19:56:13 +01:00
Erik Grinaker
c61cd3fd05 p2p: add Router prototype (#5831)
Early but functional prototype of the new `p2p.Router`, see its GoDoc comment for details on how it works. Expect much of this logic to change and improve as we evolve the new P2P stack.

There is a simple test that sets up an in-memory network of four routers with reactors and passes messages between them, but otherwise no exhaustive tests since this is very much a work-in-progress.
2021-01-08 15:32:11 +00:00
Aleksandr Bezobchuk
e986602649 evidence: p2p refactor (#5747) 2021-01-06 11:53:18 -05:00
Erik Grinaker
1ccd23ca1d p2p: fix MConnection inbound traffic statistics and rate limiting (#5868)
Fixes #5866. Inbound traffic monitoring (and by extension inbound rate limiting) was inadvertently removed in 660e72a.
2021-01-06 15:38:23 +01:00
Erik Grinaker
46964f62db p2p: fix IPv6 address handling in new transport API (#5853)
The old code naïvely concatenated IP and port, which doesn't work for IPv6 addresses where `:` can be part of the IP as well.
2021-01-04 15:05:43 +00:00
Erik Grinaker
91bef75f62 p2p: rename PubKeyToID to NodeIDFromPubKey 2021-01-04 11:25:20 +01:00
Erik Grinaker
b4ce1de44a p2p: rename NodeInfo.DefaultNodeID to NodeID 2021-01-04 11:25:20 +01:00
Erik Grinaker
1b6df6783d p2p: replace PeerID with NodeID 2021-01-04 11:25:20 +01:00
Erik Grinaker
cc3c18a6a7 p2p: add NodeID.Validate(), replaces validateID() 2021-01-04 11:25:20 +01:00
Erik Grinaker
8e7d431f6f p2p: rename ID to NodeID 2021-01-04 11:25:20 +01:00
Erik Grinaker
84ff991387 p2p: add MemoryTransport, an in-memory transport for testing (#5827) 2020-12-23 13:21:01 +01:00
Erik Grinaker
72f041b759 p2p: fix data race in MakeSwitch test helper (#5810)
Fixes #5809.
2020-12-19 06:35:16 +00:00
Anton Kaliaev
ced66e4eb5 config: increase MaxPacketMsgPayloadSize to 1400
The MTU (Maximum Transmission Unit) for Ethernet is 1500 bytes.
The IP header and the TCP header take up 20 bytes each at least (unless
optional header fields are used) and thus the max for (non-Jumbo frame)
Ethernet is 1500 - 20 -20 = 1460
Source: https://stackoverflow.com/a/3074427/820520
2020-12-17 15:59:18 +04:00
Anton Kaliaev
085fd66f33 p2p: do not format raw msg bytes
While debugging the mempool issue (#5796), I've noticed we're spending
quite a bit of time encoding blobs of data, which never get printed! The
reason is filtering occurs on the level below, so Go runtime rightfully
evaluates function arguments.

I think it's okay to not format raw bytes.
2020-12-17 15:59:18 +04:00
Erik Grinaker
e198edf20e p2p: remove NodeInfo interface and rename DefaultNodeInfo struct (#5799)
The `NodeInfo` interface does not appear to serve any purpose at all, so I removed it and renamed the `DefaultNodeInfo` struct to `NodeInfo` (including the Protobuf representations). Let me know if this is actually needed for anything.

Only the Protobuf rename is listed in the changelog, since we do not officially support API stability of the `p2p` package (according to `README.md`). The on-wire protocol remains compatible.
2020-12-15 18:54:25 +00:00
Erik Grinaker
bcfc889f25 p2p: implement new Transport interface (#5791)
This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog.

The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack.

The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely.

There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
2020-12-15 15:08:16 +00:00
Anton Kaliaev
28e79a4d02 cmd: modify gen_node_key to print key to STDOUT (#5772)
closes: #5770
closes: #5769

also, include node ID in the output (#5769) and modify NodeKey to use
value semantics (it makes perfect sense for NodeKey to not be a
pointer).
2020-12-10 11:02:35 +04:00
Aleksandr Bezobchuk
a879eb444d p2p: state sync reactor refactor (#5671) 2020-12-09 09:31:06 -05:00
Tess Rinearson
79890d8393 reactors: omit incoming message bytes from reactor logs (#5743)
After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)
2020-12-03 22:12:08 +00:00
Alessio Treglia
77d7328bc6 p2p/pex: fix flaky tests (#5733)
*testing.T.TempDir() causes test cases to fail when
it is unable to remove the temporary directory once
the test case execution terminates. This seems to
happen often with pex reactor test cases.
2020-12-02 13:47:59 +00:00