Commit Graph

8611 Commits

Author SHA1 Message Date
Erik Grinaker
5f88d6aa44 test/e2e: increase validator tolerances (#6037) 2021-02-02 16:41:27 +00:00
Tess Rinearson
b0dfbd1832 .github/workflows: try different e2e nightly test set (#6036) 2021-02-02 16:36:52 +01:00
Callum Waters
c7b619188d light: fix panic with RPC calls to commit and validator when height is nil (#6026) 2021-02-02 14:01:39 +01:00
Erik Grinaker
3d01d98f67 test/e2e: increase sign/propose tolerances (#6033)
E2E tests often fail because validators miss signing or proposing blocks. Often this is because e.g. there's a lot of disruption in the network or it takes a long time to start up all the nodes.

This changes the test criteria to only check for 3 signed/proposed blocks, rather than a fraction of the expected blocks. This should be enough to catch most issues, apart from performance problems causing nodes to miss signing/proposing, but we may want separate tests for those sorts of things.
2021-02-02 12:43:08 +00:00
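The criterion described in 3d01d98f67 (#6033) amounts to an absolute threshold rather than a ratio. A minimal sketch of that idea follows; the function name and the validator names are hypothetical and are not taken from the E2E test code.

```go
package main

import "fmt"

// minBlocks is the absolute threshold named in the commit message.
const minBlocks = 3

// meetsTolerance reports whether a validator signed (or proposed) at least
// minBlocks blocks, regardless of how many it was expected to handle.
// Hypothetical helper, not the real test function.
func meetsTolerance(count int) bool {
	return count >= minBlocks
}

func main() {
	// Hypothetical per-validator counts of signed blocks.
	signed := map[string]int{"validator01": 12, "validator02": 3, "validator03": 1}
	for _, name := range []string{"validator01", "validator02", "validator03"} {
		fmt.Printf("%s ok=%v\n", name, meetsTolerance(signed[name]))
	}
}
```

A fixed floor like this tolerates slow startup and network disruption, at the cost of not detecting validators that sign only a small share of blocks.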
Tess Rinearson
0b953eb190 Revert "e2e: releases nightly (#5906)" (#6031)
This reverts commit 64961e2267, to see if it will make the workflow dispatch trigger reappear and fix our Slack notification link.
2021-02-02 11:54:27 +01:00
Callum Waters
90d3f56797 store: fix deadlock in pruning (#6007) 2021-02-02 11:36:52 +01:00
Anton Kaliaev
9d1f77369f goreleaser: downcase archive and binary names (#6029)
before:

```
Tendermint_0.34.3_darwin_amd64.tar.gz

-rw-r--r--  0 runner docker 192329 Jan 19 19:30 CHANGELOG.md
-rw-r--r--  0 runner docker    321 Jan 19 19:30 CHANGELOG_PENDING.md
-rw-r--r--  0 runner docker  11382 Jan 19 19:30 LICENSE
-rw-r--r--  0 runner docker   8165 Jan 19 19:30 README.md
-rwxr-xr-x  0 runner docker 23224320 Jan 19 19:30 tendermint
```

after:

```
tendermint_0.34.3_darwin_amd64.tar.gz

-rw-r--r--  0 runner docker 192329 Jan 19 19:30 CHANGELOG.md
-rw-r--r--  0 runner docker    321 Jan 19 19:30 CHANGELOG_PENDING.md
-rw-r--r--  0 runner docker  11382 Jan 19 19:30 LICENSE
-rw-r--r--  0 runner docker   8165 Jan 19 19:30 README.md
-rwxr-xr-x  0 runner docker 23224320 Jan 19 19:30 tendermint
```
2021-02-02 08:22:56 +00:00
Marko
2a2279e010 types: cleanup protobuf.go (#6023)
## Description

- remove unused functions
- remove a function used in tests.

Closes: #XXX
2021-02-01 12:02:45 +00:00
Anton Kaliaev
1cd9bdb80b light/provider/http: fix Validators (#6022)
Closes #6010
2021-02-01 11:32:37 +00:00
Erik Grinaker
fc71882f74 p2p: add tests and fix bugs for NodeAddress and NodeID (#6021)
This renames `PeerAddress` to `NodeAddress`, moves it and `NodeID` into a separate file `address.go`, adds tests for them, and fixes a bunch of bugs and inconsistencies.
2021-02-01 09:03:41 +00:00
Erik Grinaker
1f39f808e1 p2p: tighten up and test Transport API (#6020)
This tightens up the new P2P `Transport` API and infrastructure, fixes a bunch of bugs and inconsistencies, and adds tests.
2021-02-01 08:24:31 +00:00
Erik Grinaker
50b8907581 p2p: clean up new Transport infrastructure (#6017)
This revises the new P2P `Transport` interface and does some preliminary code cleanups and simplifications.

The major change here is to add `Connection.Handshake()` for performing node handshakes (once the stream transport API is implemented, this can be done entirely independent of the transport).  This moves most of the handshaking logic into the `Router`, such as prevention of head-of-line blocking, validation of peer's `NodeInfo`, controlling timeouts, and so on. This significantly simplifies transports, completely removes the need for internal goroutines, and shares common logic across all transports. This also allows varying the handshake `NodeInfo` across peers, e.g. to vary `ListenAddr`. Similarly, connection filtering is also moved into the switch/router so that it can be shared between transports.
2021-01-30 10:51:22 +00:00
Aleksandr Bezobchuk
17905cbaa2 sync: move closer to separate file (#6015)
Closes: #6013
2021-01-29 16:59:15 +00:00
Aleksandr Bezobchuk
60bc071ed5 blockchain v0: skip TestReactor_BadBlockStopsPeer (#6014)
ref: #6005
2021-01-29 15:47:49 +00:00
Anton Kaliaev
b1646e51e2 test/e2e: enable pprof server to help debugging failures (#6003) 2021-01-28 15:21:07 +00:00
Anton Kaliaev
a54f1544f8 .github: rename crashers output (fuzz-nightly-test) (#5993) 2021-01-28 19:12:48 +04:00
Marko
1f01e5d726 params: remove blockTimeIota (#5987)
## Description

- removes `blockTimeIota`
- merges block params in abci and core state
- spec change: https://github.com/tendermint/spec/pull/248


Closes: #5939
2021-01-28 13:47:24 +00:00
Erik Grinaker
c900303ac6 test: fix flaky router broadcast test (#6006)
Fixes #6004 by reordering test to avoid race condition. Will redesign router tests to be resistant to this later.
2021-01-28 11:26:43 +00:00
Erik Grinaker
363804ac21 test: fix TestRouter to take into account PeerManager reconnects (#6002)
Fixes #5981, which was caused by changes in Router behavior after the introduction of the peer manager, leading to a race condition that could halt the test.

This is a temporary measure; I'll start tightening up the new P2P core tomorrow and write "real" tests with better test infrastructure.
2021-01-27 21:40:28 +00:00
Erik Grinaker
5a9b740acb test: fix TestSwitchAcceptRoutine by ignoring spurious error (#6001)
Another fix for `TestSwitchAcceptRoutine` following from #6000, since the `SetDeadline()` call also errors when the connection has been closed.
2021-01-27 21:31:31 +00:00
Erik Grinaker
aead4ab555 test: fix test data race in p2p.MemoryTransport with logger (#5995)
This patches over a test data race where the logger would try to read struct internals via `reflect` while these were concurrently modified (specifically `MemoryTransport.closeOnce`).
2021-01-27 21:05:48 +00:00
Aleksandr Bezobchuk
bd8a9372d2 consensus: Groom Logs (#5917)
Executed a local network using simapp and looked for logs that seemed superfluous. This isn't by any means an exhaustive grooming, but should drastically help legibility of logs.


ref: #5912
2021-01-27 20:53:24 +00:00
Marko
70bb8cc8b7 proto: separate native and proto types (#5994)
## Description

Separate protobuf and domain types. We should avoid using protobuf in our core logic. 

ref #5460
2021-01-27 20:14:27 +00:00
Erik Grinaker
4dca066aab test: disable TestPEXReactorSeedModeFlushStop due to flake (#5996)
This test occasionally fails because the peer is already stopped. It is unclear to me exactly what this test is supposed to do, since calling `FlushStop()` will stop the peer, but the test asserts that the peer shouldn't have been stopped by `FlushStop()` since calling `Stop()` afterwards will error in that case.

The current PEX reactor will be removed in the new P2P stack anyway.
2021-01-27 20:05:37 +00:00
Erik Grinaker
6e3c58204a test: fix TestSwitchAcceptRoutine flake by ignoring error type (#6000)
Fixes #5998. Sometimes the connection returns "use of closed network connection" instead, so for now we just accept any error. The switch is not long for this world anyway.
2021-01-27 19:53:07 +00:00
Erik Grinaker
f54f80bf0d test: don't use foo-bar.net in TestHTTPClientMakeHTTPDialer (#5997)
This test relied on connecting to the external site `foo-bar.net`, and (predictably) the site went down and broke all of our CI runs. This changes it to use local HTTP servers instead.
2021-01-27 19:17:00 +00:00
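The approach in f54f80bf0d (#5997), replacing an external site with a local HTTP server, can be sketched with the standard `net/http/httptest` package. This is only an illustration of the technique; `fetchBody` is a hypothetical helper, not the code from `TestHTTPClientMakeHTTPDialer`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody starts a throwaway local HTTP server, requests it, and returns
// the response body. No external host is contacted.
func fetchBody() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	// srv.Client() returns a client configured for this test server.
	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := fetchBody()
	fmt.Println(body, err)
}
```

Because the server listens on a loopback address chosen at runtime, the test no longer depends on DNS or on a third-party site staying up.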
Anton Kaliaev
8ce254cdb7 CONTRIBUTING.md: update testing section (#5979)
[✌️ RENDERED](ad5a2ec28b/CONTRIBUTING.md)

Closes #5874
2021-01-27 10:00:39 +00:00
Anton Kaliaev
a2e684e51f .github: archive crashers and fix set-crashers-count step (#5992) 2021-01-27 11:35:48 +04:00
Sergey
3759bc511b docs: fix typo in state sync example (#5989) 2021-01-26 19:48:44 +01:00
Erik Grinaker
06de7459c9 p2p: use stopCtx when dialing peers in Router (#5983)
This ensures we don't leak dial goroutines when shutting down the router.
2021-01-26 19:47:03 +01:00
Aleksandr Bezobchuk
642ecc3f5c mempool: fix mempool tests timeout (#5988) 2021-01-26 13:26:47 -05:00
Aleksandr Bezobchuk
b19acfb605 mempool: fix TestReactorNoBroadcastToSender (#5984)
## Description

Looks like I missed a test in the original PR when fixing the tests.

Closes: #5956
2021-01-26 17:33:26 +00:00
Erik Grinaker
937a18468a test/p2p: close transports to avoid goroutine leak failures (#5982) 2021-01-26 17:49:37 +01:00
Erik Grinaker
fe5b312337 p2p: resolve PEX addresses in PEX reactor (#5980)
This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.

I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:

1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.

2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.

I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.
2021-01-26 15:58:33 +00:00
Erik Grinaker
7ea8746ed1 proto/p2p: rename PEX messages and fields (#5974)
Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):

* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).
2021-01-26 16:37:36 +01:00
Erik Grinaker
51aca684b8 p2p: add prototype PEX reactor for new stack (#5971)
This adds a prototype PEX reactor for the new P2P stack.
2021-01-26 15:10:41 +00:00
Anton Kaliaev
8718f6f5ff terminate go-fuzz gracefully (w/ SIGINT) (#5973)
and preserve exit code.

```
2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s
make: *** [fuzz-mempool] Terminated
Makefile:5: recipe for target 'fuzz-mempool' failed
Error: Process completed with exit code 124.
```

https://github.com/tendermint/tendermint/runs/1766661614

`continue-on-error` should make GH ignore any error codes.
2021-01-26 17:58:14 +04:00
Marko
91823eba32 tests: fix make test (#5966)
## Description
 
- bump deadlock dep to master
  - fixes `make test` since we now use `deadlock.Once`

Closes: #XXX
2021-01-26 08:31:42 +00:00
Aleksandr Bezobchuk
b3aae970d8 blockchain v0: fix waitgroup data race (#5970)
## Description

Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.

Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).

Regardless, this is the appropriate fix.

closes: #5968
2021-01-25 19:34:55 +00:00
Erik Grinaker
13e772c916 p2p: add PeerManager.Advertise() (#5957)
Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.
2021-01-25 18:56:35 +00:00
Erik Grinaker
81daaacae9 p2p: simplify PeerManager upgrade logic (#5962)
Follow-up from #5947, branched off of #5954.

This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:

* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.

This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simpler, and it should generally do the right thing in all cases I can think of.
2021-01-25 17:51:14 +00:00
Erik Grinaker
a741314c97 p2p: improve peerStore prototype (#5954)
This improves the `peerStore` prototype by e.g.:

* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.
2021-01-25 17:27:44 +00:00
Aleksandr Bezobchuk
9e158839f6 mempool: fix reactor tests (#5967)
## Description

Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader.

Closes: #5956
2021-01-25 16:59:18 +00:00
Callum Waters
aecfb0ecf0 e2e: add control over the log level of nodes (#5958) 2021-01-25 17:20:39 +01:00
Anton Kaliaev
680fb18414 .github: fix fuzz-nightly job (#5965)
`outputs` is a property of the job, not of an individual step.
2021-01-25 19:48:57 +04:00
Marko
962a82c06e docs: log level docs (#5945)
## Description

add section on configuring log levels

Closes: #XXX
2021-01-25 13:37:18 +00:00
Anton Kaliaev
d76add65a6 libs/log: format []byte as hexadecimal string (uppercased) (#5960)
Closes: #5806 

Co-authored-by: Lanie Hei <heixx011@umn.edu>
2021-01-25 16:25:29 +04:00
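The formatting named in d76add65a6 (#5960) can be reproduced with the standard `%X` verb. In this sketch `formatValue` is a hypothetical helper, not the function in `libs/log`.

```go
package main

import "fmt"

// formatValue renders byte slices as uppercase hexadecimal and leaves every
// other value to the default %v formatting.
func formatValue(v interface{}) string {
	if b, ok := v.([]byte); ok {
		return fmt.Sprintf("%X", b)
	}
	return fmt.Sprintf("%v", v)
}

func main() {
	fmt.Println(formatValue([]byte{0xde, 0xad, 0xbe, 0xef})) // prints DEADBEEF
	fmt.Println(formatValue(42))                             // prints 42
}
```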
Erik Grinaker
7e0436c6e6 p2p: make PeerManager.DialNext() and EvictNext() block (#5947)
See #5936 and #5938 for background.

The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
2021-01-25 11:11:20 +00:00
odidev
cd3ebe8754 docker: release Linux/ARM64 image (#5925)
Co-authored-by: Marko <marbar3778@yahoo.com>
2021-01-25 11:01:49 +00:00
Marko
b958ba3440 docker: don't login when in PR (#5961) 2021-01-25 10:43:54 +00:00