Commit Graph

31 Commits

Author SHA1 Message Date
Cyrus Goh
5182ffee25 docs: master → docs-staging (#5990)
* Makefile: always pull image in proto-gen-docker. (#5953)

The `proto-gen-docker` target didn't pull an updated Docker image, and would use a local image if present which could be outdated and produce wrong results.

* test: fix TestPEXReactorRunning data race (#5955)

Fixes #5941.

Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway.

* test/fuzz: move fuzz tests into this repo (#5918)

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>

Closes #5907

- add init-corpus to blockchain reactor
- remove validator-set FromBytes test
now that we have proto, we don't need to test it! bye amino
- simplify mempool test
do we want to test remote ABCI app?
- do not recreate mux on every crash in jsonrpc test
- update p2p pex reactor test
- remove p2p/listener test
the API has changed + I did not understand what it's tested anyway
- update secretconnection test
- add readme and makefile
- list inputs in readme
- add nightly workflow
- remove blockchain fuzz test
EncodeMsg / DecodeMsg no longer exist

* docker: dont login when in PR (#5961)

* docker: release Linux/ARM64 image (#5925)

Co-authored-by: Marko <marbar3778@yahoo.com>

* p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background.

The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.

* libs/log: format []byte as hexidecimal string (uppercased) (#5960)

Closes: #5806 

Co-authored-by: Lanie Hei <heixx011@umn.edu>

* docs: log level docs (#5945)

## Description

add section on configuring log levels

Closes: #XXX

* .github: fix fuzz-nightly job (#5965)

outputs is a property of the job, not an individual step.

* e2e: add control over the log level of nodes (#5958)

* mempool: fix reactor tests (#5967)

## Description

Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader.

Closes: #5956

* p2p: improve peerStore prototype (#5954)

This improves the `peerStore` prototype by e.g.:

* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.

* p2p: simplify PeerManager upgrade logic (#5962)

Follow-up from #5947, branched off of #5954.

This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:

* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.

This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simplier, and it should generally do the right thing in all cases I can think of.

* p2p: add PeerManager.Advertise() (#5957)

Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.

* blockchain v0: fix waitgroup data race (#5970)

## Description

Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.

Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).

Regardless, this is the appropriate fix.

closes: #5968

* tests: fix `make test` (#5966)

## Description
 
- bump deadlock dep to master
  - fixes `make test` since we now use `deadlock.Once`

Closes: #XXX

* terminate go-fuzz gracefully (w/ SIGINT) (#5973)

and preserve exit code.

```
2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s
make: *** [fuzz-mempool] Terminated
Makefile:5: recipe for target 'fuzz-mempool' failed
Error: Process completed with exit code 124.
```

https://github.com/tendermint/tendermint/runs/1766661614

`continue-on-error` should make GH ignore any error codes.

* p2p: add prototype PEX reactor for new stack (#5971)

This adds a prototype PEX reactor for the new P2P stack.

* proto/p2p: rename PEX messages and fields (#5974)

Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):

* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).

* p2p: resolve PEX addresses in PEX reactor (#5980)

This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.

I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:

1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.

2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.

I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.

* test/p2p: close transports to avoid goroutine leak failures (#5982)

* mempool: fix TestReactorNoBroadcastToSender (#5984)

## Description

Looks like I missed a test in the original PR when fixing the tests.

Closes: #5956

* mempool: fix mempool tests timeout (#5988)

* p2p: use stopCtx when dialing peers in Router (#5983)

This ensures we don't leak dial goroutines when shutting down the router.

* docs: fix typo in state sync example (#5989)

Co-authored-by: Erik Grinaker <erik@interchain.berlin>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: odidev <odidev@puresoftware.com>
Co-authored-by: Lanie Hei <heixx011@umn.edu>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Sergey <52304443+c29r3@users.noreply.github.com>
2021-01-26 11:46:21 -08:00
Erik Grinaker
66ba12d9bc test/e2e: tolerate up to 2/3 missed signatures for a validator (#5878)
E2E tests often fail due to fast sync stalls causing the validator to miss signing blocks. This increases the tolerance for missed signatures to 2/3 to allow validators to spend more time starting up.
2021-01-08 11:06:51 +00:00
Marko
09cf0bcb01 privval: add grpc (#5725)
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
2021-01-06 10:49:30 -08:00
Erik Grinaker
1570d26f84 test/e2e: add conceptual overview (#5857)
This should be useful to understand the overall purpose and structure of the end-to-end tests.
2021-01-04 17:51:49 +00:00
Erik Grinaker
17ca6c6c98 test/e2e: disable abci/grpc and blockchain/v2 due to flake (#5854)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-04 15:19:07 +00:00
Callum Waters
3283a84ab2 cmd: hyphen case cli and config (#5777) 2020-12-11 13:14:04 +01:00
Anton Kaliaev
28e79a4d02 cmd: modify gen_node_key to print key to STDOUT (#5772)
closes: #5770
closes: #5769

also, include node ID in the output (#5769) and modify NodeKey to use
value semantics (it makes perfect sense for NodeKey to not be a
pointer).
2020-12-10 11:02:35 +04:00
Anton Kaliaev
243ff4b43d blockchain/v1: remove in favor of v2 (#5728) 2020-12-03 09:35:47 +04:00
Anton Kaliaev
170cb70e19 test/e2e: enable v1 and v2 blockchains (#5702)
* test/e2e: enable v1 and v2 blockchains

* modify networks/ci.toml
2020-11-23 13:32:22 +00:00
Marko
b2d72dce7e e2e: use ed25519 for secretConn (remote signer) (#5678)
## Description

Hardcode ed25519 to dialTCPFn in e2e tests. 

I will backport `DefaultRequestHandler` fixes

This will be replaced when grpc is implemented.
2020-11-17 13:55:23 +00:00
Marko
e0950515ff test/e2e: fix secp failures (#5649) 2020-11-16 12:31:32 +01:00
Marko
bf35cc6443 cmd: add support for --key (#5612)
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
2020-11-09 15:22:36 +01:00
Callum Waters
9fe7b4fe77 remove misbehaviors from e2e generator (#5629) 2020-11-06 14:27:48 +01:00
Callum Waters
3922dde05d evidence: structs can independently form abci evidence (#5610) 2020-11-04 17:14:48 +01:00
Erik Grinaker
53022220f6 test: fix various E2E test issues (#5576)
* Don't use state sync for nodes starting at initial height.
* Also remove stopped containers when cleaning up.
* Start nodes in order of startAt, mode, name to avoid full nodes starting before their seeds.
* Tweak network waiting to avoid halts caused by validator changes and perturbations.
* Disable most tests for seed nodes, which aren't always able to join consensus.
* Disable `blockchain/v2` due to known bugs.
2020-10-27 16:22:00 +00:00
Erik Grinaker
7daf6a1a03 test: disable E2E misbehaviors due to bugs (#5569)
Disables misbehaviors in E2E testnets due to failures caused by #5554 and #5560. Should be re-enabled once these are fixed.
2020-10-26 10:28:11 +00:00
Erik Grinaker
10dda219a1 test: fix handling of start height in generated E2E testnets (#5563)
In #5488 the E2E testnet generator changed to setting explicit `StartAt` heights for initial nodes. This broke the runner, which expected all initial nodes to have `StartAt: 0`, as well as validator set scheduling in the generator. Testnet loading now normalizes initial nodes to have `StartAt: 0`.

This also tweaks waiting for misbehavior heights to only use an additional wait if there actually is any misbehavior in the testnet, and to output information when waiting.
2020-10-26 10:09:46 +00:00
Callum Waters
50b91867c3 test: add evidence e2e tests (#5488) 2020-10-23 12:33:08 +02:00
Erik Grinaker
d11e5993b1 test: tag E2E Docker resources and autoremove them (#5558)
Fixes #5555.
2020-10-23 08:17:15 +00:00
Erik Grinaker
e0e006d10f test: run remaining E2E testnets on run-multiple.sh failure (#5557)
Fixes #5542.
2020-10-22 20:53:27 +00:00
Erik Grinaker
8c5fe166a6 test: enable restart/kill perturbations in E2E tests (#5537)
When #5536 lands we can re-enable restart/kill perturbations in E2E tests.
2020-10-20 19:30:07 +00:00
Erik Grinaker
9e6248c0d7 test: enable blockchain v2 in E2E testnet generator (#5533)
When #5499 and #5530 land, we can re-enable v2 in the E2E testnet generator (and thus the nightly E2E tests).
2020-10-20 11:44:22 +00:00
Erik Grinaker
3dabfbeae0 test: enable ABCI gRPC client in E2E testnets (#5521)
Once #5520 lands, we can re-enable gRPC ABCI protocol in the E2E testnets.
2020-10-20 08:27:58 +00:00
Erik Grinaker
f0d4ddcf3c test: tweak E2E tests for nightly runs (#5512) 2020-10-16 12:10:29 +02:00
Erik Grinaker
b6979e7fbd test: clean up E2E test volumes using a container (#5509) 2020-10-15 20:59:16 +02:00
Erik Grinaker
8a67968416 github: add nightly E2E testnet action (#5480) 2020-10-14 16:55:02 +02:00
Erik Grinaker
260cc5dd69 test/e2e: add random testnet generator (#5479)
Closes #5291. Adds a randomized testnet generator. Nightly CI job will be submitted separately. A few of the testnets can be a bit flaky, even after disabling known-faulty behavior and making minor tweaks, and the larger networks may be too resource-intensive to run in CI - this will be optimized separately.
2020-10-09 12:33:48 +00:00
Erik Grinaker
8d28e7467c test: add E2E test for node peering (#5465)
This was a missing test case from the old P2P tests removed in #5453, which makes sure that all nodes are able to peer with each other regardless of how they discover peers.

Fixes #2795, since the default CI testnet uses a combination of (partially meshed) persistent peers and PEX-based seed nodes.
2020-10-06 06:00:20 +00:00
Erik Grinaker
7e27e9b852 test: add GitHub action for end-to-end tests (#5452)
Partial fix for #5291.
2020-10-05 12:44:35 +02:00
Erik Grinaker
090afe30f9 test: add basic end-to-end test cases (#5450)
Partial fix for #5291.

This adds a basic set of test cases for core network invariants. Although small, it is sufficient to replace and extend the current set of P2P tests. Further test cases can be added later.
2020-10-05 10:23:12 +00:00
Erik Grinaker
250c3aa92e test: add end-to-end testing framework (#5435)
Partial fix for #5291. For details, see [README.md](https://github.com/tendermint/tendermint/blob/erik/e2e-tests/test/e2e/README.md) and [RFC-001](https://github.com/tendermint/tendermint/blob/master/docs/rfc/rfc-001-end-to-end-testing.md).

This only includes a single test case under `test/e2e/tests/`, as a proof of concept - additional test cases will be submitted separately. A randomized testnet generator will also be submitted separately, there a currently just a handful of static testnets under `test/e2e/networks/`. This will eventually replace the current P2P tests and run in CI.
2020-10-05 09:35:01 +00:00