Commit Graph

1113 Commits

Author SHA1 Message Date
Cyrus Goh
5182ffee25 docs: master → docs-staging (#5990)
* Makefile: always pull image in proto-gen-docker. (#5953)

The `proto-gen-docker` target didn't pull an updated Docker image, and would use a local image if present which could be outdated and produce wrong results.

* test: fix TestPEXReactorRunning data race (#5955)

Fixes #5941.

Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway.

* test/fuzz: move fuzz tests into this repo (#5918)

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>

Closes #5907

- add init-corpus to blockchain reactor
- remove validator-set FromBytes test
now that we have proto, we don't need to test it! bye amino
- simplify mempool test
do we want to test remote ABCI app?
- do not recreate mux on every crash in jsonrpc test
- update p2p pex reactor test
- remove p2p/listener test
the API has changed + I did not understand what it's tested anyway
- update secretconnection test
- add readme and makefile
- list inputs in readme
- add nightly workflow
- remove blockchain fuzz test
EncodeMsg / DecodeMsg no longer exist

* docker: dont login when in PR (#5961)

* docker: release Linux/ARM64 image (#5925)

Co-authored-by: Marko <marbar3778@yahoo.com>

* p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background.

The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.

* libs/log: format []byte as hexidecimal string (uppercased) (#5960)

Closes: #5806 

Co-authored-by: Lanie Hei <heixx011@umn.edu>

* docs: log level docs (#5945)

## Description

add section on configuring log levels

Closes: #XXX

* .github: fix fuzz-nightly job (#5965)

outputs is a property of the job, not an individual step.

* e2e: add control over the log level of nodes (#5958)

* mempool: fix reactor tests (#5967)

## Description

Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader.

Closes: #5956

* p2p: improve peerStore prototype (#5954)

This improves the `peerStore` prototype by e.g.:

* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.

* p2p: simplify PeerManager upgrade logic (#5962)

Follow-up from #5947, branched off of #5954.

This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:

* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.

This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simplier, and it should generally do the right thing in all cases I can think of.

* p2p: add PeerManager.Advertise() (#5957)

Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.

* blockchain v0: fix waitgroup data race (#5970)

## Description

Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.

Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).

Regardless, this is the appropriate fix.

closes: #5968

* tests: fix `make test` (#5966)

## Description
 
- bump deadlock dep to master
  - fixes `make test` since we now use `deadlock.Once`

Closes: #XXX

* terminate go-fuzz gracefully (w/ SIGINT) (#5973)

and preserve exit code.

```
2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s
make: *** [fuzz-mempool] Terminated
Makefile:5: recipe for target 'fuzz-mempool' failed
Error: Process completed with exit code 124.
```

https://github.com/tendermint/tendermint/runs/1766661614

`continue-on-error` should make GH ignore any error codes.

* p2p: add prototype PEX reactor for new stack (#5971)

This adds a prototype PEX reactor for the new P2P stack.

* proto/p2p: rename PEX messages and fields (#5974)

Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):

* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).

* p2p: resolve PEX addresses in PEX reactor (#5980)

This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.

I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:

1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.

2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.

I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.

* test/p2p: close transports to avoid goroutine leak failures (#5982)

* mempool: fix TestReactorNoBroadcastToSender (#5984)

## Description

Looks like I missed a test in the original PR when fixing the tests.

Closes: #5956

* mempool: fix mempool tests timeout (#5988)

* p2p: use stopCtx when dialing peers in Router (#5983)

This ensures we don't leak dial goroutines when shutting down the router.

* docs: fix typo in state sync example (#5989)

Co-authored-by: Erik Grinaker <erik@interchain.berlin>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: odidev <odidev@puresoftware.com>
Co-authored-by: Lanie Hei <heixx011@umn.edu>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Sergey <52304443+c29r3@users.noreply.github.com>
2021-01-26 11:46:21 -08:00
Callum
af723eca8a use correct source of evidence time
Conflicting votes are now sent to the evidence pool to form duplicate vote evidence only once
the height of the evidence is finished and the time of the block finalised.
2021-01-19 16:00:02 +01:00
Callum Waters
03a6fb2777 state: prune states using an iterator (#5864) 2021-01-08 17:05:27 +01:00
Aleksandr Bezobchuk
e986602649 evidence: p2p refactor (#5747) 2021-01-06 11:53:18 -05:00
Erik Grinaker
85353d9af5 test/consensus: improve WaitGroup handling in Byzantine tests (#5861)
Fixes #5845.
2021-01-05 10:44:03 +00:00
Erik Grinaker
8e7d431f6f p2p: rename ID to NodeID 2021-01-04 11:25:20 +01:00
Anton Kaliaev
aef1ac7ba5 modify Reactor priorities (#5826)
blockchain/vX reactor priority was decreased because during the normal operation
(i.e. when the node is not fast syncing) blockchain priority can't be
the same as consensus reactor priority. Otherwise, it's theoretically possible to
slow down consensus by constantly requesting blocks from the node.

NOTE: ideally blockchain/vX reactor priority would be dynamic. e.g. when
the node is fast syncing, the priority is 10 (max), but when it's done
fast syncing - the priority gets decreased to 5 (only to serve blocks
for other nodes). But it's not possible now, therefore I decided to
focus on the normal operation (priority = 5).

evidence and consensus critical messages are more important than
the mempool ones, hence priorities are bumped by 1 (from 5 to 6).

statesync reactor priority was changed from 1 to 5 to be the same as
blockchain/vX priority.

Refs https://github.com/tendermint/tendermint/issues/5816
2020-12-23 12:31:00 +00:00
Dev Ojha
8c0af72987 consensus: deprecate time iota ms (#5792)
time_iota_ms is intended to ensure that an honest validator always generates timestamps 
with time increasing monotonically. For this purpose, it always suffices to have this parameter
set to `1ms`. Allowing users to choose different numbers increases bug surface area.
Thus the code now ignores the user provided time_iota_ms parameter (marking it as unused), 
and uses 1ms internally.
2020-12-17 17:32:42 +01:00
Anton Kaliaev
be6c016664 consensus: change log level to error when adding vote 2020-12-17 15:59:18 +04:00
Erik Grinaker
bcfc889f25 p2p: implement new Transport interface (#5791)
This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog.

The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack.

The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely.

There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
2020-12-15 15:08:16 +00:00
Tess Rinearson
79890d8393 reactors: omit incoming message bytes from reactor logs (#5743)
After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)
2020-12-03 22:12:08 +00:00
Alessio Treglia
bcb7044d64 consensus: fix flaky tests (#5734)
Replace testing.T.Cleanup() with deferred function
calls in test helpers as those cleanup functions
need to be called once the helper returns and not
when the entire test ends.

This reverts a few of the changes introduced in #5723.

Thanks: @erikgrinaker for pointing this out.
Ref: #5732
2020-12-02 14:21:06 +00:00
Anton Kaliaev
b1bbd37519 libs/bits: validate BitArray in FromProto (#5720)
Closes #5705
2020-12-01 12:44:56 +00:00
Anton Kaliaev
e13b4386ff abci: modify Client interface and socket client (#5673)
`abci.Client`:

- Sync and Async methods now accept a context for cancellation
    * grpc client uses context to cancel both Sync and Async requests
    * local client ignores context parameter
    * socket client uses context to cancel Sync requests and to drop Async requests before sending them if context was cancelled prior to that

- Async methods return an error
    * socket client returns an error immediately if queue is full for Async requests
    * local client always returns nil error
    * grpc client returns an error if context was cancelled before we got response or the receiving queue had a space for response (do not confuse with the sending queue from the socket client)

- specify clients semantics in [doc.go](https://raw.githubusercontent.com/tendermint/tendermint/27112fffa62276bc016d56741f686f0f77931748/abci/client/doc.go)

`mempool.TxInfo`

- add optional `Context` to `TxInfo`, which can be used to cancel `CheckTx` request

Closes #5190
2020-11-30 16:46:16 +04:00
Alessio Treglia
0de4bec862 use Cleanup(),TempDir() in test cases (#5723)
Replace defer with t.Cleanup().

Replace the combination of ioutil.TempDir, error checking
and defer os.RemoveAll() with Go testing.T's new TempDir()
helper.

Mark auxiliary functions as test helpers.
2020-11-30 12:13:25 +00:00
Erik Grinaker
6f9f8b58ae test: fix TestByzantinePrevoteEquivocation flake (#5710)
This fixes spurious `TestByzantinePrevoteEquivocation` failures by extending the block range and time spent waiting for evidence. I've seen many runs where the evidence isn't committed until e.g. height 27. Haven't looked into _why_ this happens, but as long as the evidence is committed eventually and the test doesn't spuriously fail I'm (mostly) happy. WDYT @cmwaters?
2020-11-24 14:49:10 +00:00
Anton Kaliaev
f2f6a78809 docs: warn developers about calling blocking funcs in Receive (#5679)
Refs #2888
2020-11-17 15:37:35 +00:00
Marko
bf35cc6443 cmd: add support for --key (#5612)
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
2020-11-09 15:22:36 +01:00
Callum Waters
3922dde05d evidence: structs can independently form abci evidence (#5610) 2020-11-04 17:14:48 +01:00
Erik Grinaker
17383be202 consensus: open target WAL as read/write during autorepair (#5536)
Fixes #5422. That turned out to be a whole lot easier than expected.
2020-10-20 18:20:41 +00:00
Callum Waters
257b34b459 evidence: don't gossip consensus evidence too soon (#5528)
and don't return errors on seeing the same evidence twice
2020-10-20 18:47:40 +02:00
Marko
346aa14db5 fix lint failures with 1.31 (#5489) 2020-10-13 10:22:53 +02:00
Callum Waters
302aec6dcc evidence: use bytes instead of quantity to limit size (#5449)
## Description

Closes: #5432
2020-10-07 09:29:52 +00:00
Callum Waters
433bdf5063 consensus: check block parts don't exceed maximum block bytes (#5431) 2020-10-01 09:59:19 +02:00
Anton Kaliaev
7d2b3e305e docs: minor tweaks (#5404)
* docs: fix /validators description

Refs https://github.com/tendermint/spec/pull/169

* consensus: remove nil err from logging statement

* update UPGRADING.md

* note about LightBlocks
2020-09-25 11:05:56 +04:00
QuantumExplorer
ecd3cbc2bd fix a few typos (#5402) 2020-09-25 08:38:28 +04:00
Erik Grinaker
58b4deca86 blockstore: fix race conditions when loading data (#5382)
Fixes #5377 and comments in #4588 (review).
2020-09-23 10:13:43 +04:00
Callum Waters
ed002cea7e evidence: introduction of LightClientAttackEvidence and refactor of evidence lifecycle (#5361)
evidence: modify evidence types (#5342)

light: detect light client attacks (#5344)

evidence: refactor evidence pool (#5345)

abci: application evidence prepared by evidence pool (#5354)
2020-09-22 10:22:54 +02:00
Marko
56911ee352 state: define interface for state store (#5348)
## Description

Make an interface for the state store. 

Closes: #5213
2020-09-15 07:45:48 +00:00
Marko
0ed8dba991 lint: enable errcheck (#5336)
## Description

Enable errcheck linter throughout the codebase

Closes: #5059
2020-09-07 15:03:18 +00:00
Callum Waters
3359e0bf2f resurrect consensus tests (#5334) 2020-09-07 09:54:13 +04:00
Marko
b8d08b9ef4 lint: add errchecks (#5316)
## Description

Work towards enabling errcheck

ref #5059
2020-09-04 11:58:03 +00:00
Marko
4787c5b61a metrics: switch from gauge to histogram (#5326)
## Description

Part of the issue is to add metrics to the websocket connection. It seems this would require some moving around of things in the node pkg. I opted to not make this change now, and wait for when we do a node pkg refactor. 

If someone disagrees with this appraoch please let me know, I can attempt to get metrics into the rpc layer.

There is not a need to update documentation as it already states this metric is a histogram..

Closes: #1791
2020-09-03 13:08:13 +00:00
Erik Grinaker
63ea4f1d26 consensus: fix wrong proposer schedule for InitChain validators (#5329)
Fixes #5328.
2020-09-03 12:58:09 +00:00
Marko
f7d4fafa73 docs: add missing metrics (#5325)
## Description

Add missing metrics. 

`Blockchain/v2` exposes metrics for events. I don't find these as something a node operator should utilize as it does not bring insight. 

Closes: #XXX
2020-09-03 07:50:31 +00:00
Marko
710a97d850 evidence: remove amnesia & POLC (#5319)
## Description

remove unneeded types 

![](https://media1.giphy.com/media/fSAyceY3BCgtiQGnJs/giphy.gif)

ref #5288
2020-09-02 13:05:15 +00:00
dongsam
e30b125725 consensus: double-sign risk reduction (ADR-51) (#5147)
Implementation spec of Double Signing Risk Reduction [ADR-51](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-051-double-signing-risk-reduction.md) by B-Harvest
- Add `DoubleSignCheckHeight` config variable to ConsensusConfig for "How many blocks looks back to check existence of the node's consensus votes when before joining consensus"
- Add `consensus.double_sign_check_height` to `config.toml` and `tendermint node` as flag for set `DoubleSignCheckHeight`
- Set default `consensus.double_sign_check_height` to `0`  ( it could be adjustable in this PR, disable when 0  )

Refs

- [ADR-51](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-051-double-signing-risk-reduction.md)
- [https://github.com/tendermint/tendermint/issues/4059](https://github.com/tendermint/tendermint/issues/4059)
- [https://github.com/tendermint/tendermint/pull/4262](https://github.com/tendermint/tendermint/pull/4262)
2020-08-27 08:57:36 +04:00
Callum Waters
b7f6e47a42 evidence: modularise evidence by moving verification function into evidence package (#5234) 2020-08-20 18:11:21 +02:00
Marko
42e4e8b58e lint: add markdown linter (#5254) 2020-08-17 16:40:50 +02:00
Marko
9e98c74e3c crypto: API modifications (#5236)
## Description

This PR aims to make the crypto.PubKey interface more intuitive. 

Changes: 

- `VerfiyBytes` -> `VerifySignature`

Before `Bytes()` was amino encoded, now since it is the byte representation should we get rid of it entirely?

EDIT: decided to keep `Bytes()` as it is useful if you are using the interface instead of the concrete key

Closes: #XXX
2020-08-13 12:29:16 +00:00
Erik Grinaker
e1a1395cf4 consensus: don't check InitChain app hash vs genesis app hash, replace it (#5237)
Followup from #5227. Instead of checking `ResponseInitChain.app_hash` against the genesis doc app hash, we instead replace it. We should probably remove the genesis doc app hash completely, and rely solely on the one from `InitChain`, I'll open a separate issue to discuss this.
2020-08-13 08:58:07 +00:00
Erik Grinaker
cc247c091b genesis: add support for arbitrary initial height (#5191)
Adds a genesis parameter `initial_height` which specifies the initial block height, as well as ABCI `RequestInitChain.InitialHeight` to pass it to the ABCI application, and `State.InitialHeight` to keep track of the initial height throughout the code. Fixes #2543, based on [RFC-002](https://github.com/tendermint/spec/pull/119). Spec changes in https://github.com/tendermint/spec/pull/135.
2020-08-11 17:03:28 +00:00
Erik Grinaker
08ffe13295 abci: add ResponseInitChain.app_hash, check and record it (#5227)
Fixes #5177.
2020-08-11 14:28:11 +00:00
Marko
40bd416d59 test: protobuf vectors for reactors (#5221)
## Description

Add test vectors for all reactors

- [x] state-sync
- [x] privval
- [x] mempool
- [x] p2p
- [x] evidence
- [ ] light?

this PR is primarily oriented at testvectors for things going over the wire. should we expand the testvectors into types as well?

Closes: #XXX
2020-08-11 14:00:11 +00:00
Callum Waters
312c4f8fe1 evidence: change evidence time to block time (#5219)
adds blockstore interface to evidence and adds fix to byzantine test
2020-08-11 14:39:07 +02:00
Erik Grinaker
f66b7a8e32 merkle: return hashes for empty merkle trees (#5193)
Fixes #5192.

@liamsi Can you verify that the test vectors match the Rust implementation? I updated `ProofsFromByteSlices()` as well, anything else that should be updated?
2020-08-11 10:31:05 +00:00
Callum Waters
77429b71d6 fix assertions on byzantine test (#5171) 2020-07-30 15:24:29 +02:00
Anton Kaliaev
0d8d721999 consensus: only call privValidator.GetPubKey once per block (#5143)
Closes #4865
2020-07-30 09:44:04 +00:00
Marko
dc71f265aa types: check if nil or empty valset (#5167)
Solves #5138 in the way that if a validatorSet is nil or empty it will not try to transform it to protobug

Co-authored-by: Callum Michael Waters <cmwaters19@gmail.com>
2020-07-29 20:16:42 +02:00
Callum Waters
bc8b3e830b consensus: added byzantine test, modified previous test (#5150) 2020-07-28 15:27:22 +02:00