Commit Graph

48 Commits

Author SHA1 Message Date
Cyrus Goh
5182ffee25 docs: master → docs-staging (#5990)
* Makefile: always pull image in proto-gen-docker. (#5953)

The `proto-gen-docker` target didn't pull an updated Docker image, and would use a local image if present which could be outdated and produce wrong results.

* test: fix TestPEXReactorRunning data race (#5955)

Fixes #5941.

Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway.

* test/fuzz: move fuzz tests into this repo (#5918)

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>

Closes #5907

- add init-corpus to blockchain reactor
- remove validator-set FromBytes test
now that we have proto, we don't need to test it! bye amino
- simplify mempool test
do we want to test remote ABCI app?
- do not recreate mux on every crash in jsonrpc test
- update p2p pex reactor test
- remove p2p/listener test
the API has changed + I did not understand what it's tested anyway
- update secretconnection test
- add readme and makefile
- list inputs in readme
- add nightly workflow
- remove blockchain fuzz test
EncodeMsg / DecodeMsg no longer exist

* docker: dont login when in PR (#5961)

* docker: release Linux/ARM64 image (#5925)

Co-authored-by: Marko <marbar3778@yahoo.com>

* p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background.

The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.

* libs/log: format []byte as hexidecimal string (uppercased) (#5960)

Closes: #5806 

Co-authored-by: Lanie Hei <heixx011@umn.edu>

* docs: log level docs (#5945)

## Description

add section on configuring log levels

Closes: #XXX

* .github: fix fuzz-nightly job (#5965)

outputs is a property of the job, not an individual step.

* e2e: add control over the log level of nodes (#5958)

* mempool: fix reactor tests (#5967)

## Description

Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader.

Closes: #5956

* p2p: improve peerStore prototype (#5954)

This improves the `peerStore` prototype by e.g.:

* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.

* p2p: simplify PeerManager upgrade logic (#5962)

Follow-up from #5947, branched off of #5954.

This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:

* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.

This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simplier, and it should generally do the right thing in all cases I can think of.

* p2p: add PeerManager.Advertise() (#5957)

Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.

* blockchain v0: fix waitgroup data race (#5970)

## Description

Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.

Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).

Regardless, this is the appropriate fix.

closes: #5968

* tests: fix `make test` (#5966)

## Description
 
- bump deadlock dep to master
  - fixes `make test` since we now use `deadlock.Once`

Closes: #XXX

* terminate go-fuzz gracefully (w/ SIGINT) (#5973)

and preserve exit code.

```
2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s
make: *** [fuzz-mempool] Terminated
Makefile:5: recipe for target 'fuzz-mempool' failed
Error: Process completed with exit code 124.
```

https://github.com/tendermint/tendermint/runs/1766661614

`continue-on-error` should make GH ignore any error codes.

* p2p: add prototype PEX reactor for new stack (#5971)

This adds a prototype PEX reactor for the new P2P stack.

* proto/p2p: rename PEX messages and fields (#5974)

Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):

* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).

* p2p: resolve PEX addresses in PEX reactor (#5980)

This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.

I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:

1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.

2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.

I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.

* test/p2p: close transports to avoid goroutine leak failures (#5982)

* mempool: fix TestReactorNoBroadcastToSender (#5984)

## Description

Looks like I missed a test in the original PR when fixing the tests.

Closes: #5956

* mempool: fix mempool tests timeout (#5988)

* p2p: use stopCtx when dialing peers in Router (#5983)

This ensures we don't leak dial goroutines when shutting down the router.

* docs: fix typo in state sync example (#5989)

Co-authored-by: Erik Grinaker <erik@interchain.berlin>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: odidev <odidev@puresoftware.com>
Co-authored-by: Lanie Hei <heixx011@umn.edu>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Sergey <52304443+c29r3@users.noreply.github.com>
2021-01-26 11:46:21 -08:00
Callum Waters
5cbb8263b4 patch concurrency issue with pruning in the light store (#5910) 2021-01-17 17:14:14 +01:00
Callum Waters
ca285844ea light: fix light store deadlock (#5901) 2021-01-14 17:22:09 +01:00
Callum Waters
385ea1db7d store: use db iterators for pruning and range-based queries (#5848) 2021-01-08 13:12:54 +01:00
Callum Waters
9b9222f461 store: order-preserving varint key encoding (#5771) 2021-01-05 16:53:26 +01:00
Callum Waters
f368b91caf light: minor fixes / standardising errors (#5716)
## Description

I'm just doing a self audit of the light client. There's a few things I've changed

- Validate trust level in `VerifyNonAdjacent` function
- Make errNoWitnesses public (it's something people running software on top of a light client should be able to parse)
- Remove `ChainID` check of witnesses on start up. We do this already when we compare the first header with witnesses
- Remove `ChainID()` from provider interface

Closes: #4538
2020-12-01 12:53:53 +00:00
Callum Waters
68dc751a8c light: ensure required header fields are present for verification (#5677) 2020-11-19 12:02:58 +01:00
Callum Waters
909da42789 light: make fraction parts uint64, ensuring that it is always positive (#5655) 2020-11-17 14:23:16 +01:00
Callum Waters
3922dde05d evidence: structs can independently form abci evidence (#5610) 2020-11-04 17:14:48 +01:00
Anton Kaliaev
627f7b5989 light: run detector for sequentially validating light client (#5538)
Closes #5445
2020-11-02 12:42:03 +04:00
Anton Kaliaev
8e6194626e light: model-based tests (#5461)
This is the first iteration of model-based testing in Go Tendermint. The test runner is using the static JSON fixtures located under the ./json directory. In the future, the Rust tensgen binary will be used to generate those (given the static intermediate scenarios and the test seed, which will be published along with each testgen release).

Closes: #5322
2020-11-02 12:07:18 +04:00
Anton Kaliaev
7121f68f25 light/rpc: fix ABCIQuery (#5375)
Closes #5106
2020-10-12 16:36:37 +04:00
Anton Kaliaev
12ebd7735a light: cross-check the very first header (#5429)
Closes #5428
2020-10-09 14:29:22 +04:00
Callum Waters
a4b7018732 light: expand on errors and docs (#5443) 2020-10-02 20:05:15 +02:00
Callum Waters
f02987e7bc simplify commit and validators rpc calls (#5393) 2020-09-25 11:19:04 +02:00
Callum Waters
ca8a404c7c cli: light home dir should default to where the full node default is (#5392) 2020-09-25 08:45:56 +02:00
Anton Kaliaev
4b99502d5b config: set time_iota_ms to timeout_commit in test genesis (#5386)
also, document consensus parameters.
https://forum.cosmos.network/t/consensus-timeouts-explained/1421

Closes #4489
2020-09-23 12:24:45 +04:00
Anton Kaliaev
85a4be87a7 rpc/client: take context as first param (#5347)
Closes #5145

also applies to light/client
2020-09-23 09:21:57 +04:00
Callum Waters
ed002cea7e evidence: introduction of LightClientAttackEvidence and refactor of evidence lifecycle (#5361)
evidence: modify evidence types (#5342)

light: detect light client attacks (#5344)

evidence: refactor evidence pool (#5345)

abci: application evidence prepared by evidence pool (#5354)
2020-09-22 10:22:54 +02:00
Marko
6ab2a19088 header: check block protocol (#5340)
## Description

Check block protocol version in header validate basic. 

I tried searching for where we check the P2P protocol version but was unable to find it. When we check compatibility with a node we check we both have the same block protocol and are on the same network, but we do not check if we are on the same P2P protocol. It makes sense if there is a handshake change because we would not be able to establish a secure connection, but a p2p protocol version bump may be because of a p2p message change, which would go unnoticed until that message is sent over the wire.  Is this purposeful?

Closes: #4790
2020-09-09 09:13:18 +00:00
Marko
0ed8dba991 lint: enable errcheck (#5336)
## Description

Enable errcheck linter throughout the codebase

Closes: #5059
2020-09-07 15:03:18 +00:00
Callum Waters
e2927d2088 light: move dropout handling and invalid data to the provider (#5308) 2020-09-02 18:28:48 +02:00
Marko
e0140e4beb evidence: remove ConflictingHeaders type (#5317)
## Description

Remove ConflictingHeaders & compositeEvidence types


Ref #5288
2020-09-01 16:34:37 +00:00
Callum Waters
2b58a62721 light: implement light block (#5298) 2020-09-01 17:45:55 +02:00
Callum Waters
86707862d4 fix validator set proposer priorities in light client provider (#5307) 2020-08-31 12:47:38 +02:00
Callum Waters
b7f6e47a42 evidence: modularise evidence by moving verification function into evidence package (#5234) 2020-08-20 18:11:21 +02:00
Marko
2d167aefcf ci: freeze golangci action version (#5196)
## Description

This PR updates golang-ci to latest and stops looking at master for the action. 

Closes: #XXX
2020-08-03 07:57:06 +00:00
Marko
2ac5a559b4 libs: wrap mutexes for build flag with godeadlock (#5126)
## Description

This PR wraps the stdlib sync.(RW)Mutex & godeadlock.(RW)Mutex. This enables using go-deadlock via a build flag instead of using sed to replace sync with godeadlock in all files

Closes: #3242
2020-07-20 07:55:09 +00:00
Anton Kaliaev
5223cbac27 light: return if target header is invalid (#5124)
Closes #5120
2020-07-17 06:13:00 +00:00
Anton Kaliaev
a08316f16a light: use bisection (not VerifyCommitTrusting) when verifying a head… (#5119)
Closes #4934

* light: do not compare trusted header w/ witnesses

we don't have trusted state to bisect from

* check header before checking height

otherwise you can get nil panic
2020-07-15 15:44:30 +04:00
Marko
6ccccb0933 lint: errcheck (#5091)
## Description

add more error checks to tests


gonna do a third PR that tackles the non test cases
2020-07-14 11:04:41 +00:00
Callum Waters
de8cb8c16d light: fix rpc calls: /block_results & /validators (#5104) 2020-07-10 14:20:31 +02:00
Anton Kaliaev
42be533129 types: verify commit fully
Since the light client work introduced in v0.33 it appears full nodes
are no longer fully verifying commit signatures during block execution -
they stop after +2/3. See in VerifyCommit:
0c7fd316eb/types/validator_set.go (L700-L703)

This means proposers can propose blocks that contain valid +2/3
signatures and then the rest of the signatures can be whatever they
want. They can claim that all the other validators signed just by
including a CommitSig with arbitrary signature data. While this doesn't
seem to impact safety of Tendermint per se, it means that Commits may
contain a lot of invalid data. This is already true of blocks, since
they can include invalid txs filled with garbage, but in that case the
application knows they they are invalid and can punish the proposer. But
since applications dont verify commit signatures directly (they trust
tendermint to do that), they won't be able to detect it.

This can impact incentivization logic in the application that depends on
the LastCommitInfo sent in BeginBlock, which includes which validators
signed. For instance, Gaia incentivizes proposers with a bonus for
including more than +2/3 of the signatures. But a proposer can now claim
that bonus just by including arbitrary data for the final -1/3 of
validators without actually waiting for their signatures. There may be
other tricks that can be played because of this.

In general, the full node should be a fully verifying machine. While
it's true that the light client can avoid verifying all signatures by
stopping after +2/3, the full node can not. Thus the light client and
full node should use distinct VerifyCommit functions if one is going to
stop after +2/3 or otherwise perform less validation (for instance light
clients can also skip verifying votes for nil while full nodes can not).

See a commit with a bad signature that verifies here: 56367fd. From what
I can tell, Tendermint will go on to think this commit is valid and
forward this data to the app, so the app will think the second validator
actually signed when it clearly did not.
2020-07-02 15:41:49 +02:00
Erik Grinaker
04b8cf7879 deps: bump tm-db to 0.6.0 (#5058) 2020-06-29 16:07:37 +02:00
Callum Waters
65d7ce9c9c evidence: improve amnesia evidence handling (#5003)
fix bug so that PotentialAmnesiaEvidence is being gossiped

handle inbound amnesia evidence correctly

add method to check if potential amnesia evidence is on trial

fix a bug with the height when we upgrade to amnesia evidence

change evidence to using just pointers.

More logging in the evidence module

Co-authored-by: Marko <marbar3778@yahoo.com>
2020-06-23 17:09:14 +02:00
Marko
dedf0d2350 proto: folder structure adhere to buf (#5025) 2020-06-22 10:00:51 +02:00
Marko
51da4fe356 types: rename partsheader to partsetheader (#5029)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-06-22 09:38:03 +02:00
Anton Kaliaev
257a374b78 rpc: add /check_tx endpoint (#5017)
Closes #4549
2020-06-19 09:25:52 +04:00
Marko
f6243d8b9e privval: migrate to protobuf (#4985) 2020-06-11 11:54:02 +02:00
Callum Waters
b1dba352b0 light: added more tests for pruning, initialization and bisection (#4978) 2020-06-10 18:56:24 +02:00
Marko
46f6d17601 crypto/merkle: remove simple prefix (#4989)
## Description

This PR removes simple prefix from all types in the crypto/merkle directory.

The two proto types `Proof` & `ProofOp` have been moved to the `proto/crypto/merkle` directory.

proto messge `Proof` was renamed to `ProofOps` and `SimpleProof` message to `Proof`. 

Closes: #2755
2020-06-10 14:57:38 +00:00
Erik Grinaker
ba3a2dde37 rpc: replace Amino with new JSON encoder (#4968)
Migrates the `rpc` package to use new JSON encoder in #4955. Branched off of that PR.

Tests pass, but I haven't done any manual testing beyond that. This should be handled as part of broader 0.34 testing.
2020-06-08 12:04:05 +00:00
Marko
4a87d60736 light: migrate to proto (#4964) 2020-06-08 09:14:58 +02:00
Marko
9ef266b88f types: migrate params to protobuf (#4962) 2020-06-05 15:29:53 +02:00
Marko
a88537bb88 ints: stricter numbers (#4939) 2020-06-04 16:34:56 +02:00
Callum Waters
d53a8d0377 light: implement validate basic (#4916)
run a validate basic on inbound validator sets and headers before further processing them
2020-06-04 13:45:39 +02:00
Anton Kaliaev
ce3c9c2341 rpc/core: return an error if page=0 (#4947)
* rpc/core: return an error if `page=0`

Closes #4942

affected endpoints:

- /validators
- /tx_search

* swagger: update doc for /unconfirmed_txs
2020-06-03 16:51:51 +04:00
Marko
c2578e2262 light: rename lite2 to light & remove lite (#4946)
This PR removes lite & renames lite2 to light throughout the repo

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

Closes: #4944
2020-06-03 10:13:42 +00:00