tendermint

mirror of https://github.com/tendermint/tendermint.git synced 2026-06-10 00:03:04 +00:00

Author	SHA1	Message	Date
Cyrus Goh	5182ffee25	docs: master → docs-staging (#5990 ) * Makefile: always pull image in proto-gen-docker. (#5953) The `proto-gen-docker` target didn't pull an updated Docker image, and would use a local image if present which could be outdated and produce wrong results. * test: fix TestPEXReactorRunning data race (#5955) Fixes #5941. Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway. * test/fuzz: move fuzz tests into this repo (#5918) Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com> Closes #5907 - add init-corpus to blockchain reactor - remove validator-set FromBytes test now that we have proto, we don't need to test it! bye amino - simplify mempool test do we want to test remote ABCI app? - do not recreate mux on every crash in jsonrpc test - update p2p pex reactor test - remove p2p/listener test the API has changed + I did not understand what it's tested anyway - update secretconnection test - add readme and makefile - list inputs in readme - add nightly workflow - remove blockchain fuzz test EncodeMsg / DecodeMsg no longer exist * docker: dont login when in PR (#5961) * docker: release Linux/ARM64 image (#5925) Co-authored-by: Marko <marbar3778@yahoo.com> * p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). 
It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well. * libs/log: format []byte as hexidecimal string (uppercased) (#5960) Closes: #5806 Co-authored-by: Lanie Hei <heixx011@umn.edu> * docs: log level docs (#5945) ## Description add section on configuring log levels Closes: #XXX * .github: fix fuzz-nightly job (#5965) outputs is a property of the job, not an individual step. * e2e: add control over the log level of nodes (#5958) * mempool: fix reactor tests (#5967) ## Description Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader. Closes: #5956 * p2p: improve peerStore prototype (#5954) This improves the `peerStore` prototype by e.g.: * Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance. * Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations. 
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented. * Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full. * Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics. * p2p: simplify PeerManager upgrade logic (#5962) Follow-up from #5947, branched off of #5954. This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes: * Add `evict` map which queues up peers to explicitly evict. * `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`). * `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity. * `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected). * `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer. This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simpler, and it should generally do the right thing in all cases I can think of. * p2p: add PeerManager.Advertise() (#5957) Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement. * blockchain v0: fix waitgroup data race (#5970) ## Description Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add. Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?). Regardless, this is the appropriate fix. 
closes: #5968 * tests: fix `make test` (#5966) ## Description - bump deadlock dep to master - fixes `make test` since we now use `deadlock.Once` Closes: #XXX * terminate go-fuzz gracefully (w/ SIGINT) (#5973) and preserve exit code. ``` 2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s make: *** [fuzz-mempool] Terminated Makefile:5: recipe for target 'fuzz-mempool' failed Error: Process completed with exit code 124. ``` https://github.com/tendermint/tendermint/runs/1766661614 `continue-on-error` should make GH ignore any error codes. * p2p: add prototype PEX reactor for new stack (#5971) This adds a prototype PEX reactor for the new P2P stack. * proto/p2p: rename PEX messages and fields (#5974) Fixes #5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility): * `Message` to `PexMessage` (as it's only used for PEX messages). * `PexAddrs` to `PexResponse`. * `PexResponse.Addrs` to `PexResponse.Addresses`. * `NetAddress` to `PexAddress` (as it's only used by PEX). * p2p: resolve PEX addresses in PEX reactor (#5980) This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974. I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options: 1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on. 2. 
Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs. I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this. * test/p2p: close transports to avoid goroutine leak failures (#5982) * mempool: fix TestReactorNoBroadcastToSender (#5984) ## Description Looks like I missed a test in the original PR when fixing the tests. Closes: #5956 * mempool: fix mempool tests timeout (#5988) * p2p: use stopCtx when dialing peers in Router (#5983) This ensures we don't leak dial goroutines when shutting down the router. * docs: fix typo in state sync example (#5989) Co-authored-by: Erik Grinaker <erik@interchain.berlin> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> Co-authored-by: Marko <marbar3778@yahoo.com> Co-authored-by: odidev <odidev@puresoftware.com> Co-authored-by: Lanie Hei <heixx011@umn.edu> Co-authored-by: Callum Waters <cmwaters19@gmail.com> Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com> Co-authored-by: Sergey <52304443+c29r3@users.noreply.github.com>	2021-01-26 11:46:21 -08:00
Aleksandr Bezobchuk	68bd2116f0	mempool: p2p refactor (#5919 )	2021-01-22 09:34:12 -05:00
Aleksandr Bezobchuk	62d7a5d028	blockchain v0: p2p refactor (#5858 )	2021-01-18 16:35:11 -05:00
Erik Grinaker	0555772d3a	blockchain/v0: stop tickers on poolRoutine exit (#5860 ) Fixes #5841.	2021-01-05 14:45:24 +00:00
Erik Grinaker	1e1d087494	blockchain/v2: fix missing mutex unlock (#5862 ) Fixes #5843.	2021-01-05 14:35:20 +00:00
Erik Grinaker	b4ce1de44a	p2p: rename NodeInfo.DefaultNodeID to NodeID	2021-01-04 11:25:20 +01:00
Erik Grinaker	8e7d431f6f	p2p: rename ID to NodeID	2021-01-04 11:25:20 +01:00
Anton Kaliaev	aef1ac7ba5	modify Reactor priorities (#5826 ) blockchain/vX reactor priority was decreased because during the normal operation (i.e. when the node is not fast syncing) blockchain priority can't be the same as consensus reactor priority. Otherwise, it's theoretically possible to slow down consensus by constantly requesting blocks from the node. NOTE: ideally blockchain/vX reactor priority would be dynamic. e.g. when the node is fast syncing, the priority is 10 (max), but when it's done fast syncing - the priority gets decreased to 5 (only to serve blocks for other nodes). But it's not possible now, therefore I decided to focus on the normal operation (priority = 5). evidence and consensus critical messages are more important than the mempool ones, hence priorities are bumped by 1 (from 5 to 6). statesync reactor priority was changed from 1 to 5 to be the same as blockchain/vX priority. Refs https://github.com/tendermint/tendermint/issues/5816	2020-12-23 12:31:00 +00:00
Erik Grinaker	e198edf20e	p2p: remove `NodeInfo` interface and rename `DefaultNodeInfo` struct (#5799 ) The `NodeInfo` interface does not appear to serve any purpose at all, so I removed it and renamed the `DefaultNodeInfo` struct to `NodeInfo` (including the Protobuf representations). Let me know if this is actually needed for anything. Only the Protobuf rename is listed in the changelog, since we do not officially support API stability of the `p2p` package (according to `README.md`). The on-wire protocol remains compatible.	2020-12-15 18:54:25 +00:00
Anton Kaliaev	5aa859c370	blockchain/v2: send status request when new peer joins (#5774 ) Closes #5766 * memoize the scSchedulerFail error to avoid printing it every scheduleFreq * blockchain/v2: modify switchIO funcs to accept peer instead of peerID	2020-12-14 11:25:28 +04:00
Anton Kaliaev	89e908e340	blockchain/v0: relax termination conditions and increase sync timeout (#5741 ) Closes: #5718	2020-12-08 11:33:03 +04:00
Tess Rinearson	79890d8393	reactors: omit incoming message bytes from reactor logs (#5743 ) After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)	2020-12-03 22:12:08 +00:00
Anton Kaliaev	243ff4b43d	blockchain/v1: remove in favor of v2 (#5728 )	2020-12-03 09:35:47 +04:00
Anton Kaliaev	33dbff61d3	blockchain/v1: fix deadlock (#5711 ) I introduced a new variable - syncEnded, which is now used to prevent sending new events to channels (which would block otherwise) if reactor is finished syncing Closes #4591	2020-12-01 13:08:33 +00:00
Anton Kaliaev	3ad1157451	blockchain/v1: handle peers without blocks (#5701 ) Closes #5444 Now we record the fact that a peer does not have a requested block and later use this information to make a new request for the same block from another peer.	2020-11-23 11:59:34 +00:00
Anton Kaliaev	f2f6a78809	docs: warn developers about calling blocking funcs in Receive (#5679 ) Refs #2888	2020-11-17 15:37:35 +00:00
Anton Kaliaev	335e97433c	blockchain/v2: remove peers from the processor (#5607 ) after they were pruned by the scheduler Closes #5513	2020-11-05 12:24:48 +00:00
Anton Kaliaev	bcf9b0aa39	blockchain/v2: make the removal of an already removed peer a noop (#5553 ) also, since multiple StopPeerForError calls may be executed in parallel, only execute StopPeerForError once Closes #5541	2020-10-30 10:31:22 +00:00
Anton Kaliaev	b4adeab8b9	blockchain/v2: fix panic: processed height X+1 but expected height X (#5530 ) Before: scheduler receives psBlockProcessed event, but does not mark block as processed because peer timed out (or was removed for other reasons) and all associated blocks were rescheduled. After: scheduler receives psBlockProcessed event and marks block as processed in any case (even if peer who provided this block errors). Closes #5387	2020-10-20 14:29:36 +04:00
Anton Kaliaev	d785036e0b	blockchain/v2: fix "panic: duplicate block enqueued by processor" (#5499 ) When a peer is stopped due to some network issue, the Reactor calls scheduler#handleRemovePeer, which removes the peer from the scheduler. BUT the peer stays in the processor, which sometimes could lead to "duplicate block enqueued by processor" panic WHEN the same block is requested by the scheduler again from a different peer. The solution is to return scPeerError, which will be propagated to the processor. The processor will clean up the blocks associated with the peer in purgePeer. Closes #5513, #5517	2020-10-20 14:19:00 +04:00
Marko	e1644d00c5	mempool: length prefix txs when getting them from mempool (#5483 ) ## Description In protobuf `[]byte` is varint encoded. When adding txs to the block we were not taking this into account. Closes: #XXX	2020-10-13 10:33:21 +00:00
Marko	346aa14db5	fix lint failures with 1.31 (#5489 )	2020-10-13 10:22:53 +02:00
Callum Waters	6a2a71be07	correctly calculate evidence data size (#5482 )	2020-10-12 11:28:41 +02:00
Callum Waters	4f79930c12	blockchain: remove duplication of validate basic (#5418 )	2020-09-28 17:02:46 +02:00
Marko	95367eaf51	blockchain/v1: add noBlockResponse handling (#5401 ) ## Description Add simple `NoBlockResponse` handling to blockchain reactor v1. I tested before and after with erik's e2e testing and was not able to reproduce the inability to sync after the changes were applied Closes: #5394	2020-09-28 07:20:54 +00:00
Callum Waters	ed002cea7e	evidence: introduction of LightClientAttackEvidence and refactor of evidence lifecycle (#5361 ) evidence: modify evidence types (#5342) light: detect light client attacks (#5344) evidence: refactor evidence pool (#5345) abci: application evidence prepared by evidence pool (#5354)	2020-09-22 10:22:54 +02:00
Marko	56911ee352	state: define interface for state store (#5348 ) ## Description Make an interface for the state store. Closes: #5213	2020-09-15 07:45:48 +00:00
Marko	6ab2a19088	header: check block protocol (#5340 ) ## Description Check block protocol version in header validate basic. I tried searching for where we check the P2P protocol version but was unable to find it. When we check compatibility with a node we check we both have the same block protocol and are on the same network, but we do not check if we are on the same P2P protocol. It makes sense if there is a handshake change because we would not be able to establish a secure connection, but a p2p protocol version bump may be because of a p2p message change, which would go unnoticed until that message is sent over the wire. Is this purposeful? Closes: #4790	2020-09-09 09:13:18 +00:00
Marko	0ed8dba991	lint: enable errcheck (#5336 ) ## Description Enable errcheck linter throughout the codebase Closes: #5059	2020-09-07 15:03:18 +00:00
Marko	135ac0400e	blockchain: verify +2/3 (#5278 ) ## Description Verify only +2/3 of the commit. Closes: #5259	2020-08-25 07:07:19 +00:00
Erik Grinaker	edf5cff80f	blockchain: fix fast sync halt with initial height > 1 (#5249 ) Blockchain reactors were not updated to handle arbitrary initial height after #5191.	2020-08-14 13:04:51 +00:00
Marko	40bd416d59	test: protobuf vectors for reactors (#5221 ) ## Description Add test vectors for all reactors - [x] state-sync - [x] privval - [x] mempool - [x] p2p - [x] evidence - [ ] light? this PR is primarily oriented at testvectors for things going over the wire. should we expand the testvectors into types as well? Closes: #XXX	2020-08-11 14:00:11 +00:00
Erik Grinaker	f66b7a8e32	merkle: return hashes for empty merkle trees (#5193 ) Fixes #5192. @liamsi Can you verify that the test vectors match the Rust implementation? I updated `ProofsFromByteSlices()` as well, anything else that should be updated?	2020-08-11 10:31:05 +00:00
n-hutton	375f0c819f	add fixes for flaky tests (#5146 ) While working on tendermint my colleague @jinmannwong fixed a few of the unit tests that we found to be flaky in our CI. We thought that you might find this useful, see below for comments.	2020-07-27 10:36:56 +04:00
Marko	2ac5a559b4	libs: wrap mutexes for build flag with godeadlock (#5126 ) ## Description This PR wraps the stdlib sync.(RW)Mutex & godeadlock.(RW)Mutex. This enables using go-deadlock via a build flag instead of using sed to replace sync with godeadlock in all files Closes: #3242	2020-07-20 07:55:09 +00:00
Marko	7c8c356f71	ci: version linter fix (#5128 ) ## Description linter version fix and run make format to have all ci run Closes: #XXX	2020-07-16 09:01:02 +00:00
Marko	6ccccb0933	lint: errcheck (#5091 ) ## Description add more error checks to tests gonna do a third PR that tackles the non test cases	2020-07-14 11:04:41 +00:00
Anton Kaliaev	730e16566e	proto: change type + a cleanup (#5107 ) - drop Height & Base from StatusRequest It does not make sense, nor is it used anywhere currently. Also, there seems to be no trace of these fields in ADR-40 (blockchain reactor v2). - change PacketMsg#EOF type from int32 to bool	2020-07-13 10:24:17 +00:00
Lei Wang	430162f8a1	Update reactor.go (#5088 ) Check the bcR.fastSync flag in "OnStop". Fixes the "service/service.go:161 Not stopping BlockPool -- have not been started yet {"impl": "BlockPool"}" error when killing the process.	2020-07-07 09:47:49 +00:00
Marko	943bbd75a4	blockchain: test vectors for proto encoding (#5073 ) ## Description this PR adds test vectors for proto encoding. the main difference from amino was the removal of four bytes due to interface encoding. should i add more cases? Closes: #XXX	2020-07-02 13:48:31 +00:00
Marko	7e2cc1db5e	linter: (1/2) enable errcheck (#5064 ) ## Description partially cleanup in preparation for errcheck i ignored a bunch of defer errors in tests but with the update to go 1.14 we can use `t.Cleanup(func() { if err := <>; err != nil {..}}` to cover those errors, I will do this in pr number two of enabling errcheck. ref #5059	2020-07-01 15:13:11 +00:00
Marko	dedf0d2350	proto: folder structure adhere to buf (#5025 )	2020-06-22 10:00:51 +02:00
Marko	51da4fe356	types: rename partsheader to partsetheader (#5029 ) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2020-06-22 09:38:03 +02:00
Marko	f6243d8b9e	privval: migrate to protobuf (#4985 )	2020-06-11 11:54:02 +02:00
Marko	a89f2581fc	blockchain: proto migration (#4969 ) ## Description migration of blockchain reactors to protobuf Closes: #XXX	2020-06-08 12:45:03 +00:00
Erik Grinaker	b76b270a23	blockchain/v2: correctly set block store base in status responses (#4971 ) See: https://github.com/tendermint/tendermint/pull/4969#pullrequestreview-425298225	2020-06-05 15:18:12 +00:00
Marko	a88537bb88	ints: stricter numbers (#4939 )	2020-06-04 16:34:56 +02:00
Erik Grinaker	b9a0d47f14	test/blockchain/v0: mitigate test data race (#4886 ) Mitigates the below data race. The proper fix involves not fiddling with reactor internals, which needs a rewrite of the test and possible additional reactor infrastructure. ``` ================== WARNING: DATA RACE Write at 0x00c001118e78 by goroutine 187: github.com/tendermint/tendermint/blockchain/v0.TestBadBlockStopsPeer() /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor_test.go:234 +0x9d7 testing.tRunner() /usr/local/go/src/testing/testing.go:992 +0x1eb Previous read at 0x00c001118e78 by goroutine 326: [failed to restore the stack] Goroutine 187 (running) created at: testing.(T).Run() /usr/local/go/src/testing/testing.go:1043 +0x660 testing.runTests.func1() /usr/local/go/src/testing/testing.go:1285 +0xa6 testing.tRunner() /usr/local/go/src/testing/testing.go:992 +0x1eb testing.runTests() /usr/local/go/src/testing/testing.go:1283 +0x527 testing.(M).Run() /usr/local/go/src/testing/testing.go:1200 +0x2ff main.main() _testmain.go:112 +0x337 Goroutine 326 (running) created at: github.com/tendermint/tendermint/blockchain/v0.(BlockchainReactor).OnStart() /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor.go:118 +0x12c github.com/tendermint/tendermint/libs/service.(BaseService).Start() /go/src/github.com/tendermint/tendermint/libs/service/service.go:140 +0x504 github.com/tendermint/tendermint/blockchain/v0.(BlockchainReactor).Start() <autogenerated>:1 +0x43 github.com/tendermint/tendermint/p2p.(Switch).OnStart() /go/src/github.com/tendermint/tendermint/p2p/switch.go:225 +0x120 github.com/tendermint/tendermint/libs/service.(BaseService).Start() /go/src/github.com/tendermint/tendermint/libs/service/service.go:140 +0x504 github.com/tendermint/tendermint/p2p.StartSwitches() /go/src/github.com/tendermint/tendermint/p2p/test_util.go:168 +0x75 github.com/tendermint/tendermint/p2p.MakeConnectedSwitches() /go/src/github.com/tendermint/tendermint/p2p/test_util.go:89 +0x17d 
github.com/tendermint/tendermint/blockchain/v0.TestBadBlockStopsPeer() /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor_test.go:209 +0x768 testing.tRunner() /usr/local/go/src/testing/testing.go:992 +0x1eb ================== panic: BlockStore can only save contiguous blocks. Wanted 149, got 147 goroutine 1259 [running]: github.com/tendermint/tendermint/store.(BlockStore).SaveBlock(0xc000ff9cc0, 0xc001997180, 0xc0010c6a00, 0xc0013b3000) /go/src/github.com/tendermint/tendermint/store/store.go:276 +0xbc4 github.com/tendermint/tendermint/blockchain/v0.(BlockchainReactor).poolRoutine(0xc001118d00, 0x107c000) /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor.go:355 +0xe90 created by github.com/tendermint/tendermint/blockchain/v0.(BlockchainReactor).OnStart /go/src/github.com/tendermint/tendermint/blockchain/v0/reactor.go:118 +0x12d FAIL github.com/tendermint/tendermint/blockchain/v0 11.447s FAIL ```	2020-05-26 11:51:37 +00:00
Callum Waters	970cbbad6d	blockchain[v1]: increased timeout times for peer tests (#4871 )	2020-05-25 17:33:13 +02:00
Marko	9149ee7d8b	lint: various fixes ## Description various linting fixes	2020-05-18 10:20:06 +00:00

385 Commits