* Makefile: always pull image in proto-gen-docker. (#5953)
The `proto-gen-docker` target didn't pull an updated Docker image, and would use a local image if present which could be outdated and produce wrong results.
* test: fix TestPEXReactorRunning data race (#5955)
Fixes#5941.
Not entirely sure that this will fix the problem (couldn't reproduce), but in any case this is an artifact of a hack in the P2P transport refactor to make it work with the legacy P2P stack, and will be removed when the refactor is done anyway.
* test/fuzz: move fuzz tests into this repo (#5918)
Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Closes#5907
- add init-corpus to blockchain reactor
- remove validator-set FromBytes test
now that we have proto, we don't need to test it! bye amino
- simplify mempool test
do we want to test remote ABCI app?
- do not recreate mux on every crash in jsonrpc test
- update p2p pex reactor test
- remove p2p/listener test
the API has changed + I did not understand what it's tested anyway
- update secretconnection test
- add readme and makefile
- list inputs in readme
- add nightly workflow
- remove blockchain fuzz test
EncodeMsg / DecodeMsg no longer exist
* docker: dont login when in PR (#5961)
* docker: release Linux/ARM64 image (#5925)
Co-authored-by: Marko <marbar3778@yahoo.com>
* p2p: make PeerManager.DialNext() and EvictNext() block (#5947)
See #5936 and #5938 for background.
The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.
I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
* libs/log: format []byte as hexidecimal string (uppercased) (#5960)
Closes: #5806
Co-authored-by: Lanie Hei <heixx011@umn.edu>
* docs: log level docs (#5945)
## Description
add section on configuring log levels
Closes: #XXX
* .github: fix fuzz-nightly job (#5965)
outputs is a property of the job, not an individual step.
* e2e: add control over the log level of nodes (#5958)
* mempool: fix reactor tests (#5967)
## Description
Update the faux router to either drop channel errors or handle them based on an argument. This prevents deadlocks in tests where we try to send an error on the mempool channel but there is no reader.
Closes: #5956
* p2p: improve peerStore prototype (#5954)
This improves the `peerStore` prototype by e.g.:
* Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance.
* Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations.
* Caching the ranked peer set, as a temporary solution until a better data structure is implemented.
* Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full.
* Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.
* p2p: simplify PeerManager upgrade logic (#5962)
Follow-up from #5947, branched off of #5954.
This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes:
* Add `evict` map which queues up peers to explicitly evict.
* `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`).
* `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity.
* `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected).
* `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer.
This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simplier, and it should generally do the right thing in all cases I can think of.
* p2p: add PeerManager.Advertise() (#5957)
Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.
* blockchain v0: fix waitgroup data race (#5970)
## Description
Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.
Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).
Regardless, this is the appropriate fix.
closes: #5968
* tests: fix `make test` (#5966)
## Description
- bump deadlock dep to master
- fixes `make test` since we now use `deadlock.Once`
Closes: #XXX
* terminate go-fuzz gracefully (w/ SIGINT) (#5973)
and preserve exit code.
```
2021/01/26 03:34:49 workers: 2, corpus: 4 (8m28s ago), crashers: 0, restarts: 1/9976, execs: 11013732 (21596/sec), cover: 121, uptime: 8m30s
make: *** [fuzz-mempool] Terminated
Makefile:5: recipe for target 'fuzz-mempool' failed
Error: Process completed with exit code 124.
```
https://github.com/tendermint/tendermint/runs/1766661614
`continue-on-error` should make GH ignore any error codes.
* p2p: add prototype PEX reactor for new stack (#5971)
This adds a prototype PEX reactor for the new P2P stack.
* proto/p2p: rename PEX messages and fields (#5974)
Fixes#5899 by renaming a bunch of P2P Protobuf entities (while maintaining wire compatibility):
* `Message` to `PexMessage` (as it's only used for PEX messages).
* `PexAddrs` to `PexResponse`.
* `PexResponse.Addrs` to `PexResponse.Addresses`.
* `NetAddress` to `PexAddress` (as it's only used by PEX).
* p2p: resolve PEX addresses in PEX reactor (#5980)
This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974.
I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options:
1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on.
2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs.
I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.
* test/p2p: close transports to avoid goroutine leak failures (#5982)
* mempool: fix TestReactorNoBroadcastToSender (#5984)
## Description
Looks like I missed a test in the original PR when fixing the tests.
Closes: #5956
* mempool: fix mempool tests timeout (#5988)
* p2p: use stopCtx when dialing peers in Router (#5983)
This ensures we don't leak dial goroutines when shutting down the router.
* docs: fix typo in state sync example (#5989)
Co-authored-by: Erik Grinaker <erik@interchain.berlin>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Marko <marbar3778@yahoo.com>
Co-authored-by: odidev <odidev@puresoftware.com>
Co-authored-by: Lanie Hei <heixx011@umn.edu>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Sergey <52304443+c29r3@users.noreply.github.com>
@p4u from vocdoni.io reported that the mempool might behave incorrectly under a
high load. The consequences can range from pauses between blocks to the peers
disconnecting from this node.
My current theory is that the flowrate lib we're using to control flow
(multiplex over a single TCP connection) was not designed w/ large blobs
(1MB batch of txs) in mind.
I've tried decreasing the Mempool reactor priority, but that did not
have any visible effect. What actually worked is adding a time.Sleep
into mempool.Reactor#broadcastTxRoutine after an each successful send ==
manual control flow of sort.
As a temporary remedy (until the mempool package
is refactored), the max-batch-bytes was disabled. Transactions will be sent
one by one without batching
Closes#5796
When set to true, an invalid transaction will be kept in the cache (this may help some applications to protect against spam).
NOTE: this is a temporary config option. The more correct solution would be to add a TTL to each transaction (i.e. CheckTx may return a TTL in ResponseCheckTx).
Closes: #5751
After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)
`abci.Client`:
- Sync and Async methods now accept a context for cancellation
* grpc client uses context to cancel both Sync and Async requests
* local client ignores context parameter
* socket client uses context to cancel Sync requests and to drop Async requests before sending them if context was cancelled prior to that
- Async methods return an error
* socket client returns an error immediately if queue is full for Async requests
* local client always returns nil error
* grpc client returns an error if context was cancelled before we got response or the receiving queue had a space for response (do not confuse with the sending queue from the socket client)
- specify clients semantics in [doc.go](https://raw.githubusercontent.com/tendermint/tendermint/27112fffa62276bc016d56741f686f0f77931748/abci/client/doc.go)
`mempool.TxInfo`
- add optional `Context` to `TxInfo`, which can be used to cancel `CheckTx` request
Closes#5190
found out this issue when trying to decouple mempool reactor with its
underlying clist implementation.
according to its comment, the test `TestReactorNoBroadcastToSender` is
intended to make sure that a peer shouldn't send tx back to its origin.
however, the current test forgot to init peer state key, thus the code
will get stuck at waiting for peer to catch up state *instead of* skipping
sending tx back:
b8d08b9ef4/mempool/reactor.go (L216-L226)
this PR fixes the issue by init peer state key.
## Description
Add test vectors for all reactors
- [x] state-sync
- [x] privval
- [x] mempool
- [x] p2p
- [x] evidence
- [ ] light?
this PR is primarily oriented at testvectors for things going over the wire. should we expand the testvectors into types as well?
Closes: #XXX
## Description
This PR wraps the stdlib sync.(RW)Mutex & godeadlock.(RW)Mutex. This enables using go-deadlock via a build flag instead of using sed to replace sync with godeadlock in all files
Closes: #3242
In order to have more control over the mempool implementation,
introduce a new exported function RemoveTxByKey.
Export also TxKey() and TxKeySize. Use TxKeySize const instead of
sha256.size, so future changes on the hash function won't break the API.
Allows using a TxKey (32 bytes reference) as parameter instead of
the complete array set. So the application layer does not need to
keep track of the whole transaction but only of the sha256 hash (32 bytes).
This function is useful when mempool.Recheck is disabled.
Allows the Application layer to implement its own cleaning mechanism
without having to re-implement the whole mempool interface.
Mempool.Update() would probably also need to change from txBytes to txKey,
but that would require to change the Interface thus will break backwards
compatibility. For now RemoveTxByKey() looks like a good compromise,
it won't break anything and will help to solve some mempool issues from the
application layer.
Signed-off-by: p4u <pau@dabax.net>
## Description
To provide the ability to add more message types without needing to cause a breaking change the mempool message was migrated to a oneof.
Closes: #XXX
* proto: move mempool to proto
- changes according to moving the mempool reactor to proto
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
Closes: #2883
Closes#4603
Commands used (VIM):
```
:args `rg -l errors.Wrap`
:argdo normal @q | update
```
where q is a macros rewriting the `errors.Wrap` to `fmt.Errorf`.
to prevent malicious nodes from sending us large messages (~21MB, which
is the default `RecvMessageCapacity`)
This allows us to remove unnecessary `maxMsgSize` check in `decodeMsg`. Since each channel has a msg capacity set to `maxMsgSize`, there's no need to check it again in `decodeMsg`.
Closes#1503
Previously, many reactors were initialized with the name "Reactor," which made it difficult to log which reactor was doing what. This changes those reactors' names to something more descriptive.
* lint: golint issue fixes
- on my local machine golint is a lot stricter than the bot so slowly going through and fixing things.
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* more fixes from golint
* remove isPeerPersistentFn
* add changelog entry
* libs/common: Refactor libs/common 5
- move mathematical functions and types out of `libs/common` to math pkg
- move net functions out of `libs/common` to net pkg
- move string functions out of `libs/common` to strings pkg
- move async functions out of `libs/common` to async pkg
- move bit functions out of `libs/common` to bits pkg
- move cmap functions out of `libs/common` to cmap pkg
- move os functions out of `libs/common` to os pkg
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* fix testing issues
* fix tests
closes#41417
woooooooooooooooooo kill the cmn pkg
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* add changelog entry
* fix goimport issues
* run gofmt
* libs/common: refactor libs common 3
- move nil.go into types folder and make private
- move service & baseservice out of common into service pkg
ref #4147
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* add changelog entry
* libs/common: refactor libs/common 2
- move random function to there own pkg
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* change imports and usage throughout repo
* fix goimports
* add changelog entry
* Fix long line errors in abci, crypto, and libs packages
* Fix long lines in p2p and rpc packages
* Fix long lines in abci, state, and tools packages
* Fix long lines in behaviour and blockchain packages
* Fix long lines in cmd and config packages
* Begin fixing long lines in consensus package
* Finish fixing long lines in consensus package
* Add lll exclusion for lines containing URLs
* Fix long lines in crypto package
* Fix long lines in evidence package
* Fix long lines in mempool and node packages
* Fix long lines in libs package
* Fix long lines in lite package
* Fix new long line in node package
* Fix long lines in p2p package
* Ignore gocritic warning
* Fix long lines in privval package
* Fix long lines in rpc package
* Fix long lines in scripts package
* Fix long lines in state package
* Fix long lines in tools package
* Fix long lines in types package
* Enable lll linter
* Include sender when logging rejected txns
* Log as peerID to be consistent with other log messages
* Updated CHANGELOG_PENDING
* Handle nil source
* Updated PR link in CHANGELOG_PENDING
* Renamed TxInfo.SenderAddress and peerAddress til PeerFullID
* Renamed PeerFullID to PeerP2PID
* Forgot to rename a couple of references
* Correct memory alignment for 32-bit machine.
Switching the `txsBytes` and `rechecking` fields of `CListMempool` to ensure the correct memory alignment for `atomic.LoadInt64` on 32-bit machine.
* Update CHANGELOG_PENDING.md for #3968Fixed#3968 `mempool` Memory Loading Error on 32-bit Ubuntu 16.04 Machine.
* Update CHANGELOG_PENDING.md
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* Remove unnecessary type conversions
* Consolidate repeated strings into consts
* Clothe return statements
* Update blockchain/v1/reactor_fsm_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
This PR repairs linter errors seen when running the following commands:
golangci-lint run --no-config --disable-all=true --enable=unconvert
golangci-lint run --no-config --disable-all=true --enable=goconst
golangci-lint run --no-config --disable-all=true --enable=nakedret
Contributes to #3262
Add gocritic as a linter
The linting is not complete, but should i complete in this PR or in a following.
23 files have been touched so it may be better to do in a following PR
Commits:
* Add gocritic to linting
- Added gocritic to linting
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* gocritic
* pr comments
* remove switch in cmdBatch
cleanup to add linter
grpc change:
https://godoc.org/google.golang.org/grpc#WithContextDialerhttps://godoc.org/google.golang.org/grpc#WithDialer
grpc/grpc-go#2627
prometheous change:
due to UninstrumentedHandler, being deprecated in the future
empty branch = empty if or else statement
didn't delete them entirely but commented
couldn't find a reason to have them
could not replicate the issue #3406
but if want to keep it commented then we should comment out the if statement as well
* Renamed wire.go to codec.go
- Wire was the previous name of amino
- Codec describes the file better than `wire` & `amino`
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* ide error
* rename amino.go to codec.go