Commit Graph

9233 Commits

Author SHA1 Message Date
Callum Waters
c3dc7d20df docs: add roadmap to repo (#7107) 2021-10-15 10:35:36 +02:00
Jared Zhou
b95c261981 rpc: fix typo in broadcast commit (#7124) 2021-10-14 10:34:25 +02:00
M. J. Fromberger
bc1a20dbb8 Revert temporary patch to buf.yaml. (#7122)
This patch was needed to pass the buf breakage check for the proto file removed
in #7121, but now that master contains the change we no longer need the patch.
2021-10-13 23:37:50 +00:00
M. J. Fromberger
86f00135dd rpc: Remove the deprecated gRPC interface to the RPC service (#7121)
This change removes the partial gRPC interface to the RPC service, which was
deprecated in resolution of #6718.

Details:
- rpc: Remove the client and server interfaces and proto definitions.
- Remove the gRPC settings from the config library.
- Remove gRPC setup for the RPC service in the node startup.
- Fix various test helpers to remove gRPC bits.
- Remove the --rpc.grpc-laddr flag from the CLI.

Note that to satisfy the protobuf interface check, this change also includes a
temporary edit to buf.yaml, that I will revert after this is merged.
2021-10-13 15:01:01 -07:00
William Banfield
ff7b0e638e p2p: fix priority queue bytes pending calculation (#7120)
This metric describes itself as 'pending' but never actual decrements when the messages are removed from the queue.

This change fixes that by decrementing the metric when the data is removed from the queue.
2021-10-13 21:18:44 +00:00
William Banfield
36a1acff52 internal/proxy: add initial set of abci metrics (#7115)
This PR adds an initial set of metrics for use ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to determine if the method call was performed using ABCI's `*Async` methods.

An example of these metrics is included here for reference:
```
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13
tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001
tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13
```

These metrics can easily be graphed using prometheus's `histogram_quantile(...)` method to pick out a particular quantile to graph or examine. I chose buckets that were somewhat of an estimate of expected range of times for ABCI operations. They start at .0001 seconds and range to 25 seconds. The hope is that this range captures enough possible times to be useful for us and operators.
2021-10-13 20:52:25 +00:00
Sam Kleinman
164de91842 rpc: move evidence tests to shared fixtures (#7119)
This is follow on to the work in #7112.
2021-10-13 18:29:03 +00:00
Callum Waters
4fe0f262d4 changelog: add 0.34.14 updates (#7117)
This PR adds the 0.34.14 changes to the changelog in master
2021-10-13 12:28:43 +00:00
M. J. Fromberger
6538776e6a build: Fix build-docker to include the full context. (#7114)
Fixes #7068. The build-docker rule relies on being able to run make
build-linux, but did not pull the Makefile into the build context.
There are various ways to fix this, but this was probably the smallest.
2021-10-12 22:29:06 +00:00
Sam Kleinman
4781d04d18 node: always close database engine (#7113) 2021-10-12 17:40:59 -04:00
Sam Kleinman
52ed994416 test: cleanup rpc/client and node test fixtures (#7112) 2021-10-12 16:49:45 -04:00
lklimek
0524558696 refactor: assignment copies lock value (#7108)
Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
2021-10-12 13:22:57 -07:00
Sam Kleinman
d837432681 ci: use run-multiple.sh for e2e pr tests (#7111)
* ci: use run-multiple.sh for e2e pr tests

* fix labeling

* Update .github/workflows/e2e-nightly-35x.yml

Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>

Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2021-10-12 16:28:10 +00:00
Sam Kleinman
34a3fcd8fc Revert "abci: change client to use multi-reader mutexes (#6306)" (#7106)
This reverts commit 1c4dbe30d4.
2021-10-12 11:29:31 -04:00
M. J. Fromberger
48295955ed light: Update links in package docs. (#7099)
Fixes #7098. The light client documentation moved to the spec repository.

I was not able to figure out what happened to light-client-protocol.md, it was removed in #5252 but no corresponding file exists in the spec repository. Since the spec also discusses the protocol, this change simply links to the spec and removes the non-functional reference.

Alternatively we could link to the top-level [light client doc](https://docs.tendermint.com/master/tendermint-core/light-client.html) if you think that's better.
2021-10-11 23:21:45 +00:00
Sam Kleinman
ded310093e lint: fix collection of stale errors (#7090)
Few things that had been annoying.
2021-10-09 15:33:54 +00:00
Sam Kleinman
befd669794 e2e: light nodes should use builtin abci app (#7095) 2021-10-09 04:20:09 +00:00
Sam Kleinman
3646b635d3 p2p, types: remove legacy NetAddress type (#7084) 2021-10-08 12:29:20 -04:00
Callum Waters
59404003ee p2p: rename pexV2 to pex (#7088) 2021-10-08 16:53:54 +02:00
Sam Kleinman
f2a8f5e054 e2e: abci protocol should be consistent across networks (#7078)
It seems weird in retrospect that we allow networks to contain
applications that use different ABCI protocols.
2021-10-08 13:42:23 +00:00
Sam Kleinman
1b5bb5348f p2p: cleanup unused arguments (#7079)
This is mostly just reading through the output of uparam, after
noticing that there were a few places where we were ignoring some arguments.
2021-10-08 12:49:17 +00:00
Callum Waters
4ca130d226 cli: allow node operator to rollback last state (#7033) 2021-10-08 09:15:13 +02:00
Sam Kleinman
1f438f205a e2e: improve network connectivity (#7077)
This tweaks the connectivity of test configurations, in hopes that more will be viable.

Additionally reduces the prevalence of testing the legacy mempool.
2021-10-07 23:07:35 +00:00
Sam Kleinman
5bf30bb049 p2p: cleanup transport interface (#7071)
This is another batch of things to cleanup in the legacy P2P system.
2021-10-06 19:17:44 +00:00
dependabot[bot]
e53f92ba9c build(deps): Bump github.com/adlio/schema from 1.1.13 to 1.1.14 (#7069)
Bumps [github.com/adlio/schema](https://github.com/adlio/schema) from 1.1.13 to 1.1.14.
- [Release notes](https://github.com/adlio/schema/releases)
- [Commits](https://github.com/adlio/schema/compare/v1.1.13...v1.1.14)

---
updated-dependencies:
- dependency-name: github.com/adlio/schema
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-06 09:14:55 -04:00
Callum Waters
e4d6f6df09 docs: create separate releases doc (#7040) 2021-10-06 11:59:21 +02:00
Sam Kleinman
0ef1a12186 ci: fix p2p configuration for e2e tests (#7066)
My earlier p2p cleanup code removed support for the p2p tests from the
e2e generator and runner, but missed removing the CI
configuration. This patch remedies that.
2021-10-06 04:11:19 +00:00
Sam Kleinman
72aee47847 ci: 0.35.x nightly should run from master and checkout the release branch (#7067)
Nightly branches run CI from master branch, and the configuration misses checking out the correct ref.
2021-10-06 04:08:54 +00:00
M. J. Fromberger
109814c85a Clarify decision record for ADR-065. (#7062)
While discussing a question about the indexing interface (#7044), we found some
confusion about the intent of the design decisions in ADR 065.

Based on discussion with the original authors of the ADR, this commit adds some
language to the Decisions section to spell out the intentions more clearly, and
to call out future work that this ADR did not explicitly decide about.
2021-10-05 14:10:11 -07:00
Sam Kleinman
851d2e3bde mempool,rpc: add removetx rpc method (#7047)
Addresses one of the concerns with #7041.

Provides a mechanism (via the RPC interface) to delete a single transaction, described by its hash, from the mempool. The method returns an error if the transaction cannot be found. Once the transaction is removed it remains in the cache and cannot be resubmitted until the cache is cleared or it expires from the cache.
2021-10-05 20:23:15 +00:00
Sam Kleinman
3ea81bfaa7 p2p: remove wdrr queue (#7064)
This code hasn't been battle tested, and seems to have grown
increasingly flaky int tests. Given our general direction of reducing
queue complexity over the next couple of releases I think it makes
sense to remove it.
2021-10-05 20:09:31 +00:00
Callum Waters
5703ae2fb3 e2e: automatically prune old app snapshots (#7034)
This PR tackles the case of using the e2e application in a long lived testnet. The application continually saves snapshots (usually every 100 blocks) which after a while bloats the size of the application. This PR prunes older snapshots so that only the most recent 10 snapshots remain.
2021-10-05 18:19:12 +00:00
Sam Kleinman
03ad7d6f20 p2p: delete legacy stack initial pass (#7035)
A few notes:

- this is not all the deletion that we can do, but this is the most
  "simple" case: it leaves in shims, and there's some trivial
  additional cleanup to the transport that can happen but that
  requires writing more code, and I wanted this to be easy to review
  above all else.
  
- This should land *after* we cut the branch for 0.35, but I'm
  anticipating that to happen soon, and I wanted to run this through
  CI.
2021-10-05 13:40:32 +00:00
William Banfield
f5b9c210ca consensus: wait until peerUpdates channel is closed to close remaining peers (#7058)
The race occurred as a result of a goroutine launched by `processPeerUpdate` racing with the `OnStop` method. The `processPeerUpdates` goroutine deletes from the map as `OnStop` is reading from it. This change updates the `OnStop` method to wait for the peer updates channel to be done before closing the peers. It also copies the map contents to a new map so that it will not conflict with the view of the map that the goroutine created in `processPeerUpdate` sees.
2021-10-04 22:37:18 +00:00
Sam Kleinman
cb69ed8135 blocksync/v2: remove unsupported reactor (#7046)
This commit should be one of the first to land as part of the v0.36
cycle *after* cutting the 0.35 branch. 

The blocksync/v2 reactor was originally implemented as an experiement
to produce an implementation of the blockstack protocol that would be
easier to test and validate, but it was never appropriately
operationalized and this implementation was never fully debugged. When
the p2p layer was refactored as part of the 0.35 cycle, the v2
implementation was not refactored and it was left in the codebase but
not removed. This commit just removes all references to it.
2021-10-04 21:12:51 +00:00
William Banfield
c201e3b54d scripts: fix authors script to take a ref (#7051)
This script is referenced from the release documentation, we should make sure it's functional. This is helpful in generating the "Special Thanks" section of the changelog.
2021-10-04 19:47:50 +00:00
M. J. Fromberger
b30ec89ee9 Add an e2e workflow for the v0.35.x backport branch. (#7048) 2021-10-04 10:35:16 -07:00
Sam Kleinman
6276fdcb5d ci: mergify support for 0.35 backports (#7050) 2021-10-04 13:04:15 -04:00
M. J. Fromberger
f361ce09b3 Update Go toolchains to 1.17 in Actions workflows. (#7049) v0.36.0-dev 2021-10-04 15:40:50 +00:00
William Banfield
243c62cc68 statesync: improve rare p2p race condition (#7042)
This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out waiting for the consensus params response. 

The reason that this can occur is because the statesync reactor has the possibility of attempting to respond to the params request before the state provider is ready to read it. This results in the reactor hitting the `default` case seen here and then never sending on the channel. The stateprovider will then block waiting for a response and never receive one because the reactor opted not to send it.
2021-10-01 20:33:12 +00:00
William Banfield
177850a2c9 statesync: remove deadlock on init fail (#7029)
When statesync is stopped during shutdown, it has the possibility of deadlocking. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. As this is occuring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in hopes to eradicate the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag.

```
goroutine 36 [chan receive]:
github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240)
        github.com/tendermint/tendermint/node/node.go:769 +0x132
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6

goroutine 188 [semacquire]:
sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1)
        runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00026b1c8)
        sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
        sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00026b1c8)
        sync/rwmutex.go:111 +0x90
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080)
        github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab
created by github.com/tendermint/tendermint/internal/statesync.(*Reactor.OnStart
        github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd)
```
2021-09-30 19:19:10 +00:00
M. J. Fromberger
bdd815ebc9 Align atomic struct field for compatibility in 32-bit ABIs. (#7037)
The layout of struct fields means that interior fields may not be properly
aligned for 64-bit access.

Fixes #7000.
2021-09-30 10:53:05 -07:00
M. J. Fromberger
77052370cc Update default config template to match mapstructure keys. (#7036)
Fix a couple of cases where we updated the keys in the config reader, but
forgot to update some of their uses in the default template.

Fixes #7031.
2021-09-30 17:13:32 +00:00
William Banfield
6a0d9c832a blocksync: fix shutdown deadlock issue (#7030)
When shutting down blocksync, it is observed that the process can hang completely. A dump of running goroutines reveals that this is due to goroutines not listening on the correct shutdown signal. Namely, the `poolRoutine` goroutine does not wait on `pool.Quit`. The `poolRoutine` does not receive any other shutdown signal during `OnStop` becuase it must stop before the `r.closeCh` is closed. Currently the `poolRoutine` listens in the `closeCh` which will not close until the `poolRoutine` stops and calls `poolWG.Done()`.

This change also puts the `requestRoutine()` in the `OnStart` method to make it more visible since it does not rely on anything that is spawned in the `poolRoutine`.

```
goroutine 183 [semacquire]:
sync.runtime_Semacquire(0xc0000d3bd8)
        runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc0000d3bd0)
        sync/waitgroup.go:130 +0x65
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStop(0xc0000d3a00)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:193 +0x47
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0000d3a00, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc00052c000)
        github.com/tendermint/tendermint/node/node.go:758 +0xc62
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00052c000, 0x0, 0x0)
        github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
        github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000df6d20, 0x7f04a68da900, 0xc0004a8930, 0xc0005a72d8)
        github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
        github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6


goroutine 161 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).poolRoutine(0xc0000d3a00, 0x0)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:464 +0x2b3
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:174 +0xf1

goroutine 162 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processBlockSyncCh(0xc0000d3a00)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:310 +0x151
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:177 +0x54

goroutine 163 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processPeerUpdates(0xc0000d3a00)
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:363 +0x12b
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
        github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:178 +0x76
```
2021-09-30 16:19:18 +00:00
Tess Rinearson
c9d92f5f19 .github: remove tessr and bez from codeowners (#7028) 2021-09-29 22:02:28 +02:00
Sam Kleinman
23fe6fd2f9 statesync: ensure test network properly configured (#7026)
This test reliably gets hung up on network configuration, (which may
be a real issue,) but it's network setup is handcranked and we should
ensure that the test focuses on it's core assertions and doesn't fail for 
test architecture reasons.
2021-09-29 16:38:27 +00:00
M. J. Fromberger
962caeae65 Make doc site index default to the latest release (#7023)
Fix the order of lines in docs/versions so that v0.34 is last (the current release).

Related changes:

- Update docs/DOCS_README.md to reflect the current state of how we publish the site.
- Fix the build-docs target in Makefile to not perturb the package-lock.json during the build.
- Fix the Makefile rule to not clobber package-lock.json.
2021-09-29 08:38:52 -07:00
Sam Kleinman
8758078786 consensus: avoid unbuffered channel in state test (#7025) 2021-09-29 11:19:34 -04:00
Sam Kleinman
b1dfbb8bc3 e2e: generator ensure p2p modes (#7021) 2021-09-28 17:04:37 -04:00
Sam Kleinman
c18470a5f1 e2e: use network size in load generator (#7019) 2021-09-28 16:47:35 -04:00