Commit Graph

9171 Commits

Author SHA1 Message Date
Sam Kleinman
37ca98a544 e2e: reduce number of statesyncs in test networks (#6999) 2021-09-25 19:14:38 -04:00
Sam Kleinman
c101fa17ab e2e: add limit and sort to generator (#6998)
I observed a couple of problems with the generator in some recent tests: 

- there were a couple of hybrid test cases which did not have any
  legacy nodes (randomness and all.) I change the probability to
  produce more reliable results.

- added options to the generation to be able to add a max (to
  compliment the earlier min) number of nodes for local testing. 

- added an option to support reversing the sort order so "more
  complex" networks were first, as well as tweaked some of the point
  values. 

- this refactored the generators cli parsing to be a bit more clear.
2021-09-25 15:53:04 +00:00
M. J. Fromberger
118bfe2087 abci: Flush socket requests and responses immediately. (#6997)
The main effect of this change is to flush the socket client and server message
encoding buffers immediately once the message is fully and correctly encoded.
This allows us to remove the timer and some other special cases, without
changing the observed behaviour of the system.

-- Background

The socket protocol client and server each use a buffered writer to encode
request and response messages onto the underlying connection. This reduces the
possibility of a single message being split across multiple writes, but has the
side-effect that a request may remain buffered for some time.

The implementation worked around this by keeping a ticker that occasionally
triggers a flush, and by flushing the writer in response to an explicit request
baked into the client/server protocol (see also #6994).

These workarounds are both unnecessary: Once a message has been dequeued for
sending and fully encoded in wire format, there is no real use keeping all or
part of it buffered locally.  Moreover, using an asynchronous process to flush
the buffer makes the round-trip performance of the request unpredictable.

-- Benchmarks

Code: https://play.golang.org/p/0ChUOxJOiHt

I found no pre-existing performance benchmarks to justify the flush pattern,
but a natural question is whether this will significantly harm client/server
performance.  To test this, I implemented a simple benchmark that transfers
randomly-sized byte buffers from a no-op "client" to a no-op "server" over a
Unix-domain socket, using a buffered writer, both with and without explicit
flushes after each write.

As the following data show, flushing every time (FLUSH=true) does reduce raw
throughput, but not by a significant amount except for very small request
sizes, where the transfer time is already trivial (1.9μs).  Given that the
client is calibrated for 1MiB transactions, the overhead is not meaningful.

The percentage in each section is the speedup for flushing only when the buffer
is full, relative to flushing every block.  The benchmark uses the default
buffer size (4096 bytes), which is the same value used by the socket client and
server implementation:

  FLUSH  NBLOCKS  MAX      AVG     TOTAL       ELAPSED       TIME/BLOCK
  false  3957471  512      255     1011165416  2.00018873s   505ns
  true   1068568  512      255     273064368   2.000217051s  1.871µs
                                                             (73%)

  false  536096   4096     2048    1098066401  2.000229108s  3.731µs
  true   477911   4096     2047    978746731   2.000177825s  4.185µs
                                                             (10.8%)

  false  124595   16384    8181    1019340160  2.000235086s  16.053µs
  true   120995   16384    8179    989703064   2.000329349s  16.532µs
                                                             (2.9%)

  false  2114     1048576  525693  1111316541  2.000479928s  946.3µs
  true   2083     1048576  526379  1096449173  2.001817137s  961.025µs
                                                             (1.5%)

Note also that the FLUSH=false baseline is actually faster than the production
code, which flushes more often than is required by the buffer filling up.

Moreover, the timer slows down the overall transaction rate of the client and
server, indepenedent of how fast the socket transfer is, so the loss on a real
workload is probably much less.
2021-09-24 15:37:25 -07:00
Sam Kleinman
71c6682b57 statesync: clean up reactor/syncer lifecylce (#6995)
I've been noticing that there are a number of situations where the
statesync reactor blocks waiting for peers (or similar,) I've moved
things around to improve outcomes in local tests.
2021-09-24 21:40:12 +00:00
Sam Kleinman
dbad701515 ci: use smart merges (#6993) 2021-09-24 14:59:10 -04:00
Sam Kleinman
5e45676875 e2e: do not inject evidence through light proxy (#6992)
In the last run, there were two problems at the RPC layer returned
from light nodes' RPC end points. I think exercising the light client
proxy RPC system is something that can/should be done via unit
testing, and that likely these errors are (in production) transient
and (in CI) very likely to fail for test environment issues.
2021-09-24 18:27:00 +00:00
Sam Kleinman
08982c81fc e2e: skip validation of status apphash (#6991)
I believe this assertion is likely redundant given that we're checking the block apphash.
2021-09-24 17:49:06 +00:00
Sam Kleinman
b203c91799 rpc: implement BroadcastTxCommit without event subscriptions (#6984) 2021-09-24 13:01:35 -04:00
Sam Kleinman
ab8cfb9f57 e2e: tighten timing for load generation (#6990) 2021-09-24 12:28:51 -04:00
Sam Kleinman
c909f8a236 e2e: avoid non-determinism in app hash check (#6985) 2021-09-24 11:52:47 -04:00
Sam Kleinman
363b87e8ea changelog: add entry for interanlizations (#6989) 2021-09-24 10:50:19 -04:00
dependabot[bot]
dd4141e76f build(deps): Bump github.com/go-kit/kit from 0.11.0 to 0.12.0 (#6988)
Bumps [github.com/go-kit/kit](https://github.com/go-kit/kit) from 0.11.0 to 0.12.0.
- [Release notes](https://github.com/go-kit/kit/releases)
- [Commits](https://github.com/go-kit/kit/compare/v0.11.0...v0.12.0)

---
updated-dependencies:
- dependency-name: github.com/go-kit/kit
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-24 10:15:30 -04:00
Sam Kleinman
5ccd668c78 e2e: load should be proportional to network (#6983) 2021-09-23 16:58:10 -04:00
Sam Kleinman
e94c418ad9 e2e: always preserve failed networks (#6981) 2021-09-23 14:52:14 -04:00
Sam Kleinman
3d410e4a6b e2e: only check validator sets after statesync (#6980) 2021-09-23 14:31:59 -04:00
Sam Kleinman
8a171b8426 e2e: improve manifest sorting algorithim (#6979) 2021-09-23 12:42:20 -04:00
Sam Kleinman
bb8ffcb95b store: move pacakge to internal (#6978) 2021-09-23 11:46:42 -04:00
M. J. Fromberger
cf7537ea5f cleanup: Reduce and normalize import path aliasing. (#6975)
The code in the Tendermint repository makes heavy use of import aliasing.
This is made necessary by our extensive reuse of common base package names, and
by repetition of similar names across different subdirectories.

Unfortunately we have not been very consistent about which packages we alias in
various circumstances, and the aliases we use vary. In the spirit of the advice
in the style guide and https://github.com/golang/go/wiki/CodeReviewComments#imports,
his change makes an effort to clean up and normalize import aliasing.

This change makes no API or behavioral changes. It is a pure cleanup intended
o help make the code more readable to developers (including myself) trying to
understand what is being imported where.

Only unexported names have been modified, and the changes were generated and
applied mechanically with gofmt -r and comby, respecting the lexical and
syntactic rules of Go.  Even so, I did not fix every inconsistency. Where the
changes would be too disruptive, I left it alone.

The principles I followed in this cleanup are:

- Remove aliases that restate the package name.
- Remove aliases where the base package name is unambiguous.
- Move overly-terse abbreviations from the import to the usage site.
- Fix lexical issues (remove underscores, remove capitalization).
- Fix import groupings to more closely match the style guide.
- Group blank (side-effecting) imports and ensure they are commented.
- Add aliases to multiple imports with the same base package name.
2021-09-23 07:52:07 -07:00
Marko
c9beef796d proto: regenerate code (#6977)
## Description

Replace all seemed to have been used causing proto files to be changed without being regenerated
2021-09-23 12:37:11 +00:00
M. J. Fromberger
41ac5b90c5 Fix script paths in go:generate directives. (#6973)
We moved some files further down in the directory structure in #6964, which
caused the relative paths to the mockery wrapper to stop working.

There does not seem to be an obvious way to get the module root as a default
environment variable, so for now I just added the extra up-slashes.
2021-09-23 01:02:24 +00:00
M. J. Fromberger
7e4cc595d3 Remove the unused rpc/client/mocks package. (#6974)
This package is not used in the tendermint repository since 31e7cdee.

Note that this is not the same package as rpc/client/mock (N.B. singular) which
is still used in some tests.

A search of GitHub turns up only 11 uses, all of which are in clones of the
tendermint repo at old commits..
2021-09-22 17:45:11 -07:00
M. J. Fromberger
1995ef2572 rpc: Strip down the base RPC client interface. (#6971)
* rpc: Strip down the base RPC client interface.

Prior to this change, the RPC client interface requires implementing the entire
Service interface, but most of the methods of Service are not needed by the
concrete clients. Dissociate the Client interface from the Service interface.

- Extract only those methods of Service that are necessary to make the existing
  clients work.

- Update the clients to combine Start/Onstart and Stop/OnStop. This does not
  change what the clients do to start or stop. Only the websocket clients make
  use of this functionality anyway.

  The websocket implementation uses some plumbing from the BaseService helper.
  We should be able to excising that entirely, but the current interface
  dependencies among the clients would require a much larger change, and one
  that leaks into other (non-RPC) packages.

  As a less-invasive intermediate step, preserve the existing client behaviour
  (and tests) by extracting the necessary subset of the BaseService
  functionality to an analogous RunState helper for clients. I plan to obsolete
  that type in a future PR, but for now this makes a useful waypoint.

Related:
- Clean up client implementations.
- Update mocks.
2021-09-22 14:26:35 -07:00
Sam Kleinman
d04b6c2a5e e2e: run multiple should use preserve (#6972) 2021-09-22 13:13:31 -04:00
Sam Kleinman
1c4950dbd2 state: move package to internal (#6964) 2021-09-22 13:04:25 -04:00
Sam Kleinman
638346500d ci: reduce number of groups for 0.34 e2e runs (#6968) 2021-09-22 12:48:38 -04:00
Sam Kleinman
07d10184a1 inspect: remove duplicated construction path (#6966) 2021-09-22 12:32:49 -04:00
Sam Kleinman
5a13c7075b rfc: event system (#6957) 2021-09-21 11:22:55 -04:00
Daria K
0f53a590ff readme: update discord links (#6965)
This updates the Discord invite links.
2021-09-21 10:35:40 +00:00
Marko
df2d744ea9 config/docs: update and deprecated (#6879)
## Description

- Add deprecated to config values in toml
- update config in configuration doc
- explain how to set up a node with the new network

- add sentence about not needing to fork tendermint for built-in tutorial
 - closes #6865 
- add note to use a released version of tendermint with the tutorials. This is to avoid unknown issues prior to a release.
2021-09-21 10:32:00 +00:00
JayT106
84ffaaaf37 statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795) 2021-09-21 09:22:16 +02:00
Sam Kleinman
9dfdc62eb7 proxy: move proxy package to internal (#6953) 2021-09-20 15:18:48 -04:00
dependabot[bot]
cf59b8b38e build(deps): Bump github.com/spf13/viper from 1.8.1 to 1.9.0 (#6961)
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.8.1 to 1.9.0.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.8.1...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Kleinman <garen@tychoish.com>
2021-09-20 13:03:10 -04:00
Sam Kleinman
87b876a73b crypto/armor: remove unused package (#6963) 2021-09-20 12:30:26 -04:00
Ismail Khoffi
ad067d73b9 rfc: Fix a few typos and formatting glitches p2p roadmap (#6960) 2021-09-20 09:02:29 -04:00
dependabot[bot]
ea6eecbb91 build(deps): Bump github.com/vektra/mockery/v2 from 2.9.3 to 2.9.4 (#6956)
Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.9.3 to 2.9.4.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/.goreleaser.yml)
- [Commits](https://github.com/vektra/mockery/compare/v2.9.3...v2.9.4)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-17 10:06:40 -04:00
William Banfield
bf9232e99f e2e: cleanup on all errors if preserve not specified (#6950)
If the e2e tests error, they leave all of the e2e state around including containers and networks etc. 
We should clean this up when the tests shuts down, even if it exits in error.
2021-09-17 08:35:49 +00:00
Sam Kleinman
b0423e2445 e2e: allow load generator to succed for short tests (#6952)
This should address last night's failure. We've taken the perspective
of "the load generator shouldn't cause tests to fail" in recent
days/weeks, and I think this is just a next step along that line. The
e2e tests shouldn't test performance. 

I included some comments indicating the ways that this isn't ideal (it
is perhaps not), and I think that if test networks could make
assertions about the required rate, that might be a cool future
improvement (and good, perhaps, for system benchmarking.)
2021-09-16 15:45:51 +00:00
dependabot[bot]
b0684bd300 build(deps): Bump github.com/vektra/mockery/v2 from 2.9.0 to 2.9.3 (#6951)
Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.9.0 to 2.9.3.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/.goreleaser.yml)
- [Commits](https://github.com/vektra/mockery/compare/v2.9.0...v2.9.3)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-16 08:41:46 -04:00
William Banfield
382947ce93 rfc: add performance taxonomy rfc (#6921)
This document attempts to capture and discuss some of the areas of Tendermint that seem to be cited as causing performance issue. I'm hoping to continue to gather feedback and input on this document to better understand what issues Tendermint performance may cause for our users. 

The overall goal of this document is to allow the maintainers and community to get a better sense of these issues and to be more capably able to discuss them and weight trade-offs about any proposed performance-focused changes. This document does not aim to propose any performance improvements. It does suggest useful places for benchmarks and places where additional metrics would be useful for diagnosing and further understanding Tendermint performance.

Please comment with areas where my reasoning seems off or with additional areas that Tendermint performance may be causing user pain.
2021-09-16 06:13:27 +00:00
Callum Waters
9a7ce08e3e statesync: shut down node when statesync fails (#6944) 2021-09-16 07:43:23 +02:00
Sam Kleinman
55f6d20977 e2e: skip broadcastTxCommit check (#6949)
I think the `Sync` check covers our primary use case, and perhaps we
can turn this back on in the future after some kind of event-system
rewrite, or RPC rewrite that will avoid the serverside timeout.
2021-09-15 21:24:35 +00:00
Sam Kleinman
b9c35c1263 docs: fix openapi yaml lint (#6948)
saw this in the super lint.
2021-09-15 19:29:25 +00:00
Sam Kleinman
f08f72e334 rfc: e2e improvements (#6941) 2021-09-15 15:26:39 -04:00
Callum Waters
e932b469ed e2e: tweak semantics of waitForHeight (#6943) 2021-09-15 20:49:24 +02:00
Callum Waters
5db2a39643 docs: add documentation of unsafe_flush_mempool to openapi (#6947) 2021-09-15 17:28:01 +02:00
Sam Kleinman
6909158933 e2e: reduce load pressure (#6939) 2021-09-14 10:44:30 -04:00
dependabot[bot]
de2cffe7a4 build(deps): Bump codecov/codecov-action from 2.0.3 to 2.1.0 (#6938) 2021-09-14 08:31:41 -04:00
Sam Kleinman
c257cda212 e2e: slow load processes with longer evidence timeouts (#6936)
These are mostly the timeouts that I think we're still hitting in CI. 

At this point, the tests (on master) pass on my local machine (which is quite beefy) so I think this is just the first in (perhaps?) a sequence of changes that attempt to change timeouts and load patterns so that the tests pass in CI more reliably.
2021-09-13 20:57:25 +00:00
M. J. Fromberger
5a49d1b997 RFC 002 Interprocess Communication in Tendermint (#6913)
Communication in Tendermint among consensus nodes, applications, and operator
tools all use different message formats and transport mechanisms.  In some
cases there are multiple options. Having all these options complicates both the
code and the developer experience, and hides bugs. To support a more robust,
trustworthy, and usable system, we should document which communication paths
are essential, which could be removed or reduced in scope, and what we can
improve for the most important use cases.

This document proposes a variety of possible improvements of varying size and
scope. Specific design proposals should get their own documentation.
2021-09-13 15:41:21 -04:00
M. J. Fromberger
e4feb56813 Update CHANGELOG.md for release v0.34.13. (#6935)
Fixes #6933. I forgot to update master after I did the v0.34.13 release.
2021-09-13 18:58:04 +00:00