mirror of
https://github.com/tendermint/tendermint.git
synced 2026-02-11 14:21:18 +00:00
Merge branch 'master' into wb/rfc-perf-taxonomy
This commit is contained in:
@@ -62,7 +62,7 @@ be turned off regardless of other values provided.
|
||||
#### KV
|
||||
|
||||
The `kv` indexer type is an embedded key-value store supported by the main
|
||||
underling Tendermint database. Using the `kv` indexer type allows you to query
|
||||
underlying Tendermint database. Using the `kv` indexer type allows you to query
|
||||
for block and transaction events directly against Tendermint's RPC. However, the
|
||||
query syntax is limited and so this indexer type might be deprecated or removed
|
||||
entirely in the future.
|
||||
|
||||
@@ -36,10 +36,6 @@ proxy-app = "tcp://127.0.0.1:26658"
|
||||
# A custom human readable name for this node
|
||||
moniker = "anonymous"
|
||||
|
||||
# If this node is many blocks behind the tip of the chain, BlockSync
|
||||
# allows them to catchup quickly by downloading blocks in parallel
|
||||
# and verifying their commits
|
||||
fast-sync = true
|
||||
|
||||
# Mode of Node: full | validator | seed (default: "validator")
|
||||
# * validator node (default)
|
||||
@@ -356,11 +352,16 @@ temp-dir = ""
|
||||
#######################################################
|
||||
### BlockSync Configuration Connections ###
|
||||
#######################################################
|
||||
[fastsync]
|
||||
[blocksync]
|
||||
|
||||
# If this node is many blocks behind the tip of the chain, BlockSync
|
||||
# allows them to catchup quickly by downloading blocks in parallel
|
||||
# and verifying their commits
|
||||
enable = true
|
||||
|
||||
# Block Sync version to use:
|
||||
# 1) "v0" (default) - the legacy block sync implementation
|
||||
# 2) "v2" - complete redesign of v0, optimized for testability & readability
|
||||
# 1) "v0" (default) - the standard block sync implementation
|
||||
# 2) "v2" - DEPRECATED, please use v0
|
||||
version = "v0"
|
||||
|
||||
#######################################################
|
||||
|
||||
@@ -38,6 +38,9 @@ sections.
|
||||
## Table of Contents
|
||||
|
||||
- [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
|
||||
- [RFC-001: Storage Engines](./rfc-001-storage-engine.rst)
|
||||
- [RFC-002: Interprocess Communication](./rfc-002-ipc-ecosystem.md)
|
||||
- [RFC-003: Performance Taxonomy](./rfc-003-performance-questions.md)
|
||||
- [RFC-004: E2E Test Framework Enhancements](./rfc-004-e2e-framework.md)
|
||||
|
||||
<!-- - [RFC-NNN: Title](./rfc-NNN-title.md) -->
|
||||
|
||||
179
docs/rfc/rfc-001-storage-engine.rst
Normal file
179
docs/rfc/rfc-001-storage-engine.rst
Normal file
@@ -0,0 +1,179 @@
|
||||
===========================================
|
||||
RFC 001: Storage Engines and Database Layer
|
||||
===========================================
|
||||
|
||||
Changelog
|
||||
---------
|
||||
|
||||
- 2021-04-19: Initial Draft (gist)
|
||||
- 2021-09-02: Migrated to RFC folder, with some updates
|
||||
|
||||
Abstract
|
||||
--------
|
||||
|
||||
The aspect of Tendermint that's responsible for persistence and storage (often
|
||||
"the database" internally) represents a bottle neck in the architecture of the
|
||||
platform, that the 0.36 release presents a good opportunity to correct. The
|
||||
current storage engine layer provides a great deal of flexibility that is
|
||||
difficult for users to leverage or benefit from, while also making it harder
|
||||
for Tendermint Core developers to deliver improvements on storage engine. This
|
||||
RFC discusses the possible improvements to this layer of the system.
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
Tendermint has a very thin common wrapper that makes Tendermint itself
|
||||
(largely) agnostic to the data storage layer (within the realm of the popular
|
||||
key-value/embedded databases.) This flexibility is not particularly useful:
|
||||
the benefits of a specific database engine in the context of Tendermint is not
|
||||
particularly well understood, and the maintenance burden for multiple backends
|
||||
is not commensurate with the benefit provided. Additionally, because the data
|
||||
storage layer is handled generically, and most tests run with an in-memory
|
||||
framework, it's difficult to take advantage of any higher-level features of a
|
||||
database engine.
|
||||
|
||||
Ideally, developers within Tendermint will be able to interact with persisted
|
||||
data via an interface that can function, approximately like an object
|
||||
store, and this storage interface will be able to accommodate all existing
|
||||
persistence workloads (e.g. block storage, local peer management information
|
||||
like the "address book", crash-recovery log like the WAL.) In addition to
|
||||
providing a more ergonomic interface and new semantics, by selecting a single
|
||||
storage engine tendermint can use native durability and atomicity features of
|
||||
the storage engine and simplify its own implementations.
|
||||
|
||||
Data Access Patterns
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Tendermint's data access patterns have the following characteristics:
|
||||
|
||||
- aggregate data size often exceeds memory.
|
||||
|
||||
- data is rarely mutated after it's written for most data (e.g. blocks), but
|
||||
small amounts of working data is persisted by nodes and is frequently
|
||||
mutated (e.g. peer information, validator information.)
|
||||
|
||||
- read patterns can be quite random.
|
||||
|
||||
- crash resistance and crash recovery, provided by write-ahead-logs (in
|
||||
consensus, and potentially for the mempool) should allow the system to
|
||||
resume work after an unexpected shut down.
|
||||
|
||||
Project Goals
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
As we think about replacing the current persistence layer, we should consider
|
||||
the following high level goals:
|
||||
|
||||
- drop dependencies on storage engines that have a CGo dependency.
|
||||
|
||||
- encapsulate data format and data storage from higher-level services
|
||||
(e.g. reactors) within tendermint.
|
||||
|
||||
- select a storage engine that does not incur any additional operational
|
||||
complexity (e.g. database should be embedded.)
|
||||
|
||||
- provide database semantics with sufficient ACID, snapshots, and
|
||||
transactional support.
|
||||
|
||||
Open Questions
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The following questions remain:
|
||||
|
||||
- what kind of data-access concurrency does tendermint require?
|
||||
|
||||
- would tendermint users SDK/etc. benefit from some shared database
|
||||
infrastructure?
|
||||
|
||||
- In earlier conversations it seemed as if the SDK has selected Badger and
|
||||
RocksDB for their storage engines, and it might make sense to be able to
|
||||
(optionally) pass a handle to a Badger instance between the libraries in
|
||||
some cases.
|
||||
|
||||
- what are typical data sizes, and what kinds of memory sizes can we expect
|
||||
operators to be able to provide?
|
||||
|
||||
- in addition to simple persistence, what kind of additional semantics would
|
||||
tendermint like to enjoy (e.g. transactional semantics, unique constraints,
|
||||
indexes, in-place-updates, etc.)?
|
||||
|
||||
Decision Framework
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Given the constraint of removing the CGo dependency, the decision is between
|
||||
"badger" and "boltdb" (in the form of the etcd/CoreOS fork,) as low level. On
|
||||
top of this and somewhat orthogonally, we must also decide on the interface to
|
||||
the database and how the larger application will have to interact with the
|
||||
database layer. Users of the data layer shouldn't ever need to interact with
|
||||
raw byte slices from the database, and should mostly have the experience of
|
||||
interacting with Go-types.
|
||||
|
||||
Badger is more consistently developed and has a broader feature set than
|
||||
Bolt. At the same time, Badger is likely more memory intensive and may have
|
||||
more overhead in terms of open file handles given it's model. At first glance,
|
||||
Badger is the obvious choice: it's actively developed and it has a lot of
|
||||
features that could be useful. Bolt is not without some benefits: it's stable
|
||||
and is maintained by the etcd folks, it's simpler model (single memory mapped
|
||||
file, etc,) may be easier to reason about.
|
||||
|
||||
I propose that we consider the following specific questions about storage
|
||||
engines:
|
||||
|
||||
- does Badger's evolving development, which may result in data file format
|
||||
changes in the future, and could restrict our access to using the latest
|
||||
version of the library between major upgrades, present a problem?
|
||||
|
||||
- do we do we have goals/concerns about memory footprint that Badger may
|
||||
prevent us from hitting, particularly as data sets grow over time?
|
||||
|
||||
- what kind of additional tooling might we need/like to build (dump/restore,
|
||||
etc.)?
|
||||
|
||||
- do we want to run unit/integration tests against a data files on disk rather
|
||||
than relying exclusively on the memory database?
|
||||
|
||||
Project Scope
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
This project will consist of the following aspects:
|
||||
|
||||
- selecting a storage engine, and modifying the tendermint codebase to
|
||||
disallow any configuration of the storage engine outside of the tendermint.
|
||||
|
||||
- remove the dependency on the current tm-db interfaces and replace with some
|
||||
internalized, safe, and ergonomic interface for data persistence with all
|
||||
required database semantics.
|
||||
|
||||
- update core tendermint code to use the new interface and data tools.
|
||||
|
||||
Next Steps
|
||||
~~~~~~~~~~
|
||||
|
||||
- circulate the RFC, and discuss options with appropriate stakeholders.
|
||||
|
||||
- write brief ADR to summarize decisions around technical decisions reached
|
||||
during the RFC phase.
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
- `bolddb <https://github.com/etcd-io/bbolt>`_
|
||||
- `badger <https://github.com/dgraph-io/badger>`_
|
||||
- `badgerdb overview <https://dbdb.io/db/badgerdb>`_
|
||||
- `botldb overview <https://dbdb.io/db/boltdb>`_
|
||||
- `boltdb vs badger <https://tech.townsourced.com/post/boltdb-vs-badger>`_
|
||||
- `bolthold <https://github.com/timshannon/bolthold>`_
|
||||
- `badgerhold <https://github.com/timshannon/badgerhold>`_
|
||||
- `Pebble <https://github.com/cockroachdb/pebble>`_
|
||||
- `SDK Issue Regarding IVAL <https://github.com/cosmos/cosmos-sdk/issues/7100>`_
|
||||
- `SDK Discussion about SMT/IVAL <https://github.com/cosmos/cosmos-sdk/discussions/8297>`_
|
||||
|
||||
Discussion
|
||||
----------
|
||||
|
||||
- All things being equal, my tendency would be to use badger, with badgerhold
|
||||
(if that makes sense) for its ergonomics and indexing capabilities, which
|
||||
will require some small selection of wrappers for better write transaction
|
||||
support. This is a weakly held tendency/belief and I think it would be
|
||||
useful for the RFC process to build consensus (or not) around this basic
|
||||
assumption.
|
||||
420
docs/rfc/rfc-002-ipc-ecosystem.md
Normal file
420
docs/rfc/rfc-002-ipc-ecosystem.md
Normal file
@@ -0,0 +1,420 @@
|
||||
# RFC 002: Interprocess Communication (IPC) in Tendermint
|
||||
|
||||
## Changelog
|
||||
|
||||
- 08-Sep-2021: Initial draft (@creachadair).
|
||||
|
||||
|
||||
## Abstract
|
||||
|
||||
Communication in Tendermint among consensus nodes, applications, and operator
|
||||
tools all use different message formats and transport mechanisms. In some
|
||||
cases there are multiple options. Having all these options complicates both the
|
||||
code and the developer experience, and hides bugs. To support a more robust,
|
||||
trustworthy, and usable system, we should document which communication paths
|
||||
are essential, which could be removed or reduced in scope, and what we can
|
||||
improve for the most important use cases.
|
||||
|
||||
This document proposes a variety of possible improvements of varying size and
|
||||
scope. Specific design proposals should get their own documentation.
|
||||
|
||||
|
||||
## Background
|
||||
|
||||
The Tendermint state replication engine has a complex IPC footprint.
|
||||
|
||||
1. Consensus nodes communicate with each other using a networked peer-to-peer
|
||||
message-passing protocol.
|
||||
|
||||
2. Consensus nodes communicate with the application whose state is being
|
||||
replicated via the [Application BlockChain Interface (ABCI)][abci].
|
||||
|
||||
3. Consensus nodes export a network-accessible [RPC service][rpc-service] to
|
||||
support operations (bootstrapping, debugging) and synchronization of [light clients][light-client].
|
||||
This interface is also used by the [`tendermint` CLI][tm-cli].
|
||||
|
||||
4. Consensus nodes export a gRPC service exposing a subset of the methods of
|
||||
the RPC service described by (3). This was intended to simplify the
|
||||
implementation of tools that already use gRPC to communicate with an
|
||||
application (via the Cosmos SDK), and wanted to also talk to the consensus
|
||||
node without implementing yet another RPC protocol.
|
||||
|
||||
The gRPC interface to the consensus node has been deprecated and is slated
|
||||
for removal in the forthcoming Tendermint v0.36 release.
|
||||
|
||||
5. Consensus nodes may optionally communicate with a "remote signer" that holds
|
||||
a validator key and can provide public keys and signatures to the consensus
|
||||
node. One of the stated goals of this configuration is to allow the signer
|
||||
to be run on a private network, separate from the consensus node, so that a
|
||||
compromise of the consensus node from the public network would be less
|
||||
likely to expose validator keys.
|
||||
|
||||
## Discussion: Transport Mechanisms
|
||||
|
||||
### Remote Signer Transport
|
||||
|
||||
A remote signer communicates with the consensus node in one of two ways:
|
||||
|
||||
1. "Raw": Using a TCP or Unix-domain socket which carries varint-prefixed
|
||||
protocol buffer messages. In this mode, the consensus node is the server,
|
||||
and the remote signer is the client.
|
||||
|
||||
This mode has been deprecated, and is intended to be removed.
|
||||
|
||||
2. gRPC: This mode uses the same protobuf messages as "Raw" node, but uses a
|
||||
standard encrypted gRPC HTTP/2 stub as the transport. In this mode, the
|
||||
remote signer is the server and the consensus node is the client.
|
||||
|
||||
|
||||
### ABCI Transport
|
||||
|
||||
In ABCI, the _application_ is the server, and the Tendermint consensus engine
|
||||
is the client. Most applications implement the server using the [Cosmos SDK][cosmos-sdk],
|
||||
which handles low-level details of the ABCI interaction and provides a
|
||||
higher-level interface to the rest of the application. The SDK is written in Go.
|
||||
|
||||
Beneath the SDK, the application communicates with Tendermint core in one of
|
||||
two ways:
|
||||
|
||||
- In-process direct calls (for applications written in Go and compiled against
|
||||
the Tendermint code). This is an optimization for the common case where an
|
||||
application is written in Go, to save on the overhead of marshaling and
|
||||
unmarshaling requests and responses within the same process:
|
||||
[`abci/client/local_client.go`][local-client]
|
||||
|
||||
- A custom remote procedure protocol built on wire-format protobuf messages
|
||||
using a socket (the "socket protocol"): [`abci/server/socket_server.go`][socket-server]
|
||||
|
||||
The SDK also provides a [gRPC service][sdk-grpc] accessible from outside the
|
||||
application, allowing transactions to be broadcast to the network, look up
|
||||
transactions, and simulate transaction costs.
|
||||
|
||||
|
||||
### RPC Transport
|
||||
|
||||
The consensus node RPC service allows callers to query consensus parameters
|
||||
(genesis data, transactions, commits), node status (network info, health
|
||||
checks), application state (abci_query, abci_info), mempool state, and other
|
||||
attributes of the node and its application. The service also provides methods
|
||||
allowing transactions and evidence to be injected ("broadcast") into the
|
||||
blockchain.
|
||||
|
||||
The RPC service is exposed in several ways:
|
||||
|
||||
- HTTP GET: Queries may be sent as URI parameters, with method names in the path.
|
||||
|
||||
- HTTP POST: Queries may be sent as JSON-RPC request messages in the body of an
|
||||
HTTP POST request. The server uses a custom implementation of JSON-RPC that
|
||||
is not fully compatible with the [JSON-RPC 2.0 spec][json-rpc], but handles
|
||||
the common cases.
|
||||
|
||||
- Websocket: Queries may be sent as JSON-RPC request messages via a websocket.
|
||||
This transport uses more or less the same JSON-RPC plumbing as the HTTP POST
|
||||
handler.
|
||||
|
||||
The websocket endpoint also includes three methods that are _only_ exported
|
||||
via websocket, which appear to support event subscription.
|
||||
|
||||
- gRPC: A subset of queries may be issued in protocol buffer format to the gRPC
|
||||
interface described above under (4). As noted, this endpoint is deprecated
|
||||
and will be removed in v0.36.
|
||||
|
||||
### Opportunities for Simplification
|
||||
|
||||
**Claim:** There are too many IPC mechanisms.
|
||||
|
||||
The preponderance of ABCI usage is via the Cosmos SDK, which means the
|
||||
application and the consensus node are compiled together into a single binary,
|
||||
and the consensus node calls the ABCI methods of the application directly as Go
|
||||
functions.
|
||||
|
||||
We also need a true IPC transport to support ABCI applications _not_ written in
|
||||
Go. There are also several known applications written in Rust, for example
|
||||
(including [Anoma](https://github.com/anoma/anoma), Penumbra,
|
||||
[Oasis](https://github.com/oasisprotocol/oasis-core), Twilight, and
|
||||
[Nomic](https://github.com/nomic-io/nomic)). Ideally we will have at most one
|
||||
such transport "built-in": More esoteric cases can be handled by a custom proxy.
|
||||
Pragmatically, gRPC is probably the right choice here.
|
||||
|
||||
The primary consumers of the multi-headed "RPC service" today are the light
|
||||
client and the `tendermint` command-line client. There is probably some local
|
||||
use via curl, but I expect that is mostly ad hoc. Ethan reports that nodes are
|
||||
often configured with the ports to the RPC service blocked, which is good for
|
||||
security but complicates use by the light client.
|
||||
|
||||
### Context: Remote Signer Issues
|
||||
|
||||
Since the remote signer needs a secure communication channel to exchange keys
|
||||
and signatures, and is expected to run truly remotely from the node (i.e., on a
|
||||
separate physical server), there is not a whole lot we can do here. We should
|
||||
finish the deprecation and removal of the "raw" socket protocol between the
|
||||
consensus node and remote signers, but the use of gRPC is appropriate.
|
||||
|
||||
The main improvement we can make is to simplify the implementation quite a bit,
|
||||
once we no longer need to support both "raw" and gRPC transports.
|
||||
|
||||
### Context: ABCI Issues
|
||||
|
||||
In the original design of ABCI, the presumption was that all access to the
|
||||
application should be mediated by the consensus node. The idea is that outside
|
||||
access could change application state and corrupt the consensus process, which
|
||||
relies on the application to be deterministic. Of course, even without outside
|
||||
access an application could behave nondeterministically, but allowing other
|
||||
programs to send it requests was seen as courting trouble.
|
||||
|
||||
Conversely, users noted that most of the time, tools written for a particular
|
||||
application don't want to talk to the consensus module directly. The
|
||||
application "owns" the state machine the consensus engine is replicating, so
|
||||
tools that care about application state should talk to the application.
|
||||
Otherwise, they would have to bake in knowledge about Tendermint (e.g., its
|
||||
interfaces and data structures) just because of the mediation.
|
||||
|
||||
For clients to talk directly to the application, however, there is another
|
||||
concern: The consensus node is the ABCI _client_, so it is inconvenient for the
|
||||
application to "push" work into the consensus module via ABCI itself. The
|
||||
current implementation works around this by calling the consensus node's RPC
|
||||
service, which exposes an `ABCIQuery` kitchen-sink method that allows the
|
||||
application a way to poke ABCI messages in the other direction.
|
||||
|
||||
Without this RPC method, you could work around this (at least in principle) by
|
||||
having the consensus module "poll" the application for work that needs done,
|
||||
but that has unsatisfactory implications for performance and robustness, as
|
||||
well as being harder to understand.
|
||||
|
||||
There has apparently been discussion about trying to make a more bidirectional
|
||||
communication between the consensus node and the application, but this issue
|
||||
seems to still be unresolved.
|
||||
|
||||
Another complication of ABCI is that it requires the application (server) to
|
||||
maintain [four separate connections][abci-conn]: One for "consensus" operations
|
||||
(BeginBlock, EndBlock, DeliverTx, Commit), one for "mempool" operations, one
|
||||
for "query" operations, and one for "snapshot" (state synchronization) operations.
|
||||
The rationale seems to have been that these groups of operations should be able
|
||||
to proceed concurrently with each other. In practice, it results in a very complex
|
||||
state management problem to coordinate state updates between the separate streams.
|
||||
While application authors in Go are mostly insulated from that complexity by the
|
||||
Cosmos SDK, the plumbing to maintain those separate streams is complicated, hard
|
||||
to understand, and we suspect it contains concurrency bugs and/or lock contention
|
||||
issues affecting performance that are subtle and difficult to pin down.
|
||||
|
||||
Even without changing the semantics of any ABCI operations, this code could be
|
||||
made smaller and easier to debug by separating the management of concurrency
|
||||
and locking from the IPC transport: If all requests and responses are routed
|
||||
through one connection, the server can explicitly maintain priority queues for
|
||||
requests and responses, and make less-conservative decisions about when locks
|
||||
are (or aren't) required to synchronize state access. With independent queues,
|
||||
the server must lock conservatively, and no optimistic scheduling is practical.
|
||||
|
||||
This would be a tedious implementation change, but should be achievable without
|
||||
breaking any of the existing interfaces. More importantly, it could potentially
|
||||
address a lot of difficult concurrency and performance problems we currently
|
||||
see anecdotally but have difficultly isolating because of how intertwined these
|
||||
separate message streams are at runtime.
|
||||
|
||||
TODO: Impact of ABCI++ for this topic?
|
||||
|
||||
### Context: RPC Issues
|
||||
|
||||
The RPC system serves several masters, and has a complex surface area. I
|
||||
believe there are some improvements that can be exposed by separating some of
|
||||
these concerns.
|
||||
|
||||
The Tendermint light client currently uses the RPC service to look up blocks
|
||||
and transactions, and to forward ABCI queries to the application. The light
|
||||
client proxy uses the RPC service via a websocket. The Cosmos IBC relayer also
|
||||
uses the RPC service via websocket to watch for transaction events, and uses
|
||||
the `ABCIQuery` method to fetch information and proofs for posted transactions.
|
||||
|
||||
Some work is already underway toward using P2P message passing rather than RPC
|
||||
to synchronize light client state with the rest of the network. IBC relaying,
|
||||
however, requires access to the event system, which is currently not accessible
|
||||
except via the RPC interface. Event subscription _could_ be exposed via P2P,
|
||||
but that is a larger project since it adds P2P communication load, and might
|
||||
thus have an impact on the performance of consensus.
|
||||
|
||||
If event subscription can be moved into the P2P network, we could entirely
|
||||
remove the websocket transport, even for clients that still need access to the
|
||||
RPC service. Until then, we may still be able to reduce the scope of the
|
||||
websocket endpoint to _only_ event subscription, by moving uses of the RPC
|
||||
server as a proxy to ABCI over to the gRPC interface.
|
||||
|
||||
Having the RPC server still makes sense for local bootstrapping and operations,
|
||||
but can be further simplified. Here are some specific proposals:
|
||||
|
||||
- Remove the HTTP GET interface entirely.
|
||||
|
||||
- Simplify JSON-RPC plumbing to remove unnecessary reflection and wrapping.
|
||||
|
||||
- Remove the gRPC interface (this is already planned for v0.36).
|
||||
|
||||
- Separate the websocket interface from the rest of the RPC service, and
|
||||
restrict it to only event subscription.
|
||||
|
||||
Eventually we should try to emove the websocket interface entirely, but we
|
||||
will need to revisit that (probably in a new RFC) once we've done some of the
|
||||
easier things.
|
||||
|
||||
These changes would preserve the ability of operators to issue queries with
|
||||
curl (but would require using JSON-RPC instead of URI parameters). That would
|
||||
be a little less user-friendly, but for a use case that should not be that
|
||||
prevalent.
|
||||
|
||||
These changes would also preserve compatibility with existing JSON-RPC based
|
||||
code paths like the `tendermint` CLI and the light client (even ahead of
|
||||
further work to remove that dependency).
|
||||
|
||||
**Design goal:** An operator should be able to disable non-local access to the
|
||||
RPC server on any node in the network without impairing the ability of the
|
||||
network to function for service of state replication, including light clients.
|
||||
|
||||
**Design principle:** All communication required to implement and monitor the
|
||||
consensus network should use P2P, including the various synchronizations.
|
||||
|
||||
### Options for ABCI Transport
|
||||
|
||||
The majority of current usage is in Go, and the majority of that is mediated by
|
||||
the Cosmos SDK, which uses the "direct call" interface. There is probably some
|
||||
opportunity to clean up the implementation of that code, notably by inverting
|
||||
which interface is at the "top" of the abstraction stack (currently it acts
|
||||
like an RPC interface, and escape-hatches into the direct call). However, this
|
||||
general approach works fine and doesn't need to be fundamentally changed.
|
||||
|
||||
For applications _not_ written in Go, the two remaining options are the
|
||||
"socket" protocol (another variation on varint-prefixed protobuf messages over
|
||||
an unstructured stream) and gRPC. It would be nice if we could get rid of one
|
||||
of these to reduce (unneeded?) optionality.
|
||||
|
||||
Since both the socket protocol and gRPC depend on protocol buffers, the
|
||||
"socket" protocol is the most obvious choice to remove. While gRPC is more
|
||||
complex, the set of languages that _have_ protobuf support but _lack_ gRPC
|
||||
support is small. Moreover, gRPC is already widely used in the rest of the
|
||||
ecosystem (including the Cosmos SDK).
|
||||
|
||||
If some use case did arise later that can't work with gRPC, it would not be too
|
||||
difficult for that application author to write a little proxy (in Go) that
|
||||
bridges the convenient SDK APIs into a simpler protocol than gRPC.
|
||||
|
||||
**Design principle:** It is better for an uncommon special case to carry the
|
||||
burdens of its specialness, than to bake an escape hatch into the infrastructure.
|
||||
|
||||
**Recommendation:** We should deprecate and remove the socket protocol.
|
||||
|
||||
### Options for RPC Transport
|
||||
|
||||
[ADR 057][adr-57] proposes using gRPC for the Tendermint RPC implementation.
|
||||
This is still possible, but if we are able to simplify and decouple the
|
||||
concerns as described above, I do not think it should be necessary.
|
||||
|
||||
While JSON-RPC is not the best possible RPC protocol for all situations, it has
|
||||
some advantages over gRPC for our domain. Specifically:
|
||||
|
||||
- It is easy to call JSON-RPC manually from the command-line, which helps with
|
||||
a common concern for the RPC service, local debugging and operations.
|
||||
|
||||
Relatedly: JSON is relatively easy for humans to read and write, and it can
|
||||
be easily copied and pasted to share sample queries and debugging results in
|
||||
chat, issue comments, and so on. Ideally, the RPC service will not be used
|
||||
for activities where the costs of a text protocol are important compared to
|
||||
its legibility and manual usability benefits.
|
||||
|
||||
- gRPC has an enormous dependency footprint for both clients and servers, and
|
||||
many of the features it provides to support security and performance
|
||||
(encryption, compression, streaming, etc.) are mostly irrelevant to local
|
||||
use. Tendermint already needs to include a gRPC client for the remote signer,
|
||||
but if we can avoid the need for a _client_ to depend on gRPC, that is a win
|
||||
for usability.
|
||||
|
||||
- If we intend to migrate light clients off RPC to use P2P entirely, there is
|
||||
no advantage to forcing a temporary migration to gRPC along the way; and once
|
||||
the light client is not dependent on the RPC service, the efficiency of the
|
||||
protocol is much less important.
|
||||
|
||||
- We can still get the benefits of generated data types using protocol buffers, even
|
||||
without using gRPC:
|
||||
|
||||
- Protobuf defines a standard JSON encoding for all message types so
|
||||
languages with protobuf support do not need to worry about type mapping
|
||||
oddities.
|
||||
|
||||
- Using JSON means that even languages _without_ good protobuf support can
|
||||
implement the protocol with a bit more work, and I expect this situation to
|
||||
be rare.
|
||||
|
||||
Even if a language lacks a good standard JSON-RPC mechanism, the protocol is
|
||||
lightweight and can be implemented by simple send/receive over TCP or
|
||||
Unix-domain sockets with no need for code generation, encryption, etc. gRPC
|
||||
uses a complex HTTP/2 based transport that is not easily replicated.
|
||||
|
||||
### Future Work
|
||||
|
||||
The background and proposals sketched above focus on the existing structure of
|
||||
Tendermint and improvements we can make in the short term. It is worthwhile to
|
||||
also consider options for longer-term broader changes to the IPC ecosystem.
|
||||
The following outlines some ideas at a high level:
|
||||
|
||||
- **Consensus service:** Today, the application and the consensus node are
|
||||
nominally connected only via ABCI. Tendermint was originally designed with
|
||||
the assumption that all communication with the application should be mediated
|
||||
by the consensus node. Based on further experience, however, the design goal
|
||||
is now that the _application_ should be the mediator of application state.
|
||||
|
||||
As noted above, however, ABCI is a client/server protocol, with the
|
||||
application as the server. For outside clients that turns out to have been a
|
||||
good choice, but it complicates the relationship between the application and
|
||||
the consensus node: Previously transactions were entered via the node, now
|
||||
they are entered via the app.
|
||||
|
||||
We have worked around this by using the Tendermint RPC service to give the
|
||||
application a "back channel" to the consensus node, so that it can push
|
||||
transactions back into the consensus network. But the RPC service exposes a
|
||||
lot of other functionality, too, including event subscription, block and
|
||||
transaction queries, and a lot of node status information.
|
||||
|
||||
Even if we can't easily "fix" the orientation of the ABCI relationship, we
|
||||
could improve isolation by splitting out the parts of the RPC service that
|
||||
the application needs as a back-channel, and sharing those _only_ with the
|
||||
application. By defining a "consensus service", we could give the application
|
||||
a way to talk back limited to only the capabilities it needs. This approach
|
||||
has the benefit that we could do it without breaking existing use, and if we
|
||||
later did "fix" the ABCI directionality, we could drop the special case
|
||||
without disrupting the rest of the RPC interface.
|
||||
|
||||
- **Event service:** Right now, the IBC relayer relies on the Tendermint RPC
|
||||
service to provide a stream of block and transaction events, which it uses to
|
||||
discover which transactions need relaying to other chains. While I think
|
||||
that event subscription should eventually be handled via P2P, we could gain
|
||||
some immediate benefit by splitting out event subscription from the rest of
|
||||
the RPC service.
|
||||
|
||||
In this model, an event subscription service would be exposed on the public
|
||||
network, but on a different endpoint. This would remove the need for the RPC
|
||||
service to support the websocket protocol, and would allow operators to
|
||||
isolate potentially sensitive status query results from the public network.
|
||||
|
||||
At the moment the relayers also use the RPC service to get block data for
|
||||
synchronization, but work is already in progress to handle that concern via
|
||||
the P2P layer. Once that's done, event subscription could be separated.
|
||||
|
||||
Separating parts of the existing RPC service is not without cost: It might
|
||||
require additional connection endpoints, for example, though it is also not too
|
||||
difficult for multiple otherwise-independent services to share a connection.
|
||||
|
||||
In return, though, it would become easier to reduce transport options and for
|
||||
operators to independently control access to sensitive data. Considering the
|
||||
viability and implications of these ideas is beyond the scope of this RFC, but
|
||||
they are documented here since they follow from the background we have already
|
||||
discussed.
|
||||
|
||||
## References
|
||||
|
||||
[abci]: https://github.com/tendermint/spec/tree/95cf253b6df623066ff7cd4074a94e7a3f147c7a/spec/abci
|
||||
[rpc-service]: https://docs.tendermint.com/master/rpc/
|
||||
[light-client]: https://docs.tendermint.com/master/tendermint-core/light-client.html
|
||||
[tm-cli]: https://github.com/tendermint/tendermint/tree/master/cmd/tendermint
|
||||
[cosmos-sdk]: https://github.com/cosmos/cosmos-sdk/
|
||||
[local-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/local_client.go
|
||||
[socket-server]: https://github.com/tendermint/tendermint/blob/master/abci/server/socket_server.go
|
||||
[sdk-grpc]: https://pkg.go.dev/github.com/cosmos/cosmos-sdk/types/tx#ServiceServer
|
||||
[json-rpc]: https://www.jsonrpc.org/specification
|
||||
[abci-conn]: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#state
|
||||
[adr-57]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-057-RPC.md
|
||||
213
docs/rfc/rfc-004-e2e-framework.rst
Normal file
213
docs/rfc/rfc-004-e2e-framework.rst
Normal file
@@ -0,0 +1,213 @@
|
||||
========================================
|
||||
RFC 004: E2E Test Framework Enhancements
|
||||
========================================
|
||||
|
||||
Changelog
|
||||
---------
|
||||
|
||||
- 2021-09-14: started initial draft (@tychoish)
|
||||
|
||||
Abstract
|
||||
--------
|
||||
|
||||
This document discusses a series of improvements to the e2e test framework
|
||||
that we can consider during the next few releases to help boost confidence in
|
||||
Tendermint releases, and improve developer efficiency.
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
During the 0.35 release cycle, the E2E tests were a source of great
|
||||
value, helping to identify a number of bugs before release. At the same time,
|
||||
the tests were not consistently passing during this time, thereby reducing
|
||||
their value, and forcing the core development team to allocate time and energy
|
||||
to maintaining and chasing down issues with the e2e tests and the test
|
||||
harness. The experience of this release cycle calls to mind a series of
|
||||
improvements to the test framework, and this document attempts to capture
|
||||
these improvements, along with motivations, and potential for impact.
|
||||
|
||||
Projects
|
||||
--------
|
||||
|
||||
Flexible Workload Generation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Presently the e2e suite contains a single workload generation pattern, which
|
||||
exists simply to ensure that the test networks have some work during their
|
||||
runs. However, the shape and volume of the work is very consistent and is very
|
||||
gentle to help ensure test reliability.
|
||||
|
||||
We don't need a complex workload generation framework, but being able to have
|
||||
a few different workload shapes available for test networks, both generated and
|
||||
hand-crafted, would be useful.
|
||||
|
||||
Workload patterns/configurations might include:
|
||||
|
||||
- transaction targeting patterns (include light nodes, round robin, target
|
||||
individual nodes)
|
||||
|
||||
- variable transaction size over time.
|
||||
|
||||
- transaction broadcast option (synchronously, checked, fire-and-forget,
|
||||
mixed).
|
||||
|
||||
- number of transactions to submit.
|
||||
|
||||
- non-transaction workloads: (evidence submission, query, event subscription.)
|
||||
|
||||
Configurable Generator
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The nightly e2e suite is defined by the `testnet generator
|
||||
<https://github.com/tendermint/tendermint/blob/master/test/e2e/generator/generate.go#L13-L65>`_,
|
||||
and it's difficult to add dimensions or change the focus of the test suite in
|
||||
any way without modifying the implementation of the generator. If the
|
||||
generator were more configurable, potentially via a file rather than in
|
||||
the Go implementation, we could modify the focus of the test suite on the
|
||||
fly.
|
||||
|
||||
Features that we might want to configure:
|
||||
|
||||
- number of test networks to generate of various topologies, to improve
|
||||
coverage of different configurations.
|
||||
|
||||
- test application configurations (to modify the latency of ABCI calls, etc.)
|
||||
|
||||
- size of test networks.
|
||||
|
||||
- workload shape and behavior.
|
||||
|
||||
- initial sync and catch-up configurations.
|
||||
|
||||
The workload generator currently provides runtime options for limiting the
|
||||
generator to specific types of P2P stacks, and for generating multiple groups
|
||||
of test cases to support parallelism. The goal is to extend this pattern and
|
||||
avoid hardcoding the matrix of test cases in the generator code. Once the
|
||||
testnet configuration generation behavior is configurable at runtime,
|
||||
developers may be able to use the e2e framework to validate changes before
|
||||
landing changes that break e2e tests a day later.
|
||||
|
||||
In addition to the autogenerated suite, it might make sense to maintain a
|
||||
small collection of hand-crafted cases that exercise configurations of
|
||||
concern, to run as part of the nightly (or less frequent) loop.
|
||||
|
||||
Implementation Plan Structure
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
As a development team, we should determine the features should impact the e2e
|
||||
testing early in the development cycle, and if we intend to modify the e2e
|
||||
tests to exercise a feature, we should identify this early and begin the
|
||||
integration process as early as possible.
|
||||
|
||||
To facilitate this, we should adopt a practice whereby we exercise specific
|
||||
features that are currently under development more rigorously in the e2e
|
||||
suite, and then as development stabilizes we can reduce the number or weight
|
||||
of these features in the suite.
|
||||
|
||||
As of 0.35 there are essentially two end to end tests: the suite of 64
|
||||
generated test networks, and the hand crafted `ci.toml` test case. The
|
||||
generated test cases help provide systemtic coverage, while the `ci` run
|
||||
provides coverage for a large number of features.
|
||||
|
||||
Reduce Cycle Time
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
One of the barriers to leveraging the e2e framework, and one of the challenges
|
||||
in debugging failures, is the cycle time of running a single test iteration is
|
||||
quite high: 5 minutes to build the docker image, plus the time to run the test
|
||||
or tests.
|
||||
|
||||
There are a number of improvements and enhancements that can reduce the cycle
|
||||
time in practice:
|
||||
|
||||
- reduce the amount of time required to build the docker image used in these
|
||||
tests. Without the dependency on CGo, the tendermint binaries could be
|
||||
(cross) compiled outside of the docker container and then injected into
|
||||
them, which would take better advantage of docker's native caching,
|
||||
although, without the dependency on CGo there would be no hard requirement
|
||||
for the e2e tests to use docker.
|
||||
|
||||
- support test parallelism. Because of the way the testnets are orchestrated
|
||||
a single system can really only run one network at a time. For executions
|
||||
(local or remote) with more resources, there's no reason to run a few
|
||||
networks in parallel to reduce the feedback time.
|
||||
|
||||
- prune testnet configurations that are unlikely to provide good signal, to
|
||||
shorten the time to feedback.
|
||||
|
||||
- apply some kind of tiered approach to test execution, to improve the
|
||||
legibility of the test result. For example order tests by the dependency of
|
||||
their features, or run test networks without perturbations before running
|
||||
that configuration with perturbations, to be able to isolate the impact of
|
||||
specific features.
|
||||
|
||||
- orchestrate the test harness directly from go test rather than via a special
|
||||
harness and shell scripts so e2e tests may more naively fit into developers
|
||||
existing workflows.
|
||||
|
||||
Many of these improvements, particularly, reducing the build time will also
|
||||
reduce the time to get feedback during automated builds.
|
||||
|
||||
Deeper Insights
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
When a test network fails, it's incredibly difficult to understand _why_ the
|
||||
network failed, as the current system provides very little insight into the
|
||||
system outside of the process logs. When a test network stalls or fails
|
||||
developers should be able to quickly and easily get a sense of the state of
|
||||
the network and all nodes.
|
||||
|
||||
Improvements in persuit of this goal, include functionality that would help
|
||||
node operators in production environments by improving the quality and utility
|
||||
of the logging messages and other reported metrics, but also provide some
|
||||
tools to collect and aggregate this data for developers in the context of test
|
||||
networks.
|
||||
|
||||
- Interleave messages from all nodes in the network to be able to correlate
|
||||
events during the test run.
|
||||
|
||||
- Collect structured metrics of the system operation (CPU/MEM/IO) during the
|
||||
test run, as well as from each tendermint/application process.
|
||||
|
||||
- Build (simple) tools to be able to render and summarize the data collected
|
||||
during the test run to answer basic questions about test outcome.
|
||||
|
||||
Flexible Assertions
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Currently, all assertions run for every test network, which makes the
|
||||
assertions pretty bland, and the framework primarily useful as a smoke-test
|
||||
framework, but it might be useful to be able to write and run different
|
||||
tests for different configurations. This could allow us to test outside of the
|
||||
happy-path.
|
||||
|
||||
In general our existing assertions occupy a fraction of the total test time,
|
||||
so the relative cost of adding a few extra test assertions would be of limited
|
||||
cost, and could help build confidence.
|
||||
|
||||
Additional Kinds of Testing
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The existing e2e suite, exercises networks of nodes that have homogeneous
|
||||
tendermint version, stable configuration, that are expected to make
|
||||
progress. There are many other possible test configurations that may be
|
||||
interesting to engage with. These could include dimensions, such as:
|
||||
|
||||
- Multi-version testing to exercise our compatibility guarantees for networks
|
||||
that might have different tendermint versions.
|
||||
|
||||
- As a flavor or mult-version testing, include upgrade testing, to build
|
||||
confidence in migration code and procedures.
|
||||
|
||||
- Additional test applications, particularly practical-type applciations
|
||||
including some that use gaiad and/or the cosmos-sdk. Test-only applications
|
||||
that simulate other kinds of applications (e.g. variable application
|
||||
operation latency.)
|
||||
|
||||
- Tests of "non-viable" configurations that ensure that forbidden combinations
|
||||
lead to halts.
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
- `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_
|
||||
@@ -17,9 +17,9 @@ consensus gossip protocol.
|
||||
|
||||
## Using Block Sync
|
||||
|
||||
To support faster syncing, Tendermint offers a `fast-sync` mode, which
|
||||
To support faster syncing, Tendermint offers a `blocksync` mode, which
|
||||
is enabled by default, and can be toggled in the `config.toml` or via
|
||||
`--fast_sync=false`.
|
||||
`--blocksync.enable=false`.
|
||||
|
||||
In this mode, the Tendermint daemon will sync hundreds of times faster
|
||||
than if it used the real-time consensus process. Once caught up, the
|
||||
@@ -29,18 +29,23 @@ has at least one peer and it's height is at least as high as the max
|
||||
reported peer height. See [the IsCaughtUp
|
||||
method](https://github.com/tendermint/tendermint/blob/b467515719e686e4678e6da4e102f32a491b85a0/blockchain/pool.go#L128).
|
||||
|
||||
Note: There are two versions of Block Sync. We recommend using v0 as v2 is still in beta.
|
||||
Note: There are multiple versions of Block Sync. Please use v0 as the other versions are no longer supported.
|
||||
If you would like to use a different version you can do so by changing the version in the `config.toml`:
|
||||
|
||||
```toml
|
||||
#######################################################
|
||||
### Block Sync Configuration Connections ###
|
||||
#######################################################
|
||||
[fastsync]
|
||||
[blocksync]
|
||||
|
||||
# If this node is many blocks behind the tip of the chain, BlockSync
|
||||
# allows them to catchup quickly by downloading blocks in parallel
|
||||
# and verifying their commits
|
||||
enable = true
|
||||
|
||||
# Block Sync version to use:
|
||||
# 1) "v0" (default) - the legacy Block Sync implementation
|
||||
# 2) "v2" - complete redesign of v0, optimized for testability & readability
|
||||
# 1) "v0" (default) - the standard Block Sync implementation
|
||||
# 2) "v2" - DEPRECATED, please use v0
|
||||
version = "v0"
|
||||
```
|
||||
|
||||
@@ -55,4 +60,4 @@ the network best height, it will switches to the state sync mechanism and then e
|
||||
another event for exposing the fast-sync `complete` status and the state `height`.
|
||||
|
||||
The user can query the events by subscribing `EventQueryBlockSyncStatus`
|
||||
Please check [types](https://pkg.go.dev/github.com/tendermint/tendermint/types?utm_source=godoc#pkg-constants) for the details.
|
||||
Please check [types](https://pkg.go.dev/github.com/tendermint/tendermint/types?utm_source=godoc#pkg-constants) for the details.
|
||||
|
||||
@@ -185,51 +185,65 @@ the argument name and use `_` as a placeholder.
|
||||
|
||||
### Formatting
|
||||
|
||||
The following nuances when sending/formatting transactions should be
|
||||
taken into account:
|
||||
When sending transactions to the RPC interface, the following formatting rules
|
||||
must be followed:
|
||||
|
||||
With `GET`:
|
||||
Using `GET` (with parameters in the URL):
|
||||
|
||||
To send a UTF8 string byte array, quote the value of the tx parameter:
|
||||
To send a UTF8 string as transaction data, enclose the value of the `tx`
|
||||
parameter in double quotes:
|
||||
|
||||
```sh
|
||||
curl 'http://localhost:26657/broadcast_tx_commit?tx="hello"'
|
||||
```
|
||||
|
||||
which sends a 5 byte transaction: "h e l l o" \[68 65 6c 6c 6f\].
|
||||
which sends a 5-byte transaction: "h e l l o" \[68 65 6c 6c 6f\].
|
||||
|
||||
Note the URL must be wrapped with single quotes, else bash will ignore
|
||||
the double quotes. To avoid the single quotes, escape the double quotes:
|
||||
Note that the URL in this example is enclosed in single quotes to prevent the
|
||||
shell from interpreting the double quotes. Alternatively, you may escape the
|
||||
double quotes with backslashes:
|
||||
|
||||
```sh
|
||||
curl http://localhost:26657/broadcast_tx_commit?tx=\"hello\"
|
||||
```
|
||||
|
||||
Using a special character:
|
||||
The double-quoted format works with for multibyte characters, as long as they
|
||||
are valid UTF8, for example:
|
||||
|
||||
```sh
|
||||
curl 'http://localhost:26657/broadcast_tx_commit?tx="€5"'
|
||||
```
|
||||
|
||||
sends a 4 byte transaction: "€5" (UTF8) \[e2 82 ac 35\].
|
||||
sends a 4-byte transaction: "€5" (UTF8) \[e2 82 ac 35\].
|
||||
|
||||
To send as raw hex, omit quotes AND prefix the hex string with `0x`:
|
||||
Arbitrary (non-UTF8) transaction data may also be encoded as a string of
|
||||
hexadecimal digits (2 digits per byte). To do this, omit the quotation marks
|
||||
and prefix the hex string with `0x`:
|
||||
|
||||
```sh
|
||||
curl http://localhost:26657/broadcast_tx_commit?tx=0x01020304
|
||||
curl http://localhost:26657/broadcast_tx_commit?tx=0x68656C6C6F
|
||||
```
|
||||
|
||||
which sends a 4 byte transaction: \[01 02 03 04\].
|
||||
which sends the 5-byte transaction: \[68 65 6c 6c 6f\].
|
||||
|
||||
With `POST` (using `json`), the raw hex must be `base64` encoded:
|
||||
Using `POST` (with parameters in JSON), the transaction data are sent as a JSON
|
||||
string in base64 encoding:
|
||||
|
||||
```sh
|
||||
curl --data-binary '{"jsonrpc":"2.0","id":"anything","method":"broadcast_tx_commit","params": {"tx": "AQIDBA=="}}' -H 'content-type:text/plain;' http://localhost:26657
|
||||
curl http://localhost:26657 -H 'Content-Type: application/json' --data-binary '{
|
||||
"jsonrpc": "2.0",
|
||||
"id": "anything",
|
||||
"method": "broadcast_tx_commit",
|
||||
"params": {
|
||||
"tx": "aGVsbG8="
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
which sends the same 4 byte transaction: \[01 02 03 04\].
|
||||
which sends the same 5-byte transaction: \[68 65 6c 6c 6f\].
|
||||
|
||||
Note that raw hex cannot be used in `POST` transactions.
|
||||
Note that the hexadecimal encoding of transaction data is _not_ supported in
|
||||
JSON (`POST`) requests.
|
||||
|
||||
## Reset
|
||||
|
||||
|
||||
Reference in New Issue
Block a user