Remove ADR and RFC docs from the v0.35.x backport branch. (#7866)

2026-02-07 12:30:45 +00:00 · 2022-02-18 07:14:05 -08:00
parent 626caf7418
commit 601e44daff
89 changed files with 0 additions and 12293 deletions
--- a/docs/rfc/README.md
+++ b/docs/rfc/README.md
@@ -1,47 +0,0 @@
----
-order: 1
-parent:
-  order: false
----
-
-# Requests for Comments
-
-A Request for Comments (RFC) is a record of discussion on an open-ended topic
-related to the design and implementation of Tendermint Core, for which no
-immediate decision is required.
-
-The purpose of an RFC is to serve as a historical record of a high-level
-discussion that might otherwise only be recorded in an ad hoc way (for example,
-via gists or Google docs) that are difficult to discover for someone after the
-fact. An RFC _may_ give rise to more specific architectural _decisions_ for
-Tendermint, but those decisions must be recorded separately in [Architecture
-Decision Records (ADR)](./../architecture).
-
-As a rule of thumb, if you can articulate a specific question that needs to be
-answered, write an ADR. If you need to explore the topic and get input from
-others to know what questions need to be answered, an RFC may be appropriate.
-
-## RFC Content
-
-An RFC should provide:
-
-- A **changelog**, documenting when and how the RFC has changed.
-- An **abstract**, briefly summarizing the topic so the reader can quickly tell
-  whether it is relevant to their interest.
-- Any **background** a reader will need to understand and participate in the
-  substance of the discussion (links to other documents are fine here).
-- The **discussion**, the primary content of the document.
-
-The [rfc-template.md](./rfc-template.md) file includes placeholders for these
-sections.
-
-## Table of Contents
-
-- [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
-- [RFC-001: Storage Engines](./rfc-001-storage-engine.rst)
-- [RFC-002: Interprocess Communication](./rfc-002-ipc-ecosystem.md)
-- [RFC-003: Performance Taxonomy](./rfc-003-performance-questions.md)
-- [RFC-004: E2E Test Framework Enhancements](./rfc-004-e2e-framework.md)
-- [RFC-005: Event System](./rfc-005-event-system.rst)
-
-<!-- - [RFC-NNN: Title](./rfc-NNN-title.md) -->
--- a/docs/rfc/rfc-000-p2p-roadmap.rst
+++ b/docs/rfc/rfc-000-p2p-roadmap.rst
@@ -1,316 +0,0 @@
-====================
-RFC 000: P2P Roadmap
-====================
-
-Changelog
----------
-
-- 2021-08-20: Completed initial draft and distributed via a gist
-- 2021-08-25: Migrated as an RFC and changed format
-
-Abstract
---------
-
-This document discusses the future of peer network management in Tendermint, with
-a particular focus on features, semantics, and a proposed roadmap.
-Specifically, we consider libp2p as a toolkit for implementing some fundamentals.
-
-Background
-----------
-
-For the 0.35 release cycle the switching/routing layer of Tendermint was
-replaced. This work was done "in place," and produced a version of Tendermint
-that was backward-compatible and interoperable with previous versions of the
-software. While there are new p2p/peer management constructs in the new
-version (e.g. ``PeerManager`` and ``Router``), the main effect of this change
-was to simplify the ways that other components within Tendermint interacted with
-the peer management layer, and to make it possible for higher-level components
-(specifically the reactors) to be used and tested more independently.
-
-This refactoring, which was a major undertaking, was entirely necessary to
-enable areas for future development and iteration on this aspect of
-Tendermint. There are also a number of potential user-facing features that
-depend heavily on the p2p layer: additional transport protocols, transport
-compression, improved resilience to network partitions. These improvements to
-modularity, stability, and reliability of the p2p system will also make
-ongoing maintenance and feature development easier in the rest of Tendermint.
-
-Critique of Current Peer-to-Peer Infrastructure
------------------------------------------------
-
-The current (refactored) P2P stack is an improvement on the previous iteration
-(legacy), but as of 0.35, there remains room for improvement in the design and
-implementation of the P2P layer.
-
-Some limitations of the current stack include:
-
-- heavy reliance on buffering to avoid backups in the flow of components,
-  which is fragile to maintain, can lead to unexpected memory usage
-  patterns, and forces the routing layer to decide when messages
-  should be discarded.
-
-- the current p2p stack relies on convention (rather than the compiler) to
-  enforce the API boundaries and conventions between reactors and the router,
-  making it very easy to write "wrong" reactor code or introduce a bad
-  dependency.
-
-- the current stack is probably more complex and difficult to maintain because
-  the legacy system must coexist with the new components in 0.35. When the
-  legacy stack is removed, some simple changes will become possible that
-  could reduce the complexity of the new system (e.g. `#6598
-  <https://github.com/tendermint/tendermint/issues/6598>`_).
-
-- the current stack encapsulates a lot of information about peers, and makes it
-  difficult to expose that information to monitoring/observability tools. This
-  general opacity also makes it difficult to interact with the peer system
-  from other areas of the code base (e.g. tests, reactors).
-
-- the legacy stack provided some control to operators to force the system to
-  dial new peers or seed nodes or manipulate the topology of the system *in
-  situ*. The current stack can't easily provide this, and while the new stack
-  may have better behavior, it does leave operators' hands tied.
-
-Some of these issues will be resolved early in the 0.36 cycle, with the
-removal of the legacy components.
-
-The 0.36 release also provides the opportunity to make changes to the
-protocol, as the release will not be compatible with previous releases.
-
-Areas for Development
---------------------
-
-These sections describe features that may make sense to include in a Phase 2 of
-a P2P project.
-
-Internal Message Passing
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-Currently, there's no provision for intranode communication using the P2P
-layer, which means that when two reactors need to interact with each other they
-must depend on each other's interfaces and initialization. These interactions
-(e.g. transitions between blocksync and consensus) could instead be changed
-from procedure calls to message passing.
-
-This is a relatively simple change and could be implemented with the following
-components:
-
-- a constant to represent "local" delivery as the ``To`` field on
-  ``p2p.Envelope``.
-
-- a special path for routing local messages that doesn't require message
-  serialization (protobuf marshaling/unmarshaling).
-
-Adding these semantics, particularly in conjunction with synchronous
-semantics, provides a solution to dependency graph problems currently present
-in the Tendermint codebase, which will simplify development and make it
-possible to isolate components for testing.
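A minimal Go sketch of the two components listed above follows. It assumes a simplified envelope type: `LocalNodeID`, `Envelope`, and `Router` are hypothetical stand-ins, not the real `p2p` package API, and JSON stands in for protobuf encoding. A local envelope is handed to the receiving reactor's inbox as a Go value, while a remote one is serialized first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeID identifies a peer. LocalNodeID is a hypothetical sentinel meaning
// "deliver to a reactor in this same process".
type NodeID string

const LocalNodeID NodeID = "<local>"

// Envelope is a simplified, hypothetical stand-in for p2p.Envelope.
type Envelope struct {
	To      NodeID
	Message any
}

// Router (hypothetical) delivers local envelopes straight to an in-process
// inbox and serializes everything else before handing it to the network.
type Router struct {
	local  map[string]chan any                   // channel name -> reactor inbox
	remote func(to NodeID, payload []byte) error // network send (stubbed)
}

// Send routes e on the named channel.
func (r *Router) Send(channel string, e Envelope) error {
	if e.To == LocalNodeID {
		inbox, ok := r.local[channel]
		if !ok {
			return fmt.Errorf("no local receiver for channel %q", channel)
		}
		inbox <- e.Message // handed over as a Go value: no marshaling
		return nil
	}
	payload, err := json.Marshal(e.Message) // JSON stands in for protobuf here
	if err != nil {
		return err
	}
	return r.remote(e.To, payload)
}

func main() {
	inbox := make(chan any, 1)
	var sent []string
	r := &Router{
		local: map[string]chan any{"consensus": inbox},
		remote: func(to NodeID, payload []byte) error {
			sent = append(sent, string(to)+" "+string(payload))
			return nil
		},
	}
	_ = r.Send("consensus", Envelope{To: LocalNodeID, Message: "blocksync caught up"})
	fmt.Println(<-inbox)
	_ = r.Send("consensus", Envelope{To: "peer-1", Message: map[string]int{"height": 10}})
	fmt.Println(sent[0])
}
```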
-
-Eventually, this will also make it possible to have a logical Tendermint node
-running in multiple processes or in a collection of containers, although the
-use case for this may be debatable.
-
-Synchronous Semantics (Paired Request/Response)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-In the current system, all messages are sent with fire-and-forget semantics,
-and there's no coupling between a request sent via the p2p layer and a
-response. Paired request/response semantics would simplify the implementation
-of the state and block sync reactors, and make intra-node message passing more
-powerful.
-
-For some interactions, like gossiping transactions between the mempools of
-different nodes, fire-and-forget semantics make sense, but for other
-operations the missing link between requests/responses leads to either
-inefficiency when a node fails to respond or becomes unavailable, or code that
-is just difficult to follow.
-
-To support this kind of work, the protocol would need to accommodate some kind
-of request/response ID to allow identifying out-of-order responses over a
-single connection. Additionally, the programming model of ``p2p.Channel``
-would need to expand to accommodate some kind of *future* or similar paradigm,
-so that reactor developers can write reactor code without wrestling with
-lower-level concurrency constructs.
-
-
-Timeout Handling (QoS)
-~~~~~~~~~~~~~~~~~~~~~~
-
-Currently, all timeouts, buffering, and QoS features are handled at the router
-layer, and the reactors are implemented in ways that assume/require
-asynchronous operation. This both increases the required complexity at the
-routing layer, and means that misbehavior at the reactor level is difficult to
-detect or attribute. Additionally, the current system provides three main
-parameters to control quality of service:
-
-- buffer sizes for channels and queues.
-
-- priorities for channels.
-
-- queue implementation details for shedding load.
-
-These end up being quite coarse controls, and changing the settings is
-difficult: because the queues and channels can buffer large numbers of
-messages, it can be hard to see the impact of a given change, particularly
-in our extant test environment. In general, we should endeavor to:
-
-- set real timeouts, via contexts, on most message send operations, so that
-  senders rather than queues can be responsible for timeout
-  logic. Additionally, this will make it possible to avoid sending messages
-  during shutdown.
-
-- reduce (to the greatest extent possible) the amount of buffering in
-  channels and queues, to more readily surface backpressure and reduce the
-  potential for buildup of stale messages.
-
-Stream Based Connection Handling
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Currently the transport layer is message-based. This matches the mental model
-of how the protocol works, but it makes transports and connection types harder
-to implement: it forces a higher-level view of the connection and interaction,
-which is awkward for novel transport types and makes it more likely that
-message-based caching and rate limiting will be implemented at the transport
-layer rather than at a more appropriate level.
-
-The transport, then, would be responsible for negotiating the connection and the
-handshake and otherwise behave like a socket/file descriptor with ``Read`` and
-``Write`` methods.
-
-While this was included in the initial design for the new P2P layer, it may be
-obviated entirely if the transport and peer layers are replaced with libp2p,
-which is primarily stream-based.
-
-Service Discovery
-~~~~~~~~~~~~~~~~~
-
-In the current system, Tendermint assumes that all nodes in a network are
-largely equivalent, and nodes tend to be "chatty", making many requests of
-large numbers of peers and waiting for peers to (hopefully) respond. While
-this works and has allowed Tendermint to get to a certain point, it both
-produces a theoretical scaling bottleneck and makes it harder to test and
-verify components of the system.
-
-In addition to a peer's identity and connection information, peers should be
-able to advertise a number of services or capabilities, and node operators or
-developers should be able to specify peer capability requirements (e.g. target
-at least <x>-percent of peers with <y> capability.)
-
-These capabilities may be useful in selecting peers to send messages to. It
-may also make sense to extend Tendermint's message addressing capability to
-allow reactors to send messages to groups of peers based on role, rather than
-only allowing addressing to one or all peers.
-
-Having a good service discovery mechanism may pair well with the synchronous
-semantics (request/response) work, as it allows reactors to "make a request of
-a peer with <x> capability and wait for the response," rather than forcing the
-reactors to track the capabilities or state of specific peers.
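A sketch of capability-based peer selection follows, assuming peers advertise a simple set of capability names. `Peer`, `WithCapability`, `MeetsTarget`, and the capability labels are all hypothetical; the RFC treats such advertisement as future work. `MeetsTarget` checks an operator requirement of the form "at least <x> percent of peers with <y> capability" using integer arithmetic.

```go
package main

import (
	"fmt"
	"sort"
)

// Peer (hypothetical) advertises a set of capabilities alongside its identity.
type Peer struct {
	ID           string
	Capabilities map[string]bool
}

// WithCapability returns the sorted IDs of peers advertising capability c, so
// a reactor could address "peers with capability c" rather than one or all peers.
func WithCapability(peers []Peer, c string) []string {
	var ids []string
	for _, p := range peers {
		if p.Capabilities[c] { // a nil map reads as false
			ids = append(ids, p.ID)
		}
	}
	sort.Strings(ids)
	return ids
}

// MeetsTarget reports whether at least minPercent of peers advertise c.
func MeetsTarget(peers []Peer, c string, minPercent int) bool {
	if len(peers) == 0 {
		return minPercent <= 0
	}
	have := len(WithCapability(peers, c))
	return have*100 >= minPercent*len(peers)
}

func main() {
	peers := []Peer{
		{ID: "a", Capabilities: map[string]bool{"statesync": true}},
		{ID: "b", Capabilities: map[string]bool{"blocksync": true}},
		{ID: "c", Capabilities: map[string]bool{"statesync": true, "blocksync": true}},
	}
	fmt.Println(WithCapability(peers, "statesync"))  // [a c]
	fmt.Println(MeetsTarget(peers, "blocksync", 50)) // true (2 of 3 peers)
	fmt.Println(MeetsTarget(peers, "archive", 10))   // false
}
```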
-
-Solutions
----------
-
-Continued Homegrown Implementation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The current peer system is homegrown and is conceptually compatible with the
-needs of the project, and while there are limitations to the system, the p2p
-layer is not (as of 0.35) a major source of bugs or friction during
-development.
-
-However, the current implementation makes a number of allowances for
-interoperability, and there is a collection of iterative improvements that
-should be considered in the next couple of releases. To maintain the current
-implementation, upcoming work would include:
-
-- change the ``Transport`` mechanism to facilitate easier implementations.
-
-- implement different ``Transport`` handlers to be able to manage peer
-  connections using different protocols (e.g. QUIC, etc.)
-
-- entirely remove the constructs and implementations of the legacy peer
-  implementation.
-
-- establish and enforce clearer chains of responsibility for connection
-  establishment (e.g. handshaking, setup), which is currently shared between
-  three components.
-
-- report better metrics regarding the state of peers and network
-  connectivity, which are currently opaque outside of the system. This is
-  constrained at the moment as a side effect of the split responsibility for
-  connection establishment.
-
-- extend the PEX system to include service information so that nodes in the
-  network need not be homogeneous.
-
-While maintaining a bespoke peer management layer would seem to distract from
-development of core functionality, the truth is that (once the legacy code is
-removed) the scope of the peer layer is relatively small from a maintenance
-perspective, and having control at this layer might actually afford the
-project the ability to iterate more rapidly on some features.
-
-LibP2P
-~~~~~~
-
-LibP2P provides components that, approximately, account for the
-``PeerManager`` and ``Transport`` components of the current (new) P2P
-stack. The Go APIs seem reasonable, and being able to externalize the
-implementation details of peer and connection management seems like it could
-provide a lot of benefits, particularly in supporting a more active ecosystem.
-
-In general the API provides the kind of stream-based, multi-protocol
-supporting, and idiomatic baseline for implementing a peer layer. Additionally,
-because libp2p handles peer exchange and connection management at a lower
-level, adopting it would make it possible to remove a good deal of code.
-Having said that, Tendermint's P2P layer covers a greater scope (e.g. message
-routing to different peers), and that layer is something that Tendermint might
-want to retain.
-
-There are a number of unknowns that require more research, including how much
-of a peer database the Tendermint engine itself needs to maintain in order to
-support higher-level operations (consensus, statesync); it might be the
-case that our internal systems need to know much less about peers than
-otherwise specified. Similarly, the current system has a notion of peer
-scoring that cannot be communicated to libp2p, which may be fine as this is
-only used to support peer exchange (PEX), which would become a property of
-libp2p and would no longer be expressed in its current higher-level form.
-
-In general, the effort to switch to libp2p would involve:
-
-- timing it during an appropriate protocol-breaking window, as it doesn't seem
-  viable to support both libp2p *and* the current p2p protocol.
-
-- providing some in-memory testing network to support the use case that the
-  current ``p2p.MemoryNetwork`` provides.
-
-- re-homing the ``p2p.Router`` implementation on top of libp2p components to
-  be able to maintain the current reactor implementations.
-
-Open questions include:
-
-- how much local buffering should we be doing? We should determine the
-  expected behavior of libp2p for QoS-type functionality, and whether our
-  requirements mean that we would need to implement this on top of libp2p
-  ourselves.
-
-- if Tendermint were to use libp2p, how would libp2p's stability
-  guarantees (protocol, etc.) impact/constrain Tendermint's stability
-  guarantees?
-
-- what kind of introspection does libp2p provide, and to what extent would
-  this change or constrain the kind of observability that Tendermint is able
-  to provide?
-
-- how do efforts to select "the best" (healthy, close, well-behaving, etc.)
-  peers work out if Tendermint is not maintaining a local peer database?
-
-- would adding additional higher-level semantics (internal message passing,
-  request/response pairs, service discovery, etc.) facilitate removing some of
-  the direct linkages between constructs/components in the system and reduce
-  the need for Tendermint nodes to maintain state about their peers?
-
-References
------------
-
-- `Tracking Ticket for P2P Refactor Project <https://github.com/tendermint/tendermint/issues/5670>`_
-- `ADR 61: P2P Refactor Scope <../architecture/adr-061-p2p-refactor-scope.md>`_
-- `ADR 62: P2P Architecture and Abstraction <../architecture/adr-062-p2p-architecture.md>`_
--- a/docs/rfc/rfc-001-storage-engine.rst
+++ b/docs/rfc/rfc-001-storage-engine.rst
@@ -1,179 +0,0 @@
-===========================================
-RFC 001: Storage Engines and Database Layer
-===========================================
-
-Changelog
----------
-
-- 2021-04-19: Initial Draft (gist)
-- 2021-09-02: Migrated to RFC folder, with some updates
-
-Abstract
---------
-
-The aspect of Tendermint that's responsible for persistence and storage (often
-"the database" internally) represents a bottleneck in the architecture of the
-platform, which the 0.36 release presents a good opportunity to correct. The
-current storage engine layer provides a great deal of flexibility that is
-difficult for users to leverage or benefit from, while also making it harder
-for Tendermint Core developers to deliver improvements to the storage engine. This
-RFC discusses the possible improvements to this layer of the system.
-
-Background
------------
-
-Tendermint has a very thin common wrapper that makes Tendermint itself
-(largely) agnostic to the data storage layer (within the realm of the popular
-key-value/embedded databases). This flexibility is not particularly useful:
-the benefits of a specific database engine in the context of Tendermint are not
-particularly well understood, and the maintenance burden for multiple backends
-is not commensurate with the benefit provided. Additionally, because the data
-storage layer is handled generically, and most tests run with an in-memory
-framework, it's difficult to take advantage of any higher-level features of a
-database engine.
-
-Ideally, developers within Tendermint will be able to interact with persisted
-data via an interface that can function approximately like an object
-store, and this storage interface will be able to accommodate all existing
-persistence workloads (e.g. block storage, local peer management information
-like the "address book", and crash-recovery logs like the WAL). In addition to
-providing a more ergonomic interface and new semantics, by selecting a single
-storage engine Tendermint can use native durability and atomicity features of
-the storage engine and simplify its own implementations.
-
-Data Access Patterns
-~~~~~~~~~~~~~~~~~~~~
-
-Tendermint's data access patterns have the following characteristics:
-
-- aggregate data size often exceeds memory.
-
-- most data (e.g. blocks) is rarely mutated after it's written, but
-  small amounts of working data are persisted by nodes and frequently
-  mutated (e.g. peer information, validator information).
-
-- read patterns can be quite random.
-
-- crash resistance and crash recovery, provided by write-ahead logs (in
-  consensus, and potentially for the mempool), should allow the system to
-  resume work after an unexpected shutdown.
-
-Project Goals
-~~~~~~~~~~~~~
-
-As we think about replacing the current persistence layer, we should consider
-the following high-level goals:
-
-- drop dependencies on storage engines that have a CGo dependency.
-
-- encapsulate data format and data storage from higher-level services
-  (e.g. reactors) within Tendermint.
-
-- select a storage engine that does not incur any additional operational
-  complexity (e.g. the database should be embedded).
-
-- provide database semantics with sufficient ACID, snapshot, and
-  transactional support.
-
-Open Questions
-~~~~~~~~~~~~~~
-
-The following questions remain:
-
-- what kind of data-access concurrency does Tendermint require?
-
-- would Tendermint users (the SDK, etc.) benefit from some shared database
-  infrastructure?
-
-  - In earlier conversations it seemed as if the SDK had selected Badger and
-    RocksDB for its storage engines, and it might make sense to be able to
-    (optionally) pass a handle to a Badger instance between the libraries in
-    some cases.
-
-- what are typical data sizes, and what kinds of memory sizes can we expect
-  operators to be able to provide?
-
-- in addition to simple persistence, what kind of additional semantics would
-  Tendermint like to enjoy (e.g. transactional semantics, unique constraints,
-  indexes, in-place updates, etc.)?
-
-Decision Framework
-~~~~~~~~~~~~~~~~~~
-
-Given the constraint of removing the CGo dependency, the decision is between
-"badger" and "boltdb" (in the form of the etcd/CoreOS fork) as the low-level
-engine. On top of this, and somewhat orthogonally, we must also decide on the
-interface to the database and how the larger application will have to interact
-with the database layer. Users of the data layer shouldn't ever need to interact
-with raw byte slices from the database, and should mostly have the experience of
-interacting with Go types.
-
-Badger is more consistently developed and has a broader feature set than
-Bolt. At the same time, Badger is likely more memory intensive and may have
-more overhead in terms of open file handles given its model. At first glance,
-Badger is the obvious choice: it's actively developed and it has a lot of
-features that could be useful. Bolt is not without some benefits: it's stable
-and is maintained by the etcd folks, and its simpler model (a single
-memory-mapped file, etc.) may be easier to reason about.
-
-I propose that we consider the following specific questions about storage
-engines:
-
-- does Badger's evolving development, which may result in data file format
-  changes in the future and could restrict our access to the latest
-  version of the library between major upgrades, present a problem?
-
-- do we have goals/concerns about memory footprint that Badger may
-  prevent us from meeting, particularly as data sets grow over time?
-
-- what kind of additional tooling might we need/like to build (dump/restore,
-  etc.)?
-
-- do we want to run unit/integration tests against data files on disk rather
-  than relying exclusively on the memory database?
-
-Project Scope
-~~~~~~~~~~~~~
-
-This project will consist of the following aspects:
-
-- selecting a storage engine, and modifying the Tendermint codebase to
-  disallow any configuration of the storage engine outside of Tendermint.
-
-- removing the dependency on the current tm-db interfaces and replacing it
-  with some internalized, safe, and ergonomic interface for data persistence
-  with all required database semantics.
-
-- updating core Tendermint code to use the new interface and data tools.
-
-Next Steps
-~~~~~~~~~~
-
-- circulate the RFC, and discuss options with appropriate stakeholders.
-
-- write a brief ADR summarizing the technical decisions reached during the
-  RFC phase.
-
-References
------------
-
-- `boltdb <https://github.com/etcd-io/bbolt>`_
-- `badger <https://github.com/dgraph-io/badger>`_
-- `badgerdb overview <https://dbdb.io/db/badgerdb>`_
-- `boltdb overview <https://dbdb.io/db/boltdb>`_
-- `boltdb vs badger <https://tech.townsourced.com/post/boltdb-vs-badger>`_
-- `bolthold <https://github.com/timshannon/bolthold>`_
-- `badgerhold <https://github.com/timshannon/badgerhold>`_
-- `Pebble <https://github.com/cockroachdb/pebble>`_
-- `SDK Issue Regarding IAVL <https://github.com/cosmos/cosmos-sdk/issues/7100>`_
-- `SDK Discussion about SMT/IAVL <https://github.com/cosmos/cosmos-sdk/discussions/8297>`_
-
-Discussion
-----------
-
-- All things being equal, my tendency would be to use badger, with badgerhold
-  (if that makes sense) for its ergonomics and indexing capabilities, which
-  will require some small selection of wrappers for better write transaction
-  support. This is a weakly held tendency/belief and I think it would be
-  useful for the RFC process to build consensus (or not) around this basic
-  assumption.
--- a/docs/rfc/rfc-002-ipc-ecosystem.md
+++ b/docs/rfc/rfc-002-ipc-ecosystem.md
@@ -1,420 +0,0 @@
-# RFC 002: Interprocess Communication (IPC) in Tendermint
-
-## Changelog
-
-- 08-Sep-2021: Initial draft (@creachadair).
-
-
-## Abstract
-
-Consensus nodes, applications, and operator tools in Tendermint all communicate
-using different message formats and transport mechanisms. In some
-cases there are multiple options. Having all these options complicates both the
-code and the developer experience, and hides bugs. To support a more robust,
-trustworthy, and usable system, we should document which communication paths
-are essential, which could be removed or reduced in scope, and what we can
-improve for the most important use cases.
-
-This document proposes a variety of possible improvements of varying size and
-scope. Specific design proposals should get their own documentation.
-
-
-## Background
-
-The Tendermint state replication engine has a complex IPC footprint.
-
-1. Consensus nodes communicate with each other using a networked peer-to-peer
-   message-passing protocol.
-
-2. Consensus nodes communicate with the application whose state is being
-   replicated via the [Application BlockChain Interface (ABCI)][abci].
-
-3. Consensus nodes export a network-accessible [RPC service][rpc-service] to
-   support operations (bootstrapping, debugging) and synchronization of [light clients][light-client].
-   This interface is also used by the [`tendermint` CLI][tm-cli].
-
-4. Consensus nodes export a gRPC service exposing a subset of the methods of
-   the RPC service described by (3). This was intended to simplify the
-   implementation of tools that already use gRPC to communicate with an
-   application (via the Cosmos SDK) and that also wanted to talk to the
-   consensus node without implementing yet another RPC protocol.
-
-   The gRPC interface to the consensus node has been deprecated and is slated
-   for removal in the forthcoming Tendermint v0.36 release.
-
-5. Consensus nodes may optionally communicate with a "remote signer" that holds
-   a validator key and can provide public keys and signatures to the consensus
-   node. One of the stated goals of this configuration is to allow the signer
-   to be run on a private network, separate from the consensus node, so that a
-   compromise of the consensus node from the public network would be less
-   likely to expose validator keys.
-
-## Discussion: Transport Mechanisms
-
-### Remote Signer Transport
-
-A remote signer communicates with the consensus node in one of two ways:
-
-1. "Raw": Using a TCP or Unix-domain socket which carries varint-prefixed
-   protocol buffer messages. In this mode, the consensus node is the server,
-   and the remote signer is the client.
-
-   This mode has been deprecated, and is intended to be removed.
-
-2. gRPC: This mode uses the same protobuf messages as "Raw" mode, but uses a
-   standard encrypted gRPC HTTP/2 stub as the transport. In this mode, the
-   remote signer is the server and the consensus node is the client.
-
-
-### ABCI Transport
-
-In ABCI, the _application_ is the server, and the Tendermint consensus engine
-is the client.  Most applications implement the server using the [Cosmos SDK][cosmos-sdk],
-which handles low-level details of the ABCI interaction and provides a
-higher-level interface to the rest of the application. The SDK is written in Go.
-
-Beneath the SDK, the application communicates with Tendermint Core in one of
-two ways:
-
-- In-process direct calls (for applications written in Go and compiled against
-  the Tendermint code).  This is an optimization for the common case where an
-  application is written in Go, to save on the overhead of marshaling and
-  unmarshaling requests and responses within the same process:
-  [`abci/client/local_client.go`][local-client]
-
-- A custom remote procedure protocol built on wire-format protobuf messages
-  using a socket (the "socket protocol"): [`abci/server/socket_server.go`][socket-server]
-
-The SDK also provides a [gRPC service][sdk-grpc] accessible from outside the
-application, allowing clients to broadcast transactions to the network, look up
-transactions, and simulate transaction costs.
-
-
-### RPC Transport
-
-The consensus node RPC service allows callers to query consensus parameters
-(genesis data, transactions, commits), node status (network info, health
-checks), application state (abci_query, abci_info), mempool state, and other
-attributes of the node and its application. The service also provides methods
-allowing transactions and evidence to be injected ("broadcast") into the
-blockchain.
-
-The RPC service is exposed in several ways:
-
-- HTTP GET: Queries may be sent as URI parameters, with method names in the path.
-
-- HTTP POST: Queries may be sent as JSON-RPC request messages in the body of an
-  HTTP POST request. The server uses a custom implementation of JSON-RPC that
-  is not fully compatible with the [JSON-RPC 2.0 spec][json-rpc], but handles
-  the common cases.
-
-- Websocket: Queries may be sent as JSON-RPC request messages via a websocket.
-  This transport uses more or less the same JSON-RPC plumbing as the HTTP POST
-  handler.
-
-  The websocket endpoint also includes three methods that are _only_ exported
-  via websocket, which appear to support event subscription.
-
-- gRPC: A subset of queries may be issued in protocol buffer format to the gRPC
-  interface described above under (4). As noted, this endpoint is deprecated
-  and will be removed in v0.36.
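The sketch below builds one RPC call in both the HTTP GET (URI) form and the JSON-RPC form used for POST and websocket. The `block` method, the `height` parameter, its string encoding, and the `localhost:26657` address are assumptions chosen for illustration. As noted above, the server's JSON-RPC handling is not fully spec-compliant, so this shows request shape only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// rpcRequest is the JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string            `json:"jsonrpc"`
	ID      int               `json:"id"`
	Method  string            `json:"method"`
	Params  map[string]string `json:"params"`
}

// uriRequest builds the HTTP GET form: method in the path, params in the query.
func uriRequest(base, method string, params map[string]string) string {
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u := base + "/" + method
	if len(q) > 0 {
		u += "?" + q.Encode() // Encode sorts keys, so output is deterministic
	}
	return u
}

// jsonRPCBody builds the body sent over HTTP POST or a websocket.
func jsonRPCBody(id int, method string, params map[string]string) ([]byte, error) {
	return json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
}

func main() {
	params := map[string]string{"height": "5"} // example parameter (assumed)
	fmt.Println(uriRequest("http://localhost:26657", "block", params))
	// http://localhost:26657/block?height=5
	body, _ := jsonRPCBody(1, "block", params)
	fmt.Println(string(body))
	// {"jsonrpc":"2.0","id":1,"method":"block","params":{"height":"5"}}
}
```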
-
-### Opportunities for Simplification
-
-**Claim:** There are too many IPC mechanisms.
-
-The preponderance of ABCI usage is via the Cosmos SDK, which means the
-application and the consensus node are compiled together into a single binary,
-and the consensus node calls the ABCI methods of the application directly as Go
-functions.
-
-We also need a true IPC transport to support ABCI applications _not_ written in
-Go. For example, there are several known applications written in Rust
-(including [Anoma](https://github.com/anoma/anoma), Penumbra,
-[Oasis](https://github.com/oasisprotocol/oasis-core), Twilight, and
-[Nomic](https://github.com/nomic-io/nomic)). Ideally we will have at most one
-such transport "built-in": more esoteric cases can be handled by a custom proxy.
-Pragmatically, gRPC is probably the right choice here.
-
-The primary consumers of the multi-headed "RPC service" today are the light
-client and the `tendermint` command-line client. There is probably some local
-use via curl, but I expect that is mostly ad hoc. Ethan reports that nodes are
-often configured with the ports to the RPC service blocked, which is good for
-security but complicates use by the light client.
-
-### Context: Remote Signer Issues
-
-Since the remote signer needs a secure communication channel to exchange keys
-and signatures, and is expected to run truly remotely from the node (i.e., on a
-separate physical server), there is not a whole lot we can do here. We should
-finish the deprecation and removal of the "raw" socket protocol between the
-consensus node and remote signers, but the use of gRPC is appropriate.
-
-The main improvement we can make is to simplify the implementation quite a bit,
-once we no longer need to support both "raw" and gRPC transports.
-
-### Context: ABCI Issues
-
-In the original design of ABCI, the presumption was that all access to the
-application should be mediated by the consensus node. The idea is that outside
-access could change application state and corrupt the consensus process, which
-relies on the application to be deterministic. Of course, even without outside
-access an application could behave nondeterministically, but allowing other
-programs to send it requests was seen as courting trouble.
-
-Conversely, users noted that most of the time, tools written for a particular
-application don't want to talk to the consensus module directly. The
-application "owns" the state machine the consensus engine is replicating, so
-tools that care about application state should talk to the application.
-Otherwise, they would have to bake in knowledge about Tendermint (e.g., its
-interfaces and data structures) just because of the mediation.
-
-For clients to talk directly to the application, however, there is another
-concern: The consensus node is the ABCI _client_, so it is inconvenient for the
-application to "push" work into the consensus module via ABCI itself.  The
-current implementation works around this by calling the consensus node's RPC
-service, which exposes an `ABCIQuery` kitchen-sink method that allows the
-application a way to poke ABCI messages in the other direction.
-
-Without this RPC method, you could work around this (at least in principle) by
-having the consensus module "poll" the application for work that needs to be done,
-but that has unsatisfactory implications for performance and robustness, as
-well as being harder to understand.
-
-There has apparently been discussion about making communication between the
-consensus node and the application more bidirectional, but this issue seems
-to remain unresolved.
-
-Another complication of ABCI is that it requires the application (server) to
-maintain [four separate connections][abci-conn]: One for "consensus" operations
-(BeginBlock, EndBlock, DeliverTx, Commit), one for "mempool" operations, one
-for "query" operations, and one for "snapshot" (state synchronization) operations.
-The rationale seems to have been that these groups of operations should be able
-to proceed concurrently with each other. In practice, it results in a very complex
-state management problem to coordinate state updates between the separate streams.
-While application authors in Go are mostly insulated from that complexity by the
-Cosmos SDK, the plumbing to maintain those separate streams is complicated, hard
-to understand, and we suspect it contains concurrency bugs and/or lock contention
-issues affecting performance that are subtle and difficult to pin down.
-
-Even without changing the semantics of any ABCI operations, this code could be
-made smaller and easier to debug by separating the management of concurrency
-and locking from the IPC transport: If all requests and responses are routed
-through one connection, the server can explicitly maintain priority queues for
-requests and responses, and make less-conservative decisions about when locks
-are (or aren't) required to synchronize state access. With independent queues,
-the server must lock conservatively, and no optimistic scheduling is practical.
-
-This would be a tedious implementation change, but should be achievable without
-breaking any of the existing interfaces. More importantly, it could potentially
-address a lot of difficult concurrency and performance problems we currently
-see anecdotally but have difficulty isolating because of how intertwined these
-separate message streams are at runtime.
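A minimal sketch of the single-connection idea, assuming each request carries a tag for the connection it would use today. The `request` type, the category names, and the priority order (consensus, then mempool, then query, then snapshot) are hypothetical illustrations, not Tendermint types or an agreed policy. One `container/heap` queue puts the scheduling decision in a single explicit place; it does not by itself decide which requests may run concurrently.

```go
package main

import (
	"container/heap"
	"fmt"
)

// category is a hypothetical label for the four current ABCI connection types.
// Lower values are served first in this sketch.
type category int

const (
	consensus category = iota
	mempool
	query
	snapshot
)

// request is a hypothetical stand-in for an ABCI request on one shared connection.
type request struct {
	cat    category
	seq    int // arrival order, keeps FIFO order within a category
	method string
}

// requestQueue implements heap.Interface, ordering by category, then arrival.
type requestQueue []request

func (q requestQueue) Len() int { return len(q) }
func (q requestQueue) Less(i, j int) bool {
	if q[i].cat != q[j].cat {
		return q[i].cat < q[j].cat
	}
	return q[i].seq < q[j].seq
}
func (q requestQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *requestQueue) Push(x interface{}) { *q = append(*q, x.(request)) }
func (q *requestQueue) Pop() interface{} {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

// drain returns method names in the order a single dispatcher would serve them.
func drain(reqs []request) []string {
	q := &requestQueue{}
	heap.Init(q)
	for i, r := range reqs {
		r.seq = i
		heap.Push(q, r)
	}
	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(request).method)
	}
	return order
}

func main() {
	order := drain([]request{
		{cat: query, method: "Query"},
		{cat: mempool, method: "CheckTx"},
		{cat: consensus, method: "DeliverTx"},
		{cat: consensus, method: "Commit"},
	})
	fmt.Println(order) // [DeliverTx Commit CheckTx Query]
}
```

With all traffic in one queue, a server could later relax locking for categories it knows are independent, instead of locking conservatively across four streams.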
-
-TODO: Impact of ABCI++ for this topic?
-
-### Context: RPC Issues
-
-The RPC system serves several masters, and has a complex surface area. I
-believe there are some improvements that can be exposed by separating some of
-these concerns.
-
-The Tendermint light client currently uses the RPC service to look up blocks
-and transactions, and to forward ABCI queries to the application.  The light
-client proxy uses the RPC service via a websocket. The Cosmos IBC relayer also
-uses the RPC service via websocket to watch for transaction events, and uses
-the `ABCIQuery` method to fetch information and proofs for posted transactions.
-
-Some work is already underway toward using P2P message passing rather than RPC
-to synchronize light client state with the rest of the network.  IBC relaying,
-however, requires access to the event system, which is currently not accessible
-except via the RPC interface. Event subscription _could_ be exposed via P2P,
-but that is a larger project since it adds P2P communication load, and might
-thus have an impact on the performance of consensus.
-
-If event subscription can be moved into the P2P network, we could entirely
-remove the websocket transport, even for clients that still need access to the
-RPC service. Until then, we may still be able to reduce the scope of the
-websocket endpoint to _only_ event subscription, by moving uses of the RPC
-server as a proxy to ABCI over to the gRPC interface.
-
-Having the RPC server still makes sense for local bootstrapping and operations,
-but can be further simplified. Here are some specific proposals:
-
-- Remove the HTTP GET interface entirely.
-
-- Simplify JSON-RPC plumbing to remove unnecessary reflection and wrapping.
-
-- Remove the gRPC interface (this is already planned for v0.36).
-
-- Separate the websocket interface from the rest of the RPC service, and
-  restrict it to only event subscription.
-
-  Eventually we should try to remove the websocket interface entirely, but we
-  will need to revisit that (probably in a new RFC) once we've done some of the
-  easier things.
-
-These changes would preserve the ability of operators to issue queries with
-curl (but would require using JSON-RPC instead of URI parameters). That would
-be a little less user-friendly, but for a use case that should not be that
-prevalent.
-
-These changes would also preserve compatibility with existing JSON-RPC based
-code paths like the `tendermint` CLI and the light client (even ahead of
-further work to remove that dependency).
-
-**Design goal:** An operator should be able to disable non-local access to the
-RPC server on any node in the network without impairing the ability of the
-network to function for service of state replication, including light clients.
-
-**Design principle:** All communication required to implement and monitor the
-consensus network should use P2P, including the various synchronizations.
-
-### Options for ABCI Transport
-
-The majority of current usage is in Go, and the majority of that is mediated by
-the Cosmos SDK, which uses the "direct call" interface. There is probably some
-opportunity to clean up the implementation of that code, notably by inverting
-which interface is at the "top" of the abstraction stack (currently it acts
-like an RPC interface, and escape-hatches into the direct call). However, this
-general approach works fine and doesn't need to be fundamentally changed.
-
-For applications _not_ written in Go, the two remaining options are the
-"socket" protocol (another variation on varint-prefixed protobuf messages over
-an unstructured stream) and gRPC. It would be nice if we could get rid of one
-of these to reduce (unneeded?) optionality.
-
-Since both the socket protocol and gRPC depend on protocol buffers, the
-"socket" protocol is the most obvious choice to remove. While gRPC is more
-complex, the set of languages that _have_ protobuf support but _lack_ gRPC
-support is small. Moreover, gRPC is already widely used in the rest of the
-ecosystem (including the Cosmos SDK).
-
-If some use case did arise later that can't work with gRPC, it would not be too
-difficult for that application author to write a little proxy (in Go) that
-bridges the convenient SDK APIs into a simpler protocol than gRPC.
-
-**Design principle:** It is better for an uncommon special case to carry the
-burdens of its specialness, than to bake an escape hatch into the infrastructure.
-
-**Recommendation:** We should deprecate and remove the socket protocol.
-
-### Options for RPC Transport
-
-[ADR 057][adr-57] proposes using gRPC for the Tendermint RPC implementation.
-This is still possible, but if we are able to simplify and decouple the
-concerns as described above, I do not think it should be necessary.
-
-While JSON-RPC is not the best possible RPC protocol for all situations, it has
-some advantages over gRPC for our domain. Specifically:
-
-- It is easy to call JSON-RPC manually from the command line, which helps with
-  a common use of the RPC service: local debugging and operations.
-
-  Relatedly: JSON is relatively easy for humans to read and write, and it can
-  be easily copied and pasted to share sample queries and debugging results in
-  chat, issue comments, and so on. Ideally, the RPC service will not be used
-  for activities where the costs of a text protocol are important compared to
-  its legibility and manual usability benefits.
-
-- gRPC has an enormous dependency footprint for both clients and servers, and
-  many of the features it provides to support security and performance
-  (encryption, compression, streaming, etc.) are mostly irrelevant to local
-  use. Tendermint already needs to include a gRPC client for the remote signer,
-  but if we can avoid the need for a _client_ to depend on gRPC, that is a win
-  for usability.
-
-- If we intend to migrate light clients off RPC to use P2P entirely, there is
-  no advantage to forcing a temporary migration to gRPC along the way; and once
-  the light client is not dependent on the RPC service, the efficiency of the
-  protocol is much less important.
-
-- We can still get the benefits of generated data types using protocol buffers, even
-  without using gRPC:
-
-  - Protobuf defines a standard JSON encoding for all message types so
-    languages with protobuf support do not need to worry about type mapping
-    oddities.
-
-  - Using JSON means that even languages _without_ good protobuf support can
-    implement the protocol with a bit more work, and I expect this situation to
-    be rare.
-
-Even if a language lacks a good standard JSON-RPC mechanism, the protocol is
-lightweight and can be implemented by simple send/receive over TCP or
-Unix-domain sockets with no need for code generation, encryption, etc. gRPC
-uses a complex HTTP/2-based transport that is not easily replicated.
-
-### Future Work
-
-The background and proposals sketched above focus on the existing structure of
-Tendermint and improvements we can make in the short term. It is worthwhile to
-also consider options for longer-term broader changes to the IPC ecosystem.
-The following outlines some ideas at a high level:
-
-- **Consensus service:** Today, the application and the consensus node are
-  nominally connected only via ABCI. Tendermint was originally designed with
-  the assumption that all communication with the application should be mediated
-  by the consensus node. Based on further experience, however, the design goal
-  is now that the _application_ should be the mediator of application state.
-
-  As noted above, however, ABCI is a client/server protocol, with the
-  application as the server. For outside clients that turns out to have been a
-  good choice, but it complicates the relationship between the application and
-  the consensus node: previously transactions were entered via the node; now
-  they are entered via the app.
-
-  We have worked around this by using the Tendermint RPC service to give the
-  application a "back channel" to the consensus node, so that it can push
-  transactions back into the consensus network. But the RPC service exposes a
-  lot of other functionality, too, including event subscription, block and
-  transaction queries, and a lot of node status information.
-
-  Even if we can't easily "fix" the orientation of the ABCI relationship, we
-  could improve isolation by splitting out the parts of the RPC service that
-  the application needs as a back-channel, and sharing those _only_ with the
-  application. By defining a "consensus service", we could give the application
-  a way to talk back limited to only the capabilities it needs. This approach
-  has the benefit that we could do it without breaking existing use, and if we
-  later did "fix" the ABCI directionality, we could drop the special case
-  without disrupting the rest of the RPC interface.
-
-- **Event service:** Right now, the IBC relayer relies on the Tendermint RPC
-  service to provide a stream of block and transaction events, which it uses to
-  discover which transactions need relaying to other chains. While I think
-  that event subscription should eventually be handled via P2P, we could gain
-  some immediate benefit by splitting out event subscription from the rest of
-  the RPC service.
-
-  In this model, an event subscription service would be exposed on the public
-  network, but on a different endpoint. This would remove the need for the RPC
-  service to support the websocket protocol, and would allow operators to
-  isolate potentially sensitive status query results from the public network.
-
-  At the moment the relayers also use the RPC service to get block data for
-  synchronization, but work is already in progress to handle that concern via
-  the P2P layer. Once that's done, event subscription could be separated.
-
-Separating parts of the existing RPC service is not without cost: It might
-require additional connection endpoints, for example, though it is also not too
-difficult for multiple otherwise-independent services to share a connection.
-
-In return, though, it would become easier to reduce transport options and for
-operators to independently control access to sensitive data. Considering the
-viability and implications of these ideas is beyond the scope of this RFC, but
-they are documented here since they follow from the background we have already
-discussed.
-
-## References
-
-[abci]: https://github.com/tendermint/spec/tree/95cf253b6df623066ff7cd4074a94e7a3f147c7a/spec/abci
-[rpc-service]: https://docs.tendermint.com/master/rpc/
-[light-client]: https://docs.tendermint.com/master/tendermint-core/light-client.html
-[tm-cli]: https://github.com/tendermint/tendermint/tree/master/cmd/tendermint
-[cosmos-sdk]: https://github.com/cosmos/cosmos-sdk/
-[local-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/local_client.go
-[socket-server]: https://github.com/tendermint/tendermint/blob/master/abci/server/socket_server.go
-[sdk-grpc]: https://pkg.go.dev/github.com/cosmos/cosmos-sdk/types/tx#ServiceServer
-[json-rpc]: https://www.jsonrpc.org/specification
-[abci-conn]: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#state
-[adr-57]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-057-RPC.md
--- a/docs/rfc/rfc-003-performance-questions.md
+++ b/docs/rfc/rfc-003-performance-questions.md
@@ -1,283 +0,0 @@
-# RFC 003: Taxonomy of potential performance issues in Tendermint 
-
-## Changelog
-
-- 2021-09-02: Created initial draft (@wbanfield)
-- 2021-09-14: Add discussion of the event system (@wbanfield)
-
-## Abstract
-
-This document discusses the various sources of performance issues in Tendermint and
-attempts to clarify what work may be required to understand and address them.
-
-## Background
-
-Performance, loosely defined as the ability of a software process to perform its work
-quickly and efficiently under load and within reasonable resource limits, is a frequent
-topic of discussion in the Tendermint project.
-To effectively address any issues with Tendermint performance we need to
-categorize the various issues, understand their potential sources, and gauge their
-impact on users.
-
-Categorizing the different known performance issues will allow us to discuss and fix them
-more systematically. This document proposes a rough taxonomy of performance issues
-and highlights areas where more research into potential performance problems is required.
-
-Understanding Tendermint's performance limitations will also be critically important
-as we make changes to many of its subsystems. Performance is a central concern for
-upcoming decisions regarding the `p2p` protocol, RPC message encoding and structure,
-database usage and selection, and consensus protocol updates.
-
-
-## Discussion
-
-This section attempts to delineate the different sections of Tendermint functionality
-that are often cited as having performance issues. It raises questions and suggests
-lines of inquiry that may be valuable for better understanding Tendermint's performance issues.
-
-As a note: We should avoid quickly adding many microbenchmarks or package level benchmarks. 
-These are prone to being worse than useless as they can obscure what _should_ be
-focused on: performance of the system from the perspective of a user. We should,
-instead, tune performance with an eye towards user needs and actions users make. These users comprise
-both operators of Tendermint chains and the people generating transactions for
-Tendermint chains. Both of these sets of users are largely aligned in wanting an end-to-end
-system that operates quickly and efficiently.
-
-REQUEST: The list below may be incomplete. If there are additional areas that are often
-cited as causing poor performance, please comment so that they may be included.
-
-### P2P
-
-#### Claim: Tendermint cannot scale to large numbers of nodes
-
-A complaint has been reported that Tendermint networks cannot scale to large numbers of nodes.
-The user reported problems once networks reached thousands of nodes.
-We don't currently have evidence about the upper limit on the number of nodes that
-Tendermint's P2P stack can scale to.
-
-We need to more concretely understand the source of issues and determine what layer
-is causing a problem. It's possible that the P2P layer, in the absence of any reactors
-sending data, is perfectly capable of managing thousands of peer connections. For
-a reasonable networking and application setup, thousands of connections should not present any
-issue for the application.
-
-We need more data to understand the problem directly. We want to drive the popularity
-and adoption of Tendermint and this will mean allowing for chains with more validators.
-We should follow up with users experiencing this issue. We may then want to add
-a series of metrics to the P2P layer to better understand the inefficiencies it produces.
-
-The following metrics can help us understand the sources of latency in the Tendermint P2P stack:
-
-* Number of messages sent and received per second
-* Time of a message spent on the P2P layer send and receive queues
-
-The following metrics exist and should be leveraged in addition to those added:
-
-* Number of peers the node is connected to
-* Number of bytes per channel sent and received from each peer
-
-### Sync
-
-#### Claim: Block Syncing is slow
-
-Bootstrapping a new node in a network to the height of the rest of the network is believed to
-take longer than users would like. Block sync requires fetching all of the blocks from
-peers and placing them into the local disk for storage. A useful line of inquiry
-is understanding how quickly a perfectly tuned system _could_ fetch all of the state
-over a network so that we understand how much overhead Tendermint actually adds.
-
-The operation is likely to be _incredibly_ dependent on the environment in which
-the node is being run. The factors that will influence syncing include:
-1. Number of peers that a syncing node may fetch from.
-2. Speed of the disk that the node is writing to.
-3. Speed of the network connections between the syncing node and the peers it is
-   syncing from.
-
-We should calculate how quickly this operation _could possibly_ complete for common chains and nodes.
-To calculate how quickly this operation could possibly complete, we should assume that
-a node is reading at line-rate of the NIC and writing at the full drive speed to its
-local storage. Comparing this theoretical upper-limit to the actual sync times
-observed by node operators will give us a good point of comparison for understanding
-how much overhead Tendermint incurs.
-
-We should additionally add metrics to the blocksync operation to more clearly pinpoint
-slow operations. The following metrics should be added to the block syncing operation:
-
-* Time to fetch and validate each block
-* Time to execute a block
-* Blocks sync'd per unit time
-
-### Application
-
-Applications performing complex state transitions have the potential to bottleneck
-the Tendermint node.
-
-#### Claim: ABCI block delivery could cause slowdown
-
-ABCI delivers blocks via several methods: `BeginBlock`, `DeliverTx`, `EndBlock`, `Commit`.
-
-Tendermint delivers transactions one-by-one via the `DeliverTx` call. Most of the 
-transaction delivery in Tendermint occurs asynchronously and therefore appears unlikely to
-form a bottleneck in ABCI.
-
-After delivering all transactions, Tendermint then calls the `Commit` ABCI method.
-Tendermint [locks all access to the mempool][abci-commit-description] while `Commit`
-proceeds. This means that an application that is slow to execute all of its
-transactions or finalize state during the `Commit` method will prevent any new
-transactions from being added to the mempool. Apps that are slow to commit will
-prevent consensus from proceeding to the next consensus height since Tendermint
-cannot validate block proposals or produce block proposals without the
-AppHash obtained from the `Commit` method. We should add a metric for each
-step in the ABCI protocol to track the amount of time that a node spends communicating
-with the application at each step.
-
-#### Claim: ABCI serialization overhead causes slowdown
-
-The most common way to run a Tendermint application is using the Cosmos-SDK.
-The Cosmos-SDK runs the ABCI application within the same process as Tendermint.
-When an application is run in the same process as Tendermint, a serialization penalty
-is not paid. This is because the local ABCI client does not serialize method calls
-and instead passes the protobuf type through directly. This can be seen
-in [local_client.go][abci-local-client-code].
-
-Serialization and deserialization in the gRPC and socket protocol ABCI methods
-may cause slowdown. While these may cause issues, they are not part of the primary
-use case of Tendermint and do not necessarily need to be addressed at this time.
-
-### RPC
-
-#### Claim: The Query API is slow.
-
-The query API locks a mutex across the ABCI connections. This causes consensus to
-slow during queries, as ABCI is no longer able to make progress. This is known
-to be causing issue in the cosmos-sdk and is being addressed [in the sdk][sdk-query-fix]
-but a more robust solution may be required. Adding metrics to each ABCI client connection
-and message as described in the Application section of this document would allow us
-to further introspect the issue here. 
-
-#### Claim: RPC Serialization may cause slowdown
-
-The Tendermint RPC uses a modified version of JSON-RPC. This RPC powers the `broadcast_tx_*` methods,
-which are currently the critical path for adding transactions to Tendermint. These methods are
-likely invoked quite frequently on popular networks. Being able to perform efficiently
-on this common and critical operation is very important. The current JSON-RPC implementation
-relies heavily on type introspection via reflection, which is known to be very slow in
-Go. We should therefore produce benchmarks of these methods to determine how much overhead
-we are adding to what is likely to be a very common operation.
-
-The other JSON-RPC methods are much less critical to the core functionality of Tendermint.
-While there may be other points of performance consideration within the RPC, methods that do not
-receive high volumes of requests should not be prioritized for performance consideration.
-
-NOTE: Previous discussion of the RPC framework was done in [ADR 57][adr-57] and
-there is ongoing work to inspect and alter the JSON-RPC framework in [RFC 002][rfc-002].
-Many of these RPC-related performance considerations can either wait until the RFC 002 work is done or be
-considered in conjunction with the in-flight changes to the JSON-RPC.
-
-### Protocol
-
-#### Claim: Gossiping messages is a slow process
-
-Currently, for any validator to successfully vote in a consensus _step_, it must
-receive votes from more than 2/3 of the validators on the network. In many cases,
-it's preferable to receive as many votes as possible from correct validators.
-
-This produces a quadratic increase in messages that are communicated as more validators join the network.
-(Each of the N validators must communicate with all other N-1 validators).
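To see how fast this grows, a minimal Go sketch of the count under the idealized model in the parenthetical above, where every validator sends each vote directly to every other validator. Real gossip relays votes through peers and can deliver duplicates, so actual counts differ; `directMessages` is a hypothetical helper, not Tendermint code.

```go
package main

import "fmt"

// directMessages counts vote messages in one step if each of n validators
// sends its vote directly to each of the other n-1 validators.
func directMessages(n int) int {
	if n < 2 {
		return 0
	}
	return n * (n - 1)
}

func main() {
	for _, n := range []int{10, 100, 150} {
		fmt.Printf("N=%d: %d messages\n", n, directMessages(n))
	}
	// Output:
	// N=10: 90 messages
	// N=100: 9900 messages
	// N=150: 22350 messages
}
```

Going from 10 to 100 validators multiplies the message count by 110, not by 10.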
-
-This large number of messages communicated per step has been identified to impact
-performance of the protocol. Given that the number of messages communicated has been
-identified as a bottleneck, it would be extremely valuable to gather data on how long
-it takes for popular chains with many validators to gather all votes within a step.
-
-Metrics that would improve visibility into this include:
-
-* Amount of time for a node to gather votes in a step.
-* Amount of time for a node to gather all block parts.
-* Number of votes each node sends to gossip (i.e. not its own votes, but votes it is
-transmitting for a peer).
-* Total number of votes each node sends and receives (a node may receive duplicate votes,
-so understanding how frequently this occurs will be valuable in evaluating the performance
-of the gossip system).
-
-#### Claim: Hashing Txs causes slowdown in Tendermint
-
-Using a faster hash algorithm for Tx hashes is currently a point of discussion
-in Tendermint. Namely, it is being considered as part of the [modular hashing proposal][modular-hashing].
-It is currently unknown if hashing transactions in the Mempool forms a significant bottleneck.
-Although it does not appear to be documented as slow, there are a few open GitHub
-issues that indicate a possible user preference for a faster hashing algorithm,
-including [issue 2187][issue-2187] and [issue 2186][issue-2186]. 
-
-It is likely worth investigating what order of magnitude Tx hashing takes in comparison to other
-aspects of adding a Tx to the mempool. It is not currently clear if the rate of adding Tx
-to the mempool is a source of user pain. We should not endeavor to make large changes to
-consensus critical components without first being certain that the change is highly
-valuable and impactful.
-
-### Digital Signatures
-
-#### Claim: Verification of digital signatures may cause slowdown in Tendermint
-
-Working with cryptographic signatures can be computationally expensive. The Cosmos
-Hub uses [ed25519 signatures][hub-signature]. The library performing signature
-verification on votes in Tendermint is [benchmarked][ed25519-bench] to verify an `ed25519`
-signature in about 75μs on a decently fast CPU. A validator in the Cosmos Hub performs
-3 sets of verifications on the signatures of the 140 validators in the Hub
-in a consensus round: during block verification, when verifying the prevotes, and
-when verifying the precommits. With no batching, this would be roughly `31.5ms`
-(3 × 140 × 75μs) per round. It is quite unlikely, therefore, that this accounts for
-any serious amount of the ~7 seconds of block time per height in the Hub.
-
-This may cause slowdown when syncing, since the process needs to constantly verify
-signatures. It's possible that improved signature aggregation will lead to improved
-light client or other syncing performance. In general, a metric should be added
-to track block rate while blocksyncing.
-
-#### Claim: Our use of digital signatures in the consensus protocol contributes to performance issues
-
-Currently, Tendermint's digital signature verification requires that all validators
-receive all vote messages. Each validator must receive the complete digital signature
-along with the vote message that it corresponds to. This means that all N validators
-must receive messages from at least 2/3 of the N validators in each consensus
-round. Given the potential for oddly shaped network topologies and the expected
-variable network roundtrip times of a few hundred milliseconds in a blockchain,
-it is highly likely that this amount of gossiping is leading to a significant amount
-of the slowdown in the Cosmos Hub and in Tendermint consensus.
-
-### Tendermint Event System
-
-#### Claim: The event system is a bottleneck in Tendermint
-
-The Tendermint Event system is used to communicate and store information about
-internal Tendermint execution. The system uses channels internally to send messages
-to different subscribers. Sending an event [blocks on the internal channel][event-send].
-The default configuration is to [use an unbuffered channel for event publishes][event-buffer-capacity].
-Several consumers of the event system also use an unbuffered channel for reads.
-An example of this is the [event indexer][event-indexer-unbuffered], which takes an
-unbuffered subscription to the event system. The result is that these unbuffered readers
-can cause writes to the event system to block or slow down depending on contention in the
-event system. This has implications for the consensus system, which [publishes events][consensus-event-send].
-To better understand the performance of the event system, we should add metrics to track the timing of
-event sends. The following metrics would be a good start for tracking this performance:
-
-* Time in event send, labeled by Event Type
-* Time in event receive, labeled by subscriber
-* Event throughput, measured in events per unit time.
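The blocking behavior described above is a property of Go channels themselves. A minimal, self-contained illustration (not Tendermint's `libs/pubsub` code): a send on an unbuffered channel cannot complete until a receiver is ready, while a buffered channel absorbs sends up to its capacity. The hypothetical `trySend` helper uses `select` with `default` so the demonstration never actually blocks.

```go
package main

import "fmt"

// trySend reports whether a send on ch would complete immediately.
// On an unbuffered channel this succeeds only if a receiver is already waiting.
func trySend(ch chan string, ev string) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false // a plain `ch <- ev` would block the publisher here
	}
}

func main() {
	unbuffered := make(chan string)
	buffered := make(chan string, 2)

	// No subscriber is reading yet, so only the buffered channel accepts events.
	fmt.Println(trySend(unbuffered, "NewBlock")) // false
	fmt.Println(trySend(buffered, "NewBlock"))   // true
	fmt.Println(trySend(buffered, "Tx"))         // true
	fmt.Println(trySend(buffered, "Vote"))       // false: buffer full
}
```

Buffering only postpones the stall when a subscriber is persistently slower than the publisher, which is why send-time metrics remain useful either way.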
-
-### References
-[modular-hashing]: https://github.com/tendermint/tendermint/pull/6773
-[issue-2186]: https://github.com/tendermint/tendermint/issues/2186
-[issue-2187]: https://github.com/tendermint/tendermint/issues/2187
-[rfc-002]: https://github.com/tendermint/tendermint/pull/6913
-[adr-57]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-057-RPC.md
-[issue-1319]: https://github.com/tendermint/tendermint/issues/1319
-[abci-commit-description]: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#commit
-[abci-local-client-code]: https://github.com/tendermint/tendermint/blob/511bd3eb7f037855a793a27ff4c53c12f085b570/abci/client/local_client.go#L84
-[hub-signature]: https://github.com/cosmos/gaia/blob/0ecb6ed8a244d835807f1ced49217d54a9ca2070/docs/resources/genesis.md#consensus-parameters
-[ed25519-bench]: https://github.com/oasisprotocol/curve25519-voi/blob/d2e7fc59fe38c18ca990c84c4186cba2cc45b1f9/PERFORMANCE.md
-[event-send]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/libs/pubsub/pubsub.go#L338
-[event-buffer-capacity]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/types/event_bus.go#L14
-[event-indexer-unbuffered]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/state/indexer/indexer_service.go#L39
-[consensus-event-send]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/internal/consensus/state.go#L1573
-[sdk-query-fix]: https://github.com/cosmos/cosmos-sdk/pull/10045
--- a/docs/rfc/rfc-004-e2e-framework.rst
+++ b/docs/rfc/rfc-004-e2e-framework.rst
@@ -1,213 +0,0 @@
-========================================
-RFC 004: E2E Test Framework Enhancements
-========================================
-
-Changelog
---------
-
-- 2021-09-14: started initial draft (@tychoish)
-
-Abstract
---------
-
-This document discusses a series of improvements to the e2e test framework
-that we can consider during the next few releases to help boost confidence in
-Tendermint releases and improve developer efficiency.
-
-Background
-----------
-
-During the 0.35 release cycle, the E2E tests were a source of great
-value, helping to identify a number of bugs before release. At the same time,
-the tests did not pass consistently during this period, which reduced their
-value and forced the core development team to spend time and energy
-maintaining the e2e tests and the test harness and chasing down issues with
-them. The experience of this release cycle suggests a series of
-improvements to the test framework, and this document attempts to capture
-these improvements, along with their motivations and potential impact.
-
-Projects
---------
-
-Flexible Workload Generation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Presently, the e2e suite contains a single workload generation pattern, which
-exists simply to ensure that the test networks have some work to do during
-their runs. However, the shape and volume of the work are very consistent, and
-the load is deliberately gentle to help ensure test reliability.
-
-We don't need a complex workload generation framework, but having a few
-different workload shapes available for test networks, both generated and
-hand-crafted, would be useful.
-
-Workload patterns/configurations might include:
-
-- transaction targeting patterns (including light nodes, round robin, and
-  targeting individual nodes).
-
-- variable transaction size over time.
-
-- transaction broadcast option (synchronous, checked, fire-and-forget,
-  mixed).
-
-- number of transactions to submit.
-
-- non-transaction workloads (evidence submission, queries, event
-  subscriptions).
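A workload configuration along these lines could be a small struct that the harness consults for every transaction. The sketch below is hypothetical: `Workload`, `BroadcastMode`, `TargetFor`, and `TxSize` are illustrative names, not part of the existing e2e framework. It covers round-robin versus single-node targeting and transaction size that grows over the run.

```go
package main

import "fmt"

// BroadcastMode is a hypothetical enumeration of the broadcast options
// listed above; these names are not part of the e2e framework.
type BroadcastMode string

const (
	BroadcastSync  BroadcastMode = "sync"
	BroadcastCheck BroadcastMode = "checked"
	BroadcastAsync BroadcastMode = "fire-and-forget"
	BroadcastMixed BroadcastMode = "mixed"
)

// Workload is a hypothetical description of one workload shape.
type Workload struct {
	Targeting  string // "round-robin" or "single"
	TargetNode string // only used when Targeting == "single"
	Broadcast  BroadcastMode
	NumTxs     int
	MinTxBytes int // transaction size grows linearly from MinTxBytes...
	MaxTxBytes int // ...to MaxTxBytes over the course of the run
}

// TargetFor returns the node that the i-th transaction is sent to.
func (w Workload) TargetFor(i int, nodes []string) string {
	if w.Targeting == "single" {
		return w.TargetNode
	}
	return nodes[i%len(nodes)] // round robin
}

// TxSize returns the size in bytes of the i-th transaction.
func (w Workload) TxSize(i int) int {
	if w.NumTxs <= 1 {
		return w.MinTxBytes
	}
	return w.MinTxBytes + (w.MaxTxBytes-w.MinTxBytes)*i/(w.NumTxs-1)
}

func main() {
	w := Workload{Targeting: "round-robin", Broadcast: BroadcastAsync,
		NumTxs: 5, MinTxBytes: 100, MaxTxBytes: 500}
	nodes := []string{"validator01", "validator02", "full01"}
	for i := 0; i < w.NumTxs; i++ {
		fmt.Println(i, w.Broadcast, w.TargetFor(i, nodes), w.TxSize(i))
	}
}
```

Keeping the shape in plain data means a generated or hand-crafted testnet manifest could select a workload without code changes.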
-
-Configurable Generator
-~~~~~~~~~~~~~~~~~~~~~~
-
-The nightly e2e suite is defined by the `testnet generator
-<https://github.com/tendermint/tendermint/blob/master/test/e2e/generator/generate.go#L13-L65>`_,
-and it's difficult to add dimensions or change the focus of the test suite in
-any way without modifying the implementation of the generator. If the
-generator were more configurable, potentially via a file rather than in
-the Go implementation, we could modify the focus of the test suite on the
-fly.
-
-Features that we might want to configure:
-
-- number of test networks of various topologies to generate, to improve
-  coverage of different configurations.
-
-- test application configurations (to modify the latency of ABCI calls, etc.)
-
-- size of test networks.
-
-- workload shape and behavior.
-
-- initial sync and catch-up configurations.
-
-The testnet generator currently provides runtime options for limiting the
-generator to specific types of P2P stacks, and for generating multiple groups
-of test cases to support parallelism. The goal is to extend this pattern and
-avoid hardcoding the matrix of test cases in the generator code. Once the
-testnet configuration generation behavior is configurable at runtime,
-developers may be able to use the e2e framework to validate changes before
-landing them, rather than discovering a day later that they break the e2e
-tests.
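One way to avoid hardcoding the matrix is to read the dimensions from a file and expand their cross product at runtime. The sketch below assumes a JSON file (the real generator might use TOML or flags instead) and uses hypothetical dimension names and values; it is not the current generator's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Matrix maps a dimension name (e.g. "topology") to the values to test.
type Matrix map[string][]string

// Expand returns every combination of the matrix's values, one map per
// test network. Dimensions are visited in sorted order so the output is
// deterministic.
func Expand(m Matrix) []map[string]string {
	dims := make([]string, 0, len(m))
	for d := range m {
		dims = append(dims, d)
	}
	sort.Strings(dims)

	cases := []map[string]string{{}} // start with one empty case
	for _, d := range dims {
		var next []map[string]string
		for _, c := range cases {
			for _, v := range m[d] {
				nc := make(map[string]string, len(c)+1)
				for k, cv := range c {
					nc[k] = cv
				}
				nc[d] = v
				next = append(next, nc)
			}
		}
		cases = next
	}
	return cases
}

func main() {
	// In practice this would be read from a file passed to the generator.
	raw := []byte(`{"topology": ["single", "quad", "large"], "p2p": ["legacy", "new"]}`)
	var m Matrix
	if err := json.Unmarshal(raw, &m); err != nil {
		panic(err)
	}
	for _, c := range Expand(m) {
		fmt.Println(c)
	}
}
```

Adding a dimension then means editing the file, not the generator.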
-
-In addition to the autogenerated suite, it might make sense to maintain a
-small collection of hand-crafted cases that exercise configurations of
-concern, to run as part of the nightly (or less frequent) loop.
-
-Implementation Plan Structure
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-As a development team, we should determine early in the development cycle
-which features should affect e2e testing, and if we intend to modify the e2e
-tests to exercise a feature, we should begin the integration process as early
-as possible.
-
-To facilitate this, we should adopt a practice whereby we exercise specific
-features that are currently under development more rigorously in the e2e
-suite, and then as development stabilizes we can reduce the number or weight
-of these features in the suite.
-
-As of 0.35, there are essentially two end-to-end tests: the suite of 64
-generated test networks, and the hand-crafted ``ci.toml`` test case. The
-generated test cases help provide systematic coverage, while the ``ci`` run
-provides coverage for a large number of features.
-
-Reduce Cycle Time
-~~~~~~~~~~~~~~~~~
-
-One of the barriers to leveraging the e2e framework, and one of the challenges
-in debugging failures, is that the cycle time of a single test iteration is
-quite high: 5 minutes to build the docker image, plus the time to run the test
-or tests.
-
-There are a number of improvements and enhancements that can reduce the cycle
-time in practice:
-
-- reduce the amount of time required to build the docker image used in these
-  tests. Without the dependency on CGo, the Tendermint binaries could be
-  (cross-)compiled outside of the docker container and then injected into
-  it, which would take better advantage of docker's native caching.
-  Moreover, without the CGo dependency there would be no hard requirement for
-  the e2e tests to use docker at all.
-
-- support test parallelism. Because of the way the testnets are orchestrated,
-  a single system can really only run one network at a time. For executions
-  (local or remote) with more resources, there's no reason not to run a few
-  networks in parallel to reduce the feedback time.
-
-- prune testnet configurations that are unlikely to provide good signal, to
-  shorten the time to feedback.
-
-- apply some kind of tiered approach to test execution, to improve the
-  legibility of the test results. For example, order tests by the
-  dependencies of their features, or run each test network without
-  perturbations before running the same configuration with perturbations, to
-  be able to isolate the impact of specific features.
-
-- orchestrate the test harness directly from ``go test`` rather than via a
-  special harness and shell scripts, so that e2e tests fit more naturally
-  into developers' existing workflows.
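Parallelism and tiering combine naturally: run each configuration without perturbations first, run the perturbed variant only if that baseline passes, and bound how many networks run at once. In this sketch, `RunTiered` and `Result` are hypothetical, and `runNetwork` is a stand-in for whatever actually starts and checks a testnet.

```go
package main

import (
	"fmt"
	"sync"
)

// Result records the outcome of one testnet configuration.
type Result struct {
	Name          string
	BaselineOK    bool
	PerturbedOK   bool
	PerturbedSkip bool // baseline failed, so perturbations were not run
}

// RunTiered runs every configuration with at most `parallel` networks at a
// time. runNetwork(name, perturb) is a hypothetical hook that runs one
// network and reports whether it passed.
func RunTiered(names []string, parallel int, runNetwork func(name string, perturb bool) bool) []Result {
	results := make([]Result, len(names))
	sem := make(chan struct{}, parallel) // counting semaphore
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it

			r := Result{Name: name, BaselineOK: runNetwork(name, false)}
			if r.BaselineOK {
				// Tier 2: only perturb configurations that work unperturbed,
				// so a failure here points at the perturbation.
				r.PerturbedOK = runNetwork(name, true)
			} else {
				r.PerturbedSkip = true
			}
			results[i] = r // each goroutine writes only its own index
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	fake := func(name string, perturb bool) bool {
		return name != "broken" && !(name == "quad" && perturb)
	}
	for _, r := range RunTiered([]string{"single", "quad", "broken"}, 2, fake) {
		fmt.Printf("%+v\n", r)
	}
}
```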
-
-Many of these improvements, particularly reducing the build time, will also
-reduce the time to get feedback during automated builds.
-
-Deeper Insights
-~~~~~~~~~~~~~~~
-
-When a test network fails, it's incredibly difficult to understand *why* the
-network failed, as the current system provides very little insight into the
-system outside of the process logs. When a test network stalls or fails,
-developers should be able to quickly and easily get a sense of the state of
-the network and all nodes.
-
-Improvements in pursuit of this goal include functionality that would help
-node operators in production environments by improving the quality and
-utility of log messages and other reported metrics, as well as tools to
-collect and aggregate this data for developers in the context of test
-networks.
-
-- Interleave log messages from all nodes in the network to be able to
-  correlate events during the test run.
-
-- Collect structured metrics of the system operation (CPU/MEM/IO) during the
-  test run, as well as from each Tendermint/application process.
-
-- Build (simple) tools to be able to render and summarize the data collected
-  during the test run to answer basic questions about test outcome.
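Interleaving per-node logs amounts to a k-way merge by timestamp. The sketch below assumes each node's log has already been parsed into entries sorted by time, and that node clocks are reasonably synchronized; the `Entry` type and `Interleave` function are hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// Entry is one parsed log line from one node.
type Entry struct {
	Node string
	Time time.Time
	Msg  string
}

// cursor points at the next unread entry of one node's log.
type cursor struct {
	entries []Entry
	pos     int
}

// mergeHeap is a min-heap of cursors ordered by their next entry's time.
type mergeHeap []*cursor

func (h mergeHeap) Len() int { return len(h) }
func (h mergeHeap) Less(i, j int) bool {
	return h[i].entries[h[i].pos].Time.Before(h[j].entries[h[j].pos].Time)
}
func (h mergeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *mergeHeap) Pop() interface{} {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// Interleave merges per-node logs (each sorted by time) into one timeline.
func Interleave(logs [][]Entry) []Entry {
	h := &mergeHeap{}
	for _, l := range logs {
		if len(l) > 0 {
			*h = append(*h, &cursor{entries: l})
		}
	}
	heap.Init(h)
	var out []Entry
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the earliest pending entry
		out = append(out, c.entries[c.pos])
		c.pos++
		if c.pos == len(c.entries) {
			heap.Pop(h) // this node's log is exhausted
		} else {
			heap.Fix(h, 0) // its next entry may be later; restore order
		}
	}
	return out
}

func main() {
	t0 := time.Unix(0, 0)
	a := []Entry{{"node0", t0, "enter propose"}, {"node0", t0.Add(3 * time.Millisecond), "commit"}}
	b := []Entry{{"node1", t0.Add(time.Millisecond), "received proposal"}}
	for _, e := range Interleave([][]Entry{a, b}) {
		fmt.Println(e.Time.Sub(t0), e.Node, e.Msg)
	}
}
```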
-
-Flexible Assertions
-~~~~~~~~~~~~~~~~~~~
-
-Currently, all assertions run for every test network, which makes the
-assertions fairly bland and the framework primarily useful for smoke
-testing. It might be useful to be able to write and run different tests for
-different configurations, which could allow us to test outside of the happy
-path.
-
-In general, our existing assertions occupy only a fraction of the total test
-time, so adding a few extra test assertions would be relatively cheap, and
-could help build confidence.
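Configuration-specific assertions could be expressed as a registry of checks, each paired with a predicate that decides whether it applies to a given testnet. `Testnet`, `Assertion`, `Run`, and the example assertion names are hypothetical, and the check bodies are placeholders where a real check would query the running network.

```go
package main

import "fmt"

// Testnet is a hypothetical summary of a test network's configuration.
type Testnet struct {
	Name          string
	HasLightNodes bool
	Perturbations []string
}

// Assertion pairs a check with a predicate selecting the networks it applies to.
type Assertion struct {
	Name    string
	Applies func(Testnet) bool
	Check   func(Testnet) error
}

// Always reproduces today's behavior: the assertion runs on every network.
func Always(Testnet) bool { return true }

func hasPerturbation(tn Testnet, p string) bool {
	for _, q := range tn.Perturbations {
		if q == p {
			return true
		}
	}
	return false
}

// registry holds example assertions; each Check is a placeholder.
var registry = []Assertion{
	{Name: "blocks-produced", Applies: Always,
		Check: func(Testnet) error { return nil }},
	{Name: "light-client-verifies",
		Applies: func(tn Testnet) bool { return tn.HasLightNodes },
		Check:   func(Testnet) error { return nil }},
	{Name: "recovers-after-kill",
		Applies: func(tn Testnet) bool { return hasPerturbation(tn, "kill") },
		Check:   func(Testnet) error { return nil }},
}

// Run executes every applicable assertion, returning the names of those that
// ran and the errors of those that failed.
func Run(all []Assertion, tn Testnet) (ran []string, failed map[string]error) {
	failed = map[string]error{}
	for _, a := range all {
		if !a.Applies(tn) {
			continue
		}
		ran = append(ran, a.Name)
		if err := a.Check(tn); err != nil {
			failed[a.Name] = err
		}
	}
	return ran, failed
}

func main() {
	ran, failed := Run(registry, Testnet{Name: "ci", HasLightNodes: true, Perturbations: []string{"kill"}})
	fmt.Println(ran, failed)
}
```

Assertions registered with `Always` keep today's smoke-test coverage, while narrower predicates let a check target only the configurations where it is meaningful.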
-
-Additional Kinds of Testing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The existing e2e suite exercises networks of nodes that run the same
-Tendermint version, have a stable configuration, and are expected to make
-progress. There are many other possible test configurations that may be
-interesting to engage with. These could include dimensions such as:
-
-- Multi-version testing to exercise our compatibility guarantees for networks
-  that might have different Tendermint versions.
-
-- As a flavor of multi-version testing, upgrade testing, to build
-  confidence in migration code and procedures.
-
-- Additional test applications, particularly practical applications,
-  including some that use gaiad and/or the cosmos-sdk, as well as test-only
-  applications that simulate other kinds of applications (e.g. with variable
-  application operation latency).
-
-- Tests of "non-viable" configurations that ensure that forbidden combinations
-  lead to halts.
-
-References
-----------
-
-- `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_
--- a/docs/rfc/rfc-005-event-system.rst
+++ b/docs/rfc/rfc-005-event-system.rst
@@ -1,122 +0,0 @@
-=====================
-RFC 005: Event System
-=====================
-
-Changelog
----------
-
-- 2021-09-17: Initial Draft (@tychoish)
-
-Abstract
---------
-
-The event system within Tendermint, which supports a lot of core
-functionality, also represents a major infrastructural liability. As part of
-our upcoming review of the RPC interfaces and our ongoing thoughts about
-stability and performance, as well as the preparation for Tendermint 1.0, we
-should revisit the design and implementation of the event system. This
-document discusses both the current state of the system and potential
-directions for future improvement.
-
-Background
-----------
-
-Current State of Events
-~~~~~~~~~~~~~~~~~~~~~~~
-
-The event system makes it possible for clients, both internal and external,
-to receive notifications of state replication events, such as new blocks,
-new transactions, and validator set changes, as well as intermediate events
-during consensus. Because the event system is highly cross-cutting, the
-behavior and performance of the event publication and subscription system
-have a large impact on all of Tendermint.
-
-The subscription service is exposed over the RPC interface, but it also powers
-indexing (e.g. to an external database), and is the mechanism by which
-``BroadcastTxCommit`` is able to wait for transactions to land in a block.
-
-The current pubsub mechanism relies on a couple of buffered channels,
-primarily between all event creators and subscribers, but also for each
-subscription. The result of this design is that, in some situations with the
-right collection of slow subscription consumers, the event system can put
-backpressure on the consensus state machine and message gossiping in the
-network, thereby causing nodes to lag.
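The backpressure effect can be reproduced in a few lines of Go. This is a simplified model, not Tendermint's actual pubsub code: a publisher, standing in for the consensus state machine, sends into a small buffered channel whose subscriber never reads (the extreme case of a slow consumer). Once the buffer fills, every further send blocks; the hypothetical `publishWithDeadline` helper uses a timeout only so that the stall can be observed.

```go
package main

import (
	"fmt"
	"time"
)

// publishWithDeadline tries to send ev and reports whether the send
// completed before the deadline; false means the publisher would have
// been blocked by backpressure.
func publishWithDeadline(ch chan<- int, ev int, d time.Duration) bool {
	select {
	case ch <- ev:
		return true
	case <-time.After(d):
		return false
	}
}

// Simulate publishes up to n events into a buffer of the given capacity
// that nobody reads, and returns how many sends completed before the
// publisher got stuck.
func Simulate(capacity, n int) int {
	ch := make(chan int, capacity)
	sent := 0
	for i := 0; i < n; i++ {
		if !publishWithDeadline(ch, i, 10*time.Millisecond) {
			break
		}
		sent++
	}
	return sent
}

func main() {
	fmt.Println(Simulate(3, 10))
}
```

A larger buffer only postpones the stall; as long as delivery is a blocking channel send, a consumer that falls behind long enough will eventually block the publisher.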
-
-Improvements
-~~~~~~~~~~~~
-
-The current system relies on implicit, bounded queues built by the buffered
-channels, and though thread-safe, it can force all activity within Tendermint
-to serialize unnecessarily. Additionally, timeouts for subscription consumers
-that are related to the implementation of the RPC layer may complicate the use
-of the system.
-
-References
-~~~~~~~~~~
-
-- Legacy Implementation
-  - `publication of events <https://github.com/tendermint/tendermint/blob/master/libs/pubsub/pubsub.go#L333-L345>`_
-  - `send operation <https://github.com/tendermint/tendermint/blob/master/libs/pubsub/pubsub.go#L489-L527>`_
-  - `send loop <https://github.com/tendermint/tendermint/blob/master/libs/pubsub/pubsub.go#L381-L402>`_
-- Related RFCs
-  - `RFC 002: IPC Ecosystem <./rfc-002-ipc-ecosystem.md>`_
-  - `RFC 003: Performance Questions <./rfc-003-performance-questions.md>`_
-
-Discussion
-----------
-
-Changes to Published Events
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-As part of this process, the Tendermint team should do a study of the existing
-event types and ensure that there are viable production use cases for
-subscriptions to all event types. Instinctively, it seems plausible that some
-of the events may not be usable outside of Tendermint (e.g. ``TimeoutWait``
-or ``NewRoundStep``), and it might make sense to remove them. Certainly, it
-would be good to make sure that we don't maintain infrastructure for unused or
-unhelpful messages indefinitely.
-
-Blocking Subscription
-~~~~~~~~~~~~~~~~~~~~~
-
-The blocking subscription mechanism makes it possible to have *send*
-operations into the subscription channel be unbuffered (the event processing
-channel is still buffered). In the blocking case, a blocking subscription can
-hold up processing of an event for other, non-blocking subscriptions. The
-main use case for blocking subscriptions seems to be ensuring that a
-transaction has been committed to a block for ``BroadcastTxCommit``. Removing
-blocking subscriptions entirely, and potentially finding another way to
-implement ``BroadcastTxCommit``, could lead to important simplifications and
-improvements to throughput without requiring large changes.
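The interaction between blocking and non-blocking subscriptions can be sketched as a sequential fan-out loop. In this simplified model (not the actual ``libs/pubsub`` implementation), a blocking subscription receives via a plain channel send, while a non-blocking one uses `select` with a `default` case and records a drop when its buffer is full. Because delivery is sequential, one stalled blocking subscriber delays every subscriber after it in the loop. The `sub` type and `publish` function are hypothetical.

```go
package main

import "fmt"

// sub is a simplified subscription.
type sub struct {
	ch       chan string
	blocking bool
	dropped  int
}

// publish delivers ev to each subscriber in order.
func publish(subs []*sub, ev string) {
	for _, s := range subs {
		if s.blocking {
			// Waits until this subscriber has room; every subscriber
			// later in the loop waits too.
			s.ch <- ev
			continue
		}
		select {
		case s.ch <- ev:
		default:
			// Full buffer: drop rather than stall the publisher.
			// (A real system might instead cancel the subscription.)
			s.dropped++
		}
	}
}

func main() {
	slow := &sub{ch: make(chan string, 1)}
	subs := []*sub{slow}
	publish(subs, "NewBlock")
	publish(subs, "NewBlock") // buffer of 1 is already full, so this is dropped
	fmt.Println(len(slow.ch), slow.dropped)
}
```

With only non-blocking subscriptions, `publish` never waits on a consumer, which is the simplification that removing blocking subscriptions would buy.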
-
-Subscription Identification
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Before `#6386 <https://github.com/tendermint/tendermint/pull/6386>`_, all
-subscriptions were identified by the combination of a client ID and a query.
-With that change, it became possible to identify every subscription by an ID
-alone, but maintaining compatibility with the legacy identification scheme
-means that a good deal of legacy code remains, and client-side efficiency
-could be improved.
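One way to make IDs primary while still honoring the legacy scheme is a registry keyed by subscription ID, with a secondary index from the legacy (client ID, query) pair. The `Registry` type and its methods are hypothetical and only illustrate the data structure; the secondary index is the part that could eventually be deleted.

```go
package main

import (
	"errors"
	"fmt"
)

type legacyKey struct{ clientID, query string }

// Subscription is a simplified subscription record.
type Subscription struct {
	ID, ClientID, Query string
}

// ErrExists is returned when a subscription's ID or legacy key is taken.
var ErrExists = errors.New("subscription already exists")

// Registry stores subscriptions by ID and keeps a legacy index so that
// clients still identifying subscriptions by (client ID, query) keep working.
type Registry struct {
	byID     map[string]Subscription
	byLegacy map[legacyKey]string // legacy key -> subscription ID
}

func NewRegistry() *Registry {
	return &Registry{byID: map[string]Subscription{}, byLegacy: map[legacyKey]string{}}
}

// Add registers s under both its ID and its legacy key.
func (r *Registry) Add(s Subscription) error {
	k := legacyKey{s.ClientID, s.Query}
	if _, ok := r.byID[s.ID]; ok {
		return ErrExists
	}
	if _, ok := r.byLegacy[k]; ok {
		return ErrExists
	}
	r.byID[s.ID] = s
	r.byLegacy[k] = s.ID
	return nil
}

// RemoveByID is the ID-only path.
func (r *Registry) RemoveByID(id string) bool {
	s, ok := r.byID[id]
	if !ok {
		return false
	}
	delete(r.byID, id)
	delete(r.byLegacy, legacyKey{s.ClientID, s.Query})
	return true
}

// RemoveLegacy is the compatibility path: it translates to an ID first.
func (r *Registry) RemoveLegacy(clientID, query string) bool {
	id, ok := r.byLegacy[legacyKey{clientID, query}]
	if !ok {
		return false
	}
	return r.RemoveByID(id)
}

func main() {
	r := NewRegistry()
	if err := r.Add(Subscription{ID: "sub-1", ClientID: "client-1", Query: "tm.event = 'NewBlock'"}); err != nil {
		panic(err)
	}
	fmt.Println(r.RemoveLegacy("client-1", "tm.event = 'NewBlock'"))
	fmt.Println(r.RemoveByID("sub-1"))
}
```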
-
-Pubsub Changes
-~~~~~~~~~~~~~~
-
-The pubsub core should be implemented in a way that removes the possibility
-of backpressure from the event system impacting the core system, *or* of one
-subscription impacting the behavior of another area of the
-system. Additionally, because the current system is implemented entirely in
-terms of a collection of buffered channels, the event system (and large
-numbers of subscriptions) can be a source of memory pressure.
-
-These changes could include:
-
-- explicit cancellation and timeouts propagated from callers (e.g. RPC
-  endpoints), implemented using contexts.
-
-- a subscription system that can spill to disk, to avoid putting memory
-  pressure on the core behavior of the node (consensus, gossip).
-
-- subscriptions implemented as cursors rather than channels, with either
-  condition variables to simulate the existing "push" API or a client-side
-  iterator API with some kind of long-polling interface.
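The cursor and context ideas can be sketched together with an append-only event log: each subscriber owns only an integer offset, and `Next` long-polls until an event exists at that offset or the caller's context is done. Because `sync.Cond` cannot be woken by a context, this sketch signals readers by closing and replacing a notification channel instead. `Log`, `Append`, `Next`, and `readWithTimeout` are hypothetical names; spilling to disk and trimming old entries are left out.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Log is an append-only, in-memory event log shared by all subscribers.
type Log struct {
	mu     sync.Mutex
	events []string
	notify chan struct{} // closed (and replaced) on every append
}

func NewLog() *Log { return &Log{notify: make(chan struct{})} }

// Append adds an event and wakes all waiting readers. It never blocks on
// subscribers, so a slow reader cannot apply backpressure to the publisher.
func (l *Log) Append(ev string) {
	l.mu.Lock()
	l.events = append(l.events, ev)
	close(l.notify)
	l.notify = make(chan struct{})
	l.mu.Unlock()
}

// Next returns the event at position cursor, waiting until it exists or ctx
// is done. Each caller advances its own cursor.
func (l *Log) Next(ctx context.Context, cursor int) (string, error) {
	for {
		l.mu.Lock()
		if cursor < len(l.events) {
			ev := l.events[cursor]
			l.mu.Unlock()
			return ev, nil
		}
		wait := l.notify // capture under the lock so no append is missed
		l.mu.Unlock()
		select {
		case <-wait:
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// readWithTimeout shows a caller (e.g. an RPC handler) imposing a deadline.
func readWithTimeout(l *Log, cursor int, d time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return l.Next(ctx, cursor)
}

func main() {
	l := NewLog()
	go func() { time.Sleep(5 * time.Millisecond); l.Append("NewBlock") }()
	ev, err := readWithTimeout(l, 0, time.Second)
	fmt.Println(ev, err)
	_, err = readWithTimeout(l, 1, 10*time.Millisecond)
	fmt.Println(err)
}
```

Memory use here grows with the number of events rather than the number of subscribers, which is what would make spilling the log to disk a natural extension.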
--- a/docs/rfc/rfc-template.md
+++ b/docs/rfc/rfc-template.md
@@ -1,35 +0,0 @@
-# RFC {RFC-NUMBER}: {TITLE}
-
-## Changelog
-
-- {date}: {changelog}
-
-## Abstract
-
-> A brief high-level synopsis of the topic of discussion for this RFC, ideally
-> just a few sentences.  This should help the reader quickly decide whether the
-> rest of the discussion is relevant to their interest.
-
-## Background
-
-> Any context or orientation needed for a reader to understand and participate
-> in the substance of the Discussion. If necessary, this section may include
-> links to other documentation or sources rather than restating existing
-> material, but should provide enough detail that the reader can tell what they
-> need to read to be up-to-date.
-
-### References
-
-> Links to external materials needed to follow the discussion may be added here.
->
-> In addition, if the discussion in a request for comments leads to any design
-> decisions, it may be helpful to add links to the ADR documents here after the
-> discussion has settled.
-
-## Discussion
-
-> This section contains the core of the discussion.
->
-> There is no fixed format for this section, but ideally this section should
-> be updated before merging to reflect any discussion that took place on the
-> PR that made those changes.