@@ -40,6 +40,7 @@ sections.

- [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
- [RFC-001: Storage Engines](./rfc-001-storage-engine.rst)
- [RFC-002: Interprocess Communication](./rfc-002-ipc-ecosystem.md)
- [RFC-003: Performance Taxonomy](./rfc-003-performance-questions.md)
- [RFC-004: E2E Test Framework Enhancements](./rfc-004-e2e-framework.md)

<!-- - [RFC-NNN: Title](./rfc-NNN-title.md) -->

docs/rfc/rfc-003-performance-questions.md (new file, 283 lines)

# RFC 003: Taxonomy of potential performance issues in Tendermint

## Changelog

- 2021-09-02: Created initial draft (@wbanfield)
- 2021-09-14: Add discussion of the event system (@wbanfield)

## Abstract

This document discusses the various sources of performance issues in Tendermint and
attempts to clarify what work may be required to understand and address them.

## Background

Performance, loosely defined as the ability of a software process to perform its work
quickly and efficiently under load and within reasonable resource limits, is a frequent
topic of discussion in the Tendermint project.
To effectively address any issues with Tendermint performance, we need to
categorize the various issues, understand their potential sources, and gauge their
impact on users.

Categorizing the different known performance issues will allow us to discuss and fix them
more systematically. This document proposes a rough taxonomy of performance issues
and highlights areas where more research into potential performance problems is required.

Understanding Tendermint's performance limitations will also be critically important
as we make changes to many of its subsystems. Performance is a central concern for
upcoming decisions regarding the `p2p` protocol, RPC message encoding and structure,
database usage and selection, and consensus protocol updates.

## Discussion

This section attempts to delineate the different areas of Tendermint functionality
that are often cited as having performance issues. It raises questions and suggests
lines of inquiry that may be valuable for better understanding Tendermint's performance issues.

As a note: We should avoid quickly adding many microbenchmarks or package-level benchmarks.
These are prone to being worse than useless, as they can obscure what _should_ be
focused on: performance of the system from the perspective of a user. We should,
instead, tune performance with an eye towards user needs and the actions users take. These users comprise
both operators of Tendermint chains and the people generating transactions for
Tendermint chains. Both of these sets of users are largely aligned in wanting an end-to-end
system that operates quickly and efficiently.

REQUEST: The list below may be incomplete. If there are additional areas that are often
cited as creating poor performance, please comment so that they may be included.

### P2P

#### Claim: Tendermint cannot scale to large numbers of nodes

Users have reported that Tendermint networks cannot scale to large numbers of nodes.
The number of nodes reported as causing issues was in the thousands.
We don't currently have evidence of the upper limit on the number of nodes that Tendermint's
P2P stack can scale to.

We need to more concretely understand the source of these issues and determine which layer
is causing the problem. It's possible that the P2P layer, in the absence of any reactors
sending data, is perfectly capable of managing thousands of peer connections. For
a reasonable networking and application setup, thousands of connections should not present any
issue for the application.

We need more data to understand the problem directly. We want to drive the popularity
and adoption of Tendermint, and this will mean allowing for chains with more validators.
We should follow up with users experiencing this issue. We may then want to add
a series of metrics to the P2P layer to better understand the inefficiencies it produces.

The following metrics can help us understand the sources of latency in the Tendermint P2P stack:

* Number of messages sent and received per second
* Time a message spends on the P2P layer send and receive queues

The following metrics exist and should be leveraged in addition to those added:

* Number of peers a node is connected to
* Number of bytes per channel sent to and received from each peer
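
As a rough sketch of what the new queue-timing metrics could look like, the snippet
below uses the Prometheus Go client directly; the metric names, label sets, and the
`recordSend` helper are illustrative assumptions, not Tendermint's actual metrics code.

```go
package p2pmetrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metrics for the P2P send path. Names and labels are
// assumptions for illustration only.
var (
	messagesSent = prometheus.NewCounterVec(prometheus.CounterOpts{
		Namespace: "tendermint",
		Subsystem: "p2p",
		Name:      "messages_sent_total",
		Help:      "Number of messages sent, by channel.",
	}, []string{"channel"})

	sendQueueTime = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Namespace: "tendermint",
		Subsystem: "p2p",
		Name:      "send_queue_seconds",
		Help:      "Time a message spent on the send queue before being written to a peer.",
		Buckets:   prometheus.ExponentialBuckets(0.0001, 4, 8), // 100µs .. ~1.6s
	}, []string{"channel"})
)

func init() {
	prometheus.MustRegister(messagesSent, sendQueueTime)
}

// recordSend would be called when a message is dequeued and written to the
// wire; enqueuedAt is when it was placed on the send queue.
func recordSend(channel string, enqueuedAt time.Time) {
	messagesSent.WithLabelValues(channel).Inc()
	sendQueueTime.WithLabelValues(channel).Observe(time.Since(enqueuedAt).Seconds())
}
```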

### Sync

#### Claim: Block Syncing is slow

Bootstrapping a new node in a network to the height of the rest of the network is believed to
take longer than users would like. Block sync requires fetching all of the blocks from
peers and writing them to local disk for storage. A useful line of inquiry
is understanding how quickly a perfectly tuned system _could_ fetch all of the state
over a network, so that we understand how much overhead Tendermint actually adds.

The operation is likely to be _incredibly_ dependent on the environment in which
the node is being run. The factors that will influence syncing include:

1. Number of peers that a syncing node may fetch from.
2. Speed of the disk that a validator is writing to.
3. Speed of the network connection between the different peers that the node is
syncing from.

We should calculate how quickly this operation _could possibly_ complete for common chains and nodes.
To do so, we should assume that a node is reading at the line rate of its NIC and writing
to local storage at the full drive speed. Comparing this theoretical best case to the actual
sync times observed by node operators will give us a good point of comparison for understanding
how much overhead Tendermint incurs.
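
To make that comparison concrete, a back-of-envelope lower bound can be computed from assumed
figures; the chain height, block size, and NIC/disk rates below are placeholders for
illustration, not measurements of any real network.

```go
package main

import "fmt"

// A rough lower bound on block sync time, assuming the node is limited only
// by the slower of network line rate and disk write speed. All figures are
// illustrative assumptions.
func main() {
	const (
		blocks        = 10_000_000    // chain height
		avgBlockBytes = 100 * 1024    // 100 KiB per block
		nicBytesPerS  = 125_000_000.0 // 1 Gbit/s NIC
		diskBytesPerS = 500_000_000.0 // 500 MB/s sequential writes
	)
	totalBytes := float64(blocks) * avgBlockBytes

	bottleneck := nicBytesPerS
	if diskBytesPerS < bottleneck {
		bottleneck = diskBytesPerS
	}
	seconds := totalBytes / bottleneck
	fmt.Printf("~%.0f GB to transfer, best case ~%.1f hours\n",
		totalBytes/1e9, seconds/3600)
}
```

Under these assumptions the transfer alone takes a couple of hours; any sync time far above
that figure would point at overhead added by Tendermint itself.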

We should additionally add metrics to the blocksync operation to more clearly pinpoint
slow operations. The following metrics should be added to the block syncing operation:

* Time to fetch and validate each block
* Time to execute a block
* Blocks sync'd per unit time

### Application

Applications performing complex state transitions have the potential to bottleneck
the Tendermint node.

#### Claim: ABCI block delivery could cause slowdown

ABCI delivers blocks through several methods: `BeginBlock`, `DeliverTx`, `EndBlock`, `Commit`.

Tendermint delivers transactions one-by-one via the `DeliverTx` call. Most of the
transaction delivery in Tendermint occurs asynchronously and therefore appears unlikely to
form a bottleneck in ABCI.

After delivering all transactions, Tendermint then calls the `Commit` ABCI method.
Tendermint [locks all access to the mempool][abci-commit-description] while `Commit`
proceeds. This means that an application that is slow to execute all of its
transactions or finalize state during the `Commit` method will prevent any new
transactions from being added to the mempool. Apps that are slow to commit will also
prevent consensus from proceeding to the next consensus height, since Tendermint
cannot validate or produce block proposals without the
AppHash obtained from the `Commit` method. We should add a metric for each
step in the ABCI protocol to track the amount of time that a node spends communicating
with the application at each step.
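
A minimal sketch of how such per-step timing could be collected, using a simplified
application interface rather than Tendermint's actual ABCI client types (the `appConn`
interface, `observe` callback, and `slowApp` are assumptions for illustration only):

```go
package main

import (
	"fmt"
	"time"
)

// appConn is a simplified stand-in for an ABCI application connection; the
// real Tendermint client interface differs.
type appConn interface {
	DeliverTx(tx []byte) error
	Commit() (appHash []byte, err error)
}

// timedConn wraps an appConn and reports how long each ABCI step takes.
type timedConn struct {
	inner   appConn
	observe func(step string, d time.Duration) // e.g. a histogram Observe
}

func (t timedConn) DeliverTx(tx []byte) error {
	defer func(start time.Time) { t.observe("deliver_tx", time.Since(start)) }(time.Now())
	return t.inner.DeliverTx(tx)
}

func (t timedConn) Commit() ([]byte, error) {
	defer func(start time.Time) { t.observe("commit", time.Since(start)) }(time.Now())
	return t.inner.Commit()
}

// slowApp simulates an application that is slow to commit.
type slowApp struct{}

func (slowApp) DeliverTx([]byte) error { return nil }
func (slowApp) Commit() ([]byte, error) {
	time.Sleep(50 * time.Millisecond)
	return []byte("apphash"), nil
}

func main() {
	conn := timedConn{
		inner:   slowApp{},
		observe: func(step string, d time.Duration) { fmt.Printf("%s took %v\n", step, d) },
	}
	_ = conn.DeliverTx([]byte("tx"))
	_, _ = conn.Commit()
}
```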

#### Claim: ABCI serialization overhead causes slowdown

The most common way to run a Tendermint application is using the Cosmos SDK.
The Cosmos SDK runs the ABCI application within the same process as Tendermint.
When an application is run in the same process as Tendermint, a serialization penalty
is not paid. This is because the local ABCI client does not serialize method calls
and instead passes the protobuf types through directly. This can be seen
in [local_client.go][abci-local-client-code].

Serialization and deserialization in the gRPC and socket protocol ABCI clients
may cause slowdown. While these may cause issues, they are not part of the primary
use case of Tendermint and do not necessarily need to be addressed at this time.

### RPC

#### Claim: The Query API is slow

The query API locks a mutex across the ABCI connections. This causes consensus to
slow down during queries, as ABCI is no longer able to make progress. This is known
to be causing issues in the Cosmos SDK and is being addressed [in the SDK][sdk-query-fix],
but a more robust solution may be required. Adding metrics to each ABCI client connection
and message, as described in the Application section of this document, would allow us
to further introspect the issue here.
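
The contention pattern can be illustrated with a small sketch; this is not Tendermint's
actual locking code, only a minimal model of a query and a commit sharing one lock around
the application:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sharedAppConn models queries and consensus sharing a single lock around the
// application; an illustration only, not Tendermint's implementation.
type sharedAppConn struct {
	mtx sync.Mutex
}

func (c *sharedAppConn) Query() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	time.Sleep(500 * time.Millisecond) // a slow application-side query
}

func (c *sharedAppConn) Commit() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	// finalize state; fast in this sketch
}

func main() {
	conn := &sharedAppConn{}
	go conn.Query()                   // an RPC query arrives
	time.Sleep(10 * time.Millisecond) // let the query grab the lock

	start := time.Now()
	conn.Commit() // consensus must wait for the query to release the lock
	fmt.Printf("Commit blocked for roughly %v waiting on the query\n", time.Since(start))
}
```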

#### Claim: RPC serialization may cause slowdown

The Tendermint RPC uses a modified version of JSON-RPC. This RPC powers the `broadcast_tx_*` methods,
which are, at the moment, the critical methods for adding transactions to Tendermint. These methods are
likely invoked quite frequently on popular networks. Being able to perform efficiently
on this common and critical operation is very important. The current JSON-RPC implementation
relies heavily on type introspection via reflection, which is known to be very slow in
Go. We should therefore produce benchmarks of these methods to determine how much overhead
we are adding to what is likely to be a very common operation.
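
A sketch of the kind of benchmark that could be written, using a simplified request shape
rather than Tendermint's actual RPC types (the `rpcRequest` struct is an assumption for
illustration); run with `go test -bench=BroadcastTx`:

```go
package rpcbench

import (
	"encoding/json"
	"testing"
)

// rpcRequest is a simplified JSON-RPC request shape standing in for the real
// broadcast_tx request types; the point is only to show how encode/decode
// overhead could be measured.
type rpcRequest struct {
	JSONRPC string            `json:"jsonrpc"`
	ID      int               `json:"id"`
	Method  string            `json:"method"`
	Params  map[string]string `json:"params"`
}

func BenchmarkBroadcastTxEncoding(b *testing.B) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "broadcast_tx_sync",
		Params:  map[string]string{"tx": "0xdeadbeef"},
	}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		bz, err := json.Marshal(req)
		if err != nil {
			b.Fatal(err)
		}
		var decoded rpcRequest
		if err := json.Unmarshal(bz, &decoded); err != nil {
			b.Fatal(err)
		}
	}
}
```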

The other JSON-RPC methods are much less critical to the core functionality of Tendermint.
While there may be other points of performance consideration within the RPC, methods that do not
receive high volumes of requests should not be prioritized.

NOTE: Previous discussion of the RPC framework was done in [ADR 57][adr-57] and
there is ongoing work to inspect and alter the JSON-RPC framework in [RFC 002][rfc-002].
Many of these RPC-related performance considerations can either wait until the RFC 002 work is done or be
considered concurrently with the in-flight changes to the JSON-RPC.

### Protocol

#### Claim: Gossiping messages is a slow process

Currently, for any validator to successfully vote in a consensus _step_, it must
receive votes from greater than 2/3 of the validators on the network. In many cases,
it's preferable to receive as many votes as possible from correct validators.

This produces a quadratic increase in messages communicated as more validators join the network
(each of the N validators must communicate with all of the other N-1 validators).
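
To make the quadratic growth concrete, here is a naive count of direct vote messages per step;
it ignores gossip deduplication and block part messages, so it overstates what an optimized
gossip layer would actually send:

```go
package main

import "fmt"

// Naive count of vote messages per consensus step if every validator sends
// its vote directly to every other validator: N*(N-1) per vote type.
func main() {
	for _, n := range []int{10, 100, 140, 1000} {
		fmt.Printf("N=%4d validators -> %7d vote messages per vote type per step\n", n, n*(n-1))
	}
}
```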

This large number of messages communicated per step has been identified as impacting the
performance of the protocol. Given that the number of messages communicated has been
identified as a bottleneck, it would be extremely valuable to gather data on how long
it takes for popular chains with many validators to gather all votes within a step.

Metrics that would improve visibility into this include:

* Amount of time for a node to gather votes in a step.
* Amount of time for a node to gather all block parts.
* Number of votes each node gossips (i.e., not its own votes, but votes it is
transmitting for a peer).
* Total number of votes each node receives (a node may receive duplicate votes,
so understanding how frequently this occurs will be valuable in evaluating the performance
of the gossip system).

#### Claim: Hashing Txs causes slowdown in Tendermint

Using a faster hash algorithm for Tx hashes is currently a point of discussion
in Tendermint. Namely, it is being considered as part of the [modular hashing proposal][modular-hashing].
It is currently unknown whether hashing transactions in the mempool forms a significant bottleneck.
Although it does not appear to be documented as slow, there are a few open GitHub
issues that indicate a possible user preference for a faster hashing algorithm,
including [issue 2187][issue-2187] and [issue 2186][issue-2186].

It is likely worth investigating what order of magnitude Tx hashing takes in comparison to the other
work involved in adding a Tx to the mempool. It is not currently clear whether the rate of adding Txs
to the mempool is a source of user pain. We should not endeavor to make large changes to
consensus-critical components without first being certain that the change is highly
valuable and impactful.
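
A starting point for that investigation could be a simple benchmark of SHA-256 (the hash
behind Tendermint's `tmhash`) over a transaction-sized payload; the 1 KiB transaction size
below is an assumption for illustration:

```go
package mempoolbench

import (
	"crypto/sha256"
	"testing"
)

// BenchmarkTxHashSHA256 measures hashing a 1 KiB transaction with SHA-256,
// as a baseline before considering a faster hash algorithm.
func BenchmarkTxHashSHA256(b *testing.B) {
	tx := make([]byte, 1024)
	for i := range tx {
		tx[i] = byte(i)
	}
	b.SetBytes(int64(len(tx)))
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = sha256.Sum256(tx)
	}
}
```

Comparing this number against the total time to insert a Tx into the mempool would show
whether hashing is a meaningful fraction of that work.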

### Digital Signatures

#### Claim: Verification of digital signatures may cause slowdown in Tendermint

Working with cryptographic signatures can be computationally expensive. The Cosmos
Hub uses [ed25519 signatures][hub-signature]. The library Tendermint uses to verify
vote signatures is [benchmarked][ed25519-bench] to perform an `ed25519`
signature verification in roughly 75μs on a decently fast CPU. A validator in the Cosmos Hub performs
3 sets of verifications on the signatures of the 140 validators in the Hub
in a consensus round: during block verification, when verifying the prevotes, and
when verifying the precommits. With no batching, this works out to roughly `30ms` per
round. It is quite unlikely, therefore, that this accounts for any serious amount
of the ~7 seconds of block time per height in the Hub.
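
The arithmetic behind that estimate, using the figures quoted above (a back-of-envelope
check only; real validators batch-verify and will see different per-signature costs):

```go
package main

import (
	"fmt"
	"time"
)

// Per-round verification cost from the figures in the text: 140 validator
// signatures, verified 3 times per round, at ~75µs per unbatched verification.
func main() {
	const (
		validators       = 140
		verificationSets = 3
		perVerification  = 75 * time.Microsecond
	)
	total := validators * verificationSets * perVerification
	fmt.Printf("~%v per round, small next to ~7s block times\n", total)
}
```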

This may cause slowdown when syncing, since the process needs to constantly verify
signatures. It's possible that improved signature aggregation will lead to improved
light client or other syncing performance. In general, a metric should be added
to track block rate while blocksyncing.

#### Claim: Our use of digital signatures in the consensus protocol contributes to performance issues

Currently, Tendermint's digital signature verification requires that all validators
receive all vote messages. Each validator must receive the complete digital signature
along with the vote message that it corresponds to. This means that all N validators
must receive messages from at least 2/3 of the N validators in each consensus
round. Given the potential for oddly shaped network topologies and the expected
variable network round-trip times of a few hundred milliseconds in a blockchain network,
it is highly likely that this amount of gossiping is leading to a significant amount
of the slowdown in the Cosmos Hub and in Tendermint consensus.

### Tendermint Event System

#### Claim: The event system is a bottleneck in Tendermint

The Tendermint event system is used to communicate and store information about
internal Tendermint execution. The system uses channels internally to send messages
to different subscribers. Sending an event [blocks on the internal channel][event-send].
The default configuration is to [use an unbuffered channel for event publishes][event-buffer-capacity].
Several consumers of the event system also use an unbuffered channel for reads.
An example of this is the [event indexer][event-indexer-unbuffered], which takes an
unbuffered subscription to the event system. The result is that these unbuffered readers
can cause writes to the event system to block or slow down, depending on contention in the
event system. This has implications for the consensus system, which [publishes events][consensus-event-send].
To better understand the performance of the event system, we should add metrics to track the timing of
event sends. The following metrics would be a good start for tracking this performance:

* Time in event send, labeled by event type
* Time in event receive, labeled by subscriber
* Event throughput, measured in events per unit time
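
As a minimal illustration of the blocking behavior described above (a toy sketch using a
plain Go channel, not Tendermint's pubsub package), a slow unbuffered subscriber
back-pressures the publisher:

```go
package main

import (
	"fmt"
	"time"
)

// A slow subscriber reading from an unbuffered channel forces each publish to
// wait until the subscriber is ready, stalling the publisher.
func main() {
	events := make(chan string) // unbuffered, like the default subscription

	// A slow subscriber, e.g. an indexer persisting each event.
	go func() {
		for range events {
			time.Sleep(100 * time.Millisecond)
		}
	}()

	for i := 0; i < 3; i++ {
		start := time.Now()
		events <- fmt.Sprintf("NewBlock %d", i) // publisher blocks here
		fmt.Printf("publish %d waited %v\n", i, time.Since(start))
	}
	close(events)
}
```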

### References

[modular-hashing]: https://github.com/tendermint/tendermint/pull/6773
[issue-2186]: https://github.com/tendermint/tendermint/issues/2186
[issue-2187]: https://github.com/tendermint/tendermint/issues/2187
[rfc-002]: https://github.com/tendermint/tendermint/pull/6913
[adr-57]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-057-RPC.md
[issue-1319]: https://github.com/tendermint/tendermint/issues/1319
[abci-commit-description]: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#commit
[abci-local-client-code]: https://github.com/tendermint/tendermint/blob/511bd3eb7f037855a793a27ff4c53c12f085b570/abci/client/local_client.go#L84
[hub-signature]: https://github.com/cosmos/gaia/blob/0ecb6ed8a244d835807f1ced49217d54a9ca2070/docs/resources/genesis.md#consensus-parameters
[ed25519-bench]: https://github.com/oasisprotocol/curve25519-voi/blob/d2e7fc59fe38c18ca990c84c4186cba2cc45b1f9/PERFORMANCE.md
[event-send]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/libs/pubsub/pubsub.go#L338
[event-buffer-capacity]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/types/event_bus.go#L14
[event-indexer-unbuffered]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/state/indexer/indexer_service.go#L39
[consensus-event-send]: https://github.com/tendermint/tendermint/blob/5bd3b286a2b715737f6d6c33051b69061d38f8ef/internal/consensus/state.go#L1573
[sdk-query-fix]: https://github.com/cosmos/cosmos-sdk/pull/10045

@@ -137,7 +137,7 @@ type State struct {
 	done chan struct{}

 	// synchronous pubsub between consensus state and reactor.
-	// state only emits EventNewRoundStep and EventVote
+	// state only emits EventNewRoundStep, EventValidBlock, and EventVote
 	evsw tmevents.EventSwitch

 	// for reporting metrics