First draft of ADR-082
Signed-off-by: Thane Thomson <connect@thanethomson.com>
docs/architecture/adr-082-data-companion-api.md
@@ -0,0 +1,356 @@
# ADR 082: Data Companion API

## Changelog

- 2022-09-10: First draft (@thanethomson)

## Status

Accepted | Rejected | Deprecated | Superseded by

## Context

This ADR proposes to effectively offload some data storage responsibilities and
functionality to a **single external companion service** in order to:

1. Improve core consensus stability.
2. Eventually reduce the amount of functionality for which the core team is
   responsible, thereby improving maintainability.
3. Still cater for certain use cases that require this data and functionality.

The way the system is currently built is such that a Tendermint node is mostly
self-contained. While this philosophy initially allowed a certain degree of ease
of node operation (i.e. simple UX), it has also lent itself to feature sprawl,
with Tendermint being asked to take care of increasingly more than just
consensus. Under the spotlight in this ADR are:

1. **Event indexing**, which is required in order to facilitate arbitrary block
   and transaction querying, as well as subscription for arbitrary events. We
   have already seen, for instance, how the current (somewhat unreliable) event
   subscription implementation can cause back-pressure into consensus and affect
   IBC relayer stability (see [\#6729] and [\#7156]).
2. **Block execution result storage**, which is required to facilitate a number
   of RPC APIs, but also results in storage of large quantities of data that is
   not critical to consensus. This data is, however, critical for certain use
   cases outside of consensus.

Another intersecting issue is that there may be a variety of use cases that
require different data models. A good example of this is the mere existence of
the [PostgreSQL indexer] as compared to the default key/value indexer.

Continuing from and expanding on the ideas outlined in [RFC 006][rfc-006], the
suggested approach in this ADR is to provide a mechanism to publish certain data
from Tendermint, in real-time and with certain reliability guarantees, to a
single companion service outside of the node that can use this data in whatever
way it chooses (filter and republish it, store it, manipulate or enrich it,
etc.).

Specifically, this mechanism would initially publish:

1. The results of block execution (e.g. data from `FinalizeBlockResponse`). This
   data is not accessible from the P2P layer, and currently provides valuable
   information for Tendermint users (whether events should be handled at all by
   Tendermint is a different problem).
2. All block data (i.e. `Block` data structures that have been committed by the
   consensus engine).

## Alternative Approaches

One clear alternative to this would be the approach outlined in [ADR
075][adr-075]. This approach still unfortunately leaves Tendermint responsible
for maintaining a query interface and event indexing functionality, increasing
the long-term maintenance burden of, and the possibility of feature sprawl in,
that subsystem.

## Decision

> TODO(thane)

## Detailed Design

### Use Cases

1. Transaction facilitators could provide standalone RPC services that would be
   capable of receiving transactions and providing confirmations to submitters.
   These RPC services could be architected to scale independently of the node
   facilitating consensus.
2. Block explorers could make use of this data to provide real-time information
   about the blockchain.
3. IBC relayer nodes could use this data to filter and store only the data they
   need to facilitate relaying without putting additional pressure on the
   consensus engine (until such time that a decision is made on whether to
   continue providing event data from Tendermint).

### Requirements

1. All of the following data must be published to a single external consumer:
   1. `FinalizeBlockResponse` data, which includes at the very least:
      1. Events
      2. Transaction results
   2. Committed block data

2. It must be opt-in. In other words, it is off by default and turned on by way
   of a configuration parameter. When off, it must have negligible (ideally no)
   impact on system performance.

3. It must not cause back-pressure into consensus.

4. It must not cause unbounded memory growth.

5. Data must be published reliably. If data cannot be published, the node must
   rather crash in such a way that bringing the node up again will cause it to
   attempt to republish the unpublished data.

6. The interface must support extension by allowing more kinds of data to be
   exposed via this API (bearing in mind that every extension has the potential
   to affect system stability in the case of an unreliable data companion
   service).

### Entity Relationships

The following simple diagram shows the proposed relationships between
Tendermint, a socket-based ABCI application, and the proposed data companion
service to contextualize this ADR.

```
+----------+      +------------+      +----------------+
| ABCI App | <--- | Tendermint | ---> | Data Companion |
+----------+      +------------+      +----------------+
```

As can be seen in this diagram, Tendermint connects out to both the ABCI
application and data companion service based on the Tendermint node's
configuration. The fact that Tendermint connects out to the companion service
instead of the other way around provides a natural constraint on the number of
consumers of the API.

### gRPC API

The following gRPC API is proposed:

```protobuf
// DataCompanionService allows Tendermint to publish certain data generated by
// the consensus engine to a single external consumer with specific reliability
// guarantees.
//
// Note that implementers of this service must take into account the possibility
// that Tendermint may re-send data that was previously sent. Therefore
// the service should simply ignore previously seen data instead of responding
// with errors to ensure correct functioning of the node.
service DataCompanionService {
    // CommitBlock is called after a block has been committed. This method is
    // also called on Tendermint startup to ensure that the service received the
    // last committed block, in case there was previously a transport failure.
    //
    // If an error is returned, Tendermint will crash.
    rpc CommitBlock (CommitBlockRequest) returns (CommitBlockResponse) {}
}

// CommitBlockRequest contains information about a block that has been committed
// by the consensus engine.
message CommitBlockRequest {
    // The block committed by the consensus engine.
    Block block = 1;

    // The results from execution of this block.
    ExecBlockResults block_results = 2;
}

// ExecBlockResults contains additional information about the execution of a
// specific block that will not necessarily be included in the block itself.
// This data is obtained from the FinalizeBlock ABCI response.
message ExecBlockResults {
    repeated Event events = 1;
    repeated ExecTxResult tx_results = 2;
}

// CommitBlockResponse is either empty or returns an error. Note that returning
// an error here will cause Tendermint to crash.
message CommitBlockResponse {
    oneof error {
        UnexpectedBlockError unexpected_block_err = 1;
    }
}

// UnexpectedBlockError is an error returned by the server when Tendermint sent
// it a block that is ahead of the block expected by the server.
message UnexpectedBlockError {
    // The height of the block expected by the server.
    int64 expected_height = 1;
}

//-----------------------------------------------------------------------------
// The following types are defined as part of ABCI++, and provided here for
// reference.
//-----------------------------------------------------------------------------

// Event allows application developers to attach additional information to
// ResponseFinalizeBlock, ResponseDeliverTx, ExecTxResult
// Later, transactions may be queried using these events.
message Event {
    string type = 1;
    repeated EventAttribute attributes = 2 [(gogoproto.nullable) = false, (gogoproto.jsontag) = "attributes,omitempty"];
}

// EventAttribute is a single key-value pair, associated with an event.
message EventAttribute {
    string key = 1;
    string value = 2;
    bool index = 3; // nondeterministic
}

// ExecTxResult contains results of executing one individual transaction.
//
// * Its structure is equivalent to #ResponseDeliverTx which will be deprecated/deleted
message ExecTxResult {
    uint32 code = 1;
    bytes data = 2;
    string log = 3; // nondeterministic
    string info = 4; // nondeterministic
    int64 gas_wanted = 5;
    int64 gas_used = 6;
    repeated Event events = 7
        [(gogoproto.nullable) = false, (gogoproto.jsontag) = "events,omitempty"]; // nondeterministic
    string codespace = 8;
}
```
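
The service comments above ask implementers to ignore re-sent data and to
report the expected height when Tendermint is ahead. A minimal sketch of that
bookkeeping follows, written in Python purely for illustration and modelling
only the height logic, not the gRPC transport. The `CompanionState` class, its
return convention and the starting height of 1 are assumptions made for this
example and are not part of the proposed API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CompanionState:
    """Hypothetical in-memory model of a data companion's block bookkeeping."""

    expected_height: int = 1  # assumed starting height for this sketch
    stored: Dict[int, bytes] = field(default_factory=dict)

    def commit_block(self, height: int, payload: bytes) -> Optional[int]:
        """Handle one CommitBlock call.

        Returns None to model an empty (successful) response, or the expected
        height to model an UnexpectedBlockError.
        """
        if height < self.expected_height:
            # Previously seen data: ignore it rather than respond with an error.
            return None
        if height > self.expected_height:
            # Tendermint is ahead of this service: report the expected height.
            return self.expected_height
        self.stored[height] = payload
        self.expected_height += 1
        return None


state = CompanionState()
print(state.commit_block(1, b"block-1"))  # first delivery of height 1
print(state.commit_block(1, b"block-1"))  # re-send of height 1
print(state.commit_block(3, b"block-3"))  # height 3 while height 2 is expected
```

Treating a re-send as a success keeps the call idempotent, which matters
because an error response would crash the node.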

### Request Buffering

In order to ensure reliable delivery, while also catering for intermittent
faults, we should allow for configurable buffering of a reasonable, yet small,
number of requests to the companion service. If this buffer fills up, we should
panic.

This data would need to be stored on disk to ensure that Tendermint could
attempt to resubmit all unsubmitted data upon Tendermint startup. At present,
this data is already stored on disk, and so practically we would need to
implement some form of background pruning mechanism to remove the data we know
has been shared with the companion service.
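
The buffering behaviour described above can be sketched as a bounded queue that
refuses new entries when full and prunes entries once they are acknowledged.
This is an illustrative Python model under stated assumptions: `RequestBuffer`
and `BufferFull` are hypothetical names, the exception stands in for the node
panicking, and an in-memory deque stands in for the on-disk storage a real
implementation would need.

```python
from collections import deque


class BufferFull(RuntimeError):
    """Stand-in for the panic the node would raise when the buffer is full."""


class RequestBuffer:
    """Hypothetical bounded buffer of block heights awaiting delivery.

    A real implementation would persist entries to disk; a deque is used
    here only to show the bound and the pruning behaviour.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pending = deque()

    def enqueue(self, height: int) -> None:
        """Queue a height for delivery, failing loudly if the buffer is full."""
        if len(self._pending) >= self.capacity:
            raise BufferFull(f"buffer full at height {height}")
        self._pending.append(height)

    def ack(self, height: int) -> None:
        """Prune every entry up to and including an acknowledged height."""
        while self._pending and self._pending[0] <= height:
            self._pending.popleft()

    def pending(self) -> list:
        """Heights to resubmit, oldest first, for example on node startup."""
        return list(self._pending)


buf = RequestBuffer(capacity=2)
buf.enqueue(10)
buf.enqueue(11)
buf.ack(10)      # height 10 was delivered, so it can be pruned
buf.enqueue(12)
print(buf.pending())
```

Because the capacity is fixed, memory use stays bounded, and a companion that
falls too far behind surfaces as an explicit failure rather than silent growth.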

### Interaction with Block Sync

During block sync, request buffering should be disabled and requests to the
companion service should be blocking.
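
One way to read this rule together with the `buffer_size` parameter proposed in
this ADR's configuration is as a single mode decision per request. The sketch
below is a hypothetical Python helper, not part of the proposal; the function
name `send_mode` and the string return values are assumptions chosen for
illustration.

```python
def send_mode(buffer_size: int, block_syncing: bool) -> str:
    """Return "blocking" or "buffered" for the next companion request."""
    if buffer_size < 0:
        raise ValueError("buffer_size must not be negative")
    # Block sync always forces blocking sends; so does a buffer size of 0.
    if block_syncing or buffer_size == 0:
        return "blocking"
    return "buffered"


print(send_mode(buffer_size=10, block_syncing=True))
print(send_mode(buffer_size=10, block_syncing=False))
```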

### Configuration

The following configuration file update is proposed to support the data
companion API.

```toml
[data_companion]
# By default, the data companion service interaction is disabled. It is
# recommended that this only be enabled on full nodes and not validators so as
# to minimize the possibility of network instability.
enabled = false

# Address at which the gRPC companion service server is hosted. It is
# recommended that this companion service be co-located at least within the same
# data center as the Tendermint node to reduce the risk of network latencies
# interfering in node operation.
addr = "http://localhost:26659"

# The number of requests to the companion service to durably buffer. Set to 0 to
# enable blocking send mode, which will block on every request to the companion
# service. A sensible non-zero value would be 10 (if the companion service is
# unavailable to receive 10 blocks' data, something is probably wrong and
# requires intervention).
buffer_size = 0

# Use the experimental gzip compression call option when submitting data to the
# server. See https://pkg.go.dev/google.golang.org/grpc#UseCompressor
experimental_use_gzip = false
```

It is unclear at present whether compressing the data being sent to the
companion service will result in meaningful benefits.
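
A rough way to start answering that question is to measure the gzip ratio on
representative payloads. The following Python sketch does so for a synthetic
payload. It is only an assumption-laden illustration: real requests would be
protobuf-encoded block data rather than JSON, the event contents are invented,
and the helper name `compression_ratio` is hypothetical, so actual ratios may
differ substantially.

```python
import gzip
import json


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size for a payload."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(gzip.compress(payload)) / len(payload)


# Synthetic stand-in for serialized block results: many similar events.
events = [
    {"type": "transfer", "attributes": [{"key": "amount", "value": str(i)}]}
    for i in range(1000)
]
payload = json.dumps(events).encode("utf-8")
print(f"gzip ratio: {compression_ratio(payload):.2f}")
```

A ratio well below 1 on real traffic would suggest a bandwidth benefit, which
would still need to be weighed against the CPU time spent compressing.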

### Monitoring

To monitor the health of the interaction between Tendermint and the companion
service, the following additional Prometheus metrics are proposed:

- `data_companion_send_time` - A gauge that indicates the maximum time, in
  milliseconds, taken to send a single request to the companion service,
  computed over a rolling window of tracked send times (e.g. the maximum send
  time over the past minute).
- `data_companion_buffer_utilization` - A gauge indicating how much of the
  request buffer has been used.
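
The rolling-window maximum behind `data_companion_send_time` could be tracked
along the following lines. This is a minimal Python sketch, assuming a
time-based window and explicit timestamps so that the logic is easy to test;
the `RollingMax` class is hypothetical and no Prometheus client library is
involved.

```python
from collections import deque


class RollingMax:
    """Tracks the maximum of samples observed within a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, send_time_ms), oldest first

    def observe(self, now: float, send_time_ms: float) -> None:
        """Record one send time and drop samples that left the window."""
        self._samples.append((now, send_time_ms))
        self._evict(now)

    def value(self, now: float) -> float:
        """Gauge value: the largest send time still inside the window."""
        self._evict(now)
        return max((ms for _, ms in self._samples), default=0.0)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()


gauge = RollingMax(window_seconds=60)
gauge.observe(now=0, send_time_ms=120)
gauge.observe(now=30, send_time_ms=40)
print(gauge.value(now=30))  # both samples are inside the window
print(gauge.value(now=61))  # the sample taken at t=0 has expired
```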

## Implications

1. We will be able to mark the following RPC APIs as deprecated and schedule
   them for removal in a future release:
   1. The [WebSocket subscription API][websocket-api]
   2. [`/tx_search`]
   3. [`/block_search`]
   4. [`/broadcast_tx_commit`]
   5. [`/block_results`]

2. We will be able to remove all event indexing from Tendermint once we remove
   the above APIs.

3. Depending on the implementation approach chosen, we will still need to store
   some quantity of data not critical to consensus. This data can automatically
   be pruned once Tendermint has successfully transmitted it to the companion
   service.

### Release Strategy

As captured in the [requirements](#requirements), we should be able to release
this as an additive, opt-in change, thereby not impacting existing APIs. This
way we can evolve this API until such time that consumers are satisfied with
removal of the old APIs mentioned in [implications](#implications).

This also means we could potentially release this as a non-breaking change (i.e.
in a patch release).

## Consequences

### Positive

- Paves the way toward greater architectural simplification of Tendermint so it
  can focus on its core duty, consensus, while still facilitating existing use
  cases.
- Can be rolled out as experimental and opt-in in a non-breaking way.
- The broad nature of what the API publishes lends itself to reasonable
  long-term stability.

### Negative

- It is unclear at present as to the impact of the requirement to publish large
  quantities of block/result data on the speed of block execution. This should
  be quantified in production networks as soon as this feature can be rolled out
  as experimental. If the impact is meaningful, we should either remove the
  feature or develop mitigation strategies (e.g. allowing for queries to be
  specified via the configuration file, or supporting a specific subset of use
  cases' data).
- Requires a reasonable amount of coordination work across a number of
  stakeholders across the ecosystem in order to ensure their use cases are
  addressed effectively and people have enough opportunity to migrate.

### Neutral

## References

- [\#6729]: Tendermint emits events over WebSocket faster than any clients can pull them if tx includes many events
- [\#7156]: Tracking: PubSub performance and UX improvements
- [RFC 003: Taxonomy of potential performance issues in Tendermint][rfc-003]
- [RFC 006: Event Subscription][rfc-006]
- [\#7471]: Deterministic Events
- [ADR 075: RPC Event Subscription Interface][adr-075]

[\#6729]: https://github.com/tendermint/tendermint/issues/6729
[\#7156]: https://github.com/tendermint/tendermint/issues/7156
[PostgreSQL indexer]: https://github.com/tendermint/tendermint/blob/0f45086c5fd79ba47ab0270944258a27ccfc6cc3/state/indexer/sink/psql/psql.go
[\#7471]: https://github.com/tendermint/tendermint/issues/7471
[rfc-003]: ../rfc/rfc-003-performance-questions.md
[rfc-006]: ../rfc/rfc-006-event-subscription.md
[adr-075]: ./adr-075-rpc-subscription.md
[websocket-api]: https://docs.tendermint.com/v0.34/rpc/#/Websocket
[`/tx_search`]: https://docs.tendermint.com/v0.34/rpc/#/Info/tx_search
[`/block_search`]: https://docs.tendermint.com/v0.34/rpc/#/Info/block_search
[`/broadcast_tx_commit`]: https://docs.tendermint.com/v0.34/rpc/#/Tx/broadcast_tx_commit
[`/block_results`]: https://docs.tendermint.com/v0.34/rpc/#/Info/block_results