From 4f401ddcb85ef44598efc1db26fee95e39e33d24 Mon Sep 17 00:00:00 2001 From: "M. J. Fromberger" Date: Wed, 8 Sep 2021 08:40:22 -0700 Subject: [PATCH] Paste and reformat draft from the original gist. --- docs/rfc/rfc-002-ipc-ecosystem.md | 239 ++++++++++++++++++++++++++++++ 1 file changed, 239 insertions(+) create mode 100644 docs/rfc/rfc-002-ipc-ecosystem.md diff --git a/docs/rfc/rfc-002-ipc-ecosystem.md b/docs/rfc/rfc-002-ipc-ecosystem.md new file mode 100644 index 000000000..525143b29 --- /dev/null +++ b/docs/rfc/rfc-002-ipc-ecosystem.md @@ -0,0 +1,239 @@ +# RFC 002: Interprocess Communication (IPC) in Tendermint + +## Changelog + +- 08-Sep-2021: Initial draft (@creachadair). + + +## Abstract + +Communication in Tendermint among consensus nodes, applications, and operator +tools all use different message formats and transport mechanisms. In some +cases there are multiple options. Having all these options complicates both the +code and the developer experience, and hides bugs. To support a more robust, +trustworthy, and usable system, we should document which communication paths +are essential, which could be removed or reduced in scope, and what we can +improve for the most important use cases. + + +## Background + +The Tendermint state replication engine has a complex IPC footprint. + +1. Consensus nodes communicate with each other using a networked peer-to-peer + message-passing protocol. + +2. Consensus nodes communicate with the application whose state is being + replicated via the [Application BlockChain Interface (ABCI)][abci]. + +3. Consensus nodes export a network-accessible [RPC service][rpc-service] to + support operations (bootstrapping, debugging) and synchronization of [light clients][light-client]. + This interface is also used by the [`tendermint` CLI][tm-cli]. + +4. Consensus nodes export a gRPC service exposing a subset of the methods of + the RPC service described by (3). This was intended to simplify the + implementation of tools that already use gRPC to communicate with an + application (via the Cosmos SDK), and wanted to also talk to the consensus + node without implementing yet another RPC protocol. + + The gRPC interface to the consensus node has been deprecated and is slated + for removal in the forthcoming Tendermint v0.36 release. + + +## Discussion + +TODO: What about the verifier/signer? I think this is maybe baked into the +consensus node. Can/does it run separately? If so, how does it communicate +with the consensus node? Does an application ever talk directly to the +verifier/signer? + +### ABCI Transport + +In ABCI, the _application_ is the server, and the Tendermint consensus engine +is the client. Most applications implement the server using the [Cosmos SDK][cosmos-sdk], +which handles low-level details of the ABCI interaction and provides a +higher-level interface to the rest of the application. The SDK is written in Go. + +Beneath the SDK, the application communicates with Tendermint core in one of +two ways: + +- In-process direct calls (for applications written in Go and compiled against + the Tendermint code). This is an optimization for the common case where an + application is written in Go, to save on the overhead of marshaling and + unmarshaling requests and responses within the same process: + [`abci/client/local_client.go`][local-client] + +- A custom remote procedure protocol built on wire-format protobuf messages + using a socket (the "socket protocol"): [`abci/server/socket_server.go`][socket-server] + +### RPC Transport + +The consensus node RPC service allows callers to query consensus parameters +(genesis data, transactions, commits), node status (network info, health +checks), application state (abci_query, abci_info), mempool state, and other +attributes of the node and its application. The service also provides methods +allowing transactions and evidence to be injected ("broadcast") into the +blockchain. + +The RPC service is exposed in several ways: + +- HTTP GET: Queries may be sent as URI parameters, with method names in the path. + +- HTTP POST: Queries may be sent as JSON-RPC request messages in the body of an + HTTP POST request. The server uses a custom implementation of JSON-RPC that + is not fully compatible with the [JSON-RPC 2.0 spec][json-rpc], but handles + the common cases. + +- Websocket: Queries may be sent as JSON-RPC request messages via a websocket. + This transport uses more or less the same JSON-RPC plumbing as the HTTP POST + handler. + + TODO: There appear to be some methods that are _only_ exported via websocket. + Figure out what is going on there and why it's that way. I think these may be + here for the same reason as the gRPC subset, viz., to support external + clients who could now talk directly to the application. + +- gRPC: A subset of queries may be issued in protocol buffer format to the gRPC + interface described above under (4). As noted, this endpoint is deprecated + and will be removed in v0.36. + +### Opportunities for Simplification + +**Claim:** There are too many IPC mechanisms. + +The preponderance of ABCI usage is via the Cosmos SDK, which means the +application and the consensus node are compiled together into a single binary, +and the consensus node calls the ABCI methods of the application directly as Go +functions. + +TODO: What are the remaining use cases for the socket protocol and gRPC? Ethan +mentioned some cases like web-based explorer UIs and possibly other tools. + +As far as I can tell the primary consumers of the multi-headed "RPC service" +are the light client and the `tendermint` command-line client. There is +probably some local use via curl, but I expect that is mostly ad hoc. Ethan +reports that nodes are often configured with the ports to the RPC service +blocked, which is good for security but complicates use by the light client. + +TODO: Figure out what is going on with the weird pseudo-inheritance service +interface. The BaseService type makes sense, but its embeds are more +complicated. + +### Context: ABCI Issues + +In the original design of ABCI, the presumption was that all access to the +application would be mediated by the consensus node. This makes sense, since +outside access could change application state and corrupt the consensus +process. Of course, even without outside access an application could behave +nondeterministically, but allowing other programs to send it requests was seen +as inviting trouble. + +Later, however, it was noted that most of the time, tools specific to a +particular application don't really want to talk to the consensus module +directly. The application is the "owner" of the state machine the consensus +engine is replicating, so tools that want to know things about application +state should talk to the application. Otherwise, such tools would have to bake +in knowledge about Tendermint (e.g., its interfaces and data structures) that +they might otherwise not need or care about. + +However, this raises another issue: The consensus node is the ABCI client, so +it is inconvenient for the application to "push" work into the consensus +module. In theory you could work around this by having the consensus module +"poll" the application for work that needs done, but that has unsatisfactory +implications for performance and robustness, as well as being harder to +understand. + +There has apparently been some discussion about trying to make a more +bidirectional communication between the consensus node and the application, but +this issue seems to still be under discussion. TODO: Impact of ABCI++ for this +question? + +### Context: RPC Issues + +The RPC system serves several masters, and has a complex surface area. I +believe there are some improvements that can be exposed by separating some of +these concerns. + +For light clients, some work is already underway to move toward using P2P +message passing rather than RPC to synchronize light client state with the rest +of the network. Because of the way light client synchronization interacts with +event logging, this is not as simple as just swapping out the transport: More +details of the event system will need to be made visible via P2P, and that in +turn has some implications for fault tolerance (in particular, I think the +ability of a badly-behaved client to DDoS the network without punishment is a +concern). However, once this work is done, it should in principle be possible +to remove the dependency of light clients on the RPC service. + +For bootstrapping and operations, the RPC mechanism still makes sense, but can +be further simplified. Here are some specific proposals: + +- Remove the HTTP GET interface entirely. +- Remove the gRPC interface (this is already planned for v0.36). +- Remove the websocket interface (TODO: are websockets used anymore?) +- Simplify the JSON-RPC plumbing to remove unnecessary reflection and interface + wrapping. + +These changes would preserve the ability of operators to issue queries with +curl (but would require using JSON-RPC instead of URI parameters). That would +be a little less user-friendly, but for a use case that should not be that +prevalent. + +These changes would also preserve compatibility with existing JSON-RPC based +code paths like the `tendermint` CLI and the light client (even ahead of +further work to remove that dependency). + +**Design goal:** An operator should be able to disable non-local access to the +RPC server on any node in the network without impairing the ability of the +network to function for service of state replication, including light clients. + +**Design principle:** Any communication required to operate the network should use P2P. + +### Options for RPC Transport + +[ADR 057][adr-57] proposes using gRPC for the Tendermint RPC implementation. +This is still possible, but if we are able to simplify and decouple the +concerns as described above, I do not think it should be necessary. + +While JSON-RPC is not the best possible RPC protocol for all situations, it has +some advantages over gRPC for our domain. Specifically: + +- It is easy to call JSON-RPC manually from the command-line, which helps with + a common concern for the RPC service, local debugging and operations. + +- gRPC has an enormous dependency footprint for both clients and servers, and + many of the features it provides to support security and performance + (encryption, compression, streaming, etc.) are mostly irrelevant to local + use. + +- If we intend to migrate light clients off RPC to use P2P entirely, there is + no advantage to forcing a temporary migration to gRPC along the way; and once + the light client is not dependent on the RPC service, the efficiency of the + protocol is much less important. + +- We can still get the benefits of generated data types using protocol buffers, even + without using gRPC: + + - Protobuf defines a standard JSON encoding for all message types so + languages with protobuf support do not need to worry about type mapping + oddities. + + - Using JSON means that even languages _without_ good protobuf support can + implement the protocol with a bit more work, and I expect this situation to + be rare. + +Even if a language lacks a good standard JSON-RPC mechanism, the protocol is +lightweight and can be implemented by simple send/receive over TCP or +Unix-domain sockets with no need for code generation, encryption, etc. gRPC +uses a complex HTTP/2 based transport that is not easily replicated. + +## References + +[abci]: https://github.com/tendermint/spec/tree/95cf253b6df623066ff7cd4074a94e7a3f147c7a/spec/abci +[rpc-service]: https://docs.tendermint.com/master/rpc/ +[light-client]: https://docs.tendermint.com/master/tendermint-core/light-client.html +[tm-cli]: https://github.com/tendermint/tendermint/tree/master/cmd/tendermint +[cosmos-sdk]: https://github.com/cosmos/cosmos-sdk/ +[local-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/local_client.go +[socket-server]: https://github.com/tendermint/tendermint/blob/master/abci/server/socket_server.go +[json-rpc]: https://www.jsonrpc.org/specification +[adr-57]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-057-RPC.md