Compare commits


7 Commits

Author SHA1 Message Date
William Banfield
a985adeb7a issues 2022-05-08 22:58:49 -04:00
Sergio Mena
cec0a97987 RFC017: ABCI++ Vote Extension Propagation (#8317)
* 1st version

* Addressed (some of) @williambanfield's comments

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Daniel <daniel.cason@usi.ch>

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Daniel <daniel.cason@usi.ch>

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Daniel <daniel.cason@usi.ch>

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Sam Kleinman <garen@tychoish.com>

* Update docs/rfc/README.md

Co-authored-by: Sam Kleinman <garen@tychoish.com>

* Addressed some comments

* Addressed more comments. Improved description of Solution 3

* Work on 'definitions' section

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

* bottom

* Addressed Josef's valset-change comment. Other minor edits

* Improved wording of 'disjoint valsets' case

* Addressed TODOs: major revamp of various sections. First complete version.

* Fixed minor wording problem

* removed blank line

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Thane Thomson <connect@thanethomson.com>

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Thane Thomson <connect@thanethomson.com>

* Addressed some of Thane's comments

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Thane Thomson <connect@thanethomson.com>

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Thane Thomson <connect@thanethomson.com>

* Addressed outstanding comments

* Addressed @williambanfield's 'catch-up message' comment

* Removed TODO after confirming statesync is only run on nodes starting from scratch

* Removed TODO (after checking with Jasmina)

* Removed addressed TODO

* Addressed Josef's feedback on case (h)

* Typo

* Update docs/rfc/rfc-017-abci++-vote-extension-propag.md

Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>

* Added log line

Co-authored-by: Daniel <daniel.cason@usi.ch>
Co-authored-by: Sam Kleinman <garen@tychoish.com>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Co-authored-by: Thane Thomson <connect@thanethomson.com>
Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>
2022-05-09 02:15:08 +02:00
elias-orijtech
694ab2c6d1 test/fuzz: replace outdated reference to go-fuzz in README (#8477) 2022-05-06 11:02:19 -07:00
M. J. Fromberger
97f2944db0 Reintegrate docs deployment into the main TM repo (#8468)
Per https://github.com/tendermint/docs/issues/20, it is no longer necessary to
build the static documentation out of a separate repository.

This change:

- Adds an actions workflow to build and deploy the docs to GitHub Pages.
- Updates some build settings in a compatible manner.

This change does not affect the existing site deployment. To complete this
change, we will need to update the custom domain pointer and disable the
corresponding workflow in the tendermint/docs repository. Those changes can and
must be done after this is merged.

In the future we should probably also move the build rule out of the Makefile and
into the workflow directly. That will also make it easier to manage caching of
build artifacts. For now, however, I've left it as-is, so that we do not break
the active workflow on tendermint/docs, which depends on it.
2022-05-06 07:35:35 -07:00
M. J. Fromberger
ef44460c41 Convert explicit zero comparison to a method. (#8475)
Fixes #8472.

I didn't see any other obvious cases of us doing this (although we do return zeroes in other places alongside errors, which is fine).
2022-05-06 13:44:09 +00:00
dependabot[bot]
ce40697ea6 build(deps): Bump github.com/vektra/mockery/v2 from 2.12.1 to 2.12.2 (#8474)
Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.12.1 to 2.12.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/vektra/mockery/releases">github.com/vektra/mockery/v2's releases</a>.</em></p>
<blockquote>
<h2>v2.12.2</h2>
<h2>Changelog</h2>
<ul>
<li>ea4c438 Add deprecation notice to logs</li>
<li>735bc0c Add go-get deprecation note</li>
<li>bea853e Add missing mock</li>
<li>989253d Fix *unsafe.Pointer</li>
<li>9228ad4 Merge pull request <a href="https://github-redirect.dependabot.com/vektra/mockery/issues/457">#457</a> from LandonTClipp/readme_deprecation</li>
<li>1d92e73 Merge pull request <a href="https://github-redirect.dependabot.com/vektra/mockery/issues/460">#460</a> from grongor/fix-unsafe-pointer</li>
<li>2fcd83d Merge pull request <a href="https://github-redirect.dependabot.com/vektra/mockery/issues/462">#462</a> from LandonTClipp/deprecation</li>
<li>9f67b8a More explicit deprecation for go-get</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="1d92e7320b"><code>1d92e73</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/vektra/mockery/issues/460">#460</a> from grongor/fix-unsafe-pointer</li>
<li><a href="2fcd83d746"><code>2fcd83d</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/vektra/mockery/issues/462">#462</a> from LandonTClipp/deprecation</li>
<li><a href="ea4c438a53"><code>ea4c438</code></a> Add deprecation notice to logs</li>
<li><a href="989253d1a4"><code>989253d</code></a> Fix *unsafe.Pointer</li>
<li><a href="bea853e93d"><code>bea853e</code></a> Add missing mock</li>
<li><a href="9f67b8afdc"><code>9f67b8a</code></a> More explicit deprecation for go-get</li>
<li><a href="9228ad4b4a"><code>9228ad4</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/vektra/mockery/issues/457">#457</a> from LandonTClipp/readme_deprecation</li>
<li><a href="735bc0c9f8"><code>735bc0c</code></a> Add go-get deprecation note</li>
<li>See full diff in <a href="https://github.com/vektra/mockery/compare/v2.12.1...v2.12.2">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/vektra/mockery/v2&package-manager=go_modules&previous-version=2.12.1&new-version=2.12.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>
2022-05-06 10:25:34 +00:00
William Banfield
e980e1468d RFC-018: initial research of BLS signature aggregation (#8358)
This provides an initial document for understanding the landscape of implementing a BLS signature aggregation scheme into Tendermint.
2022-05-05 12:08:22 -04:00
12 changed files with 1225 additions and 40 deletions

.github/workflows/docs-deployment.yml

@@ -0,0 +1,62 @@
# Build and deploy the docs.tendermint.com website content.
# The static content is published to GitHub Pages.
#
# For documentation build info, see docs/DOCS_README.md.
name: Build static documentation site
on:
workflow_dispatch: # allow manual updates
push:
branches:
- master
paths:
- docs/**
- spec/**
jobs:
# This is split into two jobs so that the build, which runs npm, does not
# have write access to anything. The deploy requires write access to publish
# to the branch used by GitHub Pages, however, so we can't just make the
# whole workflow read-only.
build:
name: VuePress build
runs-on: ubuntu-latest
container:
image: alpine:latest
permissions:
contents: read
steps:
- name: Install generator dependencies
run: |
apk add --no-cache make bash git npm
- uses: actions/checkout@v3
with:
# We need to fetch full history so the backport branches for previous
# versions will be available for the build.
fetch-depth: 0
- name: Build documentation
run: |
git config --global --add safe.directory "$PWD"
make build-docs
- uses: actions/upload-artifact@v3
with:
name: build-output
path: ~/output/
deploy:
name: Deploy to GitHub Pages
runs-on: ubuntu-latest
needs: build
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/download-artifact@v3
with:
name: build-output
path: ~/output
- name: Deploy to GitHub Pages
uses: JamesIves/github-pages-deploy-action@v4
with:
branch: 'docs-tendermint-com'
folder: ~/output
single-commit: true


@@ -1,32 +0,0 @@
name: Documentation
# This workflow builds the static documentation site, and publishes the results to GitHub Pages.
# It runs on every push to the main branch, with changes in the docs and spec directories
on:
workflow_dispatch: # allow manual updates
push:
branches:
- master
paths:
- "docs/**"
- "spec/**"
jobs:
build-and-deploy:
runs-on: ubuntu-latest
container:
image: tendermintdev/docker-website-deployment
steps:
- name: Checkout 🛎️
uses: actions/checkout@v3
- name: Install and Build 🔧
run: |
apk add rsync
make build-docs
- name: Deploy 🚀
uses: JamesIves/github-pages-deploy-action@v4.3.0
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
BRANCH: gh-pages
FOLDER: ~/output
single-commit: true


@@ -226,7 +226,8 @@ DESTINATION = ./index.html.md
build-docs:
@cd docs && \
while read -r branch path_prefix; do \
(git checkout $${branch} && npm ci && VUEPRESS_BASE="/$${path_prefix}/" npm run build) ; \
( git checkout $${branch} && npm ci --quiet && \
VUEPRESS_BASE="/$${path_prefix}/" npm run build --quiet ) ; \
mkdir -p ~/output/$${path_prefix} ; \
cp -r .vuepress/dist/* ~/output/$${path_prefix}/ ; \
cp ~/output/$${path_prefix}/index.html ~/output ; \


@@ -15,7 +15,7 @@
"serve": "trap 'exit 0' SIGINT; vuepress dev --no-cache",
"postserve": "./post.sh",
"prebuild": "./pre.sh",
"build": "trap 'exit 0' SIGINT; vuepress build --no-cache",
"build": "trap 'exit 0' SIGINT; vuepress build --no-cache --silent",
"postbuild": "./post.sh"
},
"author": "",


@@ -53,6 +53,10 @@ sections.
- [RFC-013: ABCI++](./rfc-013-abci++.md)
- [RFC-014: Semantic Versioning](./rfc-014-semantic-versioning.md)
- [RFC-015: ABCI++ Tx Mutation](./rfc-015-abci++-tx-mutation.md)
- [RFC-018: BLS Signature Aggregation Exploration](./rfc-018-bls-agg-exploration.md)
- [RFC-019: Configuration File Versioning](./rfc-019-config-version.md)
- [RFC-017: ABCI++ Vote Extension Propagation](./rfc-017-abci++-vote-extension-propag.md)
<!-- - [RFC-NNN: Title](./rfc-NNN-title.md) -->


@@ -0,0 +1,571 @@
# RFC 017: ABCI++ Vote Extension Propagation
## Changelog
- 11-Apr-2022: Initial draft (@sergio-mena).
- 15-Apr-2022: Addressed initial comments. First complete version (@sergio-mena).
- 09-May-2022: Addressed all outstanding comments.
## Abstract
According to the
[ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
(as of 11-Apr-2022), a validator MUST provide a signed vote extension for each non-`nil` precommit vote
of height *h* that it uses to propose a block in height *h+1*. When a validator is up to
date, this is easy to do, but when a validator needs to catch up this is far from trivial as this data
cannot be retrieved from the blockchain.
This RFC presents and compares the different options to address this problem, which have been proposed
in several discussions by the Tendermint Core team.
## Document Structure
The RFC is structured as follows. In the [Background](#background) section,
subsections [Problem Description](#problem-description) and [Cases to Address](#cases-to-address)
explain the problem at hand from a high level perspective, i.e., abstracting away from the current
Tendermint implementation. In contrast, subsection
[Current Catch-up Mechanisms](#current-catch-up-mechanisms) delves into the details of the current
Tendermint code.
In the [Discussion](#discussion) section, subsection [Solutions Proposed](#solutions-proposed) is also
worded abstracting away from implementation details, whilst subsections
[Feasibility of the Proposed Solutions](#feasibility-of-the-proposed-solutions) and
[Current Limitations and Possible Implementations](#current-limitations-and-possible-implementations)
analyze the viability of one of the proposed solutions in the context of Tendermint's architecture
based on reactors. Finally, [Formalization Work](#formalization-work) briefly discusses the work
still needed to demonstrate the correctness of the chosen solution.
The high level subsections are aimed at readers who are familiar with consensus algorithms, in
particular with the one described in the Tendermint white paper, but who are not necessarily
acquainted with the details of the Tendermint codebase. The other subsections, which go into
implementation details, are best understood by engineers with deep knowledge of the implementation of
Tendermint's blocksync and consensus reactors.
## Background
### Basic Definitions
This document assumes that all validators have equal voting power for the sake of simplicity. This is done
without loss of generality.
There are two types of votes in Tendermint: *prevotes* and *precommits*. Votes can be `nil` or refer to
a proposed block. This RFC focuses on precommits,
also known as *precommit votes*. In this document we sometimes call them simply *votes*.
Validators send precommit votes to their peer nodes in *precommit messages*. According to the
[ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
a precommit message MUST also contain a *vote extension*.
This mandatory vote extension can be empty, but MUST be signed with the same key as the precommit
vote (i.e., the sending validator's).
Nevertheless, the vote extension is signed independently from the vote, so a vote can be separated from
its extension.
The reason for vote extensions to be mandatory in precommit messages is that, otherwise, a (malicious)
node can omit a vote extension while still providing/forwarding/sending the corresponding precommit vote.
The validator set at height *h* is denoted *valset<sub>h</sub>*. A *commit* for height *h* consists of more
than *2n<sub>h</sub>/3* precommit votes voting for a block *b*, where *n<sub>h</sub>* denotes the size of
*valset<sub>h</sub>*. A commit does not contain `nil` precommit votes, and all votes in it refer to the
same block. An *extended commit* is a *commit* where every precommit vote has its respective vote extension
attached.
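
As an illustration, the distinction between a commit and an extended commit can be sketched with the following Go types. These are hypothetical definitions for exposition only, not Tendermint's actual data structures; in particular, note that the extension signature is separate from the vote signature, matching the separability discussed above.

```go
package main

import "fmt"

// Vote is a simplified precommit vote (illustrative only).
type Vote struct {
	ValidatorAddr string
	Height        int64
	BlockID       string // "" represents a nil vote
	Signature     []byte
}

// Commit holds more than 2n/3 non-nil precommit votes for the same block.
type Commit struct {
	Height  int64
	BlockID string
	Votes   []Vote
}

// ExtendedVote attaches a vote's extension and the extension's own
// signature, which is produced independently of the vote signature.
type ExtendedVote struct {
	Vote               Vote
	Extension          []byte
	ExtensionSignature []byte
}

// ExtendedCommit is a commit where every vote carries its extension.
type ExtendedCommit struct {
	Height  int64
	BlockID string
	Votes   []ExtendedVote
}

func main() {
	v := Vote{ValidatorAddr: "val1", Height: 10, BlockID: "b"}
	ec := ExtendedCommit{
		Height:  10,
		BlockID: "b",
		Votes: []ExtendedVote{{
			Vote:               v,
			Extension:          []byte("ext"),
			ExtensionSignature: []byte("extsig"),
		}},
	}
	fmt.Println(len(ec.Votes))
}
```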
### Problem Description
In the version of [ABCI](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md) present up to
Tendermint v0.35, for any height *h*, a validator *v* MUST have the decided block *b* and a commit for
height *h* in order to decide at height *h*. Then, *v* just needs a commit for height *h* to propose at
height *h+1*, in the rounds of *h+1* where *v* is a proposer.
In [ABCI++](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md),
the information that a validator *v* MUST have to be able to decide in *h* does not change with
respect to pre-existing ABCI: the decided block *b* and a commit for *h*.
In contrast, for proposing in *h+1*, a commit for *h* is not enough: *v* MUST now have an extended
commit.
When a validator takes an active part in consensus at height *h*, it has all the data it needs in memory,
in its consensus state, to decide on *h* and propose in *h+1*. Things are not so easy in the cases when
*v* cannot take part in consensus because it is late (e.g., it falls behind, it crashes
and recovers, or it just starts after the others). If *v* does not take part, it cannot actively
gather precommit messages (which include vote extensions) in order to decide.
Before ABCI++, this was not a problem: full nodes are supposed to persist past blocks in the block store,
so other nodes would realise that *v* is late and send it the missing decided block at height *h* and
the corresponding commit (kept in block *h+1*) so that *v* can catch up.
However, we cannot apply this catch-up technique for ABCI++, as the vote extensions, which are part
of the needed *extended commit* are not part of the blockchain.
### Cases to Address
Before we tackle the description of the possible cases we need to address, let us describe the following
incremental improvement to the ABCI++ logic. Upon decision, a full node persists (e.g., in the block
store) the extended commit that allowed the node to decide. For the moment, let us assume the node only
needs to keep its *most recent* extended commit, and MAY remove any older extended commits from persistent
storage.
This improvement is so obvious that all solutions described in the [Discussion](#discussion) section use
it as a building block. Moreover, it completely addresses by itself some of the cases described in this
subsection.
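
The improvement can be sketched as follows. This is a minimal illustration, assuming a hypothetical store type rather than Tendermint's actual block store API; overwriting the stored entry implements the "keep only the most recent extended commit" policy.

```go
package main

import "fmt"

// extendedCommitStore keeps only the most recent extended commit,
// per the incremental improvement described above (illustrative sketch).
type extendedCommitStore struct {
	height int64
	commit []byte // serialized extended commit
}

// saveOnDecision is called when the node decides at a height.
// Overwriting implicitly prunes all older extended commits: only the
// latest one is needed to propose at the next height.
func (s *extendedCommitStore) saveOnDecision(height int64, ec []byte) {
	if height > s.height {
		s.height = height
		s.commit = ec
	}
}

func main() {
	var s extendedCommitStore
	s.saveOnDecision(5, []byte("ec5"))
	s.saveOnDecision(6, []byte("ec6"))
	fmt.Println(s.height, string(s.commit))
}
```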
We now describe the cases (i.e. possible *runs* of the system) that have been raised in different
discussions and need to be addressed. They are (roughly) ordered from easiest to hardest to deal with.
- **(a)** *Happy path: all validators advance together, no crash*.
This case is included for completeness. All validators have taken part in height *h*.
Even if some of them did not manage to send a precommit message for the decided block, they all
receive enough precommit messages to be able to decide. As vote extensions are mandatory in
precommit messages, every validator *v* trivially has all the information, namely the decided block
and the extended commit, needed to propose in height *h+1* for the rounds in which *v* is the
proposer.
No problem to solve here.
- **(b)** *All validators advance together, then all crash at the same height*.
This case has been raised in some discussions, the main concern being whether the vote extensions
for the previous height would be lost across the network. With the improvement described above,
namely persisting the latest extended commit at decision time, this case is solved.
When a crashed validator recovers, it recovers the last extended commit from persistent storage
and handshakes with the Application.
If need be, it also reconstructs messages for the unfinished height
(including all precommits received) from the WAL.
Then, the validator can resume where it was at the time of the crash. Thus, as extensions are
persisted, either in the WAL (in the form of received precommit messages), or in the latest
extended commit, the only way that vote extensions needed to start the next height could be lost
forever would be if all validators crashed and never recovered (e.g. disk corruption).
Since a *correct* node MUST eventually recover, this violates Tendermint's assumption of more than
*2n<sub>h</sub>/3* correct validators for every height *h*.
No problem to solve here.
- **(c)** *Lagging majority*.
Let us assume the validator set does not change between *h* and *h+1*.
A lagging majority is impossible by the nature of the Tendermint algorithm, which requires more
than *2n<sub>h</sub>/3* precommit votes for some round of height *h* in order to make progress.
So, only up to *n<sub>h</sub>/3* validators can lag behind.
On the other hand, for the case where there are changes to the validator set between *h* and
*h+1* please see case (d) below, where the extreme case is discussed.
- **(d)** *Validator set changes completely between* h *and* h+1.
If sets *valset<sub>h</sub>* and *valset<sub>h+1</sub>* are disjoint,
more than *2n<sub>h</sub>/3* of validators in height *h* should
have actively participated in consensus in *h*. So, as of height *h*, only a minority of validators
in *h* can be lagging behind, although they could all lag behind from *h+1* on, as they are no
longer validators, only full nodes. This situation falls under the assumptions of case (h) below.
As for validators in *valset<sub>h+1</sub>*, as they were not validators as of height *h*, they
could all be lagging behind by that time. However, by the time *h* finishes and *h+1* begins, the
chain will halt until more than *2n<sub>h+1</sub>/3* of them have caught up and started consensus
at height *h+1*. If set *valset<sub>h+1</sub>* does not change in *h+2* and subsequent
heights, only up to *n<sub>h+1</sub>/3* validators will be able to lag behind. Thus, we have
converted this case into case (h) below.
- **(e)** *Enough validators crash to block the rest*.
In this case, blockchain progress halts, i.e. surviving full nodes keep increasing rounds
indefinitely, until some of the crashed validators are able to recover.
Those validators that recover first will handshake with the Application and recover at the height
they crashed, which is still the same the nodes that did not crash are stuck in, so they don't need
to catch up.
Further, they had persisted the extended commit for the previous height. Nothing to solve.
For those validators recovering later, we are in case (h) below.
- **(f)** *Some validators crash, but not enough to block progress*.
When the correct processes that crashed recover, they handshake with the Application and resume at
the height they were at when they crashed. As the blockchain did not stop making progress, the
recovered processes are likely to have fallen behind with respect to the progressing majority.
At this point, the recovered processes are in case (h) below.
- **(g)** *A new full node starts*.
The reasoning here also applies to the case when more than one full node are starting.
When the full node starts from scratch, it has no state (its current height is 0). Ignoring
statesync for the time being, the node just needs to catch up by applying past blocks one by one
(after verifying them).
Thus, the node is in case (h) below.
- **(h)** *Advancing majority, lagging minority*.
In this case, some nodes are late. More precisely, at the present time, a set of full nodes,
denoted *L<sub>h<sub>p</sub></sub>*, are falling behind
(e.g., temporary disconnection or network partition, memory thrashing, crashes, new nodes)
an arbitrary
number of heights:
between *h<sub>s</sub>* and *h<sub>p</sub>*, where *h<sub>s</sub> < h<sub>p</sub>*, and
*h<sub>p</sub>* is the highest height
any correct full node has reached so far.
The correct full nodes that reached *h<sub>p</sub>* were able to decide for *h<sub>p</sub>-1*.
Therefore, less than *n<sub>h<sub>p</sub>-1</sub>/3* validators of *h<sub>p</sub>-1* can be part
of *L<sub>h<sub>p</sub></sub>*, since enough up-to-date validators needed to actively participate
in consensus for *h<sub>p</sub>-1*.
Since, at the present time,
no node in *L<sub>h<sub>p</sub></sub>* took part in any consensus between
*h<sub>s</sub>* and *h<sub>p</sub>-1*,
the reasoning above can be extended to validator set changes between *h<sub>s</sub>* and
*h<sub>p</sub>-1*. This results in the following restriction on the full nodes that can be part of *L<sub>h<sub>p</sub></sub>*.
- &forall; *h*, where *h<sub>s</sub> ≤ h < h<sub>p</sub>*,
| *valset<sub>h</sub>* &cap; *L<sub>h<sub>p</sub></sub>* | *< n<sub>h</sub>/3*
If this property does not hold for a particular height *h*, where
*h<sub>s</sub> ≤ h < h<sub>p</sub>*, Tendermint could not have progressed beyond *h* and
therefore no full node could have reached *h<sub>p</sub>* (a contradiction).
These lagging nodes in *L<sub>h<sub>p</sub></sub>* need to catch up. They have to obtain the
information needed to make
progress from other nodes. For each height *h* between *h<sub>s</sub>* and *h<sub>p</sub>-2*,
this includes the decided block for *h*, and the
precommit votes for deciding *h* (which can be extracted from the block at height *h+1*).
At a given height *h<sub>c</sub>* (where possibly *h<sub>c</sub> << h<sub>p</sub>*),
a full node in *L<sub>h<sub>p</sub></sub>* will consider itself *caught up*, based on the
(maybe out of date) information it is getting from its peers. Then, the node needs to be ready to
propose at height *h<sub>c</sub>+1*, which requires having received the vote extensions for
*h<sub>c</sub>*.
As the vote extensions are *not* stored in the blocks, and it is difficult to have strong
guarantees on *when* a late node considers itself caught up, providing the late node with the right
vote extensions for the right height poses a problem.
At this point, we have described and compared all cases raised in discussions leading up to this
RFC. The list above aims at being exhaustive. The analysis of each case included above makes all of
them converge into case (h).
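
The set-intersection restriction derived in case (h) can be expressed as a small check. The sketch below uses illustrative names (string validator addresses, a map per height) and simply verifies that *|valset<sub>h</sub>* &cap; *L| < n<sub>h</sub>/3* for every height in the range:

```go
package main

import "fmt"

// laggingSetAdmissible checks the restriction derived in case (h):
// for every height h in [hs, hp), the lagging set L must intersect
// valset_h in fewer than n_h/3 validators. Names are illustrative.
func laggingSetAdmissible(valsets map[int64][]string, lagging map[string]bool, hs, hp int64) bool {
	for h := hs; h < hp; h++ {
		valset := valsets[h]
		n := len(valset)
		inter := 0
		for _, v := range valset {
			if lagging[v] {
				inter++
			}
		}
		// |valset_h ∩ L| < n_h/3  is equivalent to  3*inter < n
		if 3*inter >= n {
			return false
		}
	}
	return true
}

func main() {
	valsets := map[int64][]string{
		7: {"a", "b", "c", "d"},
		8: {"a", "b", "c", "d"},
	}
	// One lagging validator out of four: admissible.
	fmt.Println(laggingSetAdmissible(valsets, map[string]bool{"d": true}, 7, 9))
	// Two lagging validators out of four: Tendermint could not have progressed.
	fmt.Println(laggingSetAdmissible(valsets, map[string]bool{"c": true, "d": true}, 7, 9))
}
```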
### Current Catch-up Mechanisms
We now briefly describe the current catch-up mechanisms in the reactors concerned in Tendermint.
#### Statesync
Full nodes optionally run statesync just after starting, when they start from scratch.
If statesync succeeds, an Application snapshot is installed, and Tendermint jumps from height 0 directly
to the height the Application snapshot represents, without applying the block of any previous height.
Some light blocks are received and stored in the block store for running light-client verification of
all the skipped blocks. Light blocks are incomplete blocks, typically containing the header and the
canonical commit but, e.g., no transactions. They are stored in the block store as "signed headers".
The statesync reactor is not really relevant for solving the problem discussed in this RFC. We will
nevertheless mention it when needed; in particular, to understand some corner cases.
#### Blocksync
The blocksync reactor kicks in after start up or recovery (and, optionally, after statesync is done)
and sends the following messages to its peers:
- `StatusRequest` to query the height its peers are currently at, and
- `BlockRequest`, asking for blocks of heights the local node is missing.
Using `BlockResponse` messages received from peers, the blocksync reactor validates each received
block using the block of the following height, saves the block in the block store, and sends the
block to the Application for execution.
If blocksync has validated and applied the block for the height *previous* to the highest seen in
a `StatusResponse` message, or if no progress has been made after a timeout, the node considers
itself as caught up and switches to the consensus reactor.
#### Consensus Reactor
The consensus reactor runs the full Tendermint algorithm. For a validator this means it has to
propose blocks, and send/receive prevote/precommit messages, as mandated by Tendermint, before it can
decide and move on to the next height.
If a full node that is running the consensus reactor falls behind at height *h*, when a peer node
realises this it will retrieve the canonical commit of *h+1* from the block store, and *convert*
it into a set of precommit votes and will send those to the late node.
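
The commit-to-votes conversion can be sketched as below, with hypothetical types standing in for Tendermint's. Note what is missing: no vote extensions can be reconstructed from a canonical commit, which is precisely the gap this RFC addresses.

```go
package main

import "fmt"

// CommitSig is one validator's signature inside a stored canonical
// commit (illustrative type, not Tendermint's actual definition).
type CommitSig struct {
	ValidatorAddr string
	Signature     []byte
}

// Commit is the canonical commit for a height, kept in the next block.
type Commit struct {
	Height  int64
	BlockID string
	Sigs    []CommitSig
}

// PrecommitMessage is an individual precommit vote message sent to a
// late peer. It carries no vote extension.
type PrecommitMessage struct {
	Height        int64
	BlockID       string
	ValidatorAddr string
	Signature     []byte
}

// commitToPrecommits converts the stored canonical commit of height h
// back into the precommit vote messages a late node needs to decide.
func commitToPrecommits(c Commit) []PrecommitMessage {
	msgs := make([]PrecommitMessage, 0, len(c.Sigs))
	for _, sig := range c.Sigs {
		msgs = append(msgs, PrecommitMessage{
			Height:        c.Height,
			BlockID:       c.BlockID,
			ValidatorAddr: sig.ValidatorAddr,
			Signature:     sig.Signature,
		})
	}
	return msgs
}

func main() {
	c := Commit{Height: 7, BlockID: "b",
		Sigs: []CommitSig{{"v1", []byte("s1")}, {"v2", []byte("s2")}}}
	fmt.Println(len(commitToPrecommits(c)))
}
```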
## Discussion
### Solutions Proposed
These are the solutions proposed in discussions leading up to this RFC.
- **Solution 0.** *Vote extensions are made **best effort** in the specification*.
This is the simplest solution, considered as a way to provide vote extensions in a manner
simple enough that it can be part of v0.36.
It consists in changing the specification so as to not *require* that precommit votes used upon
`PrepareProposal` contain their corresponding vote extensions. In other words, we render vote
extensions optional.
There are strong implications stemming from such a relaxation of the original specification.
- As a vote extension is signed *separately* from the vote it is extending, an intermediate node
can now remove (i.e., censor) vote extensions from precommit messages at will.
- Further, there is no point anymore in the spec requiring the Application to accept a vote extension
passed via `VerifyVoteExtension` to consider a precommit message valid in its entirety. Remember
this behavior of `VerifyVoteExtension` is adding a constraint to Tendermint's conditions for
liveness.
In this situation, it is better and simpler to just drop the vote extension rejected by the
Application via `VerifyVoteExtension`, but still consider the precommit vote itself valid as long
as its signature verifies.
- **Solution 1.** *Include vote extensions in the blockchain*.
Another obvious solution, which has also been considered in the past, is to include the vote
extensions and their signatures in the blockchain.
The blockchain would thus include the extended commit, rather than a regular commit, as the structure
to be canonicalized in the next block.
With this solution, the current mechanisms implemented both in the blocksync and consensus reactors
would still be correct, as all the information a node needs to catch up, and to start proposing when
it considers itself as caught-up, can now be recovered from past blocks saved in the block store.
This solution has two main drawbacks.
- As the block format must change, upgrading a chain requires a hard fork. Furthermore,
all existing light client implementations will stop working until they are upgraded to deal with
the new format (e.g., how certain hashes are calculated and/or how certain signatures are checked).
For instance, let us consider IBC, which relies on light clients. An IBC connection between
two chains will be broken if only one chain upgrades.
- The extra information (i.e., the vote extensions) that is now kept in the blockchain is not really
needed *at every height* for a late node to catch up.
- This information is only needed to be able to *propose* at the height the validator considers
itself as caught-up. If a validator is indeed late for height *h*, it is useless (although
correct) for it to call `PrepareProposal`, or `ExtendVote`, since the block is already decided.
- Moreover, some use cases require pretty sizeable vote extensions, which would result in an
important waste of space in the blockchain.
- **Solution 2.** *Skip* propose *step in Tendermint algorithm*.
This solution consists in modifying the Tendermint algorithm to skip the *send proposal* step in
heights where the node does not have the required vote extensions to populate the call to
`PrepareProposal`. The main idea behind this is that it should only happen when the validator is late
and, therefore, up-to-date validators have already proposed (and decided) for that height.
A small variation of this solution is, rather than skipping the *send proposal* step, the validator
sends a special *empty* or *bottom* (⊥) proposal to signal other nodes that it is not ready to propose
at (any round of) the current height.
The appeal of this solution is its simplicity. A possible implementation does not need to extend
the data structures, or change the current catch-up mechanisms implemented in the blocksync or
in the consensus reactor. When we lack the needed information (vote extensions), we simply rely
on another correct validator to propose a valid block in other rounds of the current height.
However, this solution can be attacked by a byzantine node in the network in the following way.
Let us consider the following scenario:
- all validators in *valset<sub>h</sub>* send out precommit messages, with vote extensions,
for height *h*, round 0, roughly at the same time,
- all those precommit messages contain non-`nil` precommit votes, which vote for block *b*
- all those precommit messages sent in height *h*, round 0, and all messages sent in
height *h*, round *r > 0* get delayed indefinitely, so,
- all validators in *valset<sub>h</sub>* keep waiting for enough precommit
messages for height *h*, round 0, needed for deciding in height *h*
- an intermediate (malicious) full node *m* manages to receive block *b*, and gather more than
*2n<sub>h</sub>/3* precommit messages for height *h*, round 0,
- one way or another, the solution should have either (a) a mechanism for a full node to *tell*
another full node it is late, or (b) a mechanism for a full node to conclude it is late based
on other full nodes' messages; any of these mechanisms should, at the very least,
require the late node receiving the decided block and a commit (not necessarily an extended
commit) for *h*,
- node *m* uses the gathered precommit messages to build a commit for height *h*, round 0,
- in order to convince full nodes that they are late, node *m* either (a) *tells* them they
are late, or (b) shows them it (i.e. *m*) is ahead, by sending them block *b*, along with the
commit for height *h*, round 0,
- all full nodes conclude they are late from *m*'s behavior, and use block *b* and the commit for
height *h*, round 0, to decide on height *h*, and proceed to height *h+1*.
At this point, *all* full nodes, including all validators in *valset<sub>h+1</sub>*, have advanced
to height *h+1* believing they are late, and so, expecting the *hypothetical* leading majority of
validators in *valset<sub>h+1</sub>* to propose for *h+1*. As a result, the blockchain
grinds to a halt.
A (rather complex) ad-hoc mechanism would need to be carried out by node operators to roll
back all validators to the precommit step of height *h*, round *r*, so that they can regenerate
vote extensions (remember vote extensions are non-deterministic) and continue execution.
- **Solution 3.** *Require extended commits to be available at switching time*.
This one is more involved than all previous solutions, and builds on an idea present in Solution 2:
vote extensions are actually not needed for Tendermint to make progress as long as the
validator is *certain* it is late.
We define two modes. The first is denoted *catch-up mode*, and Tendermint only calls
`FinalizeBlock` for each height when in this mode. The second is denoted *consensus mode*, in
which the validator considers itself up to date and fully participates in consensus and calls
`PrepareProposal`/`ProcessProposal`, `ExtendVote`, and `VerifyVoteExtension`, before calling
`FinalizeBlock`.
The catch-up mode does not need vote extension information to make progress, as all it needs is the
decided block at each height to call `FinalizeBlock` and keep the state-machine replication making
progress. The consensus mode, on the other hand, does need vote extension information when
starting every height.
Validators are in consensus mode by default. When a validator in consensus mode falls behind
for whatever reason, e.g. cases (b), (d), (e), (f), (g), or (h) above, we introduce the following
key safety property:
- for every height *h<sub>p</sub>*, a full node *f* in *h<sub>p</sub>* refuses to switch to catch-up
mode **until** there exists a height *h'* such that:
- *f* has received and (light-client) verified the blocks of
all heights *h*, where *h<sub>p</sub> ≤ h ≤ h'*
- it has received an extended commit for *h'* and has verified:
- the precommit vote signatures in the extended commit
- the vote extension signatures in the extended commit: each is signed with the same
key as the precommit vote it extends
If the condition above holds for *h<sub>p</sub>*, namely receiving a valid sequence of blocks in
*f*'s future, and an extended commit corresponding to the last block in the sequence, then
node *f*:
- switches to catch-up mode,
- applies all blocks between *h<sub>p</sub>* and *h'* (calling `FinalizeBlock` only), and
- switches back to consensus mode using the extended commit for *h'* to propose in the rounds of
*h' + 1* where it is the proposer.
This mechanism, together with the invariant it uses, ensures that the node cannot be attacked by
being fed a block without extensions to make it believe it is late, in a similar way as explained
for Solution 2.
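The switching condition of Solution 3 can be sketched in code. The types and function below are purely illustrative (they are not the real Tendermint data structures): a node at height *h<sub>p</sub>* may enter catch-up mode only if it holds a verified block for every height up to some *h'* and a fully verified extended commit for *h'*.

```go
package main

// Hypothetical, simplified types for illustration only; the real Tendermint
// structures carry far more information.
type Block struct {
	Height   int64
	Verified bool // light-client verification passed
}

type ExtendedCommit struct {
	Height             int64
	SigsVerified       bool // precommit vote signatures verified
	ExtensionsVerified bool // vote extension signatures verified
}

// canSwitchToCatchUp checks the safety property of Solution 3: node f at
// height hp may enter catch-up mode only if it holds verified blocks for all
// heights hp..h' and a fully verified extended commit for h'.
func canSwitchToCatchUp(hp int64, blocks []Block, ec ExtendedCommit) bool {
	if !ec.SigsVerified || !ec.ExtensionsVerified {
		return false
	}
	if len(blocks) == 0 {
		return false
	}
	want := hp
	for _, b := range blocks {
		if b.Height != want || !b.Verified {
			return false
		}
		want++
	}
	// the extended commit must be for the last block of the sequence
	return ec.Height == blocks[len(blocks)-1].Height
}

func main() {
	blocks := []Block{{5, true}, {6, true}, {7, true}}
	ec := ExtendedCommit{Height: 7, SigsVerified: true, ExtensionsVerified: true}
	_ = canSwitchToCatchUp(5, blocks, ec)
}
```

Note how the condition fails both when any block in the sequence is missing or unverified and when the extended commit's extension signatures do not check out, which is exactly what defeats the Solution 2 attack.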
### Feasibility of the Proposed Solutions
Solution 0, besides the drawbacks described in the previous section, provides guarantees that are
weaker than the rest. The Application does not have the assurance that more than *2n<sub>h</sub>/3* vote
extensions will *always* be available when calling `PrepareProposal` at height *h+1*.
This level of guarantees is probably not strong enough for vote extensions to be useful for some
important use cases that motivated them in the first place, e.g., encrypted mempool transactions.
Solution 1, while being simple in that the changes needed in the current Tendermint codebase would
be rather small, is changing the block format, and would therefore require all blockchains using
Tendermint v0.35 or earlier to hard-fork when upgrading to v0.36.
Since Solution 2 can be attacked, one might prefer Solution 3, even if it is more involved
to implement. Further, we must elaborate on how we can turn Solution 3, described in abstract
terms in the previous section, into a concrete implementation compatible with the current
Tendermint codebase.
### Current Limitations and Possible Implementations
The main limitations affecting the current version of Tendermint are the following.
- The current version of the blocksync reactor does not use the full
[light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)
algorithm to validate blocks coming from other peers.
- The code being structured into the blocksync and consensus reactors, only switching from the
blocksync reactor to the consensus reactor is supported; switching in the opposite direction is
not supported. Alternatively, the consensus reactor could have a mechanism allowing a late node
to catch up by skipping calls to `PrepareProposal`/`ProcessProposal`, and
`ExtendVote`/`VerifyVoteExtension` and only calling `FinalizeBlock` for each height.
Such a mechanism does not exist at the time of writing this RFC.
The blocksync reactor featuring light client verification is being actively worked on (tentatively
for v0.37). So it is best if this RFC does not try to delve into that problem, but just makes sure
its outcomes are compatible with that effort.
In subsection [Cases to Address](#cases-to-address), we concluded that we can focus on
solving case (h) in theoretical terms.
However, as the current Tendermint version does not yet support switching back to blocksync once a
node has switched to consensus, we need to split case (h) into two cases. When a full node needs to
catch up...
- **(h.1)** ... it has not switched yet from the blocksync reactor to the consensus reactor, or
- **(h.2)** ... it has already switched to the consensus reactor.
This is important in order to discuss the different possible implementations.
#### Base Implementation: Persist and Propagate Extended Commit History
In order to circumvent the fact that we cannot switch from the consensus reactor back to blocksync,
rather than just keeping the few most recent extended commits, nodes will need to keep
and gossip a backlog of extended commits so that the consensus reactor can still propose and decide
in out-of-date heights (even if those proposals will be useless).
The base implementation for which an experimental patch exists consists in the conservative
approach of persisting in the block store *all* extended commits for which we have also stored
the full block. Currently, when statesync is run at startup, it saves light blocks.
This base implementation does not seek
to receive or persist extended commits for those light blocks as they would not be of any use.
Then, we modify the blocksync reactor so that peers *always* send requested full blocks together
with the corresponding extended commit in the `BlockResponse` messages. This guarantees that the
block store being reconstructed by blocksync has the same information as that of peers that are
up to date (at least starting from the latest snapshot applied by statesync before starting blocksync).
Thus, blocksync has all the data it requires to switch to the consensus reactor, as long as one of
the following exit conditions are met:
- The node is still at height 0 (where no commit or extended commit is needed)
- The node has processed at least 1 block in blocksync
The second condition is needed in case the node has installed an Application snapshot during statesync.
If that is the case, at the time blocksync starts, the block store only has the data statesync has saved:
light blocks, and no extended commits.
Hence we need to blocksync at least one block from another node, which will be sent with its corresponding extended commit, before we can switch to consensus.
As a side note, a chain might be started at a height *h<sub>i</sub> > 0*, all other heights
*h < h<sub>i</sub>* being non-existent. In this case, the chain is still considered to be at height 0 before
block *h<sub>i</sub>* is applied, so the first condition above allows the node to switch to consensus even
if blocksync has not processed any block (which is always the case if all nodes are starting from scratch).
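The two exit conditions above amount to a simple predicate. The sketch below uses an illustrative signature, not the real Tendermint API:

```go
package main

// canSwitchToConsensus sketches the two exit conditions of the base
// implementation (illustrative, not the real Tendermint API): the node may
// leave blocksync if it is still at height 0, or if it has processed at least
// one block, which guarantees an extended commit was stored alongside it.
func canSwitchToConsensus(height int64, blocksSynced int) bool {
	return height == 0 || blocksSynced >= 1
}

func main() {
	_ = canSwitchToConsensus(0, 0)    // fresh chain: allowed
	_ = canSwitchToConsensus(1000, 0) // statesync only: must blocksync a block first
}
```

The second case is the statesync scenario described above: the block store holds only light blocks, so the node must process at least one block (delivered with its extended commit) before it can act as a proposer.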
When a validator falls behind while having already switched to the consensus reactor, a peer node can
simply retrieve the extended commit for the required height from the block store and reconstruct a set of
precommit votes together with their extensions and send them in the form of precommit messages to the
validator falling behind, regardless of whether the peer node holds the extended commit because it
actually participated in that consensus and thus received the precommit messages, or it received the extended commit via a `BlockResponse` message while running blocksync.
This solution requires a few changes to the consensus reactor:
- upon saving the block for a given height in the block store at decision time, save the
corresponding extended commit as well
- in the catch-up mechanism, when a node realizes that another peer is more than 2 heights
behind, it uses the extended commit (rather than the canonical commit as done previously) to
reconstruct the precommit votes with their corresponding extensions
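The reconstruction step in that second bullet can be sketched as follows; the type and field names are illustrative stand-ins for the real Tendermint structures:

```go
package main

// Illustrative shapes only; the real Tendermint types differ.
type ExtendedCommitSig struct {
	ValidatorIndex     int32
	Signature          []byte // precommit vote signature
	Extension          []byte // vote extension payload
	ExtensionSignature []byte
}

type PrecommitMsg struct {
	Height             int64
	Round              int32
	ValidatorIndex     int32
	Signature          []byte
	Extension          []byte
	ExtensionSignature []byte
}

// precommitsFromExtendedCommit sketches how a peer replays the precommit
// messages, vote extensions included, for a late node, straight from a
// stored extended commit.
func precommitsFromExtendedCommit(height int64, round int32, sigs []ExtendedCommitSig) []PrecommitMsg {
	msgs := make([]PrecommitMsg, 0, len(sigs))
	for _, s := range sigs {
		msgs = append(msgs, PrecommitMsg{
			Height:             height,
			Round:              round,
			ValidatorIndex:     s.ValidatorIndex,
			Signature:          s.Signature,
			Extension:          s.Extension,
			ExtensionSignature: s.ExtensionSignature,
		})
	}
	return msgs
}

func main() {
	sigs := []ExtendedCommitSig{{ValidatorIndex: 0}, {ValidatorIndex: 1}}
	_ = precommitsFromExtendedCommit(10, 0, sigs)
}
```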
The changes to the blocksync reactor are more substantial:
- the `BlockResponse` message is extended to include the extended commit of the same height as
the block included in the response (just as they are stored in the block store)
- structure `bpRequester` is likewise extended to hold the received extended commits coming in
`BlockResponse` messages
- method `PeekTwoBlocks` is modified to also return the extended commit corresponding to the first block
- when successfully verifying a received block, the reactor saves its corresponding extended commit in
the block store
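A rough shape of the widened `BlockResponse` and the adjusted peek behavior is sketched below; all names are illustrative, not the real wire format or reactor API:

```go
package main

// Sketch of the widened BlockResponse (illustrative names): the extended
// commit travels with the block so the receiving node can store both, just
// as an up-to-date peer holds them.
type Block struct{ Height int64 }

type ExtendedCommit struct{ Height int64 }

type BlockResponse struct {
	Block          *Block
	ExtendedCommit *ExtendedCommit // same height as Block
}

// peekTwo mimics the modified PeekTwoBlocks: it also surfaces the first
// block's extended commit so the block and commit can be saved together.
func peekTwo(resps []BlockResponse) (first, second *Block, ec *ExtendedCommit) {
	if len(resps) > 0 {
		first, ec = resps[0].Block, resps[0].ExtendedCommit
	}
	if len(resps) > 1 {
		second = resps[1].Block
	}
	return
}

func main() {
	r := []BlockResponse{
		{&Block{5}, &ExtendedCommit{5}},
		{&Block{6}, &ExtendedCommit{6}},
	}
	_, _, _ = peekTwo(r)
}
```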
The two main drawbacks of this base implementation are:
- the increased size taken by the block store, in particular with big extensions
- the increased bandwidth taken by the new format of `BlockResponse`
#### Possible Optimization: Pruning the Extended Commit History
If we cannot switch from the consensus reactor back to the blocksync reactor, we cannot prune the
extended commit backlog in the block store without sacrificing the implementation's correctness.
The asynchronous nature of our distributed system model allows a process to fall behind an
arbitrary number of heights, and thus all extended commits need to be kept *just in case* a node
that far behind had previously switched to the consensus reactor.
However, there is a possibility to optimize the base implementation. Every time we enter a new height,
we could prune from the block store all extended commits that are more than *d* heights in the past.
Then, we need to handle two new situations, roughly equivalent to cases (h.1) and (h.2) described above.
- (h.1) A node starts from scratch or recovers after a crash. In this case, we need to modify the
blocksync reactor's base implementation.
- when receiving a `BlockResponse` message, it MUST accept that the extended commit may be set to `nil`,
- when sending a `BlockResponse` message, if the block store contains the extended commit for that
height, it MUST set it in the message, otherwise it sets it to `nil`,
- the exit conditions used for the base implementation are no longer valid; the only reliable exit
condition now consists in making sure that the last block processed by blocksync was received with
the corresponding commit, and not `nil`; this extended commit will allow the node to switch from
the blocksync reactor to the consensus reactor and immediately act as a proposer if required.
- (h.2) A node already running the consensus reactor falls behind beyond *d* heights. In principle,
the node will be stuck forever as no other node can provide the vote extensions it needs to make
progress (they all have pruned the corresponding extended commit).
However we can manually have the node crash and recover as a workaround. This effectively converts
this case into (h.1).
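The pruning step of this optimization is simple to sketch. The store shape below is illustrative (the real block store is not a plain map):

```go
package main

// pruneExtendedCommits sketches the optimization: upon entering height h,
// drop every stored extended commit more than d heights in the past.
// The map stands in for the block store's extended commit records.
func pruneExtendedCommits(store map[int64]bool, h, d int64) {
	for height := range store {
		if height < h-d {
			delete(store, height)
		}
	}
}

func main() {
	store := map[int64]bool{95: true, 98: true, 99: true}
	pruneExtendedCommits(store, 100, 2)
	// height 95 is pruned; heights 98 and 99 remain
}
```

The correctness caveat from the text applies: once commits older than *d* heights are gone, a consensus-reactor node that is more than *d* heights behind can only recover via the crash-and-restart workaround.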
### Formalization Work
A formalization work to show or prove the correctness of the different use cases and solutions
presented here (and any other that may be found) needs to be carried out.
A question that needs a precise answer is how many extended commits (one?, two?) a node needs
to keep in persistent memory when implementing Solution 3 described above without Tendermint's
current limitations.
Another important invariant we need to prove formally is that the set of vote extensions
required to make progress will always be held somewhere in the network.
## References
- [ABCI++ specification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/abci%2B%2B/README.md)
- [ABCI as of v0.35](https://github.com/tendermint/spec/blob/4fb99af/spec/abci/README.md)
- [Vote extensions issue](https://github.com/tendermint/tendermint/issues/8174)
- [Light client verification](https://github.com/tendermint/tendermint/blob/4743a7ad0/spec/light-client/README.md)

# RFC 018: BLS Signature Aggregation Exploration
## Changelog
- 01-April-2022: Initial draft (@williambanfield).
- 15-April-2022: Draft complete (@williambanfield).
## Abstract
## Background
### Glossary
The terms that are attached to these types of cryptographic signing systems
become confusing quickly. Different sources appear to use slightly different
meanings of each term and this can certainly add to the confusion. Below is
a brief glossary that may be helpful in understanding the discussion that follows.
* **Short Signature**: A signature that does not vary in length with the
number of signers.
* **Multi-Signature**: A signature generated over a single message
where, given the message and signature, a verifier is able to determine that
all parties signed the message. May be short or may vary with the number of signers.
* **Aggregated Signature**: A _short_ signature generated over messages with
possibly different content where, given the messages and signature, a verifier
should be able to determine that all the parties signed the designated messages.
* **Threshold Signature**: A _short_ signature generated from multiple signers
where, given a message and the signature, a verifier is able to determine that
a large enough share of the parties signed the message. The identities of the
parties that contributed to the signature are not revealed.
* **BLS Signature**: An elliptic-curve pairing-based signature system that
has some nice properties for short multi-signatures. May stand for
*Boneh-Lynn-Schacham* or *Barreto-Lynn-Scott* depending on the context. A
BLS signature is a type of signature scheme that is distinct from other forms
of elliptic-curve signatures such as ECDSA and EdDSA.
* **Interactive**: Cryptographic scheme where parties need to perform one or
more request-response cycles to produce the cryptographic material. For
example, an interactive signature scheme may require the signer and the
verifier to cooperate to create and/or verify the signature, rather than a
signature being created ahead of time.
* **Non-interactive**: Cryptographic scheme where parties do not need to
perform any request-response cycles to produce the cryptographic material.
### Brief notes on pairing-based elliptic-curve cryptography
Pairing-based elliptic-curve cryptography is quite complex and relies on several
types of high-level math. Cryptography, in general, relies on being able to find
problems with an asymmetry between the difficulty of calculating the solution
and verifying that a given solution is correct.
Pairing-based cryptography works by operating on mathematical functions that
satisfy the property of **bilinear mapping**. This property is satisfied for
functions `e` with values `P`, `Q`, `R` and `S` where `e(P, Q + R) = e(P, Q) * e(P, R)`
and `e(P + S, Q) = e(P, Q) * e(S, Q)`. The most familiar example of this is
exponentiation. Written in common notation, `g^P*(Q+R) = g^(P*Q) * g^(P*R)` for
some value `g`.
Pairing-based elliptic-curve cryptography creates a bilinear mapping using
elliptic curves over a finite field. With some original curve, you can define two groups,
`G1` and `G2` which are points of the original curve _modulo_ different values.
Finally, you define a third group `Gt`, where points from `G1` and `G2` satisfy
the property of bilinearity with `Gt`. In this scheme, the function `e` takes
as inputs points in `G1` and `G2` and outputs values in `Gt`. Succinctly, given
some point `P` in `G1` and some point `Q` in `G2`, `e(P, Q) = C` where `C` is in `Gt`.
You can efficiently compute the mapping of points in `G1` and `G2` into `Gt`,
but you cannot efficiently determine what points were summed and paired to
produce the value in `Gt`.
Functions are then defined to map digital signatures, messages, and keys into
and out of points of `G1` or `G2` and signature verification is the process
of calculating if a set of values representing a message, public key, and digital
signature produce the same value in `Gt` through `e`.
Signatures can be created as either points in `G1` with public keys being
created as points in `G2` or vice versa. For the case of BLS12-381, the popular
curve used, points in `G1` are represented with 48 bytes and points in `G2` are
represented with 96 bytes. It is up to the implementer of the cryptosystem to
decide which should be larger, the public keys or the signatures.
BLS signatures rely on pairing-based elliptic-curve cryptography to produce
various types of signatures. For a more in-depth but still high-level discussion of
pairing-based elliptic-curve cryptography, see Vitalik Buterin's post on
[Exploring Elliptic Curve Pairings][vitalik-pairing-post]. For much more in
depth discussion, see the specific paper on BLS12-381, [Short signatures from
the Weil Pairing][bls-weil-pairing] and
[Compact Multi-Signatures for Smaller Blockchains][multi-signatures-smaller-blockchains].
### Adoption
BLS signatures have already gained traction within several popular projects.
* Algorand is working on an implementation.
* [Zcash][zcash-adoption] has adopted BLS12-381 into the protocol.
* [Ethereum 2.0][eth-2-adoption] has adopted BLS12-381 into the protocol.
* [Chia Network][chia-adoption] has adopted BLS for signing blocks.
* [Ostracon][line-ostracon-pr], a fork of Tendermint has adopted BLS for signing blocks.
### What systems may be affected by adding aggregated signatures?
#### Gossip
Gossip could be updated to aggregate vote signatures during a consensus round.
This appears to be of little utility. Creating an aggregated signature
incurs overhead, so frequent re-aggregation may be costly; exactly how
costly is still subject to further investigation and
performance testing.
Even if vote signatures were aggregated before gossip, each validator would still
need to receive and verify vote extension data from each (individual) peer validator in
order for consensus to proceed. That negates any advantage gained by aggregating signatures
across vote messages in the presence of vote extensions.
#### Block Creation
When creating a block, the proposer may create a small set of short
multi-signatures and attach these to the block instead of including one
signature per validator.
#### Block Verification
Currently, we verify each validator signature using the public key associated
with that validator. With signature aggregation, verification of blocks would
not verify many signatures individually, but would instead check the (single)
multi-signature using the public keys stored by the validator. This would also
require a mechanism for indicating which validators are included in the
aggregated signature.
#### IBC Relaying
IBC would no longer need to transmit a large set of signatures when
updating state. These state updates do not happen for every IBC packet, only
when changing an IBC light client's view of the counterparty chain's state.
General [IBC packets][ibc-packet] only contain enough information to correctly
route the data to the counterparty chain.
IBC does persist commit signatures to the chain in these `MsgUpdateClient`
message when updating state. This message would no longer need the full set
of unique signatures and would instead only need one signature for all of the
data in the header.
Adding BLS signatures would create a new signature type that must be
understood by the IBC module and by the relayers. For some operations, such
as state updates, the set of data written into the chain and received by the
IBC module could be slightly smaller.
## Discussion
### What are the proposed benefits to aggregated signatures?
#### Reduce Block Size
At the moment, a commit contains a 64-byte (512-bit) signature for each validator
that voted for the block. For the Cosmos Hub, which has 175 validators in the
active set, this amounts to about 11 KiB per block. That gives an upper bound of
around 113 GB over the lifetime of the chain's 10.12M blocks. (Note, the Hub has
increased the number of validators in the active set over time so the total
signature size over the history of the chain is likely somewhat less than that).
Signature aggregation would only produce two signatures for the entire block.
One for the yeas and one for the nays. Each BLS aggregated signature is 48
bytes, per the [IETF standard of BLS signatures][bls-ietf-ecdsa-compare].
Over the lifetime of the same Cosmos Hub chain, that would amount to about 1
GB, a savings of 112 GB. While that is a large factor of reduction, it is worth
bearing in mind that, at [GCP's cost][gcp-storage-pricing] of $0.026 USD per GB,
that is a total savings of around $2.50 per month.
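A back-of-the-envelope check of these figures, using the numbers given in the text (175 validators, 64-byte Ed25519 signatures, roughly 10.12M blocks, 48-byte BLS signatures, two aggregated signatures per block):

```go
package main

import "fmt"

func main() {
	const (
		validators = 175
		ed25519Sig = 64         // bytes per Ed25519 signature
		blocks     = 10_120_000 // approximate Cosmos Hub block count
		blsSig     = 48         // bytes per BLS aggregated signature
	)
	perBlock := validators * ed25519Sig     // 11,200 bytes, about 11 KiB
	total := int64(perBlock) * blocks       // about 113e9 bytes
	aggPerBlock := 2 * blsSig               // 96 bytes: yeas + nays
	aggTotal := int64(aggPerBlock) * blocks // just under 1e9 bytes
	fmt.Println(perBlock, total, aggPerBlock, aggTotal)
}
```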
#### Reduce Signature Creation and Verification Time
From the [IETF draft standard on BLS Signatures][bls-ietf], BLS signatures can be
created in 370 microseconds and verified in 2700 microseconds. Our current
[Ed25519 implementation][voi-ed25519-perf] was benchmarked locally to take
13.9 microseconds to produce a signature and 2.03 milliseconds to batch verify
128 signatures, which is slightly fewer than the 175 in the Hub. blst, a popular
implementation of BLS signature aggregation was benchmarked to perform verification
on 100 signatures in 1.5 milliseconds [when run locally][blst-verify-bench]
on an 8-thread machine with pre-aggregated public keys. It is worth noting that
the `ed25519` library verification time grew steadily with the number of signatures,
whereas the bls library verification time remains constant. This is because the
number of operations used to verify a signature does not grow at all with the
number of signatures included in the aggregate signature (as long as the signers
signed over the same message data as is the case in Tendermint).
It is worth noting that this would also represent a _degradation_ in signature
verification time for chains with small validator sets. When batch verifying
only 32 signatures, our ed25519 library takes .57 milliseconds, whereas BLS
would still require the same 1.5 milliseconds.
For massive validator sets, blst dominates, taking the same 1.5 milliseconds to
check an aggregated signature from 1024 validators versus our ed25519 library's
13.066 milliseconds to batch verify a set of that size.
#### Reduce Light-Client Verification Time
The light client aims to be a faster and lighter-weight way to verify that a
block was voted on by a Tendermint network. The light client fetches
Tendermint block headers and commit signatures, performing public key
verification to ensure that the associated validator set signed the block.
Reducing the size of the commit signature would allow the light client to fetch
block data more quickly.
Additionally, the faster signature verification times of BLS signatures mean
that light client verification would proceed more quickly.
However, verification of an aggregated signature is all-or-nothing. The verifier
cannot check that some singular signer had a signature included in the block.
Instead, the verifier must use all public keys to check if some signature
was included. This does mean that any light client implementation must always
be able to fetch all public keys for any height instead of potentially being
able to check if some singular validator's key signed the block.
#### Reduce Gossip Bandwidth
##### Vote Gossip
It is possible to aggregate subsets of signatures during voting, so that the
network need not gossip all *n* validator signatures to all *n* validators.
Theoretically, subsets of the signatures could be aggregated during consensus
and vote messages could carry those aggregated signatures. Implementing this
would certainly increase the complexity of the gossip layer but could possibly
reduce the total number of signatures required to be verified by each validator.
##### Block Gossip
A reduction in the block size as a result of signature aggregation would
naturally lead to a reduction in the bandwidth required to gossip a block.
Each validator would only send and receive the smaller aggregated signatures
instead of the full list of multi-signatures as we have them now.
### What are the drawbacks to aggregated signatures?
#### Heterogeneous key types cannot be aggregated
Aggregation requires a specific signature algorithm, and our legacy signing schemes
cannot be aggregated. In practice, this means that aggregated signatures could
be created for a subset of validators using BLS signatures, and signatures from validators
with other key types (such as Ed25519) would still have to be separately
propagated in blocks and votes.
#### Many HSMs do not support aggregated signatures
**Hardware Security Modules** (HSMs) are a popular way to manage private keys.
They provide additional security for key management and should be used when
possible for storing highly sensitive private key material.
Below is a list of popular HSMs along with their support for BLS signatures.
* YubiKey
* [No support][yubi-key-bls-support]
* Amazon Cloud HSM
* [No support][cloud-hsm-support]
* Ledger
* [Lists support for the BLS12-381 curve][ledger-bls-announce]
I cannot find support listed for Google Cloud, although perhaps it exists.
## Feasibility of implementation
This section outlines the various hurdles that would exist to implementing BLS
signature aggregation into Tendermint. It aims to demonstrate that we _could_
implement BLS signatures but that it would incur risk and require breaking changes for a
reasonably unclear benefit.
### Can aggregated signatures be added as soft-upgrades?
In my estimation, yes. With the implementation of proposer-based timestamps,
all validators now produce signatures on only one of two messages:
1. A [CanonicalVote][canonical-vote-proto] where the BlockID is the hash of the block or
2. A `CanonicalVote` where the `BlockID` is nil.
The block structure can be updated to perform hashing and validation in a new
way as a soft upgrade. This would look like adding a new section to the [Block.Commit][commit-proto] structure
alongside the current `Commit.Signatures` field. This new field, tentatively named
`AggregatedSignature` would contain the following structure:
```proto
message AggregatedSignature {
// yeas is a BitArray representing which validators in the active validator
// set issued a 'yea' vote for the block.
tendermint.libs.bits.BitArray yeas = 1;
// absent is a BitArray representing which validators in the active
// validator set did not issue votes for the block.
tendermint.libs.bits.BitArray absent = 2;
// yea_signature is an aggregated signature produced from all of the vote
// signatures for the block.
repeated bytes yea_signature = 3;
// nay_signature is an aggregated signature produced from all of the vote
// signatures from votes for 'nil' for this block.
// nay_signature should be made from all of the validators that were both not
// in the 'yeas' BitArray and not in the 'absent' BitArray.
repeated bytes nay_signature = 4;
}
```
Adding this new field as a soft upgrade would mean hashing this data structure
into the blockID along with the old `Commit.Signatures` when both are present
as well as ensuring that the voting power represented in the new
`AggregatedSignature` and `Signatures` field was enough to commit the block
during block validation. One can certainly imagine other possible schemes for
implementing this but the above should serve as a simple enough proof of concept.
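The validation rule described above, that the combined voting power behind the legacy `Commit.Signatures` and the new `AggregatedSignature` `yeas` must exceed two thirds of the total, can be sketched as a predicate. The function name and signature are illustrative:

```go
package main

// enoughVotingPower sketches the soft-upgrade validation rule: the voting
// power behind the legacy Commit.Signatures plus the power flagged 'yea' in
// the new AggregatedSignature must exceed two thirds of the total power.
// Integer arithmetic avoids floating-point comparison issues.
func enoughVotingPower(legacyYeaPower, aggregatedYeaPower, totalPower int64) bool {
	return 3*(legacyYeaPower+aggregatedYeaPower) > 2*totalPower
}

func main() {
	_ = enoughVotingPower(40, 30, 100) // 70 of 100: enough to commit
	_ = enoughVotingPower(30, 30, 100) // 60 of 100: not enough
}
```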
### Implementing vote-time and commit-time signature aggregation separately
Implementing aggregated BLS signatures as part of the block structure can easily be
achieved without implementing any 'vote-time' signature aggregation.
The block proposer would gather all of the votes, complete with signatures,
as it does now, and produce a set of aggregate signatures from all of the
individual vote signatures.
Implementing 'vote-time' signature aggregation cannot be achieved without
also implementing commit-time signature aggregation. This is because such
signatures cannot be dis-aggregated into their constituent pieces. Therefore,
in order to implement 'vote-time' signature aggregation, we would need to
either first implement 'commit-time' signature aggregation, or implement both
'vote-time' signature aggregation while also updating the block creation and
verification protocols to allow for aggregated signatures.
### Updating IBC clients
In order for IBC clients to function, they must be able to perform light-client
verification of blocks on counterparty chains. Because BLS signatures are not
currently part of light-clients, chains that transmit messages over IBC
cannot update to using BLS signatures without their counterparties first
being upgraded to parse and verify BLS. If chains upgrade without their
counterparties first updating, they will lose the ability to interoperate with
non-updated chains.
### New attack surfaces
BLS signatures and signature aggregation comes with a new set of attack surfaces.
Additionally, it's not clear that all possible major attacks are currently known
on the BLS aggregation schemes, since new ones have been discovered since the IETF
draft standard was written. The known attacks are manageable and are listed below.
Our implementation would need to prevent against these but this does not appear
to present a significant hurdle to implementation.
#### Rogue key attack prevention
Generating an aggregated signature requires guarding against what is called
a [rogue key attack][bls-ietf-terms]. A rogue key attack is one in which a
malicious actor can craft an _aggregate_ key that can produce signatures that
appear to include a signature from a private key that the malicious actor
does not actually know. In Tendermint terms, this would look like a validator
producing a vote that appears to be signed by both itself and some other
validator, where the other validator never actually signed the vote.
The main mechanisms for preventing this require that each entity prove that it
can sign data with just its private key. The options involve either
ensuring that each entity signs a _different_ message when producing every
signature, _or_ producing a [proof of possession][bls-ietf-pop] (PoP) when
announcing its key to the network.
A PoP is a message that demonstrates ownership of a private
key. A simple scheme for PoP is one where the entity announcing
its new public key to the network includes a digital signature over the bytes
of the public key generated using the associated private key. Everyone receiving
the public key and associated proof-of-possession can easily verify the
signature and be sure the entity owns the private key.
This PoP scheme suits the Tendermint use case quite well since
validator keys change infrequently so the associated PoPs would not be onerous
to produce, verify, and store. Using this scheme allows signature verification
to proceed more quickly, since all signatures are over identical data and
can therefore be checked using an aggregated public key instead of one at a
time, public key by public key.

#### Summing Zero Attacks

[Summing zero attacks][summing-zero-paper] are attacks that rely on using the '0' point of an
elliptic curve. For BLS signatures, if the point 0 is chosen as the private
key, then the 0 point will also always be the public key and all signatures
produced by the key will also be the 0 point. This is easy enough to
detect when verifying each signature individually.
However, because BLS signature aggregation creates an aggregated signature and
an aggregated public key, a set of colluding signers can create a pair or set
of signatures that are individually non-zero but which aggregate ("sum") to 0.
Such signatures, along with the summed public key of the colluding signers,
will verify _any_ message. This would allow the colluding signers to sign any
block or message with the same signature.
This would be reasonably easy to detect and create evidence for because, in
all other cases, the same signature should not verify more than one message.
It is not exactly clear how such an attack would advantage the colluding
validators, because the normal mechanisms of evidence gathering would still
detect the double signing, regardless of the signatures on both blocks being
identical.

### Backwards Compatibility

Backwards compatibility is an important consideration for signature verification.
Specifically, it is important to consider whether chains using current versions
of IBC would be able to interact with chains adopting BLS.
Because the `Block` shared by IBC and Tendermint is produced and parsed using
protobuf, new structures can be added to the Block without breaking the
ability of legacy users to parse the new structure. Breaking changes between
current users of IBC and new Tendermint blocks only occur if data relied upon
by the current users is no longer included in the existing fields.
For the case of BLS aggregated signatures, a new `AggregatedSignature` field
can therefore be added to the `Commit` field without breaking current users.
Current users will be broken when counterparty chains upgrade to the new version
and _begin using_ BLS signatures. Once counterparty chains begin using BLS
signatures, the BlockID hashes will include hashes of the `AggregatedSignature`
data structure that the legacy users will not be able to compute. Additionally,
the legacy software will not be able to parse and verify the signatures to
ensure that a supermajority of validators from the counterparty chain signed
the block.
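A sketch of how this might look in the `Commit` proto follows; the field number and the exact shape of the new message are hypothetical illustrations, not a settled design:

```protobuf
// Hypothetical extension of tendermint.types.Commit. Legacy parsers simply
// skip the unknown field; breakage only begins once counterparty chains
// start populating it and it enters the BlockID hash.
message Commit {
  int64              height     = 1;
  int32              round      = 2;
  BlockID            block_id   = 3;
  repeated CommitSig signatures = 4;

  // New: a single BLS signature aggregated from the individual precommit
  // signatures, plus a record of which validators are included.
  AggregatedSignature aggregated_signature = 5; // hypothetical field number
}

message AggregatedSignature {
  bytes signature        = 1; // compressed BLS12-381 aggregate signature
  bytes validator_bitmap = 2; // which validators' keys to sum for verification
}
```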

### Library Support

Libraries for BLS signature creation are limited in number, although active
development appears to be ongoing. Cryptographic algorithms are difficult to
implement correctly and correctness issues are extremely serious and dangerous.
No further exploration of BLS should be undertaken without strong assurance of
a well-tested library with continuing support for creating and verifying BLS
signatures.
At the moment, there is one candidate, `blst`, that appears to be the most
mature and well vetted. While this library is undergoing continuing auditing
and is supported by funds from the Ethereum Foundation, adopting a new
cryptographic library presents some serious risks. Namely, if support for the
library were to be discontinued, Tendermint could become saddled with the
requirement of maintaining a very complex piece of software itself, or be
forced into a massive ecosystem-wide migration away from BLS signatures.
This is one of the more serious reasons to avoid adopting BLS signatures at this
time. There is no gold standard library. Some projects look promising, but no
project has been formally verified with a long term promise of being supported
well into the future.

#### Go Standard Library

The Go standard library has no implementation of BLS signatures.

#### BLST

[blst][blst], or 'blast', is an implementation of BLS signatures written in C
with Go bindings provided as part of the repository. The library is
actively undergoing formal verification by Galois and previously received an
initial audit by NCC Group.
`blst` is [targeted for use in prysm][prysm-blst], the golang implementation of Ethereum 2.0.

#### Gnark-Crypto

[Gnark-Crypto][gnark] is a Go-native implementation of elliptic-curve pairing-based
cryptography. It is not audited and is documented as 'as-is', although
development appears to be active, so formal verification may be forthcoming.

#### CIRCL

[CIRCL][circl] is a Go-native implementation of several cryptographic primitives,
BLS12-381 among them. The library is written and maintained by Cloudflare and
appears to receive frequent contributions. However, it lists itself as
experimental and urges users to take caution before using it in production.

### Added complexity to light client verification

Implementing BLS signature aggregation in Tendermint would pose issues for the
light client. The light client currently validates a subset of the signatures
on a block when performing the verification algorithm. This is no longer possible
with an aggregated signature. Aggregated signature verification is all-or-nothing.
The light client could no longer check that some subset of a validator set is
represented in the signature. Instead, it would need to construct a new
aggregated key from all of the stated signers for each verified height at
which the validator set changed.
This means that the speed advantages gained by using BLS cannot be fully realized
by the light client since the client needs to perform the expensive operation
of re-aggregating the public key. Aggregation is _not_ constant time in the
number of keys and instead grows linearly. When [benchmarked locally][blst-verify-bench-agg],
blst public key aggregation of 128 keys took 2.43 milliseconds. This, along with
the 1.5 milliseconds needed to verify a signature, would raise light client
signature verification time to 3.9 milliseconds, above the previously mentioned
batch verification time of 2.0 milliseconds using our ed25519 library.
Schemes to cache aggregated subsets of keys could certainly cut this time down at the
cost of adding complexity to the light client.

### Added complexity to evidence handling

Implementing BLS signature aggregation in Tendermint would add complexity to
the evidence handling within Tendermint. Currently, the light client can submit
evidence of a fork attempt to the chain. This evidence consists of the
conflicting block along with the set of validators that double-signed it,
including their public keys.

We can quickly check that the listed validators double-signed by verifying
that each of their signatures is included in the submitted conflicting block. A BLS
signature scheme would change this by requiring the light client to submit
the public keys of all of the validators that signed the conflicting block so
that the aggregated signature may be checked against the full signature set.
Again, aggregated signature verification is all-or-nothing, so without all of
the public keys, we cannot verify the signature at all. These keys would be
retrievable. Any party that wanted to create a fork would want to convince a
network that its fork is legitimate, so it would need to gossip the public keys.
This does not hamper the feasibility of implementing BLS signature aggregation
into Tendermint, but does represent yet another piece of added complexity to
the associated protocols.

## Open Questions

* *Q*: Can you aggregate Ed25519 signatures in Tendermint?
* There is a suggested scheme in GitHub issue [7892][suggested-ed25519-agg],
but additional rigor would be required to fully verify its correctness.

## Current Consideration

Adopting a signature aggregation scheme presents some serious risks and costs
to the Tendermint project. It requires multiple backwards-incompatible changes
to the code, namely a change in the structure of the block and a new backwards-incompatible
signature and key type. It risks adding a new signature type for which new attack
types are still being discovered _and_ for which no industry standard, battle-tested
library yet exists.
The gains offered by this new signing scheme are modest: verification time is
marginally faster and block sizes shrink by a few kilobytes. These are
relatively minor gains in exchange for the complexity of the change and the
listed risks of the technology.

We should take a wait-and-see approach to BLS signature aggregation, monitoring
the up-and-coming projects, and consider implementing it as the libraries and
standards develop.

### References

[line-ostracon-repo]: https://github.com/line/ostracon
[line-ostracon-pr]: https://github.com/line/ostracon/pull/117
[mit-BLS-lecture]: https://youtu.be/BFwc2XA8rSk?t=2521
[gcp-storage-pricing]: https://cloud.google.com/storage/pricing#north-america_2
[yubi-key-bls-support]: https://github.com/Yubico/yubihsm-shell/issues/66
[cloud-hsm-support]: https://docs.aws.amazon.com/cloudhsm/latest/userguide/pkcs11-key-types.html
[bls-ietf]: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-bls-signature-04
[bls-ietf-terms]: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-bls-signature-04#section-1.3
[bls-ietf-pop]: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-bls-signature-04#section-3.3
[multi-signatures-smaller-blockchains]: https://eprint.iacr.org/2018/483.pdf
[ibc-tendermint]: https://github.com/cosmos/ibc/tree/master/spec/client/ics-007-tendermint-client
[zcash-adoption]: https://github.com/zcash/zcash/issues/2502
[chia-adoption]: https://github.com/Chia-Network/chia-blockchain#chia-blockchain
[bls-ietf-ecdsa-compare]: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-bls-signature-04#section-1.1
[voi-ed25519-perf]: https://github.com/williambanfield/curve25519-voi/blob/benchmark/primitives/ed25519/PERFORMANCE.txt#L79
[blst-verify-bench]: https://github.com/williambanfield/blst/blame/bench/bindings/go/PERFORMANCE.md#L9
[blst-verify-bench-agg]: https://github.com/williambanfield/blst/blame/bench/bindings/go/PERFORMANCE.md#L23
[vitalik-pairing-post]: https://medium.com/@VitalikButerin/exploring-elliptic-curve-pairings-c73c1864e627
[ledger-bls-announce]: https://www.ledger.com/first-ever-firmware-update-coming-to-the-ledger-nano-x
[commit-proto]: https://github.com/tendermint/tendermint/blob/be7cb50bb3432ee652f88a443e8ee7b8ef7122bc/proto/tendermint/types/types.proto#L121
[canonical-vote-proto]: https://github.com/tendermint/tendermint/blob/be7cb50bb3432ee652f88a443e8ee7b8ef7122bc/spec/core/encoding.md#L283
[blst]: https://github.com/supranational/blst
[prysm-blst]: https://github.com/prysmaticlabs/prysm/blob/develop/go.mod#L75
[gnark]: https://github.com/ConsenSys/gnark-crypto/
[eth-2-adoption]: https://notes.ethereum.org/@GW1ZUbNKR5iRjjKYx6_dJQ/Skxf3tNcg_
[bls-weil-pairing]: https://www.iacr.org/archive/asiacrypt2001/22480516.pdf
[summing-zero-paper]: https://eprint.iacr.org/2021/323.pdf
[circl]: https://github.com/cloudflare/circl
[light-client-evidence]: https://github.com/tendermint/tendermint/blob/a6fd1fe20116d4b1f7e819cded81cece8e5c1ac7/types/evidence.go#L245
[suggested-ed25519-agg]: https://github.com/tendermint/tendermint/issues/7892

go.mod
@@ -42,7 +42,7 @@ require (
github.com/creachadair/taskgroup v0.3.2
github.com/golangci/golangci-lint v1.45.2
github.com/google/go-cmp v0.5.8
-github.com/vektra/mockery/v2 v2.12.1
+github.com/vektra/mockery/v2 v2.12.2
gotest.tools v2.2.0+incompatible
)

go.sum

@@ -1059,8 +1059,8 @@ github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyC
github.com/valyala/fasthttp v1.30.0/go.mod h1:2rsYD01CKFrjjsvFxx75KlEUNpWNBY9JWD3K/7o2Cus=
github.com/valyala/quicktemplate v1.7.0/go.mod h1:sqKJnoaOF88V07vkO+9FL8fb9uZg/VPSJnLYn+LmLk8=
github.com/valyala/tcplisten v1.0.0/go.mod h1:T0xQ8SeCZGxckz9qRXTfG43PvQ/mcWh7FwZEA7Ioqkc=
-github.com/vektra/mockery/v2 v2.12.1 h1:BAJk2fGjVg/P9Fi+BxZD1/ZeKTOclpeAb/SKCc12zXc=
-github.com/vektra/mockery/v2 v2.12.1/go.mod h1:8vf4KDDUptfkyypzdHLuE7OE2xA7Gdt60WgIS8PgD+U=
+github.com/vektra/mockery/v2 v2.12.2 h1:JbRx9F+XcCJiDTyCm3V5lXYwl56m5ZouV6I9eZa1Dj0=
+github.com/vektra/mockery/v2 v2.12.2/go.mod h1:8vf4KDDUptfkyypzdHLuE7OE2xA7Gdt60WgIS8PgD+U=
github.com/viki-org/dnscache v0.0.0-20130720023526-c70c1f23c5d8/go.mod h1:dniwbG03GafCjFohMDmz6Zc6oCuiqgH6tGNyXTkHzXE=
github.com/vishvananda/netlink v1.1.0/go.mod h1:cTgwzPIzzgDAYoQrMm0EdrjRUBkTqKYppBueQtXaqoE=
github.com/vishvananda/netns v0.0.0-20191106174202-0a2b9b5464df/go.mod h1:JP3t17pCcGlemwknint6hfoeCVQrEMVwxRLRjXpq+BU=

issues (new file)

@@ -0,0 +1,24 @@
# Metricsgen generates metrics.md markdown file
The metricsgen tool parses and reads a metrics struct with various metrics
fields. Currently, the tool uses this data to generate the corresponding
Prometheus and no-op constructors for use within Tendermint. The same data
that is parsed to generate the constructors can trivially be used to generate
the [metrics.md]() documentation file. Autogenerating this file will reduce
the toil of keeping it up to date and make it less likely to go stale.
- [ ] metricsgen generates markdown table output using `TemplateData` data
- [ ] make target added for regenerating this markdown file
# Replace go-kit metrics metricsgen output with standard prometheus types.
Tendermint uses the `go-kit` metrics library to create and register metrics.
Tendermint only uses the standard Prometheus implementations of the various
go-kit metrics types, so we derive very little benefit from the `go-kit`
layer. It also prevents Tendermint from easily creating two instances of
`nodeImpl`, each with its own metrics. This is because `go-kit` under the hood
uses the `DefaultRegisterer` from Prometheus, which is a global object created
and managed by the Prometheus library. Migrating to plain Prometheus will
allow us to control and instantiate a separate metrics registry for each
instance of `nodeImpl` that is created.
- [ ] replace go-kit metrics with equivalent standard prometheus types


@@ -279,8 +279,7 @@ func checkRequiredHeaderFields(h *types.SignedHeader) error {
return errors.New("height in trusted header must be set (non zero")
}
-zeroTime := time.Time{}
-if h.Time == zeroTime {
+if h.Time.IsZero() {
return errors.New("time in trusted header must be set")
}


@@ -1,6 +1,7 @@
# fuzz
-Fuzzing for various packages in Tendermint using [go-fuzz](https://github.com/dvyukov/go-fuzz) library.
+Fuzzing for various packages in Tendermint using the fuzzing infrastructure included in
+Go 1.18.
Inputs: