mirror of
https://github.com/tendermint/tendermint.git
synced 2026-01-08 14:21:14 +00:00
While the name including an apostrophe is perfectly legal, it turns out to confuse some tools that don't escape things properly. Rename to remove the apostrophe rather than fight the world. Fixes #7836.
140 lines
5.5 KiB
Markdown
140 lines
5.5 KiB
Markdown
# RFC 008: Don't Panic
|
|
|
|
## Changelog
|
|
|
|
- 2021-12-17: initial draft (@tychoish)
|
|
|
|
## Abstract
|
|
|
|
Today, the Tendermint core codebase has panics in a number of cases as
|
|
a response to exceptional situations. These panics complicate testing,
|
|
and might make tendermint components difficult to use as a library in
|
|
some circumstances. This document outlines a project of converting
|
|
panics to errors and describes the situations where its safe to
|
|
panic.
|
|
|
|
## Background
|
|
|
|
Panics in Go are a great mechanism for aborting the current execution
|
|
for truly exceptional situations (e.g. memory errors, data corruption,
|
|
processes initialization); however, because they resemble exceptions
|
|
in other languages, it can be easy to over use them in the
|
|
implementation of software architectures. This certainly happened in
|
|
the history of Tendermint, and as we embark on the project of
|
|
stabilizing the package, we find ourselves in the right moment to
|
|
reexamine our use of panics, and largely where panics happen in the
|
|
code base.
|
|
|
|
There are still some situations where panics are acceptable and
|
|
desireable, but it's important that Tendermint, as a project, comes to
|
|
consensus--perhaps in the text of this document--on the situations
|
|
where it is acceptable to panic.
|
|
|
|
### References
|
|
|
|
- [Defer Panic and Recover](https://go.dev/blog/defer-panic-and-recover)
|
|
- [Why Go gets exceptions right](https://dave.cheney.net/tag/panic)
|
|
- [Don't panic](https://dave.cheney.net/practical-go/presentations/gophercon-singapore-2019.html#_dont_panic)
|
|
|
|
## Discussion
|
|
|
|
### Acceptable Panics
|
|
|
|
#### Initialization
|
|
|
|
It is unambiguously safe (and desireable) to panic in `init()`
|
|
functions in response to any kind of error. These errors are caught by
|
|
tests, and occur early enough in process initialization that they
|
|
won't cause unexpected runtime crashes.
|
|
|
|
Other code that is called early in process initialization MAY panic,
|
|
in some situations if it's not possible to return an error or cause
|
|
the process to abort early, although these situations should be
|
|
vanishingly slim.
|
|
|
|
#### Data Corruption
|
|
|
|
If Tendermint code encounters an inconsistency that could be
|
|
attributed to data corruption or a logical impossibility it is safer
|
|
to panic and crash the process than continue to attempt to make
|
|
progress in these situations.
|
|
|
|
Examples including reading data out of the storage engine that
|
|
is invalid or corrupt, or encountering an ambiguous situation where
|
|
the process should halt. Generally these forms of corruption are
|
|
detected after interacting with a trusted but external data source,
|
|
and reflect situations where the author thinks its safer to terminate
|
|
the process immediately rather than allow execution to continue.
|
|
|
|
#### Unrecoverable Consensus Failure
|
|
|
|
In general, a panic should be used in the case of unrecoverable
|
|
consensus failures. If a process detects that the network is
|
|
behaving in an incoherent way and it does not have a clearly defined
|
|
and mechanism for recovering, the process should panic.
|
|
|
|
#### Static Validity
|
|
|
|
It is acceptable to panic for invariant violations, within a library
|
|
or package, in situations that should be statically impossible,
|
|
because there is no way to make these kinds of assertions at compile
|
|
time.
|
|
|
|
For example, type-asserting `interface{}` values returned by
|
|
`container/list` and `container/heap` (and similar), is acceptable,
|
|
because package authors should have exclusive control of the inputs to
|
|
these containers. Packages should not expose the ability to add
|
|
arbitrary values to these data structures.
|
|
|
|
#### Controlled Panics Within Libraries
|
|
|
|
In some algorithms with highly recursive structures or very nested
|
|
call patterns, using a panic, in combination with conditional recovery
|
|
handlers results in more manageable code. Ultimately this is a limited
|
|
application, and implementations that use panics internally should
|
|
only recover conditionally, filtering out panics rather than ignoring
|
|
or handling all panics.
|
|
|
|
#### Request Handling
|
|
|
|
Code that handles responses to incoming/external requests
|
|
(e.g. `http.Handler`) should avoid panics, but practice this isn't
|
|
totally possible, and it makes sense that request handlers have some
|
|
kind of default recovery mechanism that will prevent one request from
|
|
terminating a service.
|
|
|
|
### Unacceptable Panics
|
|
|
|
In **no** other situation is it acceptable for the code to panic:
|
|
|
|
- there should be **no** controlled panics that callers are required
|
|
to handle across library/package boundaries.
|
|
- callers of library functions should not expect panics.
|
|
- ensuring that arbitrary go routines can't panic.
|
|
- ensuring that there are no arbitrary panics in core production code,
|
|
espically code that can run at any time during the lifetime of a
|
|
process.
|
|
- all test code and fixture should report normal test assertions with
|
|
a mechanism like testify's `require` assertion rather than calling
|
|
panic directly.
|
|
|
|
The goal of this increased "panic rigor" is to ensure that any escaped
|
|
panic is reflects a fixable bug in Tendermint.
|
|
|
|
### Removing Panics
|
|
|
|
The process for removing panics involve a few steps, and will be part
|
|
of an ongoing process of code modernization:
|
|
|
|
- converting existing explicit panics to errors in cases where it's
|
|
possible to return an error, the errors can and should be handled, and returning
|
|
an error would not lead to data corruption or cover up data
|
|
corruption.
|
|
|
|
- increase rigor around operations that can cause runtime errors, like
|
|
type assertions, nil pointer errors, array bounds access issues, and
|
|
either avoid these situations or return errors where possible.
|
|
|
|
- remove generic panic handlers which could cover and hide known
|
|
panics.
|