mirror of
https://github.com/tendermint/tendermint.git
synced 2026-01-03 11:45:18 +00:00
rfc: e2e improvements (#6941)
This commit is contained in:
@@ -40,5 +40,6 @@ sections.
|
||||
- [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
|
||||
- [RFC-001: Storage Engines](./rfc-001-storage-engine.rst)
|
||||
- [RFC-002: Interprocess Communication](./rfc-002-ipc-ecosystem.md)
|
||||
- [RFC-004: E2E Test Framework Enhancements](./rfc-004-e2e-framework.md)
|
||||
|
||||
<!-- - [RFC-NNN: Title](./rfc-NNN-title.md) -->
|
||||
|
||||
213
docs/rfc/rfc-004-e2e-framework.rst
Normal file
213
docs/rfc/rfc-004-e2e-framework.rst
Normal file
@@ -0,0 +1,213 @@
|
||||
========================================
|
||||
RFC 004: E2E Test Framework Enhancements
|
||||
========================================
|
||||
|
||||
Changelog
|
||||
---------
|
||||
|
||||
- 2021-09-14: started initial draft (@tychoish)
|
||||
|
||||
Abstract
|
||||
--------
|
||||
|
||||
This document discusses a series of improvements to the e2e test framework
|
||||
that we can consider during the next few releases to help boost confidence in
|
||||
Tendermint releases, and improve developer efficiency.
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
During the 0.35 release cycle, the E2E tests were a source of great
|
||||
value, helping to identify a number of bugs before release. At the same time,
|
||||
the tests were not consistently passing during this time, thereby reducing
|
||||
their value, and forcing the core development team to allocate time and energy
|
||||
to maintaining and chasing down issues with the e2e tests and the test
|
||||
harness. The experience of this release cycle calls to mind a series of
|
||||
improvements to the test framework, and this document attempts to capture
|
||||
these improvements, along with motivations, and potential for impact.
|
||||
|
||||
Projects
|
||||
--------
|
||||
|
||||
Flexible Workload Generation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Presently the e2e suite contains a single workload generation pattern, which
|
||||
exists simply to ensure that the test networks have some work during their
|
||||
runs. However, the shape and volume of the work is very consistent and is very
|
||||
gentle to help ensure test reliability.
|
||||
|
||||
We don't need a complex workload generation framework, but being able to have
|
||||
a few different workload shapes available for test networks, both generated and
|
||||
hand-crafted, would be useful.
|
||||
|
||||
Workload patterns/configurations might include:
|
||||
|
||||
- transaction targeting patterns (include light nodes, round robin, target
|
||||
individual nodes)
|
||||
|
||||
- variable transaction size over time.
|
||||
|
||||
- transaction broadcast option (synchronously, checked, fire-and-forget,
|
||||
mixed).
|
||||
|
||||
- number of transactions to submit.
|
||||
|
||||
- non-transaction workloads: (evidence submission, query, event subscription.)
|
||||
|
||||
Configurable Generator
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The nightly e2e suite is defined by the `testnet generator
|
||||
<https://github.com/tendermint/tendermint/blob/master/test/e2e/generator/generate.go#L13-L65>`_,
|
||||
and it's difficult to add dimensions or change the focus of the test suite in
|
||||
any way without modifying the implementation of the generator. If the
|
||||
generator were more configurable, potentially via a file rather than in
|
||||
the Go implementation, we could modify the focus of the test suite on the
|
||||
fly.
|
||||
|
||||
Features that we might want to configure:
|
||||
|
||||
- number of test networks to generate of various topologies, to improve
|
||||
coverage of different configurations.
|
||||
|
||||
- test application configurations (to modify the latency of ABCI calls, etc.)
|
||||
|
||||
- size of test networks.
|
||||
|
||||
- workload shape and behavior.
|
||||
|
||||
- initial sync and catch-up configurations.
|
||||
|
||||
The workload generator currently provides runtime options for limiting the
|
||||
generator to specific types of P2P stacks, and for generating multiple groups
|
||||
of test cases to support parallelism. The goal is to extend this pattern and
|
||||
avoid hardcoding the matrix of test cases in the generator code. Once the
|
||||
testnet configuration generation behavior is configurable at runtime,
|
||||
developers may be able to use the e2e framework to validate changes before
|
||||
landing changes that break e2e tests a day later.
|
||||
|
||||
In addition to the autogenerated suite, it might make sense to maintain a
|
||||
small collection of hand-crafted cases that exercise configurations of
|
||||
concern, to run as part of the nightly (or less frequent) loop.
|
||||
|
||||
Implementation Plan Structure
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
As a development team, we should determine the features should impact the e2e
|
||||
testing early in the development cycle, and if we intend to modify the e2e
|
||||
tests to exercise a feature, we should identify this early and begin the
|
||||
integration process as early as possible.
|
||||
|
||||
To facilitate this, we should adopt a practice whereby we exercise specific
|
||||
features that are currently under development more rigorously in the e2e
|
||||
suite, and then as development stabilizes we can reduce the number or weight
|
||||
of these features in the suite.
|
||||
|
||||
As of 0.35 there are essentially two end to end tests: the suite of 64
|
||||
generated test networks, and the hand crafted `ci.toml` test case. The
|
||||
generated test cases help provide systemtic coverage, while the `ci` run
|
||||
provides coverage for a large number of features.
|
||||
|
||||
Reduce Cycle Time
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
One of the barriers to leveraging the e2e framework, and one of the challenges
|
||||
in debugging failures, is the cycle time of running a single test iteration is
|
||||
quite high: 5 minutes to build the docker image, plus the time to run the test
|
||||
or tests.
|
||||
|
||||
There are a number of improvements and enhancements that can reduce the cycle
|
||||
time in practice:
|
||||
|
||||
- reduce the amount of time required to build the docker image used in these
|
||||
tests. Without the dependency on CGo, the tendermint binaries could be
|
||||
(cross) compiled outside of the docker container and then injected into
|
||||
them, which would take better advantage of docker's native caching,
|
||||
although, without the dependency on CGo there would be no hard requirement
|
||||
for the e2e tests to use docker.
|
||||
|
||||
- support test parallelism. Because of the way the testnets are orchestrated
|
||||
a single system can really only run one network at a time. For executions
|
||||
(local or remote) with more resources, there's no reason to run a few
|
||||
networks in parallel to reduce the feedback time.
|
||||
|
||||
- prune testnet configurations that are unlikely to provide good signal, to
|
||||
shorten the time to feedback.
|
||||
|
||||
- apply some kind of tiered approach to test execution, to improve the
|
||||
legibility of the test result. For example order tests by the dependency of
|
||||
their features, or run test networks without perturbations before running
|
||||
that configuration with perturbations, to be able to isolate the impact of
|
||||
specific features.
|
||||
|
||||
- orchestrate the test harness directly from go test rather than via a special
|
||||
harness and shell scripts so e2e tests may more naively fit into developers
|
||||
existing workflows.
|
||||
|
||||
Many of these improvements, particularly, reducing the build time will also
|
||||
reduce the time to get feedback during automated builds.
|
||||
|
||||
Deeper Insights
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
When a test network fails, it's incredibly difficult to understand _why_ the
|
||||
network failed, as the current system provides very little insight into the
|
||||
system outside of the process logs. When a test network stalls or fails
|
||||
developers should be able to quickly and easily get a sense of the state of
|
||||
the network and all nodes.
|
||||
|
||||
Improvements in persuit of this goal, include functionality that would help
|
||||
node operators in production environments by improving the quality and utility
|
||||
of the logging messages and other reported metrics, but also provide some
|
||||
tools to collect and aggregate this data for developers in the context of test
|
||||
networks.
|
||||
|
||||
- Interleave messages from all nodes in the network to be able to correlate
|
||||
events during the test run.
|
||||
|
||||
- Collect structured metrics of the system operation (CPU/MEM/IO) during the
|
||||
test run, as well as from each tendermint/application process.
|
||||
|
||||
- Build (simple) tools to be able to render and summarize the data collected
|
||||
during the test run to answer basic questions about test outcome.
|
||||
|
||||
Flexible Assertions
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Currently, all assertions run for every test network, which makes the
|
||||
assertions pretty bland, and the framework primarily useful as a smoke-test
|
||||
framework, but it might be useful to be able to write and run different
|
||||
tests for different configurations. This could allow us to test outside of the
|
||||
happy-path.
|
||||
|
||||
In general our existing assertions occupy a fraction of the total test time,
|
||||
so the relative cost of adding a few extra test assertions would be of limited
|
||||
cost, and could help build confidence.
|
||||
|
||||
Additional Kinds of Testing
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The existing e2e suite, exercises networks of nodes that have homogeneous
|
||||
tendermint version, stable configuration, that are expected to make
|
||||
progress. There are many other possible test configurations that may be
|
||||
interesting to engage with. These could include dimensions, such as:
|
||||
|
||||
- Multi-version testing to exercise our compatibility guarantees for networks
|
||||
that might have different tendermint versions.
|
||||
|
||||
- As a flavor or mult-version testing, include upgrade testing, to build
|
||||
confidence in migration code and procedures.
|
||||
|
||||
- Additional test applications, particularly practical-type applciations
|
||||
including some that use gaiad and/or the cosmos-sdk. Test-only applications
|
||||
that simulate other kinds of applications (e.g. variable application
|
||||
operation latency.)
|
||||
|
||||
- Tests of "non-viable" configurations that ensure that forbidden combinations
|
||||
lead to halts.
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
- `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_
|
||||
Reference in New Issue
Block a user