The test `test_sync_point` had a few shortcomings that made it flaky
or simply wrong:
1. We were verifying that hints were written by checking the size of
in-flight hints. However, that could potentially lead to problems
in rare situations.
For instance, if all of the hints failed to be written to disk, the
size of in-flight hints would drop to zero, but creating a sync point
would correspond to the empty state.
In such a situation, we should fail immediately and indicate what
the cause was.
2. A sync point corresponds to the hints that have already been written
to disk. The number of those is tracked by the metric `written`.
It's a much more reliable way to make sure that hints have been
written to the commitlog. That ensures that the sync point we'll
create will really correspond to those hints.
3. The auxiliary function `wait_for` used in the test works like this:
it executes the passed callback and looks at the result. If it's
`None`, it retries it. Otherwise, the callback is deemed to have
finished its execution and no further retries will be attempted.
Before this commit, we simply returned a bool, and so the code was
wrong. We improve it.
---
Note that this fixes scylladb/scylladb#28203, which was a manifestation
of scylladb/scylladb#25879. We created a sync point that corresponded
to the empty state, and so it immediately resolved, even when node 3
was still dead.
As a bonus, we rewrite the auxiliary code responsible for fetching
metrics and manipulating sync points. Now it's asynchronous and
uses the existing standard mechanisms available to developers.
Furthermore, we reduce the time needed for executing
`test_sync_point` by 27 seconds.
---
The total difference in time needed to execute the whole test file
(on my local machine, in dev mode):
Before:
CPU utilization: 0.9%
real 2m7.811s
user 0m25.446s
sys 0m16.733s
After:
CPU utilization: 1.1%
real 1m40.288s
user 0m25.218s
sys 0m16.566s
---
Refs scylladb/scylladb#25879
Fixes scylladb/scylladb#28203
Backport: This improves the stability of our CI, so let's
backport it to all supported versions.
- (cherry picked from commit 628e74f157)
- (cherry picked from commit ac4af5f461)
- (cherry picked from commit c5239edf2a)
- (cherry picked from commit a256ba7de0)
- (cherry picked from commit f83f911bae)
Parent PR: #28602
Closes scylladb/scylladb#28622
* github.com:scylladb/scylladb:
test: cluster: Reduce wait time in test_sync_point
test: cluster: Fix test_sync_point
test: cluster: Await sync points asynchronously
test: cluster: Create sync points asynchronously
test: cluster: Fetch hint metrics asynchronously
Scylla in-source tests.
For details on how to run the tests, see docs/dev/testing.md
Shared C++ utils, libraries are in lib/, for Python - pylib/
alternator - Python tests which connect to a single server and use the DynamoDB API unit, boost, raft - unit tests in C++ cqlpy - Python tests which connect to a single server and use CQL topology* - tests that set up clusters and add/remove nodes cql - approval tests that use CQL and pre-recorded output rest_api - tests for Scylla REST API Port 9000 scylla-gdb - tests for scylla-gdb.py helper script nodetool - tests for C++ implementation of nodetool
If you can use an existing folder, consider adding your test to it. New folders should be used for new large categories/subsystems, or when the test environment is significantly different from some existing suite, e.g. you plan to start scylladb with different configuration, and you intend to add many tests and would like them to reuse an existing Scylla cluster (clusters can be reused for tests within the same folder).
To add a new folder, create a new directory, and then
copy & edit its suite.ini.