We adjust the test to work with the configuration option
`rf_rack_valid_keyspaces` enabled. For that, we ensure that there is
always at least one node in each of the three racks. This way, all
keyspaces we create and manipulate will remain RF-rack-valid since they
all use RF=3.
------------------------------------------------------------------------
To achieve that, we only need to adjust the following events:
1. `init_tablet_transfer`
The event creates a new keyspace and table and manually migrates
a tablet belonging to it. As long as we make sure the migration occurs
within the same rack, there will be no problem.
Since RF == #racks, each rack will have exactly one tablet replica,
so we can migrate the tablet to an arbitrary node in the same rack.
Note that there must exist a node that's not a replica. If there weren't
such a node, the test wouldn't have worked before this commit because
it's not possible to migrate a tablet from one node being its replica to
another. In other words, we have a guarantee that there are at least 4 nodes
in the cluster when we try to migrate a tablet replica.
That said, we check it anyway. If there's no viable node to migrate the
tablet replica to, we log that information and do nothing. That should be
an acceptable solution.
2. `add_new_node`
As long as we add a node to an existing rack, there's no way to
violate the invariant imposed by the configuration option, so we pick
a random rack out of the existing three and create a node in it.
3. `decommission_node`
We need to ensure that the node we'll be trying to decommission is
not the only one in its rack.
Following pretty much the same reasoning as in `init_tablet_transfer`,
we conclude there must be a rack with at least two nodes in it. Otherwise
we'd end up having to migrate a tablet from one replica node to another,
which is not possible.
What's more, decommissioning a node is not possible if any node in
the cluster is dead, so we can assume that `manager.running_servers`
returns the whole cluster.
4. `remove_node`
The same as `decommission_node`. Just note although the node we choose to
remove must be first stopped, none other node can be dead, so the whole
cluster must be returned by `manager.running_servers`.
------------------------------------------------------------------------
There's one more important thing to note. The test may sometimes trigger
a sequence of events where a new node is started, but, due to an error
injection, its initialization is not completed. Among other things, the
node may NOT have a host ID recognized by the rest of the nodes in the
cluster, and operations like tablet migration will fail if they target
it.
Thankfully, there seems to be a way to avoid problems stemming from
that. When a new node is added to the cluster, it should appear at the
end of the list returned by `manager.running_servers`. This most likely
stems from how dictionaries work in Python:
"Keys and values are iterated over in insertion order."
-- https://docs.python.org/3/library/stdtypes.html#dict-views
and the fact that we keep track of running servers using a dictionary.
Furthermore, we rely on the assumption that the test currently works
correctly.
Assume, to the contrary, that among the nodes taking part in the operations
listed above, there is at most one node per rack that has its host ID recognized
by the rest of the cluster. Note that only those nodes can store any tablets.
Let's refer to the set of those nodes as X.
Assume that we're dealing with tablet migration, decommissioning, or removing
a node. Since those operations involve tablet migration, at least one tablet
will need to be migrated from the node in question to another node in X.
However, since X consists of at most three nodes, and one of them is losing
its tablet, there is no viable target for the tablet, so the operation fails.
Using those assumptions, an auxiliary function, `select_viable_rack`,
was designed to carefully choose a correct rack, which we'll then pick nodes
from to perform the topological operations. It's simple: we just find the first
rack in the list that has at least two nodes in it. That should ensure that we
perform an operation that doesn't lead to any unforeseen disaster.
------------------------------------------------------------------------
Since the test effectively becomes more complex due to more care for keeping
the topology of the cluster valid, we extend the log messages to make them
more helpful when debugging a failure.
The Raft topology workflow was changed by the limited voters feature:
nodes no longer request votership themselves. As a result, the
"stop_before_becoming_raft_voter" error injection is now obsolete and
has been removed.
Fixes: scylladb/scylladb#23418
The order of entries in the ERROR_INJECTIONS list determines test
repeatability for a given random seed.
To allow removing error injections without affecting the order of the
remaining ones, removed injections are now renamed with a "REMOVED_"
prefix instead of being deleted.
This ensures they are ignored by the tests, while the sequence of active
injections—and thus test reproducibility—remains unchanged.
Rework `ScyllaLogFile.wait_for()` method to make it easier
to add required methods to ScyllaNode class of ccm-like shim.
Also, added `ScyllaLogFile.grep_for_errors()` method and
reworked `ScyllaLogFile.grep()`
Some of the tests in the test suite have proven to be more problematic
in adjusting to RF-rack-validity. Since we'd like to run as many tests
as possible with the `rf_rack_valid_keyspaces` configuration option
enabled, let's disable it in those. In the following commit, we'll enable
it by default.
The workflow of becoming a voter changes with the "limited voters"
feature, as the node will no longer become a voter on its own, but the
votership is being managed by the topology coordinator. This therefore
breaks the `stop_before_becoming_raft_voter` test, as that injection
relies on the old behavior.
We will disable the test for this particular case for now and address
either fixing of complete removal of the test in a follow-up task.
Refs: scylladb/scylladb#23418