Add a get_non_coordinator_host() function that returns the
ServerInfo of the first host that is not a coordinator,
or None if there is no such host.
Also rework get_coordinator_host() to not fail if some
of the hosts don't have a host id.
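A minimal sketch of what such a helper could look like, assuming `manager.running_servers()` lists live nodes and `get_coordinator_host()` has the reworked, non-failing behavior; the attribute and helper names are assumptions, not the exact implementation:
```python
from typing import Optional

async def get_non_coordinator_host(manager) -> Optional["ServerInfo"]:
    """Return ServerInfo for the first live host that is not the topology
    coordinator, or None if no such host exists."""
    coordinator = await get_coordinator_host(manager)  # assumed helper
    for srv in await manager.running_servers():
        if coordinator is None or srv.server_id != coordinator.server_id:
            return srv
    return None
```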
In the following commit, we modify `test_topology_recovery_basic`
to test the recovery mode in the presence of live zero-token nodes.
Unfortunately, it requires a somewhat ugly workaround. The Python
driver ignores zero-token nodes if it also connects to other nodes,
because their token column in the `system.peers` table is empty. In that
test, we must connect to a zero-token node to enter the recovery
mode and purge the Raft data. Hence, we use different CQL sessions
for different nodes.
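For illustration, the per-node session could be created along these lines with the Python driver; the node address variable and the exact statement used to enter recovery are assumptions here, not the test's literal code:
```python
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import WhiteListRoundRobinPolicy

# A dedicated session pinned to the zero-token node; the shared session,
# which also talks to token-owning nodes, never routes requests to it.
profile = ExecutionProfile(
    load_balancing_policy=WhiteListRoundRobinPolicy([zero_token_ip]))
with Cluster(contact_points=[zero_token_ip],
             execution_profiles={EXEC_PROFILE_DEFAULT: profile}) as cluster:
    session = cluster.connect()
    # Hypothetical recovery-mode entry point; the real test performs its
    # recovery steps and Raft data purge through this pinned session.
    session.execute("UPDATE system.scylla_local SET value = 'recovery' "
                    "WHERE key = 'group0_upgrade_state'")
```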
In the future, we may change the Python driver behavior and revert
this workaround. Moreover, the recovery tests will be removed or
significantly changed when we implement the manual recovery tool.
Therefore, we shouldn't worry about this workaround too much.
Before we use `check_system_topology_and_cdc_generations_v3_consistency`
in a test with a zero-token node, we must ensure it doesn't fail
because of zero tokens in a row of the `system.topology` table.
In one of the following patches, we reuse the helper functions from
`test_topology_ops` in a new test, so we move them to `util.py`.
Also, we add the `cl` parameter to `start_writes`, as the new test
will use `cl=2`.
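A rough sketch of how the `cl` parameter might be threaded into the background INSERTs, assuming the harness's awaitable `run_async` session wrapper and a pre-created `ks.t` table (names and shapes are assumptions):
```python
import asyncio
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

async def start_writes(cql, cl: int = ConsistencyLevel.ONE, concurrency: int = 3):
    """Start background INSERTs at consistency level `cl`; returns a
    callback that stops the writers and waits for them to finish."""
    stmt = SimpleStatement("INSERT INTO ks.t (pk, v) VALUES (%s, %s)",
                           consistency_level=cl)
    stop = asyncio.Event()

    async def do_writes(worker: int) -> None:
        key = worker
        while not stop.is_set():
            await cql.run_async(stmt, (key, key))
            key += concurrency

    tasks = [asyncio.create_task(do_writes(i)) for i in range(concurrency)]

    async def finish() -> None:
        stop.set()
        await asyncio.gather(*tasks)

    return finish
```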
Replaced the old `read_barrier` helper from `test/pylib/util.py`
with the new helper in `test/pylib/rest_client.py`, which calls
the newly introduced direct REST API.
Updated all relevant tests and removed the old helper.
Introduced a new helper, `get_host_api_address`, to retrieve the host API
address, which in some cases can differ from the host address
(e.g. if the RPC address is changed).
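A hedged sketch of how the two pieces might fit together; the method names and where they live on the manager/REST client are assumptions:
```python
async def barrier_on(manager, server) -> None:
    # The REST API may listen on a different address than the RPC address,
    # so resolve it first (hypothetical helper introduced by this patch).
    api_addr = await manager.get_host_api_address(server)
    # Hypothetical REST-client call wrapping the direct read-barrier endpoint.
    await manager.api.read_barrier(api_addr)
```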
Fixes: scylladb/scylladb#19662
Closes scylladb/scylladb#19739
Fetching only the first page is not the intuitive behavior expected by users.
This causes flakiness in some tests that generate a variable number of
keys depending on execution speed and later verify, using a single
SELECT statement, that all keys were written. When the number of keys
grows larger than the page size, the test fails.
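With the plain Python driver the distinction looks like this; draining the result set, rather than looking at the first page only, is what the verifying SELECT needs:
```python
from cassandra.query import SimpleStatement

stmt = SimpleStatement("SELECT pk FROM ks.t", fetch_size=1000)
rs = session.execute(stmt)
first_page = len(rs.current_rows)   # only the first page; may undercount
all_rows = list(rs)                 # iteration transparently fetches the rest
assert len(all_rows) >= first_page
```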
Fixes #18774
Closes scylladb/scylladb#19004
This patch allows us to restart writing (to the same table with
CDC enabled) with a new CQL session. It is useful when we want to
continue writing after closing the first CQL session, which
happens during the `reconnect_driver` call. We must stop writing
before calling `reconnect_driver`. If a write started just before
the first CQL session was closed, it would time out on the client.
We rename `finish_and_verify` to `stop_and_verify`, which is a
better name after introducing `restart`.
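For illustration, the writer could be shaped roughly like this; the class and table names are hypothetical, and the awaitable `run_async` session wrapper is an assumption:
```python
import asyncio

class CdcWriter:
    """Background writer that can be stopped before reconnect_driver()
    closes the CQL session and restarted with the new session."""
    def __init__(self, cql):
        self._cql = cql
        self._stop = asyncio.Event()
        self._task = None
        self._next_key = 0

    def start(self) -> None:
        self._stop.clear()
        self._task = asyncio.create_task(self._write_loop())

    async def _write_loop(self) -> None:
        while not self._stop.is_set():
            await self._cql.run_async(
                "INSERT INTO ks.t (pk, v) VALUES (%s, %s)",
                (self._next_key, self._next_key))
            self._next_key += 1

    async def stop(self) -> None:
        self._stop.set()
        await self._task

    def restart(self, new_cql) -> None:
        # Continue writing to the same CDC-enabled table with a fresh
        # session, e.g. after reconnect_driver() replaced the driver.
        self._cql = new_cql
        self.start()

    async def stop_and_verify(self) -> None:
        await self.stop()
        rows = list(await self._cql.run_async("SELECT pk FROM ks.t"))
        assert len(rows) == self._next_key
```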
If the coordinator node is killed, restarted, or becomes inoperable
during a topology operation, a new coordinator should be elected,
the operation should be aborted, and the cluster should be rolled back.
Error injection will be used to kill the coordinator before streaming
starts.
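For illustration, the kill could be wired up roughly like this; the injection name is hypothetical, while `enable_injection` and `server_add` follow the harness's usual shape:
```python
coordinator = await get_coordinator_host(manager)
# One-shot injection that makes the coordinator crash right before
# streaming begins (hypothetical injection name).
await manager.api.enable_injection(coordinator.ip_addr,
                                   "crash_coordinator_before_stream",
                                   one_shot=True)
# Start a topology operation that streams data; a new coordinator should
# take over, abort the operation and roll the cluster back.
await manager.server_add()
```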
Closes scylladb/scylladb#16197
test.py uses `ring_delay_ms = 0` by default. CDC computes a new generation's
timestamp by adding `ring_delay_ms` to the current time.
In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.
Because of `ring_delay_ms = 0`, a generation could be committed only at a
point when it should already have been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Writes made during such a generation can be assigned a wrong generation
or can fail. A failure may occur if a write hits the short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero `ring_delay_ms` this is not a problem, because
during this time window the generation should not yet be in use.
A write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set `ring_delay_ms` to 15000 in the debug mode and 5000 in other modes.
Wait for the last generation to be in use, then sleep one second to make sure
there are writes to the CDC table in this generation.
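In the harness this amounts to passing a per-mode `ring_delay_ms` when starting servers; a short sketch, where the `mode` variable and config plumbing are assumptions:
```python
# Use a larger ring delay in the slow debug mode, a smaller one elsewhere.
ring_delay_ms = 15000 if mode == 'debug' else 5000
servers = [await manager.server_add(config={'ring_delay_ms': ring_delay_ms})
           for _ in range(3)]
```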
Fixes#17977
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that CDC keeps working correctly during the upgrade to the Raft-based topology.
A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.
Ref #17409
Closes scylladb/scylladb#17828
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
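A minimal sketch of such a wait, polling `system.topology` until `unpublished_cdc_generations` is drained; the query shape, helper name, and awaitable `run_async` wrapper are assumptions:
```python
import asyncio
import time

async def wait_until_cdc_generations_are_published(cql, deadline: float) -> None:
    """Wait until the CDC generation publisher has finished its job on the
    queried node, i.e. unpublished_cdc_generations is empty."""
    while True:
        rows = await cql.run_async(
            "SELECT unpublished_cdc_generations FROM system.topology")
        if all(not r.unpublished_cdc_generations for r in rows):
            return
        assert time.time() < deadline, "CDC generation publisher did not finish"
        await asyncio.sleep(0.1)
```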
Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621
Closes scylladb/scylladb#17622
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.
In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of the set of generations stored in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
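The hardened check boils down to a subset assertion along these lines; the column shapes (in particular the (timestamp, id) tuples in `committed_cdc_generations`) and table layout are assumptions based on the names in the text:
```python
rows = await cql.run_async(
    "SELECT committed_cdc_generations FROM system.topology")
committed = {gen_id for r in rows if r.committed_cdc_generations
             for _, gen_id in r.committed_cdc_generations}
stored = {r.id for r in await cql.run_async(
    "SELECT id FROM system.cdc_generations_v3")}
# Every committed generation must still have its data in CDC_GENERATIONS_V3.
assert committed <= stored
```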
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.
This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.
This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.
After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now.
Adds three tests for the new upgrade procedure:
- test_topology_upgrade - upgrades a cluster operating in legacy mode to
use raft topology operations,
- test_topology_recovery_basic - performs recovery on a three-node
cluster, no node removal is done,
- test_topology_majority_loss - simulates a majority loss scenario, i.e.
removes two nodes out of three, performs recovery to rebuild the
raft topology state and re-add two nodes back.
The issues mentioned in the comment before are already fixed.
Unfortunately, there is another, opposite issue that this function can
be used to work around. The previous issue was about the existing driver
session not reconnecting; the current issue is about the existing driver
session reconnecting too much (and in the middle of queries).
We move all used util functions from topology_raft_disabled to
topology before we remove topology_raft_disabled. After this
change, util.py in topology will be the single util file for all
topology tests.
Some util functions in topology_raft_disabled aren't used anymore.
We don't move such functions and remove them instead.
The `reconnect_driver` function will be useful outside the
`topology_raft_disabled` test suite - namely, for cluster feature tests
in `topology`. The best course of action for this function would be to
put it into pylib utils; however, the function depends on ManagerClient,
which is defined in `test.pylib.manager_client`, which in turn depends on
`test.pylib.utils`; therefore we cannot put it there, as it would cause
an import cycle. The `topology.utils` module sounds like the next best
thing.
In addition, the docstring comment is updated to reflect that this
function will now be used to work around another issue as well.
`RandomTables.verify_schema` is often called in topology tests after
performing a schema change. It compares the schema tables fetched from
some node to the expected latest schema stored by the `RandomTables`
object.
However, there's no guarantee that the latest schema change has already
propagated to the node which we query. We could have performed the
schema change on a different node and the change may not have been
applied yet on all nodes.
To fix that, pick a specific node and perform a read barrier on it, then
use that node to fetch the schema tables.
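A sketch of the fix, assuming the harness's usual helpers for pinning a host and issuing a read barrier on it; the helper names, the `host=` routing parameter, and the awaitable `run_async` wrapper are assumptions:
```python
import time

async def fetch_schema_from(manager, cql, server):
    # Pin a specific node and make sure it has caught up with group0
    # before reading its schema tables.
    host = (await wait_for_cql_and_get_hosts(cql, [server], time.time() + 60))[0]
    await read_barrier(cql, host)  # assumed barrier helper targeting that host
    return await cql.run_async(
        "SELECT * FROM system_schema.tables", host=host)
    # The caller compares the returned rows against the latest schema
    # stored by the RandomTables object.
```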
Fixes #13788
Closes #13789
Raft replication doesn't guarantee that all replicas see
identical Raft state at all times; it only guarantees the
same order of events on all replicas.
When comparing raft state with gossip state on a node, first
issue a read barrier to ensure the node has the latest raft state.
To issue a read barrier it is sufficient to run a DDL statement that
alters non-existent state: in order to validate the DDL, the node needs
to sync with the leader and fetch the latest group0 state.
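One way such a barrier can be issued from the tests, sketched under the assumption of the harness's awaitable `run_async` wrapper and a `host` routing parameter; the exact no-op statement is illustrative:
```python
async def read_barrier(cql, host) -> None:
    """Make `host` catch up with the latest group0 state. Validating a
    schema-altering statement forces a sync with the Raft leader, even
    if the statement itself changes nothing."""
    # Illustrative no-op DDL: the table can never exist, so nothing is
    # dropped, but validation still performs the barrier on `host`.
    await cql.run_async("DROP TABLE IF EXISTS nosuchkeyspace.nosuchtable",
                        host=host)
```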
Fixes#13518 (flaky topology test).
Closes#13756