Incorrect passing of the artifacts_dir_url parameter from test.py to pytest leads to the situation when it will pass None as a string and pytest will generate incorrect URL.
In CI test always executed with option --repeat=3 that leads to generate 3 test results with the same name. Junit plugin in CI cannot distinguish correctly the difference between these results. In case when we have two passes and one fail, the link to test result will sometimes be redirected to the incorrect one because the test name is the same.
To fix this ReportPlugin added that will be responsible to modify the test case name during junit report generation adding to the test name mode and run id.
Fixes: https://github.com/scylladb/scylladb/issues/17851
Fixes: https://github.com/scylladb/scylladb/issues/15973
Fetching only the first page is not the intuitive behavior expected by users.
This causes flakiness in some tests which generate variable amount of
keys depending on execution speed and verify later that all keys were
written using a single SELECT statement. When the amount of keys
becomes larger than page size, the test fails.
Fixes#18774Closesscylladb/scylladb#19004
This patch allows us to restart writing (to the same table with
CDC enabled) with a new CQL session. It is useful when we want to
continue writing after closing the first CQL session, which
happens during the `reconnect_driver` call. We must stop writing
before calling `reconnect_driver`. If a write started just before
the first CQL session was closed, it would time out on the client.
We rename `finish_and_verify` - `stop_and_verify` is a better
name after introducing `restart`.
The modified lines of code intend to await the first appearance of a log
on one of the nodes.
But due to misplaced parentheses, instead of creating a list of log-awaiting
tasks with a list comprehension, they pass a generator expression to
asyncio.create_task().
This is nonsense, and it fails immediately with a type error.
But since they don't actually check the result of the await,
the test just assumes that the search completed successfully.
This was uncovered by an upgrade to Python 3.12, because its typing is stronger
and asyncio.create_task() screams when it's passed a regular generator.
This patch fixes the bad list comprehension, and also adds an error check
on the completed awaitables (by calling `await` on them).
Fixes#18740Closesscylladb/scylladb#18754
This PR resolves issue with double count of the test result for topology tests. It will not appear in the consolidated report anymore.
Another fix is to provide a better view which test failed by modifying the test case name in the report enriching it with mode and run id, so making them unique across the run.
The scope of this change is:
1. Modify the test name to have run id in name
2. Add handlers to get logs of test.py and pytest in one file that are related to test, rather than to the full suite
3. Remove topology tests from aggregating them on a suite level in Junit results
4. Add a link to the logs related to the failed tests in Junit results, so it will be easier to navigate to all logs related to test
5. Gather logs related to the failed test to one directory for better logs investigation
Ref: scylladb/scylladb#17851Closesscylladb/scylladb#18277
move ManagerClient object creation/clear
to functions scope instead of session scope
to prevent test cases affect each other
by stopping sharing connections to cluster between tests
Even if there is no endpoint for the given IP the state can still belong to existing endpoint that
was restarted with different IP, so lets try to locate the endpoint by host id as well. Do it in raft
topology mode only to not have impact on gossiper mode.
Also make the test more robust in detecting wrong amount of entries in
the peers table. Today it may miss that there is a wrong entry there
because the map will squash two entries for the same host id into one.
Fixes: scylladb/scylladb#18419Fixes: scylladb/scylladb#18457
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.
In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.
Fixes: scylladb/scylladb#18369Closesscylladb/scylladb#18379
* github.com:scylladb/scylladb:
test: get rid of server-side server_restart
test: util: get rid of the `restart` helper
test: {auth,topology}: use manager.rolling_restart
If coordinator node was killed, restarted, become not
operatable during topology operation, new coordinator should be elected,
operation should be aborted and cluster should be rolled back
Error injection will be used to kill the coordinator before streaming
starts
Closesscylladb/scylladb#16197
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp
by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.
Because of `ring_delay_ms = 0', the generation could have been committed when
it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong
generation or a failure. Failure may occur if it hits short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but`_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero ring_delay_ms it's not a problem, because during
this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes.
Wait for the last generation to be in use and sleep one second to make sure
there are writes to the CDC table in this generation.
Fixes#17977
Some tests are only run in dev mode for some reason. For such tests
there's run_in_dev list, no need in putting it in all the non-dev
skip_in_... ones.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are many tests that are skipped in release mode becuase they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch introduces raft-based service levels.
The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation
Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)
Fixes#17926Closesscylladb/scylladb#16585
* github.com:scylladb/scylladb:
test: test service levels v2 works in recovery mode
test: add test for service levels migration
test: add test for service levels snapshot
test:topology: extract `trigger_snapshot` to utils
main: create raft dda if sl data was migrated
service:qos: store information about sl data migration
service:qos: service levels migration
main: assign standard service level DDA before starting group0
service:qos: fix `is_v2()` method
service:qos: add a method to upgrade data accessor
test: add unit_test_raft_service_levels_accessor
service:storage_service: add support for service levels raft snapshot
service:qos: add abort_source for group0 operations
service:qos: raft service level distributed data accessor
service:qos: use group0_guard in data accessor
cql3:statements: run service level statements on shard0 with raft guard
test: fix overrides in unit_test_service_levels_accessor
service:qos: fix indentation
service:qos: coroutinize some of the methods
db:system_keyspace: add `SERVICE_LEVELS_V2` table
service:qos: extract common service levels' table functions
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
add exposes its interface with additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in `the
raft_server_with_timeouts` class constructor.
The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.
The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.
Fixesscylladb/scylladb#16604Closesscylladb/scylladb#17590
* github.com:scylladb/scylladb:
migration_manager: use raft_timeout{}
storage_service::join_node_response_handler: use raft_timeout{}
storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
storage_service::set_tablet_balancing_enabled: use raft_timeout{}
storage_service::move_tablet: use raft_timeout{}
raft_check_and_repair_cdc_streams: use raft_timeout{}
raft_timeout: test that node operations fail properly
raft_rebuild: use raft_timeout{}
do_cluster_cleanup: use raft_timeout{}
raft_initialize_discovery_leader: use raft_timeout{}
update_topology_with_local_metadata: use with_timeout{}
raft_decommission: use raft_timeout{}
raft_removenode: use raft_timeout{}
join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
raft_group0: make_raft_config_nonvoter: add abort_source parameter
manager_client: server_add with start=false shouldn't call driver_connect
scylla_cluster: add seeds parameter to the add_server and servers_add
raft_server_with_timeouts: report the lost quorum
join_node_request_handler: add raft_timeout{} for start_operation
skip_mode: add platform_key
auth: use raft_timeout{}
raft_group0_client: add raft_timeout parameter
raft_group_registry: add group0_with_timeouts
utils: add composite_abort_source.hh
error_injection: move api registration to set_server_init
error_injection: add inject_parameter method
error_injection: move injection_name string into injection_shared_data
error_injection: pass injection parameters at startup
In subsequent commits we are going to add test.py
tests for raft_timeout{} feature. The problem is that
aarch/debug configuration is infamously slow. Timeout
settings used in tests work for all platforms but aarch/debug.
In this commit we extend the skip_mode attribute with the
platform_key property. We'll use @skip_mode('debug', platform_key='aarch64')
to skip the tests for this specific configuration.
The tests will still be run for aarch64/release.
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that the CDC keeps working correctly during the upgrade for topology on the raft.
A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.
Ref #17409Closesscylladb/scylladb#17828
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
Fixesscylladb/scylladb#17587Fixesscylladb/scylladb#17600Fixesscylladb/scylladb#17621Closesscylladb/scylladb#17622
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.
In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.
This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.
This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.
After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now.
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.
We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.
The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it as a new node. Then `on_join` event is called on the gossiper notification handlers, we receive such event in `raft_ip_address_updater` and reverts the IP of the node A back to ip1.
To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.
The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.
Fixes#16886Fixes#16691Fixes#17199Closesscylladb/scylladb#17162
* github.com:scylladb/scylladb:
test_change_ip: improve the test
raft_ip_address_updater: remove stale IPs from gossiper
raft_address_map: add my ip with the new generation
system_keyspace::update_peer_info: check ep and host_id are not empty
system_keyspace::update_peer_info: make host_id an explicit parameter
system_keyspace::update_peer_info: remove any_set flag optimisation
system_keyspace: remove duplicate ips for host_id
system_keyspace: peers table: use coroutines
storage_service::raft_ip_address_updater: log gossiper event name
raft topology: ip change: purge old IP
on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
In this commit we refactor test_change_ip to improve
it in several ways:
* We inject failure before old IP is removed and verify
that after restart the node sees the proper peers - the
new IP for node2 and old IP for node3, which is not restarted
yet.
* We introduce the lambda wait_proper_ips, which checks not only the
system.peers table, but also gossiper and token_metadata.
* We call this lambda for all nodes, not only the first node;
this allows to validate that the node that has changed its
IP has the proper IP of itself in the data structures above.
Note that we need to inject an additional delay ip-change-raft-sync-delay
before old IP is removed. Otherwise the problem stop reproducing - other
nodes remove the old IP before it's send back to the just restarted node.
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:
1. The tombstone_gc may be set on a view - either during its creation
with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.
2. The tombstone_gc setting is correctly shown - for both base tables
and views - by the "DESC" statement.
3. The tombstone_gc setting is NOT inherited from a base table to a new
view - if you want this option on a view, you need to set it
separately.
Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster run multiple tests against it.
To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this tests includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.
The new functions, net_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17345
The test checks that once a node is specified in ignored node list by
one topology operation the information is carried over to the next
operation as well.
Since now a node that is at one point was marked as dead, either via
--ignore-dead-nodes parameter or by been a target for removenode or
replace, can no longer be made "undead" we need to make sure that they
cannot rejoin the cluster any longer. Do that by banning them on a
messaging layer just like we do for nodes that are left.
Not that the removenode failure test had to be altered since it restarted
a node after removenode failure (which now will not work). Also, since
the check for liveness was removed from the topology coordinator (because
the node is already banned by then), the test case that triggers the
removed code is removed as well.
Adds three tests for the new upgrade procedure:
- test_topology_upgrade - upgrades a cluster operating in legacy mode to
use raft topology operations,
- test_topology_recovery_basic - performs recovery on a three-node
cluster, no node removal is done,
- test_topology_majority_loss - simulates a majority loss scenario, i.e.
removed two nodes out of three, performs recovery to rebuild the
raft topology state and re-add two nodes back.
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.
This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.
The change also simplifies the code in
`topology_coordinator:rollback_current_topology_op`.
Moving the `rollback_to_normal` handler from
`handle_node_transition` to `handle_topology_transition` created a
large diff. There is only one change - adding
`auto node = get_node_to_work_on(std::move(guard));`.
Add error handling to rebuild instead of retrying it until succeeds.
* 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev:
test: add test for rebuild failure
test: add expected_error to rebuild_node operation
topology_coordinator: Propagate rebuild failure to the initiator
The issues mentioned in the comment before are already fixed.
Unfortunately, there is another, opposite issue which this function can
be used for. The previous issue was about the existing driver session
not reconnecting. The current issue is about the existing driver session
reconnecting too much... (and in the middle of queries.)
A node can be in the `left_token_ring` state after:
- a finished decommission,
- a failed bootstrap,
- a failed replace.
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this patch is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Regarding implementation, most of the changes are simple
refactoring. The less obvious are:
- Before this patch, in `system_keyspace::left_topology_state`, we
had to keep the ignored nodes' IDs for replace to ensure that the
replacing node will have access to it after moving to the
`left_token_ring` state, which happens when replace fails. We
don't need this workaround anymore. When we enter the new
`left_token_ring` transition state, the new node will still be in
the `decommissioning` state, so it won't lose its request param.
- Before this patch, a decommissioning node lost its tokens
while moving to the `left_token_ring` state. After the patch, it
loses tokens while still being in the `decommissioning` state. We
ensure that all `decommissioning` handlers correctly handle a node
that lost its tokens.
Moving the `left_token_ring` handler from `handle_node_transition`
to `handle_topology_transition` created a large diff. There are
only three changes:
- adding `auto node = get_node_to_work_on(std::move(guard));`,
- adding `builder.del_transition_state()`,
- changing error logged when `global_token_metadata_barrier` fails.
Allows selectively enabling higher logging levels for just raft-topology
related things, without doing it for the entire storage_service (which
includes things like gossiper callbacks).
Also gets rid of the redundant "raft topology:" prefix which was also
not included everywhere.
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test runs two bootstraps and checks that there is no cleanup
in between. Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again. Second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that is runs,
decommissions a node and check that the cleanup did not run again.
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operation: bootstrap, removenode
A and removenode B with ignore_nodes=A. Check that all operation
manage to complete. Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since none can proceed both
should be canceled.
In the next patch we want to abort topology operations if there is no
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node since a topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between restart and a topology operation until UP state
propagates.
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.
The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.
This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.
Fixesscylladb/scylladb#16109Closesscylladb/scylladb#16584
* github.com:scylladb/scylladb:
test: add test_remove_alive_node
topology_coordinator: reject removenode if the removed node is alive
test: ManagerClient: remove unused wait_for_host_down
test: remove_node: wait until the node being removed is dead
In the following commits, we make the topology coordinator reject
removenode requests if the node being removed is considered alive
by the gossiper. Before making this change, we need to adapt the
testing framework so that we don't have flaky removenode operations
that fail because the node being removed hasn't been marked as dead
yet. We achieve this by waiting until all other running nodes see
the node being removed as dead in all removenode operations.
Some tests are simplified after this change because they don't have
to call server_not_sees_other_server anymore.