Tablets load balancer is unable to process more than a single pending
replica, thus ALTER tablets KS cannot accept an ALTER statement which
would result in creating 2+ pending replicas, hence it has to validate
if the sum of absoulte differences of RFs specified in the statement is
not greter than 1.
(cherry picked from commit ee56bbfe61)
A bug has been discovered while trying to ALTER tablets KS and
specifying only 1 out of 2 DCs - the not specified DC's RF has been
zeroed. This is because ALTER tablets KS updated the KS only with the
RF-per-DC mapping specified in the ALTER tablets KS statement, so if a
DC was ommitted, it was assigned a value of RF=0.
This commit fixes that plus additionally passes all the KS options, not
only the replication options, to the topology coordinator, where the KS
update is performed.
`initial_tablets` is a special case, which requires a special handling
in the source code, as we cannot simply update old initial_tablet's
settings with the new ones, because if only ` and TABLETS = {'enabled':
true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but
rather keep the old value - this is tested by the
`test_alter_preserves_tablets_if_initial_tablets_skipped` testcase.
Other than that, the above mentioned testcase started to fail with
these changes, and it appeared to be an issue with the test not waiting
until ALTER is completed, and thus reading the old value, hence the
test's body has been modified to wait for ALTER to complete before
performing validation.
(cherry picked from commit 2aabe7f09c)
ALTER tablets KS validated if RF is not changed by more than 1 for DCs
that already had replicas, but not for DCs that didn't have them yet, so
specifying an RF jump from 0 to 2 was possible when listing a new DC in
ALTER tablets KS statement, which violated internal invariants of
tablets load balancer.
This PR fixes that bug and adds a multi-dc testcases to check if adding
replicas to a new DC and removing replicas from a DC is honoring the RF
change constraints.
Refs: #20039
(cherry picked from commit 47acdc1f98)
There are two bits that control whenter replication strategy for a
keyspace will use tablets or not -- the configuration option and CQL
parameter. This patch tunes its parsing to implement the logic shown
below:
if (strategy.supports_tablets) {
if (cql.with_tablets) {
if (cfg.enable_tablets) {
return create_keyspace_with_tablets();
} else {
throw "tablets are not enabled";
}
} else if (cql.with_tablets = off) {
return create_keyspace_without_tablets();
} else { // cql.with_tablets is not specified
if (cfg.enable_tablets) {
return create_keyspace_with_tablets();
} else {
return create_keyspace_without_tablets();
}
}
} else { // strategy doesn't support tablets
if (cql.with_tablets == on) {
throw "invalid cql parameter";
} else if (cql.with_tablets == off) {
return create_keyspace_without_tablets();
} else { // cql.with_tablets is not specified
return create_keyspace_without_tablets();
}
}
closes: #20088
In order to enable tablets "by default" for NetworkTopologyStrategy
there's explicit check near ks_prop_defs::get_initial_tablets(), that's
not very nice. It needs more care to fix it, e.g. provide feature
service reference to abstract_replication_strategy constructor. But
since ks_prop_defs code already highjacks options specifically for that
strategy type (see prepare_options() helper), it's OK for now.
There's also #20768 misbehavior that's preserved in this patch, but
should be fixed eventually as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ebedc57300)
Closesscylladb/scylladb#20927
Before 17f4a151ce the node was marked as
been replaced in join_group0 state, before it actually joins the group0,
so by the time it actually joins and starts transferring snapshot/log no
traffic is sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred so we can
get the traffic to the node while it sill did not caught up with a
leader and this may causes problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.
Fixes: https://github.com/scylladb/scylladb/issues/20629
Need to be backported since this is a regression
(cherry picked from commit 644e7a2012)
(cherry picked from commit c0939d86f9)
(cherry picked from commit 1b4c255ffd)
Refs #20743Closesscylladb/scylladb#20829
* github.com:scylladb/scylladb:
test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
topology coordinator:: mark node as being replaced earlier
topology coordinator: do metadata barrier before calling finish_accepting_node() during replace
The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However the driver might route the
changes to different servers and we don't have group 0 to guarantee
linearizability. We must rely on the first change coordinator to push
the schema mutations to other servers before returning, but that only
happens when it sees other servers as alive when doing the schema
change. It wasn't guaranteed in the test. Fix this.
Fixesscylladb/scylladb#20791
Should be backported to all branches containing this test to reduce
flakiness.
(cherry picked from commit f390d4020a)
Closesscylladb/scylladb#20807
add test for issue when writes in commitlog segments pinned to another table can be resurrected.
This test based on dtest code published in #14870 and adapted for community version.
It's a regression test for #15060 fix and should fail before this patch and succeed afterwards.
Refs #14870, #15060Closesscylladb/scylladb#20331
Migrate the `system_distributed.view_build_status` table to `system.view_build_status_v2`. The writes to the v2 table are done via raft group0 operations.
The new parameter `view_builder_version` stored in `scylla_local` indicates whether nodes should use the old or the new table.
New clusters use v2. Otherwise, the migration to v2 is initiated by the topology coordinator when the feature is enabled. It reads all the rows from the old table and writes them to the new table, and sets `view_builder_version` to v2. When the change is applied, all view_builder services are updated to write and read from the v2 table.
The old table `system_distributed.view_build_status` is set to read virtually from the new table in order to maintain compatibility.
When removing a node from the cluster, we remove its rows from the table atomically (fixes https://github.com/scylladb/scylladb/issues/11836). Also, during the migration, we remove all invalid rows.
Fixesscylladb/scylladb#15329
dtest https://github.com/scylladb/scylla-dtest/pull/4827Closesscylladb/scylladb#19745
* github.com:scylladb/scylladb:
view: test view_build_status table with node replace
test/pylib: use view_build_status_v2 table in wait_for_view
view_builder: common write view_build_status function
view_builder: improve migration to v2 with intermediate phase
view: delete node rows from view_build_status on node removal
view: sanitize view_build_status during migration
view: make old view_build_status table a virtual table
replica: move streaming_reader_lifecycle_policy to header file
view_builder: test view_build_status_v2
storage_service: add view_build_status to raft snapshot
view_builder: migration to v2
db:system_keyspace: add view_builder_version to scylla_local
view_builder: read view status from v2 table
view_builder: introduce writing status mutations via raft
view_builder: pass group0_client and qp to view_builder
view_builder: extract sys_dist status operations to functions
db:system_keyspace: add view_build_status_v2 table
Any expired tombstone can be garbage collected if it doesn't shadow data in the commit log, memtable, or uncompacting SSTables.
This PR introduces a new mode to major compaction, enabled by the `consider_only_existing_data` flag that bypasses these checks. When enabled, memtables and old commitlog segments are cleared with a system-wide flush and all the sstables (after flush) are included in the compaction, so that it works with all data generated up to a given time point.
This new mode works with the assumption that newly written data will not be shadowed by expired tombstones. So it ignores new sstables (and new data written to memtable) created after compaction started. Since there was a system wide flush, commitlog checks can also be skipped when garbage collecting tombstones. Introducing data shadowed by a tombstone during compaction can lead to undefined behavior, even without this PR, as the tombstone may or may not have already been garbage collected.
Fixes#19728Closesscylladb/scylladb#20031
* github.com:scylladb/scylladb:
cql-pytest: add test to verify consider_only_existing_data compaction option
tools/scylla-nodetool: add consider-only-existing-data option to compact command
api: compaction: add `consider_only_existing_data` option
compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
tombstone_gc_state: introduce with_commitlog_check_disabled()
compaction: introduce new option to check only compacting sstables for gc
compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
compaction: maybe_flush_all_tables: add new force_flush param
Adds the test_hints_consistency_during_decommission test which
reproduces the failure observed in scylladb/scylla-dtest#4582. It uses
error injections, including the newly added
topology_coordinator_pause_after_streaming injection, to reliably
orchestrate the scenario observed there.
In a nutshell, the test makes sure to replay hints after streaming
during decommission has finished, but before the cluster switches to
reading from new replicas. Without the fix, hints would be replayed to
the decommissioned node and then would be lost forever after the cluster
start reading from new replicas.
Move create_sync_point and await_sync_point from the scope of the
test_sync_point test to the file scope. They will be used in a test that
will be introduced in the commit that follows.
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quick as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).
Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.
Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.
Fixes: scylladb/scylladb#20111Closesscylladb/scylladb#20320
Add a test replacing a node and verifying the contents of the
view_build_status table are updated as expected, having rows for the new
node and no rows for the old node.
Change the util function wait_for_view to read the view build status
from the system.view_build_status_v2 table which replaces
system_distributed.view_build_status.
The old table can still be used but it is less efficient because it's
implemented as a virtual table which reads from the v2 table, so it's
better to read directly from the v2 table. This can cause slowness in
tests.
The additional util function wait_for_view_v1 reads from the old table.
This may be needed in upgrade tests if the v2 table is not available
yet.
Add an intermediate phase to the view builder migration to v2 where we
write to both the old and new table in order to not lose writes during
the migration.
We add an additional view builder version v1_5 between v1 and v2 where
we write to both tables. We perform a barrier before moving to v2 to
ensure all the operations to the old table are completed.
When a node is removed we want to clean its rows from the
view_build_status table.
Now when removing a node and generating the topology state update, we
generate also the mutations to delete all the possible rows belonging to
the node from the table.
When migrating the view_build_status to v2, skip adding any leftover
rows that don't correspond to an existing node or an existing view.
Previously such rows could have been created and not cleaned, for
example when a node is removed.
After migrating the view build status from
system_distributed.view_build_status to system.view_build_status_v2, we
set system_distributed.view_build_status to be a virtual table, such
that reading from it is actually reading from the underlying new table.
The reason for this is that we want to keep compatibility with the old
table, since it exists also in Cassandra and it is used by various external
tools to check the view build status. Making the table virtual makes the
transition transparent for external users.
The two tables are in different keyspaces and have different shard
mapping. The v1 table is a distributed table with a normal shard
mapping, and the v2 table is a local table using the null sharder. The
virtual reader works by constructing a multishard reader which reads the rows
from shard zero, and then filtering it to get only the rows owned by the
current shard.
Add tests to verify the new view_build_status_v2 is used by the
view_builder and can be read from all nodes with the expected values.
Also test a migration from the v1 layout to v2.
When testing mv admission control, we perform a large view update
and check if the following view update can be admitted due to the
high view backlog usage. We rely on a delay which keeps the backlog
high for longer to make sure the backlog is still increased during
the second write. However, in some test runs the delay is not long
enough, causing the second write to miss the large backlog and not
hit admission control.
In this patch we keep the increased backlog high using another
injection instead of relying on a delay to make absolute sure
that the backlog is still high during the second write.
Fixesscylladb/scylladb#20382Closesscylladb/scylladb#20445
Run the reversed queries on a 2-node cluster with CL=ALL with and
without NATIVE_REVERSE_QUERIES feature flag. When the flag is enabled,
the native reversed format is used, otherwise the legacy format.
The NATIVE_REVERSE_QUERIES feature flag is suppressed with an error
injection that simulates cluster upgrade process.
Backport is not required. The patch adds additional upgrade tests
for https://github.com/scylladb/scylladb/pull/18864Closesscylladb/scylladb#20179
Currently if a coordinator and a node being replaced are in the same DC
while inter-dc encryption is enabled (connections between nodes in the
same DC should not be encrypted) the replace operation will fail. It
fails because a coordinator uses non encrypted connection to push raft
data to the new node, but the new node will not accept such connection
until it knows which DC the coordinator belongs to and for that the raft
data needs to be transferred.
The series adds the test for this scenario and the fix for the
chicken&egg problem above.
The series (or at least the fix itself) needs to be backported because
this is a serious regression.
Fixes: scylladb/scylladb#19025Closesscylladb/scylladb#20290
* github.com:scylladb/scylladb:
topology coordinator: fix indentation after the last patch
topology coordinator: do not add replacing node without a ring to topology
test: add test for replace in clusters with encryption enabled
test.py: add server encryption support to cluster manager
.gitignore: fix pattern for resources to match only one specific directory
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.
Throw if batchlog manager isn't initialized.
Fixes: #20236.
Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access.
Closesscylladb/scylladb#20251
* github.com:scylladb/scylladb:
test: add test to ensure repair won't fail with uninitialized bm
repair: throw if batchlog manager isn't initialized
When only inter dc encryption is enabled a non encrypted connection
between two nodes is allowed only if both nodes are in the same dc.
If a nodes that initiates the connection knows that dst is in the same
dc and hence use non encrypted connection, but the dst not yet knows the
topology of the src such connection will not be allowed since dst cannot
guaranty that dst is in the same dc.
Currently, when topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to the
group0. The coordinator will try to send raft message to the new node
and (assuming only inter dc encryption is enabled and replacing node and
the coordinator are in the same dc) it will try to open regular, non encrypted,
connection to it. But the replacing node will not have the coordinator
in it's topology yet (it needs to sync the raft state for that). so it
will reject such connection.
To solve the problem the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens will be assigned to it. At this point a replacing node will
already make sure that its topology state is up-to-date (since it will
execute a raft barrier in join_node_response_params handler) and it knows
coordinator's topology. This aligns replace behaviour with bootstrap
since bootstrap also does not add a node without a ring to the topology.
The patch effectively reverts b8ee8911caFixes: scylladb/scylladb#19025
We add tests to verify the basic properties of zero-token nodes.
`test_zero_token_nodes_no_replication` and
`test_not_enough_token_owners` are more or less deterministic tests.
Running them only in the dev mode is sufficient.
`test_zero_token_nodes_topology_ops` is quite slow, as expected,
considering parameterization and the number of topology operations.
In the future we can think of making it faster or skipping in the
debug mode. For now, our priority is to test zero-token nodes
thoroughly.
In one of the following patches, we introduce support for zero-token
nodes. From that point, getting all nodes and getting all token
owners isn't equivalent. In this patch, we ensure that we consider
only token owners when we want to consider only token owners (for
example, in the replication logic), and we consider all nodes when
we want to consider all nodes (for example, in the topology logic).
The main purpose of this patch is to make the PR introducing
zero-token nodes easier to review. The patch that introduces
zero-token nodes is already complicated. We don't want trivial
changes from this patch to make noise there.
This patch introduces changes needed for zero-token nodes only in the
Raft-based topology and in the recovery mode. Zero-token nodes are
unsupported in the gossip-based topology outside recovery.
Some functions added to `token_metadata` and `topology` are
inefficient because they compute a new data structure in every call.
They are never called in the hot path, so it's not a serious problem.
Nevertheless, we should improve it somehow. Note that it's not
obvious how to do it because we don't want to make `token_metadata`
store topology-related data. Similarly, we don't want to make
`topology` store token-related data. We can think of an improvement
in a follow-up.
We don't remove unused `topology::get_datacenter_rack_nodes` and
`topology::get_datacenter_nodes`. These function can be useful in the
future. Also, `topology::_dc_nodes` is used internally in `topology`.
Attempting to read a partition via `SELECT * FROM MUTATION_FRAGMENTS()`, which the node doesn't own, from a table using tablets causes a crash.
This is because when using tablets, the replica side simply doesn't handle requests for un-owned tokens and this triggers a crash.
We should probably improve how this is handled (an exception is better than a crash), but this is outside the scope of this PR.
This PR fixes this and also adds a reproducer test.
Fixes: https://github.com/scylladb/scylladb/issues/18786
Fixes a regression introduced in 6.0, so needs backport to 6.0 and 6.1
Closesscylladb/scylladb#20109
* github.com:scylladb/scylladb:
test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works
replica/mutation_dump: enfore pinning of effective replication map
replica/mutation_dump: handle un-owned tokens (with tablets)
Currently, when a view update backlog of one replica is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded.
This patch does not remove the replica write failures, because we still may reach a full backlog when more view updates are generated after the coordinator check is performed and before the write reaches the replica.
Fixesscylladb/scylladb#17426Closesscylladb/scylladb#18334
* github.com:scylladb/scylladb:
mv: test the view update behavior
mv: add test for admission control
storage_proxy: return overloaded_exception instead of throwing
mv: reject user requests by coordinator when a replica is overloaded by MVs
Currently it doesn't, one of the node crashes with std::out_of_range
exception and meaningless calltrace
[Botond]: this test checks the case of reading a partition via
MUTATION_FRAGMENTS from a node which doesn't own said partition.
refs: #18786
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This reverts commit cc428e8a36. It causes
may spurious CI failures while nodes are being torn down. Revert it until
the root cause is fixed, after which it can be reinstated.
Fixes#20116.
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.
Fixesscylladb/scylladb#16817Fixesscylladb/scylladb#20080
ALTER tablets KS executes in 2 steps:
1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req,
2. global topo req is executed by topo coordinator, which reads data attached to the req.
The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue.
BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126
(I suggest to disable displaying whitespace differences when reviewing this PR).
Fixes: scylladb/scylladb#19576Closesscylladb/scylladb#19666
* github.com:scylladb/scylladb:
tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist
cql: refactor rf_change indentation
Prevent ALTERing non-existing KS with tablets
Using the error injection framework, we inject a sleep into the
processing path of ALTER tablets KS, so that the topology coordinator of
the leader node
sleeps after the rf_change event has been scheduled, but before it is
started to be executed. During that time the second node executes a DROP
KS statement, which is propagated to the leader node. Once leader node
wakes up and resumes processing of ALTER tablets KS, the KS won't exist
and the node cannot crash, which was the case before.
This patch adds `suppress_features` error injection. It allows to revoke
support for some features and it can be used to simulate upgrade process
in test.py.
Features to suppress are passed as injection's value, separated by `;`.
Example: `PARALLELIZED_AGGREGATION;UDA_NATIVE_PARALLELIZED_AGGREGATION`
Fixesscylladb/scylladb#20034Closesscylladb/scylladb#20055
Sync points are created, via POST HTTP requests, for a subset of nodes
in the cluster. Those nodes are specified in a request's parameter
`target_hosts`. When the parameter is empty, Scylla should assume
the user wants to create a sync point for ALL nodes.
Before these changes, sync points were created only for LIVE nodes.
If a node was dead but still part of the cluster and the user
requested creating a sync point leaving the parameter `target_hosts`
empty, the dead node was skipped during the creation of the sync point.
That was inconsistent with the guarantees the sync point API provides.
In this commit, we fix that issue and add a test verifying that
the changes have made the implementation compliant with the design
of the sync point API -- the test only passes after this commit.
Fixesscylladb/scylladb#9413Closesscylladb/scylladb#19750
Currently, the resource utilization in CI is low. Increasing the number of clusters will increase how many tests are executed simultaneously. This will decrease the time it takes to execute and improve resource utilization.
Related: https://github.com/scylladb/qa-tasks/issues/1667Closesscylladb/scylladb#19832
With the recently added mv admission control, we can now
test how are the view update backlogs updated and propagated
without relying just on the response delays that it was causing
until now.
This patch adds a test for it, replicating issues scylladb/scylladb#18461
and scylladb/scylladb#18783.
In the test, we start with an empty view update backlog, then perform
a write to it, increasing its backlog and saving the updated backlog
on coordinator, the backlog then drops back to 0, we wait 1s for the
backlog to be gossiped and we perform another write which should
succeed.
Due to scylladb/scylladb#18461, the test would fail because
in both gossip rounds before and after the write, the backlog was empty,
causing the write to be blocked by admission control indefinitely.
Due to scylladb/scylladb#18783, the test would fail because when
the backlog drops back to 0 after the write, the change is never
registered, causing all writes to be blocked as well.
In this patch we add 2 tests for checking that the mv admission control works.
The first one simply checks whether, after increasing the backlog on one node
over the admission control threshold, the following request is rejected with
the error message corresponding to the admission control.
The second one checks whether, after triggering admission control, the entire
user request fails instead of just failing a replica write. This is done by
performing a number of writes, some of which trigger the admission control
and cause retries, then checking if the node that had a large view update backlog
received all the writes. Before, the writes would succeed on enough replicas,
reaching QUORUM, and allowing the user write to succeed and cause no retries,
even though on the replica with a high backlog the write got rejected due to
the backlog size.
When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address.
This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0.
As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas.
In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue.
Fixes: scylladb/scylladb#19439
This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there.
Closesscylladb/scylladb#19765
* github.com:scylladb/scylladb:
test: regression test for MV crash with tablets during decommission
db/view: drop view updates to replaced node marked as left
We change seeds in `ScyllaCluster.server_start` to all currently
running nodes. The previous code only pretended that it did it.
After doing this change, writing tests that create multiple clusters
is impossible. To allow it, we add the `seeds` parameter to
`ManagerClient.server_start`. We use it to fix and simplify the only
test that creates two clusters - `test_different_group0_ids`.
Use the rolling restart to avoid spurious driver reconnects.
This can be eventually reverted once the scylladb/python-driver#295 is fixed.
Fixesscylladb/scylladb#19154Closesscylladb/scylladb#19771
* github.com:scylladb/scylladb:
test: raft: fix the flaky `test_raft_recovery_stuck`
test: raft: code cleanup in `test_raft_recovery_stuck`
Replaced the old `read_barrier` helper from "test/pylib/util.py"
by the new helper from "test/pylib/rest_client.py" that is calling
the newly introduced direct REST API.
Replaced in all relevant tests and decommissioned the old helper.
Introduced a new helper `get_host_api_address` to retrieve the host API
address - which in come cases can be different from the host address
(e.g. if the RPC address is changed).
Fixes: scylladb/scylladb#19662Closesscylladb/scylladb#19739