Commit Graph

16 Commits

Author SHA1 Message Date
Tomasz Grabiec
5c93e12373 test: util: Introduce ensure_group0_leader_on()
Many tests want to assume that group0 leader runs on a particualr
server, typically the first server in the list.

And they cannot be easily made to work with arbitrary leader, becuase
they setup a particular topology and then stop particular nodes, and
want to assume the leader is stable. They open leader's log and
expect things to appear in that log.

It's much easier to ensure the leader, than to prepare tests to
handle failovers.
2026-01-18 15:36:07 +01:00
Andrzej Jackowski
35fd603acd test: wait for read_barrier in wait_until_driver_service_level_created
Previously, `wait_until_driver_service_level_created` only waited for
the `driver` service level to appear in the output of
`LIST ALL SERVICE_LEVELS`. However, the fact that one node lists
`sl:driver` does not necessarily mean that all other nodes can see
it yet. This caused sporadic test failures, especially in DEBUG builds.

To prevent these failures, this change adds an extra wait for
a `raft/read_barrier` after the `driver` service level first appears.
This ensures the service level is globally visible across the cluster.

Fixes: scylladb/scylladb#27019
2025-11-17 15:21:28 +01:00
Andrzej Jackowski
39bfad48cc test: use ManagerClient in wait_until_driver_service_level_created
Pass a ManagerClient instead of a `cql` session to
`wait_until_driver_service_level_created`. This makes it easier
to add additional functionality to the helper later (e.g. waiting for
a Raft read barrier in a subsequent commit).

Refs: scylladb/scylladb#27019
2025-11-17 14:55:14 +01:00
Tomasz Grabiec
d2e7d6fad2 test: Generalize tests to work with both numeric RF and rack lists 2025-10-29 23:32:58 +01:00
Tomasz Grabiec
e34548ccdb test: cluster: util: Fix docstring for parse_replication_options()
rack lists are now in replication_v2, which is also parsed with this
function.
2025-10-29 23:32:57 +01:00
Andrzej Jackowski
c59a7db1c9 service_level_controller: automatically create sl:driver
This commit:
  - Increases the number of allowed scheduling groups to allow the
    creation of `sl:driver`.
  - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
    `sl:driver` until all nodes have increased the number of
    scheduling groups.
  - Starts using `get_create_driver_service_level_mutations`
    to unconditionally create `sl:driver` on
    `raft_initialize_discovery_leader`. The purpose of this code
    path is ensuring existence of `sl:driver` in new system and tests.
  - Starts using `migrate_to_driver_service_level` to create `sl:driver`
    if it is not already present. The creation of `sl:driver` is
    managed by `topology_coordinator`, similar to other system keyspace
    updates, such as the `view_builder` migration. The purpose of this
    code path is handling upgrades.
  - Modifies related tests to pass after `sl:driver` is added.

Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.

Refs: scylladb/scylladb#24411
2025-10-08 08:24:43 +02:00
Tomasz Grabiec
5fc617ecf5 test: Add tests for rack list RF 2025-10-02 19:45:00 +02:00
Avi Kivity
1258e7c165 Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"
This reverts commit fe7e63f109, reversing
changes made to b5f3f2f4c5. It is causing
test.py failures around cqlpy.

Fixes #26163

Closes scylladb/scylladb#26174
2025-09-22 09:32:46 +03:00
Andrzej Jackowski
6f678a2d1f service_level_controller: automatically create sl:driver
This commit:
  - Increases the number of allowed scheduling groups to allow the
    creation of `sl:driver`.
  - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
    `sl:driver` until all nodes have increased the number of
    scheduling groups.
  - Starts using `get_create_driver_service_level_mutations`
    to unconditionally create `sl:driver` on
    `raft_initialize_discovery_leader`. The purpose of this code
    path is ensuring existence of `sl:driver` in new system and tests.
  - Starts using `migrate_to_driver_service_level` to create `sl:driver`
    if it is not already present. The creation of `sl:driver` is
    managed by `topology_coordinator`, similar to other system keyspace
    updates, such as the `view_builder` migration. The purpose of this
    code path is handling upgrades.
  - Modifies related tests to pass after `sl:driver` is added.

Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.

Refs: scylladb/scylladb#24411
2025-09-18 09:28:32 +02:00
Patryk Jędrzejczak
e41fc841cd test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency
In the Raft-based topology, a decommissioning node is removed from group
0 after the decommission request is considered finished (and the token
ring is updated). `wait_for_token_ring_and_group0_consistency` doesn't
handle such a case; it only handles cases where the token ring is
updated later. We fix this in this commit.

We rely on the new implementation of
`wait_for_token_ring_and_group0_consistency` in the following commit to
fix flakiness of some tests.

We also update the obsolete docstring in this commit.
2025-09-09 19:01:09 +02:00
Sergey Zolotukhin
0d7de90523 Fix regexp in check_node_log_for_failed_mutations
The regexp that was added in https://github.com/scylladb/scylladb/pull/23658 does not work as expected:
`TRACE`, `INFO` and `DEBUG` level messages are not ignored.
This patch corrects the pattern to ensure those log levels are excluded.

Fixes scylladb/scylladb#23688

Closes scylladb/scylladb#23889
2025-06-25 12:00:16 +03:00
Sergey Zolotukhin
2314feeae2 test: Ignore DEBUG,TRACE,INFO level messages when checking for failed mutations.
Update the regular expression in `check_node_log_for_failed_mutations` to avoid
false test failures when DEBUG-level logging is enabled.

Fixes scylladb/scylladb#23688

Closes scylladb/scylladb#23658
2025-04-18 16:17:41 +03:00
Patryk Jędrzejczak
4fd0e93154 test: add tests for the Raft-based recovery procedure 2025-03-14 13:53:05 +01:00
Patryk Jędrzejczak
4e055882c1 test: topology: util: fix the tokens consistency check for left nodes
When we remove a node in the Raft-based topology
(by remove/replace/decommission), we remove its
tokens from `system.topology`, but we do not
change `num_tokens`. Hence, the old check could
fail for left nodes.
2025-03-14 13:53:05 +01:00
Patryk Jędrzejczak
d0efc77d20 test: topology: util: extend start_writes
We extend `start_writes` to allow:
- providing `ks_name` from the test,
- restarting it (by starting it again with the same `ks_name`),
- running it in the presence of shutdowns.

We use these features in a new test in one of the following patches.
2025-03-14 13:53:05 +01:00
Artsiom Mishuta
d1198f8318 test.py: rename topology_custom folder to cluster
rename topology_custom folder to cluster
as it contains not only topology test cases
2025-03-04 10:32:44 +01:00