mirrors/scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-03 06:35:51 +00:00

Commit Graph

Author

SHA1

Message

Date

Tomasz Grabiec

d33d38139f

test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance

Currently, the test assumes that when
'topology_coordinator_pause_before_processing_backlog: waiting' is
logged, the decommission task must already exist. This was based on the
assumption that the topology coordinator is idle and the decommission
request wakes it up. But if the server is slow enough, it may still be
running the load balancer in reaction to table creation, and block on that
injection point before the decommission request is added.

Fix by waiting for the task to appear rather than the injection.

Fixes SCYLLADB-715
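
The fix described above replaces "wait for a log line" with "wait for the observable state". A minimal sketch of that polling pattern follows, assuming a hypothetical `wait_for` helper and a stand-in `decommission_task_visible` predicate; neither name is taken from the commit, and the real test may use different helpers.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]],
                   deadline: float,
                   period: float = 0.1) -> T:
    """Poll `pred` until it returns a non-None value or `deadline` passes.

    Hypothetical helper for illustration; `deadline` is a time.monotonic() value.
    """
    while True:
        result = await pred()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


async def demo() -> str:
    polls = 0

    # Stand-in for "is the decommission task listed yet?". In this sketch
    # the task only becomes visible on the third poll, which models a slow
    # server that has not registered the request yet.
    async def decommission_task_visible() -> Optional[str]:
        nonlocal polls
        polls += 1
        return "decommission" if polls >= 3 else None

    return await wait_for(decommission_task_visible,
                          deadline=time.monotonic() + 5,
                          period=0.01)


print(asyncio.run(demo()))
```

Polling on the task itself does not depend on which event caused the coordinator to reach the injection point, which is the ordering assumption the commit message identifies as fragile.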

2026-02-18 01:02:50 +01:00

Patryk Jędrzejczak

1f28a55448

test: test_tablets_parallel_decommission: prevent group0 majority loss

Both of the changed test cases stop two out of four nodes when there are
three group0 voters in the cluster. If one of the two live nodes is
a non-voter (node 1, specifically, as node 0 is the leader), a temporary
majority loss occurs, which can cause the following operations to fail.
In the case of `test_tablets_are_rebuilt_in_parallel`, the `exclude_node`
API can fail. In the case of `test_remove_is_canceled_if_there_is_node_down`,
removenode can fail with an unexpected error message:
```
"service::raft_operation_timeout_error (group
[46dd9cf1-fe21-11f0-baa0-03429f562ff5] raft operation [read_barrier] timed out)"
```

Somehow, these test cases are currently not flaky, but they become flaky in
the following commit.

We can consider backporting this commit to 2026.1 to prevent flakiness.
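
The majority loss described above can be checked with simple quorum arithmetic. This is only an illustrative sketch: the `has_majority` function and the concrete voter sets are assumptions derived from the message (four nodes, three voters, node 0 as leader, nodes 0 and 1 left running), not code from the commit.

```python
def has_majority(voters: set[int], live: set[int]) -> bool:
    """Return True if strictly more than half of the voters are alive."""
    live_voters = voters & live
    return len(live_voters) > len(voters) // 2


# Nodes 2 and 3 are stopped, so nodes 0 and 1 remain (assumed from the message).
live = {0, 1}

# Problem case: node 1 is a non-voter, so only one of three voters is alive.
print(has_majority({0, 2, 3}, live))  # False

# Safe case: both live nodes are voters, so two of three voters are alive.
print(has_majority({0, 1, 2}, live))  # True
```

With three voters the majority is two, so the outcome depends entirely on whether the second live node holds a vote.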

2026-02-02 10:39:55 +01:00

Tomasz Grabiec

85140cdf7e

test: Add tests for parallel decommission/removenode

2026-01-18 15:36:08 +01:00

3 Commits