Commit Graph

7 Commits

Author SHA1 Message Date
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Tomasz Grabiec
5e6935f276 test: Use ManagerClient.{disable,enable}_tablet_balancing() 2026-01-13 00:38:00 +01:00
Dawid Mędrek
c4b32c38a3 test/cluster: Disable rf_rack_valid_keyspaces in problematic tests
Some of the tests in the test suite have proven to be more problematic
in adjusting to RF-rack-validity. Since we'd like to run as many tests
as possible with the `rf_rack_valid_keyspaces` configuration option
enabled, let's disable it in those. In the following commit, we'll enable
it by default.
2025-05-10 16:30:49 +02:00
Dawid Mędrek
dbb8835fdf test/cluster: Adjust simple tests to RF-rack-validity
We adjust all of the simple cases of cluster tests so they work
with `rf_rack_valid_keyspaces: true`. It boils down to assigning
nodes to multiple racks. For most of the changes, we do that by:

* Using `pytest.mark.prepare_3_racks_cluster` instead of
  `pytest.mark.prepare_3_nodes_cluster`.
* Using an additional argument -- `auto_rack_dc` -- when calling
  `ManagerClient::servers_add()`.

In some cases, we need to assign the racks manually, which may be
less obvious, but in every such situation, the tests didn't rely
on that assignment, so that doesn't affect them or what they verify.
2025-05-10 16:30:18 +02:00
Tomasz Grabiec
8e506c5a8f test: tablets: Fix flakiness due to ungraceful shutdown
The test fails sporadically with:

cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}

That's becase a server is stopped in the middle of the workload.

The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.

Fixes #20492

Closes scylladb/scylladb#23451
2025-03-27 09:44:07 +03:00
Tomasz Grabiec
d6f8810e66 topology_coordinator: Allow capacity stats to be refreshed with some nodes down
With capacity-aware balancing, if we're missing capacity for a normal
node, we won't be able to proceed with tablet drain. Consider the
following scenario:

1. Nodes: A, B
2. refresh stats with A and B
3. Add node C
4. Node B goes down
5. removenode B starts
6. stats refreshing fails because B is down

If we don't have capacity stats for node C, load balancer cannot make
decisions and removenode is blocked indefinitely. A reproducer is
added in this patch.

To alleviate that, we allow capacity stats to be collected for nodes
which are reachable, we just don't update the table size part.

To keep table stats monotonic, we cache previous results per node, so
even if it's unreachable now, we use its last reported sizes. It's
still more accurate than not refreshing stats at all. A node can be
down for a long period, and other replicas can grow in size. It's not
perfect, because the stale node can skew the stats in its direction,
but ignoring it completely has its pitfalls too. Better solution is
left for later.
2025-03-06 13:35:37 +01:00
Artsiom Mishuta
d1198f8318 test.py: rename topology_custom folder to cluster
rename topology_custom folder to cluster
as it contains not only topology test cases
2025-03-04 10:32:44 +01:00