Commit Graph

25 Commits

Author SHA1 Message Date
Gleb Natapov
e88ce09372 raft_topology: log sub-step progress in local_topology_barrier
When a node processes a barrier_and_drain topology command, it performs
two potentially long-running operations inside local_topology_barrier():
waiting for stale token metadata versions to be released
(stale_versions_in_use) and draining closing sessions
(drain_closing_sessions). Either of these can hang indefinitely -- for
example, stale_versions_in_use blocks until all references to previous
token metadata versions are released, which depends on in-flight
requests completing.

Previously, the only logging was a single 'done' message at the end,
making it impossible to determine which sub-step was blocking when a
barrier_and_drain RPC appeared stuck on a node. In a recent CI failure,
a node never responded to barrier_and_drain during a removenode
operation, and the logs showed the RPC was received but nothing about
what it was waiting on internally.

Add info-level logging before each blocking sub-step, including the
topology version for correlation. This allows diagnosing hangs by
showing whether the node is stuck waiting for stale metadata versions,
stuck draining sessions, or never reached these steps at all.
2026-05-04 15:58:45 +03:00
Piotr Szymaniak
a5d35d2b4c test/cluster: Test deferred stream enablement on tablet tables
Async cluster test exercising the deferred enablement lifecycle:
ENABLING -> ENABLED -> disabled, verifying tablet merge blocking
and unblocking at each stage. Uses delay_cdc_stream_finalization
error injection and CQL ALTER TABLE with tablet count constraints.

Also adds tablet scheduler config to test_config.yaml (fast refresh
interval, scale factor 1) for reliable tablet count changes.
2026-04-19 03:54:33 +02:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Piotr Szymaniak
c8e7e20c5c test/cluster: retry create_table on transient schema agreement timeout
In test_index_requires_rf_rack_valid_keyspace, the create_table call
for a plain tablet-based table can fail with 'Unable to reach schema
agreement' after the server's 10s timeout is exceeded. This happens
when schema gossip propagation across the 4-node cluster takes longer
than expected after a sequence of rapid schema changes earlier in the
test.

Add a retry (up to 2 attempts) on schema agreement errors for this
specific create_table call rather than increasing the server-side
timeout.

Fixes: SCYLLADB-1135

Closes scylladb/scylladb#29132
2026-03-23 10:45:30 +02:00
Nadav Har'El
ad832c263e test/cluster: mark test_alternator_concurrent_rmw_same_partition_different_server not strictly xfail
A few days ago, in commit 7b30a39 we added to pytest.ini the option
xfail_strict. This option causes every time a test XPASSes, i.e., an xfail
test actually passes - to be considered an error and fail the test.

But some tests demonstrate a timing-related bug and do not reproduce the
bug every single time. An example we noticed in one CI run is:

test/cluster/test_alternator.py::test_alternator_concurrent_rmw_same_partition_different_server

This test reproduces a timing-related bug (if you do an LWT write to
one partition on to two different coordinators "at the same time", you
can get a failure), but only most of the time, not 100% of the time.

The solution is to add "strict=False" for the xfail marker on this specific
test. This undoes the xfail_strict for this specific test, accepting that
this specific test can either pass or fail. Note that this does NOT make
this test worthless - we still see this test failing most of the time, and
when a developer finally fixes this issue, the test will begin to pass all
the time.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-941
Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#29016
2026-03-12 23:46:23 +02:00
Nadav Har'El
0c7f499750 test, alternator: add test for TTL expiration with a node down
We have many single-node functional tests for Alternator TTL in
test/alternator/test_ttl.py. This patch adds a multi-node test in
test/cluster/test_alternator.py. The new test verifies that:

 1. Even though Alternator TTL splits the work of scanning and expiring
    items between nodes, all the items get correctly expired.
 2. When one node is down, all the items still expire because the
    "secondary" owner of each token range takes over expiring the
   items in this range while the "primary" owner is down.

This new test is actually a port of a test we already had in dtest
(alternator_ttl_tests.py::test_multinode_expiration). This port is
faster and smaller then the original (fewer nodes, fewer rows), but it
still found a regression (SCYLLADB-777) that dtest missed - the new test
failed when running with tablets and in release build mode.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-23 16:19:43 +02:00
Andrei Chekun
cc5ac75d73 test.py: remove deprecated skip_mode decorator
Finishing the deprecation of the skip_mode function in favor of
pytest.mark.skip_mode. This PR is only cleaning and migrating leftover tests
that are still used and old way of skip_mode.

Closes scylladb/scylladb#28299
2026-01-25 18:17:27 +02:00
Michael Litvak
1f7a65904e alternator: don't require rf_rack flag for indexes, validate instead
In 8df61f6d99 we changed the requirements for creating materialized
views and MV-based indexes - instead of requiring the
rf_rack_valid_keyspaces flag to be set, we now require the keyspace to
be RF-rack-valid at the time of creation, and it is enforced to remain
RF-rack-valid while the MV exists. This validation is done in the cql
create view/index statements.

The same should be done also for alternator - when creating a table with
GSI or LSI, or when adding a GSI to an existing table, previously we
required the flag rf_rack_valid_keyspaces to be set. Now we change it to
instead check if the keyspace is RF-rack-valid, and if not the operation
fails with an appropriate error.
2026-01-22 16:11:35 +01:00
Michael Litvak
e7ec87382e Revert "alternator: require rf_rack_valid_keyspaces when creating index"
This reverts commit 4b26a86cb0.

The rf_rack_valid_keyspaces option is now not required for creating MVs.
2026-01-20 09:56:48 +01:00
Tomasz Grabiec
6936704677 test: Add missing calls to disable_tablet_balancing() in tests which use move_tablet() API
If a test tries to move a tablet, it assumes the tablets are stable.
This fixes flakiness exposed by size-based load-balancing and a later
change to refresh stats sooner.
2026-01-13 00:38:00 +01:00
Andrei Chekun
c950c2e582 test.py: convert skip_mode function to pytest.mark
Function skip_mode works only on function and only in cluster test. This if OK
when we need to skip one test, but it's not possible to use it with pytestmark
to automatically mark all tests in the file. The goal of this PR is to migrate

skip_mode to be dynamic pytest.mark that can be used as ordinary mark.

Closes scylladb/scylladb#27853

[avi: apply to test/cluster/test_tablets.py::test_table_creation_wakes_up_balancer]
2026-01-08 21:55:16 +02:00
Michael Litvak
b9ec1180f5 alternator: require rf_rack_valid_keyspaces when creating index
When creating an alternator table with tablets, if it has an index, LSI
or GSI, require the config option rf_rack_valid_keyspaces to be enabled.

The option is required for materialized views in tablets keyspaces to
function properly and avoid consistency issues that could happen due to
cross-rack migrations and pairing switches when RF-rack validity is not
enforced.

Currently the option is validated when creating a materialized view via
the CQL interface, but it's missing from the alternator interface. Since
alternator indexes are based on materialized views, the same check
should be added there as well.

Fixes scylladb/scylladb#27612

Closes scylladb/scylladb#27622
2025-12-15 10:36:57 +02:00
Petr Gusev
3a865fe991 alternator/executor.cc: move shard check into cas_write
This change ensures that if cas_shard points to a different shard,
the executor will continue issuing shard jumps until
cas_shard.this_shard() returns true. The commit simply moves the
this_shard() check from the parallel_for_each lambda into cas_write,
with minimal functional changes.

We enable test_alternator_invalid_shard_for_lwt since now it should
pass.

Fixes scylladb/scylladb#27353
2025-12-09 10:21:01 +01:00
Petr Gusev
e60bcd0011 test_alternator: add test_alternator_invalid_shard_for_lwt
This test reproduces scylladb/scylladb#27353 using two injection
points. First, the test triggers an intra-node tablet migration and
suspends it at the streaming stage using the
intranode_migration_streaming_wait injection. Next, it enables the
alternator_executor_batch_write_wait injection, which suspends a
batch write after its cas_shard has already been created.
The test then issues several batch writes and waits until one of them
hits this injection on the destination shard. At this point, the
cas_shard.erm for that write is still in the streaming state,
meaning the executor would need to jump back to the source shard.
The test then resumes the suspended tablet migration, allowing it to
update the ERM on the source shard to write_both_read_new. After that,
the test releases the suspended batch write and expects it to perform
two shard jumps: first from the destination to the source shard, and
then again back to the source shard.

This commit adds the alternator_executor_batch_write_wait injection to
alternator/executor.cc. Coroutines are intentionally avoided in the
parallel_for_each lambda to prevent unnecessary coroutine-frame
allocations.
2025-12-08 10:29:28 +01:00
Nadav Har'El
7dc04b033c test/cluster: fix missing racks in xfailing Alternator test
Since Alternator is now using tablets by default, it's no longer possible
to create an Alternator table on a 3-node cluster with a single rack -
you need to have 3 racks to support RF=3.

Most of the multi-node Alternator tests in test/cluster/test_alternator.py
were already fixed to use a 3-rack cluster, but one test was missed
because it was marked "xfail" so its new failure to create the table was
missed. This patch adds the missing 3-rack setup, so the xfailing test
returns to failing on the real bug - not on the table creation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#27382
2025-12-03 10:54:11 +03:00
Nadav Har'El
65ed678109 test,alternator: use 3-rack clusters in tests
With tablets enabled, we can't create an Alternator table on a three-
node cluster with a single rack, since Scylla refuses RF=3 with just
one rack and we get the error:

    An error occurred (InternalServerError) when calling the CreateTable
    operation: ... Replication factor 3 exceeds the number of racks (1) in
    dc datacenter1

So in test/cluster/test_alternator.py we need to use the incantation
"auto_rack_dc='dc1'" every time that we create a three-node cluster.

Before this patch, several tests in test/cluster/test_alternator.py
failed on this error, with this patch all of them pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
63897370cb alternator: Fix tag name to request vnodes
The tag was lately renamed from `experimental:initial_tablets` to
`system::initial_tablets`. This commit fixes both the tests as well as
the exceptions sent to the user instructing how to create table with
vnodes.
2025-11-09 12:52:29 +02:00
Tomasz Grabiec
7f66f67d95 test: alternator: Adjust for rack lists
To achieve RF=3 with tablets and rf_rack_valid_keyspaces, we need 3
racks. So change the test to create 3 racks. Alternator was bypassing
standard keyspace creation path, so it escaped validation. But this
will change, and the test will stop wroking.

Also, after auto-expansion of RF to rack list, not all of 4 nodes
will host replicas. So need to adjust expectations.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
40e7543361 test: Create cluster with multiple racks in multi-dc setups
To allow auto-expansion of numeric RF to rack list. Otherwise,
keyspace creation will be rejected if rf-rack-valid keyspaces are
enforced.
2025-10-29 23:32:57 +01:00
Nadav Har'El
265660d22f test/cluster: greatly speed up test_localnodes_joining_nodes
The test cluster/test_alternator::test_localnodes_joining_nodes takes
a whopping 2 minutes and 9 seconds to run before this patch. After this
patch, it takes just 7 seconds.

The slowness of this test was caused by booting a second node that hangs
during boot for 2 minutes, deliberately. We never intended for this boot
to finish (the whole point of this test is to run before it finishes),
but unfortunately had to wait for it to avoid all sort of nasty problems
with unwaited futures.

As comments already explained in the code, the solution to this problem
is to *kill* the server at the end of the test - after we kill it, we
can wait for it - this wait will very quickly notice that the server
addition failed, and not need to wait 2 minutes. But until the previous
patch, we had no API to find the server which is starting (not yet
running), or to kill it. After the previous patch, we do have such an
API, and can now use it, and see this test finish in 7 seconds instead
of 2 minutes and 9 seconds.
2025-09-25 14:00:16 +03:00
Artsiom Mishuta
4b975668f6 tiering (test.py): introduce tiering labels
introduce tiering marks
1 “unstable” - For unstable tests that will be will continue runing every night and generate up-to-date statistics with failures without failing the “Main” verification path(scylla-ci, Next)

2 “nightly” - for tests that are quite old, stable, and test functionality that rather not be changed or affected by other features, are partially covered in other tests, verify non-critical functionality, have not found any issues or regressions, too long to run on every PR,  and can be popped out from the CI run.

set 7 long tests(according to statistic in elastic) as nightly(theses 8 tests took 20% of CI run,
about 4 hours without paralelization)
1 test as unstable(as exaple ot marker usage)

Closes scylladb/scylladb#24974
2025-08-04 15:38:16 +03:00
Nadav Har'El
2431f92967 alternator, test: add reproducer for issue about immediate LWT timeout
This patch adds a reproducer for issue #16261, where it was reported
that when Alternator read-modify-write (using LWT) operations to the
same partition are sent to different nodes, sometimes the operation
fails immediately, with an InternalServerError claiming to be a "timeout",
although this happens almost immediately (after a few milliseconds),
not after any real timeout.

The test uses 3 nodes, and 3 threads which send RMW operations to different
items in the same partition, and usually (though not with 100% certainty)
it reaches the InternalServerError in around 100 writes by each thread.
This InternalServerError looks like:

    Internal server error: exceptions::mutation_write_timeout_exception
    (Operation timed out for alternator_alternator_Test_1719157066704.alternator_Test_1719157066704 - received only 1 responses from 2 CL=LOCAL_SERIAL.)

The test also prints how much time it took for the request to fail,
for example:
    In incrementing 1,0 on node 1: error after 0.017074108123779297
This is 0.017 seconds - it's not the cas_contention_timeout_in_ms
timeout (1 second) or any other timeout.

If we enable trace logging, adding to topology_experimental_raft/suite.yaml
    extra_scylla_cmdline_options: ["--logger-log-level", "paxos=trace"]
we get the following TRACE-level message in the log:

    paxos - CAS[0] accept_proposal: proposal is partially rejected

This again shows the problem is "uncertainty" (partial rejection) and not
a timeout.

Refs #16261

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19445
2025-08-01 11:58:52 +03:00
Nadav Har'El
16c1365332 test,alternator: test server-side load balancing with zero-token node
In issue #6527 it was suggested that a zero-token node (a.k.a coordinator-
only node, or data-less node) could serve as a topology-aware Alternator
load balancer - requests could be sent to it and they will be forwarded to
the right node.

This feature was implemented, but we never tested that it actually works
for Alternator requests. So this patch tests this by starting a 5-node
cluster with 4 regular nodes and one zero-token node, and testing that
requests to the zero-token node work as expected.

It is important to know that this feature does indeed work as expected,
and also to have a regression test for it so the feature doesn't break
in the future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23114
2025-06-25 11:13:15 +03:00
Nadav Har'El
3ce7e250cc alternator: fix schema "concurrent modification" errors
In ScyllaDB, schema modification operations use "optimistic locking":
A schema operation reads the current schema, decides what it wants to do
and prepares changes to the schema, and then attempts to commit those
changes - but only if the schema hasn't changed since the first read.
If the schema has already been changed by some other node - we need to
try again. In a loop.

In Alternator, there are six operations that perform schema modification:
CreateTable, DeleteTable, UpdateTable, TagResource, UntagResource and
UpdateTimeToLive. All of them were missing this loop. We knew about
this - and even had FIXME in all places. So all these operations,
when facing contention of concurrent schema modifications on different
nodes may fail one of these operations with an error like:

   Internal server error: service::group0_concurrent_modification
   (Failed to apply group 0 change due to concurrent modification).

This problem had very minor effect, if any, on real users because the
DynamoDB SDK automatically retries operations that fail with retryable
errors - like this "Internal server error" - and most likely the schema
operation will succeed upon retry. However, as shown in issue #13152
these failures were annoying in our CI, where tests - which disable
request retries - failed on these errors.

This patch fixes all six operations (the last three operations all
use one common function, db::modify_tags(), so are fixed by one
change) to add the missing loop.

The patch also includes reproducing tests for all these operations -
the new tests all fail before this patch, and pass with it.

These new tests are much more reliable reproducers than the dtests
we had that only sometimes - very rarely - reproduced the problem.
Moreover, the new tests reproduces the bug seperately for each of the
six operations, so if we forget to fix one of the six operations, one
of the tests would have continued to fail. Of course I checked this
during development.

The new tests are in the test/cluster framework, not test/alternator,
because this problem can only be reproduced in a multi-node cluster:
On a single node, it serializes its schema modifications on its own;
The collisions only happen when more than one node attempts schema
modifications at the same time.

Fixes #13152

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23827
2025-05-05 09:59:08 +03:00
Artsiom Mishuta
d1198f8318 test.py: rename topology_custom folder to cluster
rename topology_custom folder to cluster
as it contains not only topology test cases
2025-03-04 10:32:44 +01:00