Commit Graph

51895 Commits

Author SHA1 Message Date
Pavel Emelyanov
6665cda23f test/object_store: Shift indentation right for test cases
This is preparational patch. Next will need to replace

  foo()
  bar()

with

  with something() as s:
      foo()
      bar()

Effectively -- only add the `with something()` line. Not to shift the
whole file right together with that future change, do it here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-02-10 15:56:27 +03:00
Pawel Pery
81d11a23ce Revert "Merge 'vector_search: add validator tests' from Pawel Pery"
This reverts commit bcd1758911, reversing
changes made to b2c2a99741.

There is a design decision to not introduce additional test
orchestration tool for scylladb.git (see comments for #27499). One
commit has already been reverted in 55c7bc7. Last CI runs made validator
test flaky, so it is a time to remove all remaining validator tests.

It needs a backport to 2026.1 to remove remaining validator tests from there.

Fixes: VECTOR-497

Closes scylladb/scylladb#28568
2026-02-08 16:29:58 +02:00
Avi Kivity
bb99bfe815 test: scylla_gdb: tighten check for Error output from gdb
When running a gdb command, we check that the string 'Error'
does not appear within the output. However, if the command output
includes the string 'Error' as part of its normal operation, this
generates a false positive. In fact the task_histogram can include
the string 'error::Error' from the Rust core::error module.

Allow for that and only match 'Error' that isn't 'error::Error'.

Fixes #28516.

Closes scylladb/scylladb#28574
2026-02-08 09:48:23 +02:00
Anna Stuchlik
dc8f7c9d62 doc: replace the OS Support page with a link to the new location
We've moved that page to another place; see https://github.com/scylladb/scylladb/issues/28561.
This commit replaces the page with the link to the new location
and adds a redirection.

Fixes https://github.com/scylladb/scylladb/issues/28561

Closes scylladb/scylladb#28562
2026-02-06 11:38:21 +02:00
Avi Kivity
7a3ce5f91e test: minio: disable web console
minio starts a web console on a random port. This was seen to interfere
with the nodetool tests when the web console port clashed with the mock
API port.

Fix by disabling the web console.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-496

Closes scylladb/scylladb#28492
2026-02-05 20:11:32 +02:00
Nikos Dragazis
5d1e6243af test/cluster: Remove short_tablet_stats_refresh_interval injection
The test `test_size_based_load_balancing.py::test_balance_empty_tablets`
waits for tablet load stats to be refreshed and uses the
`short_tablet_stats_refresh_interval` injection to speed up the refresh
interval.

This injection has no effect; it was replaced by the
`tablet_load_stats_refresh_interval_in_seconds` config option (patch: 1d6808aec4),
so the test currently waits for 60 seconds (default refresh interval).

Use the config option. This reduces the execution time to ~8 seconds.

Fixes SCYLLADB-556.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#28536
2026-02-05 20:11:32 +02:00
Pavel Emelyanov
10c278fff7 database: Remove _flush_sg member from replica::database
This field is only used to initialize the following _memtable_controller
one. It's simpler just to do the initialization with whatever value the
field itself is initialized and drop the field itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#28539
2026-02-05 13:02:35 +02:00
Petr Hála
a04dbac369 open-coredump: Change to use new backtrace
* This is a breaking change, which removes compatibility with the old backtrace
    - See https://staging.backtrace.scylladb.com/api/docs#/default/search_by_build_id_search_build_id_post for the APIDoc
* Add timestamp field to log
* Tested locally

Closes scylladb/scylladb#28325
2026-02-05 11:50:47 +02:00
Marcin Maliszkiewicz
0753d9fae5 Merge 'test: remove xfail marker from a few passing tests' from Nadav Har'El
This patch fixes the few remaining cases of XPASS in test/cqlpy and test/alternator.
These are tests which, when written, reproduced a bug and therefore were marked "xfail", but some time later the bug was fixed and we either did not notice it was ever fixed, or just forgot to remove the xfail marker.

Removing the no-longer-needed xfail markers is good for test hygiene, but more importantly is needed to avoid regressions in those already-fixed areas (if a test is already marked xfail, it can start to fail in a new way and we wouldn't notice).

Backport not needed, xpass doesn't bother anyone.

Closes scylladb/scylladb#28441

* github.com:scylladb/scylladb:
  test/cqlpy: remove xfail from tests for fixed issue 7972
  test/cqlpy: remove xfail from tests for fixed issue 10358
  test/cqlpy: remove xfail from passing test testInvalidNonFrozenUDTRelation
  test/alternator: remove xfail from passing test_update_item_increases_metrics_for_new_item_size_only
2026-02-05 10:10:43 +01:00
Marcin Maliszkiewicz
6eca74b7bb Merge 'More Alternator tests for BatchWriteItem' from Nadav Har'El
The goal of this small pull request is to reproduce issue #28439, which found a bug in the Alternator Streams output when BatchWriteItem is called to write multiple items in the same partition, and always_use_lwt write isolation mode is used.

* The first patch reproduces this specific bug in Alternator Streams.
* The second patch adds missing (Fixes #28171) tests for BatchWriteItem in different write modes, and shows that BatchWriteItem itself works correctly - the bug is just in Alternator Streams' reporting of this write.

Closes scylladb/scylladb#28528

* github.com:scylladb/scylladb:
  test/alternator: add test for BatchWriteItem with different write isolations
  test/alternator: reproducer for Alternator Streams bug
2026-02-05 10:07:29 +01:00
Yaron Kaikov
b30ecb72d5 ci: fix PR number extraction for unlabeled events
When the workflow is triggered by removing the 'conflicts' label
(pull_request_target unlabeled event), github.event.issue.number is
not available. Use github.event.pull_request.number as fallback.

Fixes: https://scylladb.atlassian.net/browse/RELENG-245

Closes scylladb/scylladb#28543
2026-02-05 08:41:43 +02:00
Michał Hudobski
6b9fcc6ca3 auth: add CDC streams and timestamps to vector search permissions
It turns out that the cdc driver requires permissions to two additional system tables. This patch adds them to VECTOR_SEARCH_INDEXING and modifies the unit tests. The integration with vector store was tested manually, integration tests will be added in vector-store repository in a follow up PR.

Fixes: SCYLLADB-522

Closes scylladb/scylladb#28519
2026-02-04 09:10:08 +01:00
Nadav Har'El
47e827262f test/alternator: add test for BatchWriteItem with different write isolations
Alternator's various write operations have different code paths for the
different write isolation modes. Because most of the test suite runs in
only a single write mode (currently - only_rmw_uses_lwt), we already
introduced a test file test/alternator/test_write_isolation.py for
checking the different write operations in *all* four write isolation
modes.

But we missed testing one write operation - BatchWriteItem. This
operation isn't very "interesting" because it doesn't support *any*
read-modify-option option (it doesn't support UpdateExpression,
ConditionExpression or ReturnValues), but even without those, the
pure write code still has different code paths with and without LWT,
and should be tested. So we add the missing test here - and it passes.

In issue #28439 we discovered a bug that can be seen in Alternator
Streams in the case of BatchWriteItem with multiple writes to the
same partition and always_use_lwt mode. The fact that the test added
here passes shows that the bug is NOT in BatchWriteItem itself, which
works correctly in this case - but only in the Alternator Streams layer.

Fixes #28171

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-04 09:24:29 +02:00
Nadav Har'El
c63f43975f test/alternator: reproducer for Alternator Streams bug
This patch adds a reproducer for an Alternator Streams bug described in
issue #28439, where the stream returns the wrong events (and fewer of
them) in the following specific combination of the following circumstances:

1. A BatchWriteItem operation writing multiple items to the *same*
   partition.

2. The "always_use_lwt" write isolation mode is used. (the bug doesn't
   occur in other write isolation modes).

We didn't catch this bug earlier because the Alternator Streams test
we had for BatchWriteItem had multiple items in multiple partitions,
and we missed the multiple-items-in-one-partition case. Moreover,
today we run all the tests in only_rmw_uses_lwt mode (in the past,
we did use always_use_lwt, but changed recently in commit e7257b1393
following commit 76a766c that changed test.py).

As issue #28439 explains, the underlying cause of the bug is that the
always_use_lwt causes the multiple items to be written with the same
timestamp, which confused the Alternator Streams code reading the CDC
log. The bug is not in BatchWriteItem itself, or in ScyllaDB CDC, but
just in the Alternator Streams layer.

The test in this patch is parameterized to run on each of the four
write isolation modes, and currently fails (and so marked xfail) just
for the one mode 'always_use_lwt'. The test is scylla_only, as its
purpose is to checks the different write isolation mode - which don't
exist in AWS DynamoDB.

Refs #28439

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-04 09:17:48 +02:00
Radosław Cybulski
03ff091bee alternator: improve events output when test failed
Improve events printing, when test in test_streams.py failed.
New code will print both expected and received events (keys, previous
image, new image and type).
New code will explicitly mark, at which output event comparison failed.

Fixes #28455

Closes scylladb/scylladb#28476
2026-02-03 21:55:07 +02:00
Anna Stuchlik
a427ad3bf9 doc: remove the link to the Open Source blog post
Fixes https://github.com/scylladb/scylladb/issues/28486

Closes scylladb/scylladb#28518
2026-02-03 14:15:16 +01:00
Botond Dénes
3adf8b58c4 Merge 'test: pylib: scylla_cluster: set shutdown_announce_in_ms to 0' from Patryk Jędrzejczak
The usual Scylla shutdown in a cluster test takes ~2.1s. 2s come from
```
co_await sleep(std::chrono::milliseconds(_gcfg.shutdown_announce_ms));
```
as the default value of `shutdown_announce_in_ms` is 2000. This sleep
makes every `server_stop_gracefully` call 2s slower. There are ~300 such
calls in cluster tests (note that some come from `rolling_restart`). So,
it looks like this sleep makes cluster tests 300 * 2s = 10min slower.
Indeed, `./test.py --mode=dev cluster` takes 61min instead of 71min
on the potwor machine (the one in the Warsaw office) without it.

We set `shutdown_announce_in_ms` to 0 for all cluster tests to make them
faster.

The sleep is completely unnecessary in tests. Removing it could introduce
flakiness, but if that's the case, then the test for which it happens is
incorrect in the first place. Tests shouldn't assume that all nodes
receive and handle the shutdown message in 2s. They should use functions
like `server_not_sees_other_server` instead, which are faster and more
reliable.

Improvement of the tests running time, so no backport. The fix of
`test_tablets_parallel_decommission` may have to be backported to
2026.1, but it can be done manually.

Closes scylladb/scylladb#28464

* github.com:scylladb/scylladb:
  test: pylib: scylla_cluster: set shutdown_announce_in_ms to 0
  test: test_tablets_parallel_decommission: prevent group0 majority loss
  test: delete test_service_levels_work_during_recovery
2026-02-03 08:19:05 +02:00
Pavel Emelyanov
19ea05692c view_build_worker: Do not switch scheduling groups inside work_on_view_building_tasks
The handler appeared back in c9e710dca3. In this commit it performed the
"core" part of the task -- the do_build_range() method -- inside the
streaming sched group. The setup code looks seemingly was copied from the
view_builder::do_build_step() method and got the explicit switch of the
scheduling group.

The switch looks both -- justified and not. On one hand, it makes it
explict that the activity runs in the streaming scheduling group. On the
other hand, the verb already uses RPC index on 1, which is negotiated to
be run in streaming group anyway. On the "third hand", even though being
explicit the switch happens too late, as there exists a lot of other
activities performed by the handler that seems to also belong to the
same scheduling group, but which is not switched into explicitly.

By and large, it seems better to avoid the explicit switch and rely on
the RPC-level negotiation-based sched group switching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#28397
2026-02-03 07:00:32 +02:00
Anna Stuchlik
77480c9d8f doc: fix the links on the repair-related pages
This is a follow-up to https://github.com/scylladb/scylladb/pull/28199.

This commit fixes the syntax of the internal links.

Fixes https://github.com/scylladb/scylladb/issues/28486

Closes scylladb/scylladb#28487
2026-02-03 06:54:08 +02:00
Botond Dénes
64b38a2d0a Merge 'Use gossiper scheduling group where needed' from Pavel Emelyanov
This is the continuation of #28363 , this time about getting gossiper scheduling group via database.
Several places that do it already have gossiper at hand and should better get the group from it.
Eventually, this will allow to get rid of database::get_gossip_scheduling_group().

Refining inter-components API, not backporting

Closes scylladb/scylladb#28412

* github.com:scylladb/scylladb:
  gossiper: Export its scheduling group for those who need it
  migration_manager: Reorder members
2026-02-03 06:51:31 +02:00
Nadav Har'El
48b01e72fa test/alternator: add test verifying that keys only allow S/B/N type
Recently we had a question whether key columns can have any supported
type. I knew that actually - they can't, that key columns can have only
the types S(tring), B(inary) or N(umber), and that is all. But it turns
out we never had a test that confirms this understanding is true.

We did have a test for it for GSI key types already,
test_gsi.py::test_gsi_invalid_key_types, but we didn't have one for the
base table. So in this patch we add this missing test, and confirm that,
indeed, both DynamoDB and Alternator refuse a key attribute with any
type other than S, B or N.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#28479
2026-02-03 06:49:02 +02:00
Andrei Chekun
ed9a96fdb7 test.py: modify logic for adding function_path in JUnit
Current way is checking only fail during the test phase, and it will
miss the cases when fail happens on another phase. This PR eliminate
this, so every phase will have modified node reporter to enrich the
JUnit XML report with custom attribute function_path.

Closes scylladb/scylladb#28462
2026-02-03 06:42:18 +02:00
Andrei Chekun
3a422e82b4 test.py: fix the file name in test summary
Current way is always assumed that the error happened in the test file,
but that not always true. This PR will show the error from the boost
logger where actually error is happened.

Closes scylladb/scylladb#28429
2026-02-03 06:38:21 +02:00
Benny Halevy
84caa94340 gossiper: add_expire_time_for_endpoint: replace fmt::localtime with gmtime in log printout
1. fmt::localtime is deprecated.
2. We should really print times in UTC, especially on the cloud.
3. The current log message does not print the timezone so it'd unclear
   to anyone reading the lof message if the expiration time is in the
   local timezone or in GMT/UTC.

Fixes the following warning:
```
gms/gossiper.cc:2428:28: warning: 'localtime' is deprecated [-Wdeprecated-declarations]
 2428 |             endpoint, fmt::localtime(clk::to_time_t(expire_time)), expire_time.time_since_epoch().count(),
      |                            ^
/usr/include/fmt/chrono.h:538:1: note: 'localtime' has been explicitly marked deprecated here
  538 | FMT_DEPRECATED inline auto localtime(std::time_t time) -> std::tm {
      | ^
/usr/include/fmt/base.h:207:28: note: expanded from macro 'FMT_DEPRECATED'
  207 | #  define FMT_DEPRECATED [[deprecated]]
      |                            ^
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#28434
2026-02-03 06:36:53 +02:00
Pavel Emelyanov
8c42704c72 storage_service: Check raft rpc scheduling group from debug namespace
Some storage_service rpc verbs may checks that a handler is executed
inside gossiper scheduling group. For that, the expected group is
grabbed from database.

This patch puts the gossiper sched group into debug namespace and makes
this check use it from there. It removes one more place that uses
database as config provider.

Refs #28410

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#28427
2026-02-03 06:34:03 +02:00
Asias He
b5c3587588 repair: Add request type in the tablet repair log
So we can know if the repair is an auto repair or a user repair.

Fixes SCYLLADB-395

Closes scylladb/scylladb#28425
2026-02-03 06:26:58 +02:00
Nadav Har'El
a63ad48b0f test/cqlpy: remove xfail from tests for fixed issue 7972
The test test_to_json_double used to fail due to #7972, but this issue
was already fixed in Scylla 5.1 and we didn't notice.
So remove the xfail marker from this test, and also update another test
which still xfails but no longer due to this issue.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-02 23:49:32 +02:00
Nadav Har'El
10b81c1e97 test/cqlpy: remove xfail from tests for fixed issue 10358
The tests testWithUnsetValues and testFilteringWithoutIndices used to fail
due to #10358, but this issue was already fixed three years ago, when the
UNSET-checking code was cleaned up, and the test is now passing.
So remove the xfail marker from these tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-02 23:49:31 +02:00
Nadav Har'El
508bb97089 test/cqlpy: remove xfail from passing test testInvalidNonFrozenUDTRelation
The test testInvalidNonFrozenUDTRelation used to fail due to #10632
(an incorrectly-printed column name in an error message) and was marked
"xfail". But this issue has already been fixed two years ago, and
the test is now passing. So remove the xfail marker.
2026-02-02 23:49:31 +02:00
Nadav Har'El
3682c06157 test/alternator: remove xfail from passing test_update_item_increases_metrics_for_new_item_size_only
The test test_metrics.py::test_update_item_increases_metrics_for_new_item_size_only
tests whether the Alternator metrics report the exactly-DynamoDB-compatible
WCU number. It is parameterized with two cases - one that uses
alternator_force_read_before_write and one which doesn't.

The case that uses alternator_force_read_before_write is expected to
measure the "accurate" WCU, and currently it doesn't, so the test
rightly xfails.
But the case that doesn't use alternator_force_read_before_write is not
expected to measure the "accurate" WCU and has a different expectation,
so this case actually passes. But because the entire test is marked
xfail, it is reported as "XPASS" - unexpected pass.

Fix this by marking only the "True" case with xfail, while the "False"
case is not marked. After this pass, the True case continues to XFAIL
and the False case passes normally, instead of XPASS.

Also removed a sentence promising that the failing case will be solved
"by the next PR". Clearly this didn't happen. Maybe we even have such
a PR open (?), but it won't the "the next PR" even if merged today.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-02 23:49:31 +02:00
Nadav Har'El
df69dbec2a Merge ' cql3/statements/describe_statement: hide paxos state tables ' from Michał Jadwiszczak
Paxos state tables are internal tables fully managed by Scylla
and they shouldn't be exposed to the user nor they shouldn't be backed up.

This commit hides those kind of tables from all listings and if such table
is directly described with `DESC ks."tbl$paxos"`, the description is generated
withing a comment and a note for the user is added.

Fixes https://github.com/scylladb/scylladb/issues/28183

LWT on tablets and paxos state tables are present in 2025.4, so the patch should be backported to this version.

Closes scylladb/scylladb#28230

* github.com:scylladb/scylladb:
  test/cqlpy: add reproducer for hidden Paxos table being shown by DESC
  cql3/statements/describe_statement: hide paxos state tables
2026-02-02 21:22:59 +02:00
Nadav Har'El
f23e796e76 alternator: fix typos in comments and variable names
Copilot found these typos in comments and variable name in alternator/,
so might as well fix them.

There are no functional changes in this patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#28447
2026-02-02 19:16:43 +03:00
Marcin Maliszkiewicz
88c4ca3697 Merge 'test: migrate guardrails_test.py from scylla-dtest' from Andrzej Jackowski
This patch series copies `guardrails_test.py` from scylla-dtest, fix it and enables it.

The motivation is to unify the test execution of guardrails test, as some tests (`cqlpy/test_guardrail_...`) were already in scylladb repo, and some were in `scylla-dtest`.

Fixes: SCYLLADB-255

No backport, just test migration

Closes scylladb/scylladb#28454

* github.com:scylladb/scylladb:
  test: refactor test_all_rf_limits in guardrails_test.py
  test: specify exceptions being caught in guardrails_test.py
  test: enable guardrails_test.py
  test: add wait_other_notice to test_default_rf in guardrails_test.py
  test: copy guardrails_test.py from scylla-dtest
2026-02-02 16:54:13 +01:00
Avi Kivity
acc54cf304 tools: toolchain: adapt future toolchain to loss of toxiproxy in Fedora
Next Fedora will likely not have toxiproxy packaged [1]. Adapt
by installing it directly. To avoid changing the current toolchain,
add a ./install-dependencies --future option. This will allow us
to easily go back to the packages if the Fedora bug is fixed.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2426954

Closes scylladb/scylladb#28444
2026-02-02 17:02:19 +02:00
Avi Kivity
419636ca8f test: ldap: regularize toxiproxy command-line options
Modern toxiproxy interprets `-h` as help and requires the subcommand
subject (e.g. the proxy name) to be after the subcommand switches.
Arrange the command line in the way it likes, and spell out the
subcommands to be more comprehensible.

Closes scylladb/scylladb#28442
2026-02-02 17:00:58 +02:00
Botond Dénes
2b3f3d9ba7 Merge 'test.py: support boost labels in test.py' from Artsiom Mishuta
related PR: https://github.com/scylladb/scylladb/pull/27527

This PR changes test.py logic of parsing boost test cases to use -- --list_json_content
and pass boost labels as pytests markers

using  -- --list_json_content is not ideal and currenly require to implement severall [workarounds](https://github.com/scylladb/scylladb/pull/27527#issuecomment-3765499812), but having the ability to support boost labels in pytest is worth it. because now we can apply the tiering mechanism for the boost tests as well

Fixes SCYLLADB-246

Closes scylladb/scylladb#28232

* github.com:scylladb/scylladb:
  test: add nightly label
  test.py: support boost labels in test.py
2026-02-02 16:55:29 +02:00
Dawid Mędrek
68981cc90b Merge 'raft topology: generate notification about released nodes only once' from Piotr Dulikowski
Hints destined for some other node can only be drained after the other node is no longer a replica of any vnode or tablet. In case when tablets are present, a node might still technically be a replica of some tablets after it moved to left state. When it no longer is a replica of any tablet, it becomes "released" and storage service generates a notification about it. Hinted handoff listens to this notification and kicks off draining hints after getting it.

The current implementation of the "released" notification would trigger every time raft topology state is reloaded and a left node without any tokens is present in the raft topology. Although draining hints is idempotent, generating duplicate notifications is wasteful and recently became very noisy after in 44de563 verbosity of the draining-related log messages have been increased. The verbosity increase itself makes sense as draining is supposed to be a rare operation, but the duplicate notification bug now needs to be addressed.

Fix the duplicate notification problem by passing the list of previously released nodes to the `storage_service::raft_topology_update_ip` function and filtering based on it. If this function processes the topology state for the first time, it will not produce any notifications. This is fine as hinted handoff is prepared to detect "released" nodes during the startup sequence in main.cc and start draining the hints there, if needed.

Fixes: scylladb/scylladb#28301
Refs: scylladb/scylladb#25031

The log messages added in 44de563 cause a lot of noise during topology operations and tablet migrations, so the fix should be backported to all affected versions (2025.4 and 2026.1).

Closes scylladb/scylladb#28367

* github.com:scylladb/scylladb:
  storage_service: fix indentation after previous patch
  raft topology: generate notification about released nodes only once
  raft topology: extract "released" nodes calculation to external function
2026-02-02 15:39:15 +01:00
Jenkins Promoter
c907fc6789 Update pgo profiles - aarch64 2026-02-02 14:56:49 +02:00
Dawid Mędrek
b0afd3aa63 Merge 'storage_service: set up topology properly in maintenance mode' from Patryk Jędrzejczak
We currently make the local node the only token owner (that owns the
whole ring) in maintenance mode, but we don't update the topology properly.
The node is present in the topology, but in the `none` state. That's how
it's inserted by `tm.get_topology().set_host_id_cfg(host_id);` in
`scylla_main`. As a result, the node started in maintenance mode crashes
in the following way in the presence of a vnodes-based keyspace with the
NetworkTopologyStrategy:
```
scylla: locator/network_topology_strategy.cc:207:
    locator::natural_endpoints_tracker::natural_endpoints_tracker(
    const token_metadata &, const network_topology_strategy::dc_rep_factor_map &):
    Assertion `!_token_owners.empty() && !_racks.empty()' failed.
```
Both `_token_owners` and `_racks` are empty. The reason is that
`_tm.get_datacenter_token_owners()` and
`_tm.get_datacenter_racks_token_owners()` called above filter out nodes
in the `none` state.

This bug basically made maintenance mode unusable in customer clusters.

We fix it by changing the node state to `normal`.

We also extend `test_maintenance_mode` to provide a reproducer for

Fixes #27988

This PR must be backported to all branches, as maintenance mode is
currently unusable everywhere.

Closes scylladb/scylladb#28322

* github.com:scylladb/scylladb:
  test: test_maintenance_mode: enable maintenance mode properly
  test: test_maintenance_mode: shutdown cluster connections
  test: test_maintenance_mode: run with different keyspace options
  test: test_maintenance_mode: check that group0 is disabled by creating a keyspace
  test: test_maintenance_mode: get rid of the conditional skip
  test: test_maintenance_mode: remove the redundant value from the query result
  storage_proxy: skip validate_read_replica in maintenance mode
  storage_service: set up topology properly in maintenance mode
2026-02-02 13:28:19 +01:00
Andrzej Jackowski
298aca7da8 test: refactor test_all_rf_limits in guardrails_test.py
Before this commit, `test_all_rf_limits` was implemented in a
repetitive manner, making it harder to understand how the guardrails
were tested. This commit refactors the test to reduce code redundancy
and verify the guardrails more explicitly.
2026-02-02 10:49:12 +01:00
Andrzej Jackowski
136db260ca test: specify exceptions being caught in guardrails_test.py
Before this commit, the test caught a broad `Exception`. This change
specifies the expected exceptions to avoid a situation where the product
or test is broken and it goes undetected.
2026-02-02 10:48:07 +01:00
Patryk Jędrzejczak
ec2f99b3d1 test: pylib: scylla_cluster: set shutdown_announce_in_ms to 0
The usual Scylla shutdown in a cluster test takes ~2.1s. 2s come from
```
co_await sleep(std::chrono::milliseconds(_gcfg.shutdown_announce_ms));
```
as the default value of `shutdown_announce_in_ms` is 2000. This sleep
makes every `server_stop_gracefully` call 2s slower. There are ~300 such
calls in cluster tests (note that some come from `rolling_restart`). So,
it looks like this sleep makes cluster tests 300 * 2s = 10min slower.
Indeed, `./test.py --mode=dev cluster` takes 61min instead of 71min
on the potwor machine (the one in the Warsaw office) without it.

We set `shutdown_announce_in_ms` to 0 for all cluster tests to make them
faster.

The sleep is completely unnecessary in tests. Removing it could introduce
flakiness, but if that's the case, then the test for which it happens is
incorrect in the first place. Tests shouldn't assume that all nodes
receive and handle the shutdown message in 2s. They should use functions
like `server_not_sees_other_server` instead, which are faster and more
reliable.
2026-02-02 10:39:55 +01:00
Patryk Jędrzejczak
1f28a55448 test: test_tablets_parallel_decommission: prevent group0 majority loss
Both of the changed test cases stop two out of four nodes when there are
three group0 voters in the cluster. If one of the two live nodes is
a non-voter (node 1, specifically, as node 0 is the leader), a temporary
majority loss occurs, which can cause the following operations to fail.
In the case of `test_tablets_are_rebuilt_in_parallel`, the `exclude_node`
API can fail. In the case of `test_remove_is_canceled_if_there_is_node_down`,
removenode can fail with an unexpected error message:
```
"service::raft_operation_timeout_error (group
[46dd9cf1-fe21-11f0-baa0-03429f562ff5] raft operation [read_barrier] timed out)"
```

Somehow, these test cases are currently not flaky, but they become flaky in
the following commit.

We can consider backporting this commit to 2026.1 to prevent flakiness.
2026-02-02 10:39:55 +01:00
Patryk Jędrzejczak
bcf0114e90 test: delete test_service_levels_work_during_recovery
The test becomes flaky in one of the following commits. However, there is
no need to fix it, as we should delete it anyway. We are in the process of
removing the gossip-based topology from the code base, which includes the
recovery mode. We don't have to rewrite the test to use the new Raft-based
recovery procedure, as there is nothing interesting to test (no regression
to legacy service levels).
2026-02-02 10:39:54 +01:00
Artsiom Mishuta
af2d7a146f test: add nightly label
add nightly label for test
test_foreign_reader_as_mutation_source
as an example of usinf boost labels pytest as markers

command to test :
./tools/toolchain/dbuild  pytest --test-py-init --collect-only -q -m=nightly test/boost

output:
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.debug.1
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.release.1
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.dev.1
2026-02-02 10:30:38 +01:00
Gleb Natapov
08268eee3f topology: disable force-gossip-topology-changes option
The patch marks force-gossip-topology-changes as deprecated and removes
tests that use it. There is one test (test_different_group0_ids) which
is marked as xfail instead since it looks like gossiper mode was used
there as a way to easily achieve a certain state, so more investigation
is needed if the tests can be fixed to use raft mode instead.

Closes scylladb/scylladb#28383
2026-02-02 09:56:32 +01:00
Avi Kivity
ceec703bb7 Revert "main: test: add future and abort_source to after_init_func"
This reverts commit 7bf7ff785a. The commit
tried to add clean shutdown to `scylla perf` paths, but forgot at least
`scylla perf-alternator --workload wr` which now crashes on uninitialized
`c.as`.

Fixes #28473

Closes scylladb/scylladb#28478
2026-02-02 09:22:24 +01:00
Avi Kivity
cc03f5c89d cql3: support literals and bind variables in selectors
Add support for literals in the SELECT clause. This allows
SELECT fn(column, 4) or SELECT fn(column, ?).

Note, "SELECT 7 FROM tab" becomes valid in the grammar, but is still
not accepted because of failed type inference - we cannot infer the
type of 7, and don't have a favored type for literals (like C favors
int). We might relax this later.

In the WHERE clause, and Cassandra in the SELECT clause, type hints
can also resolve type ambiguity: (bigint)7 or (text)?. But this is
deferred to a later patch.

A few changes to the grammar are needed on top of adding a `value`
alternative to `unaliasedSelector`:

 - vectorSimilarityArg gained access to `value` via `unaliasedSelector`,
   so it loses that alternate to avoid ambiguity. We may drop
   `vectorSimilarityArg` later.
 - COUNT(1) became ambiguous via the general function path (since
   function arguments can now be literals), so we remove this case
   from the COUNT special cases, remaining with count(*).
 - SELECT JSON and SELECT DISTINCT became "ambiguous enough" for
   ANTLR to complain, though as far as I can tell `value` does not
   add real ambiguity. The solution is to commit early (via "=>") to
   a parsing path.

Due to the loss of count(1) recognition in the parser, we have to
special-case it in prepare. We may relax it to count any expression
later, like modern Cassandra and SQL.

Testing is awkward because of the type inference problem in top-level.
We test via the set_intersection() function and via lua functions.

Example:

```
cqlsh> CREATE FUNCTION ks.sum(a int, b int) RETURNS NULL ON NULL INPUT RETURNS int  LANGUAGE LUA AS 'return a + b';
cqlsh> SELECT ks.sum(1, 2) FROM system.local;

 ks.sum(1, 2)
--------------
            3

(1 rows)
cqlsh>
```

(There are no suitable system functions!)

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-296

Closes scylladb/scylladb#28256
2026-02-02 00:06:13 +02:00
Patryk Jędrzejczak
68b105b21c db: virtual tables: add the rack column to cluster_status
`system.cluster_status` is missing the rack info compared to `nodetool status`
that is supposed to be equivalent. It has probably been an omission.

Closes scylladb/scylladb#28457
2026-02-01 20:36:53 +01:00
Pavel Emelyanov
6f3f30ee07 storage_service: Use stream_manager group for streaming
The hander of raft_topology_cmd::command::stream_ranges switches to
streaming scheduling group to perform data streaming in it. It grabs the
group from database db_config, which's not great. There's streaming
manager at hand in storage service handlers, since it's using its
functionality, it should use _its_ scheduling group.

This will help splitting the streaming scheduling group into more
elaborated groups under the maintenance supergroup: SCYLLADB-351

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#28363
2026-02-01 20:42:37 +02:00