Commit Graph

52358 Commits

Author SHA1 Message Date
Amnon Heiman
69fbcd32bd test/alternator/test_metrics.py: Update the bucket in the histogram search 2026-03-11 15:24:05 +02:00
Amnon Heiman
50af1f3671 alternator: Use batch_histogram for batch size histograms
Switch batch-related histograms to estimated_histogram_with_max.
Results with better memory consumption and improve efficiency.
2026-03-11 15:21:25 +02:00
Amnon Heiman
b22162c719 estimated_histogram.hh: adds estimated_histogram_with_max
This patch adds estimated_histogram_with_max template that will be a
based for specific estimated_histograms, eventually replacing the current
struct implementation.

Introduce estimated_histogram_with_max<Max> as a reusable wrapper around
approx_exponential_histogram<1, Max, 4>, providing merge support and the
same add helpers used by existing estimated_histogra type.

Add estimated_histogram_with_max_merge()

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2026-03-11 15:02:37 +02:00
Dawid Mędrek
5feed00caa Merge 'raft: read_barrier: update local commit_idx to read_idx when it's safe' from Patryk Jędrzejczak
When the local entry with `read_idx` belongs to the current term, it's
safe to update the local `commit_idx` to `read_idx`.

The motivation for this change is to speed up read barriers. `wait_for_apply`
executed at the end of `read_barrier` is delayed until the follower learns
that the entry with `read_idx` is committed. It usually happens quickly in
the `read_quorum` message. However, non-voters don't receive this message,
so they have to wait for `append_entries`. If no new entries are being
added, `append_entries` can come only from `fsm::tick_leader()`. For group0,
this happens once every 100ms.

The issue above significantly slows down cluster setups in tests. Nodes
join group0 as non-voters, and then they are met with several read barriers
just after a write to group0. One example is `global_token_metadata_barrier`
in `write_both_read_new` performed just after `update_topology_state` in
`write_both_read_old`.

I tested the performance impact of this change with the following test:
```python
for _ in range(10):
    await manager.servers_add(3)
```
It consistently takes 44-45s with the change and 50-51s without the change
in dev mode.

No backport:
- non-critical performance improvement mostly relevant in tests,
- the change requires some soak time in master.

Closes scylladb/scylladb#28891

* github.com:scylladb/scylladb:
  raft: server: fix the repeating typo
  raft: clarify the comment about read_barrier_reply
  raft: read_barrier: update local commit_idx to read_idx when it's safe
  raft: log: clarify the specification of term_for
2026-03-06 18:50:08 +01:00
Botond Dénes
4fdc0a5316 Merge 'Relax test's check_mutation_replicas() argument list' from Pavel Emelyanov
The one accepts long list of arguments, some of those is not really needed. Also some callers can be relaxed not to provide default values for arguments with such.

Improving tests, not backporting

Closes scylladb/scylladb#28861

* github.com:scylladb/scylladb:
  test: Remove passing default "expected_replicas" to check_mutation_replicas()
  test: Remove scope and primary-replica-only arguments from check_mutation_replicas() helper
2026-03-06 11:25:00 +02:00
Szymon Malewski
d817e56e87 vector_similarity_fcts.cc: fix strict aliasing violation in extract_float_vector
Previous code performed endian conversion by bulk-copying raw bytes
into a std::vector<float> and then iterating over it via a
reinterpret_cast<uint32_t*> pointer. Accessing float storage through a
uint32_t* violates C++ strict aliasing rules, giving the compiler
freedom to reorder or elide the stores, causing undefined behavior.

Replace the two-pass approach with a single-pass loop using
seastar::consume_be<uint32_t>() and std::bit_cast<float>(), which is
both well-defined and auto-vectorizable.

Follow-up #28754

Closes scylladb/scylladb#28912
2026-03-06 09:15:45 +01:00
Artsiom Mishuta
5d7a73cc5b test.py add support if non_gating tests
Add support for non_gating, the opposite of gating in dtest terminology, tests in test.py
codebase

This test will/should not be run by any current gating job (ci/next/nightly)

Closes scylladb/scylladb#28902
2026-03-06 09:39:32 +02:00
Andrei Chekun
01498a00d5 test.py: make HostRegistry singleton
HostRegistry initialized in several places in the framework, this can
lead to the overlapping IP, even though the possibility is low it's not
zero. This PR makes host registry initialized once for the master
thread and pytest. To avoid communication between with workers, each
worker will get its own subnet that it can use solely for its own goals.
This simplifies the solution while providing the way to avoid overlapping IP's.

Closes scylladb/scylladb#28520
2026-03-06 09:25:29 +02:00
Artsiom Mishuta
2be4d8074d test.py disable XFail tests on CI run
This PR disables running FXAIL tests on ci run to speed it up.

tests will continue run on "nightly" job and FAIL on unexpected pass
and will continue run on "NEXT" job and NOT FAIL on unexpected pass

Closes scylladb/scylladb#28886
2026-03-06 09:12:06 +02:00
Szymon Malewski
f9d213547f cql3: selection: fix add_column_for_post_processing for ORDER BY
The purpose of `add_column_for_post_processing` is to add columns that are required for processing of a query,
but are not part of SELECT clause and shouldn't be returned. They are added to the final result set, but later are not serialized.
Mainly it is used for filtering and grouping columns, with a special case of `WHERE primary_key IN ...  ORDER BY ...` when the whole result set needs additional final sorting,
and ordering columns must be added as well.
There was a bug that manifested in #9435, #8100 and was actually identified in #22061.
In case of selection with processing (e.g functions involved), result set row is formed in two stages.
Initially it is a list of columns fetched from replicas - on which filtering and grouping is performed.
After that the actual selection is resolved and the final number of columns can change.
Ordering is performed on this final shape, but the ordering column index returned by `add_column_for_post_processing` refereed to initial shape.
If selection refereed to the same column twice (e.g. `v, TTL(v)` as in #9435) final row was longer than initial and ordering refereed to incorrect column.
If a function in selection refereed to multiple columns (e.g. as_json(.., ..) which #8100 effectively uses) the final row was shorter
and ordering tried to use a non-existing column.

This patch fixes the problem by making sure that column index of the final result set is used for ordering.

The previously crashing test `cassandra_tests/validation/entities/json_test.py::testJsonOrdering` doesn't have to be skipped, but now it is failing on issue #28467.

Fixes #9435
Fixes #8100
Fixes #22061

Closes scylladb/scylladb#28472
2026-03-05 19:22:34 +02:00
Calle Wilund
ab3d3d8638 build: add slirp4netns to dependencies
Needed for port forwarded podman-in-podman containers

[avi:
  - move from Dockerfile to install-dependencies.sh so non-container
  builds also get it
  - regenerate frozen toolchain with optimized clang from

    https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-aarch64.tar.gz
    https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-x86_64.tar.gz
]

Closes scylladb/scylladb#28870
2026-03-05 17:44:17 +02:00
Tomasz Grabiec
b90fe19a42 Merge 'service: assert that tables updated via group0 use schema commitlog' from Aleksandra Martyniuk
Set enable_schema_commitlog for each group0 tables.

Assert that group0 tables use schema commitlog in ensure_group0_schema
(per each command).

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914.

Needs backport to all live releases as all are vulnerable

Closes scylladb/scylladb#28876

* github.com:scylladb/scylladb:
  test: add test_group0_tables_use_schema_commitlog
  db: service: remove group0 tables from schema commitlog schema initializer
  service: ensure that tables updated via group0 use schema commitlog
  db: schema: remove set_is_group0_table param
2026-03-05 13:28:13 +01:00
Botond Dénes
509f2af8db Merge 'repair: Fix rwlock in compaction_state and lock holder lifecycle' from Raphael Raph Carvalho
Consider this:

- repair takes the lock holder
- tablet merge filber destories the compaction group and the compaction state
- repair fails
- repair destroy the lock holder

This is observed in the test:

```
repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036] Repair 1 out of 1 tablets: table=sec_index.users range=(432345564227567615,504403158265495551] replicas=[0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea:15, 498e354c-1254-4d8d-a565-2f5c6523845a:9, 5208598c-84f0-4526-bb7f-573728592172:28]

...

repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: Started to repair 1 out of 1 tables in keyspace=sec_index, table=users, table_id=ea2072d0-ccd9-11f0-8dba-c5ab01bffb77, repair_reason=repair
repair - Enable incremental repair for table=sec_index.users range=(432345564227567615,504403158265495551]
table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair
repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: get_sync_boundary: got error from node=0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea, keyspace=sec_index, table=users, range=(432345564227567615,504403158265495551], error=seastar::rpc::remote_verb_error (Compaction state for table [0x60f008fa34c0] not found)
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge

....

scylla[10793] Segmentation fault on shard 28, in scheduling group streaming
```

The rwlock in compaction_state could be destroyed before the lock holder
of the rwlock is destroyed. This causes user after free when the lock
the holder is destroyed.

To fix it, users of repair lock will now be waited when a compaction
group is being stopped.
That way, compaction group - which controls the lifetime of rwlock -
cannot be destroyed while the lock is held.
Additionally, the merge completion fiber - that might remove groups -
is properly serialized with incremental repair.

The issue can be reproduced using sanitize build consistently and can not
be reproduced after the fix.

Fixes #27365

Closes scylladb/scylladb#28823

* github.com:scylladb/scylladb:
  repair: Fix rwlock in compaction_state and lock holder lifecycle
  repair: Prevent repair lock holder leakage after table drop
2026-03-05 14:18:25 +02:00
Patryk Jędrzejczak
f1978d8a22 raft: server: fix the repeating typo 2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak
5a43695f6a raft: clarify the comment about read_barrier_reply
The comment could be misleading. It could suggest that the returned index is
already safe to read. That's not necessarily true. The entry with the
returned index could, for example, be dropped by the leader if the leader's
entry with this index had a different term.
2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak
1ae2ae50a6 raft: read_barrier: update local commit_idx to read_idx when it's safe
When the local entry with `read_idx` belongs to the current term, it's
safe to update the local `commit_idx` to `read_idx`. The argument for
safety is in the new comment above `maybe_update_commit_idx_for_read`.

The motivation for this change is to speed up read barriers. `wait_for_apply`
executed at the end of `read_barrier` is delayed until the follower learns
that the entry with `read_idx` is committed. It usually happens quickly in
the `read_quorum` message. However, non-voters don't receive this message,
so they have to wait for `append_entries`. If no new entries are being
added, `append_entries` can come only from `fsm::tick_leader()`. For group0,
this happens once every 100ms.

The issue above significantly slows down cluster setups in tests. Nodes
join group0 as non-voters, and then they are met with several read barriers
just after a write to group0. One example is `global_token_metadata_barrier`
in `write_both_read_new` performed just after `update_topology_state` in
`write_both_read_old`.

Writing a test for this change would be difficult, so we trust the nemesis
tests to do the job. They have already found consistency issues in read
barriers. See #10578.
2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak
1cbd0da519 raft: log: clarify the specification of term_for
When `idx > last_idx()`, the function does an out-of-bounds access to `_log`.
This may look contradictory to the current specification.
2026-03-05 13:06:07 +01:00
Anna Stuchlik
855c503c63 doc: fix the unified installer instructions
This commit updates the documentation for the unified installer.

- The Open Source example is replaced with version 2025.1 (Source Available, currently supported, LTS).
- The info about CentOS 7 is removed (no longer supported).
- Java 8 is removed.
- The example for cassandra-stress is removed (as it was already removed on other installation pages).

Fixes https://github.com/scylladb/scylladb/issues/28150

Closes scylladb/scylladb#28152
2026-03-05 12:57:06 +02:00
Artsiom Mishuta
7b30a3981b test.py: enable strict_config,xfail_strict,strict-markers
this commit enables 3 strict pytest options:

strict_config - if any warnings encountered while parsing the pytest section of the configuration file will raise errors.
xfail_strict - if markers not registered in the markers section of the configuration file will raise errors.
strict-markers - if tests marked with @pytest.mark.xfail that actually succeed will by default fail the test suite

and fix errors that occur after enabling these options

Closes scylladb/scylladb#28859
2026-03-05 12:54:26 +02:00
Dawid Mędrek
7564a56dc8 Merge 'tombstone_gc: allow using repair-mode tombstone gc with RF=1 tables' from Botond Dénes
Currently, repair-mode tombstone-gc cannot be used on tables with RF=1. We want to make repair-mode the default for all tablet tables (and more, see https://github.com/scylladb/scylladb/issues/22814), but currently a keyspace created with RF=1 and later altered to RF>1 will end up using timeout-mode tombstone gc. This is because the repair-mode tombstone-gc code relies on repair history to determine the gc-before time for keys/ranges. RF=1 tables cannot run repairs so they will have empty repair history and consequently won't be able to purge tombstones.
This PR solves this by keeping a registry of RF=1 tables and consulting this registry when creating `tombstone_gc_state` objects. If the table is RF=1, tombstone-gc will work as if the table used immediate-mode tombstone-gc. The registry is updated on each replication update. As soon as the table is not RF=1 anymore, the tombstone-gc reverts to the natural repair-mode behaviour.

After this PR, tombstone-gc defaults to repair-mode for all tables, regardless of RF and tablets/vnodes.

Fixes: SCYLLADB-106.

New feature, no backport required.

Closes scylladb/scylladb#22945

* github.com:scylladb/scylladb:
  test/{boost,cluster}: add test for tombstone gc mode=repair with RF=1
  tombstone_gc: allow use of repair-mode for RF=1 tables
  replica/table: update rf=1 table registry in shared tombstone-gc state
  tombstone_gc: tombstone_gc_before_getter: consider RF when getting gc before time
  tombstone_gc: unpack per_table_history_maps
  tombstone_gc: extract _group0_gc_time from per_table_history_map
  tombstone_gc: drop tombstone_gc_state(nullptr) ctor and operator bool()
  test/lib/random_schema: use timeout-mode tombstone_gc
  tombstone_gc_options: add C++ friendly constructor
  test: move away from tombstone_gc_state(nullptr) ctor
  treewide: move away from tombstone_gc_state(nullptr) ctor
  sstable: move away from tombstone_gc_mode::operator bool()
  replica/table: add get_tombstone_gc_state()
  compaction: use tombstone_gc_state with value semantics
  db/row_cache: use tombstone_gc_state with value semantics
  tombstone_gc: introduce tombstone_gc_state::for_tests()
2026-03-05 11:50:31 +01:00
Piotr Dulikowski
a2669e9983 test: test_mv_merge_allowed: add mistakenly omitted awaits
The test test_mv_merge_allowed asserts in two places that the tablet
count is 2. It does so by calling an async function but, mistakenly, the
returned coroutine was not awaited. The coroutine is, apparently, truthy
so the assertions always passed.

Fix the test to properly await the coroutines in the assertions.

Fixes: SCYLLADB-905

Closes scylladb/scylladb#28875
2026-03-05 11:29:23 +01:00
Avi Kivity
5ae40caa6d dist: tune tcp_mem to 3% of total memory in scylla-kernel-conf package
tcp_mem defaults to 9% of total memory. ScyllaDB defaults to 93%. The
sum is more than 100%.

Fix by tuning tcp_mem to 3% of total memory.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-734

Closes scylladb/scylladb#28700
2026-03-05 12:51:04 +03:00
Patryk Jędrzejczak
bb1a798c2c Merge 'raft: Throw stopped_error if server aborted' from Dawid Mędrek
This PR solves a series of similar problems related to executing
methods on an already aborted `raft::server`. They materialize
in various ways:

* For `add_entry` and `modify_config`, a `raft::not_a_leader` with
  a null ID will be returned IF forwarding is disabled. This wasn't
  a big problem because forwarding has always been enabled for group0,
  but it's something that's nice to fix. It's might be relevant for
  strong consistency that will heavily rely on this code.

* For `wait_for_leader` and `wait_for_state_change`, the calls may
  hang and never resolve. A more detailed scenario is provided in a
  commit message.

For the last two methods, we also extend their descriptions to indicate
the new possible exception type, `raft::stopped_error`. This change is
correct since either we enter the functions and throw the exception
immediately (if the server has already been aborted), or it will be
thrown upon the call to `raft::server::abort`.

We fix both issues. A few reproducer tests have been included to verify
that the calls finish and throw the appropriate errors.

Fixes SCYLLADB-841

Backport: Although the hanging problems haven't been spotted so far
          (at least to the best of my knowledge), it's best to avoid
          running into a problem like that, so let's backport the
          changes to all supported versions. They're small enough.

Closes scylladb/scylladb#28822

* https://github.com/scylladb/scylladb:
  raft: Make methods throw stopped_error if server aborted
  raft: Throw stopped_error if server aborted
  test: raft: Introduce get_default_cluster
2026-03-05 10:47:39 +01:00
Marcin Maliszkiewicz
c3f59e4fa1 Merge 'cql3: implement write_consistency_levels guardrails' from Andrzej Jackowski
This patch series implements `write_consistency_levels_warned` and `write_consistency_levels_disallowed` guardrails, allowing the configuration of which consistency levels are unwanted for writes. The motivation for these guardrails is to forbid writing with consistency levels that don't provide high durability guarantees (like CL=ANY, ONE, or LOCAL_ONE).

Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster.

This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001).

BEFORE:
```
291443.35 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.2 tasks/op,   48067 insns/op,   18885 cycles/op,        0 errors)
throughput:
 mean=   289743.07 standard-deviation=6075.60
 median= 291424.69 median-absolute-deviation=1702.56
 maximum=292498.27 minimum=261920.06
instructions_per_op:
 mean=   48072.30 standard-deviation=21.15
 median= 48074.49 median-absolute-deviation=12.07
 maximum=48119.87 minimum=48019.89
cpu_cycles_per_op:
 mean=   18884.09 standard-deviation=56.43
 median= 18877.33 median-absolute-deviation=14.71
 maximum=19155.48 minimum=18821.57
```

AFTER:
```
290108.83 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.2 tasks/op,   48121 insns/op,   18988 cycles/op,        0 errors)
throughput:
 mean=   289105.08 standard-deviation=3626.58
 median= 290018.90 median-absolute-deviation=1072.25
 maximum=291110.44 minimum=274669.98
instructions_per_op:
 mean=   48117.57 standard-deviation=18.58
 median= 48114.51 median-absolute-deviation=12.08
 maximum=48162.18 minimum=48087.18
cpu_cycles_per_op:
 mean=   18953.43 standard-deviation=28.76
 median= 18945.82 median-absolute-deviation=20.84
 maximum=19023.93 minimum=18916.46
```

Fixes: SCYLLADB-259
Refs: SCYLLADB-739
No backport, it's a new feature

Closes scylladb/scylladb#28570

* github.com:scylladb/scylladb:
  scylla.yaml: add write CL guardrails to scylla.yaml
  scylla.yaml: reorganize guardrails config to be in one place
  test: add cluster tests for write CL guardrails
  test: implement test_guardrail_write_consistency_level
  cql3: start using write CL guardrails
  cql3/query_processor: implement metrics to track CL of writes
  db: cql3/query_processor: add write_consistency_levels enum_sets
  config: add write_consistency_levels_* guardrails configuration
2026-03-05 09:55:38 +01:00
Yauheni Khatsianevich
aa85f5a9c3 test: migrating alternator ttl tests to scylla repo
migrating alternator_ttl_tests.py to scylla repo as part of
deprecating dtest framework
migrated tests:
- test_ttl_with_load_and_decommission

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-869

Closes scylladb/scylladb#28858
2026-03-05 10:04:14 +02:00
Nadav Har'El
8e32d97be6 test/alternator: fix run script
The test/alternator/run script currently fails, Scylla fails to boot
complaining that "--alternator-ttl-period-in-seconds" is specified
twice (which is, unfortunately, not allowed). The problem is that
recently we started to set this option in test/cqlpy/run.py, for
CQL's new row-level TTL, so now it is no longer needed in
test/alternator/run - and in fact not allowed and we must remove it.

This patch only affects the script test/alternator/run, and has no
affect on running tests through test.py or Jenkins.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#28868
2026-03-05 10:06:38 +03:00
Botond Dénes
9b2242c752 test/cluster/test_repair.py: fix test_repair_timtestamp_difference
This test forgot to await its check() calls, which is the pass-condition
of the test. Once the await was added, the test started failing. Turns
out, the test was broken, but this was never discovered, because due to
the missing await, the errors were not propagated.
This patch adds the missing await and fixes the discovered problems:
* Use cql.run_async() instead of cql.execute()
* Fix json path for timestamp
* Missing flush/compact

Fixes: SCYLLADB-911

Closes scylladb/scylladb#28883
2026-03-05 10:04:49 +03:00
Nadav Har'El
af07718fff test/cqlpy: fix "run --release" for versions 5.4 or older
Recently we started to rely on the options "--auth-superuser-name"
and "--auth-superuser-salted-password" to ensure that a
cassandra/cassandra user exists for tests - without those options
a default superuser no longer exists.

This broke "test/cqlpy/run --release" for old releases, earlier
than 5.4 (in the enterprise stream, 2024.1 or earlier), because
those old release didn't have this option.

So in this patch we fix the "--release" logic that removes these
options from the command line when running these old versions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#28894
2026-03-05 09:59:46 +03:00
Botond Dénes
5e7b966d37 Merge 'Remove prepare_snapshot_for_backup() helper from backup/restore tests' from Pavel Emelyanov
The helper in question duplicates the functionality of `take_snapshot()` one from the same file. The only difference is that it additionally creates keyspace:table with yet another helper, but that helper is also going to be removed (as continuation of #28600 and #28608)

Enhancing tests, not backporting

Closes scylladb/scylladb#28834

* github.com:scylladb/scylladb:
  test_backup: Remove prepare_snapshot_for_backup()
  test_backup: Patch test_simple_backup_and_restore to use take_snapshot()
  test_backup: Patch backup tests to use take_snapshot()
  test_backup: Add helper to take snapshot on a single server
2026-03-05 06:54:07 +02:00
Dani Tweig
25fc8ef14c Add RELENG to milestone-to-Jira sync project keys
Closes scylladb/scylladb#28889
2026-03-05 06:51:21 +02:00
Calle Wilund
35aab75256 test_internode_compression: Add await for "run" coro:s
Fixes: SCYLLADB-907

Closes scylladb/scylladb#28885
2026-03-05 06:50:33 +02:00
Patryk Jędrzejczak
2a3476094e storage_service: raft_topology_cmd_handler: fix use-after-free
8e9c7397c5 made `rs` a reference, which can
lead to use-after-free. The `normal_nodes` map containing the referenced
value can be destroyed before the last use of `rs` when the topology state
is reloaded after a context switch on some `co_await`. The following move
assignment in `storage_service::topology_state_load` causes this:
```
_topology_state_machine._topology = co_await _sys_ks.local().load_topology_state(tablet_hosts);
```

This issue has been discovered in next-2026.1 CI after queueing the
backport of #28558. `test_truncate_during_topology_change` failed after
ASan reported a heap-use-after-free in
```
co_await _repair.local().bootstrap_with_repair(get_token_metadata_ptr(), rs.ring.value().tokens, session);
```
This test enables `delay_bootstrap_120s`, which makes the bug much more
likely to reproduce, but it could happen elsewhere.

No backport needed, as the only backport of #28558 hasn't been merged yet.
The backport PR will cherry-pick this commit.

Closes scylladb/scylladb#28772
2026-03-05 01:50:22 +01:00
Aleksandra Martyniuk
156c29f962 test: add test_group0_tables_use_schema_commitlog 2026-03-04 17:25:06 +01:00
Aleksandra Martyniuk
5306e26b83 db: service: remove group0 tables from schema commitlog schema initializer
Remove group0 tables from schema commitlog schema initializer.
The schema commitlog of group0 tables is ensured by set_is_group0_table.
2026-03-04 17:25:06 +01:00
Aleksandra Martyniuk
690b2c4142 service: ensure that tables updated via group0 use schema commitlog
Set enable_schema_commitlog for each group0 tables.

Assert that group0 tables use schema commitlog in ensure_group0_schema
(per each command).

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914.
2026-03-04 17:25:04 +01:00
Aleksandra Martyniuk
6b3b174704 db: schema: remove set_is_group0_table param
set_is_group0_table takes an enabled flag, based on which it decides
whether it's a group0 table. The method is called only with enabled = true.

Drop the param. For not group0 tables nothing should be set.
2026-03-04 17:24:34 +01:00
Aleksandra Martyniuk
57f1e46204 test: cluster: tasks: await set_task_ttl
Await set_task_ttl in test_tablet_repair_task_children.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-912

Closes scylladb/scylladb#28882
2026-03-04 17:58:37 +02:00
Dawid Mędrek
d44fc00c4c raft: Make methods throw stopped_error if server aborted
After the previous changes in `raft::server::{add_entry, modify_config}`
(cf. SCYLLADB-841), we also go through other methods of `raft::server`
and verify that they handle the aborted state properly.

I found two methods that do not:

(A) `wait_for_leader`
(B) `wait_for_state_change`

What happened before these changes?

In case (A), the dangerous scenario occurred when `_leader_promise` was
empty on entering the function. In that case, we would construct the
promise and wait on the corresponding future. However, if the server
had been already aborted before the call, the future would never
resolve and we'd be effectively stuck.

Case (B) is fully analogous: instead of `_leader_promise`, we'd work
with `_stte_change_promise`.

There's probably a more proper solution to this problem, but since I'm
not familiar with the internal code of Raft, I fix it this way. We can
improve it further in the future.

We provide two simple validation tests. They verify that after aborting
a `raft::server`, the calls:

* do not hang (the tests would time out otherwise),
* throw raft::stopped_error.

Fixes SCYLLADB-841
2026-03-04 16:28:11 +01:00
Dawid Mędrek
c200d6ab4f raft: Throw stopped_error if server aborted
Before the change, calling `add_entry` or `modify_config` on an already
aborted Raft server could result in an error `not_a_leader` containing
a null server ID. It was possible precisely when forwarding was
disabled in the server configuration.

`not_a_leader` is supposed to return the ID of the current leader,
so that was wrong. Furthermore, the description of the function
specified that if a server is aborted, then it should throw
`stopped_error`.

We fix that issue. A few small reproducer tests were provided to verify
that the functions behave correctly with and without forwarding enabled.

Refs SCYLLADB-841
2026-03-04 16:28:08 +01:00
Marcin Maliszkiewicz
9697b6013f Merge 'test: add missing awaits in test_client_routes_upgrade' from Andrzej Jackowski
Two calls in test_client_routes_upgrade were missing `await`,
so they were never actually executed. This caused Python
to emit RuntimeWarning about unawaited coroutines, and more
importantly, the test skipped important verification steps, which
could mask real bugs or cause flakiness.

Additionally, increase 10s timeouts to 60s to avoid flakiness in slow
environments. Although these tests haven't failed so far, similar
issues have already been observed in other tests with too-short
timeouts.

Fixes: [SCYLLADB-909](https://scylladb.atlassian.net/browse/SCYLLADB-909)

Backport to 2026.1, as the test is also there.

[SCYLLADB-909]: https://scylladb.atlassian.net/browse/SCYLLADB-909?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#28877

* github.com:scylladb/scylladb:
  test: increase timeouts in test_client_routes.py
  test: add missing awaits in test_client_routes_upgrade
2026-03-04 15:26:34 +01:00
Szymon Malewski
212bd6ae1a vector: Vectorize loops in similarity functions
The main loops iterating over vector components were not vectorized due to:
- "cannot prove it is safe to reorder floating-point operations"
- "Cannot vectorize early exit loop with more than one early exit"
The first issue is fixed with adding `#pragma clang fp contract(fast) reassociate(on)`, which allows compiler to optimize floating point operations.
The second issue is solved by refactoring the operations in the affected loop.
Additionally using float operations instead of double increases throughput and numerical accuracy is not the main consideration in vector search scenarios.

Performance measured:
- scylla built using dbuild
- using https://github.com/zilliztech/VectorDBBench (modified to call `SELECT id, similarity_cosine({vector<float, 1536>}, {vector<float, 1536>}) ...` without ANN search):
- client concurrency 20
before: ~2250 QPS
`float` operations: ~2350 QPS
`compute_cosine_similarity` vectorization: ~2500QPS
`extract_float_vector` vectorization: ~3000QPS

Follow-up https://github.com/scylladb/scylladb/pull/28615
Ref https://scylladb.atlassian.net/browse/SCYLLADB-764

Closes scylladb/scylladb#28754
2026-03-04 15:14:53 +01:00
Andrzej Jackowski
221b78cb81 test: increase timeouts in test_client_routes.py
Increase 10s timeouts to 60s to avoid flakiness in slow
environments. Although these tests haven't failed so far, similar
issues have already been observed in other tests with too-short
timeouts.

Test execution time is unaffected; the entire suite in `dev` takes ~30s
before and after this change.
2026-03-04 13:40:30 +01:00
Andrzej Jackowski
527c4141da test: add missing awaits in test_client_routes_upgrade
Two calls in test_client_routes_upgrade were missing `await`,
so they were never actually executed. This caused Python
to emit RuntimeWarning about unawaited coroutines, and more
importantly, the test skipped important verification steps, which
could mask real bugs or cause flakiness.

Fixes: SCYLLADB-909
2026-03-04 13:34:37 +01:00
Piotr Smaron
a31cb18324 db: fix UB in system.clients row sorting
The comparator used to sort per-IP client rows was not a strict-weak-ordering (it could return true in both directions for some pairs), which makes `std::ranges::sort` behavior undefined. A concrete pair that breaks it (and is realistic in system.clients):
a = (port=9042, client_type="cql")
b = (port=10000, client_type="alternator")
With the current comparator:
cmp(a,b) = (9042 < 10000) || ("cql" < "alternator") = true || false = true
cmp(b,a) = (10000 < 9042) || ("alternator" < "cql") = false || true = true
So both directions are true, meaning there is no valid ordering that sort can achieve.

The fix is to sort lexicographically by (port, client_type) to match the table's clustering key and ensure deterministic ordering.

Closes scylladb/scylladb#28844
2026-03-04 14:10:49 +03:00
Avi Kivity
c331796d28 Merge 'Support Min < Precision for approx_exponential_histogram' from Amnon Heiman
This series closes a gap in the approx_exponential_histogram implementation to
cover integer values starting from small Min values.

While the original implementation was focused on durations, where this limitation
was not an issue, over time, there has been a growing need for histograms that
cover smaller values, such as the number of SSTables or the number of items in a
batch.

The reason for the original limitation is inherent to the exponential histogram
math. The previous code required Min to be at least Precision to avoid negative
bit shifts in the exponential calculations.

After this series, approx_exponential_histogram allows Min to be smaller than
Precision by scaling values during indexing. The value is shifted left by
log2 Precision minus log2 Min or zero whichever is larger, and the existing
exponential math is applied. Bucket limits are then scaled back to the original
units. This keeps insertion and retrieval O(1) without runtime branching, at the
cost of repeated bucket limits for some values in the Min to Precision range.

Additional tests cover the new behavior.
Relates to #2785

** New feature, no need to backport. **

Closes scylladb/scylladb#28371

* github.com:scylladb/scylladb:
  estimated_histogram_test.cc: add to_metrics_histogram test
  histogram_metrics_helper.hh: Support Min < Precision
  estimated_histogram_test.cc: Add tests for approx_exponential_histogram with Min<Precision
  estimated_histogram.hh: support Min less than Precision histograms
2026-03-04 12:43:26 +02:00
Szymon Malewski
4c4673e8f9 test: vector_similarity: Fix similarity value checks
`isclose` function checks if returned similarity floats are close enough to expected value, but it doesn't `assert` by itself.
Several tests missed that `assert`, effectively always passing.
With this patch similarity values checks are wrapped in helper function `assert_similarity` with predefined tolerance.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-877

Closes scylladb/scylladb#28748
2026-03-04 09:53:32 +01:00
Marcin Maliszkiewicz
c7d3f80863 Merge 'auth: do not create default 'cassandra:cassandra' superuser' from Dario Mirovic
This patch series removes creation of default 'cassandra:cassandra' superuser on system start.

Disable creation of a superuser with default 'cassandra:cassandra' credentials to improve security. The current flow requires clients to create another superuser and then drop the default `cassandra:cassandra' role. For those who do, there is a time window where the default credentials exist. For those who do not, that role stays. We want to improve security by forcing the client to either use config to specify default values for default superuser name and password or use cqlsh over maintenance socket connection to explicitly create/alter a superuser role.

The patch series:
- Enable role modification over the maintenance socket
- Stop using default 'cassandra' value for default superuser, skipping creation instead

Design document: https://scylladb.atlassian.net/wiki/spaces/RND/pages/165773327/Drop+default+cassandra+superuser

Fixes scylladb/scylla-enterprise#5657

This is an improvement. It does not need a backport.

Closes scylladb/scylladb#27215

* github.com:scylladb/scylladb:
  config: enable maintenance socket in workdir by default
  docs: auth: do not specify password with -p option
  docs: update documentation related to default superuser
  test: maintenance socket role management
  test: cluster: add logs to test_maintenance_socket.py
  test: pylib: fix connect_driver handling when adding and starting server
  auth: do not create default 'cassandra:cassandra' superuser
  auth: remove redundant DEFAULT_USER_NAME from password authenticator
  auth: enable role management operations via maintenance socket
  client_state: add has_superuser method
  client_state: add _bypass_auth_checks flag
  auth: let maintenance_socket_role_manager know if node is in maintenance mode
  auth: remove class registrator usage
  auth: instantiate auth service with factory functors
  auth: add service constructor with factory functors
  auth: add transitional.hh file
  service: qos: handle special scheduling group case for maintenance socket
  service: qos: use _auth_integration as condition for using _auth_integration
2026-03-04 09:43:57 +01:00
Piotr Dulikowski
85dcbfae9a Merge 'hint: Don't switch group in database::apply_hint()' from Pavel Emelyanov
The method is called from storage_proxy::mutate_hint() which is in turn called from hint_mutation::apply_locally(). The latter is either called from directly by hint sender, which already runs in streaming group, or via RPC HINT_MUTATION handler which uses index 1 that negotiates streaming group as well.

To be sure, add a debugging check for current group being the expected one.

Code cleanup, not backporting

Closes scylladb/scylladb#28545

* github.com:scylladb/scylladb:
  hint: Don't switch group in database::apply_hint()
  hint_sender: Switch to sender group on stop either
2026-03-04 09:36:38 +01:00
Pavel Emelyanov
5793e305b5 test_backup: Remove prepare_snapshot_for_backup()
It's now unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-03-04 11:33:43 +03:00
Pavel Emelyanov
ffbd9a3218 test_backup: Patch test_simple_backup_and_restore to use take_snapshot()
This change is a bit more careful, as the test collects files from
snapshot directory several times. Before patching it to use the helper,
it collected _all_ the files. Now the helper only provides TOC-s, but
that's fine -- the only check that relies on that may also re-collect
TOC-s and compare new set with old set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-03-04 11:33:43 +03:00