Commit Graph

49984 Commits

Author SHA1 Message Date
Ernest Zaslavsky
413739824f s3_client: track memory starvation in background filling fiber
Introduce a counter metric to monitor instances where the background
filling fiber is blocked due to insufficient memory in the S3 client.

Closes scylladb/scylladb#26466
2025-10-14 11:22:54 +03:00
Łukasz Paszkowski
125bf391a7 utils/directories: ignore files when retrieving stats fails
During Scylla startup, directories are created and verified in
`directories::do_verify_owner_and_mode()`. It is possible that while
retrieving file stats, a file might be removed, leading to Scylla
failing to boot.

This is particularly visible in `storage/test_out_of_space.py` tests,
which use FUSE to mount size-limited volumes. When a file that is open
by another process is removed, FUSE renames it to `.fuse_hidden*`.

In `directories::do_verify_owner_and_mode()`, the code performs a
`scan_dir` to list files and retrieves their stats to verify type, mode,
and ownership. If a file is removed while retrieving its stats, we see
errors such as:

```
Failed to get /scylladir/testlog/x86_64/dev/volumes/e0125c60-1e63-4330-bf6f-c0ea3e466919/scylla-0/hints/1/.fuse_hidden0000001800000005
```

This change makes `do_verify_owner_and_mode()` ignore files when
retrieving stats fails, avoiding spurious errors during verification.

Refs: https://github.com/scylladb/scylladb/issues/26314

Closes scylladb/scylladb#26535
2025-10-13 20:41:25 +03:00
Dawid Mędrek
7d017748ab db/commitlog: Extend segment truncation error messages
We include more relevant information for debugging purposes:
the remaining bytes and the size. It might be useful to determine
where exactly an error occurred and help reason about it.

Closes scylladb/scylladb#26486
2025-10-13 17:42:31 +03:00
Nadav Har'El
06108ea020 test/alternator: a small cleanup for a test in test_streams.py
This patch makes three small mostly-cosmetic improvements to a test in
test/alternator/test_streams.py:

1. The test is renamed "test_streams_deleteitem_old_image_no_ck" to
   emphasize its focus on the combination of deleteitem, old image,
   and no ck. The "putitem" we had in the name was not relevant, and
   the "old_image" was missing and important.

2. Moreover, using PutItem in this test just to set up the test scenario
   mixed the bug which the test tries to reproduced with a different
   only-recently-fixed bug (that PutItem also generated a spurious
   "REMOVE" event). So I changed the use of PutItem by using UpdateItem,
   to make this test indepedent of the other bug. Test independence is
   important because it allows us - if we want - to backport a fix for
   just one bug independently of the fix to the other bug.

3. Also improved the comment in front of the test to mention where we
   already tested the with-ck case, and also to mention issue 26382
   which this test reproduces (the xfail line also mentions it, but
   the xfail line will be removed when the bug is fixed - but the
   mention in the comment will remain - and should remain.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26526
2025-10-13 17:42:31 +03:00
Piotr Dulikowski
1cf944577b Merge 'Fix vector store client flaky test' from Karol Nowacki
This series of patches improves test vector_store_client_test stability. The primary issue with flaky connections was discovered while working on PR #26308.

Key Changes:
- Fixes premature connection closures in the mock server:
The mock HTTP server was not consuming request payloads, causing it to close connections immediately after a response. Subsequent tests attempting to reuse these closed connections would fail intermittently, leading to flakiness. The server has been updated to handle payloads correctly.

- Removes a retry workaround:
With the underlying connection issue resolved, the retry logic in the vector_store_client_test_ann_request test is no longer needed and has been removed.

- Mocks the DNS resolver in tests:
The vector_store_client_uri_update_to_invalid test has been corrected to mock DNS lookups, preventing it from making real network requests.

- Corrects request timeout handling:
A bug has been fixed where the request timeout was not being reset between consecutive requests.

- Unifies test timeouts:
Timeouts have been standardized across the test suite for consistency.

Fixes: #26468

It is recommended to backport this series to the 2025.4 branch. Since these changes only affect test code and do not alter any production logic, the backport is safe. Addressing this test flakiness will improve the stability of the CI pipeline and prevent it from blocking unrelated patches.

Closes scylladb/scylladb#26374

* github.com:scylladb/scylladb:
  vector_search: Unify test timeouts
  vector_search: Fix missing timeout reset
  vector_search: Refactor ANN request test
  vector_search: Fix flaky connection in tests
  vector_search: Fix flaky test by mocking DNS queries
2025-10-13 17:42:31 +03:00
Avi Kivity
c783f0e539 Merge 'index: Prefer const qualifiers wherever possible' from Dawid Mędrek
We add missing `const`-qualifiers wherever possible in the module.
A few smaller changes were included as a bonus.

Backport: not needed. This is a cleanup.

Closes scylladb/scylladb#26485

* github.com:scylladb/scylladb:
  index/secondary_index_manager: Take std::span instead of std::vector
  index/secondary_index_manager: Add missing const qualifier
  index/vector_index: Add missing const qualifiers
  cql3/statements/index_prop_defs.cc: Remove unused include
  cql3/statements/index_prop_defs.cc: Mark function as TU-local
  cql3/statements/index_prop_defs: Mark methods as const-qualified
2025-10-12 19:47:53 +03:00
Michał Chojnowski
93dac3d773 sstables/compressor: relax a large allocation warning in ZSTD_CDict creation
ZSTD_CDict needs a big contiguous allocation and there's no way around that.
The only thing to do is relax the warning appropriately.

Closes scylladb/scylladb#25393
2025-10-12 18:21:11 +03:00
Botond Dénes
24c6476f73 mutation/mutation_compactor: add tombstone_gc_state to query ctor
So tombstones can be purged correctly based on the tombstone gc mode.
Currently if repair-mode is used, tombstones are not purged at all,
which can lead to purged tombstone being re-replicated to replicas which
already purged them via read-repair.
This is not a correctness problem, tombstones are not included in data
query resutl or digest, these purgable tombstone are only a nuissance
for read repair, where they can create extra differences between
replicas. Note that for the read repair to trigger, some difference
other than in purgable tombstones has to exist, because as mentioned
above, these are not included in digets.

Fixes: scylladb/scylladb#24332

Closes scylladb/scylladb#26351
2025-10-12 17:48:15 +03:00
Botond Dénes
d9c3772e20 service/storage_proxy: send batches with CL=EACH_QUORUM
Batches that fail on the initial send are retired later, until they
succeed. These retires happen with CL=ALL, regardless of what the
original CL of the batch was. This is unnecessarily strict. We tried to
follow Cassandra here, but Cassandra has a big caveat in their use of
CL=ALL for batches. They accept saving just a hint for any/all of the
endpoints, so a batch which was just logged in hints is good enough for
them.
We do not plan on replicating this usage of hints at this time, so as a
middle ground, the CL is changed to EACH_QUORUM.

Fixes: scylladb/scylladb#25432

Closes scylladb/scylladb#26304
2025-10-12 17:18:41 +03:00
Michał Chojnowski
7c6e84e2ec test/boost/sstable_compressor_factory_test: fix thread-unsafe usage of Boost.Test
It turns out that Boost assertions are thread-unsafe,
(and can't be used from multiple threads concurrently).
This causes the test to fail with cryptic log corruptions sometimes.
Fix that by switching to thread-safe checks.

Fixes scylladb/scylladb#24982

Closes scylladb/scylladb#26472
2025-10-12 17:16:51 +03:00
Piotr Wieczorek
8cd9f5d271 test/alternator: Add a Streams test reproducing #26382
This commit adds a test that reproduces an issue, wherein OldImage isn't
included in the REMOVE events produced by Alternator Streams.

Refs https://github.com/scylladb/scylladb/issues/26382

Closes scylladb/scylladb#26383
2025-10-12 11:09:57 +03:00
Piotr Wieczorek
a55c5e9ec7 alternator: Correct RCU undercount in BatchGetItem
The `describe_multi_item` function treated the last reference-captured
argument as the number of used RCU half units. The caller
`batch_get_item`, however, expected this parameter to hold an item size.
This RCU value was then passed to
`rcu_consumed_capacity_counter::get_half_units`, treating the
already-calculated RCU integer as if it were a size in bytes.

This caused a second conversion that undercounted the true RCU. During
conversion, the number of bytes is divided by `RCU_BLOCK_SIZE_LENGTH`
(=4KB), so the double conversion divided the number of bytes by 16 MB.

The fix removes the second conversion in `describe_multi_item` and
changes the API of `describe_multi_item`.

Fixes: https://github.com/scylladb/scylladb/pull/25847

Closes scylladb/scylladb#25842
2025-10-12 10:42:32 +03:00
Karol Nowacki
62deea62a4 vector_search: Unify test timeouts
The test previously used separate timeouts for requests (5s) and the
overall test case (10s).

This change unifies both timeouts to 10 seconds.
2025-10-10 16:49:06 +02:00
Karol Nowacki
0de1fb8706 vector_search: Fix missing timeout reset
The `vector_store_client_test` could be flaky because the request timeout
was not consistently reset in all code paths. This could lead to a
timeout from a previous operation firing prematurely and failing the
test.

The fix ensures `abort_source_timeout` is reset before each request.
The implementation is also simplified by changing
`abort_source_timeout::reset` that combines the reset and arm
operations into a same invocation.
2025-10-10 16:48:54 +02:00
Karol Nowacki
d99a4c3bad vector_search: Refactor ANN request test
Refactor the `vector_store_client_test_ann_request` test to use the
`vs_mock_server` class, unifying the structure of the test cases.

This change also removes retry logic that waited for the server to be ready.
This is no longer necessary because the handler now exists for all index names
and consumes the entire request payload, preventing connection closures.

Previously, the server did not handle requests for unconfigured
indexes, which caused the connection to close. This could lead to a
race condition where the client would attempt to reuse a closed
connection.
2025-10-10 16:48:20 +02:00
Karol Nowacki
2eb752e582 vector_search: Fix flaky connection in tests
The vector store mock server was not reading the ANN request body,
which could cause it to prematurely close the connection.

This could lead to a race condition where the client attempts to reuse a
closed connection from its pool, resulting in a flaky test.

The fix is to always read the request body in the mock server.
2025-10-10 16:48:09 +02:00
Karol Nowacki
ac5e9c34b6 vector_search: Fix flaky test by mocking DNS queries
The `vector_store_client_uri_update_to_invalid` test was flaky because
it performed real DNS lookups, making it dependent on the network
environment.

This commit replaces the live DNS queries with a mock to make the test
hermetic and prevent intermittent failures.

`vector_search_metrics_test` test did not call configure{vs},
as a consequence the test did real DNS queries, which made the test
flaky.

The refreshes counter increment has been moved before the call to the resolver.
In tests, the resolver is mocked leading to lack of increments in production code.
Without this change, there is no way to test DNS counter increments.

The change also simplifies the test making it more readable.
2025-10-10 16:47:03 +02:00
Patryk Jędrzejczak
5f68b9dc6b test: test_raft_no_quorum: test_can_restart: deflake the read barrier call
Expecting the group 0 read barrier to succeed with a timeout of 1s, just
after restarting 3 out of 5 voters, turned out to be flaky. In some
unlikely scenarios, such as multiple vote splits, the Raft leader
election could finish after the read barrier times out.

To deflake the test, we increase the timeout of Raft operations back to
300s for read barriers we expect to succeed.

Fixes #26457

Closes scylladb/scylladb#26489
2025-10-10 15:22:39 +03:00
Asias He
13dd88b010 repair: Rename incremental mode name
Using the name regular as the incremental mode could be confusing, since
regular might be interpreted as the non-incremental repair. It is better
to use incremental directly.

Before:

- regular (standard incremental repair)
- full (full incremental repair)
- disabled (incremental repair disabled)

After:

- incremental (standard incremental repair)
- full (full incremental repair)
- disabled (incremental repair disabled)

Fixes #26503

Closes scylladb/scylladb#26504
2025-10-10 15:21:54 +03:00
Michał Chojnowski
85fd4d23fa test_sstable_compression_dictionaries_basic: reconnect robustly after node reboots
Using `driver_connect()` after a cluster restart isn't enough to ensure
full CQL availability, but the test assumes that it is.

Fix that by making the test wait for CQL availability via `get_ready_cql()`.

Also, replace some manual usages of wait_for_cql_and_get_hosts with
`get_ready_cql()` too.

Fixes scylladb/scylladb#25362

Closes scylladb/scylladb#25366
2025-10-10 14:27:02 +03:00
Piotr Dulikowski
0b800aab17 Merge 'db/view/view_building_worker: move discover_existing_staging_sstables() to the foreground' from Michał Jadwiszczak
db/view/view_building_worker: move discover_existing_staging_sstables() to the foreground
This patch moves `discover_existing_staging_sstables()` to be executed
from main level, instead of running it on the background fiber.

This method need to be run only once during the startup to collect
existing staging sstables, so there is no need to do it in the
background. This change will increase debugability of any further issues
related to it (like https://github.com/scylladb/scylladb/issues/26403).

Fixes https://github.com/scylladb/scylladb/issues/26417

The patch should be backported to 2025.4

Closes scylladb/scylladb#26446

* github.com:scylladb/scylladb:
  db/view/view_building_worker: move discover_existing_staging_sstables() to the foreground
  db/view/view_building_worker: futurize and rename `start_background_fibers()`
2025-10-09 18:24:50 +02:00
Michał Jadwiszczak
8d0d53016c db/view/view_building_worker: update state again if some batch was finished during the update
There was a race between loop in `view_building_worker::run_view_building_state_observer()`
and a moment when a batch was finishing its work (`.finally()` callback
in `view_building_worker::batch::start()`).

State observer waits on `_vb_state_machine.event` CV and when it's
awoken, it takes group0 read apply mutex and updates its state. While
updating the state, the observer looks at `batch::state` field and
reacts to it accordingly.
On the other hand, when a batch finishes its work, it sets `state` field
to `batch_state::finished` and does a broadcast on
`_vb_state_machine.event` CV.
So if the batch will execute the callback in `.finally()` while the
observer is updating its state, the observer may miss the event on the
CV and it will never notice that the batch was finished.

This patch fixes this by adding a `some_batch_finished` flag. Even if
the worker won't see an event on the CV, it will notice that the flag
was set and it will do next iteration.

Fixes scylladb/scylladb#26204

Closes scylladb/scylladb#26289
2025-10-09 18:17:22 +02:00
Avi Kivity
55d4d39ae3 Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski
This is a cherry-pick of https://github.com/scylladb/scylladb/pull/25412 commits, as the changes were reverted in 364316dd2f2212bbbb446eaa2a4b0bd53d125ad5 due to https://github.com/scylladb/scylladb/issues/26163.
The underlying problem (https://github.com/scylladb/scylladb/issues/26190) was fixed in seastar (https://github.com/scylladb/seastar/pull/2994), so https://github.com/scylladb/scylladb/pull/25412 commits are restored without changes (only rebase conflicts were resolved).

===
This patch series:
 - Increases the number of allowed scheduling groups to allow creation of `sl:driver`
 - Implements `create_driver_service_level` that creates `sl:driver` with shares=200 if it wasn't already created
 - Implements creation of `sl:driver` for new systems and tests in `raft_initialize_discovery_leader`
 - Modifies `topology_coordinator` to use  create `sl:driver` after upgrades.
 - Implements using `sl:driver` for new connections in `transport/server`
 - Adds to `transport/server` recognition of driver's control connections and forcing them to keep using `sl:driver`.
 - Adds tests to verify the new functionality
 - Modifies existing tests to let them pass after `sl:driver` is added
 - Modifies the documentation to contain new `sl:driver`

The changes were evaluated by a test with the following scenario ([test_connections-sl-driver.py](https://github.com/user-attachments/files/22021273/test_connections-sl-driver.py)):
 - Start ScyllaDB with one node
 - Create 1000 keyspaces, 1 table in each keyspace
 - Start `cassandra-stress` (`-rate threads=50  -mode native cql3`)
 - Run connection storm with 1000 session (100 python processes, 10 sessions each)

The maximum latency during connection storm dropped **from 224.94ms to 41.43ms** (those numbers are average from 20 test executions, were max latency was in [140ms, 361ms] before change and [31.4ms, 61.5ms] after).

The snippet of cassandra-stress output from the moment of connection storm:
Before:
```
type       total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
...
total,        789206,   85887,   85887,   85887,     0.6,     0.3,     2.0,     2.0,     2.5,     5.0,    9.0,  0.09679,      0,      0,       0,       0,       0,       0
total,        909322,  120116,  120116,  120116,     0.4,     0.2,     1.9,     2.0,     2.1,     3.1,   10.0,  0.09053,      0,      0,       0,       0,       0,       0
total,        964392,   55070,   55070,   55070,     0.9,     0.4,     2.0,     4.5,     7.7,    18.9,   11.0,  0.09203,      0,      0,       0,       0,       0,       0
total,        975705,   11313,   11313,   11313,     4.4,     3.5,     6.5,    24.5,    82.7,    83.0,   12.0,  0.11713,      0,      0,       0,       0,       0,       0
total,        987548,   11843,   11843,   11843,     4.2,     3.5,     6.5,    33.7,    48.6,    51.5,   13.0,  0.13366,      0,      0,       0,       0,       0,       0
total,        995422,    7874,    7874,    7874,     6.3,     4.0,     7.7,    85.6,   112.9,   113.5,   14.0,  0.14753,      0,      0,       0,       0,       0,       0
total,       1007228,   11806,   11806,   11806,     4.3,     3.5,     6.5,    29.1,    43.8,    87.1,   15.0,  0.15598,      0,      0,       0,       0,       0,       0
total,       1012840,    5612,    5612,    5612,     8.2,     5.0,    11.5,   121.8,   166.6,   170.1,   16.0,  0.16535,      0,      0,       0,       0,       0,       0
total,       1016186,    3346,    3346,    3346,    13.4,     7.4,    20.1,   204.9,   207.6,   210.4,   17.0,  0.17405,      0,      0,       0,       0,       0,       0
total,       1025462,    9276,    9276,    9276,     6.3,     3.9,     9.6,    74.6,   206.8,   210.0,   18.0,  0.17800,      0,      0,       0,       0,       0,       0
total,       1035979,   10517,   10517,   10517,     4.8,     3.5,     6.7,    38.5,    82.6,    83.0,   19.0,  0.18120,      0,      0,       0,       0,       0,       0
total,       1047488,   11509,   11509,   11509,     4.3,     3.5,     6.0,    32.6,    72.3,    74.0,   20.0,  0.18334,      0,      0,       0,       0,       0,       0
total,       1077456,   29968,   29968,   29968,     1.7,     1.6,     2.9,     3.6,     7.0,     8.2,   21.0,  0.17943,      0,      0,       0,       0,       0,       0
total,       1105490,   28034,   28034,   28034,     1.8,     1.8,     3.5,     4.6,     5.3,    13.8,   22.0,  0.17609,      0,      0,       0,       0,       0,       0
total,       1132221,   26731,   26731,   26731,     1.9,     1.8,     3.8,     5.2,     8.4,    11.1,   23.0,  0.17314,      0,      0,       0,       0,       0,       0
total,       1162149,   29928,   29928,   29928,     1.7,     1.7,     3.0,     4.5,     8.0,     9.1,   24.0,  0.16950,      0,      0,       0,       0,       0,       0
...
```

After:
```
type       total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
...
total,        822863,   94379,   94379,   94379,     0.5,     0.3,     2.0,     2.0,     2.1,     3.7,    9.0,  0.06669,      0,      0,       0,       0,       0,       0
total,        937337,  114474,  114474,  114474,     0.4,     0.2,     2.0,     2.0,     2.1,     3.4,   10.0,  0.06301,      0,      0,       0,       0,       0,       0
total,        986630,   49293,   49293,   49293,     1.0,     1.0,     2.0,     2.1,    17.9,    19.0,   11.0,  0.07318,      0,      0,       0,       0,       0,       0
total,       1026734,   40104,   40104,   40104,     1.2,     1.0,     2.0,     2.2,     6.3,     7.1,   12.0,  0.08410,      0,      0,       0,       0,       0,       0
total,       1066124,   39390,   39390,   39390,     1.3,     1.0,     2.0,     2.2,     2.6,     3.4,   13.0,  0.09108,      0,      0,       0,       0,       0,       0
total,       1103082,   36958,   36958,   36958,     1.3,     1.1,     2.1,     2.5,     3.1,     4.2,   14.0,  0.09643,      0,      0,       0,       0,       0,       0
total,       1141987,   38905,   38905,   38905,     1.3,     1.0,     2.0,     2.4,    11.4,    12.7,   15.0,  0.09894,      0,      0,       0,       0,       0,       0
total,       1180023,   38036,   38036,   38036,     1.3,     1.0,     2.0,     3.7,     5.6,     7.1,   16.0,  0.10070,      0,      0,       0,       0,       0,       0
total,       1216481,   36458,   36458,   36458,     1.4,     1.0,     2.1,     3.6,     4.7,     5.0,   17.0,  0.10210,      0,      0,       0,       0,       0,       0
total,       1256819,   40338,   40338,   40338,     1.2,     1.0,     2.0,     2.2,     3.5,     5.4,   18.0,  0.10173,      0,      0,       0,       0,       0,       0
total,       1295122,   38303,   38303,   38303,     1.3,     1.0,     2.0,     2.4,    21.0,    21.1,   19.0,  0.10136,      0,      0,       0,       0,       0,       0
total,       1334743,   39621,   39621,   39621,     1.3,     1.0,     2.0,     2.3,     3.3,     4.0,   20.0,  0.10055,      0,      0,       0,       0,       0,       0
total,       1375579,   40836,   40836,   40836,     1.2,     1.0,     2.0,     2.1,     3.4,     5.7,   21.0,  0.09927,      0,      0,       0,       0,       0,       0
total,       1415576,   39997,   39997,   39997,     1.2,     1.0,     2.0,     2.3,     3.2,     4.1,   22.0,  0.09807,      0,      0,       0,       0,       0,       0
total,       1449268,   33692,   33692,   33692,     1.5,     1.4,     2.5,     3.2,     4.2,     5.6,   23.0,  0.09800,      0,      0,       0,       0,       0,       0
total,       1471873,   22605,   22605,   22605,     2.2,     2.0,     4.8,     5.9,     7.0,     7.9,   24.0,  0.10015,      0,      0,       0,       0,       0,       0
...
```

Fixes: https://github.com/scylladb/scylladb/issues/24411

This is a new feature, so no backport needed.

Closes scylladb/scylladb#26411

* github.com:scylladb/scylladb:
  docs: workload-prioritization: add driver service level
  test: add test to verify use of `sl:driver`
  transport: use `sl:driver` to handle driver's control connections
  transport: whitespace only change in update_scheduling_group
  transport: call update_scheduling_group for non-auth connections
  generic_server: transport: start using `sl:driver` for new connections
  test: add test_desc_* for driver service level
  test: service_levels: add tests for sl:driver creation and removal
  test: add reload_raft_topology_state() to ScyllaRESTAPIClient
  service_level_controller: automatically create `sl:driver`
  service_level_controller: methods to create driver service level
  service_level_controller: handle special sl:driver in DESC output
  topology_coordinator: add service_level_controller reference
  system_keyspace: add service_level_driver_created
  test: add MAX_USER_SERVICE_LEVELS
2025-10-09 17:28:39 +03:00
Dawid Mędrek
ecc955fbe0 index/secondary_index_manager: Take std::span instead of std::vector 2025-10-09 16:17:07 +02:00
Dawid Mędrek
074f0f2e4c index/secondary_index_manager: Add missing const qualifier 2025-10-09 16:06:50 +02:00
Dawid Mędrek
7baf95bc4b index/vector_index: Add missing const qualifiers 2025-10-09 16:06:24 +02:00
Dawid Mędrek
4486ac0891 cql3/statements/index_prop_defs.cc: Remove unused include 2025-10-09 16:01:56 +02:00
Dawid Mędrek
d50c2f7c74 cql3/statements/index_prop_defs.cc: Mark function as TU-local 2025-10-09 16:00:44 +02:00
Dawid Mędrek
89b3d0c582 cql3/statements/index_prop_defs: Mark methods as const-qualified 2025-10-09 15:53:29 +02:00
Avi Kivity
bb02295695 setup: add the lazytime XFS mount option
In f828fe0d59 ("setup: add the lazytime XFS version") we added the
lazytime mount option to /var/lib/scylla, but it was quickly reverted
(8f5e80e61a) as it caused a regression on CentOS 7.

We reinstate it now with a kernel version check. This will avoid
the lazytime mount option on CentOS 7, which is unsupported anyway.

The lazytime option avoids marking the inode as dirty if it's only for the
purpose of updating mtime/ctime. This won't help much while writing sstables
(since the write also updates extent information), but may help a little
with with commitlog writes, since those are pure overwrites.

It likely won't help with the RWF_NOWAIT violations seen in [1], since
those are likely due to in-memory locking, not flushing dirty inodes
to disk.

Tested with an install to Ubuntu 24.04 LTS followed by a scylla_setup run.
The lazytime option was added the the .mount file and showed up in
the live mount.

[1] https://github.com/scylladb/seastar/issues/2974

Closes scylladb/scylladb#26436
Fixes #26002
2025-10-09 15:55:58 +03:00
Ernest Zaslavsky
c2bab430d7 s3_client: fix when condition to prevent infinite locking
Refine condition variable predicate in filling fiber to avoid
indefinite waiting when `close` is invoked.

Closes scylladb/scylladb#26449
2025-10-09 15:55:37 +03:00
Michał Chojnowski
c35b82b860 test/cluster/test_bti_index.py: avoid a race with CQL tracing
The test uses CQL tracing to check which files were read by a query.
This is flaky if the coordinator and the replica are different shards,
because the Python driver only waits for the coordinator, and not
for replicas, to finish writing their traces.
(So it might happen that the Python driver returns a result
with only coordinator events and no replica events).

Let's just dodge the issue by using --smp=1.

Fixes scylladb/scylladb#26432

Closes scylladb/scylladb#26434
2025-10-09 13:22:06 +03:00
Michał Chojnowski
87e3027c81 docs: fix a parameter name in API calls in sstable-dictionary-compression.rst
The correct argument name is `cf`, not `table`.

Fixes scylladb/scylladb#25275

Closes scylladb/scylladb#26447
2025-10-09 13:18:47 +03:00
Robert Bindar
2c74a6981b Make scylla_io_setup detect request size for best write IOPS
We noticed during work on scylladb/seastar#2802 that on i7i family
(later proved that it's valid for i4i family as well),
the disks are reporting the physical sector sizes incorrectly
as 512bytes, whilst we proved we can render much better write IOPS with
4096bytes.

This is not the case on AWS i3en family where the reported 512bytes
physical sector size is also the size we can achieve the best write IOPS.

This patch works around this issue by changing `scylla_io_setup` to parse
the instance type out of `/sys/devices/virtual/dmi/id/product_name`
and run iotune with the correct request size based on the instance type.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#25315
2025-10-08 14:30:52 +03:00
Piotr Dulikowski
fe7ffc5e5d Merge 'service/qos: set long timeout for auth queries on SL cache update' from Michael Litvak
pass an appropriate query state for auth queries called from service
level cache reload. we use the function qos_query_state to select a
query_state based on caller context - for internal queries, we set a
very long timeout.

the service level cache reload is called from group0 reload. we want it
to have a long timeout instead of the default 5 seconds for auth
queries, because we don't have strict latency requirement on the one
hand, and on the other hand a timeout exception is undesired in the
group0 reload logic and can break group0 on the node.

Fixes https://github.com/scylladb/scylladb/issues/25290

backport possible to improve stability

Closes scylladb/scylladb#26180

* github.com:scylladb/scylladb:
  service/qos: set long timeout for auth queries on SL cache update
  auth: add query_state parameter to query functions
  auth: refactor query_all_directly_granted
2025-10-08 12:37:01 +02:00
Michał Jadwiszczak
84e4e34d81 db/view/view_building_worker: move discover_existing_staging_sstables() to the foreground
This patch moves `discover_existing_staging_sstables()` to be executed
from main level, instead of running it on the background fiber.

This method need to be run only once during the startup to collect
existing staging sstables, so there is no need to do it in the
background. This change will increase debugability of any further issues
related to it (like scylladb/scylladb#26403).

Fixes scylladb/scylladb#26417
2025-10-08 11:16:07 +02:00
Michał Jadwiszczak
575dce765e db/view/view_building_worker: futurize and rename start_background_fibers()
Next commit will move `discover_existing_staging_sstables()`
to the foreground, so to prepare for this we need to futurize
`start_background_fibers()` method and change its name to better reflect
its purpose.
2025-10-08 10:19:41 +02:00
Andrzej Jackowski
0072b75541 docs: workload-prioritization: add driver service level
Refs: scylladb/scylladb#24411
2025-10-08 08:25:38 +02:00
Andrzej Jackowski
f720ce0492 test: add test to verify use of sl:driver
`sl:driver` is expected to be used for new and control connections,
but other connections that run user load should not use it after
the user is authenticated.

Refs: scylladb/scylladb#24411
2025-10-08 08:25:33 +02:00
Andrzej Jackowski
f99b8c4a55 transport: use sl:driver to handle driver's control connections
Before `sl:driver` was introduced, service levels were assigned as
follows:
 1. New connections were processed in `main`.
 2. After user authentication was completed, the connection's SL was
    changed to the user's SL (or `sl:default` if the user had no SL).

This commit introduces `service_level_state` to `client_state` and
implements the following logic in `transport/server`:

 1. If `sl:driver` is not present in the system (for example, it was
    removed), service levels behave as described above.
 2. If `sl:driver` is present, the flow is:
   I.   New connections use `sl:driver`.
   II.  After user authentication is completed, the connection's SL is
        changed to the user's SL (or `sl:default`).
   III. If a REGISTER (to events) request is handled, the client is
        processing the control connection. We mark the client_state
        to permanently use `sl:driver`.

The aforementioned state `2.III` is represented by
`_control_connection` flag in `client_state`.

Fixes: scylladb/scylladb#24411
2025-10-08 08:25:28 +02:00
Andrzej Jackowski
fd36bc418a transport: whitespace only change in update_scheduling_group
The indentation is changed because it will be required in the next
commit of this patch series.
2025-10-08 08:25:22 +02:00
Andrzej Jackowski
278019c328 transport: call update_scheduling_group for non-auth connections
Before this change, unauthorized connections stayed in `main`
scheduling group. It is not ideal, in such case, rather `sl:default`
should be used, to have a consistent behavior with a scenario
where users is authenticated but there is no service level assigned
to the user.

This commit adds a call to `update_scheduling_group` at the end of
connection creation for an unauthenticated user, to make sure the
service level is switched to `sl:default`.

Fixes: scylladb/scylladb#26040
2025-10-08 08:25:17 +02:00
Andrzej Jackowski
14081d0727 generic_server: transport: start using sl:driver for new connections
Before this change, new connections were handled in a default
scheduling group (`main`), because before the user is authenticated
we do not know which service level should be used. With the new
`sl:driver` service level, creation of new connections can be moved to
`sl:driver`.

We switch the service level as early as possible, in `do_accepts`.
There is a possibility, that `sl:driver` will not exist yet, for
instance, in specific upgrade cases, or if it was removed. Therefore,
we also switch to `sl:driver` after a connection is accepted.

Refs: scylladb/scylladb#24411
2025-10-08 08:25:12 +02:00
Andrzej Jackowski
b62135f767 test: add test_desc_* for driver service level
Driver service level is a special service level that is created
automatically by the system. Therefore, it requires special handling
in DESC SCHEMA WITH INTERNALS and those test verifies the special
behavior.

Refs: scylladb/scylladb#24411
2025-10-08 08:25:07 +02:00
Andrzej Jackowski
0ddf46c7b4 test: service_levels: add tests for sl:driver creation and removal
Refs: scylladb/scylladb#24411
2025-10-08 08:25:02 +02:00
Andrzej Jackowski
9e9bca9bdb test: add reload_raft_topology_state() to ScyllaRESTAPIClient
To encapsulate `/storage_service/raft_topology/reload` API call
2025-10-08 08:24:57 +02:00
Andrzej Jackowski
c59a7db1c9 service_level_controller: automatically create sl:driver
This commit:
  - Increases the number of allowed scheduling groups to allow the
    creation of `sl:driver`.
  - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
    `sl:driver` until all nodes have increased the number of
    scheduling groups.
  - Starts using `get_create_driver_service_level_mutations`
    to unconditionally create `sl:driver` on
    `raft_initialize_discovery_leader`. The purpose of this code
    path is ensuring existence of `sl:driver` in new system and tests.
  - Starts using `migrate_to_driver_service_level` to create `sl:driver`
    if it is not already present. The creation of `sl:driver` is
    managed by `topology_coordinator`, similar to other system keyspace
    updates, such as the `view_builder` migration. The purpose of this
    code path is handling upgrades.
  - Modifies related tests to pass after `sl:driver` is added.

Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.

Refs: scylladb/scylladb#24411
2025-10-08 08:24:43 +02:00
Andrzej Jackowski
923559f46a service_level_controller: methods to create driver service level
This commit implements `get_create_driver_service_level_mutations`
and `migrate_to_driver_service_level` in service_level_controller.
Both methods create `sl:driver` with shares=200 and store this fact
in `system.scylla_local`. Both methods will be used later in this
patch series for automatic creation of sl:driver.

Refs: scylladb/scylladb#24411
2025-10-08 08:24:38 +02:00
Andrzej Jackowski
2d296a2f9b service_level_controller: handle special sl:driver in DESC output
Later in this patch series, `sl:driver` will be added as a special
service level created automatically by the system. It needs special
handling in `DESC SCHEMA ...` to ensure that during backup restore:
  1. CREATE SERVICE LEVEL does not fail if `sl:driver` already exists
  2. If `sl:driver` exists, its configuration is fully restored (emit
     ALTER SERVICE LEVEL).
  3. If `sl:driver` was removed, the information is retained (emit
     DROP SERVICE LEVEL instead of CREATE/ALTER).

Refs: scylladb/scylladb#24411
2025-10-08 08:24:33 +02:00
Andrzej Jackowski
1ff605005e topology_coordinator: add service_level_controller reference
This adds a reference to sl_controller so that, later in this patch
series, topology_coordinator can manage creating `sl:driver` once
group0 is fully operational.

Refs: scylladb/scylladb#24411
2025-10-08 08:24:28 +02:00