Commit Graph

48576 Commits

Author SHA1 Message Date
Piotr Dulikowski
2bb800c004 qos: don't populate effective service level cache until auth is migrated to raft
Right now, service levels are migrated in one group0 command and auth
is migrated in the next one. This has a bad effect on the group0 state
reload logic - modifying service levels in group0 causes the effective
service levels cache to be recalculated, and to do so we need to fetch
information about all roles. If the reload happens after SL upgrade and
before auth upgrade, the query for roles will be directed to the legacy
auth tables in system_auth - and the query, being a potentially remote
query, has a timeout. If the query times out, it will throw
an exception which will break the group0 apply fiber and the node will
need to be restarted to bring it back to work.

In order to solve this issue, make sure that the service level module
does not start populating and using the service level cache until both
service levels and auth are migrated to raft. This is achieved by adding
the check both to the cache population logic and the effective service
level getter - they now look at service level's accessor new method,
`can_use_effective_service_level_cache` which takes a look at the auth
version.

Fixes: scylladb/scylladb#24963
2025-07-29 11:37:37 +02:00
Gleb Natapov
d5e023bbad topology coordinator: drop no longer needed token metadata barrier
Currently we do token metadata barrier before accepting a replacing
node. It was needed for the "replace with the same IP" case to make sure
old request will not contact new node by mistake. But now since we
address nodes by id this is no longer possible since old requests will
use old id and will be rejected.

Closes scylladb/scylladb#25047
2025-07-24 11:15:42 +02:00
Tomasz Grabiec
c9bf010d6d Merge 'test.py: skip cleaning testlog' from Andrei Chekun
Skip removing any artifacts when -s is provided between test.py invocations.
Logs from the previous run will be overwritten if the tests are executed
again. For example:
1. Execute tests A, B, C with parameter -s
2. All logs are present even if the tests passed
3. Execute test B with parameter -s
4. Logs for A and C are from the first run
5. Logs for B are from the most recent run
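
The retention behaviour in the steps above can be modelled with a small sketch. This is a toy model, not test.py's code: `run_with_s` and the `logs` dictionary are hypothetical names, and the model only covers invocations that pass `-s`.

```python
def run_with_s(logs: dict[str, str], tests: list[str], run_id: int) -> None:
    """Model one `test.py -s` invocation (hypothetical helper).

    Nothing is cleaned up beforehand; each executed test overwrites
    only its own log entry.
    """
    for name in tests:
        logs[name] = f"log of {name} from run {run_id}"


logs: dict[str, str] = {}
run_with_s(logs, ["A", "B", "C"], run_id=1)  # step 1
run_with_s(logs, ["B"], run_id=2)            # step 3

for name in sorted(logs):
    print(logs[name])
```

After the second call, the entries for A and C still come from run 1, while B's entry comes from run 2.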

Backport is not needed, since it is a framework enhancement.

Closes scylladb/scylladb#24838

* github.com:scylladb/scylladb:
  test.py: skip cleaning artifacts when -s provided
  test.py: move deleting directory to prepare_dir
2025-07-24 09:46:42 +03:00
Gleb Natapov
ab6e328226 storage_proxy: preallocate write response handler hash table
Currently it grows dynamically and triggers oversized allocation
warning. Also it may be hard to find sufficient contiguous memory chunk
after the system runs for a while. This patch pre-allocates enough
memory for ~1M outstanding writes per shard.

Fixes #24660
Fixes #24217

Closes scylladb/scylladb#25098
2025-07-24 09:46:42 +03:00
Patryk Jędrzejczak
f89ffe491a Merge 'storage_service: cancel all write requests after stopping transports' from Sergey Zolotukhin
When a node shuts down, in storage service, after storage_proxy RPCs are stopped, some write handlers within storage_proxy may still be waiting for background writes to complete. These handlers hold appropriate ERMs to block schema changes before the write finishes. After the RPCs are stopped, these writes cannot receive the replies anymore.

If, at the same time, there are RPC commands executing `barrier_and_drain`, they may get stuck waiting for these ERM holders to finish, potentially blocking node shutdown until the writes time out.

This change introduces cancellation of all outstanding write handlers from storage_service after the storage proxy RPCs were stopped.

Fixes scylladb/scylladb#23665

Backport: since this fixes an issue that frequently causes issues in CI, backport to 2025.1, 2025.2, and 2025.3.

Closes scylladb/scylladb#24714

* https://github.com/scylladb/scylladb:
  storage_service: Cancel all write requests on storage_proxy shutdown
  test: Add test for unfinished writes during shutdown and topology change
2025-07-24 09:46:42 +03:00
Gleb Natapov
ddc3b6dcf5 migration manager: assert that if schema pull is disabled the group0 is not in use_pre_raft_procedures state
If schema pulls are disabled, group0 is used to bring the schema up to date
by calling start_group0_operation() which executes raft read barrier
internally, but if the group0 is still in use_pre_raft_procedures
start_group0_operation() silently does nothing. Later the code that
assumes that schema is already up-to-date will fail and print warnings
into the log. But since getting queries in the state when a node is in
raft enabled mode but group0 is still not configured is illegal it is
better to make those errors more visible by asserting them during
testing.

Closes scylladb/scylladb#25112
2025-07-23 14:10:17 +02:00
Botond Dénes
b65a2e2303 Update seastar submodule
* seastar 26badcb1...60b2e7da (42):
  > Revert "Fix incorrect defaults for io queue iops/bandwidth"
  > fair_queue: Ditch queue-wide accumulator reset on overflow
  > addr2line, scripts/stall-analyser: change the default tool to llvm-addr2line
  > Fix incorrect defaults for io queue iops/bandwidth
  > core/reactor: add cxx_exceptions() getter
  > gate: make destructor virtual
  > scripts/seastar-addr2line: change the default addr2line utility to llvm-addr2line
  > coding-style: Align example return types
  > reactor: Remove min_vruntime() declaration
  > reactor: Move enable_timer() method to private section
  > smp: fix missing span include
  > core: Don't keep internal errors counter on reactor
  > pollable_fd: Untangle shutdown()
  > io_queue: Remove deprecated statistics getters
  > fair_queue: Remove queued/executing resource counters
  > reactor: Move set_current_task() from public reactor API
  > util: make SEASTAR_ASSERT() failure generate SIGABRT
  > core: fix high CPU use at idle on high core count machines
  > Merge 'Move output IO throttler to IO queue level' from Pavel Emelyanov
    fair_queue: Move io_throttler to io_queue.hh
    fair_queue: Move metrics from to io_queue::stream
    fair_queue: Remove io_throttler from tests
    fair_queue_test: Remove io-throttler from fair-queue
    fair_queue: Remove capacity getters
    fair_queue: Move grab_result into io_queue::stream too
    fair_queue: Move throtting code to io_queue.cc
    fair_queue: Move throttling code to io_queue::stream class
    fair_queue: Open-code dispatch_requests() into users
    fair_queue: Split dispatch_requests() into top() and pop_front()
    fair_queue: Swap class push back and dispatch
    fair_queue: Configure forgiving factor externally
    fair_queue: Move replenisher kick to dispatch caller
    io_queue: Introduce io_queue::stream
    fair_queue: Merge two grab_capacity overloads
    fair_queue: Detatch outcoming capacity grabbing from main dispatch loop
    fair_queue: Move available tokens update into if branch
    io_queue: Rename make_fair_group_config into configure_throttler
    io_queue: Rename get_fair_group into get_throttler
    fair_queue: Rename fair_group -> io_throttler
  > http::reply: Add 308 (permanent redirect) and make pretty-print handle unknown values
  > Merge 'Relax reactor coupling with file_data_source_impl' from Pavel Emelyanov
    reactor: Relax friendship with file_data_source_impl
    fstream: Use direct io_stats reference
  > thread_pool: Relax coupling with reactor
  > reactor: Mark some IO classes management methods private
  > http: Deprecate json_exception
  > io_tester: Collect and report disk queue length samples
  > test/perf: Add context-switch measurer
  > http/client: Zero-copy forward content-length body into the underlying stream
  > json2code: Genrate move constructor and move-assignment operator
  > Merge 'Semi-mixed mode for output_stream' from Pavel Emelyanov
    output_stream: Support semi-mixed mode writing
    output_stream: Complete write(temporary_buffer) piggy-back-ing write(packet)
    iostream: Add friends for iostream tests
    packet: Mark bool cast operator const
    iostream: Document output_stream::write() methods
  > io_tester: Show metrics about requests split
  > reactor: add counter for internal errors
  > iotune: Print correct throughput units
  > core: add label to io_threaded_fallbacks to categorize operations
  > slab: correct allocation logic and enforce memory limits
  > Merge 'Fix for non-json http function_handlers' from Travis Downs
    httpd_test: add test for non-JSON function handler
    function_handlers: avoid implicit conversions
    http: do not always treat plain text reply as json
  > Merge 'tls: add ALPN support' from Łukasz Kurowski
    tls: add server-side ALPN support
    tls: add client-side ALPN support
  > Merge 'coroutine: experimental: generator: implement move and swap' from Benny Halevy
    coroutine: experimental: generator: implement move and swap
    coroutine: experimental: generator: unconstify buffer capacity
  > future: downgrade asserts
  > output_stream: Remove unused bits
  > Merge 'Upstream a couple of minor reactor optimizations' from Travis Downs
    Match type for pure_check_for_work
    Do not use std::function for check_for_work()
  > Handle ENOENT in getgrnam

Includes scylla-gdb.py update by Pavel Emelyanov.

Closes scylladb/scylladb#25094
2025-07-22 18:19:58 +02:00
Sergey Zolotukhin
e0dc73f52a storage_service: Cancel all write requests on storage_proxy shutdown
During a graceful node shutdown, RPC listeners are stopped in `storage_service::drain_on_shutdown`
as one of the first steps. However, even after RPCs are shut down, some write handlers in
`storage_proxy` may still be waiting for background writes to complete. These handlers retain the ERM.
Since the RPC subsystem is no longer active, replies cannot be received, and if any RPC commands are
concurrently executing `barrier_and_drain`, they may get stuck waiting for those writes. This can block
the messaging server shutdown and delay the entire shutdown process until the write timeout occurs.

This change introduces the cancellation of all outstanding write handlers in `storage_proxy`
during shutdown to prevent unnecessary delays.

Fixes scylladb/scylladb#23665
2025-07-22 15:03:30 +02:00
Sergey Zolotukhin
bc934827bc test: Add test for unfinished writes during shutdown and topology change
This test reproduces an issue where a topology change and an ongoing write query
during query coordinator shutdown can cause the node to get stuck.

When a node receives a write request, it creates a write handler that holds
a copy of the current table's ERM (Effective Replication Map). The ERM ensures
that no topology or schema changes occur while the request is being processed.

After the query coordinator receives the required number of replica write ACKs
to satisfy the consistency level (CL), it sends a reply to the client. However,
the write response handler remains alive until all replicas respond — the remaining
writes are handled in the background.

During shutdown, when all network connections are closed, these responses can no longer
be received. As a result, the write response handler is only destroyed once the write
timeout is reached.

This becomes problematic because the ERM held by the handler blocks topology or schema
change commands from executing. Since shutdown waits for these commands to complete,
this can lead to unnecessary delays in node shutdown and restarts, and occasional
test case failures.

Test for: scylladb/scylladb#23665
2025-07-22 15:03:13 +02:00
Ran Regev
3d82b9485e docs: update nodetool restore documentation for --sstables-file-list
Fixes: #25128
A leftover from #25077

Closes scylladb/scylladb#25129
2025-07-22 14:43:35 +02:00
Yaron Kaikov
4445c11c69 ./github/workflows/conflict_reminder: improve workflow with weekly notifications
- Change schedule from twice weekly (Mon/Thu) to once weekly (Mon only)
- Extend notification cooldown period from 3 days to 1 week
- Prevent notification spam while maintaining immediate conflict detection on pushes

Fixes: https://github.com/scylladb/scylladb/issues/25130

Closes scylladb/scylladb#25131
2025-07-22 15:21:12 +03:00
Avi Kivity
e4c4141d97 test.py: don't crash on early cleanup of ScyllaServer
If a test fails very early (still have to find why), test.py
crashes while flushing a non-existent log_file, as shown below.

To fix, initialize the property to None and check it during
cleanup.

```
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------

'ScyllaServer' object has no attribute 'log_file'
test_cluster_features Traceback (most recent call last):
  File "/home/avi/scylla-maint/./test.py", line 816, in <module>
    sys.exit(asyncio.run(main()))
             ~~~~~~~~~~~^^^^^^^^
  File "/usr/lib64/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/usr/lib64/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/home/avi/scylla-maint/./test.py", line 523, in main
    total_tests_pytest, failed_pytest_tests = await run_all_tests(signaled, options)
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avi/scylla-maint/./test.py", line 452, in run_all_tests
    failed += await reap(done, pending, signaled)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avi/scylla-maint/./test.py", line 418, in reap
    result = coro.result()
  File "/home/avi/scylla-maint/test/pylib/suite/python.py", line 143, in run
    return await super().run(test, options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/avi/scylla-maint/test/pylib/suite/base.py", line 216, in run
    await test.run(options)
  File "/home/avi/scylla-maint/test/pylib/suite/topology.py", line 48, in run
    async with get_cluster_manager(self.uname, self.suite.clusters, str(self.suite.log_dir)) as manager:
               ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/contextlib.py", line 221, in __aexit__
    await anext(self.gen)
  File "/home/avi/scylla-maint/test/pylib/scylla_cluster.py", line 2006, in get_cluster_manager
    await manager.stop()
  File "/home/avi/scylla-maint/test/pylib/scylla_cluster.py", line 1539, in stop
    await self.clusters.put(self.cluster, is_dirty=True)
  File "/home/avi/scylla-maint/test/pylib/pool.py", line 104, in put
    await self.destroy(obj)
  File "/home/avi/scylla-maint/test/pylib/suite/python.py", line 65, in recycle_cluster
    srv.log_file.close()
    ^^^^^^^^^^^^
AttributeError: 'ScyllaServer' object has no attribute 'log_file'
```
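
The fix described above (initialize the property to None and check it during cleanup) can be sketched as follows. This is a minimal illustration, not the actual test.py change: `Server` and `cleanup` are hypothetical stand-ins for the real classes and functions.

```python
import os
import tempfile


class Server:
    """Hypothetical stand-in for a test server object."""

    def __init__(self) -> None:
        # Define the attribute up front so cleanup can always inspect it,
        # even if startup fails before the log file is opened.
        self.log_file = None

    def start(self, path: str) -> None:
        self.log_file = open(path, "w")


def cleanup(srv: Server) -> None:
    # Only close the log if it was actually opened.
    if srv.log_file is not None:
        srv.log_file.close()
        srv.log_file = None


# Early failure: start() never ran, yet cleanup does not raise AttributeError.
early_failure = Server()
cleanup(early_failure)

# Normal path: the log is opened and then closed by cleanup.
with tempfile.TemporaryDirectory() as tmp:
    started = Server()
    started.start(os.path.join(tmp, "server.log"))
    cleanup(started)
```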

Closes scylladb/scylladb#24885
2025-07-22 12:39:01 +02:00
Avi Kivity
2db2b42556 sstables: version: drop custom operator<=>
The default comparison for enums is equivalent and
sufficient.

Closes scylladb/scylladb#24888
2025-07-22 12:39:01 +02:00
Avi Kivity
e89f6c5586 config, main: make cpu scheduling mandatory
CPU scheduling has been with us since 641aaba12c
(2017), and no one ever disables it. Likely nothing really works without
it.

Make it mandatory and mark the option unused.

Closes scylladb/scylladb#24894
2025-07-22 12:39:01 +02:00
Avi Kivity
ee138217ba alternator: simplify std::views::transform calls that extract a member from a class
Rather than calling std::views::transform with a lambda that extracts
a member from a class, call std::views::transform with a pointer-to-member
to do the same thing. This results in more concise code.

Closes scylladb/scylladb#25012
2025-07-22 12:39:01 +02:00
Jakub Smolar
6e0a063ce3 gdb: handle zero-size reads in managed_bytes
Fixes: https://github.com/scylladb/scylladb/issues/25048

Closes scylladb/scylladb#25050
2025-07-22 12:39:01 +02:00
Nadav Har'El
298a0ec4de test/cqlpy: in README.md, remind users of run-cassandra to set NODETOOL
test/cqlpy/README.md explains how to run the cqlpy tests against
Cassandra, and mentions that if you don't have "nodetool" in your path
you need to set the NODETOOL variable. However, when giving a simple
example how to use the run-cassandra script, we forgot to remind the
user to set NODETOOL in addition to CASSANDRA, causing confusion for
users who didn't know why tests were failing.

So this patch fixes the section in test/cqlpy/README.md with the
run-cassandra example to also set the NODETOOL environment variable,
not just CASSANDRA.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25051
2025-07-22 12:39:00 +02:00
Aleksandra Martyniuk
b5026edf49 tasks: change _finished_children type
Parent task keeps a vector of statuses (task_essentials) of its finished
children. When the children number is large - for example because we
have many tables and a child task is created for each table - we may hit
oversize allocation while adding a new child essentials to the vector.

Keep task_essentials of children in chunked_vector.

Fixes: #25040.

Closes scylladb/scylladb#25064
2025-07-22 12:39:00 +02:00
Pavel Emelyanov
d94be313c1 Merge 'test: audit: ignore cassandra user audit logs in AUTH tests' from Andrzej Jackowski
Audit tests are vulnerable to noise from LOGIN queries (because AUTH
audit logs can appear at any time). Most tests already use the
`filter_out_noise` mechanism to remove this noise, but tests
focused on AUTH verification did not, leading to sporadic failures.

This change adds a filter to ignore AUTH logs generated by the default
"cassandra" user, so tests only verify logs from the user created
specifically for each test.

Additionally, this PR:
 - Adds missing `nonlocal new_rows` statement that prevented some checks from being called
 - Adds a testcase for audit logs of `cassandra` user

Fixes: https://github.com/scylladb/scylladb/issues/25069

Better backport those test changes to 2025.3. 2025.2 and earlier don't have `./cluster/dtest/audit_test.py`.

Closes scylladb/scylladb#25111

* github.com:scylladb/scylladb:
  test: audit: add cassandra user test case
  test: audit: ignore cassandra user audit logs in AUTH tests
  test: audit: change names of `filter_out_noise` parameters
  test: audit: add missing `nonlocal new_rows` statement
2025-07-22 10:42:16 +03:00
Pavel Emelyanov
295165d8ea Merge 's3_client: Enhance s3_client error handling' from Ernest Zaslavsky
Enhance and fix error handling in the `chunked_download_source` to prevent errors seeping from the request callback. Also stop retrying on Seastar's side since it is going to break the integrity of data which may be downloaded more than once for the same range.

Fixes: https://github.com/scylladb/scylladb/issues/25043

Should be backported to 2025.3 since we have an intention to release native backup/restore feature

Closes scylladb/scylladb#24883

* github.com:scylladb/scylladb:
  s3_client: Disable Seastar-level retries in HTTP client creation
  s3_test: Validate handling of non-`aws_error` exceptions
  s3_client: Improve error handling in chunked_download_source
  aws_error: Add factory method for `aws_error` from exception
2025-07-22 10:40:39 +03:00
Ran Regev
dd67d22825 nodetool restore: sstable list from a file
Fixes: #25045

Added the ability to supply the list of files to
restore from a given file.
Mainly required for local testing.

Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#25077
2025-07-22 09:11:02 +03:00
Ernest Zaslavsky
fc2c9dd290 s3_client: Disable Seastar-level retries in HTTP client creation
Prevent Seastar from retrying HTTP requests to avoid buffer double-feed
issues when an entire request is retried. This could cause data
corruption in `chunked_download_source`. The change is global for every
instance of `s3_client`, but it is still safe because:
* Seastar's `http_client` resets connections regardless of retry behavior
* `s3_client` retry logic handles all error types—exceptions, HTTP errors,
  and AWS-specific errors—via `http_retryable_client`
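
The double-feed hazard can be illustrated with a toy model. This is not Seastar or `s3_client` code: `FlakyRange` and `download_with_blind_retry` are hypothetical names, and the sketch only shows the general problem of a transport-level retry replaying a whole request into a consumer that already received part of the range.

```python
class FlakyRange:
    """Delivers a byte range in chunks; the first attempt fails midway."""

    def __init__(self, data: bytes, chunk: int) -> None:
        self.data = data
        self.chunk = chunk
        self.attempts = 0

    def request(self, on_chunk) -> None:
        self.attempts += 1
        for i in range(0, len(self.data), self.chunk):
            # Simulate a connection reset after the first chunk, first attempt only.
            if self.attempts == 1 and i >= self.chunk:
                raise ConnectionError("connection reset")
            on_chunk(self.data[i:i + self.chunk])


def download_with_blind_retry(src: FlakyRange) -> bytes:
    buf = bytearray()
    for _ in range(2):
        try:
            # The retry replays the entire request into the same buffer.
            src.request(buf.extend)
            break
        except ConnectionError:
            continue
    return bytes(buf)


src = FlakyRange(b"abcdefgh", chunk=4)
result = download_with_blind_retry(src)
print(result)  # b'abcdabcdefgh': the first chunk was fed twice
```

A retry layer that knows how many bytes were already consumed can resume from that offset instead, which is why the retry decision is better made above the transport.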
2025-07-21 17:03:23 +03:00
Ernest Zaslavsky
ba910b29ce s3_test: Validate handling of non-aws_error exceptions
Inject exceptions not wrapped in `aws_error` from request callback
lambda to verify they are properly caught and handled.
2025-07-21 16:52:43 +03:00
Ernest Zaslavsky
b7ae6507cd s3_client: Improve error handling in chunked_download_source
Create aws_error from raised exceptions when possible and respond
appropriately. Previously, non-aws_exception types leaked from the
request handler and were treated as non-retryable, causing potential
data corruption during download.
2025-07-21 16:49:47 +03:00
Ernest Zaslavsky
d53095d72f aws_error: Add factory method for aws_error from exception
Move `aws_error` creation logic out of `retryable_http_client` and
into the `aws_error` class to support reuse across components.
2025-07-21 16:42:44 +03:00
Andrzej Jackowski
21aedeeafb test: audit: add cassandra user test case
Audit tests use the `filter_out_noise` function to remove noise from
audit logs generated by user authentication. As a result, none of the
existing tests covered audit logs for the default `cassandra` user.
This change adds a test case for that user.

Refs: scylladb/scylladb#25069
2025-07-21 14:54:20 +02:00
Andrzej Jackowski
aef6474537 test: audit: ignore cassandra user audit logs in AUTH tests
Audit tests are vulnerable to noise from LOGIN queries (because AUTH
audit logs can appear at any time). Most tests already use the
`filter_out_noise` mechanism to remove this noise, but tests
focused on AUTH verification did not, leading to sporadic failures.

This change adds a filter to ignore AUTH logs generated by the default
"cassandra" user, so tests only verify logs from the user created
specifically for each test.
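
A filter of this kind might look like the sketch below. The row layout (dictionaries with `category` and `username` keys) and the helper name `drop_default_user_auth` are assumptions made for illustration; the real `filter_out_noise` function and the audit log schema may differ.

```python
DEFAULT_USER = "cassandra"


def drop_default_user_auth(rows: list[dict], default_user: str = DEFAULT_USER) -> list[dict]:
    """Drop AUTH rows produced by the default user; keep everything else."""
    return [
        row
        for row in rows
        if not (row["category"] == "AUTH" and row["username"] == default_user)
    ]


# Hypothetical audit rows: one noisy login plus the rows a test cares about.
rows = [
    {"category": "AUTH", "username": "cassandra", "operation": "LOGIN"},
    {"category": "AUTH", "username": "test_user", "operation": "LOGIN"},
    {"category": "DML", "username": "cassandra", "operation": "INSERT"},
]

for row in drop_default_user_auth(rows):
    print(row["category"], row["username"])
```

Only the AUTH row of the default user is removed; AUTH rows of the test's own user and non-AUTH rows of the default user are kept.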

Fixes: scylladb/scylladb#25069
2025-07-21 14:54:20 +02:00
Andrzej Jackowski
daf1c58e21 test: audit: change names of filter_out_noise parameters
This is a refactoring commit that changes the names of the parameters
of the `filter_out_noise` function, as well as names of related
variables. The motivation for the change is introduction of more
complex filtering logic in next commit of this patch series.

Refs: scylladb/scylladb#25069
2025-07-21 14:54:01 +02:00
Andrzej Jackowski
e634a2cb4f test: audit: add missing nonlocal new_rows statement
The variable `new_rows` was not updated by the inner function
`is_number_of_new_rows_correct` because the `nonlocal new_rows`
statement was missing. As a result, `sorted_new_rows` was empty and
certain checks were skipped.

This change:
 - Introduces the missing `nonlocal new_rows` declaration
 - Adds an assertion verifying that the number of new rows matches
   the expected count
 - Fixes the incorrect variable name in the lambda used for row sorting
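
The effect of the missing declaration can be reproduced in isolation. The function names below are hypothetical and the data is made up; only the `nonlocal` behaviour itself is standard Python.

```python
def collect_without_nonlocal() -> list[int]:
    new_rows: list[int] = []

    def fetch() -> bool:
        # Assignment creates a new local variable; the outer list is untouched.
        new_rows = [1, 2, 3]
        return len(new_rows) == 3

    fetch()
    return new_rows


def collect_with_nonlocal() -> list[int]:
    new_rows: list[int] = []

    def fetch() -> bool:
        nonlocal new_rows  # rebind the variable of the enclosing function
        new_rows = [1, 2, 3]
        return len(new_rows) == 3

    fetch()
    return new_rows


print(collect_without_nonlocal())  # []
print(collect_with_nonlocal())     # [1, 2, 3]
```

In the first variant the inner check still returns True, so the bug stays hidden until later code reads the empty outer variable.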
2025-07-21 14:53:48 +02:00
Pavel Emelyanov
339f08b24a scripts: Enhance refresh_submodules.sh with nested summary
Currently when refreshing submodule, the script puts a plain list of
non-merge commits into commit message. The resulting summary contains
everything, but is hard to understand. E.g. if updating seastar today
the summary would start with

    * seastar 26badcb1...86c4893b (55):
      > util: make SEASTAR_ASSERT() failure generate SIGABRT
      > core: fix high CPU use at idle on high core count machines
      > http::reply: Add 308 (permanent redirect) and make pretty-print handle unknown values
      > reactor: Relax friendship with file_data_source_impl
      > fstream: Use direct io_stats reference
      > thread_pool: Relax coupling with reactor
      > reactor: Mark some IO classes management methods private
      > http: Deprecate json_exception
      > fair_queue: Move io_throttler to io_queue.hh
      > fair_queue: Move metrics from to io_queue::stream
      > fair_queue: Remove io_throttler from tests
      > fair_queue_test: Remove io-throttler from fair-queue
      > fair_queue: Remove capacity getters
      > fair_queue: Move grab_result into io_queue::stream too
      > fair_queue: Move throtting code to io_queue.cc
      > fair_queue: Move throttling code to io_queue::stream class
      > fair_queue: Open-code dispatch_requests() into users
      > fair_queue: Split dispatch_requests() into top() and pop_front()
      > fair_queue: Swap class push back and dispatch
      > fair_queue: Configure forgiving factor externally
      ...

That's not very informative, because the update includes several large
"merges" that have their summary which is missing here. This update
changes the way summary is generated to include merges and their
summaries and all merged commits are listed as sub-lines, like this

    * seastar 26badcb1...86c4893b (26):
      > util: make SEASTAR_ASSERT() failure generate SIGABRT
      > core: fix high CPU use at idle on high core count machines
      > Merge 'Move output IO throttler to IO queue level' from Pavel Emelyanov
        fair_queue: Move io_throttler to io_queue.hh
        fair_queue: Move metrics from to io_queue::stream
        fair_queue: Remove io_throttler from tests
        fair_queue_test: Remove io-throttler from fair-queue
        fair_queue: Remove capacity getters
        fair_queue: Move grab_result into io_queue::stream too
        fair_queue: Move throtting code to io_queue.cc
        fair_queue: Move throttling code to io_queue::stream class
        fair_queue: Open-code dispatch_requests() into users
        fair_queue: Split dispatch_requests() into top() and pop_front()
        fair_queue: Swap class push back and dispatch
        fair_queue: Configure forgiving factor externally
        fair_queue: Move replenisher kick to dispatch caller
        io_queue: Introduce io_queue::stream
        fair_queue: Merge two grab_capacity overloads
        fair_queue: Detatch outcoming capacity grabbing from main dispatch loop
        fair_queue: Move available tokens update into if branch
        io_queue: Rename make_fair_group_config into configure_throttler
        io_queue: Rename get_fair_group into get_throttler
        fair_queue: Rename fair_group -> io_throttler
      > http::reply: Add 308 (permanent redirect) and make pretty-print handle unknown values
      > Merge 'Relax reactor coupling with file_data_source_impl' from Pavel Emelyanov
        reactor: Relax friendship with file_data_source_impl
        fstream: Use direct io_stats reference
      > thread_pool: Relax coupling with reactor
      > reactor: Mark some IO classes management methods private
      ...

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24834
2025-07-21 14:48:30 +03:00
Ernest Zaslavsky
0053a4f24a encryption: remove default case from component_type switch
Do not use default, instead list all fall-through components
explicitly, so if we add a new one, the developer doing that
will be forced to consider what to do here.

Eliminate the `default` case from the switch in
`encryption_file_io_extension::wrap_sink`, and explicitly
handle all `component_type` values within the switch statement.

fixes: https://github.com/scylladb/scylladb/issues/23724

Closes scylladb/scylladb#24987
2025-07-21 14:43:12 +03:00
Ernest Zaslavsky
408aa289fe treewide: Move misc files to utils directory
As requested in #22114, moved the files and fixed other includes and build system.

Moved files:
- interval.hh
- Map_difference.hh

Fixes: #22114

This is a cleanup, no need to backport

Closes scylladb/scylladb#25095
2025-07-21 11:56:40 +03:00
Piotr Dulikowski
7fd97e6a93 Merge 'cdc: Forbid altering columns of CDC log tables directly' from Dawid Mędrek
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.

In this commit, we're restricting those operations. We also provide two
validation tests.

One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.

Fixes scylladb/scylladb#24643

Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.

Closes scylladb/scylladb#25008

* github.com:scylladb/scylladb:
  cdc: Forbid altering columns of inactive CDC log table
  cdc: Forbid altering columns of CDC log tables directly
2025-07-21 09:31:00 +02:00
Ran Regev
bb95ac857e enable_set: fix separator formatting from space comma to comma space
For better log readability.
Fixes: #23883

Closes scylladb/scylladb#24647
2025-07-20 19:12:57 +03:00
Avi Kivity
3dfdcf7d7a Merge 'transport: remove throwing protocol_exception on connection start' from Dario Mirovic
`protocol_exception` is thrown in several places. This has become a performance issue, especially when starting/restarting a server. To alleviate this issue, throwing the exception has to be replaced with returning it as a result or an exceptional future.

This PR replaces throws in the `transport/server` module. This is achieved by using result_with_exception, and in some places, where suitable, just by creating and returning an exceptional future.

There are four commits in this PR. The first commit introduces tests in `test/cqlpy`. The second commit refactors transport server `handle_error` to not rethrow exceptions. The third commit refactors reusable buffer writer callbacks. The fourth commit replaces throwing `protocol_exception` to returning it.

Based on the comments on an issue linked in https://github.com/scylladb/scylladb/issues/24567, the main culprit from the side of protocol exceptions is the invalid protocol version one, so I tested that exception for performance.

In order to see if there is a measurable difference, a modified version of the `test_protocol_version_mismatch` Python test is used, with 100'000 runs across 10 processes (not threads, to avoid Python GIL). One test run consisted of 1 warm-up run and 5 measured runs. The first test run has been executed on the current code, with throwing protocol exceptions. The second test run has been executed on the new code, with returning protocol exceptions. The performance report is in https://github.com/scylladb/scylladb/pull/24738#issuecomment-3051611069. It shows ~10% gains in real, user, and sys time for this test.

Testing

Build: `release`

Test file: `test/cqlpy/test_protocol_exceptions.py`
Test name: `test_protocol_version_mismatch` (modified for mass connection requests)

Test arguments:
```
max_attempts=100'000
num_parallel=10
```

Throwing `protocol_exception` results:
```
real=1:26.97  user=10:00.27  sys=2:34.55  cpu=867%
real=1:26.95  user=9:57.10  sys=2:32.50  cpu=862%
real=1:26.93  user=9:56.54  sys=2:35.59  cpu=865%
real=1:26.96  user=9:54.95  sys=2:32.33  cpu=859%
real=1:26.96  user=9:53.39  sys=2:33.58  cpu=859%

real=1:26.95 user=9:56.85 sys=2:34.11 cpu=862%   # average
```

Returning `protocol_exception` as `result_with_exception` or an exceptional future:
```
real=1:18.46  user=9:12.21  sys=2:19.08  cpu=881%
real=1:18.44  user=9:04.03  sys=2:17.91  cpu=869%
real=1:18.47  user=9:12.94  sys=2:19.68  cpu=882%
real=1:18.49  user=9:13.60  sys=2:19.88  cpu=883%
real=1:18.48  user=9:11.76  sys=2:17.32  cpu=878%

real=1:18.47 user=9:10.91 sys=2:18.77 cpu=879%   # average
```
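
As a quick check of the wall-clock gain, the `real` column of the two result blocks above can be averaged and compared. The helper names are hypothetical; the timings are copied from the measured runs.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' timing into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def mean_seconds(stamps: list[str]) -> float:
    return sum(to_seconds(s) for s in stamps) / len(stamps)


# `real` times of the five measured runs in each configuration.
throwing = ["1:26.97", "1:26.95", "1:26.93", "1:26.96", "1:26.96"]
returning = ["1:18.46", "1:18.44", "1:18.47", "1:18.49", "1:18.48"]

before = mean_seconds(throwing)
after = mean_seconds(returning)
reduction_pct = (before - after) / before * 100

# Prints: 86.95s -> 78.47s (9.8% less wall-clock time)
print(f"{before:.2f}s -> {after:.2f}s ({reduction_pct:.1f}% less wall-clock time)")
```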

This PR replaced `transport/server` throws of `protocol_exception` with returns. There are a few other places where protocol exceptions are thrown, and there are many places where `invalid_request_exception` is thrown. That is out of scope of this single PR, so the PR just refs, and does not resolve issue #24567.

Refs: #24567

This PR improves performance in cases when protocol exceptions happen, for example during connection storms. It will require backporting.

Closes scylladb/scylladb#24738

* github.com:scylladb/scylladb:
  test/cqlpy: add cpp exception metric test conditions
  transport/server: replace protocol_exception throws with returns
  utils/reusable_buffer: accept non-throwing writer callbacks via result_with_exception
  transport/server: avoid exception-throw overhead in handle_error
  test/cqlpy: add protocol_exception tests
2025-07-20 17:42:30 +03:00
Dawid Mędrek
59800b1d66 cdc: Forbid altering columns of inactive CDC log table
When CDC becomes disabled on the base table, the CDC log table
still exists (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old log table will be *re-attached*.

Since we want to avoid situations when the definition of the log
table has become misaligned with the definition of the base table
due to actions of the user, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.

Validation tests are provided.
2025-07-18 15:03:08 +02:00
Piotr Dulikowski
85e506dab5 Merge 'test.py: print warning when no tests found' from Andrei Chekun
Stop repeating when a test under the pytest runner directory is misspelled or absent. This avoids going through discovery several times and stops execution early.
Print a warning at the end of the run when no tests were selected by the provided name.

Fixes: scylladb/scylladb#24892

Closes scylladb/scylladb#24918

* github.com:scylladb/scylladb:
  test.py: print warning in case no tests were found
  test.py: break the loop when there is no tests for pytest
2025-07-18 10:26:44 +02:00
Piotr Dulikowski
fd6e14f3ab Merge 'cdc: throw error if column doesn't exist' from Michael Litvak
In the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.

When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.

However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.

If this happens, the code now throws an exception instead of
crashing on a null pointer dereference. Currently we don't have a safer
approach to handle this, but this might change in the future. The
alternative is to drop that data silently, which we prefer not to
do.

Throwing an error is acceptable because this scenario most likely
indicates one of the following user behaviors:
* The user adds a new column and starts writing values to the column
  before the ALTER is complete, or
* The user drops a column and continues writing values to the column
  while it's being dropped.

Both cases might as well fail with an error because the column is not
found in the base table.

Fixes scylladb/scylladb#24952

backport needed - simple fix for a node crash

Closes scylladb/scylladb#24986

* github.com:scylladb/scylladb:
  test: cdc: add test_cdc_with_alter
  cdc: throw error if column doesn't exist
2025-07-18 09:40:56 +02:00
Andrei Chekun
04b0fba88c test.py: print warning in case no tests were found
Print a warning at the end of the run when no tests were selected by the
provided name.

Fixes: https://github.com/scylladb/scylladb/issues/24892
2025-07-17 19:51:22 +02:00
Michael Litvak
86dfa6324f test: cdc: add test_cdc_with_alter
Add a test that adds and drops a column on a table with CDC
enabled while writing to it.
2025-07-17 17:16:17 +02:00
Michael Litvak
b336f282ae cdc: throw error if column doesn't exist
In the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.

When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.

However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.

If this happens, the code now throws an exception instead of
crashing on a null pointer dereference. Currently we don't have a safer
approach to handle this, but this might change in the future. The
alternative is to drop that data silently, which we prefer not to
do.

Throwing an error is acceptable because this scenario most likely
indicates one of the following user behaviors:
* The user adds a new column and starts writing values to the column
  before the ALTER is complete, or
* The user drops a column and continues writing values to the column
  while it's being dropped.

Both cases might as well fail with an error because the column is not
found in the base table.
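
The lookup-then-fail behavior described above can be illustrated with a small Python stand-in. This is not the C++ transformer code: `ColumnNotFoundError`, `transform_to_cdc`, and the dict-based "schema" are hypothetical names chosen for the sketch.

```python
class ColumnNotFoundError(Exception):
    """Raised when a base column has no counterpart in the CDC schema."""


def transform_to_cdc(base_mutation: dict, cdc_columns: dict) -> dict:
    """Copy each base column value into the CDC column with the same name.

    `cdc_columns` maps column names to CDC column ids. A name can be
    missing for a short time while the schemas are being altered.
    """
    cdc_mutation = {}
    for name, value in base_mutation.items():
        cdc_column = cdc_columns.get(name)  # None if the column is not found
        if cdc_column is None:
            # Fail the write explicitly instead of dereferencing a null
            # result or silently dropping the value.
            raise ColumnNotFoundError(f"CDC column not found: {name}")
        cdc_mutation[cdc_column] = value
    return cdc_mutation


# The CDC schema does not know column "c" (for example, it was just dropped).
cdc_schema = {"a": 0, "b": 1}
ok = transform_to_cdc({"a": 10, "b": 20}, cdc_schema)
try:
    transform_to_cdc({"a": 10, "c": 30}, cdc_schema)
    failed = False
except ColumnNotFoundError:
    failed = True
```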

Fixes scylladb/scylladb#24952
2025-07-17 17:16:17 +02:00
Dario Mirovic
4a6f71df68 test/cqlpy: add cpp exception metric test conditions
Tested code paths should not throw exceptions. `scylla_reactor_cpp_exceptions`
metric is used. This is a global metric. To address potential test flakiness,
each test runs multiple times:
- `run_count = 100`
- `cpp_exception_threshold = 10`

If a code change introduces an exception on a tested path, the number of
registered exceptions over `run_count` runs is expected to exceed
`cpp_exception_threshold`, in which case the test fails.
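
A minimal sketch of this check, assuming a caller-supplied `read_metric` callable that returns the current value of the global counter. The function name `assert_exception_free` and its signature are hypothetical and not taken from the test file.

```python
from typing import Callable


def assert_exception_free(
    read_metric: Callable[[], int],
    action: Callable[[], None],
    run_count: int = 100,
    cpp_exception_threshold: int = 10,
) -> int:
    """Run `action` repeatedly and bound the growth of a global counter.

    The metric is global, so unrelated activity may bump it a little.
    A tested path that throws on every run would add about `run_count`
    exceptions, which is far above the threshold.
    """
    before = read_metric()
    for _ in range(run_count):
        action()
    delta = read_metric() - before
    assert delta <= cpp_exception_threshold, (
        f"{delta} C++ exceptions registered in {run_count} runs"
    )
    return delta


# Usage with a fake counter: an action that never throws leaves it unchanged.
counter = {"value": 0}
quiet_delta = assert_exception_free(lambda: counter["value"], lambda: None)
```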
2025-07-17 17:02:48 +02:00
Dario Mirovic
5390f92afc transport/server: replace protocol_exception throws with returns
Replace throwing protocol_exception with returning it as a result
or an exceptional future in the transport server module. This
improves performance, for example during connection storms and
server restarts, where protocol exceptions are more frequent.

In functions already returning a future, protocol exceptions are
propagated using an exceptional future. In functions not already
returning a future, result_with_exception is used.

A notable change is checking v.failed() before calling v.get() in
the process_request function, to avoid throwing in case of an
exceptional future.

Refs: #24567
2025-07-17 16:54:05 +02:00
Dario Mirovic
9f4344a435 utils/reusable_buffer: accept non-throwing writer callbacks via result_with_exception
Make make_bytes_ostream and make_fragmented_temporary_buffer accept
writer callbacks that return utils::result_with_exception instead of
forcing them to throw on error. This lets callers propagate failures
by returning an error result rather than throwing an exception.

Introduce buffer_writer_for, bytes_ostream_writer, and fragmented_buffer_writer
concepts to simplify and document the template requirements on writer callbacks.

This patch does not modify the actual callbacks passed, except for the syntax
changes needed for successful compilation, without changing the logic.

Refs: #24567
2025-07-17 16:40:02 +02:00
Dario Mirovic
30d424e0d3 transport/server: avoid exception-throw overhead in handle_error
Previously, connection::handle_error always called f.get() inside a try/catch,
forcing every failed future to throw and immediately catch an exception just to
classify it. This change eliminates that extra throw/catch cycle by first checking
f.failed(), getting the stored std::exception_ptr via f.get_exception(), and
then dispatching on its type via utils::try_catch<T>(eptr).

The error-response logic is not changed - cassandra_exception, std::exception,
and unknown exceptions are caught and processed, and any exceptions thrown by
write_response while handling those exceptions continue to escape handle_error.
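
The same idea can be shown as a Python analogy using `concurrent.futures.Future`, whose `exception()` method returns the stored exception without raising it. This is not Seastar code: `CassandraError` and `classify` are hypothetical names, and the mapping to the three categories above is only illustrative.

```python
from concurrent.futures import Future


class CassandraError(Exception):
    """Hypothetical stand-in for cassandra_exception."""


def classify(f: Future) -> str:
    """Classify a completed future without a raise/catch round trip."""
    exc = f.exception()  # returns the stored exception, or None on success
    if exc is None:
        return "ok"
    # Dispatch on the type by inspection instead of re-raising and catching.
    if isinstance(exc, CassandraError):
        return "cassandra_exception"
    if isinstance(exc, Exception):
        return "std::exception"
    return "unknown"


failed = Future()
failed.set_exception(CassandraError("bad frame"))
print(classify(failed))
```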

Refs: #24567
2025-07-17 16:40:02 +02:00
Dario Mirovic
7aaeed012e test/cqlpy: add protocol_exception tests
Add a helper to fetch scylla_transport_cql_errors_total{type="protocol_error"} counter
from Scylla's metrics endpoint. These metrics are used to track protocol error
count before and after each test.

Add cql_with_protocol context manager utility for session creation with parameterized
protocol_version value. This is used for testing connection establishment with
different protocol versions, and proper disposal of successfully established sessions.

The tests cover two failure scenarios:
- Protocol version mismatch in test_protocol_version_mismatch, which tests both
supported and unsupported protocol versions
- Malformed frames sent via a raw socket in _protocol_error_impl, which is used by
several test functions. In addition, test_no_protocol_exceptions asserts that the
error counters never decrease during test execution, catching unintended metric resets
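
A sketch of how such a counter could be read from Prometheus-style text output. The helper name `protocol_error_count`, the sample text, and the per-shard label layout are assumptions made for illustration. Fetching the text from the metrics endpoint is left out.

```python
# Assumed sample of the metrics text; the real label set may differ.
SAMPLE = """\
# TYPE scylla_transport_cql_errors_total counter
scylla_transport_cql_errors_total{shard="0",type="protocol_error"} 3
scylla_transport_cql_errors_total{shard="1",type="protocol_error"} 2
scylla_transport_cql_errors_total{shard="0",type="unavailable"} 7
"""

METRIC = "scylla_transport_cql_errors_total{"


def protocol_error_count(metrics_text: str) -> int:
    """Sum the protocol_error samples over all label combinations."""
    total = 0
    for line in metrics_text.splitlines():
        if not line.startswith(METRIC):
            continue  # skips comments and other metrics
        labels, _, rest = line.rpartition("} ")
        if 'type="protocol_error"' in labels:
            # The first field after the labels is the sample value.
            total += int(float(rest.split()[0]))
    return total


before = protocol_error_count(SAMPLE)
print(before)
```

A test would take one reading before and one after the scenario and compare the difference with the number of errors it expects to have triggered.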

Refs: #24567
2025-07-17 16:39:54 +02:00
Petr Gusev
2027856847 Revert "paxos_state: read repair for intranode_migration"
This reverts commit 45f5efb9ba.

The load_and_repair_paxos_state function was introduced in
scylladb/scylladb#24478, but it has never been tested or proven useful.

One set of problems stems from its use of local data structures
from a remote shard. In particular, system_keyspace and schema_ptr
cannot be directly accessed from another shard — doing so is a bug.

More importantly, load_paxos_state on different shards can't ever
return different values. The actual shard from which data is read is
determined by sharder.shard_for_reads, and storage_proxy will jump
back to the appropriate shard if the current one doesn't match. This
means load_and_repair_paxos_state can't observe paxos state from
write-but-not-read shard, and therefore will never be able to
repair anything.

We believe this explicit Paxos state read-repair is not needed at all.

Any paxos state read which drives some paxos round forward is already
accompanied by a paxos state write. Suppose we wrote the state to the
old shard but not to the new shard (because of some error) while
streaming is already finished. The RPC call (prepare or accept) will
return an error to the coordinator, so such a replica response won't
affect the current round. This write won't affect any subsequent paxos rounds
either, unless in those rounds the write actually succeeds on both
shards, effectively 'auto-repairing' paxos state.

Same if we managed to write to the new shard but not to the old shard.
Any subsequent reads will observe either the old state or the new
state (if the tablet already switched reads to the new shard). In any
case, we'll have to write the state to all relevant shards
from sharder.shard_for_writes (one or two) before sending rpc
response, making this state visible for all subsequent reads.

Thus, the monotonicity property ("once observed, the state must always
be observed") appears to hold without requiring explicit read-repair
and load_and_repair_paxos_state is not needed.

Closes scylladb/scylladb#24926
2025-07-17 14:00:43 +02:00
Botond Dénes
20693edb27 Merge 'sstables: put index_reader behind a virtual interface' from Michał Chojnowski
This is a refactoring patch in preparation for BTI indexes. It contains no functional changes (or at least it's not intended to).

In this patch, we modify the sstable readers to use index readers through a new virtual `abstract_index_reader` interface.
Later, we will add BTI indexes which will also implement this interface.

This interface contains the methods of `index_reader` which are needed by sstable readers, and leaves out all other methods, such as `current_clustered_cursor`.

Not all methods of this interface will be implementable by a trie-based index later. For example, a trie-based index can't provide a reliable `get_partition_key()`, because — unlike the current index — it only stores partition keys for partitions which have a row index. So the interface will have to be further restricted later. We don't do that in this patch because that will require changes to sstable reader logic, and this patch is supposed to only include cosmetic changes.

No backports needed, this is a preparation for new functionality.

Closes scylladb/scylladb#25000

* github.com:scylladb/scylladb:
  sstables: add sstable::make_index_reader() and use where appropriate
  sstables/mx: in readers, use abstract_index_reader instead of index_reader
  sstables: in validate(), use abstract_index_reader instead of index_reader where possible
  test/lib/index_reader_assertions: accept abstract_index_reader instead of index_reader
  sstables/index_reader: introduce abstract_index_reader
  sstables/index_reader: extract a prefetch_lower_bound() method
2025-07-17 14:32:08 +03:00
Nadav Har'El
04b263b51a Merge 'vector_index: do not create a view when creating a vector index' from Michał Hudobski
This PR adds a way for custom indexes to decide whether a view should be created for them. For vector_index the view is not needed, because the index is stored in an external service. To allow this, custom logic for describing indexes that use custom classes was added (describing used to depend on the view corresponding to an index).

Fixes: VECTOR-10

Closes scylladb/scylladb#24438

* github.com:scylladb/scylladb:
  custom_index: do not create view when creating a custom index
  custom_index: refactor describe for custom indexes
  custom_index: remove unneeded duplicate of a static string
2025-07-17 13:48:49 +03:00
Michał Chojnowski
4e4a4b6622 sstables: add sstable::make_index_reader() and use where appropriate
If we add multiple index implementations, users of index readers won't
easily know which concrete index reader type is the right one to construct.

We also don't want pieces of code to depend on functionality specific to
certain concrete types, if that's not necessary.

So instead of constructing the readers by themselves, they can use a helper
function, which will return an abstract (virtual) index reader.
This patch adds such a function, as a method of `sstable`.
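
The pattern can be sketched in Python as a factory method that returns an abstract reader. The real code is C++, so this only illustrates the shape: `AbstractIndexReader`, `SortedListIndexReader`, `lower_bound`, and the list-based index are hypothetical.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from typing import List, Optional, Tuple


class AbstractIndexReader(ABC):
    """The only interface that users of index readers depend on."""

    @abstractmethod
    def lower_bound(self, key: str) -> Optional[int]:
        """Data position of the first entry with a key >= `key`, if any."""


class SortedListIndexReader(AbstractIndexReader):
    """One concrete implementation; others could be added later."""

    def __init__(self, entries: List[Tuple[str, int]]):
        self._keys = [k for k, _ in entries]
        self._positions = [p for _, p in entries]

    def lower_bound(self, key: str) -> Optional[int]:
        i = bisect_left(self._keys, key)
        return self._positions[i] if i < len(self._keys) else None


class SSTable:
    def __init__(self, entries: List[Tuple[str, int]]):
        self._entries = sorted(entries)

    def make_index_reader(self) -> AbstractIndexReader:
        # Callers never name the concrete type, so it can change freely.
        return SortedListIndexReader(self._entries)


reader = SSTable([("a", 0), ("c", 100), ("f", 250)]).make_index_reader()
print(reader.lower_bound("b"))
```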
2025-07-17 10:32:57 +02:00