Commit Graph

8550 Commits

Author SHA1 Message Date
Michał Chojnowski
10fa4abde7 compress: change compressor_ptr from shared_ptr to unique_ptr
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
2025-04-01 00:07:29 +02:00
Michał Chojnowski
58ae278d10 api: add the retrain_dict API call
Add an API call which will retrain the SSTable compression dictionary
for a given table.

Currently, it needs all nodes to be alive to succeed. We can relax this later.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
b77c611c00 raft/group0_state_machine: on system.dicts mutations, pass the affected partitition keys to the callback
Before this patch, `system.dicts` contains only one dictionary, for RPC
compression, with the fixed name "general".

In later parts of this series, we will add more dictionaries to
system.dicts, one per table, for SSTable compression.

To enable that, this patch adjusts the callback mechanism for group0's `write_mutations`
command, so that the mutation callbacks for group0-managed tables can see which
partition keys were affected. This way, the callbacks can query only the
modified partitions instead of doing a full scan. (This is necessary to
prevent quadratic behaviours.)

For now, only the `system.dicts` callback uses the partition keys.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
30a9d471fa sstables: plug an sstable_compressor_factory into sstables_manager
Create a `sstable_compressor_factory_impl` in `scylla_main`,
and pipe it through constructors into `sstables_manager`.

In next commits, the factory available through the `sstables_manager`
will be used to create compressors for SSTable readers and writers.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
11be7c0704 compress: remove compression_parameters::get_compressor()
Following up on the previous commits, we avoid constructing
compressors where not necessary,
by checking things directly on `compression_parameters` instead.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
8e611536b0 sstables/compress: move ownership of compressor to sstable::compression
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.

`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.

But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only once `compressor` per sstable, not one per reader/writer.

We stuff the ownership of this compressor into `sstable::compression`.

To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
cfe69e057f sstables/compress: break the dependency of compression_parameters on compressor
Note: this commit is meant to be a code refactoring only and is not intended
to change the observable behaviour.

Today `schema` contains a `compression_parameters`.
`compression_parameters` contains an instance of
`compressor`, and SSTable writers just share that instance.

This is fine because `compressor` is a stateless object,
functionally dependent on the schema.

But in later parts of the series, we will break this functional
dependency by adding dictionaries to compressors. Two writers
for the same schema might have different dictionaries, so they won't
be able to just share a single instance contained in the schema.

And when that happens, having a `compressor` instance
in the `schema`/`compression_parameters` will become awkward,
since it won't be actually used. It will be only a container for options.

In addition, for performance reasons, we will want to share some pieces
of compressors across shards, which will require -- in the general case --
a construction of a compressor to be asynchronous, and therefore not
possible inside the constructor of `compression_parameters`.

This commit modifies `compression_parameters` so that it doesn't hold or
construct instances of `compressor`.

Before this patch, the `compressor` instance constructed in
`compression_parameters` has an additional role of validating and
holding compressor-specific options.
(Today the only such option is the zstd compression level).

This means that the pieces of logic responsible for compressor-specific
options have to be rewritten. That ends up being the bulk of this commit.
2025-04-01 00:07:27 +02:00
Tomasz Grabiec
29d1c2adc6 Merge 'Finalize tablet splits earlier' from Lakshmi Narayanan Sreethar
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

This PR fixes the issue by updating the load balancer to no schedule any
migrations and to not make any repair plans when there a resize
finalization is pending in any table.

Also added a testcase to verify the fix.

Fixes #21762

Improvement : No need to backport.

Closes scylladb/scylladb#22148

* github.com:scylladb/scylladb:
  topology_coordinator: fix indentation in generate_migration_updates
  topology_coordinator: do not schedule migrations when there are pending resize finalizations
  load_balancer: make repair plans only when there is no pending resize finalization
2025-03-31 14:42:34 +02:00
Botond Dénes
90c20858ed Merge 'test/database: Remove most of take_snapshot() helper overloads and re-use them more' from Pavel Emelyanov
This helper facilitate snapshot creation by various test cases in database_test.cc. This PR generalizes all overloads into one that suits all callers and patches one more test case to use it as well.

Closes scylladb/scylladb#23482

* github.com:scylladb/scylladb:
  test/database: Re-use take_snapshot() helper once more
  test/database: Remove most of take_snapshot() helper overloads
2025-03-31 15:20:51 +03:00
Benny Halevy
5f2ce0b022 loading_cache_test: test_loading_cache_reload_during_eviction: use manual_clock
Rather than lowres_clock, as since
32b7cab917,
loading_cache_for_test uses manual_clock for timing
and relying on lowres_clock to time the test might
run out of memory on fast test machines.

Fixes #23497

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#23498
2025-03-31 14:53:06 +03:00
Pavel Emelyanov
ac582efb44 test/database: Re-use take_snapshot() helper once more
There's a test case that can call the recently patched take_snapshot()
helper as well. This changes nothing, but makes further patching a bit
simpler (not in this branch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-31 13:18:06 +03:00
Pavel Emelyanov
7e6380b6bd test/database: Remove most of take_snapshot() helper overloads
There are 3 of those that help tests (re)shuffle cql_test_env/database,
skip_flush == true/false options and keyspace/table/snapshot names.
There's little sense in having that many of those, just one overload
with default arguments suits most of the callers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-31 13:18:06 +03:00
Dawid Mędrek
b0b0c5905e test/cluster/test_multidc: Clean up RF-rack-valid keyspaces tests
There are some minor things we should fix that are a remnant
of the original changes (scylladb/scylladb@7646e14).

Closes scylladb/scylladb#23429
2025-03-31 09:38:42 +03:00
Pavel Emelyanov
693387bda6 Merge 'test.py: topology: allow to run tests with bare pytest command' from Evgeniy Naydanov
Add possibility to run topology tests using bare pytest command.

To achieve this goal the following changes were made:

- Add fixtures `testpy_testsuite` and `testpy_test` to `test/conftest.py`.
- To build `TestSuite` object we need to discover a corresponding `suite.xml` file.  Do this by walking up thru the fs tree starting from the current test file.
- Run ScyllaClusterManager using pytest fixture if `--manager-api` option is not provided.

And made some refactoring:

- Add path constants to `test` module and use them in different test suites instead of own dups of the same code:
  - TOP_SRC_DIR : ScyllaDB's source code root directory
  - TEST_DIR : the directory with test.py tests and libs
  - BUILD_DIR : directory with ScyllaDB's build artifacts
- Add TestSuite.log_dir attribute as a ScyllaDB's build mode subdir of a path provided using `--tmpdir` CLI argument. Don't use `tmpdir` name because it mixed up with pytest's built-in fixture and `--tmpdir` option itself.
- Change default value for `--tmdir` from `./testlog` to `TOP_SRC_DIR/testlog`
- Refactor `ResourceGather*` classes to use path from a `test` object instead of providing it separately.
- Move modes constants (`all_modes`/`ALL_MODES` and `debug_modes`/`DEBUG_MODES`) to `test` module and remove duplication.
- Move `prepare_dirs()` and `start_3rd_party_services()` from `pylib.util` to`pylib.suite.base` to avoid circular imports.
- In some places refactor to use f-strings for formatting.

Also minor changes related to running with pytest-xdist:

- When run tests in parallel we need to ensure that filenames are unique by adding xdist worker ID to them.
- Pass random seed across xdist workers using env variable.

Closes scylladb/scylladb#22960

* github.com:scylladb/scylladb:
  test.py: async_cql: remove unused event_loop fixture
  test.py: random_failures: make it play well with xdist
  test.py: add xdist worker ID to log filenames
  test.py: topology: run tests using bare pytest command
  test.py: add fixtures for current test suite and test
  test.py: refactor paths constants and options
2025-03-31 09:30:06 +03:00
Piotr Smaron
a2bbbc6904 auth: forbid modifying system ks by non-superusers
Before this patch, granting a user MODIFY permissions on ALL KEYSPACES allowed the user to write to system tables, where the user could also set himself to "superuser" granting him all other permissions. After this patch, MODIFY permissions on ALL KEYSPACES is limited only to non-system keyspaces.

Fixes: scylladb/scylladb#23218

Closes scylladb/scylladb#23219
2025-03-30 16:55:04 +03:00
Ferenc Szili
2c9b312b58 test: port of test and reproducer for resurrection during file based streaming
This change ports test/cluster/test_resurrection.py from enterprise to
master. Because the underlying issue deals with file based streaming,
this test was a part of the enterprise repo. It contains the test and
reproducer for the issue described below:

When tablets are migrated with file-based streaming, we can have a situation
where a tombstone is garbage collected before the data it shadows lands. For
instance, if we have a tablet replica with 3 sstables:

1 sstable containing an expired tombstone
2 sstable with additional data
3 sstable containing data which is shadowed by the expired tombstone in sstable 1

If this tablet is migrated, and the sstables are streamed in the order listed
above, the first two sstables can be compacted before the third sstable arrives.
In that case, the expired tombstone will be garbage collected, and data in the
third sstable will be resurrected after it arrives to the pending replica.

The fix for the issue was merged in b66479ea98

This patch only ports the missing test.

Closes scylladb/scylladb#23466
2025-03-30 13:39:40 +03:00
Evgeniy Naydanov
1a0c14aa50 test.py: async_cql: remove unused event_loop fixture
Newer version of pytest-asyncio (0.24.0) allows to control the scope
of async loop per fixture.  Don't need this workaround anymore.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
cac0257914 test.py: random_failures: make it play well with xdist
Pass random seed across xdist workers using env variable.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
9bba59631f test.py: add xdist worker ID to log filenames
When run tests in parallel we need to ensure that filenames
are unique by adding xdist worker ID to them.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
9cb0ec2b42 test.py: topology: run tests using bare pytest command
Run ScyllaClusterManager using pytest fixture if `--manager-api`
option is not provided.

On this stage we're trying to be as close to test.py as possible.
test.py runs tests file-by-file, so, effectively, scopes `session`,
`package`, and `module` are pretty same.  Also, test.py starts
ScyllaClusterManager for every test module and this is the reason
why fixture `manager_api_sock_path` has scope=`module`.  And, in
result, we need to change scope for fixture `manager_internal` too.
2025-03-30 03:19:29 +00:00
Evgeniy Naydanov
42075170d1 test.py: add fixtures for current test suite and test
Add fixtures `testpy_testsuite` and `testpy_test` to `test/conftest.py`
To build TestSuite object we need to discover a corresponding `suite.xml`
file.  Do this by walking up thru the fs tree starting from the current
test file.
2025-03-30 03:19:29 +00:00
Evgeniy Naydanov
c4ae4e247a test.py: refactor paths constants and options
Add path constants to `test` module and use them in different test suites
instead of own dups of the same code:

 - TOP_SRC_DIR : ScyllaDB's source code root directory
 - TEST_DIR : the directory with test.py tests and libs
 - BUILD_DIR : directory with ScyllaDB's build artefacts

Add TestSuite.log_dir attribute as a ScyllaDB's build mode subdir of a path
provided using `--tmpdir` CLI argument.  Don't use `tmpdir` name because it
mixed up with pytest's built-in fixture and `--tmpdir` option itself.

Change default value for `--tmdir` from `./testlog` to `TOP_SRC_DIR/testlog`

Refactor `ResourceGather*` classes to use path from a `test` object instead of
providing it separately.

Move modes constants to `test` module and remove duplications.

Move `prepare_dirs()` and `start_3rd_party_services()` from `pylib.util` to
`pylib.suite.base` to avoid circular imports (with little refactoring to
use `pathlib.Path` instead of `str` as paths.)

Also, in some places refactor to use f-strings for formatting.
2025-03-30 03:19:29 +00:00
Michał Jadwiszczak
0ee0696959 test/cqlpy/test_service_level_api: update to service levels on raft and remove flakiness
Tests in `test_service_level_api` were written before
scylladb/scylladb#16585 and they were doing 10s sleeps to wait for
service level controller to update its configuration. Now performing
a read barrier is sufficient to ensure SL configuration is up-to-date,
which significantly reduces tests time (from ~60s to ~2-3s).

Moreover, there was flakiness in the `test_switch_tenants` test.
Until now, the test waited up to 60s for the connections to update
their scheduling groups. However, it is difficult to determine
how long the process might take because a connection may be blocked
while waiting for the next request to be processed,
and the scheduling group will be updated only after a request is processed
(see `generic_server::connection::process_until_tenant_switch()`).
To address this issue, 100 simple queries are executed so that
connections on all shards process at least one request
and update their scheduling groups.

Fixes scylladb/scylladb#22768

Closes scylladb/scylladb#23381
2025-03-28 17:14:21 +03:00
Avi Kivity
6d7cb68aab test: ldap: avoid io_uring Seastar reactor backend
It tends to fail sometimes with ENOMEM:

```
ERROR 2025-03-24 01:05:22,983 [shard 0:sl:d] ldap_role_manager - error in reconnect: std::system_error (error C-Ares:4, server.that.will.never.exist.scylladb.com: Not found)
ERROR 2025-03-24 01:05:30,984 [shard 0:sl:d] ldap_role_manager - error in reconnect: std::system_error (error C-Ares:4, server.that.will.never.exist.scylladb.com: Not found)
ERROR 2025-03-24 01:05:47,123 [shard 0:main] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:12, Cannot allocate memory)
ERROR 2025-03-24 01:05:47,139 [shard 0:main] table - failed to write sstable /scylladir/testlog/x86_64/debug/scylla-33787f64/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/me-3got_1s5n_0lfls1y4z7vkkts07a-big-Data.db: storage_io_error (Storage I/O error: 12: Cannot allocate memory)
ERROR 2025-03-24 01:05:47,140 [shard 0:main] table - Memtable flush failed due to: storage_io_error (Storage I/O error: 12: Cannot allocate memory). Aborting, at 0x30f5605 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x4514f14 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x4514b96 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x45165b1 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x4518dcf 0x3fde842 0x35dc5c6 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36c26ed /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36cdd0c /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d2cd2 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d0e56 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327f47a /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327c8f0 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1cdd4 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c79c /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c69c /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c184 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x34b2674 0x314b8b6 /lib64/libc.so.6+0x70ba7 /lib64/libc.so.6+0xf4b8b
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::shared_future<>::shared_state
Aborting on shard 0, in scheduling group main.
Backtrace:
  0x30f5605
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x384a0e4
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x3849db2
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x369bd84
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d42a2
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x37a5ed9
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x37a61d5
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x37a601f
  /lib64/libc.so.6+0x1a04f
  /lib64/libc.so.6+0x72b53
  /lib64/libc.so.6+0x19f9d
  /lib64/libc.so.6+0x1941
  0x3fde8b1
  0x35dc5c6
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36c26ed
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36cdd0c
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d2cd2
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d0e56
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327f47a
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327c8f0
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1cdd4
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c79c
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c69c
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c184
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x34b2674
  0x314b8b6
  /lib64/libc.so.6+0x70ba7
  /lib64/libc.so.6+0xf4b8b
=== TEST.PY SUMMARY START ===
Test exited with code -6

=== TEST.PY SUMMARY END ===

=== decoded ===
Backtrace:
[Backtrace #0]
__interceptor_backtrace at /mnt/clang_build/llvm-project-x86_64/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4369
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/debug/seastar/./seastar/include/seastar/util/backtrace.hh:70
seastar::backtrace_buffer::append_backtrace() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:805
seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:838
seastar::print_with_backtrace(char const*, bool) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:850
seastar::sigabrt_action() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:4004
seastar::install_oneshot_signal_handler<6, (void (*)())(&seastar::sigabrt_action)>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3981
seastar::install_oneshot_signal_handler<6, (void (*)())(&seastar::sigabrt_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3976
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c8c3fa52aaee3f5d73b6fd862e39e9d4c010b6ba, for GNU/Linux 3.2.0, not stripped

?? ??:0
printf_positional at ??:?
?? ??:0
?? ??:0
replica::table::seal_active_memtable(replica::compaction_group&, replica::flush_permit&&)::$_0::operator()(std::function<seastar::future<void> ()>) const at ././replica/table.cc:1512
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/coroutine:242
 (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:122
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:2616
seastar::reactor::run_some_tasks() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3088
seastar::reactor::do_run() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3256
seastar::reactor::run() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3146
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/app-template.cc:167
seastar::testing::test_runner::start_thread(int, char**)::$_0::operator()() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/testing/test_runner.cc:77
void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/invoke.h:61
std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/invoke.h:111
std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char**)::$_0>::_M_invoke(std::_Any_data const&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/std_function.h:290
seastar::posix_thread::start_routine(void*) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/posix.cc:90
asan_thread_start(void*) at /mnt/clang_build/llvm-project-x86_64/compiler-rt/lib/asan/asan_interceptors.cpp:239
__vfscanf_internal at :?
peek_token at ??:?
```

In ce65164315, we banned io_uring from tests, but missed the ldap
tests. This extends coverage to ldap tests.

I verified that the new options indeed reach the test.

Refs #23411.

Credit to Botond for recognizing the failure reason.

Closes scylladb/scylladb#23422
2025-03-28 07:45:53 +02:00
Botond Dénes
bd8973a025 tools/scylla-nodetool: s/GetInt()/GetInt64()/
GetInt() was observed to fail when the integer JSON value overflows the
int32_t type, which `GetInt()` uses for storage. When this happens,
rapidjson will assign a distinct 64 bit integer type to the value, and
attempting to access it as 32 bit integer triggers the wrong-type error,
resulting in assert failure. This was hit on the field where invoking
nodetool netstats resulted in nodetool crashing when the streamed bytes
amounts were higher than maxint.

To avoid such bugs in the future, replace all usage of GetInt() in
nodetool of GetInt64(), just to be sure.

A reproducer is added to the nodetool netstats crash.

Fixes: scylladb/scylladb#23394

Closes scylladb/scylladb#23395
2025-03-27 14:05:39 +02:00
Botond Dénes
d57e71837f Merge 'Improve scoped restore test' from Pavel Emelyanov
This PR includes several fixes to the nowadays flaky test_restore_with_streaming_scopes test.

1. Check that backup and restore APIs don't fail. Currently, if either of them does the test cases fails anyway checking that the data is not restored back, but it's better to know what exactly failed

2. For restore API the test collects the list of sstables to restore from. Currently collecting this list races with background compaction and sometimes leads to restore API to fail which, in turn, makes the whole test to fail

3. Add a test case that validates that restore-from-missing-sstable fails nicely

refs: #23189
No backport, as it's a relatively new test

Closes scylladb/scylladb#23445

* github.com:scylladb/scylladb:
  test/backup: Validate that restoring from non-existing sstables fails
  test/backup: Collect sstables names after snapshot
  test/backup: Check that backup and restore succeed
2025-03-27 13:23:41 +02:00
Piotr Dulikowski
288216a89e Merge 'Ignore wrapped exceptions gate_closed_exception and rpc::closed_error when node shuts down.' from Sergey Zolotukhin
Normally, when a node is shutting down, `gate_closed_exception` and `rpc::closed_error`
in `send_to_live_endpoints` should be ignored. However, if these exceptions are wrapped
in a `nested_exception`, an error message is printed, causing tests to fail.

This commit adds handling for nested exceptions in this case to prevent unnecessary
error messages.

Fixes scylladb/scylladb#23325
Fixes scylladb/scylladb#23305
Fixes scylladb/scylladb#21815

Backport: looks like this is quite a frequent issue, therefore backport to 2025.1.

Closes scylladb/scylladb#23336

* github.com:scylladb/scylladb:
  database: Pass schema_ptr as const ref in `wrap_commitlog_add_error`
  database: Unify exception handling in `do_apply` and `apply_with_commitlog`
  storage_proxy: Ignore wrapped `gate_closed_exception` and `rpc::closed_error` when node shuts down.
  exceptions: Add `try_catch_nested` to universally handle nested exceptions of the same type.
2025-03-27 11:39:42 +01:00
Pavel Emelyanov
9f036d957a Merge 'test/clqpy/test_tool.py: get_sstables_for_table(): exclude non-sealed sstables' from Botond Dénes
Filter out sstables which don't have a TOC or have a temporary TOC. Such sstables are incomplete and can dissapear if the compaction which writes them is interrupted.

Fixes: #23203

This PR fixes a flaky test which is only on master, no backports required.

Closes scylladb/scylladb#23450

* github.com:scylladb/scylladb:
  test/cqlpy/test_tools.py: test_scylla_sstable_query: reduce scope of no-compaction context
  test/clqpy/test_tool.py: get_sstables_for_table(): exclude non-sealed sstables
2025-03-27 09:45:07 +03:00
Tomasz Grabiec
8e506c5a8f test: tablets: Fix flakiness due to ungraceful shutdown
The test fails sporadically with:

cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}

That's becase a server is stopped in the middle of the workload.

The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.

Fixes #20492

Closes scylladb/scylladb#23451
2025-03-27 09:44:07 +03:00
Lakshmi Narayanan Sreethar
5b47d84399 topology_coordinator: do not schedule migrations when there are pending resize finalizations
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

To fix this, do not schedule tablet migrations on any tables when there
are pending resize finalizations. This ensures that migrations from the
same table and other unrelated tables do not block resize finalization.

Also added a testcase to verify the fix.

Fixes #21762

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-03-27 10:16:34 +05:30
Avi Kivity
b292b5800b Merge 'test.py: move starting LDAP service to dedicate method' from Andrei Chekun
Move starting LDAP to the method where the rest of the services are started. This will unify the way of starting the 3rd party services.
Fix LDAP tests flakiness due not possible to connect to LDAP server.
Add catching stdout and stderr of toxiproxy-cli in case of errors

Related: https://github.com/scylladb/scylladb/pull/23333

This PR is based on https://github.com/scylladb/scylladb/pull/23221, so #23221 should be merged first.

Closes scylladb/scylladb#23235

* github.com:scylladb/scylladb:
  test.py: Refactor nodetool/conftest
  test.py: Refactor test/pylib/cpp/ldap
  test.py: move starting LDAP service to dedicate method
2025-03-26 15:31:00 +02:00
Botond Dénes
801339bad9 test/cqlpy/test_tools.py: test_scylla_sstable_query: reduce scope of no-compaction context
To just system.local, the table these tests operate on. No need to
disable autocompaction for all of the system keyspace.
2025-03-26 09:19:38 -04:00
Botond Dénes
3ec863c4ce test/clqpy/test_tool.py: get_sstables_for_table(): exclude non-sealed sstables
Filter out sstables which don't have a TOC or have a temporary TOC. Such
sstables are incomplete and can dissapear if the compaction which writes
them is interrupted.
2025-03-26 09:18:34 -04:00
Sergey Zolotukhin
6abfed9817 exceptions: Add try_catch_nested to universally handle nested exceptions of the same type. 2025-03-26 11:15:13 +01:00
Evgeniy Naydanov
574c81eac6 test.py: random_failures: deselect topology ops for some injections
After recent changes #18640 and #19151 started to reproduce for
stop_after_sending_join_node_request and
stop_after_bootstrapping_initial_raft_configuration error injections too.

The solution is the same: deselect the tests.

Fixes #23302

Closes scylladb/scylladb#23405
2025-03-26 12:07:12 +03:00
Pavel Emelyanov
38f37763d6 test/backup: Validate that restoring from non-existing sstables fails
When restore API is called and is given a non-existing sstable (object
name) the task should complete with failed status and some meaningful
message in the error text.

refs: #23189

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-26 10:55:42 +03:00
Pavel Emelyanov
02610a9072 test/backup: Collect sstables names after snapshot
The scoped restoer test works like this

- populate table
- flush it
- collect list of sstables
- take snapshot
- backup
- restore (with the list of sstables as argument)
- check the data is back

Steps 2 and 3 are racy -- in case compaction comes in the middle, the
list of collected sstables would differ from those snapshotted (and
backuped) which will later lead to restore failure due to missing
sstable.

Fix by collecting the list of sstables after taking snapshot, and
collect those not from the datadir, but from the snapshot dir.

fixes: #23189

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-26 10:40:54 +03:00
Pavel Emelyanov
08004fe470 test/backup: Check that backup and restore succeed
The scoped-restore test calls backup and restore APIs on several nodes,
but doesn't check if any of the operations actually succeeds. Sometimes
they indeed don't and test captures this, but in a weird manner -- the
post-test checks for data presense fails, because the expected data is
not in fact in its place.

It's more debugging-friendly if we know in advance if backup or restore
fails, rather than see that some data is missing after (failed) restore.

refs: #23189

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-25 19:45:56 +03:00
Gleb Natapov
0aa4a82c83 messaging_service: do not call uninitialized _address_to_host_id_mapper std::function
During messaging_service object creation remove_rpc_client function may
be called if prefer_local snitch setting is true. The caller does not
provide host id, so _address_to_host_id_mapper is called to obtain it,
but at this point the function is not initialized yet.

The patch fixes the code to not call the function if not initialized.
This is not the problem since during messaging_service creation there
is no connection to drop.

Fixes: #23353

Message-ID: <Z-J2KbBK8NoFNYZZ@scylladb.com>
2025-03-25 18:41:16 +02:00
Wojciech Mitros
88d3fc68b5 alter_table_statement: fix renaming multiple columns in tables with views
When we rename columns in a table which has materialized views depending
on it, we need to also rename them in the materialized views' WHERE
clauses.
Currently, we do that by creating a new WHERE clause after each rename,
with the updated column. This is later converted to a mutation that
overwrites the WHERE clause. After multiple renames, we have multiple
mutations, each overwriting the WHERE clause with one column renamed.
As a result, the final WHERE clause is one of the modified clauses with
one column renamed.
Instead, we should prepare one new WHERE clause which includes all the
renamed columns. This patch accomplishes this by processing all the
column renames first, and only preparing the new view schema with the
new WHERE clause afterwards.

This patch also includes a test reproducer for this scenario.

Fixes scylladb/scylladb#22194

Closes scylladb/scylladb#23152
2025-03-25 09:58:58 +01:00
Michael Litvak
49b8cf2d1d storage_service: fix tablet split of materialized views
This fixes an issue where materialized view tablets are not split
because they are not registered as split candidates by the storage
service.

The code in storage_service::replicate_to_all_cores was changed in
4bfa3060d0 to handle normal tables and view tables separately, but with
that change register_tablet_split_candidate is applied only to normal
tables and not every table like before. We fix it by registering view
tables as well.

We add a test to verify that split of MV tables works.

Closes scylladb/scylladb#23335
2025-03-24 08:23:58 +01:00
Avi Kivity
cc5fe542ed test: ignore unused fmt::to_string() result
fmt 11.1 apparently marks to_string() as [[nodiscard]]. Here we aren't
interested in the result, so explicitly ignore it to avoid an error.

Closes scylladb/scylladb#23403
2025-03-24 10:19:09 +03:00
Avi Kivity
cd04ab1a4e test: avoid spaces when defining user-defined literal operator
Clang 20 complains when it sees a user-defined literal operator
defined with a space before the underscore. Assume it's adhering
to the standard and comply.

Closes scylladb/scylladb#23401
2025-03-24 10:17:12 +03:00
Pavel Emelyanov
d436fb8045 Merge 'Fix EAR not applied on write to S3 (but on read).' from Calle Wilund
Fixes #23225
Fixes #23185

Adds a "wrap_sink" (with default implementation) to sstables::file_io_extension, and moves
extension wrapping of file and sink objects to storage level.
(Wrapping/handling on sstable level would be problematic, because for file storage we typically re-use the sstable file objects for sinks, whereas for S3 we do not).

This ensures we apply encryption on both read and write, whereas we previously only did so on read -> fail.
Adds io wrapper objects for adapting file/sink for default implementation, as well as a proper encrypted sink implementation for EAR.

Unit tests for io objects and a macro test for S3 encrypted storage included.

Closes scylladb/scylladb#23261

* github.com:scylladb/scylladb:
  encryption: Add "wrap_sink" to encryption sstable extension
  encrypted_file_impl: Add encrypted_data_sink
  sstables::storage: Move wrapping sstable components to storage provider
  sstables::file_io_extension: Add a "wrap_sink" method.
  sstables::file_io_extension: Make sstable argument to "wrap" const
  utils: Add "io-wrappers", useful IO helper types
2025-03-24 10:12:46 +03:00
Artsiom Mishuta
8bb6414037 test.py: reuse clusters in Python suite
PR https://github.com/scylladb/scylladb/pull/22274 was introduced due to
CI instability and want to mark the cluster dirty after each test for topology
But in fact, affects only Python suites that are quite stable, and CI was
Stabilized by PR https://github.com/scylladb/scylladb/pull/22252

This PR get back cluster reusage in Python test suites

Closes scylladb/scylladb#23179
2025-03-23 20:08:36 +02:00
Avi Kivity
9adfb91f46 Merge 'Introduce s3 data_source_impl for optimized object streaming' from Pavel Emelyanov
Currently, to stream data from sstable component the sstables code uses file_data_source_impl. In case the component is on S3, the s3::readable_file is put into that data source. The data source is configured with 128k buffers and at most 4 read-ahead-s. With that configuration, downloading full object from S3 becomes too slow -- GET-ing file with 128k requests is not nice even with 4 parallel read-ahead-s.

Better solution for S3 downloading is to request way larger chunk with one GET and then produce smaller, 128k or alike, buffers upon data arrival. This is what the newly introduced data source impl does -- it spawns a background GET and lets the upper input stream read buffers directly from the arriving body.

This PR doesn't yet make sstable layer use the new sink, just introduces it and adds unit and perf tests.

Testing

|Test|Download speed, MB/s|
|-|-|
|file_input_stream (*), 1 socket | 4.996|
|file_input_stream (*), 2 sockets | 9.403|
|s3_data_source (**) | 93.164|

(*) The file_input_stream test renders 128k GETs and is configured to issue at most 4 read-ahead-s
(**) The s3_data_source uses at most 1 socket regardless of what perf-test configures it to

refs: #22458

Closes scylladb/scylladb#22907

* github.com:scylladb/scylladb:
  test: Extend s3-perf test with stream download one
  test/perf: Tune-up s3 test options parsing
  test: Add unit test for newly introduced download source
  s3/client: Introduce data_source_impl for object downloading
  s3/client: Detach format_range_header() helper
2025-03-23 14:22:04 +02:00
Pavel Emelyanov
ca3b604afa test: Extend s3-perf test with stream download one
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:07 +03:00
Pavel Emelyanov
283e8e0706 test/perf: Tune-up s3 test options parsing
Rename the `--upload bool` into `--operation string` one, so that new
tests can be added in the future. Also rename run_download() to
run_contiguous_get() because this is what the internals of this method
do -- just GET contiguous ranges sequentially.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:07 +03:00
Pavel Emelyanov
bd313c581f test: Add unit test for newly introduced download source
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:06 +03:00
Avi Kivity
7646e1448a Merge 'cql3: Introduce RF-rack-valid keyspaces' from Dawid Mędrek
This PR is an introductory step towards enforcing
RF-rack-valid keyspaces in Scylla.

The scope of changes:
* defining RF-rack-valid keyspaces,
* introducing a configuration option enforcing RF-rack-valid
  keyspaces,
* restricting the CREATE and ALTER KEYSPACE statements
  so that they never lead to RF-rack invalid keyspaces,
* during the initialization of a node, it verifies that all existing
  keyspaces are RF-rack-valid. If not, the initialization fails.

We provide tests verifying that the changes behave as intended.

---

Note that there are a number of things that still need to be implemented.
That includes, for instance, restricting topology operations too.

---

Implementation strategy (going beyond the scope of this PR):

1. Introduce the new configuration option `rf_rack_valid_keyspaces`.
2. Start enforcing RF-rack-validity in keyspaces if the option is enabled.
3. Adjust the tests: in the tree and out of it. Explicitly enable the option in all tests.
4. Once the tests have been adjusted, change the default value of the option to enabled.
5. Stop explicitly enabling the option in tests.
6. Get rid of the option.

---

Fixes scylladb/scylladb#20356
Fixes scylladb/scylladb#23276
Fixes scylladb/scylladb#23300

---

Backport: this is part of the requirements for releasing 2025.1.

Closes scylladb/scylladb#23138

* github.com:scylladb/scylladb:
  main: Refuse to start node when RF-rack-invalid keyspace exists
  cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
  db/config: Introduce RF-rack-valid keyspaces
2025-03-20 19:10:36 +02:00