Commit Graph

47235 Commits

Author SHA1 Message Date
Pavel Emelyanov
b5a124f60c sstable_directory: Move highest_generation_seen() to distributed_loader.cc
This method is only used by the loader code (and tests). Also, There's the
highest_version_seen() peer that sits in the loader code either.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23324
2025-04-01 09:15:14 +03:00
Pavel Emelyanov
eafc767cc6 sstable/filesystem: Add convenience helper to generate filename
In its operations the fs storage carefully generates full filename from
all sstable parameters -- version, format, generation, keyspace and
table names and component type or name. However, in all of the cases
format, version and keyspace:table names are inherited from the sstable
being operated on. This calls for a filename generation helper that
wraps most of the arguments thus making the lines shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23384
2025-04-01 09:14:44 +03:00
Botond Dénes
0fdf2a2090 Merge 'test/pylib: servers_add: support list of property_files' from Benny Halevy
So that a multi-dc/multi-rack cluster can be populated
in a single call.

* Enhancement, no backport required

Closes scylladb/scylladb#23341

* github.com:scylladb/scylladb:
  test/pylib: servers_add: add auto_rack_dc parameter
  test/pylib: servers_add: support list of property_files
2025-04-01 09:14:20 +03:00
Jenkins Promoter
6c528f5027 Update pgo profiles - aarch64 2025-04-01 04:45:44 +03:00
Jenkins Promoter
3c12029584 Update pgo profiles - x86_64 2025-04-01 04:27:11 +03:00
Tomasz Grabiec
29d1c2adc6 Merge 'Finalize tablet splits earlier' from Lakshmi Narayanan Sreethar
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

This PR fixes the issue by updating the load balancer to no schedule any
migrations and to not make any repair plans when there a resize
finalization is pending in any table.

Also added a testcase to verify the fix.

Fixes #21762

Improvement : No need to backport.

Closes scylladb/scylladb#22148

* github.com:scylladb/scylladb:
  topology_coordinator: fix indentation in generate_migration_updates
  topology_coordinator: do not schedule migrations when there are pending resize finalizations
  load_balancer: make repair plans only when there is no pending resize finalization
2025-03-31 14:42:34 +02:00
Botond Dénes
90c20858ed Merge 'test/database: Remove most of take_snapshot() helper overloads and re-use them more' from Pavel Emelyanov
This helper facilitate snapshot creation by various test cases in database_test.cc. This PR generalizes all overloads into one that suits all callers and patches one more test case to use it as well.

Closes scylladb/scylladb#23482

* github.com:scylladb/scylladb:
  test/database: Re-use take_snapshot() helper once more
  test/database: Remove most of take_snapshot() helper overloads
2025-03-31 15:20:51 +03:00
Benny Halevy
5f2ce0b022 loading_cache_test: test_loading_cache_reload_during_eviction: use manual_clock
Rather than lowres_clock, as since
32b7cab917,
loading_cache_for_test uses manual_clock for timing
and relying on lowres_clock to time the test might
run out of memory on fast test machines.

Fixes #23497

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#23498
2025-03-31 14:53:06 +03:00
Pavel Emelyanov
ac582efb44 test/database: Re-use take_snapshot() helper once more
There's a test case that can call the recently patched take_snapshot()
helper as well. This changes nothing, but makes further patching a bit
simpler (not in this branch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-31 13:18:06 +03:00
Pavel Emelyanov
7e6380b6bd test/database: Remove most of take_snapshot() helper overloads
There are 3 of those that help tests (re)shuffle cql_test_env/database,
skip_flush == true/false options and keyspace/table/snapshot names.
There's little sense in having that many of those, just one overload
with default arguments suits most of the callers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-31 13:18:06 +03:00
Botond Dénes
ea55eed037 Merge 'Snapshot several tables at once in scrub API handler' from Pavel Emelyanov
The scrub API handler may want to snapshot several tables. For that, it calls snapshot-ctl method to snapshot a single table for each table in the list. That's excessive, snapshot-ctl has a method to snapshot a bunch of tables at once, just what the scrub handler needs.

It's an improvement, so no need to backport

Closes scylladb/scylladb#23472

* github.com:scylladb/scylladb:
  snapshot-ctl: Remove unused snapshot-single-table method
  api: Snapshot all tables at once in scrub handler
2025-03-31 13:00:32 +03:00
Piotr Smaron
aff8cbc6f3 CODEOWNERS: remove expired owners
Removing krzaq, who's no longer with the company.
Removing core-frontend team members from Alternator areas, as it's no
longer the domain of this team.

Closes scylladb/scylladb#23500
2025-03-31 11:37:51 +03:00
Pavel Emelyanov
0077acd1bb api: Properly validate table in tablet add|del replica handlers
The handlers in question just go and call database.find_column_family,
in case the table in question doesn't exist, the no_such_column_family
exception would be thrown, which is not nice. Proper behavior is to
throw bad_param one and there's a helper that does it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23389
2025-03-31 10:03:17 +02:00
Andrzej Jackowski
c89d8c6566 cql3: prevent from empty option use in cf_statement::column_family()
Implementation of cf_statement::column_family() dereferences _cf_name
option without checking if the option is non-empty. On enterprise
branch, there is a safeguard that prevents from such an empty option
dereferencing. Although the current code on master seems to not call
columny_family() when _cf_name is empty, it is safer to introduce the
same workaround on master, to avoid any regression.

This change:
 - Prevent from empty option use in cf_statement::column_family()

Fixes: scylla-enterprise#5273

Closes scylladb/scylladb#23366
2025-03-31 09:43:22 +03:00
Michał Chojnowski
e23fdc0799 table: fix a race in table::take_storage_snapshot()
`safe_foreach_sstable` doesn't do its job correctly.

It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.

The thing is, it takes the deletion lock after the SSTable set is
already obtained, so SSTables might get unlinked *before* we take the lock.

Remove this function and fix its usages to obtain the set and iterate
over it under the lock.

Closes scylladb/scylladb#23397
2025-03-31 09:40:32 +03:00
Avi Kivity
2b9e1e61d0 docs: reader_concurrency_semaphore: document CPU concurrency limit
Document the CPU concurrency implemented in 3d816b7c16
and adjusted in 3d12451d1f.

Closes scylladb/scylladb#23404
2025-03-31 09:39:55 +03:00
Dawid Mędrek
b0b0c5905e test/cluster/test_multidc: Clean up RF-rack-valid keyspaces tests
There are some minor things we should fix that are a remnant
of the original changes (scylladb/scylladb@7646e14).

Closes scylladb/scylladb#23429
2025-03-31 09:38:42 +03:00
David Garcia
1a7be07b8c docs: renders os-support from json file
docs: renders os-support from json file

Closes scylladb/scylladb#23436
2025-03-31 09:36:49 +03:00
Marcin Maliszkiewicz
e3f2ebd4fb cql3: remove not needed cmd copy in indexed_table_select_statement
It's not used variable. There should be a tiny perf increase as
it saves allocation.

Closes scylladb/scylladb#23473
2025-03-31 09:34:32 +03:00
Avi Kivity
73e4a3c581 sstables: store features early in write path
sstable features indicate that an sstable has some extension, or that
some bug was fixed. They allow us to know if we can rely on certain
properties in a read sstables.

Currently, sstable features are set early in the read path (when we
read the scylla metadata file) and very late in the write path
(when we write the scylla metadata file just before sealing the sstable).

However, we happen to read features before we set them in the write path -
when we resize the bloom filter for a newly written sstable we instantiate
an index reader, and that depends on some features. As a result,
we read a disengaged optional (for the scylla metadata component) as if
it was engaged. This somehow worked so far, but fails with libstdc++
hash table implementation.

Fix it by moving storage of the features to the sstable itself, and
setting it early in the write path.

Fixes #23484

Closes scylladb/scylladb#23485
2025-03-31 09:33:56 +03:00
Pavel Emelyanov
693387bda6 Merge 'test.py: topology: allow to run tests with bare pytest command' from Evgeniy Naydanov
Add possibility to run topology tests using bare pytest command.

To achieve this goal the following changes were made:

- Add fixtures `testpy_testsuite` and `testpy_test` to `test/conftest.py`.
- To build `TestSuite` object we need to discover a corresponding `suite.xml` file.  Do this by walking up thru the fs tree starting from the current test file.
- Run ScyllaClusterManager using pytest fixture if `--manager-api` option is not provided.

And made some refactoring:

- Add path constants to `test` module and use them in different test suites instead of own dups of the same code:
  - TOP_SRC_DIR : ScyllaDB's source code root directory
  - TEST_DIR : the directory with test.py tests and libs
  - BUILD_DIR : directory with ScyllaDB's build artifacts
- Add TestSuite.log_dir attribute as a ScyllaDB's build mode subdir of a path provided using `--tmpdir` CLI argument. Don't use `tmpdir` name because it mixed up with pytest's built-in fixture and `--tmpdir` option itself.
- Change default value for `--tmdir` from `./testlog` to `TOP_SRC_DIR/testlog`
- Refactor `ResourceGather*` classes to use path from a `test` object instead of providing it separately.
- Move modes constants (`all_modes`/`ALL_MODES` and `debug_modes`/`DEBUG_MODES`) to `test` module and remove duplication.
- Move `prepare_dirs()` and `start_3rd_party_services()` from `pylib.util` to`pylib.suite.base` to avoid circular imports.
- In some places refactor to use f-strings for formatting.

Also minor changes related to running with pytest-xdist:

- When run tests in parallel we need to ensure that filenames are unique by adding xdist worker ID to them.
- Pass random seed across xdist workers using env variable.

Closes scylladb/scylladb#22960

* github.com:scylladb/scylladb:
  test.py: async_cql: remove unused event_loop fixture
  test.py: random_failures: make it play well with xdist
  test.py: add xdist worker ID to log filenames
  test.py: topology: run tests using bare pytest command
  test.py: add fixtures for current test suite and test
  test.py: refactor paths constants and options
2025-03-31 09:30:06 +03:00
Benny Halevy
a4aa4d74c1 test/pylib: servers_add: add auto_rack_dc parameter
To quickly populate nodes in a single dc,
each node in its own rack.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-30 19:23:40 +03:00
Benny Halevy
c4dbb11c87 test/pylib: servers_add: support list of property_files
So that a multi-dc/multi-rack cluster can be populated
in a single call.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-30 19:12:39 +03:00
Piotr Smaron
a2bbbc6904 auth: forbid modifying system ks by non-superusers
Before this patch, granting a user MODIFY permissions on ALL KEYSPACES allowed the user to write to system tables, where the user could also set himself to "superuser" granting him all other permissions. After this patch, MODIFY permissions on ALL KEYSPACES is limited only to non-system keyspaces.

Fixes: scylladb/scylladb#23218

Closes scylladb/scylladb#23219
2025-03-30 16:55:04 +03:00
Ferenc Szili
2c9b312b58 test: port of test and reproducer for resurrection during file based streaming
This change ports test/cluster/test_resurrection.py from enterprise to
master. Because the underlying issue deals with file based streaming,
this test was a part of the enterprise repo. It contains the test and
reproducer for the issue described below:

When tablets are migrated with file-based streaming, we can have a situation
where a tombstone is garbage collected before the data it shadows lands. For
instance, if we have a tablet replica with 3 sstables:

1 sstable containing an expired tombstone
2 sstable with additional data
3 sstable containing data which is shadowed by the expired tombstone in sstable 1

If this tablet is migrated, and the sstables are streamed in the order listed
above, the first two sstables can be compacted before the third sstable arrives.
In that case, the expired tombstone will be garbage collected, and data in the
third sstable will be resurrected after it arrives to the pending replica.

The fix for the issue was merged in b66479ea98

This patch only ports the missing test.

Closes scylladb/scylladb#23466
2025-03-30 13:39:40 +03:00
Andrzej Jackowski
b8adbcbc84 audit: fix empty query string in BATCH query
Function modification_statement::add_raw() is never called, which
makes query string in audit_info of batch queries empty. In enterprise
branch, add_raw is called in Cql.g and those changes were never merged
to master.

This changes:
 - Add missing call of add_raw() to Cql.g
 - Include other related changes (from PR#3228 in scylla-enterprise)

Fixes scylladb#23311

Closes scylladb/scylladb#23315
2025-03-30 13:37:11 +03:00
Michał Chojnowski
79a477ecb6 cmake: add the -dynamic-linker=... form to the -dynamic-linker regex
On my system (Nix), the compiler produces a `-dynamic-linker=/nix/store/...` in
the linker call scanned by get_padded_dynamic_linker_option.
But the regex can't deal with the `=` there, it requires a ` `. Fix that.

We also do the same in configure.py, and remove the Nix-specific hack
which used to disable the entire mechanism.

Closes scylladb/scylladb#22308
2025-03-30 11:58:47 +03:00
Kefu Chai
7814f6d374 github: improve seastar bad include check
for better developer experience:

- add inline annotations using problem matchers, see
  https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md
- use a single step for uploading both output files, because the `path`
  setting is actually passed to
  [@actions/glob](https://github.com/actions/toolkit/tree/main/packages/glob),
  i removed the double quotes and the leading "./"
  from the paths.
- use "::error" workflow command to signify the failure, see
  https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#example-creating-an-annotation-for-an-error

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23310
2025-03-30 11:56:18 +03:00
Evgeniy Naydanov
1a0c14aa50 test.py: async_cql: remove unused event_loop fixture
Newer version of pytest-asyncio (0.24.0) allows to control the scope
of async loop per fixture.  Don't need this workaround anymore.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
cac0257914 test.py: random_failures: make it play well with xdist
Pass random seed across xdist workers using env variable.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
9bba59631f test.py: add xdist worker ID to log filenames
When run tests in parallel we need to ensure that filenames
are unique by adding xdist worker ID to them.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
9cb0ec2b42 test.py: topology: run tests using bare pytest command
Run ScyllaClusterManager using pytest fixture if `--manager-api`
option is not provided.

On this stage we're trying to be as close to test.py as possible.
test.py runs tests file-by-file, so, effectively, scopes `session`,
`package`, and `module` are pretty same.  Also, test.py starts
ScyllaClusterManager for every test module and this is the reason
why fixture `manager_api_sock_path` has scope=`module`.  And, in
result, we need to change scope for fixture `manager_internal` too.
2025-03-30 03:19:29 +00:00
Evgeniy Naydanov
42075170d1 test.py: add fixtures for current test suite and test
Add fixtures `testpy_testsuite` and `testpy_test` to `test/conftest.py`
To build TestSuite object we need to discover a corresponding `suite.xml`
file.  Do this by walking up thru the fs tree starting from the current
test file.
2025-03-30 03:19:29 +00:00
Evgeniy Naydanov
c4ae4e247a test.py: refactor paths constants and options
Add path constants to `test` module and use them in different test suites
instead of own dups of the same code:

 - TOP_SRC_DIR : ScyllaDB's source code root directory
 - TEST_DIR : the directory with test.py tests and libs
 - BUILD_DIR : directory with ScyllaDB's build artefacts

Add TestSuite.log_dir attribute as a ScyllaDB's build mode subdir of a path
provided using `--tmpdir` CLI argument.  Don't use `tmpdir` name because it
mixed up with pytest's built-in fixture and `--tmpdir` option itself.

Change default value for `--tmdir` from `./testlog` to `TOP_SRC_DIR/testlog`

Refactor `ResourceGather*` classes to use path from a `test` object instead of
providing it separately.

Move modes constants to `test` module and remove duplications.

Move `prepare_dirs()` and `start_3rd_party_services()` from `pylib.util` to
`pylib.suite.base` to avoid circular imports (with little refactoring to
use `pathlib.Path` instead of `str` as paths.)

Also, in some places refactor to use f-strings for formatting.
2025-03-30 03:19:29 +00:00
Michał Jadwiszczak
0ee0696959 test/cqlpy/test_service_level_api: update to service levels on raft and remove flakiness
Tests in `test_service_level_api` were written before
scylladb/scylladb#16585 and they were doing 10s sleeps to wait for
service level controller to update its configuration. Now performing
a read barrier is sufficient to ensure SL configuration is up-to-date,
which significantly reduces tests time (from ~60s to ~2-3s).

Moreover, there was flakiness in the `test_switch_tenants` test.
Until now, the test waited up to 60s for the connections to update
their scheduling groups. However, it is difficult to determine
how long the process might take because a connection may be blocked
while waiting for the next request to be processed,
and the scheduling group will be updated only after a request is processed
(see `generic_server::connection::process_until_tenant_switch()`).
To address this issue, 100 simple queries are executed so that
connections on all shards process at least one request
and update their scheduling groups.

Fixes scylladb/scylladb#22768

Closes scylladb/scylladb#23381
2025-03-28 17:14:21 +03:00
Pavel Emelyanov
9aa986a49a snapshot-ctl: Remove unused snapshot-single-table method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-28 10:45:31 +03:00
Pavel Emelyanov
5162f75d0b api: Snapshot all tables at once in scrub handler
The handler walks the list of tables and snapshots each one individually
(if needed). That's not very optimal, each such call starts a "snapshot
modification operation", which is switching to shard-0 for a lock, then
calls the snapshot of multiple tables giving it vector of a single name.
There's a method of snapshot-ctl that snapshots several tables at once,
no need to open-code it here.

One thing to care about -- the take_column_family_snapshot() throws when
the vector of table names is empty, so need an explicit skipping check.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-28 10:44:47 +03:00
Avi Kivity
6d7cb68aab test: ldap: avoid io_uring Seastar reactor backend
It tends to fail sometimes with ENOMEM:

```
ERROR 2025-03-24 01:05:22,983 [shard 0:sl:d] ldap_role_manager - error in reconnect: std::system_error (error C-Ares:4, server.that.will.never.exist.scylladb.com: Not found)
ERROR 2025-03-24 01:05:30,984 [shard 0:sl:d] ldap_role_manager - error in reconnect: std::system_error (error C-Ares:4, server.that.will.never.exist.scylladb.com: Not found)
ERROR 2025-03-24 01:05:47,123 [shard 0:main] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:12, Cannot allocate memory)
ERROR 2025-03-24 01:05:47,139 [shard 0:main] table - failed to write sstable /scylladir/testlog/x86_64/debug/scylla-33787f64/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/me-3got_1s5n_0lfls1y4z7vkkts07a-big-Data.db: storage_io_error (Storage I/O error: 12: Cannot allocate memory)
ERROR 2025-03-24 01:05:47,140 [shard 0:main] table - Memtable flush failed due to: storage_io_error (Storage I/O error: 12: Cannot allocate memory). Aborting, at 0x30f5605 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x4514f14 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x4514b96 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x45165b1 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x4518dcf 0x3fde842 0x35dc5c6 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36c26ed /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36cdd0c /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d2cd2 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d0e56 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327f47a /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327c8f0 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1cdd4 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c79c /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c69c /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c184 /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x34b2674 0x314b8b6 /lib64/libc.so.6+0x70ba7 /lib64/libc.so.6+0xf4b8b
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> >(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
   --------
   seastar::shared_future<>::shared_state
Aborting on shard 0, in scheduling group main.
Backtrace:
  0x30f5605
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x384a0e4
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x3849db2
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x369bd84
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d42a2
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x37a5ed9
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x37a61d5
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x37a601f
  /lib64/libc.so.6+0x1a04f
  /lib64/libc.so.6+0x72b53
  /lib64/libc.so.6+0x19f9d
  /lib64/libc.so.6+0x1941
  0x3fde8b1
  0x35dc5c6
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36c26ed
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36cdd0c
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d2cd2
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x36d0e56
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327f47a
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x327c8f0
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1cdd4
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c79c
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c69c
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar_testing.so+0x1c184
  /jenkins/workspace/scylla-master/next/scylla/build/debug/seastar/libseastar.so+0x34b2674
  0x314b8b6
  /lib64/libc.so.6+0x70ba7
  /lib64/libc.so.6+0xf4b8b
=== TEST.PY SUMMARY START ===
Test exited with code -6

=== TEST.PY SUMMARY END ===

=== decoded ===
Backtrace:
[Backtrace #0]
__interceptor_backtrace at /mnt/clang_build/llvm-project-x86_64/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4369
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/debug/seastar/./seastar/include/seastar/util/backtrace.hh:70
seastar::backtrace_buffer::append_backtrace() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:805
seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:838
seastar::print_with_backtrace(char const*, bool) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:850
seastar::sigabrt_action() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:4004
seastar::install_oneshot_signal_handler<6, (void (*)())(&seastar::sigabrt_action)>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3981
seastar::install_oneshot_signal_handler<6, (void (*)())(&seastar::sigabrt_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3976
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c8c3fa52aaee3f5d73b6fd862e39e9d4c010b6ba, for GNU/Linux 3.2.0, not stripped

?? ??:0
printf_positional at ??:?
?? ??:0
?? ??:0
replica::table::seal_active_memtable(replica::compaction_group&, replica::flush_permit&&)::$_0::operator()(std::function<seastar::future<void> ()>) const at ././replica/table.cc:1512
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/coroutine:242
 (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:122
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:2616
seastar::reactor::run_some_tasks() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3088
seastar::reactor::do_run() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3256
seastar::reactor::run() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/reactor.cc:3146
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/app-template.cc:167
seastar::testing::test_runner::start_thread(int, char**)::$_0::operator()() at ./build/debug/seastar/./build/debug/seastar/./seastar/src/testing/test_runner.cc:77
void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/invoke.h:61
std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::$_0&>(seastar::testing::test_runner::start_thread(int, char**)::$_0&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/invoke.h:111
std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char**)::$_0>::_M_invoke(std::_Any_data const&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/std_function.h:290
seastar::posix_thread::start_routine(void*) at ./build/debug/seastar/./build/debug/seastar/./seastar/src/core/posix.cc:90
asan_thread_start(void*) at /mnt/clang_build/llvm-project-x86_64/compiler-rt/lib/asan/asan_interceptors.cpp:239
__vfscanf_internal at :?
peek_token at ??:?
```

In ce65164315, we banned io_uring from tests, but missed the ldap
tests. This extends coverage to ldap tests.

I verified that the new options indeed reach the test.

Refs #23411.

Credit to Botond for recognizing the failure reason.

Closes scylladb/scylladb#23422
2025-03-28 07:45:53 +02:00
Botond Dénes
bd8973a025 tools/scylla-nodetool: s/GetInt()/GetInt64()/
GetInt() was observed to fail when the integer JSON value overflows the
int32_t type, which `GetInt()` uses for storage. When this happens,
rapidjson will assign a distinct 64 bit integer type to the value, and
attempting to access it as 32 bit integer triggers the wrong-type error,
resulting in assert failure. This was hit on the field where invoking
nodetool netstats resulted in nodetool crashing when the streamed bytes
amounts were higher than maxint.

To avoid such bugs in the future, replace all usage of GetInt() in
nodetool of GetInt64(), just to be sure.

A reproducer is added to the nodetool netstats crash.

Fixes: scylladb/scylladb#23394

Closes scylladb/scylladb#23395
2025-03-27 14:05:39 +02:00
Botond Dénes
d57e71837f Merge 'Improve scoped restore test' from Pavel Emelyanov
This PR includes several fixes to the nowadays flaky test_restore_with_streaming_scopes test.

1. Check that backup and restore APIs don't fail. Currently, if either of them does the test cases fails anyway checking that the data is not restored back, but it's better to know what exactly failed

2. For restore API the test collects the list of sstables to restore from. Currently collecting this list races with background compaction and sometimes leads to restore API to fail which, in turn, makes the whole test to fail

3. Add a test case that validates that restore-from-missing-sstable fails nicely

refs: #23189
No backport, as it's a relatively new test

Closes scylladb/scylladb#23445

* github.com:scylladb/scylladb:
  test/backup: Validate that restoring from non-existing sstables fails
  test/backup: Collect sstables names after snapshot
  test/backup: Check that backup and restore succeed
2025-03-27 13:23:41 +02:00
Piotr Dulikowski
288216a89e Merge 'Ignore wrapped exceptions gate_closed_exception and rpc::closed_error when node shuts down.' from Sergey Zolotukhin
Normally, when a node is shutting down, `gate_closed_exception` and `rpc::closed_error`
in `send_to_live_endpoints` should be ignored. However, if these exceptions are wrapped
in a `nested_exception`, an error message is printed, causing tests to fail.

This commit adds handling for nested exceptions in this case to prevent unnecessary
error messages.

Fixes scylladb/scylladb#23325
Fixes scylladb/scylladb#23305
Fixes scylladb/scylladb#21815

Backport: looks like this is quite a frequent issue, therefore backport to 2025.1.

Closes scylladb/scylladb#23336

* github.com:scylladb/scylladb:
  database: Pass schema_ptr as const ref in `wrap_commitlog_add_error`
  database: Unify exception handling in `do_apply` and `apply_with_commitlog`
  storage_proxy: Ignore wrapped `gate_closed_exception` and `rpc::closed_error` when node shuts down.
  exceptions: Add `try_catch_nested` to universally handle nested exceptions of the same type.
2025-03-27 11:39:42 +01:00
Pavel Emelyanov
9f036d957a Merge 'test/clqpy/test_tool.py: get_sstables_for_table(): exclude non-sealed sstables' from Botond Dénes
Filter out sstables which don't have a TOC or have a temporary TOC. Such sstables are incomplete and can dissapear if the compaction which writes them is interrupted.

Fixes: #23203

This PR fixes a flaky test which is only on master, no backports required.

Closes scylladb/scylladb#23450

* github.com:scylladb/scylladb:
  test/cqlpy/test_tools.py: test_scylla_sstable_query: reduce scope of no-compaction context
  test/clqpy/test_tool.py: get_sstables_for_table(): exclude non-sealed sstables
2025-03-27 09:45:07 +03:00
Tomasz Grabiec
8e506c5a8f test: tablets: Fix flakiness due to ungraceful shutdown
The test fails sporadically with:

cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}

That's becase a server is stopped in the middle of the workload.

The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.

Fixes #20492

Closes scylladb/scylladb#23451
2025-03-27 09:44:07 +03:00
Lakshmi Narayanan Sreethar
dccce670c1 topology_coordinator: fix indentation in generate_migration_updates
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-03-27 10:16:34 +05:30
Lakshmi Narayanan Sreethar
5b47d84399 topology_coordinator: do not schedule migrations when there are pending resize finalizations
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

To fix this, do not schedule tablet migrations on any tables when there
are pending resize finalizations. This ensures that migrations from the
same table and other unrelated tables do not block resize finalization.

Also added a testcase to verify the fix.

Fixes #21762

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-03-27 10:16:34 +05:30
Lakshmi Narayanan Sreethar
8cabc66f07 load_balancer: make repair plans only when there is no pending resize finalization
Do not make repair plans if any table has pending resize finalization.
This is to ensure that the finalization doesn't get delayed by reapir
tasks.

Refs #21762

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-03-27 10:16:34 +05:30
Avi Kivity
b292b5800b Merge 'test.py: move starting LDAP service to dedicate method' from Andrei Chekun
Move starting LDAP to the method where the rest of the services are started. This will unify the way of starting the 3rd party services.
Fix LDAP tests flakiness due not possible to connect to LDAP server.
Add catching stdout and stderr of toxiproxy-cli in case of errors

Related: https://github.com/scylladb/scylladb/pull/23333

This PR is based on https://github.com/scylladb/scylladb/pull/23221, so #23221 should be merged first.

Closes scylladb/scylladb#23235

* github.com:scylladb/scylladb:
  test.py: Refactor nodetool/conftest
  test.py: Refactor test/pylib/cpp/ldap
  test.py: move starting LDAP service to dedicate method
2025-03-26 15:31:00 +02:00
Botond Dénes
801339bad9 test/cqlpy/test_tools.py: test_scylla_sstable_query: reduce scope of no-compaction context
To just system.local, the table these tests operate on. No need to
disable autocompaction for all of the system keyspace.
2025-03-26 09:19:38 -04:00
Botond Dénes
3ec863c4ce test/clqpy/test_tool.py: get_sstables_for_table(): exclude non-sealed sstables
Filter out sstables which don't have a TOC or have a temporary TOC. Such
sstables are incomplete and can dissapear if the compaction which writes
them is interrupted.
2025-03-26 09:18:34 -04:00
Pavel Emelyanov
1da889f239 Merge 'Allow abort during join_cluster' from Benny Halevy
Bootstrap or replace can take a long time, but
since feef7d3fa1,
the stop_signal is checked only in checkpoints,
and in particular, abort isn't requested during
join_cluster.

Fixes #23222

* requires backport on top of https://github.com/scylladb/scylladb/pull/23184

Closes scylladb/scylladb#23306

* github.com:scylladb/scylladb:
  main: allow abort during join_cluster
  main: add checkpoint before joining cluster
  storage_service: add start_sys_dist_ks
2025-03-26 15:48:58 +03:00