scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Nikos Dragazis	2f93ab281b	api: Add REST endpoint for upgrading nodes to tablets The endpoint is the following: POST /storage_service/vnode_tablet_migrations/node/storage_mode?intended_mode={tablets,vnodes} This endpoint is part of the vnodes-to-tablets migration process and controls a node's intended_storage_mode in system.topology. The storage mode represents the node-local data distribution model, i.e., how data are organized across shards. The node will apply the intended storage mode to migrating tables upon next restart by resharding their SSTables (either on vnode boundaries if intended_mode=tablets, or with the static sharder if intended_mode=vnodes). Note that this endpoint controls the intended_storage_mode of the local node only. This has the nice benefit that once the API call returns, the change has not only been committed to group0 but also applied to the local node's state machine. This guarantees that the change is part of the node's local copy upon next restart; no additional read barrier is needed. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:35 +02:00
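As a rough illustration of the endpoint described in the commit above, the sketch below only builds the request URL. The base URL (`http://127.0.0.1:10000`) and the helper name `storage_mode_url` are assumptions made for this example; only the path and the `intended_mode` values come from the commit message.

```python
from urllib.parse import urlencode

# Path and allowed values as quoted in the commit message.
STORAGE_MODE_PATH = "/storage_service/vnode_tablet_migrations/node/storage_mode"
VALID_MODES = ("tablets", "vnodes")


def storage_mode_url(base_url: str, intended_mode: str) -> str:
    """Build the URL for the POST request (hypothetical helper)."""
    if intended_mode not in VALID_MODES:
        raise ValueError(f"intended_mode must be one of {VALID_MODES}")
    query = urlencode({"intended_mode": intended_mode})
    return f"{base_url.rstrip('/')}{STORAGE_MODE_PATH}?{query}"


# The base URL is an assumption; point it at the local node's REST API.
print(storage_mode_url("http://127.0.0.1:10000", "tablets"))
```

The resulting URL could then be sent with `urllib.request.Request(url, method="POST")`. Per the commit message, the call affects the local node only, so it would need to be repeated against each node.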
Nikos Dragazis	c4c3a95863	api: Add REST endpoint for starting vnodes-to-tablets migration The endpoint is the following: POST /storage_service/vnode_tablet_migrations/keyspaces/{keyspace} Its purpose is to start the migration of a whole keyspace from vnodes to tablets. When called, Scylla will synchronously create a tablet map for each table in the specified keyspace. The tablet maps of all tables are identical and they mirror the vnode layout; they contain one tablet per vnode and each tablet uses the same replica hosts and token boundaries as the corresponding vnode. The only difference from vnodes lies in the sharding approach. Tablets are assigned to a single shard - using a round-robin strategy in this patch - whereas vnodes are distributed evenly across all shards. If the tablet count per shard is low and tablet sizes are uneven, or some shards have more tablets than others, performance may degrade during the migration process. For example, a cluster with i8g.48xlarge (192 vCPUs), 256 vnodes per node and RF=3 will have 256 * 3 / 192 vCPUs = 4 tablet replicas per shard during the migration. One additional tablet or a double-sized tablet would cause 25% overcommit. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:19:47 +02:00
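The arithmetic in the commit above can be checked with a small sketch. It reuses the figures from the message (256 vnodes per node, RF=3, 192 shards) and models round-robin placement as `index % shard_count`, which is an assumption about the strategy's shape and not a description of the actual patch.

```python
from collections import Counter


def round_robin_shards(replica_count: int, shard_count: int) -> list[int]:
    """Assign replica i to shard i % shard_count (simplified model)."""
    return [i % shard_count for i in range(replica_count)]


vnodes_per_node = 256
replication_factor = 3
shards_per_node = 192  # i8g.48xlarge vCPUs, per the commit message

# Tablet replicas hosted by one node during the migration.
replicas_on_node = vnodes_per_node * replication_factor

per_shard = Counter(round_robin_shards(replicas_on_node, shards_per_node))
baseline = replicas_on_node // shards_per_node

# Relative extra load on a shard that receives one additional tablet.
overcommit = (baseline + 1) / baseline - 1

print(baseline, set(per_shard.values()), f"{overcommit:.0%}")
```

With only 4 replicas per shard, a single extra tablet (or one double-sized tablet) shifts the load on that shard by a quarter, which is the 25% overcommit the message warns about.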
Nikos Dragazis	b7f4ae8218	topology_state_machine: Add intended_storage_mode to system.topology Part of the vnodes-to-tablets migration is to reshard the SSTables of each node on vnode boundaries. Resharding is a heavy operation that runs on startup while the node is offline. Since nodes can restart for unexpected reasons, we need a flag to do it in a controllable way. We also need the ability to roll back the migration, which requires resharding in the opposite direction. This means a node must be aware of the intended migration direction. To address both requirements, this patch introduces a new column, intended_storage_mode, in system.topology. A non-null value indicates that a node should perform a migration and specifies the migration direction. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	bc8109f1a4	distributed_loader: Wire vnode-based resharding into table populator Make the table populator migration-aware. If a table is migrating to tablets, switch from normal resharding to vnode-based resharding. Vnode-based resharding requires passing a vector of "owned ranges" upon which resharding will segregate the SSTables. Compute it from the tablet map. We could also compute them from the vnodes, since tablets are identical to vnodes during the migration, but in the future we may switch to a different model (multiple tablets per vnode). Let the distributed loader decide if a table is migrating or not and communicate that to the table populator. A table is migrating if the keyspace replication strategy uses vnodes but the table replication strategy uses tablets. Currently, tables cannot enter this "migrating" state; support for this will be introduced in the next patches. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	63399951df	replica: Pick any compaction group for resharding In the previous patch, reshard compaction was extended with a special operation mode where SSTables from vnode-based tables are segregated on vnode boundaries and not with the static sharder. This will later be wired into vnodes-to-tablets migration. The problem is that resharding requires a compaction group. With a vnode-based table, there is only one compaction group per shard, and this is what the current code utilizes (`try_get_compaction_group_view_with_static_sharding()`). But the new operation mode will apply to migrating tables, which use a `tablet_storage_group_manager`, which creates one compaction group for each tablet. Some compaction group needs to be selected. Pick any compaction group that is available on the current shard. Reshard compaction is an operation that happens early in the startup process; compaction groups do not own any SSTables yet, so all compaction groups are equivalent. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Benny Halevy	d1c6141407	compaction: resharding_compaction: add vnodes_resharding option In this mode, the output sstables generated by resharding compaction are segregated by token range, based on the keyspace vnode-based owned token ranges vector. A basic unit test was also added to sstable_directory_test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	d153a95943	storage_service: Preserve ERM flavor of migrating tables When a table is migrating from vnodes to tablets, the cluster is in a mixed state where some nodes use vnode ERMs and others use tablet ERMs. The ERM flavor is a node-local property that expresses the node's storage organization. Preserve the flavor across token metadata changes. The flavor needs to be on par with storage, but the storage can change only on startup, as it requires resharding all SSTables to conform with the flavor. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	4a3e26d5e3	tablet_allocator: Exclude migrating tables from load balancing The tablet load balancer operates on all tablet-based tables that appear in the tablet metadata. With the introduction of the vnodes-to-tablets migration procedure later in this series, migrating tables will also appear in the tablet metadata, but they need to be treated as vnode tables until migration is finished. This patch excludes such tables from load balancing. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	3e2dc078c9	feature_service: Add vnodes_to_tablets_migrations feature Vnodes-to-tablets migrations require cluster-level support: the REST API and the group0 state need to be supported by all nodes. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Yaniv Kaul	e59a21752d	.github/workflows/trigger_jenkins.yaml: add workflow permissions Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/147. To fix the problem, add an explicit `permissions:` block to the workflow (either at the top level or inside the `trigger-jenkins` job) that constrains the `GITHUB_TOKEN` to the minimal necessary privileges. This codifies least-privilege in the workflow itself instead of relying on repository or organization defaults. The best minimal, non‑breaking change is to define a root‑level `permissions:` block with read‑only contents access because the job does not perform any write operations to the repository, nor does it interact with issues, pull requests, or other GitHub resources. A conservative, widely accepted baseline is `contents: read`. If later steps require more permissions, they can be added explicitly, but for this snippet, no such need is visible. Concretely, in `.github/workflows/trigger_jenkins.yaml`, insert: ```yaml permissions: contents: read ``` between the `name:` block and the `on:` block (e.g., after line 2). No additional methods, imports, or definitions are needed since this is a pure YAML configuration change and does not alter runtime behavior of the existing shell steps. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27815	2026-03-24 08:40:30 +02:00
Yaniv Kaul	85a531819b	.github/workflows/trigger-scylla-ci.yaml: add permissions to workflow Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/169. In general, the fix is to add an explicit `permissions:` block to the workflow (at the root level or per job) so that the `GITHUB_TOKEN` has only the minimal scopes needed. Since this job only reads event data and uses secrets to talk to Jenkins, we can restrict `GITHUB_TOKEN` to read‑only repository contents. The single best fix here is to add a top‑level `permissions:` block right under the `name:` (and before `on:`) in `.github/workflows/trigger-scylla-ci.yaml`, setting `contents: read`. This applies to all jobs in the workflow, including `trigger-jenkins`, and does not alter any existing steps or logic. No additional imports or methods are needed, as this is purely a YAML configuration change for GitHub Actions. Concretely, edit `.github/workflows/trigger-scylla-ci.yaml` to insert: ```yaml permissions: contents: read ``` after line 1. No other lines in the file need to change. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27812	2026-03-24 08:37:49 +02:00
Botond Dénes	772b32d9f7	test/scylla_gdb: fix flakiness by preparing objects at test time Fixtures previously ran GDB once (module scope) to find live objects (sstables, tasks, schemas) and stored their addresses. Tests then reused those addresses in separate GDB invocations. Sometimes these addresses would become stale and the test would step on use-after-free (e.g. sstables compacted away between invocations). Fix by dropping the fixtures. The helper functions used by the fixtures to obtain the required objects are converted to gdb convenience functions, which can be used in the same expression as the test command invocation. Thus, the object is acquired on-demand at the moment it is used, so it is guaranteed to be fresh and relevant. Fixes: SCYLLADB-1020 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#28999	2026-03-23 16:54:03 +02:00
Piotr Dulikowski	60fb5270a9	logstor: fix fmt::format use with std::filesystem::path The version of fmt installed on my machine refuses to work with `std::filesystem::path` directly. Add `.string()` calls in places that attempt to print paths directly in order to make them work. Closes scylladb/scylladb#29148	2026-03-23 15:15:52 +01:00
Pavel Emelyanov	3b9398dfc8	Merge 'encryption: fix deadlock in encrypted_data_source::get()' from Ernest Zaslavsky When encrypted_data_source::get() caches a trailing block in _next, the next call takes it directly — bypassing input_stream::read(), which checks _eof. It then calls input_stream::read_exactly() on the already-drained stream. Unlike read(), read_up_to(), and consume(), read_exactly() does not check _eof when the buffer is empty, so it calls _fd.get() on a source that already returned EOS. In production this manifested as stuck encrypted SSTable component downloads during tablet restore: the underlying chunked_download_source hung forever on the post-EOS get(), causing 4 tablets to never complete. The stuck files were always block-aligned sizes (8k, 12k) where _next gets populated and the source is fully consumed in the same call. Fix by checking _input.eof() before calling read_exactly(). When the stream already reached EOF, buf2 is known to be empty, so the call is skipped entirely. A comprehensive test is added that uses a strict_memory_source which fails on post-EOS get(), reproducing the exact code path that caused the production deadlock. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1128 Backport to 2025.3/4 and 2026.1 is needed since it fixes a bug that may bite us in production, to be on the safe side Closes scylladb/scylladb#29110 * github.com:scylladb/scylladb: encryption: fix deadlock in encrypted_data_source::get() test_lib: mark `limiting_data_source_impl` as not `final` Fix formatting after previous patch Fix indentation after previous patch test_lib: make limiting_data_source_impl available to tests	2026-03-23 17:12:44 +03:00
Piotr Dulikowski	df68d0c0f7	directories: add missing seastar/util/closeable.hh include Without this include the file would not compile on its own. The issue was most likely masked by the use of precompiled headers in our CI. Closes scylladb/scylladb#29170	2026-03-23 15:46:56 +03:00
Yaniv Michael Kaul	051107f5bc	scylla-gdb: fix sstable-summary crash on ms-format sstables The 'scylla sstable-summary' GDB command crashes with 'ValueError: Argument "count" should be greater than zero' when inspecting ms-format (trie-based) sstables. This happens because ms-format sstables don't populate the traditional summary structure, leaving all fields zeroed out, which causes gdb.read_memory() to be called with a zero count. Fix by: - Adding zero-length guards to sstring.to_hex() and sstring.as_bytes() to return early when the data length is zero, consistent with the existing guard in managed_bytes.get(). - Adding the same guard to scylla_sstable_summary.to_hex(). - Detecting ms-format sstables (version == 5) early in scylla_sstable_summary.invoke() and printing an informative message instead of attempting to read the unpopulated summary. Fixes: SCYLLADB-1180 Closes scylladb/scylladb#29162	2026-03-23 12:44:47 +02:00
Piotr Szymaniak	c8e7e20c5c	test/cluster: retry create_table on transient schema agreement timeout In test_index_requires_rf_rack_valid_keyspace, the create_table call for a plain tablet-based table can fail with 'Unable to reach schema agreement' after the server's 10s timeout is exceeded. This happens when schema gossip propagation across the 4-node cluster takes longer than expected after a sequence of rapid schema changes earlier in the test. Add a retry (up to 2 attempts) on schema agreement errors for this specific create_table call rather than increasing the server-side timeout. Fixes: SCYLLADB-1135 Closes scylladb/scylladb#29132	2026-03-23 10:45:30 +02:00
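A minimal sketch of the retry pattern described in the commit above. The helper name `retry_on_schema_agreement`, the synchronous style, and the `delay` parameter are assumptions for illustration; the real test is likely asynchronous and may match the error differently. Only the error text and the limit of 2 attempts come from the commit message.

```python
import time

# Error text quoted in the commit message.
SCHEMA_AGREEMENT_MSG = "Unable to reach schema agreement"


def retry_on_schema_agreement(operation, attempts: int = 2, delay: float = 0.0):
    """Run operation(); retry only when it fails on schema agreement."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            transient = SCHEMA_AGREEMENT_MSG in str(exc)
            # Unrelated errors and the final failed attempt propagate unchanged.
            if not transient or attempt == attempts:
                raise
            time.sleep(delay)


# Usage with a hypothetical operation that fails once, then succeeds.
calls = []

def flaky_create_table():
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError(SCHEMA_AGREEMENT_MSG)
    return "created"

print(retry_on_schema_agreement(flaky_create_table))
```

Scoping the retry to one call and one error message keeps unrelated failures visible, which matches the commit's stated preference over raising the server-side timeout.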
Yaniv Kaul	fb1f995d6b	.github/workflows/backport-pr-fixes-validation.yaml: workflow does not contain permissions (Potential fix for code scanning alert no. 139) Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/139, To fix the problem, explicitly restrict the `GITHUB_TOKEN` permissions for this workflow/job so it has only what is needed. The script reads PR data and repository info (which is covered by `contents: read`/default read scopes) and posts a comment via `github.rest.issues.createComment`, which requires `issues: write`. No other write scopes (e.g., `contents: write`, `pull-requests: write`) are necessary. The best fix without changing functionality is to add a `permissions` block scoped to this job (or at the workflow root). Since we only see a single job here, we’ll add it under `check-fixes-prefix`. Concretely, in `.github/workflows/backport-pr-fixes-validation.yaml`, between the `runs-on: ubuntu-latest` line (line 10) and `steps:` (line 11), add: ```yaml permissions: contents: read issues: write ``` This keeps the token minimally privileged while still allowing the script to create issue/PR comments. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27810	2026-03-23 10:30:01 +02:00
Piotr Smaron	32225797cd	dtest: fix flaky test_writes_schema_recreated_while_node_down `read_barrier(session2)` was supposed to ensure `node2` has caught up on schema before a CL=ALL write. But `patient_cql_connection(node2)` creates a cluster-aware driver session `(TokenAwarePolicy(DCAwareRoundRobinPolicy()))` that can route the barrier CQL statement to any node — not necessarily `node2`. If the barrier runs on `node1` or `node3` (which already have the new schema), it's a no-op, and `node2` remains stale, thus the observed `WriteFailure`. The fix is to switch to `patient_exclusive_cql_connection(node2)`, which uses `WhiteListRoundRobinPolicy([node2_ip])` to pin all CQL to `node2`. This is already the established pattern used by other tests in the same file. Fixes: SCYLLADB-1139 No need to backport yet, appeared only on master. Closes scylladb/scylladb#29151	2026-03-23 10:25:54 +02:00
Michał Chojnowski	f29525f3a6	test/boost/cache_algorithm_test: disable sstable compression to avoid giant index pages The test intentionally creates huge index pages. But since `5e7fb08bf3`, the index reader allocates a block of memory for a whole index page, instead of incrementally allocating small pieces during index parsing. This giant allocation causes the test to fail spuriously in CI sometimes. Fix this by disabling sstable compression on the test table, which puts a hard cap of 2000 keys per index page. Fixes: SCYLLADB-1152 Closes scylladb/scylladb#29152	2026-03-23 09:57:11 +02:00
Raphael S. Carvalho	05b11a3b82	sstables_loader: use new sstable add path Use add_new_sstable_and_update_cache() when attaching SSTables downloaded by the node-scoped local loader. This is the correct variant for new SSTables: it can unlink the SSTable on failure to add it, and it can split the SSTable if a tablet split is in progress. The older add_sstable_and_update_cache() helper is intended for preexisting SSTables that are already stable on disk. Additionally, downloaded SSTables are now left unsealed (TemporaryTOC) until they are successfully added to the table's SSTable set. The download path (download_fully_contained_sstables) passes leave_unsealed=true to create_stream_sink, and attach_sstable opens the SSTable with unsealed_sstable=true and seals it only inside the on_add callback — matching the pattern used by stream_blob.cc and storage_service.cc for tablet streaming. This prevents a data-resurrection hazard: previously, if the process crashed between download and attach_sstable, or if attach_sstable failed mid-loop, sealed (TOC) SSTables would remain in the table directory and be reloaded by distributed_loader on restart. With TemporaryTOC, sstable_directory automatically cleans them up on restart instead. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1085. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#29072	2026-03-23 10:33:04 +03:00
Piotr Szymaniak	f511264831	alternator/test: fix test_ttl_with_load_and_decommission flaky Connection refused error The native Scylla nodetool reports ECONNREFUSED as 'Connection refused', not as 'ConnectException' (which is the Java nodetool format). Add 'Connection refused' to the valid_errors list so that transient connection failures during concurrent decommission/bootstrap topology changes are properly tolerated. Fixes SCYLLADB-1167 Closes scylladb/scylladb#29156	2026-03-22 11:01:45 +02:00
Piotr Dulikowski	cc695bc3f7	Merge 'vector_search: fix race condition on connection timeout' from Karol Nowacki When a `with_connect` operation timed out, the underlying connection attempt continued to run in the reactor. This could lead to a crash if the connection was established/rejected after the client object had already been destroyed. This issue was observed during the teardown phase of an upcoming high-availability test case. This commit fixes the race condition by ensuring the connection attempt is properly canceled on timeout. Additionally, the explicit TLS handshake previously forced during the connection is now deferred to the first I/O operation, which is the default and preferred behavior. Fixes: SCYLLADB-832 Backports to 2026.1 and 2025.4 are required, as this issue also exists on those branches and is causing CI flakiness. Closes scylladb/scylladb#29031 * github.com:scylladb/scylladb: vector_search: test: fix flaky test vector_search: fix race condition on connection timeout	2026-03-20 11:12:04 +01:00
Petr Gusev	4bfcd035ae	test_fencing: add missing await-s Fixes SCYLLADB-1099 Closes scylladb/scylladb#29133	2026-03-20 10:55:35 +01:00
Pavel Emelyanov	c4a0f6f2e6	object_store: Don't leave dangling objects by iterating moved-from names vector The code in upload_file std::move()-s vector of names into merge_objects() method, then iterates over this vector to delete objects. The iteration is apparently a no-op on moved-from vector. The fix is to make merge_objects() helper get vector of names by const reference -- the method doesn't modify the names collection, the caller keeps one in stable storage. Fixes #29060 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29061	2026-03-20 10:09:30 +02:00
Pavel Emelyanov	712ba5a31f	utils: Use yielding directory_lister in owner verification Switch directories::do_verify_owner_and_mode() from lister::scan_dir() to utils::directory_lister while preserving the previous hidden-entry behavior. Make do_verify_subpath use lister::filter_type directly so the verification helper can pass it straight into directory_lister, and keep a single yielding iteration loop for directory traversal. One fewer scan_dir user, towards removing scan_dir from the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29064	2026-03-20 10:08:38 +02:00
Pavel Emelyanov	961fc9e041	s3: Don't rearm credential timers when credentials are not refreshed The update_credentials_and_rearm() may get "empty" credentials from _creds_provider_chain.get_aws_credentials() -- it doesn't throw, but returns default-initialized value. In that case the expires_at will be set to time_point::min, and it's probably not a good idea to arm the refresh timer and, even worse idea, to subtract 1h from it. Fixes #29056 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29057	2026-03-20 10:07:01 +02:00
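The guard described in the commit above can be sketched as follows. This is a Python analogy, not the C++ code: the `Credentials` class, its fields, and `next_refresh_time` are hypothetical names, with `datetime.min` standing in for `time_point::min`. The one-hour margin is taken from the commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Credentials:
    """Hypothetical stand-in; default-constructed means 'not refreshed'."""
    access_key: str = ""
    expires_at: datetime = datetime.min


def next_refresh_time(creds: Credentials,
                      margin: timedelta = timedelta(hours=1)) -> Optional[datetime]:
    """Return when to refresh, or None if the timer must not be armed."""
    if not creds.access_key or creds.expires_at == datetime.min:
        # Empty credentials: subtracting the margin from the minimum
        # timestamp would be meaningless, so skip rearming entirely.
        return None
    return creds.expires_at - margin


print(next_refresh_time(Credentials()))
print(next_refresh_time(Credentials("key", datetime(2026, 3, 20, 12, 0))))
```

In Python, `datetime.min - timedelta(hours=1)` raises `OverflowError`, so the unguarded subtraction fails loudly; the commit message's concern is that the C++ path armed the timer anyway.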
Pavel Emelyanov	0a8dc4532b	s3: Fix missing upload ID in copy_part trace log The format string had two {} placeholders but three arguments, the _upload_id one is skipped from formatting Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29053	2026-03-20 10:05:44 +02:00
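Python's `str.format` behaves analogously to the situation in the commit above: surplus positional arguments are silently ignored. The sketch below uses a made-up log message and a hypothetical `placeholders_match` check; it illustrates the class of bug and is not the actual trace statement.

```python
def placeholders_match(fmt: str, *args) -> bool:
    """Check that the number of '{}' placeholders equals the argument count."""
    return fmt.count("{}") == len(args)


# Hypothetical message: two placeholders, three arguments.
buggy_fmt = "copy_part: part {} of object {}"
fixed_fmt = "copy_part: part {} of object {}, upload id {}"
args = (7, "bucket/key", "upload-123")

buggy = buggy_fmt.format(*args)   # the third argument is dropped silently
fixed = fixed_fmt.format(*args)

print(buggy)
print(fixed)
print(placeholders_match(buggy_fmt, *args), placeholders_match(fixed_fmt, *args))
```

Because nothing fails at runtime, a mismatch like this tends to surface only when someone needs the missing value while reading logs.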
Botond Dénes	bb5c328a16	Merge 'Squash two primary-replica restoration tests together' from Pavel Emelyanov The test_restore_primary_replica_same_domain and test_restore_primary_replica_different_domain tests have a lot in common. Previously both tests were also split each into two, so we had four tests; now we have two that can also be squashed, and the lines-of-code savings are still worth it. This is the continuation of #28569 Tests improvement, not backporting Closes scylladb/scylladb#28994 * github.com:scylladb/scylladb: test: Replace a bunch of ternary operators with an if-else block test: Squash test_restore_primary_replica_same\|different_domain tests test: Use the same regexp in test_restore_primary_replica_different\|same_domain-s	2026-03-20 10:05:16 +02:00
Pavel Emelyanov	ea2a214959	test/backup: Use unique_name() for backup prefix instead of cf_dir The do_test_backup_abort() fetched the node's workdir and resolved cf_dir solely to construct a unique-ish backup prefix: prefix = f'{cf_dir}/backup' The comment already acknowledged this was only "unique(ish)" — relying on the UUID-derived cf_dir name as a uniqueness source is roundabout. unique_name() is already imported and used for exactly this purpose elsewhere in the file. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29030	2026-03-20 10:04:22 +02:00
Pavel Emelyanov	65032877d4	api: Move /storage_service/toppartitions from storage_service.cc to column_family.cc The endpoint URL remains intact. Having it next to another toppartitions endpoint (the /column_family/toppartitions one) is natural. This endpoint only needs sharded<replica::database>&, grabs it from http_context and doesn't use any other service. In column_family.cc the database reference is already available as a parameter. One more user of http_context.db is gone. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#28996	2026-03-20 09:52:33 +02:00
Botond Dénes	de0bdf1a65	Merge 'Decouple test_refresh_deletes_uploaded_sstables from backup test-suite' from Pavel Emelyanov The test in question uses several helpers from the backup suite, but it doesn't really need them -- the operations it wants to perform can be performed with standard pylib methods. "While at it" also collect some dangling effectively unused local variables from this test (these were apparently left from backup tests this one was copied-and-reworked from) Enhancing tests, not backporting Closes scylladb/scylladb#29130 * github.com:scylladb/scylladb: test/refresh: Simplify refresh invocation test/refresh: Remove r_servers alias for servers test/refresh: Replace check_mutation_replicas with a plain CQL SELECT test/refresh: Inline keyspace/table/data setup in test_refresh_deletes_uploaded_sstables test/refresh: Prepare indentation for new_test_keyspace in test_refresh_deletes_uploaded_sstables test/refresh: Decouple test_refresh_deletes_uploaded_sstables from backup tests test/refresh: Remove unused wait_for_cql_and_get_hosts import	2026-03-20 09:29:15 +02:00
Botond Dénes	97430e2df5	Merge 'Fix object storage lister entries walking loop' from Pavel Emelyanov Two issues found in the lister returned by gs_client_wrapper::make_object_lister() Lister can report EOF too early in case filter is active, another one is potential vector out-of-bounds access Fixes #29058 The code appeared in 2026.1, worth fixing it there as well Closes scylladb/scylladb#29059 * github.com:scylladb/scylladb: sstables: Fix object storage lister not resetting position in batch vector sstables: Fix object storage lister skipping entries when filter is active	2026-03-20 09:12:42 +02:00
Botond Dénes	5573c3b18e	Merge 'tablets: Fix deadlock in background storage group merge fiber' from Tomasz Grabiec When it deadlocks, groups stop merging and compaction group merge backlog will run-away. Also, graceful shutdown will be blocked on it. Found by flaky unit test test_merge_chooses_best_replica_with_odd_count, which timed-out in 1 in 100 runs. Reason for deadlock: When storage groups are merged, the main compaction group of the new storage group takes a compaction lock, which is appended to _compaction_reenablers_for_merging, and released when the merge completion fiber is done with the whole batch. If we accumulate more than 1 merge cycle for the fiber, deadlock occurs. Lock order will be this Initial state: cg0: main cg1: main cg2: main cg3: main After 1st merge: cg0': main [locked], merging_groups=[cg0.main, cg1.main] cg1': main [locked], merging_groups=[cg2.main, cg3.main] After 2nd merge: cg0'': main [locked], merging_groups=[cg0'.main [locked], cg0.main, cg1.main, cg1'.main [locked], cg2.main, cg3.main] merge completion fiber will try to stop cg0'.main, which will be blocked on compaction lock. which is held by the reenabler in _compaction_reenablers_for_merging, hence deadlock. The fix is to wait for background merge to finish before we start the next merge. It's achieved by holding old erm in the background merge, and doing a topology barrier from the merge finalizing transition. Background merge is supposed to be a relatively quick operation, it's stopping compaction groups. So may wait for active requests. It shouldn't prolong the barrier indefinitely. Tablet tests which trigger merge need to be adjusted to call the barrier, otherwise they will be vulnerable to the deadlock. Fixes SCYLLADB-928 Backport to >= 2025.4 because it's the earliest vulnerable due to `f9021777d8`. 
Closes scylladb/scylladb#29007 * github.com:scylladb/scylladb: tablets: Fix deadlock in background storage group merge fiber replica: table: Propagate old erm to storage group merge test: boost: tablets_test: Save tablet metadata when ACKing split resize decision storage_service: Extract local_topology_barrier()	2026-03-20 09:05:52 +02:00
Botond Dénes	34473302b0	Merge 'docs: document existing guardrails' from Andrzej Jackowski This patch series introduces new documentation for existing guardrails. Moreover: - Warning / failure messages of recently added write CL guardrails (SCYLLADB-259) are rephrased, so all guardrails have similar messages. - Some new tests are added, to help verify the correctness of the documentation and avoid situations where the documentation and implementation diverge. Fixes: [SCYLLADB-257](https://scylladb.atlassian.net/browse/SCYLLADB-257) No backport, just new docs and tests. [SCYLLADB-257]: https://scylladb.atlassian.net/browse/SCYLLADB-257?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29011 * github.com:scylladb/scylladb: test: add new guardrail tests matching documentation scenarios test: add metric assertions to guardrail replication strategy tests test: use regex matching in guardrail replication strategy tests test: extract ks_opts helper in test_guardrail_replication_strategy docs: document CQL guardrails cql: improve write consistency level guardrail messages	2026-03-20 08:56:00 +02:00
artem.penner	9898e5700b	scylla-node-exporter: Add systemd collector to node exporter This PR enables the node_exporter systemd collector and configures the unit whitelist to include scylla-server.service and systemd-coredump services. Motivation: We currently lack visibility into system-level service states, which is critical for diagnosing stability issues. This configuration enables two specific use cases: - Detecting Coredump Loops: We encounter scenarios where ScyllaDB enters a restart loop. To pinpoint SIGSEGV (coredumps) as the root cause, we need to track when the systemd-coredump service becomes active, indicating a dump is being processed. - Identifying Startup Failures: We need to detect when the scylla-server unit enters a failed state. This is essential for catching unrecoverable errors (e.g., corrupted commitlogs or configuration bugs) that prevent the server from starting. example of promql queries: - `node_systemd_unit_state{name=~"systemd-coredump@.*", state="active"} == 1` - `node_systemd_unit_state{name="scylla-server.service", state="failed"} == 1` Closes #28402	2026-03-20 08:39:56 +02:00
Andrzej Jackowski	10c4b9b5b0	test: verify signal() detects resource negative leak in rcs reader_concurrency_semaphore::signal() guards against available resources exceeding the initial limit after a signal, which would indicate a bug such as double-returning resources. It reports the issue via on_internal_error_noexcept and clamps resources back to the initial values. However, before this commit there were no tests that verified this behavior, so bugs like SCYLLADB-1014 went undetected. Add a test that artificially signals resources that were never consumed and verifies that signal() detects the negative leak and clamps available resources back to the initial limit. Refs: SCYLLADB-1014 Fixes: SCYLLADB-1031 Closes scylladb/scylladb#28993	2026-03-20 09:21:20 +03:00
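The behavior exercised by the test in the commit above can be modeled with a toy semaphore. Everything here is a simplified assumption: the class `ResourceSemaphore`, the single integer resource, and the `errors` list (standing in for `on_internal_error_noexcept`) are illustrative and not the real `reader_concurrency_semaphore`.

```python
class ResourceSemaphore:
    """Toy model: tracks one resource count against an initial limit."""

    def __init__(self, initial: int):
        self.initial = initial
        self.available = initial
        self.errors: list[str] = []  # stand-in for internal error reporting

    def consume(self, amount: int) -> None:
        self.available -= amount

    def signal(self, amount: int) -> None:
        """Return resources; detect and clamp a 'negative leak'."""
        self.available += amount
        if self.available > self.initial:
            self.errors.append(
                f"available {self.available} exceeds initial {self.initial}"
            )
            self.available = self.initial


# Signal resources that were never consumed, as the new test does.
sem = ResourceSemaphore(initial=10)
sem.signal(3)
print(sem.available, len(sem.errors))
```

A balanced `consume` followed by `signal` leaves `errors` empty, so the check only fires on bugs such as returning the same resources twice.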
Botond Dénes	f9adbc7548	test/cqlpy/test_tombstone_limit.py: disable tombstone-gc for test table Since `7564a56dc8`, all tables default to repair-mode tombstone-gc, which is identical to immediate-mode for RF=1 tables. Consequently the tombstones written by the tests in this test file are immediately collectible and with some unlucky timing, some of them can be collected before the end of the test, failing the empty-page prefix check because the empty pages prefix will be smaller than expected based on the number of tombstones written. Disable tombstone-gc to remove this source of flakiness. Fixes: SCYLLADB-1062 Closes scylladb/scylladb#29077	2026-03-20 09:14:29 +03:00
Michał Chojnowski	6b18d95dec	test: add a missing reconnect_driver in test_sstable_compression_dictionaries_upgrade.py Need to work around https://github.com/scylladb/python-driver/issues/295, lest a CQL query fail spuriously after the cluster restart. Fixes: SCYLLADB-1114 Closes scylladb/scylladb#29118	2026-03-20 09:05:14 +03:00
Botond Dénes	89388510a0	test/cluster/test_data_resurrection_in_memtable.py: use explicit CL The test has expectations w.r.t. which write makes it to which nodes: * inserts make it to all nodes * delete makes it to all-1 (QUORUM) nodes However, this was not expressed with CL, and the default CL=ONE allowed some nodes to miss the writes, thus violating the test's expectations on what data is present on which nodes. This resulted in the test being flaky and failing on the data checks. Use explicit CL for the ingestion to prevent this. The improvements to the test introduced in `a8dd13731f` were of great help in investigating this: traces are now available and the check happens after the data was dumped to logs. Fixes: SCYLLADB-870 Fixes: SCYLLADB-812 Fixes: SCYLLADB-1102 Closes scylladb/scylladb#29128	2026-03-20 09:02:57 +03:00
Avi Kivity	6b259babeb	Merge 'logstor: initial log-structured storage for key-value tables' from Michael Litvak Introduce an initial and experimental implementation of an alternative log-structured storage engine for key-value tables. Main flows and components: * The storage is composed of 32MB files, each file divided into segments of size 128k. We sequentially write records to them, each containing a mutation and additional metadata. Records are written to a buffer first and then written to the active segment sequentially in 4k-sized blocks. * The primary index in memory maps keys to their location on disk. It is a per-table B-tree ordered by tokens, similar to a memtable. * On reads we calculate the key and look it up in the primary index, then read the mutation from disk with a single disk IO. * On writes we write the record to a buffer, wait for it to be written to disk, then update the index with the new location, and free the previous record. * We track the used space in each segment. When overwriting a record, we increase the free space counter for the segment of the previous record that becomes dead. We store the segments in a histogram by usage. * The compaction process takes segments with low utilization, reads them and writes the live records to new segments, and frees the old segments. * Segments are initially "mixed" - we write to the active segment records from all tables and all tablets. The "separator" process rewrites records from mixed segments into new segments that are organized by compaction groups (tablets), and frees the mixed segments. Each write is written to the active segment and to a separator buffer of the compaction group, which is eventually flushed to a new segment in the compaction group. Currently this mode is experimental and requires an experimental flag to be enabled. Some things that are not supported yet are strong consistency, tablet migration, tablet split/merge, big mutations, tombstone gc, ttl. 
To use, add to config: ``` enable_logstor: true experimental_features: - logstor ``` Create a table: ``` CREATE TABLE ks.t(pk int PRIMARY KEY, a int, v text) WITH storage_engine = 'logstor'; ``` INSERT, SELECT, DELETE work as expected. UPDATE is not supported yet. No backport - new feature. Closes scylladb/scylladb#28706 * github.com:scylladb/scylladb: logstor: trigger separator flush for buffers that hold old segments docs/dev: add logstor documentation logstor: recover segments into compaction groups logstor: range read logstor: change index to btree by token per table logstor: move segments to replica::compaction_group db: update dirty mem limits dynamically logstor: track memory usage logstor: logstor stats api logstor: compaction buffer pool logstor: separator: flush buffer when full logstor: hold segment until index updates logstor: truncate table logstor: enable/disable compaction per table logstor: separator buffer pool test: logstor: add separator and compaction tests logstor: segment and separator barrier logstor: separator debt controller logstor: compaction controller logstor: recovery: recover mixed segments using separator logstor: wait for pending reads in compaction logstor: separator logstor: compaction groups logstor: cache files for read logstor: recovery: initial logstor: add segment generation logstor: reserve segments for compaction logstor: index: buckets logstor: add buffer header logstor: add group_id logstor: record generation logstor: generation utility logstor: use RIPEMD-160 for index key test: add test_logstor.py api: add logstor compaction trigger endpoint replica: add logstor to db schema: add logstor cf property logstor: initial commit db: disable tablet balancing with logstor db: add logstor experimental feature flag	2026-03-20 00:18:09 +02:00
Avi Kivity	062751fcec	Merge 'db/config: enable ms sstable format by default' from Łukasz Paszkowski Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make the new format a new default for new clusters by naming ms in the default scylla.yaml. New functionality. No backport needed. This PR is basically Michał's one https://github.com/scylladb/scylladb/pull/26377, Jakub's https://github.com/scylladb/scylladb/pull/27332 fixing `sstables_manager::get_highest_supported_format()` and one test fix. Closes scylladb/scylladb#28960 * github.com:scylladb/scylladb: db/config: announce ms format as highest supported db/config: enable `ms` sstable format by default cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format api/system: add /system/chosen_sstable_version test/cluster/dtest: reduce num_tokens to 16	2026-03-19 18:19:01 +02:00
Pavel Emelyanov	969dddb630	test/refresh: Simplify refresh invocation take_snapshot return values were unused so drop them. do_refresh was a thin wrapper around load_new_sstables that added no logic; inline it directly into the gather expression. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:57 +03:00
Pavel Emelyanov	de21572b31	test/refresh: Remove r_servers alias for servers r_servers = servers was a no-op assignment; use servers directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:52 +03:00
Pavel Emelyanov	20b1531e6d	test/refresh: Replace check_mutation_replicas with a plain CQL SELECT The goal of test_refresh_deletes_uploaded_sstables is to verify that sstables are removed from the upload directory after refresh. The replica check was just a sanity guard; a simple SELECT of all keys is sufficient and much lighter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-19 18:42:48 +03:00
Pavel Emelyanov	c591b9ebe2	test/refresh: Inline keyspace/table/data setup in test_refresh_deletes_uploaded_sstables Replace create_dataset() with explicit keyspace creation via new_test_keyspace, inline CREATE TABLE, and direct cql.run_async inserts — matching the pattern used in do_test_streaming_scopes. This removes the last dependency on backup helpers for dataset setup and makes the test self-contained. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:44 +03:00
Pavel Emelyanov	06006a6328	test/refresh: Prepare indentation for new_test_keyspace in test_refresh_deletes_uploaded_sstables Wrap the test body under if True: to pre-indent it, making the subsequent patch that introduces new_test_keyspace a pure content change with no whitespace noise. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:40 +03:00
Pavel Emelyanov	67d8cde42d	test/refresh: Decouple test_refresh_deletes_uploaded_sstables from backup tests Replace create_cluster() from object_store/test_backup.py with a plain manager.servers_add(2) call. The test does not use object storage, so there is no need to pull in the backup helper along with its config and logging knobs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:36 +03:00
Pavel Emelyanov	04f046d2d8	test/refresh: Remove unused wait_for_cql_and_get_hosts import Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:32 +03:00
Botond Dénes	e8b37d1a89	Merge 'doc: fix the installation section' from Anna Stuchlik This PR fixes the Installation page: - Replaces `http` with `https` in the download command. - Replaces the Open Source example from the Installation section for CentOS (we overlooked this example before). Fixes https://github.com/scylladb/scylladb/issues/29087 This update affects all supported versions and should be backported as a bug fix. Closes scylladb/scylladb#29088 * github.com:scylladb/scylladb: doc: remove the Open Source Example from Installation doc: replace http with https in the installation instructions	2026-03-19 17:13:53 +02:00

52794 Commits