scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 03:45:11 +00:00

Author	SHA1	Message	Date
Botond Dénes	a7856c3d52	Merge '[Backport 2025.2] service/qos: Fall back to default scheduling group when using maintenance socket' from Scylladb[bot] The service level controller relies on `auth::service` to collect information about roles and the relation between them and the service levels (those attached to them). Unfortunately, the service level controller is initialized way earlier than `auth::service` and so we had to prevent potential invalid queries of user service levels (cf. `46193f5e79`). Unfortunately, that came at a price: it made the maintenance socket incompatible with the current implementation of the service level controller. The maintenance socket starts early, before the `auth::service` is fully initialized and registered, and is exposed almost immediately. If the user attempts to connect to Scylla within this time window, via the maintenance socket, one of the things that will happen is choosing the right service level for the connection. Since the `auth::service` is not registered, Scylla will fail an assertion and crash. A similar scenario occurs when using maintenance mode. The maintenance socket is how the user communicates with the database, and we're not prepared for that either. To avoid unnecessary crashes, we add new branches if the passed user is absent or if it corresponds to the anonymous role. Since the role corresponding to a connection via the maintenance socket is the anonymous role, that solves the problem. Some accesses to `auth::service` are not affected and we do not modify those. Fixes scylladb/scylladb#26816 Backport: yes. This is a fix of a regression. 
- (cherry picked from commit `c0f7622d12`) - (cherry picked from commit `222eab45f8`) - (cherry picked from commit `394207fd69`) - (cherry picked from commit `b357c8278f`) Parent PR: #26856 Closes scylladb/scylladb#27034 * github.com:scylladb/scylladb: test/cluster/test_maintenance_mode.py: Wait for initialization test: Disable maintenance mode correctly in test_maintenance_mode.py test: Fix keyspace in test_maintenance_mode.py service/qos: Do not crash Scylla if auth_integration absent	2025-11-20 10:43:18 +02:00
Avi Kivity	6206b57008	Merge '[Backport 2025.2] Synchronize tablet split and load-and-stream' from Scylladb[bot] Load-and-stream is broken when running concurrently with the finalization step of tablet split. Consider this: 1) split starts 2) split finalization executes barrier and succeeds 3) load-and-stream runs now, starts writing sstable (pre-split) 4) split finalization publishes changes to tablet metadata 5) load-and-stream finishes writing sstable 6) sstable cannot be loaded since it spans two tablets. Two possible fixes (maybe both): 1) load-and-stream waits for topology to quiesce 2) perform split compaction on sstable that spans both sibling tablets. This patch implements #1. By waiting for topology to quiesce, we guarantee that load-and-stream only starts when there's no chance the coordinator is handling some topology operation like split finalization. Fixes https://github.com/scylladb/scylladb/issues/26455. - (cherry picked from commit `3abc66da5a`) - (cherry picked from commit `4654cdc6fd`) Parent PR: #26456 Closes scylladb/scylladb#26647 * github.com:scylladb/scylladb: sstables_loader: Don't bypass synchronization with busy topology test: Add reproducer for l-a-s and split synchronization issue sstables_loader: Synchronize tablet split and load-and-stream	2025-11-17 17:15:37 +02:00
Łukasz Paszkowski	cb5bfeeb16	tools/scylla-nodetool: fix crash when rows_merged cells contain null Any empty object of the json::json_list type has its internal _set variable assigned to false which results in such objects being skipped by the json::json_builder. Hence, the json returned by the api GET//compaction_manager/compaction_history does not contain the field `rows_merged` if a cell in the system.compaction_history table is null or an empty list. In such cases, executing the command `nodetool compactionhistory` will result in a crash with the following error message: `error running operation: rjson::error (JSON assert failed on condition 'false'` The patch fixes it by checking if the json object contains the `rows_merged` element before processing. If the element does not exist, the nodetool will now produce an empty list. Fixes https://github.com/scylladb/scylladb/issues/23540 Closes scylladb/scylladb#23514 (cherry picked from commit `113647550f`) Closes scylladb/scylladb#26947	2025-11-17 14:27:43 +02:00
Benny Halevy	cec15d4175	scylla-sstable: correctly dump sharding_metadata This patch fixes 2 issues at one go: First, Currently sstables::load clears the sharding metadata (via open_data()), and so scylla-sstable always prints an empty array for it. Second, printing token values would generate invalid json as they are currently printed as binary bytes, and they should be printed simply as numbers, as we do elsewhere, for example, for the first and last keys. Fixes #26982 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26991 (cherry picked from commit `f9ce98384a`) Closes scylladb/scylladb#27033	2025-11-16 16:03:49 +02:00
Dawid Mędrek	a817da2cac	test/cluster/test_maintenance_mode.py: Wait for initialization If we try to perform queries too early, before the call to `storage_service::start_maintenance_mode` has finished, we will fail with the following error: ``` ERROR 2025-11-12 20:32:27,064 [shard 0:sl:d] token_metadata - sorted_tokens is empty in first_token_index! ``` To avoid that, we should wait until initialization is complete. (cherry picked from commit `b357c8278f`)	2025-11-15 22:09:51 +00:00
Dawid Mędrek	fb6625000b	test: Disable maintenance mode correctly in test_maintenance_mode.py Although setting the value of `maintenance_mode` to the string `"false"` disables maintenance mode, the testing framework misinterprets the value and thinks that it's actually enabled. As a result, it might try to connect to Scylla via the maintenance socket, which we don't want. (cherry picked from commit `394207fd69`)	2025-11-15 22:09:51 +00:00
Dawid Mędrek	aa19daaafe	test: Fix keyspace in test_maintenance_mode.py The keyspace used in the test is not necessarily called `ks`. (cherry picked from commit `222eab45f8`)	2025-11-15 22:09:51 +00:00
Raphael S. Carvalho	316e702d45	sstables_loader: Don't bypass synchronization with busy topology The patch `c543059f86` fixed the synchronization issue between tablet split and load-and-stream. The synchronization worked only with raft topology, and therefore was disabled with gossip. To do the check, storage_service::raft_topology_change_enabled() is called, but the topology kind is only available/set on shard 0, so it caused the synchronization to be bypassed when load-and-stream runs on any shard other than 0. The reason the reproducer didn't catch it is that it was restricted to a single cpu. It will now run with multiple cpus and catch the problem observed. Fixes #22707 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#26730 (cherry picked from commit `7f34366b9d`) (cherry picked from commit `4c466ace4f`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-14 10:51:32 -03:00
Botond Dénes	64ffd56f69	Merge '[Backport 2025.2] [schema] Speculative retry rounding fix' from Scylladb[bot] This patch series re-enables support for speculative retry values `0` and `100`. These values were supported some time ago, before [schema: fix issue 21825: add validation for PERCENTILE values in speculative_retry configuration. #21879 ](https://github.com/scylladb/scylladb/pull/21879). When that PR prevented using invalid `101PERCENTILE` values, the valid `100PERCENTILE` and `0PERCENTILE` values were prevented too. Reproduction steps from [[Bug]: drop schema and all tables after apply speculative_retry = '99.99PERCENTILE' #26369](https://github.com/scylladb/scylladb/issues/26369) are unable to reproduce the issue after the fix. A test is added to make sure the inclusive border values `0` and `100` are supported. Documentation is updated to give more information to the users. It now states that these border values are inclusive, and also that the precision, with automatic rounding, is 1 decimal digit. Fixes #26369 This is a bug fix. If at any time a client tries to use a value >= 99.5 and < 100, the raft error will happen. Backport is needed. The code which introduced the inconsistency was introduced in 2025.2, so no backporting to 2025.1. - (cherry picked from commit `da2ac90bb6`) - (cherry picked from commit `5d1913a502`) - (cherry picked from commit `aba4c006ba`) - (cherry picked from commit `85f059c148`) - (cherry picked from commit `7ec9e23ee3`) Parent PR: #26909 Closes scylladb/scylladb#27013 * github.com:scylladb/scylladb: test: cqlpy: add test case for non-numeric PERCENTILE value schema: speculative_retry: update exception type for sstring ops docs: cql: ddl.rst: update speculative-retry-options test: cqlpy: add test for valid speculative_retry values schema: speculative_retry: allow 0 and 100 PERCENTILE values	2025-11-14 10:38:18 +02:00
Ernest Zaslavsky	8ae4c66750	minio: update CLI usage, remove deprecated `mc` options Replace phased-out `mc` command options with supported alternatives. Ensures compatibility with the latest MinIO version. Closes scylladb/scylladb#24363 (cherry picked from commit `1446f57635`) Closes scylladb/scylladb#27005	2025-11-14 10:35:57 +02:00
Dario Mirovic	30d615e76c	test: cqlpy: add test case for non-numeric PERCENTILE value Add test case for non-numeric PERCENTILE value, which raises an error different to the out-of-range invalid values. Regex in the test test_invalid_percentile_speculative_retry_values is expanded. Refs #26369 (cherry picked from commit `7ec9e23ee3`)	2025-11-13 19:44:10 +00:00
Dario Mirovic	5e2ba892ee	test: cqlpy: add test for valid speculative_retry values test_valid_percentile_speculative_retry_values is introduced to test that valid values for speculative_retry are properly accepted. Some of the values are moved from the test_invalid_percentile_speculative_retry_values test, because the previous commit added support for them. Refs #26369 (cherry picked from commit `5d1913a502`)	2025-11-13 19:44:10 +00:00
Raphael S. Carvalho	b280b1da66	test: Add reproducer for l-a-s and split synchronization issue Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `4654cdc6fd`)	2025-11-12 21:18:32 -03:00
Piotr Dulikowski	ff4ca88768	Merge '[Backport 2025.2] transport: call update_scheduling_group for non-auth connections' from Andrzej Jackowski This is a backport of the fix for https://github.com/scylladb/scylladb/issues/26040 and the related test (https://github.com/scylladb/scylladb/pull/26589) to 2025.2. Before this change, unauthorized connections stayed in the main scheduling group. This is not ideal; in such a case, sl:default should be used instead, to be consistent with the scenario where the user is authenticated but there is no service level assigned to the user. This commit adds a call to update_scheduling_group at the end of connection creation for an unauthenticated user, to make sure the service level is switched to sl:default. Fixes: https://github.com/scylladb/scylladb/issues/26040 Fixes: https://github.com/scylladb/scylladb/issues/26581 (cherry picked from commit `278019c328`) (cherry picked from commit `8642629e8e`) No backport, as it's already a backport (but similar PRs will be created for 2025.3, and 2025.4) Closes scylladb/scylladb#26813 * github.com:scylladb/scylladb: test: add test_anonymous_user to test_raft_service_levels transport: call update_scheduling_group for non-auth connections	2025-11-09 03:06:22 +01:00
Andrzej Jackowski	43dbaeebbc	test: add test_anonymous_user to test_raft_service_levels The primary goal of this test is to reproduce scylladb/scylladb#26040 so the fix (`278019c328`) can be backported to older branches. Scenario: connect via CQL as an anonymous user and verify that the `sl:default` scheduling group is used. Before the fix for #26040 `main` scheduling group was incorrectly used instead of `sl:default`. Control connections may legitimately use `sl:driver`, so the test accepts those occurrences while still asserting that regular anonymous queries use `sl:default`. This adds explicit coverage on master. After scylladb#24411 was implemented, some other tests started to fail when scylladb#26040 was unfixed. However, none of the tests asserted this exact behavior. Refs: scylladb/scylladb#26040 Refs: scylladb/scylladb#26581 Closes scylladb/scylladb#26589 (cherry picked from commit `8642629`)	2025-10-30 18:37:12 +01:00
Pavel Emelyanov	05c0f8ed03	lister: Fix race between readdir and stat Sometimes file::list_directory() returns entries without type set. In that case lister calls file_type() on the entry name to get it. In case the call returns a disengaged type, the code assumes that some error occurred and resolves into an exception. That's not correct. The file_type() method returns a disengaged type only if the file being inspected is missing (i.e. on ENOENT errno). But this can validly happen if a file is removed between readdir and stat. In that case it's not "some error happened"; the entry should just be skipped. If some error did happen, file_type() would resolve into an exceptional future on its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26595 (cherry picked from commit `d9bfbeda9a`) Closes scylladb/scylladb#26759	2025-10-29 11:38:35 +02:00
Patryk Jędrzejczak	397a214cbd	test: test_raft_recovery_stuck: reconnect driver after rolling restarts It turns out that #21477 wasn't sufficient to fix the issue. The driver may still decide to reconnect the connection after `rolling_restart` returns. One possible explanation is that the driver sometimes handles the DOWN notification after all nodes consider each other UP. Reconnecting the driver after restarting nodes seems to be a reliable workaround that many tests use. We also use it here. Fixes #19959 Closes scylladb/scylladb#26638 (cherry picked from commit `5321720853`) Closes scylladb/scylladb#26755	2025-10-29 11:37:15 +02:00
Patryk Jędrzejczak	f2535f2c5e	test: test group0 tombstone GC in the Raft-based recovery procedure We add a regression test for the bug fixed in the previous commits. (cherry picked from commit `c57f097630`)	2025-10-24 13:02:27 +02:00
Botond Dénes	0299d7bc46	Merge '[Backport 2025.2] db/config: Add SSTable compression options for user tables' from Scylladb[bot] ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size. The same default applies to system tables as well. This series introduces a new configuration option to allow customizing the default for user tables. It also adds some tests for the new functionality. Fixes #25195. - (cherry picked from commit `1106157756`) - (cherry picked from commit `ea41f652c4`) - (cherry picked from commit `a7e46974d4`) - (cherry picked from commit `e1d9c83406`) - (cherry picked from commit `8d5bd212ca`) - (cherry picked from commit `6ba0fa20ee`) - (cherry picked from commit `8410532fa0`) Parent PR: #26003 Closes scylladb/scylladb#26300 * github.com:scylladb/scylladb: test/cluster: Add tests for invalid SSTable compression options test/boost: Add tests for SSTable compression config options main: Validate SSTable compression options from config db/config: Add SSTable compression options for user tables db/config: Prepare compression_parameters for config system compressor: Validate presence of sstable_compression in parameters compressor: Add missing space in exception message	2025-10-20 10:43:08 +03:00
Nikos Dragazis	d995abfe0b	test/cluster: Add tests for invalid SSTable compression options Complementary to the previous patch. It triggers semantic validation checks in `compression_parameters::validate()` and expects the server to exit. The tests examine both command line and YAML options. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `8410532fa0`)	2025-10-20 00:00:03 +03:00
Nikos Dragazis	979925e822	test/boost: Add tests for SSTable compression config options Since patch `03461d6a54`, all boost unit tests depending on `cql_test_env` are compiled into a single executable (`combined_tests`). Add the new test in there. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `6ba0fa20ee`)	2025-10-20 00:00:02 +03:00
Patryk Jędrzejczak	b6a0a1290d	test: test_raft_recovery_entry_loss: fix the typo in the test case name (cherry picked from commit `71de01cd41`)	2025-10-17 10:26:22 +00:00
Patryk Jędrzejczak	26ed158cb0	test: verify that schema pulls are disabled in the Raft-based recovery procedure We do this at the end of `test_raft_recovery_entry_loss`. It's not worth to add a separate regression test, as tests of the recovery procedure are complicated and have a long running time. Also, we choose `test_raft_recovery_entry_loss` out of all tests of the recovery procedure because it does some schema changes. (cherry picked from commit `da8748e2b1`)	2025-10-17 10:26:22 +00:00
Michał Chojnowski	fc8cc87fc2	test/boost/sstable_compressor_factory_test: fix thread-unsafe usage of Boost.Test It turns out that Boost assertions are thread-unsafe, (and can't be used from multiple threads concurrently). This causes the test to fail with cryptic log corruptions sometimes. Fix that by switching to thread-safe checks. Fixes scylladb/scylladb#24982 Closes scylladb/scylladb#26472 (cherry picked from commit `7c6e84e2ec`) Closes scylladb/scylladb#26550	2025-10-15 12:15:48 +03:00
Michał Chojnowski	e585d6cb3b	test_sstable_compression_dictionaries_basic: reconnect robustly after node reboots Using `driver_connect()` after a cluster restart isn't enough to ensure full CQL availability, but the test assumes that it is. Fix that by making the test wait for CQL availability via `get_ready_cql()`. Also, replace some manual usages of wait_for_cql_and_get_hosts with `get_ready_cql()` too. Fixes scylladb/scylladb#25362 Closes scylladb/scylladb#25366 (cherry picked from commit `85fd4d23fa`) Closes scylladb/scylladb#26513	2025-10-12 21:09:18 +03:00
Michael Litvak	4d54e98304	auth: add query_state parameter to query functions add a query_state parameter to several auth functions that execute internal queries. currently the queries use the internal_distributed_query_state() query state, and we maintain this as default, but we want also to be able to pass a query state from the caller. in particular, the auth queries currently use a timeout of 5 seconds, and we will want to set a different timeout when executed in some different context. (cherry picked from commit `3c3dd4cf9d`)	2025-10-09 12:47:31 +00:00
Raphael S. Carvalho	40bad7524f	replica: Fix race between drop table and merge completion handling Consider this: 1) merge finishes, wakes up fiber to merge compaction groups 2) drop table happens, which in turn invokes truncate underneath 3) merge fiber stops old groups 4) truncate disables compaction on all groups, but the ones stopped 5) truncate performs a check that compaction has been disabled on all groups, including the ones stopped 6) the check fails because groups being stopped didn't have compaction explicitly disabled on them To fix it, the check on step 6 will ignore groups that have been stopped, since those are not eligible for having compaction explicitly disabled on them. The compaction check is there, so ongoing compaction will not propagate data being truncated, but here it happens in the context of drop table which doesn't leave anything behind. Also, a group stopped is somewhat equivalent to compaction disabled on it, since the procedure to stop a group stops all ongoing compaction and eventually removes its state from compaction manager. Fixes #25551. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#25563 (cherry picked from commit `149f9d8448`) Closes scylladb/scylladb#25631	2025-10-08 09:40:44 +03:00
Benny Halevy	2c6cfad7e4	test_tablets_merge: test_tablet_split_merge_with_many_tables: reduce number of tables in debug mode As the test hits timeouts in debug mode on aarch64. Fixes #26252 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26303 (cherry picked from commit `b81c6a339b`) Closes scylladb/scylladb#26324	2025-10-01 14:12:58 +03:00
Botond Dénes	bbf9ac6252	Merge '[Backport 2025.2] compaction: ensure that all compaction executors are stopped' from Scylladb[bot] Currently, while stopping the compaction_manager, we stop task_manager compaction module and concurrently run compaction_manager::really_do_stop. really_do_stop stops and waits for all task_executors that are kept in compaction_manager::_tasks, but nothing ensures that no more tasks will be added there. Due to leftover tasks, we trigger on_fatal_internal_error. Modify the order of compaction_manager::stop. After the change, we stop compaction tasks in the following order: - abort module abort source; - close module gate in the background; - stop_ongoing_compactions (kept in compaction_manager::_tasks); - wait until module gate is closed. Check module abort source before creating compaction executor and adding it to _tasks. Thanks to the above, we can be sure that: - after module::stop there will be no tasks in _tasks; - compaction_manager::stop aborts all tasks; we don't wait for any whole compaction to finish. Fixes: https://github.com/scylladb/scylladb/issues/25806. Fixes shutdown bug; Needs backports to all version - (cherry picked from commit `17707d0e6b`) - (cherry picked from commit `97c77d7cd5`) Parent PR: #25885 Closes scylladb/scylladb#26223 * github.com:scylladb/scylladb: compaction: move _tasks check compaction: stop compaction module in really_do_stop	2025-10-01 14:08:45 +03:00
Tomasz Grabiec	d3d07331b3	Merge '[Backport 2025.2] replica: Fix split compaction when tablet boundaries change' from Scylladb[bot] Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) emits merge decision 5) completes merge, before compaction in step 2 finishes After last step, split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where split initiated by the sstable writer failed, split completed, and the unsplit sstable is left in the table dir, causing problems in the restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. All 2025.* versions are vulnerable, so fix must be backported to them. - (cherry picked from commit `0c1587473c`) - (cherry picked from commit `68f23d54d8`) Parent PR: #25690 Closes scylladb/scylladb#25934 * github.com:scylladb/scylladb: replica: Fix split compaction when tablet boundaries change replica: Futurize split_compaction_options()	2025-09-30 19:50:50 +02:00
Raphael S. Carvalho	65e78d6336	replica: Fix split compaction when tablet boundaries change Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) emits merge decision 5) completes merge, before compaction in step 2 finishes After last step, split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where split initiated by the sstable writer failed, split completed, and the unsplit sstable is left in the table dir, causing problems in the restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `68f23d54d8`)	2025-09-29 20:26:36 -03:00
Aleksandra Martyniuk	bb548aae54	test: fix test_two_tablets_concurrent_repair_and_migration_repair_writer_level test_two_tablets_concurrent_repair_and_migration_repair_writer_level waits for the first node that logs info about repair_writer using asyncio.wait. The done group is never awaited, so we never learn about the error. The test itself is incorrect and the log about repair_writer is never printed. We never learn about that, and the test finishes successfully after a 10-minute timeout. Fix the test: - disable hinted handoff; - repair tablets of the whole table: - new table is added so that concurrent migration is possible; - use wait_for_first_completed that awaits done group; - do some cleanups. Remove nightly mark. Fixes: #26148. Closes scylladb/scylladb#26209 (cherry picked from commit `48bbe09c8b`) Closes scylladb/scylladb#26219	2025-09-27 17:26:16 +03:00
Aleksandra Martyniuk	5cfd052e1b	compaction: stop compaction module in really_do_stop Currently, compaction::task_manager_module is stopped in compaction_manager::stop, concurrently to really_do_stop. We can't predict the order of the two. Do not set _task_manager_module to nullptr at stop, because compaction_manager::really_do_stop() may be called before the actual shutdown, while other components still try to use it. compaction::task_manager_module does not keep a pointer to compaction_manager, so we won't end up with memory leak. Stop compaction module in really_do_stop, after ongoing compactions are stopped. It's a preparation for further patches. (cherry picked from commit `17707d0e6b`)	2025-09-25 16:01:39 +02:00
Ferenc Szili	9208bdfe93	load_balancer: fix std::out_of_bounds when decommissioning with empty nodes Consider the following: The tablet load balancer is working on: - node1: an empty node (no tablets) with a large disk capacity - node2: an empty node (no tablets) with a lower disk capacity than node1 - node3: is being decommissioned and contains tablet replicas In load_balancer::make_internode_plan() the initial destination node/shard is selected like this: // Pick best target shard. auto dst = global_shard_id {target, _load_sketch->get_least_loaded_shard(target)}; load_sketch::get_least_loaded_shard(host_id) calls ensure_node() which adds the host to load_sketch's internal hash maps in case the node was not yet seen by load_sketch. Let's assume dst is a shard on node1. Later in load_balancer::make_internode_plan() we will call pick_candidate() to try to find a better destination node than the initial one: // May choose a different source shard than src.shard or different destination host/shard than dst. auto candidate = co_await pick_candidate(nodes, src_node_info, target_info, src, dst, nodes_by_load_dst, drain_skipped); auto source_tablets = candidate.tablets; src = candidate.src; dst = candidate.dst; If pick_candidate() selects some other empty destination (due to larger capacity: node1) node, and that node has not yet been seen by load_sketch (because it was empty), a subsequent call to load_sketch::pick() will search for the node using std::unordered_map::at(), and because the node is not found it will throw a std::out_of_bounds() exception crashing the load balancer. This problem is fixed by changing load_sketch::populate() to initialize its internal maps with all the nodes which populate()'s arguments filter for. Fixes: #26203 Closes scylladb/scylladb#26207 (cherry picked from commit `c6c9c316a7`) Closes scylladb/scylladb#26239	2025-09-25 09:47:17 +03:00
Dawid Mędrek	96b865ab9b	db/batchlog: Drop batch if table has been dropped If there are pending mutations in the batchlog for a table that has been dropped, we'll keep attempting to replay them but with no success -- `db::no_such_column_family` exceptions will be thrown, and we'll keep trying again and again. To prevent that, we drop the batch in that case just like we do in the case of a non-existing keyspace. A reproducer test has been included in the commit. It fails without the changes in `db/batchlog_manager.cc`, and it succeeds with them. Fixes scylladb/scylladb#24806 Closes scylladb/scylladb#26057 (cherry picked from commit `35f7d2aec6`) Closes scylladb/scylladb#26200	2025-09-24 09:54:03 +03:00
Patryk Jędrzejczak	b747da8b64	test: deflake driver reconnections in the recovery procedure tests All three tests could hit https://github.com/scylladb/python-driver/issues/295. We use the standard workaround for this issue: reconnecting the driver after the rolling restart, and before sending any requests to local tables (that can fail if the driver closes a connection to the node that restarted last). All three tests perform two rolling restarts, but the latter ones already have the workaround. Fixes #26005 Closes scylladb/scylladb#26056 (cherry picked from commit `a56115f77b`) Closes scylladb/scylladb#26197	2025-09-24 09:53:45 +03:00
Dawid Mędrek	4111afa4f7	test/perf/tablet_load_balancing.cc: Create nodes within one DC In `789a4a1ce7`, we adjusted the test file to work with the configuration option `rf_rack_valid_keyspaces`. Part of the commit was making the two tables used in the test replicate in separate data centers. Unfortunately, that destroyed the point of the test because the tables no longer competed for resources. We fix that by enforcing the same replication factor for both tables. We still accept different values of replication factor when provided manually by the user (by `--rf1` and `--rf2` commandline options). Scylla won't allow for creating RF-rack-invalid keyspaces, but there's no reason to take away the flexibility the user of the test already has. Fixes scylladb/scylladb#26026 Closes scylladb/scylladb#26115 (cherry picked from commit `0d2560c07f`) Closes scylladb/scylladb#26171	2025-09-24 09:53:20 +03:00
Pavel Emelyanov	43dd627118	Merge '[Backport 2025.2] compaction/scrub: register sstables for compaction before validation' from Scylladb[bot] compaction/scrub: register sstables for compaction before validation When `scrub --validate` runs, it collects all candidate sstables at the start and validates them one by one in separate compaction tasks. However, scrub in validate mode does not register these sstables for compaction, which allows regular compaction to pick them up and potentially compact them away before validation begins. This leads to scrub failures because the sstables can no longer be found. This patch fixes the issue by first disabling compaction, collecting the sstables, and then registering them for compaction before starting validation. This ensures that the enqueued sstables remain available for the entire duration of the scrub validation task. Fixes #23363 This reported scrub failure occurs on all versions that have the checksum/digest validation feature for uncompressed sstables. So, backport it to older versions. - (cherry picked from commit `84f2e99c05`) - (cherry picked from commit `7cdda510ee`) Parent PR: #26034 Closes scylladb/scylladb#26098 * github.com:scylladb/scylladb: compaction/scrub: register sstables for compaction before validation compaction/scrub: handle exceptions when moving invalid sstables to quarantine	2025-09-24 09:52:57 +03:00
Lakshmi Narayanan Sreethar	7eecfeb351	compaction/scrub: register sstables for compaction before validation When `scrub --validate` runs, it collects all candidate sstables at the start and validates them one by one in separate compaction tasks. However, scrub in validate mode does not register these sstables for compaction, which allows regular compaction to pick them up and potentially compact them away before validation begins. This leads to scrub failures because the sstables can no longer be found. This patch fixes the issue by first disabling compaction, collecting the sstables, and then registering them for compaction before starting validation. This ensures that the enqueued sstables remain available for the entire duration of the scrub validation task. Fixes #23363 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `7cdda510ee`) Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-09-19 23:43:59 +05:30
Nadav Har'El	ca037dd072	alternator: fix bug in combination of AttributeUpdates + ReturnValues In test/alternator/test_returnvalues.py we had tests for the ReturnValues feature on UpdateItem requests - but we only tested UpdateItem requests with the "modern" UpdateExpression, and forgot to test the combination of ReturnValues with the old AttributeUpdates API. It turns out this combination is buggy: when both ReturnValues=ALL_OLD and AttributeUpdates need the previous value of the item, we may wrongly std::move() the value out, and the operation will fail with a strange error: An error occurred (ValidationException) when calling the UpdateItem operation: JSON assert failed on condition 'IsObject()' The fix in this patch is trivial - just move the std::move() to the correct place, after both UpdateExpression and AttributeUpdates handling is done. This patch also includes a reproducing test, which fails before this patch and passes with it - and of course passes on DynamoDB. This test reproduces two cases where the bug happened, as well as one case where it didn't (to make sure we don't regress in what already worked). Fixes #25894 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25900 (cherry picked from commit `3c0032deb4`) Closes scylladb/scylladb#26095	2025-09-19 19:15:17 +03:00
Szymon Malewski	71aa0c0daa	alternator/expressions.g: Fix antlr3 missing token leak This patch overrides the antlr3 function that allocates the missing tokens that would eventually leak. The override stores these tokens in a vector, ensuring memory is freed whenever the parser is destroyed. Solution is copied from CQL implementation. A unit test to reproduce the issue is added - leak would be reported by ASAN, when running this test in debug mode - the test passed but the leak is discovered when the test file exits. Fixes #25878 Closes scylladb/scylladb#25930 (cherry picked from commit `776f90e2f8`) Closes scylladb/scylladb#26084	2025-09-18 07:46:37 +03:00
Wojciech Mitros	26bb601dc0	storage_proxy: send hints to pending replicas Consider the following scenario: - Current replica set is [A, B, C] - write succeeds on [A, B], and a hint is logged for node C - before the hint is replayed, D bootstraps and the token migrates from C to D - hint is replayed to node C while D is pending, but it's too late, since streaming for that token is already done - C is cleaned up, replayed data is lost, and D has a stale copy until next repair. In this scenario we effectively fail to send the hint. This scenario is also more likely to happen with tablets, as it can happen for every tablet migration. This issue is particularly detrimental to materialized views. View updates use hints by default and a specific view update may be sent to just one view replica (when a single base replica has a different row state due to reordering or missed writes). When we lose a hint for such a view update, we can generate a persistent inconsistency between the base and view - ghost rows can appear due to a lost tombstone and rows may be missing in the view due to a lost row update. Such inconsistencies can be fixed by repairing neither the view nor the base table. To handle this, in this patch we add the pending replicas to the list of targets of each hint, even if the original target is still alive. This will cause some updates to be redundant. These updates are probably unavoidable for now, but they shouldn't be too common either. The scenarios for them are: 1. managing to send the hint to the source of a migrating replica before streaming its token - the write will arrive on the pending replica anyway in streaming 2. the hint target not being the source of the migration - if we managed to apply the original write of the hint to the actual source of the migration, the pending replica will get it during streaming 3. 
sending the same hint to many targets at a similar time - while sending to each target, we'll see the same pending replica for the hint so we'll send it multiple times 4. possible retries where even though the hint was successfully sent to the main target, we failed to send it to the pending replica, so we need to retry the entire write This patch handles both tablet migrations and tablet rebuilds. In the future, for tablet migrations, we can avoid sending the hint to pending replicas if the hint target is not the source of the migration, which would allow us to avoid the redundant writes 2 and 3. For rack-aware RF, this will be as simple as checking whether the replicas are in the same rack. We also add a test case reproducing the issue. Co-Authored-By: Raphael S. Carvalho <raphaelsc@scylladb.com> Fixes https://github.com/scylladb/scylladb/issues/19835 Closes scylladb/scylladb#25590 (cherry picked from commit `10b8e1c51c`) Closes scylladb/scylladb#25881	2025-09-16 12:24:45 +02:00
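The target selection described in the commit above can be sketched roughly as follows. This is only an illustration of the idea; `hint_targets` and its arguments are hypothetical names and do not correspond to the actual `storage_proxy` code.

```python
def hint_targets(original_target, pending_replicas):
    """Return the nodes a replayed hint is sent to.

    Hypothetical helper: the original target always stays in the list,
    and every pending replica is appended once, even if the original
    target is still alive.
    """
    targets = [original_target]
    for replica in pending_replicas:
        if replica not in targets:  # avoid listing the same node twice
            targets.append(replica)
    return targets


# Scenario from the commit message: the hint was logged for C,
# and D is pending because the token is migrating from C to D.
print(hint_targets("C", ["D"]))
```

With no pending replicas the list contains only the original target, so the behavior outside of migrations and rebuilds is unchanged in this sketch.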
Wojciech Mitros	d3a8de60aa	mv: delete previously undetected ghost rows in PRUNE MATERIALIZED VIEW statement The PRUNE MATERIALIZED VIEW statement is supposed to remove ghost rows from the view. Ghost rows are rows in the view with no corresponding row in the base table. Before this patch, only rows whose primary key columns of the base table had different values than any of the base rows were treated as ghost rows by the PRUNE statement. However, view rows which have a column in their primary key that's not in the base primary key can also be ghost rows if this column has a different value than the base row with the same values of remaining primary key columns. That's because these rows won't be deleted unless we change the value of this column in the base table to this specific value. In this patch we add a check for this column in the PRUNE MATERIALIZED VIEW logic. If this column isn't the same in the base table and the view, these rows are also deleted. Fixes https://github.com/scylladb/scylladb/issues/25655 Closes scylladb/scylladb#25720 (cherry picked from commit `1f9be235b8`) Closes scylladb/scylladb#25955	2025-09-16 12:19:15 +02:00
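A minimal sketch of the two ghost-row conditions described in the commit above, using plain dictionaries as rows. The function name, its parameters, and the row layout are assumptions made for illustration; this is not the PRUNE MATERIALIZED VIEW implementation.

```python
def find_ghost_rows(base_rows, view_rows, base_pk, extra_view_pk_col):
    """Return the view rows that have no matching base row.

    A view row counts as a ghost if either
    (1) no base row shares its base primary key values, or
    (2) the base row with the same key holds a different value in the
        view primary key column that is not part of the base primary key.
    """
    base_by_key = {tuple(row[c] for c in base_pk): row for row in base_rows}
    ghosts = []
    for view_row in view_rows:
        key = tuple(view_row[c] for c in base_pk)
        base_row = base_by_key.get(key)
        if base_row is None:
            ghosts.append(view_row)  # case (1): detected before the patch
        elif base_row[extra_view_pk_col] != view_row[extra_view_pk_col]:
            ghosts.append(view_row)  # case (2): the check added by the patch
    return ghosts


base = [{"p": 1, "v": 10}]
view = [
    {"v": 10, "p": 1},  # matches the base row
    {"v": 11, "p": 1},  # same base key, stale value of v
    {"v": 5, "p": 2},   # base key does not exist
]
print(find_ghost_rows(base, view, base_pk=["p"], extra_view_pk_col="v"))
```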
Patryk Jędrzejczak	3dfbd813ad	test: cluster: deflake consistency checks after decommission In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). Therefore, `check_token_ring_and_group0_consistency` called just after decommission might fail when the decommissioned node is still in group 0 (as a non-voter). We deflake all tests that call `check_token_ring_and_group0_consistency` after decommission in this commit. Fixes #25809 (cherry picked from commit `bb9fb7848a`)	2025-09-10 17:48:35 +00:00
Patryk Jędrzejczak	e44d154470	test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). `wait_for_token_ring_and_group0_consistency` doesn't handle such a case; it only handles cases where the token ring is updated later. We fix this in this commit. We rely on the new implementation of `wait_for_token_ring_and_group0_consistency` in the following commit to fix flakiness of some tests. We also update the obsolete docstring in this commit. (cherry picked from commit `e41fc841cd`)	2025-09-10 17:48:35 +00:00
Piotr Dulikowski	2a82604998	Merge '[Backport 2025.2] service/qos: Modularize service level controller to avoid invalid access to auth::service' from Scylladb[bot] Move management over effective service levels from `service_level_controller` to a new dedicated type -- `auth_integration`. Before these changes, it was possible for the service level controller to try to access `auth::service` after it was deinitialized. For instance, it could happen when reloading the cache. That HAS happened as described in the following issue: scylladb/scylladb#24792. Although the problem might have been mitigated or even resolved in scylladb/scylladb@10214e13bd, it's not clear how the service will be used in the future. It's better to prevent similar bugs than to fix them later on. The logic responsible for preventing access to an uninitialized `auth::service` was also either non-existent, complex, or insufficient. To prevent accessing `auth::service` by the service level controller, we extract the relevant portion of the code to a separate entity -- `auth_integration`. It's an internal helper type whose sole purpose is to manage effective service levels. Thanks to that, we were able to nest the lifetime of `auth_integration` within the lifetime of `auth::service`. It's now impossible to attempt to dereference it while it's uninitialized. If a bug related to an invalid access is spotted again, though, it might also be easier to debug it now. There should be no visible change to the users of the interface of the service level controller. We strived to make the patch minimal, and the only affected part of the logic should be related to how `auth::service` is accessed. The relevant portion of the initialization and deinitialization flow: (a) Before the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. 2. Initialize other services. 3. Initialize and start `auth::service`. 4. (work) 5. 
Stop and deinitialize `auth::service`. 6. Deinitialize other services. 7. Deinitialize `service_level_controller`. (b) After the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. (*) 2. Initialize other services. 3. Initialize and start `auth::service`. 4. Initialize `auth_integration`. Register it in `service_level_controller`. 5. (work) 6. Unregister `auth_integration` in `service_level_controller` and deinitialize it. 7. Stop and deinitialize `auth::service`. 8. Deinitialize other services. 9. Deinitialize `service_level_controller`. (*): The reference to `auth::service` in `service_level_controller` is still necessary. We need to access the service when dropping a distributed service level. Although it would be best to cut that link between the service level controller and `auth::service` too, effectively separating the entities, it would require more work, so we leave it as-is for now. It shouldn't prove problematic as far as accessing an uninitialized service goes. Trying to drop a service level at the point when we're de-initializing auth should be impossible. For more context, see the function `drop_distributed_service_level` in `service_level_controller`. A trivial test has been included in the PR. Although its value is questionable as we only try to reload the service level cache at a specific moment, it's probably the best we can deliver to provide a reproducer of the issue this patch is resolving. Fixes scylladb/scylladb#24792 Backport: The impact of the bug was minimal as it only affected the shutdown. However, since CI is failing because of it, let's backport the change to all supported versions. 
- (cherry picked from commit `7d0086b093`) - (cherry picked from commit `34afb6cdd9`) - (cherry picked from commit `e929279d74`) - (cherry picked from commit `dd5a35dc67`) - (cherry picked from commit `fc1c41536c`) Parent PR: #25478 Closes scylladb/scylladb#25752 * github.com:scylladb/scylladb: service/qos: Move effective SL cache to auth_integration service/qos: Add auth::service to auth_integration service/qos: Reload effective SL cache conditionally service/qos: Add gate to auth_integration service/qos: Introduce auth_integration	2025-09-10 09:47:53 +02:00
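The nested lifetimes from flow (b) in the merge above can be illustrated with a small sketch. It only models the ordering of initialization and deinitialization with Python context managers; the `service` helper and the `events` list are hypothetical, and the "other services" steps are left out for brevity.

```python
from contextlib import contextmanager

events = []


@contextmanager
def service(name):
    """Record when a (hypothetical) service is initialized and deinitialized."""
    events.append(f"init {name}")
    try:
        yield name
    finally:
        events.append(f"deinit {name}")


def run():
    """Nest auth_integration inside auth::service, as in flow (b)."""
    events.clear()
    with service("service_level_controller"):
        with service("auth::service"):
            with service("auth_integration"):
                events.append("work")
    return list(events)


for event in run():
    print(event)
```

Because `auth_integration` is the innermost scope, it is always torn down before `auth::service`, which mirrors why the commit describes access to an uninitialized `auth::service` as no longer possible through it.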
Dawid Mędrek	8847f3996e	test/perf: Adjust tablet_load_balancing.cc to RF-rack-validity We modify the logic to make sure that all of the keyspaces that the test creates are RF-rack-valid. For that, we distribute the nodes across two DCs and as many racks as the provided replication factor. That may have an effect on the load balancing logic, but since this is a performance test and since tablet load balancing is still taking place, it should be acceptable. This commit also finishes work in adjusting perf tests to pass with the `rf_rack_valid_keyspaces` configuration option enabled. The remaining tests either don't attempt to create keyspaces or they already create RF-rack-valid keyspaces. We don't need to explicitly enable the configuration option. It's already enabled by default by `cql_test_config`. The reason why we haven't run into any issue because of that is that performance tests are not part of our CI. Fixes scylladb/scylladb#25127 Closes scylladb/scylladb#25728 (cherry picked from commit `789a4a1ce7`) Closes scylladb/scylladb#25921	2025-09-10 10:17:08 +03:00
Sergey Zolotukhin	a7a7de9a69	gossiper: fix empty initial local node state This change removes the addition of an empty state to `_endpoint_state_map`. Instead, a new state is created locally and then published via replicate, avoiding the issue of an empty state existing in `_endpoint_state_map` before the preemption point. Since this resolves the issue tested in `test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed. Fixes: scylladb/scylladb#25831 (cherry picked from commit `b34d543f30`)	2025-09-08 21:54:43 +00:00
Sergey Zolotukhin	f44b578075	gossiper: add test for a race condition in start_gossiping This change adds a test for a race condition in `start_gossiping` that can lead to an empty self state sent in `gossip_get_endpoint_states_response`. Test for scylladb/scylladb#25831 (cherry picked from commit `775642ea23`)	2025-09-08 21:54:43 +00:00
Sergey Zolotukhin	e157e8577e	gossiper: check for a race condition in `do_apply_state_locally` In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change 1. adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. 2. Removes xfail from the test_gossiper_race test since the issue is now fixed. 3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID. This re-applies the commit `13392a40d4` that was reverted in `46aa59fe49`, after fixing the issues that caused the CI to fail. Fixes: scylladb/scylladb#25702 Fixes: scylladb/scylladb#25621 Ref: scylladb/scylla-enterprise#5613 (cherry picked from commit `f08df7c9d7`)	2025-09-08 21:54:43 +00:00

8970 Commits