scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 10:00:35 +00:00

Author	SHA1	Message	Date
Benny Halevy	4dcb8c19bd	scylla-sstable: correctly dump sharding_metadata This patch fixes 2 issues at one go: First, Currently sstables::load clears the sharding metadata (via open_data()), and so scylla-sstable always prints an empty array for it. Second, printing token values would generate invalid json as they are currently printed as binary bytes, and they should be printed simply as numbers, as we do elsewhere, for example, for the first and last keys. Fixes #26982 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26991 (cherry picked from commit `f9ce98384a`) Closes scylladb/scylladb#27030	2025-11-16 16:05:14 +02:00
Ernest Zaslavsky	e185740c54	minio: update CLI usage, remove deprecated `mc` options Replace phased-out `mc` command options with supported alternatives. Ensures compatibility with the latest MinIO version. Closes scylladb/scylladb#24363 (cherry picked from commit `1446f57635`) Closes scylladb/scylladb#27004	2025-11-14 10:48:00 +02:00
Andrei Chekun	88556e6c77	test.py: rewrite the wait_for_first_completed Rewrite wait_for first_completed to return only first completed task guarantee of awaiting(disappearing) all cancelled and finished tasks Use wait_for_first_completed to avoid false pass tests in the future and issues like #26148 Use gather_safely to await tasks and removing warning that coroutine was not awaited Closes scylladb/scylladb#26435 (cherry picked from commit `24d17c3ce5`) Closes scylladb/scylladb#26661	2025-11-12 11:50:51 +01:00
Patryk Jędrzejczak	0cd118f5d6	test: test_raft_recovery_stuck: reconnect driver after rolling restarts It turns out that #21477 wasn't sufficient to fix the issue. The driver may still decide to reconnect the connection after `rolling_restart` returns. One possible explanation is that the driver sometimes handles the DOWN notification after all nodes consider each other UP. Reconnecting the driver after restarting nodes seems to be a reliable workaround that many tests use. We also use it here. Fixes #19959 Closes scylladb/scylladb#26638 (cherry picked from commit `5321720853`) Closes scylladb/scylladb#26749	2025-10-29 11:41:23 +02:00
Botond Dénes	2e375ade3a	Merge '[Backport 2025.1] service/qos: set long timeout for auth queries on SL cache update' from Scylladb[bot] pass an appropriate query state for auth queries called from service level cache reload. we use the function qos_query_state to select a query_state based on caller context - for internal queries, we set a very long timeout. the service level cache reload is called from group0 reload. we want it to have a long timeout instead of the default 5 seconds for auth queries, because we don't have strict latency requirement on the one hand, and on the other hand a timeout exception is undesired in the group0 reload logic and can break group0 on the node. Fixes https://github.com/scylladb/scylladb/issues/25290 backport possible to improve stability - (cherry picked from commit `a1161c156f`) - (cherry picked from commit `3c3dd4cf9d`) - (cherry picked from commit `ad1a5b7e42`) Parent PR: #26180 Closes scylladb/scylladb#26475 * github.com:scylladb/scylladb: service/qos: set long timeout for auth queries on SL cache update auth: add query_state parameter to query functions auth: refactor query_all_directly_granted	2025-10-29 11:40:17 +02:00
Andrei Chekun	c85a0615e0	test.py: fix flaky LDAP tests The issue with current approach is that LDAP server starting on localhost where ports can be busy. This PR migrate using HostRegistry() instead of localhost where no busy ports. This fix has the same idea that was on master #23235. Simple backport is not possible due to huge differences between the branches. Additionally, Minio's host fixed as well, to avoid flakiness. Fixes: #26295 Closes scylladb/scylladb#26518	2025-10-28 13:57:54 +03:00
Andrzej Jackowski	014ecbb2e0	test: add test_anonymous_user to test_raft_service_levels The primary goal of this test is to reproduce scylladb/scylladb#26040 so the fix (`278019c328`) can be backported to older branches. Scenario: connect via CQL as an anonymous user and verify that the `sl:default` scheduling group is used. Before the fix for #26040 `main` scheduling group was incorrectly used instead of `sl:default`. Control connections may legitimately use `sl:driver`, so the test accepts those occurrences while still asserting that regular anonymous queries use `sl:default`. This adds explicit coverage on master. After scylladb#24411 was implemented, some other tests started to fail when scylladb#26040 was unfixed. However, none of the tests asserted this exact behavior. Refs: scylladb/scylladb#26040 Refs: scylladb/scylladb#26581 Closes scylladb/scylladb#26589 (cherry picked from commit `8642629e8e`)	2025-10-27 10:10:51 +01:00
Michael Litvak	28d45bf612	auth: add query_state parameter to query functions add a query_state parameter to several auth functions that execute internal queries. currently the queries use the internal_distributed_query_state() query state, and we maintain this as default, but we want also to be able to pass a query state from the caller. in particular, the auth queries currently use a timeout of 5 seconds, and we will want to set a different timeout when executed in some different context. (cherry picked from commit `3c3dd4cf9d`)	2025-10-09 12:46:53 +00:00
Avi Kivity	67c4d980dd	Revert "Merge 'auth: move passwords::check call to alien thread' from Andrzej Jackowski" This reverts commit `1fd82d32e0`. It causes connection storms to snowball into a node crash via this mechanism: 1. large node suffers mild connection storm 2. password hash requests queue up on alien hash thread 3. incoming hash requests queue faster than the alien thread can retire them. 4. auth latency grows without bounds 5. this encourages the clients to create new connections 6. problem grows Reverting the patch restores the hash stall, but at least prevents node crashes. Fixes #26461 (2025.1) Closes scylladb/scylladb#26462	2025-10-09 11:04:34 +03:00
Benny Halevy	10b473ffcb	test_tablets_merge: test_tablet_split_merge_with_many_tables: reduce number of tables in debug mode As the test hits timeouts in debug mode on aarch64. Fixes #26252 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26303 (cherry picked from commit `b81c6a339b`) Closes scylladb/scylladb#26323	2025-10-06 17:05:00 +03:00
Botond Dénes	b37ddaee90	Merge '[Backport 2025.1] compaction: ensure that all compaction executors are stopped' from Scylladb[bot] Currently, while stopping the compaction_manager, we stop task_manager compaction module and concurrently run compaction_manager::really_do_stop. really_do_stop stops and waits for all task_executors that are kept in compaction_manager::_tasks, but nothing ensures that no more tasks will be added there. Due to leftover tasks, we trigger on_fatal_internal_error. Modify the order of compaction_manager::stop. After the change, we stop compaction tasks in the following order: - abort module abort source; - close module gate in the background; - stop_ongoing_compactions (kept in compaction_manager::_tasks); - wait until module gate is closed. Check module abort source before creating compaction executor and adding it to _tasks. Thanks to the above, we can be sure that: - after module::stop there will be no tasks in _tasks; - compaction_manager::stop aborts all tasks; we don't wait for any whole compaction to finish. Fixes: https://github.com/scylladb/scylladb/issues/25806. Fixes shutdown bug; Needs backports to all version - (cherry picked from commit `17707d0e6b`) - (cherry picked from commit `97c77d7cd5`) Parent PR: #25885 Closes scylladb/scylladb#26222 * github.com:scylladb/scylladb: compaction: move _tasks check compaction: stop compaction module in really_do_stop	2025-10-06 07:04:20 +03:00
Tomasz Grabiec	e71c8a40a9	Merge '[Backport 2025.1] replica: Fix split compaction when tablet boundaries change' from Scylladb[bot] Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) emits merge decision 5) completes merge, before compaction in step 2 finishes After last step, split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where split initiated by the sstable writer failed, split completed, and the unsplit sstable is left in the table dir, causing problems in the restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. All 2025.* versions are vulnerable, so fix must be backported to them. - (cherry picked from commit `0c1587473c`) - (cherry picked from commit `68f23d54d8`) Parent PR: #25690 Closes scylladb/scylladb#25933 * github.com:scylladb/scylladb: replica: Fix split compaction when tablet boundaries change replica: Futurize split_compaction_options() test: fix flakiness of test_missing_data	2025-09-30 19:45:45 +02:00
Aleksandra Martyniuk	9c5ed9586d	compaction: stop compaction module in really_do_stop Currently, compaction::task_manager_module is stopped in compaction_manager::stop, concurrently to really_do_stop. We can't predict the order of the two. Do not set _task_manager_module to nullptr at stop, because compaction_manager::really_do_stop() may be called before the actual shutdown, while other components still try to use it. compaction::task_manager_module does not keep a pointer to compaction_manager, so we won't end up with memory leak. Stop compaction module in really_do_stop, after ongoing compactions are stopped. It's a preparation for further patches. (cherry picked from commit `17707d0e6b`)	2025-09-25 16:18:45 +02:00
Ferenc Szili	38aa7be6a1	load_balancer: fix std::out_of_bounds when decommissioning with empty nodes Consider the following: The tablet load balancer is working on: - node1: an empty node (no tablets) with a large disk capacity - node2: an empty node (no tablets) with a lower disk capacity then node1 - node3: is being decommissioned and contains tablet replicas In load_balancer::make_internode_plan() the initial destination node/shard is selected like this: // Pick best target shard. auto dst = global_shard_id {target, _load_sketch->get_least_loaded_shard(target)}; load_sketch::get_least_loaded_shard(host_id) calls ensure_node() which adds the host to load_sketch's internal hash maps in case the node was not yet seen by load_sketch. Let's assume dst is a shard on node1. Later in load_balancer::make_internode_plan() we will call pick_candidate() to try to find a better destination node than the initial one: // May choose a different source shard than src.shard or different destination host/shard than dst. auto candidate = co_await pick_candidate(nodes, src_node_info, target_info, src, dst, nodes_by_load_dst, drain_skipped); auto source_tablets = candidate.tablets; src = candidate.src; dst = candidate.dst; If pick_candidate() selects some other empty destination (due to larger capacity: node1) node, and that node has not yet been seen by load_sketch (because it was empty), a subsequent call to load_sketch::pick() will search for the node using std::unordered_map::at(), and because the node is not found it will throw a std::out_of_bounds() exception crashing the load balancer. This problem is fixed by changing load_sketch::populate() to initialize its internal maps with all the nodes which populate()'s arguments filter for. Fixes: #26203 Closes scylladb/scylladb#26207 (cherry picked from commit `c6c9c316a7`) Closes scylladb/scylladb#26238	2025-09-25 09:51:23 +03:00
Raphael S. Carvalho	ca1974da91	replica: Fix split compaction when tablet boundaries change Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) emits merge decision 5) completes merge, before compaction in step 2 finishes After last step, split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where split initiated by the sstable writer failed, split completed, and the unsplit sstable is left in the table dir, causing problems in the restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `68f23d54d8`)	2025-09-24 20:27:11 -03:00
Dawid Mędrek	ee800b9682	db/batchlog: Drop batch if table has been dropped If there are pending mutations in the batchlog for a table that has been dropped, we'll keep attempting to replay them but with no success -- `db::no_such_column_family` exceptions will be thrown, and we'll keep trying again and again. To prevent that, we drop the batch in that case just like we do in the case of a non-existing keyspace. A reproducer test has been included in the commit. It fails without the changes in `db/batchlog_manager.cc`, and it succeeds with them. Fixes scylladb/scylladb#24806 Closes scylladb/scylladb#26057 (cherry picked from commit `35f7d2aec6`) Closes scylladb/scylladb#26198	2025-09-24 09:51:29 +03:00
Dawid Mędrek	d6c59c5224	test/perf/tablet_load_balancing.cc: Create nodes within one DC In `789a4a1ce7`, we adjusted the test file to work with the configuration option `rf_rack_valid_keyspaces`. Part of the commit was making the two tables used in the test replicate in separate data centers. Unfortunately, that destroyed the point of the test because the tables no longer competed for resources. We fix that by enforcing the same replication factor for both tables. We still accept different values of replication factor when provided manually by the user (by `--rf1` and `--rf2` commandline options). Scylla won't allow for creating RF-rack-invalid keyspaces, but there's no reason to take away the flexibility the user of the test already has. Fixes scylladb/scylladb#26026 Closes scylladb/scylladb#26115 (cherry picked from commit `0d2560c07f`) Closes scylladb/scylladb#26170	2025-09-24 09:50:39 +03:00
Pavel Emelyanov	4431dd158f	Merge '[Backport 2025.1] compaction/scrub: register sstables for compaction before validation' from Scylladb[bot] compaction/scrub: register sstables for compaction before validation When `scrub --validate` runs, it collects all candidate sstables at the start and validates them one by one in separate compaction tasks. However, scrub in validate mode does not register these sstables for compaction, which allows regular compaction to pick them up and potentially compact them away before validation begins. This leads to scrub failures because the sstables can no longer be found. This patch fixes the issue by first disabling compaction, collecting the sstables, and then registering them for compaction before starting validation. This ensures that the enqueued sstables remain available for the entire duration of the scrub validation task. Fixes #23363 This reported scrub failure occurs on all versions that have the checksum/digest validation feature for uncompressed sstables. So, backport it to older versions. - (cherry picked from commit `84f2e99c05`) - (cherry picked from commit `7cdda510ee`) Parent PR: #26034 Closes scylladb/scylladb#26097 * github.com:scylladb/scylladb: compaction/scrub: register sstables for compaction before validation compaction/scrub: handle exceptions when moving invalid sstables to quarantine	2025-09-24 09:50:05 +03:00
Nadav Har'El	b7f1497efd	alternator: fix bug in combination of AttributeUpdates + ReturnValues In test/alternator/test_returnvalues.py we had tests for the ReturnValues feature on UpdateItem requests - but we only tested UpdateItem requests with the "modern" UpdateExpression, and forgot to test the combination of ReturnValues with the old AttributeUpdates API. It turns out this combination is buggy: when both ReturnValues=ALL_OLD and AttributeUpdates need the previous value of the item, we may wrongly std::move() the value out, and the operation will fail with a strange error: An error occurred (ValidationException) when calling the UpdateItem operation: JSON assert failed on condition 'IsObject()' The fix in this patch is trivial - just move the std::move() to the correct place, after both UpdateExpression and AttributeUpdates handling is done. This patch also includes a reproducing test, which fails before this patch and passes with it - and of course passes on DynamoDB. This test reproduces two cases where the bug happened, as well as one case where it didn't (to make sure we don't regress in what already worked). Fixes #25894 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25900 (cherry picked from commit `3c0032deb4`) Closes scylladb/scylladb#26094	2025-09-24 09:48:49 +03:00
Łukasz Paszkowski	51b13edd2c	compaction_manager: cancel submission timer on drain The `drain` method, cancels all running compactions and moves the compaction manager into the disabled state. To move it back to the enabled state, the `enable` method shall be called. This, however, throws an assertion error as the submission time is not cancelled and re-enabling the manager tries to arm the armed timer. Thus, cancel the timer, when calling the drain method to disable the compaction manager. Fixes https://github.com/scylladb/scylladb/issues/24504 All versions are affected. So it's a good candidate for a backport. Closes scylladb/scylladb#24505 (cherry picked from commit `a9a53d9178`) Closes scylladb/scylladb#24589	2025-09-23 10:30:34 +03:00
Szymon Malewski	21610258df	alternator/expressions.g: Fix antlr3 missing token leak This patch overrides the antlr3 function that allocates the missing tokens that would eventually leak. The override stores these tokens in a vector, ensuring memory is freed whenever the parser is destroyed. Solution is copied from CQL implementation. A unit test to reproduce the issue is added - leak would be reported by ASAN, when running this test in debug mode - the test passed but the leak is discovered when the test file exits. Fixes #25878 Closes scylladb/scylladb#25930 (cherry picked from commit `776f90e2f8`) Closes scylladb/scylladb#26083	2025-09-19 19:13:24 +03:00
Lakshmi Narayanan Sreethar	f26bc7e2e1	compaction/scrub: register sstables for compaction before validation When `scrub --validate` runs, it collects all candidate sstables at the start and validates them one by one in separate compaction tasks. However, scrub in validate mode does not register these sstables for compaction, which allows regular compaction to pick them up and potentially compact them away before validation begins. This leads to scrub failures because the sstables can no longer be found. This patch fixes the issue by first disabling compaction, collecting the sstables, and then registering them for compaction before starting validation. This ensures that the enqueued sstables remain available for the entire duration of the scrub validation task. Fixes #23363 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `7cdda510ee`) Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-09-19 18:30:13 +05:30
Wojciech Mitros	c32229b35c	storage_proxy: send hints to pending replicas Consider the following scenario: - Current replica set is [A, B, C] - write succeeds on [A, B], and a hint is logged for node C - before the hint is replayed, D bootstraps and the token migrates from C to D - hint is replayed to node C while D is pending, but it's too late, since streaming for that token is already done - C is cleaned up, replayed data is lost, and D has a stale copy until next repair. In the scenario we effectively fail to send the hint. This scenario is also more likely to happen with tablets, as it can happen for every tablet migration. This issue is particularly detrimental to materialized views. View updates use hints by default and a specific view update may be sent to just one view replica (when a single base replica has a different row state due to reordering or missed writes). When we lose a hint for such a view update, we can generate a persistent inconsistency between the base and view - ghost rows can appear due to a lost tombstone and rows may be missing in the view due to a lost row update. Such inconsistencies can't be fixed neither by repairing the view or the base table. To handle this, in this patch we add the pending replicas to the list of targets of each hint, even if the original target is still alive. This will cause some updates to be redundant. These updates are probably unavoidable for now, but they shouldn't be too common either. The scenarios for them are: 1. managing to send the hint to the source of a migrating replica before streaming that its token - the write will arrive on the pending replica anyway in streaming 2. the hint target not being the source of the migration - if we managed to apply the original write of the hint to the actual source of the migration, the pending replica will get it during streaming 3. sending the same hint to many targets at a similar time - while sending to each target, we'll see the same pending replica for the hint so we'll send it multiple times 4. possible retries where even though the hint was successfully sent to the main target, we failed to send it to the pending replica, so we need to retry the entire write This patch handles both tablet migrations and tablet rebuilds. In the future, for tablet migrations, we can avoid sending the hint to pending replias if the hint target is not the source fo the migration, which would allow us to avoid the redundant writes 2 and 3. For rack-aware RF, this will be as simple as checking whether the replicas are in the same rack. We also add a test case reproducing the issue. Co-Authored-By: Raphael S. Carvalho <raphaelsc@scylladb.com> Fixes https://github.com/scylladb/scylladb/issues/19835 Closes scylladb/scylladb#25590 (cherry picked from commit `10b8e1c51c`) Closes scylladb/scylladb#25880	2025-09-17 08:06:58 +02:00
Wojciech Mitros	90536040ad	mv: delete previously undetected ghost rows in PRUNE MATERIALIZED VIEW statement The PRUNE MATERIALIZED VIEW statement is supposed to remove ghost rows from the view. Ghost rows are rows in the view with no corresponding row in the base table. Before this patch, only rows whose primary key columns of the base table had different values than any of the base rows were treated as ghost rows by the PRUNE statement. However, view rows which have a column in their primary key that's not in the base primary can also be ghost rows if this column has a different value than the base row with the same values of remaining primary key columns. That's because these rows won't be deleted unless we change value of this column in the base table to this specific value. In this patch we add a check for this column in the PRUNE MATERIALIZED VIEW logic. If this column isn't the same in the base table and the view, these rows are also deleted. Fixes https://github.com/scylladb/scylladb/issues/25655 Closes scylladb/scylladb#25720 (cherry picked from commit `1f9be235b8`) Closes scylladb/scylladb#25954	2025-09-16 16:00:02 +02:00
Patryk Jędrzejczak	e10cd81dfb	test: cluster: deflake consistency checks after decommission In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). Therefore, `check_token_ring_and_group0_consistency` called just after decommission might fail when the decommissioned node is still in group 0 (as a non-voter). We deflake all tests that call `check_token_ring_and_group0_consistency` after decommission in this commit. Fixes #25809 (cherry picked from commit `bb9fb7848a`)	2025-09-11 15:25:36 +02:00
Patryk Jędrzejczak	b32d90b3f5	test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). `wait_for_token_ring_and_group0_consistency` doesn't handle such a case; it only handles cases where the token ring is updated later. We fix this in this commit. We rely on the new implementation of `wait_for_token_ring_and_group0_consistency` in the following commit to fix flakiness of some tests. We also update the obsolete docstring in this commit. (cherry picked from commit `e41fc841cd`)	2025-09-11 15:25:32 +02:00
Dawid Mędrek	cf376778e1	test/perf: Adjust tablet_load_balancing.cc to RF-rack-validity We modify the logic to make sure that all of the keyspaces that the test creates are RF-rack-valid. For that, we distribute the nodes across two DCs and as many racks as the provided replication factor. That may have an effect on the load balancing logic, but since this is a performance test and since tablet load balancing is still taking place, it should be acceptable. This commit also finishes work in adjusting perf tests to pass with the `rf_rack_valid_keyspaces` configuration option enabled. The remaining tests either don't attempt to create keyspaces or they already create RF-rack-valid keyspaces. We don't need to explicitly enable the configuration option. It's already enabled by default by `cql_test_config`. The reason why we haven't run into any issue because of that is that performance tests are not part of our CI. Fixes scylladb/scylladb#25127 Closes scylladb/scylladb#25728 (cherry picked from commit `789a4a1ce7`) Closes scylladb/scylladb#25920	2025-09-10 11:02:09 +03:00
Sergey Zolotukhin	0d013d6c9e	gossiper: fix empty initial local node state This change removes the addition of an empty state to `_endpoint_state_map`. Instead, a new state is created locally and then published via replicate, avoiding the issue of an empty state existing in `_endpoint_state_map` before the preemption point. Since this resolves the issue tested in `test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed. Fixes: scylladb/scylladb#25831 (cherry picked from commit `b34d543f30`)	2025-09-09 01:04:51 +02:00
Sergey Zolotukhin	398d1d0560	gossiper: add test for a race condition in start_gossiping This change adds a test for a race condition in `start_gossiping` that can lead to an empty self state sent in `gossip_get_endpoint_states_response`. Test for scylladb/scylladb#25831 (cherry picked from commit `775642ea23`)	2025-09-09 01:04:51 +02:00
Sergey Zolotukhin	c0e757738a	gossiper: check for a race condition in `do_apply_state_locally` In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change 1. adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. 2. Removes xfail from the test_gossiper_race test since the issue is now fixed. 3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID. This re-applies the commit `13392a40d4` that was reverted in `46aa59fe49`, after fixing the issues that caused the CI to fail. Fixes: scylladb/scylladb#25702 Fixes: scylladb/scylladb#25621 Ref: scylladb/scylla-enterprise#5613 (cherry picked from commit `f08df7c9d7`)	2025-09-09 01:03:43 +02:00
Emil Maskovsky	695f345821	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. This re-applies the commit `5dac4b38fb` that was reverted in `dc44fca67c`, but modified to relax the check from "on_internal_error" to a just warning log. The more strict can be re-introduced later once we are sure that all remaining problems are resolved and it will not break the CI. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721 (cherry picked from commit `28e0f42a83`)	2025-09-09 01:03:43 +02:00
Avi Kivity	1fd82d32e0	Merge 'auth: move passwords::check call to alien thread' from Andrzej Jackowski Analysis of customer stalls revealed that the function `detail::hash_with_salt` (invoked by `passwords::check`) often blocks the reactor. Internally, this function uses the external `crypt_r` function to compute password hashes, which is CPU-intensive. This PR addresses the issue in two ways: 1) `sha-512` is now the only password hashing scheme for new passwords (it was already the common-case). 2) `passwords::check` is moved to a dedicated alien thread. Regarding point 1: before this change, the following hashing schemes were supported by `identify_best_supported_scheme()`: bcrypt_y, bcrypt_a, SHA-512, SHA-256, and MD5. The reason for this was that the `crypt_r` function used for password hashing comes from an external library (currently `libxcrypt`), and the supported hashing algorithms vary depending on the library in use. However: - The bcrypt schemes never worked properly because their prefixes lack the required round count (e.g. `$2y$` instead of `$2y$05$`). Moreover, bcrypt is slower than SHA-512, so it not good idea to fix or use it. - SHA-256 and SHA-512 both belong to the SHA-2 family. Libraries that support one almost always support the other, so it’s very unlikely to find SHA-256 without SHA-512. - MD5 is no longer considered secure for password hashing. Regarding point 2: the `passwords::check` call now runs on a shared alien thread created at database startup. An `std::mutex` synchronizes that thread with the shards. In theory this could introduce a frequent lock contention, but in practice each shard handles only a few hundred new connections per second—even during storms. There is already `_conns_cpu_concurrency_semaphore` in `generic_server` limits the number of concurrent connection handlers. Fixes https://github.com/scylladb/scylladb/issues/24524 Backport not needed, as it is a new feature. Closes scylladb/scylladb#24924 * github.com:scylladb/scylladb: main: utils: add thread names to alien workers auth: move passwords::check call to alien thread test: wait for 3 clients with given username in test_service_level_api auth: refactor password checking in password_authenticator auth: make SHA-512 the only password hashing scheme for new passwords auth: whitespace change in identify_best_supported_scheme() auth: require scheme as parameter for `generate_salt` auth: check password hashing scheme support on authenticator start (cherry picked from commit `c762425ea7`)	2025-09-07 15:47:34 +03:00
Pavel Emelyanov	235fe9f83e	Revert "test/gossiper: add reproducible test for race condition during node decommission" This reverts commit `4b6097ab5b` because parent PR had been reverted as per #25803	2025-09-05 10:05:55 +03:00
Calle Wilund	9631beeafd	system_keyspace: Prune dropped tables from truncation on start/drop Fixes #25683 Once a table drop is complete, there should be no reason to retain truncation records for it, as any replay should skip mutations anyway (no CF), and iff we somehow resurrect a dropped table, this replay-resurrected data is the least problem anyway. Adds a prune phase to the startup drop_truncation_rp_records run, which ignores updating, and instead deletes records for non-existant tables (which should patch any existing servers with lingering data as well). Also does an explicit delete of records on actual table DROP, to ensure we don't grow this table more than needed even in long uptime nodes. Small unit test included. Closes scylladb/scylladb#25699 (cherry picked from commit `bc20861afb`) Closes scylladb/scylladb#25811	2025-09-04 08:41:30 +03:00
Pavel Emelyanov	1f6c11d7f3	Merge '[Backport 2025.1] drop table: fix crash on drop table with concurrent cleanup' from Scylladb[bot] Consider the following scenario: - A tablet is migrated away from a shard - The tablet cleanup stage closes the storage group's async_gate - A drop table runs truncate which attempts to disable compaction on the tablet with its gate closed. This fails, because table::parallel_foreach_compaction_group() ultimately calls storage_group_manager::parallel_foreach_storage_group() which will not disable compaction if it can't hold the storage group's gate - Truncate calls table::discard_sstables() which checks if the compaction has been disabled, and because it hasn't, it then runs on_internal_error() with "compaction not disabled on table ks.cf during TRUNCATE" which causes a crash Fixes: #25706 This needs to be backported to all supported versions with tablets - (cherry picked from commit `a0934cf80d`) - (cherry picked from commit `1b8a44af75`) Parent PR: #25708 Closes scylladb/scylladb#25783 * github.com:scylladb/scylladb: test: reproducer and test for drop with concurrent cleanup truncate: check for closed storage group's gate in discard_sstables	2025-09-04 08:41:04 +03:00
Pavel Emelyanov	bf0d1cf272	Merge '[Backport 2025.1] encryption_at_rest_test: Improve exception handling and increase verbosity in fake proxy' from Nikos Dragazis PRs included: * #22810 (encryption_at_rest_test/encryption: Add some verbosity etc to help diagnose test run issues) * #23778 (encryption_at_rest_test: Make fake_proxy read/write loop noexcept) Backporting them together because the latter needs the former to avoid conflicts, but the former cannot be backported individually due to not fixing an issue. * (cherry picked from commit `83aa66da1a`) * (cherry picked from commit `5905c19ab4`) * (cherry picked from commit `00263aa57a`) * (cherry picked from commit `4a44651fce`) Refs #22628. Fixes #23774. Closes scylladb/scylladb#25774 * github.com:scylladb/scylladb: encryption_at_rest_test: Make fake_proxy read/write loop noexcept gcp/aws kms: Promote service_error to recoverable + use malformed_response_error encryption_at_rest_test: Add verbosity + earlier stream close to proxy encryption: Add exception handler to context init (for tests)	2025-09-03 20:51:45 +03:00
Pavel Emelyanov	dba40c7a38	Merge '[Backport 2025.1] service/qos: Modularize service level controller to avoid invalid access to auth::service' from Scylladb[bot] Move management over effective service levels from `service_level_controller` to a new dedicated type -- `auth_integration`. Before these changes, it was possible for the service level controller to try to access `auth::service` after it was deinitialized. For instance, it could happen when reloading the cache. That HAS happened as described in the following issue: scylladb/scylladb#24792. Although the problem might have been mitigated or even resolved in scylladb/scylladb@10214e13bd, it's not clear how the service will be used in the future. It's best to prevent similar bugs than trying to fix them later on. The logic responsible for preventing to access an uninitialized `auth::service` was also either non-existent, complex, or non-sufficient. To prevent accessing `auth::service` by the service level controller, we extract the relevant portion of the code to a separate entity -- `auth_integration`. It's an internal helper type whose sole purpose is to manage effective service levels. Thanks to that, we were able to nest the lifetime of `auth_integration` within the lifetime of `auth::service`. It's now impossible to attempt to dereference it while it's uninitialized. If a bug related to an invalid access is spotted again, though, it might also be easier to debug it now. There should be no visible change to the users of the interface of the service level controller. We strived to make the patch minimal, and the only affected part of the logic should be related to how `auth::service` is accessed. The relevant portion of the initialization and deinitialization flow: (a) Before the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. 2. Initialize other services. 3. Initialize and start `auth::service`. 4. (work) 5. Stop and deinitialize `auth::service`. 6. Deinitialize other services. 7. Deinitialize `service_level_controller`. (b) After the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. () 2. Initialize other services. 3. Initialize and start `auth::service`. 4. Initialize `auth_integration`. Register it in `service_level_controller`. 5. (work) 6. Unregister `auth_integration` in `service_level_controller` and deinitialize it. 7. Stop and deinitialize `auth::service`. 8. Deinitialize other services. 9. Deinitialize `service_level_controller`. (): The reference to `auth::service` in `service_level_controller` is still necessary. We need to access the service when dropping a distributed service level. Although it would be best to cut that link between the service level controller and `auth::service` too, effectively separating the entities, it would require more work, so we leave it as-is for now. It shouldn't prove problematic as far as accessing an uninitialized service goes. Trying to drop a service level at the point when we're de-initializing auth should be impossible. For more context, see the function `drop_distributed_service_level` in `service_level_controller`. A trivial test has been included in the PR. Although its value is questionable as we only try to reload the service level cache at a specific moment, it's probably the best we can deliver to provide a reproducer of the issue this patch is resolving. Fixes scylladb/scylladb#24792 Backport: The impact of the bug was minimal as it only affected the shutdown. However, since CI is failing because of it, let's backport the change to all supported versions. - (cherry picked from commit `7d0086b093`) - (cherry picked from commit `34afb6cdd9`) - (cherry picked from commit `e929279d74`) - (cherry picked from commit `dd5a35dc67`) - (cherry picked from commit `fc1c41536c`) Parent PR: #25478 Closes scylladb/scylladb#25750 * github.com:scylladb/scylladb: service/qos: Move effective SL cache to auth_integration service/qos: Add auth::service to auth_integration service/qos: Reload effective SL cache conditionally service/qos: Add gate to auth_integration service/qos: Introduce auth_integration	2025-09-03 20:21:12 +03:00
Ferenc Szili	d8dd7e8eba	test: reproducer and test for drop with concurrent cleanup This change adds a reproducer and test for issue #25706 (cherry picked from commit `1b8a44af75`)	2025-09-03 16:12:49 +02:00
Emil Maskovsky	4b6097ab5b	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721 Backport: The test is primarily for an issue found in 2025.1, so it needs to be backported to all the 2025.x branches. Closes scylladb/scylladb#25685 (cherry picked from commit `5dac4b38fb`) Closes scylladb/scylladb#25779	2025-09-03 13:35:37 +02:00
Raphael S. Carvalho	5dd35e2de5	replica: Fix take_storage_snapshot() running concurrently to merge completion Some background: When merge happens, a background fiber wakes up to merge compaction groups of sibling tablets into main one. It cannot happen when rebuilding the storage group list, since token metadata update is not preemptable. So a storage group, post merge, has the main compaction group and two other groups to be merged into the main. When the merge happens, those two groups are empty and will be freed. Consider this scenario: 1) merge happens, from 2 to 1 tablet 2) produces a single storage group, containing main and two other compaction groups to be merged into main. 3) take_storage_snapshot(), triggered by migration post merge, gets a list of pointer to all compaction groups. 4) t__s__s() iterates first on main group, yields. 5) background fiber wakes up, moves the data into main and frees the two groups 6) t__s__s() advances to other groups that are now freed, since step 5. 7) segmentation fault In addition to memory corruption, there's also a potential for data to escape the iteration in take_storage_snapshot(), since data can be moved across compaction groups in background, all belonging to the same storage group. That could result in data loss. Readers should all operate on storage group level since it can provide a view on all the data owned by a tablet replica. The movement of sstable from group A to B is atomic, but iteration first on A, then later on B, might miss data that was moved from B to A, before the iteration reached B. By switching to storage group in the interface that retrieves groups by token range, we guarantee that all data of a given replica can be found regardless of which compaction group they sit on. Fixes #23162. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#24058 (cherry picked from commit `28056344ba`) Closes scylladb/scylladb#25337	2025-09-03 13:59:32 +03:00
Calle Wilund	39242c3d5a	commitlog: Ensure segment deletion is re-entrant Fixes #25709 If we have large allocations, spanning more than one segment, and the internal segment references from lead to secondary are the only thing keeping a segment alive, the implicit drop in discard_unused_segments and orphan_all can cause a recursive call to discard_unused_segments, which in turn can lead to vector corruption/crash, or even double free of segment (iterator confusion). Need to separate the modification of the vector (_segments) from actual releasing of objects. Using temporaries is the easiest solution. To further reduce recursion, we can also do an early clear of segment dependencies in callbacks from segment release (cf release). Closes scylladb/scylladb#25719 (cherry picked from commit `cc9eb321a1`) Closes scylladb/scylladb#25754	2025-09-03 06:55:59 +03:00
Dawid Mędrek	7b085f9198	service/qos: Add auth::service to auth_integration The new service, `auth_integration`, has taken over the responsibility over managing effective service levels from `service_level_controller`. However, before these changes, it still accessed `auth::service` via the service level controller. Let's change that. Note that we also remove a check that `auth::service` has been initialized. It's not necessary anymore because the lifetime of `auth_integration` is strictly nested within the lifetime of `auth::service`. In actuality, `service_level_controller` should lose its reference to `auth::service` completely. All of the management over effective service levels has already been moved to `auth_integration`. However, the referernce is still needed when dropping a distributed service level because we need to update the corresponding attribute for relevant roles. That should not lead to invalid accesses, though. Dropping a service level should not be possible when `auth::service` is not initialized. (cherry picked from commit `dd5a35dc67`)	2025-09-02 19:43:56 +02:00
Dawid Mędrek	c533c4bf96	service/qos: Reload effective SL cache conditionally Since `service_level_controller` outlives `auth_integration`, it may happen that we try to access it when it has already been deinitialized. To prevent that, we only try to reload or clear the effective service level cache when the object is still alive. These changes solve an existing problem with an invalid memory access. For more context, see issue scylladb/scylladb#24792. We provide a reproducer test that consistently fails before these changes but passes after them. Fixes scylladb/scylladb#24792 (cherry picked from commit `e929279d74`)	2025-09-02 19:43:51 +02:00
Dawid Mędrek	c904f6a0fc	service/qos: Introduce auth_integration We introduce a new type, `auth_integration`, that will be used internally by `service_level_controller`. Its purpose is to take over the responsibility over managing effective service levels. The main problem of the current implementation of service level controller is its dependency on `auth::service` whose lifetime is strictly nested within the lifetime of service level controller. That may and already have led to invalid memory accesses; for an example, see issue scylladb/scylladb#24792. Our strategy is to split service level controller into smaller parts and ensure that we access `auth::service` only when it's valid to do so. This commit is the first step towards that. We don't change anything in the logic yet, just add the new type. Further adjustments will be made in following commits. (cherry picked from commit `7d0086b093`)	2025-09-02 19:25:23 +02:00
Nadav Har'El	fc738ee9ce	Merge '[backport 2025.1] token_range_vector: fragment' from Avi Kivity token_range_vector is a sequence of intervals of tokens. It is used to describe vnodes or token ranges owned by shards. Since tokens are bloated (16 bytes instead of 8), and intervals are bloated (40 byte of overhead instead of 8), and since we have plenty of token ranges, such vectors can exceed our allocation unit of 128 kB and cause allocation stalls. This series fixes that by first generalizing some helpers and then changing token_range_vector to use chunked_vector. Although this touches IDL, there is no compatibility problem since the encoding for vector and chunked_vector are identical. There is no performance concern since token_range_vector is never used on any hot path (hot paths always contain a partition key). Fixes #3335. Fixes #24115. Fixes #24156 Backport notes: Due to compiler limitations in this toolchain, the template template parameters were replaced by elaborate template metaprogramming, see patch 'partition_range_compat: generalize wrap/unwrap helpers'. Closes scylladb/scylladb#25704 * github.com:scylladb/scylladb: dht: fragment token_range_vector partition_range_compat: generalize wrap/unwrap helpers utils: chunked_vector: add swap() method utils: chunked_vector: add range insert() overloads	2025-09-01 19:07:54 +03:00
Calle Wilund	57f3f1465e	encryption_at_rest_test: Make fake_proxy read/write loop noexcept Fixes #23774 Test code falls into same when_all issue as http client did. Avoid passing exceptions through this, and instead catch and report in worker lambda. Closes scylladb/scylladb#23778 (cherry picked from commit `4a44651fce`)	2025-09-01 16:08:33 +03:00
Calle Wilund	e52675415f	encryption_at_rest_test: Add verbosity + earlier stream close to proxy Refs #22628 Adds some verbosity to track issues with the network proxy used to test EAR connector difficulties. Also adds an earlier close in input stream to help network usage. Note: This is a diagnostic helper. Still cannot repro the issue above. (cherry picked from commit `5905c19ab4`)	2025-09-01 16:06:57 +03:00
Taras Veretilnyk	bd4a80301c	keys: from_nodetool_style_string don't split single partition keys Users with single-column partition keys that contain colon characters were unable to use certain REST APIs and 'nodetool' commands, because the API split key by colon regardless of the partition key schema. Affected commands: - 'nodetool getendpoints' - 'nodetool getsstables' Affected endpoints: - '/column_family/sstables/by_key' - '/storage_service/natural_endpoints' Refs: #16596 - This does not fully fix the issue, as users with compound keys will face the issue if any column of the partition key contains a colon character. Closes scylladb/scylladb#24829 Closes scylladb/scylladb#25554	2025-09-01 09:03:12 +03:00
Pavel Emelyanov	625306fec3	Merge '[Backport 2025.1] cql3: Warn when creating RF-rack-invalid keyspace' from Scylladb[bot] Although RF-rack-valid keyspaces are not universally enforced yet (they're governed by the configuration option `rf_rack_valid_keyspaces`), we'd like to encourage the user to abide by the restriction. To that end, we're introducing a warning when creating or altering a keyspace. If the configuration option is disabled, but the user is trying to create an RF-rack-invalid keyspace, they'll receive a warning. If the option is turned off, we will also log all of the RF-rack-invalid keyspaces at start-up. We provide validation tests. Fixes scylladb/scylladb#23330 Backport: we'd like to encourage the user to abide by the restriction even when they don't enforce it to make it easier in the future to adjust the schema when there's no way to disable it anymore. Because of that, we'd like to backport it to all relevant versions, starting with 2025.1. - (cherry picked from commit `60ea22d887`) - (cherry picked from commit `af8a3dd17b`) - (cherry picked from commit `837d267cbf`) Parent PR: #24785 Closes scylladb/scylladb#25633 * github.com:scylladb/scylladb: main: Log RF-rack-invalid keyspaces at startup cql3/statements: Fix indentation cql3: Warn when creating RF-rack-invalid keyspace	2025-09-01 09:02:34 +03:00
Avi Kivity	951841f71f	utils: chunked_vector: add swap() method Following std::vector(), we implement swap(). It's a simple matter of swapping all the contents. A unit test is added. (cherry picked from commit `13a75ff835`)	2025-08-27 14:24:35 +03:00

1 2 3 4 5 ...

8531 Commits