scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-30 05:07:05 +00:00

Author	SHA1	Message	Date
Botond Dénes	6fe8f98add	Merge '[Backport 2025.3] compaction/scrub: register sstables for compaction before validation' from Scylladb[bot] compaction/scrub: register sstables for compaction before validation When `scrub --validate` runs, it collects all candidate sstables at the start and validates them one by one in separate compaction tasks. However, scrub in validate mode does not register these sstables for compaction, which allows regular compaction to pick them up and potentially compact them away before validation begins. This leads to scrub failures because the sstables can no longer be found. This patch fixes the issue by first disabling compaction, collecting the sstables, and then registering them for compaction before starting validation. This ensures that the enqueued sstables remain available for the entire duration of the scrub validation task. Fixes #23363 This reported scrub failure occurs on all versions that have the checksum/digest validation feature for uncompressed sstables. So, backport it to older versions. - (cherry picked from commit `84f2e99c05`) - (cherry picked from commit `7cdda510ee`) Parent PR: #26034 Closes scylladb/scylladb#26099 * github.com:scylladb/scylladb: compaction/scrub: register sstables for compaction before validation compaction/scrub: handle exceptions when moving invalid sstables to quarantine	2025-09-25 09:27:31 +03:00
Pavel Emelyanov	702eda371b	s3: Add metrics to show S3 prefetch bytes The chunked download source sends large GET requests and then consumes data as it arrives. Sometimes it can stop reading from the socket early and drop the in-flight data. The existing read-bytes metrics show only the number of consumed bytes, but we also want to know the number of requested bytes. Refs #25770 (accounting of read-bytes) Fixes #25876 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25877 (cherry picked from commit `6fb66b796a`) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26070	2025-09-25 09:26:41 +03:00
Patryk Jędrzejczak	d653e710ba	test: deflake driver reconnections in the recovery procedure tests All three tests could hit https://github.com/scylladb/python-driver/issues/295. We use the standard workaround for this issue: reconnecting the driver after the rolling restart, and before sending any requests to local tables (that can fail if the driver closes a connection to the node that restarted last). All three tests perform two rolling restarts, but the latter ones already have the workaround. Fixes #26005 Closes scylladb/scylladb#26056 (cherry picked from commit `a56115f77b`) Closes scylladb/scylladb#26199	2025-09-24 11:52:00 +02:00
Tomasz Grabiec	b3f4bef36b	tablets: scheduler: Run plan-maker in maintenance scheduling group Currently, it runs in the gossiper scheduling group, because it's invoked by the topology coordinator. That scheduling group has the same amount of shares as user workload. Plan-making can take significant amount of time during rebalancing, and we don't want that to impact user workload which happens to run on the same shard. Reduce impact by running in the maintenance scheduling group. Fixes #26037 Closes scylladb/scylladb#26046 (cherry picked from commit `ddbcea3e2a`) Closes scylladb/scylladb#26168	2025-09-22 15:20:01 +02:00
Pavel Emelyanov	b4598031e6	s3: Fix chunked download source metrics calculations In S3 client both read and write metrics have three counters -- number of requests made, number of bytes processed and request latency. In most of the cases all three counters are updated at once -- upon response arrival. However, in case of chunked download source this way of accounting metrics is misleading. In this code the request is made once, and then the obtained bytes are consumed eventually as the data arrive. Currently, each time a new portion of data is read from the socket the number of read requests is incremented. That's wrong, the request is made once, and this counter should also be incremented once, not for every data buffer that arrived in response. Same for read request latency -- it's "added" for every data buffer that arrives, but it's a lengthy process, the _request_ latency should be accounted once per response. Maybe later we'll want to have "data latency" metrics as well, but for what we have now it's request latency. The number of read bytes is accounted properly, so not touched here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25770 (cherry picked from commit `9deea3655f`) Closes scylladb/scylladb#26145	2025-09-22 07:35:03 +03:00
Asias He	04776ad19e	streaming: Enclose potential throws in try block and ensure sink close before logging - Move the initialization of log_done inside the try block to catch any exceptions it may throw. - Relocate the failure warning log after sink.close() cleanup to guarantee sink.close() is always called before logging errors. Refs #25497 Closes scylladb/scylladb#25591 (cherry picked from commit `b12404ba52`) Closes scylladb/scylladb#25903	2025-09-21 18:11:43 +03:00
Nadav Har'El	d61bce8685	alternator: fix bug in combination of AttributeUpdates + ReturnValues In test/alternator/test_returnvalues.py we had tests for the ReturnValues feature on UpdateItem requests - but we only tested UpdateItem requests with the "modern" UpdateExpression, and forgot to test the combination of ReturnValues with the old AttributeUpdates API. It turns out this combination is buggy: when both ReturnValues=ALL_OLD and AttributeUpdates need the previous value of the item, we may wrongly std::move() the value out, and the operation will fail with a strange error: An error occurred (ValidationException) when calling the UpdateItem operation: JSON assert failed on condition 'IsObject()' The fix in this patch is trivial - just move the std::move() to the correct place, after both UpdateExpression and AttributeUpdates handling is done. This patch also includes a reproducing test, which fails before this patch and passes with it - and of course passes on DynamoDB. This test reproduces two cases where the bug happened, as well as one case where it didn't (to make sure we don't regress in what already worked). Fixes #25894 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25900 (cherry picked from commit `3c0032deb4`) Closes scylladb/scylladb#26096	2025-09-19 19:25:15 +03:00
Lakshmi Narayanan Sreethar	6e94a73fd4	compaction/scrub: register sstables for compaction before validation When `scrub --validate` runs, it collects all candidate sstables at the start and validates them one by one in separate compaction tasks. However, scrub in validate mode does not register these sstables for compaction, which allows regular compaction to pick them up and potentially compact them away before validation begins. This leads to scrub failures because the sstables can no longer be found. This patch fixes the issue by first disabling compaction, collecting the sstables, and then registering them for compaction before starting validation. This ensures that the enqueued sstables remain available for the entire duration of the scrub validation task. Fixes #23363 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `7cdda510ee`) Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-09-19 18:38:54 +05:30
Lakshmi Narayanan Sreethar	20501b2ea3	compaction/scrub: handle exceptions when moving invalid sstables to quarantine In validate mode, scrub moves invalid sstables into the quarantine folder. If validation fails because the sstable files are missing from disk, there is nothing to move, and the quarantine step will throw an exception. Handle such exceptions so scrub can return a proper compaction_result instead of propagating the exception to the caller. This will help the testcase for #23363 to reliably determine if the scrub has failed or not. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `84f2e99c05`)	2025-09-19 18:35:31 +05:30
Szymon Malewski	cca78c6568	alternator/expressions.g: Fix antlr3 missing token leak This patch overrides the antlr3 function that allocates the missing tokens that would eventually leak. The override stores these tokens in a vector, ensuring memory is freed whenever the parser is destroyed. Solution is copied from CQL implementation. A unit test to reproduce the issue is added - leak would be reported by ASAN, when running this test in debug mode - the test passed but the leak is discovered when the test file exits. Fixes #25878 Closes scylladb/scylladb#25930 (cherry picked from commit `776f90e2f8`) Closes scylladb/scylladb#26085	2025-09-18 07:50:31 +03:00
Sergey Zolotukhin	8568a8a303	raft: disable caching for raft log. This change disables caching for raft log table due to the following reasons: * Immediate reason is a deficiency in handling emerging range tombstones in the cache, which causes stalls. * Long-term reason is that sequential reads from the raft log do not benefit from the cache, making it better to bypass it to free up space and avoid stalls. Fixes scylladb/scylladb#26027 Closes scylladb/scylladb#26031 (cherry picked from commit `2640b288c2`) Closes scylladb/scylladb#26074	2025-09-18 07:50:05 +03:00
Pavel Emelyanov	1310e61040	Merge '[Backport 2025.3] gossiper: ensure gossiper operations are executed in gossiper scheduling group' from Scylladb[bot] Sometimes gossiper operations invoked from storage_service and other components run under a non-gossiper scheduling group. If these operations acquire gossiper locks, priority inversion can occur: higher-priority gossiper tasks may wait behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness or even failures. This patch ensures that gossiper operations requiring locks on gossiper structures are explicitly executed in the gossiper scheduling group. To help detect similar issues in the future, a warning is logged whenever a gossiper lock is acquired under a non-gossiper scheduling group. Fixes scylladb/scylladb#25907 Refs: scylladb/scylladb#25702 Backport: this patch fixes an issue with gossiper operations scheduling group, that might affect topology operations, therefore backport is needed to 2025.1, 2025.2, 2025.3 - (cherry picked from commit `340413e797`) - (cherry picked from commit `6c2a145f6c`) Parent PR: #25981 Closes scylladb/scylladb#26073 * https://github.com/scylladb/scylladb: gossiper: ensure gossiper operations are executed in gossiper scheduling group gossiper: fix wrong gossiper instance used in `force_remove_endpoint`	2025-09-18 07:49:49 +03:00
Aleksandra Martyniuk	3f345615a5	replica: lower severity of failure log Flush failure with seastar::named_gate_closed_exception is expected if a respective compaction group was already stopped. Lower the severity of a log in dirty_memory_manager::flush_one for this exception. Fixes: https://github.com/scylladb/scylladb/issues/25037. Closes scylladb/scylladb#25355 (cherry picked from commit `a10e241228`) Closes scylladb/scylladb#25650	2025-09-18 07:49:28 +03:00
Sergey Zolotukhin	3bf986170b	gossiper: ensure gossiper operations are executed in gossiper scheduling group Sometimes gossiper operations invoked from storage_service and other components run under a non-gossiper scheduling group. If these operations acquire gossiper locks, priority inversion can occur: higher-priority gossiper tasks may wait behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness or even failures. This patch ensures that gossiper operations requiring locks on gossiper structures are explicitly executed in the gossiper scheduling group. To help detect similar issues in the future, a warning is logged whenever a gossiper lock is acquired under a non-gossiper scheduling group. Fixes scylladb/scylladb#25907 (cherry picked from commit `6c2a145f6c`)	2025-09-17 11:22:31 +00:00
Sergey Zolotukhin	d585211c4a	gossiper: fix wrong gossiper instance used in `force_remove_endpoint` `gossiper::force_remove_endpoint` is always executed on shard 0 using `invoke_on`. Since each shard has its own `gossiper` instance, if `force_remove_endpoint` is called from a shard other than shard 0, `my_host_id()` may be invoked on the wrong `gossiper` object. This results in undefined behavior due to unsynchronized access to resources on another shard. (cherry picked from commit `340413e797`)	2025-09-17 11:22:31 +00:00
Wojciech Mitros	246fcb8b6a	mv: delete previously undetected ghost rows in PRUNE MATERIALIZED VIEW statement The PRUNE MATERIALIZED VIEW statement is supposed to remove ghost rows from the view. Ghost rows are rows in the view with no corresponding row in the base table. Before this patch, only rows whose primary key columns of the base table had different values than any of the base rows were treated as ghost rows by the PRUNE statement. However, view rows which have a column in their primary key that's not in the base primary can also be ghost rows if this column has a different value than the base row with the same values of remaining primary key columns. That's because these rows won't be deleted unless we change value of this column in the base table to this specific value. In this patch we add a check for this column in the PRUNE MATERIALIZED VIEW logic. If this column isn't the same in the base table and the view, these rows are also deleted. Fixes https://github.com/scylladb/scylladb/issues/25655 Closes scylladb/scylladb#25720 (cherry picked from commit `1f9be235b8`) Closes scylladb/scylladb#25956	2025-09-15 12:26:02 +02:00
Jenkins Promoter	93da39020f	Update ScyllaDB version to: 2025.3.2	2025-09-15 11:12:31 +03:00
Jenkins Promoter	04b0d7b629	Update pgo profiles - aarch64	2025-09-15 05:35:35 +03:00
Jenkins Promoter	92d0b05bd0	Update pgo profiles - x86_64	2025-09-15 05:04:20 +03:00
Patryk Jędrzejczak	b5cbe0d50a	Merge '[Backport 2025.3] test: cluster: deflake consistency checks after decommission' from Scylladb[bot] In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). Therefore, `check_token_ring_and_group0_consistency` called just after decommission might fail when the decommissioned node is still in group 0 (as a non-voter). We deflake all tests that call `check_token_ring_and_group0_consistency` after decommission in this PR. Fixes #25809 This PR improves CI stability and changes only tests, so it should be backported to all supported branches. - (cherry picked from commit `e41fc841cd`) - (cherry picked from commit `bb9fb7848a`) Parent PR: #25927 Closes scylladb/scylladb#25963 * https://github.com/scylladb/scylladb: test: cluster: deflake consistency checks after decommission test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency	2025-09-11 13:01:54 +02:00
Patryk Jędrzejczak	2ce95c429f	test: cluster: deflake consistency checks after decommission In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). Therefore, `check_token_ring_and_group0_consistency` called just after decommission might fail when the decommissioned node is still in group 0 (as a non-voter). We deflake all tests that call `check_token_ring_and_group0_consistency` after decommission in this commit. Fixes #25809 (cherry picked from commit `bb9fb7848a`)	2025-09-10 17:49:12 +00:00
Patryk Jędrzejczak	b4e64e5adf	test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). `wait_for_token_ring_and_group0_consistency` doesn't handle such a case; it only handles cases where the token ring is updated later. We fix this in this commit. We rely on the new implementation of `wait_for_token_ring_and_group0_consistency` in the following commit to fix flakiness of some tests. We also update the obsolete docstring in this commit. (cherry picked from commit `e41fc841cd`)	2025-09-10 17:49:12 +00:00
Dawid Mędrek	3dac49c62f	test/perf: Adjust tablet_load_balancing.cc to RF-rack-validity We modify the logic to make sure that all of the keyspaces that the test creates are RF-rack-valid. For that, we distribute the nodes across two DCs and as many racks as the provided replication factor. That may have an effect on the load balancing logic, but since this is a performance test and since tablet load balancing is still taking place, it should be acceptable. This commit also finishes work in adjusting perf tests to pass with the `rf_rack_valid_keyspaces` configuration option enabled. The remaining tests either don't attempt to create keyspaces or they already create RF-rack-valid keyspaces. We don't need to explicitly enable the configuration option. It's already enabled by default by `cql_test_config`. The reason why we haven't run into any issue because of that is that performance tests are not part of our CI. Fixes scylladb/scylladb#25127 Closes scylladb/scylladb#25728 (cherry picked from commit `789a4a1ce7`) Closes scylladb/scylladb#25922	2025-09-10 10:30:40 +03:00
Asias He	ac88ea8152	streaming: Fix use after move in the tablet_stream_files_handler The files object is moved before the log when stream finishes. We've logged the files when the stream starts. Skip it in the end of streaming. Fixes #25830 Closes scylladb/scylladb#25835 (cherry picked from commit `451e1ec659`) Closes scylladb/scylladb#25891	2025-09-10 10:30:11 +03:00
Wojciech Mitros	055a6c2cee	storage_proxy: send hints to pending replicas Consider the following scenario: - Current replica set is [A, B, C] - write succeeds on [A, B], and a hint is logged for node C - before the hint is replayed, D bootstraps and the token migrates from C to D - hint is replayed to node C while D is pending, but it's too late, since streaming for that token is already done - C is cleaned up, replayed data is lost, and D has a stale copy until next repair. In this scenario we effectively fail to send the hint. This scenario is also more likely to happen with tablets, as it can happen for every tablet migration. This issue is particularly detrimental to materialized views. View updates use hints by default and a specific view update may be sent to just one view replica (when a single base replica has a different row state due to reordering or missed writes). When we lose a hint for such a view update, we can generate a persistent inconsistency between the base and view - ghost rows can appear due to a lost tombstone and rows may be missing in the view due to a lost row update. Such inconsistencies can't be fixed by repairing either the view or the base table. To handle this, in this patch we add the pending replicas to the list of targets of each hint, even if the original target is still alive. This will cause some updates to be redundant. These updates are probably unavoidable for now, but they shouldn't be too common either. The scenarios for them are: 1. managing to send the hint to the source of a migrating replica before streaming its token - the write will arrive on the pending replica anyway in streaming 2. the hint target not being the source of the migration - if we managed to apply the original write of the hint to the actual source of the migration, the pending replica will get it during streaming 3. sending the same hint to many targets at a similar time - while sending to each target, we'll see the same pending replica for the hint so we'll send it multiple times 4. possible retries where even though the hint was successfully sent to the main target, we failed to send it to the pending replica, so we need to retry the entire write This patch handles both tablet migrations and tablet rebuilds. In the future, for tablet migrations, we can avoid sending the hint to pending replicas if the hint target is not the source of the migration, which would allow us to avoid the redundant writes 2 and 3. For rack-aware RF, this will be as simple as checking whether the replicas are in the same rack. We also add a test case reproducing the issue. Co-Authored-By: Raphael S. Carvalho <raphaelsc@scylladb.com> Fixes https://github.com/scylladb/scylladb/issues/19835 Closes scylladb/scylladb#25590 (cherry picked from commit `10b8e1c51c`) Closes scylladb/scylladb#25882	2025-09-10 10:29:52 +03:00
Pavel Emelyanov	81e4c65f8c	Merge '[Backport 2025.3] Allow users to SELECT from CDC log tables they created.' from Scylladb[bot] Before the patch, a user with CREATE access could create a table with CDC or alter the table enabling CDC, but could not run a SELECT on the CDC table they created. This was because the SELECT permission was checked on the CDC log, and later on its "parent" - the keyspace, but not on the base table, on which the user had SELECT permission automatically granted on CREATE. This patch matches the behavior of querying the CDC log to the one implemented for Materialized Views: 1. No new permissions are granted on CREATE. 2. When querying SELECT, the permissions on base table SELECT are checked. Fixes: https://github.com/scylladb/scylladb/issues/19798 Fixes: VECTOR-151 - (cherry picked from commit `be54346846`) - (cherry picked from commit `5e72d71188`) Parent PR: #25797 Closes scylladb/scylladb#25870 * github.com:scylladb/scylladb: cqlpy/test_permissions: run the reproducer tests for #19798 select_statement: check for access to CDC base table	2025-09-10 10:29:10 +03:00
Pavel Emelyanov	6977c5eaf1	s3: Export memory usage gauge (metrics) The memory usage is tracked with the help of a semaphore, so just export its "consumed" units. One tricky place here is the need to skip metrics registration for scylla-sstable tool. The thing is that the tools starts the storage manager and sstables manager on start and then some of tool's operations may want to start both managers again (via cql environment) causing double metrics registration exception. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25769 (cherry picked from commit `b26816f80d`) Closes scylladb/scylladb#25865	2025-09-10 10:28:39 +03:00
Yaron Kaikov	bdec3b2bc5	build_docker.sh: enable debug symbols installation Add the latest scylla.repo location to our docker container; this will allow installation of the scylla-debuginfo package in case it's needed Fixes: https://github.com/scylladb/scylladb/issues/24271 Closes scylladb/scylladb#25646 (cherry picked from commit `d57741edc2`) Closes scylladb/scylladb#25893	2025-09-09 11:41:17 +03:00
Patryk Jędrzejczak	2792fd6383	Merge '[Backport 2025.3] gossiper: fix issues in processing gossip status during the startup and when messages are delayed to avoid empty host ids' from Scylladb[bot] Populate the local state during gossiper initialization in start_gossiping, preventing an empty state from being added to _endpoint_state_map and returned in get_endpoint_states responses, which was causing an 'empty host id issue' on the other nodes during node restarts. Check for a race condition in do_apply_state_locally In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. Fixes https://github.com/scylladb/scylladb/issues/25831 Fixes https://github.com/scylladb/scylladb/issues/25803 Fixes https://github.com/scylladb/scylladb/issues/25702 Fixes https://github.com/scylladb/scylladb/issues/25621 Ref https://github.com/scylladb/scylla-enterprise/issues/5613 Backport: The issue affects all current releases (2025.x), therefore this PR needs to be backported to all 2025.1-2025.3. - (cherry picked from commit `28e0f42a83`) - (cherry picked from commit `f08df7c9d7`) - (cherry picked from commit `775642ea23`) - (cherry picked from commit `b34d543f30`) Parent PR: #25849 Closes scylladb/scylladb#25898 * https://github.com/scylladb/scylladb: gossiper: fix empty initial local node state gossiper: add test for a race condition in start_gossiping gossiper: check for a race condition in `do_apply_state_locally` test/gossiper: add reproducible test for race condition during node decommission	2025-09-09 10:00:30 +02:00
Sergey Zolotukhin	41dd29f5a3	gossiper: fix empty initial local node state This change removes the addition of an empty state to `_endpoint_state_map`. Instead, a new state is created locally and then published via replicate, avoiding the issue of an empty state existing in `_endpoint_state_map` before the preemption point. Since this resolves the issue tested in `test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed. Fixes: scylladb/scylladb#25831 (cherry picked from commit `b34d543f30`)	2025-09-08 21:55:16 +00:00
Sergey Zolotukhin	13f43e2872	gossiper: add test for a race condition in start_gossiping This change adds a test for a race condition in `start_gossiping` that can lead to an empty self state sent in `gossip_get_endpoint_states_response`. Test for scylladb/scylladb#25831 (cherry picked from commit `775642ea23`)	2025-09-08 21:55:16 +00:00
Sergey Zolotukhin	ec85ebf419	gossiper: check for a race condition in `do_apply_state_locally` In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change 1. adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. 2. Removes xfail from the test_gossiper_race test since the issue is now fixed. 3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID. This re-applies the commit `13392a40d4` that was reverted in `46aa59fe49`, after fixing the issues that caused the CI to fail. Fixes: scylladb/scylladb#25702 Fixes: scylladb/scylladb#25621 Ref: scylladb/scylla-enterprise#5613 (cherry picked from commit `f08df7c9d7`)	2025-09-08 21:55:16 +00:00
Emil Maskovsky	b53a5f9b3d	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. This re-applies the commit `5dac4b38fb` that was reverted in `dc44fca67c`, but modified to relax the check from "on_internal_error" to just a warning log. The stricter check can be re-introduced later once we are sure that all remaining problems are resolved and it will not break the CI. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721 (cherry picked from commit `28e0f42a83`)	2025-09-08 21:55:16 +00:00
Anna Stuchlik	acd4cbbbe1	doc: add support for i7i instances This commit adds currently supported i7i and i7ie instances to the list of instance recommendations. Fixes https://github.com/scylladb/scylladb/issues/25808 Closes scylladb/scylladb#25817 (cherry picked from commit `f66580a28f`) Closes scylladb/scylladb#25853	2025-09-08 10:40:52 +03:00
Dawid Pawlik	4303bb7d56	cqlpy/test_permissions: run the reproducer tests for #19798 Since the previous commit fixes the issue, we can remove the xfail mark. The tests should pass now. (cherry picked from commit `5e72d71188`)	2025-09-08 07:39:52 +00:00
Dawid Pawlik	675f74b4b7	select_statement: check for access to CDC base table Before the patch, a user with CREATE access could create a table with CDC or alter the table enabling CDC, but could not run a SELECT on the CDC table they created. This was because the SELECT permission was checked on the CDC log, and later on its "parent" - the keyspace, but not on the base table, on which the user had SELECT permission automatically granted on CREATE. This patch matches the behaviour of querying the CDC log to the one implemented for Materialized Views: 1. No new permissions are granted on CREATE. 2. When querying SELECT, the permissions on base table SELECT are checked. Fixes: #19798 (cherry picked from commit `be54346846`)	2025-09-08 07:39:52 +00:00
Avi Kivity	0900a88884	Merge 'auth: move passwords::check call to alien thread' from Andrzej Jackowski Analysis of customer stalls revealed that the function `detail::hash_with_salt` (invoked by `passwords::check`) often blocks the reactor. Internally, this function uses the external `crypt_r` function to compute password hashes, which is CPU-intensive. This PR addresses the issue in two ways: 1) `sha-512` is now the only password hashing scheme for new passwords (it was already the common-case). 2) `passwords::check` is moved to a dedicated alien thread. Regarding point 1: before this change, the following hashing schemes were supported by `identify_best_supported_scheme()`: bcrypt_y, bcrypt_a, SHA-512, SHA-256, and MD5. The reason for this was that the `crypt_r` function used for password hashing comes from an external library (currently `libxcrypt`), and the supported hashing algorithms vary depending on the library in use. However: - The bcrypt schemes never worked properly because their prefixes lack the required round count (e.g. `$2y$` instead of `$2y$05$`). Moreover, bcrypt is slower than SHA-512, so it is not a good idea to fix or use it. - SHA-256 and SHA-512 both belong to the SHA-2 family. Libraries that support one almost always support the other, so it’s very unlikely to find SHA-256 without SHA-512. - MD5 is no longer considered secure for password hashing. Regarding point 2: the `passwords::check` call now runs on a shared alien thread created at database startup. An `std::mutex` synchronizes that thread with the shards. In theory this could introduce frequent lock contention, but in practice each shard handles only a few hundred new connections per second, even during storms. There is already a `_conns_cpu_concurrency_semaphore` in `generic_server` that limits the number of concurrent connection handlers. Fixes https://github.com/scylladb/scylladb/issues/24524 Backport not needed, as it is a new feature. Closes scylladb/scylladb#24924 * github.com:scylladb/scylladb: main: utils: add thread names to alien workers auth: move passwords::check call to alien thread test: wait for 3 clients with given username in test_service_level_api auth: refactor password checking in password_authenticator auth: make SHA-512 the only password hashing scheme for new passwords auth: whitespace change in identify_best_supported_scheme() auth: require scheme as parameter for `generate_salt` auth: check password hashing scheme support on authenticator start (cherry picked from commit `c762425ea7`)	2025-09-07 13:38:33 +03:00
Calle Wilund	2bbf3cf669	system_keyspace: Prune dropped tables from truncation on start/drop Fixes #25683 Once a table drop is complete, there should be no reason to retain truncation records for it, as any replay should skip mutations anyway (no CF), and iff we somehow resurrect a dropped table, this replay-resurrected data is the least of our problems anyway. Adds a prune phase to the startup drop_truncation_rp_records run, which skips the update and instead deletes records for non-existent tables (which should patch any existing servers with lingering data as well). Also does an explicit delete of records on actual table DROP, to ensure we don't grow this table more than needed even on long-uptime nodes. Small unit test included. Closes scylladb/scylladb#25699 (cherry picked from commit `bc20861afb`) Closes scylladb/scylladb#25815 scylla-2025.3.1 scylla-2025.3.1-candidate-20250907021632	2025-09-05 19:02:39 +03:00
Botond Dénes	c30c1ec40a	Merge '[Backport 2025.3] drop table: fix crash on drop table with concurrent cleanup' from Scylladb[bot] Consider the following scenario: - A tablet is migrated away from a shard - The tablet cleanup stage closes the storage group's async_gate - A drop table runs truncate which attempts to disable compaction on the tablet with its gate closed. This fails, because table::parallel_foreach_compaction_group() ultimately calls storage_group_manager::parallel_foreach_storage_group() which will not disable compaction if it can't hold the storage group's gate - Truncate calls table::discard_sstables() which checks if the compaction has been disabled, and because it hasn't, it then runs on_internal_error() with "compaction not disabled on table ks.cf during TRUNCATE" which causes a crash Fixes: #25706 This needs to be backported to all supported versions with tablets - (cherry picked from commit `a0934cf80d`) - (cherry picked from commit `1b8a44af75`) Parent PR: #25708 Closes scylladb/scylladb#25785 * github.com:scylladb/scylladb: test: reproducer and test for drop with concurrent cleanup truncate: check for closed storage group's gate in discard_sstables	2025-09-05 19:02:04 +03:00
Andrei Chekun	2ee1082561	test.py: modify run to use different junit output filenames Currently, run executes pytest twice without changing the path of the JUnit XML report, so the second execution overwrites the report from the first. This PR fixes the issue so that both reports are stored. Closes scylladb/scylladb#25726 (cherry picked from commit `e55c8a9936`) Closes scylladb/scylladb#25778	2025-09-05 19:01:22 +03:00
Pavel Emelyanov	f1e3dedcd6	Revert "test/gossiper: add reproducible test for race condition during node decommission" This reverts commit `4e17330a1b` because parent PR had been reverted as per #25803	2025-09-05 10:08:29 +03:00
Nadav Har'El	5d6aa6e8c2	utils, alternator: fix detection of invalid base-64 This patch fixes an error-path bug in the base-64 decoding code in utils/base64.cc, which among other things is used in Alternator to decode blobs in JSON requests. The base-64 decoding code has a lookup table, which was wrongly sized 255 bytes, but needed to be 256 bytes. This meant that if the byte 255 (0xFF) was included in an invalid base-64 string, instead of detecting that this is an invalid byte (since the only valid bytes in a base-64 string are A-Z,a-z,0-9,+,/ and =), the code would either think it's valid with a nonsense 6-bit part, or even crash on an out-of-bounds read. Besides the trivial fix, this patch also includes a reproducing test, which tries to write a blob as a supposedly base-64 encoded string with a 0xFF byte in it. The test fails before this patch (the write succeeds, unexpectedly), and passes after this patch (the write fails as expected). The test also passes on DynamoDB. Fixes #25701 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25705 (cherry picked from commit `ff91027eac`) Closes scylladb/scylladb#25767	2025-09-04 11:38:55 +03:00
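The off-by-one described in this commit can be illustrated with a small model of a byte-indexed decode table. This is a sketch in Python, not the code in `utils/base64.cc`; the helper names `build_decode_table` and `first_invalid_byte` are hypothetical. A byte can take 256 distinct values (0 through 255), so the table needs 256 entries for index 255 to be in range.

```python
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
INVALID = -1


def build_decode_table(size: int) -> list:
    """Map each byte value below `size` to its 6-bit value, or INVALID."""
    table = [INVALID] * size
    for value, char in enumerate(BASE64_ALPHABET):
        table[ord(char)] = value
    return table


def first_invalid_byte(data: bytes, table: list):
    """Return the offset of the first non-base-64 byte, or None if all are valid.

    '=' padding is accepted here without checking its position.
    """
    for offset, byte in enumerate(data):
        if byte == ord("="):
            continue
        if table[byte] == INVALID:
            return offset
    return None
```

With `build_decode_table(256)`, the byte 0xFF is looked up safely and reported as invalid. With `build_decode_table(255)`, the same lookup raises `IndexError` in Python; in C++ the equivalent access is an out-of-bounds read, which matches the nonsense-value-or-crash behavior the commit message describes.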
Pavel Emelyanov	1c8e10231a	Merge '[Backport 2025.3] service/qos: Modularize service level controller to avoid invalid access to auth::service' from Scylladb[bot] Move management of effective service levels from `service_level_controller` to a new dedicated type -- `auth_integration`. Before these changes, it was possible for the service level controller to try to access `auth::service` after it was deinitialized. For instance, it could happen when reloading the cache. That HAS happened as described in the following issue: scylladb/scylladb#24792. Although the problem might have been mitigated or even resolved in scylladb/scylladb@10214e13bd, it's not clear how the service will be used in the future. It's better to prevent similar bugs than to try to fix them later on. The logic responsible for preventing access to an uninitialized `auth::service` was also either non-existent, complex, or insufficient. To prevent accessing `auth::service` by the service level controller, we extract the relevant portion of the code to a separate entity -- `auth_integration`. It's an internal helper type whose sole purpose is to manage effective service levels. Thanks to that, we were able to nest the lifetime of `auth_integration` within the lifetime of `auth::service`. It's now impossible to attempt to dereference it while it's uninitialized. If a bug related to an invalid access is spotted again, though, it might also be easier to debug it now. There should be no visible change to the users of the interface of the service level controller. We strived to make the patch minimal, and the only affected part of the logic should be related to how `auth::service` is accessed. The relevant portion of the initialization and deinitialization flow: (a) Before the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. 2. Initialize other services. 3. Initialize and start `auth::service`. 4. (work) 5. Stop and deinitialize `auth::service`. 6. Deinitialize other services. 7. Deinitialize `service_level_controller`. (b) After the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. (*) 2. Initialize other services. 3. Initialize and start `auth::service`. 4. Initialize `auth_integration`. Register it in `service_level_controller`. 5. (work) 6. Unregister `auth_integration` in `service_level_controller` and deinitialize it. 7. Stop and deinitialize `auth::service`. 8. Deinitialize other services. 9. Deinitialize `service_level_controller`. (*): The reference to `auth::service` in `service_level_controller` is still necessary. We need to access the service when dropping a distributed service level. Although it would be best to cut that link between the service level controller and `auth::service` too, effectively separating the entities, it would require more work, so we leave it as-is for now. It shouldn't prove problematic as far as accessing an uninitialized service goes. Trying to drop a service level at the point when we're de-initializing auth should be impossible. For more context, see the function `drop_distributed_service_level` in `service_level_controller`. A trivial test has been included in the PR. Although its value is questionable as we only try to reload the service level cache at a specific moment, it's probably the best we can deliver to provide a reproducer of the issue this patch is resolving. Fixes scylladb/scylladb#24792 Backport: The impact of the bug was minimal as it only affected the shutdown. However, since CI is failing because of it, let's backport the change to all supported versions. 
- (cherry picked from commit `7d0086b093`) - (cherry picked from commit `34afb6cdd9`) - (cherry picked from commit `e929279d74`) - (cherry picked from commit `dd5a35dc67`) - (cherry picked from commit `fc1c41536c`) Parent PR: #25478 Closes scylladb/scylladb#25753 * github.com:scylladb/scylladb: service/qos: Move effective SL cache to auth_integration service/qos: Add auth::service to auth_integration service/qos: Reload effective SL cache conditionally service/qos: Add gate to auth_integration service/qos: Introduce auth_integration	2025-09-04 11:38:17 +03:00
Pavel Emelyanov	d484837a2a	Merge '[Backport 2025.3] db/hints: Improve logs' from Scylladb[bot] Before these changes, the logs in hinted handoff often didn't provide crucial information like the identifier of the node that hints were being sent to. Also, some of the logs were misleading and referred to other places in the code than the one where an exception or some other situation really occurred. We modify those logs, extending them by more valuable information and fixing existing issues. What's more, all of the logs in `hint_endpoint_manager` and `hint_sender` follow a consistent format now: ``` <class_name>[<destination host ID>]:<function_name>: <message> ``` This way, we should always have AT LEAST the basic information. Fixes scylladb/scylladb#25466 Backport: There is no risk in backporting these changes. They only have impact on the logs. On the other hand, they might prove helpful when debugging an issue in hinted handoff. - (cherry picked from commit `2327d4dfa3`) - (cherry picked from commit `d7bc9edc6c`) - (cherry picked from commit `6f1fb7cfb5`) Parent PR: #25470 Closes scylladb/scylladb#25538 * github.com:scylladb/scylladb: db/hints: Add new logs db/hints: Adjust log levels db/hints: Improve logs	2025-09-04 11:36:30 +03:00
Pavel Emelyanov	ad6dbcfdc5	Merge '[Backport 2025.3] generic server: 2 step shutdown' from Scylladb[bot] This PR implements solution proposed in scylladb/scylladb#24481 Instead of terminating connections immediately, the shutdown now proceeds in two stages: first closing the receive (input) side to stop new requests, then waiting for all active requests to complete before fully closing the connections. The updated shutdown process is as follows: 1. Initial Shutdown Phase * Close the accept gate to block new incoming connections. * Abort all accept() calls. * For all active connections: * Close only the input side of the connection to prevent new requests. * Keep the output side open to allow responses to be sent. 2. Drain Phase * Wait for all in-progress requests to either complete or fail. 3. Final Shutdown Phase * Fully close all connections. Fixes scylladb/scylladb#24481 - (cherry picked from commit `122e940872`) - (cherry picked from commit `3848d10a8d`) - (cherry picked from commit `3610cf0bfd`) - (cherry picked from commit `27b3d5b415`) - (cherry picked from commit `061089389c`) - (cherry picked from commit `7334bf36a4`) - (cherry picked from commit `ea311be12b`) - (cherry picked from commit `4f63e1df58`) Parent PR: #24499 Closes scylladb/scylladb#25519 * github.com:scylladb/scylladb: test: Set `request_timeout_on_shutdown_in_seconds` to `request_timeout_in_ms`, decrease request timeout. generic_server: Two-step connection shutdown. transport: consmetic change, remove extra blanks. transport: Handle sleep aborted exception in sleep_until_timeout_passes generic_server: replace empty destructor with `= default` generic_server: refactor connection::shutdown to use `shutdown_input` and `shutdown_output` generic_server: add `shutdown_input` and `shutdown_output` functions to `connection` class. test: Add test for query execution during CQL server shutdown	2025-09-04 11:35:55 +03:00
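The three shutdown phases listed in this merge can be modeled in a few lines. The sketch below is an `asyncio` illustration in Python, not the `generic_server` implementation: the `Server` and `Connection` classes are hypothetical, a boolean stands in for the accept gate, and aborting pending `accept()` calls is not modeled.

```python
import asyncio


class Connection:
    def __init__(self, name):
        self.name = name
        self.input_open = True    # may still receive new requests
        self.output_open = True   # may still send responses
        self.responses = []

    def shutdown_input(self):
        self.input_open = False

    def shutdown_output(self):
        self.output_open = False

    async def handle(self, request, delay):
        await asyncio.sleep(delay)          # simulated request processing
        if self.output_open:
            self.responses.append(f"reply to {request}")


class Server:
    def __init__(self):
        self.accepting = True
        self.connections = []
        self.in_flight = set()

    def accept(self, name):
        if not self.accepting:
            raise RuntimeError("server is shutting down")
        conn = Connection(name)
        self.connections.append(conn)
        return conn

    def submit(self, conn, request, delay=0.01):
        if not conn.input_open:
            raise RuntimeError("input side closed")
        task = asyncio.create_task(conn.handle(request, delay))
        self.in_flight.add(task)
        task.add_done_callback(self.in_flight.discard)
        return task

    async def shutdown(self):
        # Phase 1: stop accepting and close only the input side.
        self.accepting = False
        for conn in self.connections:
            conn.shutdown_input()
        # Phase 2: drain, waiting for in-progress requests to finish or fail.
        if self.in_flight:
            await asyncio.gather(*self.in_flight, return_exceptions=True)
        # Phase 3: fully close all connections.
        for conn in self.connections:
            conn.shutdown_output()


async def demo():
    server = Server()
    conn = server.accept("c1")
    server.submit(conn, "q1")               # still running when shutdown starts
    await server.shutdown()
    return conn.responses, server.accepting, conn.input_open, conn.output_open
```

In `demo()`, the request submitted before shutdown still gets its reply, because the output side is closed only after the drain phase completes.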
Ran Regev	a79cbd9a9a	docs: backup and restore feature added backup and restore as a feature to documentation Signed-off-by: Ran Regev <ran.regev@scylladb.com> Closes scylladb/scylladb#25608 (cherry picked from commit `515d9f3e21`) Closes scylladb/scylladb#25748	2025-09-03 12:37:45 +03:00
Emil Maskovsky	4e17330a1b	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721 Backport: The test is primarily for an issue found in 2025.1, so it needs to be backported to all the 2025.x branches. Closes scylladb/scylladb#25685 (cherry picked from commit `5dac4b38fb`) Closes scylladb/scylladb#25781	2025-09-02 08:29:27 +02:00
Ferenc Szili	6a7a5f5edc	test: reproducer and test for drop with concurrent cleanup This change adds a reproducer and test for issue #25706 (cherry picked from commit `1b8a44af75`)	2025-09-02 02:18:56 +00:00
Ferenc Szili	34b403747a	truncate: check for closed storage group's gate in discard_sstables Consider the following scenario: - A tablet is migrated away from a shard - The tablet cleanup stage closes the storage group's async_gate - A drop table runs truncate which attempts to disable compaction on the tablet with its gate closed. This fails, because table::parallel_foreach_compaction_group() ultimately calls storage_group_manager::parallel_foreach_storage_group() which will not disable compaction if it can't hold the storage group's gate - Truncate calls table::discard_sstables() which checks if the compaction has been disabled, and because it hasn't, it then runs on_internal_error() with "compaction not disabled on table ks.cf during TRUNCATE" which causes a crash This patch makes discard_sstables check if the storage group's gate is closed when checking for disabled compaction. (cherry picked from commit `a0934cf80d`)	2025-09-02 02:18:56 +00:00
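The scenario and the fix can be modeled as a precondition check that tolerates storage groups whose gate is already closed. This Python sketch is a simplified assumption about the logic, not the actual `table::discard_sstables()` code; `StorageGroup`, `disable_compaction`, and `groups_violating_truncate_precondition` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StorageGroup:
    name: str
    gate_closed: bool = False          # set once tablet cleanup closes the gate
    compaction_disabled: bool = False


def disable_compaction(groups):
    """Disable compaction wherever the gate can still be held."""
    for group in groups:
        if not group.gate_closed:
            group.compaction_disabled = True


def groups_violating_truncate_precondition(groups):
    """Names of groups that truncate should treat as an internal error.

    A group whose gate is already closed is being cleaned up, so finding
    compaction still enabled on it is expected and is not reported.
    """
    return [
        g.name
        for g in groups
        if not g.compaction_disabled and not g.gate_closed
    ]
```

Without the `gate_closed` condition in the final filter, the migrated-away group would be reported as a violation, which corresponds to the crash described in the scenario.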
Piotr Dulikowski	debc637ac1	Merge '[Backport 2025.3] system_keyspace: add peers cache to get_ip_from_peers_table' from Scylladb[bot] The gossiper can call `storage_service::on_change` frequently (see scylladb/scylla-enterprise#5613), which may cause high CPU load and even trigger OOMs or related issues. This PR adds a temporary cache for `system.peers` to resolve host_id -> ip without hitting storage on every call. The cache is short-lived to handle the unlikely case where `system.peers` is updated directly via CQL. This is a temporary fix; a more thorough solution is tracked in https://github.com/scylladb/scylladb/issues/25620. Fixes scylladb/scylladb#25660 backport: this patch needs to be backported to all supported versions (2025.1/2/3). - (cherry picked from commit `91c633371e`) - (cherry picked from commit `de5dc4c362`) - (cherry picked from commit `4b907c7711`) Parent PR: #25658 Closes scylladb/scylladb#25766 * github.com:scylladb/scylladb: storage_service: move get_host_id_to_ip_map to system_keyspace system_keyspace: use peers cache in get_ip_from_peers_table storage_service: move get_ip_from_peers_table to system_keyspace	2025-09-01 21:21:26 +02:00
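A short-lived cache of the host_id -> ip mapping, as described in this merge, might look like the following Python sketch. It is an illustration under assumptions, not the `system_keyspace` code: the `PeersCache` class, the `load_peers` callback, and the one-second TTL are all hypothetical.

```python
import time


class PeersCache:
    """Caches a host_id -> ip mapping and reloads it after a short TTL."""

    def __init__(self, load_peers, ttl_seconds=1.0, clock=time.monotonic):
        self._load_peers = load_peers   # callable returning {host_id: ip}
        self._ttl = ttl_seconds
        self._clock = clock             # injectable for testing
        self._mapping = None
        self._loaded_at = 0.0

    def get_ip(self, host_id):
        now = self._clock()
        expired = self._mapping is None or now - self._loaded_at >= self._ttl
        if expired:
            # Only here do we hit storage; all other calls are served from memory.
            self._mapping = dict(self._load_peers())
            self._loaded_at = now
        return self._mapping.get(host_id)
```

The short TTL bounds how long a stale address can be served if the underlying table is changed behind the cache's back, which is the trade-off the commit message mentions for direct CQL updates to `system.peers`.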

48517 Commits