We change the type of the `recovery_leader` config parameter and
`gossip_config::recovery_leader` from sstring to UUID. `recovery_leader`
is supposed to store a host ID, so UUID is a natural choice.
After changing the type to UUID, if the user provides an incorrect UUID,
parsing `recovery_leader` will fail early, but the start-up will
continue. Outside the recovery procedure, `recovery_leader` will then be
ignored. In the recovery procedure, the start-up will fail on:
```
throw std::runtime_error(
"Cannot start - Raft-based topology has been enabled but persistent group 0 ID is not present. "
"If you are trying to run the Raft-based recovery procedure, you must set recovery_leader.");
```
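For illustration, a minimal sketch in plain C++ (hypothetical names; the real code uses Scylla's config machinery and UUID type) of the intended behavior: a malformed value fails at parse time, while a missing value is only an error on the recovery path:
```
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Stand-in for the real host ID / UUID type.
struct host_id { std::string uuid; };

// Parse the configured recovery_leader. A malformed UUID fails early,
// regardless of whether the recovery procedure is running.
std::optional<host_id> parse_recovery_leader(const std::string& raw) {
    if (raw.empty()) {
        return std::nullopt;                        // not configured
    }
    static const std::regex uuid_re(
        "[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}");
    if (!std::regex_match(raw, uuid_re)) {
        throw std::runtime_error("recovery_leader: not a valid UUID: " + raw);
    }
    return host_id{raw};
}

// On the recovery path only, a missing recovery_leader is also an error.
host_id require_recovery_leader(const std::optional<host_id>& leader) {
    if (!leader) {
        throw std::runtime_error(
            "Cannot start - recovery requested but recovery_leader is not set");
    }
    return *leader;
}
```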
(cherry picked from commit 445a15ff45)
The PyKMIP server uses an SQLite database to store artifacts such as
encryption keys. By default, SQLite performs a full journal and data
flush to disk on every CREATE TABLE operation. Each operation triggers
three fdatasync(2) calls. Multiplied by 16 (the number of tables the
server creates), that adds up to a significant number of file syncs,
which can take several seconds on slow machines.
This behavior has led to CI stability issues in the KMIP unit tests, where
the server failed to complete its schema creation within the 20-second
timeout (observed on spider9 and spider11).
Fix this by configuring the server to use an in-memory SQLite database.
Fixes #24842.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closes scylladb/scylladb#24995
(cherry picked from commit 2656fca504)
Closes scylladb/scylladb#25300
Right now, service levels are migrated in one group0 command and auth is migrated in the next one. This has a bad effect on the group0 state reload logic - modifying service levels in group0 causes the effective service levels cache to be recalculated, and to do so we need to fetch information about all roles. If the reload happens after SL upgrade and before auth upgrade, the query for roles will be directed to the legacy auth tables in system_auth - and the query, being a potentially remote query, has a timeout. If the query times out, it will throw an exception which will break the group0 apply fiber and the node will need to be restarted to bring it back to work.
In order to solve this issue, make sure that the service level module does not start populating and using the service level cache until both service levels and auth are migrated to raft. This is achieved by adding the check both to the cache population logic and to the effective service level getter - they now consult the service level accessor's new method, `can_use_effective_service_level_cache`, which checks the auth version.
Fixes: scylladb/scylladb#24963
Should be backported to all versions which support upgrade to topology over raft - the issue described here may put the cluster into a state which is difficult to get out of (group0 apply fiber can break on multiple nodes, which necessitates their restart).
- (cherry picked from commit 2bb800c004)
- (cherry picked from commit 3a082d314c)
Parent PR: #25188
Closes scylladb/scylladb#25285
* github.com:scylladb/scylladb:
test: sl: verify that legacy auth is not queried in sl to raft upgrade
qos: don't populate effective service level cache until auth is migrated to raft
Adjust `test_service_levels_upgrade`: right before upgrade to topology
on raft, enable an error injection which triggers when the standard role
manager is about to query the legacy auth tables in the
system_auth keyspace. The preceding commit which fixes
scylladb/scylladb#24963 makes sure that the legacy tables are not
queried during upgrade to topology on raft, so the error injection does
not trigger and does not cause a problem; without that commit, the test
fails.
(cherry picked from commit 3a082d314c)
Right now, service levels are migrated in one group0 command and auth
is migrated in the next one. This has a bad effect on the group0 state
reload logic - modifying service levels in group0 causes the effective
service levels cache to be recalculated, and to do so we need to fetch
information about all roles. If the reload happens after SL upgrade and
before auth upgrade, the query for roles will be directed to the legacy
auth tables in system_auth - and the query, being a potentially remote
query, has a timeout. If the query times out, it will throw
an exception which will break the group0 apply fiber and the node will
need to be restarted to bring it back to work.
In order to solve this issue, make sure that the service level module
does not start populating and using the service level cache until both
service levels and auth are migrated to raft. This is achieved by adding
the check both to the cache population logic and to the effective service
level getter - they now consult the service level accessor's new method,
`can_use_effective_service_level_cache`, which checks the auth version.
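A minimal sketch, in standard C++ with hypothetical names (only `can_use_effective_service_level_cache` comes from this change; the real code is seastar-based), of how the cache is gated on the auth version:
```
#include <cstdint>

enum class auth_version_t : uint8_t { legacy = 1, on_raft = 2 };

// Stand-in for the service level accessor, which knows the current auth version.
struct service_level_accessor {
    auth_version_t auth_version = auth_version_t::legacy;
};

class service_level_controller {
    service_level_accessor _accessor;
public:
    // The new check: the effective service level cache may only be used once
    // auth has been migrated to raft, so role lookups never hit the legacy
    // system_auth tables mid-upgrade.
    bool can_use_effective_service_level_cache() const {
        return _accessor.auth_version == auth_version_t::on_raft;
    }

    void maybe_populate_cache() {
        if (!can_use_effective_service_level_cache()) {
            return;     // skip recalculation until the upgrade has finished
        }
        // ... fetch roles and recalculate the effective service level cache ...
    }
};
```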
Fixes: scylladb/scylladb#24963
(cherry picked from commit 2bb800c004)
This PR adds the upgrade guide from version 2025.2 to 2025.3.
Also, it removes the existing upgrade guide for the previous version
(upgrade from 2025.1 to 2025.2), which is no longer relevant.
Note that the new guide does not include the "Enable Consistent Topology Updates" page and note,
as users upgrading to 2025.3 have consistent topology updates already enabled.
Fixes https://github.com/scylladb/scylladb/issues/24696
Closes scylladb/scylladb#25219
(cherry picked from commit 8365219d40)
Closes scylladb/scylladb#25248
This commit:
- Extends the Drivers support table with information on which driver supports tablets
and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.
In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
and replaced with tablet information to correctly describe the contents of the table.
Fixes https://github.com/scylladb/scylladb/issues/19471
Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69
Closes scylladb/scylladb#24635
(cherry picked from commit 18b4d4a77c)
Closes scylladb/scylladb#25251
This PR reverts the changes of #24418 since they can cause use-after-free.
The `raft_group0::abort()` was called in `storage_service::do_drain` (introduced in #24418) to stop the group0 Raft server before destroying local storage. This was necessary because `raft::server` depends on storage (via `raft_sys_table_storage` and `group0_state_machine`).
However, this caused issues: services like `sstable_dict_autotrainer` and `auth::service`, which use `group0_client` but are not stopped by `storage_service`, could trigger use-after-free if `raft_group0` was destroyed too early. This can happen both during normal shutdown and when 'nodetool drain' is used.
This PR reverts two of the three commits from #24418. The commit [e456d2d](e456d2d507) is not reverted because it only affects logging and does not impact correctness.
Fixes scylladb/scylladb#25221
Backport: this PR is a backport
Closes scylladb/scylladb#25206
* https://github.com/scylladb/scylladb:
Revert "main.cc: fix group0 shutdown order"
Revert "storage_service: test_group0_apply_while_node_is_being_shutdown"
Enhance and fix error handling in the `chunked_download_source` to prevent errors leaking from the request callback. Also, stop retrying on Seastar's side, since retrying there breaks data integrity: the same range may be downloaded more than once.
Fixes: https://github.com/scylladb/scylladb/issues/25043
Should be backported to 2025.3 since we intend to release the native backup/restore feature there.
- (cherry picked from commit d53095d72f)
- (cherry picked from commit b7ae6507cd)
- (cherry picked from commit ba910b29ce)
- (cherry picked from commit fc2c9dd290)
Parent PR: #24883
Closes scylladb/scylladb#25137
* github.com:scylladb/scylladb:
s3_client: Disable Seastar-level retries in HTTP client creation
s3_test: Validate handling of non-`aws_error` exceptions
s3_client: Improve error handling in chunked_download_source
aws_error: Add factory method for `aws_error` from exception
When a node shuts down, after the storage_proxy RPCs are stopped in storage service, some write handlers within storage_proxy may still be waiting for background writes to complete. These handlers hold the appropriate ERMs to block schema changes until the write finishes. Once the RPCs are stopped, these writes can no longer receive replies.
If, at the same time, there are RPC commands executing `barrier_and_drain`, they may get stuck waiting for these ERM holders to finish, potentially blocking node shutdown until the writes time out.
This change introduces cancellation of all outstanding write handlers from storage_service after the storage proxy RPCs were stopped.
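A minimal sketch, in standard C++ with hypothetical names (ScyllaDB's storage_proxy tracks its response handlers differently), of the cancellation step:
```
#include <cstdint>
#include <functional>
#include <unordered_map>

// Stand-in for a write response handler that holds an ERM while replies are awaited.
struct write_response_handler {
    std::function<void()> cancel;   // completes the handler and releases its ERM
};

class storage_proxy {
    std::unordered_map<uint64_t, write_response_handler> _response_handlers;
public:
    // Called from storage_service after the RPC verbs are stopped: no replies
    // can arrive anymore, so cancel every outstanding handler instead of
    // letting barrier_and_drain wait for the write timeout.
    void cancel_write_handlers() {
        for (auto& [id, handler] : _response_handlers) {
            handler.cancel();
        }
        _response_handlers.clear();
    }
};
```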
Fixes scylladb/scylladb#23665
Backport: since this fixes an issue that frequently causes failures in CI, backport to 2025.1, 2025.2, and 2025.3.
- (cherry picked from commit bc934827bc)
- (cherry picked from commit e0dc73f52a)
Parent PR: #24714
Closes scylladb/scylladb#25170
* github.com:scylladb/scylladb:
storage_service: Cancel all write requests on storage_proxy shutdown
test: Add test for unfinished writes during shutdown and topology change
Currently, the progress of a parent task depends on expected_total_workload,
expected_children_number, and the children's progresses. Basically, if the
total workload is known or all children have already been created, the
children's progresses are summed up. Otherwise, a binary progress is returned.
As a result, two tasks of the same type may return progress in different
units. If they are children of the same task and this parent gathers the
progress, it becomes meaningless.
Drop expected_children_number, as we cannot assume that children are able
to report their progress.
Modify the get_progress method so that progress is calculated from the
children's progresses. If expected_total_workload isn't specified, the total
progress of a task may grow. If expected_total_workload isn't specified
and no children have been created, an empty progress (0/0) is returned.
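A minimal sketch, in standard C++ with hypothetical types (the real code lives in ScyllaDB's task manager), of the resulting get_progress behavior:
```
#include <optional>
#include <vector>

struct task_progress {
    double completed = 0;
    double total = 0;
};

// Parent progress is the sum of the children's progresses; when
// expected_total_workload is known it provides a fixed denominator,
// otherwise the total may grow as children appear. With no children and
// no expected workload, an empty 0/0 progress is returned.
task_progress get_parent_progress(const std::vector<task_progress>& children,
                                  std::optional<double> expected_total_workload) {
    task_progress result;
    for (const auto& child : children) {
        result.completed += child.completed;
        result.total += child.total;
    }
    if (expected_total_workload) {
        result.total = *expected_total_workload;
    }
    return result;
}
```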
Fixes: https://github.com/scylladb/scylladb/issues/24650.
Closes scylladb/scylladb#25113
(cherry picked from commit a7ee2bbbd8)
Closes scylladb/scylladb#25200
This series fixes one cause of oversized allocations - and therefore potentially stalls and increased tail latencies - in Alternator.
The first patch in the series is the main fix - the later patches are cleanups requested by reviewers but also involved other pre-existing code, so I did those cleanups as separate patches.
Alternator's Scan and Query operations return a page of results. When the number of items is not limited by a "Limit" parameter, the default is to return a 1 MB page. If items are short, a large number of them can fit in that 1MB. The test test_query.py::test_query_large_page_small_rows has 30,000 items returned in a single page.
In the response JSON, all these items are returned in a single array "Items". Before this patch, we build the full response as a RapidJSON object before sending it. The problem is that unfortunately, RapidJSON stores arrays as contiguous allocations. This results in large contiguous allocations in workloads that scan many small items, and large contiguous allocations can also cause stalls and high tail latencies. For example, before this patch, running
```
test/alternator/run --runveryslow \
    test_query.py::test_query_large_page_small_rows
```
reports in the log:
```
oversized allocation: 573440 bytes.
```
After this patch, this warning no longer appears.
The patch solves the problem by collecting the scanned items not in a RapidJSON array, but rather in a chunked_vector<rjson::value>, i.e., a chunked (non-contiguous) array of items (each a JSON value). After collecting this array separately from the response object, we need to print its content without actually inserting it into the object - we add a new function print_with_extra_array() to do that.
The new separate-chunked-vector technique is used when a large number (currently, >256) of items were scanned. When there is a smaller number of items in a page (this is typical when each item is longer), we just insert those items in the object and print it as before.
Beyond the original slow test that demonstrated the oversized allocation (which is now gone), this patch also includes a new test which exercises the new code with a scan of 700 (>256) items in a page - but this new test is fast enough to be permanently in our test suite and not a manual "veryslow" test as the other test.
Fixes #23535
The stalls caused by large allocations were seen by actual users, so it makes sense to backport this patch. On the other hand, the patch, while not big, is fairly intrusive (it modifies the normal Scan and Query path, and the later patches also clean up additional code), so there is some small risk involved in the backport.
- (cherry picked from commit 2385fba4b6)
- (cherry picked from commit d8fab2a01a)
- (cherry picked from commit 13ec94107a)
- (cherry picked from commit a248336e66)
Parent PR: #24480
Closes scylladb/scylladb#25194
* github.com:scylladb/scylladb:
alternator: clean up by co-routinizing
alternator: avoid spamming the log when failing to write response
alternator: clean up and simplify request_return_type
alternator: avoid oversized allocation in Query/Scan
Reviewers of the previous patch complained about some ugly pre-existing
code in alternator/executor.cc, where returning from an asynchronous
(future) function requires lengthy, verbose casts. So this patch cleans
up a few instances of these ugly casts by using co_return instead of
return.
For example, the long and verbose
```
return make_ready_future<executor::request_return_type>(
        rjson::print(std::move(response)));
```
can be changed to the shorter and more readable
```
co_return rjson::print(std::move(response));
```
This patch should not have any functional implications, nor any
performance implications: I only coroutinized slow-path functions and
one function that was already "partially" coroutinized (and this was
especially ugly and deserved to be fixed).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit a248336e66)
Both the make_streamed() and the new make_streamed_with_extra_array()
functions, used when returning a long response in Alternator, would write
an error-level log message if they failed to write the response. This log
message is probably not helpful, and may spam the log if the application
causes repeated errors, intentionally or accidentally.
So drop these log messages. The exception is still thrown as usual.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 13ec94107a)
The previous patch introduced a function, make_streamed_with_extra_array,
which was a duplicate of the existing make_streamed. Reviewers complained
about how baroque the new function is (just like the old function),
having to jump through hoops to return a copyable function working
on non-copyable objects, making strangely-named copies and shared
pointers of everything.
We needed to return a copyable function (std::function) just because
Alternator used Seastar's json::json_return_type in the return type
of executor functions (request_return_type). This json_return_type
contained either an sstring or an std::function, but neither was ever
really appropriate:
1. We want to return a noncopyable_function, not an std::function!
2. We want to return an std::string (which rjson::print() returns),
not an sstring!
So in this patch we stop using seastar::json::json_return_type
entirely in Alternator.
Alternator's request_return_type is now an std::variant of *three* types (see the sketch below):
1. std::string for short responses,
2. noncopyable_function for long streamed response
3. api_error for errors.
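A minimal sketch, in standard C++ (in the real code the second alternative is a seastar noncopyable_function writing to an output stream; std::function and std::ostream stand in here), of the three-way variant:
```
#include <functional>
#include <ostream>
#include <string>
#include <variant>

// Stand-in for Alternator's api_error.
struct api_error {
    std::string type;
    std::string message;
};

// A streamed body writer; the real type is a noncopyable_function over a
// seastar output stream.
using body_writer = std::function<void(std::ostream&)>;

// request_return_type: a short string body, a streaming writer for long
// responses, or an error.
using request_return_type = std::variant<std::string, body_writer, api_error>;

// An operation can now simply return a string directly.
request_return_type short_response() {
    return std::string(R"({"ok":true})");
}
```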
The ugliest parts of make_streamed() where we made copies and shared
pointers to allow for a copyable function are all gone. Even nicer, a
lot of other ugly relics of using seastar::json_return_type are gone:
1. We no longer need obscure classes and functions like make_jsonable()
and json_string() to convert strings to response bodies - an operation
can simply return a string directly - usually returning
rjson::print(value) or a fixed string like "" and it just works.
2. There is no more usage of seastar::json in Alternator (except one
minor use of seastar::json::formatter::to_json in streams.cc that
can be removed later). Alternator uses RapidJSON for its JSON
needs; we don't need to use random pieces from a different JSON
library.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit d8fab2a01a)
This patch fixes one cause of oversized allocations - and therefore
potentially stalls and increased tail latencies - in Alternator.
Alternator's Scan and Query operations return a page of results. When the
number of items is not limited by a "Limit" parameter, the default is
to return a 1 MB page. If items are short, a large number of them can
fit in that 1MB. The test test_query.py::test_query_large_page_small_rows
has 30,000 items returned in a single page.
In the response JSON, all these items are returned in a single array
"Items". Before this patch, we build the full response as a RapidJSON
object before sending it. The problem is that unfortunately, RapidJSON
stores arrays as contiguous allocations. This results in large
contiguous allocations in workloads that scan many small items, and
large contiguous allocations can also cause stalls and high tail
latencies. For example, before this patch, running
```
test/alternator/run --runveryslow \
    test_query.py::test_query_large_page_small_rows
```
reports in the log:
```
oversized allocation: 573440 bytes.
```
After this patch, this warning no longer appears.
The patch solves the problem by collecting the scanned items not in a
RapidJSON array, but rather in a chunked_vector<rjson::value>, i.e.,
a chunked (non-contiguous) array of items (each a JSON value).
After collecting this array separately from the response object, we
need to print its content without actually inserting it into the object -
we add a new function print_with_extra_array() to do that.
The new separate-chunked-vector technique is used when a large number
(currently, >256) of items were scanned. When there is a smaller number
of items in a page (this is typical when each item is longer), we just
insert those items in the object and print it as before.
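A minimal sketch, in standard C++ (a std::deque of pre-serialized items stands in for chunked_vector<rjson::value>, and the function name mirrors the print_with_extra_array() this patch adds), of printing the items without ever building one contiguous array:
```
#include <deque>
#include <iostream>
#include <string>

// Hypothetical stand-in for print_with_extra_array(): print the response
// object with the separately-collected "Items" array spliced in.
void print_with_extra_array(std::ostream& out,
                            const std::string& response_without_items,
                            const std::deque<std::string>& items) {
    // Drop the closing '}' of the response object so we can append "Items".
    out << response_without_items.substr(0, response_without_items.size() - 1);
    out << ",\"Items\":[";
    bool first = true;
    for (const auto& item : items) {
        if (!first) out << ',';
        first = false;
        out << item;            // each item was serialized independently
    }
    out << "]}";
}

int main() {
    std::deque<std::string> items = {R"({"p":"1"})", R"({"p":"2"})"};
    print_with_extra_array(std::cout, R"({"Count":2})", items);
}
```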
Beyond the original slow test that demonstrated the oversized allocation
(which is now gone), this patch also includes a new test which
exercises the new code with a scan of 700 (>256) items in a page -
but this new test is fast enough to be permanently in our test suite
and not a manual "veryslow" test as the other test.
Fixes #23535
(cherry picked from commit 2385fba4b6)
Fixes #24574
* Ensure we close the embedded load_cache objects on encryption shutdown; otherwise, in unit testing, they can be destroyed while a timer is still active, which triggers an assert.
* Add extra exception handling to `network_error_test_helper`, so that even if the test framework lets an exception escape, we properly stop the network proxy and avoid a use-after-free.
- (cherry picked from commit ee98f5d361)
- (cherry picked from commit 8d37e5e24b)
Parent PR: #24633
Closes scylladb/scylladb#24772
* github.com:scylladb/scylladb:
encryption_at_rest_test: Add exception handler to ensure proxy stop
encryption: Ensure stopping timers in provider cache objects
This issue happens with removenode when RBNO is disabled, so the range
streamer is used.
The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B
Readers created by the sender side exhaust the streaming semaphore on B and C.
The receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.
Even if we didn't deadlock, and the streaming semaphore was simply exhausted
by other receiving sessions (via tracking-only permits), the query may still
time out due to starvation.
To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.
Fixes #24807
Fixes #24925
- (cherry picked from commit ee2fa58bd6)
- (cherry picked from commit dff2b01237)
Parent PR: #24929
Closes scylladb/scylladb#25058
* github.com:scylladb/scylladb:
streaming: Avoid deadlock by running view checks in a separate scheduling group
service: migration_manager: Run group0 barrier in gossip scheduling group
During a graceful node shutdown, RPC listeners are stopped in `storage_service::drain_on_shutdown`
as one of the first steps. However, even after RPCs are shut down, some write handlers in
`storage_proxy` may still be waiting for background writes to complete. These handlers retain the ERM.
Since the RPC subsystem is no longer active, replies cannot be received, and if any RPC commands are
concurrently executing `barrier_and_drain`, they may get stuck waiting for those writes. This can block
the messaging server shutdown and delay the entire shutdown process until the write timeout occurs.
This change introduces the cancellation of all outstanding write handlers in `storage_proxy`
during shutdown to prevent unnecessary delays.
Fixes scylladb/scylladb#23665
(cherry picked from commit e0dc73f52a)
This test reproduces an issue where a topology change and an ongoing write query
during query coordinator shutdown can cause the node to get stuck.
When a node receives a write request, it creates a write handler that holds
a copy of the current table's ERM (Effective Replication Map). The ERM ensures
that no topology or schema changes occur while the request is being processed.
After the query coordinator receives the required number of replica write ACKs
to satisfy the consistency level (CL), it sends a reply to the client. However,
the write response handler remains alive until all replicas respond — the remaining
writes are handled in the background.
During shutdown, when all network connections are closed, these responses can no longer
be received. As a result, the write response handler is only destroyed once the write
timeout is reached.
This becomes problematic because the ERM held by the handler blocks topology or schema
change commands from executing. Since shutdown waits for these commands to complete,
this can lead to unnecessary delays in node shutdown and restarts, and occasional
test case failures.
Test for: scylladb/scylladb#23665
(cherry picked from commit bc934827bc)
Prevent Seastar from retrying HTTP requests to avoid buffer double-feed
issues when an entire request is retried. This could cause data
corruption in `chunked_download_source`. The change is global for every
instance of `s3_client`, but it is still safe because:
* Seastar's `http_client` resets connections regardless of retry behavior
* `s3_client` retry logic handles all error types—exceptions, HTTP errors,
and AWS-specific errors—via `http_retryable_client`
(cherry picked from commit fc2c9dd290)
Inject exceptions not wrapped in `aws_error` from request callback
lambda to verify they are properly caught and handled.
(cherry picked from commit ba910b29ce)
Create aws_error from raised exceptions when possible and respond
appropriately. Previously, non-aws_exception types leaked from the
request handler and were treated as non-retryable, causing potential
data corruption during download.
(cherry picked from commit b7ae6507cd)
Move `aws_error` creation logic out of `retryable_http_client` and
into the `aws_error` class to support reuse across components.
(cherry picked from commit d53095d72f)
Add `make_data_or_index_source` to the storages to utilize the new S3-based data source, which should improve restore performance.
* Introduce the `encrypted_data_source` class, which wraps an existing data source to read and decrypt data on the fly using block encryption (a minimal sketch follows after this list). Also add unit tests to verify correct decryption behavior.
* Add `make_data_or_index_source` to the `storage` interface. For `filesystem_storage` it simply creates a `data_source` from a file, and for `s3_storage` it creates a (possibly decrypting) source from the S3 `make_download_source`. This change should improve performance when reading large objects from S3 and should not affect `filesystem_storage` at all.
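A minimal sketch, in standard C++ (the real class wraps a seastar data_source and uses real block encryption; here an abstract byte source and a trivial placeholder "cipher" stand in), of the wrapping idea:
```
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for a data source that yields chunks of bytes (empty chunk = EOF).
struct byte_source {
    virtual std::vector<uint8_t> get() = 0;
    virtual ~byte_source() = default;
};

// Wraps an underlying source and decrypts each block before handing it out.
class decrypting_source : public byte_source {
    std::unique_ptr<byte_source> _underlying;
    uint8_t _key;                       // placeholder for a real block cipher key
public:
    decrypting_source(std::unique_ptr<byte_source> underlying, uint8_t key)
        : _underlying(std::move(underlying)), _key(key) {}

    std::vector<uint8_t> get() override {
        auto block = _underlying->get();    // read the next encrypted block
        for (auto& b : block) {
            b ^= _key;                      // placeholder "decryption" of the block
        }
        return block;
    }
};
```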
Fixes: https://github.com/scylladb/scylladb/issues/22458
- (cherry picked from commit 211daeaa40)
- (cherry picked from commit 7e5e3c5569)
- (cherry picked from commit 0de61f56a2)
- (cherry picked from commit 8ac2978239)
- (cherry picked from commit dff9a229a7)
- (cherry picked from commit 8d49bb8af2)
Parent PR: #23695
Closes scylladb/scylladb#25016
* github.com:scylladb/scylladb:
sstables: Start using `make_data_or_index_source` in `sstable`
sstables: refactor readers and sources to use coroutines
sstables: coroutinize futurized readers
sstables: add `make_data_or_index_source` to the `storage`
encryption: refactor key retrieval
encryption: add `encrypted_data_source` class
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixes scylladb/scylladb#24643
Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.
- (cherry picked from commit 20d0050f4e)
- (cherry picked from commit 59800b1d66)
Parent PR: #25008
Closes scylladb/scylladb#25108
* github.com:scylladb/scylladb:
cdc: Forbid altering columns of inactive CDC log table
cdc: Forbid altering columns of CDC log tables directly
When CDC becomes disabled on the base table, the CDC log table
still exists (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old log table will be *re-attached*.
Since we want to avoid situations where the definition of the log
table becomes misaligned with the definition of the base table
due to user actions, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.
Validation tests are provided.
(cherry picked from commit 59800b1d66)
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixes scylladb/scylladb#24643
(cherry picked from commit 20d0050f4e)
In the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.
When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.
However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.
If such a thing happens, the code now throws an exception instead of
crashing on a null pointer dereference. Currently we don't have a safer
approach to handle this, but this might change in the future. The
other alternative is dropping that data silently, which we prefer not
to do.
Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column and starts writing values to the column
before the ALTER is complete, or
* The user drops a column and continues writing values to the column
while it's being dropped.
Both cases might as well fail with an error because the column is not
found in the base table.
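A minimal sketch, in standard C++ with hypothetical types (the real code works on Scylla schema objects), of the lookup that now throws instead of dereferencing a null pointer:
```
#include <map>
#include <stdexcept>
#include <string>

struct column_definition { std::string name; };

// Stand-in for the by-name column lookup; returns nullptr when not found.
const column_definition* find_column(const std::map<std::string, column_definition>& schema,
                                     const std::string& name) {
    auto it = schema.find(name);
    return it == schema.end() ? nullptr : &it->second;
}

const column_definition& get_cdc_column_or_throw(
        const std::map<std::string, column_definition>& cdc_schema,
        const std::string& base_column_name) {
    const auto* col = find_column(cdc_schema, base_column_name);
    if (!col) {
        // Base and CDC schemas can be briefly incompatible during ALTER;
        // failing the write is preferred over crashing or silently dropping data.
        throw std::runtime_error("column " + base_column_name +
                                 " not found in CDC log schema");
    }
    return *col;
}
```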
Fixes scylladb/scylladb#24952
backport needed - simple fix for a node crash
- (cherry picked from commit b336f282ae)
- (cherry picked from commit 86dfa6324f)
Parent PR: #24986
Closes scylladb/scylladb#25067
* github.com:scylladb/scylladb:
test: cdc: add test_cdc_with_alter
cdc: throw error if column doesn't exist
In the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.
When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.
However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.
If such a thing happens, the code now throws an exception instead of
crashing on a null pointer dereference. Currently we don't have a safer
approach to handle this, but this might change in the future. The
other alternative is dropping that data silently, which we prefer not
to do.
Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column and starts writing values to the column
before the ALTER is complete, or
* The user drops a column and continues writing values to the column
while it's being dropped.
Both cases might as well fail with an error because the column is not
found in the base table.
Fixes scylladb/scylladb#24952
(cherry picked from commit b336f282ae)
This issue happens with removenode when RBNO is disabled, so the range
streamer is used.
The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B
Readers created by the sender side exhaust the streaming semaphore on B and C.
The receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.
To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.
Fixes: #24807
(cherry picked from commit dff2b01237)
Fixes two issues.
One is a potential priority inversion. The barrier will be executed
using the scheduling group of the first fiber which triggers it; the rest
will block waiting on it. For example, CQL statements which need to
sync the schema on the replica side can block on the barrier triggered by
streaming. That's undesirable. This is theoretical, not proven in the
field.
The second problem is blocking the error path. This barrier is called
from the streaming error handling path. If the streaming concurrency
semaphore is exhausted, and streaming fails due to timeout on
obtaining the permit in check_needs_view_update_path(), the error path
will block too because it will also attempt to obtain the permit as
part of the group0 barrier. Running it in the gossip scheduling group
prevents this.
Fixes #24925
(cherry picked from commit ee2fa58bd6)
The functions password_authenticator::start and
standard_role_manager::start have a similar structure: they spawn a
fiber which invokes a callback that performs some migration until that
migration succeeds. Both handlers set a shared promise called
_superuser_created_promise (those are actually two promises, one for the
password authenticator and the other for the role manager).
The handlers are similar in both cases. They check if auth is in legacy
mode and behave differently depending on that. In legacy mode, the
promise is set (if it was not set before), and some legacy migration
actions follow. In auth-on-raft mode, an attempt is made to create the
superuser, and if it succeeds, the promise is _unconditionally_ set.
While it makes sense at a glance to set the promise unconditionally,
there is a non-obvious corner case during upgrade to topology on raft.
During the upgrade, auth switches from legacy mode to auth-on-raft
mode. Thus, if the callback didn't succeed in legacy mode and then runs
in auth-on-raft mode and succeeds, it will unconditionally set a
promise that was already set - this is a bug and triggers an assertion
in seastar.
Fix the issue by surrounding the `shared_promise::set_value` call with
an `if` - like it is already done for the legacy case.
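A minimal sketch, in standard C++ (the real code uses seastar::shared_promise; a plain std::promise plus a flag models it here), of the guarded set:
```
#include <future>

// Models the "superuser created" shared promise that both start() fibers set.
class superuser_created_signal {
    std::promise<void> _promise;
    bool _set = false;
public:
    // The fix: set the promise only if it has not been set already, so a
    // callback that first set it in legacy mode and later succeeds in
    // auth-on-raft mode does not set it a second time.
    void maybe_set() {
        if (!_set) {
            _promise.set_value();
            _set = true;
        }
    }
    std::future<void> get_future() { return _promise.get_future(); }
};
```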
Fixes: scylladb/scylladb#24975
Closes scylladb/scylladb#24976
(cherry picked from commit a14b7f71fe)
Closes scylladb/scylladb#25019