scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 14:15:46 +00:00

Author	SHA1	Message	Date
Aleksandra Martyniuk	9554b4ef28	db/batchlog_manager: coroutinize replay_all_failed_batches (cherry picked from commit `502b03dbc6`)	2025-12-16 15:55:25 +01:00
Jenkins Promoter	bc863d96fe	Update pgo profiles - aarch64	2025-12-15 04:45:57 +02:00
Benny Halevy	9ac657aa20	utils: error_injection: wait_for_message: print injection_name and caller source_location on timeout When waiting for the condition variable times out we call on_internal_error, but unfortunately, the backtrace it generates is obfuscated by `coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume`. To make the log more useful, print the error injection name and the caller's source_location in the timeout error message. Fixes #27531 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#27532 (cherry picked from commit `5f13880a91`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#27579	2025-12-12 14:21:24 +01:00
Anna Stuchlik	58b490869f	replace the Driver pages with a link to the new Drivers pages This commit removes the now-redundant driver pages from the ScyllaDB documentation. Instead, a link is added to the pages where the driver information was moved. Also, the links are updated across the ScyllaDB manual. Redirections are added for all the removed pages. Fixes https://github.com/scylladb/scylladb/issues/26871 Closes scylladb/scylladb#27277 (cherry picked from commit `c5580399a8`) Closes scylladb/scylladb#27436	2025-12-12 10:35:42 +01:00
Yaron Kaikov	71d82e4271	Add JIRA issue validation to backport PR fixes check Extend the Fixes validation pattern to also accept JIRA issue references (format: [A-Z]+-\d+) in addition to GitHub issue references. This allows backport PRs to reference JIRA issues in the format 'Fixes: PROJECT-123'. Fixes: https://github.com/scylladb/scylladb/issues/27571 Closes scylladb/scylladb#27572 (cherry picked from commit `3dfa5ebd7f`) Closes scylladb/scylladb#27594	2025-12-12 09:38:08 +02:00
Avi Kivity	dfb5d1c776	tools: toolchain: prepare: replace 'reg' with 'skopeo' The prepare script uses 'reg' to verify we're not going to overwrite an existing image. The 'reg' command is not available in Fedora 43. Use 'skopeo' instead. Skopeo is part of the podman ecosystem, so hopefully it will live longer. Fixes #27178. Closes scylladb/scylladb#27179 (cherry picked from commit `d6ef5967ef`) Closes scylladb/scylladb#27193	2025-12-07 13:15:12 +02:00
Calle Wilund	7defa0b4cd	commitlog::read_log_file: Check for eof position on all data reads Fixes #24346 When reading, we check for each entry and each chunk, if advancing there will hit EOF of the segment. However, IFF the last chunk being read has the last entry _exactly_ matching the chunk size, and the chunk ending at _exactly_ segment size (preset size, typically 32Mb), we did not check the position, and instead complained about not being able to read. This has literally _never_ happened in actual commitlog (that was replayed at least), but has apparently happened more and more in hints replay. Fix is simple, just check the file position against size when advancing said position, i.e. when reading (skipping already does). v2: * Added unit test Closes scylladb/scylladb#27236 (cherry picked from commit `59c87025d1`) Closes scylladb/scylladb#27336	2025-12-03 12:21:13 +03:00
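Here is a simplified model of the fix. Every data read that advances the position also compares it with the segment size, so an entry that ends exactly at the segment boundary is treated as a clean end of segment rather than a failed read. `segment_cursor` and its members are hypothetical names, not the real `commitlog::read_log_file` internals.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified read cursor over one commitlog segment.
struct segment_cursor {
    uint64_t pos = 0;   // current file position
    uint64_t size = 0;  // preset segment size (e.g. 32 MB)

    // Consume `n` bytes of data. Returns true once the segment end is reached,
    // so the caller stops instead of attempting (and failing) another read.
    bool advance(uint64_t n) {
        if (n > size - pos) {  // pos <= size always holds, so no underflow
            throw std::runtime_error("entry extends past end of segment");
        }
        pos += n;
        return pos == size;
    }
};
```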
Pavel Emelyanov	388627365f	Merge '[Backport 2025.1] tablet: scheduler: Do not emit conflicting migration in merge colocation' from Scylladb[bot] The tablet scheduler should not emit conflicting migrations for the same tablet. This was addressed initially in scylladb/scylladb#26038 but the check is missing in the merge colocation plan, so add it there as well. Without this check, the merge colocation plan could generate a conflicting migration for a tablet that is already scheduled for migration, as the test demonstrates. This can cause correctness problems, because if the load balancer generates two migrations for a single tablet, both will be written as mutations, and the resulting mutation could contain mixed cells from both migrations. Fixes scylladb/scylladb#27304 backport to existing releases - this is a bug that can affect correctness - (cherry picked from commit `97b7c03709`) Parent PR: #27312 Closes scylladb/scylladb#27329 * github.com:scylladb/scylladb: tablet: scheduler: Do not emit conflicting migration in merge colocation tablet: scheduler: Do not emit conflicting migrations in the plan	2025-12-03 12:20:39 +03:00
Aleksandra Martyniuk	5297084bd1	replica: database: change type of tables_metadata::_ks_cf_to_uuid If there are many tables, a node reports an oversized allocation in _ks_cf_to_uuid, which is a flat_hash_map. Change the type to std::unordered_map to prevent oversized allocations. Fixes: https://github.com/scylladb/scylladb/issues/26787. Closes scylladb/scylladb#27165 (cherry picked from commit `19a7d8e248`) Closes scylladb/scylladb#27192	2025-12-03 12:19:12 +03:00
Ernest Zaslavsky	6cbf09dae1	streaming:: add more logging Start logging all missed streaming options like `scope` and `primary_replica` flags Fixes: https://github.com/scylladb/scylladb/issues/27299 Closes scylladb/scylladb#27311 (cherry picked from commit `1d5f60baac`) Closes scylladb/scylladb#27335	2025-12-02 12:21:01 +01:00
Jenkins Promoter	641a573757	Update pgo profiles - aarch64	2025-12-01 04:49:52 +02:00
Michael Litvak	e3f5924f71	tablet: scheduler: Do not emit conflicting migration in merge colocation The tablet scheduler should not emit conflicting migrations for the same tablet. This was addressed initially in scylladb/scylladb#26038 but the check is missing in the merge colocation plan, so add it there as well. Without this check, the merge colocation plan could generate a conflicting migration for a tablet that is already scheduled for migration, as the test demonstrates. This can cause correctness problems, because if the load balancer generates two migrations for a single tablet, both will be written as mutations, and the resulting mutation could contain mixed cells from both migrations. Fixes scylladb/scylladb#27304 Closes scylladb/scylladb#27312 (cherry picked from commit `97b7c03709`)	2025-11-30 10:37:58 +01:00
Tomasz Grabiec	dc1a318971	tablet: scheduler: Do not emit conflicting migrations in the plan Plan-making is invoked independently for different DCs (and in the future, racks) and then plans are merged. It could be that the same tablets are selected for migration in different DCs. Only one migration will prevail and be committed to group0, so it's not a correctness problem. Next cycle will recognize that the tablet is in transition and will not be selected by plan-maker. But it makes plan-making less efficient. It may also surprise consumers of the plan, like we saw in #25912. So we should make plan-maker be aware of already scheduled transitions and not consider those tablets as candidates. Fixes #26038 Closes scylladb/scylladb#26048 (cherry picked from commit `981592bca5`)	2025-11-30 10:37:58 +01:00
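The plan-merging rule can be sketched as follows: per-DC plans are merged in order, and any migration for a tablet that is already in transition, or already claimed by an earlier plan, is dropped. `tablet_id`, `migration` and `merge_plans` are hypothetical simplifications of the scheduler's types. The same guard is what the merge colocation fix adds to that plan.

```cpp
#include <cassert>
#include <set>
#include <vector>

using tablet_id = int;  // hypothetical stand-in for the real tablet identifier

struct migration {
    tablet_id tablet;
    int target;  // hypothetical destination (e.g. a node/shard index)
};

// Merge per-DC plans; `scheduled` starts with tablets already in transition.
// A tablet is emitted at most once, so no two migrations can conflict.
std::vector<migration> merge_plans(const std::vector<std::vector<migration>>& per_dc,
                                   std::set<tablet_id> scheduled) {
    std::vector<migration> out;
    for (const auto& plan : per_dc) {
        for (const auto& m : plan) {
            if (scheduled.insert(m.tablet).second) {  // false if already taken
                out.push_back(m);
            }
        }
    }
    return out;
}
```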
Patryk Jędrzejczak	edbd2e80b2	Merge '[Backport 2025.1] fix notification about expiring erm held for to long' from Scylladb[bot] Commit `6e4803a750` broke notification about expired erms held for too long since it resets the tracker without calling its destructor (where notification is triggered). Fix the assign operator to call the destructor like it should. Fixes https://github.com/scylladb/scylladb/issues/27141 - (cherry picked from commit `9f97c376f1`) - (cherry picked from commit `5dcdaa6f66`) Parent PR: #27140 Closes scylladb/scylladb#27273 * https://github.com/scylladb/scylladb: test: test that expired erm that held for too long triggers notification token_metadata: fix notification about expiring erm held for to long	2025-11-27 12:35:18 +01:00
Patryk Jędrzejczak	d735fe2ecc	Merge '[Backport 2025.1] locator/node: include _excluded in missing places' from Scylladb[bot] We currently ignore the `_excluded` field in `node::clone()` and the verbose formatter of `locator::node`. The first one is a bug that can have unpredictable consequences on the system. The second one can be a minor inconvenience during debugging. We fix both places in this PR. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-72 This PR is a bugfix that should be backported to all supported branches. - (cherry picked from commit `4160ae94c1`) - (cherry picked from commit `287c9eea65`) Parent PR: #27265 Closes scylladb/scylladb#27288 * https://github.com/scylladb/scylladb: locator/node: include _excluded in verbose formatter locator/node: preserve _excluded in clone()	2025-11-27 12:32:38 +01:00
Jenkins Promoter	4b95b6ac21	Update ScyllaDB version to: 2025.1.11	2025-11-27 07:19:36 +02:00
Patryk Jędrzejczak	e3af767196	locator/node: include _excluded in verbose formatter It can be helpful during debugging. (cherry picked from commit `287c9eea65`)	2025-11-26 23:03:36 +00:00
Patryk Jędrzejczak	b779f15cdb	locator/node: preserve _excluded in clone() We currently ignore the `_excluded` field in `clone()`. Losing information about exclusion can have unpredictable consequences. One observed effect (that led to finding this issue) is that the `/storage_service/nodes/excluded` API endpoint sometimes misses excluded nodes. (cherry picked from commit `4160ae94c1`)	2025-11-26 23:03:36 +00:00
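The failure mode is a `clone()` that copies fields one by one, where a newly added field is easy to forget. A minimal sketch, using a hypothetical and heavily simplified `node` class rather than the real `locator::node`, shows the corrected clone passing `_excluded` through.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for locator::node.
class node {
    int _id;
    std::string _dc;
    bool _excluded;
public:
    node(int id, std::string dc, bool excluded = false)
        : _id(id), _dc(std::move(dc)), _excluded(excluded) {}

    bool is_excluded() const { return _excluded; }

    // clone() lists every field explicitly; forgetting to pass _excluded
    // would silently produce a copy that is no longer marked as excluded.
    std::unique_ptr<node> clone() const {
        return std::make_unique<node>(_id, _dc, _excluded);
    }
};
```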
Gleb Natapov	01c71681a8	test: test that expired erm that held for too long triggers notification (cherry picked from commit `5dcdaa6f66`)	2025-11-26 15:07:28 +00:00
Gleb Natapov	58f927fd8e	token_metadata: fix notification about expiring erm held for to long Commit `6e4803a750` broke notification about expired erms held for too long since it resets the tracker without calling its destructor (where notification is triggered). Fix assign operator to call destructor. (cherry picked from commit `9f97c376f1`)	2025-11-26 15:07:28 +00:00
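The bug pattern is an assignment operator that overwrites state whose destructor has a side effect. A minimal sketch, with a hypothetical `tracker` class standing in for the real erm tracker, shows the fix: the assignment operator calls the destructor first, so the pending notification fires before the new state is taken over.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <utility>

// Hypothetical tracker whose destructor fires a notification if still armed.
class tracker {
    std::function<void()> _on_release;
public:
    tracker() = default;
    explicit tracker(std::function<void()> f) : _on_release(std::move(f)) {}
    tracker(tracker&& o) noexcept : _on_release(std::exchange(o._on_release, nullptr)) {}

    tracker& operator=(tracker&& o) noexcept {
        if (this != &o) {
            // Overwriting _on_release directly would drop the old callback
            // without invoking it (the bug). Destroy first, then rebuild.
            this->~tracker();
            new (this) tracker(std::move(o));
        }
        return *this;
    }

    ~tracker() {
        if (_on_release) {
            _on_release();
        }
    }
};
```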
Ernest Zaslavsky	d10f02e49b	streaming: fix loop break condition in tablet_sstable_streamer::stream Correct the loop termination logic that previously caused certain SSTables to be prematurely excluded, resulting in lost mutations. This change ensures all relevant SSTables are properly streamed and their mutations preserved. (cherry picked from commit `dedc8bdf71`) Closes scylladb/scylladb#27146 Fixes: #26979 Parent PR: #26980 Unfortunately, the pytest-based test cannot be backported because of changes made to the testing harness and scylla-tools	2025-11-25 11:56:38 +03:00
Pavel Emelyanov	14fd0d9c21	lister: Fix race between readdir and stat Sometimes file::list_directory() returns entries without the type set. In this case, lister calls file_type() on the entry name to get it. If the call returns a disengaged type, the code assumes that some error occurred and resolves into an exception. That's not correct. The file_type() method returns a disengaged type only if the file being inspected is missing (i.e. on ENOENT errno). But this can validly happen if a file is removed between readdir and stat. In that case it's not "some error happened"; the entry should just be skipped. If some error did happen, file_type() would resolve into an exceptional future on its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26595 (cherry picked from commit `d9bfbeda9a`) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26754	2025-11-24 17:19:42 +03:00
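The contract described in this commit can be mirrored with `std::filesystem`. An entry that vanished between listing and stat yields "no type" and is skipped, while any other failure is reported as an error. This is a sketch only: the real lister uses seastar's asynchronous `file_type()`, and `entry_type` is a hypothetical name.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Returns the entry's type, or std::nullopt if it no longer exists (ENOENT),
// i.e. it was removed between readdir and stat and should just be skipped.
// Any other failure is a real error and is thrown.
std::optional<fs::file_type> entry_type(const fs::path& p) {
    std::error_code ec;
    fs::file_status st = fs::symlink_status(p, ec);  // lstat-like, no symlink follow
    if (st.type() == fs::file_type::not_found) {
        return std::nullopt;
    }
    if (ec) {
        throw fs::filesystem_error("stat failed", p, ec);
    }
    return st.type();
}
```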
Pavel Emelyanov	4f0ea75f51	Merge '[Backport 2025.1] Synchronize tablet split and load-and-stream' from Raphael Raph Carvalho Load-and-stream is broken when running concurrently with the finalization step of tablet split. Consider this: 1. split starts 2. split finalization executes the barrier and succeeds 3. load-and-stream runs now, starts writing sstable (pre-split) 4. split finalization publishes changes to tablet metadata 5. load-and-stream finishes writing sstable 6. sstable cannot be loaded since it spans two tablets. Two possible fixes (maybe both): 1. load-and-stream waits for topology to quiesce; 2. perform split compaction on an sstable that spans both sibling tablets. This patch implements #1. By waiting for topology to quiesce, we guarantee that load-and-stream only starts when there's no chance the coordinator is handling some topology operation like split finalization. Fixes https://github.com/scylladb/scylladb/issues/26455. (cherry picked from commit `3abc66da5a`) (cherry picked from commit `4654cdc6fd`) Parent PR: https://github.com/scylladb/scylladb/pull/26456 Closes scylladb/scylladb#27126 * https://github.com/scylladb/scylladb: sstables_loader: Don't bypass synchronization with busy topology test: Add reproducer for l-a-s and split synchronization issue sstables_loader: Synchronize tablet split and load-and-stream sstable_set: incremental_reader_selector: be more careful when filtering out already engaged sstables	2025-11-24 17:17:53 +03:00
Raphael S. Carvalho	90e6e88f69	sstables_loader: Don't bypass synchronization with busy topology The patch `c543059f86` fixed the synchronization issue between tablet split and load-and-stream. The synchronization worked only with raft topology, and therefore was disabled with gossip. To do the check, it called storage_service::raft_topology_change_enabled(), but the topology kind is only available/set on shard 0, so the synchronization was bypassed whenever load-and-stream ran on any shard other than 0. The reproducer didn't catch this because it was restricted to a single CPU. It now runs with multiple CPUs and catches the observed problem. Fixes #22707 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#26730 (cherry picked from commit `7f34366b9d`) (cherry picked from commit `4c466ace4f`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-21 10:39:53 -03:00
Raphael S. Carvalho	d2bddea515	test: Add reproducer for l-a-s and split synchronization issue Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `4654cdc6fd`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-21 10:39:53 -03:00
Raphael S. Carvalho	e6dd6b462e	sstables_loader: Synchronize tablet split and load-and-stream Load-and-stream is broken when running concurrently with the finalization step of tablet split. Consider this: 1) split starts 2) split finalization executes the barrier and succeeds 3) load-and-stream runs now, starts writing sstable (pre-split) 4) split finalization publishes changes to tablet metadata 5) load-and-stream finishes writing sstable 6) sstable cannot be loaded since it spans two tablets. Two possible fixes (maybe both): 1) load-and-stream waits for topology to quiesce 2) perform split compaction on an sstable that spans both sibling tablets. This patch implements #1. By waiting for topology to quiesce, we guarantee that load-and-stream only starts when there's no chance the coordinator is handling some topology operation like split finalization. Fixes #26455. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `3abc66da5a`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-21 10:39:53 -03:00
Raphael S. Carvalho	451ae85b38	sstable_set: incremental_reader_selector: be more careful when filtering out already engaged sstables The incremental reader selector maintains an unordered_set of sstables that are already engaged, and uses std::views::filter to filter those out. It adds the sstable under consideration to the set, and if addition failed (because it's already in) then it filters it out. This breaks if the filter view is executed twice - the first pass will add every sstable to the set, and the second will consider every sstable already filtered. This is what happens with libstdc++ 15 (due to the addition of vector(from_range_t) constructor), which uses the first pass to calculate the vector size and the second pass to insert the elements into a correctly-sized vector. Fix by open-coding the loop. Closes scylladb/scylladb#23597 (cherry picked from commit `ac3d25eb44`) Fixes #26247 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-21 10:38:59 -03:00
Botond Dénes	cb1f72dc81	Merge '[Backport 2025.1] Automatic cleanup improvements' from Scylladb[bot] This series allows an operator to reset the 'cleanup needed' flag if they have already cleaned up the node, so that automatic cleanup will not do it again. We also change 'nodetool cleanup' back to run cleanup on one node only (and reset the 'cleanup needed' flag at the end), but the new '--global' option allows running cleanup on all nodes that need it simultaneously. Fixes https://github.com/scylladb/scylladb/issues/26866 Backport to all supported versions, since the current automatic cleanup behaviour may create load the operator does not expect during cluster resizing. - (cherry picked from commit `e872f9cb4e`) - (cherry picked from commit `0f0ab11311`) Parent PR: #26868 Closes scylladb/scylladb#27089 * github.com:scylladb/scylladb: cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster cleanup: Add RESTful API to allow reset cleanup needed flag scylla-2025.1.10-candidate-20251123021923 scylla-2025.1.10	2025-11-21 13:52:08 +02:00
Patryk Jędrzejczak	5ba523fa77	test: test_raft_recovery_stuck: ensure mutual visibility before using driver Not waiting for nodes to see each other as alive can cause the driver to fail the request sent in `wait_for_upgrade_state()`. scylladb/scylladb#19771 has already replaced concurrent restarts with `ManagerClient.rolling_restart()`, but it has missed this single place, probably because we do concurrent starts here. Fixes #27055 Closes scylladb/scylladb#27075 (cherry picked from commit `e35ba974ce`) Closes scylladb/scylladb#27107	2025-11-20 10:55:12 +02:00
Botond Dénes	cf39bd8f3e	Merge '[Backport 2025.1] service/qos: Fall back to default scheduling group when using maintenance socket' from Scylladb[bot] The service level controller relies on `auth::service` to collect information about roles and the relation between them and the service levels (those attached to them). Unfortunately, the service level controller is initialized much earlier than `auth::service`, so we had to prevent potential invalid queries of user service levels (cf. `46193f5e79`). However, that came at a price: it made the maintenance socket incompatible with the current implementation of the service level controller. The maintenance socket starts early, before the `auth::service` is fully initialized and registered, and is exposed almost immediately. If the user connects to Scylla via the maintenance socket within this time window, one of the things that will happen is choosing the right service level for the connection. Since the `auth::service` is not registered, Scylla will fail an assertion and crash. A similar scenario occurs when using maintenance mode. The maintenance socket is how the user communicates with the database, and we're not prepared for that either. To avoid unnecessary crashes, we add new branches for when the passed user is absent or corresponds to the anonymous role. Since the role corresponding to a connection via the maintenance socket is the anonymous role, that solves the problem. Some accesses to `auth::service` are not affected, and we do not modify those. Fixes scylladb/scylladb#26816 Backport: yes. This is a fix of a regression.
- (cherry picked from commit `c0f7622d12`) - (cherry picked from commit `222eab45f8`) - (cherry picked from commit `394207fd69`) - (cherry picked from commit `b357c8278f`) Parent PR: #26856 Closes scylladb/scylladb#27029 * github.com:scylladb/scylladb: test/cluster/test_maintenance_mode.py: Wait for initialization test: Disable maintenance mode correctly in test_maintenance_mode.py test: Fix keyspace in test_maintenance_mode.py service/qos: Do not crash Scylla if auth_integration absent	2025-11-20 10:47:06 +02:00
Botond Dénes	a84b331b09	Merge '[Backport 2025.1] cdc: set column drop timestamp in the future' from Scylladb[bot] When dropping a column from a CDC log table, set the column drop timestamp several seconds into the future. If a value is written to a column concurrently with dropping that column, the value's timestamp may be after the column drop timestamp. If this value is also flushed to an SSTable, the SSTable would be corrupted, because it considers the column missing after the drop timestamp and doesn't allow values for it. While this issue affects general tables, it especially impacts CDC tables because this scenario can occur when writing to a table with CDC preimage enabled while dropping a column from the base table. This happens even if the base mutation doesn't write to the dropped column, because CDC log mutations can generate values for a column even if the base mutation doesn't. For general tables, this issue can be avoided by simply not writing to a column while dropping it. We fix this for the more problematic case of CDC log tables by setting the column drop timestamp several seconds into the future, ensuring that writes concurrent with column drops are much less likely to have timestamps greater than the column drop timestamp. Fixes https://github.com/scylladb/scylladb/issues/26340 the issue affects all previous releases, backport to improve stability - (cherry picked from commit `eefae4cc4e`) - (cherry picked from commit `48298e38ab`) - (cherry picked from commit `039323d889`) - (cherry picked from commit `e85051068d`) Parent PR: #26533 Closes scylladb/scylladb#27025 * github.com:scylladb/scylladb: test: test concurrent writes with column drop with cdc preimage cdc: check if recreating a column too soon cdc: set column drop timestamp in the future	2025-11-20 10:46:24 +02:00
Gleb Natapov	1368f48221	cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster `97ab3f6622` changed "nodetool cleanup" (without arguments) to run cleanup on all dirty nodes in the cluster. This was somewhat unexpected, so this patch changes it back to run cleanup on the target node only (and reset "cleanup needed" flag afterwards) and it adds "nodetool cluster cleanup" command that runs the cleanup on all dirty nodes in the cluster. (cherry picked from commit `0f0ab11311`)	2025-11-18 16:01:27 +02:00
Gleb Natapov	c6d443869a	cleanup: Add RESTful API to allow reset cleanup needed flag Cleaning up a node using the per-keyspace/table interface does not reset the "cleanup needed" flag in the topology. The assumption was that running cleanup on an already clean node does nothing and completes quickly. But due to https://github.com/scylladb/scylladb/issues/12215 (which is closed as WONTFIX) this is not the case. This patch provides the ability to reset the flag in the topology if the operator has already cleaned up the node manually. (cherry picked from commit `e872f9cb4e`)	2025-11-18 15:46:39 +02:00
Yaron Kaikov	a01c2dc7e4	install-dependencies.sh: update node_exporter to 1.10.2 Update node exporter to solve CVE-2025-22871 [regenerate frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz ] Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-5 Closes scylladb/scylladb#26916 (cherry picked from commit `c601371b57`) Closes scylladb/scylladb#26950	2025-11-16 16:14:19 +02:00
Yaron Kaikov	1a224e7d05	auto-backport: Add support for JIRA issue references - Added support for JIRA issue references in PR body and commit messages - Supports both short format (PKG-92) and full URL format - Maintains existing GitHub issue reference support - JIRA pattern matches https://scylladb.atlassian.net/browse/{PROJECT-ID} - Allows backporting for PRs that reference JIRA issues with 'fixes' keyword Fixes: https://github.com/scylladb/scylladb/issues/26955 Closes scylladb/scylladb#26954 (cherry picked from commit `3ade3d8f5b`) Closes scylladb/scylladb#26963	2025-11-16 16:14:03 +02:00
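The JIRA-reference rule from this commit and from the earlier backport-check commit can be sketched with `std::regex`. The rule is a `Fixes` keyword followed by a `[A-Z]+-\d+` key, optionally prefixed by the `https://scylladb.atlassian.net/browse/` URL. The real checks live in the project's CI scripts; `jira_fix_reference` and this exact pattern are a hypothetical illustration, not that code.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the JIRA key referenced by a "Fixes" line,
// accepting both the short form (PKG-92) and the full browse URL.
std::optional<std::string> jira_fix_reference(const std::string& text) {
    static const std::regex re(
        R"([Ff]ixes:?\s+(?:https://scylladb\.atlassian\.net/browse/)?([A-Z]+-\d+))");
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m[1].str();
    }
    return std::nullopt;  // no JIRA key (e.g. a GitHub "#123" reference)
}
```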
Benny Halevy	4dcb8c19bd	scylla-sstable: correctly dump sharding_metadata This patch fixes 2 issues in one go: First, sstables::load currently clears the sharding metadata (via open_data()), so scylla-sstable always prints an empty array for it. Second, printing token values would generate invalid JSON, as they are currently printed as binary bytes; they should be printed simply as numbers, as we do elsewhere, for example for the first and last keys. Fixes #26982 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26991 (cherry picked from commit `f9ce98384a`) Closes scylladb/scylladb#27030	2025-11-16 16:05:14 +02:00
Michael Litvak	c8466afe74	test: test concurrent writes with column drop with cdc preimage add a test that writes to a table concurrently with dropping a column, where the table has CDC enabled with preimage. the test reproduces issue #26340 where this results in a malformed sstable. (cherry picked from commit `e85051068d`)	2025-11-16 10:19:08 +01:00
Michael Litvak	903afafa5f	cdc: check if recreating a column too soon When we drop a column from a CDC log table, we set the column drop timestamp a few seconds into the future. This can cause unexpected problems if a user tries to recreate a CDC column too soon, before the drop timestamp has passed. To prevent this issue, when creating a CDC column we check its creation timestamp against the existing drop timestamp, if any, and fail with an informative error if the recreation attempt is too soon. (cherry picked from commit `039323d889`)	2025-11-16 10:19:08 +01:00
Michael Litvak	1d6538cd30	cdc: set column drop timestamp in the future When dropping a column from a CDC log table, set the column drop timestamp several seconds into the future. If a value is written to a column concurrently with dropping that column, the value's timestamp may be after the column drop timestamp. If this value is also flushed to an SSTable, the SSTable would be corrupted, because it considers the column missing after the drop timestamp and doesn't allow values for it. While this issue affects general tables, it especially impacts CDC tables because this scenario can occur when writing to a table with CDC preimage enabled while dropping a column from the base table. This happens even if the base mutation doesn't write to the dropped column, because CDC log mutations can generate values for a column even if the base mutation doesn't. For general tables, this issue can be avoided by simply not writing to a column while dropping it. We fix this for the more problematic case of CDC log tables by setting the column drop timestamp several seconds into the future, ensuring that writes concurrent with column drops are much less likely to have timestamps greater than the column drop timestamp. Fixes scylladb/scylladb#26340 (cherry picked from commit `48298e38ab`)	2025-11-16 10:19:01 +01:00
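Both CDC changes (a drop timestamp set in the future, and rejecting a recreation that comes too soon) reduce to simple timestamp arithmetic. In this sketch, the 5-second margin, the microsecond unit and the function names are assumptions; the commits only say "several seconds".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

using api_timestamp = std::int64_t;  // microseconds since epoch (assumed unit)

// Hypothetical margin: the commit only says "several seconds".
constexpr api_timestamp drop_margin_us = 5'000'000;

// Drop timestamp for a CDC log column: pushed into the future so that writes
// racing with the drop are very likely to carry an older timestamp.
api_timestamp cdc_column_drop_timestamp(api_timestamp now) {
    return now + drop_margin_us;
}

// Reject re-creating a CDC column before its (future) drop timestamp has passed.
void check_cdc_column_recreation(api_timestamp create_ts,
                                 std::optional<api_timestamp> dropped_at) {
    if (dropped_at && create_ts <= *dropped_at) {
        throw std::invalid_argument("column was dropped too recently; retry later");
    }
}
```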
Dawid Mędrek	2313aa5856	test/cluster/test_maintenance_mode.py: Wait for initialization If we try to perform queries too early, before the call to `storage_service::start_maintenance_mode` has finished, we will fail with the following error: ``` ERROR 2025-11-12 20:32:27,064 [shard 0:sl:d] token_metadata - sorted_tokens is empty in first_token_index! ``` To avoid that, we should wait until initialization is complete. (cherry picked from commit `b357c8278f`)	2025-11-15 22:09:14 +00:00
Dawid Mędrek	b217a5e43a	test: Disable maintenance mode correctly in test_maintenance_mode.py Although setting the value of `maintenance_mode` to the string `"false"` disables maintenance mode, the testing framework misinterprets the value and thinks that it's actually enabled. As a result, it might try to connect to Scylla via the maintenance socket, which we don't want. (cherry picked from commit `394207fd69`)	2025-11-15 22:09:14 +00:00
Dawid Mędrek	7808d85ecb	test: Fix keyspace in test_maintenance_mode.py The keyspace used in the test is not necessarily called `ks`. (cherry picked from commit `222eab45f8`)	2025-11-15 22:09:14 +00:00
Dawid Mędrek	f47b0743d0	service/qos: Do not crash Scylla if auth_integration absent If the user connects to Scylla via the maintenance socket, it may happen that `auth_integration` has not been registered in the service level controller yet. One example is maintenance mode when that will never happen; another when the connection occurs before Scylla is fully initialized. To avoid unnecessary crashes, we add new branches if the passed user is absent or if it corresponds to the anonymous role. Since the role corresponding to a connection via the maintenance socket is the anonymous role, that solves the problem. In those cases, we completely circumvent any calls to `auth_integration` and handle them separately. The modified methods are: * `get_user_scheduling_group`, * `with_user_service_level`, * `describe_service_levels`. For the first two, the new behavior is in line with the previous implementation of those functions. The last behaves differently now, but since it's a soft error, crashing the node is not necessary anyway. We throw an exception instead, whose error message should give the user a hint of what might be wrong. The other uses of `auth_integration` within the service level controller are not problematic: * `find_effective_service_level`, * `find_cached_effective_service_level`. They take the name of a role as their argument. Since the anonymous role doesn't have a name, it's not possible to call them with it. Fixes scylladb/scylladb#26816 (cherry picked from commit `c0f7622d12`)	2025-11-15 22:09:13 +00:00
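The shape of the fix is to branch on an absent or anonymous user before touching the auth integration, which may not be registered yet. In this sketch, `auth_integration`, `user`, the group names, and the exception for a named role without auth are hypothetical simplifications, not the real service level controller API.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the auth integration, which may not be registered yet.
struct auth_integration {
    std::string scheduling_group_for(const std::string& role) const {
        return "sl:" + role;
    }
};

// Hypothetical user model: a missing name represents the anonymous role,
// which is what maintenance-socket connections use.
struct user {
    std::optional<std::string> name;
};

std::string get_user_scheduling_group(const std::optional<user>& u,
                                      const auth_integration* auth) {
    if (!u || !u->name) {
        // Absent or anonymous user: never touch auth, use the default group.
        return "default";
    }
    if (auth == nullptr) {
        // Assumed behavior for this sketch: report instead of crashing.
        throw std::logic_error("auth integration not registered");
    }
    return auth->scheduling_group_for(*u->name);
}
```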
Jenkins Promoter	3818e15d91	Update pgo profiles - aarch64	2025-11-15 05:02:53 +02:00
Jenkins Promoter	a945742c2a	Update pgo profiles - x86_64	2025-11-15 04:02:16 +02:00
Ernest Zaslavsky	e185740c54	minio: update CLI usage, remove deprecated `mc` options Replace phased-out `mc` command options with supported alternatives. Ensures compatibility with the latest MinIO version. Closes scylladb/scylladb#24363 (cherry picked from commit `1446f57635`) Closes scylladb/scylladb#27004	2025-11-14 10:48:00 +02:00
Andrei Chekun	88556e6c77	test.py: rewrite the wait_for_first_completed Rewrite wait_for_first_completed to return only the first completed task, with a guarantee that all cancelled and finished tasks are awaited. Use wait_for_first_completed to avoid falsely passing tests in the future and issues like #26148. Use gather_safely to await tasks, removing the warning that a coroutine was not awaited. Closes scylladb/scylladb#26435 (cherry picked from commit `24d17c3ce5`) Closes scylladb/scylladb#26661	2025-11-12 11:50:51 +01:00
Botond Dénes	bec413a671	service/storage_proxy: send batches with CL=EACH_QUORUM Batches that fail on the initial send are retried later, until they succeed. These retries happen with CL=ALL, regardless of what the original CL of the batch was. This is unnecessarily strict. We tried to follow Cassandra here, but Cassandra has a big caveat in its use of CL=ALL for batches. They accept saving just a hint for any/all of the endpoints, so a batch which was just logged in hints is good enough for them. We do not plan on replicating this usage of hints at this time, so as a middle ground, the CL is changed to EACH_QUORUM. Fixes: scylladb/scylladb#25432 Closes scylladb/scylladb#26304 (cherry picked from commit `d9c3772e20`) Closes scylladb/scylladb#26927	2025-11-11 10:23:59 +03:00
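The difference between the old and new consistency levels is easy to quantify. EACH_QUORUM needs a quorum, floor(RF/2) + 1, in every DC, while ALL needs every replica. With RF=3 in each DC, a retried batch can therefore tolerate one unavailable replica per DC instead of none. The helper names below are hypothetical; only the quorum formula is the standard one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Standard quorum size for a replication factor: floor(rf / 2) + 1.
int quorum_for(int rf) { return rf / 2 + 1; }

// Acknowledgements required when a write must reach a quorum in every DC.
int each_quorum_acks(const std::map<std::string, int>& rf_per_dc) {
    int total = 0;
    for (const auto& [dc, rf] : rf_per_dc) {
        total += quorum_for(rf);
    }
    return total;
}

// Acknowledgements required under CL=ALL: every replica in every DC.
int all_acks(const std::map<std::string, int>& rf_per_dc) {
    int total = 0;
    for (const auto& [dc, rf] : rf_per_dc) {
        total += rf;
    }
    return total;
}
```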
Jenkins Promoter	01e929805a	Update pgo profiles - aarch64	2025-11-01 05:04:05 +02:00
Jenkins Promoter	9f961d67d7	Update pgo profiles - x86_64	2025-11-01 04:31:27 +02:00


47138 Commits