scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 15:52:13 +00:00

Author	SHA1	Message	Date
Gleb Natapov	c3d2f0bde9	raft_group0: remove finish_setup_after_join function The only thing it does is change a bootstrapping node into a voter in case the cluster does not support the limited voters feature. But the feature was introduced in 2025.2, and a direct upgrade from 2025.1 to a version newer than 2026.1 is not supported. Even if such an upgrade is done, the removed code has an effect only during bootstrap, not during a regular boot. Also remove the upgrade test, since after the patch, suppressing the feature on the first boot will no longer behave correctly.	2026-05-11 15:38:36 +03:00
Gleb Natapov	5213aee99f	raft_group0: fix indentation after the last change	2026-05-11 11:56:26 +03:00
Gleb Natapov	5f7f72fa50	raft_group: drop unneeded checks	2026-05-11 11:55:39 +03:00
Botond Dénes	3f72852d8c	Merge 'Fix missing format string placeholders across the codebase (33 bugs across 14 modules)' from Yaniv Kaul

Fix 28 format string bugs plus 5 related format argument bugs across 14 modules where `{}` placeholders were missing or arguments were wrong, causing arguments to be silently dropped or misleading output from the `{fmt}` library.

Inspired by https://github.com/scylladb/scylladb/pull/29143 (which fixed a single instance in `replica/table.cc`), a comprehensive audit of the entire codebase was performed to find all similar issues.

- Missing `{}` placeholder (21 instances): the format string simply lacks a `{}` for a passed argument, e.g. in `format("msg for table {}", group_id, table_id)`, `group_id` fills the only `{}`, so `table_id` is silently dropped
- Spurious comma breaking C++ string literal concatenation (2 instances): a comma after a string literal prevents adjacent-literal concatenation, turning the continuation into a format argument instead of part of the format string
- Printf-style `%s` in fmtlib context (4 instances): `%s` has no meaning in fmtlib and appears as literal text, while the argument is silently ignored
- Extra spurious argument (1 instance): an extraneous `t.tomb()` argument inserted between the correct arguments shifted the remaining values into the wrong slots
- Wrong variable in error message (4 instances in `types/map.hh`): error messages for oversized map keys/values reported `map_size` (total entry count) instead of the actual `elem.first.size()` or `elem.second.size()` that exceeded the limit
- Swapped argument order (1 instance in `data_dictionary/data_dictionary.cc`): the format string says `"Extraneous options for {type}: {values}"`, but the values and type arguments were passed in reverse order

| Module | Bugs Fixed | Files |
|--------|:---------:|-------|
| `replica/` | 1 | `table.cc` |
| `service/` | 4 | `raft_group0.cc`, `storage_service.cc` |
| `db/` | 6 | `heat_load_balance.cc`, `commitlog_replayer.cc`, `view_update_generator.cc`, `view_building_worker.cc`, `row_locking.cc` |
| `cql3/` | 2 | `prepare_expr.cc`, `statement_restrictions.cc` |
| `transport/` | 4 | `event_notifier.cc` |
| `sstables/` | 3 | `partition_reversing_data_source.cc`, `reader.cc` |
| `alternator/` | 1 | `conditions.cc` |
| `cdc/` | 1 | `split.cc` |
| `raft/` | 1 | `server.cc` |
| `utils/` | 2 | `gcp/object_storage.cc`, `s3/client.cc` |
| `mutation/` | 1 | `mutation_partition.hh` |
| `ent/` | 2 | `kmip_host.cc`, `kms_host.cc` |
| `types/` | 4 | `map.hh` |
| `data_dictionary/` | 1 | `data_dictionary.cc` |

The `{fmt}` library's compile-time checker validates that each `{}` placeholder references a valid argument, but does not verify the reverse: that every argument has a corresponding placeholder. Extra arguments are silently ignored at both compile time and runtime.

Build verified with `dbuild ninja build/dev/scylla`; compiles cleanly.

---
Note: Commits were amended to fix the author name from "Yaniv Michael Kaul" to "Yaniv Kaul".

Closes scylladb/scylladb#29448

* github.com:scylladb/scylladb:
  data_dictionary: fix swapped arguments in extraneous options error
  types: fix wrong variable in map key/value size error messages
  ent: fix missing format placeholders in encryption error/log messages
  mutation: fix spurious argument in shadowable_tombstone formatter
  utils: fix missing format placeholders in object storage log messages
  raft: fix missing format placeholder in server ostream operator
  cdc: fix missing format placeholder in error message
  alternator: fix missing format placeholder in error message
  sstables: fix missing format placeholders in error messages
  transport: fix printf-style format specifiers in fmtlib log calls
  cql3: fix missing format placeholders in error messages
  db: fix missing format placeholders in log and error messages
  service: fix missing format placeholders in log messages
  replica: fix missing format placeholder in cleanup log message	2026-05-11 07:04:42 +03:00
Yaniv Kaul	4ee81f9b32	service: fix missing format placeholders in log messages Fix four format string bugs: - raft_group0.cc: the exception from sleep_and_abort was passed as an argument but had no {} placeholder, so it was silently dropped. - storage_service.cc: loading topology trace was missing a placeholder for the cleanup field (9 args but only 8 placeholders). - storage_service.cc: two join-rejection warnings had a spurious comma after the first string literal, breaking C++ string concatenation. This caused the continuation string to be treated as a separate format argument instead of being part of the format string, and params.host_id was silently dropped. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2026-05-10 17:49:50 +03:00
Avi Kivity	5a887362e3	Merge 'Remove legacy tables creation code' from Gleb Natapov Drop the creation code for the `service_levels` and `cdc_generation_descriptions_v2` tables since they are no longer needed. Old clusters will still have them because they were created earlier. The series also contains a small improvement around group0 creation. No backport needed since this removes functionality. Closes scylladb/scylladb#29482 * github.com:scylladb/scylladb: db/system_distributed_keyspace: remove system_distributed_everywhere since it is unused db/system_distributed_keyspace: drop CDC_TOPOLOGY_DESCRIPTION and CDC_GENERATIONS_V2 db/system_distributed_keyspace: remove unused code db/system_distributed_keyspace: drop old cdc_generation_descriptions_v2 table db/system_distributed_keyspace: drop old service_levels table fix indent after the previous patch group0: call setup_group0 only when needed	2026-05-10 14:46:21 +03:00
Yaniv Michael Kaul	6179406467	raft/group0: fix destroy assertion on startup failure If start_server_for_group0() successfully registers a server in _raft_gr._servers but a subsequent step (e.g. enable_in_memory_state_machine()) throws, the server is never destroyed because abort_and_drain()/destroy() check std::get_if<raft::group_id>(&_group0) which was only set after the entire with_scheduling_group block completed. Move _group0.emplace<raft::group_id>() inside the lambda, immediately after start_server_for_group() succeeds, so that cleanup paths can always find and destroy the registered server. This fixes the assertion: "raft_group_registry - stop(): server for group ... is not destroyed" which manifests during shutdown after an upgrade where topology_state_load() fails due to netw::unknown_address. Backport: Yes, to 2026.1, 2026.2, as it causes a crash on upgrades Refs: SCYLLADB-1217 Refs: CUSTOMER-340 Refs: CUSTOMER-335 Fixes: SCYLLADB-1801 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: Yes, Opencode/Opus 4.6 Closes scylladb/scylladb#29702	2026-05-04 11:25:46 +02:00
Gleb Natapov	0ef06a34ed	group0: call setup_group0 only when needed setup_group0 and setup_group0_if_exist have hidden conditions inside that make them no-ops. It is not clear at the call site that the functions may do nothing. Change the code to check the conditions at the call site instead.	2026-04-15 15:48:48 +03:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Patryk Jędrzejczak	b9f82f6f23	raft_group0: join_group0: fix join hang when node joins group 0 before post_server_start A joining node hung forever if the topology coordinator added it to the group 0 configuration before the node reached `post_server_start`. In that case, `server->get_configuration().contains(my_id)` returned true and the node broke out of the join loop early, skipping `post_server_start`. `_join_node_group0_started` was therefore never set, so the node's `join_node_response` RPC handler blocked indefinitely. Meanwhile the topology coordinator's `respond_to_joining_node` call (which has no timeout) hung forever waiting for the reply that never came. Fix by only taking the early-break path when not starting as a follower (i.e. when the node is the discovery leader or is restarting). A joining node must always reach `post_server_start`. We also provide a regression test. It takes 6s in dev mode. Fixes SCYLLADB-959 Closes scylladb/scylladb#29266	2026-03-31 12:33:56 +02:00
Marcin Maliszkiewicz	61952cd985	raft: service: reload auth cache before service levels Since service levels depend on auth data, and not the other way around, we need to ensure a proper loading order.	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	c4cfb278bc	service: raft: move update_service_levels_effective_cache check The auth::cache::includes_table function also covers role_members and role_attributes. The existing check was removed because it blocked these tables from triggering necessary cache updates. While previously non-critical (due to unused attributes and table coupling), maintaining a correct cache is essential for upcoming changes.	2026-03-18 09:06:20 +01:00
Patryk Jędrzejczak	f85628a9a0	group0: discovery: shorten the pause duration Nodes currently pause group0 discovery for 1s. This case is always hit while adding multiple nodes in parallel to an empty cluster by all nodes except the one that becomes the group0 leader. This is fine in production, but in tests, the slowdown is quite significant. Every `manager.servers_add(n)` call for n > 1 becomes 1s slower when the cluster is empty. Many cluster tests are affected. In this commit, we decrease the sleep duration from 1s to 100ms to speed up tests. The consequence of this change is that nodes might perform more steps in group0 discovery, but the increase in CPU usage and network traffic should be negligible.	2026-03-12 15:40:18 +01:00
Patryk Jędrzejczak	37aeba9c8c	Merge 'raft: add global read barrier to group0_batch::commit and switch auth and service levels' from Marcin Maliszkiewicz This series adds a global read barrier to raft_group0_client, ensuring that Raft group0 mutations are applied on all live nodes before returning to the caller. Currently, after a group0_batch::commit, the mutations are only guaranteed to be applied on the leader. Other nodes may still be catching up, leading to stale reads. This patch introduces a broadcast read barrier mechanism. Calling send_group0_read_barrier_to_live_members after committing causes the coordinator to send a read barrier RPC to all live nodes (discovered via the gossiper) and wait for them to complete. This is a best-effort attempt to get cluster-wide visibility of the committed state before the response is returned to the user. Auth and service levels write paths are switched to use this new mechanism. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-650 Backport: no, new feature Closes scylladb/scylladb#28731 * https://github.com/scylladb/scylladb: test: add tests for global group0_batch barrier feature qos: switch service levels write paths to use global group0_batch barrier auth: switch write paths to use global group0_batch barrier raft: add function to broadcast read barrier request raft: add gossiper dependency to raft_group0_client raft: add read barrier RPC	2026-03-11 10:37:19 +01:00
Gleb Natapov	0e3e7be335	group0: drop with_raft() function from group0_guard since it always returns true now Also drop the code that assumed that the function can return false.	2026-03-10 10:39:58 +02:00
Gleb Natapov	0b508c5f96	test: remove unused injection points Also remove the test_auth_raft_command_split test, which has been irrelevant since `5ba7d1b116` because after that commit it no longer uses the function that injects a max-sized command.	2026-03-10 10:09:39 +02:00
Gleb Natapov	02fc4ad0a9	treewide: remove schema pull code since we never pull schema any more Schema pull was used by legacy schema code, which has not been supported for a long time, and during legacy recovery, which is no longer supported either. It can be dropped now.	2026-03-10 10:09:39 +02:00
Gleb Natapov	60a861c518	group0: hoist the checks for an illegal upgrade into main.cc The checks are currently spread around; having them in one place and doing them as early as possible simplifies the logic.	2026-03-10 10:09:39 +02:00
Gleb Natapov	00083b42a7	group0: get rid of group0_upgrade_state Simplify the code by getting rid of group0_upgrade_state, since upgrade is no longer supported and there is no need to track its state. A non-upgraded node will simply not boot; to detect that, the patch checks the state directly in the system table.	2026-03-10 10:09:38 +02:00
Gleb Natapov	49ebab971d	storage_service: set topology change kind only once The only supported mode is topology_change_kind::raft, so always set it in storage_service::join_cluster during join or regular boot. Drop the check for legacy mode from raft_group0::setup_group0_if_exist since the mode will no longer be set at this point. A wrong upgrade will still be detected in storage_service::join_cluster, where topology.upgrade_state is checked directly.	2026-03-10 10:09:38 +02:00
Gleb Natapov	4e072977d4	group0: drop in_recovery function and its uses Legacy recovery procedure is no longer supported and the code can be dropped.	2026-03-10 10:09:38 +02:00
Gleb Natapov	770762edd8	group0: rename use_raft to maintenance_mode and make it sync group0_upgrade_state::recovery is now used only in maintenance mode, so rename the function to indicate it. Also, there is no preemption point in the function any more, so it can be a regular function rather than a coroutine.	2026-03-10 10:09:33 +02:00
Marcin Maliszkiewicz	4c8681a927	raft: add function to broadcast read barrier request This function ensures that all alive nodes have executed a read barrier. It will be useful for the following commits, which will eventually delay returning a response to the user until mutations are applied on other nodes, so that the user may perceive better data consistency across nodes.	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	cbae84a926	raft: add gossiper dependency to raft_group0_client In a following commit, raft_group0_client will send a read barrier RPC to all alive nodes; it takes the list of nodes from the gossiper.	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	8422fbca9f	raft: add read barrier RPC The RPC performs a read barrier on the destination node. In following commits, it will be issued to live nodes to ensure that a command was applied everywhere.	2026-03-09 15:15:59 +01:00
Patryk Jędrzejczak	4c8dba15f1	Merge 'strong_consistency/state_machine: ensure and upgrade mutations schema' from Michał Jadwiszczak This patch fixes 2 issues within the strong consistency state machine: - apply might be called before the schema is delivered to the node - on the other hand, apply may be called after the schema was changed and purged from the schema registry The first problem is fixed by doing `group0.read_barrier()` before applying the mutations. The second one is solved by upgrading the mutations using column mappings when the version of the mutations' schema is older. Fixes SCYLLADB-428 Strong consistency is in experimental phase, no need to backport. Closes scylladb/scylladb#28546 * https://github.com/scylladb/scylladb: test/cluster/test_strong_consistency: add reproducer for old schema during apply test/cluster/test_strong_consistency: add reproducer for missing schema during apply test/cluster/test_strong_consistency: extract common function raft_group_registry: allow to drop append entries requests for specific raft group strong_consistency/state_machine: find and hold schemas of applying mutations strong_consistency/state_machine: pull necessary dependencies db/schema_tables: add `get_column_mapping_if_exists()`	2026-03-09 09:49:22 +01:00
Michał Jadwiszczak	3548b7ad38	raft_group_registry: allow to drop append entries requests for specific raft group Similar to `raft_drop_incoming_append_entries`, the new error injection `raft_drop_incoming_append_entries_for_specified_group` skips the handler for the `raft_append_entries` RPC, but it allows specifying the ID of the raft group for which the requests should be dropped. The raft group ID should be passed in the error injection parameters under the `value` key.	2026-03-05 13:47:43 +01:00
Tomasz Grabiec	b90fe19a42	Merge 'service: assert that tables updated via group0 use schema commitlog' from Aleksandra Martyniuk Set enable_schema_commitlog for each group0 table. Assert that group0 tables use schema commitlog in ensure_group0_schema (per each command). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914. Needs backport to all live releases as all are vulnerable Closes scylladb/scylladb#28876 * github.com:scylladb/scylladb: test: add test_group0_tables_use_schema_commitlog db: service: remove group0 tables from schema commitlog schema initializer service: ensure that tables updated via group0 use schema commitlog db: schema: remove set_is_group0_table param	2026-03-05 13:28:13 +01:00
Aleksandra Martyniuk	690b2c4142	service: ensure that tables updated via group0 use schema commitlog Set enable_schema_commitlog for each group0 table. Assert that group0 tables use schema commitlog in ensure_group0_schema (per each command). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914.	2026-03-04 17:25:04 +01:00
Avi Kivity	85bd6d0114	Merge 'Add multiple-shard persistent metadata storage for strongly consistent tables' from Wojciech Mitros

In this series we introduce new system tables and use them for storing the raft metadata for strongly consistent tables. In contrast to the previously used raft group0 tables, the new tables can store data on any shard. The tables also allow specifying the shard where each partition should reside, which enables the tablets of strongly consistent tables to have their raft group metadata co-located on the same shard as the tablet replica.

The new tables have almost the same schemas as the raft group0 tables. However, they have an additional column in their partition keys: the shard that specifies where the data should be located. While a tablet and its corresponding raft group server reside on some shard, the server now issues all reads and writes to the metadata tables using its shard in addition to the group_id.

The extra partition key column is used by the new partitioner and sharder, which allow this special shard routing. The partitioner encodes the shard in the token and the sharder decodes the shard from the token. This approach to routing avoids any additional lookups (for the tablet mapping) during operations on the new tables, and it doesn't require keeping any state. It also doesn't interact negatively with resharding: as long as tablets (and their corresponding raft metadata) occupy some shard, we do not allow starting the node with a shard count too low for that shard to exist. When increasing the shard count, the routing does not change, similarly to how tablet allocation doesn't change.

To use the new tables, a new implementation of `raft::persistence` is added. Currently, it's almost an exact copy of `raft_sys_table_storage` that just uses the new tables, but in the future we can modify it with changes specific to metadata (or mutation) storage for strongly consistent tables. The new storage is used in the `groups_manager`, which, combined with the removal of some `this_shard_id() == 0` checks, allows strongly consistent tables to be used on all shards.

This approach for making sure that the reads/writes to the new tables end up on the correct shards won in the balance of complexity/usability/performance against a few other approaches we've considered. They include:

1. Making the Raft server read/write directly to the database on its own shard, skipping the sharder, while using the default partitioner/sharder. This approach could let us avoid changing the schema, and there should be no problems for reads and writes performed by the Raft server. However, in this approach we would write data into tables in conflict with the placement determined by the sharder. As a result, any read going through the sharder could miss the rows it was supposed to read. Even when reading all shards to find a specific value, there is a risk of polluting the cache: the rows loaded on incorrect shards may persist in the cache for an unknown amount of time. The cache may also mistakenly remember that a row is missing, even though it's actually present, just on an incorrect shard. Some of the issues with this approach could be worked around using another sharder which always returns this_shard_id() when asked about a shard. It's not clear how such a sharder would implement a method like `token_for_next_shard`, and how much simpler it would be compared to the current "identity" sharder.
2. Using a sharder depending on the current allocation of tablets on the node. This approach relies on the knowledge of the group_id -> shard mapping at any point in time in the cluster. For this approach we'd also need to either add a custom partitioner which encodes the group_id in the token, or track the token(group_id) -> shard mapping. Compared to the approach used in this series, it has the benefit of keeping the partition key as just group_id. However, it requires more logic and access to the live state of the node in the sharder, and it's not static: the same token may be sharded differently depending on the state of the node. That shouldn't occur in practice, but if we changed the state of the node before adjusting the table data, we would be unable to access or fix the stale data without artificially also changing the state of the node.
3. Using metadata tables co-located with the strongly consistent tables. This approach could simplify metadata migrations in the future; however, it would require additional schema management for all co-located metadata tables, and it's not even obvious what could be used as the partition key in these tables: some metadata is per-raft-group, so we couldn't reuse the partition key of the strongly consistent table for it. And finding and remembering a partition key that is routed to a specific shard is not a simple task. Finally, splits and merges will most likely need special handling for metadata anyway, so we wouldn't even make use of the co-located tables' splits and merges.

Fixes [SCYLLADB-361](https://scylladb.atlassian.net/browse/SCYLLADB-361)

Closes scylladb/scylladb#28509

* github.com:scylladb/scylladb:
  docs: add strong consistency doc
  test/cluster: add tests for strongly-consistent tables' metadata persistence
  raft: enable multi-shard raft groups for strongly consistent tablets
  test/raft: add unit tests for raft_groups_storage
  raft: add raft_groups_storage persistence class
  db: add system tables for strongly consistent tables' raft groups
  dht: add fixed_shard_partitioner and fixed_shard_sharder
  raft: add group_id -> shard mapping to raft_group_registry
  schema: add with_sharder overload accepting static_sharder reference	2026-03-04 08:55:43 +02:00
Wojciech Mitros	f841c0522d	raft: enable multi-shard raft groups for strongly consistent tablets In this patch we allow strongly consistent tables to have tablets on shards other than 0. For that, we remove the checks for shard 0 for the non-group0 raft groups, and we allow the tablet allocator to place tablets of strongly consistent tables on shards other than 0. We also start using the new storage (raft::persistence) for strongly consistent tables, added in the preceding commits.	2026-02-25 12:34:58 +01:00
Gleb Natapov	0f8cdd81f3	group0: fix indentation after previous patch	2026-02-25 10:08:32 +02:00
Gleb Natapov	7d7cbae763	raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more No need for locking any more so the function may just return a value and be synchronous.	2026-02-25 10:08:32 +02:00
Gleb Natapov	0689fb5ab2	raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream	2026-02-25 10:08:32 +02:00
Gleb Natapov	cd76604c79	raft_group0: remove unused code from raft_group0 Also do not pass raft_replace_info into setup_group0 since it has not been used there for a long time.	2026-02-25 10:08:32 +02:00
Gleb Natapov	758d1c9c39	topology: fix indentation after the previous patch	2026-02-25 10:08:31 +02:00
Gleb Natapov	67cd5755b2	topology: drop topology_change_enabled parameter from raft_group0 code Since the parameter is always true, there is no point in passing it everywhere. Just assume it is true at the point of use.	2026-02-25 10:08:31 +02:00
Gleb Natapov	a8a167623a	topology: remove code that assumes raft_topology_change_enabled() may return false The patch removes the code protected by !raft_topology_change_enabled() since it is no longer reachable. Drop test_lwt_for_tablets_is_not_supported_without_raft since non-raft mode is no longer supported.	2026-02-25 10:08:30 +02:00
Wojciech Mitros	c1b3fec11a	raft: add group_id -> shard mapping to raft_group_registry To handle RPC from other nodes, we need to be able to redirect the requests for each raft group to the shard that owns it. We need to be able to do the redirection on all shards, so to achieve that, on all shards we need to store the information about which shard is occupied by each Raft group server. For that we add a group_id -> shard mapping to the raft_group_registry. The mapping is filled out when starting raft servers, it's emptied when we abort raft servers. We use it when registering RPC verb handlers, so that regardless of the shard handling the RPC, the work on the raft group can be performed on the corresponding shard.	2026-02-23 15:34:56 +01:00
Gleb Natapov	92049c3205	topology: remove upgrade to raft topology code We no longer need this code since we expect the cluster to be upgraded before moving to this version.	2026-02-23 14:54:24 +02:00
Gleb Natapov	4a9cf687cc	group0: remove upgrade to group0 code This patch removes the ability of a cluster to upgrade from not having group0 to having one. This ability is used in the gossiper-based recovery procedure, which is deprecated and removed in this version. Also remove tests that use the procedure.	2026-02-23 14:54:24 +02:00
Gleb Natapov	dcafb5c083	group0: refuse to boot if a cluster is still not in raft topology mode We are going to drop legacy topology mode (gossiper mode) and no longer allow ScyllaDB to start in this mode. This patch refuses to boot if a cluster is not in raft topology mode yet. This may happen if a node of a cluster that is not yet in raft topology mode is upgraded to a newer version. If this happens, the node has to be downgraded, raft topology has to be enabled on the cluster, and then the node can be upgraded again.	2026-02-23 14:54:24 +02:00
Patryk Jędrzejczak	e21ecf69de	raft topology: make some assertions non-crashing Some assertions in the Raft-based topology are likely to cause crashes of multiple nodes due to the consistent nature of the Raft-based code. If the failing assertion is executed in the code run by each follower (e.g., the code reloading the in-memory topology state machine), then all nodes can crash. If the failing assertion is executed only by the leader (e.g., the topology coordinator fiber), then multiple consecutive group0 leaders will chain-crash until there is no group0 majority. Crashing multiple nodes is much more severe than necessary. It's enough to prevent the topology state machine from making more progress. This will naturally happen after throwing a runtime error. The problematic fiber will be killed or will keep failing in a loop. Note that it should be safe to block the topology state machine, but not the whole group0, as the topology state machine is mostly isolated from the rest of group0. We replace some occurrences of `on_fatal_internal_error` and `SCYLLA_ASSERT` with `on_internal_error`. These are not all occurrences, as some fatal assertions make sense, for example, in the bootstrap procedure.	2026-02-12 13:10:03 +01:00
Pavel Emelyanov	87920d16d8	raft_group0_client: Don't export system keyspace Now the system_keyspace reference is used internally by the client code itself, so there is no need to encourage other services to abuse it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-27 14:51:40 +03:00
Pavel Emelyanov	966119ce30	raft_group0_client: Add and use get_last_group0_state_id() There are several places that want to get the last state id, and for that they make raft_group0_client export its system_keyspace reference. This patch adds a helper method to provide the needed ID. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-27 14:50:25 +03:00
Pavel Emelyanov	dded1feeb7	group0_state_machine: Call ensure_group0_sched() with data_dictionary There's a validation that tables used by group0 commands are marked with the respective property. For it, the caller code needs to provide a database reference, and it gets one from the client -> system_keyspace chain. There's a more explicit way: get the data_dictionary via the proxy. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-27 14:48:22 +03:00
Petr Gusev	59a876cebb	raft_group_registry: disable metrics for non-0 groups The `raft::server` registers metrics using the `server_id` label. When both a group0 Raft server and the tablets Raft server are created on the same node/shard, duplicate metrics cause conflicts. This commit temporarily disables metrics for non-0 groups. A proper fix will likely require adding a `group_id` label in the future.	2026-01-21 14:56:01 +01:00
Botond Dénes	b52a3f3a43	db/system_keyspace: remove duplicate table names from v3 Remove those table names that are effectively just aliases of their counterparts outside of the v3 namespace (struct). scylla_local() is made public. Currently it is private, but it has external users, which work around the private designation by using the public v3::scylla_local() alias. This change just makes the existing status clear.	2026-01-19 12:32:21 +02:00
Avi Kivity	c6dfae5661	treewide: #include Seastar headers with angle brackets Seastar is an external library from the point of view of ScyllaDB, so should be included with angle brackets. Closes scylladb/scylladb#27947	2026-01-13 14:56:15 +02:00
Piotr Dulikowski	9ed820cbf5	test: cluster: test for recovery after partial group0 command Add a reproducer for scylladb/scylladb#26945. By using error injections, the test triggers a situation where a command that removes an obsolete CDC generation is partially applied, then the node is killed and brought back. Thanks to the fix, restarting the node succeeds and does not trigger any consistency checks in the group0 reload logic.	2025-12-23 20:50:43 +01:00
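
The Yaniv Kaul merge (3f72852d8c) notes that the `{fmt}` compile-time checker confirms each `{}` refers to a valid argument but never checks the reverse. A minimal sketch of that missing reverse check follows, assuming simple replacement fields only: `count_placeholders` and `placeholders_match` are hypothetical names, not part of `{fmt}` or ScyllaDB, and nested fields such as `{:{}}` are not handled. Each `static_assert` mirrors one bug class from the merge.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts replacement fields ("{}", "{0}", "{:x}", ...)
// in a {fmt}-style format string. "{{" and "}}" are escapes, not fields.
// Nested fields such as "{:{}}" are not handled by this sketch.
constexpr std::size_t count_placeholders(std::string_view fmt) {
    std::size_t fields = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '{') {
            if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
                ++i;  // escaped "{{" is a literal brace
                continue;
            }
            ++fields;
            while (i < fmt.size() && fmt[i] != '}') {
                ++i;  // skip the field body up to its closing '}'
            }
        } else if (fmt[i] == '}' && i + 1 < fmt.size() && fmt[i + 1] == '}') {
            ++i;  // escaped "}}" is a literal brace
        }
    }
    return fields;
}

// Hypothetical reverse check: true only if the number of replacement fields
// equals the number of arguments, so no argument is left without a field.
template <typename... Args>
constexpr bool placeholders_match(std::string_view fmt) {
    return count_placeholders(fmt) == sizeof...(Args);
}

// Missing placeholder: two arguments but one field.
static_assert(!placeholders_match<int, int>("msg for table {}"));
// Spurious comma: in f("join rejected for {}, ", "host {}", a, b) the second
// literal becomes an argument, leaving one field for three arguments.
static_assert(!placeholders_match<const char*, int, int>("join rejected for {}, "));
// Without the comma, adjacent literals concatenate into one format string.
static_assert(placeholders_match<int, int>("join rejected for {}, " "host {}"));
// printf-style "%s" is plain text to {fmt}, so the argument has no field.
static_assert(!placeholders_match<int>("table %s not found"));
```

Because the check only compares counts, it would not catch the swapped-argument or wrong-variable cases from the same merge; those still need review or tests.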
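
The strong-consistency merge (85bd6d0114) describes a partitioner that encodes the shard in the token and a sharder that decodes it, with no lookups and no state. The sketch below illustrates that idea under an assumed layout (shard in the top 16 bits, key hash in the low 48 bits, tokens modeled as unsigned 64-bit values). The actual bit layout of `fixed_shard_partitioner` and `fixed_shard_sharder` is not described in the commit, and `make_token`, `shard_of`, and `can_serve` are hypothetical names.

```cpp
#include <cstdint>

// Assumed layout (illustrative only): the top 16 bits of a token hold the
// shard, the low 48 bits hold a hash of the remaining partition key columns.
constexpr unsigned shard_bits = 16;
constexpr unsigned hash_bits = 64 - shard_bits;
constexpr std::uint64_t hash_mask = (std::uint64_t{1} << hash_bits) - 1;

// "Partitioner" side: the shard is an explicit partition key column, so it
// is written into the token verbatim. `shard` must be below 2^16 here.
constexpr std::uint64_t make_token(unsigned shard, std::uint64_t key_hash) {
    return (std::uint64_t{shard} << hash_bits) | (key_hash & hash_mask);
}

// "Sharder" side: decoding is pure bit arithmetic, with no lookup table and
// no dependence on node state or on the current shard count.
constexpr unsigned shard_of(std::uint64_t token) {
    return static_cast<unsigned>(token >> hash_bits);
}

// Startup guard: data pinned to shard N is reachable only if the node has
// more than N shards, i.e. shard N actually exists.
constexpr bool can_serve(std::uint64_t token, unsigned shard_count) {
    return shard_of(token) < shard_count;
}
```

Since `shard_of` never consults the shard count, adding shards leaves routing unchanged, while `can_serve` captures why a node must not start with too few shards.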


599 Commits