scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Avi Kivity	c6dfae5661	treewide: #include Seastar headers with angle brackets Seastar is an external library from the point of view of ScyllaDB, so should be included with angle brackets. Closes scylladb/scylladb#27947	2026-01-13 14:56:15 +02:00
Michael Litvak	55f4a2b754	migration_listener: fix deadlock in nested notifications When calling a migration notification from the context of a notification callback, this could lead to a deadlock with unregistering a listener: A: the parent notification is called. it calls thread_for_each, where it acquires a read lock on the vector of listeners, and calls the callback function for each listener while holding the lock. B: a listener is unregistered. it calls `remove` and tries to acquire a write lock on the vector of listeners. it waits because the lock is held. A: the callback function calls another notification and calls thread_for_each which tries to acquire the read lock again. but it waits since there is a waiter. Currently we have such concrete scenario when creating a table, where the callback of `before_create_column_family` in the tablet allocator calls `before_allocate_tablet_map`, and this could deadlock with node shutdown where we unregister listeners. Fix this by not acquiring the read lock again in the nested notification. There is no need because the read lock is already held by the parent notification while the child notification is running. We add a function `thread_for_each_nested` that is similar to `thread_for_each` except it assumes the read lock is already held and doesn't acquire it, and it should be used for nested notifications instead of `thread_for_each`. Fixes scylladb/scylladb#27364 Closes scylladb/scylladb#27637	2025-12-17 14:00:28 +01:00
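The fix described in `55f4a2b754` can be pictured with a small standard-C++ model. This is only a sketch: `listener`, `listener_registry`, `for_each`, `for_each_nested` and `run_nested_notification_demo` are hypothetical names chosen for illustration, and `std::shared_mutex` stands in for the actual lock used in ScyllaDB. The point it shows is that a nested notification iterates the listeners without taking the read lock again, relying on the parent notification already holding it.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical listener with two notification callbacks.
struct listener {
    std::function<void()> before_create;    // may trigger a nested notification
    std::function<void()> before_allocate;  // target of the nested notification
};

// Hypothetical registry; not ScyllaDB's migration_notifier.
class listener_registry {
    std::shared_mutex _lock;
    std::vector<listener*> _listeners;
public:
    void add(listener* l) {
        std::unique_lock<std::shared_mutex> guard(_lock);
        _listeners.push_back(l);
    }
    void remove(listener* l) {
        // Takes the write lock, so it waits for running notifications.
        std::unique_lock<std::shared_mutex> guard(_lock);
        _listeners.erase(std::remove(_listeners.begin(), _listeners.end(), l),
                         _listeners.end());
    }
    // Top-level notification: holds the read lock while calling listeners.
    template <typename F>
    void for_each(F&& f) {
        std::shared_lock<std::shared_mutex> guard(_lock);
        for (auto* l : _listeners) f(*l);
    }
    // Nested notification: the caller is inside for_each(), so the read lock
    // is already held and must not be acquired a second time.
    template <typename F>
    void for_each_nested(F&& f) {
        for (auto* l : _listeners) f(*l);
    }
};

// Returns how many times the nested callback ran.
int run_nested_notification_demo() {
    listener_registry reg;
    int allocations = 0;
    listener tablet_allocator;
    listener cdc;
    tablet_allocator.before_create = [&] {
        reg.for_each_nested([](listener& l) {
            if (l.before_allocate) l.before_allocate();
        });
    };
    cdc.before_allocate = [&] { ++allocations; };
    reg.add(&tablet_allocator);
    reg.add(&cdc);
    reg.for_each([](listener& l) {
        if (l.before_create) l.before_create();
    });
    return allocations;
}
```

Calling `for_each` from inside the callback instead of `for_each_nested` would try to re-acquire a lock the same thread already holds, which is the situation the commit avoids.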
Michał Jadwiszczak	24d69b4005	db/view/view_building_state: replace task's state with `aborted` flag After previous commits, we can drop entire task's state and replace it with single boolean flag, which determines if a task was aborted. Once a task was aborted, it cannot get resurrected to a normal state.	2025-11-25 12:14:04 +01:00
Pavel Emelyanov	1c9c4c8c8c	Merge 'service: attach storage_service to migration_manager using pluggable' from Marcin Maliszkiewicz Migration manager depends on storage service. For instance, it has a reload_schema_in_bg background task which calls _ss.local() so it expects that storage service is not stopped before it stops. To solve this we use permit approach, and during storage_service stop: - we ignore new code execution in migration_manager which'd use storage_service - but wait with storage_service shutdown until all existing executions are done Fixes scylladb/scylladb#26734 Backport: no need, problem existed since very long time, code restructure in https://github.com/scylladb/scylladb/commit/389afcd (and following commits) made it hitting more often, as _ss was called earlier, but it's not released yet. Closes scylladb/scylladb#26779 * github.com:scylladb/scylladb: service: attach storage_service to migration_manager using pluggabe service: migration_manager: corutinize merge_schema_from service: migration_manager: corutinize reload_schema	2025-11-14 15:14:28 +03:00
Marcin Maliszkiewicz	958d04c349	service: attach storage_service to migration_manager using pluggabe Migration manager depends on storage service. For instance, it has a reload_schema_in_bg background task which calls _ss.local() so it expects that storage service is not stopped before it stops. To solve this we use permit approach, and during storage_service stop: - we ignore new code execution in migration_manager which'd use storage_service - but wait with storage_service shutdown until all existing executions are done Fixes scylladb/scylladb#26734	2025-11-14 08:50:19 +01:00
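The permit approach described in `958d04c349` can be modelled in a few lines of standard C++. This is a single-threaded sketch under stated assumptions: `pluggable`, `permit`, `begin_unplug`, `can_finish_unplug` and `storage_service_stub` are hypothetical names and do not reproduce Seastar's or ScyllaDB's real interface, and where real code would wait asynchronously the sketch only exposes a predicate. It shows the two rules from the commit message: new users are refused once shutdown begins, and shutdown cannot complete while an existing permit is alive.

```cpp
#include <cassert>

// Hypothetical single-threaded model of a pluggable dependency.
template <typename T>
class pluggable {
    T* _service = nullptr;
    bool _accepting = false;
    int _in_flight = 0;
public:
    // RAII handle; an empty permit means the service must not be used.
    class permit {
        pluggable* _owner;
    public:
        explicit permit(pluggable* owner) : _owner(owner) {
            if (_owner) ++_owner->_in_flight;
        }
        permit(permit&& o) noexcept : _owner(o._owner) { o._owner = nullptr; }
        permit(const permit&) = delete;
        permit& operator=(const permit&) = delete;
        permit& operator=(permit&&) = delete;
        ~permit() { if (_owner) --_owner->_in_flight; }
        explicit operator bool() const { return _owner != nullptr; }
        T* operator->() const { return _owner->_service; }
    };

    void plug(T& service) { _service = &service; _accepting = true; }
    // Hands out a valid permit only while the service is plugged in.
    permit get_permit() { return permit(_accepting ? this : nullptr); }
    // Shutdown step 1: refuse new users.
    void begin_unplug() { _accepting = false; }
    // Shutdown step 2: the service may stop once no permit is outstanding.
    bool can_finish_unplug() const { return !_accepting && _in_flight == 0; }
};

struct storage_service_stub {
    int reloads = 0;
    void reload() { ++reloads; }
};

bool run_pluggable_demo() {
    storage_service_stub ss;
    pluggable<storage_service_stub> p;
    p.plug(ss);
    bool ok = true;
    {
        auto running = p.get_permit();      // task that started before shutdown
        p.begin_unplug();                   // storage service starts stopping
        auto late = p.get_permit();         // new work arrives too late
        ok = ok && bool(running) && !late;  // old permit valid, new one refused
        ok = ok && !p.can_finish_unplug();  // shutdown has to wait
        running->reload();                  // existing work still completes
    }
    return ok && p.can_finish_unplug() && ss.reloads == 1;
}
```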
Marcin Maliszkiewicz	cf9b2de18b	service: migration_manager: corutinize merge_schema_from It's needed to easily keep-alive pluggable storage_service permit in a following commit.	2025-11-14 08:50:19 +01:00
Marcin Maliszkiewicz	5241e9476f	service: migration_manager: corutinize reload_schema It's needed to easily keep-alive pluggable storage_service permit in a following commit.	2025-11-14 08:50:18 +01:00
Michael Litvak	eefae4cc4e	migration_manager: pass timestamp to pre_create pass the write timestamp as parameter to the on_pre_create_column_families notification.	2025-11-13 16:59:43 +01:00
Piotr Dulikowski	2e5eb92f21	Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak When generating CDC log mutations for some base mutation, use a CDC schema that is compatible with the base schema. The compatible CDC schema has for every base column a corresponding CDC column with the same name. If using a non-compatible schema, we may encounter a situation, especially during ALTER, that we have a mutation with a base column set with some value, but the CDC schema doesn't have a column by that name. This would cause the user request to fail with an error. We add to the schema object a schema_ptr that for CDC-enabled tables points to the schema object of the CDC table that is compatible with the schema. It is set by the schema merge algorithm when creating the schema for a table that is created or altered. We use the fact that a base table and its CDC table are created and altered in the same group0 operation, and this way we can find and set the cdc schema for a base table. When transporting the base schema as a frozen schema between shards, we transport with it the frozen cdc schema as well. The patch starts with a series of refactoring commits that make extending the frozen schema easier and cleans up some duplication in the code about the frozen schema. We combine the two types `frozen_schema_with_base_info` and `view_schema_and_base_info` to a single type `extended_frozen_schema` that holds a frozen schema with additional data that is not part of the schema mutations but needs to be transported with it to unfreeze it - base_info, and the frozen cdc schema which is added in a later commit. 
Fixes https://github.com/scylladb/scylladb/issues/26405 backport not needed - enhancement Closes scylladb/scylladb#24960 * github.com:scylladb/scylladb: test: cdc: test cdc compatible schema cdc: use compatiable cdc schema db: schema_applier: create schema with pointer to CDC schema db: schema_applier: extract cdc tables schema: add pointer to CDC schema schema_registry: remove base_info from global_schema_ptr schema_registry: use extended_frozen_schema in schema load schema_registry: replace frozen_schema+base_info with extended_frozen_schema frozen_schema: extract info from schema_ptr in the constructor frozen_schema: rename frozen_schema_with_base_info to extended_frozen_schema	2025-11-13 10:11:54 +01:00
Nikos Dragazis	56e5dfc14b	migration_manager: Add missing validations for schema extensions The migration manager offers some free functions to prepare mutations for a new/updated table/view. Most of them include a validation check for the schema extensions, but in the following ones it's missing: * `prepare_new_column_family_announcement` (overload with vector as out parameter) * `prepare_new_column_families_announcement` Presumably, this was just an omission. It's also not a very important one since the only extension having validation logic is the `encryption_schema_extension`, but none of these functions is connected to user queries where encryption options can be provided in the schema. User queries go through the other `prepare_new_column_family_announcement` overload, which does perform a validation check. Add validation in the missing places. Fixes #26470. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#26487	2025-11-11 10:08:58 +02:00
Michael Litvak	6e2513c4d2	db: schema_applier: create schema with pointer to CDC schema When creating a schema for a non-CDC table in the schema_applier, find its CDC schema that we created previously in the same operation, if any, and create the schema with a pointer to the CDC schema. We use the fact that for a base table with CDC enabled, its CDC schema is created or altered together in the same group0 operation. Similarly, in schema_tables, when creating table schemas from the schema tables, first create all schemas that don't have CDC enabled, then create schemas that have CDC enabled by extending them with the pointer to the CDC schema that we created before. There are few additional cases where we create schemas that we need to consider how to handle. When loading a schema from schema tables in the schema_loader we decide not to set the CDC schema, because this schema is mostly used for tools and it's not used for generating CDC mutations. When transporting a schema by RPC in the migration manager, we don't transport its CDC schema, and we always set it to null. Because we use raft we expect this shouldn't have any effect, because the schema is synchronized through raft and not through the RPC.	2025-10-21 14:13:43 +02:00
Michael Litvak	ac96e40f13	schema: add pointer to CDC schema Add to the schema object a member that points to the CDC schema object that is compatible with this schema, if any. The compatible CDC schema is created and altered with its base schema in the same group0 operation. When generating CDC log mutations for some base mutation we want them to be created using a compatible schema that has a CDC column corresponding to each base column. This change will allow us to find the right CDC schema given a base mutation. We also update the relevant structures in the schema registry that are related to learning about schemas and transporting schemas across shards or nodes. When transporting a schema as frozen_schema, we need to transport the frozen cdc schema as well, and set it again when unfreezing and reconstructing the schema. When adding a schema to the registry, we need to ensure its CDC schema is added to the registry as well. Currently we always set the CDC schema to nullptr and maintain the previous behavior. We will change it in a later commit. Until then, we mark all places where CDC schema is passed clearly so we don't forget it.	2025-10-21 14:13:43 +02:00
Michael Litvak	085abef05d	schema_registry: use extended_frozen_schema in schema load Change the schema loader type in the schema_registry to return an extended_frozen_schema instead of view_schema_and_base_info, and remove view_schema_and_base_info which is not used anymore. The casting between them is trivial.	2025-10-21 14:13:43 +02:00
Michael Litvak	278801b2a6	frozen_schema: extract info from schema_ptr in the constructor Currently we construct a frozen schema with base info in few places, and the caller is responsible for constructing the frozen schema and extracting the base info if it's a view table. We change it to make it simpler and remove the burden from the caller. The caller can simply pass the schema_ptr, and the constructor for extended_frozen_schema will construct the frozen schema and extract the additional info it needs. This will make it easier to add additional fields, and reduces code duplication. We also make temporary castings between extended_frozen_schema and view_schema_and_base_info for the transition, which are trivial, until they are combined to a single type.	2025-10-21 14:13:42 +02:00
Marcin Maliszkiewicz	389afcdeb6	service: fix dependencies during migration_manager startup We need to avoid reloading schema early as it goes via schema_applier which internally depends on storage_service and on distributed_loader initializing all keyspaces. Simply moving migration manager startup later in the code is not easy as some services depend on it being initialized so we just enable those feature listeners a bit later.	2025-10-14 10:56:26 +02:00
Benny Halevy	b17a36c071	tablets: read_tablet_mutations: use unfreeze_and_split_gently Split the tablets mutations by number of rows, based on `min_tablets_in_mutation` (currently calibrated to 1024), similar to the splitting done in `storage_service::merge_topology_snapshot`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-09-30 17:15:41 +03:00
Michael Litvak	5a7e6e53ff	cdc: fix create table with cdc if not exists Fix an issue where executing a CREATE TABLE IF NOT EXISTS statement with CDC enabled fails with an error if the table already exists. Instead, the query should succeed and be a no-op. This regression was introduced by commit `fed1048059`. Previously, when executing the query, we would first check if the table exists in do_prepare_new_column_families_announcement. If it did, we would throw an already_exists_exception, which was handled correctly; otherwise, we would continue and create the CDC table in the before_create_column_families notification. The order of operations was changed in `fed1048059`, causing the regression. Now, we first create the CDC schema and add it to the schema list for creation, and then check for each of them if they already exist. The problem is that when we create the CDC schema in on_pre_create_column_families, it also checks if the CDC table already exists. If it does, it throws an invalid_request_exception, which is not caught and handled as expected. This patch restores the previous order of operations: we first check if the tables exist, and only then add the CDC schema in pre_create. Fixes scylladb/scylladb#26142	2025-09-21 09:38:36 +02:00
Michael Litvak	7f2cd06bdc	migration_listener: add on_before_allocate_tablet_map notification Add a new notification on_before_allocate_tablet_map that is called when creating a tablet map for a new table and passes the tablet map. This will be useful next for CDC, for example: when creating tablets for a new table we want to create CDC streams for each tablet in the same operation, and we need to have the tablet map with the tablet count and tokens for each tablet, because the CDC streams are based on that. We need to change slightly the tablet allocation code for this to work with colocated tables, because previously when we created the tablet map of a colocated table we didn't have a reference to the base tablet map, but now we do need it so we can pass it to the notification.	2025-09-17 14:47:11 +02:00
Michael Litvak	fed1048059	cdc: move cdc table creation to pre_create When creating a new table with CDC enabled, we also create a CDC log table by adding the CDC table's mutations in the same operation. Previously, it worked by the CDC log service subscribing to on_before_create_column_family and adding the CDC table's mutations there when being notified about a newly created table. The problem is that when we create the tables we also create their tablet maps in the tablet allocator, and we want to create the two tables as co-located tables: we allocate a tablet map for the base table, and the CDC table is co-located with the base table. This doesn't work well with the previous approach because the notification that creates the CDC table is the same notification in which the tablet allocator creates the base tablet map, so the two operations are independent, but really we want the tablet allocator to work on both tables together, so that we have the base table's schema and tablet map when we create the CDC table's co-located tablet map. In order to achieve this, we want to create and add the CDC table's schema, and only after that notify using before_create_column_families with a vector that contains both the base table and CDC table. The tablet allocator will then have all the information it needs to create the co-located tablet map. We move the creation of the CDC log table - instead of adding the table's mutations in on_before_create_column_family, we create the table schema and add it to the new tables vector in on_pre_create_column_families, which is called by the migration manager in do_prepare_new_column_families_announcement. The migration manager will then create and add all mutations for creating the tables, and notify about the tables being created together.	2025-09-17 14:47:11 +02:00
Michał Jadwiszczak	6e3e287a39	db/schema_tables: create/cleanup tasks when an index is created/dropped Similarly as in previous commits, create view building tasks when an index is created and cleanup view building status when it's dropped.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	76caaea3f1	service/migration_manager: cleanup view building state on drop keyspace When a keyspace is dropped, remove all unfinished building tasks for all views and remove their entries from `system.view_built_status_v2` and `system.built_views`.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	f10c5c4493	service/migration_manager: cleanup view building state on drop view When a view is dropped, remove all unfinished building tasks, remove entries from `system.view_built_status_v2` and `system.built_views`. If the view is currently being built, removing its tasks means they are also aborted. Finished tasks are already removed from the table.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	6d1fbf06ed	service/migration_manager: create view building tasks on create view Create view building tasks in the same batch as new view mutations. The tasks are created only if `VIEW_BUILDING_COORDINATOR` feature is on and the view is in tablet keyspace.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	204f61ffe1	service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` The reference is needed to get `view_building_state_machine`.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	76a6dd82fd	service/migration_manager: coroutinize `prepare_new_view_announcement()`	2025-08-27 08:55:47 +02:00
Gleb Natapov	198cfc6fe7	migration manager: do not use group0 on non zero shard Commit `ddc3b6dcf5` added a check of group0 state in get_schema_for_write(), but group0 client can only be used on shard 0, and get_schema_for_write() can be called on any shard, so we cannot use _group0_client there directly. Move the assert to where we already use another group0 function and where it is guaranteed to run on shard 0. Closes scylladb/scylladb#25204	2025-07-28 14:10:01 +02:00
Petr Gusev	3e0347c614	migration_manager: add timeout to start_group0_operation and announce Pass a timeout parameter through to start_operation() and add_entry(), respectively. This is a preparatory change for the next commit, which will use the timeout to properly handle timeouts during lazy creation of Paxos state tables.	2025-07-24 16:39:50 +02:00
Gleb Natapov	ddc3b6dcf5	migration manager: assert that if schema pull is disabled the group0 is not in use_pre_raft_procedures state If schema pulls are disabled, group0 is used to bring the schema up to date by calling start_group0_operation(), which executes a raft read barrier internally, but if the group0 is still in use_pre_raft_procedures, start_group0_operation() silently does nothing. Later the code that assumes that schema is already up-to-date will fail and print warnings into the log. But since getting queries in the state when a node is in raft enabled mode but group0 is still not configured is illegal, it is better to make those errors more visible by asserting them during testing. Closes scylladb/scylladb#25112	2025-07-23 14:10:17 +02:00
Botond Dénes	054ea54565	Merge 'streaming: Avoid deadlock by running view checks in a separate scheduling group' from Tomasz Grabiec This issue happens with removenode, when RBNO is disabled, so range streamer is used. The deadlock happens in a scenario like this: 1. Start 3 nodes: {A, B, C}, RF=2 2. Node A is lost 3. removenode A 4. Both B and C gain ownership of ranges. 5. Streaming sessions are started with crossed directions: B->C, C->B Readers created by sender side exhaust streaming semaphore on B and C. Receiver side attempts to obtain a permit indirectly by calling check_needs_view_update_path(), which reads local tables. That read is blocked and times-out, causing streaming to fail. The streaming writer is already using a tracking-only permit. Even if we didn't deadlock, and the streaming semaphore was simply exhausted by other receiving sessions (via tracking-only permit), the query may still time-out due to starvation. To avoid that, run the query under a different scheduling group, which translates to the system semaphore instead of the maintenance semaphore, to break the dependency. The gossip group was chosen because it shouldn't be contended and this change should not interfere with it much. Fixes #24807 Fixes #24925 Closes scylladb/scylladb#24929 * github.com:scylladb/scylladb: streaming: Avoid deadlock by running view checks in a separate scheduling group service: migration_manager: Run group0 barrier in gossip scheduling group	2025-07-17 10:24:41 +03:00
Avi Kivity	6fce817aa8	Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft. Pulling database::apply() out of schema merging code will allow to batch changes to subsystems. Future generic code will first call prepare() on all implementations, then single database::apply() and then update() on all implementations, then on each shard it will call commit() for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then post_commit(). Backport: no, it's a new feature Fixes: https://github.com/scylladb/scylladb/issues/19649 Fixes https://github.com/scylladb/scylladb/issues/24531 Closes scylladb/scylladb#24886 [avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>] * github.com:scylladb/scylladb: test: add type creation to test_snapshot storage_service: always wake up load balancer on update tablet metadata db: schema_applier: call destroy also when exception occurs db: replica: simplify seeding ERM during shema change db: remove cleanup from add_column_family db: abort on exception during schema commit phase db: make user defined types changes atomic replica: db: make keyspace schema changes atomic db: atomically apply changes to tables and views replica: make truncate_table_on_all_shards get whole schema from table_shards service: split update_tablet_metadata into two phases service: pull out update_tablet_metadata from migration_listener db: service: add store_service dependency to schema_applier service: simplify load_tablet_metadata and update_tablet_metadata db: don't perform move on tablet_hint reference replica: split add_column_family_and_make_directory into steps replica: db: split drop_table into steps db: don't move map references in merge_tables_and_views() db: 
introduce commit_on_shard function db: access types during schema merge via special storage replica: make non-preemptive keyspace create/update/delete functions public replica: split update keyspace into two phases replica: split creating keyspace into two functions db: rename create_keyspace_from_schema_partition db: decouple functions and aggregates schema change notification from merging code db: store functions and aggregates change batch in schema_applier db: decouple tables and views schema change notifications from merging code db: store tables and views schema diff in schema_applier db: decouple user type schema change notifications from types merging code service: unify keyspace notification functions arguments db: replica: decouple keyspace schema change notifications to a separate function db: add class encapsulating schema merging	2025-07-13 20:47:55 +03:00
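The phase ordering that the 'Atomic in-memory schema changes application' merge describes for the future generic code can be sketched as follows. The phase names (`prepare`, `update`, `commit`, `post_commit`) and the single `database::apply()` step come from the commit message; the `subsystem` struct, the `apply_schema_change` function, the log strings, and the choice to run `post_commit` once per subsystem are assumptions made for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subsystem that records each phase it goes through.
struct subsystem {
    std::string name;
    std::vector<std::string>* log;
    void prepare()              { log->push_back(name + ":prepare"); }
    void update()               { log->push_back(name + ":update"); }
    void commit(unsigned shard) { log->push_back(name + ":commit@" + std::to_string(shard)); }
    void post_commit()          { log->push_back(name + ":post_commit"); }
};

// Runs the phases in the order described in the merge message and returns
// the recorded sequence.
std::vector<std::string> apply_schema_change(unsigned shard_count) {
    std::vector<std::string> log;
    std::vector<subsystem> subs{{"tables", &log}, {"types", &log}};

    for (auto& s : subs) s.prepare();           // 1. prepare() on all implementations
    log.push_back("database:apply");            // 2. a single database::apply()
    for (auto& s : subs) s.update();            // 3. update() on all implementations
    for (unsigned shard = 0; shard < shard_count; ++shard) {
        // 4. commit() for all implementations on each shard; in the real
        //    system this runs without preemption so the change looks atomic.
        for (auto& s : subs) s.commit(shard);
    }
    for (auto& s : subs) s.post_commit();       // 5. post_commit()
    return log;
}
```

With two subsystems and two shards the log holds eleven entries, and the single `database:apply` entry sits between the prepare and update phases.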
Benny Halevy	3feb759943	everywhere: use utils::chunked_vector for list of mutations Currently, we use std::vector<*mutation> to keep a list of mutations for processing. This can lead to large allocation, e.g. when the vector size is a function of the number of tables. Use a chunked vector instead to prevent oversized allocations. `perf-simple-query --smp 1` results obtained for fixed 400MHz frequency and PGO disabled: Before (read path): ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 89055.97 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 18003 cycles/op, 0 errors) 103372.72 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39380 insns/op, 17300 cycles/op, 0 errors) 98942.27 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39413 insns/op, 17336 cycles/op, 0 errors) 103752.93 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39407 insns/op, 17252 cycles/op, 0 errors) 102516.77 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39403 insns/op, 17288 cycles/op, 0 errors) throughput: mean= 99528.13 standard-deviation=6155.71 median= 102516.77 median-absolute-deviation=3844.59 maximum=103752.93 minimum=89055.97 instructions_per_op: mean= 39403.99 standard-deviation=14.25 median= 39406.75 median-absolute-deviation=9.30 maximum=39416.63 minimum=39380.39 cpu_cycles_per_op: mean= 17435.81 standard-deviation=318.24 median= 17300.40 median-absolute-deviation=147.59 maximum=18002.53 minimum=17251.75 ``` After (read path) ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 
59755.04 tps ( 66.2 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39466 insns/op, 22834 cycles/op, 0 errors) 71854.16 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 17883 cycles/op, 0 errors) 82149.45 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39411 insns/op, 17409 cycles/op, 0 errors) 49640.04 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 19975 cycles/op, 0 errors) 54963.22 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 18235 cycles/op, 0 errors) throughput: mean= 63672.38 standard-deviation=13195.12 median= 59755.04 median-absolute-deviation=8709.16 maximum=82149.45 minimum=49640.04 instructions_per_op: mean= 39448.38 standard-deviation=31.60 median= 39466.17 median-absolute-deviation=25.75 maximum=39474.12 minimum=39411.42 cpu_cycles_per_op: mean= 19267.01 standard-deviation=2217.03 median= 18234.80 median-absolute-deviation=1384.25 maximum=22834.26 minimum=17408.67 ``` `perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency and PGO disabled: Before (write path): ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no} Disabling auto compaction 63736.96 tps ( 59.4 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 49667 insns/op, 19924 cycles/op, 0 errors) 64109.41 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 49992 insns/op, 20084 cycles/op, 0 errors) 56950.47 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50005 insns/op, 20501 cycles/op, 0 errors) 44858.42 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50014 insns/op, 21947 cycles/op, 0 errors) 28592.87 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50027 insns/op, 27659 cycles/op, 0 errors) throughput: mean= 51649.63 standard-deviation=15059.74 median= 56950.47 median-absolute-deviation=12087.33 maximum=64109.41 minimum=28592.87 instructions_per_op: mean= 49941.18 standard-deviation=153.76 median= 
50005.24 median-absolute-deviation=73.01 maximum=50027.07 minimum=49667.05 cpu_cycles_per_op: mean= 22023.01 standard-deviation=3249.92 median= 20500.74 median-absolute-deviation=1938.76 maximum=27658.75 minimum=19924.32 ``` After (write path) ``` enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no} Disabling auto compaction 53395.93 tps ( 59.4 allocs/op, 16.5 logallocs/op, 14.3 tasks/op, 50326 insns/op, 21252 cycles/op, 0 errors) 46527.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50704 insns/op, 21555 cycles/op, 0 errors) 55846.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50731 insns/op, 21060 cycles/op, 0 errors) 55669.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50735 insns/op, 21521 cycles/op, 0 errors) 52130.17 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50757 insns/op, 21334 cycles/op, 0 errors) throughput: mean= 52713.91 standard-deviation=3795.38 median= 53395.93 median-absolute-deviation=2955.40 maximum=55846.30 minimum=46527.83 instructions_per_op: mean= 50650.57 standard-deviation=182.46 median= 50731.38 median-absolute-deviation=84.09 maximum=50756.62 minimum=50325.87 cpu_cycles_per_op: mean= 21344.42 standard-deviation=202.86 median= 21334.00 median-absolute-deviation=176.37 maximum=21554.61 minimum=21060.24 ``` Fixes #24815 Improvement for rare corner cases. No backport required Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#24919	2025-07-13 19:13:11 +03:00
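The motivation behind `3feb759943` is that a plain `std::vector` needs one contiguous allocation proportional to its size, while a chunked vector spreads elements over fixed-size chunks. The following is a minimal sketch of that idea, not the actual `utils::chunked_vector`: the name `tiny_chunked_vector`, the tiny default chunk size and the helper `chunks_needed_for_ten` are hypothetical and chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal chunked vector: elements live in fixed-size chunks,
// so no single allocation grows with the total element count (only the small
// table of chunk handles does).
template <typename T, std::size_t ChunkSize = 4>
class tiny_chunked_vector {
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(T value) {
        if (_size % ChunkSize == 0) {
            // Current chunk is full (or none exists yet): start a new one.
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkSize);
        }
        _chunks.back().push_back(std::move(value));
        ++_size;
    }
    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};

// Stores ten integers in chunks of four and reports how many chunks were used.
std::size_t chunks_needed_for_ten() {
    tiny_chunked_vector<int, 4> v;
    for (int i = 0; i < 10; ++i) v.push_back(i);
    assert(v.size() == 10);
    assert(v[9] == 9);
    return v.chunk_count();
}
```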
Tomasz Grabiec	ee2fa58bd6	service: migration_manager: Run group0 barrier in gossip scheduling group Fixes two issues. One is potential priority inversion. The barrier will be executed using scheduling group of the first fiber which triggers it, the rest will block waiting on it. For example, CQL statements which need to sync the schema on replica side can block on the barrier triggered by streaming. That's undesirable. This is theoretical, not proved in the field. The second problem is blocking the error path. This barrier is called from the streaming error handling path. If the streaming concurrency semaphore is exhausted, and streaming fails due to timeout on obtaining the permit in check_needs_view_update_path(), the error path will block too because it will also attempt to obtain the permit as part of the group0 barrier. Running it in the gossip scheduling group prevents this. Fixes #24925	2025-07-11 16:29:31 +02:00
Marcin Maliszkiewicz	2f840e51d1	service: pull out update_tablet_metadata from migration_listener It's not a good usage as there is only one non-empty implementation. Also we need to change it further in the following commit which makes it incompatible with listener code.	2025-07-10 10:40:43 +02:00
Marcin Maliszkiewicz	fa157e7e46	db: service: add store_service dependency to schema_applier There is already implicit logical dependency via migration_notifier but in the next commits we'll be moving store_service out from it as we need better control (i.e. return a value from the call).	2025-07-10 10:40:43 +02:00
Marcin Maliszkiewicz	ae81497995	service: unify keyspace notification functions arguments Keyspace metadata is not used, only name is needed so we can remove those extra find_keyspace() calls. Moreover there is no need to copy the name.	2025-07-10 10:40:42 +02:00
Michael Litvak	05ffcefd50	migration_manager: add notification for creating multiple tables Add prepare_new_column_families_announcement for preparing multiple new tables that are created in a single operation. A listener can receive a notification when multiple tables are created. This is useful if the listener needs to have all the new tables, and not work on each new table independently. For example, if there are dependencies between the new tables.	2025-07-01 13:20:18 +03:00
Avi Kivity	cd79a8fc25	Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz" This reverts commit `0b516da95b`, reversing changes made to `30199552ac`. It breaks cluster.random_failures.test_random_failures.test_random_failures in debug mode (at least). Fixes #24513	2025-06-16 22:38:12 +03:00
Marcin Maliszkiewicz	21a5a3c01f	service: pull out update_tablet_metadata from migration_listener It's not a good usage as there is only one non-empty implementation. Also we need to change it further in the following commit which makes it incompatible with listener code.	2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz	92e3d69f79	db: service: add store_service dependency to schema_applier There is already implicit logical dependency via migration_notifier but in the next commits we'll be moving store_service out from it as we need better control (i.e. return a value from the call).	2025-06-06 08:50:33 +02:00
Marcin Maliszkiewicz	3a95edd0d7	service: unify keyspace notification functions arguments Keyspace metadata is not used, only name is needed so we can remove those extra find_keyspace() calls. Moreover there is no need to copy the name.	2025-05-27 20:00:58 +02:00
Wojciech Mitros	d77f11d436	base_info: remove the lw_shared_ptr variant The base_dependent_view_info is no longer needed to be shared or modified in the view_info, so we no longer need to keep it as a shared pointer.	2025-04-24 01:08:40 +02:00
Wojciech Mitros	05fce91945	schema_registry: store base info instead of base schema for view entries In the following patch we plan to remove the base schema from the base_info to make the base_info immutable. To do that, we first prepare the schema registry for the change; we need to be able to create view schemas from frozen schemas there and frozen schemas have no information about the base table. Unless we do this change, after base schemas are removed from the base info, we'll no longer be able to load a view schema to the schema registry without looking up the base schema in the database. This change also required some updates to schema building: * we add a method for unfreezing a view schema with base info instead of a base schema * we make it possible to use schema_builder with a base info instead of a base schema * we add a method for creating a view schema from mutations with a base info instead of a base schema * we add a view_info constructor with a base info instead of a base schema * we update the naming in schema_registry to reflect the usage of base info instead of base schema	2025-04-24 01:08:39 +02:00
Benny Halevy	5f8b5724e6	service: migration_manager: use named gate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-12 11:28:49 +03:00
Avi Kivity	882f405eed	Merge "Convert gossiper's endpoint state map to be host id based" from Gleb " The series makes endpoint state map in the gossiper addressable by host id instead of ips. The transition has implication outside of the gossiper as well. Gossiper based topology operations are affected by this change since they assume that the mapping is ip based. On wire protocol is not affected by the change as maps that are sent by the gossiper protocol remain ip based. If old node sends two different entries for the same host id the one with newer generation is applied. If new node has two ids that are mapped to the same ip the newer one is added to the outgoing map. Interoperability was verified manually by running mixed cluster. The series concludes the conversion of the system to be host id based. " * 'gleb/gossipper-endpoint-map-to-host-id-v2' of github.com:scylladb/scylla-dev: gossiper: make examine_gossiper private gossiper: rename get_nodes_with_host_id to get_node_ip treewide: drop id parameter from gossiper::for_each_endpoint_state treewide: move gossiper to index nodes by host id gossiper: drop ip from replicate function parameters gossiper: drop ip from apply_new_states parameters gossiper: drop address from handle_major_state_change parameter list gossiper: pass rpc::client_info to gossiper_shutdown verb handler gossiper: add try_get_host_id function gossiper: add ip to endpoint_state serialization: fix std::map de-serializer to not invoke value's default constructor gossiper: drop template from wait_alive_helper function gossiper: move get_supported_features and its users to host id storage_service: make candidates_for_removal host id based gossiper: use peers table to detect address change storage_service: use std::views::keys instead of std::views::transform that returns a key gossiper: move _pending_mark_alive_endpoints to host id gossiper: do not allow to assassinate endpoint in raft topology mode gossiper: fix indentation after previous patch gossiper: do not allow to assassinate non existing endpoint	2025-04-02 12:30:00 +03:00
Piotr Smaron	370707b111	service: restore default timeout in `announce_with_raft` This restored timeout seems to have been accidentally removed in `7081215552 (r2005352424)`. Without it, `raft_server_with_timeouts::run_with_timeout` will get `std::nullopt` as a value of the `timeout` parameter and perform an operation without any timeout, whereas previously it would have waited for the default timeout specified in `raft_server_for_group::default_op_timeout`. Closes scylladb/scylladb#23380	2025-04-01 10:20:16 +03:00
Gleb Natapov	28fb84117d	treewide: drop id parameter from gossiper::for_each_endpoint_state We have it in endpoint_state anyway, so no need to pass both.	2025-03-31 16:50:50 +03:00
Gleb Natapov	4609bbbbb2	treewide: move gossiper to index nodes by host id This patch changes gossiper to index nodes by host ids instead of ips. The main data structure that changes is _endpoint_state_map, but this results in a lot of changes since everything that uses the map directly or indirectly has to be changed. The big victim of this outside of the gossiper itself is topology over gossiper code. It works on IPs and assumes the gossiper does the same and both need to be changed together. Changes to other subsystems are much smaller since they already mostly work on host ids anyway.	2025-03-31 16:50:50 +03:00
Gleb Natapov	48a1030c91	treewide: use host id directly in endpoint state change subscribers Now that we have host ids in endpoint state change subscribers, some of them can be simplified by using the id directly instead of looking it up by ip.	2025-03-11 12:09:22 +02:00
Gleb Natapov	499eb4d17f	treewide: pass host id to endpoint state change subscribers	2025-03-11 12:09:22 +02:00
Gleb Natapov	0e3dcb7954	treewide: move everyone to use host id based gossiper::is_alive and drop ip based one	2025-03-11 12:09:21 +02:00


432 Commits