scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Petr Gusev	1b2e0d0cc9	system_keyspace: get_truncated_position -> get_truncated_positions This method can return many replay_positions, so the plural form is more appropriate.	2023-09-28 12:25:40 +04:00
Pavel Emelyanov	becd960ae8	view_update_generator: Add logging to do_abort() Just tell the logs that the generator is aborting. Refs: #10941 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:34:21 +03:00
Pavel Emelyanov	967ebacaa4	view_update_generator: Move abort kicking to do_abort() When v.u.g. stops, it first aborts the generation background fiber by requesting abort on the internal abort source and signalling the fiber in case it's waiting. Right now v.u.g.::stop() is defer-scheduled last in main(), so this move doesn't change much -- when stop_signal fires, it will kick v.u.g.::do_abort() just a bit earlier, and nothing that happens between that point and the real ::stop() call depends on it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:32:45 +03:00
Pavel Emelyanov	e34220ebb7	view_update_generator: Add early abort subscription Subscribe v.u.g. to the main's stop_signal. For now a no-op callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-21 13:32:45 +03:00
Tomasz Grabiec	3d4398d1b2	Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620). If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash, which is susceptible to bugs (#13957). When performing a schema change in Raft RECOVERY mode, we also extend schema mutations in a way that forces nodes to revert to the old way of calculating schema versions when necessary. We can only introduce these extensions if the whole cluster understands them, so protect this code with a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`. Fixes: #7620 Fixes: #13957 Closes scylladb/scylladb#15331 * github.com:scylladb/scylladb: test: add test for group 0 schema versioning test/pylib: log_browsing: fix type hint feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0 migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations schema_tables: use schema version from group 0 if present migration_manager: store `group0_schema_version` in `scylla_local` during schema changes migration_manager: migration_request handler: assume `canonical_mutation` support system_keyspace: make `get/set_scylla_local_param` public feature_service: add `GROUP0_SCHEMA_VERSIONING` feature schema_tables: refactor `scylla_tables(schema_features)` migration_manager: add `std::move` to avoid a copy schema_tables: remove default value for `reload` in `merge_schema` schema_tables: pass `reload` flag when calling `merge_schema` cross-shard system_keyspace: fix outdated comment	2023-09-20 10:43:40 +02:00
Botond Dénes	111cdce2e1	Merge 'db/hints: Modularize manager.hh' from Dawid Mędrek This PR modularizes `manager.{hh, cc}` by dividing the files into separate smaller units. The changes improve overall readability of code and help reason about it. Each file has a specific purpose now. This is the first step in refactoring the Hinted Handoff module. Refs scylladb/scylla#15358 Closes scylladb/scylladb#15378 * github.com:scylladb/scylladb: db/hints: Remove unused aliases from manager.hh db/hints: Rename end_point_hints_manager db/hints: Rename sender to hint_sender db/hints: Move the rebalancing logic to hint_storage db/hints: Move the implementation of sender db/hints: Move the declaration of sender to hint_sender.hh db/hints: Move sender::replay_allowed() to the source file db/hints: Put end_point_hints_manager in internal namespace db/hints: Move the implementation of end_point_hints_manager db/hints: Move the declaration of end_point_hints_manager db/hints: Move definitions of functions using shard hint manager db/hints: Introduce hint_storage.hh db/hints: Extract the logger from manager.cc db/hints: Extract common types from manager.hh	2023-09-19 10:56:16 +03:00
Michael Huang	62a8a31be7	cdc: use chunked_vector for topology_description entries Lists can grow very big. Let's use a chunked vector to prevent large contiguous allocations. Fixes: #15302. Closes scylladb/scylladb#15428	2023-09-18 23:17:01 +03:00
Kamil Braun	bc6f7d1b20	Merge 'raft topology: add garbage collection for internal CDC generations table' from Patryk Jędrzejczak We add garbage collection for the `CDC_GENERATIONS_V3` table to prevent it from endlessly growing. This mechanism is especially needed because we send the entire contents of `CDC_GENERATIONS_V3` as a part of the group 0 snapshot. The solution is to keep a clean-up candidate, which is one of the already published CDC generations. The CDC generation publisher introduced in #15281 continually uses this candidate to remove all generations with timestamps not exceeding the candidate's and sets a new candidate when needed. We also add `test_cdc_generation_clearing.py` that verifies this new mechanism. Fixes #15323 Closes scylladb/scylladb#15413 * github.com:scylladb/scylladb: test: add test_cdc_generation_clearing raft topology: remove obsolete CDC generations raft topology: set CDC generation clean-up candidate topology_coordinator: refactor publish_oldest_cdc_generation system_keyspace: introduce decode_cdc_generation_id system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3	2023-09-18 11:30:10 +02:00
Kamil Braun	947c419421	schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0 As explained in the previous commit, we use the new `committed_by_group0` flag attached to each row of a `scylla_tables` mutation to decide whether the `version` cell needs to be deleted or not. The rest of #13957 is solved by pre-existing code -- if the `version` column is present in the mutation, we don't calculate a hash for `schema::version()`, but take the value from the column:
```
table_schema_version schema_mutations::digest(db::schema_features sf) const {
    if (_scylla_tables) {
        auto rs = query::result_set(_scylla_tables);
        if (!rs.empty()) {
            auto&& row = rs.row(0);
            auto val = row.get<utils::UUID>("version");
            if (val) {
                return table_schema_version(val);
            }
        }
    }
    ...
```
The issue will therefore be fixed once we enable `GROUP0_SCHEMA_VERSIONING`.	2023-09-15 14:32:52 +02:00
Kamil Braun	ce68ee0950	migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations As described in #13957, when creating or altering a table in group 0 mode, we don't want each node to calculate `schema::version()`s independently using a hash algorithm. Instead, we want all nodes to use a single version for that table, committed by the group 0 command. There's even a column ready for this in `system.scylla_tables` -- `version`. This column is currently being set for system tables, but it's not being used for user tables. Similarly to what we did with the global schema version in earlier commits, the obvious thing to do would be to include a live cell for the `version` column in the `system.scylla_tables` mutation when we perform the schema change in Raft mode, and to include a tombstone when performing it outside of Raft mode, for the RECOVERY case. But it's not that simple because, as it turns out, we're already sending a `version` live cell (and also a tombstone, with timestamp decremented by 1) in all `system.scylla_tables` mutations. But then we delete that cell when doing schema merge (which raises the question of why we were sending it in the first place, but I digress):
```
// We must force recalculation of schema version after the merge, since the resulting
// schema may be a mix of the old and new schemas.
delete_schema_version(mutation);
```
The above function removes the `version` cell from the mutation. So we need another way of distinguishing schema changes originating from group 0 from those made outside group 0 (e.g. in RECOVERY). The method I chose is to extend `system.scylla_tables` with a boolean column, `committed_by_group0`, and extend schema mutations to set this column. In the next commit we'll decide whether or not the `version` cell should be deleted based on the value of this new column.	2023-09-15 14:32:52 +02:00
Kamil Braun	59912ca3b0	schema_tables: use schema version from group 0 if present As promised in the previous commit, if we persisted a schema version through a group 0 command, use it after a schema merge instead of calculating a digest. Ref: #7620 The above issue will be fixed once we enable the `GROUP0_SCHEMA_VERSIONING` feature.	2023-09-15 14:32:52 +02:00
Kamil Braun	7ab7588d59	migration_manager: store `group0_schema_version` in `scylla_local` during schema changes We extend schema mutations with an additional mutation to the `system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.
As we will see in later commits, nodes will use this after applying schema mutations. If the key is absent or has a tombstone, they'll calculate the global schema digest on their own -- using the old way. If the key is present, they'll take the schema version from there. The Raft-mode schema version is equal to the group 0 state ID of this schema command. The tombstone is necessary for the case of performing a schema change in RECOVERY mode. It will force a revert to the old digest-based way. Note that extending schema mutations with a `system.scylla_local` mutation is possible thanks to earlier commits which moved `system.scylla_local` to schema commitlog, so all mutations in the schema mutations vector still go to the same commitlog domain.	2023-09-15 14:32:45 +02:00
Kamil Braun	3ab244e6d9	system_keyspace: make `get/set_scylla_local_param` public We'll use them outside `system_keyspace` code in a later commit.	2023-09-15 13:04:04 +02:00
Kamil Braun	72cd457d53	feature_service: add `GROUP0_SCHEMA_VERSIONING` feature This feature, when enabled, will modify how schema versions are calculated and stored.
- In group 0 mode, schema versions are persisted by the group 0 command that performs the schema change, then reused by each node instead of being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before the Raft upgrade procedure finishes, when we perform a schema change, we revert to the old digest-based way, taking into account the possibility of having performed group0-mode schema changes (that used persistent versions).
As we will see in future commits, this will be done by storing additional flags and tombstones in system tables. By "schema versions" we mean both the UUIDs returned from `schema::version()` and the "global" schema version (the one we gossip as `application_state::SCHEMA`). For now, in this commit, the feature is always disabled. Once all the necessary code is set up in the following commits, we will enable it together with Raft.	2023-09-15 13:04:04 +02:00
Kamil Braun	dc4e20d835	schema_tables: refactor `scylla_tables(schema_features)` The `scylla_tables` function gives a different schema definition for the `system_schema.scylla_tables` table, depending on whether certain schema features are enabled or not. The way it was implemented, handling `n` features required `θ(2^n)` code and comments. Refactor it so that handling `n` features requires only `θ(n)` code.	2023-09-15 13:04:04 +02:00
Kamil Braun	4376854473	schema_tables: remove default value for `reload` in `merge_schema` To avoid bugs like the one fixed in the previous commit.	2023-09-15 13:04:04 +02:00
Kamil Braun	48164e1d09	schema_tables: pass `reload` flag when calling `merge_schema` cross-shard In `0c86abab4d`, `merge_schema` obtained a new flag, `reload`. Unfortunately, the flag was assigned a default value, which I think is almost always a bad idea, and indeed it was in this case. When `merge_schema` is called on a shard other than 0, it recursively calls itself on shard 0. That recursive call forgot to pass the `reload` flag. Fix this.	2023-09-15 13:04:04 +02:00
Kamil Braun	9017b998ca	system_keyspace: fix outdated comment	2023-09-15 13:04:04 +02:00
Patryk Jędrzejczak	e375e769b9	raft topology: set CDC generation clean-up candidate We want to use the clean-up candidates to remove the obsolete CDC generation data, but first, we need to choose a suitable generation as the candidate whenever there is none. Since CDC generations must be published before we remove them, a generation that is being published is a good candidate.	2023-09-15 09:23:59 +02:00
Dawid Medrek	fbbb9f879a	db/hints: Remove unused aliases from manager.hh	2023-09-15 04:17:08 +02:00
Dawid Medrek	d46437a87b	db/hints: Rename end_point_hints_manager This commit renames `end_point_hints_manager` to `hint_endpoint_manager` to be consistent with other names used in the module (they all start with `hint_`).	2023-09-15 03:46:15 +02:00
Dawid Medrek	6d1eee448b	db/hints: Rename sender to hint_sender We rename the structure to highlight what exactly its purpose is.	2023-09-15 03:46:15 +02:00
Dawid Medrek	4ad0f8907c	db/hints: Move the rebalancing logic to hint_storage This commit continues modularizing manager.hh.	2023-09-15 03:46:15 +02:00
Dawid Medrek	999484466d	db/hints: Move the implementation of sender This commit continues modularizing manager.hh. After moving the declaration of sender to a dedicated header file, these changes move its implementation to a separate source file.	2023-09-15 03:46:15 +02:00
Dawid Medrek	17aabf6b9a	db/hints: Move the declaration of sender to hint_sender.hh This commit is yet another step in modularizing manager.hh. We move the declaration of sender to a dedicated file. Its implementation will follow in a future commit.	2023-09-15 03:46:15 +02:00
Dawid Medrek	1a7262ed6e	db/hints: Move sender::replay_allowed() to the source file The premise of these changes is the fact that we cannot have a cycle of #includes. Because the declaration of `sender` is going to be moved to a separate header file in a future commit, and because that header file is going to be included in the file where `end_point_hints_manager` is declared, we will need to rely on `end_point_hints_manager` being an incomplete type there. A consequence of that is that we cannot access any of `end_point_hints_manager`'s methods. This commit prepares the ground for it by moving the definition of the function to the source file where `end_point_hints_manager` will be a complete type.	2023-09-15 03:46:15 +02:00
Dawid Medrek	ad2a36bd45	db/hints: Put end_point_hints_manager in internal namespace	2023-09-15 03:46:15 +02:00
Dawid Medrek	507054012d	db/hints: Move the implementation of end_point_hints_manager This commit continues moving end_point_hints_manager to its dedicated files. After moving the declaration of the class, these changes move the implementation.	2023-09-15 03:46:15 +02:00
Dawid Medrek	f72c423984	db/hints: Move the declaration of end_point_hints_manager This commit is yet another step in modularizing manager.hh. We move the declaration of the class to a dedicated header file. The implementation will follow in a future commit.	2023-09-15 03:46:15 +02:00
Dawid Medrek	854cc0c939	db/hints: Move definitions of functions using shard hint manager We move the definitions of inline methods of end_point_hints_manager and sender that access the shard hint manager to the source file, effectively un-inlining them. We need to do that to prepare for moving said structures out of manager.hh. This commit is yet another step in modularizing manager.hh.	2023-09-15 03:45:57 +02:00
Dawid Medrek	db08a85f5d	db/hints: Introduce hint_storage.hh This commit moves types used by shard hint manager and related to storing hints on disk to another file. It is yet another step in modularizing manager.hh.	2023-09-15 02:28:10 +02:00
Dawid Medrek	4814b3b19a	db/hints: Extract the logger from manager.cc This commit extracts the logger used in manager.cc to prepare the ground for modularization of manager.hh into separate smaller files. We want to preserve the logging behavior (at least for the time being), which means new files should use the same logger. These changes serve that purpose.	2023-09-15 02:24:20 +02:00
Dawid Medrek	efd6d1f57a	db/hints: Extract common types from manager.hh Currently, data structures used in manager.hh use their own aliases for gms::inet_address. It is clear they all should use the same type and having different names for it only reduces readability of the code. This commit introduces a common alias -- endpoint_id -- and gets rid of the other ones. This commit is also the first step in modularizing manager.hh by extracting common types to another file.	2023-09-15 02:23:30 +02:00
Patryk Jędrzejczak	c0fd42ead4	system_keyspace: introduce decode_cdc_generation_id The decode_cdc_generations_ids function allows us to decode a vector of CDC generation IDs. After adding cleanup_candidate to CDC_GENERATIONS_V3, we need a similar function that decodes a single ID.	2023-09-14 12:09:14 +02:00
Patryk Jędrzejczak	6db325fb69	system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3 In the following commits, we implement garbage collection for CDC_GENERATIONS_V3. The first step is introducing the clean-up candidate. It will be continually updated by the CDC generation publisher and used to remove obsolete data.	2023-09-14 12:09:10 +02:00
Petr Gusev	082cd3bc8e	system_keyspace: switch CDC_LOCAL to schema commitlog	2023-09-13 23:17:20 +04:00
Petr Gusev	a683cebb02	system_keyspace: scylla_local: use schema commitlog We remove the flush from set_scylla_local_param_as since it's now redundant. We add one to save_local_enabled_features, as features need to be available before schema commitlog replay. We skip the flush if save_local_enabled_features is called from topology_state_load when the features are migrated to system.topology and we don't need strict durability.	2023-09-13 23:17:20 +04:00
Petr Gusev	beb29f094b	system_keyspace: drop load phases We want to switch the system.scylla_local table to the schema commitlog, but load phases get in the way here: schema commitlog is initialized after phase1, so a table which uses it should be moved to phase2, but system.scylla_local contains features, and we need them before schema commitlog initialization for the SCHEMA_COMMITLOG feature. In this commit we are taking a different approach to loading system tables. First, we load them all in one pass in 'readonly' mode. In this mode, the table cannot be written to and has not yet been assigned a commit log. To achieve this, we've added a _readonly bool field to the table class; it's initialized to true in the table's constructor. In addition, we changed the table constructor to always assign nullptr to commitlog, and we trigger an internal error if the table.commitlog() property is accessed while the table is in readonly mode. Then, after triggering on_system_tables_loaded notifications on feature_service and sstable_format_selector, we call system_keyspace::mark_writable and eventually table::mark_ready_for_writes, which selects the proper commitlog and marks the table as writable. In sstable_compaction_test we drop several mark_ready_for_writes calls since they are redundant: the table has already been made writable in the env.make_table_for_tests call. The table::commitlog function either returns the current commitlog or causes an error if the table is readonly. This didn't work for virtual tables, since they never called mark_ready_for_writes. In this commit we add this call to initialize_virtual_tables.	2023-09-13 23:17:20 +04:00
Petr Gusev	47ffc66c7f	database.hh: add_column_family: add readonly parameter Previously, creating a table or view in schema_tables.cc/merge_tables_and_views was a two-step process: first adding a column family (add_column_family function) and then marking it as ready for writes (mark_table_as_writable). There is a yield between these stages, which means someone could see a table or view for which the mark_table_as_writable method had not yet been called, and start writing to it. This problem was demonstrated by materialised view dtests. A view is created on all nodes. On some nodes it will be created earlier than on others, and the view rebuild process will start writing data to that view on other nodes, where mark_table_as_writable has not yet been called. In this patch we solve this problem by adding a readonly parameter to the add_column_family method. When loading tables from disk, this flag is set to true and mark_table_as_writable is called only after all sstables have been loaded. When creating a new table, this flag is set to false, mark_table_as_writable is called from inside add_column_family, and the new table is already writable when it becomes visible.	2023-09-13 23:17:20 +04:00
Petr Gusev	7e52014633	schema_tables: merge_tables_and_views: delay events until tables/views are created on all shards db.get_notifier().create_view triggers view rebuild; this process writes to the table on all shards and thus can access a partially created table, e.g. one where mark_table_ready_for_writes was not yet called.	2023-09-13 23:17:20 +04:00
Petr Gusev	0e5f9ae9a4	system_keyspace: switch system.peers to schema commitlog Also, we remove flushes on writes as durability is now guaranteed by the commitlog.	2023-09-13 23:17:20 +04:00
Petr Gusev	7881ce1e09	system_keyspace: switch system.local to schema commitlog Schema commitlog lives only on the zero shard, so we need to turn on the use_null_sharder option. Also, we remove flushes on writes as durability is now guaranteed by the commitlog.	2023-09-13 23:17:20 +04:00
Petr Gusev	a0653590b5	sstables_format_selector: extract listener In the following commits we want to move schema commitlog replay earlier, but the current sstable format should be selected before the replay. The current sstable format is stored in system.scylla_local, so we can't read it until system tables are loaded. This problem is similar to the one with enabled_features. To solve this, we split sstables_format_selector into two parts. The lower level part, sstables_format_selector, knows only about database and system_keyspace. It will be moved before system_keyspace initialization, and the on_system_tables_loaded method will be called on it when the system_keyspace has loaded its tables. The higher level part, sstables_format_listener, is responsible for subscribing to feature_service and gossiper and is started later, at the same place as sstables_format_selector before this commit.	2023-09-13 23:04:50 +04:00
Petr Gusev	7104fc8a7e	sstables_format_selector: wrap when_enabled with seastar::async The listener may fire immediately, we must be in a thread context for this to work. In the next commits we are going to move enable_features_on_startup above sstables_format_selector::start in scylla_main, so we need to fix this beforehand.	2023-09-13 23:00:16 +04:00
Petr Gusev	2a0b228d17	main.cc: inline and split system_keyspace.setup Our goal is to switch the system.local table to schema commitlog and stop doing flushes when we write to it. This means it would be incorrect to read from this table until schema commitlog is replayed. On the other hand, we need truncation records to be loaded before we start replaying schema commitlog, since commitlog_replayer relies on them. In this commit we inline the system_keyspace::setup function and split its content into two parts. In the first part, before schema commitlog replay, we load truncation records. It's safe to load them before schema commitlog replay since we intend to keep the flushes on writes to the system.truncated table. In the second part, after schema commitlog replay, we do the rest of the job - build_bootstrap_info and db::schema_tables::save_system_schema. We decided to inline this function since there is very low cohesion between the actions it's performing. It's just simpler to reason about them individually.	2023-09-13 23:00:15 +04:00
Petr Gusev	f0bc9f2d93	system_keyspace: refactor save_system_schema function This is a refactoring commit without observable changes in behaviour. Previously, there were two related functions in db::schema_tables: save_system_keyspace_schema(qp) and save_system_schema(qp, ks). The first called the second passing "system_schema" as the second argument. Outside of schema_tables module we don't need two functions, we just need a way to say 'persist system schema objects in the appropriate tables/keyspaces'. In this commit we change the function save_system_schema to have this meaning. Internally it calls save_system_schema_to_keyspace twice with "system_schema" and "system", since that's what we need in the single call site of this function in system_keyspace::setup. In subsequent commits we are going to move this call out of the system_keyspace::setup.	2023-09-13 23:00:15 +04:00
Petr Gusev	e395086557	system_keyspace: move initialize_virtual_tables into virtual_tables.hh This is a readability refactoring commit without observable changes in behaviour. initialize_virtual_tables logically belongs to the virtual_tables module, and moving it there allows other functions in virtual_tables.cc (register_virtual_tables, install_virtual_readers) to become local to the module, which simplifies matters a bit. all_virtual_tables() is not needed anymore: all the references to registered virtual tables are now local to the virtual_tables module and can just use the virtual_tables variable directly.	2023-09-13 23:00:15 +04:00
Petr Gusev	c4787a160b	system_keyspace: remove unused parameter	2023-09-13 23:00:15 +04:00
Petr Gusev	b90011294d	config.cc: drop db::config::host_id In this refactoring commit we remove the db::config::host_id field, as it's hacky and duplicates token_metadata::get_my_id. Some tests want a specific host_id, so we add it to cql_test_config and use it in cql_test_env. We can't pass host_id to sstables_manager by value since the manager is initialized in the database constructor and host_id is not loaded yet. We also prefer not to make a dependency on shared_token_metadata since in this case we would have to create artificial shared_token_metadata in many tools and tests where sstables_manager is used. So we pass a function that returns host_id to the sstables_manager constructor.	2023-09-13 23:00:15 +04:00
Petr Gusev	a03fbc3781	system_keyspace: set null sharder when configuring schema commitlog The schema commitlog lives only on the null shard, so it makes no sense to set use_schema_commitlog without use_null_sharder. We also extract the function enable_schema_commitlog, which sets all the needed properties.	2023-09-13 23:00:15 +04:00
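Commit 62a8a31be7 stores `topology_description` entries in a chunked vector so that very large lists do not need one large contiguous allocation. A minimal C++ sketch of the idea follows; `simple_chunked_vector` and its chunk size are hypothetical and much simpler than Scylla's own `utils::chunked_vector`, whose interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration, not utils::chunked_vector: elements live in
// separately allocated chunks of at most ChunkSize elements, so growing the
// container never needs one huge contiguous block for the elements.
template <typename T, std::size_t ChunkSize = 128>
class simple_chunked_vector {
    // Only this small index of chunk headers is contiguous; moving a chunk
    // on reallocation just moves its pointer, not its elements.
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;

public:
    void push_back(T value) {
        if (_size % ChunkSize == 0) {
            _chunks.emplace_back();           // start a new chunk
            _chunks.back().reserve(ChunkSize);
        }
        _chunks.back().push_back(std::move(value));
        ++_size;
    }

    T& operator[](std::size_t i) {
        assert(i < _size);
        return _chunks[i / ChunkSize][i % ChunkSize];
    }

    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

With `ChunkSize = 4`, ten elements occupy three chunks, and no single allocation ever holds more than four elements.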
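Commits ce68ee0950 and 947c419421 together decide, per `scylla_tables` row, whether the `version` cell survives the schema merge. The C++ sketch below models that decision under simplifying assumptions: `scylla_tables_row`, `maybe_delete_schema_version` and `table_version` are hypothetical names, and a string stands in for the UUID-based `table_schema_version`.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the UUID-based table_schema_version.
using table_schema_version = std::string;

// Hypothetical, simplified view of one `system_schema.scylla_tables` row.
struct scylla_tables_row {
    std::optional<table_schema_version> version;
    bool committed_by_group0 = false;
};

// Merge-time step: only rows that did NOT come from group 0 lose their
// `version` cell, forcing the version to be recalculated after the merge.
void maybe_delete_schema_version(scylla_tables_row& row) {
    if (!row.committed_by_group0) {
        row.version.reset();
    }
}

// Same shape as the digest() excerpt quoted in 947c419421: prefer the
// persisted version, otherwise compute a hash.
table_schema_version table_version(const scylla_tables_row& row,
                                   const std::function<table_schema_version()>& hash) {
    assert(hash && "a hash fallback is required");
    if (row.version) {
        return *row.version;
    }
    return hash();
}
```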
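Commits 7ab7588d59 and 59912ca3b0 describe how a node picks the global schema version after applying schema mutations: a live `group0_schema_version` value wins, while an absent key or a tombstone means falling back to the digest. A hedged C++ sketch of that rule, where `local_param`, `absent`, `tombstone` and `choose_global_schema_version` are hypothetical and a string stands in for `utils::UUID`:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <variant>

// Hypothetical stand-in for utils::UUID.
using schema_version = std::string;

// What a node may find under the `group0_schema_version` key of
// `system.scylla_local` after applying schema mutations.
struct absent {};     // key never written
struct tombstone {};  // deleted by a schema change made outside Raft mode
using local_param = std::variant<absent, tombstone, schema_version>;

// Reuse the version persisted by the group 0 command if a live value is
// present; otherwise fall back to the old digest-based calculation.
schema_version choose_global_schema_version(
        const local_param& param,
        const std::function<schema_version()>& compute_digest) {
    assert(compute_digest && "a digest fallback is required");
    if (const auto* persisted = std::get_if<schema_version>(&param)) {
        return *persisted;
    }
    return compute_digest();
}
```

Modelling the three states with `std::variant` keeps "never written" and "deleted by a tombstone" distinct, even though both lead to the same fallback here.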
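Commit dc4e20d835 reduces the code needed for `n` schema features from `θ(2^n)` to `θ(n)`. One way to get linear growth, sketched here in C++ with hypothetical feature and column names (not the real `scylla_tables` definition), is to let each feature append its own columns independently instead of spelling out every combination:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feature switches; the real code takes db::schema_features.
struct schema_features {
    bool feature_a = false;
    bool feature_b = false;
};

// Each feature contributes one independent step, so supporting n features
// takes n such steps instead of one branch per combination (2^n).
std::vector<std::string> scylla_tables_columns(const schema_features& sf) {
    // Illustrative base columns, always present.
    std::vector<std::string> columns = {"keyspace_name", "table_name", "version"};
    if (sf.feature_a) {
        columns.push_back("feature_a_column");
    }
    if (sf.feature_b) {
        columns.push_back("feature_b_column");
    }
    assert(columns.size() == 3u + sf.feature_a + sf.feature_b);
    return columns;
}
```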
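Commits 48164e1d09 and 4376854473 fix a dropped `reload` flag in a cross-shard call and then remove the default argument that allowed the bug. The synchronous C++ sketch below is a simplified stand-in (the real code hops to shard 0 asynchronously); `merge_log` and this `merge_schema` signature are hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of the `reload` values that reached shard 0.
struct merge_log {
    std::vector<bool> reload_flags_seen_on_shard0;
};

// No default for `reload`: every caller, including the cross-shard
// forwarding below, has to pass it explicitly.
void merge_schema(merge_log& log, unsigned this_shard, bool reload) {
    if (this_shard != 0) {
        // The work happens on shard 0. Had `reload` kept a default value,
        // `merge_schema(log, 0)` would still compile here and silently drop
        // the caller's flag, which is the bug fixed in 48164e1d09.
        merge_schema(log, 0, reload);
        return;
    }
    log.reload_flags_seen_on_shard0.push_back(reload);
}
```

Because `reload` has no default, a forwarding call that omits it fails to compile instead of silently using a default value.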
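Commits 6db325fb69, e375e769b9 and bc6f7d1b20 describe the clean-up candidate for `CDC_GENERATIONS_V3`. A C++ sketch under stated assumptions: generations are keyed by a bare timestamp, and the `is_obsolete` predicate stands in for whatever condition the real publisher uses to decide that the candidate can be cleared; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical: a generation is identified by its timestamp only.
using gen_timestamp = std::int64_t;

struct cdc_generations_state {
    std::map<gen_timestamp, std::string> generations;  // timestamp -> data
    std::optional<gen_timestamp> cleanup_candidate;
};

// A generation that is being published becomes the candidate if none is set.
void on_publish(cdc_generations_state& s, gen_timestamp ts) {
    if (!s.cleanup_candidate) {
        s.cleanup_candidate = ts;
    }
}

// Once the candidate is obsolete (hypothetical policy supplied by the
// caller), drop every generation whose timestamp does not exceed it.
template <typename IsObsolete>
void remove_obsolete_generations(cdc_generations_state& s, IsObsolete is_obsolete) {
    if (!s.cleanup_candidate || !is_obsolete(*s.cleanup_candidate)) {
        return;
    }
    const gen_timestamp candidate = *s.cleanup_candidate;
    // upper_bound: first generation strictly newer than the candidate.
    s.generations.erase(s.generations.begin(), s.generations.upper_bound(candidate));
    s.cleanup_candidate.reset();  // a later publication picks the next candidate
    assert(s.generations.empty() || s.generations.begin()->first > candidate);
}
```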
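Commit 1a7262ed6e moves `sender::replay_allowed()` out of line because `end_point_hints_manager` will be an incomplete type in the header that declares `sender`. A single-file C++ sketch of that pattern; the member and constructor of `end_point_hints_manager` and the body of `replay_allowed()` are hypothetical:

```cpp
#include <cassert>

class end_point_hints_manager;  // forward declaration: incomplete below

// Stand-in for the header that declares `sender`: it may hold a reference
// to the incomplete type, but cannot call any of its methods inline.
class sender {
    end_point_hints_manager& _ep_manager;
public:
    explicit sender(end_point_hints_manager& ep) : _ep_manager(ep) {}
    bool replay_allowed() const;  // declared only; defined out of line
};

// Stand-in for the source file, where the type is complete.
class end_point_hints_manager {
    bool _replay_allowed;  // hypothetical member
public:
    explicit end_point_hints_manager(bool allowed) : _replay_allowed(allowed) {}
    bool replay_allowed() const { return _replay_allowed; }
};

bool sender::replay_allowed() const {
    return _ep_manager.replay_allowed();  // OK: the type is complete here
}
```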
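Commits beb29f094b and 47ffc66c7f make tables start in a read-only state without a commitlog and become writable only through `mark_ready_for_writes`. The simplified C++ sketch below models that life cycle; the class layout and `make_table` are hypothetical, and `std::logic_error` stands in for Scylla's internal-error mechanism:

```cpp
#include <cassert>
#include <stdexcept>

struct commitlog {};  // hypothetical stand-in for the real commitlog type

// Simplified table: constructed read-only and without a commitlog.
class table {
    bool _readonly = true;
    commitlog* _commitlog = nullptr;

public:
    bool is_readonly() const { return _readonly; }

    // Asking for the commitlog of a read-only table is a bug.
    commitlog* get_commitlog() const {
        if (_readonly) {
            throw std::logic_error("commitlog accessed while the table is readonly");
        }
        return _commitlog;
    }

    // Select the proper commitlog and make the table writable.
    void mark_ready_for_writes(commitlog* cl) {
        assert(cl != nullptr);
        _commitlog = cl;
        _readonly = false;
    }
};

// Hypothetical helper mirroring add_column_family's readonly parameter:
// tables loaded from disk stay read-only until their sstables are loaded,
// while new tables are made writable before anyone else can see them.
table make_table(bool readonly, commitlog* cl) {
    table t;
    if (!readonly) {
        t.mark_ready_for_writes(cl);
    }
    return t;
}
```

`make_table(false, ...)` corresponds to the new-table path of `add_column_family`, where the table is already writable when other code can see it.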
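Commit b90011294d passes `sstables_manager` a function returning the host ID instead of the ID itself, because the ID is not loaded yet when the manager is constructed. A C++ sketch of that deferred lookup, with `sstables_manager_sketch` and a string `host_id` as hypothetical stand-ins:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

using host_id = std::string;  // hypothetical stand-in for locator::host_id

class sstables_manager_sketch {
    std::function<host_id()> _get_host_id;

public:
    explicit sstables_manager_sketch(std::function<host_id()> get_host_id)
        : _get_host_id(std::move(get_host_id)) {
        assert(_get_host_id);
    }

    // Resolved at call time, not at construction time, so the ID only has
    // to be loaded before the first use.
    host_id my_host_id() const { return _get_host_id(); }
};
```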


3334 Commits