It's mostly self-contained, and it's easier to
maintain reasonably sized files. Splitting also shows
the boundaries between the schema and the schema-merging
code more clearly.
Before this change, we relied on `using namespace seastar` to use
`seastar::format()` without qualifying `format()` with its
namespace. This worked fine until we changed the type of the
format-string parameter of `seastar::format()` from `const char*` to
`fmt::format_string<...>`. That change effectively admitted
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
whose members all accept a templated `fmt` parameter, so
`seastar::format()` is no longer the unambiguously better candidate.
Argument-dependent lookup (ADL for short) favors the function in the
same namespace as its arguments, but the `using namespace` directive
makes `seastar::format()` equally competitive, so both `std::format()`
and `seastar::format()` are considered as candidates.
That is what happens in scylladb at quite a few call sites of
`format()`, where name lookup cannot tell which function is the
winner:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
In this change, we replace every unqualified `format()` call with
either `fmt::format()` or `seastar::format()`, following these rules:
- if the caller expects an `sstring` or `std::string_view`, use
  `seastar::format()`;
- if the caller expects a `std::string`, use `fmt::format()`, because
  `sstring::operator std::basic_string` would incur a deep copy.
We will need another change to enable scylladb to compile with the
latest seastar, namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
To minimize the scope of this change, let's include that change when
bumping the seastar submodule, as it will depend on the seastar
change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In one of the following patches, we introduce support for zero-token
nodes. From that point, getting all nodes and getting all token
owners isn't equivalent. In this patch, we ensure that we consider
only token owners when we want to consider only token owners (for
example, in the replication logic), and we consider all nodes when
we want to consider all nodes (for example, in the topology logic).
The main purpose of this patch is to make the PR introducing
zero-token nodes easier to review. The patch that introduces
zero-token nodes is already complicated. We don't want trivial
changes from this patch to make noise there.
This patch introduces changes needed for zero-token nodes only in the
Raft-based topology and in the recovery mode. Zero-token nodes are
unsupported in the gossip-based topology outside recovery.
Some functions added to `token_metadata` and `topology` are
inefficient because they compute a new data structure in every call.
They are never called in the hot path, so it's not a serious problem.
Nevertheless, we should improve it somehow. Note that it's not
obvious how to do it because we don't want to make `token_metadata`
store topology-related data. Similarly, we don't want to make
`topology` store token-related data. We can think of an improvement
in a follow-up.
We don't remove the unused `topology::get_datacenter_rack_nodes` and
`topology::get_datacenter_nodes`. These functions can be useful in the
future. Also, `topology::_dc_nodes` is used internally in `topology`.
The hint contains information related to what exactly changed, allowing
listeners to do partial updates, instead of reloading all metadata on
each notification.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68

Closes scylladb/scylladb#20006
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
Well, even after 10 years, the C++ compilers still
do not compile Java...
Having that legacy code lying around
doesn't help anyone understand what's going on;
on the contrary, it's confusing and distracting.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Thrift support has been deprecated since ScyllaDB 5.2:
> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.
So let's drop it. In this change:
* Thrift protocol support is dropped.
* All references to Thrift support in the documentation are dropped.
* The "thrift_version" column in the system.local table is
  preserved for backward compatibility: we could load
  an existing system.local table which still contains
  this column, so we need to keep writing it as well.
* "/storage_service/rpc_server" is preserved only for
  backward compatibility with the java-based nodetool.
* The `rpc_port` and `start_rpc` options are preserved, but
  they are marked as "Unused", so that the new release
  of scylladb can consume existing scylla.yaml configurations
  which might contain these settings. By deprecating them,
  users get warned and can update their configurations
  before we actually remove the options in the next major release.
Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.
The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18898
Since ALTER KS requires creating a topology_change raft command, some
functions need to be extended to handle it. Raft commands are recognized
by their types, so some functions are simply parameterized by type,
i.e. made into templates.
These templates are explicitly instantiated, so that only one instance of
each template exists across the whole code base, to avoid compiling it
in every translation unit.
This change is inspired by clang-tidy, which warns:
```
[752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move]
891 | db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts);
| ^
/home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here
886 | auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts);
| ^
```
Here, `view` is an instance of `view_ptr`, a type with
shared-pointer semantics: it is backed by a member variable of type
`seastar::lw_shared_ptr<const schema>`, whose move constructor resets
the moved-from instance. So we are actually accessing the moved-from
pointer in
```c++
db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts)
```
So, in this change, instead of moving away from `view`, we create
a copy and pass the copy to
`db::schema_tables::make_create_view_mutations()`. This should be fine,
as the behavior of `db::schema_tables::make_create_view_mutations()`
does not depend on whether the `view` passed to it was moved from or not.
The change which introduced this use-after-move was 88a5ddabce.
Refs 88a5ddabce
Fixes #18837
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18838
The PER_TABLE_PARTITIONERS feature was added in 90df9a44ce (2020; 4.0)
and can now be assumed to be always present. We also remove the associated
schema_feature.
The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
The DIGEST_INSENSITIVE_TO_EXPIRY feature was added in 9de071d214 (2019; 3.2)
and can now be assumed to be always present. We enable the corresponding
schema_feature unconditionally.
We do not remove the corresponding schema feature, because it can be disabled
when the related TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is present.
The VIEW_VIRTUAL_COLUMNS feature was added in a108df09f9 (2019; 3.1)
and can now be assumed to be always present.
The corresponding schema_feature is removed. Note schema_features are not sent
over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.
Current code uses non-raft path to pull the schema, which violates
group0 linearizability because the node will have latest schema but
miss group0 updates of other system tables. In particular,
system.tablets. This manifests as repair errors due to missing
tablet_map for a given table when trying to access it. Tablet map is
always created together with the table in the same group0 command.
When a node is bootstrapping, repair calls sync_schema() to make
sure local schema is up to date. This races with group0 catch up,
and if sync_schema() wins, repair may fail on missing tablet map.
Fix by making sync_schema() do a group0 read barrier when in raft
mode.
Fixes #18002
Closes scylladb/scylladb#18175
Without the `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of the header file where `UUID` is defined. At this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but with {fmt} v10, we cannot.
So, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we no longer depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Commit 0c376043eb added access to the group0
semaphore, which can be done on shard0 only. Unlike all other group0 RPCs
(which are already always forwarded to shard0), migration_request is not,
since it is an RPC that was reused from the pre-raft days. The patch adds
the missing jump to shard0 before executing the RPC.
Group0 state machine access atomicity is guaranteed by a mutex in the
group0 client. Code that reads or writes the state needs to hold the lock.
To transfer the schema part of the snapshot we used the existing "migration
request" verb, which did not follow this rule. Fix the code to take the
group0 lock before accessing the schema in case the verb is called as part
of a group0 snapshot transfer.
Fixes scylladb/scylladb#16821
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.
In particular, this allows storage_service::on_change
to call update_peer_info once for all changed states.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.
Unfortunately, we have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called, so we must add the call here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.
Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.
The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.
Fixes #16396
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In effb9fb3cb migration request handler
(called when a node requests schema pull) was extended with a
`system.scylla_local` mutation:
```
cm.emplace_back(co_await self._sys_ks.local().get_group0_schema_version());
```
This mutation is empty if the GROUP0_SCHEMA_VERSIONING feature is
disabled.
Nevertheless, it turned out to cause problems during upgrades.
The following scenario shows the problem:
We upgrade from 5.2 to enterprise version with the aforementioned patch.
In 5.2, `system.scylla_local` does not use schema commitlog.
After the first node upgrades to the enterprise version, it immediately
on boot creates a new enterprise-only table
(`system_replicated_keys.encrypted_keys`) -- the specific table is not
important, only the fact that a schema change is performed.
This happens before the restarting node notices other nodes being UP, so
the schema change is not immediately pushed to the other nodes.
Instead, soon after boot, the other non-upgraded nodes pull the schema
from the upgraded node.
The upgraded node attaches a `system.scylla_local` mutation to the
vector of returned mutations.
The non-upgraded nodes try to apply this vector of mutations. Because
some of these mutations are for tables that already use schema
commitlog, while the `system.scylla_local` table does not use schema
commitlog, this triggers the following error (even though the mutation
is empty):
```
Cannot apply atomically across commitlog domains: system.scylla_local, system_schema.keyspaces
```
Fortunately, the fix is simple -- instead of attaching an empty
mutation, do not attach a mutation at all if the handler of migration
request notices that group0_schema_version is not present.
Note that group0_schema_version is only present if the
GROUP0_SCHEMA_VERSIONING feature is enabled, which happens only after
the whole upgrade finishes.
Refs: scylladb/scylladb#16414
Not using "Fixes" because the issue will only be fixed once this PR is
merged to `master` and the commit is cherry-picked onto next-enterprise.
Closes scylladb/scylladb#16416
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.
The refactoring is structured as follows:
* Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
* Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
* Go over all the places which read `token_metadata` and switch them to the new version.
* Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
This series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `client_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) and then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `client_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can first finish the upgrade to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on the coordinator host_id, so they can be started in the middle of an upgrade from any node.
Closes scylladb/scylladb#15903
* github.com:scylladb/scylladb:
topology: remove_endpoint: remove inet_address overload
token_metadata: topology: cleanup add_or_update_endpoint
token_metadata: add_replacing_endpoint: forbid replacing node with itself
topology: drop key_kind, host_id is now the primary key
dc_rack_fn: make it non-template
token_metadata: drop the template
shared_token_metadata: switch to the new token_metadata
gossiper: use new token_metadata
database: get_token_metadata -> new token_metadata
erm: switch to the new token_metadata
storage_service: get_token_metadata -> token_metadata2
storage_service: get_token_to_endpoint_map: use new token_metadata
api/token_metadata: switch to new version
storage_service::on_change: switch to new token_metadata
cdc: switch to token_metadata2
calculate_natural_endpoints: fix indentation
calculate_natural_endpoints: switch to token_metadata2
storage_service: get_changed_ranges_for_leaving: use new token_metadata
decommission_with_repair, removenode_with_repair -> new token_metadata
rebuild_with_repair, replace_with_repair: use new token_metadata
bootstrap: use new token_metadata
tablets: switch to token_metadata2
calculate_effective_replication_map: use new token_metadata
calculate_natural_endpoints: fix formatting
abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
network_topology_strategy_test: update new token_metadata
storage_service: on_alive: update new token_metadata
storage_service: handle_state_bootstrap: update new token_metadata
storage_service: snitch_reconfigured: update new token_metadata
storage_service: leave_ring: update new token_metadata
storage_service: node_ops_cmd_handler: update new token_metadata
storage_service: node_ops_cmd_handler: add coordinator_host_id
storage_service: bootstrap: update new token_metadata
storage_service: join_token_ring: update new token_metadata
storage_service: excise: update new token_metadata
storage_service: join_cluster: update new token_metadata
storage_service: on_remove: update new token_metadata
storage_service: handle_state_normal: fill new token_metadata
storage_service: topology_state_load: fill new token_metadata
storage_service: adjust update_topology_change_info to update new token_metadata
topology: set self host_id on the new topology
locator::topology: allow being_replaced and replacing nodes to have the same IP
token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
token_metadata: get_host_id: exception -> on_internal_error
token_metadata: add get_all_ips method
token_metadata: support host_id-based version
token_metadata: make it a template with NodeId=inet_address/host_id
  NodeId is used in all internal token_metadata data structures that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
locator: make dc_rack_fn a template
locator/topology: add key_kind parameter
token_metadata: topology_change_info: change field types to token_metadata_ptr
token_metadata: drop unused method get_endpoint_to_token_map_for_reading
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).
If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).
When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.
We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.
Fixes: #7620
Fixes: #13957
---
This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.
I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.
Closes scylladb/scylladb#16242
* github.com:scylladb/scylladb:
docs: describe group 0 schema versioning in raft docs
test: add test for group 0 schema versioning
feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
schema_tables: use schema version from group 0 if present
migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
system_keyspace: make `get/set_scylla_local_param` public
feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want all nodes to
use a single version for that table, committed by the group 0 command.
There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.
Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.
But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which begs the question
why were we sending it in the first place? but I digress):
```
// We must force recalculation of schema version after the merge, since the resulting
// schema may be a mix of the old and new schemas.
delete_schema_version(mutation);
```
the above function removes the `version` cell from the mutation.
So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).
The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.
In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
We extend schema mutations with an additional mutation to the
`system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.
As we will see in later commits, nodes will use this after applying
schema mutations. If the key is absent or has a tombstone, they'll
calculate the global schema digest on their own -- using the old way. If
the key is present, they'll take the schema version from there.
The Raft-mode schema version is equal to the group 0 state ID of this
schema command.
The tombstone is necessary for the case of performing a schema change in
RECOVERY mode. It will force a revert to the old digest-based way.
Note that extending schema mutations with a `system.scylla_local`
mutation is possible thanks to earlier commits which moved
`system.scylla_local` to schema commitlog, so all mutations in the
schema mutations vector still go to the same commitlog domain.
Also, since we introduce a replicated tombstone to
`system.scylla_local`, we need to set GC grace to nonzero. We set it to
`schema_gc_grace`, which makes sense given the use case.
Before this patch, trying to create a materialized view when tablets
are enabled for a keyspace results in a failure: "Tablet map not found
for table <uuid>", with uuid referring to the new view.
When a table schema is created, the handler on_before_create_column_family()
is called - and this function creates the tablet map for the new table.
The bug was that we forgot to do the same when creating a materialized
view - which also a bona-fide table.
In this patch we call on_before_create_column_family() also when
creating the materialized view. I decided *not* to create a new
callback (e.g., on_before_create_view()) and rather call the existing
on_before_create_column_family() callback - after all, a view is
a column family too.
This patch also includes a test for this issue, which fails to create
the view before this patch, and passes with the patch. The test is
in the test/topology_experimental_raft suite, which runs Scylla with
the tablets experimental feature, and will also allow me to create
tests that need multiple nodes. However, the first test added here
only needs a single node to reproduce the bug and validate its fix.
Fixes#16194.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16205
- remove some code that is obsolete in newer Scylla versions,
- fix some minor bugs. These bugs appear to be benign, there are no known issues caused by them, but fixing them is a good idea nevertheless,
- refactor some code for better maintainability.
Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new.
Closes scylladb/scylladb#16162
* github.com:scylladb/scylladb:
test/pylib: log_browsing: fix type hint
migration_manager: take `abort_source&` in get_schema_for_read/write
migration_manager: inline merge_schema_in_background
migration_manager: remove unused merge_schema_from overload
migration_manager: assume `canonical_mutation` support
migration_manager: add `std::move` to avoid a copy
schema_tables: refactor `scylla_tables(schema_features)`
schema_tables: pass `reload` flag when calling `merge_schema` cross-shard
system_keyspace: fix outdated comment
Support for `canonical_mutation`s was added way back in Scylla 3.2. A
lot of code in `migration_manager` is still checking whether the old
`frozen_mutations` are received or need to be sent.
We no longer need this code, since we don't support version skips during
upgrade (and certainly not upgrades like 3.2->5.4).
Leave the sanity checks in place, but otherwise delete the
`frozen_mutation` branches.
Instead of unconditionally reloading the schema when enabling any schema
feature, only create a listener if the feature was disabled in the
first place. This way we don't trigger a schema reload for each
schema feature on node restarts, where the node starts with
all these features already enabled.
This prevents unnecessary work on restarts.
Fixes: #16112
Closes scylladb/scylladb#16118
get_schema_for_write
Sometimes a view's registry entry can get deactivated inside the schema
registry; this happens due to deactivating and reactivating the registry
entry, which doesn't rebuild the base-table information in the view.
This error is later caught when trying to convert the schema into a
`global_schema_ptr`, however, the real bug here is that not all schemas
returned from `get_schema_for_write` are suitable for write because the
mv schemas can be incomplete.
This commit changes the aforementioned function in order to fix the bug.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
There was a case where the maybe-sync path of a materialized view could
fail to sync if the view version was old. This is because adding the
base information to the view is only relevant until the record is
synced. This triggers an internal error in the `global_schema_ptr`
constructor.
The conversion to global pointer in that case was solely for logging
purposes so instead, we pass the pieces of information needed for the
logging itself.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We're observing nodes getting stuck during bootstrap inside
`storage_service::wait_for_ring_to_settle()`, which periodically checks
`migration_manager::have_schema_agreement()` until it becomes `true`:
scylladb/scylladb#15393.
There is no obvious reason why that happens -- according to the nodes'
logs, their latest in-memory schema version is the same.
So either the gossiped schema version is for some reason different
(perhaps there is a race in publishing `application_state::SCHEMA`) or
missing entirely.
Alternatively, `wait_for_ring_to_settle` is leaving the
`have_schema_agreement` loop and getting stuck in
`update_topology_change_info` trying to acquire a lock.
Modify logging inside `have_schema_agreement` so details about missing
schema or version mismatch are logged on INFO level, and an INFO level
message is printed before we return `true`. To prevent logs from getting
spammed, rate-limit the periodic messages to once every 5 seconds. This
will still show the reason in our tests which allow the node to hang for
many minutes before timing out. Also these schema agreement checks are
done on relatively rare occasions such as bootstrap, so the additional
logs should not be harmful.
Furthermore, when publishing schema version to gossip, log it on INFO
level. This is happening at most once per schema change so it's a rare
message. If there's a race in publishing schema versions, this should
allow us to observe it.
Ref: scylladb/scylladb#15393
Closes scylladb/scylladb#16021
Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However, this feature only
takes effect if digest_insensitive_to_expiry is also enabled. This used
to be the case incidentally: by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever, and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure that similar bugs don't creep in
again in the future.
Fixes: #16004
Closes scylladb/scylladb#16013
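The ordering hazard fixed above can be illustrated with a toy model
(the types and names here are illustrative, not Scylla's real feature
API): when every schema-affecting feature registers the same handler, a
full schema reload, the order in which features get enabled can no
longer matter.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy schema manager: each registered feature triggers the same full
// reload when enabled, instead of a per-feature ad-hoc reaction.
struct schema_manager {
    int reloads = 0;
    std::vector<std::pair<std::string, std::function<void()>>> handlers;

    // In the real system this would recalculate the schema digest and
    // all table versions; here we just count invocations.
    void reload_schema() { ++reloads; }

    void register_schema_features(const std::vector<std::string>& features) {
        for (const auto& f : features) {
            // Every feature uses the full reload, uniformly.
            handlers.emplace_back(f, [this] { reload_schema(); });
        }
    }

    void enable(const std::string& feature) {
        for (auto& [name, cb] : handlers) {
            if (name == feature) cb();
        }
    }
};

// Enable the two features from the commit message in the "wrong" order:
// a full reload still runs for each, so versions end up correct.
inline int reloads_after_enabling_both() {
    schema_manager m;
    m.register_schema_features({"digest_insensitive_to_expiry",
                                "table_digest_insensitive_to_expiry"});
    m.enable("table_digest_insensitive_to_expiry");
    m.enable("digest_insensitive_to_expiry");
    return m.reloads;
}
```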
There are some schema modifications performed automatically (during
bootstrap, upgrade etc.) by Scylla that are announced by multiple calls
to `migration_manager::announce` even though they are logically one
change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).
All these places contain a FIXME telling us to `announce` only once.
There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive --
taking a `read_barrier` is necessary, and that requires contacting a
leader, which then must contact a quorum,
- to support concurrent bootstrap in Raft-based topology, we must
implement a retry mechanism for every automatic `announce` in case
`group0_concurrent_modification` occurs. Implementing the retries
before fixing the FIXMEs mentioned above would be harder, and fixing
the FIXMEs after implementing the retries would be harder too.
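The retry shape described in the last bullet might look roughly like
this sketch. The exception type and helper names are stand-ins, not
Scylla's actual API:

```cpp
#include <stdexcept>

// Hypothetical stand-in for the exception the text mentions.
struct group0_concurrent_modification : std::runtime_error {
    group0_concurrent_modification()
        : std::runtime_error("concurrent group 0 modification") {}
};

// Retry a whole logical schema change when another node raced us,
// rather than failing bootstrap. `Announce` is any callable performing
// one announce (in real code it would rebuild its mutations from the
// re-read group 0 state before each attempt).
template <typename Announce>
void announce_with_retries(Announce announce, int max_retries = 10) {
    for (int attempt = 0;; ++attempt) {
        try {
            announce();
            return;
        } catch (const group0_concurrent_modification&) {
            if (attempt == max_retries) {
                throw; // give up after too many races
            }
        }
    }
}

// Drive the helper with an announce that fails N times then succeeds;
// returns the number of attempts actually made.
inline int run_flaky_announce(int failures_before_success) {
    int calls = 0;
    announce_with_retries([&] {
        ++calls;
        if (calls <= failures_before_success) {
            throw group0_concurrent_modification{};
        }
    });
    return calls;
}
```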
This PR fixes the first two FIXMEs and improves the situation with the
last one by reducing the number of `announce` calls to two.
Unfortunately, reducing this number to one requires a big refactor. We
can do it as a follow-up to a new, more specific issue. Also, we leave a
new FIXME.
Fixing the first two FIXMEs required enabling the announcement of a
keyspace together with its tables. Until now, the code responsible for
preparing mutations for a new table could assume the existence of the
keyspace. This assumption wasn't necessary, but removing it required
some refactoring.
Fixes scylladb/scylladb#15437
Closes scylladb/scylladb#15897
* github.com:scylladb/scylladb:
table_helper: announce twice in setup_keyspace
table_helper: refactor setup_table
redis: create_keyspace_if_not_exists_impl: fix indentation
redis: announce once in create_keyspace_if_not_exists_impl
db: system_distributed_keyspace: fix indentation
db: system_distributed_keyspace: announce once in start
tablet_allocator: update on_before_create_column_family
migration_listener: add parameter to on_before_create_column_family
alternator: executor: use new prepare_new_column_family_announcement
alternator: executor: introduce create_keyspace_metadata
migration_manager: add new prepare_new_column_family_announcement
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
In the following commits, we reduce the number of the
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
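A minimal sketch of that refactor, with hypothetical names and
mutations modeled as plain strings: the caller passes keyspace_metadata
down instead of having the function load it from the database, so the
keyspace and its tables can be prepared into one batch and announced
once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the real keyspace metadata object.
struct keyspace_metadata { std::string name; };

inline void prepare_new_keyspace_announcement(
        std::vector<std::string>& muts, const keyspace_metadata& ks) {
    muts.push_back("create keyspace " + ks.name);
}

// Takes the metadata as a parameter: no database lookup, so it works
// even when the keyspace is created in the same batch.
inline void prepare_new_column_family_announcement(
        std::vector<std::string>& muts, const keyspace_metadata& ks,
        const std::string& table) {
    muts.push_back("create table " + ks.name + "." + table);
}

// Build one batch carrying the keyspace and two tables; in the real
// code this whole vector would go through a single announce() call,
// i.e. a single read_barrier.
inline std::size_t announce_keyspace_with_tables() {
    keyspace_metadata ks{"system_traces"};
    std::vector<std::string> muts;
    prepare_new_keyspace_announcement(muts, ks);
    prepare_new_column_family_announcement(muts, ks, "events");
    prepare_new_column_family_announcement(muts, ks, "sessions");
    return muts.size();
}
```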
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check: if users try
something crazy, they will get an error instead of silent data
corruption.
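The resulting sanity check can be modeled in a few lines. This is a toy
sketch with hypothetical types, not the real schema code: if the index
view's token column is not a computed column, fail loudly instead of
rewriting the schema in place.

```cpp
#include <stdexcept>

// Toy stand-in for a column of a secondary-index materialized view.
struct view_column { bool is_computed; };

// The legacy-schema fixer becomes a check: a pre-computed-columns
// schema is rejected with an error rather than silently migrated.
inline void check_index_view_token_column(const view_column& token_col) {
    if (!token_col.is_computed) {
        throw std::runtime_error(
            "legacy pre-computed-columns index schema detected; "
            "upgrade through a supported release first");
    }
}

// Returns true iff a legacy (non-computed) token column is rejected.
inline bool legacy_schema_rejected() {
    try {
        check_index_view_token_column(view_column{false});
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}

// Returns true iff a modern (computed) token column passes the check.
inline bool computed_schema_accepted() {
    try {
        check_index_view_token_column(view_column{true});
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```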
The feature is now assumed to be enabled; it was introduced in 2019.
It's still advertised in gossip, but it's assumed to always be present.
The `schema_feature` enum class still contains `COMPUTED_COLUMNS`,
and the `all_tables` function in schema_tables.cc still checks for the
schema feature when deciding if `computed_columns()` table should be
included. This is necessary because digest calculation tests contain
many digests calculated with the feature disabled; if we wanted to make
it unconditional in the schema_tables code, we'd have to regenerate
almost all digests in the tests. It is simpler to leave the possibility
for the tests to disable the feature.