scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 12:06:44 +00:00

Author	SHA1	Message	Date
Kamil Braun	93be4c0cb0	Merge 'Base node liveliness consistently on gossiper::is_alive' from Benny Halevy Currently the gossiper marks endpoint_state objects as alive/dead. In some cases the endpoint_state::is_alive function is checked but in many other cases gossiper::is_alive(endpoint) is used to determine if the endpoint is alive. This series removes the endpoint_state::is_alive state and moves all the logic to gossiper::is_alive that bases its decision on the endpoint having an endpoint_state and being in the _live_endpoints set. For that, the _live_endpoints is made sure to be replicated to all shards when changed and the endpoint_state changes are serialized under lock_endpoint, and also making sure that the endpoint_state in the _endpoint_states_map is never updated in place, but rather a temporary copy is changed and then safely replicated using gossiper::replicate. Refs https://github.com/scylladb/scylladb/issues/14794 Closes #14801 * github.com:scylladb/scylladb: gossiper: mark_alive: remove local_state param endpoint_state: get rid of _is_alive member and methods gossiper: is_alive: use _live_endpoints gossiper: evict_from_membership: erase endpoint from _live_endpoints gossiper: replicate_live_endpoints_on_change: use _live_endpoints_version to detect change gossiper: run: no need to replicate live_endpoints gossiper: fold update_live_endpoints_version into replicate_live_endpoints_on_change gossiper: add mutate_live_and_unreachable_endpoints gossiper: reset_endpoint_state_map: clear also shadow endpoint sets gossiper: reset_endpoint_state_map: clear live/unreachable endpoints on all shards gossiper: functions that change _live_endpoints must be called on shard 0 gossiper: add lock_endpoint_update_semaphore gossiper: make _live_endpoints an unordered_set endpoint_state: use gossiper::is_alive externally	2023-08-23 17:18:05 +02:00
Patryk Jędrzejczak	ef2eac9941	raft topology: make every type in request_param a named struct We make every alternative type in the request_param variant a named struct to make the code more readable. Additionally, this change will make extending request parameters easier if we decide to do so in the future. Closes #15132	2023-08-23 16:56:00 +02:00
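The idea behind ef2eac9941 can be sketched in plain C++ as follows. This is a minimal illustration, not the ScyllaDB definition: the alternative names `join_param` and `rebuild_param`, their fields, the `node_id` alias, and the `describe` helper are all assumptions made for this example. Only `removenode_param` is a name that appears in the log above, and its field here is hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <variant>

// Hypothetical stand-in for a node identifier (the real code uses raft ids).
using node_id = std::string;

// Every alternative is a named struct, so each call site states which
// request it is building instead of passing a bare integer or string.
struct join_param { unsigned num_tokens; };               // illustrative field
struct rebuild_param { std::string source_dc; };          // illustrative field
struct removenode_param { std::unordered_set<node_id> ignored_ids; };

using request_param = std::variant<join_param, rebuild_param, removenode_param>;

// Small helper that merges lambdas into one visitor for std::visit.
template <typename... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Dispatch on the named alternative; adding a field to one struct later
// does not change the variant's list of types.
std::string describe(const request_param& p) {
    return std::visit(overloaded{
        [](const join_param& j) { return "join with " + std::to_string(j.num_tokens) + " tokens"; },
        [](const rebuild_param& r) { return "rebuild from " + r.source_dc; },
        [](const removenode_param& r) {
            return "removenode ignoring " + std::to_string(r.ignored_ids.size()) + " nodes";
        },
    }, p);
}
```

With named structs, two alternatives that happen to carry the same underlying type remain distinct variant members, which is what makes later extension straightforward.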
Kamil Braun	169d19e5b0	Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak We add support for `--ignore-dead-nodes` in `raft_removenode` and `--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow passing only host ids of the ignored nodes. Supporting IPs is currently impossible because `raft_address_map` doesn't provide a mapping from IP to a host id. The main steps of the implementation are as follows: - add the `ignore_nodes` column to `system.topology`, - set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`, - extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes, - load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`, - add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`, - pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`. Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes. Fixes #15025 Closes #15113 * github.com:scylladb/scylladb: test: add test_raft_ignore_nodes test: ManagerClient.remove_node: allow List[HostId] for ignore_dead raft topology: pass ignore_nodes to {replace, remove}_with_repair raft topology: exec_global_command: add ignore_nodes to exclude_nodes raft topology: exec_global_command: change type of exclude_nodes topology_state_machine: extend request_param with a set of raft ids raft topology: set ignore_nodes in raft_removenode and raft_replace utils: introduce split_comma_separated_list raft topology: add the ignore_nodes column to system.topology	2023-08-22 18:04:59 +02:00
Kamil Braun	cdc3cd2b79	Merge 'raft: add fencing tests' from Petr Gusev In this PR a simple test for fencing is added. It exercises the data plane, meaning if it somehow happens that the node has a stale topology version, then requests from this node will get an error 'stale topology'. The test just decrements the node version manually through CQL, so it's quite artificial. To test a more real-world scenario we need to allow the topology change fiber to sometimes skip unavailable nodes. Now the algorithm fails and retries indefinitely in this case. The PR also adds some logs, and removes one seemingly redundant topology version increment, see the commit messages for details. Closes #14901 * github.com:scylladb/scylladb: test_fencing: add test_fence_hints test.py: output the skipped tests test.py: add skip_mode decorator and fixture test.py: add mode fixture hints: add debug log for dropped hints hints: send_one_hint: extend the scope of file_send_gate holder pylib: add ScyllaMetrics hints manager: add send_errors counter token_metadata: add debug logs fencing: add simple data plane test random_tables.py: add counter column type raft topology: don't increment version when transitioning to node_state::normal	2023-08-22 16:28:21 +02:00
Patryk Jędrzejczak	1f57d80ba1	topology_state_machine: extend request_param with a set of raft ids We add two new alternative types to service::request_param: removenode_param and replace_param. They allow storing the list of ignored nodes loaded from the ignore_nodes column of system.topology. We also remove the raft::server_id type because it has been only used by the replace operation.	2023-08-22 14:17:37 +02:00
Petr Gusev	439c91851f	hints: add debug log for dropped hints Dropping data is a rather important event, so let's log it at least at the debug level. It'll help in debugging tests.	2023-08-22 15:48:40 +04:00
Petr Gusev	9fd3df13a2	hints: send_one_hint: extend the scope of file_send_gate holder The problem was that the holder in with_gate call was released too early. This happened before the possible call to on_hint_send_failure in then_wrapped. As a result, the effects of on_hint_send_failure (segment_replay_failed flag) were not visible in send_one_file after ctx_ptr->file_send_gate.close(), so we could decide that the segment was sent in full and delete it even if sending of some hints led to errors. Fixes #15110	2023-08-22 15:48:40 +04:00
Petr Gusev	1b7603af23	hints manager: add send_errors counter There was no indication of problems in the hints manager metrics before. We need this counter for fencing tests in the later commit, but it seems to be useful on its own.	2023-08-22 14:31:04 +04:00
Patryk Jędrzejczak	0beabdc6ba	utils: introduce split_comma_separated_list Three places handle comma-separated lists similarly: - ss::remove_node.set(...) in api::set_storage_service, - storage_service::parse_node_list, - storage_service::is_repair_based_node_ops_enabled. In the next commit, the fourth place that needs the same logic appears -- storage_service::raft_replace. It needs to load and parse the --ignore-dead-nodes-for-replace param from config. Moreover, the code in is_repair_based_node_ops_enabled is different and doesn't seem right. We swap '\"' and '\'' with ' ' but don't do anything with it afterward. To avoid code duplication and fix is_repair_based_node_ops_enabled, we introduce the new function utils::split_comma_separated_list. This change has a small side effect on logging. For example, ignore_nodes_strs in storage_service::parse_node_list might be printed in a slightly different form.	2023-08-22 10:30:36 +02:00
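A helper of the kind 0beabdc6ba describes might look like the sketch below. It is not the signature or behavior of `utils::split_comma_separated_list`; the function name `split_comma_separated`, the whitespace trimming, and the decision to skip empty items are assumptions chosen for illustration.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Illustrative only: splits on ',', trims spaces and tabs around each item,
// and drops items that are empty after trimming (an assumed policy).
std::vector<std::string> split_comma_separated(std::string_view input) {
    std::vector<std::string> out;
    while (true) {
        const auto comma = input.find(',');
        // substr(0, npos) yields the whole remaining input for the last item.
        std::string_view item = input.substr(0, comma);
        const auto first = item.find_first_not_of(" \t");
        if (first != std::string_view::npos) {
            const auto last = item.find_last_not_of(" \t");
            out.emplace_back(item.substr(first, last - first + 1));
        }
        if (comma == std::string_view::npos) {
            break;
        }
        input.remove_prefix(comma + 1);
    }
    return out;
}
```

Keeping the parsing in one function means every caller gets the same treatment of stray spaces and empty entries, which is the duplication the commit message sets out to remove.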
Patryk Jędrzejczak	16f5db8af2	raft topology: add the ignore_nodes column to system.topology In the following commits, we add support for --ignore-dead-nodes in raft_removenode and --ignore-dead-nodes-for-replace in raft_replace. To make these request parameters accessible for the topology coordinator, we store them in the new ignore_nodes column of system.topology.	2023-08-22 10:30:12 +02:00
Benny Halevy	97061cc3b8	endpoint_state: use gossiper::is_alive externally Before we remove endpoint_state::_is_alive to rely solely on gossiper::_live_endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-22 09:06:09 +03:00
Avi Kivity	ce43effc21	Merge "fix rebuild with consistent topology management" From Gleb Natapov " The series fixes a bogus assertion during topology state load and adds a test that runs rebuild to make sure the code will not regress again. Fixes #14958 " * 'gleb/rebuilding_fix_v1' of github.com:scylladb/scylla-dev: test: add rebuild test system_keyspace: fix assertion for missing transition_state	2023-08-21 16:00:42 +03:00
Pavel Emelyanov	6bc30f1944	system_keyspace: De-bloat .setup() from messing with system.local On boot several manipulations with system.local are performed. 1. The host_id value is selected from it with key = local If not found, system_keyspace generates a new host_id, inserts the new value into the table and returns back 2. The cluster_name is selected from it with key = local Then it's system_keyspace that either checks that the name matches the one from db::config, or inserts the db::config value into the table 3. The row with key = local is updated with various info like versions, listen, rpc and bcast addresses, dc, rack, etc. Unconditionally All three steps are scattered over main, p.1 is called directly, p.2 and p.3 are executed via system_keyspace::setup() that happens rather late. Also there's some touch of this table from the cql_test_env startup code. The proposal is to collect this setup into one place and execute it early -- as soon as the system.local table is populated. This frees the system_keyspace code from the logic of selecting host id and cluster name leaving it to main and keeps it with only select/insert work. refs: #2795 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #15082	2023-08-20 21:24:31 +03:00
Kefu Chai	12d6ec5a18	config: respect --log-with-color 1 scylladb overrides some of seastar logging related options with its own options by applying them with `logging::apply_settings()`. but we fail to inherit `with_color` from Seastar as we are using the designated initializer, so the unspecified members are zero initialized. that's why we always have logging message in black and white even if scylla is running in a tty and `--log-with-color 1` is specified. so, make the debugging life more colorful, let's inherit the option from Seastar, and apply it when setting logging related options. see also `29e09a3292` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15076	2023-08-20 13:47:43 +03:00
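The pitfall described in 12d6ec5a18 can be reproduced with a small stand-alone struct. The type `log_settings`, its fields, and both factory functions are hypothetical and only mirror the shape of the problem; they are not Seastar's or ScyllaDB's actual types. The sketch relies on C++20 designated initializers.

```cpp
// Hypothetical settings aggregate with no default member initializers.
struct log_settings {
    bool stdout_enabled;
    bool syslog_enabled;
    bool with_color;
};

// Members omitted from a designated initializer list are value-initialized,
// so with_color silently becomes false here regardless of the upstream value.
log_settings make_settings_dropping_color() {
    return log_settings{ .stdout_enabled = true, .syslog_enabled = false };
}

// The fix pattern: carry the upstream value over explicitly.
log_settings make_settings_inheriting_color(const log_settings& upstream) {
    return log_settings{
        .stdout_enabled = true,
        .syslog_enabled = false,
        .with_color = upstream.with_color,
    };
}
```

Because the omission compiles without a diagnostic, the only reliable guard is to name every member that must be inherited.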
Tomasz Grabiec	bd8bb5d4b1	Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho Compaction group is the data plane for tablets, so this integration allows each tablet to have its own storage (memtable + sstables). A crucial step for dynamic tablets, where each tablet can be worked on independently. There are still some inefficiencies to be worked on, but as it is, it already unlocks further development. ``` INFO 2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata INFO 2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf ``` Closes #14863 * github.com:scylladb/scylladb: Kill scylla option to configure number of compaction groups replica: Wire tablet into compaction group token_metadata: Add this_host_id to topology config replica: Switch to chunked_vector for storing compaction groups replica: Generate group_id for compaction_group on demand	2023-08-18 15:17:17 +02:00
Avi Kivity	1901475598	Merge 'config: mark "experimental" option unused and cleanups' from Kefu Chai in this series, the "experimental" option is marked `Unused` as it has been marked deprecated for almost 2 years since scylla 4.6. and use `experimental_features` to specify the used experimental features explicitly. Closes #14948 * github.com:scylladb/scylladb: config: remove unused namespace alias config: use std::ranges when appropriate config: drop "experimental" option test: disable 'enable_user_defined_functions' if experimental_features does not include udf test: pylib: specify experimental_features explicitly	2023-08-17 20:42:02 +03:00
Kefu Chai	6788903fd6	db: config: mark config class final in `34c3688017`, we added a virtual function to `config_file`, and we new and delete pointer pointing to a `db::config` instance with `unique_ptr<>`. this makes the compiler nervous, as deleting a pointer pointing to an instance of non-final class with virtual function could lead to leak, if this pointer actually points to a derived class of this non-final class. so, in order to silence the warning and to prevent potential problem in future, let's mark `db::config` final. the warning from Clang 16 looks like: ``` In file included from /home/kefu/dev/scylladb/test/lib/test_services.cc:10: In file included from /home/kefu/dev/scylladb/test/lib/test_services.hh:25: In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/memory:78: /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: error: delete called on non-final 'db::config' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<db::config>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/test/lib/test_services.cc:189:16: note: in instantiation of member function 'std::unique_ptr<db::config>::~unique_ptr' requested here auto cfg = std::make_unique<db::config>(); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15071	2023-08-17 13:43:16 +03:00
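A reduced version of the situation in 6788903fd6, under the assumption that the base class has a virtual function but a non-virtual destructor. The names `config_base`, `config`, and `make_and_query` are placeholders for this sketch, not the real `config_file` or `db::config` definitions.

```cpp
#include <memory>

// Base with a virtual function but a non-virtual destructor.
struct config_base {
    virtual int version() const { return 1; }
};

// 'final' guarantees that no further-derived object can hide behind a
// config*, so deleting through std::unique_ptr<config> always runs the
// right destructor and the compiler has nothing to warn about.
struct config final : config_base {
    int version() const override { return 2; }
};

int make_and_query() {
    auto cfg = std::make_unique<config>();  // destroyed as config, never as a subclass
    return cfg->version();
}
```

Removing `final` from `config` is what would bring back the delete-non-virtual-dtor class of warnings quoted in the commit message; giving the base a virtual destructor would be the alternative fix.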
Raphael S. Carvalho	b578d6643f	Kill scylla option to configure number of compaction groups The option was introduced to bootstrap the project. It's still useful for testing, but that translates into maintaining an additional option and code that will not be really used outside of testing. A possible option is to later map the option in boost tests to initial_tablets, which may yield the same effect for testing. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-16 18:23:53 -03:00
Piotr Smaroń	34c3688017	db: config: add live_updatable_config_params_changeable_via_cql option If `live_updatable_config_params_changeable_via_cql` is set to true, configuration parameters defined with `liveness::LiveUpdate` option can be updated in the runtime with CQL, i.e. by updating `system.config` virtual table. If we don't want any configuration parameter to be changed in the runtime by updating `system.config` virtual table, this option should be set to false. This option should be set to false for e.g. cloud users, who can only perform CQL queries, and should not be able to change scylla's configuration on the fly. Current implementation is generic, but has a small drawback - messages returned to the user can be not fully accurate, consider: ``` cqlsh> UPDATE system.config SET value='2' WHERE name='task_ttl_in_seconds'; WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="option is not live-updateable" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} ``` where `task_ttl_in_seconds` has been defined with `liveness::LiveUpdate`, but because `live_updatable_config_params_changeable_via_cql` is set to `false` in `scylla.yaml`, `task_ttl_in_seconds` cannot be modified in the runtime by updating `system.config` virtual table. Fixes #14355 Closes #14382	2023-08-16 17:56:27 +03:00
Benny Halevy	8fbcf1ab9f	view: start: ignore also abort_requested_exception We see the abort_requested_exception error from time to time, instead of sleep_aborted that was expected and quietly ignored (in debug log level). Treat abort_requested_exception the same way since the error is expected on shutdown and to reduce test flakiness, as seen for example, in https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3033/artifact/logs-full.release.010/1691896356104_repair_additional_test.py%3A%3ATestRepairAdditional%3A%3Atest_repair_schema/node2.log ``` INFO 2023-08-13 03:12:29,151 [shard 0] compaction_manager - Asked to stop WARN 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={}: seastar::sleep_aborted (Sleep is aborted) INFO 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Finished main loop WARN 2023-08-13 03:12:29,152 [shard 0] cdc - Aborted update CDC description table with generation (2023/08/13 03:12:17, d74aad4b-6d30-4f22-947b-282a6e7c9892) INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Asked to stop INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Stopped INFO 2023-08-13 03:12:29,153 [shard 0] init - Signal received; shutting down INFO 2023-08-13 03:12:29,153 [shard 0] init - Shutting down view builder ops INFO 2023-08-13 03:12:29,153 [shard 0] view - Draining view builder INFO 2023-08-13 03:12:29,153 [shard 1] view - Draining view builder INFO 2023-08-13 03:12:29,153 [shard 0] compaction_manager - Stopped ERROR 2023-08-13 03:12:29,153 [shard 0] view - start failed: seastar::abort_requested_exception (abort requested) ERROR 2023-08-13 03:12:29,153 [shard 1] view - start failed: seastar::abort_requested_exception (abort requested) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15029	2023-08-13 18:39:09 +03:00
Gleb Natapov	53120c1d57	system_keyspace: fix assertion for missing transition_state The code assumes that if there is no transition_state there should be no nodes that are currently in transition in a state other than the left_token_ring state, but the rebuild operation also creates such nodes, so add the check for it as well.	2023-08-10 16:37:56 +03:00
Kamil Braun	59c410fb97	Merge 'migration_manager: announce: provide descriptions for all calls' from Patryk Jędrzejczak The `system.group0_history` table provides useful descriptions for each command committed to Raft group 0. One way of applying a command to group 0 is by calling `migration_manager::announce`. This function has the `description` parameter set to empty string by default. Some calls to `announce` use this default value which causes `null` values in `system.group0_history`. We want `system.group0_history` to have an actual description for every command, so we change all default descriptions to reasonable ones. Going further, we remove the default value for the `description` parameter of `migration_manager::announce` to avoid using it in the future. Thanks to this, all commands in `system.group0_history` will have a non-null description. Fixes #13370 Closes #14979 * github.com:scylladb/scylladb: migration_manager: announce: remove the default value of description test: always pass empty description to migration_manager::announce migration_manager: announce: provide descriptions for all calls	2023-08-09 16:58:41 +02:00
Kefu Chai	153a808f52	config: remove unused namespace alias bpo is not used after it is defined, so drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 10:17:34 +08:00
Kefu Chai	6355270120	config: use std::ranges when appropriate use std::ranges functions for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 10:17:34 +08:00
Kefu Chai	64bc8d2f7d	config: drop "experimental" option "experimental" was marked deprecated in `8b917f7c`. this change was included since Scylla 4.6. now that 5.3 has been branched, this change will be included in 5.4. this should be long enough for the user's turn around if this option is ever used. the dtests using this option have been audited and updated accordingly. and the unit test for this option is removed as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 10:17:34 +08:00
Pavel Emelyanov	f1515c610e	code: Remove query-context.hh The whole thing is unused now, so the header is no longer needed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:11:07 +03:00
Pavel Emelyanov	413d81ac16	code: Remove qctx Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:10:56 +03:00
Pavel Emelyanov	d7f5d6dba8	system_keyspace: Use system_keyspace's container() to flush In force_blocking_flush() there's an invoke-on-all invocation of replica::database::flush() and a FIXME to get the replica database from somewhere else rather than via query-processor -> data_dictionary. Since now the force_blocking_flush() is non-static the invoke-on-all can happen via system_keyspace's container and the database can be obtained directly from the sys.ks. local instance Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:09:32 +03:00
Pavel Emelyanov	7a342ed5c0	system_keyspace: Make force_blocking_flush() non-static Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:09:20 +03:00
Pavel Emelyanov	6b8fe5ac43	system_keyspace: Coroutinize update_tokens() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:09:15 +03:00
Pavel Emelyanov	1700d79b60	system_keyspace: Coroutinize save_truncation_record() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-08 11:09:09 +03:00
Patryk Jędrzejczak	27ddf78171	migration_manager: announce: provide descriptions for all calls The system.group0_history table provides useful descriptions for each command committed to Raft group 0. One way of applying a command to group 0 is by calling migration_manager::announce. This function has the description parameter set to empty string by default. Some calls to announce use this default value which causes null values in system.group0_history. We want system.group0_history to have an actual description for every command, so we change all default descriptions to reasonable ones. We can't provide a reasonable description to announce in query_processor::execute_thrift_schema_command because this function is called in multiple situations. To solve this issue, we add the description parameter to this function and to handler::execute_schema_command that calls it.	2023-08-07 14:38:11 +02:00
Avi Kivity	6c1e44e237	Merge 'Make replica::database and cql3::query_processor share wasm manager' from Pavel Emelyanov This makes it possible to remove remaining users of the global qctx. The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build wasm context for the merged function or to drop it from cache in the opposite case. To get the wasm stuff, this code uses global qctx -> query_processor -> wasm chain. However, the functions (un)merging code already has the database reference at hand, and it's natural to get wasm stuff from it, not from the q.p. which is not available. So this PR packs the wasm engine, runner and cache on sharded<wasm::manager> instance, makes the manager be referenced by both q.p. and database and removes the qctx from schema tables code Closes #14933 * github.com:scylladb/scylladb: schema_tables: Stop using qctx database: Add wasm::manager& dependency main, cql_test_env, wasm: Start wasm::manager earlier wasm: Shuffle context::context() wasm: Add manager::remove() wasm: Add manager::precompile() wasm: Move stop() out of query_processor wasm: Make wasm sharded<manager> query_processor: Wrap wasm stuff in a struct	2023-08-06 17:00:28 +03:00
Tomasz Grabiec	f26e65d4d4	tablets: Fix crash on table drop Before the patch, tablet metadata update was processed on local schema merge before table changes. When table is dropped, this means that for a while table will exist without a corresponding tablet map. This can cause memtable flush for this table to fail, resulting in intentional abort(). That's because sstable writing attempts to access tablet map to generate sharding metadata. If auto_snapshot is enabled, this is much more likely to happen, because we flush memtables on table drop. To fix the problem, process tablet metadata after dropping tables, but before creating tables. Fixes #14943 Closes #14954	2023-08-06 16:45:43 +03:00
Pavel Emelyanov	fd50ba839c	schema_tables: Stop using qctx There are two places in there that need qctx to get query_processor from to, in turn, get wasm::manager from. Fortunately, both places have the database reference at hand and can get the wasm::manager from it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Pavel Emelyanov	595c5abbf9	wasm: Shuffle context::context() Add a constructor that builds context out of const manager reference. The existing one needs to get engine and instance cache and does it via query_processor. This change allows removing those exports and, finally, dropping the wasm::manager -> cql3::query_processor friendship Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Pavel Emelyanov	56404ee053	wasm: Add manager::remove() This is one of the users of query_processor's export of wasm::manager's instance cache. Remove it in advance Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Pavel Emelyanov	93cb73fddb	wasm: Add manager::precompile() This is not to make query_processor export alien runner from the wasm::manager Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-04 19:47:50 +03:00
Kamil Braun	421a5ad55c	Merge 'feature_service: don't load whole topology state to check features' from Piotr Dulikowski Currently, feature service uses `system_keyspace::load_topology_state` to load information about features from the `system.topology` table. This function implicitly assumes that it is called after schema commitlog replay and will correspond to the state of the topology state machine after some command is applied. However, feature check happens before the commitlog replay. If some group 0 command consists of multiple mutations that are not applied atomically, the `load_topology_state` function may fail to construct a `service::topology` object based on the table state. Moreover, this function not only checks `system.topology` but also `system.cdc_generations_v3` - in the case of the issue, the entry that was loaded from this table didn't contain the `num_ranges` parameter. In order to fix this, the feature check code now uses `load_topology_features_state` which only loads enabled and supported features from `system.topology`. Only this information is really necessary for the feature check, and it doesn't have any invariants to check. Fixes: #14944 Closes #14955 * github.com:scylladb/scylladb: feature_service: don't load whole topology state to check features system_keyspace: separate loading topology_features from topology topology_state_machine: extract features-related fields to a struct untyped_result_set: add missing_column_exception	2023-08-04 15:09:12 +02:00
Piotr Dulikowski	8f491457ae	system_keyspace: separate loading topology_features from topology Now, it is possible to load topology_features separately from the topology struct. It will be used in the code that checks enabled features on startup.	2023-08-04 12:32:04 +02:00
Piotr Dulikowski	f1704eeee6	topology_state_machine: extract features-related fields to a struct `enabled_features` and `supported_features` are now moved to a new `topology::features` struct. This will allow loading this information independently from the `topology` struct, which will be needed for feature checking during start.	2023-08-04 12:21:51 +02:00
Amnon Heiman	d10a3dd19a	config: add enable_node_table_metrics flag By default, per-table-per-shard metrics reporting is turned off, and the aggregated version of the metrics (per-table-per-node) will be turned on. There could be a situation where a user with an excessive number of tables would suffer from performance issues, both from the network and the metrics collection server. This patch adds a config option, enable_node_table_metrics, which allows users to turn off per-table metrics reporting altogether. For example, when running Scylla with the command line argument '--enable-node-aggregated-table_metrics 0' per-table metrics will not be reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2023-08-02 10:20:18 +03:00
Kamil Braun	84bb75ea0a	Merge 'service: migration_manager: change the prepare_ methods to functions' from Patryk Jędrzejczak The `migration_manager` service is responsible for schema convergence in the cluster - pushing schema changes to other nodes and pulling schema when a version mismatch is observed. However, there is also a part of `migration_manager` that doesn't really belong there - creating mutations for schema updates. These are the functions with `prepare_` prefix. They don't modify any state and don't exchange any messages. They only need to read the local database. We take these functions out of `migration_manager` and make them separate functions to reduce the dependency of other modules (especially `query_processor` and CQL statements) on `migration_manager`. Since all of these functions only need access to `storage_proxy` (or even only `replica::database`), doing such a refactor is not complicated. We just have to add one parameter, either `storage_proxy` or `database` and both of them are easily accessible in the places where these functions are called. This refactor makes `migration_manager` unneeded in a few functions: - `alternator::executor::create_keyspace`, - `cql3::statements::alter_type_statement::prepare_announcement_mutations`, - `cql3::statements::schema_altering_statement::prepare_schema_mutations`, - `cql3::query_processor::execute_thrift_schema_command:`, - `thrift::handler::execute_schema_command`. We remove the `migration_manager&` parameter from all these functions. Fixes #14339 Closes #14875 * github.com:scylladb/scylladb: cql3: query_processor::execute_thrift_schema_command: remove an unused parameter cql3: schema_altering_statement::prepare_schema_mutations: remove an unused parameter cql3: alter_type_statement::prepare_announcement_mutations: change parameters alternator: executor::create_keyspace: remove an unused parameter service: migration_manager: change the prepare_ methods to functions	2023-08-01 11:56:56 +02:00
Avi Kivity	3de7cacdf3	Merge 'De-static system_keyspace's [gs]et_scylla_local_param(_as)?' from Pavel Emelyanov Those without `_as` suffix are just marked non-static The `..._as` ones are made class methods (now they are local to system_keyspace.cc) After that the `..._as` ones are patched to use `this->` instead of `qctx` Closes #14890 * github.com:scylladb/scylladb: system_keyspace: Stop using qctx in [gs]et_scylla_local_param_as() system_keyspace: Reuse container() and _db member for flushing system_keyspace: Make [gs]et_scylla_local_param_as() class methods system_keyspace: De-static [gs]et_scylla_local_param()	2023-07-31 21:51:04 +03:00
Pavel Emelyanov	a596186e47	system_keyspace: Stop using qctx in [gs]et_scylla_local_param_as() Now those methods are non-static and can start using this's reference to query processor instead of the global qctx thing Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-31 16:02:21 +03:00
Pavel Emelyanov	ec4040496b	system_keyspace: Reuse container() and _db member for flushing The set_scylla_local_param_as() wants to flush replica::database on all shards. For that it uses smp::invoke_on_all() and qctx, but since the method is now a non-static one for system_keyspace it can enjoy using container().invoke_on_all() and this->_db (on target shard) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-31 16:02:21 +03:00
Pavel Emelyanov	1ac4b7d2fe	system_keyspace: Make [gs]et_scylla_local_param_as() class methods These are now two .cc-local templatized helpers, but they are only called by system_keyspace:: non-static methods, so can be such as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-31 16:02:18 +03:00
Pavel Emelyanov	04b12d24fd	system_keyspace: De-static [gs]et_scylla_local_param() All same-class callers are now non-static methods of system_keyspace, all external callers do it via an object at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-31 16:02:18 +03:00
Botond Dénes	4a02865ea1	Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk Maps related to column families in database are extracted to a column_families_data class. Access to them is possible only through methods. All methods which may preempt hold rwlock in relevant mode, so that the iterators can't become invalid. Fixes: #13290 Closes #13349 * github.com:scylladb/scylladb: replica: make tables_metadata's attributes private replica: add methods to get a filtered copy of tables map replica: add methods to check if given table exists replica: add methods to get table or table id replica: api: return table_id instead of const table_id& replica: iterate safely over tables related maps replica: pass tables_metadata to phased_barrier_top_10_counts replica: add methods to safely add and remove table replica: wrap column families related maps into tables_metadata replica: futurize database::add_column_family and database::remove	2023-07-31 15:31:59 +03:00
Botond Dénes	72043a6335	Merge 'Avoid using qctx in schema_tables' column-mapping queries' from Pavel Emelyanov There are three methods in system_keyspace namespace that run queries over `system.scylla_table_schema_history` table. For that they use qctx, which is not nice. Fortunately, all the callers already have the system_keyspace& local variable or argument they can pass to those methods. Since the accessed table belongs to system keyspace, the latter declares the querying methods as "friends" to let them get private `query_processor& _qp` member Closes #14876 * github.com:scylladb/scylladb: schema_tables: Extract query_processor from system_keyspace for querying schema_tables: Add system_keyspace& argument to ..._column_mapping() calls migration_manager: Add system_keyspace argument to get_schema_mapping()	2023-07-31 15:00:59 +03:00


3252 Commits