Commit Graph

3376 Commits

Nadav Har'El
e57252092c Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity
CQL evolved several expression evaluation mechanisms: WHERE clause,
selectors (the SELECT clause), and the LWT IF clause are just some
examples. Most now use expressions, which use managed_bytes_opt
as the underlying value representation, but selectors still use bytes_opt.

This poses two problems:
1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency
2. trying to use expressions with bytes_opt will incur a copy, reducing performance

To solve both problems, we harmonize the data types to managed_bytes_opt
(#13216 notwithstanding). This is somewhat difficult since the values originate
as views into a bytes_ostream. Luckily, however, bytes_ostream and managed_bytes_view
are mostly compatible, so with a little effort this can be done.
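
As a rough illustration of the shape of the change, here is a sketch with
simplified stand-in types (not the real utils::managed_bytes classes, whose
details differ) of the bidirectional conversion the series adds between
bytes_opt and managed_bytes_opt:

```
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Stand-ins: bytes is one contiguous buffer; managed_bytes holds the same
// data as a list of small fragments, avoiding large contiguous allocations
// for big blobs.
using bytes = std::string;
using managed_bytes = std::vector<std::string>;
using bytes_opt = std::optional<bytes>;
using managed_bytes_opt = std::optional<managed_bytes>;

constexpr std::size_t fragment_size = 128;

managed_bytes_opt to_managed_bytes_opt(const bytes_opt& b) {
    if (!b) {
        return std::nullopt;
    }
    managed_bytes m;
    for (std::size_t i = 0; i < b->size(); i += fragment_size) {
        m.push_back(b->substr(i, fragment_size));
    }
    return m;
}

bytes_opt to_bytes_opt(const managed_bytes_opt& m) {
    if (!m) {
        return std::nullopt;
    }
    bytes b;
    for (const auto& fragment : *m) {
        b += fragment; // linearization copies - exactly what we want to avoid
    }
    return b;
}
```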

The series is neutral wrt performance:

before:
```
222118.61 tps ( 61.1 allocs/op,  12.1 tasks/op,   43092 insns/op,        0 errors)
224250.14 tps ( 61.1 allocs/op,  12.1 tasks/op,   43094 insns/op,        0 errors)
224115.66 tps ( 61.1 allocs/op,  12.1 tasks/op,   43092 insns/op,        0 errors)
223508.70 tps ( 61.1 allocs/op,  12.1 tasks/op,   43107 insns/op,        0 errors)
223498.04 tps ( 61.1 allocs/op,  12.1 tasks/op,   43087 insns/op,        0 errors)
```

after:
```
220708.37 tps ( 61.1 allocs/op,  12.1 tasks/op,   43118 insns/op,        0 errors)
225168.99 tps ( 61.1 allocs/op,  12.1 tasks/op,   43081 insns/op,        0 errors)
222406.00 tps ( 61.1 allocs/op,  12.1 tasks/op,   43088 insns/op,        0 errors)
224608.27 tps ( 61.1 allocs/op,  12.1 tasks/op,   43102 insns/op,        0 errors)
225458.32 tps ( 61.1 allocs/op,  12.1 tasks/op,   43098 insns/op,        0 errors)
```

Though I expect with some more effort we can eliminate some copies.

Closes #13637

* github.com:scylladb/scylladb:
  cql3: untyped_result_set: switch to managed_bytes_view as the cell type
  cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
  cql3: untyped_result_set: always own data
  types: abstract_type: add mixed-type versions of compare() and equal()
  utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view
  utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt
  utils: managed_bytes: add managed_bytes_view::with_linearized()
  utils: managed_bytes: mark managed_bytes_view::is_linearized() const
2023-05-10 15:01:45 +03:00
Wojciech Mitros
9ae1b02144 service: revoke permissions on functions when a function/keyspace is dropped
Currently, when a user has permissions on a function (or on all functions in
a keyspace) and the function/keyspace is dropped, the user keeps the
permissions. As a result, when a new function/keyspace is created
with the same name (and signature), they will be able to use it even
if no permissions on it were granted to them.

As with regular UDFs, the same applies to UDAs.

After this patch, the corresponding permissions on functions are dropped
when a function/keyspace is dropped.
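
A toy model (illustrative only; the real logic lives in the auth service, and
its resource naming is more involved) of the behavior after this patch:

```
#include <map>
#include <set>
#include <string>

// Permissions keyed by resource name; dropping a function (or a keyspace and
// everything under it) erases the matching entries, so a later function with
// the same name and signature starts with no permissions granted.
using permissions_by_role = std::map<std::string, std::set<std::string>>;
std::map<std::string, permissions_by_role> granted; // resource -> role -> perms

void drop_function(const std::string& resource_name) {
    granted.erase(resource_name); // revoke everything on the dropped function
}

void drop_keyspace(const std::string& ks) {
    // revoke permissions on every function resource under the keyspace,
    // assuming a hypothetical "functions/<ks>/<name>" resource naming
    const std::string prefix = "functions/" + ks + "/";
    for (auto it = granted.lower_bound(prefix);
            it != granted.end() && it->first.compare(0, prefix.size(), prefix) == 0; ) {
        it = granted.erase(it);
    }
}
```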

Fixes #13820

Closes #13823
2023-05-10 14:39:42 +03:00
Kamil Braun
7d9ab44e81 Merge 'token_metadata: read remapping for write_both_read_new' from Gusev Petr
When new nodes are added or existing nodes are deleted, the topology
state machine needs to shunt reads from the old nodes to the new ones.
This happens in the `write_both_read_new` state. The problem is that
previously this state was not handled in any way in `token_metadata` and
the read nodes were only changed when the topology state machine reached
the final 'owned' state.

To handle `write_both_read_new`, an additional `interval_map` inside
`token_metadata` is maintained, similar to `pending_endpoints`. It maps
the ranges affected by the ongoing topology change operation to the replicas
which should be used for reading. When the topology state machine reaches
the point where it needs to switch reads to the new topology, it passes
`request_read_new=true` in a call to `update_pending_ranges`. This
forces `update_pending_ranges` to compute the ranges based on the new
topology and store them in the `interval_map`. On the data plane, when a
read on the coordinator needs to decide which endpoints to use, it first
consults this `interval_map` in `token_metadata`; only if it doesn't
contain a range for the current token does it use the normal endpoints from
`effective_replication_map`.
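
To make the lookup concrete, here is a hedged sketch with toy types (int
tokens, endpoint names as strings; the real code uses dht::token and
inet_address) of the fallback described above:

```
#include <boost/icl/interval_map.hpp>
#include <iostream>
#include <set>
#include <string>

using endpoint_set = std::set<std::string>;
using read_new_map = boost::icl::interval_map<int, endpoint_set>;

// Ranges touched by the ongoing topology change map to the replicas that
// must serve reads; any other token falls back to the normal endpoints
// from the effective_replication_map.
endpoint_set endpoints_for_reading(const read_new_map& remapped, int token,
                                   const endpoint_set& normal_endpoints) {
    auto it = remapped.find(token);
    if (it != remapped.end()) {
        return it->second;   // read from the new topology's replicas
    }
    return normal_endpoints; // range unaffected by the transition
}

int main() {
    read_new_map remapped;
    remapped += std::make_pair(boost::icl::interval<int>::right_open(100, 200),
                               endpoint_set{"n4"});
    std::cout << *endpoints_for_reading(remapped, 150, {"n3"}).begin() << "\n"; // n4
    std::cout << *endpoints_for_reading(remapped, 250, {"n3"}).begin() << "\n"; // n3
}
```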

Closes #13376

* github.com:scylladb/scylladb:
  storage_proxy, storage_service: use new read endpoints
  storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading
  token_metadata: add unit test for endpoints_for_reading
  token_metadata: add endpoints for reading
  sequenced_set: add extract_set method
  token_metadata_impl: extract maybe_migration_endpoints helper function
  token_metadata_impl: introduce migration_info
  token_metadata_impl: refactor update_pending_ranges
  token_metadata: add unit tests
  token_metadata: fix indentation
  token_metadata_impl: return unique_ptr from clone functions
2023-05-10 10:03:30 +02:00
Petr Gusev
08529a1c6c storage_proxy, storage_service: use new read endpoints
We use set_topology_transition_state to set read_new state
in storage_service::topology_state_load
based on _topology_state_machine._topology.tstate.
This triggers update_pending_ranges to compute and store new ranges
for read requests. We use this information in
storage_proxy::get_endpoints_for_reading
when we need to decide which nodes to use for reading.
2023-05-09 18:42:03 +04:00
Petr Gusev
052b91fb1f storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading
We are going to use remapped_endpoints_for_reading, so we need
to make sure we use it in the right place. The
get_live_sorted_endpoints function is what we
need - it's used in all read code paths -
but its name did not make that obvious.

Also, we add the parameter ks_name as we'll need it
to pass to remapped_endpoints_for_reading.
2023-05-09 18:42:03 +04:00
Kamil Braun
41cac23aa4 Merge 'raft: verify RPC destination ID' from Mikołaj Grzebieluch
All Raft verbs include `dst_id`, the ID of the destination server, but
it isn't checked. `append_entries` will work even if it arrives at
completely the wrong server (but in the same group). It can cause
problems, e.g. in the scenario of replacing a dead node.

This commit adds a check that `dst_id` matches the server's ID; if it
doesn't, the Raft verb is rejected.
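
A minimal sketch of the added check (hypothetical handler shape; the real
handlers live in raft_group_registry and log before rejecting):

```
#include <cstdint>
#include <stdexcept>

using server_id = std::uint64_t;

// Reject the verb instead of applying it on the wrong server.
void verify_destination_id(server_id my_id, server_id dst_id) {
    if (dst_id != my_id) {
        throw std::runtime_error("Raft RPC arrived at the wrong server");
    }
}
```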

Closes #12179

Testing
---

Testcase and scylla's configuration:
57d3ef14d8

It artificially lengthens the replace operation, increasing the chance
that the new node sends an RPC command to the replaced node.

In the logs of the node that replaced the old one, we can see logs in
the form:
```
DEBUG <time> [shard 0] raft_group_registry - Got message for server <dst_id>, but my id is <my_id>
```
It indicates that the Raft verb with the wrong `dst_id` was rejected.

This test isn't included in the PR because it doesn't catch any specific error.

Closes #13575

* github.com:scylladb/scylladb:
  service/raft: raft_group_registry: Add verification of destination ID
  service/raft: raft_group_registry: `handle_raft_rpc` refactor
2023-05-09 11:33:28 +02:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
2023-05-07 17:17:36 +03:00
Kamil Braun
aba31ad06c storage_service: use seastar::format instead of fmt::format
For some reason Scylla crashes on `aarch64` in release mode when calling
`fmt::format` in `raft_removenode` and `raft_decommission`. E.g. on this
line:
```
group0_command g0_cmd = _group0->client().prepare_command(std::move(change), guard, fmt::format("decomission: request decomission for {}", raft_server.id()));
```

I found this in our configure.py:
```
def get_clang_inline_threshold():
    if args.clang_inline_threshold != -1:
        return args.clang_inline_threshold
    elif platform.machine() == 'aarch64':
        # we see miscompiles with 1200 and above with format("{}", uuid)
        # also coroutine miscompiles with 600
        return 300
    else:
        return 2500
```
but reducing it to `0` didn't help.

I managed to get the following backtrace (with inline threshold 0):
```
void boost::intrusive::list_impl<boost::intrusive::mhtraits<seastar::thread_context, boost::intrusive::list_member_hook<>, &seastar::thread_context::_all_link>, unsigned long, false, void>::clear_and_dispose<boost::intrusive::detail::null_disposer>(boost::intrusive::detail::null_disposer) at /usr/include/boost/intrusive/list.hpp:751
 (inlined by) boost::intrusive::list_impl<boost::intrusive::mhtraits<seastar::thread_context, boost::intrusive::list_member_hook<>, &seastar::thread_context::_all_link>, unsigned long, false, void>::clear() at /usr/include/boost/intrusive/list.hpp:728
 (inlined by) ~list_impl at /usr/include/boost/intrusive/list.hpp:255
void fmt::v9::detail::buffer<wchar_t>::append<wchar_t>(wchar_t const*, wchar_t const*) at ??:?
void fmt::v9::detail::vformat_to<char>(fmt::v9::detail::buffer<char>&, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<std::conditional<std::is_same<fmt::v9::type_identity<char>::type, char>::value, fmt::v9::appender, std::back_insert_iterator<fmt::v9::detail::buffer<fmt::v9::type_identity<char>::type> > >::type, fmt::v9::type_identity<char>::type> >, fmt::v9::detail::locale_ref) at ??:?
fmt::v9::vformat[abi:cxx11](fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) at ??:?
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > fmt::v9::format<utils::tagged_uuid<raft::server_id_tag>&>(fmt::v9::basic_format_string<char, fmt::v9::type_identity<utils::tagged_uuid<raft::server_id_tag>&>::type>, utils::tagged_uuid<raft::server_id_tag>&) at /usr/include/fmt/core.h:3206
 (inlined by) service::storage_service::raft_removenode(utils::tagged_uuid<locator::host_id_tag>) at ./service/storage_service.cc:3572
```

Maybe it's a bug in the `fmt` library?

In any case replacing the call with `::format` (i.e. `seastar::format`
from seastar/core/print.hh) helps.

Do it for the entire file for consistency (and to avoid this bug).

Also, for the future, replace `format` calls with `::format` - now it's
the same thing, but the latter won't clash with `std::format` once we
switch to libstdc++13.

Fixes #13707

Closes #13711
2023-05-05 19:23:22 +02:00
Mikołaj Grzebieluch
4a8a8c153c service/raft: raft_group_registry: Add verification of destination ID
All Raft verbs include dst_id, the ID of the destination server, but it isn't checked.
`append_entries` will work even if it arrives at completely the wrong server (but in the same group).
It can cause problems, e.g. in the scenario of replacing a dead node.

This commit adds a check that `dst_id` matches the server's ID; if it doesn't,
the Raft verb is rejected.

Closes #12179
2023-05-04 15:25:23 +02:00
Tomasz Grabiec
e385ce8a2b Merge "fix stack use after free during shutdown" from Gleb
storage_service uses raft_group0, but during shutdown the latter is
destroyed before the former is stopped. This series moves raft_group0
destruction to after storage_service is stopped. For the
move to work, some existing dependencies of raft_group0 are dropped,
since they are not really needed during the object's creation.

Fixes #13522
2023-05-04 15:14:18 +02:00
Mikołaj Grzebieluch
ae41d908d7 service/raft: raft_group_registry: handle_raft_rpc refactor
One-way RPC and two-way RPC have different semantics, i.e. with the former
the client doesn't need to wait for an answer.

This commit splits the logic of `handle_raft_rpc` to allow handling the
differences in semantics, e.g. error handling.
2023-05-04 13:05:04 +02:00
Gleb Natapov
e9fb885e82 service/raft: raft_group0: drop dependency on cdc::generation_service
raft_group0 does not really depend on cdc::generation_service; it needs
it only transiently, so pass it to the appropriate methods of raft_group0
instead of at its creation.
2023-05-04 13:03:07 +03:00
Tomasz Grabiec
aba5667760 Merge 'raft topology: refactor the coordinator to allow non-node specific topology transitions' from Kamil Braun
We change the meaning and name of `replication_state`: previously it was meant
to describe the "state of tokens" of a specific node; now it describes the
topology as a whole - the current step in the 'topology saga'. It was moved
from `ring_slice` into `topology`, renamed to `transition_state`, and the
topology coordinator code was modified to switch on it first instead of node
state - because there may be no single transitioning node, but the topology
itself may be transitioning.

This PR was extracted from #13683, it contains only the part which refactors
the infrastructure to prepare for non-node specific topology transitions.

Closes #13690

* github.com:scylladb/scylladb:
  raft topology: rename `update_replica_state` -> `update_topology_state`
  raft topology: remove `transition_state::normal`
  raft topology: switch on `transition_state` first
  raft topology: `handle_ring_transition`: rename `res` to `exec_command_res`
  raft topology: parse replaced node in `exec_global_command`
  raft topology: extract `cleanup_group0_config_if_needed` from `get_node_to_work_on`
  storage_service: extract raft topology coordinator fiber to separate class
  raft topology: rename `replication_state` to `transition_state`
  raft topology: make `replication_state` a topology-global state
2023-04-30 10:55:24 +02:00
Kefu Chai
56b99b7879 build: cmake: pick up tablets related changes
to sync with the changes in 5e89f2f5ba

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 11:13:41 +08:00
Asias He
a8040306bb storage_service: Fix removing replace node as pending
Consider

- n1, n2, n3
- n3 is down
- n4 replaces n3 with the same ip address 127.0.0.3
- Inside the storage_service::handle_state_normal callback for 127.0.0.3 on n1/n2

  ```
  auto host_id = _gossiper.get_host_id(endpoint);
  auto existing = tmptr->get_endpoint_for_host_id(host_id);
  ```

  host_id = new host id
  existing = empty

  As a result, del_replacing_endpoint() will not be called.

This means 127.0.0.3 will not be removed as a pending node on n1 and n2 when
replacing is done. This is wrong.

This is a regression since commit 9942c60d93
(storage_service: do not inherit the host_id of a replaced node), where
the replacing node uses a different host id from the node it replaces.

To fix, call del_replacing_endpoint() when a node becomes NORMAL and existing
is empty.
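
A simplified model of the fix (toy types; the real code is in
storage_service::handle_state_normal):

```
#include <map>
#include <optional>
#include <set>
#include <string>

struct token_metadata_model {
    std::map<std::string, std::string> endpoint_by_host_id;
    std::set<std::string> replacing_endpoints;

    std::optional<std::string> get_endpoint_for_host_id(const std::string& host_id) const {
        if (auto it = endpoint_by_host_id.find(host_id); it != endpoint_by_host_id.end()) {
            return it->second;
        }
        return std::nullopt;
    }
    void del_replacing_endpoint(const std::string& ep) {
        replacing_endpoints.erase(ep);
    }
};

void handle_state_normal(token_metadata_model& tm, const std::string& endpoint,
                         const std::string& host_id) {
    auto existing = tm.get_endpoint_for_host_id(host_id);
    if (!existing) {
        // Since 9942c60d93 the replacing node carries a brand-new host_id,
        // so `existing` is empty; clear the pending replacing endpoint here.
        tm.del_replacing_endpoint(endpoint);
    }
    tm.endpoint_by_host_id[host_id] = endpoint;
}
```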

Before:
n1:
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[cd1f187a-0eee-4b04-91a9-905ecc499cfc]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
storage_service - Set host_id=6f9ba4e8-9457-4c76-8e2a-e2be257fe123 to be owned by node=127.0.0.3

After:
n1:
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Added replacing_node=127.0.0.3 to replace existing_node=127.0.0.3, coordinator=127.0.0.3
token_metadata - Added node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - replace[28191ea6-d43b-3168-ab01-c7e7736021aa]: Marked ops done from coordinator=127.0.0.3
storage_service - Node 127.0.0.3 state jump to normal
token_metadata - Removed node 127.0.0.3 as pending replacing endpoint which replaces existing node 127.0.0.3
storage_service - Set host_id=72219180-e3d1-4752-b644-5c896e4c2fed to be owned by node=127.0.0.3

Tests: https://github.com/scylladb/scylla-dtest/pull/3126

Closes #13677
2023-04-27 21:03:01 +03:00
Kamil Braun
0bee872fb1 raft topology: rename update_replica_state -> update_topology_state
The new name is more generic and appropriate for topology transitions
which don't affect any specific replica but rather the entire cluster as
a whole (which we'll introduce later).

Also take `guard` directly instead of `node_to_work_on` in this more
generic function. Since we want `node_to_work_on` to die when we steal
its guard, introduce `take_guard` which takes ownership of the object
and returns the guard.
2023-04-27 15:22:19 +02:00
Kamil Braun
22ab5982e7 raft topology: remove transition_state::normal
What this state really represented was that there is currently no
transition. So remove it and make `transition_state` optional instead.
2023-04-27 15:18:32 +02:00
Kamil Braun
61c4e0ae20 raft topology: switch on transition_state first
Previously the code assumed that there was always a 'node to work on' (a
node which wants to change its state) or there was no work to do at all.
It would find such a node, switch on its state (e.g. check if it's
bootstrapping), and in some states switch on the topology
`transition_state` (e.g. check if it's `write_both_read_old`).

We want to introduce transitions that are not node-specific and can work
even when all nodes are 'normal' (so there's no 'node to work on'). As a
first step, we refactor the code so it switches on `transition_state`
first. In some of these states, like `write_both_read_old`, there must
be a 'node to work on' for the state to make sense; but later in some
states it will be optional (such as `commit_cdc_generation`).
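
A self-contained sketch (names taken from this series, logic heavily
simplified) of the new dispatch order:

```
#include <optional>

enum class transition_state { write_both_read_old, write_both_read_new, commit_cdc_generation };
enum class node_state { bootstrapping, decommissioning };

struct topology {
    // nullopt replaces the removed transition_state::normal: no transition
    std::optional<transition_state> tstate;
};

struct node_to_work_on { node_state state; };

std::optional<node_to_work_on> get_node_to_work_on() { return std::nullopt; }

void handle_topology_transition(const topology& topo) {
    if (topo.tstate) {
        switch (*topo.tstate) {
        case transition_state::write_both_read_old:
        case transition_state::write_both_read_new:
            // these states only make sense with a transitioning node
            break;
        case transition_state::commit_cdc_generation:
            // later commits make the 'node to work on' optional here
            break;
        }
        return;
    }
    if (auto node = get_node_to_work_on()) {
        (void)node; // no global transition: switch on the node's own state
    }
}
```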
2023-04-27 15:14:59 +02:00
Kamil Braun
a023ca2cf1 raft topology: handle_ring_transition: rename res to exec_command_res
A more descriptive name.
2023-04-27 15:12:12 +02:00
Kamil Braun
4ddfce8213 raft topology: parse replaced node in exec_global_command
Will make following commits easier.
2023-04-27 15:10:49 +02:00
Kamil Braun
bafce8fd28 raft topology: extract cleanup_group0_config_if_needed from get_node_to_work_on 2023-04-27 15:04:36 +02:00
Kamil Braun
98f69f52aa storage_service: extract raft topology coordinator fiber to separate class
The lambdas defined inside the fiber are now methods of this class.

Currently `handle_node_transition` calls `handle_ring_transition`;
in a later commit we will reverse this: `handle_ring_transition` will
call `handle_node_transition`. We won't have to shuffle the functions
around because they are members of the same class, making the change
easier to review. In general, the code will be easier to maintain in
this new form (no need to deal with so many lambda captures etc.)

Also break up some lines which exceeded the 120 character limit (as per
Seastar coding guidelines).
2023-04-27 15:04:35 +02:00
Kamil Braun
defa63dc20 raft topology: rename replication_state to transition_state
The new name is more generic - it describes the current step of a
'topology saga' (a sequence of steps used to implement a larger topology
operation such as bootstrap).
2023-04-27 11:39:38 +02:00
Kamil Braun
af1ea2bb16 raft topology: make replication_state a topology-global state
Previously it was part of `ring_slice`, belonging to a specific node.
This commit moves it into `topology`, making it a cluster-global
property.

The `replication_state` column in `system.topology` is now `static`.

This will allow us to easily introduce topology transition states that
do not refer to any specific node. `commit_cdc_generation` will be such
a state, allowing us to commit a new CDC generation even though all
nodes are normal (none are transitioning). One could argue that the
other states are conceptually already cluster-global: for example,
`write_both_read_new` doesn't affect only the tokens of a bootstrapping
(or decommissioning etc.) node; it affects replica sets of other tokens
as well (with RFs greater than 1).
2023-04-27 11:39:38 +02:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. The vnode-based replication strategy tries
to evenly distribute the global token space, shared by all tables, among
nodes and shards. With tablets, the aim is to start from the other
side: divide the resources of each replica shard into tablets, with the goal
of a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike vnode ranges, tablet replicas live on a
particular shard of a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the groundwork for them.
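
As a rough mental model (toy shapes, not the real locator:: types), tablets
pin each fragment of a table to concrete (node, shard) replicas:

```
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Unlike vnode ranges, which only name nodes and leave shard selection to
// static sharding, each tablet replica names a specific shard of a node.
struct tablet_replica {
    std::string node;
    unsigned shard;
};

struct tablet {
    std::int64_t last_token; // tablet owns the token range up to last_token
    std::vector<tablet_replica> replicas;
};

struct tablet_map {
    std::vector<tablet> tablets; // e.g. 8 entries for initial_tablets = 8
};

// per-table tablet maps, keyed by table name in this toy model
using tablet_metadata = std::map<std::string, tablet_map>;
```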

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of a replication strategy;
    they just alter the way in which data ownership is managed. In theory, we
    could use them for other strategies as well, like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined; the language now understands
that the comparison is symmetric in the new standard.

fortunately, our operator!=() is always equivalent to
`! operator==()`, which matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the defaulted operator!=, C++20 also brings us
the defaulted operator==() -- the compiler is able to generate
operator==() as a member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if an operator==() is implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is replaced with the default
generated one by removing its body and marking the function
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we failed to mark operator== with the `const`
specifier. in this change, to fulfill the requirements of the C++
standard, and to be more correct, the `const` specifier is added.

also, to generate the defaulted operator==, the operand should
be `const class_name&`, but that is not always the case: in the
`version` class, we used `version` as the parameter type. to
fulfill the requirements of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and is a more idiomatic
way to pass a non-trivial struct as a function parameter.
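
a sketch of the resulting pattern, using a struct shaped like the `version`
example mentioned above (illustrative fields):

```
#include <compare>

struct version {
    int major;
    int minor;

    // member-wise comparison: the compiler generates ==, and in C++20
    // a != b is rewritten as !(a == b), so no operator!=() is needed.
    bool operator==(const version&) const = default;

    // where the class also had ordering operators implemented
    // lexicographically, a defaulted <=> replaces them all.
    auto operator<=>(const version&) const = default;
};
```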

please note, because in C++20 both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed:
they are the symmetric forms of another variant. if they were
not removed, the compiler would, for instance, find an ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Tomasz Grabiec
ce94a2a5b0 Merge 'Fixes and tests for raft-based topology changes' from Kamil Braun
Fix two issues with the replace operation introduced by recent PRs.

Add a test which performs a sequence of basic topology operations (bootstrap,
decommission, removenode, replace) in a new suite that enables the `raft`
experimental feature (so that the new topology change coordinator code is used).

Fixes: #13651

Closes #13655

* github.com:scylladb/scylladb:
  test: new suite for testing raft-based topology
  test: remove topology_custom/test_custom.py
  raft topology: don't require new CDC generation UUID to always be present
  raft topology: include shard_count/ignore_msb during replace
2023-04-26 11:38:07 +02:00
Kefu Chai
5804eb6d81 storage_service: specialize fmt::formatter<storage_service::mode>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `storage_service::mode` without the help of `operator<<`.

the corresponding `operator<<()` for `storage_service::mode` is removed
in this change, as all its callers now use fmtlib for
formatting.
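
a hedged sketch of the specialization pattern (a toy enum stands in for the
real storage_service::mode, which has more states):

```
#include <fmt/format.h>

enum class mode { starting, normal, leaving };

template <>
struct fmt::formatter<mode> : fmt::formatter<fmt::string_view> {
    auto format(mode m, fmt::format_context& ctx) const {
        fmt::string_view name = "UNKNOWN";
        switch (m) {
        case mode::starting: name = "STARTING"; break;
        case mode::normal:   name = "NORMAL";   break;
        case mode::leaving:  name = "LEAVING";  break;
        }
        return fmt::formatter<fmt::string_view>::format(name, ctx);
    }
};

// usage: fmt::format("{}", mode::normal) yields "NORMAL"
```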

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13640
2023-04-25 14:20:57 +02:00
Gleb Natapov
9849409c2a service/raft: raft_group0: drop dependency on migration_manager
raft_group0 does not really depend on migration_manager; it needs it only
transiently, so pass it to the appropriate methods of raft_group0 instead
of at its creation.
2023-04-25 12:38:01 +03:00
Gleb Natapov
d5d156d474 service/raft: raft_group0: drop dependency on query_processor
raft_group0 does not really depend on query_processor; it needs it only
transiently, so pass it to the appropriate methods of raft_group0 instead
of at its creation.
2023-04-25 12:35:57 +03:00
Gleb Natapov
029f1737ef service/raft: raft_group0: drop dependency on storage_service
raft_group0 does not really depend on storage_service; it needs it only
transiently, so pass it to the appropriate methods of raft_group0 instead
of at its creation.
2023-04-25 11:07:47 +03:00
Kamil Braun
3f0498ca53 raft topology: don't require new CDC generation UUID to always be present
During node replace we don't introduce a new CDC generation, only during
regular bootstrap. Instead of checking that `new_cdc_generation_uuid`
must be present whenever there's a topology transition, only check it
when we're in `commit_cdc_generation` state.
2023-04-24 14:41:33 +02:00
Kamil Braun
9ca53478ed raft topology: include shard_count/ignore_msb during replace
Fixes: #13651
2023-04-24 14:40:47 +02:00
Tomasz Grabiec
5e89f2f5ba service: Introduce tablet_allocator
Currently, it is responsible for injecting mutations of system.tablets into
schema changes.

Note that not all migrations are handled yet. Dependent view or
CDC table drops are not handled.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
46eae545ad migration_manager: Send tablet metadata as part of schema pull
This is currently used by group0 to transfer a snapshot of the raft
state machine.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
a8a03ee502 storage_service: Load tablet metadata when reloading topology state
This change puts the reloading into topology_state_load(), the
function which reloads token_metadata from system.topology (the new
raft-based topology management). It clears the metadata, so it needs to
reload the tablet map too. In the future, tablet metadata could change as
part of a topology transaction too, so we reload rather than preserve.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
d42685d0cb storage_service: Load tablet metadata on boot and from group0 changes 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
41e69836fd db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
b754433ac1 migration_notifier: Introduce before_drop_keyspace()
Tablet allocator will need to inject mutations on keyspace drop.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
5b046043ea migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
It will be extended with listener notification firing, which is an
async operation.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
27acf3b129 storage_proxy: Improve trace-level logging 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
e4865bd4d1 dht, storage_proxy: Abstract token space splitting
Currently, scans are splitting partition ranges around tokens. This
will have to change with tablets, where we should split at tablet
boundaries.

This patch introduces token_range_splitter which abstracts this
task. It is provided by effective_replication_map implementation.
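
A rough sketch, with a toy token type, of an interface in the spirit of the
token_range_splitter described above (the real signatures differ):

```
#include <cstdint>
#include <optional>

using token = std::int64_t;

class token_range_splitter {
public:
    virtual ~token_range_splitter() = default;

    // Repositions the splitter so the next boundary returned is >= t.
    virtual void reset(token t) = 0;

    // Returns the next split boundary, or nullopt when exhausted.
    // A vnode-based implementation yields vnode boundaries; a tablet-based
    // one would yield tablet boundaries.
    virtual std::optional<token> next_token() = 0;
};
```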
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
dc04da15ec db: Introduce get_non_local_vnode_based_strategy_keyspaces()
It's meant to be used in places where
get_non_local_strategy_keyspaces() is currently used but which should
work only with keyspaces that use a vnode-based replication strategy.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
8fcb320e71 service: storage_proxy: Avoid copying keyspace name in write handler 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
9b17ad3771 locator: Introduce per-table replication strategy
Will be used by tablet-based replication strategies, for which the
effective replication map differs per table.

Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.

For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.

Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
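
A toy model of the uniform access described above (hypothetical shapes):
client code always asks for the map per table, and per-keyspace strategies
simply hand every table the same shared map:

```
#include <map>
#include <memory>
#include <string>

struct effective_replication_map {};
using erm_ptr = std::shared_ptr<effective_replication_map>;

struct keyspace_model {
    erm_ptr shared_erm;                       // per-keyspace (vnode-based) strategies
    std::map<std::string, erm_ptr> per_table; // per-table (tablet-based) strategies

    erm_ptr get_effective_replication_map(const std::string& table) const {
        if (auto it = per_table.find(table); it != per_table.end()) {
            return it->second;
        }
        return shared_erm; // the same erm for every table in the keyspace
    }
};
```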
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
1343bfa708 locator: effective_replication_map: Abstract get_pending_endpoints() 2023-04-24 10:49:36 +02:00
Botond Dénes
9e757d9c6d Merge 'De-globalize storage proxy' from Pavel Emelyanov
All users of the global proxy are gone (*), so the proxy can be made fully local to main/cql_test_env.

(*) one test case still needs it, but it can get it via cql_test_env

Closes #13616

* github.com:scylladb/scylladb:
  code: Remove global proxy
  schema_change_test: Use proxy from cql_test_env
  test: Carry proxy reference on cql_test_env
2023-04-24 09:38:00 +03:00
Benny Halevy
2d20ee7d61 gms: version_generator: define version_type and generation_type strong types
Derived from utils::tagged_integer with different tags,
the types are incompatible with each other and require explicit
casting to and from their value type.
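
A simplified sketch of a tag-parameterized strong integer in the spirit of
utils::tagged_integer (tag names illustrative):

```
#include <compare>
#include <cstdint>

template <typename Tag, typename ValueType = std::int64_t>
class tagged_integer {
    ValueType _value;
public:
    explicit constexpr tagged_integer(ValueType v) : _value(v) {}
    explicit constexpr operator ValueType() const { return _value; } // explicit cast back
    auto operator<=>(const tagged_integer&) const = default;
};

struct version_tag {};
struct generation_tag {};
using version_type = tagged_integer<version_tag>;
using generation_type = tagged_integer<generation_tag>;

// version_type{1} == generation_type{1} does not compile:
// different tags make different, incompatible types.
```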

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:47:17 +03:00
Benny Halevy
d1817e9e1b utils: move generation-number to gms
Although the get_generation_number implementation is
completely generic, it is used exclusively to seed
the gossip generation number.

Following patches will define a strong gms::generation_id
type and this function should return it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00