There's a bunch of debug- and trace-level logging of locator::node-s that also includes current_backtrace(). Printing a node is done via the debug_format() helper, which generates and returns an sstring to print. Backtrace printing is not lightweight on its own because of the backtrace collection. To avoid slowing things down at the info log level, which is the default, all such prints are wrapped with explicit if-s checking whether the log level is enabled.
This PR removes those level checks by introducing a lazy_backtrace() helper and by providing a formatter for nodes that also makes the node format-string calculation lazy.
Closes scylladb/scylladb#17235
* github.com:scylladb/scylladb:
topology: Restore indentation after previous patch
topology: Drop if_enabled checks for logging
topology: Add lazy_backtrace() helper
topology: Add printer wrapper for node* and formatter for it
topology: Expand formatter<locator::node>
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.
When tablets are enabled, this function shall not be used.
Instead of the per-keyspace replication map, the per-table one
should be used. The rename was performed to distinguish between
those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().
Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17314
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads into the one that takes a
future<>-returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary: the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.
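A minimal sketch of that loop's shape, assuming Seastar coroutines and the maybe_yield() helper; the real for_each_tablet() signature in the tree may differ:

#include <cstdint>
#include <vector>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>

using tablet_id = uint64_t; // placeholder for the real locator type

template <typename Func>
seastar::future<> for_each_tablet(std::vector<tablet_id> tablets, Func func) {
    for (auto tid : tablets) {
        co_await func(tid);
        // One yield per tablet already happens here, so the callbacks
        // don't need their own yields to prevent stalls.
        co_await seastar::coroutine::maybe_yield();
    }
}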
This patch is a follow-up to #17118.
Closes scylladb/scylladb#17284
Now all the logged arguments are lazily evaluated (node* format string
and backtrace) so the preliminary log-level checks are not needed.
Indentation is deliberately left broken.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper returns a lazy_eval-ed current_backtrace(), so the backtrace
will be generated and printed only if the logger is actually going to
emit it at its current log level.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
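A minimal sketch of the helper, assuming Seastar's seastar::value_of() from <seastar/util/lazy.hh> and current_backtrace() from <seastar/util/backtrace.hh>; the logger usage line is illustrative:

#include <seastar/util/backtrace.hh>
#include <seastar/util/lazy.hh>

static auto lazy_backtrace() {
    // The wrapped callable runs only when the log message is actually
    // formatted, i.e. only when the level is enabled.
    return seastar::value_of([] { return seastar::current_backtrace(); });
}

// Usage -- no explicit level check needed:
//   tlogger.debug("changing state, at {}", lazy_backtrace());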
Currently, to print node information there's a debug_format(node*) helper
function that returns an sstring object. Here's a formatter, which is
more flexible and convenient, plus a node_printer wrapper, since
formatters cannot format non-void pointers.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
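A hedged sketch of the wrapper and its formatter; it assumes locator::node is declared elsewhere and that a fmt::formatter<locator::node> exists (added in this series):

#include <fmt/format.h>

namespace locator {

class node; // assumed declared in locator

// fmt refuses to format arbitrary non-void pointers, so wrap the pointer.
class node_printer {
    const node* _node;
public:
    explicit node_printer(const node* n) noexcept : _node(n) {}
    const node* get() const noexcept { return _node; }
};

} // namespace locator

template <>
struct fmt::formatter<locator::node_printer> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const locator::node_printer& np, fmt::format_context& ctx) const {
        if (np.get() == nullptr) {
            return fmt::format_to(ctx.out(), "(null node)");
        }
        // Delegates to the formatter<locator::node> from the next patch.
        return fmt::format_to(ctx.out(), "{}", *np.get());
    }
};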
Equip it with a :v specifier that turns verbose mode on and prints much
more data about the node. The main user will appear in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
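A sketch of how an optional 'v' presentation spec can be parsed; the accessor names and the fields printed in verbose mode are assumptions, not the exact output:

#include <fmt/format.h>

template <>
struct fmt::formatter<locator::node> {
    bool verbose = false;

    constexpr auto parse(fmt::format_parse_context& ctx) {
        auto it = ctx.begin();
        if (it != ctx.end() && *it == 'v') { // "{:v}" turns verbose mode on
            verbose = true;
            ++it;
        }
        return it;
    }

    auto format(const locator::node& n, fmt::format_context& ctx) const {
        if (!verbose) {
            return fmt::format_to(ctx.out(), "node({})", n.host_id());
        }
        // Verbose form: illustrative set of fields, assumed accessors.
        return fmt::format_to(ctx.out(), "node({}, dc={}, rack={}, state={})",
                              n.host_id(), n.dc_rack().dc, n.dc_rack().rack,
                              static_cast<int>(n.get_state()));
    }
};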
The table query param is added to get the describe_ring result for a
given table.
Both vnode tables and tablet tables can use this table param, so it is
easier for users to use.
If the table param is not provided by the user and the keyspace contains
a tablet table, the request will be rejected.
E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"
Refs #16509
Closes scylladb/scylladb#17118
* github.com:scylladb/scylladb:
tablets: Convert to use the new version of for_each_tablet
storage_service: Add describe_ring support for tablet table
storage_service: Mark host2ip as const
tablets: Add for_each_tablet_gently
When creating a keyspace, scylla allows setting an RF value smaller than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted, thus catching up with the RF. With tablets, that's not the case, as the replica set remains unchanged.
With tablets it's a good chance not to mimic the vnodes behavior and to require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner -- when creating a keyspace the RF can be anything, but when a new table is created the topology should meet the RF requirements. If they are not met, the user can bootstrap new nodes or ALTER KEYSPACE.
Closes: #16529
Closes scylladb/scylladb#17079
* github.com:scylladb/scylladb:
tablets: Make sure topology has enough endpoints for RF
cql-pytest: Disable tablets when RF > nodes-in-DC
test: Remove test that configures RF larger than the number of nodes
keyspace_metadata: Include tablets property in DESCRIBE
RF values appear as strings, and the strategy classes convert them to integers. This PR removes some duplicated effort in the conversion code.
Closes scylladb/scylladb#17132
* github.com:scylladb/scylladb:
network_topology_strategy: Do not walk list of datacenters twice
replication_strategy: Do not convert string RF into int twice
abstract_replication_strategy: Make validate_replication_factor return value
When creating a keyspace, scylla allows setting an RF value smaller than
the number of nodes in the DC. With vnodes, when new nodes are bootstrapped,
new tokens are inserted, thus catching up with the RF. With tablets, that's
not the case, as the replica set remains unchanged.
With tablets it's a good chance not to mimic the vnodes behavior and to
require as many nodes to be up and running as the requested RF. This
patch implements this in a lazy manner -- when creating a keyspace the RF
can be anything, but when a new table is created the topology should meet
the RF requirements. If they are not met, the user can bootstrap new nodes
or ALTER KEYSPACE.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
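A minimal sketch of the check at table-creation time, under assumed names (the real check lives in the replication strategy / topology code):

#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// On CREATE TABLE in a tablets-enabled keyspace: every DC must have at
// least as many nodes as the RF requested for it.
void check_enough_endpoints(const std::map<std::string, std::size_t>& nodes_in_dc,
                            const std::map<std::string, std::size_t>& rf_in_dc) {
    for (const auto& [dc, rf] : rf_in_dc) {
        auto it = nodes_in_dc.find(dc);
        std::size_t nodes = (it == nodes_in_dc.end()) ? 0 : it->second;
        if (nodes < rf) {
            throw std::runtime_error("Datacenter " + dc + " has " +
                std::to_string(nodes) + " node(s), but RF=" + std::to_string(rf) +
                " was requested; bootstrap more nodes or ALTER KEYSPACE");
        }
    }
}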
get0() dates back to the days when Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.
Replace it with seastar::future::get(), which does the same thing.
Closes scylladb/scylladb#17130
* github.com:scylladb/scylladb:
treewide: replace seastar::future::get0() with seastar::future::get()
sstable: capture return value of get0() using auto
utils: result_loop: define result_type with decayed type
[avi: add another one that snuck in while this was cooking]
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace it with seastar::future::get(), which does the same thing.
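The replacement is mechanical; a small illustration (the function and value are made up, and note that plain get() on a future must run inside a seastar::thread):

#include <seastar/core/future.hh>

seastar::future<int> answer() {
    return seastar::make_ready_future<int>(42);
}

void example() { // must run inside a seastar::thread context
    // Before: int v = answer().get0();
    int v = answer().get(); // same behavior for single-value futures
    (void)v;
}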
The constructor of that class walks the provided options to get per-DC
replication factors. It does so twice -- first to populate the dc:rf
map, and second to calculate the sum of the provided RF values. The latter
loop can be optimized away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two replication strategy classes that validate a string RF and
then convert it into an integer. Since the validation helper returns the
parsed value, it can simply be used, avoiding the second conversion.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper in question checks whether the string RF is indeed an integer.
Make this helper return the "checked" integer value, since it performs the
conversion anyway, and rename it to parse_... to reflect what it now does.
The next patches will make use of this change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
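A hedged sketch of the renamed helper; the error wording and the exact signature are illustrative:

#include <cstdlib>
#include <stdexcept>
#include <string>

// Was: validate_replication_factor(rf), returning void.
// Now it returns the parsed value so callers don't convert a second time.
static std::size_t parse_replication_factor(const std::string& rf) {
    if (rf.empty() || rf.find_first_not_of("0123456789") != std::string::npos) {
        throw std::runtime_error("Replication factor must be a number: " + rf);
    }
    return std::stoul(rf);
}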
This comment already served its purpose when rewriting
C* in C++. Since we've re-implemented it, there is no need to keep it
around.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17120
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, so that load rebalancing can remain efficient. A tablet that is too large makes migration inefficient, therefore slowing down the balancer.
If the avg size grows beyond the upper bound (the split threshold), the balancer decides to split. A split spans all tablets of a table, due to the power-of-two constraint.
Likewise, if the avg size decreases below the lower bound (the merge threshold), a merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays the foundation for it to be implemented later on.
A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say a table is being split and the avg size drops below the target size (which is 50% of the split threshold and 100% of the merge threshold). That means that after the split, the avg size would drop below the merge threshold, causing a merge right after the split, which is wasteful, so it's better to just cancel the split.
Tablet metadata gains 2 new fields for managing this:
resize_type: the resize decision type; one of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).
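A hedged sketch of the two fields as a struct (the tree may model the type differently, e.g. as a variant; this only mirrors the description above):

#include <cstdint>

namespace locator {

struct resize_decision {
    enum class type { none, split, merge };
    type way = type::none;        // resize_type
    uint64_t sequence_number = 0; // resize_seq_number: monotonically
                                  // increasing, +1 per new decision
};

} // namespace locator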
A new RPC was implemented to pull stats from each table replica, so that the load balancer can calculate the avg tablet size and know the "split status" for a given table. The avg size is aggregated carefully, taking the RF of each DC into account (it might differ between DCs).
When a table is done splitting its storage, it mirrors the resize_seq_number from the tablet metadata into its local state (in other words: my split status is ready). If a table is split-ready, the coordinator will see that the table's seq number is the same as the one in the tablet metadata. This helps distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). The status is also aggregated carefully, by taking the minimum among all replicas, so the coordinator will only update the topology when all replicas are ready.
When the load balancer emits a split decision, replicas listen for the need to split with a "split monitor", which is awakened once a table's replication metadata is updated and which detects the need for a split (i.e. the resize_type field is "split").
The split monitor will start splitting the table's compaction groups (using the mechanism introduced in 081f30d149). Once the splitting work is completed, the table updates its local state as having completed the split.
When the coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which means updating the tablet metadata to split each tablet in two. Once the table replicas have their replication metadata updated with the new tablet count, they can appropriately update their sets of compaction groups (which were previously split in the preparation step).
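For the average-size math, a small illustration of the RF-aware aggregation (the names and exact formula placement are assumptions):

#include <cstdint>

// total_size_all_replicas: table size summed over every replica on every node.
// Each tablet is stored `total_rf` times (the sum of per-DC RFs), so divide
// accordingly to get the average size of a single tablet replica.
uint64_t average_tablet_size(uint64_t total_size_all_replicas,
                             uint64_t tablet_count,
                             uint64_t total_rf) {
    auto replicas = tablet_count * total_rf;
    return replicas ? total_size_all_replicas / replicas : 0;
}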
Fixes #16536.
Closes scylladb/scylladb#16580
* github.com:scylladb/scylladb:
test/topology_experimental_raft: Add tablet split test
replica: Bypass reshape on boot with tablets temporarily
replica: Fix table::compaction_group_for_sstable() for tablet streaming
test/topology_experimental_raft: Disable load balancer in test fencing
replica: Remap compaction groups when tablet split is finalized
service: Split tablet map when split request is finalized
replica: Update table split status if completed split compaction work
storage_service: Implement split monitor
topology_coordinator: Generate updates for resize decisions made by balancer
load_balancer: Introduce metrics for resize decisions
db: Make target tablet size a live-updateable config option
load_balancer: Implement resize decisions
service: Wire table_resize_plan into migration_plan
service: Introduce table_resize_plan
tablet_mutation_builder: Add set_resize_decision()
topology_coordinator: Wire load stats into load balancer
storage_service: Allow tablet split and migration to happen concurrently
topology_coordinator: Periodically retrieve table_load_stats
locator: Introduce topology::get_datacenter_nodes()
storage_service: Implement table_load_stats RPC
replica: Expose table_load_stats in table
replica: Introduce storage_group::live_disk_space_used()
locator: Introduce table_load_stats
tablets: Add resize decision metadata to tablet metadata
locator: Introduce resize_decision
We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes.
Avoid this by refusing to start if the need to reshard is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard.
Startup will fail as:
ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
Refs: #16739
Fixes: #16843
Closes scylladb/scylladb#17008
* github.com:scylladb/scylladb:
test/topology_experimental_raft: test_tablets.py: add test for resharding
test/pylib: manager[_client]: add update_cmdline()
main: refuse startup when tablet resharding is required
locator: tablets: add check_tablet_replica_shards()
Checks that all tablets with a replica on this node have a valid
replica shard (< smp::count).
Will be used to check whether the node can start up with the current
shard count.
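A simplified, hedged sketch of the check (the real code iterates the tablet metadata asynchronously; the types and accessors here are placeholders):

#include <cstdint>
#include <vector>
#include <seastar/core/smp.hh>

struct tablet_replica_view { // placeholder for locator::tablet_replica
    uint64_t host;           // placeholder for locator::host_id
    unsigned shard;
};

// True if every tablet replica hosted on `this_host` has shard < smp::count.
bool check_tablet_replica_shards(const std::vector<tablet_replica_view>& replicas,
                                 uint64_t this_host) {
    for (const auto& r : replicas) {
        if (r.host == this_host && r.shard >= seastar::smp::count) {
            return false;
        }
    }
    return true;
}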
This implements the fiber that aggregates the per-table stats that will
be fed into the load balancer to make resize decisions (split,
merge, or revoke ongoing ones).
Initially, the stats will be refreshed every 60s, but the idea
is that eventually we make the frequency table-based, with
the size of each table taken into account.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These are per-table stats that will be aggregated from all nodes by
the coordinator, in order to help the load balancer make resize
decisions.
size_in_bytes is the total aggregated table size, so the coordinator
becomes responsible for taking into account the RF of each DC and
also the tablet count, for computing an accurate average size.
split_ready_seq_number is the minimum sequence number among all
replicas. If the coordinator sees that all replicas store the seq
number of the current split, then it knows all replicas are ready
for the next stage of the split process.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
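A hedged sketch of the stats structure and its merge rule, following the description above (field names from the text; the merge operator is illustrative):

#include <algorithm>
#include <cstdint>

namespace locator {

struct table_load_stats {
    uint64_t size_in_bytes = 0;
    int64_t split_ready_seq_number = 0;

    table_load_stats& operator+=(const table_load_stats& other) noexcept {
        size_in_bytes += other.size_in_bytes;
        // Minimum across replicas: the coordinator proceeds only when
        // *all* replicas report the seq number of the current split.
        split_ready_seq_number = std::min(split_ready_seq_number,
                                          other.split_ready_seq_number);
        return *this;
    }
};

} // namespace locator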
The new metadata describes the ongoing resize operation (one of
merge, split, or none) that spans the tablets of a given table.
It is managed by group0, so nodes that are down will be able to see
the decision when they come back up and see the changes to the
metadata.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
resize_decision is the metadata that says whether the tablets of a table
need a split, a merge, or nothing. It will be recorded in the tablet
metadata, and therefore stored in group0.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
The tablet scheduler is responsible for scheduling the tablet rebuilding
transition, which changes the replica set. The infrastructure for
handling decommission in the tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by the load calculation as
affecting all tablet replicas.
A new kind of tablet transition, called "rebuild", is introduced. It
adds a new tablet replica and rebuilds it from the existing replicas.
Other than that, the transition goes through the same stages as a regular
migration, to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes https://github.com/scylladb/scylladb/issues/16690.
Closes scylladb/scylladb#16894
* github.com:scylladb/scylladb:
tests: tablets: Add tests for removenode and replace
tablets: Add support for removenode and replace handling
topology_coordinator: tablets: Do not fail in a tight loop
topology_coordinator: tablets: Avoid warnings about ignored failed future
storage_service, topology: Track excluded state in locator::topology
raft topology: Introduce param-less topology::get_excluded_nodes()
raft topology: Move get_excluded_nodes() to topology
tablets: load_balancer: Generalize load tracking
tablets: Introduce get_migration_streaming_info() which works on migration request
tablets: Move migration_to_transition_info() to tablets.hh
tablets: Extract get_new_replicas() which works on migration request
tablets: Move tablet_migration_info to tablets.hh
tablets: Store transition kind per tablet
There are currently two options for how to "request" the number of initial tablets for a table
1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own
Neither is very nice. The former doesn't take the cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.
Here's a (maybe temporary) proposal to facilitate at least perf tests -- a --tablets-initial-scale-factor option that enhances option number two above by multiplying the calculated number of tablets by the configured factor. This is what we currently do to run perf tests by patching scylla; with the option it's going to be more convenient.
Closes scylladb/scylladb#16919
* github.com:scylladb/scylladb:
config: Add --tablets-initial-scale-factor
tablet_allocator: Add initial tablets scale to config
tablet_allocator: Add config
New tablet replicas are allocated synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
The tablet scheduler is responsible for scheduling the tablet transition
which changes the replica set. The infrastructure for handling decommission
in the tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by the load calculation as
affecting all tablet replicas.
A new kind of tablet transition, called "rebuild", is introduced. It
adds a new tablet replica and rebuilds it from the existing replicas.
Other than that, the transition goes through the same stages as a regular
migration, to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes #16690.
Will be used by the tablet load balancer to compute the impact of
planned migrations on the load. Currently, the logic is hard-coded in the
load balancer and may get out of sync with the logic we have in
get_migration_streaming_info() for already-running tablet transitions.
The logic will become more complex for the rebuild transition, so use
shared code to compute it.
When allocating tablets for a table for the first time, their initial count
is calculated so that each shard in the cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.
It's possible to specify the initial tablet count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration and may also be not very nice.
As a temporary solution (e.g. for perf tests) we may add a configurable
that scales the calculated initial number of tablets by some factor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator can be configured to be literally zero, and then the
division doesn't work.
Fix by assuming zero tablets for DCs with zero RF.
fixes: #16844
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16861
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets behind the scenes) we still need
some value to start with, and this patch calculates one.
The math is based on topology and RF so that all shards are covered:
initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)
The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tablets,
and at table-creation time the zero is converted into the auto-estimated value.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
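Putting the last few commits together, a sketch of the estimation with the zero-RF guard and the scale factor applied (function and parameter names are illustrative):

#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters),
// skipping DCs with RF=0, then scaled by --tablets-initial-scale-factor.
uint64_t estimate_initial_tablets(const std::map<std::string, uint64_t>& shards_in_dc,
                                  const std::map<std::string, uint64_t>& rf_in_dc,
                                  double scale_factor) {
    uint64_t tablets = 0;
    for (const auto& [dc, shards] : shards_in_dc) {
        auto it = rf_in_dc.find(dc);
        uint64_t rf = (it == rf_in_dc.end()) ? 0 : it->second;
        if (rf == 0) {
            continue; // zero RF in a DC means zero tablets for it
        }
        tablets = std::max(tablets, shards / rf);
    }
    return static_cast<uint64_t>(tablets * scale_factor);
}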
Add a function that returns all nodes that have had a vnode moved to them
during a topology change operation. It is needed to know which nodes must
do cleanup in case of a failed topology change operation.
Some topology change operations cause some nodes to lose ranges. This
information is needed to know which nodes must do cleanup after the
topology operation completes. Pre-calculate it during erm creation.
This is an optimisation -- for_each_natural_endpoint_until is
called only for vnode tokens, so we don't need to run the
binary search for it in tm.first_token.
Also, the function is made private since it's only used
in the erm itself.
Before this patch, the host_id -> IP mapping was done
in calculate_effective_replication_map. This function
is called from mutate_token_metadata, which means we
have to have an IP for each host_id in topology_state_load,
otherwise we get an error. We are going to remove
the IP waiting loop from topology_state_load, so
we need to get rid of IP resolution in
calculate_effective_replication_map.
In this patch we move the host_id -> IP resolution to
the data plane. When a write or read request is sent,
the target endpoints are requested from the erm through
the get_natural_endpoints_without_node_being_replaced,
get_pending_endpoints and get_endpoints_for_reading
methods, and this is where the IP resolution
will now occur.
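A hedged sketch of the pattern with placeholder types; the real erm methods named above do the equivalent:

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using host_id = uint64_t;          // placeholder for locator::host_id
using inet_address = std::string;  // placeholder for gms::inet_address

class effective_replication_map {
    std::vector<host_id> _replicas;                  // stored as host ids
    std::unordered_map<host_id, inet_address> _addr; // resolved on demand
public:
    // Called on the read/write path, so topology_state_load no longer
    // needs an IP for every host_id up front.
    std::vector<inet_address> get_endpoints_for_reading() const {
        std::vector<inet_address> ips;
        ips.reserve(_replicas.size());
        for (auto id : _replicas) {
            ips.push_back(_addr.at(id));
        }
        return ips;
    }
};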