Commit Graph

465 Commits

Author SHA1 Message Date
Avi Kivity
78fc3b5f56 config: rename stream_plan_ranges_percentage to *_fraction
The value is specified as a fraction between 0 and 1, so don't
mislead users into specifying a value between 0 and 100.

Closes #15261
2023-09-03 23:24:29 +03:00
Tomasz Grabiec
6d545b2f9e storage_service: Implement stream_tablet RPC
Performs streaming of data for a single tablet between two tablet
replicas. The node which gets the RPC is the receiving replica.
2023-07-25 21:08:51 +02:00
Asias He
dad5caf141 streaming: Add stream_plan_ranges_percentage
This option allows the user to change the number of ranges to stream in
a batch per stream plan.

Currently, each stream plan streams 10% of the total ranges.

With more ranges per stream plan, it reduces the waiting time between
two stream plans. For example,

stream_plan1: shard0 (t0), shard1 (t1)
stream_plan2: shard0 (t2), shard1 (t3)

We start stream_plan2 only after all shards finish streaming in
stream_plan1. If shard0 and shard1 in stream_plan1 finish at different
times, one of the shards will be idle.

If we stream more ranges in a single stream plan, the waiting time will
be reduced.
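
The waiting-time effect can be sketched with hypothetical per-shard timings (an illustration only, not scylla code):

```cpp
#include <algorithm>
#include <vector>

// Each stream plan acts as a barrier: the next plan starts only after every
// shard finished the current one, so total time is the sum over plans of the
// slowest shard in each. Fewer, larger plans mean fewer barriers to wait on.
long total_stream_time(const std::vector<std::vector<long>>& plans) {
    long total = 0;
    for (const auto& shard_times : plans) { // per-plan, per-shard durations
        total += *std::max_element(shard_times.begin(), shard_times.end());
    }
    return total;
}
```

With hypothetical timings, two plans of {5, 1} and {1, 5} take 10 time units because each plan waits for its slowest shard, while the same work merged into one plan of {6, 6} takes 6.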

Previously, we retried a stream plan if it failed, so smaller plans
meant cheaper retries; that's one of the reasons we wanted more stream
plans. With RBNO and 1f8b529e08 (range_streamer: Disable restream
logic), the restream factor is not important anymore.

Also, more ranges in a single stream plan will create bigger but fewer
sstables on the receiver side.

The default value is the same as before: 10% of the total ranges.

Fixes #14191

Closes #14402
2023-07-14 09:03:01 +03:00
Raphael S. Carvalho
8d58ff1be6 compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
Today, SSTable cleanup skips to the next partition, one at a time, when it finds
that the current partition is no longer owned by this node.

That's very inefficient because, when a cluster grows, existing
nodes lose multiple sequential tokens from their owned ranges. Another inefficiency
comes from fetching index pages spanning all unowned tokens, which was described
in #14317.

To solve both problems, cleanup now uses a multi-range reader, which
guarantees that it only processes owned data and, as a result, skips
unowned data. Cleanup thus scans an owned range and then fast-forwards
to the next one, until it's done with them all. This significantly
reduces the amount of index data cached, as the index is only consulted
at each range boundary.
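
The fast-forwarding idea can be sketched with simplified stand-in types (tokens are bare integers here, not the real dht types):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of the multi-range approach: instead of testing ownership one
// partition at a time, the reader is fast-forwarded from one owned range to
// the next, touching the index only at each range boundary.
using token_range = std::pair<int64_t, int64_t>; // [first, last], inclusive

std::vector<int64_t> cleanup_scan(const std::vector<int64_t>& tokens_in_sstable,
                                  const std::vector<token_range>& owned) {
    std::vector<int64_t> kept;
    auto it = tokens_in_sstable.begin();
    for (const auto& [first, last] : owned) { // owned ranges sorted, disjoint
        // fast-forward: one index lookup per range boundary
        it = std::lower_bound(it, tokens_in_sstable.end(), first);
        for (; it != tokens_in_sstable.end() && *it <= last; ++it) {
            kept.push_back(*it); // owned data is processed, unowned is skipped
        }
    }
    return kept;
}
```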

Without further ado,

before:

... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.

after:

... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.

Fixes #12998.
Fixes #14317.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-11 13:56:24 -03:00
Tomasz Grabiec
29cbdb812b dht: Rename dht::shard_of() to dht::static_shard_of()
This is in order to prevent new incorrect uses of dht::shard_of() from
being accidentally added. It also makes sure that all current uses are
caught by the compiler and require an explicit rename.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
fb0bdcec0c storage_proxy: Avoid multishard reader for tablets
Currently, the coordinator splits the partition range at vnode (or
tablet) boundaries and then tries to merge adjacent ranges which
target the same replica. This is an optimization which makes less
sense with tablets, which are supposed to be of substantial size. If
we don't merge the ranges, then with tablets we can avoid using the
multishard reader on the replica side, since each tablet lives on a
single shard.

The main reason to avoid a multishard reader is avoiding its
complexity, and avoiding adapting it to work with tablet
sharding. Currently, the multishard reader implementation makes
several assumptions about shard assignment which do not hold with
tablets. It assumes that shards are assigned in a round-robin fashion.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
28b972a588 dht: Make split_range_to_single_shard() prepared for tablet sharder
The function currently assumes that shard assignment for subsequent
tokens is round robin, which will not be the case for tablets. This
can lead to incorrect split calculation or infinite loop.

Another assumption was that subsequent splits returned by the sharder
have distinct shards. This also doesn't hold for tablets, which may
return the same shard for subsequent tokens. This assumption was
embedded in the following line:

  start_token = sharder.token_for_next_shard(end_token, shard);

If the range which starts with end_token is also owned by "shard",
token_for_next_shard() would skip over it.
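
A simplified model of the two assumptions (illustrative only, not the real sharder API): under round-robin, the next range of a given shard is a fixed stride away, so jumping a full cycle when the shard repeats is safe; under a tablet-style assignment, the very next range may belong to the same shard, and that jump skips it.

```cpp
#include <cstddef>
#include <vector>

// Round-robin shortcut implicitly relied on by the old code: with n shards
// assigned round-robin, the next range owned by `shard` after range `r` is a
// fixed stride away -- in particular, a full cycle away when `shard` also
// owns range r.
size_t next_range_round_robin(size_t r, unsigned shard, unsigned nr_shards) {
    unsigned cur = r % nr_shards;               // shard_of(r) under round-robin
    unsigned step = (shard - cur + nr_shards) % nr_shards;
    return r + (step == 0 ? nr_shards : step);  // same shard -> skip a full cycle
}

// Correct search under an arbitrary (e.g. tablet-style) assignment.
size_t next_range_generic(const std::vector<unsigned>& range_shards, size_t r, unsigned shard) {
    for (size_t i = r + 1; i < range_shards.size(); ++i) {
        if (range_shards[i] == shard) return i;
    }
    return range_shards.size();
}
```

For a round-robin layout {0, 1, 0, 1} both agree that the next range of shard 0 after range 0 is range 2; for a tablet-style layout {0, 0, 1, 1} the generic search finds range 1, which the round-robin shortcut would skip.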
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
390bcf3fae dht: Take sharder externally in splitting functions
We need those functions to work with tablet sharder, which is not
accessible through schema::get_sharder(). In order to propagate the
right sharder, those functions need to take it externally rather than
from the schema object. The sharder will come from the
effective_replication_map attached to the table object.

Those splitting functions are used when generating sharding metadata
of an sstable. We need to keep this sharding metadata consistent with
tablet mapping to shards in order for node restart to detect that
those sstables belong to a single shard and that resharding is not
necessary. Resharding of sstables based on tablet metadata is not
implemented yet and will abort after this series.

Keeping sharding metadata accurate for tablets is only necessary until
compaction group integration is finished. After that, we can use the
sstable token range to determine the owning tablet and thus the owning
shard. Before that, we can't, because a single sstable may contain
keys from different tablets, and the whole key range may overlap with
keys which belong to other shards.
2023-06-21 00:58:24 +02:00
Tomasz Grabiec
606a8ee2da dht: sharder: Document guarantees about mapping stability 2023-06-21 00:58:24 +02:00
Tomasz Grabiec
e8dd5e34c3 dht: sharder: Introduce next_shard()
The logic was extracted from ring_position_range_sharder::next(), and
the latter was changed to rely on sharder::next_shard().

The tablet sharder will have a different implementation for
next_shard(). This way, ring_position_range_sharder can work with both
current sharder and the tablet sharder.
2023-06-21 00:58:24 +02:00
Avi Kivity
9c37fdaca3 Revert "dht: incremental_owned_ranges_checker: use lower_bound()"
This reverts commit d85af3dca4. It
restores the linear search algorithm, as we expect the search to
terminate near the origin. In this case linear search is O(1)
while binary search is O(log n).

A comment is added so we don't repeat the mistake.

Closes #13704
2023-05-02 08:01:44 +03:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. The vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side: divide the resources of each replica shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the groundwork for them.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of the replication
    strategy; they just alter the way in which data ownership is
    managed. In theory, we could use them for other strategies as well,
    such as EverywhereReplicationStrategy. Currently, only
    NetworkTopologyStrategy is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method; it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined; the language now understands
that the comparison is symmetric.

fortunately, our operator!=() is always equivalent to
`!operator==()`, which matches the behavior of the
compiler-generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the rewritten operator!=, C++20 also brings us
the defaulted operator==(), which performs a member-wise
lexicographical comparison. under some circumstances, this is
exactly what we need. so, in this change, if an operator==() is
implemented as a lexicographical comparison of all member variables
of the class/struct in question, it is replaced with the
compiler-generated one by removing its body and marking the function
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the defaulted `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we failed to mark operator== with the `const`
specifier; in this change, to fulfill the requirements of the C++
standard, and to be more correct, the `const` specifier is added.

also, to default operator==, the parameter should
be `const class_name&`, but this was not always the case: in the
`version` class, we used `version` as the parameter type. to
fulfill the requirements of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and is a more idiomatic
way to pass a non-trivial struct as a function parameter.

please note, because in C++20 both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed:
they are the symmetric forms of another variant. if they were
not removed, the compiler would, for instance, find an ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Kefu Chai
5a11d67709 dht: token: s/tri_compare/operator<=>/
now that C++20 can generate the defaulted comparison operators for us,
there is no need to define them manually. and `std::rel_ops::*` are
deprecated in C++20.

also, use `foo <=> bar` instead of `tri_compare(foo, bar)` for better
readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-26 14:09:57 +08:00
Kefu Chai
cc87e10f40 dht: print pk in decorated_key with "pk" prefix
this change ensures that `dk._key` is formatted with the "pk" prefix.
in 3738fcb, the `operator<<` for partition_key was removed, so the
compiler has to find an alternative when this operator<< is called.
from the compiler's perspective, `partition_key` has an
`operator managed_bytes_view` which is not marked explicit, and
`managed_bytes_view` does support `operator<<`. so this ended up
changing the format of `decorated_key` when printed using
`operator<<`. the code compiles, but unfortunately the behavior
changed, and it broke scylla-dtest/cdc_tracing_info_test.py, where the
partition_key is supposed to be printed like "pk{010203}" instead of
"010203"; the latter is how `managed_bytes_view` is formatted.

a test is added accordingly to avoid future changes which break the
dtest.

Fixes scylladb#13628
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13653
2023-04-25 09:53:47 +02:00
Tomasz Grabiec
fa8ad9a585 dht: Use last_token_of_compaction_group() in split_token_range_msb() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
fceb5f8cf6 locator: Introduce tablet_metadata
token_metadata now stores tablet metadata with information about
tablets in the system.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
241f7febec dht: Introduce first_token() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
462e3ffd36 dht: Introduce next_token() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Kamil Braun
55f43e532c Merge 'get rid of gms/failure_detector' from Benny Halevy
Move gms::arrival_window to api/failure_detector, which is its only user,
and get rid of the rest, which is unused now that we use direct_failure_detector instead.

TODO: integrate direct_failure_detector with the failure_detector api.

Closes #13576

* github.com:scylladb/scylladb:
  gms: get rid of unused failure_detector
  api: failure_detector: remove false dependency on failure_detector::arrival_window
  test: rest_api: add test_failure_detector
2023-04-21 11:47:44 +02:00
Benny Halevy
3f1ac846d8 gms: get rid of unused failure_detector
The legacy failure_detector is now unused and can be removed.

TODO: integrate direct_failure_detector with the failure_detector api.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-21 09:08:27 +03:00
Kefu Chai
fe9f41bd84 dht: remove unnecessary forward declaration
it turns out the declaration of `operator<<(ostream&, const
dht::token&)` is unnecessary. so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 11:41:54 +08:00
Kefu Chai
53dedca8cd dht: specialize fmt::formatter<dht::token>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `dht::token` without the help of `operator<<`.

the corresponding `operator<<()` is preserved in this change; as it
has lots of users in this project, we will tackle them case-by-case in
follow-up changes.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 11:41:54 +08:00
Gleb Natapov
fd6d45e178 bootstrapper: Add get_random_bootstrap_tokens function
Does the same as get_bootstrap_tokens() but does not consult the
initial token config option. Will be used later.
2023-03-21 16:06:43 +02:00
Kefu Chai
c37f4e5252 treewide: use fmt::join() when appropriate
now that fmtlib provides fmt::join() (see
https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view)
there is no need to reinvent the wheel. so in this change, the homebrew
join() is replaced with fmt::join().

as fmt::join() returns a join_view, this can improve
performance under certain circumstances where the fully materialized
string is not needed.

please note, the goal of this change is to use fmt::join(); it
does not intend to improve the performance of the existing
"operator<<"-based implementations unless the new implementation is
much more complicated. we will address the unnecessarily materialized
strings in a follow-up commit.

some noteworthy things related to this change:

* unlike the existing `join()`, `fmt::join()` returns a view. so we
  have to materialize the view if what we expect is a `sstring`
* `fmt::format()` does not accept a view, so we cannot pass the
  return value of `fmt::join()` to `fmt::format()`
* fmtlib does not format a typed pointer, i.e., it does not format,
  for instance, a `const std::string*`, but operator<<() always prints
  a typed pointer. so if we want to format a typed pointer, we either
  need to cast the pointer to `void*` or use `fmt::ptr()`.
* fmtlib is not able to pick up the overload of
  `operator<<(std::ostream& os, const column_definition* cd)`, so we
  have to use a wrapper class of `maybe_column_definition` for printing
  a pointer to `column_definition`. since the overload is only used
  by the two overloads of
  `statement_restrictions::add_single_column_parition_key_restriction()`,
  the operator<< for `const column_definition*` is dropped.
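
For reference, the homebrew pattern being replaced looks roughly like this (a std-only sketch so it stands alone); fmt::join(v, ", ") produces the same text, but lazily, as a view formatted on demand:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Eagerly-materializing join, the shape of the homebrew helpers this commit
// replaces. fmt::join() avoids building this intermediate string when the
// result feeds straight into another formatter.
std::string join(const std::vector<std::string>& v, const std::string& sep) {
    std::ostringstream os;
    for (size_t i = 0; i < v.size(); ++i) {
        if (i) os << sep;
        os << v[i];
    }
    return os.str();
}
```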

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 20:34:18 +08:00
Kamil Braun
fe14d14ce9 Merge 'Eliminate extraneous copies of dht::token_range_vector' from Benny Halevy
In several places we copy token range vectors where we could move them and eliminate unnecessary memory copies.

Ref #11005

Closes #12344

* github.com:scylladb/scylladb:
  dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
  streaming: stream_session: maybe_yield
  streaming: stream_session: prepare: move token ranges to add_transfer_ranges
  streaming: stream_plan: transfer_ranges: move token ranges towards add_transfer_ranges
  dht/range_streamer: stream_async: do_streaming: move ranges downstream
  dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
  dht/range_streamer: get_range_fetch_map: reduce copies
  dht/range_streamer: add_ranges: move ranges down-stream
  dht/boot_strapper: move ranges to add_ranges
  dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
  dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
2023-03-07 13:46:33 +01:00
Kefu Chai
c5d1a69859 build: cmake: link couple libraries as whole archive
it turns out we are using static variables to register entries in
global registries, and these variables are not directly referenced,
so the linker just drops them when linking the executables or shared
libraries. to address this problem, we just link the whole archive.
another option would be to create a linker script or pass
--undefined=<symbol> to the linker; neither of them is straightforward.

a helper function is introduced to do this, as we cannot use CMake
3.24 as yet.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
563fbb2d11 build: cmake: extract more subsystem out into its own CMakeLists.txt
namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica,
service, tools, tracing and transport.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
d85af3dca4 dht: incremental_owned_ranges_checker: use lower_bound()
instead of using a while loop to find the lower bound,
just use std::lower_bound() to find whether the current node owns a
given token. this has several advantages:

* better readability, as lower_bound is exactly what this loop
  calculates.
* lower_bound uses binary search, which should be faster than a
  linear scan under most circumstances.
* lower_bound uses std::advance() and the prefix increment operator,
  which should be more performant than the postfix increment operator,
  as it does not create a temporary iterator.
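
The two search strategies side by side (a simplified sketch; `token` is a bare integer here). Note the later revert, 9c37fdaca3, restores the linear form precisely because the hit is expected near the origin, where linear scan is O(1) and binary search is O(log n).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using token = int64_t;
using citer = std::vector<token>::const_iterator;

// linear scan: O(1) when the first matching element is near the start
citer linear_lb(const std::vector<token>& v, token t) {
    auto it = v.begin();
    while (it != v.end() && *it < t) ++it;
    return it;
}

// binary search: O(log n) regardless of where the hit is
citer binary_lb(const std::vector<token>& v, token t) {
    return std::lower_bound(v.begin(), v.end(), t);
}
```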

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13008
2023-03-01 11:29:46 +02:00
Benny Halevy
06a0902708 dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
Currently the ranges_to_stream variable lives
in the caller's state, and do_streaming() moves its
contents down to request_ranges/transfer_ranges
and then calls clear() to make it ready for reuse.

This works in principle, but it makes it harder
for an occasional reader of this code to figure out
what is going on.

This change transfers ownership of the ranges_to_stream vector
to do_streaming, by calling it with std::exchange(ranges_to_stream, {}).
With that, the moved vector doesn't need to be cleared by
do_streaming, and the caller is responsible for readying
the variable for reuse in its for loop.
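
The hand-off pattern can be sketched as follows (types are illustrative stand-ins, not the real streaming API):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using range = int; // stand-in for a token range

// takes the vector by value: it owns the ranges from here on,
// so it never needs to clear the caller's state
std::size_t do_streaming(std::vector<range> ranges_to_stream) {
    return ranges_to_stream.size(); // ...stream them...
}

// std::exchange moves the contents into the callee and leaves a fresh,
// empty vector behind, ready for the caller's next loop iteration
std::size_t stream_once(std::vector<range>& ranges_to_stream) {
    return do_streaming(std::exchange(ranges_to_stream, {}));
}
```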

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:38:34 +02:00
Benny Halevy
775c6b9697 dht/range_streamer: stream_async: do_streaming: move ranges downstream
The ranges can be moved rather than copied to both
`request_ranges` and `transfer_ranges` as they are only cleared
after this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:56:55 +02:00
Benny Halevy
3cd8838a09 dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
After calling get_range_fetch_map, ranges_for_keyspace
is not used anymore.
Synchronously destroying it may potentially stall in large clusters,
so use utils::clear_gently to gently clear the map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
a80c2d16dd dht/range_streamer: get_range_fetch_map: reduce copies
Use const& to refer to the input ranges and endpoints
rather than copying them individually along the way
more than needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
9d6e5d50d1 dht/range_streamer: add_ranges: move ranges down-stream
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:27 +02:00
Benny Halevy
c61f058aa5 dht/boot_strapper: move ranges to add_ranges
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
27b382dcce dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
Update _nr_ranges_remaining incrementally rather than calling
nr_ranges_to_stream() inside `do_streaming`, as nr_ranges_to_stream
depends on `_to_stream`, which will only be updated later on, after the
next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
c3c7efffb1 dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
range_vec is used for calculating nr_ranges_to_stream.
Currently, the ranges_to_stream that were
moved out of range_vec are pushed back on exception,
but this isn't safe, since they may have already moved
to request_ranges or transfer_ranges.

Instead, erase the ranges we pass to do_streaming
only after it succeeds so on exception, range_vec
will not need adjusting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into the "types" directory.

the build system is updated accordingly.

the source files referencing `types.hh` were updated using the
following command:

```
find . \( -name "*.cc" -o -name "*.hh" \) -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```

the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so they now include
"sstables/types.hh" instead, which is more explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Botond Dénes
c927eea1d5 Merge 'table: trim ranges for compaction group cleanup' from Benny Halevy
This series contains the following changes, which trim the ranges passed to cleanup of a compaction group down to the token_range owned by that compaction group.

table: compaction_group_for_token: use signed arithmetic
Fixes #12595

table: make_compaction_groups: calculate compaction_group token ranges
table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
Fixes #12594

Closes #12598

* github.com:scylladb/scylladb:
  table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
  table: make_compaction_groups: calculate compaction_group token ranges
  dht: range_streamer: define logger as static
2023-01-30 13:11:28 +02:00
Benny Halevy
82011fc489 dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const
Its _it member keeps state about the current range.
Although it's modified by the method, this is an implementation
detail that irrelevant to the caller, hence mark the
belongs_to_current_node method as const (and noexcept while
at it).

This allows the caller, cleanup_compaction, to use it from
inside a const method, without having to mark
its respective member as mutable too.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12634
2023-01-25 14:52:21 +02:00
Benny Halevy
95a8e0b21d table: make_compaction_groups: calculate compaction_group token ranges
Add dht::split_token_range_msb that returns a token_range_vector
with ranges split using a given number of most-significant bits.

When creating the table's compaction groups, use dht::split_token_range_msb
to calculate the token_range owned by each compaction_group.
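
A sketch of the MSB-splitting idea (simplified; this is not the real dht::split_token_range_msb signature):

```cpp
#include <cstdint>

// With `msb` significant bits there are 2^msb equal slices of the token ring.
// The first token of slice i is obtained by placing i in the top bits and
// biasing back into signed token space, so slice 0 starts at the minimum
// token; slice i ends just before the first token of slice i+1.
int64_t slice_first_token(unsigned i, unsigned msb) {
    uint64_t u = (static_cast<uint64_t>(i) << (64 - msb)) + (1ULL << 63);
    return static_cast<int64_t>(u);
}
```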

Refs #12594

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Benny Halevy
912b56ebcf dht: range_streamer: define logger as static
dht::logger can't be global in this case,
as it's too generic, but should be static
to range_streamer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Benny Halevy
8009585e7d table: compaction_group_for_token: use signed arithmetic
Add and use dht::compaction_group_of that computes the
compaction_group index by unbiasing the token,
similar to dht::shard_of.

This way, all tokens in `_compaction_groups[i]` are ordered
before `_compaction_groups[j]` iff i < j.
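
A sketch of the unbiasing idea (the signature is assumed for illustration, not copied from the real dht::compaction_group_of):

```cpp
#include <cstdint>

// Taking the raw top bits of a signed token would order negative tokens
// after positive ones. Unbiasing first (adding 2^63, as dht::shard_of does)
// restores token order, so every token in group i is ordered before every
// token in group i+1.
unsigned compaction_group_of(unsigned most_significant_bits, int64_t token) {
    uint64_t unbiased = static_cast<uint64_t>(token) + (1ULL << 63); // min token -> 0
    return static_cast<unsigned>(unbiased >> (64 - most_significant_bits));
}
```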

Fixes #12595

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12599
2023-01-22 11:27:07 +02:00
Botond Dénes
50b155e706 dht/i_partitioner.hh: ring_position_ext: add weight() accessor 2023-01-09 09:46:57 -05:00
Benny Halevy
57ff3f240f dht: optimize subtract_ranges
Take advantage of the fact that both ranges and
ranges_to_subtract are deoverlapped and sorted
to reduce the calculation complexity from
quadratic to linear.
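
The linear-time idea can be sketched with simplified closed integer ranges (not the real dht range types): since both inputs are sorted and deoverlapped, one forward pass with two cursors suffices.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using rng = std::pair<long, long>; // [first, last], inclusive

// Subtract `to_sub` from `ranges`; both inputs sorted and deoverlapped.
// The subtrahend cursor only ever moves forward, so the pass is linear.
std::vector<rng> subtract_ranges(std::vector<rng> ranges, const std::vector<rng>& to_sub) {
    std::vector<rng> out;
    std::size_t j = 0;
    for (auto [a, b] : ranges) {
        // skip subtrahends entirely before [a, b]
        while (j < to_sub.size() && to_sub[j].second < a) ++j;
        for (std::size_t k = j; k < to_sub.size() && to_sub[k].first <= b; ++k) {
            if (to_sub[k].first > a) out.push_back({a, to_sub[k].first - 1});
            a = to_sub[k].second + 1; // continue past the subtracted piece
        }
        if (a <= b) out.push_back({a, b}); // remaining tail, if any
    }
    return out;
}
```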

Fixes #11922

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:28 +02:00
Benny Halevy
8b81635d95 compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation
The algorithm is generic and can be used elsewhere.

Add a unit test for the function before it gets
optimized in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:26 +02:00
Benny Halevy
10f8f13b90 db: view_update_generator: always clean up staging sstables
Since they are currently not cleaned up by cleanup compaction,
filter their tokens, processing only tokens owned by the
current node (based on the keyspace replication strategy).

Refs #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:38:22 +02:00
Benny Halevy
fd3e66b0cc compaction: extract incremental_owned_ranges_checker out to dht
It is currently used by cleanup_compaction partition filter.
Factor it out so it can be used to filter staging sstables in
the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:32:56 +02:00