scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	a1ea553fe1	code: Replace distributed<> with sharded<> The latter is recommended in seastar, and the former was left as compatibility alias. Latest seastar explicitly marks it as deprecated so once the submodule is updated, compilation logs will explode. Most of the patch is generated with for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done and a small manual change in test/perf/perf.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26136	2025-09-19 12:22:51 +02:00
Benny Halevy	cbad497859	locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et al. to static_effective_replication_map_ptr, in preparation for separating local_effective_replication_map from vnode_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-08-06 16:03:53 +03:00
Benny Halevy	33f34c8c32	dht: range_streamer: use naked e_r_m pointers Prepare for following patch that will separate the local effective replication map from vnode_effective_replication_map. The caller is responsible to keep the effective_replication_map_ptr alive while in use by low-level async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-08-06 13:34:23 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Gleb Natapov	41a57ed2e8	streaming: move streaming code to use host ids instead of host ips The patch is rather large, but it is a straightforward conversion from one type to another.	2024-12-15 11:31:11 +02:00
Kefu Chai	1ce58595aa	dht: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16891	2024-01-21 16:56:16 +02:00
Petr Gusev	7b55ccbd8e	token_metadata: drop the template Replace token_metadata2 ->token_metadata, make token_metadata back non-template. No behavior changes, just compilation fixes.	2023-12-12 23:19:54 +04:00
Petr Gusev	93263bf9e7	bootstrap: use new token_metadata Just mechanical changes to the new token_metadata. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Tomasz Grabiec	c228f2c940	range_streamer, tablets: Do not keep token metadata around streaming It holds back global token metadata barrier during streaming, which limits parallelism of load balancing.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	fd3c089ccc	service: range_streamer: Propagate topology_guard to receivers	2023-12-06 18:36:16 +01:00
Tomasz Grabiec	6d545b2f9e	storage_service: Implement stream_tablet RPC Performs streaming of data for a single tablet between two tablet replicas. The node which gets the RPC is the receiving replica.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	d3c9ad4ed6	locator: Rename effective_replication_map to vnode_effective_replication_map In preparation for introducing a more abstract effective_replication_map which can describe replication maps which are not based on vnodes.	2023-04-24 10:49:36 +02:00
Benny Halevy	27b382dcce	dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining Rather than calling nr_ranges_to_stream() inside `do_streaming`. As nr_ranges_to_stream depends on the `_to_stream` that will be updated only later on after the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:50:40 +02:00
Asias He	9ed401c4b2	streaming: Add finished percentage metrics for node ops using streaming We have added the finished percentage for repair based node operations. This patch adds the finished percentage for node ops using the old streaming. Example output: scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945 scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000 In addition to the metrics, log shows the percentage is added. [shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646 Fixes #11600 Closes #11601	2022-09-22 14:19:34 +03:00
Pavel Emelyanov	360c4f8608	dht: Carry dc-rack over boot_strapper and range_streamer Both classes may populate (temporary clones of) token metadata object with endpoint:tokens pairs for the endpoint they work with. Next patches will require that endpoint comes with the dc/rack info. This patch makes sure dht classes have the necessary information at hand (for now it's just empty pair of strings). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:37:02 +03:00
Benny Halevy	9b2af3f542	range_streamer: add_ranges and friends: get erm as param Rather than getting it in the callee, let the caller (e.g. storage_service) hold the erm and pass it down to potentially multiple async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Pavel Emelyanov	5e2fa32c8c	range_streamer: Get rack/datacenter from topology It's needed in source filter classes so range-streamer passes the topology reference into its methods. Nice side effect -- snitch header goes away from range-streamer one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-22 11:47:26 +03:00
Asias He	1f8b529e08	range_streamer: Disable restream logic Consider: - n1 and n2 in the cluster - n3 bootstraps to join - n1 does not hear gossip update from n3 due to network issue - n1 removes n3 from gossip and pending node list - stream between n1 and n3 fails - n1 and n3 network issue is fixed - n3 retry the stream with n1 - n3 finishes the stream with n1 - n3 advertises normal to join the cluster The problem is that n1 will not treat n3 as the pending node so writes will not route to n3 once n1 removes n3. Another problem is that when n1 gets normal gossip status update from n3. The gossip listener will fail because n1 has removed n3 so n1 could not find the host id for n3. This will cause n1 to abort. To fix, disable the retry logic in range_streamer so that once a stream with existing fails the bootstrap fails. The downside is that we lose the ability to restream caused by temporary network issue but since we have repair based node operation. We can use it to resume the previous failed node operations. Fixes: #9805 Closes #9806	2022-05-24 11:24:25 +03:00
Pavel Emelyanov	469ded71a9	bootstrapper: Get 'is-replacing' via argument too This also removes the only usage of this helper outside of the storage service. The place that needs it is the use_strict_sources_for_ranges() checker and all the callers of it are aware of whether replacing is happening or not. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-07 12:41:02 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Pavel Emelyanov	831f18e392	dht: Pass gossiper to range_streamer::add_ranges A continuation of the previous patch. The range_streamer needs gossiper too, and is called from boot_strapper and storage_service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:54:16 +03:00
Pavel Emelyanov	c593f8624d	dht: Keep stream_manager on board This is the preparation for the future patching. The stream_plan creation will need the manager reference, so keep one on dht object in advance. These are only created from the storage service bootstrap code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-24 12:17:37 +03:00
Pavel Emelyanov	5877b84a1a	range_streamer: Remove stream_plan from The streamer creates stream_plan "on demand" and doesn't use the on-board one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211112180335.27831-1-xemul@scylladb.com>	2021-11-12 19:38:45 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Emelyanov	ffc9cc9aec	range-streamer: Remove global storage service reference The reference is used by range streamer and (!) storage service itself to find out if the consistent_rangemovement option is ON/OFF. Both places already have the database with config at hands and can be simplified. v2: spellchecking Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210212095403.22662-1-xemul@scylladb.com>	2021-02-12 15:50:30 +01:00
Benny Halevy	63137b35ea	range_streamer: convert to token_metadata_ptr Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	569f2830c1	range_streamer: keep a const token_metadata& range_streamer doesn't need to modify token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does so in a more unified way. This commit replaces all the occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Asias He	81f0260816	range_streamer: Handle table of RF 1 in get_range_fetch_map After "Make replacing node take writes" series, with repair based node operations disabled, we saw the replace operation fail like: ``` [shard 0] init - Startup failed: std::runtime_error (unable to find sufficient sources for streaming range (9203926935651910749, +inf) in keyspace system_auth) ``` The reason is the system_auth keyspace has default RF of 1. It is impossible to find a source node to stream from for the ranges owned by the replaced node. In the past, the replace operation with keyspace of RF 1 passes, because the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) before streaming. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9021954492552185543, -9016289150131785593] exists on {127.0.0.6} ``` Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in range_streamer::get_range_fetch_map will pass if the source is the node itself. However, it will not stream from the node itself. As a result, the system_auth keyspace will not get any data. After the "Make replacing node take writes" series, the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) after the streaming finishes. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9049647518073030406, -9048297455405660225] exists on {127.0.0.5} ``` Since 127.0.0.5 was dead, the source node check failed, and so did the bootstrap operation. To fix, we ignore the keyspace of RF 1 when it is unable to find a source node to stream. Fixes #6351	2020-05-22 09:30:52 +08:00
Avi Kivity	c32f9a8f7b	dht: check for aborts during streaming Propagate the abort_source from main() into boot_strapper and range_stream and check for aborts at strategic points. This includes aborting running stream_plans and aborting sleeps between retries. Fixes #4674	2019-08-18 20:41:07 +03:00
Asias He	b2c110699e	gms: Remove i_failure_detector.hh It is not used any more.	2019-03-22 09:08:51 +08:00
Asias He	2b6a4050c2	dht: Do not use failure_detector::is_alive in failure_detector_source_filter Switch failure_detector_source_filter to use get_local_gossiper::is_alive directly since we are going to remove the static gms::get_local_failure_detector object soon. Pass the nodes that are down to the filter directly, so that the range_streamer does not depend on gossiper at all.	2019-03-22 08:26:47 +08:00
Asias He	7f826d3343	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>	2018-10-15 22:03:28 +01:00
Asias He	8edf3defdf	range_streamer: Futurize add_ranges It might take long time for get_all_ranges_with_sources_for and get_all_ranges_with_strict_sources_for to calculate which cause reactor stall. To fix, run them in a thread and yield. Those functions are used in the slow path, it is ok to yield more than needed. Fixes #3639 Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>	2018-10-09 09:46:50 +03:00
Asias He	95849371aa	range_streamer: Remove unordered_multimap usage We need the mapping between dht::token_range to std::vector<inet_address> and inet_address to dht::token_range_vector in various places. Currently, we use std::unordered_multimap and convert to std::unordered_map. It is better to use std::unordered_map in the first place. The changes like below: - Change from std::unordered_multimap<dht::token_range, inet_address> to std::unordered_map<dht::token_range, std::vector<inet_address>> - Change from std::unordered_multimap<inet_address, dht::token_range> to std::unordered_map<inet_address, dht::token_range_vector> Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>	2018-08-01 13:01:41 +03:00
Asias He	4a0b561376	storage_service: Get rid of moving operation The moving operation changes a node's token to a new token. It is supported only when a node has one token. The legacy moving operation is useful in the early days before the vnode is introduced where a node has only one token. I don't think it is useful anymore. In the future, we might support adjusting the number of vnodes to rebalance the token range each node owns. Removing it simplifies the cluster operation logic and code. Fixes #3475 Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>	2018-08-01 11:18:17 +03:00
Asias He	1f06ee3960	range_streamer: Limit nr of nodes to stream in parallel For example, to bootstrap a 50th node in a cluster [shard 0] range_streamer - Bootstrap with [127.0.0.8, 127.0.0.2, 127.0.0.24, 127.0.0.21, 127.0.0.49, 127.0.0.44, 127.0.0.9, 127.0.0.7, 127.0.0.47, 127.0.0.15, 127.0.0.5, 127.0.0.30, 127.0.0.14, 127.0.0.12, 127.0.0.36, 127.0.0.11, 127.0.0.48, 127.0.0.28, 127.0.0.33, 127.0.0.10, 127.0.0.41, 127.0.0.4, 127.0.0.40, 127.0.0.3, 127.0.0.6, 127.0.0.43, 127.0.0.22, 127.0.0.26, 127.0.0.42, 127.0.0.25, 127.0.0.17, 127.0.0.37, 127.0.0.23, 127.0.0.13, 127.0.0.38, 127.0.0.1, 127.0.0.18, 127.0.0.20, 127.0.0.39, 127.0.0.27, 127.0.0.34, 127.0.0.32, 127.0.0.19, 127.0.0.16, 127.0.0.31, 127.0.0.45, 127.0.0.29, 127.0.0.35, 127.0.0.46] for keyspace=keyspace1 started, nodes_to_stream=49, nodes_in_parallel=49 the new node will get data from 49 existing nodes. Currently, it will stream from all the 49 existing nodes at the same time. It is not a good idea to stream from all the nodes in parallel which can overwhelm the bootstrap node, i.e., 49 nodes sending, 1 node receiving. To fix this, limit the nr of nodes to stream in parallel. We should have a better control over the memory usage and parallelism. But for now, limit the nr of nodes to a maximum of 16 as a starter. With this limit, each shard can work with as many as 16 remote nodes in parallel, I think this has enough parallelism for streaming in terms of performance. This change have effect on the bootstrap/decommission/removenode node operations, and do not have effect on repair. Refs #2782 Message-Id: <980610dc97490d4f16281a0c3203b9bee73e04e4.1531989557.git.asias@scylladb.com>	2018-07-19 11:44:05 +03:00
Asias He	d23dafa7ac	dht: Remove column_families parameter in add_rx_ranges and add_tx_ranges In `4b1034b` (storage_service: Remove the stream_hints), we removed the only user of the api with the column_families parameter. std::vector column_families = { db::system_keyspace::HINTS }; streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint), column_families); We can simplify the code range_streamer a bit by removing it. Fixes #3476 Tests: dtest update_cluster_layout_tests.py Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>	2018-06-10 14:53:40 +03:00
Asias He	9b5585ebd5	range_streamer: Stream 10% of ranges instead of 10 ranges per time If there are a lot of ranges, e.g., num_tokens=2048, 10 ranges per stream plan will cause tons of stream plan to be created to stream data, each having very few data. This cause each stream plan has low transfer bandwidth, so that the total time to complete the streaming increases. It makes more sense to send a percentage of the total ranges per stream plan than a fixed ranges. Here is an example to stream a keyspace with 513 ranges in total, 10 ranges v.s. 10% ranges: Before: [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=system_traces, 510 out of 513 ranges: ranges = 51 [shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1 succeeded, took 107 seconds After: [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=system_traces, 510 out of 513 ranges: ranges = 10 [shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1 succeeded, took 22 seconds Message-Id: <a890b84fbac0f3c3cc4021e30dbf4cdf135b93ea.1520992228.git.asias@scylladb.com>	2018-03-14 10:12:12 +02:00
Asias He	6810031ba7	dht: Extend range_streamer interface After this patch and the following patches to use the new range_streamer interface, all the following cluster operations: - bootstrap - rebuild - decommission - removenode will use the same code to do the streaming. The range_streamer is now extended to support both fetch from and push to peer node. Another big change is now the range_streamer will stream less ranges at a time, so less data, per stream_plan and range_streamer will remember which ranges are failed to stream and can retry later. The retry policy is very simple at the moment it retries at most 5 times and sleep 1 minutes, 1.5^2 minutes, 1.5^3 minutes .... Later, we can introduce api for user to decide when to stop retrying and the retry interval. The benefits: - All the cluster operation shares the same code to stream - We can know the operation progress, e.g., we can know total number of ranges need to be streamed and number of ranges finished in bootstrap, decommission and etc. - All the cluster operation can survive peer node down during the operation which usually takes long time to complete, e.g., when adding a new node, currently if any of the existing node which streams data to the new node had issue sending data to the new node, the whole bootstrap process will fail. After this patch, we can fix the problematic node and restart it, the joining node will retry streaming from the node again. - We can fail streaming early and timeout early and retry less because all the operations use stream can survive failure of a single stream_plan. It is not that important for now to have to make a single stream_plan successful. Note, another user of streaming, repair, is now using small stream_plan as well and can rerun the repair for the failed ranges too. This is one step closer to supporting the resumable add/remove node operations.	2017-08-07 16:31:47 +08:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	bdd6a69af7	streaming: Drop unused parameters - int connections_per_host Scylla does not create connections per stream_session, instead it uses rpc, thus connections_per_host is not relevant to scylla. - bool keep_ss_table_level - int repaired_at Scylla does not stream sstable files. They are not relevant to scylla.	2016-01-25 11:38:13 +08:00
Asias He	0af7fb5509	range_streamer: Kill FIXME in use_strict_consistency for consistent_rangemovement	2015-11-30 09:15:42 +08:00
Asias He	6aa5bfe59f	range_streamer: Add virtual destructor to i_source_filter Found by debug build ==10190==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x602000084430 in thread T0: object passed to delete has wrong type: size of the allocated type: 16 bytes; size of the deallocated type: 8 bytes. #0 0x7fe244add512 in operator delete(void*, unsigned long) (/lib64/libasan.so.2+0x9a512) #1 0x3c674fe in std::default_delete<dht::range_streamer::i_source_filter>::operator()(dht::range_streamer::i_source_filter*) const /usr/include/c++/5.1.1/bits/unique_ptr.h:76 #2 0x3c60584 in std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >::~unique_ptr() /usr/include/c++/5.1.1/bits/unique_ptr.h:236 #3 0x3c7ac22 in void __gnu_cxx::new_allocator<std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> > >::destroy<std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> > >(std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >*) /usr/include/c++/5.1.1/ext/new_allocator.h:124 ...	2015-11-12 11:19:22 +02:00
Asias He	d166b0f3fa	range_streamer: Add get_work_map	2015-11-09 08:43:04 +08:00
Asias He	b172146223	range_streamer: Introduce single_datacenter_filter	2015-10-23 16:13:30 +08:00


60 Commits