It might take a long time for get_all_ranges_with_sources_for and
get_all_ranges_with_strict_sources_for to compute, which causes reactor
stalls. To fix this, run them in a thread and yield. These functions are
used in the slow path, so it is OK to yield more often than strictly
needed.
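A minimal sketch of the pattern, with a hypothetical heavy loop;
seastar::async and seastar::thread::maybe_yield() are the usual Seastar
primitives for this:

    #include <seastar/core/future.hh>
    #include <seastar/core/thread.hh>
    #include <cstddef>
    #include <vector>

    // Hypothetical heavy computation: running it inside a seastar thread
    // lets it yield to the reactor between iterations instead of
    // stalling it for the whole loop.
    seastar::future<size_t> count_candidate_ranges(std::vector<int> ranges) {
        return seastar::async([ranges = std::move(ranges)] {
            size_t n = 0;
            for (int r : ranges) {
                if (r > 0) {
                    ++n;
                }
                seastar::thread::maybe_yield(); // let pending tasks run
            }
            return n;
        });
    }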
Fixes #3639
Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>
(cherry picked from commit 8edf3defdf)
On receiving a mutation_fragment or a mutation triggered by a streaming
operation, we pass an enum stream_reason to tell the receiver what the
streaming is used for, so that the receiver can decide on further
operations beyond applying the streamed data to disk, e.g., sending view
updates.
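A rough sketch of the idea; the enumerator names and the handler here
are illustrative, not the exact set used in the tree:

    #include <cstdint>

    // Sent along with each streamed mutation/mutation_fragment so the
    // receiver knows what the streaming is used for.
    enum class stream_reason : uint8_t {
        unspecified,
        bootstrap,
        decommission,
        repair,
    };

    // Receiver side: apply the data, then decide on extra work.
    void on_receive_fragment(/* mutation_fragment mf, */ stream_reason reason) {
        // apply the fragment to disk here, then:
        if (reason == stream_reason::repair) {
            // e.g., generate and send view updates
        }
    }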
Fixes #3276
Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>
(cherry picked from commit 7f826d3343)
We need the mappings from dht::token_range to std::vector<inet_address>
and from inet_address to dht::token_range_vector in various places.
Currently, we use std::unordered_multimap and convert it to
std::unordered_map. It is better to use std::unordered_map in the first
place. The changes look like this:
- Change from
std::unordered_multimap<dht::token_range, inet_address>
to
std::unordered_map<dht::token_range, std::vector<inet_address>>
- Change from
std::unordered_multimap<inet_address, dht::token_range>
to
std::unordered_map<inet_address, dht::token_range_vector>
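For illustration, building one of the maps directly; the types here are
simplified stand-ins for gms::inet_address and dht::token_range:

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    using inet_address = std::string;                 // stand-in for gms::inet_address
    using token_range = std::pair<int64_t, int64_t>;  // stand-in for dht::token_range
    using token_range_vector = std::vector<token_range>;

    // Group ranges by endpoint in one pass, with no multimap and no
    // conversion step afterwards.
    std::unordered_map<inet_address, token_range_vector>
    group_by_endpoint(const std::vector<std::pair<inet_address, token_range>>& pairs) {
        std::unordered_map<inet_address, token_range_vector> m;
        for (const auto& [ep, range] : pairs) {
            m[ep].push_back(range);
        }
        return m;
    }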
Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>
The moving operation changes a node's token to a new token. It is
supported only when a node has one token. The legacy moving operation
was useful in the early days, before vnodes were introduced, when a node
had only one token. I don't think it is useful anymore.
In the future, we might support adjusting the number of vnodes to
rebalance the token ranges each node owns.
Removing it simplifies the cluster operation logic and code.
Fixes #3475
Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>
For example, to bootstrap the 50th node in a cluster:
[shard 0] range_streamer - Bootstrap with
[127.0.0.8, 127.0.0.2, 127.0.0.24, 127.0.0.21, 127.0.0.49, 127.0.0.44,
127.0.0.9, 127.0.0.7, 127.0.0.47, 127.0.0.15, 127.0.0.5, 127.0.0.30,
127.0.0.14, 127.0.0.12, 127.0.0.36, 127.0.0.11, 127.0.0.48, 127.0.0.28,
127.0.0.33, 127.0.0.10, 127.0.0.41, 127.0.0.4, 127.0.0.40, 127.0.0.3,
127.0.0.6, 127.0.0.43, 127.0.0.22, 127.0.0.26, 127.0.0.42, 127.0.0.25,
127.0.0.17, 127.0.0.37, 127.0.0.23, 127.0.0.13, 127.0.0.38, 127.0.0.1,
127.0.0.18, 127.0.0.20, 127.0.0.39, 127.0.0.27, 127.0.0.34, 127.0.0.32,
127.0.0.19, 127.0.0.16, 127.0.0.31, 127.0.0.45, 127.0.0.29, 127.0.0.35,
127.0.0.46]
for keyspace=keyspace1 started, nodes_to_stream=49, nodes_in_parallel=49
The new node will get data from 49 existing nodes.
Currently, it will stream from all 49 existing nodes at the same time.
It is not a good idea to stream from all the nodes in parallel, since
that can overwhelm the bootstrapping node, i.e., 49 nodes sending, 1
node receiving.
To fix this, limit the number of nodes to stream from in parallel. We
should have better control over memory usage and parallelism, but for
now, limit the number of nodes to a maximum of 16 as a starting point.
With this limit, each shard can work with as many as 16 remote nodes in
parallel; I think this gives enough parallelism for streaming
performance.
This change affects the bootstrap/decommission/removenode operations and
does not affect repair.
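A sketch of the batching idea; the names are hypothetical, and the real
code wires the batches into stream plans:

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <vector>

    constexpr size_t nodes_in_parallel_max = 16;

    // Split the source nodes into batches of at most 16. Each batch is
    // streamed from in parallel; the batches run one after another.
    std::vector<std::vector<std::string>>
    make_node_batches(const std::vector<std::string>& nodes) {
        std::vector<std::vector<std::string>> batches;
        for (size_t i = 0; i < nodes.size(); i += nodes_in_parallel_max) {
            size_t end = std::min(i + nodes_in_parallel_max, nodes.size());
            batches.emplace_back(nodes.begin() + i, nodes.begin() + end);
        }
        return batches;
    }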
Refs #2782
Message-Id: <980610dc97490d4f16281a0c3203b9bee73e04e4.1531989557.git.asias@scylladb.com>
In 4b1034b (storage_service: Remove the stream_hints), we removed the
only user of the API with the column_families parameter:

    std::vector<sstring> column_families = { db::system_keyspace::HINTS };
    streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint),
                            column_families);

We can simplify the range_streamer code a bit by removing it.
Fixes #3476
Tests: dtest update_cluster_layout_tests.py
Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>
If there are a lot of ranges, e.g., num_tokens=2048, 10 ranges per
stream plan will cause tons of stream plans to be created, each carrying
very little data. This gives each stream plan low transfer bandwidth, so
the total time to complete the streaming increases.
It makes more sense to send a percentage of the total ranges per stream
plan than a fixed number of ranges.
Here is an example of streaming a keyspace with 513 ranges in total,
10 ranges vs. 10% of the ranges:
Before:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 51
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 107 seconds
After:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 10
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 22 seconds
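Computing the per-plan range count then becomes a fraction of the total
rather than a constant; a hypothetical helper:

    #include <algorithm>
    #include <cstddef>

    // 10% of the total ranges per stream plan, but at least one range;
    // e.g., 513 total ranges -> 51 ranges per plan instead of a fixed 10.
    size_t ranges_per_stream_plan(size_t total_ranges) {
        return std::max<size_t>(total_ranges / 10, 1);
    }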
Message-Id: <a890b84fbac0f3c3cc4021e30dbf4cdf135b93ea.1520992228.git.asias@scylladb.com>
The address and the keyspace in the log message were swapped.
Before:
range_streamer - Bootstrap with ks3 for keyspace=127.0.0.1 succeeded,
took 56 seconds
After:
range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded,
took 56 seconds
Message-Id: <5c49646f1fbe45e3a1e7545b8470e04b166922c4.1520416042.git.asias@scylladb.com>
Make it use get_endpoint_state_for_endpoint_ptr(), check whether the
gossiper is enabled, mark it as const, and have some callers use it
instead of open-coding the logic.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
gossiper::get_endpoint_state_for_endpoint() returns a copy of
endpoint_state, which we've seen can be very expensive.
This patch adds a similar function which returns a pointer instead,
and changes the call sites where using the pointer-returning variant
is deemed safe (the pointer neither escapes the function nor crosses
any defer point).
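The shape of the change, on a simplified stand-in for the gossiper's
state map; the names and types here are illustrative:

    #include <string>
    #include <unordered_map>

    struct endpoint_state {
        std::string heartbeat_and_app_states; // stands in for the heavy payload
    };

    struct gossiper {
        std::unordered_map<std::string, endpoint_state> _state;

        // Copy-returning variant: every call copies the whole state.
        endpoint_state get_state_copy(const std::string& ep) const {
            return _state.at(ep);
        }

        // Pointer-returning variant: cheap, but the caller must not let
        // the pointer escape the function or cross a preemption point.
        const endpoint_state* get_state_ptr(const std::string& ep) const {
            auto it = _state.find(ep);
            return it == _state.end() ? nullptr : &it->second;
        }
    };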
Fixes #764
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
It skipped one sub-range in each 10-range batch, and tried to access
the range vector through its end() iterator.
Fixes sporadic failures of
update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test.
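For reference, a batching loop of this shape that neither skips an
element at a batch boundary nor steps past end(); this is illustrative,
not the patched code:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Process `ranges` in batches of 10.
    void process_in_batches(const std::vector<int>& ranges) {
        constexpr size_t batch_size = 10;
        for (size_t i = 0; i < ranges.size(); i += batch_size) {
            size_t end = std::min(i + batch_size, ranges.size());
            for (size_t j = i; j < end; ++j) {
                // handle ranges[j]
            }
        }
    }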
Message-Id: <1505848902-16734-1-git-send-email-tgrabiec@scylladb.com>
After this patch and the following patches that use the new
range_streamer interface, all of the following cluster operations:
- bootstrap
- rebuild
- decommission
- removenode
will use the same code to do the streaming.
The range_streamer is now extended to support both fetching from and
pushing to peer nodes. Another big change is that the range_streamer now
streams fewer ranges at a time, so less data per stream_plan, and it
remembers which ranges failed to stream so that it can retry them later.
The retry policy is very simple at the moment: it retries at most 5
times and sleeps 1 minute, 1.5^2 minutes, 1.5^3 minutes, and so on.
Later, we can introduce an API for the user to decide when to stop
retrying and what the retry interval should be.
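A sketch of such a backoff schedule; this is one plausible reading of
the intervals above, not the exact code:

    #include <chrono>
    #include <cmath>

    using minutes_d = std::chrono::duration<double, std::ratio<60>>;

    constexpr int max_retries = 5;

    // Retry n (starting at 0) waits roughly 1.5^n minutes:
    // 1 min, 1.5 min, 2.25 min, 3.375 min, ...
    minutes_d retry_interval(int n) {
        return minutes_d(std::pow(1.5, n));
    }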
The benefits:
- All the cluster operations share the same streaming code.
- We can track the operation's progress, e.g., we can know the total
  number of ranges that need to be streamed and the number of ranges
  already finished in bootstrap, decommission, etc.
- All the cluster operations can survive a peer node going down during
  the operation, which usually takes a long time to complete. E.g., when
  adding a new node, currently if any of the existing nodes streaming
  data to the new node has a problem sending data, the whole bootstrap
  process will fail. After this patch, we can fix the problematic node
  and restart it, and the joining node will retry streaming from that
  node again.
- We can fail streaming early, time out early, and retry less, because
  all the operations that use streaming can survive the failure of a
  single stream_plan. It is no longer critical that every single
  stream_plan succeed. Note that another user of streaming, repair, now
  uses small stream_plans as well and can rerun the repair for the
  failed ranges too.
This is one step closer to supporting resumable add/remove node
operations.
Wrapping ranges are a pain, so we are moving wrap handling to the edges.
Since CQL can't generate wrapping ranges, this means Thrift and the ring
maintenance code; also, range->ring transformations need to merge the
first and last ranges.
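The unwrap transformation at the edge looks roughly like this, with a
simplified token model (the real code works on dht::token_range):

    #include <cstdint>
    #include <vector>

    // Simplified stand-in: a range over tokens (start, end], wrapping
    // around the ring when start >= end.
    struct range {
        int64_t start;
        int64_t end;
    };

    // Split a wrapping range into two non-wrapping ranges at the ring's
    // min/max token, so the rest of the code never sees a wrap.
    std::vector<range> unwrap(range r) {
        if (r.start < r.end) {
            return {r};                   // already non-wrapping
        }
        return {
            {r.start, INT64_MAX},         // (start, max]
            {INT64_MIN, r.end},           // (min, end]
        };
    }

The reverse direction, range->ring, merges the (start, max] and
(min, end] pieces back into one wrapping range.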
Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
We currently log as follows:
May 9 00:09:13 node3.nl scylla[2546]: [shard 0] storage_service - This
node was decommissioned and will not rejoin the ring unless
cassandra.override_decommission=true has been set,or all existing data
is removed and the node is bootstrapped again
However, the user should use
override_decommission: true
instead of
cassandra.override_decommission: true
in scylla.yaml, where the cassandra prefix is stripped.
Fixes #1240
Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>
messaging_service will automatically use the private IP address to
connect to a peer node if possible. There is no need for an upper layer
like streaming to worry about it. Dropping it simplifies things a bit.
std::set_difference requires its input ranges to be sorted, which is
not the case here; use remove_if instead.
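For unsorted input, the erase/remove_if idiom does the same job without
the sortedness precondition; the element type here is illustrative:

    #include <algorithm>
    #include <unordered_set>
    #include <vector>

    // Remove every element of `to_remove` from the unsorted `items`,
    // with no need to sort either container first.
    void subtract(std::vector<int>& items,
                  const std::unordered_set<int>& to_remove) {
        items.erase(std::remove_if(items.begin(), items.end(),
                        [&](int x) { return to_remove.count(x) > 0; }),
                    items.end());
    }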
Do not use assert; throw an exception instead, so that we can recover
from this error.