Commit Graph

184 Commits

Asias He
0c56bbe793 gossip: Make get_supported_features and wait_for_feature_on{_all}_node private
They are used only inside gossiper itself. Also make the helper
get_supported_features(std::unordered_map<gms::inet_address, sstring>) static.

Message-Id: <f434c145ad9138084708b60c1d959b84360e47b2.1467775291.git.asias@scylladb.com>
2016-07-06 09:54:56 +03:00
Asias He
bb80362c3f gossip: Insert with result.end() in get_supported_features
It is faster than result.begin(), as suggested by Avi.
2016-07-05 10:09:54 +08:00
Asias He
72cb4a228b gossip: Add to_feature_set helper
To convert a ","-separated feature string into a feature set.
2016-07-05 10:09:54 +08:00
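The helper described above might be sketched like this. This is an illustrative std-library version, not the actual Scylla code (which uses sstring and boost::split); it also drops empty tokens, the pitfall fixed in commit 32ed468e42 below:

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical sketch of a to_feature_set-style helper: split a
// comma-separated feature string into a set. Empty tokens are dropped,
// so "" maps to {} rather than {""}.
std::set<std::string> to_feature_set(const std::string& features_string) {
    std::set<std::string> result;
    std::istringstream in(features_string);
    std::string token;
    while (std::getline(in, token, ',')) {
        if (!token.empty()) {
            result.insert(token);
        }
    }
    return result;
}
```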
Asias He
1d6c57fb40 gossip: Reduce timeout in shadow round
In 3a36ec33db (gossip: Wait longer for seed node during boot up), we
increased the timeout by a factor of 60, i.e., ring_delay * 60 = 5
seconds * 60 = 5 minutes.

In 57ee9676c2 (storage_service: Fix default ring_delay time), we fixed
the default ring_delay to 30 seconds. Now the timeout is 30 * 60 seconds
= 30 minutes, which is too long.

Make it 5 minutes.
2016-07-05 10:09:54 +08:00
Asias He
88f0bb3a7b gossip: Add check_knows_remote_features
To check whether this node knows the features listed in the
std::unordered_map<inet_address, sstring> peer_features_string map.
2016-07-05 10:09:54 +08:00
Asias He
2b53c50c15 gossip: Add get_supported_features
To get features supported by all the nodes listed in the
address/feature map.
2016-07-05 10:09:53 +08:00
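The "features supported by all nodes" computation is essentially a set intersection over the per-node feature sets. A hedged sketch with simplified types (std::string addresses instead of gms::inet_address, not the actual gossiper code):

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Illustrative sketch: intersect the feature sets of every node in an
// address -> features map to get the cluster-wide common features.
std::set<std::string> common_features(
        const std::map<std::string, std::set<std::string>>& node_features) {
    std::set<std::string> result;
    bool first = true;
    for (const auto& [addr, features] : node_features) {
        if (first) {
            result = features;  // start from the first node's set
            first = false;
        } else {
            std::set<std::string> tmp;
            std::set_intersection(result.begin(), result.end(),
                                  features.begin(), features.end(),
                                  std::inserter(tmp, tmp.end()));
            result = std::move(tmp);
        }
    }
    return result;
}
```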
Asias He
4f3ce42163 storage_service: Prevent old version node to join a new version cluster
We want to prevent an older version of scylla, which has fewer features,
from joining a cluster running a newer version of scylla with more
features, because once scylla sees a feature enabled on all other nodes,
it starts to use the feature and assumes existing and future nodes will
always have it.

To support downgrade during a rolling upgrade, we still need to support
the case of mixed old and new nodes.

1) All old nodes
O O O O O <- N   OK
O O O O O <- O   OK

2) All new nodes
N N N N N <- N   OK
N N N N N <- O   FAIL

3) Mixed old and new nodes
O N O N O <- N   OK
O N O N O <- O   OK

(O == old node, N == new node, <- == joining the cluster)

With this patch, I tested:

1.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}

1.2) Add old node to old node cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}

2.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}

2.2) Add old node to new node cluster
seastar - Exiting on unhandled exception: std::runtime_error (Feature
check failed. This node can not join the cluster because it does not
understand the feature. Local node 127.0.0.4 features = {}, Remote
common_features = {RANGE_TOMBSTONES})

3.1) Add new node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {}

3.2) Add old node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}

Fixes #1253
2016-06-17 10:49:45 +08:00
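The check described above boils down to a subset test: the joining node's local feature set must contain every feature already common to the cluster. A minimal sketch with illustrative names (not the actual gossiper API), mirroring the error message in test case 2.2:

```cpp
#include <algorithm>
#include <set>
#include <stdexcept>
#include <string>

// Hedged sketch of the join-time feature check: a joining node must
// understand every feature that is already common to the cluster.
void check_can_join(const std::set<std::string>& local_features,
                    const std::set<std::string>& remote_common_features) {
    // std::includes: is remote_common_features a subset of local_features?
    if (!std::includes(local_features.begin(), local_features.end(),
                       remote_common_features.begin(),
                       remote_common_features.end())) {
        throw std::runtime_error("Feature check failed: this node cannot join "
                                 "the cluster because it does not understand "
                                 "all cluster-wide features");
    }
}
```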
Asias He
32ed468e42 gossip: Remove empty string feature in get_supported_features
If the feature string is empty, boost::split will return
std::set<sstring> = {""} instead of std::set<sstring> = {},
which will make a node with a feature, e.g. std::set<sstring> =
{"RANGE_TOMBSTONES"}, think it does not understand the feature of
a node with no features at all.
2016-06-17 10:49:45 +08:00
Duarte Nunes
17a544c4a6 gossip: Add feature default ctor and operator=
This allows a feature to be declared and initialized later.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
2c82dcd309 gossip: Decouple feature lifetime from the gossiper
This patch changes the gms::feature destructor so it
checks whether the gossiper has been stopped before trying
to unregister the feature.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Asias He
f27e5d2a68 messaging_service: Delay listening ms during boot up
When a node starts up, a peer node can send a gossip syn message to it
before the gossip message handlers are registered in messaging_service.

We can see:

  scylla[123]:  [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored

To fix, we delay messaging_service's listening until the gossip message
handlers are registered.
Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>
2016-05-31 12:28:11 +03:00
Duarte Nunes
f613dabf53 gossip: Introduce the gms::feature class
This class encapsulates the waiting for a cluster feature. A feature
object is registered with the gossiper, which is responsible for later
marking it as enabled.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Duarte Nunes
4684b8ecbb gossip: Refactor waiting for features
This patch replaces the sleep-based mechanism of detecting new features
with waiters registered on a condition variable that is signaled
whenever new endpoint information is received.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Duarte Nunes
422f244172 gossip: Don't timeout when waiting for features
This patch removes the timeout when waiting for features,
since future patches will make this argument unnecessary.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Duarte Nunes
b3011c9039 gossip: Rename set_heart_beat_state
...to set_heart_beat_state_and_update_timestamp in order to make it
explicit to callers that the update_timestamp is also changed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-3-git-send-email-duarte@scylladb.com>
2016-05-27 09:11:39 +03:00
Duarte Nunes
8c0e2e05b7 gossip: Fix modification to shadow endpoint state
This patch fixes an inadvertent change to the shadow endpoint state
map in gossiper::run, done by calling get_heart_beat_state() which
also updates the endpoint state's timestamp. This did not happen for
the normal map, but did happen for the shadow map. As a result, every
time gossiper::run() was scheduled, endpoint_map_changed would always
be true and all the shards would make superfluous copies of the
endpoint state maps.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-2-git-send-email-duarte@scylladb.com>
2016-05-27 09:10:38 +03:00
Asias He
fed1e65e1e gossip: Do not insert the same node into _live_endpoints_just_added
_live_endpoints_just_added tracks peer nodes which have just become live.
When a down node comes back, the peer nodes can receive multiple messages
which would mark the node up, e.g., messages piled up in the sender's
tcp stack after a node was blocked with gdb and then released. Each such
message triggers an echo message, and when the reply to the echo
message is received (real_mark_alive), the same node is added to
_live_endpoints_just_added more than once. Thus, we see the
same node being favored more than once:

INFO  2016-04-12 12:09:57,399 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:09:58,412 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:09:59,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:00,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:01,430 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:02,442 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:03,454 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2

To fix, do not insert the node if it is already in
_live_endpoints_just_added.

Fixes #1178
Message-Id: <6bcfad4430fbc63b4a8c40ec86a2744bdfafb40f.1464161975.git.asias@scylladb.com>
2016-05-25 14:19:40 +03:00
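The fix itself is a guard against duplicate insertion. A simplified sketch (strings instead of gms::inet_address, a plain vector instead of the actual container):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch of the fix: only append a node to the "just added" list if it
// is not already there, so repeated echo replies are idempotent.
void add_just_added(std::vector<std::string>& live_endpoints_just_added,
                    const std::string& node) {
    if (std::find(live_endpoints_just_added.begin(),
                  live_endpoints_just_added.end(), node)
            == live_endpoints_just_added.end()) {
        live_endpoints_just_added.push_back(node);
    }
}
```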
Gleb Natapov
7a54b5ebbb gossiper: cleanup mark_alive() even more
Message-Id: <20160519100513.GE984@scylladb.com>
2016-05-19 12:47:19 +02:00
Asias He
eb9ac9ab91 gms: Optimize gossiper::is_alive
In a perf flame graph, I saw that within

service::storage_proxy::create_write_response_handler (2.66% cpu)

  gossiper::is_alive takes 0.72% cpu
  locator::token_metadata::pending_endpoints_for takes 1.2% cpu

After this patch:

service::storage_proxy::create_write_response_handler (2.17% cpu)

  gossiper::is_alive does not show up at all
  locator::token_metadata::pending_endpoints_for takes 1.3% cpu

There is no need to copy the endpoint_state from the endpoint_state_map
to check if a node is alive. Optimize it since gossiper::is_alive is
called in the fast path.

Message-Id: <2144310aef8d170cab34a2c96cb67cabca761ca8.1463540290.git.asias@scylladb.com>
2016-05-18 10:12:38 +03:00
Gleb Natapov
76e0eb426e gossiper: simplify mark_alive()
The code runs in a thread, so there is no need to use the heap to
communicate between statements.

Message-Id: <20160517120245.GK984@scylladb.com>
2016-05-17 15:37:21 +03:00
Paweł Dziepak
0d3d0a3c08 gossiper: handle failures in gossiper thread creation
seastar::async() creates a seastar thread and, to do that, allocates
memory. That allocation may obviously fail, so the error handling code
needs to be moved so that it also catches errors from thread creation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Pekka Enberg
47a904c0f6 Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.

The idea is to have new application state SUPPORTED_FEATURES that will contain
set of strings, each string holding feature name.

This series adds API to do so.

The following patch on top of this series demonstrates how to wait for
features during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, the wait for it will not succeed and will
time out.

   --- a/service/storage_service.cc
   +++ b/service/storage_service.cc
   @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
        // Add features supported by this local node. When a new feature is
        // introduced in scylla, update it here, e.g.,
        // return sstring("FEATURE1,FEATURE2")
   -    return sstring("");
   +    return sstring("FEATURE1,FEATURE2");
    }

    std::set<inet_address> get_seeds() {
   @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
        // gossip snitch infos (local DC and rack)
        gossip_snitch_info().get();

   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
   +    logger.info("Wait for FEATURE1 and FEATURE2 done");
   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
   +    logger.info("Wait for FEATURE3 done");
   +

We can query the supported_features:

    cqlsh> SELECT supported_features from system.peers;

     supported_features
    --------------------
      FEATURE1,FEATURE2
      FEATURE1,FEATURE2

    (2 rows)
    cqlsh> SELECT supported_features from system.local;

     supported_features
    --------------------
      FEATURE1,FEATURE2

    (1 rows)"
2016-04-08 09:22:50 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Asias He
04e8727793 gossip: Introduce wait_for_feature_on_{all}_node
API to wait for features to become available on a node or on all the
nodes in the cluster.

$timeout specifies how long we want to wait. If the features are not
available yet, sleep 2 seconds and retry.
2016-04-06 07:12:34 +08:00
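The sleep-and-retry control flow described above can be sketched as a blocking loop. The real implementation is future-based Seastar code with a 2-second retry; the names and the get_cluster_features callback here are illustrative assumptions:

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <thread>

// Simplified, blocking sketch of the wait_for_feature_on_all_node idea:
// poll until every wanted feature is supported cluster-wide, sleeping
// between retries, and fail once the timeout is exceeded.
void wait_for_features(
        const std::set<std::string>& wanted,
        const std::function<std::set<std::string>()>& get_cluster_features,
        std::chrono::milliseconds timeout,
        std::chrono::milliseconds retry_interval) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        auto have = get_cluster_features();
        if (std::includes(have.begin(), have.end(),
                          wanted.begin(), wanted.end())) {
            return;  // all wanted features are available
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            throw std::runtime_error("timed out waiting for features");
        }
        std::this_thread::sleep_for(retry_interval);
    }
}
```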
Asias He
1e437e925c gossip: Introduce get_supported_features
- Get features supported by this particular node

  std::set<sstring> get_supported_features(inet_address endpoint) const;

- Get features supported by all the nodes this node knows about

  std::set<sstring> get_supported_features() const;
2016-04-06 07:12:34 +08:00
Asias He
16af12ca47 gossip: Add comments on external runtime dependency needed by gossip 2016-03-15 16:13:13 +08:00
Asias He
1034dd0aff gossip: Ignore ack2 message if gossip is not enabled yet 2016-03-15 16:09:43 +08:00
Asias He
1bf0412e7a gossip: Introduce handle_shutdown_msg helper 2016-03-15 16:09:43 +08:00
Asias He
54d8ac16b5 gossip: Introduce handle_echo_msg helper 2016-03-15 16:09:42 +08:00
Asias He
1f64f4bfcb gossip: Introduce handle_ack2_msg helper 2016-03-15 16:09:42 +08:00
Asias He
9f64c36a08 storage_service: Fix pending_range_calculator_service
Since calculate_pending_ranges modifies token_metadata, we need to
replicate it to other shards. With this patch, when we call
calculate_pending_ranges, token_metadata is replicated to the other,
non-zero shards.

In addition, it is not useful as a standalone class. We can merge it
into the storage_service. Kill one singleton class.

Fixes #1033
Refs #962
Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>
2016-03-14 10:14:22 +02:00
Asias He
134b814cde gossip: Log status info when stopping gossip 2016-03-10 10:56:48 +08:00
Asias He
ed723665df gossip: Do not stop gossip more than once
If we do
   - Decommission a node
   - Stop a node
we will shut down gossip more than once, in:
   - storage_service::decommission
   - storage_service::drain_on_shutdown

Fix by checking whether gossip is already stopped and backing off if so.
2016-03-10 10:56:48 +08:00
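The guard can be sketched as a simple idempotent-stop flag. Class and member names are illustrative, not the actual gossiper interface:

```cpp
// Sketch of the "do not stop gossip more than once" guard: remember
// that we already stopped and back off on subsequent calls.
class gossiper_like {
    bool _enabled = true;
public:
    int stop_count = 0;  // for illustration: how many real shutdowns ran
    void do_stop_gossiping() {
        if (!_enabled) {
            return;  // already stopped (e.g. decommission, then drain_on_shutdown)
        }
        _enabled = false;
        ++stop_count;  // ...real shutdown work would happen here...
    }
};
```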
Vlad Zolotarov
87e6efcdab storage_service: distribute gossiper::endpoint_state_map together with token_metadata
If storage_service::token_metadata is not distributed together with
gossiper::endpoint_state_map, there may be a situation where a non-zero
shard sees a new value in token_metadata (e.g. a newly added node's
token ranges) while still seeing old gossiper::endpoint_state_map
contents (e.g. the newly added node mentioned above may not be present,
causing gossiper::is_alive() to return FALSE for that node while
the node is actually alive and kicking).

To avoid this discrepancy we will always update a token_metadata together
with an endpoint_state_map when we distribute new token_metadata data
among shards.

Fixes #909

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 13:15:19 +02:00
Vlad Zolotarov
3a72ef87f2 gossiper: make _shadow_endpoint_state_map public and rename
We will need to access it from a storage_service class when replicate
token_metadata.

Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map
according to our coding convention.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Vlad Zolotarov
4a21d48cc5 gossiper: use a semaphore instead of a future<> for serializing a timer callback
Use a semaphore to allow serializing with a gossiper's timer callback.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Asias He
01cb6b0d42 gossip: Send syn message in parallel and do not wait for it
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can talk to the 1-3
peer nodes in parallel to reduce the latency of the gossip round.

2) The gossip syn message uses a one-way rpc message, but currently the
returned future of the one-way message becomes ready only when the
message is dequeued for some reason (sent or dropped). If we wait for
the one-way syn message to return, it might block the gossip round for
an unbounded time. To fix, do not wait for it in the gossip round. The
downside is that there will be no back pressure to bound the syn
messages; however, since the messages are sent once per second, I think
it is fine.
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>
2016-03-03 11:17:50 +02:00
Asias He
32eaaecf36 gossip: Get rid of assert
Log the error and throw an exception instead of aborting the whole
process. Makes the code more robust.
2016-02-25 21:19:52 +08:00
Asias He
94cb7f22d4 gossip: Make add_local_application_state safe to call on any cpu
add_local_application_state is used in various places. Before this
patch, it could only be called on cpu zero. To make it safer to use, use
invoke_on() to forward the code to run on cpu zero, so that callers can
call it on any cpu.

Refs: #795
Message-Id: <d69b81c5561622078dbe887d87209c4ea2e3bf46.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:54 +02:00
Asias He
4e931c2453 gossip: Log the error when fails to add local application state
Gleb saw once:

scylla: gms/gossiper.cc:1393:
gms::gossiper::add_local_application_state(gms::application_state,
gms::versioned_value):: mutable: Assertion
`endpoint_state_map.count(ep_addr)' failed.

The assert means we could not find the entry for the node itself in
endpoint_state_map. I cannot really find any place where we could call
add_local_application_state before gossiper::start_gossiping(),
which inserts the broadcast address into endpoint_state_map.

I cannot reproduce the issue, so let's log the error so we can narrow
down which application state triggered the assert.

Refs: #795
Message-Id: <f4433be0a0d4f23470a5e24e528afdb67b74c7ef.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:17 +02:00
Asias He
697b16414a gossip: Make gossip message handling async
In each gossip round, i.e., gossiper::run(), we do:

1) send syn message
2)                           peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
          on_change
4) send back ack2 message
5)                            peer node: process ack2 message
   			      apply_state_locally

At the moment, syn is a "wait" message; it times out in 3 seconds. In
step 3, all the registered gossip callbacks are called, which might take
a significant amount of time to complete.

In order to reduce gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we
no longer get an ack message as the return value of a syn message,
so a GOSSIP_DIGEST_ACK message verb is introduced.

With this patch, the gossip message exchange is now async. This is
useful when some nodes in the cluster are down: we will not delay the
gossip round, which is supposed to run every second, by 3*n seconds
(n = 1-3, since we talk to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).

Later, we can talk to the 1-3 peer nodes in parallel to reduce
latency even more.

Refs: #900
2016-02-24 19:33:39 +08:00
Asias He
5003c6e78b config: Introduce shutdown_announce_in_ms option
The time, in milliseconds, that a node waits after sending the gossip
shutdown message.

Reduces ./cql_query_test execution time

from
   real    2m24.272s
   user    0m8.339s
   sys     0m10.556s

to
   real    1m17.765s
   user    0m3.698s
   sys     0m11.578
2016-01-27 11:19:38 +08:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not let this inconsistency
stand. Change it while we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Asias He
755d792c78 gossip: Wait for gossip timer callback to finish in do_stop_gossiping
Also do not rearm the timer if we stopped the gossip.

Message-Id: <73765857b554d9914e87b24d287ff35ab0af6fce.1453378191.git.asias@scylladb.com>
2016-01-21 14:15:57 +02:00
Asias He
826b6ed877 gossip: Print node status in handle_major_state_change
Message-Id: <1452768680-32355-1-git-send-email-asias@scylladb.com>
2016-01-14 14:22:37 +02:00
Asias He
e7a899f5f3 gossip: Enable debug msg for convict
Kill one FIXME in convict.

Message-Id: <1452768680-32355-2-git-send-email-asias@scylladb.com>
2016-01-14 14:22:36 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs one extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait for cql connection with node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will go through the bootstrap process since its data,
including the system tables, is wiped. It will think it is a completely
new node and can possibly stream from the wrong node and violate
consistency.

To fix, we reject the boot if we find the node was in SHUTDOWN or
STATUS_NORMAL status.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
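The rejection rule above can be sketched as a status check against the node's last state as seen by the rest of the cluster. The status strings and function name here are illustrative (the actual gossiper uses its own status constants):

```cpp
#include <set>
#include <string>

// Hedged sketch of an is_safe_for_bootstrap-style check: if the cluster
// last saw this node in SHUTDOWN or NORMAL status, a wiped node must
// not be allowed to bootstrap as if it were brand new.
bool is_safe_for_bootstrap(const std::string& previous_status_seen_by_cluster) {
    static const std::set<std::string> unsafe = {"SHUTDOWN", "NORMAL"};
    // An unknown or absent status means the cluster never saw this
    // node before: bootstrapping is safe.
    return unsafe.count(previous_status_seen_by_cluster) == 0;
}
```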
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Using shard_id as the destination of the messaging_service is confusing,
since shard_id is used in the context of cpu id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait-for-gossip-to-settle logic in the boot-up process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait for cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5, sometimes I saw that in node2's shadow round it gets node2's
status as BOOT from other nodes in the cluster instead of NORMAL. The
problem is that we do not wait for gossip to settle before we start the
cql server; as a result, when we stop node2 in step 3), other nodes in
the cluster have not yet received node2's status update to NORMAL.
2016-01-07 10:09:25 +02:00