scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Asias He	0c56bbe793	gossip: Make get_supported_features and wait_for_feature_on{_all}_node private They are used only inside gossiper itself. Also make the helper get_supported_features(std::unordered_map<gms::inet_address, sstring>) static. Message-Id: <f434c145ad9138084708b60c1d959b84360e47b2.1467775291.git.asias@scylladb.com>	2016-07-06 09:54:56 +03:00
Asias He	bb80362c3f	gossip: Insert with result.end() in get_supported_features It is faster than result.begin(), suggested by Avi.	2016-07-05 10:09:54 +08:00
Asias He	72cb4a228b	gossip: Add to_feature_set helper To convert a "," split feature string to a feature set.	2016-07-05 10:09:54 +08:00
Asias He	1d6c57fb40	gossip: Reduce timeout in shadow round In `3a36ec33db` (gossip: Wait longer for seed node during boot up), we increased the timeout by a factor of 60, i.e., ring_delay * 60 = 5 seconds * 60 = 5 minutes. In `57ee9676c2` (storage_service: Fix default ring_delay time), we fixed the default ring_delay to 30 seconds. Now the timeout is 30 * 60 seconds = 30 minutes, which is too long. Make it 5 minutes.	2016-07-05 10:09:54 +08:00
Asias He	88f0bb3a7b	gossip: Add check_knows_remote_features To check if this node knows features in std::unordered_map<inet_address, sstring> peer_features_string	2016-07-05 10:09:54 +08:00
Asias He	2b53c50c15	gossip: Add get_supported_features To get features supported by all the nodes listed in the address/feature map.	2016-07-05 10:09:53 +08:00
Asias He	4f3ce42163	storage_service: Prevent old version node to join a new version cluster We want to prevent older version of scylla which has fewer features to join a cluster with newer version of scylla which has more features, because when scylla sees a feature is enabled on all other nodes, it will start to use the feature and assume existing nodes and future nodes will always have this feature. In order to support downgrade during rolling upgrade, we need to support mixed old and new nodes case. 1) All old nodes O O O O O <- N OK O O O O O <- O OK 2) All new nodes N N N N N <- N OK N N N N N <- O FAIL 3) Mixed old and new nodes O N O N O <- N OK O N O N O <- O OK (O == old node, N == new node, <- == joining the cluster) With this patch, I tested: 1.1) Add new node to new node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES} 1.2) Add old node to old node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {}, Remote common_features = {} 2.1) Add new node to new node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES} 2.2) Add old node to new node cluster seastar - Exiting on unhandled exception: std::runtime_error (Feature check failed. This node can not join the cluster because it does not understand the feature. Local node 127.0.0.4 features = {}, Remote common_features = {RANGE_TOMBSTONES}) 3.1) Add new node to mixed cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {} 3.2) Add old node to mixed cluster gossip - Feature check passed. Local node 127.0.0.4 features = {}, Remote common_features = {} Fixes #1253	2016-06-17 10:49:45 +08:00
Asias He	32ed468e42	gossip: Remove empty string feature in get_supported_features If the feature string is empty, boost::split will return std::set<sstring> = {""} instead of std::set<sstring> = {} which will make a node with a feature, e.g. std::set<sstring> = {"RANGE_TOMBSTONES"}, think it does not understand the feature of a node with no features at all.	2016-06-17 10:49:45 +08:00
Duarte Nunes	17a544c4a6	gossip: Add feature default ctor and operator= This allows a feature to be declared and initialized later. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	2c82dcd309	gossip: Decouple feature lifetime from the gossiper This patch changes the gms::feature destructor so it checks whether the gossiper has been stopped before trying to unregister the feature. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Vlad Zolotarov	c58c56bccc	gms::inet_address: add a constructor from socket_address Currently only IPv4 addresses are supported. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Asias He	f27e5d2a68	messaging_service: Delay listening ms during boot up When a node starts up, peer node can send gossip syn message to it before the gossip message handlers are registered in messaging_service. We can see: scylla[123]: [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored To fix, we delay the listening of messaging_service to the point when gossip message handlers are registered. Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>	2016-05-31 12:28:11 +03:00
Duarte Nunes	f613dabf53	gossip: Introduce the gms::feature class This class encapsulates the waiting for a cluster feature. A feature object is registered with the gossiper, which is responsible for later marking it as enabled. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-05-27 17:20:51 +00:00
Duarte Nunes	4684b8ecbb	gossip: Refactor waiting for features This patch changes the sleep-based mechanism of detecting new features by instead registering waiters with a condition variable that is signaled whenever a new endpoint information is received. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-05-27 17:20:51 +00:00
Duarte Nunes	422f244172	gossip: Don't timeout when waiting for features This patch removes the timeout when waiting for features, since future patches will make this argument unnecessary. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-05-27 17:20:51 +00:00
Duarte Nunes	b3011c9039	gossip: Rename set_heart_beat_state ...to set_heart_beat_state_and_update_timestamp in order to make it explicit for callers the update_timestamp is also changed. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1464309023-3254-3-git-send-email-duarte@scylladb.com>	2016-05-27 09:11:39 +03:00
Duarte Nunes	8c0e2e05b7	gossip: Fix modification to shadow endpoint state This patch fixes an inadvertent change to the shadow endpoint state map in gossiper::run, done by calling get_heart_beat_state() which also updates the endpoint state's timestamp. This did not happen for the normal map, but did happen for the shadow map. As a result, every time gossiper::run() was scheduled, endpoint_map_changed would always be true and all the shards would make superfluous copies of the endpoint state maps. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1464309023-3254-2-git-send-email-duarte@scylladb.com>	2016-05-27 09:10:38 +03:00
Asias He	fed1e65e1e	gossip: Do not insert the same node into _live_endpoints_just_added _live_endpoints_just_added tracks the peer node which just becomes live. When a down node gets back, the peer nodes can receive multiple messages which would mark the node up, e.g., the message piled up in the sender's tcp stack, after a node was blocked with gdb and released. Each such message will trigger a echo message and when the reply of the echo message is received (real_mark_alive), the same node will be added to _live_endpoints_just_added.push_back more than once. Thus, we see the same node be favored more than once: INFO 2016-04-12 12:09:57,399 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 INFO 2016-04-12 12:09:58,412 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 INFO 2016-04-12 12:09:59,429 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 INFO 2016-04-12 12:10:00,429 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 INFO 2016-04-12 12:10:01,430 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 INFO 2016-04-12 12:10:02,442 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 INFO 2016-04-12 12:10:03,454 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 To fix, do not insert the node if it is already in _live_endpoints_just_added. Fixes #1178 Message-Id: <6bcfad4430fbc63b4a8c40ec86a2744bdfafb40f.1464161975.git.asias@scylladb.com>	2016-05-25 14:19:40 +03:00
Gleb Natapov	7a54b5ebbb	gossiper: cleanup mark_alive() even more Message-Id: <20160519100513.GE984@scylladb.com>	2016-05-19 12:47:19 +02:00
Asias He	eb9ac9ab91	gms: Optimize gossiper::is_alive In perf-flame, I saw in service::storage_proxy::create_write_response_handler (2.66% cpu) gossiper::is_alive takes 0.72% cpu locator::token_metadata::pending_endpoints_for takes 1.2% cpu After this patch: service::storage_proxy::create_write_response_handler (2.17% cpu) gossiper::is_alive does not show up at all locator::token_metadata::pending_endpoints_for takes 1.3% cpu There is no need to copy the endpoint_state from the endpoint_state_map to check if a node is alive. Optimize it since gossiper::is_alive is called in the fast path. Message-Id: <2144310aef8d170cab34a2c96cb67cabca761ca8.1463540290.git.asias@scylladb.com>	2016-05-18 10:12:38 +03:00
Gleb Natapov	76e0eb426e	gossiper: simplify mark_alive() The code runs in a thread so there is no need to use heap to communicate between statements. Message-Id: <20160517120245.GK984@scylladb.com>	2016-05-17 15:37:21 +03:00
Paweł Dziepak	0d3d0a3c08	gossiper: handle failures in gossiper thread creation seastar::async() creates a seastar thread and to do that allocates memory. That allocation, obviously, may fail so the error handling code needs to be moved so that it also catches errors from thread creation. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-11 23:54:47 +01:00
Pekka Enberg	47a904c0f6	Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias "There is a need to have an ability to detect whether a feature is supported by entire cluster. The way to do it is to advertise feature availability over gossip and then each node will be able to check if all other nodes have a feature in question. The idea is to have new application state SUPPORTED_FEATURES that will contain set of strings, each string holding feature name. This series adds API to do so. The following patch on top of this series demonstrates how to wait for features during boot up. FEATURE1 and FEATURE2 are introduced. We use wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully. Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout. --- a/service/storage_service.cc +++ b/service/storage_service.cc @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() { // Add features supported by this local node. When a new feature is // introduced in scylla, update it here, e.g., // return sstring("FEATURE1,FEATURE2") - return sstring(""); + return sstring("FEATURE1,FEATURE2"); } std::set<inet_address> get_seeds() { @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() { // gossip snitch infos (local DC and rack) gossip_snitch_info().get(); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get(); + logger.info("Wait for FEATURE1 and FEATURE2 done"); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get(); + logger.info("Wait for FEATURE3 done"); + We can query the supported_features: cqlsh> SELECT supported_features from system.peers; supported_features -------------------- FEATURE1,FEATURE2 FEATURE1,FEATURE2 (2 rows) cqlsh> SELECT supported_features from system.local; supported_features -------------------- FEATURE1,FEATURE2 (1 rows)"	2016-04-08 09:22:50 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	e0a82a1107	gossip: Add supported_features helper in versioned_value Give a supported features sstring, return a versioned_value for it.	2016-04-06 07:12:34 +08:00
Asias He	04e8727793	gossip: Introduce wait_for_feature_on_{all}_node API to wait until features are available on a node or on all the nodes in the cluster. $timeout specifies how long we want to wait. If the features are not available yet, sleep 2 seconds and retry.	2016-04-06 07:12:34 +08:00
Asias He	1e437e925c	gossip: Introduce get_supported_features - Get features supported by this particular node std::set<sstring> get_supported_features(inet_address endpoint) const; - Get features supported by all the nodes this node knows about std::set<sstring> get_supported_features() const;	2016-04-06 07:12:34 +08:00
Asias He	a6080773b3	gossip: Add SUPPORTED_FEATURES application_state It is used to negotiate cluster wide features.	2016-04-06 07:12:34 +08:00
Asias He	7acc9816d2	gossip: Handle unknown application_state when printing In case an unknown application_state is received, we should be able to handle it when printing. Message-Id: <98d2307359292e90c8925f38f67a74b69e45bebe.1458553057.git.asias@scylladb.com>	2016-03-21 11:59:04 +02:00
Asias He	16af12ca47	gossip: Add comments on external runtime dependency needed by gossip	2016-03-15 16:13:13 +08:00
Asias He	1034dd0aff	gossip: Ignore ack2 message if gossip is not enabled yet	2016-03-15 16:09:43 +08:00
Asias He	1bf0412e7a	gossip: Introduce handle_shutdown_msg helper	2016-03-15 16:09:43 +08:00
Asias He	54d8ac16b5	gossip: Introduce handle_echo_msg helper	2016-03-15 16:09:42 +08:00
Asias He	1f64f4bfcb	gossip: Introduce handle_ack2_msg helper	2016-03-15 16:09:42 +08:00
Asias He	9f64c36a08	storage_service: Fix pending_range_calculator_service Since calculate_pending_ranges will modify token_metadata, we need to replicate to other shards. With this patch, when we call calculate_pending_ranges, token_metadata will be replicated to other non-zero shards. In addition, it is not useful as a standalone class. We can merge it into the storage_service. Kill one singleton class. Fixes #1033 Refs #962 Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>	2016-03-14 10:14:22 +02:00
Asias He	134b814cde	gossip: Log status info when stopping gossip	2016-03-10 10:56:48 +08:00
Asias He	ed723665df	gossip: Do not stop gossip more than once If we do - Decommission a node - Stop a node we will shutdown gossip more than once in: - storage_service::decommission - storage_service::drain_on_shutdown Fix by checking if it is already stopped and back off if so.	2016-03-10 10:56:48 +08:00
Vlad Zolotarov	87e6efcdab	storage_service: distribute gossiper::endpoint_state_map together with token_metadata If storage_service::token_metadata is not distributed together with gossiper::endpoint_state_map there may be a situation when a non-zero shard sees a new value in token_metadata (e.g. newly added node's token ranges) while still seeing an old gossiper::endpoint_state_map contents (e.g. a mentioned above newly added node may not be present, thus causing gossiper::is_alive() to return FALSE for that node, while the node is actually alive and kicking). To avoid this discrepancy we will always update a token_metadata together with an endpoint_state_map when we distribute new token_metadata data among shards. Fixes #909 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-03-06 13:15:19 +02:00
Vlad Zolotarov	3a72ef87f2	gossiper: make _shadow_endpoint_state_map public and rename We will need to access it from a storage_service class when replicate token_metadata. Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map according to our coding convention. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-03-06 11:16:44 +02:00
Vlad Zolotarov	4a21d48cc5	gossiper: use a semaphore instead of a future<> for serializing a timer callback Use a semaphore to allow serializing with a gossiper's timer callback. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-03-06 11:16:44 +02:00
Asias He	01cb6b0d42	gossip: Send syn message in parallel and do not wait for it 1) As explained in commit `697b16414a` (gossip: Make gossip message handling async), in each gossip round we can make talking to the 1-3 peer nodes in parallel to reduce latency of gossip round. 2) Gossip syn message uses one way rpc message, but now the returned future of the one way message is ready only when message is dequeued for some reason (sent or dropped). If we wait for the one way syn message to return it might block the gossip round for an unbounded time. To fix, do not wait for it in the gossip round. The downside is there will be no back pressure to bound the syn messages, however since the messages are once per second, I think it is fine. Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>	2016-03-03 11:17:50 +02:00
Paweł Dziepak	b5eee2e5d4	gms: add inet_address::to_sstring() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 12:49:55 +00:00
Asias He	32eaaecf36	gossip: Get rid of assert Log the error and throw the exception, instead of abort the whole process. Make the code more robust.	2016-02-25 21:19:52 +08:00
Asias He	59564591d5	storage_service: Use get_gossip_status to get status The helper was introduced recently; use it instead of open-coding it.	2016-02-25 21:19:52 +08:00
Asias He	94cb7f22d4	gossip: Make add_local_application_state safe to call on any cpu add_local_application_state is used in various places. Before this patch, it can only be called on cpu zero. To make it safer to use, use invoke_on() to forward the code to run on cpu zero, so that caller can call it on any cpu. Refs: #795 Message-Id: <d69b81c5561622078dbe887d87209c4ea2e3bf46.1456315043.git.asias@scylladb.com>	2016-02-25 12:45:54 +02:00
Asias He	4e931c2453	gossip: Log the error when fails to add local application state Gleb saw once: scylla: gms/gossiper.cc:1393: gms::gossiper::add_local_application_state(gms::application_state, gms::versioned_value):: mutable: Assertion `endpoint_state_map.count(ep_addr)' failed. The assert is about we can not find the entry in endpoint_state_map of the node itself. I can not really find any place we could call add_local_application_state before we call gossiper::start_gossiping() where it inserts broadcast address into endpoint_state_map. I can not reproduce issue, let's log the error so we can narrow down which application state triggered the assert. Refs: #795 Message-Id: <f4433be0a0d4f23470a5e24e528afdb67b74c7ef.1456315043.git.asias@scylladb.com>	2016-02-25 12:45:17 +02:00
Asias He	697b16414a	gossip: Make gossip message handling async In each gossip round, i.e., gossiper::run(), we do: 1) send syn message 2) peer node: receive syn message, send back ack message 3) process ack message in handle_ack_msg apply_state_locally mark_alive send_gossip_echo handle_major_state_change on_restart mark_alive send_gossip_echo mark_dead on_dead on_join apply_new_states do_on_change_notifications on_change 4) send back ack2 message 5) peer node: process ack2 message apply_state_locally At the moment, syn is "wait" message, it times out in 3 seconds. In step 3, all the registered gossip callbacks are called which might take significant amount of time to complete. In order to reduce the gossip round latency, we make syn "no-wait" and do not run the handle_ack_msg inside the gossip::run(). As a result, we will not get a ack message as the return value of a syn message any more, so a GOSSIP_DIGEST_ACK message verb is introduced. With this patch, the gossip message exchange is now async. It is useful when some nodes are down in the cluster. We will not delay the gossip round, which is supposed to run every second, 3*n seconds (n = 1-3, since it talks to 1-3 peer nodes in each gossip round) or even longer (considering the time to run gossip callbacks). Later, we can make talking to the 1-3 peer nodes in parallel to reduce latency even more. Refs: #900	2016-02-24 19:33:39 +08:00
Asias He	022c7e50a1	failure_detector: Fix false alarm of "Not marking nodes down due to local pause of" The problem is we initialize _last_interpret when failure_detector object is constructed. When interpret() runs for the first time, the _last_interpret value is not the last time we run interpret() but the time we initialize failure_detector object. Fix by initializing _last_interpret inside interpret(). [Thu Feb 18 02:40:04 2016] INFO [shard 0] storage_service - Node 127.0.0.1 state jump to normal [Thu Feb 18 02:40:04 2016] INFO [shard 0] storage_service - NORMAL: node is now in normal status [Thu Feb 18 02:40:04 2016] INFO [shard 0] gossip - Waiting for gossip to settle before accepting client requests... [Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - No gossip backlog; proceeding Starting listening for CQL clients on 127.0.0.1:9042... [Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - Node 127.0.0.2 is now part of the cluster [Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - InetAddress 127.0.0.2 is now UP [Thu Feb 18 02:40:13 2016] INFO [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 [Thu Feb 18 02:40:13 2016] WARN [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)	2016-02-24 19:31:14 +08:00
Erich Keane	e87019843f	Fix PHI_FACTOR definition to be spec compliant PHI_FACTOR is a constexpr variable that is defined using std::log. Though G++ has a constexpr version of std::log, this itself is not spec compliant (in fact, Clang enforces this). See C++ Spec 26.8 for the definition of std::log and 17.6.5.6 for the rule regarding adding constexpr where it isn't specified. This patch replaces the std::log statement with a version from math.h that contains the exact value (M_LOG10El). Signed-off-by: Erich Keane <erich.keane@verizon.net> Message-Id: <1454603285-32677-1-git-send-email-erich.keane@verizon.net>	2016-02-04 18:33:44 +02:00
Gleb Natapov	4e440ebf8e	Remove old inet_address and uuid serializers	2016-02-02 12:15:50 +02:00
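
The feature negotiation described in commits 72cb4a228b, 32ed468e42, 2b53c50c15 and 4f3ce42163 can be illustrated with a minimal standalone sketch. It is not the code from those commits. It assumes std::string in place of sstring, plain string keys in place of gms::inet_address, and std::getline in place of boost::split. The names feature_set, common_features and knows_remote_features are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical alias; the real code uses std::set<sstring>.
using feature_set = std::set<std::string>;

// Convert a ","-separated feature string to a set.
// Empty tokens are dropped, so "" yields {} rather than {""}.
feature_set to_feature_set(const std::string& features) {
    feature_set result;
    std::istringstream in(features);
    std::string token;
    while (std::getline(in, token, ',')) {
        if (!token.empty()) {
            result.insert(token);
        }
    }
    return result;
}

// Features supported by every peer in the address/feature map
// (the intersection of all per-peer feature sets).
feature_set common_features(const std::map<std::string, std::string>& peer_features) {
    feature_set common;
    bool first = true;
    for (const auto& entry : peer_features) {
        feature_set current = to_feature_set(entry.second);
        if (first) {
            common = std::move(current);
            first = false;
            continue;
        }
        feature_set intersection;
        std::set_intersection(common.begin(), common.end(),
                              current.begin(), current.end(),
                              std::inserter(intersection, intersection.end()));
        common = std::move(intersection);
    }
    return common;
}

// A joining node passes the check only if it understands every
// feature that all existing nodes already have in common.
bool knows_remote_features(const feature_set& local, const feature_set& remote_common) {
    return std::includes(local.begin(), local.end(),
                         remote_common.begin(), remote_common.end());
}
```

Under these assumptions, an old node with no features is rejected by a cluster where every peer advertises RANGE_TOMBSTONES, but is accepted by a mixed cluster, because the common feature set there is empty. This matches the O/N join matrix in 4f3ce42163.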


341 Commits