If an exception is triggered early in boot while doing an I/O operation,
scylla will fail because the io checker calls the storage service to stop
the transport services, but not all of them have been initialized yet.
Scylla was failing as follows:
scylla: ./seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local()
[with Service = gms::gossiper]: Assertion `local_is_initialized()' failed.
Aborting on shard 0.
Backtrace:
0x000000000048a2ca
0x000000000048a3d3
0x00007fc279e739ff
0x00007fc279ad6a27
0x00007fc279ad8629
0x00007fc279acf226
0x00007fc279acf2d1
0x0000000000c145f8
0x000000000110d1bc
0x000000000041bacd
0x00000000005520f1
0x00007fc279aeaf1f
Aborted (core dumped)
Refs #883.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Asias He <asias@scylladb.com>
Message-Id: <963f7b0f5a7a8a1405728b414a7d7a6dccd70581.1479172124.git.asias@scylladb.com>
Remove inclusions from header files (the primary offender is fb_utilities.hh)
and introduce a new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.
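A forward-declaration header typically contains nothing more than the
declarations other headers need in order to mention the type. A minimal
sketch of what messaging_service_fwd.hh could look like (the namespace name
is assumed for illustration, not taken from the actual file):

#pragma once

namespace netw {

// A forward declaration is enough for headers that only use pointers or
// references to the class; only the .cc files that call into it need to
// include the full messaging_service.hh.
class messaging_service;

}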
Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
The expire time, which is used to decide when to remove a node from
gossip membership, is gossiped around the cluster. We switched to the
steady clock in the past. In order to have a consistent time_point on all
the nodes in the cluster, we have to use the wall clock. Switch to
system_clock for gossip.
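A minimal sketch of the idea (the alias and the helper are illustrative,
not the actual code):

#include <chrono>

// steady_clock time_points are only meaningful on the node that produced
// them, while system_clock (wall clock) time_points can be meaningfully
// compared across nodes, so a gossiped expire time must use the wall clock.
using clk = std::chrono::system_clock;

static clk::time_point compute_expire_time(clk::duration grace_period) {
    return clk::now() + grace_period;
}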
Fixes #1704
$ tools/scyllatop/scyllatop.py '*gossip*'
node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0
The gossip heart beat version changes every second. If everything is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output will be less than 1.0.
Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
We want to prevent an older version of scylla, which has fewer features,
from joining a cluster running a newer version of scylla with more features,
because once scylla sees that a feature is enabled on all other nodes, it
will start to use the feature and assume that existing nodes and future nodes
will always have this feature.
In order to support downgrade during a rolling upgrade, we need to support
the case of mixed old and new nodes:
1) All old nodes
O O O O O <- N OK
O O O O O <- O OK
2) All new nodes
N N N N N <- N OK
N N N N N <- O FAIL
3) Mixed old and new nodes
O N O N O <- N OK
O N O N O <- O OK
(O == old node, N == new node, <- == joining the cluster)
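The check itself amounts to a subset test. A minimal sketch (the function
and parameter names are assumed, not the actual implementation):

#include <algorithm>
#include <set>
#include <stdexcept>
#include <string>

// The joining node may only proceed if it understands every feature the
// existing nodes already have in common, i.e. the remote common features
// must be a subset of the local features.
void check_features(const std::set<std::string>& local_features,
                    const std::set<std::string>& remote_common_features) {
    if (!std::includes(local_features.begin(), local_features.end(),
                       remote_common_features.begin(), remote_common_features.end())) {
        throw std::runtime_error("Feature check failed. This node can not join the "
                                 "cluster because it does not understand the feature.");
    }
}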
With this patch, I tested:
1.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}
1.2) Add old node to old node cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}
2.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}
2.2) Add old node to new node cluster
seastar - Exiting on unhandled exception: std::runtime_error (Feature
check failed. This node can not join the cluster because it does not
understand the feature. Local node 127.0.0.4 features = {}, Remote
common_features = {RANGE_TOMBSTONES})
3.1) Add new node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {}
3.2) Add old node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}
Fixes #1253
This class encapsulates waiting for a cluster feature. A feature
object is registered with the gossiper, which is responsible for later
marking it as enabled.
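A minimal sketch of what such a class could look like (member and method
names are assumed, not the actual API):

class feature {
    sstring _name;
    bool _enabled = false;
    shared_promise<> _enabled_promise;   // resolved by the gossiper when the feature is enabled
public:
    explicit feature(sstring name) : _name(std::move(name)) {}
    const sstring& name() const { return _name; }
    bool is_enabled() const { return _enabled; }
    // Returns a future that resolves once the feature becomes enabled.
    future<> when_enabled() { return _enabled_promise.get_shared_future(); }
    // Called by the gossiper once the whole cluster advertises the feature.
    void enable() {
        if (!_enabled) {
            _enabled = true;
            _enabled_promise.set_value();
        }
    }
};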
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch replaces the sleep-based mechanism of detecting new features:
waiters now register with a condition variable that is
signaled whenever new endpoint information is received.
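A minimal sketch of the pattern (the names are assumed; only the
condition-variable idea is taken from the text):

seastar::condition_variable _feature_condvar;

// Waiters block on the condition variable instead of polling with sleeps.
future<> wait_for_features(std::set<sstring> features) {
    return _feature_condvar.wait([this, features = std::move(features)] {
        return cluster_supports(features);   // hypothetical predicate
    });
}

// Called whenever new endpoint information is received; wakes every waiter
// so it can re-check its predicate.
void on_endpoint_state_changed() {
    _feature_condvar.broadcast();
}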
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch removes the timeout when waiting for features,
since future patches will make this argument unnecessary.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.
The idea is to have a new application state SUPPORTED_FEATURES that will contain
a set of strings, each string holding a feature name.
This series adds an API to do so.
The following patch on top of this series demonstrates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, that wait will not succeed and will time out.
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
// Add features supported by this local node. When a new feature is
// introduced in scylla, update it here, e.g.,
// return sstring("FEATURE1,FEATURE2")
- return sstring("");
+ return sstring("FEATURE1,FEATURE2");
}
std::set<inet_address> get_seeds() {
@@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
// gossip snitch infos (local DC and rack)
gossip_snitch_info().get();
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
+ logger.info("Wait for FEATURE1 and FEATURE2 done");
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
+ logger.info("Wait for FEATURE3 done");
+
We can query the supported_features:
cqlsh> SELECT supported_features from system.peers;
supported_features
--------------------
FEATURE1,FEATURE2
FEATURE1,FEATURE2
(2 rows)
cqlsh> SELECT supported_features from system.local;
supported_features
--------------------
FEATURE1,FEATURE2
(1 rows)"
API to wait until features are available on a node or on all the nodes in the
cluster.
$timeout specifies how long we want to wait. If the features are not
available yet, sleep 2 seconds and retry.
- Get features supported by this particular node
std::set<sstring> get_supported_features(inet_address endpoint) const;
- Get features supported by all the nodes this node knows about
std::set<sstring> get_supported_features() const;
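A minimal sketch of the retry loop described above (the predicate and the
exception text are assumptions; the 2-second retry period and $timeout come
from the text):

future<> wait_for_feature_on_all_node(std::set<sstring> features,
                                      std::chrono::seconds timeout) {
    return seastar::async([this, features = std::move(features), timeout] {
        auto deadline = std::chrono::steady_clock::now() + timeout;
        // Poll until every node we know about advertises all requested features.
        while (!features_available_on_all_nodes(features)) {   // hypothetical predicate
            if (std::chrono::steady_clock::now() > deadline) {
                throw std::runtime_error("Timed out waiting for features");
            }
            seastar::sleep(std::chrono::seconds(2)).get();
        }
    });
}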
We will need to access it from the storage_service class when replicating
token_metadata.
Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map
according to our coding convention.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can talk to the 1-3
peer nodes in parallel to reduce the latency of the gossip round.
2) The gossip syn message uses a one-way rpc message, but currently the
returned future of the one-way message becomes ready only when the message
is dequeued for some reason (sent or dropped). If we wait for the one-way
syn message to return, it might block the gossip round for an unbounded
time. To fix, do not wait for it in the gossip round. The downside is that
there will be no back pressure to bound the syn messages; however, since
the messages are sent once per second, I think it is fine.
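A minimal sketch of point 2 (the sender is hypothetical; only the
fire-and-forget idea comes from the text):

void send_syn_messages(const std::vector<inet_address>& targets, gossip_digest_syn syn) {
    for (const auto& ep : targets) {
        // One-way verb: deliberately ignore the returned future so the
        // once-per-second gossip round is never blocked on the outgoing queue.
        (void)send_gossip_digest_syn(ep, syn);   // hypothetical sender
    }
}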
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>
In each gossip round, i.e., gossiper::run(), we do:
1) send syn message
2) peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
apply_state_locally
mark_alive
send_gossip_echo
handle_major_state_change
on_restart
mark_alive
send_gossip_echo
mark_dead
on_dead
on_join
apply_new_states
do_on_change_notifications
on_change
4) send back ack2 message
5) peer node: process ack2 message
apply_state_locally
At the moment, syn is a "wait" message; it times out in 3 seconds. In step
3, all the registered gossip callbacks are called, which might take a
significant amount of time to complete.
In order to reduce the gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we
will no longer get an ack message as the return value of a syn message,
so a GOSSIP_DIGEST_ACK message verb is introduced.
With this patch, the gossip message exchange is now async. This is useful
when some nodes in the cluster are down: we will not delay the gossip
round, which is supposed to run every second, by 3*n seconds (n = 1-3,
since we talk to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run the gossip callbacks).
Later, we can talk to the 1-3 peer nodes in parallel to reduce
latency even more.
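A minimal sketch of the new handler (the registration call and no_wait()
follow the general seastar/scylla rpc style and should be read as
assumptions, not the actual code):

void gossiper::init_messaging_service_handler() {
    ms().register_gossip_digest_ack([this] (gossip_digest_ack msg) {
        // Handle the ack in the background, so the callback-heavy work in
        // handle_ack_msg() never delays the once-per-second gossip round.
        (void)seastar::async([this, msg = std::move(msg)] () mutable {
            handle_ack_msg(std::move(msg));
        });
        return messaging_service::no_wait();
    });
}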
Refs: #900
Make the following tests pass:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test
1) start node2
2) wait until the cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2
In step 5), node2 will do the bootstrap process since its data,
including the system tables, is wiped. It will think it is a completely
new node and could possibly stream from the wrong nodes and violate
consistency.
To fix, we reject the boot if we find the node was in SHUTDOWN or
STATUS_NORMAL.
CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
Implement the wait for gossip to settle logic in the bootup process.
CASSANDRA-4288
Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
1) start node2
2) wait until the cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2
In step 5, I sometimes saw that in the shadow round of node2, it got node2's
status as BOOT from other nodes in the cluster instead of NORMAL. The
problem is that we do not wait for gossip to settle before we start the cql
server; as a result, when we stop node2 in step 3), other nodes in the cluster
have not yet received node2's status update to NORMAL.
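A minimal sketch of the settle loop (the constants and the accessor are
assumptions; the spirit follows CASSANDRA-4288):

void wait_for_gossip_to_settle() {
    constexpr int required_quiet_polls = 3;   // consecutive polls with no endpoint-state change
    int quiet_polls = 0;
    size_t last_seen = 0;
    while (quiet_polls < required_quiet_polls) {
        seastar::sleep(std::chrono::seconds(1)).get();
        size_t seen = gossiper.endpoint_state_count();   // hypothetical accessor
        quiet_polls = (seen == last_seen) ? quiet_polls + 1 : 0;
        last_seen = seen;
    }
}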
This patch fixes the following cql_query_test failure.
cql_query_test: scylla/seastar/core/sharded.hh:439:
Service& seastar::sharded<Service>::local() [with Service =
gms::gossiper]: Assertion `local_is_initialized()' failed.
The problem is that in gossiper::stop() we call gossiper::add_local_application_state(),
which will in turn call gms::get_local_gossiper(). In seastar::sharded::stop we have:
_instances[engine().cpu_id()].service = nullptr;
return inst->stop().then([this, inst] {
return _instances[engine().cpu_id()].freed.get_future();
});
We set the _instances entry to nullptr before we call the stop method, so
local_is_initialized() asserts when we try to access get_local_gossiper()
again.
To fix, we make the stopping of the gossiper explicit. In the shutdown
procedure, we call stop_gossiping() explicitly.
This has two more advantages:
1) The api to stop gossip now calls stop_gossiping() instead of
sharing seastar::sharded's stop method.
2) We can now get rid of the _handler seastar::sharded helper.
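A minimal sketch of the intended shutdown ordering (everything except
stop_gossiping() is assumed):

future<> shutdown_gossip() {
    // First let the gossiper tear itself down (it may still publish
    // application state at this point), then destroy the sharded instances.
    return stop_gossiping().then([] {
        return gms::get_gossiper().stop();
    });
}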
Backported: CASSANDRA-8336 and CASSANDRA-9871
84b2846 remove redundant state
b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
8f9ca07 Cannot replace token does not exist - DN node removed as Fat Client
Fixes:
When X is shut down, X sends a SHUTDOWN message to both Y and Z, but for
some reason only Y receives the message and Z does not receive it.
If Z has a higher gossip version for X than Y has for
X, Z will initiate a gossip with Y and Y will mark X alive again.
X ------> Y
\ /
\ /
Z
When other bootstrapping/leaving/moving nodes are found during
bootstrap, instead of throwing immediately, sleep and try again for one
minute, hoping the other nodes will finish the operation soon.
Since we are retrying the shadow gossip round more than once, we need
to put the gossip state back into the shadow-round state after each shadow
round, to make the shadow round work correctly.
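A minimal sketch of the retry described above (the helpers are assumptions;
the one-minute budget and the shadow-round reset come from the text):

auto deadline = std::chrono::steady_clock::now() + std::chrono::minutes(1);
while (other_nodes_are_bootstrapping_leaving_or_moving()) {   // hypothetical check
    if (std::chrono::steady_clock::now() > deadline) {
        throw std::runtime_error("Other bootstrapping/leaving/moving nodes detected");
    }
    seastar::sleep(std::chrono::seconds(1)).get();
    // Put gossip back into the shadow-round state before retrying, so the
    // next shadow round works correctly.
    gossiper.goto_shadow_round();        // hypothetical reset
    gossiper.do_shadow_round().get();
}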
This retry is useful when starting an empty cluster for testing. E.g.,
$ scylla --listen-address 127.0.0.1
$ sleep 3
$ scylla --listen-address 127.0.0.2
$ sleep 3
$ scylla --listen-address 127.0.0.3
Without this patch, node 3 will hit the check.
TIME STATUS
-----------------------
Node 1:
32:00 Starts
32:00 In NORMAL status
Node 2:
32:03 Starts
32:04 In BOOT status
32:10 In NORMAL status
Node 3:
32:06 Starts
32:06 Found node 2 in BOOT status, hit the check, sleep and try again
32:11 Found node 2 in NORMAL status, can keep going now
32:12 In BOOT status
32:18 In NORMAL status
When a new node joins a cluster, it will start a gossip round with a seed
node. However, within this round, the seed node will not tell the new
node anything it knows about other nodes in the cluster, because the
digest in the gossip SYN message contains only the new node itself and
no other nodes. The seed node picks randomly from the live nodes,
including the newly added node, in do_gossip_to_live_member to start a
gossip round. If the new node is "lucky", the seed node will talk to it very
soon and tell it all the information it knows about the cluster, thus the
new node will mark the seed node alive and consider that it has seen the seed
node. If there is a considerably large number of live nodes, it might take a
long time before the seed node picks the new node and talks to it.
In the bootstrap code, storage_service::bootstrap checks if we have seen any
nodes after a sleep of RING_DELAY milliseconds and throws "Unable to contact any
seeds!" if not, so the node will fail to bootstrap.
To help the seed node talk to the new node faster, we favor the new node in
do_gossip_to_live_member.
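A minimal sketch of favoring the new node when picking a gossip target (the
bookkeeping is hypothetical; only the "favor the new node" idea comes from
the text):

inet_address pick_live_gossip_target(const std::vector<inet_address>& live_nodes) {
    // Prefer a live endpoint we have not gossiped with yet (e.g. a node that
    // has just joined), so it learns about the cluster quickly; otherwise
    // fall back to a uniformly random live endpoint.
    for (const auto& ep : live_nodes) {
        if (!_gossiped_to.count(ep)) {   // hypothetical bookkeeping
            return ep;
        }
    }
    return live_nodes[std::rand() % live_nodes.size()];
}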
We are ignoring the future returned by seastar::async. Futurize it so the
caller can wait for the application state to be actually applied.
In addition, drop the unused add_local_application_states function.
Introduce a subscribers_list class that exposes 3 methods:
- push_back(s) - adds a new element s to the back of the list
- remove(s) - removes an element s from the list
- for_each(f) - invokes f on each element of the list
In addition, make subscribers_list store a shared_ptr to each subscriber
to allow removal (currently it stores a naked pointer to the object).
subscribers_list allows push_back() and remove() to be called while
another thread of execution (e.g. a seastar::async() fiber) is in the middle of for_each().
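A minimal, simplified sketch of such a class (not the actual implementation):

template <typename T>
class subscribers_list {
    std::list<shared_ptr<T>> _items;
public:
    void push_back(shared_ptr<T> s) {
        _items.push_back(std::move(s));
    }
    void remove(const shared_ptr<T>& s) {
        _items.remove(s);
    }
    template <typename Func>
    void for_each(Func f) {
        // Iterate over a copy of the shared_ptrs: a concurrent push_back() or
        // remove() (e.g. from another seastar::async() fiber) cannot invalidate
        // this iteration, and holding the shared_ptr keeps each element alive
        // even if it is removed mid-visit.
        auto snapshot = _items;
        for (auto& s : snapshot) {
            f(*s);
        }
    }
};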
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Simplify subscribers_list::remove() method.
- load_broadcaster: inherit from enable_shared_from_this instead
of async_sharded_service.
_unreachable_endpoints is replicated to all cores. No need to query it on
core 0.
This also fixes a bug in storage_proxy::truncate_blocking,
which might access _unreachable_endpoints on non-zero cores.