No code uses the global gossiper instance, so it can be removed. The main
and cql-test-env code now have their own real local instances.
This change also requires adding the debug:: pointer and fixing
scylla-gdb.py to find the correct global location.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The gossiper reads peer features from the system keyspace. The snitch
code also needs the system keyspace, and since it currently gets all its
dependencies from the gossiper (to be fixed some day, but not now), it
will do the same for the system keyspace. Thus it's worth having an
explicit gossiper->system_keyspace dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Also const-ify the db::config reference argument and std::move
the gossip_config argument while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
These options need to have an updateable_value<> instance referencing
them from the gossiper itself. The updateable_value<> is shard-aware in
the sense that it must be constructed on the correct shard. This patch
does that -- the db::config reference is carried all the way down
to the gossiper constructor, and then each instance performs its
shard-local construction of the updateable_value<>s.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since these two functions call each other, convert both
to coroutines at the same time and eliminate their
dependency on `seastar::async`.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
Closes #9937
"
Start converting small functions in gossiper code
from using `seastar::thread` context to coroutines.
For now, the changes are quite trivial.
Later, larger code fragments will be converted
to eliminate uses of `seastar::async` function calls.
Moving the code to coroutines makes it a bit more readable
and also makes it immediately evident from the signature alone
that a given function is async (for example, a void-returning
coroutine returns `future<>`, whereas a seastar::thread-using
function returns `void`).
Tests: unit(dev)
"
* 'coro_gossip_v1' of https://github.com/ManManson/scylla:
gms: gossiper: coroutinize `maybe_enable_features`
gms: gossiper: coroutinize `wait_alive`
gms: gossiper: coroutinize `add_saved_endpoint`
gms: gossiper: coroutinize `evict_from_membership`
When `check_and_repair_cdc_streams` encountered a node with status LEFT, Scylla
would throw. This behavior is fixed so that LEFT nodes are simply ignored.
Fixes #9771
Closes #9778
There's nothing in this function that actually requires
the batchlog manager instance.
It uses a random number engine that's moved along with it
to class gossiper.
This resolves a circular dependency between the
batchlog_manager and storage_proxy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When shutting down, the gossiper may have some messages still being
processed in the background. This brings two problems.
First, the gossiper itself is about to disappear soon and messages
might step on the freed instance (however, this is not a real problem
yet: the gossiper is not actually freed, only ::stop() is called).
Second, message processing may notify other subsystems which, in
turn, do not expect this after the gossiper is shut down.
The common solution to this is to run background code through a gate
that gets closed at some point -- ::shutdown() in the gossiper's case.
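The gate idea can be sketched as follows; this is a hypothetical, single-threaded stand-in (the real code uses seastar::gate, which also waits asynchronously for in-flight tasks to drain on close):

```cpp
#include <cassert>

// Minimal sketch of the gate pattern: background work "enters" the
// gate before running; shutdown "closes" it, after which no new work
// may enter. Names and structure are illustrative only.
class gate {
    int _count = 0;       // background tasks currently inside the gate
    bool _closed = false;
public:
    bool try_enter() {
        if (_closed) {
            return false; // gate closed: reject new background work
        }
        ++_count;
        return true;
    }
    void leave() { --_count; }
    // A real gate's close() also waits until _count drops to zero
    // before letting shutdown proceed.
    void close() { _closed = true; }
    int count() const { return _count; }
};
```

Work that entered before close() can still finish (leave()), but nothing new starts after shutdown begins, so late messages cannot touch a stopping gossiper.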
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some messages are processed by the gossiper on shard 0 in a no-wait
manner. Add a generic helper for that to facilitate the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Consider:
- n1, n2 in the cluster
- n2 shutdown
- n2 sends gossip shutdown message to n1
- n1 delays processing of the handler of shutdown message
- n2 restarts
- n1 learns new gossip state of n2
- n1 resumes to handle the shutdown message
- n1 incorrectly marks n2 with shutdown status until n2 restarts again
To prevent this, we can send the gossip generation number along with the
shutdown message. If the generation number does not match the local
generation number for the remote node, the shutdown message will be
ignored.
Since we use rpc::optional to send the generation number, this works
in a mixed cluster.
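The check can be sketched like this (hypothetical helper name; an absent generation means the sender predates this patch, which is what rpc::optional makes possible):

```cpp
#include <cstdint>
#include <optional>

// Decide whether a received shutdown message should be applied.
// msg_generation: generation number carried in the message, if any.
// local_generation_for_node: generation we currently know for the sender.
bool should_apply_shutdown(std::optional<int64_t> msg_generation,
                           int64_t local_generation_for_node) {
    if (!msg_generation) {
        // Old sender without the patch: no generation to compare,
        // keep the pre-patch behavior and honor the message.
        return true;
    }
    // Ignore the message if it came from a previous incarnation of
    // the node, i.e. the node has restarted since sending it.
    return *msg_generation == local_generation_for_node;
}
```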
Fixes #8597
Closes #9381
It's much shorter and simpler to pass the seeds, obtained from the
config, into the gossiper via gossip_config rather than with the help
of a special call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The same as in the previous patch -- the gossiper doesn't need to know
whether it should call messaging.start_listen() or not, and neither
should the storage_service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper in question is called in two places:
1. In main() as a fuse against early exception before creating the
drain_on_shutdown() defer
2. In the stop_gossiping() API call
Both can be replaced with the stop_gossiping() call from the .stop()
method, here's why:
1. In main() the gossiper::stop() call is already deferred right after
   the gossiper is started. So this change moves it up. An exception
   may pop up before the old fuse was deferred, but that's OK --
   stop_gossiping() is safe against early and repeated invocation
2. The stop_gossiping() change is effectively a rename -- it calls
   stop_gossiping() as it did before, but via the .stop() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The start/stop sequence we're moving towards assumes a shutdown (or
drain) method that is called early on stop to notify the service
that the system is going down, so it can prepare.
For the gossiper this already means calling stop_gossiping() on the
shard-0 instance. So by and large this patch renames a few
stop_gossiping() calls into .shutdown() ones.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's taken purely from db::config and thus can be set up early.
Right now an empty name is converted into "Test Cluster", but it
remains empty in the config and is later used by the system_keyspace
code. This logic remains intact.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Turn the gossiper start/stop sequence into the canonical form

    gossiper.start(std::ref(dependencies)...).get();
    auto stop_gossiper = defer([&] {
        gossiper.invoke_on_all(&gossiper::stop).get();
    });
    gossiper.invoke_on_all(&gossiper::start).get();

The deferred call should be gossiper.stop(), but for now keep
the instances' memory alive.
This trick is safe at this point, because the .start() and .stop()
methods are both (still) empty.
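The defer idiom above can be illustrated with a minimal stand-in (seastar provides the real seastar::defer; this sketch only shows the ordering guarantee that the stop action runs when the scope unwinds, including on exception):

```cpp
#include <utility>

// RAII holder that runs a callable when it goes out of scope,
// mimicking the deferred-stop pattern in the commit message.
template <typename Func>
class deferred_action {
    Func _f;
    bool _armed = true;
public:
    explicit deferred_action(Func f) : _f(std::move(f)) {}
    ~deferred_action() { if (_armed) _f(); }
    void cancel() { _armed = false; }
};

template <typename Func>
deferred_action<Func> defer(Func f) {
    return deferred_action<Func>(std::move(f));
}
```

Because the stop action is registered before the services are actually started, any exception thrown during startup still triggers the deferred cleanup.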
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two of them, and one is only called from the API with
do_bind always set to "yes". This makes it possible to remove
it by adding relevant defaults to the other.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The debug_show() is unused, as is advertise_myself().
The _features_condvar used to be waited on before f32f08c9;
now it's signal-only.
The feature class's friendship with the gossiper is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some state accessors called get_local_gossiper(); this is removed
and replaced with a parameter. Some callers (redis, alternator)
now have the gossiper passed as a parameter during initialization
so they can use the adjusted API.
Have the callers pass it instead; they all have a reference
already, except for cql_test_env (which will be fixed later).
The initialization checks it performs are likely unnecessary, but
we'll only be able to prove that once get_gossiper() is completely
removed.
In commit 425e3b1182 (gossip: Introduce
direct failure detector), the call to notify_failure_detector inside
the ack and ack2 message handlers was removed, since there is no need
to update the old failure detector anymore. However, the timestamp for
endpoint_state is also updated inside notify_failure_detector, and with
the new failure detector we still need that timestamp. Otherwise, nodes
might be wrongly removed from gossip.
For example, as we saw in issue #8702:
INFO 2021-05-24 22:45:24,713 [shard 0] gossip - FatClient 127.0.60.2
has been silent for 5000ms, removing from gossip
To fix, update the timestamp in the ack and ack2 message handlers
as we did before.
Fixes #8702
Closes #8777
"
The patch set is an assorted collection of header cleanups, e.g.:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places
A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).
The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).
Before:
Command being timed: "ninja dev-build"
User time (seconds): 28262.47
System time (seconds): 824.85
Percent of CPU this job got: 3979%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2129888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1402838
Minor (reclaiming a frame) page faults: 124265412
Voluntary context switches: 1879279
Involuntary context switches: 1159999
Swaps: 0
File system inputs: 0
File system outputs: 11806272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After:
Command being timed: "ninja dev-build"
User time (seconds): 26270.81
System time (seconds): 767.01
Percent of CPU this job got: 3905%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2117608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1400189
Minor (reclaiming a frame) page faults: 117570335
Voluntary context switches: 1870631
Involuntary context switches: 1154535
Swaps: 0
File system inputs: 0
File system outputs: 11777280
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The observed improvement is about 5% of total wall clock time
for `dev-build` target.
Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"
* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
transport: remove extraneous `qos/service_level_controller` includes from headers
treewide: remove evidently unneeded storage_proxy includes from some places
service_level_controller: remove extraneous `service/storage_service.hh` include
sstables/writer: remove extraneous `service/storage_service.hh` include
treewide: remove extraneous database.hh includes from headers
treewide: reduce boost headers usage in scylla header files
cql3: remove extraneous includes from some headers
cql3: various forward declaration cleanups
utils: add missing <limits> header in `extremum_tracking.hh`
Currently, gossip uses updates of the gossip heartbeat carried in gossip
messages to decide whether a node is up or down. This means that if a
node is actually down but its gossip messages are delayed in the
network, marking the node as down can also be delayed.
For example, suppose a node sends 20 gossip messages in the 20 seconds
before it dies, and each message is delayed 15 seconds by the network
for some reason. Another node receives those delayed messages one after
another. The delayed messages will prevent the dead node from being
marked as down, because each heartbeat update arrives just before the
mark-down threshold (around 20 seconds by default) is triggered.
As a result, the node will not be marked as down for 20 * 15 seconds =
300 seconds, much longer than the ~20 seconds node-down detection time
in normal cases.
In this patch, a new failure detector is implemented.
- Direct detection
The existing failure detector can get gossip heartbeat updates
indirectly. For example:
Node A can talk to Node B
Node B can talk to Node C
Node A can not talk to Node C, due to network issues
Node A will not mark Node C as down because Node A can get the
heartbeat of Node C from Node B indirectly.
This indirect detection is not very useful because when Node A decides
if it should send requests to Node C, the requests from Node A to C will
fail while Node A thinks it can communicate with Node C.
This patch changes the failure detection to be direct. It uses the
existing gossip echo message to detect failures directly. Gossip echo
messages are sent to peer nodes periodically, and a peer node is marked
as down if a timeout threshold has been met.
Since the failure detection is peer to peer, it avoids the delayed
message issue mentioned above.
- Parallel detection
The old failure detector uses shard zero only. This new failure detector
utilizes all the shards to perform the failure detection, each shard
handling a subset of live nodes. For example, if the cluster has 32
nodes and each node has 16 shards, each shard will handle only 2 nodes.
With a 16-node cluster where each node has 16 shards, each shard will
handle only one peer node.
A gossip message will be sent to peer nodes every 2 seconds. The extra
echo messages traffic produced compared to the old failure detector is
negligible.
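The per-shard assignment described above could look like the following sketch (hypothetical helper names; the commit does not specify the exact mapping, a simple modulo split is assumed here):

```cpp
#include <cstddef>

// Assign each peer node to one shard: node i is monitored by
// shard (i % smp_count), so the monitoring load is spread evenly.
size_t shard_for_node(size_t node_index, size_t smp_count) {
    return node_index % smp_count;
}

// Count how many peers a given shard ends up monitoring.
size_t nodes_per_shard(size_t nodes, size_t smp_count, size_t shard) {
    size_t n = 0;
    for (size_t i = 0; i < nodes; ++i) {
        if (shard_for_node(i, smp_count) == shard) {
            ++n;
        }
    }
    return n;
}
```

With 32 nodes and 16 shards each shard monitors 2 peers; with 16 nodes it monitors 1, matching the examples in the text.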
- Deterministic detection
Users can configure failure_detector_timeout_in_ms to set the
threshold for marking a node down. It is the maximum time between two
successful echo messages before gossip marks a node down. It is easier
to understand than the old phi_convict_threshold.
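The deterministic rule reduces to a single comparison; a minimal sketch (hypothetical helper, assuming monotonic millisecond timestamps):

```cpp
#include <cstdint>

// A node is considered down when the time since its last successful
// echo exceeds the configured failure_detector_timeout_in_ms.
bool is_node_down(int64_t now_ms,
                  int64_t last_successful_echo_ms,
                  int64_t failure_detector_timeout_in_ms) {
    return now_ms - last_successful_echo_ms > failure_detector_timeout_in_ms;
}
```

Unlike the phi accrual model, this gives an exact worst-case detection time: the configured timeout plus at most one echo interval.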
- Compatible
This patch only uses the existing gossip echo message. Nodes with or without
this patch can work together.
Fixes #8488
Closes #8036
We currently only update the failure detector for a node when a higher
version of application state is received. Since gossip syn messages do
not contain application state, we do not update the failure detector
upon receiving them, even though a message from a peer node implies
the peer is alive.
This patch relaxes the failure detector update rule to update the
failure detector for the sender of gossip messages directly.
Refs #8296
Closes #8476
gossiper::advertise_to_nodes() is added to allow responding to a gossip
echo message with specified nodes and the current gossip generation
number for those nodes.
This helps avoid marking a restarted node as alive during a pending
replace operation.
After this patch, when a node sends an echo message, the gossip
generation number is included in it. Since the generation number
changes after a restart, the receiver of the echo message can compare
generation numbers to tell whether the node has restarted.
Refs #8013