scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Benny Halevy	6aaa1b0f48	gossiper: add_saved_endpoint: set dc and rack When loading endpoint_state from system.peers, pass the loaded nodes' dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available, so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following https://github.com/scylladb/scylladb/pull/16376. The reason to do so is that the other states: tokens, dc, and rack are useless without the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs #15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:07:00 +03:00
Benny Halevy	468462aa73	gossiper: add_saved_endpoint: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:07:00 +03:00
Benny Halevy	b9e2aa4065	gossiper: add_saved_endpoint: make host_id mandatory Require all callers to provide a valid host_id parameter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:07:00 +03:00
Benny Halevy	1061455442	gossiper: add load_endpoint_state Pack the topology-related data loaded from system.peers in `gms::load_endpoint_state`, to be used in a following patch for `add_saved_endpoint`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:06:56 +03:00
Benny Halevy	6b2d94045a	gossiper: start_gossiping: log local state The trace level message hides important information about the initial node state in gossip. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:06:30 +03:00
Kefu Chai	0be61e51d3	treewide: include <fmt/ostream.h> this header was previously brought in by seastar's sstring.hh. but since sstring.hh does not include <fmt/ostream.h> anymore, `gms/application_state.cc` does not have access to this header. also, `gms/application_state.cc` should `#include` the used header by itself. so, in this change, let's include <fmt/ostream.h> in `gms/application_state.cc`. this change addresses the FTBFS with the latest seastar. the same applies to other places changed in this commit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18193	2024-04-11 11:59:41 +03:00
Kefu Chai	ff43628b44	gms: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18194	2024-04-05 08:48:17 +03:00
Piotr Dulikowski	e76817502f	gms: feature: mark when_enabled(func) with nodiscard The feature::when_enabled function takes a callback and returns a listener_registration object. Unless the feature were enabled right from the start, the listener_registration will be non-null and will keep the callback registered until the registration is destroyed. If the registration is destroyed before the feature is enabled, the callback will not be called. It's easy to make a mistake and forget to keep the returned registration alive - especially when, in tests, the feature is enabled early in boot, because in that case when_enabled calls the callback immediately and returns a null object instead. In order to prevent issues with prematurely dropped listener_registration in the future, mark feature::when_enabled with the [[nodiscard]] attribute.	2024-03-27 08:55:45 +01:00
Piotr Dulikowski	2d9e78b09a	gossiper: failure detector: don't handle directly removed live endpoints Commit `0665d9c346` changed the gossiper failure detector in the following way: when live endpoints change and per-node failure detectors finish their loops, the main failure detector calls gossiper::convict for those nodes which were alive when the current iteration of the main FD started but now are not. This was changed in order to make sure that nodes are marked as down, because some other code in gossiper could concurrently remove nodes from the live node lists without marking them properly. This was committed around 3 years ago and the situation changed: - After `75d1dd3a76` the `endpoint_state::_is_alive` field was removed and liveness of a node is solely determined by its presence in the `gossiper::_live_endpoints` field. - Currently, all gossiper code which modifies `_live_endpoints` takes care to trigger the relevant callbacks. The only function which modifies the field but does not trigger notifications is `gossiper::evict_from_membership`, but it is either called after `gossiper::remove_endpoint` which triggers callbacks by itself, or when a node is already dead and there is no need to trigger callbacks. So, it looks like the reasons it was introduced for are not relevant anymore. What's more important though is that it is involved in a bug described in scylladb/scylladb#17515. In short, the following sequence of events may happen: 1. Failure detector for some remote node X decides that it was dead long enough and `convict`s it, causing live endpoints to be updated. 2. The gossiper main loop sends a successful echo to X and decides to mark it as alive. 3. At the same time, failure detectors for all nodes other than X finish and the main failure detector continues; it notices that node X is not alive (because it was convicted in point 1.) and decides to convict it. 4. Actions planned in 2 and 3 run one after another, i.e. node is first marked as alive and then immediately as dead. This causes `on_alive` callbacks to run first and then `on_dead`. The second one is problematic as it closes RPC connections to node X - in particular, if X is in the process of replacing another node with the same IP then it may cause the replace operation to fail. In order to simplify the code and fix the bug - remove the piece of logic in question. Fixes: scylladb/scylladb#17515 Closes scylladb/scylladb#17754	2024-03-14 13:29:17 +01:00
Avi Kivity	dd76e1c834	Merge 'Simplify error_injection::inject_with_handler()' from Pavel Emelyanov The method in question can have a shorter name that matches all other injections in this class, and can be non-template Closes scylladb/scylladb#17734 * github.com:scylladb/scylladb: error_injection: De-template inject() with handler error_injection: Overload inject() instead of inject_with_handler()	2024-03-14 13:37:54 +02:00
Pavel Emelyanov	488404e080	gms: Remove unused i_failure_detection_event_listener Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17765	2024-03-13 09:33:56 +02:00
Pavel Emelyanov	1f44a374b8	error_injection: Overload inject() instead of inject_with_handler() The inject_with_handler() method accepts a coroutine that can be called with injection_handler. With such a function as an argument, there's no need for a distinct inject_with_handler() method name; it can be an overload of the existing inject()-s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-03-11 19:30:19 +03:00
Benny Halevy	9804ce79d8	gossiper: do_status_check: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-10 20:17:00 +02:00
Benny Halevy	1375c4e6a3	gossiper: do_status_check: allow evicting dead nodes from membership with no host_id Be more permissive about the presence of host_id application state for dead and expired nodes in release mode, so do not throw runtime_error in this case, but rather consider them as non-normal token owners. Instead, call on_internal_error_noexcept that will log the internal error and a backtrace, and will abort if abort-on-internal-error is set. This was seen when replacing dead nodes, without https://github.com/scylladb/scylladb/pull/15788 Fixes #16936 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-10 20:17:00 +02:00
Benny Halevy	f32efcb7a6	gossiper: print the host_id when endpoint state goes UP/DOWN The host_id is now used in token_metadata and in raft topology changes so print it when the gossiper marks the node as UP/DOWN. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-10 20:17:00 +02:00
Benny Halevy	fbf85ee199	gossiper: get_host_id: differentiate between no endpoint_state and no application_state Currently, we throw the same runtime_error: `Host {} does not have HOST_ID application_state` in both cases: when there is no endpoint_state and when the endpoint_state has no HOST_ID application state. The latter case is unexpected, especially after `8ba0decda5` (and also from the add_saved_endpoint path after https://github.com/scylladb/scylladb/pull/15788 is merged), so throw a different error in each case so we can tell them apart in the logs. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-10 20:16:49 +02:00
Benny Halevy	a9fb0cf3dc	gms: endpoint_state: add get_host_id A simpler getter to get the HOST_ID application state from the endpoint_state. Return a null host_id if the application state is not found. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-10 15:19:51 +02:00
Benny Halevy	234774295e	gossiper: do_status_check: continue loop after evicting FatClient We're seeing cases like #16936: ``` INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - failure_detector_loop: Mark node 127.0.23.4 as DOWN INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - InetAddress 127.0.23.4 is now DOWN, status = BOOT INFO 2024-01-23 02:14:27,913 [shard 0: gms] gossip - FatClient 127.0.23.4 has been silent for 30000ms, removing from gossip INFO 2024-01-23 02:14:27,915 [shard 0: gms] gossip - Removed endpoint 127.0.23.4 WARN 2024-01-23 02:14:27,916 [shard 0: gms] gossip - === Gossip round FAIL: std::runtime_error (Host 127.0.23.4 does not have HOST_ID application_state) ``` Since the FatClient timeout handling already evicts the endpoint from membership there is no need to check further if the node is dead and expired, so just co_return. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-03-10 15:19:51 +02:00
Kamil Braun	fd32e2ee10	Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc. The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0. The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function was called on a shard other than 0. There is a possibility that it is a fix for #17493. Because `get_next_version` uses a simple incrementation to advance the global counter, a data race can occur if two shards call it concurrently and it may result in shard 0 returning the same or smaller value when called two times in a row. The following sequence of events is suspected to occur on node A: 1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores it in a register; the thread then is preempted, 2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state, 3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state, 4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`, 5. Shard 1 resumes and stores version `v` in the global counter, 6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`. 7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4. Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by: 1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual, 2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value, 3. running a test which performs schema changes in a loop. The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway. Refs: #17493 Closes scylladb/scylladb#17499 * github.com:scylladb/scylladb: version_generator: check that get_next_version is called on shard 0 misc_services: fix data race from bad usage of get_next_version	2024-02-25 19:35:34 +01:00
Piotr Dulikowski	54546e1530	version_generator: check that get_next_version is called on shard 0 The get_next_version function can only be safely called from shard 0, but this constraint is not enforced in any way. As evidenced in the previous commit, it is easy to accidentally call it from a non-zero shard. Introduce a runtime check to get_next_version which calls on_fatal_internal_error if it detects that the function was called from the wrong shard. This will let us detect cross-shard use issues at runtime.	2024-02-23 13:49:49 +01:00
Kefu Chai	3a3f0d392f	gms/versioned_value: impl operator<<(.., const gms::versioned_value) using fmt less repetition this way. this is also a follow-up change of `cb781c0ff7`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17390	2024-02-23 08:11:03 +02:00
Kamil Braun	3d15fecf12	Merge 'amend cluster_status_table virtual table to work with raft' from Gleb The cluster_status_table virtual table has a status field for each node. In gossiper mode the status is taken from the gossiper, but with raft the states are different and are stored in the topology state machine. The series fixes the code to check the current mode and take the status from the correct place. Refs scylladb/scylladb#16984 * 'gleb/cluster_status_table-v1' of github.com:scylladb/scylla-dev: gossiper: remove unused REMOVAL_COORDINATOR state virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled virtual_tables: create result for cluster_status_table read on shard 0	2024-02-22 11:47:57 +01:00
Gleb Natapov	f00ea36f63	gossiper: remove unused REMOVAL_COORDINATOR state This is leftover from `66ff072540`	2024-02-19 15:01:33 +02:00
Kefu Chai	cb781c0ff7	gms: add add formatter for gms::versioned_value before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `gms::versioned_value`. its operator<< is preserved, as it's still being used by the homebrew generic formatter for std::unordered_map<gms::application_state, gms::versioned_value>, which is in turn used in gms/gossiper.cc. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17366	2024-02-18 19:21:54 +02:00
Kefu Chai	4812a57f71	gms: add add formatter for gms::gossip_* before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for - gms::gossip_digest - gms::gossip_digest_ack - gms::gossip_digest_syn and drop their operator<<:s Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17379	2024-02-18 19:21:53 +02:00
Kamil Braun	2e81f045cc	Merge 'transport: controller: do_start_server: do not set_cql_read for maintenance port' from Benny Halevy RPC is not ready yet at this point, so we should not set this application state yet. Also, simplify add_local_application_state as it contains dead code that will never generate an internal error after `1d07a596bf`. Fixes #16932 Closes scylladb/scylladb#17263 * github.com:scylladb/scylladb: gossiper: add_local_application_state: drop internae error transport: controller: do_start_server: do not set_cql_read for maintenance port	2024-02-12 13:26:45 +01:00
Benny Halevy	2ed29e31db	gms: inet_address: make constructors explicit In particular, `inet_address(const sstring& addr)` is dangerous, since a function like `topology::get_datacenter(inet_address ep)` might accidentally convert a `sstring` argument into an `inet_address` (which would most likely throw an obscure std::invalid_argument if the datacenter name does not look like an inet_address). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#17260	2024-02-11 15:44:13 +02:00
Benny Halevy	f86a5072d6	gossiper: add_local_application_state: drop internae error After `1d07a596bf` that dropped before_change notifications there is no sense in getting the local endpoint_state_ptr twice: before and after the notifications and call on_internal_error if the state isn't found after the notifications. Just throw the runtime_error if the endpoint state is not found, otherwise, use it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-02-11 13:33:26 +02:00
Kamil Braun	e9e24f47ec	Merge 'raft topology: implement upgrade and recovery procedure' from Piotr Dulikowski This PR implements a procedure that upgrades existing clusters to use raft-based topology operations. The procedure does not start automatically, it must be triggered manually by the administrator after making sure that no topology operations are currently running. Upgrade is triggered by sending `POST /storage_service/raft_topology/upgrade` request. This causes the topology coordinator to start, which then drives the rest of the process: it builds the `system.topology` state based on information observed in gossip and tells all nodes to switch to raft mode. Then, topology coordinator runs normally. Upgrade progress is tracked in a new static column `upgrade_state` in `system.topology`. The procedure also serves as an extension to the current recovery procedure on raft. The current recovery procedure requires restarting nodes in a special mode which disables raft, performing `nodetool removenode` on the dead nodes, cleaning up some state on the nodes and restarting them so that they automatically rebuild the group 0. Raft topology fits into the existing procedure by falling back to legacy topology operations after disabling raft. After rebuilding the group 0, upgrade needs to be triggered again. Because upgrade is manual and it might not be convenient for administrators to run it right after upgrading the cluster, we allow the cluster to operate in legacy topology operations mode until upgrade, which includes allowing new nodes to join. In order to allow it, nodes now ask the cluster about the mode they should use to join before proceeding by using a new `JOIN_NODE_QUERY` RPC. The procedure is explained in more detail in `topology-over-raft.md`. Fixes: https://github.com/scylladb/scylladb/issues/15008 Closes scylladb/scylladb#17077 * github.com:scylladb/scylladb: test/topology_custom: upgrade/recovery tests for topology on raft cdc/generation_service: in legacy mode, fall back to raft tables system_keyspace: add read_cdc_generation_opt cdc/generation_service: turn off gossip notifications in raft topo mode cql_test_env: move raft_topology_change_enabled var earlier group0_state_machine: pull snapshot after raft topology feature enabled storage_service: disable persistent feature enabler on upgrade storage_service: replicate raft features to system.peers storage_service: gossip tokens and cdc generation in raft topology mode API: add api for triggering and monitoring topology-on-raft upgrade storage_service: infer which topology operations to use on startup storage_service: set the topology kind value based on group 0 state raft_group0: expose link to the upgrade doc in the header feature_service: fall back to checking legacy features on startup storage_service: add fiber for tracking the topology upgrade progress gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES topology_coordinator: implement core upgrade logic topology_coordinator: extract top-level error handling logic storage_service: initialize discovery leader's state earlier topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data topology_state_machine: introduce upgrade_state storage_service: disallow topology ops when upgrade is in progress raft_group0_client: add in_recovery method storage_service: introduce join_node_query verb raft_group0: make discover_group0 public raft_group0: filter current node's IP in discover_group0 raft_group0: remove my_id arg from discover_group0 storage_service: make _raft_topology_change_enabled more advanced docs: document raft topology upgrade and recovery	2024-02-09 11:54:53 +01:00
Piotr Dulikowski	53932420f8	storage_service: disable persistent feature enabler on upgrade When starting in legacy mode, a gossip event listener called persistent feature enabler is registered. This listener marks a feature as enabled when it notices, in gossip, that all nodes declare support for the feature. With raft-based topology, features are managed in group 0 instead and do not rely on the persistent feature enabler at all. Make the listener look at the raft_topology_change_enabled() method and prevent it from enabling more features after that method starts returning true.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	3513a07d8a	feature_service: fall back to checking legacy features on startup When checking features on startup (i.e. whether support for any feature was revoked in an unsafe way), it might happen that upgrade to raft topology didn't finish yet. In that case, instead of loading an empty set of features - which supposedly represents the set of features that were enabled until last boot - we should fall back to loading the set from the legacy `enabled_features` key in `system.scylla_local`.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	2ecb8641b1	gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES All nodes being capable of support for raft topology is a prerequisite for starting upgrade to raft topology. The newly introduced feature will track this prerequisite.	2024-02-08 19:12:28 +01:00
Kefu Chai	6eae678eb3	db: add formatter for gms::gossip_digest_ack2 before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `gms::gossip_digest_ack2`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17153	2024-02-08 11:49:37 +02:00
Pavel Emelyanov	66b859a29f	gms: Remove unused operator<< for feature object Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17109	2024-02-01 19:00:46 +02:00
Kefu Chai	005d231f96	db: add formatter for gms::application_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `gms::application_state`, but its operator<< is preserved, as it is still used by the generic homebrew formatter for `std::unordered_map<>`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17096	2024-02-01 10:02:25 +02:00
Botond Dénes	5f44ae8371	Merge 'Add more logging for `gossiper::lock_endpoint` and `storage_service::handle_state_normal`' from Kamil Braun In a longevity test reported in scylladb/scylladb#16668 we observed that NORMAL state is not being properly handled for a node that replaced another node. Either handle_state_normal is not being called, or it is but getting stuck in the middle. Which is the case couldn't be determined from the logs, and attempts at creating a local reproducer failed. Thus the plan is to continue debugging using the longevity test, but we need more logs. To check whether `handle_state_normal` was called and which branches were taken, include some INFO level logs there. Also, detect deadlocks inside `gossiper::lock_endpoint` by reporting an error message if `lock_endpoint` waits for the lock for too long. Ref: scylladb/scylladb#16668 Closes scylladb/scylladb#16733 * github.com:scylladb/scylladb: gossiper: report error when waiting too long for endpoint lock gossiper: store source_location instead of string in endpoint_permit storage_service: more verbose logging in handle_state_normal	2024-01-12 10:51:21 +02:00
Kamil Braun	cf646022cb	gossiper: report error when waiting too long for endpoint lock In a longevity test reported in scylladb/scylladb#16668 we observed that NORMAL state is not being properly handled for a node that replaced another node. Either handle_state_normal is not being called, or it is but getting stuck in the middle. Which is the case couldn't be determined from the logs, and attempts at creating a local reproducer failed. One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127). Modify the code so it reports an error if `lock_endpoint` waits for the lock for more than a minute. When the issue reproduces again in longevity, we will see if `lock_endpoint` got stuck.	2024-01-11 17:29:25 +01:00
Kefu Chai	7abd263ee6	db/config.cc: do not respect sstable_format option "me" sstable format includes an important feature of storing the `host_id` of the local node when writing sstables. This is crucial for validating the sstable's `replay_position` in stats metadata as it is valid only on the originating node and shard (#10080), therefore we would like to make the `me` format mandatory. before making `me` mandatory, we need to stop handling `sstable_format` option if it is "md". in this change - gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if `sstable_format` is configured with "md". and in that case, instead, a warning is printed in the logging message to note that this setting is not valid anymore. - docs/architecture/sstable: note that "me" is used by default now. after this change, "sstable_format" will only accept "me" if it's explicitly configured. and when a server with this change joins a cluster, it uses "md" if any of the nodes in the cluster still has `sstable_format`. practically, this change makes "me" mandatory in a 6.x cluster, assuming this change will be included in 6.x releases. Fixes #16551 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 22:43:05 +08:00
Kefu Chai	bece3eff0c	feature_service: abort if sstable_format < md sstable_format comes from scylla.yaml or from the command line arguments, and we gate scylla from unallowed sstable formats lower than `md` when parsing the configuration, and scylla bails out at seeing the unallowed sstable format like: ``` terminate called after throwing an instance of 'std::invalid_argument' what(): Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me Aborted (core dumped) ``` scylla errors out way before `feature_config_from_db_config()` gets called -- it throws in `bpo::notify(configuration)`, way before `func` is evaluated in `app_template::run_deprecated()`. so, in this change, we do not handle these values anymore, and consider it a bug if we run into any of them. Refs #16551 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 22:43:05 +08:00
Kamil Braun	6e39c2ffde	gossiper: store source_location instead of string in endpoint_permit The original code extracted only the function_name from the source_location for logging. We'll use more information from the source_location in later commits.	2024-01-10 17:02:52 +01:00
Kefu Chai	f61f6c27e3	gms: add formatter for gms::endpoint_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for gms::endpoint_state, and update the callers of `operator<<` to use `fmt::print()`. but we cannot drop `operator<<` yet, as we are still using the templated operator<< and templated fmt::formatter to print containers in scylla and in seastar -- they are still using `operator<<` under the hood. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16705	2024-01-10 09:16:23 +02:00
Kefu Chai	b91eb89ffa	gms: heart_beat_state: add formatter for gms::heart_beat_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for gms::heart_beat_state, and remove its operator<<(). the only caller site of its operator<< is updated to use `fmt::print()` Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16652	2024-01-09 11:52:40 +02:00
Kefu Chai	cca786e847	gms: endpoint_state: fix a typo in comment Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16653	2024-01-09 11:51:49 +02:00
Kefu Chai	7e84e03f52	gms: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. because the removal of `#include "unimplemented.hh"`, `service/migration_manager.cc` misses the definition of `unimplemented::cause::VALIDATION`, so include the header where it is used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16654	2024-01-05 13:37:08 +02:00
Kamil Braun	f942bf4a1f	Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy Currently, `add_saved_endpoint` is called from two paths: One is when loading states from system.peers in the join path (join_cluster, join_token_ring), when `_raft_topology_change_enabled` is false, and the other is from `storage_service::topology_state_load` when raft topology changes are enabled. In the latter path, from `topology_state_load`, `add_saved_endpoint` is called only if the endpoint_state does not exist yet. However, this is checked without acquiring the endpoint_lock and so it races with the gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint state may already be populated. Since `add_saved_endpoint` applies local information about the endpoint state (e.g. tokens, dc, rack), it uses the local heart_beat_version, with generation=0 to update the endpoint states, and that is incompatible with changes applied via gossip that will carry the endpoint's generation and version, determining the state's update order. This change makes sure that the endpoint state is never updated in `add_saved_endpoint` if it has non-zero generation. An internal error exception is thrown if non-zero generation is found, and in the only call site that might reach that state, in `storage_service::topology_state_load`, the caller acquires the endpoint_lock for checking for the existence of the endpoint_state, calling `add_saved_endpoint` under the lock only if the endpoint_state does not exist. Fixes #16429 Closes scylladb/scylladb#16432 * github.com:scylladb/scylladb: gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found storage_service: topology_state_load: lock endpoint for add_saved_endpoint raft_group_registry: move on_alive error injection to gossiper	2024-01-04 14:47:10 +01:00
Benny Halevy	9e8998109f	gossiper: get_*_members_synchronized: acquire endpoint update semaphore To ensure that the value they return is synchronized on all shards. This got broken recently by `147f30caff`. Refs https://github.com/scylladb/scylladb/pull/16597#discussion_r1440445432 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16629	2024-01-03 17:41:46 +01:00
Benny Halevy	147f30caff	gossiper: mutate_live_and_unreachable_endpoints: make exception safe Change the mutate_live_and_unreachable_endpoints procedure so that the called `func` would mutate a cloned `live_and_unreachable_endpoints` object in place. Those are replicated to temporary copies on all shards using `foreign<unique_ptr<>>` so that they would be automatically freed on exception. Only after all copies are made, they are applied on all gossiper shards in a noexcept loop and finally, an `on_success` function is called to apply further side effects if everything else was replicated successfully. The latter is still susceptible to exceptions, but we can live with those as long as `_live_endpoints` and `_unreachable_endpoints` are synchronized on all shards. With that, the read-only methods: `get_live_members_synchronized` and `get_unreachable_members_synchronized` become trivial and they just return the required data from shard 0. Fixes #15089 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16597	2024-01-03 14:46:10 +02:00
Benny Halevy	cdd5605d81	gms: endpoint_state: change application_state_map to std::unordered_map State changes are processed as a batch and there is no reason to maintain them as an ordered map. Instead, use a std::unordered_map that is more efficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-31 18:37:34 +02:00
Benny Halevy	ad8a9104d8	endpoint_state subscriptions: batch on_change notification Rather than calling on_change for each particular application_state, pass an endpoint_state::map_type with all changed states, to be processed as a batch. In particular, this allows storage_service::on_change to update_peer_info once for all changed states. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-31 18:37:34 +02:00
Benny Halevy	1d07a596bf	everywhere: drop before_change subscription None of the subscribers is doing anything before_change. This is done before changing `on_change` in the following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-31 18:37:34 +02:00
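
Illustration for commit e76817502f (gms: feature: mark when_enabled(func) with nodiscard), listed above. The message describes a registration object that keeps a callback alive only as long as the object itself lives, and a null registration returned when the feature is already enabled. The sketch below reproduces that behaviour in isolation. It is not the actual `gms::feature` or `listener_registration` code: the `feature` and `registration` classes, their members, and the use of `std::shared_ptr`/`std::weak_ptr` to model the listener lifetime are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cluster feature flag with "enabled" listeners.
class feature {
public:
    using callback = std::function<void()>;

    // Hypothetical RAII handle: the callback stays registered only while a
    // non-null registration is alive. A default-constructed one is "null".
    class registration {
        std::shared_ptr<callback> _cb;
    public:
        registration() = default;
        explicit registration(std::shared_ptr<callback> cb) : _cb(std::move(cb)) {}
        explicit operator bool() const { return static_cast<bool>(_cb); }
    };

    // [[nodiscard]] asks the compiler to warn when the caller drops the
    // returned handle, which would silently unregister the callback.
    [[nodiscard]] registration when_enabled(callback cb) {
        if (_enabled) {
            cb();                   // already enabled: run immediately
            return registration();  // and hand back a null registration
        }
        auto owned = std::make_shared<callback>(std::move(cb));
        _listeners.push_back(owned);  // the feature only holds a weak reference
        return registration(std::move(owned));
    }

    void enable() {
        _enabled = true;
        auto pending = std::move(_listeners);
        _listeners.clear();
        for (auto& weak : pending) {
            if (auto cb = weak.lock()) {  // skipped if the handle was destroyed
                (*cb)();
            }
        }
    }

private:
    bool _enabled = false;
    std::vector<std::weak_ptr<callback>> _listeners;
};
```

In this sketch, a caller that writes `f.when_enabled(cb);` as a bare statement loses the callback before `enable()` runs, and the attribute turns that mistake into a compiler diagnostic. When the feature is already enabled the same call runs the callback at once, which is why, as the commit message notes, the problem is easy to miss in tests that enable features early.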


1092 Commits