Sometimes we initialize a loop variable like
auto i = 0;
or
int i = 0;
But since the type of `0` is `int`, what we get is a variable of
`int` type, which is later compared with an unsigned number. If we
compile the source code with the `-Werror=sign-compare` option, the
compiler warns on seeing this. In general this is a false alarm, as
we are unlikely to get a wrong comparison result here. But in order
to prevent issues due to integer promotion in comparisons elsewhere,
and to prepare for enabling `-Werror=sign-compare`, let's use
unsigned to silence this warning.
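For instance, a minimal sketch of the change (the container and loop body are illustrative, not taken from the actual patch):
```
#include <vector>

void iterate(const std::vector<int>& v) {
    // `0u` makes `i` unsigned, so comparing it with the unsigned
    // v.size() no longer triggers -Werror=sign-compare.
    for (auto i = 0u; i < v.size(); ++i) {
        // ... use v[i] ...
    }
}
```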
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Chained comparison is not supported by C++ and does not yield the
expected result: `0 <= d < magic` parses as `(0 <= d) < magic`, and
`0 <= d` evaluates to a bool, which is always less than `magic`.
So let's avoid using it.
```
/home/kefu/dev/scylladb/test/raft/randomized_nemesis_test.cc:2908:23: error: result of comparison of constant 54313 with expression of type 'bool' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
2908 | assert(0 <= d < magic);
| ~~~~~~ ^ ~~~~~
```
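The fix is to spell out both comparisons explicitly:
```
// assert(0 <= d < magic);   // parses as (0 <= d) < magic
assert(0 <= d && d < magic); // what was actually intended
```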
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14695
As in the previous commit, if a command gets doubly applied due to
`commit_status_unknown`, this could lead to hard-to-debug failures;
one of them was the test hanging because we would never call
`_done.set_value()` in `state_machine::apply` due to `_seen`
overshooting `_apply_entries`.
Fix the problem and print a warning if we apply too many commands.
Fixes: #14072
`commit_status_unknown` may lead to double application and then a
hard-to-debug failure. But some tests actually rely on retrying it, so
print a warning and leave a FIXME for maybe a better future solution.
Ref: #14029
A generic template for defining strongly typed
integer types.
Use it here to replace raft::internal::tagged_uint64.
Will be used for defining gms generation and version
as strong and distinguishable types in following patches.
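A minimal sketch of such a tag-based template (the names here are illustrative, not the actual Scylla API):
```
#include <compare>
#include <cstdint>

// Two instantiations with different Tag types are distinct,
// non-interconvertible types.
template <typename Tag, typename Value = uint64_t>
class strong_int {
    Value _value = 0;
public:
    strong_int() = default;
    explicit constexpr strong_int(Value v) : _value(v) {}
    constexpr Value value() const { return _value; }
    constexpr auto operator<=>(const strong_int&) const = default;
};

struct generation_tag {};
struct version_tag {};
using generation_type = strong_int<generation_tag>;
using version_type = strong_int<version_tag>;

// generation_type g = version_type{1}; // error: distinct types
```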
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
gcc thinks the constructor call is ambiguous since "{}" can match
the default constructor. Fix by making the parameter type explicit.
Use "{}" for the constructor call to avoid the most-vexing-parse
problem.
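For reference, the classic shape of the most-vexing-parse problem (a generic sketch, not the exact code touched here):
```
struct config {};
struct server {
    explicit server(config) {}
};

// server s(config());  // most vexing parse: declares a function `s`
//                      // taking a pointer to a function returning config
server s{config{}};     // unambiguously constructs a server object
```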
Add a function that allows waiting for a state change of a raft server.
It is useful for a user who wants to know when a node becomes/stops
being a leader.
Message-Id: <20230316112801.1004602-4-gleb@scylladb.com>
The code for compare_endpoints originates at the dawn of time (bc034aeaec)
and is called on the fast path from storage_proxy via `sort_by_proximity`.
This series considerably reduces the function's footprint by:
1. carefully coding the many comparisons in the function so as to reduce the number of conditional branches (apparently the compiler isn't doing a good enough job of optimizing it in this case)
2. avoiding sstring copies in topology::get_{datacenter,rack}
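As an illustration of point 1, branches in locality comparisons can be traded for straight-line arithmetic on booleans (a hypothetical sketch; these names are not the actual compare_endpoints code):
```
#include <compare>

// Encode (same-dc, same-rack) locality as a small score; higher is
// closer, so the compiler can emit conditional moves instead of jumps.
std::strong_ordering compare_proximity(bool a_same_dc, bool a_same_rack,
                                       bool b_same_dc, bool b_same_rack) {
    int a = (int(a_same_dc) << 1) | int(a_same_rack);
    int b = (int(b_same_dc) << 1) | int(b_same_rack);
    return b <=> a; // closer endpoint sorts first
}
```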
Closes #12761
* github.com:scylladb/scylladb:
topology: optimize compare_endpoints
to_string: add print operators for std::{weak,partial}_ordering
utils: to_sstring: deinline std::strong_ordering print operator
move to_string.hh to utils/
test: network_topology: add test_topology_compare_endpoints
These warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Currently, if a node that is outside of the config tries to add an entry
or modify the config, a transient error is returned and this causes the
node to retry. But the error is not transient. If a node tries to do one
of the operations above, it means it was part of the cluster at some
point; and since a node with the same id should not be added back to a
cluster, if it is not in the cluster now, it never will be.
Return a new error, not_a_member, to the caller instead.
Message-Id: <Y42mTOx8bNNrHqpd@scylladb.com>
1) make address map API flexible
Before this patch:
- having a mapping without an actual IP address was an
internal error
- not having a mapping for an IP address was an internal
error
- re-mapping to a new IP address wasn't allowed
After this patch:
- the address map may contain a mapping
without an actual IP address, and the caller must be prepared for it:
find() will return a nullopt. This happens when we first add an entry
to Raft configuration and only later learn its IP address, e.g. via
gossip.
- it is allowed to re-map an existing entry to a new address;
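A caller-side sketch of the new contract (the names here are illustrative, not the actual address map API):
```
#include <optional>
#include <string>
#include <unordered_map>

using server_id = unsigned long;   // stand-in for raft::server_id
using inet_address = std::string;  // stand-in for gms::inet_address

struct address_map {
    // An entry may exist with no IP yet: the id was added to the Raft
    // configuration before gossip delivered its address.
    std::unordered_map<server_id, std::optional<inet_address>> entries;

    std::optional<inet_address> find(server_id id) const {
        auto it = entries.find(id);
        return it == entries.end() ? std::nullopt : it->second;
    }
};

// if (auto addr = map.find(id)) { send(*addr, msg); }
// else { /* IP unknown yet: defer or drop */ }
```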
2) subscribe to gossip notifications
Learning IP addresses from gossip allows us to adjust
the address map whenever a node IP address changes.
Gossiper is also the only valid source of re-mapping, other sources
(RPC) should not re-map, since otherwise a packet from a removed
server can remap the id to a wrong address and impact liveness of a Raft
cluster.
3) prime address map state with app state
Initialize the raft address map with initial
gossip application state, specifically IPs of members
of the cluster. With this, we no longer need to store
these IPs in Raft configuration (and update them when they change).
The obvious drawback of this approach is that a node
may join Raft config before it propagates its IP address
to the cluster via gossip - so the boot process has to
wait until it happens.
Gossip also doesn't tell us which IPs are members of Raft configuration,
so we subscribe to Group0 configuration changes to mark the
members of Raft config "non-expiring" in the address translation
map.
Thanks to the changes above, Raft configuration no longer
stores IP addresses.
We still keep the 'server_info' column in the raft_config system table,
in case we change our mind or decide to store something else in there.
Pass a change diff into the notification callback,
rather than adding or removing servers one by one, so that
if we need to persist the state, we can do it once per
configuration change, not for every added or removed server.
For now, still pass added and removed entries in two separate calls
per configuration change. This is done mainly to fulfill the
library contract that it never sends messages to servers
outside the current configuration. The group0 RPC
implementation doesn't need the two calls, since it simply
marks the removed servers as expired: they are not removed immediately
anyway, and messages can still be delivered to them.
However, there may be test/mock implementations of RPC which
could benefit from this contract, so we decided to keep it.
The direct failure detector operates on abstract `endpoint_id`s for
pinging. The `pinger` interface is responsible for translating these IDs
to 'real' addresses.
Earlier we used two types of addresses: IP addresses in 'production'
code (`gms::gossiper::direct_fd_pinger`) and `raft::server_id`s in test
code (in `randomized_nemesis_test`). For each of these use cases we
would maintain mappings between `endpoint_id`s and the address type.
In recent commits we switched the 'production' code to also operate on
Raft server IDs, which are UUIDs underneath.
In this commit we switch `endpoint_id`s from `unsigned` type to
`utils::UUID`. Because each use case operates on Raft server IDs, we can
perform a simple translation: `raft_id.uuid()` to get an `endpoint_id`
from a Raft ID, `raft::server_id{ep_id}` to obtain a Raft ID from
an `endpoint_id`. We no longer have to maintain complex sharded data
structures to store the mappings.
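The translation in both directions is then trivial (a self-contained illustration with stand-in types, not the real utils::UUID/raft::server_id):
```
#include <cassert>

struct UUID {                        // stand-in for utils::UUID
    unsigned long long hi = 0, lo = 0;
    bool operator==(const UUID&) const = default;
};
struct server_id {                   // stand-in for raft::server_id
    UUID id;
    UUID uuid() const { return id; }
};
using endpoint_id = UUID;            // after this commit, a UUID

int main() {
    server_id raft_id{{1, 2}};
    endpoint_id ep = raft_id.uuid(); // Raft ID -> endpoint_id
    server_id back{ep};              // endpoint_id -> Raft ID
    assert(back.uuid() == ep);
}
```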
We plan to remove IP information from Raft addresses.
raft::server_address is used in Raft configuration and
also in discovery, which is a separate algorithm, as a handy data
structure, to avoid introducing new entities in RPC.
Since we plan to remove IP addresses from Raft configuration,
using raft::server_address in discovery while still storing
IPs in it would create ambiguity: in some uses raft::server_address
would store an IP, and in others it would not.
So switch to a dedicated data structure for the purposes of discovery,
discovery_peer, which contains an IP address and a Raft server id.
Note to reviewers: ideally we should switch to URIs
in discovery_peer right away. Otherwise we may have to
deal with incompatible changes in discovery when adding URI
support to Scylla.
There is a flaw in how the raft rpc endpoints are
currently managed. The io_fiber in raft::server
is supposed to first add new servers to rpc, then
send all the messages and then remove the servers
which have been excluded from the configuration.
The problem is that the send_messages function
isn't synchronous, it schedules send_append_entries
to run after all the current requests to the
target server, which can happen
after we have already removed the server from address_map.
In this patch the remove_server function is changed to mark
the server_id as expiring rather than synchronously dropping it.
This means all currently scheduled requests to
that server will still be able to resolve
the ip address for that server_id.
Fixes: #11228
Closes #11748
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.
snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is
greater than max_log_size - (size of trailing entries).
applier_fiber could create multiple snapshots between
io_fiber runs. The fsm_output.snp variable was
overwritten by applier_fiber and io_fiber didn't drop
the previous snapshot.
In this patch we introduce the variable
fsm_output.snps_to_drop, store in it
the current snapshot id before applying
a new one, and then sequentially drop them in
io_fiber after storing the last snapshot_descriptor.
_sm_events.signal() is added to fsm::apply_snapshot,
since this method mutates the _output and thus gives a
reason to run io_fiber.
The new test test_frequent_snapshotting demonstrates
the problem by causing frequent snapshots and
setting the applier queue size to one.
Closes #11530
Changing configuration involves two entries in the log: a 'joint
configuration entry' and a 'non-joint configuration entry'. We use
`wait_for_entry` to wait on the joint one. To wait on the non-joint one,
we use a separate promise field in `server`. This promise wasn't
connected to the `abort_source` passed into `set_configuration`.
The call could get stuck if the server got removed from the
configuration and lost leadership after committing the joint entry but
before committing the non-joint one, waiting on the promise. Aborting
wouldn't help. Fix this by subscribing to the `abort_source` and
resolving the promise exceptionally on abort.
Furthermore, make sure that two `set_configuration` calls don't step on
each other's toes by one setting the other's promise. To do that, reset
the promise field at the end of `set_configuration` and check that it's
not engaged at the beginning.
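Schematically, the abort handling looks like this (a rough sketch, not the actual server.cc code; it assumes the older Seastar subscribe() callback form `void() noexcept`, and uses a generic exception where the real code uses raft's abort error):
```
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>
#include <exception>
#include <stdexcept>

seastar::future<> wait_for_non_joint(seastar::promise<>& committed,
                                     seastar::abort_source& as) {
    auto sub = as.subscribe([&committed] () noexcept {
        committed.set_exception(
                std::make_exception_ptr(std::runtime_error("aborted")));
    });
    if (!sub) {
        // The abort fired before we managed to subscribe.
        return seastar::make_exception_future<>(
                std::runtime_error("aborted"));
    }
    // Keep the subscription alive until the wait completes.
    return committed.get_future().finally([sub = std::move(sub)] {});
}
```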
Fixes #11288.
Closes #11325
* github.com:scylladb/scylladb:
test: raft: randomized_nemesis_test: additional logging
raft: server: handle aborts when waiting for config entry to commit
It could happen that we accessed the failure detector service after it
was stopped, if a reconfiguration happened at the 'right' moment. This
would result in an assertion failure. Fix this.
Closes #11326
Add some more logging to `randomized_nemesis_test` such as logging the
start and end of a reconfiguration operation in a way that makes it easy
to find one given the other in the logs.
Improve the randomness of this test, making it a bit easier to
reproduce the scenarios that the test aims to catch.
Increase timeouts a bit to account for this additional randomness.
Expiring entries are added when a message is received from an unknown
host. If the host is later added to the raft configuration, they become
non-expiring. After that they can only be removed when the host is
dropped from the configuration, but they should never become expiring
again.
Refs #10826
Whether a server can vote in a Raft configuration is not part of the
address. `server_address` was used in many contexts where `can_vote` is
irrelevant.
Split the struct: `server_address` now contains only `id` and
`server_info` as it did before `can_vote` was introduced. Instead we
have a `config_member` struct that contains a `server_address` and the
`can_vote` field.
Also remove an "unsafe" constructor from `server_address` where `id` was
provided but `server_info` was not. The constructor was used for tests
where `server_info` is irrelevant, but it's important not to forget
about the info in production code. The constructor was used for two
purposes:
- Invoking set operations such as `contains`. To solve this we use C++20
transparent hash and comparator functions, which allow invoking
`contains` and similar functions by providing a different key type (in
this case `raft::server_id` in a set of addresses, for example); see
the sketch after this list.
- constructing addresses without `info`s in tests. For this we provide
helper functions in the test helpers module and use them.
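A minimal sketch of the transparent-lookup technique (with std::string standing in for raft::server_id and server_info; not the actual Scylla types):
```
#include <cstddef>
#include <string>
#include <unordered_set>

struct server_address {
    std::string id;    // stand-in for raft::server_id
    std::string info;  // stand-in for server_info
};

// `is_transparent` lets C++20 unordered containers look up elements
// by a different key type, here by id alone.
struct addr_hash {
    using is_transparent = void;
    std::size_t operator()(const server_address& a) const {
        return std::hash<std::string>{}(a.id);
    }
    std::size_t operator()(const std::string& id) const {
        return std::hash<std::string>{}(id);
    }
};
struct addr_eq {
    using is_transparent = void;
    bool operator()(const server_address& a, const server_address& b) const {
        return a.id == b.id;
    }
    bool operator()(const server_address& a, const std::string& id) const {
        return a.id == id;
    }
    bool operator()(const std::string& id, const server_address& a) const {
        return a.id == id;
    }
};

using address_set = std::unordered_set<server_address, addr_hash, addr_eq>;
// address_set s;
// s.contains(id); // no full server_address needs to be constructed
```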
A leader which ceases to be a leader as a result of
execute_modify_config cannot wait for a dummy record to be
committed, because io_fiber aborts current waiters as soon as it
detects a loss of leadership.
This commit excludes dummy entries from the configuration change
procedure. A special promise is set on io_fiber when it gets a
non-joint configuration, and set_configuration just waits for
the corresponding future instead of a dummy record.
Fixes: #10010
Closes #10905
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, and so it is safer. Replace all eligible
occurrences (i.e. where the caller is a coroutine).
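A sketch of the pattern being converted to (assuming Seastar's coroutine::parallel_for_each; do_stop and the element type are hypothetical):
```
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/coroutine/parallel_for_each.hh>
#include <chrono>
#include <vector>

// Hypothetical per-element async work.
static seastar::future<> do_stop(int) {
    co_await seastar::sleep(std::chrono::milliseconds(1));
}

// When the caller is itself a coroutine, coroutine::parallel_for_each
// avoids the function-object allocation that seastar::parallel_for_each
// needs, and the lambda provably lives until the co_await completes.
seastar::future<> stop_all(std::vector<int>& ids) {
    co_await seastar::coroutine::parallel_for_each(
            ids, [] (int id) -> seastar::future<> {
        co_await do_stop(id);
    });
}
```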
One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.
Closes #10699
We modify the `reconfigure` and `modify_config` APIs to take a vector of
<server_id, bool> pairs (instead of just a vector of server_ids), where
the bool indicates whether the server is a voter in the modified config.
The `reconfiguration` operation would previously shuffle the set of
servers and split it into two parts: members and non-members. Now it
partitions it into three parts: voters, non-voters, and non-members.
The test would perform `read_barrier`s but not check the correctness
of the reads: whether the state observed by a read is consistent with
the model and recent enough (in short, check linearizability).
This commit adds the correctness checks.
Introduce a new operation, `raft_read`, which calls `read_barrier`
on a server, reads the state of the server's state machine, and returns
that state.
Extend the generator in `basic_generator_test` to generate `raft_read`s.
Only do it if forwarding is enabled (although it may make sense to test
read barriers in non-forwarding scenarios as well - we may think about it
and do it in a follow-up).
For now, we don't check the consistency of the results of the reads.
They do return the observed state, but we don't compare it yet with the
model. For now we simply issue the reads concurrently with other
operations to introduce some more chaos to the cluster and check
liveness and consistency of existing operations.
Extend the reconfiguration nemesis to send `modify_config` requests as
well as `reconfigure` requests. It chooses one or the other with
probability 1/2.
When `rpc` wants to perform a two-way RPC call it sends a message
containing a `reply_id`. The other side will send the `reply_id` back
when answering, so the original side can match the response to the promise
corresponding to the future being waited on by the RPC caller.
Previously each instance of `rpc` generated reply IDs independently as
increasing integers starting from 0. The network delivers messages
based on Raft server IDs. A response message may thus be delivered not
to the original instance which invoked the RPC, but to a new instance
which uses the same Raft server ID (after we simulated a server
crash/stop and restart, creating a new server with the same ID that
reuses the previous instance's `persistence` instance but has a new `rpc`).
The new instance could have started a new RPC call using the same
`reply_id` as one currently in flight that was started by the
previous instance. The new instance could then receive and handle a
response that was intended for the previous instance, leading to weird
bugs.
Fix this by replacing the local reply ID counters by a global counter so
that every two-way RPC call gets a unique reply ID.
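A minimal sketch of the fix (the name is illustrative; the test runs on a single shard, so a plain static counter is enough):
```
#include <cstdint>

// One process-wide counter instead of a per-`rpc`-instance counter,
// so reply ids never collide across restarted instances.
uint64_t next_reply_id() {
    static uint64_t counter = 0;
    return counter++;
}
```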
A server executing a `modify_config` call, even if it initially was a
leader and accepted the request, may end up throwing a `not_a_leader`
error, rerouting the caller to a new leader - but this new leader may be
that same server. This happens because `execute_modify_config`
translates certain errors that it considers transient (such as
`conf_change_in_progress`) into `not_a_leader{last_known_leader}`,
in an attempt to notify the caller that they should retry the request; but
when this translation happens, the `last_known_leader` may be that same
server (it could have even lost leadership and then regained it back
while the request was being handled).
This is not strictly an error, and it should be safe for the client to
retry the request by sending it to the same server. The nemesis test
assumed that a server never returns `not_a_leader{itself}`; this commit
drops the assumption.
An alternative solution would be to extend the error types that are now
translated to `not_a_leader` so they include information about the last
known leader. This way the client does not lose information about the
original error and still gets a potential contact point for retry.
Until now the nemesis test used its own failure detector implementation
which used one-way heartbeats.
Switch it to use the new direct failure detection service, which will
also be used in production code. Integrating it does require some work
however as we need to implement the `pinger` and `clock` interfaces
for the failure detector.
The service is sharded, but for simplicity of implementation we
implement rpcs and sleeps by routing the requests to shard 0, where
logical timers and network live.
Right now the test is running entirely on shard 0, but we want to
introduce a sharded service to the test. The initial naive attempt at
doing that failed because the test would time out (reach the tick limit)
before any work distributed to other shards could even start. The
solution in this commit solves that by synchronizing the shards on each
tick.
When the test is run with smp=1, the behavior is as before.
For a follower to forward requests to a leader, the leader must be known.
But there may be a situation where a follower does not learn about
a leader for a while. This may happen when a node becomes a follower while its
log is up-to-date and there are no new entries submitted to raft. In such
a case the leader will send nothing to the follower, and the only way to
learn about the current leader is to get a message from it. Until a new
entry is added to raft's log, a follower that does not know who the
leader is will not be able to add entries. Kind of a deadlock. Note that
the problem is specific to our implementation, where failure detection is
done by an outside module. In vanilla raft a leader sends messages to
all followers periodically, so essentially it is never idle.
The patch solves this by broadcasting a specially crafted append reject to all
nodes in the cluster on a tick in case the leader is not known. The leader
responds to this message with an empty append request, which causes the
node to learn about the leader. As an optimisation, the patch
sends the broadcast only in case there is actually an operation that
waits for the leader to be known.
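A self-contained sketch of the tick-side logic (all names here are illustrative stand-ins, not the real fsm API):
```
#include <optional>
#include <vector>

struct server_id { int v; };
struct append_reply { bool rejected; };

struct follower_state {
    std::optional<server_id> current_leader;
    bool has_waiters_for_leader = false;
    std::vector<server_id> members;

    // Called from tick(): if someone is waiting for a leader and we do
    // not know one, broadcast a crafted reject; the leader answers with
    // an empty append request, revealing itself.
    template <typename Send>
    void maybe_ping_for_leader(Send send) {
        if (!current_leader && has_waiters_for_leader) {
            for (auto& node : members) {
                send(node, append_reply{.rejected = true});
            }
        }
    }
};
```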
Fixes #10379