scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Pavel Solodovnikov	c0854a0f62	raft: create system tables only when `raft` experimental feature is set Also introduce a tiny function to return raft-enabled db config for cql testing. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>	2021-08-26 12:21:12 +03:00
Gleb Natapov	3ff6f76cef	raft: test: add read_barrier test to replication_test	2021-08-25 08:57:13 +03:00
Gleb Natapov	ad2c2abcb8	raft: test: add read_barrier tests to fsm_test	2021-08-25 08:57:13 +03:00
Gleb Natapov	03a266d73b	raft: make read_barrier work on a follower as well as on a leader This patch implements a Raft extension that allows performing linearisable reads by accessing the local state machine. The extension is described in section 6.4 of the PhD. To sum it up: to perform a read barrier, a follower needs to ask the leader for the last committed index that it knows about. The leader must make sure that it is still the leader before answering, by communicating with a quorum. When the follower gets the index back, it waits for it to be applied, which completes the read_barrier invocation. The patch adds three new RPCs: read_barrier, read_barrier_reply and execute_read_barrier_on_leader. The last one is the one a follower uses to ask a leader about the safe index it can read. The first two are used by a leader to communicate with a quorum.	2021-08-25 08:57:13 +03:00
Alejo Sanchez	a5c74a6442	raft: candidate timeout proportional to cluster size To avoid dueling candidates with large clusters, make the timeout proportional to the cluster size. Debug mode is too slow for a test of 1000 nodes so it's disabled, but the test passes for release and dev modes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-24 13:09:01 +02:00
Alejo Sanchez	7206eae16e	raft: testing: many nodes test Tests with many nodes and realistic timers and ticks. Network delays are kept as a fraction of ticks. (e.g. 20/100) Tests with 600 or more nodes hang in debug mode. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-24 13:09:01 +02:00
Alejo Sanchez	87a03a3485	raft: replication test: remove unused tick_all Tests now wait for normal ticks for election, remove deprecated tick_all helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-24 13:09:01 +02:00
Alejo Sanchez	14c214d73e	raft: replication test: delays Allow test-supplied delays for rpc communication. Allow supplying network delay, local delay (nodes within the same server), how many nodes are local, and an extra small delay simulating local load. Modify rpc class to support delays. If delays are enabled, it no longer directly calls the other node's server code but schedules it to be called later. This makes the test more realistic, as in the previous version the first candidate was always going to get to all followers first, preventing a dueling candidates scenario. Previously, tickers were all scheduled at the same time, so there was no spread of them across the tick time. Now these tickers are scheduled with a uniform spread across this time (tick delta). Also, previously custom free elections used tick_all(), which traversed _in_configuration sequentially and ticked each server. This, combined with rpc outbound directly calling methods in the other server without yielding, made free elections unrealistic: the order was deterministic and the first candidate always won. This patch changes this behavior. The free election uses normal tickers (now uniformly distributed in tick delay time) and its loop waits for tick delay time (yielding) and checks if there's a new leader. Also note the order might not be the same in debug mode if more than one tick is scheduled. As rpc messages are sent delayed, network connectivity needs to be checked again before calling the function on the remote side. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-24 13:05:53 +02:00
Alejo Sanchez	db23823c77	raft: replication test: packet drop rpc helper Add a helper to check if a packet should be dropped. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	497af3167f	raft: replication test: connectivity configuration Pass packet drops within connectivity configuration struct. Default to no packet drops. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	e4d5428e8a	raft: replication test: rpc network map in raft_cluster Move rpc network map to raft cluster, no longer as static in rpc class.	2021-08-23 17:50:16 +02:00
Alejo Sanchez	192ac5be4c	raft: replication test: use minimum granularity seastar lowres_clock minimum granularity is 10ms, not 1ms. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	5cfe6c1ca2	raft: replication test: minor: rename local to int ids For clarity, name 0-based integer ids as int ids not local. This is in contrast with 1-based UUID ids.	2021-08-23 17:50:16 +02:00
Alejo Sanchez	27d90f0165	raft: replication test: fix restart_tickers when partitioning When partitioning, elect_new_leader restarts tickers, so don't re-restart them in this case. When leader is dropped and no new leader is specified, restart tickers before free election. If no change of leader, restart tickers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	e4262291f2	raft: replication test: partition ranges Allow specifying ranges within partition to handle large number of nodes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	56a110d42f	raft: replication test: isolate one server Support disconnection of one server with the rest. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	6b3327c753	raft: replication test: move objects out of header Use a separate cc file for definitions and objects. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	cea18e6830	raft: replication test: make dummy command const Make dummy command const in header. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	2db3192ac3	raft: replication test: template clock type Templatize clock type. Use a struct for run_test to work around https://bugs.llvm.org/show_bug.cgi?id=50345 With help from @kbr- Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	cb35588fb1	raft: replication test: tick delta inside raft_cluster Store tick delta inside raft_cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	49cb040037	raft: replication test: style - member initializer Fix raft_cluster constructor member initializer list.	2021-08-23 17:50:16 +02:00
Alejo Sanchez	6e2ab657b3	raft: replication test: move common code out Common replication test code moved to header. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Alejo Sanchez	a6cd35c512	raft: testing: refactor helper Move definitions to helper object file. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Kamil Braun	3344ac8a6c	test: raft: randomized_nemesis_test: a basic generator test The previous commits introduced the basic generator concept and a library of the most common composition patterns. In this commit we implement a test that uses this new infrastructure. Two `Executable` operations are implemented: - `raft_call` is for calling to a Raft cluster with a given state machine command, - `network_majority_grudge` partitions the network in half, putting the leader in the minority. We run a workload of these operations against a cluster of 5 nodes with 6 threads for executing the operations: one "nemesis thread" for `network_majority_grudge` and 5 "client threads" for `raft_call`. Each client thread randomly chooses a contact point which it tries first when executing a `raft_call`, but it can also "bounce" - call a different server when the previous returned "not_a_leader" (we use the generic "bouncing" wrapper to do this). For now we only print the resulting history. In a follow-up patchset we will analyze it for consistency anomalies.	2021-08-16 13:07:08 +02:00
Kamil Braun	66ec484730	test: raft: generator: a library of basic generators Operations and generators can be composed to create more complex operations and generators. There are certain composition patterns useful for many different test scenarios. This commit introduces a couple of such patterns. For example: - Given multiple different operation types, we can create a new operation type - `either_of` - which is a "union" of the original operation types. Executing `either_of` operation means executing an operation of one of the original types, but the specific type can be chosen at runtime. - Given a generator `g`, `op_limit(n, g)` is a new generator which limits the number of operations produced by `g`. - Given a generator `g` and a time duration of `d` ticks, `stagger(g, d)` is a new generator which spreads the operations from `g` roughly every `d` ticks. (The actual definition in code is more general and complex but the idea is similar.) And so on. Some of these patterns have corresponding notions in Jepsen, e.g. our `stagger` has a corresponding `stagger` in Jepsen (although our `stagger` is more general).	2021-08-16 13:07:08 +02:00
Kamil Braun	d8863c5a7b	test: raft: introduce generators We introduce the concepts of "operations" and "generators", basic building blocks that will allow us to declaratively write randomized tests for torturing simulated Raft clusters. An "operation" is a data structure representing a computation which may cause side effects such as calling a Raft cluster or partitioning the network, represented in the code with the `Executable` concept. It has an `execute` function which performs the computation and returns a result of type `result_type`. Different computations of the same type share state of type `state_type`. The state can, for example, contain database handles. Each execution is performed on an abstract `thread` (represented by a `thread_id`) and has a logical starting time point. The thread and start point together form the execution's `context` which is passed as a reference to `execute`. Two operations may be called in parallel only if they are on different threads. A generator, represented through the `Generator` concept, produces a sequence of operations. An operation can be fetched from a generator using the `op` function, which also returns the next state of the generator (generators are purely functional data structures). The generator concept is inspired by the generators in the Jepsen testing library for distributed systems. We also implement `interpreter` which "interprets", or "runs", a given generator, by fetching operations from the generator and executing them with concurrency controlled by the abstract threads. The algorithm used in the interpreter is also similar to the interpreter algorithm in Jepsen, although there are differences. Most notably we don't have a "worker" concept - everything runs on a single shard; but we use "abstract threads" combined with futures for concurrency. There is also no notion of "process". Finally, the interpreter doesn't keep an explicit history, but instead uses a callback `Recorder` to notify the user about operation invocations and completions. The user can decide to save these events in a history, or perhaps they can analyze them on the fly using constant memory.	2021-08-16 13:07:08 +02:00
Kamil Braun	421b1b9494	test: raft: introduce `future_set` A set of futures that can be polled. Polling the set (`poll` function) returns the value of one of the futures which became available or `std::nullopt` if the given logical duration passes (according to the given timer), whichever event happens first. The current implementation assumes sequential polling. New futures can be added to the set with `add`. All futures can be removed from the set with `release`.	2021-08-16 13:07:08 +02:00
Kamil Braun	a5e92e1c45	test: raft: randomized_nemesis_test: handle `raft::stopped_error` in timeout futures The timeout futures in `call` and `reconfigure` may be discarded after Raft servers were `abort()`ed which would result in `raft::stopped_error` and the test complained about discarded exceptional futures. Discard these errors explicitly.	2021-08-16 13:07:08 +02:00
Kamil Braun	7533c84e62	raft: sometimes become a candidate even if outside the configuration There are situations where a node outside the current configuration is the only node that can become a leader. We become candidates in such cases. But there is an easy check for when we don't need to; a comment was added explaining that.	2021-08-06 13:18:32 +02:00
Kamil Braun	93822b0ee7	test: raft: regression test for storing cluster configuration when taking snapshots Before the fix introduced in the previous patch, the cluster would forget its configuration when taking a snapshot, making it unable to reelect a leader. This regression test catches that.	2021-08-06 12:17:22 +02:00
Kamil Braun	f050d3682c	raft: fsm: stronger check for outdated remote snapshots We must not apply remote snapshots with commit indexes smaller than our local commit index; this could result in out-of-order command application to the local state machine replica, leading to serializability violations. Message-Id: <20210805112736.35059-1-kbraun@scylladb.com>	2021-08-05 14:29:50 +02:00
Kamil Braun	4165045356	test: raft: randomized_nemesis_test: handle timeouts in rpc::send_snapshot They were already correctly returned to the caller, but we had a leftover discarded future that would sometimes end up with a broken_promise exception. Ignore the exception explicitly. Message-Id: <20210803122207.78406-1-kbraun@scylladb.com>	2021-08-04 15:24:47 +03:00
Tomasz Grabiec	3e47f28c65	Merge "raft: use the correct term when storing a snapshot" from Kamil We should not use the current term; we should use the term of the snapshot's index, which may be lower. * https://github.com/kbr-/scylla/tree/snapshot-right-term-fix: test: raft: regression test for using the correct term when taking a snapshot test: raft: randomized_nemesis_test: server configuration parameter raft: use the correct term when storing a snapshot	2021-08-02 15:33:52 +02:00
Kamil Braun	ac5121a016	test: raft: regression test for using the correct term when taking a snapshot	2021-08-02 11:48:35 +02:00
Kamil Braun	63fdc718d4	test: raft: randomized_nemesis_test: server configuration parameter	2021-08-02 11:47:19 +02:00
Gleb Natapov	4764028cb3	raft: Remove leader_id from append_request The field is not used anywhere. Message-Id: <YP0khmjK2JSp77AG@scylladb.com>	2021-07-28 20:30:07 +02:00
Kamil Braun	b5a7220da4	test: raft: randomized_nemesis_test: `reconfigure` function Instead of calling `set_configuration` directly on a `raft::server`, the caller will use the higher-level `reconfigure`. Similarly to `call`, the function converts exceptions into return values (inside a `variant`) and allows passing in a timeout parameter.	2021-07-13 11:15:26 +02:00
Kamil Braun	eb4a8d48aa	test: raft: randomized_nemesis_test: refactor waiting for leader into a separate function	2021-07-13 11:15:26 +02:00
Kamil Braun	69c59ec801	test: raft: randomized_nemesis_test: persistence: avoid creating gaps in the log when storing snapshots When storing a snapshot `snap`, if `snap.idx > e.idx` where `e` is the last entry in the log (if any), we need to clear all previous entries so that we don't create a gap in the log. The log must remain contiguous. One case is controversial: what to do if `snap.idx == e.idx + 1`. Technically no gap would be created between the entry and the snapshot. However, if we now want to store a new entry with index `e.idx + 2`, that would create a gap between two entries which is illegal.	2021-07-13 11:15:26 +02:00
Kamil Braun	f381a97f6f	test: raft: randomized_nemesis_test: persistence: handle complex state types The usage of `template <..., State init_state>` in `persistence` permitted using only a very restricted class of types (so called "structural types"). Pass the initial state through `persistence`'s constructor instead. Also modify the member functions so the State type doesn't need to have a default constructor.	2021-07-13 11:15:25 +02:00
Kamil Braun	59e04b2b2e	test: raft: randomized_nemesis_test: `call`: handle `raft::dropped_entry` This exception happens when the leader stops being a leader in the middle of a call. Expect it to happen and return it in the result variant.	2021-07-13 11:15:25 +02:00
Kamil Braun	d97cf1a254	test: raft: randomized_nemesis_test: impure_state_machine/call: handle dropped channels Inside `call`, if `add_entry` failed or the operation timed out, the output channel promise would be dropped without setting a value, causing a `broken_promise` exception. Furthermore the output future would be dropped, so we get a discarded `broken_promise` future. The fix: 1. When we drop a channel without a result (inside `impure_state_machine::with_output_channel`), set an explicit exception with a dedicated type. 2. Discard the channel future in a controlled way, explicitly handling the `output_channel_dropped` exception.	2021-07-13 11:15:25 +02:00
Kamil Braun	f51ff786bd	test: raft: randomized_nemesis_test: environment: expose the network Let the user of `environment` access the `network` directly, e.g. for introducing network partitions.	2021-07-13 11:15:25 +02:00
Kamil Braun	26d2f99cad	test: raft: randomized_nemesis_test: configurable network delay and FD convict threshold The following are now passed to `environment` as parameters: - network delay, - failure detector convict threshold. Environment passes them further down when constructing the underlying objects.	2021-07-13 11:15:25 +02:00
Kamil Braun	035ae2eb1b	test: raft: randomized_nemesis_test: generalize `with_env_and_ticker` Generalize the type of the callback: use a template parameter instead of `noncopyable_function` and don't assume the return type of the callback. This allows returning a result from `with_env_and_ticker`, e.g. for performing analysis or logging the results after a part of the test that used the environment and ticker has finished.	2021-07-13 11:15:25 +02:00
Kamil Braun	25fb195bc7	test: raft: randomized_nemesis_test: network: `add_grudge`, `remove_grudge` functions Extend the interface of `network` to allow introducing and removing "grudges" which prevent the delivery of messages from one given server to another (when the time comes to deliver a message but there's a grudge, the message is dropped).	2021-07-13 11:15:25 +02:00
Kamil Braun	774ef653b1	test: raft: randomized_nemesis_test: move `ticker` to its own header	2021-07-13 11:15:25 +02:00
Kamil Braun	a45e8e0db0	test: raft: randomized_nemesis_test: ticker: take `logger` as a constructor parameter Remove the global dependency on `tlogger`.	2021-07-13 11:15:25 +02:00
Kamil Braun	21b5a6d9f7	test: raft: logical_timer: handle immediate timeout If the user calls `with_timeout` with a time point that's already been reached, we return `timed_out_error` immediately.	2021-07-13 11:15:25 +02:00
Kamil Braun	ed8e9a564a	test: raft: logical_timer: on timeout, return the original future in the exception More specifically, return a future which is equivalent to the original future (when the original future resolves, this future will contain its result). Thus we don't discard the future, the user gets it back. Let them decide what to do with it.	2021-07-13 11:15:25 +02:00
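
The rule described in commit 69c59ec801 (no gaps in the log when storing snapshots) can be illustrated with a small standalone sketch. This is not the code from `randomized_nemesis_test`: the `toy_log` type, its members and the `log_index` alias are hypothetical names chosen for illustration. The handling of the case `snap.idx <= e.idx` (dropping only the entries the snapshot covers) is an assumption, since the commit message only describes the case `snap.idx > e.idx`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using log_index = std::uint64_t;

// Hypothetical toy persistence: stores only entry indexes, kept contiguous.
struct toy_log {
    std::deque<log_index> entries;  // ascending, no gaps between neighbours
    log_index snapshot_idx = 0;     // index covered by the latest snapshot (0 = none)

    // Index of the last thing we know about: last entry, else the snapshot.
    log_index last_idx() const {
        return entries.empty() ? snapshot_idx : entries.back();
    }

    // A new entry must directly follow the last entry (or the snapshot).
    void append(log_index idx) {
        assert(idx == last_idx() + 1);
        entries.push_back(idx);
    }

    void store_snapshot(log_index snap_idx) {
        if (entries.empty() || snap_idx > entries.back()) {
            // Covers snap_idx == e.idx + 1 as well: keeping the old entries
            // would leave a hole at snap_idx once entry snap_idx + 1 arrives.
            entries.clear();
        } else {
            // Assumption: entries covered by the snapshot are dropped,
            // the newer ones are kept.
            while (!entries.empty() && entries.front() <= snap_idx) {
                entries.pop_front();
            }
        }
        snapshot_idx = snap_idx;
    }
};
```

With entries 1..3 stored, `store_snapshot(4)` empties the log, so a following `append(5)` leaves a single entry directly after the snapshot instead of the sequence 1, 2, 3, 5.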

306 Commits
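
Commits d8863c5a7b and 66ec484730 describe generators as purely functional data structures: fetching an operation also returns the generator's next state, and `op_limit(n, g)` caps the number of operations produced by `g`. The sketch below shows that idea in isolation. It is not the test library's API: operations are plain `int`s, `op` takes no execution context, and `counter_gen`, `op_limit_gen`, `drain` and `example` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy generator: yields 0, 1, 2, ... and is never exhausted.
struct counter_gen {
    int next = 0;
};

// Fetch one operation together with the generator's next state.
std::optional<std::pair<int, counter_gen>> op(const counter_gen& g) {
    return std::make_pair(g.next, counter_gen{g.next + 1});
}

// Wrapper state: how many operations may still be produced, plus the inner generator.
template <typename G>
struct op_limit_gen {
    std::size_t remaining;
    G inner;
};

template <typename G>
op_limit_gen<G> op_limit(std::size_t n, G g) {
    return op_limit_gen<G>{n, std::move(g)};
}

// Exhausted when the budget runs out or when the inner generator is exhausted.
template <typename G>
std::optional<std::pair<int, op_limit_gen<G>>> op(const op_limit_gen<G>& g) {
    if (g.remaining == 0) {
        return std::nullopt;
    }
    auto r = op(g.inner);
    if (!r) {
        return std::nullopt;
    }
    return std::make_pair(r->first, op_limit_gen<G>{g.remaining - 1, r->second});
}

// Collect every operation of a finite generator (do not call on counter_gen directly).
template <typename G>
std::vector<int> drain(G g) {
    std::vector<int> out;
    while (auto r = op(g)) {
        out.push_back(r->first);
        g = r->second;
    }
    return out;
}

// Usage: limiting the infinite counter to three operations.
inline void example() {
    assert((drain(op_limit(3, counter_gen{})) == std::vector<int>{0, 1, 2}));
}
```

Because `op` never mutates its argument, the same generator value can be queried repeatedly and always yields the same operation, which is what makes composition wrappers such as `op_limit` straightforward to stack.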