Commit Graph

2154 Commits

Author SHA1 Message Date
Pavel Solodovnikov
c0854a0f62 raft: create system tables only when raft experimental feature is set
Also introduce a tiny function to return raft-enabled db config
for cql testing.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>
2021-08-26 12:21:12 +03:00
Avi Kivity
acf8da2bce Merge "flat_mutation_reader: keep timeout in permit" from Benny
"
This series moves the timeout parameter, that is passed to most
f_m_r methods, into the reader_permit.  This eliminates
the need to pass the timeout around, as it's taken
from the permit when needed.

The permit timeout is updated in certain cases
when the permit/reader is paused and retrieved
later on for reuse.

Following are perf_simple_query results showing ~1%
reduction in insns/op and corresponding increase in tps.

$ build/release/test/perf/perf_simple_query -c 1 --operations-per-shard 1000000 --task-quota-ms 10

Before:
102500.38 tps ( 75.1 allocs/op,  12.1 tasks/op,   45620 insns/op)

After:
103957.53 tps ( 75.1 allocs/op,  12.1 tasks/op,   45372 insns/op)

Test: unit(dev)
DTest:
    repair_additional_test.py:RepairAdditionalTest.repair_abort_test (release)
    materialized_views_test.py:TestMaterializedViews.remove_node_during_mv_insert_3_nodes_test (release)
    materialized_views_test.py:InterruptBuildProcess.interrupt_build_process_with_resharding_half_to_max_test (release)
    migration_test.py:TTLWithMigrate.big_table_with_ttls_test (release)
"

* tag 'reader_permit-timeout-v6' of github.com:bhalevy/scylla:
  flat_mutation_reader: get rid of timeout parameter
  reader_concurrency_semaphore: use permit timeout for admission
  reader_concurrency_semaphore: adjust reactivated reader timeout
  multishard_mutation_query: create_reader: validate saved reader permit
  repair: row_level: read_mutation_fragment: set reader timeout
  flat_mutation_reader: maybe_timed_out: use permit timeout
  test: sstable_datafile_test: add sstable_reader_with_timeout
  reader_permit: add timeout member
2021-08-25 17:51:10 +03:00
Avi Kivity
993f824cfd Merge "raft: implement linearisable reads on a follower" from Gleb and Kostja
"
This series implements section 6.4 of the Raft PhD. It allows to do
linearisable reads on a follower bypassing raft log entirely. After this
series server::read_barrier can be executed on a follower as well as
leader and after it completes local user's state machine state can be
accessed directly.
"

* 'raft-read-v9' of github.com:scylladb/scylla-dev:
  raft: test: add read_barrier test to replication_test
  raft: test: add read_barrier tests to fsm_test
  raft: make read_barrier work on a follower as well as on a leader
  raft: add a function to wait for an index to be applied
  raft: (server) add a helper to wait through uncertainty period
  raft: make fsm::current_leader() public
  raft: add hasher for raft::internal::tagged_uint64
  serialize: add serialized for std::monostate
  raft: fix indentation in applier_fiber
2021-08-25 13:11:35 +03:00
Gleb Natapov
3ff6f76cef raft: test: add read_barrier test to replication_test 2021-08-25 08:57:13 +03:00
Gleb Natapov
ad2c2abcb8 raft: test: add read_barrier tests to fsm_test 2021-08-25 08:57:13 +03:00
Gleb Natapov
03a266d73b raft: make read_barrier work on a follower as well as on a leader
This patch implements RAFT extension that allows to perform linearisable
reads by accessing local state machine. The extension is described
in section 6.4 of the PhD. To sum it up to perform a read barrier on
a follower it needs to asks a leader the last committed index that it
knows about. The leader must make sure that it is still a leader before
answering by communicating with a quorum. When follower gets the index
back it waits for it to be applied and by that completes read_barrier
invocation.

The patch adds three new RPC: read_barrier, read_barrier_reply and
execute_read_barrier_on_leader. The last one is the one a follower uses
to ask a leader about safe index it can read. First two are used by a
leader to communicate with a quorum.
2021-08-25 08:57:13 +03:00
Nadav Har'El
cf06b7cd40 test/alternator: correct some typos in comments
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210729125317.1610573-1-nyh@scylladb.com>
2021-08-24 19:43:29 +03:00
Avi Kivity
4a42b69ba8 Merge "raft: testing: many nodes test" from Alejo
"
Factor out replication test, make it work with different clocks, add
some features, and add a many nodes test with steady_clock. Also
refactor common test helper.

Many nodes test passes for release and dev and normal tick of 100ms for
up to 1000 servers. For debug mode it's much fewer due to lack of
optimizations so it's only tested for smaller numbers.

Tests: unit ({dev}), unit ({debug}), unit ({release})
"

* 'raft-many-22-v12' of https://github.com/alecco/scylla: (21 commits)
  raft: candidate timeout proportional to cluster size
  raft: testing: many nodes test
  raft: replication test: remove unused tick_all
  raft: replication test: delays
  raft: replication test: packet drop rpc helper
  raft: replication test: connectivity configuration
  raft: replication test: rpc network map in raft_cluster
  raft: replication test: use minimum granularity
  raft: replication test: minor: rename local to int ids
  raft: replication test: fix restart_tickers when partitioning
  raft: replication test: partition ranges
  raft: replication test: isolate one server
  raft: replication test: move objects out of header
  raft: replication test: make dummy command const
  raft: replication test: template clock type
  raft: replication test: tick delta inside raft_cluster
  raft: replication test: style - member initializer
  raft: replication test: move common code out
  raft: testing: refactor helper
  raft: log election stages
  ...
2021-08-24 17:05:05 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
4e3dcfd7d6 reader_concurrency_semaphore: use permit timeout for admission
Now that the timeout is stored in the reader
permit use it for admission rather than a timeout
parameter.

Note that evictable_reader::next_partition
currently passes db::no_timeout to
resume_or_create_reader, which propagated to
maybe_wait_readmission, but it seems to be
an oversight of the f_m_r api that doesn't
pass a timeout to next_partition().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
9b0b13c450 reader_concurrency_semaphore: adjust reactivated reader timeout
Update the reader's timeout where needed
after unregistering inactive_read.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
f25aabf1b2 flat_mutation_reader: maybe_timed_out: use permit timeout
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Benny Halevy
46fb7fe68e test: sstable_datafile_test: add sstable_reader_with_timeout
Verify that the sstable reader (for the highest supported version)
times out properly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Benny Halevy
fe479aca1d reader_permit: add timeout member
To replace the timeout parameter passed
to flat_mutation_reader methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Alejo Sanchez
a5c74a6442 raft: candidate timeout proportional to cluster size
To avoid dueling candidates with large clusters, make the timeout
proportional to the cluster size.

Debug mode is too slow for a test of 1000 nodes so it's disabled, but
the test passes for release and dev modes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-24 13:09:01 +02:00
Alejo Sanchez
7206eae16e raft: testing: many nodes test
Tests with many nodes and realistic timers and ticks.

Network delays are kept as a fraction of ticks. (e.g. 20/100)

Tests with 600 or more nodes hang in debug mode.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-24 13:09:01 +02:00
Alejo Sanchez
87a03a3485 raft: replication test: remove unused tick_all
Tests now wait for normal ticks for election, remove deprecated tick_all
helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-24 13:09:01 +02:00
Alejo Sanchez
14c214d73e raft: replication test: delays
Allow test supplied delays for rpc communication.

Allow supplying network delay, local delay (nodes within the same
server), how many nodes are local, and an extra small delay simulating
local load.

Modify rpc class to support delays. If delays are enabled, it no longer
directly calls the other node's server code but it schedules it to be
called later. This makes the test more realistic as in the previous
version the first candidate was always going to get to all followers
first, preventing a dueling candidates scenario.

Previously, tickers were all scheduled at the same time, so there was no
spread of them across the tick time. Now these tickers are scheduled
with a uniform spread across this time (tick delta).

Also previously, for custom free elections used tick_all() which
traversed _in_configuration sequentially and ticked each. This, combined
with rpc outbound directly calling methods in the other server without
yielding, caused free elections to be unrealistic with same order
determined and first candidate always winning. This patch changes this
behavior. The free election uses normal tickers (now uniformly
distributed in tick delay time) and its loop waits for tick delay time
(yielding) and checks if there's a new leader. Also note the order might
not be the same in debug mode if more than one tick is scheduled.

As rpc messages are sent delayed, network connectivity needs to be
checked again before calling the function on the remote side.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-24 13:05:53 +02:00
Alejo Sanchez
db23823c77 raft: replication test: packet drop rpc helper
Add a helper to check if a packet should be dropped.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
497af3167f raft: replication test: connectivity configuration
Pass packet drops within connectivity configuration struct.
Default to no packet drops.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
e4d5428e8a raft: replication test: rpc network map in raft_cluster
Move rpc network map to raft cluster, no longer as static in rpc class.
2021-08-23 17:50:16 +02:00
Alejo Sanchez
192ac5be4c raft: replication test: use minimum granularity
seastar lowres_clock minimum granularity is 10ms, not 1ms.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
5cfe6c1ca2 raft: replication test: minor: rename local to int ids
For clarity, name 0-based integer ids as int ids not local.
This is in contrast with 1-based UUID ids.
2021-08-23 17:50:16 +02:00
Alejo Sanchez
27d90f0165 raft: replication test: fix restart_tickers when partitioning
When partitioning, elect_new_leader restarts tickers, so don't
re-restart them in this case.

When leader is dropped and no new leader is specified, restart tickers
before free election.

If no change of leader, restart tickers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
e4262291f2 raft: replication test: partition ranges
Allow specifying ranges within partition to handle large number of
nodes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
56a110d42f raft: replication test: isolate one server
Support disconnection of one server with the rest.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
6b3327c753 raft: replication test: move objects out of header
Use a separate cc file for definitions and objects.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
cea18e6830 raft: replication test: make dummy command const
Make dummy command const in header.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
2db3192ac3 raft: replication test: template clock type
Templetize clock type.

Use a struct for run_test to work around
https://bugs.llvm.org/show_bug.cgi?id=50345

With help from @kbr-

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
cb35588fb1 raft: replication test: tick delta inside raft_cluster
Store tick delta inside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
49cb040037 raft: replication test: style - member initializer
Fix raft_cluster constructor member initializer list.
2021-08-23 17:50:16 +02:00
Alejo Sanchez
6e2ab657b3 raft: replication test: move common code out
Common replication test code moved to header.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
a6cd35c512 raft: testing: refactor helper
Move definitions to helper object file.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Avi Kivity
115d14028b Merge 'Allow multi-parameter user-defined aggregates' from Piotr Sarna
Due to an overzealous assertion put in the code (in one of the last iterations, by the way!) it was impossible to create an aggregate which accepts multiple arguments. The behavior is now fixed, and a test case is provided for it.

Tests: unit(release)

Closes #9211

* github.com:scylladb/scylla:
  cql-pytest: add test case for UDA with multiple args
  cql3: fix aggregates with > 1 argument
2021-08-23 17:45:58 +03:00
Pavel Solodovnikov
22794efc22 db: add experimental option for raft
Introduce `raft` experimental option.
Adjust the tests accordingly to accomodate the new option.

It's not enabled by default when providing
`--experimental=true` config option and should be
requested explicitly via `--experimental-options=raft`
config option.

Hide the code related to `raft_group_registry` behind
the switch. The service object is still constructed
but no initialization is performed (`init()` is not
called) if the flag is not set.

Later, other raft-related things, such as raft schema
changes, will also use this flag.

Also, don't introduce a corresponding gossiper feature
just yet, because again, it should be done after the
raft schema changes API contract is stabilized.

This will be done in a separate series, probably related to
implementing the feature itself.

Tests: unit(dev)

Ref #9239.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210823121956.167682-1-pa.solodovnikov@scylladb.com>
2021-08-23 17:45:58 +03:00
Nadav Har'El
d598a94b43 Merge: everywhere: mark deferred actions noexcept
Merged patch series by By Benny Halevy:

Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).

Test: unit(dev, debug)

* tag 'deferred_action-noexcept-v1' of github.com:bhalevy/scylla:
  everywhere: make deferred actions noexcept
  cql3: prepare_context: mark methods noexcept
  commitlog: segment, segment_manager: mark methods noexcept
  everywhere: cleanup defer.hh includes
2021-08-23 11:16:17 +03:00
Eliran Sinvani
b33479f731 Micro Benchmark: Fix division by zero in 'perf_fast_forward'
Commit 8d6e575 introduced a new stat, instructions per fragment.
Computing this new stat can end with a division by zero when
the number of fragmens read is 0. Here we fix it by reporting
0 ins/f when no fragments were read.

Fixes #9231

Closes #9232
2021-08-23 10:55:44 +03:00
Avi Kivity
6221b90b89 secondary_index_manager: stop including expression.hh
Use a forward declaration of cql3::expr::oper_t to reduce the
number of translation units depending on expression.hh.

Before:

    $ find build/dev -name '*.d' | xargs cat | grep -c expression.hh
    272

After:

    $ find build/dev -name '*.d' | xargs cat | grep -c expression.hh
    154

Some translation units adjust their includes to restore access
to required headers.

Closes #9229
2021-08-22 21:21:46 +03:00
Benny Halevy
e9aff2426e everywhere: make deferred actions noexcept
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).

Test: unit(dev, debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-22 21:11:52 +03:00
Benny Halevy
4439e5c132 everywhere: cleanup defer.hh includes
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-22 21:11:39 +03:00
Vlad Zolotarov
7bd1bcd779 loading_shared_values/loading_cache: get rid of iterators interface and return value_ptr from find(...) instead
loading_shared_values/loading_cache'es iterators interface is dangerous/fragile because
iterator doesn't "lock" the entry it points to and if there is a
preemption point between aquiring non-end() iterator and its
dereferencing the corresponding cache entry may had already got evicted (for
whatever reason, e.g. cache size constraints or expiration) and then
dereferencing may end up in a use-after-free and we don't have any
protection against it in the value_extractor_fn today.

And this is in addition to #8920.

So, instead of trying to fix the iterator interface this patch kills two
birds in a single shot: we are ditching the iterators interface
completely and return value_ptr from find(...) instead - the same one we
are returning from loading_cache::get_ptr(...) asyncronous APIs.

A similar rework is done to a loading_shared_values loading_cache is
based on: we drop iterators interface and return
loading_shared_values::entry_ptr from find(...) instead.

loading_cache::value_ptr already takes care of "lock"ing the returned value so that it
would relain readable even if it's evicted from the cache by the time
one tries to read it. And of course it also takes care of updating the
last read time stamp and moving the corresponding item to the top of the
MRU list.

Fixes #8920

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20210817222404.3097708-1-vladz@scylladb.com>
2021-08-22 16:49:40 +03:00
Avi Kivity
bae67dcce6 Merge "mutation_fragment: Specialize appending_hash for it" from Pavel E
"
The mutation_fragments hashing code sitting in row-level repair
upsets clang and makes it spend 20 minutes compiling itself. This
set speeds this up greatly by moving the hashing code into the
mutation_fragment.cc and turning it into the appending_hash<>
specialisation. A simple sanity checking test makes sure this
doesn't change resulting hash values.

tests: unit.hashers_test(dev, release) // hash values matched, phew
       dtest.repair_additional_test.repair_large_partition_existing_rows_test(release)
"

* 'br-row-level-comp-speedup-2.2' of https://github.com/xemul/scylla:
  mutation_fragment: Specialize appending_hash for it
  tests: Add sanity check for hashing mutation_fragments
2021-08-18 11:25:48 +03:00
Pavel Emelyanov
b5fee07527 mutation_fragment: Specialize appending_hash for it
Row-level rpair hashes the mutation fragment and wraps this into a
private fragment_hasher class. For some reason it takes ~20 minutes
for clang to compile the row_level.o with -O3 level (release mode).
Putting the whole fragment_hasher into a dedicated file reduces the
compilation times ~9 times.

However, it seems more natural not to move the fragment_hasher around
but to specialize the appending_hash<> for mutation_fragment and make
row_level.cc code just call feed_hash().

Compilation times (release mode):

                       before     after
row_level.o            19m34s      2m4s
mutation_fragment.o       13s       17s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-08-18 09:17:40 +03:00
Pavel Emelyanov
34f8f10123 tests: Add sanity check for hashing mutation_fragments
Next patch is going to change the way row-level repair code hashes
mutation_fragment objects. This patch prepares the sanity check for
the hash values not be accidentally changed by hashing some simple
fragments and comparing them against known expected values.

The hash_mutation_fragment_for_test helper is added for this patch
only and will be removed really soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-08-18 09:17:40 +03:00
Piotr Sarna
88238c7c2a cql-pytest: add test case for UDA with multiple args
A test case for an aggregate which works on multiple parameters
is added.
2021-08-16 19:52:50 +02:00
Tomasz Grabiec
09b575474b Merge "test: raft: generators infrastructure with an actual random nemesis test" from Kamil
Operations and generators can be composed to create more complex
operations and generators. There are certain composition patterns useful
for many different test scenarios.

We implement a couple of such patterns. For example:
- Given multiple different operation types, we can create a new
  operation type - `either_of` - which is a "union" of the original
  operation types. Executing `either_of` operation means executing an
  operation of one of the original types, but the specific type
  can be chosen in runtime.
- Given a generator `g`, `op_limit(n, g)` is a new generator which
  limits the number of operations produced by `g`.
- Given a generator `g` and a time duration of `d` ticks, `stagger(g, d)` is a
  new generator which spreads the operations from `g` roughly every `d`
  ticks. (The actual definition in code is more general and complex but
  the idea is similar.)

Some of these patterns have correspodning notions in Jepsen, e.g. our
`stagger` has a corresponding `stagger` in Jepsen (although our
`stagger` is more general).

Finally, we implement a test that uses this new infrastructure.

Two `Executable` operations are implemented:
- `raft_call` is for calling to a Raft cluster with a given state
  machine command,
- `network_majority_grudge` partitions the network in half,
  putting the leader in the minority.

We run a workload of these operations against a cluster of 5 nodes with
6 threads for executing the operations: one "nemesis thread" for
`network_majority_grudge` and 5 "client threads" for `raft_call`.
Each client thread randomly chooses a contact point which it tries first
when executing a `raft_call`, but it can also "bounce" - call a
different server when the previous returned "not_a_leader" (we use the
generic "bouncing" wrapper to do this).

For now we only print the resulting history. In a follow-up patchset
we will analyze it for consistency anomalies.

* kbr/raft-test-generator-v4:
  test: raft: randomized_nemesis_test: a basic generator test
  test: raft: generator: a library of basic generators
  test: raft: introduce generators
  test: raft: introduce `future_set`
  test: raft: randomized_nemesis_test: handle `raft::stopped_error` in timeout futures
2021-08-16 15:55:25 +02:00
Kamil Braun
3344ac8a6c test: raft: randomized_nemesis_test: a basic generator test
The previous commits introduced basic the generator concept and a
library of most common composition patterns.

In this commit we implement a test that uses this new infrastructure.

Two `Executable` operations are implemented:
- `raft_call` is for calling to a Raft cluster with a given state
  machine command,
- `network_majority_grudge` partitions the network in half,
  putting the leader in the minority.

We run a workload of these operations against a cluster of 5 nodes with
6 threads for executing the operations: one "nemesis thread" for
`network_majority_grudge` and 5 "client threads" for `raft_call`.
Each client thread randomly chooses a contact point which it tries first
when executing a `raft_call`, but it can also "bounce" - call a
different server when the previous returned "not_a_leader" (we use the
generic "bouncing" wrapper to do this).

For now we only print the resulting history. In a follow-up patchset
we will analyze it for consistency anomalies.
2021-08-16 13:07:08 +02:00
Kamil Braun
66ec484730 test: raft: generator: a library of basic generators
Operations and generators can be composed to create more complex
operations and generators. There are certain composition patterns useful
for many different test scenarios.

This commit introduces a couple of such patterns. For example:
- Given multiple different operation types, we can create a new
  operation type - `either_of` - which is a "union" of the original
  operation types. Executing `either_of` operation means executing an
  operation of one of the original types, but the specific type
  can be chosen in runtime.
- Given a generator `g`, `op_limit(n, g)` is a new generator which
  limits the number of operations produced by `g`.
- Given a generator `g` and a time duration of `d` ticks, `stagger(g, d)` is a
  new generator which spreads the operations from `g` roughly every `d`
  ticks. (The actual definition in code is more general and complex but
  the idea is similar.)

And so on.

Some of these patterns have correspodning notions in Jepsen, e.g. our
`stagger` has a corresponding `stagger` in Jepsen (although our
`stagger` is more general).
2021-08-16 13:07:08 +02:00
Kamil Braun
d8863c5a7b test: raft: introduce generators
We introduce the concepts of "operations" and "generators", basic
building blocks that will allow us to declaratively write randomized
tests for torturing simulated Raft clusters.

An "operation" is a data structure representing a computation which
may cause side effects such as calling a Raft cluster or partitioning
the network, represented in the code with the `Executable` concept.
It has an `execute` function performing the computation and returns
a result of type `result_type`. Different computations of the same type
share state of type `state_type`. The state can, for example, contain
database handles.

Each execution is performed on an abstract `thread' (represented by a `thread_id`)
and has a logical starting time point. The thread and start point together form
the execution's `context` which is passed as a reference to `execute`.

Two operations may be called in parallel only if they are on different threads.

A generator, represented through the `Generator` concept, produces a
sequence of operations. An operation can be fetched from a generator
using the `op` function, which also returns the next state of the
generator (generators are purely functional data structures).

The generator concept is inspired by the generators in the Jepsen
testing library for distributed systems.

We also implement `interpreter` which "interprets", or "runs", a given
generator, by fetching operations from the generator and executing them
with concurrency controlled by the abstract threads.

The algorithm used in the interpreter is also similar to the interpreter
algorithm in Jepsen, although there are differences. Most notably we don't
have a "worker" concept - everything runs on a single shard; but we use
"abstract threads" combined with futures for concurrency.
There is also no notion of "process". Finally, the interpreter doesn't
keep an explicit history, but instead uses a callback `Recorder` to notify
the user about operation invocations and completions. The user can
decide to save these events in a history, or perhaps they can analyze
them on the fly using constant memory.
2021-08-16 13:07:08 +02:00
Kamil Braun
421b1b9494 test: raft: introduce future_set
A set of futures that can be polled.

Polling the set (`poll` function) returns the value of one of
the futures which became available or `std::nullopt` if the given
logical durationd passes (according to the given timer), whichever
event happens first.  The current implementation assumes sequential
polling.

New futures can be added to the set with `add`.
All futures can be removed from the set with `release`.
2021-08-16 13:07:08 +02:00