28388 Commits

Author SHA1 Message Date
Jan Ciolek
d61b2dbf8a cql3: Implement term::to_expression for collections
Each collection delayed_value can now be converted to expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
f17d003808 cql3: Implement term::to_expression for tuples
Each tuples::delayed_value can now be converted to expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
c40f227c14 cql3: Implement term::to_expression for marker classes
Implement to_expression for non terminals that represent a bind marker.
For now each bind marker has a shape describing where it is used, but hopefully this can be removed in the future.

In order to evaluate a bind_variable we need to know its type.
The type is needed to pass to constant and to validate the value.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
499c9235fc cql3: expr: Add data_type to *_constructor structs
It is useful to have a data_type in *_constructor structs when evaluating.
The resulting constant has a data_type, so we have to find it somehow.

For tuple_constructor we don't have to create a separate tuple_type_impl instance.
For collection_constructor we know what the type is even in case of an empty collection.
For usertype_constructor we know the name, type and order of fields in the user type.

Additionally without a data_type we wouldn't know whether the type is reversed or not.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
f86a1270b0 cql3: Add term::to_expression method
Add a method that converts given term to the matching expression.
It will be used as an intermediate step when implementing evaluate(expression).
evaluate(term) will convert the term to the expression and then call evaluate(expression).

For terminals this is simply calling get() to serialize the value.
For non-terminals the implementation is more complicated and will be implemented in the following commits.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Jan Ciolek
746e9c620f cql3: Reorganize term and expression includes
Make term.hh include expression.hh instead of the other way around.
expression can't be forward declared.
expression is needed in term.hh to declare term::to_expression().

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Tomasz Grabiec
f582bfd453 Merge "test: raft: randomized_nemesis_test: generator test with linearizability checking" from Kamil
The AppendReg state machine stores a sequence of integers. It supports
`append` inputs which append a single integer to the sequence and return
the previous state (before appending).

The implementation uses the `append_seq` data structure
representing an immutable sequence that uses a vector underneath
which may be shared by multiple instances of `append_seq`.
Appending to the sequence appends to the underlying vector,
but there is no observable effect on the other instances since
they use only the prefix of the sequence that wasn't changed.
If two instances sharing the same vector try to append,
the later one must perform a copy.

This allows efficient appends if only one instance is appending, which
is useful in the following context:
- a Raft server stores a copy in the underlying state machine replica
  and appends to it,
- clients send append operations to the server; the server returns the
  state of the sequence before it was appended to,
- thanks to the sharing, we don't need to copy all elements when
  returning the sequence to the client, and only one instance (the
  server) is appending to the shared vector,
- summarizing, all operations have amortized O(1) complexity.

We use AppendReg instead of ExReg in `basic_generator_test`
with a generator which generates a sequence of append operations with
unique integers.

This implies that the result of every operation uniquely identifies the
operation (since it contains the appended integer, and different
operations use different integers) and all operations that must have
happened before it (since it contains the previous state of the append
register), which allows us to reconstruct the "current state" of the
register according to the results of operations coming from Raft calls,
giving us an on-line serializability checker with O(1) amortized
complexity on each operation completion.
We also enforce linearizability by checking that every
completed operation was previously invoked.

We also perform a simple liveness check at the end of the test by
ensuring that a leader is eventually elected and that we can
successfully execute a call.

* kbr/linearizability-v2:
  test: raft: randomized_nemesis_test: check consistency and liveness in basic_generator_test
  test: raft: randomized_nemesis_test: introduce append register
2021-09-23 23:55:13 +02:00
Benny Halevy
7e9ca101ae storage_service: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-31-bhalevy@scylladb.com>
2021-09-23 17:36:43 +03:00
Benny Halevy
ecbe9f1ef6 storage_service: coroutinize rebuild
Prepare for futurizing get_ranges_for_endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-30-bhalevy@scylladb.com>
2021-09-23 17:36:42 +03:00
Benny Halevy
c8b12afe1b storage_service: effective_ownership: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-29-bhalevy@scylladb.com>
2021-09-23 17:35:32 +03:00
Benny Halevy
add78a8cc0 storage_service: coroutinize effective_ownership
Prepare for futurizing get_ranges_for_endpoint.

Dtest: nodetool_additional_test:TestNodetool.status_test

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-28-bhalevy@scylladb.com>
2021-09-23 17:34:56 +03:00
Avi Kivity
7127c92acc Merge "simplifications and layer violation fix for compaction manager" from Raphael
"This series removes layer violation in compaction, and also
simplifies compaction manager and how it interacts with compaction
procedure."

* 'compaction_manager_layer_violation_fix/v3' of github.com:raphaelsc/scylla:
  compaction: split compaction info and data for control
  compaction_manager: use task when stopping a given compaction type
  compaction: remove start_size and end_size from compaction_info
  compaction_manager: introduce helpers for task
  compaction_manager: introduce explicit ctor for task
  compaction: kill sstables field in compaction_info
  compaction: kill table pointer in compaction_info
  compaction: simplify procedure to stop ongoing compactions
  compaction: move management of compaction_info to compaction_manager
  compaction: move output run id from compaction_info into task
2021-09-23 17:29:19 +03:00
Raphael S. Carvalho
5bf51ced14 compaction: split compaction info and data for control
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:56:18 -03:00
Raphael S. Carvalho
6e7729fa21 compaction_manager: use task when stopping a given compaction type
compaction_info will eventually only be used for exporting data about
ongoing compactions, so task must be used instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:53:53 -03:00
Raphael S. Carvalho
6d1170ac94 compaction: remove start_size and end_size from compaction_info
those stats aren't used in compaction stats API and therefore they
can be removed. end_size is added to compaction_result (needed for
updating history) and start_size can be calculated in advance.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:41:13 -03:00
Raphael S. Carvalho
2353f40f63 compaction_manager: introduce helpers for task
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:38:39 -03:00
Raphael S. Carvalho
6820fbf460 compaction_manager: introduce explicit ctor for task
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:38:36 -03:00
Raphael S. Carvalho
d73a241a4e compaction: kill sstables field in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:38:32 -03:00
Raphael S. Carvalho
b6b4042faf compaction: kill table pointer in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:38:11 -03:00
Raphael S. Carvalho
98f8673d4e compaction: simplify procedure to stop ongoing compactions
Today, compactions are tracked by both _compactions and _tasks,
where _compactions refer to actual ongoing compaction tasks,
whereas _tasks refer to manager tasks, which are responsible for
spawning new compactions, retrying them on failure, etc.
As each task can only have one ongoing compaction at a time,
let's move compaction into task, such that manager won't have to
look at both when deciding to do something like stopping a task.

So stopping a task becomes simpler, and duplication is naturally
gone.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:25:51 -03:00
Raphael S. Carvalho
0885376a85 compaction: move management of compaction_info to compaction_manager
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.

This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.

From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.

This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:00:49 -03:00
Raphael S. Carvalho
7688d0432c compaction: move output run id from compaction_info into task
this run id is used to track partial runs that are being written to.
let's move it from info into task, as this is not an external info,
but rather one that belongs to compaction_manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 09:56:01 -03:00
Piotr Sarna
88480ac504 cql-pytest: relax another condition for a failed wasm execution
The previous commit already relaxed the condition for test_fib,
but the same should be done for test_fib_called_on_null
for an identical reason - more than 1 error can be expected
in the case of calling a heavily recursive function, and either
fuel exhaustion, or hitting the stack limit, or any other
InvalidRequest exception should be accepted.

Closes #9363
2021-09-23 14:11:02 +03:00
Benny Halevy
ad46ff8e5e database: coroutinize create_keyspace
Prepare for futurizing create_in_memory_keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-10-bhalevy@scylladb.com>
2021-09-23 14:05:44 +03:00
Benny Halevy
91091e9d89 database: update_keyspace: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-9-bhalevy@scylladb.com>
2021-09-23 14:05:18 +03:00
Benny Halevy
c71cd2bed3 database: coroutinize update_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-8-bhalevy@scylladb.com>
2021-09-23 14:05:18 +03:00
Piotr Sarna
62948b7404 Merge 'cql3: Add expr::constant to replace terminal' from Jan Ciołek
Add new struct to the `expression` variant:
```c++
// A value serialized with the internal (latest) cql_serialization_format
struct constant {
    cql3::raw_value value;
    data_type type; // Never nullptr, for NULL and UNSET might be empty_type
};
```
and use it where possible instead of `terminal`.

This struct will eventually replace all classes deriving from
`terminal`, but for now `terminal` can't be removed completely.

We can't get rid of terminal yet, because sometimes `terminal` is
converted back to `term`, which `constant` can't do. This won't be a
problem once we replace term with expression.

`bool` is removed from `expression`, now `constant` is used instead.

This is a redesign of PR #9203, there is some discussion about the
chosen representation there.

Closes #9371

* github.com:scylladb/scylla:
  cql3: term: Remove get_elements and multi_item_terminal from terminals
  cql3: Replace most uses of terminal with expr::constant
  cql3: expr: Remove repetition from expr::get_elements
  cql3: expr: Add expr::get_elements(constant)
  cql3: term: remove term::bind_and_get
  cql3: Replace all uses of bind_and_get with evaluate_to_raw_view
  cql3: expr: Add evaluate_IN_list
  cql3: tuples: Implement tuples::in_value::get
  cql3: Move data_type to terminal, make get_value_type non-virtual
  cql3: user_types: Implement get_value_type in user_types.hh
  cql3: tuples: Implement get_value_type in tuples.hh
  cql3: maps: Implement get_value_type in maps.hh
  cql3: sets: Implement get_value_type in sets.hh
  cql3: lists: Implement get_value_type in lists.hh
  cql3: constants: Implement get_value_type in constants.hh
  cql3: expr: Add expr::evaluate
  cql3: Make collection term get() use the internal serialization format
  cql3: values: Add unset value to raw_value_view::make_temporary
  cql3: expr: Add constant to expression
2021-09-23 13:02:29 +02:00
Avi Kivity
369afe3124 treewide: use coroutine::maybe_yield() instead of co_await make_ready_future()
The dedicated API shows the intent, and may be a tiny bit faster.

Closes #9382
2021-09-23 12:28:56 +02:00
Avi Kivity
6702711d9c Merge "Gossiper start-stop sanitation (+ bonus track)" from Pavel E
"
The main challenge here is to move the messaging_service.start_listen()
call out of gossiper into main. Other changes are pretty minor
compared to that and include

- patch gossiper API towards a standard start-shutdown-stop form
- gossiping "sharder info" in initial state
- configure cluster name and seeds via gossip_config

tests: unit(dev)
       dtest.bootstrap_test.start_stop_test_node(dev)
       manual(dev): start+stop, nodetool enable-/disablegossip

refs: #2737
refs: #2795
refs: #5489

"

* 'br-gossiper-dont-start-messaging-listen-2' of https://github.com/xemul/scylla:
  code: Expell gossiper.hh from other headers
  storage_service: Gossip "sharder" in initial states
  gossiper: Relax set_seeds()
  gossiper, main: Turn init_gossiper into get_seeds_from_config
  storage_service: Eliminate the do-bind argument from everywhere
  gossiper: Drop ms-registered manipulations
  messaging, main, gossiper: Move listening start into main
  gossiper: Do handlers reg/unreg from start/stop
  gossiper: Split (un)init_messaging_handler()
  gossiper: Relocate stop_gossiping() into .stop()
  gossiper: Introduce .shutdown() and use where appropriate
  gossiper: Set cluster_name via gossip_config
  gossiper, main: Straighten start/stop
  tests/cql_test_env: Open-code tst_init_ms_fd_gossiper
  tests/cql_test_env: De-global most of gossiper
  gossiper: Merge start_gossiping() overloads into one
  gossiper: Use is_... helpers
  gossiper: Fix do_shadow_round comment
  gossiper: Dispose dead code
2021-09-23 12:18:38 +03:00
Avi Kivity
bae9c042c2 Merge 'Add compaction stats to tracing data' from Botond Dénes
Too many tombstones (row or range) are a common source of query performance problems, yet currently we have no visibility into the number of tombstones a query has to process while constructing the results. This series addresses this by collecting stats about the compacted data in `compact_mutation_state`. This contains the number of partitions, static rows (live and dead), clustering rows (live and dead) and range tombstones. This data is then added to tracing on each query path.
Example trace:
```
 activity                                                                                                                              | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                    Execute CQL3 query | 2021-09-22 12:06:24.089000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                         Parsing a statement [shard 0] | 2021-09-22 12:06:24.089552 | 127.0.0.1 |              1 | 127.0.0.1
                                                                                                      Processing a statement [shard 0] | 2021-09-22 12:06:24.089674 | 127.0.0.1 |            122 | 127.0.0.1
      Creating read executor for token -4069959284402364209 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] | 2021-09-22 12:06:24.089724 | 127.0.0.1 |            173 | 127.0.0.1
                                                                                                 read_data: querying locally [shard 0] | 2021-09-22 12:06:24.089727 | 127.0.0.1 |            175 | 127.0.0.1
                                                    Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] | 2021-09-22 12:06:24.089732 | 127.0.0.1 |            181 | 127.0.0.1
                                Querying cache for range {{-4069959284402364209, pk{000400000001}}} and slice {(-inf, +inf)} [shard 0] | 2021-09-22 12:06:24.089751 | 127.0.0.1 |            199 | 127.0.0.1
 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 4 clustering row(s) (3 live, 1 dead) and 1 range tombstone(s) [shard 0] | 2021-09-22 12:06:24.089838 | 127.0.0.1 |            286 | 127.0.0.1
                                                                                                            Querying is done [shard 0] | 2021-09-22 12:06:24.089847 | 127.0.0.1 |            295 | 127.0.0.1
                                                                                        Done processing - preparing a result [shard 0] | 2021-09-22 12:06:24.089862 | 127.0.0.1 |            311 | 127.0.0.1
                                                                                                                      Request complete | 2021-09-22 12:06:24.089326 | 127.0.0.1 |            326 | 127.0.0.1

```

Tests: unit(dev)

Fixes: https://github.com/scylladb/scylla/issues/5471

Closes #9372

* github.com:scylladb/scylla:
  multishard_mutation_query: add tracepoint with compaction stats
  querier: add tracepoint with compaction stats
  mutation_compactor: collect stats about compacted data
2021-09-22 19:24:19 +03:00
Kamil Braun
ea172fe531 test: raft: randomized_nemesis_test: check consistency and liveness in basic_generator_test
Use AppendReg instead of ExReg for the state machine.
Use a generator which generates a sequence of append operations with
unique integers.

This implies that the result of every operation uniquely identifies the
operation (since it contains the appended integer, and different
operations use different integers) and all operations that must have
happened before it (since it contains the previous state of the append
register), which allows us to reconstruct the "current state" of the
register according to the results of operations coming from Raft calls,
giving us an on-line linearizability checker with O(1) amortized
complexity on each operation completion.
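
For illustration only, the consistency check described above can be sketched as follows. This is not the code from the test: `append_checker`, `invoke` and `complete` are hypothetical names, and this simplified version compares sequences element by element, so it does not achieve the O(1) amortized cost mentioned above. It only shows the idea that each result (the previous state plus the appended integer) must agree with the longest state observed so far, and that every observed integer must come from an invoked operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical checker; names are illustrative, not taken from the test.
class append_checker {
    std::vector<int> _state;           // longest register state observed so far
    std::unordered_set<int> _invoked;  // integers whose append was invoked
public:
    // Record that an append of `x` has been started.
    void invoke(int x) { _invoked.insert(x); }

    // `prev` is the register state returned by the append of `x`.
    // Returns false if the result contradicts what was observed so far.
    bool complete(int x, const std::vector<int>& prev) {
        std::vector<int> full(prev);
        full.push_back(x);
        const std::size_t common = std::min(full.size(), _state.size());
        // The shorter sequence must be a prefix of the longer one.
        if (!std::equal(full.begin(), full.begin() + common, _state.begin())) {
            return false;
        }
        // Every newly observed element must come from an invoked operation.
        for (std::size_t i = common; i < full.size(); ++i) {
            if (_invoked.count(full[i]) == 0) {
                return false;
            }
        }
        if (full.size() > _state.size()) {
            _state = std::move(full);
        }
        return true;
    }
};
```

Because the appended integers are unique, a result that passes this check pins down both the operation and everything that happened before it.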

We also perform a simple liveness check at the end of the test by
ensuring that a leader is eventually elected and that we can
successfully execute a call.
2021-09-22 17:56:23 +02:00
Avi Kivity
c0afdf3f15 Update seastar submodule
* seastar c04a12edbd...e6db0cd587 (13):
  > Merge "Add kernel stack trace reporting for stalls" from Avi
Ref #8828
  > Merge "Keep XFS' dioattr cached" from Pavel E
  > coroutines: de-template maybe_yield()
  > sharded: Add const versions of map_reduce's
  > apps/io_tester: remove unused lambda capture
  > doc: exclude seastar::coroutine::internal namespace
  > deprecate unaligned_cast<> from unaligned.hh
  > reactor: adjust max_networking_aio_io_control_blocks to lower size when fs.aio-max-nr is small
  > build: clarify choice of C++ dialect, and change default to C++20
  > coding_style: update concepts style to snake_case
  > Merge "Teach io_tester to submit requests-per-second flow" from Pavel E
  > cmake: find and link against Boost::filesystem
  > coroutine: add maybe_yield
2021-09-22 18:55:25 +03:00
Nadav Har'El
92570ea7d9 cql-pytest: add tests on behavior of empty-string keys
We know (verified by existing tests) that null keys are not allowed -
neither as partition keys nor clustering keys.
In issue #9352 a question was raised of whether an *empty string* is
allowed as a key on a base table (not a materialized view or index).
The following tests confirm that the current situation is as follows:

1. An empty string is perfectly legal as a clustering key.
2. An empty string is NOT ALLOWED as a partition key - the error
   "Key may not be empty" is reported if this is attempted.
3. If the partition key is compound (multiple partition-key columns)
   then any or all of them may be empty strings.

These tests pass the same on both Cassandra and Scylla, showing that
this bizarre (and undocumented) behavior is identical in both.

Refs #9352.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210922131310.293846-1-nyh@scylladb.com>
2021-09-22 18:55:25 +03:00
Avi Kivity
083279d9ab Merge "Generalize sstable creation for tests" from Pavel E
"
There's a whole lot of places that create an sstable for tests
like this

    auto sst = env.make_sstable(...);
    sst->write_components(...);
    sst->load();

Some of them are already generalized with the make_sstable_easy
helper, but there are several instances of them.

Found while hunting down the places that use default IO sched
class behind the scenes.

tests: unit(dev)
"

* 'br-sst-tests-make-sstable-easy' of https://github.com/xemul/scylla:
  test: Generalize make_sstable() and make_sstable_easy()
  test: Use now existing helpers elsewhere
  test: Generalize all make_sstable_easy()-s
  test: Set test change estimation to 1
  test: Generalize make_sstable_easy in mutation tests
  test: Generalize make_sstable_easy in set tests
  test: Reuse make_sstable_easy in datafile tests
  test: Relax make_sstable_easy in compaction tests
2021-09-22 18:55:25 +03:00
Nadav Har'El
a99a774731 cql-pytest: test for secondary-index on empty-string value
When a string column is indexed with a secondary index, the empty value
for this column (an empty string '') is perfectly legal, and should be
indexed as well. This is not the same as an unset (null) value which
isn't indexed.

The following test demonstrates that this case works in Cassandra, but
does not in Scylla (so the test is marked "xfail"). In Scylla, a query
that returns the expected results with ALLOW FILTERING suddenly returns
a different (and wrong) result when an index is added on the table.

This test reproduces issue #9364.

Refs #9364.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210922121510.291826-1-nyh@scylladb.com>
2021-09-22 18:55:25 +03:00
Avi Kivity
b5cf0b4489 Merge "compaction: Update backlog tracker correctly when schema is updated" from Raphael
"
Backlog tracker isn't updated correctly when facing a schema change, and
may leak an SSTable if compaction strategy is changed, which causes
backlog to be computed incorrectly. Most of these problems happen because
sstable set and tracker are updated independently, so it could happen
that the tracker loses track (pun intended) of changes applied to the set.

The first patch will fix the leak when strategy is changed, and the third
patch will make sure that tracker is updated atomically with sstable set,
so this kind of problem will not happen anymore.

Fixes #9157

test: mode(debug)
"

* 'fixes_to_backlog_tracker_v3' of https://github.com/raphaelsc/scylla:
  compaction: Update backlog tracker correctly when schema is updated
  compaction: Don't leak backlog of input sstable when compaction strategy is changed
  compaction: introduce compaction_read_monitor_generator::remove_exhausted_sstables()
  compaction: simplify removal of monitors
2021-09-22 18:55:25 +03:00
Nadav Har'El
e8493e20cb cql-pytest: test for empty-string as partition key in materialized view
Scylla and Cassandra do not allow an empty string as a partition key,
but a materialized view might "convert" a regular string column into a
partition key, and an empty string is a perfectly valid value for this
column. This can result in a view row which has an empty string as a
partition key. This case works in Cassandra, but doesn't in Scylla (the
row with the empty string as a partition key doesn't appear). The
following test demonstrates this difference between Scylla and Cassandra
(it passes on Cassandra, fails on Scylla, and is accordingly marked
"xfail").

Refs #9375.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210922115000.290387-1-nyh@scylladb.com>
2021-09-22 18:55:25 +03:00
Piotr Jastrzebski
56888c8954 docs: clean up codeowners
Recently we had to say goodbye to our dear friend Pekka.
He orphaned a few subsystems that can't call for his help in code
reviews anymore.

This patch makes sure no one will bother Pekka in his afterlife.
It also cleans up HACKING.md a little bit by removing Pekka and Duarte
from the maintainer/reviewer lists.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <98ba1aed9ee8a87b9037b5032b82abc5bfddbd66.1632301309.git.piotr@scylladb.com>
2021-09-22 18:55:25 +03:00
Botond Dénes
3f4f408bcf schema: add get_reversed()
A variant of make_reversed() which goes through the schema registry,
teaching the schema to the registry if necessary. This effectively
caches the result of the reversing and as an added bonus double
reversing yields the very same schema C++ object that was the starting
point.

Closes #9365
2021-09-22 18:55:25 +03:00
Kamil Braun
81b7ed23bb test: raft: randomized_nemesis_test: introduce append register
The AppendReg state machine stores a sequence of integers. It supports
`append` inputs which append a single integer to the sequence and return
the previous state (before appending).

The implementation uses the `append_seq` data structure
representing an immutable sequence that uses a vector underneath
which may be shared by multiple instances of `append_seq`.
Appending to the sequence appends to the underlying vector,
but there is no observable effect on the other instances since
they use only the prefix of the sequence that wasn't changed.
If two instances sharing the same vector try to append,
the later one must perform a copy.

This allows efficient appends if only one instance is appending, which
is useful in the following context:
- a Raft server stores a copy in the underlying state machine replica
  and appends to it,
- clients send append operations to the server; the server returns the
  state of the sequence before it was appended to,
- thanks to the sharing, we don't need to copy all elements when
  returning the sequence to the client, and only one instance (the
  server) is appending to the shared vector,
- summarizing, all operations have amortized O(1) complexity.
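
As a rough sketch of the sharing scheme described above (not the implementation from this commit), an immutable sequence over a shared vector could look like the following. The class layout and the `shares_storage_with` helper are assumptions made for illustration; only the name `append_seq` comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch of an immutable sequence backed by a shared vector.
class append_seq {
    std::shared_ptr<std::vector<int>> _data;
    std::size_t _len;  // this instance sees only the prefix [0, _len)

    append_seq(std::shared_ptr<std::vector<int>> d, std::size_t n)
        : _data(std::move(d)), _len(n) {}
public:
    append_seq() : _data(std::make_shared<std::vector<int>>()), _len(0) {}

    // Returns a new sequence with `x` appended; *this is observably unchanged.
    append_seq append(int x) const {
        if (_len == _data->size()) {
            // Nobody has appended past our prefix: extend the shared vector.
            _data->push_back(x);
            return append_seq(_data, _len + 1);
        }
        // Another instance already extended the vector: copy our prefix first.
        auto copy = std::make_shared<std::vector<int>>(
            _data->begin(), _data->begin() + _len);
        copy->push_back(x);
        return append_seq(std::move(copy), _len + 1);
    }

    std::size_t size() const { return _len; }
    int operator[](std::size_t i) const { return (*_data)[i]; }

    // Illustration-only helper to observe whether two instances share a vector.
    bool shares_storage_with(const append_seq& o) const { return _data == o._data; }
};
```

In this sketch a copy happens only when an instance that is behind the shared vector appends, which matches the case of two instances sharing the same vector and both trying to append.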
2021-09-22 17:54:07 +02:00
Botond Dénes
922295dd8e multishard_mutation_query: add tracepoint with compaction stats
Add the content of the compaction stats introduced in the previous patch
to the tracing data. This will help diagnose query performance related
problems caused by tombstones.
2021-09-22 14:00:24 +03:00
Botond Dénes
eba46e353d querier: add tracepoint with compaction stats
Add the content of the compaction stats introduced in the previous patch
to the tracing data. This will help diagnose query performance related
problems caused by tombstones.
2021-09-22 14:00:05 +03:00
Botond Dénes
f0ead81250 mutation_compactor: collect stats about compacted data
Stats contain the number of partitions, static rows, clustering rows and
range tombstones. For rows dead/live are counted separately.
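
A minimal sketch of what such counters might look like, assuming hypothetical names (`row_stats`, `compaction_stats_sketch`, `format_page_stats`) that are not taken from the patch. The output format follows the "Page stats" trace line quoted in the merge description of this series.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical counters; field names are illustrative only.
struct row_stats {
    std::uint64_t live = 0;
    std::uint64_t dead = 0;
    std::uint64_t total() const { return live + dead; }
};

struct compaction_stats_sketch {
    std::uint64_t partitions = 0;
    row_stats static_rows;      // live and dead counted separately
    row_stats clustering_rows;  // live and dead counted separately
    std::uint64_t range_tombstones = 0;
};

// Formats the counters in the shape of the "Page stats" trace message.
std::string format_page_stats(const compaction_stats_sketch& s) {
    std::ostringstream os;
    os << "Page stats: " << s.partitions << " partition(s), "
       << s.static_rows.total() << " static row(s) ("
       << s.static_rows.live << " live, " << s.static_rows.dead << " dead), "
       << s.clustering_rows.total() << " clustering row(s) ("
       << s.clustering_rows.live << " live, " << s.clustering_rows.dead
       << " dead) and " << s.range_tombstones << " range tombstone(s)";
    return os.str();
}
```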
2021-09-22 13:59:19 +03:00
Pavel Emelyanov
598841a5dd code: Expell gossiper.hh from other headers
This needs to add forward declarations of the gossiper class and
re-include some other headers here and there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Pavel Emelyanov
6875a4b292 storage_service: Gossip "sharder" in initial states
Right now the number of shards and ignore-msb-bits are gossiped
with a separate call. It's simpler to include this data into
the initial gossiping state.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Pavel Emelyanov
968e117315 gossiper: Relax set_seeds()
It's much shorter and simpler to pass the seeds, obtained from the
config, into gossiper via gossip_config rather than with the help
of a special call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Pavel Emelyanov
2b63c4c16f gossiper, main: Turn init_gossiper into get_seeds_from_config
Looking into the init_gossiper() helper makes it clear that what it does
is get seeds, provider and listen_address from config and generate
a set of seeds for the gossiper. Then it calls gossiper.set_seeds().

This patch renames the helper into get_seeds_from_config(), removes
all arguments except db::config& from it and moves the call to set_seeds()
into main.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Pavel Emelyanov
7680274e02 storage_service: Eliminate the do-bind argument from everywhere
The same as in the previous patch -- the gossiper doesn't need to know
if it should call messaging.start_listen() or not, and neither should
the storage_service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Pavel Emelyanov
0607a2b84f gossiper: Drop ms-registered manipulations
Now it's no-op and can be removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Pavel Emelyanov
ca316f32f0 messaging, main, gossiper: Move listening start into main
Before preparing the cluster join process the messaging should be
put into listening state. Right now it's done "on-demand" by the
call to the do_shadow_round(), also there's a safety call in the
start_gossiping(). Tests, however, should not start listening, so
the do_bind boolean exists and is passed all the way around.

Make the main() code explicitly call the messaging.start_listen()
and leave tests without it. This change makes messaging start
listening a bit earlier, but in between these old and new places
there's nothing that needs messaging to stay deaf.

As the do_bind becomes useless, the wait_for_gossip_to_settle() is
also moved into main.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00