Commit Graph

37667 Commits

Author SHA1 Message Date
Avi Kivity
91cdaa72bd cql3: selection: evaluate aggregation queries via expr::evaluate()
When constructing a selection_with_processing, split the
selectors into an inner loop and an outer loop with split_aggregation().
We can then reimplement add_input_row() and get_output_row() as follows:

 - add_input_row(): evaluate the inner loop expressions and store
   the results in temporaries
 - get_output_row(): evaluate the outer loop expressions, pulling in
   values from those temporaries.

reset(), which is called between groups, simply copies the initial
values rathered by split_aggregation() into the temporaries.

The only complexity comes from add_column_for_post_query_processing(),
which essentially re-does the work of split_aggregation(). It would
be much better if we added the column before split_aggregation() was
called, but some refactoring has to take place before that happens.
2023-07-03 19:45:17 +03:00
Avi Kivity
27254c4f50 cql3: selection, select_statement: fine tune add_column_for_post_processing() usage
In three cases we need to consult a column that's possibly not explicitly
selected:
 - for the WHERE clause
 - for GROUP BY
 - for ORDER BY

The return value of the function is the index where the newly-added
column can be found. Currently, the index is correct for both
the internal column vector and the result set, but soon in won't
be.

In the first two cases (WHERE clause and ORDER BY), we're interested
in the column before grouping, in the last case (ORDER BY) we're interested
in the column after grouping, so we need to distinguish between the two.

Since we already have selection::index_of() that returns the pre-grouping
index, choose the post-grouping index for the return value of
selection::add_column_for_post_processing(), and change the GROUP BY
code to use index_of(). Comments are added.
2023-07-03 19:45:17 +03:00
Avi Kivity
6bf1bd7130 cql3: selection: evaluate non-aggregating complex selections using expr::evaluate()
Now that everything is in place, implement the fast-path
transform_input_row() for selection_with_processing. It's a
straightforward call to evaluate() in a loop.

We adjust add_column_for_post_processing() to also update _selectors,
otherwise ORDER BY clauses that require an additional column will not
see that column.

Since every sub-class implements transform_input_row(), mark
the base class declaration as pure virtual.
2023-07-03 19:45:17 +03:00
Avi Kivity
f5eb7fd6dc cql3: selection: store primary key in result_set_builder
expr::evaluate() expects an exploded primary key in its
evaluation_inputs structure (this dates back from the conversion
of filtering to expressions). But right now, the exploded primary
key is only available in the filter.

That's easy to fix however: move the primary key containers
to result_set_builder and just keep references in the filter.

After this, we can evaluate column_value expressions that
reference the primary key.
2023-07-03 19:45:17 +03:00
Avi Kivity
0021f77e30 cql3: expression: fix field_selection::type interpretation by evaluate()
field_selection::type refers to the type of the selection operation,
not the type of the structure being selected. This is what
prepare_expression() generates and how all other expression elements
work, but evaluate() for field_selection thinks it's the type
of the structure, and so fails when it gets an expression
from prepare_expression().

Fix that, and adjust the tests.
2023-07-03 19:45:17 +03:00
Avi Kivity
aed01018a3 cql3: selection: make result_set_builder::current non-optional<>
Previously, we used the engagedness of result_set_builder::optional
as a flag, but the previous patch eliminated that and it's always
engaged. Remove the optional wrapper to reduce noise.
2023-07-03 19:45:17 +03:00
Avi Kivity
44c8507075 cql3: selection: simplify row/group processing
Processing a result set relies on calling result_set_builder::new_row().
This function is quite complex as it has several roles:

 - complete processing of the previously computed row, if any
 - determine if GROUP BY grouping has changed, and flush the previous group
   if so
 - flush the last group if that's the case

This works now, but won't work with expr::evaluate. The reason is that
new_row() is called after the partition key and clustering key of the
new row have been evaluated, so processing of the previous row will see
incorrect data. It works today because we copy the partition key and
clustering key into result_set_builder::current, but expr::evaluate
uses the exploded partition key and clustering key, which have been
clobbered.

The solution is to separate the roles. Instead of new_row() that's
responsible for completing the previous row and starting a new one,
we have start_new_row() that's responsible for what its name says,
and complete_row() that's responsible for completing the row and
checking for group change. The responsibity for flushing the final
group is moved to result_set_builder::build(). This removes the
awkward "more_rows_coming" parameter that makes everything more
complicated.

result_set_builder::current is still optional, but it's always
engaged. The next patch will clean that up.
2023-07-03 19:45:17 +03:00
Avi Kivity
877f4f86d2 cql3: selection: convert requires_thread to expressions
If any function requires a thread to execute (due to running in Lua
or wasm), then the entire selection needs to run in a thread.
2023-07-03 19:45:17 +03:00
Avi Kivity
cbd68abde8 cql: selection: convert used_functions() to expressions
used_functions() is used to check whether prepared statements need
to be invalidated when user-defined functions change.

We need to skip over empty scalar components of aggregates, since
these can be defined by users (with the same meaning as if the
identity function was used).
2023-07-03 19:45:17 +03:00
Avi Kivity
bfb1acc6d3 cql3: selection: convert is_reducible/get_reductions to expressions
The current version of automatic query parallelization works when all
selectors are reducible (e.g. have a state_reduction_function member),
and all the inputs to the aggregates are direct column selectors without
further transformation. The actual column names and reductions need to
be packed up for forward_service to be used.

Convert is_reducible()/get_reductions() to the expression world. The
conversion is fairly straightforward.
2023-07-03 19:45:17 +03:00
Avi Kivity
d99fc29e2d cql3: selection: convert is_count() to expressions
Early versions of automatic query parallelization only
supported `SELECT count(*)` with one selector. Convert the
check to expressions.
2023-07-03 19:45:17 +03:00
Avi Kivity
d36eb8cea6 cql3: selection convert contains_ttl/contains_writetime to work on expressions
contains_ttl/contains_writetime are two attributes of a selection. If a selection
contains them, we must ask the replica to send them over; otherwise we don't
have data to process. Not sending ttl/writetime saves some effort.

The implementation is a straightforward recursive descent using expr::find_in_expression.
2023-07-03 19:45:17 +03:00
Avi Kivity
6c2bb5e1ed cql3: selection: make simple_selectors stateless
Now that we push all GROUP BY queries to selection_with_processing,
we always process rows via transform_input_row() and there's no
reason to keep any state in simple_selectors.

Drop the state and raise an internal error if we're ever
called for aggregation.
2023-07-03 19:45:17 +03:00
Avi Kivity
a26516ef65 cql3: expression: add helper to split expressions with aggregate functions
Aggregate functions cannot be evaluated directly, since they implicitly
refer to state (the accumulator). To allow for evaluation, we
split the expression into two: an inner expression that is evaluated
over the input vector (once per element). The inner expression calls
the aggregation function, with an extra input parameter (the accumulator).

The outer expression is evaluated once per input vector; it calls
the final function, and its input is just the accumulator. The outer
expression also contains any expressions that operate on the result
of the aggregate function.

The acculator is stored in a temporary.

Simple example:

   sum(x)

is transformed into an inner expression:

   t1 = (t1 + x)   // really sum.aggregation_function

and an outer expression:

   result = t1     // really sum.state_to_result_function

Complicated example:

    scalar_func(agg1(x, f1(y)), agg2(x, f2(y)))

is transformed into two inner expressions:

    t1 = agg1.aggregation_function(t1, x, f1(y))
    t2 = agg2.aggregation_function(t2, x, f2(y))

and an outer expression

    output = scalar_func(agg1.state_to_result_function(t1),
                         agg2.state_to_result_function(t2))

There's a small wart: automatically parallelized queries can generate
"reducible" aggregates that have no state_to_result function, since we
want to pass the state back to the coordinator. Detect that and short
circuit evaluation to pass the accumulator directly.
2023-07-03 19:45:17 +03:00
Avi Kivity
f48ecb5049 cql3: selection: short-circuit non-aggregations
Currently, selector evaluation assumes the most complex case
where we aggregate, so multiple input rows combine into one output row.

In effect the query either specifies an outer loop (for the group)
and an inner loop (for input rows), or it only specifies the inner loop;
but we always perform the outer and inner loop.

Prepare to have a separate path for the non-aggregation case by
introducing transform_input_row().
2023-07-03 19:45:17 +03:00
Avi Kivity
4a2428e4ec cql3: selection: drop validate_selectors
It's unused. It dates from the (perhaps better) time when
regularity of aggregation across selectors was enforced.
2023-07-03 19:45:17 +03:00
Avi Kivity
432cb02d64 cql3: select_statement: force aggregation if GROUP BY is used
GROUP BY is typically used with aggregation. In one case the aggregation
is implicit:

    SELECT a, b, c
    FROM tab
    GROUP BY x, y, z

One row will appear from each group, even though no aggregation
was specified. To avoid this irregularity, rewrite this query as

    SELECT first(a), first(b), first(c)
    FROM tab
    GROUP BY x, y, z

This allows us to have different paths for aggregations and
non-aggregations, without worrying about this special case.
2023-07-03 19:45:17 +03:00
Avi Kivity
bc6c64e13c cql3: select_statement: levellize aggregation depth
Avoid mixed aggregate/non-aggregate queries by inserting
calls to the first() function. This allows us to avoid internal
state (simple_selector::_current) and make selector evaluation
stateless apart from explicit temporaries.
2023-07-03 19:45:17 +03:00
Avi Kivity
ecdded90cd cql3: selection: skip first_function when collecting metadata
We plan to rewrite aggregation queries that have a non-aggregating
selector using the first function, so that all selectors are
aggregates (or none are). Prevent the first function from affecting
metadata (the auto-generated column names), by skipping over the
first function if detected. They input and output types are unchanged
so this only affects the name.
2023-07-03 19:45:17 +03:00
Avi Kivity
996e02f5bf cql3: select_statement: explicitly disable automatic parallelization with no aggregates
A query of the form `SELECT foo, count(foo) FROM tab` returns the first
value of the foo column along with the count. This can't be parallized
today since the first selector isn't an aggregate.

We plan to rewrite the query internally as `SELECT first(foo), count(foo)
FROM tab`, in order to make the query more regular (no mixing of aggregates
and non-aggregates). However, this will defeat the current check since
after the rewrite, all selectors are aggregates.

Prepare for this by performing the check on a pre-rewrite variable, so
it won't be affected by the query rewrite in the next patch.

Note that although even though we could add support for running
first() in parallel, it's not possible to get the correct results,
since first() is not commutative and we don't reduce in order. It's
also not a particularly interesting query.
2023-07-03 19:45:17 +03:00
Avi Kivity
778ae2b461 cql3: expression: introduce temporaries
Temporaries are similar to bind variables - they are values provided from
outside the expression. While bind variables are provided by the user, temporaries
are generated internally.

The intended use is for aggregate accumulator storage. Currently aggregates
store the accumulator in aggregate_function_selector::_accumulator, which
means the entire selector hierarchy must be cloned for every query. With
expressions, we can have a single expression object reused for many computations,
but we need a way to inject the accumulator into an aggregation, which this
new expression element provides.
2023-07-03 19:45:17 +03:00
Avi Kivity
7c3ceb6473 cql3: select_statement: use prepared selectors
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.

This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
2023-07-03 19:45:17 +03:00
Avi Kivity
a338d0455d cql3: selection: avoid selector_factories in collect_metadata()
Generate the column headings in the result set metadata using
the newly introduced result_set_metadata mode of the expression
printer.
2023-07-03 19:45:17 +03:00
Avi Kivity
7aee322a6c cql3: expressions: add "metadata mode" formatter for expressions
When returning a result set (and when preparing a statement), we
return metadata about the result set columns. Part of that is the
column names, which are derived from the expressions used as selectors.

Currently, they are computed via selector::column_name(), but as
we're dismantling that hierarchy we need a different way to obtain
those names.

It turns out that the expression formatter is close enough to what
we need. To avoid disturbing the current :user mode, add a new
:metadata mode and apply the adjustments needed to bring it in line
with what column metadata looks like today.

Note that column metadata is visible to applications and they can
depend on it; e.g. the Python driver allows choosing columns based on
their names rather than ordinal position.
2023-07-03 19:45:17 +03:00
Avi Kivity
a1f4abb753 cql3: selection: convert collect_metadata() to the prepared expression domain
Simplifies refactoring later on.
2023-07-03 19:45:17 +03:00
Avi Kivity
91b251f6b4 cql3: selection: convert processes_selection to work on prepared expressions
processes_selection() checks whether a selector passes-through a column
or applies some form of processing (like a case or function application).

It's more sensible to do this in the prepared domain as we have more
information about the expression. It doesn't really help here, but
it does help the refactoring later in the series.
2023-07-03 19:45:17 +03:00
Avi Kivity
4fb797303f cql3: selection: prepare selectors earlier
Currently, each selector expression is individually prepared, then converted
into a selector object that is later executed. This is done (on a vector
of raw selectors) by cql3::selection::raw_selector::to_selectables().

Split that into two phases. The first phase converts raw_selector into
a new struct prepared_selector (a better name would be plain 'selector',
but it's taken for now). The second phase continues the process and
converts prepared_selector into selectables.

This gives us a full view of the prepared expressions while we're
preparing the select clause of the select statement.
2023-07-03 19:45:17 +03:00
Avi Kivity
70b246eaaf cql3: raw_selector: deinline
It's easier to refactor things if they don't cause the entire
universe to recompile, plus adding new headers is less painful.
2023-07-03 19:45:17 +03:00
Avi Kivity
99fe0ee772 cql3: expression: reimplement verify_no_aggregate_functions()
Most clauses in a CQL statement don't tolerate aggregate functions,
and so they call verify_no_aggregate_functions(). It can now be
reimplemented in terms of aggregation_depth(), removing some code.
2023-07-03 19:45:17 +03:00
Avi Kivity
b1b4a18ad8 cql3: expression: add helpers to manage an expression's aggregation depth
We define the "aggregation depth" of an expression by how many
nested aggregation functions are applied. In CQL/SQL, legal
values are 0 and 1, but for generality we deal with any aggregation depth.

The first helper measures the maximum aggregation depth along any path
in the expression graph. If it's 2 or greater, we have something like
max(max(x)) and we should reject it (though these helpers don't). If
we get 1 it's a simple aggregation. If it's zero then we're not aggregating
(though CQL may decide to aggregate anyway if GROUP BY is used).

The second helper edits an expression to make sure the aggregation depth
along any path that reaches a column is the same. Logically,
`SELECT x, max(y)` does not make sense, as one is a vector of values
and the other is a scalar. CQL resolves the problem by defining x as
"the first value seen". We apply this resolution by converting the
query to `SELECT first(x), max(y)` (where `first()` is an internal
aggregate function), so both selectors refer to scalars that consume
vectors.

When a scalar is consumed by an aggregate function (for example,
`SELECT max(x), min(17)` we don't have to bother, since a scalar
is implicity promoted to a vector by evaluating it every row. There
is some ambiguity if the scalar is a non-pure function (e.g.
`SELECT max(x), min(random())`, but it's not worth following.

A small unit test is added.
2023-07-03 19:45:16 +03:00
Avi Kivity
faf0ea0f68 cql3: expression: improve printing of prepared function calls
Currently, a prepared function_call expression is printed as an
"anonymous function", but it's not really anonymous - the name is
available. Print it out.

This helps in a unit test later on (and is worthwhile by itself).
2023-07-03 19:02:33 +03:00
Avi Kivity
b7556e9482 cql3: functions: add "first" aggregate function
first(x) returns the first x it sees in the group. This is useful
for SELECT clauses that return a mix of aggregates and non-aggregates,
for example

    SELECT max(x), x

with inputs of x = { 1, 2, 3 } is expected to return (3, 1).

Currently, this behavior is handled by individual selectors,
which means they need to contain extra state for this, which
cannot be easily translated to expressions. The new first function
allows translating the SELECT clause above to

    SELECT max(x), first(x)

so all selectors are aggregations and can be handled in the same
way.

The first() function is not exposed to users.
2023-07-02 18:15:00 +03:00
Piotr Dulikowski
ee9bfb583c combined: mergers: remove recursion in operator()()
In mutation_reader_merger and clustering_order_reader_merger, the
operator()() is responsible for producing mutation fragments that will
be merged and pushed to the combined reader's buffer. Sometimes, it
might have to advance existing readers, open new and / or close some
existing ones, which requires calling a helper method and then calling
operator()() recursively.

In some unlucky circumstances, a stack overflow can occur:

- Readers have to be opened incrementally,
- Most or all readers must not produce any fragments and need to report
  end of stream without preemption,
- There has to be enough readers opened within the lifetime of the
  combined reader (~500),
- All of the above needs to happen within a single task quota.

In order to prevent such a situation, the code of both reader merger
classes were modified not to perform recursion at all. Most of the code
of the operator()() was moved to maybe_produce_batch which does not
recur if it is not possible for it to produce a fragment, instead it
returns std::nullopt and operator()() calls this method in a loop via
seastar::repeat_until_value.

A regression test is added.

Fixes: scylladb/scylladb#14415

Closes #14452
2023-06-30 12:07:13 +03:00
Botond Dénes
5648bfb9a0 Merge 'Find task manager task progress' from Aleksandra Martyniuk
Modify task_manager::task::impl::get_progress method so that,
whenever relevant, progress is calculated based on children's
progress. Otherwise progress indicates only whether the task
is finished or not.

The method may be overriden in inheriting classes.

Closes #14381

* github.com:scylladb/scylladb:
  tasks: delete task_manager::task::impl::_progress as it's unused
  tasks: modify task_manager::task::impl::get_progress method
  tasks: add is_complete method
2023-06-30 09:47:07 +03:00
Botond Dénes
8a7261fd70 Merge 'doc: fix rollback in the 4.3-to-2021.1, 5.0-to-2022.1, and 5.1-to-2022.2 upgrade guides' from Anna Stuchlik
This PR fixes the Restore System Tables section of the upgrade guides by adding a command to clean upgraded SStables during rollback or adding the entire section to restore system tables (which was missing from the older documents).

This PR fixes is a bug and must be backported to branch-5.3, branch-5.2., and branch-5.1.

Refs: https://github.com/scylladb/scylla-enterprise/issues/3046

- [x]  5.1-to-2022.2 - update command (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x]  5.0-to-2022.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x]  4.3-to-2021.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)

(see https://github.com/scylladb/scylla-enterprise/issues/3046#issuecomment-1604232864)

Closes #14444

* github.com:scylladb/scylladb:
  doc: fix rollback in 4.3-to-2021.1 upgrade guide
  doc: fix rollback in 5.0-to-2022.1 upgrade guide
  doc: fix rollback in 5.1-to-2022.2 upgrade guide
2023-06-30 09:38:45 +03:00
Botond Dénes
d20ed2d4db Merge 'doc: improve the Unified Installer page' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/14033

This PR:
- replaces the OUTDATED list of platforms supported by Unified Installer with a link to the "OS Support" page. In this way, the list of supported OSes will be documented in one place, preventing outdated documentation.
- improves the language and syntax, including:
    - Improving the wording.
    - Replacing "Scylla" with "ScyllaDB"
    - Fixing language mistakes
    - Fixing heading underline so that the headings render correctly.

Closes #14445

* github.com:scylladb/scylladb:
  doc: update the language - Unified Installer page
  doc: update Unified Installer support
2023-06-30 09:38:18 +03:00
Kamil Braun
ff386e7a44 service: raft: force initial snapshot transfer in new cluster
When we upgrade a cluster to use Raft, or perform manual Raft recovery
procedure (which also creates a fresh group 0 cluster, using the same
algorithm as during upgrade), we start with a non-empty group 0 state
machine; in particular, the schema tables are non-empty.

In this case we need to ensure that nodes which join group 0 receive the
group 0 state. Right now this is not the case. In previous releases,
where group 0 consisted only of schema, and schema pulls were also done
outside Raft, those nodes received schema through this outside
mechanism. In 91f609d065 we disabled
schema pulls outside Raft; we're also extending group 0 with other
things, like topology-specific state.

To solve this, we force snapshot transfers by setting the initial
snapshot index on the first group 0 server to `1` instead of `0`. During
replication, Raft will see that the joining servers are behind,
triggering snapshot transfer and forcing them to pull group 0 state.

It's unnecessary to do this for cluster which bootstraps with Raft
enabled right away but it also doesn't hurt, so we keep the logic simple
and don't introduce branches based on that.

Extend Raft upgrade tests with a node bootstrap step at the end to
prevent regressions (without this patch, the step would hang - node
would never join, waiting for schema).

Fixes: #14066

Closes #14336
2023-06-29 22:46:42 +02:00
Konstantin Osipov
3d81408a58 test.py: make experimental: raft the default for all tests
Make sure all tests use the new centralized topology
coordinator. This is a step forward towards maturing the
coordinator implementation.

Closes #14039
2023-06-29 14:44:00 +02:00
Tomasz Grabiec
a9282103ba Merge 'Call storage_service notifications only after keyspace schema changes are applied on all shards' from Benny Halevy
This series aims at hardening schema merges and preventing inconsistencies across shards by
updating the database shards before calling the notification callback.

As seen in #13137, we don't want to call the notifications on all shards in parallel while the database shards are in flux.

In addition, any error to update the keyspace will cause abort so not to leave the database shards in an inconsistent state .

Other changes optimize this path by:
- updating shard 0 first, to seed the effective_replication_map.
- executing `storage_service::keyspace_changed` only once, on shard 0 to prevent quadratic update of the token_metadata and e_r_m on every keyspace change.

Fixes #13137

Closes #14158

* github.com:scylladb/scylladb:
  migration_manager: propagate listener notification exceptions
  storage_service: keyspace_changed: execute only on shard 0
  database: modify_keyspace_on_all_shards: execute func first on shard 0
  database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
  database: add modify_keyspace_on_all_shards
  schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
  database: create_keyspace_on_all_shards
  database: update_keyspace_on_all_shards
  database: drop_keyspace_on_all_shards
2023-06-29 12:17:53 +02:00
Anna Stuchlik
d3aba00131 doc: update the language - Unified Installer page
This commit improves the language and syntax on
the Unified Installer page. The changes cover:

- Improving the wording.
- Replacing "Scylla" with "ScyllaDB"
- Fixing language mistakes
- Fixing heading underline so that the headings
  render correctly.
2023-06-29 12:11:22 +02:00
Anna Stuchlik
944ce5c5c2 doc: update Unified Installer support
This commit replaces the OUTDATED list of platforms supported
by Unified Installer with a link to the "OS Support" page.
In this way, the list of supported OSes will be documented
in one place, preventing outdated documentation.
2023-06-29 11:51:21 +02:00
Aleksandra Martyniuk
f63825151e tasks: delete task_manager::task::impl::_progress as it's unused 2023-06-29 11:30:27 +02:00
Aleksandra Martyniuk
d624be4e6b tasks: modify task_manager::task::impl::get_progress method
Modify task_manager::task::impl::get_progress method so that,
whenever relevant, progress is calculated based on children's
progress. Otherwise progress indicates only whether the task
is finished or not.
2023-06-29 11:30:26 +02:00
Botond Dénes
2a58b4a39a Merge 'Compaction resharding tasks' from Aleksandra Martyniuk
Task manager's tasks covering resharding compaction
on table and shard level.

Closes #14044

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to test resharding compaction
  compaction: add shard_reshard_sstables_compaction_task_impl
  compaction: invoke resharding on sharded database
  compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run()
  replica: delete unused functions and struct
  compaction: add reshard_sstables_compaction_task_impl
  compaction: replica: copy struct and functions from distributed_loader.cc
  compaction: create resharding_compaction_task_impl
2023-06-29 12:10:54 +03:00
Aleksandra Martyniuk
0278b21e76 tasks: add is_complete method
Add is_complete method to task_manager::task::impl and
task_manager::task.
2023-06-29 11:02:14 +02:00
Anna Stuchlik
32cfde2f8b doc: fix rollback in 4.3-to-2021.1 upgrade guide
This commit fixes the Restore System Tables section
in the 5.4.3-to-2021.1 upgrade guide by adding the command
to restore system tables.
2023-06-29 10:36:47 +02:00
Anna Stuchlik
130ddc3d2b doc: fix rollback in 5.0-to-2022.1 upgrade guide
This commit fixes the Restore System Tables section
in the 5.0-to-2022.1 upgrade guide by adding the command
to restore system tables.
2023-06-29 10:29:23 +02:00
Anna Stuchlik
8b3153f9ef doc: fix rollback in 5.1-to-2022.2 upgrade guide
This commit fixes the Restore System Tables section
in the 5.1-to-2022.2 upgrade guide by adding a command
to clean upgraded SStables during rollback.
2023-06-29 10:17:06 +02:00
Nadav Har'El
dd63169077 Merge 'test/boost/index_with_paging_test: reduce running time' from Alecco
Reduce test string value size, parallelize inserts, and use a prepared statement,

The debug running time for this tests is reduced from 13:18 to 7:52.

Refs #13905

Closes #14380

* github.com:scylladb/scylladb:
  test/boost/index_with_paging_test: parallel insert
  test/boost/index_with_paging_test: prepared statement
  test/boost/index_with_paging_test: reduce running time
2023-06-29 10:45:01 +03:00
Tomasz Grabiec
50e8ec77c6 Merge 'Wait for other nodes to be UP and NORMAL on bootstrap right after enabling gossiping' from Kamil Braun
`handle_state_normal` may drop connections to the handled node. This
causes spurious failures if there's an ongoing concurrent operation.
This problem was already solved twice in the past in different contexts:
first in 53636167ca, then in
79ee38181c.

Time to fix it for the third time. Now we do this right after enabling
gossiping, so hopefully it's the last time.

This time it's causing snapshot transfer failures in group 0. Although
the transfer is retried and eventually succeeds, the failed transfer is
wasted work and causes an annoying ERROR message in the log which
dtests, SCT, and I don't like.

The fix is done by moving the `wait_for_normal_state_handled_on_boot()`
call before `setup_group0()`. But for the wait to work correctly we must
first ensure that gossiper sees an alive node, so we precede it with
`wait_for_live_node_to_show_up()` (before this commit, the call site of
`wait_for_normal_state_handled_on_boot` was already after this wait).

There is another problem: the bootstrap procedure is racing with gossiper
marking nodes as UP, and waiting for other nodes to be NORMAL doesn't guarantee
that they are also UP. If gossiper is quick enough, everything will be fine.
If not, problems may arise such as streaming or repair failing due to nodes
still being marked as DOWN, or the CDC generation write failing.

In general, we need all NORMAL nodes to be up for bootstrap to proceed.
One exception is replace where we ignore the replaced node. The
`sync_nodes` set constructed for `wait_for_normal_state_handled_on_boot`
takes this into account, so we also use it to wait for nodes to be UP.

As explained in commit messages and comments, we only do these
waits outside raft-based-topology mode.

This should improve CI stability.
Fixes: #12972
Refs: #14042

Closes #14354

* github.com:scylladb/scylladb:
  messaging_service: print which connections are dropped due to missing topology info
  storage_service: wait for nodes to be UP on bootstrap
  storage_service: wait for NORMAL state handler before `setup_group0()`
  storage_service: extract `gossiper::wait_for_live_nodes_to_show_up()`
2023-06-28 20:40:03 +02:00