used_functions() checks whether prepared statements need
to be invalidated when user-defined functions change.
We need to skip over empty scalar components of aggregates, since
these can be defined by users (with the same meaning as if the
identity function was used).
The current version of automatic query parallelization works when all
selectors are reducible (i.e. have a state_reduction_function member),
and all the inputs to the aggregates are direct column selectors without
further transformation. The actual column names and reductions need to
be packed up so that forward_service can use them.
Convert is_reducible()/get_reductions() to the expression world. The
conversion is fairly straightforward.
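As a rough sketch of the shape of the check (toy types; the real code
walks cql3 expressions, and all names here are illustrative):

    #include <optional>
    #include <string>
    #include <vector>

    // A selector is parallelizable when its aggregate has a
    // state_reduction_function and its inputs are bare columns.
    struct selector {
        std::optional<std::string> state_reduction_function;
        std::vector<std::string> input_columns; // empty if inputs are not
                                                // direct columns
    };

    struct reduction_plan {                     // packed up for
        std::vector<std::string> columns;       // forward_service
        std::vector<std::string> reductions;
    };

    std::optional<reduction_plan> get_reductions(const std::vector<selector>& sel) {
        reduction_plan plan;
        for (const auto& s : sel) {
            if (!s.state_reduction_function || s.input_columns.empty()) {
                return std::nullopt;            // not parallelizable
            }
            plan.columns.insert(plan.columns.end(),
                                s.input_columns.begin(), s.input_columns.end());
            plan.reductions.push_back(*s.state_reduction_function);
        }
        return plan;
    }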
contains_ttl/contains_writetime are two attributes of a selection. If a selection
contains them, we must ask the replica to send the ttl/writetime data over;
otherwise we have no data to process. Not sending ttl/writetime when they
aren't needed saves some effort.
The implementation is a straightforward recursive descent using expr::find_in_expression.
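A minimal sketch of that descent, using a toy tree rather than the real
cql3::expr types:

    #include <memory>
    #include <string>
    #include <vector>

    struct expr {                                  // toy expression node
        std::string fn_name;                       // e.g. "writetime", "+"
        std::vector<std::unique_ptr<expr>> args;
    };

    // find_in_expression-style search: does any node satisfy the predicate?
    template <typename Pred>
    bool find_in_expression(const expr& e, Pred pred) {
        if (pred(e)) {
            return true;
        }
        for (const auto& arg : e.args) {
            if (find_in_expression(*arg, pred)) {
                return true;
            }
        }
        return false;
    }

    bool contains_writetime(const expr& selection) {
        return find_in_expression(selection, [] (const expr& e) {
            return e.fn_name == "writetime";
        });
    }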
Now that we push all GROUP BY queries to selection_with_processing,
we always process rows via transform_input_row() and there's no
reason to keep any state in simple_selectors.
Drop the state and raise an internal error if we're ever
called for aggregation.
Aggregate functions cannot be evaluated directly, since they implicitly
refer to state (the accumulator). To allow for evaluation, we
split the expression into two: an inner expression and an outer one.
The inner expression is evaluated over the input vector (once per
element); it calls the aggregation function, with an extra input
parameter (the accumulator).
The outer expression is evaluated once per input vector; it calls
the final function, and its input is just the accumulator. The outer
expression also contains any expressions that operate on the result
of the aggregate function.
The accumulator is stored in a temporary.
Simple example:
sum(x)
is transformed into an inner expression:
t1 = (t1 + x) // really sum.aggregation_function
and an outer expression:
result = t1 // really sum.state_to_result_function
Complicated example:
scalar_func(agg1(x, f1(y)), agg2(x, f2(y)))
is transformed into two inner expressions:
t1 = agg1.aggregation_function(t1, x, f1(y))
t2 = agg2.aggregation_function(t2, x, f2(y))
and an outer expression
output = scalar_func(agg1.state_to_result_function(t1),
                     agg2.state_to_result_function(t2))
There's a small wart: automatically parallelized queries can generate
"reducible" aggregates that have no state_to_result function, since we
want to pass the state back to the coordinator. Detect that and
short-circuit evaluation to pass the accumulator directly.
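A condensed sketch of the evaluation scheme (toy value types; the real
code evaluates cql3 expressions over serialized values):

    #include <functional>
    #include <vector>

    struct split_aggregate {
        // inner expressions: fold one input row into the temporaries
        std::vector<std::function<void(std::vector<long>& temps,
                                       const std::vector<long>& row)>> inner;
        // outer expression: combine the temporaries into the result
        std::function<long(const std::vector<long>& temps)> outer;
    };

    long evaluate(const split_aggregate& agg,
                  const std::vector<std::vector<long>>& input) {
        std::vector<long> temps(agg.inner.size()); // the accumulators
        for (const auto& row : input) {            // once per element
            for (const auto& f : agg.inner) {
                f(temps, row);
            }
        }
        return agg.outer(temps);                   // once per input vector
    }

    // sum(x): inner is t1 = t1 + x, outer is result = t1
    split_aggregate make_sum() {
        return {
            { [] (std::vector<long>& t, const std::vector<long>& row) {
                  t[0] += row[0];
              } },
            [] (const std::vector<long>& t) { return t[0]; },
        };
    }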
Currently, selector evaluation assumes the most complex case
where we aggregate, so multiple input rows combine into one output row.
In effect the query either specifies an outer loop (for the group)
and an inner loop (for input rows), or it only specifies the inner loop;
but we always perform both loops.
Prepare to have a separate path for the non-aggregation case by
introducing transform_input_row().
GROUP BY is typically used with aggregation. In one case the aggregation
is implicit:
SELECT a, b, c
FROM tab
GROUP BY x, y, z
One row will appear from each group, even though no aggregation
was specified. To avoid this irregularity, rewrite this query as
SELECT first(a), first(b), first(c)
FROM tab
GROUP BY x, y, z
This allows us to have different paths for aggregations and
non-aggregations, without worrying about this special case.
Avoid mixed aggregate/non-aggregate queries by inserting
calls to the first() function. This allows us to avoid internal
state (simple_selector::_current) and make selector evaluation
stateless apart from explicit temporaries.
We plan to rewrite aggregation queries that have a non-aggregating
selector using the first function, so that all selectors are
aggregates (or none are). Prevent the first function from affecting
metadata (the auto-generated column names) by skipping over the
first function if detected. The input and output types are unchanged,
so this only affects the name.
A query of the form `SELECT foo, count(foo) FROM tab` returns the first
value of the foo column along with the count. This can't be parallelized
today since the first selector isn't an aggregate.
We plan to rewrite the query internally as `SELECT first(foo), count(foo)
FROM tab`, in order to make the query more regular (no mixing of aggregates
and non-aggregates). However, this will defeat the current check since
after the rewrite, all selectors are aggregates.
Prepare for this by performing the check on a pre-rewrite variable, so
it won't be affected by the query rewrite in the next patch.
Note that even though we could add support for running
first() in parallel, it's not possible to get the correct results,
since first() is not commutative and we don't reduce in order. It's
also not a particularly interesting query.
Temporaries are similar to bind variables - they are values provided from
outside the expression. While bind variables are provided by the user, temporaries
are generated internally.
The intended use is for aggregate accumulator storage. Currently aggregates
store the accumulator in aggregate_function_selector::_accumulator, which
means the entire selector hierarchy must be cloned for every query. With
expressions, we can have a single expression object reused for many computations,
but we need a way to inject the accumulator into an aggregation, which this
new expression element provides.
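Schematically (a toy variant; the real element lives in the cql3::expr
expression variant):

    #include <cstddef>
    #include <type_traits>
    #include <variant>
    #include <vector>

    struct temporary     { std::size_t index; };   // generated internally
    struct bind_variable { std::size_t index; };   // provided by the user
    struct constant      { long value; };
    using expression = std::variant<temporary, bind_variable, constant>;

    struct evaluation_inputs {
        const std::vector<long>* bind_values;      // from the user
        std::vector<long>* temporaries;            // e.g. accumulators
    };

    long evaluate(const expression& e, const evaluation_inputs& in) {
        return std::visit([&] (const auto& node) -> long {
            using T = std::decay_t<decltype(node)>;
            if constexpr (std::is_same_v<T, temporary>) {
                return (*in.temporaries)[node.index];
            } else if constexpr (std::is_same_v<T, bind_variable>) {
                return (*in.bind_values)[node.index];
            } else {
                return node.value;
            }
        }, e);
    }

With this shape, a single expression object can be shared by many
concurrent computations, each supplying its own temporaries vector.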
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call that prepares
the selectors to an early stage of select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.
This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
When returning a result set (and when preparing a statement), we
return metadata about the result set columns. Part of that is the
column names, which are derived from the expressions used as selectors.
Currently, they are computed via selector::column_name(), but as
we're dismantling that hierarchy we need a different way to obtain
those names.
It turns out that the expression formatter is close enough to what
we need. To avoid disturbing the current :user mode, add a new
:metadata mode and apply the adjustments needed to bring it in line
with what column metadata looks like today.
Note that column metadata is visible to applications and they can
depend on it; e.g. the Python driver allows choosing columns based on
their names rather than ordinal position.
processes_selection() checks whether a selector passes through a column
or applies some form of processing (like a cast or function application).
It's more sensible to do this in the prepared domain as we have more
information about the expression. It doesn't really help here, but
it does help the refactoring later in the series.
Currently, each selector expression is individually prepared, then converted
into a selector object that is later executed. This is done (on a vector
of raw selectors) by cql3::selection::raw_selector::to_selectables().
Split that into two phases. The first phase converts raw_selector into
a new struct prepared_selector (a better name would be plain 'selector',
but it's taken for now). The second phase continues the process and
converts prepared_selector into selectables.
This gives us a full view of the prepared expressions while we're
preparing the select clause of the select statement.
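In outline (illustrative stand-in types):

    #include <string>
    #include <vector>

    struct expression { /* prepared expression */ };
    struct selectable { /* executable selector */ };

    struct raw_selector {                 // as parsed
        std::string raw_expression;
        std::string alias;
    };

    struct prepared_selector {            // the new intermediate form
        expression expr;
        std::string alias;
    };

    // Phase 1: prepare each raw selector; the whole clause is now
    // visible as prepared expressions.
    std::vector<prepared_selector>
    to_prepared_selectors(const std::vector<raw_selector>& raw);

    // Phase 2: continue the old conversion from the prepared form.
    std::vector<selectable>
    to_selectables(const std::vector<prepared_selector>& prepared);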
Most clauses in a CQL statement don't tolerate aggregate functions,
and so they call verify_no_aggregate_functions(). It can now be
reimplemented in terms of aggregation_depth(), removing some code.
We define the "aggregation depth" of an expression by how many
nested aggregation functions are applied. In CQL/SQL, legal
values are 0 and 1, but for generality we deal with any aggregation depth.
The first helper measures the maximum aggregation depth along any path
in the expression graph. If it's 2 or greater, we have something like
max(max(x)) and we should reject it (though these helpers don't). If
we get 1 it's a simple aggregation. If it's zero then we're not aggregating
(though CQL may decide to aggregate anyway if GROUP BY is used).
The second helper edits an expression to make sure the aggregation depth
along any path that reaches a column is the same. Logically,
`SELECT x, max(y)` does not make sense, as one is a vector of values
and the other is a scalar. CQL resolves the problem by defining x as
"the first value seen". We apply this resolution by converting the
query to `SELECT first(x), max(y)` (where `first()` is an internal
aggregate function), so both selectors refer to scalars that consume
vectors.
When a scalar is consumed by an aggregate function (for example,
`SELECT max(x), min(17)`) we don't have to bother, since a scalar
is implicitly promoted to a vector by evaluating it for every row. There
is some ambiguity if the scalar is a non-pure function (e.g.
`SELECT max(x), min(random())`), but it's not worth pursuing.
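The first helper amounts to a tree walk; a minimal sketch over a toy
node type:

    #include <algorithm>
    #include <cstddef>
    #include <memory>
    #include <vector>

    struct expr {                          // toy expression node
        bool is_aggregate_function = false;
        std::vector<std::unique_ptr<expr>> children;
    };

    // maximum aggregation depth along any path in the expression graph
    std::size_t aggregation_depth(const expr& e) {
        std::size_t max_child = 0;
        for (const auto& c : e.children) {
            max_child = std::max(max_child, aggregation_depth(*c));
        }
        return max_child + (e.is_aggregate_function ? 1 : 0);
    }

    // depth 0: x           -- not aggregating
    // depth 1: max(x)      -- simple aggregation
    // depth 2: max(max(x)) -- rejected by the caller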
A small unit test is added.
Currently, a prepared function_call expression is printed as an
"anonymous function", but it's not really anonymous - the name is
available. Print it out.
This helps in a unit test later on (and is worthwhile by itself).
first(x) returns the first x it sees in the group. This is useful
for SELECT clauses that return a mix of aggregates and non-aggregates,
for example
SELECT max(x), x
with inputs of x = { 1, 2, 3 } is expected to return (3, 1).
Currently, this behavior is handled by individual selectors,
which means they need to contain extra state for this, which
cannot be easily translated to expressions. The new first function
allows translating the SELECT clause above to
SELECT max(x), first(x)
so all selectors are aggregations and can be handled in the same
way.
The first() function is not exposed to users.
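A sketch of its shape as an aggregate (simplified; null handling and
the real function registration are elided):

    #include <optional>

    template <typename T>
    struct first_aggregate {
        using state = std::optional<T>;

        // aggregation function: keep the first value seen, ignore the rest
        static state aggregate(state acc, const T& value) {
            return acc ? acc : state(value);
        }

        // state-to-result (final) function: called once per group
        static T to_result(const state& acc) {
            return *acc;   // groups are non-empty by construction
        }
    };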
In mutation_reader_merger and clustering_order_reader_merger, the
operator()() is responsible for producing mutation fragments that will
be merged and pushed to the combined reader's buffer. Sometimes, it
might have to advance existing readers, open new and / or close some
existing ones, which requires calling a helper method and then calling
operator()() recursively.
In some unlucky circumstances, a stack overflow can occur:
- Readers have to be opened incrementally,
- Most or all readers must not produce any fragments and need to report
end of stream without preemption,
- There have to be enough readers opened within the lifetime of the
combined reader (~500),
- All of the above needs to happen within a single task quota.
In order to prevent such a situation, the code of both reader merger
classes was modified not to perform recursion at all. Most of the code
of operator()() was moved to maybe_produce_batch(), which does not
recurse; if it cannot produce a fragment, it returns std::nullopt,
and operator()() calls this method in a loop via
seastar::repeat_until_value.
A regression test is added.
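Schematically (synchronous and heavily simplified; the real code
returns futures and loops via seastar::repeat_until_value):

    #include <optional>

    struct fragment {};

    struct merger {
        // Never recurses: either produces a fragment, or advances/opens/
        // closes readers and returns std::nullopt.
        std::optional<fragment> maybe_produce_batch();

        // Drives maybe_produce_batch() in a loop instead of calling
        // itself, so stack depth stays constant no matter how many
        // readers hit end-of-stream without producing a fragment.
        fragment operator()() {
            for (;;) {
                if (auto f = maybe_produce_batch()) {
                    return *f;
                }
            }
        }
    };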
Fixes: scylladb/scylladb#14415
Closes #14452
Modify task_manager::task::impl::get_progress method so that,
whenever relevant, progress is calculated based on children's
progress. Otherwise progress indicates only whether the task
is finished or not.
The method may be overridden in inheriting classes.
Closes #14381
* github.com:scylladb/scylladb:
tasks: delete task_manager::task::impl::_progress as it's unused
tasks: modify task_manager::task::impl::get_progress method
tasks: add is_complete method
This PR fixes the Restore System Tables section of the upgrade guides by adding a command to clean upgraded SStables during rollback or adding the entire section to restore system tables (which was missing from the older documents).
This PR fixes a bug and must be backported to branch-5.3, branch-5.2, and branch-5.1.
Refs: https://github.com/scylladb/scylla-enterprise/issues/3046
- [x] 5.1-to-2022.2 - update command (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x] 5.0-to-2022.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)
- [x] 4.3-to-2021.1 - add "Restore system tables" to rollback (backport to branch-5.3, branch-5.2, and branch-5.1)
(see https://github.com/scylladb/scylla-enterprise/issues/3046#issuecomment-1604232864)
Closes #14444
* github.com:scylladb/scylladb:
doc: fix rollback in 4.3-to-2021.1 upgrade guide
doc: fix rollback in 5.0-to-2022.1 upgrade guide
doc: fix rollback in 5.1-to-2022.2 upgrade guide
Fixes https://github.com/scylladb/scylladb/issues/14033
This PR:
- replaces the OUTDATED list of platforms supported by Unified Installer with a link to the "OS Support" page. In this way, the list of supported OSes will be documented in one place, preventing outdated documentation.
- improves the language and syntax, including:
- Improving the wording.
- Replacing "Scylla" with "ScyllaDB"
- Fixing language mistakes
- Fixing heading underline so that the headings render correctly.
Closes #14445
* github.com:scylladb/scylladb:
doc: update the language - Unified Installer page
doc: update Unified Installer support
When we upgrade a cluster to use Raft, or perform manual Raft recovery
procedure (which also creates a fresh group 0 cluster, using the same
algorithm as during upgrade), we start with a non-empty group 0 state
machine; in particular, the schema tables are non-empty.
In this case we need to ensure that nodes which join group 0 receive the
group 0 state. Right now this is not the case. In previous releases,
where group 0 consisted only of schema, and schema pulls were also done
outside Raft, those nodes received schema through this outside
mechanism. In 91f609d065 we disabled
schema pulls outside Raft; we're also extending group 0 with other
things, like topology-specific state.
To solve this, we force snapshot transfers by setting the initial
snapshot index on the first group 0 server to `1` instead of `0`. During
replication, Raft will see that the joining servers are behind,
triggering snapshot transfer and forcing them to pull group 0 state.
It's unnecessary to do this for a cluster that bootstraps with Raft
enabled right away, but it also doesn't hurt, so we keep the logic simple
and don't introduce branches based on that.
Extend Raft upgrade tests with a node bootstrap step at the end to
prevent regressions (without this patch, the step would hang - node
would never join, waiting for schema).
Fixes: #14066
Closes #14336
This series aims at hardening schema merges and preventing inconsistencies across shards by
updating the database shards before calling the notification callback.
As seen in #13137, we don't want to call the notifications on all shards in parallel while the database shards are in flux.
In addition, any error while updating the keyspace will cause an abort, so as not to leave the database shards in an inconsistent state.
Other changes optimize this path by:
- updating shard 0 first, to seed the effective_replication_map.
- executing `storage_service::keyspace_changed` only once, on shard 0 to prevent quadratic update of the token_metadata and e_r_m on every keyspace change.
Fixes #13137
Closes #14158
* github.com:scylladb/scylladb:
migration_manager: propagate listener notification exceptions
storage_service: keyspace_changed: execute only on shard 0
database: modify_keyspace_on_all_shards: execute func first on shard 0
database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
database: add modify_keyspace_on_all_shards
schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
database: create_keyspace_on_all_shards
database: update_keyspace_on_all_shards
database: drop_keyspace_on_all_shards
This commit improves the language and syntax on
the Unified Installer page. The changes cover:
- Improving the wording.
- Replacing "Scylla" with "ScyllaDB"
- Fixing language mistakes
- Fixing heading underline so that the headings
render correctly.
This commit replaces the OUTDATED list of platforms supported
by Unified Installer with a link to the "OS Support" page.
In this way, the list of supported OSes will be documented
in one place, preventing outdated documentation.
Modify task_manager::task::impl::get_progress method so that,
whenever relevant, progress is calculated based on children's
progress. Otherwise progress indicates only whether the task
is finished or not.
Reduce test string value size, parallelize inserts, and use a prepared statement.
The debug running time for this test is reduced from 13:18 to 7:52.
Refs #13905
Closes #14380
* github.com:scylladb/scylladb:
test/boost/index_with_paging_test: parallel insert
test/boost/index_with_paging_test: prepared statement
test/boost/index_with_paging_test: reduce running time
`handle_state_normal` may drop connections to the handled node. This
causes spurious failures if there's an ongoing concurrent operation.
This problem was already solved twice in the past in different contexts:
first in 53636167ca, then in
79ee38181c.
Time to fix it for the third time. Now we do this right after enabling
gossiping, so hopefully it's the last time.
This time it's causing snapshot transfer failures in group 0. Although
the transfer is retried and eventually succeeds, the failed transfer is
wasted work and causes an annoying ERROR message in the log which
dtests, SCT, and I don't like.
The fix is done by moving the `wait_for_normal_state_handled_on_boot()`
call before `setup_group0()`. But for the wait to work correctly we must
first ensure that gossiper sees an alive node, so we precede it with
`wait_for_live_node_to_show_up()` (before this commit, the call site of
`wait_for_normal_state_handled_on_boot` was already after this wait).
There is another problem: the bootstrap procedure is racing with gossiper
marking nodes as UP, and waiting for other nodes to be NORMAL doesn't guarantee
that they are also UP. If gossiper is quick enough, everything will be fine.
If not, problems may arise such as streaming or repair failing due to nodes
still being marked as DOWN, or the CDC generation write failing.
In general, we need all NORMAL nodes to be up for bootstrap to proceed.
One exception is replace where we ignore the replaced node. The
`sync_nodes` set constructed for `wait_for_normal_state_handled_on_boot`
takes this into account, so we also use it to wait for nodes to be UP.
As explained in commit messages and comments, we only do these
waits outside raft-based-topology mode.
This should improve CI stability.
Fixes: #12972
Refs: #14042
Closes #14354
* github.com:scylladb/scylladb:
messaging_service: print which connections are dropped due to missing topology info
storage_service: wait for nodes to be UP on bootstrap
storage_service: wait for NORMAL state handler before `setup_group0()`
storage_service: extract `gossiper::wait_for_live_nodes_to_show_up()`
A GROUP BY combined with aggregation should produce a single
row per group, except for empty groups. This is in contrast
to an aggregation without GROUP BY, which produces a single
row no matter what.
The existing code only considered the case of no grouping
and forced a row into the result, but this caused an unwanted
row if grouping was used.
Fix by refining the check to also consider GROUP BY.
XFAIL tests are relaxed.
Fixes#12477.
Note, forward_service requires that aggregation produce
exactly one row, but since it can't work with grouping,
it isn't affected.
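The refined check, sketched (names illustrative):

    #include <vector>

    struct row {};

    // An ungrouped aggregate always yields exactly one row (e.g. a
    // count() of zero); a grouped aggregate yields one row per
    // non-empty group, so nothing is forced when GROUP BY is present.
    void maybe_force_empty_aggregate_row(std::vector<row>& results,
                                         bool is_aggregate,
                                         bool has_group_by) {
        if (is_aggregate && !has_group_by && results.empty()) {
            results.push_back(row{});
        }
    }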
Closes #14399
Since most group0 commands are just mutations, it is easy and more
efficient to combine them before passing them to the subsystem they are
destined for. The logic that handles those mutations in a subsystem will
run once for each batch of commands instead of once for each individual
command. This is especially useful when a node catches up to the leader
and gets a lot of commands together.
The patch here does exactly that. It combines commands into a single
command where possible, but preserves the order between commands: each
time it encounters a command for a different subsystem, it flushes the
batch combined so far and starts a new one. This extra safety assumes
that there may be dependencies between the subsystems managed by group0,
so the order matters. That may not be the case today, but we prefer to
stay on the safe side.
Broadcast table commands are not mutations, so they are never combined.
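A sketch of the batching loop (toy types; mutation payloads reduced to
integers, and the max mutation size limit noted in the commit list below
is ignored here):

    #include <string>
    #include <vector>

    struct command {
        std::string subsystem;
        bool is_mutation;              // broadcast-table commands are not
        std::vector<int> mutations;
    };

    // Consecutive mutation commands for the same subsystem are merged;
    // anything else flushes the current batch, preserving order across
    // subsystems.
    std::vector<command> merge_commands(const std::vector<command>& in) {
        std::vector<command> out;
        for (const auto& cmd : in) {
            if (!out.empty() && out.back().is_mutation && cmd.is_mutation
                    && out.back().subsystem == cmd.subsystem) {
                auto& batch = out.back();
                batch.mutations.insert(batch.mutations.end(),
                                       cmd.mutations.begin(),
                                       cmd.mutations.end());
            } else {
                out.push_back(cmd);    // start a new batch
            }
        }
        return out;
    }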
* 'raft-merge-cmds' of https://github.com/gleb-cloudius/scylla:
test: add test for group0 raft command merging
service: raft: respect max mutation size limit when persisting raft entries
group0_state_machine: merge commands before applying them whenever possible
This connection dropping caused us to spend a lot of time debugging.
Those debugging sessions would be shorter if Scylla logs indicated that
connections are being dropped and why.
Connection drops for a given node are a one-time event - we only do it
if we establish a connection to a node without topology info, which
should only happen before we handle the node's NORMAL status for the
first time. So it's a rare thing and we can log it on INFO level without
worrying about log spam.
The bootstrap procedure is racing with gossiper marking nodes as UP.
If gossiper is quick enough, everything will be fine.
If not, problems may arise such as streaming or repair failing due to
nodes still being marked as DOWN, or the CDC generation write failing.
In general, we need all NORMAL nodes to be up for bootstrap to proceed.
One exception is replace where we ignore the replaced node. The
`sync_nodes` set constructed for `wait_for_normal_state_handled_on_boot`
takes this into account, so we use it.
Refs: #14042
This doesn't completely fix #14042 yet because it's specific to
gossiper-based topology mode only. For Raft-based topology, the node
joining procedure will be coordinated by the topology coordinator right
from the start and it will be the coordinator who issues the 'wait for
node to see other live nodes'.
`handle_state_normal` may drop connections to the handled node. This
causes spurious failures if there's an ongoing concurrent operation.
This problem was already solved twice in the past in different contexts:
first in 53636167ca, then in
79ee38181c.
Time to fix it for the third time. Now we do this right after enabling
gossiping, so hopefully it's the last time.
This time it's causing snapshot transfer failures in group 0. Although
the transfer is retried and eventually succeeds, the failed transfer is
wasted work and causes an annoying ERROR message in the log which
dtests, SCT, and I don't like.
The fix is done by moving the `wait_for_normal_state_handled_on_boot()`
call before `setup_group0()`. But for the wait to work correctly we must
first ensure that gossiper sees an alive node, so we precede it with
`wait_for_live_node_to_show_up()` (before this commit, the call site of
`wait_for_normal_state_handled_on_boot` was already after this wait).
We do it only in non-raft-topology mode, because with Raft-based
topology, node state changes are propagated to the cluster through
explicit global barriers and we plan to remove node statuses from
gossiper altogether.
Fixes: #12972
This commit fixes the Restore System Tables section
in the 5.2-to-2023.1 upgrade guide by adding a command
to clean upgraded SStables during rollback.
This is a bug (an incomplete command) and must be
backported to branch-5.3 and branch-5.2.
Refs: https://github.com/scylladb/scylla-enterprise/issues/3046
Closes #14373
Parallelize inserts for long-running test_index_with_paging.
Run time in debug mode reduced by 1 minute 48 seconds.
Refs #13905
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>