related PR: https://github.com/scylladb/scylladb/pull/27527
This PR changes test.py logic of parsing boost test cases to use -- --list_json_content
and pass boost labels as pytests markers
using -- --list_json_content is not ideal and currenly require to implement severall [workarounds](https://github.com/scylladb/scylladb/pull/27527#issuecomment-3765499812), but having the ability to support boost labels in pytest is worth it. because now we can apply the tiering mechanism for the boost tests as well
Fixes SCYLLADB-246
Closesscylladb/scylladb#28232
* github.com:scylladb/scylladb:
test: add nightly label
test.py: support boost labels in test.py
Hints destined for some other node can only be drained after the other node is no longer a replica of any vnode or tablet. In case when tablets are present, a node might still technically be a replica of some tablets after it moved to left state. When it no longer is a replica of any tablet, it becomes "released" and storage service generates a notification about it. Hinted handoff listens to this notification and kicks off draining hints after getting it.
The current implementation of the "released" notification would trigger every time raft topology state is reloaded and a left node without any tokens is present in the raft topology. Although draining hints is idempotent, generating duplicate notifications is wasteful and recently became very noisy after in 44de563 verbosity of the draining-related log messages have been increased. The verbosity increase itself makes sense as draining is supposed to be a rare operation, but the duplicate notification bug now needs to be addressed.
Fix the duplicate notification problem by passing the list of previously released nodes to the `storage_service::raft_topology_update_ip` function and filtering based on it. If this function processes the topology state for the first time, it will not produce any notifications. This is fine as hinted handoff is prepared to detect "released" nodes during the startup sequence in main.cc and start draining the hints there, if needed.
Fixes: scylladb/scylladb#28301
Refs: scylladb/scylladb#25031
The log messages added in 44de563 cause a lot of noise during topology operations and tablet migrations, so the fix should be backported to all affected versions (2025.4 and 2026.1).
Closesscylladb/scylladb#28367
* github.com:scylladb/scylladb:
storage_service: fix indentation after previous patch
raft topology: generate notification about released nodes only once
raft topology: extract "released" nodes calculation to external function
We currently make the local node the only token owner (that owns the
whole ring) in maintenance mode, but we don't update the topology properly.
The node is present in the topology, but in the `none` state. That's how
it's inserted by `tm.get_topology().set_host_id_cfg(host_id);` in
`scylla_main`. As a result, the node started in maintenance mode crashes
in the following way in the presence of a vnodes-based keyspace with the
NetworkTopologyStrategy:
```
scylla: locator/network_topology_strategy.cc:207:
locator::natural_endpoints_tracker::natural_endpoints_tracker(
const token_metadata &, const network_topology_strategy::dc_rep_factor_map &):
Assertion `!_token_owners.empty() && !_racks.empty()' failed.
```
Both `_token_owners` and `_racks` are empty. The reason is that
`_tm.get_datacenter_token_owners()` and
`_tm.get_datacenter_racks_token_owners()` called above filter out nodes
in the `none` state.
This bug basically made maintenance mode unusable in customer clusters.
We fix it by changing the node state to `normal`.
We also extend `test_maintenance_mode` to provide a reproducer for
Fixes#27988
This PR must be backported to all branches, as maintenance mode is
currently unusable everywhere.
Closesscylladb/scylladb#28322
* github.com:scylladb/scylladb:
test: test_maintenance_mode: enable maintenance mode properly
test: test_maintenance_mode: shutdown cluster connections
test: test_maintenance_mode: run with different keyspace options
test: test_maintenance_mode: check that group0 is disabled by creating a keyspace
test: test_maintenance_mode: get rid of the conditional skip
test: test_maintenance_mode: remove the redundant value from the query result
storage_proxy: skip validate_read_replica in maintenance mode
storage_service: set up topology properly in maintenance mode
add nightly label for test
test_foreign_reader_as_mutation_source
as an example of usinf boost labels pytest as markers
command to test :
./tools/toolchain/dbuild pytest --test-py-init --collect-only -q -m=nightly test/boost
output:
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.debug.1
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.release.1
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.dev.1
The patch marks force-gossip-topology-changes as deprecated and removes
tests that use it. There is one test (test_different_group0_ids) which
is marked as xfail instead since it looks like gossiper mode was used
there as a way to easily achieve a certain state, so more investigation
is needed if the tests can be fixed to use raft mode instead.
Closesscylladb/scylladb#28383
This reverts commit 7bf7ff785a. The commit
tried to add clean shutdown to `scylla perf` paths, but forgot at least
`scylla perf-alternator --workload wr` which now crashes on uninitialized
`c.as`.
Fixes#28473Closesscylladb/scylladb#28478
Add support for literals in the SELECT clause. This allows
SELECT fn(column, 4) or SELECT fn(column, ?).
Note, "SELECT 7 FROM tab" becomes valid in the grammar, but is still
not accepted because of failed type inference - we cannot infer the
type of 7, and don't have a favored type for literals (like C favors
int). We might relax this later.
In the WHERE clause, and Cassandra in the SELECT clause, type hints
can also resolve type ambiguity: (bigint)7 or (text)?. But this is
deferred to a later patch.
A few changes to the grammar are needed on top of adding a `value`
alternative to `unaliasedSelector`:
- vectorSimilarityArg gained access to `value` via `unaliasedSelector`,
so it loses that alternate to avoid ambiguity. We may drop
`vectorSimilarityArg` later.
- COUNT(1) became ambiguous via the general function path (since
function arguments can now be literals), so we remove this case
from the COUNT special cases, remaining with count(*).
- SELECT JSON and SELECT DISTINCT became "ambiguous enough" for
ANTLR to complain, though as far as I can tell `value` does not
add real ambiguity. The solution is to commit early (via "=>") to
a parsing path.
Due to the loss of count(1) recognition in the parser, we have to
special-case it in prepare. We may relax it to count any expression
later, like modern Cassandra and SQL.
Testing is awkward because of the type inference problem in top-level.
We test via the set_intersection() function and via lua functions.
Example:
```
cqlsh> CREATE FUNCTION ks.sum(a int, b int) RETURNS NULL ON NULL INPUT RETURNS int LANGUAGE LUA AS 'return a + b';
cqlsh> SELECT ks.sum(1, 2) FROM system.local;
ks.sum(1, 2)
--------------
3
(1 rows)
cqlsh>
```
(There are no suitable system functions!)
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-296Closesscylladb/scylladb#28256
`system.cluster_status` is missing the rack info compared to `nodetool status`
that is supposed to be equivalent. It has probably been an omission.
Closesscylladb/scylladb#28457
The hander of raft_topology_cmd::command::stream_ranges switches to
streaming scheduling group to perform data streaming in it. It grabs the
group from database db_config, which's not great. There's streaming
manager at hand in storage service handlers, since it's using its
functionality, it should use _its_ scheduling group.
This will help splitting the streaming scheduling group into more
elaborated groups under the maintenance supergroup: SCYLLADB-351
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28363
Before we could observe two exactly the same
"starting auth service" messages in the log.
One from checkpoint() the other from notify().
We remove the second one to stay consistent
with other services.
Closesscylladb/scylladb#28349
Adds --json-result option to perf-cql-raw and perf-alternator, the same as perf-simple-query has.
It is useful for automating test runs.
Related: https://scylladb.atlassian.net/browse/SCYLLADB-434
Bacport: no, original benchmark is not backported
Closesscylladb/scylladb#28451
* github.com:scylladb/scylladb:
test: perf: add example commands to perf-alternator and perf-cql-raw
test: perf: add option to write results to json in perf-cql-raw
test: perf: add option to write results to json in perf-alternator
test: perf: move write_json_result to a common file
When the topology coordinator refreshes load_stats, it caches load_stats for every node. In case the node becomes unresponsive, and fresh load_stats can not be read from the node, the cached version of load_stats will be used. This is to allow the load balancer to have at least some information about the table sizes and disk capacities of the host.
During load_stats refresh, we aggregate the table sizes from all the nodes. This procedure calls db.find_column_family() for each table_id found in load_stats. This function will throw if the table is not found. This will cause load_stats refresh to fail.
It is also possible for a table to have been dropped between the time load_stats has been prepared on the host, and the time it is processed on the topology coordinator. This would also cause an exception in the refresh procedure.
This fixes this problem by checking if the table still exists.
Fixes: #28359Closesscylladb/scylladb#28440
* github.com:scylladb/scylladb:
test: add test and reproducer for load_stats refresh exception
load_stats: handle dropped tables when refreshing load_stats
This patch adds a test and reproducer for the issue where the load_stats
refresh procedure throws exceptions if any of the tables have been
dropped since load_stats was produced.
We extend the test to provide a reproducer for #27988 and to avoid
similar bugs in the future.
The test slows down from ~14s to ~19s on my local machine in dev
mode. It seems reasonable.
In the following commit, we make the rest run with multiple keyspaces,
and the old check becomes inconvenient. We also move it below to the
part of the code that won't be executed for each keyspace.
Additionally, we check if the error message is as expected.
This skip has already caused trouble.
After 0668c642a2, the skip was always hit, and
the test was silently doing nothing. This made us miss #26816 for a long
time. The test was fixed in 222eab45f8, but we
should get rid of the skip anyway.
We increase the number of writes from 256 to 1000 to make the chance of not
finding the key on server A even lower. If that still happens, it must be
due to a bug, so we fail the test. We also make the test insert rows until
server A is a replica of one row. The expected number of inserted rows is
a small constant, so it should, in theory, make the test faster and cleaner
(we need one row on server A, so we insert exactly one such row).
It's possible to make the test fully deterministic, by e.g., hardcoding
the key and tokens of all nodes via `initial_token`, but I'm afraid it would
make the test "too deterministic" and could hide a bug.
In maintenance mode, the local node adds only itself to the topology. However,
the effective replication map of a keyspace with tablets enabled contains all
tablet replicas. It gets them from the tablets map, not the topology. Hence,
`network_topology_strategy::sanity_check_read_replicas` hits
```
throw std::runtime_error(format("Requested location for node {} not in topology. backtrace {}", id, lazy_backtrace()));
```
for tablet replicas other than the local node.
As a result, all requests to a keyspace with tablets enabled and RF > 1 fail
in debug mode (`validate_read_replica` does nothing in other modes). We don't
want to skip maintenance mode tests in debug mode, so we skip the check in
maintenance mode.
We move the `is_debug_build()` check because:
- `validate_read_replicas` is a static function with no access to the config,
- we want the `!_db.local().get_config().maintenance_mode()` check to be
dropped by the compiler in non-debug builds.
We also suppress `-Wunneeded-internal-declaration` with `[[maybe_unused]]`.
We currently make the local node the only token owner (that owns the
whole ring) in maintenance mode, but we don't update the topology properly.
The node is present in the topology, but in the `none` state. That's how
it's inserted by `tm.get_topology().set_host_id_cfg(host_id);` in
`scylla_main`. As a result, the node started in maintenance mode crashes
in the following way in the presence of a vnodes-based keyspace with the
NetworkTopologyStrategy:
```
scylla: locator/network_topology_strategy.cc:207:
locator::natural_endpoints_tracker::natural_endpoints_tracker(
const token_metadata &, const network_topology_strategy::dc_rep_factor_map &):
Assertion `!_token_owners.empty() && !_racks.empty()' failed.
```
Both `_token_owners` and `_racks` are empty. The reason is that
`_tm.get_datacenter_token_owners()` and
`_tm.get_datacenter_racks_token_owners()` called above filter out nodes
in the `none` state.
This bug basically made maintenance mode unusable in customer clusters.
We fix it by changing the node state to `normal`. We also update its
rack, datacenter, and shards count. Rack and datacenter are present in the
topology somehow, but there is nothing wrong with updating them again.
The shard count is also missing, so we better update it to avoid other
issues.
Fixes#27988
Schema is already a member of select statement, avoiding
the call saves around 400 cpu instructions on a select
request hot path.
Closesscylladb/scylladb#28328
When the topology coordinator refreshes load_stats, it caches load_stats
for every node. In case the node becomes unresponsive, and fresh
load_stats can not be read from the node, the cached version of
load_stats will be used. This is to allow the load balancer to
have at least some information about the table sizes and disk capacities
of the host.
During load_stats refresh, we aggregate the table sizes from all the
nodes. This procedure calls db.find_column_family() for each table_id
found in load_stats. This function will throw if the table is not found.
This will cause load_stats refresh to fail.
It is also possible for a table to have been dropped between the time
load_stats has been prepared on the host, and the time it is processed
on the topology coordinator. This would also cause an exception in the
refresh procedure.
This patch fixes this problem by checking if the table still exists.
Vector Search feature needs to support creating vector indexes with additional
filtering column. There will be two types of indexes: global which indexes
vectors per table, and local which indexes vectors per partition key. The new
syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary
Index. Vector indexes don't use secondary indexes functionalities in any way -
all indexing, filtering and processing data will be done on Vector Store side.
This patch allows creating vector indexes using this CQL syntax:
```
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
discussion_board_id int,
country text,
lang text,
PRIMARY KEY ((commenter, discussion_board_id), created_at)
);
CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```
Currently, if we run these queries to create indexes we will receive such errors:
```
InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column"
InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ"
```
This commit refactors `vector_index::check_target` to correctly validate
columns building the index. Vector-store currently support filtering by native
types, so the type of columns is checked. The first column from the list must
be a vector (to build index based on these vectors), so it is also checked.
Allowed types for columns are native types without counter (it is not possible
to create a table with counter and vector) and without duration (it is not
possible to correctly compare durations, this type is even not allowed in
secondary indexes).
This commits adds cqlpy test to check errors while creating indexes.
Fixes: SCYLLADB-298
This needs to be backported to version 2026.1 as this is a fix for filtering support.
Closesscylladb/scylladb#28366
These were marked xfail due to #8077 (the column name was wrong),
but it was fixed long ago for 5.4 (exact commit not known).
Remove the xfail markers to prevent regressions.
Closesscylladb/scylladb#28432
This series optimizes role lookup by moving find_record into standard_role_manager and switching it to use the auth cache. This allows reverting can_login to its original simpler form, ensuring hot paths are properly cached while maintaining consistency via group0_guard.
Backport: no, it's not a bug fix.
Closesscylladb/scylladb#28329
* github.com:scylladb/scylladb:
auth: bring back previous version of standard_role_manager::can_login
auth: switch find_record to use cache
auth: make find_record and callers standard_role_manager members
The method was coroutinized by 6df07f7ff7. Back then thecoroutine::switch_to() wasn't available, and the code used with_scheduling_group() to call coroutinized lambdas. Those lambdas were implemented as on-stack variables to solve the capture list lifetime problems. As a result, the code looks like
```
auto flush = [] {
... // do the flushing
auto post_flush = [] {
... // do the post-flushing
}
co_return co_await with_scheduling_group(group_b, post_flush);
};
co_return co_await with_scheduling_group(group_a, flush);
```
which is a bit clumsy. Now we have switch_to() and can make the code flow of this method more readable, like this
```
co_await switch_to(group_a);
... // do the flushing
co_await switch_to(group_b);
... // do the post-flushing
```
Code cleanup, not backporting
Closesscylladb/scylladb#28430
* github.com:scylladb/scylladb:
table: Fix indentation after previous patch
table: Use coroutine::switch_to() in try_flush_memtable_to_sstable()
test_alternator_proxy_protocol starts a node and connects via the alternator ports.
Starting a node, by default, waits until the CQL ports are up. This does not guarantee
that the alternator ports are up (they will be up very soon after this), so there is a short
window where a connection to the alternator ports will fail.
Fix by adding a ServerUpState=SERVING mode, which waits for the node to report
to its supervisor (systemd, which we are pretending to be) that its ports are open.
The test is then adjusted to request this new ServerUpState.
Fixes#28210Fixes#28211
Flaky tests are only in master and branch-2026.1, so backporting there.
Closesscylladb/scylladb#28291
* github.com:scylladb/scylladb:
test: test_alternator_proxy_protocol: wait for the node to report itself as serving
test: cluster_manager: add ability to wait for supervisor STATUS=serving
It allows dropping the local lambdas passed into with_scheduling_group()
calls. Overall the code flow becomes more readable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This commit moves the "Ungrouped properties" category to the end of the
properties list. The properties are now published in the documentation,
and it doesn't look good if the list starts with ungrouped properties.
This patch was taken over from Anna Stuchlik <anna.stuchlik@scylladb.com>.
Closesscylladb/scylladb#28343
Move to `replica/`, drop `flat` from name and drop unused usages as well as unused includes.
Code cleanup, no backport
Closesscylladb/scylladb#28353
* github.com:scylladb/scylladb:
replica/partition_snapshot_reader: remove unused includes
partition_snapshot_reader: remove "flat" from name
mv partition_snapshot_reader.hh -> replica/
This test case was observed to take over 2 minutes to run on CI
machines, contributing to already bloated CI run times.
Disable this test in debug mode. This test checks for memtable flush
being slowed down when compaction can't keep up. So this test needs to
overwhelm the CPU by definition. On the other hand, this is not a
correctness test, there are such tests for the memtable and compaction
already, so it is not critical to run this in debug mode, it is not
expected to catch any use-after-free and such.
Closesscylladb/scylladb#28407
This commit replaces the previous approach of running pytest inside
GDB’s Python interpreter. Instead, tests are executed by driving a
persistent GDB process externally using pexpect.
- pexpect: Python library for controlling interactive programs
(used here to send commands to GDB and capture its output)
- persistent GDB: keep one GDB session alive across multiple tests
instead of starting a new process for each test
Tests can now be executed via `./test.py gdb` or with
`pytest test/scylla_gdb`. This improves performance and
makes failures easier to debug since pytest no longer runs
hidden inside GDB subprocesses.
Closesscylladb/scylladb#24804
When reads arrive, they have to wait for admission on the reader
concurrency semaphore. If the node is overloaded, the reads will
be queued. They can time out while in the queue, but will not time
out once admitted.
Once the shard is sufficiently loaded, it is possible that most
queued reads will time out, because the average time it takes to
for a queued read to be admitted is around that of the timeout.
If a read times out, any work we already did, or are about to do
on it is wasted effort. Therefore, the patch tries to prevent it
by checking if an admitted read has a chance to complete in time
and abort it if not. It uses the following criteria:
if read's remaining time <= read's timeout when arrived to the semaphore * live updateable preemptive_abort_factor;
the read is rejected and the next one from the wait list is considered.
Fixes https://github.com/scylladb/scylladb/issues/14909
Fixes: SCYLLADB-353
Backport is not needed. Better to first observe its impact.
Closesscylladb/scylladb#21649
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: Check during admission if read may timeout
permit_reader::impl: Replace break with return after evicting inactive permit on timeout
reader_concurrency_semaphore: Add preemptive_abort_factor to constructors
config: Add parameters to control reads' preemptive_abort_factor
permit_reader: Add a new state: preemptive_aborted
reader_concurrency_semaphore: validate waiters counter when dequeueing a waiting permit
reader_concurrency_semaphore: Remove cpu_concurrency's default value
Contains various improvements to tablet load balancer. Batched together to save on the bill for CI.
Most notably:
- Make plan summary more concise, and print info only about present elements.
- Print rack name in addition to DC name when making a per-rack plan
- Print "Not possible to achieve balance" only when this is the final plan with no active migrations
- Print per-node stats when "Not possible to achieve balance" is printed
- amortize metrics lookup cost
- avoid spamming logs with per-node "Node {} does not have complete tablet stats, ignoring"
Backport to 2026.1: since the changes enhance debuggability and are relatively low risk
Fixes#28423Fixes#28422Closesscylladb/scylladb#28337
* github.com:scylladb/scylladb:
tablets: tablet_allocator.cc: Convert tabs to spaces
tablets: load_balancer: Warn about incomplete stats once for all offending nodes
tablets: load_balancer: Improve node stats printout
tablets: load_balancer: Warn about imbalance only when there are no more active migrations
tablets: load_balancer: Extract print_node_stats()
tablet: load_balancer: Use empty() instead of size() where applicable
tablets: Fix redundancy in migration_plan::empty()
tablets: Cache pointer to stats during plan-making
tablets: load_balancer: Print rack in addition to DC when giving context
tablets: load_balancer: Make plan summary concise
tablets: load_balancer: Move "tablet_migration_bypass" injection point to make_plan()
In fact, it's partially there already. When view_builder::start() is called is first calls initialization code (the start_in_background() method), then kicks do_build_step() that runs a background fiber to perform build steps. The starting code inherits scheduling group from main(). And the step fiber code needs to run itself in a maintenance scheduling group, so it explicitly grabs one via database->db_config.
This PR mainly gets rid of the call to database::get_streaming_scheduling_group() from do_build_step() as preparation to splitting the streaming scheduling group into parts (see SCYLLADB-351). To make it happen the do_build_step() is patched to inherit its scheduling group from view_builder::start() and the start() itself is called by main from maintenance scheduling group (like for other view building services).
New feature (nested scheduling group), not backporting
Closesscylladb/scylladb#28386
* github.com:scylladb/scylladb:
view_builder: Start background in maintenance group
view_builder: Wake-up step fiber with condition variable
In a lambda returned from make_streaming_consumer() there's a check for
current scheudling group being streaming one. It came from #17090 where
streaming code was launched in wrong sched group thus affecting user
groups in a bad way.
The check is nice and useful, but it abuses replica::database by getting
unrelated information from it.
To preserve the check and to stop using database as provider of configs,
keep the streaming scheduling group handle in the debug namespace. This
emphasises that this global variable is purely for debugging purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28410
Explain what automatic repair is and how to configure it. While at it, improve the existing repair documentation a bit.
Fixes: SCYLLADB-130
This PR missed the 2026.1 branch date, so it needs backport to 2026.1, where the auto repair feature debuts.
Closesscylladb/scylladb#28199
* github.com:scylladb/scylladb:
docs: add feature page for automatic repair
docs: inter-link incremental-repair and repair documents
docs: incremental-repair: fix curl example
One of the best features of the pytest framework is "assertion
rewriting": If your test does for example "assert a + 1 == b", the
assertion is "rewritten" so that if it fails it tells you not only
that "a+1" and "b" are not equal, what the non-equal values are,
how they are not equal (e.g., find different elements of arrays) and
how each side of the equality was calculated.
But pytest can only "rewrite" assertion that it sees. If you call a
utility function checksomething() from another module and that utility
function calls assert, it will not be able to rewrite it, and you'll
get ugly, hard-to-debug, assertion failures.
This problem is especially noticable in tests we translated from
Cassandra, in test/cqlpy/cassandra_tests. Those tests use a bunch of
assertion-performing utility functions like assertRows() et al.
Those utility functions are defined in a separate source file,
porting.py, so by default do not get their assertions rewritten.
We had a solution for this: test/cqlpy/cassandra_test/__init__.py had:
pytest.register_assert_rewrite("cassandra_tests.porting")
This tells pytest to rewrite assertions in porting.py the first time
that it is imported.
It used to work well, but recently it stopped working. This is because
we change the module paths recently, and it should be written as
test.cqlpy.cassandra_tests.porting.
I verified by editing one of the cassandra_tests to make a bad check
that indeed this statement stopped working, and fixing the module
path in this way solves it, and makes assertion rewriting work
again.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#28411
Currently view_builder::start() is called in default scheduling group.
Once it initializes itself, it wakes up the step fiber that explicitly
switches to maintenance scheduling group.
This explicit switch made sence before previous patch, when the fiber
was implemented as a serialized action. Now the fiber starts directly
from .start() method and can inherit scheduling group from it.
Said that, main code calls view_builder::start() in maintenance
scheduling group killing two birds with one stone. First, the step fiber
no longer needs borrow its scheduling group indirectly via database.
Second, the start_in_background() code itself runs in a more suitable
scheduling group.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
View builder runs a background fiber that perform build steps. To kick
the fiber it uses serizlized action, but it's an overkill -- nobody
waits for the action to finish, but on stop, when it's joined.
This patch uses condition variable to kick the fiber, and starts it
instantly, in the place where serialized action was first kicked.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Explain what the feature is and how to confiture it.
Inter-link all the repair related pages, so one can discover all about
repair, regardless of which page they land on.