Expose the buffer hint functionality added by the previous commits, to
callers of make_multishard_streaming_reader(). All callers disable it
currently; it will be used in the next patch.
Calculate a buffer fill hint and pass it to
shard_reader_v2::fill_buffer(), so the underlying buffer-fill can be
optimized to avoid multiple cross shard round-trips, as well as possible
evict-recreate cycles.
The buffer hint mechanism is opt-in, enabled via the new
multishard_reader_buffer_hint parameter.
When the hint is provided, respect it: make sure the returned buffer is
of the requested size, stopping early if the stop_token is seen.
To reduce the amount of possible eviction-recreate cycles while the
buffer is filled, disable auto-pause for the duration of the
fill_reader_buffer() call. For this purpose, auto_pause_disable_guard is
added to evictable_reader_v2.
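For illustration only, the RAII idea behind such a guard can be sketched as
below. This is not the actual evictable_reader_v2 code: reader_stub, its
auto_pause member and fill_buffer_with_guard() are hypothetical stand-ins.

```cpp
#include <utility>

// Hypothetical stand-in for a reader that can pause itself when idle.
struct reader_stub {
    bool auto_pause = true;
};

// RAII guard: disables auto-pause on construction and restores the
// previous setting on destruction, so early returns and exceptions
// cannot leave the reader with auto-pause stuck off.
class auto_pause_disable_guard {
    reader_stub& _reader;
    bool _previous;
public:
    explicit auto_pause_disable_guard(reader_stub& r)
        : _reader(r), _previous(std::exchange(r.auto_pause, false)) {}
    ~auto_pause_disable_guard() { _reader.auto_pause = _previous; }
    auto_pause_disable_guard(const auto_pause_disable_guard&) = delete;
    auto_pause_disable_guard& operator=(const auto_pause_disable_guard&) = delete;
};

// Returns whether auto-pause was observed as disabled during the fill.
bool fill_buffer_with_guard(reader_stub& r) {
    auto_pause_disable_guard guard(r);
    return !r.auto_pause; // buffer filling would happen here
}
```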
The hint will tell the shard reader exactly how much data to produce, to
avoid multiple cross-shard round-trips and possible evict-recreate
cycles.
The hint is neither used nor calculated yet; this is coming in the
next patches.
Recently, seastar rpc started accepting std::type_identity in addition
to boost::type as a type marker (while labeling the latter with an
ominous deprecation warning). Reduce our dependency on boost
by switching to std::type_identity.
Terminology note: in the context of this series, "index page" means a contiguous segment of the index file starting (inclusive) at a key corresponding to a summary entry and ending (exclusive) before the key corresponding to the next summary entry. "Index pages" are not related to filesystem pages.
---
In a single-partition read, if the searched partition key is the first key in its index page, we start scanning the index for that key starting at the previous index page (inclusive), even though we could start directly from the key's page. Similarly, if the searched partition key is absent from the sstable and lies after all other keys in its appropriate page, we additionally scan the next page, even though it's known from the summary that it can't possibly contain the key.
Those cases are wasteful. It's worse than it might seem at first glance. When partitions are small, only a small fraction of search keys fulfills those conditions (i.e. "first key in its page" or "an absent key greater than the last key in its page"), so the waste doesn't matter much. But when partitions are big enough, every index page contains only one partition key (and a promoted index for that partition), which directly means that *all* search keys fulfill the conditions, which means that total index reading work is two times bigger than what it should be.
In addition, there is a secondary performance bug which, when the aforementioned conditions are fulfilled, causes *additional* I/O to happen *past* the index reads which are actually parsed and used. In effect, the index I/O in single-partition reads might be not just doubled, but even tripled (that's for IOPS — throughput might be multiplied even more), all because of a slight inaccuracy in the edge cases.
This series fixes those inefficiencies by tightening the edge cases and ensuring that single-partition reads always read only a single index page.
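The intended page selection can be sketched as follows. This is a simplified model, not the index_reader code: keys are plain integers, the summary is a sorted vector holding the first key of each page, and page_for_key() is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// summary[i] is the first key of index page i; pages are sorted by key.
// Integer keys are a simplification for illustration.
std::optional<std::size_t> page_for_key(const std::vector<int>& summary, int key) {
    // First page whose first key is strictly greater than `key`.
    auto it = std::upper_bound(summary.begin(), summary.end(), key);
    if (it == summary.begin()) {
        return std::nullopt; // key sorts before every page
    }
    // The page just before it is the only one that can contain `key`.
    return static_cast<std::size_t>(it - summary.begin()) - 1;
}
```

In this model, a key equal to the first key of a page maps to that page and not to the previous one, and an absent key that sorts after all keys of its page still maps to that single page.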
Here's an example where we query the first row (i.e. `LIMIT 1`) of a certain partition key, in a table with large (1 MB) promoted indexes. Before the patch, the lookup of the lower bound involves 3 serialized disk reads (as described above) to subsequent index pages, and even the lookup of the upper bound involves 2 disk reads:
```
Execute CQL3 query
Parsing a statement [shard 0]
Processing a statement for authenticated user: anonymous [shard 0]
Executing read query (reversed false) [shard 0]
Creating read executor for token -1297921881139976049 with all: [127.11.11.1] targets: [127.11.11.1] repair decision: NONE [shard 0]
Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0]
read_data: querying locally [shard 0]
Start querying singular range {{-1297921881139976049, pk{00023130}}} [shard 0]
[reader concurrency semaphore user] admitted immediately [shard 0]
[reader concurrency semaphore user] executing read [shard 0]
Reading key {-1297921881139976049, pk{00023130}} from sstable ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 38359040 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 38391808 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 38359040, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 38391808, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39370752 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39403520 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39370752, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39403520, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40378368 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40411136 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40378368, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40411136, successfully read 32768 bytes [shard 0]
upper_bound_cache_only({position: clustered, ckp{}, 1}): no upper bound [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40378368 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40411136 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40378368, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40411136, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 41390080 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 41422848 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 41390080, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 41422848, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: scheduling bulk DMA read of size 21926 at offset 819200 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: finished bulk DMA read of size 21926 at offset 819200, successfully read 24576 bytes [shard 0]
Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 0 cell(s) (0 live, 0 dead) [shard 0]
Querying is done [shard 0]
Done processing - preparing a result [shard 0]
Request complete
```
After the patch, the lookup of each bound involves 1 read:
```
Execute CQL3 query
Parsing a statement [shard 0]
Processing a statement for authenticated user: anonymous [shard 0]
Executing read query (reversed false) [shard 0]
Creating read executor for token -1297921881139976049 with all: [127.11.11.1] targets: [127.11.11.1] repair decision: NONE [shard 0]
Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0]
read_data: querying locally [shard 0]
Start querying singular range {{-1297921881139976049, pk{00023130}}} [shard 0]
[reader concurrency semaphore user] admitted immediately [shard 0]
[reader concurrency semaphore user] executing read [shard 0]
Reading key {-1297921881139976049, pk{00023130}} from sstable ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39370752 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39403520 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39370752, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39403520, successfully read 32768 bytes [shard 0]
upper_bound_cache_only({position: clustered, ckp{}, 1}): no upper bound [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40378368 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40411136 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40378368, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40411136, successfully read 32768 bytes [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: scheduling bulk DMA read of size 21926 at offset 819200 [shard 0]
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: finished bulk DMA read of size 21926 at offset 819200, successfully read 24576 bytes [shard 0]
Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 0 cell(s) (0 live, 0 dead) [shard 0]
Querying is done [shard 0]
Done processing - preparing a result [shard 0]
Request complete
```
Doesn't have to be backported, since the problem only affects performance, not correctness, and it has been present since forever.
Closes scylladb/scylladb#20897
* github.com:scylladb/scylladb:
index_reader: remove a piece of misguided code involved in single-partition reads
index_reader: in single-partition reads, don't read more than one page
index_reader: fix unnecessary reads of preceding index pages
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.
stop_ongoing_compactions, in particular, currently returns the status
of stopped compaction tasks from `stop_tasks`, but still all tasks
must be stopped after it, even if they failed, so assert that
and ignore the errors.
Fixes scylladb/scylladb#21159
* Needs backport to 6.2 and 6.1, as commit 8cc99973eb causes handles storage that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in noexcept context in the deferred stop destructor body
Closes scylladb/scylladb#21299
* github.com:scylladb/scylladb:
compaction_manager: stop: await _stop_future if engaged
compaction_manager: really_do_stop: assert that no tasks are left behind
compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors
compaction/compaction_manager: stop_tasks(): unlink stopped tasks
compaction/compaction_manager: make _tasks an intrusive list
There's a whole lot of helpers and wrappers in api/ that help handlers manipulate keyspaces and tables. One of those is foreach_column_family which calls the provided callable on a table on each shard. There's exactly the same (but a bit more flexible) helper nearby. While at it, this helper gets a better name.
Closes scylladb/scylladb#21398
* github.com:scylladb/scylladb:
api: Rename set_tables -> for_tables_on_all_shards
api: Remove foreach_column_family() helper
The current condition that consults the compaction manager
state for awaiting `_stop_future` works since _stop_future
is assigned after the state is set to `stopped`, but it is
incidental. What matters is that `_stop_future` is engaged.
While at it, exchange _stop_future with a ready future
so that stop() can be safely called multiple times.
Also drop the superfluous co_return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
stop_ongoing_compactions now ignores any errors returned
by tasks, and it should leave no task left behind.
Assert that here, before the compaction_manager is destroyed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.
Leaked errors on the stop path may cause termination
on shutdown, when called in a deferred action destructor.
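A minimal sketch of the "ignore errors on the stop path" idea, using plain
callables instead of compaction tasks (stop_fn and stop_all() are
hypothetical names, and the real code is asynchronous):

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical task list: each entry is a callable that stops one task
// and may throw if the task had failed.
using stop_fn = std::function<void()>;

// Stops every task; failures are counted but never propagated, since a
// stop path has nothing useful to do with them besides continuing.
std::size_t stop_all(std::vector<stop_fn>& tasks) noexcept {
    std::size_t failed = 0;
    for (auto& stop : tasks) {
        try {
            stop();
        } catch (...) {
            ++failed; // a real implementation would log this
        }
    }
    tasks.clear(); // nothing may be left behind after stop
    return failed;
}
```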
Fixes scylladb/scylladb#21298
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Stopped tasks currently linger in _tasks until the fiber that created
the task is scheduled again and unlinks the task. This window between
stop and remove prevents reliable checks for empty _tasks list after all
tasks are stopped.
Unlink the task early so really_do_stop() can safely check for an empty
_tasks list (next patch).
_tasks is currently std::list<shared_ptr<compaction_task_executor>>, but
it has no role in keeping the instances alive, this is done by the
fibers which create the task (and pin a shared ptr instance).
This lends itself to an intrusive list, avoiding that extra
allocation upon push_back().
Using an intrusive list also makes it simpler and much cheaper (O(1) vs.
O(N)) to remove tasks from the _tasks list. This will be made use of in
the next patch.
Code using _tasks has to be updated because the value_type changes from
shared_ptr<compaction_task_executor> to compaction_task_executor&.
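To illustrate why an intrusive list avoids the per-node allocation and makes
removal O(1), here is a hand-rolled sketch. It is not the compaction_manager
code; list_hook, task and task_list are hypothetical names written only for
this example.

```cpp
#include <cstddef>

// Minimal hand-rolled intrusive doubly linked list (circular, with a
// sentinel). The links live inside the element, so insertion allocates
// nothing and an element can unlink itself in O(1).
struct list_hook {
    list_hook* prev = this;
    list_hook* next = this;

    bool is_linked() const { return next != this; }

    void unlink() {
        prev->next = next;
        next->prev = prev;
        prev = next = this;
    }
};

struct task : list_hook {
    int id;
    explicit task(int i) : id(i) {}
};

struct task_list {
    list_hook head; // sentinel

    void push_back(task& t) {
        t.prev = head.prev;
        t.next = &head;
        head.prev->next = &t;
        head.prev = &t;
    }
    bool empty() const { return !head.is_linked(); }
    std::size_t size() const {
        std::size_t n = 0;
        for (auto* p = head.next; p != &head; p = p->next) ++n;
        return n;
    }
};
```

Note that the list does not own its elements: as in the description above,
something else has to keep each task alive while it is linked.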
* seastar f821bda19...fba36a3d1 (13):
> build: do not include -DBoost_TEST_DYN_LINK in seastar_testing_cflags
> doc: compatibility: update the notes on supported GCC versions
> docker: bump up to clang {18,19} and gcc {13,14}
> rpc: optimize small tuple deserialization
> rpc: switch rpc::type from boost to std
> thread: do not use fortify source
> build: suppress CMake warning about CMP0057
> core/units: remove space before literal identifier
> signal.md: describe auto signal handling
> build: persist Seastar options in SeastarConfig.cmake
> sharded.hh: seperate invoke_on decls from defs
> test: Add perf test for http client
> gate: check: mark as const
Closes scylladb/scylladb#21390
The hints and batchlog flush requests are issued to all nodes for each repair request when tombstone_gc repair mode is used.
The number of such flush requests is high when all nodes in the cluster run repair. It has been observed that a repair request can take a long time, up to 15s, to finish such a flush request.
To reduce overhead of the flush, each node caches the flush and only executes the real flush when some time has passed. It is safe to do so before the real flush_time is returned. Repair uses the smallest flush_time from peers as the repair time.
The nice thing about the cache on the receiver side is that all senders can hit the cache. It is better than cache on the sender side.
A flush_time slightly smaller than the real flush time will be used, in exchange for a significant drop in the number of hints and batchlog flushes. The tradeoff is reasonable.
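The receiver-side caching described above can be sketched as follows. This is a simplified, synchronous model and not the repair code: flush_cache, max_age and the do_flush callback are hypothetical, and the time is passed in explicitly to keep the example deterministic.

```cpp
#include <chrono>
#include <functional>
#include <optional>

using clock_type = std::chrono::steady_clock;

// Hypothetical receiver-side cache: a real flush runs only if the last
// one is older than `max_age`; otherwise the cached flush time is
// returned. The cached time is never later than a fresh one would be.
class flush_cache {
    std::chrono::milliseconds _max_age;
    std::optional<clock_type::time_point> _last_flush;
public:
    explicit flush_cache(std::chrono::milliseconds max_age) : _max_age(max_age) {}

    clock_type::time_point flush(clock_type::time_point now,
                                 const std::function<void()>& do_flush) {
        if (!_last_flush || now - *_last_flush >= _max_age) {
            do_flush();
            _last_flush = now;
        }
        return *_last_flush;
    }
};
```

Because the cache lives on the receiver, requests from every sender share it.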
Fixes #20259
Performance improvement. No backports.
Closes scylladb/scylladb#20260
* github.com:scylladb/scylladb:
test/test_repair.py: Add test_batchlog_flush_in_repair
repair: Reduce hints and batchlog flush
db/batchlog_manager: Add add_delay_to_batch_replay
db/batchlog_manager: Add get_last_replay
db/batchlog_manager: wire in batchlog_replay_cleanup_after_replays
db/config: introduce batchlog_replay_cleanup_after_replays
db/batchlog_manager: do_batch_log_replay(): add cleanup flag
Optimize the various constructors a little, and add an std::from_range_t
constructor.
Minor improvement, so no backports.
Closes scylladb/scylladb#21399
* github.com:scylladb/scylladb:
utils: chunked_vector: add from_range_t constructor
utils: chunked_vector: optimize initializer_list constructor
utils: chunked_vector: iterator constructor: copy spanwise
utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction
Currently, to find the operation with given id, all operations tracked by a virtual task are listed. This isn't necessary, since we only need info regarding one particular operation.
Add a method to check whether a virtual task tracks the operation with the given id.
No backport needed
Closes scylladb/scylladb#20769
* github.com:scylladb/scylladb:
tasks: delete virtual_task::get_ids method as it is unused
tasks: improve task_manager::lookup_virtual_task
There's a whole lot of helpers and wrappers in api/ that help handlers
manipulate keyspaces and tables. One of those is foreach_column_family
which calls the provided callable on a table on each shard. There's
exactly the same (but a bit more flexible) set_table() helper nearby.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The task manager GET /status method returns two counters that reflect task progress -- total and completed. To help the caller reason about their meaning, there is additionally a progress_units field next to those counters.
This patch implements this progress report for backup task. The units are bytes, the total counter is total size of files that are being uploaded, and the completed counter is total amount of bytes successfully sent with PUT requests. To get the counters, the client::upload_file() is extended to calculate those.
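As a sketch of the accounting only (the field names and the part-by-part loop are assumptions for illustration, not the s3 client's actual layout), the two counters could be maintained like this:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical progress record; units are bytes.
struct upload_progress {
    std::size_t total = 0;    // size of all files being uploaded
    std::size_t uploaded = 0; // bytes successfully sent so far
};

// Simulates uploading one file in parts of at most `part_size` bytes.
void upload_file(std::size_t file_size, std::size_t part_size, upload_progress& p) {
    p.total += file_size;
    if (part_size == 0) {
        return; // nothing can be sent; avoid looping forever
    }
    for (std::size_t off = 0; off < file_size; off += part_size) {
        std::size_t n = std::min(part_size, file_size - off);
        // A real client would issue a PUT for this part here and only
        // account for it after the request succeeds.
        p.uploaded += n;
    }
}
```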
Fixes #20653
Closes scylladb/scylladb#21144
* github.com:scylladb/scylladb:
backup_task: Report uploading progress
s3/client: Account upload progress for real
s3/client: Introduce upload_progress
s3: Extract client_fwd.hh
now that we are allowed to use C++23, we have the luxury of using
`std::ranges::transform`.
in this change, we:
- replace `boost::transform` with `std::ranges::transform`
- update affected code to work with `std::ranges::transform`
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21318
This pattern is: if requested (by a test), suspend code execution until the requestor (the test) explicitly wakes it up. For that, the injection site passes a lambda that is called with a so-called "handler" and tries to read a message from the handler. In many cases the inner lambda additionally prints a message into the logs that the test waits for, to make sure the injection was stepped on. At the end of the day, this "breakpoint" is injected like
```
co_await inject("foo", [] (auto& handler) -> future<> {
    log.info("foo waiting");
    co_await handler.wait_for_message(timeout);
});
```
This PR makes breakpoints shorter and more unified, like this
```
co_await inject("foo", wait_for_message(timeout));
```
where `wait_for_message` is a wrapper structure used to pick new `inject()` overload.
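The overload-selection trick can be sketched without coroutines as below. This is a synchronous toy model, not the error_injection code: handler_stub and both inject() signatures are assumptions made for the example.

```cpp
#include <chrono>
#include <functional>
#include <string>

// Hypothetical stand-in; the real handler and inject() are asynchronous.
struct handler_stub {
    std::chrono::milliseconds waited{0};
    void wait_for_message(std::chrono::milliseconds timeout) { waited = timeout; }
};

// Wrapper type whose only job is to select the shorthand overload below.
struct wait_for_message {
    std::chrono::milliseconds timeout;
};

// General form: the caller supplies the full handler lambda.
void inject(const std::string& /*name*/, handler_stub& h,
            const std::function<void(handler_stub&)>& fn) {
    fn(h);
}

// Shorthand: builds the common "wait for a message" lambda.
void inject(const std::string& name, handler_stub& h, wait_for_message w) {
    inject(name, h, [w](handler_stub& hh) { hh.wait_for_message(w.timeout); });
}
```

Since the wrapper is not callable, it cannot be mistaken for a handler lambda, so the two overloads never compete.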
Closes scylladb/scylladb#21342
* github.com:scylladb/scylladb:
sstables: Use inject(wait_for_message_overload)
treewide,error_injection: Use inject(wait_for_message) and fix tests
treewide,error_injection: Use inject(wait_for_message) overload
error_injection: Add inject() overload with wait_for_message wrapper
std::ranges::to<> has a little protocol with containers. Implement it
to get optimized construction.
Similar to the iterator pair constructor, if the range's size can be
obtained (even with an O(N) algorithm), favor that to avoid reallocations.
Copy elements spanwise to promote optimization to memcpy when possible.
Instead of copying element-by-element, copy contiguous spans. This
is much faster if the input is a span and the constructor is trivial,
since the whole thing translates to a memcpy.
Make the two branches constexpr to reduce work for the compiler in
optimizing the other branch away.
For a forward iterator, prefer a two pass algorithm to first count
the number of elements, reserve, then copy the elements, to a single
pass algorithm that involves reallocation and copying.
`Exception` could be too general, what we really care about is
`GithubException`, so let's catch the latter instead for better
readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21364
The S3 mock server (introduced in 5a96549c) currently prints its status
messages directly to stdout, which can be distracting when reviewing test
results. For example:
```console
$ ./test.py --verbose --mode debug object_store/test_backup::test_simple_backup
Found 1 tests.
Starting S3 mock server on ('127.226.51.1', 2012)
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/1] object_store debug [ PASS ] object_store.test_backup.1 5.99s
Stopping S3 mock server
-------------------------
CPU utilization: 6.5%
```
Move these messages to use proper logging to give developers more control
over their visibility:
- Make logger parameter mandatory in MockS3Server constructor
- Route "Stopping S3 mock server" message through the provided logger
- Add --log-level option to the standalone mock server launcher
The message is now hidden:
```console
$ ./test.py --verbose --mode debug --save-log-on-success object_store/test_backup::test_simple_backup
Found 1 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/1] object_store debug [ PASS ] object_store.test_backup.1 6.25s
------------------------------------------------------------------------------
CPU utilization: 5.5%
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21384
When a compaction_group is removed via `compaction_manager::remove`,
it is erased from `_compaction_state`, and therefore compaction
is definitely not enabled on it.
This triggers an internal error if tablets are cleaned up
during drop/truncate, which checks that compaction is disabled
in all compaction groups.
Note that the callers of `compaction_disabled` aren't really
interested in compaction being actively disabled on the
compaction_group, but rather if it's enabled or not.
A follow-up patch can be considered to reverse the logic
and expose `compaction_enabled` rather than `compaction_disabled`.
Fixes scylladb/scylladb#20060
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#21378
View building is an expensive process that takes a long time to complete.
During the build, its impact on other work should be minimized, even at
the expense of slightly slowing it down.
Instead, view building is currently performed in the same scheduling
group (gossip) as other high-priority tasks, in particular raft processing,
which slows it down, making races more likely and increasing the number
of retries that need to be done.
While view building is still initiated in the gossip group (as it's the
result of adding a view, which is a schema change), in this patch the bulk
of the view building work is moved to a low-priority, maintenance scheduling
group (named "streaming" after its main use case).
Additionally, a test is added, where we make sure that the scheduling
group is the one most used when building a view.
Fixes https://github.com/scylladb/scylladb/issues/21232
Closes scylladb/scylladb#21326
Today, each test function in test/topology_experimental_raft creates a
cluster in the beginning of the test and drops it at the end of the
function. This is very inefficient if you hope (like I do) to write many
small and pinpointed test functions instead of large test functions that
test 20 unrelated things.
Trying to propose a way to change this sad state of affairs, in
test_alternator.py I created a fixture "alternator3" which I hoped could
be used in multiple tests that need a 3-node Alternator cluster.
Currently only one test uses this fixture.
Unfortunately, it turns out the alternator3 fixture is broken, and
led to flaky test runs (sometimes the test using alternator3 picked
up an existing cluster instead of starting with an empty cluster,
and failed). These problems cannot be *completely* fixed at the current
state of the framework. The framework does not currently allow keeping
a 3-node cluster between test functions, while also allowing other test
functions to create different clusters. The specific flakiness we saw
could be fixed by adding a missing before_test() call, but in the
future we would need to ensure that all the test functions that
use it are contiguous in the test file, and I don't see how we can (or
want to) ensure this. So at this point I am giving up and withdrawing
this proposal until the developers of the topology test framework
make this one of their design goals.
Since there was only one test using this fixture, removing it should
make no performance or correctness difference - it should just fix
the flakiness.
Fixes scylladb/scylladb#21322.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21370
Fixes scylladb/scylladb#21159
When an exception is thrown in sstable write etc such that
storage_manager::isolate is initiated, we start a shutdown chain
for message service, gossip etc. These are synced (properly) in
storage_manager::stop, but if we somehow call gossiper::shutdown
outside the normal service::stop cycle, we can end up running the
method simultaneously, intertwined (missing the guard because of
the state change between check and set). We then end up co_awaiting
an invalid future (_failure_detector_loop_done) - a second wait.
Fixed by
a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added
in 20496ed, ages ago. However, it should not be needed nowadays.
b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure.
Closes scylladb/scylladb#21379
Replace use of boost::ranges::join() with another construct, as it
has no std replacement, and replace other uses with their std
equivalent, in order to reduce dependency load.
Code cleanup - no backport.
Closes scylladb/scylladb#21382
* github.com:scylladb/scylladb:
compound_compat: replace use of boost ranges with std ranges
compound_compat: simplify seriakization of ka/la sstables static cell names
these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been
confirmed.
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#21374
* github.com:scylladb/scylladb:
.github: add gms to iwyu's CLEANER_DIR
gms: remove unused `#include`s
The `reader_consumer_v2` type
(`std::function<future<> (mutation_reader)>`) is defined alongside
`mutation_reader` in `mutation_reader.hh`.
before this change, we sometimes use
`std::function<future<> (mutation_reader)>` directly when defining a
consumer parameter or a consumer variable.
in this change, we improve maintainability by:
- Reducing duplicate function type declarations
- Centralizing the consumer type definition
- Making future signature updates easier to implement
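as a generic illustration of centralizing such a type behind an alias (the
types below are simplified stand-ins, not `mutation_reader` or
`reader_consumer_v2`):

```cpp
#include <functional>
#include <vector>

// Simplified stand-in for a reader; `rows` is a made-up field.
struct reader_stub {
    int rows;
};

// One central alias instead of repeating the std::function type at
// every parameter and variable declaration.
using reader_consumer = std::function<int(reader_stub)>;

// Feeds every reader to the consumer and sums the results.
int consume_all(const std::vector<reader_stub>& readers, const reader_consumer& consumer) {
    int total = 0;
    for (const auto& r : readers) {
        total += consumer(r);
    }
    return total;
}
```

if the consumer signature changes later, only the alias needs to be edited.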
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21369
To reduce the dependency load, replace use of boost ranges
with the std equivalent.
Files that lost the indirect boost dependency have it added as a
direct dependency.
compound_compat is used for serializing ka/la sstables static cell names.
Since we can no longer write such sstables, the function is used only
in some tests.
Reduce the use of boost::range::join(): it has no direct equivalent
in std (std::views::concat is in C++26), and it is slow due to the
need to type-erase. Instead of using boost::range::join, extend the
vector used to hold the empty clustering key a bit more, and copy
the view representing the static cell name into it.
these unused includes are identified by clang-include-cleaner.
after auditing the source files, all of the reports have been
confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This place could be in the pre-previous patch, as it can just use the
overload, but it seemingly has a bug. It prints _two_ messages -- that
the injection handler was suspended and that it was woken up. The bug is
in the 2nd message -- it's printed without waiting for the message, so
it likely gets printed before wakeup itself. It seems that no tests care
about it though.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is a continuation of the previous patch; this time also update tests
that wait for a specific message in logs (to make sure the injection handler was
called and paused the code execution).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>