before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
There are some bugs missed in task handler:
- wait_for_task does not wait until virtual tasks are done, but
returns the status immediately;
- wait_for_task suffers from use after return;
- get_status_recursively does not set the kind of task essentials.
Fix the aforementioned.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
system_keyspace::get_topology_request_entries returns entries for
requests which are running or have finished after specified time.
In task manager node ops task set the time so that they are shown
for task_ttl seconds after they have finished.
Virtual tasks are supported by get_task_status, abort_task and
wait_task.
Task status returned by get_task_status and wait_task:
- contains task_kind to indicate whether it's virtual (cluster) or
regular (node) task;
- children list apart from task_id contains node address of the task.
task_manager/list_module_tasks/{module} starts supporting virtual tasks,
which means that their stats will also be shown for users.
Additional task_kind param is added to indicate whether the task is
virutal (cluster-wide) or regular (node-wide).
Support in other paths will be added in following patches.
Contrary to regular tasks, which are per-operation, virtual tasks
are associated with the whole group of operations. There may be many
operations of each group performed at the same time. Info about each
running operation will be shown to a user through the API.
For virtual tasks, task manager imitates a regular task covering
each operation, but task_manager::tasks aren't actually created
in the memory. Instead, information (e.g. status) about the operation
is retrieved from associated service and passed to a user.
To hide most of the differences from user, task_handler class is created.
Task handler performs appropriate actions depending on task's kind.
However, users need to stay conscious about the kind of task, because:
- get_task_status and wait_task do not unregister virtual tasks;
- time for which a virtual tasks stays in task manager depends
on associated service and tasks' implementation;
- number of virtual task's children shown by get_tasks doesn't have
to be monotonous.
API is modified to use task_handler.
API-specific classes are moved to task_handler.{cc,hh}.
Virtual tasks are kept in task manager together with regular tasks.
All virtual tasks are stored on shard 0.
task_manager::module::make_task is modified to consider virtual
tasks as possible parents.
A virtual task is a new kind of task supported by task manager,
which covers cluster-wide operations.
From users' perspective virtual tasks behave similarly
to task_manager::tasks. The API side of virtual tasks will be
covered in the following patches.
Contrary to task_manager::task, virtual task does not update
its fields proactively. Moreover, no object is kept in memory
for each individual virtual task's operation. Instead a service
(or services) is queried on API user's demand to learn about
the status of running operation. Hence the name.
task_manager::virtual_task is responsible for a whole group
of virtual tasks, i.e. for tracking and generating statuses
of all operations of similar type.
To enable tracking of some kind of operations, one needs to
override task_manager::virtual_task::impl and provide implementations
of the methods returning appropriate information about the operations.
task_manager::virtual_task must be kept on shard 0.
Similarly to task_manager::tasks, virtual tasks can have child tasks,
responsible for tracking suboperations' progress. But virtual tasks
cannot have parents - they are always roots in task trees.
Some methods and structs will be implemented in later patches.
this change was created in the same spirit of ebff5f5d.
despite that we include Seastar as a submodule, Seastar is not a
part of scylla project. so we'd better include its headers using
brackets.
ebff5f5d addressed this cosmetic issue a while back. but probably
clangd's header-insertion helped some of contributor to insert
the missing headers with `"`. so this style of `include` returned
to the tree with these new changes.
unfortunately, clangd does not allow us to configure the style
of `include` at the time of writing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#19406
Currently if task_manager::task::impl::abort preempts before children
are recursively aborted and then the task gets unregistered, we hit
use after free since abort uses children vector which is no
longer alive.
Modify abort method so that it goes over all tasks in task manager
and aborts those with the given parent.
Fixes: #19304.
Currently, when a child task is unregistered, it is still kept by its parent. This leads
to excessive memory usage, especially when the tasks are configured to be kept in task
manager after they are finished (task_ttl_in_seconds).
Introduce task_essentials struct which keeps only data necesarry for task manager API.
When a task which has a parent is finished, a foreign pointer to it in its parent is replaced
with respective task_essentials. Once a parent task is finished it is also folded into
its parent (if it has one). Children details of a folded task are lost, unless they
(or some of their subtrees) failed. That is, when a task is finished, we keep:
- a root task (until it is unregistered);
- task_essentials of root's direct children;
- a path (of task_essentials) from root to each failed task (so that the reason
of a failure could be examined).
when compiling with Clang-18 + libstdc++-13, the tree fails to build:
```
/home/kefu/dev/scylladb/tasks/task_manager.hh:45:36: error: no template named 'list' in namespace 'std'
45 | using foreign_task_list = std::list<foreign_task_ptr>;
| ~~~~~^
```
so let's include the used header
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16433
If std::vector is resized its iterators and references may
get invalidated. While task_manager::task::impl::_children's
iterators are avoided throughout the code, references to its
elements are being used.
Since children vector does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, which
iterators and references aren't invalidated on element insertion.
Fixes: #16380.
Closesscylladb/scylladb#16381
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.
Finish a task with a failure if it was aborted with task manager api.
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.
Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.
Fixes: #16113
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.
Finish a task with a failure if it was aborted with task manager api.
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closesscylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
When task_manager is constructed without config (tests) its task_ttl is
left uninitialized (i.e. -- random number gets in there). This results
in tasks hanging around being registered for infinite amount of time
making long-living task manager look hanged.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#15859
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.
Finish a task with a failure if it was aborted with task manager api.
Before integration with task manager the state of one shard repair
was kept in repair_info. repair_info object was destroyed immediately
after shard repair was finished.
In an integration process repair_info's fields were moved to
shard_repair_task_impl as the two served the similar purposes.
Though, shard_repair_task_impl isn't immediately destoyed, but is
kept in task manager for task_ttl seconds after it's complete.
Thus, some of repair_info's fields have their lifetime prolonged,
which makes the repair state change delayed.
Release shard_repair_task_impl resources immediately after shard
repair is finished.
Fixes: #15505.
Closesscylladb/scylladb#15506
Most of the time only the roots of tasks tree should be non internal.
Change default implementation of is_internal and delete overrides
consistent with it.
Closesscylladb/scylladb#15353
Find progress of repair tasks based on the number of ranges
that have been repaired.
Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156).
Closes#14698
* github.com:scylladb/scylladb:
test: repair tasks test
repair: add methods making repair progress more precise
tasks: make progress related methods virtual
repair: add get_progress method to shard_repair_task_impl
repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size()
repair: log a name of a particular table repair is working on
tasks: delete move and copy constructors from task_manager::task::impl
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.
Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.
Fixesscylladb/scylladb#15211
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a private about_source for every module
and request abort on stop() to signal all outstanding
tasks to abort (especially when they are sleeping
for the task_ttl).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather to the top-level task_manager about_source,
to provide separation between task_manager modules
so each one can be aborted and stopped independentally
of the others (in the next patch).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When task manager is not aborted, the tasks are stored in the memory,
not allowing the tasks' gate to be closed.
When wrapped_compaction_manager is destructed, task manager gets
aborted, so that system could shutdown.