The run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. As a result, a user may see
an aborted task reported as finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted with task manager api.
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.
Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.
Fixes: #16113
The run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. As a result, a user may see
an aborted task reported as finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted with task manager api.
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using the existing
stopping mechanism of compaction task executors.
Closes scylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
When task_manager is constructed without a config (in tests), its task_ttl is
left uninitialized (i.e. it holds a random value). As a result,
tasks stay registered for an arbitrarily long time,
making a long-lived task manager look hung.
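
A minimal sketch of one way to avoid this kind of bug, assuming a simplified stand-in class (task_manager_sketch and its members are hypothetical, not the real Scylla types, and the actual fix may differ): a default member initializer gives the TTL a defined value even when the config-less constructor is used.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a task manager; not the real Scylla class.
class task_manager_sketch {
    // Without the "{0}" initializer, the default constructor below would
    // leave the duration's underlying counter uninitialized.
    std::chrono::seconds _task_ttl{0};
public:
    // Config-less construction, as used by tests.
    task_manager_sketch() = default;

    // Construction from an explicit TTL (standing in for a config object).
    explicit task_manager_sketch(std::chrono::seconds ttl) : _task_ttl(ttl) {
        assert(ttl.count() >= 0);
    }

    std::chrono::seconds task_ttl() const noexcept { return _task_ttl; }
};
```

With the initializer in place, a manager built without a config keeps finished tasks for zero seconds instead of for an unpredictable duration.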
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15859
The run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. As a result, a user may see
an aborted task reported as finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted with task manager api.
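
The idea can be sketched as follows, assuming a simplified synchronous task model (task_sketch, sketch_state and their members are hypothetical names, not the real task_manager::task::impl interface): after the task body returns, the abort flag is checked before the task is marked as done.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified task states; illustrative only.
enum class sketch_state { created, running, done, failed };

class task_sketch {
    std::function<void()> _run;          // task body; may return normally after an abort
    bool _abort_requested = false;
    sketch_state _state = sketch_state::created;
public:
    explicit task_sketch(std::function<void()> run) : _run(std::move(run)) {}

    void abort() noexcept { _abort_requested = true; }
    sketch_state state() const noexcept { return _state; }

    void start() {
        assert(_state == sketch_state::created);
        _state = sketch_state::running;
        try {
            _run();
        } catch (...) {
            _state = sketch_state::failed;
            return;
        }
        // The body returned without throwing, but an abort request still
        // means the task must not be reported as successful.
        _state = _abort_requested ? sketch_state::failed : sketch_state::done;
    }
};
```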
Before integration with task manager, the state of a single shard repair
was kept in repair_info. The repair_info object was destroyed immediately
after the shard repair finished.
During the integration, repair_info's fields were moved to
shard_repair_task_impl, as the two served similar purposes.
However, shard_repair_task_impl isn't destroyed immediately; it is
kept in task manager for task_ttl seconds after it completes.
Thus, some of repair_info's fields have their lifetime prolonged,
which delays the repair state change.
Release shard_repair_task_impl resources immediately after shard
repair is finished.
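
A minimal sketch of the pattern, assuming a hypothetical shard_task_sketch class whose only heavy state is a list of ranges (the real shard_repair_task_impl holds different fields): the object may stay registered after it finishes, but its working state is dropped as soon as run() completes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shard repair task; fields are illustrative.
class shard_task_sketch {
    std::vector<std::string> _ranges;    // state needed only while repairing
    bool _finished = false;
public:
    explicit shard_task_sketch(std::vector<std::string> ranges)
        : _ranges(std::move(ranges)) {}

    void run() {
        assert(!_finished);
        for (const auto& r : _ranges) {
            (void)r;                     // placeholder for repairing one range
        }
        _finished = true;
        release_resources();             // do not wait for the object to be destroyed
    }

    void release_resources() noexcept {
        // Swapping with an empty vector also frees the allocated storage.
        std::vector<std::string>().swap(_ranges);
    }

    bool finished() const noexcept { return _finished; }
    std::size_t held_ranges() const noexcept { return _ranges.size(); }
};
```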
Fixes: #15505.
Closes scylladb/scylladb#15506
Most of the time, only the roots of a task tree should be non-internal.
Change the default implementation of is_internal and delete the overrides
that are consistent with it.
Closes scylladb/scylladb#15353
Find progress of repair tasks based on the number of ranges
that have been repaired.
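
A rough sketch of range-based progress, assuming hypothetical names (range_progress_sketch, repair_progress_sketch) and a progress value expressed as a completed/total pair; the real get_progress signature may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical progress value: how many units are done out of how many.
struct range_progress_sketch {
    double completed;
    double total;
};

class repair_progress_sketch {
    std::size_t _ranges_total;
    std::size_t _ranges_done = 0;
public:
    explicit repair_progress_sketch(std::size_t ranges_total)
        : _ranges_total(ranges_total) {}

    // Called once for each range whose repair has finished.
    void range_repaired() {
        assert(_ranges_done < _ranges_total);
        ++_ranges_done;
    }

    range_progress_sketch get_progress() const noexcept {
        range_progress_sketch p;
        p.completed = static_cast<double>(_ranges_done);
        p.total = static_cast<double>(_ranges_total);
        return p;
    }
};
```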
Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156).
Closes #14698
* github.com:scylladb/scylladb:
test: repair tasks test
repair: add methods making repair progress more precise
tasks: make progress related methods virtual
repair: add get_progress method to shard_repair_task_impl
repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size()
repair: log a name of a particular table repair is working on
tasks: delete move and copy constructors from task_manager::task::impl
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.
Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.
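
The approach can be sketched with a small synchronous gate, assuming hypothetical gate_sketch and gated_task_sketch types (this is not seastar::gate, and the exception type here is a plain std::runtime_error): the task enters the gate in its constructor, so creation fails up front when the gate is closed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical synchronous gate: rejects new entrants once closed.
class gate_sketch {
    std::size_t _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }
    void leave() noexcept {
        assert(_count > 0);
        --_count;
    }
    void close() noexcept { _closed = true; }
    std::size_t count() const noexcept { return _count; }
};

// The task holds the gate for its whole lifetime.
class gated_task_sketch {
    gate_sketch& _gate;
public:
    explicit gated_task_sketch(gate_sketch& g) : _gate(g) { _gate.enter(); }
    ~gated_task_sketch() { _gate.leave(); }
    gated_task_sketch(const gated_task_sketch&) = delete;
    gated_task_sketch& operator=(const gated_task_sketch&) = delete;
};

// Fails (throws) if the gate is already closed, before any task exists.
std::unique_ptr<gated_task_sketch> make_task_sketch(gate_sketch& g) {
    return std::unique_ptr<gated_task_sketch>(new gated_task_sketch(g));
}
```

Because the exception is raised in the caller of the factory function, there is always someone to observe it.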
Fixes scylladb/scylladb#15211
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a private abort_source for every module
and request abort on stop() to signal all outstanding
tasks to abort (especially when they are sleeping
for the task_ttl).
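
A minimal sketch of per-module abort handling, assuming hypothetical synchronous types (abort_source_sketch and abortable_module_sketch are not the seastar or Scylla classes): each module owns its own abort source, and stop() requests abort on it.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, synchronous stand-in for an abort source.
class abort_source_sketch {
    bool _requested = false;
    std::vector<std::function<void()>> _callbacks;
public:
    void subscribe(std::function<void()> cb) { _callbacks.push_back(std::move(cb)); }
    bool abort_requested() const noexcept { return _requested; }
    void request_abort() {
        if (_requested) {
            return;                      // abort is delivered at most once
        }
        _requested = true;
        for (auto& cb : _callbacks) {
            cb();                        // e.g. wake a task sleeping for its TTL
        }
    }
};

// Each module owns a private abort source, so stopping one module does not
// disturb tasks that belong to another.
class abortable_module_sketch {
    abort_source_sketch _as;
public:
    abort_source_sketch& abort_source() noexcept { return _as; }
    void stop() {
        _as.request_abort();
        assert(_as.abort_requested());
    }
};
```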
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than to the top-level task_manager abort_source,
to provide separation between task_manager modules
so each one can be aborted and stopped independently
of the others (in the next patch).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
While the task manager is not aborted, its tasks are kept in memory,
which prevents the tasks' gate from being closed.
When wrapped_compaction_manager is destructed, the task manager gets
aborted, so that the system can shut down.
Modify task_manager::task::impl::get_progress method so that,
whenever relevant, progress is calculated based on children's
progress. Otherwise progress indicates only whether the task
is finished or not.
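
One possible shape of such a calculation, as a sketch only: it assumes that "relevant" means the task has children and that the children's progress is simply summed. The types tree_progress_sketch and progress_task_sketch are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical progress value: completed units out of total units.
struct tree_progress_sketch {
    double completed;
    double total;
};

class progress_task_sketch {
    std::vector<std::shared_ptr<progress_task_sketch>> _children;
    bool _finished = false;
public:
    void add_child(std::shared_ptr<progress_task_sketch> c) {
        assert(c);
        _children.push_back(std::move(c));
    }
    void finish() noexcept { _finished = true; }

    tree_progress_sketch get_progress() const {
        tree_progress_sketch sum;
        if (_children.empty()) {
            // No children: progress only says whether the task is finished.
            sum.completed = _finished ? 1.0 : 0.0;
            sum.total = 1.0;
            return sum;
        }
        sum.completed = 0.0;
        sum.total = 0.0;
        for (const auto& c : _children) {
            tree_progress_sketch p = c->get_progress();
            sum.completed += p.completed;
            sum.total += p.total;
        }
        return sum;
    }
};
```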
Keep seastar::shared_ptr to task::impl instead of std::unique_ptr
in task. Some classes deriving from task::impl may be used outside
task manager context.
gcc dislikes a member name that matches a type name, as it changes
the type name retroactively. Fix by fully-qualifying the type name,
so it is not changed by the newly-introduced member.
These warnings were found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In most cases, task manager's tasks are started just after they are
created. Thus, to reduce the boilerplate required for creating and starting
tasks, the tasks::task_manager::module::make_and_start_task method is added.
Repair tasks are modified to use the method where possible.
Closes #12729
* github.com:scylladb/scylladb:
repair: use tasks::task_manager::module::make_and_start_task for repair tasks
tasks: add task_manager::module::make_and_start_task method
In most cases, task manager's tasks are started just after they are
created. Thus, to reduce the boilerplate required for creating and starting
tasks, the make_and_start_task method is added.
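
A sketch of what such a helper might look like, assuming a synchronous model and hypothetical types (factory_module_sketch, task_base_sketch, repair_task_sketch); the real method is asynchronous and its exact signature is not shown in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class for tasks.
class task_base_sketch {
    bool _started = false;
public:
    virtual ~task_base_sketch() = default;
    void start() {
        assert(!_started);
        _started = true;
    }
    bool started() const noexcept { return _started; }
};

// Hypothetical module that creates and registers tasks.
class factory_module_sketch {
    std::vector<std::shared_ptr<task_base_sketch>> _tasks;
public:
    template <typename T, typename... Args>
    std::shared_ptr<T> make_task(Args&&... args) {
        auto t = std::make_shared<T>(std::forward<Args>(args)...);
        _tasks.push_back(t);             // register the task with the module
        return t;
    }

    // Convenience wrapper: create, register and start in one call.
    template <typename T, typename... Args>
    std::shared_ptr<T> make_and_start_task(Args&&... args) {
        auto t = make_task<T>(std::forward<Args>(args)...);
        t->start();
        return t;
    }

    std::size_t task_count() const noexcept { return _tasks.size(); }
};

// Hypothetical concrete task.
struct repair_task_sketch : task_base_sketch {
    int id;
    explicit repair_task_sketch(int i) : id(i) {}
};
```

A caller that previously needed two statements (make the task, then start it) needs only one with the wrapper.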
Aborting of repair operation is fully managed by task manager.
Repair tasks are aborted:
- on shutdown; top level repair tasks subscribe to global abort source. On shutdown all tasks are aborted recursively
- through node operations (applies to data_sync_repair_task_impls and their descendants only); data_sync_repair_task_impl subscribes to node_ops_info abort source
- with task manager api (top level tasks are abortable)
- with storage_service api and on failure; these cases now abort the same way as the ones above.
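
Recursive abort through a task tree can be sketched as follows, assuming a hypothetical abortable_task_sketch type; subscriptions to external abort sources are left out.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical task node that propagates abort to its descendants.
class abortable_task_sketch {
    std::vector<std::shared_ptr<abortable_task_sketch>> _children;
    bool _aborted = false;
public:
    void add_child(std::shared_ptr<abortable_task_sketch> c) {
        assert(c);
        _children.push_back(std::move(c));
    }
    bool aborted() const noexcept { return _aborted; }

    void abort() {
        if (_aborted) {
            return;                      // already aborted, nothing to propagate
        }
        _aborted = true;
        for (auto& c : _children) {
            c->abort();                  // recurse into the subtree
        }
    }
};
```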
Closes #12085
* github.com:scylladb/scylladb:
repair: make top level repair tasks abortable
repair: unify a way of aborting repair operations
repair: delete sharded abort source from node_ops_info
repair: delete unused node_ops_info from data_sync_repair_task_impl
repair: delete redundant abort subscription from shard_repair_task_impl
repair: add abort subscription to data sync task
tasks: abort tasks on system shutdown
The PR introduces changes to task manager api:
- extends tasks' list returned with get_tasks with task type,
keyspace, table, entity, and sequence number
- extends status returned with get_task_status and wait_task
with a list of children's ids
Closes #12338
* github.com:scylladb/scylladb:
api: extend status in task manager api
api: extend get_tasks in task manager api
invoke_on_task is used in translation units where its definition is not
visible, yet it has no explicit instantiations. If the compiler always
decides to inline the definition, not to instantiate it implicitly,
linking invoke_on_task will fail. (It happened to me when I turned up
inline-threshold). Fix that.
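
For illustration, a single-file sketch of an explicit instantiation definition, using hypothetical names (invoke_on_value_sketch, int_fn_sketch) rather than the real invoke_on_task signature. In a real project the template definition would sit in one .cc file and other translation units would see only its declaration.

```cpp
#include <cassert>

// Template whose definition other translation units would not see.
template <typename Func>
int invoke_on_value_sketch(int value, Func func) {
    return func(value);
}

using int_fn_sketch = int (*)(int);

// Explicit instantiation definition: the compiler must emit this
// specialization here, whether or not it inlines calls in this file, so
// other translation units can link against it.
template int invoke_on_value_sketch<int_fn_sketch>(int, int_fn_sketch);

inline int double_it_sketch(int x) { return 2 * x; }
```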
Closes #12387
The type of an operation is tied to a specific implementation
of a task. It should therefore be accessed through a virtual
method in tasks::task_manager::task::impl rather than stored
as its attribute.
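
A minimal sketch of the design, assuming hypothetical class names and return values (typed_impl_sketch and the "repair"/"compaction" strings are illustrative only): each implementation reports its own type through a virtual method instead of storing it in a shared status attribute.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base: the type is a property of the implementation.
class typed_impl_sketch {
public:
    virtual ~typed_impl_sketch() = default;
    virtual std::string type() const = 0;
};

class repair_impl_sketch final : public typed_impl_sketch {
public:
    std::string type() const override { return "repair"; }
};

class compaction_impl_sketch final : public typed_impl_sketch {
public:
    std::string type() const override { return "compaction"; }
};
```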
Closes #12326
* github.com:scylladb/scylladb:
api: delete unused type parameter from task_manager_test api
tasks: repair: api: remove type attribute from task_manager::task::status
tasks: add type() method to task_manager::task::impl
repair: add reason attribute to repair_task
The generic task holds and destroys a task::impl,
but we want the derived class's destructor to be called
when the task is destroyed. Otherwise, for example,
members like an abort_source subscription will not be destroyed
(and auto-unlinked).
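
The effect of a virtual destructor can be sketched as below, with hypothetical types (impl_base_sketch, subscription_sketch, derived_impl_sketch): destroying the object through a base pointer runs the derived destructor, which in turn destroys the subscription member.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class; the virtual destructor is the point of the sketch.
struct impl_base_sketch {
    virtual ~impl_base_sketch() = default;
};

// Stand-in for a subscription that unlinks itself on destruction.
struct subscription_sketch {
    bool& unlinked;
    explicit subscription_sketch(bool& flag) : unlinked(flag) {}
    ~subscription_sketch() { unlinked = true; }
};

struct derived_impl_sketch : impl_base_sketch {
    subscription_sketch sub;
    explicit derived_impl_sketch(bool& flag) : sub(flag) {}
};
```

Without the virtual destructor in the base class, deleting a derived_impl_sketch through an impl_base_sketch pointer would be undefined behavior and the subscription might never be released.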
Fixes #12183
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #12266
The PR introduces shard_repair_task_impl which represents a repair task
that spans over a single shard repair.
repair_info is replaced with shard_repair_task_impl, since both serve
a similar purpose.
Closes #12066
* github.com:scylladb/scylladb:
repair: reindent
repair: replace repair_info with shard_repair_task_impl
repair: move repair_info methods to shard_repair_task_impl
repair: rename methods of repair_module
repair: change type of repair_module::_repairs
repair: keep a reference to shard_repair_task_impl in row_level_repair
repair: move repair_range method to shard_repair_task_impl
repair: make do_repair_ranges a method of shard_repair_task_impl
repair: copy repair_info methods to shard_repair_task_impl
repair: corutinize shard task creation
repair: define run for shard_repair_task_impl
repair: add shard_repair_task_impl
Fix some issues found with gcc 12. Note we can't fully compile with gcc yet, due to [1].
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056
Closes #12121
* github.com:scylladb/scylladb:
utils: observer: qualify seastar::noncopyable_function
sstables: generation_type: forgo constexpr on hash of generation_type
logalloc: disambiguate types and non-type members
task_manager: disambiguate types and non-type members
direct_failure_detector: don't change meaning of endpoint_liveness
schema: abort on illegal per column computation kind
database: abort on illegal per partition rate limit operation
mutation_fragment: abort on illegal fragment type
per_partition_rate_limit_options: abort on illegal operation type
schema: drop unused lambda
mutation_partition: drop unused lambda
cql3: create_index_statement: remove unused lambda
transport: prevent signed and unsigned comparison
database: don't compare signed and unsigned types
raft: don't compare signed and unsigned types
compaction: don't compare signed and unsigned compaction counts
bytes_ostream: don't take reference to packed variable
Currently, each data sync repair task is started (and hence run) twice.
Thus, when the two runs finish far enough apart in time,
the following situation may occur:
- the first run finishes
- after some time (ttl) the task is unregistered from the task manager
- the second run finishes and attempts to finish the task which does
not exist anymore
- memory access causes a segfault.
The second call to start is deleted. A check is added
to the start method to ensure that each task is started at most once.
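
A sketch of a start-at-most-once guard, assuming a hypothetical once_task_sketch class that reports a second start by throwing; the actual check in the task manager may react differently.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical task that refuses to be started twice.
class once_task_sketch {
    bool _started = false;
    int _runs = 0;
public:
    void start() {
        if (_started) {
            throw std::logic_error("task already started");
        }
        _started = true;
        ++_runs;                         // placeholder for actually running the task
        assert(_runs == 1);
    }
    int runs() const noexcept { return _runs; }
};
```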
Fixes: #12089
Closes #12090
task_manager has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.
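
The problem and the fix can be sketched with hypothetical namespaces and types (lib, app, manager_sketch); the names stand in for the real ones in task_manager.

```cpp
#include <cassert>

namespace lib {
struct abort_source {
    bool requested = false;
};
} // namespace lib

namespace app {

using lib::abort_source;

class manager_sketch {
    // Spelling the type unqualified next to a member of the same name, e.g.
    //     abort_source& abort_source();
    // makes the name refer to the type at first use and to the member once
    // the class is complete; gcc rejects that as changing the meaning of the
    // name. Qualifying the type keeps its lookup unaffected by the member.
    lib::abort_source _as;
public:
    lib::abort_source& abort_source() noexcept { return _as; }
};

} // namespace app
```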
As a preparation to replacing repair_info with shard_repair_task_impl,
type of _repairs in repair module is changed from
std::unordered_map<int, lw_shared_ptr<repair_info>> to
std::unordered_map<int, tasks::task_id>.
Currently, the start() method runs a task even if it was already
aborted.
Change start() so that, when it is called on an aborted task, the task's
state is set to task_manager::task_state::failed and the task doesn't run.