Aleksandra Martyniuk 683176d3db tasks: add shard, start_time, and end_time to task_stats
task_stats contains short info about a task. To get a list of task_stats
in the module, one needs to request /task_manager/list_module_tasks/{module}.

To make identification and navigation between tasks easier, extend
task_stats to contain shard, start_time, and end_time.

Closes scylladb/scylladb#22351
2025-02-04 12:11:24 +02:00


==================
Task manager tasks
==================

Task manager tracks long-running background operations. It is divided into modules, and each module is responsible
for tracking operations with a similar purpose (e.g. compactions). An operation and its parts are represented by tasks.
Tasks form a tree, where the root task covers the whole operation (e.g. compaction) and its children cover
its suboperations (e.g. compaction of one table), and so on.

There are two types of tasks: *cluster* and *node* tasks. Node tasks cover operations that are performed on a single
node. To see their stats, you need to request the status from that particular node. Cluster tasks cover
operations that span many nodes. They are visible from all nodes in the cluster. Cluster tasks
can't have parents, but they can have children on many nodes. The status of a cluster task's child may not be accessible
even though the cluster operation is still running.


A task may be *internal*. This means that the task has a parent or is started by an internal process. By default, API
calls skip internal tasks. Cluster tasks cannot be internal.

If a task is internal, it is unregistered from task manager immediately after it finishes. If it has a non-cluster
parent, the task's status is folded into its parent and is accessible only through the parent. You won't see the statuses
of indirect descendants (e.g. children of children) of a finished task unless they have failed.

Other non-cluster tasks stay in task manager for *task_ttl* seconds after they finish, or for *user_task_ttl* seconds
if they were started by a user. The task_ttl value can be set with the ``task_ttl_in_seconds`` param or through
the ``/task_manager/ttl`` API. The user_task_ttl value can be set with the ``user_task_ttl_in_seconds`` param or through
the ``/task_manager/user_ttl`` API. The time for which cluster tasks remain visible in task manager isn't specified.

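
The retention rules above can be summarized in a short Python sketch. The function and its parameter names are
hypothetical and only mirror the prose; this is not ScyllaDB code. The ``"cluster"`` kind string is an assumption,
and ``None`` stands for the unspecified retention of cluster tasks.

.. code-block:: python

   from typing import Optional


   def retention_seconds(kind: str, internal: bool, started_by_user: bool,
                         task_ttl: int, user_task_ttl: int) -> Optional[int]:
       """Seconds a finished task stays registered, per the rules above."""
       if kind == "cluster":
           return None  # retention of cluster tasks is not specified
       if internal:
           return 0  # internal tasks are unregistered as soon as they finish
       # Remaining node tasks: user-started ones use user_task_ttl.
       return user_task_ttl if started_by_user else task_ttl

For example, with ``task_ttl=10`` and ``user_task_ttl=3600``, a non-internal node task started by a user
would stay visible for 3600 seconds after it finishes.
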

Task manager API
----------------

Data structures
^^^^^^^^^^^^^^^

Task data returned from the task manager API is kept in either a ``task_stats`` or a ``task_status`` structure.

task_stats
..........

- *task_id* - unique task id;
- *type* - the type of the task, e.g. offstrategy compaction;
- *kind* - whether the task is a node task or a cluster task;
- *scope* - specifies the operation's scope, e.g. keyspace, range;
- *state* - one of created, running, done, failed;
- *sequence_number* - an operation number (per module). It is shared by all tasks in a tree. Irrelevant for cluster tasks;
- *keyspace* - optional, name of the keyspace on which the task operates;
- *table* - optional, name of the table on which the task operates;
- *entity* - optional, additional info specific to the task;
- *shard* - optional, id of the shard on which the task operates;
- *start_time* - relevant only if state != created;
- *end_time* - relevant only if the task is finished (state in [done, failed]).

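
As an illustration, the fields listed above could be modeled on the client side as follows. This is a minimal
sketch: the Python types, the representation of timestamps as strings, and the sample values are all assumptions,
not something the API guarantees.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional

   FINISHED_STATES = {"done", "failed"}


   @dataclass
   class TaskStats:
       task_id: str
       type: str
       kind: str
       scope: str
       state: str  # one of: created, running, done, failed
       sequence_number: int
       keyspace: Optional[str] = None
       table: Optional[str] = None
       entity: Optional[str] = None
       shard: Optional[int] = None
       start_time: Optional[str] = None  # meaningful only if state != created
       end_time: Optional[str] = None    # meaningful only if the task is finished

       def has_started(self) -> bool:
           return self.state != "created"

       def is_finished(self) -> bool:
           return self.state in FINISHED_STATES


   # Made-up entry, for illustration only.
   sample = TaskStats(
       task_id="00000000-0000-0000-0000-000000000001",
       type="offstrategy compaction",
       kind="node",
       scope="keyspace",
       state="running",
       sequence_number=1,
       keyspace="ks",
       shard=0,
       start_time="2025-02-04T10:11:24Z",
   )

The helper methods encode when *start_time* and *end_time* carry meaningful values.
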

task_status
...........

All fields from task_stats and additionally:

- *is_abortable* - a flag that tells whether the task can be aborted through the API;
- *error* - relevant only if the task failed;
- *parent_id* - relevant only if the task has a parent;
- *progress_units* - the unit in which progress is measured;
- *progress_total* - job size in progress_units;
- *progress_completed* - current progress in progress_units;
- *children_ids* - a list of pairs, each holding a child's id and the node on which the child was created.

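
The progress fields can be combined into a completion ratio. The sketch below assumes that a ``task_status`` has
already been decoded into a Python dict keyed by the field names above; the helper name and the handling of a
zero total are illustrative choices.

.. code-block:: python

   from typing import Optional


   def progress_fraction(status: dict) -> Optional[float]:
       """Return completed/total, or None when the total is missing or zero."""
       total = status.get("progress_total", 0)
       if not total:
           return None  # avoid dividing by zero when no job size is reported
       return status.get("progress_completed", 0) / total

A status with ``progress_total`` of 200 and ``progress_completed`` of 50 yields ``0.25``, in whatever unit
``progress_units`` names.
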

API calls
^^^^^^^^^^

* ``/task_manager/list_modules`` - lists modules supported by task manager;
* ``/task_manager/list_module_tasks/{module}`` - lists tasks in the module; query params:

  - *internal* - if set, internal tasks are listed, false by default;
  - *keyspace* - if set, tasks are filtered to contain only the ones working on this keyspace;
  - *table* - if set, tasks are filtered to contain only the ones working on this table;

* ``/task_manager/task_status/{task_id}`` - gets the task's status;
* ``/task_manager/abort_task/{task_id}`` - aborts the task if it's abortable, otherwise a 403 status code is returned;
* ``/task_manager/wait_task/{task_id}`` - waits for the task and gets its status; query params:

  - *timeout* - timeout in seconds; if it is set and the wait times out, a 408 status code is returned;

* ``/task_manager/task_status_recursive/{task_id}`` - gets statuses of the task and all its descendants in BFS order;
* ``/task_manager/ttl`` - gets the ttl or sets a new one; query params (if setting):

  - *ttl* - new ttl value;

* ``/task_manager/user_ttl`` - gets the user ttl or sets a new one; query params (if setting):

  - *user_ttl* - new user ttl value;

* ``/task_manager/drain/{module}`` - unregisters all finished local tasks in the module.

Cluster tasks are not unregistered from task manager with API calls.

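
The endpoints above can be called from any HTTP client. Below is a minimal Python sketch for listing the tasks of
a module. It assumes that the REST API listens on ``http://localhost:10000``, that listing uses a plain GET
request, and that the response body is JSON; the helper names are hypothetical.

.. code-block:: python

   import json
   import urllib.request
   from urllib.parse import quote, urlencode

   BASE_URL = "http://localhost:10000"  # assumption: adjust to your node's REST API address


   def task_manager_url(path: str, **params) -> str:
       """Build a task manager URL, dropping query params left as None."""
       query = urlencode({k: v for k, v in params.items() if v is not None})
       url = f"{BASE_URL}/task_manager/{path}"
       return f"{url}?{query}" if query else url


   def list_module_tasks(module, internal=None, keyspace=None, table=None):
       """Fetch the task_stats list of a module (GET is an assumed method)."""
       url = task_manager_url(f"list_module_tasks/{quote(module, safe='')}",
                              internal=internal, keyspace=keyspace, table=table)
       with urllib.request.urlopen(url) as resp:
           return json.load(resp)

For instance, ``list_module_tasks("compaction", keyspace="ks")`` would request only the tasks working on keyspace
``ks``, where both the module and keyspace names are examples.
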

Tasks API
---------

With task manager, synchronous calls can have asynchronous versions. Some of them are accessible through the tasks API.
These calls work analogously to their synchronous versions, but instead of waiting for the operation to finish, they
return the id of the associated task. You can query the operation's status with the task manager API.

See :doc:`Nodetool tasks </operating-scylla/nodetool-commands/tasks/index>`.

To learn how to interact with the REST API, see :doc:`REST API </operating-scylla/rest>`.
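
Once an asynchronous call has returned a task id, a client can wait for the task through
``/task_manager/wait_task/{task_id}``. The Python sketch below assumes the REST API address
``http://localhost:10000``, a GET request, and a JSON response; the function name and the injectable ``opener``
parameter are hypothetical and exist only to keep the example easy to test.

.. code-block:: python

   import json
   import urllib.error
   import urllib.request
   from urllib.parse import quote

   BASE_URL = "http://localhost:10000"  # assumption: adjust to your node's REST API address


   def wait_for_task(task_id, timeout_s, opener=urllib.request.urlopen):
       """Wait for a task; return its status, or None if the wait timed out."""
       url = (f"{BASE_URL}/task_manager/wait_task/"
              f"{quote(str(task_id), safe='')}?timeout={int(timeout_s)}")
       try:
           with opener(url) as resp:
               return json.load(resp)
       except urllib.error.HTTPError as err:
           if err.code == 408:  # the documented status for a timed-out wait
               return None
           raise

A ``None`` result means only that the wait timed out. The task may still be running, and the same call can be
repeated later.
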