scylladb/docs/dev/task_manager.md
Aleksandra Martyniuk 18cc79176a api: task_manager: do not unregister tasks on get_status
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister the queried task if it
has already finished.

The status should not disappear after being queried. Do not unregister
a finished task when its status or recursive status is queried.
2025-01-27 11:23:45 +01:00


Task manager is a tool for tracking long-running background operations.

Structure overview

Task manager is divided into modules, e.g. repair or compaction module, which keep track of operations of similar nature. Operations are tracked with tasks.

Each task covers a logical part of the operation, e.g. repair of a keyspace or a table. Each operation is covered by a tree of tasks, e.g. a global repair task is the parent of tasks that each cover a single keyspace, which in turn are parents of table tasks.
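
The tree shape described above can be pictured with a small Python model. This is only an illustrative sketch: the `Task` class and its methods are hypothetical and do not correspond to the real task_manager::task interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Toy stand-in for a task node (hypothetical, not the real class)."""
    name: str
    # Exclude the parent link from repr/eq to avoid walking the tree in cycles.
    parent: Optional["Task"] = field(default=None, repr=False, compare=False)
    children: List["Task"] = field(default_factory=list)

    def add_child(self, name: str) -> "Task":
        """Create a child task covering a smaller part of the operation."""
        child = Task(name=name, parent=self)
        self.children.append(child)
        return child

    def is_root(self) -> bool:
        return self.parent is None


# Global repair task -> keyspace task -> table tasks.
repair = Task("repair")
ks1 = repair.add_child("keyspace ks1")
t1 = ks1.add_child("table ks1.t1")
t2 = ks1.add_child("table ks1.t2")
```

Only `repair` is a root here; the keyspace and table tasks each have a parent.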

Task manager supports two types of tasks: regular tasks (task_manager::task) and virtual tasks (task_manager::virtual_task). Regular tasks cover local operations (or their parts), while virtual tasks cover global, cluster-wide operations.

Time to live of a task

Regular root tasks are kept in task manager for task_ttl after they finish, or for user_task_ttl if they were started by a user. The task_ttl and user_task_ttl values can be set in the node configuration with the --task-ttl-in-seconds and --user-task-ttl-in-seconds options, respectively, or changed through the task manager API (/task_manager/ttl and /task_manager/user_ttl).
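
As a rough illustration of the retention rule for finished regular root tasks, the sketch below picks the applicable ttl and checks whether a task has outlived it. The names `TtlConfig`, `retention_seconds` and `is_expired` are hypothetical, and whether expiry happens exactly at the ttl boundary is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class TtlConfig:
    task_ttl: int       # seconds, cf. --task-ttl-in-seconds
    user_task_ttl: int  # seconds, cf. --user-task-ttl-in-seconds


def retention_seconds(cfg: TtlConfig, started_by_user: bool) -> int:
    """Pick the ttl that applies to a finished regular root task."""
    return cfg.user_task_ttl if started_by_user else cfg.task_ttl


def is_expired(cfg: TtlConfig, started_by_user: bool,
               finished_at: float, now: float) -> bool:
    """True once the task has been finished for at least its ttl.

    Treating the boundary as expired is an assumption of this sketch.
    """
    return now - finished_at >= retention_seconds(cfg, started_by_user)


# Example values are made up for illustration.
cfg = TtlConfig(task_ttl=10, user_task_ttl=3600)
```

With these made-up values, a task finished 10 seconds ago has expired unless a user started it.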

A task which isn't a root is unregistered immediately after it is finished and its status is folded into its parent. When a task is being folded into its parent, info about each of its children is lost unless the child or any child's descendant failed.
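
The folding rule can be sketched as follows. The `FoldedStatus` record and the `fold` function are hypothetical; what exactly the real implementation retains in a parent is not specified here, so this only models the rule that information about a child survives when that child or one of its descendants failed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FoldedStatus:
    """Toy status record kept for a finished task (hypothetical)."""
    name: str
    failed: bool
    children: List["FoldedStatus"] = field(default_factory=list)

    def has_failure(self) -> bool:
        """True if this task or any of its descendants failed."""
        return self.failed or any(c.has_failure() for c in self.children)


def fold(status: FoldedStatus) -> FoldedStatus:
    """Return the record a parent would keep for a finished child.

    Children are dropped unless they, or one of their descendants, failed.
    """
    kept = [c for c in status.children if c.has_failure()]
    return FoldedStatus(status.name, status.failed, kept)


# A keyspace task with one successful and one failed table task.
ks = FoldedStatus("keyspace ks1", failed=False, children=[
    FoldedStatus("table ks1.t1", failed=False),
    FoldedStatus("table ks1.t2", failed=True),
])
folded = fold(ks)
```

After folding, only the record for the failed table task remains under the keyspace record.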

Time for which a virtual task is shown in task manager depends on a specific implementation.

Internal

Tasks can be marked as internal, which means they are not listed by default. A task should be marked as internal if it has a parent which is a regular task or if it's supposed to be unregistered immediately after it's finished.
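
A minimal sketch of both rules, using hypothetical names (`TaskInfo`, `should_be_internal`, `list_tasks`, and the `include_internal` parameter are all made up for illustration):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskInfo:
    task_id: str
    internal: bool


def should_be_internal(has_regular_parent: bool,
                       unregistered_when_finished: bool) -> bool:
    """Guideline from the text: either condition makes a task internal."""
    return has_regular_parent or unregistered_when_finished


def list_tasks(tasks: List[TaskInfo],
               include_internal: bool = False) -> List[TaskInfo]:
    """Hide internal tasks unless the caller explicitly asks for them."""
    return [t for t in tasks if include_internal or not t.internal]


tasks = [
    TaskInfo("root-repair", internal=should_be_internal(False, False)),
    TaskInfo("table-repair", internal=should_be_internal(True, False)),
]
visible = list_tasks(tasks)
everything = list_tasks(tasks, include_internal=True)
```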

Abortable

A flag which determines whether a task can be aborted through the API.

Type vs scope vs kind

type of a task describes what operation is covered by a task, e.g. "major compaction".

scope of a task describes for which part of the operation the task is responsible, e.g. "shard".

kind of a task indicates whether a task is regular ("local") or virtual ("global").

API

Documentation for the task manager API is available under api/api-doc/task_manager.json. Briefly:

  • /task_manager/list_modules - lists modules supported by task manager;
  • /task_manager/list_module_tasks/{module} - lists (by default non-internal) tasks in the module;
  • /task_manager/task_status/{task_id} - gets the task's status;
  • /task_manager/abort_task/{task_id} - aborts the task if it's abortable;
  • /task_manager/wait_task/{task_id} - waits for the task and gets its status;
  • /task_manager/task_status_recursive/{task_id} - gets statuses of the task and all its descendants in BFS order;
  • /task_manager/ttl - gets or sets the ttl;
  • /task_manager/user_ttl - gets or sets the user ttl;
  • /task_manager/drain/{module} - unregisters all finished local tasks in the module.
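
To illustrate the BFS order in which /task_manager/task_status_recursive/{task_id} reports statuses, here is a small sketch of a breadth-first walk over a task tree. The `children` mapping and the task ids are made up; the sketch shows only the ordering, not the format of the real response.

```python
from collections import deque
from typing import Dict, List


def bfs_order(root_id: str, children: Dict[str, List[str]]) -> List[str]:
    """Return task ids level by level, starting from the queried task."""
    order = []
    queue = deque([root_id])
    while queue:
        task_id = queue.popleft()
        order.append(task_id)
        # Children are visited only after every task of the current level.
        queue.extend(children.get(task_id, []))
    return order


# Hypothetical tree: repair -> two keyspaces -> tables.
children = {
    "repair": ["ks1", "ks2"],
    "ks1": ["ks1.t1", "ks1.t2"],
    "ks2": ["ks2.t1"],
}
order = bfs_order("repair", children)
```

The queried task comes first, followed by all of its children, then all grandchildren, and so on.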

Virtual tasks

A virtual task is a task which covers an operation that spans the whole cluster. From the API perspective virtual tasks are similar to regular tasks. The main differences are:

  • a virtual task is presented on each node;
  • time which virtual tasks spend in task manager is implementation dependent;
  • number of children does not have to be monotonic (virtual tasks do not keep references to their children).

Implementation

Unlike regular tasks, virtual tasks aren't kept in memory and their status isn't updated proactively. Instead, the appropriate data (e.g. task status) is created based on an associated service (e.g. storage_service for node_ops virtual tasks) when an API user requests it.

A virtual_task object generates statuses for all operations from one group, so it can represent many abstract virtual tasks. All virtual_task objects are kept only on shard 0.
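
The on-demand approach can be sketched as below. `ToyService` and `ToyVirtualTask` are hypothetical stand-ins, not the real storage_service or task_manager::virtual_task classes; the point is only that no per-task state lives in the virtual task object and that one object answers for every operation in its group.

```python
from typing import Dict, Optional


class ToyService:
    """Hypothetical stand-in for the service that owns the operation state."""

    def __init__(self) -> None:
        # operation id -> current state, maintained by the service itself
        self.operations: Dict[str, str] = {}


class ToyVirtualTask:
    """Answers status queries for a whole group without storing tasks."""

    def __init__(self, service: ToyService) -> None:
        self._service = service

    def get_status(self, task_id: str) -> Optional[dict]:
        """Build the status from the service state at query time."""
        state = self._service.operations.get(task_id)
        if state is None:
            return None  # this group does not know the id
        return {"id": task_id, "kind": "global", "state": state}


service = ToyService()
group = ToyVirtualTask(service)
service.operations["op-1"] = "running"
first = group.get_status("op-1")
service.operations["op-1"] = "done"
second = group.get_status("op-1")
```

The second query reflects the new state even though nothing ever pushed an update to the virtual task object.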

Group traits of virtual tasks

  • topology_change_group:
    • tasks are listed for user_task_ttl after they are finished, but their statuses can be viewed as long as they are kept in the topology_requests table.
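
The difference between being listed and having a viewable status can be sketched with two independent checks. The function names and the dictionary standing in for the topology_requests table are hypothetical, and treating unfinished tasks as always listed is an assumption of this sketch.

```python
from typing import Dict, Optional


def is_listed(finished_at: Optional[float], now: float,
              user_task_ttl: float) -> bool:
    """Listed while running (assumed) and for user_task_ttl after finishing."""
    if finished_at is None:
        return True
    return now - finished_at < user_task_ttl


def status_viewable(task_id: str, topology_requests: Dict[str, dict]) -> bool:
    """Status can be built as long as the table still holds the entry."""
    return task_id in topology_requests


# Made-up data: a request that finished at t=100 and is still in the table.
table = {"req-1": {"done": True}}
listed_late = is_listed(finished_at=100, now=500, user_task_ttl=60)
viewable_late = status_viewable("req-1", table)
```

In this made-up case the task is no longer listed, yet its status can still be queried by id.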