Commit Graph

47 Commits

Author SHA1 Message Date
Tomasz Grabiec
cddde464ca Merge 'service: Support adding/removing a datacenter with tablets by changing RF' from Aleksandra Martyniuk
With this change, you can add or remove a DC(s) in a single ALTER KEYSPACE statement. It requires the keyspace to use rack list replication factor.

In existing approach, during RF change all tablet replicas are rebuilt at once. This isn't the case now. In global_topology_request::keyspace_rf_change the request is added to a ongoing_rf_changes - a new column in system.topology table. In a new column in system_schema.keyspaces - next_replication - we keep the target RF.

In make_rf_change_plan, load balancer schedules necessary migrations, considering the load of nodes and other pending tablet transitions. Requests from ongoing_rf_changes are processed concurrently, independently from one another. In each request racks are processed concurrently. No tablet replica will be removed until all required replicas are added. While adding replicas to each rack we always start with base tables and won't proceed with views until they are done (while removing - the other way around). The intermediary steps aren't reflected in schema. When the Rf change is finished:
- in system_schema.keyspaces:
  - next_replication is cleared;
  - new keyspace properties are saved;
- request is removed from ongoing_rf_changes;
- the request is marked as done in system.topology_requests.

Until the request is done, DESCRIBE KEYSPACE shows the replication_v2.

If a request hasn't started to remove replicas, it can be aborted using task manager. system.topology_requests::error is set (but the request isn't marked as done) and next_replication = replication_v2. This will be interpreted by load balancer, that will start the rollback of the request. After the rollback is done, we set the relevant system.topology_requests entry as done (failed), clear the request id from system.topology::ongoing_rf_changes, and remove next_replication.

Fixes: SCYLLADB-567.

No backport needed; new feature.

Closes scylladb/scylladb#24421

* github.com:scylladb/scylladb:
  service: fix indentation
  docs: update documentation
  test: test multi RF changes
  service: tasks: allow aborting ongoing RF changes
  cql3: allow changing RF by more than one when adding or removing a DC
  service: handle multi_rf_change
  service: implement make_rf_change_plan
  service: add keyspace_rf_change_plan to migration_plan
  service: extend tablet_migration_info to handle rebuilds
  service: split update_node_load_on_migration
  service: rearrange keyspace_rf_change handler
  db: add columns to system_schema.keyspaces
  db: service: add ongoing_rf_changes to system.topology
  gms: add keyspace_multi_rf_change feature
2026-04-22 01:46:11 +02:00
Nikos Dragazis
295e434781 distributed_loader: Link resharding tasks to migration virtual task
When a table is loaded on startup during a vnodes-to-tablets migration
(forward or rollback), the `table_populator` runs a resharding
compaction.

Set the migration virtual task as parent of the resharding task. This
enables users to easily find all node-local resharding tasks related to
a particular migration.

Make `migration_virtual_task::make_task_id()` public so that the
`distributed_loader` can compute the migration's task ID.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-04-17 20:59:05 +03:00
Nikos Dragazis
696f9f8954 service: Add virtual task for vnodes-to-tablets migrations
Add a virtual task that exposes in-progress vnodes-to-tablets migrations
through the task manager API.

The task is synthesized from the current migration state, so completed
migrations are not shown. Progress is reported as the number of nodes
that currently use tablets: it increases on the forward path and
decreases on rollback. For simplicity, per-node storage modes are not
exposed in the task status; callers that need them should use the
migration status REST endpoint.

Unlike regular tasks that use time-based UUIDs, this task uses
deterministic named UUIDs derived from the keyspace names. This keeps
the implementation simple (no need to persist them) and gives each
keyspace a stable task ID. The downside is that the start time of each
task is unknown and repeated migrations of the same keyspace
(migration -> rollback -> new migration) cannot be distinguished.

Introduce a new task manager module to keep them separate from other
tasks.

Add support for `wait()`. While its practical value is debatable
(migration is a manual procedure, rolling restart will interrupt it), it
keeps the task consistent with the task manager interface.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-04-17 20:59:05 +03:00
Aleksandra Martyniuk
1b2b453782 service: tasks: allow aborting ongoing RF changes
Allow aborting an ongoing RF change using task manager.

RF change can only be aborted if:
- it is currently paused (existing);
- it is a multi-RF change that still has replicas to be added.

In the second case, we set error for the request in system.topology_requests
and set next_replication to replication_v2. This makes load balancer
roll back the RF change.
2026-04-17 09:58:08 +02:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Aleksandra Martyniuk
d4fdeb4839 tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children
In get_children we get the vector of alive nodes with get_nodes.
Yet, between this and sending rpc to those nodes there might be
a preemption. Currently, the liveness of a node is checked once
again before the rpcs (only with gossiper not in topology - unlike
get_nodes).

Modify get_children, so that it keeps a token_metadata_ptr,
preventing topology from changing between get_nodes and rpcs.

Remove test_get_children as it checked if the get_children method
won't fail if a node is down after get_nodes - which cannot happen
currently.
2026-03-18 15:37:24 +01:00
Aleksandra Martyniuk
e02b0d763c service: fix indentation 2026-03-10 14:44:52 +01:00
Aleksandra Martyniuk
0e0070f118 service: remove status_helper::tablets
Currently, status_helper::tablets, which keeps a vector of processed
tablet ids, is used only in tablet_virtual_task::get_status_helper,
so there is no point in returning it. Also, in get_status_helper,
it is used only to determine if any tablets are processed.

Remove status_helper::tablets. Use a flag instead of the vector
in get_status_helper.
2026-03-10 14:42:27 +01:00
Aleksandra Martyniuk
e5928497ce service: tasks: scan all tablets in tablet_virtual_task::wait
Currently, for repair tasks tablet_virtual_task::wait gathers the
ids of tablets that are to be repaired. The gathered set is later
used to check if the repair is still ongoing.

However, if the tablets are resized (split or merged), the gathered
set becomes irrelevant. Those, we may end up with invalid tablet id
error being thrown.

Wait until repair is done for all tablets in the table.
2026-03-10 14:42:21 +01:00
Aleksandra Martyniuk
40dca578c5 test: add test_tablet_repair_wait_with_table_drop 2026-03-06 15:08:29 +01:00
Aleksandra Martyniuk
dd634c329f service: tasks: return successful status if a table was dropped
tablet_virtual_task::wait throws if a table on which a tablet operation
was working is dropped.

Treat the tablet operation as successful if a table is dropped.
2026-03-06 14:37:44 +01:00
Aleksandra Martyniuk
3ed8701301 service: tasks: fix type of global_topology_request_virtual_task
Currently, the type of global_topology_request_virtual_task isn't
taken out of std::variant before printing, which results with
a task of type variant(actual_type).

Retrieve the type from the variant before passing it to task type.
2026-01-16 11:36:21 +01:00
Asias He
4f77dd058d repair: Add tablet repair progress report support
This patch adds tablet repair progress report support so that the user
could use the /task_manager/task_status API to query the progress.

In order to support this, a new system table is introduced to record the
user request related info, i.e, start of the request and end of the
request.

The progress is accurate when tablet split or merge happens in the
middle of the request, since the tokens of the tablet are recorded when
the request is started and when repair of each tablet is finished. The
original tablet repair is considered as finished when the finished
ranges cover the original tablet token ranges.

After this patch, the /task_manager/task_status API will report correct
progress_total and progress_completed.

Fixes #22564
Fixes #26896

Closes scylladb/scylladb#27679
2026-01-08 21:55:18 +02:00
Tomasz Grabiec
c077283352 Merge 'service: support conversion of tablet keyspaces to rack-list using ALTER KEYSPACE' from Aleksandra Martyniuk
If a keyspace has a numeric replication factor in a DC and rf < #racks,
then the replicas of tablets in this keyspace can be distributed among
all racks in the DC (different for each tablet). With rack list, we need all
tablet replicas to be placed on the same racks. Hence, the conversion
requires tablet co-location.

After this series, the conversion can be done using ALTER KEYSPACE
statement. The statement that does this conversion in any DC is not
allowed to change a rf in any DC. So, if we have dc1 and dc2 with 3 racks
each and a keyspace ks then with a single ALTER KEYSPACE we can do:
- {dc1 : 2} -> {dc1 : [r1, r2]};
- {dc1 : 2, dc2: 2} ->  {dc1 : [r1, r2], dc2: [r2,r3]};
- {dc1 : 2, dc2: 2} -> {dc1 : [r1, r2], dc2: 2}
- {dc1 : 2} -> {dc1 : 2, dc2 : [r1]}
But we cannot do:
- {dc1 : 2} -> {dc1 : [r1, r2, r3]};
- {dc1 : 1, dc2 : [r1, r2] → dc1: [r1], dc2: [r1].

In order to do the co-locations rf change request is paused. Tablet
load balancer examines the paused rf change requests and schedules
necessary tablet migrations. During the process of co-location, no other
cross-rack migration is allowed.

Load balancer checks whether any paused rf change request is
ready to be resumed. If so, it puts the request back to global topology
request queue.

While an rf change request for a keyspace is running, any other rf change
of this keyspace will fail.

Fixes: #26398.

New feature, no backport

Closes scylladb/scylladb#27279

* github.com:scylladb/scylladb:
  test: add est_rack_list_conversion_with_two_replicas_in_rack
  test: test creating tablet_rack_list_colocation_plan
  test: add test_numeric_rf_to_rack_list_conversion test
  tasks: service: add global_topology_request_virtual_task
  cql3: statements: allow altering from numeric rf to rack list
  service: topology_coordinator: pause keyspace_rf_change request
  service: implement make_rack_list_colocation_plan
  service: add tablet_rack_list_colocation_plan
  cql3: reject concurrent alter of the same keyspace
  test: check paused rf change requests persistence
  db: service: add paused_rf_change_requests to system.topology
  service: pass topology and system_keyspace to load_balancer ctor
  service: tablet_allocator: extract load updates
  service: tablet_allocator: extract ensure_node
  tasks, system_keyspace: Introduce get_topology_request_entry_opt()
  node_ops: Drop get_pending_ids()
  node_ops: Drop redundant get_status_helper()
2025-12-17 10:05:06 +01:00
Aleksandra Martyniuk
9039dfa4a5 tasks: service: add global_topology_request_virtual_task
Add a service::topo::global_topology_request_virtual_task, which
covers the replication factor changes.

Currently, the global_topology_request_virtual_task can be aborted
only if it is paused.

The progress of the rf change isn't counted.
2025-12-16 13:31:22 +01:00
Avi Kivity
24264e24bb Revert "repair: Add tablet repair progress report support"
This reverts commit faad0167d7. It causes
a regression in

test_two_tablets_concurrent_repair_and_migration_repair_writer_level

in debug mode (with ~5%-10% probability).

Fixes #27510.

Closes scylladb/scylladb#27560
2025-12-11 12:18:11 +02:00
Asias He
faad0167d7 repair: Add tablet repair progress report support
This patch adds tablet repair progress report support so that the user
could use the /task_manager/task_status API to query the progress.

In order to support this, a new system table is introduced to record the
user request related info, i.e, start of the request and end of the
request.

The progress is accurate when tablet split or merge happens in the
middle of the request, since the tokens of the tablet are recorded when
the request is started and when repair of each tablet is finished. The
original tablet repair is considered as finished when the finished
ranges cover the original tablet token ranges.

After this patch, the /task_manager/task_status API will report correct
progress_total and progress_completed.

Fixes #22564
Fixes #26896

Closes scylladb/scylladb#26924
2025-12-08 13:35:19 +02:00
Aleksandra Martyniuk
55fde70f8d api: tasks: task_manager: keep children identities in chunked_{array,vector}
task_status contains a vector of children identities. If the number
of children is large, we may hit oversized allocation.

Change all types of children-related containers to chunked_vector.
Modify the children type returned from task manager API.

Fixes: scylladb#25795.

Closes scylladb/scylladb#25923
2025-09-15 08:44:16 +03:00
Michael Litvak
ddf02c9489 tablets: replace all_tables method
The method all_tables in tablet_metadata is used for iterating over all
tables in the tablet metadata with their tablet maps.

Now that we have co-located tables we need to make the distinction on
which tables we want to iterate over. In some cases we want to iterate
over each group of co-located tables, treating them as one unit, and in
other cases we want to iterate over all tables, doesn't matter if they
are part of a co-located group and have a base table.

We replace all_tables with new methods that can be used for each of the
cases.
2025-07-01 13:20:18 +03:00
Aleksandra Martyniuk
53e0f79947 tasks: check whether a node is alive before rpc
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.
2025-04-17 12:51:22 +02:00
Aleksandra Martyniuk
f8e4198e72 service: tasks: hold token_metadata_ptr in tablet_virtual_task
Hold token_metadata_ptr in tablet_virtual_task methods that iterate
over tablets, to keep the tablet_map alive.

Fixes: https://github.com/scylladb/scylladb/issues/22316.

Closes scylladb/scylladb#22740
2025-02-19 09:33:53 +02:00
Aleksandra Martyniuk
e16b413568 tasks: replace ip with host_id in task_identity
Replace ip with host_id in task_identity. Translate host_id to ip
in task manager api handlers.

Use host_id in send_tasks_get_children.
2025-02-05 10:11:52 +01:00
Aleksandra Martyniuk
683176d3db tasks: add shard, start_time, and end_time to task_stats
task_stats contains short info about a task. To get a list of task_stats
in the module, one needs to request /task_manager/list_module_tasks/{module}.

To make identification and navigation between tasks easier, extend
task_stats to contain shard, start_time, and end_time.

Closes scylladb/scylladb#22351
2025-02-04 12:11:24 +02:00
Botond Dénes
8c8db2052e Merge 'service: add child for tablet repair virtual task' from Aleksandra Martyniuk
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.

tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.

No backport needed; new functionality

Closes scylladb/scylladb#22372

* github.com:scylladb/scylladb:
  test: add test to check tablet repair child
  service: add child for tablet repair virtual task
2025-02-04 12:08:24 +02:00
Aleksandra Martyniuk
610a761ca2 service: use read barrier in tablet_virtual_task::contains
Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.

To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.

Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.

Fixes: #21975.

Closes scylladb/scylladb#21995
2025-02-04 12:07:42 +02:00
Aleksandra Martyniuk
c23ce40f50 service: add child for tablet repair virtual task
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.

tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.
2025-02-03 10:31:14 +01:00
Kefu Chai
1ef2d9d076 tree: migrate from boost::adaptors::transformed to std::views::transform
Replace remaining uses of boost::adaptors::transformed with std::views::transform
to reduce Boost dependencies, following the migration pattern established in
bab12e3a. This change addresses recently merged code that reintroduced Boost
header dependencies through boost::adaptors::transformed usage.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22365
2025-01-17 16:56:40 +02:00
Aleksandra Martyniuk
840bcdc158 service: extend tablet_virtual_task::abort
Set resize tasks as non abortable.
2025-01-10 10:03:08 +01:00
Aleksandra Martyniuk
639470d256 service: retrun status_helper struct from tablet_virtual_task::get_status_helper 2025-01-10 10:03:08 +01:00
Aleksandra Martyniuk
0c7bef6875 service: extend tablet_virtual_task::wait
Extend tablet_virtual_task::wait to support resize tasks.

To decide what is a state of a finished resize virtual task (done
or failed), the tablet count is checked. The task state is set to done,
if the tablet count before resize is different than after.
2025-01-10 10:03:08 +01:00
Aleksandra Martyniuk
adf6b3f3ff service: extend tablet_virtual_task::get_status
Extend tablet_virtual_task::get_status to cover resize tasks.
2025-01-10 10:03:08 +01:00
Aleksandra Martyniuk
78215d64d1 service: extend tablet_virtual_task::contains
Extend tablet_virtual_task::contains to check resize operations.

Methods that do not support resize tasks return immediately if they
are handling split or merge task.
2025-01-10 10:03:07 +01:00
Aleksandra Martyniuk
0df64e18fb service: extend tablet_virtual_task::get_stats
Extend tablet_virtual_task::get_stats to list resize tasks.
2025-01-10 10:03:07 +01:00
Aleksandra Martyniuk
a8d7f4d89a service: add service::task_manager_module::get_nodes 2025-01-10 10:03:07 +01:00
Kefu Chai
6acc5294a4 treewide: migrate from boost::copy_range to std::ranges::to
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::to`.

in this change, we:

- replace `boost::copy_range` to `std::ranges::to`
- remove unused `#include` of boost headers

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21880
2024-12-26 11:46:26 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Avi Kivity
fe9fcdfe30 task_manager.hh: replace boost ranges with std ranges
Standardize on one range library to reduce dependency load.

Unfortunately, std::views::concat (the replacement for boost::join),
is C++26 only. We use two separate inserts to the result vector to
compensate, and rationalize it by saying that boost::join() is likely
slow due to the need for type-erasure.

Closes scylladb/scylladb#21834
2024-12-16 13:08:02 +02:00
Aleksandra Martyniuk
bc17535427 test: add tests to check the failed migration virtual tasks 2024-12-11 15:17:16 +01:00
Aleksandra Martyniuk
e0d3182fa0 service: extend tablet_virtual_task::abort
Set migration tasks as non abortable.
2024-12-11 15:17:15 +01:00
Aleksandra Martyniuk
4c529a8f2e service: extend tablet_virtual_task::wait
Extend tablet_virtual_task::wait to support migration tasks.

To decide what is a state of a finished migration virtual task
(done or failed), the tablet replicas are checked. The task state
is set to done, if the replicas contain the destination of a tablet
migration.
2024-12-11 15:17:14 +01:00
Aleksandra Martyniuk
de191fb851 service: extend tablet_virtual_task::get_status_helper
Extend tablet_virtual_task::get_status_helper to cover migration
tasks. get_status_helper is used by get_status and wait methods.
Waiting for a task in the latter will be modified in the following
patch.
2024-12-11 15:17:14 +01:00
Aleksandra Martyniuk
50ce3d9106 service: extend tablet_virtual_task::contains
Extend tablet_virtual_task::contains to check migration operations.
Returned virtual_task_hint contains also tablet_id (only for migration
tasks) and task_type.

Return immediately from methods that do not support migration
for non-repair task types. The methods' support for migration
will be implemented in the following patches.
2024-12-11 15:17:14 +01:00
Aleksandra Martyniuk
53bd61a539 service: extend tablet_virtual_task::get_stats
Extend tablet_virtual_task::get_stats to list migration tasks.
2024-12-11 15:17:13 +01:00
Aleksandra Martyniuk
215a15d103 service: tasks: make get_table_id a method of virtual_task_hint 2024-12-11 15:17:08 +01:00
Aleksandra Martyniuk
dee6404aa4 locator: rename tablet_task_info methods 2024-12-11 12:07:36 +01:00
Kefu Chai
4bc7e068ff locator: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21754
2024-12-03 11:05:35 +02:00
Aleksandra Martyniuk
409ed508cc service: add tablet_virtual_task
Add tablet_virtual_task, which covers tablet repair.
2024-11-28 11:42:38 +01:00