Commit Graph

1102 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
b5b2ffa5df \test: add test to check concurrent migration and repair of two different tablets
(cherry picked from commit bae6711809)
2025-04-04 10:14:51 +02:00
Aleksandra Martyniuk
307f00a398 repair: release erm in repair_writer_impl::create_writer when possible
Currently, repair_writer_impl::create_writer keeps erm to ensure
that a sharder is valid. If we repair a tablet, erm blocks the state
machine and no operation on any tablet of this table might be performed.

Use auto_refreshing_sharder and topology_guard to ensure that the
operation is safe and that tablet operations on the whole table
aren't blocked.

Fixes: #23453.
(cherry picked from commit 1dc29ddc86)
2025-04-03 13:19:40 +00:00
Asias He
8c93a331f7 repair: Enable small table optimization for system_replicated_keys
This enterprise-only system table is replicated and small. It should be
included for small table optimization.

Fixes scylladb/scylla-enterprise#5256

Closes scylladb/scylladb#23135

Closes scylladb/scylladb#23147
2025-04-01 13:36:51 +03:00
Aleksandra Martyniuk
0a0347cb4e repair: do not hold erm for repair scheduled by scheduler
Do not hold erm	for tablet repair scheduled by scheduler. Thanks to
that one tablet repair won't exclude migration of other tablets.

Concurrent repair and migration of the same tablet isn't possible,
since a tablet can be in one type of transition only at the time.
Hence the change is safe.

Refs: https://github.com/scylladb/scylladb/issues/22408.
(cherry picked from commit 5b792bdc98)
2025-03-19 10:09:51 +01:00
Aleksandra Martyniuk
da64c02b92 repair: get total rf based on current erm
Get total rf based on erm. Currently, it does not change anything
because erm stays the same during the whole repair.

(cherry picked from commit a1375896df)
2025-03-19 10:09:30 +01:00
Aleksandra Martyniuk
39aabe5191 repair: make shard_repair_task_impl::erm private
Make shard_repair_task_impl::erm private. Access it with getter.

(cherry picked from commit 34cd485553)
2025-03-19 10:09:11 +01:00
Aleksandra Martyniuk
9eeff8573b repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
When small_table_optimization isn't enabled, put_row_diff_with_rpc_stream
does not access erm. Pass small_table_optimization_params containing erm
only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit 444c7eab90)
2025-03-19 10:08:22 +01:00
Aleksandra Martyniuk
4115f6f367 repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
When small_table_optimization isn't enabled, flush_rows_in_working_row_buf
does not access erm. Add small_table_optimization_params containing erm and
pass it only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit e56bb5b6e2)
2025-03-19 10:07:45 +01:00
Aleksandra Martyniuk
fb2c46dfbe repair: pass session_id to repair_writer_impl::create_writer
(cherry picked from commit 09c74aa294)
2025-03-19 10:07:00 +01:00
Aleksandra Martyniuk
b4e37600d6 repair: keep materialized topology guard in shard_repair_task_impl
Keep materialized topology guard in shard_repair_task_impl and check
it in check_in_abort_or_shutdown and before each range repair.

(cherry picked from commit 47bb9dcf78)
2025-03-19 10:04:17 +01:00
Aleksandra Martyniuk
6bbf20a440 repair: pass session_id to repair_meta
Pass session_id of tablet repair down the stack from the repair request
to repair_meta.

The session_id will be utiziled in the following patches.

(cherry picked from commit 928f92c780)
2025-03-19 10:02:24 +01:00
Aleksandra Martyniuk
1f102ca2f7 locator: add tablet_task_info::selected_by_filters
Extract dcs and hosts filters check to a method.

(cherry picked from commit fe4e99d7b3)
2025-03-05 10:36:51 +01:00
Asias He
6d5b029812 repair: Introduce Host and DC filter support
Currently, the tablet repair scheduler repairs all replicas of a tablet.
It does not support hosts or DCs selection. It should be enough for most
cases. However, users might still want to limit the repair to certain
hosts or DCs in production. #21985 added the preparation work to add the
config options for the selection. This patch adds the hosts or DCs
selection support.

Fixes #22417

(cherry picked from commit 5545289bfa)
2025-02-27 12:14:44 +01:00
Kefu Chai
6b27459de3 tasks: make release_resources() a coroutine
Convert tasks::task_manager::task::impl::release_resources() to a coroutine
to prepare for upcoming changes that will implement asynchronous resource
release.

This is a preparatory refactoring that enables future coroutine-based
implementation of resource cleanup logic.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 4c1f1baab4)
2025-02-15 22:46:43 +00:00
Aleksandra Martyniuk
8e754e9d41 repair: handle keyspace dropped
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

(cherry picked from commit bfb1704afa)
2025-01-28 21:50:35 +00:00
Tomasz Grabiec
c7f78edc78 Merge 'repair: Wire repair_time in system.tablets for tombstone gc' from Asias He
The repair_time in system.tablets will be updated when repair runs
successfully. We can now use it to update the repair time for tombstone
gc, i.e, when the system.tablets.repair_time is propagated, call
gc_state.update_repair_time() on the node that is the owner of the
tablet.

Since b3b3e880d3 ("repair: Reduce hints and batchlog flush"), the
repair time that could be used for tombstone gc might be smaller than
when the repair is started, so the actual repair time for tombstone gc
is returned by the repair rpc call from the repair master node.

Fixes #17507

New feature. No backport is needed.

Closes scylladb/scylladb#21896

* github.com:scylladb/scylladb:
  repair: Stop using rpc to update repair time for repairs scheduled by scheduler
  repair: Wire repair_time in system.tablets for tombstone gc
  test: Disable flush_cache_time for two tablet repair tests
  test: Introduce guarantee_repair_time_next_second helper
  repair: Return repair time for repair_service::repair_tablet
  service: Add tablet_operation.hh
2025-01-20 18:08:49 +01:00
Asias He
387b2050df repair: Stop using rpc to update repair time for repairs scheduled by scheduler
If a tablet repair is scheduled by tablet repair scheduler, the repair
time for tombstone gc will be updated when the system.tablet.repair_time
is updated. Skip updating using rpc calls in this case.
2025-01-17 16:12:05 +08:00
Asias He
41a1eca072 repair: Return repair time for repair_service::repair_tablet
The repair time returned by repair_service::repair_tablet considers the
hints and batchlog flush time, so it could be used for the tombstone gc
purpose.
2025-01-17 16:12:05 +08:00
Gleb Natapov
f9df092fd1 repair: drop unneeded code
There is a code that creates a map from id to ip and then creates a
vector from the keys of the map. Create a vector directly instead.
2025-01-16 16:37:08 +02:00
Gleb Natapov
ae821ba07a repair: use gossiper to map ip to host id instead of token_metadata
We want to drop ips from token_metadata so move to different API to map
ip to id.
2025-01-15 16:30:29 +02:00
Gleb Natapov
4d7c05ad82 hints: move create_hint_sync_point function to host ids
One of its caller is in the RESTful API which gets ips from the user, so
we convert ips to ids inside the API handler using gossiper before
calling the function. We need to deprecate ip based API and move to host
id based.
2025-01-15 16:30:28 +02:00
Aleksandra Martyniuk
32ab58cdea repair: add repair_service gate
In main.cc storage_service is started before and stopped after
repair_service. storage_service keeps a reference to sharded
repair_service and calls its methods, but nothing ensures that
repair_service's local instance would be alive for the whole
execution of the method.

Add a gate to repair_service and enter it in storage_service
before executing methods on local instances of repair_service.

Fixes: #21964.

Closes scylladb/scylladb#22145
2025-01-14 07:56:38 -05:00
Kefu Chai
f41b030fdd repair: do not include unused header
this unused include was identifier by clang-include-cleaner. after
auditing task_manager_module.hh, the report has been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22200
2025-01-08 12:58:35 +02:00
Aleksandra Martyniuk
a91e03710a repair: check tasks local to given shard
Currently task_manager_module::is_aborted checks the tasks local
to caller's shard on a given shard.

Fix the method to check the task map local to the given shard.

Fixes: #22156.

Closes scylladb/scylladb#22161
2025-01-06 21:53:54 +02:00
Kefu Chai
6acc5294a4 treewide: migrate from boost::copy_range to std::ranges::to
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::to`.

in this change, we:

- replace `boost::copy_range` to `std::ranges::to`
- remove unused `#include` of boost headers

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21880
2024-12-26 11:46:26 +02:00
Kefu Chai
cd2a2bd021 repair: correct misspelling of "corespondent"
replace "corespondent" with "corresponding" in a logging message.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22003
2024-12-23 11:29:58 +02:00
Asias He
0141906c4a repair: Enable small table optimization for RBNO rebuild
Similar to 9ace191616 (repair: Enable
small table optimization for RBNO bootstrap and decommission), this
patch enables small table optimization for RBNO rebuild.

This is useful for rebuild ops which is used for building an empty DC.

Fixes: #21951

Closes scylladb/scylladb#21952
2024-12-20 13:03:34 +02:00
Kefu Chai
6914892a1b repair: do not include unused headers
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21837
2024-12-20 08:55:56 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Gleb Natapov
e479ba88af repair: move repair code to use host ids instead of host ips
The patch is rather large, but it is a straightforward conversion from
one type to another.
2024-12-15 11:31:11 +02:00
Avi Kivity
ecd78c88bf Merge "move more verbs to idl" from Gleb
"
The series moves node ops, repair and streaming verbs to IDL. Also
contains IDL related cleanups.

In addition to the CI tested manually by bootstrapping a node with the
series into a cluster of old nodes with repair and streaming both in
gossiper and raft mode. This exercises repair, streaming and node_ops
paths.

"

* 'gleb/move-more-rpcs-to-idl-v3' of github.com:scylladb/scylla-dev:
  repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such
  streaming: move streaming verbs to IDL
  messaging_service: move repair verbs to IDL
  node_ops: move node_ops_cmd to IDL
  idl: rename partition_checksum.dist.hh to repair.dist.hh
  idl: move node_ops related stuff from the repair related IDL
2024-12-12 17:19:43 +02:00
muthu90tech
e49381119d locator: topology: use node& instead of node*
This change goes thru locator:topology to use node&
instead of node* where nullptr is not possible. There are
places where the node object is used in unordered_set, in
those cases the node is wrapped in std::reference_wrapper.

Fixes scylladb/scylladb#20357

Closes scylladb/scylladb#21863
2024-12-12 13:22:55 +01:00
Gleb Natapov
c095f63ea5 repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such
After b3b3e880d3 target_nodes is not used
by the receiver, so we can skip setting it on sender as well.
2024-12-11 18:26:57 +02:00
Gleb Natapov
bfee93c747 messaging_service: move repair verbs to IDL 2024-12-09 14:50:52 +02:00
Gleb Natapov
5f6007f6ec node_ops: move node_ops_cmd to IDL 2024-12-09 14:50:52 +02:00
Gleb Natapov
39c75d3add idl: rename partition_checksum.dist.hh to repair.dist.hh
The file has many more things than partition_checksum. All of them
are repair related now.
2024-12-09 14:49:59 +02:00
Avi Kivity
9024e4940c counters.hh: drop unused boost includes
Re-add them to source files that need them.

Closes scylladb/scylladb#21738
2024-12-05 12:27:41 +02:00
Avi Kivity
841481c202 Merge "move storage proxy and adjacent services to identify hosts by ids" from Gleb
"
This rather large patch series moves storage proxy and some adjacent
services (like migration manager) to use host ids to identify nodes rather
than ips. Messaging service gains a capability to address nodes by host
ids (which allows dropping translations from topology coordinator code
that worked on host ids already) and also makes sure that a node with
incorrect host id will reject a message (can happen during address
changes).

The series gets rid of the raft address map completely and replaces it with
the gossiper address map which is managed by the gossiper since translation
is now done in the layer below raft.

Fixes: scylladb/scylladb#6403

perf-simple-query -- smp 1 -m 1G output

Before:

enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64336.82 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41291 insns/op,   24485 cycles/op,        0 errors)
62669.58 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41277 insns/op,   24695 cycles/op,        0 errors)
69172.12 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41326 insns/op,   24463 cycles/op,        0 errors)
56706.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41143 insns/op,   24513 cycles/op,        0 errors)
56416.65 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41186 insns/op,   24851 cycles/op,        0 errors)

         throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65
instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80
  cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70

After:

enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
65237.35 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   40733 insns/op,   23145 cycles/op,        0 errors)
59283.09 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40624 insns/op,   23948 cycles/op,        0 errors)
70851.03 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40625 insns/op,   23027 cycles/op,        0 errors)
70549.61 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40650 insns/op,   23266 cycles/op,        0 errors)
68634.96 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40622 insns/op,   22935 cycles/op,        0 errors)

         throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09
instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33
  cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/
SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/

Tested mixed cluster manually.
"

* 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits)
  group0: drop unused field from replace_info struct
  test: rename raft_address_map_test to address_map_test and move if from raft tests
  raft_address_map: remove raft address map
  topology coordinator: do not modify expire state for left/new nodes any more in raft address map
  topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used
  group0: drop raft address map dependency from raft_rpc
  group0: move raft_ticker_type definition from raft_address_map.hh
  storage_service: do not update raft address map on gossiper events
  group0: drop raft address map dependency from raft_server_with_timeouts
  group0: move group0 upgrade code to host ids
  repair: drop raft address map dependency
  group0: remove unused raft address map getter from raft_group0
  group0: drop raft address map from group0_state_machine dependency since it is not used there any more
  group0: remove dependency on raft address map from group0_state_id_handler
  gossiper: add get_application_state_ptr that searches by host_id
  gossiper: change get_live_token_owners to return host ids
  view: move view building to host id
  hints: use host id to send hints
  storage_proxy: remove id_vector_to_addr since it is no longer used
  db: consistency_level: change is_sufficient_live_nodes to work on host ids
  ...
2024-12-03 18:18:48 +02:00
Kefu Chai
bab12e3a98 treewide: migrate from boost::adaptors::transformed to std::views::transform
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.

in this change, we:

- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
  is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
  `std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
  range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
  returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
  by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
  to feed `std::views::transform()` a view range.

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

limitations:

there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21700
2024-12-03 09:41:32 +02:00
Gleb Natapov
4ddb925997 repair: drop raft address map dependency
Replace it with gossiper address map, but make dependency localized.
Only functions that actually use address map get it now.
2024-12-02 10:31:13 +02:00
Gleb Natapov
cd9b349886 migration_manager: move to use host ids instead of ips
Users also amended to pass ids instead of ips.
2024-12-02 10:31:12 +02:00
Gleb Natapov
be5caec54e service: make address_map raft independent
We want to start using address map class outside for raft, so lets make
it work on host_id instead of raft::servers_id and move is outside of
raft.
2024-12-01 12:12:29 +02:00
Kefu Chai
f436edfa22 mutation: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-29 14:01:44 +08:00
Botond Dénes
ccb433d767 Merge 'tasks: add api_task_ttl for tasks started with API' from Aleksandra Martyniuk
When users start an operation asynchronously with API, they are expected to check the operation's status. Hence, the status should be kept in task manager for reasonable time after the operation is done. The operations that are started internally usually don't need to stay in task manager for that long.

Add api_task_ttl that will be used for tasks started with API. By default it's 1 hour. The time for which non-API tasks stay in task manager isn't changed.

Fixes: #21499.
Refs: #21425.

No backport needed - previous versions may use task_ttl

Closes scylladb/scylladb#21505

* github.com:scylladb/scylladb:
  test: add test to check user_task_ttl
  tasks: api: move make_task method
  docs: nodetool: update backup and restore commands docs
  docs: update task manager docs
  nodetool: add nodetool tasks user-ttl command
  node_ops: use user task ttl for node ops virtual task
  tasks: use user_task_ttl for tasks started by user
  api: task_manager: add /task_manager/user_ttl to get and set user task ttl
  tasks: add task_manager::task::is_user_task method
  tasks: keep updateable_value of task_ttl in task manager
  db: config: add user_task_ttl_seconds named value
2024-11-27 09:57:57 +02:00
Kefu Chai
a5ee0c896b treewide: migrate from boost::adaptors::filtered to std::views::filter
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.

Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views

Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices

Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
   - `std::ranges::to` adaptor requires rvalue references
   - Necessitated updates to variable and parameter constness
   - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
     from `common` to enable efficient range conversion

2. Range Iteration and Mutation
   - Range views may mutate internal state during iteration
   - Cannot pass ranges by const reference in some scenarios
   - Solution: Pass ranges by rvalue reference to explicitly indicate
     state invalidation

Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
  due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change

This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21648
2024-11-26 14:26:50 +02:00
Aleksandra Martyniuk
292d00463a tasks: add task_manager::task::is_user_task method 2024-11-25 14:21:53 +01:00
Asias He
9ace191616 repair: Enable small table optimization for RBNO bootstrap and decommission
The non local strategy system keyspaces usually contain very litte data.
All the tables within them have to be repaired for all the token ranges,
which could be large in clusters with a large number of nodes. In multiple
DC setup, the repair in RBNO is dominated by the network latency. As a
result, it takes a long time to repair those tables even if they are
almost empty.

To speed up the RBNO bootstrap, especially for starting empty clusters,
this patch enables small table optimization for RBNO for system tables.
We could enable it for small user tables as a follow up.

Tests:

1) A 5ms latency is added to simulate cross dc network delay, 256 tokens
per node, 10 nodes:

- Before

  topology_custom dev topology_custom.test_boot_time.1 1287.06s

- After

  topology_custom dev topology_custom.test_boot_time.1 12.48s

The test shows 100X boot time improvement

2) A SCT test to bootstrap 3 DCs, 3 nodes in each DC.

- Before

  Time to bootstrap = 1h23m

- After

  Time to bootstrap = 13m

The test shows 6X bootstrap time improvement

Fixes #19131
2024-11-25 13:46:17 +08:00
Asias He
69531ed8a6 repair: Move flush_rows after repair_meta class
In the next patch, it will call a member function of repair_meta class.

Refs #19131
2024-11-25 13:46:17 +08:00
Kefu Chai
e2e6f4f441 repair: s/Exceute/Execute/ in logging message
fix a typo in the logging message.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21661
2024-11-22 15:21:56 +02:00
Tomasz Grabiec
0d2583600d Merge 'Add tablet repair scheduler support' from Asias He
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.

The current repair scheduler implementation does the following:

- A tablet is picked to be repaired when is requested by user

- The tablet repair can be scheduled along with tablet migration and
  rebuild. It runs in the tablet_migration track.

- Repair jobs are scheduled in a smart way so that at any point in time,
  there are no more than configured jobs per shard, which is similar to
  scylla manager's control.

New feature. No backport is needed.

Closes scylladb/scylladb#21088

* github.com:scylladb/scylladb:
  test: Add tests for tablet repair scheduler
  repair: Add restful API for tablet repair
  repair: Add tablet repair scheduler internal API support
  docs: Update system_keyspace.md for tablet repair related info
  docs: Add docs for tablet repair migration
  repair: Add core tablet repair scheduler support
  messaging_service: Introduce TABLET_REPAIR verb
  tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info
  network_topology_strategy: Preserve fields of task_info in reallocate_tablets
2024-11-20 13:28:17 +01:00