storage_service: raft topology: warn when raft_topology_cmd_handler fails due to abort

Currently we log every exception in `raft_topology_cmd_handler` at ERROR level. That level is too high: some exceptions are expected, for example during shutdown, and the ERROR lines cause dtest failures.

Log exceptions caused by aborts at WARN level instead. Also improve logging by printing the command that failed.

Fixes scylladb/scylladb#19754

(cherry picked from commit 7506709573)

Closes scylladb/scylladb#20072
raft topology: improve logging
2024-08-08 18:14:24 +02:00 · 2024-08-08 11:59:34 +03:00 · 2024-08-08 11:57:09 +03:00 · 2024-08-07 10:55:06 +02:00 · 2024-08-05 12:52:32 +02:00 · 2024-08-05 09:46:48 +02:00
303 changed files with 6942 additions and 2102 deletions
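The log-level policy described in the commit message can be sketched as follows. This is only an illustration in Python: the actual change is in ScyllaDB's C++ code and is not part of the excerpt shown here. `AbortRequested`, `run_command`, the logger name and the message format are all hypothetical names chosen for the sketch.

```python
import logging

logger = logging.getLogger("raft_topology_sketch")  # hypothetical logger name


class AbortRequested(Exception):
    """Hypothetical stand-in for an exception raised because of an abort."""


def run_command(cmd_name, handler):
    """Run handler() and log failures, including the name of the failed command.

    Abort-driven failures are logged at WARNING, everything else at ERROR.
    Returns the log level that was used, or None if handler() succeeded.
    """
    try:
        handler()
        return None
    except AbortRequested as e:
        # Expected during shutdown, so it should not be reported as an error.
        level, err = logging.WARNING, e
    except Exception as e:
        level, err = logging.ERROR, e
    logger.log(level, "raft_topology_cmd %s failed with: %s", cmd_name, err)
    return level
```

Returning the level instead of re-raising is a simplification that keeps the sketch easy to check; it says nothing about how the real handler reports the failure to its caller.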
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1 +0,0 @@
-**Please replace this line with justification for the backport/\* labels added to this PR**
--- a/.github/scripts/label_promoted_commits.py
+++ b/.github/scripts/label_promoted_commits.py
@@ -1,4 +1,3 @@
-import requests
 from github import Github
 import argparse
 import re
@@ -23,36 +22,65 @@ def parser():
                             'commit, exclusive).')
    parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '
                                                                         'done')
-    parser.add_argument('--label', type=str, required=True, help='Label to use')
+    parser.add_argument('--ref', type=str, required=True, help='PR target branch')
    return parser.parse_args()


+def add_comment_and_close_pr(pr, comment):
+    if pr.state == 'open':
+        pr.create_issue_comment(comment)
+        pr.edit(state="closed")
+
+
+def mark_backport_done(repo, ref_pr_number, branch):
+    pr = repo.get_pull(int(ref_pr_number))
+    label_to_remove = f'backport/{branch}'
+    label_to_add = f'{label_to_remove}-done'
+    current_labels = [label.name for label in pr.get_labels()]
+    if label_to_remove in current_labels:
+        pr.remove_from_labels(label_to_remove)
+    if label_to_add not in current_labels:
+        pr.add_to_labels(label_to_add)
+
+
 def main():
+    # This script is triggered by a push event to either the master branch or a branch named branch-x.y (where x and y represent version numbers). Based on the pushed branch, the script performs the following actions:
+    # - When ref branch is `master`, it will add the `promoted-to-master` label, which we need later for the auto backport process
+    # - When ref branch is `branch-x.y` (which means we backported a patch), it will replace the `backport/x.y` label on the original PR with `backport/x.y-done` and will close the backport PR (GitHub automatically closes only PRs that target the default branch)
    args = parser()
    pr_pattern = re.compile(r'Closes .*#([0-9]+)')
+    target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)
    g = Github(github_token)
    repo = g.get_repo(args.repository, lazy=False)
-
    commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)
+    processed_prs = set()
    # Print commit information
    for commit in commits.commits:
-        print(commit.sha)
+        print(f'Commit sha is: {commit.sha}')
        match = pr_pattern.search(commit.commit.message)
        if match:
-            pr_number = match.group(1)
-            url = f'https://api.github.com/repos/{args.repository}/issues/{pr_number}/labels'
-            data = {
-                "labels": [f'{args.label}']
-            }
-            headers = {
-                "Authorization": f"token {github_token}",
-                "Accept": "application/vnd.github.v3+json"
-            }
-            response = requests.post(url, headers=headers, json=data)
-            if response.ok:
-                print(f"Label added successfully to {url}")
+            pr_number = int(match.group(1))
+            if pr_number in processed_prs:
+                continue
+            if target_branch:
+                pr = repo.get_pull(pr_number)
+                branch_name = target_branch[1]
+                refs_pr = re.findall(r'Refs (?:#|https.*?)(\d+)', pr.body)
+                if refs_pr:
+                    print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')
+                    # 1. change the backport label of the parent PR to note that
+                    #    we've merged the corresponding backport PR
+                    # 2. close the backport PR and leave a comment on it to note
+                    #    that it has been merged with a certain git commit.
+                    ref_pr_number = refs_pr[0]
+                    mark_backport_done(repo, ref_pr_number, branch_name)
+                    comment = f'Closed via {commit.sha}'
+                    add_comment_and_close_pr(pr, comment)
            else:
-                print(f"No label was added to {url}")
+                print(f'master branch, pr number is: {pr_number}')
+                pr = repo.get_pull(pr_number)
+                pr.add_to_labels('promoted-to-master')
+            processed_prs.add(pr_number)


 if __name__ == "__main__":
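The branching in `main()` above can be exercised without touching the GitHub API. The sketch below copies the three regular expressions from the script and returns the actions as plain tuples. `plan_actions` and the tuple shapes are hypothetical, the `or ''` guard for an empty PR body is an addition of this sketch, and the `processed_prs` de-duplication and the label/state checks done by `mark_backport_done` and `add_comment_and_close_pr` are left out.

```python
import re

# Patterns copied from label_promoted_commits.py in the diff above.
PR_PATTERN = re.compile(r'Closes .*#([0-9]+)')
BRANCH_PATTERN = re.compile(r'branch-(\d+\.\d+)')
REFS_PATTERN = re.compile(r'Refs (?:#|https.*?)(\d+)')


def plan_actions(ref, commit_message, pr_body):
    """Return the actions the script would take for one commit (hypothetical helper)."""
    closes = PR_PATTERN.search(commit_message)
    if not closes:
        return []  # the commit does not close a PR, nothing to do
    pr_number = int(closes.group(1))

    target = BRANCH_PATTERN.search(ref)
    if not target:
        # Push to master: only mark the PR as promoted.
        return [('add_label', pr_number, 'promoted-to-master')]

    # Push to branch-x.y: the closed PR is a backport PR whose body
    # points at the original PR with a "Refs" marker.
    branch = target.group(1)
    refs = REFS_PATTERN.findall(pr_body or '')  # guard added in this sketch
    if not refs:
        return []
    parent = int(refs[0])
    return [
        ('remove_label', parent, f'backport/{branch}'),
        ('add_label', parent, f'backport/{branch}-done'),
        ('close', pr_number, None),
    ]
```

For example, a push to `refs/heads/branch-6.0` whose commit message contains `Closes scylladb/scylladb#20072`, and whose PR body contains `Refs #19754`, yields a label swap on PR 19754 and a close on PR 20072.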
--- a/.github/workflows/add-label-when-promoted.yaml
+++ b/.github/workflows/add-label-when-promoted.yaml
@@ -4,6 +4,10 @@ on:
  push:
    branches:
      - master
+      - branch-*.*
+
+env:
+  DEFAULT_BRANCH: 'master'

 jobs:
  check-commit:
@@ -15,6 +19,8 @@ jobs:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
+          repository: ${{ github.repository }}
+          ref: ${{ env.DEFAULT_BRANCH }}
          fetch-depth: 0  # Fetch all history for all tags and branches

      - name: Install dependencies
@@ -23,4 +29,4 @@ jobs:
      - name: Run python script
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --label promoted-to-master
+        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --ref ${{ github.ref }}
--- a/2
+++ b/2
@@ -78,7 +78,7 @@ fi

 # Default scylla product/version tags
 PRODUCT=scylla
-VERSION=5.5.0-dev
+VERSION=6.0.3

 if test -f version
 then
--- a/alternator/executor.cc
+++ b/alternator/executor.cc
@@ -4576,7 +4576,7 @@ static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_vie
    // used by default on new Alternator tables. Change this initialization
    // to 0 to enable tablets by default, with an automatic number of tablets.
    std::optional<unsigned> initial_tablets;
-    if (sp.get_db().local().get_config().check_experimental(db::experimental_features_t::feature::TABLETS)) {
+    if (sp.get_db().local().get_config().enable_tablets()) {
        auto it = tags_map.find(INITIAL_TABLETS_TAG_KEY);
        if (it != tags_map.end()) {
            // Tag set. If it's a valid number, use it. If not - e.g., it's
--- a/alternator/server.cc
+++ b/alternator/server.cc
@@ -211,7 +211,10 @@ protected:
        sstring local_dc = topology.get_datacenter();
        std::unordered_set<gms::inet_address> local_dc_nodes = topology.get_datacenter_endpoints().at(local_dc);
        for (auto& ip : local_dc_nodes) {
-            if (_gossiper.is_alive(ip)) {
+            // Note that it's not enough for the node to be is_alive() - a
+            // node joining the cluster is also "alive" but not responsive to
+            // requests. We need the node to be in normal state. See #19694.
+            if (_gossiper.is_normal(ip)) {
                rjson::push_back(results, rjson::from_string(fmt::to_string(ip)));
            }
        }
--- a/api/api-doc/error_injection.json
+++ b/api/api-doc/error_injection.json
@@ -63,6 +63,28 @@
                     "paramType":"path"
                  }
               ]
+            },
+            {
+               "method":"GET",
+               "summary":"Read the state of an injection from all shards",
+               "type":"array",
+               "items":{
+                  "type":"error_injection_info"
+               },
+               "nickname":"read_injection",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[
+                  {
+                     "name":"injection",
+                     "description":"injection name",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                  }
+               ]
            }
         ]
      },
@@ -152,5 +174,39 @@
            }
         }
      }
+   },
+   "models":{
+      "mapper":{
+         "id":"mapper",
+         "description":"A key value mapping",
+         "properties":{
+            "key":{
+               "type":"string",
+               "description":"The key"
+            },
+            "value":{
+               "type":"string",
+               "description":"The value"
+            }
+         }
+      },
+       "error_injection_info":{
+         "id":"error_injection_info",
+         "description":"Information about an error injection",
+         "properties":{
+            "enabled":{
+               "type":"boolean",
+               "description":"Is the error injection enabled"
+            },
+            "parameters":{
+               "type":"array",
+               "items":{
+                  "type":"mapper"
+               },
+               "description":"The parameter values"
+            }
+         },
+         "required":["enabled"]
+      }
   }
 }
--- a/api/compaction_manager.cc
+++ b/api/compaction_manager.cc
@@ -7,6 +7,7 @@
 */

 #include <seastar/core/coroutine.hh>
+#include <seastar/coroutine/exception.hh>

 #include "compaction_manager.hh"
 #include "compaction/compaction_manager.hh"
@@ -153,10 +154,13 @@ void set_compaction_manager(http_context& ctx, routes& r) {
    });

    cm::get_compaction_history.set(r, [&ctx] (std::unique_ptr<http::request> req) {
-        std::function<future<>(output_stream<char>&&)> f = [&ctx](output_stream<char>&& s) {
-            return do_with(output_stream<char>(std::move(s)), true, [&ctx] (output_stream<char>& s, bool& first){
-                return s.write("[").then([&ctx, &s, &first] {
-                    return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {
+        std::function<future<>(output_stream<char>&&)> f = [&ctx] (output_stream<char>&& out) -> future<> {
+            auto s = std::move(out);
+            bool first = true;
+            std::exception_ptr ex;
+            try {
+                co_await s.write("[");
+                co_await ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable -> future<> {
                        cm::history h;
                        h.id = fmt::to_string(entry.id);
                        h.ks = std::move(entry.ks);
@@ -170,18 +174,21 @@ void set_compaction_manager(http_context& ctx, routes& r) {
                            e.value = it.second;
                            h.rows_merged.push(std::move(e));
                        }
-                        auto fut = first ? make_ready_future<>() : s.write(", ");
+                        if (!first) {
+                            co_await s.write(", ");
+                        }
                        first = false;
-                        return fut.then([&s, h = std::move(h)] {
-                            return formatter::write(s, h);
-                        });
-                    }).then([&s] {
-                        return s.write("]").then([&s] {
-                            return s.close();
-                        });
+                        co_await formatter::write(s, h);
                    });
-                });
-            });
+                co_await s.write("]");
+                co_await s.flush();
+            } catch (...) {
+                ex = std::current_exception();
+            }
+            co_await s.close();
+            if (ex) {
+                co_await coroutine::return_exception_ptr(std::move(ex));
+            }
        };
        return make_ready_future<json::json_return_type>(std::move(f));
    });
--- a/api/error_injection.cc
+++ b/api/error_injection.cc
@@ -64,6 +64,32 @@ void set_error_injection(http_context& ctx, routes& r) {
        });
    });

+    hf::read_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
+        const sstring injection = req->get_path_param("injection");
+
+        std::vector<error_injection_json::error_injection_info> error_injection_infos(smp::count, error_injection_json::error_injection_info{});
+
+        co_await smp::invoke_on_all([&] {
+            auto& info = error_injection_infos[this_shard_id()];
+            auto& errinj = utils::get_local_injector();
+            const auto enabled = errinj.is_enabled(injection);
+            info.enabled = enabled;
+            if (!enabled) {
+                return;
+            }
+            std::vector<error_injection_json::mapper> parameters;
+            for (const auto& p : errinj.get_injection_parameters(injection)) {
+                error_injection_json::mapper param;
+                param.key = p.first;
+                param.value = p.second;
+                parameters.push_back(std::move(param));
+            }
+            info.parameters = std::move(parameters);
+        });
+
+        co_return json::json_return_type(error_injection_infos);
+    });
+
    hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {
        auto& errinj = utils::get_local_injector();
        return errinj.disable_on_all().then([] {
--- a/api/raft.cc
+++ b/api/raft.cc
@@ -61,17 +61,31 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
        co_return json_void{};
    });
    r::get_leader_host.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
-        return smp::submit_to(0, [&] {
-            auto& srv = std::invoke([&] () -> raft::server& {
-                if (req->query_parameters.contains("group_id")) {
-                    raft::group_id id{utils::UUID{req->get_query_param("group_id")}};
-                    return raft_gr.local().get_server(id);
-                } else {
-                    return raft_gr.local().group0();
-                }
+        if (!req->query_parameters.contains("group_id")) {
+            const auto leader_id = co_await raft_gr.invoke_on(0, [] (service::raft_group_registry& raft_gr) {
+                auto& srv = raft_gr.group0();
+                return srv.current_leader();
            });
-            return json_return_type(srv.current_leader().to_sstring());
+            co_return json_return_type{leader_id.to_sstring()};
+        }
+
+        const raft::group_id gid{utils::UUID{req->get_query_param("group_id")}};
+
+        std::atomic<bool> found_srv{false};
+        std::atomic<raft::server_id> leader_id = raft::server_id::create_null_id();
+        co_await raft_gr.invoke_on_all([gid, &found_srv, &leader_id] (service::raft_group_registry& raft_gr) {
+            if (raft_gr.find_server(gid)) {
+                found_srv = true;
+                leader_id = raft_gr.get_server(gid).current_leader();
+            }
+            return make_ready_future<>();
        });
+
+        if (!found_srv) {
+            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};
+        }
+
+        co_return json_return_type(leader_id.load().to_sstring());
    });
 }

--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -36,6 +36,7 @@
 #include <seastar/http/exception.hh>
 #include <seastar/core/coroutine.hh>
 #include <seastar/coroutine/parallel_for_each.hh>
+#include <seastar/coroutine/exception.hh>
 #include "repair/row_level.hh"
 #include "locator/snitch_base.hh"
 #include "column_family.hh"
@@ -1685,32 +1686,41 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_
    ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
        auto result = co_await snap_ctl.local().get_snapshot_details();
        co_return std::function([res = std::move(result)] (output_stream<char>&& o) -> future<> {
-            auto result = std::move(res);
+            std::exception_ptr ex;
            output_stream<char> out = std::move(o);
-            bool first = true;
+            try {
+                auto result = std::move(res);
+                bool first = true;

-            co_await out.write("[");
-            for (auto& [name, details] : result) {
-                if (!first) {
-                    co_await out.write(", ");
+                co_await out.write("[");
+                for (auto& [name, details] : result) {
+                    if (!first) {
+                        co_await out.write(", ");
+                    }
+                    std::vector<ss::snapshot> snapshot;
+                    for (auto& cf : details) {
+                        ss::snapshot snp;
+                        snp.ks = cf.ks;
+                        snp.cf = cf.cf;
+                        snp.live = cf.details.live;
+                        snp.total = cf.details.total;
+                        snapshot.push_back(std::move(snp));
+                    }
+                    ss::snapshots all_snapshots;
+                    all_snapshots.key = name;
+                    all_snapshots.value = std::move(snapshot);
+                    co_await all_snapshots.write(out);
+                    first = false;
                }
-                std::vector<ss::snapshot> snapshot;
-                for (auto& cf : details) {
-                    ss::snapshot snp;
-                    snp.ks = cf.ks;
-                    snp.cf = cf.cf;
-                    snp.live = cf.details.live;
-                    snp.total = cf.details.total;
-                    snapshot.push_back(std::move(snp));
-                }
-                ss::snapshots all_snapshots;
-                all_snapshots.key = name;
-                all_snapshots.value = std::move(snapshot);
-                co_await all_snapshots.write(out);
-                first = false;
+                co_await out.write("]");
+                co_await out.flush();
+            } catch (...) {
+              ex = std::current_exception();
            }
-            co_await out.write("]");
            co_await out.close();
+            if (ex) {
+                co_await coroutine::return_exception_ptr(std::move(ex));
+            }
        });
    });

--- a/api/task_manager.cc
+++ b/api/task_manager.cc
@@ -7,6 +7,7 @@
 */

 #include <seastar/core/coroutine.hh>
+#include <seastar/coroutine/exception.hh>
 #include <seastar/http/exception.hh>

 #include "task_manager.hh"
@@ -23,6 +24,8 @@ namespace tm = httpd::task_manager_json;
 using namespace json;
 using namespace seastar::httpd;

+using task_variant = std::variant<tasks::task_manager::foreign_task_ptr, tasks::task_manager::task::task_essentials>;
+
 inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<sstring, sstring>& query_params) {
    return (!query_params.contains("keyspace") || query_params["keyspace"] == task->get_status().keyspace) &&
        (!query_params.contains("table") || query_params["table"] == task->get_status().table);
@@ -102,13 +105,14 @@ future<full_task_status> retrieve_status(const tasks::task_manager::foreign_task
    s.module = task->get_module_name();
    s.progress.completed = progress.completed;
    s.progress.total = progress.total;
-    std::vector<std::string> ct{task->get_children().size()};
-    boost::transform(task->get_children(), ct.begin(), [] (const auto& child) {
+    std::vector<std::string> ct = co_await task->get_children().map_each_task<std::string>([] (const tasks::task_manager::foreign_task_ptr& child) {
        return child->id().to_sstring();
+    }, [] (const tasks::task_manager::task::task_essentials& child) {
+        return child.task_status.id.to_sstring();
    });
    s.children_ids = std::move(ct);
    co_return s;
-}
+};

 void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg) {
    tm::get_modules.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
@@ -138,19 +142,28 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

        std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
            auto s = std::move(os);
-            auto res = std::move(r);
-            co_await s.write("[");
-            std::string delim = "";
-            for (auto& v: res) {
-                for (auto& stats: v) {
-                    co_await s.write(std::exchange(delim, ", "));
-                    tm::task_stats ts;
-                    ts = stats;
-                    co_await formatter::write(s, ts);
+            std::exception_ptr ex;
+            try {
+                auto res = std::move(r);
+                co_await s.write("[");
+                std::string delim = "";
+                for (auto& v: res) {
+                    for (auto& stats: v) {
+                        co_await s.write(std::exchange(delim, ", "));
+                        tm::task_stats ts;
+                        ts = stats;
+                        co_await formatter::write(s, ts);
+                    }
                }
+                co_await s.write("]");
+                co_await s.flush();
+            } catch (...) {
+                ex = std::current_exception();
            }
-            co_await s.write("]");
            co_await s.close();
+            if (ex) {
+                co_await coroutine::return_exception_ptr(std::move(ex));
+            }
        };
        co_return std::move(f);
    });
@@ -179,7 +192,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
                if (!task->is_abortable()) {
                    co_await coroutine::return_exception(std::runtime_error("Requested task cannot be aborted"));
                }
-                co_await task->abort();
+                task->abort();
            });
        } catch (tasks::task_manager::task_not_found& e) {
            throw bad_param_exception(e.what());
@@ -193,7 +206,6 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
        try {
            task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) {
                return task->done().then_wrapped([task] (auto f) {
-                    task->unregister_task();
                    // done() is called only because we want the task to be complete before getting its status.
                    // The future should be ignored here as the result does not matter.
                    f.ignore_ready_future();
@@ -210,7 +222,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
    tm::get_task_status_recursively.set(r, [&_tm = tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
        auto& tm = _tm;
        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
-        std::queue<tasks::task_manager::foreign_task_ptr> q;
+        std::queue<task_variant> q;
        utils::chunked_vector<full_task_status> res;

        tasks::task_manager::foreign_task_ptr task;
@@ -230,10 +242,33 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
        q.push(co_await task.copy());   // Task cannot be moved since we need it to be alive during whole loop execution.
        while (!q.empty()) {
            auto& current = q.front();
-            res.push_back(co_await retrieve_status(current));
-            for (auto& child: current->get_children()) {
-                q.push(co_await child.copy());
-            }
+            co_await std::visit(overloaded_functor {
+                [&] (const tasks::task_manager::foreign_task_ptr& task) -> future<> {
+                    res.push_back(co_await retrieve_status(task));
+                    co_await task->get_children().for_each_task([&q] (const tasks::task_manager::foreign_task_ptr& child) -> future<> {
+                        q.push(co_await child.copy());
+                    }, [&] (const tasks::task_manager::task::task_essentials& child) {
+                        q.push(child);
+                        return make_ready_future();
+                    });
+                },
+                [&] (const tasks::task_manager::task::task_essentials& task) -> future<> {
+                    res.push_back(full_task_status{
+                        .task_status = task.task_status,
+                        .type = task.type,
+                        .progress = task.task_progress,
+                        .parent_id = task.parent_id,
+                        .abortable = task.abortable,
+                        .children_ids = boost::copy_range<std::vector<std::string>>(task.failed_children | boost::adaptors::transformed([] (auto& child) {
+                            return child.task_status.id.to_sstring();
+                        }))
+                    });
+                    for (auto& child: task.failed_children) {
+                        q.push(child);
+                    }
+                    return make_ready_future();
+                }
+            }, current);
            q.pop();
        }

--- a/api/task_manager_test.cc
+++ b/api/task_manager_test.cc
@@ -89,14 +89,13 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man
        std::string error = fail ? it->second : "";

        try {
-            co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {
+            co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) -> future<> {
                tasks::test_task test_task{task};
                if (fail) {
-                    test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));
+                    co_await test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));
                } else {
-                    test_task.finish();
+                    co_await test_task.finish();
                }
-                return make_ready_future<>();
            });
        } catch (tasks::task_manager::task_not_found& e) {
            throw bad_param_exception(e.what());
--- a/auth/common.cc
+++ b/auth/common.cc
@@ -24,7 +24,6 @@
 #include "service/raft/group0_state_machine.hh"
 #include "timeout_config.hh"
 #include "db/config.hh"
-#include "db/system_auth_keyspace.hh"
 #include "utils/error_injection.hh"

 namespace auth {
@@ -41,14 +40,14 @@ constinit const std::string_view AUTH_PACKAGE_NAME("org.apache.cassandra.auth.")
 static logging::logger auth_log("auth");

 bool legacy_mode(cql3::query_processor& qp) {
-    return qp.auth_version < db::system_auth_keyspace::version_t::v2;
+    return qp.auth_version < db::system_keyspace::auth_version_t::v2;
 }

 std::string_view get_auth_ks_name(cql3::query_processor& qp) {
    if (legacy_mode(qp)) {
        return meta::legacy::AUTH_KS;
    }
-    return db::system_auth_keyspace::NAME;
+    return db::system_keyspace::NAME;
 }

 // Func must support being invoked more than once.
@@ -123,7 +122,7 @@ static future<> announce_mutations_with_guard(
        ::service::raft_group0_client& group0_client,
        std::vector<canonical_mutation> muts,
        ::service::group0_guard group0_guard,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout) {
    auto group0_cmd = group0_client.prepare_command(
        ::service::write_mutations{
@@ -139,7 +138,7 @@ future<> announce_mutations_with_batching(
        ::service::raft_group0_client& group0_client,
        start_operation_func_t start_operation_func,
        std::function<mutations_generator(api::timestamp_type& t)> gen,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout) {
    // account for command's overhead, it's better to use smaller threshold than constantly bounce off the limit
    size_t memory_threshold = group0_client.max_command_size() * 0.75;
@@ -190,7 +189,7 @@ future<> announce_mutations(
        ::service::raft_group0_client& group0_client,
        const sstring query_string,
        std::vector<data_value_or_unset> values,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout) {
    auto group0_guard = co_await group0_client.start_operation(as, timeout);
    auto timestamp = group0_guard.write_timestamp();
--- a/auth/common.hh
+++ b/auth/common.hh
@@ -84,7 +84,7 @@ future<> create_legacy_metadata_table_if_missing(
 // Execute update query via group0 mechanism, mutations will be applied on all nodes.
 // Use this function when need to perform read before write on a single guard or if
 // you have more than one mutation and potentially exceed single command size limit.
-using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source*)>;
+using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source&)>;
 using mutations_generator = coroutine::experimental::generator<mutation>;
 future<> announce_mutations_with_batching(
        ::service::raft_group0_client& group0_client,
@@ -93,7 +93,7 @@ future<> announce_mutations_with_batching(
        // function here
        start_operation_func_t start_operation_func,
        std::function<mutations_generator(api::timestamp_type& t)> gen,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout);

 // Execute update query via group0 mechanism, mutations will be applied on all nodes.
@@ -102,7 +102,7 @@ future<> announce_mutations(
        ::service::raft_group0_client& group0_client,
        const sstring query_string,
        std::vector<data_value_or_unset> values,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout);

 }
--- a/auth/default_authorizer.cc
+++ b/auth/default_authorizer.cc
@@ -9,7 +9,7 @@
 */

 #include "auth/default_authorizer.hh"
-#include "db/system_auth_keyspace.hh"
+#include "db/system_keyspace.hh"

 extern "C" {
 #include <crypt.h>
@@ -203,7 +203,7 @@ default_authorizer::modify(
                cql3::query_processor::cache_internal::no).discard_result();
    }
    co_return co_await announce_mutations(_qp, _group0_client, query,
-        {permissions::to_strings(set), sstring(role_name), resource.name()}, &_as, ::service::raft_timeout{});
+        {permissions::to_strings(set), sstring(role_name), resource.name()}, _as, ::service::raft_timeout{});
 }


@@ -256,7 +256,7 @@ future<> default_authorizer::revoke_all(std::string_view role_name) {
                    {sstring(role_name)},
                    cql3::query_processor::cache_internal::no).discard_result();
        } else {
-            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, &_as, ::service::raft_timeout{});
+            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, _as, ::service::raft_timeout{});
        }
    } catch (exceptions::request_execution_exception& e) {
        alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", role_name, e);
@@ -346,9 +346,9 @@ future<> default_authorizer::revoke_all(const resource& resource) {
        const auto timeout = ::service::raft_timeout{};
        co_await announce_mutations_with_batching(
                _group0_client,
-                [this, timeout](abort_source* as) { return _group0_client.start_operation(as, timeout); },
+                [this, timeout](abort_source& as) { return _group0_client.start_operation(as, timeout); },
                std::move(gen),
-                &_as,
+                _as,
            timeout);
    } catch (exceptions::request_execution_exception& e) {
        alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", name, e);
--- a/auth/password_authenticator.cc
+++ b/auth/password_authenticator.cc
@@ -136,7 +136,7 @@ future<> password_authenticator::create_default_if_missing() {
        plogger.info("Created default superuser authentication record.");
    } else {
        co_await announce_mutations(_qp, _group0_client, query,
-            {salted_pwd, _superuser}, &_as, ::service::raft_timeout{});
+            {salted_pwd, _superuser}, _as, ::service::raft_timeout{});
        plogger.info("Created default superuser authentication record.");
    }
 }
@@ -271,7 +271,7 @@ future<> password_authenticator::create(std::string_view role_name, const authen
                cql3::query_processor::cache_internal::no).discard_result();
    } else {
        co_await announce_mutations(_qp, _group0_client, query,
-                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}, &_as, ::service::raft_timeout{});
+                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}, _as, ::service::raft_timeout{});
    }
 }

@@ -294,7 +294,7 @@ future<> password_authenticator::alter(std::string_view role_name, const authent
                cql3::query_processor::cache_internal::no).discard_result();
    } else {
        co_await announce_mutations(_qp, _group0_client, query,
-            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}, &_as, ::service::raft_timeout{});
+            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}, _as, ::service::raft_timeout{});
    }
 }

@@ -311,7 +311,7 @@ future<> password_authenticator::drop(std::string_view name) {
                {sstring(name)},
                cql3::query_processor::cache_internal::no).discard_result();
    } else {
-        co_await announce_mutations(_qp, _group0_client, query, {sstring(name)}, &_as, ::service::raft_timeout{});
+        co_await announce_mutations(_qp, _group0_client, query, {sstring(name)}, _as, ::service::raft_timeout{});
    }
 }

--- a/auth/service.cc
+++ b/auth/service.cc
@@ -28,7 +28,6 @@
 #include "db/config.hh"
 #include "db/consistency_level_type.hh"
 #include "db/functions/function_name.hh"
-#include "db/system_auth_keyspace.hh"
 #include "log.hh"
 #include "schema/schema_fwd.hh"
 #include <seastar/core/future.hh>
@@ -644,7 +643,7 @@ future<> migrate_to_auth_v2(db::system_keyspace& sys_ks, ::service::raft_group0_
                }
                auto muts = co_await qp.get_mutations_internal(
                        format("INSERT INTO {}.{} ({}) VALUES ({})",
-                                db::system_auth_keyspace::NAME,
+                                db::system_keyspace::NAME,
                                cf_name,
                                col_names_str,
                                val_binders_str),
@@ -659,12 +658,12 @@ future<> migrate_to_auth_v2(db::system_keyspace& sys_ks, ::service::raft_group0_
            }
        }
        co_yield co_await sys_ks.make_auth_version_mutation(ts,
-                db::system_auth_keyspace::version_t::v2);
+                db::system_keyspace::auth_version_t::v2);
    };
    co_await announce_mutations_with_batching(g0,
            start_operation_func,
            std::move(gen),
-            &as,
+            as,
            std::nullopt);
 }

--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -190,7 +190,7 @@ future<> standard_role_manager::create_default_role_if_missing() {
                    {_superuser},
                    cql3::query_processor::cache_internal::no).discard_result();
        } else {
-            co_await announce_mutations(_qp, _group0_client, query, {_superuser}, &_as, ::service::raft_timeout{});
+            co_await announce_mutations(_qp, _group0_client, query, {_superuser}, _as, ::service::raft_timeout{});
        }
        log.info("Created default superuser role '{}'.", _superuser);
    } catch(const exceptions::unavailable_exception& e) {
@@ -285,7 +285,7 @@ future<> standard_role_manager::create_or_replace(std::string_view role_name, co
                {sstring(role_name), c.is_superuser, c.can_login},
                cql3::query_processor::cache_internal::yes).discard_result();
    } else {
-        co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name), c.is_superuser, c.can_login}, &_as, ::service::raft_timeout{});
+        co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name), c.is_superuser, c.can_login}, _as, ::service::raft_timeout{});
    }
 }

@@ -333,7 +333,7 @@ standard_role_manager::alter(std::string_view role_name, const role_config_updat
                    {sstring(role_name)},
                    cql3::query_processor::cache_internal::no).discard_result();
        } else {
-            return announce_mutations(_qp, _group0_client, std::move(query), {sstring(role_name)}, &_as, ::service::raft_timeout{});
+            return announce_mutations(_qp, _group0_client, std::move(query), {sstring(role_name)}, _as, ::service::raft_timeout{});
        }
    });
 }
@@ -383,7 +383,7 @@ future<> standard_role_manager::drop(std::string_view role_name) {
            co_await _qp.execute_internal(query, {sstring(role_name)},
                cql3::query_processor::cache_internal::yes).discard_result();
        } else {
-            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, &_as, ::service::raft_timeout{});
+            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, _as, ::service::raft_timeout{});
        }
    };
    // Finally, delete the role itself.
@@ -401,7 +401,7 @@ future<> standard_role_manager::drop(std::string_view role_name) {
                    {sstring(role_name)},
                    cql3::query_processor::cache_internal::no).discard_result();
        } else {
-            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, &_as, ::service::raft_timeout{});
+            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, _as, ::service::raft_timeout{});
        }
    };

@@ -434,7 +434,7 @@ standard_role_manager::modify_membership(
                    cql3::query_processor::cache_internal::no).discard_result();
        } else {
            co_await announce_mutations(_qp, _group0_client, std::move(query),
-                    {role_set{sstring(role_name)}, sstring(grantee_name)}, &_as, ::service::raft_timeout{});
+                    {role_set{sstring(role_name)}, sstring(grantee_name)}, _as, ::service::raft_timeout{});
        }
    };

@@ -453,7 +453,7 @@ standard_role_manager::modify_membership(
                            cql3::query_processor::cache_internal::no).discard_result();
                } else {
                    co_return co_await announce_mutations(_qp, _group0_client, insert_query,
-                            {sstring(role_name), sstring(grantee_name)}, &_as, ::service::raft_timeout{});
+                            {sstring(role_name), sstring(grantee_name)}, _as, ::service::raft_timeout{});
                }
            }

@@ -470,7 +470,7 @@ standard_role_manager::modify_membership(
                            cql3::query_processor::cache_internal::no).discard_result();
                } else {
                    co_return co_await announce_mutations(_qp, _group0_client, delete_query,
-                            {sstring(role_name), sstring(grantee_name)}, &_as, ::service::raft_timeout{});
+                            {sstring(role_name), sstring(grantee_name)}, _as, ::service::raft_timeout{});
                }
            }
        }
@@ -644,7 +644,7 @@ future<> standard_role_manager::set_attribute(std::string_view role_name, std::s
        co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name), sstring(attribute_value)}, cql3::query_processor::cache_internal::yes).discard_result();
    } else {
        co_await announce_mutations(_qp, _group0_client, query,
-                {sstring(role_name), sstring(attribute_name), sstring(attribute_value)}, &_as, ::service::raft_timeout{});
+                {sstring(role_name), sstring(attribute_name), sstring(attribute_value)}, _as, ::service::raft_timeout{});
    }
 }

@@ -659,7 +659,7 @@ future<> standard_role_manager::remove_attribute(std::string_view role_name, std
        co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes).discard_result();
    } else {
        co_await announce_mutations(_qp, _group0_client, query,
-                {sstring(role_name), sstring(attribute_name)}, &_as, ::service::raft_timeout{});
+                {sstring(role_name), sstring(attribute_name)}, _as, ::service::raft_timeout{});
    }
 }
 }
--- a/compaction/compaction_manager.cc
+++ b/compaction/compaction_manager.cc
@@ -489,7 +489,7 @@ public:
        return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
    }

-    virtual future<> abort() noexcept override {
+    virtual void abort() noexcept override {
        return compaction_task_executor::abort(_as);
    }
 protected:
@@ -514,7 +514,7 @@ public:
        return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
    }

-    virtual future<> abort() noexcept override {
+    virtual void abort() noexcept override {
        return compaction_task_executor::abort(_as);
    }
 protected:
@@ -629,7 +629,7 @@ public:
        return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
    }

-    virtual future<> abort() noexcept override {
+    virtual void abort() noexcept override {
        return compaction_task_executor::abort(_as);
    }
 protected:
@@ -855,12 +855,11 @@ void compaction_task_executor::finish_compaction(state finish_state) noexcept {
    _compaction_state.compaction_done.signal();
 }

-future<> compaction_task_executor::abort(abort_source& as) noexcept {
+void compaction_task_executor::abort(abort_source& as) noexcept {
    if (!as.abort_requested()) {
        as.request_abort();
        stop_compaction("user requested abort");
    }
-    return make_ready_future();
 }

 void compaction_task_executor::stop_compaction(sstring reason) noexcept {
@@ -1181,7 +1180,7 @@ public:
        , regular_compaction_task_impl(mgr._task_manager_module, tasks::task_id::create_random_id(), mgr._task_manager_module->new_sequence_number(), t.schema()->ks_name(), t.schema()->cf_name(), "", tasks::task_id::create_null_id())
    {}

-    virtual future<> abort() noexcept override {
+    virtual void abort() noexcept override {
        return compaction_task_executor::abort(_as);
    }
 protected:
@@ -1352,7 +1351,7 @@ public:
        return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
    }

-    virtual future<> abort() noexcept override {
+    virtual void abort() noexcept override {
        return compaction_task_executor::abort(_as);
    }
 protected:
@@ -1379,13 +1378,20 @@ private:
                }));
        };

-        auto get_next_job = [&] () -> std::optional<sstables::compaction_descriptor> {
-            auto desc = t.get_compaction_strategy().get_reshaping_job(get_reshape_candidates(), t.schema(), sstables::reshape_mode::strict);
-            return desc.sstables.size() ? std::make_optional(std::move(desc)) : std::nullopt;
+        auto get_next_job = [&] () -> future<std::optional<sstables::compaction_descriptor>> {
+            auto candidates = get_reshape_candidates();
+            if (candidates.empty()) {
+                co_return std::nullopt;
+            }
+            // all sstables added to maintenance set share the same underlying storage.
+            auto& storage = candidates.front()->get_storage();
+            sstables::reshape_config cfg = co_await sstables::make_reshape_config(storage, sstables::reshape_mode::strict);
+            auto desc = t.get_compaction_strategy().get_reshaping_job(get_reshape_candidates(), t.schema(), cfg);
+            co_return desc.sstables.size() ? std::make_optional(std::move(desc)) : std::nullopt;
        };

        std::exception_ptr err;
-        while (auto desc = get_next_job()) {
+        while (auto desc = co_await get_next_job()) {
            auto compacting = compacting_sstable_registration(_cm, _cm.get_compaction_state(&t), desc->sstables);
            auto on_replace = compacting.update_on_sstable_replacement();

@@ -1755,7 +1761,7 @@ public:
        return compaction_task_impl::get_progress(_compaction_data, _progress_monitor);
    }

-    virtual future<> abort() noexcept override {
+    virtual void abort() noexcept override {
        return compaction_task_executor::abort(_as);
    }
 protected:
--- a/compaction/compaction_manager.hh
+++ b/compaction/compaction_manager.hh
@@ -594,7 +594,7 @@ public:
        return _compaction_data.abort.abort_requested();
    }

-    future<> abort(abort_source& as) noexcept;
+    void abort(abort_source& as) noexcept;

    void stop_compaction(sstring reason) noexcept;

--- a/compaction/compaction_strategy.cc
+++ b/compaction/compaction_strategy.cc
@@ -83,7 +83,7 @@ reader_consumer_v2 compaction_strategy_impl::make_interposer_consumer(const muta
 }

 compaction_descriptor
-compaction_strategy_impl::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
+compaction_strategy_impl::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
    return compaction_descriptor();
 }

@@ -728,8 +728,8 @@ compaction_backlog_tracker compaction_strategy::make_backlog_tracker() const {
 }

 sstables::compaction_descriptor
-compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
-    return _compaction_strategy_impl->get_reshaping_job(std::move(input), schema, mode);
+compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
+    return _compaction_strategy_impl->get_reshaping_job(std::move(input), schema, cfg);
 }

 uint64_t compaction_strategy::adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr schema) const {
@@ -767,6 +767,13 @@ compaction_strategy make_compaction_strategy(compaction_strategy_type strategy,
    return compaction_strategy(std::move(impl));
 }

+future<reshape_config> make_reshape_config(const sstables::storage& storage, reshape_mode mode) {
+    co_return sstables::reshape_config{
+        .mode = mode,
+        .free_storage_space = co_await storage.free_space() / smp::count,
+    };
+}
+
 }

 namespace compaction {
--- a/compaction/compaction_strategy.hh
+++ b/compaction/compaction_strategy.hh
@@ -30,6 +30,7 @@ class compaction_strategy_impl;
 class sstable;
 class sstable_set;
 struct compaction_descriptor;
+class storage;

 class compaction_strategy {
    ::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -121,11 +122,13 @@ public:
    //
    // The caller should also pass a maximum number of SSTables which is the maximum amount of
    // SSTables that can be added into a single job.
-    compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const;
+    compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const;

 };

 // Creates a compaction_strategy object from one of the strategies available.
 compaction_strategy make_compaction_strategy(compaction_strategy_type strategy, const std::map<sstring, sstring>& options);

+future<reshape_config> make_reshape_config(const sstables::storage& storage, reshape_mode mode);
+
 }
--- a/compaction/compaction_strategy_impl.hh
+++ b/compaction/compaction_strategy_impl.hh
@@ -76,6 +76,6 @@ public:
        return false;
    }

-    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const;
+    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const;
 };
 }
--- a/compaction/compaction_strategy_type.hh
+++ b/compaction/compaction_strategy_type.hh
@@ -8,6 +8,8 @@

 #pragma once

+#include <cstdint>
+
 namespace sstables {

 enum class compaction_strategy_type {
@@ -18,4 +20,10 @@ enum class compaction_strategy_type {
 };

 enum class reshape_mode { strict, relaxed };
+
+struct reshape_config {
+    reshape_mode mode;
+    const uint64_t free_storage_space;
+};
+
 }
--- a/compaction/leveled_compaction_strategy.cc
+++ b/compaction/leveled_compaction_strategy.cc
@@ -146,7 +146,8 @@ int64_t leveled_compaction_strategy::estimated_pending_compactions(table_state&
 }

 compaction_descriptor
-leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
+leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
+    auto mode = cfg.mode;
    std::array<std::vector<shared_sstable>, leveled_manifest::MAX_LEVELS> level_info;

    auto is_disjoint = [schema] (const std::vector<shared_sstable>& sstables, unsigned tolerance) -> std::tuple<bool, unsigned> {
@@ -203,7 +204,7 @@ leveled_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input

    if (level_info[0].size() > offstrategy_threshold) {
        size_tiered_compaction_strategy stcs(_stcs_options);
-        return stcs.get_reshaping_job(std::move(level_info[0]), schema, mode);
+        return stcs.get_reshaping_job(std::move(level_info[0]), schema, cfg);
    }

    for (unsigned level = leveled_manifest::MAX_LEVELS - 1; level > 0; --level) {
--- a/compaction/leveled_compaction_strategy.hh
+++ b/compaction/leveled_compaction_strategy.hh
@@ -74,7 +74,7 @@ public:

    virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;

-    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const override;
+    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;
 };

 }
--- a/compaction/size_tiered_compaction_strategy.cc
+++ b/compaction/size_tiered_compaction_strategy.cc
@@ -298,8 +298,9 @@ size_tiered_compaction_strategy::most_interesting_bucket(const std::vector<sstab
 }

 compaction_descriptor
-size_tiered_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const
+size_tiered_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const
 {
+    auto mode = cfg.mode;
    size_t offstrategy_threshold = std::max(schema->min_compaction_threshold(), 4);
    size_t max_sstables = std::max(schema->max_compaction_threshold(), int(offstrategy_threshold));

--- a/compaction/size_tiered_compaction_strategy.hh
+++ b/compaction/size_tiered_compaction_strategy.hh
@@ -96,7 +96,7 @@ public:

    virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;

-    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const override;
+    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;

    friend class ::size_tiered_backlog_tracker;
 };
--- a/compaction/task_manager_module.cc
+++ b/compaction/task_manager_module.cc
@@ -595,28 +595,35 @@ future<> table_reshaping_compaction_task_impl::run() {

 future<> shard_reshaping_compaction_task_impl::run() {
    auto& table = _db.local().find_column_family(_status.keyspace, _status.table);
+    auto holder = table.async_gate().hold();
    tasks::task_info info{_status.id, _status.shard};

-    std::unordered_map<size_t, std::unordered_set<sstables::shared_sstable>> sstables_grouped_by_compaction_group;
+    std::unordered_map<compaction::table_state*, std::unordered_set<sstables::shared_sstable>> sstables_grouped_by_compaction_group;
    for (auto& sstable : _dir.get_unshared_local_sstables()) {
-        auto compaction_group_id = table.get_compaction_group_id_for_sstable(sstable);
-        sstables_grouped_by_compaction_group[compaction_group_id].insert(sstable);
+        auto& t = table.table_state_for_sstable(sstable);
+        sstables_grouped_by_compaction_group[&t].insert(sstable);
    }

    // reshape sstables individually within the compaction groups
    for (auto& sstables_in_cg : sstables_grouped_by_compaction_group) {
-        co_await reshape_compaction_group(sstables_in_cg.first, sstables_in_cg.second, table, info);
+        co_await reshape_compaction_group(*sstables_in_cg.first, sstables_in_cg.second, table, info);
    }
 }

-future<> shard_reshaping_compaction_task_impl::reshape_compaction_group(size_t compaction_group_id, std::unordered_set<sstables::shared_sstable>& sstables_in_cg, replica::column_family& table, const tasks::task_info& info) {
+future<> shard_reshaping_compaction_task_impl::reshape_compaction_group(compaction::table_state& t, std::unordered_set<sstables::shared_sstable>& sstables_in_cg, replica::column_family& table, const tasks::task_info& info) {

    while (true) {
        auto reshape_candidates = boost::copy_range<std::vector<sstables::shared_sstable>>(sstables_in_cg
                | boost::adaptors::filtered([&filter = _filter] (const auto& sst) {
            return filter(sst);
        }));
-        auto desc = table.get_compaction_strategy().get_reshaping_job(std::move(reshape_candidates), table.schema(), _mode);
+        if (reshape_candidates.empty()) {
+            break;
+        }
+        // all sstables were found in the same sstable_directory instance, so they share the same underlying storage.
+        auto& storage = reshape_candidates.front()->get_storage();
+        auto cfg = co_await sstables::make_reshape_config(storage, _mode);
+        auto desc = table.get_compaction_strategy().get_reshaping_job(std::move(reshape_candidates), table.schema(), cfg);
        if (desc.sstables.empty()) {
            break;
        }
@@ -635,7 +642,6 @@ future<> shard_reshaping_compaction_task_impl::reshape_compaction_group(size_t c
        desc.creator = _creator;

        try {
-            auto& t = table.get_compaction_group(compaction_group_id)->as_table_state();
            co_await table.get_compaction_manager().run_custom_job(t, sstables::compaction_type::Reshape, "Reshape compaction", [&dir = _dir, sstlist = std::move(sstlist), desc = std::move(desc), &sstables_in_cg, &t] (sstables::compaction_data& info, sstables::compaction_progress_monitor& progress_monitor) mutable -> future<> {
                sstables::compaction_result result = co_await sstables::compact_sstables(std::move(desc), info, t, progress_monitor);
                // update the sstables_in_cg set with new sstables and remove the reshaped ones
--- a/compaction/task_manager_module.hh
+++ b/compaction/task_manager_module.hh
@@ -606,7 +606,7 @@ private:
    std::function<bool (const sstables::shared_sstable&)> _filter;
    uint64_t& _total_shard_size;

-    future<> reshape_compaction_group(size_t compaction_group_id, std::unordered_set<sstables::shared_sstable>& sstables_in_cg, replica::column_family& table, const tasks::task_info& info);
+    future<> reshape_compaction_group(compaction::table_state& t, std::unordered_set<sstables::shared_sstable>& sstables_in_cg, replica::column_family& table, const tasks::task_info& info);
 public:
    shard_reshaping_compaction_task_impl(tasks::task_manager::module_ptr module,
            std::string keyspace,
--- a/compaction/time_window_compaction_strategy.cc
+++ b/compaction/time_window_compaction_strategy.cc
@@ -226,12 +226,14 @@ reader_consumer_v2 time_window_compaction_strategy::make_interposer_consumer(con
 }

 compaction_descriptor
-time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const {
+time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const {
+    auto mode = cfg.mode;
    std::vector<shared_sstable> single_window;
    std::vector<shared_sstable> multi_window;

    size_t offstrategy_threshold = std::max(schema->min_compaction_threshold(), 4);
    size_t max_sstables = std::max(schema->max_compaction_threshold(), int(offstrategy_threshold));
+    const uint64_t target_job_size = cfg.free_storage_space * reshape_target_space_overhead;

    if (mode == reshape_mode::relaxed) {
        offstrategy_threshold = max_sstables;
@@ -263,22 +265,40 @@ time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> i
            multi_window.size(), !multi_window.empty() && sstable_set_overlapping_count(schema, multi_window) == 0,
            single_window.size(), !single_window.empty() && sstable_set_overlapping_count(schema, single_window) == 0);

-    auto need_trimming = [max_sstables, schema, &is_disjoint] (const std::vector<shared_sstable>& ssts) {
-        // All sstables can be compacted at once if they're disjoint, given that partitioned set
-        // will incrementally open sstables which translates into bounded memory usage.
-        return ssts.size() > max_sstables && !is_disjoint(ssts);
+    auto get_job_size = [] (const std::vector<shared_sstable>& ssts) {
+        return boost::accumulate(ssts | boost::adaptors::transformed(std::mem_fn(&sstable::bytes_on_disk)), uint64_t(0));
+    };
+
+    // Targets a space overhead of 10%. All disjoint sstables can be compacted together as long as they won't
+    // cause an overhead above target. Otherwise, the job targets a maximum of #max_threshold sstables.
+    auto need_trimming = [&] (const std::vector<shared_sstable>& ssts, const uint64_t job_size, bool is_disjoint) {
+        const size_t min_sstables = 2;
+        auto is_above_target_size = job_size > target_job_size;
+
+        return (ssts.size() > max_sstables && !is_disjoint) ||
+               (ssts.size() > min_sstables && is_above_target_size);
+    };
+
+    auto maybe_trim_job = [&need_trimming] (std::vector<shared_sstable>& ssts, uint64_t job_size, bool is_disjoint) {
+        while (need_trimming(ssts, job_size, is_disjoint)) {
+            auto sst = ssts.back();
+            ssts.pop_back();
+            job_size -= sst->bytes_on_disk();
+        }
    };

    if (!multi_window.empty()) {
+        auto disjoint = is_disjoint(multi_window);
+        auto job_size = get_job_size(multi_window);
        // Everything that spans multiple windows will need reshaping
-        if (need_trimming(multi_window)) {
+        if (need_trimming(multi_window, job_size, disjoint)) {
            // When trimming, let's keep sstables with overlapping time window, so as to reduce write amplification.
            // For example, if there are N sstables spanning window W, where N <= 32, then we can produce all data for W
            // in a single compaction round, removing the need to later compact W to reduce its number of files.
            boost::partial_sort(multi_window, multi_window.begin() + max_sstables, [](const shared_sstable &a, const shared_sstable &b) {
                return a->get_stats_metadata().max_timestamp < b->get_stats_metadata().max_timestamp;
            });
-            multi_window.resize(max_sstables);
+            maybe_trim_job(multi_window, job_size, disjoint);
        }
        compaction_descriptor desc(std::move(multi_window));
        desc.options = compaction_type_options::make_reshape();
@@ -297,15 +317,17 @@ time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> i
                std::copy(ssts.begin(), ssts.end(), std::back_inserter(single_window));
                continue;
            }
+
            // reuse STCS reshape logic which will only compact similar-sized files, to increase overall efficiency
            // when reshaping time buckets containing a huge amount of files
-            auto desc = size_tiered_compaction_strategy(_stcs_options).get_reshaping_job(std::move(ssts), schema, mode);
+            auto desc = size_tiered_compaction_strategy(_stcs_options).get_reshaping_job(std::move(ssts), schema, cfg);
            if (!desc.sstables.empty()) {
                return desc;
            }
        }
    }
    if (!single_window.empty()) {
+        maybe_trim_job(single_window, get_job_size(single_window), all_disjoint);
        compaction_descriptor desc(std::move(single_window));
        desc.options = compaction_type_options::make_reshape();
        return desc;
--- a/compaction/time_window_compaction_strategy.hh
+++ b/compaction/time_window_compaction_strategy.hh
@@ -76,6 +76,7 @@ public:
    // To prevent an explosion in the number of sstables we cap it.
    // Better co-locate some windows into the same sstables than OOM.
    static constexpr uint64_t max_data_segregation_window_count = 100;
+    static constexpr float reshape_target_space_overhead = 0.1f;

    using bucket_t = std::vector<shared_sstable>;
    enum class bucket_compaction_mode { none, size_tiered, major };
@@ -168,7 +169,7 @@ public:
        return true;
    }

-    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_mode mode) const override;
+    virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;
 };

 }
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -618,3 +618,6 @@ maintenance_socket: ignore
 # replication_strategy_warn_list:
 #  - SimpleStrategy
 # replication_strategy_fail_list:
+
+# This enables tablets on newly created keyspaces
+enable_tablets: true
--- a/configure.py
+++ b/configure.py
@@ -1015,7 +1015,6 @@ scylla_core = (['message/messaging_service.cc',
                'cql3/result_set.cc',
                'cql3/prepare_context.cc',
                'db/consistency_level.cc',
-                'db/system_auth_keyspace.cc',
                'db/system_keyspace.cc',
                'db/virtual_table.cc',
                'db/virtual_tables.cc',
@@ -1358,6 +1357,7 @@ scylla_perfs = ['test/perf/perf_alternator.cc',
                'test/perf/perf_simple_query.cc',
                'test/perf/perf_sstable.cc',
                'test/perf/perf_tablets.cc',
+                'test/perf/tablet_load_balancing.cc',
                'test/perf/perf.cc',
                'test/lib/alternator_test_env.cc',
                'test/lib/cql_test_env.cc',
@@ -1753,33 +1753,32 @@ def configure_seastar(build_dir, mode, mode_config):


 def configure_abseil(build_dir, mode, mode_config):
-    # for sanitizer cflags
-    seastar_flags = query_seastar_flags(f'{outdir}/{mode}/seastar/seastar.pc',
-                                        mode_config['build_seastar_shared_libs'],
-                                        args.staticcxx)
-    seastar_cflags = seastar_flags['seastar_cflags']
+    abseil_cflags = mode_config['lib_cflags']
+    cxx_flags = mode_config['cxxflags']
+    if '-DSANITIZE' in cxx_flags:
+        abseil_cflags += ' -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr'

-    abseil_build_dir = os.path.join(build_dir, mode, 'abseil')
-
-    abseil_cflags = seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']
    # We want to "undo" coverage for abseil if we have it enabled, as we are not
    # interested in the coverage of the abseil library. these flags were previously
    # added to cxx_ld_flags
    if args.coverage:
        for flag in COVERAGE_INST_FLAGS:
-            abseil_cflags = abseil_cflags.replace(f' {flag}', '')
+            cxx_flags = cxx_flags.replace(f' {flag}', '')
+
+    cxx_flags += ' ' + abseil_cflags.strip()
    cmake_mode = mode_config['cmake_build_type']
    abseil_cmake_args = [
        '-DCMAKE_BUILD_TYPE={}'.format(cmake_mode),
        '-DCMAKE_INSTALL_PREFIX={}'.format(build_dir + '/inst'), # just to avoid a warning from absl
        '-DCMAKE_C_COMPILER={}'.format(args.cc),
        '-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
-        '-DCMAKE_CXX_FLAGS_{}={}'.format(cmake_mode.upper(), abseil_cflags),
+        '-DCMAKE_CXX_FLAGS_{}={}'.format(cmake_mode.upper(), cxx_flags),
        '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON',
        '-DCMAKE_CXX_STANDARD=20',
        '-DABSL_PROPAGATE_CXX_STD=ON',
    ]

+    abseil_build_dir = os.path.join(build_dir, mode, 'abseil')
    abseil_cmd = ['cmake', '-G', 'Ninja', real_relpath('abseil', abseil_build_dir)] + abseil_cmake_args

    os.makedirs(abseil_build_dir, exist_ok=True)
--- a/cql3/query_processor.cc
+++ b/cql3/query_processor.cc
@@ -14,9 +14,11 @@
 #include <seastar/coroutine/parallel_for_each.hh>

 #include "service/storage_proxy.hh"
+#include "service/topology_mutation.hh"
 #include "service/migration_manager.hh"
 #include "service/forward_service.hh"
 #include "service/raft/raft_group0_client.hh"
+#include "service/storage_service.hh"
 #include "cql3/CqlParser.hpp"
 #include "cql3/statements/batch_statement.hh"
 #include "cql3/statements/modification_statement.hh"
@@ -42,16 +44,22 @@ const sstring query_processor::CQL_VERSION = "3.3.1";
 const std::chrono::minutes prepared_statements_cache::entry_expiry = std::chrono::minutes(60);

 struct query_processor::remote {
-    remote(service::migration_manager& mm, service::forward_service& fwd, service::raft_group0_client& group0_client)
-            : mm(mm), forwarder(fwd), group0_client(group0_client) {}
+    remote(service::migration_manager& mm, service::forward_service& fwd,
+           service::storage_service& ss, service::raft_group0_client& group0_client)
+            : mm(mm), forwarder(fwd), ss(ss), group0_client(group0_client) {}

    service::migration_manager& mm;
    service::forward_service& forwarder;
+    service::storage_service& ss;
    service::raft_group0_client& group0_client;

    seastar::gate gate;
 };

+bool query_processor::topology_global_queue_empty() {
+    return remote().first.get().ss.topology_global_queue_empty();
+}
+
 static service::query_state query_state_for_internal_call() {
    return {service::client_state::for_internal_calls(), empty_service_permit()};
 }
@@ -498,8 +506,8 @@ query_processor::~query_processor() {
 }

 void query_processor::start_remote(service::migration_manager& mm, service::forward_service& forwarder,
-                                  service::raft_group0_client& group0_client) {
-    _remote = std::make_unique<struct remote>(mm, forwarder, group0_client);
+                                   service::storage_service& ss, service::raft_group0_client& group0_client) {
+    _remote = std::make_unique<struct remote>(mm, forwarder, ss, group0_client);
 }

 future<> query_processor::stop_remote() {
@@ -835,7 +843,7 @@ bool query_processor::has_more_results(cql3::internal_query_state& state) const

 future<> query_processor::for_each_cql_result(
        cql3::internal_query_state& state,
-         noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)>&& f) {
+        noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set::row&)> f) {
    do {
        auto msg = co_await execute_paged_internal(state);
        for (auto& row : *msg) {
@@ -1018,16 +1026,29 @@ query_processor::execute_schema_statement(const statements::schema_altering_stat

    cql3::cql_warnings_vec warnings;

+    auto request_id = guard->new_group0_state_id();
+    stmt.global_req_id = request_id;
+
    auto [ret, m, cql_warnings] = co_await stmt.prepare_schema_mutations(*this, options, guard->write_timestamp());
    warnings = std::move(cql_warnings);

+    ce = std::move(ret);
    if (!m.empty()) {
        auto description = format("CQL DDL statement: \"{}\"", stmt.raw_cql_statement);
-        co_await remote_.get().mm.announce(std::move(m), std::move(*guard), description);
+        if (ce && ce->target == cql_transport::event::schema_change::target_type::TABLET_KEYSPACE) {
+            co_await remote_.get().mm.announce<service::topology_change>(std::move(m), std::move(*guard), description);
+            // TODO: eliminate timeout from alter ks statement on the cqlsh/driver side
+            auto error = co_await remote_.get().ss.wait_for_topology_request_completion(request_id);
+            co_await remote_.get().ss.wait_for_topology_not_busy();
+            if (!error.empty()) {
+                log.error("CQL statement \"{}\" with topology request_id \"{}\" failed with error: \"{}\"", stmt.raw_cql_statement, request_id, error);
+                throw exceptions::request_execution_exception(exceptions::exception_code::INVALID, error);
+            }
+        } else {
+            co_await remote_.get().mm.announce<service::schema_change>(std::move(m), std::move(*guard), description);
+        }
    }

-    ce = std::move(ret);
-
    // If an IF [NOT] EXISTS clause was used, this may not result in an actual schema change.  To avoid doing
    // extra work in the drivers to handle schema changes, we return an empty message in this case. (CASSANDRA-7600)
    ::shared_ptr<messages::result_message> result;
@@ -1158,14 +1179,14 @@ future<> query_processor::query_internal(
        db::consistency_level cl,
        const data_value_list& values,
        int32_t page_size,
-        noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f) {
+        noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f) {
    auto query_state = create_paged_state(query_string, cl, values, page_size);
    co_return co_await for_each_cql_result(query_state, std::move(f));
 }

 future<> query_processor::query_internal(
        const sstring& query_string,
-        noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f) {
+        noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f) {
    return query_internal(query_string, db::consistency_level::ONE, {}, 1000, std::move(f));
 }

--- a/cql3/query_processor.hh
+++ b/cql3/query_processor.hh
@@ -31,7 +31,6 @@
 #include "lang/wasm.hh"
 #include "service/raft/raft_group0_client.hh"
 #include "types/types.hh"
-#include "db/system_auth_keyspace.hh"


 namespace service {
@@ -151,7 +150,8 @@ public:

    ~query_processor();

-    void start_remote(service::migration_manager&, service::forward_service&, service::raft_group0_client&);
+    void start_remote(service::migration_manager&, service::forward_service&,
+                      service::storage_service& ss, service::raft_group0_client&);
    future<> stop_remote();

    data_dictionary::database db() {
@@ -176,7 +176,7 @@ public:

    wasm::manager& wasm() { return _wasm; }

-    db::system_auth_keyspace::version_t auth_version;
+    db::system_keyspace::auth_version_t auth_version;

    statements::prepared_statement::checked_weak_ptr get_prepared(const std::optional<auth::authenticated_user>& user, const prepared_cache_key_type& key) {
        if (user) {
@@ -315,7 +315,7 @@ public:
            db::consistency_level cl,
            const data_value_list& values,
            int32_t page_size,
-            noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f);
+            noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f);

    /*
     * \brief iterate over all cql results using paging
@@ -330,7 +330,7 @@ public:
     */
    future<> query_internal(
            const sstring& query_string,
-            noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f);
+            noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f);

    class cache_internal_tag;
    using cache_internal = bool_class<cache_internal_tag>;
@@ -461,6 +461,8 @@ public:

    void reset_cache();

+    bool topology_global_queue_empty();
+
 private:
    // Keep the holder until you stop using the `remote` services.
    std::pair<std::reference_wrapper<remote>, gate::holder> remote();
@@ -499,7 +501,7 @@ private:
     */
    future<> for_each_cql_result(
            cql3::internal_query_state& state,
-             noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)>&& f);
+            noncopyable_function<future<stop_iteration>(const cql3::untyped_result_set_row&)> f);

    /*!
     * \brief check, based on the state if there are additional results
--- a/cql3/statements/alter_keyspace_statement.cc
+++ b/cql3/statements/alter_keyspace_statement.cc
@@ -8,11 +8,15 @@
 * SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
 */

+#include <boost/range/algorithm.hpp>
+#include <fmt/format.h>
 #include <seastar/core/coroutine.hh>
+#include <stdexcept>
 #include "alter_keyspace_statement.hh"
 #include "prepared_statement.hh"
 #include "service/migration_manager.hh"
 #include "service/storage_proxy.hh"
+#include "service/topology_mutation.hh"
 #include "db/system_keyspace.hh"
 #include "data_dictionary/data_dictionary.hh"
 #include "data_dictionary/keyspace_metadata.hh"
@@ -21,6 +25,8 @@
 #include "create_keyspace_statement.hh"
 #include "gms/feature_service.hh"

+static logging::logger mylogger("alter_keyspace");
+
 bool is_system_keyspace(std::string_view keyspace);

 cql3::statements::alter_keyspace_statement::alter_keyspace_statement(sstring name, ::shared_ptr<ks_prop_defs> attrs)
@@ -36,6 +42,20 @@ future<> cql3::statements::alter_keyspace_statement::check_access(query_processo
    return state.has_keyspace_access(_name, auth::permission::ALTER);
 }

+static bool validate_rf_difference(const std::string_view curr_rf, const std::string_view new_rf) {
+    auto to_number = [] (const std::string_view rf) {
+        int result = 0;
+        // We assume the passed string view represents a valid decimal number,
+        // so we don't need the error code.
+        (void) std::from_chars(rf.data(), rf.data() + rf.size(), result);
+        return result;
+    };
+
+    // We want to ensure that each DC's RF is going to change by at most 1
+    // because in that case the old and new quorums must overlap.
+    return std::abs(to_number(curr_rf) - to_number(new_rf)) <= 1;
+}
+
 void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, const service::client_state& state) const {
        auto tmp = _name;
        std::transform(tmp.begin(), tmp.end(), tmp.begin(), ::tolower);
@@ -61,6 +81,17 @@ void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, c
            }

            auto new_ks = _attrs->as_ks_metadata_update(ks.metadata(), *qp.proxy().get_token_metadata_ptr(), qp.proxy().features());
+
+            if (ks.get_replication_strategy().uses_tablets()) {
+                const std::map<sstring, sstring>& current_rfs = ks.metadata()->strategy_options();
+                for (const auto& [new_dc, new_rf] : _attrs->get_replication_options()) {
+                    auto it = current_rfs.find(new_dc);
+                    if (it != current_rfs.end() && !validate_rf_difference(it->second, new_rf)) {
+                        throw exceptions::invalid_request_exception("Cannot modify replication factor of any DC by more than 1 at a time.");
+                    }
+                }
+            }
+
            locator::replication_strategy_params params(new_ks->strategy_options(), new_ks->initial_tablets());
            auto new_rs = locator::abstract_replication_strategy::create_replication_strategy(new_ks->strategy_name(), params);
            if (new_rs->is_per_table() != ks.get_replication_strategy().is_per_table()) {
@@ -83,20 +114,63 @@ void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, c

 future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>>
 cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_processor& qp, const query_options&, api::timestamp_type ts) const {
+    using namespace cql_transport;
    try {
-        auto old_ksm = qp.db().find_keyspace(_name).metadata();
+        event::schema_change::target_type target_type = event::schema_change::target_type::KEYSPACE;
+        auto ks = qp.db().find_keyspace(_name);
+        auto ks_md = ks.metadata();
        const auto& tm = *qp.proxy().get_token_metadata_ptr();
        const auto& feat = qp.proxy().features();
+        auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, tm, feat);
+        std::vector<mutation> muts;
+        std::vector<sstring> warnings;
+        auto ks_options = _attrs->get_all_options_flattened(feat);

-        auto m = service::prepare_keyspace_update_announcement(qp.db().real_database(), _attrs->as_ks_metadata_update(old_ksm, tm, feat), ts);
+        // Take the tablets path only when the replication options actually change, not for schema-only changes.
+        if (ks.get_replication_strategy().uses_tablets() && !_attrs->get_replication_options().empty()) {
+            if (!qp.topology_global_queue_empty()) {
+                return make_exception_future<std::tuple<::shared_ptr<::cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>>(
+                        exceptions::invalid_request_exception("Another global topology request is ongoing, please retry."));
+            }
+            if (_attrs->get_replication_options().contains(ks_prop_defs::REPLICATION_FACTOR_KEY)) {
+                return make_exception_future<std::tuple<::shared_ptr<::cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>>(
+                       exceptions::invalid_request_exception("'replication_factor' tag is not allowed when executing ALTER KEYSPACE with tablets, please list the DCs explicitly"));
+            }
+            qp.db().real_database().validate_keyspace_update(*ks_md_update);
+
+            service::topology_mutation_builder builder(ts);
+            builder.set_global_topology_request(service::global_topology_request::keyspace_rf_change);
+            builder.set_global_topology_request_id(this->global_req_id);
+            builder.set_new_keyspace_rf_change_data(_name, ks_options);
+            service::topology_change change{{builder.build()}};
+
+            auto topo_schema = qp.db().find_schema(db::system_keyspace::NAME, db::system_keyspace::TOPOLOGY);
+            boost::transform(change.mutations, std::back_inserter(muts), [topo_schema] (const canonical_mutation& cm) {
+                return cm.to_mutation(topo_schema);
+            });
+
+            service::topology_request_tracking_mutation_builder rtbuilder{utils::UUID{this->global_req_id}};
+            rtbuilder.set("done", false)
+                     .set("start_time", db_clock::now());
+            service::topology_change req_change{{rtbuilder.build()}};
+
+            auto topo_req_schema = qp.db().find_schema(db::system_keyspace::NAME, db::system_keyspace::TOPOLOGY_REQUESTS);
+            boost::transform(req_change.mutations, std::back_inserter(muts), [topo_req_schema] (const canonical_mutation& cm) {
+                return cm.to_mutation(topo_req_schema);
+            });
+
+            target_type = event::schema_change::target_type::TABLET_KEYSPACE;
+        } else {
+            auto schema_mutations = service::prepare_keyspace_update_announcement(qp.db().real_database(), ks_md_update, ts);
+            muts.insert(muts.begin(), schema_mutations.begin(), schema_mutations.end());
+        }

-        using namespace cql_transport;
        auto ret = ::make_shared<event::schema_change>(
                event::schema_change::change_type::UPDATED,
-                event::schema_change::target_type::KEYSPACE,
+                target_type,
                keyspace());

-        return make_ready_future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>>(std::make_tuple(std::move(ret), std::move(m), std::vector<sstring>()));
+        return make_ready_future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>>(std::make_tuple(std::move(ret), std::move(muts), warnings));
    } catch (data_dictionary::no_such_keyspace& e) {
        return make_exception_future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>>(exceptions::invalid_request_exception("Unknown keyspace " + _name));
    }
@@ -107,7 +181,6 @@ cql3::statements::alter_keyspace_statement::prepare(data_dictionary::database db
    return std::make_unique<prepared_statement>(make_shared<alter_keyspace_statement>(*this));
 }

-static logging::logger mylogger("alter_keyspace");

 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::alter_keyspace_statement::execute(query_processor& qp, service::query_state& state, const query_options& options, std::optional<service::group0_guard> guard) const {
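
Reviewer sketch (not part of the patch): the per-DC rule enforced in `validate()` above is that a tablets keyspace may change each DC's replication factor by at most 1 per statement. The standalone version below illustrates that rule under one assumption that differs from the patch: malformed input is rejected instead of being assumed valid. The names `parse_rf` and `rf_change_allowed` are hypothetical and do not exist in the source tree.

```cpp
#include <cassert>
#include <charconv>
#include <cstdlib>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse a decimal replication factor.
// Returns std::nullopt if the text is empty, malformed, or has trailing characters.
std::optional<int> parse_rf(std::string_view rf) {
    int value = 0;
    const char* first = rf.data();
    const char* last = rf.data() + rf.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: true when the RF changes by at most 1.
// Unparseable input is treated as "not allowed" (an assumption of this sketch).
bool rf_change_allowed(std::string_view curr_rf, std::string_view new_rf) {
    const auto curr = parse_rf(curr_rf);
    const auto next = parse_rf(new_rf);
    if (!curr || !next) {
        return false;
    }
    return std::abs(*curr - *next) <= 1;
}

// Small usage demonstration.
void rf_change_demo() {
    assert(rf_change_allowed("3", "4"));   // one step up is accepted
    assert(!rf_change_allowed("1", "3"));  // a jump of two is rejected
}
```

Under this sketch, going from RF 1 to RF 3 in one DC would take two consecutive `ALTER KEYSPACE` statements.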
--- a/cql3/statements/ks_prop_defs.cc
+++ b/cql3/statements/ks_prop_defs.cc
@@ -24,7 +24,6 @@ static std::map<sstring, sstring> prepare_options(
        const sstring& strategy_class,
        const locator::token_metadata& tm,
        std::map<sstring, sstring> options,
-        std::optional<unsigned>& initial_tablets,
        const std::map<sstring, sstring>& old_options = {}) {
    options.erase(ks_prop_defs::REPLICATION_STRATEGY_CLASS_KEY);

@@ -72,6 +71,35 @@ static std::map<sstring, sstring> prepare_options(
    return options;
 }

+ks_prop_defs::ks_prop_defs(std::map<sstring, sstring> options) {
+    std::map<sstring, sstring> replication_opts, storage_opts, tablets_opts, durable_writes_opts;
+
+    auto read_property_into = [] (auto& map, const sstring& name, const sstring& value, const sstring& tag) {
+        map[name.substr(sstring(tag).size() + 1)] = value;
+    };
+
+    for (const auto& [name, value] : options) {
+        if (name.starts_with(KW_DURABLE_WRITES)) {
+            read_property_into(durable_writes_opts, name, value, KW_DURABLE_WRITES);
+        } else if (name.starts_with(KW_REPLICATION)) {
+            read_property_into(replication_opts, name, value, KW_REPLICATION);
+        } else if (name.starts_with(KW_TABLETS)) {
+            read_property_into(tablets_opts, name, value, KW_TABLETS);
+        } else if (name.starts_with(KW_STORAGE)) {
+            read_property_into(storage_opts, name, value, KW_STORAGE);
+        }
+    }
+
+    if (!replication_opts.empty())
+        add_property(KW_REPLICATION, replication_opts);
+    if (!storage_opts.empty())
+        add_property(KW_STORAGE, storage_opts);
+    if (!tablets_opts.empty())
+        add_property(KW_TABLETS, tablets_opts);
+    if (!durable_writes_opts.empty())
+        add_property(KW_DURABLE_WRITES, durable_writes_opts.begin()->second);
+}
+
 void ks_prop_defs::validate() {
    // Skip validation if the strategy class is already set as it means we've already
    // prepared (and redoing it would set strategyClass back to null, which we don't want)
@@ -110,38 +138,37 @@ data_dictionary::storage_options ks_prop_defs::get_storage_options() const {
    return opts;
 }

-std::optional<unsigned> ks_prop_defs::get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const {
+ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const {
    // FIXME -- this should be ignored somehow else
+    init_tablets_options ret{ .enabled = false, .specified_count = std::nullopt };
    if (locator::abstract_replication_strategy::to_qualified_class_name(strategy_class) != "org.apache.cassandra.locator.NetworkTopologyStrategy") {
-        return std::nullopt;
+        return ret;
    }

    auto tablets_options = get_map(KW_TABLETS);
    if (!tablets_options) {
-        return enabled_by_default ? std::optional<unsigned>(0) : std::nullopt;
+        return enabled_by_default ? init_tablets_options{ .enabled = true } : ret;
    }

-    std::optional<unsigned> ret;
-
    auto it = tablets_options->find("enabled");
    if (it != tablets_options->end()) {
        auto enabled = it->second;
        tablets_options->erase(it);

        if (enabled == "true") {
-            ret = 0; // even if 'initial' is not set, it'll start with auto-detection
+            ret = init_tablets_options{ .enabled = true, .specified_count = 0 }; // even if 'initial' is not set, it'll start with auto-detection
        } else if (enabled == "false") {
-            assert(!ret.has_value());
+            assert(!ret.enabled);
            return ret;
        } else {
-            throw exceptions::configuration_exception(sstring("Tablets enabled value must be true or false; found ") + it->second);
+            throw exceptions::configuration_exception(sstring("Tablets enabled value must be true or false; found: ") + enabled);
        }
    }

    it = tablets_options->find("initial");
    if (it != tablets_options->end()) {
        try {
-            ret = std::stol(it->second);
+            ret = init_tablets_options{ .enabled = true, .specified_count = std::stol(it->second)};
        } catch (...) {
            throw exceptions::configuration_exception(sstring("Initial tablets value should be numeric; found ") + it->second);
        }
@@ -159,29 +186,55 @@ std::optional<sstring> ks_prop_defs::get_replication_strategy_class() const {
    return _strategy_class;
 }

+bool ks_prop_defs::get_durable_writes() const {
+    return get_boolean(KW_DURABLE_WRITES, true);
+}
+
+std::map<sstring, sstring> ks_prop_defs::get_all_options_flattened(const gms::feature_service& feat) const {
+    std::map<sstring, sstring> all_options;
+
+    auto ingest_flattened_options = [&all_options](const std::map<sstring, sstring>& options, const sstring& prefix) {
+        for (auto& option: options) {
+            all_options[prefix + ":" + option.first] = option.second;
+        }
+    };
+    ingest_flattened_options(get_replication_options(), KW_REPLICATION);
+    ingest_flattened_options(get_storage_options().to_map(), KW_STORAGE);
+    ingest_flattened_options(get_map(KW_TABLETS).value_or(std::map<sstring, sstring>{}), KW_TABLETS);
+    ingest_flattened_options({{sstring(KW_DURABLE_WRITES), to_sstring(get_boolean(KW_DURABLE_WRITES, true))}}, KW_DURABLE_WRITES);
+
+    return all_options;
+}
+
 lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata(sstring ks_name, const locator::token_metadata& tm, const gms::feature_service& feat) {
    auto sc = get_replication_strategy_class().value();
-    std::optional<unsigned> initial_tablets = get_initial_tablets(sc, feat.tablets);
-    auto options = prepare_options(sc, tm, get_replication_options(), initial_tablets);
+    auto initial_tablets = get_initial_tablets(sc, feat.tablets);
+    // if tablets options have not been specified, but tablets are globally enabled, set the value to 0
+    if (initial_tablets.enabled && !initial_tablets.specified_count) {
+        initial_tablets.specified_count = 0;
+    }
+    auto options = prepare_options(sc, tm, get_replication_options());
    return data_dictionary::keyspace_metadata::new_keyspace(ks_name, sc,
-            std::move(options), initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
+            std::move(options), initial_tablets.specified_count, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
 }

 lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata& tm, const gms::feature_service& feat) {
    std::map<sstring, sstring> options;
    const auto& old_options = old->strategy_options();
    auto sc = get_replication_strategy_class();
-    std::optional<unsigned> initial_tablets;
    if (sc) {
-        initial_tablets = get_initial_tablets(*sc, old->initial_tablets().has_value());
-        options = prepare_options(*sc, tm, get_replication_options(), initial_tablets, old_options);
+        options = prepare_options(*sc, tm, get_replication_options(), old_options);
    } else {
        sc = old->strategy_name();
        options = old_options;
-        initial_tablets = old->initial_tablets();
+    }
+    auto initial_tablets = get_initial_tablets(*sc, old->initial_tablets().has_value());
+    // if tablets options have not been specified, inherit them from the old keyspace if it is tablets-enabled
+    if (initial_tablets.enabled && !initial_tablets.specified_count) {
+        initial_tablets.specified_count = old->initial_tablets();
    }

-    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
+    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets.specified_count, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
 }


--- a/cql3/statements/ks_prop_defs.hh
+++ b/cql3/statements/ks_prop_defs.hh
@@ -49,11 +49,21 @@ public:
 private:
    std::optional<sstring> _strategy_class;
 public:
+    struct init_tablets_options {
+        bool enabled;
+        std::optional<unsigned> specified_count;
+    };
+
+    ks_prop_defs() = default;
+    explicit ks_prop_defs(std::map<sstring, sstring> options);
+
    void validate();
    std::map<sstring, sstring> get_replication_options() const;
    std::optional<sstring> get_replication_strategy_class() const;
-    std::optional<unsigned> get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const;
+    init_tablets_options get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const;
    data_dictionary::storage_options get_storage_options() const;
+    bool get_durable_writes() const;
+    std::map<sstring, sstring> get_all_options_flattened(const gms::feature_service& feat) const;
    lw_shared_ptr<data_dictionary::keyspace_metadata> as_ks_metadata(sstring ks_name, const locator::token_metadata&, const gms::feature_service&);
    lw_shared_ptr<data_dictionary::keyspace_metadata> as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata&, const gms::feature_service&);
 };
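
Reviewer sketch (not part of the patch): `get_all_options_flattened()` stores each option under a `<group>:<name>` key, and the new `ks_prop_defs(std::map<sstring, sstring>)` constructor splits such keys back into groups by dropping the group tag plus one separator character. The standalone round trip below uses `std::map<std::string, std::string>` in place of the `sstring` maps; the group names `"replication"` and `"tablets"` are assumed values for `KW_REPLICATION` and `KW_TABLETS`, and `flatten_into` and `extract_group` are hypothetical names. Unlike the constructor, which matches on the tag alone, this sketch matches on the tag followed by the colon.

```cpp
#include <cassert>
#include <map>
#include <string>

using options_map = std::map<std::string, std::string>;

// Adds every entry of `options` to `out` under the key "<prefix>:<name>".
void flatten_into(options_map& out, const options_map& options, const std::string& prefix) {
    for (const auto& [name, value] : options) {
        out[prefix + ":" + name] = value;
    }
}

// Collects the entries whose key starts with "<prefix>:" and strips that prefix.
options_map extract_group(const options_map& flat, const std::string& prefix) {
    options_map group;
    const std::string tagged = prefix + ":";
    for (const auto& [name, value] : flat) {
        if (name.compare(0, tagged.size(), tagged) == 0) {
            group[name.substr(tagged.size())] = value;
        }
    }
    return group;
}

// Small usage demonstration: flatten two groups, then recover one of them.
void flatten_demo() {
    options_map flat;
    flatten_into(flat, {{"dc1", "3"}, {"dc2", "2"}}, "replication");
    flatten_into(flat, {{"initial", "8"}}, "tablets");
    assert(flat.size() == 3);
    assert(extract_group(flat, "tablets") == (options_map{{"initial", "8"}}));
}
```

The flattened form lets the whole option set travel as a single string-to-string map and be regrouped on the receiving side.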
--- a/cql3/statements/schema_altering_statement.hh
+++ b/cql3/statements/schema_altering_statement.hh
@@ -63,6 +63,7 @@ protected:

 public:
    virtual future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>> prepare_schema_mutations(query_processor& qp, const query_options& options, api::timestamp_type) const = 0;
+    mutable utils::UUID global_req_id;
 };

 }
--- a/cql3/statements/select_statement.cc
+++ b/cql3/statements/select_statement.cc
@@ -2032,7 +2032,10 @@ std::unique_ptr<prepared_statement> select_statement::prepare(data_dictionary::d
            && !restrictions->need_filtering()  // No filtering
            && group_by_cell_indices->empty()   // No GROUP BY
            && db.get_config().enable_parallelized_aggregation()
-            && !is_local_table();
+            && !is_local_table()
+            && !( // Do not parallelize the request if it is a single-partition read
+                restrictions->partition_key_restrictions_is_all_eq()
+                && restrictions->partition_key_restrictions_size() == schema->partition_key_size());
    };

    if (_parameters->is_prune_materialized_view()) {
--- a/data_dictionary/data_dictionary.cc
+++ b/data_dictionary/data_dictionary.cc
@@ -390,6 +390,12 @@ struct fmt::formatter<data_dictionary::user_types_metadata> {
 };

 auto fmt::formatter<data_dictionary::keyspace_metadata>::format(const data_dictionary::keyspace_metadata& m, fmt::format_context& ctx) const -> decltype(ctx.out()) {
-    return fmt::format_to(ctx.out(), "KSMetaData{{name={}, strategyClass={}, strategyOptions={}, cfMetaData={}, durable_writes={}, userTypes={}}}",
-            m.name(), m.strategy_name(), m.strategy_options(), m.cf_meta_data(), m.durable_writes(), m.user_types());
+    fmt::format_to(ctx.out(), "KSMetaData{{name={}, strategyClass={}, strategyOptions={}, cfMetaData={}, durable_writes={}, tablets=",
+            m.name(), m.strategy_name(), m.strategy_options(), m.cf_meta_data(), m.durable_writes());
+    if (m.initial_tablets()) {
+        fmt::format_to(ctx.out(), "{{\"initial\":{}}}", m.initial_tablets().value());
+    } else {
+        fmt::format_to(ctx.out(), "{{\"enabled\":false}}");
+    }
+    return fmt::format_to(ctx.out(), ", userTypes={}}}", m.user_types());
 }
--- a/db/CMakeLists.txt
+++ b/db/CMakeLists.txt
@@ -2,7 +2,6 @@ add_library(db STATIC)
 target_sources(db
  PRIVATE
    consistency_level.cc
-    system_auth_keyspace.cc
    system_keyspace.cc
    virtual_table.cc
    virtual_tables.cc
--- a/db/batchlog_manager.cc
+++ b/db/batchlog_manager.cc
@@ -133,7 +133,7 @@ future<> db::batchlog_manager::stop() {
 }

 future<size_t> db::batchlog_manager::count_all_batches() const {
-    sstring query = format("SELECT count(*) FROM {}.{}", system_keyspace::NAME, system_keyspace::BATCHLOG);
+    sstring query = format("SELECT count(*) FROM {}.{} BYPASS CACHE", system_keyspace::NAME, system_keyspace::BATCHLOG);
    return _qp.execute_internal(query, cql3::query_processor::cache_internal::yes).then([](::shared_ptr<cql3::untyped_result_set> rs) {
       return size_t(rs->one().get_as<int64_t>("count"));
    });
@@ -152,26 +152,26 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
    auto throttle = _replay_rate / _qp.proxy().get_token_metadata_ptr()->count_normal_token_owners();
    auto limiter = make_lw_shared<utils::rate_limiter>(throttle);

-    auto batch = [this, limiter](const cql3::untyped_result_set::row& row) {
+    auto batch = [this, limiter](const cql3::untyped_result_set::row& row) -> future<stop_iteration> {
        auto written_at = row.get_as<db_clock::time_point>("written_at");
        auto id = row.get_as<utils::UUID>("id");
        // enough time for the actual write + batchlog entry mutation delivery (two separate requests).
        auto timeout = get_batch_log_timeout();
        if (db_clock::now() < written_at + timeout) {
            blogger.debug("Skipping replay of {}, too fresh", id);
-            return make_ready_future<>();
+            return make_ready_future<stop_iteration>(stop_iteration::no);
        }

        // check version of serialization format
        if (!row.has("version")) {
            blogger.warn("Skipping logged batch because of unknown version");
-            return make_ready_future<>();
+            return make_ready_future<stop_iteration>(stop_iteration::no);
        }

        auto version = row.get_as<int32_t>("version");
        if (version != netw::messaging_service::current_version) {
            blogger.warn("Skipping logged batch because of incorrect version");
-            return make_ready_future<>();
+            return make_ready_future<stop_iteration>(stop_iteration::no);
        }

        auto data = row.get_blob("data");
@@ -253,49 +253,20 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
            auto now = service::client_state(service::client_state::internal_tag()).get_timestamp();
            m.partition().apply_delete(*schema, clustering_key_prefix::make_empty(), tombstone(now, gc_clock::now()));
            return _qp.proxy().mutate_locally(m, tracing::trace_state_ptr(), db::commitlog::force_sync::no);
-        });
+        }).then([] { return make_ready_future<stop_iteration>(stop_iteration::no); });
    };

-    return seastar::with_gate(_gate, [this, batch = std::move(batch)] {
+    return seastar::with_gate(_gate, [this, batch = std::move(batch)] () mutable {
        blogger.debug("Started replayAllFailedBatches (cpu {})", this_shard_id());
-
-        typedef ::shared_ptr<cql3::untyped_result_set> page_ptr;
-        sstring query = format("SELECT id, data, written_at, version FROM {}.{} LIMIT {:d}", system_keyspace::NAME, system_keyspace::BATCHLOG, page_size);
-        return _qp.execute_internal(query, cql3::query_processor::cache_internal::yes).then([this, batch = std::move(batch)](page_ptr page) {
-            return do_with(std::move(page), [this, batch = std::move(batch)](page_ptr & page) mutable {
-                return repeat([this, &page, batch = std::move(batch)]() mutable {
-                    if (page->empty()) {
-                        return make_ready_future<stop_iteration>(stop_iteration::yes);
-                    }
-                    auto id = page->back().get_as<utils::UUID>("id");
-                    return parallel_for_each(*page, batch).then([this, &page, id]() {
-                        if (page->size() < page_size) {
-                            return make_ready_future<stop_iteration>(stop_iteration::yes); // we've exhausted the batchlog, next query would be empty.
-                        }
-                        sstring query = format("SELECT id, data, written_at, version FROM {}.{} WHERE token(id) > token(?) LIMIT {:d}",
-                                system_keyspace::NAME,
-                                system_keyspace::BATCHLOG,
-                                page_size);
-                        return _qp.execute_internal(query, {id}, cql3::query_processor::cache_internal::yes).then([&page](auto res) {
-                                    page = std::move(res);
-                                    return make_ready_future<stop_iteration>(stop_iteration::no);
-                                });
-                    });
-                });
-            });
-        }).then([] {
-        // TODO FIXME : cleanup()
-#if 0
-            ColumnFamilyStore cfs = Keyspace.open(SystemKeyspace.NAME).getColumnFamilyStore(SystemKeyspace.BATCHLOG);
-            cfs.forceBlockingFlush();
-            Collection<Descriptor> descriptors = new ArrayList<>();
-            for (SSTableReader sstr : cfs.getSSTables())
-            descriptors.add(sstr.descriptor);
-            if (!descriptors.isEmpty()) // don't pollute the logs if there is nothing to compact.
-            CompactionManager.instance.submitUserDefined(cfs, descriptors, Integer.MAX_VALUE).get();
-
-#endif
-
+        return _qp.query_internal(
+                format("SELECT id, data, written_at, version FROM {}.{} BYPASS CACHE", system_keyspace::NAME, system_keyspace::BATCHLOG),
+                db::consistency_level::ONE,
+                {},
+                page_size,
+                std::move(batch)).then([this] {
+            // Replaying batches could have generated tombstones, flush to disk,
+            // where they can be compacted away.
+            return replica::database::flush_table_on_all_shards(_qp.proxy().get_db(), system_keyspace::NAME, system_keyspace::BATCHLOG);
        }).then([] {
            blogger.debug("Finished replayAllFailedBatches");
        });
--- a/db/config.cc
+++ b/db/config.cc
@@ -991,7 +991,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
    , unspooled_dirty_soft_limit(this, "unspooled_dirty_soft_limit", value_status::Used, 0.6, "Soft limit of unspooled dirty memory expressed as a portion of the hard limit.")
    , sstable_summary_ratio(this, "sstable_summary_ratio", value_status::Used, 0.0005, "Enforces that 1 byte of summary is written for every N (2000 by default)"
        "bytes written to data file. Value must be between 0 and 1.")
-    , components_memory_reclaim_threshold(this, "components_memory_reclaim_threshold", liveness::LiveUpdate, value_status::Used, .1, "Ratio of available memory for all in-memory components of SSTables in a shard beyond which the memory will be reclaimed from components until it falls back under the threshold. Currently, this limit is only enforced for bloom filters.")
+    , components_memory_reclaim_threshold(this, "components_memory_reclaim_threshold", liveness::LiveUpdate, value_status::Used, .2, "Ratio of available memory for all in-memory components of SSTables in a shard beyond which the memory will be reclaimed from components until it falls back under the threshold. Currently, this limit is only enforced for bloom filters.")
    , large_memory_allocation_warning_threshold(this, "large_memory_allocation_warning_threshold", value_status::Used, size_t(1) << 20, "Warn about memory allocations above this size; set to zero to disable.")
    , enable_deprecated_partitioners(this, "enable_deprecated_partitioners", value_status::Used, false, "Enable the byteordered and random partitioners. These partitioners are deprecated and will be removed in a future version.")
    , enable_keyspace_column_family_metrics(this, "enable_keyspace_column_family_metrics", value_status::Used, false, "Enable per keyspace and per column family metrics reporting.")
@@ -1031,6 +1031,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
            "Start serializing reads after their collective memory consumption goes above $normal_limit * $multiplier.")
    , reader_concurrency_semaphore_kill_limit_multiplier(this, "reader_concurrency_semaphore_kill_limit_multiplier", liveness::LiveUpdate, value_status::Used, 4,
            "Start killing reads after their collective memory consumption goes above $normal_limit * $multiplier.")
+    , reader_concurrency_semaphore_cpu_concurrency(this, "reader_concurrency_semaphore_cpu_concurrency", liveness::LiveUpdate, value_status::Used, 1,
+            "Admit new reads while there are less than this number of requests that need CPU.")
    , twcs_max_window_count(this, "twcs_max_window_count", liveness::LiveUpdate, value_status::Used, 50,
            "The maximum number of compaction windows allowed when making use of TimeWindowCompactionStrategy. A setting of 0 effectively disables the restriction.")
    , initial_sstable_loading_concurrency(this, "initial_sstable_loading_concurrency", value_status::Used, 4u,
@@ -1157,6 +1159,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
    , service_levels_interval(this, "service_levels_interval_ms", liveness::LiveUpdate, value_status::Used, 10000, "Controls how often service levels module polls configuration table")
    , error_injections_at_startup(this, "error_injections_at_startup", error_injection_value_status, {}, "List of error injections that should be enabled on startup.")
    , topology_barrier_stall_detector_threshold_seconds(this, "topology_barrier_stall_detector_threshold_seconds", value_status::Used, 2, "Report sites blocking topology barrier if it takes longer than this.")
+    , enable_tablets(this, "enable_tablets", value_status::Used, false, "Enable tablets for newly created keyspaces")
    , default_log_level(this, "default_log_level", value_status::Used)
    , logger_log_level(this, "logger_log_level", value_status::Used)
    , log_to_stdout(this, "log_to_stdout", value_status::Used)
@@ -1347,7 +1350,7 @@ std::map<sstring, db::experimental_features_t::feature> db::experimental_feature
        {"consistent-topology-changes", feature::UNUSED},
        {"broadcast-tables", feature::BROADCAST_TABLES},
        {"keyspace-storage-options", feature::KEYSPACE_STORAGE_OPTIONS},
-        {"tablets", feature::TABLETS},
+        {"tablets", feature::UNUSED},
    };
 }

--- a/db/config.hh
+++ b/db/config.hh
@@ -111,7 +111,6 @@ struct experimental_features_t {
        ALTERNATOR_STREAMS,
        BROADCAST_TABLES,
        KEYSPACE_STORAGE_OPTIONS,
-        TABLETS,
    };
    static std::map<sstring, feature> map(); // See enum_option.
    static std::vector<enum_option<experimental_features_t>> all();
@@ -390,6 +389,7 @@ public:
    named_value<uint64_t> max_memory_for_unlimited_query_hard_limit;
    named_value<uint32_t> reader_concurrency_semaphore_serialize_limit_multiplier;
    named_value<uint32_t> reader_concurrency_semaphore_kill_limit_multiplier;
+    named_value<uint32_t> reader_concurrency_semaphore_cpu_concurrency;
    named_value<uint32_t> twcs_max_window_count;
    named_value<unsigned> initial_sstable_loading_concurrency;
    named_value<bool> enable_3_1_0_compatibility_mode;
@@ -495,6 +495,7 @@ public:

    named_value<std::vector<error_injection_at_startup>> error_injections_at_startup;
    named_value<double> topology_barrier_stall_detector_threshold_seconds;
+    named_value<bool> enable_tablets;

    static const sstring default_tls_priority;
 private:
--- a/db/hints/manager.cc
+++ b/db/hints/manager.cc
@@ -278,7 +278,7 @@ sync_point::shard_rps manager::calculate_current_sync_point(std::span<const gms:
        auto it = _ep_managers.find(*hid);
        if (it != _ep_managers.end()) {
            const hint_endpoint_manager& ep_man = it->second;
-            rps[addr] = ep_man.last_written_replay_position();
+            rps[*hid] = ep_man.last_written_replay_position();
        }
    }

@@ -316,10 +316,14 @@ future<> manager::wait_for_sync_point(abort_source& as, const sync_point::shard_
    hid_rps.reserve(rps.size());

    for (const auto& [addr, rp] : rps) {
-        const auto maybe_hid = tmptr->get_host_id_if_known(addr);
-        // Ignore the IPs we cannot map.
-        if (maybe_hid) [[likely]] {
-            hid_rps.emplace(*maybe_hid, rp);
+        if (std::holds_alternative<gms::inet_address>(addr)) {
+            const auto maybe_hid = tmptr->get_host_id_if_known(std::get<gms::inet_address>(addr));
+            // Ignore the IPs we cannot map.
+            if (maybe_hid) [[likely]] {
+                hid_rps.emplace(*maybe_hid, rp);
+            }
+        } else {
+            hid_rps.emplace(std::get<locator::host_id>(addr), rp);
        }
    }

@@ -409,6 +413,12 @@ bool manager::have_ep_manager(const std::variant<locator::host_id, gms::inet_add
 bool manager::store_hint(endpoint_id host_id, gms::inet_address ip, schema_ptr s, lw_shared_ptr<const frozen_mutation> fm,
        tracing::trace_state_ptr tr_state) noexcept
 {
+    if (utils::get_local_injector().enter("reject_incoming_hints")) {
+        manager_logger.debug("Rejecting a hint to {} / {} due to an error injection", host_id, ip);
+        ++_stats.dropped;
+        return false;
+    }
+
    if (stopping() || draining_all() || !started() || !can_hint_for(host_id)) {
        manager_logger.trace("Can't store a hint to {}", host_id);
        ++_stats.dropped;
@@ -554,10 +564,16 @@ future<> manager::change_host_filter(host_filter filter) {
            const auto maybe_host_id_and_ip = std::invoke([&] () -> std::optional<pair_type> {
                try {
                    locator::host_id_or_endpoint hid_or_ep{de.name};
-                    if (hid_or_ep.has_host_id()) {
+
+                    // If hinted handoff is host-ID-based, hint directories representing IP addresses must've
+                    // been created by mistake and they're invalid. The same for pre-host-ID hinted handoff
+                    // -- hint directories representing host IDs are NOT valid.
+                    if (hid_or_ep.has_host_id() && _uses_host_id) {
                        return std::make_optional(pair_type{hid_or_ep.id(), hid_or_ep.resolve_endpoint(*tmptr)});
-                    } else {
+                    } else if (hid_or_ep.has_endpoint() && !_uses_host_id) {
                        return std::make_optional(pair_type{hid_or_ep.resolve_id(*tmptr), hid_or_ep.endpoint()});
+                    } else {
+                        return std::nullopt;
                    }
                } catch (...) {
                    return std::nullopt;
@@ -565,6 +581,8 @@ future<> manager::change_host_filter(host_filter filter) {
            });

            if (!maybe_host_id_and_ip) {
+                manager_logger.warn("Encountered a hint directory of invalid name while changing the host filter: {}. "
+                        "Hints stored in it won't be replayed.", de.name);
                co_return;
            }

@@ -618,12 +636,12 @@ bool manager::check_dc_for(endpoint_id ep) const noexcept {
    }
 }

-future<> manager::drain_for(endpoint_id endpoint) noexcept {
+future<> manager::drain_for(endpoint_id host_id, gms::inet_address ip) noexcept {
    if (!started() || stopping() || draining_all()) {
        co_return;
    }

-    manager_logger.trace("on_leave_cluster: {} is removed/decommissioned", endpoint);
+    manager_logger.trace("on_leave_cluster: {} is removed/decommissioned", host_id);

    const auto holder = seastar::gate::holder{_draining_eps_gate};
    // As long as we hold on to this lock, no migration of hinted handoff to host IDs
@@ -642,7 +660,7 @@ future<> manager::drain_for(endpoint_id endpoint) noexcept {

    std::exception_ptr eptr = nullptr;

-    if (_proxy.local_db().get_token_metadata().get_topology().is_me(endpoint)) {
+    if (_proxy.local_db().get_token_metadata().get_topology().is_me(host_id)) {
        set_draining_all();

        try {
@@ -657,28 +675,45 @@ future<> manager::drain_for(endpoint_id endpoint) noexcept {
        _ep_managers.clear();
        _hint_directory_manager.clear();
    } else {
-        auto it = _ep_managers.find(endpoint);
-
-        if (it != _ep_managers.end()) {
-            try {
-                co_await drain_ep_manager(it->second);
-            } catch (...) {
-                eptr = std::current_exception();
+        const auto maybe_host_id = std::invoke([&] () -> std::optional<locator::host_id> {
+            if (_uses_host_id) {
+                return host_id;
            }
+            // Before the whole cluster is migrated to the host-ID-based hinted handoff,
+            // one hint directory may correspond to multiple target nodes. If *any* of them
+            // leaves the cluster, we should drain the hint directory. This is why we need
+            // to rely on this mapping here.
+            const auto maybe_mapping = _hint_directory_manager.get_mapping(host_id, ip);
+            if (maybe_mapping) {
+                return maybe_mapping->first;
+            }
+            return std::nullopt;
+        });

-            // We can't provide the function with `it` here because we co_await above,
-            // so iterators could have been invalidated.
-            // This never throws.
-            _ep_managers.erase(endpoint);
-            _hint_directory_manager.remove_mapping(endpoint);
+        if (maybe_host_id) {
+            auto it = _ep_managers.find(*maybe_host_id);
+
+            if (it != _ep_managers.end()) {
+                try {
+                    co_await drain_ep_manager(it->second);
+                } catch (...) {
+                    eptr = std::current_exception();
+                }
+
+                // We can't provide the function with `it` here because we co_await above,
+                // so iterators could have been invalidated.
+                // This never throws.
+                _ep_managers.erase(*maybe_host_id);
+                _hint_directory_manager.remove_mapping(*maybe_host_id);
+            }
        }
    }

    if (eptr) {
-        manager_logger.error("Exception when draining {}: {}", endpoint, eptr);
+        manager_logger.error("Exception when draining {}: {}", host_id, eptr);
    }

-    manager_logger.trace("drain_for: finished draining {}", endpoint);
+    manager_logger.trace("drain_for: finished draining {}", host_id);
 }

 void manager::update_backlog(size_t backlog, size_t max_backlog) {
@@ -700,8 +735,6 @@ future<> manager::with_file_update_mutex_for(const std::variant<locator::host_id
    return _ep_managers.at(host_id).with_file_update_mutex(std::move(func));
 }

-// The function assumes that if `_uses_host_id == true`, then there are no directories that represent IP addresses,
-// i.e. every directory is either valid and represents a host ID, or is invalid (so it should be ignored anyway).
 future<> manager::initialize_endpoint_managers() {
    auto maybe_create_ep_mgr = [this] (const locator::host_id& host_id, const gms::inet_address& ip) -> future<> {
        if (!check_dc_for(host_id)) {
@@ -729,16 +762,29 @@ future<> manager::initialize_endpoint_managers() {

        // The directory is invalid, so there's nothing more to do.
        if (!maybe_host_id_or_ep) {
+            manager_logger.warn("Encountered a hint directory of invalid name while initializing endpoint managers: {}. "
+                    "Hints stored in it won't be replayed", de.name);
            co_return;
        }

        if (_uses_host_id) {
+            // If hinted handoff is host-ID-based but the directory doesn't represent a host ID,
+            // it's invalid. Ignore it.
+            if (!maybe_host_id_or_ep->has_host_id()) {
+                co_return;
+            }
+
            // If hinted handoff is host-ID-based, `get_ep_manager` will NOT use the passed IP address,
            // so we simply pass the default value there.
            co_return co_await maybe_create_ep_mgr(maybe_host_id_or_ep->id(), gms::inet_address{});
        }

        // If we have got to this line, hinted handoff is still IP-based and we need to map the IP.
+        if (!maybe_host_id_or_ep->has_endpoint()) {
+            // If the directory name doesn't represent an IP, it's invalid. We ignore it.
+            co_return;
+        }
+
        const auto maybe_host_id = std::invoke([&] () -> std::optional<locator::host_id> {
            try {
                return maybe_host_id_or_ep->resolve_id(*tmptr);
--- a/db/hints/manager.hh
+++ b/db/hints/manager.hh
@@ -317,11 +317,16 @@ public:
    /// In both cases - removes the corresponding hints' directories after all hints have been drained and erases the
    /// corresponding hint_endpoint_manager objects.
    ///
-    /// \param endpoint node that left the cluster
-    future<> drain_for(endpoint_id endpoint) noexcept;
+    /// \param host_id host ID of the node that left the cluster
+    /// \param ip the IP of the node that left the cluster
+    future<> drain_for(endpoint_id host_id, gms::inet_address ip) noexcept;

    void update_backlog(size_t backlog, size_t max_backlog);

+    bool uses_host_id() const noexcept {
+        return _uses_host_id;
+    }
+
 private:
    bool stopping() const noexcept {
        return _state.contains(state::stopping);
--- a/db/hints/resource_manager.cc
+++ b/db/hints/resource_manager.cc
@@ -148,10 +148,16 @@ void space_watchdog::on_timer() {
                auto maybe_variant = std::invoke([&] () -> std::optional<std::variant<locator::host_id, gms::inet_address>> {
                    try {
                        const auto hid_or_ep = locator::host_id_or_endpoint{de.name};
-                        if (hid_or_ep.has_host_id()) {
+
+                        // If hinted handoff is host-ID-based, hint directories representing IP addresses must've
+                        // been created by mistake and they're invalid. The same for pre-host-ID hinted handoff
+                        // -- hint directories representing host IDs are NOT valid.
+                        if (hid_or_ep.has_host_id() && shard_manager.uses_host_id()) {
                            return std::variant<locator::host_id, gms::inet_address>(hid_or_ep.id());
-                        } else {
+                        } else if (hid_or_ep.has_endpoint() && !shard_manager.uses_host_id()) {
                            return std::variant<locator::host_id, gms::inet_address>(hid_or_ep.endpoint());
+                        } else {
+                            return std::nullopt;
                        }
                    } catch (...) {
                        return std::nullopt;
@@ -173,6 +179,8 @@ void space_watchdog::on_timer() {
                // Case 3: The directory isn't managed by an endpoint manager, and it represents neither an IP address,
                //         nor a host ID.
                else {
+                    // We use trace here to prevent flooding logs with unnecessary information.
+                    resource_manager_logger.trace("Encountered a hint directory of invalid name while scanning: {}", de.name);
                    return scan_one_ep_dir(dir / de.name, shard_manager, {});
                }
            }).get();
--- a/db/hints/sync_point.cc
+++ b/db/hints/sync_point.cc
@@ -26,52 +26,63 @@ namespace hints {
 //
 // Format V1 (encoded in base64):
 //   uint8_t 0x01 - version of format
-//   sync_point_v1 - encoded using IDL
+//   sync_point_v1_or_v2 - encoded using IDL
 //
 // Format V2 (encoded in base64):
 //   uint8_t 0x02 - version of format
-//   sync_point_v1 - encoded using IDL
+//   sync_point_v1_or_v2 - encoded using IDL
 //   uint64_t - checksum computed using the xxHash algorithm
 //
-// sync_point_v1:
+// Format V3 (encoded in base64):
+//   uint8_t 0x03 - version of format
+//   sync_point_v3 - encoded using IDL
+//   uint64_t - checksum computed using the xxHash algorithm
+//
+// sync_point_v1_or_v2:
 //   UUID host_id - ID of the host which created the sync point
 //   uint16_t shard_count - the number of shards in this sync point
-//   per_manager_sync_point_v1 regular_sp - replay positions for regular mutation hint queues
-//   per_manager_sync_point_v1 mv_sp - replay positions for materialized view hint queues
+//   per_manager_sync_point_v1_or_v2 regular_sp - replay positions for regular mutation hint queues
+//   per_manager_sync_point_v1_or_v2 mv_sp - replay positions for materialized view hint queues
 //
-// per_manager_sync_point_v1:
-//   std::vector<gms::inet_address> addresses - addresses for which this sync point defines replay positions
+// per_manager_sync_point_v1_or_v2:
+//   std::vector<gms::inet_address> endpoints - addresses for which this sync point defines replay positions
 //   std::vector<db::replay_position> flattened_rps:
-//       A flattened collection of replay positions for all addresses and shards.
+//       A flattened collection of replay positions for all endpoints and shards.
 //       Replay positions are grouped by address, in the same order as in
-//       the `addresses` field, and there is one replay position for each of
+//       the `endpoints` field, and there is one replay position for each of
 //       the shards (shard count is defined by the `shard_count`) field.
 //       Flattened representation was chosen in order to save space on
 //       vector lengths etc.
+//
+// sync_point_v3:
+//   same layout as sync_point_v1_or_v2, except that regular_sp and mv_sp are of type
+//   per_manager_sync_point_v3, whose `endpoints` field holds locator::host_id
+//   values instead of gms::inet_address.

 static constexpr size_t version_size = sizeof(uint8_t);
 static constexpr size_t checksum_size = sizeof(uint64_t);

-static std::vector<sync_point::shard_rps> decode_one_type_v1(uint16_t shard_count, const per_manager_sync_point_v1& v1) {
+template <typename PerManagerType>
+static std::vector<sync_point::shard_rps> decode_one_type(uint16_t shard_count, const PerManagerType& v) {
    std::vector<sync_point::shard_rps> ret;

-    if (size_t(shard_count) * v1.addresses.size() != v1.flattened_rps.size()) {
+    if (size_t(shard_count) * v.endpoints.size() != v.flattened_rps.size()) {
        throw std::runtime_error(format("Could not decode the sync point - there should be {} rps in flattened_rps, but there are only {}",
-                size_t(shard_count) * v1.addresses.size(), v1.flattened_rps.size()));
+                size_t(shard_count) * v.endpoints.size(), v.flattened_rps.size()));
    }

    ret.resize(std::max(unsigned(shard_count), smp::count));

-    auto rps_it = v1.flattened_rps.begin();
-    for (const auto addr : v1.addresses) {
+    auto rps_it = v.flattened_rps.begin();
+    for (const auto ep : v.endpoints) {
        uint16_t shard;
        for (shard = 0; shard < shard_count; shard++) {
-            ret[shard].emplace(addr, *rps_it++);
+            ret[shard].emplace(ep, *rps_it++);
        }
        // Fill missing shards with zero replay positions so that segments
        // which were moved across shards will be correctly waited on
        for (; shard < smp::count; shard++) {
-            ret[shard].emplace(addr, db::replay_position());
+            ret[shard].emplace(ep, db::replay_position());
        }
    }

@@ -94,50 +105,62 @@ sync_point sync_point::decode(sstring_view s) {
    seastar::simple_memory_input_stream in{raw_s.data(), raw_s.size()};

    uint8_t version = ser::serializer<uint8_t>::read(in);
-    if (version == 2) {
+    if (version == 2 || version == 3) {
        if (raw_s.size() < version_size + checksum_size) {
-            throw std::runtime_error("Could not decode the sync point encoded in the V2 format - serialized blob is too short");
+            throw std::runtime_error("Could not decode the sync point encoded in the V2/V3 format - serialized blob is too short");
        }

        seastar::simple_memory_input_stream in_checksum{raw_s.end() - checksum_size, checksum_size};
        uint64_t checksum = ser::serializer<uint64_t>::read(in_checksum);
        if (checksum != calculate_checksum(raw_s.substr(0, raw_s.size() - checksum_size))) {
-            throw std::runtime_error("Could not decode the sync point encoded in the V2 format - wrong checksum");
+            throw std::runtime_error("Could not decode the sync point encoded in the V2/V3 format - wrong checksum");
        }
    }
    else if (version != 1) {
        throw std::runtime_error(format("Unsupported sync point format version: {}", int(version)));
    }

-    sync_point_v1 v1 = ser::serializer<sync_point_v1>::read(in);
+    if (version == 1 || version == 2) {
+        sync_point_v1_or_v2 v = ser::serializer<sync_point_v1_or_v2>::read(in);
+
+        return sync_point{
+            v.host_id,
+            decode_one_type(v.shard_count, v.regular_sp),
+            decode_one_type(v.shard_count, v.mv_sp),
+        };
+    }
+
+    // version == 3
+    sync_point_v3 v3 = ser::serializer<sync_point_v3>::read(in);

    return sync_point{
-        v1.host_id,
-        decode_one_type_v1(v1.shard_count, v1.regular_sp),
-        decode_one_type_v1(v1.shard_count, v1.mv_sp),
+        v3.host_id,
+        decode_one_type(v3.shard_count, v3.regular_sp),
+        decode_one_type(v3.shard_count, v3.mv_sp),
    };
 }

-static per_manager_sync_point_v1 encode_one_type_v1(unsigned shards, const std::vector<sync_point::shard_rps>& rps) {
-    per_manager_sync_point_v1 ret;
+static per_manager_sync_point_v3 encode_one_type_v3(unsigned shards, const std::vector<sync_point::shard_rps>& rps) {
+    per_manager_sync_point_v3 ret;

-    // Gather all addresses, from all shards
-    std::unordered_set<gms::inet_address> all_addrs;
+    // Gather all endpoints, from all shards
+    std::unordered_set<locator::host_id> all_eps;
    for (const auto& shard_rps : rps) {
        for (const auto& p : shard_rps) {
-            all_addrs.insert(p.first);
+            // New sync points are created with host_id only
+            all_eps.insert(std::get<locator::host_id>(p.first));
        }
    }

-    ret.flattened_rps.reserve(size_t(shards) * all_addrs.size());
+    ret.flattened_rps.reserve(size_t(shards) * all_eps.size());

-    // Encode into v1 struct
-    // For each address, we encode a replay position for all shards.
+    // Encode into v3 struct
+    // For each endpoint, we encode a replay position for all shards.
    // If there is no replay position for a shard, we use a zero replay position.
-    for (const auto addr : all_addrs) {
-        ret.addresses.push_back(addr);
+    for (const auto ep : all_eps) {
+        ret.endpoints.push_back(ep);
        for (const auto& shard_rps : rps) {
-            auto it = shard_rps.find(addr);
+            auto it = shard_rps.find(ep);
            if (it != shard_rps.end()) {
                ret.flattened_rps.push_back(it->second);
            } else {
@@ -154,24 +177,24 @@ static per_manager_sync_point_v1 encode_one_type_v1(unsigned shards, const std::
 }

 sstring sync_point::encode() const {
-    // Encode as v1 structure
-    sync_point_v1 v1;
-    v1.host_id = this->host_id;
-    v1.shard_count = std::max(this->regular_per_shard_rps.size(), this->mv_per_shard_rps.size());
-    v1.regular_sp = encode_one_type_v1(v1.shard_count, this->regular_per_shard_rps);
-    v1.mv_sp = encode_one_type_v1(v1.shard_count, this->mv_per_shard_rps);
+    // Encode as v3 structure
+    sync_point_v3 v3;
+    v3.host_id = this->host_id;
+    v3.shard_count = std::max(this->regular_per_shard_rps.size(), this->mv_per_shard_rps.size());
+    v3.regular_sp = encode_one_type_v3(v3.shard_count, this->regular_per_shard_rps);
+    v3.mv_sp = encode_one_type_v3(v3.shard_count, this->mv_per_shard_rps);

    // Measure how much space we need
    seastar::measuring_output_stream measure;
-    ser::serializer<sync_point_v1>::write(measure, v1);
+    ser::serializer<sync_point_v3>::write(measure, v3);

    // Reserve version_size bytes for the version and checksum_size bytes for the checksum
    bytes serialized{bytes::initialized_later{}, version_size + measure.size() + checksum_size};

-    // Encode using V2 format
+    // Encode using V3 format
    seastar::simple_memory_output_stream out{reinterpret_cast<char*>(serialized.data()), serialized.size()};
-    ser::serializer<uint8_t>::write(out, 2);
-    ser::serializer<sync_point_v1>::write(out, v1);
+    ser::serializer<uint8_t>::write(out, 3);
+    ser::serializer<sync_point_v3>::write(out, v3);
    sstring_view serialized_s(reinterpret_cast<const char*>(serialized.data()), version_size + measure.size());
    uint64_t checksum = calculate_checksum(serialized_s);
    ser::serializer<uint64_t>::write(out, checksum);
--- a/db/hints/sync_point.hh
+++ b/db/hints/sync_point.hh
@@ -22,7 +22,8 @@ namespace hints {
 // A sync point is a collection of positions in hint queues which can be waited on.
 // The sync point encompasses one type of hints manager only.
 struct sync_point {
-    using shard_rps = std::unordered_map<gms::inet_address, db::replay_position>;
+    using host_id_or_addr = std::variant<locator::host_id, gms::inet_address>;
+    using shard_rps = std::unordered_map<host_id_or_addr, db::replay_position>;
    // ID of the host which created this sync point
    locator::host_id host_id;
    std::vector<shard_rps> regular_per_shard_rps;
@@ -40,21 +41,41 @@ struct sync_point {
 // IDL type
 // Contains per-endpoint and per-shard information about replay positions
 // for a particular type of hint queues (regular mutation hints or MV update hints)
-struct per_manager_sync_point_v1 {
-    std::vector<gms::inet_address> addresses;
+struct per_manager_sync_point_v1_or_v2 {
+    std::vector<gms::inet_address> endpoints;
    std::vector<db::replay_position> flattened_rps;
 };

 // IDL type
-struct sync_point_v1 {
+struct sync_point_v1_or_v2 {
    locator::host_id host_id;
    uint16_t shard_count;

    // Sync point information for regular mutation hints
-    db::hints::per_manager_sync_point_v1 regular_sp;
+    db::hints::per_manager_sync_point_v1_or_v2 regular_sp;

    // Sync point information for materialized view hints
-    db::hints::per_manager_sync_point_v1 mv_sp;
+    db::hints::per_manager_sync_point_v1_or_v2 mv_sp;
+};
+
+// IDL type
+// same as per_manager_sync_point_v1_or_v2 except that it stores the
+// endpoints as host_id instead of address
+struct per_manager_sync_point_v3 {
+    std::vector<locator::host_id> endpoints;
+    std::vector<db::replay_position> flattened_rps;
+};
+
+// IDL type
+struct sync_point_v3 {
+    locator::host_id host_id;
+    uint16_t shard_count;
+
+    // Sync point information for regular mutation hints
+    db::hints::per_manager_sync_point_v3 regular_sp;
+
+    // Sync point information for materialized view hints
+    db::hints::per_manager_sync_point_v3 mv_sp;
 };

 }
--- a/db/paxos_grace_seconds_extension.hh
+++ b/db/paxos_grace_seconds_extension.hh
@@ -55,6 +55,10 @@ public:
        return ser::serialize_to_buffer<bytes>(_paxos_gc_sec);
    }

+    std::string options_to_string() const override {
+        return std::to_string(_paxos_gc_sec);
+    }
+
    static int32_t deserialize(const bytes_view& buffer) {
        return ser::deserialize_from_buffer(buffer, boost::type<int32_t>());
    }
--- a/db/schema_tables.cc
+++ b/db/schema_tables.cc
@@ -14,7 +14,6 @@
 #include "gms/feature_service.hh"
 #include "partition_slice_builder.hh"
 #include "dht/i_partitioner.hh"
-#include "system_auth_keyspace.hh"
 #include "system_keyspace.hh"
 #include "query-result-set.hh"
 #include "query-result-writer.hh"
@@ -235,7 +234,6 @@ future<> save_system_schema(cql3::query_processor& qp) {
    co_await save_system_schema_to_keyspace(qp, schema_tables::NAME);
    // #2514 - make sure "system" is written to system_schema.keyspaces.
    co_await save_system_schema_to_keyspace(qp, system_keyspace::NAME);
-    co_await save_system_schema_to_keyspace(qp, system_auth_keyspace::NAME);
 }

 namespace v3 {
@@ -1296,7 +1294,6 @@ static future<> do_merge_schema(distributed<service::storage_proxy>& proxy, shar
    schema_ptr s = keyspaces();
    // compare before/after schemas of the affected keyspaces only
    std::set<sstring> keyspaces;
-    std::set<table_id> column_families;
    std::unordered_map<keyspace_name, table_selector> affected_tables;
    bool has_tablet_mutations = false;
    for (auto&& mutation : mutations) {
@@ -1311,7 +1308,6 @@ static future<> do_merge_schema(distributed<service::storage_proxy>& proxy, shar
        }

        keyspaces.emplace(std::move(keyspace_name));
-        column_families.emplace(mutation.column_family_id());
        // We must force recalculation of schema version after the merge, since the resulting
        // schema may be a mix of the old and new schemas, with the exception of entries
        // that originate from group 0.
--- a/db/system_auth_keyspace.cc
+++ b/db/system_auth_keyspace.cc
@@ -1,141 +0,0 @@
-/*
- * Modified by ScyllaDB
- * Copyright (C) 2024-present ScyllaDB
- */
-
-/*
- * SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
- */
-
-#include "system_auth_keyspace.hh"
-#include "system_keyspace.hh"
-#include "db/schema_tables.hh"
-#include "schema/schema_builder.hh"
-#include "types/set.hh"
-
-namespace db {
-
-// all system auth tables use schema commitlog
-namespace {
-    const auto set_use_schema_commitlog = schema_builder::register_static_configurator([](const sstring& ks_name, const sstring& cf_name, schema_static_props& props) {
-        if (ks_name == system_auth_keyspace::NAME) {
-            props.enable_schema_commitlog();
-        }
-    });
-} // anonymous namespace
-
-namespace system_auth_keyspace {
-
-// use the same gc setting as system_schema tables
-using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
-// FIXME: in some cases time-based gc may cause data resurrection,
-// for more info see https://github.com/scylladb/scylladb/issues/15607
-static constexpr auto auth_gc_grace = std::chrono::duration_cast<std::chrono::seconds>(days(7)).count();
-
-schema_ptr roles() {
-    static thread_local auto schema = [] {
-        schema_builder builder(generate_legacy_id(NAME, ROLES), NAME, ROLES,
-        // partition key
-        {{"role", utf8_type}},
-        // clustering key
-        {},
-        // regular columns
-        {
-            {"can_login", boolean_type},
-            {"is_superuser", boolean_type},
-            {"member_of", set_type_impl::get_instance(utf8_type, true)},
-            {"salted_hash", utf8_type}
-        },
-        // static columns
-        {},
-        // regular column name type
-        utf8_type,
-        // comment
-        "roles for authentication and RBAC"
-        );
-        builder.set_gc_grace_seconds(auth_gc_grace);
-        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
-        return builder.build();
-    }();
-    return schema;
-}
-
-schema_ptr role_members() {
-    static thread_local auto schema = [] {
-        schema_builder builder(generate_legacy_id(NAME, ROLE_MEMBERS), NAME, ROLE_MEMBERS,
-        // partition key
-        {{"role", utf8_type}},
-        // clustering key
-        {{"member", utf8_type}},
-        // regular columns
-        {},
-        // static columns
-        {},
-        // regular column name type
-        utf8_type,
-        // comment
-        "joins users and their granted roles in RBAC"
-        );
-        builder.set_gc_grace_seconds(auth_gc_grace);
-        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
-        return builder.build();
-    }();
-    return schema;
-}
-
-schema_ptr role_attributes() {
-    static thread_local auto schema = [] {
-        schema_builder builder(generate_legacy_id(NAME, ROLE_ATTRIBUTES), NAME, ROLE_ATTRIBUTES,
-        // partition key
-        {{"role", utf8_type}},
-        // clustering key
-        {{"name", utf8_type}},
-        // regular columns
-        {
-            {"value", utf8_type}
-        },
-        // static columns
-        {},
-        // regular column name type
-        utf8_type,
-        // comment
-        "role permissions in RBAC"
-        );
-        builder.set_gc_grace_seconds(auth_gc_grace);
-        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
-        return builder.build();
-    }();
-    return schema;
-}
-
-schema_ptr role_permissions() {
-    static thread_local auto schema = [] {
-        schema_builder builder(generate_legacy_id(NAME, ROLE_PERMISSIONS), NAME, ROLE_PERMISSIONS,
-        // partition key
-        {{"role", utf8_type}},
-        // clustering key
-        {{"resource", utf8_type}},
-        // regular columns
-        {
-            {"permissions", set_type_impl::get_instance(utf8_type, true)}
-        },
-        // static columns
-        {},
-        // regular column name type
-        utf8_type,
-        // comment
-        "role permissions for CassandraAuthorizer"
-        );
-        builder.set_gc_grace_seconds(auth_gc_grace);
-        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
-        return builder.build();
-    }();
-    return schema;
-}
-
-std::vector<schema_ptr> all_tables() {
-    return {roles(), role_members(), role_attributes(), role_permissions()};
-}
-
-} // namespace system_auth_keyspace
-} // namespace db
--- a/db/system_auth_keyspace.hh
+++ b/db/system_auth_keyspace.hh
@@ -1,38 +0,0 @@
-/*
- * Modified by ScyllaDB
- * Copyright (C) 2024-present ScyllaDB
- */
-
-/*
- * SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
- */
-
-#pragma once
-
-#include "schema/schema_fwd.hh"
-#include <vector>
-
-namespace db {
-
-namespace system_auth_keyspace {
-    enum class version_t: int64_t {
-        v1 = 1,
-        v2 = 2,
-    };
-    static constexpr auto NAME = "system_auth_v2";
-    // tables
-    static constexpr auto ROLES = "roles";
-    static constexpr auto ROLE_MEMBERS = "role_members";
-    static constexpr auto ROLE_ATTRIBUTES = "role_attributes";
-    static constexpr auto ROLE_PERMISSIONS = "role_permissions";
-
-
-    schema_ptr roles();
-    schema_ptr role_members();
-    schema_ptr role_attributes();
-    schema_ptr role_permissions();
-
-    std::vector<schema_ptr> all_tables();
-}; // namespace system_auth_keyspace
-
-} // namespace db
--- a/db/system_keyspace.cc
+++ b/db/system_keyspace.cc
@@ -18,7 +18,6 @@
 #include <seastar/core/on_internal_error.hh>
 #include "system_keyspace.hh"
 #include "cql3/untyped_result_set.hh"
-#include "db/system_auth_keyspace.hh"
 #include "thrift/server.hh"
 #include "cql3/query_processor.hh"
 #include "partition_slice_builder.hh"
@@ -88,6 +87,10 @@ namespace {
            system_keyspace::SCYLLA_LOCAL,
            system_keyspace::COMMITLOG_CLEANUPS,
            system_keyspace::SERVICE_LEVELS_V2,
+            system_keyspace::ROLES,
+            system_keyspace::ROLE_MEMBERS,
+            system_keyspace::ROLE_ATTRIBUTES,
+            system_keyspace::ROLE_PERMISSIONS,
            system_keyspace::v3::CDC_LOCAL
        };
        if (ks_name == system_keyspace::NAME && tables.contains(cf_name)) {
@@ -233,12 +236,15 @@ schema_ptr system_keyspace::topology() {
            .with_column("request_id", timeuuid_type)
            .with_column("ignore_nodes", set_type_impl::get_instance(uuid_type, true), column_kind::static_column)
            .with_column("new_cdc_generation_data_uuid", timeuuid_type, column_kind::static_column)
+            .with_column("new_keyspace_rf_change_ks_name", utf8_type, column_kind::static_column)
+            .with_column("new_keyspace_rf_change_data", map_type_impl::get_instance(utf8_type, utf8_type, false), column_kind::static_column)
            .with_column("version", long_type, column_kind::static_column)
            .with_column("fence_version", long_type, column_kind::static_column)
            .with_column("transition_state", utf8_type, column_kind::static_column)
            .with_column("committed_cdc_generations", set_type_impl::get_instance(cdc_generation_ts_id_type, true), column_kind::static_column)
            .with_column("unpublished_cdc_generations", set_type_impl::get_instance(cdc_generation_ts_id_type, true), column_kind::static_column)
            .with_column("global_topology_request", utf8_type, column_kind::static_column)
+            .with_column("global_topology_request_id", timeuuid_type, column_kind::static_column)
            .with_column("enabled_features", set_type_impl::get_instance(utf8_type, true), column_kind::static_column)
            .with_column("session", uuid_type, column_kind::static_column)
            .with_column("tablet_balancing_enabled", boolean_type, column_kind::static_column)
@@ -1139,6 +1145,103 @@ schema_ptr system_keyspace::service_levels_v2() {
    return schema;
 }

+schema_ptr system_keyspace::roles() {
+    static thread_local auto schema = [] {
+        schema_builder builder(generate_legacy_id(NAME, ROLES), NAME, ROLES,
+        // partition key
+        {{"role", utf8_type}},
+        // clustering key
+        {},
+        // regular columns
+        {
+            {"can_login", boolean_type},
+            {"is_superuser", boolean_type},
+            {"member_of", set_type_impl::get_instance(utf8_type, true)},
+            {"salted_hash", utf8_type}
+        },
+        // static columns
+        {},
+        // regular column name type
+        utf8_type,
+        // comment
+        "roles for authentication and RBAC"
+        );
+        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
+        return builder.build();
+    }();
+    return schema;
+}
+
+schema_ptr system_keyspace::role_members() {
+    static thread_local auto schema = [] {
+        schema_builder builder(generate_legacy_id(NAME, ROLE_MEMBERS), NAME, ROLE_MEMBERS,
+        // partition key
+        {{"role", utf8_type}},
+        // clustering key
+        {{"member", utf8_type}},
+        // regular columns
+        {},
+        // static columns
+        {},
+        // regular column name type
+        utf8_type,
+        // comment
+        "joins users and their granted roles in RBAC"
+        );
+        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
+        return builder.build();
+    }();
+    return schema;
+}
+
+schema_ptr system_keyspace::role_attributes() {
+    static thread_local auto schema = [] {
+        schema_builder builder(generate_legacy_id(NAME, ROLE_ATTRIBUTES), NAME, ROLE_ATTRIBUTES,
+        // partition key
+        {{"role", utf8_type}},
+        // clustering key
+        {{"name", utf8_type}},
+        // regular columns
+        {
+            {"value", utf8_type}
+        },
+        // static columns
+        {},
+        // regular column name type
+        utf8_type,
+        // comment
+        "role permissions in RBAC"
+        );
+        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
+        return builder.build();
+    }();
+    return schema;
+}
+
+schema_ptr system_keyspace::role_permissions() {
+    static thread_local auto schema = [] {
+        schema_builder builder(generate_legacy_id(NAME, ROLE_PERMISSIONS), NAME, ROLE_PERMISSIONS,
+        // partition key
+        {{"role", utf8_type}},
+        // clustering key
+        {{"resource", utf8_type}},
+        // regular columns
+        {
+            {"permissions", set_type_impl::get_instance(utf8_type, true)}
+        },
+        // static columns
+        {},
+        // regular column name type
+        utf8_type,
+        // comment
+        "role permissions for CassandraAuthorizer"
+        );
+        builder.with_version(system_keyspace::generate_schema_version(builder.uuid()));
+        return builder.build();
+    }();
+    return schema;
+}
+
 schema_ptr system_keyspace::legacy::hints() {
    static thread_local auto schema = [] {
        schema_builder builder(generate_legacy_id(NAME, HINTS), NAME, HINTS,
@@ -2130,10 +2233,16 @@ future<> system_keyspace::set_bootstrap_state(bootstrap_state state) {
    });
 }

+std::vector<schema_ptr> system_keyspace::auth_tables() {
+    return {roles(), role_members(), role_attributes(), role_permissions()};
+}
+
 std::vector<schema_ptr> system_keyspace::all_tables(const db::config& cfg) {
    std::vector<schema_ptr> r;
    auto schema_tables = db::schema_tables::all_tables(schema_features::full());
    std::copy(schema_tables.begin(), schema_tables.end(), std::back_inserter(r));
+    auto auth_tables = system_keyspace::auth_tables();
+    std::copy(auth_tables.begin(), auth_tables.end(), std::back_inserter(r));
    r.insert(r.end(), { built_indexes(), hints(), batchlog(), paxos(), local(),
                    peers(), peer_events(), range_xfers(),
                    compactions_in_progress(), compaction_history(),
@@ -2149,14 +2258,11 @@ std::vector<schema_ptr> system_keyspace::all_tables(const db::config& cfg) {
                    topology(), cdc_generations_v3(), topology_requests(), service_levels_v2(),
    });

-    auto auth_tables = db::system_auth_keyspace::all_tables();
-    std::copy(auth_tables.begin(), auth_tables.end(), std::back_inserter(r));
-
    if (cfg.check_experimental(db::experimental_features_t::feature::BROADCAST_TABLES)) {
        r.insert(r.end(), {broadcast_kv_store()});
    }

-    if (cfg.check_experimental(db::experimental_features_t::feature::TABLETS)) {
+    if (cfg.enable_tablets()) {
        r.insert(r.end(), {tablets()});
    }

@@ -2691,17 +2797,17 @@ future<std::optional<mutation>> system_keyspace::get_group0_schema_version() {

 static constexpr auto AUTH_VERSION_KEY = "auth_version";

-future<system_auth_keyspace::version_t> system_keyspace::get_auth_version() {
+future<system_keyspace::auth_version_t> system_keyspace::get_auth_version() {
    auto str_opt = co_await get_scylla_local_param(AUTH_VERSION_KEY);
    if (!str_opt) {
-        co_return db::system_auth_keyspace::version_t::v1;
+        co_return auth_version_t::v1;
    }
    auto& str = *str_opt;
    if (str == "" || str == "1") {
-        co_return db::system_auth_keyspace::version_t::v1;
+        co_return auth_version_t::v1;
    }
    if (str == "2") {
-        co_return db::system_auth_keyspace::version_t::v2;
+        co_return auth_version_t::v2;
    }
    on_internal_error(slogger, fmt::format("unexpected auth_version in scylla_local got {}", str));
 }
@@ -2719,7 +2825,7 @@ static service::query_state& internal_system_query_state() {
    return qs;
 };

-future<mutation> system_keyspace::make_auth_version_mutation(api::timestamp_type ts, db::system_auth_keyspace::version_t version) {
+future<mutation> system_keyspace::make_auth_version_mutation(api::timestamp_type ts, db::system_keyspace::auth_version_t version) {
    static sstring query = format("INSERT INTO {}.{} (key, value) VALUES (?, ?);", db::system_keyspace::NAME, db::system_keyspace::SCYLLA_LOCAL);
    auto muts = co_await _qp.get_mutations_internal(query, internal_system_query_state(), ts, {AUTH_VERSION_KEY, std::to_string(int64_t(version))});
    if (muts.size() != 1) {
@@ -2967,6 +3073,11 @@ future<service::topology> system_keyspace::load_topology_state(const std::unorde
            ret.committed_cdc_generations = decode_cdc_generations_ids(deserialize_set_column(*topology(), some_row, "committed_cdc_generations"));
        }

+        if (some_row.has("new_keyspace_rf_change_data")) {
+            ret.new_keyspace_rf_change_ks_name = some_row.get_as<sstring>("new_keyspace_rf_change_ks_name");
+            ret.new_keyspace_rf_change_data = some_row.get_map<sstring,sstring>("new_keyspace_rf_change_data");
+        }
+
        if (!ret.committed_cdc_generations.empty()) {
            // Sanity check for CDC generation data consistency.
            auto gen_id = ret.committed_cdc_generations.back();
@@ -2998,6 +3109,10 @@ future<service::topology> system_keyspace::load_topology_state(const std::unorde
            ret.global_request.emplace(req);
        }

+        if (some_row.has("global_topology_request_id")) {
+            ret.global_request_id = some_row.get_as<utils::UUID>("global_topology_request_id");
+        }
+
        if (some_row.has("enabled_features")) {
            ret.enabled_features = decode_features(deserialize_set_column(*topology(), some_row, "enabled_features"));
        }
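
Note on `get_auth_version` above: a missing key, an empty string and `"1"` all map to `v1`, `"2"` maps to `v2`, and anything else is treated as an internal error. The sketch below restates that mapping without coroutines or `system.scylla_local`. `parse_auth_version` and `to_stored_value` are hypothetical helper names, and throwing `std::invalid_argument` stands in for `on_internal_error`.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

enum class auth_version_t : std::int64_t {
    v1 = 1,
    v2 = 2,
};

// Maps the stored string to a version; an absent or empty value means v1.
inline auth_version_t parse_auth_version(const std::optional<std::string>& stored) {
    if (!stored || stored->empty() || *stored == "1") {
        return auth_version_t::v1;
    }
    if (*stored == "2") {
        return auth_version_t::v2;
    }
    // Stand-in for on_internal_error() in the real code.
    throw std::invalid_argument("unexpected auth_version: " + *stored);
}

// Mirrors std::to_string(int64_t(version)) used when building the mutation.
inline std::string to_stored_value(auth_version_t v) {
    return std::to_string(static_cast<std::int64_t>(v));
}
```

Defaulting the absent value to `v1` means nodes that never wrote the key are read as running the older auth version.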
--- a/db/system_keyspace.hh
+++ b/db/system_keyspace.hh
@@ -14,7 +14,6 @@
 #include <unordered_map>
 #include <utility>
 #include <vector>
-#include "db/system_auth_keyspace.hh"
 #include "gms/gossiper.hh"
 #include "schema/schema_fwd.hh"
 #include "utils/UUID.hh"
@@ -180,6 +179,12 @@ public:
    static constexpr auto TABLETS = "tablets";
    static constexpr auto SERVICE_LEVELS_V2 = "service_levels_v2";

+    // auth
+    static constexpr auto ROLES = "roles";
+    static constexpr auto ROLE_MEMBERS = "role_members";
+    static constexpr auto ROLE_ATTRIBUTES = "role_attributes";
+    static constexpr auto ROLE_PERMISSIONS = "role_permissions";
+
    struct v3 {
        static constexpr auto BATCHES = "batches";
        static constexpr auto PAXOS = "paxos";
@@ -267,6 +272,12 @@ public:
    static schema_ptr tablets();
    static schema_ptr service_levels_v2();

+    // auth
+    static schema_ptr roles();
+    static schema_ptr role_members();
+    static schema_ptr role_attributes();
+    static schema_ptr role_permissions();
+
    static table_schema_version generate_schema_version(table_id table_id, uint16_t offset = 0);

    future<> build_bootstrap_info();
@@ -310,7 +321,9 @@ public:
    template <typename T>
    future<std::optional<T>> get_scylla_local_param_as(const sstring& key);

+    static std::vector<schema_ptr> auth_tables();
    static std::vector<schema_ptr> all_tables(const db::config& cfg);
+
    future<> make(
            locator::effective_replication_map_factory&,
            replica::database&);
@@ -577,11 +590,16 @@ public:
    // returns the corresponding mutation. Otherwise returns nullopt.
    future<std::optional<mutation>> get_group0_schema_version();

+    enum class auth_version_t: int64_t {
+        v1 = 1,
+        v2 = 2,
+    };
+
    // If the `auth_version` key in `system.scylla_local` is present (either live or tombstone),
    // returns the corresponding mutation. Otherwise returns nullopt.
    future<std::optional<mutation>> get_auth_version_mutation();
-    future<mutation> make_auth_version_mutation(api::timestamp_type ts, db::system_auth_keyspace::version_t version);
-    future<system_auth_keyspace::version_t> get_auth_version();
+    future<mutation> make_auth_version_mutation(api::timestamp_type ts, auth_version_t version);
+    future<auth_version_t> get_auth_version();

    future<> sstables_registry_create_entry(sstring location, sstring status, sstables::sstable_state state, sstables::entry_descriptor desc);
    future<> sstables_registry_update_entry_status(sstring location, sstables::generation_type gen, sstring status);
--- a/db/view/view.cc
+++ b/db/view/view.cc
@@ -1625,25 +1625,26 @@ get_view_natural_endpoint(
        }
    }

+    auto& view_topology = view_erm->get_token_metadata_ptr()->get_topology();
    for (auto&& view_endpoint : view_erm->get_replicas(view_token)) {
        if (use_legacy_self_pairing) {
+            auto it = std::find(base_endpoints.begin(), base_endpoints.end(),
+                view_endpoint);
            // If this base replica is also one of the view replicas, we use
            // ourselves as the view replica.
-            if (view_endpoint == me) {
+            if (view_endpoint == me && it != base_endpoints.end()) {
                return topology.my_address();
            }
            // We have to remove any endpoint which is shared between the base
            // and the view, as it will select itself and throw off the counts
            // otherwise.
-            auto it = std::find(base_endpoints.begin(), base_endpoints.end(),
-                view_endpoint);
            if (it != base_endpoints.end()) {
                base_endpoints.erase(it);
-            } else if (!network_topology || topology.get_datacenter(view_endpoint) == my_datacenter) {
+            } else if (!network_topology || view_topology.get_datacenter(view_endpoint) == my_datacenter) {
                view_endpoints.push_back(view_endpoint);
            }
        } else {
-            if (!network_topology || topology.get_datacenter(view_endpoint) == my_datacenter) {
+            if (!network_topology || view_topology.get_datacenter(view_endpoint) == my_datacenter) {
                view_endpoints.push_back(view_endpoint);
            }
        }
@@ -1658,7 +1659,7 @@ get_view_natural_endpoint(
        return {};
    }
    auto replica = view_endpoints[base_it - base_endpoints.begin()];
-    return topology.get_node(replica).endpoint();
+    return view_topology.get_node(replica).endpoint();
 }

 static future<> apply_to_remote_endpoints(service::storage_proxy& proxy, locator::effective_replication_map_ptr ermp,
@@ -1715,6 +1716,7 @@ future<> view_update_generator::mutate_MV(
 {
    auto base_ermp = base->table().get_effective_replication_map();
    static constexpr size_t max_concurrent_updates = 128;
+    co_await utils::get_local_injector().inject("delay_before_get_view_natural_endpoint", 8000ms);
    co_await max_concurrent_for_each(view_updates, max_concurrent_updates,
            [this, base_token, &stats, &cf_stats, tr_state, &pending_view_updates, allow_hints, wait_for_all, base_ermp] (frozen_mutation_and_schema mut) mutable -> future<> {
        auto view_token = dht::get_token(*mut.s, mut.fm.key());
--- a/db/view/view_update_generator.cc
+++ b/db/view/view_update_generator.cc
@@ -7,7 +7,7 @@
 */

 #include "db/view/view_update_backlog.hh"
-#include "exceptions/exceptions.hh"
+#include <seastar/core/timed_out_error.hh>
 #include "gms/inet_address.hh"
 #include <seastar/util/defer.hh>
 #include <boost/range/adaptor/map.hpp>
@@ -370,6 +370,17 @@ future<> view_update_generator::populate_views(const replica::table& table,
    }
 }

+
+// Generating view updates for a single client request can take a long time and might not finish before the timeout is
+// reached. In such a case this exception is thrown.
+// "Generating a view update" means creating a view update and scheduling it to be sent later.
+// This exception isn't thrown if the sending times out; it's only concerned with generating.
+struct view_update_generation_timeout_exception : public seastar::timed_out_error {
+    const char* what() const noexcept override {
+        return "Request timed out - couldn't prepare materialized view updates in time";
+    }
+};
+
 /**
 * Given some updates on the base table and the existing values for the rows affected by that update, generates the
 * mutations to be applied to the base table's views, and sends them to the paired view replicas.
@@ -446,7 +457,7 @@ future<> view_update_generator::generate_and_propagate_view_updates(const replic
            }

            if (db::timeout_clock::now() > timeout) {
-                err = std::make_exception_ptr(exceptions::view_update_generation_timeout_exception());
+                err = std::make_exception_ptr(view_update_generation_timeout_exception());
                break;
            }
        }
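
Note on the change above: the timeout exception now derives from `seastar::timed_out_error` and overrides only `what()`. Code that catches the base type therefore still catches it, and still sees the more specific message. The sketch below shows that behaviour with the standard library alone. `timed_out_error` here is a local stand-in for the Seastar class, and `message_seen_by_generic_handler` is a hypothetical helper.

```cpp
#include <exception>
#include <string>

// Local stand-in for seastar::timed_out_error (assumption: a plain std::exception subclass).
struct timed_out_error : std::exception {
    const char* what() const noexcept override {
        return "timed out";
    }
};

// Same shape as the struct added in the diff: only the message is specialised.
struct view_update_generation_timeout_exception : timed_out_error {
    const char* what() const noexcept override {
        return "Request timed out - couldn't prepare materialized view updates in time";
    }
};

// Stores the error as an exception_ptr, as the diff does, then reports the
// message seen by a handler that only knows the base type.
inline std::string message_seen_by_generic_handler() {
    std::exception_ptr err = std::make_exception_ptr(view_update_generation_timeout_exception());
    try {
        std::rethrow_exception(err);
    } catch (const timed_out_error& e) {
        return e.what();
    }
}
```

Catching by reference matters here: catching `timed_out_error` by value would slice the object and return the base message instead.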
--- a/dist/common/scripts/scylla_raid_setup
+++ b/dist/common/scripts/scylla_raid_setup
@@ -325,6 +325,8 @@ WantedBy=local-fs.target
        os.chown(dpath, uid, gid)

    if is_debian_variant():
+        if not shutil.which('update-initramfs'):
+            pkg_install('initramfs-tools')
        run('update-initramfs -u', shell=True, check=True)

    if not udev_info.uuid_link:
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -85,7 +85,7 @@ redirects: setup
 # Preview commands
 .PHONY: preview
 preview: setup
-	$(POETRY) run sphinx-autobuild -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml --host $(PREVIEW_HOST) --port 5500 --ignore *.csv --ignore *.yaml
+	$(POETRY) run sphinx-autobuild -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml --host $(PREVIEW_HOST) --port 5500 --ignore *.csv --ignore *.json --ignore *.yaml

 .PHONY: multiversionpreview
 multiversionpreview: multiversion
--- a/docs/_ext/scylladb_cc_properties.py
+++ b/docs/_ext/scylladb_cc_properties.py
@@ -1,23 +1,19 @@
-import os
 import re
-import yaml
 from typing import Any, Dict, List

-import jinja2
-
 from sphinx import addnodes
 from sphinx.application import Sphinx
 from sphinx.directives import ObjectDescription
 from sphinx.util import logging, ws_re
-from sphinx.util.display import status_iterator
 from sphinx.util.docfields import Field
 from sphinx.util.docutils import switch_source_input, SphinxDirective
 from sphinx.util.nodes import make_id, nested_parse_with_titles
-from sphinx.jinja2glue import BuiltinTemplateLoader
 from docutils import nodes
 from docutils.parsers.rst import directives
 from docutils.statemachine import StringList

+from utils import maybe_add_filters
+
 logger = logging.getLogger(__name__)

 class DBConfigParser:
@@ -152,51 +148,6 @@ class DBConfigParser:
        return DBConfigParser.all_properties[name]


-def readable_desc(description: str) -> str:
-    """
-    This function is deprecated and maintained only for backward compatibility 
-    with previous versions. Use ``readable_desc_rst``instead.
-    """
-    return (
-        description.replace("\\n", "")
-        .replace('<', '&lt;')
-        .replace('>', '&gt;')
-        .replace("\n", "<br>")
-        .replace("\\t", "- ")
-        .replace('"', "")
-    )
-
-
-def readable_desc_rst(description):
-    indent = ' ' * 3
-    lines = description.split('\n')
-    cleaned_lines = []
-    
-    for line in lines:
-
-        cleaned_line = line.replace('\\n', '\n')
-
-        if line.endswith('"'):
-            cleaned_line = cleaned_line[:-1] + ' '
-
-        cleaned_line = cleaned_line.lstrip()
-        cleaned_line = cleaned_line.replace('"', '')
-        
-        if cleaned_line != '':
-            cleaned_line = indent + cleaned_line
-            cleaned_lines.append(cleaned_line)
-    
-    return ''.join(cleaned_lines)
-
-
-def maybe_add_filters(builder):
-    env = builder.templates.environment
-    if 'readable_desc' not in env.filters:
-        env.filters['readable_desc'] = readable_desc
-
-    if 'readable_desc_rst' not in env.filters:
-        env.filters['readable_desc_rst'] = readable_desc_rst
-

 class ConfigOption(ObjectDescription):
    has_content = True
--- a/docs/_ext/scylladb_metrics.py
+++ b/docs/_ext/scylladb_metrics.py
@@ -0,0 +1,188 @@
+import os
+import sys
+import json
+from sphinx import addnodes
+from sphinx.directives import ObjectDescription
+from sphinx.util.docfields import Field
+from sphinx.util.docutils import switch_source_input
+from sphinx.util.nodes import make_id
+from sphinx.util import logging, ws_re
+from docutils.parsers.rst import Directive, directives
+from docutils.statemachine import StringList
+from sphinxcontrib.datatemplates.directive import DataTemplateJSON
+from utils import maybe_add_filters
+
+sys.path.insert(0, os.path.abspath("../../scripts"))
+import scripts.get_description as metrics
+
+LOGGER = logging.getLogger(__name__)
+
+
+class MetricsProcessor:
+
+    MARKER = "::description"
+
+    def _create_output_directory(self, app, metrics_directory):
+        output_directory = os.path.join(app.builder.srcdir, metrics_directory)
+        os.makedirs(output_directory, exist_ok=True)
+        return output_directory
+
+    def _process_single_file(self, file_path, destination_path, metrics_config_path):
+        with open(file_path, 'r', encoding='utf-8') as f:
+            content = f.read()
+        if self.MARKER in content and not os.path.exists(destination_path):
+            try:
+                metrics_file = metrics.get_metrics_from_file(file_path, "scylla", metrics.get_metrics_information(metrics_config_path))
+                with open(destination_path, 'w+', encoding='utf-8') as f:
+                    json.dump(metrics_file, f, indent=4)
+            except SystemExit:
+                LOGGER.info(f'Skipping file: {file_path}')
+            except Exception as error:
+                LOGGER.info(error)
+
+    def _process_metrics_files(self, repo_dir, output_directory, metrics_config_path):
+        for root, _, files in os.walk(repo_dir):
+            for file in files:
+                if file.endswith(".cc"):
+                    file_path = os.path.join(root, file)
+                    file_name = os.path.splitext(file)[0] + ".json"
+                    destination_path = os.path.join(output_directory, file_name)
+                    self._process_single_file(file_path, destination_path, metrics_config_path)
+
+    def run(self, app, exception=None):
+        repo_dir = os.path.abspath(os.path.join(app.srcdir, ".."))
+        metrics_config_path = os.path.join(repo_dir, app.config.scylladb_metrics_config_path)
+        output_directory = self._create_output_directory(app, app.config.scylladb_metrics_directory)
+
+        self._process_metrics_files(repo_dir, output_directory, metrics_config_path)
+
+
+class MetricsTemplateDirective(DataTemplateJSON):
+    option_spec = DataTemplateJSON.option_spec.copy()
+    option_spec["title"] = lambda x: x
+
+    def _make_context(self, data, config, env):
+        context = super()._make_context(data, config, env)
+        context["title"] = self.options.get("title")
+        return context
+
+    def run(self):
+        return super().run()
+
+
+class MetricsOption(ObjectDescription):
+    has_content = True
+    required_arguments = 1
+    optional_arguments = 0
+    final_argument_whitespace = False
+    option_spec = {
+        'type': directives.unchanged,
+        'component': directives.unchanged,
+        'key': directives.unchanged,
+        'source': directives.unchanged,
+    }
+
+    doc_field_types = [
+        Field('type', label='Type', has_arg=False, names=('type',)),
+        Field('component', label='Component', has_arg=False, names=('component',)),
+        Field('key', label='Key', has_arg=False, names=('key',)),
+        Field('source', label='Source', has_arg=False, names=('source',)),
+    ]
+
+    def handle_signature(self, sig: str, signode: addnodes.desc_signature):
+        signode.clear()
+        signode += addnodes.desc_name(sig, sig)
+        return ws_re.sub(' ', sig)
+
+    @property
+    def env(self):
+        return self.state.document.settings.env
+
+    def _render(self, name, option_type, component, key, source):
+        item = {'name': name, 'type': option_type, 'component': component, 'key': key, 'source': source }
+        template = self.config.scylladb_metrics_option_template
+        return self.env.app.builder.templates.render(template, item)
+
+    def transform_content(self, contentnode: addnodes.desc_content) -> None:
+        name = self.arguments[0]
+        option_type = self.options.get('type', '')
+        component = self.options.get('component', '')
+        key = self.options.get('key', '')
+        source_file = self.options.get('source', '')
+        _, lineno = self.get_source_info()
+        source = f'scylladb_metrics:{lineno}:<{name}>'
+        fields = StringList(self._render(name, option_type, component, key, source_file).splitlines(), source=source, parent_offset=lineno)
+        with switch_source_input(self.state, fields):
+            self.state.nested_parse(fields, 0, contentnode)
+
+    def add_target_and_index(self, name: str, sig: str, signode: addnodes.desc_signature) -> None:
+        node_id = make_id(self.env, self.state.document, self.objtype, name)
+        signode['ids'].append(node_id)
+        self.state.document.note_explicit_target(signode)
+        entry = f'{name}; metrics option'
+        self.indexnode['entries'].append(('pair', entry, node_id, '', None))
+        self.env.get_domain('std').note_object(self.objtype, name, node_id, location=signode)
+
+class MetricsDirective(Directive):
+    TEMPLATE = 'metrics.tmpl'
+    required_arguments = 0
+    optional_arguments = 1
+    option_spec = {'template': directives.path}
+    has_content = True
+
+    def _process_file(self, file, relative_path_from_current_rst):
+        data_directive = MetricsTemplateDirective(
+            name=self.name,
+            arguments=[os.path.join(relative_path_from_current_rst, file)],
+            options=self.options,
+            content=self.content,
+            lineno=self.lineno,
+            content_offset=self.content_offset,
+            block_text=self.block_text,
+            state=self.state,
+            state_machine=self.state_machine,
+        )
+        data_directive.options["template"] = self.options.get('template', self.TEMPLATE)
+        data_directive.options["title"] = file.replace('_', ' ').replace('.json','').capitalize()
+        return data_directive.run()
+
+    def _get_relative_path(self, output_directory, app, docname):
+        current_rst_path = os.path.join(app.builder.srcdir, docname + ".rst")
+        return os.path.relpath(output_directory, os.path.dirname(current_rst_path))
+
+
+    def run(self):
+        maybe_add_filters(self.state.document.settings.env.app.builder)
+        app = self.state.document.settings.env.app
+        docname = self.state.document.settings.env.docname
+        metrics_directory = os.path.join(app.builder.srcdir, app.config.scylladb_metrics_directory)
+        output = []
+        try:
+            relative_path_from_current_rst = self._get_relative_path(metrics_directory, app, docname)
+            files = os.listdir(metrics_directory)
+            for file in files:
+                output.extend(self._process_file(file, relative_path_from_current_rst))
+        except Exception as error:
+            LOGGER.info(error)
+        return output
+
+def setup(app):
+    app.add_config_value("scylladb_metrics_directory", default="_data/metrics", rebuild="html")
+    app.add_config_value("scylladb_metrics_config_path", default='scripts/metrics-config.yml', rebuild="html")
+    app.add_config_value('scylladb_metrics_option_template', default='metrics_option.tmpl', rebuild='html', types=[str])
+    app.connect("builder-inited", MetricsProcessor().run)
+    app.add_object_type(
+        'metrics_option',
+        'metrics_option',
+        objname='metrics option')
+    app.add_directive_to_domain('std', 'metrics_option', MetricsOption, override=True)
+    app.add_directive("metrics_option", MetricsOption)
+    app.add_directive("scylladb_metrics", MetricsDirective)
+
+   
+    return {
+        "version": "0.1",
+        "parallel_read_safe": True,
+        "parallel_write_safe": True,
+    }
+
--- a/docs/_ext/utils.py
+++ b/docs/_ext/utils.py
@@ -0,0 +1,44 @@
+def readable_desc(description: str) -> str:
+    """
+    This function is deprecated and maintained only for backward compatibility
+    with previous versions. Use ``readable_desc_rst`` instead.
+    """
+    return (
+        description.replace("\\n", "")
+        .replace('<', '&lt;')
+        .replace('>', '&gt;')
+        .replace("\n", "<br>")
+        .replace("\\t", "- ")
+        .replace('"', "")
+    )
+
+
+def readable_desc_rst(description):
+    indent = ' ' * 3
+    lines = description.split('\n')
+    cleaned_lines = []
+    
+    for line in lines:
+
+        cleaned_line = line.replace('\\n', '\n')
+
+        if line.endswith('"'):
+            cleaned_line = cleaned_line[:-1] + ' '
+
+        cleaned_line = cleaned_line.lstrip()
+        cleaned_line = cleaned_line.replace('"', '')
+        
+        if cleaned_line != '':
+            cleaned_line = indent + cleaned_line
+            cleaned_lines.append(cleaned_line)
+    
+    return ''.join(cleaned_lines)
+
+
+def maybe_add_filters(builder):
+    env = builder.templates.environment
+    if 'readable_desc' not in env.filters:
+        env.filters['readable_desc'] = readable_desc
+
+    if 'readable_desc_rst' not in env.filters:
+        env.filters['readable_desc_rst'] = readable_desc_rst
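
A minimal sketch of how the `readable_desc_rst` filter added above treats a metric description that spans several adjacent string literals. The function body is copied from the hunk so the example runs on its own; the sample description text is invented for illustration and does not come from the ScyllaDB sources.

```python
def readable_desc_rst(description):
    # Copied from docs/_ext/utils.py in the hunk above.
    indent = ' ' * 3
    lines = description.split('\n')
    cleaned_lines = []

    for line in lines:
        cleaned_line = line.replace('\\n', '\n')

        # A closing quote at the end of a literal becomes a separating space.
        if line.endswith('"'):
            cleaned_line = cleaned_line[:-1] + ' '

        cleaned_line = cleaned_line.lstrip()
        cleaned_line = cleaned_line.replace('"', '')

        if cleaned_line != '':
            cleaned_line = indent + cleaned_line
            cleaned_lines.append(cleaned_line)

    return ''.join(cleaned_lines)


# Hypothetical description: two quoted literals on separate source lines.
sample = '"Total reads "\n    "served by this shard"'
result = readable_desc_rst(sample)

# The quotes are dropped, each fragment gets the three-space indent, and the
# fragments are concatenated without a line break between them.
print(repr(result))
```

Because the fragments are joined with an empty string, the indent of the second fragment ends up inside the merged line; reStructuredText collapses the extra spaces when rendering.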
--- a/docs/_static/css/custom.css
+++ b/docs/_static/css/custom.css
@@ -41,6 +41,6 @@ dl dt:hover > a.headerlink {
    visibility: visible;
 }

-dl.confval {
+dl.confval, dl.metrics_option {
    border-bottom: 1px solid #cacaca;
 }
--- a/docs/_templates/metrics.tmpl
+++ b/docs/_templates/metrics.tmpl
@@ -0,0 +1,19 @@
+.. -*- mode: rst -*-
+
+{{title}}
+{{ '-' * title|length }}
+
+{% if data  %}
+{% for key, value in data.items() %}
+.. _metricsprop_{{ key }}:
+
+.. metrics_option:: {{ key }}
+  :type: {{value[0]}}
+  :source: {{value[4]}}
+  :component: {{value[2]}}
+  :key: {{value[3]}}
+
+  {{value[1] | readable_desc_rst}}
+
+{% endfor %}
+{% endif %}
--- a/docs/_templates/metrics_option.tmpl
+++ b/docs/_templates/metrics_option.tmpl
@@ -0,0 +1,3 @@
+   {% if type %}* **Type:** ``{{ type }}``{% endif %}
+   {% if component %}* **Component:** ``{{ component }}``{% endif %}
+   {% if key %}* **Key:** ``{{ key }}``{% endif %}
--- a/docs/_utils/redirects.yaml
+++ b/docs/_utils/redirects.yaml
@@ -21,6 +21,9 @@
 # remove the Open Source vs. Enterprise Matrix from the Open Source docs

 /stable/reference/versions-matrix-enterprise-oss.html: https://enterprise.docs.scylladb.com/stable/reference/versions-matrix-enterprise-oss.html
+# Remove the outdated Troubleshooting article
+
+/stable/troubleshooting/error-messages/create-mv.html: /stable/troubleshooting/index.html

 # Remove the Learn page (replaced with a link to a page in a different repo)

--- a/docs/alternator/compatibility.md
+++ b/docs/alternator/compatibility.md
@@ -117,9 +117,9 @@ request. Alternator can then validate the authenticity and authorization of
 each request using a known list of authorized key pairs.

 In the current implementation, the user stores the list of allowed key pairs
-in the `system_auth_v2.roles` table: The access key ID is the `role` column, and
+in the `system.roles` table: The access key ID is the `role` column, and
 the secret key is the `salted_hash`, i.e., the secret key can be found by
-`SELECT salted_hash from system_auth_v2.roles WHERE role = ID;`.
+`SELECT salted_hash from system.roles WHERE role = ID;`.

 <!--- REMOVE IN FUTURE VERSIONS - Remove the note below in version 6.1 -->

--- a/docs/architecture/images/tablets-cluster.png
+++ b/docs/architecture/images/tablets-cluster.png
--- a/docs/architecture/images/tablets-load-balancing.png
+++ b/docs/architecture/images/tablets-load-balancing.png
--- a/docs/architecture/index.rst
+++ b/docs/architecture/index.rst
@@ -4,6 +4,7 @@ ScyllaDB Architecture
   :titlesonly:
   :hidden:
 
+   Data Distribution with Tablets </architecture/tablets>
   ScyllaDB Ring Architecture <ringarchitecture/index/>
   ScyllaDB Fault Tolerance <architecture-fault-tolerance>
   Consistency Level Console Demo <console-CL-full-demo>
@@ -13,6 +14,7 @@ ScyllaDB Architecture
   Raft Consensus Algorithm in ScyllaDB </architecture/raft>
   
              
+* :doc:`Data Distribution with Tablets </architecture/tablets/>` - Tablets in ScyllaDB
 * :doc:`ScyllaDB Ring Architecture </architecture/ringarchitecture/index/>` - High-Level view of ScyllaDB Ring Architecture
 * :doc:`ScyllaDB Fault Tolerance </architecture/architecture-fault-tolerance>` - Deep dive into ScyllaDB Fault Tolerance
 * :doc:`Consistency Level Console Demo </architecture/console-CL-full-demo>` - Console Demos of Consistency Level Settings
--- a/docs/architecture/tablets.rst
+++ b/docs/architecture/tablets.rst
@@ -0,0 +1,131 @@
+=========================================
+Data Distribution with Tablets
+=========================================
+
+A ScyllaDB cluster is a group of interconnected nodes. The data of the entire 
+cluster has to be distributed as evenly as possible across those nodes.
+
+ScyllaDB is designed to ensure a balanced distribution of data by storing data
+in tablets. When you add or remove nodes to scale your cluster, add or remove
+a datacenter, or replace a node, tablets are moved between the nodes to keep
+the same number on each node. In addition, tablets are balanced across shards
+in each node.
+
+This article explains the concept of tablets and how they let you scale your
+cluster quickly and seamlessly.
+
+Data Distribution
+-------------------
+
+ScyllaDB distributes data by splitting tables into tablets. Each tablet has 
+its replicas on different nodes, depending on the RF (replication factor). Each
+partition of a table is mapped to a single tablet in a deterministic way. When you
+query or update the data, ScyllaDB can quickly identify the tablet that stores
+the relevant partition. 
+
+The following example shows a 3-node cluster with a replication factor (RF) of
+3. The data is stored in a table (Table 1) with two rows. Both rows are mapped
+to one tablet (T1) with replicas on all three nodes.
+
+.. image:: images/tablets-cluster.png
+
+.. TODO - Add a section about tablet splitting when there are more triggers,
+   like throughput. In 6.0, tablets only split when reaching a threshold size
+   (the threshold is based on the average tablet data size).
+
+Load Balancing
+==================
+
+ScyllaDB autonomously moves tablets to balance the load. This process
+is managed by a load balancer mechanism and happens independently of
+the administrator. The tablet load balancer decides where to migrate
+the tablets, either within the same node to balance the shards or across 
+the nodes to balance the global load in the cluster.
+
+As a table grows, each tablet can split into two, creating a new tablet.
+The load balancer can migrate the split halves independently to different nodes
+or shards.
+
+The load-balancing process takes place in the background and is performed
+without any service interruption.
+
+Scaling Out
+=============
+
+A tablet can be dynamically migrated to an existing node or a newly added
+empty node. Paired with consistent topology updates with Raft, tablets allow
+you to add multiple nodes simultaneously. After nodes are added to the cluster,
+existing nodes stream data to the new ones, and the system load eventually
+converges to an even distribution as the process completes. 
+
+With tablets enabled, manual cleanup is not required.
+Cleanup is performed automatically per tablet,
+making tablets-based streaming user-independent and safer.
+
+In addition, tablet cleanup is lightweight and efficient, as it doesn't
+involve rewriting SSTables on the existing nodes, which makes data ownership 
+changes faster. This dramatically reduces 
+the impact of cleanup on the performance of user queries.
+
+The following diagrams show migrating tablets from heavily loaded nodes A and B
+to a new node.
+
+.. image:: images/tablets-load-balancing.png
+
+.. _tablets-enable-tablets: 
+
+Enabling Tablets
+-------------------
+
+ScyllaDB now uses tablets by default for data distribution. This functionality is
+controlled by the :confval:`enable_tablets` option. However, tablets only work if
+enabled on all nodes within the cluster.
+
+When creating a new keyspace with tablets enabled (the default), you can still disable
+them on a per-keyspace basis. ``NetworkTopologyStrategy``, which is recommended
+for all keyspaces, is *required* when using tablets.
+
+You can create a keyspace with tablets
+disabled with the ``tablets = {'enabled': false}`` option:
+
+.. code:: cql
+
+    CREATE KEYSPACE my_keyspace
+    WITH replication = {
+        'class': 'NetworkTopologyStrategy',
+        'replication_factor': 3
+    } AND tablets = {
+        'enabled': false
+    };
+
+
+
+.. warning::
+
+    You cannot ALTER a keyspace to enable or disable tablets.
+    The only way to update the tablet support for a keyspace is to DROP it
+    (losing the schema and data) and then recreate it after redefining 
+    the keyspace schema with ``tablets = { 'enabled': false }`` or 
+    ``tablets = { 'enabled': true }``.
+
+Limitations and Unsupported Features
+--------------------------------------
+
+The following ScyllaDB features are not supported if a keyspace has tablets
+enabled:
+
+* Counters
+* Change Data Capture (CDC)
+* Lightweight Transactions (LWT)
+* Alternator (as it uses LWT)
+
+If you plan to use any of the above features, CREATE your keyspace
+:ref:`with tablets disabled <tablets-enable-tablets>`.
+
+Resharding in keyspaces with tablets enabled has the following limitations:
+
+* ScyllaDB does not support reducing the number of shards after node restart.
+* ScyllaDB does not reshard data on node restart. Tablet replicas remain
+  allocated to the old shards on restart and are subject to background
+  load-balancing to additional shards after restart completes and the node 
+  starts serving CQL.
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -44,7 +44,8 @@ extensions = [
    "scylladb_gcp_images",
    "scylladb_include_flag",
    "scylladb_dynamic_substitutions",
-    "scylladb_swagger"
+    "scylladb_swagger",
+    "scylladb_metrics"
 ]

 # The suffix(es) of source filenames.
@@ -127,6 +128,10 @@ scylladb_swagger_origin_api = "../api"
 scylladb_swagger_template = "swagger.tmpl"
 scylladb_swagger_inc_template = "swagger_inc.tmpl"

+# -- Options for scylladb_metrics
+scylladb_metrics_directory = "_data/opensource/metrics"
+
+
 # -- Options for HTML output

 # The theme to use for pages.
--- a/docs/cql/ddl.rst
+++ b/docs/cql/ddl.rst
@@ -107,12 +107,6 @@ For example:
   WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3}
   AND durable_writes = true;

-.. TODO Add a link to the description of minimum_keyspace_rf when the ScyllaDB options section is added to the docs.
-
-You can configure the minimum acceptable replication factor using the ``minimum_keyspace_rf`` option. 
-Attempting to create a keyspace with a replication factor lower than the value set with 
-``minimum_keyspace_rf`` will return an error (the default value is 0). 
-
 The supported ``options`` are:

 =================== ========== =========== ========= ===================================================================
@@ -122,7 +116,7 @@ name                 kind       mandatory   default   description
                                                      details below).
 ``durable_writes``   *simple*   no          true      Whether to use the commit log for updates on this keyspace
                                                      (disable this option at your own risk!).
-``tablets``          *map*      no                    Experimental - enables tablets for this keyspace (see :ref:`tablets<tablets>`)
+``tablets``          *map*      no                    Enables or disables tablets for the keyspace (see :ref:`tablets<tablets>`)
 =================== ========== =========== ========= ===================================================================

 The ``replication`` property is mandatory and must at least contains the ``'class'`` sub-option, which defines the
@@ -142,7 +136,12 @@ query latency. For a production ready strategy, see *NetworkTopologyStrategy* .
 ========================= ====== ======= =============================================
 sub-option                 type   since   description
 ========================= ====== ======= =============================================
-``'replication_factor'``   int    all     The number of replicas to store per range
+``'replication_factor'``   int    all     The number of replicas to store per range.
+
+                                          The replication factor should be equal to
+                                          or lower than the number of nodes.
+                                          Configuring a higher RF may prevent
+                                          creating tables in that keyspace. 
 ========================= ====== ======= =============================================

 .. note:: Using NetworkTopologyStrategy is recommended. Using SimpleStrategy will make it harder to add Data Center in the future.
@@ -166,6 +165,11 @@ sub-option                             type  description
                                             definitions or explicit datacenter settings.
                                             For example, to have three replicas per
                                             datacenter, supply this with a value of 3.
+
+                                             The replication factor configured for a DC
+                                             should be equal to or lower than the number
+                                             of nodes in that DC. Configuring a higher RF 
+                                             may prevent creating tables in that keyspace. 
 ===================================== ====== =============================================

 Note that when ``ALTER`` ing keyspaces and supplying ``replication_factor``,
@@ -213,39 +217,30 @@ An example that excludes a datacenter while using ``replication_factor``::

 .. _tablets:

-The ``tablets`` property :label-caution:`Experimental`
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The ``tablets`` property
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The ``tablets`` property is used to make keyspace replication tablets-based.
-It is only valid when ``experimental_features: tablets`` is specified in ``scylla.yaml`` (which
-in turn requires ``consistent_cluster_management: true``); it must be a power of two.
+The ``tablets`` property enables or disables tablets-based distribution
+for a keyspace. 

 Options:

 ===================================== ====== =============================================
 sub-option                             type  description
 ===================================== ====== =============================================
-``'enabled'``                          bool  Whether or not to enable tablets for keyspace
+``'enabled'``                          bool  Whether or not to enable tablets for a keyspace
 ``'initial'``                          int   The number of tablets to start with
 ===================================== ====== =============================================

-By default if tablets cluster feature is enabled, any keyspace will be created with tablets
-enabled. The ``tablets`` option is used to opt-out a keyspace from tablets replication.
+By default, a keyspace is created with tablets enabled. The ``tablets`` option 
+is used to opt out a keyspace from tablets-based distribution; see :ref:`Enabling Tablets <tablets-enable-tablets>`
+for details.

 A good rule of thumb to calculate initial tablets is to divide the expected total storage used
 by tables in this keyspace by (``replication_factor`` * 5GB). For example, if you expect a 30TB
 table and have a replication factor of 3, divide 30TB by (3*5GB) for a result of 2000. Since the
 value must be a power of two, round up to 2048.
-
-.. note::
-   The calculation applies to every table in the keyspace independently; so it can only realistically be
-   used for a keyspace containing a single table. It is expected that per-table controls will be available
-   in the future.
-
-.. caution::
-   The ``initial`` option may change its definition or be completely removed as it is part
-   of an experimental feature.
-
+The calculation applies to every table in the keyspace.

 An example that creates a keyspace with 2048 tablets per table::

@@ -257,6 +252,9 @@ An example that creates a keyspace with 2048 tablets per table::
        'initial': 2048
    };

+
+See :doc:`Data Distribution with Tablets </architecture/tablets>` for more information about tablets.
+
 .. _use-statement:        
        
 USE
@@ -289,6 +287,17 @@ For instance::

 The supported options are the same as :ref:`creating a keyspace <create-keyspace-statement>`.

+ALTER KEYSPACE with Tablets :label-caution:`Experimental`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Modifying a keyspace with tablets enabled is possible and doesn't require any special CQL syntax. However, there are some limitations:
+
+- The replication factor (RF) can be increased or decreased by at most 1 at a time. To reach the desired RF value, modify the RF repeatedly.
+- The ``ALTER`` statement rejects the ``replication_factor`` tag. List the DCs explicitly when altering a keyspace. See :ref:`NetworkTopologyStrategy <replication-strategy>`.
+- If there's any other ongoing global topology operation, executing the ``ALTER`` statement will fail (with an explicit and specific error) and needs to be repeated.
+- The ``ALTER`` statement may take longer than the regular query timeout, and even if it times out, it will continue to execute in the background.
+- The replication strategy cannot be modified, as keyspaces with tablets only support ``NetworkTopologyStrategy``.
+
 .. _drop-keyspace-statement:

 DROP KEYSPACE
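
The rule of thumb for the ``initial`` sub-option described in the ``tablets`` property section can be sketched in Python. The helper name `initial_tablets` and the use of decimal units (1 TB = 1000 GB) are assumptions chosen to match the 30TB example in the text; this helper is not part of ScyllaDB.

```python
import math

# Per-replica target size used by the rule of thumb in the text above.
TARGET_TABLET_SIZE_GB = 5


def initial_tablets(expected_storage_gb: float, replication_factor: int) -> int:
    """Hypothetical helper: estimate 'initial' and round up to a power of two."""
    if expected_storage_gb <= 0 or replication_factor <= 0:
        raise ValueError("storage and replication factor must be positive")

    # Expected storage divided by (replication_factor * 5GB).
    raw = expected_storage_gb / (replication_factor * TARGET_TABLET_SIZE_GB)
    count = max(1, math.ceil(raw))

    # Smallest power of two that is >= count.
    return 1 << (count - 1).bit_length()


# 30 TB table, RF = 3: 30000 / (3 * 5) = 2000, rounded up to 2048.
print(initial_tablets(30_000, 3))
```

The integer `bit_length` rounding avoids floating-point logarithms, so exact powers of two are returned unchanged.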
--- a/docs/dev/docker-hub.md
+++ b/docs/dev/docker-hub.md
@@ -341,7 +341,7 @@ The `--authenticator` command lines option allows to provide the authenticator c

 #### `--authorizer AUTHORIZER`

-The `--authorizer` command lines option allows to provide the authorizer class ScyllaDB will use. By default ScyllaDB uses the `AllowAllAuthorizer` which allows any action to any user. The second option is using the `CassandraAuthorizer` parameter, which stores permissions in `system_auth_v2.permissions` table.
+The `--authorizer` command-line option specifies the authorizer class ScyllaDB will use. By default, ScyllaDB uses `AllowAllAuthorizer`, which allows any action to any user. The alternative is `CassandraAuthorizer`, which stores permissions in the `system.permissions` table.

 **Since: 2.3**

--- a/docs/dev/service_levels.md
+++ b/docs/dev/service_levels.md
@@ -6,7 +6,7 @@ There are two system tables that are used to facilitate the service level featur
 ### Service Level Attachment Table

 ```
-    CREATE TABLE system_auth_v2.role_attributes (
+    CREATE TABLE system.role_attributes (
    role text,
    attribute_name text,
    attribute_value text,
@@ -23,7 +23,7 @@ So for example in order to find out which `service_level` is attached to role `r
 one can run the following query:

 ```
-SELECT * FROM  system_auth_v2.role_attributes WHERE role='r' and attribute_name='service_level'
+SELECT * FROM  system.role_attributes WHERE role='r' and attribute_name='service_level'

 ```

@@ -157,4 +157,4 @@ The command displays a table with: option name, effective service level the valu
 ----------------------+-------------------------+-------------
        workload_type |                     sl2 |       batch
              timeout |                     sl1 |          2s
-```
+```
--- a/docs/dev/task_manager.md
+++ b/docs/dev/task_manager.md
@@ -0,0 +1,63 @@
+Task manager is a tool for tracking long-running background
+operations.
+
+# Structure overview
+
+Task manager is divided into modules, e.g. repair or compaction
+module, which keep track of operations of similar nature. Operations
+are tracked with tasks.
+
+Each task covers a logical part of the operation, e.g. repair
+of a keyspace or a table. Each operation is covered by a tree
+of tasks, e.g. global repair task is a parent of tasks covering
+a single keyspace, which are parents of table tasks.
+
+# Time to live of a task
+
+Root tasks are kept in task manager for `task_ttl` time after they are
+finished. `task_ttl` value can be set in node configuration with
+`--task-ttl-in-seconds` option or changed with task manager API
+(`/task_manager/ttl`).
+
+A task which isn't a root is unregistered immediately after it is
+finished and its status is folded into its parent. When a task
+is being folded into its parent, info about each of its children is
+lost unless the child or any child's descendant failed.
+
+# Internal
+
+Tasks can be marked as `internal`, which means they are not listed
+by default. A task should be marked as internal if it has a parent
+or if it's supposed to be unregistered immediately after it's finished.
+
+# Abortable
+
+A flag which determines if a task can be aborted through API.
+
+# Type vs scope
+
+`type` of a task describes what operation is covered by a task,
+e.g. "major compaction".
+
+`scope` of a task describes for which part of the operation
+the task is responsible, e.g. "shard".
+
+# API
+
+Documentation for task manager API is available under `api/api-doc/task_manager.json`.
+Briefly:
+- `/task_manager/list_modules` -
+        lists modules supported by task manager;
+- `/task_manager/list_module_tasks/{module}` -
+        lists (by default non-internal) tasks in the module;
+- `/task_manager/task_status/{task_id}` -
+        gets the task's status, unregisters the task if it's finished;
+- `/task_manager/abort_task/{task_id}` -
+        aborts the task if it's abortable;
+- `/task_manager/wait_task/{task_id}` -
+        waits for the task and gets its status;
+- `/task_manager/task_status_recursive/{task_id}` -
+        gets statuses of the task and all its descendants in BFS
+        order, unregisters the task;
+- `/task_manager/ttl` -
+        sets new ttl, returns old value.
--- a/docs/dev/topology-over-raft.md
+++ b/docs/dev/topology-over-raft.md
@@ -549,7 +549,10 @@ CREATE TABLE system.topology (
    committed_cdc_generations set<tuple<timestamp, timeuuid>> static,
    unpublished_cdc_generations set<tuple<timestamp, timeuuid>> static,
    global_topology_request text static,
+    global_topology_request_id timeuuid static,
    new_cdc_generation_data_uuid timeuuid static,
+    new_keyspace_rf_change_ks_name text static,
+    new_keyspace_rf_change_data frozen<map<text, text>> static,
    PRIMARY KEY (key, host_id)
 )
 ```
@@ -575,8 +578,11 @@ There are also a few static columns for cluster-global properties:
 - `committed_cdc_generations` - the IDs of the committed CDC generations
 - `unpublished_cdc_generations` - the IDs of the committed yet unpublished CDC generations
 - `global_topology_request` - if set, contains one of the supported global topology requests
+- `global_topology_request_id` - if set, contains the ID of the global topology request, which is a new group0 state ID
 - `new_cdc_generation_data_uuid` - used in `commit_cdc_generation` state, the time UUID of the generation to be committed
 - `upgrade_state` - describes the progress of the upgrade to raft-based topology.
+- `new_keyspace_rf_change_ks_name` - the name of the keyspace targeted by the scheduled ALTER KS statement
+- `new_keyspace_rf_change_data` - the keyspace options to be used when executing the scheduled ALTER KS statement

 # Join procedure

--- a/docs/getting-started/_common/os-support-info.rst
+++ b/docs/getting-started/_common/os-support-info.rst
@@ -1,15 +1,15 @@
 You can `build ScyllaDB from source <https://github.com/scylladb/scylladb#build-prerequisites>`_ on other x86_64 or aarch64 platforms, without any guarantees.

-+----------------------------+-------------+---------------+---------------+
-| Linux Distributions        |Ubuntu       | Debian        | Rocky /       |
-|                            |             |               | RHEL          |
-+----------------------------+------+------+-------+-------+-------+-------+
-| ScyllaDB Version / Version |20.04 |22.04 |  10   |  11   |   8   |   9   |
-+============================+======+======+=======+=======+=======+=======+
-|   6.0                      | |v|  | |v|  | |v|   | |v|   | |v|   | |v|   |
-+----------------------------+------+------+-------+-------+-------+-------+
-|   5.4                      | |v|  | |v|  | |v|   | |v|   | |v|   | |v|   |
-+----------------------------+------+------+-------+-------+-------+-------+
++----------------------------+--------------------+-------+---------------+
+| Linux Distributions        |Ubuntu              | Debian| Rocky /       |
+|                            |                    |       | RHEL          |
++----------------------------+------+------+------+-------+-------+-------+
+| ScyllaDB Version / Version |20.04 |22.04 |24.04 |  11   |   8   |   9   |
++============================+======+======+======+=======+=======+=======+
+|   6.0                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
++----------------------------+------+------+------+-------+-------+-------+
+|   5.4                      | |v|  | |v|  | |x|  | |v|   | |v|   | |v|   |
++----------------------------+------+------+------+-------+-------+-------+

 * The recommended OS for ScyllaDB Open Source is Ubuntu 22.04.
 * All releases are available as a Docker container and EC2 AMI, GCP, and Azure images. 
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -40,7 +40,6 @@ Join the ScyllaDB Open Source community:
 * Contribute to the ScyllaDB Open Source `project <https://github.com/scylladb/scylladb>`_.
 * Join the `ScyllaDB Community Forum <https://forum.scylladb.com/>`_.
 * Join our `Slack Channel <https://slack.scylladb.com/>`_.
-* Sign up for the `scylladb-users <https://groups.google.com/d/forum/scylladb-users>`_ Google group.

 Learn How to Use ScyllaDB
 ---------------------------
--- a/docs/operating-scylla/nodetool-commands/decommission.rst
+++ b/docs/operating-scylla/nodetool-commands/decommission.rst
@@ -3,16 +3,31 @@ nodetool decommission

 **decommission** - Deactivate a selected node by streaming its data to the next node in the ring.

-.. note::
-
-    You cannot decomission a node if any existing node is down.
-
 For example:

 ``nodetool decommission``

-.. include:: /operating-scylla/_common/decommission_warning.rst
-
 Use the ``nodetool netstats`` command to monitor the progress of the token reallocation.             

+.. note::
+
+    You cannot decommission a node if any existing node is down.
+
+See :doc:`Remove a Node from a ScyllaDB Cluster (Down Scale) </operating-scylla/procedures/cluster-management/remove-node>`
+for procedure details.
+
+Before you run ``nodetool decommission``:
+
+* Review current disk space utilization on existing nodes and make sure the amount 
+  of data streamed from the node being removed can fit into the disk space available
+  on the remaining nodes. If there is not enough disk space on the remaining nodes,
+  the removal of a node will fail. Add more storage to remaining nodes **before**
+  starting the removal procedure.
+* Make sure that the number of nodes remaining in the DC after you decommission a node
+  will be the same or higher than the Replication Factor configured for the keyspace
+  in this DC. If the number of remaining nodes is lower than the RF, the decommission
+  request may fail.
+  In such a case, ALTER the keyspace to reduce the RF before running ``nodetool decommission``.
+
+
 .. include:: nodetool-index.rst
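
The replication-factor prerequisite added to ``decommission.rst`` above comes down to a simple count. The following is a minimal sketch of that check, assuming you already know the number of nodes in the DC and the keyspace's RF for that DC. The function name and parameters are hypothetical and are not part of ``nodetool`` or ScyllaDB.

```python
def can_decommission(nodes_in_dc: int, replication_factor: int) -> bool:
    """Return True if the DC would still hold at least RF nodes after one node leaves.

    Hypothetical helper: both inputs must be supplied by the operator,
    for example from ``nodetool status`` and the keyspace definition.
    """
    remaining = nodes_in_dc - 1  # one node is being decommissioned
    return remaining >= replication_factor


if __name__ == "__main__":
    # 4 nodes with RF=3 leaves 3 nodes, which still satisfies the RF.
    print(can_decommission(4, 3))
    # 3 nodes with RF=3 leaves 2 nodes; the request may fail unless RF is reduced first.
    print(can_decommission(3, 3))
```

This covers only the node-count condition; the disk space condition still has to be reviewed separately.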
--- a/docs/operating-scylla/nodetool-commands/describering.rst
+++ b/docs/operating-scylla/nodetool-commands/describering.rst
@@ -2,14 +2,28 @@ Nodetool describering
 =====================

 **describering** - :code:`<keyspace>`- Shows the partition ranges of a given keyspace.
-
 For example:

 .. code-block:: shell

   nodetool describering nba

-Example output (for three node cluster on AWS):
+If :doc:`tablets </architecture/tablets>` are enabled for your keyspace, you
+need to additionally specify the table name. The command will display the ring
+of the table.
+
+.. code:: shell
+
+   nodetool describering <keyspace> <table>
+
+For example:
+
+.. code-block:: shell
+
+   nodetool describering nba player_name
+
+
+Example output (for a three-node cluster on AWS with tablets disabled):

 .. code-block:: shell

--- a/docs/operating-scylla/nodetool-commands/removenode.rst
+++ b/docs/operating-scylla/nodetool-commands/removenode.rst
@@ -21,9 +21,16 @@ is removed from the cluster or replaced.
 Prerequisites
 ------------------------

-Using ``removenode`` requires at least a quorum of nodes in a cluster to be available. 
-If the quorum is lost, it must be restored before you change the cluster topology. 
-See :doc:`Handling Node Failures </troubleshooting/handling-node-failures>` for details. 
+* Using ``removenode`` requires at least a quorum of nodes in a cluster to be available. 
+  If the quorum is lost, it must be restored before you change the cluster topology. 
+  See :doc:`Handling Node Failures </troubleshooting/handling-node-failures>` for details.
+
+* Make sure that the number of nodes remaining in the DC after you remove a node
+  will be the same or higher than the Replication Factor configured for the keyspace
+  in this DC. If the number of remaining nodes is lower than the RF, the removenode
+  request may fail. In such a case, you should follow the procedure to
+  :doc:`replace a dead node </operating-scylla/procedures/cluster-management/replace-dead-node>`
+  instead of running ``nodetool removenode``.

 Usage
 --------
--- a/docs/operating-scylla/nodetool-commands/ring.rst
+++ b/docs/operating-scylla/nodetool-commands/ring.rst
@@ -1,10 +1,12 @@
 Nodetool ring
 =============
-**ring** ``[<keyspace>]`` - The nodetool ring command displays the token
+**ring** ``[<keyspace>] [<table>]`` - The nodetool ring command displays the token
 ring information. The token ring is responsible for managing the
 partitioning of data within the Scylla cluster. This command is
 critical if a cluster is facing data consistency issues.

+By default, the ``ring`` command shows all keyspaces.
+
 For example:

 .. code:: sh
@@ -16,13 +18,23 @@ tokens that are assigned to each one of them. It will also show the
 status of each of the nodes.

 +------------+-----+-----------+-------+--------------+-----------+---------------------------+
-|Address     |Rack |  Status   |State  |      Load    |  Owns	  |  Token                    |
+|Address     |Rack |  Status   |State  |      Load    |  Owns     |  Token                    |
 +============+=====+===========+=======+==============+===========+===========================+
-|172.30.0.64 | 1b  |    Up     | Normal|551.31 MB     |	Mykespace | 1006916943685901788       |
+|172.30.0.64 | 1b  |    Up     | Normal|551.31 MB     | Mykespace | 1006916943685901788       |
 +------------+-----+-----------+-------+--------------+-----------+---------------------------+
 |172.30.0.62 | 1b  |    Up     | Normal|541.59 MB     | Mykespace | 1024434117767101090       |
 +------------+-----+-----------+-------+--------------+-----------+---------------------------+
-|172.30.0.61 | 1b  |    Up     | Normal|541.59 MB     |	Mykespace | 1043327858966261499       |
+|172.30.0.61 | 1b  |    Up     | Normal|541.59 MB     | Mykespace | 1043327858966261499       |
 +------------+-----+-----------+-------+--------------+-----------+---------------------------+

+You can specify a ``<keyspace>`` name to filter the output and focus on
+a specific keyspace. The optional ``<table>`` argument narrows the output
+further to a single table. For keyspaces with :doc:`tablets </architecture/tablets>`
+enabled, you need to provide both ``<keyspace>`` and ``<table>``. This
+will display the partition ranges for that specific table.
+
+.. code:: sh
+
+   nodetool ring <keyspace> <table>
+
 .. include:: nodetool-index.rst
--- a/docs/operating-scylla/nodetool-commands/status.rst
+++ b/docs/operating-scylla/nodetool-commands/status.rst
@@ -2,11 +2,14 @@ Nodetool status
 ===============
 **status** - This command prints the cluster information for a single keyspace or all keyspaces.

+The keyspace argument is required to calculate effective ownership information (``Owns`` column).
+For tablet keyspaces, a table is also required for effective ownership.
+
 For example:

 ::

-    nodetool status
+    nodetool status my_keyspace

 Example output:

--- a/docs/operating-scylla/procedures/backup-restore/backup.rst
+++ b/docs/operating-scylla/procedures/backup-restore/backup.rst
@@ -29,11 +29,16 @@ With time, SSTables are compacted, but the hard link keeps a copy of each file.

 | 1. Data can only be restored from a snapshot of the table schema, where data exists in a backup. Backup your schema with the following command:

-| ``$: cqlsh -e "DESC SCHEMA" > <schema_name.cql>``
+| ``$: cqlsh -e "DESC SCHEMA WITH INTERNALS" > <schema_name.cql>``

 For example:

-| ``$: cqlsh -e "DESC SCHEMA" > db_schema.cql``
+| ``$: cqlsh -e "DESC SCHEMA WITH INTERNALS" > db_schema.cql``
+
+.. warning::
+
+  To get a proper schema description, you need cqlsh version ``6.0.19`` or later. Restoring a schema backup created by
+  an older version of cqlsh may lead to data resurrection or data loss. To check your cqlsh version, run ``cqlsh --version``.

 |
 | 2. Take a snapshot, including every keyspace you want to backup.
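
The version requirement in the warning added to ``backup.rst`` above has to be compared numerically, not as text, because ``6.0.9`` is older than ``6.0.19``. The following is a minimal sketch of such a comparison, assuming the version string is made of dot-separated integers. The helper names are hypothetical, and the sketch does not parse the output of ``cqlsh --version`` itself.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted numeric version such as '6.0.19' into (6, 0, 19)."""
    return tuple(int(part) for part in version.strip().split("."))


def cqlsh_is_new_enough(version: str, minimum: str = "6.0.19") -> bool:
    """Return True if `version` is at least `minimum`, compared field by field."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    print(cqlsh_is_new_enough("6.0.20"))  # newer than the minimum
    print(cqlsh_is_new_enough("6.0.9"))   # older, although "6.0.9" > "6.0.19" as strings
```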
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-add-new-node-or-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-add-new-node-or-dc.rst
@@ -17,8 +17,8 @@ limitations while applying the procedure:
  retry, or the node refuses to boot on subsequent attempts, consult the 
  :doc:`Handling Membership Change Failures </operating-scylla/procedures/cluster-management/handling-membership-change-failures>`
  document. 
-* The ``system_auth`` keyspace has not been upgraded to ``system_auth_v2``. 
+* The ``system_auth`` keyspace has not been upgraded to ``system``.
  As a result, if ``authenticator`` is set to ``PasswordAuthenticator``, you must 
  increase the replication factor of the ``system_auth`` keyspace. It is 
  recommended to set ``system_auth`` replication factor to the number of nodes 
-  in each DC.
+  in each DC.
--- a/docs/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.rst
@@ -156,7 +156,9 @@ Add New DC
      UN   54.160.174.243  109.54 KB       256     ?               c7686ffd-7a5b-4124-858e-df2e61130aaa    RACK1
      UN   54.235.9.159    109.75 KB       256     ?               39798227-9f6f-4868-8193-08570856c09a    RACK1
      UN   54.146.228.25   128.33 KB       256     ?               7a4957a1-9590-4434-9746-9c8a6f796a0c    RACK1
-   
+
+.. TODO possibly provide additional information WRT how ALTER works with tablets
+
 #. When all nodes are up and running ``ALTER`` the following Keyspaces in the new nodes:

   * Keyspace created by the user (which needed to replicate to the new DC).
--- a/docs/operating-scylla/procedures/cluster-management/handling-membership-change-failures.rst
+++ b/docs/operating-scylla/procedures/cluster-management/handling-membership-change-failures.rst
@@ -70,11 +70,46 @@ Step One: Determining Host IDs of Ghost Members
 If you cannot determine the ghost members' host ID using the suggestions above, use the method described below.

 #. Make sure there are no ongoing membership changes.
-#. Execute the following CQL query on one of your nodes to obtain the host IDs of all token ring members:
+
+#. Execute the following CQL query on one of your nodes to retrieve the Raft group 0 ID:

   .. code-block:: cql
    
-    select peer, host_id, up from system.cluster_status;
+    select value from system.scylla_local where key = 'raft_group0_id';
+
+   For example:
+
+   .. code-block:: cql
+    
+    cqlsh> select value from system.scylla_local where key = 'raft_group0_id';
+
+     value
+    --------------------------------------
+     607fef80-c276-11ed-a6f6-3075f294cc65
+
+#. Use the obtained Raft group 0 ID to query the set of all cluster members' host IDs (which includes the ghost members), by executing the following query:
+
+   .. code-block:: cql
+    
+    select server_id from system.raft_state where group_id = <group0_id>;
+
+   Replace ``<group0_id>`` with the group 0 ID that you obtained. For example:
+
+   .. code-block:: cql
+    
+    cqlsh> select server_id from system.raft_state where group_id = 607fef80-c276-11ed-a6f6-3075f294cc65;
+
+     server_id
+    --------------------------------------
+     26a9badc-6e96-4b86-a8df-5173e5ab47fe
+     7991e7f5-692e-45a0-8ae5-438be5bc7c4f
+     aff11c6d-fbe7-4395-b7ca-3912d7dba2c6
+
+#. Execute the following CQL query to obtain the host IDs of all token ring members:
+
+   .. code-block:: cql
+    
+    select peer, host_id, up from system.cluster_status;

   For example:

@@ -83,25 +118,28 @@ If you cannot determine the ghost members' host ID using the suggestions above,
    cqlsh> select peer, host_id, up from system.cluster_status;

     peer      | host_id                              | up
-     -----------+--------------------------------------+-------
-     127.0.0.3 | 42405b3b-487e-4759-8590-ddb9bdcebdc5 | False
-     127.0.0.1 | 4e3ee715-528f-4dc9-b10f-7cf294655a9e |  True
-     127.0.0.2 | 225a80d0-633d-45d2-afeb-a5fa422c9bd5 |  True
+    -----------+--------------------------------------+-------
+     127.0.0.3 |                                 null | False
+     127.0.0.1 | 26a9badc-6e96-4b86-a8df-5173e5ab47fe |  True
+     127.0.0.2 | 7991e7f5-692e-45a0-8ae5-438be5bc7c4f |  True

   The output of this query is similar to the output of ``nodetool status``.

-   We included the ``up`` column to see which nodes are down.
+   We included the ``up`` column to see which nodes are down and the ``peer`` column to see their IP addresses.

-   In this example, one of the 3 nodes tried to decommission but crashed while it was leaving the token ring. The node is in a partially left state and will refuse to restart, but other nodes still consider it as a normal member. We'll have to use ``removenode`` to clean up after it.
+   In this example, one of the nodes tried to decommission and crashed as soon as it left the token ring but before it left the Raft group. Its entry will show up in ``system.cluster_status`` queries with ``host_id = null``, like above, until the cluster is restarted.

-#. A host ID belongs to a ghost member if it appears in the ``system.cluster_status`` query but does not correspond to any remaining node in your cluster.
+#. A host ID belongs to a ghost member if:
+
+   * It appears in the ``system.raft_state`` query but not in the ``system.cluster_status`` query,
+   * Or it appears in the ``system.cluster_status`` query but does not correspond to any remaining node in your cluster.
+
+   In our example, the ghost member's host ID is ``aff11c6d-fbe7-4395-b7ca-3912d7dba2c6`` because it appears in the ``system.raft_state`` query but not in the ``system.cluster_status`` query.

   If you're unsure whether a given row in the ``system.cluster_status`` query corresponds to a node in your cluster, you can connect to each node in the cluster and execute ``select host_id from system.local`` (or search the node's logs) to obtain that node's host ID, collecting the host IDs of all nodes in your cluster. Then check if each host ID from the ``system.cluster_status`` query appears in your collected set; if not, it's a ghost member.

   A good rule of thumb is to look at the members marked as down (``up = False`` in ``system.cluster_status``) - ghost members are eventually marked as down by the remaining members of the cluster. But remember that a real member might also be marked as down if it was shutdown or partitioned away from the rest of the cluster. If in doubt, connect to each node and collect their host IDs, as described in the previous paragraph.

-   In our example, the ghost member's host ID is ``42405b3b-487e-4759-8590-ddb9bdcebdc5`` because it is the only member marked as down and we can verify that the other two rows appearing in ``system.cluster_status`` belong to the remaining 2 nodes in the cluster.
-
 In some cases, even after a failed topology change, there may be no ghost members left - for example, if a bootstrapping node crashed very early in the procedure or a decommissioning node crashed after it committed the membership change but before it finalized its own shutdown steps.

 If any ghost members are present, proceed to the next step.
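
The two ghost-member criteria described in the hunk above amount to set differences over the host IDs returned by the queries. The following is a minimal sketch, assuming the IDs have already been copied out of ``system.raft_state``, ``system.cluster_status``, and each live node's ``system.local``. The function and parameter names are hypothetical and are not part of ScyllaDB.

```python
from typing import Iterable, Optional, Set


def find_ghost_members(
    raft_server_ids: Iterable[str],
    cluster_status_host_ids: Iterable[Optional[str]],
    live_node_host_ids: Iterable[str],
) -> Set[str]:
    """Return the host IDs that match either ghost-member criterion.

    raft_server_ids:         server_id values from system.raft_state
    cluster_status_host_ids: host_id values from system.cluster_status (may contain None)
    live_node_host_ids:      host_id collected from system.local on each remaining node
    """
    raft = set(raft_server_ids)
    # Rows with host_id = null carry no ID to compare, so they are skipped.
    ring = {h for h in cluster_status_host_ids if h is not None}
    live = set(live_node_host_ids)

    in_raft_not_in_ring = raft - ring  # first criterion
    in_ring_not_live = ring - live     # second criterion
    return in_raft_not_in_ring | in_ring_not_live


if __name__ == "__main__":
    # Values taken from the example output in the documentation above.
    raft_ids = [
        "26a9badc-6e96-4b86-a8df-5173e5ab47fe",
        "7991e7f5-692e-45a0-8ae5-438be5bc7c4f",
        "aff11c6d-fbe7-4395-b7ca-3912d7dba2c6",
    ]
    status_ids = [
        None,
        "26a9badc-6e96-4b86-a8df-5173e5ab47fe",
        "7991e7f5-692e-45a0-8ae5-438be5bc7c4f",
    ]
    live_ids = [
        "26a9badc-6e96-4b86-a8df-5173e5ab47fe",
        "7991e7f5-692e-45a0-8ae5-438be5bc7c4f",
    ]
    print(find_ghost_members(raft_ids, status_ids, live_ids))
```

With the example data, the only ID returned is the one that appears in ``system.raft_state`` but not in ``system.cluster_status``.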
--- a/docs/operating-scylla/procedures/cluster-management/replace-dead-node.rst
+++ b/docs/operating-scylla/procedures/cluster-management/replace-dead-node.rst
@@ -190,11 +190,11 @@ In this case, the node's data will be cleaned after restart. To remedy this, you

 #. Start Scylla Server

-   .. include:: /rst_include/scylla-commands-stop-index.rst
+   .. include:: /rst_include/scylla-commands-start-index.rst

 Sometimes the public/ private IP of instance is changed after restart. If so refer to the Replace Procedure_ above.


 .. _replace-node-upgrade-info:

-.. scylladb_include_flag:: upgrade-warning-replace-node.rst
+.. scylladb_include_flag:: upgrade-warning-replace-node.rst
--- a/docs/operating-scylla/security/authentication.rst
+++ b/docs/operating-scylla/security/authentication.rst
@@ -31,10 +31,10 @@ Procedure

       cqlsh -u cassandra -p cassandra

-   .. warning::
+   .. note::

-      Before proceeding  to the next step, we highly recommend creating a custom superuser 
-      to ensure security and prevent performance degradation.
+      Before proceeding to the next step, we recommend creating a custom superuser
+      to improve security.
      See :doc:`Creating a Custom Superuser </operating-scylla/security/create-superuser/>` for instructions.

 #. If you want to create users and roles, continue to :doc:`Enable Authorization </operating-scylla/security/enable-authorization>`.
--- a/docs/operating-scylla/security/create-superuser.rst
+++ b/docs/operating-scylla/security/create-superuser.rst
@@ -6,12 +6,7 @@ The default ScyllaDB superuser role is ``cassandra`` with password ``cassandra``
 Users with the ``cassandra`` role have full access to the database and can run 
 any CQL command on the database resources.

-During login, the credentials for the default superuser ``cassandra`` are read with 
-a consistency level of QUORUM, whereas those for all other roles are read at LOCAL_ONE. 
-QUORUM may significantly impact performance, especially in multi-datacenter deployments.
-
-To prevent performance degradation and ensure better security, we highly recommend creating 
-a custom superuser. You should:
+To improve security, we recommend creating a custom superuser. You should:

 #. Use the default ``cassandra`` superuser to log in.
 #. Create a custom superuser.
--- a/docs/operating-scylla/security/enable-authorization.rst
+++ b/docs/operating-scylla/security/enable-authorization.rst
@@ -57,13 +57,13 @@ Set a Superuser
 The default ScyllaDB superuser role is ``cassandra`` with password ``cassandra``. Using the default
 superuser is unsafe and may significantly impact performance. 

-If you haven't created a custom superuser while enablint authentication, you should create a custom superuser
+If you haven't created a custom superuser while enabling authentication, you should create a custom superuser
 before creating additional roles. 
 See :doc:`Creating a Custom Superuser </operating-scylla/security/create-superuser/>` for instructions.

-.. warning::
+.. note::
   
-   We highly recommend creating a custom superuser to ensure security and avoid performance degradation.
+   We recommend creating a custom superuser to improve security.

 .. _roles:

--- a/docs/reference/_common/enterprise-vs-oss-matrix-link.rst
+++ b/docs/reference/_common/enterprise-vs-oss-matrix-link.rst
@@ -0,0 +1 @@
+`ScyllaDB Enterprise vs. Open Source Matrix <https://enterprise.docs.scylladb.com/stable/reference/versions-matrix-enterprise-oss.html>`_
--- a/docs/reference/_common/reference-toc.rst
+++ b/docs/reference/_common/reference-toc.rst
@@ -0,0 +1,11 @@
+.. toctree::
+   :maxdepth: 2
+   :hidden:
+
+   AWS Images </reference/aws-images>
+   Azure Images </reference/azure-images>
+   GCP Images </reference/gcp-images>
+   Configuration Parameters </reference/configuration-parameters>
+   Glossary </reference/glossary>
+   API Reference (BETA) </reference/api-reference>
+   Metrics (BETA) </reference/metrics>
--- a/docs/reference/index.rst
+++ b/docs/reference/index.rst
@@ -2,8 +2,16 @@
 Reference 
 ===============

-.. toctree::
-   :maxdepth: 1
-   :glob:
+.. scylladb_include_flag:: reference-toc.rst

-   /reference/*
+
+* ScyllaDB images for AWS, Azure, and GCP.
+
+  * :doc:`AWS Images </reference/aws-images>`
+  * :doc:`Azure Images </reference/azure-images>`
+  * :doc:`GCP Images </reference/gcp-images>`
+* :doc:`Configuration Parameters </reference/configuration-parameters>` - ScyllaDB properties configurable in the ``scylla.yaml`` configuration file.
+* :doc:`Glossary </reference/glossary>` - ScyllaDB-related terms and definitions.
+* :doc:`API Reference (BETA) </reference/api-reference>`
+* :doc:`Metrics (BETA) </reference/metrics>`
+* .. scylladb_include_flag:: enterprise-vs-oss-matrix-link.rst
--- a/docs/reference/metrics.rst
+++ b/docs/reference/metrics.rst
@@ -0,0 +1,6 @@
+==============
+Metrics (BETA)
+==============
+
+.. scylladb_metrics::
+  :template: metrics.tmpl
--- a/docs/troubleshooting/error-messages/create-mv.rst
+++ b/docs/troubleshooting/error-messages/create-mv.rst
@@ -1,95 +0,0 @@
-A Removed Node was not Removed Properly from the Seed Node List
-===============================================================
-
-Phenonoma
-^^^^^^^^^
-
-Failed to create :doc:`materialized view </cql/mv>` after node was removed from the cluster. 
-
-
-Error message:
-
-.. code-block:: shell
-
-   InvalidRequest: Error from server: code=2200 [Invalid query] message="Can't create materialized views until the whole cluster has been upgraded"
-
-Problem
-^^^^^^^
-
-A removed node was not removed properly from the seed node list.
-
-Scylla Open Source 4.3 and later and Scylla Enterprise 2021.1 and later are seedless. See :doc:`Scylla Seed Nodes </kb/seed-nodes/>` for details.
-This problem may occur in an earlier version of Scylla.
-
-How to Verify
-^^^^^^^^^^^^^
-
-Scylla logs show the error message above.
-
-To verify that the node wasn't remove properly use the :doc:`nodetool gossipinfo </operating-scylla/nodetool-commands/gossipinfo>` command
-
-For example:
-
-A three nodes cluster, with one node (54.62.0.101) removed.
-
-.. code-block:: shell
-
-   nodetool gossipinfo
-
-   /54.62.0.99
-   generation:1172279348
-   heartbeat:7212
-   LOAD:2.0293227179E10
-   INTERNAL_IP:10.240.0.83
-   DC:E1
-   STATUS:NORMAL,-872190912874367364312
-   HOST_ID:12fdcf43-4642-53b1-a987-c0e825e4e10a
-   RPC_ADDRESS:10.240.0.83
-   RACK:R1
-
-   /54.62.0.100
-   generation:1657463198
-   heartbeat:8135
-   LOAD:2.0114638716E12
-   INTERNAL_IP:10.240.0.93
-   DC:E1
-   STATUS:NORMAL,-258152127640110957173
-   HOST_ID:99acbh55-1013-24a1-a987-s1w718c1e01b
-   RPC_ADDRESS:10.240.0.93
-   RACK:R1
-
-   /54.62.0.101
-   generation:1657463198
-   heartbeat:7022
-   LOAD:2.5173672157E48
-   INTERNAL_IP:10.240.0.103
-   DC:E1
-   STATUS:NORMAL,-365481201980413697284
-   HOST_ID:99acbh55-1301-55a1-a628-s4w254c1e01b
-   RPC_ADDRESS:10.240.0.103
-   RACK:R1
-
-We can see that node ``54.62.0.101`` is still part of the cluster and needs to be removed.
-  
-Solution
-^^^^^^^^
-
-Remove the relevant node from the other nodes seed list (under scylla.yaml) and restart the nodes one by one.
-
-For example:
-
-Seed list before remove the node
-
-.. code-block:: shell
-
-   - seeds: "10.240.0.83,10.240.0.93,10.240.0.103" 
-
-Seed list after removing the node
-
-.. code-block:: shell
-
-   - seeds: "10.240.0.83,10.240.0.93" 
-
-Restart Scylla nodes
-
-.. include:: /rst_include/scylla-commands-restart-index.rst