compaction_manager: cancel submission timer on drain

The `drain` method, cancels all running compactions and moves the compaction manager into the disabled state. To move it back to the enabled state, the `enable` method shall be called. This, however, throws an assertion error as the submission time is not cancelled and re-enabling the manager tries to arm the armed timer. Thus, cancel the timer, when calling the drain method to disable the compaction manager. Fixes https://github.com/scylladb/scylladb/issues/24504 All versions are affected. So it's a good candidate for a backport. Closes scylladb/scylladb#24505 (cherry picked from commit a9a53d9178) Closes scylladb/scylladb#24585
Merge '[Backport 6.2] cql: create default superuser if it doesn't exist' from Marcin Maliszkiewicz
2025-06-29 14:40:43 +03:00 · 2025-06-29 14:34:55 +03:00 · 2025-06-28 09:40:37 +03:00 · 2025-06-27 17:50:15 +02:00 · 2025-06-27 17:50:08 +02:00 · 2025-06-27 17:50:01 +02:00
317 changed files with 7917 additions and 2235 deletions
--- a/.github/scripts/auto-backport.py
+++ b/.github/scripts/auto-backport.py
@@ -0,0 +1,186 @@
+#!/usr/bin/env python3
+
+import argparse
+import os
+import re
+import sys
+import tempfile
+import logging
+
+from github import Github, GithubException
+from git import Repo, GitCommandError
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+try:
+    github_token = os.environ["GITHUB_TOKEN"]
+except KeyError:
+    print("Please set the 'GITHUB_TOKEN' environment variable")
+    sys.exit(1)
+
+
+def is_pull_request():
+    return '--pull-request' in sys.argv[1:]
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--repo', type=str, required=True, help='Github repository name')
+    parser.add_argument('--base-branch', type=str, default='refs/heads/master', help='Base branch')
+    parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')
+    parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')
+    parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')
+    return parser.parse_args()
+
+
+def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):
+    pr_body = f'{pr.body}\n\n'
+    for commit in commits:
+        pr_body += f'- (cherry picked from commit {commit})\n\n'
+    pr_body += f'Parent PR: #{pr.number}'
+    try:
+        backport_pr = repo.create_pull(
+            title=backport_pr_title,
+            body=pr_body,
+            head=f'scylladbbot:{new_branch_name}',
+            base=base_branch_name,
+            draft=is_draft
+        )
+        logging.info(f"Pull request created: {backport_pr.html_url}")
+        backport_pr.add_to_assignees(pr.user)
+        if is_draft:
+            backport_pr.add_to_labels("conflicts")
+            pr_comment = f"@{pr.user} - This PR was marked as draft because it has conflicts\n"
+            pr_comment += "Please resolve them and mark this PR as ready for review"
+            backport_pr.create_issue_comment(pr_comment)
+        logging.info(f"Assigned PR to original author: {pr.user}")
+        return backport_pr
+    except GithubException as e:
+        if 'A pull request already exists' in str(e):
+            logging.warning(f'A pull request already exists for {pr.user}:{new_branch_name}')
+        else:
+            logging.error(f'Failed to create PR: {e}')
+
+
+def get_pr_commits(repo, pr, stable_branch, start_commit=None):
+    commits = []
+    if pr.merged:
+        merge_commit = repo.get_commit(pr.merge_commit_sha)
+        if len(merge_commit.parents) > 1:  # Check if this merge commit includes multiple commits
+            commits.append(pr.merge_commit_sha)
+        else:
+            if start_commit:
+                promoted_commits = repo.compare(start_commit, stable_branch).commits
+            else:
+                promoted_commits = repo.get_commits(sha=stable_branch)
+            for commit in pr.get_commits():
+                for promoted_commit in promoted_commits:
+                    commit_title = commit.commit.message.splitlines()[0]
+                    # In Scylla-pkg and scylla-dtest, for example,
+                    # we don't create a merge commit for a PR with multiple commits,
+                    # according to the GitHub API, the last commit will be the merge commit,
+                    # which is not what we need when backporting (we need all the commits).
+                    # So here, we are validating the correct SHA for each commit so we can cherry-pick
+                    if promoted_commit.commit.message.startswith(commit_title):
+                        commits.append(promoted_commit.sha)
+
+    elif pr.state == 'closed':
+        events = pr.get_issue_events()
+        for event in events:
+            if event.event == 'closed':
+                commits.append(event.commit_id)
+    return commits
+
+
+def create_pr_comment_and_remove_label(pr, comment_body):
+    labels = pr.get_labels()
+    pattern = re.compile(r"backport/\d+\.\d+$")
+    for label in labels:
+        if pattern.match(label.name):
+            print(f"Removing label: {label.name}")
+            comment_body += f'- {label.name}\n'
+            pr.remove_from_labels(label)
+    pr.create_issue_comment(comment_body)
+
+
+def backport(repo, pr, version, commits, backport_base_branch):
+    new_branch_name = f'backport/{pr.number}/to-{version}'
+    backport_pr_title = f'[Backport {version}] {pr.title}'
+    repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'
+    fork_repo = f'https://scylladbbot:{github_token}@github.com/scylladbbot/{repo.name}.git'
+    with (tempfile.TemporaryDirectory() as local_repo_path):
+        try:
+            repo_local = Repo.clone_from(repo_url, local_repo_path, branch=backport_base_branch)
+            repo_local.git.checkout(b=new_branch_name)
+            is_draft = False
+            for commit in commits:
+                try:
+                    repo_local.git.cherry_pick(commit, '-m1', '-x')
+                except GitCommandError as e:
+                    logging.warning(f'Cherry-pick conflict on commit {commit}: {e}')
+                    is_draft = True
+                    repo_local.git.add(A=True)
+                    repo_local.git.cherry_pick('--continue')
+            if not repo.private and not repo.has_in_collaborators(pr.user.login):
+                repo.add_to_collaborators(pr.user.login, permission="push")
+                comment = f':warning:  @{pr.user.login} you have been added as collaborator to scylladbbot fork '
+                comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again'
+                create_pr_comment_and_remove_label(pr, comment)
+                return
+            repo_local.git.push(fork_repo, new_branch_name, force=True)
+            create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
+                                is_draft=is_draft)
+
+        except GitCommandError as e:
+            logging.warning(f"GitCommandError: {e}")
+
+
+def main():
+    args = parse_args()
+    base_branch = args.base_branch.split('/')[2]
+    promoted_label = 'promoted-to-master'
+    repo_name = args.repo
+    if 'scylla-enterprise' in args.repo:
+        promoted_label = 'promoted-to-enterprise'
+    stable_branch = base_branch
+    backport_branch = 'branch-'
+
+    backport_label_pattern = re.compile(r'backport/\d+\.\d+$')
+
+    g = Github(github_token)
+    repo = g.get_repo(repo_name)
+    closed_prs = []
+    start_commit = None
+
+    if args.commits:
+        start_commit, end_commit = args.commits.split('..')
+        commits = repo.compare(start_commit, end_commit).commits
+        for commit in commits:
+            match = re.search(rf"Closes .*#([0-9]+)", commit.commit.message, re.IGNORECASE)
+            if match:
+                pr_number = int(match.group(1))
+                pr = repo.get_pull(pr_number)
+                closed_prs.append(pr)
+    if args.pull_request:
+        start_commit = args.head_commit
+        pr = repo.get_pull(args.pull_request)
+        closed_prs = [pr]
+
+    for pr in closed_prs:
+        labels = [label.name for label in pr.labels]
+        backport_labels = [label for label in labels if backport_label_pattern.match(label)]
+        if promoted_label not in labels:
+            print(f'no {promoted_label} label: {pr.number}')
+            continue
+        if not backport_labels:
+            print(f'no backport label: {pr.number}')
+            continue
+        commits = get_pr_commits(repo, pr, stable_branch, start_commit)
+        logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")
+        for backport_label in backport_labels:
+            version = backport_label.replace('backport/', '')
+            backport_base_branch = backport_label.replace('backport/', backport_branch)
+            backport(repo, pr, version, commits, backport_base_branch)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/scripts/label_promoted_commits.py
+++ b/.github/scripts/label_promoted_commits.py
@@ -16,13 +16,8 @@ def parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--repository', type=str, required=True,
                        help='Github repository name (e.g., scylladb/scylladb)')
-    parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('
-                                                                               'newest commit).')
-    parser.add_argument('--commit_after_merge', type=str, required=True,
-                        help='Git commit ID to end labeling at (oldest '
-                             'commit, exclusive).')
-    parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '
-                                                                         'done')
+    parser.add_argument('--commits', type=str, required=True, help='Range of promoted commits.')
+    parser.add_argument('--label', type=str, default='promoted-to-master', help='Label to use')
    parser.add_argument('--ref', type=str, required=True, help='PR target branch')
    return parser.parse_args()

@@ -53,10 +48,11 @@ def main():
    target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)
    g = Github(github_token)
    repo = g.get_repo(args.repository, lazy=False)
-    commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)
+    start_commit, end_commit = args.commits.split('..')
+    commits = repo.compare(start_commit, end_commit).commits
    processed_prs = set()
    # Print commit information
-    for commit in commits.commits:
+    for commit in commits:
        print(f'Commit sha is: {commit.sha}')
        match = pr_pattern.search(commit.commit.message)
        if match:
@@ -66,13 +62,13 @@ def main():
            if target_branch:
                pr = repo.get_pull(pr_number)
                branch_name = target_branch[1]
-                refs_pr = re.findall(r'Refs (?:#|https.*?)(\d+)', pr.body)
+                refs_pr = re.findall(r'Parent PR: (?:#|https.*?)(\d+)', pr.body)
                if refs_pr:
                    print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')
                    # 1. change the backport label of the parent PR to note that
-                    #    we've merge the corresponding backport PR
+                    #    we've merged the corresponding backport PR
                    # 2. close the backport PR and leave a comment on it to note
-                    #    that it has been merged with a certain git commit,
+                    #    that it has been merged with a certain git commit.
                    ref_pr_number = refs_pr[0]
                    mark_backport_done(repo, ref_pr_number, branch_name)
                    comment = f'Closed via {commit.sha}'
--- a/.github/workflows/add-label-when-promoted.yaml
+++ b/.github/workflows/add-label-when-promoted.yaml
@@ -5,9 +5,10 @@ on:
    branches:
      - master
      - branch-*.*
-
-env:
-  DEFAULT_BRANCH: 'master'
+      - enterprise
+    pull_request_target:
+      types: [labeled]
+      branches: [master, next, enterprise]

 jobs:
  check-commit:
@@ -20,17 +21,51 @@ jobs:
        env:
          GITHUB_CONTEXT: ${{ toJson(github) }}
        run: echo "$GITHUB_CONTEXT"
+      - name: Set Default Branch
+        id: set_branch
+        run: |
+          if [[ "${{ github.repository }}" == *enterprise* ]]; then
+            echo "DEFAULT_BRANCH=enterprise" >> $GITHUB_ENV
+          else
+            echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV
+          fi
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository }}
          ref: ${{ env.DEFAULT_BRANCH }}
+          token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
          fetch-depth: 0  # Fetch all history for all tags and branches
-
+      - name: Set up Git identity
+        run: |
+          git config --global user.name "GitHub Action"
+          git config --global user.email "action@github.com"
+          git config --global merge.conflictstyle diff3
      - name: Install dependencies
-        run: sudo apt-get install -y python3-github
-
+        run: sudo apt-get install -y python3-github python3-git
      - name: Run python script
+        if: github.event_name == 'push'
        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --ref ${{ github.ref }}
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+        run: python .github/scripts/label_promoted_commits.py  --commits ${{ github.event.before }}..${{ github.sha }} --repository ${{ github.repository }} --ref ${{ github.ref }}
+      - name: Run auto-backport.py when promotion completed
+        if: ${{ github.event_name == 'push' && github.ref == format('refs/heads/{0}', env.DEFAULT_BRANCH) }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}
+      - name: Check if label starts with 'backport/' and contains digits
+        id: check_label
+        run: |
+          label_name="${{ github.event.label.name }}"
+          if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then
+            echo "Label matches backport/X.X pattern."
+            echo "backport_label=true" >> $GITHUB_OUTPUT
+          else
+            echo "Label does not match the required pattern."
+            echo "backport_label=false" >> $GITHUB_OUTPUT
+          fi
+      - name: Run auto-backport.py when label was added
+        if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && github.event.pull_request.state == 'closed' }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}
--- a/.github/workflows/make-pr-ready-for-review.yaml
+++ b/.github/workflows/make-pr-ready-for-review.yaml
@@ -0,0 +1,27 @@
+name: Mark PR as Ready When Conflicts Label is Removed
+
+on:
+  pull_request_target:
+    types:
+      - unlabeled
+
+env:
+  DEFAULT_BRANCH: 'master'
+
+jobs:
+  mark-ready:
+    if: github.event.label.name == 'conflicts'
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          repository: ${{ github.repository }}
+          ref: ${{ env.DEFAULT_BRANCH }}
+          token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+          fetch-depth: 1
+      - name: Mark pull request as ready for review
+        run:  gh pr ready "${{ github.event.pull_request.number }}"
+        env:
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,6 @@
 [submodule "seastar"]
 	path = seastar
-	url = ../seastar
+	url = ../scylla-seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui
--- a/4
+++ b/4
@@ -78,7 +78,7 @@ fi

 # Default scylla product/version tags
 PRODUCT=scylla
-VERSION=6.2.0-dev
+VERSION=6.2.4

 if test -f version
 then
@@ -104,7 +104,7 @@ else
 fi

 if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
-	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" |cut -d . -f 3)
+	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" | rev | cut -d . -f 1 | rev)
 	if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
 		exit 0
 	fi
--- a/alternator/executor.cc
+++ b/alternator/executor.cc
@@ -2195,7 +2195,6 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c
    mutation_builders.reserve(request_items.MemberCount());
    uint batch_size = 0;
    for (auto it = request_items.MemberBegin(); it != request_items.MemberEnd(); ++it) {
-        batch_size++;
        schema_ptr schema = get_table_from_batch_request(_proxy, it);
        tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
        std::unordered_set<primary_key, primary_key_hash, primary_key_equal> used_keys(
@@ -2216,6 +2215,7 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c
                    co_return api_error::validation("Provided list of item keys contains duplicates");
                }
                used_keys.insert(std::move(mut_key));
+                batch_size++;
            } else if (r_name == "DeleteRequest") {
                const rjson::value& key = (r->value)["Key"];
                mutation_builders.emplace_back(schema, put_or_delete_item(
@@ -2226,6 +2226,7 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c
                    co_return api_error::validation("Provided list of item keys contains duplicates");
                }
                used_keys.insert(std::move(mut_key));
+                batch_size++;
            } else {
                co_return api_error::validation(fmt::format("Unknown BatchWriteItem request type: {}", r_name));
            }
@@ -3483,7 +3484,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
        }
    };
    std::vector<table_requests> requests;
-
+    uint batch_size = 0;
    for (auto it = request_items.MemberBegin(); it != request_items.MemberEnd(); ++it) {
        table_requests rs(get_table_from_batch_request(_proxy, it));
        tracing::add_table_name(trace_state, sstring(executor::KEYSPACE_NAME_PREFIX) + rs.schema->cf_name(), rs.schema->cf_name());
@@ -3497,6 +3498,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
            rs.add(key);
            check_key(key, rs.schema);
        }
+        batch_size += rs.requests.size();
        requests.emplace_back(std::move(rs));
    }

@@ -3504,7 +3506,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
        co_await verify_permission(client_state, tr.schema, auth::permission::SELECT);
    }

-    _stats.api_operations.batch_get_item_batch_total += requests.size();
+    _stats.api_operations.batch_get_item_batch_total += batch_size;
    // If we got here, all "requests" are valid, so let's start the
    // requests for the different partitions all in parallel.
    std::vector<future<std::vector<rjson::value>>> response_futures;
--- a/alternator/server.cc
+++ b/alternator/server.cc
@@ -216,8 +216,8 @@ protected:
        for (auto& ip : local_dc_nodes) {
            // Note that it's not enough for the node to be is_alive() - a
            // node joining the cluster is also "alive" but not responsive to
-            // requests. We need the node to be in normal state. See #19694.
-            if (_gossiper.is_normal(ip)) {
+            // requests. We alive *and* normal. See #19694, #21538.
+            if (_gossiper.is_alive(ip) && _gossiper.is_normal(ip)) {
                // Use the gossiped broadcast_rpc_address if available instead
                // of the internal IP address "ip". See discussion in #18711.
                rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(ip)));
--- a/alternator/stats.cc
+++ b/alternator/stats.cc
@@ -29,8 +29,6 @@ stats::stats() : api_operations{} {
 						                        seastar::metrics::description("Latency summary of an operation via Alternator API"), [this]{return to_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
            OPERATION(batch_get_item, "BatchGetItem")
            OPERATION(batch_write_item, "BatchWriteItem")
-            OPERATION(batch_get_item_batch_total, "BatchGetItemSize")
-            OPERATION(batch_write_item_batch_total, "BatchWriteItemSize")
            OPERATION(create_backup, "CreateBackup")
            OPERATION(create_global_table, "CreateGlobalTable")
            OPERATION(create_table, "CreateTable")
@@ -98,6 +96,10 @@ stats::stats() : api_operations{} {
                    seastar::metrics::description("number of rows read and matched during filtering operations")),
            seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
                    seastar::metrics::description("number of rows read and dropped during filtering operations")),
+                    seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},
+                            api_operations.batch_write_item_batch_total).set_skip_when_empty(),
+                    seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},
+                            api_operations.batch_get_item_batch_total).set_skip_when_empty(),
    });
 }

--- a/api/api-doc/task_manager.json
+++ b/api/api-doc/task_manager.json
@@ -218,6 +218,30 @@
               ]
            }
         ]
+      },
+      {
+         "path":"/task_manager/drain/{module}",
+         "operations":[
+            {
+               "method":"POST",
+               "summary":"Drain finished local tasks",
+               "type":"void",
+               "nickname":"drain_tasks",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[
+                  {
+                     "name":"module",
+                     "description":"The module to drain",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                  }
+               ]
+            }
+         ]
      }
   ],
   "models":{
--- a/api/raft.cc
+++ b/api/raft.cc
@@ -102,8 +102,8 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis

        if (!req->query_parameters.contains("group_id")) {
            // Read barrier on group 0 by default
-            co_await raft_gr.invoke_on(0, [timeout] (service::raft_group_registry& raft_gr) {
-                return raft_gr.group0_with_timeouts().read_barrier(nullptr, timeout);
+            co_await raft_gr.invoke_on(0, [timeout] (service::raft_group_registry& raft_gr) -> future<> {
+                co_await raft_gr.group0_with_timeouts().read_barrier(nullptr, timeout);
            });
            co_return json_void{};
        }
@@ -111,12 +111,12 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
        raft::group_id gid{utils::UUID{req->get_query_param("group_id")}};

        std::atomic<bool> found_srv{false};
-        co_await raft_gr.invoke_on_all([gid, timeout, &found_srv] (service::raft_group_registry& raft_gr) {
+        co_await raft_gr.invoke_on_all([gid, timeout, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {
            if (!raft_gr.find_server(gid)) {
-                return make_ready_future<>();
+                co_return;
            }
            found_srv = true;
-            return raft_gr.get_server_with_timeouts(gid).read_barrier(nullptr, timeout);
+            co_await raft_gr.get_server_with_timeouts(gid).read_barrier(nullptr, timeout);
        });

        if (!found_srv) {
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -898,7 +898,8 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
        auto host_id = validate_host_id(req->get_query_param("host_id"));
        std::vector<sstring> ignore_nodes_strs = utils::split_comma_separated_list(req->get_query_param("ignore_nodes"));
        apilog.info("remove_node: host_id={} ignore_nodes={}", host_id, ignore_nodes_strs);
-        auto ignore_nodes = std::list<locator::host_id_or_endpoint>();
+        locator::host_id_or_endpoint_list ignore_nodes;
+        ignore_nodes.reserve(ignore_nodes_strs.size());
        for (const sstring& n : ignore_nodes_strs) {
            try {
                auto hoep = locator::host_id_or_endpoint(n);
--- a/api/task_manager.cc
+++ b/api/task_manager.cc
@@ -218,6 +218,32 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
        uint32_t ttl = cfg.task_ttl_seconds();
        co_return json::json_return_type(ttl);
    });
+
+    tm::drain_tasks.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
+        co_await tm.invoke_on_all([&req] (tasks::task_manager& tm) -> future<> {
+            tasks::task_manager::module_ptr module;
+            try {
+                module = tm.find_module(req->get_path_param("module"));
+            } catch (...) {
+                throw bad_param_exception(fmt::format("{}", std::current_exception()));
+            }
+
+            const auto& local_tasks = module->get_local_tasks();
+            std::vector<tasks::task_id> ids;
+            ids.reserve(local_tasks.size());
+            std::transform(begin(local_tasks), end(local_tasks), std::back_inserter(ids), [] (const auto& task) {
+                return task.second->is_complete() ? task.first : tasks::task_id::create_null_id();
+            });
+
+            for (auto&& id : ids) {
+                if (id) {
+                    module->unregister_task(id);
+                }
+                co_await maybe_yield();
+            }
+        });
+        co_return json_void();
+    });
 }

 void unset_task_manager(http_context& ctx, routes& r) {
@@ -229,6 +255,7 @@ void unset_task_manager(http_context& ctx, routes& r) {
    tm::get_task_status_recursively.unset(r);
    tm::get_and_update_ttl.unset(r);
    tm::get_ttl.unset(r);
+    tm::drain_tasks.unset(r);
 }

 }
--- a/api/token_metadata.cc
+++ b/api/token_metadata.cc
@@ -71,7 +71,7 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to

    ss::get_host_id_map.set(r, [&tm](const_req req) {
        std::vector<ss::mapper> res;
-        return map_to_key_value(tm.local().get()->get_endpoint_to_host_id_map_for_reading(), res);
+        return map_to_key_value(tm.local().get()->get_endpoint_to_host_id_map(), res);
    });

    static auto host_or_broadcast = [&tm](const_req req) {
--- a/auth/certificate_authenticator.cc
+++ b/auth/certificate_authenticator.cc
@@ -76,7 +76,7 @@ auth::certificate_authenticator::certificate_authenticator(cql3::query_processor
                    continue;
                } catch (std::out_of_range&) {
                    // just fallthrough
-                } catch (std::regex_error&) {
+                } catch (boost::regex_error&) {
                    std::throw_with_nested(std::invalid_argument(fmt::format("Invalid query expression: {}", map.at(cfg_query_attr))));
                }
            }
--- a/auth/maintenance_socket_role_manager.cc
+++ b/auth/maintenance_socket_role_manager.cc
@@ -43,6 +43,10 @@ future<> maintenance_socket_role_manager::stop() {
    return make_ready_future<>();
 }

+future<> maintenance_socket_role_manager::ensure_superuser_is_created() {
+    return make_ready_future<>();
+}
+
 template<typename T = void>
 future<T> operation_not_supported_exception(std::string_view operation) {
    return make_exception_future<T>(
--- a/auth/maintenance_socket_role_manager.hh
+++ b/auth/maintenance_socket_role_manager.hh
@@ -39,6 +39,8 @@ public:

    virtual future<> stop() override;

+    virtual future<> ensure_superuser_is_created() override;
+
    virtual future<> create(std::string_view role_name, const role_config&, ::service::group0_batch&) override;

    virtual future<> drop(std::string_view role_name, ::service::group0_batch& mc) override;
--- a/auth/role_manager.hh
+++ b/auth/role_manager.hh
@@ -106,6 +106,13 @@ public:

    virtual future<> stop() = 0;

+    ///
+    /// Ensure that superuser role exists.
+    ///
+    /// \returns a future once it is ensured that the superuser role exists.
+    ///
+    virtual future<> ensure_superuser_is_created() = 0;
+
    ///
    /// \returns an exceptional future with \ref role_already_exists for a role that has previously been created.
    ///
--- a/auth/service.cc
+++ b/auth/service.cc
@@ -257,6 +257,10 @@ future<> service::stop() {
    });
 }

+future<> service::ensure_superuser_is_created() {
+    return _role_manager->ensure_superuser_is_created();
+}
+
 void service::update_cache_config() {
    auto db = _qp.db();

--- a/auth/service.hh
+++ b/auth/service.hh
@@ -131,6 +131,8 @@ public:

    future<> stop();

+    future<> ensure_superuser_is_created();
+
    void update_cache_config();

    void reset_authorization_cache();
--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -241,35 +241,39 @@ future<> standard_role_manager::migrate_legacy_metadata() {
 }

 future<> standard_role_manager::start() {
-    return once_among_shards([this] {
-        return futurize_invoke([this] () {
-            if (legacy_mode(_qp)) {
-                return create_legacy_metadata_tables_if_missing();
-            }
-            return make_ready_future<>();
-        }).then([this] {
-            _stopped = auth::do_after_system_ready(_as, [this] {
-                return seastar::async([this] {
-                    if (legacy_mode(_qp)) {
-                        _migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as).get();
+    return once_among_shards([this] () -> future<> {
+        if (legacy_mode(_qp)) {
+            co_await create_legacy_metadata_tables_if_missing();
+        }

-                        if (any_nondefault_role_row_satisfies(_qp, &has_can_login).get()) {
-                            if (legacy_metadata_exists()) {
-                                log.warn("Ignoring legacy user metadata since nondefault roles already exist.");
-                            }
+        auto handler = [this] () -> future<> {
+            const bool legacy = legacy_mode(_qp);
+            if (legacy) {
+                if (!_superuser_created_promise.available()) {
+                    _superuser_created_promise.set_value();
+                }
+                co_await _migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as);

-                            return;
-                        }
-
-                        if (legacy_metadata_exists()) {
-                            migrate_legacy_metadata().get();
-                            return;
-                        }
+                if (co_await any_nondefault_role_row_satisfies(_qp, &has_can_login)) {
+                    if (legacy_metadata_exists()) {
+                        log.warn("Ignoring legacy user metadata since nondefault roles already exist.");
                    }
-                    create_default_role_if_missing().get();
-                });
-            });
-        });
+                    co_return;
+                }
+
+                if (legacy_metadata_exists()) {
+                    co_await migrate_legacy_metadata();
+                    co_return;
+                }
+            }
+            co_await create_default_role_if_missing();
+            if (!legacy) {
+                _superuser_created_promise.set_value();
+            }
+        };
+
+        _stopped = auth::do_after_system_ready(_as, handler);
+        co_return;
    });
 }

@@ -278,6 +282,11 @@ future<> standard_role_manager::stop() {
    return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});;
 }

+future<> standard_role_manager::ensure_superuser_is_created() {
+    SCYLLA_ASSERT(this_shard_id() == 0);
+    return _superuser_created_promise.get_shared_future();
+}
+
 future<> standard_role_manager::create_or_replace(std::string_view role_name, const role_config& c, ::service::group0_batch& mc) {
    const sstring query = seastar::format("INSERT INTO {}.{} ({}, is_superuser, can_login) VALUES (?, ?, ?)",
            get_auth_ks_name(_qp),
--- a/auth/standard_role_manager.hh
+++ b/auth/standard_role_manager.hh
@@ -37,6 +37,7 @@ class standard_role_manager final : public role_manager {
    future<> _stopped;
    abort_source _as;
    std::string _superuser;
+    shared_promise<> _superuser_created_promise;

 public:
    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&);
@@ -49,6 +50,8 @@ public:

    virtual future<> stop() override;

+    virtual future<> ensure_superuser_is_created() override;
+
    virtual future<> create(std::string_view role_name, const role_config&, ::service::group0_batch&) override;

    virtual future<> drop(std::string_view role_name, ::service::group0_batch& mc) override;
--- a/cache_mutation_reader.hh
+++ b/cache_mutation_reader.hh
@@ -122,6 +122,9 @@ class cache_mutation_reader final : public mutation_reader::impl {
    gc_clock::time_point _read_time;
    gc_clock::time_point _gc_before;

+    api::timestamp_type _max_purgeable_timestamp = api::missing_timestamp;
+    api::timestamp_type _max_purgeable_timestamp_shadowable = api::missing_timestamp;
+
    future<> do_fill_buffer();
    future<> ensure_underlying();
    void copy_from_cache_to_buffer();
@@ -200,12 +203,17 @@ class cache_mutation_reader final : public mutation_reader::impl {
    gc_clock::time_point get_gc_before(const schema& schema, dht::decorated_key dk, const gc_clock::time_point query_time) {
        auto gc_state = _read_context.tombstone_gc_state();
        if (gc_state) {
-            return gc_state->get_gc_before_for_key(schema.shared_from_this(), dk, query_time);
+            return gc_state->with_commitlog_check_disabled().get_gc_before_for_key(schema.shared_from_this(), dk, query_time);
        }

        return gc_clock::time_point::min();
    }

+    bool can_gc(tombstone t, is_shadowable is) const {
+        const auto max_purgeable = is ? _max_purgeable_timestamp_shadowable : _max_purgeable_timestamp;
+        return t.timestamp < max_purgeable;
+    }
+
 public:
    cache_mutation_reader(schema_ptr s,
                               dht::decorated_key dk,
@@ -227,8 +235,19 @@ public:
        , _read_time(get_read_time())
        , _gc_before(get_gc_before(*_schema, dk, _read_time))
    {
-        clogger.trace("csm {}: table={}.{}, reversed={}, snap={}", fmt::ptr(this), _schema->ks_name(), _schema->cf_name(), _read_context.is_reversed(),
-                      fmt::ptr(&*_snp));
+        _max_purgeable_timestamp = ctx.get_max_purgeable(dk, is_shadowable::no);
+        _max_purgeable_timestamp_shadowable = ctx.get_max_purgeable(dk, is_shadowable::yes);
+
+        clogger.trace("csm {}: table={}.{}, dk={}, gc-before={}, max-purgeable-regular={}, max-purgeable-shadowable={}, reversed={}, snap={}",
+                fmt::ptr(this),
+                _schema->ks_name(),
+                _schema->cf_name(),
+                dk,
+                _gc_before,
+                _max_purgeable_timestamp,
+                _max_purgeable_timestamp_shadowable,
+                _read_context.is_reversed(),
+                fmt::ptr(&*_snp));
        push_mutation_fragment(*_schema, _permit, partition_start(std::move(dk), _snp->partition_tombstone()));
    }
    cache_mutation_reader(schema_ptr s,
@@ -786,12 +805,12 @@ void cache_mutation_reader::copy_from_cache_to_buffer() {
            t.apply(range_tomb);

            auto row_tomb_expired = [&](row_tombstone tomb) {
-                return (tomb && tomb.max_deletion_time() < _gc_before);
+                return (tomb && tomb.max_deletion_time() < _gc_before && can_gc(tomb.tomb(), tomb.is_shadowable()));
            };

            auto is_row_dead = [&](const deletable_row& row) {
                auto& m = row.marker();
-                return (!m.is_missing() && m.is_dead(_read_time) && m.deletion_time() < _gc_before);
+                return (!m.is_missing() && m.is_dead(_read_time) && m.deletion_time() < _gc_before && can_gc(tombstone(m.timestamp(), m.deletion_time()), is_shadowable::no));
            };

            if (row_tomb_expired(t) || is_row_dead(row)) {
@@ -799,9 +818,11 @@ void cache_mutation_reader::copy_from_cache_to_buffer() {

                _read_context.cache()._tracker.on_row_compacted();

+                auto mutation_can_gc = can_gc_fn([this] (tombstone t, is_shadowable is) { return can_gc(t, is); });
+
                with_allocator(_snp->region().allocator(), [&] {
                    deletable_row row_copy(row_schema, row);
-                    row_copy.compact_and_expire(row_schema, t.tomb(), _read_time, always_gc, _gc_before, nullptr);
+                    row_copy.compact_and_expire(row_schema, t.tomb(), _read_time, mutation_can_gc, _gc_before, nullptr);
                    std::swap(row, row_copy);
                });
                remove_row = row.empty();
--- a/cdc/generation.cc
+++ b/cdc/generation.cc
@@ -364,6 +364,9 @@ cdc::topology_description make_new_generation_description(
        const noncopyable_function<std::pair<size_t, uint8_t>(dht::token)>& get_sharding_info,
        const locator::token_metadata_ptr tmptr) {
    const auto tokens = get_tokens(bootstrap_tokens, tmptr);
+    if (tokens.empty()) {
+        on_internal_error(cdc_log, "Attempted to create a CDC generation from an empty list of tokens");
+    }

    utils::chunked_vector<token_range_description> vnode_descriptions;
    vnode_descriptions.reserve(tokens.size());
@@ -1111,7 +1114,9 @@ future<bool> generation_service::legacy_do_handle_cdc_generation(cdc::generation
    auto sys_dist_ks = get_sys_dist_ks();
    auto gen = co_await retrieve_generation_data(gen_id, _sys_ks.local(), *sys_dist_ks, { _token_metadata.get()->count_normal_token_owners() });
    if (!gen) {
-        throw std::runtime_error(fmt::format(
+        // This may happen during raft upgrade when a node gossips about a generation that
+        // was propagated through raft and we didn't apply it yet.
+        throw generation_handling_nonfatal_exception(fmt::format(
            "Could not find CDC generation {} in distributed system tables (current time: {}),"
            " even though some node gossiped about it.",
            gen_id, db_clock::now()));
--- a/cdc/metadata.cc
+++ b/cdc/metadata.cc
@@ -186,7 +186,7 @@ bool cdc::metadata::prepare(db_clock::time_point tp) {
    }

    auto ts = to_ts(tp);
-    auto emplaced = _gens.emplace(to_ts(tp), std::nullopt).second;
+    auto [it, emplaced] = _gens.emplace(to_ts(tp), std::nullopt);

    if (_last_stream_timestamp != api::missing_timestamp) {
        auto last_correct_gen = gen_used_at(_last_stream_timestamp);
@@ -201,5 +201,5 @@ bool cdc::metadata::prepare(db_clock::time_point tp) {
        }
    }

-    return emplaced;
+    return !it->second;
 }
--- a/compaction/compaction.cc
+++ b/compaction/compaction.cc
@@ -226,7 +226,8 @@ static api::timestamp_type get_max_purgeable_timestamp(const table_state& table_
 }

 static std::vector<shared_sstable> get_uncompacting_sstables(const table_state& table_s, std::vector<shared_sstable> sstables) {
-    auto all_sstables = boost::copy_range<std::vector<shared_sstable>>(*table_s.main_sstable_set().all());
+    auto sstable_set = table_s.sstable_set_for_tombstone_gc();
+    auto all_sstables = boost::copy_range<std::vector<shared_sstable>>(*sstable_set->all());
    auto& compacted_undeleted = table_s.compacted_undeleted_sstables();
    all_sstables.insert(all_sstables.end(), compacted_undeleted.begin(), compacted_undeleted.end());
    boost::sort(all_sstables, [] (const shared_sstable& x, const shared_sstable& y) {
--- a/compaction/compaction_manager.cc
+++ b/compaction/compaction_manager.cc
@@ -188,7 +188,7 @@ unsigned compaction_manager::current_compaction_fan_in_threshold() const {
        return 0;
    }
    auto largest_fan_in = std::ranges::max(_tasks | boost::adaptors::transformed([] (auto& task) {
-        return task->compaction_running() ? task->compaction_data().compaction_fan_in : 0;
+        return task.compaction_running() ? task.compaction_data().compaction_fan_in : 0;
    }));
    // conservatively limit fan-in threshold to 32, such that tons of small sstables won't accumulate if
    // running major on a leveled table, which can even have more than one thousand files.
@@ -388,11 +388,26 @@ future<sstables::compaction_result> compaction_task_executor::compact_sstables_a

    co_return res;
 }
+
+future<sstables::sstable_set> compaction_task_executor::sstable_set_for_tombstone_gc(table_state& t) {
+    auto compound_set = t.sstable_set_for_tombstone_gc();
+    // Compound set will be linearized into a single set, since compaction might add or remove sstables
+    // to it for incremental compaction to work.
+    auto new_set = sstables::make_partitioned_sstable_set(t.schema(), false);
+    co_await compound_set->for_each_sstable_gently([&] (const sstables::shared_sstable& sst) {
+        auto inserted = new_set.insert(sst);
+        if (!inserted) {
+            on_internal_error(cmlog, format("Unable to insert SSTable {} into set used for tombstone GC", sst->get_filename()));
+        }
+    });
+    co_return std::move(new_set);
+}
+
 future<sstables::compaction_result> compaction_task_executor::compact_sstables(sstables::compaction_descriptor descriptor, sstables::compaction_data& cdata, on_replacement& on_replace, compaction_manager::can_purge_tombstones can_purge,
                                                                               sstables::offstrategy offstrategy) {
    table_state& t = *_compacting_table;
    if (can_purge) {
-        descriptor.enable_garbage_collection(t.main_sstable_set());
+        descriptor.enable_garbage_collection(co_await sstable_set_for_tombstone_gc(t));
    }
    descriptor.creator = [&t] (shard_id dummy) {
        auto sst = t.make_sstable();
@@ -580,9 +595,9 @@ requires (compaction_manager& cm, throw_if_stopping do_throw_if_stopping, Args&&
 }
 future<compaction_manager::compaction_stats_opt> compaction_manager::perform_compaction(throw_if_stopping do_throw_if_stopping, tasks::task_info parent_info, Args&&... args) {
    auto task_executor = seastar::make_shared<TaskExecutor>(*this, do_throw_if_stopping, std::forward<Args>(args)...);
-    _tasks.push_back(task_executor);
-    auto unregister_task = defer([this, task_executor] {
-        _tasks.remove(task_executor);
+    _tasks.push_back(*task_executor);
+    auto unregister_task = defer([task_executor] {
+        task_executor->unlink();
        task_executor->switch_state(compaction_task_executor::state::none);
    });

@@ -884,10 +899,10 @@ public:
    explicit strategy_control(compaction_manager& cm) noexcept : _cm(cm) {}

    bool has_ongoing_compaction(table_state& table_s) const noexcept override {
-        return std::any_of(_cm._tasks.begin(), _cm._tasks.end(), [&s = table_s.schema()] (const shared_ptr<compaction_task_executor>& task) {
-            return task->compaction_running()
-                && task->compacting_table()->schema()->ks_name() == s->ks_name()
-                && task->compacting_table()->schema()->cf_name() == s->cf_name();
+        return std::any_of(_cm._tasks.begin(), _cm._tasks.end(), [&s = table_s.schema()] (const compaction_task_executor& task) {
+            return task.compaction_running()
+                && task.compacting_table()->schema()->ks_name() == s->ks_name()
+                && task.compacting_table()->schema()->cf_name() == s->cf_name();
        });
    }

@@ -1051,7 +1066,7 @@ void compaction_manager::postpone_compaction_for_table(table_state* t) {
    _postponed.insert(t);
 }

-future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason) {
+future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason) noexcept {
    // To prevent compaction from being postponed while tasks are being stopped,
    // let's stop all tasks before the deferring point below.
    for (auto& t : tasks) {
@@ -1059,14 +1074,16 @@ future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_e
        t->stop_compaction(reason);
    }
    co_await coroutine::parallel_for_each(tasks, [] (auto& task) -> future<> {
+        auto unlink_task = deferred_action([task] { task->unlink(); });
        try {
            co_await task->compaction_done();
        } catch (sstables::compaction_stopped_exception&) {
            // swallow stop exception if a given procedure decides to propagate it to the caller,
            // as it happens with reshard and reshape.
        } catch (...) {
+            // just log any other errors as the callers have nothing to do with them.
            cmlog.debug("Stopping {}: task returned error: {}", *task, std::current_exception());
-            throw;
+            co_return;
        }
        cmlog.debug("Stopping {}: done", *task);
    });
@@ -1075,9 +1092,12 @@ future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_e
 future<> compaction_manager::stop_ongoing_compactions(sstring reason, table_state* t, std::optional<sstables::compaction_type> type_opt) noexcept {
    try {
        auto ongoing_compactions = get_compactions(t).size();
-        auto tasks = boost::copy_range<std::vector<shared_ptr<compaction_task_executor>>>(_tasks | boost::adaptors::filtered([t, type_opt] (auto& task) {
-            return (!t || task->compacting_table() == t) && (!type_opt || task->compaction_type() == *type_opt);
-        }));
+        auto tasks = _tasks
+                | std::views::filter([t, type_opt] (const auto& task) {
+                    return (!t || task.compacting_table() == t) && (!type_opt || task.compaction_type() == *type_opt);
+                })
+                | std::views::transform([] (auto& task) { return task.shared_from_this(); })
+                | std::ranges::to<std::vector<shared_ptr<compaction_task_executor>>>();
        logging::log_level level = tasks.empty() ? log_level::debug : log_level::info;
        if (cmlog.is_enabled(level)) {
            std::string scope = "";
@@ -1091,15 +1111,19 @@ future<> compaction_manager::stop_ongoing_compactions(sstring reason, table_stat
        }
        return stop_tasks(std::move(tasks), std::move(reason));
    } catch (...) {
-        return current_exception_as_future<>();
+        cmlog.error("Stopping ongoing compactions failed: {}.  Ignored", std::current_exception());
    }
+    return make_ready_future();
 }

 future<> compaction_manager::drain() {
    cmlog.info("Asked to drain");
    if (*_early_abort_subscription) {
        _state = state::disabled;
+        _compaction_submission_timer.cancel();
        co_await stop_ongoing_compactions("drain");
+        // Trigger a signal to properly exit from postponed_compactions_reevaluation() fiber
+        reevaluate_postponed_compactions();
    }
    cmlog.info("Drained");
 }
@@ -1109,17 +1133,17 @@ future<> compaction_manager::stop() {
    if (auto cm = std::exchange(_task_manager_module, nullptr)) {
        co_await cm->stop();
    }
-    if (_state != state::none) {
-        co_return co_await std::move(*_stop_future);
+    if (_stop_future) {
+        co_await std::exchange(*_stop_future, make_ready_future());
    }
 }

-future<> compaction_manager::really_do_stop() {
+future<> compaction_manager::really_do_stop() noexcept {
    cmlog.info("Asked to stop");
    // Reset the metrics registry
    _metrics.clear();
    co_await stop_ongoing_compactions("shutdown");
-    co_await coroutine::parallel_for_each(_compaction_state | boost::adaptors::map_values, [] (compaction_state& cs) -> future<> {
+    co_await coroutine::parallel_for_each(_compaction_state | std::views::values, [] (compaction_state& cs) -> future<> {
        if (!cs.gate.is_closed()) {
            co_await cs.gate.close();
        }
@@ -1618,6 +1642,9 @@ public:
                std::move(sstables), std::move(compacting), compaction_manager::can_purge_tombstones::yes)
            , _opt(options.as<sstables::compaction_type_options::split>())
    {
+        if (utils::get_local_injector().is_enabled("split_sstable_rewrite")) {
+            _do_throw_if_stopping = throw_if_stopping::yes;
+        }
    }

    static bool sstable_needs_split(const sstables::shared_sstable& sst, const sstables::compaction_type_options::split& opt) {
@@ -1633,13 +1660,12 @@ private:
    bool sstable_needs_split(const sstables::shared_sstable& sst) const {
        return sstable_needs_split(sst, _opt);
    }
-
 protected:
    sstables::compaction_descriptor make_descriptor(const sstables::shared_sstable& sst) const override {
        return make_descriptor(sst, _opt);
    }

-    future<sstables::compaction_result> rewrite_sstable(const sstables::shared_sstable sst) override {
+    future<sstables::compaction_result> do_rewrite_sstable(const sstables::shared_sstable sst) {
        if (sstable_needs_split(sst)) {
            return rewrite_sstables_compaction_task_executor::rewrite_sstable(std::move(sst));
        }
@@ -1652,6 +1678,20 @@ protected:
            return sstables::compaction_result{};
        });
    }
+
+    future<sstables::compaction_result> rewrite_sstable(const sstables::shared_sstable sst) override {
+        co_await utils::get_local_injector().inject("split_sstable_rewrite", [this] (auto& handler) -> future<> {
+            cmlog.info("split_sstable_rewrite: waiting");
+            while (!handler.poll_for_message() && !_compaction_data.is_stop_requested()) {
+                co_await sleep(std::chrono::milliseconds(5));
+            }
+            cmlog.info("split_sstable_rewrite: released");
+            if (_compaction_data.is_stop_requested()) {
+                throw make_compaction_stopped_exception();
+            }
+        }, false);
+        co_return co_await do_rewrite_sstable(std::move(sst));
+    }
 };

 }
@@ -1979,7 +2019,7 @@ future<> compaction_manager::perform_cleanup(owned_ranges_ptr sorted_owned_range
 future<> compaction_manager::try_perform_cleanup(owned_ranges_ptr sorted_owned_ranges, table_state& t, tasks::task_info info) {
    auto check_for_cleanup = [this, &t] {
        return boost::algorithm::any_of(_tasks, [&t] (auto& task) {
-            return task->compacting_table() == &t && task->compaction_type() == sstables::compaction_type::Cleanup;
+            return task.compacting_table() == &t && task.compaction_type() == sstables::compaction_type::Cleanup;
        });
    };
    if (check_for_cleanup()) {
@@ -2077,8 +2117,10 @@ compaction_manager::maybe_split_sstable(sstables::shared_sstable sst, table_stat
    }
    std::vector<sstables::shared_sstable> ret;

-    co_await run_custom_job(t, sstables::compaction_type::Split, "Split SSTable",
-                            [&] (sstables::compaction_data& info, sstables::compaction_progress_monitor& monitor) -> future<> {
+        // FIXME: indentation.
+        auto gate = get_compaction_state(&t).gate.hold();
+        sstables::compaction_progress_monitor monitor;
+        sstables::compaction_data info = create_compaction_data();
        sstables::compaction_descriptor desc = split_compaction_task_executor::make_descriptor(sst, opt);
        desc.creator = [&t] (shard_id _) {
            return t.make_sstable();
@@ -2089,7 +2131,6 @@ compaction_manager::maybe_split_sstable(sstables::shared_sstable sst, table_stat

        co_await sstables::compact_sstables(std::move(desc), info, t, monitor);
        co_await sst->unlink();
-    }, tasks::task_info{}, throw_if_stopping::yes);

    co_return ret;
 }
@@ -2159,11 +2200,11 @@ future<> compaction_manager::remove(table_state& t, sstring reason) noexcept {
    auto found = false;
    sstring msg;
    for (auto& task : _tasks) {
-        if (task->compacting_table() == &t) {
+        if (task.compacting_table() == &t) {
            if (!msg.empty()) {
                msg += "\n";
            }
-            msg += format("Found {} after remove", *task.get());
+            msg += format("Found {} after remove", task);
            found = true;
        }
    }
@@ -2174,30 +2215,38 @@ future<> compaction_manager::remove(table_state& t, sstring reason) noexcept {
 }

 const std::vector<sstables::compaction_info> compaction_manager::get_compactions(table_state* t) const {
-    auto to_info = [] (const shared_ptr<compaction_task_executor>& task) {
+    auto to_info = [] (const compaction_task_executor& task) {
        sstables::compaction_info ret;
-        ret.compaction_uuid = task->compaction_data().compaction_uuid;
-        ret.type = task->compaction_type();
-        ret.ks_name = task->compacting_table()->schema()->ks_name();
-        ret.cf_name = task->compacting_table()->schema()->cf_name();
-        ret.total_partitions = task->compaction_data().total_partitions;
-        ret.total_keys_written = task->compaction_data().total_keys_written;
+        ret.compaction_uuid = task.compaction_data().compaction_uuid;
+        ret.type = task.compaction_type();
+        ret.ks_name = task.compacting_table()->schema()->ks_name();
+        ret.cf_name = task.compacting_table()->schema()->cf_name();
+        ret.total_partitions = task.compaction_data().total_partitions;
+        ret.total_keys_written = task.compaction_data().total_keys_written;
        return ret;
    };
    using ret = std::vector<sstables::compaction_info>;
-    return boost::copy_range<ret>(_tasks | boost::adaptors::filtered([t] (const shared_ptr<compaction_task_executor>& task) {
-                return (!t || task->compacting_table() == t) && task->compaction_running();
+    return boost::copy_range<ret>(_tasks | boost::adaptors::filtered([t] (const compaction_task_executor& task) {
+                return (!t || task.compacting_table() == t) && task.compaction_running();
            }) | boost::adaptors::transformed(to_info));
 }

 bool compaction_manager::has_table_ongoing_compaction(const table_state& t) const {
-    return std::any_of(_tasks.begin(), _tasks.end(), [&t] (const shared_ptr<compaction_task_executor>& task) {
-        return task->compacting_table() == &t && task->compaction_running();
+    return std::any_of(_tasks.begin(), _tasks.end(), [&t] (const compaction_task_executor& task) {
+        return task.compacting_table() == &t && task.compaction_running();
    });
 };

 bool compaction_manager::compaction_disabled(table_state& t) const {
-    return _compaction_state.contains(&t) && _compaction_state.at(&t).compaction_disabled();
+    if (auto it = _compaction_state.find(&t); it != _compaction_state.end()) {
+        return it->second.compaction_disabled();
+    } else {
+        cmlog.debug("compaction_disabled: {}:{} not in compaction_state", t.schema()->id(), t.get_group_id());
+        // Compaction is not strictly disabled, but it is not enabled either.
+        // The callers actually care about if it's enabled or not, not about the actual state of
+        // compaction_state::compaction_disabled()
+        return true;
+    }
 }

 future<> compaction_manager::stop_compaction(sstring type, table_state* table) {
@@ -2222,8 +2271,8 @@ future<> compaction_manager::stop_compaction(sstring type, table_state* table) {
 void compaction_manager::propagate_replacement(table_state& t,
        const std::vector<sstables::shared_sstable>& removed, const std::vector<sstables::shared_sstable>& added) {
    for (auto& task : _tasks) {
-        if (task->compacting_table() == &t && task->compaction_running()) {
-            task->compaction_data().pending_replacements.push_back({ removed, added });
+        if (task.compacting_table() == &t && task.compaction_running()) {
+            task.compaction_data().pending_replacements.push_back({ removed, added });
        }
    }
 }
--- a/compaction/compaction_manager.hh
+++ b/compaction/compaction_manager.hh
@@ -94,8 +94,13 @@ public:

 private:
    shared_ptr<compaction::task_manager_module> _task_manager_module;
+
+    using compaction_task_executor_list_type = bi::list<
+            compaction_task_executor,
+            bi::base_hook<bi::list_base_hook<bi::link_mode<bi::auto_unlink>>>,
+            bi::constant_time_size<false>>;
    // compaction manager may have N fibers to allow parallel compaction per shard.
-    std::list<shared_ptr<compaction::compaction_task_executor>> _tasks;
+    compaction_task_executor_list_type _tasks;

    // Possible states in which the compaction manager can be found.
    //
@@ -179,7 +184,7 @@ private:
    }
    future<compaction_manager::compaction_stats_opt> perform_compaction(throw_if_stopping do_throw_if_stopping, tasks::task_info parent_info, Args&&... args);

-    future<> stop_tasks(std::vector<shared_ptr<compaction::compaction_task_executor>> tasks, sstring reason);
+    future<> stop_tasks(std::vector<shared_ptr<compaction::compaction_task_executor>> tasks, sstring reason) noexcept;
    future<> update_throughput(uint32_t value_mbs);

    // Return the largest fan-in of currently running compactions
@@ -245,7 +250,7 @@ private:

    // Stop all fibers, without waiting. Safe to be called multiple times.
    void do_stop() noexcept;
-    future<> really_do_stop();
+    future<> really_do_stop() noexcept;

    // Propagate replacement of sstables to all ongoing compaction of a given table
    void propagate_replacement(compaction::table_state& t, const std::vector<sstables::shared_sstable>& removed, const std::vector<sstables::shared_sstable>& added);
@@ -470,7 +475,9 @@ public:

 namespace compaction {

-class compaction_task_executor : public enable_shared_from_this<compaction_task_executor> {
+class compaction_task_executor
+    : public enable_shared_from_this<compaction_task_executor>
+    , public boost::intrusive::list_base_hook<boost::intrusive::link_mode<boost::intrusive::auto_unlink>> {
 public:
    enum class state {
        none,       // initial and final state
@@ -594,6 +601,8 @@ private:
    future<compaction_manager::compaction_stats_opt> compaction_done() noexcept {
        return _compaction_done.get_future();
    }
+
+    future<sstables::sstable_set> sstable_set_for_tombstone_gc(::compaction::table_state& t);
 public:
    bool stopping() const noexcept {
        return _compaction_data.abort.abort_requested();
@@ -614,7 +623,7 @@ public:
    friend future<compaction_manager::compaction_stats_opt> compaction_manager::perform_compaction(throw_if_stopping do_throw_if_stopping, tasks::task_info parent_info, Args&&... args);
    friend future<compaction_manager::compaction_stats_opt> compaction_manager::perform_task(shared_ptr<compaction_task_executor> task, throw_if_stopping do_throw_if_stopping);
    friend fmt::formatter<compaction_task_executor>;
-    friend future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason);
+    friend future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason) noexcept;
    friend sstables::test_env_compaction_manager;
 };

--- a/compaction/table_state.hh
+++ b/compaction/table_state.hh
@@ -39,6 +39,7 @@ public:
    virtual bool compaction_enforce_min_threshold() const noexcept = 0;
    virtual const sstables::sstable_set& main_sstable_set() const = 0;
    virtual const sstables::sstable_set& maintenance_sstable_set() const = 0;
+    virtual lw_shared_ptr<const sstables::sstable_set> sstable_set_for_tombstone_gc() const = 0;
    virtual std::unordered_set<sstables::shared_sstable> fully_expired_sstables(const std::vector<sstables::shared_sstable>& sstables, gc_clock::time_point compaction_time) const = 0;
    virtual const std::vector<sstables::shared_sstable>& compacted_undeleted_sstables() const noexcept = 0;
    virtual sstables::compaction_strategy& get_compaction_strategy() const noexcept = 0;
--- a/compaction/time_window_compaction_strategy.cc
+++ b/compaction/time_window_compaction_strategy.cc
@@ -296,7 +296,8 @@ time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> i
            // When trimming, let's keep sstables with overlapping time window, so as to reduce write amplification.
            // For example, if there are N sstables spanning window W, where N <= 32, then we can produce all data for W
            // in a single compaction round, removing the need to later compact W to reduce its number of files.
-            boost::partial_sort(multi_window, multi_window.begin() + max_sstables, [](const shared_sstable &a, const shared_sstable &b) {
+            auto sort_size = std::min(max_sstables, multi_window.size());
+            boost::partial_sort(multi_window, multi_window.begin() + sort_size, [](const shared_sstable &a, const shared_sstable &b) {
                return a->get_stats_metadata().max_timestamp < b->get_stats_metadata().max_timestamp;
            });
            maybe_trim_job(multi_window, job_size, disjoint);
--- a/compress.cc
+++ b/compress.cc
@@ -14,7 +14,9 @@
 #include "exceptions/exceptions.hh"
 #include "utils/class_registrator.hh"

-const sstring compressor::namespace_prefix = "org.apache.cassandra.io.compress.";
+sstring compressor::make_name(std::string_view short_name) {
+    return seastar::format("org.apache.cassandra.io.compress.{}", short_name);
+}

 class lz4_processor: public compressor {
 public:
@@ -66,7 +68,7 @@ compressor::ptr_type compressor::create(const sstring& name, const opt_getter& o
        return {};
    }

-    qualified_name qn(namespace_prefix, name);
+    qualified_name qn(make_name(""), name);

    for (auto& c : { lz4, snappy, deflate }) {
        if (c->name() == static_cast<const sstring&>(qn)) {
@@ -91,9 +93,9 @@ shared_ptr<compressor> compressor::create(const std::map<sstring, sstring>& opti
    return {};
 }

-thread_local const shared_ptr<compressor> compressor::lz4 = ::make_shared<lz4_processor>(namespace_prefix + "LZ4Compressor");
-thread_local const shared_ptr<compressor> compressor::snappy = ::make_shared<snappy_processor>(namespace_prefix + "SnappyCompressor");
-thread_local const shared_ptr<compressor> compressor::deflate = ::make_shared<deflate_processor>(namespace_prefix + "DeflateCompressor");
+thread_local const shared_ptr<compressor> compressor::lz4 = ::make_shared<lz4_processor>(make_name("LZ4Compressor"));
+thread_local const shared_ptr<compressor> compressor::snappy = ::make_shared<snappy_processor>(make_name("SnappyCompressor"));
+thread_local const shared_ptr<compressor> compressor::deflate = ::make_shared<deflate_processor>(make_name("DeflateCompressor"));

 const sstring compression_parameters::SSTABLE_COMPRESSION = "sstable_compression";
 const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_in_kb";
--- a/compress.hh
+++ b/compress.hh
@@ -69,7 +69,7 @@ public:
    static thread_local const ptr_type snappy;
    static thread_local const ptr_type deflate;

-    static const sstring namespace_prefix;
+    static sstring make_name(std::string_view short_name);
 };

 template<typename BaseType, typename... Args>
--- a/configure.py
+++ b/configure.py
@@ -1113,7 +1113,7 @@ scylla_core = (['message/messaging_service.cc',
                'utils/lister.cc',
                'repair/repair.cc',
                'repair/row_level.cc',
-                'repair/table_check.cc',
+                'streaming/table_check.cc',
                'exceptions/exceptions.cc',
                'auth/allow_all_authenticator.cc',
                'auth/allow_all_authorizer.cc',
@@ -1323,6 +1323,7 @@ scylla_tests_generic_dependencies = [
    'test/lib/test_utils.cc',
    'test/lib/tmpdir.cc',
    'test/lib/sstable_run_based_compaction_strategy_for_tests.cc',
+    'test/lib/eventually.cc',
 ]

 scylla_tests_dependencies = scylla_core + alternator + idls + scylla_tests_generic_dependencies + [
@@ -1363,6 +1364,7 @@ scylla_perfs = ['test/perf/perf_alternator.cc',
                'test/lib/key_utils.cc',
                'test/lib/random_schema.cc',
                'test/lib/data_model.cc',
+                'test/lib/eventually.cc',
                'seastar/tests/perf/linux_perf_event.cc']

 deps = {
@@ -1470,7 +1472,7 @@ deps['test/boost/bytes_ostream_test'] = [
    "test/lib/log.cc",
 ]
 deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
-deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
+deps['test/boost/UUID_test'] = ['clocks-impl.cc', 'utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
 deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
 deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
 deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
@@ -1501,11 +1503,11 @@ deps['test/boost/rust_test'] += ['rust/inc/src/lib.rs']

 deps['test/boost/group0_cmd_merge_test'] += ['test/lib/expr_test_utils.cc']

-deps['test/raft/replication_test'] = ['test/raft/replication_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
-deps['test/raft/raft_server_test'] = ['test/raft/raft_server_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
+deps['test/raft/replication_test'] = ['test/raft/replication_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc', 'test/lib/eventually.cc'] + scylla_raft_dependencies
+deps['test/raft/raft_server_test'] = ['test/raft/raft_server_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc', 'test/lib/eventually.cc'] + scylla_raft_dependencies
 deps['test/raft/randomized_nemesis_test'] = ['test/raft/randomized_nemesis_test.cc', 'direct_failure_detector/failure_detector.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
 deps['test/raft/failure_detector_test'] = ['test/raft/failure_detector_test.cc', 'direct_failure_detector/failure_detector.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
-deps['test/raft/many_test'] = ['test/raft/many_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc'] + scylla_raft_dependencies
+deps['test/raft/many_test'] = ['test/raft/many_test.cc', 'test/raft/replication.cc', 'test/raft/helpers.cc', 'test/lib/eventually.cc'] + scylla_raft_dependencies
 deps['test/raft/fsm_test'] =  ['test/raft/fsm_test.cc', 'test/raft/helpers.cc', 'test/lib/log.cc'] + scylla_raft_dependencies
 deps['test/raft/etcd_test'] =  ['test/raft/etcd_test.cc', 'test/raft/helpers.cc', 'test/lib/log.cc'] + scylla_raft_dependencies
 deps['test/raft/raft_sys_table_storage_test'] = ['test/raft/raft_sys_table_storage_test.cc'] + \
--- a/cql3/statements/alter_keyspace_statement.cc
+++ b/cql3/statements/alter_keyspace_statement.cc
@@ -11,6 +11,7 @@
 #include <boost/range/algorithm.hpp>
 #include <fmt/format.h>
 #include <seastar/core/coroutine.hh>
+#include <seastar/core/on_internal_error.hh>
 #include <stdexcept>
 #include "alter_keyspace_statement.hh"
 #include "prepared_statement.hh"
@@ -26,6 +27,8 @@
 #include "gms/feature_service.hh"
 #include "replica/database.hh"

+using namespace std::string_literals;
+
 static logging::logger mylogger("alter_keyspace");

 bool is_system_keyspace(std::string_view keyspace);
@@ -43,18 +46,16 @@ future<> cql3::statements::alter_keyspace_statement::check_access(query_processo
    return state.has_keyspace_access(_name, auth::permission::ALTER);
 }

-static bool validate_rf_difference(const std::string_view curr_rf, const std::string_view new_rf) {
-    auto to_number = [] (const std::string_view rf) {
-        int result;
-        // We assume the passed string view represents a valid decimal number,
-        // so we don't need the error code.
-        (void) std::from_chars(rf.begin(), rf.end(), result);
-        return result;
-    };
-
-    // We want to ensure that each DC's RF is going to change by at most 1
-    // because in that case the old and new quorums must overlap.
-    return std::abs(to_number(curr_rf) - to_number(new_rf)) <= 1;
+static unsigned get_abs_rf_diff(const std::string& curr_rf, const std::string& new_rf) {
+    try {
+        return std::abs(std::stoi(curr_rf) - std::stoi(new_rf));
+    } catch (std::invalid_argument const& ex) {
+        on_internal_error(mylogger, fmt::format("get_abs_rf_diff expects integer arguments, "
+                                                "but got curr_rf:{} and new_rf:{}", curr_rf, new_rf));
+    } catch (std::out_of_range const& ex) {
+        on_internal_error(mylogger, fmt::format("get_abs_rf_diff expects integer arguments to fit into `int` type, "
+                                                "but got curr_rf:{} and new_rf:{}", curr_rf, new_rf));
+    }
 }

 void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, const service::client_state& state) const {
@@ -84,11 +85,24 @@ void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, c
            auto new_ks = _attrs->as_ks_metadata_update(ks.metadata(), *qp.proxy().get_token_metadata_ptr(), qp.proxy().features());

            if (ks.get_replication_strategy().uses_tablets()) {
-                const std::map<sstring, sstring>& current_rfs = ks.metadata()->strategy_options();
-                for (const auto& [new_dc, new_rf] : _attrs->get_replication_options()) {
-                    auto it = current_rfs.find(new_dc);
-                    if (it != current_rfs.end() && !validate_rf_difference(it->second, new_rf)) {
-                        throw exceptions::invalid_request_exception("Cannot modify replication factor of any DC by more than 1 at a time.");
+                const std::map<sstring, sstring>& current_rf_per_dc = ks.metadata()->strategy_options();
+                auto new_rf_per_dc = _attrs->get_replication_options();
+                new_rf_per_dc.erase(ks_prop_defs::REPLICATION_STRATEGY_CLASS_KEY);
+                unsigned total_abs_rfs_diff = 0;
+                for (const auto& [new_dc, new_rf] : new_rf_per_dc) {
+                    sstring old_rf = "0";
+                    if (auto new_dc_in_current_mapping = current_rf_per_dc.find(new_dc);
+                             new_dc_in_current_mapping != current_rf_per_dc.end()) {
+                        old_rf = new_dc_in_current_mapping->second;
+                    } else if (!qp.proxy().get_token_metadata_ptr()->get_topology().get_datacenters().contains(new_dc)) {
+                        // This means that the DC listed in ALTER doesn't exist. This error will be reported later,
+                        // during validation in abstract_replication_strategy::validate_replication_strategy.
+                        // We can't report this error now, because it'd change the order of errors reported:
+                        // first we need to report non-existing DCs, then if RFs aren't changed by too much.
+                        continue;
+                    }
+                    if (total_abs_rfs_diff += get_abs_rf_diff(old_rf, new_rf); total_abs_rfs_diff >= 2) {
+                        throw exceptions::invalid_request_exception("Only one DC's RF can be changed at a time and not by more than 1");
                    }
                }
            }
@@ -118,6 +132,63 @@ bool cql3::statements::alter_keyspace_statement::changes_tablets(query_processor
    return ks.get_replication_strategy().uses_tablets() && !_attrs->get_replication_options().empty();
 }

+namespace {
+// These functions are used to flatten all the options in the keyspace definition into a single-level map<string, string>.
+// (Currently options are stored in a nested structure that looks more like a map<string, map<string, string>>).
+// Flattening is simply joining the keys of maps from both levels with a colon ':' character,
+// or in other words: prefixing the keys in the output map with the option type, e.g. 'replication', 'storage', etc.,
+// so that the output map contains entries like: "replication:dc1" -> "3".
+// This is done to avoid key conflicts and to be able to de-flatten the map back into the original structure.
+
+void add_prefixed_key(const sstring& prefix, const std::map<sstring, sstring>& in, std::map<sstring, sstring>& out) {
+    for (const auto& [in_key, in_value]: in) {
+        out[prefix + ":" + in_key] = in_value;
+    }
+};
+
+std::map<sstring, sstring> get_current_options_flattened(const shared_ptr<cql3::statements::ks_prop_defs>& ks,
+                                                         bool include_tablet_options,
+                                                         const gms::feature_service& feat) {
+    std::map<sstring, sstring> all_options;
+
+    add_prefixed_key(ks->KW_REPLICATION, ks->get_replication_options(), all_options);
+    add_prefixed_key(ks->KW_STORAGE, ks->get_storage_options().to_map(), all_options);
+    // if no tablet options are specified in ATLER KS statement,
+    // we want to preserve the old ones and hence cannot overwrite them with defaults
+    if (include_tablet_options) {
+        auto initial_tablets = ks->get_initial_tablets(std::nullopt);
+        add_prefixed_key(ks->KW_TABLETS,
+                         {{"enabled", initial_tablets ? "true" : "false"},
+                         {"initial", std::to_string(initial_tablets.value_or(0))}},
+                         all_options);
+    }
+    add_prefixed_key(ks->KW_DURABLE_WRITES,
+                     {{sstring(ks->KW_DURABLE_WRITES), to_sstring(ks->get_boolean(ks->KW_DURABLE_WRITES, true))}},
+                     all_options);
+
+    return all_options;
+}
+
+std::map<sstring, sstring> get_old_options_flattened(const data_dictionary::keyspace& ks, bool include_tablet_options) {
+    std::map<sstring, sstring> all_options;
+
+    using namespace cql3::statements;
+    add_prefixed_key(ks_prop_defs::KW_REPLICATION, ks.get_replication_strategy().get_config_options(), all_options);
+    add_prefixed_key(ks_prop_defs::KW_STORAGE, ks.metadata()->get_storage_options().to_map(), all_options);
+    if (include_tablet_options) {
+        add_prefixed_key(ks_prop_defs::KW_TABLETS,
+                         {{"enabled", ks.metadata()->initial_tablets() ? "true" : "false"},
+                          {"initial", std::to_string(ks.metadata()->initial_tablets().value_or(0))}},
+                         all_options);
+    }
+    add_prefixed_key(ks_prop_defs::KW_DURABLE_WRITES,
+                     {{sstring(ks_prop_defs::KW_DURABLE_WRITES), to_sstring(ks.metadata()->durable_writes())}},
+                     all_options);
+
+    return all_options;
+}
+} // <anonymous> namespace
+
 future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, cql3::cql_warnings_vec>>
 cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_processor& qp, service::query_state& state, const query_options& options, service::group0_batch& mc) const {
    using namespace cql_transport;
@@ -130,11 +201,37 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
        auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, tm, feat);
        std::vector<mutation> muts;
        std::vector<sstring> warnings;
-        auto ks_options = _attrs->get_all_options_flattened(feat);
+        bool include_tablet_options = _attrs->get_map(_attrs->KW_TABLETS).has_value();
+        auto old_ks_options = get_old_options_flattened(ks, include_tablet_options);
+        auto ks_options = get_current_options_flattened(_attrs, include_tablet_options, feat);
+        ks_options.merge(old_ks_options);
+
        auto ts = mc.write_timestamp();
        auto global_request_id = mc.new_group0_state_id();

+        // #22688 - filter out any dc*:0 entries - consider these
+        // null and void (removed). Migration planning will treat it
+        // as dc*=0 still.
+        std::erase_if(ks_options, [](const auto& i) {
+            static constexpr std::string replication_prefix = ks_prop_defs::KW_REPLICATION + ":"s;
+            // Flattened map, replication entries starts with "replication:".
+            // Only valid options are replication_factor, class and per-dc rf:s. We want to
+            // filter out any dcN=0 entries.
+            auto& [key, val] = i;
+            if (key.starts_with(replication_prefix) && val == "0") {
+                std::string_view v(key);
+                v.remove_prefix(replication_prefix.size());
+                return v != ks_prop_defs::REPLICATION_FACTOR_KEY 
+                    && v != ks_prop_defs::REPLICATION_STRATEGY_CLASS_KEY
+                    ;
+            }
+            return false;
+        });
+
        // we only want to run the tablets path if there are actually any tablets changes, not only schema changes
+        // TODO: the current `if (changes_tablets(qp))` is insufficient: someone may set the same RFs as before,
+        //       and we'll unnecessarily trigger the processing path for ALTER tablets KS,
+        //       when in reality nothing or only schema is being changed
        if (changes_tablets(qp)) {
            if (!qp.topology_global_queue_empty()) {
                return make_exception_future<std::tuple<::shared_ptr<::cql_transport::event::schema_change>, cql3::cql_warnings_vec>>(
--- a/cql3/statements/ks_prop_defs.cc
+++ b/cql3/statements/ks_prop_defs.cc
@@ -69,6 +69,16 @@ static std::map<sstring, sstring> prepare_options(
        }
    }

+    // #22688 / #20039 - check for illegal, empty options (after above expand)
+    // moved to here. We want to be able to remove dc:s once rf=0, 
+    // in which case, the options actually serialized in result mutations
+    // will in extreme cases in fact be empty -> cannot do this check in 
+    // verify_options. We only want to apply this constraint on the input
+    // provided by the user
+    if (options.empty() && !tm.get_topology().get_datacenters().empty()) {
+        throw exceptions::configuration_exception("Configuration for at least one datacenter must be present");
+    }
+
    return options;
 }

@@ -139,28 +149,22 @@ data_dictionary::storage_options ks_prop_defs::get_storage_options() const {
    return opts;
 }

-ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const {
-    // FIXME -- this should be ignored somehow else
-    init_tablets_options ret{ .enabled = false, .specified_count = std::nullopt };
-    if (locator::abstract_replication_strategy::to_qualified_class_name(strategy_class) != "org.apache.cassandra.locator.NetworkTopologyStrategy") {
-        return ret;
-    }
-
+std::optional<unsigned> ks_prop_defs::get_initial_tablets(std::optional<unsigned> default_value) const {
    auto tablets_options = get_map(KW_TABLETS);
    if (!tablets_options) {
-        return enabled_by_default ? init_tablets_options{ .enabled = true } : ret;
+        return default_value;
    }

+    unsigned initial_count = 0;
    auto it = tablets_options->find("enabled");
    if (it != tablets_options->end()) {
        auto enabled = it->second;
        tablets_options->erase(it);

        if (enabled == "true") {
-            ret = init_tablets_options{ .enabled = true, .specified_count = 0 }; // even if 'initial' is not set, it'll start with auto-detection
+            // nothing
        } else if (enabled == "false") {
-            SCYLLA_ASSERT(!ret.enabled);
-            return ret;
+            return std::nullopt;
        } else {
            throw exceptions::configuration_exception(sstring("Tablets enabled value must be true or false; found: ") + enabled);
        }
@@ -169,7 +173,7 @@ ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstri
    it = tablets_options->find("initial");
    if (it != tablets_options->end()) {
        try {
-            ret = init_tablets_options{ .enabled = true, .specified_count = std::stol(it->second)};
+            initial_count = std::stol(it->second);
        } catch (...) {
            throw exceptions::configuration_exception(sstring("Initial tablets value should be numeric; found ") + it->second);
        }
@@ -180,7 +184,7 @@ ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstri
        throw exceptions::configuration_exception(sstring("Unrecognized tablets option ") + tablets_options->begin()->first);
    }

-    return ret;
+    return initial_count;
 }

 std::optional<sstring> ks_prop_defs::get_replication_strategy_class() const {
@@ -191,32 +195,13 @@ bool ks_prop_defs::get_durable_writes() const {
    return get_boolean(KW_DURABLE_WRITES, true);
 }

-std::map<sstring, sstring> ks_prop_defs::get_all_options_flattened(const gms::feature_service& feat) const {
-    std::map<sstring, sstring> all_options;
-
-    auto ingest_flattened_options = [&all_options](const std::map<sstring, sstring>& options, const sstring& prefix) {
-        for (auto& option: options) {
-            all_options[prefix + ":" + option.first] = option.second;
-        }
-    };
-    ingest_flattened_options(get_replication_options(), KW_REPLICATION);
-    ingest_flattened_options(get_storage_options().to_map(), KW_STORAGE);
-    ingest_flattened_options(get_map(KW_TABLETS).value_or(std::map<sstring, sstring>{}), KW_TABLETS);
-    ingest_flattened_options({{sstring(KW_DURABLE_WRITES), to_sstring(get_boolean(KW_DURABLE_WRITES, true))}}, KW_DURABLE_WRITES);
-
-    return all_options;
-}
-
 lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata(sstring ks_name, const locator::token_metadata& tm, const gms::feature_service& feat) {
    auto sc = get_replication_strategy_class().value();
-    auto initial_tablets = get_initial_tablets(sc, feat.tablets);
-    // if tablets options have not been specified, but tablets are globally enabled, set the value to 0
-    if (initial_tablets.enabled && !initial_tablets.specified_count) {
-        initial_tablets.specified_count = 0;
-    }
+    // if tablets options have not been specified, but tablets are globally enabled, set the value to 0 for N.T.S. only
+    auto initial_tablets = get_initial_tablets(feat.tablets && locator::abstract_replication_strategy::to_qualified_class_name(sc) == "org.apache.cassandra.locator.NetworkTopologyStrategy" ? std::optional<unsigned>(0) : std::nullopt);
    auto options = prepare_options(sc, tm, get_replication_options());
    return data_dictionary::keyspace_metadata::new_keyspace(ks_name, sc,
-            std::move(options), initial_tablets.specified_count, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
+            std::move(options), initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
 }

 lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata& tm, const gms::feature_service& feat) {
@@ -229,13 +214,9 @@ lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata_u
        sc = old->strategy_name();
        options = old_options;
    }
-    auto initial_tablets = get_initial_tablets(*sc, old->initial_tablets().has_value());
    // if tablets options have not been specified, inherit them if it's tablets-enabled KS
-    if (initial_tablets.enabled && !initial_tablets.specified_count) {
-        initial_tablets.specified_count = old->initial_tablets();
-    }
-
-    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets.specified_count, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
+    auto initial_tablets = get_initial_tablets(old->initial_tablets());
+    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
 }


--- a/cql3/statements/ks_prop_defs.hh
+++ b/cql3/statements/ks_prop_defs.hh
@@ -49,21 +49,15 @@ public:
 private:
    std::optional<sstring> _strategy_class;
 public:
-    struct init_tablets_options {
-        bool enabled;
-        std::optional<unsigned> specified_count;
-    };
-
    ks_prop_defs() = default;
    explicit ks_prop_defs(std::map<sstring, sstring> options);

    void validate();
    std::map<sstring, sstring> get_replication_options() const;
    std::optional<sstring> get_replication_strategy_class() const;
-    init_tablets_options get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const;
+    std::optional<unsigned> get_initial_tablets(std::optional<unsigned> default_value) const;
    data_dictionary::storage_options get_storage_options() const;
    bool get_durable_writes() const;
-    std::map<sstring, sstring> get_all_options_flattened(const gms::feature_service& feat) const;
    lw_shared_ptr<data_dictionary::keyspace_metadata> as_ks_metadata(sstring ks_name, const locator::token_metadata&, const gms::feature_service&);
    lw_shared_ptr<data_dictionary::keyspace_metadata> as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata&, const gms::feature_service&);
 };
--- a/cql3/statements/list_service_level_statement.cc
+++ b/cql3/statements/list_service_level_statement.cc
@@ -54,7 +54,7 @@ list_service_level_statement::execute(query_processor& qp,

    return make_ready_future().then([this, &state] () {
                                  if (_describe_all) {
-                                      return state.get_service_level_controller().get_distributed_service_levels();
+                                      return state.get_service_level_controller().get_distributed_service_levels(qos::query_context::user);
                                  } else {
                                      return state.get_service_level_controller().get_distributed_service_level(_service_level);
                                  }
--- a/cql3/statements/property_definitions.hh
+++ b/cql3/statements/property_definitions.hh
@@ -46,14 +46,14 @@ public:
 protected:
    std::optional<sstring> get_simple(const sstring& name) const;

-    std::optional<std::map<sstring, sstring>> get_map(const sstring& name) const;
-
    void remove_from_map_if_exists(const sstring& name, const sstring& key) const;
 public:
    bool has_property(const sstring& name) const;

    std::optional<value_type> get(const sstring& name) const;

+    std::optional<std::map<sstring, sstring>> get_map(const sstring& name) const;
+
    sstring get_string(sstring key, sstring default_value) const;

    // Return a property value, typed as a Boolean
--- a/db/commitlog/commitlog.cc
+++ b/db/commitlog/commitlog.cc
@@ -1132,7 +1132,12 @@ public:
            write(out, uint64_t(0));
        }

-        buf.remove_suffix(buf.size_bytes() - size);
+        auto to_remove = buf.size_bytes() - size;
+        // #20862 - we decrement usage counter based on buf.size() below.
+        // Since we are shrinking buffer here, we need to also decrement
+        // counter already
+        buf.remove_suffix(to_remove);
+        _segment_manager->totals.buffer_list_bytes -= to_remove;

        // Build sector checksums.
        auto id = net::hton(_desc.id);
@@ -1583,7 +1588,7 @@ future<> db::commitlog::segment_manager::oversized_allocation(entry_writer& writ

    std::vector<std::pair<sseg_ptr, uint64_t>> maybe_clear;

-    assert(_request_controller.available_units() <= ssize_t(max_request_controller_units()));
+    SCYLLA_ASSERT(_request_controller.available_units() <= ssize_t(max_request_controller_units()));
    auto fut = get_units(_request_controller, max_request_controller_units(), timeout);
    if (_request_controller.waiters()) {
        totals.requests_blocked_memory++;
@@ -1592,7 +1597,7 @@ future<> db::commitlog::segment_manager::oversized_allocation(entry_writer& writ
    scope_increment_counter allocating(totals.active_allocations);

    auto permit = co_await std::move(fut);
-    assert(_request_controller.available_units() == 0);
+    SCYLLA_ASSERT(_request_controller.available_units() == 0);

    decltype(permit) fake_permit; // can't have allocate+sync release semaphore.
    bool failed = false;
@@ -1853,10 +1858,13 @@ future<> db::commitlog::segment_manager::oversized_allocation(entry_writer& writ
            }
        }
    }
-    assert(_request_controller.available_units() == 0);
-
+    SCYLLA_ASSERT(_request_controller.available_units() == 0);
+    SCYLLA_ASSERT(permit.count() == max_request_controller_units());
+    auto nw = _request_controller.waiters();
    permit.return_all();
-    assert(_request_controller.available_units() == ssize_t(max_request_controller_units()));
+    // #20633 cannot guarantee controller avail is now full, since we could have had waiters when doing
+    // return all -> now will be less avail
+    SCYLLA_ASSERT(nw > 0 || _request_controller.available_units() == ssize_t(max_request_controller_units()));

    if (!failed) {
        clogger.trace("Oversized allocation succeeded.");
@@ -3826,6 +3834,10 @@ uint64_t db::commitlog::get_total_size() const {
        ;
 }

+uint64_t db::commitlog::get_buffer_size() const {
+    return _segment_manager->totals.buffer_list_bytes;
+}
+
 uint64_t db::commitlog::get_completed_tasks() const {
    return _segment_manager->totals.allocation_count;
 }
--- a/db/commitlog/commitlog.hh
+++ b/db/commitlog/commitlog.hh
@@ -306,6 +306,7 @@ public:
    future<> delete_segments(std::vector<sstring>) const;

    uint64_t get_total_size() const;
+    uint64_t get_buffer_size() const;
    uint64_t get_completed_tasks() const;
    uint64_t get_flush_count() const;
    uint64_t get_pending_tasks() const;
--- a/db/config.cc
+++ b/db/config.cc
@@ -27,6 +27,7 @@
 #include "cdc/cdc_extension.hh"
 #include "tombstone_gc_extension.hh"
 #include "db/per_partition_rate_limit_extension.hh"
+#include "db/paxos_grace_seconds_extension.hh"
 #include "db/tags/extension.hh"
 #include "config.hh"
 #include "extensions.hh"
@@ -1192,6 +1193,18 @@ void db::config::add_tombstone_gc_extension() {
    _extensions->add_schema_extension<tombstone_gc_extension>(tombstone_gc_extension::NAME);
 }

+void db::config::add_paxos_grace_seconds_extension() {
+    _extensions->add_schema_extension<db::paxos_grace_seconds_extension>(db::paxos_grace_seconds_extension::NAME);
+}
+
+void db::config::add_all_default_extensions() {
+    add_cdc_extension();
+    add_per_partition_rate_limit_extension();
+    add_tags_extension();
+    add_tombstone_gc_extension();
+    add_paxos_grace_seconds_extension();
+}
+
 void db::config::setup_directories() {
    maybe_in_workdir(commitlog_directory, "commitlog");
    if (!schema_commitlog_directory.is_set()) {
@@ -1526,18 +1539,19 @@ future<> update_relabel_config_from_file(const std::string& name) {
    co_return;
 }

-std::vector<sstring> split_comma_separated_list(sstring comma_separated_list) {
+std::vector<sstring> split_comma_separated_list(const std::string_view comma_separated_list) {
    std::vector<sstring> strs, trimmed_strs;
-    boost::split(strs, std::move(comma_separated_list), boost::is_any_of(","));
-    for (sstring n : strs) {
+    boost::split(strs, comma_separated_list, boost::is_any_of(","));
+    trimmed_strs.reserve(strs.size());
+    for (sstring& n : strs) {
        std::replace(n.begin(), n.end(), '\"', ' ');
        std::replace(n.begin(), n.end(), '\'', ' ');
        boost::trim_all(n);
        if (!n.empty()) {
-            trimmed_strs.push_back(n);
+            trimmed_strs.push_back(std::move(n));
        }
    }
    return trimmed_strs;
 }

-}
+} // namespace utils
--- a/db/config.hh
+++ b/db/config.hh
@@ -140,6 +140,9 @@ public:
    void add_per_partition_rate_limit_extension();
    void add_tags_extension();
    void add_tombstone_gc_extension();
+    void add_paxos_grace_seconds_extension();
+
+    void add_all_default_extensions();

    /// True iff the feature is enabled.
    bool check_experimental(experimental_features_t::feature f) const;
@@ -545,6 +548,6 @@ future<gms::inet_address> resolve(const config_file::named_value<sstring>&, gms:
 */
 future<> update_relabel_config_from_file(const std::string& name);

-std::vector<sstring> split_comma_separated_list(sstring comma_separated_list);
+std::vector<sstring> split_comma_separated_list(std::string_view comma_separated_list);

-}
+} // namespace utils
--- a/db/consistency_level.cc
+++ b/db/consistency_level.cc
@@ -36,7 +36,7 @@ size_t quorum_for(const locator::effective_replication_map& erm) {
 size_t local_quorum_for(const locator::effective_replication_map& erm, const sstring& dc) {
    using namespace locator;

-    auto& rs = erm.get_replication_strategy();
+    const auto& rs = erm.get_replication_strategy();

    if (rs.get_type() == replication_strategy_type::network_topology) {
        const network_topology_strategy* nrs =
@@ -65,7 +65,7 @@ size_t block_for_local_serial(const locator::effective_replication_map& erm) {
 size_t block_for_each_quorum(const locator::effective_replication_map& erm) {
    using namespace locator;

-    auto& rs = erm.get_replication_strategy();
+    const auto& rs = erm.get_replication_strategy();

    if (rs.get_type() == replication_strategy_type::network_topology) {
        const network_topology_strategy* nrs =
@@ -260,7 +260,7 @@ filter_for_query(consistency_level cl,
    size_t bf = block_for(erm, cl);

    if (read_repair == read_repair_decision::DC_LOCAL) {
-        bf = std::max(block_for(erm, cl), local_count);
+        bf = std::max(bf, local_count);
    }

    if (bf >= live_endpoints.size()) { // RRD.DC_LOCAL + CL.LOCAL or CL.ALL
--- a/db/hints/manager.hh
+++ b/db/hints/manager.hh
@@ -35,8 +35,6 @@
 #include <span>
 #include <unordered_map>

-class fragmented_temporary_buffer;
-
 namespace utils {
 class directories;
 } // namespace utils
--- a/db/system_distributed_keyspace.cc
+++ b/db/system_distributed_keyspace.cc
@@ -741,8 +741,8 @@ system_distributed_keyspace::get_cdc_desc_v1_timestamps(context ctx) {
    co_return res;
 }

-future<qos::service_levels_info> system_distributed_keyspace::get_service_levels() const {
-    return qos::get_service_levels(_qp, NAME, SERVICE_LEVELS, db::consistency_level::ONE);
+future<qos::service_levels_info> system_distributed_keyspace::get_service_levels(qos::query_context ctx) const {
+    return qos::get_service_levels(_qp, NAME, SERVICE_LEVELS, db::consistency_level::ONE, ctx);
 }

 future<qos::service_levels_info> system_distributed_keyspace::get_service_level(sstring service_level_name) const {
--- a/db/system_distributed_keyspace.hh
+++ b/db/system_distributed_keyspace.hh
@@ -112,7 +112,7 @@ public:

    future<db_clock::time_point> cdc_current_generation_timestamp(context);

-    future<qos::service_levels_info> get_service_levels() const;
+    future<qos::service_levels_info> get_service_levels(qos::query_context ctx) const;
    future<qos::service_levels_info> get_service_level(sstring service_level_name) const;
    future<> set_service_level(sstring service_level_name, qos::service_level_options slo) const;
    future<> drop_service_level(sstring service_level_name) const;
--- a/db/view/view.cc
+++ b/db/view/view.cc
@@ -2044,7 +2044,6 @@ future<> view_builder::start_in_background(service::migration_manager& mm, utils
        // the view build information.
        fail.cancel();
        co_await barrier.arrive_and_wait();
-        units.return_all();

        co_await calculate_shard_build_step(vbi);
        _mnotifier.register_listener(this);
@@ -2349,7 +2348,7 @@ static future<> announce_with_raft(

 future<> view_builder::mark_view_build_started(sstring ks_name, sstring view_name) {
    co_await write_view_build_status(
-        [&] () -> future<> {
+        [this, ks_name, view_name] () -> future<> {
            co_await utils::get_local_injector().inject("view_builder_pause_add_new_view",
                    [] (auto& handler) { return handler.wait_for_message(db::timeout_clock::now() + std::chrono::minutes(5)); });
            const sstring query_string = format("INSERT INTO {}.{} (keyspace_name, view_name, host_id, status) VALUES (?, ?, ?, ?)",
@@ -2359,7 +2358,7 @@ future<> view_builder::mark_view_build_started(sstring ks_name, sstring view_nam
                    {std::move(ks_name), std::move(view_name), host_id.uuid(), "STARTED"},
                    "view builder: mark view build STARTED");
        },
-        [&] () -> future<> {
+        [this, ks_name, view_name] () -> future<> {
            co_await utils::get_local_injector().inject("view_builder_pause_add_new_view",
                    [] (auto& handler) { return handler.wait_for_message(db::timeout_clock::now() + std::chrono::minutes(5)); });
            co_await _sys_dist_ks.start_view_build(std::move(ks_name), std::move(view_name));
@@ -2369,7 +2368,7 @@ future<> view_builder::mark_view_build_started(sstring ks_name, sstring view_nam

 future<> view_builder::mark_view_build_success(sstring ks_name, sstring view_name) {
    co_await write_view_build_status(
-        [&] () -> future<> {
+        [this, ks_name, view_name] () -> future<> {
            co_await utils::get_local_injector().inject("view_builder_pause_mark_success",
                    [] (auto& handler) { return handler.wait_for_message(db::timeout_clock::now() + std::chrono::minutes(5)); });
            const sstring query_string = format("UPDATE {}.{} SET status = ? WHERE keyspace_name = ? AND view_name = ? AND host_id = ?",
@@ -2379,7 +2378,7 @@ future<> view_builder::mark_view_build_success(sstring ks_name, sstring view_nam
                    {"SUCCESS", std::move(ks_name), std::move(view_name), host_id.uuid()},
                    "view builder: mark view build SUCCESS");
        },
-        [&] () -> future<> {
+        [this, ks_name, view_name] () -> future<> {
            co_await utils::get_local_injector().inject("view_builder_pause_mark_success",
                    [] (auto& handler) { return handler.wait_for_message(db::timeout_clock::now() + std::chrono::minutes(5)); });
            co_await _sys_dist_ks.finish_view_build(std::move(ks_name), std::move(view_name));
@@ -2389,14 +2388,14 @@ future<> view_builder::mark_view_build_success(sstring ks_name, sstring view_nam

 future<> view_builder::remove_view_build_status(sstring ks_name, sstring view_name) {
    co_await write_view_build_status(
-        [&] () -> future<> {
+        [this, ks_name, view_name] () -> future<> {
            const sstring query_string = format("DELETE FROM {}.{} WHERE keyspace_name = ? AND view_name = ?",
                    db::system_keyspace::NAME, db::system_keyspace::VIEW_BUILD_STATUS_V2);
            co_await announce_with_raft(_qp, _group0_client, _as, std::move(query_string),
                    {std::move(ks_name), std::move(view_name)},
                    "view builder: delete view build status");
        },
-        [&] () -> future<> {
+        [this, ks_name, view_name] () -> future<> {
            co_await _sys_dist_ks.remove_view(std::move(ks_name), std::move(view_name));
        }
    );
@@ -2444,11 +2443,11 @@ view_builder::view_build_statuses(sstring keyspace, sstring view_name) const {

 future<> view_builder::add_new_view(view_ptr view, build_step& step) {
    vlogger.info0("Building view {}.{}, starting at token {}", view->ks_name(), view->cf_name(), step.current_token());
+    if (this_shard_id() == 0) {
+        co_await mark_view_build_started(view->ks_name(), view->cf_name());
+    }
+    co_await _sys_ks.register_view_for_building(view->ks_name(), view->cf_name(), step.current_token());
    step.build_status.emplace(step.build_status.begin(), view_build_status{view, step.current_token(), std::nullopt});
-    auto f = this_shard_id() == 0 ? mark_view_build_started(view->ks_name(), view->cf_name()) : make_ready_future<>();
-    return when_all_succeed(
-            std::move(f),
-            _sys_ks.register_view_for_building(view->ks_name(), view->cf_name(), step.current_token())).discard_result();
 }

 static future<> flush_base(lw_shared_ptr<replica::column_family> base, abort_source& as) {
@@ -2990,6 +2989,12 @@ public:
                    _step.build_status.pop_back();
                }
            }
+
+            // before going back to the minimum token, advance current_key to the end
+            // and check for built views in that range.
+            _step.current_key = {_step.prange.end().value_or(dht::ring_position::max()).value().token(), partition_key::make_empty()};
+            check_for_built_views();
+
            _step.current_key = {dht::minimum_token(), partition_key::make_empty()};
            for (auto&& vs : _step.build_status) {
                vs.next_token = dht::minimum_token();
@@ -3154,16 +3159,16 @@ future<> view_builder::register_staging_sstable(sstables::shared_sstable sst, lw
    return _vug.register_staging_sstable(std::move(sst), std::move(table));
 }

-future<bool> check_needs_view_update_path(view_builder& vb, const locator::token_metadata& tm, const replica::table& t, streaming::stream_reason reason) {
+future<bool> check_needs_view_update_path(view_builder& vb, locator::token_metadata_ptr tmptr, const replica::table& t, streaming::stream_reason reason) {
    if (is_internal_keyspace(t.schema()->ks_name())) {
        return make_ready_future<bool>(false);
    }
    if (reason == streaming::stream_reason::repair && !t.views().empty()) {
        return make_ready_future<bool>(true);
    }
-    return do_with(t.views(), [&vb, &tm] (auto& views) {
+    return do_with(std::move(tmptr), t.views(), [&vb] (locator::token_metadata_ptr& tmptr, auto& views) {
        return map_reduce(views,
-                [&vb, &tm] (const view_ptr& view) { return vb.check_view_build_ongoing(tm, view->ks_name(), view->cf_name()); },
+                [&] (const view_ptr& view) { return vb.check_view_build_ongoing(*tmptr, view->ks_name(), view->cf_name()); },
                false,
                std::logical_or<bool>());
    });
--- a/db/view/view_update_checks.hh
+++ b/db/view/view_update_checks.hh
@@ -10,20 +10,17 @@

 #include <seastar/core/future.hh>
 #include "streaming/stream_reason.hh"
+#include "locator/token_metadata_fwd.hh"
 #include "seastarx.hh"

 namespace replica {
 class table;
 }

-namespace locator {
-class token_metadata;
-}
-
 namespace db::view {
 class view_builder;

-future<bool> check_needs_view_update_path(view_builder& vb, const locator::token_metadata& tm, const replica::table& t,
+future<bool> check_needs_view_update_path(view_builder& vb, locator::token_metadata_ptr tmptr, const replica::table& t,
        streaming::stream_reason reason);

 }
--- a/dist/common/scripts/scylla_coredump_setup
+++ b/dist/common/scripts/scylla_coredump_setup
@@ -40,6 +40,25 @@ if __name__ == '__main__':
                        help='enable compress on systemd-coredump')
    args = parser.parse_args()

+    # Seems like specific version of systemd pacakge on RHEL9 has a bug on
+    # SELinux configuration, it introduced "systemd-container-coredump" module
+    # to provide rule for systemd-coredump but not enabled by default.
+    # We have to manually load it, otherwise it causes permission errror.
+    # (#19325)
+    if is_redhat_variant() and distro.major_version() == '9':
+        if not shutil.which('getenforce'):
+            pkg_install('libselinux-utils')
+        if not shutil.which('semodule'):
+            pkg_install('policycoreutils')
+        enforce = out('getenforce')
+        if enforce != "Disabled":
+            if os.path.exists('/usr/share/selinux/packages/targeted/systemd-container-coredump.pp.bz2'):
+                modules = out('semodule -l')
+                match = re.match(r'^systemd-container-coredump$', modules, re.MULTILINE)
+                if not match:
+                    run('semodule -v -i /usr/share/selinux/packages/targeted/systemd-container-coredump.pp.bz2', shell=True, check=True)
+                    run('semodule -v -e systemd-container-coredump', shell=True, check=True)
+
    # abrt-ccpp.service needs to stop before enabling systemd-coredump,
    # since both will try to install kernel coredump handler
    # (This will only requires for abrt < 2.14)
--- a/dist/common/scripts/scylla_raid_setup
+++ b/dist/common/scripts/scylla_raid_setup
@@ -16,6 +16,7 @@ import sys
 import stat
 import logging
 import pyudev
+import psutil
 from pathlib import Path
 from scylla_util import *
 from subprocess import run, SubprocessError
@@ -92,6 +93,15 @@ class UdevInfo:
    def id_links(self):
        return [l for l in self.device.device_links if l.startswith('/dev/disk/by-id')]

+
+def is_selinux_enabled():
+    partitions = psutil.disk_partitions(all=True)
+    for p in partitions:
+        if p.fstype == 'selinuxfs':
+            if os.path.exists(p.mountpoint + '/enforce'):
+                return True
+    return False
+
 if __name__ == '__main__':
    if os.getuid() > 0:
        print('Requires root permission.')
@@ -333,3 +343,43 @@ WantedBy=local-fs.target
        LOGGER.error(f'Error detected, dumping udev env parameters on {fsdev}')
        udev_info.verify()
        udev_info.dump_variables()
+
+    if is_redhat_variant():
+        offline_skip_relabel = False
+        has_semanage = True
+        if not shutil.which('matchpathcon'):
+            offline_skip_relabel = True
+            pkg_install('libselinux-utils', offline_exit=False)
+        if not shutil.which('restorecon'):
+            offline_skip_relabel = True
+            pkg_install('policycoreutils', offline_exit=False)
+        if not shutil.which('semanage'):
+            if is_offline():
+                has_semanage = False
+            else:
+                pkg_install('policycoreutils-python-utils')
+        if is_offline() and offline_skip_relabel:
+            print('Unable to find SELinux tools, skip relabeling.')
+            sys.exit(0)
+
+        selinux_context = out('matchpathcon -n /var/lib/systemd/coredump')
+        selinux_type = selinux_context.split(':')[2]
+        if has_semanage:
+            run(f'semanage fcontext -a -t {selinux_type} "{root}/coredump(/.*)?"', shell=True, check=True)
+        else:
+            # without semanage, we need to update file_contexts directly,
+            # and compile it to binary format (.bin file)
+            try:
+                with open('/etc/selinux/targeted/contexts/files/file_contexts.local', 'a') as f:
+                    spacer = ''
+                    if f.tell() != 0:
+                        spacer = '\n'
+                    f.write(f'{spacer}{root}/coredump(/.*)?   {selinux_context}\n')
+            except FileNotFoundError as e:
+                print('Unable to find SELinux policy files, skip relabeling.')
+                sys.exit(0)
+            run('sefcontext_compile /etc/selinux/targeted/contexts/files/file_contexts.local', shell=True, check=True)
+        if is_selinux_enabled():
+            run(f'restorecon -F -v -R {root}', shell=True, check=True)
+        else:
+            Path('/.autorelabel').touch(exist_ok=True)
--- a/dist/common/scripts/scylla_util.py
+++ b/dist/common/scripts/scylla_util.py
@@ -293,13 +293,14 @@ def swap_exists():
    swaps = out('swapon --noheadings --raw')
    return True if swaps != '' else False

-def pkg_error_exit(pkg):
+def pkg_error_exit(pkg, offline_exit=True):
    print(f'Package "{pkg}" required.')
-    sys.exit(1)
+    if offline_exit:
+        sys.exit(1)

-def yum_install(pkg):
+def yum_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)
    return run(f'yum install -y {pkg}', shell=True, check=True)

 def apt_is_updated():
@@ -313,9 +314,9 @@ def apt_is_updated():

 APT_GET_UPDATE_NUM_RETRY = 30
 APT_GET_UPDATE_RETRY_INTERVAL = 10
-def apt_install(pkg):
+def apt_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)

    # The lock for update and install/remove are different, and
    # DPkg::Lock::Timeout will only wait for install/remove lock.
@@ -344,14 +345,14 @@ def apt_install(pkg):
    apt_env['DEBIAN_FRONTEND'] = 'noninteractive'
    return run(f'apt-get -o DPkg::Lock::Timeout=300 install -y {pkg}', shell=True, check=True, env=apt_env)

-def emerge_install(pkg):
+def emerge_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)
    return run(f'emerge -uq {pkg}', shell=True, check=True)

-def zypper_install(pkg):
+def zypper_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)
    return run(f'zypper install -y {pkg}', shell=True, check=True)

 def pkg_distro():
@@ -364,18 +365,20 @@ def pkg_distro():
    else:
        return distro.id()

-pkg_xlat = {'cpupowerutils': {'debian': 'linux-cpupower', 'gentoo':'sys-power/cpupower', 'arch':'cpupower', 'suse': 'cpupower'}}
-def pkg_install(pkg):
+pkg_xlat = {'cpupowerutils': {'debian': 'linux-cpupower', 'gentoo':'sys-power/cpupower', 'arch':'cpupower', 'suse': 'cpupower'},
+            'policycoreutils-python-utils': {'amzn2': 'policycoreutils-python'}}
+
+def pkg_install(pkg, offline_exit=True):
    if pkg in pkg_xlat and pkg_distro() in pkg_xlat[pkg]:
        pkg = pkg_xlat[pkg][pkg_distro()]
    if is_redhat_variant():
-        return yum_install(pkg)
+        return yum_install(pkg, offline_exit)
    elif is_debian_variant():
-        return apt_install(pkg)
+        return apt_install(pkg, offline_exit)
    elif is_gentoo():
-        return emerge_install(pkg)
+        return emerge_install(pkg, offline_exit)
    elif is_suse_variant():
-        return zypper_install(pkg)
+        return zypper_install(pkg, offline_exit)
    else:
        pkg_error_exit(pkg)

--- a/dist/common/sysconfig/scylla-node-exporter
+++ b/dist/common/sysconfig/scylla-node-exporter
@@ -1 +1 @@
-SCYLLA_NODE_EXPORTER_ARGS="--collector.interrupts"
+SCYLLA_NODE_EXPORTER_ARGS="--collector.interrupts --no-collector.hwmon"
--- a/dist/common/systemd/scylla-server.service
+++ b/dist/common/systemd/scylla-server.service
@@ -20,7 +20,6 @@ ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET $MEM_CONF
 ExecStopPost=+/opt/scylladb/scripts/scylla_stop
 TimeoutStartSec=1y
 TimeoutStopSec=900
-KillMode=process
 Restart=on-abnormal
 User=scylla
 OOMScoreAdjust=-950
--- a/docs/_templates/db_config.tmpl
+++ b/docs/_templates/db_config.tmpl
@@ -2,7 +2,7 @@

 {% for group in data %}
 {% if group.value_status_count[value_status] > 0 %}
-.. _confgroup_{{ group.name }}:
+.. _confgroup_{{ group.name|lower|replace(" ", "_") }}:

 {{ group.name }}
 {{ '-' * (group.name|length) }}
@@ -13,7 +13,7 @@

 {% for item in group.properties %}
 {% if item.value_status == value_status %}
-.. _confprop_{{ item.name }}:
+.. _confprop_{{ item.name|lower|replace(" ", "_") }}:

 .. confval:: {{ item.name }}
 {% endif %}
--- a/docs/_utils/redirects.yaml
+++ b/docs/_utils/redirects.yaml
@@ -1,12 +1,13 @@
 ### a dictionary of redirections
 #old path: new path

-
-
-# Move up the Features section
 # THESE REDIRECTIOSN SHOULD BE UNCOMMENTED WHEN 6.2 IS RELEASED
 # Before 6.2 documentation is available, these redirections result in 404

+#/stable/troubleshooting/nodetool-memory-read-timeout.html: /stable/troubleshooting/index.html
+
+# Move up the Features section
+
 #/stable/using-scylla/features.html: /stable/features/index.html
 #/stable/using-scylla/lwt.html: /stable/features/lwt.html
 #/stable/using-scylla/secondary-indexes.html: /stable/features/secondary-indexes.html
--- a/docs/architecture/index.rst
+++ b/docs/architecture/index.rst
@@ -12,6 +12,7 @@ ScyllaDB Architecture
   SSTable <sstable/index/>
   Compaction Strategies <compaction/compaction-strategies>
   Raft Consensus Algorithm in ScyllaDB </architecture/raft>
+   Zero-token Nodes </architecture/zero-token-nodes>
   
              
 * :doc:`Data Distribution with Tablets </architecture/tablets/>` - Tablets in ScyllaDB
@@ -22,5 +23,6 @@ ScyllaDB Architecture
 * :doc:`SSTable </architecture/sstable/index/>` - ScyllaDB SSTable 2.0 and 3.0 Format Information
 * :doc:`Compaction Strategies </architecture/compaction/compaction-strategies>` - High-level analysis of different compaction strategies
 * :doc:`Raft Consensus Algorithm in ScyllaDB </architecture/raft>` - Overview of how Raft is implemented in ScyllaDB.
+* :doc:`Zero-token Nodes </architecture/zero-token-nodes>` - Nodes that do not replicate any data.

 Learn more about these topics in the `ScyllaDB University: Architecture lesson <https://university.scylladb.com/courses/scylla-essentials-overview/lessons/architecture/>`_.
--- a/docs/architecture/zero-token-nodes.rst
+++ b/docs/architecture/zero-token-nodes.rst
@@ -0,0 +1,23 @@
+=========================
+Zero-token Nodes
+=========================
+
+By default, all nodes in a cluster own a set of token ranges and are used to
+replicate data. In certain circumstances, you may choose to add a node that
+doesn't own any token. Such nodes are referred to as zero-token nodes. They
+do not have a copy of the data but only participate in Raft quorum voting.
+
+To configure a zero-token node, set the ``join_ring`` parameter to ``false``.
+
+You can use zero-token nodes in multi-DC deployments to reduce the risk of
+losing a quorum of nodes.
+See :doc:`Preventing Quorum Loss in Symmetrical Multi-DC Clusters </operating-scylla/procedures/cluster-management/arbiter-dc>` for details.
+
+Note that:
+
+* Zero-token nodes are ignored by drivers, so there is no need to change
+  the load balancing policy on the clients after adding zero-token nodes
+  to the cluster.
+* Zero-token nodes never store replicated data, so running ``nodetool rebuild``,
+  ``nodetool repair``, and ``nodetool cleanup`` can be skipped as it does not
+  affect zero-token nodes.
--- a/docs/dev/docker-hub.md
+++ b/docs/dev/docker-hub.md
@@ -50,6 +50,13 @@ Which yields, for `/proc/sys/fs/aio-max-nr`:
 $ docker run --name some-scylla --hostname some-scylla -d scylladb/scylla
 ```

+If you're on macOS and plan to start a multi-node cluster (3 nodes or more), start ScyllaDB with
+`–reactor-backend=epoll` to override the default `linux-aio` reactor backend:
+
+```console
+$ docker run --name some-scylla --hostname some-scylla -d scylladb/scylla --reactor-backend=epoll
+```
+
 ### Run `nodetool` utility

 ```console
@@ -77,6 +84,11 @@ cqlsh>
 ```console
 $ docker run --name some-scylla2  --hostname some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
 ```
+If you're on macOS, ensure to add the `–reactor-backend=epoll` option when adding new nodes:
+
+```console
+$ docker run --name some-scylla2  --hostname some-scylla2 -d scylladb/scylla --reactor-backend=epoll --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
+```

 #### Make a cluster with Docker Compose

@@ -344,90 +356,6 @@ The `--authenticator` command lines option allows to provide the authenticator c

 The `--authorizer` command lines option allows to provide the authorizer class ScyllaDB will use. By default ScyllaDB uses the `AllowAllAuthorizer` which allows any action to any user. The second option is using the `CassandraAuthorizer` parameter, which stores permissions in `system.permissions` table.

-**Since: 2.3**
-
-### JMX parameters
-
-JMX ScyllaDB service is initialized from the `/scylla-jmx-service.sh` on
-container startup. By default the script uses `/etc/sysconfig/scylla-jmx`
-to read the default configuration. It then can be overridden by setting
-environmental parameters.
-
-An example:
-
-    docker run -d -e "SCYLLA_JMX_ADDR=-ja 0.0.0.0" -e SCYLLA_JMX_REMOTE=-r --publish 7199:7199 scylladb/scylla
-
-#### SCYLLA_JMX_PORT
-
-Scylla JMX listening port.
-
-Default value:
-
-    SCYLLA_JMX_PORT="-jp 7199"
-
-#### SCYLLA_API_PORT
-
-Scylla API port for JMX to connect to.
-
-Default value:
-
-    SCYLLA_API_PORT="-p 10000"
-
-#### SCYLLA_API_ADDR
-
-Scylla API address for JMX to connect to.
-
-Default value:
-
-    SCYLLA_API_ADDR="-a localhost"
-
-#### SCYLLA_JMX_ADDR
-
-JMX address to bind on.
-
-Default value:
-
-    SCYLLA_JMX_ADDR="-ja localhost"
-
-For example, it is possible to make JMX available to the outer world
-by changing its bind address to `0.0.0.0`:
-
-    docker run -d -e "SCYLLA_JMX_ADDR=-ja 0.0.0.0" -e SCYLLA_JMX_REMOTE=-r --publish 7199:7199 scylladb/scylla
-
-`cassandra-stress` requires direct access to the JMX.
-
-#### SCYLLA_JMX_FILE
-
-A JMX service configuration file path.
-
-Example value:
-
-    SCYLLA_JMX_FILE="-cf /etc/scylla.d/scylla-user.cfg"
-
-#### SCYLLA_JMX_LOCAL
-
-The location of the JMX executable.
-
-Example value:
-
-    SCYLLA_JMX_LOCAL="-l /opt/scylladb/jmx
-
-#### SCYLLA_JMX_REMOTE
-
-Allow JMX to run remotely.
-
-Example value:
-
-    SCYLLA_JMX_REMOTE="-r"
-
-#### SCYLLA_JMX_DEBUG
-
-Enable debugger.
-
-Example value:
-
-    SCYLLA_JMX_DEBUG="-d"
-
 ### Related Links

 * [Best practices for running ScyllaDB on docker](http://docs.scylladb.com/procedures/best_practices_scylla_on_docker/)
--- a/docs/dev/task_manager.md
+++ b/docs/dev/task_manager.md
@@ -62,16 +62,18 @@ Briefly:
 - `/task_manager/list_module_tasks/{module}` -
        lists (by default non-internal) tasks in the module;
 - `/task_manager/task_status/{task_id}` -
-        gets the task's status, unregisters the task if it's finished;
+        gets the task's status;
 - `/task_manager/abort_task/{task_id}` -
        aborts the task if it's abortable;
 - `/task_manager/wait_task/{task_id}` -
        waits for the task and gets its status;
 - `/task_manager/task_status_recursive/{task_id}` -
        gets statuses of the task and all its descendants in BFS
-        order, unregisters the task;
+        order;
 - `/task_manager/ttl` -
        gets or sets new ttl.
+- `/task_manager/drain/{module}` -
+        unregisters all finished local tasks in the module.

 # Virtual tasks

--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -194,7 +194,7 @@ Alternatively, you can explicitly install **all** the ScyllaDB packages for the

 .. code-block:: console

-   sudo apt-get install scylla-enterprise{,-server,-jmx,-tools,-tools-core,-kernel-conf,-node-exporter,-conf,-python3}=2021.1.0-0.20210511.9e8e7d58b-1
+   sudo apt-get install scylla-enterprise{,-server,-tools,-tools-core,-kernel-conf,-node-exporter,-conf,-python3}=2021.1.0-0.20210511.9e8e7d58b-1
   sudo apt-get install scylla-enterprise-machine-image=2021.1.0-0.20210511.9e8e7d58b-1  # only execute on AMI instance


--- a/docs/features/index.rst
+++ b/docs/features/index.rst
@@ -1,8 +1,11 @@
 Features
 ========================

+This document highlights ScyllaDB's key data modeling features.
+
 .. toctree::
   :maxdepth: 1
+   :hidden:

   Lightweight Transactions </features/lwt/>
   Global Secondary Indexes </features/secondary-indexes/>
@@ -12,6 +15,23 @@ Features
   Change Data Capture </features/cdc/index>
   Workload Attributes </features/workload-attributes>

-`ScyllaDB Enterprise <https://enterprise.docs.scylladb.com/stable/overview.html#enterprise-only-features>`_ 
-provides additional features, including Encryption at Rest, 
-workload prioritization, auditing, and more.
+.. panel-box::
+  :title: ScyllaDB Features
+  :id: "getting-started"
+  :class: my-panel
+   
+  * Secondary Indexes and Materialized Views provide efficient search mechanisms
+    on non-partition keys by creating an index.
+
+    * :doc:`Global Secondary Indexes </features/secondary-indexes/>`
+    * :doc:`Local Secondary Indexes </features/local-secondary-indexes/>`
+    * :doc:`Materialized Views </features/materialized-views/>`
+
+  * :doc:`Lightweight Transactions </features/lwt/>` provide conditional updates
+    through linearizability.
+  * :doc:`Counters </features/counters/>` are columns that only allow their values
+    to be incremented, decremented, read, or deleted.
+  * :doc:`Change Data Capture </features/cdc/index>` allows you to query the current
+    state and the history of all changes made to tables in the database.
+  * :doc:`Workload Attributes </features/workload-attributes>` assigned to your workloads
+    specify how ScyllaDB will handle requests depending on the workload.
--- a/docs/getting-started/_common/os-support-info.rst
+++ b/docs/getting-started/_common/os-support-info.rst
@@ -1,14 +1,14 @@
 You can `build ScyllaDB from source <https://github.com/scylladb/scylladb#build-prerequisites>`_ on other x86_64 or aarch64 platforms, without any guarantees.

 +----------------------------+--------------------+-------+---------------+
-| Linux Distributions        |Ubuntu              | Debian| Rocky /       |
-|                            |                    |       | RHEL          |
+| Linux Distributions        |Ubuntu              | Debian|Rocky / CentOS |
+|                            |                    |       |/ RHEL         |
 +----------------------------+------+------+------+-------+-------+-------+
 | ScyllaDB Version / Version |20.04 |22.04 |24.04 |  11   |   8   |   9   |
 +============================+======+======+======+=======+=======+=======+
-|   6.1                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
+|   6.2                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
 +----------------------------+------+------+------+-------+-------+-------+
-|   6.0                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
+|   6.1                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
 +----------------------------+------+------+------+-------+-------+-------+

 * The recommended OS for ScyllaDB Open Source is Ubuntu 22.04.
@@ -18,4 +18,4 @@ Supported Architecture
 -----------------------------

 ScyllaDB Open Source supports x86_64 for all versions and AArch64 starting from ScyllaDB 4.6 and nightly build. 
-In particular, aarch64 support includes AWS EC2 Graviton.
+In particular, aarch64 support includes AWS EC2 Graviton.
--- a/docs/getting-started/cloud-instance-recommendations.rst
+++ b/docs/getting-started/cloud-instance-recommendations.rst
@@ -175,7 +175,7 @@ Recommended instances types are `n1-highmem <https://cloud.google.com/compute/do
   * - n2-highmem-32
     - 32
     - 256
-     - 6,000
+     - 9,000
   * - n2-highmem-48
     - 48
     - 384
--- a/docs/getting-started/install-scylla/install-on-linux.rst
+++ b/docs/getting-started/install-scylla/install-on-linux.rst
@@ -46,7 +46,7 @@ Install ScyllaDB

            .. code-block:: console
    
-               sudo gpg --homedir /tmp --no-default-keyring --keyring /etc/apt/keyrings/scylladb.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 491c93b9de7496a7
+               sudo gpg --homedir /tmp --no-default-keyring --keyring /etc/apt/keyrings/scylladb.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys a43e06657bac99e3

            .. code-block:: console
               :substitutions:
--- a/docs/getting-started/install-scylla/launch-on-aws.rst
+++ b/docs/getting-started/install-scylla/launch-on-aws.rst
@@ -78,7 +78,7 @@ Launching Instances from ScyllaDB AMI
   * The ``scylla.yaml`` file: ``/etc/scylla/scylla.yaml``
   * Data: ``/var/lib/scylla/``

-   To check that the ScyllaDB server and the JMX component are running, run:
+   To check that the ScyllaDB server is running, run:

   .. code-block:: console
    
--- a/docs/getting-started/install-scylla/launch-on-azure.rst
+++ b/docs/getting-started/install-scylla/launch-on-azure.rst
@@ -77,7 +77,7 @@ Launching ScyllaDB on Azure
        
        ssh -i ~/.ssh/ssh-key.pem scyllaadm@public-ip
 
-   To check that the ScyllaDB server and the JMX component are running, run:
+   To check that the ScyllaDB server is running, run:

   .. code-block:: console
      
--- a/docs/getting-started/install-scylla/launch-on-gcp.rst
+++ b/docs/getting-started/install-scylla/launch-on-gcp.rst
@@ -63,7 +63,7 @@ Launching ScyllaDB on GCP
        
        gcloud compute ssh scylla-node1
   
-   To check that the ScyllaDB server and the JMX component are running, run:
+   To check that the ScyllaDB server is running, run:

     .. code-block:: console
      
--- a/docs/getting-started/installation-common/unified-installer.rst
+++ b/docs/getting-started/installation-common/unified-installer.rst
@@ -1,8 +1,3 @@
-.. |SCYLLADB_VERSION| replace:: 5.2
-
-.. update the version folder URL below (variables won't work):
-    https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.2/
-
 ====================================================
 Install ScyllaDB Without root Privileges
 ====================================================
@@ -24,14 +19,17 @@ Note that if you're on CentOS 7, only root offline installation is supported.
 Download and Install
 -----------------------

-#. Download the latest tar.gz file for ScyllaDB |SCYLLADB_VERSION| (x86 or ARM) from https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.2/.
+#. Download the latest tar.gz file for ScyllaDB version (x86 or ARM) from ``https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-<version>/``.
+
+   Example for version 6.1: https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-6.1/
+
 #. Uncompress the downloaded package.

-   The following example shows the package for ScyllaDB 5.2.4 (x86):
+   The following example shows the package for ScyllaDB 6.1.1 (x86):

   .. code:: console

-    tar xvfz scylla-unified-5.2.4-0.20230623.cebbf6c5df2b.x86_64.tar.gz
+    tar xvfz scylla-unified-6.1.1-0.20240814.8d90b817660a.x86_64.tar.gz

 #. Install OpenJDK 8 or 11.

--- a/docs/getting-started/logging.rst
+++ b/docs/getting-started/logging.rst
@@ -71,7 +71,7 @@ This will send ScyllaDB only logs to :code:`/var/log/scylla/scylla.log`

 Logging on Docker
 -----------------
-Starting from ScyllaDB 1.3, `ScyllaDB Docker <https://hub.docker.com/r/scylladb/scylla/>`_, you should use :code:`docker logs` command to access ScyllaDB server and JMX proxy logs
+Starting from ScyllaDB 1.3, `ScyllaDB Docker <https://hub.docker.com/r/scylladb/scylla/>`_, you should use :code:`docker logs` command to access ScyllaDB server logs.


 .. include:: /rst_include/advance-index.rst
--- a/docs/kb/custom-user.rst
+++ b/docs/kb/custom-user.rst
@@ -26,13 +26,6 @@ By default, ScyllaDB runs as user ``scylla`` in group ``scylla``. The following

 4. Edit ``/etc/systemd/system/multi-user.target.wants/node-exporter.service``

-.. code-block:: sh
-   
-   User=test
-   Group=test
-
-5. Edit /usr/lib/systemd/system/scylla-jmx.service
-
 .. code-block:: sh
   
   User=test
@@ -51,5 +44,4 @@ At this point, all  services should be started as test:test user:
 .. code-block:: sh
   
   test      8760     1 11 14:42 ?        00:00:01 /usr/bin/scylla --log-to-syslog 1 --log-to-std ...
-   test      8765     1 12 14:42 ?        00:00:01 /opt/scylladb/jmx/symlinks/scylla-jmx -Xmx256m ...
   test     13638     1  0 14:30 ?        00:00:00 /usr/bin/node_exporter --collector.interrupts
--- a/docs/kb/memory-usage.rst
+++ b/docs/kb/memory-usage.rst
@@ -11,7 +11,7 @@ For example:
 ScyllaDB uses available memory to cache your data. ScyllaDB knows how to dynamically manage memory for optimal performance, for example, if many clients connect to ScyllaDB, it will evict some data from the cache to make room for these connections, when the connection count drops again, this memory is returned to the cache.

 To limit the memory usage you can start scylla with ``--memory`` parameter.
-Alternatively, you can specify the amount of memory ScyllaDB should leave to the OS with ``--reserve-memory`` parameter. Keep in mind that the amount of memory left to the operating system needs to suffice external scylla modules, such as ``scylla-jmx``, which runs on top of JVM.
+Alternatively, you can specify the amount of memory ScyllaDB should leave to the OS with ``--reserve-memory`` parameter. Keep in mind that the amount of memory left to the operating system needs to suffice external scylla modules.

 On Ubuntu, edit the ``/etc/default/scylla-server``.

--- a/docs/operating-scylla/_common/networking-ports.rst
+++ b/docs/operating-scylla/_common/networking-ports.rst
@@ -14,8 +14,6 @@ Port    Description                                   Protocol
 ------  --------------------------------------------  --------
 7001    SSL inter-node communication (RPC)            TCP
 ------  --------------------------------------------  --------
-7199    JMX management                                TCP
------  --------------------------------------------  --------
 10000   ScyllaDB REST API                               TCP
 ------  --------------------------------------------  --------
 9180    Prometheus API                                TCP
--- a/docs/operating-scylla/admin-tools/scylla-sstable.rst
+++ b/docs/operating-scylla/admin-tools/scylla-sstable.rst
@@ -430,8 +430,8 @@ The content is dumped in JSON, using the following schema:
        "estimated_tombstone_drop_time": $STREAMING_HISTOGRAM,
        "sstable_level": Uint,
        "repaired_at": Uint64,
-        "min_column_names": [Uint, ...],
-        "max_column_names": [Uint, ...],
+        "min_column_names": [String, ...],
+        "max_column_names": [String, ...],
        "has_legacy_counter_shards": Bool,
        "columns_count": Int64, // >= MC only
        "rows_count": Int64, // >= MC only
--- a/docs/operating-scylla/admin-tools/task-manager.rst
+++ b/docs/operating-scylla/admin-tools/task-manager.rst
@@ -73,17 +73,19 @@ API calls
 	- *keyspace* - if set, tasks are filtered to contain only the ones working on this keyspace;
 	- *table* - if set, tasks are filtered to contain only the ones working on this table;

-* ``/task_manager/task_status/{task_id}`` - gets the task's status, unregisters the task if it's finished;
+* ``/task_manager/task_status/{task_id}`` - gets the task's status;
 * ``/task_manager/abort_task/{task_id}`` - aborts the task if it's abortable, otherwise 403 status code is returned;
-* ``/task_manager/wait_task/{task_id}`` - waits for the task and gets its status (does not unregister the tasks); query params:
+* ``/task_manager/wait_task/{task_id}`` - waits for the task and gets its status; query params:

 	- *timeout* - timeout in seconds; if set - 408 status code is returned if waiting times out;

-* ``/task_manager/task_status_recursive/{task_id}`` - gets statuses of the task and all its descendants in BFS order, unregisters the root task;
+* ``/task_manager/task_status_recursive/{task_id}`` - gets statuses of the task and all its descendants in BFS order;
 * ``/task_manager/ttl`` - gets or sets new ttl; query params (if setting):

 	- *ttl* - new ttl value.

+* ``/task_manager/drain/{module}`` - unregisters all finished local tasks in the module.
+
 Cluster tasks are not unregistered from task manager with API calls.

 Tasks API
--- a/docs/operating-scylla/admin.rst
+++ b/docs/operating-scylla/admin.rst
@@ -146,9 +146,7 @@ The ScyllaDB ports are detailed in the table below. For ScyllaDB Manager ports,

 .. include:: /operating-scylla/_common/networking-ports.rst

-All ports above need to be open to external clients (CQL), external admin systems (JMX), and other nodes (RPC). REST API port can be kept closed for incoming external connections.
-
-The JMX service, :code:`scylla-jmx`, runs on port 7199. It is required in order to manage ScyllaDB using :code:`nodetool` and other Apache Cassandra-compatible utilities. The :code:`scylla-jmx` process must be able to connect to port 10000 on localhost. The JMX service listens for incoming JMX connections on all network interfaces on the system.
+All ports above need to be open to external clients (CQL) and other nodes (RPC). REST API port can be kept closed for incoming external connections.

 Advanced networking
 -------------------
@@ -223,10 +221,6 @@ Monitoring Stack

 |mon_root|

-JMX
---
-ScyllaDB JMX is compatible with Apache Cassandra, exposing the relevant subset of MBeans.
-
 .. REST

 .. include:: /operating-scylla/rest.rst
--- a/docs/operating-scylla/nodetool-commands/snapshot.rst
+++ b/docs/operating-scylla/nodetool-commands/snapshot.rst
@@ -31,7 +31,7 @@ Parameter                                                             Descriptio
 --------------------------------------------------------------------  -------------------------------------------------------------------------------------
 -kc <ktlist>, --kc.list <ktlist>                                      The list of Keyspaces to take snapshot
 --------------------------------------------------------------------  -------------------------------------------------------------------------------------
-p <port> / --port <port>                                             Remote jmx agent port number
+-p <port> / --port <port>                                             The port of the REST API of the ScyllaDB node.
 --------------------------------------------------------------------  -------------------------------------------------------------------------------------
 -sf / --skip-flush                                                    Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
 --------------------------------------------------------------------  -------------------------------------------------------------------------------------
--- a/docs/operating-scylla/nodetool-commands/tasks/drain.rst
+++ b/docs/operating-scylla/nodetool-commands/tasks/drain.rst
@@ -0,0 +1,21 @@
+Nodetool tasks drain
+====================
+**tasks drain** - Unregisters all finished local tasks from the module.
+If a module is not specified, finished tasks in all modules are unregistered.
+
+Syntax
+-------
+.. code-block:: console
+
+   nodetool tasks drain [--module <module>]
+
+Options
+-------
+
+* ``--module`` - if set, only the specified module is drained.
+
+For example:
+
+.. code-block:: shell
+
+   > nodetool tasks drain --module repair
--- a/docs/operating-scylla/nodetool-commands/tasks/index.rst
+++ b/docs/operating-scylla/nodetool-commands/tasks/index.rst
@@ -5,6 +5,7 @@ Nodetool tasks
   :hidden:

   abort <abort>
+   drain <drain>
   list <list>
   modules <modules>
   status <status>
@@ -17,10 +18,17 @@ Nodetool tasks
 Task manager is an API-based tool for tracking long-running background operations, such as repair or compaction,
 which makes them observable and controllable. Task manager operates per node.

+Task Status Retention
+---------------------
+
+* When a task completes, its status is temporarily stored on the executing node
+* Status information is retained for up to :confval:`task_ttl_in_seconds` seconds
+
 Supported tasks suboperations
 -----------------------------

 * :doc:`abort </operating-scylla/nodetool-commands/tasks/abort>` - Aborts the task.
+* :doc:`drain </operating-scylla/nodetool-commands/tasks/drain>` - Unregisters all finished local tasks.
 * :doc:`list </operating-scylla/nodetool-commands/tasks/list>` - Lists tasks in the module.
 * :doc:`modules </operating-scylla/nodetool-commands/tasks/modules>` - Lists supported modules.
 * :doc:`status </operating-scylla/nodetool-commands/tasks/status>` - Gets status of the task.
--- a/docs/operating-scylla/nodetool-commands/tasks/status.rst
+++ b/docs/operating-scylla/nodetool-commands/tasks/status.rst
@@ -1,6 +1,6 @@
 Nodetool tasks status
 =========================
-**tasks status** - Gets the status of a task manager task. If the task was finished it is unregistered.
+**tasks status** - Gets the status of a task manager task.

 Syntax
 -------
@@ -23,10 +23,10 @@ Example output
   type: repair
   kind: node
   scope: keyspace
-   state: done
+   state: running
   is_abortable: true
   start_time: 2024-07-29T15:48:55Z
-   end_time: 2024-07-29T15:48:55Z
+   end_time:
   error:
   parent_id: none
   sequence_number: 5
--- a/docs/operating-scylla/nodetool-commands/tasks/tree.rst
+++ b/docs/operating-scylla/nodetool-commands/tasks/tree.rst
@@ -1,7 +1,7 @@
 Nodetool tasks tree
 =======================
 **tasks tree** - Gets the statuses of a task manager task and all its descendants.
-The statuses are listed in BFS order. If the task was finished it is unregistered.
+The statuses are listed in BFS order.

 If task_id isn't specified, trees of all non-internal tasks are printed
 (internal tasks are the ones that have a parent or cover an operation that
--- a/docs/operating-scylla/nodetool.rst
+++ b/docs/operating-scylla/nodetool.rst
@@ -61,26 +61,14 @@ Nodetool
   nodetool-commands/viewbuildstatus
   nodetool-commands/version

-The ``nodetool`` utility provides a simple command-line interface to the following exposed operations and attributes. ScyllaDB’s nodetool is a fork of `the Apache Cassandra nodetool <https://cassandra.apache.org/doc/latest/tools/nodetool/nodetool.html>`_ with the same syntax and a subset of the operations.
+The ``nodetool`` utility provides a simple command-line interface to the following exposed operations and attributes.

 .. _nodetool-generic-options:

 Nodetool generic options
 ========================
-All options are supported:
-
-
-
-* ``-p <port>`` or ``--port <port>`` - Remote JMX agent port number.
-
-* ``-pp`` or ``--print-port`` - Operate in 4.0 mode with hosts disambiguated by port number.
-
-* ``-pw <password>`` or ``--password <password>`` - Remote JMX agent password.
-
-* ``-pwf <passwordFilePath>`` or ``--password-file <passwordFilePath>`` - Path to the JMX password file.
-
-* ``-u <username>`` or ``--username <username>`` - Remote JMX agent username.

+* ``-p <port>`` or ``--port <port>`` - The port of the REST API of the ScyllaDB node.
 * ``--`` - Separates command-line options from the list of argument(useful when an argument might be mistaken for a command-line option).

 Supported Nodetool operations
@@ -145,4 +133,4 @@ Operations that are not listed below are currently not available.
 * :doc:`viewbuildstatus </operating-scylla/nodetool-commands/viewbuildstatus/>` - Shows the progress of a materialized view build.
 * :doc:`version </operating-scylla/nodetool-commands/version>` - Print the DB version.

-.. include:: /rst_include/apache-copyrights.rst
+
--- a/docs/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.rst
@@ -201,6 +201,7 @@ Add New DC

 #. If you are using ScyllaDB Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using ScyllaDB Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_ and Manager can access the new DC.

+.. _add-dc-to-existing-dc-not-connect-clients:

 Configure the Client not to Connect to the New DC
 -------------------------------------------------
--- a/docs/operating-scylla/procedures/cluster-management/arbiter-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/arbiter-dc.rst
@@ -0,0 +1,57 @@
+=========================================================
+Preventing Quorum Loss in Symmetrical Multi-DC Clusters
+=========================================================
+
+ScyllaDB requires at least a quorum (majority) of nodes in a cluster to be up
+and communicate with each other. A cluster that loses a quorum can handle reads
+and writes of user data, but cluster management operations, such as schema and
+topology updates, are impossible.
+
+In clusters that are symmetrical, i.e., have two (DCs) with the same number of
+nodes, losing a quorum may occur if one of the DCs becomes unavailable.
+For example, if one DC fails in a 2-DC cluster where each DC has three nodes,
+only three out of six nodes are available, and the quorum is lost.
+
+Adding another DC would mitigate the risk of losing a quorum, but it comes
+with network and storage costs. To prevent the quorum loss with minimum costs,
+you can configure an arbiter (tie-breaker) DC.
+
+An arbiter DC is a datacenter with a :doc:`zero-token node </architecture/zero-token-nodes>`
+-- a node that doesn't replicate any data but is only used for Raft quorum
+voting. An arbiter DC maintains the cluster quorum if one of the other DCs
+fails, while it doesn't incur extra network and storage costs as it has no
+user data.
+
+Adding an Arbiter DC
+-----------------------
+
+To set up an arbiter DC, follow the procedure to
+:doc:`add a new datacenter to an existing cluster </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
+When editing the *scylla.yaml* file, set the ``join_ring`` parameter to
+``false`` following these guidelines:
+
+* Set ``join_ring=false`` before you start the node(s). If you set that
+  parameter on a node that has already been bootstrapped and owns a token
+  range, the node startup will fail. In such a case, you'll need to
+  :doc:`decommission </operating-scylla/procedures/cluster-management/decommissioning-data-center>`
+  the node, :doc:`wipe it clean </operating-scylla/procedures/cluster-management/clear-data>`,
+  and add it back to the arbiter DC properly following
+  the :doc:`procedure </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`.
+* As a rule, one node is sufficient for an arbiter to serve as a tie-breaker.
+  In case you add more than one node to the arbiter DC, ensure that you set
+  ``join_ring=false`` on all the nodes in that DC.
+
+Follow-up steps:
+^^^^^^^^^^^^^^^^^^^
+* An arbiter DC has a replication factor of 0 (RF=0) for all keyspaces. You
+  need to ``ALTER`` the keyspaces to update their RF.
+* Since zero-token nodes are ignored by drivers, you can skip
+  :ref:`configuring the client not to connect to the new DC <add-dc-to-existing-dc-not-connect-clients>`.
+
+References
+----------------
+
+* :doc:`Zero-token Nodes </architecture/zero-token-nodes>`
+* :doc:`Raft Consensus Algorithm in ScyllaDB </architecture/raft>`
+* :doc:`Handling Node Failures </troubleshooting/handling-node-failures>`
+* :doc:`Adding a New Data Center Into an Existing ScyllaDB Cluster </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>`
--- a/docs/operating-scylla/procedures/cluster-management/create-cluster-multidc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/create-cluster-multidc.rst
@@ -209,6 +209,17 @@ In this example, we will show how to install a nine nodes cluster.
   UN   54.187.142.201  109.54 KB       256     ?               d99967d6-987c-4a54-829d-86d1b921470f    RACK1
   UN   54.187.168.20   109.54 KB       256     ?               2329c2e0-64e1-41dc-8202-74403a40f851    RACK1

-See also:
+--------------------------
+Preventing Quorum Loss
+--------------------------

+If your cluster is symmetrical, i.e., it has  an even number of datacenters
+with the same number of nodes, consider adding an arbiter DC to mitigate
+the risk of losing a quorum at a minimum cost.
+See :doc:`Preventing Quorum Loss in Symmetrical Multi-DC Clusters </operating-scylla/procedures/cluster-management/arbiter-dc>`
+for details.
+
+------------
+See also:
+------------
 :doc:`Create a ScyllaDB Cluster - Single Data Center (DC) </operating-scylla/procedures/cluster-management/create-cluster>`
--- a/docs/operating-scylla/procedures/cluster-management/decommissioning-data-center.rst
+++ b/docs/operating-scylla/procedures/cluster-management/decommissioning-data-center.rst
@@ -55,7 +55,7 @@ Procedure
      cqlsh> DESCRIBE <KEYSPACE_NAME>
      cqlsh> CREATE KEYSPACE <KEYSPACE_NAME> WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', '<DC_NAME1>' : 3, '<DC_NAME2>' : 3, '<DC_NAME3>' : 3};

-      cqlsh> ALTER KEYSPACE <KEYSPACE_NAME> WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', '<DC_NAME1>' : 3, '<DC_NAME2>' : 3};
+      cqlsh> ALTER KEYSPACE <KEYSPACE_NAME> WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', '<DC_NAME1>' : 3, '<DC_NAME2>' : 3, '<DC_NAME3>' : 0};

   For example:

@@ -71,7 +71,7 @@ Procedure

   .. code-block:: shell

-      cqlsh> ALTER KEYSPACE nba WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US-DC' : 3, 'EUROPE-DC' : 3};
+      cqlsh> ALTER KEYSPACE nba WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US-DC' : 3, 'ASIA-DC' : 0, 'EUROPE-DC' : 3};

 #. Run :doc:`nodetool decommission </operating-scylla/nodetool-commands/decommission>` on every node in the data center that is to be removed.
   Refer to :doc:`Remove a Node from a ScyllaDB Cluster - Down Scale </operating-scylla/procedures/cluster-management/remove-node>` for further information.
--- a/docs/operating-scylla/procedures/cluster-management/index.rst
+++ b/docs/operating-scylla/procedures/cluster-management/index.rst
@@ -26,6 +26,8 @@ Cluster Management Procedures
   Safely Restart Your Cluster <safe-start>
   Handling Membership Change Failures <handling-membership-change-failures>
   repair-based-node-operation
+   Prevent Quorum Loss in Symmetrical Multi-DC Clusters <arbiter-dc>
+

 .. panel-box::
  :title: Cluster and DC Creation
@@ -84,6 +86,8 @@ Cluster Management Procedures

  * :doc:`Repair Based Node Operations (RBNO) </operating-scylla/procedures/cluster-management/repair-based-node-operation>`

+  * :doc:`Preventing Quorum Loss in Symmetrical Multi-DC Clusters <arbiter-dc>`
+
 .. panel-box::
  :title: Topology Changes
  :id: "getting-started"
--- a/docs/operating-scylla/procedures/tips/benchmark-tips.rst
+++ b/docs/operating-scylla/procedures/tips/benchmark-tips.rst
@@ -41,7 +41,7 @@ With the recent addition of the `ScyllaDB Advisor <http://monitoring.docs.scylla
 Install ScyllaDB Manager
 ------------------------

-Install and use `ScyllaDB Manager <https://manager.docs.scylladb.com>` together with the `ScyllaDB Monitoring Stack <http://monitoring.docs.scylladb.com/>`_.
+Install and use `ScyllaDB Manager <https://manager.docs.scylladb.com>`_ together with the `ScyllaDB Monitoring Stack <http://monitoring.docs.scylladb.com/>`_.
 ScyllaDB Manager provides automated backups and repairs of your database.
 ScyllaDB Manager can manage multiple ScyllaDB clusters and run cluster-wide tasks in a controlled and predictable way.
 For example, with ScyllaDB Manager you can control the intensity of a repair, increasing it to speed up the process, or lower the intensity to ensure it minimizes impact on ongoing operations.
--- a/docs/operating-scylla/procedures/tips/best-practices-scylla-on-docker.rst
+++ b/docs/operating-scylla/procedures/tips/best-practices-scylla-on-docker.rst
@@ -22,6 +22,13 @@ To start a single ScyllaDB node instance in a Docker container, run:

 docker run --name some-scylla -d scylladb/scylla

+If you're on macOS and plan to start a multi-node cluster (3 nodes or more), start ScyllaDB with
+``–reactor-backend=epoll`` to override the default ``linux-aio`` reactor backend:
+
+.. code-block:: console
+
+ docker run --name some-scylla -d scylladb/scylla --reactor-backend=epoll
+
 The ``docker run`` command starts a new Docker instance in the background named some-scylla that runs the ScyllaDB server:

 .. code-block:: console
@@ -95,6 +102,12 @@ With a single ``some-scylla`` instance running,  joining new nodes to form a clu

 docker run --name some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"

+If you're on macOS, ensure to add the ``–reactor-backend=epoll`` option when adding new nodes:
+
+.. code-block:: console
+
+ docker run --name some-scylla2 -d scylladb/scylla --reactor-backend=epoll --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
+
 To query when the node is up and running (and view the status of the entire cluster) use the ``nodetool status`` command:

 .. code-block:: console
--- a/docs/operating-scylla/rest.rst
+++ b/docs/operating-scylla/rest.rst
@@ -6,8 +6,8 @@ ScyllaDB exposes a REST API to retrieve administrative information from a node a
 administrative operations. For example, it allows you to check or update configuration, 
 retrieve cluster-level information, and more.

-The :doc:`nodetool </operating-scylla/nodetool>` CLI tool interacts with a *scylla-jmx* process using JMX. 
-The process, in turn, uses the REST API to interact with the ScyllaDB process.
+The :doc:`nodetool </operating-scylla/nodetool>` CLI tool uses the REST API
+to interact with the ScyllaDB process.

 You can interact with the REST API directly using :code:`curl`, ScyllaDB's CLI for REST API, or the Swagger UI.

--- a/docs/operating-scylla/security/certificate-authentication.rst
+++ b/docs/operating-scylla/security/certificate-authentication.rst
@@ -11,7 +11,7 @@ Procedure

 #. Enable authentication

-   Enable authentication and define authorized roles in the cluster as described in the `Enable Authentication </operating-scylla/security/authentication/>`_ document. 
+   Enable authentication and define authorized roles in the cluster as described in the :doc:`Enable Authentication </operating-scylla/security/authentication/>` document. 

 #. Enable CQL transport TLS using client certificate verification
   
--- a/docs/operating-scylla/security/client-node-encryption.rst
+++ b/docs/operating-scylla/security/client-node-encryption.rst
@@ -3,7 +3,7 @@ Encryption: Data in Transit Client to Node

 Follow the procedures below to enable a client to node encryption.
 Once enabled, all communication between the client and the node is transmitted over TLS/SSL.
-The libraries used by ScyllaDB for OpenSSL are FIPS 140-2 certified.
+The libraries used by ScyllaDB for OpenSSL are FIPS 140-2 enabled.

 Workflow
 ^^^^^^^^
--- a/docs/troubleshooting/autoremove-ubuntu.rst
+++ b/docs/troubleshooting/autoremove-ubuntu.rst
@@ -0,0 +1,38 @@
+Removing ScyllaDB with the "--autoremove" option on Ubuntu breaks system packages
+======================================================================================
+
+Problem
+^^^^^^^
+
+Running ``apt purge scylla --autoremove`` marks most system packages for
+removal.
+
+.. code::
+
+   root@myserv:~# apt purge scylla --autoremove
+   Reading package lists... Done
+   Building dependency tree... Done
+   Reading state information... Done
+   The following packages will be REMOVED:
+     apport-symptoms* bc* bcache-tools* bolt* btrfs-progs* byobu* cloud-guest-utils* cloud-init* cloud-initramfs-copymods* cloud-initramfs-dyn-netconf* cryptsetup* cryptsetup-initramfs* dmeventd* eatmydata* ethtool* fdisk* fonts-ubuntu-console* fwupd* fwupd-signed* gdisk* gir1.2-packagekitglib-1.0* git* git-man* kpartx* landscape-common* libaio1* libappstream4* libatasmart4* libblockdev-crypto2* libblockdev-fs2*
+     libblockdev-loop2* libblockdev-part-err2* libblockdev-part2* libblockdev-swap2* libblockdev-utils2* libblockdev2* libdevmapper-event1.02.1* libeatmydata1* liberror-perl* libfdisk1* libfwupd2* libfwupdplugin5* libgcab-1.0-0* libgpgme11* libgstreamer1.0-0* libgusb2* libinih1* libintl-perl* libintl-xs-perl* libjcat1* libjson-glib-1.0-0* libjson-glib-1.0-common* liblvm2cmd2.03* libmbim-glib4* libmbim-proxy*
+     libmm-glib0* libmodule-find-perl* libmodule-scandeps-perl* libmspack0* libpackagekit-glib2-18* libparted-fs-resize0* libproc-processtable-perl* libqmi-glib5* libqmi-proxy* libsgutils2-2* libsmbios-c2* libsort-naturally-perl* libstemmer0d* libtcl8.6* libterm-readkey-perl* libudisks2-0* liburcu8* libutempter0* libvolume-key1* libxmlb2* libxmlsec1* libxmlsec1-openssl* libxslt1.1* lvm2* lxd-agent-loader* mdadm*
+     modemmanager* motd-news-config* multipath-tools* needrestart* open-vm-tools* overlayroot* packagekit* packagekit-tools* pastebinit* patch* pollinate* python3-apport* python3-certifi* python3-chardet* python3-configobj* python3-debconf* python3-debian* python3-json-pointer* python3-jsonpatch* python3-jsonschema* python3-magic* python3-newt* python3-packaging* python3-pexpect* python3-problem-report*
+     python3-ptyprocess* python3-pyrsistent* python3-requests* python3-software-properties* python3-systemd* python3-xkit* run-one* sbsigntool* screen* scylla* scylla-conf* scylla-cqlsh* scylla-kernel-conf* scylla-node-exporter* scylla-python3* scylla-server* secureboot-db* sg3-utils* sg3-utils-udev* software-properties-common* sosreport* tcl* tcl8.6* thin-provisioning-tools* tmux* ubuntu-drivers-common* udisks2*
+     unattended-upgrades* update-notifier-common* usb-modeswitch* usb-modeswitch-data* xfsprogs* zerofree*
+   0 upgraded, 0 newly installed, 139 to remove and 0 not upgraded.
+
+Cause
+^^^^^^^
+
+This problem may occur on Ubuntu 22.04 or earlier. It is caused by
+the ``systemd-coredump`` package installed with the ``scylla_setup`` script.
+Installing ``systemd-coredump`` results in removing ``apport`` and ``ubuntu-server``.
+In turn, the ``--autoremove`` option marks for removal all packages installed
+by ``ubuntu-server dependencies``.
+
+
+Solution
+^^^^^^^^^^
+
+Do not run the ``--autoremove`` option when removing ScyllaDB.
--- a/docs/troubleshooting/cluster/index.rst
+++ b/docs/troubleshooting/cluster/index.rst
@@ -10,7 +10,6 @@ Cluster and Node
   Failed Decommission Problem </troubleshooting/failed-decommission/>
   Cluster Timeouts </troubleshooting/timeouts>
   Node Joined With No Data </troubleshooting/node-joined-without-any-data>
-   SocketTimeoutException </troubleshooting/nodetool-memory-read-timeout/> 
   NullPointerException </troubleshooting/nodetool-nullpointerexception/> 
   Failed Schema Sync </troubleshooting/failed-schema-sync/> 

@@ -28,7 +27,6 @@ Cluster and Node
 * :doc:`Failed Decommission Problem </troubleshooting/failed-decommission/>`
 * :doc:`Cluster Timeouts </troubleshooting/timeouts>`
 * :doc:`Node Joined With No Data </troubleshooting/node-joined-without-any-data>`
-* :doc:`Nodetool fails with SocketTimeoutException 'Read timed out' </troubleshooting/nodetool-memory-read-timeout>`
 * :doc:`Nodetool Throws NullPointerException </troubleshooting/nodetool-nullpointerexception>`
 * :doc:`Failed Schema Sync </troubleshooting/failed-schema-sync>`

--- a/docs/troubleshooting/drop-table-space-up.rst
+++ b/docs/troubleshooting/drop-table-space-up.rst
@@ -16,6 +16,4 @@ Solution
 2. If you are deleting an entire keyspace, repeat the procedure above for every table inside the keyspace.
 3. This behavior is controlled by the ``auto_snapshot`` flag in ``/etc/scylla/scylla.yaml``, which set to true by default. To stop taking snapshots on deletion, set that flag to false and restart all your scylla nodes.

-.. note:: Alternatively you can use the``rm`` Linux utility to remove the files. If you do, keep in mind that the ``rm`` Linux utility is not aware if some snapshots are still associated with existing keyspaces, but nodetool is. 
- 
-
+.. note:: Alternatively you can use the ``rm`` Linux utility to remove the files. If you do, keep in mind that the ``rm`` Linux utility is not aware if some snapshots are still associated with existing keyspaces, but nodetool is.
--- a/docs/troubleshooting/index.rst
+++ b/docs/troubleshooting/index.rst
@@ -14,6 +14,7 @@ Troubleshooting ScyllaDB
   storage/index
   CQL/index
   monitor/index
+   install-remove/index


 ScyllaDB's troubleshooting section contains articles which are targeted to pinpoint and answer problems with ScyllaDB. For broader issues and workarounds, consult the :doc:`Knowledge base </kb/index>`.
@@ -33,6 +34,7 @@ Keep your versions up-to-date. The two latest versions are supported. Also, alwa
  * :doc:`Data Storage and SSTables <storage/index>`
  * :doc:`CQL errors <CQL/index>`
  * :doc:`ScyllaDB Monitoring and ScyllaDB Manager <monitor/index>`
+  * :doc:`Installation and Removal <install-remove/index>`

 Also check out the `Monitoring lesson <https://university.scylladb.com/courses/scylla-operations/lessons/scylla-monitoring/>`_ on ScyllaDB University, which covers how to troubleshoot different issues when running a ScyllaDB cluster.

--- a/docs/troubleshooting/install-remove/index.rst
+++ b/docs/troubleshooting/install-remove/index.rst
@@ -0,0 +1,13 @@
+Installation and Removal
+===========================
+
+.. toctree::
+   :hidden:
+   :maxdepth: 2 
+
+   Removing ScyllaDB on Ubuntu breaks system packages </troubleshooting/autoremove-ubuntu/>
+
+
+
+* :doc:`Removing ScyllaDB with the "--autoremove" option on Ubuntu breaks system packages </troubleshooting/autoremove-ubuntu/>`
+  
--- a/docs/troubleshooting/nodetool-memory-read-timeout.rst
+++ b/docs/troubleshooting/nodetool-memory-read-timeout.rst
@@ -1,112 +0,0 @@
-Nodetool fails with SocketTimeoutException 'Read timed out' 
-===========================================================
-
-This troubleshooting article describes what to do when Nodetool fails with a 'Read timed out' error.
-
-Problem
-^^^^^^^
-
-When running any Nodetool command, users may see the following error:
-
-.. code-block:: none
-
-   Failed to connect to '127.0.0.1:7199' - SocketTimeoutException: 'Read timed out' 
-
-Analysis
-^^^^^^^^
-Nodetool is a Java based application which requires memory. ScyllaDB by default consumes 93% of the node’s RAM (for MemTables + Cache) and leaves 7% for other applications, such as nodetool.
-
-If cases where this is not enough memory (e.g. small instances with ~64GB RAM or lower), Nodetool may not be able to run due to insufficient memory. In this case an out of memory (OOM) error may appear and scylla-jmx will not run.
-
-
-Example
-------
-
-The error you will see is similar to:
-
-.. code-block:: none
-
-   OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000005c0000000, 
-   671088640, 0) failed; error='Cannot allocate memory' (err no=12) 
-
-
-In order to check if the issue is scylla-jmx, use the following command (systemd-based Linux distribution) to check the status of the service:
-
-.. code-block:: none
-
-   sudo systemctl status scylla-jmx
-
-If the service is running you will see something similar to:
-
-.. code-block:: none
-
-   sudo service scylla-jmx status
-   ● scylla-jmx.service - ScyllaDB JMX
-      Loaded: loaded (/lib/systemd/system/scylla-jmx.service; disabled; vendor preset: enabled)
-      Active: active (running) since Wed 2018-07-18 20:59:08 UTC; 3s ago
-    Main PID: 256050 (scylla-jmx)
-       Tasks: 27
-      Memory: 119.5M
-         CPU: 1.959s
-      CGroup: /system.slice/scylla-jmx.service
-              └─256050 /usr/lib/scylla/jmx/symlinks/scylla-jmx -Xmx384m -XX:+UseSerialGC -Dcom.sun.management.jmxremote.auth
-
-If it isn't, you will see an error similar to:
-
-.. code-block:: none
-
-   sudo systemctl status scylla-jmx
-   ● scylla-jmx.service - ScyllaDB JMX
-     Loaded: loaded (/usr/lib/systemd/system/scylla-jmx.service; disabled; vendor preset: disabled)
-     Active: failed (Result: exit-code) since Thu 2018-05-10 10:34:15 EDT; 3min 47s ago
-     Process: 1417 ExecStart=/usr/lib/scylla/jmx/scylla-jmx $SCYLLA_JMX_PORT $SCYLLA_API_PORT $SCYLLA_API_ADDR $SCYLLA_JMX_ADDR
-     $SCYLLA_JMX_FILE $SCYLLA_JMX_LOCAL $SCYLLA_JMX_REMOTE $SCYLLA_JMX_DEBUG (code=exited, status=127)
-     Main PID: 1417 (code=exited, status=127)
-
-or 
-
-.. code-block:: none
-
-   sudo service scylla-jmx status
-   ● scylla-jmx.service
-     Loaded: not-found (Reason: No such file or directory)
-     Active: failed (Result: exit-code) since Wed 2018-07-18 20:38:58 UTC; 12min ago
-     Main PID: 141256 (code=exited, status=143)
-
-You will need to restart the service or change the RAM allocation as per the Solution_ below. 
-
-Solution
-^^^^^^^^
-
-There are two ways to fix this problem, one is faster but may not permanently fix the issue and the other solution is more robust. 
-
-**The immediate solution**
-
-.. code-block:: none
-
-   service scylla-jmx restart 
-
-.. note:: This is not a permanent fix as the problem might manifest again at a later time.
-
-**The more robust solution**
-
-1. Take the size of your node’s RAM, calculate 7% of that size, increase it by another 40%, and use this new size as your RAM requirement. 
-
-   For example: on a GCP n1-highmem-8 instance (52GB RAM)
-
-   * 7% would be ~3.6GB. 
-   * Increasing it by ~40% means you need to increase your RAM ~5GB.
-2. Open one of the following files (as per your OS platform):
-
-   * Ubuntu: ``/etc/default/scylla-server``. 
-   * Red Hat/ CentOS: ``/etc/sysconfig/scylla-server`` 
-3. In the file you are editing, add to the ``SCYLLA_ARGS`` statement ``--reserve-memory 5G`` (the amount you calculated above). Save and exit.
-4. Restart ScyllaDB server 
-
-.. code-block:: none
-
-   sudo systemctl restart scylla-server
-
-
-.. note:: If the initial calculation and reserve memory is not enough and problem persists and/or reappears, repeat the procedure from step 2 and increase the RAM in 1GB increments.
-
--- a/docs/troubleshooting/report-scylla-problem.rst
+++ b/docs/troubleshooting/report-scylla-problem.rst
@@ -279,17 +279,12 @@ Once you have collected and compressed your reports, send them to ScyllaDB for a
   curl -X PUT https://upload.scylladb.com/$report_uuid/yourfile -T yourfile


-For example with the health check report and node health check report:
-
-
-.. code-block:: shell
-
-   curl -X PUT https://upload.scylladb.com/$report_uuid/output_files.tgz -T output_files.tgz
+For example with the Scylla Doctor's vitals:

  
 .. code-block:: shell
 
-   curl -X PUT https://upload.scylladb.com/$report_uuid/192.0.2.0-health-check-report.txt -T 192.0.2.0-health-check-report.txt
+   curl -X PUT https://upload.scylladb.com/$report_uuid/my_cluster_123_vitals.tgz -T my_cluster_123_vitals.tgz


 The **UUID** you generated replaces the variable ``$report_uuid`` at runtime. ``yourfile`` is any file you need to send to ScyllaDB support.
--- a/docs/upgrade/upgrade-opensource/upgrade-guide-from-6.1-to-6.2/metric-update-6.1-to-6.2.rst
+++ b/docs/upgrade/upgrade-opensource/upgrade-guide-from-6.1-to-6.2/metric-update-6.1-to-6.2.rst
@@ -21,8 +21,8 @@ The following metrics are new in ScyllaDB |NEW_VERSION|:

   * - Metric
     - Description
-   * -
-     - 
+   * - scylla_alternator_batch_item_count 
+     - The total number of items processed across all batches

  

--- a/docs/upgrade/upgrade-to-enterprise/index.rst
+++ b/docs/upgrade/upgrade-to-enterprise/index.rst
@@ -6,22 +6,14 @@ Upgrade from ScyllaDB Open Source to ScyllaDB Enterprise
   :titlesonly:
   :hidden:

+   ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2 <upgrade-guide-from-6.0-to-2024.2/index>
   ScyllaDB 5.4 to ScyllaDB Enterprise 2024.1 <upgrade-guide-from-5.4-to-2024.1/index>
   ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1 <upgrade-guide-from-5.2-to-2023.1/index>
  

-.. raw:: html
-
-
-   <div class="panel callout radius animated">
-            <div class="row">
-              <div class="medium-3 columns">
-                <h5 id="getting-started">Upgrade to ScyllaDB Enterprise</h5>
-              </div>
-              <div class="medium-9 columns">
-
 Procedures for upgrading from ScyllaDB Open Source to ScyllaDB Enterprise:

+* :doc:`ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2 </upgrade/upgrade-to-enterprise/upgrade-guide-from-6.0-to-2024.2/index>`
 * :doc:`ScyllaDB 5.4 to ScyllaDB Enterprise 2024.1 </upgrade/upgrade-to-enterprise/upgrade-guide-from-5.4-to-2024.1/index>`
 * :doc:`ScyllaDB 5.2 to ScyllaDB Enterprise 2023.1 </upgrade/upgrade-to-enterprise/upgrade-guide-from-5.2-to-2023.1/index>`

--- a/Show More
+++ b/Show More