dist: systemd: use default KillMode

before this change, we explicitly set the KillMode of the scylla-service service unit to "process". according to https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html,

> If set to process, only the main process itself is killed (not recommended!).

and the document suggests using "control-group" over "process". scylla server is not a multi-process server, it is a multi-threaded server, so switching to the recommended "control-group" should not make any difference.

however, we've been seeing "defunct" scylla processes after stopping the scylla service using systemd, so we are wondering if we should try changing the `KillMode` to "control-group", which is the default value of this setting.

in this change, we just drop the setting, so that systemd stops the service by stopping all processes in the control group of this unit.

Fixes scylladb/scylladb#21507
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21508
(cherry picked from commit 961a53f716)
Closes scylladb/scylladb#23177
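As an illustration of the default this commit relies on: systemd.kill(5) documents "control-group" as the value used when KillMode= is absent, so dropping the line should be equivalent to writing KillMode=control-group. The sketch below is a minimal check under stated assumptions. The unit text and the helper name `effective_kill_mode` are hypothetical and not taken from the ScyllaDB tree, and the parser only handles simple unit files (no drop-ins, no line continuations).

```python
import configparser

# Default documented in systemd.kill(5) when KillMode= is not set.
DEFAULT_KILL_MODE = "control-group"


def effective_kill_mode(unit_text: str) -> str:
    """Return the KillMode a unit would use, falling back to systemd's default.

    Hypothetical helper for illustration only.
    """
    # strict=False tolerates repeated keys; interpolation=None leaves
    # systemd specifiers such as %i untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd keys are case-sensitive; keep them as written
    parser.read_string(unit_text)
    return parser.get("Service", "KillMode", fallback=DEFAULT_KILL_MODE)


# Illustrative unit contents, not the real scylla unit file.
before = "[Service]\nExecStart=/usr/bin/scylla\nKillMode=process\n"
after = "[Service]\nExecStart=/usr/bin/scylla\n"

print(effective_kill_mode(before))  # process
print(effective_kill_mode(after))   # control-group
```

On a live host, `systemctl show -p KillMode <unit>` reports the value systemd actually applies, which is the more reliable check.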
Update seastar submodule
2025-04-04 17:56:15 +03:00 · 2025-02-11 10:27:00 +02:00 · 2025-02-10 11:56:06 +02:00 · 2025-02-06 13:30:18 +02:00 · 2025-02-04 16:32:05 +02:00 · 2025-02-03 19:22:01 +01:00
311 changed files with 7236 additions and 2376 deletions
--- a/.github/scripts/auto-backport.py
+++ b/.github/scripts/auto-backport.py
@@ -0,0 +1,186 @@
+#!/usr/bin/env python3
+
+import argparse
+import os
+import re
+import sys
+import tempfile
+import logging
+
+from github import Github, GithubException
+from git import Repo, GitCommandError
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+try:
+    github_token = os.environ["GITHUB_TOKEN"]
+except KeyError:
+    print("Please set the 'GITHUB_TOKEN' environment variable")
+    sys.exit(1)
+
+
+def is_pull_request():
+    return '--pull-request' in sys.argv[1:]
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--repo', type=str, required=True, help='Github repository name')
+    parser.add_argument('--base-branch', type=str, default='refs/heads/master', help='Base branch')
+    parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')
+    parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')
+    parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')
+    return parser.parse_args()
+
+
+def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):
+    pr_body = f'{pr.body}\n\n'
+    for commit in commits:
+        pr_body += f'- (cherry picked from commit {commit})\n\n'
+    pr_body += f'Parent PR: #{pr.number}'
+    try:
+        backport_pr = repo.create_pull(
+            title=backport_pr_title,
+            body=pr_body,
+            head=f'scylladbbot:{new_branch_name}',
+            base=base_branch_name,
+            draft=is_draft
+        )
+        logging.info(f"Pull request created: {backport_pr.html_url}")
+        backport_pr.add_to_assignees(pr.user)
+        if is_draft:
+            backport_pr.add_to_labels("conflicts")
+            pr_comment = f"@{pr.user.login} - This PR was marked as draft because it has conflicts\n"
+            pr_comment += "Please resolve them and mark this PR as ready for review"
+            backport_pr.create_issue_comment(pr_comment)
+        logging.info(f"Assigned PR to original author: {pr.user.login}")
+        return backport_pr
+    except GithubException as e:
+        if 'A pull request already exists' in str(e):
+            logging.warning(f'A pull request already exists for {pr.user}:{new_branch_name}')
+        else:
+            logging.error(f'Failed to create PR: {e}')
+
+
+def get_pr_commits(repo, pr, stable_branch, start_commit=None):
+    commits = []
+    if pr.merged:
+        merge_commit = repo.get_commit(pr.merge_commit_sha)
+        if len(merge_commit.parents) > 1:  # Check if this merge commit includes multiple commits
+            commits.append(pr.merge_commit_sha)
+        else:
+            if start_commit:
+                promoted_commits = repo.compare(start_commit, stable_branch).commits
+            else:
+                promoted_commits = repo.get_commits(sha=stable_branch)
+            for commit in pr.get_commits():
+                for promoted_commit in promoted_commits:
+                    commit_title = commit.commit.message.splitlines()[0]
+                    # In Scylla-pkg and scylla-dtest, for example,
+                    # we don't create a merge commit for a PR with multiple commits,
+                    # according to the GitHub API, the last commit will be the merge commit,
+                    # which is not what we need when backporting (we need all the commits).
+                    # So here, we are validating the correct SHA for each commit so we can cherry-pick
+                    if promoted_commit.commit.message.startswith(commit_title):
+                        commits.append(promoted_commit.sha)
+
+    elif pr.state == 'closed':
+        events = pr.get_issue_events()
+        for event in events:
+            if event.event == 'closed':
+                commits.append(event.commit_id)
+    return commits
+
+
+def create_pr_comment_and_remove_label(pr, comment_body):
+    labels = pr.get_labels()
+    pattern = re.compile(r"backport/\d+\.\d+$")
+    for label in labels:
+        if pattern.match(label.name):
+            print(f"Removing label: {label.name}")
+            comment_body += f'- {label.name}\n'
+            pr.remove_from_labels(label)
+    pr.create_issue_comment(comment_body)
+
+
+def backport(repo, pr, version, commits, backport_base_branch):
+    new_branch_name = f'backport/{pr.number}/to-{version}'
+    backport_pr_title = f'[Backport {version}] {pr.title}'
+    repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'
+    fork_repo = f'https://scylladbbot:{github_token}@github.com/scylladbbot/{repo.name}.git'
+    with (tempfile.TemporaryDirectory() as local_repo_path):
+        try:
+            repo_local = Repo.clone_from(repo_url, local_repo_path, branch=backport_base_branch)
+            repo_local.git.checkout(b=new_branch_name)
+            is_draft = False
+            for commit in commits:
+                try:
+                    repo_local.git.cherry_pick(commit, '-m1', '-x')
+                except GitCommandError as e:
+                    logging.warning(f'Cherry-pick conflict on commit {commit}: {e}')
+                    is_draft = True
+                    repo_local.git.add(A=True)
+                    repo_local.git.cherry_pick('--continue')
+            if not repo.private and not repo.has_in_collaborators(pr.user.login):
+                repo.add_to_collaborators(pr.user.login, permission="push")
+                comment = f':warning:  @{pr.user.login} you have been added as collaborator to scylladbbot fork '
+                comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again'
+                create_pr_comment_and_remove_label(pr, comment)
+                return
+            repo_local.git.push(fork_repo, new_branch_name, force=True)
+            create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
+                                is_draft=is_draft)
+
+        except GitCommandError as e:
+            logging.warning(f"GitCommandError: {e}")
+
+
+def main():
+    args = parse_args()
+    base_branch = args.base_branch.split('/')[2]
+    promoted_label = 'promoted-to-master'
+    repo_name = args.repo
+    if 'scylla-enterprise' in args.repo:
+        promoted_label = 'promoted-to-enterprise'
+    stable_branch = base_branch
+    backport_branch = 'branch-'
+
+    backport_label_pattern = re.compile(r'backport/\d+\.\d+$')
+
+    g = Github(github_token)
+    repo = g.get_repo(repo_name)
+    closed_prs = []
+    start_commit = None
+
+    if args.commits:
+        start_commit, end_commit = args.commits.split('..')
+        commits = repo.compare(start_commit, end_commit).commits
+        for commit in commits:
+            match = re.search(rf"Closes .*#([0-9]+)", commit.commit.message, re.IGNORECASE)
+            if match:
+                pr_number = int(match.group(1))
+                pr = repo.get_pull(pr_number)
+                closed_prs.append(pr)
+    if args.pull_request:
+        start_commit = args.head_commit
+        pr = repo.get_pull(args.pull_request)
+        closed_prs = [pr]
+
+    for pr in closed_prs:
+        labels = [label.name for label in pr.labels]
+        backport_labels = [label for label in labels if backport_label_pattern.match(label)]
+        if promoted_label not in labels:
+            print(f'no {promoted_label} label: {pr.number}')
+            continue
+        if not backport_labels:
+            print(f'no backport label: {pr.number}')
+            continue
+        commits = get_pr_commits(repo, pr, stable_branch, start_commit)
+        logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")
+        for backport_label in backport_labels:
+            version = backport_label.replace('backport/', '')
+            backport_base_branch = backport_label.replace('backport/', backport_branch)
+            backport(repo, pr, version, commits, backport_base_branch)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/scripts/label_promoted_commits.py
+++ b/.github/scripts/label_promoted_commits.py
@@ -16,13 +16,8 @@ def parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--repository', type=str, required=True,
                        help='Github repository name (e.g., scylladb/scylladb)')
-    parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('
-                                                                               'newest commit).')
-    parser.add_argument('--commit_after_merge', type=str, required=True,
-                        help='Git commit ID to end labeling at (oldest '
-                             'commit, exclusive).')
-    parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '
-                                                                         'done')
+    parser.add_argument('--commits', type=str, required=True, help='Range of promoted commits.')
+    parser.add_argument('--label', type=str, default='promoted-to-master', help='Label to use')
    parser.add_argument('--ref', type=str, required=True, help='PR target branch')
    return parser.parse_args()

@@ -53,10 +48,11 @@ def main():
    target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)
    g = Github(github_token)
    repo = g.get_repo(args.repository, lazy=False)
-    commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)
+    start_commit, end_commit = args.commits.split('..')
+    commits = repo.compare(start_commit, end_commit).commits
    processed_prs = set()
    # Print commit information
-    for commit in commits.commits:
+    for commit in commits:
        print(f'Commit sha is: {commit.sha}')
        match = pr_pattern.search(commit.commit.message)
        if match:
@@ -66,13 +62,13 @@ def main():
            if target_branch:
                pr = repo.get_pull(pr_number)
                branch_name = target_branch[1]
-                refs_pr = re.findall(r'Refs (?:#|https.*?)(\d+)', pr.body)
+                refs_pr = re.findall(r'Parent PR: (?:#|https.*?)(\d+)', pr.body)
                if refs_pr:
                    print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')
                    # 1. change the backport label of the parent PR to note that
-                    #    we've merge the corresponding backport PR
+                    #    we've merged the corresponding backport PR
                    # 2. close the backport PR and leave a comment on it to note
-                    #    that it has been merged with a certain git commit,
+                    #    that it has been merged with a certain git commit.
                    ref_pr_number = refs_pr[0]
                    mark_backport_done(repo, ref_pr_number, branch_name)
                    comment = f'Closed via {commit.sha}'
--- a/.github/workflows/add-label-when-promoted.yaml
+++ b/.github/workflows/add-label-when-promoted.yaml
@@ -5,9 +5,10 @@ on:
    branches:
      - master
      - branch-*.*
-
-env:
-  DEFAULT_BRANCH: 'master'
+      - enterprise
+    pull_request_target:
+      types: [labeled]
+      branches: [master, next, enterprise]

 jobs:
  check-commit:
@@ -20,17 +21,51 @@ jobs:
        env:
          GITHUB_CONTEXT: ${{ toJson(github) }}
        run: echo "$GITHUB_CONTEXT"
+      - name: Set Default Branch
+        id: set_branch
+        run: |
+          if [[ "${{ github.repository }}" == *enterprise* ]]; then
+            echo "DEFAULT_BRANCH=enterprise" >> $GITHUB_ENV
+          else
+            echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV
+          fi
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository }}
          ref: ${{ env.DEFAULT_BRANCH }}
+          token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
          fetch-depth: 0  # Fetch all history for all tags and branches
-
+      - name: Set up Git identity
+        run: |
+          git config --global user.name "GitHub Action"
+          git config --global user.email "action@github.com"
+          git config --global merge.conflictstyle diff3
      - name: Install dependencies
-        run: sudo apt-get install -y python3-github
-
+        run: sudo apt-get install -y python3-github python3-git
      - name: Run python script
+        if: github.event_name == 'push'
        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --ref ${{ github.ref }}
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+        run: python .github/scripts/label_promoted_commits.py  --commits ${{ github.event.before }}..${{ github.sha }} --repository ${{ github.repository }} --ref ${{ github.ref }}
+      - name: Run auto-backport.py when promotion completed
+        if: ${{ github.event_name == 'push' && github.ref == format('refs/heads/{0}', env.DEFAULT_BRANCH) }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}
+      - name: Check if label starts with 'backport/' and contains digits
+        id: check_label
+        run: |
+          label_name="${{ github.event.label.name }}"
+          if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then
+            echo "Label matches backport/X.X pattern."
+            echo "backport_label=true" >> $GITHUB_OUTPUT
+          else
+            echo "Label does not match the required pattern."
+            echo "backport_label=false" >> $GITHUB_OUTPUT
+          fi
+      - name: Run auto-backport.py when label was added
+        if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && github.event.pull_request.state == 'closed' }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
+        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}
--- a/.gitignore
+++ b/.gitignore
@@ -19,7 +19,7 @@ CMakeLists.txt.user
 *.egg-info
 __pycache__CMakeLists.txt.user
 .gdbinit
-resources
+/resources
 .pytest_cache
 /expressions.tokens
 tags
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,6 @@
 [submodule "seastar"]
 	path = seastar
-	url = ../seastar
+	url = ../scylla-seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui
--- a/4
+++ b/4
@@ -78,7 +78,7 @@ fi

 # Default scylla product/version tags
 PRODUCT=scylla
-VERSION=6.1.0-dev
+VERSION=6.1.6

 if test -f version
 then
@@ -104,7 +104,7 @@ else
 fi

 if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
-	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" |cut -d . -f 3)
+	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" | rev | cut -d . -f 1 | rev)
 	if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
 		exit 0
 	fi
--- a/alternator/auth.cc
+++ b/alternator/auth.cc
@@ -19,6 +19,7 @@
 #include "alternator/executor.hh"
 #include "cql3/selection/selection.hh"
 #include "cql3/result_set.hh"
+#include "types/types.hh"
 #include <seastar/core/coroutine.hh>

 namespace alternator {
@@ -31,11 +32,12 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv
    dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};
    std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};
    const column_definition* salted_hash_col = schema->get_column_definition(bytes("salted_hash"));
-    if (!salted_hash_col) {
+    const column_definition* can_login_col = schema->get_column_definition(bytes("can_login"));
+    if (!salted_hash_col || !can_login_col) {
        co_await coroutine::return_exception(api_error::unrecognized_client(format("Credentials cannot be fetched for: {}", username)));
    }
-    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col});
-    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id}, selection->get_query_options());
+    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col, can_login_col});
+    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id, can_login_col->id}, selection->get_query_options());
    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,
            proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
    auto cl = auth::password_authenticator::consistency_for_user(username);
@@ -51,7 +53,14 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv
    if (result_set->empty()) {
        co_await coroutine::return_exception(api_error::unrecognized_client(format("User not found: {}", username)));
    }
-    const managed_bytes_opt& salted_hash = result_set->rows().front().front(); // We only asked for 1 row and 1 column
+    const auto& result = result_set->rows().front();
+    bool can_login = result[1] && value_cast<bool>(boolean_type->deserialize(*result[1]));
+    if (!can_login) {
+        // This is a valid role name, but has "login=False" so should not be
+        // usable for authentication (see #19735).
+        co_await coroutine::return_exception(api_error::unrecognized_client(format("Role {} has login=false so cannot be used for login", username)));
+    }
+    const managed_bytes_opt& salted_hash = result.front();
    if (!salted_hash) {
        co_await coroutine::return_exception(api_error::unrecognized_client(format("No password found for user: {}", username)));
    }
--- a/alternator/executor.cc
+++ b/alternator/executor.cc
@@ -9,6 +9,7 @@
 #include <fmt/ranges.h>
 #include <seastar/core/sleep.hh>
 #include "alternator/executor.hh"
+#include "cdc/log.hh"
 #include "db/config.hh"
 #include "log.hh"
 #include "schema/schema_builder.hh"
@@ -4439,8 +4440,10 @@ future<executor::request_return_type> executor::list_tables(client_state& client

    auto tables = _proxy.data_dictionary().get_tables(); // hold on to temporary, table_names isn't a container, it's a view
    auto table_names = tables
-            | boost::adaptors::filtered([] (data_dictionary::table t) {
-                        return t.schema()->ks_name().find(KEYSPACE_NAME_PREFIX) == 0 && !t.schema()->is_view();
+            | boost::adaptors::filtered([this] (data_dictionary::table t) {
+                        return t.schema()->ks_name().find(KEYSPACE_NAME_PREFIX) == 0 &&
+                            !t.schema()->is_view() &&
+                            !cdc::is_log_for_some_table(_proxy.local_db(), t.schema()->ks_name(), t.schema()->cf_name());
                    })
            | boost::adaptors::transformed([] (data_dictionary::table t) {
                        return t.schema()->cf_name();
--- a/alternator/server.cc
+++ b/alternator/server.cc
@@ -211,7 +211,10 @@ protected:
        sstring local_dc = topology.get_datacenter();
        std::unordered_set<gms::inet_address> local_dc_nodes = topology.get_datacenter_endpoints().at(local_dc);
        for (auto& ip : local_dc_nodes) {
-            if (_gossiper.is_alive(ip)) {
+            // Note that it's not enough for the node to be is_alive() - a
+            // node joining the cluster is also "alive" but not responsive to
+            // requests. We need alive *and* normal. See #19694, #21538.
+            if (_gossiper.is_alive(ip) && _gossiper.is_normal(ip)) {
                // Use the gossiped broadcast_rpc_address if available instead
                // of the internal IP address "ip". See discussion in #18711.
                rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(ip)));
--- a/alternator/ttl.cc
+++ b/alternator/ttl.cc
@@ -26,6 +26,7 @@
 #include "log.hh"
 #include "gc_clock.hh"
 #include "replica/database.hh"
+#include "service/client_state.hh"
 #include "service_permit.hh"
 #include "timestamp.hh"
 #include "service/storage_proxy.hh"
@@ -312,7 +313,7 @@ static size_t random_offset(size_t min, size_t max) {
 // this range's primary node is down. For this we need to return not just
 // a list of this node's secondary ranges - but also the primary owner of
 // each of those ranges.
-static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary_ranges(
+static future<std::vector<std::pair<dht::token_range, gms::inet_address>>> get_secondary_ranges(
        const locator::effective_replication_map_ptr& erm,
        gms::inet_address ep) {
    const auto& tm = *erm->get_token_metadata_ptr();
@@ -323,6 +324,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
    }
    auto prev_tok = sorted_tokens.back();
    for (const auto& tok : sorted_tokens) {
+        co_await coroutine::maybe_yield();
        inet_address_vector_replica_set eps = erm->get_natural_endpoints(tok);
        if (eps.size() <= 1 || eps[1] != ep) {
            prev_tok = tok;
@@ -350,7 +352,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
        }
        prev_tok = tok;
    }
-    return ret;
+    co_return ret;
 }


@@ -386,63 +388,63 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
 //
 // FIXME: Check if this algorithm is safe with tablet migration.
 // https://github.com/scylladb/scylladb/issues/16567
-enum primary_or_secondary_t {primary, secondary};
-template<primary_or_secondary_t primary_or_secondary>
-class token_ranges_owned_by_this_shard {
-    // ranges_holder_primary holds just the primary ranges themselves
-    class ranges_holder_primary {
-        const dht::token_range_vector _token_ranges;
-     public:
-        ranges_holder_primary(const locator::vnode_effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
-            : _token_ranges(erm->get_primary_ranges(ep)) {}
-        std::size_t size() const { return _token_ranges.size(); }
-        const dht::token_range& operator[](std::size_t i) const {
-            return _token_ranges[i];
-        }
-        bool should_skip(std::size_t i) const {
-            return false;
-        }
-    };
-    // ranges_holder<secondary> holds the secondary token ranges plus each
-    // range's primary owner, needed to implement should_skip().
-    class ranges_holder_secondary {
-        std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;
-        gms::gossiper& _gossiper;
-     public:
-        ranges_holder_secondary(const locator::effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
-            : _token_ranges(get_secondary_ranges(erm, ep))
-            , _gossiper(g) {}
-        std::size_t size() const { return _token_ranges.size(); }
-        const dht::token_range& operator[](std::size_t i) const {
-            return _token_ranges[i].first;
-        }
-        // range i should be skipped if its primary owner is alive.
-        bool should_skip(std::size_t i) const {
-            return _gossiper.is_alive(_token_ranges[i].second);
-        }
-    };

+// ranges_holder_primary holds just the primary ranges themselves
+class ranges_holder_primary {
+    dht::token_range_vector _token_ranges;
+public:
+    explicit ranges_holder_primary(dht::token_range_vector token_ranges) : _token_ranges(std::move(token_ranges)) {}
+    static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map_ptr& erm, gms::inet_address ep) {
+        co_return ranges_holder_primary(co_await erm->get_primary_ranges(ep));
+    }
+    std::size_t size() const { return _token_ranges.size(); }
+    const dht::token_range& operator[](std::size_t i) const {
+        return _token_ranges[i];
+    }
+    bool should_skip(std::size_t i) const {
+        return false;
+    }
+};
+// ranges_holder_secondary holds the secondary token ranges plus each
+// range's primary owner, needed to implement should_skip().
+class ranges_holder_secondary {
+    std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;
+    const gms::gossiper& _gossiper;
+public:
+    explicit ranges_holder_secondary(std::vector<std::pair<dht::token_range, gms::inet_address>> token_ranges, const gms::gossiper& g)
+        : _token_ranges(std::move(token_ranges))
+        , _gossiper(g) {}
+    static future<ranges_holder_secondary> make(const locator::effective_replication_map_ptr& erm, gms::inet_address ep, const gms::gossiper& g) {
+        co_return ranges_holder_secondary(co_await get_secondary_ranges(erm, ep), g);
+    }
+    std::size_t size() const { return _token_ranges.size(); }
+    const dht::token_range& operator[](std::size_t i) const {
+        return _token_ranges[i].first;
+    }
+    // range i should be skipped if its primary owner is alive.
+    bool should_skip(std::size_t i) const {
+        return _gossiper.is_alive(_token_ranges[i].second);
+    }
+};
+
+template<class primary_or_secondary_t>
+class token_ranges_owned_by_this_shard {
    schema_ptr _s;
    locator::effective_replication_map_ptr _erm;
    // _token_ranges will contain a list of token ranges owned by this node.
    // We'll further need to split each such range to the pieces owned by
    // the current shard, using _intersecter.
-    using ranges_holder = std::conditional_t<
-            primary_or_secondary == primary_or_secondary_t::primary,
-            ranges_holder_primary,
-            ranges_holder_secondary>;
-    const ranges_holder _token_ranges;
+    const primary_or_secondary_t _token_ranges;
    // NOTICE: _range_idx is used modulo _token_ranges size when accessing
    // the data to ensure that it doesn't go out of bounds
    size_t _range_idx;
    size_t _end_idx;
    std::optional<dht::selective_token_range_sharder> _intersecter;
 public:
-    token_ranges_owned_by_this_shard(replica::database& db, gms::gossiper& g, schema_ptr s)
+    token_ranges_owned_by_this_shard(schema_ptr s, primary_or_secondary_t token_ranges)
        :  _s(s)
        , _erm(s->table().get_effective_replication_map())
-        , _token_ranges(db.find_keyspace(s->ks_name()).get_vnode_effective_replication_map(),
-                g, _erm->get_topology().my_address())
+        , _token_ranges(std::move(token_ranges))
        , _range_idx(random_offset(0, _token_ranges.size() - 1))
        , _end_idx(_range_idx + _token_ranges.size())
    {
@@ -498,6 +500,7 @@ struct scan_ranges_context {
    bytes column_name;
    std::optional<std::string> member;

+    service::client_state internal_client_state;
    ::shared_ptr<cql3::selection::selection> selection;
    std::unique_ptr<service::query_state> query_state_ptr;
    std::unique_ptr<cql3::query_options> query_options;
@@ -507,6 +510,7 @@ struct scan_ranges_context {
        : s(s)
        , column_name(column_name)
        , member(member)
+        , internal_client_state(service::client_state::internal_tag())
    {
        // FIXME: don't read the entire items - read only parts of it.
        // We must read the key columns (to be able to delete) and also
@@ -525,10 +529,9 @@ struct scan_ranges_context {
        std::vector<query::clustering_range> ck_bounds{query::clustering_range::make_open_ended_both_sides()};
        auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), opts);
        command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
-        executor::client_state client_state{executor::client_state::internal_tag()};
        tracing::trace_state_ptr trace_state;
        // NOTICE: empty_service_permit is used because the TTL service has fixed parallelism
-        query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, empty_service_permit());
+        query_state_ptr = std::make_unique<service::query_state>(internal_client_state, trace_state, empty_service_permit());
        // FIXME: What should we do on multi-DC? Will we run the expiration on the same ranges on all
        // DCs or only once for each range? If the latter, we need to change the CLs in the
        // scanner and deleter.
@@ -724,7 +727,9 @@ static future<bool> scan_table(
    expiration_stats.scan_table++;
    // FIXME: need to pace the scan, not do it all at once.
    scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};
-    token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), gossiper, s);
+    auto erm = db.real_database().find_keyspace(s->ks_name()).get_vnode_effective_replication_map();
+    auto my_address = erm->get_topology().my_address();
+    token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_address));
    while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {
        // Note that because of issue #9167 we need to run a separate
        // query on each partition range, and can't pass several of
@@ -744,7 +749,7 @@ static future<bool> scan_table(
    // by tasking another node to take over scanning of the dead node's primary
    // ranges. What we do here is that this node will also check expiration
    // on its *secondary* ranges - but only those whose primary owner is down.
-    token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), gossiper, s);
+    token_ranges_owned_by_this_shard my_secondary_ranges(s, co_await ranges_holder_secondary::make(erm, my_address, gossiper));
    while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {
        expiration_stats.secondary_ranges_scanned++;
        dht::partition_range_vector partition_ranges;
--- a/api/api-doc/storage_service.json
+++ b/api/api-doc/storage_service.json
@@ -1891,6 +1891,14 @@
                     "allowMultiple":false,
                     "type":"string",
                     "paramType":"query"
+                  },
+                  {
+                     "name":"force",
+                     "description":"Enforce the source_dc option, even if it is unsafe to use for rebuild",
+                     "required":false,
+                     "allowMultiple":false,
+                     "type":"boolean",
+                     "paramType":"query"
                  }
               ]
            }
--- a/api/api-doc/system.json
+++ b/api/api-doc/system.json
@@ -194,6 +194,21 @@
               "parameters":[]
            }
         ]
+      },
+      {
+         "path":"/system/highest_supported_sstable_version",
+         "operations":[
+            {
+               "method":"GET",
+               "summary":"Get highest supported sstable version",
+               "type":"string",
+               "nickname":"get_highest_supported_sstable_version",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[]
+            }
+         ]
      }
   ]
 }
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -54,6 +54,7 @@
 #include "locator/abstract_replication_strategy.hh"
 #include "sstables_loader.hh"
 #include "db/view/view_builder.hh"
+#include "utils/user_provided_param.hh"

 using namespace seastar::httpd;
 using namespace std::chrono_literals;
@@ -1096,7 +1097,16 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
    });

    ss::rebuild.set(r, [&ss](std::unique_ptr<http::request> req) {
-        auto source_dc = req->get_query_param("source_dc");
+        utils::optional_param source_dc;
+        if (auto source_dc_str = req->get_query_param("source_dc"); !source_dc_str.empty()) {
+            source_dc.emplace(std::move(source_dc_str)).set_user_provided();
+        }
+        if (auto force_str = req->get_query_param("force"); !force_str.empty() && service::loosen_constraints(validate_bool(force_str))) {
+            if (!source_dc) {
+                throw bad_param_exception("The `source_dc` option must be provided for using the `force` option");
+            }
+            source_dc.set_force();
+        }
        apilog.info("rebuild: source_dc={}", source_dc);
        return ss.local().rebuild(std::move(source_dc)).then([] {
            return make_ready_future<json::json_return_type>(json_void());
--- a/api/system.cc
+++ b/api/system.cc
@@ -10,6 +10,7 @@
 #include "api/api-doc/system.json.hh"
 #include "api/api-doc/metrics.json.hh"
 #include "replica/database.hh"
+#include "sstables/sstables_manager.hh"

 #include <rapidjson/document.h>
 #include <seastar/core/reactor.hh>
@@ -182,6 +183,11 @@ void set_system(http_context& ctx, routes& r) {
        apilog.info("Profile dumped to {}", profile_dest);
        return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));
    }) ;
+
+    hs::get_highest_supported_sstable_version.set(r, [&ctx] (const_req req) {
+        auto& table = ctx.db.local().find_column_family("system", "local");
+        return seastar::to_sstring(table.get_sstables_manager().get_highest_supported_format());
+    });
 }

 }
--- a/auth/certificate_authenticator.cc
+++ b/auth/certificate_authenticator.cc
@@ -76,7 +76,7 @@ auth::certificate_authenticator::certificate_authenticator(cql3::query_processor
                    continue;
                } catch (std::out_of_range&) {
                    // just fallthrough
-                } catch (std::regex_error&) {
+                } catch (boost::regex_error&) {
                    std::throw_with_nested(std::invalid_argument(fmt::format("Invalid query expression: {}", map.at(cfg_query_attr))));
                }
            }
--- a/auth/common.cc
+++ b/auth/common.cc
@@ -71,7 +71,7 @@ static future<> create_legacy_metadata_table_if_missing_impl(
    assert(this_shard_id() == 0); // once_among_shards makes sure a function is executed on shard 0 only

    auto db = qp.db();
-    auto parsed_statement = cql3::query_processor::parse_statement(cql);
+    auto parsed_statement = cql3::query_processor::parse_statement(cql, cql3::dialect{});
    auto& parsed_cf_statement = static_cast<cql3::statements::raw::cf_statement&>(*parsed_statement);

    parsed_cf_statement.prepare_keyspace(meta::legacy::AUTH_KS);
@@ -121,7 +121,7 @@ static future<> announce_mutations_with_guard(
        ::service::raft_group0_client& group0_client,
        std::vector<canonical_mutation> muts,
        ::service::group0_guard group0_guard,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout) {
    auto group0_cmd = group0_client.prepare_command(
        ::service::write_mutations{
@@ -137,7 +137,7 @@ future<> announce_mutations_with_batching(
        ::service::raft_group0_client& group0_client,
        start_operation_func_t start_operation_func,
        std::function<::service::mutations_generator(api::timestamp_type t)> gen,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout) {
    // account for command's overhead, it's better to use smaller threshold than constantly bounce off the limit
    size_t memory_threshold = group0_client.max_command_size() * 0.75;
@@ -188,7 +188,7 @@ future<> announce_mutations(
        ::service::raft_group0_client& group0_client,
        const sstring query_string,
        std::vector<data_value_or_unset> values,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout) {
    auto group0_guard = co_await group0_client.start_operation(as, timeout);
    auto timestamp = group0_guard.write_timestamp();
--- a/auth/common.hh
+++ b/auth/common.hh
@@ -80,7 +80,7 @@ future<> create_legacy_metadata_table_if_missing(
 // Execute update query via group0 mechanism, mutations will be applied on all nodes.
 // Use this function when need to perform read before write on a single guard or if
 // you have more than one mutation and potentially exceed single command size limit.
-using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source*)>;
+using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source&)>;
 future<> announce_mutations_with_batching(
        ::service::raft_group0_client& group0_client,
        // since we can operate also in topology coordinator context where we need stronger
@@ -88,7 +88,7 @@ future<> announce_mutations_with_batching(
        // function here
        start_operation_func_t start_operation_func,
        std::function<::service::mutations_generator(api::timestamp_type t)> gen,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout);

 // Execute update query via group0 mechanism, mutations will be applied on all nodes.
@@ -97,7 +97,7 @@ future<> announce_mutations(
        ::service::raft_group0_client& group0_client,
        const sstring query_string,
        std::vector<data_value_or_unset> values,
-        seastar::abort_source* as,
+        seastar::abort_source& as,
        std::optional<::service::raft_timeout> timeout);

 // Appends mutations to a collector, they will be applied later on all nodes via group0 mechanism.
--- a/auth/password_authenticator.cc
+++ b/auth/password_authenticator.cc
@@ -136,7 +136,7 @@ future<> password_authenticator::create_default_if_missing() {
        plogger.info("Created default superuser authentication record.");
    } else {
        co_await announce_mutations(_qp, _group0_client, query,
-            {salted_pwd, _superuser}, &_as, ::service::raft_timeout{});
+            {salted_pwd, _superuser}, _as, ::service::raft_timeout{});
        plogger.info("Created default superuser authentication record.");
    }
 }
--- a/auth/service.cc
+++ b/auth/service.cc
@@ -681,7 +681,7 @@ future<> migrate_to_auth_v2(db::system_keyspace& sys_ks, ::service::raft_group0_
    co_await announce_mutations_with_batching(g0,
            start_operation_func,
            std::move(gen),
-            &as,
+            as,
            std::nullopt);
 }

--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -192,7 +192,7 @@ future<> standard_role_manager::create_default_role_if_missing() {
                    {_superuser},
                    cql3::query_processor::cache_internal::no).discard_result();
        } else {
-            co_await announce_mutations(_qp, _group0_client, query, {_superuser}, &_as, ::service::raft_timeout{});
+            co_await announce_mutations(_qp, _group0_client, query, {_superuser}, _as, ::service::raft_timeout{});
        }
        log.info("Created default superuser role '{}'.", _superuser);
    } catch(const exceptions::unavailable_exception& e) {
--- a/cdc/generation.cc
+++ b/cdc/generation.cc
@@ -1110,7 +1110,9 @@ future<bool> generation_service::legacy_do_handle_cdc_generation(cdc::generation
    auto sys_dist_ks = get_sys_dist_ks();
    auto gen = co_await retrieve_generation_data(gen_id, _sys_ks.local(), *sys_dist_ks, { _token_metadata.get()->count_normal_token_owners() });
    if (!gen) {
-        throw std::runtime_error(format(
+        // This may happen during raft upgrade when a node gossips about a generation that
+        // was propagated through raft and we didn't apply it yet.
+        throw generation_handling_nonfatal_exception(format(
            "Could not find CDC generation {} in distributed system tables (current time: {}),"
            " even though some node gossiped about it.",
            gen_id, db_clock::now()));
--- a/cdc/metadata.cc
+++ b/cdc/metadata.cc
@@ -186,7 +186,7 @@ bool cdc::metadata::prepare(db_clock::time_point tp) {
    }

    auto ts = to_ts(tp);
-    auto emplaced = _gens.emplace(to_ts(tp), std::nullopt).second;
+    auto [it, emplaced] = _gens.emplace(to_ts(tp), std::nullopt);

    if (_last_stream_timestamp != api::missing_timestamp) {
        auto last_correct_gen = gen_used_at(_last_stream_timestamp);
@@ -201,5 +201,5 @@ bool cdc::metadata::prepare(db_clock::time_point tp) {
        }
    }

-    return emplaced;
+    return !it->second;
 }
--- a/compaction/compaction.cc
+++ b/compaction/compaction.cc
@@ -172,7 +172,8 @@ static api::timestamp_type get_max_purgeable_timestamp(const table_state& table_
 }

 static std::vector<shared_sstable> get_uncompacting_sstables(const table_state& table_s, std::vector<shared_sstable> sstables) {
-    auto all_sstables = boost::copy_range<std::vector<shared_sstable>>(*table_s.main_sstable_set().all());
+    auto sstable_set = table_s.sstable_set_for_tombstone_gc();
+    auto all_sstables = boost::copy_range<std::vector<shared_sstable>>(*sstable_set->all());
    auto& compacted_undeleted = table_s.compacted_undeleted_sstables();
    all_sstables.insert(all_sstables.end(), compacted_undeleted.begin(), compacted_undeleted.end());
    boost::sort(all_sstables, [] (const shared_sstable& x, const shared_sstable& y) {
--- a/compaction/compaction_manager.cc
+++ b/compaction/compaction_manager.cc
@@ -187,7 +187,7 @@ unsigned compaction_manager::current_compaction_fan_in_threshold() const {
        return 0;
    }
    auto largest_fan_in = std::ranges::max(_tasks | boost::adaptors::transformed([] (auto& task) {
-        return task->compaction_running() ? task->compaction_data().compaction_fan_in : 0;
+        return task.compaction_running() ? task.compaction_data().compaction_fan_in : 0;
    }));
    // conservatively limit fan-in threshold to 32, such that tons of small sstables won't accumulate if
    // running major on a leveled table, which can even have more than one thousand files.
@@ -387,11 +387,26 @@ future<sstables::compaction_result> compaction_task_executor::compact_sstables_a

    co_return res;
 }
+
+future<sstables::sstable_set> compaction_task_executor::sstable_set_for_tombstone_gc(table_state& t) {
+    auto compound_set = t.sstable_set_for_tombstone_gc();
+    // Compound set will be linearized into a single set, since compaction might add or remove sstables
+    // to it for incremental compaction to work.
+    auto new_set = sstables::make_partitioned_sstable_set(t.schema(), false);
+    co_await compound_set->for_each_sstable_gently([&] (const sstables::shared_sstable& sst) {
+        auto inserted = new_set.insert(sst);
+        if (!inserted) {
+            on_internal_error(cmlog, format("Unable to insert SSTable {} into set used for tombstone GC", sst->get_filename()));
+        }
+    });
+    co_return std::move(new_set);
+}
+
 future<sstables::compaction_result> compaction_task_executor::compact_sstables(sstables::compaction_descriptor descriptor, sstables::compaction_data& cdata, on_replacement& on_replace, compaction_manager::can_purge_tombstones can_purge,
                                                                               sstables::offstrategy offstrategy) {
    table_state& t = *_compacting_table;
    if (can_purge) {
-        descriptor.enable_garbage_collection(t.main_sstable_set());
+        descriptor.enable_garbage_collection(co_await sstable_set_for_tombstone_gc(t));
    }
    descriptor.creator = [&t] (shard_id dummy) {
        auto sst = t.make_sstable();
@@ -577,9 +592,9 @@ requires (compaction_manager& cm, throw_if_stopping do_throw_if_stopping, Args&&
 }
 future<compaction_manager::compaction_stats_opt> compaction_manager::perform_compaction(throw_if_stopping do_throw_if_stopping, std::optional<tasks::task_info> parent_info, Args&&... args) {
    auto task_executor = seastar::make_shared<TaskExecutor>(*this, do_throw_if_stopping, std::forward<Args>(args)...);
-    _tasks.push_back(task_executor);
-    auto unregister_task = defer([this, task_executor] {
-        _tasks.remove(task_executor);
+    _tasks.push_back(*task_executor);
+    auto unregister_task = defer([task_executor] {
+        task_executor->unlink();
        task_executor->switch_state(compaction_task_executor::state::none);
    });

@@ -885,10 +900,10 @@ public:
    explicit strategy_control(compaction_manager& cm) noexcept : _cm(cm) {}

    bool has_ongoing_compaction(table_state& table_s) const noexcept override {
-        return std::any_of(_cm._tasks.begin(), _cm._tasks.end(), [&s = table_s.schema()] (const shared_ptr<compaction_task_executor>& task) {
-            return task->compaction_running()
-                && task->compacting_table()->schema()->ks_name() == s->ks_name()
-                && task->compacting_table()->schema()->cf_name() == s->cf_name();
+        return std::any_of(_cm._tasks.begin(), _cm._tasks.end(), [&s = table_s.schema()] (const compaction_task_executor& task) {
+            return task.compaction_running()
+                && task.compacting_table()->schema()->ks_name() == s->ks_name()
+                && task.compacting_table()->schema()->cf_name() == s->cf_name();
        });
    }

@@ -1052,7 +1067,7 @@ void compaction_manager::postpone_compaction_for_table(table_state* t) {
    _postponed.insert(t);
 }

-future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason) {
+future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason) noexcept {
    // To prevent compaction from being postponed while tasks are being stopped,
    // let's stop all tasks before the deferring point below.
    for (auto& t : tasks) {
@@ -1060,14 +1075,16 @@ future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_e
        t->stop_compaction(reason);
    }
    co_await coroutine::parallel_for_each(tasks, [] (auto& task) -> future<> {
+        auto unlink_task = deferred_action([task] { task->unlink(); });
        try {
            co_await task->compaction_done();
        } catch (sstables::compaction_stopped_exception&) {
            // swallow stop exception if a given procedure decides to propagate it to the caller,
            // as it happens with reshard and reshape.
        } catch (...) {
+            // just log any other errors as the callers have nothing to do with them.
            cmlog.debug("Stopping {}: task returned error: {}", *task, std::current_exception());
-            throw;
+            co_return;
        }
        cmlog.debug("Stopping {}: done", *task);
    });
@@ -1076,9 +1093,12 @@ future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_e
 future<> compaction_manager::stop_ongoing_compactions(sstring reason, table_state* t, std::optional<sstables::compaction_type> type_opt) noexcept {
    try {
        auto ongoing_compactions = get_compactions(t).size();
-        auto tasks = boost::copy_range<std::vector<shared_ptr<compaction_task_executor>>>(_tasks | boost::adaptors::filtered([t, type_opt] (auto& task) {
-            return (!t || task->compacting_table() == t) && (!type_opt || task->compaction_type() == *type_opt);
-        }));
+        auto tasks = _tasks
+                | std::views::filter([t, type_opt] (const auto& task) {
+                    return (!t || task.compacting_table() == t) && (!type_opt || task.compaction_type() == *type_opt);
+                })
+                | std::views::transform([] (auto& task) { return task.shared_from_this(); })
+                | std::ranges::to<std::vector<shared_ptr<compaction_task_executor>>>();
        logging::log_level level = tasks.empty() ? log_level::debug : log_level::info;
        if (cmlog.is_enabled(level)) {
            std::string scope = "";
@@ -1092,8 +1112,9 @@ future<> compaction_manager::stop_ongoing_compactions(sstring reason, table_stat
        }
        return stop_tasks(std::move(tasks), std::move(reason));
    } catch (...) {
-        return current_exception_as_future<>();
+        cmlog.error("Stopping ongoing compactions failed: {}.  Ignored", std::current_exception());
    }
+    return make_ready_future();
 }

 future<> compaction_manager::drain() {
@@ -1110,17 +1131,17 @@ future<> compaction_manager::stop() {
    if (auto cm = std::exchange(_task_manager_module, nullptr)) {
        co_await cm->stop();
    }
-    if (_state != state::none) {
-        co_return co_await std::move(*_stop_future);
+    if (_stop_future) {
+        co_await std::exchange(*_stop_future, make_ready_future());
    }
 }

-future<> compaction_manager::really_do_stop() {
+future<> compaction_manager::really_do_stop() noexcept {
    cmlog.info("Asked to stop");
    // Reset the metrics registry
    _metrics.clear();
    co_await stop_ongoing_compactions("shutdown");
-    co_await coroutine::parallel_for_each(_compaction_state | boost::adaptors::map_values, [] (compaction_state& cs) -> future<> {
+    co_await coroutine::parallel_for_each(_compaction_state | std::views::values, [] (compaction_state& cs) -> future<> {
        if (!cs.gate.is_closed()) {
            co_await cs.gate.close();
        }
@@ -1553,11 +1574,16 @@ protected:
        co_return stats;
    }

-    virtual sstables::compaction_descriptor make_descriptor(const sstables::shared_sstable& sst) const {
+    static sstables::compaction_descriptor
+    make_descriptor(const sstables::shared_sstable& sst, const sstables::compaction_type_options& opt, owned_ranges_ptr owned_ranges = {}) {
        auto sstable_level = sst->get_sstable_level();
        auto run_identifier = sst->run_identifier();
        return sstables::compaction_descriptor({ sst },
-            sstable_level, sstables::compaction_descriptor::default_max_sstable_bytes, run_identifier, _options, _owned_ranges_ptr);
+            sstable_level, sstables::compaction_descriptor::default_max_sstable_bytes, run_identifier, opt, owned_ranges);
+    }
+
+    virtual sstables::compaction_descriptor make_descriptor(const sstables::shared_sstable& sst) const {
+        return make_descriptor(sst, _options, _owned_ranges_ptr);
    }

    virtual future<sstables::compaction_result> rewrite_sstable(const sstables::shared_sstable sst) {
@@ -1610,19 +1636,30 @@ public:
                std::move(sstables), std::move(compacting), compaction_manager::can_purge_tombstones::yes)
            , _opt(options.as<sstables::compaction_type_options::split>())
    {
+        if (utils::get_local_injector().is_enabled("split_sstable_rewrite")) {
+            _do_throw_if_stopping = throw_if_stopping::yes;
+        }
+    }
+
+    static bool sstable_needs_split(const sstables::shared_sstable& sst, const sstables::compaction_type_options::split& opt) {
+        return opt.classifier(sst->get_first_decorated_key().token()) != opt.classifier(sst->get_last_decorated_key().token());
+    }
+
+    static sstables::compaction_descriptor
+    make_descriptor(const sstables::shared_sstable& sst, const sstables::compaction_type_options::split& split_opt) {
+        auto opt = sstables::compaction_type_options::make_split(split_opt.classifier);
+        return rewrite_sstables_compaction_task_executor::make_descriptor(sst, std::move(opt));
    }
 private:
    bool sstable_needs_split(const sstables::shared_sstable& sst) const {
-        return _opt.classifier(sst->get_first_decorated_key().token()) != _opt.classifier(sst->get_last_decorated_key().token());
+        return sstable_needs_split(sst, _opt);
    }
 protected:
    sstables::compaction_descriptor make_descriptor(const sstables::shared_sstable& sst) const override {
-        auto desc = rewrite_sstables_compaction_task_executor::make_descriptor(sst);
-        desc.options = sstables::compaction_type_options::make_split(_opt.classifier);
-        return desc;
+        return make_descriptor(sst, _opt);
    }

-    future<sstables::compaction_result> rewrite_sstable(const sstables::shared_sstable sst) override {
+    future<sstables::compaction_result> do_rewrite_sstable(const sstables::shared_sstable sst) {
        if (sstable_needs_split(sst)) {
            return rewrite_sstables_compaction_task_executor::rewrite_sstable(std::move(sst));
        }
@@ -1635,6 +1672,20 @@ protected:
            return sstables::compaction_result{};
        });
    }
+
+    future<sstables::compaction_result> rewrite_sstable(const sstables::shared_sstable sst) override {
+        co_await utils::get_local_injector().inject("split_sstable_rewrite", [this] (auto& handler) -> future<> {
+            cmlog.info("split_sstable_rewrite: waiting");
+            while (!handler.poll_for_message() && !_compaction_data.is_stop_requested()) {
+                co_await sleep(std::chrono::milliseconds(5));
+            }
+            cmlog.info("split_sstable_rewrite: released");
+            if (_compaction_data.is_stop_requested()) {
+                throw make_compaction_stopped_exception();
+            }
+        }, false);
+        co_return co_await do_rewrite_sstable(std::move(sst));
+    }
 };

 }
@@ -1965,7 +2016,7 @@ future<> compaction_manager::perform_cleanup(owned_ranges_ptr sorted_owned_range
 future<> compaction_manager::try_perform_cleanup(owned_ranges_ptr sorted_owned_ranges, table_state& t, std::optional<tasks::task_info> info) {
    auto check_for_cleanup = [this, &t] {
        return boost::algorithm::any_of(_tasks, [&t] (auto& task) {
-            return task->compacting_table() == &t && task->compaction_type() == sstables::compaction_type::Cleanup;
+            return task.compacting_table() == &t && task.compaction_type() == sstables::compaction_type::Cleanup;
        });
    };
    if (check_for_cleanup()) {
@@ -2056,6 +2107,31 @@ future<compaction_manager::compaction_stats_opt> compaction_manager::perform_spl
    return perform_task_on_all_files<split_compaction_task_executor>(info, t, std::move(options), std::move(owned_ranges_ptr), std::move(get_sstables));
 }

+future<std::vector<sstables::shared_sstable>>
+compaction_manager::maybe_split_sstable(sstables::shared_sstable sst, table_state& t, sstables::compaction_type_options::split opt) {
+    if (!split_compaction_task_executor::sstable_needs_split(sst, opt)) {
+        co_return std::vector<sstables::shared_sstable>{sst};
+    }
+    std::vector<sstables::shared_sstable> ret;
+
+        // FIXME: indentation.
+        auto gate = get_compaction_state(&t).gate.hold();
+        sstables::compaction_progress_monitor monitor;
+        sstables::compaction_data info = create_compaction_data();
+        sstables::compaction_descriptor desc = split_compaction_task_executor::make_descriptor(sst, opt);
+        desc.creator = [&t] (shard_id _) {
+            return t.make_sstable();
+        };
+        desc.replacer = [&] (sstables::compaction_completion_desc d) {
+            std::move(d.new_sstables.begin(), d.new_sstables.end(), std::back_inserter(ret));
+        };
+
+        co_await sstables::compact_sstables(std::move(desc), info, t, monitor);
+        co_await sst->unlink();
+
+    co_return ret;
+}
+
 // Submit a table to be scrubbed and wait for its termination.
 future<compaction_manager::compaction_stats_opt> compaction_manager::perform_sstable_scrub(table_state& t, sstables::compaction_type_options::scrub opts, std::optional<tasks::task_info> info) {
    auto scrub_mode = opts.operation_mode;
@@ -2121,11 +2197,11 @@ future<> compaction_manager::remove(table_state& t) noexcept {
    auto found = false;
    sstring msg;
    for (auto& task : _tasks) {
-        if (task->compacting_table() == &t) {
+        if (task.compacting_table() == &t) {
            if (!msg.empty()) {
                msg += "\n";
            }
-            msg += format("Found {} after remove", *task.get());
+            msg += format("Found {} after remove", task);
            found = true;
        }
    }
@@ -2136,30 +2212,38 @@ future<> compaction_manager::remove(table_state& t) noexcept {
 }

 const std::vector<sstables::compaction_info> compaction_manager::get_compactions(table_state* t) const {
-    auto to_info = [] (const shared_ptr<compaction_task_executor>& task) {
+    auto to_info = [] (const compaction_task_executor& task) {
        sstables::compaction_info ret;
-        ret.compaction_uuid = task->compaction_data().compaction_uuid;
-        ret.type = task->compaction_type();
-        ret.ks_name = task->compacting_table()->schema()->ks_name();
-        ret.cf_name = task->compacting_table()->schema()->cf_name();
-        ret.total_partitions = task->compaction_data().total_partitions;
-        ret.total_keys_written = task->compaction_data().total_keys_written;
+        ret.compaction_uuid = task.compaction_data().compaction_uuid;
+        ret.type = task.compaction_type();
+        ret.ks_name = task.compacting_table()->schema()->ks_name();
+        ret.cf_name = task.compacting_table()->schema()->cf_name();
+        ret.total_partitions = task.compaction_data().total_partitions;
+        ret.total_keys_written = task.compaction_data().total_keys_written;
        return ret;
    };
    using ret = std::vector<sstables::compaction_info>;
-    return boost::copy_range<ret>(_tasks | boost::adaptors::filtered([t] (const shared_ptr<compaction_task_executor>& task) {
-                return (!t || task->compacting_table() == t) && task->compaction_running();
+    return boost::copy_range<ret>(_tasks | boost::adaptors::filtered([t] (const compaction_task_executor& task) {
+                return (!t || task.compacting_table() == t) && task.compaction_running();
            }) | boost::adaptors::transformed(to_info));
 }

 bool compaction_manager::has_table_ongoing_compaction(const table_state& t) const {
-    return std::any_of(_tasks.begin(), _tasks.end(), [&t] (const shared_ptr<compaction_task_executor>& task) {
-        return task->compacting_table() == &t && task->compaction_running();
+    return std::any_of(_tasks.begin(), _tasks.end(), [&t] (const compaction_task_executor& task) {
+        return task.compacting_table() == &t && task.compaction_running();
    });
 };

 bool compaction_manager::compaction_disabled(table_state& t) const {
-    return _compaction_state.contains(&t) && _compaction_state.at(&t).compaction_disabled();
+    if (auto it = _compaction_state.find(&t); it != _compaction_state.end()) {
+        return it->second.compaction_disabled();
+    } else {
+        cmlog.debug("compaction_disabled: {}:{} not in compaction_state", t.schema()->id(), t.get_group_id());
+        // Compaction is not strictly disabled, but it is not enabled either.
+        // The callers actually care about if it's enabled or not, not about the actual state of
+        // compaction_state::compaction_disabled()
+        return true;
+    }
 }

 future<> compaction_manager::stop_compaction(sstring type, table_state* table) {
@@ -2184,8 +2268,8 @@ future<> compaction_manager::stop_compaction(sstring type, table_state* table) {
 void compaction_manager::propagate_replacement(table_state& t,
        const std::vector<sstables::shared_sstable>& removed, const std::vector<sstables::shared_sstable>& added) {
    for (auto& task : _tasks) {
-        if (task->compacting_table() == &t && task->compaction_running()) {
-            task->compaction_data().pending_replacements.push_back({ removed, added });
+        if (task.compacting_table() == &t && task.compaction_running()) {
+            task.compaction_data().pending_replacements.push_back({ removed, added });
        }
    }
 }
--- a/compaction/compaction_manager.hh
+++ b/compaction/compaction_manager.hh
@@ -93,8 +93,13 @@ public:

 private:
    shared_ptr<compaction::task_manager_module> _task_manager_module;
+
+    using compaction_task_executor_list_type = bi::list<
+            compaction_task_executor,
+            bi::base_hook<bi::list_base_hook<bi::link_mode<bi::auto_unlink>>>,
+            bi::constant_time_size<false>>;
    // compaction manager may have N fibers to allow parallel compaction per shard.
-    std::list<shared_ptr<compaction::compaction_task_executor>> _tasks;
+    compaction_task_executor_list_type _tasks;

    // Possible states in which the compaction manager can be found.
    //
@@ -180,7 +185,7 @@ private:
    }
    future<compaction_manager::compaction_stats_opt> perform_compaction(throw_if_stopping do_throw_if_stopping, std::optional<tasks::task_info> parent_info, Args&&... args);

-    future<> stop_tasks(std::vector<shared_ptr<compaction::compaction_task_executor>> tasks, sstring reason);
+    future<> stop_tasks(std::vector<shared_ptr<compaction::compaction_task_executor>> tasks, sstring reason) noexcept;
    future<> update_throughput(uint32_t value_mbs);

    // Return the largest fan-in of currently running compactions
@@ -246,7 +251,7 @@ private:

    // Stop all fibers, without waiting. Safe to be called multiple times.
    void do_stop() noexcept;
-    future<> really_do_stop();
+    future<> really_do_stop() noexcept;

    // Propagate replacement of sstables to all ongoing compaction of a given table
    void propagate_replacement(compaction::table_state& t, const std::vector<sstables::shared_sstable>& removed, const std::vector<sstables::shared_sstable>& added);
@@ -347,6 +352,11 @@ public:
    // or user aborted splitting using stop API.
    future<compaction_stats_opt> perform_split_compaction(compaction::table_state& t, sstables::compaction_type_options::split opt, std::optional<tasks::task_info> info = std::nullopt);

+    // Splits a single SSTable by segregating all its data according to the classifier.
+    // If SSTable doesn't need split, the same input SSTable is returned as output.
+    // If SSTable needs split, then output SSTables are returned and the input SSTable is deleted.
+    future<std::vector<sstables::shared_sstable>> maybe_split_sstable(sstables::shared_sstable sst, table_state& t, sstables::compaction_type_options::split opt);
+
    // Run a custom job for a given table, defined by a function
    // it completes when future returned by job is ready or returns immediately
    // if manager was asked to stop.
@@ -462,7 +472,9 @@ public:

 namespace compaction {

-class compaction_task_executor : public enable_shared_from_this<compaction_task_executor> {
+class compaction_task_executor
+    : public enable_shared_from_this<compaction_task_executor>
+    , public boost::intrusive::list_base_hook<boost::intrusive::link_mode<boost::intrusive::auto_unlink>> {
 public:
    enum class state {
        none,       // initial and final state
@@ -586,6 +598,8 @@ private:
    future<compaction_manager::compaction_stats_opt> compaction_done() noexcept {
        return _compaction_done.get_future();
    }
+
+    future<sstables::sstable_set> sstable_set_for_tombstone_gc(::compaction::table_state& t);
 public:
    bool stopping() const noexcept {
        return _compaction_data.abort.abort_requested();
@@ -606,7 +620,7 @@ public:
    friend future<compaction_manager::compaction_stats_opt> compaction_manager::perform_compaction(throw_if_stopping do_throw_if_stopping, std::optional<tasks::task_info> parent_info, Args&&... args);
    friend future<compaction_manager::compaction_stats_opt> compaction_manager::perform_task(shared_ptr<compaction_task_executor> task, throw_if_stopping do_throw_if_stopping);
    friend fmt::formatter<compaction_task_executor>;
-    friend future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason);
+    friend future<> compaction_manager::stop_tasks(std::vector<shared_ptr<compaction_task_executor>> tasks, sstring reason) noexcept;
    friend sstables::test_env_compaction_manager;
 };

--- a/compaction/table_state.hh
+++ b/compaction/table_state.hh
@@ -39,6 +39,7 @@ public:
    virtual bool compaction_enforce_min_threshold() const noexcept = 0;
    virtual const sstables::sstable_set& main_sstable_set() const = 0;
    virtual const sstables::sstable_set& maintenance_sstable_set() const = 0;
+    virtual lw_shared_ptr<const sstables::sstable_set> sstable_set_for_tombstone_gc() const = 0;
    virtual std::unordered_set<sstables::shared_sstable> fully_expired_sstables(const std::vector<sstables::shared_sstable>& sstables, gc_clock::time_point compaction_time) const = 0;
    virtual const std::vector<sstables::shared_sstable>& compacted_undeleted_sstables() const noexcept = 0;
    virtual sstables::compaction_strategy& get_compaction_strategy() const noexcept = 0;
--- a/compaction/task_manager_module.cc
+++ b/compaction/task_manager_module.cc
@@ -467,7 +467,16 @@ future<> shard_cleanup_keyspace_compaction_task_impl::run() {

 future<> table_cleanup_keyspace_compaction_task_impl::run() {
    co_await wait_for_your_turn(_cv, _current_task, _status.id);
-    auto owned_ranges_ptr = compaction::make_owned_ranges_ptr(_db.get_keyspace_local_ranges(_status.keyspace));
+    // Note that we do not hold an effective_replication_map_ptr throughout
+    // the cleanup operation, so the topology might change.
+    // Since cleanup is an admin operation required for vnodes,
+    // it is the responsibility of the system operator to not
+    // perform additional incompatible range movements during cleanup.
+    auto get_owned_ranges = [&] (std::string_view ks_name) -> future<owned_ranges_ptr> {
+        const auto& erm = _db.find_keyspace(ks_name).get_vnode_effective_replication_map();
+        co_return compaction::make_owned_ranges_ptr(co_await _db.get_keyspace_local_ranges(erm));
+    };
+    auto owned_ranges_ptr = co_await get_owned_ranges(_status.keyspace);
    co_await run_on_table("force_keyspace_cleanup", _db, _status.keyspace, _ti, [&] (replica::table& t) {
        // skip the flush, as cleanup_keyspace_compaction_task_impl::run should have done this.
        return t.perform_cleanup_compaction(owned_ranges_ptr, tasks::task_info{_status.id, _status.shard}, replica::table::do_flush::no);
@@ -531,8 +540,15 @@ future<> shard_upgrade_sstables_compaction_task_impl::run() {

 future<> table_upgrade_sstables_compaction_task_impl::run() {
    co_await wait_for_your_turn(_cv, _current_task, _status.id);
-    auto owned_ranges = _db.maybe_get_keyspace_local_ranges(_status.keyspace);
-    auto owned_ranges_ptr = owned_ranges ? compaction::make_owned_ranges_ptr(std::move(owned_ranges.value())) : nullptr;
+    auto get_owned_ranges = [&] (std::string_view keyspace_name) -> future<owned_ranges_ptr> {
+        const auto& ks = _db.find_keyspace(keyspace_name);
+        if (ks.get_replication_strategy().is_per_table()) {
+            co_return nullptr;
+        }
+        const auto& erm = ks.get_vnode_effective_replication_map();
+        co_return compaction::make_owned_ranges_ptr(co_await _db.get_keyspace_local_ranges(erm));
+    };
+    auto owned_ranges_ptr = co_await get_owned_ranges(_status.keyspace);
    tasks::task_info info{_status.id, _status.shard};
    co_await run_on_table("upgrade_sstables", _db, _status.keyspace, _ti, [&] (replica::table& t) -> future<> {
        return t.parallel_foreach_table_state([&] (compaction::table_state& ts) -> future<> {
--- a/compaction/time_window_compaction_strategy.cc
+++ b/compaction/time_window_compaction_strategy.cc
@@ -295,7 +295,8 @@ time_window_compaction_strategy::get_reshaping_job(std::vector<shared_sstable> i
            // When trimming, let's keep sstables with overlapping time window, so as to reduce write amplification.
            // For example, if there are N sstables spanning window W, where N <= 32, then we can produce all data for W
            // in a single compaction round, removing the need to later compact W to reduce its number of files.
-            boost::partial_sort(multi_window, multi_window.begin() + max_sstables, [](const shared_sstable &a, const shared_sstable &b) {
+            auto sort_size = std::min(max_sstables, multi_window.size());
+            boost::partial_sort(multi_window, multi_window.begin() + sort_size, [](const shared_sstable &a, const shared_sstable &b) {
                return a->get_stats_metadata().max_timestamp < b->get_stats_metadata().max_timestamp;
            });
            maybe_trim_job(multi_window, job_size, disjoint);
--- a/compress.cc
+++ b/compress.cc
@@ -14,7 +14,9 @@
 #include "exceptions/exceptions.hh"
 #include "utils/class_registrator.hh"

-const sstring compressor::namespace_prefix = "org.apache.cassandra.io.compress.";
+sstring compressor::make_name(std::string_view short_name) {
+    return seastar::format("org.apache.cassandra.io.compress.{}", short_name);
+}

 class lz4_processor: public compressor {
 public:
@@ -66,7 +68,7 @@ compressor::ptr_type compressor::create(const sstring& name, const opt_getter& o
        return {};
    }

-    qualified_name qn(namespace_prefix, name);
+    qualified_name qn(make_name(""), name);

    for (auto& c : { lz4, snappy, deflate }) {
        if (c->name() == static_cast<const sstring&>(qn)) {
@@ -91,9 +93,9 @@ shared_ptr<compressor> compressor::create(const std::map<sstring, sstring>& opti
    return {};
 }

-thread_local const shared_ptr<compressor> compressor::lz4 = ::make_shared<lz4_processor>(namespace_prefix + "LZ4Compressor");
-thread_local const shared_ptr<compressor> compressor::snappy = ::make_shared<snappy_processor>(namespace_prefix + "SnappyCompressor");
-thread_local const shared_ptr<compressor> compressor::deflate = ::make_shared<deflate_processor>(namespace_prefix + "DeflateCompressor");
+thread_local const shared_ptr<compressor> compressor::lz4 = ::make_shared<lz4_processor>(make_name("LZ4Compressor"));
+thread_local const shared_ptr<compressor> compressor::snappy = ::make_shared<snappy_processor>(make_name("SnappyCompressor"));
+thread_local const shared_ptr<compressor> compressor::deflate = ::make_shared<deflate_processor>(make_name("DeflateCompressor"));

 const sstring compression_parameters::SSTABLE_COMPRESSION = "sstable_compression";
 const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_in_kb";
--- a/compress.hh
+++ b/compress.hh
@@ -69,7 +69,7 @@ public:
    static thread_local const ptr_type snappy;
    static thread_local const ptr_type deflate;

-    static const sstring namespace_prefix;
+    static sstring make_name(std::string_view short_name);
 };

 template<typename BaseType, typename... Args>
--- a/configure.py
+++ b/configure.py
@@ -431,8 +431,6 @@ modes = {

 scylla_tests = set([
    'test/boost/UUID_test',
-    'test/boost/pretty_printers_test',
-    'test/boost/cdc_generation_test',
    'test/boost/aggregate_fcts_test',
    'test/boost/allocation_strategy_test',
    'test/boost/alternator_unit_test',
@@ -443,7 +441,9 @@ scylla_tests = set([
    'test/boost/batchlog_manager_test',
    'test/boost/big_decimal_test',
    'test/boost/bloom_filter_test',
+    'test/boost/bptree_test',
    'test/boost/broken_sstable_test',
+    'test/boost/btree_test',
    'test/boost/bytes_ostream_test',
    'test/boost/cache_algorithm_test',
    'test/boost/cache_mutation_reader_test',
@@ -452,13 +452,15 @@ scylla_tests = set([
    'test/boost/canonical_mutation_test',
    'test/boost/cartesian_product_test',
    'test/boost/castas_fcts_test',
+    'test/boost/cdc_generation_test',
    'test/boost/cdc_test',
    'test/boost/cell_locker_test',
    'test/boost/checksum_utils_test',
-    'test/boost/chunked_vector_test',
    'test/boost/chunked_managed_vector_test',
+    'test/boost/chunked_vector_test',
    'test/boost/clustering_ranges_walker_test',
    'test/boost/column_mapping_test',
+    'test/boost/commitlog_cleanup_test',
    'test/boost/commitlog_test',
    'test/boost/compaction_group_test',
    'test/boost/compound_test',
@@ -468,102 +470,124 @@ scylla_tests = set([
    'test/boost/counter_test',
    'test/boost/cql_auth_query_test',
    'test/boost/cql_auth_syntax_test',
-    'test/boost/cql_query_test',
+    'test/boost/cql_functions_test',
+    'test/boost/cql_query_group_test',
    'test/boost/cql_query_large_test',
    'test/boost/cql_query_like_test',
-    'test/boost/cql_query_group_test',
-    'test/boost/cql_functions_test',
+    'test/boost/cql_query_test',
    'test/boost/crc_test',
    'test/boost/data_listeners_test',
    'test/boost/database_test',
-    'test/boost/commitlog_cleanup_test',
    'test/boost/dirty_memory_manager_test',
+    'test/boost/double_decker_test',
    'test/boost/duration_test',
    'test/boost/dynamic_bitset_test',
    'test/boost/enum_option_test',
    'test/boost/enum_set_test',
-    'test/boost/extensions_test',
    'test/boost/error_injection_test',
+    'test/boost/estimated_histogram_test',
+    'test/boost/exception_container_test',
+    'test/boost/exceptions_fallback_test',
+    'test/boost/exceptions_optimized_test',
+    'test/boost/expr_test',
+    'test/boost/extensions_test',
    'test/boost/filtering_test',
-    'test/boost/mutation_reader_another_test',
    'test/boost/flush_queue_test',
    'test/boost/fragmented_temporary_buffer_test',
    'test/boost/frozen_mutation_test',
+    'test/boost/generic_server_test',
    'test/boost/gossiping_property_file_snitch_test',
+    'test/boost/group0_cmd_merge_test',
+    'test/boost/group0_test',
    'test/boost/hash_test',
    'test/boost/hashers_test',
    'test/boost/hint_test',
    'test/boost/idl_test',
+    'test/boost/index_with_paging_test',
    'test/boost/input_stream_test',
+    'test/boost/intrusive_array_test',
    'test/boost/json_cql_query_test',
    'test/boost/json_test',
    'test/boost/keys_test',
    'test/boost/large_paging_state_test',
-    'test/boost/recent_entries_map_test',
    'test/boost/like_matcher_test',
    'test/boost/limiting_data_source_test',
    'test/boost/linearizing_input_stream_test',
+    'test/boost/lister_test',
    'test/boost/loading_cache_test',
+    'test/boost/locator_topology_test',
    'test/boost/log_heap_test',
-    'test/boost/estimated_histogram_test',
-    'test/boost/summary_test',
-    'test/boost/logalloc_test',
    'test/boost/logalloc_standard_allocator_segment_pool_backend_test',
-    'test/boost/managed_vector_test',
+    'test/boost/logalloc_test',
    'test/boost/managed_bytes_test',
-    'test/boost/intrusive_array_test',
+    'test/boost/managed_vector_test',
    'test/boost/map_difference_test',
    'test/boost/memtable_test',
+    'test/boost/multishard_combining_reader_as_mutation_source_test',
    'test/boost/multishard_mutation_query_test',
    'test/boost/murmur_hash_test',
    'test/boost/mutation_fragment_test',
    'test/boost/mutation_query_test',
+    'test/boost/mutation_reader_another_test',
    'test/boost/mutation_reader_test',
-    'test/boost/multishard_combining_reader_as_mutation_source_test',
    'test/boost/mutation_test',
    'test/boost/mutation_writer_test',
    'test/boost/mvcc_test',
    'test/boost/network_topology_strategy_test',
-    'test/boost/token_metadata_test',
-    'test/boost/tablets_test',
-    'test/boost/sessions_test',
    'test/boost/nonwrapping_interval_test',
    'test/boost/observable_test',
    'test/boost/partitioner_test',
+    'test/boost/per_partition_rate_limit_test',
+    'test/boost/pretty_printers_test',
    'test/boost/querier_cache_test',
    'test/boost/query_processor_test',
-    'test/boost/wrapping_interval_test',
+    'test/boost/radix_tree_test',
    'test/boost/range_tombstone_list_test',
-    'test/boost/reusable_buffer_test',
-    'test/boost/restrictions_test',
+    'test/boost/rate_limiter_test',
+    'test/boost/reader_concurrency_semaphore_test',
+    'test/boost/recent_entries_map_test',
    'test/boost/repair_test',
+    'test/boost/restrictions_test',
+    'test/boost/result_utils_test',
+    'test/boost/reusable_buffer_test',
    'test/boost/role_manager_test',
    'test/boost/row_cache_test',
    'test/boost/rust_test',
+    'test/boost/s3_test',
    'test/boost/schema_change_test',
+    'test/boost/schema_changes_test',
+    'test/boost/schema_loader_test',
    'test/boost/schema_registry_test',
    'test/boost/secondary_index_test',
-    'test/boost/tracing_test',
-    'test/boost/index_with_paging_test',
    'test/boost/serialization_test',
    'test/boost/serialized_action_test',
+    'test/boost/service_level_controller_test',
+    'test/boost/sessions_test',
    'test/boost/small_vector_test',
    'test/boost/snitch_reset_test',
+    'test/boost/sorting_test',
    'test/boost/sstable_3_x_test',
+    'test/boost/sstable_compaction_test',
+    'test/boost/sstable_conforms_to_mutation_source_test',
    'test/boost/sstable_datafile_test',
+    'test/boost/sstable_directory_test',
    'test/boost/sstable_generation_test',
+    'test/boost/sstable_move_test',
    'test/boost/sstable_mutation_test',
    'test/boost/sstable_partition_index_cache_test',
-    'test/boost/schema_changes_test',
-    'test/boost/sstable_conforms_to_mutation_source_test',
-    'test/boost/sstable_compaction_test',
    'test/boost/sstable_resharding_test',
-    'test/boost/sstable_directory_test',
+    'test/boost/sstable_set_test',
    'test/boost/sstable_test',
-    'test/boost/sstable_move_test',
+    'test/boost/stall_free_test',
    'test/boost/statement_restrictions_test',
    'test/boost/storage_proxy_test',
+    'test/boost/string_format_test',
+    'test/boost/summary_test',
+    'test/boost/tablets_test',
+    'test/boost/tagged_integer_test',
+    'test/boost/token_metadata_test',
    'test/boost/top_k_test',
+    'test/boost/tracing_test',
    'test/boost/transport_test',
    'test/boost/types_test',
    'test/boost/user_function_test',
@@ -571,39 +595,16 @@ scylla_tests = set([
    'test/boost/utf8_test',
    'test/boost/view_build_test',
    'test/boost/view_complex_test',
-    'test/boost/view_schema_test',
-    'test/boost/view_schema_pkey_test',
    'test/boost/view_schema_ckey_test',
+    'test/boost/view_schema_pkey_test',
+    'test/boost/view_schema_test',
    'test/boost/vint_serialization_test',
    'test/boost/virtual_reader_test',
    'test/boost/virtual_table_mutation_source_test',
    'test/boost/virtual_table_test',
-    'test/boost/wasm_test',
    'test/boost/wasm_alloc_test',
-    'test/boost/bptree_test',
-    'test/boost/btree_test',
-    'test/boost/radix_tree_test',
-    'test/boost/double_decker_test',
-    'test/boost/stall_free_test',
-    'test/boost/sstable_set_test',
-    'test/boost/reader_concurrency_semaphore_test',
-    'test/boost/service_level_controller_test',
-    'test/boost/schema_loader_test',
-    'test/boost/lister_test',
-    'test/boost/group0_test',
-    'test/boost/exception_container_test',
-    'test/boost/result_utils_test',
-    'test/boost/rate_limiter_test',
-    'test/boost/per_partition_rate_limit_test',
-    'test/boost/expr_test',
-    'test/boost/exceptions_optimized_test',
-    'test/boost/exceptions_fallback_test',
-    'test/boost/s3_test',
-    'test/boost/locator_topology_test',
-    'test/boost/string_format_test',
-    'test/boost/tagged_integer_test',
-    'test/boost/group0_cmd_merge_test',
-    'test/boost/sorting_test',
+    'test/boost/wasm_test',
+    'test/boost/wrapping_interval_test',
    'test/manual/ec2_snitch_test',
    'test/manual/enormous_table_scan_test',
    'test/manual/gce_snitch_test',
@@ -1452,7 +1453,7 @@ deps['test/boost/bytes_ostream_test'] = [
    "test/lib/log.cc",
 ]
 deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
-deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
+deps['test/boost/UUID_test'] = ['clocks-impl.cc', 'utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
 deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
 deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
 deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
--- a/cql3/Cql.g
+++ b/cql3/Cql.g
@@ -68,6 +68,7 @@ options {
 #include "cql3/statements/ks_prop_defs.hh"
 #include "cql3/selection/raw_selector.hh"
 #include "cql3/selection/selectable-expr.hh"
+#include "cql3/dialect.hh"
 #include "cql3/keyspace_element_name.hh"
 #include "cql3/constants.hh"
 #include "cql3/operation_impl.hh"
@@ -148,6 +149,8 @@ using uexpression = uninitialized<expression>;

    listener_type* listener;

+    dialect _dialect;
+
    // Keeps the names of all bind variables. For bind variables without a name ('?'), the name is nullptr.
    // Maps bind_index -> name.
    std::vector<::shared_ptr<cql3::column_identifier>> _bind_variable_names;
@@ -171,9 +174,14 @@ using uexpression = uninitialized<expression>;
        return s;
    }

+    void set_dialect(dialect d) {
+        _dialect = d;
+    }
+
    bind_variable new_bind_variables(shared_ptr<cql3::column_identifier> name)
    {
-        if (name && _named_bind_variables_indexes.contains(*name)) {
+        if (_dialect.duplicate_bind_variable_names_refer_to_same_variable
+                && name && _named_bind_variables_indexes.contains(*name)) {
            return bind_variable{_named_bind_variables_indexes[*name]};
        }
        auto marker = bind_variable{_bind_variable_names.size()};
--- a/cql3/cql3_type.cc
+++ b/cql3/cql3_type.cc
@@ -449,7 +449,8 @@ sstring maybe_quote(const sstring& identifier) {
        // many keywords but allow keywords listed as "unreserved keywords".
        // So we can use any of them, for example cident.
        try {
-            cql3::util::do_with_parser(identifier, std::mem_fn(&cql3_parser::CqlParser::cident));
+            // In general it's not a good idea to use the default dialect, but for parsing an identifier, it's okay.
+            cql3::util::do_with_parser(identifier, dialect{}, std::mem_fn(&cql3_parser::CqlParser::cident));
            return identifier;
        } catch(exceptions::syntax_exception&) {
            // This alphanumeric string is not a valid identifier, so fall
--- a/cql3/dialect.hh
+++ b/cql3/dialect.hh
@@ -0,0 +1,34 @@
+// Copyright (C) 2024-present ScyllaDB
+// SPDX-License-Identifier: AGPL-3.0-or-later
+
+#pragma once
+
+#include <fmt/core.h>
+
+namespace cql3 {
+
+struct dialect {
+    bool duplicate_bind_variable_names_refer_to_same_variable = true;  // if :a is found twice in a query, the two references are to the same variable (see #15559)
+    bool operator==(const dialect&) const = default;
+};
+
+inline
+dialect
+internal_dialect() {
+    return dialect{
+        .duplicate_bind_variable_names_refer_to_same_variable = true,
+    };
+}
+
+}
+
+template <>
+struct fmt::formatter<cql3::dialect> {
+    constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
+
+    template <typename FormatContext>
+    auto format(const cql3::dialect& d, FormatContext& ctx) const {
+        return fmt::format_to(ctx.out(), "cql3::dialect{{duplicate_bind_variable_names_refer_to_same_variable={}}}",
+                d.duplicate_bind_variable_names_refer_to_same_variable);
+    }
+};
--- a/cql3/prepared_statements_cache.hh
+++ b/cql3/prepared_statements_cache.hh
@@ -14,6 +14,7 @@
 #include "utils/hash.hh"
 #include "cql3/statements/prepared_statement.hh"
 #include "cql3/column_specification.hh"
+#include "cql3/dialect.hh"

 namespace cql3 {

@@ -37,14 +38,17 @@ class prepared_cache_key_type {
 public:
    // derive from cql_prepared_id_type so we can customize the formatter of
    // cache_key_type
-    struct cache_key_type : public cql_prepared_id_type {};
+    struct cache_key_type : public cql_prepared_id_type {
+        cache_key_type(cql_prepared_id_type&& id, cql3::dialect d) : cql_prepared_id_type(std::move(id)), dialect(d) {}
+        cql3::dialect dialect; // Not part of hash, but we don't expect collisions because of that
+        bool operator==(const cache_key_type& other) const = default;
+    };

 private:
    cache_key_type _key;

 public:
-    prepared_cache_key_type() = default;
-    explicit prepared_cache_key_type(cql_prepared_id_type cql_id) : _key(std::move(cql_id)) {}
+    explicit prepared_cache_key_type(cql_prepared_id_type cql_id, dialect d) : _key(std::move(cql_id), d) {}

    cache_key_type& key() { return _key; }
    const cache_key_type& key() const { return _key; }
@@ -176,7 +180,7 @@ struct hash<cql3::prepared_cache_key_type> final {
 template <> struct fmt::formatter<cql3::prepared_cache_key_type::cache_key_type> {
    constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
    auto format(const cql3::prepared_cache_key_type::cache_key_type& p, fmt::format_context& ctx) const {
-        return fmt::format_to(ctx.out(), "{{cql_id: {}}}", static_cast<const cql3::cql_prepared_id_type&>(p));
+        return fmt::format_to(ctx.out(), "{{cql_id: {}, dialect: {}}}", static_cast<const cql3::cql_prepared_id_type&>(p), p.dialect);
    }
 };

--- a/cql3/query_processor.cc
+++ b/cql3/query_processor.cc
@@ -566,10 +566,10 @@ query_processor::execute_maybe_with_guard(service::query_state& query_state, ::s
 }

 future<::shared_ptr<result_message>>
-query_processor::execute_direct_without_checking_exception_message(const sstring_view& query_string, service::query_state& query_state, query_options& options) {
+query_processor::execute_direct_without_checking_exception_message(const sstring_view& query_string, service::query_state& query_state, dialect d, query_options& options) {
    log.trace("execute_direct: \"{}\"", query_string);
    tracing::trace(query_state.get_trace_state(), "Parsing a statement");
-    auto p = get_statement(query_string, query_state.get_client_state());
+    auto p = get_statement(query_string, query_state.get_client_state(), d);
    auto statement = p->statement;
    const auto warnings = std::move(p->warnings);
    if (statement->get_bound_terms() != options.get_values_count()) {
@@ -653,18 +653,21 @@ query_processor::process_authorized_statement(const ::shared_ptr<cql_statement>
 }

 future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-query_processor::prepare(sstring query_string, service::query_state& query_state) {
+query_processor::prepare(sstring query_string, service::query_state& query_state, cql3::dialect d) {
    auto& client_state = query_state.get_client_state();
-    return prepare(std::move(query_string), client_state);
+    return prepare(std::move(query_string), client_state, d);
 }

 future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-query_processor::prepare(sstring query_string, const service::client_state& client_state) {
+query_processor::prepare(sstring query_string, const service::client_state& client_state, cql3::dialect d) {
    using namespace cql_transport::messages;
    return prepare_one<result_message::prepared::cql>(
            std::move(query_string),
            client_state,
-            compute_id,
+            d,
+            [d] (std::string_view query_string, std::string_view keyspace) {
+                return compute_id(query_string, keyspace, d);
+            },
            prepared_cache_key_type::cql_id);
 }

@@ -676,13 +679,14 @@ static std::string hash_target(std::string_view query_string, std::string_view k

 prepared_cache_key_type query_processor::compute_id(
        std::string_view query_string,
-        std::string_view keyspace) {
-    return prepared_cache_key_type(md5_hasher::calculate(hash_target(query_string, keyspace)));
+        std::string_view keyspace,
+        dialect d) {
+    return prepared_cache_key_type(md5_hasher::calculate(hash_target(query_string, keyspace)), d);
 }

 std::unique_ptr<prepared_statement>
-query_processor::get_statement(const sstring_view& query, const service::client_state& client_state) {
-    std::unique_ptr<raw::parsed_statement> statement = parse_statement(query);
+query_processor::get_statement(const sstring_view& query, const service::client_state& client_state, dialect d) {
+    std::unique_ptr<raw::parsed_statement> statement = parse_statement(query, d);

    // Set keyspace for statement that require login
    auto cf_stmt = dynamic_cast<raw::cf_statement*>(statement.get());
@@ -696,7 +700,7 @@ query_processor::get_statement(const sstring_view& query, const service::client_
 }

 std::unique_ptr<raw::parsed_statement>
-query_processor::parse_statement(const sstring_view& query) {
+query_processor::parse_statement(const sstring_view& query, dialect d) {
    try {
        {
            const char* error_injection_key = "query_processor-parse_statement-test_failure";
@@ -706,7 +710,7 @@ query_processor::parse_statement(const sstring_view& query) {
                }
            });
        }
-        auto statement = util::do_with_parser(query,  std::mem_fn(&cql3_parser::CqlParser::query));
+        auto statement = util::do_with_parser(query, d, std::mem_fn(&cql3_parser::CqlParser::query));
        if (!statement) {
            throw exceptions::syntax_exception("Parsing failed");
        }
@@ -722,9 +726,9 @@ query_processor::parse_statement(const sstring_view& query) {
 }

 std::vector<std::unique_ptr<raw::parsed_statement>>
-query_processor::parse_statements(std::string_view queries) {
+query_processor::parse_statements(std::string_view queries, dialect d) {
    try {
-        auto statements = util::do_with_parser(queries, std::mem_fn(&cql3_parser::CqlParser::queries));
+        auto statements = util::do_with_parser(queries, d, std::mem_fn(&cql3_parser::CqlParser::queries));
        if (statements.empty()) {
            throw exceptions::syntax_exception("Parsing failed");
        }
@@ -797,7 +801,7 @@ query_options query_processor::make_internal_options(
 statements::prepared_statement::checked_weak_ptr query_processor::prepare_internal(const sstring& query_string) {
    auto& p = _internal_statements[query_string];
    if (p == nullptr) {
-        auto np = parse_statement(query_string)->prepare(_db, _cql_stats);
+        auto np = parse_statement(query_string, internal_dialect())->prepare(_db, _cql_stats);
        np->statement->raw_cql_statement = query_string;
        p = std::move(np); // inserts it into map
    }
@@ -903,7 +907,8 @@ query_processor::execute_internal(
        auto p = prepare_internal(query_string);
        return execute_with_params(std::move(p), cl, query_state, values);
    } else {
-        auto p = parse_statement(query_string)->prepare(_db, _cql_stats);
+        // For internal queries, we want the default dialect, not the user-provided one
+        auto p = parse_statement(query_string, dialect{})->prepare(_db, _cql_stats);
        p->statement->raw_cql_statement = query_string;
        auto checked_weak_ptr = p->checked_weak_from_this();
        return execute_with_params(std::move(checked_weak_ptr), cl, query_state, values).finally([p = std::move(p)] {});
--- a/cql3/query_processor.hh
+++ b/cql3/query_processor.hh
@@ -21,6 +21,7 @@
 #include "cql3/authorized_prepared_statements_cache.hh"
 #include "cql3/statements/prepared_statement.hh"
 #include "cql3/cql_statement.hh"
+#include "cql3/dialect.hh"
 #include "exceptions/exceptions.hh"
 #include "service/migration_listener.hh"
 #include "timestamp.hh"
@@ -137,10 +138,11 @@ public:

    static prepared_cache_key_type compute_id(
            std::string_view query_string,
-            std::string_view keyspace);
+            std::string_view keyspace,
+            dialect d);

-    static std::unique_ptr<statements::raw::parsed_statement> parse_statement(const std::string_view& query);
-    static std::vector<std::unique_ptr<statements::raw::parsed_statement>> parse_statements(std::string_view queries);
+    static std::unique_ptr<statements::raw::parsed_statement> parse_statement(const std::string_view& query, dialect d);
+    static std::vector<std::unique_ptr<statements::raw::parsed_statement>> parse_statements(std::string_view queries, dialect d);

    query_processor(service::storage_proxy& proxy, data_dictionary::database db, service::migration_notifier& mn, memory_config mcfg, cql_config& cql_cfg, utils::loading_cache_config auth_prep_cache_cfg, lang::manager& langm);

@@ -249,10 +251,12 @@ public:
    execute_direct(
            const std::string_view& query_string,
            service::query_state& query_state,
+            dialect d,
            query_options& options) {
        return execute_direct_without_checking_exception_message(
                query_string,
                query_state,
+                d,
                options)
                .then(cql_transport::messages::propagate_exception_as_future<::shared_ptr<cql_transport::messages::result_message>>);
    }
@@ -263,6 +267,7 @@ public:
    execute_direct_without_checking_exception_message(
            const std::string_view& query_string,
            service::query_state& query_state,
+            dialect d,
            query_options& options);

    future<::shared_ptr<cql_transport::messages::result_message>>
@@ -397,10 +402,10 @@ public:


    future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-    prepare(sstring query_string, service::query_state& query_state);
+    prepare(sstring query_string, service::query_state& query_state, dialect d);

    future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-    prepare(sstring query_string, const service::client_state& client_state);
+    prepare(sstring query_string, const service::client_state& client_state, dialect d);

    future<> stop();

@@ -443,7 +448,8 @@ public:

    std::unique_ptr<statements::prepared_statement> get_statement(
            const std::string_view& query,
-            const service::client_state& client_state);
+            const service::client_state& client_state,
+            dialect d);

    friend class migration_subscriber;

@@ -527,14 +533,15 @@ private:
    prepare_one(
            sstring query_string,
            const service::client_state& client_state,
+            dialect d,
            PreparedKeyGenerator&& id_gen,
            IdGetter&& id_getter) {
        return do_with(
                id_gen(query_string, client_state.get_raw_keyspace()),
                std::move(query_string),
-                [this, &client_state, &id_getter](const prepared_cache_key_type& key, const sstring& query_string) {
-            return _prepared_cache.get(key, [this, &query_string, &client_state] {
-                auto prepared = get_statement(query_string, client_state);
+                [this, &client_state, &id_getter, d](const prepared_cache_key_type& key, const sstring& query_string) {
+            return _prepared_cache.get(key, [this, &query_string, &client_state, d] {
+                auto prepared = get_statement(query_string, client_state, d);
                auto bound_terms = prepared->statement->get_bound_terms();
                if (bound_terms > std::numeric_limits<uint16_t>::max()) {
                    throw exceptions::invalid_request_exception(
--- a/cql3/selection/selection.cc
+++ b/cql3/selection/selection.cc
@@ -503,10 +503,12 @@ selection::collect_metadata(const schema& schema, const std::vector<prepared_sel
 }

 result_set_builder::result_set_builder(const selection& s, gc_clock::time_point now,
-                                       std::vector<size_t> group_by_cell_indices)
+                                       std::vector<size_t> group_by_cell_indices,
+                                       uint64_t limit)
    : _result_set(std::make_unique<result_set>(::make_shared<metadata>(*(s.get_result_metadata()))))
    , _selectors(s.new_selectors())
    , _group_by_cell_indices(std::move(group_by_cell_indices))
+    , _limit(limit)
    , _last_group(_group_by_cell_indices.size())
    , _group_began(false)
    , _now(now)
@@ -577,8 +579,10 @@ void result_set_builder::flush_selectors() {
        // handled by process_current_row
        return;
    }
-    _result_set->add_row(_selectors->get_output_row());
-    _selectors->reset();
+    if (_result_set->size() < _limit) {
+        _result_set->add_row(_selectors->get_output_row());
+        _selectors->reset();
+    }
 }

 void result_set_builder::complete_row() {
@@ -790,6 +794,10 @@ int32_t result_set_builder::ttl_of(size_t idx) {
    return _ttls[idx];
 }

+size_t result_set_builder::result_set_size() const {
+    return _result_set->size();
+}
+
 bytes_opt result_set_builder::get_value(data_type t, query::result_atomic_cell_view c) {
    return {c.value().linearize()};
 }
--- a/cql3/selection/selection.hh
+++ b/cql3/selection/selection.hh
@@ -172,6 +172,7 @@ private:
    std::unique_ptr<result_set> _result_set;
    std::unique_ptr<selectors> _selectors;
    const std::vector<size_t> _group_by_cell_indices; ///< Indices in \c current of cells holding GROUP BY values.
+    const uint64_t _limit; ///< Maximum number of rows to return.
    std::vector<managed_bytes_opt> _last_group; ///< Previous row's group: all of GROUP BY column values.
    bool _group_began; ///< Whether a group began being formed.
 public:
@@ -236,7 +237,8 @@ public:
    };

    result_set_builder(const selection& s, gc_clock::time_point now,
-                       std::vector<size_t> group_by_cell_indices = {});
+                       std::vector<size_t> group_by_cell_indices = {},
+                       uint64_t limit = std::numeric_limits<uint64_t>::max());
    void add_empty();
    void add(bytes_opt value);
    void add(const column_definition& def, const query::result_atomic_cell_view& c);
@@ -246,6 +248,7 @@ public:
    std::unique_ptr<result_set> build();
    api::timestamp_type timestamp_of(size_t idx);
    int32_t ttl_of(size_t idx);
+    size_t result_set_size() const;

    // Implements ResultVisitor concept from query.hh
    template<typename Filter = nop_filter>
--- a/cql3/statements/alter_keyspace_statement.cc
+++ b/cql3/statements/alter_keyspace_statement.cc
@@ -11,6 +11,7 @@
 #include <boost/range/algorithm.hpp>
 #include <fmt/format.h>
 #include <seastar/core/coroutine.hh>
+#include <seastar/core/on_internal_error.hh>
 #include <stdexcept>
 #include "alter_keyspace_statement.hh"
 #include "prepared_statement.hh"
@@ -43,18 +44,16 @@ future<> cql3::statements::alter_keyspace_statement::check_access(query_processo
    return state.has_keyspace_access(_name, auth::permission::ALTER);
 }

-static bool validate_rf_difference(const std::string_view curr_rf, const std::string_view new_rf) {
-    auto to_number = [] (const std::string_view rf) {
-        int result;
-        // We assume the passed string view represents a valid decimal number,
-        // so we don't need the error code.
-        (void) std::from_chars(rf.begin(), rf.end(), result);
-        return result;
-    };
-
-    // We want to ensure that each DC's RF is going to change by at most 1
-    // because in that case the old and new quorums must overlap.
-    return std::abs(to_number(curr_rf) - to_number(new_rf)) <= 1;
+static unsigned get_abs_rf_diff(const std::string& curr_rf, const std::string& new_rf) {
+    try {
+        return std::abs(std::stoi(curr_rf) - std::stoi(new_rf));
+    } catch (std::invalid_argument const& ex) {
+        on_internal_error(mylogger, fmt::format("get_abs_rf_diff expects integer arguments, "
+                                                "but got curr_rf:{} and new_rf:{}", curr_rf, new_rf));
+    } catch (std::out_of_range const& ex) {
+        on_internal_error(mylogger, fmt::format("get_abs_rf_diff expects integer arguments to fit into `int` type, "
+                                                "but got curr_rf:{} and new_rf:{}", curr_rf, new_rf));
+    }
 }

 void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, const service::client_state& state) const {
@@ -84,11 +83,24 @@ void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, c
            auto new_ks = _attrs->as_ks_metadata_update(ks.metadata(), *qp.proxy().get_token_metadata_ptr(), qp.proxy().features());

            if (ks.get_replication_strategy().uses_tablets()) {
-                const std::map<sstring, sstring>& current_rfs = ks.metadata()->strategy_options();
-                for (const auto& [new_dc, new_rf] : _attrs->get_replication_options()) {
-                    auto it = current_rfs.find(new_dc);
-                    if (it != current_rfs.end() && !validate_rf_difference(it->second, new_rf)) {
-                        throw exceptions::invalid_request_exception("Cannot modify replication factor of any DC by more than 1 at a time.");
+                const std::map<sstring, sstring>& current_rf_per_dc = ks.metadata()->strategy_options();
+                auto new_rf_per_dc = _attrs->get_replication_options();
+                new_rf_per_dc.erase(ks_prop_defs::REPLICATION_STRATEGY_CLASS_KEY);
+                unsigned total_abs_rfs_diff = 0;
+                for (const auto& [new_dc, new_rf] : new_rf_per_dc) {
+                    sstring old_rf = "0";
+                    if (auto new_dc_in_current_mapping = current_rf_per_dc.find(new_dc);
+                             new_dc_in_current_mapping != current_rf_per_dc.end()) {
+                        old_rf = new_dc_in_current_mapping->second;
+                    } else if (!qp.proxy().get_token_metadata_ptr()->get_topology().get_datacenters().contains(new_dc)) {
+                        // This means that the DC listed in ALTER doesn't exist. This error will be reported later,
+                        // during validation in abstract_replication_strategy::validate_replication_strategy.
+                        // We can't report this error now, because it'd change the order of errors reported:
+                        // first we need to report non-existing DCs, then if RFs aren't changed by too much.
+                        continue;
+                    }
+                    if (total_abs_rfs_diff += get_abs_rf_diff(old_rf, new_rf); total_abs_rfs_diff >= 2) {
+                        throw exceptions::invalid_request_exception("Only one DC's RF can be changed at a time and not by more than 1");
                    }
                }
            }
@@ -118,6 +130,63 @@ bool cql3::statements::alter_keyspace_statement::changes_tablets(query_processor
    return ks.get_replication_strategy().uses_tablets() && !_attrs->get_replication_options().empty();
 }

+namespace {
+// These functions are used to flatten all the options in the keyspace definition into a single-level map<string, string>.
+// (Currently options are stored in a nested structure that looks more like a map<string, map<string, string>>).
+// Flattening is simply joining the keys of maps from both levels with a colon ':' character,
+// or in other words: prefixing the keys in the output map with the option type, e.g. 'replication', 'storage', etc.,
+// so that the output map contains entries like: "replication:dc1" -> "3".
+// This is done to avoid key conflicts and to be able to de-flatten the map back into the original structure.
+
+void add_prefixed_key(const sstring& prefix, const std::map<sstring, sstring>& in, std::map<sstring, sstring>& out) {
+    for (const auto& [in_key, in_value]: in) {
+        out[prefix + ":" + in_key] = in_value;
+    }
+};
+
+std::map<sstring, sstring> get_current_options_flattened(const shared_ptr<cql3::statements::ks_prop_defs>& ks,
+                                                         bool include_tablet_options,
+                                                         const gms::feature_service& feat) {
+    std::map<sstring, sstring> all_options;
+
+    add_prefixed_key(ks->KW_REPLICATION, ks->get_replication_options(), all_options);
+    add_prefixed_key(ks->KW_STORAGE, ks->get_storage_options().to_map(), all_options);
+    // if no tablet options are specified in the ALTER KS statement,
+    // we want to preserve the old ones and hence cannot overwrite them with defaults
+    if (include_tablet_options) {
+        auto initial_tablets = ks->get_initial_tablets(std::nullopt);
+        add_prefixed_key(ks->KW_TABLETS,
+                         {{"enabled", initial_tablets ? "true" : "false"},
+                         {"initial", std::to_string(initial_tablets.value_or(0))}},
+                         all_options);
+    }
+    add_prefixed_key(ks->KW_DURABLE_WRITES,
+                     {{sstring(ks->KW_DURABLE_WRITES), to_sstring(ks->get_boolean(ks->KW_DURABLE_WRITES, true))}},
+                     all_options);
+
+    return all_options;
+}
+
+std::map<sstring, sstring> get_old_options_flattened(const data_dictionary::keyspace& ks, bool include_tablet_options) {
+    std::map<sstring, sstring> all_options;
+
+    using namespace cql3::statements;
+    add_prefixed_key(ks_prop_defs::KW_REPLICATION, ks.get_replication_strategy().get_config_options(), all_options);
+    add_prefixed_key(ks_prop_defs::KW_STORAGE, ks.metadata()->get_storage_options().to_map(), all_options);
+    if (include_tablet_options) {
+        add_prefixed_key(ks_prop_defs::KW_TABLETS,
+                         {{"enabled", ks.metadata()->initial_tablets() ? "true" : "false"},
+                          {"initial", std::to_string(ks.metadata()->initial_tablets().value_or(0))}},
+                         all_options);
+    }
+    add_prefixed_key(ks_prop_defs::KW_DURABLE_WRITES,
+                     {{sstring(ks_prop_defs::KW_DURABLE_WRITES), to_sstring(ks.metadata()->durable_writes())}},
+                     all_options);
+
+    return all_options;
+}
+} // <anonymous> namespace
+
 future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, cql3::cql_warnings_vec>>
 cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_processor& qp, service::query_state& state, const query_options& options, service::group0_batch& mc) const {
    using namespace cql_transport;
@@ -130,11 +199,18 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
        auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, tm, feat);
        std::vector<mutation> muts;
        std::vector<sstring> warnings;
-        auto ks_options = _attrs->get_all_options_flattened(feat);
+        bool include_tablet_options = _attrs->get_map(_attrs->KW_TABLETS).has_value();
+        auto old_ks_options = get_old_options_flattened(ks, include_tablet_options);
+        auto ks_options = get_current_options_flattened(_attrs, include_tablet_options, feat);
+        ks_options.merge(old_ks_options);
+
        auto ts = mc.write_timestamp();
        auto global_request_id = mc.new_group0_state_id();

        // we only want to run the tablets path if there are actually any tablets changes, not only schema changes
+        // TODO: the current `if (changes_tablets(qp))` is insufficient: someone may set the same RFs as before,
+        //       and we'll unnecessarily trigger the processing path for ALTER tablets KS,
+        //       when in reality nothing or only schema is being changed
        if (changes_tablets(qp)) {
            if (!qp.topology_global_queue_empty()) {
                return make_exception_future<std::tuple<::shared_ptr<::cql_transport::event::schema_change>, cql3::cql_warnings_vec>>(
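The option handling in this file's changes relies on two details that are easy to miss: keys are flattened as `<option type>:<key>`, and `std::map::merge` never overwrites a key that already exists in the target map. The following is a minimal, standalone sketch of that behaviour, not the ScyllaDB code itself. It assumes `std::string` in place of `sstring`, and `merge_preferring_new` is a hypothetical helper introduced only for illustration.

```cpp
#include <map>
#include <string>

using options_map = std::map<std::string, std::string>;

// Prefix every key of `in` with "<prefix>:" and store the entry in `out`,
// in the spirit of add_prefixed_key from the patch (std::string replaces sstring).
void add_prefixed_key(const std::string& prefix, const options_map& in, options_map& out) {
    for (const auto& [key, value] : in) {
        out[prefix + ":" + key] = value;
    }
}

// Hypothetical helper: combine the options given in the ALTER statement with the
// options already stored for the keyspace. std::map::merge only transfers keys
// that are absent from the target, so values from the statement take precedence.
options_map merge_preferring_new(options_map from_statement, options_map from_keyspace) {
    from_statement.merge(from_keyspace);
    return from_statement;
}
```

With this ordering, an option named in the statement keeps its new value, while options the statement does not mention are carried over from the existing keyspace definition.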
--- a/cql3/statements/alter_table_statement.cc
+++ b/cql3/statements/alter_table_statement.cc
@@ -384,7 +384,8 @@ std::pair<schema_builder, std::vector<view_ptr>> alter_table_statement::prepare_
                    auto new_where = util::rename_column_in_where_clause(
                            view->view_info()->where_clause(),
                            column_identifier::raw(view_from->text(), true),
-                            column_identifier::raw(view_to->text(), true));
+                            column_identifier::raw(view_to->text(), true),
+                            cql3::dialect{});
                    builder.with_view_info(view->view_info()->base_id(), view->view_info()->base_name(),
                            view->view_info()->include_all_columns(), std::move(new_where));

--- a/cql3/statements/create_service_level_statement.cc
+++ b/cql3/statements/create_service_level_statement.cc
@@ -7,6 +7,7 @@
 */

 #include "auth/service.hh"
+#include "exceptions/exceptions.hh"
 #include "seastarx.hh"
 #include "cql3/statements/create_service_level_statement.hh"
 #include "service/qos/service_level_controller.hh"
@@ -38,6 +39,10 @@ create_service_level_statement::execute(query_processor& qp,
        service::query_state &state,
        const query_options &,
        std::optional<service::group0_guard> guard) const {
+    if (_service_level.starts_with('$')) {
+        throw exceptions::invalid_request_exception("Names starting with '$' are reserved for internal tenants. Use a different name.");
+    }
+
    service::group0_batch mc{std::move(guard)};
    qos::service_level_options slo = _slo.replace_defaults(qos::service_level_options{});
    auto& sl = state.get_service_level_controller();
--- a/cql3/statements/create_table_statement.cc
+++ b/cql3/statements/create_table_statement.cc
@@ -192,6 +192,13 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa

    auto stmt = ::make_shared<create_table_statement>(*_cf_name, _properties.properties(), _if_not_exists, _static_columns, _properties.properties()->get_id());

+    bool ks_uses_tablets;
+    try {
+        ks_uses_tablets = db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets();
+    } catch (const data_dictionary::no_such_keyspace& e) {
+        throw exceptions::invalid_request_exception("Cannot create a table in a non-existent keyspace: " + keyspace());
+    }
+
    std::optional<std::map<bytes, data_type>> defined_multi_cell_columns;
    for (auto&& entry : _definitions) {
        ::shared_ptr<column_identifier> id = entry.first;
@@ -201,7 +208,7 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa
            throw exceptions::invalid_request_exception("Cannot set default_time_to_live on a table with counters");
        }

-        if (db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets() && pt.is_counter()) {
+        if (ks_uses_tablets && pt.is_counter()) {
            throw exceptions::invalid_request_exception(format("Cannot use the 'counter' type for table {}.{}: Counters are not yet supported with tablets", keyspace(), cf_name));
        }

--- a/cql3/statements/ks_prop_defs.cc
+++ b/cql3/statements/ks_prop_defs.cc
@@ -138,28 +138,22 @@ data_dictionary::storage_options ks_prop_defs::get_storage_options() const {
    return opts;
 }

-ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const {
-    // FIXME -- this should be ignored somehow else
-    init_tablets_options ret{ .enabled = false, .specified_count = std::nullopt };
-    if (locator::abstract_replication_strategy::to_qualified_class_name(strategy_class) != "org.apache.cassandra.locator.NetworkTopologyStrategy") {
-        return ret;
-    }
-
+std::optional<unsigned> ks_prop_defs::get_initial_tablets(std::optional<unsigned> default_value) const {
    auto tablets_options = get_map(KW_TABLETS);
    if (!tablets_options) {
-        return enabled_by_default ? init_tablets_options{ .enabled = true } : ret;
+        return default_value;
    }

+    unsigned initial_count = 0;
    auto it = tablets_options->find("enabled");
    if (it != tablets_options->end()) {
        auto enabled = it->second;
        tablets_options->erase(it);

        if (enabled == "true") {
-            ret = init_tablets_options{ .enabled = true, .specified_count = 0 }; // even if 'initial' is not set, it'll start with auto-detection
+            // nothing
        } else if (enabled == "false") {
-            assert(!ret.enabled);
-            return ret;
+            return std::nullopt;
        } else {
            throw exceptions::configuration_exception(sstring("Tablets enabled value must be true or false; found: ") + enabled);
        }
@@ -168,7 +162,7 @@ ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstri
    it = tablets_options->find("initial");
    if (it != tablets_options->end()) {
        try {
-            ret = init_tablets_options{ .enabled = true, .specified_count = std::stol(it->second)};
+            initial_count = std::stol(it->second);
        } catch (...) {
            throw exceptions::configuration_exception(sstring("Initial tablets value should be numeric; found ") + it->second);
        }
@@ -179,7 +173,7 @@ ks_prop_defs::init_tablets_options ks_prop_defs::get_initial_tablets(const sstri
        throw exceptions::configuration_exception(sstring("Unrecognized tablets option ") + tablets_options->begin()->first);
    }

-    return ret;
+    return initial_count;
 }

 std::optional<sstring> ks_prop_defs::get_replication_strategy_class() const {
@@ -190,32 +184,13 @@ bool ks_prop_defs::get_durable_writes() const {
    return get_boolean(KW_DURABLE_WRITES, true);
 }

-std::map<sstring, sstring> ks_prop_defs::get_all_options_flattened(const gms::feature_service& feat) const {
-    std::map<sstring, sstring> all_options;
-
-    auto ingest_flattened_options = [&all_options](const std::map<sstring, sstring>& options, const sstring& prefix) {
-        for (auto& option: options) {
-            all_options[prefix + ":" + option.first] = option.second;
-        }
-    };
-    ingest_flattened_options(get_replication_options(), KW_REPLICATION);
-    ingest_flattened_options(get_storage_options().to_map(), KW_STORAGE);
-    ingest_flattened_options(get_map(KW_TABLETS).value_or(std::map<sstring, sstring>{}), KW_TABLETS);
-    ingest_flattened_options({{sstring(KW_DURABLE_WRITES), to_sstring(get_boolean(KW_DURABLE_WRITES, true))}}, KW_DURABLE_WRITES);
-
-    return all_options;
-}
-
 lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata(sstring ks_name, const locator::token_metadata& tm, const gms::feature_service& feat) {
    auto sc = get_replication_strategy_class().value();
-    auto initial_tablets = get_initial_tablets(sc, feat.tablets);
-    // if tablets options have not been specified, but tablets are globally enabled, set the value to 0
-    if (initial_tablets.enabled && !initial_tablets.specified_count) {
-        initial_tablets.specified_count = 0;
-    }
+    // if tablets options have not been specified, but tablets are globally enabled, set the value to 0 for NetworkTopologyStrategy only
+    auto initial_tablets = get_initial_tablets(feat.tablets && locator::abstract_replication_strategy::to_qualified_class_name(sc) == "org.apache.cassandra.locator.NetworkTopologyStrategy" ? std::optional<unsigned>(0) : std::nullopt);
    auto options = prepare_options(sc, tm, get_replication_options());
    return data_dictionary::keyspace_metadata::new_keyspace(ks_name, sc,
-            std::move(options), initial_tablets.specified_count, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
+            std::move(options), initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
 }

 lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata& tm, const gms::feature_service& feat) {
@@ -228,13 +203,9 @@ lw_shared_ptr<data_dictionary::keyspace_metadata> ks_prop_defs::as_ks_metadata_u
        sc = old->strategy_name();
        options = old_options;
    }
-    auto initial_tablets = get_initial_tablets(*sc, old->initial_tablets().has_value());
    // if tablets options have not been specified, inherit them if it's tablets-enabled KS
-    if (initial_tablets.enabled && !initial_tablets.specified_count) {
-        initial_tablets.specified_count = old->initial_tablets();
-    }
-
-    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets.specified_count, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
+    auto initial_tablets = get_initial_tablets(old->initial_tablets());
+    return data_dictionary::keyspace_metadata::new_keyspace(old->name(), *sc, options, initial_tablets, get_boolean(KW_DURABLE_WRITES, true), get_storage_options());
 }


--- a/cql3/statements/ks_prop_defs.hh
+++ b/cql3/statements/ks_prop_defs.hh
@@ -49,21 +49,15 @@ public:
 private:
    std::optional<sstring> _strategy_class;
 public:
-    struct init_tablets_options {
-        bool enabled;
-        std::optional<unsigned> specified_count;
-    };
-
    ks_prop_defs() = default;
    explicit ks_prop_defs(std::map<sstring, sstring> options);

    void validate();
    std::map<sstring, sstring> get_replication_options() const;
    std::optional<sstring> get_replication_strategy_class() const;
-    init_tablets_options get_initial_tablets(const sstring& strategy_class, bool enabled_by_default) const;
+    std::optional<unsigned> get_initial_tablets(std::optional<unsigned> default_value) const;
    data_dictionary::storage_options get_storage_options() const;
    bool get_durable_writes() const;
-    std::map<sstring, sstring> get_all_options_flattened(const gms::feature_service& feat) const;
    lw_shared_ptr<data_dictionary::keyspace_metadata> as_ks_metadata(sstring ks_name, const locator::token_metadata&, const gms::feature_service&);
    lw_shared_ptr<data_dictionary::keyspace_metadata> as_ks_metadata_update(lw_shared_ptr<data_dictionary::keyspace_metadata> old, const locator::token_metadata&, const gms::feature_service&);
 };
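The new `get_initial_tablets` signature folds the former `enabled` flag and `specified_count` into a single `std::optional<unsigned>`: an empty optional means tablets are disabled, and a value (possibly 0) means they are enabled. Below is a simplified sketch of that contract under stated assumptions. The free function `initial_tablets` and its `tablets` parameter are hypothetical stand-ins for the member function and `get_map(KW_TABLETS)`, `std::invalid_argument` stands in for `exceptions::configuration_exception`, and the check for unrecognized options is omitted.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Simplified stand-in for ks_prop_defs::get_initial_tablets after the patch.
// `tablets` models the optional `tablets = {...}` map of the statement.
std::optional<unsigned> initial_tablets(std::optional<std::map<std::string, std::string>> tablets,
                                        std::optional<unsigned> default_value) {
    if (!tablets) {
        return default_value;        // no tablets options given: the caller's default applies
    }
    unsigned initial_count = 0;      // enabled without 'initial' keeps the count at 0
    if (auto it = tablets->find("enabled"); it != tablets->end()) {
        if (it->second == "false") {
            return std::nullopt;     // tablets explicitly disabled
        }
        if (it->second != "true") {
            throw std::invalid_argument("Tablets enabled value must be true or false; found: " + it->second);
        }
    }
    if (auto it = tablets->find("initial"); it != tablets->end()) {
        // std::stoul throws std::invalid_argument for non-numeric input
        initial_count = static_cast<unsigned>(std::stoul(it->second));
    }
    return initial_count;
}
```

The two callers in the patch differ only in the default they pass: `as_ks_metadata` passes 0 for NetworkTopologyStrategy when the tablets feature is enabled, and `as_ks_metadata_update` passes the keyspace's previous value so that unspecified options are inherited.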
--- a/cql3/statements/list_service_level_statement.cc
+++ b/cql3/statements/list_service_level_statement.cc
@@ -54,7 +54,7 @@ list_service_level_statement::execute(query_processor& qp,

    return make_ready_future().then([this, &state] () {
                                  if (_describe_all) {
-                                      return state.get_service_level_controller().get_distributed_service_levels();
+                                      return state.get_service_level_controller().get_distributed_service_levels(qos::query_context::user);
                                  } else {
                                      return state.get_service_level_controller().get_distributed_service_level(_service_level);
                                  }
--- a/cql3/statements/property_definitions.hh
+++ b/cql3/statements/property_definitions.hh
@@ -46,14 +46,14 @@ public:
 protected:
    std::optional<sstring> get_simple(const sstring& name) const;

-    std::optional<std::map<sstring, sstring>> get_map(const sstring& name) const;
-
    void remove_from_map_if_exists(const sstring& name, const sstring& key) const;
 public:
    bool has_property(const sstring& name) const;

    std::optional<value_type> get(const sstring& name) const;

+    std::optional<std::map<sstring, sstring>> get_map(const sstring& name) const;
+
    sstring get_string(sstring key, sstring default_value) const;

    // Return a property value, typed as a Boolean
--- a/cql3/statements/select_statement.cc
+++ b/cql3/statements/select_statement.cc
@@ -283,33 +283,44 @@ select_statement::make_partition_slice(const query_options& options) const
        std::reverse(bounds.begin(), bounds.end());
        ++_stats.reverse_queries;
    }
+
+    const uint64_t per_partition_limit = get_inner_loop_limit(get_limit(options, _per_partition_limit),
+        _selection->is_aggregate());
    return query::partition_slice(std::move(bounds),
-        std::move(static_columns), std::move(regular_columns), _opts, nullptr, get_per_partition_limit(options));
+        std::move(static_columns), std::move(regular_columns), _opts, nullptr, per_partition_limit);
 }

-uint64_t select_statement::do_get_limit(const query_options& options,
-                                        const std::optional<expr::expression>& limit,
-                                        const expr::unset_bind_variable_guard& limit_unset_guard,
-                                        uint64_t default_limit) const {
-    if (!limit.has_value() || limit_unset_guard.is_unset(options) || _selection->is_aggregate()) {
-        return default_limit;
-    }
-
-    auto val = expr::evaluate(*limit, options);
-    if (val.is_null()) {
-        throw exceptions::invalid_request_exception("Invalid null value of limit");
+select_statement::get_limit_result select_statement::get_limit(
+    const query_options& options, const std::optional<expr::expression>& limit) const
+{
+    if (!limit.has_value()) {
+        return bo::success(query::max_rows);
    }
    try {
+        auto val = expr::evaluate(*limit, options);
+        if (val.is_null()) {
+            return bo::failure(exceptions::invalid_request_exception("Invalid null value of limit"));
+        }
        auto l = val.view().validate_and_deserialize<int32_t>(*int32_type);
        if (l <= 0) {
-            throw exceptions::invalid_request_exception("LIMIT must be strictly positive");
+            return bo::failure(exceptions::invalid_request_exception("LIMIT must be strictly positive"));
        }
-        return l;
+        return bo::success(l);
    } catch (const marshal_exception& e) {
-        throw exceptions::invalid_request_exception("Invalid limit value");
+        return bo::failure(exceptions::invalid_request_exception("Invalid limit value"));
+    } catch (const exceptions::invalid_request_exception& e) {
+        return bo::failure(e);
    }
 }

+uint64_t select_statement::get_inner_loop_limit(const select_statement::get_limit_result& limit, bool is_aggregate)
+{
+    if (!limit.has_value() || is_aggregate) {
+        return query::max_rows;
+    }
+    return limit.value();
+}
+
 bool select_statement::needs_post_query_ordering() const {
    // We need post-query ordering only for queries with IN on the partition key and an ORDER BY.
    return _restrictions->key_is_in_relation() && !_parameters->orderings().empty();
@@ -358,7 +369,8 @@ select_statement::do_execute(query_processor& qp,

    validate_for_read(cl);

-    uint64_t limit = get_limit(options);
+    const auto parsed_limit = get_limit(options, _limit);
+    const uint64_t inner_loop_limit = get_inner_loop_limit(parsed_limit, _selection->is_aggregate());
    auto now = gc_clock::now();

    _stats.filtered_reads += _restrictions_need_filtering;
@@ -380,7 +392,7 @@ select_statement::do_execute(query_processor& qp,
            std::move(slice),
            max_result_size,
            query::tombstone_limit(qp.proxy().get_tombstone_limit()),
-            query::row_limit(limit),
+            query::row_limit(inner_loop_limit),
            query::partition_limit(query::max_partitions),
            now,
            tracing::make_trace_info(state.get_trace_state()),
@@ -393,14 +405,13 @@ select_statement::do_execute(query_processor& qp,

    _stats.unpaged_select_queries(_ks_sel) += page_size <= 0;

-    // An aggregation query will never be paged for the user, but we always page it internally to avoid OOM.
-    // If we user provided a page_size we'll use that to page internally (because why not), otherwise we use our default
-    // Note that if there are some nodes in the cluster with a version less than 2.0, we can't use paging (CASSANDRA-6707).
+    // An aggregation query may not be paged for the user, but we always page it internally to avoid OOM.
+    // If the user provided a page_size we'll use that to page internally (because why not), otherwise we use our default
    // Also note: all GROUP BY queries are considered aggregation.
    const bool aggregate = _selection->is_aggregate() || has_group_by();
    const bool nonpaged_filtering = _restrictions_need_filtering && page_size <= 0;
    if (aggregate || nonpaged_filtering) {
-        page_size = internal_paging_size;
+        page_size = page_size <= 0 ? internal_paging_size : std::min(page_size, internal_paging_size);
    }

    auto key_ranges = _restrictions->get_partition_key_ranges(options);
@@ -438,7 +449,9 @@ select_statement::do_execute(query_processor& qp,
                    *command, key_ranges))) {
        f = execute_without_checking_exception_message_non_aggregate_unpaged(qp, command, std::move(key_ranges), state, options, now);
    } else {
-        f = execute_without_checking_exception_message_aggregate_or_paged(qp, command, std::move(key_ranges), state, options, now, page_size, aggregate, nonpaged_filtering);
+        f = execute_without_checking_exception_message_aggregate_or_paged(qp, command,
+            std::move(key_ranges), state, options, now, page_size, aggregate,
+            nonpaged_filtering, parsed_limit.has_value() ? parsed_limit.value() : query::max_rows);
    }

    if (!tablet_info.has_value()) {
@@ -454,7 +467,8 @@ select_statement::do_execute(query_processor& qp,
 future<::shared_ptr<cql_transport::messages::result_message>>
 select_statement::execute_without_checking_exception_message_aggregate_or_paged(query_processor& qp,
        lw_shared_ptr<query::read_command> command, dht::partition_range_vector&& key_ranges, service::query_state& state,
-        const query_options& options, gc_clock::time_point now, int32_t page_size, bool aggregate, bool nonpaged_filtering) const {
+        const query_options& options, gc_clock::time_point now, int32_t page_size, bool aggregate, bool nonpaged_filtering,
+        uint64_t limit) const {
    command->slice.options.set<query::partition_slice::option::allow_short_read>();
    auto timeout_duration = get_timeout(state.get_client_state(), options);
    auto timeout = db::timeout_clock::now() + timeout_duration;
@@ -462,8 +476,11 @@ select_statement::execute_without_checking_exception_message_aggregate_or_paged(
            state, options, command, std::move(key_ranges), _restrictions_need_filtering ? _restrictions : nullptr);

    if (aggregate || nonpaged_filtering) {
-        auto builder = cql3::selection::result_set_builder(*_selection, now, *_group_by_cell_indices);
-        coordinator_result<void> result_void = co_await utils::result_do_until([&p] {return p->is_exhausted();},
+        auto builder = cql3::selection::result_set_builder(*_selection, now, *_group_by_cell_indices, limit);
+        coordinator_result<void> result_void = co_await utils::result_do_until(
+                [&p, &builder, limit] {
+                    return p->is_exhausted() || (limit < builder.result_set_size());
+                },
                [&p, &builder, page_size, now, timeout] {
                    return p->fetch_page_result(builder, page_size, now, timeout);
                }
@@ -586,7 +603,7 @@ indexed_table_select_statement::prepare_command_for_base_query(query_processor&
            std::move(slice),
            qp.proxy().get_max_result_size(slice),
            query::tombstone_limit(qp.proxy().get_tombstone_limit()),
-            query::row_limit(get_limit(options)),
+            query::row_limit(get_inner_loop_limit(get_limit(options, _limit), _selection->is_aggregate())),
            query::partition_limit(query::max_partitions),
            now,
            tracing::make_trace_info(state.get_trace_state()),
@@ -1368,7 +1385,8 @@ indexed_table_select_statement::find_index_partition_ranges(query_processor& qp,
    using value_type = std::tuple<dht::partition_range_vector, lw_shared_ptr<const service::pager::paging_state>>;
    auto now = gc_clock::now();
    auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
-    return read_posting_list(qp, options, get_limit(options), state, now, timeout, false).then(utils::result_wrap(
+    const uint64_t limit = get_inner_loop_limit(get_limit(options, _limit), _selection->is_aggregate());
+    return read_posting_list(qp, options, limit, state, now, timeout, false).then(utils::result_wrap(
            [this, &options] (::shared_ptr<cql_transport::messages::result_message::rows> rows) {
        auto rs = cql3::untyped_result_set(rows);
        dht::partition_range_vector partition_ranges;
@@ -1417,7 +1435,8 @@ indexed_table_select_statement::find_index_clustering_rows(query_processor& qp,
    using value_type = std::tuple<std::vector<indexed_table_select_statement::primary_key>, lw_shared_ptr<const service::pager::paging_state>>;
    auto now = gc_clock::now();
    auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
-    return read_posting_list(qp, options, get_limit(options), state, now, timeout, true).then(utils::result_wrap(
+    const uint64_t limit = get_inner_loop_limit(get_limit(options, _limit), _selection->is_aggregate());
+    return read_posting_list(qp, options, limit, state, now, timeout, true).then(utils::result_wrap(
            [this, &options] (::shared_ptr<cql_transport::messages::result_message::rows> rows) {

        auto rs = cql3::untyped_result_set(rows);
@@ -1683,6 +1702,7 @@ schema_ptr mutation_fragments_select_statement::generate_output_schema(schema_pt

 future<exceptions::coordinator_result<service::storage_proxy_coordinator_query_result>>
 mutation_fragments_select_statement::do_query(
+        locator::effective_replication_map_ptr erm_keepalive,
        locator::host_id this_node,
        service::storage_proxy& sp,
        schema_ptr schema,
@@ -1690,7 +1710,7 @@ mutation_fragments_select_statement::do_query(
        dht::partition_range_vector partition_ranges,
        db::consistency_level cl,
        service::storage_proxy_coordinator_query_options optional_params) const {
-    auto res = co_await replica::mutation_dump::dump_mutations(sp.get_db(), schema, _underlying_schema, partition_ranges, *cmd, optional_params.timeout(sp));
+    auto res = co_await replica::mutation_dump::dump_mutations(sp.get_db(), std::move(erm_keepalive), schema, _underlying_schema, partition_ranges, *cmd, optional_params.timeout(sp));
    service::replicas_per_token_range last_replicas;
    if (this_node) {
        last_replicas.emplace(dht::token_range::make_open_ended_both_sides(), std::vector<locator::host_id>{this_node});
@@ -1704,7 +1724,7 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu

    auto cl = options.get_consistency();

-    uint64_t limit = get_limit(options);
+    const uint64_t limit = get_inner_loop_limit(get_limit(options, _limit), _selection->is_aggregate());
    auto now = gc_clock::now();

    _stats.filtered_reads += _restrictions_need_filtering;
@@ -1762,7 +1782,7 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu
    if (!aggregate && !_restrictions_need_filtering && (page_size <= 0
            || !service::pager::query_pagers::may_need_paging(*_schema, page_size,
                    *command, key_ranges))) {
-        return do_query({}, qp.proxy(), _schema, command, std::move(key_ranges), cl,
+        return do_query(erm_keepalive, {}, qp.proxy(), _schema, command, std::move(key_ranges), cl,
                {timeout, state.get_permit(), state.get_client_state(), state.get_trace_state(), {}, {}})
        .then(wrap_result_to_error_message([this, erm_keepalive, now, slice = command->slice] (service::storage_proxy_coordinator_query_result&& qr) mutable {
            cql3::selection::result_set_builder builder(*_selection, now);
@@ -1801,8 +1821,8 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu
            std::move(key_ranges),
            _restrictions_need_filtering ? _restrictions : nullptr,
            [this, erm_keepalive, this_node] (service::storage_proxy& sp, schema_ptr schema, lw_shared_ptr<query::read_command> cmd, dht::partition_range_vector partition_ranges,
-                    db::consistency_level cl, service::storage_proxy_coordinator_query_options optional_params) {
-                return do_query(this_node, sp, std::move(schema), std::move(cmd), std::move(partition_ranges), cl, std::move(optional_params));
+                    db::consistency_level cl, service::storage_proxy_coordinator_query_options optional_params) mutable {
+                return do_query(std::move(erm_keepalive), this_node, sp, std::move(schema), std::move(cmd), std::move(partition_ranges), cl, std::move(optional_params));
            });

    if (_selection->is_trivial() && !_restrictions_need_filtering && !_per_partition_limit) {
@@ -2561,7 +2581,9 @@ std::unique_ptr<cql3::statements::raw::select_statement> build_select_statement(
    if (!where_clause.empty()) {
        out << " WHERE " << where_clause << " ALLOW FILTERING";
    }
-    return do_with_parser(out.str(), std::mem_fn(&cql3_parser::CqlParser::selectStatement));
+    // In general it's not a good idea to use the default dialect, but here the database is talking to
+    // itself, so we can hope the dialects are mutually compatible here.
+    return do_with_parser(out.str(), dialect{}, std::mem_fn(&cql3_parser::CqlParser::selectStatement));
 }

 }
--- a/cql3/statements/select_statement.hh
+++ b/cql3/statements/select_statement.hh
@@ -128,7 +128,7 @@ public:

    future<::shared_ptr<cql_transport::messages::result_message>> execute_without_checking_exception_message_aggregate_or_paged(query_processor& qp,
        lw_shared_ptr<query::read_command> cmd, dht::partition_range_vector&& partition_ranges, service::query_state& state,
-         const query_options& options, gc_clock::time_point now, int32_t page_size, bool aggregate, bool nonpaged_filtering) const;
+         const query_options& options, gc_clock::time_point now, int32_t page_size, bool aggregate, bool nonpaged_filtering, uint64_t limit) const;


    struct primary_key {
@@ -152,13 +152,10 @@ public:
    db::timeout_clock::duration get_timeout(const service::client_state& state, const query_options& options) const;

 protected:
-    uint64_t do_get_limit(const query_options& options, const std::optional<expr::expression>& limit, const expr::unset_bind_variable_guard& unset_guard, uint64_t default_limit) const;
-    uint64_t get_limit(const query_options& options) const {
-        return do_get_limit(options, _limit, _limit_unset_guard, query::max_rows);
-    }
-    uint64_t get_per_partition_limit(const query_options& options) const {
-        return do_get_limit(options, _per_partition_limit, _per_partition_limit_unset_guard, query::partition_max_rows);
-    }
+    using get_limit_result = bo::result<uint64_t, exceptions::invalid_request_exception>;
+    get_limit_result get_limit(const query_options& options, const std::optional<expr::expression>& limit) const;
+    static uint64_t get_inner_loop_limit(const select_statement::get_limit_result& limit, bool is_aggregate);
+
    bool needs_post_query_ordering() const;
    virtual void update_stats_rows_read(int64_t rows_read) const {
        _stats.rows_read += rows_read;
@@ -338,6 +335,7 @@ public:
 private:
    future<exceptions::coordinator_result<service::storage_proxy_coordinator_query_result>>
    do_query(
+            locator::effective_replication_map_ptr erm_keepalive,
            locator::host_id this_node,
            service::storage_proxy& sp,
            schema_ptr schema,
--- a/cql3/util.cc
+++ b/cql3/util.cc
@@ -20,7 +20,7 @@ void __sanitizer_finish_switch_fiber(void* fake_stack_save, const void** stack_b

 namespace cql3::util {

-static void do_with_parser_impl_impl(const sstring_view& cql, noncopyable_function<void (cql3_parser::CqlParser& parser)> f) {
+static void do_with_parser_impl_impl(const sstring_view& cql, dialect d, noncopyable_function<void (cql3_parser::CqlParser& parser)> f) {
    cql3_parser::CqlLexer::collector_type lexer_error_collector(cql);
    cql3_parser::CqlParser::collector_type parser_error_collector(cql);
    cql3_parser::CqlLexer::InputStreamType input{reinterpret_cast<const ANTLR_UINT8*>(cql.begin()), ANTLR_ENC_UTF8, static_cast<ANTLR_UINT32>(cql.size()), nullptr};
@@ -29,13 +29,14 @@ static void do_with_parser_impl_impl(const sstring_view& cql, noncopyable_functi
    cql3_parser::CqlParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
    cql3_parser::CqlParser parser{&tstream};
    parser.set_error_listener(parser_error_collector);
+    parser.set_dialect(d);
    f(parser);
 }

 #ifndef DEBUG

-void do_with_parser_impl(const sstring_view& cql, noncopyable_function<void (cql3_parser::CqlParser& parser)> f) {
-    return do_with_parser_impl_impl(cql, std::move(f));
+void do_with_parser_impl(const sstring_view& cql, dialect d, noncopyable_function<void (cql3_parser::CqlParser& parser)> f) {
+    return do_with_parser_impl_impl(cql, d, std::move(f));
 }

 #else
@@ -47,6 +48,7 @@ void do_with_parser_impl(const sstring_view& cql, noncopyable_function<void (cql
 struct thunk_args {
    // arguments to do_with_parser_impl_impl
    const sstring_view& cql;
+    dialect d;
    noncopyable_function<void (cql3_parser::CqlParser&)>&& func;
    // Exceptions can't be returned from another stack, so store
    // any thrown exception here
@@ -70,7 +72,7 @@ static void thunk(int p1, int p2) {
    // Complete stack switch started in do_with_parser_impl()
    __sanitizer_finish_switch_fiber(nullptr, &san.stack_bottom, &san.stack_size);
    try {
-        do_with_parser_impl_impl(args->cql, std::move(args->func));
+        do_with_parser_impl_impl(args->cql, args->d, std::move(args->func));
    } catch (...) {
        args->ex = std::current_exception();
    }
@@ -79,11 +81,12 @@ static void thunk(int p1, int p2) {
    setcontext(&args->caller_stack);
 };

-void do_with_parser_impl(const sstring_view& cql, noncopyable_function<void (cql3_parser::CqlParser& parser)> f) {
+void do_with_parser_impl(const sstring_view& cql, dialect d, noncopyable_function<void (cql3_parser::CqlParser& parser)> f) {
    static constexpr size_t stack_size = 1 << 20;
    static thread_local std::unique_ptr<char[]> stack = std::make_unique<char[]>(stack_size);
    thunk_args args{
        .cql = cql,
+        .d = d,
        .func = std::move(f),
    };
    ucontext_t uc;
@@ -92,7 +95,7 @@ void do_with_parser_impl(const sstring_view& cql, noncopyable_function<void (cql
    if (stack.get() <= (char*)&uc && (char*)&uc < stack.get() + stack_size) {
        // We are already running on the large stack, so just call the
        // parser directly.
-        return do_with_parser_impl_impl(cql, std::move(f));
+        return do_with_parser_impl_impl(cql, d, std::move(f));
    }
    uc.uc_stack.ss_sp = stack.get();
    uc.uc_stack.ss_size = stack_size;
@@ -136,12 +139,12 @@ sstring relations_to_where_clause(const expr::expression& e) {
    return boost::algorithm::join(expressions, " AND ");
 }

-expr::expression where_clause_to_relations(const sstring_view& where_clause) {
-    return do_with_parser(where_clause, std::mem_fn(&cql3_parser::CqlParser::whereClause));
+expr::expression where_clause_to_relations(const sstring_view& where_clause, dialect d) {
+    return do_with_parser(where_clause, d, std::mem_fn(&cql3_parser::CqlParser::whereClause));
 }

-sstring rename_column_in_where_clause(const sstring_view& where_clause, column_identifier::raw from, column_identifier::raw to) {
-    std::vector<expr::expression> relations = boolean_factors(where_clause_to_relations(where_clause));
+sstring rename_column_in_where_clause(const sstring_view& where_clause, column_identifier::raw from, column_identifier::raw to, dialect d) {
+    std::vector<expr::expression> relations = boolean_factors(where_clause_to_relations(where_clause, d));
    std::vector<expr::expression> new_relations;
    new_relations.reserve(relations.size());

--- a/cql3/util.hh
+++ b/cql3/util.hh
@@ -21,18 +21,19 @@
 #include "cql3/CqlParser.hpp"
 #include "cql3/error_collector.hh"
 #include "cql3/statements/raw/select_statement.hh"
+#include "cql3/dialect.hh"

 namespace cql3 {

 namespace util {


-void do_with_parser_impl(const sstring_view& cql, noncopyable_function<void (cql3_parser::CqlParser& p)> func);
+void do_with_parser_impl(const sstring_view& cql, dialect d, noncopyable_function<void (cql3_parser::CqlParser& p)> func);

 template <typename Func, typename Result = cql3_parser::unwrap_uninitialized_t<std::invoke_result_t<Func, cql3_parser::CqlParser&>>>
-Result do_with_parser(const sstring_view& cql, Func&& f) {
+Result do_with_parser(const sstring_view& cql, dialect d, Func&& f) {
    std::optional<Result> ret;
-    do_with_parser_impl(cql, [&] (cql3_parser::CqlParser& parser) {
+    do_with_parser_impl(cql, d, [&] (cql3_parser::CqlParser& parser) {
        ret.emplace(f(parser));
    });
    return std::move(*ret);
@@ -40,9 +41,9 @@ Result do_with_parser(const sstring_view& cql, Func&& f) {

 sstring relations_to_where_clause(const expr::expression& e);

-expr::expression where_clause_to_relations(const sstring_view& where_clause);
+expr::expression where_clause_to_relations(const sstring_view& where_clause, dialect d);

-sstring rename_column_in_where_clause(const sstring_view& where_clause, column_identifier::raw from, column_identifier::raw to);
+sstring rename_column_in_where_clause(const sstring_view& where_clause, column_identifier::raw from, column_identifier::raw to, dialect d);

 /// build a CQL "select" statement with the desired parameters.
 /// If select_all_columns==true, all columns are selected and the value of
--- a/db/commitlog/commitlog.cc
+++ b/db/commitlog/commitlog.cc
@@ -1100,7 +1100,12 @@ public:
            write(out, uint64_t(0));
        }

-        buf.remove_suffix(buf.size_bytes() - size);
+        auto to_remove = buf.size_bytes() - size;
+        // #20862 - we decrement the usage counter based on buf.size() below.
+        // Since we are shrinking the buffer here, we also need to decrement
+        // the counter by the removed amount now.
+        buf.remove_suffix(to_remove);
+        _segment_manager->totals.buffer_list_bytes -= to_remove;

        // Build sector checksums.
        auto id = net::hton(_desc.id);
@@ -3221,6 +3226,10 @@ uint64_t db::commitlog::get_total_size() const {
        ;
 }

+uint64_t db::commitlog::get_buffer_size() const {
+    return _segment_manager->totals.buffer_list_bytes;
+}
+
 uint64_t db::commitlog::get_completed_tasks() const {
    return _segment_manager->totals.allocation_count;
 }
--- a/db/commitlog/commitlog.hh
+++ b/db/commitlog/commitlog.hh
@@ -297,6 +297,7 @@ public:
    future<> delete_segments(std::vector<sstring>) const;

    uint64_t get_total_size() const;
+    uint64_t get_buffer_size() const;
    uint64_t get_completed_tasks() const;
    uint64_t get_flush_count() const;
    uint64_t get_pending_tasks() const;
--- a/db/config.cc
+++ b/db/config.cc
@@ -99,6 +99,21 @@ error_injection_list_to_json(const std::vector<db::config::error_injection_at_st
    return value_to_json("error_injection_list");
 }

+template <>
+bool
+config_from_string(std::string_view value) {
+    // boost::lexical_cast doesn't accept true/false, which are our output representations
+    // for bools. We want round-tripping, so we need to accept true/false. For backward
+    // compatibility, we also accept 1/0. #19791.
+    if (value == "true" || value == "1") {
+        return true;
+    } else if (value == "false" || value == "0") {
+        return false;
+    } else {
+        throw boost::bad_lexical_cast(typeid(std::string_view), typeid(bool));
+    }
+}
+
 template <>
 const config_type config_type_for<bool> = config_type("bool", value_to_json<bool>);

@@ -177,7 +192,7 @@ struct convert<seastar::log_level> {
        if (!convert<std::string>::decode(node, tmp)) {
            return false;
        }
-        rhs = boost::lexical_cast<seastar::log_level>(tmp);
+        rhs = utils::config_from_string<seastar::log_level>(tmp);
        return true;
    }
 };
@@ -1057,6 +1072,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
            "Make the system.config table UPDATEable.")
    , enable_parallelized_aggregation(this, "enable_parallelized_aggregation", liveness::LiveUpdate, value_status::Used, true,
            "Use on a new, parallel algorithm for performing aggregate queries.")
+    , cql_duplicate_bind_variable_names_refer_to_same_variable(this, "cql_duplicate_bind_variable_names_refer_to_same_variable", liveness::LiveUpdate, value_status::Used, true,
+            "A bind variable that appears twice in a CQL query refers to a single variable (if false, no name matching is performed).")
    , alternator_port(this, "alternator_port", value_status::Used, 0, "Alternator API port.")
    , alternator_https_port(this, "alternator_https_port", value_status::Used, 0, "Alternator API HTTPS port.")
    , alternator_address(this, "alternator_address", value_status::Used, "0.0.0.0", "Alternator API listening address.")
--- a/db/config.hh
+++ b/db/config.hh
@@ -399,6 +399,7 @@ public:
    named_value<bool> enable_optimized_reversed_reads;
    named_value<bool> enable_cql_config_updates;
    named_value<bool> enable_parallelized_aggregation;
+    named_value<bool> cql_duplicate_bind_variable_names_refer_to_same_variable;

    named_value<uint16_t> alternator_port;
    named_value<uint16_t> alternator_https_port;
--- a/db/consistency_level.cc
+++ b/db/consistency_level.cc
@@ -36,7 +36,7 @@ size_t quorum_for(const locator::effective_replication_map& erm) {
 size_t local_quorum_for(const locator::effective_replication_map& erm, const sstring& dc) {
    using namespace locator;

-    auto& rs = erm.get_replication_strategy();
+    const auto& rs = erm.get_replication_strategy();

    if (rs.get_type() == replication_strategy_type::network_topology) {
        const network_topology_strategy* nrs =
@@ -65,7 +65,7 @@ size_t block_for_local_serial(const locator::effective_replication_map& erm) {
 size_t block_for_each_quorum(const locator::effective_replication_map& erm) {
    using namespace locator;

-    auto& rs = erm.get_replication_strategy();
+    const auto& rs = erm.get_replication_strategy();

    if (rs.get_type() == replication_strategy_type::network_topology) {
        const network_topology_strategy* nrs =
@@ -260,7 +260,7 @@ filter_for_query(consistency_level cl,
    size_t bf = block_for(erm, cl);

    if (read_repair == read_repair_decision::DC_LOCAL) {
-        bf = std::max(block_for(erm, cl), local_count);
+        bf = std::max(bf, local_count);
    }

    if (bf >= live_endpoints.size()) { // RRD.DC_LOCAL + CL.LOCAL or CL.ALL
@@ -334,7 +334,13 @@ filter_for_query(consistency_level cl,
        if (!old_node && ht_max - ht_min > 0.01) { // if there is old node or hit rates are close skip calculations
            // local node is always first if present (see storage_proxy::get_endpoints_for_reading)
            unsigned local_idx = erm.get_topology().is_me(epi[0].first) ? 0 : epi.size() + 1;
-            live_endpoints = boost::copy_range<inet_address_vector_replica_set>(miss_equalizing_combination(epi, local_idx, remaining_bf, bool(extra)));
+            auto weighted = boost::copy_range<inet_address_vector_replica_set>(miss_equalizing_combination(epi, local_idx, remaining_bf, bool(extra)));
+            // Workaround for https://github.com/scylladb/scylladb/issues/9285
+            auto last = std::adjacent_find(weighted.begin(), weighted.end());
+            if (last == weighted.end()) {
+                // No duplicates, so use the result based on hit rates
+                live_endpoints = std::move(weighted);
+            }
        }
    }

--- a/db/cql_type_parser.cc
+++ b/db/cql_type_parser.cc
@@ -20,7 +20,9 @@
 #include "utils/sorting.hh"

 static ::shared_ptr<cql3::cql3_type::raw> parse_raw(const sstring& str) {
-    return cql3::util::do_with_parser(str,
+    // In general it's a bad idea to use the default dialect, but type parsing
+    // should be dialect-agnostic.
+    return cql3::util::do_with_parser(str, cql3::dialect{},
        [] (cql3_parser::CqlParser& parser) {
            return parser.comparator_type(true);
        });
--- a/db/hints/internal/hint_endpoint_manager.cc
+++ b/db/hints/internal/hint_endpoint_manager.cc
@@ -167,6 +167,7 @@ future<db::commitlog> hint_endpoint_manager::add_store() noexcept {
        return io_check([name = _hints_dir.c_str()] { return recursive_touch_directory(name); }).then([this] () {
            commitlog::config cfg;

+            cfg.sched_group = _shard_manager.local_db().commitlog()->active_config().sched_group;
            cfg.commit_log_location = _hints_dir.c_str();
            cfg.commitlog_segment_size_in_mb = resource_manager::hint_segment_size_in_mb;
            cfg.commitlog_total_space_in_mb = resource_manager::max_hints_per_ep_size_mb;
--- a/db/hints/internal/hint_sender.cc
+++ b/db/hints/internal/hint_sender.cc
@@ -76,23 +76,6 @@ future<timespec> hint_sender::get_last_file_modification(const sstring& fname) {
    });
 }

-future<> hint_sender::do_send_one_mutation(frozen_mutation_and_schema m, locator::effective_replication_map_ptr ermp, const inet_address_vector_replica_set& natural_endpoints) {
-    return futurize_invoke([this, m = std::move(m), ermp = std::move(ermp), &natural_endpoints] () mutable -> future<> {
-        // The fact that we send with CL::ALL in both cases below ensures that new hints are not going
-        // to be generated as a result of hints sending.
-        const auto& tm = ermp->get_token_metadata();
-        const auto maybe_addr = tm.get_endpoint_for_host_id_if_known(end_point_key());
-
-        if (maybe_addr && boost::range::find(natural_endpoints, *maybe_addr) != natural_endpoints.end()) {
-            manager_logger.trace("Sending directly to {}", end_point_key());
-            return _proxy.send_hint_to_endpoint(std::move(m), std::move(ermp), *maybe_addr);
-        } else {
-            manager_logger.trace("Endpoints set has changed and {} is no longer a replica. Mutating from scratch...", end_point_key());
-            return _proxy.send_hint_to_all_replicas(std::move(m));
-        }
-    });
-}
-
 bool hint_sender::can_send() noexcept {
    if (stopping() && !draining()) {
        return false;
@@ -274,11 +257,30 @@ void hint_sender::start() {
 }

 future<> hint_sender::send_one_mutation(frozen_mutation_and_schema m) {
-    auto erm = _db.find_column_family(m.s).get_effective_replication_map();
+    auto ermp = _db.find_column_family(m.s).get_effective_replication_map();
    auto token = dht::get_token(*m.s, m.fm.key());
-    inet_address_vector_replica_set natural_endpoints = erm->get_natural_endpoints(std::move(token));
+    inet_address_vector_replica_set natural_endpoints = ermp->get_natural_endpoints(std::move(token));

-    return do_send_one_mutation(std::move(m), std::move(erm), std::move(natural_endpoints));
+    return futurize_invoke([this, m = std::move(m), ermp = std::move(ermp), &natural_endpoints] () mutable -> future<> {
+        // The fact that we send with CL::ALL in both cases below ensures that new hints are not going
+        // to be generated as a result of hints sending.
+        const auto& tm = ermp->get_token_metadata();
+        const auto maybe_addr = tm.get_endpoint_for_host_id_if_known(end_point_key());
+
+        if (maybe_addr && boost::range::find(natural_endpoints, *maybe_addr) != natural_endpoints.end() && !tm.is_leaving(end_point_key())) {
+            manager_logger.trace("Sending directly to {}", end_point_key());
+            return _proxy.send_hint_to_endpoint(std::move(m), std::move(ermp), *maybe_addr);
+        } else {
+            if (manager_logger.is_enabled(log_level::trace)) {
+                if (tm.is_leaving(end_point_key())) {
+                    manager_logger.trace("The original target endpoint {} is leaving. Mutating from scratch...", end_point_key());
+                } else {
+                    manager_logger.trace("Endpoints set has changed and {} is no longer a replica. Mutating from scratch...", end_point_key());
+                }
+            }
+            return _proxy.send_hint_to_all_replicas(std::move(m));
+        }
+    });
 }

 future<> hint_sender::send_one_hint(lw_shared_ptr<send_one_file_ctx> ctx_ptr, fragmented_temporary_buffer buf, db::replay_position rp, gc_clock::duration secs_since_file_mod, const sstring& fname) {
--- a/db/hints/internal/hint_sender.hh
+++ b/db/hints/internal/hint_sender.hh
@@ -233,18 +233,14 @@ private:
    /// \return
    const column_mapping& get_column_mapping(lw_shared_ptr<send_one_file_ctx> ctx_ptr, const frozen_mutation& fm, const hint_entry_reader& hr);

-    /// \brief Perform a single mutation send attempt.
+    /// \brief Send one mutation out.
    ///
    /// If the original destination end point is still a replica for the given mutation - send the mutation directly
    /// to it, otherwise execute the mutation "from scratch" with CL=ALL.
    ///
-    /// \param m mutation to send
-    /// \param ermp points to the effective_replication_map used to obtain \c natural_endpoints
-    /// \param natural_endpoints current replicas for the given mutation
-    /// \return future that resolves when the operation is complete
-    future<> do_send_one_mutation(frozen_mutation_and_schema m, locator::effective_replication_map_ptr ermp, const inet_address_vector_replica_set& natural_endpoints);
-
-    /// \brief Send one mutation out.
+    /// The mutation is also sent with CL=ALL semantics to all current replicas if the original destination
+    /// is leaving the cluster; otherwise the hint might be applied only on the leaving node and streaming might
+    /// miss it.
    ///
    /// \param m mutation to send
    /// \return future that resolves when the mutation sending processing is complete.
--- a/db/hints/manager.hh
+++ b/db/hints/manager.hh
@@ -34,8 +34,6 @@
 #include <span>
 #include <unordered_map>

-class fragmented_temporary_buffer;
-
 namespace utils {
 class directories;
 } // namespace utils
--- a/db/schema_tables.cc
+++ b/db/schema_tables.cc
@@ -779,40 +779,35 @@ redact_columns_for_missing_features(mutation&& m, schema_features features) {
 */
 future<table_schema_version> calculate_schema_digest(distributed<service::storage_proxy>& proxy, schema_features features, noncopyable_function<bool(std::string_view)> accept_keyspace)
 {
-    auto map = [&proxy, features, accept_keyspace = std::move(accept_keyspace)] (sstring table) mutable -> future<std::vector<mutation>> {
+    using mutations_generator = coroutine::experimental::generator<mutation>;
+
+    auto map = [&proxy, features, accept_keyspace = std::move(accept_keyspace)] (sstring table) mutable -> mutations_generator {
        auto& db = proxy.local().get_db();
        auto rs = co_await db::system_keyspace::query_mutations(db, NAME, table);
        auto s = db.local().find_schema(NAME, table);
-        std::vector<mutation> mutations;
        for (auto&& p : rs->partitions()) {
-            auto mut = co_await unfreeze_gently(p.mut(), s);
-            auto partition_key = value_cast<sstring>(utf8_type->deserialize(mut.key().get_component(*s, 0)));
+            auto partition_key = value_cast<sstring>(utf8_type->deserialize(::partition_key(p.mut().key()).get_component(*s, 0)));
            if (!accept_keyspace(partition_key)) {
                continue;
            }
-            mut = redact_columns_for_missing_features(std::move(mut), features);
-            mutations.emplace_back(std::move(mut));
-        }
-        co_return mutations;
-    };
-    auto reduce = [features] (auto& hash, auto&& mutations) {
-        for (const mutation& m : mutations) {
-            feed_hash_for_schema_digest(hash, m, features);
+            auto mut = co_await unfreeze_gently(p.mut(), s);
+            co_yield redact_columns_for_missing_features(std::move(mut), features);
        }
    };
    auto hash = md5_hasher();
    auto tables = all_table_names(features);
    {
        for (auto& table: tables) {
-            auto mutations = co_await map(table);
-            if (diff_logger.is_enabled(logging::log_level::trace)) {
-                for (const mutation& m : mutations) {
+            auto gen_mutations = map(table);
+            while (auto mut_opt = co_await gen_mutations()) {
+                auto& m = *mut_opt;
+                feed_hash_for_schema_digest(hash, m, features);
+                if (diff_logger.is_enabled(logging::log_level::trace)) {
                    md5_hasher h;
                    feed_hash_for_schema_digest(h, m, features);
                    diff_logger.trace("Digest {} for {}, compacted={}", h.finalize(), m, compact_for_schema_digest(m));
                }
            }
-            reduce(hash, mutations);
        }
        co_return utils::UUID_gen::get_name_UUID(hash.finalize());
    }
@@ -1948,7 +1943,9 @@ static shared_ptr<cql3::functions::user_aggregate> create_aggregate(replica::dat

    bytes_opt initcond = std::nullopt;
    if (initcond_str) {
-        auto expr = cql3::util::do_with_parser(*initcond_str, std::mem_fn(&cql3_parser::CqlParser::term));
+        // In general using the default dialect is wrong, but here the database is communicating with itself,
+        // not the user, so any dialect should work.
+        auto expr = cql3::util::do_with_parser(*initcond_str, cql3::dialect{}, std::mem_fn(&cql3_parser::CqlParser::term));
        auto dummy_ident = ::make_shared<cql3::column_identifier>("", true);
        auto column_spec = make_lw_shared<cql3::column_specification>("", "", dummy_ident, state_type);
        auto raw = cql3::expr::evaluate(prepare_expression(expr, db.as_data_dictionary(), "", nullptr, {column_spec}), cql3::query_options::DEFAULT);
--- a/db/system_distributed_keyspace.cc
+++ b/db/system_distributed_keyspace.cc
@@ -756,8 +756,8 @@ system_distributed_keyspace::get_cdc_desc_v1_timestamps(context ctx) {
    co_return res;
 }

-future<qos::service_levels_info> system_distributed_keyspace::get_service_levels() const {
-    return qos::get_service_levels(_qp, NAME, SERVICE_LEVELS, db::consistency_level::ONE);
+future<qos::service_levels_info> system_distributed_keyspace::get_service_levels(qos::query_context ctx) const {
+    return qos::get_service_levels(_qp, NAME, SERVICE_LEVELS, db::consistency_level::ONE, ctx);
 }

 future<qos::service_levels_info> system_distributed_keyspace::get_service_level(sstring service_level_name) const {
--- a/db/system_distributed_keyspace.hh
+++ b/db/system_distributed_keyspace.hh
@@ -112,7 +112,7 @@ public:

    future<db_clock::time_point> cdc_current_generation_timestamp(context);

-    future<qos::service_levels_info> get_service_levels() const;
+    future<qos::service_levels_info> get_service_levels(qos::query_context ctx) const;
    future<qos::service_levels_info> get_service_level(sstring service_level_name) const;
    future<> set_service_level(sstring service_level_name, qos::service_level_options slo) const;
    future<> drop_service_level(sstring service_level_name) const;
--- a/db/view/view.cc
+++ b/db/view/view.cc
@@ -1673,7 +1673,22 @@ get_view_natural_endpoint(
        return {};
    }
    auto replica = view_endpoints[base_it - base_endpoints.begin()];
-    return view_topology.get_node(replica).endpoint();
+
+    // https://github.com/scylladb/scylladb/issues/19439
+    // With tablets, a node being replaced might transition to "left" state
+    // but still be kept as a replica. In such case, the IP of the replaced
+    // node will be lost and `endpoint()` will return an empty IP here.
+    // As of writing this, storage proxy was not migrated to host IDs yet
+    // (#6403) and hints are not prepared to handle nodes that are left
+    // but are still replicas. Therefore, there is no other sensible option
+    // right now but to give up the attempt to send the update or write a hint
+    // to the paired, permanently down replica.
+    const auto ep = view_topology.get_node(replica).endpoint();
+    if (ep != gms::inet_address{}) {
+        return ep;
+    } else {
+        return std::nullopt;
+    }
 }

 static future<> apply_to_remote_endpoints(service::storage_proxy& proxy, locator::effective_replication_map_ptr ermp,
@@ -2210,11 +2225,11 @@ view_builder::view_build_statuses(sstring keyspace, sstring view_name) const {

 future<> view_builder::add_new_view(view_ptr view, build_step& step) {
    vlogger.info0("Building view {}.{}, starting at token {}", view->ks_name(), view->cf_name(), step.current_token());
+    if (this_shard_id() == 0) {
+        co_await _sys_dist_ks.start_view_build(view->ks_name(), view->cf_name());
+    }
+    co_await _sys_ks.register_view_for_building(view->ks_name(), view->cf_name(), step.current_token());
    step.build_status.emplace(step.build_status.begin(), view_build_status{view, step.current_token(), std::nullopt});
-    auto f = this_shard_id() == 0 ? _sys_dist_ks.start_view_build(view->ks_name(), view->cf_name()) : make_ready_future<>();
-    return when_all_succeed(
-            std::move(f),
-            _sys_ks.register_view_for_building(view->ks_name(), view->cf_name(), step.current_token())).discard_result();
 }

 static future<> flush_base(lw_shared_ptr<replica::column_family> base, abort_source& as) {
@@ -2541,6 +2556,12 @@ public:
                    _step.build_status.pop_back();
                }
            }
+
+            // before going back to the minimum token, advance current_key to the end
+            // and check for built views in that range.
+            _step.current_key = {_step.prange.end().value_or(dht::ring_position::max()).value().token(), partition_key::make_empty()};
+            check_for_built_views();
+
            _step.current_key = {dht::minimum_token(), partition_key::make_empty()};
            for (auto&& vs : _step.build_status) {
                vs.next_token = dht::minimum_token();
@@ -2705,16 +2726,16 @@ future<> view_builder::register_staging_sstable(sstables::shared_sstable sst, lw
    return _vug.register_staging_sstable(std::move(sst), std::move(table));
 }

-future<bool> check_needs_view_update_path(view_builder& vb, const locator::token_metadata& tm, const replica::table& t, streaming::stream_reason reason) {
+future<bool> check_needs_view_update_path(view_builder& vb, locator::token_metadata_ptr tmptr, const replica::table& t, streaming::stream_reason reason) {
    if (is_internal_keyspace(t.schema()->ks_name())) {
        return make_ready_future<bool>(false);
    }
    if (reason == streaming::stream_reason::repair && !t.views().empty()) {
        return make_ready_future<bool>(true);
    }
-    return do_with(t.views(), [&vb, &tm] (auto& views) {
+    return do_with(std::move(tmptr), t.views(), [&vb] (locator::token_metadata_ptr& tmptr, auto& views) {
        return map_reduce(views,
-                [&vb, &tm] (const view_ptr& view) { return vb.check_view_build_ongoing(tm, view->ks_name(), view->cf_name()); },
+                [&] (const view_ptr& view) { return vb.check_view_build_ongoing(*tmptr, view->ks_name(), view->cf_name()); },
                false,
                std::logical_or<bool>());
    });
--- a/db/view/view_update_checks.hh
+++ b/db/view/view_update_checks.hh
@@ -10,20 +10,17 @@

 #include <seastar/core/future.hh>
 #include "streaming/stream_reason.hh"
+#include "locator/token_metadata_fwd.hh"
 #include "seastarx.hh"

 namespace replica {
 class table;
 }

-namespace locator {
-class token_metadata;
-}
-
 namespace db::view {
 class view_builder;

-future<bool> check_needs_view_update_path(view_builder& vb, const locator::token_metadata& tm, const replica::table& t,
+future<bool> check_needs_view_update_path(view_builder& vb, locator::token_metadata_ptr tmptr, const replica::table& t,
        streaming::stream_reason reason);

 }
--- a/dist/common/scripts/scylla_coredump_setup
+++ b/dist/common/scripts/scylla_coredump_setup
@@ -40,6 +40,25 @@ if __name__ == '__main__':
                        help='enable compress on systemd-coredump')
    args = parser.parse_args()

+    # It seems that a specific version of the systemd package on RHEL9 has a bug
+    # in its SELinux configuration: it introduced the "systemd-container-coredump"
+    # module to provide rules for systemd-coredump, but did not enable it by default.
+    # We have to load it manually, otherwise it causes a permission error.
+    # (#19325)
+    if is_redhat_variant() and distro.major_version() == '9':
+        if not shutil.which('getenforce'):
+            pkg_install('libselinux-utils')
+        if not shutil.which('semodule'):
+            pkg_install('policycoreutils')
+        enforce = out('getenforce')
+        if enforce != "Disabled":
+            if os.path.exists('/usr/share/selinux/packages/targeted/systemd-container-coredump.pp.bz2'):
+                modules = out('semodule -l')
+                match = re.search(r'^systemd-container-coredump$', modules, re.MULTILINE)
+                if not match:
+                    run('semodule -v -i /usr/share/selinux/packages/targeted/systemd-container-coredump.pp.bz2', shell=True, check=True)
+                    run('semodule -v -e systemd-container-coredump', shell=True, check=True)
+
    # abrt-ccpp.service needs to stop before enabling systemd-coredump,
    # since both will try to install kernel coredump handler
    # (This will only requires for abrt < 2.14)
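A note on scanning multi-line `semodule -l` output for a module name: per the Python `re` documentation, `re.match` only matches at the start of the string, even with `re.MULTILINE`, whereas `re.search` lets `^` match at the start of every line. The following is a minimal sketch, assuming the output lists one module name per line; `module_listed` and the sample output are hypothetical and not part of `scylla_coredump_setup`.

```python
import re


def module_listed(modules: str, name: str) -> bool:
    """Return True if `name` appears on a line of its own in `modules`.

    Hypothetical helper for illustration only.
    """
    # re.escape guards against regex metacharacters in the module name.
    pattern = rf'^{re.escape(name)}$'
    # re.search + MULTILINE: '^' and '$' match at every line boundary.
    return re.search(pattern, modules, re.MULTILINE) is not None


# Made-up sample output: the module of interest is not on the first line.
sample = "abrt\nsystemd-container-coredump\nzebra"

found_with_search = module_listed(sample, "systemd-container-coredump")
# re.match is anchored at position 0 of the whole string, MULTILINE or not.
found_with_match = re.match(r'^systemd-container-coredump$', sample, re.MULTILINE) is not None
```

With this sample, only the `re.search` variant finds the module, because the name is not on the first line of the output.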
--- a/dist/common/scripts/scylla_raid_setup
+++ b/dist/common/scripts/scylla_raid_setup
@@ -16,6 +16,7 @@ import sys
 import stat
 import logging
 import pyudev
+import psutil
 from pathlib import Path
 from scylla_util import *
 from subprocess import run, SubprocessError
@@ -92,6 +93,15 @@ class UdevInfo:
    def id_links(self):
        return [l for l in self.device.device_links if l.startswith('/dev/disk/by-id')]

+
+def is_selinux_enabled():
+    partitions = psutil.disk_partitions(all=True)
+    for p in partitions:
+        if p.fstype == 'selinuxfs':
+            if os.path.exists(p.mountpoint + '/enforce'):
+                return True
+    return False
+
 if __name__ == '__main__':
    if os.getuid() > 0:
        print('Requires root permission.')
@@ -325,9 +335,51 @@ WantedBy=local-fs.target
        os.chown(dpath, uid, gid)

    if is_debian_variant():
+        if not shutil.which('update-initramfs'):
+            pkg_install('initramfs-tools')
        run('update-initramfs -u', shell=True, check=True)

    if not udev_info.uuid_link:
        LOGGER.error(f'Error detected, dumping udev env parameters on {fsdev}')
        udev_info.verify()
        udev_info.dump_variables()
+
+    if is_redhat_variant():
+        offline_skip_relabel = False
+        has_semanage = True
+        if not shutil.which('matchpathcon'):
+            offline_skip_relabel = True
+            pkg_install('libselinux-utils', offline_exit=False)
+        if not shutil.which('restorecon'):
+            offline_skip_relabel = True
+            pkg_install('policycoreutils', offline_exit=False)
+        if not shutil.which('semanage'):
+            if is_offline():
+                has_semanage = False
+            else:
+                pkg_install('policycoreutils-python-utils')
+        if is_offline() and offline_skip_relabel:
+            print('Unable to find SELinux tools, skip relabeling.')
+            sys.exit(0)
+
+        selinux_context = out('matchpathcon -n /var/lib/systemd/coredump')
+        selinux_type = selinux_context.split(':')[2]
+        if has_semanage:
+            run(f'semanage fcontext -a -t {selinux_type} "{root}/coredump(/.*)?"', shell=True, check=True)
+        else:
+            # without semanage, we need to update file_contexts directly,
+            # and compile it to binary format (.bin file)
+            try:
+                with open('/etc/selinux/targeted/contexts/files/file_contexts.local', 'a') as f:
+                    spacer = ''
+                    if f.tell() != 0:
+                        spacer = '\n'
+                    f.write(f'{spacer}{root}/coredump(/.*)?   {selinux_context}\n')
+            except FileNotFoundError as e:
+                print('Unable to find SELinux policy files, skip relabeling.')
+                sys.exit(0)
+            run('sefcontext_compile /etc/selinux/targeted/contexts/files/file_contexts.local', shell=True, check=True)
+        if is_selinux_enabled():
+            run(f'restorecon -F -v -R {root}', shell=True, check=True)
+        else:
+            Path('/.autorelabel').touch(exist_ok=True)
--- a/dist/common/scripts/scylla_util.py
+++ b/dist/common/scripts/scylla_util.py
@@ -293,13 +293,14 @@ def swap_exists():
    swaps = out('swapon --noheadings --raw')
    return True if swaps != '' else False

-def pkg_error_exit(pkg):
+def pkg_error_exit(pkg, offline_exit=True):
    print(f'Package "{pkg}" required.')
-    sys.exit(1)
+    if offline_exit:
+        sys.exit(1)

-def yum_install(pkg):
+def yum_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)
    return run(f'yum install -y {pkg}', shell=True, check=True)

 def apt_is_updated():
@@ -313,9 +314,9 @@ def apt_is_updated():

 APT_GET_UPDATE_NUM_RETRY = 30
 APT_GET_UPDATE_RETRY_INTERVAL = 10
-def apt_install(pkg):
+def apt_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)

    # The lock for update and install/remove are different, and
    # DPkg::Lock::Timeout will only wait for install/remove lock.
@@ -344,14 +345,14 @@ def apt_install(pkg):
    apt_env['DEBIAN_FRONTEND'] = 'noninteractive'
    return run(f'apt-get -o DPkg::Lock::Timeout=300 install -y {pkg}', shell=True, check=True, env=apt_env)

-def emerge_install(pkg):
+def emerge_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)
    return run(f'emerge -uq {pkg}', shell=True, check=True)

-def zypper_install(pkg):
+def zypper_install(pkg, offline_exit=True):
    if is_offline():
-        pkg_error_exit(pkg)
+        pkg_error_exit(pkg, offline_exit)
    return run(f'zypper install -y {pkg}', shell=True, check=True)

 def pkg_distro():
@@ -364,18 +365,20 @@ def pkg_distro():
    else:
        return distro.id()

-pkg_xlat = {'cpupowerutils': {'debian': 'linux-cpupower', 'gentoo':'sys-power/cpupower', 'arch':'cpupower', 'suse': 'cpupower'}}
-def pkg_install(pkg):
+pkg_xlat = {'cpupowerutils': {'debian': 'linux-cpupower', 'gentoo':'sys-power/cpupower', 'arch':'cpupower', 'suse': 'cpupower'},
+            'policycoreutils-python-utils': {'amzn2': 'policycoreutils-python'}}
+
+def pkg_install(pkg, offline_exit=True):
    if pkg in pkg_xlat and pkg_distro() in pkg_xlat[pkg]:
        pkg = pkg_xlat[pkg][pkg_distro()]
    if is_redhat_variant():
-        return yum_install(pkg)
+        return yum_install(pkg, offline_exit)
    elif is_debian_variant():
-        return apt_install(pkg)
+        return apt_install(pkg, offline_exit)
    elif is_gentoo():
-        return emerge_install(pkg)
+        return emerge_install(pkg, offline_exit)
    elif is_suse_variant():
-        return zypper_install(pkg)
+        return zypper_install(pkg, offline_exit)
    else:
        pkg_error_exit(pkg)

--- a/dist/common/sysconfig/scylla-node-exporter
+++ b/dist/common/sysconfig/scylla-node-exporter
@@ -1 +1 @@
-SCYLLA_NODE_EXPORTER_ARGS="--collector.interrupts"
+SCYLLA_NODE_EXPORTER_ARGS="--collector.interrupts --no-collector.hwmon"
--- a/dist/common/systemd/scylla-server.service
+++ b/dist/common/systemd/scylla-server.service
@@ -20,7 +20,6 @@ ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET $MEM_CONF
 ExecStopPost=+/opt/scylladb/scripts/scylla_stop
 TimeoutStartSec=1y
 TimeoutStopSec=900
-KillMode=process
 Restart=on-abnormal
 User=scylla
 OOMScoreAdjust=-950
--- a/dist/redhat/scylla.spec
+++ b/dist/redhat/scylla.spec
@@ -158,33 +158,6 @@ Obsoletes:      scylla-server < 1.1
 %description conf
 This package contains the main scylla configuration file.

-# we need to refuse upgrade if current scylla < 1.7.3 && commitlog remains
-%pretrans conf
-ver=$(rpm -qi scylla-server | grep Version | awk '{print $3}')
-if [ -n "$ver" ]; then
-    ver_fmt=$(echo $ver | awk -F. '{printf "%d%02d%02d", $1,$2,$3}')
-    if [ $ver_fmt -lt 10703 ]; then
-        # for <scylla-1.2
-        if [ ! -f /opt/scylladb/lib/scylla/scylla_config_get.py ]; then
-            echo
-            echo "Error: Upgrading from scylla-$ver to scylla-%{version} is not supported."
-            echo "Please upgrade to scylla-1.7.3 or later, before upgrade to %{version}."
-            echo
-            exit 1
-        fi
-        commitlog_directory=$(/opt/scylladb/lib/scylla/scylla_config_get.py -g commitlog_directory)
-        commitlog_files=$(ls $commitlog_directory | wc -l)
-        if [ $commitlog_files -ne 0 ]; then
-            echo
-            echo "Error: Upgrading from scylla-$ver to scylla-%{version} is not supported when commitlog is not clean."
-            echo "Please upgrade to scylla-1.7.3 or later, before upgrade to %{version}."
-            echo "Also make sure $commitlog_directory is empty."
-            echo
-            exit 1
-        fi
-    fi
-fi
-
 %files conf
 %defattr(-,root,root)
 %attr(0755,root,root) %dir %{_sysconfdir}/scylla
--- a/docs/_ext/scylladb_include_flag.py
+++ b/docs/_ext/scylladb_include_flag.py
@@ -1,6 +1,10 @@
+import os
 from sphinx.directives.other import Include
+from sphinx.util import logging
 from docutils.parsers.rst import directives

+LOGGER = logging.getLogger(__name__)
+
 class IncludeFlagDirective(Include):
    option_spec = Include.option_spec.copy()
    option_spec['base_path'] = directives.unchanged
@@ -8,11 +12,18 @@ class IncludeFlagDirective(Include):
    def run(self):
        env = self.state.document.settings.env
        base_path = self.options.get('base_path', '_common')
+        file_path = self.arguments[0]

        if env.app.tags.has('enterprise'):
-            self.arguments[0] = base_path + "_enterprise/" + self.arguments[0]
+            enterprise_path = os.path.join(base_path + "_enterprise", file_path)
+            _, enterprise_abs_path = env.relfn2path(enterprise_path)
+            if os.path.exists(enterprise_abs_path):
+                self.arguments[0] = enterprise_path
+            else:
+                LOGGER.info(f"Enterprise content not found: Skipping inclusion of {file_path}")
+                return []
        else:
-            self.arguments[0] = base_path + "/" + self.arguments[0]
+            self.arguments[0] = os.path.join(base_path, file_path)
        return super().run()

 def setup(app):
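The path selection in `IncludeFlagDirective.run()` can be sketched without Sphinx. This is a simplified model, assuming paths are resolved against a single source directory (the real directive uses `env.relfn2path`, which resolves relative to the current document); `resolve_include` and its parameters are hypothetical names.

```python
import os
from typing import Optional


def resolve_include(src_dir: str, base_path: str, file_path: str,
                    enterprise: bool) -> Optional[str]:
    """Return the relative path to include, or None to skip the include.

    Hypothetical stand-in for the directive's path selection.
    """
    if not enterprise:
        # Open-source build: always include from the plain base path.
        return os.path.join(base_path, file_path)
    candidate = os.path.join(base_path + "_enterprise", file_path)
    # Enterprise build: include only if the enterprise variant exists.
    if os.path.exists(os.path.join(src_dir, candidate)):
        return candidate
    return None
```

Returning `None` corresponds to the directive returning an empty node list, so a missing enterprise file is skipped instead of failing the build.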
--- a/docs/alternator/compatibility.md
+++ b/docs/alternator/compatibility.md
@@ -123,10 +123,6 @@ the secret key is the `salted_hash`, i.e., the secret key can be found by

 <!--- REMOVE IN FUTURE VERSIONS - Remove the note below in version 6.1 -->

-(Note: If you upgraded from version 5.4 to version 6.0 without 
-[enabling consistent topology updates](../upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology.rst), 
-the table name is `system_auth.roles`.)
-
 By default, authorization is not enforced at all. It can be turned on
 by providing an entry in Scylla configuration:
    `alternator_enforce_authorization: true`
--- a/docs/architecture/_common/consistent-topology-with-raft-upgrade-info.rst
+++ b/docs/architecture/_common/consistent-topology-with-raft-upgrade-info.rst
@@ -1,3 +0,0 @@
-If you upgraded from 5.4, you must perform a manual action in order to enable
-consistent topology changes.
-See :doc:`the guide for enabling consistent topology changes</upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology>` for more details.
--- a/docs/architecture/raft.rst
+++ b/docs/architecture/raft.rst
@@ -60,9 +60,8 @@ In summary, Raft makes schema changes safe, but it requires that a quorum of nod
 Verifying that the Raft upgrade procedure finished successfully
 ========================================================================

-You may need to perform the following procedure on upgrade if you explicitly
-disabled the Raft-based schema changes feature in the previous ScyllaDB
-version. Please consult the upgrade guide.
+You may need to perform the following procedure as part of
+the :ref:`manual recovery procedure <recovery-procedure>`.

 The Raft upgrade procedure requires **full cluster availability** to correctly setup the Raft algorithm; after the setup finishes, Raft can proceed with only a majority of nodes, but this initial setup is an exception.
 An unlucky event, such as a hardware failure, may cause one of your nodes to fail. If this happens before the Raft upgrade procedure finishes, the procedure will get stuck and your intervention will be required.
@@ -173,8 +172,6 @@ gossip-based topology.

 The feature is automatically enabled in new clusters.

-.. scylladb_include_flag:: consistent-topology-with-raft-upgrade-info.rst
-
 Verifying that Raft is Enabled
 ----------------------------------

--- a/docs/cql/_common/tablets-default.rst
+++ b/docs/cql/_common/tablets-default.rst
@@ -0,0 +1,3 @@
+By default, a keyspace is created with tablets enabled. The ``tablets`` option 
+is used to opt out a keyspace from tablets-based distribution; see :ref:`Enabling Tablets <tablets-enable-tablets>`
+for details.
--- a/docs/cql/compaction.rst
+++ b/docs/cql/compaction.rst
@@ -62,7 +62,7 @@ The following options are available for all compaction strategies.
 =====

 ``tombstone_compaction_interval`` (default: 86400s (1 day))
-   An SSTable that is suitable for single SSTable compaction, according to tombstone_threshold will not be compacted if it is newer than tombstone_compaction_interval. 
+  *tombstone_compaction_interval* is the lower bound for when a new tombstone compaction can start. If an SSTable was compacted at time *X*, the earliest time it will be considered for tombstone compaction again is *X + tombstone_compaction_interval*. There is no guarantee that the SSTable will be considered for compaction immediately after tombstone_compaction_interval has elapsed since the last compaction.

 =====

--- a/docs/cql/cql-extensions.md
+++ b/docs/cql/cql-extensions.md
@@ -377,6 +377,20 @@ FINALFUNC final_fct
 INITCOND (0, 0);
 ```

+### Behavior of bind variable references with the same name
+
+If a bind variable is referred to twice (example: `WHERE aa = :var AND bb = :var`; `:var`
+is referenced twice), ScyllaDB and Cassandra treat it differently:
+
+ - Cassandra ignores the double reference and treats the two as two separate variables. They
+   can have different types, and occupy two slots in the bind variable metadata (used by
+   drivers when the user provides a bind variable tuple rather than a map).
+ - ScyllaDB treats the two references as referring to the same variable. The two references
+   must have the same type, and occupy one slot in the bind variable metadata.
+
+ScyllaDB can revert to the Cassandra treatment by setting the configuration item
+`cql_duplicate_bind_variable_names_refer_to_same_variable` to `false`.
+
 ### Lists elements for filtering

 Subscripting a list in a WHERE clause is supported as are maps.
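The difference in bind variable metadata slots described above can be modeled with a toy function. This is a sketch only, not server code: `bind_slots` is a hypothetical name, and ordering slots by first occurrence is an assumption made for illustration.

```python
from typing import List


def bind_slots(references: List[str], same_name_is_same_variable: bool = True) -> List[str]:
    """Return the bind-variable metadata slots for a list of named references.

    Toy model: with the flag set, repeated names share one slot (the ScyllaDB
    behavior described above); without it, every reference gets its own slot
    (the Cassandra behavior described above).
    """
    if not same_name_is_same_variable:
        return list(references)
    slots: List[str] = []
    for name in references:
        if name not in slots:
            slots.append(name)
    return slots


# WHERE aa = :var AND bb = :var
scylla_slots = bind_slots(["var", "var"])
cassandra_slots = bind_slots(["var", "var"], same_name_is_same_variable=False)
```

In this model, a driver binding values positionally would supply one value under the first setting and two under the second.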
--- a/docs/cql/ddl.rst
+++ b/docs/cql/ddl.rst
@@ -116,7 +116,7 @@ name                 kind       mandatory   default   description
                                                      details below).
 ``durable_writes``   *simple*   no          true      Whether to use the commit log for updates on this keyspace
                                                      (disable this option at your own risk!).
-``tablets``          *map*      no                    Enables or disables tablets for the keyspace (see :ref:`tablets<tablets>`)
+``tablets``          *map*      no                    Enables or disables tablets for the keyspace (see :ref:`tablets <tablets>`)
 =================== ========== =========== ========= ===================================================================

 The ``replication`` property is mandatory and must at least contains the ``'class'`` sub-option, which defines the
@@ -232,9 +232,7 @@ sub-option                             type  description
 ``'initial'``                          int   The number of tablets to start with
 ===================================== ====== =============================================

-By default, a keyspace is created with tablets enabled. The ``tablets`` option 
-is used to opt out a keyspace from tablets-based distribution; see :ref:`Enabling Tablets <tablets-enable-tablets>`
-for details.
+.. scylladb_include_flag:: tablets-default.rst

 A good rule of thumb to calculate initial tablets is to divide the expected total storage used
 by tables in this keyspace by (``replication_factor`` * 5GB). For example, if you expect a 30TB
@@ -759,10 +757,8 @@ available:
 ========================= =============== =============================================================================
 Option                    Default         Description
 ========================= =============== =============================================================================
- ``sstable_compression``   LZ4Compressor   The compression algorithm to use. Default compressors are
-                                           LZ4Compressor, SnappyCompressor, and DeflateCompressor.
-                                           A custom compressor can be provided by specifying the full class
-                                           name as a “string constant”:#constants.
+ ``sstable_compression``   LZ4Compressor   The compression algorithm to use. Available compressors are
+                                           LZ4Compressor, SnappyCompressor, DeflateCompressor, and ZstdCompressor.
 ``chunk_length_in_kb``    4               On disk SSTables are compressed by block (to allow random reads). This
                                           defines the size (in KB) of the block. Bigger values may improve the
                                           compression rate, but increases the minimum size of data to be read from disk
--- a/docs/dev/docker-hub.md
+++ b/docs/dev/docker-hub.md
@@ -48,6 +48,13 @@ to calculate the proper value is:
 $ docker run --name some-scylla --hostname some-scylla -d scylladb/scylla
 ```

+If you're on macOS and plan to start a multi-node cluster (3 nodes or more), start ScyllaDB with
+`--reactor-backend=epoll` to override the default `linux-aio` reactor backend:
+
+```console
+$ docker run --name some-scylla --hostname some-scylla -d scylladb/scylla --reactor-backend=epoll
+```
+
 ### Run `nodetool` utility

 ```console
@@ -75,6 +82,11 @@ cqlsh>
 ```console
 $ docker run --name some-scylla2  --hostname some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
 ```
+If you're on macOS, make sure to add the `--reactor-backend=epoll` option when adding new nodes:
+
+```console
+$ docker run --name some-scylla2  --hostname some-scylla2 -d scylladb/scylla --reactor-backend=epoll --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
+```

 #### Make a cluster with Docker Compose

--- a/docs/getting-started/_common/os-support-info.rst
+++ b/docs/getting-started/_common/os-support-info.rst
@@ -1,14 +1,14 @@
 You can `build ScyllaDB from source <https://github.com/scylladb/scylladb#build-prerequisites>`_ on other x86_64 or aarch64 platforms, without any guarantees.

 +----------------------------+--------------------+-------+---------------+
-| Linux Distributions        |Ubuntu              | Debian| Rocky /       |
-|                            |                    |       | RHEL          |
+| Linux Distributions        |Ubuntu              | Debian|Rocky / CentOS |
+|                            |                    |       |/ RHEL         |
 +----------------------------+------+------+------+-------+-------+-------+
 | ScyllaDB Version / Version |20.04 |22.04 |24.04 |  11   |   8   |   9   |
 +============================+======+======+======+=======+=======+=======+
-|   6.0                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
+|   6.1                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
 +----------------------------+------+------+------+-------+-------+-------+
-|   5.4                      | |v|  | |v|  | |x|  | |v|   | |v|   | |v|   |
+|   6.0                      | |v|  | |v|  | |v|  | |v|   | |v|   | |v|   |
 +----------------------------+------+------+------+-------+-------+-------+

 * The recommended OS for ScyllaDB Open Source is Ubuntu 22.04.
@@ -18,4 +18,4 @@ Supported Architecture
 -----------------------------

 ScyllaDB Open Source supports x86_64 for all versions and AArch64 starting from ScyllaDB 4.6 and nightly build. 
-In particular, aarch64 support includes AWS EC2 Graviton.
+In particular, aarch64 support includes AWS EC2 Graviton.
--- a/docs/getting-started/_common/setup-after-install.rst
+++ b/docs/getting-started/_common/setup-after-install.rst
@@ -0,0 +1,54 @@
+Configure and Run ScyllaDB
+-------------------------------
+
+#. Configure the following parameters in the ``/etc/scylla/scylla.yaml`` configuration file.
+
+   * ``cluster_name`` - The name of the cluster. All the nodes in the cluster must have the same 
+     cluster name configured.
+   * ``seeds`` - The IP address of the first node. Other nodes will use it as the first contact 
+     point to discover the cluster topology when joining the cluster.
+   * ``listen_address`` - The IP address that ScyllaDB uses to connect to other nodes in the cluster.
+   * ``rpc_address`` - The IP address of the interface for CQL client connections.
+
+#. Run the ``scylla_setup`` script to tune the system settings and determine the optimal configuration.
+
+   .. code-block:: console
+    
+      sudo scylla_setup
+
+   * The script invokes a set of :ref:`scripts <system-configuration-scripts>` to configure several operating system settings; for example, it sets 
+     RAID0 and XFS filesystem. 
+   * The script runs a short (up to a few minutes) benchmark on your storage and generates the ``/etc/scylla.d/io.conf`` 
+     configuration file. When the file is ready, you can start ScyllaDB. ScyllaDB will not run without XFS 
+     or ``io.conf`` file.
+   * You can bypass this check by running ScyllaDB in :doc:`developer mode </getting-started/installation-common/dev-mod>`. 
+     We recommend against enabling developer mode in production environments to ensure ScyllaDB's maximum performance.
+
+#. Run ScyllaDB as a service (if not already running).
+
+   .. code-block:: console
+    
+      sudo systemctl start scylla-server
+
+
+Now you can start using ScyllaDB. Here are some tools you may find useful.
+
+Run nodetool:
+   
+.. code-block:: console
+     
+     nodetool status
+
+Run cqlsh:
+
+.. code-block:: console
+     
+     cqlsh
+
+Run cassandra-stress:
+
+.. code-block:: console
+     
+     cassandra-stress write -mode cql3 native 
+
+
--- a/docs/getting-started/cloud-instance-recommendations.rst
+++ b/docs/getting-started/cloud-instance-recommendations.rst
@@ -175,7 +175,7 @@ Recommended instances types are `n1-highmem <https://cloud.google.com/compute/do
   * - n2-highmem-32
     - 32
     - 256
-     - 6,000
+     - 9,000
   * - n2-highmem-48
     - 48
     - 384
--- a/docs/getting-started/install-scylla/install-on-linux.rst
+++ b/docs/getting-started/install-scylla/install-on-linux.rst
@@ -154,59 +154,7 @@ Install ScyllaDB
               sudo yum install scylla-5.2.3


-Configure and Run ScyllaDB
-------------------------------
-
-#. Configure the following parameters in the ``/etc/scylla/scylla.yaml`` configuration file.
-
-   * ``cluster_name`` - The name of the cluster. All the nodes in the cluster must have the same 
-     cluster name configured.
-   * ``seeds`` - The IP address of the first node. Other nodes will use it as the first contact 
-     point to discover the cluster topology when joining the cluster.
-   * ``listen_address`` - The IP address that ScyllaDB uses to connect to other nodes in the cluster.
-   * ``rpc_address`` - The IP address of the interface for CQL client connections.
-
-#. Run the ``scylla_setup`` script to tune the system settings and determine the optimal configuration.
-
-   .. code-block:: console
-    
-      sudo scylla_setup
-
-   * The script invokes a set of :ref:`scripts <system-configuration-scripts>` to configure several operating system settings; for example, it sets 
-     RAID0 and XFS filesystem. 
-   * The script runs a short (up to a few minutes) benchmark on your storage and generates the ``/etc/scylla.d/io.conf`` 
-     configuration file. When the file is ready, you can start ScyllaDB. ScyllaDB will not run without XFS 
-     or ``io.conf`` file.
-   * You can bypass this check by running ScyllaDB in :doc:`developer mode </getting-started/installation-common/dev-mod>`. 
-     We recommend against enabling developer mode in production environments to ensure ScyllaDB's maximum performance.
-
-#. Run ScyllaDB as a service (if not already running).
-
-   .. code-block:: console
-    
-      sudo systemctl start scylla-server
-
-
-Now you can start using ScyllaDB. Here are some tools you may find useful.
-
-Run nodetool:
-   
-.. code-block:: console
-     
-     nodetool status
-
-Run cqlsh:
-
-.. code-block:: console
-     
-     cqlsh
-
-Run cassandra-stress:
-
-.. code-block:: console
-     
-     cassandra-stress write -mode cql3 native 
-
+.. include:: /getting-started/_common/setup-after-install.rst

 Next Steps
 ------------
--- a/docs/getting-started/installation-common/scylla-web-installer.rst
+++ b/docs/getting-started/installation-common/scylla-web-installer.rst
@@ -12,7 +12,7 @@ Prerequisites
 Ensure that your platform is supported by the ScyllaDB version you want to install. 
 See :doc:`OS Support by Platform and Version </getting-started/os-support/>`.

-Installing ScyllaDB with Web Installer
+Install ScyllaDB with Web Installer
 ---------------------------------------
 To install ScyllaDB with Web Installer, run:

@@ -40,22 +40,24 @@ options to install a different version or ScyllaDB Enterprise:
 You can run the command with the ``-h`` or ``--help`` flag to print information about the script.

 Examples
---------
+===========

-Installing ScyllaDB Open Source 4.6.1:
+Installing ScyllaDB Open Source 6.0.1:

 .. code:: console

-    curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 4.6.1
+    curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 6.0.1

-Installing the latest patch release for ScyllaDB Open Source 4.6:
+Installing the latest patch release for ScyllaDB Open Source 6.0:

 .. code:: console

-    curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 4.6
+    curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 6.0

-Installing ScyllaDB Enterprise 2021.1:
+Installing ScyllaDB Enterprise 2024.1:

 .. code:: console

-    curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-product scylla-enterprise --scylla-version 2021.1
+    curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-product scylla-enterprise --scylla-version 2024.1
+
+.. include:: /getting-started/_common/setup-after-install.rst
--- a/docs/getting-started/installation-common/unified-installer.rst
+++ b/docs/getting-started/installation-common/unified-installer.rst
@@ -1,8 +1,3 @@
-.. |SCYLLADB_VERSION| replace:: 5.2
-
-.. update the version folder URL below (variables won't work):
-    https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.2/
-
 ====================================================
 Install ScyllaDB Without root Privileges
 ====================================================
@@ -24,14 +19,17 @@ Note that if you're on CentOS 7, only root offline installation is supported.
 Download and Install
 -----------------------

-#. Download the latest tar.gz file for ScyllaDB |SCYLLADB_VERSION| (x86 or ARM) from https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.2/.
+#. Download the latest tar.gz file for your ScyllaDB version (x86 or ARM) from ``https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-<version>/``.
+
+   Example for version 6.1: https://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-6.1/
+
 #. Uncompress the downloaded package.

-   The following example shows the package for ScyllaDB 5.2.4 (x86):
+   The following example shows the package for ScyllaDB 6.1.1 (x86):

   .. code:: console

-    tar xvfz scylla-unified-5.2.4-0.20230623.cebbf6c5df2b.x86_64.tar.gz
+    tar xvfz scylla-unified-6.1.1-0.20240814.8d90b817660a.x86_64.tar.gz

 #. Install OpenJDK 8 or 11.

--- a/docs/operating-scylla/admin-tools/scylla-sstable.rst
+++ b/docs/operating-scylla/admin-tools/scylla-sstable.rst
@@ -430,8 +430,8 @@ The content is dumped in JSON, using the following schema:
        "estimated_tombstone_drop_time": $STREAMING_HISTOGRAM,
        "sstable_level": Uint,
        "repaired_at": Uint64,
-        "min_column_names": [Uint, ...],
-        "max_column_names": [Uint, ...],
+        "min_column_names": [String, ...],
+        "max_column_names": [String, ...],
        "has_legacy_counter_shards": Bool,
        "columns_count": Int64, // >= MC only
        "rows_count": Int64, // >= MC only
--- a/docs/operating-scylla/nodetool-commands/rebuild.rst
+++ b/docs/operating-scylla/nodetool-commands/rebuild.rst
@@ -1,8 +1,17 @@
 Nodetool rebuild
 ================

-**rebuild** ``[<src-dc-name>]`` - This command rebuilds a node's data by streaming data from other nodes in the cluster (similarly to bootstrap).
-Rebuild operates on multiple nodes in a ScyllaDB cluster. It streams data from a single source replica when rebuilding a token range. When executing the command, ScyllaDB first figures out which ranges the local node (the one we want to rebuild) is responsible for. Then which node in the cluster contains the same ranges. Finally, ScyllaDB streams the data to the local node.
+**rebuild** ``[[--force] <source-dc-name>]`` - This command rebuilds a node's data by streaming data from other nodes in the cluster (similarly to bootstrap).
+
+When executing the command, ScyllaDB first determines which token ranges the local node (the one being rebuilt) is responsible for.
+It then identifies which nodes in the cluster hold the same ranges.
+If ``source-dc-name`` is provided, ScyllaDB will stream data only from nodes in that datacenter, when safe to do so.
+Otherwise, an alternative datacenter that lost no nodes will be considered, and if none exist, all datacenters will be considered.
+Use the ``--force`` option to enforce rebuild using the source datacenter, even if it is unsafe to do so.
+
+When ``rebuild`` is enabled in :doc:`Repair Based Node Operations (RBNO) </operating-scylla/procedures/cluster-management/repair-based-node-operation>`,
+data is rebuilt using repair-based-rebuild by reading all source replicas in each token range and repairing any discrepancies between them.
+Otherwise, data is streamed from a single source replica when rebuilding each token range.
 
 When :doc:`adding a new data-center into an existing ScyllaDB cluster </operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/>` use the rebuild command.

@@ -14,6 +23,6 @@ For Example:

 .. code-block:: shell

-   nodetool rebuild <src-dc-name>
+   nodetool rebuild <source-dc-name>

 .. include:: nodetool-index.rst
--- a/docs/operating-scylla/procedures/cluster-management/_common/membership-change-failures-note.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/membership-change-failures-note.rst
@@ -1,7 +1,10 @@
 .. note::

-    This page only applies to clusters where consistent topology updates are not enabled. 
+    This page only applies to clusters where consistent topology updates are not enabled.
+    Consistent topology updates are mandatory, so **this page serves troubleshooting purposes**.
+
    The page does NOT apply if you:

-    * Created a cluster with ScyllaDB 6.0 (consistent topology updates are automatically enabled).
-    * Upgraded from ScyllaDB 5.4 and :doc:`manually enabled consistent topology updates </upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology>`.
+    * Created a cluster with ScyllaDB 6.0 or later (consistent topology updates are automatically enabled).
+    * `Manually enabled consistent topology updates <https://opensource.docs.scylladb.com/branch-6.0/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology.html>`_
+      after upgrading to 6.0 or before upgrading to 6.1 (required).
--- a/docs/operating-scylla/procedures/cluster-management/_common/system-auth-alter-info.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/system-auth-alter-info.rst
@@ -1,3 +0,0 @@
-(Note: If you upgraded from version 5.4 without 
-:doc:`enabling consistent topology updates </upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology>`, 
-you must additionally alter the ``system_auth`` keyspace.)
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-add-new-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-add-new-dc.rst
@@ -1,3 +0,0 @@
-.. note::
-
-   If you upgraded your cluster from version 5.4, see :ref:`After Upgrading from 5.4 <add-dc-upgrade-info>`.
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-add-new-node.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-add-new-node.rst
@@ -1,3 +0,0 @@
-.. note::
-
-   If you upgraded your cluster from version 5.4, see :ref:`After Upgrading from 5.4 <add-new-node-upgrade-info>`.
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-remove-node.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-remove-node.rst
@@ -1,3 +0,0 @@
-.. note::
-
-   If you upgraded your cluster from version 5.4, see :ref:`After Upgrading from 5.4 <remove-node-upgrade-info>`.
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-replace-node.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-note-replace-node.rst
@@ -1,3 +0,0 @@
-.. note::
-
-   If you upgraded your cluster from version 5.4, see :ref:`After Upgrading from 5.4 <replace-node-upgrade-info>`.
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-add-new-node-or-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-add-new-node-or-dc.rst
@@ -1,24 +0,0 @@
-
-After Upgrading from 5.4
----------------------------
-
-The procedure described above applies to clusters where consistent topology updates 
-are enabled. The feature is automatically enabled in new clusters.
-
-If you've upgraded an existing cluster from version 5.4, ensure that you 
-:doc:`manually enabled consistent topology updates </upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology>`.
-Without consistent topology updates enabled, you must consider the following
-limitations while applying the procedure:
-
-* You can only bootstrap one node at a time. You need to wait until the status 
-  of one new node becomes UN (Up Normal) before adding another new node.
-* If the node starts bootstrapping but fails in the middle, for example, due to 
-  a power loss, you can retry bootstrap by restarting the node. If you don't want to
-  retry, or the node refuses to boot on subsequent attempts, consult the 
-  :doc:`Handling Membership Change Failures </operating-scylla/procedures/cluster-management/handling-membership-change-failures>`
-  document. 
-* The ``system_auth`` keyspace has not been upgraded to ``system``.
-  As a result, if ``authenticator`` is set to ``PasswordAuthenticator``, you must 
-  increase the replication factor of the ``system_auth`` keyspace. It is 
-  recommended to set ``system_auth`` replication factor to the number of nodes 
-  in each DC.
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-remove-node.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-remove-node.rst
@@ -1,21 +0,0 @@
-
-After Upgrading from 5.4
----------------------------
-
-The procedure described above applies to clusters where consistent topology updates 
-are enabled. The feature is automatically enabled in new clusters.
-
-If you've upgraded an existing cluster from version 5.4, ensure that you 
-:doc:`manually enabled consistent topology updates </upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology>`.
-Without consistent topology updates enabled, you must consider the following
-limitations while applying the procedure:
-    
-* It’s essential to ensure the removed node will **never** come back to the cluster, 
-  which might adversely affect your data (data resurrection/loss). To prevent the removed 
-  node from rejoining the cluster, remove that node from the cluster network or VPC.
-* You can only remove one node at a time. You need to verify that the node has 
-  been removed before removing another one.
-* If ``nodetool decommission`` starts executing but fails in the middle, for example, 
-  due to a power loss, consult the 
-  :doc:`Handling Membership Change Failures </operating-scylla/procedures/cluster-management/handling-membership-change-failures>`
-  document. 
--- a/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-replace-node.rst
+++ b/docs/operating-scylla/procedures/cluster-management/_common/upgrade-warning-replace-node.rst
@@ -1,23 +0,0 @@
-
----------------------------
-After Upgrading from 5.4
----------------------------
-
-The procedure described above applies to clusters where consistent topology updates 
-are enabled. The feature is automatically enabled in new clusters.
-
-If you've upgraded an existing cluster from version 5.4, ensure that you 
-:doc:`manually enabled consistent topology updates </upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology>`.
-Without consistent topology updates enabled, you must consider the following
-limitations while applying the procedure:
-    
-* It’s essential to ensure the replaced (dead) node will never come back to the cluster, 
-  which might lead to a split-brain situation. Remove the replaced (dead) node from 
-  the cluster network or VPC.
-* You can only replace one node at a time. You need to wait until the status 
-  of the new node becomes UN (Up Normal) before replacing another new node.
-* If the new node starts and begins the replace operation but then fails in the middle, 
-  for example, due to a power loss, you can retry the replace by restarting the node. 
-  If you don’t want to retry, or the node refuses to boot on subsequent attempts, consult the 
-  :doc:`Handling Membership Change Failures </operating-scylla/procedures/cluster-management/handling-membership-change-failures>`
-  document. 
--- a/docs/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.rst
+++ b/docs/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.rst
@@ -1,8 +1,6 @@
 Adding a New Data Center Into an Existing ScyllaDB Cluster
 ***********************************************************

-.. scylladb_include_flag:: upgrade-note-add-new-dc.rst
-
 The following procedure specifies how to add a Data Center (DC) to a live ScyllaDB Cluster, in a single data center, :ref:`multi-availability zone <faq-best-scenario-node-multi-availability-zone>`, or multi-datacenter. Adding a DC out-scales the cluster and provides higher availability (HA).

 The procedure includes:
@@ -164,8 +162,6 @@ Add New DC
   * Keyspace created by the user (which needed to replicate to the new DC).
   * System: ``system_distributed``, ``system_traces``, for example, replicate the data to three nodes in the new DC.

-   .. scylladb_include_flag:: system-auth-alter-info.rst
-
   For example:

   Before
@@ -234,7 +230,3 @@ Additional Resources for Java Clients
 * `DCAwareRoundRobinPolicy.Builder <https://java-driver.docs.scylladb.com/scylla-3.10.2.x/api/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.Builder.html>`_
 * `DCAwareRoundRobinPolicy <https://java-driver.docs.scylladb.com/scylla-3.10.2.x/api/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.html>`_

-
-.. _add-dc-upgrade-info:
-
-.. scylladb_include_flag:: upgrade-warning-add-new-node-or-dc.rst
--- a/docs/operating-scylla/procedures/cluster-management/add-node-to-cluster.rst
+++ b/docs/operating-scylla/procedures/cluster-management/add-node-to-cluster.rst
@@ -2,8 +2,6 @@
 Adding a New Node Into an Existing ScyllaDB Cluster (Out Scale)
 =================================================================

-.. scylladb_include_flag:: upgrade-note-add-new-node.rst
-
 When you add a new node, other nodes in the cluster stream data to the new node. This operation is called bootstrapping and may
 be time-consuming, depending on the data size and network bandwidth. If using a :ref:`multi-availability-zone <faq-best-scenario-node-multi-availability-zone>`, make sure they are balanced.

@@ -100,7 +98,3 @@ Procedure

 #. If you are using ScyllaDB Monitoring, update the `monitoring stack <https://monitoring.docs.scylladb.com/stable/install/monitoring_stack.html#configure-scylla-nodes-from-files>`_ to monitor it. If you are using ScyllaDB Manager, make sure you install the `Manager Agent <https://manager.docs.scylladb.com/stable/install-scylla-manager-agent.html>`_, and Manager can access it.

-
-.. _add-new-node-upgrade-info:
-
-.. scylladb_include_flag:: upgrade-warning-add-new-node-or-dc.rst