Use tablet_task_info fields for tablet task timing

For tablet_virtual_task, set creation_time to tablet_task_info::request_time and start_time to tablet_task_info::sched_time, matching the actual semantics of when the request was created vs when it was scheduled for execution.

Co-authored-by: Deexie <56607372+Deexie@users.noreply.github.com>
Merge topology_request_tracking_mutation_builder calls for keyspace_rf_change
2025-12-08 15:05:50 +00:00 · 2025-12-08 14:53:23 +00:00 · 2025-12-08 14:40:01 +00:00 · 2025-12-08 14:28:01 +00:00 · 2025-12-08 14:09:22 +00:00 · 2025-12-08 13:26:09 +00:00
776 changed files with 26728 additions and 14664 deletions
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -57,7 +57,6 @@ repair/* @tgrabiec @asias

 # SCHEMA MANAGEMENT
 db/schema_tables* @tgrabiec
-db/legacy_schema_migrator* @tgrabiec
 service/migration* @tgrabiec
 schema* @tgrabiec

--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,86 @@
+# ScyllaDB Development Instructions
+
+## Project Context
+High-performance distributed NoSQL database. Core values: performance, correctness, readability.
+
+## Build System
+
+### Modern Build (configure.py + ninja)
+```bash
+# Configure (run once per mode, or when switching modes)
+./configure.py --mode=<mode>  # mode: dev, debug, release, sanitize
+
+# Build everything
+ninja <mode>-build  # e.g., ninja dev-build
+
+# Build Scylla binary only (sufficient for Python integration tests)
+ninja build/<mode>/scylla
+
+# Build specific test
+ninja build/<mode>/test/boost/<test_name>
+```
+
+## Running Tests
+
+### C++ Unit Tests
+```bash
+# Run all tests in a file
+./test.py --mode=<mode> test/<suite>/<test_name>.cc
+
+# Run a single test case from a file
+./test.py --mode=<mode> test/<suite>/<test_name>.cc::<test_case_name>
+
+# Examples
+./test.py --mode=dev test/boost/memtable_test.cc
+./test.py --mode=dev test/raft/raft_server_test.cc::test_check_abort_on_client_api
+```
+
+**Important:** 
+- Use full path with `.cc` extension (e.g., `test/boost/test_name.cc`, not `boost/test_name`)
+- To run a single test case, append `::<test_case_name>` to the file path
+- If you encounter permission issues with cgroup metric gathering, add `--no-gather-metrics` flag
+
+**Rebuilding Tests:**
+- test.py does NOT automatically rebuild when test source files are modified
+- Many tests are part of composite binaries (e.g., `combined_tests` in test/boost contains multiple test files)
+- To find which binary contains a test, check `configure.py` in the repository root (primary source) or `test/<suite>/CMakeLists.txt`
+- To rebuild a specific test binary: `ninja build/<mode>/test/<suite>/<binary_name>`
+- Examples: 
+  - `ninja build/dev/test/boost/combined_tests` (contains group0_voter_calculator_test.cc and others)
+  - `ninja build/dev/test/raft/replication_test` (standalone Raft test)
+
+### Python Integration Tests
+```bash
+# Only requires Scylla binary (full build usually not needed)
+ninja build/<mode>/scylla
+
+# Run all tests in a file
+./test.py --mode=<mode> <test_path>
+
+# Run a single test case from a file
+./test.py --mode=<mode> <test_path>::<test_function_name>
+
+# Examples
+./test.py --mode=dev alternator/
+./test.py --mode=dev cluster/test_raft_voters::test_raft_limited_voters_retain_coordinator
+
+# Optional flags
+./test.py --mode=dev cluster/test_raft_no_quorum -v  # Verbose output
+./test.py --mode=dev cluster/test_raft_no_quorum --repeat 5  # Repeat test 5 times
+```
+
+**Important:**
+- Use path without `.py` extension (e.g., `cluster/test_raft_no_quorum`, not `cluster/test_raft_no_quorum.py`)
+- To run a single test case, append `::<test_function_name>` to the file path
+- Add `-v` for verbose output
+- Add `--repeat <num>` to repeat a test multiple times
+- After modifying C++ source files, only rebuild the Scylla binary for Python tests - building the entire repository is unnecessary
+
+## Code Philosophy
+- Performance matters in hot paths (data read/write, inner loops)
+- Self-documenting code through clear naming
+- Comments explain "why", not "what"
+- Prefer standard library over custom implementations
+- Strive for simplicity and clarity, add complexity only when clearly justified
+- Question requests: don't blindly implement requests - evaluate trade-offs, identify issues, and suggest better alternatives when appropriate
+- Consider different approaches, weigh pros and cons, and recommend the best fit for the specific context
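The invocation rules in the instructions above (C++ tests addressed with the `.cc` extension, Python tests without `.py`, `::<name>` to select a single case) can be sketched as a small command builder. This is only an illustration: `build_test_command` is a hypothetical helper, not part of the repository, and it uses only the flags named in the instructions.

```python
def build_test_command(mode, target, case=None, repeat=None, verbose=False):
    """Build an argv list for ./test.py (hypothetical helper, for illustration)."""
    # Python tests are addressed without the .py extension;
    # C++ tests keep their .cc extension and are left untouched.
    if target.endswith(".py"):
        target = target[: -len(".py")]
    # A single test case is selected by appending ::<name> to the path.
    if case:
        target = f"{target}::{case}"
    cmd = ["./test.py", f"--mode={mode}", target]
    if verbose:
        cmd.append("-v")
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_test_command("dev", "test/boost/memtable_test.cc")))
    print(" ".join(build_test_command("dev", "cluster/test_raft_no_quorum.py", repeat=5)))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting.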
--- a/.github/instructions/cpp.instructions.md
+++ b/.github/instructions/cpp.instructions.md
@@ -0,0 +1,115 @@
+---
+applyTo: "**/*.{cc,hh}"
+---
+
+# C++ Guidelines
+
+**Important:** Always match the style and conventions of existing code in the file and directory.
+
+## Memory Management
+- Prefer stack allocation whenever possible
+- Use `std::unique_ptr` by default for dynamic allocations
+- `new`/`delete` are forbidden (use RAII)
+- Use `seastar::lw_shared_ptr` or `seastar::shared_ptr` for shared ownership within same shard
+- Use `seastar::foreign_ptr` for cross-shard sharing
+- Avoid `std::shared_ptr` except when interfacing with external C++ APIs
+- Avoid raw pointers except for non-owning references or C API interop
+
+## Seastar Asynchronous Programming
+- Use `seastar::future<T>` for all async operations
+- Prefer coroutines (`co_await`, `co_return`) over `.then()` chains for readability
+- Coroutines are preferred over `seastar::do_with()` for managing temporary state
+- In hot paths where futures are ready, continuations may be more efficient than coroutines
+- Chain futures with `.then()`, don't block with `.get()` (unless in `seastar::thread` context)
+- All I/O must be asynchronous (no blocking calls)
+- Use `seastar::gate` for shutdown coordination
+- Use `seastar::semaphore` for resource limiting (not `std::mutex`)
+- Break long loops with `maybe_yield()` to avoid reactor stalls
+
+## Coroutines
+```cpp
+seastar::future<T> func() {
+    auto result = co_await async_operation();
+    co_return result;
+}
+```
+
+## Error Handling
+- Throw exceptions for errors (futures propagate them automatically)
+- In data path: avoid exceptions, use `std::expected` (or `boost::outcome`) instead
+- Use standard exceptions (`std::runtime_error`, `std::invalid_argument`)
+- Database-specific: throw appropriate schema/query exceptions
+
+## Performance
+- Pass large objects by `const&` or `&&` (move semantics)
+- Use `std::string_view` for non-owning string references
+- Avoid copies: prefer move semantics
+- Use `utils::chunked_vector` instead of `std::vector` for large allocations (>128KB)
+- Minimize dynamic allocations in hot paths
+
+## Database-Specific Types
+- Use `schema_ptr` for schema references
+- Use `mutation` and `mutation_partition` for data modifications
+- Use `partition_key` and `clustering_key` for keys
+- Use `api::timestamp_type` for database timestamps
+- Use `gc_clock` for garbage collection timing
+
+## Style
+- C++23 standard (prefer modern features, especially coroutines)
+- Use `auto` when type is obvious from RHS
+- Avoid `auto` when it obscures the type
+- Use range-based for loops: `for (const auto& item : container)`
+- Use standard algorithms when they clearly simplify code (e.g., replacing 10-line loops)
+- Avoid chaining multiple algorithms if a straightforward loop is clearer
+- Mark functions and variables `const` whenever possible
+- Use scoped enums: `enum class` (not unscoped `enum`)
+
+## Headers
+- Use `#pragma once`
+- Include order: own header, C++ std, Seastar, Boost, project headers
+- Forward declare when possible
+- Never `using namespace` in headers (exception: `using namespace seastar` is globally available via `seastarx.hh`)
+
+## Documentation
+- Public APIs require clear documentation
+- Implementation details should be self-evident from code
+- Use `///` or Doxygen `/** */` for public documentation, `//` for implementation notes - follow the existing style
+
+## Naming
+- `snake_case` for most identifiers (classes, functions, variables, namespaces)
+- Template parameters: `CamelCase` (e.g., `template<typename ValueType>`)
+- Member variables: prefix with `_` (e.g., `int _count;`)
+- Structs (value-only): no `_` prefix on members
+- Constants and `constexpr`: `snake_case` (e.g., `static constexpr int max_size = 100;`)
+- Files: `.hh` for headers, `.cc` for source
+
+## Formatting
+- 4 spaces indentation, never tabs
+- Opening braces on same line as control structure (except namespaces)
+- Space after keywords: `if (`, `while (`, `return `
+- Whitespace around operators matches precedence: `*a + *b` not `* a+* b`
+- Line length: keep reasonable (<160 chars), use continuation lines with double indent if needed
+- Brace all nested scopes, even single statements
+- Minimal patches: only format code you modify, never reformat entire files
+
+## Logging
+- Use structured logging with appropriate levels: DEBUG, INFO, WARN, ERROR
+- Include context in log messages (e.g., request IDs)
+- Never log sensitive data (credentials, PII)
+
+## Forbidden
+- `malloc`/`free`
+- `printf` family (use logging or fmt)
+- Raw pointers for ownership
+- `using namespace` in headers
+- Blocking operations: `std::sleep`, `std::read`, `std::mutex` (use Seastar equivalents)
+- `std::atomic` (reserved for very special circumstances only)
+- Macros (use `inline`, `constexpr`, or templates instead)
+
+## Testing
+When modifying existing code, follow TDD: create/update test first, then implement.
+- Examine existing tests for style and structure
+- Use Boost.Test framework
+- Use `SEASTAR_THREAD_TEST_CASE` for Seastar asynchronous tests
+- Aim for high code coverage, especially for new features and bug fixes
+- Maintain bisectability: all tests must pass in every commit. Mark failing tests with `BOOST_FAIL()` or similar, then fix in subsequent commit
--- a/.github/instructions/python.instructions.md
+++ b/.github/instructions/python.instructions.md
@@ -0,0 +1,51 @@
+---
+applyTo: "**/*.py"
+---
+
+# Python Guidelines
+
+**Important:** Match existing code style. Some directories (like `test/cqlpy` and `test/alternator`) prefer simplicity over type hints and docstrings.
+
+## Style
+- Follow PEP 8
+- Use type hints for function signatures (unless directory style omits them)
+- Use f-strings for formatting
+- Line length: 160 characters max
+- 4 spaces for indentation
+
+## Imports
+Order: standard library, third-party, local imports
+```python
+import os
+import sys
+
+import pytest
+from cassandra.cluster import Cluster
+
+from test.utils import setup_keyspace
+```
+
+Never use `from module import *`
+
+## Documentation
+All public functions/classes need docstrings (unless the current directory conventions omit them):
+```python
+def my_function(arg1: str, arg2: int) -> bool:
+    """
+    Brief summary of function purpose.
+
+    Args:
+        arg1: Description of first argument.
+        arg2: Description of second argument.
+
+    Returns:
+        Description of return value.
+    """
+    pass
+```
+
+## Testing Best Practices
+- Maintain bisectability: all tests must pass in every commit
+- Mark currently-failing tests with `@pytest.mark.xfail`, unmark when fixed
+- Use descriptive names that convey intent
+- Docstrings/comments should explain what the test verifies and why, and if it reproduces a specific issue or how it fits into the larger test suite
--- a/.github/scripts/auto-backport.py
+++ b/.github/scripts/auto-backport.py
@@ -62,7 +62,7 @@ def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr
        if is_draft:
            labels_to_add.append("conflicts")
            pr_comment = f"@{pr.user.login} - This PR was marked as draft because it has conflicts\n"
-            pr_comment += "Please resolve them and remove the 'conflicts' label. The PR will be made ready for review automatically."
+            pr_comment += "Please resolve them and mark this PR as ready for review"
            backport_pr.create_issue_comment(pr_comment)
        
        # Apply all labels at once if we have any
--- a/.github/workflows/backport-pr-fixes-validation.yaml
+++ b/.github/workflows/backport-pr-fixes-validation.yaml
@@ -18,7 +18,7 @@ jobs:
            
            // Regular expression pattern to check for "Fixes" prefix
            // Adjusted to dynamically insert the repository full name
-            const pattern = `Fixes:? ((?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)|(?:https://scylladb\\.atlassian\\.net/browse/)?([A-Z]+-\\d+))`;
+            const pattern = `Fixes:? (?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)`;
            const regex = new RegExp(pattern);
            
            if (!regex.test(body)) {
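To make the effect of this pattern change concrete, here is a rough Python transcription of the removed and added expressions. It is a sketch, not the workflow's code: `fixes_patterns` is a hypothetical name, and Python's `re.search` stands in for JavaScript's `RegExp.test`.

```python
import re


def fixes_patterns(repo):
    """Return (old, new) compiled patterns for a repository full name."""
    r = re.escape(repo)
    # GitHub issue reference: "#123", "<repo>#123", or a full issue URL.
    github_ref = rf"(?:#|{r}#|https://github\.com/{r}/issues/)(\d+)"
    # Jira key, optionally given as a browse URL (accepted by the old pattern only).
    jira_ref = r"(?:https://scylladb\.atlassian\.net/browse/)?([A-Z]+-\d+)"
    old = re.compile(rf"Fixes:? ({github_ref}|{jira_ref})")
    new = re.compile(rf"Fixes:? {github_ref}")
    return old, new


if __name__ == "__main__":
    old, new = fixes_patterns("scylladb/scylladb")
    for body in ("Fixes: #123", "Fixes scylladb/scylladb#45", "Fixes: SCYLLADB-123"):
        print(f"{body!r}: old={bool(old.search(body))} new={bool(new.search(body))}")
```

Under this reading, a PR body that references only a Jira key passes the old check but fails the new one.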
--- a/.github/workflows/call_backport_with_jira.yaml
+++ b/.github/workflows/call_backport_with_jira.yaml
@@ -1,53 +0,0 @@
-name: Backport with Jira Integration
-
-on:
-  push:
-    branches:
-      - master
-      - next-*.*
-      - branch-*.*
-  pull_request_target:
-    types: [labeled, closed]
-    branches: 
-      - master
-      - next
-      - next-*.*
-      - branch-*.*
-
-jobs:
-  backport-on-push:
-    if: github.event_name == 'push'
-    uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
-    with:
-      event_type: 'push'
-      base_branch: ${{ github.ref }}
-      commits: ${{ github.event.before }}..${{ github.sha }}
-    secrets:
-      gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
-      jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
-
-  backport-on-label:
-    if: github.event_name == 'pull_request_target' && github.event.action == 'labeled'
-    uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
-    with:
-      event_type: 'labeled'
-      base_branch: refs/heads/${{ github.event.pull_request.base.ref }}
-      pull_request_number: ${{ github.event.pull_request.number }}
-      head_commit: ${{ github.event.pull_request.base.sha }}
-      label_name: ${{ github.event.label.name }}
-      pr_state: ${{ github.event.pull_request.state }}
-    secrets:
-      gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
-      jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
-
-  backport-chain:
-    if: github.event_name == 'pull_request_target' && github.event.action == 'closed' && github.event.pull_request.merged == true
-    uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
-    with:
-      event_type: 'chain'
-      base_branch: refs/heads/${{ github.event.pull_request.base.ref }}
-      pull_request_number: ${{ github.event.pull_request.number }}
-      pr_body: ${{ github.event.pull_request.body }}
-    secrets:
-      gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
-      jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
--- a/.github/workflows/docs-validate-metrics.yml
+++ b/.github/workflows/docs-validate-metrics.yml
@@ -0,0 +1,34 @@
+name: Docs / Validate metrics
+
+on:
+  pull_request:
+    branches:
+      - master
+      - enterprise
+    paths:
+      - '**/*.cc'
+      - 'scripts/metrics-config.yml' 
+      - 'scripts/get_description.py'
+      - 'docs/_ext/scylladb_metrics.py'
+
+jobs:
+  validate-metrics:
+    runs-on: ubuntu-latest
+    name: Check metrics documentation coverage
+    
+    steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+      with:
+        submodules: true
+      
+    - name: Set up Python
+      uses: actions/setup-python@v6
+      with:
+        python-version: '3.10'
+        
+    - name: Install dependencies
+      run: pip install PyYAML
+        
+    - name: Validate metrics
+      run: python3 scripts/get_description.py --validate -c scripts/metrics-config.yml
--- a/.github/workflows/trigger-scylla-ci.yaml
+++ b/.github/workflows/trigger-scylla-ci.yaml
@@ -3,63 +3,19 @@ name: Trigger Scylla CI Route
 on:
  issue_comment:
    types: [created]
-  pull_request_target:
-    types:
-      - unlabeled

 jobs:
  trigger-jenkins:
-    if: (github.event_name == 'issue_comment' && github.event.comment.user.login != 'scylladbbot') || github.event.label.name == 'conflicts'
+    if: github.event.comment.user.login != 'scylladbbot' && contains(github.event.comment.body, '@scylladbbot') && contains(github.event.comment.body, 'trigger-ci')
    runs-on: ubuntu-latest
    steps:
-      - name: Verify Org Membership
-        id: verify_author
-        env:
-          EVENT_NAME: ${{ github.event_name }}
-          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
-          PR_ASSOCIATION: ${{ github.event.pull_request.author_association }}
-          COMMENT_AUTHOR: ${{ github.event.comment.user.login }}
-          COMMENT_ASSOCIATION: ${{ github.event.comment.author_association }}
-        shell: bash
-        run: |
-          if [[ "$EVENT_NAME" == "pull_request_target" ]]; then
-            AUTHOR="$PR_AUTHOR"
-            ASSOCIATION="$PR_ASSOCIATION"
-          else
-            AUTHOR="$COMMENT_AUTHOR"
-            ASSOCIATION="$COMMENT_ASSOCIATION"
-          fi
-          ORG="scylladb"
-          if gh api "/orgs/${ORG}/members/${AUTHOR}" --silent 2>/dev/null; then
-            echo "member=true" >> $GITHUB_OUTPUT
-          else
-            echo "::warning::${AUTHOR} is not a member of ${ORG}; skipping CI trigger."
-            echo "member=false" >> $GITHUB_OUTPUT
-          fi
-
-      - name: Validate Comment Trigger
-        if: github.event_name == 'issue_comment'
-        id: verify_comment
-        env:
-          COMMENT_BODY: ${{ github.event.comment.body }}
-        shell: bash
-        run: |
-          CLEAN_BODY=$(echo "$COMMENT_BODY" | grep -v '^[[:space:]]*>')
-
-          if echo "$CLEAN_BODY" | grep -qi '@scylladbbot' && echo "$CLEAN_BODY" | grep -qi 'trigger-ci'; then
-            echo "trigger=true" >> $GITHUB_OUTPUT
-          else
-            echo "trigger=false" >> $GITHUB_OUTPUT
-          fi
-
      - name: Trigger Scylla-CI-Route Jenkins Job
-        if: steps.verify_author.outputs.member == 'true' && (github.event_name == 'pull_request_target' || steps.verify_comment.outputs.trigger == 'true')
        env:
          JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
          JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
          JENKINS_URL: "https://jenkins.scylladb.com"
-          PR_NUMBER: "${{ github.event.issue.number || github.event.pull_request.number }}"
-          PR_REPO_NAME: "${{ github.event.repository.full_name }}"
        run: |
+          PR_NUMBER=${{ github.event.issue.number }}
+          PR_REPO_NAME=${{ github.event.repository.full_name }}
          curl -X POST "$JENKINS_URL/job/releng/job/Scylla-CI-Route/buildWithParameters?PR_NUMBER=$PR_NUMBER&PR_REPO_NAME=$PR_REPO_NAME" \
-            --user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail
+          --user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail -i -v
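The job-level condition added here can be compared with the removed shell step in a short Python sketch. Both function names are hypothetical. The sketch assumes that string comparison and `contains()` in GitHub Actions expressions ignore case, and it does not model the removed organization-membership check.

```python
def new_condition(comment_author, comment_body):
    """Approximates the added `if:` expression on the job."""
    body = comment_body.lower()
    return (
        comment_author.lower() != "scylladbbot"
        and "@scylladbbot" in body
        and "trigger-ci" in body
    )


def old_comment_check(comment_body):
    """Approximates the removed 'Validate Comment Trigger' step."""
    # The shell step dropped quoted lines (those starting with '>') before matching.
    clean = "\n".join(
        line for line in comment_body.splitlines()
        if not line.lstrip().startswith(">")
    ).lower()
    return "@scylladbbot" in clean and "trigger-ci" in clean


if __name__ == "__main__":
    quoted_reply = "> @scylladbbot trigger-ci\nthanks, looks good"
    print(new_condition("alice", quoted_reply))  # quoted text still counts
    print(old_comment_check(quoted_reply))       # quoted text was ignored
```

One visible difference under these assumptions: a reply that merely quotes an earlier trigger comment satisfies the new condition, whereas the removed step filtered quoted lines out first.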
--- a/.github/workflows/trigger_ci.yaml
+++ b/.github/workflows/trigger_ci.yaml
@@ -0,0 +1,242 @@
+name: Trigger next gating
+
+on:
+  pull_request_target:
+    types: [opened, reopened, synchronize]
+  issue_comment:
+    types: [created]
+    
+jobs:
+  trigger-ci:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Dump GitHub context
+        env:
+          GITHUB_CONTEXT: ${{ toJson(github) }}
+        run: echo "$GITHUB_CONTEXT"
+      - name: Checkout PR code
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0  # Needed to access full history
+          ref: ${{ github.event.pull_request.head.ref }}
+
+      - name: Fetch before commit if needed
+        run: |
+          if ! git cat-file -e ${{ github.event.before }} 2>/dev/null; then
+            echo "Fetching before commit ${{ github.event.before }}"
+            git fetch --depth=1 origin ${{ github.event.before }}
+          fi
+
+      - name: Compare commits for file changes
+        if: github.event.action == 'synchronize'
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          echo "Base: ${{ github.event.before }}"
+          echo "Head: ${{ github.event.after }}"
+
+          TREE_BEFORE=$(git show -s --format=%T ${{ github.event.before }})
+          TREE_AFTER=$(git show -s --format=%T ${{ github.event.after }})
+          
+          echo "TREE_BEFORE=$TREE_BEFORE" >> $GITHUB_ENV
+          echo "TREE_AFTER=$TREE_AFTER" >> $GITHUB_ENV
+
+      - name: Check if last push has file changes
+        run: |
+          if [[ "${{ env.TREE_BEFORE }}" == "${{ env.TREE_AFTER }}" ]]; then
+            echo "No file changes detected in the last push, only commit message edit."
+            echo "has_file_changes=false" >> $GITHUB_ENV
+          else
+            echo "File changes detected in the last push."
+            echo "has_file_changes=true" >> $GITHUB_ENV
+          fi
+
+      - name: Rule 1 - Check PR draft or conflict status
+        run: |
+          # Check if PR is in draft mode
+          IS_DRAFT="${{ github.event.pull_request.draft }}"
+          
+          # Check if PR has 'conflict' label
+          HAS_CONFLICT_LABEL="false"
+          LABELS='${{ toJson(github.event.pull_request.labels) }}'
+          if echo "$LABELS" | jq -r '.[].name' | grep -q "^conflict$"; then
+            HAS_CONFLICT_LABEL="true"
+          fi
+          
+          # Set draft_or_conflict variable
+          if [[ "$IS_DRAFT" == "true" || "$HAS_CONFLICT_LABEL" == "true" ]]; then
+            echo "draft_or_conflict=true" >> $GITHUB_ENV
+            echo "✅ Rule 1: PR is in draft mode or has conflict label - setting draft_or_conflict=true"
+          else
+            echo "draft_or_conflict=false" >> $GITHUB_ENV
+            echo "✅ Rule 1: PR is ready and has no conflict label - setting draft_or_conflict=false"
+          fi
+          
+          echo "Draft status: $IS_DRAFT"
+          echo "Has conflict label: $HAS_CONFLICT_LABEL"
+          echo "Result: draft_or_conflict = $draft_or_conflict"
+
+      - name: Rule 2 - Check labels
+        run: |
+          # Check if PR has P0 or P1 labels
+          HAS_P0_P1_LABEL="false"
+          LABELS='${{ toJson(github.event.pull_request.labels) }}'
+          if echo "$LABELS" | jq -r '.[].name' | grep -E "^(P0|P1)$" > /dev/null; then
+            HAS_P0_P1_LABEL="true"
+          fi
+          
+          # Check if PR already has force_on_cloud label
+          echo "HAS_FORCE_ON_CLOUD_LABEL=false" >> $GITHUB_ENV
+          if echo "$LABELS" | jq -r '.[].name' | grep -q "^force_on_cloud$"; then
+            HAS_FORCE_ON_CLOUD_LABEL="true"
+            echo "HAS_FORCE_ON_CLOUD_LABEL=true" >> $GITHUB_ENV
+          fi
+          
+          echo "Has P0/P1 label: $HAS_P0_P1_LABEL"
+          echo "Has force_on_cloud label: $HAS_FORCE_ON_CLOUD_LABEL"
+          
+          # Add force_on_cloud label if PR has P0/P1 and doesn't already have force_on_cloud
+          if [[ "$HAS_P0_P1_LABEL" == "true" && "$HAS_FORCE_ON_CLOUD_LABEL" == "false" ]]; then
+            echo "✅ Rule 2: PR has P0 or P1 label - adding force_on_cloud label"
+            curl -X POST \
+              -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
+              -H "Accept: application/vnd.github.v3+json" \
+              "https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/labels" \
+              -d '{"labels":["force_on_cloud"]}'
+          elif [[ "$HAS_P0_P1_LABEL" == "true" && "$HAS_FORCE_ON_CLOUD_LABEL" == "true" ]]; then
+            echo "✅ Rule 2: PR has P0 or P1 label and already has force_on_cloud label - no action needed"
+          else
+            echo "✅ Rule 2: PR does not have P0 or P1 label - no force_on_cloud label needed"
+          fi
+
+          SKIP_UNIT_TEST_CUSTOM="false"
+          if echo "$LABELS" | jq -r '.[].name' | grep -q "^ci/skip_unit-tests_custom$"; then
+            SKIP_UNIT_TEST_CUSTOM="true"
+          fi
+          echo "SKIP_UNIT_TEST_CUSTOM=$SKIP_UNIT_TEST_CUSTOM" >> $GITHUB_ENV
+
+      - name: Rule 3 - Analyze changed files and set build requirements
+        run: |
+          # Get list of changed files
+          CHANGED_FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }})
+          echo "Changed files:"
+          echo "$CHANGED_FILES"
+          echo ""
+          
+          # Initialize all requirements to false
+          REQUIRE_BUILD="false"
+          REQUIRE_DTEST="false"
+          REQUIRE_UNITTEST="false"
+          REQUIRE_ARTIFACTS="false"
+          REQUIRE_SCYLLA_GDB="false"
+          
+          # Check each file against patterns
+          while IFS= read -r file; do
+            if [[ -n "$file" ]]; then
+              echo "Checking file: $file"
+              
+              # Build pattern: ^(?!scripts\/pull_github_pr.sh).*$
+              # Everything except scripts/pull_github_pr.sh
+              if [[ "$file" != "scripts/pull_github_pr.sh" ]]; then
+                REQUIRE_BUILD="true"
+                echo "  ✓ Matches build pattern"
+              fi
+              
+              # Dtest pattern: ^(?!test(.py|\/)|dist\/docker\/|dist\/common\/scripts\/).*$
+              # Everything except test files, dist/docker/, dist/common/scripts/
+              if [[ ! "$file" =~ ^test(\.py|/).*$ ]] && [[ ! "$file" =~ ^dist/docker/.*$ ]] && [[ ! "$file" =~ ^dist/common/scripts/.*$ ]]; then
+                REQUIRE_DTEST="true"
+                echo "  ✓ Matches dtest pattern"
+              fi
+              
+              # Unittest pattern: ^(?!dist\/docker\/|dist\/common\/scripts).*$
+              # Everything except dist/docker/, dist/common/scripts/
+              if [[ ! "$file" =~ ^dist/docker/.*$ ]] && [[ ! "$file" =~ ^dist/common/scripts.*$ ]]; then
+                REQUIRE_UNITTEST="true"
+                echo "  ✓ Matches unittest pattern"
+              fi
+              
+              # Artifacts pattern: ^(?:dist|tools\/toolchain).*$
+              # Files starting with dist or tools/toolchain
+              if [[ "$file" =~ ^dist.*$ ]] || [[ "$file" =~ ^tools/toolchain.*$ ]]; then
+                REQUIRE_ARTIFACTS="true"
+                echo "  ✓ Matches artifacts pattern"
+              fi
+              
+              # Scylla GDB pattern: ^(scylla-gdb.py).*$
+              # Files starting with scylla-gdb.py
+              if [[ "$file" =~ ^scylla-gdb\.py.*$ ]]; then
+                REQUIRE_SCYLLA_GDB="true"
+                echo "  ✓ Matches scylla_gdb pattern"
+              fi
+            fi
+          done <<< "$CHANGED_FILES"
+          
+          # Set environment variables
+          echo "requireBuild=$REQUIRE_BUILD" >> $GITHUB_ENV
+          echo "requireDtest=$REQUIRE_DTEST" >> $GITHUB_ENV
+          echo "requireUnittest=$REQUIRE_UNITTEST" >> $GITHUB_ENV
+          echo "requireArtifacts=$REQUIRE_ARTIFACTS" >> $GITHUB_ENV
+          echo "requireScyllaGdb=$REQUIRE_SCYLLA_GDB" >> $GITHUB_ENV
+          
+          echo ""
+          echo "✅ Rule 3: File analysis complete"
+          echo "Build required: $REQUIRE_BUILD"
+          echo "Dtest required: $REQUIRE_DTEST"
+          echo "Unittest required: $REQUIRE_UNITTEST"
+          echo "Artifacts required: $REQUIRE_ARTIFACTS"
+          echo "Scylla GDB required: $REQUIRE_SCYLLA_GDB"
+
+      - name: Determine Jenkins Job Name
+        run: |
+          if [[ "${{ github.ref_name }}" == "next" ]]; then
+            FOLDER_NAME="scylla-master"
+          elif [[ "${{ github.ref_name }}" == "next-enterprise" ]]; then
+            FOLDER_NAME="scylla-enterprise"
+          else
+            VERSION=$(echo "${{ github.ref_name }}" | awk -F'-' '{print $2}')
+            if [[ "$VERSION" =~ ^202[0-4]\.[0-9]+$ ]]; then
+              FOLDER_NAME="enterprise-$VERSION"
+            elif [[ "$VERSION" =~ ^[0-9]+\.[0-9]+$ ]]; then
+              FOLDER_NAME="scylla-$VERSION"
+            fi
+          fi
+          echo "JOB_NAME=${FOLDER_NAME}/job/scylla-ci" >> $GITHUB_ENV
+
+      - name: Trigger Jenkins Job
+        if: env.draft_or_conflict == 'false' && env.has_file_changes == 'true' && github.event.action == 'opened' || github.event.action == 'reopened'
+        env:
+          JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
+          JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
+          JENKINS_URL: "https://jenkins.scylladb.com"
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
+        run: |
+          PR_NUMBER=${{ github.event.issue.number }}
+          PR_REPO_NAME=${{ github.event.repository.full_name }}
+          echo "Triggering Jenkins Job: $JOB_NAME"
+          curl -X POST \
+            "$JENKINS_URL/job/$JOB_NAME/buildWithParameters? \
+            PR_NUMBER=$PR_NUMBER& \
+            RUN_DTEST=$REQUIRE_DTEST& \
+            RUN_ONLY_SCYLLA_GDB=$REQUIRE_SCYLLA_GDB& \
+            RUN_UNIT_TEST=$REQUIRE_UNITTEST& \
+            FORCE_ON_CLOUD=$HAS_FORCE_ON_CLOUD_LABEL& \
+            SKIP_UNIT_TEST_CUSTOM=$SKIP_UNIT_TEST_CUSTOM& \
+            RUN_ARTIFACT_TESTS=$REQUIRE_ARTIFACTS" \
+            --fail \
+            --user "$JENKINS_USER:$JENKINS_API_TOKEN" \
+            -i -v
+  trigger-ci-via-comment:
+    if: github.event.comment.user.login != 'scylladbbot' && contains(github.event.comment.body, '@scylladbbot') && contains(github.event.comment.body, 'trigger-ci')
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger Scylla-CI Jenkins Job
+        env:
+          JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
+          JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
+          JENKINS_URL: "https://jenkins.scylladb.com"
+        run: |
+          PR_NUMBER=${{ github.event.issue.number }}
+          PR_REPO_NAME=${{ github.event.repository.full_name }}
+          curl -X POST "$JENKINS_URL/job/$JOB_NAME/buildWithParameters?PR_NUMBER=$PR_NUMBER" \
+          --user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail -i -v
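The file classification in the "Rule 3" step of this workflow can be sketched in Python, following the patterns quoted in that step's comments. This is an illustration under assumptions: `classify` and the constant names are hypothetical, and the dtest exclusion is read as "`test.py` or anything under `test/`".

```python
import re

# Exclusion and inclusion patterns, transcribed from the step's comments.
DTEST_EXCLUDE = re.compile(r"^(?:test(?:\.py|/)|dist/docker/|dist/common/scripts/)")
UNITTEST_EXCLUDE = re.compile(r"^(?:dist/docker/|dist/common/scripts)")
ARTIFACTS = re.compile(r"^(?:dist|tools/toolchain)")
SCYLLA_GDB = re.compile(r"^scylla-gdb\.py")


def classify(changed_files):
    """Return which CI stages a list of changed paths would require."""
    req = {"build": False, "dtest": False, "unittest": False,
           "artifacts": False, "scylla_gdb": False}
    for path in changed_files:
        if not path:
            continue
        # Everything except the PR-pulling helper script requires a build.
        if path != "scripts/pull_github_pr.sh":
            req["build"] = True
        if not DTEST_EXCLUDE.search(path):
            req["dtest"] = True
        if not UNITTEST_EXCLUDE.search(path):
            req["unittest"] = True
        if ARTIFACTS.search(path):
            req["artifacts"] = True
        if SCYLLA_GDB.search(path):
            req["scylla_gdb"] = True
    return req


if __name__ == "__main__":
    print(classify(["test/boost/memtable_test.cc"]))
    print(classify(["dist/docker/Dockerfile", "scylla-gdb.py"]))
```

Each flag is a logical OR over all changed files, so a single matching path is enough to request a stage.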
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,6 @@
 [submodule "seastar"]
 	path = seastar
-	url = ../scylla-seastar
+	url = ../seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -49,7 +49,7 @@ include(limit_jobs)
 set(CMAKE_CXX_STANDARD "23" CACHE INTERNAL "")
 set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
 set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")
-set(CMAKE_CXX_VISIBILITY_PRESET hidden)
+set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)

 if(is_multi_config)
    find_package(Seastar)
@@ -90,13 +90,13 @@ if(is_multi_config)
    add_dependencies(Seastar::seastar_testing Seastar)
 else()
    set(Seastar_TESTING ON CACHE BOOL "" FORCE)
-    set(Seastar_API_LEVEL 8 CACHE STRING "" FORCE)
+    set(Seastar_API_LEVEL 9 CACHE STRING "" FORCE)
    set(Seastar_DEPRECATED_OSTREAM_FORMATTERS OFF CACHE BOOL "" FORCE)
    set(Seastar_APPS ON CACHE BOOL "" FORCE)
    set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)
    set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)
    set(Seastar_IO_URING ON CACHE BOOL "" FORCE)
-    set(Seastar_SCHEDULING_GROUPS_COUNT 20 CACHE STRING "" FORCE)
+    set(Seastar_SCHEDULING_GROUPS_COUNT 21 CACHE STRING "" FORCE)
    set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)
    add_subdirectory(seastar)
    target_compile_definitions (seastar
@@ -116,6 +116,7 @@ list(APPEND absl_cxx_flags
 if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
    list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})
 elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+    list(APPEND absl_cxx_flags "-Wno-deprecated-builtins")
    list(APPEND ABSL_LLVM_FLAGS ${absl_cxx_flags})
 endif()
 set(ABSL_DEFAULT_LINKOPTS
@@ -163,7 +164,45 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
 include(add_version_library)
 generate_scylla_version()

+option(Scylla_USE_PRECOMPILED_HEADER "Use precompiled header for Scylla" ON)
+add_library(scylla-precompiled-header STATIC exported_templates.cc)
+target_link_libraries(scylla-precompiled-header PRIVATE
+    absl::headers
+    absl::btree
+    absl::hash
+    absl::raw_hash_set
+    Seastar::seastar
+    Snappy::snappy
+    systemd
+    ZLIB::ZLIB
+    lz4::lz4_static
+    zstd::zstd_static)
+if (Scylla_USE_PRECOMPILED_HEADER)
+  set(Scylla_USE_PRECOMPILED_HEADER_USE ON)
+  find_program(DISTCC_EXEC NAMES distcc OPTIONAL)
+  if (DISTCC_EXEC)
+    if(DEFINED ENV{DISTCC_HOSTS})
+      set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
+      message(STATUS "Disabling precompiled header usage because distcc exists and DISTCC_HOSTS is set, assuming you're using distributed compilation.")
+    else()
+      file(REAL_PATH "~/.distcc/hosts" DIST_CC_HOSTS_PATH EXPAND_TILDE)
+      if (EXISTS ${DIST_CC_HOSTS_PATH})
+        set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
+        message(STATUS "Disabling precompiled header usage because distcc and ~/.distcc/hosts exist, assuming you're using distributed compilation.")
+      endif()
+    endif()
+  endif()
+  if (Scylla_USE_PRECOMPILED_HEADER_USE)
+    message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")
+    target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")
+    target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)
+  endif()
+else()
+  set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
+endif()
+
 add_library(scylla-main STATIC)
+
 target_sources(scylla-main
  PRIVATE
    absl-flat_hash_map.cc
@@ -178,7 +217,6 @@ target_sources(scylla-main
    mutation_query.cc
    node_ops/task_manager_module.cc
    partition_slice_builder.cc
-    querier.cc
    query/query.cc
    query_ranges_to_vnodes.cc
    query/query-result-set.cc
@@ -209,6 +247,7 @@ target_link_libraries(scylla-main
    ZLIB::ZLIB
    lz4::lz4_static
    zstd::zstd_static
+    scylla-precompiled-header
 )

 option(Scylla_CHECK_HEADERS
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -12,7 +12,7 @@ Please use the [issue tracker](https://github.com/scylladb/scylla/issues/) to re

 ## Contributing code to Scylla

-Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
+Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form to cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
 If you need help formatting or sending patches, [check out these instructions](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches).

 The Scylla C++ source code uses the [Seastar coding style](https://github.com/scylladb/seastar/blob/master/coding-style.md) so please adhere to that in your patches. Note that Scylla code is written with `using namespace seastar`, so should not explicitly add the `seastar::` prefix to Seastar symbols. You will usually not need to add `using namespace seastar` to new source files, because most Scylla header files have `#include "seastarx.hh"`, which does this.
--- a/HACKING.md
+++ b/HACKING.md
@@ -43,7 +43,7 @@ $ ./tools/toolchain/dbuild ninja build/release/scylla
 $ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
 ```

-Note: do not mix environemtns - either perform all your work with dbuild, or natively on the host.
+Note: do not mix environments - either perform all your work with dbuild, or natively on the host.
 Note2: you can get to an interactive shell within dbuild by running it without any parameters:
 ```bash
 $ ./tools/toolchain/dbuild
@@ -91,7 +91,7 @@ You can also specify a single mode. For example
 $ ninja-build release
 ```

-Will build everytihng in release mode. The valid modes are
+Will build everything in release mode. The valid modes are

 * Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
  and other sanity checks. It has no optimizations, which allows for debugging with tools like
@@ -361,7 +361,7 @@ avoid that the gold linker can be told to create an index with

 More info at https://gcc.gnu.org/wiki/DebugFission.

-Both options can be enable by passing `--split-dwarf` to configure.py.
+Both options can be enabled by passing `--split-dwarf` to configure.py.

 Note that distcc is *not* compatible with it, but icecream
 (https://github.com/icecc/icecream) is.
@@ -370,7 +370,7 @@ Note that distcc is *not* compatible with it, but icecream

 Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.

-One way to do this it to create a local remote for the Seastar submodule in the Scylla repository:
+One way to do this is to create a local remote for the Seastar submodule in the Scylla repository:

 ```bash
 $ cd $HOME/src/scylla
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Scylla is fairly fussy about its build environment, requiring very recent
 versions of the C++23 compiler and of many libraries to build. The document
 [HACKING.md](HACKING.md) includes detailed information on building and
 developing Scylla, but to get Scylla building quickly on (almost) any build
-machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md),
+machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md).
 This is a pre-configured Docker image which includes recent versions of all
 the required compilers, libraries and build tools. Using the frozen toolchain
 allows you to avoid changing anything in your build machine to meet Scylla's
--- a/2
+++ b/2
@@ -78,7 +78,7 @@ fi

 # Default scylla product/version tags
 PRODUCT=scylla
-VERSION=2025.4.6
+VERSION=2026.1.0-dev

 if test -f version
 then
--- a/alternator/CMakeLists.txt
+++ b/alternator/CMakeLists.txt
@@ -34,5 +34,8 @@ target_link_libraries(alternator
    idl
    absl::headers)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(alternator REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers alternator
  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
--- a/alternator/auth.cc
+++ b/alternator/auth.cc
@@ -11,7 +11,6 @@
 #include "utils/log.hh"
 #include <string>
 #include <string_view>
-#include "bytes.hh"
 #include "alternator/auth.hh"
 #include <fmt/format.h>
 #include "auth/password_authenticator.hh"
--- a/alternator/controller.cc
+++ b/alternator/controller.cc
@@ -137,6 +137,7 @@ future<> controller::start_server() {
            return server.init(addr, alternator_port, alternator_https_port, creds,
                    _config.alternator_enforce_authorization,
                    _config.alternator_warn_authorization,
+                    _config.alternator_max_users_query_size_in_trace_output,
                    &_memory_limiter.local().get_semaphore(),
                    _config.max_concurrent_requests_per_shard);
        }).handle_exception([this, addr, alternator_port, alternator_https_port] (std::exception_ptr ep) {
--- a/alternator/error.hh
+++ b/alternator/error.hh
@@ -94,6 +94,9 @@ public:
    static api_error internal(std::string msg) {
        return api_error("InternalServerError", std::move(msg), http::reply::status_type::internal_server_error);
    }
+    static api_error payload_too_large(std::string msg) {
+        return api_error("PayloadTooLarge", std::move(msg), status_type::payload_too_large);
+    }

    // Provide the "std::exception" interface, to make it easier to print this
    // exception in log messages. Note that this function is *not* used to
--- a/alternator/executor.cc
+++ b/alternator/executor.cc
--- a/alternator/executor.hh
+++ b/alternator/executor.hh
@@ -40,7 +40,6 @@ namespace cql3::selection {

 namespace service {
    class storage_proxy;
-    class cas_shard;
 }

 namespace cdc {
@@ -58,7 +57,6 @@ class schema_builder;
 namespace alternator {

 class rmw_operation;
-class put_or_delete_item;

 schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request);
 bool is_alternator_keyspace(const sstring& ks_name);
@@ -221,16 +219,6 @@ private:

    static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr, const std::map<sstring, sstring> *tags = nullptr);

-    future<> do_batch_write(
-        std::vector<std::pair<schema_ptr, put_or_delete_item>> mutation_builders,
-        service::client_state& client_state,
-        tracing::trace_state_ptr trace_state,
-        service_permit permit);
-
-    future<> cas_write(schema_ptr schema, service::cas_shard cas_shard, const dht::decorated_key& dk,
-        const std::vector<put_or_delete_item>& mutation_builders, service::client_state& client_state,
-        tracing::trace_state_ptr trace_state, service_permit permit);
-
 public:
    static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&, const std::map<sstring, sstring> *tags = nullptr);

--- a/alternator/rmw_operation.hh
+++ b/alternator/rmw_operation.hh
@@ -8,6 +8,8 @@

 #pragma once

+#include "cdc/cdc_options.hh"
+#include "cdc/log.hh"
 #include "seastarx.hh"
 #include "service/paxos/cas_request.hh"
 #include "service/cas_shard.hh"
@@ -56,7 +58,7 @@ public:
    static write_isolation get_write_isolation_for_schema(schema_ptr schema);

    static write_isolation default_write_isolation;
-public:
+
    static void set_default_write_isolation(std::string_view mode);

 protected:
@@ -107,10 +109,11 @@ public:
    // violating this). We mark apply() "const" to let the compiler validate
    // this for us. The output-only field _return_attributes is marked
    // "mutable" above so that apply() can still write to it.
-    virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const = 0;
+    virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts, cdc::per_request_options& cdc_opts) const = 0;
    // Convert the above apply() into the signature needed by cas_request:
-    virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts) override;
+    virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts, cdc::per_request_options& cdc_opts) override;
    virtual ~rmw_operation() = default;
+    const wcu_consumed_capacity_counter& consumed_capacity() const noexcept { return _consumed_capacity; }
    schema_ptr schema() const { return _schema; }
    const rjson::value& request() const { return _request; }
    rjson::value&& move_request() && { return std::move(_request); }
@@ -124,6 +127,9 @@ public:
            stats& per_table_stats,
            uint64_t& wcu_total);
    std::optional<service::cas_shard> shard_for_execute(bool needs_read_before_write);
+
+private:
+    inline bool should_fill_preimage() const { return _schema->cdc_options().enabled(); }
 };

 } // namespace alternator
--- a/alternator/serialization.cc
+++ b/alternator/serialization.cc
@@ -12,7 +12,7 @@
 #include "serialization.hh"
 #include "error.hh"
 #include "types/concrete_types.hh"
-#include "cql3/type_json.hh"
+#include "types/json_utils.hh"
 #include "mutation/position_in_partition.hh"

 static logging::logger slogger("alternator-serialization");
--- a/alternator/server.cc
+++ b/alternator/server.cc
@@ -13,6 +13,7 @@
 #include <seastar/http/function_handlers.hh>
 #include <seastar/http/short_streams.hh>
 #include <seastar/core/coroutine.hh>
+#include <seastar/coroutine/maybe_yield.hh>
 #include <seastar/util/defer.hh>
 #include <seastar/util/short_streams.hh>
 #include "seastarx.hh"
@@ -32,6 +33,7 @@
 #include "utils/aws_sigv4.hh"
 #include "client_data.hh"
 #include "utils/updateable_value.hh"
+#include <zlib.h>

 static logging::logger slogger("alternator-server");

@@ -428,35 +430,82 @@ static tracing::trace_state_ptr create_tracing_session(tracing::tracing& tracing
    return tracing_instance.create_session(tracing::trace_type::QUERY, props);
 }

-// truncated_content_view() prints a potentially long chunked_content for
-// debugging purposes. In the common case when the content is not excessively
-// long, it just returns a view into the given content, without any copying.
-// But when the content is very long, it is truncated after some arbitrary
-// max_len (or one chunk, whichever comes first), with "<truncated>" added at
-// the end. To do this modification to the string, we need to create a new
-// std::string, so the caller must pass us a reference to one, "buf", where
-// we can store the content. The returned view is only alive for as long this
-// buf is kept alive.
-static std::string_view truncated_content_view(const chunked_content& content, std::string& buf) {
-    constexpr size_t max_len = 1024;
-    if (content.empty()) {
-        return std::string_view();
-    } else if (content.size() == 1 && content.begin()->size() <= max_len) {
-        return std::string_view(content.begin()->get(), content.begin()->size());
-    } else {
-        buf = std::string(content.begin()->get(), std::min(content.begin()->size(), max_len)) + "<truncated>";
-        return std::string_view(buf);
+// A helper class to represent a potentially truncated view of a chunked_content.
+// If the content is short enough and single chunked, it just holds a view into the content.
+// Otherwise it will be copied into an internal buffer, possibly truncated (depending on maximum allowed size passed in),
+// and the view will point into that buffer.
+// `as_view()` method will return the view.
+// `take_as_sstring()` will either move out the internal buffer (if any), or create a new sstring from the view.
+// You should consider `as_view()` valid as long as both the original chunked_content and the truncated_content object are alive.
+class truncated_content {
+    std::string_view _view;
+    sstring _content_maybe;
+
+    void copy_from_content(const chunked_content& content) {
+        size_t offset = 0;
+        for(auto &tmp : content) {
+            size_t to_copy = std::min(tmp.size(), _content_maybe.size() - offset);
+            std::copy(tmp.get(), tmp.get() + to_copy, _content_maybe.data() + offset);
+            offset += to_copy;
+            if (offset >= _content_maybe.size()) {
+                break;
+            }
+        }
    }
+public:
+    truncated_content(const chunked_content& content, size_t max_len = std::numeric_limits<size_t>::max()) {
+        if (content.empty()) return;
+        if (content.size() == 1 && content.begin()->size() <= max_len) {
+            _view = std::string_view(content.begin()->get(), content.begin()->size());
+            return;
+        }
+
+        constexpr std::string_view truncated_text = "<truncated>";
+        size_t content_size = 0;
+        for(auto &tmp : content) {
+            content_size += tmp.size();
+        }
+        if (content_size <= max_len) {
+            _content_maybe = sstring{ sstring::initialized_later{}, content_size };
+            copy_from_content(content);
+        }
+        else {
+            _content_maybe = sstring{ sstring::initialized_later{}, max_len + truncated_text.size() };
+            copy_from_content(content);
+            std::copy(truncated_text.begin(), truncated_text.end(), _content_maybe.data() + _content_maybe.size() - truncated_text.size());
+        }
+        _view = std::string_view(_content_maybe);
+    }
+
+    std::string_view as_view() const { return _view; }
+    sstring take_as_sstring() && {
+        if (_content_maybe.empty() && !_view.empty()) {
+            return sstring{_view};
+        }
+        return std::move(_content_maybe);
+    }
+};
+
+// `truncated_content_view` will produce an object representing a view to a passed content
+// possibly truncated at some length. The value returned is used in two ways:
+// - to print it in logs (use `as_view()` method for this)
+// - to pass it to tracing object, where it will be stored and used later
+//   (use `take_as_sstring()` method as this produces a copy in form of a sstring)
+// `truncated_content` delays constructing `sstring` object until it's actually needed.
+// `truncated_content` is valid as long as passed `content` is alive.
+// if the content is truncated, `<truncated>` will be appended at the maximum size limit
+// and total size will be `max_users_query_size_in_trace_output() + strlen("<truncated>")`.
+static truncated_content truncated_content_view(const chunked_content& content, size_t max_size) {
+    return truncated_content{content, max_size};
 }

-static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_state, std::string_view username, std::string_view op, const chunked_content& query) {
+static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_state, std::string_view username, std::string_view op, const chunked_content& query, size_t max_users_query_size_in_trace_output) {
    tracing::trace_state_ptr trace_state;
    tracing::tracing& tracing_instance = tracing::tracing::get_local_tracing_instance();
    if (tracing_instance.trace_next_query() || tracing_instance.slow_query_tracing_enabled()) {
        trace_state = create_tracing_session(tracing_instance);
-        std::string buf;
        tracing::add_session_param(trace_state, "alternator_op", op);
-        tracing::add_query(trace_state, truncated_content_view(query, buf));
+        tracing::add_query(trace_state, truncated_content_view(query, max_users_query_size_in_trace_output).take_as_sstring());
        tracing::begin(trace_state, seastar::format("Alternator {}", op), client_state.get_client_address());
        if (!username.empty()) {
            tracing::set_username(trace_state, auth::authenticated_user(username));
@@ -465,26 +514,197 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_
    return trace_state;
 }

+// This read_entire_stream() is similar to Seastar's read_entire_stream()
+// which reads the given content_stream until its end into non-contiguous
+// memory. The difference is that this implementation takes an extra length
+// limit, and throws an error if we read more than this limit.
+// This length-limited variant would not have been needed if Seastar's HTTP
+// server's set_content_length_limit() worked in every case, but unfortunately
+// it does not - it only works if the request has a Content-Length header (see
+// issue #8196). In contrast this function can limit the request's length no
+// matter how it's encoded. We need this limit to protect Alternator from
+// oversized requests that can deplete memory.
+static future<chunked_content>
+read_entire_stream(input_stream<char>& inp, size_t length_limit) {
+    chunked_content ret;
+    // We try to read length_limit + 1 bytes, so that we can throw an
+    // exception if we managed to read more than length_limit.
+    ssize_t remain = length_limit + 1;
+    do {
+        temporary_buffer<char> buf = co_await inp.read_up_to(remain);
+        if (buf.empty()) {
+            break;
+        }
+        remain -= buf.size();
+        ret.push_back(std::move(buf));
+    } while (remain > 0);
+    // If we read the full length_limit + 1 bytes, we went over the limit:
+    if (remain <= 0) {
+        // By throwing an error here, we may send a reply (the error message)
+        // without having read the full request body. Seastar's httpd will
+        // realize that we have not read the entire content stream, and
+        // correctly mark the connection unreusable, i.e., close it.
+        // This means we are currently exposed to issue #12166 (caused by
+        // Seastar issue 1325), where the client may get an RST instead of
+        // a FIN, and may rarely get a "Connection reset by peer" before
+        // reading the error we send.
+        throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
+    }
+    co_return ret;
+}
+
+// safe_gzip_zstream is an exception-safe wrapper for zlib's z_stream.
+// The "z_stream" struct is used by zlib to hold state while decompressing a
+// stream of data. It allocates memory which must be freed with inflateEnd(),
+// which the destructor of this class does.
+class safe_gzip_zstream {
+    z_stream _zs;
+public:
+    safe_gzip_zstream() {
+        memset(&_zs, 0, sizeof(_zs));
+        // The strange 16 + MAX_WBITS tells zlib to expect and decode
+        // a gzip header, not a zlib header.
+        if (inflateInit2(&_zs, 16 + MAX_WBITS) != Z_OK) {
+            // Should only happen if memory allocation fails
+            throw std::bad_alloc();
+        }
+    }
+    ~safe_gzip_zstream() {
+        inflateEnd(&_zs);
+    }
+    z_stream* operator->() {
+        return &_zs;
+    }
+    z_stream* get() {
+        return &_zs;
+    }
+    void reset() {
+        inflateReset(&_zs);
+    }
+};
+
+// ungzip() takes a chunked_content with a gzip-compressed request body,
+// uncompresses it, and returns the uncompressed content as a chunked_content.
+// If the uncompressed content exceeds length_limit, an error is thrown.
+static future<chunked_content>
+ungzip(chunked_content&& compressed_body, size_t length_limit) {
+    chunked_content ret;
+    // output_buf can be any size - when uncompressing input_buf, it doesn't
+    // need to fit in a single output_buf, we'll use multiple output_buf for
+    // a single input_buf if needed.
+    constexpr size_t OUTPUT_BUF_SIZE = 4096;
+    temporary_buffer<char> output_buf;
+    safe_gzip_zstream strm;
+    bool complete_stream = false; // empty input is not a valid gzip
+    size_t total_out_bytes = 0;
+    for (const temporary_buffer<char>& input_buf : compressed_body) {
+        if (input_buf.empty()) {
+            continue;
+        }
+        complete_stream = false;
+        strm->next_in = (Bytef*) input_buf.get();
+        strm->avail_in = (uInt) input_buf.size();
+        do {
+            co_await coroutine::maybe_yield();
+            if (output_buf.empty()) {
+                output_buf = temporary_buffer<char>(OUTPUT_BUF_SIZE);
+            }
+            strm->next_out = (Bytef*) output_buf.get();
+            strm->avail_out = OUTPUT_BUF_SIZE;
+            int e = inflate(strm.get(), Z_NO_FLUSH);
+            size_t out_bytes = OUTPUT_BUF_SIZE - strm->avail_out;
+            if (out_bytes > 0) {
+                // If output_buf is nearly full, we save it as-is in ret. But
+                // if it only has little data, better copy to a small buffer.
+                if (out_bytes > OUTPUT_BUF_SIZE/2) {
+                    ret.push_back(std::move(output_buf).prefix(out_bytes));
+                    // output_buf is now empty. if this loop finds more input,
+                    // we'll allocate a new output buffer.
+                } else {
+                    ret.push_back(temporary_buffer<char>(output_buf.get(), out_bytes));
+                }
+                total_out_bytes += out_bytes;
+                if (total_out_bytes > length_limit) {
+                    throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
+                }
+            }
+            if (e == Z_STREAM_END) {
+                // There may be more input after the first gzip stream - in
+                // either this input_buf or the next one. The additional input
+                // should be a second concatenated gzip. We need to allow that
+                // by resetting the gzip stream and continuing the input loop
+                // until there's no more input.
+                strm.reset();
+                if (strm->avail_in == 0) {
+                    complete_stream = true;
+                    break;
+                }
+            } else if (e != Z_OK && e != Z_BUF_ERROR) {
+                // DynamoDB returns an InternalServerError when given a bad
+                // gzip request body. See test test_broken_gzip_content
+                throw api_error::internal("Error during gzip decompression of request body");
+            }
+        } while (strm->avail_in > 0 || strm->avail_out == 0);
+    }
+    if (!complete_stream) {
+        // The gzip stream was not properly finished with Z_STREAM_END
+        throw api_error::internal("Truncated gzip in request body");
+    }
+    co_return ret;
+}
+
 future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {
    _executor._stats.total_operations++;
    sstring target = req->get_header("X-Amz-Target");
    // target is DynamoDB API version followed by a dot '.' and operation type (e.g. CreateTable)
    auto dot = target.find('.');
    std::string_view op = (dot == sstring::npos) ? std::string_view() : std::string_view(target).substr(dot+1);
+    if (req->content_length > request_content_length_limit) {
+        // If we have a Content-Length header and know the request will be too
+        // long, we don't need to wait for read_entire_stream() below to
+        // discover it. And we definitely mustn't try to get_units() below for
+        // such a size.
+        co_return api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", request_content_length_limit));
+    }
    // JSON parsing can allocate up to roughly 2x the size of the raw
    // document, + a couple of bytes for maintenance.
-    // TODO: consider the case where req->content_length is missing. Maybe
-    // we need to take the content_length_limit and return some of the units
-    // when we finish read_content_and_verify_signature?
-    size_t mem_estimate = req->content_length * 2 + 8000;
+    // If the Content-Length of the request is not available, we assume
+    // the largest possible request (request_content_length_limit, i.e., 16 MB)
+    // and after reading the request we return_units() the excess.
+    size_t mem_estimate = (req->content_length ? req->content_length : request_content_length_limit) * 2 + 8000;
    auto units_fut = get_units(*_memory_limiter, mem_estimate);
    if (_memory_limiter->waiters()) {
        ++_executor._stats.requests_blocked_memory;
    }
    auto units = co_await std::move(units_fut);
    SCYLLA_ASSERT(req->content_stream);
-    chunked_content content = co_await util::read_entire_stream(*req->content_stream);
+    chunked_content content = co_await read_entire_stream(*req->content_stream, request_content_length_limit);
+    // If the request had no Content-Length, we reserved too many units
+    // so we need to return some.
+    if (req->content_length == 0) {
+        size_t content_length = 0;
+        for (const auto& chunk : content) {
+            content_length += chunk.size();
+        }
+        size_t new_mem_estimate = content_length * 2 + 8000;
+        units.return_units(mem_estimate - new_mem_estimate);
+    }
    auto username = co_await verify_signature(*req, content);
+    // If the request is compressed, uncompress it now, after we checked
+    // the signature (the signature is computed on the compressed content).
+    // We apply the request_content_length_limit again to the uncompressed
+    // content - we don't want to allow a tiny compressed request to
+    // expand to a huge uncompressed request.
+    sstring content_encoding = req->get_header("Content-Encoding");
+    if (content_encoding == "gzip") {
+        content = co_await ungzip(std::move(content), request_content_length_limit);
+    } else if (!content_encoding.empty()) {
+        // DynamoDB returns a 500 error for unsupported Content-Encoding.
+        // I'm not sure if this is the best error code, but let's do it too.
+        // See the test test_garbage_content_encoding confirming this case.
+        co_return api_error::internal("Unsupported Content-Encoding");
+    }
+
    // As long as the system_clients_entry object is alive, this request will
    // be visible in the "system.clients" virtual table. When requested, this
    // entry will be formatted by server::ongoing_request::make_client_data().
@@ -494,8 +714,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
        req->get_protocol_name() == "https");

    if (slogger.is_enabled(log_level::trace)) {
-        std::string buf;
-        slogger.trace("Request: {} {} {}", op, truncated_content_view(content, buf), req->_headers);
+        slogger.trace("Request: {} {} {}", op, truncated_content_view(content, _max_users_query_size_in_trace_output).as_view(), req->_headers);
    }
    auto callback_it = _callbacks.find(op);
    if (callback_it == _callbacks.end()) {
@@ -515,7 +734,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
    }
    co_await client_state.maybe_update_per_service_level_params();

-    tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);
+    tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content, _max_users_query_size_in_trace_output.get());
    tracing::trace(trace_state, "{}", op);

    auto user = client_state.user();
@@ -566,7 +785,7 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
        , _auth_service(auth_service)
        , _sl_controller(sl_controller)
        , _key_cache(1024, 1min, slogger)
-        , _enforce_authorization(false)
+        , _max_users_query_size_in_trace_output(1024)
        , _enabled_servers{}
        , _pending_requests("alternator::server::pending_requests")
        , _timeout_config(_proxy.data_dictionary().get_config())
@@ -647,12 +866,13 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
 }

 future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
-        utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization,
+        utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
        semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {
    _memory_limiter = memory_limiter;
    _enforce_authorization = std::move(enforce_authorization);
    _warn_authorization = std::move(warn_authorization);
    _max_concurrent_requests = std::move(max_concurrent_requests);
+    _max_users_query_size_in_trace_output = std::move(max_users_query_size_in_trace_output);
    if (!port && !https_port) {
        return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
                " must be specified in order to init an alternator HTTP server instance"));
@@ -662,14 +882,12 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:

        if (port) {
            set_routes(_http_server._routes);
-            _http_server.set_content_length_limit(server::content_length_limit);
            _http_server.set_content_streaming(true);
            _http_server.listen(socket_address{addr, *port}).get();
            _enabled_servers.push_back(std::ref(_http_server));
        }
        if (https_port) {
            set_routes(_https_server._routes);
-            _https_server.set_content_length_limit(server::content_length_limit);
            _https_server.set_content_streaming(true);

            if (this_shard_id() == 0) {
--- a/alternator/server.hh
+++ b/alternator/server.hh
@@ -28,7 +28,11 @@ namespace alternator {
 using chunked_content = rjson::chunked_content;

 class server : public peering_sharded_service<server> {
-    static constexpr size_t content_length_limit = 16*MB;
+    // The maximum size of a request body that Alternator will accept,
+    // in bytes. This is a safety measure to prevent Alternator from
+    // running out of memory when a client sends a very large request.
+    // DynamoDB enforces the same 16 MB limit.
+    static constexpr size_t request_content_length_limit = 16*MB;
    using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;
    using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
@@ -44,6 +48,7 @@ class server : public peering_sharded_service<server> {
    key_cache _key_cache;
    utils::updateable_value<bool> _enforce_authorization;
    utils::updateable_value<bool> _warn_authorization;
+    utils::updateable_value<uint64_t> _max_users_query_size_in_trace_output;
    utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;
    named_gate _pending_requests;
    // In some places we will need a CQL updateable_timeout_config object even
@@ -95,7 +100,7 @@ public:
    server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);

    future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
-            utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization,
+            utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
            semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);
    future<> stop();
    // get_client_data() is called (on each shard separately) when the virtual
--- a/alternator/stats.cc
+++ b/alternator/stats.cc
@@ -154,6 +154,18 @@ static void register_metrics_with_optional_table(seastar::metrics::metric_groups
                    [&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_get_item_histogram);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
            seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,
                    [&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_write_item_histogram);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
+            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
+                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.get_item_op_size_kb);})(op("GetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
+            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
+                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.put_item_op_size_kb);})(op("PutItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
+            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
+                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.delete_item_op_size_kb);})(op("DeleteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
+            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
+                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.update_item_op_size_kb);})(op("UpdateItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
+            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
+                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.batch_get_item_op_size_kb);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
+            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
+                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.batch_write_item_op_size_kb);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
    });

    seastar::metrics::label expression_label("expression");
--- a/alternator/stats.hh
+++ b/alternator/stats.hh
@@ -79,6 +79,32 @@ public:
        utils::estimated_histogram batch_get_item_histogram{22}; // a histogram that covers the range 1 - 100
        utils::estimated_histogram batch_write_item_histogram{22}; // a histogram that covers the range 1 - 100
    } api_operations;
+    // Operation size metrics
+    struct {
+        // Item size statistics collected per table and aggregated per node.
+        // Each histogram covers the range 0 - 446. Resolves #25143.
+        // A size is the retrieved item's size.
+        utils::estimated_histogram get_item_op_size_kb{30};
+        // A size is the maximum of the new item's size and the old item's size.
+        utils::estimated_histogram put_item_op_size_kb{30};
+        // A size is the deleted item's size. If the deleted item's size is
+        // unknown (i.e. read-before-write wasn't necessary and it wasn't
+        // forced by a configuration option), it won't be recorded on the
+        // histogram.
+        utils::estimated_histogram delete_item_op_size_kb{30};
+        // A size is the maximum of the existing item's size and the estimated size
+        // of the update. This will be changed to the maximum of the existing item's
+        // size and the new item's size in a subsequent PR.
+        utils::estimated_histogram update_item_op_size_kb{30};
+
+        // A size is the sum of the sizes of all items per table. This means
+        // that a single BatchGetItem / BatchWriteItem updates the histogram
+        // for each table that it has items in.
+        // The sizes are the retrieved items' sizes grouped per table.
+        utils::estimated_histogram batch_get_item_op_size_kb{30};
+        // The sizes are the written items' sizes grouped per table.
+        utils::estimated_histogram batch_write_item_op_size_kb{30};
+    } operation_sizes;
    // Count of authentication and authorization failures, counted if either
    // alternator_enforce_authorization or alternator_warn_authorization are
    // set to true. If both are false, no authentication or authorization
@@ -137,4 +163,8 @@ struct table_stats {
 };
 void register_metrics(seastar::metrics::metric_groups& metrics, const stats& stats);

+inline uint64_t bytes_to_kb_ceil(uint64_t bytes) {
+    return (bytes + 1023) / 1024;
+}
+
 }
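The `bytes_to_kb_ceil` helper added to `alternator/stats.hh` rounds a byte count up to whole kilobytes. A minimal sketch of its behaviour at the boundaries follows; the arithmetic is copied from the hunk above, while `constexpr` is an assumption of this sketch (the patch declares the function `inline`), added only so the examples can be checked at compile time.

```cpp
#include <cstdint>

// Same arithmetic as the helper in the patch. constexpr is an assumption of
// this sketch; the patch itself uses a plain inline function.
constexpr std::uint64_t bytes_to_kb_ceil(std::uint64_t bytes) {
    return (bytes + 1023) / 1024;
}

// An empty item falls into the 0 KB bucket.
static_assert(bytes_to_kb_ceil(0) == 0);
// Any non-empty item up to 1024 bytes counts as 1 KB.
static_assert(bytes_to_kb_ceil(1) == 1);
static_assert(bytes_to_kb_ceil(1024) == 1);
// One byte over a boundary rounds up to the next kilobyte.
static_assert(bytes_to_kb_ceil(1025) == 2);
```

Note that `bytes + 1023` wraps around for inputs within 1023 of the maximum `uint64_t` value; this should not matter for request bodies, which the server limits to 16 MB.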
--- a/alternator/streams.cc
+++ b/alternator/streams.cc
@@ -13,7 +13,6 @@

 #include <seastar/json/formatter.hh>

-#include "auth/permission.hh"
 #include "db/config.hh"

 #include "cdc/log.hh"
@@ -127,7 +126,7 @@ public:
    }
 };

-}
+} // namespace alternator

 template<typename ValueType>
 struct rapidjson::internal::TypeHelper<ValueType, alternator::stream_arn>
@@ -297,7 +296,7 @@ sequence_number::sequence_number(std::string_view v)
    }())
 {}

-}
+} // namespace alternator

 template<typename ValueType>
 struct rapidjson::internal::TypeHelper<ValueType, alternator::shard_id>
@@ -357,7 +356,7 @@ static stream_view_type cdc_options_to_steam_view_type(const cdc::options& opts)
    return type;
 }

-}
+} // namespace alternator

 template<typename ValueType>
 struct rapidjson::internal::TypeHelper<ValueType, alternator::stream_view_type>
@@ -476,10 +475,10 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
        } else {
            status = "ENABLED";
        }
-    } 
+    }

    auto ttl = std::chrono::seconds(opts.ttl());
-    
+
    rjson::add(stream_desc, "StreamStatus", rjson::from_string(status));

    stream_view_type type = cdc_options_to_steam_view_type(opts);
@@ -715,7 +714,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&

    auto type = rjson::get<shard_iterator_type>(request, "ShardIteratorType");
    auto seq_num = rjson::get_opt<sequence_number>(request, "SequenceNumber");
-    
+
    if (type < shard_iterator_type::TRIM_HORIZON && !seq_num) {
        throw api_error::validation("Missing required parameter \"SequenceNumber\"");
    }
@@ -725,7 +724,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&

    auto stream_arn = rjson::get<alternator::stream_arn>(request, "StreamArn");
    auto db = _proxy.data_dictionary();
-    
+
    schema_ptr schema = nullptr;
    std::optional<shard_id> sid;

@@ -790,7 +789,7 @@ struct event_id {
        return os;
    }
 };
-}
+} // namespace alternator

 template<typename ValueType>
 struct rapidjson::internal::TypeHelper<ValueType, alternator::event_id>
@@ -941,7 +940,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
                rjson::add(record, "awsRegion", rjson::from_string(dc_name));
                rjson::add(record, "eventID", event_id(iter.shard.id, *timestamp));
                rjson::add(record, "eventSource", "scylladb:alternator");
-                rjson::add(record, "eventVersion", "1.0");
+                rjson::add(record, "eventVersion", "1.1");
                rjson::push_back(records, std::move(record));
                record = rjson::empty_object();
                --limit;
@@ -1000,6 +999,16 @@ future<executor::request_return_type> executor::get_records(client_state& client
            case cdc::operation::insert:
                rjson::add(record, "eventName", "INSERT");
                break;
+            case cdc::operation::service_row_delete:
+            case cdc::operation::service_partition_delete:
+            {
+                auto user_identity = rjson::empty_object();
+                rjson::add(user_identity, "Type", "Service");
+                rjson::add(user_identity, "PrincipalId", "dynamodb.amazonaws.com");
+                rjson::add(record, "userIdentity", std::move(user_identity));
+                rjson::add(record, "eventName", "REMOVE");
+                break;
+            }
            default:
                rjson::add(record, "eventName", "REMOVE");
                break;
@@ -1064,9 +1073,7 @@ bool executor::add_stream_options(const rjson::value& stream_specification, sche
    }

    if (stream_enabled->GetBool()) {
-        auto db = sp.data_dictionary();
-
-        if (!db.features().alternator_streams) {
+        if (!sp.features().alternator_streams) {
            throw api_error::validation("StreamSpecification: alternator streams feature not enabled in cluster.");
        }

@@ -1125,4 +1132,4 @@ void executor::supplement_table_stream_info(rjson::value& descr, const schema& s
    }
 }

-}
+} // namespace alternator
--- a/alternator/ttl.cc
+++ b/alternator/ttl.cc
@@ -17,6 +17,7 @@
 #include <seastar/core/lowres_clock.hh>
 #include <seastar/coroutine/maybe_yield.hh>

+#include "cdc/log.hh"
 #include "exceptions/exceptions.hh"
 #include "gms/gossiper.hh"
 #include "gms/inet_address.hh"
@@ -67,7 +68,7 @@ extern const sstring TTL_TAG_KEY;

 future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
    _stats.api_operations.update_time_to_live++;
-    if (!_proxy.data_dictionary().features().alternator_ttl) {
+    if (!_proxy.features().alternator_ttl) {
        co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Experimental support is available if the 'alternator-ttl' experimental feature is enabled on all nodes.");
    }

@@ -292,7 +293,12 @@ static future<> expire_item(service::storage_proxy& proxy,
        db::consistency_level::LOCAL_QUORUM,
        executor::default_timeout(), // FIXME - which timeout?
        qs.get_trace_state(), qs.get_permit(),
-        db::allow_per_partition_rate_limit::no);
+        db::allow_per_partition_rate_limit::no,
+        false,
+        cdc::per_request_options{
+            .is_system_originated = true,
+        }
+    );
 }

 static size_t random_offset(size_t min, size_t max) {
--- a/api/CMakeLists.txt
+++ b/api/CMakeLists.txt
@@ -106,5 +106,8 @@ target_link_libraries(api
    wasmtime_bindings
    absl::headers)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(api REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers api
  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
--- a/api/api-doc/storage_service.json
+++ b/api/api-doc/storage_service.json
@@ -220,6 +220,25 @@
            }
         ]
      },
+      {
+         "path":"/storage_service/nodes/excluded",
+         "operations":[
+            {
+               "method":"GET",
+               "summary":"Retrieve host ids of nodes which are marked as excluded",
+               "type":"array",
+               "items":{
+                  "type":"string"
+               },
+               "nickname":"get_excluded_nodes",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[
+               ]
+            }
+         ]
+      },
      {
         "path":"/storage_service/nodes/joining",
         "operations":[
@@ -594,6 +613,50 @@
            }
         ]
      },
+      {
+         "path": "/storage_service/natural_endpoints/v2/{keyspace}",
+         "operations": [
+            {
+               "method": "GET",
+               "summary":"Returns the N endpoints that are responsible for storing the specified key, i.e. its replicas.",
+               "type": "array",
+               "items": {
+                  "type": "string"
+               },
+               "nickname": "get_natural_endpoints_v2",
+               "produces": [
+                  "application/json"
+               ],
+               "parameters": [
+                  {
+                     "name": "keyspace",
+                     "description": "The keyspace to query about.",
+                     "required": true,
+                     "allowMultiple": false,
+                     "type": "string",
+                     "paramType": "path"
+                  },
+                  {
+                     "name": "cf",
+                     "description": "Column family name.",
+                     "required": true,
+                     "allowMultiple": false,
+                     "type": "string",
+                     "paramType": "query"
+                  },
+                  {
+                     "name": "key_component",
+                     "description": "Each component of the key for which we need to find the endpoint (e.g. ?key_component=part1&key_component=part2).",
+                     "required": true,
+                     "allowMultiple": true,
+                     "type": "string",
+                     "paramType": "query"
+                  }
+               ]
+            }
+         ]
+      },
+
      {
         "path":"/storage_service/cdc_streams_check_and_repair",
         "operations":[
@@ -1132,6 +1195,14 @@
                     "allowMultiple":false,
                     "type":"string",
                     "paramType":"query"
+                  },
+                  {
+                     "name": "drop_unfixable_sstables",
+                     "description": "When set to true, drop unfixable sstables. Applies only to scrub mode SEGREGATE.",
+                     "required":false,
+                     "allowMultiple":false,
+                     "type":"boolean",
+                     "paramType":"query"
                  }
               ]
            }
@@ -1551,6 +1622,30 @@
            }
         ]
      },
+      {
+         "path":"/storage_service/exclude_node",
+         "operations":[
+            {
+               "method":"POST",
+               "summary":"Marks the node as permanently down (excluded).",
+               "type":"void",
+               "nickname":"exclude_node",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[
+                  {
+                     "name":"hosts",
+                     "description":"Comma-separated list of host ids to exclude",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"query"
+                  }
+               ]
+            }
+         ]
+      },
      {
         "path":"/storage_service/removal_status",
         "operations":[
@@ -2956,7 +3051,7 @@
                  },
                  {
                     "name":"incremental_mode",
-                     "description":"Set the incremental repair mode. Can be 'disabled', 'incremental', or 'full'. 'incremental': The incremental repair logic is enabled. Unrepaired sstables will be included for repair. Repaired sstables will be skipped. The incremental repair states will be updated after repair. 'full': The incremental repair logic is enabled. Both repaired and unrepaired sstables will be included for repair. The incremental repair states will be updated after repair. 'disabled': The incremental repair logic is disabled completely. The incremental repair states, e.g., repaired_at in sstables and sstables_repaired_at in the system.tablets table, will not be updated after repair. When the option is not provided, it defaults to 'disabled' mode.",
+                     "description":"Set the incremental repair mode. Can be 'disabled', 'incremental', or 'full'. 'incremental': The incremental repair logic is enabled. Unrepaired sstables will be included for repair. Repaired sstables will be skipped. The incremental repair states will be updated after repair. 'full': The incremental repair logic is enabled. Both repaired and unrepaired sstables will be included for repair. The incremental repair states will be updated after repair. 'disabled': The incremental repair logic is disabled completely. The incremental repair states, e.g., repaired_at in sstables and sstables_repaired_at in the system.tablets table, will not be updated after repair. When the option is not provided, it defaults to 'incremental' mode.",
                     "required":false,
                     "allowMultiple":false,
                     "type":"string",
--- a/api/api-doc/task_manager.json
+++ b/api/api-doc/task_manager.json
@@ -349,9 +349,13 @@
               "type":"long",
               "description":"The shard the task is running on"
            },
+            "creation_time":{
+               "type":"datetime",
+               "description":"The creation time of the task (when it was queued); extracted from the task_id UUID"
+            },
            "start_time":{
               "type":"datetime",
-               "description":"The start time of the task; unspecified (equal to epoch) when state == created"
+               "description":"The start time of the task (when execution began); unspecified (equal to epoch) when state == created"
            },
            "end_time":{
               "type":"datetime",
@@ -398,13 +402,17 @@
               "type":"boolean",
               "description":"Boolean flag indicating whether the task can be aborted"
            },
+            "creation_time":{
+               "type":"datetime",
+               "description":"The creation time of the task (when it was queued); extracted from the task_id UUID"
+            },
            "start_time":{
               "type":"datetime",
-               "description":"The start time of the task"
+               "description":"The start time of the task (when execution began); unspecified (equal to epoch) when state == created"
            },
            "end_time":{
               "type":"datetime",
-               "description":"The end time of the task (unspecified when the task is not completed)"
+               "description":"The end time of the task (when execution completed); unspecified (equal to epoch) when the task is not completed"
            },
            "error":{
               "type":"string",
--- a/api/api.cc
+++ b/api/api.cc
@@ -216,10 +216,10 @@ future<> unset_server_gossip(http_context& ctx) {
    });
 }

-future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks) {
+future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db) {
    co_await register_api(ctx, "column_family",
-                "The column family API", [&db, &sys_ks] (http_context& ctx, routes& r) {
-                    set_column_family(ctx, r, db, sys_ks);
+                "The column family API", [&db] (http_context& ctx, routes& r) {
+                    set_column_family(ctx, r, db);
                });
    co_await register_api(ctx, "cache_service",
            "The cache service API", [&db] (http_context& ctx, routes& r) {
--- a/api/api_init.hh
+++ b/api/api_init.hh
@@ -58,7 +58,6 @@ class sstables_format_selector;
 namespace view {
 class view_builder;
 }
-class system_keyspace;
 }
 namespace netw { class messaging_service; }
 class repair_service;
@@ -118,7 +117,7 @@ future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_to
 future<> unset_server_token_metadata(http_context& ctx);
 future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
 future<> unset_server_gossip(http_context& ctx);
-future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks);
+future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db);
 future<> unset_server_column_family(http_context& ctx);
 future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
 future<> unset_server_messaging_service(http_context& ctx);
--- a/api/column_family.cc
+++ b/api/column_family.cc
@@ -18,7 +18,6 @@
 #include "utils/assert.hh"
 #include "utils/estimated_histogram.hh"
 #include <algorithm>
-#include "db/system_keyspace.hh"
 #include "db/data_listeners.hh"
 #include "storage_service.hh"
 #include "compaction/compaction_manager.hh"
@@ -67,6 +66,13 @@ static future<json::json_return_type>  get_cf_stats(sharded<replica::database>&
    }, std::plus<int64_t>());
 }

+static future<json::json_return_type>  get_cf_stats(sharded<replica::database>& db,
+        std::function<int64_t(const replica::column_family_stats&)> f) {
+    return map_reduce_cf(db, int64_t(0), [f](const replica::column_family& cf) {
+        return f(cf.get_stats());
+    }, std::plus<int64_t>());
+}
+
 static future<json::json_return_type> for_tables_on_all_shards(sharded<replica::database>& db, std::vector<table_info> tables, std::function<future<>(replica::table&)> set) {
    return do_with(std::move(tables), [&db, set] (const std::vector<table_info>& tables) {
        return db.invoke_on_all([&tables, set] (replica::database& db) {
@@ -336,7 +342,7 @@ uint64_t accumulate_on_active_memtables(replica::table& t, noncopyable_function<
    return ret;
 }

-void set_column_family(http_context& ctx, routes& r, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks) {
+void set_column_family(http_context& ctx, routes& r, sharded<replica::database>& db) {
    cf::get_column_family_name.set(r, [&db] (const_req req){
        std::vector<sstring> res;
        const replica::database::tables_metadata& meta = db.local().get_tables_metadata();
@@ -937,30 +943,6 @@ void set_column_family(http_context& ctx, routes& r, sharded<replica::database>&
        return set_tables_tombstone_gc(db, std::move(tables), false);
    });

-    cf::get_built_indexes.set(r, [&db, &sys_ks](std::unique_ptr<http::request> req) {
-        auto [ks, cf_name] = parse_fully_qualified_cf_name(req->get_path_param("name"));
-        // Use of load_built_views() as filtering table should be in sync with
-        // built_indexes_virtual_reader filtering with BUILT_VIEWS table
-        return sys_ks.local().load_built_views().then([ks, cf_name, &db](const std::vector<db::system_keyspace::view_name>& vb) mutable {
-            std::set<sstring> vp;
-            for (auto b : vb) {
-                if (b.first == ks) {
-                    vp.insert(b.second);
-                }
-            }
-            std::vector<sstring> res;
-            auto uuid = validate_table(db.local(), ks, cf_name);
-            replica::column_family& cf = db.local().find_column_family(uuid);
-            res.reserve(cf.get_index_manager().list_indexes().size());
-            for (auto&& i : cf.get_index_manager().list_indexes()) {
-                if (vp.contains(secondary_index::index_table_name(i.metadata().name()))) {
-                    res.emplace_back(i.metadata().name());
-                }
-            }
-            return make_ready_future<json::json_return_type>(res);
-        });
-    });
-
    cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
        // FIXME
        // Currently there are no information on the compression
@@ -1091,10 +1073,14 @@ void set_column_family(http_context& ctx, routes& r, sharded<replica::database>&
    });

    ss::get_load.set(r, [&db] (std::unique_ptr<http::request> req) {
-        return get_cf_stats(db, &replica::column_family_stats::live_disk_space_used);
+        return get_cf_stats(db, [](const replica::column_family_stats& stats) {
+            return stats.live_disk_space_used.on_disk;
+        });
    });
    ss::get_metrics_load.set(r, [&db] (std::unique_ptr<http::request> req) {
-        return get_cf_stats(db, &replica::column_family_stats::live_disk_space_used);
+        return get_cf_stats(db, [](const replica::column_family_stats& stats) {
+            return stats.live_disk_space_used.on_disk;
+        });
    });

    ss::get_keyspaces.set(r, [&db] (const_req req) {
@@ -1215,7 +1201,6 @@ void unset_column_family(http_context& ctx, routes& r) {
    cf::disable_tombstone_gc.unset(r);
    ss::enable_tombstone_gc.unset(r);
    ss::disable_tombstone_gc.unset(r);
-    cf::get_built_indexes.unset(r);
    cf::get_compression_metadata_off_heap_memory_used.unset(r);
    cf::get_compression_parameters.unset(r);
    cf::get_compression_ratio.unset(r);
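The new `get_cf_stats` overload in `api/column_family.cc` takes a callable rather than a pointer to member, which lets `get_load` read a nested field (`live_disk_space_used.on_disk`) that a single pointer to member cannot reach. The sketch below illustrates that pattern in isolation. All type and function names in it (`disk_space_sketch`, `stats_sketch`, `sum_stats`, `on_disk`) are hypothetical stand-ins, and it uses a template parameter with `constexpr` instead of the `std::function` and `map_reduce_cf` used in the patch, purely so the result can be checked at compile time.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical stand-ins for the real stats types, which are not shown here.
struct disk_space_sketch {
    std::int64_t on_disk = 0;
};
struct stats_sketch {
    disk_space_sketch live_disk_space_used;
};

// Sums f(stats) over all tables. The caller decides which (possibly nested)
// field to extract, mirroring the callable-based overload in the patch.
template <typename F>
constexpr std::int64_t sum_stats(std::initializer_list<stats_sketch> tables, F f) {
    std::int64_t total = 0;
    for (const stats_sketch& s : tables) {
        total += f(s);
    }
    return total;
}

// Accessor for the nested field, playing the role of the lambda in get_load.
constexpr std::int64_t on_disk(const stats_sketch& s) {
    return s.live_disk_space_used.on_disk;
}

static_assert(sum_stats({stats_sketch{{10}}, stats_sketch{{5}}}, on_disk) == 15);
```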
--- a/api/column_family.hh
+++ b/api/column_family.hh
@@ -13,13 +13,9 @@
 #include <any>
 #include "api/api_init.hh"

-namespace db {
-class system_keyspace;
-}
-
 namespace api {

-void set_column_family(http_context& ctx, httpd::routes& r, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks);
+void set_column_family(http_context& ctx, httpd::routes& r, sharded<replica::database>& db);
 void unset_column_family(http_context& ctx, httpd::routes& r);

 table_info parse_table_info(const sstring& name, const replica::database& db);
--- a/api/error_injection.cc
+++ b/api/error_injection.cc
@@ -21,10 +21,10 @@ namespace hf = httpd::error_injection_json;

 void set_error_injection(http_context& ctx, routes& r) {

-    hf::enable_injection.set(r, [](std::unique_ptr<request> req) {
+    hf::enable_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
        sstring injection = req->get_path_param("injection");
        bool one_shot = req->get_query_param("one_shot") == "True";
-        auto params = req->content;
+        auto params = co_await util::read_entire_stream_contiguous(*req->content_stream);

        const size_t max_params_size = 1024 * 1024;
        if (params.size() > max_params_size) {
@@ -39,12 +39,11 @@ void set_error_injection(http_context& ctx, routes& r) {
                : rjson::parse_to_map<utils::error_injection_parameters>(params);

            auto& errinj = utils::get_local_injector();
-            return errinj.enable_on_all(injection, one_shot, std::move(parameters)).then([] {
-                return make_ready_future<json::json_return_type>(json::json_void());
-            });
+            co_await errinj.enable_on_all(injection, one_shot, std::move(parameters));
        } catch (const rjson::error& e) {
            throw httpd::bad_param_exception(format("Failed to parse injections parameters: {}", e.what()));
        }
+        co_return json::json_void();
    });

    hf::get_enabled_injections_on_all.set(r, [](std::unique_ptr<request> req) {
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -37,6 +37,7 @@
 #include "gms/gossiper.hh"
 #include "db/system_keyspace.hh"
 #include <seastar/http/exception.hh>
+#include <seastar/http/short_streams.hh>
 #include <seastar/core/coroutine.hh>
 #include <seastar/coroutine/parallel_for_each.hh>
 #include <seastar/coroutine/exception.hh>
@@ -273,6 +274,13 @@ scrub_info parse_scrub_options(const http_context& ctx, std::unique_ptr<http::re
        throw httpd::bad_param_exception(fmt::format("Unknown argument for 'quarantine_mode' parameter: {}", quarantine_mode_str));
    }

+    if (req_param<bool>(*req, "drop_unfixable_sstables", false)) {
+        if (scrub_mode != compaction::compaction_type_options::scrub::mode::segregate) {
+            throw httpd::bad_param_exception("The 'drop_unfixable_sstables' parameter is only valid when 'scrub_mode' is 'SEGREGATE'");
+        }
+        info.opts.drop_unfixable = compaction::compaction_type_options::scrub::drop_unfixable_sstables::yes;
+    }
+
    return info;
 }

@@ -499,9 +507,8 @@ void set_sstables_loader(http_context& ctx, routes& r, sharded<sstables_loader>&
        auto scope = parse_stream_scope(req->get_query_param("scope"));
        auto primary_replica_only = validate_bool_x(req->get_query_param("primary_replica_only"), false);

-        // TODO: the http_server backing the API does not use content streaming
-        // should use it for better performance
-        rjson::value parsed = rjson::parse(req->content);
+        rjson::chunked_content content = co_await util::read_entire_stream(*req->content_stream);
+        rjson::value parsed = rjson::parse(std::move(content));
        if (!parsed.IsArray()) {
            throw httpd::bad_param_exception("malformatted sstables in body");
        }
@@ -529,10 +536,35 @@ void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_build
        });
    });

+    cf::get_built_indexes.set(r, [&vb](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
+        auto [ks, cf_name] = parse_fully_qualified_cf_name(req->get_path_param("name"));
+        // Use of load_built_views() as filtering table should be in sync with
+        // built_indexes_virtual_reader filtering with BUILT_VIEWS table
+        std::vector<db::system_keyspace::view_name> vn = co_await vb.local().get_sys_ks().load_built_views();
+        std::set<sstring> vp;
+        for (auto b : vn) {
+            if (b.first == ks) {
+                vp.insert(b.second);
+            }
+        }
+        std::vector<sstring> res;
+        replica::database& db = vb.local().get_db();
+        auto uuid = validate_table(db, ks, cf_name);
+        replica::column_family& cf = db.find_column_family(uuid);
+        res.reserve(cf.get_index_manager().list_indexes().size());
+        for (auto&& i : cf.get_index_manager().list_indexes()) {
+            if (vp.contains(secondary_index::index_table_name(i.metadata().name()))) {
+                res.emplace_back(i.metadata().name());
+            }
+        }
+        co_return res;
+    });
+
 }

 void unset_view_builder(http_context& ctx, routes& r) {
    ss::view_build_statuses.unset(r);
+    cf::get_built_indexes.unset(r);
 }

 static future<json::json_return_type> describe_ring_as_json(sharded<service::storage_service>& ss, sstring keyspace) {
@@ -712,6 +744,14 @@ rest_get_natural_endpoints(http_context& ctx, sharded<service::storage_service>&
        return res | std::views::transform([] (auto& ep) { return fmt::to_string(ep); }) | std::ranges::to<std::vector>();
 }

+static
+json::json_return_type
+rest_get_natural_endpoints_v2(http_context& ctx, sharded<service::storage_service>& ss, const_req req) {
+        auto keyspace = validate_keyspace(ctx, req);
+        auto res = ss.local().get_natural_endpoints(keyspace, req.get_query_param("cf"), req.get_query_param_array("key_component"));
+        return res | std::views::transform([] (auto& ep) { return fmt::to_string(ep); }) | std::ranges::to<std::vector>();
+}
+
 static
 future<json::json_return_type>
 rest_cdc_streams_check_and_repair(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {
@@ -736,7 +776,7 @@ rest_cleanup_all(http_context& ctx, sharded<service::storage_service>& ss, std::
            if (!ss.is_topology_coordinator_enabled()) {
                co_return false;
            }
-            co_await ss.do_cluster_cleanup();
+            co_await ss.do_clusterwide_vnodes_cleanup();
            co_return true;
        });
        if (done) {
@@ -833,6 +873,25 @@ rest_remove_node(sharded<service::storage_service>& ss, std::unique_ptr<http::re
        });
 }

+static
+future<json::json_return_type>
+rest_exclude_node(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {
+    auto hosts = utils::split_comma_separated_list(req->get_query_param("hosts"))
+        | std::views::transform([] (const sstring& s) { return locator::host_id(utils::UUID(s)); })
+        | std::ranges::to<std::vector<locator::host_id>>();
+
+    auto& topo = ss.local().get_token_metadata().get_topology();
+    for (auto host : hosts) {
+        if (!topo.has_node(host)) {
+            throw bad_param_exception(fmt::format("Host ID {} does not belong to this cluster", host));
+        }
+    }
+
+    apilog.info("exclude_node: hosts={}", hosts);
+    co_await ss.local().mark_excluded(hosts);
+    co_return json_void();
+}
+
 static
 future<json::json_return_type>
 rest_get_removal_status(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {
@@ -1750,6 +1809,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
    ss::describe_ring.set(r, rest_bind(rest_describe_ring, ctx, ss));
    ss::get_current_generation_number.set(r, rest_bind(rest_get_current_generation_number, ss));
    ss::get_natural_endpoints.set(r, rest_bind(rest_get_natural_endpoints, ctx, ss));
+    ss::get_natural_endpoints_v2.set(r, rest_bind(rest_get_natural_endpoints_v2, ctx, ss));
    ss::cdc_streams_check_and_repair.set(r, rest_bind(rest_cdc_streams_check_and_repair, ss));
    ss::cleanup_all.set(r, rest_bind(rest_cleanup_all, ctx, ss));
    ss::reset_cleanup_needed.set(r, rest_bind(rest_reset_cleanup_needed, ctx, ss));
@@ -1758,6 +1818,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_
    ss::decommission.set(r, rest_bind(rest_decommission, ss));
    ss::move.set(r, rest_bind(rest_move, ss));
    ss::remove_node.set(r, rest_bind(rest_remove_node, ss));
+    ss::exclude_node.set(r, rest_bind(rest_exclude_node, ss));
    ss::get_removal_status.set(r, rest_bind(rest_get_removal_status, ss));
    ss::force_remove_completion.set(r, rest_bind(rest_force_remove_completion, ss));
    ss::set_logging_level.set(r, rest_bind(rest_set_logging_level));
@@ -1836,6 +1897,7 @@ void unset_storage_service(http_context& ctx, routes& r) {
    ss::decommission.unset(r);
    ss::move.unset(r);
    ss::remove_node.unset(r);
+    ss::exclude_node.unset(r);
    ss::get_removal_status.unset(r);
    ss::force_remove_completion.unset(r);
    ss::set_logging_level.unset(r);
--- a/api/system.cc
+++ b/api/system.cc
@@ -54,7 +54,8 @@ void set_system(http_context& ctx, routes& r) {

    hm::set_metrics_config.set(r, [](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
        rapidjson::Document doc;
-        doc.Parse(req->content.c_str());
+        auto content = co_await util::read_entire_stream_contiguous(*req->content_stream);
+        doc.Parse(content.c_str());
        if (!doc.IsArray()) {
            throw bad_param_exception("Expected a json array");
        }
@@ -87,21 +88,19 @@ void set_system(http_context& ctx, routes& r) {
                relabels[i].expr = element["regex"].GetString();
            }
        }
-        return do_with(std::move(relabels), false, [](const std::vector<seastar::metrics::relabel_config>& relabels, bool& failed) {
-            return smp::invoke_on_all([&relabels, &failed] {
-                return metrics::set_relabel_configs(relabels).then([&failed](const metrics::metric_relabeling_result& result) {
-                    if (result.metrics_relabeled_due_to_collision > 0) {
-                        failed = true;
-                    }
-                    return;
-                });
-            }).then([&failed](){
-                if (failed) {
-                    throw bad_param_exception("conflicts found during relabeling");
+        bool failed = false;
+        co_await smp::invoke_on_all([&relabels, &failed] {
+            return metrics::set_relabel_configs(relabels).then([&failed](const metrics::metric_relabeling_result& result) {
+                if (result.metrics_relabeled_due_to_collision > 0) {
+                    failed = true;
                }
-                return make_ready_future<json::json_return_type>(seastar::json::json_void());
+                return;
            });
        });
+        if (failed) {
+            throw bad_param_exception("conflicts found during relabeling");
+        }
+        co_return seastar::json::json_void();
    });

    hs::get_system_uptime.set(r, [](const_req req) {
--- a/api/task_manager.cc
+++ b/api/task_manager.cc
@@ -55,6 +55,7 @@ tm::task_status make_status(tasks::task_status status, sharded<gms::gossiper>& g
    res.scope = status.scope;
    res.state = status.state;
    res.is_abortable = bool(status.is_abortable);
+    res.creation_time = get_time(status.creation_time);
    res.start_time = get_time(status.start_time);
    res.end_time = get_time(status.end_time);
    res.error = status.error;
@@ -83,6 +84,7 @@ tm::task_stats make_stats(tasks::task_stats stats) {
    res.table = stats.table;
    res.entity = stats.entity;
    res.shard = stats.shard;
+    res.creation_time = get_time(stats.creation_time);
    res.start_time = get_time(stats.start_time);
    res.end_time = get_time(stats.end_time);;
    return res;
--- a/api/tasks.cc
+++ b/api/tasks.cc
@@ -73,7 +73,7 @@ static future<shared_ptr<compaction::cleanup_keyspace_compaction_task_impl>> for
        co_return nullptr;
    }
    apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, table_infos);
-    if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
+    if (!co_await ss.local().is_vnodes_cleanup_allowed(keyspace)) {
        auto msg = "Can not perform cleanup operation when topology changes";
        apilog.warn("force_keyspace_cleanup: keyspace={} tables={}: {}", keyspace, table_infos, msg);
        co_await coroutine::return_exception(std::runtime_error(msg));
--- a/api/token_metadata.cc
+++ b/api/token_metadata.cc
@@ -62,6 +62,17 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to
        return addr | std::ranges::to<std::vector>();
    });

+    ss::get_excluded_nodes.set(r, [&tm](const_req req) {
+        const auto& local_tm = *tm.local().get();
+        std::vector<sstring> eps;
+        local_tm.get_topology().for_each_node([&] (auto& node) {
+            if (node.is_excluded()) {
+                eps.push_back(node.host_id().to_sstring());
+            }
+        });
+        return eps;
+    });
+
    ss::get_joining_nodes.set(r, [&tm, &g](const_req req) {
        const auto& local_tm = *tm.local().get();
        const auto& points = local_tm.get_bootstrap_tokens();
@@ -130,6 +141,7 @@ void unset_token_metadata(http_context& ctx, routes& r) {
    ss::get_leaving_nodes.unset(r);
    ss::get_moving_nodes.unset(r);
    ss::get_joining_nodes.unset(r);
+    ss::get_excluded_nodes.unset(r);
    ss::get_host_id_map.unset(r);
    httpd::endpoint_snitch_info_json::get_datacenter.unset(r);
    httpd::endpoint_snitch_info_json::get_rack.unset(r);
--- a/audit/CMakeLists.txt
+++ b/audit/CMakeLists.txt
@@ -5,6 +5,7 @@ target_sources(scylla_audit
  PRIVATE
    audit.cc
    audit_cf_storage_helper.cc
+    audit_composite_storage_helper.cc
    audit_syslog_storage_helper.cc)
 target_include_directories(scylla_audit
  PUBLIC
@@ -16,4 +17,7 @@ target_link_libraries(scylla_audit
  PRIVATE
    cql3)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(scylla_audit REUSE_FROM scylla-precompiled-header)
+endif()
 add_whole_archive(audit scylla_audit)
--- a/audit/audit.cc
+++ b/audit/audit.cc
@@ -13,9 +13,11 @@
 #include "cql3/statements/batch_statement.hh"
 #include "cql3/statements/modification_statement.hh"
 #include "storage_helper.hh"
+#include "audit_cf_storage_helper.hh"
+#include "audit_syslog_storage_helper.hh"
+#include "audit_composite_storage_helper.hh"
 #include "audit.hh"
 #include "../db/config.hh"
-#include "utils/class_registrator.hh"

 #include <boost/algorithm/string/split.hpp>
 #include <boost/algorithm/string/trim.hpp>
@@ -26,6 +28,47 @@ namespace audit {

 logging::logger logger("audit");

+static std::set<sstring> parse_audit_modes(const sstring& data) {
+    std::set<sstring> result;
+    if (!data.empty()) {
+        std::vector<sstring> audit_modes;
+        boost::split(audit_modes, data, boost::is_any_of(","));
+        if (audit_modes.empty()) {
+            return {};
+        }
+        for (sstring& audit_mode : audit_modes) {
+            boost::trim(audit_mode);
+            if (audit_mode == "none") {
+                return {};
+            }
+            if (audit_mode != "table" && audit_mode != "syslog") {
+                throw audit_exception(fmt::format("Bad configuration: invalid 'audit': {}", audit_mode));
+            }
+            result.insert(std::move(audit_mode));
+        }
+    }
+    return result;
+}
+
+static std::unique_ptr<storage_helper> create_storage_helper(const std::set<sstring>& audit_modes, cql3::query_processor& qp, service::migration_manager& mm) {
+    SCYLLA_ASSERT(!audit_modes.empty() && !audit_modes.contains("none"));
+
+    std::vector<std::unique_ptr<storage_helper>> helpers;
+    for (const sstring& audit_mode : audit_modes) {
+        if (audit_mode == "table") {
+            helpers.emplace_back(std::make_unique<audit_cf_storage_helper>(qp, mm));
+        } else if (audit_mode == "syslog") {
+            helpers.emplace_back(std::make_unique<audit_syslog_storage_helper>(qp, mm));
+        }
+    }
+
+    SCYLLA_ASSERT(!helpers.empty());
+    if (helpers.size() == 1) {
+        return std::move(helpers.front());
+    }
+    return std::make_unique<audit_composite_storage_helper>(std::move(helpers));
+}
+
 static sstring category_to_string(statement_category category)
 {
    switch (category) {
@@ -103,7 +146,9 @@ static std::set<sstring> parse_audit_keyspaces(const sstring& data) {
 }

 audit::audit(locator::shared_token_metadata& token_metadata,
-             sstring&& storage_helper_name,
+             cql3::query_processor& qp,
+             service::migration_manager& mm,
+             std::set<sstring>&& audit_modes,
             std::set<sstring>&& audited_keyspaces,
             std::map<sstring, std::set<sstring>>&& audited_tables,
             category_set&& audited_categories,
@@ -112,28 +157,21 @@ audit::audit(locator::shared_token_metadata& token_metadata,
    , _audited_keyspaces(std::move(audited_keyspaces))
    , _audited_tables(std::move(audited_tables))
    , _audited_categories(std::move(audited_categories))
-    , _storage_helper_class_name(std::move(storage_helper_name))
    , _cfg(cfg)
    , _cfg_keyspaces_observer(cfg.audit_keyspaces.observe([this] (sstring const& new_value){ update_config<std::set<sstring>>(new_value, parse_audit_keyspaces, _audited_keyspaces); }))
    , _cfg_tables_observer(cfg.audit_tables.observe([this] (sstring const& new_value){ update_config<std::map<sstring, std::set<sstring>>>(new_value, parse_audit_tables, _audited_tables); }))
    , _cfg_categories_observer(cfg.audit_categories.observe([this] (sstring const& new_value){ update_config<category_set>(new_value, parse_audit_categories, _audited_categories); }))
-{ }
+{
+    _storage_helper_ptr = create_storage_helper(audit_modes, qp, mm);
+}

 audit::~audit() = default;

-future<> audit::create_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm) {
-    sstring storage_helper_name;
-    if (cfg.audit() == "table") {
-        storage_helper_name = "audit_cf_storage_helper";
-    } else if (cfg.audit() == "syslog") {
-        storage_helper_name = "audit_syslog_storage_helper";
-    } else if (cfg.audit() == "none") {
-        // Audit is off
+future<> audit::start_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm) {
+    std::set<sstring> audit_modes = parse_audit_modes(cfg.audit());
+    if (audit_modes.empty()) {
        logger.info("Audit is disabled");
-
        return make_ready_future<>();
-    } else {
-        throw audit_exception(fmt::format("Bad configuration: invalid 'audit': {}", cfg.audit()));
    }
    category_set audited_categories = parse_audit_categories(cfg.audit_categories());
    std::map<sstring, std::set<sstring>> audited_tables = parse_audit_tables(cfg.audit_tables());
@@ -143,19 +181,20 @@ future<> audit::create_audit(const db::config& cfg, sharded<locator::shared_toke
                cfg.audit(), cfg.audit_categories(), cfg.audit_keyspaces(), cfg.audit_tables());

    return audit_instance().start(std::ref(stm),
-                                  std::move(storage_helper_name),
+                                  std::ref(qp),
+                                  std::ref(mm),
+                                  std::move(audit_modes),
                                  std::move(audited_keyspaces),
                                  std::move(audited_tables),
                                  std::move(audited_categories),
-                                  std::cref(cfg));
-}
-
-future<> audit::start_audit(const db::config& cfg, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm) {
-    if (!audit_instance().local_is_initialized()) {
-        return make_ready_future<>();
-    }
-    return audit_instance().invoke_on_all([&cfg, &qp, &mm] (audit& local_audit) {
-        return local_audit.start(cfg, qp.local(), mm.local());
+                                  std::cref(cfg))
+    .then([&cfg] {
+        if (!audit_instance().local_is_initialized()) {
+            return make_ready_future<>();
+        }
+        return audit_instance().invoke_on_all([&cfg] (audit& local_audit) {
+            return local_audit.start(cfg);
+        });
    });
 }

@@ -181,15 +220,7 @@ audit_info_ptr audit::create_no_audit_info() {
    return audit_info_ptr();
 }

-future<> audit::start(const db::config& cfg, cql3::query_processor& qp, service::migration_manager& mm) {
-    try {
-        _storage_helper_ptr = create_object<storage_helper>(_storage_helper_class_name, qp, mm);
-    } catch (no_such_class& e) {
-        logger.error("Can't create audit storage helper {}: not supported", _storage_helper_class_name);
-        throw;
-    } catch (...) {
-        throw;
-    }
+future<> audit::start(const db::config& cfg) {
    return _storage_helper_ptr->start(cfg);
 }

--- a/audit/audit.hh
+++ b/audit/audit.hh
@@ -102,7 +102,6 @@ class audit final : public seastar::async_sharded_service<audit> {
    std::map<sstring, std::set<sstring>> _audited_tables;
    category_set _audited_categories;

-    sstring _storage_helper_class_name;
    std::unique_ptr<storage_helper> _storage_helper_ptr;

    const db::config& _cfg;
@@ -125,18 +124,20 @@ public:
    static audit& local_audit_instance() {
        return audit_instance().local();
    }
-    static future<> create_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm);
-    static future<> start_audit(const db::config& cfg, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm);
+    static future<> start_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm);
    static future<> stop_audit();
    static audit_info_ptr create_audit_info(statement_category cat, const sstring& keyspace, const sstring& table);
    static audit_info_ptr create_no_audit_info();
-    audit(locator::shared_token_metadata& stm, sstring&& storage_helper_name,
+    audit(locator::shared_token_metadata& stm,
+          cql3::query_processor& qp,
+          service::migration_manager& mm,
+          std::set<sstring>&& audit_modes,
          std::set<sstring>&& audited_keyspaces,
          std::map<sstring, std::set<sstring>>&& audited_tables,
          category_set&& audited_categories,
          const db::config& cfg);
    ~audit();
-    future<> start(const db::config& cfg, cql3::query_processor& qp, service::migration_manager& mm);
+    future<> start(const db::config& cfg);
    future<> stop();
    future<> shutdown();
    bool should_log(const audit_info* audit_info) const;
--- a/audit/audit_cf_storage_helper.cc
+++ b/audit/audit_cf_storage_helper.cc
@@ -11,11 +11,11 @@
 #include "cql3/query_processor.hh"
 #include "data_dictionary/keyspace_metadata.hh"
 #include "utils/UUID_gen.hh"
-#include "utils/class_registrator.hh"
 #include "cql3/query_options.hh"
 #include "cql3/statements/ks_prop_defs.hh"
 #include "service/migration_manager.hh"
 #include "service/storage_proxy.hh"
+#include "locator/abstract_replication_strategy.hh"

 namespace audit {

@@ -64,8 +64,8 @@ future<> audit_cf_storage_helper::migrate_audit_table(service::group0_guard grou
            data_dictionary::database db = _qp.db();
            cql3::statements::ks_prop_defs old_ks_prop_defs;
            auto old_ks_metadata = old_ks_prop_defs.as_ks_metadata_update(
-                    ks->metadata(), *_qp.proxy().get_token_metadata_ptr(), db.features());
-            std::map<sstring, sstring> strategy_opts;
+                    ks->metadata(), *_qp.proxy().get_token_metadata_ptr(), db.features(), db.get_config());
+            locator::replication_strategy_config_options strategy_opts;
            for (const auto &dc: _qp.proxy().get_token_metadata_ptr()->get_topology().get_datacenters())
                strategy_opts[dc] = "3";

@@ -73,6 +73,7 @@ future<> audit_cf_storage_helper::migrate_audit_table(service::group0_guard grou
                                                                   "org.apache.cassandra.locator.NetworkTopologyStrategy",
                                                                   strategy_opts,
                                                                   std::nullopt, // initial_tablets
+                                                                   std::nullopt, // consistency_option
                                                                   old_ks_metadata->durable_writes(),
                                                                   old_ks_metadata->get_storage_options(),
                                                                   old_ks_metadata->tables());
@@ -196,7 +197,4 @@ cql3::query_options audit_cf_storage_helper::make_login_data(socket_address node
    return cql3::query_options(cql3::default_cql_config, db::consistency_level::ONE, std::nullopt, std::move(values), false, cql3::query_options::specific_options::DEFAULT);
 }

-using registry = class_registrator<storage_helper, audit_cf_storage_helper, cql3::query_processor&, service::migration_manager&>;
-static registry registrator1("audit_cf_storage_helper");
-
 }
--- a/audit/audit_composite_storage_helper.cc
+++ b/audit/audit_composite_storage_helper.cc
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) 2025 ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#include <seastar/core/loop.hh>
+#include <seastar/core/future-util.hh>
+
+#include "audit/audit_composite_storage_helper.hh"
+
+#include "utils/class_registrator.hh"
+
+namespace audit {
+
+audit_composite_storage_helper::audit_composite_storage_helper(std::vector<std::unique_ptr<storage_helper>>&& storage_helpers)
+    : _storage_helpers(std::move(storage_helpers))
+{}
+
+future<> audit_composite_storage_helper::start(const db::config& cfg) {
+    auto res = seastar::parallel_for_each(
+        _storage_helpers,
+        [&cfg] (std::unique_ptr<storage_helper>& h) {
+            return h->start(cfg);
+        }
+    );
+    return res;
+}
+
+future<> audit_composite_storage_helper::stop() {
+    auto res = seastar::parallel_for_each(
+        _storage_helpers,
+        [] (std::unique_ptr<storage_helper>& h) {
+            return h->stop();
+        }
+    );
+    return res;
+}
+
+future<> audit_composite_storage_helper::write(const audit_info* audit_info,
+                                               socket_address node_ip,
+                                               socket_address client_ip,
+                                               db::consistency_level cl,
+                                               const sstring& username,
+                                               bool error) {
+    return seastar::parallel_for_each(
+        _storage_helpers,
+        [audit_info, node_ip, client_ip, cl, &username, error](std::unique_ptr<storage_helper>& h) {
+            return h->write(audit_info, node_ip, client_ip, cl, username, error);
+        }
+    );
+}
+
+future<> audit_composite_storage_helper::write_login(const sstring& username,
+                                                     socket_address node_ip,
+                                                     socket_address client_ip,
+                                                     bool error) {
+    return seastar::parallel_for_each(
+        _storage_helpers,
+        [&username, node_ip, client_ip, error](std::unique_ptr<storage_helper>& h) {
+            return h->write_login(username, node_ip, client_ip, error);
+        }
+    );
+}
+
+} // namespace audit
--- a/audit/audit_composite_storage_helper.hh
+++ b/audit/audit_composite_storage_helper.hh
@@ -0,0 +1,37 @@
+/*
+ * Copyright (C) 2025 ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+#pragma once
+
+#include "audit/audit.hh"
+#include <seastar/core/future.hh>
+
+#include "storage_helper.hh"
+
+namespace audit {
+
+class audit_composite_storage_helper : public storage_helper {
+    std::vector<std::unique_ptr<storage_helper>> _storage_helpers;
+
+public:
+    explicit audit_composite_storage_helper(std::vector<std::unique_ptr<storage_helper>>&&);
+    virtual ~audit_composite_storage_helper() = default;
+    virtual future<> start(const db::config& cfg) override;
+    virtual future<> stop() override;
+    virtual future<> write(const audit_info* audit_info,
+                           socket_address node_ip,
+                           socket_address client_ip,
+                           db::consistency_level cl,
+                           const sstring& username,
+                           bool error) override;
+    virtual future<> write_login(const sstring& username,
+                                 socket_address node_ip,
+                                 socket_address client_ip,
+                                 bool error) override;
+};
+
+} // namespace audit
--- a/audit/audit_syslog_storage_helper.cc
+++ b/audit/audit_syslog_storage_helper.cc
@@ -21,7 +21,6 @@
 #include <fmt/chrono.h>

 #include "cql3/query_processor.hh"
-#include "utils/class_registrator.hh"

 namespace cql3 {

@@ -143,7 +142,4 @@ future<> audit_syslog_storage_helper::write_login(const sstring& username,
    co_await syslog_send_helper(msg.c_str());
 }

-using registry = class_registrator<storage_helper, audit_syslog_storage_helper, cql3::query_processor&, service::migration_manager&>;
-static registry registrator1("audit_syslog_storage_helper");
-
 }
--- a/auth/CMakeLists.txt
+++ b/auth/CMakeLists.txt
@@ -9,6 +9,7 @@ target_sources(scylla_auth
    allow_all_authorizer.cc
    authenticated_user.cc
    authenticator.cc
+    cache.cc
    certificate_authenticator.cc
    common.cc
    default_authorizer.cc
@@ -44,5 +45,8 @@ target_link_libraries(scylla_auth

 add_whole_archive(auth scylla_auth)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(scylla_auth REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers scylla_auth
  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
--- a/auth/allow_all_authenticator.cc
+++ b/auth/allow_all_authenticator.cc
@@ -23,6 +23,7 @@ static const class_registrator<
        cql3::query_processor&,
        ::service::raft_group0_client&,
        ::service::migration_manager&,
+        cache&,
        utils::alien_worker&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");

 }
--- a/auth/allow_all_authenticator.hh
+++ b/auth/allow_all_authenticator.hh
@@ -12,6 +12,7 @@

 #include "auth/authenticated_user.hh"
 #include "auth/authenticator.hh"
+#include "auth/cache.hh"
 #include "auth/common.hh"
 #include "utils/alien_worker.hh"

@@ -29,7 +30,7 @@ extern const std::string_view allow_all_authenticator_name;

 class allow_all_authenticator final : public authenticator {
 public:
-    allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&) {
+    allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&) {
    }

    virtual future<> start() override {
--- a/auth/cache.cc
+++ b/auth/cache.cc
@@ -0,0 +1,180 @@
+/*
+ * Copyright (C) 2017-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#include "auth/cache.hh"
+#include "auth/common.hh"
+#include "auth/roles-metadata.hh"
+#include "cql3/query_processor.hh"
+#include "cql3/untyped_result_set.hh"
+#include "db/consistency_level_type.hh"
+#include "db/system_keyspace.hh"
+#include "schema/schema.hh"
+#include <iterator>
+#include <seastar/coroutine/maybe_yield.hh>
+#include <seastar/core/format.hh>
+
+namespace auth {
+
+logging::logger logger("auth-cache");
+
+cache::cache(cql3::query_processor& qp) noexcept
+    : _current_version(0)
+    , _qp(qp) {
+}
+
+lw_shared_ptr<const cache::role_record> cache::get(const role_name_t& role) const noexcept {
+    auto it = _roles.find(role);
+    if (it == _roles.end()) {
+        return {};
+    }
+    return it->second;
+}
+
+future<lw_shared_ptr<cache::role_record>> cache::fetch_role(const role_name_t& role) const {
+    auto rec = make_lw_shared<role_record>();
+    rec->version = _current_version;
+
+    auto fetch = [this, &role](const sstring& q) {
+        return _qp.execute_internal(q, db::consistency_level::LOCAL_ONE,
+                internal_distributed_query_state(), {role},
+                cql3::query_processor::cache_internal::yes);
+    };
+    // roles
+    {
+        static const sstring q = format("SELECT * FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, meta::roles_table::name);
+        auto rs = co_await fetch(q);
+        if (!rs->empty()) {
+            auto& r = rs->one();
+            rec->is_superuser = r.get_or<bool>("is_superuser", false);
+            rec->can_login = r.get_or<bool>("can_login", false);
+            rec->salted_hash = r.get_or<sstring>("salted_hash", "");
+            if (r.has("member_of")) {
+                auto mo = r.get_set<sstring>("member_of");
+                rec->member_of.insert(
+                        std::make_move_iterator(mo.begin()),
+                        std::make_move_iterator(mo.end()));
+            }
+        } else {
+            // role got deleted
+            co_return nullptr;
+        }
+    }
+    // members
+    {
+        static const sstring q = format("SELECT role, member FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, ROLE_MEMBERS_CF);
+        auto rs = co_await fetch(q);
+        for (const auto& r : *rs) {
+            rec->members.insert(r.get_as<sstring>("member"));
+            co_await coroutine::maybe_yield();
+        }
+    }
+    // attributes
+    {
+        static const sstring q = format("SELECT role, name, value FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, ROLE_ATTRIBUTES_CF);
+        auto rs = co_await fetch(q);
+        for (const auto& r : *rs) {
+            rec->attributes[r.get_as<sstring>("name")] =
+                    r.get_as<sstring>("value");
+            co_await coroutine::maybe_yield();
+        }
+    }
+    // permissions
+    {
+        static const sstring q = format("SELECT role, resource, permissions FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, PERMISSIONS_CF);
+        auto rs = co_await fetch(q);
+        for (const auto& r : *rs) {
+            auto resource = r.get_as<sstring>("resource");
+            auto perms_strings = r.get_set<sstring>("permissions");
+            std::unordered_set<sstring> perms_set(perms_strings.begin(), perms_strings.end());
+            auto pset = permissions::from_strings(perms_set);
+            rec->permissions[std::move(resource)] = std::move(pset);
+            co_await coroutine::maybe_yield();
+        }
+    }
+    co_return rec;
+}
+
+future<> cache::prune_all() noexcept {
+    for (auto it = _roles.begin(); it != _roles.end(); ) {
+        if (it->second->version != _current_version) {
+            _roles.erase(it++);
+            co_await coroutine::maybe_yield();
+        } else {
+            ++it;
+        }
+    }
+    co_return;
+}
+
+future<> cache::load_all() {
+    if (legacy_mode(_qp)) {
+        co_return;
+    }
+    SCYLLA_ASSERT(this_shard_id() == 0);
+    ++_current_version;
+
+    logger.info("Loading all roles");
+    const uint32_t page_size = 128;
+    auto loader = [this](const cql3::untyped_result_set::row& r) -> future<stop_iteration> {
+        const auto name = r.get_as<sstring>("role");
+        auto role = co_await fetch_role(name);
+        if (role) {
+            _roles[name] = role;
+        }
+        co_return stop_iteration::no;
+    };
+    co_await _qp.query_internal(format("SELECT * FROM {}.{}",
+            db::system_keyspace::NAME, meta::roles_table::name),
+            db::consistency_level::LOCAL_ONE, {}, page_size, loader);
+
+    co_await prune_all();
+    for (const auto& [name, role] : _roles) {
+        co_await distribute_role(name, role);
+    }
+    co_await container().invoke_on_others([this](cache& c) -> future<> {
+        c._current_version = _current_version;
+        co_await c.prune_all();
+    });
+}
+
+future<> cache::load_roles(std::unordered_set<role_name_t> roles) {
+    if (legacy_mode(_qp)) {
+        co_return;
+    }
+    for (const auto& name : roles) {
+        logger.info("Loading role {}", name);
+        auto role = co_await fetch_role(name);
+        if (role) {
+            _roles[name] = role;
+        } else {
+            _roles.erase(name);
+        }
+        co_await distribute_role(name, role);
+    }
+}
+
+future<> cache::distribute_role(const role_name_t& name, lw_shared_ptr<role_record> role) {
+    auto role_ptr = role.get();
+    co_await container().invoke_on_others([&name, role_ptr](cache& c) {
+        if (!role_ptr) {
+            c._roles.erase(name);
+            return;
+        }
+        auto role_copy = make_lw_shared<role_record>(*role_ptr);
+        c._roles[name] = std::move(role_copy);
+    });
+}
+
+bool cache::includes_table(const table_id& id) noexcept {
+    return id == db::system_keyspace::roles()->id()
+            || id == db::system_keyspace::role_members()->id()
+            || id == db::system_keyspace::role_attributes()->id()
+            || id == db::system_keyspace::role_permissions()->id();
+}
+
+} // namespace auth
--- a/auth/cache.hh
+++ b/auth/cache.hh
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2025-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#pragma once
+
+#include <unordered_set>
+#include <unordered_map>
+
+#include <seastar/core/sstring.hh>
+#include <seastar/core/future.hh>
+#include <seastar/core/sharded.hh>
+#include <seastar/core/shared_ptr.hh>
+
+#include <absl/container/flat_hash_map.h>
+
+#include "auth/permission.hh"
+#include "auth/common.hh"
+
+namespace cql3 { class query_processor; }
+
+namespace auth {
+
+class cache : public peering_sharded_service<cache> {
+public:
+    using role_name_t = sstring;
+    using version_tag_t = char;
+
+    struct role_record {
+        bool can_login = false;
+        bool is_superuser = false;
+        std::unordered_set<role_name_t> member_of;
+        std::unordered_set<role_name_t> members;
+        sstring salted_hash;
+        std::unordered_map<sstring, sstring> attributes;
+        std::unordered_map<sstring, permission_set> permissions;
+        version_tag_t version; // used for seamless cache reloads
+    };
+
+    explicit cache(cql3::query_processor& qp) noexcept;
+    lw_shared_ptr<const role_record> get(const role_name_t& role) const noexcept;
+    future<> load_all();
+    future<> load_roles(std::unordered_set<role_name_t> roles);
+    static bool includes_table(const table_id&) noexcept;
+
+private:
+    using roles_map = absl::flat_hash_map<role_name_t, lw_shared_ptr<role_record>>;
+    roles_map _roles;
+    version_tag_t _current_version;
+    cql3::query_processor& _qp;
+
+    future<lw_shared_ptr<role_record>> fetch_role(const role_name_t& role) const;
+    future<> prune_all() noexcept;
+    future<> distribute_role(const role_name_t& name, lw_shared_ptr<role_record> role);
+};
+
+} // namespace auth
--- a/auth/certificate_authenticator.cc
+++ b/auth/certificate_authenticator.cc
@@ -8,6 +8,7 @@
 */

 #include "auth/certificate_authenticator.hh"
+#include "auth/cache.hh"

 #include <boost/regex.hpp>
 #include <fmt/ranges.h>
@@ -34,13 +35,14 @@ static const class_registrator<auth::authenticator
    , cql3::query_processor&
    , ::service::raft_group0_client&
    , ::service::migration_manager&
+    , auth::cache&
    , utils::alien_worker&> cert_auth_reg(CERT_AUTH_NAME);

 enum class auth::certificate_authenticator::query_source {
    subject, altname
 };

-auth::certificate_authenticator::certificate_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&)
+auth::certificate_authenticator::certificate_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, auth::cache&, utils::alien_worker&)
    : _queries([&] {
        auto& conf = qp.db().get_config();
        auto queries = conf.auth_certificate_role_queries();
--- a/auth/certificate_authenticator.hh
+++ b/auth/certificate_authenticator.hh
@@ -26,13 +26,15 @@ class raft_group0_client;

 namespace auth {

+class cache;
+
 extern const std::string_view certificate_authenticator_name;

 class certificate_authenticator : public authenticator {
    enum class query_source;
    std::vector<std::pair<query_source, boost::regex>> _queries;
 public:
-    certificate_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
+    certificate_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);
    ~certificate_authenticator();

    future<> start() override;
--- a/auth/common.hh
+++ b/auth/common.hh
@@ -48,6 +48,10 @@ extern constinit const std::string_view AUTH_PACKAGE_NAME;

 } // namespace meta

+constexpr std::string_view PERMISSIONS_CF = "role_permissions";
+constexpr std::string_view ROLE_MEMBERS_CF = "role_members";
+constexpr std::string_view ROLE_ATTRIBUTES_CF = "role_attributes";
+
 // This is a helper to check whether auth-v2 is on.
 bool legacy_mode(cql3::query_processor& qp);

--- a/auth/default_authorizer.cc
+++ b/auth/default_authorizer.cc
@@ -37,7 +37,6 @@ std::string_view default_authorizer::qualified_java_name() const {
 static constexpr std::string_view ROLE_NAME = "role";
 static constexpr std::string_view RESOURCE_NAME = "resource";
 static constexpr std::string_view PERMISSIONS_NAME = "permissions";
-static constexpr std::string_view PERMISSIONS_CF = "role_permissions";

 static logging::logger alogger("default_authorizer");

--- a/auth/ldap_role_manager.cc
+++ b/auth/ldap_role_manager.cc
@@ -83,17 +83,18 @@ static const class_registrator<
    ldap_role_manager,
    cql3::query_processor&,
    ::service::raft_group0_client&,
-    ::service::migration_manager&> registration(ldap_role_manager_full_name);
+    ::service::migration_manager&,
+    cache&> registration(ldap_role_manager_full_name);

 ldap_role_manager::ldap_role_manager(
        std::string_view query_template, std::string_view target_attr, std::string_view bind_name, std::string_view bind_password,
-        cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm)
-        : _std_mgr(qp, rg0c, mm), _group0_client(rg0c), _query_template(query_template), _target_attr(target_attr), _bind_name(bind_name)
+        cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache)
+        : _std_mgr(qp, rg0c, mm, cache), _group0_client(rg0c), _query_template(query_template), _target_attr(target_attr), _bind_name(bind_name)
        , _bind_password(bind_password)
        , _connection_factory(bind(std::mem_fn(&ldap_role_manager::reconnect), std::ref(*this))) {
 }

-ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm)
+ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache)
    : ldap_role_manager(
            qp.db().get_config().ldap_url_template(),
            qp.db().get_config().ldap_attr_role(),
@@ -101,7 +102,8 @@ ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_
            qp.db().get_config().ldap_bind_passwd(),
            qp,
            rg0c,
-            mm) {
+            mm,
+            cache) {
 }

 std::string_view ldap_role_manager::qualified_java_name() const noexcept {
--- a/auth/ldap_role_manager.hh
+++ b/auth/ldap_role_manager.hh
@@ -14,6 +14,7 @@

 #include "ent/ldap/ldap_connection.hh"
 #include "standard_role_manager.hh"
+#include "auth/cache.hh"

 namespace auth {

@@ -43,12 +44,13 @@ class ldap_role_manager : public role_manager {
            std::string_view bind_password, ///< LDAP bind credentials.
            cql3::query_processor& qp, ///< Passed to standard_role_manager.
            ::service::raft_group0_client& rg0c, ///< Passed to standard_role_manager.
-            ::service::migration_manager& mm ///< Passed to standard_role_manager.
+            ::service::migration_manager& mm, ///< Passed to standard_role_manager.
+            cache& cache ///< Passed to standard_role_manager.
    );

    /// Retrieves LDAP configuration entries from qp and invokes the other constructor.  Required by
    /// class_registrator<role_manager>.
-    ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm);
+    ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache);

    /// Thrown when query-template parsing fails.
    struct url_error : public std::runtime_error {
--- a/auth/maintenance_socket_role_manager.cc
+++ b/auth/maintenance_socket_role_manager.cc
@@ -11,6 +11,7 @@
 #include <seastar/core/future.hh>
 #include <stdexcept>
 #include <string_view>
+#include "auth/cache.hh"
 #include "cql3/description.hh"
 #include "utils/class_registrator.hh"

@@ -23,7 +24,8 @@ static const class_registrator<
        maintenance_socket_role_manager,
        cql3::query_processor&,
        ::service::raft_group0_client&,
-        ::service::migration_manager&> registration(sstring{maintenance_socket_role_manager_name});
+        ::service::migration_manager&,
+        cache&> registration(sstring{maintenance_socket_role_manager_name});


 std::string_view maintenance_socket_role_manager::qualified_java_name() const noexcept {
--- a/auth/maintenance_socket_role_manager.hh
+++ b/auth/maintenance_socket_role_manager.hh
@@ -8,6 +8,7 @@

 #pragma once

+#include "auth/cache.hh"
 #include "auth/resource.hh"
 #include "auth/role_manager.hh"
 #include <seastar/core/future.hh>
@@ -29,7 +30,7 @@ extern const std::string_view maintenance_socket_role_manager_name;
 // system_auth keyspace, which may be not yet created when the maintenance socket starts listening.
 class maintenance_socket_role_manager final : public role_manager {
 public:
-    maintenance_socket_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&) {}
+    maintenance_socket_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&) {}

    virtual std::string_view qualified_java_name() const noexcept override;

--- a/auth/password_authenticator.cc
+++ b/auth/password_authenticator.cc
@@ -49,6 +49,7 @@ static const class_registrator<
        cql3::query_processor&,
        ::service::raft_group0_client&,
        ::service::migration_manager&,
+        cache&,
        utils::alien_worker&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");

 static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());
@@ -63,10 +64,11 @@ std::string password_authenticator::default_superuser(const db::config& cfg) {
 password_authenticator::~password_authenticator() {
 }

-password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, utils::alien_worker& hashing_worker)
+password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache, utils::alien_worker& hashing_worker)
    : _qp(qp)
    , _group0_client(g0)
    , _migration_manager(mm)
+    , _cache(cache)
    , _stopped(make_ready_future<>()) 
    , _superuser(default_superuser(qp.db().get_config()))
    , _hashing_worker(hashing_worker)
@@ -315,11 +317,20 @@ future<authenticated_user> password_authenticator::authenticate(
    const sstring password = credentials.at(PASSWORD_KEY);

    try {
-        const std::optional<sstring> salted_hash = co_await get_password_hash(username);
-        if (!salted_hash) {
-            throw exceptions::authentication_exception("Username and/or password are incorrect");
+        std::optional<sstring> salted_hash;
+        if (legacy_mode(_qp)) {
+            salted_hash = co_await get_password_hash(username);
+            if (!salted_hash) {
+                throw exceptions::authentication_exception("Username and/or password are incorrect");
+            }
+        } else {
+            auto role = _cache.get(username);
+            if (!role || role->salted_hash.empty()) {
+                throw exceptions::authentication_exception("Username and/or password are incorrect");
+            }
+            salted_hash = role->salted_hash;
        }
-        const bool password_match = co_await _hashing_worker.submit<bool>([password = std::move(password), salted_hash = std::move(salted_hash)]{
+        const bool password_match = co_await _hashing_worker.submit<bool>([password = std::move(password), salted_hash] {
            return passwords::check(password, *salted_hash);
        });
        if (!password_match) {
--- a/auth/password_authenticator.hh
+++ b/auth/password_authenticator.hh
@@ -16,6 +16,7 @@
 #include "db/consistency_level_type.hh"
 #include "auth/authenticator.hh"
 #include "auth/passwords.hh"
+#include "auth/cache.hh"
 #include "service/raft/raft_group0_client.hh"
 #include "utils/alien_worker.hh"

@@ -41,6 +42,7 @@ class password_authenticator : public authenticator {
    cql3::query_processor& _qp;
    ::service::raft_group0_client& _group0_client;
    ::service::migration_manager& _migration_manager;
+    cache& _cache;
    future<> _stopped;
    abort_source _as;
    std::string _superuser; // default superuser name from the config (may or may not be present in roles table)
@@ -53,7 +55,7 @@ public:
    static db::consistency_level consistency_for_user(std::string_view role_name);
    static std::string default_superuser(const db::config&);

-    password_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
+    password_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);

    ~password_authenticator();

--- a/auth/saslauthd_authenticator.cc
+++ b/auth/saslauthd_authenticator.cc
@@ -35,9 +35,10 @@ static const class_registrator<
        cql3::query_processor&,
        ::service::raft_group0_client&,
        ::service::migration_manager&,
+        cache&,
        utils::alien_worker&> saslauthd_auth_reg("com.scylladb.auth.SaslauthdAuthenticator");

-saslauthd_authenticator::saslauthd_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&)
+saslauthd_authenticator::saslauthd_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&)
    : _socket_path(qp.db().get_config().saslauthd_socket_path())
 {}

--- a/auth/saslauthd_authenticator.hh
+++ b/auth/saslauthd_authenticator.hh
@@ -11,6 +11,7 @@
 #pragma once

 #include "auth/authenticator.hh"
+#include "auth/cache.hh"
 #include "utils/alien_worker.hh"

 namespace cql3 {
@@ -29,7 +30,7 @@ namespace auth {
 class saslauthd_authenticator : public authenticator {
    sstring _socket_path; ///< Path to the domain socket on which saslauthd is listening.
 public:
-    saslauthd_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
+    saslauthd_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);

    future<> start() override;

--- a/auth/service.cc
+++ b/auth/service.cc
@@ -17,6 +17,7 @@
 #include <chrono>

 #include <seastar/core/future-util.hh>
+#include <seastar/core/shard_id.hh>
 #include <seastar/core/sharded.hh>
 #include <seastar/core/shared_ptr.hh>

@@ -157,6 +158,7 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n

 service::service(
        utils::loading_cache_config c,
+        cache& cache,
        cql3::query_processor& qp,
        ::service::raft_group0_client& g0,
        ::service::migration_notifier& mn,
@@ -166,6 +168,7 @@ service::service(
        maintenance_socket_enabled used_by_maintenance_socket)
            : _loading_cache_config(std::move(c))
            , _permissions_cache(nullptr)
+            , _cache(cache)
            , _qp(qp)
            , _group0_client(g0)
            , _mnotifier(mn)
@@ -188,15 +191,17 @@ service::service(
        ::service::migration_manager& mm,
        const service_config& sc,
        maintenance_socket_enabled used_by_maintenance_socket,
+        cache& cache,
        utils::alien_worker& hashing_worker)
            : service(
                      std::move(c),
+                      cache,
                      qp,
                      g0,
                      mn,
                      create_object<authorizer>(sc.authorizer_java_name, qp, g0, mm),
-                      create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm, hashing_worker),
-                      create_object<role_manager>(sc.role_manager_java_name, qp, g0, mm),
+                      create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm, cache, hashing_worker),
+                      create_object<role_manager>(sc.role_manager_java_name, qp, g0, mm, cache),
                      used_by_maintenance_socket) {
 }

@@ -215,6 +220,7 @@ future<> service::create_legacy_keyspace_if_missing(::service::migration_manager
                    meta::legacy::AUTH_KS,
                    "org.apache.cassandra.locator.SimpleStrategy",
                    opts,
+                    std::nullopt,
                    std::nullopt);

            try {
@@ -231,6 +237,9 @@ future<> service::start(::service::migration_manager& mm, db::system_keyspace& s
    auto auth_version = co_await sys_ks.get_auth_version();
    // version is set in query processor to be easily available in various places we call auth::legacy_mode check.
    _qp.auth_version = auth_version;
+    if (this_shard_id() == 0) {
+        co_await _cache.load_all();
+    }
    if (!_used_by_maintenance_socket) {
        // this legacy keyspace is only used by cqlsh
        // it's needed when executing `list roles` or `list users`
--- a/auth/service.hh
+++ b/auth/service.hh
@@ -21,6 +21,7 @@
 #include "auth/authorizer.hh"
 #include "auth/permission.hh"
 #include "auth/permissions_cache.hh"
+#include "auth/cache.hh"
 #include "auth/role_manager.hh"
 #include "auth/common.hh"
 #include "cql3/description.hh"
@@ -77,6 +78,7 @@ public:
 class service final : public seastar::peering_sharded_service<service> {
    utils::loading_cache_config _loading_cache_config;
    std::unique_ptr<permissions_cache> _permissions_cache;
+    cache& _cache;

    cql3::query_processor& _qp;

@@ -107,6 +109,7 @@ class service final : public seastar::peering_sharded_service<service> {
 public:
    service(
            utils::loading_cache_config,
+            cache& cache,
            cql3::query_processor&,
            ::service::raft_group0_client&,
            ::service::migration_notifier&,
@@ -128,6 +131,7 @@ public:
            ::service::migration_manager&,
            const service_config&,
            maintenance_socket_enabled,
+            cache&,
            utils::alien_worker&);

    future<> start(::service::migration_manager&, db::system_keyspace&);
--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -41,21 +41,6 @@

 namespace auth {

-namespace meta {
-
-namespace role_members_table {
-
-constexpr std::string_view name{"role_members" , 12};
-
-}
-
-namespace role_attributes_table {
-
-constexpr std::string_view name{"role_attributes", 15};
-
-}
-
-}

 static logging::logger log("standard_role_manager");

@@ -64,7 +49,8 @@ static const class_registrator<
        standard_role_manager,
        cql3::query_processor&,
        ::service::raft_group0_client&,
-        ::service::migration_manager&> registration("org.apache.cassandra.auth.CassandraRoleManager");
+        ::service::migration_manager&,
+        cache&> registration("org.apache.cassandra.auth.CassandraRoleManager");

 struct record final {
    sstring name;
@@ -121,10 +107,11 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {
    return row.has("can_login") && !(boolean_type->deserialize(row.get_blob_unfragmented("can_login")).is_null());
 }

-standard_role_manager::standard_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm)
+standard_role_manager::standard_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache)
    : _qp(qp)
    , _group0_client(g0)
    , _migration_manager(mm)
+    , _cache(cache)
    , _stopped(make_ready_future<>())
    , _superuser(password_authenticator::default_superuser(qp.db().get_config()))
 {}
@@ -136,7 +123,7 @@ std::string_view standard_role_manager::qualified_java_name() const noexcept {
 const resource_set& standard_role_manager::protected_resources() const {
    static const resource_set resources({
            make_data_resource(meta::legacy::AUTH_KS, meta::roles_table::name),
-            make_data_resource(meta::legacy::AUTH_KS, meta::role_members_table::name)});
+            make_data_resource(meta::legacy::AUTH_KS, ROLE_MEMBERS_CF)});

    return resources;
 }
@@ -160,7 +147,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
            "  PRIMARY KEY (role, member)"
            ")",
            meta::legacy::AUTH_KS,
-            meta::role_members_table::name);
+            ROLE_MEMBERS_CF);
    static const sstring create_role_attributes_query = seastar::format(
            "CREATE TABLE {}.{} ("
            "  role text,"
@@ -169,7 +156,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
            "  PRIMARY KEY(role, name)"
            ")",
            meta::legacy::AUTH_KS,
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
    return when_all_succeed(
            create_legacy_metadata_table_if_missing(
                    meta::roles_table::name,
@@ -177,12 +164,12 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
                    create_roles_query,
                    _migration_manager),
            create_legacy_metadata_table_if_missing(
-                    meta::role_members_table::name,
+                    ROLE_MEMBERS_CF,
                    _qp,
                    create_role_members_query,
                    _migration_manager),
            create_legacy_metadata_table_if_missing(
-                    meta::role_attributes_table::name,
+                    ROLE_ATTRIBUTES_CF,
                    _qp,
                    create_role_attributes_query,
                    _migration_manager)).discard_result();
@@ -429,7 +416,7 @@ future<> standard_role_manager::drop(std::string_view role_name, ::service::grou
    const auto revoke_from_members = [this, role_name, &mc] () -> future<> {
        const sstring query = seastar::format("SELECT member FROM {}.{} WHERE role = ?",
                get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
        const auto members = co_await _qp.execute_internal(
                query,
                consistency_for_role(role_name),
@@ -461,7 +448,7 @@ future<> standard_role_manager::drop(std::string_view role_name, ::service::grou
    const auto remove_attributes_of = [this, role_name, &mc] () -> future<> {
        const sstring query = seastar::format("DELETE FROM {}.{} WHERE role = ?",
                get_auth_ks_name(_qp),
-                meta::role_attributes_table::name);
+                ROLE_ATTRIBUTES_CF);
        if (legacy_mode(_qp)) {
            co_await _qp.execute_internal(query, {sstring(role_name)},
                cql3::query_processor::cache_internal::yes).discard_result();
@@ -517,7 +504,7 @@ standard_role_manager::legacy_modify_membership(
            case membership_change::add: {
                const sstring insert_query = seastar::format("INSERT INTO {}.{} (role, member) VALUES (?, ?)",
                        get_auth_ks_name(_qp),
-                        meta::role_members_table::name);
+                        ROLE_MEMBERS_CF);
                co_return co_await _qp.execute_internal(
                        insert_query,
                        consistency_for_role(role_name),
@@ -529,7 +516,7 @@ standard_role_manager::legacy_modify_membership(
            case membership_change::remove: {
                const sstring delete_query = seastar::format("DELETE FROM {}.{} WHERE role = ? AND member = ?",
                        get_auth_ks_name(_qp),
-                        meta::role_members_table::name);
+                        ROLE_MEMBERS_CF);
                co_return co_await _qp.execute_internal(
                        delete_query,
                        consistency_for_role(role_name),
@@ -567,12 +554,12 @@ standard_role_manager::modify_membership(
    case membership_change::add:
        modify_role_members = seastar::format("INSERT INTO {}.{} (role, member) VALUES (?, ?)",
                get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
        break;
    case membership_change::remove:
        modify_role_members = seastar::format("DELETE FROM {}.{} WHERE role = ? AND member = ?",
                get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
        break;
    default:
        on_internal_error(log, format("unknown membership_change value: {}", int(ch)));
@@ -666,7 +653,7 @@ future<role_set> standard_role_manager::query_granted(std::string_view grantee_n
 future<role_to_directly_granted_map> standard_role_manager::query_all_directly_granted(::service::query_state& qs) {
    const sstring query = seastar::format("SELECT * FROM {}.{}",
            get_auth_ks_name(_qp),
-            meta::role_members_table::name);
+            ROLE_MEMBERS_CF);

    const auto results = co_await _qp.execute_internal(
            query,
@@ -731,15 +718,21 @@ future<bool> standard_role_manager::is_superuser(std::string_view role_name) {
 }

 future<bool> standard_role_manager::can_login(std::string_view role_name) {
-    return require_record(_qp, role_name).then([](record r) {
-        return r.can_login;
-    });
+    if (legacy_mode(_qp)) {
+        const auto r = co_await require_record(_qp, role_name);
+        co_return r.can_login;
+    }
+    auto role = _cache.get(sstring(role_name));
+    if (!role) {
+        throw nonexistant_role(role_name);
+    }
+    co_return role->can_login;
 }

 future<std::optional<sstring>> standard_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state& qs) {
    const sstring query = seastar::format("SELECT name, value FROM {}.{} WHERE role = ? AND name = ?",
            get_auth_ks_name(_qp),
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
    const auto result_set = co_await _qp.execute_internal(query, db::consistency_level::ONE, qs, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes);
    if (!result_set->empty()) {
        const cql3::untyped_result_set_row &row = result_set->one();
@@ -770,7 +763,7 @@ future<> standard_role_manager::set_attribute(std::string_view role_name, std::s
    }
    const sstring query = seastar::format("INSERT INTO {}.{} (role, name, value)  VALUES (?, ?, ?)",
            get_auth_ks_name(_qp),
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
    if (legacy_mode(_qp)) {
        co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name), sstring(attribute_value)}, cql3::query_processor::cache_internal::yes).discard_result();
    } else {
@@ -785,7 +778,7 @@ future<> standard_role_manager::remove_attribute(std::string_view role_name, std
    }
    const sstring query = seastar::format("DELETE FROM {}.{} WHERE role = ? AND name = ?",
            get_auth_ks_name(_qp),
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
    if (legacy_mode(_qp)) {
        co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes).discard_result();
    } else {
--- a/auth/standard_role_manager.hh
+++ b/auth/standard_role_manager.hh
@@ -10,6 +10,7 @@

 #include "auth/common.hh"
 #include "auth/role_manager.hh"
+#include "auth/cache.hh"

 #include <string_view>

@@ -36,13 +37,14 @@ class standard_role_manager final : public role_manager {
    cql3::query_processor& _qp;
    ::service::raft_group0_client& _group0_client;
    ::service::migration_manager& _migration_manager;
+    cache& _cache;
    future<> _stopped;
    abort_source _as;
    std::string _superuser;
    shared_promise<> _superuser_created_promise;

 public:
-    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&);
+    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&);

    virtual std::string_view qualified_java_name() const noexcept override;

--- a/auth/transitional.cc
+++ b/auth/transitional.cc
@@ -13,6 +13,7 @@
 #include "auth/authorizer.hh"
 #include "auth/default_authorizer.hh"
 #include "auth/password_authenticator.hh"
+#include "auth/cache.hh"
 #include "auth/permission.hh"
 #include "service/raft/raft_group0_client.hh"
 #include "utils/class_registrator.hh"
@@ -37,8 +38,8 @@ class transitional_authenticator : public authenticator {
 public:
    static const sstring PASSWORD_AUTHENTICATOR_NAME;

-    transitional_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, utils::alien_worker& hashing_worker)
-            : transitional_authenticator(std::make_unique<password_authenticator>(qp, g0, mm, hashing_worker)) {
+    transitional_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache, utils::alien_worker& hashing_worker)
+            : transitional_authenticator(std::make_unique<password_authenticator>(qp, g0, mm, cache, hashing_worker)) {
    }
    transitional_authenticator(std::unique_ptr<authenticator> a)
            : _authenticator(std::move(a)) {
@@ -240,6 +241,7 @@ static const class_registrator<
        cql3::query_processor&,
        ::service::raft_group0_client&,
        ::service::migration_manager&,
+        auth::cache&,
        utils::alien_worker&> transitional_authenticator_reg(auth::PACKAGE_NAME + "TransitionalAuthenticator");

 static const class_registrator<
--- a/backlog_controller.hh
+++ b/backlog_controller.hh
@@ -15,6 +15,7 @@
 #include <cmath>

 #include "seastarx.hh"
+#include "backlog_controller_fwd.hh"

 // Simple proportional controller to adjust shares for processes for which a backlog can be clearly
 // defined.
@@ -128,11 +129,21 @@ public:
    static constexpr unsigned normalization_factor = 30;
    static constexpr float disable_backlog = std::numeric_limits<double>::infinity();
    static constexpr float backlog_disabled(float backlog) { return std::isinf(backlog); }
-    compaction_controller(backlog_controller::scheduling_group sg, float static_shares, std::chrono::milliseconds interval, std::function<float()> current_backlog)
+    static inline const std::vector<backlog_controller::control_point> default_control_points = {
+            backlog_controller::control_point{0.0, 50}, {1.5, 100}, {normalization_factor, default_compaction_maximum_shares}};
+    compaction_controller(backlog_controller::scheduling_group sg, float static_shares, std::optional<float> max_shares,
+        std::chrono::milliseconds interval, std::function<float()> current_backlog)
        : backlog_controller(std::move(sg), std::move(interval),
-          std::vector<backlog_controller::control_point>({{0.0, 50}, {1.5, 100} , {normalization_factor, 1000}}),
+          default_control_points,
          std::move(current_backlog),
          static_shares
        )
-    {}
+    {
+        if (max_shares) {
+            set_max_shares(*max_shares);
+        }
+    }
+
+    // Updates the maximum output value for control points.
+    void set_max_shares(float max_shares);
 };
--- a/backlog_controller_fwd.hh
+++ b/backlog_controller_fwd.hh
@@ -0,0 +1,13 @@
+/*
+ * Copyright (C) 2025-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#pragma once
+
+#include <cstdint>
+
+static constexpr uint64_t default_compaction_maximum_shares = 1000;
--- a/cdc/CMakeLists.txt
+++ b/cdc/CMakeLists.txt
@@ -17,5 +17,8 @@ target_link_libraries(cdc
  PRIVATE
    replica)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(cdc REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers cdc
  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
--- a/cdc/log.cc
+++ b/cdc/log.cc
@@ -23,7 +23,9 @@
 #include "bytes.hh"
 #include "index/vector_index.hh"
 #include "locator/abstract_replication_strategy.hh"
+#include "locator/topology.hh"
 #include "replica/database.hh"
+#include "db/config.hh"
 #include "db/schema_tables.hh"
 #include "gms/feature_service.hh"
 #include "schema/schema.hh"
@@ -62,9 +64,9 @@ logging::logger cdc_log("cdc");

 namespace {

-shared_ptr<locator::abstract_replication_strategy> generate_replication_strategy(const keyspace_metadata& ksm) {
-    locator::replication_strategy_params params(ksm.strategy_options(), ksm.initial_tablets());
-    return locator::abstract_replication_strategy::create_replication_strategy(ksm.strategy_name(), params);
+shared_ptr<locator::abstract_replication_strategy> generate_replication_strategy(const keyspace_metadata& ksm, const locator::topology& topo) {
+    locator::replication_strategy_params params(ksm.strategy_options(), ksm.initial_tablets(), ksm.consistency_option());
+    return locator::abstract_replication_strategy::create_replication_strategy(ksm.strategy_name(), params, topo);
 }

 // When dropping a column from a CDC log table, we set the drop timestamp
@@ -202,7 +204,7 @@ public:
            check_that_cdc_log_table_does_not_exist(db, schema, logname);
            ensure_that_table_has_no_counter_columns(schema);
            if (!db.features().cdc_with_tablets) {
-                ensure_that_table_uses_vnodes(ksm, schema);
+                ensure_that_table_uses_vnodes(ksm, schema, db.get_token_metadata().get_topology());
            }

            // in seastar thread
@@ -249,7 +251,7 @@ public:
            check_for_attempt_to_create_nested_cdc_log(db, new_schema);
            ensure_that_table_has_no_counter_columns(new_schema);
            if (!db.features().cdc_with_tablets) {
-                ensure_that_table_uses_vnodes(*keyspace.metadata(), new_schema);
+                ensure_that_table_uses_vnodes(*keyspace.metadata(), new_schema, db.get_token_metadata().get_topology());
            }

            std::optional<table_id> maybe_id = log_schema ? std::make_optional(log_schema->id()) : std::nullopt;
@@ -316,7 +318,8 @@ public:
        lowres_clock::time_point timeout,
        utils::chunked_vector<mutation>&& mutations,
        tracing::trace_state_ptr tr_state,
-        db::consistency_level write_cl
+        db::consistency_level write_cl,
+        per_request_options options
    );

    template<typename Iter>
@@ -350,8 +353,8 @@ private:
    // Until we support CDC with tablets (issue #16317), we can't allow this
    // to be attempted - in particular the log table we try to create will not
    // have tablets, and will cause a failure.
-    static void ensure_that_table_uses_vnodes(const keyspace_metadata& ksm, const schema& schema) {
-        auto rs = generate_replication_strategy(ksm);
+    static void ensure_that_table_uses_vnodes(const keyspace_metadata& ksm, const schema& schema, const locator::topology& topo) {
+        auto rs = generate_replication_strategy(ksm, topo);
        if (rs->uses_tablets()) {
            throw exceptions::invalid_request_exception(format("Cannot create CDC log for a table {}.{}, because the keyspace uses tablets, and not all nodes support the CDC with tablets feature.",
                schema.ks_name(), schema.cf_name()));
@@ -584,11 +587,9 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
    return to_bytes(cdc_deleted_elements_column_prefix) + column_name;
 }

-static schema_ptr create_log_schema(const schema& s, const replica::database& db,
-        const keyspace_metadata& ksm, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old)
+static void set_default_properties_log_table(schema_builder& b, const schema& s,
+        const replica::database& db, const keyspace_metadata& ksm)
 {
-    schema_builder b(s.ks_name(), log_name(s.cf_name()));
-    b.with_partitioner(cdc::cdc_partitioner::classname);
    b.set_compaction_strategy(compaction::compaction_strategy_type::time_window);
    b.set_comment(fmt::format("CDC log for {}.{}", s.ks_name(), s.cf_name()));
    auto ttl_seconds = s.cdc_options().ttl();
@@ -614,13 +615,22 @@ static schema_ptr create_log_schema(const schema& s, const replica::database& db
                        std::to_string(std::max(1, window_seconds / 2))},
        });
    }
+    b.set_caching_options(caching_options::get_disabled_caching_options());
+
+    auto rs = generate_replication_strategy(ksm, db.get_token_metadata().get_topology());
+    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(*rs, db.get_token_metadata(), false));
+    b.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));
+}
+
+static void add_columns_to_cdc_log(schema_builder& b, const schema& s,
+        const api::timestamp_type timestamp, const schema_ptr old)
+{
    b.with_column(log_meta_column_name_bytes("stream_id"), bytes_type, column_kind::partition_key);
    b.with_column(log_meta_column_name_bytes("time"), timeuuid_type, column_kind::clustering_key);
    b.with_column(log_meta_column_name_bytes("batch_seq_no"), int32_type, column_kind::clustering_key);
    b.with_column(log_meta_column_name_bytes("operation"), data_type_for<operation_native_type>());
    b.with_column(log_meta_column_name_bytes("ttl"), long_type);
    b.with_column(log_meta_column_name_bytes("end_of_batch"), boolean_type);
-    b.set_caching_options(caching_options::get_disabled_caching_options());

    auto validate_new_column = [&] (const sstring& name) {
        // When dropping a column from a CDC log table, we set the drop timestamp to be
@@ -690,15 +700,28 @@ static schema_ptr create_log_schema(const schema& s, const replica::database& db
    add_columns(s.clustering_key_columns());
    add_columns(s.static_columns(), true);
    add_columns(s.regular_columns(), true);
+}
+
+static schema_ptr create_log_schema(const schema& s, const replica::database& db,
+        const keyspace_metadata& ksm, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old)
+{
+    schema_builder b(s.ks_name(), log_name(s.cf_name()));
+
+    b.with_partitioner(cdc::cdc_partitioner::classname);
+
+    if (old) {
+        // If the user reattaches the log table, do not change its properties.
+        b.set_properties(old->get_properties());
+    } else {
+        set_default_properties_log_table(b, s, db, ksm);
+    }
+
+    add_columns_to_cdc_log(b, s, timestamp, old);

    if (uuid) {
        b.set_uuid(*uuid);
    }

-    auto rs = generate_replication_strategy(ksm);
-    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(*rs, db.get_token_metadata(), false));
-    b.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));
-
    /**
     * #10473 - if we are redefining the log table, we need to ensure any dropped
     * columns are registered in "dropped_columns" table, otherwise clients will not
@@ -929,9 +952,6 @@ static managed_bytes merge(const abstract_type& type, const managed_bytes_opt& p
    throw std::runtime_error(format("cdc merge: unknown type {}", type.name()));
 }

-using cell_map = std::unordered_map<const column_definition*, managed_bytes_opt>;
-using row_states_map = std::unordered_map<clustering_key, cell_map, clustering_key::hashing, clustering_key::equality>;
-
 static managed_bytes_opt get_col_from_row_state(const cell_map* state, const column_definition& cdef) {
    if (state) {
        if (auto it = state->find(&cdef); it != state->end()) {
@@ -941,7 +961,12 @@ static managed_bytes_opt get_col_from_row_state(const cell_map* state, const col
    return std::nullopt;
 }

-static cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck) {
+cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck) {
+    auto it = row_states.find(ck);
+    return it == row_states.end() ? nullptr : &it->second;
+}
+
+const cell_map* get_row_state(const row_states_map& row_states, const clustering_key& ck) {
    auto it = row_states.find(ck);
    return it == row_states.end() ? nullptr : &it->second;
 }
@@ -1394,6 +1419,13 @@ struct process_row_visitor {
 };

 struct process_change_visitor {
+    const per_request_options& _request_options;
+    // The types of the operations used for row / partition deletes. Introduced
+    // to differentiate service operations (e.g. operation::service_row_delete
+    // vs operation::row_delete).
+    const operation _row_delete_op = operation::row_delete;
+    const operation _partition_delete_op = operation::partition_delete;
+
    stats::part_type_set& _touched_parts;

    log_mutation_builder& _builder;
@@ -1404,6 +1436,8 @@ struct process_change_visitor {
    row_states_map& _clustering_row_states;
    cell_map& _static_row_state;

+    const bool _is_update = false;
+
    const bool _generate_delta_values = true;

    void static_row_cells(auto&& visit_row_cells) {
@@ -1427,12 +1461,13 @@ struct process_change_visitor {

        struct clustering_row_cells_visitor : public process_row_visitor {
            operation _cdc_op = operation::update;
+            operation _marker_op = operation::insert;

            using process_row_visitor::process_row_visitor;

            void marker(const row_marker& rm) {
                _ttl_column = get_ttl(rm);
-                _cdc_op = operation::insert;
+                _cdc_op = _marker_op;
            }
        };

@@ -1440,6 +1475,9 @@ struct process_change_visitor {
                log_ck, _touched_parts, _builder,
                _enable_updating_state, &ckey, get_row_state(_clustering_row_states, ckey),
                _clustering_row_states, _generate_delta_values);
+        if (_is_update && _request_options.alternator) {
+            v._marker_op = operation::update;
+        }
        visit_row_cells(v);

        if (_enable_updating_state) {
@@ -1456,7 +1494,7 @@ struct process_change_visitor {
    void clustered_row_delete(const clustering_key& ckey, const tombstone&) {
        _touched_parts.set<stats::part_type::ROW_DELETE>();

-        auto log_ck = _builder.allocate_new_log_row(operation::row_delete);
+        auto log_ck = _builder.allocate_new_log_row(_row_delete_op);
        _builder.set_clustering_columns(log_ck, ckey);

        if (_enable_updating_state && get_row_state(_clustering_row_states, ckey)) {
@@ -1500,7 +1538,7 @@ struct process_change_visitor {

    void partition_delete(const tombstone&) {
        _touched_parts.set<stats::part_type::PARTITION_DELETE>();
-        auto log_ck = _builder.allocate_new_log_row(operation::partition_delete);
+        auto log_ck = _builder.allocate_new_log_row(_partition_delete_op);
        if (_enable_updating_state) {
            _clustering_row_states.clear();
        }
@@ -1515,6 +1553,7 @@ private:
    schema_ptr _schema;
    dht::decorated_key _dk;
    schema_ptr _log_schema;
+    const per_request_options& _options;

    /**
     * #6070, #6084
@@ -1592,6 +1631,11 @@ private:

    row_states_map _clustering_row_states;
    cell_map _static_row_state;
+    // True if the mutated row existed before applying the mutation. In other
+    // words, if the preimage is enabled and it isn't empty (otherwise, we
+    // assume that the row is non-existent). Used for Alternator Streams (see
+    // #6918).
+    bool _is_update = false;

    const bool _uses_tablets;

@@ -1604,11 +1648,12 @@ private:
    stats::part_type_set _touched_parts;

 public:
-    transformer(db_context ctx, schema_ptr s, dht::decorated_key dk)
+    transformer(db_context ctx, schema_ptr s, dht::decorated_key dk, const per_request_options& options)
        : _ctx(ctx)
        , _schema(std::move(s))
        , _dk(std::move(dk))
-        , _log_schema(ctx._proxy.get_db().local().find_schema(_schema->ks_name(), log_name(_schema->cf_name())))
+        , _log_schema(_schema->cdc_schema() ? _schema->cdc_schema() : ctx._proxy.get_db().local().find_schema(_schema->ks_name(), log_name(_schema->cf_name())))
+        , _options(options)
        , _clustering_row_states(0, clustering_key::hashing(*_schema), clustering_key::equality(*_schema))
        , _uses_tablets(ctx._proxy.get_db().local().find_keyspace(_schema->ks_name()).uses_tablets())
    {
@@ -1623,7 +1668,7 @@ public:
    }

    void produce_preimage(const clustering_key* ck, const one_kind_column_set& columns_to_include) override {
-        // iff we want full preimage, just ignore the affected columns and include everything. 
+        // if we want full preimage, just ignore the affected columns and include everything. 
        generate_image(operation::pre_image, ck, _schema->cdc_options().full_preimage() ? nullptr : &columns_to_include);
    };

@@ -1709,11 +1754,15 @@ public:
    void process_change(const mutation& m) override {
        SCYLLA_ASSERT(_builder);
        process_change_visitor v {
+            ._request_options = _options,
+            ._row_delete_op = _options.is_system_originated ? operation::service_row_delete : operation::row_delete,
+            ._partition_delete_op = _options.is_system_originated ? operation::service_partition_delete : operation::partition_delete,
            ._touched_parts = _touched_parts,
            ._builder = *_builder,
            ._enable_updating_state = _enable_updating_state,
            ._clustering_row_states = _clustering_row_states,
            ._static_row_state = _static_row_state,
+            ._is_update = _is_update,
            ._generate_delta_values = generate_delta_values(_builder->base_schema())
        };
        cdc::inspect_mutation(m, v);
@@ -1724,6 +1773,10 @@ public:
        _builder->end_record();
    }

+    const row_states_map& clustering_row_states() const override {
+        return _clustering_row_states;
+    }
+
    // Takes and returns generated cdc log mutations and associated statistics about parts touched during transformer's lifetime.
    // The `transformer` object on which this method was called on should not be used anymore.
    std::tuple<utils::chunked_vector<mutation>, stats::part_type_set> finish() && {
@@ -1740,7 +1793,8 @@ public:
            const mutation& m)
    {
        auto& p = m.partition();
-        if (p.clustered_rows().empty() && p.static_row().empty()) {
+        const bool no_ck_schema_partition_deletion = m.schema()->clustering_key_size() == 0 && bool(p.partition_tombstone());
+        if (p.clustered_rows().empty() && p.static_row().empty() && !no_ck_schema_partition_deletion) {
            return make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>();
        }

@@ -1789,12 +1843,12 @@ public:
                });
            }
        }
-        if (!p.clustered_rows().empty()) {
+        if (!p.clustered_rows().empty() || no_ck_schema_partition_deletion) {
            const bool has_row_delete = std::any_of(p.clustered_rows().begin(), p.clustered_rows().end(), [] (const rows_entry& re) {
                return re.row().deleted_at();
            });
            // for postimage we need everything...
-            if (has_row_delete || _schema->cdc_options().postimage() || _schema->cdc_options().full_preimage()) {
+            if (has_row_delete || _schema->cdc_options().postimage() || _schema->cdc_options().full_preimage() || no_ck_schema_partition_deletion) {
                for (const column_definition& c: _schema->regular_columns()) {
                    regular_columns.emplace_back(c.id);
                    columns.emplace_back(&c);
@@ -1846,6 +1900,7 @@ public:
                    _static_row_state[&c] = std::move(*maybe_cell_view);
                }
            }
+            _is_update = true;
        }

        if (static_only) {
@@ -1909,7 +1964,7 @@ transform_mutations(utils::chunked_vector<mutation>& muts, decltype(muts.size())
 } // namespace cdc

 future<std::tuple<utils::chunked_vector<mutation>, lw_shared_ptr<cdc::operation_result_tracker>>>
-cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout, utils::chunked_vector<mutation>&& mutations, tracing::trace_state_ptr tr_state, db::consistency_level write_cl) {
+cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout, utils::chunked_vector<mutation>&& mutations, tracing::trace_state_ptr tr_state, db::consistency_level write_cl, per_request_options options) {
    // we do all this because in the case of batches, we can have mixed schemas.
    auto e = mutations.end();
    auto i = std::find_if(mutations.begin(), e, [](const mutation& m) {
@@ -1923,9 +1978,9 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
    tracing::trace(tr_state, "CDC: Started generating mutations for log rows");
    mutations.reserve(2 * mutations.size());

-    return do_with(std::move(mutations), service::query_state(service::client_state::for_internal_calls(), empty_service_permit()), operation_details{},
-            [this, tr_state = std::move(tr_state), write_cl] (utils::chunked_vector<mutation>& mutations, service::query_state& qs, operation_details& details) {
-        return transform_mutations(mutations, 1, [this, &mutations, &qs, tr_state = tr_state, &details, write_cl] (int idx) mutable {
+    return do_with(std::move(mutations), service::query_state(service::client_state::for_internal_calls(), empty_service_permit()), operation_details{}, std::move(options),
+            [this, tr_state = std::move(tr_state), write_cl] (utils::chunked_vector<mutation>& mutations, service::query_state& qs, operation_details& details, per_request_options& options) {
+        return transform_mutations(mutations, 1, [this, &mutations, &qs, tr_state = tr_state, &details, write_cl, &options] (int idx) mutable {
            auto& m = mutations[idx];
            auto s = m.schema();

@@ -1933,12 +1988,17 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
                return make_ready_future<>();
            }

-            transformer trans(_ctxt, s, m.decorated_key());
+            const bool alternator_increased_compatibility = options.alternator && options.alternator_streams_increased_compatibility;
+            transformer trans(_ctxt, s, m.decorated_key(), options);

            auto f = make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>(nullptr);
-            if (s->cdc_options().preimage() || s->cdc_options().postimage()) {
+            if (options.preimage && !options.preimage->empty()) {
+                // Preimage has been fetched by upper layers.
+                tracing::trace(tr_state, "CDC: Using a prefetched preimage");
+                f = make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>(options.preimage);
+            } else if (s->cdc_options().preimage() || s->cdc_options().postimage() || alternator_increased_compatibility) {
                // Note: further improvement here would be to coalesce the pre-image selects into one
-                // iff a batch contains several modifications to the same table. Otoh, batch is rare(?)
+                // if a batch contains several modifications to the same table. Otoh, batch is rare(?)
                // so this is premature.
                tracing::trace(tr_state, "CDC: Selecting preimage for {}", m.decorated_key());
                f = trans.pre_image_select(qs.get_client_state(), write_cl, m).then_wrapped([this] (future<lw_shared_ptr<cql3::untyped_result_set>> f) {
@@ -1953,7 +2013,7 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
                tracing::trace(tr_state, "CDC: Preimage not enabled for the table, not querying current value of {}", m.decorated_key());
            }

-            return f.then([trans = std::move(trans), &mutations, idx, tr_state, &details] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
+            return f.then([alternator_increased_compatibility, trans = std::move(trans), &mutations, idx, tr_state, &details, &options] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
                auto& m = mutations[idx];
                auto& s = m.schema();

@@ -1968,13 +2028,13 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
                details.had_preimage |= preimage;
                details.had_postimage |= postimage;
                tracing::trace(tr_state, "CDC: Generating log mutations for {}", m.decorated_key());
-                if (should_split(m)) {
+                if (should_split(m, options)) {
                    tracing::trace(tr_state, "CDC: Splitting {}", m.decorated_key());
                    details.was_split = true;
-                    process_changes_with_splitting(m, trans, preimage, postimage);
+                    process_changes_with_splitting(m, trans, preimage, postimage, alternator_increased_compatibility);
                } else {
                    tracing::trace(tr_state, "CDC: No need to split {}", m.decorated_key());
-                    process_changes_without_splitting(m, trans, preimage, postimage);
+                    process_changes_without_splitting(m, trans, preimage, postimage, alternator_increased_compatibility);
                }
                auto [log_mut, touched_parts] = std::move(trans).finish();
                const int generated_count = log_mut.size();
@@ -1999,11 +2059,11 @@ bool cdc::cdc_service::needs_cdc_augmentation(const utils::chunked_vector<mutati
 }

 future<std::tuple<utils::chunked_vector<mutation>, lw_shared_ptr<cdc::operation_result_tracker>>>
-cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, utils::chunked_vector<mutation>&& mutations, tracing::trace_state_ptr tr_state, db::consistency_level write_cl) {
+cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, utils::chunked_vector<mutation>&& mutations, tracing::trace_state_ptr tr_state, db::consistency_level write_cl, per_request_options options) {
    if (utils::get_local_injector().enter("sleep_before_cdc_augmentation")) {
-        return seastar::sleep(std::chrono::milliseconds(100)).then([this, timeout, mutations = std::move(mutations), tr_state = std::move(tr_state), write_cl] () mutable {
-            return _impl->augment_mutation_call(timeout, std::move(mutations), std::move(tr_state), write_cl);
+        return seastar::sleep(std::chrono::milliseconds(100)).then([this, timeout, mutations = std::move(mutations), tr_state = std::move(tr_state), write_cl, options = std::move(options)] () mutable {
+            return _impl->augment_mutation_call(timeout, std::move(mutations), std::move(tr_state), write_cl, std::move(options));
        });
    }
-    return _impl->augment_mutation_call(timeout, std::move(mutations), std::move(tr_state), write_cl);
+    return _impl->augment_mutation_call(timeout, std::move(mutations), std::move(tr_state), write_cl, std::move(options));
 }
--- a/cdc/log.hh
+++ b/cdc/log.hh
@@ -21,6 +21,7 @@
 #include <seastar/core/shared_ptr.hh>
 #include <seastar/core/sstring.hh>

+#include "cql3/untyped_result_set.hh"
 #include "mutation/timestamp.hh"
 #include "tracing/trace_state.hh"
 #include "utils/UUID.hh"
@@ -51,6 +52,40 @@ class database;

 namespace cdc {

+using cell_map = std::unordered_map<const column_definition*, managed_bytes_opt>;
+using row_states_map = std::unordered_map<clustering_key, cell_map, clustering_key::hashing, clustering_key::equality>;
+
+// cdc log table operation
+enum class operation : int8_t {
+    // note: these values will eventually be read by a third party, probably not privy to this
+    // enum decl, so don't change the constant values (or the datatype).
+    pre_image = 0, update = 1, insert = 2, row_delete = 3, partition_delete = 4,
+    range_delete_start_inclusive = 5, range_delete_start_exclusive = 6, range_delete_end_inclusive = 7, range_delete_end_exclusive = 8,
+    post_image = 9,
+
+    // Operations initiated internally by Scylla. Currently used only by Alternator
+    service_row_delete = -3, service_partition_delete = -4,
+};
+
+struct per_request_options {
+    // The value of the base row before the current operation, queried by higher
+    // layers than CDC. We assume that CDC could have seen the row in this
+    // state, i.e. the value isn't 'stale'/'too recent'.
+    lw_shared_ptr<cql3::untyped_result_set> preimage;
+    // Whether this mutation is a result of an internal operation initiated by
+    // Scylla. Currently, only TTL expiration implementation for Alternator
+    // uses this.
+    const bool is_system_originated = false;
+    // True if this mutation was emitted by Alternator.
+    const bool alternator = false;
+    // Sacrifice performance for the sake of better compatibility with DynamoDB
+    // Streams. The alternator_streams_increased_compatibility config flag is
+    // live-updateable, so its value may change between reads. For correctness,
+    // the flag must therefore be read once per request and the value stored
+    // here.
+    const bool alternator_streams_increased_compatibility = false;
+};
+
 struct operation_result_tracker;
 class db_context;
 class metadata;
@@ -80,8 +115,9 @@ public:
        lowres_clock::time_point timeout,
        utils::chunked_vector<mutation>&& mutations,
        tracing::trace_state_ptr tr_state,
-        db::consistency_level write_cl
-        );
+        db::consistency_level write_cl,
+        per_request_options options = {}
+    );
    bool needs_cdc_augmentation(const utils::chunked_vector<mutation>&) const;
 };

@@ -93,15 +129,6 @@ struct db_context final {
        : _proxy(proxy), _migration_notifier(notifier), _cdc_metadata(cdc_meta) {}
 };

-// cdc log table operation
-enum class operation : int8_t {
-    // note: these values will eventually be read by a third party, probably not privvy to this
-    // enum decl, so don't change the constant values (or the datatype).
-    pre_image = 0, update = 1, insert = 2, row_delete = 3, partition_delete = 4,
-    range_delete_start_inclusive = 5, range_delete_start_exclusive = 6, range_delete_end_inclusive = 7, range_delete_end_exclusive = 8,
-    post_image = 9,
-};
-
 bool is_log_for_some_table(const replica::database& db, const sstring& ks_name, const std::string_view& table_name);

 schema_ptr get_base_table(const replica::database&, const schema&);
@@ -126,4 +153,7 @@ bool is_cdc_metacolumn_name(const sstring& name);

 utils::UUID generate_timeuuid(api::timestamp_type t);

+cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck);
+const cell_map* get_row_state(const row_states_map& row_states, const clustering_key& ck);
+
 } // namespace cdc
--- a/cdc/split.cc
+++ b/cdc/split.cc
@@ -6,15 +6,28 @@
 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
 */

+#include "bytes.hh"
+#include "bytes_fwd.hh"
+#include "mutation/atomic_cell.hh"
+#include "mutation/atomic_cell_or_collection.hh"
+#include "mutation/collection_mutation.hh"
 #include "mutation/mutation.hh"
+#include "mutation/tombstone.hh"
 #include "schema/schema.hh"

+#include "seastar/core/sstring.hh"
 #include "types/concrete_types.hh"
+#include "types/types.hh"
 #include "types/user.hh"

 #include "split.hh"
 #include "log.hh"
 #include "change_visitor.hh"
+#include "utils/managed_bytes.hh"
+#include <string_view>
+#include <unordered_map>
+
+extern logging::logger cdc_log;

 struct atomic_column_update {
    column_id id;
@@ -111,6 +124,15 @@ struct batch {
                ret.insert(std::make_pair(change.key, all_columns));
            }
        }
+        // While deleting a full partition avoids row-by-row logging for performance
+        // reasons, we must explicitly log single-row deletions for tables without a
+        // clustering key. This ensures consistent behavior with deletions of single
+        // rows from tables with a clustering key. See issue #26382.
+        if (partition_deletions && s.clustering_key_size() == 0) {
+            cdc::one_kind_column_set all_columns{s.regular_columns_count()};
+            all_columns.set(0, s.regular_columns_count(), true);
+            ret.emplace(clustering_key::make_empty(), all_columns);
+        }

        auto process_change_type = [&] (const auto& changes) {
            for (const auto& change : changes) {
@@ -481,6 +503,8 @@ struct should_split_visitor {
    // Otherwise we store the change's ttl.
    std::optional<gc_clock::duration> _ttl = std::nullopt;

+    virtual ~should_split_visitor() = default;
+
    inline bool finished() const { return _result; }
    inline void stop() { _result = true; }

@@ -503,7 +527,7 @@ struct should_split_visitor {

    void collection_tombstone(const tombstone& t) { visit(t.timestamp + 1); }

-    void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
+    virtual void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
        if (_had_row_marker) {
            // nonatomic updates cannot be expressed with an INSERT.
            return stop();
@@ -513,7 +537,7 @@ struct should_split_visitor {
    void dead_collection_cell(bytes_view, const atomic_cell_view& cell) { visit(cell); }
    void collection_column(const column_definition&, auto&& visit_collection) { visit_collection(*this); }

-    void marker(const row_marker& rm) {
+    virtual void marker(const row_marker& rm) {
        _had_row_marker = true;
        visit(rm.timestamp(), get_ttl(rm));
    }
@@ -554,7 +578,29 @@ struct should_split_visitor {
    }
 };

-bool should_split(const mutation& m) {
+// This is the same as the above, but it doesn't split a row marker away from
+// an update. As a result, updates that create an item appear as a single log
+// row.
+class alternator_should_split_visitor : public should_split_visitor {
+public:
+    ~alternator_should_split_visitor() override = default;
+
+    void live_collection_cell(bytes_view, const atomic_cell_view& cell) override {
+        visit(cell.timestamp());
+    }
+
+    void marker(const row_marker& rm) override {
+        visit(rm.timestamp());
+    }
+};
+
+bool should_split(const mutation& m, const per_request_options& options) {
+    if (options.alternator) {
+        alternator_should_split_visitor v;
+        cdc::inspect_mutation(m, v);
+        return v._result || v._ts == api::missing_timestamp;
+    }
+
    should_split_visitor v;

    cdc::inspect_mutation(m, v);
@@ -564,8 +610,109 @@ bool should_split(const mutation& m) {
        || v._ts == api::missing_timestamp;
 }

+// Returns true if the row state and the atomic and nonatomic entries represent
+// an equivalent item.
+static bool entries_match_row_state(const schema_ptr& base_schema, const cell_map& row_state, const std::vector<atomic_column_update>& atomic_entries,
+        const std::vector<nonatomic_column_update>& nonatomic_entries) {
+    for (const auto& update : atomic_entries) {
+        const column_definition& cdef = base_schema->column_at(column_kind::regular_column, update.id);
+        const auto it = row_state.find(&cdef);
+        if (it == row_state.end()) {
+            return false;
+        }
+        if (to_managed_bytes_opt(update.cell.value().linearize()) != it->second) {
+            return false;
+        }
+    }
+    if (nonatomic_entries.empty()) {
+        return true;
+    }
+
+    for (const auto& update : nonatomic_entries) {
+        const column_definition& cdef = base_schema->column_at(column_kind::regular_column, update.id);
+        const auto it = row_state.find(&cdef);
+        if (it == row_state.end()) {
+            return false;
+        }
+
+        // The only collection used by Alternator is a non-frozen map.
+        auto current_raw_map = cdef.type->deserialize(*it->second);
+        map_type_impl::native_type current_values = value_cast<map_type_impl::native_type>(current_raw_map);
+
+        if (current_values.size() != update.cells.size()) {
+            return false;
+        }
+
+        std::unordered_map<sstring_view, bytes> current_values_map;
+        for (const auto& entry : current_values) {
+            const auto attr_name = std::string_view(value_cast<sstring>(entry.first));
+            current_values_map[attr_name] = value_cast<bytes>(entry.second);
+        }
+
+        for (const auto& [key, value] : update.cells) {
+            const auto key_str = to_string_view(key);
+            if (!value.is_live()) {
+                if (current_values_map.contains(key_str)) {
+                    return false;
+                }
+            } else if (current_values_map[key_str] != value.value().linearize()) {
+                return false;
+            }
+        }
+    }
+    return true;
+}
+
+bool should_skip(batch& changes, const mutation& base_mutation, change_processor& processor) {
+    const schema_ptr& base_schema = base_mutation.schema();
+    // Alternator uses neither static updates nor clustered range deletions, so never skip a batch that contains them.
+    if (!changes.static_updates.empty() || !changes.clustered_range_deletions.empty()) {
+        return false;
+    }
+
+    for (clustered_row_insert& u : changes.clustered_inserts) {
+        const cell_map* row_state = get_row_state(processor.clustering_row_states(), u.key);
+        if (!row_state) {
+            return false;
+        }
+        if (!entries_match_row_state(base_schema, *row_state, u.atomic_entries, u.nonatomic_entries)) {
+            return false;
+        }
+    }
+
+    for (clustered_row_update& u : changes.clustered_updates) {
+        const cell_map* row_state = get_row_state(processor.clustering_row_states(), u.key);
+        if (!row_state) {
+            return false;
+        }
+        if (!entries_match_row_state(base_schema, *row_state, u.atomic_entries, u.nonatomic_entries)) {
+            return false;
+        }
+    }
+
+    // Skip only if the row being deleted does not exist (i.e. the deletion is a no-op).
+    for (const auto& row_deletion : changes.clustered_row_deletions) {
+        if (processor.clustering_row_states().contains(row_deletion.key)) {
+            return false;
+        }
+    }
+
+    // Don't skip a partition deletion if the item exists.
+    //
+    // With increased DynamoDB Streams compatibility, single-item operations
+    // always read the item and store it in the clustering row states, so an
+    // item missing from there does not exist and its CDC entry may be skipped.
+    // This is safe while the operation's write isolation assumptions hold.
+    if (changes.partition_deletions && processor.clustering_row_states().contains(clustering_key::make_empty())) {
+        return false;
+    }
+
+    cdc_log.trace("Skipping CDC log for mutation {}", base_mutation);
+    return true;
+}
+
 void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage) {
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility) {
    const auto base_schema = base_mutation.schema();
    auto changes = extract_changes(base_mutation);
    auto pk = base_mutation.key();
@@ -577,9 +724,6 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
    const auto last_timestamp = changes.rbegin()->first;

    for (auto& [change_ts, btch] : changes) {
-        const bool is_last = change_ts == last_timestamp;
-        processor.begin_timestamp(change_ts, is_last);
-
        clustered_column_set affected_clustered_columns_per_row{clustering_key::less_compare(*base_schema)};
        one_kind_column_set affected_static_columns{base_schema->static_columns_count()};

@@ -588,6 +732,12 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
            affected_clustered_columns_per_row = btch.get_affected_clustered_columns_per_row(*base_mutation.schema());
        }

+        if (alternator_strict_compatibility && should_skip(btch, base_mutation, processor)) {
+            continue;
+        }
+
+        const bool is_last = change_ts == last_timestamp;
+        processor.begin_timestamp(change_ts, is_last);
        if (enable_preimage) {
            if (affected_static_columns.count() > 0) {
                processor.produce_preimage(nullptr, affected_static_columns);
@@ -675,7 +825,13 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
 }

 void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage) {
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility) {
+    if (alternator_strict_compatibility) {
+        auto changes = extract_changes(base_mutation);
+        if (should_skip(changes.begin()->second, base_mutation, processor)) {
+            return;
+        }
+    }
    auto ts = find_timestamp(base_mutation);
    processor.begin_timestamp(ts, true);

--- a/cdc/split.hh
+++ b/cdc/split.hh
@@ -9,6 +9,7 @@
 #pragma once

 #include <boost/dynamic_bitset.hpp>  // IWYU pragma: keep
+#include "cdc/log.hh"
 #include "replica/database_fwd.hh"
 #include "mutation/timestamp.hh"

@@ -65,12 +66,14 @@ public:
    // Tells processor we have reached end of record - last part
    // of a given timestamp batch
    virtual void end_record() = 0;
+
+    virtual const row_states_map& clustering_row_states() const = 0;
 };

-bool should_split(const mutation& base_mutation);
+bool should_split(const mutation& base_mutation, const per_request_options& options);
 void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage);
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility);
 void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage);
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility);

 }
--- a/cmake/mode.common.cmake
+++ b/cmake/mode.common.cmake
@@ -117,6 +117,9 @@ add_compile_options("-ffile-prefix-map=${CMAKE_BINARY_DIR}=.")
 cmake_path(GET CMAKE_BINARY_DIR FILENAME build_dir_name)
 add_compile_options("-ffile-prefix-map=${CMAKE_BINARY_DIR}/=${build_dir_name}")

+# https://github.com/llvm/llvm-project/issues/163007
+add_compile_options("-fextend-variable-liveness=none")
+
 default_target_arch(target_arch)
 if(target_arch)
  add_compile_options("-march=${target_arch}")
--- a/compaction/CMakeLists.txt
+++ b/compaction/CMakeLists.txt
@@ -21,5 +21,8 @@ target_link_libraries(compaction
    mutation_writer
    replica)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(compaction REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers compaction
  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
--- a/compaction/compaction.cc
+++ b/compaction/compaction.cc
@@ -129,6 +129,7 @@ static const std::unordered_map<compaction_type, sstring> compaction_types = {
    { compaction_type::Upgrade, "UPGRADE" },
    { compaction_type::Reshape, "RESHAPE" },
    { compaction_type::Split, "SPLIT" },
+    { compaction_type::Major, "MAJOR" },
 };

 sstring compaction_name(compaction_type type) {
@@ -159,6 +160,7 @@ std::string_view to_string(compaction_type type) {
    case compaction_type::Upgrade: return "Upgrade";
    case compaction_type::Reshape: return "Reshape";
    case compaction_type::Split: return "Split";
+    case compaction_type::Major: return "Major";
    }
    on_internal_error_noexcept(clogger, format("Invalid compaction type {}", int(type)));
    return "(invalid)";
@@ -1537,6 +1539,8 @@ private:
        mutation_fragment_stream_validator _validator;
        bool _skip_to_next_partition = false;
        uint64_t& _validation_errors;
+        bool& _failed_to_fix_sstable;
+        compaction_type_options::scrub::drop_unfixable_sstables _drop_unfixable_sstables;

    private:
        void maybe_abort_scrub(std::function<void()> report_error) {
@@ -1547,7 +1551,7 @@ private:
            ++_validation_errors;
        }

-        void on_unexpected_partition_start(const mutation_fragment_v2& ps, sstring error) {
+        skip on_unexpected_partition_start(const mutation_fragment_v2& ps, sstring error) {
            auto report_fn = [this, error] (std::string_view action = "") {
                report_validation_error(compaction_type::Scrub, *_schema, error, action);
            };
@@ -1556,6 +1560,11 @@ private:

            auto pe = mutation_fragment_v2(*_schema, _permit, partition_end{});
            if (!_validator(pe)) {
+                if (_drop_unfixable_sstables) {
+                    _failed_to_fix_sstable = true;
+                    end_stream();
+                    return skip::yes;
+                }
                throw compaction_aborted_exception(
                        _schema->ks_name(),
                        _schema->cf_name(),
@@ -1564,11 +1573,17 @@ private:
            push_mutation_fragment(std::move(pe));

            if (!_validator(ps)) {
+                if (_drop_unfixable_sstables) {
+                    _failed_to_fix_sstable = true;
+                    end_stream();
+                    return skip::yes;
+                }
                throw compaction_aborted_exception(
                        _schema->ks_name(),
                        _schema->cf_name(),
                        "scrub compaction failed to rectify unexpected partition-start, validator rejects it even after the injected partition-end");
            }
+            return skip::no;
        }

        skip on_invalid_partition(const dht::decorated_key& new_key, sstring error) {
@@ -1596,6 +1611,11 @@ private:
            const auto& key = _validator.previous_partition_key();

            if (_validator.current_tombstone()) {
+                if (_drop_unfixable_sstables) {
+                    _failed_to_fix_sstable = true;
+                    end_stream();
+                    return skip::yes;
+                }
                throw compaction_aborted_exception(
                        _schema->ks_name(),
                        _schema->cf_name(),
@@ -1635,13 +1655,21 @@ private:
        }

        void on_malformed_sstable_exception(std::exception_ptr e) {
-            if (_scrub_mode != compaction_type_options::scrub::mode::skip) {
+            bool should_abort = _scrub_mode == compaction_type_options::scrub::mode::abort ||
+                    (_scrub_mode == compaction_type_options::scrub::mode::segregate && !_drop_unfixable_sstables);
+            if (should_abort) {
                throw compaction_aborted_exception(
                        _schema->ks_name(),
                        _schema->cf_name(),
                        format("scrub compaction failed due to unrecoverable error: {}", e));
            }
+            if (_drop_unfixable_sstables) {
+                _failed_to_fix_sstable = true;
+            }
+            end_stream();
+        }

+        void end_stream() {
            // Closes the active range tombstone if needed, before emitting partition end.
            if (auto current_tombstone = _validator.current_tombstone(); current_tombstone) {
                const auto& last_pos = _validator.previous_position();
@@ -1662,6 +1690,10 @@ private:
        void fill_buffer_from_underlying() {
            utils::get_local_injector().inject("rest_api_keyspace_scrub_abort", [] { throw compaction_aborted_exception("", "", "scrub compaction found invalid data"); });
            while (!_reader.is_buffer_empty() && !is_buffer_full()) {
+                if (_end_of_stream && _failed_to_fix_sstable) {
+                    return;
+                }
+
                auto mf = _reader.pop_mutation_fragment();
                if (mf.is_partition_start()) {
                    // First check that fragment kind monotonicity stands.
@@ -1672,7 +1704,9 @@ private:
                    // will confuse it.
                    if (!_skip_to_next_partition) {
                        if (auto res = _validator(mf); !res) {
-                            on_unexpected_partition_start(mf, res.what());
+                            if (on_unexpected_partition_start(mf, res.what()) == skip::yes) {
+                                continue;
+                            }
                        }
                        // Continue processing this partition start.
                    }
@@ -1696,6 +1730,10 @@ private:
                push_mutation_fragment(std::move(mf));
            }

+            if (_end_of_stream && _failed_to_fix_sstable) {
+                return;
+            }
+
            _end_of_stream = _reader.is_end_of_stream() && _reader.is_buffer_empty();

            if (_end_of_stream) {
@@ -1706,12 +1744,15 @@ private:
        }

    public:
-        reader(mutation_reader underlying, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors)
+        reader(mutation_reader underlying, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors,
+                bool& failed_to_fix_sstable, compaction_type_options::scrub::drop_unfixable_sstables drop_unfixable_sstables)
            : impl(underlying.schema(), underlying.permit())
            , _scrub_mode(scrub_mode)
            , _reader(std::move(underlying))
            , _validator(*_schema)
            , _validation_errors(validation_errors)
+            , _failed_to_fix_sstable(failed_to_fix_sstable)
+            , _drop_unfixable_sstables(drop_unfixable_sstables)
        { }
        virtual future<> fill_buffer() override {
            if (_end_of_stream) {
@@ -1762,6 +1803,7 @@ private:
    mutable std::string _scrub_finish_description;
    uint64_t _bucket_count = 0;
    uint64_t _validation_errors = 0;
+    bool _failed_to_fix_sstable = false;

 public:
    scrub_compaction(compaction_group_view& table_s, compaction_descriptor descriptor, compaction_data& cdata, compaction_type_options::scrub options, compaction_progress_monitor& progress_monitor)
@@ -1793,7 +1835,7 @@ public:
            on_internal_error(clogger, fmt::format("Scrub compaction in mode {} expected full partition range, but got {} instead", _options.operation_mode, range));
        }
        auto full_scan_reader = _compacting->make_full_scan_reader(std::move(s), std::move(permit), nullptr, unwrap_monitor_generator(), sstables::integrity_check::yes);
-        return make_mutation_reader<reader>(std::move(full_scan_reader), _options.operation_mode, _validation_errors);
+        return make_mutation_reader<reader>(std::move(full_scan_reader), _options.operation_mode, _validation_errors, _failed_to_fix_sstable, _options.drop_unfixable);
    }

    uint64_t partitions_per_sstable() const override {
@@ -1830,11 +1872,45 @@ public:
        return ret;
    }

-    friend mutation_reader make_scrubbing_reader(mutation_reader rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors);
+    void drop_unfixable_sstables() {
+        if (!_sstables.empty() || !used_garbage_collected_sstables().empty()) {
+            std::vector<sstables::shared_sstable> old_sstables;
+            std::move(_sstables.begin(), _sstables.end(), std::back_inserter(old_sstables));
+
+            // Remove garbage-collected SSTables from the SSTable set if any were previously added.
+            auto& used_gc_sstables = used_garbage_collected_sstables();
+            old_sstables.insert(old_sstables.end(), used_gc_sstables.begin(), used_gc_sstables.end());
+
+            _replacer(get_compaction_completion_desc(std::move(old_sstables), {}));
+        }
+
+        // Mark new sstables for deletion as well
+        for (auto& sst : boost::range::join(_new_partial_sstables, _new_unused_sstables)) {
+            sst->mark_for_deletion();
+        }
+    }
+
+    virtual void on_end_of_compaction() override {
+        if (_options.drop_unfixable && _failed_to_fix_sstable) {
+            drop_unfixable_sstables();
+        } else {
+            regular_compaction::on_end_of_compaction();
+        }
+    }
+
+    virtual void stop_sstable_writer(compaction_writer* writer) override {
+        if (_options.drop_unfixable && _failed_to_fix_sstable && writer) {
+            finish_new_sstable(writer);
+        } else {
+            regular_compaction::stop_sstable_writer(writer);
+        }
+    }
+
+    friend mutation_reader make_scrubbing_reader(mutation_reader rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors, bool& failed_to_fix_sstable, compaction_type_options::scrub::drop_unfixable_sstables drop_unfixable_sstables);
 };

-mutation_reader make_scrubbing_reader(mutation_reader rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors) {
-    return make_mutation_reader<scrub_compaction::reader>(std::move(rd), scrub_mode, validation_errors);
+mutation_reader make_scrubbing_reader(mutation_reader rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors, bool& failed_to_fix_sstable, compaction_type_options::scrub::drop_unfixable_sstables drop_unfixable_sstables) {
+    return make_mutation_reader<scrub_compaction::reader>(std::move(rd), scrub_mode, validation_errors, failed_to_fix_sstable, drop_unfixable_sstables);
 }

 class resharding_compaction final : public compaction {
@@ -1971,6 +2047,7 @@ compaction_type compaction_type_options::type() const {
        compaction_type::Reshard,
        compaction_type::Reshape,
        compaction_type::Split,
+        compaction_type::Major,
    };
    static_assert(std::variant_size_v<compaction_type_options::options_variant> == std::size(index_to_type));
    return index_to_type[_options.index()];
@@ -1992,6 +2069,9 @@ static std::unique_ptr<compaction> make_compaction(compaction_group_view& table_
        std::unique_ptr<compaction> operator()(compaction_type_options::regular) {
            return std::make_unique<regular_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
        }
+        std::unique_ptr<compaction> operator()(compaction_type_options::major) {
+            return std::make_unique<regular_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
+        }
        std::unique_ptr<compaction> operator()(compaction_type_options::cleanup) {
            return std::make_unique<cleanup_compaction>(table_s, std::move(descriptor), cdata, progress_monitor);
        }
--- a/compaction/compaction.hh
+++ b/compaction/compaction.hh
@@ -138,6 +138,6 @@ std::unordered_set<sstables::shared_sstable>
 get_fully_expired_sstables(const compaction_group_view& table_s, const std::vector<sstables::shared_sstable>& compacting, gc_clock::time_point gc_before);

 // For tests, can drop after we virtualize sstables.
-mutation_reader make_scrubbing_reader(mutation_reader rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors);
+mutation_reader make_scrubbing_reader(mutation_reader rd, compaction_type_options::scrub::mode scrub_mode, uint64_t& validation_errors, bool& failed_to_fix_sstable, compaction_type_options::scrub::drop_unfixable_sstables drop_unfixable_sstables);

 }
--- a/compaction/compaction_descriptor.hh
+++ b/compaction/compaction_descriptor.hh
@@ -20,7 +20,7 @@
 namespace compaction {

 enum class compaction_type {
-    Compaction = 0,
+    Compaction = 0, // Used only for regular compactions
    Cleanup = 1,
    Validation = 2, // Origin uses this for a compaction that is used exclusively for repair
    Scrub = 3,
@@ -29,6 +29,7 @@ enum class compaction_type {
    Upgrade = 6,
    Reshape = 7,
    Split = 8,
+    Major = 9,
 };

 struct compaction_completion_desc {
@@ -49,6 +50,8 @@ class compaction_type_options {
 public:
    struct regular {
    };
+    struct major {
+    };
    struct cleanup {
    };
    struct upgrade {
@@ -74,6 +77,11 @@ public:
        // Should invalid sstables be moved into quarantine.
        // Only applies to validate-mode.
        quarantine_invalid_sstables quarantine_sstables = quarantine_invalid_sstables::yes;
+
+        using drop_unfixable_sstables = bool_class<class drop_unfixable_sstables_tag>;
+        // Drop sstables that cannot be fixed.
+        // Only applies to segregate-mode.
+        drop_unfixable_sstables drop_unfixable = drop_unfixable_sstables::no;
    };
    struct reshard {
    };
@@ -83,7 +91,7 @@ public:
        mutation_writer::classify_by_token_group classifier;
    };
 private:
-    using options_variant = std::variant<regular, cleanup, upgrade, scrub, reshard, reshape, split>;
+    using options_variant = std::variant<regular, cleanup, upgrade, scrub, reshard, reshape, split, major>;

 private:
    options_variant _options;
@@ -105,6 +113,10 @@ public:
        return compaction_type_options(regular{});
    }

+    static compaction_type_options make_major() {
+        return compaction_type_options(major{});
+    }
+
    static compaction_type_options make_cleanup() {
        return compaction_type_options(cleanup{});
    }
@@ -113,8 +125,8 @@ public:
        return compaction_type_options(upgrade{});
    }

-    static compaction_type_options make_scrub(scrub::mode mode, scrub::quarantine_invalid_sstables quarantine_sstables = scrub::quarantine_invalid_sstables::yes) {
-        return compaction_type_options(scrub{.operation_mode = mode, .quarantine_sstables = quarantine_sstables});
+    static compaction_type_options make_scrub(scrub::mode mode, scrub::quarantine_invalid_sstables quarantine_sstables = scrub::quarantine_invalid_sstables::yes, scrub::drop_unfixable_sstables drop_unfixable_sstables = scrub::drop_unfixable_sstables::no) {
+        return compaction_type_options(scrub{.operation_mode = mode, .quarantine_sstables = quarantine_sstables, .drop_unfixable = drop_unfixable_sstables});
    }

    static compaction_type_options make_split(mutation_writer::classify_by_token_group classifier) {
--- a/compaction/compaction_manager.cc
+++ b/compaction/compaction_manager.cc
@@ -547,7 +547,7 @@ public:
            compaction_group_view* t,
            tasks::task_id parent_id,
            bool consider_only_existing_data)
-        : compaction_task_executor(mgr, do_throw_if_stopping, t, compaction_type::Compaction, "Major compaction")
+        : compaction_task_executor(mgr, do_throw_if_stopping, t, compaction_type::Major, "Major compaction")
        , major_compaction_task_impl(mgr._task_manager_module, tasks::task_id::create_random_id(), 0, "compaction group", t->schema()->ks_name(), t->schema()->cf_name(), "", parent_id, flush_mode::compacted_tables, consider_only_existing_data)
    {
        _status.progress_units = "bytes";
@@ -867,8 +867,8 @@ auto fmt::formatter<compaction::compaction_task_executor>::format(const compacti

 namespace compaction {

-inline compaction_controller make_compaction_controller(const compaction_manager::scheduling_group& csg, uint64_t static_shares, std::function<double()> fn) {
-    return compaction_controller(csg, static_shares, 250ms, std::move(fn));
+inline compaction_controller make_compaction_controller(const compaction_manager::scheduling_group& csg, uint64_t static_shares, std::optional<float> max_shares, std::function<double()> fn) {
+    return compaction_controller(csg, static_shares, max_shares, 250ms, std::move(fn));
 }

 compaction::compaction_state::~compaction_state() {
@@ -1014,7 +1014,7 @@ compaction_manager::compaction_manager(config cfg, abort_source& as, tasks::task
    , _sys_ks("compaction_manager::system_keyspace")
    , _cfg(std::move(cfg))
    , _compaction_submission_timer(compaction_sg(), compaction_submission_callback())
-    , _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), [this] () -> float {
+    , _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), _cfg.max_shares.get(), [this] () -> float {
        _last_backlog = backlog();
        auto b = _last_backlog / available_memory();
        // This means we are using an unimplemented strategy
@@ -1033,6 +1033,10 @@ compaction_manager::compaction_manager(config cfg, abort_source& as, tasks::task
    , _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
    , _update_compaction_static_shares_action([this] { return update_static_shares(static_shares()); })
    , _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
+    , _compaction_max_shares_observer(_cfg.max_shares.observe([this] (const float& max_shares) {
+        cmlog.info("Updating max shares to {}", max_shares);
+        _compaction_controller.set_max_shares(max_shares);
+    }))
    , _strategy_control(std::make_unique<strategy_control>(*this))
    , _tombstone_gc_state(_shared_tombstone_gc_state) {
    tm.register_module(_task_manager_module->get_name(), _task_manager_module);
@@ -1051,11 +1055,12 @@ compaction_manager::compaction_manager(tasks::task_manager& tm)
    , _sys_ks("compaction_manager::system_keyspace")
    , _cfg(config{ .available_memory = 1 })
    , _compaction_submission_timer(compaction_sg(), compaction_submission_callback())
-    , _compaction_controller(make_compaction_controller(compaction_sg(), 1, [] () -> float { return 1.0; }))
+    , _compaction_controller(make_compaction_controller(compaction_sg(), 1, std::nullopt, [] () -> float { return 1.0; }))
    , _backlog_manager(_compaction_controller)
    , _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
    , _update_compaction_static_shares_action([] { return make_ready_future<>(); })
    , _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
+    , _compaction_max_shares_observer(_cfg.max_shares.observe([] (const float& max_shares) {}))
    , _strategy_control(std::make_unique<strategy_control>(*this))
    , _tombstone_gc_state(_shared_tombstone_gc_state) {
    tm.register_module(_task_manager_module->get_name(), _task_manager_module);
@@ -1512,9 +1517,7 @@ future<> compaction_manager::maybe_wait_for_sstable_count_reduction(compaction_g
            | std::views::transform(std::mem_fn(&sstables::sstable::run_identifier))
            | std::ranges::to<std::unordered_set>());
    };
-    const auto threshold = utils::get_local_injector().inject_parameter<size_t>("set_sstable_count_reduction_threshold")
-        .value_or(size_t(std::max(schema->max_compaction_threshold(), 32)));
-
+    const auto threshold = size_t(std::max(schema->max_compaction_threshold(), 32));
    auto count = co_await num_runs_for_compaction();
    if (count <= threshold) {
        cmlog.trace("No need to wait for sstable count reduction in {}: {} <= {}",
@@ -1529,7 +1532,9 @@ future<> compaction_manager::maybe_wait_for_sstable_count_reduction(compaction_g
    auto& cstate = get_compaction_state(&t);
    try {
        while (can_perform_regular_compaction(t) && co_await num_runs_for_compaction() > threshold) {
-            co_await cstate.compaction_done.wait();
+            co_await cstate.compaction_done.wait([this, &t] {
+                return !can_perform_regular_compaction(t);
+            });
        }
    } catch (const broken_condition_variable&) {
        co_return;
@@ -2313,7 +2318,7 @@ future<compaction_manager::compaction_stats_opt> compaction_manager::perform_sst
    }
    owned_ranges_ptr owned_ranges_ptr = {};
    sstring option_desc = fmt::format("mode: {};\nquarantine_mode: {}\n", opts.operation_mode, opts.quarantine_operation_mode);
-    co_return co_await rewrite_sstables(t, compaction_type_options::make_scrub(scrub_mode), std::move(owned_ranges_ptr), [&t, opts] -> future<std::vector<sstables::shared_sstable>> {
+    co_return co_await rewrite_sstables(t, compaction_type_options::make_scrub(scrub_mode, opts.quarantine_sstables, opts.drop_unfixable), std::move(owned_ranges_ptr), [&t, opts] -> future<std::vector<sstables::shared_sstable>> {
        auto all_sstables = co_await get_all_sstables(t);
        std::vector<sstables::shared_sstable> sstables = all_sstables
                | std::views::filter([&opts] (const sstables::shared_sstable& sst) {
--- a/compaction/compaction_manager.hh
+++ b/compaction/compaction_manager.hh
@@ -80,6 +80,7 @@ public:
        scheduling_group maintenance_sched_group;
        size_t available_memory = 0;
        utils::updateable_value<float> static_shares = utils::updateable_value<float>(0);
+        utils::updateable_value<float> max_shares = utils::updateable_value<float>(0);
        utils::updateable_value<uint32_t> throughput_mb_per_sec = utils::updateable_value<uint32_t>(0);
        std::chrono::seconds flush_all_tables_before_major = std::chrono::duration_cast<std::chrono::seconds>(std::chrono::days(1));
    };
@@ -159,6 +160,7 @@ private:
    std::optional<utils::observer<uint32_t>> _throughput_option_observer;
    serialized_action _update_compaction_static_shares_action;
    utils::observer<float> _compaction_static_shares_observer;
+    utils::observer<float> _compaction_max_shares_observer;
    uint64_t _validation_errors = 0;

    class strategy_control;
@@ -291,6 +293,10 @@ public:
        return _cfg.static_shares.get();
    }

+    float max_shares() const noexcept {
+        return _cfg.max_shares.get();
+    }
+
    uint32_t throughput_mbs() const noexcept {
        return _cfg.throughput_mb_per_sec.get();
    }
@@ -569,7 +575,7 @@ protected:
                                sstables::offstrategy offstrategy = sstables::offstrategy::no);
    future<> update_history(::compaction::compaction_group_view& t, compaction_result&& res, const compaction_data& cdata);
    bool should_update_history(compaction_type ct) {
-        return ct == compaction_type::Compaction;
+        return ct == compaction_type::Compaction || ct == compaction_type::Major;
    }
 public:
    compaction_manager::compaction_stats_opt get_stats() const noexcept {
--- a/compaction/compaction_strategy.cc
+++ b/compaction/compaction_strategy.cc
@@ -41,7 +41,7 @@ using timestamp_type = api::timestamp_type;

 compaction_descriptor compaction_strategy_impl::make_major_compaction_job(std::vector<sstables::shared_sstable> candidates, int level, uint64_t max_sstable_bytes) {
    // run major compaction in maintenance priority
-    return compaction_descriptor(std::move(candidates), level, max_sstable_bytes);
+    return compaction_descriptor(std::move(candidates), level, max_sstable_bytes, sstables::run_id::create_random_id(), compaction_type_options::make_major());
 }

 std::vector<compaction_descriptor> compaction_strategy_impl::get_cleanup_compaction_jobs(compaction_group_view& table_s, std::vector<sstables::shared_sstable> candidates) const {
--- a/compaction/task_manager_module.cc
+++ b/compaction/task_manager_module.cc
@@ -227,7 +227,7 @@ future<> run_table_tasks(replica::database& db, std::vector<table_tasks_info> ta
                // Tables will be kept in descending order.
                std::ranges::sort(table_tasks, std::greater<>(), [&] (const table_tasks_info& tti) {
                    try {
-                        return db.find_column_family(tti.ti.id).get_stats().live_disk_space_used;
+                        return db.find_column_family(tti.ti.id).get_stats().live_disk_space_used.on_disk;
                    } catch (const replica::no_such_column_family& e) {
                        return int64_t(-1);
                    }
@@ -281,7 +281,7 @@ future<> run_keyspace_tasks(replica::database& db, std::vector<keyspace_tasks_in
                    try {
                        return std::accumulate(kti.table_infos.begin(), kti.table_infos.end(), int64_t(0), [&] (int64_t sum, const table_info& t) {
                            try {
-                                sum += db.find_column_family(t.id).get_stats().live_disk_space_used;
+                                sum += db.find_column_family(t.id).get_stats().live_disk_space_used.on_disk;
                            } catch (const replica::no_such_column_family&) {
                                // ignore
                            }
--- a/configure.py
+++ b/configure.py
@@ -445,6 +445,7 @@ ldap_tests = set([
 scylla_tests = set([
    'test/boost/combined_tests',
    'test/boost/UUID_test',
+    'test/boost/url_parse_test',
    'test/boost/advanced_rpc_compressor_test',
    'test/boost/allocation_strategy_test',
    'test/boost/alternator_unit_test',
@@ -526,6 +527,7 @@ scylla_tests = set([
    'test/boost/mutation_test',
    'test/boost/mvcc_test',
    'test/boost/nonwrapping_interval_test',
+    'test/boost/object_storage_upload_test',
    'test/boost/observable_test',
    'test/boost/partitioner_test',
    'test/boost/pretty_printers_test',
@@ -619,6 +621,7 @@ perf_tests = set([
    'test/perf/perf_idl',
    'test/perf/perf_vint',
    'test/perf/perf_big_decimal',
+    'test/perf/perf_bti_key_translation',
    'test/perf/perf_sort_by_proximity',
 ])

@@ -644,6 +647,28 @@ vector_search_tests = set([
    'test/vector_search/client_test'
 ])

+vector_search_validator_bin = 'vector-search-validator/bin/vector-search-validator'
+vector_search_validator_deps = set([
+    'test/vector_search_validator/build-validator',
+    'test/vector_search_validator/Cargo.toml',
+    'test/vector_search_validator/crates/validator/Cargo.toml',
+    'test/vector_search_validator/crates/validator/src/main.rs',
+    'test/vector_search_validator/crates/validator-scylla/Cargo.toml',
+    'test/vector_search_validator/crates/validator-scylla/src/lib.rs',
+    'test/vector_search_validator/crates/validator-scylla/src/cql.rs',
+])
+
+vector_store_bin = 'vector-search-validator/bin/vector-store'
+vector_store_deps = set([
+    'test/vector_search_validator/build-env',
+    'test/vector_search_validator/build-vector-store',
+])
+
+vector_search_validator_bins = set([
+    vector_search_validator_bin,
+    vector_store_bin,
+])
+
 wasms = set([
    'wasm/return_input.wat',
    'wasm/test_complex_null_values.wat',
@@ -677,7 +702,7 @@ other = set([
    'iotune',
 ])

-all_artifacts = apps | cpp_apps | tests | other | wasms
+all_artifacts = apps | cpp_apps | tests | other | wasms | vector_search_validator_bins

 arg_parser = argparse.ArgumentParser('Configure scylla', add_help=False, formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 arg_parser.add_argument('--out', dest='buildfile', action='store', default='build.ninja',
@@ -761,6 +786,7 @@ arg_parser.add_argument('--use-cmake', action=argparse.BooleanOptionalAction, de
 arg_parser.add_argument('--coverage', action = 'store_true', help = 'Compile scylla with coverage instrumentation')
 arg_parser.add_argument('--build-dir', action='store', default='build',
                        help='Build directory path')
+arg_parser.add_argument('--disable-precompiled-header', action='store_true', default=False, help='Disable precompiled header for scylla binary')
 arg_parser.add_argument('-h', '--help', action='store_true', help='show this help message and exit')
 args = arg_parser.parse_args()
 if args.help:
@@ -790,6 +816,9 @@ scylla_raft_core = [
 ]

 scylla_core = (['message/messaging_service.cc',
+                'message/advanced_rpc_compressor.cc',
+                'message/stream_compressor.cc',
+                'message/dict_trainer.cc',
                'replica/database.cc',
                'replica/schema_describe_helper.cc',
                'replica/table.cc',
@@ -800,6 +829,7 @@ scylla_core = (['message/messaging_service.cc',
                'replica/dirty_memory_manager.cc',
                'replica/multishard_query.cc',
                'replica/mutation_dump.cc',
+                'replica/querier.cc',
                'mutation/atomic_cell.cc',
                'mutation/canonical_mutation.cc',
                'mutation/frozen_mutation.cc',
@@ -834,7 +864,6 @@ scylla_core = (['message/messaging_service.cc',
                'utils/buffer_input_stream.cc',
                'utils/limiting_data_source.cc',
                'utils/updateable_value.cc',
-                'utils/dict_trainer.cc',
                'message/dictionary_service.cc',
                'utils/directories.cc',
                'gms/generation-number.cc',
@@ -844,7 +873,6 @@ scylla_core = (['message/messaging_service.cc',
                'utils/io-wrappers.cc',
                'utils/on_internal_error.cc',
                'utils/pretty_printers.cc',
-                'utils/stream_compressor.cc',
                'utils/labels.cc',
                'mutation/converting_mutation_partition_applier.cc',
                'readers/combined.cc',
@@ -878,6 +906,7 @@ scylla_core = (['message/messaging_service.cc',
                'compaction/incremental_compaction_strategy.cc',
                'compaction/incremental_backlog_tracker.cc',
                'sstables/integrity_checked_file_impl.cc',
+                'sstables/object_storage_client.cc',
                'sstables/prepended_input_stream.cc',
                'sstables/m_format_read_helpers.cc',
                'sstables/sstable_directory.cc',
@@ -902,7 +931,6 @@ scylla_core = (['message/messaging_service.cc',
                'cdc/split.cc',
                'cdc/generation.cc',
                'cdc/metadata.cc',
-                'cql3/type_json.cc',
                'cql3/attributes.cc',
                'cql3/cf_name.cc',
                'cql3/cql3_type.cc',
@@ -989,6 +1017,7 @@ scylla_core = (['message/messaging_service.cc',
                'utils/uuid.cc',
                'utils/big_decimal.cc',
                'types/comparable_bytes.cc',
+                'types/json_utils.cc',
                'types/types.cc',
                'validation.cc',
                'service/migration_manager.cc',
@@ -1033,7 +1062,6 @@ scylla_core = (['message/messaging_service.cc',
                'db/hints/resource_manager.cc',
                'db/hints/sync_point.cc',
                'db/large_data_handler.cc',
-                'db/legacy_schema_migrator.cc',
                'db/marshal/type_parser.cc',
                'db/per_partition_rate_limit_options.cc',
                'db/rate_limiter.cc',
@@ -1057,6 +1085,7 @@ scylla_core = (['message/messaging_service.cc',
                'db/virtual_table.cc',
                'db/virtual_tables.cc',
                'db/tablet_options.cc',
+                'db/object_storage_endpoint_param.cc',
                'index/secondary_index_manager.cc',
                'index/secondary_index.cc',
                'index/vector_index.cc',
@@ -1077,15 +1106,13 @@ scylla_core = (['message/messaging_service.cc',
                'utils/rest/client.cc',
                'utils/s3/aws_error.cc',
                'utils/s3/client.cc',
-                'utils/s3/retryable_http_client.cc',
-                'utils/s3/retry_strategy.cc',
+                'utils/s3/default_aws_retry_strategy.cc',
                'utils/s3/credentials_providers/aws_credentials_provider.cc',
                'utils/s3/credentials_providers/environment_aws_credentials_provider.cc',
                'utils/s3/credentials_providers/instance_profile_credentials_provider.cc',
                'utils/s3/credentials_providers/sts_assume_role_credentials_provider.cc',
                'utils/s3/credentials_providers/aws_credentials_provider_chain.cc',
                'utils/s3/utils/manip_s3.cc',
-                'utils/advanced_rpc_compressor.cc',
                'utils/azure/identity/credentials.cc',
                'utils/azure/identity/service_principal_credentials.cc',
                'utils/azure/identity/managed_identity_credentials.cc',
@@ -1168,6 +1195,7 @@ scylla_core = (['message/messaging_service.cc',
                'auth/allow_all_authorizer.cc',
                'auth/authenticated_user.cc',
                'auth/authenticator.cc',
+                'auth/cache.cc',
                'auth/common.cc',
                'auth/default_authorizer.cc',
                'auth/resource.cc',
@@ -1192,6 +1220,7 @@ scylla_core = (['message/messaging_service.cc',
                'table_helper.cc',
                'audit/audit.cc',
                'audit/audit_cf_storage_helper.cc',
+                'audit/audit_composite_storage_helper.cc',
                'audit/audit_syslog_storage_helper.cc',
                'tombstone_gc_options.cc',
                'tombstone_gc.cc',
@@ -1200,7 +1229,6 @@ scylla_core = (['message/messaging_service.cc',
                'utils/aws_sigv4.cc',
                'types/duration.cc',
                'vint-serialization.cc',
-                'querier.cc',
                'mutation_writer/multishard_writer.cc',
                'ent/encryption/encryption_config.cc',
                'ent/encryption/encryption.cc',
@@ -1408,6 +1436,9 @@ scylla_tests_dependencies = scylla_core + alternator + idls + scylla_tests_gener
    'test/lib/random_schema.cc',
    'test/lib/key_utils.cc',
    'test/lib/proc_utils.cc',
+    'test/lib/gcs_fixture.cc',
+    'test/lib/aws_kms_fixture.cc',
+    'test/lib/azure_kms_fixture.cc',
 ]

 scylla_raft_dependencies = scylla_raft_core + ['utils/uuid.cc', 'utils/error_injection.cc', 'utils/exceptions.cc']
@@ -1616,6 +1647,7 @@ deps['test/boost/bytes_ostream_test'] = [
 ]
 deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
 deps['test/boost/UUID_test'] = ['clocks-impl.cc', 'utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
+deps['test/boost/url_parse_test'] = ['utils/http.cc', 'test/boost/url_parse_test.cc', ]
 deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
 deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'utils/labels.cc']
 deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
@@ -1816,6 +1848,9 @@ user_cflags = args.user_cflags + f" -ffile-prefix-map={curdir}=."
 # Since gcc 13, libgcc doesn't need the exception workaround
 user_cflags += ' -DSEASTAR_NO_EXCEPTION_HACK'

+# https://github.com/llvm/llvm-project/issues/163007
+user_cflags += ' -fextend-variable-liveness=none'
+
 if args.target != '':
    user_cflags += ' -march=' + args.target

@@ -2003,11 +2038,11 @@ def configure_seastar(build_dir, mode, mode_config):
        '-DCMAKE_CXX_EXTENSIONS=ON',
        '-DSeastar_CXX_FLAGS=SHELL:{}'.format(mode_config['lib_cflags'] + extra_file_prefix_map),
        '-DSeastar_LD_FLAGS={}'.format(semicolon_separated(mode_config['lib_ldflags'], seastar_cxx_ld_flags)),
-        '-DSeastar_API_LEVEL=8',
+        '-DSeastar_API_LEVEL=9',
        '-DSeastar_DEPRECATED_OSTREAM_FORMATTERS=OFF',
        '-DSeastar_UNUSED_RESULT_ERROR=ON',
        '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON',
-        '-DSeastar_SCHEDULING_GROUPS_COUNT=20',
+        '-DSeastar_SCHEDULING_GROUPS_COUNT=21',
        '-DSeastar_IO_URING=ON',
    ]

@@ -2177,7 +2212,15 @@ if os.path.exists(kmipc_lib):
    user_cflags += f' -I{kmipc_dir}/include -DHAVE_KMIP'

 def get_extra_cxxflags(mode, mode_config, cxx, debuginfo):
-    cxxflags = []
+    cxxflags = [
+        # This flag is needed for correct precompiled header (pch) handling together with ccache (or similar).
+        # `git` does not preserve timestamps, so a pch can be stored in ccache and later (for example
+        # after a rebase) `stdafx.hh` can get a different timestamp while its content stays the same.
+        # ccache then serves the pch from its cache, and clang complains because the timestamps do not match.
+        # Adding `-fpch-validate-input-files-content` tells clang to compare the content of stdafx.hh when timestamps differ.
+        # The flag seems to be present in gcc as well.
+        "" if args.disable_precompiled_header else '-fpch-validate-input-files-content'
+    ]

    optimization_level = mode_config['optimization-level']
    cxxflags.append(f'-O{optimization_level}')
@@ -2242,6 +2285,7 @@ def write_build_file(f,
                     scylla_version,
                     scylla_release,
                     args):
+    use_precompiled_header = not args.disable_precompiled_header
    warnings = get_warning_options(args.cxx)
    rustc_target = pick_rustc_target('wasm32-wasi', 'wasm32-wasip1')
    f.write(textwrap.dedent('''\
@@ -2348,7 +2392,10 @@ def write_build_file(f,

    for mode in build_modes:
        modeval = modes[mode]
-
+        seastar_lib_ext = 'so' if modeval['build_seastar_shared_libs'] else 'a'
+        seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
+        seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
+        abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
        fmt_lib = 'fmt'
        f.write(textwrap.dedent('''\
            cxx_ld_flags_{mode} = {cxx_ld_flags}
@@ -2361,6 +2408,14 @@ def write_build_file(f,
              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in
              description = CXX $out
              depfile = $out.d
+            rule cxx_build_precompiled_header.{mode}
+              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in -Winvalid-pch -fpch-instantiate-templates -Xclang -emit-pch -DSCYLLA_USE_PRECOMPILED_HEADER
+              description = CXX-PRECOMPILED-HEADER $out
+              depfile = $out.d
+            rule cxx_with_pch.{mode}
+              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in -Winvalid-pch -Xclang -include-pch -Xclang $builddir/{mode}/stdafx.hh.pch
+              description = CXX $out
+              depfile = $out.d
            rule link.{mode}
              command = $cxx  $ld_flags_{mode} $ldflags -o $out $in $libs $libs_{mode}
              description = LINK $out
@@ -2394,7 +2449,7 @@ def write_build_file(f,
                        $builddir/{mode}/gen/${{stem}}Parser.cpp
                description = ANTLR3 $in
            rule checkhh.{mode}
-              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc
+              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc -USCYLLA_USE_PRECOMPILED_HEADER
              description = CHECKHH $in
              depfile = $out.d
            rule test.{mode}
@@ -2408,10 +2463,11 @@ def write_build_file(f,
              description = RUST_LIB $out
            ''').format(mode=mode, antlr3_exec=args.antlr3_exec, fmt_lib=fmt_lib, test_repeat=args.test_repeat, test_timeout=args.test_timeout, **modeval))
        f.write(
-            'build {mode}-build: phony {artifacts} {wasms}\n'.format(
+            'build {mode}-build: phony {artifacts} {wasms} {vector_search_validator_bins}\n'.format(
                mode=mode,
-                artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms)]),
+                artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms - vector_search_validator_bins)]),
                wasms = str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & wasms)]),
+                vector_search_validator_bins=str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & vector_search_validator_bins)]),
            )
        )
        if profile_recipe := modes[mode].get('profile_recipe'):
@@ -2420,6 +2476,7 @@ def write_build_file(f,
        include_dist_target = f'dist-{mode}' if args.enable_dist is None or args.enable_dist else ''
        f.write(f'build {mode}: phony {include_cxx_target} {include_dist_target}\n')
        compiles = {}
+        compiles_with_pch = set()
        swaggers = set()
        serializers = {}
        ragels = {}
@@ -2434,16 +2491,16 @@ def write_build_file(f,
        # object code. And we enable LTO when linking the main Scylla executable, while disable
        # it when linking anything else.

-        seastar_lib_ext = 'so' if modeval['build_seastar_shared_libs'] else 'a'
        for binary in sorted(build_artifacts):
            if modeval['is_profile'] and binary != "scylla":
                # Just to avoid clutter in build.ninja
                continue
            profile_dep = modes[mode].get('profile_target', "")

-            if binary in other or binary in wasms:
+            if binary in other or binary in wasms or binary in vector_search_validator_bins:
                continue
            srcs = deps[binary]
+            # 'scylla'
            objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o')
                    for src in srcs
                    if src.endswith('.cc')]
@@ -2479,9 +2536,6 @@ def write_build_file(f,
                continue

            do_lto = modes[mode]['has_lto'] and binary in lto_binaries
-            seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
-            seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
-            abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
            seastar_testing_libs = f'$seastar_testing_libs_{mode}'

            local_libs = f'$seastar_libs_{mode} $libs'
@@ -2491,6 +2545,7 @@ def write_build_file(f,
                local_libs += ' -flto=thin -ffat-lto-objects'
            else:
                local_libs += ' -fno-lto'
+            use_pch = use_precompiled_header and binary == 'scylla'
            if binary in tests:
                if binary in pure_boost_tests:
                    local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
@@ -2519,6 +2574,8 @@ def write_build_file(f,
                if src.endswith('.cc'):
                    obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
                    compiles[obj] = src
+                    if use_pch:
+                        compiles_with_pch.add(obj)
                elif src.endswith('.idl.hh'):
                    hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
                    serializers[hh] = src
@@ -2551,10 +2608,11 @@ def write_build_file(f,
        )

        f.write(
-            'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms}\n'.format(
+            'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms} {vector_search_validator_bins} \n'.format(
                mode=mode,
                test_executables=' '.join(['$builddir/{}/{}'.format(mode, binary) for binary in sorted(tests)]),
                wasms=' '.join([f'$builddir/{binary}' for binary in sorted(wasms)]),
+                vector_search_validator_bins=' '.join([f'$builddir/{binary}' for binary in sorted(vector_search_validator_bins)]),
            )
        )
        f.write(
@@ -2597,7 +2655,9 @@ def write_build_file(f,
            src = compiles[obj]
            seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
            abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
-            f.write(f'build {obj}: cxx.{mode} {src} | {profile_dep} || {seastar_dep} {abseil_dep} {gen_headers_dep}\n')
+            pch_dep = f'$builddir/{mode}/stdafx.hh.pch' if obj in compiles_with_pch else ''
+            cxx_cmd = 'cxx_with_pch' if obj in compiles_with_pch else 'cxx'
+            f.write(f'build {obj}: {cxx_cmd}.{mode} {src} | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
            if src in modeval['per_src_extra_cxxflags']:
                f.write('    cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode=mode, extra_cxxflags=modeval["per_src_extra_cxxflags"][src], **modeval))
        for swagger in swaggers:
@@ -2658,6 +2718,8 @@ def write_build_file(f,
            f.write('  target = {lib}\n'.format(**locals()))
            f.write('  profile_dep = {profile_dep}\n'.format(**locals()))

+        f.write(f'build $builddir/{mode}/stdafx.hh.pch: cxx_build_precompiled_header.{mode} stdafx.hh | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
+
        f.write('build $builddir/{mode}/seastar/apps/iotune/iotune: ninja $builddir/{mode}/seastar/build.ninja | $builddir/{mode}/seastar/libseastar.{seastar_lib_ext}\n'
                .format(**locals()))
        f.write('  pool = submodule_pool\n')
@@ -2721,6 +2783,19 @@ def write_build_file(f,
            'build compiler-training: phony {}\n'.format(' '.join(['{mode}-compiler-training'.format(mode=mode) for mode in default_modes]))
    )

+    f.write(textwrap.dedent(f'''\
+        rule build-vector-search-validator
+            command = test/vector_search_validator/build-validator $builddir
+        rule build-vector-store
+            command = test/vector_search_validator/build-vector-store $builddir
+        '''))
+    f.write(
+            'build $builddir/{vector_search_validator_bin}: build-vector-search-validator {}\n'.format(' '.join([dep for dep in sorted(vector_search_validator_deps)]), vector_search_validator_bin=vector_search_validator_bin)
+    )
+    f.write(
+            'build $builddir/{vector_store_bin}: build-vector-store {}\n'.format(' '.join([dep for dep in sorted(vector_store_deps)]), vector_store_bin=vector_store_bin)
+    )
+
    f.write(textwrap.dedent(f'''\
        build dist-unified-tar: phony {' '.join([f'$builddir/{mode}/dist/tar/{scylla_product}-unified-{scylla_version}-{scylla_release}.{arch}.tar.gz' for mode in default_modes])}
        build dist-unified: phony dist-unified-tar
@@ -2934,7 +3009,7 @@ def configure_using_cmake(args):
        'CMAKE_DEFAULT_CONFIGS': selected_configs,
        'CMAKE_C_COMPILER': args.cc,
        'CMAKE_CXX_COMPILER': args.cxx,
-        'CMAKE_CXX_FLAGS': args.user_cflags,
+        'CMAKE_CXX_FLAGS': args.user_cflags + ("" if args.disable_precompiled_header else " -fpch-validate-input-files-content"),
        'CMAKE_EXE_LINKER_FLAGS': args.user_ldflags,
        'CMAKE_EXPORT_COMPILE_COMMANDS': 'ON',
        'Scylla_CHECK_HEADERS': 'ON',
@@ -2943,6 +3018,7 @@ def configure_using_cmake(args):
        'Scylla_TEST_REPEAT': args.test_repeat,
        'Scylla_ENABLE_LTO': 'ON' if args.lto else 'OFF',
        'Scylla_WITH_DEBUG_INFO' : 'ON' if args.debuginfo else 'OFF',
+        'Scylla_USE_PRECOMPILED_HEADER': 'OFF' if args.disable_precompiled_header else 'ON',
    }
    if args.date_stamp:
        settings['Scylla_DATE_STAMP'] = args.date_stamp
--- a/cql3/CMakeLists.txt
+++ b/cql3/CMakeLists.txt
@@ -28,7 +28,6 @@ set_property(
 add_library(cql3 STATIC)
 target_sources(cql3
  PRIVATE
-    type_json.cc
    attributes.cc
    cf_name.cc
    cql3_type.cc
@@ -139,5 +138,8 @@ target_link_libraries(cql3
    lang
    transport)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(cql3 REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers cql3
  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
--- a/cql3/Cql.g
+++ b/cql3/Cql.g
@@ -219,44 +219,17 @@ using uexpression = uninitialized<expression>;
        return token->getText();
    }

+    error_sink_fn get_error_sink() {
+        return [this] (const std::string& msg) { add_recognition_error(msg); };
+    }
+
    std::map<sstring, sstring> convert_property_map(const collection_constructor& map) {
-        if (map.elements.empty()) {
-            return std::map<sstring, sstring>{};
-        }
-        std::map<sstring, sstring> res;
-        for (auto&& entry : map.elements) {
-            auto entry_tuple = expr::as_if<tuple_constructor>(&entry);
-            // Because the parser tries to be smart and recover on error (to
-            // allow displaying more than one error I suppose), we have default-constructed
-            // entries in map.elements. Just skip those, a proper error will be thrown in the end.
-            if (!entry_tuple || entry_tuple->elements.size() != 2) {
-                break;
-            }
-            auto left = expr::as_if<untyped_constant>(&entry_tuple->elements[0]);
-            if (!left) {
-                sstring msg = fmt::format("Invalid property name: {}", entry_tuple->elements[0]);
-                if (expr::is<bind_variable>(entry_tuple->elements[0])) {
-                    msg += " (bind variables are not supported in DDL queries)";
-                }
-                add_recognition_error(msg);
-                break;
-            }
-            auto right = expr::as_if<untyped_constant>(&entry_tuple->elements[1]);
-            if (!right) {
-                sstring msg = fmt::format("Invalid property value: {} for property: {}", entry_tuple->elements[0], entry_tuple->elements[1]);
-                if (expr::is<bind_variable>(entry_tuple->elements[1])) {
-                    msg += " (bind variables are not supported in DDL queries)";
-                }
-                add_recognition_error(msg);
-                break;
-            }
-            if (!res.emplace(left->raw_text, right->raw_text).second) {
-                sstring msg = fmt::format("Multiple definition for property {}", left->raw_text);
-                add_recognition_error(msg);
-                break;
-            }
-        }
-        return res;
+        return cql3::expr::convert_property_map(map, get_error_sink());
+    }
+
+    property_definitions::extended_map_type
+    convert_extended_property_map(const collection_constructor& map) {
+        return cql3::expr::convert_extended_property_map(map, get_error_sink());
    }

    sstring to_lower(std::string_view s) {
@@ -602,6 +575,15 @@ usingTimeoutServiceLevelClauseObjective[std::unique_ptr<cql3::attributes::raw>&
    | serviceLevel sl_name=serviceLevelOrRoleName { attrs->service_level = std::move(sl_name); }
    ;

+usingTimeoutConcurrencyClause[std::unique_ptr<cql3::attributes::raw>& attrs]
+    : K_USING usingTimeoutConcurrencyClauseObjective[attrs] ( K_AND usingTimeoutConcurrencyClauseObjective[attrs] )*
+    ;
+
+usingTimeoutConcurrencyClauseObjective[std::unique_ptr<cql3::attributes::raw>& attrs]
+    : K_TIMEOUT to=term { attrs->timeout = std::move(to); }
+    | K_CONCURRENCY c=term { attrs->concurrency = std::move(c); }
+    ;
+
 /**
 * UPDATE <CF>
 * USING TIMESTAMP <long>
@@ -693,7 +675,7 @@ pruneMaterializedViewStatement returns [std::unique_ptr<raw::select_statement> e
        auto attrs = std::make_unique<cql3::attributes::raw>();
        expression wclause = conjunction{};
    }
-	: K_PRUNE K_MATERIALIZED K_VIEW cf=columnFamilyName (K_WHERE w=whereClause { wclause = std::move(w); } )? ( usingClause[attrs] )?
+	: K_PRUNE K_MATERIALIZED K_VIEW cf=columnFamilyName (K_WHERE w=whereClause { wclause = std::move(w); } )? ( usingTimeoutConcurrencyClause[attrs] )?
 	  {
 	        auto params = make_lw_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, statement_subtype, bypass_cache);
 	        return std::make_unique<raw::select_statement>(std::move(cf), std::move(params),
@@ -1587,6 +1569,10 @@ serviceLevelOrRoleName returns [sstring name]
 | t=QUOTED_NAME        { $name = sstring($t.text); }
 | k=unreserved_keyword { $name = k;
 						 std::transform($name.begin(), $name.end(), $name.begin(), ::tolower);}
+// The literal `default` will not be parsed by any of the previous
+// rules, so we need to cover it manually. Needed by CREATE SERVICE
+// LEVEL and ATTACH SERVICE LEVEL.
+| t=K_DEFAULT          { $name = sstring("default"); }
 | QMARK {add_recognition_error("Bind variables cannot be used for service levels or role names");}
 ;

@@ -1834,7 +1820,7 @@ properties[cql3::statements::property_definitions& props]

 property[cql3::statements::property_definitions& props]
    : k=ident '=' simple=propertyValue { try { $props.add_property(k->to_string(), simple); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } }
-    | k=ident '=' map=mapLiteral { try { $props.add_property(k->to_string(), convert_property_map(map)); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } }
+    | k=ident '=' map=mapLiteral { try { $props.add_property(k->to_string(), convert_extended_property_map(map)); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } }
    ;

 propertyValue returns [sstring str]
@@ -2393,6 +2379,7 @@ K_LIKE:        L I K E;

 K_TIMEOUT:     T I M E O U T;
 K_PRUNE:       P R U N E;
+K_CONCURRENCY: C O N C U R R E N C Y;

 K_EXECUTE:     E X E C U T E;

--- a/cql3/attributes.cc
+++ b/cql3/attributes.cc
@@ -20,19 +20,21 @@
 namespace cql3 {

 std::unique_ptr<attributes> attributes::none() {
-    return std::unique_ptr<attributes>{new attributes{{}, {}, {}, {}}};
+    return std::unique_ptr<attributes>{new attributes{{}, {}, {}, {}, {}}};
 }

 attributes::attributes(std::optional<cql3::expr::expression>&& timestamp,
                       std::optional<cql3::expr::expression>&& time_to_live,
                       std::optional<cql3::expr::expression>&& timeout,
-                       std::optional<sstring> service_level)
+                       std::optional<sstring> service_level,
+                       std::optional<cql3::expr::expression>&& concurrency)
    : _timestamp_unset_guard(timestamp)
    , _timestamp{std::move(timestamp)}
    , _time_to_live_unset_guard(time_to_live)
    , _time_to_live{std::move(time_to_live)}
    , _timeout{std::move(timeout)}
    , _service_level(std::move(service_level))
+    , _concurrency{std::move(concurrency)}
 { }

 bool attributes::is_timestamp_set() const {
@@ -51,6 +53,10 @@ bool attributes::is_service_level_set() const {
    return bool(_service_level);
 }

+bool attributes::is_concurrency_set() const {
+    return bool(_concurrency);
+}
+
 int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
    if (!_timestamp.has_value() || _timestamp_unset_guard.is_unset(options)) {
        return now;
@@ -123,6 +129,27 @@ qos::service_level_options attributes::get_service_level(qos::service_level_cont
    return sl_controller.get_service_level(sl_name).slo;
 }

+std::optional<int32_t> attributes::get_concurrency(const query_options& options) const {
+    if (!_concurrency.has_value()) {
+        return std::nullopt;
+    }
+
+    cql3::raw_value concurrency_raw = expr::evaluate(*_concurrency, options);
+    if (concurrency_raw.is_null()) {
+        throw exceptions::invalid_request_exception("Invalid null value of concurrency");
+    }
+    int32_t concurrency;
+    try {
+        concurrency = concurrency_raw.view().validate_and_deserialize<int32_t>(*int32_type);
+    } catch (marshal_exception& e) {
+        throw exceptions::invalid_request_exception("Invalid concurrency value");
+    }
+    if (concurrency <= 0) {
+        throw exceptions::invalid_request_exception("Concurrency must be a positive integer");
+    }
+    return concurrency;
+}
+
 void attributes::fill_prepare_context(prepare_context& ctx) {
    if (_timestamp.has_value()) {
        expr::fill_prepare_context(*_timestamp, ctx);
@@ -133,10 +160,13 @@ void attributes::fill_prepare_context(prepare_context& ctx) {
    if (_timeout.has_value()) {
        expr::fill_prepare_context(*_timeout, ctx);
    }
+    if (_concurrency.has_value()) {
+        expr::fill_prepare_context(*_concurrency, ctx);
+    }
 }

 std::unique_ptr<attributes> attributes::raw::prepare(data_dictionary::database db, const sstring& ks_name, const sstring& cf_name) const {
-    std::optional<expr::expression> ts, ttl, to;
+    std::optional<expr::expression> ts, ttl, to, conc;

    if (timestamp.has_value()) {
        ts = prepare_expression(*timestamp, db, ks_name, nullptr, timestamp_receiver(ks_name, cf_name));
@@ -153,7 +183,12 @@ std::unique_ptr<attributes> attributes::raw::prepare(data_dictionary::database d
        verify_no_aggregate_functions(*timeout, "USING clause");
    }

-    return std::unique_ptr<attributes>{new attributes{std::move(ts), std::move(ttl), std::move(to), std::move(service_level)}};
+    if (concurrency.has_value()) {
+        conc = prepare_expression(*concurrency, db, ks_name, nullptr, concurrency_receiver(ks_name, cf_name));
+        verify_no_aggregate_functions(*concurrency, "USING clause");
+    }
+
+    return std::unique_ptr<attributes>{new attributes{std::move(ts), std::move(ttl), std::move(to), std::move(service_level), std::move(conc)}};
 }

 lw_shared_ptr<column_specification> attributes::raw::timestamp_receiver(const sstring& ks_name, const sstring& cf_name) const {
@@ -168,4 +203,8 @@ lw_shared_ptr<column_specification> attributes::raw::timeout_receiver(const sstr
    return make_lw_shared<column_specification>(ks_name, cf_name, ::make_shared<column_identifier>("[timeout]", true), duration_type);
 }

+lw_shared_ptr<column_specification> attributes::raw::concurrency_receiver(const sstring& ks_name, const sstring& cf_name) const {
+    return make_lw_shared<column_specification>(ks_name, cf_name, ::make_shared<column_identifier>("[concurrency]", true), data_type_for<int32_t>());
+}
+
 }
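Reviewer note: the order of checks in attributes::get_concurrency() above can be sketched on its own roughly as follows. This is a minimal illustration, not the ScyllaDB implementation: bound_value, invalid_request and validate_concurrency are hypothetical stand-ins for cql3::raw_value, exceptions::invalid_request_exception and the member function, and the explicit range check only approximates the int32 deserialization failure that the real code catches.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for exceptions::invalid_request_exception.
struct invalid_request : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for an evaluated bind value; is_null models a CQL null.
struct bound_value {
    bool is_null;
    std::int64_t value;
};

// Same order of checks as in the diff:
//   attribute absent      -> std::nullopt (caller picks its own default)
//   null value            -> error
//   does not fit in int32 -> error (assumption: models the marshal failure)
//   value <= 0            -> error
std::optional<std::int32_t> validate_concurrency(const std::optional<bound_value>& attr) {
    if (!attr.has_value()) {
        return std::nullopt;
    }
    if (attr->is_null) {
        throw invalid_request("Invalid null value of concurrency");
    }
    const std::int64_t v = attr->value;
    if (v < std::numeric_limits<std::int32_t>::min() || v > std::numeric_limits<std::int32_t>::max()) {
        throw invalid_request("Invalid concurrency value");
    }
    if (v <= 0) {
        throw invalid_request("Concurrency must be a positive integer");
    }
    return static_cast<std::int32_t>(v);
}
```

Returning std::nullopt for an absent attribute, rather than a default number, leaves the choice of default concurrency to the caller, which appears to be the intent of the std::optional<int32_t> return type in the diff.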
--- a/cql3/attributes.hh
+++ b/cql3/attributes.hh
@@ -36,13 +36,15 @@ private:
    std::optional<cql3::expr::expression> _time_to_live;
    std::optional<cql3::expr::expression> _timeout;
    std::optional<sstring> _service_level;
+    std::optional<cql3::expr::expression> _concurrency;
 public:
    static std::unique_ptr<attributes> none();
 private:
    attributes(std::optional<cql3::expr::expression>&& timestamp,
               std::optional<cql3::expr::expression>&& time_to_live,
               std::optional<cql3::expr::expression>&& timeout,
-               std::optional<sstring> service_level);
+               std::optional<sstring> service_level,
+               std::optional<cql3::expr::expression>&& concurrency);
 public:
    bool is_timestamp_set() const;

@@ -52,6 +54,8 @@ public:

    bool is_service_level_set() const;

+    bool is_concurrency_set() const;
+
    int64_t get_timestamp(int64_t now, const query_options& options);

    std::optional<int32_t> get_time_to_live(const query_options& options);
@@ -60,6 +64,8 @@ public:

    qos::service_level_options get_service_level(qos::service_level_controller& sl_controller) const;

+    std::optional<int32_t> get_concurrency(const query_options& options) const;
+
    void fill_prepare_context(prepare_context& ctx);

    class raw final {
@@ -68,6 +74,7 @@ public:
        std::optional<cql3::expr::expression> time_to_live;
        std::optional<cql3::expr::expression> timeout;
        std::optional<sstring> service_level;
+        std::optional<cql3::expr::expression> concurrency;

        std::unique_ptr<attributes> prepare(data_dictionary::database db, const sstring& ks_name, const sstring& cf_name) const;
    private:
@@ -76,6 +83,8 @@ public:
        lw_shared_ptr<column_specification> time_to_live_receiver(const sstring& ks_name, const sstring& cf_name) const;

        lw_shared_ptr<column_specification> timeout_receiver(const sstring& ks_name, const sstring& cf_name) const;
+
+        lw_shared_ptr<column_specification> concurrency_receiver(const sstring& ks_name, const sstring& cf_name) const;
    };
 };

--- a/cql3/expr/expression.cc
+++ b/cql3/expr/expression.cc
@@ -2397,6 +2397,107 @@ split_aggregation(std::span<const expression> aggregation) {
    };
 }

+std::map<sstring, sstring> convert_property_map(const collection_constructor& map, error_sink_fn add_recognition_error) {
+    if (map.elements.empty()) {
+        return std::map<sstring, sstring>{};
+    }
+    std::map<sstring, sstring> res;
+    for (auto&& entry : map.elements) {
+        auto entry_tuple = expr::as_if<tuple_constructor>(&entry);
+        // The parser recovers from errors (presumably so that it can report more
+        // than one), which can leave default-constructed entries in map.elements.
+        // Stop at the first such entry; a proper error will be thrown in the end.
+        if (!entry_tuple || entry_tuple->elements.size() != 2) {
+            break;
+        }
+        auto left = expr::as_if<untyped_constant>(&entry_tuple->elements[0]);
+        if (!left) {
+            sstring msg = fmt::format("Invalid property name: {}", entry_tuple->elements[0]);
+            if (expr::is<bind_variable>(entry_tuple->elements[0])) {
+                msg += " (bind variables are not supported in DDL queries)";
+            }
+            add_recognition_error(msg);
+            break;
+        }
+        auto right = expr::as_if<untyped_constant>(&entry_tuple->elements[1]);
+        if (!right) {
+            sstring msg = fmt::format("Invalid property value: {} for property: {}", entry_tuple->elements[1], entry_tuple->elements[0]);
+            if (expr::is<bind_variable>(entry_tuple->elements[1])) {
+                msg += " (bind variables are not supported in DDL queries)";
+            }
+            add_recognition_error(msg);
+            break;
+        }
+        if (!res.emplace(left->raw_text, right->raw_text).second) {
+            sstring msg = fmt::format("Multiple definition for property {}", left->raw_text);
+            add_recognition_error(msg);
+            break;
+        }
+    }
+    return res;
+}
+
+std::map<sstring, std::variant<sstring, std::vector<sstring>>>
+convert_extended_property_map(const collection_constructor& map, error_sink_fn add_recognition_error) {
+    if (map.elements.empty()) {
+        return {};
+    }
+    std::map<sstring, std::variant<sstring, std::vector<sstring>>> res;
+    for (auto&& entry : map.elements) {
+        auto entry_tuple = expr::as_if<tuple_constructor>(&entry);
+        // The parser recovers from errors (presumably so that it can report more
+        // than one), which can leave default-constructed entries in map.elements.
+        // Stop at the first such entry; a proper error will be thrown in the end.
+        if (!entry_tuple || entry_tuple->elements.size() != 2) {
+            break;
+        }
+        auto left = expr::as_if<untyped_constant>(&entry_tuple->elements[0]);
+        if (!left) {
+            sstring msg = fmt::format("Invalid property name: {}", entry_tuple->elements[0]);
+            if (expr::is<bind_variable>(entry_tuple->elements[0])) {
+                msg += " (bind variables are not supported in DDL queries)";
+            }
+            add_recognition_error(msg);
+            break;
+        }
+        auto right_str = expr::as_if<untyped_constant>(&entry_tuple->elements[1]);
+        if (right_str) {
+            if (!res.emplace(left->raw_text, right_str->raw_text).second) {
+                sstring msg = fmt::format("Multiple definition for property {}", left->raw_text);
+                add_recognition_error(msg);
+                break;
+            }
+        } else {
+            auto right_vec = expr::as_if<collection_constructor>(&entry_tuple->elements[1]);
+            if (!right_vec) {
+                sstring msg = fmt::format("Invalid property value: {} for property: {}", entry_tuple->elements[1], entry_tuple->elements[0]);
+                if (expr::is<bind_variable>(entry_tuple->elements[1])) {
+                    msg += " (bind variables are not supported in DDL queries)";
+                }
+                add_recognition_error(msg);
+                break;
+            }
+            auto values = right_vec->elements | std::views::transform([&] (const auto& x) -> sstring {
+                auto elem = expr::as_if<untyped_constant>(&x);
+                if (!elem) {
+                    sstring msg = fmt::format("Invalid property vector value: {} for property: {}", x, entry_tuple->elements[0]);
+                    if (expr::is<bind_variable>(x)) {
+                        msg += " (bind variables are not supported in DDL queries)";
+                    }
+                    add_recognition_error(msg);
+                    return "<invalid>";
+                }
+                return elem->raw_text;
+            }) | std::ranges::to<std::vector<sstring>>();
+            if (!res.emplace(left->raw_text, std::move(values)).second) {
+                sstring msg = fmt::format("Multiple definition for property {}", left->raw_text);
+                add_recognition_error(msg);
+                break;
+            }
+        }
+    }
+    return res;
+}

 } // namespace expr
 } // namespace cql3
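Reviewer note: both conversion helpers above detect duplicate property names from the result of std::map::emplace and report problems through a callback instead of throwing. A minimal sketch of that pattern, assuming plain std::string pairs in place of the parsed expression tuples (to_property_map is a hypothetical name, and error_sink_fn is redeclared here only so the sketch stands alone):

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Called with an error message string, as in the diff.
using error_sink_fn = std::function<void(const std::string&)>;

// Builds a name -> value map from already-extracted pairs. On the first
// duplicate name it reports an error through the sink and stops, returning
// the entries collected so far (the same control flow as the `break` above).
std::map<std::string, std::string>
to_property_map(const std::vector<std::pair<std::string, std::string>>& entries,
                error_sink_fn add_error) {
    std::map<std::string, std::string> res;
    for (const auto& [name, value] : entries) {
        // emplace() leaves the map unchanged and returns .second == false
        // when the key is already present.
        if (!res.emplace(name, value).second) {
            add_error("Multiple definition for property " + name);
            break;
        }
    }
    return res;
}
```

Because the sink only records the message, the caller decides when to turn accumulated errors into an exception, which fits a parser that keeps going after the first recognition error.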
--- a/cql3/expr/expression.hh
+++ b/cql3/expr/expression.hh
@@ -430,6 +430,14 @@ struct collection_constructor {
    friend bool operator==(const collection_constructor&, const collection_constructor&) = default;
 };

+// Called with error message string.
+using error_sink_fn = std::function<void(const std::string&)>;
+
+std::map<sstring, sstring> convert_property_map(const collection_constructor&, error_sink_fn);
+
+std::map<sstring, std::variant<sstring, std::vector<sstring>>>
+convert_extended_property_map(const collection_constructor&, error_sink_fn);
+
 // Constructs an object of a user-defined type
 // For example: "{field1: 23343, field2: ?}"
 // During preparation usertype constructors with constant values are converted to expr::constant.
--- a/cql3/functions/as_json_function.hh
+++ b/cql3/functions/as_json_function.hh
@@ -13,10 +13,10 @@
 #include "cql3/functions/scalar_function.hh"
 #include "cql3/functions/function_name.hh"
 #include "cql3/cql3_type.hh"
-#include "cql3/type_json.hh"

 #include "bytes_ostream.hh"
 #include "types/types.hh"
+#include "types/json_utils.hh"

 namespace cql3 {

--- a/cql3/functions/functions.cc
+++ b/cql3/functions/functions.cc
@@ -10,7 +10,6 @@
 #include "functions.hh"
 #include "token_fct.hh"
 #include "cql3/ut_name.hh"
-#include "cql3/type_json.hh"
 #include "cql3/functions/aggregate_fcts.hh"
 #include "cql3/functions/bytes_conversion_fcts.hh"
 #include "cql3/functions/time_uuid_fcts.hh"
@@ -22,6 +21,7 @@
 #include "cql3/prepare_context.hh"
 #include "user_aggregate.hh"
 #include "cql3/expr/expression.hh"
+#include "types/json_utils.hh"
 #include "types/set.hh"
 #include "types/listlike_partial_deserializing_iterator.hh"

--- a/cql3/restrictions/statement_restrictions.cc
+++ b/cql3/restrictions/statement_restrictions.cc
@@ -1322,10 +1322,6 @@ const std::vector<expr::expression>& statement_restrictions::index_restrictions(
    return _index_restrictions;
 }

-bool statement_restrictions::is_empty() const {
-    return !_where.has_value();
-}
-
 // Current score table:
 // local and restrictions include full partition key: 2
 // global: 1
--- a/cql3/restrictions/statement_restrictions.hh
+++ b/cql3/restrictions/statement_restrictions.hh
@@ -408,8 +408,6 @@ public:

    /// Checks that the primary key restrictions don't contain null values, throws invalid_request_exception otherwise.
    void validate_primary_key(const query_options& options) const;
-
-    bool is_empty() const;
 };

 statement_restrictions analyze_statement_restrictions(
--- a/cql3/statements/alter_keyspace_statement.cc
+++ b/cql3/statements/alter_keyspace_statement.cc
@@ -14,7 +14,9 @@
 #include <stdexcept>
 #include <vector>
 #include "alter_keyspace_statement.hh"
+#include "cql3/statements/property_definitions.hh"
 #include "locator/tablets.hh"
+#include "locator/abstract_replication_strategy.hh"
 #include "mutation/canonical_mutation.hh"
 #include "prepared_statement.hh"
 #include "service/migration_manager.hh"
@@ -49,16 +51,8 @@ future<> cql3::statements::alter_keyspace_statement::check_access(query_processo
    return state.has_keyspace_access(_name, auth::permission::ALTER);
 }

-static unsigned get_abs_rf_diff(const std::string& curr_rf, const std::string& new_rf) {
-    try {
-        return std::abs(std::stoi(curr_rf) - std::stoi(new_rf));
-    } catch (std::invalid_argument const& ex) {
-        on_internal_error(mylogger, fmt::format("get_abs_rf_diff expects integer arguments, "
-                                                "but got curr_rf:{} and new_rf:{}", curr_rf, new_rf));
-    } catch (std::out_of_range const& ex) {
-        on_internal_error(mylogger, fmt::format("get_abs_rf_diff expects integer arguments to fit into `int` type, "
-                                                "but got curr_rf:{} and new_rf:{}", curr_rf, new_rf));
-    }
+static unsigned get_abs_rf_diff(const locator::replication_strategy_config_option& curr_rf, const locator::replication_strategy_config_option& new_rf) {
+    return std::abs(ssize_t(locator::get_replication_factor(curr_rf)) - ssize_t(locator::get_replication_factor(new_rf)));
 }

 void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, const service::client_state& state) const {
@@ -85,19 +79,22 @@ void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, c
                        current_options.type_string(), new_options.type_string()));
            }

-            auto new_ks = _attrs->as_ks_metadata_update(ks.metadata(), *qp.proxy().get_token_metadata_ptr(), qp.proxy().features());
+            auto new_ks = _attrs->as_ks_metadata_update(ks.metadata(), *qp.proxy().get_token_metadata_ptr(), qp.proxy().features(), qp.db().get_config());
+
+            auto tmptr = qp.proxy().get_token_metadata_ptr();
+            const auto& topo = tmptr->get_topology();

            if (ks.get_replication_strategy().uses_tablets()) {
-                const std::map<sstring, sstring>& current_rf_per_dc = ks.metadata()->strategy_options();
+                auto& current_rf_per_dc = ks.metadata()->strategy_options();
                auto new_rf_per_dc = _attrs->get_replication_options();
                new_rf_per_dc.erase(ks_prop_defs::REPLICATION_STRATEGY_CLASS_KEY);
                unsigned total_abs_rfs_diff = 0;
                for (const auto& [new_dc, new_rf] : new_rf_per_dc) {
-                    sstring old_rf = "0";
+                    auto old_rf = locator::replication_strategy_config_option(sstring("0"));
                    if (auto new_dc_in_current_mapping = current_rf_per_dc.find(new_dc);
                             new_dc_in_current_mapping != current_rf_per_dc.end()) {
                        old_rf = new_dc_in_current_mapping->second;
-                    } else if (!qp.proxy().get_token_metadata_ptr()->get_topology().get_datacenters().contains(new_dc)) {
+                    } else if (!topo.get_datacenters().contains(new_dc)) {
                        // This means that the DC listed in ALTER doesn't exist. This error will be reported later,
                        // during validation in abstract_replication_strategy::validate_replication_strategy.
                        // We can't report this error now, because it'd change the order of errors reported:
@@ -110,11 +107,14 @@ void cql3::statements::alter_keyspace_statement::validate(query_processor& qp, c
                }
            }

-            locator::replication_strategy_params params(new_ks->strategy_options(), new_ks->initial_tablets());
-            auto new_rs = locator::abstract_replication_strategy::create_replication_strategy(new_ks->strategy_name(), params);
+            locator::replication_strategy_params params(new_ks->strategy_options(), new_ks->initial_tablets(), new_ks->consistency_option());
+            auto new_rs = locator::abstract_replication_strategy::create_replication_strategy(new_ks->strategy_name(), params, topo);
            if (new_rs->is_per_table() != ks.get_replication_strategy().is_per_table()) {
                throw exceptions::invalid_request_exception(format("Cannot alter replication strategy vnode/tablets flavor"));
            }
+            if (new_ks->consistency_option() && new_ks->consistency_option() != ks.metadata()->consistency_option()) {
+                throw exceptions::invalid_request_exception(format("Cannot alter consistency option"));
+            }
        } catch (const std::runtime_error& e) {
            throw exceptions::invalid_request_exception(e.what());
        }
@@ -135,62 +135,6 @@ bool cql3::statements::alter_keyspace_statement::changes_tablets(query_processor
    return ks.get_replication_strategy().uses_tablets() && !_attrs->get_replication_options().empty();
 }

-namespace {
-// These functions are used to flatten all the options in the keyspace definition into a single-level map<string, string>.
-// (Currently options are stored in a nested structure that looks more like a map<string, map<string, string>>).
-// Flattening is simply joining the keys of maps from both levels with a colon ':' character,
-// or in other words: prefixing the keys in the output map with the option type, e.g. 'replication', 'storage', etc.,
-// so that the output map contains entries like: "replication:dc1" -> "3".
-// This is done to avoid key conflicts and to be able to de-flatten the map back into the original structure.
-
-void add_prefixed_key(const sstring& prefix, const std::map<sstring, sstring>& in, std::map<sstring, sstring>& out) {
-    for (const auto& [in_key, in_value]: in) {
-        out[prefix + ":" + in_key] = in_value;
-    }
-};
-
-std::map<sstring, sstring> get_current_options_flattened(const shared_ptr<cql3::statements::ks_prop_defs>& ks,
-                                                         const gms::feature_service& feat) {
-    std::map<sstring, sstring> all_options;
-
-    add_prefixed_key(ks->KW_REPLICATION, ks->get_replication_options(), all_options);
-    add_prefixed_key(ks->KW_STORAGE, ks->get_storage_options().to_map(), all_options);
-    // if no tablet options are specified in ATLER KS statement,
-    // we want to preserve the old ones and hence cannot overwrite them with defaults
-    if (ks->has_property(ks->KW_TABLETS)) {
-        auto initial_tablets = ks->get_initial_tablets(std::nullopt);
-        add_prefixed_key(ks->KW_TABLETS,
-                         {{"enabled", initial_tablets ? "true" : "false"},
-                         {"initial", std::to_string(initial_tablets.value_or(0))}},
-                         all_options);
-    }
-    add_prefixed_key(ks->KW_DURABLE_WRITES,
-                     {{sstring(ks->KW_DURABLE_WRITES), to_sstring(ks->get_boolean(ks->KW_DURABLE_WRITES, true))}},
-                     all_options);
-
-    return all_options;
-}
-
-std::map<sstring, sstring> get_old_options_flattened(const data_dictionary::keyspace& ks) {
-    std::map<sstring, sstring> all_options;
-
-    using namespace cql3::statements;
-    add_prefixed_key(ks_prop_defs::KW_REPLICATION, ks.get_replication_strategy().get_config_options(), all_options);
-    add_prefixed_key(ks_prop_defs::KW_STORAGE, ks.metadata()->get_storage_options().to_map(), all_options);
-    if (ks.metadata()->initial_tablets()) {
-        add_prefixed_key(ks_prop_defs::KW_TABLETS,
-                         {{"enabled", ks.metadata()->initial_tablets() ? "true" : "false"},
-                          {"initial", std::to_string(ks.metadata()->initial_tablets().value_or(0))}},
-                         all_options);
-    }
-    add_prefixed_key(ks_prop_defs::KW_DURABLE_WRITES,
-                     {{sstring(ks_prop_defs::KW_DURABLE_WRITES), to_sstring(ks.metadata()->durable_writes())}},
-                     all_options);
-
-    return all_options;
-}
-} // <anonymous> namespace
-
 future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, cql3::cql_warnings_vec>>
 cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_processor& qp, service::query_state& state, const query_options& options, service::group0_batch& mc) const {
    using namespace cql_transport;
@@ -199,36 +143,15 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
        auto ks = qp.db().find_keyspace(_name);
        auto ks_md = ks.metadata();
        const auto tmptr = qp.proxy().get_token_metadata_ptr();
+        const auto& topo = tmptr->get_topology();
        const auto& feat = qp.proxy().features();
-        auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, *tmptr, feat);
+        auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, *tmptr, feat, qp.db().get_config());
        utils::chunked_vector<mutation> muts;
        std::vector<sstring> warnings;
-        auto old_ks_options = get_old_options_flattened(ks);
-        auto ks_options = get_current_options_flattened(_attrs, feat);
-        ks_options.merge(old_ks_options);

        auto ts = mc.write_timestamp();
        auto global_request_id = mc.new_group0_state_id();

-        // #22688 - filter out any dc*:0 entries - consider these
-        // null and void (removed). Migration planning will treat it
-        // as dc*=0 still.
-        std::erase_if(ks_options, [](const auto& i) {
-            static constexpr std::string replication_prefix = ks_prop_defs::KW_REPLICATION + ":"s;
-            // Flattened map, replication entries starts with "replication:".
-            // Only valid options are replication_factor, class and per-dc rf:s. We want to
-            // filter out any dcN=0 entries.
-            auto& [key, val] = i;
-            if (key.starts_with(replication_prefix) && val == "0") {
-                std::string_view v(key);
-                v.remove_prefix(replication_prefix.size());
-                return v != ks_prop_defs::REPLICATION_FACTOR_KEY 
-                    && v != ks_prop_defs::REPLICATION_STRATEGY_CLASS_KEY
-                    ;
-            }
-            return false;
-        });
-
        // we only want to run the tablets path if there are actually any tablets changes, not only schema changes
        // TODO: the current `if (changes_tablets(qp))` is insufficient: someone may set the same RFs as before,
        //       and we'll unnecessarily trigger the processing path for ALTER tablets KS,
@@ -238,24 +161,19 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
                return make_exception_future<std::tuple<::shared_ptr<::cql_transport::event::schema_change>, cql3::cql_warnings_vec>>(
                        exceptions::invalid_request_exception("Another global topology request is ongoing, please retry."));
            }
-            if (_attrs->get_replication_options().contains(ks_prop_defs::REPLICATION_FACTOR_KEY)) {
-                return make_exception_future<std::tuple<::shared_ptr<::cql_transport::event::schema_change>, cql3::cql_warnings_vec>>(
-                       exceptions::invalid_request_exception("'replication_factor' tag is not allowed when executing ALTER KEYSPACE with tablets, please list the DCs explicitly"));
-            }
            qp.db().real_database().validate_keyspace_update(*ks_md_update);

            service::topology_mutation_builder builder(ts);
            service::topology_request_tracking_mutation_builder rtbuilder{global_request_id, qp.proxy().features().topology_requests_type_column};
-            rtbuilder.set("done", false)
-                     .set("start_time", db_clock::now());
+            rtbuilder.set("done", false);
            if (!qp.proxy().features().topology_global_request_queue) {
                builder.set_global_topology_request(service::global_topology_request::keyspace_rf_change);
                builder.set_global_topology_request_id(global_request_id);
-                builder.set_new_keyspace_rf_change_data(_name, ks_options);
+                builder.set_new_keyspace_rf_change_data(_name, _attrs->flattened());
            } else {
                builder.queue_global_topology_request_id(global_request_id);
                rtbuilder.set("request_type", service::global_topology_request::keyspace_rf_change)
-                         .set_new_keyspace_rf_change_data(_name, ks_options);
+                         .set_new_keyspace_rf_change_data(_name, _attrs->flattened());

            };
            service::topology_change change{{builder.build()}};
@@ -278,7 +196,8 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce

        auto rs = locator::abstract_replication_strategy::create_replication_strategy(
                ks_md_update->strategy_name(),
-                locator::replication_strategy_params(ks_md_update->strategy_options(), ks_md_update->initial_tablets()));
+                locator::replication_strategy_params(ks_md_update->strategy_options(), ks_md_update->initial_tablets(), ks_md_update->consistency_option()),
+                topo);

        // If `rf_rack_valid_keyspaces` is enabled, it's forbidden to perform a schema change that
        // would lead to an RF-rack-valid keyspace. Verify that this change does not.