schema.cc/describe: fix invalid compaction options in schema

There is a typo in the schema.cql file of a snapshot: a comma is missing after the compaction strategy class, so restoring the schema from that file fails.

    AND compaction = {'class': 'SizeTieredCompactionStrategy''max_compaction_threshold': '32'}

The map_as_cql_param() function has a `first` parameter that controls whether a leading comma is added. The compaction_strategy_options are never the first entry, because the class always precedes them.

Fixes #7741

Signed-off-by: Amos Kong <amos@scylladb.com>
Closes #7734

(cherry picked from commit 6b1659ee80)
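The role of the `first` parameter can be illustrated with a minimal, self-contained sketch. It is not the actual Scylla code: `map_as_cql_param_sketch` and `describe_compaction` are hypothetical names, and the real signature and separator of map_as_cql_param() may differ. The sketch only shows why passing `first = false` is needed when the options follow an already-written `'class'` entry.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for map_as_cql_param(): renders map entries as
// 'key': 'value' pairs. When `first` is false, a separator is emitted before
// the first entry as well, because the caller has already written an entry.
std::string map_as_cql_param_sketch(const std::map<std::string, std::string>& m,
                                    bool first = true) {
    std::ostringstream os;
    for (const auto& [key, value] : m) {
        if (!first) {
            os << ", ";
        }
        first = false;
        os << "'" << key << "': '" << value << "'";
    }
    return os.str();
}

// Hypothetical caller: the 'class' entry is always written first, so the
// strategy options must be rendered with first == false.
std::string describe_compaction(const std::string& strategy_class,
                                const std::map<std::string, std::string>& options) {
    std::ostringstream os;
    os << "AND compaction = {'class': '" << strategy_class << "'";
    os << map_as_cql_param_sketch(options, /*first=*/false);
    os << "}";
    return os.str();
}
```

With `first` left at `true` in `describe_compaction`, the output would reproduce the missing-comma form quoted above; with `false`, the class and the first option are separated correctly.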
sstable: writer: ka/la: Write row marker cell after row tombstone
2021-03-24 12:58:11 +02:00 · 2021-03-24 10:42:11 +02:00 · 2021-03-21 10:51:36 +02:00 · 2021-03-18 19:20:10 +02:00 · 2021-03-18 14:29:38 +02:00 · 2021-03-11 08:24:56 +02:00
463 changed files with 18249 additions and 9355 deletions
--- a/.gitmodules
+++ b/.gitmodules
@@ -9,9 +9,12 @@
 [submodule "libdeflate"]
 	path = libdeflate
 	url = ../libdeflate
-[submodule "zstd"]
-	path = zstd
-	url = ../zstd
 [submodule "abseil"]
 	path = abseil
 	url = ../abseil-cpp
+[submodule "scylla-jmx"]
+	path = scylla-jmx
+	url = ../scylla-jmx
+[submodule "scylla-tools"]
+	path = scylla-tools
+	url = ../scylla-tools-java
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -134,15 +134,11 @@ add_executable(scylla
        ${SEASTAR_SOURCE_FILES}
        ${SCYLLA_SOURCE_FILES})

-# Note that since CLion does not undestand GCC6 concepts, we always disable them (even if users configure otherwise).
-# CLion seems to have trouble with `-U` (macro undefinition), so we do it this way instead.
-list(REMOVE_ITEM SEASTAR_CFLAGS "-DHAVE_GCC6_CONCEPTS")
-
 # If the Seastar pkg-config information is available, append to the default flags.
 #
 # For ease of browsing the source code, we always pretend that DPDK is enabled.
 target_compile_options(scylla PUBLIC
-        -std=gnu++1z
+        -std=gnu++20
        -DHAVE_DPDK
        -DHAVE_HWLOC
        "${SEASTAR_CFLAGS}")
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -8,4 +8,4 @@ Please use the [Issue Tracker](https://github.com/scylladb/scylla/issues/) to re

 # Contributing Code to Scylla

-To contribute code to Scylla, you need to sign the [Contributor License Agreement](http://www.scylladb.com/opensource/cla/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.
+To contribute code to Scylla, you need to sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send your changes as [patches](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches) to the [mailing list](https://groups.google.com/forum/#!forum/scylladb-dev). We don't accept pull requests on GitHub.
--- a/HACKING.md
+++ b/HACKING.md
@@ -18,23 +18,35 @@ $ git submodule update --init --recursive

 ### Dependencies

-Scylla depends on the system package manager for its development dependencies.
+Scylla is fairly fussy about its build environment, requiring a very recent
+version of the C++20 compiler and numerous tools and libraries to build.

-Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
+Run `./install-dependencies.sh` (as root) to use your Linux distribution's
+package manager to install the appropriate packages on your build machine.
+However, this will only work on very recent distributions. For example,
+currently Fedora users must upgrade to Fedora 32 otherwise the C++ compiler
+will be too old, and not support the new C++20 standard that Scylla uses.

-On Ubuntu and Debian based Linux distributions, some packages
-required to build Scylla are missing in the official upstream:
+Alternatively, to avoid having to upgrade your build machine or install
+various packages on it, we provide another option - the **frozen toolchain**.
+This is a script, `./tools/toolchain/dbuild`, that can execute build or run
+commands inside a Docker image that contains exactly the right build tools and
+libraries. The `dbuild` technique is useful for beginners, but is also the way
+in which ScyllaDB produces official releases, so it is highly recommended.

- libthrift-dev and libthrift
- antlr3-c++-dev
+To use `dbuild`, you simply prefix any build or run command with it. Building
+and running Scylla becomes as easy as:

-Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,
-and get the missing packages from it.
+```bash
+$ ./tools/toolchain/dbuild ./configure.py
+$ ./tools/toolchain/dbuild ninja build/release/scylla
+$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
+```

 ### Build system

 **Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
-thread, and up to 3 GB per native thread while linking. GCC >= 8.1.1. is
+thread, and up to 3 GB per native thread while linking. GCC >= 10 is
 required.

 Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
--- a/README.md
+++ b/README.md
@@ -2,22 +2,24 @@

 ## Quick-start

-To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)
-which would build and run Scylla using a pre-configured Docker image.
-Using the frozen toolchain will also isolate all of the installed
-dependencies in a Docker container.
-Assuming you have met the toolchain prerequisites, which is running
-Docker in user mode, building and running is as easy as:
+Scylla is fairly fussy about its build environment, requiring very recent
+versions of the C++20 compiler and of many libraries to build. The document
+[HACKING.md](HACKING.md) includes detailed information on building and
+developing Scylla, but to get Scylla building quickly on (almost) any build
+machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md).
+This is a pre-configured Docker image which includes recent versions of all
+the required compilers, libraries and build tools. Using the frozen toolchain
+allows you to avoid changing anything in your build machine to meet Scylla's
+requirements - you just need to meet the frozen toolchain's prerequisites
+(mostly, Docker or Podman being available).
+
+Building and running Scylla with the frozen toolchain is as easy as:

 ```bash
 $ ./tools/toolchain/dbuild ./configure.py
 $ ./tools/toolchain/dbuild ninja build/release/scylla
 $ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
- ```
-
-Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
-
-**Note**: GCC >= 8.1.1 is required to compile Scylla.
+```

 ## Running Scylla

@@ -67,15 +69,20 @@ The courses are free, self-paced and include hands-on examples. They cover a var
 administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, 
 multi-datacenters and how Scylla integrates with third-party applications.

-## Building Fedora-based Docker image
+## Building a CentOS-based Docker image

 Build a Docker image with:

 ```
-cd dist/docker
+cd dist/docker/redhat
 docker build -t <image-name> .
 ```

+This build is based on executables downloaded from downloads.scylladb.com,
+**not** on the executables built in this source directory. See further
+instructions in dist/docker/redhat/README.md to build a docker image from
+your own executables.
+
 Run the image with:

 ```
--- a/2
+++ b/2
@@ -1,7 +1,7 @@
 #!/bin/sh

 PRODUCT=scylla
-VERSION=4.1.11
+VERSION=4.2.4

 if test -f version
 then
--- a/absl-flat_hash_map.cc
+++ b/absl-flat_hash_map.cc
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2020 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "absl-flat_hash_map.hh"
+
+size_t sstring_hash::operator()(std::string_view v) const noexcept {
+    return absl::Hash<std::string_view>{}(v);
+}
--- a/absl-flat_hash_map.hh
+++ b/absl-flat_hash_map.hh
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2020 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <absl/container/flat_hash_map.h>
+#include <seastar/core/sstring.hh>
+
+using namespace seastar;
+
+struct sstring_hash {
+    using is_transparent = void;
+    size_t operator()(std::string_view v) const noexcept;
+};
+
+struct sstring_eq {
+    using is_transparent = void;
+    bool operator()(std::string_view a, std::string_view b) const noexcept {
+        return a == b;
+    }
+};
+
+template <typename K, typename V, typename... Ts>
+struct flat_hash_map : public absl::flat_hash_map<K, V, Ts...> {
+};
+
+template <typename V>
+struct flat_hash_map<sstring, V>
+    : public absl::flat_hash_map<sstring, V, sstring_hash, sstring_eq> {};
--- a/alternator/base64.cc
+++ b/alternator/base64.cc
@@ -77,7 +77,7 @@ std::string base64_encode(bytes_view in) {
    return ret;
 }

-bytes base64_decode(std::string_view in) {
+static std::string base64_decode_string(std::string_view in) {
    int i = 0;
    int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
    std::string ret;
@@ -104,8 +104,42 @@ bytes base64_decode(std::string_view in) {
        if (i==3)
            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
    }
+    return ret;
+}
+
+bytes base64_decode(std::string_view in) {
    // FIXME: This copy is sad. The problem is we need back "bytes"
    // but "bytes" doesn't have efficient append and std::string.
    // To fix this we need to use bytes' "uninitialized" feature.
+    std::string ret = base64_decode_string(in);
    return bytes(ret.begin(), ret.end());
 }
+
+static size_t base64_padding_len(std::string_view str) {
+    size_t padding = 0;
+    padding += (!str.empty() && str.back() == '=');
+    padding += (str.size() > 1 && *(str.end() - 2) == '=');
+    return padding;
+}
+
+size_t base64_decoded_len(std::string_view str) {
+    return str.size() / 4 * 3 - base64_padding_len(str);
+}
+
+bool base64_begins_with(std::string_view base, std::string_view operand) {
+    if (base.size() < operand.size() || base.size() % 4 != 0 || operand.size() % 4 != 0) {
+        return false;
+    }
+    if (base64_padding_len(operand) == 0) {
+        return base.starts_with(operand);
+    }
+    const std::string_view unpadded_base_prefix = base.substr(0, operand.size() - 4);
+    const std::string_view unpadded_operand = operand.substr(0, operand.size() - 4);
+    if (unpadded_base_prefix != unpadded_operand) {
+        return false;
+    }
+    // Decode and compare last 4 bytes of base64-encoded strings
+    const std::string base_remainder = base64_decode_string(base.substr(operand.size() - 4, operand.size()));
+    const std::string operand_remainder = base64_decode_string(operand.substr(operand.size() - 4));
+    return base_remainder.starts_with(operand_remainder);
+}
--- a/alternator/base64.hh
+++ b/alternator/base64.hh
@@ -32,3 +32,7 @@ bytes base64_decode(std::string_view);
 inline bytes base64_decode(const rjson::value& v) {
  return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
 }
+
+size_t base64_decoded_len(std::string_view str);
+
+bool base64_begins_with(std::string_view base, std::string_view operand);
--- a/alternator/conditions.cc
+++ b/alternator/conditions.cc
@@ -34,7 +34,7 @@
 #include <boost/algorithm/cxx11/any_of.hpp>
 #include "utils/overloaded_functor.hh"

-#include "expressions_eval.hh"
+#include "expressions.hh"

 namespace alternator {

@@ -67,49 +67,6 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_
    return it->second;
 }

-static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {
-    bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));
-    auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));
-    bytes raw_value = serialize_item(value);
-    auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
-    return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));
-}
-
-static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {
-    bytes raw_value = get_key_from_typed_value(value, cdef);
-    auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
-    return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));
-}
-
-::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {
-    clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));
-    auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);
-    for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {
-        std::string_view column_name(it->name.GetString(), it->name.GetStringLength());
-        const rjson::value& condition = it->value;
-
-        const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");
-        const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");
-        comparison_operator_type op = get_comparison_operator(comp_definition);
-
-        if (op != comparison_operator_type::EQ) {
-            throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");
-        }
-        if (attr_list.Size() != 1) {
-            throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));
-        }
-        if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {
-            // Primary key restriction
-            filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);
-        } else {
-            // Regular column restriction
-            filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);
-        }
-
-    }
-    return filtering_restrictions;
-}
-
 namespace {

 struct size_check {
@@ -202,36 +159,47 @@ static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
 }

 // Check if two JSON-encoded values match with the BEGINS_WITH relation
-static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
-    // BEGINS_WITH requires that its single operand (v2) be a string or
-    // binary - otherwise it's a validation error. However, problems with
-    // the stored attribute (v1) will just return false (no match).
-    if (!v2.IsObject() || v2.MemberCount() != 1) {
-        throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
-    }
-    auto it2 = v2.MemberBegin();
-    if (it2->name != "S" && it2->name != "B") {
-        throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));
-    }
-
-
+bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2,
+                       bool v1_from_query, bool v2_from_query) {
+    bool bad = false;
    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
+        if (v1_from_query) {
+            throw api_error("ValidationException", "begins_with() encountered malformed argument");
+        } else {
+            bad = true;
+        }
+    } else if (v1->MemberBegin()->name != "S" && v1->MemberBegin()->name != "B") {
+        if (v1_from_query) {
+            throw api_error("ValidationException", format("begins_with supports only string or binary type, got: {}", *v1));
+        } else {
+            bad = true;
+        }
+    }
+    if (!v2.IsObject() || v2.MemberCount() != 1) {
+        if (v2_from_query) {
+            throw api_error("ValidationException", "begins_with() encountered malformed argument");
+        } else {
+            bad = true;
+        }
+    } else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {
+        if (v2_from_query) {
+            throw api_error("ValidationException", format("begins_with() supports only string or binary type, got: {}", v2));
+        } else {
+            bad = true;
+        }
+    }
+    if (bad) {
        return false;
    }
    auto it1 = v1->MemberBegin();
+    auto it2 = v2.MemberBegin();
    if (it1->name != it2->name) {
        return false;
    }
    if (it2->name == "S") {
-        std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
-        std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
-        return val1.substr(0, val2.size()) == val2;
+        return rjson::to_string_view(it1->value).starts_with(rjson::to_string_view(it2->value));
    } else /* it2->name == "B" */ {
-        // TODO (optimization): Check the begins_with condition directly on
-        // the base64-encoded string, without making a decoded copy.
-        bytes val1 = base64_decode(it1->value);
-        bytes val2 = base64_decode(it2->value);
-        return val1.substr(0, val2.size()) == val2;
+        return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
    }
 }

@@ -246,11 +214,6 @@ bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
    }
    const auto& kv1 = *v1->MemberBegin();
    const auto& kv2 = *v2.MemberBegin();
-    if (kv2.name != "S" && kv2.name != "N" &&  kv2.name != "B") {
-        throw api_error("ValidationException",
-                        format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
-                               "got {} instead", kv2.name));
-    }
    if (kv1.name == "S" && kv2.name == "S") {
        return rjson::to_string_view(kv1.value).find(rjson::to_string_view(kv2.value)) != std::string_view::npos;
    } else if (kv1.name == "B" && kv2.name == "B") {
@@ -333,24 +296,38 @@ static bool check_NOT_NULL(const rjson::value* val) {
    return val != nullptr;
 }

+// Only types S, N or B (string, number or bytes) may be compared by the
+// various comparison operators - lt, le, gt, ge, and between.
+static bool check_comparable_type(const rjson::value& v) {
+    if (!v.IsObject() || v.MemberCount() != 1) {
+        return false;
+    }
+    const rjson::value& type = v.MemberBegin()->name;
+    return type == "S" || type == "N" || type == "B";
+}
+
 // Check if two JSON-encoded values match with cmp.
 template <typename Comparator>
-bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
-    if (!v2.IsObject() || v2.MemberCount() != 1) {
-        throw api_error("ValidationException",
-                        format("{} requires a single AttributeValue of type String, Number, or Binary",
-                               cmp.diagnostic));
+bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp,
+                   bool v1_from_query, bool v2_from_query) {
+    bool bad = false;
+    if (!v1 || !check_comparable_type(*v1)) {
+        if (v1_from_query) {
+            throw api_error("ValidationException", format("{} allow only the types String, Number, or Binary", cmp.diagnostic));
+        }
+        bad = true;
    }
-    const auto& kv2 = *v2.MemberBegin();
-    if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
-        throw api_error("ValidationException",
-                        format("{} requires a single AttributeValue of type String, Number, or Binary",
-                               cmp.diagnostic));
+    if (!check_comparable_type(v2)) {
+        if (v2_from_query) {
+            throw api_error("ValidationException", format("{} allow only the types String, Number, or Binary", cmp.diagnostic));
+        }
+        bad = true;
    }
-    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
+    if (bad) {
        return false;
    }
    const auto& kv1 = *v1->MemberBegin();
+    const auto& kv2 = *v2.MemberBegin();
    if (kv1.name != kv2.name) {
        return false;
    }
@@ -364,7 +341,8 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
    if (kv1.name == "B") {
        return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
    }
-    clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
+    // cannot reach here, as check_comparable_type() verifies the type is one
+    // of the above options.
    return false;
 }

@@ -395,57 +373,71 @@ struct cmp_gt {
    static constexpr const char* diagnostic = "GT operator";
 };

-// True if v is between lb and ub, inclusive.  Throws if lb > ub.
+// True if v is between lb and ub, inclusive.  Throws or returns false
+// (depending on bounds_from_query parameter) if lb > ub.
 template <typename T>
-bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
+static bool check_BETWEEN(const T& v, const T& lb, const T& ub, bool bounds_from_query) {
    if (cmp_lt()(ub, lb)) {
-        throw api_error("ValidationException",
-                        format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
+        if (bounds_from_query) {
+            throw api_error("ValidationException",
+                format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
+        } else {
+            return false;
+        }
    }
    return cmp_ge()(v, lb) && cmp_le()(v, ub);
 }

-static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
-    if (!v) {
+static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub,
+                          bool v_from_query, bool lb_from_query, bool ub_from_query) {
+    if ((v && v_from_query && !check_comparable_type(*v)) ||
+        (lb_from_query && !check_comparable_type(lb)) ||
+        (ub_from_query && !check_comparable_type(ub))) {
+        throw api_error("ValidationException", "between allow only the types String, Number, or Binary");
+
+    }
+    if (!v || !v->IsObject() || v->MemberCount() != 1 ||
+        !lb.IsObject() || lb.MemberCount() != 1 ||
+        !ub.IsObject() || ub.MemberCount() != 1) {
        return false;
    }
-    if (!v->IsObject() || v->MemberCount() != 1) {
-        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
-    }
-    if (!lb.IsObject() || lb.MemberCount() != 1) {
-        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
-    }
-    if (!ub.IsObject() || ub.MemberCount() != 1) {
-        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
-    }

    const auto& kv_v = *v->MemberBegin();
    const auto& kv_lb = *lb.MemberBegin();
    const auto& kv_ub = *ub.MemberBegin();
+    bool bounds_from_query = lb_from_query && ub_from_query;
    if (kv_lb.name != kv_ub.name) {
-        throw api_error(
-                "ValidationException",
+        if (bounds_from_query) {
+           throw api_error("ValidationException",
                format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
                       kv_lb.name, kv_ub.name));
+        } else {
+            return false;
+        }
    }
    if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
        return false;
    }
    if (kv_v.name == "N") {
        const char* diag = "BETWEEN operator";
-        return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
+        return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag), bounds_from_query);
    }
    if (kv_v.name == "S") {
        return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
                             std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
-                             std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
+                             std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()),
+                             bounds_from_query);
    }
    if (kv_v.name == "B") {
-        return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
+        return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value), bounds_from_query);
    }
-    throw api_error("ValidationException",
-        format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
+    if (v_from_query) {
+        throw api_error("ValidationException",
+            format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
               kv_lb.name));
+    } else {
+        return false;
+    }
 }

 // Verify one Expect condition on one attribute (whose content is "got")
@@ -492,19 +484,19 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
            return check_NE(got, (*attribute_value_list)[0]);
        case comparison_operator_type::LT:
            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
+            return check_compare(got, (*attribute_value_list)[0], cmp_lt{}, false, true);
        case comparison_operator_type::LE:
            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_compare(got, (*attribute_value_list)[0], cmp_le{});
+            return check_compare(got, (*attribute_value_list)[0], cmp_le{}, false, true);
        case comparison_operator_type::GT:
            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
+            return check_compare(got, (*attribute_value_list)[0], cmp_gt{}, false, true);
        case comparison_operator_type::GE:
            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
+            return check_compare(got, (*attribute_value_list)[0], cmp_ge{}, false, true);
        case comparison_operator_type::BEGINS_WITH:
            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
+            return check_BEGINS_WITH(got, (*attribute_value_list)[0], false, true);
        case comparison_operator_type::IN:
            verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
            return check_IN(got, *attribute_value_list);
@@ -516,56 +508,87 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
            return check_NOT_NULL(got);
        case comparison_operator_type::BETWEEN:
            verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
-            return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
+            return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1],
+                                 false, true, true);
        case comparison_operator_type::CONTAINS:
-            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_CONTAINS(got, (*attribute_value_list)[0]);
+            {
+                verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+                // Expected's "CONTAINS" has this artificial limitation.
+                // ConditionExpression's "contains()" does not...
+                const rjson::value& arg = (*attribute_value_list)[0];
+                const auto& argtype = (*arg.MemberBegin()).name;
+                if (argtype != "S" && argtype != "N" && argtype != "B") {
+                    throw api_error("ValidationException",
+                            format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
+                                    "got {} instead", argtype));
+                }
+                return check_CONTAINS(got, arg);
+            }
        case comparison_operator_type::NOT_CONTAINS:
-            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
-            return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);
+            {
+                verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+                // Expected's "NOT_CONTAINS" has this artificial limitation.
+                // ConditionExpression's "contains()" does not...
+                const rjson::value& arg = (*attribute_value_list)[0];
+                const auto& argtype = (*arg.MemberBegin()).name;
+                if (argtype != "S" && argtype != "N" && argtype != "B") {
+                    throw api_error("ValidationException",
+                            format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
+                                    "got {} instead", argtype));
+                }
+                return check_NOT_CONTAINS(got, arg);
+            }
        }
        throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
    }
 }

+conditional_operator_type get_conditional_operator(const rjson::value& req) {
+    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
+    if (!conditional_operator) {
+        return conditional_operator_type::MISSING;
+    }
+    if (!conditional_operator->IsString()) {
+        throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
+    }
+    auto s = rjson::to_string_view(*conditional_operator);
+    if (s == "AND") {
+        return conditional_operator_type::AND;
+    } else if (s == "OR") {
+        return conditional_operator_type::OR;
+    } else {
+        throw api_error("ValidationException",
+                format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));
+    }
+}
+
 // Check if the existing values of the item (previous_item) match the
 // conditions given by the Expected and ConditionalOperator parameters
 // (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
 // This function can throw an ValidationException API error if there
 // are errors in the format of the condition itself.
-bool verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {
+bool verify_expected(const rjson::value& req, const rjson::value* previous_item) {
    const rjson::value* expected = rjson::find(req, "Expected");
+    auto conditional_operator = get_conditional_operator(req);
+    if (conditional_operator != conditional_operator_type::MISSING &&
+        (!expected || (expected->IsObject() && expected->GetObject().ObjectEmpty()))) {
+            throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for missing or empty Expression");
+    }
    if (!expected) {
        return true;
    }
    if (!expected->IsObject()) {
        throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
    }
-    // ConditionalOperator can be "AND" for requiring all conditions, or
-    // "OR" for requiring one condition, and defaults to "AND" if missing.
-    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
-    bool require_all = true;
-    if (conditional_operator) {
-        if (!conditional_operator->IsString()) {
-            throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
-        }
-        std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());
-        if (s == "AND") {
-            // require_all is already true
-        } else if (s == "OR") {
-            require_all = false;
-        } else {
-            throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");
-        }
-        if (expected->GetObject().ObjectEmpty()) {
-            throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");
-        }
-    }
+    bool require_all = conditional_operator != conditional_operator_type::OR;
+    return verify_condition(*expected, require_all, previous_item);
+}

-    for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {
+bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item) {
+    for (auto it = condition.MemberBegin(); it != condition.MemberEnd(); ++it) {
        const rjson::value* got = nullptr;
-        if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {
-            got = rjson::find((*previous_item)["Item"], rjson::to_string_view(it->name));
+        if (previous_item) {
+            got = rjson::find(*previous_item, rjson::to_string_view(it->name));
        }
        bool success = verify_expected_one(it->value, got);
        if (success && !require_all) {
@@ -581,12 +604,8 @@ bool verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value
    return require_all;
 }

-bool calculate_primitive_condition(const parsed::primitive_condition& cond,
-        std::unordered_set<std::string>& used_attribute_values,
-        std::unordered_set<std::string>& used_attribute_names,
-        const rjson::value& req,
-        schema_ptr schema,
-        const std::unique_ptr<rjson::value>& previous_item) {
+static bool calculate_primitive_condition(const parsed::primitive_condition& cond,
+        const rjson::value* previous_item) {
    std::vector<rjson::value> calculated_values;
    calculated_values.reserve(cond._values.size());
    for (const parsed::value& v : cond._values) {
@@ -594,9 +613,7 @@ bool calculate_primitive_condition(const parsed::primitive_condition& cond,
                cond._op == parsed::primitive_condition::type::VALUE ?
                        calculate_value_caller::ConditionExpressionAlone :
                        calculate_value_caller::ConditionExpression,
-                rjson::find(req, "ExpressionAttributeValues"),
-                used_attribute_names, used_attribute_values,
-                req, schema, previous_item));
+                previous_item));
    }
    switch (cond._op) {
    case parsed::primitive_condition::type::BETWEEN:
@@ -604,7 +621,8 @@ bool calculate_primitive_condition(const parsed::primitive_condition& cond,
            // Shouldn't happen unless we have a bug in the parser
            throw std::logic_error(format("Wrong number of values {} in BETWEEN primitive_condition", cond._values.size()));
        }
-        return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2]);
+        return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2],
+                             cond._values[0].is_constant(), cond._values[1].is_constant(), cond._values[2].is_constant());
    case parsed::primitive_condition::type::IN:
        return check_IN(calculated_values);
    case parsed::primitive_condition::type::VALUE:
@@ -635,13 +653,17 @@ bool calculate_primitive_condition(const parsed::primitive_condition& cond,
    case parsed::primitive_condition::type::NE:
        return check_NE(&calculated_values[0], calculated_values[1]);
    case parsed::primitive_condition::type::GT:
-        return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{});
+        return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{},
+            cond._values[0].is_constant(), cond._values[1].is_constant());
    case parsed::primitive_condition::type::GE:
-        return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{});
+        return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{},
+            cond._values[0].is_constant(), cond._values[1].is_constant());
    case parsed::primitive_condition::type::LT:
-        return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{});
+        return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{},
+            cond._values[0].is_constant(), cond._values[1].is_constant());
    case parsed::primitive_condition::type::LE:
-        return check_compare(&calculated_values[0], calculated_values[1], cmp_le{});
+        return check_compare(&calculated_values[0], calculated_values[1], cmp_le{},
+            cond._values[0].is_constant(), cond._values[1].is_constant());
    default:
        // Shouldn't happen unless we have a bug in the parser
        throw std::logic_error(format("Unknown type {} in primitive_condition object", (int)(cond._op)));
@@ -652,23 +674,17 @@ bool calculate_primitive_condition(const parsed::primitive_condition& cond,
 // conditions given by the given parsed ConditionExpression.
 bool verify_condition_expression(
        const parsed::condition_expression& condition_expression,
-        std::unordered_set<std::string>& used_attribute_values,
-        std::unordered_set<std::string>& used_attribute_names,
-        const rjson::value& req,
-        schema_ptr schema,
-        const std::unique_ptr<rjson::value>& previous_item) {
+        const rjson::value* previous_item) {
    if (condition_expression.empty()) {
        return true;
    }
    bool ret = std::visit(overloaded_functor {
        [&] (const parsed::primitive_condition& cond) -> bool {
-            return calculate_primitive_condition(cond, used_attribute_values,
-                    used_attribute_names, req, schema, previous_item);
+            return calculate_primitive_condition(cond, previous_item);
        },
        [&] (const parsed::condition_expression::condition_list& list) -> bool {
            auto verify_condition = [&] (const parsed::condition_expression& e) {
-                return verify_condition_expression(e, used_attribute_values,
-                        used_attribute_names, req, schema, previous_item);
+                return verify_condition_expression(e, previous_item);
            };
            switch (list.op) {
            case '&':
--- a/alternator/conditions.hh
+++ b/alternator/conditions.hh
@@ -33,6 +33,7 @@

 #include "cql3/restrictions/statement_restrictions.hh"
 #include "serialization.hh"
+#include "expressions_types.hh"

 namespace alternator {

@@ -42,8 +43,19 @@ enum class comparison_operator_type {

 comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);

-::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);
+enum class conditional_operator_type {
+    AND, OR, MISSING
+};
+conditional_operator_type get_conditional_operator(const rjson::value& req);

-bool verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);
+bool verify_expected(const rjson::value& req, const rjson::value* previous_item);
+bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);
+
+bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);
+bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);
+
+bool verify_condition_expression(
+        const parsed::condition_expression& condition_expression,
+        const rjson::value* previous_item);

 }
--- a/alternator/executor.cc
+++ b/alternator/executor.cc
--- a/alternator/expressions.cc
+++ b/alternator/expressions.cc
@@ -20,16 +20,24 @@
 */

 #include "expressions.hh"
+#include "serialization.hh"
+#include "base64.hh"
+#include "conditions.hh"
 #include "alternator/expressionsLexer.hpp"
 #include "alternator/expressionsParser.hpp"
 #include "utils/overloaded_functor.hh"
+#include "error.hh"

-#include <seastarx.hh>
+#include "seastarx.hh"

 #include <seastar/core/print.hh>
 #include <seastar/util/log.hh>

+#include <boost/algorithm/cxx11/any_of.hpp>
+#include <boost/algorithm/cxx11/all_of.hpp>
+
 #include <functional>
+#include <unordered_map>

 namespace alternator {

@@ -122,6 +130,555 @@ void condition_expression::append(condition_expression&& a, char op) {
    }, _expression);
 }

-
 } // namespace parsed
+
+// The following resolve_*() functions resolve references in parsed
+// expressions of different types. Resolving a parsed expression means
+// replacing:
+//  1. In parsed::path objects, references like "#name" with the
+//     attribute name from ExpressionAttributeNames,
+//  2. In parsed::constant objects, references like ":value" with
+//     the value from ExpressionAttributeValues.
+// These functions also track which name and value references were used,
+// so that the caller can complain if some remain unused.
+// Note that the resolve_*() functions modify the expressions in-place,
+// so if we ever intend to cache parsed expressions, we need to pass a copy
+// into these functions.
+//
+// Doing the "resolving" stage before the evaluation stage has two benefits.
+// First, it allows us to be compatible with DynamoDB in catching unused
+// names and values (see issue #6572). Second, in the FilterExpression case,
+// we need to resolve the expression just once but then use it many times
+// (once for each item to be filtered).
+
+static void resolve_path(parsed::path& p,
+        const rjson::value* expression_attribute_names,
+        std::unordered_set<std::string>& used_attribute_names) {
+    const std::string& column_name = p.root();
+    if (column_name.size() > 0 && column_name.front() == '#') {
+        if (!expression_attribute_names) {
+            throw api_error("ValidationException",
+                    format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));
+        }
+        const rjson::value* value = rjson::find(*expression_attribute_names, column_name);
+        if (!value || !value->IsString()) {
+            throw api_error("ValidationException",
+                    format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));
+        }
+        used_attribute_names.emplace(column_name);
+        p.set_root(std::string(rjson::to_string_view(*value)));
+    }
+}
+
+static void resolve_constant(parsed::constant& c,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_values) {
+    std::visit(overloaded_functor {
+        [&] (const std::string& valref) {
+            if (!expression_attribute_values) {
+                throw api_error("ValidationException",
+                        format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));
+            }
+            const rjson::value* value = rjson::find(*expression_attribute_values, valref);
+            if (!value) {
+                throw api_error("ValidationException",
+                        format("ExpressionAttributeValues missing entry '{}' required by expression", valref));
+            }
+            if (value->IsNull()) {
+                throw api_error("ValidationException",
+                        format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));
+            }
+            validate_value(*value, "ExpressionAttributeValues");
+            used_attribute_values.emplace(valref);
+            c.set(*value);
+        },
+        [&] (const parsed::constant::literal& lit) {
+            // Nothing to do, already resolved
+        }
+    }, c._value);
+
+}
+
+void resolve_value(parsed::value& rhs,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values) {
+    std::visit(overloaded_functor {
+        [&] (parsed::constant& c) {
+            resolve_constant(c, expression_attribute_values, used_attribute_values);
+        },
+        [&] (parsed::value::function_call& f) {
+            for (parsed::value& value : f._parameters) {
+                resolve_value(value, expression_attribute_names, expression_attribute_values,
+                        used_attribute_names, used_attribute_values);
+            }
+        },
+        [&] (parsed::path& p) {
+            resolve_path(p, expression_attribute_names, used_attribute_names);
+        }
+    }, rhs._value);
+}
+
+void resolve_set_rhs(parsed::set_rhs& rhs,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values) {
+    resolve_value(rhs._v1, expression_attribute_names, expression_attribute_values,
+            used_attribute_names, used_attribute_values);
+    if (rhs._op != 'v') {
+        resolve_value(rhs._v2, expression_attribute_names, expression_attribute_values,
+                used_attribute_names, used_attribute_values);
+    }
+}
+
+void resolve_update_expression(parsed::update_expression& ue,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values) {
+    for (parsed::update_expression::action& action : ue.actions()) {
+        resolve_path(action._path, expression_attribute_names, used_attribute_names);
+        std::visit(overloaded_functor {
+            [&] (parsed::update_expression::action::set& a) {
+                resolve_set_rhs(a._rhs, expression_attribute_names, expression_attribute_values,
+                        used_attribute_names, used_attribute_values);
+            },
+            [&] (parsed::update_expression::action::remove& a) {
+                // nothing to do
+            },
+            [&] (parsed::update_expression::action::add& a) {
+                resolve_constant(a._valref, expression_attribute_values, used_attribute_values);
+            },
+            [&] (parsed::update_expression::action::del& a) {
+                resolve_constant(a._valref, expression_attribute_values, used_attribute_values);
+            }
+        }, action._action);
+    }
+}
+
+static void resolve_primitive_condition(parsed::primitive_condition& pc,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values) {
+    for (parsed::value& value : pc._values) {
+        resolve_value(value,
+                expression_attribute_names, expression_attribute_values,
+                used_attribute_names, used_attribute_values);
+    }
+}
+
+void resolve_condition_expression(parsed::condition_expression& ce,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values) {
+    std::visit(overloaded_functor {
+        [&] (parsed::primitive_condition& cond) {
+            resolve_primitive_condition(cond,
+                    expression_attribute_names, expression_attribute_values,
+                    used_attribute_names, used_attribute_values);
+        },
+        [&] (parsed::condition_expression::condition_list& list) {
+            for (parsed::condition_expression& cond : list.conditions) {
+                resolve_condition_expression(cond,
+                        expression_attribute_names, expression_attribute_values,
+                        used_attribute_names, used_attribute_values);
+
+            }
+        }
+    }, ce._expression);
+}
+
+void resolve_projection_expression(std::vector<parsed::path>& pe,
+        const rjson::value* expression_attribute_names,
+        std::unordered_set<std::string>& used_attribute_names) {
+    for (parsed::path& p : pe) {
+        resolve_path(p, expression_attribute_names, used_attribute_names);
+    }
+}
+
+// condition_expression_on() checks whether a condition_expression places any
+// condition on the given attribute. It can be useful, for example, for
+// checking whether the condition tries to restrict a key column.
+
+static bool value_on(const parsed::value& v, std::string_view attribute) {
+    return std::visit(overloaded_functor {
+        [&] (const parsed::constant& c) {
+            return false;
+        },
+        [&] (const parsed::value::function_call& f) {
+            for (const parsed::value& value : f._parameters) {
+                if (value_on(value, attribute)) {
+                    return true;
+                }
+            }
+            return false;
+        },
+        [&] (const parsed::path& p) {
+            return p.root() == attribute;
+        }
+    }, v._value);
+}
+
+static bool primitive_condition_on(const parsed::primitive_condition& pc, std::string_view attribute) {
+    for (const parsed::value& value : pc._values) {
+        if (value_on(value, attribute)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+bool condition_expression_on(const parsed::condition_expression& ce, std::string_view attribute) {
+    return std::visit(overloaded_functor {
+        [&] (const parsed::primitive_condition& cond) {
+            return primitive_condition_on(cond, attribute);
+        },
+        [&] (const parsed::condition_expression::condition_list& list) {
+            for (const parsed::condition_expression& cond : list.conditions) {
+                if (condition_expression_on(cond, attribute)) {
+                    return true;
+                }
+            }
+            return false;
+        }
+    }, ce._expression);
+}
+
+// for_condition_expression_on() runs a given function over all the attributes
+// mentioned in the expression. If the same attribute is mentioned more than
+// once, the function will be called more than once for the same attribute.
+
+static void for_value_on(const parsed::value& v, const noncopyable_function<void(std::string_view)>& func) {
+    std::visit(overloaded_functor {
+        [&] (const parsed::constant& c) { },
+        [&] (const parsed::value::function_call& f) {
+            for (const parsed::value& value : f._parameters) {
+                for_value_on(value, func);
+            }
+        },
+        [&] (const parsed::path& p) {
+            func(p.root());
+        }
+    }, v._value);
+}
+
+void for_condition_expression_on(const parsed::condition_expression& ce, const noncopyable_function<void(std::string_view)>& func) {
+    std::visit(overloaded_functor {
+        [&] (const parsed::primitive_condition& cond) {
+            for (const parsed::value& value : cond._values) {
+                for_value_on(value, func);
+            }
+        },
+        [&] (const parsed::condition_expression::condition_list& list) {
+            for (const parsed::condition_expression& cond : list.conditions) {
+                for_condition_expression_on(cond, func);
+            }
+        }
+    }, ce._expression);
+}
+
+// The following calculate_value() functions calculate, or evaluate, a parsed
+// expression. The parsed expression is assumed to have been "resolved", with
+// the matching resolve_* function.
+
+// Take two JSON-encoded list values (remember that a list value is
+// {"L": [...the actual list]}) and return the concatenation, again as
+// a list value.
+static rjson::value list_concatenate(const rjson::value& v1, const rjson::value& v2) {
+    const rjson::value* list1 = unwrap_list(v1);
+    const rjson::value* list2 = unwrap_list(v2);
+    if (!list1 || !list2) {
+        throw api_error("ValidationException", "UpdateExpression: list_append() given a non-list");
+    }
+    rjson::value cat = rjson::copy(*list1);
+    for (const auto& a : list2->GetArray()) {
+        rjson::push_back(cat, rjson::copy(a));
+    }
+    rjson::value ret = rjson::empty_object();
+    rjson::set(ret, "L", std::move(cat));
+    return ret;
+}
+
+// calculate_size() is ConditionExpression's size() function, i.e., it takes
+// a JSON-encoded value and returns its "size", which is defined differently
+// for each type, as a JSON-encoded number.
+// It returns a JSON-encoded "null" value if this value's type has no size
+// defined. Comparisons against this non-numeric value will later fail.
+static rjson::value calculate_size(const rjson::value& v) {
+    // NOTE: If v is improperly formatted for our JSON value encoding, it
+    // must come from the request itself, not from the database, so it makes
+    // sense to throw a ValidationException if we see such a problem.
+    if (!v.IsObject() || v.MemberCount() != 1) {
+        throw api_error("ValidationException", format("invalid object: {}", v));
+    }
+    auto it = v.MemberBegin();
+    int ret;
+    if (it->name == "S") {
+        if (!it->value.IsString()) {
+            throw api_error("ValidationException", format("invalid string: {}", v));
+        }
+        ret = it->value.GetStringLength();
+    } else if (it->name == "NS" || it->name == "SS" || it->name == "BS" || it->name == "L") {
+        if (!it->value.IsArray()) {
+            throw api_error("ValidationException", format("invalid set: {}", v));
+        }
+        ret = it->value.Size();
+    } else if (it->name == "M") {
+        if (!it->value.IsObject()) {
+            throw api_error("ValidationException", format("invalid map: {}", v));
+        }
+        ret = it->value.MemberCount();
+    } else if (it->name == "B") {
+        if (!it->value.IsString()) {
+            throw api_error("ValidationException", format("invalid byte string: {}", v));
+        }
+        ret = base64_decoded_len(rjson::to_string_view(it->value));
+    } else {
+        rjson::value json_ret = rjson::empty_object();
+        rjson::set(json_ret, "null", rjson::value(true));
+        return json_ret;
+    }
+    rjson::value json_ret = rjson::empty_object();
+    rjson::set(json_ret, "N", rjson::from_string(std::to_string(ret)));
+    return json_ret;
+}
+
+static const rjson::value& calculate_value(const parsed::constant& c) {
+    return std::visit(overloaded_functor {
+        [&] (const parsed::constant::literal& v) -> const rjson::value& {
+            return *v;
+        },
+        [&] (const std::string& valref) -> const rjson::value& {
+            // Shouldn't happen, we should have called resolve_value() earlier
+            // and replaced the value reference by the literal constant.
+            throw std::logic_error("calculate_value() called before resolve_value()");
+        }
+    }, c._value);
+}
+
+static rjson::value to_bool_json(bool b) {
+    rjson::value json_ret = rjson::empty_object();
+    rjson::set(json_ret, "BOOL", rjson::value(b));
+    return json_ret;
+}
+
+static bool known_type(std::string_view type) {
+    static thread_local const std::unordered_set<std::string_view> types = {
+            "N", "S", "B", "NS", "SS", "BS", "L", "M", "NULL", "BOOL"
+    };
+    return types.contains(type);
+}
+
+using function_handler_type = rjson::value(calculate_value_caller, const rjson::value*, const parsed::value::function_call&);
+static const
+std::unordered_map<std::string_view, function_handler_type*> function_handlers {
+    {"list_append", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::UpdateExpression) {
+                throw api_error("ValidationException",
+                        format("{}: list_append() not allowed here", caller));
+            }
+            if (f._parameters.size() != 2) {
+                throw api_error("ValidationException",
+                        format("{}: list_append() accepts 2 parameters, got {}", caller, f._parameters.size()));
+            }
+            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
+            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
+            return list_concatenate(v1, v2);
+        }
+    },
+    {"if_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::UpdateExpression) {
+                throw api_error("ValidationException",
+                        format("{}: if_not_exists() not allowed here", caller));
+            }
+            if (f._parameters.size() != 2) {
+                throw api_error("ValidationException",
+                        format("{}: if_not_exists() accepts 2 parameters, got {}", caller, f._parameters.size()));
+            }
+            if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
+                throw api_error("ValidationException",
+                        format("{}: if_not_exists() must include path as its first argument", caller));
+            }
+            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
+            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
+            return v1.IsNull() ? std::move(v2) : std::move(v1);
+        }
+    },
+    {"size", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::ConditionExpression) {
+                throw api_error("ValidationException",
+                        format("{}: size() not allowed here", caller));
+            }
+            if (f._parameters.size() != 1) {
+                throw api_error("ValidationException",
+                        format("{}: size() accepts 1 parameter, got {}", caller, f._parameters.size()));
+            }
+            rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
+            return calculate_size(v);
+        }
+    },
+    {"attribute_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::ConditionExpressionAlone) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_exists() not allowed here", caller));
+            }
+            if (f._parameters.size() != 1) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));
+            }
+            if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_exists()'s parameter must be a path", caller));
+            }
+            rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
+            return to_bool_json(!v.IsNull());
+        }
+    },
+    {"attribute_not_exists", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::ConditionExpressionAlone) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_not_exists() not allowed here", caller));
+            }
+            if (f._parameters.size() != 1) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_not_exists() accepts 1 parameter, got {}", caller, f._parameters.size()));
+            }
+            if (!std::holds_alternative<parsed::path>(f._parameters[0]._value)) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_not_exists()'s parameter must be a path", caller));
+            }
+            rjson::value v = calculate_value(f._parameters[0], caller, previous_item);
+            return to_bool_json(v.IsNull());
+        }
+    },
+    {"attribute_type", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::ConditionExpressionAlone) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_type() not allowed here", caller));
+            }
+            if (f._parameters.size() != 2) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_type() accepts 2 parameters, got {}", caller, f._parameters.size()));
+            }
+            // There is no real reason for the following check (not
+            // allowing the type to come from a document attribute), but
+            // DynamoDB does this check, so we do too...
+            if (!f._parameters[1].is_constant()) {
+                throw api_error("ValidationException",
+                        format("{}: attribute_type()'s second parameter must be an expression attribute", caller));
+            }
+            rjson::value v0 = calculate_value(f._parameters[0], caller, previous_item);
+            rjson::value v1 = calculate_value(f._parameters[1], caller, previous_item);
+            if (v1.IsObject() && v1.MemberCount() == 1 && v1.MemberBegin()->name == "S") {
+                // If the type parameter is not one of the legal types
+                // we should generate an error, not a failed condition:
+                if (!known_type(rjson::to_string_view(v1.MemberBegin()->value))) {
+                    throw api_error("ValidationException",
+                            format("{}: attribute_type()'s second parameter, {}, is not a known type",
+                                    caller, v1.MemberBegin()->value));
+                }
+                if (v0.IsObject() && v0.MemberCount() == 1) {
+                    return to_bool_json(v1.MemberBegin()->value == v0.MemberBegin()->name);
+                } else {
+                    return to_bool_json(false);
+                }
+            } else {
+                throw api_error("ValidationException",
+                        format("{}: attribute_type() second parameter must refer to a string, got {}", caller, v1));
+            }
+        }
+    },
+    {"begins_with", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::ConditionExpressionAlone) {
+                throw api_error("ValidationException",
+                        format("{}: begins_with() not allowed here", caller));
+            }
+            if (f._parameters.size() != 2) {
+                throw api_error("ValidationException",
+                        format("{}: begins_with() accepts 2 parameters, got {}", caller, f._parameters.size()));
+            }
+            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
+            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
+            return to_bool_json(check_BEGINS_WITH(v1.IsNull() ? nullptr : &v1,  v2,
+                                    f._parameters[0].is_constant(), f._parameters[1].is_constant()));
+        }
+    },
+    {"contains", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {
+            if (caller != calculate_value_caller::ConditionExpressionAlone) {
+                throw api_error("ValidationException",
+                        format("{}: contains() not allowed here", caller));
+            }
+            if (f._parameters.size() != 2) {
+                throw api_error("ValidationException",
+                        format("{}: contains() accepts 2 parameters, got {}", caller, f._parameters.size()));
+            }
+            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
+            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
+            return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1,  v2));
+        }
+    },
+};
+
+// Given a parsed::value, which can refer either to a constant value from
+// ExpressionAttributeValues, to the value of some attribute, or to a function
+// of other values, this function calculates the resulting value.
+// "caller" determines which expression - ConditionExpression or
+// UpdateExpression - is asking for this value. We need to know this because
+// DynamoDB allows a different choice of functions for different expressions.
+rjson::value calculate_value(const parsed::value& v,
+        calculate_value_caller caller,
+        const rjson::value* previous_item) {
+    return std::visit(overloaded_functor {
+        [&] (const parsed::constant& c) -> rjson::value {
+            return rjson::copy(calculate_value(c));
+        },
+        [&] (const parsed::value::function_call& f) -> rjson::value {
+            auto function_it = function_handlers.find(std::string_view(f._function_name));
+            if (function_it == function_handlers.end()) {
+                throw api_error("ValidationException",
+                        format("{}: unknown function '{}' called.", caller, f._function_name));
+            }
+            return function_it->second(caller, previous_item, f);
+        },
+        [&] (const parsed::path& p) -> rjson::value {
+            if (!previous_item) {
+                return rjson::null_value();
+            }
+            std::string update_path = p.root();
+            if (p.has_operators()) {
+                // FIXME: support this
+                throw api_error("ValidationException", "Reading attribute paths not yet implemented");
+            }
+            const rjson::value* previous_value = rjson::find(*previous_item, update_path);
+            return previous_value ? rjson::copy(*previous_value) : rjson::null_value();
+        }
+    }, v._value);
+}
+
+// Same as calculate_value() above, except takes a set_rhs, which may be
+// either a single value, or v1+v2 or v1-v2.
+rjson::value calculate_value(const parsed::set_rhs& rhs,
+        const rjson::value* previous_item) {
+    switch(rhs._op) {
+    case 'v':
+        return calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
+    case '+': {
+        rjson::value v1 = calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
+        rjson::value v2 = calculate_value(rhs._v2, calculate_value_caller::UpdateExpression, previous_item);
+        return number_add(v1, v2);
+    }
+    case '-': {
+        rjson::value v1 = calculate_value(rhs._v1, calculate_value_caller::UpdateExpression, previous_item);
+        rjson::value v2 = calculate_value(rhs._v2, calculate_value_caller::UpdateExpression, previous_item);
+        return number_subtract(v1, v2);
+    }
+    }
+    // Can't happen
+    return rjson::null_value();
+}
+
 } // namespace alternator
--- a/alternator/expressions.hh
+++ b/alternator/expressions.hh
@@ -24,8 +24,13 @@
 #include <string>
 #include <stdexcept>
 #include <vector>
+#include <unordered_set>
+#include <string_view>
+
+#include <seastar/util/noncopyable_function.hh>

 #include "expressions_types.hh"
+#include "rjson.hh"

 namespace alternator {

@@ -38,4 +43,60 @@ parsed::update_expression parse_update_expression(std::string query);
 std::vector<parsed::path> parse_projection_expression(std::string query);
 parsed::condition_expression parse_condition_expression(std::string query);

+void resolve_update_expression(parsed::update_expression& ue,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values);
+void resolve_projection_expression(std::vector<parsed::path>& pe,
+        const rjson::value* expression_attribute_names,
+        std::unordered_set<std::string>& used_attribute_names);
+void resolve_condition_expression(parsed::condition_expression& ce,
+        const rjson::value* expression_attribute_names,
+        const rjson::value* expression_attribute_values,
+        std::unordered_set<std::string>& used_attribute_names,
+        std::unordered_set<std::string>& used_attribute_values);
+
+void validate_value(const rjson::value& v, const char* caller);
+
+bool condition_expression_on(const parsed::condition_expression& ce, std::string_view attribute);
+
+// for_condition_expression_on() runs the given function on the attributes
+// that the expression uses. It may run for the same attribute more than once
+// if the same attribute is used more than once in the expression.
+void for_condition_expression_on(const parsed::condition_expression& ce, const noncopyable_function<void(std::string_view)>& func);
+
+// calculate_value() behaves slightly differently (especially, different
+// functions supported) when used in different types of expressions, as
+// enumerated in this enum:
+enum class calculate_value_caller {
+    UpdateExpression, ConditionExpression, ConditionExpressionAlone
+};
+
+inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {
+    switch (caller) {
+        case calculate_value_caller::UpdateExpression:
+            out << "UpdateExpression";
+            break;
+        case calculate_value_caller::ConditionExpression:
+            out << "ConditionExpression";
+            break;
+        case calculate_value_caller::ConditionExpressionAlone:
+            out << "ConditionExpression";
+            break;
+        default:
+            out << "unknown type of expression";
+            break;
+    }
+    return out;
+}
+
+rjson::value calculate_value(const parsed::value& v,
+        calculate_value_caller caller,
+        const rjson::value* previous_item);
+
+rjson::value calculate_value(const parsed::set_rhs& rhs,
+        const rjson::value* previous_item);
+
+
 } /* namespace alternator */
--- a/alternator/expressions_eval.hh
+++ b/alternator/expressions_eval.hh
@@ -1,78 +0,0 @@
-/*
- * Copyright 2020 ScyllaDB
- */
-
-/*
- * This file is part of Scylla.
- *
- * Scylla is free software: you can redistribute it and/or modify
- * it under the terms of the GNU Affero General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * Scylla is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU Affero General Public License
- * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#pragma once
-
-#include <string>
-#include <unordered_set>
-
-#include "rjson.hh"
-#include "schema_fwd.hh"
-
-#include "expressions_types.hh"
-
-namespace alternator {
-
-// calculate_value() behaves slightly different (especially, different
-// functions supported) when used in different types of expressions, as
-// enumerated in this enum:
-enum class calculate_value_caller {
-    UpdateExpression, ConditionExpression, ConditionExpressionAlone
-};
-
-inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {
-    switch (caller) {
-        case calculate_value_caller::UpdateExpression:
-            out << "UpdateExpression";
-            break;
-        case calculate_value_caller::ConditionExpression:
-            out << "ConditionExpression";
-            break;
-        case calculate_value_caller::ConditionExpressionAlone:
-            out << "ConditionExpression";
-            break;
-        default:
-            out << "unknown type of expression";
-            break;
-    }
-    return out;
-}
-
-bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);
-
-rjson::value calculate_value(const parsed::value& v,
-        calculate_value_caller caller,
-        const rjson::value* expression_attribute_values,
-        std::unordered_set<std::string>& used_attribute_names,
-        std::unordered_set<std::string>& used_attribute_values,
-        const rjson::value& update_info,
-        schema_ptr schema,
-        const std::unique_ptr<rjson::value>& previous_item);
-
-bool verify_condition_expression(
-        const parsed::condition_expression& condition_expression,
-        std::unordered_set<std::string>& used_attribute_values,
-        std::unordered_set<std::string>& used_attribute_names,
-        const rjson::value& req,
-        schema_ptr schema,
-        const std::unique_ptr<rjson::value>& previous_item);
-
-} /* namespace alternator */
--- a/alternator/expressions_types.hh
+++ b/alternator/expressions_types.hh
@@ -25,6 +25,10 @@
 #include <string>
 #include <variant>

+#include <seastar/core/shared_ptr.hh>
+
+#include "rjson.hh"
+
 /*
 * Parsed representation of expressions and their components.
 *
@@ -63,10 +67,27 @@ public:
    }
 };

+// When an expression is first parsed, all constants are references, like
+// ":val1", into ExpressionAttributeValues. These use the std::string
+// alternative of the variant below.
+// The resolve_value() function replaces these constants by the JSON item
+// extracted from the ExpressionAttributeValues.
+struct constant {
+    // We use lw_shared_ptr<rjson::value> just to make rjson::value copyable,
+    // to make this entire object copyable as ANTLR needs.
+    using literal = lw_shared_ptr<rjson::value>;
+    std::variant<std::string, literal> _value;
+    void set(const rjson::value& v) {
+        _value = make_lw_shared<rjson::value>(rjson::copy(v));
+    }
+    void set(std::string& s) {
+        _value = s;
+    }
+};
+
 // "value" is is a value used in the right hand side of an assignment
-// expression, "SET a = ...". It can be a reference to a value included in
-// the request (":val"), a path to an attribute from the existing item
-// (e.g., "a.b[3].c"), or a function of other such values.
+// expression, "SET a = ...". It can be a constant (a reference to a value
+// included in the request, e.g., ":val"), a path to an attribute from the
+// existing item (e.g., "a.b[3].c"), or a function of other such values.
 // Note that the real right-hand-side of an assignment is actually a bit
 // more general - it allows either a value, or a value+value or value-value -
 // see class set_rhs below.
@@ -75,9 +96,12 @@ struct value {
        std::string _function_name;
        std::vector<value> _parameters;
    };
-    std::variant<std::string, path, function_call> _value;
+    std::variant<constant, path, function_call> _value;
+    void set_constant(constant c) {
+        _value = std::move(c);
+    }
    void set_valref(std::string s) {
-        _value = std::move(s);
+        _value = constant { std::move(s) };
    }
    void set_path(path p) {
        _value = std::move(p);
@@ -88,8 +112,8 @@ struct value {
    void add_func_parameter(value v) {
        std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
    }
-    bool is_valref() const {
-        return std::holds_alternative<std::string>(_value);
+    bool is_constant() const {
+        return std::holds_alternative<constant>(_value);
    }
    bool is_path() const {
        return std::holds_alternative<path>(_value);
@@ -130,10 +154,10 @@ public:
        struct remove {
        };
        struct add {
-            std::string _valref;
+            constant _valref;
        };
        struct del {
-            std::string _valref;
+            constant _valref;
        };
        std::variant<set, remove, add, del> _action;

@@ -147,11 +171,11 @@ public:
        }
        void assign_add(path p, std::string v) {
            _path = std::move(p);
-            _action = add { std::move(v) };
+            _action = add { constant { std::move(v) } };
        }
        void assign_del(path p, std::string v) {
            _path = std::move(p);
-            _action = del { std::move(v) };
+            _action = del { constant { std::move(v) } };
        }
    };
 private:
@@ -169,6 +193,9 @@ public:
    const std::vector<action>& actions() const {
        return _actions;
    }
+    std::vector<action>& actions() {
+        return _actions;
+    }
 };

 // A primitive_condition is a condition expression involving one condition,
--- a/alternator/rmw_operation.hh
+++ b/alternator/rmw_operation.hh
@@ -21,9 +21,9 @@

 #pragma once

-#include <seastarx.hh>
-#include <service/storage_proxy.hh>
-#include <service/storage_proxy.hh>
+#include "seastarx.hh"
+#include "service/storage_proxy.hh"
+#include "service/storage_proxy.hh"
 #include "rjson.hh"
 #include "executor.hh"

--- a/alternator/serialization.cc
+++ b/alternator/serialization.cc
@@ -31,8 +31,8 @@ static logging::logger slogger("alternator-serialization");

 namespace alternator {

-type_info type_info_from_string(std::string type) {
-    static thread_local const std::unordered_map<std::string, type_info> type_infos = {
+type_info type_info_from_string(std::string_view type) {
+    static thread_local const std::unordered_map<std::string_view, type_info> type_infos = {
        {"S", {alternator_type::S, utf8_type}},
        {"B", {alternator_type::B, bytes_type}},
        {"BOOL", {alternator_type::BOOL, boolean_type}},
@@ -87,7 +87,7 @@ bytes serialize_item(const rjson::value& item) {
        throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
    }
    auto it = item.MemberBegin();
-    type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings
+    type_info type_info = type_info_from_string(rjson::to_string_view(it->name)); // JSON keys are guaranteed to be strings

    if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
        slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
@@ -186,6 +186,11 @@ bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column
                format("Type mismatch: expected type {} for key column {}, got type {}",
                        type_to_string(column.type), column.name_as_text(), it->name.GetString()));
    }
+    std::string_view value_view = rjson::to_string_view(it->value);
+    if (value_view.empty()) {
+        throw api_error("ValidationException",
+                format("The AttributeValue for a key attribute cannot contain an empty string value. Key: {}", column.name_as_text()));
+    }
    if (column.type == bytes_type) {
        return base64_decode(it->value);
    } else {
@@ -270,4 +275,93 @@ const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value&
    return std::make_pair(it_key, &(it->value));
 }

+const rjson::value* unwrap_list(const rjson::value& v) {
+    if (!v.IsObject() || v.MemberCount() != 1) {
+        return nullptr;
+    }
+    auto it = v.MemberBegin();
+    if (it->name != std::string("L")) {
+        return nullptr;
+    }
+    return &(it->value);
+}
+
+// Take two JSON-encoded numeric values ({"N": "thenumber"}) and return the
+// sum, again as a JSON-encoded number.
+rjson::value number_add(const rjson::value& v1, const rjson::value& v2) {
+    auto n1 = unwrap_number(v1, "UpdateExpression");
+    auto n2 = unwrap_number(v2, "UpdateExpression");
+    rjson::value ret = rjson::empty_object();
+    std::string str_ret = std::string((n1 + n2).to_string());
+    rjson::set(ret, "N", rjson::from_string(str_ret));
+    return ret;
+}
+
+rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2) {
+    auto n1 = unwrap_number(v1, "UpdateExpression");
+    auto n2 = unwrap_number(v2, "UpdateExpression");
+    rjson::value ret = rjson::empty_object();
+    std::string str_ret = std::string((n1 - n2).to_string());
+    rjson::set(ret, "N", rjson::from_string(str_ret));
+    return ret;
+}
+
+// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
+// return the sum of both sets, again as a set value.
+rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
+    auto [set1_type, set1] = unwrap_set(v1);
+    auto [set2_type, set2] = unwrap_set(v2);
+    if (set1_type != set2_type) {
+        throw api_error("ValidationException", format("Mismatched set types: {} and {}", set1_type, set2_type));
+    }
+    if (!set1 || !set2) {
+        throw api_error("ValidationException", "UpdateExpression: ADD operation for sets must be given sets as arguments");
+    }
+    rjson::value sum = rjson::copy(*set1);
+    std::set<rjson::value, rjson::single_value_comp> set1_raw;
+    for (auto it = sum.Begin(); it != sum.End(); ++it) {
+        set1_raw.insert(rjson::copy(*it));
+    }
+    for (const auto& a : set2->GetArray()) {
+        if (set1_raw.count(a) == 0) {
+            rjson::push_back(sum, rjson::copy(a));
+        }
+    }
+    rjson::value ret = rjson::empty_object();
+    rjson::set_with_string_name(ret, set1_type, std::move(sum));
+    return ret;
+}
+
+// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
+// return the difference of s1 - s2, again as a set value.
+// DynamoDB does not allow empty sets, so if the resulting set is empty,
+// return an unset optional instead.
+std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value& v2) {
+    auto [set1_type, set1] = unwrap_set(v1);
+    auto [set2_type, set2] = unwrap_set(v2);
+    if (set1_type != set2_type) {
+        throw api_error("ValidationException", format("Mismatched set types: {} and {}", set1_type, set2_type));
+    }
+    if (!set1 || !set2) {
+        throw api_error("ValidationException", "UpdateExpression: DELETE operation can only be performed on a set");
+    }
+    std::set<rjson::value, rjson::single_value_comp> set1_raw;
+    for (auto it = set1->Begin(); it != set1->End(); ++it) {
+        set1_raw.insert(rjson::copy(*it));
+    }
+    for (const auto& a : set2->GetArray()) {
+        set1_raw.erase(a);
+    }
+    if (set1_raw.empty()) {
+        return std::nullopt;
+    }
+    rjson::value ret = rjson::empty_object();
+    rjson::set_with_string_name(ret, set1_type, rjson::empty_array());
+    rjson::value& result_set = ret[set1_type];
+    for (const auto& a : set1_raw) {
+        rjson::push_back(result_set, rjson::copy(a));
+    }
+    return ret;
+}
+
 }
--- a/alternator/serialization.hh
+++ b/alternator/serialization.hh
@@ -45,7 +45,7 @@ struct type_representation {
    data_type dtype;
 };

-type_info type_info_from_string(std::string type);
+type_info type_info_from_string(std::string_view type);
 type_representation represent_type(alternator_type atype);

 bytes serialize_item(const rjson::value& item);
@@ -69,4 +69,21 @@ big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
 // returned value is {"", nullptr}
 const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);

+// Check if a given JSON object encodes a list (i.e., it is a {"L": [...]})
+// and return a pointer to that list, or nullptr if it does not.
+const rjson::value* unwrap_list(const rjson::value& v);
+
+// Take two JSON-encoded numeric values ({"N": "thenumber"}) and return the
+// sum, again as a JSON-encoded number.
+rjson::value number_add(const rjson::value& v1, const rjson::value& v2);
+rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2);
+// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
+// return the sum of both sets, again as a set value.
+rjson::value set_sum(const rjson::value& v1, const rjson::value& v2);
+// Take two JSON-encoded set values (e.g. {"SS": [...the actual set]}) and
+// return the difference of s1 - s2, again as a set value.
+// DynamoDB does not allow empty sets, so if the resulting set is empty,
+// return an unset optional instead.
+std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value& v2);
+
 }
--- a/alternator/server.cc
+++ b/alternator/server.cc
@@ -23,7 +23,7 @@
 #include "log.hh"
 #include <seastar/http/function_handlers.hh>
 #include <seastar/json/json_elements.hh>
-#include <seastarx.hh>
+#include "seastarx.hh"
 #include "error.hh"
 #include "rjson.hh"
 #include "auth.hh"
--- a/alternator/server.hh
+++ b/alternator/server.hh
@@ -26,8 +26,8 @@
 #include <seastar/http/httpd.hh>
 #include <seastar/net/tls.hh>
 #include <optional>
-#include <alternator/auth.hh>
-#include <utils/small_vector.hh>
+#include "alternator/auth.hh"
+#include "utils/small_vector.hh"
 #include <seastar/core/units.hh>

 namespace alternator {
--- a/api/api-doc/storage_service.json
+++ b/api/api-doc/storage_service.json
@@ -511,6 +511,21 @@
            }
         ]
      },
+      {
+         "path":"/storage_service/cdc_streams_check_and_repair",
+         "operations":[
+            {
+               "method":"POST",
+               "summary":"Checks that CDC streams reflect current cluster topology and regenerates them if not.",
+               "type":"void",
+               "nickname":"cdc_streams_check_and_repair",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[]
+            }
+         ]
+      },
      {
         "path":"/storage_service/snapshots",
         "operations":[
--- a/api/api.cc
+++ b/api/api.cc
@@ -93,6 +93,22 @@ static future<> register_api(http_context& ctx, const sstring& api_name,
    });
 }

+future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl) {
+    return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_transport_controller(ctx, r, ctl); });
+}
+
+future<> unset_transport_controller(http_context& ctx) {
+    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_transport_controller(ctx, r); });
+}
+
+future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl) {
+    return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_rpc_controller(ctx, r, ctl); });
+}
+
+future<> unset_rpc_controller(http_context& ctx) {
+    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_rpc_controller(ctx, r); });
+}
+
 future<> set_server_storage_service(http_context& ctx) {
    return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
 }
--- a/api/api_init.hh
+++ b/api/api_init.hh
@@ -25,6 +25,8 @@

 namespace service { class load_meter; }
 namespace locator { class token_metadata; }
+namespace cql_transport { class controller; }
+class thrift_controller;

 namespace api {

@@ -48,6 +50,10 @@ future<> set_server_init(http_context& ctx);
 future<> set_server_config(http_context& ctx);
 future<> set_server_snitch(http_context& ctx);
 future<> set_server_storage_service(http_context& ctx);
+future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);
+future<> unset_transport_controller(http_context& ctx);
+future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);
+future<> unset_rpc_controller(http_context& ctx);
 future<> set_server_snapshot(http_context& ctx);
 future<> set_server_gossip(http_context& ctx);
 future<> set_server_load_sstable(http_context& ctx);
--- a/api/column_family.cc
+++ b/api/column_family.cc
@@ -650,7 +650,7 @@ void set_column_family(http_context& ctx, routes& r) {
    cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
-                return sst->filter_size();
+                return s + sst->filter_size();
            });
        }, std::plus<uint64_t>());
    });
@@ -658,7 +658,7 @@ void set_column_family(http_context& ctx, routes& r) {
    cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
        return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
-                return sst->filter_size();
+                return s + sst->filter_size();
            });
        }, std::plus<uint64_t>());
    });
@@ -666,7 +666,7 @@ void set_column_family(http_context& ctx, routes& r) {
    cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
-                return sst->filter_memory_size();
+                return s + sst->filter_memory_size();
            });
        }, std::plus<uint64_t>());
    });
@@ -674,7 +674,7 @@ void set_column_family(http_context& ctx, routes& r) {
    cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
        return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
-                return sst->filter_memory_size();
+                return s + sst->filter_memory_size();
            });
        }, std::plus<uint64_t>());
    });
@@ -682,7 +682,7 @@ void set_column_family(http_context& ctx, routes& r) {
    cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
-                return sst->get_summary().memory_footprint();
+                return s + sst->get_summary().memory_footprint();
            });
        }, std::plus<uint64_t>());
    });
@@ -690,7 +690,7 @@ void set_column_family(http_context& ctx, routes& r) {
    cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
        return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
            return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
-                return sst->get_summary().memory_footprint();
+                return s + sst->get_summary().memory_footprint();
            });
        }, std::plus<uint64_t>());
    });
--- a/api/commitlog.cc
+++ b/api/commitlog.cc
@@ -20,7 +20,7 @@
 */

 #include "commitlog.hh"
-#include <db/commitlog/commitlog.hh>
+#include "db/commitlog/commitlog.hh"
 #include "api/api-doc/commitlog.json.hh"
 #include "database.hh"
 #include <vector>
--- a/api/gossiper.cc
+++ b/api/gossiper.cc
@@ -21,7 +21,7 @@

 #include "gossiper.hh"
 #include "api/api-doc/gossiper.json.hh"
-#include <gms/gossiper.hh>
+#include "gms/gossiper.hh"

 namespace api {
 using namespace json;
--- a/api/storage_proxy.cc
+++ b/api/storage_proxy.cc
@@ -116,6 +116,23 @@ static future<json::json_return_type>  sum_timed_rate_as_long(distributed<proxy>
    });
 }

+utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val) {
+    utils_json::estimated_histogram res;
+    for (size_t i = 0; i < val.size(); i++) {
+        res.buckets.push(val.get(i));
+        res.bucket_offsets.push(val.get_bucket_lower_limit(i));
+    }
+    return res;
+}
+
+static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::time_estimated_histogram service::storage_proxy_stats::stats::*f) {
+
+    return two_dimensional_map_reduce(ctx.sp, f, utils::time_estimated_histogram_merge,
+            utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {
+        return make_ready_future<json::json_return_type>(time_to_json_histogram(val));
+    });
+}
+
 static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::estimated_histogram service::storage_proxy_stats::stats::*f) {

    return two_dimensional_map_reduce(ctx.sp, f, utils::estimated_histogram_merge,
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -41,6 +41,8 @@
 #include "sstables/sstables.hh"
 #include "database.hh"
 #include "db/extensions.hh"
+#include "transport/controller.hh"
+#include "thrift/controller.hh"

 namespace api {

@@ -85,21 +87,66 @@ static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {
    };
 }

-future<> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
+future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
    if (tables.empty()) {
        tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
    }
-    return ctx.db.invoke_on_all([keyspace, tables, enabled] (database& db) {
-        return parallel_for_each(tables, [&db, keyspace, enabled](const sstring& table) mutable {
-            column_family& cf = db.find_column_family(keyspace, table);
-            if (enabled) {
-                cf.enable_auto_compaction();
-            } else {
-                cf.disable_auto_compaction();
-            }
-            return make_ready_future<>();
+
+    return service::get_local_storage_service().set_tables_autocompaction(keyspace, tables, enabled).then([]{
+        return make_ready_future<json::json_return_type>(json_void());
+    });
+}
+
+void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {
+    ss::start_native_transport.set(r, [&ctl](std::unique_ptr<request> req) {
+        return ctl.start_server().then([] {
+            return make_ready_future<json::json_return_type>(json_void());
        });
    });
+
+    ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<request> req) {
+        return ctl.stop_server().then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
+    });
+
+    ss::is_native_transport_running.set(r, [&ctl] (std::unique_ptr<request> req) {
+        return ctl.is_server_running().then([] (bool running) {
+            return make_ready_future<json::json_return_type>(running);
+        });
+    });
+}
+
+void unset_transport_controller(http_context& ctx, routes& r) {
+    ss::start_native_transport.unset(r);
+    ss::stop_native_transport.unset(r);
+    ss::is_native_transport_running.unset(r);
+}
+
+void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {
+    ss::stop_rpc_server.set(r, [&ctl](std::unique_ptr<request> req) {
+        return ctl.stop_server().then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
+    });
+
+    ss::start_rpc_server.set(r, [&ctl](std::unique_ptr<request> req) {
+        return ctl.start_server().then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
+    });
+
+    ss::is_rpc_server_running.set(r, [&ctl] (std::unique_ptr<request> req) {
+        return ctl.is_server_running().then([] (bool running) {
+            return make_ready_future<json::json_return_type>(running);
+        });
+    });
+}
+
+void unset_rpc_controller(http_context& ctx, routes& r) {
+    ss::stop_rpc_server.unset(r);
+    ss::start_rpc_server.unset(r);
+    ss::is_rpc_server_running.unset(r);
 }

 void set_storage_service(http_context& ctx, routes& r) {
@@ -232,6 +279,12 @@ void set_storage_service(http_context& ctx, routes& r) {
                req.get_query_param("key")));
    });

+    ss::cdc_streams_check_and_repair.set(r, [&ctx] (std::unique_ptr<request> req) {
+        return service::get_local_storage_service().check_and_repair_cdc_streams().then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
+    });
+
    ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
        auto keyspace = validate_keyspace(ctx, req->param);
        auto column_families = split_cf(req->get_query_param("cf"));
@@ -496,42 +549,6 @@ void set_storage_service(http_context& ctx, routes& r) {
        });
    });

-    ss::stop_rpc_server.set(r, [](std::unique_ptr<request> req) {
-        return service::get_local_storage_service().stop_rpc_server().then([] {
-            return make_ready_future<json::json_return_type>(json_void());
-        });
-    });
-
-    ss::start_rpc_server.set(r, [](std::unique_ptr<request> req) {
-        return service::get_local_storage_service().start_rpc_server().then([] {
-            return make_ready_future<json::json_return_type>(json_void());
-        });
-    });
-
-    ss::is_rpc_server_running.set(r, [] (std::unique_ptr<request> req) {
-        return service::get_local_storage_service().is_rpc_server_running().then([] (bool running) {
-            return make_ready_future<json::json_return_type>(running);
-        });
-    });
-
-    ss::start_native_transport.set(r, [](std::unique_ptr<request> req) {
-        return service::get_local_storage_service().start_native_transport().then([] {
-            return make_ready_future<json::json_return_type>(json_void());
-        });
-    });
-
-    ss::stop_native_transport.set(r, [](std::unique_ptr<request> req) {
-        return service::get_local_storage_service().stop_native_transport().then([] {
-            return make_ready_future<json::json_return_type>(json_void());
-        });
-    });
-
-    ss::is_native_transport_running.set(r, [] (std::unique_ptr<request> req) {
-        return service::get_local_storage_service().is_native_transport_running().then([] (bool running) {
-            return make_ready_future<json::json_return_type>(running);
-        });
-    });
-
    ss::join_ring.set(r, [](std::unique_ptr<request> req) {
        return make_ready_future<json::json_return_type>(json_void());
    });
@@ -718,17 +735,15 @@ void set_storage_service(http_context& ctx, routes& r) {
    ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
        auto keyspace = validate_keyspace(ctx, req->param);
        auto tables = split_cf(req->get_query_param("cf"));
-        return set_tables_autocompaction(ctx, keyspace, tables, true).then([]{
-            return make_ready_future<json::json_return_type>(json_void());
-        });
+
+        return set_tables_autocompaction(ctx, keyspace, tables, true);
    });

    ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
        auto keyspace = validate_keyspace(ctx, req->param);
        auto tables = split_cf(req->get_query_param("cf"));
-        return set_tables_autocompaction(ctx, keyspace, tables, false).then([]{
-            return make_ready_future<json::json_return_type>(json_void());
-        });
+
+        return set_tables_autocompaction(ctx, keyspace, tables, false);
    });

    ss::deliver_hints.set(r, [](std::unique_ptr<request> req) {
@@ -1005,12 +1020,12 @@ void set_snapshot(http_context& ctx, routes& r) {

    ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {
        auto tag = req->get_query_param("tag");
-        auto column_family = req->get_query_param("cf");
+        auto column_families = split(req->get_query_param("cf"), ",");

        std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");

        auto resp = make_ready_future<>();
-        if (column_family.empty()) {
+        if (column_families.empty()) {
            resp = service::get_local_storage_service().take_snapshot(tag, keynames);
        } else {
            if (keynames.empty()) {
@@ -1019,7 +1034,7 @@ void set_snapshot(http_context& ctx, routes& r) {
            if (keynames.size() > 1) {
                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
            }
-            resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_family, tag);
+            resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_families, tag);
        }
        return resp.then([] {
            return make_ready_future<json::json_return_type>(json_void());
--- a/api/storage_service.hh
+++ b/api/storage_service.hh
@@ -23,9 +23,16 @@

 #include "api.hh"

+namespace cql_transport { class controller; }
+class thrift_controller;
+
 namespace api {

 void set_storage_service(http_context& ctx, routes& r);
+void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl);
+void unset_transport_controller(http_context& ctx, routes& r);
+void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl);
+void unset_rpc_controller(http_context& ctx, routes& r);
 void set_snapshot(http_context& ctx, routes& r);

 }
--- a/atomic_cell.hh
+++ b/atomic_cell.hh
@@ -29,7 +29,6 @@
 #include <seastar/net//byteorder.hh>
 #include <cstdint>
 #include <iosfwd>
-#include <seastar/util/gcc6-concepts.hh>
 #include "data/cell.hh"
 #include "data/schema_info.hh"
 #include "imr/utils.hh"
--- a/auth/service.cc
+++ b/auth/service.cc
@@ -178,7 +178,7 @@ future<> service::start(::service::migration_manager& mm) {
        return create_keyspace_if_missing(mm);
    }).then([this] {
        return _role_manager->start().then([this] {
-            return when_all_succeed(_authorizer->start(), _authenticator->start());
+            return when_all_succeed(_authorizer->start(), _authenticator->start()).discard_result();
        });
    }).then([this] {
        _permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
@@ -199,7 +199,7 @@ future<> service::stop() {
        }
        return make_ready_future<>();
    }).then([this] {
-        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
+        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop()).discard_result();
    });
 }

@@ -458,7 +458,9 @@ future<> drop_role(const service& ser, std::string_view name) {

        return when_all_succeed(
                a.revoke_all(name),
-                a.revoke_all(r)).handle_exception_type([](const unsupported_authorization_operation&) {
+                a.revoke_all(r))
+                    .discard_result()
+                    .handle_exception_type([](const unsupported_authorization_operation&) {
            // Nothing.
        });
    }).then([&ser, name] {
@@ -471,7 +473,7 @@ future<> drop_role(const service& ser, std::string_view name) {
 future<bool> has_role(const service& ser, std::string_view grantee, std::string_view name) {
    return when_all_succeed(
            validate_role_exists(ser, name),
-            ser.get_roles(grantee)).then([name](role_set all_roles) {
+            ser.get_roles(grantee)).then_unpack([name](role_set all_roles) {
        return make_ready_future<bool>(all_roles.count(sstring(name)) != 0);
    });
 }
--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -161,7 +161,7 @@ future<> standard_role_manager::create_metadata_tables_if_missing() const {
                    meta::role_members_table::name,
                    _qp,
                    create_role_members_query,
-                    _migration_manager));
+                    _migration_manager)).discard_result();
 }

 future<> standard_role_manager::create_default_role_if_missing() const {
@@ -367,7 +367,7 @@ future<> standard_role_manager::drop(std::string_view role_name) const {
                    {sstring(role_name)}).discard_result();
        };

-        return when_all_succeed(revoke_from_members(), revoke_members_of()).then([delete_role = std::move(delete_role)] {
+        return when_all_succeed(revoke_from_members(), revoke_members_of()).then_unpack([delete_role = std::move(delete_role)] {
            return delete_role();
        });
    });
@@ -416,7 +416,7 @@ standard_role_manager::modify_membership(
        return make_ready_future<>();
    };

-    return when_all_succeed(modify_roles(), modify_role_members());
+    return when_all_succeed(modify_roles(), modify_role_members()).discard_result();
 }

 future<>
@@ -445,7 +445,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol
        });
    };

-   return when_all_succeed(check_redundant(), check_cycle()).then([this, role_name, grantee_name] {
+   return when_all_succeed(check_redundant(), check_cycle()).then_unpack([this, role_name, grantee_name] {
       return this->modify_membership(grantee_name, role_name, membership_change::add);
   });
 }
--- a/caching_options.hh
+++ b/caching_options.hh
@@ -39,7 +39,10 @@ class caching_options {

    sstring _key_cache;
    sstring _row_cache;
-    caching_options(sstring k, sstring r) : _key_cache(k), _row_cache(r) {
+    bool _enabled = true;
+    caching_options(sstring k, sstring r, bool enabled)
+        : _key_cache(k), _row_cache(r), _enabled(enabled)
+    {
        if ((k != "ALL") && (k != "NONE")) {
            throw exceptions::configuration_exception("Invalid key value: " + k); 
        }
@@ -59,36 +62,53 @@ class caching_options {
    caching_options() : _key_cache(default_key), _row_cache(default_row) {}
 public:

+    bool enabled() const {
+        return _enabled;
+    }
+
    std::map<sstring, sstring> to_map() const {
-        return {{ "keys", _key_cache }, { "rows_per_partition", _row_cache }};
+        std::map<sstring, sstring> res = {{ "keys", _key_cache },
+                { "rows_per_partition", _row_cache }};
+        if (!_enabled) {
+            res.insert({"enabled", "false"});
+        }
+        return res;
    }

    sstring to_sstring() const {
        return json::to_json(to_map());
    }

+    static caching_options get_disabled_caching_options() {
+        return caching_options("NONE", "NONE", false);
+    }
+
    template<typename Map>
    static caching_options from_map(const Map & map) {
        sstring k = default_key;
        sstring r = default_row;
+        bool e = true;

        for (auto& p : map) {
            if (p.first == "keys") {
                k = p.second;
            } else if (p.first == "rows_per_partition") {
                r = p.second;
+            } else if (p.first == "enabled") {
+                e = p.second == "true";
            } else {
                throw exceptions::configuration_exception("Invalid caching option: " + p.first);
            }
        }
-        return caching_options(k, r);
+        return caching_options(k, r, e);
    }
    static caching_options from_sstring(const sstring& str) {
        return from_map(json::to_map(str));
    }

    bool operator==(const caching_options& other) const {
-        return _key_cache == other._key_cache && _row_cache == other._row_cache;
+        return _key_cache == other._key_cache && _row_cache == other._row_cache
+            && _enabled == other._enabled;
    }
    bool operator!=(const caching_options& other) const {
        return !(*this == other);
--- a/cdc/generation.cc
+++ b/cdc/generation.cc
@@ -190,12 +190,7 @@ public:
        , _bootstrap_tokens(bootstrap_tokens)
        , _token_metadata(token_metadata)
        , _gossiper(gossiper)
-    {
-        if (_bootstrap_tokens.empty()) {
-            throw std::runtime_error(
-                    "cdc: bootstrap tokens is empty in generate_topology_description");
-        }
-    }
+    {}

    /*
     * Generate a set of CDC stream identifiers such that for each shard
@@ -257,8 +252,6 @@ db_clock::time_point make_new_cdc_generation(
        db::system_distributed_keyspace& sys_dist_ks,
        std::chrono::milliseconds ring_delay,
        bool for_testing) {
-    assert(!bootstrap_tokens.empty());
-
    auto gen = topology_description_generator(cfg, bootstrap_tokens, tm, g).generate();

    // Begin the race.
--- a/cdc/log.cc
+++ b/cdc/log.cc
@@ -51,6 +51,7 @@
 #include "types/listlike_partial_deserializing_iterator.hh"
 #include "tracing/trace_state.hh"
 #include "stats.hh"
+#include "compaction_strategy.hh"

 namespace std {

@@ -173,6 +174,7 @@ public:
            auto& db = _ctxt._proxy.get_db().local();
            auto logname = log_name(schema.cf_name());
            check_that_cdc_log_table_does_not_exist(db, schema, logname);
+            ensure_that_table_has_no_counter_columns(schema);

            // in seastar thread
            auto log_schema = create_log_schema(schema);
@@ -199,6 +201,7 @@ public:
            }
            if (is_cdc) {
                check_for_attempt_to_create_nested_cdc_log(new_schema);
+                ensure_that_table_has_no_counter_columns(new_schema);
            }

            auto logname = log_name(old_schema.cf_name());
@@ -263,6 +266,13 @@ private:
                    schema.ks_name(), logname));
        }
    }
+
+    static void ensure_that_table_has_no_counter_columns(const schema& schema) {
+        if (schema.is_counter()) {
+            throw exceptions::invalid_request_exception(format("Cannot create CDC log for table {}.{}. Counter support not implemented",
+                    schema.ks_name(), schema.cf_name()));
+        }
+    }
 };

 cdc::cdc_service::cdc_service(service::storage_proxy& proxy)
@@ -276,6 +286,7 @@ cdc::cdc_service::cdc_service(db_context ctxt)
 }

 future<> cdc::cdc_service::stop() {
+    _impl->_ctxt._proxy.set_cdc_service(nullptr);
    return _impl->stop();
 }

@@ -392,12 +403,37 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
 static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid) {
    schema_builder b(s.ks_name(), log_name(s.cf_name()));
    b.with_partitioner("com.scylladb.dht.CDCPartitioner");
+    b.set_compaction_strategy(sstables::compaction_strategy_type::time_window);
    b.set_comment(sprint("CDC log for %s.%s", s.ks_name(), s.cf_name()));
+    auto ttl_seconds = s.cdc_options().ttl();
+    if (ttl_seconds > 0) {
+        b.set_gc_grace_seconds(0);
+        auto ceil = [] (int dividend, int divisor) {
+            return dividend / divisor + (dividend % divisor == 0 ? 0 : 1);
+        };
+        auto seconds_to_minutes = [] (int seconds_value) {
+            using namespace std::chrono;
+            return std::chrono::ceil<minutes>(seconds(seconds_value)).count();
+        };
+        // What's the minimum window that won't create more than 24 sstables.
+        auto window_seconds = ceil(ttl_seconds, 24);
+        auto window_minutes = seconds_to_minutes(window_seconds);
+        b.set_compaction_strategy_options({
+                {"compaction_window_unit", "MINUTES"},
+                {"compaction_window_size", std::to_string(window_minutes)},
+                // A new SSTable will become fully expired every
+                // `window_seconds` seconds so we shouldn't check for expired
+                // sstables too often.
+                {"expired_sstable_check_frequency_seconds",
+                        std::to_string(std::max(1, window_seconds / 2))},
+        });
+    }
    b.with_column(log_meta_column_name_bytes("stream_id"), bytes_type, column_kind::partition_key);
    b.with_column(log_meta_column_name_bytes("time"), timeuuid_type, column_kind::clustering_key);
    b.with_column(log_meta_column_name_bytes("batch_seq_no"), int32_type, column_kind::clustering_key);
    b.with_column(log_meta_column_name_bytes("operation"), data_type_for<operation_native_type>());
    b.with_column(log_meta_column_name_bytes("ttl"), long_type);
+    b.set_caching_options(caching_options::get_disabled_caching_options());
    auto add_columns = [&] (const schema::const_iterator_range_type& columns, bool is_data_col = false) {
        for (const auto& column : columns) {
            auto type = column.type;
@@ -443,7 +479,7 @@ static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID>
    if (uuid) {
        b.set_uuid(*uuid);
    }
-    
+
    return b.build();
 }

@@ -521,6 +557,12 @@ api::timestamp_type find_timestamp(const schema& s, const mutation& m) {
                    [&] (collection_mutation_view_description mview) {
                t = mview.tomb.timestamp;
                if (t != api::missing_timestamp) {
+                    // A collection tombstone with timestamp T can be created with:
+                    // UPDATE ks.t USING TIMESTAMP T + 1 SET X = null WHERE ...
+                    // where X is a non-atomic column.
+                    // This is, among others, the reason why we show it in the CDC log
+                    // with cdc$time using timestamp T + 1 instead of T.
+                    t += 1;
                    return stop_iteration::yes;
                }

@@ -716,17 +758,79 @@ private:
    const column_definition& _op_col;
    const column_definition& _ttl_col;
    ttl_opt _cdc_ttl_opt;
+
    /**
-     * #6070
-     * When mutation splitting was added, non-atomic column assignments were broken
-     * into two invocation of transform. This means the second (actual data assignment)
-     * does not know about the tombstone in first one -> postimage is created as if 
-     * we were _adding_ to the collection, not replacing it. 
+     * #6070, #6084
+     * Non-atomic column assignments which use a TTL are broken into two invocations
+     * of `transform`, such as in the following example:
+     * CREATE TABLE t (a int PRIMARY KEY, b map<int, int>) WITH cdc = {'enabled':true};
+     * UPDATE t USING TTL 5 SET b = {0:0} WHERE a = 0;
+     *
+     * The above UPDATE creates a tombstone and a (0, 0) cell; because tombstones don't have the notion
+     * of a TTL, we split the UPDATE into two separate changes (represented as two separate delta rows in the log,
+     * resulting in two invocations of `transform`): one change for the deletion with no TTL,
+     * and one change for adding cells with TTL = 5.
+     *
+     * In other words, we use the fact that
+     * UPDATE t USING TTL 5 SET b = {0:0} WHERE a = 0;
+     * is equivalent to
+     * BEGIN UNLOGGED BATCH
+     *  UPDATE t SET b = null WHERE a = 0;
+     *  UPDATE t USING TTL 5 SET b = b + {0:0} WHERE a = 0;
+     * APPLY BATCH;
+     * (the mutations are the same in both cases),
+     * and perform a separate `transform` call for each statement in the batch.
+     *
+     * An assignment also happens when an INSERT statement is used as follows:
+     * INSERT INTO t (a, b) VALUES (0, {0:0}) USING TTL 5;
     * 
-     * Not pretty, but to handle this we use the knowledge that we always get 
-     * invoked in timestamp order -> tombstone first, then assign.
-     * So we simply keep track of non-atomic columns deleted across calls 
-     * and filter out preimage data post this.
+     * This will be split into three separate changes (three invocations of `transform`):
+     * 1. One with TTL = 5 for the row marker (introduced by the INSERT), indicating that a row was inserted.
+     * 2. One without a TTL for the tombstone, indicating that the collection was cleared.
+     * 3. One with TTL = 5 for the addition of cell (0, 0), indicating that the collection
+     *    was extended by a new key/value.
+     *
+     * Why do we need three changes and not two, like in the UPDATE case?
+     * The tombstone needs to be a separate change because it doesn't have a TTL,
+     * so only the row marker change could potentially be merged with the cell change (1 and 3 above).
+     * However, we cannot do that: the row marker change is of INSERT type (cdc$operation == cdc::operation::insert),
+     * but there is no way to create a statement that
+     * - has a row marker,
+     * - adds cells to a collection,
+     * - but *doesn't* add a tombstone for this collection.
+     * INSERT statements that modify collections *always* add tombstones.
+     *
+     * Merging the row marker with the cell addition would result in such an impossible statement.
+     *
+     * Instead, we observe that
+     * INSERT INTO t (a, b) VALUES (0, {0:0}) USING TTL 5;
+     * is equivalent to
+     * BEGIN UNLOGGED BATCH
+     *  INSERT INTO t (a) VALUES (0) USING TTL 5;
+     *  UPDATE t SET b = null WHERE a = 0;
+     *  UPDATE t USING TTL 5 SET b = b + {0:0} WHERE a = 0;
+     * APPLY BATCH;
+     * and perform a separate `transform` call for each statement in the batch.
+     *
+     * Unfortunately, due to splitting, the cell addition call (b = b + {0:0}) does not know about the tombstone.
+     * If it was performed independently from the tombstone call, it would create a wrong post-image:
+     * the post-image would look as if the previous cells still existed.
+     * For example, suppose that b was equal to {1:1} before the above statement was performed.
+     * Then the final post-image for b for the above statement/batch would be {0:0, 1:1}, when instead it should be {0:0}.
+     *
+     * To handle this we use the fact that
+     * 1. changes without a TTL are treated as if TTL = 0,
+     * 2. `transform` is invoked in order of increasing TTLs,
+     * and we maintain state between `transform` invocations (`_non_atomic_column_deletes`).
+     *
+     * Thus, the tombstone call will happen *before* the cell addition call,
+     * so the cell addition call will know that there previously was a tombstone
+     * and create a correct post-image.
+     *
+     * Furthermore, `transform` calls for INSERT changes (i.e. with a row marker)
+     * happen before `transform` calls for UPDATE changes, so in the case of an INSERT
+     * which modifies a collection column as above, the row marker call will happen first;
+     * its post-image will still show {1:1} for the collection column. Good.
     */
    std::unordered_set<const column_definition*> _non_atomic_column_deletes;

@@ -929,6 +1033,9 @@ public:
                                        : value.value().first_fragment()
                                        ;
                                    value_callback(key, val, live);
+                                    if (value.is_live_and_has_ttl()) {
+                                        ttl = value.ttl();
+                                    }
                                }
                            };

@@ -1382,7 +1489,7 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
                tracing::trace(tr_state, "CDC: Preimage not enabled for the table, not querying current value of {}", m.decorated_key());
            }

-            return f.then([trans = std::move(trans), &mutations, idx, tr_state = std::move(tr_state), &details] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
+            return f.then([trans = std::move(trans), &mutations, idx, tr_state, &details] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
                auto& m = mutations[idx];
                auto& s = m.schema();
                details.had_preimage |= s->cdc_options().preimage();
--- a/cdc/log.hh
+++ b/cdc/log.hh
@@ -75,7 +75,7 @@ class metadata;
 /// CDC service will listen for schema changes and iff CDC is enabled/changed
 /// create/modify/delete corresponding log tables etc as part of the schema change. 
 ///
-class cdc_service {
+class cdc_service final : public async_sharded_service<cdc::cdc_service> {
    class impl;
    std::unique_ptr<impl> _impl;
 public:
--- a/cdc/split.cc
+++ b/cdc/split.cc
@@ -30,23 +30,16 @@ struct atomic_column_update {
    atomic_cell cell;
 };

-// see the comment inside `clustered_row_insert` for motivation for separating
-// nonatomic deletions from nonatomic updates
-struct nonatomic_column_deletion {
-    column_id id;
-    tombstone t;
-};
-
 struct nonatomic_column_update {
    column_id id;
+    tombstone t; // optional
    utils::chunked_vector<std::pair<bytes, atomic_cell>> cells;
 };

 struct static_row_update {
    gc_clock::duration ttl;
    std::vector<atomic_column_update> atomic_entries;
-    std::vector<nonatomic_column_deletion> nonatomic_deletions;
-    std::vector<nonatomic_column_update> nonatomic_updates;
+    std::vector<nonatomic_column_update> nonatomic_entries;
 };

 struct clustered_row_insert {
@@ -54,19 +47,14 @@ struct clustered_row_insert {
    clustering_key key;
    row_marker marker;
    std::vector<atomic_column_update> atomic_entries;
-    std::vector<nonatomic_column_deletion> nonatomic_deletions;
-    // INSERTs can't express updates of individual cells inside a non-atomic
-    // (without deleting the entire field first), so no `nonatomic_updates` field
-    // overwriting a nonatomic column inside an INSERT will be split into two changes:
-    // one with a nonatomic deletion, and one with a nonatomic update
+    std::vector<nonatomic_column_update> nonatomic_entries;
 };

 struct clustered_row_update {
    gc_clock::duration ttl;
    clustering_key key;
    std::vector<atomic_column_update> atomic_entries;
-    std::vector<nonatomic_column_deletion> nonatomic_deletions;
-    std::vector<nonatomic_column_update> nonatomic_updates;
+    std::vector<nonatomic_column_update> nonatomic_entries;
 };

 struct clustered_row_deletion {
@@ -95,8 +83,7 @@ using set_of_changes = std::map<api::timestamp_type, batch>;

 struct row_update {
    std::vector<atomic_column_update> atomic_entries;
-    std::vector<nonatomic_column_deletion> nonatomic_deletions;
-    std::vector<nonatomic_column_update> nonatomic_updates;
+    std::vector<nonatomic_column_update> nonatomic_entries;
 };

 static
@@ -122,7 +109,7 @@ extract_row_updates(const row& r, column_kind ckind, const schema& schema) {
                        v.timestamp(),
                        v.is_live_and_has_ttl() ? v.ttl() : gc_clock::duration(0)
                    );
-                auto& updates = result[timestamp_and_ttl].nonatomic_updates;
+                auto& updates = result[timestamp_and_ttl].nonatomic_entries;
                if (updates.empty() || updates.back().id != id) {
                    updates.push_back({id, {}});
                }
@@ -130,8 +117,12 @@ extract_row_updates(const row& r, column_kind ckind, const schema& schema) {
            }

            if (desc.tomb) {
-                auto timestamp_and_ttl = std::pair(desc.tomb.timestamp, gc_clock::duration(0));
-                result[timestamp_and_ttl].nonatomic_deletions.push_back({id, desc.tomb});
+                auto timestamp_and_ttl = std::pair(desc.tomb.timestamp + 1, gc_clock::duration(0));
+                auto& updates = result[timestamp_and_ttl].nonatomic_entries;
+                if (updates.empty() || updates.back().id != id) {
+                    updates.push_back({id, {}});
+                }
+                updates.back().t = std::move(desc.tomb);
            }
        });
    });
@@ -148,8 +139,7 @@ set_of_changes extract_changes(const mutation& base_mutation, const schema& base
        res[timestamp].static_updates.push_back({
                ttl,
                std::move(up.atomic_entries),
-                std::move(up.nonatomic_deletions),
-                std::move(up.nonatomic_updates)
+                std::move(up.nonatomic_entries)
            });
    }

@@ -173,6 +163,9 @@ set_of_changes extract_changes(const mutation& base_mutation, const schema& base
        };

        for (auto& [k, up]: cr_updates) {
+            // It is important that changes in the resulting `set_of_changes` are listed
+            // in increasing TTL order. The reason is explained in a comment in cdc/log.cc,
+            // search for "#6070".
            auto [timestamp, ttl] = k;

            if (is_insert(timestamp, ttl)) {
@@ -181,25 +174,70 @@ set_of_changes extract_changes(const mutation& base_mutation, const schema& base
                        cr.key(),
                        marker,
                        std::move(up.atomic_entries),
-                        std::move(up.nonatomic_deletions)
+                        {}
                    });
-                if (!up.nonatomic_updates.empty()) {
-                    // nonatomic updates cannot be expressed with an INSERT.
-                    res[timestamp].clustered_updates.push_back({
-                            ttl,
-                            cr.key(),
-                            {},
-                            {},
-                            std::move(up.nonatomic_updates)
-                        });
+
+                auto& cr_insert = res[timestamp].clustered_inserts.back();
+                bool clustered_update_exists = false;
+                for (auto& nonatomic_up: up.nonatomic_entries) {
+                    // Updating a collection column with an INSERT statement implies inserting a tombstone.
+                    //
+                    // For example, suppose that we have:
+                    //     CREATE TABLE t (a int primary key, b map<int, int>);
+                    // Then the following statement:
+                    //     INSERT INTO t (a, b) VALUES (0, {0:0}) USING TIMESTAMP T;
+                    // creates a tombstone in column b with timestamp T-1.
+                    // It also creates a cell (0, 0) with timestamp T.
+                    //
+                    // There is no way to create just the cell using an INSERT statement.
+                    // This can only be done using an UPDATE, as follows:
+                    //     UPDATE t USING TIMESTAMP T SET b = b + {0:0} WHERE a = 0;
+                    // Note that this is different from
+                    //     UPDATE t USING TIMESTAMP T SET b = {0:0} WHERE a = 0;
+                    // which also creates a tombstone with timestamp T-1.
+                    //
+                    // It follows that:
+                    // - if `nonatomic_up` has a tombstone, it can be merged with our `cr_insert`,
+                    //   which represents an INSERT change.
+                    // - but if `nonatomic_up` only has cells, we must create a separate UPDATE change
+                    //   for the cells alone.
+                    if (nonatomic_up.t) {
+                        cr_insert.nonatomic_entries.push_back(std::move(nonatomic_up));
+                    } else {
+                        if (!clustered_update_exists) {
+                            res[timestamp].clustered_updates.push_back({
+                                ttl,
+                                cr.key(),
+                                {},
+                                {}
+                            });
+
+                            // Multiple iterations of this `for` loop (for different collection columns)
+                            // might want to put their `nonatomic_up`s into an UPDATE change;
+                            // but we don't want to create a separate change for each of them, reusing one instead.
+                            //
+                            // Example:
+                            // CREATE TABLE t (a int primary key, b map<int, int>, c map<int, int>) WITH cdc = {'enabled':true};
+                            // INSERT INTO t (a, b, c) VALUES (0, {1:1}, {2:2}) USING TTL 5;
+                            //
+                            // this should create 3 delta rows:
+                            // 1. one for the row marker (indicating an INSERT), with TTL 5
+                            // 2. one for the b and c tombstones, without TTL (cdc$ttl = null)
+                            // 3. one for the b and c cells, with TTL 5
+                            // This logic ensures that b cells and c cells are put into a single change (3. above).
+                            clustered_update_exists = true;
+                        }
+
+                        auto& cr_update = res[timestamp].clustered_updates.back();
+                        cr_update.nonatomic_entries.push_back(std::move(nonatomic_up));
+                    }
                }
            } else {
                res[timestamp].clustered_updates.push_back({
                        ttl,
                        cr.key(),
                        std::move(up.atomic_entries),
-                        std::move(up.nonatomic_deletions),
-                        std::move(up.nonatomic_updates)
+                        std::move(up.nonatomic_entries)
                    });
            }
        }
@@ -271,7 +309,7 @@ bool should_split(const mutation& base_mutation, const schema& base_schema) {
            }

            if (desc.tomb) {
-                if (check_or_set(desc.tomb.timestamp, gc_clock::duration(0))) {
+                if (check_or_set(desc.tomb.timestamp + 1, gc_clock::duration(0))) {
                    should_split = true;
                    return;
                }
@@ -326,7 +364,7 @@ bool should_split(const mutation& base_mutation, const schema& base_schema) {
                }

                if (mview.tomb) {
-                    if (check_or_set(mview.tomb.timestamp, gc_clock::duration(0))) {
+                    if (check_or_set(mview.tomb.timestamp + 1, gc_clock::duration(0))) {
                        should_split = true;
                        return;
                    }
@@ -392,13 +430,9 @@ void for_each_change(const mutation& base_mutation, const schema_ptr& base_schem
                auto& cdef = base_schema->column_at(column_kind::static_column, atomic_update.id);
                m.set_static_cell(cdef, std::move(atomic_update.cell));
            }
-            for (auto& nonatomic_delete : sr_update.nonatomic_deletions) {
-                auto& cdef = base_schema->column_at(column_kind::static_column, nonatomic_delete.id);
-                m.set_static_cell(cdef, collection_mutation_description{nonatomic_delete.t, {}}.serialize(*cdef.type));
-            }
-            for (auto& nonatomic_update : sr_update.nonatomic_updates) {
+            for (auto& nonatomic_update : sr_update.nonatomic_entries) {
                auto& cdef = base_schema->column_at(column_kind::static_column, nonatomic_update.id);
-                m.set_static_cell(cdef, collection_mutation_description{{}, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
+                m.set_static_cell(cdef, collection_mutation_description{nonatomic_update.t, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
            }
            f(std::move(m), change_ts, tuuid, batch_no);
        }
@@ -411,9 +445,9 @@ void for_each_change(const mutation& base_mutation, const schema_ptr& base_schem
                auto& cdef = base_schema->column_at(column_kind::regular_column, atomic_update.id);
                row.cells().apply(cdef, std::move(atomic_update.cell));
            }
-            for (auto& nonatomic_delete : cr_insert.nonatomic_deletions) {
-                auto& cdef = base_schema->column_at(column_kind::regular_column, nonatomic_delete.id);
-                row.cells().apply(cdef, collection_mutation_description{nonatomic_delete.t, {}}.serialize(*cdef.type));
+            for (auto& nonatomic_update : cr_insert.nonatomic_entries) {
+                auto& cdef = base_schema->column_at(column_kind::regular_column, nonatomic_update.id);
+                row.cells().apply(cdef, collection_mutation_description{nonatomic_update.t, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
            }
            row.apply(cr_insert.marker);

@@ -428,13 +462,9 @@ void for_each_change(const mutation& base_mutation, const schema_ptr& base_schem
                auto& cdef = base_schema->column_at(column_kind::regular_column, atomic_update.id);
                row.apply(cdef, std::move(atomic_update.cell));
            }
-            for (auto& nonatomic_delete : cr_update.nonatomic_deletions) {
-                auto& cdef = base_schema->column_at(column_kind::regular_column, nonatomic_delete.id);
-                row.apply(cdef, collection_mutation_description{nonatomic_delete.t, {}}.serialize(*cdef.type));
-            }
-            for (auto& nonatomic_update : cr_update.nonatomic_updates) {
+            for (auto& nonatomic_update : cr_update.nonatomic_entries) {
                auto& cdef = base_schema->column_at(column_kind::regular_column, nonatomic_update.id);
-                row.apply(cdef, collection_mutation_description{{}, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
+                row.apply(cdef, collection_mutation_description{nonatomic_update.t, std::move(nonatomic_update.cells)}.serialize(*cdef.type));
            }

            f(std::move(m), change_ts, tuuid, batch_no);
--- a/clustering_bounds_comparator.hh
+++ b/clustering_bounds_comparator.hh
@@ -122,26 +122,26 @@ public:
        return {_empty_prefix, bound_kind::incl_end};
    }
    template<template<typename> typename R>
-    GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
+    requires Range<R, clustering_key_prefix_view>
    static bound_view from_range_start(const R<clustering_key_prefix>& range) {
        return range.start()
               ? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start)
               : bottom();
    }
    template<template<typename> typename R>
-    GCC6_CONCEPT( requires Range<R, clustering_key_prefix> )
+    requires Range<R, clustering_key_prefix>
    static bound_view from_range_end(const R<clustering_key_prefix>& range) {
        return range.end()
               ? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end)
               : top();
    }
    template<template<typename> typename R>
-    GCC6_CONCEPT( requires Range<R, clustering_key_prefix> )
+    requires Range<R, clustering_key_prefix>
    static std::pair<bound_view, bound_view> from_range(const R<clustering_key_prefix>& range) {
        return {from_range_start(range), from_range_end(range)};
    }
    template<template<typename> typename R>
-    GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
+    requires Range<R, clustering_key_prefix_view>
    static std::optional<typename R<clustering_key_prefix_view>::bound> to_range_bound(const bound_view& bv) {
        if (&bv._prefix.get() == &_empty_prefix) {
            return {};
--- a/collection_mutation.cc
+++ b/collection_mutation.cc
@@ -61,7 +61,7 @@ bool collection_mutation_view::is_empty() const {
 }

 template <typename F>
-GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
+requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>
 static bool is_any_live(const atomic_cell_value_view& data, tombstone tomb, gc_clock::time_point now, F&& read_cell_type_info) {
    auto in = collection_mutation_input_stream(data);
    auto has_tomb = in.read_trivial<bool>();
@@ -108,7 +108,7 @@ bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone
 }

 template <typename F>
-GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
+requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>
 static api::timestamp_type last_update(const atomic_cell_value_view& data, F&& read_cell_type_info) {
    auto in = collection_mutation_input_stream(data);
    api::timestamp_type max = api::missing_timestamp;
@@ -313,7 +313,7 @@ collection_mutation collection_mutation_view_description::serialize(const abstra
 }

 template <typename C>
-GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
+requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>
 static collection_mutation_view_description
 merge(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type) {
    using element_type = std::pair<bytes_view, atomic_cell_view>;
@@ -375,7 +375,7 @@ collection_mutation merge(const abstract_type& type, collection_mutation_view a,
 }

 template <typename C>
-GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
+requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>
 static collection_mutation_view_description
 difference(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type)
 {
@@ -421,7 +421,7 @@ collection_mutation difference(const abstract_type& type, collection_mutation_vi
 }

 template <typename F>
-GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, collection_mutation_input_stream&>)
+requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, collection_mutation_input_stream&>
 static collection_mutation_view_description
 deserialize_collection_mutation(collection_mutation_input_stream& in, F&& read_kv) {
    collection_mutation_view_description ret;
--- a/compaction_strategy.hh
+++ b/compaction_strategy.hh
@@ -23,11 +23,13 @@

 #include <seastar/core/future.hh>
 #include <seastar/util/noncopyable_function.hh>
+#include <seastar/core/file.hh>

 #include "schema_fwd.hh"
 #include "sstables/shared_sstable.hh"
 #include "exceptions/exceptions.hh"
 #include "sstables/compaction_backlog_manager.hh"
+#include "compaction_strategy_type.hh"

 class table;
 using column_family = table;
@@ -37,15 +39,6 @@ struct mutation_source_metadata;

 namespace sstables {

-enum class compaction_strategy_type {
-    null,
-    major,
-    size_tiered,
-    leveled,
-    date_tiered,
-    time_window,
-};
-
 class compaction_strategy_impl;
 class sstable;
 class sstable_set;
@@ -70,8 +63,6 @@ public:

    compaction_descriptor get_major_compaction_job(column_family& cf, std::vector<shared_sstable> candidates);

-    std::vector<resharding_descriptor> get_resharding_jobs(column_family& cf, std::vector<shared_sstable> candidates);
-
    // Some strategies may look at the compacted and resulting sstables to
    // get some useful information for subsequent compactions.
    void notify_completion(const std::vector<shared_sstable>& removed, const std::vector<shared_sstable>& added);
@@ -143,6 +134,20 @@ public:

    // Returns whether or not interposer consumer is used by a given strategy.
    bool use_interposer_consumer() const;
+
+    // Informs the caller (usually the compaction manager) about what it would take to bring this
+    // set of SSTables closer to being in-strategy. If this returns an empty compaction descriptor,
+    // the sstable set is already in-strategy.
+    //
+    // The caller can specify one of two modes: strict or relaxed. In relaxed mode the tolerance for
+    // what is considered off-strategy is higher. It can be used, for instance, when the system
+    // is restarting and previous compactions were likely in-flight. In strict mode, we are less
+    // tolerant of invariant breakages.
+    //
+    // The caller should also pass the maximum number of SSTables that can be added
+    // into a single job.
+    compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, const ::io_priority_class& iop, reshape_mode mode);
+
 };

 // Creates a compaction_strategy object from one of the strategies available.
--- a/compaction_strategy_type.hh
+++ b/compaction_strategy_type.hh
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2020 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+namespace sstables {
+
+enum class compaction_strategy_type {
+    null,
+    major,
+    size_tiered,
+    leveled,
+    date_tiered,
+    time_window,
+};
+
+enum class reshape_mode { strict, relaxed };
+}
--- a/compound.hh
+++ b/compound.hh
@@ -29,7 +29,6 @@
 #include <boost/range/adaptor/transformed.hpp>
 #include "utils/serialization.hh"
 #include <seastar/util/backtrace.hh>
-#include "unimplemented.hh"

 enum class allow_prefixes { no, yes };

@@ -91,7 +90,7 @@ private:
        return len;
    }
 public:
-    bytes serialize_single(bytes&& v) {
+    bytes serialize_single(bytes&& v) const {
        return serialize_value({std::move(v)});
    }
    template<typename RangeOfSerializedComponents>
@@ -109,7 +108,7 @@ public:
    static bytes serialize_value(std::initializer_list<T> values) {
        return serialize_value(boost::make_iterator_range(values.begin(), values.end()));
    }
-    bytes serialize_optionals(const std::vector<bytes_opt>& values) {
+    bytes serialize_optionals(const std::vector<bytes_opt>& values) const {
        return serialize_value(values | boost::adaptors::transformed([] (const bytes_opt& bo) -> bytes_view {
            if (!bo) {
                throw std::logic_error("attempted to create key component from empty optional");
@@ -117,7 +116,7 @@ public:
            return *bo;
        }));
    }
-    bytes serialize_value_deep(const std::vector<data_value>& values) {
+    bytes serialize_value_deep(const std::vector<data_value>& values) const {
        // TODO: Optimize
        std::vector<bytes> partial;
        partial.reserve(values.size());
@@ -128,7 +127,7 @@ public:
        }
        return serialize_value(partial);
    }
-    bytes decompose_value(const value_type& values) {
+    bytes decompose_value(const value_type& values) const {
        return serialize_value(values);
    }
    class iterator : public std::iterator<std::input_iterator_tag, const bytes_view> {
@@ -180,7 +179,7 @@ public:
    static boost::iterator_range<iterator> components(const bytes_view& v) {
        return { begin(v), end(v) };
    }
-    value_type deserialize_value(bytes_view v) {
+    value_type deserialize_value(bytes_view v) const {
        std::vector<bytes> result;
        result.reserve(_types.size());
        std::transform(begin(v), end(v), std::back_inserter(result), [] (auto&& v) {
@@ -188,10 +187,10 @@ public:
        });
        return result;
    }
-    bool less(bytes_view b1, bytes_view b2) {
+    bool less(bytes_view b1, bytes_view b2) const {
        return compare(b1, b2) < 0;
    }
-    size_t hash(bytes_view v) {
+    size_t hash(bytes_view v) const {
        if (_byte_order_equal) {
            return std::hash<bytes_view>()(v);
        }
@@ -203,7 +202,7 @@ public:
        }
        return h;
    }
-    int compare(bytes_view b1, bytes_view b2) {
+    int compare(bytes_view b1, bytes_view b2) const {
        if (_byte_order_comparable) {
            if (_is_reversed) {
                return compare_unsigned(b2, b1);
@@ -224,11 +223,21 @@ public:
    bool is_empty(bytes_view v) const {
        return begin(v) == end(v);
    }
-    void validate(bytes_view v) {
-        // FIXME: implement
-        warn(unimplemented::cause::VALIDATION);
+    void validate(bytes_view v) const {
+        std::vector<bytes_view> values(begin(v), end(v));
+        if (AllowPrefixes == allow_prefixes::no && values.size() < _types.size()) {
+            throw marshal_exception(fmt::format("compound::validate(): non-prefixable compound cannot be a prefix"));
+        }
+        if (values.size() > _types.size()) {
+            throw marshal_exception(fmt::format("compound::validate(): cannot have more values than types, have {} values but only {} types",
+                        values.size(), _types.size()));
+        }
+        for (size_t i = 0; i != values.size(); ++i) {
+            // FIXME: is it safe to assume the internal serialization format?
+            _types[i]->validate(values[i], cql_serialization_format::internal());
+        }
    }
-    bool equal(bytes_view v1, bytes_view v2) {
+    bool equal(bytes_view v1, bytes_view v2) const {
        if (_byte_order_equal) {
            return compare_unsigned(v1, v2) == 0;
        }
--- a/compound_compat.hh
+++ b/compound_compat.hh
@@ -213,6 +213,8 @@ public:
            , _is_compound(true)
    { }

+    explicit composite(const composite_view& v);
+
    composite()
            : _bytes()
            , _is_compound(true)
@@ -503,6 +505,7 @@ public:
 };

 class composite_view final {
+    friend class composite;
    bytes_view _bytes;
    bool _is_compound;
 public:
@@ -602,6 +605,11 @@ public:
    }
 };

+inline
+composite::composite(const composite_view& v)
+    : composite(bytes(v._bytes), v._is_compound)
+{ }
+
 inline
 std::ostream& operator<<(std::ostream& os, const composite& v) {
    return os << composite_view(v);
--- a/concrete_types.hh
+++ b/concrete_types.hh
@@ -152,41 +152,39 @@ struct uuid_type_impl final : public concrete_type<utils::UUID> {

 template <typename Func> using visit_ret_type = std::invoke_result_t<Func, const ascii_type_impl&>;

-GCC6_CONCEPT(
-template <typename Func> concept bool CanHandleAllTypes = requires(Func f) {
-    { f(*static_cast<const ascii_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const boolean_type_impl*>(nullptr)) }     -> visit_ret_type<Func>;
-    { f(*static_cast<const byte_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const bytes_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const counter_type_impl*>(nullptr)) }     -> visit_ret_type<Func>;
-    { f(*static_cast<const date_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const decimal_type_impl*>(nullptr)) }     -> visit_ret_type<Func>;
-    { f(*static_cast<const double_type_impl*>(nullptr)) }      -> visit_ret_type<Func>;
-    { f(*static_cast<const duration_type_impl*>(nullptr)) }    -> visit_ret_type<Func>;
-    { f(*static_cast<const empty_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const float_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const inet_addr_type_impl*>(nullptr)) }   -> visit_ret_type<Func>;
-    { f(*static_cast<const int32_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const list_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const long_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const map_type_impl*>(nullptr)) }         -> visit_ret_type<Func>;
-    { f(*static_cast<const reversed_type_impl*>(nullptr)) }    -> visit_ret_type<Func>;
-    { f(*static_cast<const set_type_impl*>(nullptr)) }         -> visit_ret_type<Func>;
-    { f(*static_cast<const short_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const simple_date_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
-    { f(*static_cast<const time_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const timestamp_type_impl*>(nullptr)) }   -> visit_ret_type<Func>;
-    { f(*static_cast<const timeuuid_type_impl*>(nullptr)) }    -> visit_ret_type<Func>;
-    { f(*static_cast<const tuple_type_impl*>(nullptr)) }       -> visit_ret_type<Func>;
-    { f(*static_cast<const user_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const utf8_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const uuid_type_impl*>(nullptr)) }        -> visit_ret_type<Func>;
-    { f(*static_cast<const varint_type_impl*>(nullptr)) }      -> visit_ret_type<Func>;
+template <typename Func> concept CanHandleAllTypes = requires(Func f) {
+    { f(*static_cast<const ascii_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const boolean_type_impl*>(nullptr)) }     -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const byte_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const bytes_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const counter_type_impl*>(nullptr)) }     -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const date_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const decimal_type_impl*>(nullptr)) }     -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const double_type_impl*>(nullptr)) }      -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const duration_type_impl*>(nullptr)) }    -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const empty_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const float_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const inet_addr_type_impl*>(nullptr)) }   -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const int32_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const list_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const long_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const map_type_impl*>(nullptr)) }         -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const reversed_type_impl*>(nullptr)) }    -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const set_type_impl*>(nullptr)) }         -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const short_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const simple_date_type_impl*>(nullptr)) } -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const time_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const timestamp_type_impl*>(nullptr)) }   -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const timeuuid_type_impl*>(nullptr)) }    -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const tuple_type_impl*>(nullptr)) }       -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const user_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const utf8_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const uuid_type_impl*>(nullptr)) }        -> std::same_as<visit_ret_type<Func>>;
+    { f(*static_cast<const varint_type_impl*>(nullptr)) }      -> std::same_as<visit_ret_type<Func>>;
 };
-)

 template<typename Func>
-GCC6_CONCEPT(requires CanHandleAllTypes<Func>)
+requires CanHandleAllTypes<Func>
 static inline visit_ret_type<Func> visit(const abstract_type& t, Func&& f) {
    switch (t.get_kind()) {
    case abstract_type::kind::ascii:
--- a/configure.py
+++ b/configure.py
@@ -32,6 +32,8 @@ import tempfile
 import textwrap
 from distutils.spawn import find_executable

+curdir = os.getcwd()
+
 tempfile.tempdir = "./build/tmp"

 configure_args = str.join(' ', [shlex.quote(x) for x in sys.argv[1:]])
@@ -166,9 +168,27 @@ def maybe_static(flag, libs):
    return libs


-class Thrift(object):
-    def __init__(self, source, service):
+class Source(object):
+    def __init__(self, source, hh_prefix, cc_prefix):
        self.source = source
+        self.hh_prefix = hh_prefix
+        self.cc_prefix = cc_prefix
+
+    def headers(self, gen_dir):
+        return [x for x in self.generated(gen_dir) if x.endswith(self.hh_prefix)]
+
+    def sources(self, gen_dir):
+        return [x for x in self.generated(gen_dir) if x.endswith(self.cc_prefix)]
+
+    def objects(self, gen_dir):
+        return [x.replace(self.cc_prefix, '.o') for x in self.sources(gen_dir)]
+
+    def endswith(self, end):
+        return self.source.endswith(end)
+
+class Thrift(Source):
+    def __init__(self, source, service):
+        Source.__init__(self, source, '.h', '.cpp')
        self.service = service

    def generated(self, gen_dir):
@@ -179,19 +199,6 @@ class Thrift(object):
                  for ext in ['.cpp', '.h']]
        return [os.path.join(gen_dir, file) for file in files]

-    def headers(self, gen_dir):
-        return [x for x in self.generated(gen_dir) if x.endswith('.h')]
-
-    def sources(self, gen_dir):
-        return [x for x in self.generated(gen_dir) if x.endswith('.cpp')]
-
-    def objects(self, gen_dir):
-        return [x.replace('.cpp', '.o') for x in self.sources(gen_dir)]
-
-    def endswith(self, end):
-        return self.source.endswith(end)
-
-
 def default_target_arch():
    if platform.machine() in ['i386', 'i686', 'x86_64']:
        return 'westmere'   # support PCLMUL
@@ -201,9 +208,9 @@ def default_target_arch():
        return ''


-class Antlr3Grammar(object):
+class Antlr3Grammar(Source):
    def __init__(self, source):
-        self.source = source
+        Source.__init__(self, source, '.hpp', '.cpp')

    def generated(self, gen_dir):
        basename = os.path.splitext(self.source)[0]
@@ -211,18 +218,12 @@ class Antlr3Grammar(object):
                 for ext in ['Lexer.cpp', 'Lexer.hpp', 'Parser.cpp', 'Parser.hpp']]
        return [os.path.join(gen_dir, file) for file in files]

-    def headers(self, gen_dir):
-        return [x for x in self.generated(gen_dir) if x.endswith('.hpp')]
-
-    def sources(self, gen_dir):
-        return [x for x in self.generated(gen_dir) if x.endswith('.cpp')]
-
-    def objects(self, gen_dir):
-        return [x.replace('.cpp', '.o') for x in self.sources(gen_dir)]
-
-    def endswith(self, end):
-        return self.source.endswith(end)
+class Json2Code(Source):
+    def __init__(self, source):
+        Source.__init__(self, source, '.hh', '.cc')

+    def generated(self, gen_dir):
+        return [os.path.join(gen_dir, self.source + '.hh'), os.path.join(gen_dir, self.source + '.cc')]

 def find_headers(repodir, excluded_dirs):
    walker = os.walk(repodir)
@@ -248,7 +249,7 @@ def find_headers(repodir, excluded_dirs):

 modes = {
    'debug': {
-        'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION',
+        'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
        'cxx_ld_flags': '-Wstack-usage=%s' % (1024*40),
    },
    'release': {
@@ -269,6 +270,7 @@ scylla_tests = set([
    'test/boost/UUID_test',
    'test/boost/aggregate_fcts_test',
    'test/boost/allocation_strategy_test',
+    'test/boost/alternator_base64_test',
    'test/boost/anchorless_list_test',
    'test/boost/auth_passwords_test',
    'test/boost/auth_resource_test',
@@ -278,6 +280,7 @@ scylla_tests = set([
    'test/boost/broken_sstable_test',
    'test/boost/bytes_ostream_test',
    'test/boost/cache_flat_mutation_reader_test',
+    'test/boost/cached_file_test',
    'test/boost/caching_options_test',
    'test/boost/canonical_mutation_test',
    'test/boost/cartesian_product_test',
@@ -326,6 +329,7 @@ scylla_tests = set([
    'test/boost/linearizing_input_stream_test',
    'test/boost/loading_cache_test',
    'test/boost/log_heap_test',
+    'test/boost/estimated_histogram_test',
    'test/boost/logalloc_test',
    'test/boost/managed_vector_test',
    'test/boost/map_difference_test',
@@ -365,6 +369,7 @@ scylla_tests = set([
    'test/boost/schema_changes_test',
    'test/boost/sstable_conforms_to_mutation_source_test',
    'test/boost/sstable_resharding_test',
+    'test/boost/sstable_directory_test',
    'test/boost/sstable_test',
    'test/boost/storage_proxy_test',
    'test/boost/top_k_test',
@@ -414,12 +419,13 @@ perf_tests = set([
    'test/perf/perf_mutation_fragment',
    'test/perf/perf_idl',
    'test/perf/perf_vint',
+    'test/perf/perf_big_decimal',
 ])

 apps = set([
    'scylla',
    'test/tools/cql_repl',
-    'tools/scylla_types',
+    'tools/scylla-types',
 ])

 tests = scylla_tests | perf_tests
@@ -453,8 +459,8 @@ arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='gcc'
                        help='C compiler path')
 arg_parser.add_argument('--with-osv', action='store', dest='with_osv', default='',
                        help='Shortcut for compile for OSv')
-arg_parser.add_argument('--enable-dpdk', action='store_true', dest='dpdk', default=False,
-                        help='Enable dpdk (from seastar dpdk sources)')
+add_tristate(arg_parser, name='dpdk', dest='dpdk',
+                        help='Use dpdk (from seastar dpdk sources) (default=True for release builds)')
 arg_parser.add_argument('--dpdk-target', action='store', dest='dpdk_target', default='',
                        help='Path to DPDK SDK target location (e.g. <DPDK SDK dir>/x86_64-native-linuxapp-gcc)')
 arg_parser.add_argument('--debuginfo', action='store', dest='debuginfo', type=int, default=1,
@@ -473,8 +479,6 @@ arg_parser.add_argument('--python', action='store', dest='python', default='pyth
                        help='Python3 path')
 arg_parser.add_argument('--split-dwarf', dest='split_dwarf', action='store_true', default=False,
                        help='use of split dwarf (https://gcc.gnu.org/wiki/DebugFission) to speed up linking')
-arg_parser.add_argument('--enable-gcc6-concepts', dest='gcc6_concepts', action='store_true', default=False,
-                        help='enable experimental support for C++ Concepts as implemented in GCC 6')
 arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_injector', action='store_true', default=False,
                        help='enable allocation failure injection')
 arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
@@ -493,6 +497,7 @@ extra_cxxflags = {}
 cassandra_interface = Thrift(source='interface/cassandra.thrift', service='Cassandra')

 scylla_core = (['database.cc',
+                'absl-flat_hash_map.cc',
                'table.cc',
                'atomic_cell.cc',
                'collection_mutation.cc',
@@ -511,13 +516,13 @@ scylla_core = (['database.cc',
                'frozen_mutation.cc',
                'memtable.cc',
                'schema_mutations.cc',
-                'supervisor.cc',
                'utils/logalloc.cc',
                'utils/large_bitset.cc',
                'utils/buffer_input_stream.cc',
                'utils/limiting_data_source.cc',
                'utils/updateable_value.cc',
                'utils/directories.cc',
+                'utils/generation-number.cc',
                'mutation_partition.cc',
                'mutation_partition_view.cc',
                'mutation_partition_serializer.cc',
@@ -546,9 +551,11 @@ scylla_core = (['database.cc',
                'sstables/integrity_checked_file_impl.cc',
                'sstables/prepended_input_stream.cc',
                'sstables/m_format_read_helpers.cc',
+                'sstables/sstable_directory.cc',
                'transport/event.cc',
                'transport/event_notifier.cc',
                'transport/server.cc',
+                'transport/controller.cc',
                'transport/messages/result_message.cc',
                'cdc/cdc_partitioner.cc',
                'cdc/log.cc',
@@ -571,6 +578,7 @@ scylla_core = (['database.cc',
                'cql3/functions/functions.cc',
                'cql3/functions/aggregate_fcts.cc',
                'cql3/functions/castas_fcts.cc',
+                'cql3/functions/error_injection_fcts.cc',
                'cql3/statements/cf_prop_defs.cc',
                'cql3/statements/cf_statement.cc',
                'cql3/statements/authentication_statement.cc',
@@ -617,6 +625,7 @@ scylla_core = (['database.cc',
                'cql3/role_name.cc',
                'thrift/handler.cc',
                'thrift/server.cc',
+                'thrift/controller.cc',
                'thrift/thrift_validation.cc',
                'utils/runtime.cc',
                'utils/murmur_hash.cc',
@@ -674,6 +683,7 @@ scylla_core = (['database.cc',
                'db/view/view.cc',
                'db/view/view_update_generator.cc',
                'db/view/row_locking.cc',
+                'db/sstables-format-selector.cc',
                'index/secondary_index_manager.cc',
                'index/secondary_index.cc',
                'utils/UUID_gen.cc',
@@ -795,41 +805,41 @@ scylla_core = (['database.cc',
               )

 api = ['api/api.cc',
-       'api/api-doc/storage_service.json',
-       'api/api-doc/lsa.json',
+       Json2Code('api/api-doc/storage_service.json'),
+       Json2Code('api/api-doc/lsa.json'),
       'api/storage_service.cc',
-       'api/api-doc/commitlog.json',
+       Json2Code('api/api-doc/commitlog.json'),
       'api/commitlog.cc',
-       'api/api-doc/gossiper.json',
+       Json2Code('api/api-doc/gossiper.json'),
       'api/gossiper.cc',
-       'api/api-doc/failure_detector.json',
+       Json2Code('api/api-doc/failure_detector.json'),
       'api/failure_detector.cc',
-       'api/api-doc/column_family.json',
+       Json2Code('api/api-doc/column_family.json'),
       'api/column_family.cc',
       'api/messaging_service.cc',
-       'api/api-doc/messaging_service.json',
-       'api/api-doc/storage_proxy.json',
+       Json2Code('api/api-doc/messaging_service.json'),
+       Json2Code('api/api-doc/storage_proxy.json'),
       'api/storage_proxy.cc',
-       'api/api-doc/cache_service.json',
+       Json2Code('api/api-doc/cache_service.json'),
       'api/cache_service.cc',
-       'api/api-doc/collectd.json',
+       Json2Code('api/api-doc/collectd.json'),
       'api/collectd.cc',
-       'api/api-doc/endpoint_snitch_info.json',
+       Json2Code('api/api-doc/endpoint_snitch_info.json'),
       'api/endpoint_snitch.cc',
-       'api/api-doc/compaction_manager.json',
+       Json2Code('api/api-doc/compaction_manager.json'),
       'api/compaction_manager.cc',
-       'api/api-doc/hinted_handoff.json',
+       Json2Code('api/api-doc/hinted_handoff.json'),
       'api/hinted_handoff.cc',
-       'api/api-doc/utils.json',
+       Json2Code('api/api-doc/utils.json'),
       'api/lsa.cc',
-       'api/api-doc/stream_manager.json',
+       Json2Code('api/api-doc/stream_manager.json'),
       'api/stream_manager.cc',
-       'api/api-doc/system.json',
+       Json2Code('api/api-doc/system.json'),
       'api/system.cc',
       'api/config.cc',
-       'api/api-doc/config.json',
-        'api/error_injection.cc',
-        'api/api-doc/error_injection.json',
+       Json2Code('api/api-doc/config.json'),
+       'api/error_injection.cc',
+       Json2Code('api/api-doc/error_injection.json'),
       ]

 alternator = [
@@ -895,6 +905,8 @@ scylla_tests_generic_dependencies = [
    'test/lib/cql_test_env.cc',
    'test/lib/test_services.cc',
    'test/lib/log.cc',
+    'test/lib/reader_permit.cc',
+    'test/lib/test_utils.cc',
 ]

 scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
@@ -911,7 +923,7 @@ deps = {
    'scylla': idls + ['main.cc', 'release.cc', 'build_id.cc'] + scylla_core + api + alternator + redis,
    'test/tools/cql_repl': idls + ['test/tools/cql_repl.cc'] + scylla_core + scylla_tests_generic_dependencies,
    #FIXME: we don't need all of scylla_core here, only the types module, need to modularize scylla_core.
-    'tools/scylla_types': idls + ['tools/scylla_types.cc'] + scylla_core,
+    'tools/scylla-types': idls + ['tools/scylla-types.cc'] + scylla_core,
 }

 pure_boost_tests = set([
@@ -950,6 +962,7 @@ pure_boost_tests = set([
 ])

 tests_not_using_seastar_test_framework = set([
+    'test/boost/alternator_base64_test',
    'test/boost/small_vector_test',
    'test/manual/gossip',
    'test/manual/message',
@@ -1000,6 +1013,7 @@ deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc',
 deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
 deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
 deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
+deps['test/boost/estimated_histogram_test'] = ['test/boost/estimated_histogram_test.cc']
 deps['test/boost/anchorless_list_test'] = ['test/boost/anchorless_list_test.cc']
 deps['test/perf/perf_fast_forward'] += ['release.cc']
 deps['test/perf/perf_simple_query'] += ['release.cc']
@@ -1019,6 +1033,7 @@ deps['test/boost/linearizing_input_stream_test'] = [
 ]

 deps['test/boost/duration_test'] += ['test/lib/exception_utils.cc']
+deps['test/boost/alternator_base64_test'] += ['alternator/base64.cc']

 deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']

@@ -1081,34 +1096,14 @@ else:
 # a list element means a list of alternative packages to consider
 # the first element becomes the HAVE_pkg define
 # a string element is a package name with no alternatives
-optional_packages = [['libsystemd', 'libsystemd-daemon']]
+optional_packages = [[]]
 pkgs = []

 # Lua can be provided by lua53 package on Debian-like
 # systems and by Lua on others.
 pkgs.append('lua53' if have_pkg('lua53') else 'lua')

-
-def setup_first_pkg_of_list(pkglist):
-    # The HAVE_pkg symbol is taken from the first alternative
-    upkg = pkglist[0].upper().replace('-', '_')
-    for pkg in pkglist:
-        if have_pkg(pkg):
-            pkgs.append(pkg)
-            defines.append('HAVE_{}=1'.format(upkg))
-            return True
-    return False
-
-
-for pkglist in optional_packages:
-    if isinstance(pkglist, str):
-        pkglist = [pkglist]
-    if not setup_first_pkg_of_list(pkglist):
-        if len(pkglist) == 1:
-            print('Missing optional package {pkglist[0]}'.format(**locals()))
-        else:
-            alternatives = ':'.join(pkglist[1:])
-            print('Missing optional package {pkglist[0]} (or alteratives {alternatives})'.format(**locals()))
+pkgs.append('libsystemd')


 compiler_test_src = '''
@@ -1181,8 +1176,24 @@ extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\
 for m in ['debug', 'release', 'sanitize']:
    modes[m]['cxxflags'] += ' ' + dbgflag

-get_dynamic_linker_output = subprocess.check_output(['./reloc/get-dynamic-linker.sh'], shell=True)
-dynamic_linker = get_dynamic_linker_output.decode('utf-8').strip()
+# The relocatable package includes its own dynamic linker. We don't
+# know the path it will be installed to, so for now use a very long
+# path so that patchelf doesn't need to edit the program headers.  The
+# kernel imposes a limit of 4096 bytes including the null. The other
+# constraint is that the build-id has to be in the first page, so we
+# can't use all 4096 bytes for the dynamic linker.
+# In here we just guess that 2000 extra / should be enough to cover
+# any path we get installed to but not so large that the build-id is
+# pushed to the second page.
+# At the end of the build we check that the build-id is indeed in the
+# first page. At install time we check that patchelf doesn't modify
+# the program headers.
+
+gcc_linker_output = subprocess.check_output(['gcc', '-###', '/dev/null', '-o', 't'], stderr=subprocess.STDOUT).decode('utf-8')
+original_dynamic_linker = re.search('-dynamic-linker ([^ ]*)', gcc_linker_output).groups()[0]
+# gdb has a SO_NAME_MAX_PATH_SIZE of 512, so limit the path size to
+# that. The 512 includes the null at the end, hence the 511 below.
+dynamic_linker = '/' * (511 - len(original_dynamic_linker)) + original_dynamic_linker

 forced_ldflags = '-Wl,'

@@ -1198,13 +1209,14 @@ args.user_ldflags = forced_ldflags + ' ' + args.user_ldflags

 args.user_cflags += ' -Wno-error=stack-usage='

+args.user_cflags += f" -ffile-prefix-map={curdir}=."
+
 seastar_cflags = args.user_cflags
 if args.target != '':
    seastar_cflags += ' -march=' + args.target
 seastar_ldflags = args.user_ldflags

 libdeflate_cflags = seastar_cflags
-zstd_cflags = seastar_cflags + ' -Wno-implicit-fallthrough'

 MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }

@@ -1218,8 +1230,8 @@ def configure_seastar(build_dir, mode):
        '-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON',
        '-DSeastar_CXX_FLAGS={}'.format((seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']).replace(' ', ';')),
        '-DSeastar_LD_FLAGS={}'.format(seastar_ldflags),
-        '-DSeastar_CXX_DIALECT=gnu++17',
-        '-DSeastar_STD_OPTIONAL_VARIANT_STRINGVIEW=ON',
+        '-DSeastar_CXX_DIALECT=gnu++20',
+        '-DSeastar_API_LEVEL=4',
        '-DSeastar_UNUSED_RESULT_ERROR=ON',
    ]

@@ -1227,10 +1239,11 @@ def configure_seastar(build_dir, mode):
        stack_guards = 'ON' if args.stack_guards else 'OFF'
        seastar_cmake_args += ['-DSeastar_STACK_GUARDS={}'.format(stack_guards)]

-    if args.dpdk:
+    dpdk = args.dpdk
+    if dpdk is None:
+        dpdk = mode == 'release'
+    if dpdk:
        seastar_cmake_args += ['-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm']
-    if args.gcc6_concepts:
-        seastar_cmake_args += ['-DSeastar_GCC6_CONCEPTS=ON']
    if args.split_dwarf:
        seastar_cmake_args += ['-DSeastar_SPLIT_DWARF=ON']
    if args.alloc_failure_injector:
@@ -1238,7 +1251,7 @@ def configure_seastar(build_dir, mode):

    seastar_cmd = ['cmake', '-G', 'Ninja', os.path.relpath(args.seastar_path, seastar_build_dir)] + seastar_cmake_args
    cmake_dir = seastar_build_dir
-    if args.dpdk:
+    if dpdk:
        # need to cook first
        cmake_dir = args.seastar_path # required by cooking.sh
        relative_seastar_build_dir = os.path.join('..', seastar_build_dir)  # relative to seastar/
@@ -1271,25 +1284,6 @@ for mode in build_modes:
    modes[mode]['seastar_cflags'] = seastar_pc_cflags
    modes[mode]['seastar_libs'] = seastar_pc_libs

-# We need to use experimental features of the zstd library (to use our own allocators for the (de)compression context),
-# which are available only when the library is linked statically.
-def configure_zstd(build_dir, mode):
-    zstd_build_dir = os.path.join(build_dir, mode, 'zstd')
-
-    zstd_cmake_args = [
-        '-DCMAKE_BUILD_TYPE={}'.format(MODE_TO_CMAKE_BUILD_TYPE[mode]),
-        '-DCMAKE_C_COMPILER={}'.format(args.cc),
-        '-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
-        '-DCMAKE_C_FLAGS={}'.format(zstd_cflags),
-        '-DZSTD_BUILD_PROGRAMS=OFF'
-    ]
-
-    zstd_cmd = ['cmake', '-G', 'Ninja', os.path.relpath('zstd/build/cmake', zstd_build_dir)] + zstd_cmake_args
-
-    print(zstd_cmd)
-    os.makedirs(zstd_build_dir, exist_ok=True)
-    subprocess.check_call(zstd_cmd, shell=False, cwd=zstd_build_dir)
-
 def configure_abseil(build_dir, mode):
    abseil_build_dir = os.path.join(build_dir, mode, 'abseil')

@@ -1334,6 +1328,9 @@ args.user_cflags += " " + pkg_config('jsoncpp', '--cflags')
 args.user_cflags += ' -march=' + args.target
 libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-llz4', '-lz', '-lsnappy', pkg_config('jsoncpp', '--libs'),
                 ' -lstdc++fs', ' -lcrypt', ' -lcryptopp', ' -lpthread',
+                 # Must link with static version of libzstd, since
+                 # experimental APIs that we use are only present there.
+                 maybe_static(True, '-lzstd'),
                 maybe_static(args.staticboost, '-lboost_date_time -lboost_regex -licuuc'), ])

 pkgconfig_libs = [
@@ -1388,9 +1385,6 @@ if args.ragel_exec:
 else:
    ragel_exec = "ragel"

-for mode in build_modes:
-    configure_zstd(outdir, mode)
-
 for mode in build_modes:
    configure_abseil(outdir, mode)

@@ -1417,7 +1411,7 @@ with open(buildfile_tmp, 'w') as f:
            command = echo -e $text > $out
            description = GEN $out
        rule swagger
-            command = {args.seastar_path}/scripts/seastar-json2code.py -f $in -o $out
+            command = {args.seastar_path}/scripts/seastar-json2code.py --create-cc -f $in -o $out
            description = SWAGGER $out
        rule serializer
            command = {python} ./idl-compiler.py --ns ser -f $in -o $out
@@ -1439,6 +1433,10 @@ with open(buildfile_tmp, 'w') as f:
            description = COPY $out
        rule package
            command = scripts/create-relocatable-package.py --mode $mode $out
+        rule rpmbuild
+            command = reloc/build_rpm.sh --reloc-pkg $in --builddir $out
+        rule debbuild
+            command = reloc/build_deb.sh --reloc-pkg $in --builddir $out
        ''').format(**globals()))
    for mode in build_modes:
        modeval = modes[mode]
@@ -1446,7 +1444,7 @@ with open(buildfile_tmp, 'w') as f:
        f.write(textwrap.dedent('''\
            cxx_ld_flags_{mode} = {cxx_ld_flags}
            ld_flags_{mode} = $cxx_ld_flags_{mode}
-            cxxflags_{mode} = $cxx_ld_flags_{mode} {cxxflags} -I. -I $builddir/{mode}/gen
+            cxxflags_{mode} = $cxx_ld_flags_{mode} {cxxflags} -iquote. -iquote $builddir/{mode}/gen
            libs_{mode} = -l{fmt_lib}
            seastar_libs_{mode} = {seastar_libs}
            rule cxx.{mode}
@@ -1503,7 +1501,7 @@ with open(buildfile_tmp, 'w') as f:
            )
        )
        compiles = {}
-        swaggers = {}
+        swaggers = set()
        serializers = {}
        thrifts = set()
        ragels = {}
@@ -1525,12 +1523,13 @@ with open(buildfile_tmp, 'w') as f:
                    objs += dep.objects('$builddir/' + mode + '/gen')
                if isinstance(dep, Antlr3Grammar):
                    objs += dep.objects('$builddir/' + mode + '/gen')
+                if isinstance(dep, Json2Code):
+                    objs += dep.objects('$builddir/' + mode + '/gen')
            if binary.endswith('.a'):
                f.write('build $builddir/{}/{}: ar.{} {}\n'.format(mode, binary, mode, str.join(' ', objs)))
            else:
                objs.extend(['$builddir/' + mode + '/' + artifact for artifact in [
                    'libdeflate/libdeflate.a',
-                    'zstd/lib/libzstd.a',
                ] + [
                    'abseil/' + x for x in abseil_libs
                ]])
@@ -1565,8 +1564,7 @@ with open(buildfile_tmp, 'w') as f:
                    hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
                    serializers[hh] = src
                elif src.endswith('.json'):
-                    hh = '$builddir/' + mode + '/gen/' + src + '.hh'
-                    swaggers[hh] = src
+                    swaggers.add(src)
                elif src.endswith('.rl'):
                    hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
                    ragels[hh] = src
@@ -1608,12 +1606,14 @@ with open(buildfile_tmp, 'w') as f:
            )
        )

+        gen_dir = '$builddir/{}/gen'.format(mode)
        gen_headers = []
        for th in thrifts:
            gen_headers += th.headers('$builddir/{}/gen'.format(mode))
        for g in antlr3_grammars:
            gen_headers += g.headers('$builddir/{}/gen'.format(mode))
-        gen_headers += list(swaggers.keys())
+        for g in swaggers:
+            gen_headers += g.headers('$builddir/{}/gen'.format(mode))
        gen_headers += list(serializers.keys())
        gen_headers += list(ragels.keys())
        gen_headers_dep = ' '.join(gen_headers)
@@ -1623,9 +1623,13 @@ with open(buildfile_tmp, 'w') as f:
            f.write('build {}: cxx.{} {} || {} {}\n'.format(obj, mode, src, seastar_dep, gen_headers_dep))
            if src in extra_cxxflags:
                f.write('    cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode=mode, extra_cxxflags=extra_cxxflags[src], **modeval))
-        for hh in swaggers:
-            src = swaggers[hh]
-            f.write('build {}: swagger {} | {}/scripts/seastar-json2code.py\n'.format(hh, src, args.seastar_path))
+        for swagger in swaggers:
+            hh = swagger.headers(gen_dir)[0]
+            cc = swagger.sources(gen_dir)[0]
+            obj = swagger.objects(gen_dir)[0]
+            src = swagger.source
+            f.write('build {} | {} : swagger {} | {}/scripts/seastar-json2code.py\n'.format(hh, cc, src, args.seastar_path))
+            f.write('build {}: cxx.{} {}\n'.format(obj, mode, cc))
        for hh in serializers:
            src = serializers[hh]
            f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh, src))
@@ -1674,17 +1678,20 @@ with open(buildfile_tmp, 'w') as f:
        f.write(textwrap.dedent('''\
            build build/{mode}/iotune: copy build/{mode}/seastar/apps/iotune/iotune
            ''').format(**locals()))
-        f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE | always\n'.format(**locals()))
+        f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE build/debian/debian | always\n'.format(**locals()))
        f.write('  pool = submodule_pool\n')
        f.write('  mode = {mode}\n'.format(**locals()))
+        f.write(f'build build/dist/{mode}/redhat: rpmbuild build/{mode}/scylla-package.tar.gz\n')
+        f.write(f'  pool = submodule_pool\n')
+        f.write(f'  mode = {mode}\n')
+        f.write(f'build build/dist/{mode}/debian: debbuild build/{mode}/scylla-package.tar.gz\n')
+        f.write(f'  pool = submodule_pool\n')
+        f.write(f'  mode = {mode}\n')
+        f.write(f'build dist-server-{mode}: phony build/dist/{mode}/redhat build/dist/{mode}/debian\n')
        f.write('rule libdeflate.{mode}\n'.format(**locals()))
        f.write('  command = make -C libdeflate BUILD_DIR=../build/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../build/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
        f.write('build build/{mode}/libdeflate/libdeflate.a: libdeflate.{mode}\n'.format(**locals()))
        f.write('  pool = submodule_pool\n')
-        f.write('build build/{mode}/zstd/lib/libzstd.a: ninja\n'.format(**locals()))
-        f.write('  pool = submodule_pool\n')
-        f.write('  subdir = build/{mode}/zstd\n'.format(**locals()))
-        f.write('  target = libzstd.a\n'.format(**locals()))

        for lib in abseil_libs:
            f.write('build build/{mode}/abseil/{lib}: ninja\n'.format(**locals()))
@@ -1702,6 +1709,65 @@ with open(buildfile_tmp, 'w') as f:
            'build check: phony {}\n'.format(' '.join(['{mode}-check'.format(mode=mode) for mode in modes]))
    )

+    f.write(textwrap.dedent(f'''\
+        build dist-server-deb: phony {' '.join(['build/dist/{mode}/debian'.format(mode=mode) for mode in build_modes])}
+        build dist-server-rpm: phony {' '.join(['build/dist/{mode}/redhat'.format(mode=mode) for mode in build_modes])}
+        build dist-server: phony dist-server-rpm dist-server-deb
+
+        rule build-submodule-reloc
+          command = cd $reloc_dir && ./reloc/build_reloc.sh
+        rule build-submodule-rpm
+          command = cd $dir && ./reloc/build_rpm.sh --reloc-pkg $artifact
+        rule build-submodule-deb
+          command = cd $dir && ./reloc/build_deb.sh --reloc-pkg $artifact
+
+        build scylla-jmx/build/scylla-jmx-package.tar.gz: build-submodule-reloc
+          reloc_dir = scylla-jmx
+        build dist-jmx-rpm: build-submodule-rpm scylla-jmx/build/scylla-jmx-package.tar.gz
+          dir = scylla-jmx
+          artifact = build/scylla-jmx-package.tar.gz
+        build dist-jmx-deb: build-submodule-deb scylla-jmx/build/scylla-jmx-package.tar.gz
+          dir = scylla-jmx
+          artifact = build/scylla-jmx-package.tar.gz
+        build dist-jmx: phony dist-jmx-rpm dist-jmx-deb
+
+        build scylla-tools/build/scylla-tools-package.tar.gz: build-submodule-reloc
+          reloc_dir = scylla-tools
+        build dist-tools-rpm: build-submodule-rpm scylla-tools/build/scylla-tools-package.tar.gz
+          dir = scylla-tools
+          artifact = build/scylla-tools-package.tar.gz
+        build dist-tools-deb: build-submodule-deb scylla-tools/build/scylla-tools-package.tar.gz
+          dir = scylla-tools
+          artifact = build/scylla-tools-package.tar.gz
+        build dist-tools: phony dist-tools-rpm dist-tools-deb
+
+        rule build-python-reloc
+          command = ./reloc/python3/build_reloc.sh
+        rule build-python-rpm
+          command = ./reloc/python3/build_rpm.sh
+        rule build-python-deb
+          command = ./reloc/python3/build_deb.sh
+
+        build build/release/scylla-python3-package.tar.gz: build-python-reloc
+        build dist-python-rpm: build-python-rpm build/release/scylla-python3-package.tar.gz
+        build dist-python-deb: build-python-deb build/release/scylla-python3-package.tar.gz
+        build dist-python: phony dist-python-rpm dist-python-deb
+        build dist-deb: phony dist-server-deb dist-python-deb dist-jmx-deb dist-tools-deb
+        build dist-rpm: phony dist-server-rpm dist-python-rpm dist-jmx-rpm dist-tools-rpm
+        build dist: phony dist-server dist-python dist-jmx dist-tools
+        '''))
+
+    f.write(textwrap.dedent(f'''\
+        build dist-check: phony {' '.join(['dist-check-{mode}'.format(mode=mode) for mode in build_modes])}
+        rule dist-check
+          command = ./tools/testing/dist-check/dist-check.sh --mode $mode
+        '''))
+    for mode in build_modes:
+        f.write(textwrap.dedent(f'''\
+        build dist-check-{mode}: dist-check
+          mode = {mode}
+            '''))
+
    f.write(textwrap.dedent('''\
        rule configure
          command = {python} configure.py $configure_args
@@ -1726,6 +1792,9 @@ with open(buildfile_tmp, 'w') as f:
        rule scylla_version_gen
            command = ./SCYLLA-VERSION-GEN
        build build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE: scylla_version_gen
+        rule debian_files_gen
+            command = ./dist/debian/debian_files_gen.py
+        build build/debian/debian: debian_files_gen | always
        ''').format(modes_list=' '.join(build_modes), **globals()))

 os.rename(buildfile_tmp, buildfile)
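
Note on the dynamic-linker change in configure.py above: the new code pads the linker path with leading slashes so that the string is exactly 511 characters long (512 with the terminating null, the gdb limit quoted in the comment). The sketch below reproduces that arithmetic in isolation. It is a minimal illustration, not the project's code: the helper names `parse_dynamic_linker` and `pad_dynamic_linker` and the sample gcc output string are assumptions made for this example, while the regular expression and the 511/512 figures are taken from the diff.

```python
import re

# Limit quoted in the configure.py comment: gdb's SO_NAME_MAX_PATH_SIZE,
# which counts the terminating null byte.
GDB_SO_NAME_MAX_PATH_SIZE = 512


def parse_dynamic_linker(gcc_output):
    """Extract the -dynamic-linker argument from `gcc -###` style output.

    Hypothetical helper; it uses the same regular expression as the diff.
    """
    match = re.search(r'-dynamic-linker ([^ ]*)', gcc_output)
    if match is None:
        raise ValueError('no -dynamic-linker option found in gcc output')
    return match.group(1)


def pad_dynamic_linker(path, max_size=GDB_SO_NAME_MAX_PATH_SIZE):
    """Left-pad `path` with '/' so that it is exactly max_size - 1 characters.

    Hypothetical helper mirroring the expression
    '/' * (511 - len(original_dynamic_linker)) + original_dynamic_linker.
    """
    usable = max_size - 1  # leave room for the terminating null
    if len(path) > usable:
        raise ValueError('dynamic linker path is longer than the limit')
    return '/' * (usable - len(path)) + path


# Made-up fragment of gcc output, for illustration only.
sample_output = 'collect2 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o t'
original = parse_dynamic_linker(sample_output)
padded = pad_dynamic_linker(original)
```

The padded string still ends with the original linker path, and the reserved length is what lets patchelf later overwrite it with the real install path without editing the program headers, as the comment in the diff explains.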
--- a/counters.hh
+++ b/counters.hh
@@ -73,7 +73,9 @@ public:
        return counter_id(utils::make_random_uuid());
    }
 };
-static_assert(std::is_pod<counter_id>::value, "counter_id should be a POD type");
+static_assert(
+        std::is_standard_layout_v<counter_id> && std::is_trivial_v<counter_id>,
+        "counter_id should be a POD type");

 std::ostream& operator<<(std::ostream& os, const counter_id& id);

@@ -154,10 +156,10 @@ private:
    // Shared logic for applying counter_shards and counter_shard_views.
    // T is either counter_shard or basic_counter_shard_view<U>.
    template<typename T>
-    GCC6_CONCEPT(requires requires(T shard) {
-        { shard.value() } -> int64_t;
-        { shard.logical_clock() } -> int64_t;
-    })
+    requires requires(T shard) {
+        { shard.value() } -> std::same_as<int64_t>;
+        { shard.logical_clock() } -> std::same_as<int64_t>;
+    }
    counter_shard& do_apply(T&& other) noexcept {
        auto other_clock = other.logical_clock();
        if (_logical_clock < other_clock) {
--- a/cql3/Cql.g
+++ b/cql3/Cql.g
@@ -106,7 +106,7 @@ using namespace cql3::statements;
 using namespace cql3::selection;
 using cql3::cql3_type;
 using conditions_type = std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>,lw_shared_ptr<cql3::column_condition::raw>>>;
-using operations_type = std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>,::shared_ptr<cql3::operation::raw_update>>>;
+using operations_type = std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, std::unique_ptr<cql3::operation::raw_update>>>;

 // ANTLR forces us to define a default-initialized return value
 // for every rule (e.g. [returns ut_name name]), but not every type
@@ -255,8 +255,8 @@ struct uninitialized {
        return to_lower(s) == "true";
    }

-    void add_raw_update(std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>,::shared_ptr<cql3::operation::raw_update>>>& operations,
-        ::shared_ptr<cql3::column_identifier::raw> key, ::shared_ptr<cql3::operation::raw_update> update)
+    void add_raw_update(std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, std::unique_ptr<cql3::operation::raw_update>>>& operations,
+        ::shared_ptr<cql3::column_identifier::raw> key, std::unique_ptr<cql3::operation::raw_update> update)
    {
        for (auto&& p : operations) {
            if (*p.first == *key && !p.second->is_compatible_with(update)) {
@@ -532,7 +532,7 @@ updateStatement returns [std::unique_ptr<raw::update_statement> expr]
    @init {
        bool if_exists = false;
        auto attrs = std::make_unique<cql3::attributes::raw>();
-        std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, ::shared_ptr<cql3::operation::raw_update>>> operations;
+        std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, std::unique_ptr<cql3::operation::raw_update>>> operations;
    }
    : K_UPDATE cf=columnFamilyName
      ( usingClause[attrs] )?
@@ -563,7 +563,7 @@ updateConditions returns [conditions_type conditions]
 deleteStatement returns [std::unique_ptr<raw::delete_statement> expr]
    @init {
        auto attrs = std::make_unique<cql3::attributes::raw>();
-        std::vector<::shared_ptr<cql3::operation::raw_deletion>> column_deletions;
+        std::vector<std::unique_ptr<cql3::operation::raw_deletion>> column_deletions;
        bool if_exists = false;
    }
    : K_DELETE ( dels=deleteSelection { column_deletions = std::move(dels); } )?
@@ -581,15 +581,15 @@ deleteStatement returns [std::unique_ptr<raw::delete_statement> expr]
      }
    ;

-deleteSelection returns [std::vector<::shared_ptr<cql3::operation::raw_deletion>> operations]
+deleteSelection returns [std::vector<std::unique_ptr<cql3::operation::raw_deletion>> operations]
    : t1=deleteOp { $operations.emplace_back(std::move(t1)); }
      (',' tN=deleteOp { $operations.emplace_back(std::move(tN)); })*
    ;

-deleteOp returns [::shared_ptr<cql3::operation::raw_deletion> op]
-    : c=cident                { $op = ::make_shared<cql3::operation::column_deletion>(std::move(c)); }
-    | c=cident '[' t=term ']' { $op = ::make_shared<cql3::operation::element_deletion>(std::move(c), std::move(t)); }
-    | c=cident '.' field=ident { $op = ::make_shared<cql3::operation::field_deletion>(std::move(c), std::move(field)); }
+deleteOp returns [std::unique_ptr<cql3::operation::raw_deletion> op]
+    : c=cident                { $op = std::make_unique<cql3::operation::column_deletion>(std::move(c)); }
+    | c=cident '[' t=term ']' { $op = std::make_unique<cql3::operation::element_deletion>(std::move(c), std::move(t)); }
+    | c=cident '.' field=ident { $op = std::make_unique<cql3::operation::field_deletion>(std::move(c), std::move(field)); }
    ;

 usingClauseDelete[std::unique_ptr<cql3::attributes::raw>& attrs]
@@ -1416,12 +1416,12 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
    : t=term ('+' c=cident )?
      {
          if (!c) {
-              add_raw_update(operations, key, ::make_shared<cql3::operation::set_value>(t));
+              add_raw_update(operations, key, std::make_unique<cql3::operation::set_value>(t));
          } else {
              if (*key != *c) {
                add_recognition_error("Only expressions of the form X = <value> + X are supported.");
              }
-              add_raw_update(operations, key, ::make_shared<cql3::operation::prepend>(t));
+              add_raw_update(operations, key, std::make_unique<cql3::operation::prepend>(t));
          }
      }
    | c=cident sig=('+' | '-') t=term
@@ -1429,11 +1429,11 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
          if (*key != *c) {
              add_recognition_error("Only expressions of the form X = X " + $sig.text + "<value> are supported.");
          }
-          shared_ptr<cql3::operation::raw_update> op;
+          std::unique_ptr<cql3::operation::raw_update> op;
          if ($sig.text == "+") {
-              op = make_shared<cql3::operation::addition>(t);
+              op = std::make_unique<cql3::operation::addition>(t);
          } else {
-              op = make_shared<cql3::operation::subtraction>(t);
+              op = std::make_unique<cql3::operation::subtraction>(t);
          }
          add_raw_update(operations, key, std::move(op));
      }
@@ -1444,11 +1444,11 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
              // We don't yet allow a '+' in front of an integer, but we could in the future really, so let's be future-proof in our error message
              add_recognition_error("Only expressions of the form X = X " + sstring($i.text[0] == '-' ? "-" : "+") + " <value> are supported.");
          }
-          add_raw_update(operations, key, make_shared<cql3::operation::addition>(cql3::constants::literal::integer($i.text)));
+          add_raw_update(operations, key, std::make_unique<cql3::operation::addition>(cql3::constants::literal::integer($i.text)));
      }
    | K_SCYLLA_COUNTER_SHARD_LIST '(' t=term ')'
      {
-          add_raw_update(operations, key, ::make_shared<cql3::operation::set_counter_value_from_tuple_list>(t));      
+          add_raw_update(operations, key, std::make_unique<cql3::operation::set_counter_value_from_tuple_list>(t));
      }
    ;

@@ -1458,7 +1458,7 @@ collectionColumnOperation[operations_type& operations,
                          bool by_uuid]
    : '=' t=term
      {
-          add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t, by_uuid));
+          add_raw_update(operations, key, std::make_unique<cql3::operation::set_element>(k, t, by_uuid));
      }
    ;

@@ -1467,7 +1467,7 @@ udtColumnOperation[operations_type& operations,
                   shared_ptr<cql3::column_identifier> field]
    : '=' t=term
      {
-          add_raw_update(operations, std::move(key), make_shared<cql3::operation::set_field>(std::move(field), std::move(t)));
+          add_raw_update(operations, std::move(key), std::make_unique<cql3::operation::set_field>(std::move(field), std::move(t)));
      }
    ;

--- a/cql3/abstract_marker.cc
+++ b/cql3/abstract_marker.cc
@@ -87,7 +87,7 @@ abstract_marker::raw::raw(int32_t bind_index)
    return ::make_shared<constants::marker>(_bind_index, receiver);
 }

-assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const {
+assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const {
    return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
 }

--- a/cql3/abstract_marker.hh
+++ b/cql3/abstract_marker.hh
@@ -72,7 +72,7 @@ public:

        virtual ::shared_ptr<term> prepare(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;

-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override;

        virtual sstring to_string() const override;
    };
--- a/cql3/assignment_testable.hh
+++ b/cql3/assignment_testable.hh
@@ -70,7 +70,7 @@ public:
    // Test all elements of toTest for assignment. If all are exact match, return exact match. If any is not assignable,
    // return not assignable. Otherwise, return weakly assignable.
    template <typename AssignmentTestablePtrRange>
-    static test_result test_all(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver,
+    static test_result test_all(database& db, const sstring& keyspace, const column_specification& receiver,
                AssignmentTestablePtrRange&& to_test) {
        test_result res = test_result::EXACT_MATCH;
        for (auto&& rt : to_test) {
@@ -99,7 +99,7 @@ public:
     * Most caller should just call the isAssignable() method on the result, though functions have a use for
     * testing "strong" equality to decide the most precise overload to pick when multiple could match.
     */
-    virtual test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const = 0;
+    virtual test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const = 0;

    // for error reporting
    virtual sstring assignment_testable_source_context() const = 0;
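
Reviewer note: the `test_all` comment in this header describes a three-way fold over per-element results, and the hunks change the receiver from a shared pointer to a `const` reference. The standalone sketch below illustrates both under simplified assumptions. `column_spec` and `testable` are hypothetical stand-ins, not the real `cql3` classes, and the `database` and keyspace parameters are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real cql3 types; only the shape matters here.
enum class test_result { EXACT_MATCH, WEAKLY_ASSIGNABLE, NOT_ASSIGNABLE };

struct column_spec {
    std::string type_name;
};

struct testable {
    std::string type_name;  // empty means "type not known yet", e.g. a bind marker

    // The receiver is only inspected, never stored, so a const reference suffices.
    test_result test_assignment(const column_spec& receiver) const {
        if (type_name.empty()) {
            return test_result::WEAKLY_ASSIGNABLE;
        }
        return type_name == receiver.type_name ? test_result::EXACT_MATCH
                                               : test_result::NOT_ASSIGNABLE;
    }
};

// Exact match only if every element matches exactly; not assignable as soon
// as one element is not assignable; weakly assignable otherwise.
template <typename Range>
test_result test_all(const column_spec& receiver, const Range& to_test) {
    test_result res = test_result::EXACT_MATCH;
    for (const auto& t : to_test) {
        const test_result r = t->test_assignment(receiver);
        if (r == test_result::NOT_ASSIGNABLE) {
            return test_result::NOT_ASSIGNABLE;
        }
        if (r != test_result::EXACT_MATCH) {
            res = test_result::WEAKLY_ASSIGNABLE;
        }
    }
    return res;
}
```

A caller that still owns the specification through a shared pointer dereferences it at the call site, for example `test_all(*spec, elements)`, which mirrors the `*receiver` and `*value_spec` changes elsewhere in this patch.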
--- a/cql3/column_identifier.hh
+++ b/cql3/column_identifier.hh
@@ -139,16 +139,6 @@ static inline
    return def.column_specification->name;
 }

-static inline
-std::vector<::shared_ptr<column_identifier>> to_identifiers(const std::vector<const column_definition*>& defs) {
-    std::vector<::shared_ptr<column_identifier>> r;
-    r.reserve(defs.size());
-    for (auto&& def : defs) {
-        r.push_back(to_identifier(*def));
-    }
-    return r;
-}
-
 }

 namespace std {
--- a/cql3/constants.cc
+++ b/cql3/constants.cc
@@ -82,9 +82,9 @@ constants::literal::parsed_value(data_type validator) const
 }

 assignment_testable::test_result
-constants::literal::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const
+constants::literal::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const
 {
-    auto receiver_type = receiver->type->as_cql3_type();
+    auto receiver_type = receiver.type->as_cql3_type();
    if (receiver_type.is_collection() || receiver_type.is_user_type()) {
        return test_result::NOT_ASSIGNABLE;
    }
@@ -157,7 +157,7 @@ constants::literal::test_assignment(database& db, const sstring& keyspace, lw_sh
 ::shared_ptr<term>
 constants::literal::prepare(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const
 {
-    if (!is_assignable(test_assignment(db, keyspace, receiver))) {
+    if (!is_assignable(test_assignment(db, keyspace, *receiver))) {
        throw exceptions::invalid_request_exception(format("Invalid {} constant ({}) for \"{}\" of type {}",
            _type, _text, *receiver->name, receiver->type->as_cql3_type().to_string()));
    }
--- a/cql3/constants.hh
+++ b/cql3/constants.hh
@@ -88,7 +88,7 @@ public:
    public:
        static thread_local const ::shared_ptr<terminal> NULL_VALUE;
        virtual ::shared_ptr<term> prepare(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override {
-            if (!is_assignable(test_assignment(db, keyspace, receiver))) {
+            if (!is_assignable(test_assignment(db, keyspace, *receiver))) {
                throw exceptions::invalid_request_exception("Invalid null value for counter increment/decrement");
            }
            return NULL_VALUE;
@@ -96,8 +96,8 @@ public:

        virtual assignment_testable::test_result test_assignment(database& db,
            const sstring& keyspace,
-            lw_shared_ptr<column_specification> receiver) const override {
-                return receiver->type->is_counter()
+            const column_specification& receiver) const override {
+                return receiver.type->is_counter()
                    ? assignment_testable::test_result::NOT_ASSIGNABLE
                    : assignment_testable::test_result::WEAKLY_ASSIGNABLE;
        }
@@ -161,7 +161,7 @@ public:
            return _text;
        }

-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const;
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const;

        virtual sstring to_string() const override {
            return _type == type::STRING ? sstring(format("'{}'", _text)) : _text;
--- a/cql3/functions/abstract_function.hh
+++ b/cql3/functions/abstract_function.hh
@@ -95,10 +95,6 @@ public:
        return _name.keyspace == ks_name && _name.name == function_name;
    }

-    virtual bool has_reference_to(function& f) const override {
-        return false;
-    }
-
    virtual sstring column_name(const std::vector<sstring>& column_names) const override {
        return format("{}({})", _name, join(", ", column_names));
    }
--- a/cql3/functions/as_json_function.hh
+++ b/cql3/functions/as_json_function.hh
@@ -144,10 +144,6 @@ public:
        return false;
    }

-    virtual bool has_reference_to(function& f) const override {
-        return false;
-    }
-
    virtual sstring column_name(const std::vector<sstring>& column_names) const override {
        return "[json]";
    }
--- a/cql3/functions/error_injection_fcts.cc
+++ b/cql3/functions/error_injection_fcts.cc
@@ -0,0 +1,122 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ *
+ * Modified by ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "error_injection_fcts.hh"
+#include "utils/error_injection.hh"
+#include "types/list.hh"
+
+namespace cql3
+{
+
+namespace functions
+{
+
+namespace error_injection
+{
+
+namespace
+{
+
+template <typename Func, bool Pure>
+class failure_injection_function_for : public failure_injection_function  {
+    Func _func;
+public:
+    failure_injection_function_for(sstring name,
+                                   data_type return_type,
+                                   const std::vector<data_type> arg_types,
+                                   Func&& func)
+        : failure_injection_function(std::move(name), std::move(return_type), std::move(arg_types))
+        , _func(std::forward<Func>(func)) {}
+
+    bool is_pure() const override {
+        return Pure;
+    }
+
+    bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override {
+        return _func(sf, parameters);
+    }
+};
+
+template <bool Pure, typename Func>
+shared_ptr<function>
+make_failure_injection_function(sstring name,
+        data_type return_type,
+        std::vector<data_type> args_type,
+        Func&& func) {
+    return ::make_shared<failure_injection_function_for<Func, Pure>>(std::move(name),
+        std::move(return_type),
+        std::move(args_type),
+        std::forward<Func>(func));
+}
+
+} // anonymous namespace
+
+shared_ptr<function> make_enable_injection_function() {
+    return make_failure_injection_function<false>("enable_injection", empty_type, { ascii_type, ascii_type },
+            [] (cql_serialization_format, const std::vector<bytes_opt>& parameters) {
+        sstring injection_name = ascii_type->get_string(parameters[0].value());
+        const bool one_shot = ascii_type->get_string(parameters[1].value()) == "true";
+        smp::invoke_on_all([injection_name, one_shot] () mutable {
+            utils::get_local_injector().enable(injection_name, one_shot);
+        }).get0();
+        return std::nullopt;
+    });
+}
+
+shared_ptr<function> make_disable_injection_function() {
+    return make_failure_injection_function<false>("disable_injection", empty_type, { ascii_type },
+            [] (cql_serialization_format, const std::vector<bytes_opt>& parameters) {
+        sstring injection_name = ascii_type->get_string(parameters[0].value());
+        smp::invoke_on_all([injection_name] () mutable {
+            utils::get_local_injector().disable(injection_name);
+        }).get0();
+        return std::nullopt;
+    });
+}
+
+shared_ptr<function> make_enabled_injections_function() {
+    const auto list_type_inst = list_type_impl::get_instance(ascii_type, false);
+    return make_failure_injection_function<true>("enabled_injections", list_type_inst, {},
+        [list_type_inst] (cql_serialization_format, const std::vector<bytes_opt>&) -> bytes {
+            return seastar::map_reduce(smp::all_cpus(), [] (unsigned) {
+                return make_ready_future<std::vector<sstring>>(utils::get_local_injector().enabled_injections());
+            }, std::vector<data_value>(),
+            [](std::vector<data_value> a, std::vector<sstring>&& b) -> std::vector<data_value> {
+                for (auto&& x : b) {
+                    if (a.end() == std::find(a.begin(), a.end(), x)) {
+                        a.push_back(data_value(std::move(x)));
+                    }
+                }
+                return a;
+            }).then([list_type_inst](std::vector<data_value> const& active_injections) {
+                auto list_val = make_list_value(list_type_inst, active_injections);
+                return list_type_inst->decompose(list_val);
+            }).get0();
+        });
+}
+
+} // namespace error_injection
+
+} // namespace functions
+
+} // namespace cql3
--- a/cql3/functions/error_injection_fcts.hh
+++ b/cql3/functions/error_injection_fcts.hh
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ *
+ * Modified by ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "native_scalar_function.hh"
+
+namespace cql3
+{
+
+namespace functions
+{
+
+namespace error_injection
+{
+
+class failure_injection_function  : public native_scalar_function {
+protected:
+    failure_injection_function(sstring name, data_type return_type, std::vector<data_type> args_type)
+            : native_scalar_function(std::move(name), std::move(return_type), std::move(args_type)) {
+    }
+
+    bool requires_thread() const override {
+        return true;
+    }
+};
+
+shared_ptr<function> make_enable_injection_function();
+shared_ptr<function> make_disable_injection_function();
+shared_ptr<function> make_enabled_injections_function();
+
+} // namespace error_injection
+
+} // namespace functions
+
+} // namespace cql3
--- a/cql3/functions/function.hh
+++ b/cql3/functions/function.hh
@@ -82,7 +82,6 @@ public:

    virtual void print(std::ostream& os) const = 0;
    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;
-    virtual bool has_reference_to(function& f) const = 0;

    /**
     * Returns the name of the function to use within a ResultSet.
--- a/cql3/functions/function_call.hh
+++ b/cql3/functions/function_call.hh
@@ -79,7 +79,7 @@ public:
        // All parameters must be terminal
        static bytes_opt execute(scalar_function& fun, std::vector<shared_ptr<term>> parameters);
    public:
-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override;
        virtual sstring to_string() const override;
    };
 };
--- a/cql3/functions/functions.cc
+++ b/cql3/functions/functions.cc
@@ -37,6 +37,8 @@
 #include "concrete_types.hh"
 #include "as_json_function.hh"

+#include "error_injection_fcts.hh"
+
 namespace std {
 std::ostream& operator<<(std::ostream& os, const std::vector<data_type>& arg_types) {
    for (size_t i = 0; i < arg_types.size(); ++i) {
@@ -107,6 +109,10 @@ functions::init() {
    declare(make_blob_as_varchar_fct());
    add_agg_functions(ret);

+    declare(error_injection::make_enable_injection_function());
+    declare(error_injection::make_disable_injection_function());
+    declare(error_injection::make_enabled_injections_function());
+
    // also needed for smp:
 #if 0
    MigrationManager.instance.register(new FunctionsMigrationListener());
@@ -152,11 +158,6 @@ functions::make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
                                   fun.arg_types()[i]);
 }

-int
-functions::get_overload_count(const function_name& name) {
-    return _declared.count(name);
-}
-
 inline
 shared_ptr<function>
 make_to_json_function(data_type t) {
@@ -187,7 +188,7 @@ functions::get(database& db,
        const std::vector<shared_ptr<assignment_testable>>& provided_args,
        const sstring& receiver_ks,
        const sstring& receiver_cf,
-        lw_shared_ptr<column_specification> receiver) {
+        const column_specification* receiver) {

    static const function_name TOKEN_FUNCTION_NAME = function_name::native_function("token");
    static const function_name TO_JSON_FUNCTION_NAME = function_name::native_function("tojson");
@@ -370,7 +371,7 @@ functions::validate_types(database& db,
        }

        auto&& expected = make_arg_spec(receiver_ks, receiver_cf, *fun, i);
-        if (!is_assignable(provided->test_assignment(db, keyspace, expected))) {
+        if (!is_assignable(provided->test_assignment(db, keyspace, *expected))) {
            throw exceptions::invalid_request_exception(
                    format("Type error: {} cannot be passed as argument {:d} of function {} of type {}",
                            provided, i, fun->name(), expected->type->as_cql3_type()));
@@ -397,7 +398,7 @@ functions::match_arguments(database& db, const sstring& keyspace,
            continue;
        }
        auto&& expected = make_arg_spec(receiver_ks, receiver_cf, *fun, i);
-        auto arg_res = provided->test_assignment(db, keyspace, expected);
+        auto arg_res = provided->test_assignment(db, keyspace, *expected);
        if (arg_res == assignment_testable::test_result::NOT_ASSIGNABLE) {
            return assignment_testable::test_result::NOT_ASSIGNABLE;
        }
@@ -514,7 +515,7 @@ function_call::raw::prepare(database& db, const sstring& keyspace, lw_shared_ptr
            [] (auto&& x) -> shared_ptr<assignment_testable> {
        return x;
    });
-    auto&& fun = functions::functions::get(db, keyspace, _name, args, receiver->ks_name, receiver->cf_name, receiver);
+    auto&& fun = functions::functions::get(db, keyspace, _name, args, receiver->ks_name, receiver->cf_name, receiver.get());
    if (!fun) {
        throw exceptions::invalid_request_exception(format("Unknown function {} called", _name));
    }
@@ -572,16 +573,16 @@ function_call::raw::execute(scalar_function& fun, std::vector<shared_ptr<term>>
 }

 assignment_testable::test_result
-function_call::raw::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const {
+function_call::raw::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const {
    // Note: Functions.get() will return null if the function doesn't exist, or throw is no function matching
    // the arguments can be found. We may get one of those if an undefined/wrong function is used as argument
    // of another, existing, function. In that case, we return true here because we'll throw a proper exception
    // later with a more helpful error message that if we were to return false here.
    try {
-        auto&& fun = functions::get(db, keyspace, _name, _terms, receiver->ks_name, receiver->cf_name, receiver);
-        if (fun && receiver->type == fun->return_type()) {
+        auto&& fun = functions::get(db, keyspace, _name, _terms, receiver.ks_name, receiver.cf_name, &receiver);
+        if (fun && receiver.type == fun->return_type()) {
            return assignment_testable::test_result::EXACT_MATCH;
-        } else if (!fun || receiver->type->is_value_compatible_with(*fun->return_type())) {
+        } else if (!fun || receiver.type->is_value_compatible_with(*fun->return_type())) {
            return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
        } else {
            return assignment_testable::test_result::NOT_ASSIGNABLE;
--- a/cql3/functions/functions.hh
+++ b/cql3/functions/functions.hh
@@ -69,7 +69,6 @@ private:
 public:
    static lw_shared_ptr<column_specification> make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
            const function& fun, size_t i);
-    static int get_overload_count(const function_name& name);
 public:
    static shared_ptr<function> get(database& db,
                                    const sstring& keyspace,
@@ -77,7 +76,7 @@ public:
                                    const std::vector<shared_ptr<assignment_testable>>& provided_args,
                                    const sstring& receiver_ks,
                                    const sstring& receiver_cf,
-                                    lw_shared_ptr<column_specification> receiver = nullptr);
+                                    const column_specification* receiver = nullptr);
    template <typename AssignmentTestablePtrRange>
    static shared_ptr<function> get(database& db,
                                    const sstring& keyspace,
@@ -85,7 +84,7 @@ public:
                                    AssignmentTestablePtrRange&& provided_args,
                                    const sstring& receiver_ks,
                                    const sstring& receiver_cf,
-                                    lw_shared_ptr<column_specification> receiver = nullptr) {
+                                    const column_specification* receiver = nullptr) {
        const std::vector<shared_ptr<assignment_testable>> args(std::begin(provided_args), std::end(provided_args));
        return get(db, keyspace, name, args, receiver_ks, receiver_cf, receiver);
    }
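
Reviewer note: `functions::get` keeps its receiver optional, so the patch switches it to a raw `const column_specification*` defaulting to `nullptr` rather than to a reference. The sketch below shows that calling convention in isolation. `column_spec`, `pick_overload`, `from_owner` and `from_reference` are hypothetical names used only for illustration, and the string result stands in for the real overload lookup.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for cql3::column_specification.
struct column_spec {
    std::string type_name;
};

// The receiver is optional and only read during the call, so a non-owning
// pointer with a nullptr default expresses both facts without touching
// any reference count.
std::string pick_overload(const std::string& name, const column_spec* receiver = nullptr) {
    if (receiver == nullptr) {
        return name + "(<any>)";
    }
    return name + "(" + receiver->type_name + ")";
}

// A caller that shares ownership passes the raw pointer via get().
std::string from_owner(const std::shared_ptr<column_spec>& spec) {
    return pick_overload("token", spec.get());
}

// A caller that only has a reference passes its address.
std::string from_reference(const column_spec& spec) {
    return pick_overload("token", &spec);
}
```

The two helper functions correspond to the `receiver.get()` and `&receiver` call sites in `functions.cc`. The pointer must not be stored beyond the call, since nothing keeps the pointee alive.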
--- a/cql3/lists.cc
+++ b/cql3/lists.cc
@@ -93,7 +93,7 @@ lists::literal::validate_assignable_to(database& db, const sstring keyspace, con
    }
    auto&& value_spec = value_spec_of(receiver);
    for (auto rt : _elements) {
-        if (!is_assignable(rt->test_assignment(db, keyspace, value_spec))) {
+        if (!is_assignable(rt->test_assignment(db, keyspace, *value_spec))) {
            throw exceptions::invalid_request_exception(format("Invalid list literal for {}: value {} is not of type {}",
                    *receiver.name, *rt, value_spec->type->as_cql3_type()));
        }
@@ -101,8 +101,8 @@ lists::literal::validate_assignable_to(database& db, const sstring keyspace, con
 }

 assignment_testable::test_result
-lists::literal::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const {
-    if (!dynamic_pointer_cast<const list_type_impl>(receiver->type)) {
+lists::literal::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const {
+    if (!dynamic_pointer_cast<const list_type_impl>(receiver.type)) {
        return assignment_testable::test_result::NOT_ASSIGNABLE;
    }

@@ -111,11 +111,11 @@ lists::literal::test_assignment(database& db, const sstring& keyspace, lw_shared
        return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
    }

-    auto&& value_spec = value_spec_of(*receiver);
+    auto&& value_spec = value_spec_of(receiver);
    std::vector<shared_ptr<assignment_testable>> to_test;
    to_test.reserve(_elements.size());
    std::copy(_elements.begin(), _elements.end(), std::back_inserter(to_test));
-    return assignment_testable::test_all(db, keyspace, value_spec, to_test);
+    return assignment_testable::test_all(db, keyspace, *value_spec, to_test);
 }

 sstring
--- a/cql3/lists.hh
+++ b/cql3/lists.hh
@@ -68,7 +68,7 @@ public:
    private:
        void validate_assignable_to(database& db, const sstring keyspace, const column_specification& receiver) const;
    public:
-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override;
        virtual sstring to_string() const override;
    };

--- a/cql3/maps.cc
+++ b/cql3/maps.cc
@@ -104,31 +104,31 @@ maps::literal::validate_assignable_to(database& db, const sstring& keyspace, con
    auto&& key_spec = maps::key_spec_of(receiver);
    auto&& value_spec = maps::value_spec_of(receiver);
    for (auto&& entry : entries) {
-        if (!is_assignable(entry.first->test_assignment(db, keyspace, key_spec))) {
+        if (!is_assignable(entry.first->test_assignment(db, keyspace, *key_spec))) {
            throw exceptions::invalid_request_exception(format("Invalid map literal for {}: key {} is not of type {}", *receiver.name, *entry.first, key_spec->type->as_cql3_type()));
        }
-        if (!is_assignable(entry.second->test_assignment(db, keyspace, value_spec))) {
+        if (!is_assignable(entry.second->test_assignment(db, keyspace, *value_spec))) {
            throw exceptions::invalid_request_exception(format("Invalid map literal for {}: value {} is not of type {}", *receiver.name, *entry.second, value_spec->type->as_cql3_type()));
        }
    }
 }

 assignment_testable::test_result
-maps::literal::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const {
-    if (!dynamic_pointer_cast<const map_type_impl>(receiver->type)) {
+maps::literal::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const {
+    if (!dynamic_pointer_cast<const map_type_impl>(receiver.type)) {
        return assignment_testable::test_result::NOT_ASSIGNABLE;
    }
    // If there is no elements, we can't say it's an exact match (an empty map if fundamentally polymorphic).
    if (entries.empty()) {
        return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
    }
-    auto key_spec = maps::key_spec_of(*receiver);
-    auto value_spec = maps::value_spec_of(*receiver);
+    auto key_spec = maps::key_spec_of(receiver);
+    auto value_spec = maps::value_spec_of(receiver);
    // It's an exact match if all are exact match, but is not assignable as soon as any is non assignable.
    auto res = assignment_testable::test_result::EXACT_MATCH;
    for (auto entry : entries) {
-        auto t1 = entry.first->test_assignment(db, keyspace, key_spec);
-        auto t2 = entry.second->test_assignment(db, keyspace, value_spec);
+        auto t1 = entry.first->test_assignment(db, keyspace, *key_spec);
+        auto t2 = entry.second->test_assignment(db, keyspace, *value_spec);
        if (t1 == assignment_testable::test_result::NOT_ASSIGNABLE || t2 == assignment_testable::test_result::NOT_ASSIGNABLE)
            return assignment_testable::test_result::NOT_ASSIGNABLE;
        if (t1 != assignment_testable::test_result::EXACT_MATCH || t2 != assignment_testable::test_result::EXACT_MATCH)
--- a/cql3/maps.hh
+++ b/cql3/maps.hh
@@ -70,7 +70,7 @@ public:
    private:
        void validate_assignable_to(database& db, const sstring& keyspace, const column_specification& receiver) const;
    public:
-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override;
        virtual sstring to_string() const override;
    };

--- a/cql3/operation.cc
+++ b/cql3/operation.cc
@@ -87,10 +87,10 @@ operation::set_element::prepare(database& db, const sstring& keyspace, const col
 }

 bool
-operation::set_element::is_compatible_with(shared_ptr<raw_update> other) const {
+operation::set_element::is_compatible_with(const std::unique_ptr<raw_update>& other) const {
    // TODO: we could check that the other operation is not setting the same element
    // too (but since the index/key set may be a bind variables we can't always do it at this point)
-    return !dynamic_pointer_cast<set_value>(std::move(other));
+    return !dynamic_cast<const set_value*>(other.get());
 }

 sstring
@@ -120,13 +120,13 @@ operation::set_field::prepare(database& db, const sstring& keyspace, const colum
 }

 bool
-operation::set_field::is_compatible_with(shared_ptr<raw_update> other) const {
-    auto x = dynamic_pointer_cast<set_field>(other);
+operation::set_field::is_compatible_with(const std::unique_ptr<raw_update>& other) const {
+    auto x = dynamic_cast<const set_field*>(other.get());
    if (x) {
        return _field != x->_field;
    }

-    return !dynamic_pointer_cast<set_value>(std::move(other));
+    return !dynamic_cast<const set_value*>(other.get());
 }

 const column_identifier::raw&
@@ -185,8 +185,8 @@ operation::addition::prepare(database& db, const sstring& keyspace, const column
 }

 bool
-operation::addition::is_compatible_with(shared_ptr<raw_update> other) const {
-    return !dynamic_pointer_cast<set_value>(other);
+operation::addition::is_compatible_with(const std::unique_ptr<raw_update>& other) const {
+    return !dynamic_cast<const set_value*>(other.get());
 }

 sstring
@@ -227,8 +227,8 @@ operation::subtraction::prepare(database& db, const sstring& keyspace, const col
 }

 bool
-operation::subtraction::is_compatible_with(shared_ptr<raw_update> other) const {
-    return !dynamic_pointer_cast<set_value>(other);
+operation::subtraction::is_compatible_with(const std::unique_ptr<raw_update>& other) const {
+    return !dynamic_cast<const set_value*>(other.get());
 }

 sstring
@@ -250,8 +250,8 @@ operation::prepend::prepare(database& db, const sstring& keyspace, const column_
 }

 bool
-operation::prepend::is_compatible_with(shared_ptr<raw_update> other) const {
-    return !dynamic_pointer_cast<set_value>(other);
+operation::prepend::is_compatible_with(const std::unique_ptr<raw_update>& other) const {
+    return !dynamic_cast<const set_value*>(other.get());
 }


@@ -356,7 +356,7 @@ operation::set_counter_value_from_tuple_list::prepare(database& db, const sstrin
 };

 bool
-operation::set_value::is_compatible_with(::shared_ptr <raw_update> other) const {
+operation::set_value::is_compatible_with(const std::unique_ptr<raw_update>& other) const {
    // We don't allow setting multiple time the same column, because 1)
    // it's stupid and 2) the result would seem random to the user.
    return false;
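
Reviewer note: with `raw_update` objects now held in `std::unique_ptr`, the compatibility checks can no longer use `dynamic_pointer_cast`, so the patch applies `dynamic_cast` to the raw pointer obtained from `get()`. The sketch below reproduces that pattern with a reduced, hypothetical class hierarchy. Only `set_value` and `addition` are modeled, and the `prepare` machinery is omitted.

```cpp
#include <cassert>
#include <memory>

// Reduced, hypothetical hierarchy; the real classes live in cql3/operation.hh.
struct raw_update {
    virtual ~raw_update() = default;
    // The argument is borrowed: taking the unique_ptr by const reference
    // leaves ownership with the caller.
    virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const = 0;
};

struct set_value : raw_update {
    // Setting the same column twice in one statement is never allowed.
    bool is_compatible_with(const std::unique_ptr<raw_update>&) const override {
        return false;
    }
};

struct addition : raw_update {
    // Compatible with anything except a plain assignment to the same column.
    bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override {
        return dynamic_cast<const set_value*>(other.get()) == nullptr;
    }
};
```

Because the check only inspects the dynamic type, the owning pointer stays valid afterwards and can still be moved into the operations list, as `add_raw_update(operations, key, std::move(op))` does in the grammar.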
--- a/cql3/operation.hh
+++ b/cql3/operation.hh
@@ -168,7 +168,7 @@ public:
         * @return whether this operation can be applied alongside the {@code
         * other} update (in the same UPDATE statement for the same column).
         */
-        virtual bool is_compatible_with(::shared_ptr<raw_update> other) const = 0;
+        virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const = 0;
    };

    /**
@@ -181,7 +181,7 @@ public:
     */
    class raw_deletion {
    public:
-        ~raw_deletion() {}
+        virtual ~raw_deletion() = default;

        /**
         * The name of the column affected by this delete operation.
@@ -218,7 +218,7 @@ public:

        virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) const override;

-        virtual bool is_compatible_with(shared_ptr<raw_update> other) const override;
+        virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override;
    };

    // Set a single field inside a user-defined type.
@@ -234,7 +234,7 @@ public:

        virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) const override;

-        virtual bool is_compatible_with(shared_ptr<raw_update> other) const override;
+        virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override;
    };

    // Delete a single field inside a user-defined type.
@@ -263,7 +263,7 @@ public:

        virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) const override;

-        virtual bool is_compatible_with(shared_ptr<raw_update> other) const override;
+        virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override;
    };

    class subtraction : public raw_update {
@@ -277,7 +277,7 @@ public:

        virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) const override;

-        virtual bool is_compatible_with(shared_ptr<raw_update> other) const override;
+        virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override;
    };

    class prepend : public raw_update {
@@ -291,7 +291,7 @@ public:

        virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) const override;

-        virtual bool is_compatible_with(shared_ptr<raw_update> other) const override;
+        virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override;
    };

    class column_deletion;
--- a/cql3/operation_impl.hh
+++ b/cql3/operation_impl.hh
@@ -65,7 +65,7 @@ public:
        }
 #endif

-    virtual bool is_compatible_with(::shared_ptr <raw_update> other) const override;
+    virtual bool is_compatible_with(const std::unique_ptr<raw_update>& other) const override;
 };

 class operation::set_counter_value_from_tuple_list : public set_value {
--- a/cql3/query_options.hh
+++ b/cql3/query_options.hh
@@ -41,7 +41,7 @@

 #pragma once

-#include <seastar/util/gcc6-concepts.hh>
+#include <concepts>
 #include "timestamp.hh"
 #include "bytes.hh"
 #include "db/consistency_level_type.hh"
@@ -97,11 +97,11 @@ private:
     * @param values_ranges a vector of values ranges for each statement in the batch.
     */
    template<typename OneMutationDataRange>
-    GCC6_CONCEPT( requires requires (OneMutationDataRange range) {
+    requires requires (OneMutationDataRange range) {
         std::begin(range);
         std::end(range);
-    } && ( requires (OneMutationDataRange range) { { *range.begin() } -> raw_value_view; } ||
-           requires (OneMutationDataRange range) { { *range.begin() } -> raw_value; } ) )
+    } && ( requires (OneMutationDataRange range) { { *range.begin() } -> std::convertible_to<raw_value_view>; } ||
+           requires (OneMutationDataRange range) { { *range.begin() } -> std::convertible_to<raw_value>; } )
    explicit query_options(query_options&& o, std::vector<OneMutationDataRange> values_ranges);

 public:
@@ -145,11 +145,11 @@ public:
     * @param values_ranges a vector of values ranges for each statement in the batch.
     */
    template<typename OneMutationDataRange>
-    GCC6_CONCEPT( requires requires (OneMutationDataRange range) {
+    requires requires (OneMutationDataRange range) {
         std::begin(range);
         std::end(range);
-    } && ( requires (OneMutationDataRange range) { { *range.begin() } -> raw_value_view; } ||
-           requires (OneMutationDataRange range) { { *range.begin() } -> raw_value; } ) )
+    } && ( requires (OneMutationDataRange range) { { *range.begin() } -> std::convertible_to<raw_value_view>; } ||
+           requires (OneMutationDataRange range) { { *range.begin() } -> std::convertible_to<raw_value>; } )
    static query_options make_batch_options(query_options&& o, std::vector<OneMutationDataRange> values_ranges) {
        return query_options(std::move(o), std::move(values_ranges));
    }
@@ -251,11 +251,11 @@ private:
 };

 template<typename OneMutationDataRange>
-GCC6_CONCEPT( requires requires (OneMutationDataRange range) {
+requires requires (OneMutationDataRange range) {
     std::begin(range);
     std::end(range);
-} && ( requires (OneMutationDataRange range) { { *range.begin() } -> raw_value_view; } ||
-       requires (OneMutationDataRange range) { { *range.begin() } -> raw_value; } ) )
+} && ( requires (OneMutationDataRange range) { { *range.begin() } -> std::convertible_to<raw_value_view>; } ||
+       requires (OneMutationDataRange range) { { *range.begin() } -> std::convertible_to<raw_value>; } )
 query_options::query_options(query_options&& o, std::vector<OneMutationDataRange> values_ranges)
    : query_options(std::move(o))
 {
--- a/cql3/query_processor.cc
+++ b/cql3/query_processor.cc
@@ -562,27 +562,6 @@ query_processor::prepare(sstring query_string, const service::client_state& clie
    }
 }

-::shared_ptr<cql_transport::messages::result_message::prepared>
-query_processor::get_stored_prepared_statement(
-        const std::string_view& query_string,
-        const sstring& keyspace,
-        bool for_thrift) {
-    using namespace cql_transport::messages;
-    if (for_thrift) {
-        return get_stored_prepared_statement_one<result_message::prepared::thrift>(
-                query_string,
-                keyspace,
-                compute_thrift_id,
-                prepared_cache_key_type::thrift_id);
-    } else {
-        return get_stored_prepared_statement_one<result_message::prepared::cql>(
-                query_string,
-                keyspace,
-                compute_id,
-                prepared_cache_key_type::cql_id);
-    }
-}
-
 static std::string hash_target(std::string_view query_string, std::string_view keyspace) {
    std::string ret(keyspace);
    ret += query_string;
--- a/cql3/query_processor.hh
+++ b/cql3/query_processor.hh
@@ -414,28 +414,6 @@ private:
            });
        });
    };
-
-    template <typename ResultMsgType, typename KeyGenerator, typename IdGetter>
-    ::shared_ptr<cql_transport::messages::result_message::prepared>
-    get_stored_prepared_statement_one(
-            const std::string_view& query_string,
-            const sstring& keyspace,
-            KeyGenerator&& key_gen,
-            IdGetter&& id_getter) {
-        auto cache_key = key_gen(query_string, keyspace);
-        auto it = _prepared_cache.find(cache_key);
-        if (it == _prepared_cache.end()) {
-            return ::shared_ptr<cql_transport::messages::result_message::prepared>();
-        }
-
-        return ::make_shared<ResultMsgType>(id_getter(cache_key), *it);
-    }
-
-    ::shared_ptr<cql_transport::messages::result_message::prepared>
-    get_stored_prepared_statement(
-            const std::string_view& query_string,
-            const sstring& keyspace,
-            bool for_thrift);
 };

 class query_processor::migration_subscriber : public service::migration_listener {
--- a/cql3/result_set.hh
+++ b/cql3/result_set.hh
@@ -50,7 +50,6 @@

 #include "result_generator.hh"

-#include <seastar/util/gcc6-concepts.hh>

 namespace cql3 {

@@ -150,17 +149,13 @@ public:
    const std::vector<uint16_t>& partition_key_bind_indices() const;
 };

-GCC6_CONCEPT(
-
 template<typename Visitor>
-concept bool ResultVisitor = requires(Visitor& visitor) {
+concept ResultVisitor = requires(Visitor& visitor) {
    visitor.start_row();
    visitor.accept_value(std::optional<query::result_bytes_view>());
    visitor.end_row();
 };

-)
-
 class result_set {
    ::shared_ptr<metadata> _metadata;
    std::deque<std::vector<bytes_opt>> _rows;
@@ -199,7 +194,7 @@ public:
    const std::deque<std::vector<bytes_opt>>& rows() const;

    template<typename Visitor>
-    GCC6_CONCEPT(requires ResultVisitor<Visitor>)
+    requires ResultVisitor<Visitor>
    void visit(Visitor&& visitor) const {
        auto column_count = get_metadata().column_count();
        for (auto& row : _rows) {
@@ -264,7 +259,7 @@ public:
    }

    template<typename Visitor>
-    GCC6_CONCEPT(requires ResultVisitor<Visitor>)
+    requires ResultVisitor<Visitor>
    void visit(Visitor&& visitor) const {
        if (_result_set) {
            _result_set->visit(std::forward<Visitor>(visitor));
--- a/cql3/selection/selector.hh
+++ b/cql3/selection/selector.hh
@@ -107,8 +107,8 @@ public:
     */
    virtual void reset() = 0;

-    virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override {
-        auto t1 = receiver->type->underlying_type();
+    virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override {
+        auto t1 = receiver.type->underlying_type();
        auto t2 = get_type()->underlying_type();
        // We want columns of `counter_type' to be served by underlying type's overloads
        // (here: `counter_cell_view::total_value_type()') with an `EXACT_MATCH'.
--- a/cql3/sets.cc
+++ b/cql3/sets.cc
@@ -98,17 +98,17 @@ sets::literal::validate_assignable_to(database& db, const sstring& keyspace, con

    auto&& value_spec = value_spec_of(receiver);
    for (shared_ptr<term::raw> rt : _elements) {
-        if (!is_assignable(rt->test_assignment(db, keyspace, value_spec))) {
+        if (!is_assignable(rt->test_assignment(db, keyspace, *value_spec))) {
            throw exceptions::invalid_request_exception(format("Invalid set literal for {}: value {} is not of type {}", *receiver.name, *rt, value_spec->type->as_cql3_type()));
        }
    }
 }

 assignment_testable::test_result
-sets::literal::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const {
-    if (!dynamic_pointer_cast<const set_type_impl>(receiver->type)) {
+sets::literal::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const {
+    if (!dynamic_pointer_cast<const set_type_impl>(receiver.type)) {
        // We've parsed empty maps as a set literal to break the ambiguity so handle that case now
-        if (dynamic_pointer_cast<const map_type_impl>(receiver->type) && _elements.empty()) {
+        if (dynamic_pointer_cast<const map_type_impl>(receiver.type) && _elements.empty()) {
            return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
        }

@@ -120,10 +120,10 @@ sets::literal::test_assignment(database& db, const sstring& keyspace, lw_shared_
        return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
    }

-    auto&& value_spec = value_spec_of(*receiver);
+    auto&& value_spec = value_spec_of(receiver);
    // FIXME: make assignment_testable::test_all() accept ranges
    std::vector<shared_ptr<assignment_testable>> to_test(_elements.begin(), _elements.end());
-    return assignment_testable::test_all(db, keyspace, value_spec, to_test);
+    return assignment_testable::test_all(db, keyspace, *value_spec, to_test);
 }

 sstring
--- a/cql3/sets.hh
+++ b/cql3/sets.hh
@@ -67,7 +67,7 @@ public:
        virtual shared_ptr<term> prepare(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;
        void validate_assignable_to(database& db, const sstring& keyspace, const column_specification& receiver) const;
        assignment_testable::test_result
-        test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const;
+        test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const;
        virtual sstring to_string() const override;
    };

--- a/cql3/single_column_relation.hh
+++ b/cql3/single_column_relation.hh
@@ -108,10 +108,6 @@ public:
        return _entity;
    }

-    ::shared_ptr<term::raw> get_map_key() {
-        return _map_key;
-    }
-
    ::shared_ptr<term::raw> get_value() {
        return _value;
    }
--- a/cql3/statements/alter_table_statement.cc
+++ b/cql3/statements/alter_table_statement.cc
@@ -294,6 +294,12 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
        throw exceptions::invalid_request_exception("Cannot use ALTER TABLE on Materialized View");
    }

+    const auto& ks = db.find_keyspace(keyspace());
+    auto replication_type = ks.get_replication_strategy().get_type();
+    if (is_local_only && replication_type != locator::replication_strategy_type::local) {
+        throw std::logic_error(format("Internal queries should not try to alter table schema for non-local tables, because it leads to inconsistencies: {}.{}",
+                s->ks_name(), s->cf_name()));
+    }
    auto cfm = schema_builder(s);

    if (_properties->get_id()) {
--- a/cql3/statements/batch_statement.cc
+++ b/cql3/statements/batch_statement.cc
@@ -161,7 +161,7 @@ void batch_statement::validate()
                || (boost::distance(_statements
                        | boost::adaptors::transformed([] (auto&& s) { return s.statement->column_family(); })
                        | boost::adaptors::uniqued) != 1))) {
-        throw exceptions::invalid_request_exception("Batch with conditions cannot span multiple tables");
+        throw exceptions::invalid_request_exception("BATCH with conditions cannot span multiple tables");
    }
    std::optional<bool> raw_counter;
    for (auto& s : _statements) {
--- a/cql3/statements/cf_prop_defs.cc
+++ b/cql3/statements/cf_prop_defs.cc
@@ -146,6 +146,10 @@ void cf_prop_defs::validate(const database& db, const schema::extensions_map& sc
        cp.validate();
    }

+    if (auto caching_options = get_caching_options(); caching_options && !caching_options->enabled() && !db.features().cluster_supports_per_table_caching()) {
+        throw exceptions::configuration_exception(KW_CACHING + " can't contain \"'enabled':false\" unless whole cluster supports it");
+    }
+
    auto cdc_options = get_cdc_options(schema_extensions);
    if (cdc_options && cdc_options->enabled() && !db.features().cluster_supports_cdc()) {
        throw exceptions::configuration_exception("CDC not supported by the cluster");
@@ -200,6 +204,21 @@ std::optional<utils::UUID> cf_prop_defs::get_id() const {
    return std::nullopt;
 }

+std::optional<caching_options> cf_prop_defs::get_caching_options() const {
+    auto value = get(KW_CACHING);
+    if (!value) {
+        return {};
+    }
+    return std::visit(make_visitor(
+        [] (const property_definitions::map_type& map) {
+            return map.empty() ? std::nullopt : std::optional<caching_options>(caching_options::from_map(map));
+        },
+        [] (const sstring& str) {
+            return std::optional<caching_options>(caching_options::from_sstring(str));
+        }
+    ), *value);
+}
+
 const cdc::options* cf_prop_defs::get_cdc_options(const schema::extensions_map& schema_exts) const {
    auto it = schema_exts.find(cdc::cdc_extension::NAME);
    if (it == schema_exts.end()) {
@@ -286,11 +305,10 @@ void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_
        builder.set_compressor_params(compression_parameters(*compression_options));
    }

-#if 0
-    CachingOptions cachingOptions = getCachingOptions();
-    if (cachingOptions != null)
-        cfm.caching(cachingOptions);
-#endif
+    auto caching_options = get_caching_options();
+    if (caching_options) {
+        builder.set_caching_options(std::move(*caching_options));
+    }

    // for extensions that are not altered, keep the old ones
    auto& old_exts = builder.get_extensions();
--- a/cql3/statements/cf_prop_defs.hh
+++ b/cql3/statements/cf_prop_defs.hh
@@ -95,6 +95,7 @@ public:
    std::map<sstring, sstring> get_compaction_options() const;
    std::optional<std::map<sstring, sstring>> get_compression_options() const;
    const cdc::options* get_cdc_options(const schema::extensions_map&) const;
+    std::optional<caching_options> get_caching_options() const;
 #if 0
    public CachingOptions getCachingOptions() throws SyntaxException, ConfigurationException
    {
--- a/cql3/statements/delete_statement.cc
+++ b/cql3/statements/delete_statement.cc
@@ -122,7 +122,7 @@ delete_statement::prepare_internal(database& db, schema_ptr schema, variable_spe

 delete_statement::delete_statement(::shared_ptr<cf_name> name,
                                 std::unique_ptr<attributes::raw> attrs,
-                                 std::vector<::shared_ptr<operation::raw_deletion>> deletions,
+                                 std::vector<std::unique_ptr<operation::raw_deletion>> deletions,
                                 std::vector<::shared_ptr<relation>> where_clause,
                                 conditions_vector conditions,
                                 bool if_exists)
--- a/cql3/statements/list_users_statement.cc
+++ b/cql3/statements/list_users_statement.cc
@@ -90,7 +90,7 @@ cql3::statements::list_users_statement::execute(service::storage_proxy& proxy, s
            return do_for_each(sorted_roles, [&as, &results](const sstring& role) {
                return when_all_succeed(
                        as.has_superuser(role),
-                        as.underlying_role_manager().can_login(role)).then([&results, &role](bool super, bool login) {
+                        as.underlying_role_manager().can_login(role)).then_unpack([&results, &role](bool super, bool login) {
                    if (login) {
                        results->add_column_value(utf8_type->decompose(role));
                        results->add_column_value(boolean_type->decompose(super));
--- a/cql3/statements/modification_statement.cc
+++ b/cql3/statements/modification_statement.cc
@@ -51,7 +51,6 @@
 #include <boost/range/adaptor/map.hpp>
 #include <boost/range/adaptor/indirected.hpp>
 #include "db/config.hh"
-#include "service/storage_service.hh"
 #include "transport/messages/result_message.hh"
 #include "database.hh"
 #include <seastar/core/execution_stage.hh>
@@ -266,7 +265,7 @@ dht::partition_range_vector
 modification_statement::build_partition_keys(const query_options& options, const json_cache_opt& json_cache) const {
    auto keys = _restrictions->get_partition_key_restrictions()->bounds_ranges(options);
    for (auto const& k : keys) {
-        validation::validate_cql_key(s, *k.start()->value().key());
+        validation::validate_cql_key(*s, *k.start()->value().key());
    }
    return keys;
 }
--- a/cql3/statements/property_definitions.cc
+++ b/cql3/statements/property_definitions.cc
@@ -109,6 +109,13 @@ bool property_definitions::has_property(const sstring& name) const {
    return _properties.find(name) != _properties.end();
 }

+std::optional<property_definitions::value_type> property_definitions::get(const sstring& name) const {
+    if (auto it = _properties.find(name); it != _properties.end()) {
+        return it->second;
+    }
+    return std::nullopt;
+}
+
 sstring property_definitions::get_string(sstring key, sstring default_value) const {
    auto value = get_simple(key);
    if (value) {
--- a/cql3/statements/property_definitions.hh
+++ b/cql3/statements/property_definitions.hh
@@ -86,6 +86,8 @@ protected:
 public:
    bool has_property(const sstring& name) const;

+    std::optional<value_type> get(const sstring& name) const;
+
    sstring get_string(sstring key, sstring default_value) const;

    // Return a property value, typed as a Boolean
--- a/cql3/statements/raw/delete_statement.hh
+++ b/cql3/statements/raw/delete_statement.hh
@@ -55,12 +55,12 @@ namespace raw {

 class delete_statement : public modification_statement {
 private:
-    std::vector<::shared_ptr<operation::raw_deletion>> _deletions;
+    std::vector<std::unique_ptr<operation::raw_deletion>> _deletions;
    std::vector<::shared_ptr<relation>> _where_clause;
 public:
    delete_statement(::shared_ptr<cf_name> name,
           std::unique_ptr<attributes::raw> attrs,
-           std::vector<::shared_ptr<operation::raw_deletion>> deletions,
+           std::vector<std::unique_ptr<operation::raw_deletion>> deletions,
           std::vector<::shared_ptr<relation>> where_clause,
           conditions_vector conditions,
           bool if_exists);
--- a/cql3/statements/raw/update_statement.hh
+++ b/cql3/statements/raw/update_statement.hh
@@ -62,7 +62,7 @@ namespace raw {
 class update_statement : public raw::modification_statement {
 private:
    // Provided for an UPDATE
-    std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> _updates;
+    std::vector<std::pair<::shared_ptr<column_identifier::raw>, std::unique_ptr<operation::raw_update>>> _updates;
    std::vector<relation_ptr> _where_clause;
 public:
    /**
@@ -76,7 +76,7 @@ public:
     */
    update_statement(::shared_ptr<cf_name> name,
        std::unique_ptr<attributes::raw> attrs,
-        std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> updates,
+        std::vector<std::pair<::shared_ptr<column_identifier::raw>, std::unique_ptr<operation::raw_update>>> updates,
        std::vector<relation_ptr> where_clause,
        conditions_vector conditions, bool if_exists);
 protected:
--- a/cql3/statements/role-management-statements.cc
+++ b/cql3/statements/role-management-statements.cc
@@ -375,7 +375,7 @@ list_roles_statement::execute(service::storage_proxy&, service::query_state& sta
                return when_all_succeed(
                        rm.can_login(role),
                        rm.is_superuser(role),
-                        a.query_custom_options(role)).then([&results, &role](
+                        a.query_custom_options(role)).then_unpack([&results, &role](
                               bool login,
                               bool super,
                               auth::custom_options os) {
--- a/cql3/statements/select_statement.cc
+++ b/cql3/statements/select_statement.cc
@@ -59,6 +59,7 @@
 #include "db/timeout_clock.hh"
 #include "db/consistency_level_validations.hh"
 #include "database.hh"
+#include "test/lib/select_statement_utils.hh"
 #include <boost/algorithm/cxx11/any_of.hpp>

 bool is_system_keyspace(const sstring& name);
@@ -67,6 +68,8 @@ namespace cql3 {

 namespace statements {

+static constexpr int DEFAULT_INTERNAL_PAGING_SIZE = select_statement::DEFAULT_COUNT_PAGE_SIZE;
+thread_local int internal_paging_size = DEFAULT_INTERNAL_PAGING_SIZE;
 thread_local const lw_shared_ptr<const select_statement::parameters> select_statement::_default_parameters = make_lw_shared<select_statement::parameters>();

 select_statement::parameters::parameters()
@@ -333,7 +336,7 @@ select_statement::do_execute(service::storage_proxy& proxy,
    const bool aggregate = _selection->is_aggregate() || has_group_by();
    const bool nonpaged_filtering = restrictions_need_filtering && page_size <= 0;
    if (aggregate || nonpaged_filtering) {
-        page_size = DEFAULT_COUNT_PAGE_SIZE;
+        page_size = internal_paging_size;
    }

    auto key_ranges = _restrictions->get_partition_key_ranges(options);
@@ -360,7 +363,7 @@ select_statement::do_execute(service::storage_proxy& proxy,
    command->slice.options.set<query::partition_slice::option::allow_short_read>();
    auto timeout_duration = options.get_timeout_config().*get_timeout_config_selector();
    auto p = service::pager::query_pagers::pager(_schema, _selection,
-            state, options, command, std::move(key_ranges), _stats, restrictions_need_filtering ? _restrictions : nullptr);
+            state, options, command, std::move(key_ranges), restrictions_need_filtering ? _restrictions : nullptr);

    if (aggregate || nonpaged_filtering) {
        return do_with(
@@ -372,10 +375,11 @@ select_statement::do_execute(service::storage_proxy& proxy,
                                auto timeout = db::timeout_clock::now() + timeout_duration;
                                return p->fetch_page(builder, page_size, now, timeout);
                            }
-                    ).then([this, &builder, restrictions_need_filtering] {
-                        return builder.with_thread_if_needed([this, &builder, restrictions_need_filtering] {
+                    ).then([this, p, &builder, restrictions_need_filtering] {
+                        return builder.with_thread_if_needed([this, p, &builder, restrictions_need_filtering] {
                            auto rs = builder.build();
                            if (restrictions_need_filtering) {
+                                _stats.filtered_rows_read_total += p->stats().rows_read_total;
                                _stats.filtered_rows_matched_total += rs->size();
                            }
                            update_stats_rows_read(rs->size());
@@ -419,6 +423,7 @@ select_statement::do_execute(service::storage_proxy& proxy,
                }

                if (restrictions_need_filtering) {
+                    _stats.filtered_rows_read_total += p->stats().rows_read_total;
                    _stats.filtered_rows_matched_total += rs->size();
                }
                update_stats_rows_read(rs->size());
@@ -428,9 +433,7 @@ select_statement::do_execute(service::storage_proxy& proxy,
 }

 template<typename KeyType>
-GCC6_CONCEPT(
-    requires (std::is_same_v<KeyType, partition_key> || std::is_same_v<KeyType, clustering_key_prefix>)
-)
+requires (std::is_same_v<KeyType, partition_key> || std::is_same_v<KeyType, clustering_key_prefix>)
 static KeyType
 generate_base_key_from_index_pk(const partition_key& index_pk, const std::optional<clustering_key>& index_ck, const schema& base_schema, const schema& view_schema) {
    const auto& base_columns = std::is_same_v<KeyType, partition_key> ? base_schema.partition_key_columns() : base_schema.clustering_key_columns();
@@ -530,13 +533,29 @@ indexed_table_select_statement::do_execute_base_query(
            if (old_paging_state && concurrency == 1) {
                auto base_pk = generate_base_key_from_index_pk<partition_key>(old_paging_state->get_partition_key(),
                        old_paging_state->get_clustering_key(), *_schema, *_view_schema);
+                auto row_ranges = command->slice.default_row_ranges();
                if (old_paging_state->get_clustering_key() && _schema->clustering_key_size() > 0) {
                    auto base_ck = generate_base_key_from_index_pk<clustering_key>(old_paging_state->get_partition_key(),
                            old_paging_state->get_clustering_key(), *_schema, *_view_schema);
-                    command->slice.set_range(*_schema, base_pk,
-                            std::vector<query::clustering_range>{query::clustering_range::make_starting_with(range_bound<clustering_key>(base_ck, false))});
+
+                    query::trim_clustering_row_ranges_to(*_schema, row_ranges, base_ck, false);
+                    command->slice.set_range(*_schema, base_pk, row_ranges);
                } else {
-                    command->slice.set_range(*_schema, base_pk, std::vector<query::clustering_range>{query::clustering_range::make_open_ended_both_sides()});
+                    // There is no clustering key in old_paging_state and/or no clustering key in
+                    // _schema, therefore read an entire partition (whole clustering range).
+                    //
+                    // The only exception to applying no restrictions on the clustering key
+                    // is the case where we have a secondary index on the first column
+                    // of the clustering key. In such a case we should not read the
+                    // entire clustering range - only a range in which the first column
+                    // of the clustering key has the correct value.
+                    //
+                    // This means that we should not set an open_ended_both_sides
+                    // clustering range on base_pk, instead intersect it with
+                    // _row_ranges (which contains the restrictions necessary for the
+                    // case described above). The result of such intersection is just
+                    // _row_ranges, which we explicitly set on base_pk.
+                    command->slice.set_range(*_schema, base_pk, row_ranges);
                }
            }
            concurrency *= 2;
@@ -844,9 +863,7 @@ indexed_table_select_statement::indexed_table_select_statement(schema_ptr schema
 }

 template<typename KeyType>
-GCC6_CONCEPT(
-    requires (std::is_same_v<KeyType, partition_key> || std::is_same_v<KeyType, clustering_key_prefix>)
-)
+requires (std::is_same_v<KeyType, partition_key> || std::is_same_v<KeyType, clustering_key_prefix>)
 static void append_base_key_to_index_ck(std::vector<bytes_view>& exploded_index_ck, const KeyType& base_key, const column_definition& index_cdef) {
    auto key_view = base_key.view();
    auto begin = key_view.begin();
@@ -976,36 +993,41 @@ indexed_table_select_statement::do_execute(service::storage_proxy& proxy,
    const bool aggregate = _selection->is_aggregate() || has_group_by();
    if (aggregate) {
        const bool restrictions_need_filtering = _restrictions->need_filtering();
-        return do_with(cql3::selection::result_set_builder(*_selection, now, options.get_cql_serialization_format()), std::make_unique<cql3::query_options>(cql3::query_options(options)),
+        return do_with(cql3::selection::result_set_builder(*_selection, now, options.get_cql_serialization_format(), *_group_by_cell_indices), std::make_unique<cql3::query_options>(cql3::query_options(options)),
                [this, &options, &proxy, &state, now, whole_partitions, partition_slices, restrictions_need_filtering] (cql3::selection::result_set_builder& builder, std::unique_ptr<cql3::query_options>& internal_options) {
            // page size is set to the internal count page size, regardless of the user-provided value
-            internal_options.reset(new cql3::query_options(std::move(internal_options), options.get_paging_state(), DEFAULT_COUNT_PAGE_SIZE));
+            internal_options.reset(new cql3::query_options(std::move(internal_options), options.get_paging_state(), internal_paging_size));
            return repeat([this, &builder, &options, &internal_options, &proxy, &state, now, whole_partitions, partition_slices, restrictions_need_filtering] () {
-                auto consume_results = [this, &builder, &options, &internal_options, restrictions_need_filtering] (foreign_ptr<lw_shared_ptr<query::result>> results, lw_shared_ptr<query::read_command> cmd) {
+                auto consume_results = [this, &builder, &options, &internal_options, &proxy, &state, restrictions_need_filtering] (foreign_ptr<lw_shared_ptr<query::result>> results, lw_shared_ptr<query::read_command> cmd, lw_shared_ptr<const service::pager::paging_state> paging_state) {
+                    if (paging_state) {
+                        paging_state = generate_view_paging_state_from_base_query_results(paging_state, results, proxy, state, options);
+                    }
+                    internal_options.reset(new cql3::query_options(std::move(internal_options), paging_state ? make_lw_shared<service::pager::paging_state>(*paging_state) : nullptr));
                    if (restrictions_need_filtering) {
+                        _stats.filtered_rows_read_total += *results->row_count();
                        query::result_view::consume(*results, cmd->slice, cql3::selection::result_set_builder::visitor(builder, *_schema, *_selection,
                                cql3::selection::result_set_builder::restrictions_filter(_restrictions, options, cmd->row_limit, _schema, cmd->slice.partition_row_limit())));
                    } else {
                        query::result_view::consume(*results, cmd->slice, cql3::selection::result_set_builder::visitor(builder, *_schema, *_selection));
                    }
+                    bool has_more_pages = paging_state && paging_state->get_remaining() > 0;
+                    return stop_iteration(!has_more_pages);
                };

                if (whole_partitions || partition_slices) {
                    return find_index_partition_ranges(proxy, state, *internal_options).then(
                            [this, now, &state, &internal_options, &proxy, consume_results = std::move(consume_results)] (dht::partition_range_vector partition_ranges, lw_shared_ptr<const service::pager::paging_state> paging_state) {
-                        bool has_more_pages = paging_state && paging_state->get_remaining() > 0;
-                        internal_options.reset(new cql3::query_options(std::move(internal_options), paging_state ? make_lw_shared<service::pager::paging_state>(*paging_state) : nullptr));
-                        return do_execute_base_query(proxy, std::move(partition_ranges), state, *internal_options, now, std::move(paging_state)).then(consume_results).then([has_more_pages] {
-                            return stop_iteration(!has_more_pages);
+                        return do_execute_base_query(proxy, std::move(partition_ranges), state, *internal_options, now, paging_state)
+                        .then([paging_state, consume_results = std::move(consume_results)](foreign_ptr<lw_shared_ptr<query::result>> results, lw_shared_ptr<query::read_command> cmd) {
+                            return consume_results(std::move(results), std::move(cmd), std::move(paging_state));
                        });
                    });
                } else {
                    return find_index_clustering_rows(proxy, state, *internal_options).then(
                            [this, now, &state, &internal_options, &proxy, consume_results = std::move(consume_results)] (std::vector<primary_key> primary_keys, lw_shared_ptr<const service::pager::paging_state> paging_state) {
-                        bool has_more_pages = paging_state && paging_state->get_remaining() > 0;
-                        internal_options.reset(new cql3::query_options(std::move(internal_options), paging_state ? make_lw_shared<service::pager::paging_state>(*paging_state) : nullptr));
-                        return this->do_execute_base_query(proxy, std::move(primary_keys), state, *internal_options, now, std::move(paging_state)).then(consume_results).then([has_more_pages] {
-                            return stop_iteration(!has_more_pages);
+                        return this->do_execute_base_query(proxy, std::move(primary_keys), state, *internal_options, now, paging_state)
+                        .then([paging_state, consume_results = std::move(consume_results)](foreign_ptr<lw_shared_ptr<query::result>> results, lw_shared_ptr<query::read_command> cmd) {
+                            return consume_results(std::move(results), std::move(cmd), std::move(paging_state));
                        });
                    });
                }
@@ -1172,7 +1194,7 @@ indexed_table_select_statement::read_posting_list(service::storage_proxy& proxy,
    }

    auto p = service::pager::query_pagers::pager(_view_schema, selection,
-            state, options, cmd, std::move(partition_ranges), _stats, nullptr);
+            state, options, cmd, std::move(partition_ranges), nullptr);
    return p->fetch_page(options.get_page_size(), now, timeout).then([p, &options, limit, now] (std::unique_ptr<cql3::result_set> rs) {
        rs->get_metadata().set_paging_state(p->state());
        return ::make_shared<cql_transport::messages::result_message::rows>(result(std::move(rs)));
@@ -1662,6 +1684,16 @@ std::vector<size_t> select_statement::prepare_group_by(const schema& schema, sel

 }

+future<> set_internal_paging_size(int paging_size) {
+    return seastar::smp::invoke_on_all([paging_size] {
+        internal_paging_size = paging_size;
+    });
+}
+
+future<> reset_internal_paging_size() {
+    return set_internal_paging_size(DEFAULT_INTERNAL_PAGING_SIZE);
+}
+
 }

 namespace util {
--- a/cql3/statements/update_statement.cc
+++ b/cql3/statements/update_statement.cc
@@ -379,7 +379,7 @@ insert_json_statement::prepare_internal(database& db, schema_ptr schema,

 update_statement::update_statement(::shared_ptr<cf_name> name,
                                   std::unique_ptr<attributes::raw> attrs,
-                                   std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> updates,
+                                   std::vector<std::pair<::shared_ptr<column_identifier::raw>, std::unique_ptr<operation::raw_update>>> updates,
                                   std::vector<relation_ptr> where_clause,
                                   conditions_vector conditions, bool if_exists)
    : raw::modification_statement(std::move(name), std::move(attrs), std::move(conditions), false, if_exists)
--- a/cql3/tuples.hh
+++ b/cql3/tuples.hh
@@ -82,15 +82,15 @@ public:

                auto&& value = _elements[i];
                auto&& spec = component_spec_of(receiver, i);
-                if (!assignment_testable::is_assignable(value->test_assignment(db, keyspace, spec))) {
+                if (!assignment_testable::is_assignable(value->test_assignment(db, keyspace, *spec))) {
                    throw exceptions::invalid_request_exception(format("Invalid tuple literal for {}: component {:d} is not of type {}", receiver.name, i, spec->type->as_cql3_type()));
                }
            }
        }
    public:
-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override {
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override {
            try {
-                validate_assignable_to(db, keyspace, *receiver);
+                validate_assignable_to(db, keyspace, receiver);
                return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
            } catch (exceptions::invalid_request_exception& e) {
                return assignment_testable::test_result::NOT_ASSIGNABLE;
--- a/cql3/type_cast.hh
+++ b/cql3/type_cast.hh
@@ -53,10 +53,10 @@ public:
    }

    virtual shared_ptr<term> prepare(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override {
-        if (!is_assignable(_term->test_assignment(db, keyspace, casted_spec_of(db, keyspace, *receiver)))) {
+        if (!is_assignable(_term->test_assignment(db, keyspace, *casted_spec_of(db, keyspace, *receiver)))) {
            throw exceptions::invalid_request_exception(format("Cannot cast value {} to type {}", _term, _type));
        }
-        if (!is_assignable(test_assignment(db, keyspace, receiver))) {
+        if (!is_assignable(test_assignment(db, keyspace, *receiver))) {
            throw exceptions::invalid_request_exception(format("Cannot assign value {} to {} of type {}", *this, receiver->name, receiver->type->as_cql3_type()));
        }
        return _term->prepare(db, keyspace, receiver);
@@ -67,12 +67,12 @@ private:
                ::make_shared<column_identifier>(to_string(), true), _type->prepare(db, keyspace).get_type());
    }
 public:
-    virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override {
+    virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override {
        try {
            auto&& casted_type = _type->prepare(db, keyspace).get_type();
-            if (receiver->type == casted_type) {
+            if (receiver.type == casted_type) {
                return assignment_testable::test_result::EXACT_MATCH;
-            } else if (receiver->type->is_value_compatible_with(*casted_type)) {
+            } else if (receiver.type->is_value_compatible_with(*casted_type)) {
                return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
            } else {
                return assignment_testable::test_result::NOT_ASSIGNABLE;
--- a/cql3/untyped_result_set.cc
+++ b/cql3/untyped_result_set.cc
@@ -47,14 +47,14 @@
 #include "result_set.hh"
 #include "transport/messages/result_message.hh"

-cql3::untyped_result_set_row::untyped_result_set_row(const std::unordered_map<sstring, bytes_opt>& data)
+cql3::untyped_result_set_row::untyped_result_set_row(const map_t& data)
    : _data(data)
 {}

 cql3::untyped_result_set_row::untyped_result_set_row(const std::vector<lw_shared_ptr<column_specification>>& columns, std::vector<bytes_opt> data)
 : _columns(columns)
 , _data([&columns, data = std::move(data)] () mutable {
-    std::unordered_map<sstring, bytes_opt> tmp;
+    map_t tmp;
    std::transform(columns.begin(), columns.end(), data.begin(), std::inserter(tmp, tmp.end()), [](lw_shared_ptr<column_specification> c, bytes_opt& d) {
       return std::make_pair<sstring, bytes_opt>(c->name->to_string(), std::move(d));
    });
@@ -62,7 +62,7 @@ cql3::untyped_result_set_row::untyped_result_set_row(const std::vector<lw_shared
 }())
 {}

-bool cql3::untyped_result_set_row::has(const sstring& name) const {
+bool cql3::untyped_result_set_row::has(std::string_view name) const {
    auto i = _data.find(name);
    return i != _data.end() && i->second;
 }
--- a/cql3/untyped_result_set.hh
+++ b/cql3/untyped_result_set.hh
@@ -47,6 +47,8 @@
 #include "types/list.hh"
 #include "types/set.hh"
 #include "transport/messages/result_message_base.hh"
+#include "column_specification.hh"
+#include "absl-flat_hash_map.hh"

 #pragma once

@@ -55,26 +57,27 @@ namespace cql3 {
 class untyped_result_set_row {
 private:
    const std::vector<lw_shared_ptr<column_specification>> _columns;
-    const std::unordered_map<sstring, bytes_opt> _data;
+    using map_t = flat_hash_map<sstring, bytes_opt>;
+    const map_t _data;
 public:
-    untyped_result_set_row(const std::unordered_map<sstring, bytes_opt>&);
+    untyped_result_set_row(const map_t&);
    untyped_result_set_row(const std::vector<lw_shared_ptr<column_specification>>&, std::vector<bytes_opt>);
    untyped_result_set_row(untyped_result_set_row&&) = default;
    untyped_result_set_row(const untyped_result_set_row&) = delete;

-    bool has(const sstring&) const;
-    bytes_view get_view(const sstring& name) const {
+    bool has(std::string_view) const;
+    bytes_view get_view(std::string_view name) const {
        return *_data.at(name);
    }
-    bytes get_blob(const sstring& name) const {
+    bytes get_blob(std::string_view name) const {
        return bytes(get_view(name));
    }
    template<typename T>
-    T get_as(const sstring& name) const {
+    T get_as(std::string_view name) const {
        return value_cast<T>(data_type_for<T>()->deserialize(get_view(name)));
    }
    template<typename T>
-    std::optional<T> get_opt(const sstring& name) const {
+    std::optional<T> get_opt(std::string_view name) const {
        return has(name) ? get_as<T>(name) : std::optional<T>{};
    }
    bytes_view_opt get_view_opt(const sstring& name) const {
@@ -84,13 +87,13 @@ public:
        return std::nullopt;
    }
    template<typename T>
-    T get_or(const sstring& name, T t) const {
+    T get_or(std::string_view name, T t) const {
        return has(name) ? get_as<T>(name) : t;
    }
    // this could maybe be done as an overload of get_as (or something), but that just
    // muddles things for no real gain. Let user (us) attempt to know what he is doing instead.
    template<typename K, typename V, typename Iter>
-    void get_map_data(const sstring& name, Iter out, data_type keytype =
+    void get_map_data(std::string_view name, Iter out, data_type keytype =
            data_type_for<K>(), data_type valtype =
            data_type_for<V>()) const {
        auto vec =
@@ -103,7 +106,7 @@ public:
                });
    }
    template<typename K, typename V, typename ... Rest>
-    std::unordered_map<K, V, Rest...> get_map(const sstring& name,
+    std::unordered_map<K, V, Rest...> get_map(std::string_view name,
            data_type keytype = data_type_for<K>(), data_type valtype =
                    data_type_for<V>()) const {
        std::unordered_map<K, V, Rest...> res;
@@ -111,7 +114,7 @@ public:
        return res;
    }
    template<typename V, typename Iter>
-    void get_list_data(const sstring& name, Iter out, data_type valtype = data_type_for<V>()) const {
+    void get_list_data(std::string_view name, Iter out, data_type valtype = data_type_for<V>()) const {
        auto vec =
                value_cast<list_type_impl::native_type>(
                        list_type_impl::get_instance(valtype, false)->deserialize(
@@ -119,13 +122,13 @@ public:
        std::transform(vec.begin(), vec.end(), out, [](auto& v) { return value_cast<V>(v); });
    }
    template<typename V, typename ... Rest>
-    std::vector<V, Rest...> get_list(const sstring& name, data_type valtype = data_type_for<V>()) const {
+    std::vector<V, Rest...> get_list(std::string_view name, data_type valtype = data_type_for<V>()) const {
        std::vector<V, Rest...> res;
        get_list_data<V>(name, std::back_inserter(res), valtype);
        return res;
    }
    template<typename V, typename Iter>
-    void get_set_data(const sstring& name, Iter out, data_type valtype =
+    void get_set_data(std::string_view name, Iter out, data_type valtype =
                    data_type_for<V>()) const {
        auto vec =
                        value_cast<set_type_impl::native_type>(
@@ -137,7 +140,7 @@ public:
        });
    }
    template<typename V, typename ... Rest>
-    std::unordered_set<V, Rest...> get_set(const sstring& name,
+    std::unordered_set<V, Rest...> get_set(std::string_view name,
            data_type valtype =
                    data_type_for<V>()) const {
        std::unordered_set<V, Rest...> res;
--- a/cql3/user_types.cc
+++ b/cql3/user_types.cc
@@ -122,15 +122,15 @@ void user_types::literal::validate_assignable_to(database& db, const sstring& ke
        }
        const shared_ptr<term::raw>& value = _entries.at(field);
        auto&& field_spec = field_spec_of(receiver, i);
-        if (!assignment_testable::is_assignable(value->test_assignment(db, keyspace, field_spec))) {
+        if (!assignment_testable::is_assignable(value->test_assignment(db, keyspace, *field_spec))) {
            throw exceptions::invalid_request_exception(format("Invalid user type literal for {}: field {} is not of type {}", receiver.name, field, field_spec->type->as_cql3_type()));
        }
    }
 }

-assignment_testable::test_result user_types::literal::test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const {
+assignment_testable::test_result user_types::literal::test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const {
    try {
-        validate_assignable_to(db, keyspace, *receiver);
+        validate_assignable_to(db, keyspace, receiver);
        return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
    } catch (exceptions::invalid_request_exception& e) {
        return assignment_testable::test_result::NOT_ASSIGNABLE;
--- a/cql3/user_types.hh
+++ b/cql3/user_types.hh
@@ -67,7 +67,7 @@ public:
    private:
        void validate_assignable_to(database& db, const sstring& keyspace, const column_specification& receiver) const;
    public:
-        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, lw_shared_ptr<column_specification> receiver) const override;
+        virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, const column_specification& receiver) const override;
        virtual sstring assignment_testable_source_context() const override;
        virtual sstring to_string() const override;
    };
--- a/data/cell.hh
+++ b/data/cell.hh
@@ -292,12 +292,10 @@ public:
    /// \arg data needs to remain valid as long as the writer is in use.
    /// \returns imr::WriterAllocator for cell::structure.
    template<typename FragmentRange, typename = std::enable_if_t<is_fragment_range_v<std::decay_t<FragmentRange>>>>
-    GCC6_CONCEPT(
-        requires std::is_nothrow_move_constructible_v<std::decay_t<FragmentRange>> &&
-                std::is_nothrow_copy_constructible_v<std::decay_t<FragmentRange>> &&
-                std::is_nothrow_copy_assignable_v<std::decay_t<FragmentRange>> &&
-                std::is_nothrow_move_assignable_v<std::decay_t<FragmentRange>>
-    )
+    requires std::is_nothrow_move_constructible_v<std::decay_t<FragmentRange>> &&
+            std::is_nothrow_copy_constructible_v<std::decay_t<FragmentRange>> &&
+            std::is_nothrow_copy_assignable_v<std::decay_t<FragmentRange>> &&
+            std::is_nothrow_move_assignable_v<std::decay_t<FragmentRange>>
    static auto make_collection(FragmentRange data) noexcept {
        return [data = std::move(data)] (auto&& serializer, auto&& allocations) noexcept {
            return serializer
--- a/data/cell_impl.hh
+++ b/data/cell_impl.hh
@@ -86,12 +86,10 @@ public:
    { }

    template<typename Serializer, typename Allocator>
-    GCC6_CONCEPT(
-        requires (imr::is_sizer_for_v<cell::variable_value::structure, Serializer>
-                && std::is_same_v<Allocator, imr::alloc::object_allocator::sizer>)
-            || (imr::is_serializer_for_v<cell::variable_value::structure, Serializer>
-                && std::is_same_v<Allocator, imr::alloc::object_allocator::serializer>)
-    )
+    requires (imr::is_sizer_for_v<cell::variable_value::structure, Serializer>
+            && std::is_same_v<Allocator, imr::alloc::object_allocator::sizer>)
+        || (imr::is_serializer_for_v<cell::variable_value::structure, Serializer>
+            && std::is_same_v<Allocator, imr::alloc::object_allocator::serializer>)
    auto operator()(Serializer serializer, Allocator allocations) {
        auto after_size = serializer.serialize(_value_size);
        if (_force_internal || _value_size <= cell::maximum_internal_storage_length) {
@@ -134,14 +132,14 @@ public:

 inline value_writer<empty_fragment_range> cell::variable_value::write(size_t value_size, bool force_internal) noexcept
 {
-    GCC6_CONCEPT(static_assert(imr::WriterAllocator<value_writer<empty_fragment_range>, structure>));
+    static_assert(imr::WriterAllocator<value_writer<empty_fragment_range>, structure>);
    return value_writer<empty_fragment_range>(empty_fragment_range(), value_size, force_internal);
 }

 template<typename FragmentRange>
 inline value_writer<std::decay_t<FragmentRange>> cell::variable_value::write(FragmentRange&& value, bool force_internal) noexcept
 {
-    GCC6_CONCEPT(static_assert(imr::WriterAllocator<value_writer<std::decay_t<FragmentRange>>, structure>));
+    static_assert(imr::WriterAllocator<value_writer<std::decay_t<FragmentRange>>, structure>);
    return value_writer<std::decay_t<FragmentRange>>(std::forward<FragmentRange>(value), value.size_bytes(), force_internal);
 }
