materialized_views: propagate "view virtual columns" between nodes

db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed to list the same schema tables - the former is the list of their names, and the latter is the list of their schemas. This code duplication makes it easy to forget to update one of them, and indeed recently the new "view_virtual_columns" was added to all_tables() but not to ALL.

What this patch does is to make ALL a function instead of constant vector. The newly named all_table_names() function uses all_tables() so the list of schema tables only appears once.

So that nobody worries about the performance impact, all_table_names() caches the list in a per-thread vector that is only prepared once per thread.

Because after this patch all_table_names() has the "view_virtual_columns" that was previously missing, this patch also fixes #4339, which was about virtual columns in materialized views not being propagated to other nodes.

Unfortunately, to test the fix for #4339 we need a test with multiple nodes, so we cannot test it here in a unit test, and will instead use the dtest framework, in a separate patch.

Fixes #4339

Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
(cherry picked from commit 7c874057f5)
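The approach described in the commit message, deriving the name list from all_tables() and caching it per thread, can be sketched roughly as follows. This is only an illustration, not the code from the patch: the `table_schema` struct, the table names other than "view_virtual_columns", and the `contains_table()` helper are hypothetical stand-ins, and the real all_tables() returns schema objects rather than this simplified struct.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema object (assumption, not the real type).
struct table_schema {
    std::string name;
};

// Single source of truth: the schema tables are listed only here.
// The entries other than "view_virtual_columns" are illustrative.
std::vector<table_schema> all_tables() {
    return {
        {"keyspaces"}, {"tables"}, {"columns"},
        {"views"}, {"view_virtual_columns"},
    };
}

// Names are derived from all_tables(), so the two lists cannot drift apart.
// The thread_local static is initialized on first use in each thread and
// then reused, so the list is prepared only once per thread.
const std::vector<std::string>& all_table_names() {
    static thread_local const std::vector<std::string> names = [] {
        const auto tables = all_tables();
        std::vector<std::string> result;
        result.reserve(tables.size());
        for (const auto& t : tables) {
            result.push_back(t.name);
        }
        // Sanity check: every schema contributed exactly one name.
        assert(result.size() == tables.size());
        return result;
    }();
    return names;
}

// Hypothetical helper used only to demonstrate a lookup by name.
bool contains_table(const std::string& name) {
    for (const auto& n : all_table_names()) {
        if (n == name) {
            return true;
        }
    }
    return false;
}
```

With this shape, a table added to all_tables() shows up in all_table_names() automatically, which is the property whose absence caused "view_virtual_columns" to be missed.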
cql: alter type: Format field name as text instead of hex
2020-01-06 00:37:59 +02:00 · 2020-01-05 18:55:40 +02:00 · 2020-01-05 18:50:27 +02:00 · 2019-12-24 18:42:33 +02:00 · 2019-12-24 17:44:40 +02:00 · 2019-12-24 17:44:40 +02:00
1438 changed files with 65590 additions and 19812 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -1,3 +1,9 @@
+This is Scylla's bug tracker, to be used for reporting bugs only.
+If you have a question about Scylla, and not a bug, please ask it in
+our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.
+
+- [ ] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.
+
 *Installation details*
 Scylla version (or git commit hash):
 Cluster size:
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,4 @@
+Scylla doesn't use pull requests; please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.
+See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.
+
+If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).
--- a/.gitignore
+++ b/.gitignore
@@ -18,3 +18,4 @@ CMakeLists.txt.user
 *.egg-info
 __pycache__CMakeLists.txt.user
 .gdbinit
+resources
--- a/.gitmodules
+++ b/.gitmodules
@@ -6,9 +6,9 @@
 	path = swagger-ui
 	url = ../scylla-swagger-ui
 	ignore = dirty
-[submodule "dist/ami/files/scylla-ami"]
-	path = dist/ami/files/scylla-ami
-	url = ../scylla-ami
 [submodule "xxHash"]
 	path = xxHash
 	url = ../xxHash
+[submodule "libdeflate"]
+	path = libdeflate
+	url = ../libdeflate
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -138,4 +138,5 @@ target_include_directories(scylla PUBLIC
        ${SEASTAR_INCLUDE_DIRS}
        ${Boost_INCLUDE_DIRS}
        xxhash
+        libdeflate
        build/release/gen)
--- a/HACKING.md
+++ b/HACKING.md
@@ -20,7 +20,7 @@ $ git submodule update --init --recursive

 Scylla depends on the system package manager for its development dependencies.

-Running `./install_dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
+Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.

 ### Build system

--- a/README.md
+++ b/README.md
@@ -50,12 +50,12 @@ Then, to build an RPM, run:
 ./dist/redhat/build_rpm.sh
 ```

-The built RPM is stored in ``/var/lib/mock/<configuration>/result`` directory.
+The built RPM is stored in the ``build/mock/<configuration>/result`` directory.
 For example, on Fedora 21 mock reports the following:

 ```
 INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
-INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result
+INFO: Results and/or logs in: build/mock/fedora-21-x86_64/result
 ```

 ## Building Fedora-based Docker image
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 #!/bin/sh

-VERSION=2.2.2
+VERSION=3.0.11

 if test -f version
 then
--- a/api/api-doc/column_family.json
+++ b/api/api-doc/column_family.json
@@ -455,7 +455,7 @@
         "operations":[
            {
               "method":"GET",
-               "summary":"Returns a list of filenames that contain the given key on this node",
+               "summary":"Returns a list of sstable filenames that contain the given partition key on this node",
               "type":"array",
               "items":{
                  "type":"string"
@@ -475,7 +475,7 @@
                  },
                  {
                     "name":"key",
-                     "description":"The key",
+                     "description":"The partition key. In a composite-key scenario, use ':' to separate the columns in the key.",
                     "required":true,
                     "allowMultiple":false,
                     "type":"string",
--- a/api/api-doc/config.json
+++ b/api/api-doc/config.json
@@ -0,0 +1,30 @@
+"/v2/config/{id}": {
+      "get": {
+        "description": "Return a config value",
+        "operationId": "find_config_id",
+        "produces": [
+          "application/json"
+        ],
+        "tags": ["config"],
+        "parameters": [
+          {
+            "name": "id",
+            "in": "path",
+            "description": "ID of config to return",
+            "required": true,
+            "type": "string"
+          }
+        ],
+        "responses": {
+          "200": {
+            "description": "Config value"
+          },
+          "default": {
+            "description": "unexpected error",
+            "schema": {
+              "$ref": "#/definitions/ErrorModel"
+            }
+          }
+        }
+      }
+}
--- a/api/api-doc/storage_service.json
+++ b/api/api-doc/storage_service.json
@@ -2129,6 +2129,41 @@
               ]
            }
         ]
+      },
+      {
+         "path":"/storage_service/view_build_statuses/{keyspace}/{view}",
+         "operations":[
+            {
+               "method":"GET",
+               "summary":"Gets the progress of a materialized view build",
+               "type":"array",
+               "items":{
+                  "type":"mapper"
+               },
+               "nickname":"view_build_statuses",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[
+                  {
+                     "name":"keyspace",
+                     "description":"The keyspace",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                  },
+                  {
+                     "name":"view",
+                     "description":"View name",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                  }
+               ]
+            }
+         ]
      }
   ],
   "models":{
--- a/api/api.cc
+++ b/api/api.cc
@@ -39,6 +39,7 @@
 #include "http/exception.hh"
 #include "stream_manager.hh"
 #include "system.hh"
+#include "api/config.hh"

 namespace api {

@@ -65,6 +66,7 @@ future<> set_server_init(http_context& ctx) {
        rb->set_api_doc(r);
        rb02->set_api_doc(r);
        rb02->register_api_file(r, "swagger20_header");
+        set_config(rb02, ctx, r);
        rb->register_function(r, "system",
                "The system related API");
        set_system(ctx, r);
--- a/api/column_family.cc
+++ b/api/column_family.cc
@@ -429,7 +429,7 @@ void set_column_family(http_context& ctx, routes& r) {
        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
            utils::estimated_histogram res(0);
            for (auto i: *cf.get_sstables() ) {
-                res.merge(i->get_stats_metadata().estimated_column_count);
+                res.merge(i->get_stats_metadata().estimated_cells_count);
            }
            return res;
        },
@@ -905,5 +905,20 @@ void set_column_family(http_context& ctx, routes& r) {
            return make_ready_future<json::json_return_type>(res);
        });
    });
+
+    cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<request> req) {
+        auto key = req->get_query_param("key");
+        auto uuid = get_uuid(req->param["name"], ctx.db.local());
+
+        return ctx.db.map_reduce0([key, uuid] (database& db) {
+            return db.find_column_family(uuid).get_sstables_by_partition_key(key);
+        }, std::unordered_set<sstring>(),
+            [](std::unordered_set<sstring> a, std::unordered_set<sstring>&& b) mutable {
+            a.insert(b.begin(),b.end());
+            return a;
+        }).then([](const std::unordered_set<sstring>& res) {
+            return make_ready_future<json::json_return_type>(container_to_vec(res));
+        });
+    });
 }
 }
--- a/api/column_family.hh
+++ b/api/column_family.hh
@@ -24,6 +24,7 @@
 #include "api.hh"
 #include "api/api-doc/column_family.json.hh"
 #include "database.hh"
+#include <any>

 namespace api {

@@ -37,9 +38,15 @@ template<class Mapper, class I, class Reducer>
 future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
        Mapper mapper, Reducer reducer) {
    auto uuid = get_uuid(name, ctx.db.local());
-    return ctx.db.map_reduce0([mapper, uuid](database& db) {
-        return mapper(db.find_column_family(uuid));
-    }, init, reducer);
+    using mapper_type = std::function<std::any (database&)>;
+    using reducer_type = std::function<std::any (std::any, std::any)>;
+    return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {
+        return I(mapper(db.find_column_family(uuid)));
+    }), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
+        return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
+    })).then([] (std::any r) {
+        return std::any_cast<I>(std::move(r));
+    });
 }


@@ -51,35 +58,42 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
    });
 }

-template<class Mapper, class I, class Reducer, class Result>
-future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
-        Mapper mapper, Reducer reducer, Result result) {
-    auto uuid = get_uuid(name, ctx.db.local());
-    return ctx.db.map_reduce0([mapper, uuid](database& db) {
-        return mapper(db.find_column_family(uuid));
-    }, init, reducer);
-}
-
-
 template<class Mapper, class I, class Reducer, class Result>
 future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
        Mapper mapper, Reducer reducer, Result result) {
-    return map_reduce_cf_raw(ctx, name, init, mapper, reducer, result).then([result](const I& res) mutable {
+    return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([result](const I& res) mutable {
        result = res;
        return make_ready_future<json::json_return_type>(result);
    });
 }

-template<class Mapper, class I, class Reducer>
-future<I> map_reduce_cf_raw(http_context& ctx, I init,
-        Mapper mapper, Reducer reducer) {
-    return ctx.db.map_reduce0([mapper, init, reducer](database& db) {
+struct map_reduce_column_families_locally {
+    std::any init;
+    std::function<std::any (column_family&)> mapper;
+    std::function<std::any (std::any, std::any)> reducer;
+    std::any operator()(database& db) const {
        auto res = init;
        for (auto i : db.get_column_families()) {
            res = reducer(res, mapper(*i.second.get()));
        }
        return res;
-    }, init, reducer);
+    }
+};
+
+template<class Mapper, class I, class Reducer>
+future<I> map_reduce_cf_raw(http_context& ctx, I init,
+        Mapper mapper, Reducer reducer) {
+    using mapper_type = std::function<std::any (column_family&)>;
+    using reducer_type = std::function<std::any (std::any, std::any)>;
+    auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {
+        return I(mapper(cf));
+    });
+    auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
+        return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
+    });
+    return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {
+        return std::any_cast<I>(std::move(res));
+    });
 }


--- a/api/config.cc
+++ b/api/config.cc
@@ -0,0 +1,112 @@
+/*
+ * Copyright 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "api/config.hh"
+#include "api/api-doc/config.json.hh"
+#include "db/config.hh"
+#include <sstream>
+#include <boost/algorithm/string/replace.hpp>
+
+namespace api {
+
+template<class T>
+json::json_return_type get_json_return_type(const T& val) {
+    return json::json_return_type(val);
+}
+
+/*
+ * As noted in the comment on db::seed_provider_type, it is not used
+ * and probably never will be.
+ *
+ * Just in case, we return its class name.
+ */
+template<>
+json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
+    return json::json_return_type(val.class_name);
+}
+
+std::string format_type(const std::string& type) {
+    if (type == "int") {
+        return "integer";
+    }
+    return type;
+}
+
+future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {
+    std::stringstream ss;
+    if (first) {
+        first=false;
+    } else {
+        ss <<',';
+    };
+    ss << "\"/config/" << name <<"\": {"
+      "\"get\": {"
+        "\"description\": \"" << boost::replace_all_copy(boost::replace_all_copy(boost::replace_all_copy(description,"\n","\\n"),"\"", "''"), "\t", " ") <<"\","
+        "\"operationId\": \"find_config_"<< name <<"\","
+        "\"produces\": ["
+          "\"application/json\""
+        "],"
+        "\"tags\": [\"config\"],"
+        "\"parameters\": ["
+        "],"
+        "\"responses\": {"
+          "\"200\": {"
+            "\"description\": \"Config value\","
+             "\"schema\": {"
+               "\"type\": \"" << format_type(type) << "\""
+             "}"
+          "},"
+          "\"default\": {"
+            "\"description\": \"unexpected error\","
+            "\"schema\": {"
+              "\"$ref\": \"#/definitions/ErrorModel\""
+            "}"
+          "}"
+        "}"
+      "}"
+    "}";
+    return os.write(ss.str());
+}
+
+namespace cs = httpd::config_json;
+#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}
+
+
+#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});
+
+void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {
+    rb->register_function(r, [] (output_stream<char>& os) {
+        return do_with(true, [&os] (bool& first) {
+            auto f = make_ready_future();
+            _make_config_values(_get_config_description)
+            return f;
+        });
+    });
+
+    cs::find_config_id.set(r, [&ctx] (const_req r) {
+        auto id = r.param["id"];
+        _make_config_values(_get_config_value)
+        throw bad_param_exception(sstring("No such config entry: ") + id);
+    });
+}
+
+}
+
--- a/api/config.hh
+++ b/api/config.hh
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "api.hh"
+#include <seastar/http/api_docs.hh>
+
+namespace api {
+
+void set_config(std::shared_ptr<api_registry_builder20> rb, http_context& ctx, routes& r);
+}
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -78,15 +78,17 @@ void set_storage_service(http_context& ctx, routes& r) {
        });
    });

-    ss::get_tokens.set(r, [] (const_req req) {
-        auto tokens = service::get_local_storage_service().get_token_metadata().sorted_tokens();
-        return container_to_vec(tokens);
+    ss::get_tokens.set(r, [] (std::unique_ptr<request> req) {
+        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().sorted_tokens(), [](const dht::token& i) {
+           return boost::lexical_cast<std::string>(i);
+        }));
    });

-    ss::get_node_tokens.set(r, [] (const_req req) {
-        gms::inet_address addr(req.param["endpoint"]);
-        auto tokens = service::get_local_storage_service().get_token_metadata().get_tokens(addr);
-        return container_to_vec(tokens);
+    ss::get_node_tokens.set(r, [] (std::unique_ptr<request> req) {
+        gms::inet_address addr(req->param["endpoint"]);
+        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().get_token_metadata().get_tokens(addr), [](const dht::token& i) {
+           return boost::lexical_cast<std::string>(i);
+       }));
    });

    ss::get_commitlog.set(r, [&ctx](const_req req) {
@@ -107,11 +109,7 @@ void set_storage_service(http_context& ctx, routes& r) {
    });

    ss::get_moving_nodes.set(r, [](const_req req) {
-        auto points = service::get_local_storage_service().get_token_metadata().get_moving_endpoints();
        std::unordered_set<sstring> addr;
-        for (auto i: points) {
-            addr.insert(boost::lexical_cast<std::string>(i.second));
-        }
        return container_to_vec(addr);
    });

@@ -852,6 +850,15 @@ void set_storage_service(http_context& ctx, routes& r) {
            return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
        });
    });
+
+    ss::view_build_statuses.set(r, [&ctx] (std::unique_ptr<request> req) {
+        auto keyspace = validate_keyspace(ctx, req->param);
+        auto view = req->param["view"];
+        return service::get_local_storage_service().view_build_statuses(std::move(keyspace), std::move(view)).then([] (std::unordered_map<sstring, sstring> status) {
+            std::vector<storage_service_json::mapper> res;
+            return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
+        });
+    });
 }

 }
--- a/atomic_cell.cc
+++ b/atomic_cell.cc
@@ -0,0 +1,258 @@
+/*
+ * Copyright (C) 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "atomic_cell.hh"
+#include "atomic_cell_or_collection.hh"
+#include "types.hh"
+
+/// LSA migrator for cells with irrelevant type
+///
+///
+const data::type_imr_descriptor& no_type_imr_descriptor() {
+    static thread_local data::type_imr_descriptor state(data::type_info::make_variable_size());
+    return state;
+}
+
+atomic_cell atomic_cell::make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
+    auto& imr_data = no_type_imr_descriptor();
+    return atomic_cell(
+            imr_data.type_info(),
+            imr_object_type::make(data::cell::make_dead(timestamp, deletion_time), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value, atomic_cell::collection_member cm) {
+    auto& imr_data = type.imr_state();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value, atomic_cell::collection_member cm) {
+    auto& imr_data = type.imr_state();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value, collection_member cm)
+{
+    auto& imr_data = type.imr_state();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, bool(cm)), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
+                             gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
+    auto& imr_data = type.imr_state();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
+                             gc_clock::time_point expiry, gc_clock::duration ttl, atomic_cell::collection_member cm) {
+    auto& imr_data = type.imr_state();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
+                                   gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm)
+{
+    auto& imr_data = type.imr_state();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live(imr_data.type_info(), timestamp, value, expiry, ttl, bool(cm)), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
+    auto& imr_data = no_type_imr_descriptor();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live_counter_update(timestamp, value), &imr_data.lsa_migrator())
+    );
+}
+
+atomic_cell atomic_cell::make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size) {
+    auto& imr_data = no_type_imr_descriptor();
+    return atomic_cell(
+        imr_data.type_info(),
+        imr_object_type::make(data::cell::make_live_uninitialized(imr_data.type_info(), timestamp, size), &imr_data.lsa_migrator())
+    );
+}
+
+static imr::utils::object<data::cell::structure> copy_cell(const data::type_imr_descriptor& imr_data, const uint8_t* ptr)
+{
+    using imr_object_type = imr::utils::object<data::cell::structure>;
+
+    // If the cell doesn't own any memory it is trivial and can be copied with
+    // memcpy.
+    auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
+    if (!f.template get<data::cell::tags::external_data>()) {
+        data::cell::context ctx(f, imr_data.type_info());
+        // XXX: We may be better off storing the total cell size in memory. Measure!
+        auto size = data::cell::structure::serialized_object_size(ptr, ctx);
+        return imr_object_type::make_raw(size, [&] (uint8_t* dst) noexcept {
+            std::copy_n(ptr, size, dst);
+        }, &imr_data.lsa_migrator());
+    }
+
+    return imr_object_type::make(data::cell::copy_fn(imr_data.type_info(), ptr), &imr_data.lsa_migrator());
+}
+
+atomic_cell::atomic_cell(const abstract_type& type, atomic_cell_view other)
+    : atomic_cell(type.imr_state().type_info(),
+                  copy_cell(type.imr_state(), other._view.raw_pointer()))
+{ }
+
+atomic_cell_or_collection atomic_cell_or_collection::copy(const abstract_type& type) const {
+    if (!_data.get()) {
+        return atomic_cell_or_collection();
+    }
+    auto& imr_data = type.imr_state();
+    return atomic_cell_or_collection(
+        copy_cell(imr_data, _data.get())
+    );
+}
+
+atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type, atomic_cell_view acv)
+    : _data(copy_cell(type.imr_state(), acv._view.raw_pointer()))
+{
+}
+
+static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
+{
+    auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
+    auto ti = data::type_info::make_collection();
+    data::cell::context ctx(f, ti);
+    auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
+    auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
+    return collection_mutation_view { dv };
+}
+
+collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
+    return get_collection_mutation_view(_data.get());
+}
+
+collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
+    : _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
+{
+}
+
+collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
+    : _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
+{
+}
+
+collection_mutation::operator collection_mutation_view() const
+{
+    return get_collection_mutation_view(_data.get());
+}
+
+bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
+{
+    auto ptr_a = _data.get();
+    auto ptr_b = other._data.get();
+
+    if (!ptr_a || !ptr_b) {
+        return !ptr_a && !ptr_b;
+    }
+
+    if (type.is_atomic()) {
+        auto a = atomic_cell_view::from_bytes(type.imr_state().type_info(), _data);
+        auto b = atomic_cell_view::from_bytes(type.imr_state().type_info(), other._data);
+        if (a.timestamp() != b.timestamp()) {
+            return false;
+        }
+        if (a.is_live()) {
+            if (!b.is_live()) {
+                return false;
+            }
+            if (a.is_counter_update()) {
+                if (!b.is_counter_update()) {
+                    return false;
+                }
+                return a.counter_update_value() == b.counter_update_value();
+            }
+            if (a.is_live_and_has_ttl()) {
+                if (!b.is_live_and_has_ttl()) {
+                    return false;
+                }
+                if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {
+                    return false;
+                }
+            }
+            return a.value() == b.value();
+        }
+        return a.deletion_time() == b.deletion_time();
+    } else {
+        return as_collection_mutation().data == other.as_collection_mutation().data;
+    }
+}
+
+size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t) const
+{
+    if (!_data.get()) {
+        return 0;
+    }
+    auto ctx = data::cell::context(_data.get(), t.imr_state().type_info());
+
+    auto view = data::cell::structure::make_view(_data.get(), ctx);
+    auto flags = view.get<data::cell::tags::flags>();
+
+    size_t external_value_size = 0;
+    if (flags.get<data::cell::tags::external_data>()) {
+        if (flags.get<data::cell::tags::collection>()) {
+            external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
+        } else {
+            auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
+            external_value_size = cell_view.value_size();
+        }
+        // Add overhead of chunk headers. The last one is a special case.
+        external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;
+        external_value_size += data::cell::external_last_chunk_overhead;
+    }
+    return data::cell::structure::serialized_object_size(_data.get(), ctx)
+        + imr_object_type::size_overhead + external_value_size;
+}
+
+std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {
+    if (!c._data.get()) {
+        return os << "{ null atomic_cell_or_collection }";
+    }
+    using dc = data::cell;
+    os << "{ ";
+    if (dc::structure::get_member<dc::tags::flags>(c._data.get()).get<dc::tags::collection>()) {
+        os << "collection";
+    } else {
+        os << "atomic cell";
+    }
+    return os << " @" << static_cast<const void*>(c._data.get()) << " }";
+}
--- a/atomic_cell.hh
+++ b/atomic_cell.hh
@@ -30,189 +30,51 @@
 #include <cstdint>
 #include <iosfwd>
 #include <seastar/util/gcc6-concepts.hh>
+#include "data/cell.hh"
+#include "data/schema_info.hh"
+#include "imr/utils.hh"
+#include "utils/fragmented_temporary_buffer.hh"

-template<typename T, typename Input>
-static inline
-void set_field(Input& v, unsigned offset, T val) {
-    reinterpret_cast<net::packed<T>*>(v.begin() + offset)->raw = net::hton(val);
-}
+#include "serializer.hh"

-template<typename T>
-static inline
-T get_field(const bytes_view& v, unsigned offset) {
-    return net::ntoh(*reinterpret_cast<const net::packed<T>*>(v.begin() + offset));
-}
+class abstract_type;
+class collection_type_impl;

-class atomic_cell_or_collection;
+using atomic_cell_value_view = data::value_view;
+using atomic_cell_value_mutable_view = data::value_mutable_view;

-/*
- * Represents atomic cell layout. Works on serialized form.
- *
- * Layout:
- *
- *  <live>  := <int8_t:flags><int64_t:timestamp>(<int32_t:expiry><int32_t:ttl>)?<value>
- *  <dead>  := <int8_t:    0><int64_t:timestamp><int32_t:deletion_time>
- */
-class atomic_cell_type final {
-private:
-    static constexpr int8_t LIVE_FLAG = 0x01;
-    static constexpr int8_t EXPIRY_FLAG = 0x02; // When present, expiry field is present. Set only for live cells
-    static constexpr int8_t COUNTER_UPDATE_FLAG = 0x08; // Cell is a counter update.
-    static constexpr int8_t COUNTER_IN_PLACE_REVERT = 0x10;
-    static constexpr unsigned flags_size = 1;
-    static constexpr unsigned timestamp_offset = flags_size;
-    static constexpr unsigned timestamp_size = 8;
-    static constexpr unsigned expiry_offset = timestamp_offset + timestamp_size;
-    static constexpr unsigned expiry_size = 4;
-    static constexpr unsigned deletion_time_offset = timestamp_offset + timestamp_size;
-    static constexpr unsigned deletion_time_size = 4;
-    static constexpr unsigned ttl_offset = expiry_offset + expiry_size;
-    static constexpr unsigned ttl_size = 4;
-    friend class counter_cell_builder;
-private:
-    static bool is_counter_update(bytes_view cell) {
-        return cell[0] & COUNTER_UPDATE_FLAG;
-    }
-    static bool is_counter_in_place_revert_set(bytes_view cell) {
-        return cell[0] & COUNTER_IN_PLACE_REVERT;
-    }
-    template<typename BytesContainer>
-    static void set_counter_in_place_revert(BytesContainer& cell, bool flag) {
-        cell[0] = (cell[0] & ~COUNTER_IN_PLACE_REVERT) | (flag * COUNTER_IN_PLACE_REVERT);
-    }
-    static bool is_live(const bytes_view& cell) {
-        return cell[0] & LIVE_FLAG;
-    }
-    static bool is_live_and_has_ttl(const bytes_view& cell) {
-        return cell[0] & EXPIRY_FLAG;
-    }
-    static bool is_dead(const bytes_view& cell) {
-        return !is_live(cell);
-    }
-    // Can be called on live and dead cells
-    static api::timestamp_type timestamp(const bytes_view& cell) {
-        return get_field<api::timestamp_type>(cell, timestamp_offset);
-    }
-    template<typename BytesContainer>
-    static void set_timestamp(BytesContainer& cell, api::timestamp_type ts) {
-        set_field(cell, timestamp_offset, ts);
-    }
-    // Can be called on live cells only
-private:
-    template<typename BytesView>
-    static BytesView do_get_value(BytesView cell) {
-        auto expiry_field_size = bool(cell[0] & EXPIRY_FLAG) * (expiry_size + ttl_size);
-        auto value_offset = flags_size + timestamp_size + expiry_field_size;
-        cell.remove_prefix(value_offset);
-        return cell;
-    }
-public:
-    static bytes_view value(bytes_view cell) {
-        return do_get_value(cell);
-    }
-    static bytes_mutable_view value(bytes_mutable_view cell) {
-        return do_get_value(cell);
-    }
-    // Can be called on live counter update cells only
-    static int64_t counter_update_value(bytes_view cell) {
-        return get_field<int64_t>(cell, flags_size + timestamp_size);
-    }
-    // Can be called only when is_dead() is true.
-    static gc_clock::time_point deletion_time(const bytes_view& cell) {
-        assert(is_dead(cell));
-        return gc_clock::time_point(gc_clock::duration(
-            get_field<int32_t>(cell, deletion_time_offset)));
-    }
-    // Can be called only when is_live_and_has_ttl() is true.
-    static gc_clock::time_point expiry(const bytes_view& cell) {
-        assert(is_live_and_has_ttl(cell));
-        auto expiry = get_field<int32_t>(cell, expiry_offset);
-        return gc_clock::time_point(gc_clock::duration(expiry));
-    }
-    // Can be called only when is_live_and_has_ttl() is true.
-    static gc_clock::duration ttl(const bytes_view& cell) {
-        assert(is_live_and_has_ttl(cell));
-        return gc_clock::duration(get_field<int32_t>(cell, ttl_offset));
-    }
-    static managed_bytes make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
-        managed_bytes b(managed_bytes::initialized_later(), flags_size + timestamp_size + deletion_time_size);
-        b[0] = 0;
-        set_field(b, timestamp_offset, timestamp);
-        set_field(b, deletion_time_offset, deletion_time.time_since_epoch().count());
-        return b;
-    }
-    static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value) {
-        auto value_offset = flags_size + timestamp_size;
-        managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
-        b[0] = LIVE_FLAG;
-        set_field(b, timestamp_offset, timestamp);
-        std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
-        return b;
-    }
-    static managed_bytes make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
-        auto value_offset = flags_size + timestamp_size;
-        managed_bytes b(managed_bytes::initialized_later(), value_offset + sizeof(value));
-        b[0] = LIVE_FLAG | COUNTER_UPDATE_FLAG;
-        set_field(b, timestamp_offset, timestamp);
-        set_field(b, value_offset, value);
-        return b;
-    }
-    static managed_bytes make_live(api::timestamp_type timestamp, bytes_view value, gc_clock::time_point expiry, gc_clock::duration ttl) {
-        auto value_offset = flags_size + timestamp_size + expiry_size + ttl_size;
-        managed_bytes b(managed_bytes::initialized_later(), value_offset + value.size());
-        b[0] = EXPIRY_FLAG | LIVE_FLAG;
-        set_field(b, timestamp_offset, timestamp);
-        set_field(b, expiry_offset, expiry.time_since_epoch().count());
-        set_field(b, ttl_offset, ttl.count());
-        std::copy_n(value.begin(), value.size(), b.begin() + value_offset);
-        return b;
-    }
-    // make_live_from_serializer() is intended for users that need to serialise
-    // some object or objects to the format used in atomic_cell::value().
-    // With just make_live() the patter would look like follows:
-    // 1. allocate a buffer and write to it serialised objects
-    // 2. pass that buffer to make_live()
-    // 3. make_live() needs to prepend some metadata to the cell value so it
-    //    allocates a new buffer and copies the content of the original one
-    //
-    // The allocation and copy of a buffer can be avoided.
-    // make_live_from_serializer() allows the user code to specify the timestamp
-    // and size of the cell value as well as provide the serialiser function
-    // object, which would write the serialised value of the cell to the buffer
-    // given to it by make_live_from_serializer().
-    template<typename Serializer>
-    GCC6_CONCEPT(requires requires(Serializer serializer, bytes::iterator it) {
-        serializer(it);
-    })
-    static managed_bytes make_live_from_serializer(api::timestamp_type timestamp, size_t size, Serializer&& serializer) {
-        auto value_offset = flags_size + timestamp_size;
-        managed_bytes b(managed_bytes::initialized_later(), value_offset + size);
-        b[0] = LIVE_FLAG;
-        set_field(b, timestamp_offset, timestamp);
-        serializer(b.begin() + value_offset);
-        return b;
-    }
-    template<typename ByteContainer>
-    friend class atomic_cell_base;
+/// View of an atomic cell
+template<mutable_view is_mutable>
+class basic_atomic_cell_view {
+protected:
+    data::cell::basic_atomic_cell_view<is_mutable> _view;
    friend class atomic_cell;
-};
+public:
+    using pointer_type = std::conditional_t<is_mutable == mutable_view::no, const uint8_t*, uint8_t*>;
+protected:
+    explicit basic_atomic_cell_view(data::cell::basic_atomic_cell_view<is_mutable> v)
+        : _view(std::move(v)) { }
+
+    basic_atomic_cell_view(const data::type_info& ti, pointer_type ptr)
+        : _view(data::cell::make_atomic_cell_view(ti, ptr))
+    { }

-template<typename ByteContainer>
-class atomic_cell_base {
-protected:
-    ByteContainer _data;
-protected:
-    atomic_cell_base(ByteContainer&& data) : _data(std::forward<ByteContainer>(data)) { }
    friend class atomic_cell_or_collection;
 public:
-    bool is_counter_update() const {
-        return atomic_cell_type::is_counter_update(_data);
+    operator basic_atomic_cell_view<mutable_view::no>() const noexcept {
+        return basic_atomic_cell_view<mutable_view::no>(_view);
    }
-    bool is_counter_in_place_revert_set() const {
-        return atomic_cell_type::is_counter_in_place_revert_set(_data);
+
+    void swap(basic_atomic_cell_view& other) noexcept {
+        using std::swap;
+        swap(_view, other._view);
+    }
+
+    bool is_counter_update() const {
+        return _view.is_counter_update();
    }
    bool is_live() const {
-        return atomic_cell_type::is_live(_data);
+        return _view.is_live();
    }
    bool is_live(tombstone t, bool is_counter) const {
        return is_live() && !is_covered_by(t, is_counter);
@@ -221,122 +83,140 @@ public:
        return is_live() && !is_covered_by(t, is_counter) && !has_expired(now);
    }
    bool is_live_and_has_ttl() const {
-        return atomic_cell_type::is_live_and_has_ttl(_data);
+        return _view.is_expiring();
    }
    bool is_dead(gc_clock::time_point now) const {
-        return atomic_cell_type::is_dead(_data) || has_expired(now);
+        return !is_live() || has_expired(now);
    }
    bool is_covered_by(tombstone t, bool is_counter) const {
        return timestamp() <= t.timestamp || (is_counter && t.timestamp != api::missing_timestamp);
    }
    // Can be called on live and dead cells
    api::timestamp_type timestamp() const {
-        return atomic_cell_type::timestamp(_data);
+        return _view.timestamp();
    }
    void set_timestamp(api::timestamp_type ts) {
-        atomic_cell_type::set_timestamp(_data, ts);
+        _view.set_timestamp(ts);
    }
    // Can be called on live cells only
-    auto value() const {
-        return atomic_cell_type::value(_data);
+    data::basic_value_view<is_mutable> value() const {
+        return _view.value();
+    }
+    // Can be called on live cells only
+    size_t value_size() const {
+        return _view.value_size();
+    }
+    bool is_value_fragmented() const {
+        return _view.is_value_fragmented();
    }
    // Can be called on live counter update cells only
    int64_t counter_update_value() const {
-        return atomic_cell_type::counter_update_value(_data);
+        return _view.counter_update_value();
    }
    // Can be called only when is_dead(gc_clock::time_point)
    gc_clock::time_point deletion_time() const {
-        return !is_live() ? atomic_cell_type::deletion_time(_data) : expiry() - ttl();
+        return !is_live() ? _view.deletion_time() : expiry() - ttl();
    }
    // Can be called only when is_live_and_has_ttl()
    gc_clock::time_point expiry() const {
-        return atomic_cell_type::expiry(_data);
+        return _view.expiry();
    }
    // Can be called only when is_live_and_has_ttl()
    gc_clock::duration ttl() const {
-        return atomic_cell_type::ttl(_data);
+        return _view.ttl();
    }
    // Can be called on live and dead cells
    bool has_expired(gc_clock::time_point now) const {
        return is_live_and_has_ttl() && expiry() <= now;
    }
+
    bytes_view serialize() const {
-        return _data;
-    }
-    void set_counter_in_place_revert(bool flag) {
-        atomic_cell_type::set_counter_in_place_revert(_data, flag);
+        return _view.serialize();
    }
 };

-class atomic_cell_view final : public atomic_cell_base<bytes_view> {
-    atomic_cell_view(bytes_view data) : atomic_cell_base(std::move(data)) {}
-public:
-    static atomic_cell_view from_bytes(bytes_view data) { return atomic_cell_view(data); }
+class atomic_cell_view final : public basic_atomic_cell_view<mutable_view::no> {
+    atomic_cell_view(const data::type_info& ti, const uint8_t* data)
+        : basic_atomic_cell_view<mutable_view::no>(ti, data) {}

+    template<mutable_view is_mutable>
+    atomic_cell_view(data::cell::basic_atomic_cell_view<is_mutable> view)
+        : basic_atomic_cell_view<mutable_view::no>(view) { }
    friend class atomic_cell;
+public:
+    static atomic_cell_view from_bytes(const data::type_info& ti, const imr::utils::object<data::cell::structure>& data) {
+        return atomic_cell_view(ti, data.get());
+    }
+
+    static atomic_cell_view from_bytes(const data::type_info& ti, bytes_view bv) {
+        return atomic_cell_view(ti, reinterpret_cast<const uint8_t*>(bv.begin()));
+    }
+
    friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
 };

-class atomic_cell_mutable_view final : public atomic_cell_base<bytes_mutable_view> {
-    atomic_cell_mutable_view(bytes_mutable_view data) : atomic_cell_base(std::move(data)) {}
+class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
+    atomic_cell_mutable_view(const data::type_info& ti, uint8_t* data)
+        : basic_atomic_cell_view<mutable_view::yes>(ti, data) {}
 public:
-    static atomic_cell_mutable_view from_bytes(bytes_mutable_view data) { return atomic_cell_mutable_view(data); }
+    static atomic_cell_mutable_view from_bytes(const data::type_info& ti, imr::utils::object<data::cell::structure>& data) {
+        return atomic_cell_mutable_view(ti, data.get());
+    }

    friend class atomic_cell;
 };

-class atomic_cell_ref final : public atomic_cell_base<managed_bytes&> {
-public:
-    atomic_cell_ref(managed_bytes& buf) : atomic_cell_base(buf) {}
-};
+using atomic_cell_ref = atomic_cell_mutable_view;

-class atomic_cell final : public atomic_cell_base<managed_bytes> {
-    atomic_cell(managed_bytes b) : atomic_cell_base(std::move(b)) {}
+class atomic_cell final : public basic_atomic_cell_view<mutable_view::yes> {
+    using imr_object_type = imr::utils::object<data::cell::structure>;
+    imr_object_type _data;
+    atomic_cell(const data::type_info& ti, imr::utils::object<data::cell::structure>&& data)
+        : basic_atomic_cell_view<mutable_view::yes>(ti, data.get()), _data(std::move(data)) {}
 public:
-    atomic_cell(const atomic_cell&) = default;
+    class collection_member_tag;
+    using collection_member = bool_class<collection_member_tag>;
+
    atomic_cell(atomic_cell&&) = default;
-    atomic_cell& operator=(const atomic_cell&) = default;
+    atomic_cell& operator=(const atomic_cell&) = delete;
    atomic_cell& operator=(atomic_cell&&) = default;
-    static atomic_cell from_bytes(managed_bytes b) {
-        return atomic_cell(std::move(b));
+    void swap(atomic_cell& other) noexcept {
+        basic_atomic_cell_view<mutable_view::yes>::swap(other);
+        _data.swap(other._data);
    }
-    atomic_cell(atomic_cell_view other) : atomic_cell_base(managed_bytes{other._data}) {}
-    operator atomic_cell_view() const {
-        return atomic_cell_view(_data);
+    operator atomic_cell_view() const { return atomic_cell_view(_view); }
+    atomic_cell(const abstract_type& t, atomic_cell_view other);
+    static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time);
+    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value,
+                                 collection_member = collection_member::no);
+    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
+                                 collection_member = collection_member::no);
+    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
+                                 collection_member = collection_member::no);
+    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
+                                 collection_member cm = collection_member::no) {
+        return make_live(type, timestamp, bytes_view(value), cm);
    }
-    static atomic_cell make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
-        return atomic_cell_type::make_dead(timestamp, deletion_time);
-    }
-    static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value) {
-        return atomic_cell_type::make_live(timestamp, value);
-    }
-    static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value) {
-        return make_live(timestamp, bytes_view(value));
-    }
-    static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value) {
-        return atomic_cell_type::make_live_counter_update(timestamp, value);
-    }
-    static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value,
-        gc_clock::time_point expiry, gc_clock::duration ttl)
+    static atomic_cell make_live_counter_update(api::timestamp_type timestamp, int64_t value);
+    static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, bytes_view value,
+        gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
+    static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, ser::buffer_view<bytes_ostream::fragment_iterator> value,
+        gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
+    static atomic_cell make_live(const abstract_type&, api::timestamp_type timestamp, const fragmented_temporary_buffer::view& value,
+        gc_clock::time_point expiry, gc_clock::duration ttl, collection_member = collection_member::no);
+    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, const bytes& value,
+                                 gc_clock::time_point expiry, gc_clock::duration ttl, collection_member cm = collection_member::no)
    {
-        return atomic_cell_type::make_live(timestamp, value, expiry, ttl);
+        return make_live(type, timestamp, bytes_view(value), expiry, ttl, cm);
    }
-    static atomic_cell make_live(api::timestamp_type timestamp, const bytes& value,
-                                 gc_clock::time_point expiry, gc_clock::duration ttl)
-    {
-        return make_live(timestamp, bytes_view(value), expiry, ttl);
-    }
-    static atomic_cell make_live(api::timestamp_type timestamp, bytes_view value, ttl_opt ttl) {
+    static atomic_cell make_live(const abstract_type& type, api::timestamp_type timestamp, bytes_view value, ttl_opt ttl, collection_member cm = collection_member::no) {
        if (!ttl) {
-            return atomic_cell_type::make_live(timestamp, value);
+            return make_live(type, timestamp, value, cm);
        } else {
-            return atomic_cell_type::make_live(timestamp, value, gc_clock::now() + *ttl, *ttl);
+            return make_live(type, timestamp, value, gc_clock::now() + *ttl, *ttl, cm);
        }
    }
-    template<typename Serializer>
-    static atomic_cell make_live_from_serializer(api::timestamp_type timestamp, size_t size, Serializer&& serializer) {
-        return atomic_cell_type::make_live_from_serializer(timestamp, size, std::forward<Serializer>(serializer));
-    }
+    static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
    friend class atomic_cell_or_collection;
    friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
 };
@@ -350,33 +230,24 @@ class collection_mutation_view;
 //   list: tbd, probably ugly
 class collection_mutation {
 public:
-    managed_bytes data;
+    using imr_object_type = imr::utils::object<data::cell::structure>;
+    imr_object_type _data;
+
    collection_mutation() {}
-    collection_mutation(managed_bytes b) : data(std::move(b)) {}
-    collection_mutation(collection_mutation_view v);
+    collection_mutation(const collection_type_impl&, collection_mutation_view v);
+    collection_mutation(const collection_type_impl&, bytes_view bv);
    operator collection_mutation_view() const;
 };

+
 class collection_mutation_view {
 public:
-    bytes_view data;
-    bytes_view serialize() const { return data; }
-    static collection_mutation_view from_bytes(bytes_view v) { return { v }; }
+    atomic_cell_value_view data;
 };

-inline
-collection_mutation::collection_mutation(collection_mutation_view v)
-        : data(v.data) {
-}
-
-inline
-collection_mutation::operator collection_mutation_view() const {
-    return { data };
-}
-
 class column_definition;

 int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);
-void merge_column(const column_definition& def,
+void merge_column(const abstract_type& def,
        atomic_cell_or_collection& old,
        const atomic_cell_or_collection& neww);
--- a/atomic_cell_hash.hh
+++ b/atomic_cell_hash.hh
@@ -33,12 +33,15 @@ template<>
 struct appending_hash<collection_mutation_view> {
    template<typename Hasher>
    void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
-        auto m_view = collection_type_impl::deserialize_mutation_form(cell);
+      cell.data.with_linearized([&] (bytes_view cell_bv) {
+        auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
+        auto m_view = ctype->deserialize_mutation_form(cell_bv);
        ::feed_hash(h, m_view.tomb);
        for (auto&& key_and_value : m_view.cells) {
            ::feed_hash(h, key_and_value.first);
            ::feed_hash(h, key_and_value.second, cdef);
        }
+      });
    }
 };

@@ -50,7 +53,9 @@ struct appending_hash<atomic_cell_view> {
        feed_hash(h, cell.timestamp());
        if (cell.is_live()) {
            if (cdef.is_counter()) {
-                ::feed_hash(h, counter_cell_view(cell));
+                counter_cell_view::with_linearized(cell, [&] (counter_cell_view ccv) {
+                    ::feed_hash(h, ccv);
+                });
                return;
            }
            if (cell.is_live_and_has_ttl()) {
@@ -85,9 +90,9 @@ struct appending_hash<atomic_cell_or_collection> {
    template<typename Hasher>
    void operator()(Hasher& h, const atomic_cell_or_collection& c, const column_definition& cdef) const {
        if (cdef.is_atomic()) {
-            feed_hash(h, c.as_atomic_cell(), cdef);
+            feed_hash(h, c.as_atomic_cell(cdef), cdef);
        } else {
            feed_hash(h, c.as_collection_mutation(), cdef);
        }
    }
-};
+};
--- a/atomic_cell_or_collection.hh
+++ b/atomic_cell_or_collection.hh
@@ -25,42 +25,56 @@
 #include "schema.hh"
 #include "hashing.hh"

+#include "imr/utils.hh"
+
 // A variant type that can hold either an atomic_cell, or a serialized collection.
 // Which type is stored is determined by the schema.
-// Has an "empty" state.
-// Objects moved-from are left in an empty state.
 class atomic_cell_or_collection final {
-    managed_bytes _data;
+    // FIXME: This has made us lose the small-buffer optimisation. Unfortunately,
+    // due to the changed cell format it would be less effective now anyway.
+    // Measure the actual impact first: any attempt to fix this will become
+    // irrelevant once rows are converted to the IMR as well, so we may be
+    // able to live with it as is.
+    using imr_object_type = imr::utils::object<data::cell::structure>;
+    imr_object_type _data;
 private:
-    atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
+    atomic_cell_or_collection(imr::utils::object<data::cell::structure>&& data) : _data(std::move(data)) {}
 public:
    atomic_cell_or_collection() = default;
+    atomic_cell_or_collection(atomic_cell_or_collection&&) = default;
+    atomic_cell_or_collection(const atomic_cell_or_collection&) = delete;
+    atomic_cell_or_collection& operator=(atomic_cell_or_collection&&) = default;
+    atomic_cell_or_collection& operator=(const atomic_cell_or_collection&) = delete;
    atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
+    atomic_cell_or_collection(const abstract_type& at, atomic_cell_view acv);
    static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
-    atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
-    atomic_cell_ref as_atomic_cell_ref() { return { _data }; }
-    atomic_cell_mutable_view as_mutable_atomic_cell() { return atomic_cell_mutable_view::from_bytes(_data); }
-    atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
+    atomic_cell_view as_atomic_cell(const column_definition& cdef) const { return atomic_cell_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
+    atomic_cell_ref as_atomic_cell_ref(const column_definition& cdef) { return atomic_cell_mutable_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
+    atomic_cell_mutable_view as_mutable_atomic_cell(const column_definition& cdef) { return atomic_cell_mutable_view::from_bytes(cdef.type->imr_state().type_info(), _data); }
+    atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm._data)) { }
+    atomic_cell_or_collection copy(const abstract_type&) const;
    explicit operator bool() const {
-        return !_data.empty();
+        return bool(_data);
    }
-    bool can_use_mutable_view() const {
-        return !_data.is_fragmented();
+    static constexpr bool can_use_mutable_view() {
+        return true;
    }
-    static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
-        return std::move(data.data);
-    }
-    collection_mutation_view as_collection_mutation() const {
-        return collection_mutation_view{_data};
-    }
-    bytes_view serialize() const {
-        return _data;
-    }
-    bool operator==(const atomic_cell_or_collection& other) const {
-        return _data == other._data;
-    }
-    size_t external_memory_usage() const {
-        return _data.external_memory_usage();
+    void swap(atomic_cell_or_collection& other) noexcept {
+        _data.swap(other._data);
    }
+    static atomic_cell_or_collection from_collection_mutation(collection_mutation data) { return std::move(data._data); }
+    collection_mutation_view as_collection_mutation() const;
+    bytes_view serialize() const;
+    bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;
+    size_t external_memory_usage(const abstract_type&) const;
    friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
 };
+
+namespace std {
+
+inline void swap(atomic_cell_or_collection& a, atomic_cell_or_collection& b) noexcept
+{
+    a.swap(b);
+}
+
+}
--- a/auth/common.cc
+++ b/auth/common.cc
@@ -28,6 +28,7 @@
 #include "database.hh"
 #include "schema_builder.hh"
 #include "service/migration_manager.hh"
+#include "timeout_config.hh"

 namespace auth {

@@ -86,12 +87,24 @@ future<> create_metadata_table_if_missing(
    return mm.announce_new_column_family(b.build(), false);
 }

-future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db) {
+future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {
    static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };

-    return do_until([&db] { return db.get_version() != database::empty_version; }, pause).then([&mm] {
-        return do_until([&mm] { return mm.have_schema_agreement(); }, pause);
+    return do_until([&db, &as] {
+        as.check();
+        return db.get_version() != database::empty_version;
+    }, pause).then([&mm, &as] {
+        return do_until([&mm, &as] {
+            as.check();
+            return mm.have_schema_agreement();
+        }, pause);
    });
 }

+const timeout_config& internal_distributed_timeout_config() noexcept {
+    static const auto t = 5s;
+    static const timeout_config tc{t, t, t, t, t, t, t};
+    return tc;
+}
+
 }
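Note on the `wait_for_schema_agreement()` change above: the abort source is checked at the start of every poll, so a pending wait ends with an exception once an abort has been requested, instead of polling forever. The same pattern can be sketched without Seastar as shown below. `abort_flag` and `poll_until` are hypothetical stand-ins written for this note, not Seastar APIs, and the synchronous loop only approximates the future-based `do_until` used in the patch.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for an abort source: check() throws once an
// abort has been requested.
struct abort_flag {
    std::atomic<bool> requested{false};
    void request_abort() { requested = true; }
    void check() const {
        if (requested) {
            throw std::runtime_error("abort requested");
        }
    }
};

// Evaluates `done` until it returns true, calling `pause` between attempts.
// The flag is checked before every evaluation, so an abort requested during
// a pause is observed on the next iteration. Returns the number of times
// `done` was evaluated.
inline int poll_until(abort_flag& as,
                      const std::function<bool()>& done,
                      const std::function<void()>& pause) {
    assert(done && pause);
    int polls = 0;
    for (;;) {
        as.check();
        ++polls;
        if (done()) {
            return polls;
        }
        pause();
    }
}
```

Because the wait can now end with an exception, the caller that owns the flag has to handle it on shutdown, which appears to be why `default_authorizer::stop()` in this patch also handles `abort_requested_exception`.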
--- a/auth/common.hh
+++ b/auth/common.hh
@@ -38,6 +38,7 @@
 using namespace std::chrono_literals;

 class database;
+class timeout_config;

 namespace service {
 class migration_manager;
@@ -80,6 +81,11 @@ future<> create_metadata_table_if_missing(
        stdx::string_view cql,
        ::service::migration_manager&);

-future<> wait_for_schema_agreement(::service::migration_manager&, const database&);
+future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);
+
+///
+/// Time-outs for internal, non-local CQL queries.
+///
+const timeout_config& internal_distributed_timeout_config() noexcept;

 }
--- a/auth/default_authorizer.cc
+++ b/auth/default_authorizer.cc
@@ -103,6 +103,7 @@ future<bool> default_authorizer::any_granted() const {
    return _qp.process(
            query,
            db::consistency_level::LOCAL_ONE,
+            infinite_timeout_config,
            {},
            true).then([this](::shared_ptr<cql3::untyped_result_set> results) {
        return !results->empty();
@@ -115,7 +116,8 @@ future<> default_authorizer::migrate_legacy_metadata() const {

    return _qp.process(
            query,
-            db::consistency_level::LOCAL_ONE).then([this](::shared_ptr<cql3::untyped_result_set> results) {
+            db::consistency_level::LOCAL_ONE,
+            infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
        return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
            return do_with(
                    row.get_as<sstring>("username"),
@@ -158,7 +160,7 @@ future<> default_authorizer::start() {
                _migration_manager).then([this] {
            _finished = do_after_system_ready(_as, [this] {
                return async([this] {
-                    wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
+                    wait_for_schema_agreement(_migration_manager, _qp.db().local(), _as).get0();

                    if (legacy_metadata_exists()) {
                        if (!any_granted().get0()) {
@@ -176,7 +178,7 @@ future<> default_authorizer::start() {

 future<> default_authorizer::stop() {
    _as.request_abort();
-    return _finished.handle_exception_type([](const sleep_aborted&) {});
+    return _finished.handle_exception_type([](const sleep_aborted&) {}).handle_exception_type([](const abort_requested_exception&) {});
 }

 future<permission_set>
@@ -196,6 +198,7 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
    return _qp.process(
            query,
            db::consistency_level::LOCAL_ONE,
+            infinite_timeout_config,
            {*maybe_role.name, r.name()}).then([](::shared_ptr<cql3::untyped_result_set> results) {
        if (results->empty()) {
            return permissions::NONE;
@@ -225,6 +228,7 @@ default_authorizer::modify(
        return _qp.process(
                query,
                db::consistency_level::ONE,
+                internal_distributed_timeout_config(),
                {permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();
    });
 }
@@ -250,6 +254,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
    return _qp.process(
            query,
            db::consistency_level::ONE,
+            internal_distributed_timeout_config(),
            {},
            true).then([](::shared_ptr<cql3::untyped_result_set> results) {
        std::vector<permission_details> all_details;
@@ -277,6 +282,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
    return _qp.process(
            query,
            db::consistency_level::ONE,
+            internal_distributed_timeout_config(),
            {sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {
        try {
            std::rethrow_exception(ep);
@@ -297,6 +303,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
    return _qp.process(
            query,
            db::consistency_level::LOCAL_ONE,
+            infinite_timeout_config,
            {resource.name()}).then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
        try {
            auto res = f.get0();
@@ -314,6 +321,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
                return _qp.process(
                        query,
                        db::consistency_level::LOCAL_ONE,
+                        infinite_timeout_config,
                        {r.get_as<sstring>(ROLE_NAME), resource.name()}).discard_result().handle_exception(
                                [resource](auto ep) {
                    try {
--- a/auth/password_authenticator.cc
+++ b/auth/password_authenticator.cc
@@ -41,11 +41,6 @@

 #include "auth/password_authenticator.hh"

-extern "C" {
-#include <crypt.h>
-#include <unistd.h>
-}
-
 #include <algorithm>
 #include <chrono>
 #include <random>
@@ -55,6 +50,7 @@ extern "C" {

 #include "auth/authenticated_user.hh"
 #include "auth/common.hh"
+#include "auth/passwords.hh"
 #include "auth/roles-metadata.hh"
 #include "cql3/untyped_result_set.hh"
 #include "log.hh"
@@ -82,6 +78,8 @@ static const class_registrator<
        cql3::query_processor&,
        ::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");

+static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());
+
 password_authenticator::~password_authenticator() {
 }

@@ -91,80 +89,8 @@ password_authenticator::password_authenticator(cql3::query_processor& qp, ::serv
    , _stopped(make_ready_future<>()) {
 }

-// TODO: blowfish
-// Origin uses Java bcrypt library, i.e. blowfish salt
-// generation and hashing, which is arguably a "better"
-// password hash than sha/md5 versions usually available in
-// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algo
-// which should be the same order of safe, so the only
-// real issue should be salted hash compatibility with
-// origin if importing system tables from there.
-//
-// Since bcrypt/blowfish is _not_ (afaict) not available
-// as a dev package/lib on most linux distros, we'd have to
-// copy and compile for example OWL  crypto
-// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
-// to be fully bit-compatible.
-//
-// Until we decide this is needed, let's just use crypt_r,
-// and some old-fashioned random salt generation.
-
-static constexpr size_t rand_bytes = 16;
-static thread_local crypt_data tlcrypt = { 0, };
-
-static sstring hashpw(const sstring& pass, const sstring& salt) {
-    auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
-    if (res == nullptr) {
-        throw std::system_error(errno, std::system_category());
-    }
-    return res;
-}
-
-static bool checkpw(const sstring& pass, const sstring& salted_hash) {
-    auto tmp = hashpw(pass, salted_hash);
-    return tmp == salted_hash;
-}
-
-static sstring gensalt() {
-    static sstring prefix;
-
-    std::random_device rd;
-    std::default_random_engine e1(rd());
-    std::uniform_int_distribution<char> dist;
-
-    sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
-    sstring input(rand_bytes, 0);
-
-    for (char&c : input) {
-        c = valid_salt[dist(e1) % valid_salt.size()];
-    }
-
-    sstring salt;
-
-    if (!prefix.empty()) {
-        return prefix + input;
-    }
-
-    // Try in order:
-    // blowfish 2011 fix, blowfish, sha512, sha256, md5
-    for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
-        salt = pfx + input;
-        const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
-
-        if (e && (e[0] != '*')) {
-            prefix = pfx;
-            return salt;
-        }
-    }
-    throw std::runtime_error("Could not initialize hashing algorithm");
-}
-
-static sstring hashpw(const sstring& pass) {
-    return hashpw(pass, gensalt());
-}
-
 static bool has_salted_hash(const cql3::untyped_result_set_row& row) {
-    return utf8_type->deserialize(row.get_blob(SALTED_HASH)) != data_value::make_null(utf8_type);
+    return !row.get_or<sstring>(SALTED_HASH, "").empty();
 }

 static const sstring update_row_query = sprint(
@@ -185,7 +111,8 @@ future<> password_authenticator::migrate_legacy_metadata() const {

    return _qp.process(
            query,
-            db::consistency_level::QUORUM).then([this](::shared_ptr<cql3::untyped_result_set> results) {
+            db::consistency_level::QUORUM,
+            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
        return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
            auto username = row.get_as<sstring>("username");
            auto salted_hash = row.get_as<sstring>(SALTED_HASH);
@@ -193,6 +120,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
            return _qp.process(
                    update_row_query,
                    consistency_for_user(username),
+                    internal_distributed_timeout_config(),
                    {std::move(salted_hash), username}).discard_result();
        }).finally([results] {});
    }).then([] {
@@ -209,7 +137,8 @@ future<> password_authenticator::create_default_if_missing() const {
            return _qp.process(
                    update_row_query,
                    db::consistency_level::QUORUM,
-                    {hashpw(DEFAULT_USER_PASSWORD), DEFAULT_USER_NAME}).then([](auto&&) {
+                    internal_distributed_timeout_config(),
+                    {passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {
                plogger.info("Created default superuser authentication record.");
            });
        }
@@ -220,8 +149,6 @@ future<> password_authenticator::create_default_if_missing() const {

 future<> password_authenticator::start() {
     return once_among_shards([this] {
-         gensalt(); // do this once to determine usable hashing
-
         auto f = create_metadata_table_if_missing(
                 meta::roles_table::name,
                 _qp,
@@ -230,7 +157,7 @@ future<> password_authenticator::start() {

         _stopped = do_after_system_ready(_as, [this] {
             return async([this] {
-                 wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
+                 wait_for_schema_agreement(_migration_manager, _qp.db().local(), _as).get0();

                 if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {
                     if (legacy_metadata_exists()) {
@@ -255,7 +182,7 @@ future<> password_authenticator::start() {

 future<> password_authenticator::stop() {
    _as.request_abort();
-    return _stopped.handle_exception_type([] (const sleep_aborted&) { });
+    return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});
 }

 db::consistency_level password_authenticator::consistency_for_user(stdx::string_view role_name) {
@@ -308,12 +235,17 @@ future<authenticated_user> password_authenticator::authenticate(
        return _qp.process(
                query,
                consistency_for_user(username),
+                internal_distributed_timeout_config(),
                {username},
                true);
    }).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
        try {
            auto res = f.get0();
-            if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
+            auto salted_hash = std::experimental::optional<sstring>();
+            if (!res->empty()) {
+                salted_hash = res->one().get_opt<sstring>(SALTED_HASH);
+            }
+            if (!salted_hash || !passwords::check(password, *salted_hash)) {
                throw exceptions::authentication_exception("Username and/or password are incorrect");
            }
            return make_ready_future<authenticated_user>(username);
@@ -335,7 +267,8 @@ future<> password_authenticator::create(stdx::string_view role_name, const authe
    return _qp.process(
            update_row_query,
            consistency_for_user(role_name),
-            {hashpw(*options.password), sstring(role_name)}).discard_result();
+            internal_distributed_timeout_config(),
+            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
 }

 future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {
@@ -352,7 +285,8 @@ future<> password_authenticator::alter(stdx::string_view role_name, const authen
    return _qp.process(
            query,
            consistency_for_user(role_name),
-            {hashpw(*options.password), sstring(role_name)}).discard_result();
+            internal_distributed_timeout_config(),
+            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
 }

 future<> password_authenticator::drop(stdx::string_view name) const {
@@ -362,7 +296,10 @@ future<> password_authenticator::drop(stdx::string_view name) const {
            meta::roles_table::qualified_name(),
            meta::roles_table::role_col_name);

-    return _qp.process(query, consistency_for_user(name), {sstring(name)}).discard_result();
+    return _qp.process(
+            query, consistency_for_user(name),
+            internal_distributed_timeout_config(),
+            {sstring(name)}).discard_result();
 }

 future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {
--- a/auth/passwords.cc
+++ b/auth/passwords.cc
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "auth/passwords.hh"
+
+#include <cerrno>
+#include <optional>
+
+extern "C" {
+#include <crypt.h>
+#include <unistd.h>
+}
+
+namespace auth::passwords {
+
+static thread_local crypt_data tlcrypt = { 0, };
+
+namespace detail {
+
+scheme identify_best_supported_scheme() {
+    const auto all_schemes = { scheme::bcrypt_y, scheme::bcrypt_a, scheme::sha_512, scheme::sha_256, scheme::md5 };
+    // "Random", for testing schemes.
+    const sstring random_part_of_salt = "aaaabbbbccccdddd";
+
+    for (scheme c : all_schemes) {
+        const sstring salt = sstring(prefix_for_scheme(c)) + random_part_of_salt;
+        const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
+
+        if (e && (e[0] != '*')) {
+            return c;
+        }
+    }
+
+    throw no_supported_schemes();
+}
+
+sstring hash_with_salt(const sstring& pass, const sstring& salt) {
+    auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
+    if (!res || (res[0] == '*')) {
+        throw std::system_error(errno, std::system_category());
+    }
+    return res;
+}
+
+const char* prefix_for_scheme(scheme c) noexcept {
+    switch (c) {
+    case scheme::bcrypt_y: return "$2y$";
+    case scheme::bcrypt_a: return "$2a$";
+    case scheme::sha_512: return "$6$";
+    case scheme::sha_256: return "$5$";
+    case scheme::md5: return "$1$";
+    default: return nullptr;
+    }
+}
+
+} // namespace detail
+
+no_supported_schemes::no_supported_schemes()
+        : std::runtime_error("No allowed hashing schemes are supported on this system") {
+}
+
+bool check(const sstring& pass, const sstring& salted_hash) {
+    return detail::hash_with_salt(pass, salted_hash) == salted_hash;
+}
+
+} // namespace auth::passwords
--- a/auth/passwords.hh
+++ b/auth/passwords.hh
@@ -0,0 +1,125 @@
+/*
+ * Copyright (C) 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <random>
+#include <stdexcept>
+
+#include <seastar/core/sstring.hh>
+
+#include "seastarx.hh"
+
+namespace auth::passwords {
+
+class no_supported_schemes : public std::runtime_error {
+public:
+    no_supported_schemes();
+};
+
+///
+/// Apache Cassandra uses a library to provide the bcrypt scheme. Many Linux implementations do not support bcrypt, so
+/// we support alternatives. The cost is loss of direct compatibility with Apache Cassandra system tables.
+///
+enum class scheme {
+    bcrypt_y,
+    bcrypt_a,
+    sha_512,
+    sha_256,
+    md5
+};
+
+namespace detail {
+
+template <typename RandomNumberEngine>
+sstring generate_random_salt_bytes(RandomNumberEngine& g) {
+    static const sstring valid_bytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
+    static constexpr std::size_t num_bytes = 16;
+    std::uniform_int_distribution<std::size_t> dist(0, valid_bytes.size() - 1);
+    sstring result(num_bytes, 0);
+
+    for (char& c : result) {
+        c = valid_bytes[dist(g)];
+    }
+
+    return result;
+}
+
+///
+/// Test each allowed hashing scheme and report the best supported one on the current system.
+///
+/// \throws \ref no_supported_schemes when none of the known schemes is supported.
+///
+scheme identify_best_supported_scheme();
+
+const char* prefix_for_scheme(scheme) noexcept;
+
+///
+/// Generate an implementation-specific salt string for hashing passwords.
+///
+/// The `RandomNumberEngine` is used to generate the string, which has an implementation-specific length.
+///
+/// \throws \ref no_supported_schemes when no known hashing schemes are supported on the system.
+///
+template <typename RandomNumberEngine>
+sstring generate_salt(RandomNumberEngine& g) {
+    static const scheme scheme = identify_best_supported_scheme();
+    static const sstring prefix = sstring(prefix_for_scheme(scheme));
+    return prefix + generate_random_salt_bytes(g);
+}
+
+///
+/// Hash a password combined with an implementation-specific salt string.
+///
+/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.
+///
+sstring hash_with_salt(const sstring& pass, const sstring& salt);
+
+} // namespace detail
+
+///
+/// Run a one-way hashing function on cleartext to produce a salted hash.
+///
+/// Prior to applying the hashing function, random salt is added to the cleartext. The random salt bytes are generated
+/// using the random number engine `g`.
+///
+/// The result is the hashed text together with the salt used, in an implementation-specific format.
+///
+/// \throws \ref std::system_error when the underlying implementation fails to hash the cleartext.
+///
+template <typename RandomNumberEngine>
+sstring hash(const sstring& pass, RandomNumberEngine& g) {
+    return detail::hash_with_salt(pass, detail::generate_salt(g));
+}
+
+///
+/// Check that cleartext matches previously hashed cleartext with salt.
+///
+/// \ref salted_hash is the result of invoking \ref hash, which is the implementation-specific combination of the hashed
+/// password and the salt that was generated for it.
+///
+/// \returns `true` if the cleartext matches the salted hash.
+///
+/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.
+///
+bool check(const sstring& pass, const sstring& salted_hash);
+
+} // namespace auth::passwords
--- a/auth/roles-metadata.cc
+++ b/auth/roles-metadata.cc
@@ -72,12 +72,14 @@ future<bool> default_role_row_satisfies(
        return qp.process(
                query,
                db::consistency_level::ONE,
+                infinite_timeout_config,
                {meta::DEFAULT_SUPERUSER_NAME},
                true).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
            if (results->empty()) {
                return qp.process(
                        query,
                        db::consistency_level::QUORUM,
+                        internal_distributed_timeout_config(),
                        {meta::DEFAULT_SUPERUSER_NAME},
                        true).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
                    if (results->empty()) {
@@ -101,7 +103,8 @@ future<bool> any_nondefault_role_row_satisfies(
    return do_with(std::move(p), [&qp](const auto& p) {
        return qp.process(
                query,
-                db::consistency_level::QUORUM).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
+                db::consistency_level::QUORUM,
+                internal_distributed_timeout_config()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
            if (results->empty()) {
                return false;
            }
--- a/auth/service.cc
+++ b/auth/service.cc
@@ -37,7 +37,7 @@
 #include "cql3/query_processor.hh"
 #include "cql3/untyped_result_set.hh"
 #include "db/config.hh"
-#include "db/consistency_level.hh"
+#include "db/consistency_level_type.hh"
 #include "exceptions/exceptions.hh"
 #include "log.hh"
 #include "service/migration_listener.hh"
@@ -184,7 +184,9 @@ future<> service::start() {
    return once_among_shards([this] {
        return create_keyspace_if_missing();
    }).then([this] {
-        return when_all_succeed(_role_manager->start(), _authorizer->start(), _authenticator->start());
+        return _role_manager->start().then([this] {
+            return when_all_succeed(_authorizer->start(), _authenticator->start());
+        });
    }).then([this] {
        _permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
    }).then([this] {
@@ -196,6 +198,10 @@ future<> service::start() {
 }

 future<> service::stop() {
+    // Only one of the shards has the listener registered, but let's try to
+    // unregister on each one just to make sure.
+    _migration_manager.unregister_listener(_migration_listener.get());
+
    return _permissions_cache->stop().then([this] {
        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
    });
@@ -223,6 +229,7 @@ future<bool> service::has_existing_legacy_users() const {
    return _qp.process(
            default_user_query,
            db::consistency_level::ONE,
+            infinite_timeout_config,
            {meta::DEFAULT_SUPERUSER_NAME},
            true).then([this](auto results) {
        if (!results->empty()) {
@@ -232,6 +239,7 @@ future<bool> service::has_existing_legacy_users() const {
        return _qp.process(
                default_user_query,
                db::consistency_level::QUORUM,
+                infinite_timeout_config,
                {meta::DEFAULT_SUPERUSER_NAME},
                true).then([this](auto results) {
            if (!results->empty()) {
@@ -240,7 +248,8 @@ future<bool> service::has_existing_legacy_users() const {

            return _qp.process(
                    all_users_query,
-                    db::consistency_level::QUORUM).then([](auto results) {
+                    db::consistency_level::QUORUM,
+                    infinite_timeout_config).then([](auto results) {
                return make_ready_future<bool>(!results->empty());
            });
        });
--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -89,6 +89,7 @@ static future<stdx::optional<record>> find_record(cql3::query_processor& qp, std
    return qp.process(
            query,
            consistency_for_role(role_name),
+            internal_distributed_timeout_config(),
            {sstring(role_name)},
            true).then([](::shared_ptr<cql3::untyped_result_set> results) {
        if (results->empty()) {
@@ -173,6 +174,7 @@ future<> standard_role_manager::create_default_role_if_missing() const {
            return _qp.process(
                    query,
                    db::consistency_level::QUORUM,
+                    internal_distributed_timeout_config(),
                    {meta::DEFAULT_SUPERUSER_NAME}).then([](auto&&) {
                log.info("Created default superuser role '{}'.", meta::DEFAULT_SUPERUSER_NAME);
                return make_ready_future<>();
@@ -198,7 +200,8 @@ future<> standard_role_manager::migrate_legacy_metadata() const {

    return _qp.process(
            query,
-            db::consistency_level::QUORUM).then([this](::shared_ptr<cql3::untyped_result_set> results) {
+            db::consistency_level::QUORUM,
+            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
        return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
            role_config config;
            config.is_superuser = row.get_as<bool>("super");
@@ -224,7 +227,7 @@ future<> standard_role_manager::start() {
        return this->create_metadata_tables_if_missing().then([this] {
            _stopped = auth::do_after_system_ready(_as, [this] {
                return seastar::async([this] {
-                    wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
+                    wait_for_schema_agreement(_migration_manager, _qp.db().local(), _as).get0();

                    if (any_nondefault_role_row_satisfies(_qp, &has_can_login).get0()) {
                        if (this->legacy_metadata_exists()) {
@@ -248,7 +251,7 @@ future<> standard_role_manager::start() {

 future<> standard_role_manager::stop() {
    _as.request_abort();
-    return _stopped.handle_exception_type([] (const sleep_aborted&) { });
+    return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});
 }

 future<> standard_role_manager::create_or_replace(stdx::string_view role_name, const role_config& c) const {
@@ -260,6 +263,7 @@ future<> standard_role_manager::create_or_replace(stdx::string_view role_name, c
    return _qp.process(
            query,
            consistency_for_role(role_name),
+            internal_distributed_timeout_config(),
            {sstring(role_name), c.is_superuser, c.can_login},
            true).discard_result();
 }
@@ -303,6 +307,7 @@ standard_role_manager::alter(stdx::string_view role_name, const role_config_upda
                        build_column_assignments(u),
                        meta::roles_table::role_col_name),
                consistency_for_role(role_name),
+                internal_distributed_timeout_config(),
                {sstring(role_name)}).discard_result();
    });
 }
@@ -322,6 +327,7 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {
            return _qp.process(
                    query,
                    consistency_for_role(role_name),
+                    internal_distributed_timeout_config(),
                    {sstring(role_name)}).then([this, role_name](::shared_ptr<cql3::untyped_result_set> members) {
                return parallel_for_each(
                        members->begin(),
@@ -361,6 +367,7 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {
            return _qp.process(
                    query,
                    consistency_for_role(role_name),
+                    internal_distributed_timeout_config(),
                    {sstring(role_name)}).discard_result();
        };

@@ -387,6 +394,7 @@ standard_role_manager::modify_membership(
        return _qp.process(
                query,
                consistency_for_role(grantee_name),
+                internal_distributed_timeout_config(),
                {role_set{sstring(role_name)}, sstring(grantee_name)}).discard_result();
    };

@@ -398,6 +406,7 @@ standard_role_manager::modify_membership(
                                "INSERT INTO %s (role, member) VALUES (?, ?)",
                                meta::role_members_table::qualified_name()),
                        consistency_for_role(role_name),
+                        internal_distributed_timeout_config(),
                        {sstring(role_name), sstring(grantee_name)}).discard_result();

            case membership_change::remove:
@@ -406,6 +415,7 @@ standard_role_manager::modify_membership(
                                "DELETE FROM %s WHERE role = ? AND member = ?",
                                meta::role_members_table::qualified_name()),
                        consistency_for_role(role_name),
+                        internal_distributed_timeout_config(),
                        {sstring(role_name), sstring(grantee_name)}).discard_result();
        }

@@ -506,7 +516,10 @@ future<role_set> standard_role_manager::query_all() const {
    // To avoid many copies of a view.
    static const auto role_col_name_string = sstring(meta::roles_table::role_col_name);

-    return _qp.process(query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> results) {
+    return _qp.process(
+            query,
+            db::consistency_level::QUORUM,
+            internal_distributed_timeout_config()).then([](::shared_ptr<cql3::untyped_result_set> results) {
        role_set roles;

        std::transform(
--- a/backlog_controller.hh
+++ b/backlog_controller.hh
@@ -77,7 +77,7 @@ protected:
        , _io_priority(iop)
        , _interval(interval)
        , _update_timer([this] { adjust(); })
-        , _control_points({{0,0}})
+        , _control_points()
        , _current_backlog(std::move(backlog))
        , _inflight_update(make_ready_future<>())
    {
@@ -96,6 +96,12 @@ protected:
    }

    virtual ~backlog_controller() {}
+public:
+    backlog_controller(backlog_controller&&) = default;
+    float backlog_of_shares(float shares) const;
+    seastar::scheduling_group sg() {
+        return _scheduling_group;
+    }
 };

 // memtable flush CPU controller.
@@ -119,7 +125,7 @@ public:
    flush_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
    flush_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, float soft_limit, std::function<float()> current_dirty)
        : backlog_controller(sg, iop, std::move(interval),
-          std::vector<backlog_controller::control_point>({{soft_limit, 100}, {soft_limit + (hard_dirty_limit - soft_limit) / 2, 200} , {hard_dirty_limit, 1000}}),
+          std::vector<backlog_controller::control_point>({{0.0, 0.0}, {soft_limit, 10}, {soft_limit + (hard_dirty_limit - soft_limit) / 2, 200} , {hard_dirty_limit, 1000}}),
          std::move(current_dirty)
        )
    {}
@@ -128,10 +134,12 @@ public:
 class compaction_controller : public backlog_controller {
 public:
    static constexpr unsigned normalization_factor = 30;
+    static constexpr float disable_backlog = std::numeric_limits<double>::infinity();
+    static constexpr float backlog_disabled(float backlog) { return std::isinf(backlog); }
    compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
    compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, std::function<float()> current_backlog)
        : backlog_controller(sg, iop, std::move(interval),
-          std::vector<backlog_controller::control_point>({{0.5, 10}, {1.5, 100} , {normalization_factor, 1000}}),
+          std::vector<backlog_controller::control_point>({{0.0, 50}, {1.5, 100} , {normalization_factor, 1000}}),
          std::move(current_backlog)
        )
    {}
--- a/bytes.hh
+++ b/bytes.hh
@@ -29,12 +29,16 @@
 #include <functional>
 #include "utils/mutable_view.hh"

-using bytes = basic_sstring<int8_t, uint32_t, 31>;
+using bytes = basic_sstring<int8_t, uint32_t, 31, false>;
 using bytes_view = std::experimental::basic_string_view<int8_t>;
 using bytes_mutable_view = basic_mutable_view<bytes_view::value_type>;
 using bytes_opt = std::experimental::optional<bytes>;
 using sstring_view = std::experimental::string_view;

+inline sstring_view to_sstring_view(bytes_view view) {
+    return {reinterpret_cast<const char*>(view.data()), view.size()};
+}
+
 namespace std {

 template <>
@@ -78,3 +82,11 @@ struct appending_hash<bytes_view> {
        h.update(reinterpret_cast<const char*>(v.begin()), v.size() * sizeof(bytes_view::value_type));
    }
 };
+
+inline int32_t compare_unsigned(bytes_view v1, bytes_view v2) {
+    auto n = memcmp(v1.begin(), v2.begin(), std::min(v1.size(), v2.size()));
+    if (n) {
+        return n;
+    }
+    return (int32_t) (v1.size() - v2.size());
+}
--- a/bytes_ostream.hh
+++ b/bytes_ostream.hh
@@ -38,7 +38,7 @@ class bytes_ostream {
 public:
    using size_type = bytes::size_type;
    using value_type = bytes::value_type;
-    static constexpr size_type max_chunk_size() { return 16 * 1024; }
+    static constexpr size_type max_chunk_size() { return 128 * 1024; }
 private:
    static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
    struct chunk {
@@ -57,16 +57,17 @@ private:
        value_type data[0];
        void operator delete(void* ptr) { free(ptr); }
    };
-    // FIXME: consider increasing chunk size as the buffer grows
-    static constexpr size_type chunk_size{512};
+    static constexpr size_type default_chunk_size{512};
 private:
    std::unique_ptr<chunk> _begin;
    chunk* _current;
    size_type _size;
+    size_type _initial_chunk_size = default_chunk_size;
 public:
    class fragment_iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
-        chunk* _current;
+        chunk* _current = nullptr;
    public:
+        fragment_iterator() = default;
        fragment_iterator(chunk* current) : _current(current) {}
        fragment_iterator(const fragment_iterator&) = default;
        fragment_iterator& operator=(const fragment_iterator&) = default;
@@ -101,13 +102,13 @@ private:
    }
    // Figure out next chunk size.
    //   - must be enough for data_size
-    //   - must be at least chunk_size
+    //   - must be at least _initial_chunk_size
    //   - try to double each time to prevent too many allocations
    //   - do not exceed max_chunk_size
    size_type next_alloc_size(size_t data_size) const {
        auto next_size = _current
                ? _current->size * 2
-                : chunk_size;
+                : _initial_chunk_size;
        next_size = std::min(next_size, max_chunk_size());
        // FIXME: check for overflow?
        return std::max<size_type>(next_size, data_size + sizeof(chunk));
@@ -115,13 +116,19 @@ private:
    // Makes room for a contiguous region of given size.
    // The region is accounted for as already written.
    // size must not be zero.
+    [[gnu::always_inline]]
    value_type* alloc(size_type size) {
-        if (size <= current_space_left()) {
+        if (__builtin_expect(size <= current_space_left(), true)) {
            auto ret = _current->data + _current->offset;
            _current->offset += size;
            _size += size;
            return ret;
        } else {
+            return alloc_new(size);
+        }
+    }
+    [[gnu::noinline]]
+    value_type* alloc_new(size_type size) {
            auto alloc_size = next_alloc_size(size);
            auto space = malloc(alloc_size);
            if (!space) {
@@ -139,19 +146,22 @@ private:
            }
            _size += size;
            return _current->data;
-        };
    }
 public:
-    bytes_ostream() noexcept
+    explicit bytes_ostream(size_t initial_chunk_size) noexcept
        : _begin()
        , _current(nullptr)
        , _size(0)
+        , _initial_chunk_size(initial_chunk_size)
    { }

+    bytes_ostream() noexcept : bytes_ostream(default_chunk_size) {}
+
    bytes_ostream(bytes_ostream&& o) noexcept
        : _begin(std::move(o._begin))
        , _current(o._current)
        , _size(o._size)
+        , _initial_chunk_size(o._initial_chunk_size)
    {
        o._current = nullptr;
        o._size = 0;
@@ -161,6 +171,7 @@ public:
        : _begin()
        , _current(nullptr)
        , _size(0)
+        , _initial_chunk_size(o._initial_chunk_size)
    {
        append(o);
    }
@@ -198,18 +209,20 @@ public:
        return place_holder<T>{alloc(sizeof(T))};
    }

+    [[gnu::always_inline]]
    value_type* write_place_holder(size_type size) {
        return alloc(size);
    }

    // Writes given sequence of bytes
+    [[gnu::always_inline]]
    inline void write(bytes_view v) {
        if (v.empty()) {
            return;
        }

        auto this_size = std::min(v.size(), size_t(current_space_left()));
-        if (this_size) {
+        if (__builtin_expect(this_size, true)) {
            memcpy(_current->data + _current->offset, v.begin(), this_size);
            _current->offset += this_size;
            _size += this_size;
@@ -218,11 +231,12 @@ public:

        while (!v.empty()) {
            auto this_size = std::min(v.size(), size_t(max_chunk_size()));
-            std::copy_n(v.begin(), this_size, alloc(this_size));
+            std::copy_n(v.begin(), this_size, alloc_new(this_size));
            v.remove_prefix(this_size);
        }
    }

+    [[gnu::always_inline]]
    void write(const char* ptr, size_t size) {
        write(bytes_view(reinterpret_cast<const signed char*>(ptr), size));
    }
@@ -289,6 +303,24 @@ public:
        }
    }

+    // Removes n bytes from the end of the bytes_ostream.
+    // Beware of O(n) algorithm.
+    void remove_suffix(size_t n) {
+        _size -= n;
+        auto left = _size;
+        auto current = _begin.get();
+        while (current) {
+            if (current->offset >= left) {
+                current->offset = left;
+                _current = current;
+                current->next.reset();
+                return;
+            }
+            left -= current->offset;
+            current = current->next.get();
+        }
+    }
+
    // begin() and end() form an input range to bytes_view representing fragments.
    // Any modification of this instance invalidates iterators.
    fragment_iterator begin() const { return { _begin.get() }; }
@@ -374,6 +406,21 @@ public:
    bool operator!=(const bytes_ostream& other) const {
        return !(*this == other);
    }
+
+    // Makes this instance empty.
+    //
+    // The first buffer is not deallocated, so callers may rely on the
+    // fact that if they write less than the initial chunk size between
+    // the clear() calls then writes will not involve any memory allocations,
+    // except for the first write made on this instance.
+    void clear() {
+        if (_begin) {
+            _begin->offset = 0;
+            _size = 0;
+            _current = _begin.get();
+            _begin->next.reset();
+        }
+    }
 };

 template<>
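
Note on the chunk sizing above: the following is a minimal sketch of the growth policy described by the `next_alloc_size()` comments (start at the initial chunk size, double each time, cap at `max_chunk_size`, but never return less than the requested data plus the chunk header). It is not the Scylla implementation. `header_size` is an assumed stand-in for `sizeof(chunk)`, and passing `0` as `current_chunk_size` is an illustrative way to model `_current == nullptr`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the members used by next_alloc_size() in the diff.
using size_type = std::uint32_t;
constexpr size_type max_chunk_size = 128 * 1024;  // the new cap introduced by the diff
constexpr size_type header_size = 16;             // assumed sizeof(chunk), illustrative only

// current_chunk_size == 0 models "no chunk allocated yet" (_current == nullptr).
size_type next_alloc_size(size_type current_chunk_size,
                          size_type initial_chunk_size,
                          size_type data_size) {
    // Double the previous chunk, or start from the configured initial size.
    size_type next_size = current_chunk_size ? current_chunk_size * 2
                                             : initial_chunk_size;
    // Growth is capped...
    next_size = std::min(next_size, max_chunk_size);
    // ...but a single request larger than the cap must still fit.
    return std::max<size_type>(next_size, data_size + header_size);
}
```

Under these assumptions a stream constructed with a larger initial chunk size skips the early small allocations, which appears to be the purpose of the new `explicit bytes_ostream(size_t initial_chunk_size)` constructor.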
--- a/cache_flat_mutation_reader.hh
+++ b/cache_flat_mutation_reader.hh
@@ -61,11 +61,12 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
        // - _last_row points at a direct predecessor of the next row which is going to be read.
        //   Used for populating continuity.
        // - _population_range_starts_before_all_rows is set accordingly
+        // - _underlying is engaged and fast-forwarded
        reading_from_underlying,

        end_of_stream
    };
-    lw_shared_ptr<partition_snapshot> _snp;
+    partition_snapshot_ptr _snp;
    position_in_partition::tri_compare _position_cmp;

    query::clustering_key_filter_ranges _ck_ranges;
@@ -94,7 +95,18 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
    // Valid when _state == reading_from_underlying.
    bool _population_range_starts_before_all_rows;

+    // Whether _lower_bound was changed within current fill_buffer().
+    // If it did not then we cannot break out of it (e.g. on preemption) because
+    // forward progress is not guaranteed in case iterators are getting constantly invalidated.
+    bool _lower_bound_changed = false;
+
+    // Points to the underlying reader conforming to _schema,
+    // either to *_underlying_holder or _read_context->underlying().underlying().
+    flat_mutation_reader* _underlying = nullptr;
+    std::optional<flat_mutation_reader> _underlying_holder;
+
    future<> do_fill_buffer(db::timeout_clock::time_point);
+    future<> ensure_underlying(db::timeout_clock::time_point);
    void copy_from_cache_to_buffer();
    future<> process_static_row(db::timeout_clock::time_point);
    void move_to_end();
@@ -132,7 +144,7 @@ public:
                               dht::decorated_key dk,
                               query::clustering_key_filter_ranges&& crr,
                               lw_shared_ptr<read_context> ctx,
-                               lw_shared_ptr<partition_snapshot> snp,
+                               partition_snapshot_ptr snp,
                               row_cache& cache)
        : flat_mutation_reader::impl(std::move(s))
        , _snp(std::move(snp))
@@ -152,9 +164,6 @@ public:
    cache_flat_mutation_reader(const cache_flat_mutation_reader&) = delete;
    cache_flat_mutation_reader(cache_flat_mutation_reader&&) = delete;
    virtual future<> fill_buffer(db::timeout_clock::time_point timeout) override;
-    virtual ~cache_flat_mutation_reader() {
-        maybe_merge_versions(_snp, _lsa_manager.region(), _lsa_manager.read_section());
-    }
    virtual void next_partition() override {
        clear_buffer_to_next_partition();
        if (is_buffer_empty()) {
@@ -184,23 +193,22 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_
        return make_ready_future<>();
    } else {
        _read_context->cache().on_row_miss();
-        return _read_context->get_next_fragment(timeout).then([this] (mutation_fragment_opt&& sr) {
-            if (sr) {
-                assert(sr->is_static_row());
-                maybe_add_to_cache(sr->as_static_row());
-                push_mutation_fragment(std::move(*sr));
-            }
-            maybe_set_static_row_continuous();
+        return ensure_underlying(timeout).then([this, timeout] {
+            return (*_underlying)(timeout).then([this] (mutation_fragment_opt&& sr) {
+                if (sr) {
+                    assert(sr->is_static_row());
+                    maybe_add_to_cache(sr->as_static_row());
+                    push_mutation_fragment(std::move(*sr));
+                }
+                maybe_set_static_row_continuous();
+            });
        });
    }
 }

 inline
 void cache_flat_mutation_reader::touch_partition() {
-    if (_snp->at_latest_version()) {
-        rows_entry& last_dummy = *_snp->version()->partition().clustered_rows().rbegin();
-        _snp->tracker()->touch(last_dummy);
-    }
+    _snp->touch();
 }

 inline
@@ -230,14 +238,36 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t
    });
 }

+inline
+future<> cache_flat_mutation_reader::ensure_underlying(db::timeout_clock::time_point timeout) {
+    if (_underlying) {
+        return make_ready_future<>();
+    }
+    return _read_context->ensure_underlying(timeout).then([this, timeout] {
+        flat_mutation_reader& ctx_underlying = _read_context->underlying().underlying();
+        if (ctx_underlying.schema() != _schema) {
+            _underlying_holder = make_delegating_reader(ctx_underlying);
+            _underlying_holder->upgrade_schema(_schema);
+            _underlying = &*_underlying_holder;
+        } else {
+            _underlying = &ctx_underlying;
+        }
+    });
+}
+
 inline
 future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
    if (_state == state::move_to_underlying) {
+        if (!_underlying) {
+            return ensure_underlying(timeout).then([this, timeout] {
+                return do_fill_buffer(timeout);
+            });
+        }
        _state = state::reading_from_underlying;
        _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
        auto end = _next_row_in_range ? position_in_partition(_next_row.position())
                                      : position_in_partition(_upper_bound);
-        return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
+        return _underlying->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
            return read_from_underlying(timeout);
        });
    }
@@ -262,9 +292,13 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin
        }
        _next_row.maybe_refresh();
        clogger.trace("csm {}: next={}, cont={}", this, _next_row.position(), _next_row.continuous());
-        while (!is_buffer_full() && _state == state::reading_from_cache) {
+        _lower_bound_changed = false;
+        while (_state == state::reading_from_cache) {
            copy_from_cache_to_buffer();
-            if (need_preempt()) {
+            // We need to check _lower_bound_changed even if is_buffer_full() because
+            // we may have emitted only a range tombstone which overlapped with _lower_bound
+            // and thus didn't cause _lower_bound to change.
+            if ((need_preempt() || is_buffer_full()) && _lower_bound_changed) {
                break;
            }
        }
@@ -274,7 +308,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin

 inline
 future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::time_point timeout) {
-    return consume_mutation_fragments_until(_read_context->underlying().underlying(),
+    return consume_mutation_fragments_until(*_underlying,
        [this] { return _state != state::reading_from_underlying || is_buffer_full(); },
        [this] (mutation_fragment mf) {
            _read_context->cache().on_row_miss();
@@ -355,7 +389,7 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim
                }
            });
            return make_ready_future<>();
-        });
+        }, timeout);
 }

 inline
@@ -374,7 +408,7 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {
            rows_entry::compare less(*_schema);
            // FIXME: Avoid the copy by inserting an incomplete clustering row
            auto e = alloc_strategy_unique_ptr<rows_entry>(
-                current_allocator().construct<rows_entry>(*_last_row));
+                current_allocator().construct<rows_entry>(*_schema, *_last_row));
            e->set_continuous(false);
            auto insert_result = rows.insert_check(rows.end(), *e, less);
            auto inserted = insert_result.second;
@@ -428,7 +462,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
            cr.cells().prepare_hash(*_schema, column_kind::regular_column);
        }
        auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
-            current_allocator().construct<rows_entry>(cr.key(), cr.tomb(), cr.marker(), cr.cells()));
+            current_allocator().construct<rows_entry>(*_schema, cr.key(), cr.tomb(), cr.marker(), cr.cells()));
        new_entry->set_continuous(false);
        auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
                                              : mp.clustered_rows().lower_bound(cr.key(), less);
@@ -471,15 +505,19 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
    _next_row.touch();
    position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
    for (auto &&rts : _snp->range_tombstones(_lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {
+        position_in_partition::less_compare less(*_schema);
        // This guarantees that rts starts after any emitted clustering_row
        // and not before any emitted range tombstone.
-        if (rts.trim_front(*_schema, _lower_bound)) {
+        if (!less(_lower_bound, rts.position())) {
+            rts.set_start(*_schema, _lower_bound);
+        } else {
            _lower_bound = position_in_partition(rts.position());
+            _lower_bound_changed = true;
            if (is_buffer_full()) {
                return;
            }
-            push_mutation_fragment(std::move(rts));
        }
+        push_mutation_fragment(std::move(rts));
    }
    // We add the row to the buffer even when it's full.
    // This simplifies the code. For more info see #3139.
@@ -516,6 +554,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con
    _last_row = nullptr;
    _lower_bound = std::move(lb);
    _upper_bound = std::move(ub);
+    _lower_bound_changed = true;
    _ck_ranges_curr = next_it;
    auto adjacent = _next_row.advance_to(_lower_bound);
    _next_row_in_range = !after_current_range(_next_row.position());
@@ -593,6 +632,7 @@ void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&
    auto new_lower_bound = position_in_partition::after_key(row.key());
    push_mutation_fragment(std::move(mf));
    _lower_bound = std::move(new_lower_bound);
+    _lower_bound_changed = true;
 }

 inline
@@ -600,10 +640,16 @@ void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
    clogger.trace("csm {}: add_to_buffer({})", this, rt);
    // This guarantees that rt starts after any emitted clustering_row
    // and not before any emitted range tombstone.
-    if (!rt.trim_front(*_schema, _lower_bound)) {
+    position_in_partition::less_compare less(*_schema);
+    if (!less(_lower_bound, rt.end_position())) {
        return;
    }
-    _lower_bound = position_in_partition(rt.position());
+    if (!less(_lower_bound, rt.position())) {
+        rt.set_start(*_schema, _lower_bound);
+    } else {
+        _lower_bound = position_in_partition(rt.position());
+        _lower_bound_changed = true;
+    }
    push_mutation_fragment(std::move(rt));
 }

@@ -657,7 +703,7 @@ inline flat_mutation_reader make_cache_flat_mutation_reader(schema_ptr s,
                                                            query::clustering_key_filter_ranges crr,
                                                            row_cache& cache,
                                                            lw_shared_ptr<cache::read_context> ctx,
-                                                            lw_shared_ptr<partition_snapshot> snp)
+                                                            partition_snapshot_ptr snp)
 {
    return make_flat_mutation_reader<cache::cache_flat_mutation_reader>(
        std::move(s), std::move(dk), std::move(crr), std::move(ctx), std::move(snp), cache);
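
Note on the range tombstone handling above: the sketch below restates the three cases of the new `add_to_buffer(range_tombstone&&)` logic with plain integers. It is a simplification under stated assumptions, not the reader's code: positions are `int`s, a tombstone covers the half-open interval `[start, end)`, and `range_tombstone_sketch` and `reader_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// Hypothetical simplification: positions are ints, a tombstone covers [start, end).
struct range_tombstone_sketch {
    int start;
    int end;
};

struct reader_sketch {
    int lower_bound = 0;
    bool lower_bound_changed = false;

    // Returns the fragment that would be emitted, or nullopt if it is dropped.
    std::optional<range_tombstone_sketch> add_to_buffer(range_tombstone_sketch rt) {
        if (!(lower_bound < rt.end)) {
            // Ends at or before lower_bound: nothing left to emit.
            return std::nullopt;
        }
        if (!(lower_bound < rt.start)) {
            // Overlaps lower_bound: clamp the start, the bound itself stays put.
            rt.start = lower_bound;
        } else {
            // Starts after lower_bound: advance the bound and record progress.
            lower_bound = rt.start;
            lower_bound_changed = true;
        }
        return rt;
    }
};
```

The middle case emits a fragment without moving `lower_bound`, which is why the loop in `do_fill_buffer()` checks `_lower_bound_changed` before breaking out, even when the buffer is full.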
--- a/cell_locking.hh
+++ b/cell_locking.hh
@@ -23,29 +23,15 @@

 #include <boost/intrusive/unordered_set.hpp>

-#if __has_include(<boost/container/small_vector.hpp>)
-
-#include <boost/container/small_vector.hpp>
-
-template <typename T, size_t N>
-using small_vector = boost::container::small_vector<T, N>;
-
-#else
-
-#include <vector>
-template <typename T, size_t N>
-using small_vector = std::vector<T>;
-
-#endif
-
-#include "fnv1a_hasher.hh"
+#include "utils/small_vector.hh"
 #include "mutation_fragment.hh"
 #include "mutation_partition.hh"
+#include "xx_hasher.hh"

 #include "db/timeout_clock.hh"

 class cells_range {
-    using ids_vector_type = small_vector<column_id, 5>;
+    using ids_vector_type = utils::small_vector<column_id, 5>;

    position_in_partition_view _position;
    ids_vector_type _ids;
@@ -208,10 +194,10 @@ private:
            explicit hasher(const schema& s) : _schema(&s) { }

            size_t operator()(const cell_address& ca) const {
-                fnv1a_hasher hasher;
+                xx_hasher hasher;
                ca.position.feed_hash(hasher, *_schema);
                ::feed_hash(hasher, ca.id);
-                return hasher.finalize();
+                return static_cast<size_t>(hasher.finalize_uint64());
            }
            size_t operator()(const cell_entry& ce) const {
                return operator()(ce._address);
--- a/clustering_bounds_comparator.hh
+++ b/clustering_bounds_comparator.hh
@@ -22,6 +22,7 @@

 #pragma once

+#include <functional>
 #include "keys.hh"
 #include "schema.hh"
 #include "range.hh"
@@ -43,22 +44,20 @@ bound_kind invert_kind(bound_kind k);
 int32_t weight(bound_kind k);

 class bound_view {
+    const static thread_local clustering_key _empty_prefix;
+    std::reference_wrapper<const clustering_key_prefix> _prefix;
+    bound_kind _kind;
 public:
-    const static thread_local clustering_key empty_prefix;
-    const clustering_key_prefix& prefix;
-    bound_kind kind;
    bound_view(const clustering_key_prefix& prefix, bound_kind kind)
-        : prefix(prefix)
-        , kind(kind)
+        : _prefix(prefix)
+        , _kind(kind)
    { }
    bound_view(const bound_view& other) noexcept = default;
-    bound_view& operator=(const bound_view& other) noexcept {
-        if (this != &other) {
-            this->~bound_view();
-            new (this) bound_view(other);
-        }
-        return *this;
-    }
+    bound_view& operator=(const bound_view& other) noexcept = default;
+
+    bound_kind kind() const { return _kind; }
+    const clustering_key_prefix& prefix() const { return _prefix; }
+
    struct tri_compare {
        // To make it assignable and to avoid taking a schema_ptr, we
        // wrap the schema reference.
@@ -82,13 +81,13 @@ public:
            return d1 < d2 ? w1 - (w1 <= 0) : -(w2 - (w2 <= 0));
        }
        int operator()(const bound_view b, const clustering_key_prefix& p) const {
-            return operator()(b.prefix, weight(b.kind), p, 0);
+            return operator()(b._prefix, weight(b._kind), p, 0);
        }
        int operator()(const clustering_key_prefix& p, const bound_view b) const {
-            return operator()(p, 0, b.prefix, weight(b.kind));
+            return operator()(p, 0, b._prefix, weight(b._kind));
        }
        int operator()(const bound_view b1, const bound_view b2) const {
-            return operator()(b1.prefix, weight(b1.kind), b2.prefix, weight(b2.kind));
+            return operator()(b1._prefix, weight(b1._kind), b2._prefix, weight(b2._kind));
        }
    };
    struct compare {
@@ -101,26 +100,26 @@ public:
            return _cmp(p1, w1, p2, w2) < 0;
        }
        bool operator()(const bound_view b, const clustering_key_prefix& p) const {
-            return operator()(b.prefix, weight(b.kind), p, 0);
+            return operator()(b._prefix, weight(b._kind), p, 0);
        }
        bool operator()(const clustering_key_prefix& p, const bound_view b) const {
-            return operator()(p, 0, b.prefix, weight(b.kind));
+            return operator()(p, 0, b._prefix, weight(b._kind));
        }
        bool operator()(const bound_view b1, const bound_view b2) const {
-            return operator()(b1.prefix, weight(b1.kind), b2.prefix, weight(b2.kind));
+            return operator()(b1._prefix, weight(b1._kind), b2._prefix, weight(b2._kind));
        }
    };
    bool equal(const schema& s, const bound_view other) const {
-        return kind == other.kind && prefix.equal(s, other.prefix);
+        return _kind == other._kind && _prefix.get().equal(s, other._prefix.get());
    }
    bool adjacent(const schema& s, const bound_view other) const {
-        return invert_kind(other.kind) == kind && prefix.equal(s, other.prefix);
+        return invert_kind(other._kind) == _kind && _prefix.get().equal(s, other._prefix.get());
    }
    static bound_view bottom() {
-        return {empty_prefix, bound_kind::incl_start};
+        return {_empty_prefix, bound_kind::incl_start};
    }
    static bound_view top() {
-        return {empty_prefix, bound_kind::incl_end};
+        return {_empty_prefix, bound_kind::incl_end};
    }
    template<template<typename> typename R>
    GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
@@ -144,13 +143,13 @@ public:
    template<template<typename> typename R>
    GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
    static stdx::optional<typename R<clustering_key_prefix_view>::bound> to_range_bound(const bound_view& bv) {
-        if (&bv.prefix == &empty_prefix) {
+        if (&bv._prefix.get() == &_empty_prefix) {
            return {};
        }
-        bool inclusive = bv.kind != bound_kind::excl_end && bv.kind != bound_kind::excl_start;
-        return {typename R<clustering_key_prefix_view>::bound(bv.prefix.view(), inclusive)};
+        bool inclusive = bv._kind != bound_kind::excl_end && bv._kind != bound_kind::excl_start;
+        return {typename R<clustering_key_prefix_view>::bound(bv._prefix.get().view(), inclusive)};
    }
    friend std::ostream& operator<<(std::ostream& out, const bound_view& b) {
-        return out << "{bound: prefix=" << b.prefix << ", kind=" << b.kind << "}";
+        return out << "{bound: prefix=" << b._prefix.get() << ", kind=" << b._kind << "}";
    }
 };
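
Note on the `bound_view` change above: a plain reference member deletes the implicitly generated copy assignment operator, which is why the old code re-constructed the object in place with a destructor call and placement new. Holding a `std::reference_wrapper` instead lets `operator=` be defaulted. The following is a minimal sketch of that pattern; the type names are illustrative and not taken from Scylla.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical miniature of the old layout: a reference member.
struct view_with_reference {
    const std::string& prefix;  // makes the implicit copy assignment deleted
    int kind;
};

// Hypothetical miniature of the new layout: a rebindable wrapper plus accessors.
class view_with_wrapper {
    std::reference_wrapper<const std::string> _prefix;
    int _kind;
public:
    view_with_wrapper(const std::string& prefix, int kind)
        : _prefix(prefix), _kind(kind) {}
    // reference_wrapper converts implicitly to the underlying reference.
    const std::string& prefix() const { return _prefix; }
    int kind() const { return _kind; }
};

static_assert(!std::is_copy_assignable<view_with_reference>::value,
              "reference members block assignment");
static_assert(std::is_copy_assignable<view_with_wrapper>::value,
              "reference_wrapper members allow defaulted assignment");
```

Assigning one `view_with_wrapper` to another rebinds the wrapper to the other string; it does not copy the string contents, which matches view semantics.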
--- a/clustering_key_filter.hh
+++ b/clustering_key_filter.hh
@@ -30,7 +30,7 @@ namespace query {

 class clustering_key_filter_ranges {
    clustering_row_ranges _storage;
-    const clustering_row_ranges& _ref;
+    std::reference_wrapper<const clustering_row_ranges> _ref;
 public:
    clustering_key_filter_ranges(const clustering_row_ranges& ranges) : _ref(ranges) { }
    struct reversed { };
@@ -39,21 +39,21 @@ public:

    clustering_key_filter_ranges(clustering_key_filter_ranges&& other) noexcept
        : _storage(std::move(other._storage))
-        , _ref(&other._ref == &other._storage ? _storage : other._ref)
+        , _ref(&other._ref.get() == &other._storage ? _storage : other._ref.get())
    { }

    clustering_key_filter_ranges& operator=(clustering_key_filter_ranges&& other) noexcept {
        if (this != &other) {
-            this->~clustering_key_filter_ranges();
-            new (this) clustering_key_filter_ranges(std::move(other));
+            _storage = std::move(other._storage);
+            _ref = (&other._ref.get() == &other._storage) ? _storage : other._ref.get();
        }
        return *this;
    }

-    auto begin() const { return _ref.begin(); }
-    auto end() const { return _ref.end(); }
-    bool empty() const { return _ref.empty(); }
-    size_t size() const { return _ref.size(); }
+    auto begin() const { return _ref.get().begin(); }
+    auto end() const { return _ref.get().end(); }
+    bool empty() const { return _ref.get().empty(); }
+    size_t size() const { return _ref.get().size(); }
    const clustering_row_ranges& ranges() const { return _ref; }

    static clustering_key_filter_ranges get_ranges(const schema& schema, const query::partition_slice& slice, const partition_key& key) {
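
Note on the `clustering_key_filter_ranges` move operations above: the class either borrows an external range list or refers to its own `_storage`, so a move has to re-point the reference at the destination's storage in the owning case. The sketch below shows that fix-up with a `std::vector<int>`. It is an assumption-laden miniature, not the Scylla class: `ranges_holder`, `owning`, and `refers_to_own_storage()` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature: either borrows an external vector or owns one in _storage.
class ranges_holder {
    std::vector<int> _storage;
    std::reference_wrapper<const std::vector<int>> _ref;
    bool owns() const { return &_ref.get() == &_storage; }
public:
    // Borrowing constructor: the caller keeps `external` alive.
    explicit ranges_holder(const std::vector<int>& external) : _ref(external) {}

    // Owning constructor, selected by a tag type.
    struct owning {};
    ranges_holder(owning, std::vector<int> v)
        : _storage(std::move(v)), _ref(_storage) {}

    ranges_holder(ranges_holder&& o) noexcept
        : _storage(std::move(o._storage))
        // If the source referred to its own storage, refer to ours instead.
        , _ref(o.owns() ? _storage : o._ref.get())
    {}

    ranges_holder& operator=(ranges_holder&& o) noexcept {
        if (this != &o) {
            const bool other_owns = o.owns();
            _storage = std::move(o._storage);
            _ref = other_owns ? _storage : o._ref.get();
        }
        return *this;
    }

    const std::vector<int>& ranges() const { return _ref; }
    bool refers_to_own_storage() const { return owns(); }
};
```

The ownership test compares addresses only, so it remains valid after the source's storage has been moved from.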
--- a/clustering_ranges_walker.hh
+++ b/clustering_ranges_walker.hh
@@ -31,72 +31,61 @@
 class clustering_ranges_walker {
    const schema& _schema;
    const query::clustering_row_ranges& _ranges;
-    query::clustering_row_ranges::const_iterator _current;
-    query::clustering_row_ranges::const_iterator _end;
+    boost::iterator_range<query::clustering_row_ranges::const_iterator> _current_range;
    bool _in_current; // next position is known to be >= _current_start
    bool _with_static_row;
    position_in_partition_view _current_start;
    position_in_partition_view _current_end;
-    stdx::optional<position_in_partition> _trim;
+    std::optional<position_in_partition> _trim;
    size_t _change_counter = 1;
 private:
    bool advance_to_next_range() {
        _in_current = false;
        if (!_current_start.is_static_row()) {
-            if (_current == _end) {
+            if (!_current_range) {
                return false;
            }
-            ++_current;
+            _current_range.advance_begin(1);
        }
        ++_change_counter;
-        if (_current == _end) {
+        if (!_current_range) {
            _current_end = _current_start = position_in_partition_view::after_all_clustered_rows();
            return false;
        }
-        _current_start = position_in_partition_view::for_range_start(*_current);
-        _current_end = position_in_partition_view::for_range_end(*_current);
+        _current_start = position_in_partition_view::for_range_start(_current_range.front());
+        _current_end = position_in_partition_view::for_range_end(_current_range.front());
        return true;
    }
-public:
-    clustering_ranges_walker(const schema& s, const query::clustering_row_ranges& ranges, bool with_static_row = true)
-        : _schema(s)
-        , _ranges(ranges)
-        , _current(ranges.begin())
-        , _end(ranges.end())
-        , _in_current(with_static_row)
-        , _with_static_row(with_static_row)
-        , _current_start(position_in_partition_view::for_static_row())
-        , _current_end(position_in_partition_view::before_all_clustered_rows())
-    {
-        if (!with_static_row) {
-            if (_current == _end) {
+
+    void set_current_positions() {
+         if (!_with_static_row) {
+            if (!_current_range) {
                _current_start = position_in_partition_view::before_all_clustered_rows();
            } else {
-                _current_start = position_in_partition_view::for_range_start(*_current);
-                _current_end = position_in_partition_view::for_range_end(*_current);
+                _current_start = position_in_partition_view::for_range_start(_current_range.front());
+                _current_end = position_in_partition_view::for_range_end(_current_range.front());
            }
        }
    }
-    clustering_ranges_walker(clustering_ranges_walker&& o) noexcept
-        : _schema(o._schema)
-        , _ranges(o._ranges)
-        , _current(o._current)
-        , _end(o._end)
-        , _in_current(o._in_current)
-        , _with_static_row(o._with_static_row)
-        , _current_start(o._current_start)
-        , _current_end(o._current_end)
-        , _trim(std::move(o._trim))
-        , _change_counter(o._change_counter)
-    { }
-    clustering_ranges_walker& operator=(clustering_ranges_walker&& o) {
-        if (this != &o) {
-            this->~clustering_ranges_walker();
-            new (this) clustering_ranges_walker(std::move(o));
-        }
-        return *this;
+
+public:
+    clustering_ranges_walker(const schema& s, const query::clustering_row_ranges& ranges, bool with_static_row = true)
+            : _schema(s)
+            , _ranges(ranges)
+            , _current_range(ranges)
+            , _in_current(with_static_row)
+            , _with_static_row(with_static_row)
+            , _current_start(position_in_partition_view::for_static_row())
+            , _current_end(position_in_partition_view::before_all_clustered_rows()) {
+        set_current_positions();
    }

+    clustering_ranges_walker(const clustering_ranges_walker&) = delete;
+    clustering_ranges_walker(clustering_ranges_walker&&) = delete;
+
+    clustering_ranges_walker& operator=(const clustering_ranges_walker&) = delete;
+    clustering_ranges_walker& operator=(clustering_ranges_walker&&) = delete;
+
    // Excludes positions smaller than pos from the ranges.
    // pos should be monotonic.
    // No constraints between pos and positions passed to advance_to().
@@ -173,17 +162,15 @@ public:
            return false;
        }

-        auto i = _current;
-        while (i != _end) {
-            auto range_start = position_in_partition_view::for_range_start(*i);
+        for (const auto& rng : _current_range) {
+            auto range_start = position_in_partition_view::for_range_start(rng);
            if (!less(range_start, end)) {
                return false;
            }
-            auto range_end = position_in_partition_view::for_range_end(*i);
+            auto range_end = position_in_partition_view::for_range_end(rng);
            if (less(start, range_end)) {
                return true;
            }
-            ++i;
        }

        return false;
@@ -191,18 +178,20 @@ public:

    // Returns true if advanced past all contained positions. Any later advance_to() until reset() will return false.
    bool out_of_range() const {
-        return !_in_current && _current == _end;
+        return !_in_current && !_current_range;
    }

    // Resets the state of the walker so that advance_to() can be now called for new sequence of positions.
    // Any range trimmings still hold after this.
    void reset() {
-        auto trim = std::move(_trim);
-        auto ctr = _change_counter;
-        *this = clustering_ranges_walker(_schema, _ranges, _with_static_row);
-        _change_counter = ctr + 1;
-        if (trim) {
-            trim_front(std::move(*trim));
+        _current_range = _ranges;
+        _in_current = _with_static_row;
+        _current_start = position_in_partition_view::for_static_row();
+        _current_end = position_in_partition_view::before_all_clustered_rows();
+        set_current_positions();
+        ++_change_counter;
+        if (_trim) {
+            trim_front(*std::exchange(_trim, {}));
        }
    }

@@ -211,6 +200,11 @@ public:
        return _current_start;
    }

+    // Returns the upper bound of the last range in provided ranges set
+    position_in_partition_view uppermost_bound() const {
+        return position_in_partition_view::for_range_end(_ranges.back());
+    }
+
    // When lower_bound() changes, this also does
    // Always > 0.
    size_t lower_bound_change_counter() const {
--- a/compaction_strategy.hh
+++ b/compaction_strategy.hh
@@ -25,7 +25,8 @@
 #include "exceptions/exceptions.hh"
 #include "sstables/compaction_backlog_manager.hh"

-class column_family;
+class table;
+using column_family = table;
 class schema;
 using schema_ptr = lw_shared_ptr<const schema>;

--- a/compatible_ring_position.hh
+++ b/compatible_ring_position.hh
@@ -1,67 +0,0 @@
-/*
- * Copyright (C) 2016 ScyllaDB
- */
-
-/*
- * This file is part of Scylla.
- *
- * Scylla is free software: you can redistribute it and/or modify
- * it under the terms of the GNU Affero General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * Scylla is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-
-#pragma once
-
-#include "query-request.hh"
-#include <experimental/optional>
-
-// Wraps ring_position so it is compatible with old-style C++: default constructor,
-// stateless comparators, yada yada
-class compatible_ring_position {
-    const schema* _schema = nullptr;
-    // optional to supply a default constructor, no more
-    std::experimental::optional<dht::ring_position> _rp;
-public:
-    compatible_ring_position() noexcept = default;
-    compatible_ring_position(const schema& s, const dht::ring_position& rp)
-            : _schema(&s), _rp(rp) {
-    }
-    compatible_ring_position(const schema& s, dht::ring_position&& rp)
-            : _schema(&s), _rp(std::move(rp)) {
-    }
-    const dht::token& token() const {
-        return _rp->token();
-    }
-    friend int tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return x._rp->tri_compare(*x._schema, *y._rp);
-    }
-    friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return tri_compare(x, y) < 0;
-    }
-    friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return tri_compare(x, y) <= 0;
-    }
-    friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return tri_compare(x, y) > 0;
-    }
-    friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return tri_compare(x, y) >= 0;
-    }
-    friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return tri_compare(x, y) == 0;
-    }
-    friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
-        return tri_compare(x, y) != 0;
-    }
-};
-
--- a/compatible_ring_position_view.hh
+++ b/compatible_ring_position_view.hh
@@ -0,0 +1,64 @@
+/*
+ * Copyright (C) 2016 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+#pragma once
+
+#include "query-request.hh"
+#include <optional>
+
+// Wraps ring_position_view so it is compatible with old-style C++: default
+// constructor, stateless comparators, yada yada.
+class compatible_ring_position_view {
+    const schema* _schema = nullptr;
+    // Optional to supply a default constructor, no more.
+    std::optional<dht::ring_position_view> _rpv;
+public:
+    constexpr compatible_ring_position_view() = default;
+    compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
+        : _schema(&s), _rpv(rpv) {
+    }
+    const dht::ring_position_view& position() const {
+        return *_rpv;
+    }
+    friend int tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
+    }
+    friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return tri_compare(x, y) < 0;
+    }
+    friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return tri_compare(x, y) <= 0;
+    }
+    friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return tri_compare(x, y) > 0;
+    }
+    friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return tri_compare(x, y) >= 0;
+    }
+    friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return tri_compare(x, y) == 0;
+    }
+    friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
+        return tri_compare(x, y) != 0;
+    }
+};
+
--- a/compound_compat.hh
+++ b/compound_compat.hh
@@ -25,6 +25,7 @@
 #include <boost/range/adaptor/transformed.hpp>
 #include "compound.hh"
 #include "schema.hh"
+#include "sstables/version.hh"

 //
 // This header provides adaptors between the representation used by our compound_type<>
@@ -302,7 +303,7 @@ private:
    }
 public:
    template <typename Describer>
-    auto describe_type(Describer f) const {
+    auto describe_type(sstables::sstable_version_types v, Describer f) const {
        return f(const_cast<bytes&>(_bytes));
    }

--- a/compress.cc
+++ b/compress.cc
@@ -112,7 +112,7 @@ const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_kb";
 const sstring compression_parameters::CRC_CHECK_CHANCE = "crc_check_chance";

 compression_parameters::compression_parameters()
-    : compression_parameters(nullptr)
+    : compression_parameters(compressor::lz4)
 {}

 compression_parameters::~compression_parameters()
@@ -241,7 +241,7 @@ size_t lz4_processor::compress(const char* input, size_t input_len,
    output[1] = (input_len >> 8) & 0xFF;
    output[2] = (input_len >> 16) & 0xFF;
    output[3] = (input_len >> 24) & 0xFF;
-#ifdef HAVE_LZ4_COMPRESS_DEFAULT
+#ifdef SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT
    auto ret = LZ4_compress_default(input, output + 4, input_len, LZ4_compressBound(input_len));
 #else
    auto ret = LZ4_compress(input, output + 4, input_len);
--- a/compress.hh
+++ b/compress.hh
@@ -118,6 +118,10 @@ public:
    std::map<sstring, sstring> get_options() const;
    bool operator==(const compression_parameters& other) const;
    bool operator!=(const compression_parameters& other) const;
+
+    static compression_parameters no_compression() {
+        return compression_parameters(nullptr);
+    }
 private:
    void validate_options(const std::map<sstring, sstring>&);
 };
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -242,6 +242,9 @@ batch_size_fail_threshold_in_kb: 50

 # The directory where hints files are stored if hinted handoff is enabled.
 # hints_directory: /var/lib/scylla/hints
+ 
+# The directory where hints files are stored for materialized-view updates
+# view_hints_directory: /var/lib/scylla/view_hints

 # See http://wiki.apache.org/cassandra/HintedHandoff
 # May either be "true" or "false" to enable globally, or contain a list
--- a/configure.py
+++ b/configure.py
--- a/converting_mutation_partition_applier.hh
+++ b/converting_mutation_partition_applier.hh
@@ -38,28 +38,45 @@ private:
    static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
        return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
    }
-    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
-        if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
-            dst.apply(new_def, atomic_cell_or_collection(cell));
+    static atomic_cell upgrade_cell(const abstract_type& new_type, const abstract_type& old_type, atomic_cell_view cell,
+                                    atomic_cell::collection_member cm = atomic_cell::collection_member::no) {
+        if (cell.is_live() && !old_type.is_counter()) {
+            if (cell.is_live_and_has_ttl()) {
+                return atomic_cell::make_live(new_type, cell.timestamp(), cell.value().linearize(), cell.expiry(), cell.ttl(), cm);
+            }
+            return atomic_cell::make_live(new_type, cell.timestamp(), cell.value().linearize(), cm);
+        } else {
+            return atomic_cell(new_type, cell);
        }
    }
+    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
+        if (!is_compatible(new_def, old_type, kind) || cell.timestamp() <= new_def.dropped_at()) {
+            return;
+        }
+        dst.apply(new_def, upgrade_cell(*new_def.type, *old_type, cell));
+    }
    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
        if (!is_compatible(new_def, old_type, kind)) {
            return;
        }
-        auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
-        auto old_view = ctype->deserialize_mutation_form(cell);
+      cell.data.with_linearized([&] (bytes_view cell_bv) {
+        auto new_ctype = static_pointer_cast<const collection_type_impl>(new_def.type);
+        auto old_ctype = static_pointer_cast<const collection_type_impl>(old_type);
+        auto old_view = old_ctype->deserialize_mutation_form(cell_bv);

-        collection_type_impl::mutation_view new_view;
+        collection_type_impl::mutation new_view;
        if (old_view.tomb.timestamp > new_def.dropped_at()) {
            new_view.tomb = old_view.tomb;
        }
        for (auto& c : old_view.cells) {
            if (c.second.timestamp() > new_def.dropped_at()) {
-                new_view.cells.emplace_back(std::move(c));
+                new_view.cells.emplace_back(c.first, upgrade_cell(*new_ctype->value_comparator(), *old_ctype->value_comparator(), c.second, atomic_cell::collection_member::yes));
            }
        }
-        dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
+        if (new_view.tomb || !new_view.cells.empty()) {
+            dst.apply(new_def, new_ctype->serialize_mutation_form(std::move(new_view)));
+        }
+      });
    }
 public:
    converting_mutation_partition_applier(
@@ -75,6 +92,10 @@ public:
        _p.apply(t);
    }

+    void accept_static_cell(column_id id, atomic_cell cell) {
+        return accept_static_cell(id, atomic_cell_view(cell));
+    }
+
    virtual void accept_static_cell(column_id id, atomic_cell_view cell) override {
        const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
        const column_definition* def = _p_schema.get_column_definition(col.name());
@@ -102,6 +123,10 @@ public:
        _current_row = &r;
    }

+    void accept_row_cell(column_id id, atomic_cell cell) {
+        return accept_row_cell(id, atomic_cell_view(cell));
+    }
+
    virtual void accept_row_cell(column_id id, atomic_cell_view cell) override {
        const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
        const column_definition* def = _p_schema.get_column_definition(col.name());
@@ -120,11 +145,11 @@ public:

    // Appends the cell to dst upgrading it to the new schema.
    // Cells must have monotonic names.
-    static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, const atomic_cell_or_collection& cell) {
+    static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const column_definition& old_def, const atomic_cell_or_collection& cell) {
        if (new_def.is_atomic()) {
-            accept_cell(dst, kind, new_def, old_type, cell.as_atomic_cell());
+            accept_cell(dst, kind, new_def, old_def.type, cell.as_atomic_cell(old_def));
        } else {
-            accept_cell(dst, kind, new_def, old_type, cell.as_collection_mutation());
+            accept_cell(dst, kind, new_def, old_def.type, cell.as_collection_mutation());
        }
    }
 };
--- a/counters.cc
+++ b/counters.cc
@@ -78,10 +78,10 @@ std::vector<counter_shard> counter_cell_view::shards_compatible_with_1_7_4() con
    return sorted_shards;
 }

-static bool apply_in_place(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
+static bool apply_in_place(const column_definition& cdef, atomic_cell_mutable_view dst, atomic_cell_mutable_view src)
 {
-    auto dst_ccmv = counter_cell_mutable_view(dst.as_mutable_atomic_cell());
-    auto src_ccmv = counter_cell_mutable_view(src.as_mutable_atomic_cell());
+    auto dst_ccmv = counter_cell_mutable_view(dst);
+    auto src_ccmv = counter_cell_mutable_view(src);
    auto dst_shards = dst_ccmv.shards();
    auto src_shards = src_ccmv.shards();

@@ -118,48 +118,19 @@ static bool apply_in_place(atomic_cell_or_collection& dst, atomic_cell_or_collec
    auto src_ts = src_ccmv.timestamp();
    dst_ccmv.set_timestamp(std::max(dst_ts, src_ts));
    src_ccmv.set_timestamp(dst_ts);
-    src.as_mutable_atomic_cell().set_counter_in_place_revert(true);
    return true;
 }

-static void revert_in_place_apply(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
+void counter_cell_view::apply(const column_definition& cdef, atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
 {
-    assert(dst.can_use_mutable_view() && src.can_use_mutable_view());
-    auto dst_ccmv = counter_cell_mutable_view(dst.as_mutable_atomic_cell());
-    auto src_ccmv = counter_cell_mutable_view(src.as_mutable_atomic_cell());
-    auto dst_shards = dst_ccmv.shards();
-    auto src_shards = src_ccmv.shards();
-
-    auto dst_it = dst_shards.begin();
-    auto src_it = src_shards.begin();
-
-    while (src_it != src_shards.end()) {
-        while (dst_it != dst_shards.end() && dst_it->id() < src_it->id()) {
-            ++dst_it;
-        }
-        assert(dst_it != dst_shards.end() && dst_it->id() == src_it->id());
-        dst_it->swap_value_and_clock(*src_it);
-        ++src_it;
-    }
-
-    auto dst_ts = dst_ccmv.timestamp();
-    auto src_ts = src_ccmv.timestamp();
-    dst_ccmv.set_timestamp(src_ts);
-    src_ccmv.set_timestamp(dst_ts);
-    src.as_mutable_atomic_cell().set_counter_in_place_revert(false);
-}
-
-bool counter_cell_view::apply_reversibly(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
-{
-    auto dst_ac = dst.as_atomic_cell();
-    auto src_ac = src.as_atomic_cell();
+    auto dst_ac = dst.as_atomic_cell(cdef);
+    auto src_ac = src.as_atomic_cell(cdef);

    if (!dst_ac.is_live() || !src_ac.is_live()) {
        if (dst_ac.is_live() || (!src_ac.is_live() && compare_atomic_cell_for_merge(dst_ac, src_ac) < 0)) {
            std::swap(dst, src);
-            return true;
        }
-        return false;
+        return;
    }

    if (dst_ac.is_counter_update() && src_ac.is_counter_update()) {
@@ -167,22 +138,26 @@ bool counter_cell_view::apply_reversibly(atomic_cell_or_collection& dst, atomic_
        auto dst_v = dst_ac.counter_update_value();
        dst = atomic_cell::make_live_counter_update(std::max(dst_ac.timestamp(), src_ac.timestamp()),
                                                    src_v + dst_v);
-        return true;
+        return;
    }

    assert(!dst_ac.is_counter_update());
    assert(!src_ac.is_counter_update());
+ with_linearized(dst_ac, [&] (counter_cell_view dst_ccv) {
+  with_linearized(src_ac, [&] (counter_cell_view src_ccv) {

-    if (counter_cell_view(dst_ac).shard_count() >= counter_cell_view(src_ac).shard_count()
-        && dst.can_use_mutable_view() && src.can_use_mutable_view()) {
-        if (apply_in_place(dst, src)) {
-            return true;
+    if (dst_ccv.shard_count() >= src_ccv.shard_count()) {
+        auto dst_amc = dst.as_mutable_atomic_cell(cdef);
+        auto src_amc = src.as_mutable_atomic_cell(cdef);
+        if (!dst_amc.is_value_fragmented() && !src_amc.is_value_fragmented()) {
+            if (apply_in_place(cdef, dst_amc, src_amc)) {
+                return;
+            }
        }
    }

-    src.as_mutable_atomic_cell().set_counter_in_place_revert(false);
-    auto dst_shards = counter_cell_view(dst_ac).shards();
-    auto src_shards = counter_cell_view(src_ac).shards();
+    auto dst_shards = dst_ccv.shards();
+    auto src_shards = src_ccv.shards();

    counter_cell_builder result;
    combine(dst_shards.begin(), dst_shards.end(), src_shards.begin(), src_shards.end(),
@@ -191,22 +166,9 @@ bool counter_cell_view::apply_reversibly(atomic_cell_or_collection& dst, atomic_
            });

    auto cell = result.build(std::max(dst_ac.timestamp(), src_ac.timestamp()));
-    src = std::exchange(dst, atomic_cell_or_collection(cell));
-    return true;
-}
-
-void counter_cell_view::revert_apply(atomic_cell_or_collection& dst, atomic_cell_or_collection& src)
-{
-    if (dst.as_atomic_cell().is_counter_update()) {
-        auto src_v = src.as_atomic_cell().counter_update_value();
-        auto dst_v = dst.as_atomic_cell().counter_update_value();
-        dst = atomic_cell::make_live(dst.as_atomic_cell().timestamp(),
-                                     long_type->decompose(dst_v - src_v));
-    } else if (src.as_atomic_cell().is_counter_in_place_revert_set()) {
-        revert_in_place_apply(dst, src);
-    } else {
-        std::swap(dst, src);
-    }
+    src = std::exchange(dst, atomic_cell_or_collection(std::move(cell)));
+  });
+ });
 }

 stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, atomic_cell_view b)
@@ -216,13 +178,15 @@ stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, at

    if (!b.is_live() || !a.is_live()) {
        if (b.is_live() || (!a.is_live() && compare_atomic_cell_for_merge(b, a) < 0)) {
-            return atomic_cell(a);
+            return atomic_cell(*counter_type, a);
        }
        return { };
    }

-    auto a_shards = counter_cell_view(a).shards();
-    auto b_shards = counter_cell_view(b).shards();
+ return with_linearized(a, [&] (counter_cell_view a_ccv) {
+  return with_linearized(b, [&] (counter_cell_view b_ccv) {
+    auto a_shards = a_ccv.shards();
+    auto b_shards = b_ccv.shards();

    auto a_it = a_shards.begin();
    auto a_end = a_shards.end();
@@ -244,18 +208,21 @@ stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, at
    if (!result.empty()) {
        diff = result.build(std::max(a.timestamp(), b.timestamp()));
    } else if (a.timestamp() > b.timestamp()) {
-        diff = atomic_cell::make_live(a.timestamp(), bytes_view());
+        diff = atomic_cell::make_live(*counter_type, a.timestamp(), bytes_view());
    }
    return diff;
+  });
+ });
 }


 void transform_counter_updates_to_shards(mutation& m, const mutation* current_state, uint64_t clock_offset) {
    // FIXME: allow current_state to be frozen_mutation

-    auto transform_new_row_to_shards = [clock_offset] (auto& cells) {
-        cells.for_each_cell([clock_offset] (auto, atomic_cell_or_collection& ac_o_c) {
-            auto acv = ac_o_c.as_atomic_cell();
+    auto transform_new_row_to_shards = [&s = *m.schema(), clock_offset] (column_kind kind, auto& cells) {
+        cells.for_each_cell([&] (column_id id, atomic_cell_or_collection& ac_o_c) {
+            auto& cdef = s.column_at(kind, id);
+            auto acv = ac_o_c.as_atomic_cell(cdef);
            if (!acv.is_live()) {
                return; // continue -- we are in lambda
            }
@@ -266,32 +233,35 @@ void transform_counter_updates_to_shards(mutation& m, const mutation* current_st
    };

    if (!current_state) {
-        transform_new_row_to_shards(m.partition().static_row());
+        transform_new_row_to_shards(column_kind::static_column, m.partition().static_row());
        for (auto& cr : m.partition().clustered_rows()) {
-            transform_new_row_to_shards(cr.row().cells());
+            transform_new_row_to_shards(column_kind::regular_column, cr.row().cells());
        }
        return;
    }

    clustering_key::less_compare cmp(*m.schema());

-    auto transform_row_to_shards = [clock_offset] (auto& transformee, auto& state) {
+    auto transform_row_to_shards = [&s = *m.schema(), clock_offset] (column_kind kind, auto& transformee, auto& state) {
        std::deque<std::pair<column_id, counter_shard>> shards;
        state.for_each_cell([&] (column_id id, const atomic_cell_or_collection& ac_o_c) {
-            auto acv = ac_o_c.as_atomic_cell();
+            auto& cdef = s.column_at(kind, id);
+            auto acv = ac_o_c.as_atomic_cell(cdef);
            if (!acv.is_live()) {
                return; // continue -- we are in lambda
            }
-            counter_cell_view ccv(acv);
+          counter_cell_view::with_linearized(acv, [&] (counter_cell_view ccv) {
            auto cs = ccv.local_shard();
            if (!cs) {
                return; // continue
            }
            shards.emplace_back(std::make_pair(id, counter_shard(*cs)));
+          });
        });

        transformee.for_each_cell([&] (column_id id, atomic_cell_or_collection& ac_o_c) {
-            auto acv = ac_o_c.as_atomic_cell();
+            auto& cdef = s.column_at(kind, id);
+            auto acv = ac_o_c.as_atomic_cell(cdef);
            if (!acv.is_live()) {
                return; // continue -- we are in lambda
            }
@@ -313,7 +283,7 @@ void transform_counter_updates_to_shards(mutation& m, const mutation* current_st
        });
    };

-    transform_row_to_shards(m.partition().static_row(), current_state->partition().static_row());
+    transform_row_to_shards(column_kind::static_column, m.partition().static_row(), current_state->partition().static_row());

    auto& cstate = current_state->partition();
    auto it = cstate.clustered_rows().begin();
@@ -323,10 +293,10 @@ void transform_counter_updates_to_shards(mutation& m, const mutation* current_st
            ++it;
        }
        if (it == end || cmp(cr.key(), it->key())) {
-            transform_new_row_to_shards(cr.row().cells());
+            transform_new_row_to_shards(column_kind::regular_column, cr.row().cells());
            continue;
        }

-        transform_row_to_shards(cr.row().cells(), it->row().cells());
+        transform_row_to_shards(column_kind::regular_column, cr.row().cells(), it->row().cells());
    }
 }
--- a/counters.hh
+++ b/counters.hh
@@ -79,7 +79,7 @@ static_assert(std::is_pod<counter_id>::value, "counter_id should be a POD type")

 std::ostream& operator<<(std::ostream& os, const counter_id& id);

-template<typename View>
+template<mutable_view is_mutable>
 class basic_counter_shard_view {
    enum class offset : unsigned {
        id = 0u,
@@ -88,7 +88,8 @@ class basic_counter_shard_view {
        total_size = unsigned(logical_clock) + sizeof(int64_t),
    };
 private:
-    typename View::pointer _base;
+    using pointer_type = std::conditional_t<is_mutable == mutable_view::no, const signed char*, signed char*>;
+    pointer_type _base;
 private:
    template<typename T>
    T read(offset off) const {
@@ -100,7 +101,7 @@ public:
    static constexpr auto size = size_t(offset::total_size);
 public:
    basic_counter_shard_view() = default;
-    explicit basic_counter_shard_view(typename View::pointer ptr) noexcept
+    explicit basic_counter_shard_view(pointer_type ptr) noexcept
        : _base(ptr) { }

    counter_id id() const { return read<counter_id>(offset::id); }
@@ -111,7 +112,7 @@ public:
        static constexpr size_t off = size_t(offset::value);
        static constexpr size_t size = size_t(offset::total_size) - off;

-        typename View::value_type tmp[size];
+        signed char tmp[size];
        std::copy_n(_base + off, size, tmp);
        std::copy_n(other._base + off, size, _base + off);
        std::copy_n(tmp, size, other._base + off);
@@ -138,7 +139,7 @@ public:
    };
 };

-using counter_shard_view = basic_counter_shard_view<bytes_view>;
+using counter_shard_view = basic_counter_shard_view<mutable_view::no>;

 std::ostream& operator<<(std::ostream& os, counter_shard_view csv);

@@ -198,7 +199,7 @@ public:
        return do_apply(other);
    }

-    static size_t serialized_size() {
+    static constexpr size_t serialized_size() {
        return counter_shard_view::size;
    }
    void serialize(bytes::iterator& out) const {
@@ -252,15 +253,33 @@ public:
    }

    atomic_cell build(api::timestamp_type timestamp) const {
-        return atomic_cell::make_live_from_serializer(timestamp, serialized_size(), [this] (bytes::iterator out) {
-            serialize(out);
-        });
+        // If we can assume that the counter shards never cross fragment boundaries
+        // the serialisation code gets much simpler.
+        static_assert(data::cell::maximum_external_chunk_length % counter_shard::serialized_size() == 0);
+
+        auto ac = atomic_cell::make_live_uninitialized(*counter_type, timestamp, serialized_size());
+
+        auto dst_it = ac.value().begin();
+        auto dst_current = *dst_it++;
+        for (auto&& cs : _shards) {
+            if (dst_current.empty()) {
+                dst_current = *dst_it++;
+            }
+            assert(!dst_current.empty());
+            auto value_dst = dst_current.data();
+            cs.serialize(value_dst);
+            dst_current.remove_prefix(counter_shard::serialized_size());
+        }
+        return ac;
    }

    static atomic_cell from_single_shard(api::timestamp_type timestamp, const counter_shard& cs) {
-        return atomic_cell::make_live_from_serializer(timestamp, counter_shard::serialized_size(), [&cs] (bytes::iterator out) {
-            cs.serialize(out);
-        });
+        // We don't really need to bother with fragmentation here.
+        static_assert(data::cell::maximum_external_chunk_length >= counter_shard::serialized_size());
+        auto ac = atomic_cell::make_live_uninitialized(*counter_type, timestamp, counter_shard::serialized_size());
+        auto dst = ac.value().first_fragment().begin();
+        cs.serialize(dst);
+        return ac;
    }

    class inserter_iterator : public std::iterator<std::output_iterator_tag, counter_shard> {
@@ -287,28 +306,32 @@ public:
 // <counter_id>   := <int64_t><int64_t>
 // <shard>        := <counter_id><int64_t:value><int64_t:logical_clock>
 // <counter_cell> := <shard>*
-template<typename View>
+template<mutable_view is_mutable>
 class basic_counter_cell_view {
 protected:
-    atomic_cell_base<View> _cell;
+    using linearized_value_view = std::conditional_t<is_mutable == mutable_view::no,
+                                                     bytes_view, bytes_mutable_view>;
+    using pointer_type = typename linearized_value_view::pointer;
+    basic_atomic_cell_view<is_mutable> _cell;
+    linearized_value_view _value;
 private:
-    class shard_iterator : public std::iterator<std::input_iterator_tag, basic_counter_shard_view<View>> {
-        typename View::pointer _current;
-        basic_counter_shard_view<View> _current_view;
+    class shard_iterator : public std::iterator<std::input_iterator_tag, basic_counter_shard_view<is_mutable>> {
+        pointer_type _current;
+        basic_counter_shard_view<is_mutable> _current_view;
    public:
        shard_iterator() = default;
-        shard_iterator(typename View::pointer ptr) noexcept
+        shard_iterator(pointer_type ptr) noexcept
            : _current(ptr), _current_view(ptr) { }

-        basic_counter_shard_view<View>& operator*() noexcept {
+        basic_counter_shard_view<is_mutable>& operator*() noexcept {
            return _current_view;
        }
-        basic_counter_shard_view<View>* operator->() noexcept {
+        basic_counter_shard_view<is_mutable>* operator->() noexcept {
            return &_current_view;
        }
        shard_iterator& operator++() noexcept {
            _current += counter_shard_view::size;
-            _current_view = basic_counter_shard_view<View>(_current);
+            _current_view = basic_counter_shard_view<is_mutable>(_current);
            return *this;
        }
        shard_iterator operator++(int) noexcept {
@@ -318,7 +341,7 @@ private:
        }
        shard_iterator& operator--() noexcept {
            _current -= counter_shard_view::size;
-            _current_view = basic_counter_shard_view<View>(_current);
+            _current_view = basic_counter_shard_view<is_mutable>(_current);
            return *this;
        }
        shard_iterator operator--(int) noexcept {
@@ -335,22 +358,23 @@ private:
    };
 public:
    boost::iterator_range<shard_iterator> shards() const {
-        auto bv = _cell.value();
-        auto begin = shard_iterator(bv.data());
-        auto end = shard_iterator(bv.data() + bv.size());
+        auto begin = shard_iterator(_value.data());
+        auto end = shard_iterator(_value.data() + _value.size());
        return boost::make_iterator_range(begin, end);
    }

    size_t shard_count() const {
-        return _cell.value().size() / counter_shard_view::size;
+        return _cell.value().size_bytes() / counter_shard_view::size;
    }
-public:
+protected:
    // ac must be a live counter cell
-    explicit basic_counter_cell_view(atomic_cell_base<View> ac) noexcept : _cell(ac) {
+    explicit basic_counter_cell_view(basic_atomic_cell_view<is_mutable> ac, linearized_value_view vv) noexcept
+        : _cell(ac), _value(vv)
+    {
        assert(_cell.is_live());
        assert(!_cell.is_counter_update());
    }
-
+public:
    api::timestamp_type timestamp() const { return _cell.timestamp(); }

    static data_type total_value_type() { return long_type; }
@@ -381,18 +405,22 @@ public:
    }
 };

-struct counter_cell_view : basic_counter_cell_view<bytes_view> {
+struct counter_cell_view : basic_counter_cell_view<mutable_view::no> {
    using basic_counter_cell_view::basic_counter_cell_view;

+    template<typename Function>
+    static decltype(auto) with_linearized(basic_atomic_cell_view<mutable_view::no> ac, Function&& fn) {
+        return ac.value().with_linearized([&] (bytes_view value_view) {
+            counter_cell_view ccv(ac, value_view);
+            return fn(ccv);
+        });
+    }
+
    // Returns counter shards in an order that is compatible with Scylla 1.7.4.
    std::vector<counter_shard> shards_compatible_with_1_7_4() const;

    // Reversibly applies two counter cells, at least one of them must be live.
-    // Returns true iff dst was modified.
-    static bool apply_reversibly(atomic_cell_or_collection& dst, atomic_cell_or_collection& src);
-
-    // Reverts apply performed by apply_reversible().
-    static void revert_apply(atomic_cell_or_collection& dst, atomic_cell_or_collection& src);
+    static void apply(const column_definition& cdef, atomic_cell_or_collection& dst, atomic_cell_or_collection& src);

    // Computes a counter cell containing minimal amount of data which, when
    // applied to 'b' returns the same cell as 'a' and 'b' applied together.
@@ -401,9 +429,15 @@ struct counter_cell_view : basic_counter_cell_view<bytes_view> {
    friend std::ostream& operator<<(std::ostream& os, counter_cell_view ccv);
 };

-struct counter_cell_mutable_view : basic_counter_cell_view<bytes_mutable_view> {
+struct counter_cell_mutable_view : basic_counter_cell_view<mutable_view::yes> {
    using basic_counter_cell_view::basic_counter_cell_view;

+    explicit counter_cell_mutable_view(atomic_cell_mutable_view ac) noexcept
+        : basic_counter_cell_view<mutable_view::yes>(ac, ac.value().first_fragment())
+    {
+        assert(!ac.value().is_fragmented());
+    }
+
    void set_timestamp(api::timestamp_type ts) { _cell.set_timestamp(ts); }
 };

--- a/cql3/Cql.g
+++ b/cql3/Cql.g
@@ -373,7 +373,7 @@ useStatement returns [::shared_ptr<raw::use_statement> stmt]
    ;

 /**
- * SELECT <expression>
+ * SELECT [JSON] <expression>
 * FROM <CF>
 * WHERE KEY = "key1" AND COL > 1 AND COL < 100
 * LIMIT <NUMBER>;
@@ -384,9 +384,12 @@ selectStatement returns [shared_ptr<raw::select_statement> expr]
        ::shared_ptr<cql3::term::raw> limit;
        raw::select_statement::parameters::orderings_type orderings;
        bool allow_filtering = false;
+        bool is_json = false;
    }
-    : K_SELECT ( ( K_DISTINCT { is_distinct = true; } )?
-                 sclause=selectClause
+    : K_SELECT (
+                ( K_JSON { is_json = true; } )?
+                ( K_DISTINCT { is_distinct = true; } )?
+                sclause=selectClause
               )
      K_FROM cf=columnFamilyName
      ( K_WHERE wclause=whereClause )?
@@ -394,7 +397,7 @@ selectStatement returns [shared_ptr<raw::select_statement> expr]
      ( K_LIMIT rows=intValue { limit = rows; } )?
      ( K_ALLOW K_FILTERING  { allow_filtering = true; } )?
      {
-          auto params = ::make_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering);
+          auto params = ::make_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, is_json);
          $expr = ::make_shared<raw::select_statement>(std::move(cf), std::move(params),
            std::move(sclause), std::move(wclause), std::move(limit));
      }
@@ -448,33 +451,54 @@ orderByClause[raw::select_statement::parameters::orderings_type& orderings]
    : c=cident (K_ASC | K_DESC { reversed = true; })? { orderings.emplace_back(c, reversed); }
    ;

+jsonValue returns [::shared_ptr<cql3::term::raw> value]
+    :
+    | s=STRING_LITERAL { $value = cql3::constants::literal::string(sstring{$s.text}); }
+    | ':' id=ident     { $value = new_bind_variables(id); }
+    | QMARK            { $value = new_bind_variables(shared_ptr<cql3::column_identifier>{}); }
+    ;
+
 /**
 * INSERT INTO <CF> (<column>, <column>, <column>, ...)
 * VALUES (<value>, <value>, <value>, ...)
 * USING TIMESTAMP <long>;
 *
 */
-insertStatement returns [::shared_ptr<raw::insert_statement> expr]
+insertStatement returns [::shared_ptr<raw::modification_statement> expr]
    @init {
        auto attrs = ::make_shared<cql3::attributes::raw>();
        std::vector<::shared_ptr<cql3::column_identifier::raw>> column_names;
        std::vector<::shared_ptr<cql3::term::raw>> values;
        bool if_not_exists = false;
+        bool default_unset = false;
+        ::shared_ptr<cql3::term::raw> json_value;
    }
    : K_INSERT K_INTO cf=columnFamilyName
-          '(' c1=cident { column_names.push_back(c1); }  ( ',' cn=cident { column_names.push_back(cn); } )* ')'
-        K_VALUES
-          '(' v1=term { values.push_back(v1); } ( ',' vn=term { values.push_back(vn); } )* ')'
-
-        ( K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
-        ( usingClause[attrs] )?
-      {
-          $expr = ::make_shared<raw::insert_statement>(std::move(cf),
-                                                   std::move(attrs),
-                                                   std::move(column_names),
-                                                   std::move(values),
-                                                   if_not_exists);
-      }
+        ('(' c1=cident { column_names.push_back(c1); }  ( ',' cn=cident { column_names.push_back(cn); } )* ')'
+            K_VALUES
+            '(' v1=term { values.push_back(v1); } ( ',' vn=term { values.push_back(vn); } )* ')'
+            ( K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
+            ( usingClause[attrs] )?
+              {
+              $expr = ::make_shared<raw::insert_statement>(std::move(cf),
+                                                       std::move(attrs),
+                                                       std::move(column_names),
+                                                       std::move(values),
+                                                       if_not_exists);
+              }
+        | K_JSON
+          json_token=jsonValue { json_value = $json_token.value; }
+            ( K_DEFAULT K_UNSET { default_unset = true; } | K_DEFAULT K_NULL )?
+            ( K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
+            ( usingClause[attrs] )?
+              {
+              $expr = ::make_shared<raw::insert_json_statement>(std::move(cf),
+                                                       std::move(attrs),
+                                                       std::move(json_value),
+                                                       if_not_exists,
+                                                       default_unset);
+              }
+        )
    ;

 usingClause[::shared_ptr<cql3::attributes::raw> attrs]
@@ -1510,12 +1534,22 @@ inMarkerForTuple returns [shared_ptr<cql3::tuples::in_raw> marker]
    | ':' name=ident { $marker = new_tuple_in_bind_variables(name); }
    ;

-comparatorType returns [shared_ptr<cql3_type::raw> t]
-    : n=native_type     { $t = cql3_type::raw::from(n); }
-    | c=collection_type { $t = c; }
-    | tt=tuple_type     { $t = tt; }
+// The comparator_type rule is used for users' queries (internal=false)
+// and for internal calls from db::cql_type_parser::parse() (internal=true).
+// The latter is used for reading schemas stored in the system tables, and
+// may support additional column types that cannot be created through CQL,
+// but only internally through code. Today the only such type is "empty":
+// Scylla code internally creates columns with type "empty" or collections
+// of "empty" to represent unselected columns in materialized views.
+// If a user (internal=false) tries to use "empty" as a type, it is treated,
+// like all unknown types, as an attempt to use a user-defined type, and
+// we report that this name is reserved (as for _reserved_type_names()).
+comparator_type [bool internal] returns [shared_ptr<cql3_type::raw> t]
+    : n=native_or_internal_type[internal]     { $t = cql3_type::raw::from(n); }
+    | c=collection_type[internal]   { $t = c; }
+    | tt=tuple_type[internal]       { $t = tt; }
    | id=userTypeName   { $t = cql3::cql3_type::raw::user_type(id); }
-    | K_FROZEN '<' f=comparatorType '>'
+    | K_FROZEN '<' f=comparator_type[internal] '>'
      {
        try {
            $t = cql3::cql3_type::raw::frozen(f);
@@ -1537,6 +1571,22 @@ comparatorType returns [shared_ptr<cql3_type::raw> t]
 #endif
    ;

+native_or_internal_type [bool internal] returns [shared_ptr<cql3_type> t]
+    : n=native_type     { $t = n; }
+    // The "internal" types, only supported when internal==true:
+    | K_EMPTY   {
+        if (internal) {
+            $t = cql3_type::empty;
+        } else {
+            add_recognition_error("Invalid (reserved) user type name empty");
+        }
+      }
+    ;
+
+comparatorType returns [shared_ptr<cql3_type::raw> t]
+    : tt=comparator_type[false]    { $t = tt; }
+    ;
+
 native_type returns [shared_ptr<cql3_type> t]
    : K_ASCII     { $t = cql3_type::ascii; }
    | K_BIGINT    { $t = cql3_type::bigint; }
@@ -1561,24 +1611,24 @@ native_type returns [shared_ptr<cql3_type> t]
    | K_TIME      { $t = cql3_type::time; }
    ;

-collection_type returns [shared_ptr<cql3::cql3_type::raw> pt]
-    : K_MAP  '<' t1=comparatorType ',' t2=comparatorType '>'
+collection_type [bool internal] returns [shared_ptr<cql3::cql3_type::raw> pt]
+    : K_MAP  '<' t1=comparator_type[internal] ',' t2=comparator_type[internal] '>'
        {
            // if we can't parse either t1 or t2, antlr will "recover" and we may have t1 or t2 null.
            if (t1 && t2) {
                $pt = cql3::cql3_type::raw::map(t1, t2);
            }
        }
-    | K_LIST '<' t=comparatorType '>'
+    | K_LIST '<' t=comparator_type[internal] '>'
        { if (t) { $pt = cql3::cql3_type::raw::list(t); } }
-    | K_SET  '<' t=comparatorType '>'
+    | K_SET  '<' t=comparator_type[internal] '>'
        { if (t) { $pt = cql3::cql3_type::raw::set(t); } }
    ;

-tuple_type returns [shared_ptr<cql3::cql3_type::raw> t]
+tuple_type [bool internal] returns [shared_ptr<cql3::cql3_type::raw> t]
        @init{ std::vector<shared_ptr<cql3::cql3_type::raw>> types; }
    : K_TUPLE '<'
-         t1=comparatorType { types.push_back(t1); } (',' tn=comparatorType { types.push_back(tn); })*
+         t1=comparator_type[internal] { types.push_back(t1); } (',' tn=comparator_type[internal] { types.push_back(tn); })*
      '>' { $t = cql3::cql3_type::raw::tuple(std::move(types)); }
    ;

@@ -1604,7 +1654,7 @@ unreserved_keyword returns [sstring str]

 unreserved_function_keyword returns [sstring str]
    : u=basic_unreserved_keyword { $str = u; }
-    | t=native_type              { $str = t->to_string(); }
+    | t=native_or_internal_type[true]   { $str = t->to_string(); }
    ;

 basic_unreserved_keyword returns [sstring str]
@@ -1650,6 +1700,7 @@ basic_unreserved_keyword returns [sstring str]
        | K_LANGUAGE
        | K_NON
        | K_DETERMINISTIC
+        | K_JSON
        ) { $str = $k.text; }
    ;

@@ -1786,6 +1837,11 @@ K_NON:         N O N;
 K_OR:          O R;
 K_REPLACE:     R E P L A C E;
 K_DETERMINISTIC: D E T E R M I N I S T I C;
+K_JSON:        J S O N;
+K_DEFAULT:     D E F A U L T;
+K_UNSET:       U N S E T;
+
+K_EMPTY:       E M P T Y;

 K_SCYLLA_TIMEUUID_LIST_INDEX: S C Y L L A '_' T I M E U U I D '_' L I S T '_' I N D E X;
 K_SCYLLA_COUNTER_SHARD_LIST: S C Y L L A '_' C O U N T E R '_' S H A R D '_' L I S T; 
--- a/cql3/attributes.cc
+++ b/cql3/attributes.cc
@@ -77,12 +77,14 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
    if (tval.is_unset_value()) {
        return now;
    }
+  return with_linearized(*tval, [] (bytes_view val) {
    try {
-        data_type_for<int64_t>()->validate(*tval);
+        data_type_for<int64_t>()->validate(val);
    } catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception("Invalid timestamp value");
    }
-    return value_cast<int64_t>(data_type_for<int64_t>()->deserialize(*tval));
+    return value_cast<int64_t>(data_type_for<int64_t>()->deserialize(val));
+  });
 }

 int32_t attributes::get_time_to_live(const query_options& options) {
@@ -96,14 +98,16 @@ int32_t attributes::get_time_to_live(const query_options& options) {
    if (tval.is_unset_value()) {
        return 0;
    }
+  auto ttl = with_linearized(*tval, [] (bytes_view val) {
    try {
-        data_type_for<int32_t>()->validate(*tval);
+        data_type_for<int32_t>()->validate(val);
    }
    catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception("Invalid TTL value");
    }

-    auto ttl = value_cast<int32_t>(data_type_for<int32_t>()->deserialize(*tval));
+    return value_cast<int32_t>(data_type_for<int32_t>()->deserialize(val));
+  });
    if (ttl < 0) {
        throw exceptions::invalid_request_exception("A TTL must be greater or equal to 0");
    }
--- a/cql3/authorized_prepared_statements_cache.hh
+++ b/cql3/authorized_prepared_statements_cache.hh
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "cql3/prepared_statements_cache.hh"
+
+namespace cql3 {
+
+struct authorized_prepared_statements_cache_size {
+    size_t operator()(const statements::prepared_statement::checked_weak_ptr& val) {
+        // TODO: improve the size approximation - most of the entry is occupied by the key here.
+        return 100;
+    }
+};
+
+class authorized_prepared_statements_cache_key {
+public:
+    using cache_key_type = std::pair<auth::authenticated_user, typename cql3::prepared_cache_key_type::cache_key_type>;
+private:
+    cache_key_type _key;
+
+public:
+    authorized_prepared_statements_cache_key(auth::authenticated_user user, cql3::prepared_cache_key_type prepared_cache_key)
+        : _key(std::move(user), std::move(prepared_cache_key.key())) {}
+
+    cache_key_type& key() { return _key; }
+
+    const cache_key_type& key() const { return _key; }
+
+    bool operator==(const authorized_prepared_statements_cache_key& other) const {
+        return _key == other._key;
+    }
+
+    bool operator!=(const authorized_prepared_statements_cache_key& other) const {
+        return !(*this == other);
+    }
+
+    static size_t hash(const auth::authenticated_user& user, const cql3::prepared_cache_key_type::cache_key_type& prep_cache_key) {
+        return utils::hash_combine(std::hash<auth::authenticated_user>()(user), utils::tuple_hash()(prep_cache_key));
+    }
+};
+
+/// \class authorized_prepared_statements_cache
+/// \brief A cache of previously authorized statements.
+///
+/// Entries are inserted every time a new statement is authorized.
+/// Entries are evicted in any of the following cases:
+///    - When the corresponding prepared statement is not valid anymore.
+///    - Periodically, with the same period as the permission cache is refreshed.
+///    - If the corresponding entry hasn't been used for \ref entry_expiry.
+class authorized_prepared_statements_cache {
+public:
+    struct stats {
+        uint64_t authorized_prepared_statements_cache_evictions = 0;
+    };
+
+    static stats& shard_stats() {
+        static thread_local stats _stats;
+        return _stats;
+    }
+
+    struct authorized_prepared_statements_cache_stats_updater {
+        static void inc_hits() noexcept {}
+        static void inc_misses() noexcept {}
+        static void inc_blocks() noexcept {}
+        static void inc_evictions() noexcept {
+            ++shard_stats().authorized_prepared_statements_cache_evictions;
+        }
+    };
+
+private:
+    using cache_key_type = authorized_prepared_statements_cache_key;
+    using checked_weak_ptr = typename statements::prepared_statement::checked_weak_ptr;
+    using cache_type = utils::loading_cache<cache_key_type,
+                                            checked_weak_ptr,
+                                            utils::loading_cache_reload_enabled::yes,
+                                            authorized_prepared_statements_cache_size,
+                                            std::hash<cache_key_type>,
+                                            std::equal_to<cache_key_type>,
+                                            authorized_prepared_statements_cache_stats_updater>;
+
+public:
+    using key_type = cache_key_type;
+    using value_type = checked_weak_ptr;
+    using entry_is_too_big = typename cache_type::entry_is_too_big;
+    using iterator = typename cache_type::iterator;
+
+private:
+    cache_type _cache;
+    logging::logger& _logger;
+
+public:
+    // Choose the memory budget so that it allows ~4K entries when a shard gets 1GB of RAM
+    authorized_prepared_statements_cache(std::chrono::milliseconds entry_expiration, std::chrono::milliseconds entry_refresh, size_t cache_size, logging::logger& logger)
+        : _cache(cache_size, entry_expiration, entry_refresh, logger, [this] (const key_type& k) {
+            _cache.remove(k);
+            return make_ready_future<value_type>();
+        })
+        , _logger(logger)
+    {}
+
+    future<> insert(auth::authenticated_user user, cql3::prepared_cache_key_type prep_cache_key, value_type v) noexcept {
+        return _cache.get_ptr(key_type(std::move(user), std::move(prep_cache_key)), [v = std::move(v)] (const cache_key_type&) mutable {
+            return make_ready_future<value_type>(std::move(v));
+        }).discard_result();
+    }
+
+    iterator find(const auth::authenticated_user& user, const cql3::prepared_cache_key_type& prep_cache_key) {
+        struct key_view {
+            const auth::authenticated_user& user_ref;
+            const cql3::prepared_cache_key_type& prep_cache_key_ref;
+        };
+
+        struct hasher {
+            size_t operator()(const key_view& kv) {
+                return cql3::authorized_prepared_statements_cache_key::hash(kv.user_ref, kv.prep_cache_key_ref.key());
+            }
+        };
+
+        struct equal {
+            bool operator()(const key_type& k1, const key_view& k2) {
+                return k1.key().first == k2.user_ref && k1.key().second == k2.prep_cache_key_ref.key();
+            }
+
+            bool operator()(const key_view& k2, const key_type& k1) {
+                return operator()(k1, k2);
+            }
+        };
+
+        return _cache.find(key_view{user, prep_cache_key}, hasher(), equal());
+    }
+
+    iterator end() {
+        return _cache.end();
+    }
+
+    void remove(const auth::authenticated_user& user, const cql3::prepared_cache_key_type& prep_cache_key) {
+        iterator it = find(user, prep_cache_key);
+        _cache.remove(it);
+    }
+
+    size_t size() const {
+        return _cache.size();
+    }
+
+    size_t memory_footprint() const {
+        return _cache.memory_footprint();
+    }
+
+    future<> stop() {
+        return _cache.stop();
+    }
+};
+
+}
+
+namespace std {
+template <>
+struct hash<cql3::authorized_prepared_statements_cache_key> final {
+    size_t operator()(const cql3::authorized_prepared_statements_cache_key& k) const {
+        return cql3::authorized_prepared_statements_cache_key::hash(k.key().first, k.key().second);
+    }
+};
+
+inline std::ostream& operator<<(std::ostream& out, const cql3::authorized_prepared_statements_cache_key& k) {
+    return out << "{ " << k.key().first << ", " << k.key().second << " }";
+}
+}
--- a/cql3/column_identifier.cc
+++ b/cql3/column_identifier.cc
@@ -22,6 +22,7 @@
 #include "cql3/column_identifier.hh"
 #include "exceptions/exceptions.hh"
 #include "cql3/selection/simple_selector.hh"
+#include "cql3/util.hh"

 #include <regex>

@@ -62,14 +63,11 @@ sstring column_identifier::to_string() const {
 }

 sstring column_identifier::to_cql_string() const {
-    static const std::regex unquoted_identifier_re("[a-z][a-z0-9_]*");
-    if (std::regex_match(_text.begin(), _text.end(), unquoted_identifier_re)) {
-        return _text;
-    }
-    static const std::regex double_quote_re("\"");
-    std::string result = _text;
-    std::regex_replace(result, double_quote_re, "\"\"");
-    return '"' + result + '"';
+    return util::maybe_quote(_text);
+}
+
+sstring column_identifier::raw::to_cql_string() const {
+    return util::maybe_quote(_text);
 }

 column_identifier::raw::raw(sstring raw_text, bool keep_case)
@@ -129,7 +127,11 @@ column_identifier::new_selector_factory(database& db, schema_ptr schema, std::ve
    if (!def) {
        throw exceptions::invalid_request_exception(sprint("Undefined name %s in selection clause", _text));
    }
-
+    // Do not allow explicitly selecting hidden columns. We also skip them on
+    // "SELECT *" (see selection::wildcard()).
+    if (def->is_view_virtual()) {
+        throw exceptions::invalid_request_exception(sprint("Undefined name %s in selection clause", _text));
+    }
    return selection::simple_selector::new_factory(def->name_as_text(), add_and_get_index(*def, defs), def->type);
 }

--- a/cql3/column_identifier.hh
+++ b/cql3/column_identifier.hh
@@ -123,6 +123,7 @@ public:
    bool operator!=(const raw& other) const;

    virtual sstring to_string() const;
+    sstring to_cql_string() const;

    friend std::hash<column_identifier::raw>;
    friend std::ostream& operator<<(std::ostream& out, const column_identifier::raw& id);
--- a/cql3/constants.hh
+++ b/cql3/constants.hh
@@ -85,8 +85,8 @@ public:
            virtual ::shared_ptr<terminal> bind(const query_options& options) override { return {}; }
            virtual sstring to_string() const override { return "null"; }
        };
-        static thread_local const ::shared_ptr<terminal> NULL_VALUE;
    public:
+        static thread_local const ::shared_ptr<terminal> NULL_VALUE;
        virtual ::shared_ptr<term> prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) override {
            if (!is_assignable(test_assignment(db, keyspace, receiver))) {
                throw exceptions::invalid_request_exception("Invalid null value for counter increment/decrement");
@@ -203,10 +203,14 @@ public:

        virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override {
            auto value = _t->bind_and_get(params._options);
+            execute(m, prefix, params, column, std::move(value));
+        }
+
+        static void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params, const column_definition& column, cql3::raw_value_view value) {
            if (value.is_null()) {
                m.set_cell(prefix, column, std::move(make_dead_cell(params)));
            } else if (value.is_value()) {
-                m.set_cell(prefix, column, std::move(make_cell(*value, params)));
+                m.set_cell(prefix, column, std::move(make_cell(*column.type, *value, params)));
            }
        }
    };
@@ -221,7 +225,9 @@ public:
            } else if (value.is_unset_value()) {
                return;
            }
-            auto increment = value_cast<int64_t>(long_type->deserialize_value(*value));
+            auto increment = with_linearized(*value, [] (bytes_view value_view) {
+                return value_cast<int64_t>(long_type->deserialize_value(value_view));
+            });
            m.set_cell(prefix, column, make_counter_update_cell(increment, params));
        }
    };
@@ -236,7 +242,9 @@ public:
            } else if (value.is_unset_value()) {
                return;
            }
-            auto increment = value_cast<int64_t>(long_type->deserialize_value(*value));
+            auto increment = with_linearized(*value, [] (bytes_view value_view) {
+                return value_cast<int64_t>(long_type->deserialize_value(value_view));
+            });
            if (increment == std::numeric_limits<int64_t>::min()) {
                throw exceptions::invalid_request_exception(sprint("The negation of %d overflows supported counter precision (signed 8 bytes integer)", increment));
            }
--- a/cql3/cql3_type.cc
+++ b/cql3/cql3_type.cc
@@ -395,18 +395,15 @@ operator<<(std::ostream& os, const cql3_type::raw& r) {

 namespace util {

-sstring maybe_quote(const sstring& s) {
-    static const std::regex unquoted("\\w*");
-    static const std::regex double_quote("\"");
-
-    if (std::regex_match(s.begin(), s.end(), unquoted)) {
-        return s;
+sstring maybe_quote(const sstring& identifier) {
+    static const std::regex unquoted_identifier_re("[a-z][a-z0-9_]*");
+    if (std::regex_match(identifier.begin(), identifier.end(), unquoted_identifier_re)) {
+        return identifier;
    }
-    std::ostringstream ss;
-    ss << "\"";
-    std::regex_replace(std::ostreambuf_iterator<char>(ss), s.begin(), s.end(), double_quote, "\"\"");
-    ss << "\"";
-    return ss.str();
+    static const std::regex double_quote_re("\"");
+    std::string result = identifier;
+    result = std::regex_replace(result, double_quote_re, "\"\""); // regex_replace returns the new string; keep it
+    return '"' + result + '"';
 }

 }
--- a/cql3/cql_statement.hh
+++ b/cql3/cql_statement.hh
@@ -45,6 +45,7 @@
 #include "service/query_state.hh"
 #include "service/storage_proxy.hh"
 #include "cql3/query_options.hh"
+#include "timeout_config.hh"

 namespace cql_transport {

@@ -62,10 +63,15 @@ class metadata;
 shared_ptr<const metadata> make_empty_metadata();

 class cql_statement {
+    timeout_config_selector _timeout_config_selector;
 public:
+    explicit cql_statement(timeout_config_selector timeout_selector) : _timeout_config_selector(timeout_selector) {}
+
    virtual ~cql_statement()
    { }

+    timeout_config_selector get_timeout_config_selector() const { return _timeout_config_selector; }
+
    virtual uint32_t get_bound_terms() = 0;

    /**
@@ -81,7 +87,7 @@ public:
     *
     * @param state the current client state
     */
-    virtual void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) = 0;
+    virtual void validate(service::storage_proxy& proxy, const service::client_state& state) = 0;

    /**
     * Execute the statement and return the resulting result or null if there is no result.
@@ -90,15 +96,7 @@ public:
     * @param options options for this query (consistency, variables, pageSize, ...)
     */
    virtual future<::shared_ptr<cql_transport::messages::result_message>>
-        execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) = 0;
-
-    /**
-     * Variant of execute used for internal query against the system tables, and thus only query the local node = 0.
-     *
-     * @param state the current query state
-     */
-    virtual future<::shared_ptr<cql_transport::messages::result_message>>
-        execute_internal(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) = 0;
+        execute(service::storage_proxy& proxy, service::query_state& state, const query_options& options) = 0;

    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;

@@ -111,6 +109,7 @@ public:

 class cql_statement_no_metadata : public cql_statement {
 public:
+    using cql_statement::cql_statement;
    virtual shared_ptr<const metadata> get_result_metadata() const override {
        return make_empty_metadata();
    }
--- a/cql3/functions/abstract_function.hh
+++ b/cql3/functions/abstract_function.hh
@@ -42,6 +42,7 @@
 #pragma once

 #include "types.hh"
+#include "cql3/cql3_type.hh"
 #include <vector>
 #include <iosfwd>
 #include <boost/functional/hash.hpp>
@@ -105,9 +106,9 @@ abstract_function::print(std::ostream& os) const {
        if (i > 0) {
            os << ", ";
        }
-        os << _arg_types[i]->name(); // FIXME: asCQL3Type()
+        os << _arg_types[i]->as_cql3_type()->to_string();
    }
-    os << ") -> " << _return_type->name(); // FIXME: asCQL3Type()
+    os << ") -> " << _return_type->as_cql3_type()->to_string();
 }

 }
--- a/cql3/functions/functions.cc
+++ b/cql3/functions/functions.cc
@@ -20,6 +20,7 @@
 */

 #include "functions.hh"
+
 #include "function_call.hh"
 #include "token_fct.hh"
 #include "cql3/maps.hh"
@@ -41,11 +42,22 @@ functions::init() {
    declare(time_uuid_fcts::make_min_timeuuid_fct());
    declare(time_uuid_fcts::make_max_timeuuid_fct());
    declare(time_uuid_fcts::make_date_of_fct());
-    declare(time_uuid_fcts::make_unix_timestamp_of_fcf());
+    declare(time_uuid_fcts::make_unix_timestamp_of_fct());
+    declare(time_uuid_fcts::make_currenttimestamp_fct());
+    declare(time_uuid_fcts::make_currentdate_fct());
+    declare(time_uuid_fcts::make_currenttime_fct());
+    declare(time_uuid_fcts::make_currenttimeuuid_fct());
+    declare(time_uuid_fcts::make_timeuuidtodate_fct());
+    declare(time_uuid_fcts::make_timestamptodate_fct());
+    declare(time_uuid_fcts::make_timeuuidtotimestamp_fct());
+    declare(time_uuid_fcts::make_datetotimestamp_fct());
+    declare(time_uuid_fcts::make_timeuuidtounixtimestamp_fct());
+    declare(time_uuid_fcts::make_timestamptounixtimestamp_fct());
+    declare(time_uuid_fcts::make_datetounixtimestamp_fct());
    declare(make_uuid_fct());

    for (auto&& type : cql3_type::values()) {
-        // Note: because text and varchar ends up being synonimous, our automatic makeToBlobFunction doesn't work
+        // Note: because text and varchar ends up being synonymous, our automatic makeToBlobFunction doesn't work
        // for varchar, so we special case it below. We also skip blob for obvious reasons.
        if (type == cql3_type::varchar || type == cql3_type::blob) {
            continue;
@@ -95,15 +107,22 @@ functions::init() {
    declare(aggregate_fcts::make_max_function<sstring>());
    declare(aggregate_fcts::make_min_function<sstring>());

+    declare(aggregate_fcts::make_count_function<simple_date_native_type>());
    declare(aggregate_fcts::make_max_function<simple_date_native_type>());
    declare(aggregate_fcts::make_min_function<simple_date_native_type>());

+    declare(aggregate_fcts::make_count_function<timestamp_native_type>());
    declare(aggregate_fcts::make_max_function<timestamp_native_type>());
    declare(aggregate_fcts::make_min_function<timestamp_native_type>());

+    declare(aggregate_fcts::make_count_function<timeuuid_native_type>());
    declare(aggregate_fcts::make_max_function<timeuuid_native_type>());
    declare(aggregate_fcts::make_min_function<timeuuid_native_type>());

+    declare(aggregate_fcts::make_count_function<utils::UUID>());
+    declare(aggregate_fcts::make_max_function<utils::UUID>());
+    declare(aggregate_fcts::make_min_function<utils::UUID>());
+
    //FIXME:
    //declare(aggregate_fcts::make_count_function<bytes>());
    //declare(aggregate_fcts::make_max_function<bytes>());
@@ -153,23 +172,73 @@ functions::get_overload_count(const function_name& name) {
    return _declared.count(name);
 }

+inline
+shared_ptr<function>
+make_to_json_function(data_type t) {
+    return make_native_scalar_function<true>("tojson", utf8_type, {t},
+            [t](cql_serialization_format sf, const std::vector<bytes_opt>& parameters) -> bytes_opt {
+        return utf8_type->decompose(t->to_json_string(parameters[0]));
+    });
+}
+
+inline
+shared_ptr<function>
+make_from_json_function(database& db, const sstring& keyspace, data_type t) {
+    return make_native_scalar_function<true>("fromjson", t, {utf8_type},
+            [&db, &keyspace, t](cql_serialization_format sf, const std::vector<bytes_opt>& parameters) -> bytes_opt {
+        Json::Value json_value = json::to_json_value(utf8_type->to_string(parameters[0].value()));
+        bytes_opt parsed_json_value;
+        if (!json_value.isNull()) {
+            parsed_json_value.emplace(t->from_json_object(json_value, sf));
+        }
+        return std::move(parsed_json_value);
+    });
+}
+
 shared_ptr<function>
 functions::get(database& db,
        const sstring& keyspace,
        const function_name& name,
        const std::vector<shared_ptr<assignment_testable>>& provided_args,
        const sstring& receiver_ks,
-        const sstring& receiver_cf) {
+        const sstring& receiver_cf,
+        shared_ptr<column_specification> receiver) {

    static const function_name TOKEN_FUNCTION_NAME = function_name::native_function("token");
+    static const function_name TO_JSON_FUNCTION_NAME = function_name::native_function("tojson");
+    static const function_name FROM_JSON_FUNCTION_NAME = function_name::native_function("fromjson");

    if (name.has_keyspace()
-        ? name == TOKEN_FUNCTION_NAME
-        : name.name == TOKEN_FUNCTION_NAME.name)
-    {
+                ? name == TOKEN_FUNCTION_NAME
+                : name.name == TOKEN_FUNCTION_NAME.name) {
        return ::make_shared<token_fct>(db.find_schema(receiver_ks, receiver_cf));
    }

+    if (name.has_keyspace()
+                ? name == TO_JSON_FUNCTION_NAME
+                : name.name == TO_JSON_FUNCTION_NAME.name) {
+        if (provided_args.size() != 1) {
+            throw exceptions::invalid_request_exception("toJson() accepts 1 argument only");
+        }
+        selection::selector *sp = dynamic_cast<selection::selector *>(provided_args[0].get());
+        if (!sp) {
+            throw exceptions::invalid_request_exception("toJson() is only valid in SELECT clause");
+        }
+        return make_to_json_function(sp->get_type());
+    }
+
+    if (name.has_keyspace()
+                ? name == FROM_JSON_FUNCTION_NAME
+                : name.name == FROM_JSON_FUNCTION_NAME.name) {
+        if (provided_args.size() != 1) {
+            throw exceptions::invalid_request_exception("fromJson() accepts 1 argument only");
+        }
+        if (!receiver) {
+            throw exceptions::invalid_request_exception("fromJson() can only be called if receiver type is known");
+        }
+        return make_from_json_function(db, keyspace, receiver->type);
+    }
+
    std::vector<shared_ptr<function>> candidates;
    auto&& add_declared = [&] (function_name fn) {
        auto&& fns = _declared.equal_range(fn);
@@ -392,9 +461,9 @@ function_call::make_terminal(shared_ptr<function> fun, cql3::raw_value result, c
    }

    auto ctype = static_pointer_cast<const collection_type_impl>(fun->return_type());
-    bytes_view res;
+    fragmented_temporary_buffer::view res;
    if (result) {
-        res = *result;
+        res = fragmented_temporary_buffer::view(bytes_view(*result));
    }
    if (&ctype->_kind == &collection_type_impl::kind::list) {
        return make_shared(lists::value::from_serialized(std::move(res), static_pointer_cast<const list_type_impl>(ctype), sf));
@@ -414,7 +483,7 @@ function_call::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<
            [] (auto&& x) -> shared_ptr<assignment_testable> {
        return x;
    });
-    auto&& fun = functions::functions::get(db, keyspace, _name, args, receiver->ks_name, receiver->cf_name);
+    auto&& fun = functions::functions::get(db, keyspace, _name, args, receiver->ks_name, receiver->cf_name, receiver);
    if (!fun) {
        throw exceptions::invalid_request_exception(sprint("Unknown function %s called", _name));
    }
@@ -478,7 +547,7 @@ function_call::raw::test_assignment(database& db, const sstring& keyspace, share
    // of another, existing, function. In that case, we return true here because we'll throw a proper exception
    // later with a more helpful error message that if we were to return false here.
    try {
-        auto&& fun = functions::get(db, keyspace, _name, _terms, receiver->ks_name, receiver->cf_name);
+        auto&& fun = functions::get(db, keyspace, _name, _terms, receiver->ks_name, receiver->cf_name, receiver);
        if (fun && receiver->type->equals(fun->return_type())) {
            return assignment_testable::test_result::EXACT_MATCH;
        } else if (!fun || receiver->type->is_value_compatible_with(*fun->return_type())) {
--- a/cql3/functions/functions.hh
+++ b/cql3/functions/functions.hh
@@ -80,16 +80,18 @@ public:
                                    const function_name& name,
                                    const std::vector<shared_ptr<assignment_testable>>& provided_args,
                                    const sstring& receiver_ks,
-                                    const sstring& receiver_cf);
+                                    const sstring& receiver_cf,
+                                    ::shared_ptr<column_specification> receiver = nullptr);
    template <typename AssignmentTestablePtrRange>
    static shared_ptr<function> get(database& db,
                                    const sstring& keyspace,
                                    const function_name& name,
                                    AssignmentTestablePtrRange&& provided_args,
                                    const sstring& receiver_ks,
-                                    const sstring& receiver_cf) {
+                                    const sstring& receiver_cf,
+                                    ::shared_ptr<column_specification> receiver = nullptr) {
        const std::vector<shared_ptr<assignment_testable>> args(std::begin(provided_args), std::end(provided_args));
-        return get(db, keyspace, name, args, receiver_ks, receiver_cf);
+        return get(db, keyspace, name, args, receiver_ks, receiver_cf, receiver);
    }
    static std::vector<shared_ptr<function>> find(const function_name& name);
    static shared_ptr<function> find(const function_name& name, const std::vector<data_type>& arg_types);
--- a/cql3/functions/time_uuid_fcts.hh
+++ b/cql3/functions/time_uuid_fcts.hh
@@ -117,7 +117,7 @@ make_date_of_fct() {

 inline
 shared_ptr<function>
-make_unix_timestamp_of_fcf() {
+make_unix_timestamp_of_fct() {
    return make_native_scalar_function<true>("unixtimestampof", long_type, { timeuuid_type },
            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
        using namespace utils;
@@ -129,6 +129,163 @@ make_unix_timestamp_of_fcf() {
    });
 }

+inline shared_ptr<function>
+make_currenttimestamp_fct() {
+    return make_native_scalar_function<true>("currenttimestamp", timestamp_type, {},
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        return {timestamp_type->decompose(timestamp_native_type{db_clock::now()})};
+    });
+}
+
+inline shared_ptr<function>
+make_currenttime_fct() {
+    return make_native_scalar_function<true>("currenttime", time_type, {},
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        constexpr int64_t milliseconds_in_day = 3600 * 24 * 1000;
+        int64_t milliseconds_since_epoch = std::chrono::duration_cast<std::chrono::milliseconds>(db_clock::now().time_since_epoch()).count();
+        int64_t nanoseconds_today = (milliseconds_since_epoch % milliseconds_in_day) * 1000 * 1000;
+        return {time_type->decompose(time_native_type{nanoseconds_today})};
+    });
+}
+
+inline shared_ptr<function>
+make_currentdate_fct() {
+    return make_native_scalar_function<true>("currentdate", simple_date_type, {},
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        auto to_simple_date = get_castas_fctn(simple_date_type, timestamp_type);
+        return {simple_date_type->decompose(to_simple_date(timestamp_native_type{db_clock::now()}))};
+    });
+}
+
+inline
+shared_ptr<function>
+make_currenttimeuuid_fct() {
+    return make_native_scalar_function<true>("currenttimeuuid", timeuuid_type, {},
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        return {timeuuid_type->decompose(timeuuid_native_type{utils::UUID_gen::get_time_UUID()})};
+    });
+}
+
+inline
+shared_ptr<function>
+make_timeuuidtodate_fct() {
+    return make_native_scalar_function<true>("todate", simple_date_type, { timeuuid_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
+        auto to_simple_date = get_castas_fctn(simple_date_type, timestamp_type);
+        return {simple_date_type->decompose(to_simple_date(ts))};
+    });
+}
+
+inline
+shared_ptr<function>
+make_timestamptodate_fct() {
+    return make_native_scalar_function<true>("todate", simple_date_type, { timestamp_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        auto ts_obj = timestamp_type->deserialize(*bb);
+        if (ts_obj.is_null()) {
+            return {};
+        }
+        auto to_simple_date = get_castas_fctn(simple_date_type, timestamp_type);
+        return {simple_date_type->decompose(to_simple_date(ts_obj))};
+    });
+}
+
+inline
+shared_ptr<function>
+make_timeuuidtotimestamp_fct() {
+    return make_native_scalar_function<true>("totimestamp", timestamp_type, { timeuuid_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
+        return {timestamp_type->decompose(ts)};
+    });
+}
+
+inline
+shared_ptr<function>
+make_datetotimestamp_fct() {
+    return make_native_scalar_function<true>("totimestamp", timestamp_type, { simple_date_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        auto simple_date_obj = simple_date_type->deserialize(*bb);
+        if (simple_date_obj.is_null()) {
+            return {};
+        }
+        auto from_simple_date = get_castas_fctn(timestamp_type, simple_date_type);
+        return {timestamp_type->decompose(from_simple_date(simple_date_obj))};
+    });
+}
+
+inline
+shared_ptr<function>
+make_timeuuidtounixtimestamp_fct() {
+    return make_native_scalar_function<true>("tounixtimestamp", long_type, { timeuuid_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        return {long_type->decompose(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb)))};
+    });
+}
+
+inline
+shared_ptr<function>
+make_timestamptounixtimestamp_fct() {
+    return make_native_scalar_function<true>("tounixtimestamp", long_type, { timestamp_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        auto ts_obj = timestamp_type->deserialize(*bb);
+        if (ts_obj.is_null()) {
+            return {};
+        }
+        return {long_type->decompose(ts_obj)};
+    });
+}
+
+inline
+shared_ptr<function>
+make_datetounixtimestamp_fct() {
+    return make_native_scalar_function<true>("tounixtimestamp", long_type, { simple_date_type },
+            [] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
+        using namespace utils;
+        auto& bb = values[0];
+        if (!bb) {
+            return {};
+        }
+        auto simple_date_obj = simple_date_type->deserialize(*bb);
+        if (simple_date_obj.is_null()) {
+            return {};
+        }
+        auto from_simple_date = get_castas_fctn(timestamp_type, simple_date_type);
+        return {long_type->decompose(from_simple_date(simple_date_obj))};
+    });
+}
+
 }
 }
 }
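Note on the `currenttime` function added above: it derives the time of day by taking the milliseconds since the epoch modulo the milliseconds in a day, then scaling to nanoseconds. The standalone sketch below reproduces only that arithmetic. `nanoseconds_today` is a hypothetical helper name, not part of the patch, and any clock could supply the input in place of `db_clock`.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch): nanoseconds elapsed since
// midnight UTC for a timestamp given in milliseconds since the Unix epoch.
// Assumes a non-negative timestamp; with a negative input the % operator
// would yield a negative remainder.
constexpr int64_t nanoseconds_today(int64_t milliseconds_since_epoch) {
    constexpr int64_t milliseconds_in_day = int64_t(3600) * 24 * 1000;
    return (milliseconds_since_epoch % milliseconds_in_day) * 1000 * 1000;
}
```

The resolution of the result is still one millisecond, because the input is truncated to milliseconds before it is scaled.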
--- a/cql3/lists.cc
+++ b/cql3/lists.cc
@@ -115,11 +115,12 @@ lists::literal::to_string() const {
 }

 lists::value
-lists::value::from_serialized(bytes_view v, list_type type, cql_serialization_format sf) {
+lists::value::from_serialized(const fragmented_temporary_buffer::view& val, list_type type, cql_serialization_format sf) {
    try {
        // Collections have this small hack that validate cannot be called on a serialized object,
        // but compose does the validation (so we're fine).
        // FIXME: deserializeForNativeProtocol()?!
+      return with_linearized(val, [&] (bytes_view v) {
        auto l = value_cast<list_type_impl::native_type>(type->deserialize(v, sf));
        std::vector<bytes_opt> elements;
        elements.reserve(l.size());
@@ -128,6 +129,7 @@ lists::value::from_serialized(bytes_view v, list_type type, cql_serialization_fo
            elements.push_back(element.is_null() ? bytes_opt() : bytes_opt(type->get_elements_type()->decompose(element)));
        }
        return value(std::move(elements));
+      });
    } catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception(e.what());
    }
@@ -237,7 +239,12 @@ lists::precision_time::get_next(db_clock::time_point millis) {

 void
 lists::setter::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
-    const auto& value = _t->bind(params._options);
+    auto value = _t->bind(params._options);
+    execute(m, prefix, params, column, std::move(value));
+}
+
+void
+lists::setter::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params, const column_definition& column, ::shared_ptr<terminal> value) {
    if (value == constants::UNSET_VALUE) {
        return;
    }
@@ -280,7 +287,9 @@ lists::setter_by_index::execute(mutation& m, const clustering_key_prefix& prefix
        return;
    }

-    auto idx = net::ntoh(int32_t(*unaligned_cast<int32_t>(index->begin())));
+    auto idx = with_linearized(*index, [] (bytes_view v) {
+        return value_cast<int32_t>(data_type_for<int32_t>()->deserialize(v));
+    });
    auto&& existing_list_opt = params.get_prefetched_list(m.key().view(), prefix.view(), column);
    if (!existing_list_opt) {
        throw exceptions::invalid_request_exception("Attempted to set an element on a list which is null");
@@ -299,7 +308,7 @@ lists::setter_by_index::execute(mutation& m, const clustering_key_prefix& prefix
    if (!value) {
        mut.cells.emplace_back(eidx, params.make_dead_cell());
    } else {
-        mut.cells.emplace_back(eidx, params.make_cell(*value));
+        mut.cells.emplace_back(eidx, params.make_cell(*ltype->value_comparator(), *value, atomic_cell::collection_member::yes));
    }
    auto smut = ltype->serialize_mutation_form(mut);
    m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(std::move(smut)));
@@ -326,7 +335,7 @@ lists::setter_by_uuid::execute(mutation& m, const clustering_key_prefix& prefix,

    list_type_impl::mutation mut;
    mut.cells.reserve(1);
-    mut.cells.emplace_back(to_bytes(*index), params.make_cell(*value));
+    mut.cells.emplace_back(to_bytes(*index), params.make_cell(*ltype->value_comparator(), *value, atomic_cell::collection_member::yes));
    auto smut = ltype->serialize_mutation_form(mut);
    m.set_cell(prefix, column,
                    atomic_cell_or_collection::from_collection_mutation(
@@ -365,7 +374,7 @@ lists::do_append(shared_ptr<term> value,
            auto uuid1 = utils::UUID_gen::get_time_UUID_bytes();
            auto uuid = bytes(reinterpret_cast<const int8_t*>(uuid1.data()), uuid1.size());
            // FIXME: can e be empty?
-            appended.cells.emplace_back(std::move(uuid), params.make_cell(*e));
+            appended.cells.emplace_back(std::move(uuid), params.make_cell(*ltype->value_comparator(), *e, atomic_cell::collection_member::yes));
        }
        m.set_cell(prefix, column, ltype->serialize_mutation_form(appended));
    } else {
@@ -374,7 +383,7 @@ lists::do_append(shared_ptr<term> value,
            m.set_cell(prefix, column, params.make_dead_cell());
        } else {
            auto newv = list_value->get_with_protocol_version(cql_serialization_format::internal());
-            m.set_cell(prefix, column, params.make_cell(std::move(newv)));
+            m.set_cell(prefix, column, params.make_cell(*column.type, std::move(newv)));
        }
    }
 }
@@ -395,14 +404,14 @@ lists::prepender::execute(mutation& m, const clustering_key_prefix& prefix, cons
    mut.cells.reserve(lvalue->get_elements().size());
    // We reverse the order of insertion, so that the last element gets the lastest time
    // (lists are sorted by time)
+    auto&& ltype = static_cast<const list_type_impl*>(column.type.get());
    for (auto&& v : lvalue->_elements | boost::adaptors::reversed) {
        auto&& pt = precision_time::get_next(time);
        auto uuid = utils::UUID_gen::get_time_UUID_bytes(pt.millis.time_since_epoch().count(), pt.nanos);
-        mut.cells.emplace_back(bytes(uuid.data(), uuid.size()), params.make_cell(*v));
+        mut.cells.emplace_back(bytes(uuid.data(), uuid.size()), params.make_cell(*ltype->value_comparator(), *v, atomic_cell::collection_member::yes));
    }
    // now reverse again, to get the original order back
    std::reverse(mut.cells.begin(), mut.cells.end());
-    auto&& ltype = static_cast<const list_type_impl*>(column.type.get());
    m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(ltype->serialize_mutation_form(std::move(mut))));
 }

--- a/cql3/lists.hh
+++ b/cql3/lists.hh
@@ -79,7 +79,7 @@ public:
        explicit value(std::vector<bytes_opt> elements)
            : _elements(std::move(elements)) {
        }
-        static value from_serialized(bytes_view v, list_type type, cql_serialization_format sf);
+        static value from_serialized(const fragmented_temporary_buffer::view& v, list_type type, cql_serialization_format sf);
        virtual cql3::raw_value get(const query_options& options) override;
        virtual bytes get_with_protocol_version(cql_serialization_format sf) override;
        bool equals(shared_ptr<list_type_impl> lt, const value& v);
@@ -147,6 +147,7 @@ public:
                : operation(column, std::move(t)) {
        }
        virtual void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) override;
+        static void execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params, const column_definition& column, ::shared_ptr<terminal> value);
    };

    class setter_by_index : public operation {
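Note on `setter_by_index`: in `lists.cc` this patch replaces the manual `net::ntoh` on an unaligned pointer with a call to the `int32_t` type's `deserialize`. The old code implies that the index is stored as a 4-byte big-endian integer. A minimal sketch of such a decode follows, written without the project's types; `read_be_int32` is a hypothetical name used only for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of the patch): decode a 4-byte big-endian
// signed integer. Returns std::nullopt if the input is not exactly 4 bytes.
inline std::optional<int32_t> read_be_int32(std::string_view bytes) {
    if (bytes.size() != sizeof(int32_t)) {
        return std::nullopt;
    }
    uint32_t u = 0;
    for (unsigned char c : bytes) {
        // Shift in one byte at a time, most significant byte first.
        u = (u << 8) | c;
    }
    // Assumes a two's complement representation for the final conversion.
    return static_cast<int32_t>(u);
}
```

Reading byte by byte avoids both the alignment and the host-endianness concerns of casting the buffer pointer.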
--- a/cql3/maps.cc
+++ b/cql3/maps.cc
@@ -152,18 +152,20 @@ maps::literal::to_string() const {
 }

 maps::value
-maps::value::from_serialized(bytes_view value, map_type type, cql_serialization_format sf) {
+maps::value::from_serialized(const fragmented_temporary_buffer::view& fragmented_value, map_type type, cql_serialization_format sf) {
    try {
        // Collections have this small hack that validate cannot be called on a serialized object,
        // but compose does the validation (so we're fine).
        // FIXME: deserialize_for_native_protocol?!
+      return with_linearized(fragmented_value, [&] (bytes_view value) {
        auto m = value_cast<map_type_impl::native_type>(type->deserialize(value, sf));
        std::map<bytes, bytes, serialized_compare> map(type->get_keys_type()->as_less_comparator());
        for (auto&& e : m) {
            map.emplace(type->get_keys_type()->decompose(e.first),
                        type->get_values_type()->decompose(e.second));
        }
-        return { std::move(map) };
+        return maps::value { std::move(map) };
+      });
    } catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception(e.what());
    }
@@ -233,10 +235,10 @@ maps::delayed_value::bind(const query_options& options) {
        if (key_bytes.is_unset_value()) {
            throw exceptions::invalid_request_exception("unset value is not supported inside collections");
        }
-        if (key_bytes->size() > std::numeric_limits<uint16_t>::max()) {
+        if (key_bytes->size_bytes() > std::numeric_limits<uint16_t>::max()) {
            throw exceptions::invalid_request_exception(sprint("Map key is too long. Map keys are limited to %d bytes but %d bytes keys provided",
                                                   std::numeric_limits<uint16_t>::max(),
-                                                   key_bytes->size()));
+                                                   key_bytes->size_bytes()));
        }
        auto value_bytes = value->bind_and_get(options);
        if (value_bytes.is_null()) {
@@ -266,6 +268,11 @@ maps::marker::bind(const query_options& options) {
 void
 maps::setter::execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params) {
    auto value = _t->bind(params._options);
+    execute(m, row_key, params, column, std::move(value));
+}
+
+void
+maps::setter::execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params, const column_definition& column, ::shared_ptr<terminal> value) {
    if (value == constants::UNSET_VALUE) {
        return;
    }
@@ -295,10 +302,11 @@ maps::setter_by_key::execute(mutation& m, const clustering_key_prefix& prefix, c
    if (!key) {
        throw invalid_request_exception("Invalid null map key");
    }
-    auto avalue = value ? params.make_cell(*value) : params.make_dead_cell();
-    map_type_impl::mutation update = { {}, { { std::move(to_bytes(*key)), std::move(avalue) } } };
-    // should have been verified as map earlier?
    auto ctype = static_pointer_cast<const map_type_impl>(column.type);
+    auto avalue = value ? params.make_cell(*ctype->get_values_type(), *value, atomic_cell::collection_member::yes) : params.make_dead_cell();
+    map_type_impl::mutation update;
+    update.cells.emplace_back(std::move(to_bytes(*key)), std::move(avalue));
+    // should have been verified as map earlier?
    auto col_mut = ctype->serialize_mutation_form(std::move(update));
    m.set_cell(prefix, column, std::move(col_mut));
 }
@@ -323,10 +331,10 @@ maps::do_put(mutation& m, const clustering_key_prefix& prefix, const update_para
            return;
        }

-        for (auto&& e : map_value->map) {
-            mut.cells.emplace_back(e.first, params.make_cell(e.second));
-        }
        auto ctype = static_pointer_cast<const map_type_impl>(column.type);
+        for (auto&& e : map_value->map) {
+            mut.cells.emplace_back(e.first, params.make_cell(*ctype->get_values_type(), fragmented_temporary_buffer::view(e.second), atomic_cell::collection_member::yes));
+        }
        auto col_mut = ctype->serialize_mutation_form(std::move(mut));
        m.set_cell(prefix, column, std::move(col_mut));
    } else {
@@ -336,7 +344,7 @@ maps::do_put(mutation& m, const clustering_key_prefix& prefix, const update_para
        } else {
            auto v = map_type_impl::serialize_partially_deserialized_form({map_value->map.begin(), map_value->map.end()},
                    cql_serialization_format::internal());
-            m.set_cell(prefix, column, params.make_cell(std::move(v)));
+            m.set_cell(prefix, column, params.make_cell(*column.type, fragmented_temporary_buffer::view(std::move(v))));
        }
    }
 }
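Note on `with_linearized`: the `from_serialized` changes in `lists.cc` and `maps.cc` wrap the deserialization in `with_linearized(...)`, which hands the callback a contiguous `bytes_view` even when the input is fragmented. The sketch below illustrates the general idea under simplified assumptions. `fragments` and `with_linearized_sketch` are hypothetical stand-ins, not the project's `fragmented_temporary_buffer::view` or its real helper.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a fragmented buffer view: a sequence of
// non-owning byte ranges that together form one logical value.
using fragments = std::vector<std::string_view>;

// Calls func with a contiguous view of the data. A single fragment is passed
// through without copying; several fragments are first copied into a
// temporary buffer that only lives for the duration of the call, so func
// must not return a view into it.
template <typename Func>
auto with_linearized_sketch(const fragments& frags, Func&& func) {
    if (frags.empty()) {
        return func(std::string_view{});
    }
    if (frags.size() == 1) {
        return func(frags.front());
    }
    std::string buffer;
    for (std::string_view f : frags) {
        buffer.append(f);
    }
    return func(std::string_view{buffer});
}
```

In this sketch the copy is only paid when there is more than one fragment, which is presumably why the patch can return an owning `value` from inside the callback rather than a view.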
--- a/cql3/maps.hh
+++ b/cql3/maps.hh
@@ -81,7 +81,7 @@ public:
        value(std::map<bytes, bytes, serialized_compare> map)
            : map(std::move(map)) {
        }
-        static value from_serialized(bytes_view value, map_type type, cql_serialization_format sf);
+        static value from_serialized(const fragmented_temporary_buffer::view& value, map_type type, cql_serialization_format sf);
        virtual cql3::raw_value get(const query_options& options) override;
        virtual bytes get_with_protocol_version(cql_serialization_format sf);
        bool equals(map_type mt, const value& v);
@@ -117,6 +117,7 @@ public:
        }

        virtual void execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params) override;
+        static void execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params, const column_definition& column, ::shared_ptr<terminal> value);
    };

    class setter_by_key : public operation {
--- a/cql3/operation.hh
+++ b/cql3/operation.hh
@@ -87,15 +87,19 @@ public:

    virtual ~operation() {}

-    atomic_cell make_dead_cell(const update_parameters& params) const {
+    static atomic_cell make_dead_cell(const update_parameters& params) {
        return params.make_dead_cell();
    }

-    atomic_cell make_cell(bytes_view value, const update_parameters& params) const {
-        return params.make_cell(value);
+    static atomic_cell make_cell(const abstract_type& type, bytes_view value, const update_parameters& params) {
+        return params.make_cell(type, fragmented_temporary_buffer::view(value));
    }

-    atomic_cell make_counter_update_cell(int64_t delta, const update_parameters& params) const {
+    static atomic_cell make_cell(const abstract_type& type, const fragmented_temporary_buffer::view& value, const update_parameters& params) {
+        return params.make_cell(type, value);
+    }
+
+    static atomic_cell make_counter_update_cell(int64_t delta, const update_parameters& params) {
        return params.make_counter_update_cell(delta);
    }

--- a/cql3/prepared_statements_cache.hh
+++ b/cql3/prepared_statements_cache.hh
@@ -68,6 +68,14 @@ public:
    static thrift_prepared_id_type thrift_id(const prepared_cache_key_type& key) {
        return key.key().second;
    }
+
+    bool operator==(const prepared_cache_key_type& other) const {
+        return _key == other._key;
+    }
+
+    bool operator!=(const prepared_cache_key_type& other) const {
+        return !(*this == other);
+    }
 };

 class prepared_statements_cache {
@@ -102,9 +110,9 @@ private:
        }
    };

+public:
    static const std::chrono::minutes entry_expiry;

-public:
    using key_type = prepared_cache_key_type;
    using value_type = checked_weak_ptr;
    using statement_is_too_big = typename cache_type::entry_is_too_big;
@@ -116,8 +124,8 @@ private:
    value_extractor_fn _value_extractor_fn;

 public:
-    prepared_statements_cache(logging::logger& logger)
-        : _cache(memory::stats().total_memory() / 256, entry_expiry, logger)
+    prepared_statements_cache(logging::logger& logger, size_t size)
+        : _cache(size, entry_expiry, logger)
    {}

    template <typename LoadFunc>
@@ -155,6 +163,10 @@ public:
    size_t memory_footprint() const {
        return _cache.memory_footprint();
    }
+
+    future<> stop() {
+        return _cache.stop();
+    }
 };
 }

@@ -168,4 +180,11 @@ inline std::ostream& operator<<(std::ostream& os, const cql3::prepared_cache_key
    os << p.key();
    return os;
 }
+
+template<>
+struct hash<cql3::prepared_cache_key_type> final {
+    size_t operator()(const cql3::prepared_cache_key_type& k) const {
+        return utils::tuple_hash()(k.key());
+    }
+};
 }
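Note on the `operator==` and `hash` additions above: together they are what an unordered container needs in order to use `prepared_cache_key_type` as a key. A minimal standalone sketch of the same pattern follows. `cache_key` is a hypothetical type, and the hash-combine formula is an assumption standing in for the project's `utils::tuple_hash`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical key type wrapping a pair of identifiers.
struct cache_key {
    std::pair<std::string, int> ids;

    bool operator==(const cache_key& other) const { return ids == other.ids; }
    bool operator!=(const cache_key& other) const { return !(*this == other); }
};

namespace std {
// Specializing std::hash for a user-defined type is permitted.
template <>
struct hash<cache_key> {
    size_t operator()(const cache_key& k) const {
        // Simple hash combine (an assumption, not the project's tuple_hash).
        size_t h1 = hash<string>()(k.ids.first);
        size_t h2 = hash<int>()(k.ids.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};
}
```

With both pieces in place, `std::unordered_map<cache_key, int>` compiles without an explicit hasher or equality argument.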
--- a/cql3/query_options.cc
+++ b/cql3/query_options.cc
@@ -46,10 +46,11 @@ namespace cql3 {

 thread_local const query_options::specific_options query_options::specific_options::DEFAULT{-1, {}, {}, api::missing_timestamp};

-thread_local query_options query_options::DEFAULT{db::consistency_level::ONE, std::experimental::nullopt,
+thread_local query_options query_options::DEFAULT{db::consistency_level::ONE, infinite_timeout_config, std::experimental::nullopt,
    std::vector<cql3::raw_value_view>(), false, query_options::specific_options::DEFAULT, cql_serialization_format::latest()};

 query_options::query_options(db::consistency_level consistency,
+                           const ::timeout_config& timeout_config,
                           std::experimental::optional<std::vector<sstring_view>> names,
                           std::vector<cql3::raw_value> values,
                           std::vector<cql3::raw_value_view> value_views,
@@ -57,6 +58,7 @@ query_options::query_options(db::consistency_level consistency,
                           specific_options options,
                           cql_serialization_format sf)
   : _consistency(consistency)
+   , _timeout_config(timeout_config)
   , _names(std::move(names))
   , _values(std::move(values))
   , _value_views(value_views)
@@ -67,12 +69,14 @@ query_options::query_options(db::consistency_level consistency,
 }

 query_options::query_options(db::consistency_level consistency,
+                             const ::timeout_config& timeout_config,
                             std::experimental::optional<std::vector<sstring_view>> names,
                             std::vector<cql3::raw_value> values,
                             bool skip_metadata,
                             specific_options options,
                             cql_serialization_format sf)
    : _consistency(consistency)
+    , _timeout_config(timeout_config)
    , _names(std::move(names))
    , _values(std::move(values))
    , _value_views()
@@ -84,12 +88,14 @@ query_options::query_options(db::consistency_level consistency,
 }

 query_options::query_options(db::consistency_level consistency,
+                             const ::timeout_config& timeout_config,
                             std::experimental::optional<std::vector<sstring_view>> names,
                             std::vector<cql3::raw_value_view> value_views,
                             bool skip_metadata,
                             specific_options options,
                             cql_serialization_format sf)
    : _consistency(consistency)
+    , _timeout_config(timeout_config)
    , _names(std::move(names))
    , _values()
    , _value_views(std::move(value_views))
@@ -99,9 +105,10 @@ query_options::query_options(db::consistency_level consistency,
 {
 }

-query_options::query_options(db::consistency_level cl, std::vector<cql3::raw_value> values, specific_options options)
+query_options::query_options(db::consistency_level cl, const ::timeout_config& timeout_config, std::vector<cql3::raw_value> values, specific_options options)
    : query_options(
          cl,
+          timeout_config,
          {},
          std::move(values),
          false,
@@ -113,6 +120,7 @@ query_options::query_options(db::consistency_level cl, std::vector<cql3::raw_val

 query_options::query_options(std::unique_ptr<query_options> qo, ::shared_ptr<service::pager::paging_state> paging_state)
        : query_options(qo->_consistency,
+        qo->get_timeout_config(),
        std::move(qo->_names),
        std::move(qo->_values),
        std::move(qo->_value_views),
@@ -122,84 +130,49 @@ query_options::query_options(std::unique_ptr<query_options> qo, ::shared_ptr<ser

 }

+query_options::query_options(std::unique_ptr<query_options> qo, ::shared_ptr<service::pager::paging_state> paging_state, int32_t page_size)
+        : query_options(qo->_consistency,
+        qo->get_timeout_config(),
+        std::move(qo->_names),
+        std::move(qo->_values),
+        std::move(qo->_value_views),
+        qo->_skip_metadata,
+        std::move(query_options::specific_options{page_size, paging_state, qo->_options.serial_consistency, qo->_options.timestamp}),
+        qo->_cql_serialization_format) {
+
+}
+
 query_options::query_options(std::vector<cql3::raw_value> values)
    : query_options(
-          db::consistency_level::ONE, std::move(values))
+          db::consistency_level::ONE, infinite_timeout_config, std::move(values))
 {}

-db::consistency_level query_options::get_consistency() const
-{
-    return _consistency;
-}
-
-cql3::raw_value_view query_options::get_value_at(size_t idx) const
-{
-    return _value_views.at(idx);
-}
-
-size_t query_options::get_values_count() const
-{
-    return _value_views.size();
-}
-
 cql3::raw_value_view query_options::make_temporary(cql3::raw_value value) const
 {
    if (value) {
-        _temporaries.emplace_back(value->begin(), value->end());
-        auto& temporary = _temporaries.back();
-        return cql3::raw_value_view::make_value(bytes_view{temporary.data(), temporary.size()});
+        auto value_view = *value;
+        auto ptr = _temporaries.write_place_holder(value_view.size());
+        std::copy_n(value_view.data(), value_view.size(), ptr);
+        return cql3::raw_value_view::make_value(fragmented_temporary_buffer::view(bytes_view{ptr, value_view.size()}));
    }
    return cql3::raw_value_view::make_null();
 }

-bool query_options::skip_metadata() const
+bytes_view query_options::linearize(fragmented_temporary_buffer::view view) const
 {
-    return _skip_metadata;
-}
-
-int32_t query_options::get_page_size() const
-{
-    return get_specific_options().page_size;
-}
-
-::shared_ptr<service::pager::paging_state> query_options::get_paging_state() const
-{
-    return get_specific_options().state;
-}
-
-std::experimental::optional<db::consistency_level> query_options::get_serial_consistency() const
-{
-    return get_specific_options().serial_consistency;
-}
-
-api::timestamp_type query_options::get_timestamp(service::query_state& state) const
-{
-    auto tstamp = get_specific_options().timestamp;
-    return tstamp != api::missing_timestamp ? tstamp : state.get_timestamp();
-}
-
-int query_options::get_protocol_version() const
-{
-    return _cql_serialization_format.protocol_version();
-}
-
-cql_serialization_format query_options::get_cql_serialization_format() const
-{
-    return _cql_serialization_format;
-}
-
-const query_options::specific_options& query_options::get_specific_options() const
-{
-    return _options;
-}
-
-const query_options& query_options::for_statement(size_t i) const
-{
-    if (!_batch_options) {
-        // No per-statement options supplied, so use the "global" options
-        return *this;
+    if (view.empty()) {
+        return { };
+    } else if (std::next(view.begin()) == view.end()) {
+        return *view.begin();
+    } else {
+        auto ptr = _temporaries.write_place_holder(view.size_bytes());
+        auto dst = ptr;
+        using boost::range::for_each;
+        for_each(view, [&] (bytes_view bv) {
+            dst = std::copy(bv.begin(), bv.end(), dst);
+        });
+        return bytes_view(ptr, view.size_bytes());
    }
-    return _batch_options->at(i);
 }

 void query_options::prepare(const std::vector<::shared_ptr<column_specification>>& specs)
@@ -226,11 +199,7 @@ void query_options::prepare(const std::vector<::shared_ptr<column_specification>
 void query_options::fill_value_views()
 {
    for (auto&& value : _values) {
-        if (value) {
-            _value_views.emplace_back(cql3::raw_value_view::make_value(bytes_view{*value}));
-        } else {
-            _value_views.emplace_back(cql3::raw_value_view::make_null());
-        }
+        _value_views.emplace_back(value.to_view());
    }
 }

--- a/cql3/query_options.hh
+++ b/cql3/query_options.hh
@@ -44,13 +44,14 @@
 #include <seastar/util/gcc6-concepts.hh>
 #include "timestamp.hh"
 #include "bytes.hh"
-#include "db/consistency_level.hh"
+#include "db/consistency_level_type.hh"
 #include "service/query_state.hh"
 #include "service/pager/paging_state.hh"
 #include "cql3/column_specification.hh"
 #include "cql3/column_identifier.hh"
 #include "cql3/values.hh"
 #include "cql_serialization_format.hh"
+#include "timeout_config.hh"

 namespace cql3 {

@@ -70,10 +71,11 @@ public:
    };
 private:
    const db::consistency_level _consistency;
+    const timeout_config& _timeout_config;
    const std::experimental::optional<std::vector<sstring_view>> _names;
    std::vector<cql3::raw_value> _values;
    std::vector<cql3::raw_value_view> _value_views;
-    mutable std::vector<std::vector<int8_t>> _temporaries;
+    mutable bytes_ostream _temporaries;
    const bool _skip_metadata;
    const specific_options _options;
    cql_serialization_format _cql_serialization_format;
@@ -100,15 +102,17 @@ private:

 public:
    query_options(query_options&&) = default;
-    query_options(const query_options&) = delete;
+    explicit query_options(const query_options&) = default;

    explicit query_options(db::consistency_level consistency,
+                           const timeout_config& timeouts,
                           std::experimental::optional<std::vector<sstring_view>> names,
                           std::vector<cql3::raw_value> values,
                           bool skip_metadata,
                           specific_options options,
                           cql_serialization_format sf);
    explicit query_options(db::consistency_level consistency,
+                           const timeout_config& timeouts,
                           std::experimental::optional<std::vector<sstring_view>> names,
                           std::vector<cql3::raw_value> values,
                           std::vector<cql3::raw_value_view> value_views,
@@ -116,6 +120,7 @@ public:
                           specific_options options,
                           cql_serialization_format sf);
    explicit query_options(db::consistency_level consistency,
+                           const timeout_config& timeouts,
                           std::experimental::optional<std::vector<sstring_view>> names,
                           std::vector<cql3::raw_value_view> value_views,
                           bool skip_metadata,
@@ -147,30 +152,81 @@ public:

    // forInternalUse
    explicit query_options(std::vector<cql3::raw_value> values);
-    explicit query_options(db::consistency_level, std::vector<cql3::raw_value> values, specific_options options = specific_options::DEFAULT);
+    explicit query_options(db::consistency_level, const timeout_config& timeouts,
+            std::vector<cql3::raw_value> values, specific_options options = specific_options::DEFAULT);
    explicit query_options(std::unique_ptr<query_options>, ::shared_ptr<service::pager::paging_state> paging_state);
+    explicit query_options(std::unique_ptr<query_options>, ::shared_ptr<service::pager::paging_state> paging_state, int32_t page_size);
+
+    const timeout_config& get_timeout_config() const { return _timeout_config; }
+
+    db::consistency_level get_consistency() const {
+        return _consistency;
+    }
+
+    cql3::raw_value_view get_value_at(size_t idx) const {
+        return _value_views.at(idx);
+    }
+
+    size_t get_values_count() const {
+        return _value_views.size();
+    }

-    db::consistency_level get_consistency() const;
-    cql3::raw_value_view get_value_at(size_t idx) const;
    cql3::raw_value_view make_temporary(cql3::raw_value value) const;
-    size_t get_values_count() const;
-    bool skip_metadata() const;
-    /**  The pageSize for this query. Will be <= 0 if not relevant for the query.  */
-    int32_t get_page_size() const;
+    bytes_view linearize(fragmented_temporary_buffer::view) const;
+
+    bool skip_metadata() const {
+        return _skip_metadata;
+    }
+
+    int32_t get_page_size() const {
+        return get_specific_options().page_size;
+    }
+
    /** The paging state for this query, or null if not relevant. */
-    ::shared_ptr<service::pager::paging_state> get_paging_state() const;
+    ::shared_ptr<service::pager::paging_state> get_paging_state() const {
+        return get_specific_options().state;
+    }
+
    /**  Serial consistency for conditional updates. */
-    std::experimental::optional<db::consistency_level> get_serial_consistency() const;
-    api::timestamp_type get_timestamp(service::query_state& state) const;
+    std::experimental::optional<db::consistency_level> get_serial_consistency() const {
+        return get_specific_options().serial_consistency;
+    }
+
+    api::timestamp_type get_timestamp(service::query_state& state) const {
+        auto tstamp = get_specific_options().timestamp;
+        return tstamp != api::missing_timestamp ? tstamp : state.get_timestamp();
+    }
+
    /**
     * The protocol version for the query. Will be 3 if the object don't come from
     * a native protocol request (i.e. it's been allocated locally or by CQL-over-thrift).
     */
-    int get_protocol_version() const;
-    cql_serialization_format get_cql_serialization_format() const;
+    int get_protocol_version() const {
+        return _cql_serialization_format.protocol_version();
+    }
+
+    cql_serialization_format get_cql_serialization_format() const {
+        return _cql_serialization_format;
+    }
+
+    const query_options::specific_options& get_specific_options() const {
+        return _options;
+    }
+
    // Mainly for the sake of BatchQueryOptions
-    const specific_options& get_specific_options() const;
-    const query_options& for_statement(size_t i) const;
+    const query_options& for_statement(size_t i) const {
+        if (!_batch_options) {
+            // No per-statement options supplied, so use the "global" options
+            return *this;
+        }
+        return _batch_options->at(i);
+    }
+
+
+    const std::experimental::optional<std::vector<sstring_view>>& get_names() const noexcept {
+        return _names;
+    }
+
    void prepare(const std::vector<::shared_ptr<column_specification>>& specs);
 private:
    void fill_value_views();
@@ -188,7 +244,7 @@ query_options::query_options(query_options&& o, std::vector<OneMutationDataRange
    std::vector<query_options> tmp;
    tmp.reserve(values_ranges.size());
    std::transform(values_ranges.begin(), values_ranges.end(), std::back_inserter(tmp), [this](auto& values_range) {
-        return query_options(_consistency, {}, std::move(values_range), _skip_metadata, _options, _cql_serialization_format);
+        return query_options(_consistency, _timeout_config, {}, std::move(values_range), _skip_metadata, _options, _cql_serialization_format);
    });
    _batch_options = std::move(tmp);
 }
--- a/cql3/query_processor.cc
+++ b/cql3/query_processor.cc
@@ -58,6 +58,7 @@ using namespace cql_transport::messages;

 logging::logger log("query_processor");
 logging::logger prep_cache_log("prepared_statements_cache");
+logging::logger authorized_prepared_statements_cache_log("authorized_prepared_statements_cache");

 distributed<query_processor> _the_query_processor;

@@ -91,12 +92,16 @@ api::timestamp_type query_processor::next_timestamp() {
    return _internal_state->next_timestamp();
 }

-query_processor::query_processor(distributed<service::storage_proxy>& proxy, distributed<database>& db)
+query_processor::query_processor(service::storage_proxy& proxy, distributed<database>& db, query_processor::memory_config mcfg)
        : _migration_subscriber{std::make_unique<migration_subscriber>(this)}
        , _proxy(proxy)
        , _db(db)
        , _internal_state(new internal_state())
-        , _prepared_cache(prep_cache_log) {
+        , _prepared_cache(prep_cache_log, mcfg.prepared_statment_cache_size)
+        , _authorized_prepared_cache(std::min(std::chrono::milliseconds(_db.local().get_config().permissions_validity_in_ms()),
+                                              std::chrono::duration_cast<std::chrono::milliseconds>(prepared_statements_cache::entry_expiry)),
+                                     std::chrono::milliseconds(_db.local().get_config().permissions_update_interval_in_ms()),
+                                     mcfg.authorized_prepared_cache_size, authorized_prepared_statements_cache_log) {
    namespace sm = seastar::metrics;

    _metrics.add_group(
@@ -159,6 +164,11 @@ query_processor::query_processor(distributed<service::storage_proxy>& proxy, dis
                            sm::description("Counts a total number of LOGGED batches that were executed as UNLOGGED "
                                            "batches.")),

+                    sm::make_derive(
+                            "rows_read",
+                            _cql_stats.rows_read,
+                            sm::description("Counts a total number of rows read during CQL requests.")),
+
                    sm::make_derive(
                            "prepared_cache_evictions",
                            [] { return prepared_statements_cache::shard_stats().prepared_cache_evictions; },
@@ -172,7 +182,80 @@ query_processor::query_processor(distributed<service::storage_proxy>& proxy, dis
                    sm::make_gauge(
                            "prepared_cache_memory_footprint",
                            [this] { return _prepared_cache.memory_footprint(); },
-                            sm::description("Size (in bytes) of the prepared statements cache."))});
+                            sm::description("Size (in bytes) of the prepared statements cache.")),
+
+                    sm::make_derive(
+                            "secondary_index_creates",
+                            _cql_stats.secondary_index_creates,
+                            sm::description("Counts a total number of CQL CREATE INDEX requests.")),
+
+                    sm::make_derive(
+                            "secondary_index_drops",
+                            _cql_stats.secondary_index_drops,
+                            sm::description("Counts a total number of CQL DROP INDEX requests.")),
+
+                    // secondary_index_reads total count is also included in all cql reads
+                    sm::make_derive(
+                            "secondary_index_reads",
+                            _cql_stats.secondary_index_reads,
+                            sm::description("Counts a total number of CQL read requests performed using secondary indexes.")),
+
+                    // secondary_index_rows_read total count is also included in all cql rows read
+                    sm::make_derive(
+                            "secondary_index_rows_read",
+                            _cql_stats.secondary_index_rows_read,
+                            sm::description("Counts a total number of rows read during CQL requests performed using secondary indexes.")),
+
+                    // read requests that required ALLOW FILTERING
+                    sm::make_derive(
+                            "filtered_read_requests",
+                            _cql_stats.filtered_reads,
+                            sm::description("Counts a total number of CQL read requests that required ALLOW FILTERING. See filtered_rows_read_total to compare how many rows needed to be filtered.")),
+
+                    // rows read with filtering enabled (because ALLOW FILTERING was required)
+                    sm::make_derive(
+                            "filtered_rows_read_total",
+                            _cql_stats.filtered_rows_read_total,
+                            sm::description("Counts a total number of rows read during CQL requests that required ALLOW FILTERING. See filtered_rows_matched_total and filtered_rows_dropped_total for information on how accurate filtering queries are.")),
+
+                    // rows read with filtering enabled and accepted by the filter
+                    sm::make_derive(
+                            "filtered_rows_matched_total",
+                            _cql_stats.filtered_rows_matched_total,
+                            sm::description("Counts the number of rows read during CQL requests that required ALLOW FILTERING and were accepted by the filter. A number similar to filtered_rows_read_total indicates that filtering is accurate.")),
+
+                    // rows read with filtering enabled and rejected by the filter
+                    sm::make_derive(
+                            "filtered_rows_dropped_total",
+                            [this]() {return _cql_stats.filtered_rows_read_total - _cql_stats.filtered_rows_matched_total;},
+                            sm::description("Counts the number of rows read during CQL requests that required ALLOW FILTERING and were dropped by the filter. A number similar to filtered_rows_read_total indicates that filtering is not accurate and might cause performance degradation.")),
+
+                    sm::make_derive(
+                            "authorized_prepared_statements_cache_evictions",
+                            [] { return authorized_prepared_statements_cache::shard_stats().authorized_prepared_statements_cache_evictions; },
+                            sm::description("Counts the number of entries evicted from the authenticated prepared statements cache.")),
+
+                    sm::make_gauge(
+                            "authorized_prepared_statements_cache_size",
+                            [this] { return _authorized_prepared_cache.size(); },
+                            sm::description("A number of entries in the authenticated prepared statements cache.")),
+
+                    sm::make_gauge(
+                            "user_prepared_auth_cache_footprint",
+                            [this] { return _authorized_prepared_cache.memory_footprint(); },
+                            sm::description("Size (in bytes) of the authenticated prepared statements cache.")),
+
+                    sm::make_counter(
+                            "reverse_queries",
+                            _cql_stats.reverse_queries,
+                            sm::description("Counts number of CQL SELECT requests with ORDER BY DESC.")),
+
+                    sm::make_counter(
+                            "unpaged_select_queries",
+                            _cql_stats.unpaged_select_queries,
+                            sm::description("Counts number of unpaged CQL SELECT requests.")),
+
+            });

    service::get_local_migration_manager().register_listener(_migration_subscriber.get());
 }
@@ -182,7 +265,7 @@ query_processor::~query_processor() {

 future<> query_processor::stop() {
    service::get_local_migration_manager().unregister_listener(_migration_subscriber.get());
-    return make_ready_future<>();
+    return _authorized_prepared_cache.stop().finally([this] { return _prepared_cache.stop(); });
 }

 future<::shared_ptr<result_message>>
@@ -190,11 +273,11 @@ query_processor::process(const sstring_view& query_string, service::query_state&
    log.trace("process: \"{}\"", query_string);
    tracing::trace(query_state.get_trace_state(), "Parsing a statement");
    auto p = get_statement(query_string, query_state.get_client_state());
-    options.prepare(p->bound_names);
    auto cql_statement = p->statement;
    if (cql_statement->get_bound_terms() != options.get_values_count()) {
        throw exceptions::invalid_request_exception("Invalid amount of bind variables");
    }
+    options.prepare(p->bound_names);

    warn(unimplemented::cause::METRICS);
 #if 0
@@ -202,33 +285,55 @@ query_processor::process(const sstring_view& query_string, service::query_state&
            metrics.regularStatementsExecuted.inc();
 #endif
    tracing::trace(query_state.get_trace_state(), "Processing a statement");
-    return process_statement(std::move(cql_statement), query_state, options);
+    return process_statement_unprepared(std::move(cql_statement), query_state, options);
 }

 future<::shared_ptr<result_message>>
-query_processor::process_statement(
+query_processor::process_statement_unprepared(
        ::shared_ptr<cql_statement> statement,
        service::query_state& query_state,
        const query_options& options) {
-    return statement->check_access(query_state.get_client_state()).then([this, statement, &query_state, &options]() {
-        auto& client_state = query_state.get_client_state();
+    return statement->check_access(query_state.get_client_state()).then([this, statement, &query_state, &options] () mutable {
+        return process_authorized_statement(std::move(statement), query_state, options);
+    });
+}

-        statement->validate(_proxy, client_state);
+future<::shared_ptr<result_message>>
+query_processor::process_statement_prepared(
+        statements::prepared_statement::checked_weak_ptr prepared,
+        cql3::prepared_cache_key_type cache_key,
+        service::query_state& query_state,
+        const query_options& options,
+        bool needs_authorization) {

-        auto fut = make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
-        if (client_state.is_internal()) {
-            fut = statement->execute_internal(_proxy, query_state, options);
-        } else  {
-            fut = statement->execute(_proxy, query_state, options);
-        }
-
-        return fut.then([statement] (auto msg) {
-            if (msg) {
-                return make_ready_future<::shared_ptr<result_message>>(std::move(msg));
-            }
-            return make_ready_future<::shared_ptr<result_message>>(
-                ::make_shared<result_message::void_message>());
+    ::shared_ptr<cql_statement> statement = prepared->statement;
+    future<> fut = make_ready_future<>();
+    if (needs_authorization) {
+        fut = statement->check_access(query_state.get_client_state()).then([this, &query_state, prepared = std::move(prepared), cache_key = std::move(cache_key)] () mutable {
+            return _authorized_prepared_cache.insert(*query_state.get_client_state().user(), std::move(cache_key), std::move(prepared)).handle_exception([this] (auto eptr) {
+                log.error("failed to cache the entry: {}", eptr);
+            });
        });
+    }
+
+    return fut.then([this, statement = std::move(statement), &query_state, &options] () mutable {
+        return process_authorized_statement(std::move(statement), query_state, options);
+    });
+}
+
+future<::shared_ptr<result_message>>
+query_processor::process_authorized_statement(const ::shared_ptr<cql_statement> statement, service::query_state& query_state, const query_options& options) {
+    auto& client_state = query_state.get_client_state();
+
+    statement->validate(_proxy, client_state);
+
+    auto fut = statement->execute(_proxy, query_state, options);
+
+    return fut.then([statement] (auto msg) {
+        if (msg) {
+            return make_ready_future<::shared_ptr<result_message>>(std::move(msg));
+        }
+        return make_ready_future<::shared_ptr<result_message>>(::make_shared<result_message::void_message>());
    });
 }

@@ -340,6 +445,7 @@ query_options query_processor::make_internal_options(
        const statements::prepared_statement::checked_weak_ptr& p,
        const std::initializer_list<data_value>& values,
        db::consistency_level cl,
+        const timeout_config& timeout_config,
        int32_t page_size) {
    if (p->bound_names.size() != values.size()) {
        throw std::invalid_argument(
@@ -363,10 +469,11 @@ query_options query_processor::make_internal_options(
        api::timestamp_type ts = api::missing_timestamp;
        return query_options(
                cl,
+                timeout_config,
                bound_values,
                cql3::query_options::specific_options{page_size, std::move(paging_state), serial_consistency, ts});
    }
-    return query_options(cl, bound_values);
+    return query_options(cl, timeout_config, bound_values);
 }

 statements::prepared_statement::checked_weak_ptr query_processor::prepare_internal(const sstring& query_string) {
@@ -397,7 +504,7 @@ struct internal_query_state {
 ::shared_ptr<internal_query_state> query_processor::create_paged_state(const sstring& query_string,
        const std::initializer_list<data_value>& values, int32_t page_size) {
    auto p = prepare_internal(query_string);
-    auto opts = make_internal_options(p, values, db::consistency_level::ONE, page_size);
+    auto opts = make_internal_options(p, values, db::consistency_level::ONE, infinite_timeout_config, page_size);
    ::shared_ptr<internal_query_state> res = ::make_shared<internal_query_state>(
            internal_query_state{
                    query_string,
@@ -446,7 +553,7 @@ future<> query_processor::for_each_cql_result(

 future<::shared_ptr<untyped_result_set>>
 query_processor::execute_paged_internal(::shared_ptr<internal_query_state> state) {
-    return state->p->statement->execute_internal(_proxy, *_internal_state, *state->opts).then(
+    return state->p->statement->execute(_proxy, *_internal_state, *state->opts).then(
            [state, this](::shared_ptr<cql_transport::messages::result_message> msg) mutable {
        class visitor : public result_message::visitor_base {
            ::shared_ptr<internal_query_state> _state;
@@ -485,9 +592,9 @@ future<::shared_ptr<untyped_result_set>>
 query_processor::execute_internal(
        statements::prepared_statement::checked_weak_ptr p,
        const std::initializer_list<data_value>& values) {
-    query_options opts = make_internal_options(p, values);
+    query_options opts = make_internal_options(p, values, db::consistency_level::ONE, infinite_timeout_config);
    return do_with(std::move(opts), [this, p = std::move(p)](auto& opts) {
-        return p->statement->execute_internal(
+        return p->statement->execute(
                _proxy,
                *_internal_state,
                opts).then([&opts, stmt = p->statement](auto msg) {
@@ -500,15 +607,16 @@ future<::shared_ptr<untyped_result_set>>
 query_processor::process(
        const sstring& query_string,
        db::consistency_level cl,
+        const timeout_config& timeout_config,
        const std::initializer_list<data_value>& values,
        bool cache) {
    if (cache) {
-        return process(prepare_internal(query_string), cl, values);
+        return process(prepare_internal(query_string), cl, timeout_config, values);
    } else {
        auto p = parse_statement(query_string)->prepare(_db.local(), _cql_stats);
        p->statement->validate(_proxy, *_internal_state);
        auto checked_weak_ptr = p->checked_weak_from_this();
-        return process(std::move(checked_weak_ptr), cl, values).finally([p = std::move(p)] {});
+        return process(std::move(checked_weak_ptr), cl, timeout_config, values).finally([p = std::move(p)] {});
    }
 }

@@ -516,8 +624,9 @@ future<::shared_ptr<untyped_result_set>>
 query_processor::process(
        statements::prepared_statement::checked_weak_ptr p,
        db::consistency_level cl,
+        const timeout_config& timeout_config,
        const std::initializer_list<data_value>& values) {
-    auto opts = make_internal_options(p, values, cl);
+    auto opts = make_internal_options(p, values, cl, timeout_config);
    return do_with(std::move(opts), [this, p = std::move(p)](auto & opts) {
        return p->statement->execute(_proxy, *_internal_state, opts).then([](auto msg) {
            return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
@@ -529,11 +638,18 @@ future<::shared_ptr<cql_transport::messages::result_message>>
 query_processor::process_batch(
        ::shared_ptr<statements::batch_statement> batch,
        service::query_state& query_state,
-        query_options& options) {
-    return batch->check_access(query_state.get_client_state()).then([this, &query_state, &options, batch] {
-        batch->validate();
-        batch->validate(_proxy, query_state.get_client_state());
-        return batch->execute(_proxy, query_state, options);
+        query_options& options,
+        std::unordered_map<prepared_cache_key_type, authorized_prepared_statements_cache::value_type> pending_authorization_entries) {
+    return batch->check_access(query_state.get_client_state()).then([this, &query_state, &options, batch, pending_authorization_entries = std::move(pending_authorization_entries)] () mutable {
+        return parallel_for_each(pending_authorization_entries, [this, &query_state] (auto& e) {
+            return _authorized_prepared_cache.insert(*query_state.get_client_state().user(), e.first, std::move(e.second)).handle_exception([this] (auto eptr) {
+                log.error("failed to cache the entry: {}", eptr);
+            });
+        }).then([this, &query_state, &options, batch] {
+            batch->validate();
+            batch->validate(_proxy, query_state.get_client_state());
+            return batch->execute(_proxy, query_state, options);
+        });
    });
 }

--- a/cql3/query_processor.hh
+++ b/cql3/query_processor.hh
@@ -49,6 +49,7 @@
 #include <seastar/core/shared_ptr.hh>

 #include "cql3/prepared_statements_cache.hh"
+#include "cql3/authorized_prepared_statements_cache.hh"
 #include "cql3/query_options.hh"
 #include "cql3/statements/prepared_statement.hh"
 #include "cql3/statements/raw/parsed_statement.hh"
@@ -99,10 +100,14 @@ public:
 class query_processor {
 public:
    class migration_subscriber;
+    struct memory_config {
+        size_t prepared_statment_cache_size = 0;
+        size_t authorized_prepared_cache_size = 0;
+    };

 private:
    std::unique_ptr<migration_subscriber> _migration_subscriber;
-    distributed<service::storage_proxy>& _proxy;
+    service::storage_proxy& _proxy;
    distributed<database>& _db;

    struct stats {
@@ -117,6 +122,7 @@ private:
    std::unique_ptr<internal_state> _internal_state;

    prepared_statements_cache _prepared_cache;
+    authorized_prepared_statements_cache _authorized_prepared_cache;

    // A map for prepared statements used internally (which we don't want to mix with user statement, in particular we
    // don't bother with expiration on those.
@@ -135,7 +141,7 @@ public:

    static ::shared_ptr<statements::raw::parsed_statement> parse_statement(const std::experimental::string_view& query);

-    query_processor(distributed<service::storage_proxy>& proxy, distributed<database>& db);
+    query_processor(service::storage_proxy& proxy, distributed<database>& db, memory_config mcfg);

    ~query_processor();

@@ -143,7 +149,7 @@ public:
        return _db;
    }

-    distributed<service::storage_proxy>& proxy() {
+    service::storage_proxy& proxy() {
        return _proxy;
    }

@@ -151,6 +157,21 @@ public:
        return _cql_stats;
    }

+    statements::prepared_statement::checked_weak_ptr get_prepared(const auth::authenticated_user* user_ptr, const prepared_cache_key_type& key) {
+        if (user_ptr) {
+            auto it = _authorized_prepared_cache.find(*user_ptr, key);
+            if (it != _authorized_prepared_cache.end()) {
+                try {
+                    return it->get()->checked_weak_from_this();
+                } catch (seastar::checked_ptr_is_null_exception&) {
+                    // If the prepared statement got invalidated - remove the corresponding authorized_prepared_statements_cache entry as well.
+                    _authorized_prepared_cache.remove(*user_ptr, key);
+                }
+            }
+        }
+        return statements::prepared_statement::checked_weak_ptr();
+    }
+
    statements::prepared_statement::checked_weak_ptr get_prepared(const prepared_cache_key_type& key) {
        auto it = _prepared_cache.find(key);
        if (it == _prepared_cache.end()) {
@@ -160,11 +181,19 @@ public:
    }

    future<::shared_ptr<cql_transport::messages::result_message>>
-    process_statement(
+    process_statement_unprepared(
            ::shared_ptr<cql_statement> statement,
            service::query_state& query_state,
            const query_options& options);

+    future<::shared_ptr<cql_transport::messages::result_message>>
+    process_statement_prepared(
+            statements::prepared_statement::checked_weak_ptr statement,
+            cql3::prepared_cache_key_type cache_key,
+            service::query_state& query_state,
+            const query_options& options,
+            bool needs_authorization);
+
    future<::shared_ptr<cql_transport::messages::result_message>>
    process(
            const std::experimental::string_view& query_string,
@@ -215,12 +244,14 @@ public:
    future<::shared_ptr<untyped_result_set>> process(
            const sstring& query_string,
            db::consistency_level,
+            const timeout_config& timeout_config,
            const std::initializer_list<data_value>& = { },
            bool cache = false);

    future<::shared_ptr<untyped_result_set>> process(
            statements::prepared_statement::checked_weak_ptr p,
            db::consistency_level,
+            const timeout_config& timeout_config,
            const std::initializer_list<data_value>& = { });

    /*
@@ -242,7 +273,11 @@ public:
    future<> stop();

    future<::shared_ptr<cql_transport::messages::result_message>>
-    process_batch(::shared_ptr<statements::batch_statement>, service::query_state& query_state, query_options& options);
+    process_batch(
+            ::shared_ptr<statements::batch_statement>,
+            service::query_state& query_state,
+            query_options& options,
+            std::unordered_map<prepared_cache_key_type, authorized_prepared_statements_cache::value_type> pending_authorization_entries);

    std::unique_ptr<statements::prepared_statement> get_statement(
            const std::experimental::string_view& query,
@@ -254,9 +289,13 @@ private:
    query_options make_internal_options(
            const statements::prepared_statement::checked_weak_ptr& p,
            const std::initializer_list<data_value>&,
-            db::consistency_level = db::consistency_level::ONE,
+            db::consistency_level,
+            const timeout_config& timeout_config,
            int32_t page_size = -1);

+    future<::shared_ptr<cql_transport::messages::result_message>>
+    process_authorized_statement(const ::shared_ptr<cql_statement> statement, service::query_state& query_state, const query_options& options);
+
    /*!
     * \brief created a state object for paging
     *
--- a/cql3/restrictions/multi_column_restriction.hh
+++ b/cql3/restrictions/multi_column_restriction.hh
@@ -45,12 +45,16 @@
 #include "cql3/statements/request_validations.hh"
 #include "cql3/restrictions/primary_key_restrictions.hh"
 #include "cql3/statements/request_validations.hh"
+#include "cql3/restrictions/single_column_primary_key_restrictions.hh"

 namespace cql3 {

 namespace restrictions {

 class multi_column_restriction : public primary_key_restrictions<clustering_key_prefix> {
+private:
+    bool _has_only_asc_columns;
+    bool _has_only_desc_columns;
 protected:
    schema_ptr _schema;
    std::vector<const column_definition*> _column_defs;
@@ -58,7 +62,9 @@ public:
    multi_column_restriction(schema_ptr schema, std::vector<const column_definition*>&& defs)
        : _schema(schema)
        , _column_defs(std::move(defs))
-    { }
+    {
+        update_asc_desc_existence();
+    }

    virtual bool is_multi_column() const override {
        return true;
@@ -84,6 +90,7 @@ public:
            "Mixing single column relations and multi column relations on clustering columns is not allowed");
        auto as_pkr = static_pointer_cast<primary_key_restrictions<clustering_key_prefix>>(other);
        do_merge_with(as_pkr);
+        update_asc_desc_existence();
    }

    bool is_satisfied_by(const schema& schema,
@@ -140,6 +147,40 @@ protected:

    virtual bool is_supported_by(const secondary_index::index& index) const = 0;

+    /**
+     * @return true if the restriction contains at least one column of each
+     * ordering, false otherwise.
+     */
+    bool is_mixed_order() const {
+        return !is_desc_order() && !is_asc_order();
+    }
+
+    /**
+     * @return true if all the restricted columns are ordered in descending
+     * order, false otherwise
+     */
+    bool is_desc_order() const {
+        return _has_only_desc_columns;
+    }
+
+    /**
+     * @return true if all the restricted columns are ordered in ascending
+     * order, false otherwise
+     */
+    bool is_asc_order() const {
+        return _has_only_asc_columns;
+    }
+
+private:
+    /**
+     * Updates the _has_only_asc_columns and _has_only_desc_columns fields.
+     */
+    void update_asc_desc_existence() {
+        std::size_t num_of_desc =
+                std::count_if(_column_defs.begin(), _column_defs.end(),  [] (const column_definition* cd) { return cd->type->is_reversed(); });
+        _has_only_asc_columns = num_of_desc == 0;
+        _has_only_desc_columns = num_of_desc == _column_defs.size();
+    }
 #if 0
    /**
     * Check if this type of restriction is supported for the specified column by the specified index.
@@ -385,6 +426,7 @@ protected:
 };

 class multi_column_restriction::slice final : public multi_column_restriction {
+    using restriction_shared_ptr = ::shared_ptr<primary_key_restrictions<clustering_key_prefix>>;
 private:
    term_slice _slice;

@@ -422,24 +464,11 @@ public:
    }

    virtual std::vector<bounds_range_type> bounds_ranges(const query_options& options) const override {
-        // FIXME: doesn't work properly with mixed CLUSTERING ORDER (CASSANDRA-7281)
-        auto read_bound = [&] (statements::bound b) -> std::experimental::optional<bounds_range_type::bound> {
-            if (!has_bound(b)) {
-                return {};
-            }
-            auto vals = component_bounds(b, options);
-            for (unsigned i = 0; i < vals.size(); i++) {
-                statements::request_validations::check_not_null(vals[i], "Invalid null value in condition for column %s", _column_defs.at(i)->name_as_text());
-            }
-            auto prefix = clustering_key_prefix::from_optional_exploded(*_schema, vals);
-            return bounds_range_type::bound(prefix, is_inclusive(b));
-        };
-        auto range = wrapping_range<clustering_key_prefix>(read_bound(statements::bound::START), read_bound(statements::bound::END));
-        auto bounds = bound_view::from_range(range);
-        if (bound_view::compare(*_schema)(bounds.second, bounds.first)) {
-            return { };
+        if (!is_mixed_order()) {
+            return bounds_ranges_unified_order(options);
+        } else {
+            return bounds_ranges_mixed_order(options);
        }
-        return { bounds_range_type(std::move(range)) };
    }
 #if 0
        @Override
@@ -514,6 +543,221 @@ private:
        auto value = static_pointer_cast<tuples::value>(_slice.bound(b)->bind(options));
        return value->get_elements();
    }
+
+    std::vector<bytes_opt> read_bound_components(const query_options& options, statements::bound b) const {
+        if (!has_bound(b)) {
+            return {};
+        }
+        auto vals = component_bounds(b, options);
+        for (unsigned i = 0; i < vals.size(); i++) {
+            statements::request_validations::check_not_null(vals[i], "Invalid null value in condition for column %s", _column_defs.at(i)->name_as_text());
+        }
+        return vals;
+    }
+
+    /**
+     * Retrieve the bounds for the case that all clustering columns have the same order.
+     * Having the same order implies we can do a prefix search on the data.
+     * @param options the query options
+     * @return the vector of ranges for the restriction
+     */
+    std::vector<bounds_range_type> bounds_ranges_unified_order(const query_options& options) const {
+        auto start_prefix = clustering_key_prefix::from_optional_exploded(*_schema, read_bound_components(options, statements::bound::START));
+        auto start_bound = bounds_range_type::bound(std::move(start_prefix), is_inclusive(statements::bound::START));
+        auto end_prefix = clustering_key_prefix::from_optional_exploded(*_schema, read_bound_components(options, statements::bound::END));
+        auto end_bound = bounds_range_type::bound(std::move(end_prefix), is_inclusive(statements::bound::END));
+        auto make_range = [&] () {
+            if (is_asc_order()) {
+                return bounds_range_type::make(start_bound, end_bound);
+            } else {
+                return bounds_range_type::make(end_bound, start_bound);
+            }
+        };
+        auto range = make_range();
+        auto bounds = bound_view::from_range(range);
+        if (bound_view::compare(*_schema)(bounds.second, bounds.first)) {
+            return { };
+        }
+        return { std::move(range) };
+    }
+
+    /**
+     * Retrieve the bounds when the clustering columns have mixed order
+     * (ASC and DESC together).
+     * Mixed order implies that a single prefix search can't take place;
+     * instead, the bounds have to be broken down into separate prefix-searchable
+     * ranges such that their combination is equivalent to the original range.
+     * @param options the query options
+     * @return the vector of ranges for the restriction
+     */
+    std::vector<bounds_range_type> bounds_ranges_mixed_order(const query_options& options) const {
+        std::vector<bounds_range_type> ret_ranges;
+        auto mixed_order_restrictions = build_mixed_order_restriction_set(options);
+        ret_ranges.reserve(mixed_order_restrictions.size());
+        for (auto r : mixed_order_restrictions) {
+            for (auto&& range : r->bounds_ranges(options)) {
+                ret_ranges.emplace_back(std::move(range));
+            }
+        }
+        return ret_ranges;
+    }
+
+    /**
+     * Returns the index of the first real inequality component.
+     * The first real inequality is the index of the first component in the
+     * tuple that will turn into a slice single column restriction.
+     * For example: (a, b, c) > (0, 1, 2) and (a, b, c) < (0, 1, 5) will be
+     * broken into one single column restriction set of the form:
+     * a = 0 and b = 1 and c > 2 and c < 5. Here c is the first element that has
+     * an inequality, so for this case the function will return 2.
+     * @param start_components - the components of the start tuple of the range.
+     * @param end_components - the components of the end tuple of the range.
+     * @return the index of the first component that will yield an inequality,
+     * or an empty value if there is none
+     */
+    std::optional<std::size_t> find_first_neq_component(std::vector<bytes_opt>& start_components,
+                                                        std::vector<bytes_opt>& end_components) const {
+        size_t common_components_count = std::min(start_components.size(), end_components.size());
+        for (size_t i = 0; i < common_components_count ; i++) {
+            if (start_components[i].value() != end_components[i].value()) {
+                return i;
+            }
+        }
+
+        size_t max_components_count = std::max(start_components.size(), end_components.size());
+        if (common_components_count < max_components_count) {
+            return common_components_count;
+        } else {
+            return std::nullopt;
+        }
+    }
+
+    /**
+     * Creates a single column restriction which is either a slice or an equality.
+     * @param bound - if bound is empty this is an equality; if it is either START or END,
+     *        this is the corresponding slice restriction.
+     * @param inclusive - whether the slice is inclusive (ignored for equality).
+     * @param column_pos - the position of the column to restrict.
+     * @param value - the value to restrict the column with.
+     * @return a shared pointer to the newly created restriction.
+     */
+    ::shared_ptr<restriction> make_single_column_restriction(std::optional<cql3::statements::bound> bound, bool inclusive,
+                                                             std::size_t column_pos,const bytes_opt& value) const {
+        ::shared_ptr<cql3::term> term = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(value));
+        if (!bound){
+            return ::make_shared<cql3::restrictions::single_column_restriction::EQ>(*_column_defs[column_pos], term);
+        } else {
+            return ::make_shared<cql3::restrictions::single_column_restriction::slice>(*_column_defs[column_pos], bound.value(), inclusive, term);
+        }
+    }
+
+    /**
+     * A helper function to create a set of single column restrictions from a tuple relation on
+     * clustering keys.
+     * e.g.: (a, b, c) >= (0, 1, 2) will become:
+     *      1. a > 0
+     *      2. a = 0 and b > 1
+     *      3. a = 0 and b = 1 and c >= 2
+     * @param bound - determines whether the operator is '>' (START) or '<' (END)
+     * @param bound_inclusive - determines whether to append equality to the operator, i.e. whether > becomes >=
+     * @param bound_values - the tuple values for the restriction
+     * @param first_neq_component - the first component that will have an inequality.
+     *        For the example above, if this parameter is 1, only restrictions 2 and 3 will be created.
+     *        This parameter helps with breaking down more complex relations, for example when
+     *        a second condition limits the other side of the bound,
+     *        e.g. (a, b, c) >= (0, 1, 2) and (a, b, c) < (5, 6, 7), where each bound has to use the parameter.
+     * @return the single column restriction set built according to the above parameters.
+     */
+    std::vector<restriction_shared_ptr> make_single_bound_restrictions(statements::bound bound, bool bound_inclusive,
+                                                                       std::vector<bytes_opt>& bound_values,
+                                                                       std::size_t first_neq_component) const{
+        std::vector<restriction_shared_ptr> ret;
+        std::size_t num_of_restrictions = bound_values.size() - first_neq_component;
+        ret.reserve(num_of_restrictions);
+        for (std::size_t i = 0;i < num_of_restrictions ; i++) {
+            ret.emplace_back(::make_shared<cql3::restrictions::single_column_primary_key_restrictions<clustering_key>>(_schema, false));
+            std::size_t neq_component_idx = first_neq_component + i;
+            for (std::size_t j = 0;j < neq_component_idx; j++) {
+                ret[i]->merge_with(make_single_column_restriction(std::nullopt, false, j, bound_values[j]));
+            }
+            bool inclusive = (i == (num_of_restrictions-1)) && bound_inclusive;
+            ret[i]->merge_with(make_single_column_restriction(bound, inclusive, neq_component_idx, bound_values[neq_component_idx]));
+        }
+        return ret;
+    }
+
+    /**
+     * Builds and returns a set of restrictions such that the union of their ranges (the restrictions OR-ed together)
+     * is logically identical to this restriction, with the additional property that it executes
+     * correctly when the clustering columns have "mixed order", i.e. both ASC and DESC orderings.
+     * For more information see https://github.com/scylladb/scylla/issues/2050
+     * @param options - the query options
+     * @return a set of restrictions whose union of ranges is logically identical to this restriction.
+     */
+    std::vector<::shared_ptr<primary_key_restrictions<clustering_key_prefix>>>
+    build_mixed_order_restriction_set(const query_options& options) const {
+        std::vector<restriction_shared_ptr> ret;
+        auto start_components = read_bound_components(options, statements::bound::START);
+        auto end_components = read_bound_components(options, statements::bound::END);
+        bool start_inclusive = is_inclusive(statements::bound::START);
+        bool end_inclusive = is_inclusive(statements::bound::END);
+        std::optional<std::size_t> first_neq_component = std::nullopt;
+
+        // Find the index of the first component that differs between the two tuples.
+        if (start_components.empty() || end_components.empty()) {
+            first_neq_component = 0;
+        } else {
+            auto tuple_mismatch = std::mismatch(start_components.begin(), start_components.end(),
+                    end_components.begin(), end_components.end());
+            if ((tuple_mismatch.first != start_components.end()) ||
+                (tuple_mismatch.second != end_components.end())) {
+                first_neq_component = std::distance(start_components.begin(), tuple_mismatch.first);
+            }
+        }
+
+        // this is either a simple equality or a never fulfilled restriction
+        if (!first_neq_component && start_inclusive && end_inclusive) {
+            // This is a simple equality case
+            shared_ptr<cql3::term> term = ::make_shared<cql3::tuples::value>(start_components);
+            ret.emplace_back(::make_shared<cql3::restrictions::multi_column_restriction::EQ>(_schema, _column_defs, term));
+            return ret;
+        } else if (!first_neq_component) {
+            // This is a contradiction case
+            return {};
+        } else if ((*first_neq_component == end_components.size() && !end_inclusive ) ||
+                   (*first_neq_component == start_components.size() && !start_inclusive )) {
+            // This is a case where one bound is a prefix of the other. If this prefix bound
+            // is not inclusive the result will be an empty set.
+            return {};
+        }
+
+        bool start_components_exists = (start_components.size() - first_neq_component.value()) > 0;
+        bool end_components_exists = (end_components.size() - first_neq_component.value()) > 0;
+        bool both_components_exists = start_components_exists && end_components_exists;
+        if (start_components_exists) {
+            auto restrictions =
+                    make_single_bound_restrictions(statements::bound::START, start_inclusive, start_components, first_neq_component.value());
+            for (auto&& r : restrictions) {
+                ret.emplace_back(r);
+            }
+        }
+
+        if (end_components_exists) {
+            auto restrictions =
+                    make_single_bound_restrictions(statements::bound::END, end_inclusive,
+                            end_components, first_neq_component.value() + both_components_exists);
+            for (auto&& r : restrictions) {
+                ret.emplace_back(r);
+            }
+        }
+
+        if (both_components_exists) {
+            bool inclusive = end_inclusive && ((end_components.size() - first_neq_component.value()) == 1);
+            ret[0]->merge_with(make_single_column_restriction(statements::bound::END, inclusive, first_neq_component.value(),
+                    end_components[first_neq_component.value()]));
+        }
+        return ret;
+    }
 };

 }
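Note on the decomposition used by make_single_bound_restrictions() above: the sketch below is a standalone illustration, not code from this patch. It renders each generated restriction as text instead of building restriction objects; the name expand_bound and the use of plain ints and strings are assumptions made only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): expands one side of a tuple
// relation such as (a, b, c) >= (0, 1, 2) into the list of conjunctions whose
// OR is equivalent to it. Each conjunction is rendered as a string.
std::vector<std::string> expand_bound(const std::vector<std::string>& cols,
                                      const std::vector<int>& vals,
                                      bool is_start,   // true: '>' side, false: '<' side
                                      bool inclusive,  // whether the bound itself is included
                                      std::size_t first_neq) {
    assert(cols.size() == vals.size() && first_neq <= vals.size());
    std::vector<std::string> out;
    const std::size_t n = vals.size() - first_neq;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t neq = first_neq + i;
        std::string conj;
        // Every column before the inequality column is pinned with an equality.
        for (std::size_t j = 0; j < neq; ++j) {
            conj += cols[j] + " = " + std::to_string(vals[j]) + " and ";
        }
        // Only the last conjunction may carry the inclusive operator.
        std::string op = is_start ? ">" : "<";
        if (inclusive && i == n - 1) {
            op += "=";
        }
        conj += cols[neq] + " " + op + " " + std::to_string(vals[neq]);
        out.push_back(conj);
    }
    return out;
}
```

With first_neq = 0 the example from the comment, (a, b, c) >= (0, 1, 2), yields three conjunctions; with first_neq = 1 only the last two are produced, matching the description of the first_neq_component parameter.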
--- a/cql3/restrictions/primary_key_restrictions.hh
+++ b/cql3/restrictions/primary_key_restrictions.hh
@@ -88,6 +88,7 @@ public:

    using restrictions::uses_function;
    using restrictions::has_supporting_index;
+    using restrictions::values;

    bool empty() const override {
        return get_column_defs().empty();
@@ -95,7 +96,72 @@ public:
    uint32_t size() const override {
        return uint32_t(get_column_defs().size());
    }
+
+    bool has_unrestricted_components(const schema& schema) const;
+
+    virtual bool needs_filtering(const schema& schema) const;
+
+    // How long a prefix of the restrictions could have resulted in
+    // needs_filtering() == false. These restrictions do not need to be
+    // applied during filtering.
+    // For example, if we have the filter "c1 < 3 and c2 > 3", c1 does
+    // not need filtering (just a read stopping at c1=3) but c2 does,
+    // so num_prefix_columns_that_need_not_be_filtered() will be 1.
+    virtual unsigned int num_prefix_columns_that_need_not_be_filtered() const {
+        return 0;
+    }
+
+    virtual bool is_all_eq() const {
+        return false;
+    }
+    virtual size_t prefix_size() const {
+        return 0;
+    }
+
+    size_t prefix_size(const schema_ptr schema) const {
+        return 0;
+    }
+
 };

+template<>
+inline bool primary_key_restrictions<partition_key>::has_unrestricted_components(const schema& schema) const {
+    return size() < schema.partition_key_size();
+}
+
+template<>
+inline bool primary_key_restrictions<clustering_key>::has_unrestricted_components(const schema& schema) const {
+    return size() < schema.clustering_key_size();
+}
+
+template<>
+inline bool primary_key_restrictions<partition_key>::needs_filtering(const schema& schema) const  {
+    return !empty() && !is_on_token() && (has_unrestricted_components(schema) || is_contains() || is_slice());
+}
+
+template<>
+inline bool primary_key_restrictions<clustering_key>::needs_filtering(const schema& schema) const  {
+    // Currently only overloaded single_column_primary_key_restrictions will require ALLOW FILTERING
+    return false;
+}
+
+template<>
+inline size_t primary_key_restrictions<clustering_key>::prefix_size(const schema_ptr schema) const {
+    size_t count = 0;
+    if (schema->clustering_key_columns().empty()) {
+        return count;
+    }
+    auto column_defs = get_column_defs();
+    column_id expected_column_id = schema->clustering_key_columns().begin()->id;
+    for (auto&& cdef : column_defs) {
+        if (schema->position(*cdef) != expected_column_id) {
+            return count;
+        }
+        expected_column_id++;
+        count++;
+    }
+    return count;
+}
+
 }
 }
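Note on prefix_size() for clustering keys above: the following standalone sketch shows the same counting idea on plain integers. It assumes the restricted column positions are given in ascending order and that the first clustering column has position 0; contiguous_prefix_size is a hypothetical name, not a function in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: counts how many restricted columns form a gap-free
// prefix of the clustering key, given their positions in ascending order.
std::size_t contiguous_prefix_size(const std::vector<std::size_t>& restricted_positions) {
    assert(std::is_sorted(restricted_positions.begin(), restricted_positions.end()));
    std::size_t expected = 0;
    std::size_t count = 0;
    for (std::size_t pos : restricted_positions) {
        if (pos != expected) {
            break;  // first gap ends the prefix
        }
        ++expected;
        ++count;
    }
    return count;
}
```

For restricted positions {0, 1, 3} the prefix length is 2, because column 2 is missing; for {1, 2} it is 0, because the first clustering column is unrestricted.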
--- a/cql3/restrictions/restrictions.hh
+++ b/cql3/restrictions/restrictions.hh
@@ -68,6 +68,10 @@ public:

    virtual std::vector<bytes_opt> values(const query_options& options) const = 0;

+    virtual bytes_opt value_for(const column_definition& cdef, const query_options& options) const {
+        throw exceptions::invalid_request_exception("Single value can be obtained from single-column restrictions only");
+    }
+
    /**
     * Returns <code>true</code> if one of the restrictions use the specified function.
     *
--- a/cql3/restrictions/single_column_primary_key_restrictions.hh
+++ b/cql3/restrictions/single_column_primary_key_restrictions.hh
@@ -49,6 +49,7 @@
 #include <boost/algorithm/cxx11/all_of.hpp>
 #include <boost/range/adaptor/transformed.hpp>
 #include <boost/range/adaptor/filtered.hpp>
+#include <boost/range/adaptor/map.hpp>

 namespace cql3 {

@@ -62,21 +63,46 @@ class single_column_primary_key_restrictions : public primary_key_restrictions<V
    using range_type = query::range<ValueType>;
    using range_bound = typename range_type::bound;
    using bounds_range_type = typename primary_key_restrictions<ValueType>::bounds_range_type;
+    template<typename OtherValueType>
+    friend class single_column_primary_key_restrictions;
 private:
    schema_ptr _schema;
+    bool _allow_filtering;
    ::shared_ptr<single_column_restrictions> _restrictions;
    bool _slice;
    bool _contains;
    bool _in;
 public:
-    single_column_primary_key_restrictions(schema_ptr schema)
+    single_column_primary_key_restrictions(schema_ptr schema, bool allow_filtering)
        : _schema(schema)
+        , _allow_filtering(allow_filtering)
        , _restrictions(::make_shared<single_column_restrictions>(schema))
        , _slice(false)
        , _contains(false)
        , _in(false)
    { }

+    // Convert another primary key restrictions type into this type, possibly using a different schema
+    template<typename OtherValueType>
+    explicit single_column_primary_key_restrictions(schema_ptr schema, const single_column_primary_key_restrictions<OtherValueType>& other)
+        : _schema(schema)
+        , _allow_filtering(other._allow_filtering)
+        , _restrictions(::make_shared<single_column_restrictions>(schema))
+        , _slice(other._slice)
+        , _contains(other._contains)
+        , _in(other._in)
+    {
+        for (const auto& entry : other._restrictions->restrictions()) {
+            const column_definition* other_cdef = entry.first;
+            const column_definition* this_cdef = _schema->get_column_definition(other_cdef->name());
+            if (!this_cdef) {
+                throw exceptions::invalid_request_exception(sprint("Base column %s not found in view index schema", other_cdef->name_as_text()));
+            }
+            ::shared_ptr<single_column_restriction> restriction = entry.second;
+            _restrictions->add_restriction(restriction->apply_to(*this_cdef));
+        }
+    }
+
    virtual bool is_on_token() const override {
        return false;
    }
@@ -97,6 +123,10 @@ public:
        return _in;
    }

+    virtual bool is_all_eq() const override {
+        return _restrictions->is_all_eq();
+    }
+
    virtual bool has_bound(statements::bound b) const override {
        return boost::algorithm::all_of(_restrictions->restrictions(), [b] (auto&& r) { return r.second->has_bound(b); });
    }
@@ -110,7 +140,7 @@ public:
    }

    void do_merge_with(::shared_ptr<single_column_restriction> restriction) {
-        if (!_restrictions->empty()) {
+        if (!_restrictions->empty() && !_allow_filtering) {
            auto last_column = *_restrictions->last_column();
            auto new_column = restriction->get_column_def();

@@ -127,11 +157,6 @@ public:
                        last_column.name_as_text(), new_column.name_as_text()));
                }
            }
-
-            if (_in && _schema->position(new_column) > _schema->position(last_column)) {
-                throw exceptions::invalid_request_exception(sprint("Clustering column \"%s\" cannot be restricted by an IN relation",
-                    new_column.name_as_text()));
-            }
        }

        _slice |= restriction->is_slice();
@@ -140,6 +165,25 @@ public:
        _restrictions->add_restriction(restriction);
    }

+    virtual size_t prefix_size() const override {
+        return primary_key_restrictions<ValueType>::prefix_size(_schema);
+    }
+
+    ::shared_ptr<single_column_primary_key_restrictions<clustering_key>> get_longest_prefix_restrictions() {
+        static_assert(std::is_same_v<ValueType, clustering_key>, "Only clustering key can produce longest prefix restrictions");
+        size_t current_prefix_size = prefix_size();
+        if (current_prefix_size == _restrictions->restrictions().size()) {
+            return dynamic_pointer_cast<single_column_primary_key_restrictions<clustering_key>>(this->shared_from_this());
+        }
+
+        auto longest_prefix_restrictions = ::make_shared<single_column_primary_key_restrictions<clustering_key>>(_schema, _allow_filtering);
+        auto restriction_it = _restrictions->restrictions().begin();
+        for (size_t i = 0; i < current_prefix_size; ++i) {
+            longest_prefix_restrictions->merge_with((restriction_it++)->second);
+        }
+        return longest_prefix_restrictions;
+    }
+
    virtual void merge_with(::shared_ptr<restriction> restriction) override {
        if (restriction->is_multi_column()) {
            throw exceptions::invalid_request_exception(
@@ -312,11 +356,20 @@ public:
        }
        return res;
    }
+
+    virtual bytes_opt value_for(const column_definition& cdef, const query_options& options) const override {
+        return _restrictions->value_for(cdef, options);
+    }
+
    std::vector<bytes_opt> bounds(statements::bound b, const query_options& options) const override {
        // TODO: if this proved to be required.
        fail(unimplemented::cause::LEGACY_COMPOSITE_KEYS); // not 100% correct...
    }

+    const single_column_restrictions::restrictions_map& restrictions() const {
+        return _restrictions->restrictions();
+    }
+
    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
        return _restrictions->has_supporting_index(index_manager);
    }
@@ -352,10 +405,13 @@ public:
            _restrictions->restrictions() | boost::adaptors::map_values,
            [&] (auto&& r) { return r->is_satisfied_by(schema, key, ckey, cells, options, now); });
    }
+
+    virtual bool needs_filtering(const schema& schema) const override;
+    virtual unsigned int num_prefix_columns_that_need_not_be_filtered() const override;
 };

 template<>
-dht::partition_range_vector
+inline dht::partition_range_vector
 single_column_primary_key_restrictions<partition_key>::bounds_ranges(const query_options& options) const {
    dht::partition_range_vector ranges;
    ranges.reserve(size());
@@ -373,7 +429,7 @@ single_column_primary_key_restrictions<partition_key>::bounds_ranges(const query
 }

 template<>
-std::vector<query::clustering_range>
+inline std::vector<query::clustering_range>
 single_column_primary_key_restrictions<clustering_key_prefix>::bounds_ranges(const query_options& options) const {
    auto wrapping_bounds = compute_bounds(options);
    auto bounds = boost::copy_range<query::clustering_row_ranges>(wrapping_bounds
@@ -409,6 +465,62 @@ single_column_primary_key_restrictions<clustering_key_prefix>::bounds_ranges(con
    return bounds;
 }

+template<>
+inline bool single_column_primary_key_restrictions<partition_key>::needs_filtering(const schema& schema) const {
+    return primary_key_restrictions<partition_key>::needs_filtering(schema);
+}
+
+template<>
+inline bool single_column_primary_key_restrictions<clustering_key>::needs_filtering(const schema& schema) const {
+    // Restrictions currently need filtering in three cases:
+    // 1. any of them is a CONTAINS restriction
+    // 2. restrictions do not form a contiguous prefix (i.e. there are gaps in it)
+    // 3. a SLICE restriction is not in the last position
+    column_id position = 0;
+    for (const auto& restriction : _restrictions->restrictions() | boost::adaptors::map_values) {
+        if (restriction->is_contains() || position != restriction->get_column_def().id) {
+            return true;
+        }
+        if (!restriction->is_slice()) {
+            position = restriction->get_column_def().id + 1;
+        }
+    }
+    return false;
+}
+
+// How many of the restrictions (in column order) do not need filtering
+// because they are implemented as a slice (potentially, a contiguous disk
+// read). For example, if we have the filter "c1 < 3 and c2 > 3", c1 does not
+// need filtering but c2 does so num_prefix_columns_that_need_not_be_filtered
+// will be 1.
+// The implementation of num_prefix_columns_that_need_not_be_filtered() is
+// closely tied to that of needs_filtering() above - basically, if only the
+// first num_prefix_columns_that_need_not_be_filtered() restrictions existed,
+// then needs_filtering() would have returned false.
+template<>
+inline unsigned single_column_primary_key_restrictions<clustering_key>::num_prefix_columns_that_need_not_be_filtered() const {
+    column_id position = 0;
+    unsigned int count = 0;
+    for (const auto& restriction : _restrictions->restrictions() | boost::adaptors::map_values) {
+        if (restriction->is_contains() || position != restriction->get_column_def().id) {
+            return count;
+        }
+        if (!restriction->is_slice()) {
+            position = restriction->get_column_def().id + 1;
+        }
+        count++;
+    }
+    return count;
+}
+
+template<>
+inline unsigned single_column_primary_key_restrictions<partition_key>::num_prefix_columns_that_need_not_be_filtered() const {
+    // skip_filtering() is currently called only for clustering key
+    // restrictions, so it doesn't matter what we return here.
+    return 0;
+}
+
+
 }
 }
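Note on the clustering-key needs_filtering() and num_prefix_columns_that_need_not_be_filtered() above: the sketch below mirrors their loop on a simplified model so the relationship between the two is easier to check. The restricted_column struct is a hypothetical stand-in for a single column restriction, and the input is assumed to be sorted by column id.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one single-column restriction on a clustering column.
struct restricted_column {
    unsigned id;       // position of the column within the clustering key
    bool is_slice;     // e.g. c < 3
    bool is_contains;  // CONTAINS / CONTAINS KEY
};

inline bool sorted_by_id(const std::vector<restricted_column>& rs) {
    return std::is_sorted(rs.begin(), rs.end(),
        [] (const restricted_column& a, const restricted_column& b) { return a.id < b.id; });
}

// True if the restrictions cannot be served by a single contiguous read.
bool needs_filtering(const std::vector<restricted_column>& rs) {
    assert(sorted_by_id(rs));
    unsigned position = 0;
    for (const auto& r : rs) {
        if (r.is_contains || position != r.id) {
            return true;
        }
        if (!r.is_slice) {
            position = r.id + 1;  // only an equality lets the next column continue the prefix
        }
    }
    return false;
}

// Number of leading restrictions that would not have triggered filtering.
unsigned num_prefix_columns_that_need_not_be_filtered(const std::vector<restricted_column>& rs) {
    assert(sorted_by_id(rs));
    unsigned position = 0;
    unsigned count = 0;
    for (const auto& r : rs) {
        if (r.is_contains || position != r.id) {
            return count;
        }
        if (!r.is_slice) {
            position = r.id + 1;
        }
        ++count;
    }
    return count;
}
```

For the filter "c1 < 3 and c2 > 3" from the comment, modelled as two slices on ids 0 and 1, the first function reports that filtering is needed and the second returns 1.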

--- a/cql3/restrictions/single_column_restriction.hh
+++ b/cql3/restrictions/single_column_restriction.hh
@@ -93,6 +93,9 @@ public:
    }

    virtual bool is_supported_by(const secondary_index::index& index) const = 0;
+    using abstract_restriction::is_satisfied_by;
+    virtual bool is_satisfied_by(bytes_view data, const query_options& options) const = 0;
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) = 0;
 #if 0
    /**
     * Check if this type of restriction is supported by the specified index.
@@ -113,7 +116,7 @@ public:
    class contains;

 protected:
-    bytes_view_opt get_value(const schema& schema,
+    std::optional<atomic_cell_value_view> get_value(const schema& schema,
            const partition_key& key,
            const clustering_key_prefix& ckey,
            const row& cells,
@@ -166,6 +169,10 @@ public:
                                 const row& cells,
                                 const query_options& options,
                                 gc_clock::time_point now) const override;
+    virtual bool is_satisfied_by(bytes_view data, const query_options& options) const override;
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) override {
+        return ::make_shared<EQ>(cdef, _value);
+    }

 #if 0
        @Override
@@ -201,6 +208,10 @@ public:
                                 const row& cells,
                                 const query_options& options,
                                 gc_clock::time_point now) const override;
+    virtual bool is_satisfied_by(bytes_view data, const query_options& options) const override;
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) override {
+        throw std::logic_error("IN superclass should never be cloned directly");
+    }

    virtual std::vector<bytes_opt> values_raw(const query_options& options) const = 0;

@@ -243,6 +254,10 @@ public:
    virtual sstring to_string() const override {
        return sprint("IN(%s)", std::to_string(_values));
    }
+
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) override {
+        return ::make_shared<IN_with_values>(cdef, _values);
+    }
 };

 class single_column_restriction::IN_with_marker : public IN {
@@ -268,6 +283,10 @@ public:
    virtual sstring to_string() const override {
        return "IN ?";
    }
+
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) override {
+        return ::make_shared<IN_with_marker>(cdef, _marker);
+    }
 };

 class single_column_restriction::slice : public single_column_restriction {
@@ -279,6 +298,11 @@ public:
        , _slice(term_slice::new_instance(bound, inclusive, std::move(term)))
    { }

+    slice(const column_definition& column_def, term_slice slice)
+        : single_column_restriction(column_def)
+        , _slice(slice)
+    { }
+
    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override {
        return (_slice.has_bound(statements::bound::START) && abstract_restriction::term_uses_function(_slice.bound(statements::bound::START), ks_name, function_name))
                || (_slice.has_bound(statements::bound::END) && abstract_restriction::term_uses_function(_slice.bound(statements::bound::END), ks_name, function_name));
@@ -364,6 +388,10 @@ public:
                                 const row& cells,
                                 const query_options& options,
                                 gc_clock::time_point now) const override;
+    virtual bool is_satisfied_by(bytes_view data, const query_options& options) const override;
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) override {
+        return ::make_shared<slice>(cdef, _slice);
+    }
 };

 // This holds CONTAINS, CONTAINS_KEY, and map[key] = value restrictions because we might want to have any combination of them.
@@ -485,6 +513,10 @@ public:
                                 const row& cells,
                                 const query_options& options,
                                 gc_clock::time_point now) const override;
+    virtual bool is_satisfied_by(bytes_view data, const query_options& options) const override;
+    virtual ::shared_ptr<single_column_restriction> apply_to(const column_definition& cdef) override {
+        throw std::logic_error("Cloning 'contains' restriction is not implemented.");
+    }

 #if 0
        private List<ByteBuffer> keys(const query_options& options) {
--- a/cql3/restrictions/single_column_restrictions.hh
+++ b/cql3/restrictions/single_column_restrictions.hh
@@ -111,6 +111,11 @@ public:
        return r;
    }

+    virtual bytes_opt value_for(const column_definition& cdef, const query_options& options) const override {
+        auto it = _restrictions.find(std::addressof(cdef));
+        return (it != _restrictions.end()) ? it->second->value(options) : bytes_opt{};
+    }
+
    /**
     * Returns the restriction associated to the specified column.
     *
--- a/cql3/restrictions/statement_restrictions.cc
+++ b/cql3/restrictions/statement_restrictions.cc
@@ -23,6 +23,7 @@
 #include <boost/range/algorithm/transform.hpp>
 #include <boost/range/algorithm.hpp>
 #include <boost/range/adaptors.hpp>
+#include <boost/algorithm/cxx11/any_of.hpp>

 #include "statement_restrictions.hh"
 #include "single_column_primary_key_restrictions.hh"
@@ -36,19 +37,24 @@
 namespace cql3 {
 namespace restrictions {

+static logging::logger rlogger("restrictions");
+
 using boost::adaptors::filtered;
 using boost::adaptors::transformed;

 template<typename T>
 class statement_restrictions::initial_key_restrictions : public primary_key_restrictions<T> {
+    bool _allow_filtering;
 public:
+    initial_key_restrictions(bool allow_filtering)
+        : _allow_filtering(allow_filtering) {}
    using bounds_range_type = typename primary_key_restrictions<T>::bounds_range_type;

    ::shared_ptr<primary_key_restrictions<T>> do_merge_to(schema_ptr schema, ::shared_ptr<restriction> restriction) const {
        if (restriction->is_multi_column()) {
            throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
        }
-        return ::make_shared<single_column_primary_key_restrictions<T>>(schema)->merge_to(schema, restriction);
+        return ::make_shared<single_column_primary_key_restrictions<T>>(schema, _allow_filtering)->merge_to(schema, restriction);
    }
    ::shared_ptr<primary_key_restrictions<T>> merge_to(schema_ptr schema, ::shared_ptr<restriction> restriction) override {
        if (restriction->is_multi_column()) {
@@ -57,7 +63,7 @@ public:
        if (restriction->is_on_token()) {
            return static_pointer_cast<token_restriction>(restriction);
        }
-        return ::make_shared<single_column_primary_key_restrictions<T>>(schema)->merge_to(restriction);
+        return ::make_shared<single_column_primary_key_restrictions<T>>(schema, _allow_filtering)->merge_to(restriction);
    }
    void merge_with(::shared_ptr<restriction> restriction) override {
        throw exceptions::unsupported_operation_exception();
@@ -66,6 +72,9 @@ public:
        // throw? should not reach?
        return {};
    }
+    bytes_opt value_for(const column_definition& cdef, const query_options& options) const override {
+        return {};
+    }
    std::vector<T> values_as_keys(const query_options& options) const override {
        // throw? should not reach?
        return {};
@@ -122,9 +131,10 @@ statement_restrictions::initial_key_restrictions<clustering_key_prefix>::merge_t
 }

 template<typename T>
-::shared_ptr<primary_key_restrictions<T>> statement_restrictions::get_initial_key_restrictions() {
-    static thread_local ::shared_ptr<primary_key_restrictions<T>> initial_kr = ::make_shared<initial_key_restrictions<T>>();
-    return initial_kr;
+::shared_ptr<primary_key_restrictions<T>> statement_restrictions::get_initial_key_restrictions(bool allow_filtering) {
+    static thread_local ::shared_ptr<primary_key_restrictions<T>> initial_kr_true = ::make_shared<initial_key_restrictions<T>>(true);
+    static thread_local ::shared_ptr<primary_key_restrictions<T>> initial_kr_false = ::make_shared<initial_key_restrictions<T>>(false);
+    return allow_filtering ? initial_kr_true : initial_kr_false;
 }

 std::vector<::shared_ptr<column_identifier>>
@@ -141,10 +151,10 @@ statement_restrictions::get_partition_key_unrestricted_components() const {
    return r;
 }

-statement_restrictions::statement_restrictions(schema_ptr schema)
+statement_restrictions::statement_restrictions(schema_ptr schema, bool allow_filtering)
    : _schema(schema)
-    , _partition_key_restrictions(get_initial_key_restrictions<partition_key>())
-    , _clustering_columns_restrictions(get_initial_key_restrictions<clustering_key_prefix>())
+    , _partition_key_restrictions(get_initial_key_restrictions<partition_key>(allow_filtering))
+    , _clustering_columns_restrictions(get_initial_key_restrictions<clustering_key_prefix>(allow_filtering))
    , _nonprimary_key_restrictions(::make_shared<single_column_restrictions>(schema))
 { }
 #if 0
@@ -162,8 +172,9 @@ statement_restrictions::statement_restrictions(database& db,
        ::shared_ptr<variable_specifications> bound_names,
        bool selects_only_static_columns,
        bool select_a_collection,
-        bool for_view)
-    : statement_restrictions(schema)
+        bool for_view,
+        bool allow_filtering)
+    : statement_restrictions(schema, allow_filtering)
 {
    /*
     * WHERE clause. For a given entity, rules are: - EQ relation conflicts with anything else (including a 2nd EQ)
@@ -197,23 +208,22 @@ statement_restrictions::statement_restrictions(database& db,
                    throw exceptions::invalid_request_exception(sprint("restriction '%s' is only supported in materialized view creation", relation->to_string()));
                }
            } else {
-                add_restriction(relation->to_restriction(db, schema, bound_names));
+                add_restriction(relation->to_restriction(db, schema, bound_names), for_view, allow_filtering);
            }
        }
    }
    auto& cf = db.find_column_family(schema);
    auto& sim = cf.get_index_manager();
-    bool has_queriable_clustering_column_index = _clustering_columns_restrictions->has_supporting_index(sim);
-    bool has_queriable_index = has_queriable_clustering_column_index
-            || _partition_key_restrictions->has_supporting_index(sim)
-            || _nonprimary_key_restrictions->has_supporting_index(sim);
+    const bool has_queriable_clustering_column_index = _clustering_columns_restrictions->has_supporting_index(sim);
+    const bool has_queriable_pk_index = _partition_key_restrictions->has_supporting_index(sim);
+    const bool has_queriable_regular_index = _nonprimary_key_restrictions->has_supporting_index(sim);

    // At this point, the select statement if fully constructed, but we still have a few things to validate
-    process_partition_key_restrictions(has_queriable_index, for_view);
+    process_partition_key_restrictions(has_queriable_pk_index, for_view, allow_filtering);

    // Some but not all of the partition key columns have been specified;
    // hence we need turn these restrictions into index expressions.
-    if (_uses_secondary_indexing) {
+    if (_uses_secondary_indexing || _partition_key_restrictions->needs_filtering(*_schema)) {
        _index_restrictions.push_back(_partition_key_restrictions);
    }

@@ -229,13 +239,14 @@ statement_restrictions::statement_restrictions(database& db,
        }
    }

-    process_clustering_columns_restrictions(has_queriable_index, select_a_collection, for_view);
+    process_clustering_columns_restrictions(has_queriable_clustering_column_index, select_a_collection, for_view, allow_filtering);

    // Covers indexes on the first clustering column (among others).
-    if (_is_key_range && has_queriable_clustering_column_index)
-    _uses_secondary_indexing = true;
+    if (_is_key_range && has_queriable_clustering_column_index) {
+        _uses_secondary_indexing = true;
+    }

-    if (_uses_secondary_indexing) {
+    if (_uses_secondary_indexing || _clustering_columns_restrictions->needs_filtering(*_schema)) {
        _index_restrictions.push_back(_clustering_columns_restrictions);
    } else if (_clustering_columns_restrictions->is_contains()) {
        fail(unimplemented::cause::INDEXES);
@@ -264,31 +275,48 @@ statement_restrictions::statement_restrictions(database& db,
        uses_secondary_indexing = true;
 #endif
    }
-    // Even if uses_secondary_indexing is false at this point, we'll still have to use one if
-    // there is restrictions not covered by the PK.
+
    if (!_nonprimary_key_restrictions->empty()) {
-        _uses_secondary_indexing = true;
+        if (has_queriable_regular_index) {
+            _uses_secondary_indexing = true;
+        } else if (!allow_filtering) {
+            throw exceptions::invalid_request_exception("Cannot execute this query as it might involve data filtering and "
+                "thus may have unpredictable performance. If you want to execute "
+                "this query despite the performance unpredictability, use ALLOW FILTERING");
+        }
        _index_restrictions.push_back(_nonprimary_key_restrictions);
    }

-    if (_uses_secondary_indexing && !for_view) {
+    if (_uses_secondary_indexing && !(for_view || allow_filtering)) {
        validate_secondary_index_selections(selects_only_static_columns);
    }
 }

-void statement_restrictions::add_restriction(::shared_ptr<restriction> restriction) {
+void statement_restrictions::add_restriction(::shared_ptr<restriction> restriction, bool for_view, bool allow_filtering) {
    if (restriction->is_multi_column()) {
        _clustering_columns_restrictions = _clustering_columns_restrictions->merge_to(_schema, restriction);
    } else if (restriction->is_on_token()) {
        _partition_key_restrictions = _partition_key_restrictions->merge_to(_schema, restriction);
    } else {
-        add_single_column_restriction(::static_pointer_cast<single_column_restriction>(restriction));
+        add_single_column_restriction(::static_pointer_cast<single_column_restriction>(restriction), for_view, allow_filtering);
    }
 }

-void statement_restrictions::add_single_column_restriction(::shared_ptr<single_column_restriction> restriction) {
+void statement_restrictions::add_single_column_restriction(::shared_ptr<single_column_restriction> restriction, bool for_view, bool allow_filtering) {
    auto& def = restriction->get_column_def();
    if (def.is_partition_key()) {
+        // A SELECT query may not request a slice (range) of partition keys
+        // without using token(). This is because there is no way to do this
+        // query efficiently: murmur3 turns a contiguous range of partition
+        // keys into tokens all over the token space.
+        // However, in a SELECT statement used to define a materialized view,
+        // such a slice is fine - it is used to check whether individual
+        // partitions match, and does not present a performance problem.
+        assert(!restriction->is_on_token());
+        if (restriction->is_slice() && !for_view && !allow_filtering) {
+            throw exceptions::invalid_request_exception(
+                    "Only EQ and IN relation are supported on the partition key (unless you use the token() function or allow filtering)");
+        }
        _partition_key_restrictions = _partition_key_restrictions->merge_to(_schema, restriction);
    } else if (def.is_clustering_key()) {
        _clustering_columns_restrictions = _clustering_columns_restrictions->merge_to(_schema, restriction);
@@ -307,7 +335,54 @@ const std::vector<::shared_ptr<restrictions>>& statement_restrictions::index_res
    return _index_restrictions;
 }

-void statement_restrictions::process_partition_key_restrictions(bool has_queriable_index, bool for_view) {
+std::optional<secondary_index::index> statement_restrictions::find_idx(secondary_index::secondary_index_manager& sim) const {
+    for (::shared_ptr<cql3::restrictions::restrictions> restriction : index_restrictions()) {
+        for (const auto& cdef : restriction->get_column_defs()) {
+            for (auto index : sim.list_indexes()) {
+                if (index.depends_on(*cdef)) {
+                    return std::make_optional<secondary_index::index>(std::move(index));
+                }
+            }
+        }
+    }
+    return std::nullopt;
+}
+
+std::vector<const column_definition*> statement_restrictions::get_column_defs_for_filtering(database& db) const {
+    std::vector<const column_definition*> column_defs_for_filtering;
+    if (need_filtering()) {
+        auto& sim = db.find_column_family(_schema).get_index_manager();
+        std::optional<secondary_index::index> opt_idx = find_idx(sim);
+        auto column_uses_indexing = [&opt_idx] (const column_definition* cdef) {
+            return opt_idx && opt_idx->depends_on(*cdef);
+        };
+        if (_partition_key_restrictions->needs_filtering(*_schema)) {
+            for (auto&& cdef : _partition_key_restrictions->get_column_defs()) {
+                if (!column_uses_indexing(cdef)) {
+                    column_defs_for_filtering.emplace_back(cdef);
+                }
+            }
+        }
+        const bool pk_has_unrestricted_components = _partition_key_restrictions->has_unrestricted_components(*_schema);
+        if (pk_has_unrestricted_components || _clustering_columns_restrictions->needs_filtering(*_schema)) {
+            column_id first_filtering_id = pk_has_unrestricted_components ? 0 : _schema->clustering_key_columns().begin()->id +
+                    _clustering_columns_restrictions->num_prefix_columns_that_need_not_be_filtered();
+            for (auto&& cdef : _clustering_columns_restrictions->get_column_defs()) {
+                if (cdef->id >= first_filtering_id && !column_uses_indexing(cdef)) {
+                    column_defs_for_filtering.emplace_back(cdef);
+                }
+            }
+        }
+        for (auto&& cdef : _nonprimary_key_restrictions->get_column_defs()) {
+            if (!column_uses_indexing(cdef)) {
+                column_defs_for_filtering.emplace_back(cdef);
+            }
+        }
+    }
+    return column_defs_for_filtering;
+}
+
+void statement_restrictions::process_partition_key_restrictions(bool has_queriable_index, bool for_view, bool allow_filtering) {
    // If there is a queriable index, no special condition are required on the other restrictions.
    // But we still need to know 2 things:
    // - If we don't have a queriable index, is the query ok
@@ -316,28 +391,32 @@ void statement_restrictions::process_partition_key_restrictions(bool has_queriab
    // components must have a EQ. Only the last partition key component can be in IN relation.
    if (_partition_key_restrictions->is_on_token()) {
        _is_key_range = true;
-    } else if (has_partition_key_unrestricted_components()) {
-        if (!_partition_key_restrictions->empty() && !for_view) {
-            if (!has_queriable_index) {
-                throw exceptions::invalid_request_exception(sprint("Partition key parts: %s must be restricted as other parts are",
-                    join(", ", get_partition_key_unrestricted_components())));
-            }
-        }
-
+    } else if (_partition_key_restrictions->has_unrestricted_components(*_schema)) {
        _is_key_range = true;
        _uses_secondary_indexing = has_queriable_index;
    }
+
+    if (_partition_key_restrictions->needs_filtering(*_schema)) {
+        if (!allow_filtering && !for_view && !has_queriable_index) {
+            throw exceptions::invalid_request_exception("Cannot execute this query as it might involve data filtering and "
+                "thus may have unpredictable performance. If you want to execute "
+                "this query despite the performance unpredictability, use ALLOW FILTERING");
+        }
+        _is_key_range = true;
+        _uses_secondary_indexing = has_queriable_index;
+    }
+
 }

 bool statement_restrictions::has_partition_key_unrestricted_components() const {
-    return _partition_key_restrictions->size() < _schema->partition_key_size();
+    return _partition_key_restrictions->has_unrestricted_components(*_schema);
 }

 bool statement_restrictions::has_unrestricted_clustering_columns() const {
-    return _clustering_columns_restrictions->size() < _schema->clustering_key_size();
+    return _clustering_columns_restrictions->has_unrestricted_components(*_schema);
 }

-void statement_restrictions::process_clustering_columns_restrictions(bool has_queriable_index, bool select_a_collection, bool for_view) {
+void statement_restrictions::process_clustering_columns_restrictions(bool has_queriable_index, bool select_a_collection, bool for_view, bool allow_filtering) {
    if (!has_clustering_columns_restriction()) {
        return;
    }
@@ -346,38 +425,36 @@ void statement_restrictions::process_clustering_columns_restrictions(bool has_qu
        throw exceptions::invalid_request_exception(
            "Cannot restrict clustering columns by IN relations when a collection is selected by the query");
    }
-    if (_clustering_columns_restrictions->is_contains() && !has_queriable_index) {
+    if (_clustering_columns_restrictions->is_contains() && !has_queriable_index && !allow_filtering) {
        throw exceptions::invalid_request_exception(
-            "Cannot restrict clustering columns by a CONTAINS relation without a secondary index");
+            "Cannot restrict clustering columns by a CONTAINS relation without a secondary index or filtering");
    }

-    auto clustering_columns_iter = _schema->clustering_key_columns().begin();
-
-    for (auto&& restricted_column : _clustering_columns_restrictions->get_column_defs()) {
-        const column_definition* clustering_column = &(*clustering_columns_iter);
-        ++clustering_columns_iter;
-
-        if (clustering_column != restricted_column && !for_view) {
-            if (!has_queriable_index) {
-                throw exceptions::invalid_request_exception(sprint(
-                    "PRIMARY KEY column \"%s\" cannot be restricted as preceding column \"%s\" is not restricted",
-                    restricted_column->name_as_text(), clustering_column->name_as_text()));
+    if (has_clustering_columns_restriction() && _clustering_columns_restrictions->needs_filtering(*_schema)) {
+        if (has_queriable_index) {
+            _uses_secondary_indexing = true;
+        } else if (!allow_filtering && !for_view) {
+            auto clustering_columns_iter = _schema->clustering_key_columns().begin();
+            for (auto&& restricted_column : _clustering_columns_restrictions->get_column_defs()) {
+                const column_definition* clustering_column = &(*clustering_columns_iter);
+                ++clustering_columns_iter;
+                if (clustering_column != restricted_column) {
+                    throw exceptions::invalid_request_exception(sprint(
+                        "PRIMARY KEY column \"%s\" cannot be restricted as preceding column \"%s\" is not restricted",
+                        restricted_column->name_as_text(), clustering_column->name_as_text()));
+                }
            }
-
-            _uses_secondary_indexing = true; // handle gaps and non-keyrange cases.
-            break;
        }
    }
-
-    if (_clustering_columns_restrictions->is_contains()) {
-        _uses_secondary_indexing = true;
-    }
 }

 dht::partition_range_vector statement_restrictions::get_partition_key_ranges(const query_options& options) const {
    if (_partition_key_restrictions->empty()) {
        return {dht::partition_range::make_open_ended_both_sides()};
    }
+    if (_partition_key_restrictions->needs_filtering(*_schema)) {
+        return {dht::partition_range::make_open_ended_both_sides()};
+    }
    return _partition_key_restrictions->bounds_ranges(options);
 }

@@ -385,18 +462,40 @@ std::vector<query::clustering_range> statement_restrictions::get_clustering_boun
    if (_clustering_columns_restrictions->empty()) {
        return {query::clustering_range::make_open_ended_both_sides()};
    }
+    if (_clustering_columns_restrictions->needs_filtering(*_schema)) {
+        if (auto single_ck_restrictions = dynamic_pointer_cast<single_column_primary_key_restrictions<clustering_key>>(_clustering_columns_restrictions)) {
+            return single_ck_restrictions->get_longest_prefix_restrictions()->bounds_ranges(options);
+        }
+        return {query::clustering_range::make_open_ended_both_sides()};
+    }
    return _clustering_columns_restrictions->bounds_ranges(options);
 }

-bool statement_restrictions::need_filtering() {
-    uint32_t number_of_restricted_columns = 0;
+bool statement_restrictions::need_filtering() const {
+    uint32_t number_of_restricted_columns_for_indexing = 0;
    for (auto&& restrictions : _index_restrictions) {
-        number_of_restricted_columns += restrictions->size();
+        number_of_restricted_columns_for_indexing += restrictions->size();
    }

-    return number_of_restricted_columns > 1
-           || (number_of_restricted_columns == 0 && has_clustering_columns_restriction())
-           || (number_of_restricted_columns != 0 && _nonprimary_key_restrictions->has_multiple_contains());
+    int number_of_filtering_restrictions = _nonprimary_key_restrictions->size();
+    // If the whole partition key is restricted, it does not imply filtering
+    if (_partition_key_restrictions->has_unrestricted_components(*_schema) || !_partition_key_restrictions->is_all_eq()) {
+        number_of_filtering_restrictions += _partition_key_restrictions->size() + _clustering_columns_restrictions->size();
+    } else if (_clustering_columns_restrictions->has_unrestricted_components(*_schema)) {
+        number_of_filtering_restrictions += _clustering_columns_restrictions->size() - _clustering_columns_restrictions->prefix_size();
+    }
+
+    if (_partition_key_restrictions->is_multi_column() || _clustering_columns_restrictions->is_multi_column()) {
+        // TODO(sarna): Implement ALLOW FILTERING support for multi-column restrictions - return false for now
+        // in order to ensure backwards compatibility
+        return false;
+    }
+
+    return number_of_restricted_columns_for_indexing > 1
+            || (number_of_restricted_columns_for_indexing == 0 && _partition_key_restrictions->empty() && !_clustering_columns_restrictions->empty())
+            || (number_of_restricted_columns_for_indexing != 0 && _nonprimary_key_restrictions->has_multiple_contains())
+            || (number_of_restricted_columns_for_indexing != 0 && !_uses_secondary_indexing)
+            || (_uses_secondary_indexing && number_of_filtering_restrictions > 1);
 }

 void statement_restrictions::validate_secondary_index_selections(bool selects_only_static_columns) {
@@ -414,7 +513,34 @@ void statement_restrictions::validate_secondary_index_selections(bool selects_on
    }
 }

-static bytes_view_opt do_get_value(const schema& schema,
+const single_column_restrictions::restrictions_map& statement_restrictions::get_single_column_partition_key_restrictions() const {
+    static single_column_restrictions::restrictions_map empty;
+    auto single_restrictions = dynamic_pointer_cast<single_column_primary_key_restrictions<partition_key>>(_partition_key_restrictions);
+    if (!single_restrictions) {
+        if (dynamic_pointer_cast<initial_key_restrictions<partition_key>>(_partition_key_restrictions)) {
+            return empty;
+        }
+        throw std::runtime_error("statement restrictions for multi-column partition key restrictions are not implemented yet");
+    }
+    return single_restrictions->restrictions();
+}
+
+/**
+ * @return clustering key restrictions split into single column restrictions (e.g. for filtering support).
+ */
+const single_column_restrictions::restrictions_map& statement_restrictions::get_single_column_clustering_key_restrictions() const {
+    static single_column_restrictions::restrictions_map empty;
+    auto single_restrictions = dynamic_pointer_cast<single_column_primary_key_restrictions<clustering_key>>(_clustering_columns_restrictions);
+    if (!single_restrictions) {
+        if (dynamic_pointer_cast<initial_key_restrictions<clustering_key>>(_clustering_columns_restrictions)) {
+            return empty;
+        }
+        throw std::runtime_error("statement restrictions for multi-column clustering key restrictions are not implemented yet");
+    }
+    return single_restrictions->restrictions();
+}
+
+static std::optional<atomic_cell_value_view> do_get_value(const schema& schema,
        const column_definition& cdef,
        const partition_key& key,
        const clustering_key_prefix& ckey,
@@ -422,21 +548,21 @@ static bytes_view_opt do_get_value(const schema& schema,
        gc_clock::time_point now) {
    switch(cdef.kind) {
        case column_kind::partition_key:
-            return key.get_component(schema, cdef.component_index());
+            return atomic_cell_value_view(key.get_component(schema, cdef.component_index()));
        case column_kind::clustering_key:
-            return ckey.get_component(schema, cdef.component_index());
+            return atomic_cell_value_view(ckey.get_component(schema, cdef.component_index()));
        default:
            auto cell = cells.find_cell(cdef.id);
            if (!cell) {
-                return stdx::nullopt;
+                return std::nullopt;
            }
            assert(cdef.is_atomic());
-            auto c = cell->as_atomic_cell();
-            return c.is_dead(now) ? stdx::nullopt : bytes_view_opt(c.value());
+            auto c = cell->as_atomic_cell(cdef);
+            return c.is_dead(now) ? std::nullopt : std::optional<atomic_cell_value_view>(c.value());
    }
 }

-bytes_view_opt single_column_restriction::get_value(const schema& schema,
+std::optional<atomic_cell_value_view> single_column_restriction::get_value(const schema& schema,
        const partition_key& key,
        const clustering_key_prefix& ckey,
        const row& cells,
@@ -456,11 +582,24 @@ bool single_column_restriction::EQ::is_satisfied_by(const schema& schema,
    auto operand = value(options);
    if (operand) {
        auto cell_value = get_value(schema, key, ckey, cells, now);
-        return cell_value && _column_def.type->compare(*operand, *cell_value) == 0;
+        if (!cell_value) {
+            return false;
+        }
+        return cell_value->with_linearized([&] (bytes_view cell_value_bv) {
+            return _column_def.type->compare(*operand, cell_value_bv) == 0;
+        });
    }
    return false;
 }

+bool single_column_restriction::EQ::is_satisfied_by(bytes_view data, const query_options& options) const {
+    if (_column_def.type->is_counter()) {
+        fail(unimplemented::cause::COUNTERS);
+    }
+    auto operand = value(options);
+    return operand && _column_def.type->compare(*operand, data) == 0;
+}
+
 bool single_column_restriction::IN::is_satisfied_by(const schema& schema,
        const partition_key& key,
        const clustering_key_prefix& ckey,
@@ -475,8 +614,20 @@ bool single_column_restriction::IN::is_satisfied_by(const schema& schema,
        return false;
    }
    auto operands = values(options);
+  return cell_value->with_linearized([&] (bytes_view cell_value_bv) {
    return std::any_of(operands.begin(), operands.end(), [&] (auto&& operand) {
-        return operand && _column_def.type->compare(*operand, *cell_value) == 0;
+        return operand && _column_def.type->compare(*operand, cell_value_bv) == 0;
+    });
+  });
+}
+
+bool single_column_restriction::IN::is_satisfied_by(bytes_view data, const query_options& options) const {
+    if (_column_def.type->is_counter()) {
+        fail(unimplemented::cause::COUNTERS);
+    }
+    auto operands = values(options);
+    return boost::algorithm::any_of(operands, [this, &data] (const bytes_opt& operand) {
+        return operand && _column_def.type->compare(*operand, data) == 0;
    });
 }

@@ -490,7 +641,8 @@ static query::range<bytes_view> to_range(const term_slice& slice, const query_op
        if (!value) {
            return { };
        }
-        return { range_type::bound(*value, slice.is_inclusive(bound)) };
+        auto value_view = options.linearize(*value);
+        return { range_type::bound(value_view, slice.is_inclusive(bound)) };
    };
    return range_type(
        extract_bound(statements::bound::START),
@@ -510,7 +662,16 @@ bool single_column_restriction::slice::is_satisfied_by(const schema& schema,
    if (!cell_value) {
        return false;
    }
-    return to_range(_slice, options).contains(*cell_value, _column_def.type->as_tri_comparator());
+    return cell_value->with_linearized([&] (bytes_view cell_value_bv) {
+        return to_range(_slice, options).contains(cell_value_bv, _column_def.type->as_tri_comparator());
+    });
+}
+
+bool single_column_restriction::slice::is_satisfied_by(bytes_view data, const query_options& options) const {
+    if (_column_def.type->is_counter()) {
+        fail(unimplemented::cause::COUNTERS);
+    }
+    return to_range(_slice, options).contains(data, _column_def.type->underlying_type()->as_tri_comparator());
 }

 bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
@@ -536,7 +697,8 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
    auto&& element_type = col_type->is_set() ? col_type->name_comparator() : col_type->value_comparator();
    if (_column_def.type->is_multi_cell()) {
        auto cell = cells.find_cell(_column_def.id);
-        auto&& elements = col_type->deserialize_mutation_form(cell->as_collection_mutation()).cells;
+      return cell->as_collection_mutation().data.with_linearized([&] (bytes_view collection_bv) {
+        auto&& elements = col_type->deserialize_mutation_form(collection_bv).cells;
        auto end = std::remove_if(elements.begin(), elements.end(), [now] (auto&& element) {
            return element.second.is_dead(now);
        });
@@ -545,8 +707,12 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
            if (!val) {
                continue;
            }
-            auto found = std::find_if(elements.begin(), end, [&] (auto&& element) {
-                return element_type->compare(element.second.value(), *val) == 0;
+            auto found = with_linearized(*val, [&] (bytes_view bv) {
+              return std::find_if(elements.begin(), end, [&] (auto&& element) {
+                return element.second.value().with_linearized([&] (bytes_view value_bv) {
+                    return element_type->compare(value_bv, bv) == 0;
+                });
+              });
            });
            if (found == end) {
                return false;
@@ -557,8 +723,10 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
            if (!k) {
                continue;
            }
-            auto found = std::find_if(elements.begin(), end, [&] (auto&& element) {
-                return map_key_type->compare(element.first, *k) == 0;
+            auto found = with_linearized(*k, [&] (bytes_view bv) {
+              return std::find_if(elements.begin(), end, [&] (auto&& element) {
+                return map_key_type->compare(element.first, bv) == 0;
+              });
            });
            if (found == end) {
                return false;
@@ -570,27 +738,42 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
            if (!map_key || !map_value) {
                continue;
            }
-            auto found = std::find_if(elements.begin(), end, [&] (auto&& element) {
-                return map_key_type->compare(element.first, *map_key) == 0;
+            auto found = with_linearized(*map_key, [&] (bytes_view map_key_bv) {
+              return std::find_if(elements.begin(), end, [&] (auto&& element) {
+                return map_key_type->compare(element.first, map_key_bv) == 0;
+              });
            });
-            if (found == end || element_type->compare(found->second.value(), *map_value) != 0) {
+            if (found == end) {
+                return false;
+            }
+            auto cmp = with_linearized(*map_value, [&] (bytes_view map_value_bv) {
+              return found->second.value().with_linearized([&] (bytes_view value_bv) {
+                return element_type->compare(value_bv, map_value_bv);
+              });
+            });
+            if (cmp != 0) {
                return false;
            }
        }
+        return true;
+      });
    } else {
        auto cell_value = get_value(schema, key, ckey, cells, now);
        if (!cell_value) {
            return false;
        }
-        auto deserialized = _column_def.type->deserialize(*cell_value);
+        auto deserialized = cell_value->with_linearized([&] (bytes_view cell_value_bv) {
+            return _column_def.type->deserialize(cell_value_bv);
+        });
        for (auto&& value : _values) {
-            auto val = value->bind_and_get(options);
-            if (!val) {
+            auto fragmented_val = value->bind_and_get(options);
+            if (!fragmented_val) {
                continue;
            }
+          return with_linearized(*fragmented_val, [&] (bytes_view val) {
            auto exists_in = [&](auto&& range) {
                auto found = std::find_if(range.begin(), range.end(), [&] (auto&& element) {
-                    return element_type->compare(element.serialize(), *val) == 0;
+                    return element_type->compare(element.serialize(), val) == 0;
                });
                return found != range.end();
            };
@@ -608,6 +791,8 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
                    return false;
                }
            }
+            return true;
+          });
        }
        if (col_type->is_map()) {
            auto& data_map = value_cast<map_type_impl::native_type>(deserialized);
@@ -616,8 +801,10 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
                if (!k) {
                    continue;
                }
-                auto found = std::find_if(data_map.begin(), data_map.end(), [&] (auto&& element) {
-                    return map_key_type->compare(element.first.serialize(), *k) == 0;
+                auto found = with_linearized(*k, [&] (bytes_view k_bv) {
+                  return std::find_if(data_map.begin(), data_map.end(), [&] (auto&& element) {
+                    return map_key_type->compare(element.first.serialize(), k_bv) == 0;
+                  });
                });
                if (found == data_map.end()) {
                    return false;
@@ -629,10 +816,15 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
                if (!map_key || !map_value) {
                    continue;
                }
-                auto found = std::find_if(data_map.begin(), data_map.end(), [&] (auto&& element) {
-                    return map_key_type->compare(element.first.serialize(), *map_key) == 0;
+                auto found = with_linearized(*map_key, [&] (bytes_view map_key_bv) {
+                  return std::find_if(data_map.begin(), data_map.end(), [&] (auto&& element) {
+                    return map_key_type->compare(element.first.serialize(), map_key_bv) == 0;
+                  });
                });
-                if (found == data_map.end() || element_type->compare(found->second.serialize(), *map_value) != 0) {
+                if (found == data_map.end()
+                    || with_linearized(*map_value, [&] (bytes_view map_value_bv) {
+                         return element_type->compare(found->second.serialize(), map_value_bv);
+                       }) != 0) {
                    return false;
                }
            }
@@ -642,6 +834,11 @@ bool single_column_restriction::contains::is_satisfied_by(const schema& schema,
    return true;
 }

+bool single_column_restriction::contains::is_satisfied_by(bytes_view data, const query_options& options) const {
+    //TODO(sarna): Deserialize & return. It would be nice to deduplicate, is_satisfied_by above is rather long
+    fail(unimplemented::cause::INDEXES);
+}
+
 bool token_restriction::EQ::is_satisfied_by(const schema& schema,
        const partition_key& key,
        const clustering_key_prefix& ckey,
@@ -653,7 +850,9 @@ bool token_restriction::EQ::is_satisfied_by(const schema& schema,
    for (auto&& operand : values(options)) {
        if (operand) {
            auto cell_value = do_get_value(schema, **cdef, key, ckey, cells, now);
-            satisfied = cell_value && (*cdef)->type->compare(*operand, *cell_value) == 0;
+            satisfied = cell_value && cell_value->with_linearized([&] (bytes_view cell_value_bv) {
+                return (*cdef)->type->compare(*operand, cell_value_bv) == 0;
+            });
        }
        if (!satisfied) {
            break;
@@ -675,7 +874,9 @@ bool token_restriction::slice::is_satisfied_by(const schema& schema,
        if (!cell_value) {
            return false;
        }
-        satisfied = range.contains(*cell_value, cdef->type->as_tri_comparator());
+        satisfied = cell_value->with_linearized([&] (bytes_view cell_value_bv) {
+            return range.contains(cell_value_bv, cdef->type->as_tri_comparator());
+        });
        if (!satisfied) {
            break;
        }
--- a/cql3/restrictions/statement_restrictions.hh
+++ b/cql3/restrictions/statement_restrictions.hh
@@ -67,7 +67,7 @@ private:
    class initial_key_restrictions;

    template<typename T>
-    static ::shared_ptr<primary_key_restrictions<T>> get_initial_key_restrictions();
+    static ::shared_ptr<primary_key_restrictions<T>> get_initial_key_restrictions(bool allow_filtering);

    /**
     * Restrictions on partitioning columns
@@ -108,7 +108,7 @@ public:
     * @param cfm the column family meta data
     * @return a new empty <code>StatementRestrictions</code>.
     */
-    statement_restrictions(schema_ptr schema);
+    statement_restrictions(schema_ptr schema, bool allow_filtering);

    statement_restrictions(database& db,
        schema_ptr schema,
@@ -117,10 +117,11 @@ public:
        ::shared_ptr<variable_specifications> bound_names,
        bool selects_only_static_columns,
        bool select_a_collection,
-        bool for_view = false);
+        bool for_view = false,
+        bool allow_filtering = false);
 private:
-    void add_restriction(::shared_ptr<restriction> restriction);
-    void add_single_column_restriction(::shared_ptr<single_column_restriction> restriction);
+    void add_restriction(::shared_ptr<restriction> restriction, bool for_view, bool allow_filtering);
+    void add_single_column_restriction(::shared_ptr<single_column_restriction> restriction, bool for_view, bool allow_filtering);
 public:
    bool uses_function(const sstring& ks_name, const sstring& function_name) const;

@@ -162,6 +163,20 @@ public:
        return _clustering_columns_restrictions;
    }

+    /**
+     * Builds a possibly empty collection of column definitions that will be used for filtering
+     * @param db - the database context
+     * @return A list with the column definitions needed for filtering.
+     */
+    std::vector<const column_definition*> get_column_defs_for_filtering(database& db) const;
+
+    /**
+     * Determines the index to be used with the restriction.
+     * @param sim - the secondary index manager used to look up a usable index
+     * @return If an index can be used, an optional containing this index, otherwise an empty optional.
+     */
+    std::optional<secondary_index::index> find_idx(secondary_index::secondary_index_manager& sim) const;
+
    /**
     * Checks if the partition key has some unrestricted components.
     * @return <code>true</code> if the partition key has some unrestricted components, <code>false</code> otherwise.
@@ -174,7 +189,7 @@ public:
     */
    bool has_unrestricted_clustering_columns() const;
 private:
-    void process_partition_key_restrictions(bool has_queriable_index, bool for_view);
+    void process_partition_key_restrictions(bool has_queriable_index, bool for_view, bool allow_filtering);

    /**
     * Returns the partition key components that are not restricted.
@@ -189,7 +204,7 @@ private:
     * @param select_a_collection <code>true</code> if the query should return a collection column
     * @throws InvalidRequestException if the request is invalid
     */
-    void process_clustering_columns_restrictions(bool has_queriable_index, bool select_a_collection, bool for_view);
+    void process_clustering_columns_restrictions(bool has_queriable_index, bool select_a_collection, bool for_view, bool allow_filtering);

    /**
     * Returns the <code>Restrictions</code> for the specified type of columns.
@@ -357,7 +372,7 @@ public:
     * Checks if the query need to use filtering.
     * @return <code>true</code> if the query need to use filtering, <code>false</code> otherwise.
     */
-    bool need_filtering();
+    bool need_filtering() const;

    void validate_secondary_index_selections(bool selects_only_static_columns);

@@ -380,6 +395,14 @@ public:
        return !_nonprimary_key_restrictions->empty();
    }

+    bool pk_restrictions_need_filtering() const {
+        return _partition_key_restrictions->needs_filtering(*_schema);
+    }
+
+    bool ck_restrictions_need_filtering() const {
+        return _partition_key_restrictions->has_unrestricted_components(*_schema) || _clustering_columns_restrictions->needs_filtering(*_schema);
+    }
+
    /**
     * @return true if column is restricted by some restriction, false otherwise
     */
@@ -398,6 +421,16 @@ public:
    const single_column_restrictions::restrictions_map& get_non_pk_restriction() const {
        return _nonprimary_key_restrictions->restrictions();
    }
+
+    /**
+     * @return partition key restrictions split into single column restrictions (e.g. for filtering support).
+     */
+    const single_column_restrictions::restrictions_map& get_single_column_partition_key_restrictions() const;
+
+    /**
+     * @return clustering key restrictions split into single column restrictions (e.g. for filtering support).
+     */
+    const single_column_restrictions::restrictions_map& get_single_column_clustering_key_restrictions() const;
 };

 }
--- a/cql3/result_generator.hh
+++ b/cql3/result_generator.hh
@@ -0,0 +1,139 @@
+/*
+ * Copyright (C) 2018 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "selection/selection.hh"
+#include "stats.hh"
+
+namespace cql3 {
+
+class result_generator {
+    schema_ptr _schema;
+    foreign_ptr<lw_shared_ptr<query::result>> _result;
+    lw_shared_ptr<const query::read_command> _command;
+    shared_ptr<const selection::selection> _selection;
+    cql_stats* _stats;
+private:
+    template<typename Visitor>
+    class query_result_visitor {
+        const schema& _schema;
+        std::vector<bytes> _partition_key;
+        std::vector<bytes> _clustering_key;
+        uint32_t _partition_row_count = 0;
+        uint32_t _total_row_count = 0;
+        Visitor& _visitor;
+        const selection::selection& _selection;
+    private:
+        void accept_cell_value(const column_definition& def, query::result_row_view::iterator_type& i) {
+            if (def.is_multi_cell()) {
+                _visitor.accept_value(i.next_collection_cell());
+            } else {
+                auto cell = i.next_atomic_cell();
+                _visitor.accept_value(cell ? std::optional<query::result_bytes_view>(cell->value()) : std::optional<query::result_bytes_view>());
+            }
+        }
+    public:
+        query_result_visitor(const schema& s, Visitor& visitor, const selection::selection& select)
+            : _schema(s), _visitor(visitor), _selection(select) { }
+
+        void accept_new_partition(const partition_key& key, uint32_t row_count) {
+            _partition_key = key.explode(_schema);
+            accept_new_partition(row_count);
+        }
+        void accept_new_partition(uint32_t row_count) {
+            _partition_row_count = row_count;
+            _total_row_count += row_count;
+        }
+
+        void accept_new_row(const clustering_key& key, query::result_row_view static_row,
+                            query::result_row_view row) {
+            _clustering_key = key.explode(_schema);
+            accept_new_row(static_row, row);
+        }
+        void accept_new_row(query::result_row_view static_row, query::result_row_view row) {
+            auto static_row_iterator = static_row.iterator();
+            auto row_iterator = row.iterator();
+            _visitor.start_row();
+            for (auto&& def : _selection.get_columns()) {
+                switch (def->kind) {
+                case column_kind::partition_key:
+                    _visitor.accept_value(query::result_bytes_view(bytes_view(_partition_key[def->component_index()])));
+                    break;
+                case column_kind::clustering_key:
+                    if (_clustering_key.size() > def->component_index()) {
+                        _visitor.accept_value(query::result_bytes_view(bytes_view(_clustering_key[def->component_index()])));
+                    } else {
+                        _visitor.accept_value({});
+                    }
+                    break;
+                case column_kind::regular_column:
+                    accept_cell_value(*def, row_iterator);
+                    break;
+                case column_kind::static_column:
+                    accept_cell_value(*def, static_row_iterator);
+                    break;
+                }
+            }
+            _visitor.end_row();
+        }
+
+        void accept_partition_end(const query::result_row_view& static_row) {
+            if (_partition_row_count == 0) {
+                _total_row_count++;
+                _visitor.start_row();
+                auto static_row_iterator = static_row.iterator();
+                for (auto&& def : _selection.get_columns()) {
+                    if (def->is_partition_key()) {
+                        _visitor.accept_value(query::result_bytes_view(bytes_view(_partition_key[def->component_index()])));
+                    } else if (def->is_static()) {
+                        accept_cell_value(*def, static_row_iterator);
+                    } else {
+                        _visitor.accept_value({});
+                    }
+                }
+                _visitor.end_row();
+            }
+        }
+
+        uint32_t rows_read() const { return _total_row_count; }
+    };
+public:
+    result_generator() = default;
+
+    result_generator(schema_ptr s, foreign_ptr<lw_shared_ptr<query::result>> result, lw_shared_ptr<const query::read_command> cmd,
+                     ::shared_ptr<const selection::selection> select, cql_stats& stats)
+        : _schema(std::move(s))
+        , _result(std::move(result))
+        , _command(std::move(cmd))
+        , _selection(std::move(select))
+        , _stats(&stats)
+    { }
+
+    template<typename Visitor>
+    void visit(Visitor&& visitor) const {
+        query_result_visitor<Visitor> v(*_schema, visitor, *_selection);
+        query::result_view::consume(*_result, _command->slice, v);
+        _stats->rows_read += v.rows_read();
+    }
+};
+
+}
--- a/cql3/result_set.cc
+++ b/cql3/result_set.cc
@@ -45,27 +45,25 @@ namespace cql3 {

 metadata::metadata(std::vector<::shared_ptr<column_specification>> names_)
        : _flags(flag_enum_set())
-        , names(std::move(names_)) {
-    _column_count = names.size();
-}
+        , _column_info(make_lw_shared<column_info>(std::move(names_)))
+{ }

 metadata::metadata(flag_enum_set flags, std::vector<::shared_ptr<column_specification>> names_, uint32_t column_count,
        ::shared_ptr<const service::pager::paging_state> paging_state)
    : _flags(flags)
-    , names(std::move(names_))
-    , _column_count(column_count)
+    , _column_info(make_lw_shared<column_info>(std::move(names_), column_count))
    , _paging_state(std::move(paging_state))
 { }

 // The maximum number of values that the ResultSet can hold. This can be bigger than columnCount due to CASSANDRA-4911
 uint32_t metadata::value_count() const {
-    return _flags.contains<flag::NO_METADATA>() ? _column_count : names.size();
+    return _flags.contains<flag::NO_METADATA>() ? _column_info->_column_count : _column_info->_names.size();
 }

 void metadata::add_non_serialized_column(::shared_ptr<column_specification> name) {
    // See comment above. Because columnCount doesn't account the newly added name, it
    // won't be serialized.
-    names.emplace_back(std::move(name));
+    _column_info->_names.emplace_back(std::move(name));
 }

 bool metadata::all_in_same_cf() const {
@@ -73,18 +71,24 @@ bool metadata::all_in_same_cf() const {
        return false;
    }

-    return column_specification::all_in_same_table(names);
+    return column_specification::all_in_same_table(_column_info->_names);
 }

-void metadata::set_has_more_pages(::shared_ptr<const service::pager::paging_state> paging_state) {
-    if (!paging_state) {
-        return;
-    }
-
+void metadata::set_paging_state(::shared_ptr<const service::pager::paging_state> paging_state) {
    _flags.set<flag::HAS_MORE_PAGES>();
    _paging_state = std::move(paging_state);
 }

+void metadata::maybe_set_paging_state(::shared_ptr<const service::pager::paging_state> paging_state) {
+    assert(paging_state);
+    if (paging_state->get_remaining() > 0) {
+        set_paging_state(std::move(paging_state));
+    } else {
+        _flags.remove<flag::HAS_MORE_PAGES>();
+        _paging_state = nullptr;
+    }
+}
+
 void metadata::set_skip_metadata() {
    _flags.set<flag::NO_METADATA>();
 }
@@ -93,18 +97,10 @@ metadata::flag_enum_set metadata::flags() const {
    return _flags;
 }

-uint32_t metadata::column_count() const {
-    return _column_count;
-}
-
 ::shared_ptr<const service::pager::paging_state> metadata::paging_state() const {
    return _paging_state;
 }

-const std::vector<::shared_ptr<column_specification>>& metadata::get_names() const {
-    return names;
-}
-
 prepared_metadata::prepared_metadata(const std::vector<::shared_ptr<column_specification>>& names,
                                     const std::vector<uint16_t>& partition_key_bind_indices)
    : _names{names}
--- a/cql3/result_set.hh
+++ b/cql3/result_set.hh
@@ -47,6 +47,12 @@
 #include "service/pager/paging_state.hh"
 #include "schema.hh"

+#include "query-result-reader.hh"
+
+#include "result_generator.hh"
+
+#include <seastar/util/gcc6-concepts.hh>
+
 namespace cql3 {

 class metadata {
@@ -64,18 +70,29 @@ public:

    using flag_enum_set = enum_set<flag_enum>;

-private:
-    flag_enum_set _flags;
-
-public:
+    struct column_info {
    // Please note that columnCount can actually be smaller than names, even if names is not null. This is
    // used to include columns in the resultSet that we need to do post-query re-orderings
    // (SelectStatement.orderResults) but that shouldn't be sent to the user as they haven't been requested
    // (CASSANDRA-4911). So the serialization code will exclude any columns in name whose index is >= columnCount.
-    std::vector<::shared_ptr<column_specification>> names;
+        std::vector<::shared_ptr<column_specification>> _names;
+        uint32_t _column_count;
+
+        column_info(std::vector<::shared_ptr<column_specification>> names, uint32_t column_count)
+            : _names(std::move(names))
+            , _column_count(column_count)
+        { }
+
+        explicit column_info(std::vector<::shared_ptr<column_specification>> names)
+            : _names(std::move(names))
+            , _column_count(_names.size())
+        { }
+    };
+private:
+    flag_enum_set _flags;

 private:
-    uint32_t _column_count;
+    lw_shared_ptr<column_info> _column_info;
    ::shared_ptr<const service::pager::paging_state> _paging_state;

 public:
@@ -93,17 +110,20 @@ private:
    bool all_in_same_cf() const;

 public:
-    void set_has_more_pages(::shared_ptr<const service::pager::paging_state> paging_state);
+    void set_paging_state(::shared_ptr<const service::pager::paging_state> paging_state);
+    void maybe_set_paging_state(::shared_ptr<const service::pager::paging_state> paging_state);

    void set_skip_metadata();

    flag_enum_set flags() const;

-    uint32_t column_count() const;
+    uint32_t column_count() const { return _column_info->_column_count; }

    ::shared_ptr<const service::pager::paging_state> paging_state() const;

-    const std::vector<::shared_ptr<column_specification>>& get_names() const;
+    const std::vector<::shared_ptr<column_specification>>& get_names() const {
+        return _column_info->_names;
+    }
 };

 ::shared_ptr<const cql3::metadata> make_empty_metadata();
@@ -131,10 +151,22 @@ public:
    const std::vector<uint16_t>& partition_key_bind_indices() const;
 };

+GCC6_CONCEPT(
+
+template<typename Visitor>
+concept bool ResultVisitor = requires(Visitor& visitor) {
+    visitor.start_row();
+    visitor.accept_value(std::optional<query::result_bytes_view>());
+    visitor.end_row();
+};
+
+)
+
 class result_set {
-public:
    ::shared_ptr<metadata> _metadata;
    std::deque<std::vector<bytes_opt>> _rows;
+
+    friend class result;
 public:
    result_set(std::vector<::shared_ptr<column_specification>> metadata_);

@@ -163,6 +195,80 @@ public:

    // Returns a range of rows. A row is a range of bytes_opt.
    const std::deque<std::vector<bytes_opt>>& rows() const;
+
+    template<typename Visitor>
+    GCC6_CONCEPT(requires ResultVisitor<Visitor>)
+    void visit(Visitor&& visitor) const {
+        auto column_count = get_metadata().column_count();
+        for (auto& row : _rows) {
+            visitor.start_row();
+            for (auto i = 0u; i < column_count; i++) {
+                auto& cell = row[i];
+                visitor.accept_value(cell ? std::optional<query::result_bytes_view>(*cell) : std::optional<query::result_bytes_view>());
+            }
+            visitor.end_row();
+        }
+    }
+
+    class builder;
+};
+
+class result_set::builder {
+    result_set _result;
+    std::vector<bytes_opt> _current_row;
+public:
+    explicit builder(shared_ptr<metadata> mtd)
+        : _result(std::move(mtd)) { }
+
+    void start_row() { }
+    void accept_value(std::optional<query::result_bytes_view> value) {
+        if (!value) {
+            _current_row.emplace_back();
+            return;
+        }
+        _current_row.emplace_back(value->linearize());
+    }
+    void end_row() {
+        _result.add_row(std::exchange(_current_row, { }));
+    }
+    result_set get_result_set() && { return std::move(_result); }
+};
+
+class result {
+    std::unique_ptr<cql3::result_set> _result_set;
+    result_generator _result_generator;
+    shared_ptr<const cql3::metadata> _metadata;
+public:
+    explicit result(std::unique_ptr<cql3::result_set> rs)
+        : _result_set(std::move(rs))
+        , _metadata(_result_set->_metadata)
+    { }
+
+    explicit result(result_generator generator, shared_ptr<const metadata> m)
+        : _result_generator(std::move(generator))
+        , _metadata(std::move(m))
+    { }
+
+    const cql3::metadata& get_metadata() const { return *_metadata; }
+    cql3::result_set result_set() const {
+        if (_result_set) {
+            return *_result_set;
+        } else {
+            auto builder = result_set::builder(make_shared<cql3::metadata>(*_metadata));
+            _result_generator.visit(builder);
+            return std::move(builder).get_result_set();
+        }
+    }
+
+    template<typename Visitor>
+    GCC6_CONCEPT(requires ResultVisitor<Visitor>)
+    void visit(Visitor&& visitor) const {
+        if (_result_set) {
+            _result_set->visit(std::forward<Visitor>(visitor));
+        } else {
+            _result_generator.visit(std::forward<Visitor>(visitor));
+        }
+    }
 };

 }
--- a/cql3/selection/selectable.cc
+++ b/cql3/selection/selectable.cc
@@ -112,11 +112,37 @@ selectable::with_function::raw::make_count_rows_function() {
                    std::vector<shared_ptr<cql3::selection::selectable::raw>>());
 }

+shared_ptr<selector::factory>
+selectable::with_anonymous_function::new_selector_factory(database& db, schema_ptr s, std::vector<const column_definition*>& defs) {
+    auto&& factories = selector_factories::create_factories_and_collect_column_definitions(_args, db, s, defs);
+    return abstract_function_selector::new_factory(_function, std::move(factories));
+}
+
+sstring
+selectable::with_anonymous_function::to_string() const {
+    return sprint("%s(%s)", _function->name().name, join(", ", _args));
+}
+
+shared_ptr<selectable>
+selectable::with_anonymous_function::raw::prepare(schema_ptr s) {
+    std::vector<shared_ptr<selectable>> prepared_args;
+    prepared_args.reserve(_args.size());
+    for (auto&& arg : _args) {
+        prepared_args.push_back(arg->prepare(s));
+    }
+    return ::make_shared<with_anonymous_function>(_function, std::move(prepared_args));
+}
+
+bool
+selectable::with_anonymous_function::raw::processes_selection() const {
+    return true;
+}
+
 shared_ptr<selector::factory>
 selectable::with_field_selection::new_selector_factory(database& db, schema_ptr s, std::vector<const column_definition*>& defs) {
    auto&& factory = _selected->new_selector_factory(db, s, defs);
    auto&& type = factory->new_instance()->get_type();
-    auto&& ut = dynamic_pointer_cast<const user_type_impl>(std::move(type));
+    auto&& ut = dynamic_pointer_cast<const user_type_impl>(type->underlying_type());
    if (!ut) {
        throw exceptions::invalid_request_exception(
                sprint("Invalid field selection: %s of type %s is not a user type",
--- a/cql3/selection/selectable.hh
+++ b/cql3/selection/selectable.hh
@@ -46,6 +46,7 @@
 #include "core/shared_ptr.hh"
 #include "cql3/selection/selector.hh"
 #include "cql3/cql3_type.hh"
+#include "cql3/functions/function.hh"
 #include "cql3/functions/function_name.hh"

 namespace cql3 {
@@ -82,6 +83,7 @@ public:
    class writetime_or_ttl;

    class with_function;
+    class with_anonymous_function;

    class with_field_selection;

@@ -114,6 +116,28 @@ public:
    };
 };

+class selectable::with_anonymous_function : public selectable {
+    shared_ptr<functions::function> _function;
+    std::vector<shared_ptr<selectable>> _args;
+public:
+    with_anonymous_function(::shared_ptr<functions::function> f, std::vector<shared_ptr<selectable>> args)
+        : _function(f), _args(std::move(args)) {
+    }
+
+    virtual sstring to_string() const override;
+
+    virtual shared_ptr<selector::factory> new_selector_factory(database& db, schema_ptr s, std::vector<const column_definition*>& defs) override;
+    class raw : public selectable::raw {
+        shared_ptr<functions::function> _function;
+        std::vector<shared_ptr<selectable::raw>> _args;
+    public:
+        raw(shared_ptr<functions::function> f, std::vector<shared_ptr<selectable::raw>> args)
+                : _function(f), _args(std::move(args)) {
+        }
+        virtual shared_ptr<selectable> prepare(schema_ptr s) override;
+        virtual bool processes_selection() const override;
+    };
+};

 class selectable::with_cast : public selectable {
    ::shared_ptr<selectable> _arg;
--- a/cql3/selection/selection.cc
+++ b/cql3/selection/selection.cc
@@ -40,6 +40,7 @@
 */

 #include <boost/range/adaptor/transformed.hpp>
+#include <boost/range/adaptor/filtered.hpp>

 #include "cql3/selection/selection.hh"
 #include "cql3/selection/selector_factories.hh"
@@ -53,13 +54,15 @@ selection::selection(schema_ptr schema,
    std::vector<const column_definition*> columns,
    std::vector<::shared_ptr<column_specification>> metadata_,
    bool collect_timestamps,
-    bool collect_TTLs)
+    bool collect_TTLs,
+    trivial is_trivial)
        : _schema(std::move(schema))
        , _columns(std::move(columns))
        , _metadata(::make_shared<metadata>(std::move(metadata_)))
        , _collect_timestamps(collect_timestamps)
        , _collect_TTLs(collect_TTLs)
        , _contains_static_columns(std::any_of(_columns.begin(), _columns.end(), std::mem_fn(&column_definition::is_static)))
+        , _is_trivial(is_trivial)
 { }

 query::partition_slice::option_set selection::get_query_options() {
@@ -100,7 +103,7 @@ public:
     */
    simple_selection(schema_ptr schema, std::vector<const column_definition*> columns,
        std::vector<::shared_ptr<column_specification>> metadata, bool is_wildcard)
-            : selection(schema, std::move(columns), std::move(metadata), false, false)
+            : selection(schema, std::move(columns), std::move(metadata), false, false, trivial::yes)
            , _is_wildcard(is_wildcard)
    { }

@@ -153,9 +156,9 @@ public:
        return _factories->uses_function(ks_name, function_name);
    }

-    virtual uint32_t add_column_for_ordering(const column_definition& c) override {
-        uint32_t index = selection::add_column_for_ordering(c);
-        _factories->add_selector_for_ordering(c, index);
+    virtual uint32_t add_column_for_post_processing(const column_definition& c) override {
+        uint32_t index = selection::add_column_for_post_processing(c);
+        _factories->add_selector_for_post_processing(c, index);
        return index;
    }

@@ -206,9 +209,17 @@ protected:

 ::shared_ptr<selection> selection::wildcard(schema_ptr schema) {
    auto columns = schema->all_columns_in_select_order();
-    auto cds = boost::copy_range<std::vector<const column_definition*>>(columns | boost::adaptors::transformed([](const column_definition& c) {
-        return &c;
-    }));
+    // filter out hidden columns, which should not be seen by the
+    // user when doing "SELECT *". We also disallow selecting them
+    // individually (see column_identifier::new_selector_factory()).
+    auto cds = boost::copy_range<std::vector<const column_definition*>>(
+        columns |
+        boost::adaptors::filtered([](const column_definition& c) {
+            return !c.is_view_virtual();
+        }) |
+        boost::adaptors::transformed([](const column_definition& c) {
+            return &c;
+        }));
    return simple_selection::make(schema, std::move(cds), true);
 }

@@ -216,7 +227,7 @@ protected:
    return simple_selection::make(schema, std::move(columns), false);
 }

-uint32_t selection::add_column_for_ordering(const column_definition& c) {
+uint32_t selection::add_column_for_post_processing(const column_definition& c) {
    _columns.push_back(&c);
    _metadata->add_non_serialized_column(c.column_specification);
    return _columns.size() - 1;
@@ -328,93 +339,106 @@ std::unique_ptr<result_set> result_set_builder::build() {
    return std::move(_result_set);
 }

-result_set_builder::visitor::visitor(
-        cql3::selection::result_set_builder& builder, const schema& s,
-        const selection& selection)
-        : _builder(builder), _schema(s), _selection(selection), _row_count(0) {
-}
+bool result_set_builder::restrictions_filter::do_filter(const selection& selection,
+                                                         const std::vector<bytes>& partition_key,
+                                                         const std::vector<bytes>& clustering_key,
+                                                         const query::result_row_view& static_row,
+                                                         const query::result_row_view& row) const {
+    static logging::logger rlogger("restrictions_filter");

-void result_set_builder::visitor::add_value(const column_definition& def,
-        query::result_row_view::iterator_type& i) {
-    if (def.type->is_multi_cell()) {
-        auto cell = i.next_collection_cell();
-        if (!cell) {
-            _builder.add_empty();
-            return;
-        }
-        _builder.add_collection(def, *cell);
-    } else {
-        auto cell = i.next_atomic_cell();
-        if (!cell) {
-            _builder.add_empty();
-            return;
-        }
-        _builder.add(def, *cell);
+    if (_current_partition_key_does_not_match || _current_static_row_does_not_match || _remaining == 0) {
+        return false;
    }
-}

-void result_set_builder::visitor::accept_new_partition(const partition_key& key,
-        uint32_t row_count) {
-    _partition_key = key.explode(_schema);
-    _row_count = row_count;
-}
-
-void result_set_builder::visitor::accept_new_partition(uint32_t row_count) {
-    _row_count = row_count;
-}
-
-void result_set_builder::visitor::accept_new_row(const clustering_key& key,
-        const query::result_row_view& static_row,
-        const query::result_row_view& row) {
-    _clustering_key = key.explode(_schema);
-    accept_new_row(static_row, row);
-}
-
-void result_set_builder::visitor::accept_new_row(
-        const query::result_row_view& static_row,
-        const query::result_row_view& row) {
    auto static_row_iterator = static_row.iterator();
    auto row_iterator = row.iterator();
-    _builder.new_row();
-    for (auto&& def : _selection.get_columns()) {
-        switch (def->kind) {
-        case column_kind::partition_key:
-            _builder.add(_partition_key[def->component_index()]);
-            break;
-        case column_kind::clustering_key:
-            if (_clustering_key.size() > def->component_index()) {
-                _builder.add(_clustering_key[def->component_index()]);
+    auto non_pk_restrictions_map = _restrictions->get_non_pk_restriction();
+    auto partition_key_restrictions_map = _restrictions->get_single_column_partition_key_restrictions();
+    auto clustering_key_restrictions_map = _restrictions->get_single_column_clustering_key_restrictions();
+    for (auto&& cdef : selection.get_columns()) {
+        switch (cdef->kind) {
+        case column_kind::static_column:
+            // fallthrough
+        case column_kind::regular_column: {
+            auto& cell_iterator = (cdef->kind == column_kind::static_column) ? static_row_iterator : row_iterator;
+            if (cdef->type->is_multi_cell()) {
+                cell_iterator.next_collection_cell();
+                auto restr_it = non_pk_restrictions_map.find(cdef);
+                if (restr_it == non_pk_restrictions_map.end()) {
+                    continue;
+                }
+                throw exceptions::invalid_request_exception("Collection filtering is not supported yet");
            } else {
-                _builder.add({});
+                auto cell = cell_iterator.next_atomic_cell();
+
+                auto restr_it = non_pk_restrictions_map.find(cdef);
+                if (restr_it == non_pk_restrictions_map.end()) {
+                    continue;
+                }
+                restrictions::single_column_restriction& restriction = *restr_it->second;
+
+                bool regular_restriction_matches;
+                if (cell) {
+                    regular_restriction_matches = cell->value().with_linearized([&restriction, this](bytes_view data) {
+                        return restriction.is_satisfied_by(data, _options);
+                    });
+                } else {
+                    regular_restriction_matches = restriction.is_satisfied_by(bytes(), _options);
+                }
+                if (!regular_restriction_matches) {
+                    _current_static_row_does_not_match = (cdef->kind == column_kind::static_column);
+                    return false;
+                }
+
+            }
            }
            break;
-        case column_kind::regular_column:
-            add_value(*def, row_iterator);
+        case column_kind::partition_key: {
+            auto restr_it = partition_key_restrictions_map.find(cdef);
+            if (restr_it == partition_key_restrictions_map.end()) {
+                continue;
+            }
+            restrictions::single_column_restriction& restriction = *restr_it->second;
+            const bytes& value_to_check = partition_key[cdef->id];
+            bool pk_restriction_matches = restriction.is_satisfied_by(value_to_check, _options);
+            if (!pk_restriction_matches) {
+                _current_partition_key_does_not_match = true;
+                return false;
+            }
+            }
            break;
-        case column_kind::static_column:
-            add_value(*def, static_row_iterator);
+        case column_kind::clustering_key: {
+            auto restr_it = clustering_key_restrictions_map.find(cdef);
+            if (restr_it == clustering_key_restrictions_map.end()) {
+                continue;
+            }
+            restrictions::single_column_restriction& restriction = *restr_it->second;
+            const bytes& value_to_check = clustering_key[cdef->id];
+            bool pk_restriction_matches = restriction.is_satisfied_by(value_to_check, _options);
+            if (!pk_restriction_matches) {
+                return false;
+            }
+            }
            break;
        default:
-            assert(0);
+            break;
        }
    }
+    return true;
 }

-void result_set_builder::visitor::accept_partition_end(
-        const query::result_row_view& static_row) {
-    if (_row_count == 0) {
-        _builder.new_row();
-        auto static_row_iterator = static_row.iterator();
-        for (auto&& def : _selection.get_columns()) {
-            if (def->is_partition_key()) {
-                _builder.add(_partition_key[def->component_index()]);
-            } else if (def->is_static()) {
-                add_value(*def, static_row_iterator);
-            } else {
-                _builder.add_empty();
-            }
-        }
+bool result_set_builder::restrictions_filter::operator()(const selection& selection,
+                                                         const std::vector<bytes>& partition_key,
+                                                         const std::vector<bytes>& clustering_key,
+                                                         const query::result_row_view& static_row,
+                                                         const query::result_row_view& row) const {
+    const bool accepted = do_filter(selection, partition_key, clustering_key, static_row, row);
+    if (!accepted) {
+        ++_rows_dropped;
+    } else if (_remaining > 0) {
+        --_remaining;
    }
+    return accepted;
 }

 api::timestamp_type result_set_builder::timestamp_of(size_t idx) {
@@ -426,7 +450,7 @@ int32_t result_set_builder::ttl_of(size_t idx) {
 }

 bytes_opt result_set_builder::get_value(data_type t, query::result_atomic_cell_view c) {
-    return {to_bytes(c.value())};
+    return {c.value().linearize()};
 }

 }
--- a/cql3/selection/selection.hh
+++ b/cql3/selection/selection.hh
@@ -48,6 +48,7 @@
 #include "exceptions/exceptions.hh"
 #include "cql3/selection/raw_selector.hh"
 #include "cql3/selection/selector_factories.hh"
+#include "cql3/restrictions/statement_restrictions.hh"
 #include "unimplemented.hh"

 namespace cql3 {
@@ -84,12 +85,15 @@ private:
    const bool _collect_timestamps;
    const bool _collect_TTLs;
    const bool _contains_static_columns;
+    bool _is_trivial;
 protected:
+    using trivial = bool_class<class trivial_tag>;
+
    selection(schema_ptr schema,
        std::vector<const column_definition*> columns,
        std::vector<::shared_ptr<column_specification>> metadata_,
        bool collect_timestamps,
-        bool collect_TTLs);
+        bool collect_TTLs, trivial is_trivial = trivial::no);

    virtual ~selection() {}
 public:
@@ -165,10 +169,14 @@ public:
        return _metadata;
    }

+    ::shared_ptr<metadata> get_result_metadata() {
+        return _metadata;
+    }
+
    static ::shared_ptr<selection> wildcard(schema_ptr schema);
    static ::shared_ptr<selection> for_columns(schema_ptr schema, std::vector<const column_definition*> columns);

-    virtual uint32_t add_column_for_ordering(const column_definition& c);
+    virtual uint32_t add_column_for_post_processing(const column_definition& c);

    virtual bool uses_function(const sstring &ks_name, const sstring& function_name) const {
        return false;
@@ -223,6 +231,12 @@ public:
        }
    }

+    /**
+     * Returns true if the selection is trivial, i.e. there are no function
+     * selectors (including casts or aggregates).
+     */
+    bool is_trivial() const { return _is_trivial; }
+
    friend class result_set_builder;
 };

@@ -238,6 +252,40 @@ private:
    const gc_clock::time_point _now;
    cql_serialization_format _cql_serialization_format;
 public:
+    class nop_filter {
+    public:
+        inline bool operator()(const selection&, const std::vector<bytes>&, const std::vector<bytes>&, const query::result_row_view&, const query::result_row_view&) const {
+            return true;
+        }
+        void reset() {
+        }
+        uint32_t get_rows_dropped() const {
+            return 0;
+        }
+    };
+    class restrictions_filter {
+        ::shared_ptr<restrictions::statement_restrictions> _restrictions;
+        const query_options& _options;
+        mutable bool _current_partition_key_does_not_match = false;
+        mutable bool _current_static_row_does_not_match = false;
+        mutable uint32_t _rows_dropped = 0;
+        mutable uint32_t _remaining = 0;
+    public:
+        restrictions_filter() = default;
+        explicit restrictions_filter(::shared_ptr<restrictions::statement_restrictions> restrictions, const query_options& options, uint32_t remaining) : _restrictions(restrictions), _options(options), _remaining(remaining) {}
+        bool operator()(const selection& selection, const std::vector<bytes>& pk, const std::vector<bytes>& ck, const query::result_row_view& static_row, const query::result_row_view& row) const;
+        void reset() {
+            _current_partition_key_does_not_match = false;
+            _current_static_row_does_not_match = false;
+            _rows_dropped = 0;
+        }
+        uint32_t get_rows_dropped() const {
+            return _rows_dropped;
+        }
+    private:
+        bool do_filter(const selection& selection, const std::vector<bytes>& pk, const std::vector<bytes>& ck, const query::result_row_view& static_row, const query::result_row_view& row) const;
+    };
+
    result_set_builder(const selection& s, gc_clock::time_point now, cql_serialization_format sf);
    void add_empty();
    void add(bytes_opt value);
@@ -247,8 +295,9 @@ public:
    std::unique_ptr<result_set> build();
    api::timestamp_type timestamp_of(size_t idx);
    int32_t ttl_of(size_t idx);
-    
+
    // Implements ResultVisitor concept from query.hh
+    template<typename Filter = nop_filter>
    class visitor {
    protected:
        result_set_builder& _builder;
@@ -257,20 +306,101 @@ public:
        uint32_t _row_count;
        std::vector<bytes> _partition_key;
        std::vector<bytes> _clustering_key;
+        Filter _filter;
    public:
-        visitor(cql3::selection::result_set_builder& builder, const schema& s, const selection&);
+        visitor(cql3::selection::result_set_builder& builder, const schema& s,
+                const selection& selection, Filter filter = Filter())
+            : _builder(builder)
+            , _schema(s)
+            , _selection(selection)
+            , _row_count(0)
+            , _filter(filter)
+        {}
        visitor(visitor&&) = default;

-        void add_value(const column_definition& def, query::result_row_view::iterator_type& i);
-        void accept_new_partition(const partition_key& key, uint32_t row_count);
-        void accept_new_partition(uint32_t row_count);
-        void accept_new_row(const clustering_key& key,
-                const query::result_row_view& static_row,
-                const query::result_row_view& row);
-        void accept_new_row(const query::result_row_view& static_row,
-                const query::result_row_view& row);
-        void accept_partition_end(const query::result_row_view& static_row);
+        void add_value(const column_definition& def, query::result_row_view::iterator_type& i) {
+            if (def.type->is_multi_cell()) {
+                auto cell = i.next_collection_cell();
+                if (!cell) {
+                    _builder.add_empty();
+                    return;
+                }
+                _builder.add_collection(def, cell->linearize());
+            } else {
+                auto cell = i.next_atomic_cell();
+                if (!cell) {
+                    _builder.add_empty();
+                    return;
+                }
+                _builder.add(def, *cell);
+            }
+        }
+
+        void accept_new_partition(const partition_key& key, uint32_t row_count) {
+            _partition_key = key.explode(_schema);
+            _row_count = row_count;
+            _filter.reset();
+        }
+
+        void accept_new_partition(uint32_t row_count) {
+            _row_count = row_count;
+            _filter.reset();
+        }
+
+        void accept_new_row(const clustering_key& key, const query::result_row_view& static_row, const query::result_row_view& row) {
+            _clustering_key = key.explode(_schema);
+            accept_new_row(static_row, row);
+        }
+
+        void accept_new_row(const query::result_row_view& static_row, const query::result_row_view& row) {
+            auto static_row_iterator = static_row.iterator();
+            auto row_iterator = row.iterator();
+            if (!_filter(_selection, _partition_key, _clustering_key, static_row, row)) {
+                return;
+            }
+            _builder.new_row();
+            for (auto&& def : _selection.get_columns()) {
+                switch (def->kind) {
+                case column_kind::partition_key:
+                    _builder.add(_partition_key[def->component_index()]);
+                    break;
+                case column_kind::clustering_key:
+                    if (_clustering_key.size() > def->component_index()) {
+                        _builder.add(_clustering_key[def->component_index()]);
+                    } else {
+                        _builder.add({});
+                    }
+                    break;
+                case column_kind::regular_column:
+                    add_value(*def, row_iterator);
+                    break;
+                case column_kind::static_column:
+                    add_value(*def, static_row_iterator);
+                    break;
+                default:
+                    assert(0);
+                }
+            }
+        }
+
+        uint32_t accept_partition_end(const query::result_row_view& static_row) {
+            if (_row_count == 0) {
+                _builder.new_row();
+                auto static_row_iterator = static_row.iterator();
+                for (auto&& def : _selection.get_columns()) {
+                    if (def->is_partition_key()) {
+                        _builder.add(_partition_key[def->component_index()]);
+                    } else if (def->is_static()) {
+                        add_value(*def, static_row_iterator);
+                    } else {
+                        _builder.add_empty();
+                    }
+                }
+            }
+            return _filter.get_rows_dropped();
+        }
    };
+
 private:
    bytes_opt get_value(data_type t, query::result_atomic_cell_view c);
 };
--- a/cql3/selection/selector_factories.cc
+++ b/cql3/selection/selector_factories.cc
@@ -53,6 +53,7 @@ selector_factories::selector_factories(std::vector<::shared_ptr<selectable>> sel
    : _contains_write_time_factory(false)
    , _contains_ttl_factory(false)
    , _number_of_aggregate_factories(0)
+    , _number_of_factories_for_post_processing(0)
 {
    _factories.reserve(selectables.size());

@@ -76,8 +77,9 @@ bool selector_factories::uses_function(const sstring& ks_name, const sstring& fu
    return false;
 }

-void selector_factories::add_selector_for_ordering(const column_definition& def, uint32_t index) {
+void selector_factories::add_selector_for_post_processing(const column_definition& def, uint32_t index) {
    _factories.emplace_back(simple_selector::new_factory(def.name_as_text(), index, def.type));
+    ++_number_of_factories_for_post_processing;
 }

 std::vector<::shared_ptr<selector>> selector_factories::new_instances() const {
--- a/cql3/selection/selector_factories.hh
+++ b/cql3/selection/selector_factories.hh
@@ -74,6 +74,11 @@ private:
     */
    uint32_t _number_of_aggregate_factories;

+    /**
+     * The number of factories that are only for post processing.
+     */
+    uint32_t _number_of_factories_for_post_processing;
+
 public:
    /**
     * Creates a new <code>SelectorFactories</code> instance and collect the column definitions.
@@ -97,11 +102,12 @@ public:
    bool uses_function(const sstring& ks_name, const sstring& function_name) const;

    /**
-     * Adds a new <code>Selector.Factory</code> for a column that is needed only for ORDER BY purposes.
+     * Adds a new <code>Selector.Factory</code> for a column that is needed only for ORDER BY or post
+     * processing purposes.
     * @param def the column that is needed for ordering
     * @param index the index of the column definition in the Selection's list of columns
     */
-    void add_selector_for_ordering(const column_definition& def, uint32_t index);
+    void add_selector_for_post_processing(const column_definition& def, uint32_t index);

    /**
     * Checks if this <code>SelectorFactories</code> contains only factories for aggregates.
@@ -111,7 +117,7 @@ public:
     */
    bool contains_only_aggregate_functions() const {
        auto size = _factories.size();
-        return size != 0 && _number_of_aggregate_factories == size;
+        return size != 0 && _number_of_aggregate_factories == (size - _number_of_factories_for_post_processing);
    }

    /**
--- a/cql3/sets.cc
+++ b/cql3/sets.cc
@@ -120,17 +120,19 @@ sets::literal::to_string() const {
 }

 sets::value
-sets::value::from_serialized(bytes_view v, set_type type, cql_serialization_format sf) {
+sets::value::from_serialized(const fragmented_temporary_buffer::view& val, set_type type, cql_serialization_format sf) {
    try {
        // Collections have this small hack that validate cannot be called on a serialized object,
        // but compose does the validation (so we're fine).
        // FIXME: deserializeForNativeProtocol?!
+      return with_linearized(val, [&] (bytes_view v) {
        auto s = value_cast<set_type_impl::native_type>(type->deserialize(v, sf));
        std::set<bytes, serialized_compare> elements(type->get_elements_type()->as_less_comparator());
        for (auto&& element : s) {
            elements.insert(elements.end(), type->get_elements_type()->decompose(element));
        }
        return value(std::move(elements));
+      });
    } catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception(e.what());
    }
@@ -198,10 +200,10 @@ sets::delayed_value::bind(const query_options& options) {
            return constants::UNSET_VALUE;
        }
        // We don't support value > 64K because the serialization format encode the length as an unsigned short.
-        if (b->size() > std::numeric_limits<uint16_t>::max()) {
+        if (b->size_bytes() > std::numeric_limits<uint16_t>::max()) {
            throw exceptions::invalid_request_exception(sprint("Set value is too long. Set values are limited to %d bytes but %d bytes value provided",
                    std::numeric_limits<uint16_t>::max(),
-                    b->size()));
+                    b->size_bytes()));
        }

        buffers.insert(buffers.end(), std::move(to_bytes(*b)));
@@ -225,7 +227,12 @@ sets::marker::bind(const query_options& options) {

 void
 sets::setter::execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params) {
-    const auto& value = _t->bind(params._options);
+    auto value = _t->bind(params._options);
+    execute(m, row_key, params, column, std::move(value));
+}
+
+void
+sets::setter::execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params, const column_definition& column, ::shared_ptr<terminal> value) {
    if (value == constants::UNSET_VALUE) {
        return;
    }
@@ -264,7 +271,7 @@ sets::adder::do_add(mutation& m, const clustering_key_prefix& row_key, const upd
        }

        for (auto&& e : set_value->_elements) {
-            mut.cells.emplace_back(e, params.make_cell({}));
+            mut.cells.emplace_back(e, params.make_cell(*set_type->value_comparator(), bytes_view(), atomic_cell::collection_member::yes));
        }
        auto smut = set_type->serialize_mutation_form(mut);

@@ -274,7 +281,7 @@ sets::adder::do_add(mutation& m, const clustering_key_prefix& row_key, const upd
        auto v = set_type->serialize_partially_deserialized_form(
                {set_value->_elements.begin(), set_value->_elements.end()},
                cql_serialization_format::internal());
-        m.set_cell(row_key, column, params.make_cell(std::move(v)));
+        m.set_cell(row_key, column, params.make_cell(*column.type, fragmented_temporary_buffer::view(v)));
    } else {
        m.set_cell(row_key, column, params.make_dead_cell());
    }
--- a/cql3/sets.hh
+++ b/cql3/sets.hh
@@ -78,7 +78,7 @@ public:
        value(std::set<bytes, serialized_compare> elements)
                : _elements(std::move(elements)) {
        }
-        static value from_serialized(bytes_view v, set_type type, cql_serialization_format sf);
+        static value from_serialized(const fragmented_temporary_buffer::view& v, set_type type, cql_serialization_format sf);
        virtual cql3::raw_value get(const query_options& options) override;
        virtual bytes get_with_protocol_version(cql_serialization_format sf) override;
        bool equals(set_type st, const value& v);
@@ -113,6 +113,7 @@ public:
                : operation(column, std::move(t)) {
        }
        virtual void execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params) override;
+        static void execute(mutation& m, const clustering_key_prefix& row_key, const update_parameters& params, const column_definition& column, ::shared_ptr<terminal> value);
    };

    class adder : public operation {
--- a/cql3/single_column_relation.cc
+++ b/cql3/single_column_relation.cc
@@ -101,13 +101,6 @@ single_column_relation::to_receivers(schema_ptr schema, const column_definition&
    }

    if (is_IN()) {
-        // For partition keys we only support IN for the last name so far
-        if (column_def.is_partition_key() && !schema->is_last_partition_key(column_def)) {
-            throw exceptions::invalid_request_exception(sprint(
-                "Partition KEY part %s cannot be restricted by IN relation (only the last part of the partition key can)",
-                column_def.name_as_text()));
-        }
-
        // We only allow IN on the row key and the clustering key so far, never on non-PK columns, and this even if
        // there's an index
        // Note: for backward compatibility reason, we conside a IN of 1 value the same as a EQ, so we let that
@@ -116,18 +109,6 @@ single_column_relation::to_receivers(schema_ptr schema, const column_definition&
            throw exceptions::invalid_request_exception(sprint(
                   "IN predicates on non-primary-key columns (%s) is not yet supported", column_def.name_as_text()));
        }
-    } else if (is_slice()) {
-        // Non EQ relation is not supported without token(), even if we have a 2ndary index (since even those
-        // are ordered by partitioner).
-        // Note: In theory we could allow it for 2ndary index queries with ALLOW FILTERING, but that would
-        // probably require some special casing
-        // Note bis: This is also why we don't bother handling the 'tuple' notation of #4851 for keys. If we
-        // lift the limitation for 2ndary
-        // index with filtering, we'll need to handle it though.
-        if (column_def.is_partition_key()) {
-            throw exceptions::invalid_request_exception(
-                "Only EQ and IN relation are supported on the partition key (unless you use the token() function)");
-        }
    }

    if (is_contains() && !receiver->type->is_collection()) {
--- a/cql3/single_column_relation.hh
+++ b/cql3/single_column_relation.hh
@@ -134,7 +134,7 @@ protected:
 #endif

    virtual sstring to_string() const override {
-        auto entity_as_string = _entity->to_string();
+        auto entity_as_string = _entity->to_cql_string();
        if (_map_key) {
            entity_as_string = sprint("%s[%s]", std::move(entity_as_string), _map_key->to_string());
        }
--- a/cql3/statements/alter_keyspace_statement.cc
+++ b/cql3/statements/alter_keyspace_statement.cc
@@ -42,7 +42,7 @@
 #include "alter_keyspace_statement.hh"
 #include "prepared_statement.hh"
 #include "service/migration_manager.hh"
-#include "database.hh"
+#include "db/system_keyspace.hh"

 bool is_system_keyspace(const sstring& keyspace);

@@ -59,7 +59,7 @@ future<> cql3::statements::alter_keyspace_statement::check_access(const service:
    return state.has_keyspace_access(_name, auth::permission::ALTER);
 }

-void cql3::statements::alter_keyspace_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) {
+void cql3::statements::alter_keyspace_statement::validate(service::storage_proxy& proxy, const service::client_state& state) {
    try {
        service::get_local_storage_proxy().get_db().local().find_keyspace(_name); // throws on failure
        auto tmp = _name;
@@ -90,7 +90,7 @@ void cql3::statements::alter_keyspace_statement::validate(distributed<service::s
    }
 }

-future<shared_ptr<cql_transport::event::schema_change>> cql3::statements::alter_keyspace_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) {
+future<shared_ptr<cql_transport::event::schema_change>> cql3::statements::alter_keyspace_statement::announce_migration(service::storage_proxy& proxy, bool is_local_only) {
    auto old_ksm = service::get_local_storage_proxy().get_db().local().find_keyspace(_name).metadata();
    return service::get_local_migration_manager().announce_keyspace_update(_attrs->as_ks_metadata_update(old_ksm), is_local_only).then([this] {
        using namespace cql_transport;
--- a/cql3/statements/alter_keyspace_statement.hh
+++ b/cql3/statements/alter_keyspace_statement.hh
@@ -60,8 +60,8 @@ public:
    const sstring& keyspace() const override;

    future<> check_access(const service::client_state& state) override;
-    void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
-    future<shared_ptr<cql_transport::event::schema_change>> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
+    void validate(service::storage_proxy& proxy, const service::client_state& state) override;
+    future<shared_ptr<cql_transport::event::schema_change>> announce_migration(service::storage_proxy& proxy, bool is_local_only) override;
    virtual std::unique_ptr<prepared> prepare(database& db, cql_stats& stats) override;
 };

--- a/cql3/statements/alter_role_statement.hh
+++ b/cql3/statements/alter_role_statement.hh
@@ -62,12 +62,12 @@ public:
                , _options(std::move(options)) {
    }

-    void validate(distributed<service::storage_proxy>&, const service::client_state&) override;
+    void validate(service::storage_proxy&, const service::client_state&) override;

    virtual future<> check_access(const service::client_state&) override;

    virtual future<::shared_ptr<cql_transport::messages::result_message>>
-    execute(distributed<service::storage_proxy>&, service::query_state&, const query_options&) override;
+    execute(service::storage_proxy&, service::query_state&, const query_options&) override;
 };

 }
--- a/cql3/statements/alter_table_statement.cc
+++ b/cql3/statements/alter_table_statement.cc
@@ -75,7 +75,7 @@ future<> alter_table_statement::check_access(const service::client_state& state)
    return state.has_column_family_access(keyspace(), column_family(), auth::permission::ALTER);
 }

-void alter_table_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state)
+void alter_table_statement::validate(service::storage_proxy& proxy, const service::client_state& state)
 {
    // validated in announce_migration()
 }
@@ -165,9 +165,9 @@ static void validate_column_rename(database& db, const schema& schema, const col
    }
 }

-future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
+future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::announce_migration(service::storage_proxy& proxy, bool is_local_only)
 {
-    auto& db = proxy.local().get_db().local();
+    auto& db = proxy.get_db().local();
    auto schema = validation::validate_column_family(db, keyspace(), column_family());
    if (schema->is_view()) {
        throw exceptions::invalid_request_exception("Cannot use ALTER TABLE on Materialized View");
@@ -246,15 +246,22 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a

        cfm.with_column(column_name->name(), type, _is_static ? column_kind::static_column : column_kind::regular_column);

-        // Adding a column to a table which has an include all view requires the column to be added to the view
-        // as well
+        // Adding a column to a base table always requires updating the view
+        // schemas: If the view includes all columns it should include the new
+        // column, but if it doesn't, it may need to include the new
+        // unselected column as a virtual column. The case when we
+        // shouldn't add a virtual column is when the view has in its PK one
+        // of the base's regular columns - but even in this case we need to
+        // rebuild the view schema, to update the column ID.
        if (!_is_static) {
            for (auto&& view : cf.views()) {
+                schema_builder builder(view);
                if (view->view_info()->include_all_columns()) {
-                    schema_builder builder(view);
                    builder.with_column(column_name->name(), type);
-                    view_updates.push_back(view_ptr(builder.build()));
+                } else if (!view->view_info()->base_non_pk_column_in_view_pk()) {
+                    db::view::create_virtual_column(builder, column_name->name(), type);
                }
+                view_updates.push_back(view_ptr(builder.build()));
            }
        }

@@ -269,7 +276,7 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a

        auto type = validate_alter(schema, *def, *validator);
        // In any case, we update the column definition
-        cfm.with_altered_column_type(column_name->name(), type);
+        cfm.alter_column_type(column_name->name(), type);

        // We also have to validate the view types here. If we have a view which includes a column as part of
        // the clustering key, we need to make sure that it is indeed compatible.
@@ -278,7 +285,7 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
            if (view_def) {
                schema_builder builder(view);
                auto view_type = validate_alter(view, *view_def, *validator);
-                builder.with_altered_column_type(column_name->name(), std::move(view_type));
+                builder.alter_column_type(column_name->name(), std::move(view_type));
                view_updates.push_back(view_ptr(builder.build()));
            }
        }
@@ -299,20 +306,16 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
        } else {
            for (auto&& column_def : boost::range::join(schema->static_columns(), schema->regular_columns())) { // find
                if (column_def.name() == column_name->name()) {
-                    cfm.without_column(column_name->name());
+                    cfm.remove_column(column_name->name());
                    break;
                }
            }
        }

-        // If a column is dropped which is included in a view, we don't allow the drop to take place.
-        auto view_names = ::join(", ", cf.views()
-                   | boost::adaptors::filtered([&] (auto&& v) { return bool(v->get_column_definition(column_name->name())); })
-                   | boost::adaptors::transformed([] (auto&& v) { return v->cf_name(); }));
-        if (!view_names.empty()) {
+        if (!cf.views().empty()) {
            throw exceptions::invalid_request_exception(sprint(
-                    "Cannot drop column %s, depended on by materialized views (%s.{%s})",
-                    column_name, keyspace(), view_names));
+                    "Cannot drop column %s on base table %s.%s with materialized views",
+                    column_name, keyspace(), column_family()));
        }
        break;
    }
@@ -346,9 +349,10 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
            auto to = entry.second->prepare_column_identifier(schema);

            validate_column_rename(db, *schema, *from, *to);
-            cfm.with_column_rename(from->name(), to->name());
+            cfm.rename_column(from->name(), to->name());

-            // If the view includes a renamed column, it must be renamed in the view table and the definition.
+            // If the view includes a renamed column, it must be renamed in
+            // the view table and the definition.
            for (auto&& view : cf.views()) {
                if (view->get_column_definition(from->name())) {
                    schema_builder builder(view);
@@ -356,7 +360,7 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
                    auto view_from = entry.first->prepare_column_identifier(view);
                    auto view_to = entry.second->prepare_column_identifier(view);
                    validate_column_rename(db, *view, *view_from, *view_to);
-                    builder.with_column_rename(view_from->name(), view_to->name());
+                    builder.rename_column(view_from->name(), view_to->name());

                    auto new_where = util::rename_column_in_where_clause(
                            view->view_info()->where_clause(),
--- a/cql3/statements/alter_table_statement.hh
+++ b/cql3/statements/alter_table_statement.hh
@@ -77,8 +77,8 @@ public:
                          bool is_static);

    virtual future<> check_access(const service::client_state& state) override;
-    virtual void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
-    virtual future<shared_ptr<cql_transport::event::schema_change>> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
+    virtual void validate(service::storage_proxy& proxy, const service::client_state& state) override;
+    virtual future<shared_ptr<cql_transport::event::schema_change>> announce_migration(service::storage_proxy& proxy, bool is_local_only) override;
    virtual std::unique_ptr<prepared> prepare(database& db, cql_stats& stats) override;
 };

--- a/cql3/statements/alter_type_statement.cc
+++ b/cql3/statements/alter_type_statement.cc
@@ -66,7 +66,7 @@ future<> alter_type_statement::check_access(const service::client_state& state)
    return state.has_keyspace_access(keyspace(), auth::permission::ALTER);
 }

-void alter_type_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state)
+void alter_type_statement::validate(service::storage_proxy& proxy, const service::client_state& state)
 {
    // Validation is left to announceMigration as it's easier to do it while constructing the updated type.
    // It doesn't really change anything anyway.
@@ -110,7 +110,7 @@ void alter_type_statement::do_announce_migration(database& db, ::keyspace& ks, b
            if (t_opt) {
                modified = true;
                // We need to update this column
-                cfm.with_altered_column_type(column.name(), *t_opt);
+                cfm.alter_column_type(column.name(), *t_opt);
            }
        }
        if (modified) {
@@ -135,10 +135,10 @@ void alter_type_statement::do_announce_migration(database& db, ::keyspace& ks, b
    }
 }

-future<shared_ptr<cql_transport::event::schema_change>> alter_type_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
+future<shared_ptr<cql_transport::event::schema_change>> alter_type_statement::announce_migration(service::storage_proxy& proxy, bool is_local_only)
 {
    return seastar::async([this, &proxy, is_local_only] {
-        auto&& db = proxy.local().get_db().local();
+        auto&& db = proxy.get_db().local();
        try {
            auto&& ks = db.find_keyspace(keyspace());
            do_announce_migration(db, ks, is_local_only);
@@ -165,7 +165,7 @@ alter_type_statement::add_or_alter::add_or_alter(const ut_name& name, bool is_ad
 user_type alter_type_statement::add_or_alter::do_add(database& db, user_type to_update) const
 {
    if (get_idx_of_field(to_update, _field_name)) {
-        throw exceptions::invalid_request_exception(sprint("Cannot add new field %s to type %s: a field of the same name already exists", _field_name->name(), _name.to_string()));
+        throw exceptions::invalid_request_exception(sprint("Cannot add new field %s to type %s: a field of the same name already exists", _field_name->to_string(), _name.to_string()));
    }

    std::vector<bytes> new_names(to_update->field_names());
@@ -173,7 +173,7 @@ user_type alter_type_statement::add_or_alter::do_add(database& db, user_type to_
    std::vector<data_type> new_types(to_update->field_types());
    auto&& add_type = _field_type->prepare(db, keyspace())->get_type();
    if (add_type->references_user_type(to_update->_keyspace, to_update->_name)) {
-        throw exceptions::invalid_request_exception(sprint("Cannot add new field %s of type %s to type %s as this would create a circular reference", _field_name->name(), _field_type->to_string(), _name.to_string()));
+        throw exceptions::invalid_request_exception(sprint("Cannot add new field %s of type %s to type %s as this would create a circular reference", _field_name->to_string(), _field_type->to_string(), _name.to_string()));
    }
    new_types.push_back(std::move(add_type));
    return user_type_impl::get_instance(to_update->_keyspace, to_update->_name, std::move(new_names), std::move(new_types));
@@ -183,13 +183,13 @@ user_type alter_type_statement::add_or_alter::do_alter(database& db, user_type t
 {
    stdx::optional<uint32_t> idx = get_idx_of_field(to_update, _field_name);
    if (!idx) {
-        throw exceptions::invalid_request_exception(sprint("Unknown field %s in type %s", _field_name->name(), _name.to_string()));
+        throw exceptions::invalid_request_exception(sprint("Unknown field %s in type %s", _field_name->to_string(), _name.to_string()));
    }

    auto previous = to_update->field_types()[*idx];
    auto new_type = _field_type->prepare(db, keyspace())->get_type();
    if (!new_type->is_compatible_with(*previous)) {
-        throw exceptions::invalid_request_exception(sprint("Type %s in incompatible with previous type %s of field %s in user type %s", _field_type->to_string(), previous->as_cql3_type()->to_string(), _field_name->name(), _name.to_string()));
+        throw exceptions::invalid_request_exception(sprint("Type %s in incompatible with previous type %s of field %s in user type %s", _field_type->to_string(), previous->as_cql3_type()->to_string(), _field_name->to_string(), _name.to_string()));
    }

    std::vector<data_type> new_types(to_update->field_types());
--- a/cql3/statements/alter_type_statement.hh
+++ b/cql3/statements/alter_type_statement.hh
@@ -59,11 +59,11 @@ public:

    virtual future<> check_access(const service::client_state& state) override;

-    virtual void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
+    virtual void validate(service::storage_proxy& proxy, const service::client_state& state) override;

    virtual const sstring& keyspace() const override;

-    virtual future<shared_ptr<cql_transport::event::schema_change>> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
+    virtual future<shared_ptr<cql_transport::event::schema_change>> announce_migration(service::storage_proxy& proxy, bool is_local_only) override;

    class add_or_alter;
    class renames;
--- a/cql3/statements/alter_view_statement.cc
+++ b/cql3/statements/alter_view_statement.cc
@@ -69,14 +69,14 @@ future<> alter_view_statement::check_access(const service::client_state& state)
    return make_ready_future<>();
 }

-void alter_view_statement::validate(distributed<service::storage_proxy>&, const service::client_state& state)
+void alter_view_statement::validate(service::storage_proxy&, const service::client_state& state)
 {
    // validated in announce_migration()
 }

-future<shared_ptr<cql_transport::event::schema_change>> alter_view_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
+future<shared_ptr<cql_transport::event::schema_change>> alter_view_statement::announce_migration(service::storage_proxy& proxy, bool is_local_only)
 {
-    auto&& db = proxy.local().get_db().local();
+    auto&& db = proxy.get_db().local();
    schema_ptr schema = validation::validate_column_family(db, keyspace(), column_family());
    if (!schema->is_view()) {
        throw exceptions::invalid_request_exception("Cannot use ALTER MATERIALIZED VIEW on Table");
@@ -86,10 +86,10 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_view_statement::an
        throw exceptions::invalid_request_exception("ALTER MATERIALIZED VIEW WITH invoked, but no parameters found");
    }

-    _properties->validate(proxy.local().get_db().local().get_config().extensions());
+    _properties->validate(proxy.get_db().local().get_config().extensions());

    auto builder = schema_builder(schema);
-    _properties->apply_to_builder(builder, proxy.local().get_db().local().get_config().extensions());
+    _properties->apply_to_builder(builder, proxy.get_db().local().get_config().extensions());

    if (builder.get_gc_grace_seconds() == 0) {
        throw exceptions::invalid_request_exception(