mutation_partition: correctly measure static row size when doing digest calculation

When only a digest is requested, the code uses the wrong output stream and therefore measures an incorrect data size. Failing to account for the static row size correctly while calculating the digest may cause a digest mismatch between digest and data queries.

Fixes #3753.

Message-Id: <20180905131219.GD2326@scylladb.com>
(cherry picked from commit 98092353df)
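The failure mode described in this commit message can be illustrated with a minimal sketch. It is not the Scylla code: `buffer_stream`, `counting_stream`, and `static_row_size_digest_only` are hypothetical names, and the sketch assumes that a digest-only request writes the static row to a size-counting stream while the data stream stays empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in: stores the bytes written to it (used for data queries).
struct buffer_stream {
    std::string data;
    void write(const std::string& bytes) { data += bytes; }
    std::size_t size() const { return data.size(); }
};

// Hypothetical stand-in: only counts the bytes written to it (used for digest-only queries).
struct counting_stream {
    std::size_t count = 0;
    void write(const std::string& bytes) { count += bytes.size(); }
    std::size_t size() const { return count; }
};

// Returns the static row size reported for a digest-only request.
// The row is written only to the counting stream, so measuring through the
// empty buffer stream under-reports the size.
std::size_t static_row_size_digest_only(const std::string& static_row,
                                        bool measure_correct_stream) {
    buffer_stream result;      // stays empty when only a digest is requested
    counting_stream measured;  // receives the row in digest-only mode
    measured.write(static_row);
    return measure_correct_stream ? measured.size() : result.size();
}

// Usage: contrasts the wrong measurement with the right one.
inline void demo_static_row_size() {
    assert(static_row_size_digest_only("abc", false) == 0);  // wrong stream
    assert(static_row_size_digest_only("abc", true) == 3);   // correct stream
}
```

Under these assumptions, a replica answering a digest query would account for fewer bytes than a replica answering the data query for the same partition, which is the kind of disagreement the message calls a digest mismatch.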
cql3: ensure repeated values in IN clauses don't return repeated rows
2018-09-06 16:51:31 +03:00 · 2018-08-26 15:52:18 +03:00 · 2018-08-21 17:37:36 +01:00 · 2018-08-21 18:24:06 +03:00 · 2018-08-05 10:30:58 +03:00 · 2018-08-01 14:30:58 +03:00
425 changed files with 27698 additions and 15717 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,12 @@ dist/ami/files/*.rpm
 dist/ami/variables.json
 dist/ami/scylla_deploy.sh
 *.pyc
+Cql.tokens
+.kdev4
+*.kdev4
+CMakeLists.txt.user
+.cache
+.tox
+*.egg-info
+__pycache__CMakeLists.txt.user
+.gdbinit
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -5,8 +5,8 @@
 cmake_minimum_required(VERSION 3.7)
 project(scylla)

-if (NOT DEFINED ENV{CLION_IDE})
-    message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in CLion")
+if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})
+    message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")
 endif()

 # Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.
--- a/HACKING.md
+++ b/HACKING.md
@@ -0,0 +1,233 @@
+# Guidelines for developing Scylla
+
+This document is intended to help developers and contributors to Scylla get started. The first part consists of general guidelines that make no assumptions about a development environment or tooling. The second part describes a particular environment and work-flow as an example.
+
+## Overview
+
+This section covers some high-level information about the Scylla source code and work-flow.
+
+### Getting the source code
+
+Scylla uses [Git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) to manage its dependency on Seastar and other tools. Be sure that all submodules are correctly initialized when cloning the project:
+
+```bash
+$ git clone https://github.com/scylladb/scylla
+$ cd scylla
+$ git submodule update --init --recursive
+```
+
+### Dependencies
+
+Scylla depends on the system package manager for its development dependencies.
+
+Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
+
+### Build system
+
+**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.
+
+Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
+
+To build for the first time:
+
+```bash
+$ ./configure.py
+$ ninja-build
+```
+
+Afterwards, it is sufficient to just execute Ninja.
+
+The full suite of options for project configuration is available via
+
+```bash
+$ ./configure.py --help
+```
+
+The most important options are:
+
+- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.
+
+- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
+
+Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
+
+To save time -- for instance, to avoid compiling all unit tests -- you can also specify specific targets to Ninja. For example,
+
+```bash
+$ ninja-build build/release/tests/schema_change_test
+```
+
+### Unit testing
+
+Unit tests live in the `/tests` directory. As with application source files, test sources and executables are specified manually in `configure.py`, which needs to be updated when tests are added or removed.
+
+A test target can be any executable. A non-zero return code indicates test failure.
+
+Most tests in the Scylla repository are built using the [Boost.Test](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/index.html) library. Utilities for writing tests with Seastar futures are also included.
+
+Run all tests through the test execution wrapper with
+
+```bash
+$ ./test.py --mode={debug,release}
+```
+
+The `--name` argument can be specified to run a particular test.
+
+Alternatively, you can execute the test executable directly. For example,
+
+```bash
+$ build/release/tests/row_cache_test -- -c1 -m1G
+```
+
+The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread and 1 GB of memory.
+
+### Preparing patches
+
+All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
+
+Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/).
+
+### Running Scylla
+
+Once Scylla has been compiled, executing the (`debug` or `release`) target will start a running instance in the foreground:
+
+```bash
+$ build/release/scylla
+```
+
+The `scylla` executable requires a configuration file, `scylla.yaml`. By default, this is read from `$SCYLLA_HOME/conf/scylla.yaml`. A good starting point for development is located in the repository at `/conf/scylla.yaml`.
+
+For development, a directory at `$HOME/scylla` can be used for all Scylla-related files:
+
+```bash
+$ mkdir -p $HOME/scylla $HOME/scylla/conf
+$ cp conf/scylla.yaml $HOME/scylla/conf/scylla.yaml
+$ # Edit configuration options as appropriate
+$ SCYLLA_HOME=$HOME/scylla build/release/scylla
+```
+
+The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories` and `commitlog_directory` fields as appropriate.
+
+Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.
+
+Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.
+
+On a development machine, one might run Scylla as
+
+```bash
+$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
+```
+
+### Branches and tags
+
+Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.
+
+Similarly, tags are used to pin-point precise release versions, including hot-fix versions like 1.5.4. These are named `scylla-1.5.4`, for example.
+
+Most development happens on the `master` branch. Release branches are cut from `master` based on time and/or features. When a patch against `master` fixes a serious issue like a node crash or data loss, it is backported to a particular release branch with `git cherry-pick` by the project maintainers.
+
+## Example: development on Fedora 25
+
+This section describes one possible work-flow for developing Scylla on a Fedora 25 system. It is presented as an example to help you to develop a work-flow and tools that you are comfortable with.
+
+### Preface
+
+This guide is written from the perspective of a fictitious developer, Taylor Smith.
+
+### Git work-flow
+
+Having two Git remotes is useful:
+
+- A public clone of Scylla (`"public"`)
+- A private clone of Scylla (`"private"`) for in-progress work or work that is not yet ready to share
+
+The first step to contributing a change to Scylla is to create a local branch dedicated to it. For example, a branch for a change that fixes a bug in the CQL statement for creating tables could be called `ts/cql_create_table_error/v1`. The branch name is prefixed by the developer's initials and has a suffix indicating that this is the first version. The version suffix is useful when branches are shared publicly and changes are requested on the mailing list. Having a branch for each version of the patch (or patch set) shared publicly makes it easier to reference and compare the history of a change.
+
+Setting the upstream branch of your development branch to `master` is a useful way to track your changes. You can do this with
+
+```bash
+$ git branch -u master ts/cql_create_table_error/v1
+```
+
+As a patch set is developed, you can periodically push the branch to the private remote to back up work.
+
+Once the patch set is ready to be reviewed, push the branch to the public remote and prepare an email to the `scylladb-dev` mailing list. Including a link to the branch on your public remote allows reviewers to quickly test and explore your changes.
+
+### Development environment and source code navigation
+
+Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
+
+[CLion](https://www.jetbrains.com/clion/) is a commercial IDE that offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
+
+Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
+
+To use the `CMakeLists.txt` file with these programs, define `FOR_IDE` either as a CMake variable or as a shell environment variable.
+
+[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many of the same issues as CLion's.
+
+### Distributed compilation: `distcc` and `ccache`
+
+Scylla's compilation times can be long. Two tools help somewhat:
+
+- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
+- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
+
+A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.
+
+Having a direct wired connection between the machines ensures that object files can be transmitted quickly and limits the overhead of remote compilation.
+The coordinator has been assigned the static IP address `10.0.0.1` and the passive compilation machine has been assigned `10.0.0.2`.
+
+On Fedora, installing the `ccache` package places symbolic links for `gcc` and `g++` in the `PATH`. This allows normal compilation to transparently invoke `ccache` for compilation and cache object files on the local file-system.
+
+Next, set `CCACHE_PREFIX` so that `ccache` is responsible for invoking `distcc` as necessary:
+
+```bash
+export CCACHE_PREFIX="distcc"
+```
+
+On each host, edit `/etc/sysconfig/distccd` to include the allowed coordinators and the total number of jobs that the machine should accept.
+This example is for the laptop, which has 2 physical cores (4 logical cores with hyper-threading):
+
+```
+OPTIONS="--allow 10.0.0.2 --allow 127.0.0.1 --jobs 4"
+```
+
+`10.0.0.2` has 8 physical cores (16 logical cores) and 64 GB of memory.
+
+As a rule of thumb, a machine should be configured to accept as many jobs as it has native threads.
+
+Restart the `distccd` service on all machines.
+
+On the coordinator machine, edit `$HOME/.distcc/hosts` with the available hosts for compilation. Order of the hosts indicates preference.
+
+```
+10.0.0.2/16 localhost/2
+```
+
+In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine will be sent up to 2. Allowing for two extra threads on the host machine for coordination, we run compilation with `16 + 2 + 2 = 20` jobs in total: `ninja-build -j20`.
+
+When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.
+
+One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section for speeding up this process.
+
+### Using the `gold` linker
+
+Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds up the linking process. On Fedora, you can switch the system linker using
+
+```bash
+$ sudo alternatives --config ld
+```
+
+### Testing changes in Seastar with Scylla
+
+Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
+
+One way to do this is to create a local remote for the Seastar submodule in the Scylla repository:
+
+```bash
+$ cd $HOME/src/scylla
+$ cd seastar
+$ git remote add local /home/tsmith/src/seastar
+$ git remote update
+$ git checkout -t local/my_local_seastar_branch
+```
--- a/README.md
+++ b/README.md
@@ -1,29 +1,19 @@
 # Scylla

-## Building Scylla
+## Quick-start

-In addition to required packages by Seastar, the following packages are required by Scylla.
-
-### Submodules
-Scylla uses submodules, so make sure you pull the submodules first by doing:
-```
-git submodule init
-git submodule update --init --recursive
+```bash
+$ git submodule update --init --recursive
+$ sudo ./install-dependencies.sh
+$ ./configure.py --mode=release
+$ ninja-build -j4 # Assuming 4 system threads.
+$ ./build/release/scylla
+$ # Rejoice!
 ```

-### Building and Running Scylla on Fedora
-* Installing required packages:
+Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.

-```
-sudo dnf install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel protobuf-devel protobuf-compiler systemd-devel libunwind-devel
-```
-
-* Build Scylla
-```
-./configure.py --mode=release --with=scylla --disable-xen
-ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
-
-```
+## Running Scylla

 * Run Scylla
 ```
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 #!/bin/sh

-VERSION=2.0.4
+VERSION=2.1.6

 if test -f version
 then
--- a/api/api-doc/storage_service.json
+++ b/api/api-doc/storage_service.json
@@ -952,6 +952,22 @@
            }
         ]
      },
+      {
+         "path":"/storage_service/force_terminate_repair",
+         "operations":[
+            {
+               "method":"POST",
+               "summary":"Force terminate all repair sessions",
+               "type":"void",
+               "nickname":"force_terminate_all_repair_sessions_new",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[
+               ]
+            }
+         ]
+      },
      {
         "path":"/storage_service/decommission",
         "operations":[
--- a/api/api.cc
+++ b/api/api.cc
@@ -49,7 +49,7 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
        throw bad_param_exception(ex.what());
    }
    // We never going to get here
-    return std::make_unique<reply>();
+    throw std::runtime_error("exception_reply");
 }

 future<> set_server_init(http_context& ctx) {
--- a/api/compaction_manager.cc
+++ b/api/compaction_manager.cc
@@ -20,6 +20,7 @@
 */

 #include "compaction_manager.hh"
+#include "sstables/compaction_manager.hh"
 #include "api/api-doc/compaction_manager.json.hh"
 #include "db/system_keyspace.hh"
 #include "column_family.hh"
--- a/api/storage_proxy.cc
+++ b/api/storage_proxy.cc
@@ -397,7 +397,7 @@ void set_storage_proxy(http_context& ctx, routes& r) {
    });

    sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
-        return sum_timer_stats(ctx.sp, &proxy::stats::read);
+        return sum_timer_stats(ctx.sp, &proxy::stats::range);
    });

    sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -34,6 +34,7 @@
 #include "column_family.hh"
 #include "log.hh"
 #include "release.hh"
+#include "sstables/compaction_manager.hh"

 namespace api {

@@ -361,16 +362,22 @@ void set_storage_service(http_context& ctx, routes& r) {
            try {
                res = fut.get0();
            } catch(std::runtime_error& e) {
-                return make_ready_future<json::json_return_type>(json_exception(httpd::bad_param_exception(e.what())));
+                throw httpd::bad_param_exception(e.what());
            }
            return make_ready_future<json::json_return_type>(json::json_return_type(res));
        });
    });

    ss::force_terminate_all_repair_sessions.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(json_void());
+        return repair_abort_all(service::get_local_storage_service().db()).then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
+    });
+
+    ss::force_terminate_all_repair_sessions_new.set(r, [](std::unique_ptr<request> req) {
+        return repair_abort_all(service::get_local_storage_service().db()).then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
    });

    ss::decommission.set(r, [](std::unique_ptr<request> req) {
--- a/atomic_cell.hh
+++ b/atomic_cell.hh
@@ -269,7 +269,7 @@ public:
    }
    // Can be called on live and dead cells
    bool has_expired(gc_clock::time_point now) const {
-        return is_live_and_has_ttl() && expiry() < now;
+        return is_live_and_has_ttl() && expiry() <= now;
    }
    bytes_view serialize() const {
        return _data;
--- a/auth/allow_all_authenticator.cc
+++ b/auth/allow_all_authenticator.cc
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "auth/allow_all_authenticator.hh"
+
+#include "service/migration_manager.hh"
+#include "utils/class_registrator.hh"
+
+namespace auth {
+
+const sstring& allow_all_authenticator_name() {
+    static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthenticator";
+    return name;
+}
+
+// To ensure correct initialization order, we unfortunately need to use a string literal.
+static const class_registrator<
+        authenticator,
+        allow_all_authenticator,
+        cql3::query_processor&,
+        ::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");
+
+}
--- a/auth/allow_all_authenticator.hh
+++ b/auth/allow_all_authenticator.hh
@@ -0,0 +1,97 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <stdexcept>
+
+#include "auth/authenticator.hh"
+#include "auth/authenticated_user.hh"
+#include "auth/common.hh"
+
+namespace cql3 {
+class query_processor;
+}
+
+namespace service {
+class migration_manager;
+}
+
+namespace auth {
+
+const sstring& allow_all_authenticator_name();
+
+class allow_all_authenticator final : public authenticator {
+public:
+    allow_all_authenticator(cql3::query_processor&, ::service::migration_manager&) {
+    }
+
+    future<> start() override {
+        return make_ready_future<>();
+    }
+
+    future<> stop() override {
+        return make_ready_future<>();
+    }
+
+    const sstring& qualified_java_name() const override {
+        return allow_all_authenticator_name();
+    }
+
+    bool require_authentication() const override {
+        return false;
+    }
+
+    option_set supported_options() const override {
+        return option_set();
+    }
+
+    option_set alterable_options() const override {
+        return option_set();
+    }
+
+    future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const override {
+        return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
+    }
+
+    future<> create(sstring username, const option_map& options) override {
+        return make_ready_future();
+    }
+
+    future<> alter(sstring username, const option_map& options) override {
+        return make_ready_future();
+    }
+
+    future<> drop(sstring username) override {
+        return make_ready_future();
+    }
+
+    const resource_ids& protected_resources() const override {
+        static const resource_ids ids;
+        return ids;
+    }
+
+    ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
+        throw std::runtime_error("Should not reach");
+    }
+};
+
+}
--- a/auth/allow_all_authorizer.cc
+++ b/auth/allow_all_authorizer.cc
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "auth/allow_all_authorizer.hh"
+
+#include "auth/common.hh"
+#include "utils/class_registrator.hh"
+
+namespace auth {
+
+const sstring& allow_all_authorizer_name() {
+    static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";
+    return name;
+}
+
+// To ensure correct initialization order, we unfortunately need to use a string literal.
+static const class_registrator<
+    authorizer,
+    allow_all_authorizer,
+    cql3::query_processor&,
+    ::service::migration_manager&> registration("org.apache.cassandra.auth.AllowAllAuthorizer");
+
+}
--- a/auth/allow_all_authorizer.hh
+++ b/auth/allow_all_authorizer.hh
@@ -0,0 +1,98 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "authorizer.hh"
+#include "exceptions/exceptions.hh"
+#include "stdx.hh"
+
+namespace cql3 {
+class query_processor;
+}
+
+namespace service {
+class migration_manager;
+}
+
+namespace auth {
+
+class service;
+
+const sstring& allow_all_authorizer_name();
+
+class allow_all_authorizer final  : public authorizer {
+public:
+    allow_all_authorizer(cql3::query_processor&, ::service::migration_manager&) {
+    }
+
+    future<> start() override {
+        return make_ready_future<>();
+    }
+
+    future<> stop() override {
+        return make_ready_future<>();
+    }
+
+    const sstring& qualified_java_name() const override {
+        return allow_all_authorizer_name();
+    }
+
+    future<permission_set> authorize(service&, ::shared_ptr<authenticated_user>, data_resource) const override {
+        return make_ready_future<permission_set>(permissions::ALL);
+    }
+
+    future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
+        throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
+    }
+
+    future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
+        throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
+    }
+
+    future<std::vector<permission_details>> list(
+            service&,
+            ::shared_ptr<authenticated_user> performer,
+            permission_set,
+            stdx::optional<data_resource>,
+            stdx::optional<sstring>) const override {
+        throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
+    }
+
+    future<> revoke_all(sstring dropped_user) override {
+        return make_ready_future();
+    }
+
+    future<> revoke_all(data_resource) override {
+        return make_ready_future();
+    }
+
+    const resource_ids& protected_resources() override {
+        static const resource_ids ids;
+        return ids;
+    }
+
+    future<> validate_configuration() const override {
+        return make_ready_future();
+    }
+};
+
+}
--- a/auth/auth.cc
+++ b/auth/auth.cc
@@ -1,384 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-/*
- * Copyright (C) 2016 ScyllaDB
- *
- * Modified by ScyllaDB
- */
-
-/*
- * This file is part of Scylla.
- *
- * Scylla is free software: you can redistribute it and/or modify
- * it under the terms of the GNU Affero General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * Scylla is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
- */
-#include <seastar/core/sleep.hh>
-
-#include <seastar/core/distributed.hh>
-
-#include "auth.hh"
-#include "authenticator.hh"
-#include "authorizer.hh"
-#include "database.hh"
-#include "cql3/query_processor.hh"
-#include "cql3/statements/raw/cf_statement.hh"
-#include "cql3/statements/create_table_statement.hh"
-#include "db/config.hh"
-#include "service/migration_manager.hh"
-#include "utils/loading_cache.hh"
-#include "utils/hash.hh"
-
-const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
-const sstring auth::auth::AUTH_KS("system_auth");
-const sstring auth::auth::USERS_CF("users");
-
-static const sstring USER_NAME("name");
-static const sstring SUPER("super");
-
-static logging::logger alogger("auth");
-
-// TODO: configurable
-using namespace std::chrono_literals;
-const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
-
-class auth_migration_listener : public service::migration_listener {
-    void on_create_keyspace(const sstring& ks_name) override {}
-    void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
-    void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
-    void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
-    void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
-    void on_create_view(const sstring& ks_name, const sstring& view_name) override {}
-
-    void on_update_keyspace(const sstring& ks_name) override {}
-    void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
-    void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
-    void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
-    void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
-    void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
-
-    void on_drop_keyspace(const sstring& ks_name) override {
-        auth::authorizer::get().revoke_all(auth::data_resource(ks_name));
-    }
-    void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
-        auth::authorizer::get().revoke_all(auth::data_resource(ks_name, cf_name));
-    }
-    void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
-    void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
-    void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
-    void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
-};
-
-static auth_migration_listener auth_migration;
-
-namespace std {
-template <>
-struct hash<auth::data_resource> {
-    size_t operator()(const auth::data_resource & v) const {
-        return v.hash_value();
-    }
-};
-
-template <>
-struct hash<auth::authenticated_user> {
-    size_t operator()(const auth::authenticated_user & v) const {
-        return utils::tuple_hash()(v.name(), v.is_anonymous());
-    }
-};
-}
-
-class auth::auth::permissions_cache {
-public:
-    typedef utils::loading_cache<std::pair<authenticated_user, data_resource>, permission_set, utils::loading_cache_reload_enabled::yes, utils::simple_entry_size<permission_set>, utils::tuple_hash> cache_type;
-    typedef typename cache_type::key_type key_type;
-
-    permissions_cache()
-                    : permissions_cache(
-                                    cql3::get_local_query_processor().db().local().get_config()) {
-    }
-
-    permissions_cache(const db::config& cfg)
-                    : _cache(cfg.permissions_cache_max_entries(), std::chrono::milliseconds(cfg.permissions_validity_in_ms()), std::chrono::milliseconds(cfg.permissions_update_interval_in_ms()), alogger,
-                        [] (const key_type& k) {
-                            alogger.debug("Refreshing permissions for {}", k.first.name());
-                            return authorizer::get().authorize(::make_shared<authenticated_user>(k.first), k.second);
-                        }) {}
-
-    future<> stop() {
-        return _cache.stop();
-    }
-
-    future<permission_set> get(::shared_ptr<authenticated_user> user, data_resource resource) {
-        return _cache.get(key_type(*user, std::move(resource)));
-    }
-
-private:
-    cache_type _cache;
-};
-
-namespace std { // for ADL, yuch
-
-std::ostream& operator<<(std::ostream& os, const std::pair<auth::authenticated_user, auth::data_resource>& p) {
-    os << "{user: " << p.first.name() << ", data_resource: " << p.second << "}";
-    return os;
-}
-
-}
-
-static distributed<auth::auth::permissions_cache> perm_cache;
-
-/**
- * Poor mans job schedule. For maximum 2 jobs. Sic.
- * Still does nothing more clever than waiting 10 seconds
- * like origin, then runs the submitted tasks.
- *
- * Only difference compared to sleep (from which this
- * borrows _heavily_) is that if tasks have not run by the time
- * we exit (and do static clean up) we delete the promise + cont
- *
- * Should be abstracted to some sort of global server function
- * probably.
- */
-struct waiter {
-    promise<> done;
-    timer<> tmr;
-    waiter() : tmr([this] {done.set_value();})
-    {
-        tmr.arm(auth::auth::SUPERUSER_SETUP_DELAY);
-    }
-    ~waiter() {
-        if (tmr.armed()) {
-            tmr.cancel();
-            done.set_exception(std::runtime_error("shutting down"));
-        }
-        alogger.trace("Deleting scheduled task");
-    }
-    void kill() {
-    }
-};
-
-typedef std::unique_ptr<waiter> waiter_ptr;
-
-static std::vector<waiter_ptr> & thread_waiters() {
-    static thread_local std::vector<waiter_ptr> the_waiters;
-    return the_waiters;
-}
-
-void auth::auth::schedule_when_up(scheduled_func f) {
-    alogger.trace("Adding scheduled task");
-
-    auto & waiters = thread_waiters();
-
-    waiters.emplace_back(std::make_unique<waiter>());
-    auto* w = waiters.back().get();
-
-    w->done.get_future().finally([w] {
-        auto & waiters = thread_waiters();
-        auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
-                            return p.get() == w;
-                        });
-        if (i != waiters.end()) {
-            waiters.erase(i);
-        }
-    }).then([f = std::move(f)] {
-        alogger.trace("Running scheduled task");
-        return f();
-    }).handle_exception([](auto ep) {
-        return make_ready_future();
-    });
-}
-
-bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
-    if (type == classname) {
-        return true;
-    }
-    auto i = classname.find_last_of('.');
-    return classname.compare(i + 1, sstring::npos, type) == 0;
-}
-
-future<> auth::auth::setup() {
-    auto& db = cql3::get_local_query_processor().db().local();
-    auto& cfg = db.get_config();
-
-    future<> f = perm_cache.start();
-
-    if (is_class_type(cfg.authenticator(),
-                    authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)
-                    && is_class_type(cfg.authorizer(),
-                                    authorizer::ALLOW_ALL_AUTHORIZER_NAME)
-                                    ) {
-        // just create the objects
-        return f.then([&cfg] {
-            return authenticator::setup(cfg.authenticator());
-        }).then([&cfg] {
-            return authorizer::setup(cfg.authorizer());
-        });
-    }
-
-    if (!db.has_keyspace(AUTH_KS)) {
-        std::map<sstring, sstring> opts;
-        opts["replication_factor"] = "1";
-        auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
-        // We use min_timestamp so that default keyspace metadata will loose with any manual adjustments. See issue #2129.
-        f = service::get_local_migration_manager().announce_new_keyspace(ksm, api::min_timestamp, false);
-    }
-
-    return f.then([] {
-        return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
-                                        AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
-                                        90 * 24 * 60 * 60)); // 3 months.
-    }).then([&cfg] {
-        return authenticator::setup(cfg.authenticator());
-    }).then([&cfg] {
-        return authorizer::setup(cfg.authorizer());
-    }).then([] {
-        service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
-        // instead of once-timer, just schedule this later
-        schedule_when_up([] {
-            // setup default super user
-            return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
-                if (!exists) {
-                    auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
-                                    AUTH_KS, USERS_CF, USER_NAME, SUPER);
-                    cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
-                        alogger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
-                    }).handle_exception([](auto ep) {
-                        try {
-                            std::rethrow_exception(ep);
-                        } catch (exceptions::request_execution_exception&) {
-                            alogger.warn("Skipped default superuser setup: some nodes were not ready");
-                        }
-                    });
-                }
-            });
-        });
-    });
-}
-
-future<> auth::auth::shutdown() {
-    // just make sure we don't have pending tasks.
-    // this is mostly relevant for test cases where
-    // db-env-shutdown != process shutdown
-    return smp::invoke_on_all([] {
-        thread_waiters().clear();
-    }).then([] {
-        return perm_cache.stop();
-    });
-}
-
-future<auth::permission_set> auth::auth::get_permissions(::shared_ptr<authenticated_user> user, data_resource resource) {
-    return perm_cache.local().get(std::move(user), std::move(resource));
-}
-
-static db::consistency_level consistency_for_user(const sstring& username) {
-    if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
-        return db::consistency_level::QUORUM;
-    }
-    return db::consistency_level::LOCAL_ONE;
-}
-
-static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
-    // Here was a thread local, explicit cache of prepared statement. In normal execution this is
-    // fine, but since we in testing set up and tear down system over and over, we'd start using
-    // obsolete prepared statements pretty quickly.
-    // Rely on query processing caching statements instead, and lets assume
-    // that a map lookup string->statement is not gonna kill us much.
-    return cql3::get_local_query_processor().process(
-                    sprint("SELECT * FROM %s.%s WHERE %s = ?",
-                                    auth::auth::AUTH_KS, auth::auth::USERS_CF,
-                                    USER_NAME), consistency_for_user(username),
-                    { username }, true);
-}
-
-future<bool> auth::auth::is_existing_user(const sstring& username) {
-    return select_user(username).then(
-                    [](::shared_ptr<cql3::untyped_result_set> res) {
-                        return make_ready_future<bool>(!res->empty());
-                    });
-}
-
-future<bool> auth::auth::is_super_user(const sstring& username) {
-    return select_user(username).then(
-                    [](::shared_ptr<cql3::untyped_result_set> res) {
-                        return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
-                    });
-}
-
-future<> auth::auth::insert_user(const sstring& username, bool is_super) {
-    return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
-                    AUTH_KS, USERS_CF, USER_NAME, SUPER),
-                    consistency_for_user(username), { username, is_super }).discard_result();
-}
-
-future<> auth::auth::delete_user(const sstring& username) {
-    return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
-                    AUTH_KS, USERS_CF, USER_NAME),
-                    consistency_for_user(username), { username }).discard_result();
-}
-
-future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
-    auto& qp = cql3::get_local_query_processor();
-    auto& db = qp.db().local();
-
-    if (db.has_schema(AUTH_KS, name)) {
-        return make_ready_future();
-    }
-
-    ::shared_ptr<cql3::statements::raw::cf_statement> parsed = static_pointer_cast<
-                    cql3::statements::raw::cf_statement>(cql3::query_processor::parse_statement(cql));
-    parsed->prepare_keyspace(AUTH_KS);
-    ::shared_ptr<cql3::statements::create_table_statement> statement =
-                    static_pointer_cast<cql3::statements::create_table_statement>(
-                                    parsed->prepare(db, qp.get_cql_stats())->statement);
-    auto schema = statement->get_cf_meta_data();
-    auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
-
-    schema_builder b(schema);
-    b.set_uuid(uuid);
-    return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
-}
-
-future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
-    auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
-    auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
-
-    return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
-        if (!res->empty()) {
-            return make_ready_future<bool>(true);
-        }
-        return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
-            if (!res->empty()) {
-                return make_ready_future<bool>(true);
-            }
-            return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
-                return make_ready_future<bool>(!res->empty());
-            });
-        });
-    });
-}
-
--- a/auth/auth.hh
+++ b/auth/auth.hh
@@ -1,125 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-/*
- * Copyright (C) 2016 ScyllaDB
- *
- * Modified by ScyllaDB
- */
-
-/*
- * This file is part of Scylla.
- *
- * Scylla is free software: you can redistribute it and/or modify
- * it under the terms of the GNU Affero General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * Scylla is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#pragma once
-
-#include <chrono>
-#include <seastar/core/sstring.hh>
-#include <seastar/core/future.hh>
-#include <seastar/core/shared_ptr.hh>
-
-
-#include "exceptions/exceptions.hh"
-#include "permission.hh"
-#include "data_resource.hh"
-#include "authenticated_user.hh"
-
-namespace auth {
-
-class auth {
-public:
-    class permissions_cache;
-
-    static const sstring DEFAULT_SUPERUSER_NAME;
-    static const sstring AUTH_KS;
-    static const sstring USERS_CF;
-    static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
-
-    static bool is_class_type(const sstring& type, const sstring& classname);
-
-    static future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource);
-
-    /**
-     * Checks if the username is stored in AUTH_KS.USERS_CF.
-     *
-     * @param username Username to query.
-     * @return whether or not Cassandra knows about the user.
-     */
-    static future<bool> is_existing_user(const sstring& username);
-
-    /**
-     * Checks if the user is a known superuser.
-     *
-     * @param username Username to query.
-     * @return true is the user is a superuser, false if they aren't or don't exist at all.
-     */
-    static future<bool> is_super_user(const sstring& username);
-
-    /**
-     * Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
-     *
-     * @param username Username to insert.
-     * @param isSuper User's new status.
-     * @throws RequestExecutionException
-     */
-    static future<> insert_user(const sstring& username, bool is_super);
-
-    /**
-     * Deletes the user from AUTH_KS.USERS_CF.
-     *
-     * @param username Username to delete.
-     * @throws RequestExecutionException
-     */
-    static future<> delete_user(const sstring& username);
-
-    /**
-     * Sets up Authenticator and Authorizer.
-     */
-    static future<> setup();
-    static future<> shutdown();
-
-    /**
-     * Set up table from given CREATE TABLE statement under system_auth keyspace, if not already done so.
-     *
-     * @param name name of the table
-     * @param cql CREATE TABLE statement
-     */
-    static future<> setup_table(const sstring& name, const sstring& cql);
-
-    static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
-
-    // For internal use. Run function "when system is up".
-    typedef std::function<future<>()> scheduled_func;
-    static void schedule_when_up(scheduled_func);
-};
-}
-
-std::ostream& operator<<(std::ostream& os, const std::pair<auth::authenticated_user, auth::data_resource>& p);
--- a/auth/authenticated_user.cc
+++ b/auth/authenticated_user.cc
@@ -41,7 +41,6 @@


 #include "authenticated_user.hh"
-#include "auth.hh"

 const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");

@@ -60,13 +59,6 @@ const sstring& auth::authenticated_user::name() const {
    return _anon ? ANONYMOUS_USERNAME : _name;
 }

-future<bool> auth::authenticated_user::is_super() const {
-    if (is_anonymous()) {
-        return make_ready_future<bool>(false);
-    }
-    return auth::auth::is_super_user(_name);
-}
-
 bool auth::authenticated_user::operator==(const authenticated_user& v) const {
    return _anon ? v._anon : _name == v._name;
 }
--- a/auth/authenticated_user.hh
+++ b/auth/authenticated_user.hh
@@ -58,14 +58,6 @@ public:

    const sstring& name() const;

-    /**
-     * Checks the user's superuser status.
-     * Only a superuser is allowed to perform CREATE USER and DROP USER queries.
-     * Im most cased, though not necessarily, a superuser will have Permission.ALL on every resource
-     * (depends on IAuthorizer implementation).
-     */
-    future<bool> is_super() const;
-
    /**
     * If IAuthenticator doesn't require authentication, this method may return true.
     */
--- a/auth/authenticator.cc
+++ b/auth/authenticator.cc
@@ -41,13 +41,14 @@

 #include "authenticator.hh"
 #include "authenticated_user.hh"
+#include "common.hh"
 #include "password_authenticator.hh"
-#include "auth.hh"
+#include "cql3/query_processor.hh"
 #include "db/config.hh"
+#include "utils/class_registrator.hh"

 const sstring auth::authenticator::USERNAME_KEY("username");
 const sstring auth::authenticator::PASSWORD_KEY("password");
-const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");

 auth::authenticator::option auth::authenticator::string_to_option(const sstring& name) {
    if (strcasecmp(name.c_str(), "password") == 0) {
@@ -64,64 +65,3 @@ sstring auth::authenticator::option_to_string(option opt) {
        throw std::invalid_argument(sprint("Unknown option {}", opt));
    }
 }
-
-/**
- * Authenticator is assumed to be a fully state-less immutable object (note all the const).
- * We thus store a single instance globally, since it should be safe/ok.
- */
-static std::unique_ptr<auth::authenticator> global_authenticator;
-
-future<>
-auth::authenticator::setup(const sstring& type) {
-    if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
-        class allow_all_authenticator : public authenticator {
-        public:
-            const sstring& class_name() const override {
-                return ALLOW_ALL_AUTHENTICATOR_NAME;
-            }
-            bool require_authentication() const override {
-                return false;
-            }
-            option_set supported_options() const override {
-                return option_set();
-            }
-            option_set alterable_options() const override {
-                return option_set();
-            }
-            future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const override {
-                return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
-            }
-            future<> create(sstring username, const option_map& options) override {
-                return make_ready_future();
-            }
-            future<> alter(sstring username, const option_map& options) override {
-                return make_ready_future();
-            }
-            future<> drop(sstring username) override {
-                return make_ready_future();
-            }
-            const resource_ids& protected_resources() const override {
-                static const resource_ids ids;
-                return ids;
-            }
-            ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
-                throw std::runtime_error("Should not reach");
-            }
-        };
-        global_authenticator = std::make_unique<allow_all_authenticator>();
-    } else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
-        auto pwa = std::make_unique<password_authenticator>();
-        auto f = pwa->init();
-        return f.then([pwa = std::move(pwa)]() mutable {
-            global_authenticator = std::move(pwa);
-        });
-    } else {
-        throw exceptions::configuration_exception("Invalid authenticator type: " + type);
-    }
-    return make_ready_future();
-}
-
-auth::authenticator& auth::authenticator::get() {
-    assert(global_authenticator);
-    return *global_authenticator;
-}
--- a/auth/authenticator.hh
+++ b/auth/authenticator.hh
@@ -69,7 +69,6 @@ class authenticator {
 public:
    static const sstring USERNAME_KEY;
    static const sstring PASSWORD_KEY;
-    static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;

    /**
     * Supported CREATE USER/ALTER USER options.
@@ -86,23 +85,14 @@ public:
    using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
    using credentials_map = std::unordered_map<sstring, sstring>;

-    /**
-     * Setup is called once upon system startup to initialize the IAuthenticator.
-     *
-     * For example, use this method to create any required keyspaces/column families.
-     * Note: Only call from main thread.
-     */
-    static future<> setup(const sstring& type);
-
-    /**
-     * Returns the system authenticator. Must have called setup before calling this.
-     */
-    static authenticator& get();
-
    virtual ~authenticator()
    {}

-    virtual const sstring& class_name() const = 0;
+    virtual future<> start() = 0;
+
+    virtual future<> stop() = 0;
+
+    virtual const sstring& qualified_java_name() const = 0;

    /**
     * Whether or not the authenticator requires explicit login.
--- a/auth/authorizer.cc
+++ b/auth/authorizer.cc
@@ -41,23 +41,39 @@

 #include "authorizer.hh"
 #include "authenticated_user.hh"
+#include "common.hh"
 #include "default_authorizer.hh"
 #include "auth.hh"
+#include "cql3/query_processor.hh"
 #include "db/config.hh"
+#include "utils/class_registrator.hh"

-const sstring auth::authorizer::ALLOW_ALL_AUTHORIZER_NAME("org.apache.cassandra.auth.AllowAllAuthorizer");
+const sstring& auth::allow_all_authorizer_name() {
+    static const sstring name = meta::AUTH_PACKAGE_NAME + "AllowAllAuthorizer";
+    return name;
+}

 /**
 * Authenticator is assumed to be a fully state-less immutable object (note all the const).
 * We thus store a single instance globally, since it should be safe/ok.
 */
 static std::unique_ptr<auth::authorizer> global_authorizer;
+using authorizer_registry = class_registry<auth::authorizer, cql3::query_processor&>;

 future<>
 auth::authorizer::setup(const sstring& type) {
-    if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHORIZER_NAME)) {
+    if (type == allow_all_authorizer_name()) {
        class allow_all_authorizer : public authorizer {
        public:
+            future<> start() override {
+                return make_ready_future<>();
+            }
+            future<> stop() override {
+                return make_ready_future<>();
+            }
+            const sstring& qualified_java_name() const override {
+                return allow_all_authorizer_name();
+            }
            future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override {
                return make_ready_future<permission_set>(permissions::ALL);
            }
@@ -86,16 +102,14 @@ auth::authorizer::setup(const sstring& type) {
        };

        global_authorizer = std::make_unique<allow_all_authorizer>();
-    } else if (auth::auth::is_class_type(type, default_authorizer::DEFAULT_AUTHORIZER_NAME)) {
-        auto da = std::make_unique<default_authorizer>();
-        auto f = da->init();
-        return f.then([da = std::move(da)]() mutable {
-            global_authorizer = std::move(da);
-        });
+        return make_ready_future();
    } else {
-        throw exceptions::configuration_exception("Invalid authorizer type: " + type);
+        auto a = authorizer_registry::create(type, cql3::get_local_query_processor());
+        auto f = a->start();
+        return f.then([a = std::move(a)]() mutable {
+            global_authorizer = std::move(a);
+        });
    }
-    return make_ready_future();
 }

 auth::authorizer& auth::authorizer::get() {
--- a/auth/authorizer.hh
+++ b/auth/authorizer.hh
@@ -55,6 +55,8 @@

 namespace auth {

+class service;
+
 class authenticated_user;

 struct permission_details {
@@ -71,10 +73,14 @@ using std::experimental::optional;

 class authorizer {
 public:
-    static const sstring ALLOW_ALL_AUTHORIZER_NAME;
-
    virtual ~authorizer() {}

+    virtual future<> start() = 0;
+
+    virtual future<> stop() = 0;
+
+    virtual const sstring& qualified_java_name() const = 0;
+
    /**
     * The primary Authorizer method. Returns a set of permissions of a user on a resource.
     *
@@ -82,7 +88,7 @@ public:
     * @param resource Resource for which the authorization is being requested. @see DataResource.
     * @return Set of permissions of the user on the resource. Should never return empty. Use permission.NONE instead.
     */
-    virtual future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const = 0;
+    virtual future<permission_set> authorize(service&, ::shared_ptr<authenticated_user>, data_resource) const = 0;

    /**
     * Grants a set of permissions on a resource to a user.
@@ -126,7 +132,7 @@ public:
     * @throws RequestValidationException
     * @throws RequestExecutionException
     */
-    virtual future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
+    virtual future<std::vector<permission_details>> list(service&, ::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;

    /**
     * This method is called before deleting a user with DROP USER query so that a new user with the same
@@ -156,18 +162,6 @@ public:
     * @throws ConfigurationException when there is a configuration error.
     */
    virtual future<> validate_configuration() const = 0;
-
-    /**
-     * Setup is called once upon system startup to initialize the IAuthorizer.
-     *
-     * For example, use this method to create any required keyspaces/column families.
-     */
-    static future<> setup(const sstring& type);
-
-    /**
-     * Returns the system authorizer. Must have called setup before calling this.
-     */
-    static authorizer& get();
 };

 }
--- a/auth/common.cc
+++ b/auth/common.cc
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "auth/common.hh"
+
+#include <seastar/core/shared_ptr.hh>
+
+#include "cql3/query_processor.hh"
+#include "cql3/statements/create_table_statement.hh"
+#include "schema_builder.hh"
+#include "service/migration_manager.hh"
+
+namespace auth {
+
+namespace meta {
+
+const sstring DEFAULT_SUPERUSER_NAME("cassandra");
+const sstring AUTH_KS("system_auth");
+const sstring USERS_CF("users");
+const sstring AUTH_PACKAGE_NAME("org.apache.cassandra.auth.");
+
+}
+
+future<> create_metadata_table_if_missing(
+        const sstring& table_name,
+        cql3::query_processor& qp,
+        const sstring& cql,
+        ::service::migration_manager& mm) {
+    auto& db = qp.db().local();
+
+    if (db.has_schema(meta::AUTH_KS, table_name)) {
+        return make_ready_future<>();
+    }
+
+    auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(
+            cql3::query_processor::parse_statement(cql));
+
+    parsed_statement->prepare_keyspace(meta::AUTH_KS);
+
+    auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
+            parsed_statement->prepare(db, qp.get_cql_stats())->statement);
+
+    const auto schema = statement->get_cf_meta_data();
+    const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
+
+    schema_builder b(schema);
+    b.set_uuid(uuid);
+
+    return mm.announce_new_column_family(b.build(), false);
+}
+
+}
--- a/auth/common.hh
+++ b/auth/common.hh
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <chrono>
+
+#include <seastar/core/future.hh>
+#include <seastar/core/reactor.hh>
+#include <seastar/core/resource.hh>
+#include <seastar/core/sstring.hh>
+
+#include "delayed_tasks.hh"
+#include "seastarx.hh"
+
+namespace service {
+class migration_manager;
+}
+
+namespace cql3 {
+class query_processor;
+}
+
+namespace auth {
+
+namespace meta {
+
+extern const sstring DEFAULT_SUPERUSER_NAME;
+extern const sstring AUTH_KS;
+extern const sstring USERS_CF;
+extern const sstring AUTH_PACKAGE_NAME;
+
+}
+
+template <class Task>
+future<> once_among_shards(Task&& f) {
+    if (engine().cpu_id() == 0u) {
+        return f();
+    }
+
+    return make_ready_future<>();
+}
+
+template <class Task, class Clock>
+void delay_until_system_ready(delayed_tasks<Clock>& ts, Task&& f) {
+    static const typename std::chrono::milliseconds delay_duration(10000);
+    ts.schedule_after(delay_duration, std::forward<Task>(f));
+}
+
+future<> create_metadata_table_if_missing(
+        const sstring& table_name,
+        cql3::query_processor&,
+        const sstring& cql,
+        ::service::migration_manager&);
+
+}
--- a/auth/default_authorizer.cc
+++ b/auth/default_authorizer.cc
@@ -46,16 +46,19 @@

 #include <seastar/core/reactor.hh>

-#include "auth.hh"
+#include "common.hh"
 #include "default_authorizer.hh"
 #include "authenticated_user.hh"
 #include "permission.hh"
 #include "cql3/query_processor.hh"
+#include "cql3/untyped_result_set.hh"
 #include "exceptions/exceptions.hh"
 #include "log.hh"

-const sstring auth::default_authorizer::DEFAULT_AUTHORIZER_NAME(
-                "org.apache.cassandra.auth.CassandraAuthorizer");
+const sstring& auth::default_authorizer_name() {
+    static const sstring name = meta::AUTH_PACKAGE_NAME + "CassandraAuthorizer";
+    return name;
+}

 static const sstring USER_NAME = "username";
 static const sstring RESOURCE_NAME = "resource";
@@ -64,28 +67,47 @@ static const sstring PERMISSIONS_CF = "permissions";

 static logging::logger alogger("default_authorizer");

-auth::default_authorizer::default_authorizer() {
+// To ensure correct initialization order, we unfortunately need to use a string literal.
+static const class_registrator<
+        auth::authorizer,
+        auth::default_authorizer,
+        cql3::query_processor&,
+        ::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.CassandraAuthorizer");
+
+auth::default_authorizer::default_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
+        : _qp(qp)
+        , _migration_manager(mm) {
 }
+
 auth::default_authorizer::~default_authorizer() {
 }

-future<> auth::default_authorizer::init() {
-    sstring create_table = sprint("CREATE TABLE %s.%s ("
+future<> auth::default_authorizer::start() {
+    static const sstring create_table = sprint("CREATE TABLE %s.%s ("
                    "%s text,"
                    "%s text,"
                    "%s set<text>,"
                    "PRIMARY KEY(%s, %s)"
-                    ") WITH gc_grace_seconds=%d", auth::auth::AUTH_KS,
+                    ") WITH gc_grace_seconds=%d", meta::AUTH_KS,
                    PERMISSIONS_CF, USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME,
                    USER_NAME, RESOURCE_NAME, 90 * 24 * 60 * 60); // 3 months.

-    return auth::setup_table(PERMISSIONS_CF, create_table);
+    return auth::once_among_shards([this] {
+        return auth::create_metadata_table_if_missing(
+                PERMISSIONS_CF,
+                _qp,
+                create_table,
+                _migration_manager);
+    });
 }

+future<> auth::default_authorizer::stop() {
+    return make_ready_future<>();
+}

 future<auth::permission_set> auth::default_authorizer::authorize(
-                ::shared_ptr<authenticated_user> user, data_resource resource) const {
-    return user->is_super().then([this, user, resource = std::move(resource)](bool is_super) {
+                service& ser, ::shared_ptr<authenticated_user> user, data_resource resource) const {
+    return auth::is_super_user(ser, *user).then([this, user, resource = std::move(resource)](bool is_super) {
        if (is_super) {
            return make_ready_future<permission_set>(permissions::ALL);
        }
@@ -94,10 +116,9 @@ future<auth::permission_set> auth::default_authorizer::authorize(
         * TOOD: could create actual data type for permission (translating string<->perm),
         * but this seems overkill right now. We still must store strings so...
         */
-        auto& qp = cql3::get_local_query_processor();
        auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?"
-                        , PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
-        return qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
+                        , PERMISSIONS_NAME, meta::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
+        return _qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
                        .then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
            try {
                auto res = f.get0();
@@ -120,11 +141,10 @@ future<> auth::default_authorizer::modify(
                ::shared_ptr<authenticated_user> performer, permission_set set,
                data_resource resource, sstring user, sstring op) {
    // TODO: why does this not check super user?
-    auto& qp = cql3::get_local_query_processor();
    auto query = sprint("UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
-                    auth::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
+                    meta::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
                    PERMISSIONS_NAME, op, USER_NAME, RESOURCE_NAME);
-    return qp.process(query, db::consistency_level::ONE, {
+    return _qp.process(query, db::consistency_level::ONE, {
                    permissions::to_strings(set), user, resource.name() }).discard_result();
 }

@@ -142,15 +162,14 @@ future<> auth::default_authorizer::revoke(
 }

 future<std::vector<auth::permission_details>> auth::default_authorizer::list(
-                ::shared_ptr<authenticated_user> performer, permission_set set,
+                service& ser, ::shared_ptr<authenticated_user> performer, permission_set set,
                optional<data_resource> resource, optional<sstring> user) const {
-    return performer->is_super().then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
+    return auth::is_super_user(ser, *performer).then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
        if (!is_super && (!user || performer->name() != *user)) {
            throw exceptions::unauthorized_exception(sprint("You are not authorized to view %s's permissions", user ? *user : "everyone"));
        }

-        auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF);
-        auto& qp = cql3::get_local_query_processor();
+        auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, meta::AUTH_KS, PERMISSIONS_CF);

        // Oh, look, it is a case where it does not pay off to have
        // parameters to process in an initializer list.
@@ -158,15 +177,15 @@ future<std::vector<auth::permission_details>> auth::default_authorizer::list(

        if (resource && user) {
            query += sprint(" WHERE %s = ? AND %s = ?", USER_NAME, RESOURCE_NAME);
-            f = qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
+            f = _qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
        } else if (resource) {
            query += sprint(" WHERE %s = ? ALLOW FILTERING", RESOURCE_NAME);
-            f = qp.process(query, db::consistency_level::ONE, {resource->name()});
+            f = _qp.process(query, db::consistency_level::ONE, {resource->name()});
        } else if (user) {
            query += sprint(" WHERE %s = ?", USER_NAME);
-            f = qp.process(query, db::consistency_level::ONE, {*user});
+            f = _qp.process(query, db::consistency_level::ONE, {*user});
        } else {
-            f = qp.process(query, db::consistency_level::ONE, {});
+            f = _qp.process(query, db::consistency_level::ONE, {});
        }

        return f.then([set](::shared_ptr<cql3::untyped_result_set> res) {
@@ -188,10 +207,9 @@ future<std::vector<auth::permission_details>> auth::default_authorizer::list(
 }

 future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
-    auto& qp = cql3::get_local_query_processor();
-    auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", auth::AUTH_KS,
+    auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", meta::AUTH_KS,
                    PERMISSIONS_CF, USER_NAME);
-    return qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
+    return _qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
                    [dropped_user](auto ep) {
                        try {
                            std::rethrow_exception(ep);
@@ -202,17 +220,16 @@ future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
 }

 future<> auth::default_authorizer::revoke_all(data_resource resource) {
-    auto& qp = cql3::get_local_query_processor();
    auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
-                    USER_NAME, auth::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
-    return qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
-                    .then_wrapped([resource, &qp](future<::shared_ptr<cql3::untyped_result_set>> f) {
+                    USER_NAME, meta::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
+    return _qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
+                    .then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
        try {
            auto res = f.get0();
-            return parallel_for_each(res->begin(), res->end(), [&qp, res, resource](const cql3::untyped_result_set::row& r) {
+            return parallel_for_each(res->begin(), res->end(), [this, res, resource](const cql3::untyped_result_set::row& r) {
                auto query = sprint("DELETE FROM %s.%s WHERE %s = ? AND %s = ?"
-                                , auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
-                return qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
+                                , meta::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
+                return _qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
                                .discard_result().handle_exception([resource](auto ep) {
                    try {
                        std::rethrow_exception(ep);
@@ -231,7 +248,7 @@ future<> auth::default_authorizer::revoke_all(data_resource resource) {


 const auth::resource_ids& auth::default_authorizer::protected_resources() {
-    static const resource_ids ids({ data_resource(auth::AUTH_KS, PERMISSIONS_CF) });
+    static const resource_ids ids({ data_resource(meta::AUTH_KS, PERMISSIONS_CF) });
    return ids;
 }

--- a/auth/default_authorizer.hh
+++ b/auth/default_authorizer.hh
@@ -41,26 +41,40 @@

 #pragma once

+#include <functional>
+
 #include "authorizer.hh"
+#include "cql3/query_processor.hh"
+#include "service/migration_manager.hh"

 namespace auth {

-class default_authorizer : public authorizer {
-public:
-    static const sstring DEFAULT_AUTHORIZER_NAME;
+const sstring& default_authorizer_name();

-    default_authorizer();
+class default_authorizer : public authorizer {
+    cql3::query_processor& _qp;
+
+    ::service::migration_manager& _migration_manager;
+
+public:
+    default_authorizer(cql3::query_processor&, ::service::migration_manager&);
    ~default_authorizer();

-    future<> init();
+    future<> start() override;

-    future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override;
+    future<> stop() override;
+
+    const sstring& qualified_java_name() const override {
+        return default_authorizer_name();
+    }
+
+    future<permission_set> authorize(service&, ::shared_ptr<authenticated_user>, data_resource) const override;

    future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;

    future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;

-    future<std::vector<permission_details>> list(::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
+    future<std::vector<permission_details>> list(service&, ::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;

    future<> revoke_all(sstring) override;

--- a/auth/password_authenticator.cc
+++ b/auth/password_authenticator.cc
@@ -46,28 +46,42 @@

 #include <seastar/core/reactor.hh>

-#include "auth.hh"
+#include "common.hh"
 #include "password_authenticator.hh"
 #include "authenticated_user.hh"
-#include "cql3/query_processor.hh"
+#include "cql3/untyped_result_set.hh"
 #include "log.hh"
+#include "service/migration_manager.hh"
+#include "utils/class_registrator.hh"

-const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
+const sstring& auth::password_authenticator_name() {
+    static const sstring name = meta::AUTH_PACKAGE_NAME + "PasswordAuthenticator";
+    return name;
+}

 // name of the hash column.
 static const sstring SALTED_HASH = "salted_hash";
 static const sstring USER_NAME = "username";
-static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
-static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
+static const sstring DEFAULT_USER_NAME = auth::meta::DEFAULT_SUPERUSER_NAME;
+static const sstring DEFAULT_USER_PASSWORD = auth::meta::DEFAULT_SUPERUSER_NAME;
 static const sstring CREDENTIALS_CF = "credentials";

 static logging::logger plogger("password_authenticator");

+// To ensure correct initialization order, we unfortunately need to use a string literal.
+static const class_registrator<
+        auth::authenticator,
+        auth::password_authenticator,
+        cql3::query_processor&,
+        ::service::migration_manager&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");
+
 auth::password_authenticator::~password_authenticator()
 {}

-auth::password_authenticator::password_authenticator()
-{}
+auth::password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
+    : _qp(qp)
+    , _migration_manager(mm) {
+}

 // TODO: blowfish
 // Origin uses Java bcrypt library, i.e. blowfish salt
@@ -88,12 +102,10 @@ auth::password_authenticator::password_authenticator()
 // and some old-fashioned random salt generation.

 static constexpr size_t rand_bytes = 16;
+static thread_local crypt_data tlcrypt = { 0, };

 static sstring hashpw(const sstring& pass, const sstring& salt) {
-    // crypt_data is huge. should this be a thread_local static?
-    auto tmp = std::make_unique<crypt_data>();
-    tmp->initialized = 0;
-    auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
+    auto res = crypt_r(pass.c_str(), salt.c_str(), &tlcrypt);
    if (res == nullptr) {
        throw std::system_error(errno, std::system_category());
    }
@@ -122,17 +134,16 @@ static sstring gensalt() {
    sstring salt;

    if (!prefix.empty()) {
-        return prefix + salt;
+        return prefix + input;
    }

-    auto tmp = std::make_unique<crypt_data>();
-    tmp->initialized = 0;
-
    // Try in order:
    // blowfish 2011 fix, blowfish, sha512, sha256, md5
    for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
        salt = pfx + input;
-        if (crypt_r("fisk", salt.c_str(), tmp.get())) {
+        const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
+
+        if (e && (e[0] != '*')) {
            prefix = pfx;
            return salt;
        }
@@ -144,39 +155,52 @@ static sstring hashpw(const sstring& pass) {
    return hashpw(pass, gensalt());
 }

-future<> auth::password_authenticator::init() {
-    gensalt(); // do this once to determine usable hashing
+future<> auth::password_authenticator::start() {
+    return auth::once_among_shards([this] {
+        gensalt(); // do this once to determine usable hashing

-    sstring create_table = sprint(
-                    "CREATE TABLE %s.%s ("
-                                    "%s text,"
-                                    "%s text," // salt + hash + number of rounds
-                                    "options map<text,text>,"// for future extensions
-                                    "PRIMARY KEY(%s)"
-                                    ") WITH gc_grace_seconds=%d",
-                    auth::auth::AUTH_KS,
-                    CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
-                    90 * 24 * 60 * 60); // 3 months.
+        static const sstring create_table = sprint(
+                "CREATE TABLE %s.%s ("
+                "%s text,"
+                "%s text," // salt + hash + number of rounds
+                "options map<text,text>," // for future extensions
+                "PRIMARY KEY(%s)"
+                ") WITH gc_grace_seconds=%d",
+                meta::AUTH_KS,
+                CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
+                90 * 24 * 60 * 60); // 3 months.

-    return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
-        // instead of once-timer, just schedule this later
-        auth::schedule_when_up([] {
-            return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
-                if (!exists) {
-                    cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
-                                                    auth::AUTH_KS,
-                                                    CREDENTIALS_CF,
-                                                    USER_NAME, SALTED_HASH
-                                    ),
-                                    db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
-                                        plogger.info("Created default user '{}'", DEFAULT_USER_NAME);
-                                    });
-                }
+        return auth::create_metadata_table_if_missing(
+                CREDENTIALS_CF,
+                _qp,
+                create_table,
+                _migration_manager).then([this] {
+            auth::delay_until_system_ready(_delayed, [this] {
+                return has_existing_users().then([this](bool existing) {
+                    if (!existing) {
+                        return _qp.process(
+                                sprint(
+                                        "INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
+                                        meta::AUTH_KS,
+                                        CREDENTIALS_CF,
+                                        USER_NAME, SALTED_HASH),
+                                db::consistency_level::ONE,
+                                { DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD) }).then([](auto) {
+                            plogger.info("Created default user '{}'", DEFAULT_USER_NAME);
+                        });
+                    }
+
+                    return make_ready_future<>();
+                });
            });
        });
    });
 }

+future<> auth::password_authenticator::stop() {
+    return make_ready_future<>();
+}
+
 db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
    if (username == DEFAULT_USER_NAME) {
        return db::consistency_level::QUORUM;
@@ -184,8 +208,8 @@ db::consistency_level auth::password_authenticator::consistency_for_user(const s
    return db::consistency_level::LOCAL_ONE;
 }

-const sstring& auth::password_authenticator::class_name() const {
-    return PASSWORD_AUTHENTICATOR_NAME;
+const sstring& auth::password_authenticator::qualified_java_name() const {
+    return password_authenticator_name();
 }

 bool auth::password_authenticator::require_authentication() const {
@@ -218,9 +242,8 @@ future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::au
    // Rely on query processing caching statements instead, and lets assume
    // that a map lookup string->statement is not gonna kill us much.
    return futurize_apply([this, username, password] {
-        auto& qp = cql3::get_local_query_processor();
-        return qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
-                                        auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
+        return _qp.process(sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
+                                        meta::AUTH_KS, CREDENTIALS_CF, USER_NAME),
                        consistency_for_user(username), {username}, true);
    }).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
        try {
@@ -244,9 +267,8 @@ future<> auth::password_authenticator::create(sstring username,
    try {
        auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
        auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
-                        auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
-        auto& qp = cql3::get_local_query_processor();
-        return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
+                        meta::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
+        return _qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
    } catch (std::out_of_range&) {
        throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
    }
@@ -257,9 +279,8 @@ future<> auth::password_authenticator::alter(sstring username,
    try {
        auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
        auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
-                        auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
-        auto& qp = cql3::get_local_query_processor();
-        return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
+                        meta::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
+        return _qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
    } catch (std::out_of_range&) {
        throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
    }
@@ -268,24 +289,24 @@ future<> auth::password_authenticator::alter(sstring username,
 future<> auth::password_authenticator::drop(sstring username) {
    try {
        auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
-                        auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
-        auto& qp = cql3::get_local_query_processor();
-        return qp.process(query, consistency_for_user(username), { username }).discard_result();
+                        meta::AUTH_KS, CREDENTIALS_CF, USER_NAME);
+        return _qp.process(query, consistency_for_user(username), { username }).discard_result();
    } catch (std::out_of_range&) {
        throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
    }
 }

 const auth::resource_ids& auth::password_authenticator::protected_resources() const {
-    static const resource_ids ids({ data_resource(auth::AUTH_KS, CREDENTIALS_CF) });
+    static const resource_ids ids({ data_resource(meta::AUTH_KS, CREDENTIALS_CF) });
    return ids;
 }

 ::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
    class plain_text_password_challenge: public sasl_challenge {
+        const password_authenticator& _self;
+
    public:
-        plain_text_password_challenge(const password_authenticator& a)
-                        : _authenticator(a)
+        plain_text_password_challenge(const password_authenticator& self) : _self(self)
        {}

        /**
@@ -340,12 +361,58 @@ const auth::resource_ids& auth::password_authenticator::protected_resources() co
            return _complete;
        }
        future<::shared_ptr<authenticated_user>> get_authenticated_user() const override {
-            return _authenticator.authenticate(_credentials);
+            return _self.authenticate(_credentials);
        }
    private:
-        const password_authenticator& _authenticator;
        credentials_map _credentials;
        bool _complete = false;
    };
    return ::make_shared<plain_text_password_challenge>(*this);
 }
+
+
+//
+// Similar in structure to `auth::service::has_existing_users()`, but trying to generalize the pattern breaks all kinds
+// of module boundaries and leaks implementation details.
+//
+future<bool> auth::password_authenticator::has_existing_users() const {
+    static const sstring default_user_query = sprint(
+            "SELECT * FROM %s.%s WHERE %s = ?",
+            meta::AUTH_KS,
+            CREDENTIALS_CF,
+            USER_NAME);
+
+    static const sstring all_users_query = sprint(
+            "SELECT * FROM %s.%s LIMIT 1",
+            meta::AUTH_KS,
+            CREDENTIALS_CF);
+
+    // This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we
+    // can potentially avoid doing a range query with a high consistency level.
+
+    return _qp.process(
+            default_user_query,
+            db::consistency_level::ONE,
+            { meta::DEFAULT_SUPERUSER_NAME },
+            true).then([this](auto results) {
+        if (!results->empty()) {
+            return make_ready_future<bool>(true);
+        }
+
+        return _qp.process(
+                default_user_query,
+                db::consistency_level::QUORUM,
+                { meta::DEFAULT_SUPERUSER_NAME },
+                true).then([this](auto results) {
+            if (!results->empty()) {
+                return make_ready_future<bool>(true);
+            }
+
+            return _qp.process(
+                    all_users_query,
+                    db::consistency_level::QUORUM).then([](auto results) {
+                return make_ready_future<bool>(!results->empty());
+            });
+        });
+    });
+}
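Reviewer note: `has_existing_users()` above tries the cheapest check first (default user at `ONE`), then the same lookup at `QUORUM`, and only then the range query at `QUORUM`. The following is a minimal synchronous sketch of that escalation, not the Seastar code from the patch. `user_store`, its two hooks, and the `consistency` enum are hypothetical stand-ins for the CQL queries issued through `_qp`.

```cpp
#include <functional>
#include <string>

namespace sketch {

enum class consistency { one, quorum };

// Hypothetical query hooks; in the patch these are CQL statements run via _qp.
struct user_store {
    std::function<bool(const std::string& name, consistency)> has_user;
    std::function<bool(consistency)> has_any_user;
};

// Cheapest check first; escalate only when the previous step found nothing.
inline bool has_existing_users(const user_store& s, const std::string& default_user) {
    if (s.has_user(default_user, consistency::one)) {
        return true;
    }
    if (s.has_user(default_user, consistency::quorum)) {
        return true;
    }
    // Last resort: a range query at the higher consistency level.
    return s.has_any_user(consistency::quorum);
}

}
```

When the default user is visible locally, neither `QUORUM` query runs, which is the saving the comment in the patch describes.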
--- a/auth/password_authenticator.hh
+++ b/auth/password_authenticator.hh
@@ -42,19 +42,33 @@
 #pragma once

 #include "authenticator.hh"
+#include "cql3/query_processor.hh"
+#include "delayed_tasks.hh"
+
+namespace service {
+class migration_manager;
+}

 namespace auth {

-class password_authenticator : public authenticator {
-public:
-    static const sstring PASSWORD_AUTHENTICATOR_NAME;
+const sstring& password_authenticator_name();

-    password_authenticator();
+class password_authenticator : public authenticator {
+    cql3::query_processor& _qp;
+
+    ::service::migration_manager& _migration_manager;
+
+    delayed_tasks<> _delayed{};
+
+public:
+    password_authenticator(cql3::query_processor&, ::service::migration_manager&);
    ~password_authenticator();

-    future<> init();
+    future<> start() override;

-    const sstring& class_name() const override;
+    future<> stop() override;
+
+    const sstring& qualified_java_name() const override;
    bool require_authentication() const override;
    option_set supported_options() const override;
    option_set alterable_options() const override;
@@ -67,6 +81,9 @@ public:


    static db::consistency_level consistency_for_user(const sstring& username);
+
+private:
+    future<bool> has_existing_users() const;
 };

 }
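Reviewer note: the patch replaces the static `*_NAME` members with accessor functions such as `password_authenticator_name()`, while the `class_registrator` objects are still given a string literal "to ensure correct initialization order". The sketch below illustrates the underlying construct-on-first-use idiom in plain C++. The `registry()` map, the `registrar` type, and the `org.example.auth` names are hypothetical and only stand in for Scylla's `class_registrator` machinery.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

namespace sketch {

// Construct-on-first-use: the map is created the first time registry() is
// called, so a registrar in any translation unit can use it safely.
inline std::map<std::string, std::function<std::string()>>& registry() {
    static std::map<std::string, std::function<std::string()>> r;
    return r;
}

// Hypothetical registrar: records a factory under a name during static init.
struct registrar {
    registrar(const char* name, std::function<std::string()> make) {
        registry().emplace(name, std::move(make));
    }
};

// Same shape as the *_name() accessors in the patch: built on first call.
inline const std::string& package_name() {
    static const std::string name = "org.example.auth.";
    return name;
}

inline const std::string& authorizer_name() {
    static const std::string name = package_name() + "ExampleAuthorizer";
    return name;
}

// The registrar takes a literal, so it does not depend on any other
// namespace-scope object having been constructed before it.
static const registrar example_reg("org.example.auth.ExampleAuthorizer", [] {
    return std::string("example authorizer instance");
});

}
```

A function-local static is initialized on first use, which is why the accessor form is safe to call from other static initializers, whereas a namespace-scope `const sstring` in another translation unit might not be constructed yet.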
--- a/auth/permissions_cache.cc
+++ b/auth/permissions_cache.cc
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "auth/permissions_cache.hh"
+
+#include "auth/authorizer.hh"
+#include "auth/common.hh"
+#include "auth/service.hh"
+#include "db/config.hh"
+
+namespace auth {
+
+permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {
+    permissions_cache_config c;
+    c.max_entries = dc.permissions_cache_max_entries();
+    c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());
+    c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());
+
+    return c;
+}
+
+permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)
+        : _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {
+              log.debug("Refreshing permissions for {}", k.first.name());
+              return ser.underlying_authorizer().authorize(ser, ::make_shared<authenticated_user>(k.first), k.second);
+          }) {
+}
+
+future<permission_set> permissions_cache::get(::shared_ptr<authenticated_user> user, data_resource r) {
+    return _cache.get(key_type(*user, r));
+}
+
+}
--- a/auth/permissions_cache.hh
+++ b/auth/permissions_cache.hh
@@ -0,0 +1,99 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <chrono>
+#include <functional>
+#include <iostream>
+#include <utility>
+
+#include <seastar/core/future.hh>
+#include <seastar/core/shared_ptr.hh>
+
+#include "auth/authenticated_user.hh"
+#include "auth/data_resource.hh"
+#include "auth/permission.hh"
+#include "log.hh"
+#include "utils/loading_cache.hh"
+
+namespace std {
+
+template <>
+struct hash<auth::data_resource> final {
+    size_t operator()(const auth::data_resource & v) const {
+        return v.hash_value();
+    }
+};
+
+template <>
+struct hash<auth::authenticated_user> final {
+    size_t operator()(const auth::authenticated_user & v) const {
+        return utils::tuple_hash()(v.name(), v.is_anonymous());
+    }
+};
+
+inline std::ostream& operator<<(std::ostream& os, const std::pair<auth::authenticated_user, auth::data_resource>& p) {
+    os << "{user: " << p.first.name() << ", data_resource: " << p.second << "}";
+    return os;
+}
+
+}
+
+namespace db {
+class config;
+}
+
+namespace auth {
+
+class service;
+
+struct permissions_cache_config final {
+    static permissions_cache_config from_db_config(const db::config&);
+
+    std::size_t max_entries;
+    std::chrono::milliseconds validity_period;
+    std::chrono::milliseconds update_period;
+};
+
+class permissions_cache final {
+    using cache_type = utils::loading_cache<
+            std::pair<authenticated_user, data_resource>,
+            permission_set,
+            utils::loading_cache_reload_enabled::yes,
+            utils::simple_entry_size<permission_set>,
+            utils::tuple_hash>;
+
+    using key_type = typename cache_type::key_type;
+
+    cache_type _cache;
+
+public:
+    explicit permissions_cache(const permissions_cache_config&, service&, logging::logger&);
+
+    future<> stop() {
+        return _cache.stop();
+    }
+
+    future<permission_set> get(::shared_ptr<authenticated_user>, data_resource);
+};
+
+}
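Reviewer note: `permissions_cache` wraps `utils::loading_cache`, keyed by a (user, resource) pair, with a loader that calls the underlying authorizer. The sketch below shows only the load-on-miss and validity-period idea in synchronous form. It is a simplified illustration under stated assumptions: the real cache is future-based and also has `max_entries` and an `update_period`, which are omitted here, and `key_hash` is a hypothetical stand-in for `utils::tuple_hash`.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

namespace sketch {

using key = std::pair<std::string, std::string>;  // (user name, resource name)
using time_point = std::chrono::steady_clock::time_point;

// Simple hash combine over both halves of the key.
struct key_hash {
    std::size_t operator()(const key& k) const {
        std::size_t h = std::hash<std::string>()(k.first);
        return h ^ (std::hash<std::string>()(k.second) + 0x9e3779b9u + (h << 6) + (h >> 2));
    }
};

class loading_cache {
    struct entry {
        int value;
        time_point loaded_at;
    };

    std::unordered_map<key, entry, key_hash> _entries;
    std::chrono::milliseconds _validity;
    std::function<int(const key&)> _load;

public:
    loading_cache(std::chrono::milliseconds validity, std::function<int(const key&)> load)
        : _validity(validity), _load(std::move(load)) {}

    // Returns the cached value, calling the loader on a miss or after expiry.
    // The caller passes `now` explicitly so the behavior is easy to test.
    int get(const key& k, time_point now) {
        auto it = _entries.find(k);
        if (it == _entries.end() || now - it->second.loaded_at >= _validity) {
            _entries[k] = entry{_load(k), now};
            it = _entries.find(k);
        }
        return it->second.value;
    }
};

}
```

Here `int` stands in for `permission_set`; the loader plays the role of the lambda passed to `_cache` in `permissions_cache.cc`.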
--- a/auth/service.cc
+++ b/auth/service.cc
@@ -0,0 +1,355 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "auth/service.hh"
+
+#include <map>
+
+#include <seastar/core/future-util.hh>
+#include <seastar/core/shared_ptr.hh>
+
+#include "auth/allow_all_authenticator.hh"
+#include "auth/allow_all_authorizer.hh"
+#include "auth/common.hh"
+#include "cql3/query_processor.hh"
+#include "cql3/untyped_result_set.hh"
+#include "db/config.hh"
+#include "db/consistency_level.hh"
+#include "exceptions/exceptions.hh"
+#include "log.hh"
+#include "service/migration_listener.hh"
+#include "utils/class_registrator.hh"
+
+namespace auth {
+
+namespace meta {
+
+static const sstring user_name_col_name("name");
+static const sstring superuser_col_name("super");
+
+}
+
+static logging::logger log("auth_service");
+
+class auth_migration_listener final : public ::service::migration_listener {
+    authorizer& _authorizer;
+
+public:
+    explicit auth_migration_listener(authorizer& a) : _authorizer(a) {
+    }
+
+private:
+    void on_create_keyspace(const sstring& ks_name) override {}
+    void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
+    void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
+    void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
+    void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
+    void on_create_view(const sstring& ks_name, const sstring& view_name) override {}
+
+    void on_update_keyspace(const sstring& ks_name) override {}
+    void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
+    void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
+    void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
+    void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
+    void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
+
+    void on_drop_keyspace(const sstring& ks_name) override {
+        _authorizer.revoke_all(auth::data_resource(ks_name));
+    }
+
+    void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
+        _authorizer.revoke_all(auth::data_resource(ks_name, cf_name));
+    }
+
+    void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
+    void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
+    void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
+    void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
+};
+
+static db::consistency_level consistency_for_user(const sstring& name) {
+    if (name == meta::DEFAULT_SUPERUSER_NAME) {
+        return db::consistency_level::QUORUM;
+    } else {
+        return db::consistency_level::LOCAL_ONE;
+    }
+}
+
+static future<::shared_ptr<cql3::untyped_result_set>> select_user(cql3::query_processor& qp, const sstring& name) {
+    // There used to be an explicit, thread-local cache of the prepared statement here. In normal
+    // execution that is fine, but since tests set up and tear down the system over and over, we
+    // would start using obsolete prepared statements pretty quickly.
+    // Rely on the query processor caching statements instead, and assume that a
+    // string->statement map lookup is not going to cost us much.
+    return qp.process(
+            sprint(
+                    "SELECT * FROM %s.%s WHERE %s = ?",
+                    meta::AUTH_KS,
+                    meta::USERS_CF,
+                    meta::user_name_col_name),
+            consistency_for_user(name),
+            { name },
+            true);
+}
+
+service_config service_config::from_db_config(const db::config& dc) {
+    const qualified_name qualified_authorizer_name(meta::AUTH_PACKAGE_NAME, dc.authorizer());
+    const qualified_name qualified_authenticator_name(meta::AUTH_PACKAGE_NAME, dc.authenticator());
+
+    service_config c;
+    c.authorizer_java_name = qualified_authorizer_name;
+    c.authenticator_java_name = qualified_authenticator_name;
+
+    return c;
+}
+
+service::service(
+        permissions_cache_config c,
+        cql3::query_processor& qp,
+        ::service::migration_manager& mm,
+        std::unique_ptr<authorizer> a,
+        std::unique_ptr<authenticator> b)
+            : _permissions_cache_config(std::move(c))
+            , _permissions_cache(nullptr)
+            , _qp(qp)
+            , _migration_manager(mm)
+            , _authorizer(std::move(a))
+            , _authenticator(std::move(b))
+            , _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer)) {
+}
+
+service::service(
+        permissions_cache_config cache_config,
+        cql3::query_processor& qp,
+        ::service::migration_manager& mm,
+        const service_config& sc)
+            : service(
+                      std::move(cache_config),
+                      qp,
+                      mm,
+                      create_object<authorizer>(sc.authorizer_java_name, qp, mm),
+                      create_object<authenticator>(sc.authenticator_java_name, qp, mm)) {
+}
+
+bool service::should_create_metadata() const {
+    const bool null_authorizer = _authorizer->qualified_java_name() == allow_all_authorizer_name();
+    const bool null_authenticator = _authenticator->qualified_java_name() == allow_all_authenticator_name();
+    return !null_authorizer || !null_authenticator;
+}
+
+future<> service::create_metadata_if_missing() {
+    auto& db = _qp.db().local();
+
+    auto f = make_ready_future<>();
+
+    if (!db.has_keyspace(meta::AUTH_KS)) {
+        std::map<sstring, sstring> opts{{"replication_factor", "1"}};
+
+        auto ksm = keyspace_metadata::new_keyspace(
+                meta::AUTH_KS,
+                "org.apache.cassandra.locator.SimpleStrategy",
+                opts,
+                true);
+
+        // We use min_timestamp so that default keyspace metadata will lose to any manual adjustments.
+        // See issue #2129.
+        f = _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
+    }
+
+    return f.then([this] {
+        // 3 months.
+        static const auto gc_grace_seconds = 90 * 24 * 60 * 60;
+
+        static const sstring users_table_query = sprint(
+                "CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY (%s)) WITH gc_grace_seconds=%s",
+                meta::AUTH_KS,
+                meta::USERS_CF,
+                meta::user_name_col_name,
+                meta::superuser_col_name,
+                meta::user_name_col_name,
+                gc_grace_seconds);
+
+        return create_metadata_table_if_missing(
+                meta::USERS_CF,
+                _qp,
+                users_table_query,
+                _migration_manager);
+    }).then([this] {
+        delay_until_system_ready(_delayed, [this] {
+            return has_existing_users().then([this](bool existing) {
+                if (!existing) {
+                    //
+                    // Create default superuser.
+                    //
+
+                    static const sstring query = sprint(
+                            "INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
+                            meta::AUTH_KS,
+                            meta::USERS_CF,
+                            meta::user_name_col_name,
+                            meta::superuser_col_name);
+
+                    return _qp.process(
+                            query,
+                            db::consistency_level::ONE,
+                            { meta::DEFAULT_SUPERUSER_NAME, true }).then([](auto&&) {
+                        log.info("Created default superuser '{}'", meta::DEFAULT_SUPERUSER_NAME);
+                    }).handle_exception([](auto exn) {
+                        try {
+                            std::rethrow_exception(exn);
+                        } catch (const exceptions::request_execution_exception&) {
+                            log.warn("Skipped default superuser setup: some nodes were not ready");
+                        }
+                    }).discard_result();
+                }
+
+                return make_ready_future<>();
+            });
+        });
+
+        return make_ready_future<>();
+    });
+}
+
+future<> service::start() {
+    return once_among_shards([this] {
+        if (should_create_metadata()) {
+            return create_metadata_if_missing();
+        }
+
+        return make_ready_future<>();
+    }).then([this] {
+        return when_all_succeed(_authorizer->start(), _authenticator->start());
+    }).then([this] {
+        _permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
+    }).then([this] {
+        return once_among_shards([this] {
+            _migration_manager.register_listener(_migration_listener.get());
+            return make_ready_future<>();
+        });
+    });
+}
+
+future<> service::stop() {
+    return once_among_shards([this] {
+        _delayed.cancel_all();
+        return make_ready_future<>();
+    }).then([this] {
+        return _permissions_cache->stop();
+    }).then([this] {
+        return when_all_succeed(_authorizer->stop(), _authenticator->stop());
+    });
+}
+
+future<bool> service::has_existing_users() const {
+    static const sstring default_user_query = sprint(
+            "SELECT * FROM %s.%s WHERE %s = ?",
+            meta::AUTH_KS,
+            meta::USERS_CF,
+            meta::user_name_col_name);
+
+    static const sstring all_users_query = sprint(
+            "SELECT * FROM %s.%s LIMIT 1",
+            meta::AUTH_KS,
+            meta::USERS_CF);
+
+    // This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we
+    // can potentially avoid doing a range query with a high consistency level.
+
+    return _qp.process(
+            default_user_query,
+            db::consistency_level::ONE,
+            { meta::DEFAULT_SUPERUSER_NAME },
+            true).then([this](auto results) {
+        if (!results->empty()) {
+            return make_ready_future<bool>(true);
+        }
+
+        return _qp.process(
+                default_user_query,
+                db::consistency_level::QUORUM,
+                { meta::DEFAULT_SUPERUSER_NAME },
+                true).then([this](auto results) {
+            if (!results->empty()) {
+                return make_ready_future<bool>(true);
+            }
+
+            return _qp.process(
+                    all_users_query,
+                    db::consistency_level::QUORUM).then([](auto results) {
+                return make_ready_future<bool>(!results->empty());
+            });
+        });
+    });
+}
+
+future<bool> service::is_existing_user(const sstring& name) const {
+    return select_user(_qp, name).then([](auto results) {
+        return !results->empty();
+    });
+}
+
+future<bool> service::is_super_user(const sstring& name) const {
+    return select_user(_qp, name).then([](auto results) {
+        return !results->empty() && results->one().template get_as<bool>(meta::superuser_col_name);
+    });
+}
+
+future<> service::insert_user(const sstring& name, bool is_superuser) {
+    return _qp.process(
+            sprint(
+                    "INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
+                    meta::AUTH_KS,
+                    meta::USERS_CF,
+                    meta::user_name_col_name,
+                    meta::superuser_col_name),
+            consistency_for_user(name),
+            { name, is_superuser }).discard_result();
+}
+
+future<> service::delete_user(const sstring& name) {
+    return _qp.process(
+            sprint(
+                    "DELETE FROM %s.%s WHERE %s = ?",
+                    meta::AUTH_KS,
+                    meta::USERS_CF,
+                    meta::user_name_col_name),
+            consistency_for_user(name),
+            { name }).discard_result();
+}
+
+future<permission_set> service::get_permissions(::shared_ptr<authenticated_user> u, data_resource r) const {
+    return _permissions_cache->get(std::move(u), std::move(r));
+}
+
+//
+// Free functions.
+//
+
+future<bool> is_super_user(const service& ser, const authenticated_user& u) {
+    if (u.is_anonymous()) {
+        return make_ready_future<bool>(false);
+    }
+
+    return ser.is_super_user(u.name());
+}
+
+}
--- a/auth/service.hh
+++ b/auth/service.hh
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <memory>
+
+#include <seastar/core/future.hh>
+#include <seastar/core/sstring.hh>
+
+#include "auth/authenticator.hh"
+#include "auth/authorizer.hh"
+#include "auth/authenticated_user.hh"
+#include "auth/permission.hh"
+#include "auth/permissions_cache.hh"
+#include "delayed_tasks.hh"
+#include "seastarx.hh"
+
+namespace cql3 {
+class query_processor;
+}
+
+namespace db {
+class config;
+}
+
+namespace service {
+class migration_manager;
+class migration_listener;
+}
+
+namespace auth {
+
+class authenticator;
+class authorizer;
+
+struct service_config final {
+    static service_config from_db_config(const db::config&);
+
+    sstring authorizer_java_name;
+    sstring authenticator_java_name;
+};
+
+class service final {
+    permissions_cache_config _permissions_cache_config;
+    std::unique_ptr<permissions_cache> _permissions_cache;
+
+    cql3::query_processor& _qp;
+
+    ::service::migration_manager& _migration_manager;
+
+    std::unique_ptr<authorizer> _authorizer;
+
+    std::unique_ptr<authenticator> _authenticator;
+
+    // Only one of these should be registered, so we end up with some unused instances. Not the end of the world.
+    std::unique_ptr<::service::migration_listener> _migration_listener;
+
+    delayed_tasks<> _delayed{};
+
+public:
+    service(
+            permissions_cache_config,
+            cql3::query_processor&,
+            ::service::migration_manager&,
+            std::unique_ptr<authorizer>,
+            std::unique_ptr<authenticator>);
+
+    service(
+            permissions_cache_config,
+            cql3::query_processor&,
+            ::service::migration_manager&,
+            const service_config&);
+
+    future<> start();
+
+    future<> stop();
+
+    future<bool> is_existing_user(const sstring& name) const;
+
+    future<bool> is_super_user(const sstring& name) const;
+
+    future<> insert_user(const sstring& name, bool is_superuser);
+
+    future<> delete_user(const sstring& name);
+
+    future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource) const;
+
+    authenticator& underlying_authenticator() {
+        return *_authenticator;
+    }
+
+    const authenticator& underlying_authenticator() const {
+        return *_authenticator;
+    }
+
+    authorizer& underlying_authorizer() {
+        return *_authorizer;
+    }
+
+    const authorizer& underlying_authorizer() const {
+        return *_authorizer;
+    }
+
+private:
+    future<bool> has_existing_users() const;
+
+    bool should_create_metadata() const;
+
+    future<> create_metadata_if_missing();
+};
+
+future<bool> is_super_user(const service&, const authenticated_user&);
+
+}
--- a/auth/transitional.cc
+++ b/auth/transitional.cc
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Copyright (C) 2017 ScyllaDB
+ *
+ * Modified by ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "authenticator.hh"
+#include "authenticated_user.hh"
+#include "authenticator.hh"
+#include "authorizer.hh"
+#include "password_authenticator.hh"
+#include "default_authorizer.hh"
+#include "permission.hh"
+#include "db/config.hh"
+#include "utils/class_registrator.hh"
+
+namespace auth {
+
+class service;
+
+static const sstring PACKAGE_NAME("com.scylladb.auth.");
+
+static const sstring& transitional_authenticator_name() {
+    static const sstring name = PACKAGE_NAME + "TransitionalAuthenticator";
+    return name;
+}
+
+static const sstring& transitional_authorizer_name() {
+    static const sstring name = PACKAGE_NAME + "TransitionalAuthorizer";
+    return name;
+}
+
+class transitional_authenticator : public authenticator {
+    std::unique_ptr<authenticator> _authenticator;
+public:
+    static const sstring PASSWORD_AUTHENTICATOR_NAME;
+
+    transitional_authenticator(cql3::query_processor& qp, ::service::migration_manager& mm)
+            : transitional_authenticator(std::make_unique<password_authenticator>(qp, mm))
+    {}
+    transitional_authenticator(std::unique_ptr<authenticator> a)
+        : _authenticator(std::move(a))
+    {}
+    future<> start() override {
+        return _authenticator->start();
+    }
+    future<> stop() override {
+        return _authenticator->stop();
+    }
+    const sstring& qualified_java_name() const override {
+        return transitional_authenticator_name();
+    }
+    bool require_authentication() const override {
+        return true;
+    }
+    option_set supported_options() const override {
+        return _authenticator->supported_options();
+    }
+    option_set alterable_options() const override {
+        return _authenticator->alterable_options();
+    }
+    future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const override {
+        auto i = credentials.find(authenticator::USERNAME_KEY);
+        if ((i == credentials.end() || i->second.empty()) && (!credentials.count(PASSWORD_KEY) || credentials.at(PASSWORD_KEY).empty())) {
+            // return anon user
+            return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
+        }
+        return make_ready_future().then([this, &credentials] {
+            return _authenticator->authenticate(credentials);
+        }).handle_exception([](auto ep) {
+            try {
+                std::rethrow_exception(ep);
+            } catch (exceptions::authentication_exception&) {
+                // return anon user
+                return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
+            }
+        });
+    }
+    future<> create(sstring username, const option_map& options) override {
+        return _authenticator->create(username, options);
+    }
+    future<> alter(sstring username, const option_map& options) override {
+        return _authenticator->alter(username, options);
+    }
+    future<> drop(sstring username) override {
+        return _authenticator->drop(username);
+    }
+    const resource_ids& protected_resources() const override {
+        return _authenticator->protected_resources();
+    }
+    ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
+        class sasl_wrapper : public sasl_challenge {
+        public:
+            sasl_wrapper(::shared_ptr<sasl_challenge> sasl)
+                : _sasl(std::move(sasl))
+            {}
+            bytes evaluate_response(bytes_view client_response) override {
+                try {
+                    return _sasl->evaluate_response(client_response);
+                } catch (exceptions::authentication_exception&) {
+                    _complete = true;
+                    return {};
+                }
+            }
+            bool is_complete() const {
+                return _complete || _sasl->is_complete();
+            }
+            future<::shared_ptr<authenticated_user>> get_authenticated_user() const {
+                return futurize_apply([this] {
+                    return _sasl->get_authenticated_user().handle_exception([](auto ep) {
+                        try {
+                            std::rethrow_exception(ep);
+                        } catch (exceptions::authentication_exception&) {
+                            // return anon user
+                            return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
+                        }
+                    });
+                });
+            }
+        private:
+            ::shared_ptr<sasl_challenge> _sasl;
+            bool _complete = false;
+        };
+        return ::make_shared<sasl_wrapper>(_authenticator->new_sasl_challenge());
+    }
+};
+
+class transitional_authorizer : public authorizer {
+    std::unique_ptr<authorizer> _authorizer;
+public:
+    transitional_authorizer(cql3::query_processor& qp, ::service::migration_manager& mm)
+        : transitional_authorizer(std::make_unique<default_authorizer>(qp, mm))
+    {}
+    transitional_authorizer(std::unique_ptr<authorizer> a)
+        : _authorizer(std::move(a))
+    {}
+    ~transitional_authorizer()
+    {}
+    future<> start() override {
+        return _authorizer->start();
+    }
+    future<> stop() override {
+        return _authorizer->stop();
+    }
+    const sstring& qualified_java_name() const override {
+        return transitional_authorizer_name();
+    }
+    future<permission_set> authorize(service& ser, ::shared_ptr<authenticated_user> user, data_resource resource) const override {
+        return is_super_user(ser, *user).then([](bool s) {
+            static const permission_set transitional_permissions =
+                            permission_set::of<permission::CREATE,
+                                            permission::ALTER, permission::DROP,
+                                            permission::SELECT, permission::MODIFY>();
+
+            return make_ready_future<permission_set>(s ? permissions::ALL : transitional_permissions);
+        });
+    }
+    future<> grant(::shared_ptr<authenticated_user> user, permission_set ps, data_resource r, sstring s) override {
+        return _authorizer->grant(std::move(user), std::move(ps), std::move(r), std::move(s));
+    }
+    future<> revoke(::shared_ptr<authenticated_user> user, permission_set ps, data_resource r, sstring s) override {
+        return _authorizer->revoke(std::move(user), std::move(ps), std::move(r), std::move(s));
+    }
+    future<std::vector<permission_details>> list(service& ser, ::shared_ptr<authenticated_user> user, permission_set ps, optional<data_resource> r, optional<sstring> s) const override {
+        return _authorizer->list(ser, std::move(user), std::move(ps), std::move(r), std::move(s));
+    }
+    future<> revoke_all(sstring s) override {
+        return _authorizer->revoke_all(std::move(s));
+    }
+    future<> revoke_all(data_resource r) override {
+        return _authorizer->revoke_all(std::move(r));
+    }
+    const resource_ids& protected_resources() override {
+        return _authorizer->protected_resources();
+    }
+    future<> validate_configuration() const override {
+        return _authorizer->validate_configuration();
+    }
+};
+
+}
+
+//
+// To ensure correct initialization order, we unfortunately need to use string literals.
+//
+
+static const class_registrator<
+        auth::authenticator,
+        auth::transitional_authenticator,
+        cql3::query_processor&,
+        ::service::migration_manager&> transitional_authenticator_reg("com.scylladb.auth.TransitionalAuthenticator");
+
+static const class_registrator<
+        auth::authorizer,
+        auth::transitional_authorizer,
+        cql3::query_processor&,
+        ::service::migration_manager&> transitional_authorizer_reg("com.scylladb.auth.TransitionalAuthorizer");
--- a/cache_flat_mutation_reader.hh
+++ b/cache_flat_mutation_reader.hh
@@ -0,0 +1,661 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <vector>
+#include "row_cache.hh"
+#include "mutation_reader.hh"
+#include "streamed_mutation.hh"
+#include "partition_version.hh"
+#include "utils/logalloc.hh"
+#include "query-request.hh"
+#include "partition_snapshot_reader.hh"
+#include "partition_snapshot_row_cursor.hh"
+#include "read_context.hh"
+#include "flat_mutation_reader.hh"
+
+namespace cache {
+
+extern logging::logger clogger;
+
+class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
+    enum class state {
+        before_static_row,
+
+        // Invariants:
+        //  - position_range(_lower_bound, _upper_bound) covers all not yet emitted positions from current range
+        //  - if _next_row has valid iterators:
+        //    - _next_row points to the nearest row in cache >= _lower_bound
+        //    - _next_row_in_range = _next_row.position() < _upper_bound
+        //  - if _next_row doesn't have valid iterators, it has no meaning.
+        reading_from_cache,
+
+        // Starts reading from underlying reader.
+        // The range to read is position_range(_lower_bound, min(_next_row.position(), _upper_bound)).
+        // Invariants:
+        //  - _next_row_in_range = _next_row.position() < _upper_bound
+        move_to_underlying,
+
+        // Invariants:
+        // - Upper bound of the read is min(_next_row.position(), _upper_bound)
+        // - _next_row_in_range = _next_row.position() < _upper_bound
+        // - _last_row points at a direct predecessor of the next row which is going to be read.
+        //   Used for populating continuity.
+        // - _population_range_starts_before_all_rows is set accordingly
+        reading_from_underlying,
+
+        end_of_stream
+    };
+    lw_shared_ptr<partition_snapshot> _snp;
+    position_in_partition::tri_compare _position_cmp;
+
+    query::clustering_key_filter_ranges _ck_ranges;
+    query::clustering_row_ranges::const_iterator _ck_ranges_curr;
+    query::clustering_row_ranges::const_iterator _ck_ranges_end;
+
+    lsa_manager _lsa_manager;
+
+    partition_snapshot_row_weakref _last_row;
+
+    // We need to be prepared that we may get overlapping and out of order
+    // range tombstones. We must emit fragments with strictly monotonic positions,
+    // so we can't just trim such tombstones to the position of the last fragment.
+    // To solve that, range tombstones are accumulated first in a range_tombstone_stream
+    // and emitted once we have a fragment with a larger position.
+    range_tombstone_stream _tombstones;
+
+    // Holds the lower bound of a position range which hasn't been processed yet.
+    // Only fragments with positions < _lower_bound have been emitted.
+    //
+    // It is assumed that !_lower_bound.is_clustering_row(). We depend on this when
+    // calling range_tombstone::trim_front() and when inserting dummy entries. Dummy
+    // entries are assumed to be only at !is_clustering_row() positions.
+    position_in_partition _lower_bound;
+    position_in_partition_view _upper_bound;
+
+    state _state = state::before_static_row;
+    lw_shared_ptr<read_context> _read_context;
+    partition_snapshot_row_cursor _next_row;
+    bool _next_row_in_range = false;
+
+    // True iff the current population interval, since the previous clustering row, starts before all clustered rows.
+    // We cannot just look at _lower_bound: emitting range tombstones changes _lower_bound, and since
+    // clustering intervals are marked as continuous only when a clustering_row is consumed, relying on it
+    // would prevent us from marking the interval as continuous.
+    // Valid when _state == reading_from_underlying.
+    bool _population_range_starts_before_all_rows;
+
+    future<> do_fill_buffer();
+    void copy_from_cache_to_buffer();
+    future<> process_static_row();
+    void move_to_end();
+    void move_to_next_range();
+    void move_to_range(query::clustering_row_ranges::const_iterator);
+    void move_to_next_entry();
+    // Emits all delayed range tombstones with positions smaller than upper_bound.
+    void drain_tombstones(position_in_partition_view upper_bound);
+    // Emits all delayed range tombstones.
+    void drain_tombstones();
+    void add_to_buffer(const partition_snapshot_row_cursor&);
+    void add_clustering_row_to_buffer(mutation_fragment&&);
+    void add_to_buffer(range_tombstone&&);
+    void add_to_buffer(mutation_fragment&&);
+    future<> read_from_underlying();
+    void start_reading_from_underlying();
+    bool after_current_range(position_in_partition_view position);
+    bool can_populate() const;
+    void maybe_update_continuity();
+    void maybe_add_to_cache(const mutation_fragment& mf);
+    void maybe_add_to_cache(const clustering_row& cr);
+    void maybe_add_to_cache(const range_tombstone& rt);
+    void maybe_add_to_cache(const static_row& sr);
+    void maybe_set_static_row_continuous();
+    void finish_reader() {
+        push_mutation_fragment(partition_end());
+        _end_of_stream = true;
+        _state = state::end_of_stream;
+    }
+public:
+    cache_flat_mutation_reader(schema_ptr s,
+                               dht::decorated_key dk,
+                               query::clustering_key_filter_ranges&& crr,
+                               lw_shared_ptr<read_context> ctx,
+                               lw_shared_ptr<partition_snapshot> snp,
+                               row_cache& cache)
+        : flat_mutation_reader::impl(std::move(s))
+        , _snp(std::move(snp))
+        , _position_cmp(*_schema)
+        , _ck_ranges(std::move(crr))
+        , _ck_ranges_curr(_ck_ranges.begin())
+        , _ck_ranges_end(_ck_ranges.end())
+        , _lsa_manager(cache)
+        , _tombstones(*_schema)
+        , _lower_bound(position_in_partition::before_all_clustered_rows())
+        , _upper_bound(position_in_partition_view::before_all_clustered_rows())
+        , _read_context(std::move(ctx))
+        , _next_row(*_schema, *_snp)
+    {
+        clogger.trace("csm {}: table={}.{}", this, _schema->ks_name(), _schema->cf_name());
+        push_mutation_fragment(partition_start(std::move(dk), _snp->partition_tombstone()));
+    }
+    cache_flat_mutation_reader(const cache_flat_mutation_reader&) = delete;
+    cache_flat_mutation_reader(cache_flat_mutation_reader&&) = delete;
+    virtual future<> fill_buffer() override;
+    virtual ~cache_flat_mutation_reader() {
+        maybe_merge_versions(_snp, _lsa_manager.region(), _lsa_manager.read_section());
+    }
+    virtual void next_partition() override {
+        clear_buffer_to_next_partition();
+        if (is_buffer_empty()) {
+            _end_of_stream = true;
+        }
+    }
+    virtual future<> fast_forward_to(const dht::partition_range&) override {
+        clear_buffer();
+        _end_of_stream = true;
+        return make_ready_future<>();
+    }
+    virtual future<> fast_forward_to(position_range pr) override {
+        throw std::bad_function_call(); // Not supported: this reader cannot be fast-forwarded within a partition.
+    }
+};
+
+inline
+future<> cache_flat_mutation_reader::process_static_row() {
+    if (_snp->version()->partition().static_row_continuous()) {
+        _read_context->cache().on_row_hit();
+        row sr = _lsa_manager.run_in_read_section([this] {
+            return _snp->static_row();
+        });
+        if (!sr.empty()) {
+            push_mutation_fragment(mutation_fragment(static_row(std::move(sr))));
+        }
+        return make_ready_future<>();
+    } else {
+        _read_context->cache().on_row_miss();
+        return _read_context->get_next_fragment().then([this] (mutation_fragment_opt&& sr) {
+            if (sr) {
+                assert(sr->is_static_row());
+                maybe_add_to_cache(sr->as_static_row());
+                push_mutation_fragment(std::move(*sr));
+            }
+            maybe_set_static_row_continuous();
+        });
+    }
+}
+
+inline
+future<> cache_flat_mutation_reader::fill_buffer() {
+    if (_state == state::before_static_row) {
+        auto after_static_row = [this] {
+            if (_ck_ranges_curr == _ck_ranges_end) {
+                finish_reader();
+                return make_ready_future<>();
+            }
+            _state = state::reading_from_cache;
+            _lsa_manager.run_in_read_section([this] {
+                move_to_range(_ck_ranges_curr);
+            });
+            return fill_buffer();
+        };
+        if (_schema->has_static_columns()) {
+            return process_static_row().then(std::move(after_static_row));
+        } else {
+            return after_static_row();
+        }
+    }
+    clogger.trace("csm {}: fill_buffer(), range={}, lb={}", this, *_ck_ranges_curr, _lower_bound);
+    return do_until([this] { return _end_of_stream || is_buffer_full(); }, [this] {
+        return do_fill_buffer();
+    });
+}
+
+inline
+future<> cache_flat_mutation_reader::do_fill_buffer() {
+    if (_state == state::move_to_underlying) {
+        _state = state::reading_from_underlying;
+        _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
+        auto end = _next_row_in_range ? position_in_partition(_next_row.position())
+                                      : position_in_partition(_upper_bound);
+        return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}).then([this] {
+            return read_from_underlying();
+        });
+    }
+    if (_state == state::reading_from_underlying) {
+        return read_from_underlying();
+    }
+    // assert(_state == state::reading_from_cache)
+    return _lsa_manager.run_in_read_section([this] {
+        auto next_valid = _next_row.iterators_valid();
+        clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", this, _lower_bound,
+            _upper_bound, _next_row.position(), next_valid);
+        // We assume that if there was eviction, and thus the range may
+        // no longer be continuous, the cursor was invalidated.
+        if (!next_valid) {
+            auto adjacent = _next_row.advance_to(_lower_bound);
+            _next_row_in_range = !after_current_range(_next_row.position());
+            if (!adjacent && !_next_row.continuous()) {
+                _last_row = nullptr; // We could insert a dummy here, but this path is unlikely.
+                start_reading_from_underlying();
+                return make_ready_future<>();
+            }
+        }
+        _next_row.maybe_refresh();
+        clogger.trace("csm {}: next={}, cont={}", this, _next_row.position(), _next_row.continuous());
+        while (!is_buffer_full() && _state == state::reading_from_cache) {
+            copy_from_cache_to_buffer();
+            if (need_preempt()) {
+                break;
+            }
+        }
+        return make_ready_future<>();
+    });
+}
+
+inline
+future<> cache_flat_mutation_reader::read_from_underlying() {
+    return consume_mutation_fragments_until(_read_context->underlying().underlying(),
+        [this] { return _state != state::reading_from_underlying || is_buffer_full(); },
+        [this] (mutation_fragment mf) {
+            _read_context->cache().on_row_miss();
+            maybe_add_to_cache(mf);
+            add_to_buffer(std::move(mf));
+        },
+        [this] {
+            _state = state::reading_from_cache;
+            _lsa_manager.run_in_update_section([this] {
+                auto same_pos = _next_row.maybe_refresh();
+                if (!same_pos) {
+                    _read_context->cache().on_mispopulate(); // FIXME: Insert dummy entry at _upper_bound.
+                    _next_row_in_range = !after_current_range(_next_row.position());
+                    if (!_next_row.continuous()) {
+                        start_reading_from_underlying();
+                    }
+                    return;
+                }
+                if (_next_row_in_range) {
+                    maybe_update_continuity();
+                    _last_row = _next_row;
+                    add_to_buffer(_next_row);
+                    try {
+                        move_to_next_entry();
+                    } catch (const std::bad_alloc&) {
+                        // We cannot reenter the section, since we may have moved to the new range, and
+                        // because add_to_buffer() should not be repeated.
+                        _snp->region().allocator().invalidate_references(); // Invalidates _next_row
+                    }
+                } else {
+                    if (no_clustering_row_between(*_schema, _upper_bound, _next_row.position())) {
+                        this->maybe_update_continuity();
+                    } else if (can_populate()) {
+                        rows_entry::compare less(*_schema);
+                        auto& rows = _snp->version()->partition().clustered_rows();
+                        if (query::is_single_row(*_schema, *_ck_ranges_curr)) {
+                            with_allocator(_snp->region().allocator(), [&] {
+                                auto e = alloc_strategy_unique_ptr<rows_entry>(
+                                    current_allocator().construct<rows_entry>(_ck_ranges_curr->start()->value()));
+                                // Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
+                                auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
+                                auto inserted = insert_result.second;
+                                auto it = insert_result.first;
+                                if (inserted) {
+                                    e.release();
+                                    auto next = std::next(it);
+                                    it->set_continuous(next->continuous());
+                                    clogger.trace("csm {}: inserted dummy at {}, cont={}", this, it->position(), it->continuous());
+                                }
+                            });
+                        } else if (!_ck_ranges_curr->start() || _last_row.refresh(*_snp)) {
+                            with_allocator(_snp->region().allocator(), [&] {
+                                auto e = alloc_strategy_unique_ptr<rows_entry>(
+                                    current_allocator().construct<rows_entry>(*_schema, _upper_bound, is_dummy::yes, is_continuous::yes));
+                                // Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.
+                                auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
+                                auto inserted = insert_result.second;
+                                if (inserted) {
+                                    clogger.trace("csm {}: inserted dummy at {}", this, _upper_bound);
+                                    e.release();
+                                } else {
+                                    clogger.trace("csm {}: mark {} as continuous", this, insert_result.first->position());
+                                    insert_result.first->set_continuous(true);
+                                }
+                            });
+                        }
+                    } else {
+                        _read_context->cache().on_mispopulate();
+                    }
+                    try {
+                        move_to_next_range();
+                    } catch (const std::bad_alloc&) {
+                        // We cannot reenter the section, since we may have moved to the new range
+                        _snp->region().allocator().invalidate_references(); // Invalidates _next_row
+                    }
+                }
+            });
+            return make_ready_future<>();
+        });
+}
+
+inline
+void cache_flat_mutation_reader::maybe_update_continuity() {
+    if (can_populate() && (_population_range_starts_before_all_rows || _last_row.refresh(*_snp))) {
+        if (_next_row.is_in_latest_version()) {
+            clogger.trace("csm {}: mark {} continuous", this, _next_row.get_iterator_in_latest_version()->position());
+            _next_row.get_iterator_in_latest_version()->set_continuous(true);
+        } else {
+            // Cover entry from older version
+            with_allocator(_snp->region().allocator(), [&] {
+                auto& rows = _snp->version()->partition().clustered_rows();
+                rows_entry::compare less(*_schema);
+                auto e = alloc_strategy_unique_ptr<rows_entry>(
+                    current_allocator().construct<rows_entry>(*_schema, _next_row.position(), is_dummy(_next_row.dummy()), is_continuous::yes));
+                auto insert_result = rows.insert_check(_next_row.get_iterator_in_latest_version(), *e, less);
+                auto inserted = insert_result.second;
+                if (inserted) {
+                    clogger.trace("csm {}: inserted dummy at {}", this, e->position());
+                    e.release();
+                }
+            });
+        }
+    } else {
+        _read_context->cache().on_mispopulate();
+    }
+}
+
+inline
+void cache_flat_mutation_reader::maybe_add_to_cache(const mutation_fragment& mf) {
+    if (mf.is_range_tombstone()) {
+        maybe_add_to_cache(mf.as_range_tombstone());
+    } else {
+        assert(mf.is_clustering_row());
+        const clustering_row& cr = mf.as_clustering_row();
+        maybe_add_to_cache(cr);
+    }
+}
+
+inline
+void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
+    if (!can_populate()) {
+        _last_row = nullptr;
+        _population_range_starts_before_all_rows = false;
+        _read_context->cache().on_mispopulate();
+        return;
+    }
+    clogger.trace("csm {}: populate({})", this, cr);
+    _lsa_manager.run_in_update_section_with_allocator([this, &cr] {
+        mutation_partition& mp = _snp->version()->partition();
+        rows_entry::compare less(*_schema);
+
+        auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
+            current_allocator().construct<rows_entry>(cr.key(), cr.tomb(), cr.marker(), cr.cells()));
+        new_entry->set_continuous(false);
+        auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()
+                                              : mp.clustered_rows().lower_bound(cr.key(), less);
+        auto insert_result = mp.clustered_rows().insert_check(it, *new_entry, less);
+        if (insert_result.second) {
+            _read_context->cache().on_row_insert();
+            new_entry.release();
+        }
+        it = insert_result.first;
+
+        rows_entry& e = *it;
+        if (!_ck_ranges_curr->start() || _last_row.refresh(*_snp)) {
+            clogger.trace("csm {}: set_continuous({})", this, e.position());
+            e.set_continuous(true);
+        } else {
+            _read_context->cache().on_mispopulate();
+        }
+        with_allocator(standard_allocator(), [&] {
+            _last_row = partition_snapshot_row_weakref(*_snp, it);
+        });
+        _population_range_starts_before_all_rows = false;
+    });
+}
+
+inline
+bool cache_flat_mutation_reader::after_current_range(position_in_partition_view p) {
+    return _position_cmp(p, _upper_bound) >= 0;
+}
+
+inline
+void cache_flat_mutation_reader::start_reading_from_underlying() {
+    clogger.trace("csm {}: start_reading_from_underlying(), range=[{}, {})", this, _lower_bound, _next_row_in_range ? _next_row.position() : _upper_bound);
+    _state = state::move_to_underlying;
+}
+
+inline
+void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
+    clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", this, _next_row.position(), _next_row_in_range);
+    position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
+    for (auto&& rts : _snp->range_tombstones(*_schema, _lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {
+        add_to_buffer(std::move(rts));
+        if (is_buffer_full()) {
+            return;
+        }
+    }
+    if (_next_row_in_range) {
+        _last_row = _next_row;
+        add_to_buffer(_next_row);
+        move_to_next_entry();
+    } else {
+        move_to_next_range();
+    }
+}
+
+inline
+void cache_flat_mutation_reader::move_to_end() {
+    drain_tombstones();
+    finish_reader();
+    clogger.trace("csm {}: eos", this);
+}
+
+inline
+void cache_flat_mutation_reader::move_to_next_range() {
+    auto next_it = std::next(_ck_ranges_curr);
+    if (next_it == _ck_ranges_end) {
+        move_to_end();
+        _ck_ranges_curr = next_it;
+    } else {
+        move_to_range(next_it);
+    }
+}
+
+inline
+void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::const_iterator next_it) {
+    auto lb = position_in_partition::for_range_start(*next_it);
+    auto ub = position_in_partition_view::for_range_end(*next_it);
+    _last_row = nullptr;
+    _lower_bound = std::move(lb);
+    _upper_bound = std::move(ub);
+    _ck_ranges_curr = next_it;
+    auto adjacent = _next_row.advance_to(_lower_bound);
+    _next_row_in_range = !after_current_range(_next_row.position());
+    clogger.trace("csm {}: move_to_range(), range={}, lb={}, ub={}, next={}", this, *_ck_ranges_curr, _lower_bound, _upper_bound, _next_row.position());
+    if (!adjacent && !_next_row.continuous()) {
+        // FIXME: We don't insert a dummy for singular range to avoid allocating 3 entries
+        // for a hit (before, at and after). If we supported the concept of an incomplete row,
+        // we could insert such a row for the lower bound if it's full instead, for both singular and
+        // non-singular ranges.
+        if (_ck_ranges_curr->start() && !query::is_single_row(*_schema, *_ck_ranges_curr)) {
+            // Insert dummy for lower bound
+            if (can_populate()) {
+                // FIXME: _lower_bound could be adjacent to the previous row, in which case we could skip this
+                clogger.trace("csm {}: insert dummy at {}", this, _lower_bound);
+                auto it = with_allocator(_lsa_manager.region().allocator(), [&] {
+                    auto& rows = _snp->version()->partition().clustered_rows();
+                    auto new_entry = current_allocator().construct<rows_entry>(*_schema, _lower_bound, is_dummy::yes, is_continuous::no);
+                    return rows.insert_before(_next_row.get_iterator_in_latest_version(), *new_entry);
+                });
+                _last_row = partition_snapshot_row_weakref(*_snp, it);
+            } else {
+                _read_context->cache().on_mispopulate();
+            }
+        }
+        start_reading_from_underlying();
+    }
+}
+
+// _next_row must be inside the range.
+inline
+void cache_flat_mutation_reader::move_to_next_entry() {
+    clogger.trace("csm {}: move_to_next_entry(), curr={}", this, _next_row.position());
+    if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
+        move_to_next_range();
+    } else {
+        if (!_next_row.next()) {
+            move_to_end();
+            return;
+        }
+        _next_row_in_range = !after_current_range(_next_row.position());
+        clogger.trace("csm {}: next={}, cont={}, in_range={}", this, _next_row.position(), _next_row.continuous(), _next_row_in_range);
+        if (!_next_row.continuous()) {
+            start_reading_from_underlying();
+        }
+    }
+}
+
+inline
+void cache_flat_mutation_reader::drain_tombstones(position_in_partition_view pos) {
+    while (true) {
+        reserve_one();
+        auto mfo = _tombstones.get_next(pos);
+        if (!mfo) {
+            break;
+        }
+        push_mutation_fragment(std::move(*mfo));
+    }
+}
+
+inline
+void cache_flat_mutation_reader::drain_tombstones() {
+    while (true) {
+        reserve_one();
+        auto mfo = _tombstones.get_next();
+        if (!mfo) {
+            break;
+        }
+        push_mutation_fragment(std::move(*mfo));
+    }
+}
+
+inline
+void cache_flat_mutation_reader::add_to_buffer(mutation_fragment&& mf) {
+    clogger.trace("csm {}: add_to_buffer({})", this, mf);
+    if (mf.is_clustering_row()) {
+        add_clustering_row_to_buffer(std::move(mf));
+    } else {
+        assert(mf.is_range_tombstone());
+        add_to_buffer(std::move(mf).as_range_tombstone());
+    }
+}
+
+inline
+void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {
+    if (!row.dummy()) {
+        _read_context->cache().on_row_hit();
+        add_clustering_row_to_buffer(row.row());
+    }
+}
+
+// Maintains the following invariants, also in case of exception:
+//   (1) No fragment with position >= _lower_bound has been pushed yet.
+//   (2) If _lower_bound > mf.position(), mf has been emitted.
+inline
+void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&& mf) {
+    clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mf);
+    auto& row = mf.as_clustering_row();
+    auto key = row.key();
+    try {
+        drain_tombstones(row.position());
+        push_mutation_fragment(std::move(mf));
+        _lower_bound = position_in_partition::after_key(std::move(key));
+    } catch (...) {
+        // We may have emitted some of the range tombstones which start after the old _lower_bound
+        _lower_bound = position_in_partition::for_key(std::move(key));
+        throw;
+    }
+}
+
+inline
+void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
+    clogger.trace("csm {}: add_to_buffer({})", this, rt);
+    // This guarantees that rt starts after any emitted clustering_row
+    if (!rt.trim_front(*_schema, _lower_bound)) {
+        return;
+    }
+    _lower_bound = position_in_partition(rt.position());
+    _tombstones.apply(std::move(rt));
+    drain_tombstones(_lower_bound);
+}
+
+inline
+void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {
+    if (can_populate()) {
+        clogger.trace("csm {}: maybe_add_to_cache({})", this, rt);
+        _lsa_manager.run_in_update_section_with_allocator([&] {
+            _snp->version()->partition().row_tombstones().apply_monotonically(*_schema, rt);
+        });
+    } else {
+        _read_context->cache().on_mispopulate();
+    }
+}
+
+inline
+void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
+    if (can_populate()) {
+        clogger.trace("csm {}: populate({})", this, sr);
+        _read_context->cache().on_row_insert();
+        _lsa_manager.run_in_update_section_with_allocator([&] {
+            _snp->version()->partition().static_row().apply(*_schema, column_kind::static_column, sr.cells());
+        });
+    } else {
+        _read_context->cache().on_mispopulate();
+    }
+}
+
+inline
+void cache_flat_mutation_reader::maybe_set_static_row_continuous() {
+    if (can_populate()) {
+        clogger.trace("csm {}: set static row continuous", this);
+        _snp->version()->partition().set_static_row_continuous(true);
+    } else {
+        _read_context->cache().on_mispopulate();
+    }
+}
+
+inline
+bool cache_flat_mutation_reader::can_populate() const {
+    return _snp->at_latest_version() && _read_context->cache().phase_of(_read_context->key()) == _read_context->phase();
+}
+
+} // namespace cache
+
+inline flat_mutation_reader make_cache_flat_mutation_reader(schema_ptr s,
+                                                            dht::decorated_key dk,
+                                                            query::clustering_key_filter_ranges crr,
+                                                            row_cache& cache,
+                                                            lw_shared_ptr<cache::read_context> ctx,
+                                                            lw_shared_ptr<partition_snapshot> snp)
+{
+    return make_flat_mutation_reader<cache::cache_flat_mutation_reader>(
+        std::move(s), std::move(dk), std::move(crr), std::move(ctx), std::move(snp), cache);
+}
--- a/cache_streamed_mutation.hh
+++ b/cache_streamed_mutation.hh
@@ -1,538 +0,0 @@
-/*
- * Copyright (C) 2017 ScyllaDB
- */
-
-/*
- * This file is part of Scylla.
- *
- * Scylla is free software: you can redistribute it and/or modify
- * it under the terms of the GNU Affero General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * Scylla is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#pragma once
-
-#include <vector>
-#include "row_cache.hh"
-#include "mutation_reader.hh"
-#include "streamed_mutation.hh"
-#include "partition_version.hh"
-#include "utils/logalloc.hh"
-#include "query-request.hh"
-#include "partition_snapshot_reader.hh"
-#include "partition_snapshot_row_cursor.hh"
-#include "read_context.hh"
-
-namespace cache {
-
-class lsa_manager {
-    row_cache& _cache;
-public:
-    lsa_manager(row_cache& cache) : _cache(cache) { }
-    template<typename Func>
-    decltype(auto) run_in_read_section(const Func& func) {
-        return _cache._read_section(_cache._tracker.region(), [&func] () {
-            return with_linearized_managed_bytes([&func] () {
-                return func();
-            });
-        });
-    }
-    template<typename Func>
-    decltype(auto) run_in_update_section(const Func& func) {
-        return _cache._update_section(_cache._tracker.region(), [&func] () {
-            return with_linearized_managed_bytes([&func] () {
-                return func();
-            });
-        });
-    }
-    template<typename Func>
-    void run_in_update_section_with_allocator(Func&& func) {
-        return _cache._update_section(_cache._tracker.region(), [this, &func] () {
-            return with_linearized_managed_bytes([this, &func] () {
-                return with_allocator(_cache._tracker.region().allocator(), [this, &func] () mutable {
-                    return func();
-                });
-            });
-        });
-    }
-    logalloc::region& region() { return _cache._tracker.region(); }
-    logalloc::allocating_section& read_section() { return _cache._read_section; }
-};
-
-class cache_streamed_mutation final : public streamed_mutation::impl {
-    enum class state {
-        before_static_row,
-
-        // Invariants:
-        //  - position_range(_lower_bound, _upper_bound) covers all not yet emitted positions from current range
-        //  - _next_row points to the nearest row in cache >= _lower_bound
-        //  - _next_row_in_range = _next.position() < _upper_bound
-        reading_from_cache,
-
-        // Starts reading from underlying reader.
-        // The range to read is position_range(_lower_bound, min(_next_row.position(), _upper_bound)).
-        // Invariants:
-        //  - _next_row_in_range = _next.position() < _upper_bound
-        move_to_underlying,
-
-        // Invariants:
-        // - Upper bound of the read is min(_next_row.position(), _upper_bound)
-        // - _next_row_in_range = _next.position() < _upper_bound
-        // - _last_row_key contains the key of last emitted clustering_row
-        reading_from_underlying,
-
-        end_of_stream
-    };
-    lw_shared_ptr<partition_snapshot> _snp;
-    position_in_partition::tri_compare _position_cmp;
-
-    query::clustering_key_filter_ranges _ck_ranges;
-    query::clustering_row_ranges::const_iterator _ck_ranges_curr;
-    query::clustering_row_ranges::const_iterator _ck_ranges_end;
-
-    lsa_manager _lsa_manager;
-
-    stdx::optional<clustering_key> _last_row_key;
-
-    // We need to be prepared that we may get overlapping and out of order
-    // range tombstones. We must emit fragments with strictly monotonic positions,
-    // so we can't just trim such tombstones to the position of the last fragment.
-    // To solve that, range tombstones are accumulated first in a range_tombstone_stream
-    // and emitted once we have a fragment with a larger position.
-    range_tombstone_stream _tombstones;
-
-    // Holds the lower bound of a position range which hasn't been processed yet.
-    // Only fragments with positions < _lower_bound have been emitted.
-    position_in_partition _lower_bound;
-    position_in_partition_view _upper_bound;
-
-    state _state = state::before_static_row;
-    lw_shared_ptr<read_context> _read_context;
-    partition_snapshot_row_cursor _next_row;
-    bool _next_row_in_range = false;
-
-    future<> do_fill_buffer();
-    void copy_from_cache_to_buffer();
-    future<> process_static_row();
-    void move_to_end();
-    void move_to_next_range();
-    void move_to_current_range();
-    void move_to_next_entry();
-    // Emits all delayed range tombstones with positions smaller than upper_bound.
-    void drain_tombstones(position_in_partition_view upper_bound);
-    // Emits all delayed range tombstones.
-    void drain_tombstones();
-    void add_to_buffer(const partition_snapshot_row_cursor&);
-    void add_clustering_row_to_buffer(mutation_fragment&&);
-    void add_to_buffer(range_tombstone&&);
-    void add_to_buffer(mutation_fragment&&);
-    future<> read_from_underlying();
-    future<> start_reading_from_underlying();
-    bool after_current_range(position_in_partition_view position);
-    bool can_populate() const;
-    void maybe_update_continuity();
-    void maybe_add_to_cache(const mutation_fragment& mf);
-    void maybe_add_to_cache(const clustering_row& cr);
-    void maybe_add_to_cache(const range_tombstone& rt);
-    void maybe_add_to_cache(const static_row& sr);
-    void maybe_set_static_row_continuous();
-public:
-    cache_streamed_mutation(schema_ptr s,
-                            dht::decorated_key dk,
-                            query::clustering_key_filter_ranges&& crr,
-                            lw_shared_ptr<read_context> ctx,
-                            lw_shared_ptr<partition_snapshot> snp,
-                            row_cache& cache)
-        : streamed_mutation::impl(std::move(s), dk, snp->partition_tombstone())
-        , _snp(std::move(snp))
-        , _position_cmp(*_schema)
-        , _ck_ranges(std::move(crr))
-        , _ck_ranges_curr(_ck_ranges.begin())
-        , _ck_ranges_end(_ck_ranges.end())
-        , _lsa_manager(cache)
-        , _tombstones(*_schema)
-        , _lower_bound(position_in_partition::before_all_clustered_rows())
-        , _upper_bound(position_in_partition_view::before_all_clustered_rows())
-        , _read_context(std::move(ctx))
-        , _next_row(*_schema, cache._tracker.region(), *_snp)
-    { }
-    cache_streamed_mutation(const cache_streamed_mutation&) = delete;
-    cache_streamed_mutation(cache_streamed_mutation&&) = delete;
-    virtual future<> fill_buffer() override;
-    virtual ~cache_streamed_mutation() {
-        maybe_merge_versions(_snp, _lsa_manager.region(), _lsa_manager.read_section());
-    }
-};
-
-inline
-future<> cache_streamed_mutation::process_static_row() {
-    if (_snp->version()->partition().static_row_continuous()) {
-        _read_context->cache().on_row_hit();
-        row sr = _lsa_manager.run_in_read_section([this] {
-            return _snp->static_row();
-        });
-        if (!sr.empty()) {
-            push_mutation_fragment(mutation_fragment(static_row(std::move(sr))));
-        }
-        return make_ready_future<>();
-    } else {
-        _read_context->cache().on_row_miss();
-        return _read_context->get_next_fragment().then([this] (mutation_fragment_opt&& sr) {
-            if (sr) {
-                assert(sr->is_static_row());
-                maybe_add_to_cache(sr->as_static_row());
-                push_mutation_fragment(std::move(*sr));
-            }
-            maybe_set_static_row_continuous();
-        });
-    }
-}
-
-inline
-future<> cache_streamed_mutation::fill_buffer() {
-    if (_state == state::before_static_row) {
-        auto after_static_row = [this] {
-            if (_ck_ranges_curr == _ck_ranges_end) {
-                _end_of_stream = true;
-                _state = state::end_of_stream;
-                return make_ready_future<>();
-            }
-            _state = state::reading_from_cache;
-            _lsa_manager.run_in_read_section([this] {
-                move_to_current_range();
-            });
-            return fill_buffer();
-        };
-        if (_schema->has_static_columns()) {
-            return process_static_row().then(std::move(after_static_row));
-        } else {
-            return after_static_row();
-        }
-    }
-    return do_until([this] { return _end_of_stream || is_buffer_full(); }, [this] {
-        return do_fill_buffer();
-    });
-}
-
-inline
-future<> cache_streamed_mutation::do_fill_buffer() {
-    if (_state == state::move_to_underlying) {
-        _state = state::reading_from_underlying;
-        auto end = _next_row_in_range ? position_in_partition(_next_row.position())
-                                      : position_in_partition(_upper_bound);
-        return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}).then([this] {
-            return read_from_underlying();
-        });
-    }
-    if (_state == state::reading_from_underlying) {
-        return read_from_underlying();
-    }
-    // assert(_state == state::reading_from_cache)
-    return _lsa_manager.run_in_read_section([this] {
-        auto same_pos = _next_row.maybe_refresh();
-        // FIXME: If continuity changed anywhere between _lower_bound and _next_row.position()
-        // we need to redo the lookup with _lower_bound. There is no eviction yet, so not yet a problem.
-        assert(same_pos);
-        while (!is_buffer_full() && _state == state::reading_from_cache) {
-            copy_from_cache_to_buffer();
-            if (need_preempt()) {
-                break;
-            }
-        }
-        return make_ready_future<>();
-    });
-}
-
-inline
-future<> cache_streamed_mutation::read_from_underlying() {
-    return consume_mutation_fragments_until(_read_context->get_streamed_mutation(),
-        [this] { return _state != state::reading_from_underlying || is_buffer_full(); },
-        [this] (mutation_fragment mf) {
-            _read_context->cache().on_row_miss();
-            maybe_add_to_cache(mf);
-            add_to_buffer(std::move(mf));
-        },
-        [this] {
-            _state = state::reading_from_cache;
-            _lsa_manager.run_in_update_section([this] {
-                auto same_pos = _next_row.maybe_refresh();
-                assert(same_pos); // FIXME: handle eviction
-                if (_next_row_in_range) {
-                    maybe_update_continuity();
-                    add_to_buffer(_next_row);
-                    move_to_next_entry();
-                } else {
-                    if (no_clustering_row_between(*_schema, _upper_bound, _next_row.position())) {
-                        this->maybe_update_continuity();
-                    } else {
-                        // FIXME: Insert dummy entry at _upper_bound.
-                        _read_context->cache().on_mispopulate();
-                    }
-                    move_to_next_range();
-                }
-            });
-            return make_ready_future<>();
-        });
-}
-
-inline
-void cache_streamed_mutation::maybe_update_continuity() {
-    if (can_populate() && _next_row.is_in_latest_version()) {
-        if (_last_row_key) {
-            if (_next_row.previous_row_in_latest_version_has_key(*_last_row_key)) {
-                _next_row.set_continuous(true);
-            }
-        } else if (!_ck_ranges_curr->start()) {
-            _next_row.set_continuous(true);
-        }
-    } else {
-        _read_context->cache().on_mispopulate();
-    }
-}
-
-inline
-void cache_streamed_mutation::maybe_add_to_cache(const mutation_fragment& mf) {
-    if (mf.is_range_tombstone()) {
-        maybe_add_to_cache(mf.as_range_tombstone());
-    } else {
-        assert(mf.is_clustering_row());
-        const clustering_row& cr = mf.as_clustering_row();
-        maybe_add_to_cache(cr);
-    }
-}
-
-inline
-void cache_streamed_mutation::maybe_add_to_cache(const clustering_row& cr) {
-    if (!can_populate()) {
-        _read_context->cache().on_mispopulate();
-        return;
-    }
-    _lsa_manager.run_in_update_section_with_allocator([this, &cr] {
-        mutation_partition& mp = _snp->version()->partition();
-        rows_entry::compare less(*_schema);
-
-        // FIXME: If _next_row is up to date, but latest version doesn't have iterator in
-        // current row (could be far away, so we'd do this often), then this will do
-        // the lookup in mp. This is not necessary, because _next_row has iterators for
-        // next rows in each version, even if they're not part of the current row.
-        // They're currently buried in the heap, but you could keep a vector of
-        // iterators per each version in addition to the heap.
-        auto new_entry = alloc_strategy_unique_ptr<rows_entry>(
-            current_allocator().construct<rows_entry>(cr.key(), cr.tomb(), cr.marker(), cr.cells()));
-        new_entry->set_continuous(false);
-        auto it = _next_row.has_valid_row_from_latest_version()
-                  ? _next_row.get_iterator_in_latest_version() : mp.clustered_rows().lower_bound(cr.key(), less);
-        auto insert_result = mp.clustered_rows().insert_check(it, *new_entry, less);
-        if (insert_result.second) {
-            _read_context->cache().on_row_insert();
-            new_entry.release();
-        }
-        it = insert_result.first;
-
-        rows_entry& e = *it;
-        if (_last_row_key) {
-            if (it == mp.clustered_rows().begin()) {
-                // FIXME: check whether entry for _last_row_key is in older versions and if so set
-                // continuity to true.
-                _read_context->cache().on_mispopulate();
-            } else {
-                auto prev_it = it;
-                --prev_it;
-                clustering_key_prefix::equality eq(*_schema);
-                if (eq(*_last_row_key, prev_it->key())) {
-                    e.set_continuous(true);
-                }
-            }
-        } else if (!_ck_ranges_curr->start()) {
-            e.set_continuous(true);
-        } else {
-            // FIXME: Insert dummy entry at _ck_ranges_curr->start()
-            _read_context->cache().on_mispopulate();
-        }
-    });
-}
-
-inline
-bool cache_streamed_mutation::after_current_range(position_in_partition_view p) {
-    return _position_cmp(p, _upper_bound) >= 0;
-}
-
-inline
-future<> cache_streamed_mutation::start_reading_from_underlying() {
-    _state = state::move_to_underlying;
-    return make_ready_future<>();
-}
-
-inline
-void cache_streamed_mutation::copy_from_cache_to_buffer() {
-    position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());
-    for (auto&& rts : _snp->range_tombstones(*_schema, _lower_bound, _next_row_in_range ? next_lower_bound : _upper_bound)) {
-        add_to_buffer(std::move(rts));
-        if (is_buffer_full()) {
-            return;
-        }
-    }
-    if (_next_row_in_range) {
-        add_to_buffer(_next_row);
-        move_to_next_entry();
-    } else {
-        move_to_next_range();
-    }
-}
-
-inline
-void cache_streamed_mutation::move_to_end() {
-    drain_tombstones();
-    _end_of_stream = true;
-    _state = state::end_of_stream;
-}
-
-inline
-void cache_streamed_mutation::move_to_next_range() {
-    ++_ck_ranges_curr;
-    if (_ck_ranges_curr == _ck_ranges_end) {
-        move_to_end();
-    } else {
-        move_to_current_range();
-    }
-}
-
-inline
-void cache_streamed_mutation::move_to_current_range() {
-    _last_row_key = std::experimental::nullopt;
-    _lower_bound = position_in_partition::for_range_start(*_ck_ranges_curr);
-    _upper_bound = position_in_partition_view::for_range_end(*_ck_ranges_curr);
-    auto complete_until_next = _next_row.advance_to(_lower_bound) || _next_row.continuous();
-    _next_row_in_range = !after_current_range(_next_row.position());
-    if (!complete_until_next) {
-        start_reading_from_underlying();
-    }
-}
-
-// _next_row must be inside the range.
-inline
-void cache_streamed_mutation::move_to_next_entry() {
-    if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
-        move_to_next_range();
-    } else {
-        if (!_next_row.next()) {
-            move_to_end();
-            return;
-        }
-        _next_row_in_range = !after_current_range(_next_row.position());
-        if (!_next_row.continuous()) {
-            start_reading_from_underlying();
-        }
-    }
-}
-
-inline
-void cache_streamed_mutation::drain_tombstones(position_in_partition_view pos) {
-    while (auto mfo = _tombstones.get_next(pos)) {
-        push_mutation_fragment(std::move(*mfo));
-    }
-}
-
-inline
-void cache_streamed_mutation::drain_tombstones() {
-    while (auto mfo = _tombstones.get_next()) {
-        push_mutation_fragment(std::move(*mfo));
-    }
-}
-
-inline
-void cache_streamed_mutation::add_to_buffer(mutation_fragment&& mf) {
-    if (mf.is_clustering_row()) {
-        add_clustering_row_to_buffer(std::move(mf));
-    } else {
-        assert(mf.is_range_tombstone());
-        add_to_buffer(std::move(mf).as_range_tombstone());
-    }
-}
-
-inline
-void cache_streamed_mutation::add_to_buffer(const partition_snapshot_row_cursor& row) {
-    if (!row.dummy()) {
-        _read_context->cache().on_row_hit();
-        add_clustering_row_to_buffer(row.row());
-    }
-}
-
-inline
-void cache_streamed_mutation::add_clustering_row_to_buffer(mutation_fragment&& mf) {
-    auto& row = mf.as_clustering_row();
-    drain_tombstones(row.position());
-    _last_row_key = row.key();
-    _lower_bound = position_in_partition::after_key(row.key());
-    push_mutation_fragment(std::move(mf));
-}
-
-inline
-void cache_streamed_mutation::add_to_buffer(range_tombstone&& rt) {
-    // This guarantees that rt starts after any emitted clustering_row
-    if (!rt.trim_front(*_schema, _lower_bound)) {
-        return;
-    }
-    _lower_bound = position_in_partition(rt.position());
-    _tombstones.apply(std::move(rt));
-    drain_tombstones(_lower_bound);
-}
-
-inline
-void cache_streamed_mutation::maybe_add_to_cache(const range_tombstone& rt) {
-    if (can_populate()) {
-        _lsa_manager.run_in_update_section_with_allocator([&] {
-            _snp->version()->partition().row_tombstones().apply_monotonically(*_schema, rt);
-        });
-    } else {
-        _read_context->cache().on_mispopulate();
-    }
-}
-
-inline
-void cache_streamed_mutation::maybe_add_to_cache(const static_row& sr) {
-    if (can_populate()) {
-        _read_context->cache().on_row_insert();
-        _lsa_manager.run_in_update_section_with_allocator([&] {
-            _snp->version()->partition().static_row().apply(*_schema, column_kind::static_column, sr.cells());
-        });
-    } else {
-        _read_context->cache().on_mispopulate();
-    }
-}
-
-inline
-void cache_streamed_mutation::maybe_set_static_row_continuous() {
-    if (can_populate()) {
-        _snp->version()->partition().set_static_row_continuous(true);
-    } else {
-        _read_context->cache().on_mispopulate();
-    }
-}
-
-inline
-bool cache_streamed_mutation::can_populate() const {
-    return _snp->at_latest_version() && _read_context->cache().phase_of(_read_context->key()) == _read_context->phase();
-}
-
-} // namespace cache
-
-inline streamed_mutation make_cache_streamed_mutation(schema_ptr s,
-                                                      dht::decorated_key dk,
-                                                      query::clustering_key_filter_ranges crr,
-                                                      row_cache& cache,
-                                                      lw_shared_ptr<cache::read_context> ctx,
-                                                      lw_shared_ptr<partition_snapshot> snp)
-{
-    return make_streamed_mutation<cache::cache_streamed_mutation>(
-        std::move(s), std::move(dk), std::move(crr), std::move(ctx), std::move(snp), cache);
-}
--- a/checked-file-impl.hh
+++ b/checked-file-impl.hh
@@ -130,7 +130,7 @@ inline file make_checked_file(const io_error_handler& error_handler, file f)
 future<file>
 inline open_checked_file_dma(const io_error_handler& error_handler,
                             sstring name, open_flags flags,
-                             file_open_options options)
+                             file_open_options options = {})
 {
    return do_io_check(error_handler, [&] {
        return open_file_dma(name, flags, options).then([&] (file f) {
@@ -139,17 +139,6 @@ inline open_checked_file_dma(const io_error_handler& error_handler,
    });
 }

-future<file>
-inline open_checked_file_dma(const io_error_handler& error_handler,
-                             sstring name, open_flags flags)
-{
-    return do_io_check(error_handler, [&] {
-        return open_file_dma(name, flags).then([&] (file f) {
-            return make_ready_future<file>(make_checked_file(error_handler, f));
-        });
-    });
-}
-
 future<file>
 inline open_checked_directory(const io_error_handler& error_handler,
                              sstring name)
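
Note on the `checked-file-impl.hh` hunks above: the two `open_checked_file_dma` overloads are folded into one by giving `options` a default value of `{}`. A minimal sketch of the same pattern follows. `open_options` and `describe_open` are hypothetical stand-ins for illustration only; they are not Seastar or Scylla APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for file_open_options; not the real Seastar type.
struct open_options {
    bool sloppy_size = false;
};

// A single function with a defaulted parameter covers both call shapes
// that previously needed two overloads with duplicated bodies.
inline std::string describe_open(const std::string& name, open_options options = {}) {
    return name + (options.sloppy_size ? ":sloppy" : ":default");
}
```

Callers that passed only a name keep compiling unchanged, and the duplicated body is gone.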
--- a/clustering_bounds_comparator.hh
+++ b/clustering_bounds_comparator.hh
@@ -42,17 +42,6 @@ std::ostream& operator<<(std::ostream& out, const bound_kind k);
 bound_kind invert_kind(bound_kind k);
 int32_t weight(bound_kind k);

-static inline bound_kind flip_bound_kind(bound_kind bk)
-{
-    switch (bk) {
-    case bound_kind::excl_end: return bound_kind::excl_start;
-    case bound_kind::incl_end: return bound_kind::incl_start;
-    case bound_kind::excl_start: return bound_kind::excl_end;
-    case bound_kind::incl_start: return bound_kind::incl_end;
-    }
-    abort();
-}
-
 class bound_view {
 public:
    const static thread_local clustering_key empty_prefix;
--- a/clustering_ranges_walker.hh
+++ b/clustering_ranges_walker.hh
@@ -169,14 +169,14 @@ public:
    bool contains_tombstone(position_in_partition_view start, position_in_partition_view end) const {
        position_in_partition::less_compare less(_schema);

-        if (_trim && less(end, *_trim)) {
+        if (_trim && !less(*_trim, end)) {
            return false;
        }

        auto i = _current;
        while (i != _end) {
            auto range_start = position_in_partition_view::for_range_start(*i);
-            if (less(end, range_start)) {
+            if (!less(range_start, end)) {
                return false;
            }
            auto range_end = position_in_partition_view::for_range_end(*i);
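
Note on the `contains_tombstone` hunk above: replacing `less(end, x)` with `!less(x, end)` turns a strict "end is before x" test into "end is not after x", so the two guards differ only when `end` equals `x`. The sketch below models positions as plain `int`s, which is an assumption for illustration; the real code compares `position_in_partition_view` values with a schema-aware comparator. The function names are hypothetical.

```cpp
#include <cassert>

// Positions modeled as ints (assumption); stands in for
// position_in_partition::less_compare.
inline bool less(int a, int b) { return a < b; }

// Old guard: true only when the tombstone ends strictly before x.
inline bool old_guard(int end, int x) { return less(end, x); }

// New guard: also true when the tombstone ends exactly at x.
inline bool new_guard(int end, int x) { return !less(x, end); }
```

Under this reading, a tombstone whose end coincides with the trim position or with a range start is now reported as not contained. That fits an exclusive end bound, though the diff itself does not state the motivation.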
--- a/coding-style.md
+++ b/coding-style.md
@@ -0,0 +1,3 @@
+# Scylla Coding Style
+
+Please see the [Seastar style document](https://github.com/scylladb/seastar/blob/master/coding-style.md).
--- a/compaction_strategy.hh
+++ b/compaction_strategy.hh
@@ -21,6 +21,9 @@

 #pragma once

+#include "sstables/shared_sstable.hh"
+#include "exceptions/exceptions.hh"
+
 class column_family;
 class schema;
 using schema_ptr = lw_shared_ptr<const schema>;
@@ -33,6 +36,7 @@ enum class compaction_strategy_type {
    size_tiered,
    leveled,
    date_tiered,
+    time_window,
 };

 class compaction_strategy_impl;
@@ -53,13 +57,13 @@ public:
    compaction_strategy& operator=(compaction_strategy&&);

    // Return a list of sstables to be compacted after applying the strategy.
-    compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
+    compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<shared_sstable> candidates);

-    std::vector<resharding_descriptor> get_resharding_jobs(column_family& cf, std::vector<lw_shared_ptr<sstable>> candidates);
+    std::vector<resharding_descriptor> get_resharding_jobs(column_family& cf, std::vector<shared_sstable> candidates);

    // Some strategies may look at the compacted and resulting sstables to
    // get some useful information for subsequent compactions.
-    void notify_completion(const std::vector<lw_shared_ptr<sstable>>& removed, const std::vector<lw_shared_ptr<sstable>>& added);
+    void notify_completion(const std::vector<shared_sstable>& removed, const std::vector<shared_sstable>& added);

    // Return if parallel compaction is allowed by strategy.
    bool parallel_compaction() const;
@@ -82,6 +86,8 @@ public:
            return "LeveledCompactionStrategy";
        case compaction_strategy_type::date_tiered:
            return "DateTieredCompactionStrategy";
+        case compaction_strategy_type::time_window:
+            return "TimeWindowCompactionStrategy";
        default:
            throw std::runtime_error("Invalid Compaction Strategy");
        }
@@ -100,6 +106,8 @@ public:
            return compaction_strategy_type::leveled;
        } else if (short_name == "DateTieredCompactionStrategy") {
            return compaction_strategy_type::date_tiered;
+        } else if (short_name == "TimeWindowCompactionStrategy") {
+            return compaction_strategy_type::time_window;
        } else {
            throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class '%s'", name));
        }
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -12,7 +12,9 @@

 # The name of the cluster. This is mainly used to prevent machines in
 # one logical cluster from joining another.
-cluster_name: 'Test Cluster'
+# It is recommended to change the default value when creating a new cluster.
+# You can NOT modify this value for an existing cluster
+#cluster_name: 'Test Cluster'

 # This defines the number of tokens randomly assigned to this node on the ring
 # The more tokens, relative to other nodes, the larger the proportion of data
@@ -85,6 +87,13 @@ listen_address: localhost
 # Leaving this blank will set it to the same value as listen_address
 # broadcast_address: 1.2.3.4

+
+# When using multiple physical network interfaces, set this to true to listen on broadcast_address
+# in addition to the listen_address, allowing nodes to communicate in both interfaces.
+# Ignore this property if the network configuration automatically routes between the public and private networks such as EC2.
+#
+# listen_on_broadcast_address: false
+
 # port for the CQL native transport to listen for clients on
 # For security reasons, you should not expose this port to the internet.  Firewall it if needed.
 native_transport_port: 9042
@@ -270,17 +279,17 @@ batch_size_fail_threshold_in_kb: 50

 # Validity period for permissions cache (fetching permissions can be an
 # expensive operation depending on the authorizer, CassandraAuthorizer is
-# one example). Defaults to 2000, set to 0 to disable.
+# one example). Defaults to 10000, set to 0 to disable.
 # Will be disabled automatically for AllowAllAuthorizer.
-# permissions_validity_in_ms: 2000
+# permissions_validity_in_ms: 10000

 # Refresh interval for permissions cache (if enabled).
 # After this interval, cache entries become eligible for refresh. Upon next
 # access, an async reload is scheduled and the old value returned until it
-# completes. If permissions_validity_in_ms is non-zero, then this must be
-# also.
-# Defaults to the same value as permissions_validity_in_ms.
-# permissions_update_interval_in_ms: 1000
+# completes. If permissions_validity_in_ms is non-zero, then this also must have
+# a non-zero value. Defaults to 2000. It's recommended to set this value to
+# be at least 3 times smaller than the permissions_validity_in_ms.
+# permissions_update_interval_in_ms: 2000

 # The partitioner is responsible for distributing groups of rows (by
 # partition key) across nodes in the cluster.  You should leave this
--- a/configure.py
+++ b/configure.py
@@ -34,7 +34,7 @@ for line in open('/etc/os-release'):
        os_ids += value.split(' ')

 # distribution "internationalization", converting package names.
-# Fedora name is key, values is distro -> package name dict.
+# Fedora name is key, values is distro -> package name dict. 
 i18n_xlat = {
    'boost-devel': {
        'debian': 'libboost-dev',
@@ -48,7 +48,7 @@ def pkgname(name):
        for id in os_ids:
            if id in dict:
                return dict[id]
-    return name
+    return name 

 def get_flags():
    with open('/proc/cpuinfo') as f:
@@ -86,7 +86,7 @@ def try_compile(compiler, source = '', flags = []):
    with tempfile.NamedTemporaryFile() as sfile:
        sfile.file.write(bytes(source, 'utf-8'))
        sfile.file.flush()
-        return subprocess.call([compiler, '-x', 'c++', '-o', '/dev/null', '-c', sfile.name] + flags,
+        return subprocess.call([compiler, '-x', 'c++', '-o', '/dev/null', '-c', sfile.name] + args.user_cflags.split() + flags,
                               stdout = subprocess.DEVNULL,
                               stderr = subprocess.DEVNULL) == 0

@@ -167,7 +167,9 @@ modes = {

 scylla_tests = [
    'tests/mutation_test',
+    'tests/mvcc_test',
    'tests/streamed_mutation_test',
+    'tests/flat_mutation_reader_test',
    'tests/schema_registry_test',
    'tests/canonical_mutation_test',
    'tests/range_test',
@@ -186,7 +188,8 @@ scylla_tests = [
    'tests/perf/perf_cql_parser',
    'tests/perf/perf_simple_query',
    'tests/perf/perf_fast_forward',
-    'tests/cache_streamed_mutation_test',
+    'tests/perf/perf_cache_eviction',
+    'tests/cache_flat_mutation_reader_test',
    'tests/row_cache_stress_test',
    'tests/memory_footprint',
    'tests/perf/perf_sstable',
@@ -221,7 +224,7 @@ scylla_tests = [
    'tests/murmur_hash_test',
    'tests/allocation_strategy_test',
    'tests/logalloc_test',
-    'tests/log_histogram_test',
+    'tests/log_heap_test',
    'tests/managed_vector_test',
    'tests/crc_test',
    'tests/flush_queue_test',
@@ -238,7 +241,15 @@ scylla_tests = [
    'tests/view_schema_test',
    'tests/counter_test',
    'tests/cell_locker_test',
+    'tests/streaming_histogram_test',
+    'tests/duration_test',
+    'tests/vint_serialization_test',
+    'tests/compress_test',
+    'tests/chunked_vector_test',
    'tests/loading_cache_test',
+    'tests/castas_fcts_test',
+    'tests/big_decimal_test',
+    'tests/aggregate_fcts_test',
 ]

 apps = [
@@ -324,6 +335,7 @@ scylla_core = (['database.cc',
                 'mutation_partition_view.cc',
                 'mutation_partition_serializer.cc',
                 'mutation_reader.cc',
+                 'flat_mutation_reader.cc',
                 'mutation_query.cc',
                 'keys.cc',
                 'counters.cc',
@@ -331,11 +343,11 @@ scylla_core = (['database.cc',
                 'sstables/compress.cc',
                 'sstables/row.cc',
                 'sstables/partition.cc',
-                 'sstables/filter.cc',
                 'sstables/compaction.cc',
                 'sstables/compaction_strategy.cc',
                 'sstables/compaction_manager.cc',
                 'sstables/atomic_deletion.cc',
+                 'sstables/integrity_checked_file_impl.cc',
                 'transport/event.cc',
                 'transport/event_notifier.cc',
                 'transport/server.cc',
@@ -350,6 +362,7 @@ scylla_core = (['database.cc',
                 'cql3/sets.cc',
                 'cql3/maps.cc',
                 'cql3/functions/functions.cc',
+                 'cql3/functions/castas_fcts.cc',
                 'cql3/statements/cf_prop_defs.cc',
                 'cql3/statements/cf_statement.cc',
                 'cql3/statements/authentication_statement.cc',
@@ -451,6 +464,7 @@ scylla_core = (['database.cc',
                 'utils/dynamic_bitset.cc',
                 'utils/managed_bytes.cc',
                 'utils/exceptions.cc',
+                 'utils/config_file.cc',
                 'gms/version_generator.cc',
                 'gms/versioned_value.cc',
                 'gms/gossiper.cc',
@@ -510,20 +524,27 @@ scylla_core = (['database.cc',
                 'lister.cc',
                 'repair/repair.cc',
                 'exceptions/exceptions.cc',
-                 'auth/auth.cc',
+                 'auth/allow_all_authenticator.cc',
+                 'auth/allow_all_authorizer.cc',
                 'auth/authenticated_user.cc',
                 'auth/authenticator.cc',
-                 'auth/authorizer.cc',
+                 'auth/common.cc',
                 'auth/default_authorizer.cc',
                 'auth/data_resource.cc',
                 'auth/password_authenticator.cc',
                 'auth/permission.cc',
+                 'auth/permissions_cache.cc',
+                 'auth/service.cc',
+                 'auth/transitional.cc',
                 'tracing/tracing.cc',
                 'tracing/trace_keyspace_helper.cc',
                 'tracing/trace_state.cc',
+                 'table_helper.cc',
                 'range_tombstone.cc',
                 'range_tombstone_list.cc',
-                 'disk-error-handler.cc'
+                 'disk-error-handler.cc',
+                 'duration.cc',
+                 'vint-serialization.cc',
                 ]
                + [Antlr3Grammar('cql3/Cql.g')]
                + [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -619,6 +640,12 @@ pure_boost_tests = set([
    'tests/dynamic_bitset_test',
    'tests/idl_test',
    'tests/cartesian_product_test',
+    'tests/streaming_histogram_test',
+    'tests/duration_test',
+    'tests/vint_serialization_test',
+    'tests/compress_test',
+    'tests/chunked_vector_test',
+    'tests/big_decimal_test',
 ])

 tests_not_using_seastar_test_framework = set([
@@ -632,6 +659,7 @@ tests_not_using_seastar_test_framework = set([
    'tests/message',
    'tests/perf/perf_simple_query',
    'tests/perf/perf_fast_forward',
+    'tests/perf/perf_cache_eviction',
    'tests/row_cache_stress_test',
    'tests/memory_footprint',
    'tests/gossip',
@@ -645,19 +673,20 @@ for t in tests_not_using_seastar_test_framework:
 for t in scylla_tests:
    deps[t] = [t + '.cc']
    if t not in tests_not_using_seastar_test_framework:
-        deps[t] += scylla_tests_dependencies
+        deps[t] += scylla_tests_dependencies 
        deps[t] += scylla_tests_seastar_deps
    else:
        deps[t] += scylla_core + api + idls + ['tests/cql_test_env.cc']

-deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
+deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc', 'tests/sstable_utils.cc']
+deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']

 deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
 deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
 deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
 deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
 deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
-deps['tests/log_histogram_test'] = ['tests/log_histogram_test.cc']
+deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
 deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']

 warnings = [
@@ -671,6 +700,10 @@ warnings = [
    '-Wno-return-stack-address',
    '-Wno-missing-braces',
    '-Wno-unused-lambda-capture',
+    '-Wno-misleading-indentation',
+    '-Wno-overflow',
+    '-Wno-noexcept-type',
+    '-Wno-nonnull-compare'
    ]

 warnings = [w
@@ -772,7 +805,8 @@ if args.alloc_failure_injector:
    seastar_flags += ['--enable-alloc-failure-injector']

 seastar_cflags = args.user_cflags + " -march=nehalem"
-seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags)]
+seastar_ldflags = args.user_ldflags
+seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' %(seastar_ldflags)]

 status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')

@@ -836,7 +870,7 @@ with open(buildfile, 'w') as f:
        builddir = {outdir}
        cxx = {cxx}
        cxxflags = {user_cflags} {warnings} {defines}
-        ldflags = {user_ldflags}
+        ldflags = -fuse-ld=gold {user_ldflags}
        libs = {libs}
        pool link_pool
            depth = {link_pool_depth}
@@ -893,7 +927,8 @@ with open(buildfile, 'w') as f:
                     && sed -i -e 's/^\\( *\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$$/\\1const \\2/' $
                        -e '1i using ExceptionBaseType = int;' $
                        -e 's/^{{/{{ ExceptionBaseType\* ex = nullptr;/; $
-                            s/ExceptionBaseType\* ex = new/ex = new/' $
+                            s/ExceptionBaseType\* ex = new/ex = new/; $
+                            s/exceptions::syntax_exception e/exceptions::syntax_exception\& e/' $
                        build/{mode}/gen/${{stem}}Parser.cpp
                description = ANTLR3 $in
            ''').format(mode = mode, **modeval))
@@ -919,25 +954,13 @@ with open(buildfile, 'w') as f:
                    objs += dep.objects('$builddir/' + mode + '/gen')
                if isinstance(dep, Antlr3Grammar):
                    objs += dep.objects('$builddir/' + mode + '/gen')
-            if binary.endswith('.pc'):
-                vars = modeval.copy()
-                vars.update(globals())
-                pc = textwrap.dedent('''\
-                        Name: Seastar
-                        URL: http://seastar-project.org/
-                        Description: Advanced C++ framework for high-performance server applications on modern hardware.
-                        Version: 1.0
-                        Libs: -L{srcdir}/{builddir} -Wl,--whole-archive -lseastar -Wl,--no-whole-archive {dbgflag} -Wl,--no-as-needed {static} {pie} -fvisibility=hidden -pthread {user_ldflags} {libs} {sanitize_libs}
-                        Cflags: -std=gnu++1y {dbgflag} {fpie} -Wall -Werror -fvisibility=hidden -pthread -I{srcdir} -I{srcdir}/{builddir}/gen {user_cflags} {warnings} {defines} {sanitize} {opt}
-                        ''').format(builddir = 'build/' + mode, srcdir = os.getcwd(), **vars)
-                f.write('build $builddir/{}/{}: gen\n  text = {}\n'.format(mode, binary, repr(pc)))
-            elif binary.endswith('.a'):
+            if binary.endswith('.a'):
                f.write('build $builddir/{}/{}: ar.{} {}\n'.format(mode, binary, mode, str.join(' ', objs)))
            else:
                if binary.startswith('tests/'):
                    local_libs = '$libs'
                    if binary not in tests_not_using_seastar_test_framework or binary in pure_boost_tests:
-                        local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
+                        local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework') 
                    if has_thrift:
                        local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
                    # Our code's debugging information is huge, and multiplied
--- a/cql3/Cql.g
+++ b/cql3/Cql.g
@@ -399,6 +399,7 @@ unaliasedSelector returns [shared_ptr<selectable::raw> s]
       | K_WRITETIME '(' c=cident ')'              { tmp = make_shared<selectable::writetime_or_ttl::raw>(c, true); }
       | K_TTL       '(' c=cident ')'              { tmp = make_shared<selectable::writetime_or_ttl::raw>(c, false); }
       | f=functionName args=selectionFunctionArgs { tmp = ::make_shared<selectable::with_function::raw>(std::move(f), std::move(args)); }
+       | K_CAST      '(' arg=unaliasedSelector K_AS t=native_type ')'  { tmp = ::make_shared<selectable::with_cast::raw>(std::move(arg), std::move(t)); }
       )
       ( '.' fi=cident { tmp = make_shared<selectable::with_field_selection::raw>(std::move(tmp), std::move(fi)); } )*
    { $s = tmp; }
@@ -1167,6 +1168,7 @@ constant returns [shared_ptr<cql3::constants::literal> constant]
    | t=INTEGER        { $constant = cql3::constants::literal::integer(sstring{$t.text}); }
    | t=FLOAT          { $constant = cql3::constants::literal::floating_point(sstring{$t.text}); }
    | t=BOOLEAN        { $constant = cql3::constants::literal::bool_(sstring{$t.text}); }
+    | t=DURATION       { $constant = cql3::constants::literal::duration(sstring{$t.text}); }
    | t=UUID           { $constant = cql3::constants::literal::uuid(sstring{$t.text}); }
    | t=HEXNUMBER      { $constant = cql3::constants::literal::hex(sstring{$t.text}); }
    | { sign=""; } ('-' {sign = "-"; } )? t=(K_NAN | K_INFINITY) { $constant = cql3::constants::literal::floating_point(sstring{sign + $t.text}); }
@@ -1464,6 +1466,7 @@ native_type returns [shared_ptr<cql3_type> t]
    | K_COUNTER   { $t = cql3_type::counter; }
    | K_DECIMAL   { $t = cql3_type::decimal; }
    | K_DOUBLE    { $t = cql3_type::double_; }
+    | K_DURATION  { $t = cql3_type::duration; }
    | K_FLOAT     { $t = cql3_type::float_; }
    | K_INET      { $t = cql3_type::inet; }
    | K_INT       { $t = cql3_type::int_; }
@@ -1569,6 +1572,7 @@ basic_unreserved_keyword returns [sstring str]
 K_SELECT:      S E L E C T;
 K_FROM:        F R O M;
 K_AS:          A S;
+K_CAST:        C A S T;
 K_WHERE:       W H E R E;
 K_AND:         A N D;
 K_KEY:         K E Y;
@@ -1649,6 +1653,7 @@ K_BOOLEAN:     B O O L E A N;
 K_COUNTER:     C O U N T E R;
 K_DECIMAL:     D E C I M A L;
 K_DOUBLE:      D O U B L E;
+K_DURATION:    D U R A T I O N;
 K_FLOAT:       F L O A T;
 K_INET:        I N E T;
 K_INT:         I N T;
@@ -1778,6 +1783,20 @@ fragment EXPONENT
    : E ('+' | '-')? DIGIT+
    ;

+fragment DURATION_UNIT
+    : Y
+    | M O
+    | W
+    | D
+    | H
+    | M
+    | S
+    | M S
+    | U S
+    | '\u00B5' S
+    | N S
+    ;
+
 INTEGER
    : '-'? DIGIT+
    ;
@@ -1802,6 +1821,13 @@ BOOLEAN
    : T R U E | F A L S E
    ;

+DURATION
+    : '-'? DIGIT+ DURATION_UNIT (DIGIT+ DURATION_UNIT)*
+    | '-'? 'P' (DIGIT+ 'Y')? (DIGIT+ 'M')? (DIGIT+ 'D')? ('T' (DIGIT+ 'H')? (DIGIT+ 'M')? (DIGIT+ 'S')?)? // ISO 8601 "format with designators"
+    | '-'? 'P' DIGIT+ 'W'
+    | '-'? 'P' DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT 'T' DIGIT DIGIT ':' DIGIT DIGIT ':' DIGIT DIGIT // ISO 8601 "alternative format"
+    ;
+
 IDENT
    : LETTER (LETTER | DIGIT | '_')*
    ;
--- a/cql3/abstract_marker.cc
+++ b/cql3/abstract_marker.cc
@@ -79,6 +79,7 @@ abstract_marker::raw::raw(int32_t bind_index)
        return ::make_shared<maps::marker>(_bind_index, receiver);
    }
    assert(0);
+    return shared_ptr<term>();
 }

 assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {
--- a/cql3/attributes.cc
+++ b/cql3/attributes.cc
@@ -79,7 +79,7 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
    }
    try {
        data_type_for<int64_t>()->validate(*tval);
-    } catch (marshal_exception e) {
+    } catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception("Invalid timestamp value");
    }
    return value_cast<int64_t>(data_type_for<int64_t>()->deserialize(*tval));
@@ -99,7 +99,7 @@ int32_t attributes::get_time_to_live(const query_options& options) {
    try {
        data_type_for<int32_t>()->validate(*tval);
    }
-    catch (marshal_exception e) {
+    catch (marshal_exception& e) {
        throw exceptions::invalid_request_exception("Invalid TTL value");
    }
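Note on the `catch (marshal_exception& e)` change above: catching by value copies the exception object and slices it down to the declared type, while catching by reference keeps the dynamic type. A minimal sketch with hypothetical stand-in exception types (not Scylla's `marshal_exception` hierarchy); some compilers warn about the by-value handler, which is the point of the example:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception types for illustration only.
struct base_error : std::runtime_error {
    explicit base_error(const std::string& msg) : std::runtime_error(msg) {}
    virtual std::string kind() const { return "base"; }
};

struct derived_error : base_error {
    explicit derived_error(const std::string& msg) : base_error(msg) {}
    std::string kind() const override { return "derived"; }
};

// By value: the handler receives a sliced base_error copy.
std::string kind_when_caught_by_value() {
    try {
        throw derived_error("boom");
    } catch (base_error e) {
        return e.kind();
    }
}

// By reference: the handler sees the original derived_error object.
std::string kind_when_caught_by_reference() {
    try {
        throw derived_error("boom");
    } catch (const base_error& e) {
        return e.kind();
    }
}

// Small usage example.
void check_catch_semantics() {
    assert(kind_when_caught_by_value() == "base");
    assert(kind_when_caught_by_reference() == "derived");
}
```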

--- a/cql3/column_condition.cc
+++ b/cql3/column_condition.cc
@@ -40,11 +40,29 @@
 */

 #include "cql3/column_condition.hh"
+#include "statements/request_validations.hh"
 #include "unimplemented.hh"
 #include "lists.hh"
 #include "maps.hh"
 #include <boost/range/algorithm_ext/push_back.hpp>

+namespace {
+
+void validate_operation_on_durations(const abstract_type& type, const cql3::operator_type& op) {
+    using cql3::statements::request_validations::check_false;
+
+    if (op.is_slice() && type.references_duration()) {
+        check_false(type.is_collection(), "Slice conditions are not supported on collections containing durations");
+        check_false(type.is_tuple(), "Slice conditions are not supported on tuples containing durations");
+        check_false(type.is_user_type(), "Slice conditions are not supported on UDTs containing durations");
+
+        // We're a duration.
+        throw exceptions::invalid_request_exception(sprint("Slice conditions are not supported on durations"));
+    }
+}
+
+}
+
 namespace cql3 {

 bool
@@ -95,6 +113,7 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
            }
            return column_condition::in_condition(receiver, std::move(terms));
        } else {
+            validate_operation_on_durations(*receiver.type, _op);
            return column_condition::condition(receiver, _value->prepare(db, keyspace, receiver.column_specification), _op);
        }
    }
@@ -129,6 +148,8 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
                                | boost::adaptors::transformed(std::bind(&term::raw::prepare, std::placeholders::_1, std::ref(db), std::ref(keyspace), value_spec)));
        return column_condition::in_condition(receiver, _collection_element->prepare(db, keyspace, element_spec), terms);
    } else {
+        validate_operation_on_durations(*receiver.type, _op);
+
        return column_condition::condition(receiver,
                _collection_element->prepare(db, keyspace, element_spec),
                _value->prepare(db, keyspace, value_spec),
--- a/cql3/constants.cc
+++ b/cql3/constants.cc
@@ -52,14 +52,15 @@ std::ostream&
 operator<<(std::ostream&out, constants::type t)
 {
    switch (t) {
-        case constants::type::STRING:  return out << "STRING";
-        case constants::type::INTEGER: return out << "INTEGER";
-        case constants::type::UUID:    return out << "UUID";
-        case constants::type::FLOAT:   return out << "FLOAT";
-        case constants::type::BOOLEAN: return out << "BOOLEAN";
-        case constants::type::HEX:     return out << "HEX";
-    };
-    assert(0);
+        case constants::type::STRING:   return out << "STRING";
+        case constants::type::INTEGER:  return out << "INTEGER";
+        case constants::type::UUID:     return out << "UUID";
+        case constants::type::FLOAT:    return out << "FLOAT";
+        case constants::type::BOOLEAN:  return out << "BOOLEAN";
+        case constants::type::HEX:      return out << "HEX";
+        case constants::type::DURATION: return out << "DURATION";
+    }
+    abort();
 }

 bytes
@@ -145,6 +146,11 @@ constants::literal::test_assignment(database& db, const sstring& keyspace, ::sha
                return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
            }
            break;
+        case type::DURATION:
+            if (kind == cql3_type::kind_enum_set::prepare<cql3_type::kind::DURATION>()) {
+                return assignment_testable::test_result::EXACT_MATCH;
+            }
+            break;
    }
    return assignment_testable::test_result::NOT_ASSIGNABLE;
 }
--- a/cql3/constants.hh
+++ b/cql3/constants.hh
@@ -60,7 +60,7 @@ public:
 #endif
 public:
    enum class type {
-        STRING, INTEGER, UUID, FLOAT, BOOLEAN, HEX
+        STRING, INTEGER, UUID, FLOAT, BOOLEAN, HEX, DURATION
    };

    /**
@@ -149,6 +149,10 @@ public:
            return ::make_shared<literal>(type::HEX, text);
        }

+        static ::shared_ptr<literal> duration(sstring text) {
+            return ::make_shared<literal>(type::DURATION, text);
+        }
+
        virtual ::shared_ptr<term> prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver);
    private:
        bytes parsed_value(data_type validator);
--- a/cql3/cql3_type.cc
+++ b/cql3/cql3_type.cc
@@ -48,6 +48,10 @@ shared_ptr<cql3_type> cql3_type::raw::prepare(database& db, const sstring& keysp
    }
 }

+bool cql3_type::raw::is_duration() const {
+    return false;
+}
+
 bool cql3_type::raw::references_user_type(const sstring& name) const {
    return false;
 }
@@ -78,6 +82,10 @@ public:
    virtual sstring to_string() const {
        return _type->to_string();
    }
+
+    virtual bool is_duration() const override {
+        return _type->get_type()->equals(duration_type);
+    }
 };

 class cql3_type::raw_collection : public raw {
@@ -126,9 +134,15 @@ public:
        if (_kind == &collection_type_impl::kind::list) {
            return make_shared(cql3_type(to_string(), list_type_impl::get_instance(_values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
        } else if (_kind == &collection_type_impl::kind::set) {
+            if (_values->is_duration()) {
+                throw exceptions::invalid_request_exception(sprint("Durations are not allowed inside sets: %s", *this));
+            }
            return make_shared(cql3_type(to_string(), set_type_impl::get_instance(_values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
        } else if (_kind == &collection_type_impl::kind::map) {
            assert(_keys); // "Got null keys type for a collection";
+            if (_keys->is_duration()) {
+                throw exceptions::invalid_request_exception(sprint("Durations are not allowed as map keys: %s", *this));
+            }
            return make_shared(cql3_type(to_string(), map_type_impl::get_instance(_keys->prepare_internal(keyspace, user_types)->get_type(), _values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
        }
        abort();
@@ -138,6 +152,10 @@ public:
        return (_keys && _keys->references_user_type(name)) || _values->references_user_type(name);
    }

+    bool is_duration() const override {
+        return false;
+    }
+
    virtual sstring to_string() const override {
        sstring start = _frozen ? "frozen<" : "";
        sstring end = _frozen ? ">" : "";
@@ -329,6 +347,7 @@ thread_local shared_ptr<cql3_type> cql3_type::inet = make("inet", inet_addr_type
 thread_local shared_ptr<cql3_type> cql3_type::varint = make("varint", varint_type, cql3_type::kind::VARINT);
 thread_local shared_ptr<cql3_type> cql3_type::decimal = make("decimal", decimal_type, cql3_type::kind::DECIMAL);
 thread_local shared_ptr<cql3_type> cql3_type::counter = make("counter", counter_type, cql3_type::kind::COUNTER);
+thread_local shared_ptr<cql3_type> cql3_type::duration = make("duration", duration_type, cql3_type::kind::DURATION);

 const std::vector<shared_ptr<cql3_type>>&
 cql3_type::values() {
@@ -354,6 +373,7 @@ cql3_type::values() {
        cql3_type::timeuuid,
        cql3_type::date,
        cql3_type::time,
+        cql3_type::duration,
    };
    return v;
 }
--- a/cql3/cql3_type.hh
+++ b/cql3/cql3_type.hh
@@ -75,6 +75,7 @@ public:
        virtual bool supports_freezing() const = 0;
        virtual bool is_collection() const;
        virtual bool is_counter() const;
+        virtual bool is_duration() const;
        virtual bool references_user_type(const sstring&) const;
        virtual std::experimental::optional<sstring> keyspace() const;
        virtual void freeze();
@@ -102,7 +103,7 @@ private:

 public:
    enum class kind : int8_t {
-        ASCII, BIGINT, BLOB, BOOLEAN, COUNTER, DECIMAL, DOUBLE, EMPTY, FLOAT, INT, SMALLINT, TINYINT, INET, TEXT, TIMESTAMP, UUID, VARCHAR, VARINT, TIMEUUID, DATE, TIME
+        ASCII, BIGINT, BLOB, BOOLEAN, COUNTER, DECIMAL, DOUBLE, EMPTY, FLOAT, INT, SMALLINT, TINYINT, INET, TEXT, TIMESTAMP, UUID, VARCHAR, VARINT, TIMEUUID, DATE, TIME, DURATION
    };
    using kind_enum = super_enum<kind,
        kind::ASCII,
@@ -125,7 +126,8 @@ public:
        kind::VARINT,
        kind::TIMEUUID,
        kind::DATE,
-        kind::TIME>;
+        kind::TIME,
+        kind::DURATION>;
    using kind_enum_set = enum_set<kind_enum>;
 private:
    std::experimental::optional<kind_enum_set::prepared> _kind;
@@ -154,6 +156,7 @@ public:
    static thread_local shared_ptr<cql3_type> varint;
    static thread_local shared_ptr<cql3_type> decimal;
    static thread_local shared_ptr<cql3_type> counter;
+    static thread_local shared_ptr<cql3_type> duration;

    static const std::vector<shared_ptr<cql3_type>>& values();
 public:
--- a/cql3/error_collector.hh
+++ b/cql3/error_collector.hh
@@ -67,10 +67,6 @@ class error_collector : public error_listener<RecognizerType, ExceptionBaseType>
     */
    const sstring_view _query;

-    /**
-     * The error messages.
-     */
-    std::vector<sstring> _error_msgs;
 public:

    /**
@@ -81,7 +77,10 @@ public:
     */
    error_collector(const sstring_view& query) : _query(query) {}

-    virtual void syntax_error(RecognizerType& recognizer, ANTLR_UINT8** token_names, ExceptionBaseType* ex) override {
+    /**
+     * Format and throw a new \c exceptions::syntax_exception.
+     */
+    [[noreturn]] virtual void syntax_error(RecognizerType& recognizer, ANTLR_UINT8** token_names, ExceptionBaseType* ex) override {
        auto hdr = get_error_header(ex);
        auto msg = get_error_message(recognizer, ex, token_names);
        std::stringstream result;
@@ -90,22 +89,15 @@ public:
        if (recognizer instanceof Parser)
            appendQuerySnippet((Parser) recognizer, builder);
 #endif
-        _error_msgs.emplace_back(result.str());
-    }

-    virtual void syntax_error(RecognizerType& recognizer, const sstring& msg) override {
-        _error_msgs.emplace_back(msg);
+        throw exceptions::syntax_exception(result.str());
    }

    /**
-     * Throws the first syntax error found by the lexer or the parser if it exists.
-     *
-     * @throws SyntaxException the syntax error.
+     * Throw a new \c exceptions::syntax_exception.
     */
-    void throw_first_syntax_error() {
-        if (!_error_msgs.empty()) {
-            throw exceptions::syntax_exception(_error_msgs[0]);
-        }
+    [[noreturn]] virtual void syntax_error(RecognizerType&, const sstring& msg) override {
+        throw exceptions::syntax_exception(msg);
    }

 private:
--- a/cql3/error_listener.hh
+++ b/cql3/error_listener.hh
@@ -53,6 +53,7 @@ namespace cql3 {
 template<typename RecognizerType, typename ExceptionBaseType>
 class error_listener {
 public:
+    virtual ~error_listener() = default;

    /**
     * Invoked when a syntax error occurs.
--- a/cql3/functions/aggregate_fcts.hh
+++ b/cql3/functions/aggregate_fcts.hh
@@ -41,6 +41,7 @@

 #pragma once

+#include "utils/big_decimal.hh"
 #include "aggregate_function.hh"
 #include "native_aggregate_function.hh"

@@ -111,9 +112,70 @@ make_sum_function() {
    return make_shared<sum_function_for<Type>>();
 }

+template <typename Type>
+class impl_div_for_avg {
+public:
+    static Type div(const Type& x, const int64_t y) {
+        return x/y;
+    }
+};
+
+template <>
+class impl_div_for_avg<big_decimal> {
+public:
+    static big_decimal div(const big_decimal& x, const int64_t y) {
+        return x.div(y, big_decimal::rounding_mode::HALF_EVEN);
+    }
+};
+
+// We need a wider accumulator for average, since summing the inputs can overflow
+// the input type
+template <typename T>
+struct accumulator_for;
+
+template <>
+struct accumulator_for<int8_t> {
+    using type = __int128;
+};
+
+template <>
+struct accumulator_for<int16_t> {
+    using type = __int128;
+};
+
+template <>
+struct accumulator_for<int32_t> {
+    using type = __int128;
+};
+
+template <>
+struct accumulator_for<int64_t> {
+    using type = __int128;
+};
+
+template <>
+struct accumulator_for<float> {
+    using type = float;
+};
+
+template <>
+struct accumulator_for<double> {
+    using type = double;
+};
+
+template <>
+struct accumulator_for<boost::multiprecision::cpp_int> {
+    using type = boost::multiprecision::cpp_int;
+};
+
+template <>
+struct accumulator_for<big_decimal> {
+    using type = big_decimal;
+};
+
 template <typename Type>
 class impl_avg_function_for final : public aggregate_function::aggregate {
-   Type _sum{};
+   typename accumulator_for<Type>::type _sum{};
   int64_t _count = 0;
 public:
    virtual void reset() override {
@@ -121,9 +183,9 @@ public:
        _count = 0;
    }
    virtual opt_bytes compute(cql_serialization_format sf) override {
-        Type ret = 0;
+        Type ret{};
        if (_count) {
-            ret = _sum / _count;
+            ret = impl_div_for_avg<Type>::div(_sum, _count);
        }
        return data_type_for<Type>()->decompose(ret);
    }
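Note on `accumulator_for` above: the comment says the sum can overflow the input type, so the running sum is kept in a wider type and only the final quotient is narrowed. A minimal sketch of the same idea, assuming `int64_t` as the wide type for portability (the diff uses `__int128`); the trait and function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical trait in the spirit of accumulator_for: input type -> wider sum type.
template <typename T> struct wide_accumulator;
template <> struct wide_accumulator<int8_t>  { using type = int64_t; };
template <> struct wide_accumulator<int32_t> { using type = int64_t; };

// Averages the inputs, summing in the wider type so the sum cannot wrap
// for the small inputs used here. An empty input yields a value-initialized T.
template <typename T>
T average(const std::vector<T>& values) {
    typename wide_accumulator<T>::type sum{};
    int64_t count = 0;
    for (T v : values) {
        sum += v;
        ++count;
    }
    if (count == 0) {
        return T{};
    }
    return static_cast<T>(sum / count);
}

// Small usage example: 100 + 100 + 100 = 300 does not fit in int8_t,
// but the average does.
void check_average_examples() {
    assert(average<int8_t>({100, 100, 100}) == 100);
    assert(average<int8_t>({}) == 0);
}
```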
--- a/cql3/functions/castas_fcts.cc
+++ b/cql3/functions/castas_fcts.cc
@@ -0,0 +1,82 @@
+/*
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "castas_fcts.hh"
+#include "cql3/functions/native_scalar_function.hh"
+
+namespace cql3 {
+namespace functions {
+
+namespace {
+
+using bytes_opt = std::experimental::optional<bytes>;
+
+class castas_function_for : public cql3::functions::native_scalar_function {
+    castas_fctn _func;
+public:
+    castas_function_for(data_type to_type,
+                        data_type from_type,
+                        castas_fctn func)
+            : native_scalar_function("castas" + to_type->as_cql3_type()->to_string(), to_type, {from_type})
+            , _func(func) {
+    }
+    virtual bool is_pure() override {
+        return true;
+    }
+    virtual void print(std::ostream& os) const override {
+        os << "cast(" << _arg_types[0]->name() << " as " << _return_type->name() << ")";
+    }
+    virtual bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override {
+        auto from_type = arg_types()[0];
+        auto to_type = return_type();
+
+        auto&& val = parameters[0];
+        if (!val) {
+            return val;
+        }
+        auto val_from = from_type->deserialize(*val);
+        auto val_to = _func(val_from);
+        return to_type->decompose(val_to);
+    }
+};
+
+shared_ptr<function> make_castas_function(data_type to_type, data_type from_type, castas_fctn func) {
+    return ::make_shared<castas_function_for>(std::move(to_type), std::move(from_type), std::move(func));
+}
+
+} /* Anonymous Namespace */
+
+shared_ptr<function> castas_functions::get(data_type to_type, const std::vector<shared_ptr<cql3::selection::selector>>& provided_args, schema_ptr s) {
+    if (provided_args.size() != 1) {
+        throw exceptions::invalid_request_exception("Invalid CAST expression");
+    }
+    auto from_type = provided_args[0]->get_type();
+    auto from_type_key = from_type;
+    if (from_type_key->is_reversed()) {
+        from_type_key = dynamic_cast<const reversed_type_impl&>(*from_type).underlying_type();
+    }
+
+    auto f = get_castas_fctn(to_type, from_type_key);
+    return make_castas_function(to_type, from_type, f);
+}
+
+}
+}
--- a/cql3/functions/castas_fcts.hh
+++ b/cql3/functions/castas_fcts.hh
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Modified by ScyllaDB
+ *
+ * Copyright (C) 2017 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <tuple>
+#include <unordered_map>
+
+#include "cql3/functions/function.hh"
+#include "cql3/functions/abstract_function.hh"
+#include "exceptions/exceptions.hh"
+#include "core/print.hh"
+#include "cql3/cql3_type.hh"
+#include "cql3/selection/selector.hh"
+
+namespace cql3 {
+namespace functions {
+
+class castas_functions {
+public:
+    static shared_ptr<function> get(data_type to_type, const std::vector<shared_ptr<cql3::selection::selector>>& provided_args, schema_ptr s);
+};
+
+}
+}
--- a/cql3/functions/functions.cc
+++ b/cql3/functions/functions.cc
@@ -59,6 +59,14 @@ functions::init() {
        declare(make_to_blob_function(type->get_type()));
        declare(make_from_blob_function(type->get_type()));
    }
+    declare(aggregate_fcts::make_count_function<int8_t>());
+    declare(aggregate_fcts::make_max_function<int8_t>());
+    declare(aggregate_fcts::make_min_function<int8_t>());
+
+    declare(aggregate_fcts::make_count_function<int16_t>());
+    declare(aggregate_fcts::make_max_function<int16_t>());
+    declare(aggregate_fcts::make_min_function<int16_t>());
+
    declare(aggregate_fcts::make_count_function<int32_t>());
    declare(aggregate_fcts::make_max_function<int32_t>());
    declare(aggregate_fcts::make_min_function<int32_t>());
@@ -67,6 +75,14 @@ functions::init() {
    declare(aggregate_fcts::make_max_function<int64_t>());
    declare(aggregate_fcts::make_min_function<int64_t>());

+    declare(aggregate_fcts::make_count_function<boost::multiprecision::cpp_int>());
+    declare(aggregate_fcts::make_max_function<boost::multiprecision::cpp_int>());
+    declare(aggregate_fcts::make_min_function<boost::multiprecision::cpp_int>());
+
+    declare(aggregate_fcts::make_count_function<big_decimal>());
+    declare(aggregate_fcts::make_max_function<big_decimal>());
+    declare(aggregate_fcts::make_min_function<big_decimal>());
+
    declare(aggregate_fcts::make_count_function<float>());
    declare(aggregate_fcts::make_max_function<float>());
    declare(aggregate_fcts::make_min_function<float>());
@@ -88,22 +104,22 @@ functions::init() {

    declare(make_varchar_as_blob_fct());
    declare(make_blob_as_varchar_fct());
+    declare(aggregate_fcts::make_sum_function<int8_t>());
+    declare(aggregate_fcts::make_sum_function<int16_t>());
    declare(aggregate_fcts::make_sum_function<int32_t>());
    declare(aggregate_fcts::make_sum_function<int64_t>());
    declare(aggregate_fcts::make_sum_function<float>());
    declare(aggregate_fcts::make_sum_function<double>());
-#if 0
-    declare(AggregateFcts.sumFunctionForDecimal);
-    declare(AggregateFcts.sumFunctionForVarint);
-#endif
+    declare(aggregate_fcts::make_sum_function<boost::multiprecision::cpp_int>());
+    declare(aggregate_fcts::make_sum_function<big_decimal>());
+    declare(aggregate_fcts::make_avg_function<int8_t>());
+    declare(aggregate_fcts::make_avg_function<int16_t>());
    declare(aggregate_fcts::make_avg_function<int32_t>());
    declare(aggregate_fcts::make_avg_function<int64_t>());
    declare(aggregate_fcts::make_avg_function<float>());
    declare(aggregate_fcts::make_avg_function<double>());
-#if 0
-    declare(AggregateFcts.avgFunctionForVarint);
-    declare(AggregateFcts.avgFunctionForDecimal);
-#endif
+    declare(aggregate_fcts::make_avg_function<boost::multiprecision::cpp_int>());
+    declare(aggregate_fcts::make_avg_function<big_decimal>());

    // also needed for smp:
 #if 0
@@ -342,7 +358,7 @@ function_call::execute_internal(cql_serialization_format sf, scalar_function& fu
            fun.return_type()->validate(*result);
        }
        return result;
-    } catch (marshal_exception e) {
+    } catch (marshal_exception& e) {
        throw runtime_exception(sprint("Return of function %s (%s) is not a valid value for its declared return type %s",
                                       fun, to_hex(result),
                                       *fun.return_type()->as_cql3_type()
--- a/cql3/operation.cc
+++ b/cql3/operation.cc
@@ -46,15 +46,19 @@

 namespace cql3 {

+sstring
+operation::set_element::to_string(const column_definition& receiver) const {
+    return format("{}[{}] = {}", receiver.name_as_text(), *_selector, *_value);
+}

 shared_ptr<operation>
 operation::set_element::prepare(database& db, const sstring& keyspace, const column_definition& receiver) {
    using exceptions::invalid_request_exception;
    auto rtype = dynamic_pointer_cast<const collection_type_impl>(receiver.type);
    if (!rtype) {
-        throw invalid_request_exception(sprint("Invalid operation (%s) for non collection column %s", receiver, receiver.name()));
+        throw invalid_request_exception(sprint("Invalid operation (%s) for non collection column %s", to_string(receiver), receiver.name()));
    } else if (!rtype->is_multi_cell()) {
-        throw invalid_request_exception(sprint("Invalid operation (%s) for frozen collection column %s", receiver, receiver.name()));
+        throw invalid_request_exception(sprint("Invalid operation (%s) for frozen collection column %s", to_string(receiver), receiver.name()));
    }

    if (&rtype->_kind == &collection_type_impl::kind::list) {
@@ -67,7 +71,7 @@ operation::set_element::prepare(database& db, const sstring& keyspace, const col
            return make_shared<lists::setter_by_index>(receiver, idx, lval);
        }
    } else if (&rtype->_kind == &collection_type_impl::kind::set) {
-        throw invalid_request_exception(sprint("Invalid operation (%s) for set column %s", receiver, receiver.name()));
+        throw invalid_request_exception(sprint("Invalid operation (%s) for set column %s", to_string(receiver), receiver.name()));
    } else if (&rtype->_kind == &collection_type_impl::kind::map) {
        auto key = _selector->prepare(db, keyspace, maps::key_spec_of(*receiver.column_specification));
        auto mval = _value->prepare(db, keyspace, maps::value_spec_of(*receiver.column_specification));
@@ -83,6 +87,11 @@ operation::set_element::is_compatible_with(shared_ptr<raw_update> other) {
    return !dynamic_pointer_cast<set_value>(std::move(other));
 }

+sstring
+operation::addition::to_string(const column_definition& receiver) const {
+    return format("{} = {} + {}", receiver.name_as_text(), receiver.name_as_text(), *_value);
+}
+
 shared_ptr<operation>
 operation::addition::prepare(database& db, const sstring& keyspace, const column_definition& receiver) {
    auto v = _value->prepare(db, keyspace, receiver.column_specification);
@@ -90,11 +99,11 @@ operation::addition::prepare(database& db, const sstring& keyspace, const column
    auto ctype = dynamic_pointer_cast<const collection_type_impl>(receiver.type);
    if (!ctype) {
        if (!receiver.is_counter()) {
-            throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for non counter column %s", receiver, receiver.name()));
+            throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for non counter column %s", to_string(receiver), receiver.name()));
        }
        return make_shared<constants::adder>(receiver, v);
    } else if (!ctype->is_multi_cell()) {
-        throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for frozen collection column %s", receiver, receiver.name()));
+        throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for frozen collection column %s", to_string(receiver), receiver.name()));
    }

    if (&ctype->_kind == &collection_type_impl::kind::list) {
@@ -113,19 +122,24 @@ operation::addition::is_compatible_with(shared_ptr<raw_update> other) {
    return !dynamic_pointer_cast<set_value>(other);
 }

+sstring
+operation::subtraction::to_string(const column_definition& receiver) const {
+    return format("{} = {} - {}", receiver.name_as_text(), receiver.name_as_text(), *_value);
+}
+
 shared_ptr<operation>
 operation::subtraction::prepare(database& db, const sstring& keyspace, const column_definition& receiver) {
    auto ctype = dynamic_pointer_cast<const collection_type_impl>(receiver.type);
    if (!ctype) {
        if (!receiver.is_counter()) {
-            throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for non counter column %s", receiver, receiver.name()));
+            throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for non counter column %s", to_string(receiver), receiver.name()));
        }
        auto v = _value->prepare(db, keyspace, receiver.column_specification);
        return make_shared<constants::subtracter>(receiver, v);
    }
    if (!ctype->is_multi_cell()) {
        throw exceptions::invalid_request_exception(
-                sprint("Invalid operation (%s) for frozen collection column %s", receiver, receiver.name()));
+                sprint("Invalid operation (%s) for frozen collection column %s", to_string(receiver), receiver.name()));
    }

    if (&ctype->_kind == &collection_type_impl::kind::list) {
@@ -150,14 +164,19 @@ operation::subtraction::is_compatible_with(shared_ptr<raw_update> other) {
    return !dynamic_pointer_cast<set_value>(other);
 }

+sstring
+operation::prepend::to_string(const column_definition& receiver) const {
+    return format("{} = {} + {}", receiver.name_as_text(), *_value, receiver.name_as_text());
+}
+
 shared_ptr<operation>
 operation::prepend::prepare(database& db, const sstring& keyspace, const column_definition& receiver) {
    auto v = _value->prepare(db, keyspace, receiver.column_specification);

    if (!dynamic_cast<const list_type_impl*>(receiver.type.get())) {
-        throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for non list column %s", receiver, receiver.name()));
+        throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for non list column %s", to_string(receiver), receiver.name()));
    } else if (!receiver.type->is_multi_cell()) {
-        throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for frozen list column %s", receiver, receiver.name()));
+        throw exceptions::invalid_request_exception(sprint("Invalid operation (%s) for frozen list column %s", to_string(receiver), receiver.name()));
    }

    return make_shared<lists::prepender>(receiver, std::move(v));
--- a/cql3/operation.hh
+++ b/cql3/operation.hh
@@ -203,6 +203,8 @@ public:
        const shared_ptr<term::raw> _selector;
        const shared_ptr<term::raw> _value;
        const bool _by_uuid;
+    private:
+        sstring to_string(const column_definition& receiver) const;
    public:
        set_element(shared_ptr<term::raw> selector, shared_ptr<term::raw> value, bool by_uuid = false)
            : _selector(std::move(selector)), _value(std::move(value)), _by_uuid(by_uuid) {
@@ -215,6 +217,8 @@ public:

    class addition : public raw_update {
        const shared_ptr<term::raw> _value;
+    private:
+        sstring to_string(const column_definition& receiver) const;
    public:
        addition(shared_ptr<term::raw> value)
                : _value(value) {
@@ -227,6 +231,8 @@ public:

    class subtraction : public raw_update {
        const shared_ptr<term::raw> _value;
+    private:
+        sstring to_string(const column_definition& receiver) const;
    public:
        subtraction(shared_ptr<term::raw> value)
                : _value(value) {
@@ -239,6 +245,8 @@ public:

    class prepend : public raw_update {
        shared_ptr<term::raw> _value;
+    private:
+        sstring to_string(const column_definition& receiver) const;
    public:
        prepend(shared_ptr<term::raw> value)
                : _value(std::move(value)) {
--- a/cql3/operator.hh
+++ b/cql3/operator.hh
@@ -71,7 +71,12 @@ private:
        , _text(std::move(text))
    {}
 public:
+    operator_type(const operator_type&) = delete;
+    operator_type& operator=(const operator_type&) = delete;
    const operator_type& reverse() const { return _reverse; }
+    bool is_slice() const {
+        return (*this == LT) || (*this == LTE) || (*this == GT) || (*this == GTE);
+    }
    sstring to_string() const { return _text; }
    bool operator==(const operator_type& other) const { return this == &other; }
    bool operator!=(const operator_type& other) const { return this != &other; }
--- a/cql3/query_options.cc
+++ b/cql3/query_options.cc
@@ -49,6 +49,23 @@ thread_local const query_options::specific_options query_options::specific_optio
 thread_local query_options query_options::DEFAULT{db::consistency_level::ONE, std::experimental::nullopt,
    std::vector<cql3::raw_value_view>(), false, query_options::specific_options::DEFAULT, cql_serialization_format::latest()};

+query_options::query_options(db::consistency_level consistency,
+                           std::experimental::optional<std::vector<sstring_view>> names,
+                           std::vector<cql3::raw_value> values,
+                           std::vector<cql3::raw_value_view> value_views,
+                           bool skip_metadata,
+                           specific_options options,
+                           cql_serialization_format sf)
+   : _consistency(consistency)
+   , _names(std::move(names))
+   , _values(std::move(values))
+   , _value_views(value_views)
+   , _skip_metadata(skip_metadata)
+   , _options(std::move(options))
+   , _cql_serialization_format(sf)
+{
+}
+
 query_options::query_options(db::consistency_level consistency,
                             std::experimental::optional<std::vector<sstring_view>> names,
                             std::vector<cql3::raw_value> values,
@@ -82,18 +99,29 @@ query_options::query_options(db::consistency_level consistency,
 {
 }

-query_options::query_options(db::consistency_level cl, std::vector<cql3::raw_value> values)
+query_options::query_options(db::consistency_level cl, std::vector<cql3::raw_value> values, specific_options options)
    : query_options(
          cl,
          {},
          std::move(values),
          false,
-          query_options::specific_options::DEFAULT,
+          std::move(options),
          cql_serialization_format::latest()
      )
 {
 }

+query_options::query_options(std::unique_ptr<query_options> qo, ::shared_ptr<service::pager::paging_state> paging_state)
+        : query_options(qo->_consistency,
+        std::move(qo->_names),
+        std::move(qo->_values),
+        std::move(qo->_value_views),
+        qo->_skip_metadata,
+        std::move(query_options::specific_options{qo->_options.page_size, paging_state, qo->_options.serial_consistency, qo->_options.timestamp}),
+        qo->_cql_serialization_format) {
+
+}
+
 query_options::query_options(std::vector<cql3::raw_value> values)
    : query_options(
          db::consistency_level::ONE, std::move(values))
@@ -181,19 +209,18 @@ void query_options::prepare(const std::vector<::shared_ptr<column_specification>
    }

    auto& names = *_names;
-    std::vector<cql3::raw_value> ordered_values;
+    std::vector<cql3::raw_value_view> ordered_values;
    ordered_values.reserve(specs.size());
    for (auto&& spec : specs) {
        auto& spec_name = spec->name->text();
        for (size_t j = 0; j < names.size(); j++) {
            if (names[j] == spec_name) {
-                ordered_values.emplace_back(_values[j]);
+                ordered_values.emplace_back(_value_views[j]);
                break;
            }
        }
    }
-    _values = std::move(ordered_values);
-    fill_value_views();
+    _value_views = std::move(ordered_values);
 }

 void query_options::fill_value_views()
--- a/cql3/query_options.hh
+++ b/cql3/query_options.hh
@@ -108,6 +108,13 @@ public:
                           bool skip_metadata,
                           specific_options options,
                           cql_serialization_format sf);
+    explicit query_options(db::consistency_level consistency,
+                           std::experimental::optional<std::vector<sstring_view>> names,
+                           std::vector<cql3::raw_value> values,
+                           std::vector<cql3::raw_value_view> value_views,
+                           bool skip_metadata,
+                           specific_options options,
+                           cql_serialization_format sf);
    explicit query_options(db::consistency_level consistency,
                           std::experimental::optional<std::vector<sstring_view>> names,
                           std::vector<cql3::raw_value_view> value_views,
@@ -140,7 +147,8 @@ public:

    // forInternalUse
    explicit query_options(std::vector<cql3::raw_value> values);
-    explicit query_options(db::consistency_level, std::vector<cql3::raw_value> values);
+    explicit query_options(db::consistency_level, std::vector<cql3::raw_value> values, specific_options options = specific_options::DEFAULT);
+    explicit query_options(std::unique_ptr<query_options>, ::shared_ptr<service::pager::paging_state> paging_state);

    db::consistency_level get_consistency() const;
    cql3::raw_value_view get_value_at(size_t idx) const;
--- a/cql3/query_processor.cc
+++ b/cql3/query_processor.cc
@@ -38,19 +38,19 @@
 * You should have received a copy of the GNU General Public License
 * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
 */
-#include <seastar/core/metrics.hh>
+
+#define CRYPTOPP_ENABLE_NAMESPACE_WEAK 1

 #include "cql3/query_processor.hh"
+
+#include <cryptopp/md5.h>
+#include <seastar/core/metrics.hh>
+
 #include "cql3/CqlParser.hpp"
 #include "cql3/error_collector.hh"
 #include "cql3/statements/batch_statement.hh"
 #include "cql3/util.hh"

-#include "transport/messages/result_message.hh"
-
-#define CRYPTOPP_ENABLE_NAMESPACE_WEAK 1
-#include <cryptopp/md5.h>
-
 namespace cql3 {

 using namespace statements;
@@ -68,9 +68,8 @@ const std::chrono::minutes prepared_statements_cache::entry_expiry = std::chrono
 class query_processor::internal_state {
    service::query_state _qs;
 public:
-    internal_state()
-        : _qs(service::client_state{service::client_state::internal_tag()})
-    { }
+    internal_state() : _qs(service::client_state{service::client_state::internal_tag()}) {
+    }
    operator service::query_state&() {
        return _qs;
    }
@@ -92,74 +91,102 @@ api::timestamp_type query_processor::next_timestamp() {
    return _internal_state->next_timestamp();
 }

-query_processor::query_processor(distributed<service::storage_proxy>& proxy,
-                                 distributed<database>& db)
-    : _migration_subscriber{std::make_unique<migration_subscriber>(this)}
-    , _proxy(proxy)
-    , _db(db)
-    , _internal_state(new internal_state())
-    , _prepared_cache(prep_cache_log)
-{
+query_processor::query_processor(distributed<service::storage_proxy>& proxy, distributed<database>& db)
+        : _migration_subscriber{std::make_unique<migration_subscriber>(this)}
+        , _proxy(proxy)
+        , _db(db)
+        , _internal_state(new internal_state())
+        , _prepared_cache(prep_cache_log) {
    namespace sm = seastar::metrics;

-    _metrics.add_group("query_processor", {
-        sm::make_derive("statements_prepared", _stats.prepare_invocations,
-                        sm::description("Counts a total number of parsed CQL requests.")),
-    });
+    _metrics.add_group(
+            "query_processor",
+            {
+                    sm::make_derive(
+                            "statements_prepared",
+                            _stats.prepare_invocations,
+                            sm::description("Counts a total number of parsed CQL requests."))});

-    _metrics.add_group("cql", {
-        sm::make_derive("reads", _cql_stats.reads,
-                        sm::description("Counts a total number of CQL read requests.")),
+    _metrics.add_group(
+            "cql",
+            {
+                    sm::make_derive(
+                            "reads",
+                            _cql_stats.reads,
+                            sm::description("Counts a total number of CQL read requests.")),

-        sm::make_derive("inserts", _cql_stats.inserts,
-                        sm::description("Counts a total number of CQL INSERT requests.")),
+                    sm::make_derive(
+                            "inserts",
+                            _cql_stats.inserts,
+                            sm::description("Counts a total number of CQL INSERT requests.")),

-        sm::make_derive("updates", _cql_stats.updates,
-                        sm::description("Counts a total number of CQL UPDATE requests.")),
+                    sm::make_derive(
+                            "updates",
+                            _cql_stats.updates,
+                            sm::description("Counts a total number of CQL UPDATE requests.")),

-        sm::make_derive("deletes", _cql_stats.deletes,
-                        sm::description("Counts a total number of CQL DELETE requests.")),
+                    sm::make_derive(
+                            "deletes",
+                            _cql_stats.deletes,
+                            sm::description("Counts a total number of CQL DELETE requests.")),

-        sm::make_derive("batches", _cql_stats.batches,
-                        sm::description("Counts a total number of CQL BATCH requests.")),
+                    sm::make_derive(
+                            "batches",
+                            _cql_stats.batches,
+                            sm::description("Counts a total number of CQL BATCH requests.")),

-        sm::make_derive("statements_in_batches", _cql_stats.statements_in_batches,
-                        sm::description("Counts a total number of sub-statements in CQL BATCH requests.")),
+                    sm::make_derive(
+                            "statements_in_batches",
+                            _cql_stats.statements_in_batches,
+                            sm::description("Counts a total number of sub-statements in CQL BATCH requests.")),

-        sm::make_derive("batches_pure_logged", _cql_stats.batches_pure_logged,
-                        sm::description("Counts a total number of LOGGED batches that were executed as LOGGED batches.")),
+                    sm::make_derive(
+                            "batches_pure_logged",
+                            _cql_stats.batches_pure_logged,
+                            sm::description(
+                                    "Counts a total number of LOGGED batches that were executed as LOGGED batches.")),

-        sm::make_derive("batches_pure_unlogged", _cql_stats.batches_pure_unlogged,
-                        sm::description("Counts a total number of UNLOGGED batches that were executed as UNLOGGED batches.")),
+                    sm::make_derive(
+                            "batches_pure_unlogged",
+                            _cql_stats.batches_pure_unlogged,
+                            sm::description(
+                                    "Counts a total number of UNLOGGED batches that were executed as UNLOGGED "
+                                    "batches.")),

-        sm::make_derive("batches_unlogged_from_logged", _cql_stats.batches_unlogged_from_logged,
-                        sm::description("Counts a total number of LOGGED batches that were executed as UNLOGGED batches.")),
+                    sm::make_derive(
+                            "batches_unlogged_from_logged",
+                            _cql_stats.batches_unlogged_from_logged,
+                            sm::description("Counts a total number of LOGGED batches that were executed as UNLOGGED "
+                                            "batches.")),

-        sm::make_derive("prepared_cache_evictions", [] { return prepared_statements_cache::shard_stats().prepared_cache_evictions; },
-                        sm::description("Counts a number of prepared statements cache entries evictions.")),
+                    sm::make_derive(
+                            "prepared_cache_evictions",
+                            [] { return prepared_statements_cache::shard_stats().prepared_cache_evictions; },
+                            sm::description("Counts a number of prepared statements cache entries evictions.")),

-        sm::make_gauge("prepared_cache_size", [this] { return _prepared_cache.size(); },
-                        sm::description("A number of entries in the prepared statements cache.")),
+                    sm::make_gauge(
+                            "prepared_cache_size",
+                            [this] { return _prepared_cache.size(); },
+                            sm::description("A number of entries in the prepared statements cache.")),

-        sm::make_gauge("prepared_cache_memory_footprint", [this] { return _prepared_cache.memory_footprint(); },
-                        sm::description("Size (in bytes) of the prepared statements cache.")),
-    });
+                    sm::make_gauge(
+                            "prepared_cache_memory_footprint",
+                            [this] { return _prepared_cache.memory_footprint(); },
+                            sm::description("Size (in bytes) of the prepared statements cache."))});

    service::get_local_migration_manager().register_listener(_migration_subscriber.get());
 }

-query_processor::~query_processor()
-{}
+query_processor::~query_processor() {
+}

-future<> query_processor::stop()
-{
+future<> query_processor::stop() {
    service::get_local_migration_manager().unregister_listener(_migration_subscriber.get());
    return make_ready_future<>();
 }

 future<::shared_ptr<result_message>>
-query_processor::process(const sstring_view& query_string, service::query_state& query_state, query_options& options)
-{
+query_processor::process(const sstring_view& query_string, service::query_state& query_state, query_options& options) {
    log.trace("process: \"{}\"", query_string);
    tracing::trace(query_state.get_trace_state(), "Parsing a statement");
    auto p = get_statement(query_string, query_state.get_client_state());
@@ -179,14 +206,10 @@ query_processor::process(const sstring_view& query_string, service::query_state&
 }

 future<::shared_ptr<result_message>>
-query_processor::process_statement(::shared_ptr<cql_statement> statement,
-                                   service::query_state& query_state,
-                                   const query_options& options)
-{
-#if 0
-        logger.trace("Process {} @CL.{}", statement, options.getConsistency());
-#endif
-
+query_processor::process_statement(
+        ::shared_ptr<cql_statement> statement,
+        service::query_state& query_state,
+        const query_options& options) {
    return statement->check_access(query_state.get_client_state()).then([this, statement, &query_state, &options]() {
        auto& client_state = query_state.get_client_state();

@@ -210,38 +233,50 @@ query_processor::process_statement(::shared_ptr<cql_statement> statement,
 }

 future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-query_processor::prepare(sstring query_string, service::query_state& query_state)
-{
+query_processor::prepare(sstring query_string, service::query_state& query_state) {
    auto& client_state = query_state.get_client_state();
    return prepare(std::move(query_string), client_state, client_state.is_thrift());
 }

 future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-query_processor::prepare(sstring query_string, const service::client_state& client_state, bool for_thrift)
-{
+query_processor::prepare(sstring query_string, const service::client_state& client_state, bool for_thrift) {
    using namespace cql_transport::messages;
    if (for_thrift) {
-        return prepare_one<result_message::prepared::thrift>(std::move(query_string), client_state, compute_thrift_id, prepared_cache_key_type::thrift_id);
+        return prepare_one<result_message::prepared::thrift>(
+                std::move(query_string),
+                client_state,
+                compute_thrift_id, prepared_cache_key_type::thrift_id);
    } else {
-        return prepare_one<result_message::prepared::cql>(std::move(query_string), client_state, compute_id, prepared_cache_key_type::cql_id);
+        return prepare_one<result_message::prepared::cql>(
+                std::move(query_string),
+                client_state,
+                compute_id,
+                prepared_cache_key_type::cql_id);
    }
 }

 ::shared_ptr<cql_transport::messages::result_message::prepared>
-query_processor::get_stored_prepared_statement(const std::experimental::string_view& query_string,
-                                               const sstring& keyspace,
-                                               bool for_thrift)
-{
+query_processor::get_stored_prepared_statement(
+        const std::experimental::string_view& query_string,
+        const sstring& keyspace,
+        bool for_thrift) {
    using namespace cql_transport::messages;
    if (for_thrift) {
-        return get_stored_prepared_statement_one<result_message::prepared::thrift>(query_string, keyspace, compute_thrift_id, prepared_cache_key_type::thrift_id);
+        return get_stored_prepared_statement_one<result_message::prepared::thrift>(
+                query_string,
+                keyspace,
+                compute_thrift_id,
+                prepared_cache_key_type::thrift_id);
    } else {
-        return get_stored_prepared_statement_one<result_message::prepared::cql>(query_string, keyspace, compute_id, prepared_cache_key_type::cql_id);
+        return get_stored_prepared_statement_one<result_message::prepared::cql>(
+                query_string,
+                keyspace,
+                compute_id,
+                prepared_cache_key_type::cql_id);
    }
 }

-static bytes md5_calculate(const std::experimental::string_view& s)
-{
+static bytes md5_calculate(const std::experimental::string_view& s) {
    constexpr size_t size = CryptoPP::Weak1::MD5::DIGESTSIZE;
    CryptoPP::Weak::MD5 hash;
    unsigned char digest[size];
@@ -253,13 +288,15 @@ static sstring hash_target(const std::experimental::string_view& query_string, c
    return keyspace + query_string.to_string();
 }

-prepared_cache_key_type query_processor::compute_id(const std::experimental::string_view& query_string, const sstring& keyspace)
-{
+prepared_cache_key_type query_processor::compute_id(
+        const std::experimental::string_view& query_string,
+        const sstring& keyspace) {
    return prepared_cache_key_type(md5_calculate(hash_target(query_string, keyspace)));
 }

-prepared_cache_key_type query_processor::compute_thrift_id(const std::experimental::string_view& query_string, const sstring& keyspace)
-{
+prepared_cache_key_type query_processor::compute_thrift_id(
+        const std::experimental::string_view& query_string,
+        const sstring& keyspace) {
    auto target = hash_target(query_string, keyspace);
    uint32_t h = 0;
    for (auto&& c : hash_target(query_string, keyspace)) {
@@ -269,11 +306,7 @@ prepared_cache_key_type query_processor::compute_thrift_id(const std::experiment
 }

 std::unique_ptr<prepared_statement>
-query_processor::get_statement(const sstring_view& query, const service::client_state& client_state)
-{
-#if 0
-        Tracing.trace("Parsing {}", queryStr);
-#endif
+query_processor::get_statement(const sstring_view& query, const service::client_state& client_state) {
    ::shared_ptr<raw::parsed_statement> statement = parse_statement(query);

    // Set keyspace for statement that require login
@@ -281,16 +314,12 @@ query_processor::get_statement(const sstring_view& query, const service::client_
    if (cf_stmt) {
        cf_stmt->prepare_keyspace(client_state);
    }
-#if 0
-        Tracing.trace("Preparing statement");
-#endif
    ++_stats.prepare_invocations;
    return statement->prepare(_db.local(), _cql_stats);
 }

 ::shared_ptr<raw::parsed_statement>
-query_processor::parse_statement(const sstring_view& query)
-{
+query_processor::parse_statement(const sstring_view& query) {
    try {
        auto statement = util::do_with_parser(query,  std::mem_fn(&cql3_parser::CqlParser::query));
        if (!statement) {
@@ -307,12 +336,14 @@ query_processor::parse_statement(const sstring_view& query)
    }
 }

-query_options query_processor::make_internal_options(const statements::prepared_statement::checked_weak_ptr& p,
-                                                     const std::initializer_list<data_value>& values,
-                                                     db::consistency_level cl)
-{
+query_options query_processor::make_internal_options(
+        const statements::prepared_statement::checked_weak_ptr& p,
+        const std::initializer_list<data_value>& values,
+        db::consistency_level cl,
+        int32_t page_size) {
    if (p->bound_names.size() != values.size()) {
-        throw std::invalid_argument(sprint("Invalid number of values. Expecting %d but got %d", p->bound_names.size(), values.size()));
+        throw std::invalid_argument(
+                sprint("Invalid number of values. Expecting %d but got %d", p->bound_names.size(), values.size()));
    }
    auto ni = p->bound_names.begin();
    std::vector<cql3::raw_value> bound_values;
@@ -326,11 +357,19 @@ query_options query_processor::make_internal_options(const statements::prepared_
            bound_values.push_back(cql3::raw_value::make_value(n->type->decompose(v)));
        }
    }
+    if (page_size > 0) {
+        ::shared_ptr<service::pager::paging_state> paging_state;
+        db::consistency_level serial_consistency = db::consistency_level::SERIAL;
+        api::timestamp_type ts = api::missing_timestamp;
+        return query_options(
+                cl,
+                bound_values,
+                cql3::query_options::specific_options{page_size, std::move(paging_state), serial_consistency, ts});
+    }
    return query_options(cl, bound_values);
 }

-statements::prepared_statement::checked_weak_ptr query_processor::prepare_internal(const sstring& query_string)
-{
+statements::prepared_statement::checked_weak_ptr query_processor::prepare_internal(const sstring& query_string) {
    auto& p = _internal_statements[query_string];
    if (p == nullptr) {
        auto np = parse_statement(query_string)->prepare(_db.local(), _cql_stats);
@@ -341,33 +380,128 @@ statements::prepared_statement::checked_weak_ptr query_processor::prepare_intern
 }

 future<::shared_ptr<untyped_result_set>>
-query_processor::execute_internal(const sstring& query_string,
-                                  const std::initializer_list<data_value>& values)
-{
+query_processor::execute_internal(const sstring& query_string, const std::initializer_list<data_value>& values) {
    if (log.is_enabled(logging::log_level::trace)) {
        log.trace("execute_internal: \"{}\" ({})", query_string, ::join(", ", values));
    }
    return execute_internal(prepare_internal(query_string), values);
 }

+struct internal_query_state {
+    sstring query_string;
+    std::unique_ptr<query_options> opts;
+    statements::prepared_statement::checked_weak_ptr p;
+    bool more_results = true;
+};
+
+::shared_ptr<internal_query_state> query_processor::create_paged_state(const sstring& query_string,
+        const std::initializer_list<data_value>& values, int32_t page_size) {
+    auto p = prepare_internal(query_string);
+    auto opts = make_internal_options(p, values, db::consistency_level::ONE, page_size);
+    ::shared_ptr<internal_query_state> res = ::make_shared<internal_query_state>(
+            internal_query_state{
+                    query_string,
+                    std::make_unique<cql3::query_options>(std::move(opts)), std::move(p),
+                    true});
+    return res;
+}
+
+bool query_processor::has_more_results(::shared_ptr<cql3::internal_query_state> state) const {
+    if (state) {
+        return state->more_results;
+    }
+    return false;
+}
+
+future<> query_processor::for_each_cql_result(
+        ::shared_ptr<cql3::internal_query_state> state,
+        std::function<stop_iteration(const cql3::untyped_result_set::row&)>&& f) {
+    return do_with(seastar::shared_ptr<bool>(), [f, this, state](auto& is_done) mutable {
+        is_done = seastar::make_shared<bool>(false);
+
+        auto stop_when = [is_done]() {
+            return *is_done;
+        };
+        auto do_results = [is_done, state, f, this]() mutable {
+            return this->execute_paged_internal(
+                    state).then([is_done, state, f, this](::shared_ptr<cql3::untyped_result_set> msg) mutable {
+                if (msg->empty()) {
+                    *is_done = true;
+                } else {
+                    if (!this->has_more_results(state)) {
+                        *is_done = true;
+                    }
+                    for (auto& row : *msg) {
+                        if (f(row) == stop_iteration::yes) {
+                            *is_done = true;
+                            break;
+                        }
+                    }
+                }
+            });
+        };
+        return do_until(stop_when, do_results);
+    });
+}
+
 future<::shared_ptr<untyped_result_set>>
-query_processor::execute_internal(statements::prepared_statement::checked_weak_ptr p,
-                                  const std::initializer_list<data_value>& values)
-{
-    auto opts = make_internal_options(p, values);
+query_processor::execute_paged_internal(::shared_ptr<internal_query_state> state) {
+    return state->p->statement->execute_internal(_proxy, *_internal_state, *state->opts).then(
+            [state, this](::shared_ptr<cql_transport::messages::result_message> msg) mutable {
+        class visitor : public result_message::visitor_base {
+            ::shared_ptr<internal_query_state> _state;
+            query_processor& _qp;
+        public:
+            visitor(::shared_ptr<internal_query_state> state, query_processor& qp) : _state(state), _qp(qp) {
+            }
+            virtual ~visitor() = default;
+            void visit(const result_message::rows& rmrs) override {
+                auto& rs = rmrs.rs();
+                if (rs.get_metadata().paging_state()) {
+                    bool done = !rs.get_metadata().flags().contains<cql3::metadata::flag::HAS_MORE_PAGES>();
+
+                    if (done) {
+                        _state->more_results = false;
+                    } else {
+                        const service::pager::paging_state& st = *rs.get_metadata().paging_state();
+                        shared_ptr<service::pager::paging_state> shrd = ::make_shared<service::pager::paging_state>(st);
+                        _state->opts = std::make_unique<query_options>(std::move(_state->opts), shrd);
+                        _state->p = _qp.prepare_internal(_state->query_string);
+                    }
+                } else {
+                    _state->more_results = false;
+                }
+            }
+        };
+        visitor v(state, *this);
+        if (msg != nullptr) {
+            msg->accept(v);
+        }
+        return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
+    });
+}
+
+future<::shared_ptr<untyped_result_set>>
+query_processor::execute_internal(
+        statements::prepared_statement::checked_weak_ptr p,
+        const std::initializer_list<data_value>& values) {
+    query_options opts = make_internal_options(p, values);
    return do_with(std::move(opts), [this, p = std::move(p)](auto& opts) {
-        return p->statement->execute_internal(_proxy, *_internal_state, opts).then([stmt = p->statement](auto msg) {
+        return p->statement->execute_internal(
+                _proxy,
+                *_internal_state,
+                opts).then([&opts, stmt = p->statement](auto msg) {
            return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
        });
    });
 }

 future<::shared_ptr<untyped_result_set>>
-query_processor::process(const sstring& query_string,
-                         db::consistency_level cl,
-                         const std::initializer_list<data_value>& values,
-                         bool cache)
-{
+query_processor::process(
+        const sstring& query_string,
+        db::consistency_level cl,
+        const std::initializer_list<data_value>& values,
+        bool cache) {
    if (cache) {
        return process(prepare_internal(query_string), cl, values);
    } else {
@@ -379,10 +513,10 @@ query_processor::process(const sstring& query_string,
 }

 future<::shared_ptr<untyped_result_set>>
-query_processor::process(statements::prepared_statement::checked_weak_ptr p,
-                         db::consistency_level cl,
-                         const std::initializer_list<data_value>& values)
-{
+query_processor::process(
+        statements::prepared_statement::checked_weak_ptr p,
+        db::consistency_level cl,
+        const std::initializer_list<data_value>& values) {
    auto opts = make_internal_options(p, values, cl);
    return do_with(std::move(opts), [this, p = std::move(p)](auto & opts) {
        return p->statement->execute(_proxy, *_internal_state, opts).then([](auto msg) {
@@ -392,10 +526,10 @@ query_processor::process(statements::prepared_statement::checked_weak_ptr p,
 }

 future<::shared_ptr<cql_transport::messages::result_message>>
-query_processor::process_batch(::shared_ptr<statements::batch_statement> batch,
-                               service::query_state& query_state,
-                               query_options& options)
-{
+query_processor::process_batch(
+        ::shared_ptr<statements::batch_statement> batch,
+        service::query_state& query_state,
+        query_options& options) {
    return batch->check_access(query_state.get_client_state()).then([this, &query_state, &options, batch] {
        batch->validate();
        batch->validate(_proxy, query_state.get_client_state());
@@ -403,101 +537,90 @@ query_processor::process_batch(::shared_ptr<statements::batch_statement> batch,
    });
 }

-query_processor::migration_subscriber::migration_subscriber(query_processor* qp)
-    : _qp{qp}
-{
+query_processor::migration_subscriber::migration_subscriber(query_processor* qp) : _qp{qp} {
 }

-void query_processor::migration_subscriber::on_create_keyspace(const sstring& ks_name)
-{
+void query_processor::migration_subscriber::on_create_keyspace(const sstring& ks_name) {
 }

-void query_processor::migration_subscriber::on_create_column_family(const sstring& ks_name, const sstring& cf_name)
-{
+void query_processor::migration_subscriber::on_create_column_family(const sstring& ks_name, const sstring& cf_name) {
 }

-void query_processor::migration_subscriber::on_create_user_type(const sstring& ks_name, const sstring& type_name)
-{
+void query_processor::migration_subscriber::on_create_user_type(const sstring& ks_name, const sstring& type_name) {
 }

-void query_processor::migration_subscriber::on_create_function(const sstring& ks_name, const sstring& function_name)
-{
+void query_processor::migration_subscriber::on_create_function(const sstring& ks_name, const sstring& function_name) {
    log.warn("{} event ignored", __func__);
 }

-void query_processor::migration_subscriber::on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name)
-{
+void query_processor::migration_subscriber::on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) {
    log.warn("{} event ignored", __func__);
 }

-void query_processor::migration_subscriber::on_create_view(const sstring& ks_name, const sstring& view_name)
-{
+void query_processor::migration_subscriber::on_create_view(const sstring& ks_name, const sstring& view_name) {
 }

-void query_processor::migration_subscriber::on_update_keyspace(const sstring& ks_name)
-{
+void query_processor::migration_subscriber::on_update_keyspace(const sstring& ks_name) {
 }

-void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed)
-{
+void query_processor::migration_subscriber::on_update_column_family(
+        const sstring& ks_name,
+        const sstring& cf_name,
+        bool columns_changed) {
    // #1255: Ignoring columns_changed deliberately.
    log.info("Column definitions for {}.{} changed, invalidating related prepared statements", ks_name, cf_name);
    remove_invalid_prepared_statements(ks_name, cf_name);
 }

-void query_processor::migration_subscriber::on_update_user_type(const sstring& ks_name, const sstring& type_name)
-{
+void query_processor::migration_subscriber::on_update_user_type(const sstring& ks_name, const sstring& type_name) {
 }

-void query_processor::migration_subscriber::on_update_function(const sstring& ks_name, const sstring& function_name)
-{
+void query_processor::migration_subscriber::on_update_function(const sstring& ks_name, const sstring& function_name) {
 }

-void query_processor::migration_subscriber::on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name)
-{
+void query_processor::migration_subscriber::on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) {
 }

-void query_processor::migration_subscriber::on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed)
-{
+void query_processor::migration_subscriber::on_update_view(
+        const sstring& ks_name,
+        const sstring& view_name, bool columns_changed) {
 }

-void query_processor::migration_subscriber::on_drop_keyspace(const sstring& ks_name)
-{
+void query_processor::migration_subscriber::on_drop_keyspace(const sstring& ks_name) {
    remove_invalid_prepared_statements(ks_name, std::experimental::nullopt);
 }

-void query_processor::migration_subscriber::on_drop_column_family(const sstring& ks_name, const sstring& cf_name)
-{
+void query_processor::migration_subscriber::on_drop_column_family(const sstring& ks_name, const sstring& cf_name) {
    remove_invalid_prepared_statements(ks_name, cf_name);
 }

-void query_processor::migration_subscriber::on_drop_user_type(const sstring& ks_name, const sstring& type_name)
-{
+void query_processor::migration_subscriber::on_drop_user_type(const sstring& ks_name, const sstring& type_name) {
 }

-void query_processor::migration_subscriber::on_drop_function(const sstring& ks_name, const sstring& function_name)
-{
+void query_processor::migration_subscriber::on_drop_function(const sstring& ks_name, const sstring& function_name) {
    log.warn("{} event ignored", __func__);
 }

-void query_processor::migration_subscriber::on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name)
-{
+void query_processor::migration_subscriber::on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) {
    log.warn("{} event ignored", __func__);
 }

-void query_processor::migration_subscriber::on_drop_view(const sstring& ks_name, const sstring& view_name)
-{
+void query_processor::migration_subscriber::on_drop_view(const sstring& ks_name, const sstring& view_name) {
+    remove_invalid_prepared_statements(ks_name, view_name);
 }

-void query_processor::migration_subscriber::remove_invalid_prepared_statements(sstring ks_name, std::experimental::optional<sstring> cf_name)
-{
+void query_processor::migration_subscriber::remove_invalid_prepared_statements(
+        sstring ks_name,
+        std::experimental::optional<sstring> cf_name) {
    _qp->_prepared_cache.remove_if([&] (::shared_ptr<cql_statement> stmt) {
        return this->should_invalidate(ks_name, cf_name, stmt);
    });
 }

-bool query_processor::migration_subscriber::should_invalidate(sstring ks_name, std::experimental::optional<sstring> cf_name, ::shared_ptr<cql_statement> statement)
-{
+bool query_processor::migration_subscriber::should_invalidate(
+        sstring ks_name,
+        std::experimental::optional<sstring> cf_name,
+        ::shared_ptr<cql_statement> statement) {
    return statement->depends_on_keyspace(ks_name) && (!cf_name || statement->depends_on_column_family(*cf_name));
 }

--- a/cql3/query_processor.hh
+++ b/cql3/query_processor.hh
@@ -43,21 +43,22 @@

 #include <experimental/string_view>
 #include <unordered_map>
-#include <seastar/core/metrics_registration.hh>

-#include "core/shared_ptr.hh"
-#include "exceptions/exceptions.hh"
+#include <seastar/core/distributed.hh>
+#include <seastar/core/metrics_registration.hh>
+#include <seastar/core/shared_ptr.hh>
+
+#include "cql3/prepared_statements_cache.hh"
 #include "cql3/query_options.hh"
+#include "cql3/statements/prepared_statement.hh"
 #include "cql3/statements/raw/parsed_statement.hh"
 #include "cql3/statements/raw/cf_statement.hh"
+#include "cql3/untyped_result_set.hh"
+#include "exceptions/exceptions.hh"
+#include "log.hh"
 #include "service/migration_manager.hh"
 #include "service/query_state.hh"
-#include "log.hh"
-#include "core/distributed.hh"
-#include "statements/prepared_statement.hh"
 #include "transport/messages/result_message.hh"
-#include "untyped_result_set.hh"
-#include "prepared_statements_cache.hh"

 namespace cql3 {

@@ -65,14 +66,22 @@ namespace statements {
 class batch_statement;
 }

-class prepared_statement_is_too_big : public std::exception {
-public:
-    static constexpr int max_query_prefix = 100;
+class untyped_result_set;
+class untyped_result_set_row;

-private:
+/*!
+ * \brief Holds the internal state needed for paging, which must be
+ * passed to the execute statement.
+ *
+ */
+struct internal_query_state;
+
+class prepared_statement_is_too_big : public std::exception {
    sstring _msg;

 public:
+    static constexpr int max_query_prefix = 100;
+
    prepared_statement_is_too_big(const sstring& query_string)
        : _msg(seastar::format("Prepared statement is too big: {}", query_string.substr(0, max_query_prefix)))
    {
@@ -107,15 +116,33 @@ private:
    class internal_state;
    std::unique_ptr<internal_state> _internal_state;

-public:
-    query_processor(distributed<service::storage_proxy>& proxy, distributed<database>& db);
-    ~query_processor();
+    prepared_statements_cache _prepared_cache;

+    // A map for prepared statements used internally (which we don't want to mix with user statements; in particular,
+    // we don't bother with expiration on those).
+    std::unordered_map<sstring, std::unique_ptr<statements::prepared_statement>> _internal_statements;
+
+public:
    static const sstring CQL_VERSION;

+    static prepared_cache_key_type compute_id(
+            const std::experimental::string_view& query_string,
+            const sstring& keyspace);
+
+    static prepared_cache_key_type compute_thrift_id(
+            const std::experimental::string_view& query_string,
+            const sstring& keyspace);
+
+    static ::shared_ptr<statements::raw::parsed_statement> parse_statement(const std::experimental::string_view& query);
+
+    query_processor(distributed<service::storage_proxy>& proxy, distributed<database>& db);
+
+    ~query_processor();
+
    distributed<database>& db() {
        return _db;
    }
+
    distributed<service::storage_proxy>& proxy() {
        return _proxy;
    }
@@ -124,125 +151,6 @@ public:
        return _cql_stats;
    }

-#if 0
-    public static final QueryProcessor instance = new QueryProcessor();
-#endif
-private:
-#if 0
-    private static final Logger logger = LoggerFactory.getLogger(QueryProcessor.class);
-    private static final MemoryMeter meter = new MemoryMeter().withGuessing(MemoryMeter.Guess.FALLBACK_BEST).ignoreKnownSingletons();
-    private static final long MAX_CACHE_PREPARED_MEMORY = Runtime.getRuntime().maxMemory() / 256;
-
-    private static EntryWeigher<MD5Digest, ParsedStatement.Prepared> cqlMemoryUsageWeigher = new EntryWeigher<MD5Digest, ParsedStatement.Prepared>()
-    {
-        @Override
-        public int weightOf(MD5Digest key, ParsedStatement.Prepared value)
-        {
-            return Ints.checkedCast(measure(key) + measure(value.statement) + measure(value.boundNames));
-        }
-    };
-
-    private static EntryWeigher<Integer, ParsedStatement.Prepared> thriftMemoryUsageWeigher = new EntryWeigher<Integer, ParsedStatement.Prepared>()
-    {
-        @Override
-        public int weightOf(Integer key, ParsedStatement.Prepared value)
-        {
-            return Ints.checkedCast(measure(key) + measure(value.statement) + measure(value.boundNames));
-        }
-    };
-#endif
-    prepared_statements_cache _prepared_cache;
-    std::unordered_map<sstring, std::unique_ptr<statements::prepared_statement>> _internal_statements;
-#if 0
-
-    // A map for prepared statements used internally (which we don't want to mix with user statement, in particular we don't
-    // bother with expiration on those.
-    private static final ConcurrentMap<String, ParsedStatement.Prepared> internalStatements = new ConcurrentHashMap<>();
-
-    // Direct calls to processStatement do not increment the preparedStatementsExecuted/regularStatementsExecuted
-    // counters. Callers of processStatement are responsible for correctly notifying metrics
-    public static final CQLMetrics metrics = new CQLMetrics();
-
-    private static final AtomicInteger lastMinuteEvictionsCount = new AtomicInteger(0);
-
-    static
-    {
-        preparedStatements = new ConcurrentLinkedHashMap.Builder<MD5Digest, ParsedStatement.Prepared>()
-                             .maximumWeightedCapacity(MAX_CACHE_PREPARED_MEMORY)
-                             .weigher(cqlMemoryUsageWeigher)
-                             .listener(new EvictionListener<MD5Digest, ParsedStatement.Prepared>()
-                             {
-                                 public void onEviction(MD5Digest md5Digest, ParsedStatement.Prepared prepared)
-                                 {
-                                     metrics.preparedStatementsEvicted.inc();
-                                     lastMinuteEvictionsCount.incrementAndGet();
-                                 }
-                             }).build();
-
-        thriftPreparedStatements = new ConcurrentLinkedHashMap.Builder<Integer, ParsedStatement.Prepared>()
-                                   .maximumWeightedCapacity(MAX_CACHE_PREPARED_MEMORY)
-                                   .weigher(thriftMemoryUsageWeigher)
-                                   .listener(new EvictionListener<Integer, ParsedStatement.Prepared>()
-                                   {
-                                       public void onEviction(Integer integer, ParsedStatement.Prepared prepared)
-                                       {
-                                           metrics.preparedStatementsEvicted.inc();
-                                           lastMinuteEvictionsCount.incrementAndGet();
-                                       }
-                                   })
-                                   .build();
-
-        ScheduledExecutors.scheduledTasks.scheduleAtFixedRate(new Runnable()
-        {
-            public void run()
-            {
-                long count = lastMinuteEvictionsCount.getAndSet(0);
-                if (count > 0)
-                    logger.info("{} prepared statements discarded in the last minute because cache limit reached ({} bytes)",
-                                count,
-                                MAX_CACHE_PREPARED_MEMORY);
-            }
-        }, 1, 1, TimeUnit.MINUTES);
-    }
-
-    public static int preparedStatementsCount()
-    {
-        return preparedStatements.size() + thriftPreparedStatements.size();
-    }
-
-    // Work around initialization dependency
-    private static enum InternalStateInstance
-    {
-        INSTANCE;
-
-        private final QueryState queryState;
-
-        InternalStateInstance()
-        {
-            ClientState state = ClientState.forInternalCalls();
-            try
-            {
-                state.setKeyspace(SystemKeyspace.NAME);
-            }
-            catch (InvalidRequestException e)
-            {
-                throw new RuntimeException();
-            }
-            this.queryState = new QueryState(state);
-        }
-    }
-
-    private static QueryState internalQueryState()
-    {
-        return InternalStateInstance.INSTANCE.queryState;
-    }
-
-    private QueryProcessor()
-    {
-        MigrationManager.instance.register(new MigrationSubscriber());
-    }
-#endif
-public:
    statements::prepared_statement::checked_weak_ptr get_prepared(const prepared_cache_key_type& key) {
        auto it = _prepared_cache.find(key);
        if (it == _prepared_cache.end()) {
@@ -251,128 +159,69 @@ public:
        return *it;
    }

-#if 0
-    public static void validateKey(ByteBuffer key) throws InvalidRequestException
-    {
-        if (key == null || key.remaining() == 0)
-        {
-            throw new InvalidRequestException("Key may not be empty");
-        }
+    future<::shared_ptr<cql_transport::messages::result_message>>
+    process_statement(
+            ::shared_ptr<cql_statement> statement,
+            service::query_state& query_state,
+            const query_options& options);

-        // check that key can be handled by FBUtilities.writeShortByteArray
-        if (key.remaining() > FBUtilities.MAX_UNSIGNED_SHORT)
-        {
-            throw new InvalidRequestException("Key length of " + key.remaining() +
-                                              " is longer than maximum of " + FBUtilities.MAX_UNSIGNED_SHORT);
-        }
-    }
+    future<::shared_ptr<cql_transport::messages::result_message>>
+    process(
+            const std::experimental::string_view& query_string,
+            service::query_state& query_state,
+            query_options& options);

-    public static void validateCellNames(Iterable<CellName> cellNames, CellNameType type) throws InvalidRequestException
-    {
-        for (CellName name : cellNames)
-            validateCellName(name, type);
-    }
-
-    public static void validateCellName(CellName name, CellNameType type) throws InvalidRequestException
-    {
-        validateComposite(name, type);
-        if (name.isEmpty())
-            throw new InvalidRequestException("Invalid empty value for clustering column of COMPACT TABLE");
-    }
-
-    public static void validateComposite(Composite name, CType type) throws InvalidRequestException
-    {
-        long serializedSize = type.serializer().serializedSize(name, TypeSizes.NATIVE);
-        if (serializedSize > Cell.MAX_NAME_LENGTH)
-            throw new InvalidRequestException(String.format("The sum of all clustering columns is too long (%s > %s)",
-                                                            serializedSize,
-                                                            Cell.MAX_NAME_LENGTH));
-    }
-#endif
-public:
-    future<::shared_ptr<cql_transport::messages::result_message>> process_statement(::shared_ptr<cql_statement> statement,
-            service::query_state& query_state, const query_options& options);
-
-#if 0
-    public static ResultMessage process(String queryString, ConsistencyLevel cl, QueryState queryState)
-    throws RequestExecutionException, RequestValidationException
-    {
-        return instance.process(queryString, queryState, QueryOptions.forInternalCalls(cl, Collections.<ByteBuffer>emptyList()));
-    }
-#endif
-
-    future<::shared_ptr<cql_transport::messages::result_message>> process(const std::experimental::string_view& query_string,
-            service::query_state& query_state, query_options& options);
-
-#if 0
-    public static ParsedStatement.Prepared parseStatement(String queryStr, QueryState queryState) throws RequestValidationException
-    {
-        return getStatement(queryStr, queryState.getClientState());
-    }
-
-    public static UntypedResultSet process(String query, ConsistencyLevel cl) throws RequestExecutionException
-    {
-        try
-        {
-            ResultMessage result = instance.process(query, QueryState.forInternalCalls(), QueryOptions.forInternalCalls(cl, Collections.<ByteBuffer>emptyList()));
-            if (result instanceof ResultMessage.Rows)
-                return UntypedResultSet.create(((ResultMessage.Rows)result).result);
-            else
-                return null;
-        }
-        catch (RequestValidationException e)
-        {
-            throw new RuntimeException(e);
-        }
-    }
-
-    private static QueryOptions makeInternalOptions(ParsedStatement.Prepared prepared, Object[] values)
-    {
-        if (prepared.boundNames.size() != values.length)
-            throw new IllegalArgumentException(String.format("Invalid number of values. Expecting %d but got %d", prepared.boundNames.size(), values.length));
-
-        List<ByteBuffer> boundValues = new ArrayList<ByteBuffer>(values.length);
-        for (int i = 0; i < values.length; i++)
-        {
-            Object value = values[i];
-            AbstractType type = prepared.boundNames.get(i).type;
-            boundValues.add(value instanceof ByteBuffer || value == null ? (ByteBuffer)value : type.decompose(value));
-        }
-        return QueryOptions.forInternalCalls(boundValues);
-    }
-
-    private static ParsedStatement.Prepared prepareInternal(String query) throws RequestValidationException
-    {
-        ParsedStatement.Prepared prepared = internalStatements.get(query);
-        if (prepared != null)
-            return prepared;
-
-        // Note: if 2 threads prepare the same query, we'll live so don't bother synchronizing
-        prepared = parseStatement(query, internalQueryState());
-        prepared.statement.validate(internalQueryState().getClientState());
-        internalStatements.putIfAbsent(query, prepared);
-        return prepared;
-    }
-#endif
-private:
-    query_options make_internal_options(const statements::prepared_statement::checked_weak_ptr& p, const std::initializer_list<data_value>&, db::consistency_level = db::consistency_level::ONE);
-public:
-    future<::shared_ptr<untyped_result_set>> execute_internal(
-            const sstring& query_string,
-            const std::initializer_list<data_value>& = { });
+    future<::shared_ptr<untyped_result_set>>
+    execute_internal(const sstring& query_string, const std::initializer_list<data_value>& = { });

    statements::prepared_statement::checked_weak_ptr prepare_internal(const sstring& query);

-    future<::shared_ptr<untyped_result_set>> execute_internal(
-            statements::prepared_statement::checked_weak_ptr p,
-            const std::initializer_list<data_value>& = { });
+    future<::shared_ptr<untyped_result_set>>
+    execute_internal(statements::prepared_statement::checked_weak_ptr p, const std::initializer_list<data_value>& = { });
+
+    /*!
+     * \brief iterate over all cql results using paging
+     *
+     * You create a statement with optional parameters and pass
+     * a function that goes over the results.
+     *
+     * The passed function is called for each result; return stop_iteration::yes
+     * to stop the iteration early.
+     *
+     * For example:
+            return query("SELECT * from system.compaction_history",
+                         [&history] (const cql3::untyped_result_set::row& row) mutable {
+                ....
+                ....
+                return stop_iteration::no;
+            });
+
+     * You can use placeholders in the query; the statement is prepared only once.
+     *
+     *
+     * query_string - the CQL string, can contain placeholders
+     * f - a function to be run on each query result; if it returns stop_iteration::yes, the iteration stops
+     * args - arbitrary number of query parameters
+     */
+    template<typename... Args>
+    future<> query(
+            const sstring& query_string,
+            std::function<stop_iteration(const cql3::untyped_result_set_row&)>&& f,
+            Args&&... args) {
+        return for_each_cql_result(
+                create_paged_state(query_string, { data_value(std::forward<Args>(args))... }), std::move(f));
+    }

    future<::shared_ptr<untyped_result_set>> process(
-                    const sstring& query_string,
-                    db::consistency_level, const std::initializer_list<data_value>& = { }, bool cache = false);
+            const sstring& query_string,
+            db::consistency_level,
+            const std::initializer_list<data_value>& = { },
+            bool cache = false);
+
    future<::shared_ptr<untyped_result_set>> process(
-                    statements::prepared_statement::checked_weak_ptr p,
-                    db::consistency_level, const std::initializer_list<data_value>& = { });
+            statements::prepared_statement::checked_weak_ptr p,
+            db::consistency_level,
+            const std::initializer_list<data_value>& = { });

    /*
     * This function provides a timestamp that is guaranteed to be higher than any timestamp
@@ -384,115 +233,110 @@ public:
     */
    api::timestamp_type next_timestamp();

-#if 0
-    public static UntypedResultSet executeInternalWithPaging(String query, int pageSize, Object... values)
-    {
-        try
-        {
-            ParsedStatement.Prepared prepared = prepareInternal(query);
-            if (!(prepared.statement instanceof SelectStatement))
-                throw new IllegalArgumentException("Only SELECTs can be paged");
-
-            SelectStatement select = (SelectStatement)prepared.statement;
-            QueryPager pager = QueryPagers.localPager(select.getPageableCommand(makeInternalOptions(prepared, values)));
-            return UntypedResultSet.create(select, pager, pageSize);
-        }
-        catch (RequestValidationException e)
-        {
-            throw new RuntimeException("Error validating query" + e);
-        }
-    }
-
-    /**
-     * Same than executeInternal, but to use for queries we know are only executed once so that the
-     * created statement object is not cached.
-     */
-    public static UntypedResultSet executeOnceInternal(String query, Object... values)
-    {
-        try
-        {
-            ParsedStatement.Prepared prepared = parseStatement(query, internalQueryState());
-            prepared.statement.validate(internalQueryState().getClientState());
-            ResultMessage result = prepared.statement.executeInternal(internalQueryState(), makeInternalOptions(prepared, values));
-            if (result instanceof ResultMessage.Rows)
-                return UntypedResultSet.create(((ResultMessage.Rows)result).result);
-            else
-                return null;
-        }
-        catch (RequestExecutionException e)
-        {
-            throw new RuntimeException(e);
-        }
-        catch (RequestValidationException e)
-        {
-            throw new RuntimeException("Error validating query " + query, e);
-        }
-    }
-
-    public static UntypedResultSet resultify(String query, Row row)
-    {
-        return resultify(query, Collections.singletonList(row));
-    }
-
-    public static UntypedResultSet resultify(String query, List<Row> rows)
-    {
-        try
-        {
-            SelectStatement ss = (SelectStatement) getStatement(query, null).statement;
-            ResultSet cqlRows = ss.process(rows);
-            return UntypedResultSet.create(cqlRows);
-        }
-        catch (RequestValidationException e)
-        {
-            throw new AssertionError(e);
-        }
-    }
-#endif
-
    future<::shared_ptr<cql_transport::messages::result_message::prepared>>
    prepare(sstring query_string, service::query_state& query_state);

    future<::shared_ptr<cql_transport::messages::result_message::prepared>>
    prepare(sstring query_string, const service::client_state& client_state, bool for_thrift);

-    static prepared_cache_key_type compute_id(const std::experimental::string_view& query_string, const sstring& keyspace);
-    static prepared_cache_key_type compute_thrift_id(const std::experimental::string_view& query_string, const sstring& keyspace);
+    future<> stop();
+
+    future<::shared_ptr<cql_transport::messages::result_message>>
+    process_batch(::shared_ptr<statements::batch_statement>, service::query_state& query_state, query_options& options);
+
+    std::unique_ptr<statements::prepared_statement> get_statement(
+            const std::experimental::string_view& query,
+            const service::client_state& client_state);
+
+    friend class migration_subscriber;

 private:
+    query_options make_internal_options(
+            const statements::prepared_statement::checked_weak_ptr& p,
+            const std::initializer_list<data_value>&,
+            db::consistency_level = db::consistency_level::ONE,
+            int32_t page_size = -1);
+
+    /*!
+     * \brief create a state object for paging
+     *
+     * When using paging internally, a state object is needed.
+     */
+    ::shared_ptr<internal_query_state> create_paged_state(
+            const sstring& query_string,
+            const std::initializer_list<data_value>& = { },
+            int32_t page_size = 1000);
+
+    /*!
+     * \brief run a query using paging
+     */
+    future<::shared_ptr<untyped_result_set>> execute_paged_internal(::shared_ptr<internal_query_state> state);
+
+    /*!
+     * \brief iterate over all results using paging
+     */
+    future<> for_each_cql_result(
+            ::shared_ptr<cql3::internal_query_state> state,
+            std::function<stop_iteration(const cql3::untyped_result_set_row&)>&& f);
+
+    /*!
+     * \brief check, based on the state, whether there are additional results
+     * Users of paging should not use the internal_query_state directly.
+     */
+    bool has_more_results(::shared_ptr<cql3::internal_query_state> state) const;
+
    ///
    /// \tparam ResultMsgType type of the returned result message (CQL or Thrift)
-    /// \tparam PreparedKeyGenerator a function that generates the prepared statement cache key for given query and keyspace
-    /// \tparam IdGetter a function that returns the corresponding prepared statement ID (CQL or Thrift) for a given prepared statement cache key
+    /// \tparam PreparedKeyGenerator a function that generates the prepared statement cache key for given query and
+    ///         keyspace
+    /// \tparam IdGetter a function that returns the corresponding prepared statement ID (CQL or Thrift) for a given
+    ///         prepared statement cache key
    /// \param query_string
    /// \param client_state
    /// \param id_gen prepared ID generator, called before the first deferring
-    /// \param id_getter prepared ID getter, passed to deferred context by reference. The caller must ensure its liveness.
+    /// \param id_getter prepared ID getter, passed to deferred context by reference. The caller must ensure its
+    ///        liveness.
    /// \return
    template <typename ResultMsgType, typename PreparedKeyGenerator, typename IdGetter>
    future<::shared_ptr<cql_transport::messages::result_message::prepared>>
-    prepare_one(sstring query_string, const service::client_state& client_state, PreparedKeyGenerator&& id_gen, IdGetter&& id_getter) {
-        return do_with(id_gen(query_string, client_state.get_raw_keyspace()), std::move(query_string), [this, &client_state, &id_getter] (const prepared_cache_key_type& key, const sstring& query_string) {
+    prepare_one(
+            sstring query_string,
+            const service::client_state& client_state,
+            PreparedKeyGenerator&& id_gen,
+            IdGetter&& id_getter) {
+        return do_with(
+                id_gen(query_string, client_state.get_raw_keyspace()),
+                std::move(query_string),
+                [this, &client_state, &id_getter](const prepared_cache_key_type& key, const sstring& query_string) {
            return _prepared_cache.get(key, [this, &query_string, &client_state] {
                auto prepared = get_statement(query_string, client_state);
                auto bound_terms = prepared->statement->get_bound_terms();
                if (bound_terms > std::numeric_limits<uint16_t>::max()) {
-                    throw exceptions::invalid_request_exception(sprint("Too many markers(?). %d markers exceed the allowed maximum of %d", bound_terms, std::numeric_limits<uint16_t>::max()));
+                    throw exceptions::invalid_request_exception(
+                            sprint("Too many markers(?). %d markers exceed the allowed maximum of %d",
+                                   bound_terms,
+                                   std::numeric_limits<uint16_t>::max()));
                }
                assert(bound_terms == prepared->bound_names.size());
                prepared->raw_cql_statement = query_string;
                return make_ready_future<std::unique_ptr<statements::prepared_statement>>(std::move(prepared));
            }).then([&key, &id_getter] (auto prep_ptr) {
-                return make_ready_future<::shared_ptr<cql_transport::messages::result_message::prepared>>(::make_shared<ResultMsgType>(id_getter(key), std::move(prep_ptr)));
+                return make_ready_future<::shared_ptr<cql_transport::messages::result_message::prepared>>(
+                        ::make_shared<ResultMsgType>(id_getter(key), std::move(prep_ptr)));
            }).handle_exception_type([&query_string] (typename prepared_statements_cache::statement_is_too_big&) {
-                return make_exception_future<::shared_ptr<cql_transport::messages::result_message::prepared>>(prepared_statement_is_too_big(query_string));
+                return make_exception_future<::shared_ptr<cql_transport::messages::result_message::prepared>>(
+                        prepared_statement_is_too_big(query_string));
            });
        });
    };

    template <typename ResultMsgType, typename KeyGenerator, typename IdGetter>
    ::shared_ptr<cql_transport::messages::result_message::prepared>
-    get_stored_prepared_statement_one(const std::experimental::string_view& query_string, const sstring& keyspace, KeyGenerator&& key_gen, IdGetter&& id_getter)
-    {
+    get_stored_prepared_statement_one(
+            const std::experimental::string_view& query_string,
+            const sstring& keyspace,
+            KeyGenerator&& key_gen,
+            IdGetter&& id_getter) {
        auto cache_key = key_gen(query_string, keyspace);
        auto it = _prepared_cache.find(cache_key);
        if (it == _prepared_cache.end()) {
@@ -503,55 +347,15 @@ private:
    }

    ::shared_ptr<cql_transport::messages::result_message::prepared>
-    get_stored_prepared_statement(const std::experimental::string_view& query_string, const sstring& keyspace, bool for_thrift);
-
-#if 0
-    public ResultMessage processPrepared(CQLStatement statement, QueryState queryState, QueryOptions options)
-    throws RequestExecutionException, RequestValidationException
-    {
-        List<ByteBuffer> variables = options.getValues();
-        // Check to see if there are any bound variables to verify
-        if (!(variables.isEmpty() && (statement.getBoundTerms() == 0)))
-        {
-            if (variables.size() != statement.getBoundTerms())
-                throw new InvalidRequestException(String.format("there were %d markers(?) in CQL but %d bound variables",
-                                                                statement.getBoundTerms(),
-                                                                variables.size()));
-
-            // at this point there is a match in count between markers and variables that is non-zero
-
-            if (logger.isTraceEnabled())
-                for (int i = 0; i < variables.size(); i++)
-                    logger.trace("[{}] '{}'", i+1, variables.get(i));
-        }
-
-        metrics.preparedStatementsExecuted.inc();
-        return processStatement(statement, queryState, options);
-    }
-#endif
-
-public:
-    future<::shared_ptr<cql_transport::messages::result_message>> process_batch(::shared_ptr<statements::batch_statement>,
-            service::query_state& query_state, query_options& options);
-
-    std::unique_ptr<statements::prepared_statement> get_statement(const std::experimental::string_view& query,
-            const service::client_state& client_state);
-    static ::shared_ptr<statements::raw::parsed_statement> parse_statement(const std::experimental::string_view& query);
-
-#if 0
-    private static long measure(Object key)
-    {
-        return meter.measureDeep(key);
-    }
-#endif
-public:
-    future<> stop();
-
-    friend class migration_subscriber;
+    get_stored_prepared_statement(
+            const std::experimental::string_view& query_string,
+            const sstring& keyspace,
+            bool for_thrift);
 };

 class query_processor::migration_subscriber : public service::migration_listener {
    query_processor* _qp;
+
 public:
    migration_subscriber(query_processor* qp);

@@ -575,9 +379,14 @@ public:
    virtual void on_drop_function(const sstring& ks_name, const sstring& function_name) override;
    virtual void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;
    virtual void on_drop_view(const sstring& ks_name, const sstring& view_name) override;
+
 private:
    void remove_invalid_prepared_statements(sstring ks_name, std::experimental::optional<sstring> cf_name);
-    bool should_invalidate(sstring ks_name, std::experimental::optional<sstring> cf_name, ::shared_ptr<cql_statement> statement);
+
+    bool should_invalidate(
+            sstring ks_name,
+            std::experimental::optional<sstring> cf_name,
+            ::shared_ptr<cql_statement> statement);
 };

 extern distributed<query_processor> _the_query_processor;
--- a/cql3/restrictions/forwarding_primary_key_restrictions.hh
+++ b/cql3/restrictions/forwarding_primary_key_restrictions.hh
@@ -74,11 +74,9 @@ public:
        get_delegate()->merge_with(restriction);
    }

-#if 0
-    virtual bool has_supporting_index(::shared_ptr<secondary_index_manager> index_manager) override {
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
        return get_delegate()->has_supporting_index(index_manager);
    }
-#endif

    virtual std::vector<bytes_opt> values(const query_options& options) const override {
        return get_delegate()->values(options);
--- a/cql3/restrictions/multi_column_restriction.hh
+++ b/cql3/restrictions/multi_column_restriction.hh
@@ -128,19 +128,18 @@ protected:
        }
        return str;
    }
-#if 0
-    @Override
-    public final boolean hasSupportingIndex(SecondaryIndexManager indexManager)
-    {
-        for (ColumnDefinition columnDef : columnDefs)
-        {
-            SecondaryIndex index = indexManager.getIndexForColumn(columnDef.name.bytes);
-            if (index != null && isSupportedBy(index))
+
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
+        for (const auto& index : index_manager.list_indexes()) {
+            if (is_supported_by(index))
                return true;
        }
        return false;
    }

+    virtual bool is_supported_by(const secondary_index::index& index) const = 0;
+
+#if 0
    /**
     * Check if this type of restriction is supported for the specified column by the specified index.
     * @param index the Secondary index
@@ -172,6 +171,15 @@ public:
        return abstract_restriction::term_uses_function(_value, ks_name, function_name);
    }

+    virtual bool is_supported_by(const secondary_index::index& index) const override {
+        for (auto* cdef : _column_defs) {
+            if (index.supports_expression(*cdef, cql3::operator_type::EQ)) {
+                return true;
+            }
+        }
+        return false;
+    }
+
    virtual sstring to_string() const override {
        return sprint("EQ(%s)", _value->to_string());
    }
@@ -232,6 +240,15 @@ class multi_column_restriction::IN : public multi_column_restriction {
 public:
    using multi_column_restriction::multi_column_restriction;

+    virtual bool is_supported_by(const secondary_index::index& index) const override {
+        for (auto* cdef : _column_defs) {
+            if (index.supports_expression(*cdef, cql3::operator_type::IN)) {
+                return true;
+            }
+        }
+        return false;
+    }
+
    virtual bool is_IN() const override {
        return true;
    }
@@ -379,6 +396,15 @@ public:
        : slice(schema, defs, term_slice::new_instance(bound, inclusive, term))
    { }

+    virtual bool is_supported_by(const secondary_index::index& index) const override {
+        for (auto* cdef : _column_defs) {
+            if (_slice.is_supported_by(*cdef, index)) {
+                return true;
+            }
+        }
+        return false;
+    }
+
    virtual bool is_slice() const override {
        return true;
    }
--- a/cql3/restrictions/primary_key_restrictions.hh
+++ b/cql3/restrictions/primary_key_restrictions.hh
@@ -87,6 +87,7 @@ public:
    virtual std::vector<bounds_range_type> bounds_ranges(const query_options& options) const = 0;

    using restrictions::uses_function;
+    using restrictions::has_supporting_index;

    bool empty() const override {
        return get_column_defs().empty();
--- a/cql3/restrictions/restriction.hh
+++ b/cql3/restrictions/restriction.hh
@@ -43,6 +43,7 @@

 #include <vector>

+#include "index/secondary_index_manager.hh"
 #include "cql3/query_options.hh"
 #include "cql3/statements/bound.hh"
 #include "types.hh"
@@ -107,15 +108,15 @@ public:
     */
    virtual void merge_with(::shared_ptr<restriction> other) = 0;

-#if 0
    /**
     * Check if the restriction is on indexed columns.
     *
     * @param indexManager the index manager
     * @return <code>true</code> if the restriction is on indexed columns, <code>false</code>
     */
-    public boolean hasSupportingIndex(SecondaryIndexManager indexManager);
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const = 0;

+#if 0
    /**
     * Adds to the specified list the <code>IndexExpression</code>s corresponding to this <code>Restriction</code>.
     *
--- a/cql3/restrictions/restrictions.hh
+++ b/cql3/restrictions/restrictions.hh
@@ -47,6 +47,7 @@
 #include "cql3/query_options.hh"
 #include "types.hh"
 #include "schema.hh"
+#include "index/secondary_index_manager.hh"

 namespace cql3 {

@@ -74,15 +75,15 @@ public:
     */
    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const = 0;

-#if 0
    /**
     * Check if the restriction is on indexed columns.
     *
     * @param index_manager the index manager
     * @return <code>true</code> if the restriction is on indexed columns, <code>false</code>
     */
-    virtual bool has_supporting_index(::shared_ptr<secondary_index_manager> index_manager) const = 0;
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const = 0;

+#if 0
    /**
     * Adds to the specified list the <code>index_expression</code>s corresponding to this <code>Restriction</code>.
     *
--- a/cql3/restrictions/single_column_primary_key_restrictions.hh
+++ b/cql3/restrictions/single_column_primary_key_restrictions.hh
@@ -316,11 +316,11 @@ public:
        fail(unimplemented::cause::LEGACY_COMPOSITE_KEYS); // not 100% correct...
    }

-#if 0
-    virtual bool hasSupportingIndex(SecondaryIndexManager indexManager) override {
-        return restrictions.hasSupportingIndex(indexManager);
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
+        return _restrictions->has_supporting_index(index_manager);
    }

+#if 0
    virtual void addIndexExpressionTo(List<IndexExpression> expressions, QueryOptions options) override {
        restrictions.addIndexExpressionTo(expressions, options);
    }
--- a/cql3/restrictions/single_column_restriction.hh
+++ b/cql3/restrictions/single_column_restriction.hh
@@ -82,14 +82,18 @@ public:
        ByteBuffer value = validateIndexedValue(columnDef, values.get(0));
        expressions.add(new IndexExpression(columnDef.name.bytes, Operator.EQ, value));
    }
+#endif

-    @Override
-    public boolean hasSupportingIndex(SecondaryIndexManager indexManager)
-    {
-        SecondaryIndex index = indexManager.getIndexForColumn(columnDef.name.bytes);
-        return index != null && isSupportedBy(index);
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
+        for (const auto& index : index_manager.list_indexes()) {
+            if (is_supported_by(index))
+                return true;
+        }
+        return false;
    }

+    virtual bool is_supported_by(const secondary_index::index& index) const = 0;
+#if 0
    /**
     * Check if this type of restriction is supported by the specified index.
     *
@@ -129,6 +133,10 @@ public:
        return abstract_restriction::term_uses_function(_value, ks_name, function_name);
    }

+    virtual bool is_supported_by(const secondary_index::index& index) const override {
+        return index.supports_expression(_column_def, cql3::operator_type::EQ);
+    }
+
    virtual bool is_EQ() const override {
        return true;
    }
@@ -178,6 +186,10 @@ public:
        return true;
    }

+    virtual bool is_supported_by(const secondary_index::index& index) const override {
+        return index.supports_expression(_column_def, cql3::operator_type::IN);
+    }
+
    virtual void merge_with(::shared_ptr<restriction> r) override {
        throw exceptions::invalid_request_exception(sprint(
            "%s cannot be restricted by more than one relation if it includes an IN", _column_def.name_as_text()));
@@ -190,6 +202,14 @@ public:
                                 const query_options& options,
                                 gc_clock::time_point now) const override;

+    virtual std::vector<bytes_opt> values_raw(const query_options& options) const = 0;
+
+    virtual std::vector<bytes_opt> values(const query_options& options) const override {
+        std::vector<bytes_opt> ret = values_raw(options);
+        std::sort(ret.begin(),ret.end());
+        ret.erase(std::unique(ret.begin(),ret.end()),ret.end());
+        return ret;
+    }
 #if 0
    @Override
    protected final boolean isSupportedBy(SecondaryIndex index)
@@ -212,7 +232,7 @@ public:
        return abstract_restriction::term_uses_function(_values, ks_name, function_name);
    }

-    virtual std::vector<bytes_opt> values(const query_options& options) const override {
+    virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
        std::vector<bytes_opt> ret;
        for (auto&& v : _values) {
            ret.emplace_back(to_bytes_opt(v->bind_and_get(options)));
@@ -237,7 +257,7 @@ public:
        return false;
    }

-    virtual std::vector<bytes_opt> values(const query_options& options) const override {
+    virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
        auto&& lval = dynamic_pointer_cast<multi_item_terminal>(_marker->bind(options));
        if (!lval) {
            throw exceptions::invalid_request_exception("Invalid null value for IN restriction");
@@ -264,6 +284,10 @@ public:
                || (_slice.has_bound(statements::bound::END) && abstract_restriction::term_uses_function(_slice.bound(statements::bound::END), ks_name, function_name));
    }

+    virtual bool is_supported_by(const secondary_index::index& index) const override {
+        return _slice.is_supported_by(_column_def, index);
+    }
+
    virtual bool is_slice() const override {
        return true;
    }
@@ -403,22 +427,21 @@ public:
                target.add(new IndexExpression(columnDef.name.bytes, op, value));
            }
        }
+#endif

-        virtual bool is_supported_by(SecondaryIndex index) override {
+        virtual bool is_supported_by(const secondary_index::index& index) const override {
            bool supported = false;
-
-            if (numberOfValues() > 0)
-                supported |= index.supportsOperator(Operator.CONTAINS);
-
-            if (numberOfKeys() > 0)
-                supported |= index.supportsOperator(Operator.CONTAINS_KEY);
-
-            if (numberOfEntries() > 0)
-                supported |= index.supportsOperator(Operator.EQ);
-
+            if (number_of_values() > 0) {
+                supported |= index.supports_expression(_column_def, cql3::operator_type::CONTAINS);
+            }
+            if (number_of_keys() > 0) {
+                supported |= index.supports_expression(_column_def, cql3::operator_type::CONTAINS_KEY);
+            }
+            if (number_of_entries() > 0) {
+                supported |= index.supports_expression(_column_def, cql3::operator_type::EQ);
+            }
            return supported;
        }
-#endif

    uint32_t number_of_values() const {
        return _values.size();
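
The `values()` wrapper added to the IN restriction above removes repeated IN values by sorting the output of `values_raw()` and erasing adjacent duplicates. The following is a minimal standalone sketch of that sort/unique/erase idiom, not the Scylla code itself: `bytes_opt_sketch` and `dedup_in_values` are hypothetical names, and `std::optional<std::string>` is only assumed to behave like `bytes_opt` for ordering and equality.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for Scylla's bytes_opt (an optional byte string).
using bytes_opt_sketch = std::optional<std::string>;

// Sort the values, then erase adjacent duplicates. This mirrors the
// std::sort / std::unique / erase sequence used by the new values() wrapper.
// An empty optional compares less than any engaged value, so it sorts first.
std::vector<bytes_opt_sketch> dedup_in_values(std::vector<bytes_opt_sketch> raw) {
    std::sort(raw.begin(), raw.end());
    raw.erase(std::unique(raw.begin(), raw.end()), raw.end());
    return raw;
}
```

One consequence of this approach is that the returned values are in sorted order rather than in the order they were written in the IN clause. `std::unique` only collapses adjacent equal elements, which is why the sort has to come first.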
--- a/cql3/restrictions/single_column_restrictions.hh
+++ b/cql3/restrictions/single_column_restrictions.hh
@@ -150,8 +150,7 @@ public:
        }
    }

-#if 0
-    virtual bool has_supporting_index(::shared_ptr<secondary_index_manager> index_manager) const override {
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
        for (auto&& e : _restrictions) {
            if (e.second->has_supporting_index(index_manager)) {
                return true;
@@ -159,7 +158,6 @@ public:
        }
        return false;
    }
-#endif

    /**
     * Returns the column after the specified one.
--- a/cql3/restrictions/statement_restrictions.cc
+++ b/cql3/restrictions/statement_restrictions.cc
@@ -87,6 +87,9 @@ public:
    uint32_t size() const override {
        return 0;
    }
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
+        return false;
+    }
    sstring to_string() const override {
        return "Initial restrictions";
    }
@@ -198,16 +201,12 @@ statement_restrictions::statement_restrictions(database& db,
            }
        }
    }
-
-    warn(unimplemented::cause::INDEXES);
-#if 0
-    ColumnFamilyStore cfs = Keyspace.open(cfm.ks_name).getColumnFamilyStore(cfm.cfName);
-    secondary_index_manager secondaryIndexManager = cfs.index_manager;
-#endif
-    bool has_queriable_clustering_column_index = false; /*_clustering_columns_restrictions->has_supporting_index(secondaryIndexManager);*/
-    bool has_queriable_index = false; /*has_queriable_clustering_column_index
-            || _partition_key_restrictions->has_supporting_index(secondaryIndexManager)
-            || nonprimary_key_restrictions->has_supporting_index(secondaryIndexManager);*/
+    auto& cf = db.find_column_family(schema);
+    auto& sim = cf.get_index_manager();
+    bool has_queriable_clustering_column_index = _clustering_columns_restrictions->has_supporting_index(sim);
+    bool has_queriable_index = has_queriable_clustering_column_index
+            || _partition_key_restrictions->has_supporting_index(sim)
+            || _nonprimary_key_restrictions->has_supporting_index(sim);

    // At this point, the select statement is fully constructed, but we still have a few things to validate
    process_partition_key_restrictions(has_queriable_index, for_view);
@@ -273,10 +272,7 @@ statement_restrictions::statement_restrictions(database& db,
    }

    if (_uses_secondary_indexing && !for_view) {
-        fail(unimplemented::cause::INDEXES);
-#if 0
        validate_secondary_index_selections(selects_only_static_columns);
-#endif
    }
 }

@@ -307,6 +303,10 @@ bool statement_restrictions::uses_function(const sstring& ks_name, const sstring
            || _nonprimary_key_restrictions->uses_function(ks_name, function_name);
 }

+const std::vector<::shared_ptr<restrictions>>& statement_restrictions::index_restrictions() const {
+    return _index_restrictions;
+}
+
 void statement_restrictions::process_partition_key_restrictions(bool has_queriable_index, bool for_view) {
    // If there is a queriable index, no special conditions are required on the other restrictions.
    // But we still need to know 2 things:
--- a/cql3/restrictions/statement_restrictions.hh
+++ b/cql3/restrictions/statement_restrictions.hh
@@ -124,6 +124,8 @@ private:
 public:
    bool uses_function(const sstring& ks_name, const sstring& function_name) const;

+    const std::vector<::shared_ptr<restrictions>>& index_restrictions() const;
+
    /**
     * Checks if the restrictions on the partition key is an IN restriction.
     *
--- a/cql3/restrictions/term_slice.hh
+++ b/cql3/restrictions/term_slice.hh
@@ -117,6 +117,21 @@ public:
        }
    }

+    bool is_supported_by(const column_definition& cdef, const secondary_index::index& index) const {
+        bool supported = false;
+        if (has_bound(statements::bound::START)) {
+            supported |= is_inclusive(statements::bound::START)
+                         ? index.supports_expression(cdef, cql3::operator_type::GTE)
+                         : index.supports_expression(cdef, cql3::operator_type::GT);
+        }
+        if (has_bound(statements::bound::END)) {
+            supported |= is_inclusive(statements::bound::END)
+                         ? index.supports_expression(cdef, cql3::operator_type::LTE)
+                         : index.supports_expression(cdef, cql3::operator_type::LT);
+        }
+        return supported;
+    }
+
    sstring to_string() const {
        static auto print_term = [] (::shared_ptr<term> t) -> sstring {
            return t ? t->to_string() : "null";
--- a/cql3/restrictions/token_restriction.hh
+++ b/cql3/restrictions/token_restriction.hh
@@ -73,11 +73,11 @@ public:
        return _column_definitions;
    }

-#if 0
-    bool has_supporting_index(::shared_ptr<secondary_index_manager> index_manager) const override {
+    virtual bool has_supporting_index(const secondary_index::secondary_index_manager& index_manager) const override {
        return false;
    }

+#if 0
    void add_index_expression_to(std::vector<::shared_ptr<index_expression>>& expressions,
                                         const query_options& options) override {
        throw exceptions::unsupported_operation_exception();
--- a/cql3/selection/selectable.cc
+++ b/cql3/selection/selectable.cc
@@ -25,6 +25,7 @@
 #include "writetime_or_ttl.hh"
 #include "selector_factories.hh"
 #include "cql3/functions/functions.hh"
+#include "cql3/functions/castas_fcts.hh"
 #include "abstract_function_selector.hh"
 #include "writetime_or_ttl_selector.hh"

@@ -141,6 +142,30 @@ selectable::with_field_selection::raw::processes_selection() const {
    return true;
 }

+shared_ptr<selector::factory>
+selectable::with_cast::new_selector_factory(database& db, schema_ptr s, std::vector<const column_definition*>& defs) {
+    std::vector<shared_ptr<selectable>> args{_arg};
+    auto&& factories = selector_factories::create_factories_and_collect_column_definitions(args, db, s, defs);
+    auto&& fun = functions::castas_functions::get(_type->get_type(), factories->new_instances(), s);
+
+    return abstract_function_selector::new_factory(std::move(fun), std::move(factories));
+}
+
+sstring
+selectable::with_cast::to_string() const {
+    return sprint("cast(%s as %s)", _arg->to_string(), _type->to_string());
+}
+
+shared_ptr<selectable>
+selectable::with_cast::raw::prepare(schema_ptr s) {
+    return ::make_shared<selectable::with_cast>(_arg->prepare(s), _type);
+}
+
+bool
+selectable::with_cast::raw::processes_selection() const {
+    return true;
+}
+
 std::ostream & operator<<(std::ostream &os, const selectable& s) {
    return os << s.to_string();
 }
--- a/cql3/selection/selectable.hh
+++ b/cql3/selection/selectable.hh
@@ -45,6 +45,7 @@
 #include "schema.hh"
 #include "core/shared_ptr.hh"
 #include "cql3/selection/selector.hh"
+#include "cql3/cql3_type.hh"
 #include "cql3/functions/function_name.hh"

 namespace cql3 {
@@ -83,6 +84,8 @@ public:
    class with_function;

    class with_field_selection;
+
+    class with_cast;
 };

 std::ostream & operator<<(std::ostream &os, const selectable& s);
@@ -110,6 +113,29 @@ public:
    };
 };

+class selectable::with_cast : public selectable {
+    ::shared_ptr<selectable> _arg;
+    ::shared_ptr<cql3_type> _type;
+public:
+    with_cast(::shared_ptr<selectable> arg, ::shared_ptr<cql3_type> type)
+        : _arg(std::move(arg)), _type(std::move(type)) {
+    }
+
+    virtual sstring to_string() const override;
+
+    virtual shared_ptr<selector::factory> new_selector_factory(database& db, schema_ptr s, std::vector<const column_definition*>& defs) override;
+    class raw : public selectable::raw {
+        ::shared_ptr<selectable::raw> _arg;
+        ::shared_ptr<cql3_type> _type;
+    public:
+        raw(shared_ptr<selectable::raw> arg, ::shared_ptr<cql3_type> type)
+                : _arg(std::move(arg)), _type(std::move(type)) {
+        }
+        virtual shared_ptr<selectable> prepare(schema_ptr s) override;
+        virtual bool processes_selection() const override;
+    };
+};
+
 }

 }
--- a/cql3/single_column_relation.cc
+++ b/cql3/single_column_relation.cc
@@ -130,6 +130,10 @@ single_column_relation::to_receivers(schema_ptr schema, const column_definition&
        }
    }

+    if (is_contains() && !receiver->type->is_collection()) {
+        throw exceptions::invalid_request_exception(sprint("Cannot use CONTAINS on non-collection column \"%s\"", receiver->name));
+    }
+
    if (is_contains_key()) {
        if (!dynamic_cast<const map_type_impl*>(receiver->type.get())) {
            throw exceptions::invalid_request_exception(sprint("Cannot use CONTAINS KEY on non-map column %s", receiver->name));
--- a/cql3/single_column_relation.hh
+++ b/cql3/single_column_relation.hh
@@ -43,6 +43,7 @@

 #include <vector>
 #include "cql3/restrictions/single_column_restriction.hh"
+#include "statements/request_validations.hh"

 #include "core/shared_ptr.hh"
 #include "to_string.hh"
@@ -157,6 +158,19 @@ protected:
            statements::bound bound,
            bool inclusive) override {
        auto&& column_def = to_column_definition(schema, _entity);
+
+        if (column_def.type->references_duration()) {
+            using statements::request_validations::check_false;
+            const auto& ty = *column_def.type;
+
+            check_false(ty.is_collection(), "Slice restrictions are not supported on collections containing durations");
+            check_false(ty.is_tuple(), "Slice restrictions are not supported on tuples containing durations");
+            check_false(ty.is_user_type(), "Slice restrictions are not supported on UDTs containing durations");
+
+            // We're a duration.
+            throw exceptions::invalid_request_exception("Slice restrictions are not supported on duration columns");
+        }
+
        auto term = to_term(to_receivers(schema, column_def), _value, db, schema->ks_name(), std::move(bound_names));
        return ::make_shared<restrictions::single_column_restriction::slice>(column_def, bound, inclusive, std::move(term));
    }
--- a/cql3/statements/alter_table_statement.cc
+++ b/cql3/statements/alter_table_statement.cc
@@ -138,7 +138,7 @@ static data_type validate_alter(schema_ptr schema, const column_definition& def,
    return type;
 }

-static void validate_column_rename(const schema& schema, const column_identifier& from, const column_identifier& to)
+static void validate_column_rename(database& db, const schema& schema, const column_identifier& from, const column_identifier& to)
 {
    auto def = schema.get_column_definition(from.name());
    if (!def) {
@@ -154,8 +154,7 @@ static void validate_column_rename(const schema& schema, const column_identifier
    }

    if (!schema.indices().empty()) {
-        auto& sim = secondary_index::get_secondary_index_manager();
-        auto dependent_indices = sim.local().get_dependent_indices(*def);
+        auto dependent_indices = db.find_column_family(schema.id()).get_index_manager().get_dependent_indices(*def);
        if (!dependent_indices.empty()) {
            auto index_names = ::join(", ", dependent_indices | boost::adaptors::transformed([](const index_metadata& im) {
                return im.name();
@@ -235,7 +234,8 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
            // with the same name unless the types are compatible (see #6276).
            auto& dropped = schema->dropped_columns();
            auto i = dropped.find(column_name->text());
-            if (i != dropped.end() && !type->is_compatible_with(*i->second.type)) {
+            if (i != dropped.end() && i->second.type->is_collection() && i->second.type->is_multi_cell()
+                    && !type->is_compatible_with(*i->second.type)) {
                throw exceptions::invalid_request_exception(sprint("Cannot add a collection with the name %s "
                    "because a collection with the same name and a different type has already been used in the past", column_name));
            }
@@ -342,7 +342,7 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a
            auto from = entry.first->prepare_column_identifier(schema);
            auto to = entry.second->prepare_column_identifier(schema);

-            validate_column_rename(*schema, *from, *to);
+            validate_column_rename(db, *schema, *from, *to);
            cfm.with_column_rename(from->name(), to->name());

            // If the view includes a renamed column, it must be renamed in the view table and the definition.
@@ -352,7 +352,7 @@ future<shared_ptr<cql_transport::event::schema_change>> alter_table_statement::a

                    auto view_from = entry.first->prepare_column_identifier(view);
                    auto view_to = entry.second->prepare_column_identifier(view);
-                    validate_column_rename(*view, *view_from, *view_to);
+                    validate_column_rename(db, *view, *view_from, *view_to);
                    builder.with_column_rename(view_from->name(), view_to->name());

                    auto new_where = util::rename_column_in_where_clause(
--- a/cql3/statements/alter_user_statement.cc
+++ b/cql3/statements/alter_user_statement.cc
@@ -42,8 +42,8 @@
 #include <boost/range/adaptor/map.hpp>

 #include "alter_user_statement.hh"
-#include "auth/auth.hh"
 #include "auth/authenticator.hh"
+#include "auth/service.hh"

 cql3::statements::alter_user_statement::alter_user_statement(sstring username, ::shared_ptr<user_options> opts, std::experimental::optional<bool> superuser)
    : _username(std::move(username))
@@ -52,7 +52,7 @@ cql3::statements::alter_user_statement::alter_user_statement(sstring username, :
 {}

 void cql3::statements::alter_user_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) {
-    _opts->validate();
+    _opts->validate(state.get_auth_service()->underlying_authenticator());

    if (!_superuser && _opts->empty()) {
        throw exceptions::invalid_request_exception("ALTER USER can't be empty");
@@ -73,7 +73,10 @@ future<> cql3::statements::alter_user_statement::check_access(const service::cli
        // my disgust.
        throw exceptions::unauthorized_exception("You aren't allowed to alter your own superuser status");
    }
-    return user->is_super().then([this, user](bool is_super) {
+
+    const auto& auth_service = *state.get_auth_service();
+
+    return auth::is_super_user(auth_service, *user).then([this, user, &auth_service](bool is_super) {
        if (_superuser && !is_super) {
            throw exceptions::unauthorized_exception("Only superusers are allowed to alter superuser status");
        }
@@ -84,7 +87,7 @@ future<> cql3::statements::alter_user_statement::check_access(const service::cli

        if (!is_super) {
            for (auto o : _opts->options() | boost::adaptors::map_keys) {
-                if (!auth::authenticator::get().alterable_options().contains(o)) {
+                if (!auth_service.underlying_authenticator().alterable_options().contains(o)) {
                    throw exceptions::unauthorized_exception(sprint("You aren't allowed to alter %s option", o));
                }
            }
@@ -94,14 +97,17 @@ future<> cql3::statements::alter_user_statement::check_access(const service::cli

 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::alter_user_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
-    return auth::auth::is_existing_user(_username).then([this](bool exists) {
+    auto& client_state = state.get_client_state();
+    auto& auth_service = *client_state.get_auth_service();
+
+    return auth_service.is_existing_user(_username).then([this, &auth_service](bool exists) {
        if (!exists) {
            throw exceptions::invalid_request_exception(sprint("User %s doesn't exist", _username));
        }
-        auto f = _opts->options().empty() ? make_ready_future() : auth::authenticator::get().alter(_username, _opts->options());
+        auto f = _opts->options().empty() ? make_ready_future() : auth_service.underlying_authenticator().alter(_username, _opts->options());
        if (_superuser) {
-            f = f.then([this] {
-                return auth::auth::insert_user(_username, *_superuser);
+            f = f.then([this, &auth_service] {
+                return auth_service.insert_user(_username, *_superuser);
            });
        }
        return f.then([] { return  make_ready_future<::shared_ptr<cql_transport::messages::result_message>>(); });
--- a/cql3/statements/create_index_statement.cc
+++ b/cql3/statements/create_index_statement.cc
@@ -47,6 +47,7 @@
 #include "service/storage_service.hh"
 #include "schema.hh"
 #include "schema_builder.hh"
+#include "request_validations.hh"

 #include <boost/range/adaptor/transformed.hpp>
 #include <boost/algorithm/string/join.hpp>
@@ -108,6 +109,18 @@ create_index_statement::validate(distributed<service::storage_proxy>& proxy, con
                    sprint("No column definition found for column %s", *target->column));
        }

+        if (cd->type->references_duration()) {
+            using request_validations::check_false;
+            const auto& ty = *cd->type;
+
+            check_false(ty.is_collection(), "Secondary indexes are not supported on collections containing durations");
+            check_false(ty.is_tuple(), "Secondary indexes are not supported on tuples containing durations");
+            check_false(ty.is_user_type(), "Secondary indexes are not supported on UDTs containing durations");
+
+            // We're a duration.
+            throw exceptions::invalid_request_exception("Secondary indexes are not supported on duration columns");
+        }
+
        // Origin TODO: we could lift that limitation
        if ((schema->is_dense() || !schema->thrift().has_compound_comparator()) &&
            cd->kind != column_kind::regular_column) {
--- a/cql3/statements/create_table_statement.cc
+++ b/cql3/statements/create_table_statement.cc
@@ -219,6 +219,9 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa
        if (t->is_counter()) {
            throw exceptions::invalid_request_exception(sprint("counter type is not supported for PRIMARY KEY part %s", alias->text()));
        }
+        if (t->references_duration()) {
+            throw exceptions::invalid_request_exception(sprint("duration type is not supported for PRIMARY KEY part %s", alias->text()));
+        }
        if (_static_columns.count(alias) > 0) {
            throw exceptions::invalid_request_exception(sprint("Static column %s cannot be part of the PRIMARY KEY", alias->text()));
        }
@@ -254,6 +257,9 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa
            if (at->is_counter()) {
                throw exceptions::invalid_request_exception(sprint("counter type is not supported for PRIMARY KEY part %s", stmt->_column_aliases[0]));
            }
+            if (at->references_duration()) {
+                throw exceptions::invalid_request_exception(sprint("duration type is not supported for PRIMARY KEY part %s", stmt->_column_aliases[0]));
+            }
            stmt->_clustering_key_types.emplace_back(at);
        } else {
            std::vector<data_type> types;
@@ -263,6 +269,9 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa
                if (type->is_counter()) {
                    throw exceptions::invalid_request_exception(sprint("counter type is not supported for PRIMARY KEY part %s", t->text()));
                }
+                if (type->references_duration()) {
+                    throw exceptions::invalid_request_exception(sprint("duration type is not supported for PRIMARY KEY part %s", t->text()));
+                }
                if (_static_columns.count(t) > 0) {
                    throw exceptions::invalid_request_exception(sprint("Static column %s cannot be part of the PRIMARY KEY", t->text()));
                }
--- a/cql3/statements/create_user_statement.cc
+++ b/cql3/statements/create_user_statement.cc
@@ -40,8 +40,8 @@
 */

 #include "create_user_statement.hh"
-#include "auth/auth.hh"
 #include "auth/authenticator.hh"
+#include "auth/service.hh"

 cql3::statements::create_user_statement::create_user_statement(sstring username, ::shared_ptr<user_options> opts, bool superuser, bool if_not_exists)
    : _username(std::move(username))
@@ -55,7 +55,7 @@ void cql3::statements::create_user_statement::validate(distributed<service::stor
        throw exceptions::invalid_request_exception("Username can't be an empty string");
    }

-    _opts->validate();
+    _opts->validate(state.get_auth_service()->underlying_authenticator());

    // validate login here before checkAccess to avoid leaking user existence to anonymous users.
    state.ensure_not_anonymous();
@@ -66,19 +66,22 @@ void cql3::statements::create_user_statement::validate(distributed<service::stor

 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::create_user_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
-    return state.get_client_state().user()->is_super().then([this](bool is_super) {
+    auto& client_state = state.get_client_state();
+    auto& auth_service = *client_state.get_auth_service();
+
+    return auth::is_super_user(auth_service, *client_state.user()).then([this, &auth_service](bool is_super) {
        if (!is_super) {
            throw exceptions::unauthorized_exception("Only superusers are allowed to perform CREATE USER queries");
        }
-        return auth::auth::is_existing_user(_username).then([this](bool exists) {
+        return auth_service.is_existing_user(_username).then([this, &auth_service](bool exists) {
            if (exists && !_if_not_exists) {
                throw exceptions::invalid_request_exception(sprint("User %s already exists", _username));
            }
            if (exists && _if_not_exists) {
                return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
            }
-            return auth::authenticator::get().create(_username, _opts->options()).then([this] {
-                return auth::auth::insert_user(_username, _superuser).then([] {
+            return auth_service.underlying_authenticator().create(_username, _opts->options()).then([this, &auth_service] {
+                return auth_service.insert_user(_username, _superuser).then([] {
                    return  make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
                });
            });
--- a/cql3/statements/create_view_statement.cc
+++ b/cql3/statements/create_view_statement.cc
@@ -116,6 +116,12 @@ static bool validate_primary_key(
        throw exceptions::invalid_request_exception(sprint(
                "Cannot use MultiCell column '%s' in PRIMARY KEY of materialized view", def->name_as_text()));
    }
+
+    if (def->type->references_duration()) {
+        throw exceptions::invalid_request_exception(sprint(
+                "Cannot use Duration column '%s' in PRIMARY KEY of materialized view", def->name_as_text()));
+    }
+
    if (def->is_static()) {
        throw exceptions::invalid_request_exception(sprint(
                "Cannot use Static column '%s' in PRIMARY KEY of materialized view", def->name_as_text()));
--- a/cql3/statements/drop_user_statement.cc
+++ b/cql3/statements/drop_user_statement.cc
@@ -42,9 +42,9 @@
 #include <boost/range/adaptor/map.hpp>

 #include "drop_user_statement.hh"
-#include "auth/auth.hh"
 #include "auth/authenticator.hh"
 #include "auth/authorizer.hh"
+#include "auth/service.hh"

 cql3::statements::drop_user_statement::drop_user_statement(sstring username, bool if_exists)
    : _username(std::move(username))
@@ -65,12 +65,15 @@ void cql3::statements::drop_user_statement::validate(distributed<service::storag

 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::drop_user_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
-    return state.get_client_state().user()->is_super().then([this](bool is_super) {
+    auto& client_state = state.get_client_state();
+    auto& auth_service = *client_state.get_auth_service();
+
+    return auth::is_super_user(auth_service, *client_state.user()).then([this, &auth_service](bool is_super) {
        if (!is_super) {
            throw exceptions::unauthorized_exception("Only superusers are allowed to perform DROP USER queries");
        }

-        return auth::auth::is_existing_user(_username).then([this](bool exists) {
+        return auth_service.is_existing_user(_username).then([this, &auth_service](bool exists) {
            if (!_if_exists && !exists) {
                throw exceptions::invalid_request_exception(sprint("User %s doesn't exist", _username));
            }
@@ -79,9 +82,9 @@ cql3::statements::drop_user_statement::execute(distributed<service::storage_prox
            }

            // clean up permissions after the dropped user.
-            return auth::authorizer::get().revoke_all(_username).then([this] {
-                return auth::auth::delete_user(_username).then([this] {
-                    return auth::authenticator::get().drop(_username);
+            return auth_service.underlying_authorizer().revoke_all(_username).then([this, &auth_service] {
+                return auth_service.delete_user(_username).then([this, &auth_service] {
+                    return auth_service.underlying_authenticator().drop(_username);
                });
            }).then([] {
                return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
--- a/cql3/statements/grant_statement.cc
+++ b/cql3/statements/grant_statement.cc
@@ -44,7 +44,10 @@

 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::grant_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
-    return auth::authorizer::get().grant(state.get_client_state().user(), _permissions, _resource, _username).then([] {
+    auto& client_state = state.get_client_state();
+    auto& auth_service = *client_state.get_auth_service();
+
+    return auth_service.underlying_authorizer().grant(client_state.user(), _permissions, _resource, _username).then([] {
        return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
    });
 }
--- a/cql3/statements/index_target.cc
+++ b/cql3/statements/index_target.cc
@@ -59,6 +59,20 @@ sstring index_target::as_cql_string(schema_ptr schema) const {
    return sprint("%s(%s)", to_sstring(type), column->to_cql_string());
 }

+index_target::target_type index_target::from_sstring(const sstring& s)
+{
+    if (s == "keys") {
+        return index_target::target_type::keys;
+    } else if (s == "entries") {
+        return index_target::target_type::keys_and_values;
+    } else if (s == "values") {
+        return index_target::target_type::values;
+    } else if (s == "full") {
+        return index_target::target_type::full;
+    }
+    throw std::runtime_error(sprint("Unknown target type: %s", s));
+}
+
 sstring index_target::index_option(target_type type) {
    switch (type) {
        case target_type::keys: return secondary_index::index_keys_option_name;
--- a/cql3/statements/index_target.hh
+++ b/cql3/statements/index_target.hh
@@ -68,6 +68,7 @@ struct index_target {

    static sstring index_option(target_type type);
    static target_type from_column_definition(const column_definition& cd);
+    static index_target::target_type from_sstring(const sstring& s);

    class raw {
    public:
--- a/cql3/statements/list_permissions_statement.cc
+++ b/cql3/statements/list_permissions_statement.cc
@@ -44,7 +44,7 @@

 #include "list_permissions_statement.hh"
 #include "auth/authorizer.hh"
-#include "auth/auth.hh"
+#include "auth/common.hh"
 #include "cql3/result_set.hh"
 #include "transport/messages/result_message.hh"

@@ -64,7 +64,7 @@ void cql3::statements::list_permissions_statement::validate(distributed<service:
 future<> cql3::statements::list_permissions_statement::check_access(const service::client_state& state) {
    auto f = make_ready_future();
    if (_username) {
-        f = auth::auth::is_existing_user(*_username).then([this](bool exists) {
+        f = state.get_auth_service()->is_existing_user(*_username).then([this](bool exists) {
            if (!exists) {
                throw exceptions::invalid_request_exception(sprint("User %s doesn't exist", *_username));
            }
@@ -84,7 +84,7 @@ future<> cql3::statements::list_permissions_statement::check_access(const servic
 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::list_permissions_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
    static auto make_column = [](sstring name) {
-        return ::make_shared<column_specification>(auth::auth::AUTH_KS, "permissions", ::make_shared<column_identifier>(std::move(name), true), utf8_type);
+        return ::make_shared<column_specification>(auth::meta::AUTH_KS, "permissions", ::make_shared<column_identifier>(std::move(name), true), utf8_type);
    };
    static thread_local const std::vector<::shared_ptr<column_specification>> metadata({
        make_column("username"), make_column("resource"), make_column("permission")
@@ -104,7 +104,8 @@ cql3::statements::list_permissions_statement::execute(distributed<service::stora
    }

    return map_reduce(resources, [&state, this](opt_resource r) {
-        return auth::authorizer::get().list(state.get_client_state().user(), _permissions, std::move(r), _username);
+        auto& auth_service = *state.get_client_state().get_auth_service();
+        return auth_service.underlying_authorizer().list(auth_service, state.get_client_state().user(), _permissions, std::move(r), _username);
    }, std::vector<auth::permission_details>(), [](std::vector<auth::permission_details> details, std::vector<auth::permission_details> pd) {
        details.insert(details.end(), pd.begin(), pd.end());
        return std::move(details);
--- a/cql3/statements/list_users_statement.cc
+++ b/cql3/statements/list_users_statement.cc
@@ -42,7 +42,7 @@
 #include "list_users_statement.hh"
 #include "cql3/query_processor.hh"
 #include "cql3/query_options.hh"
-#include "auth/auth.hh"
+#include "auth/common.hh"

 void cql3::statements::list_users_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) {
 }
@@ -57,7 +57,7 @@ cql3::statements::list_users_statement::execute(distributed<service::storage_pro
    auto is = std::make_unique<service::query_state>(service::client_state::for_internal_calls());
    auto io = std::make_unique<query_options>(db::consistency_level::QUORUM, std::vector<cql3::raw_value>{});
    auto f = get_local_query_processor().process(
-                    sprint("SELECT * FROM %s.%s", auth::auth::AUTH_KS,
-                                    auth::auth::USERS_CF), *is, *io);
+                    sprint("SELECT * FROM %s.%s", auth::meta::AUTH_KS,
+                                    auth::meta::USERS_CF), *is, *io);
    return f.finally([is = std::move(is), io = std::move(io)] {});
 }
--- a/cql3/statements/permission_altering_statement.cc
+++ b/cql3/statements/permission_altering_statement.cc
@@ -41,11 +41,11 @@

 #include <seastar/core/thread.hh>

+#include "auth/service.hh"
 #include "permission_altering_statement.hh"
 #include "cql3/query_processor.hh"
 #include "cql3/query_options.hh"
 #include "cql3/selection/selection.hh"
-#include "auth/auth.hh"

 cql3::statements::permission_altering_statement::permission_altering_statement(
                auth::permission_set permissions, auth::data_resource resource,
@@ -60,7 +60,7 @@ void cql3::statements::permission_altering_statement::validate(distributed<servi
 }

 future<> cql3::statements::permission_altering_statement::check_access(const service::client_state& state) {
-    return auth::auth::is_existing_user(_username).then([this, &state](bool exists) {
+    return state.get_auth_service()->is_existing_user(_username).then([this, &state](bool exists) {
        if (!exists) {
            throw exceptions::invalid_request_exception(sprint("User %s doesn't exist", _username));
        }
--- a/cql3/statements/revoke_statement.cc
+++ b/cql3/statements/revoke_statement.cc
@@ -44,7 +44,10 @@

 future<::shared_ptr<cql_transport::messages::result_message>>
 cql3::statements::revoke_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
-    return auth::authorizer::get().revoke(state.get_client_state().user(), _permissions, _resource, _username).then([] {
+    auto& client_state = state.get_client_state();
+    auto& auth_service = *client_state.get_auth_service();
+
+    return auth_service.underlying_authorizer().revoke(client_state.user(), _permissions, _resource, _username).then([] {
        return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>();
    });
 }
--- a/cql3/statements/select_statement.cc
+++ b/cql3/statements/select_statement.cc
@@ -51,6 +51,8 @@
 #include "service/pager/query_pagers.hh"
 #include <seastar/core/execution_stage.hh>
 #include "view_info.hh"
+#include "partition_slice_builder.hh"
+#include "cql3/untyped_result_set.hh"

 namespace cql3 {

@@ -341,6 +343,10 @@ select_statement::execute_internal(distributed<service::storage_proxy>& proxy,
                                   service::query_state& state,
                                   const query_options& options)
 {
+    if (options.get_specific_options().page_size > 0) {
+        // Paging was requested, so fall back to the regular execute path.
+        return do_execute(proxy, state, options);
+    }
    int32_t limit = get_limit(options);
    auto now = gc_clock::now();
    auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
@@ -397,6 +403,167 @@ select_statement::process_results(foreign_ptr<lw_shared_ptr<query::result>> resu
    return _restrictions;
 }

+primary_key_select_statement::primary_key_select_statement(schema_ptr schema, uint32_t bound_terms,
+                                                           ::shared_ptr<parameters> parameters,
+                                                           ::shared_ptr<selection::selection> selection,
+                                                           ::shared_ptr<restrictions::statement_restrictions> restrictions,
+                                                           bool is_reversed,
+                                                           ordering_comparator_type ordering_comparator,
+                                                           ::shared_ptr<term> limit, cql_stats &stats)
+    : select_statement{schema, bound_terms, parameters, selection, restrictions, is_reversed, ordering_comparator, limit, stats}
+{}
+
+::shared_ptr<cql3::statements::select_statement>
+indexed_table_select_statement::prepare(database& db,
+                                        schema_ptr schema,
+                                        uint32_t bound_terms,
+                                        ::shared_ptr<parameters> parameters,
+                                        ::shared_ptr<selection::selection> selection,
+                                        ::shared_ptr<restrictions::statement_restrictions> restrictions,
+                                        bool is_reversed,
+                                        ordering_comparator_type ordering_comparator,
+                                        ::shared_ptr<term> limit, cql_stats &stats)
+{
+    auto index_opt = find_idx(db, schema, restrictions);
+    if (!index_opt) {
+        throw std::runtime_error("No index found.");
+    }
+    return ::make_shared<cql3::statements::indexed_table_select_statement>(
+            schema,
+            bound_terms,
+            parameters,
+            std::move(selection),
+            std::move(restrictions),
+            is_reversed,
+            std::move(ordering_comparator),
+            limit,
+            stats,
+            *index_opt);
+
+}
+
+
+stdx::optional<secondary_index::index> indexed_table_select_statement::find_idx(database& db,
+                                                                                schema_ptr schema,
+                                                                                ::shared_ptr<restrictions::statement_restrictions> restrictions)
+{
+    auto& sim = db.find_column_family(schema).get_index_manager();
+    for (::shared_ptr<cql3::restrictions::restrictions> restriction : restrictions->index_restrictions()) {
+        for (const auto& cdef : restriction->get_column_defs()) {
+            for (auto index : sim.list_indexes()) {
+                if (index.depends_on(*cdef)) {
+                    return stdx::make_optional<secondary_index::index>(std::move(index));
+                }
+            }
+        }
+    }
+    return stdx::nullopt;
+}
+
+indexed_table_select_statement::indexed_table_select_statement(schema_ptr schema, uint32_t bound_terms,
+                                                           ::shared_ptr<parameters> parameters,
+                                                           ::shared_ptr<selection::selection> selection,
+                                                           ::shared_ptr<restrictions::statement_restrictions> restrictions,
+                                                           bool is_reversed,
+                                                           ordering_comparator_type ordering_comparator,
+                                                           ::shared_ptr<term> limit, cql_stats &stats,
+                                                           const secondary_index::index& index)
+    : select_statement{schema, bound_terms, parameters, selection, restrictions, is_reversed, ordering_comparator, limit, stats}
+    , _index{index}
+{}
+
+future<shared_ptr<cql_transport::messages::result_message>>
+indexed_table_select_statement::do_execute(distributed<service::storage_proxy>& proxy,
+                             service::query_state& state,
+                             const query_options& options)
+{
+    tracing::add_table_name(state.get_trace_state(), keyspace(), column_family());
+
+    auto cl = options.get_consistency();
+
+    validate_for_read(_schema->ks_name(), cl);
+
+    int32_t limit = get_limit(options);
+    auto now = gc_clock::now();
+
+    ++_stats.reads;
+
+    assert(_restrictions->uses_secondary_indexing());
+    return find_index_partition_ranges(proxy, state, options).then([limit, now, &state, &options, &proxy, this] (dht::partition_range_vector partition_ranges) {
+        auto command = ::make_lw_shared<query::read_command>(
+                _schema->id(),
+                _schema->version(),
+                make_partition_slice(options),
+                limit,
+                now,
+                tracing::make_trace_info(state.get_trace_state()),
+                query::max_partitions,
+                options.get_timestamp(state));
+        return this->execute(proxy, command, std::move(partition_ranges), state, options, now);
+    });
+}
+
+future<dht::partition_range_vector>
+indexed_table_select_statement::find_index_partition_ranges(distributed<service::storage_proxy>& proxy,
+                                             service::query_state& state,
+                                             const query_options& options)
+{
+    const auto& im = _index.metadata();
+    sstring index_table_name = sprint("%s_index", im.name());
+    tracing::add_table_name(state.get_trace_state(), keyspace(), index_table_name);
+    auto& db = proxy.local().get_db().local();
+    const auto& view = db.find_column_family(_schema->ks_name(), index_table_name);
+    dht::partition_range_vector partition_ranges;
+    for (const auto& entry : _restrictions->get_non_pk_restriction()) {
+        auto pk = partition_key::from_optional_exploded(*view.schema(), entry.second->values(options));
+        auto dk = dht::global_partitioner().decorate_key(*view.schema(), pk);
+        auto range = dht::partition_range::make_singular(dk);
+        partition_ranges.emplace_back(range);
+    }
+
+    auto now = gc_clock::now();
+    int32_t limit = get_limit(options);
+
+    partition_slice_builder partition_slice_builder{*view.schema()};
+    auto cmd = ::make_lw_shared<query::read_command>(
+            view.schema()->id(),
+            view.schema()->version(),
+            partition_slice_builder.build(),
+            limit,
+            now,
+            tracing::make_trace_info(state.get_trace_state()),
+            query::max_partitions,
+            options.get_timestamp(state));
+    return proxy.local().query(view.schema(),
+                               cmd,
+                               std::move(partition_ranges),
+                               options.get_consistency(),
+                               state.get_trace_state()).then([cmd, this, &options, now, &view] (foreign_ptr<lw_shared_ptr<query::result>> result) {
+        std::vector<const column_definition*> columns;
+        for (const column_definition& cdef : _schema->partition_key_columns()) {
+            columns.emplace_back(view.schema()->get_column_definition(cdef.name()));
+        }
+        auto selection = selection::selection::for_columns(view.schema(), columns);
+        cql3::selection::result_set_builder builder(*selection, now, options.get_cql_serialization_format());
+        query::result_view::consume(*result,
+                                    cmd->slice,
+                                    cql3::selection::result_set_builder::visitor(builder, *view.schema(), *selection));
+        auto rs = cql3::untyped_result_set(::make_shared<cql_transport::messages::result_message::rows>(std::move(builder.build())));
+        dht::partition_range_vector partition_ranges;
+        for (size_t i = 0; i < rs.size(); i++) {
+            const auto& row = rs.at(i);
+            for (const auto& column : row.get_columns()) {
+                auto blob = row.get_blob(column->name->to_cql_string());
+                auto pk = partition_key::from_exploded(*_schema, { blob });
+                auto dk = dht::global_partitioner().decorate_key(*_schema, pk);
+                auto range = dht::partition_range::make_singular(dk);
+                partition_ranges.emplace_back(range);
+            }
+        }
+        return make_ready_future<dht::partition_range_vector>(partition_ranges);
+    }).finally([cmd] {});
+}
+
 namespace raw {

 select_statement::select_statement(::shared_ptr<cf_name> cf_name,
@@ -437,15 +604,31 @@ std::unique_ptr<prepared_statement> select_statement::prepare(database& db, cql_

    check_needs_filtering(restrictions);

-    auto stmt = ::make_shared<cql3::statements::select_statement>(schema,
-        bound_names->size(),
-        _parameters,
-        std::move(selection),
-        std::move(restrictions),
-        is_reversed_,
-        std::move(ordering_comparator),
-        prepare_limit(db, bound_names),
-        stats);
+    ::shared_ptr<cql3::statements::select_statement> stmt;
+    if (restrictions->uses_secondary_indexing()) {
+        stmt = indexed_table_select_statement::prepare(
+                db,
+                schema,
+                bound_names->size(),
+                _parameters,
+                std::move(selection),
+                std::move(restrictions),
+                is_reversed_,
+                std::move(ordering_comparator),
+                prepare_limit(db, bound_names),
+                stats);
+    } else {
+        stmt = ::make_shared<cql3::statements::primary_key_select_statement>(
+                schema,
+                bound_names->size(),
+                _parameters,
+                std::move(selection),
+                std::move(restrictions),
+                is_reversed_,
+                std::move(ordering_comparator),
+                prepare_limit(db, bound_names),
+                stats);
+    }

    auto partition_key_bind_indices = bound_names->get_partition_key_bind_indexes(schema);

--- a/cql3/statements/select_statement.hh
+++ b/cql3/statements/select_statement.hh
@@ -66,7 +66,8 @@ namespace statements {
 class select_statement : public cql_statement {
 public:
    using parameters = raw::select_statement::parameters;
-private:
+    using ordering_comparator_type = raw::select_statement::ordering_comparator_type;
+protected:
    static constexpr int DEFAULT_COUNT_PAGE_SIZE = 10000;
    static thread_local const ::shared_ptr<parameters> _default_parameters;
    schema_ptr _schema;
@@ -81,7 +82,6 @@ private:
    using compare_fn = raw::select_statement::compare_fn<T>;

    using result_row_type = raw::select_statement::result_row_type;
-    using ordering_comparator_type = raw::select_statement::ordering_comparator_type;

    /**
     * The comparator used to orders results when multiple keys are selected (using IN).
@@ -90,8 +90,8 @@ private:

    query::partition_slice::option_set _opts;
    cql_stats& _stats;
-private:
-    future<::shared_ptr<cql_transport::messages::result_message>> do_execute(distributed<service::storage_proxy>& proxy,
+protected:
+    virtual future<::shared_ptr<cql_transport::messages::result_message>> do_execute(distributed<service::storage_proxy>& proxy,
        service::query_state& state, const query_options& options);
    friend class select_statement_executor;
 public:
@@ -126,54 +126,6 @@ public:

    shared_ptr<cql_transport::messages::result_message> process_results(foreign_ptr<lw_shared_ptr<query::result>> results,
        lw_shared_ptr<query::read_command> cmd, const query_options& options, gc_clock::time_point now);
-#if 0
-    private ResultMessage.Rows pageAggregateQuery(QueryPager pager, QueryOptions options, int pageSize, long now)
-            throws RequestValidationException, RequestExecutionException
-    {
-        Selection.ResultSetBuilder result = _selection->resultSetBuilder(now);
-        while (!pager.isExhausted())
-        {
-            for (org.apache.cassandra.db.Row row : pager.fetchPage(pageSize))
-            {
-                // Not columns match the query, skip
-                if (row.cf == null)
-                    continue;
-
-                processColumnFamily(row.key.getKey(), row.cf, options, now, result);
-            }
-        }
-        return new ResultMessage.Rows(result.build(options.getProtocolVersion()));
-    }
-
-    static List<Row> readLocally(String keyspaceName, List<ReadCommand> cmds)
-    {
-        Keyspace keyspace = Keyspace.open(keyspaceName);
-        List<Row> rows = new ArrayList<Row>(cmds.size());
-        for (ReadCommand cmd : cmds)
-            rows.add(cmd.getRow(keyspace));
-        return rows;
-    }
-
-    public ResultMessage.Rows executeInternal(QueryState state, QueryOptions options) throws RequestExecutionException, RequestValidationException
-    {
-        int limit = getLimit(options);
-        long now = System.currentTimeMillis();
-        Pageable command = getPageableCommand(options, limit, now);
-        List<Row> rows = command == null
-                       ? Collections.<Row>emptyList()
-                       : (command instanceof Pageable.ReadCommands
-                          ? readLocally(keyspace(), ((Pageable.ReadCommands)command).commands)
-                          : ((RangeSliceCommand)command).executeLocally());
-
-        return processResults(rows, options, limit, now);
-    }
-
-    public ResultSet process(List<Row> rows) throws InvalidRequestException
-    {
-        QueryOptions options = QueryOptions.DEFAULT;
-        return process(rows, options, getLimit(options), System.currentTimeMillis());
-    }
-#endif

    const sstring& keyspace() const;

@@ -183,241 +135,60 @@ public:

    ::shared_ptr<restrictions::statement_restrictions> get_restrictions() const;

-#if 0
-    private SliceQueryFilter sliceFilter(ColumnSlice slice, int limit, int toGroup)
-    {
-        return sliceFilter(new ColumnSlice[]{ slice }, limit, toGroup);
-    }
-
-    private SliceQueryFilter sliceFilter(ColumnSlice[] slices, int limit, int toGroup)
-    {
-        assert ColumnSlice.validateSlices(slices, _schema.comparator, _is_reversed) : String.format("Invalid slices: " + Arrays.toString(slices) + (_is_reversed ? " (reversed)" : ""));
-        return new SliceQueryFilter(slices, _is_reversed, limit, toGroup);
-    }
-#endif
-
-private:
+protected:
    int32_t get_limit(const query_options& options) const;
    bool needs_post_query_ordering() const;
+};

-#if 0
-    private int updateLimitForQuery(int limit)
-    {
-        // Internally, we don't support exclusive bounds for slices. Instead, we query one more element if necessary
-        // and exclude it later (in processColumnFamily)
-        return restrictions.isNonCompositeSliceWithExclusiveBounds() && limit != Integer.MAX_VALUE
-             ? limit + 1
-             : limit;
-    }
+class primary_key_select_statement : public select_statement {
+public:
+    primary_key_select_statement(schema_ptr schema,
+                     uint32_t bound_terms,
+                     ::shared_ptr<parameters> parameters,
+                     ::shared_ptr<selection::selection> selection,
+                     ::shared_ptr<restrictions::statement_restrictions> restrictions,
+                     bool is_reversed,
+                     ordering_comparator_type ordering_comparator,
+                     ::shared_ptr<term> limit,
+                     cql_stats &stats);
+};

-    private SortedSet<CellName> getRequestedColumns(QueryOptions options) throws InvalidRequestException
-    {
-        // Note: getRequestedColumns don't handle static columns, but due to CASSANDRA-5762
-        // we always do a slice for CQL3 tables, so it's ok to ignore them here
-        assert !restrictions.isColumnRange();
-        SortedSet<CellName> columns = new TreeSet<CellName>(cfm.comparator);
-        for (Composite composite : restrictions.getClusteringColumnsAsComposites(options))
-            columns.addAll(addSelectedColumns(composite));
-        return columns;
-    }
+class indexed_table_select_statement : public select_statement {
+    secondary_index::index _index;
+public:
+    static ::shared_ptr<cql3::statements::select_statement> prepare(database& db,
+                                                                    schema_ptr schema,
+                                                                    uint32_t bound_terms,
+                                                                    ::shared_ptr<parameters> parameters,
+                                                                    ::shared_ptr<selection::selection> selection,
+                                                                    ::shared_ptr<restrictions::statement_restrictions> restrictions,
+                                                                    bool is_reversed,
+                                                                    ordering_comparator_type ordering_comparator,
+                                                                    ::shared_ptr<term> limit,
+                                                                    cql_stats &stats);

-    private SortedSet<CellName> addSelectedColumns(Composite prefix)
-    {
-        if (cfm.comparator.isDense())
-        {
-            return FBUtilities.singleton(cfm.comparator.create(prefix, null), cfm.comparator);
-        }
-        else
-        {
-            SortedSet<CellName> columns = new TreeSet<CellName>(cfm.comparator);
+    indexed_table_select_statement(schema_ptr schema,
+                                   uint32_t bound_terms,
+                                   ::shared_ptr<parameters> parameters,
+                                   ::shared_ptr<selection::selection> selection,
+                                   ::shared_ptr<restrictions::statement_restrictions> restrictions,
+                                   bool is_reversed,
+                                   ordering_comparator_type ordering_comparator,
+                                   ::shared_ptr<term> limit,
+                                   cql_stats &stats,
+                                   const secondary_index::index& index);

-            // We need to query the selected column as well as the marker
-            // column (for the case where the row exists but has no columns outside the PK)
-            // Two exceptions are "static CF" (non-composite non-compact CF) and "super CF"
-            // that don't have marker and for which we must query all columns instead
-            if (cfm.comparator.isCompound() && !cfm.isSuper())
-            {
-                // marker
-                columns.add(cfm.comparator.rowMarker(prefix));
+private:
+    static stdx::optional<secondary_index::index> find_idx(database& db,
+                                                           schema_ptr schema,
+                                                           ::shared_ptr<restrictions::statement_restrictions> restrictions);

-                // selected columns
-                for (ColumnDefinition def : selection.getColumns())
-                    if (def.isRegular() || def.isStatic())
-                        columns.add(cfm.comparator.create(prefix, def));
-            }
-            else
-            {
-                // We now that we're not composite so we can ignore static columns
-                for (ColumnDefinition def : cfm.regularColumns())
-                    columns.add(cfm.comparator.create(prefix, def));
-            }
-            return columns;
-        }
-    }
+    virtual future<::shared_ptr<cql_transport::messages::result_message>> do_execute(distributed<service::storage_proxy>& proxy,
+                                                                                     service::query_state& state, const query_options& options) override;

-    public List<IndexExpression> getValidatedIndexExpressions(QueryOptions options) throws InvalidRequestException
-    {
-        if (!restrictions.usesSecondaryIndexing())
-            return Collections.emptyList();
-
-        List<IndexExpression> expressions = restrictions.getIndexExpressions(options);
-
-        ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(columnFamily());
-        SecondaryIndexManager secondaryIndexManager = cfs.indexManager;
-        secondaryIndexManager.validateIndexSearchersForQuery(expressions);
-
-        return expressions;
-    }
-
-    private CellName makeExclusiveSliceBound(Bound bound, CellNameType type, QueryOptions options) throws InvalidRequestException
-    {
-        if (restrictions.areRequestedBoundsInclusive(bound))
-            return null;
-
-       return type.makeCellName(restrictions.getClusteringColumnsBounds(bound, options).get(0));
-    }
-
-    private Iterator<Cell> applySliceRestriction(final Iterator<Cell> cells, final QueryOptions options) throws InvalidRequestException
-    {
-        final CellNameType type = cfm.comparator;
-
-        final CellName excludedStart = makeExclusiveSliceBound(Bound.START, type, options);
-        final CellName excludedEnd = makeExclusiveSliceBound(Bound.END, type, options);
-
-        return Iterators.filter(cells, new Predicate<Cell>()
-        {
-            public boolean apply(Cell c)
-            {
-                // For dynamic CF, the column could be out of the requested bounds (because we don't support strict bounds internally (unless
-                // the comparator is composite that is)), filter here
-                return !((excludedStart != null && type.compare(c.name(), excludedStart) == 0)
-                            || (excludedEnd != null && type.compare(c.name(), excludedEnd) == 0));
-            }
-        });
-    }
-
-    private ResultSet process(List<Row> rows, QueryOptions options, int limit, long now) throws InvalidRequestException
-    {
-        Selection.ResultSetBuilder result = selection.resultSetBuilder(now);
-        for (org.apache.cassandra.db.Row row : rows)
-        {
-            // Not columns match the query, skip
-            if (row.cf == null)
-                continue;
-
-            processColumnFamily(row.key.getKey(), row.cf, options, now, result);
-        }
-
-        ResultSet cqlRows = result.build(options.getProtocolVersion());
-
-        orderResults(cqlRows);
-
-        // Internal calls always return columns in the comparator order, even when reverse was set
-        if (isReversed)
-            cqlRows.reverse();
-
-        // Trim result if needed to respect the user limit
-        cqlRows.trim(limit);
-        return cqlRows;
-    }
-
-    // Used by ModificationStatement for CAS operations
-    void processColumnFamily(ByteBuffer key, ColumnFamily cf, QueryOptions options, long now, Selection.ResultSetBuilder result)
-    throws InvalidRequestException
-    {
-        CFMetaData cfm = cf.metadata();
-        ByteBuffer[] keyComponents = null;
-        if (cfm.getKeyValidator() instanceof CompositeType)
-        {
-            keyComponents = ((CompositeType)cfm.getKeyValidator()).split(key);
-        }
-        else
-        {
-            keyComponents = new ByteBuffer[]{ key };
-        }
-
-        Iterator<Cell> cells = cf.getSortedColumns().iterator();
-        if (restrictions.isNonCompositeSliceWithExclusiveBounds())
-            cells = applySliceRestriction(cells, options);
-
-        CQL3Row.RowIterator iter = cfm.comparator.CQL3RowBuilder(cfm, now).group(cells);
-
-        // If there is static columns but there is no non-static row, then provided the select was a full
-        // partition selection (i.e. not a 2ndary index search and there was no condition on clustering columns)
-        // then we want to include the static columns in the result set (and we're done).
-        CQL3Row staticRow = iter.getStaticRow();
-        if (staticRow != null && !iter.hasNext() && !restrictions.usesSecondaryIndexing() && restrictions.hasNoClusteringColumnsRestriction())
-        {
-            result.newRow(options.getProtocolVersion());
-            for (ColumnDefinition def : selection.getColumns())
-            {
-                switch (def.kind)
-                {
-                    case PARTITION_KEY:
-                        result.add(keyComponents[def.position()]);
-                        break;
-                    case STATIC:
-                        addValue(result, def, staticRow, options);
-                        break;
-                    default:
-                        result.add((ByteBuffer)null);
-                }
-            }
-            return;
-        }
-
-        while (iter.hasNext())
-        {
-            CQL3Row cql3Row = iter.next();
-
-            // Respect requested order
-            result.newRow(options.getProtocolVersion());
-            // Respect selection order
-            for (ColumnDefinition def : selection.getColumns())
-            {
-                switch (def.kind)
-                {
-                    case PARTITION_KEY:
-                        result.add(keyComponents[def.position()]);
-                        break;
-                    case CLUSTERING_COLUMN:
-                        result.add(cql3Row.getClusteringColumn(def.position()));
-                        break;
-                    case COMPACT_VALUE:
-                        result.add(cql3Row.getColumn(null));
-                        break;
-                    case REGULAR:
-                        addValue(result, def, cql3Row, options);
-                        break;
-                    case STATIC:
-                        addValue(result, def, staticRow, options);
-                        break;
-                }
-            }
-        }
-    }
-
-    private static void addValue(Selection.ResultSetBuilder result, ColumnDefinition def, CQL3Row row, QueryOptions options)
-    {
-        if (row == null)
-        {
-            result.add((ByteBuffer)null);
-            return;
-        }
-
-        if (def.type.isMultiCell())
-        {
-            List<Cell> cells = row.getMultiCellColumn(def.name);
-            ByteBuffer buffer = cells == null
-                             ? null
-                             : ((CollectionType)def.type).serializeForNativeProtocol(cells, options.getProtocolVersion());
-            result.add(buffer);
-            return;
-        }
-
-        result.add(row.getColumn(def.name));
-    }
-#endif
+    future<dht::partition_range_vector> find_index_partition_ranges(distributed<service::storage_proxy>& proxy,
+                                                                    service::query_state& state,
+                                                                    const query_options& options);
 };

 }
--- a/cql3/tuples.hh
+++ b/cql3/tuples.hh
@@ -134,7 +134,7 @@ public:
            try {
                validate_assignable_to(db, keyspace, receiver);
                return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
-            } catch (exceptions::invalid_request_exception e) {
+            } catch (exceptions::invalid_request_exception& e) {
                return assignment_testable::test_result::NOT_ASSIGNABLE;
            }
        }
--- a/cql3/untyped_result_set.cc
+++ b/cql3/untyped_result_set.cc
@@ -47,11 +47,11 @@
 #include "result_set.hh"
 #include "transport/messages/result_message.hh"

-cql3::untyped_result_set::row::row(const std::unordered_map<sstring, bytes_opt>& data)
+cql3::untyped_result_set_row::untyped_result_set_row(const std::unordered_map<sstring, bytes_opt>& data)
    : _data(data)
 {}

-cql3::untyped_result_set::row::row(const std::vector<::shared_ptr<column_specification>>& columns, std::vector<bytes_opt> data)
+cql3::untyped_result_set_row::untyped_result_set_row(const std::vector<::shared_ptr<column_specification>>& columns, std::vector<bytes_opt> data)
 : _columns(columns)
 , _data([&columns, data = std::move(data)] () mutable {
    std::unordered_map<sstring, bytes_opt> tmp;
@@ -62,7 +62,7 @@ cql3::untyped_result_set::row::row(const std::vector<::shared_ptr<column_specifi
 }())
 {}

-bool cql3::untyped_result_set::row::has(const sstring& name) const {
+bool cql3::untyped_result_set_row::has(const sstring& name) const {
    auto i = _data.find(name);
    return i != _data.end() && i->second;
 }
@@ -90,7 +90,7 @@ cql3::untyped_result_set::untyped_result_set(::shared_ptr<result_message> msg)
 }())
 {}

-const cql3::untyped_result_set::row& cql3::untyped_result_set::one() const {
+const cql3::untyped_result_set_row& cql3::untyped_result_set::one() const {
    if (_rows.size() != 1) {
        throw std::runtime_error("One row required, " + std::to_string(_rows.size()) + " found");
    }
--- a/cql3/untyped_result_set.hh
+++ b/cql3/untyped_result_set.hh
@@ -49,96 +49,97 @@

 namespace cql3 {

+class untyped_result_set_row {
+private:
+    const std::vector<::shared_ptr<column_specification>> _columns;
+    const std::unordered_map<sstring, bytes_opt> _data;
+public:
+    untyped_result_set_row(const std::unordered_map<sstring, bytes_opt>&);
+    untyped_result_set_row(const std::vector<::shared_ptr<column_specification>>&, std::vector<bytes_opt>);
+    untyped_result_set_row(untyped_result_set_row&&) = default;
+    untyped_result_set_row(const untyped_result_set_row&) = delete;
+
+    bool has(const sstring&) const;
+    bytes get_blob(const sstring& name) const {
+        return *_data.at(name);
+    }
+    template<typename T>
+    T get_as(const sstring& name) const {
+        return value_cast<T>(data_type_for<T>()->deserialize(get_blob(name)));
+    }
+    template<typename T>
+    std::experimental::optional<T> get_opt(const sstring& name) const {
+        return has(name) ? get_as<T>(name) : std::experimental::optional<T>{};
+    }
+    template<typename T>
+    T get_or(const sstring& name, T t) const {
+        return has(name) ? get_as<T>(name) : t;
+    }
+    // this could maybe be done as an overload of get_as (or something), but that just
+    // muddles things for no real gain. Let user (us) attempt to know what he is doing instead.
+    template<typename K, typename V, typename Iter>
+    void get_map_data(const sstring& name, Iter out, data_type keytype =
+            data_type_for<K>(), data_type valtype =
+            data_type_for<V>()) const {
+        auto vec =
+                value_cast<map_type_impl::native_type>(
+                        map_type_impl::get_instance(keytype, valtype, false)->deserialize(
+                                get_blob(name)));
+        std::transform(vec.begin(), vec.end(), out,
+                [](auto& p) {
+                    return std::pair<K, V>(value_cast<K>(p.first), value_cast<V>(p.second));
+                });
+    }
+    template<typename K, typename V, typename ... Rest>
+    std::unordered_map<K, V, Rest...> get_map(const sstring& name,
+            data_type keytype = data_type_for<K>(), data_type valtype =
+                    data_type_for<V>()) const {
+        std::unordered_map<K, V, Rest...> res;
+        get_map_data<K, V>(name, std::inserter(res, res.end()), keytype, valtype);
+        return res;
+    }
+    template<typename V, typename Iter>
+    void get_list_data(const sstring& name, Iter out, data_type valtype = data_type_for<V>()) const {
+        auto vec =
+                value_cast<list_type_impl::native_type>(
+                        list_type_impl::get_instance(valtype, false)->deserialize(
+                                get_blob(name)));
+        std::transform(vec.begin(), vec.end(), out, [](auto& v) { return value_cast<V>(v); });
+    }
+    template<typename V, typename ... Rest>
+    std::vector<V, Rest...> get_list(const sstring& name, data_type valtype = data_type_for<V>()) const {
+        std::vector<V, Rest...> res;
+        get_list_data<V>(name, std::back_inserter(res), valtype);
+        return res;
+    }
+    template<typename V, typename Iter>
+    void get_set_data(const sstring& name, Iter out, data_type valtype =
+                    data_type_for<V>()) const {
+        auto vec =
+                        value_cast<set_type_impl::native_type>(
+                                        set_type_impl::get_instance(valtype,
+                                                        false)->deserialize(
+                                                        get_blob(name)));
+        std::transform(vec.begin(), vec.end(), out, [](auto& p) {
+            return value_cast<V>(p);
+        });
+    }
+    template<typename V, typename ... Rest>
+    std::unordered_set<V, Rest...> get_set(const sstring& name,
+            data_type valtype =
+                    data_type_for<V>()) const {
+        std::unordered_set<V, Rest...> res;
+        get_set_data<V>(name, std::inserter(res, res.end()), valtype);
+        return res;
+    }
+    const std::vector<::shared_ptr<column_specification>>& get_columns() const {
+        return _columns;
+    }
+};
+
 class untyped_result_set {
 public:
-    class row {
-    private:
-        const std::vector<::shared_ptr<column_specification>> _columns;
-        const std::unordered_map<sstring, bytes_opt> _data;
-    public:
-        row(const std::unordered_map<sstring, bytes_opt>&);
-        row(const std::vector<::shared_ptr<column_specification>>&, std::vector<bytes_opt>);
-        row(row&&) = default;
-        row(const row&) = delete;
-
-        bool has(const sstring&) const;
-        bytes get_blob(const sstring& name) const {
-            return *_data.at(name);
-        }
-        template<typename T>
-        T get_as(const sstring& name) const {
-            return value_cast<T>(data_type_for<T>()->deserialize(get_blob(name)));
-        }
-        template<typename T>
-        std::experimental::optional<T> get_opt(const sstring& name) const {
-            return has(name) ? get_as<T>(name) : std::experimental::optional<T>{};
-        }
-        template<typename T>
-        T get_or(const sstring& name, T t) const {
-            return has(name) ? get_as<T>(name) : t;
-        }
-        // this could maybe be done as an overload of get_as (or something), but that just
-        // muddles things for no real gain. Let user (us) attempt to know what he is doing instead.
-        template<typename K, typename V, typename Iter>
-        void get_map_data(const sstring& name, Iter out, data_type keytype =
-                data_type_for<K>(), data_type valtype =
-                data_type_for<V>()) const {
-            auto vec =
-                    value_cast<map_type_impl::native_type>(
-                            map_type_impl::get_instance(keytype, valtype, false)->deserialize(
-                                    get_blob(name)));
-            std::transform(vec.begin(), vec.end(), out,
-                    [](auto& p) {
-                        return std::pair<K, V>(value_cast<K>(p.first), value_cast<V>(p.second));
-                    });
-        }
-        template<typename K, typename V, typename ... Rest>
-        std::unordered_map<K, V, Rest...> get_map(const sstring& name,
-                data_type keytype = data_type_for<K>(), data_type valtype =
-                        data_type_for<V>()) const {
-            std::unordered_map<K, V, Rest...> res;
-            get_map_data<K, V>(name, std::inserter(res, res.end()), keytype, valtype);
-            return res;
-        }
-        template<typename V, typename Iter>
-        void get_list_data(const sstring& name, Iter out, data_type valtype = data_type_for<V>()) const {
-            auto vec =
-                    value_cast<list_type_impl::native_type>(
-                            list_type_impl::get_instance(valtype, false)->deserialize(
-                                    get_blob(name)));
-            std::transform(vec.begin(), vec.end(), out, [](auto& v) { return value_cast<V>(v); });
-        }
-        template<typename V, typename ... Rest>
-        std::vector<V, Rest...> get_list(const sstring& name, data_type valtype = data_type_for<V>()) const {
-            std::vector<V, Rest...> res;
-            get_list_data<V>(name, std::back_inserter(res), valtype);
-            return res;
-        }
-        template<typename V, typename Iter>
-        void get_set_data(const sstring& name, Iter out, data_type valtype =
-                        data_type_for<V>()) const {
-            auto vec =
-                            value_cast<set_type_impl::native_type>(
-                                            set_type_impl::get_instance(valtype,
-                                                            false)->deserialize(
-                                                            get_blob(name)));
-            std::transform(vec.begin(), vec.end(), out, [](auto& p) {
-                return value_cast<V>(p);
-            });
-        }
-        template<typename V, typename ... Rest>
-        std::unordered_set<V, Rest...> get_set(const sstring& name,
-                data_type valtype =
-                        data_type_for<V>()) const {
-            std::unordered_set<V, Rest...> res;
-            get_set_data<V>(name, std::inserter(res, res.end()), valtype);
-            return res;
-        }
-        const std::vector<::shared_ptr<column_specification>>& get_columns() const {
-            return _columns;
-        }
-    };
-
+    using row = untyped_result_set_row;
    typedef std::vector<row> rows_type;
    using const_iterator = rows_type::const_iterator;

--- a/cql3/update_parameters.cc
+++ b/cql3/update_parameters.cc
@@ -53,6 +53,9 @@ update_parameters::get_prefetched_list(
        return {};
    }

+    if (column.is_static()) {
+        ckey = clustering_key_view::make_empty();
+    }
    auto i = _prefetched->rows.find(std::make_pair(std::move(pkey), std::move(ckey)));
    if (i == _prefetched->rows.end()) {
        return {};