Propagate the timeout to `consume_mutation_fragments_until()` and hence
to the underlying reader, to ensure queued sstable reads that belong
to timed-out requests are dropped from the queue, instead of
pointlessly serving them.
consume_mutation_fragments_until() gained a `timeout` parameter, as it
didn't have one before.
Fixes: #1068
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190906135629.67342-1-bdenes@scylladb.com>
row::append_cell() has a precondition that the new cell column id needs
to be larger than that of any other already existing cell. If this
precondition is violated, the row will end up in an invalid state. This
patch adds an assertion to make sure we fail early in such cases.
(cherry picked from commit 060e3f8ac2)
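The precondition check can be illustrated with a minimal sketch (a hypothetical toy `row`, not Scylla's actual implementation): cells are stored in column-id order, and appending asserts that the new id is strictly larger than the last one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using column_id = std::uint32_t;

class row {
    std::vector<std::pair<column_id, int>> _cells;  // cells kept ordered by column id
public:
    void append_cell(column_id id, int cell) {
        // Precondition: the new cell's column id must be larger than that of
        // any existing cell; assert so we fail early instead of silently
        // producing an unordered (invalid) row.
        assert(_cells.empty() || _cells.back().first < id);
        _cells.emplace_back(id, cell);
    }
    std::size_t size() const { return _cells.size(); }
};
```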
Fixes a segfault when querying for an empty keyspace.
Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combining_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.
Fixes #4689
Move the implementation of size_estimates_mutation_reader
to a separate compilation unit to speed up compilation times
and increase readability.
Refactor tests to use seastar::thread.
It shouldn't rely on argument evaluation order, which is undefined
behavior.
Fixes #4718.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry-picked from commit 0e732ed1cf)
Currently, if there is a fragment in _ready and _out_of_range was set
after the row end was consumed, push_ready_fragments() would return
without emitting partition_end.
This is problematic once we make consume_row_start() emit
partition_start directly, because we will want to assume that all
fragments for the previous partition are emitted by then. If they're
not, then we'd emit partition_start before partition_end for the
previous partition. The fix is to make sure that
push_ready_fragments() emits everything.
Fixes #4786
(cherry picked from commit 9b8ac5ecbc)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.
Fixes #4589
(cherry picked from commit efa7951ea5)
There is no guarantee that rpc streaming makes progress in some time
period. Remove the keep alive timer in streaming to avoid killing the
session when the rpc streaming is just slow.
The keep alive timer is used to close the session in the following case:
n2 (the rpc streaming sender) streams to n1 (the rpc streaming receiver)
kill -9 n2
We need this because we do not kill the session when gossip thinks a
node is down: the node being down might only be temporary, and it would
be a waste to drop the work that has already been done, especially when
the stream session takes a long time.
Since range_streamer does not stream all data in a single stream
session (we stream 10% of the data at a time, and we have retry logic),
I think it is fine to kill a stream session when gossip thinks a node
is down. This patch changes the code to close all stream sessions with
a node that gossip thinks is down.
Message-Id: <bdbb9486a533eee25fcaf4a23a946629ba946537.1551773823.git.asias@scylladb.com>
(cherry picked from commit b8158dd65d)
Message-Id: <4ebc544c85261873591fd5ac30043e693d74434a.1555466551.git.asias@scylladb.com>
When --abort-on-lsa-bad-alloc is enabled we want to abort whenever
we think we can be out of memory.
We covered failures due to bad_alloc thrown from inside of the
allocation section, but did not cover failures from reservations done
at the beginning of with_reserve(). Fix by moving the trap into
reserve().
Message-Id: <1553258915-27929-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 3356a085d2)
allocate_segment() can fail even though we're not out of memory, when
it's invoked inside an allocating section with the cache region
locked. That section may later succeed when retried after memory
reclamation.
We should ignore bad_alloc thrown inside the allocating section body
and fail only when the whole section fails.
Fixes #2924
Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit dafe22dd83)
Don't hold a reference to cleaned-up sstables, so that file descriptors
for their index and data files will be closed and consequently disk
space released.
Fixes #3735.
Backport note:
To reduce risk considerably, we'll not backport the mechanism to
release sstables introduced in the incremental compaction work.
Instead, only one sstable is passed to table::cleanup_sstables() at a
time (it won't affect performance because the operation is serialized
anyway), to make it easy to release the reference to the cleaned
sstable held by the compaction manager.
tests: release mode.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180914194047.26288-1-raphaelsc@scylladb.com>
(cherry picked from commit 5bc028f78b)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190416025801.15048-1-raphaelsc@scylladb.com>
All schema changes made to the node locally are serialized on a
semaphore which lives on shard 0. For historical reasons, they don't
queue but rather try to take the lock without blocking and retry on
failure with a random delay from the range [0, 100 us]. Contenders
which do not originate on shard 0 will have an extra disadvantage as
each lock attempt will be longer by the across-shard round trip
latency. If there is constant contention on shard 0, contenders
originating from other shards may keep losing the race for the lock.
Schema merge executed on behalf of a DDL statement may originate on
any shard. Same for the schema merge which is coming from a push
notification. Schema merge executed as part of the background schema
pull will originate on shard 0 only, where the application state
change listeners run. So if there are constant schema pulls, DDL
statements may take a long time to get through.
The fix is to serialize merge requests fairly, by using the blocking
semaphore::wait(), which is fair.
We don't have to back-off any more, since submit_to() no longer has a
global concurrency limit.
Fixes #4436.
Message-Id: <1555349915-27703-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 3fd82021b1)
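The difference between try-lock-with-random-backoff and a fair blocking wait can be illustrated outside Seastar with a toy ticket lock (a hypothetical sketch, not Scylla's semaphore): waiters are served strictly in arrival order, so a contender that pays extra cross-shard latency cannot be starved by faster local contenders.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical FIFO "ticket" lock: each waiter takes a ticket and is
// woken exactly when its ticket comes up, guaranteeing fairness.
class fair_lock {
    std::mutex _m;
    std::condition_variable _cv;
    unsigned long _next_ticket = 0;   // next ticket to hand out
    unsigned long _now_serving = 0;   // ticket currently allowed in
public:
    void lock() {
        std::unique_lock<std::mutex> lk(_m);
        auto ticket = _next_ticket++;
        // Block (fairly) until it is this waiter's turn, instead of
        // try-locking and retrying after a random delay.
        _cv.wait(lk, [&] { return ticket == _now_serving; });
    }
    void unlock() {
        std::lock_guard<std::mutex> lk(_m);
        ++_now_serving;
        _cv.notify_all();
    }
    unsigned long waiters_served() const { return _now_serving; }
};
```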
The problem happens after a schema change because we fail to properly
remove an ongoing compaction, which stopped being tracked, from the
list that is used to calculate the backlog. As a result, a compaction
read monitor (which ceases to exist after compaction ends) may be used
after being freed.
Fixes #4410.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190409024936.23775-1-raphaelsc@scylladb.com>
(cherry-picked from commit 8a117c338a)
Varint and decimal types serialization did not update the output
iterator after generating a value, which could lead to corrupted
sstables - variable-length integers were properly serialized,
but if anything followed them directly in the buffer (e.g. in a tuple),
their value would be overwritten.
Fixes #4348
Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
json_test.FromJsonInsertTests.complex_data_types_test
json_test.ToJsonSelectTests.complex_data_types_test
Note that dtests still do not succeed 100% due to formatting
differences in the compared results (e.g. 1.0e+07 vs 1.0E7), but it's
no longer a query correctness issue.
(cherry picked from commit 287a02dc05)
When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:
    utils::phased_barrier::phase_type creation_phase() const {
        assert(_reader);
        return _reader_creation_phase;
    }
That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but it finds
the new range to be empty:
    if (!_reader || _reader_creation_phase != phase) {
        if (_last_key) {
            auto cmp = dht::ring_position_comparator(*_cache._schema);
            auto&& new_range = _range.split_after(*_last_key, cmp);
            if (!new_range) {
                _reader = {};
                return make_ready_future<mutation_fragment_opt>();
            }
Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.
If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called for this case, it just makes the value more appropriate for the
new semantics.
Tests:
- unit.row_cache_test (debug)
Fixes #4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 69775c5721)
Introduced in 2a437ab427.
regular_compaction::select_sstable_writer() creates the sstable writer
when the first partition is consumed from the combined mutation
fragment stream. It gets the schema directly from the table
object. That may be a different schema than the one used by the
readers if there was a concurrent schema alter during that small time
window. As a result, the writing consumer attached to readers will
interpret fragments using the wrong version of the schema.
One effect of this is storing values of some columns under a different
column.
This patch replaces all column_family::schema() accesses with accesses
to the _schema member which is obtained once per compaction and is
the same schema which readers use.
Fixes #4304.
Tests:
- manual tests with hard-coded schema change injection to reproduce the bug
- build/dev/scylla boot
- tests/sstable_mutation_test
Message-Id: <1551698056-23386-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 58e7ad20eb)
A race condition takes place when one of the sstables selected for
snapshot is deleted by compaction. The snapshot fails because it tries
to link an sstable that was previously unlinked by compaction's sstable
deletion.
Refs #4051.
(master commit 1b7cad3531)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190110194048.26051-1-raphaelsc@scylladb.com>
When bootstrapping, a node should wait for schema agreement with its
peers before it can join the ring. This is to ensure it can
immediately accept writes. Failing to reach schema agreement before
joining is not fatal, as the node can pull unknown schemas on writes
on-demand. However, if such a schema contains references to UDFs, the
node will reject writes using it, due to #3760.
To ensure that schema agreement is reached before joining the ring,
`storage_service::join_token_ring()` has two checks. First it checks that
at least one peer was connected previously. For this it compares
`database::get_version()` with `database::empty_version`. The (implied)
assumption is that this will become something other than
`database::empty_version` only after having connected (and pulled
schemas from) at least one peer. This assumption doesn't hold anymore,
as we now set the version earlier in the boot process.
The second check verifies that we have the same schema version as all
known, live peers. This check assumes (since 3e415e2) that we have
already "met" all (or at least some) of our peers and if there is just
one known node (us) it concludes that this is a single-node cluster,
which automatically has schema agreement.
It's easy to see how these two checks will fail. The first fails to
ensure that we have met our peers, and the second wrongfully concludes
that we are a one-node cluster, and hence have schema agreement.
To fix this, modify the first check. Instead of relying on the presence
of a non-empty database version, supposedly implying that we already
talked to our peers, explicitly make sure that we have really talked to
*at least* one other node, before proceeding to the second check, which
will now do the correct thing, actually checking the schema versions.
Fixes: #4196
Branches: 3.0, 2.3
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <40b95b18e09c787e31ba6c5519fb64d68b4ca32e.1550228389.git.bdenes@scylladb.com>
(cherry picked from commit 2125e99531)
In case salted_hash was NULL, we'd access uninitialized memory when dereferencing
the optional in get_as<>().
Protect against that by using get_opt() and failing authentication if we see a NULL.
Fixes #4168.
Tests: unit (release)
Branches: 3.0, 2.3
Message-Id: <20190211173820.8053-1-avi@scylladb.com>
(cherry picked from commit da9628c6dc)
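The pattern can be sketched with std::optional (a hypothetical illustration of the bug class, not the actual auth code): dereferencing an empty optional is undefined behavior, whereas a checked accessor lets the caller fail authentication gracefully on a NULL value.

```cpp
#include <optional>
#include <string>

// Models reading the salted_hash column; nullopt stands for a NULL value.
std::optional<std::string> load_salted_hash(bool present) {
    if (!present) {
        return std::nullopt;
    }
    return std::string("$6$salt$hash");
}

bool authenticate(const std::optional<std::string>& salted_hash) {
    if (!salted_hash) {
        // NULL hash: fail authentication instead of dereferencing the
        // empty optional (which would be undefined behavior).
        return false;
    }
    return !salted_hash->empty();
}
```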
"
The code reading counter cells from sstables verifies that there are no
unsupported local or remote shards. The latter are detected by checking
if all shards are present in the counter cell header (only remote shards
do not have entries there). However, the logic responsible for doing
that was incorrectly computing the total number of counter shards in a
cell if the header was larger than a single counter shard. This resulted
in incorrect complaints that remote shards are present.
Fixes #4206
Tests: unit(release)
"
* tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla:
tests/sstables: test counter cell header with large number of shards
sstables/counters: fix remote counter shard detection
(cherry picked from commit d2d885fb93)
"uuid" was ref:ed in a continuation. Works 99.9% of the time because
the continuation is not actually delayed (and assuming we begin the
checks with non-truncated (system) cf:s it works).
But if we do delay continuation, the resulting cf map will be
borked.
Fixes#4187.
Message-Id: <20190204141831.3387-1-calle@scylladb.com>
(cherry picked from commit 9cadbaa96f)
Change the test so that services are correctly torn down, in the
correct order (e.g., storage_service accesses the messaging_service
when stopping).
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112111.8521-2-duarte@scylladb.com>
(cherry picked from commit 495a92c5b6)
The original reference points to a thread-local storage object that is
guaranteed to outlive the continuation, but copying it makes the
subsequent calls point to a local object and introduces a
use-after-free bug.
Fixes #3948
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
(cherry picked from commit 68458148e7)
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) because _gate will be
disengaged.
One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.
This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>
Fixes #3931.
(cherry picked from commit 57e25fa0f8)
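The rearrangement for the strong exception guarantee follows a standard pattern, sketched below with a hypothetical toy barrier (the real phased_barrier is more involved): perform the allocation that may throw before mutating any member state, so a bad_alloc leaves the object fully intact.

```cpp
#include <memory>
#include <utility>

struct gate {};  // stand-in for the gate object the barrier owns

class barrier {
    std::unique_ptr<gate> _gate = std::make_unique<gate>();
public:
    void advance() {
        // Allocate first: if make_unique throws, _gate is untouched and
        // the barrier remains in a valid state (strong guarantee).
        auto new_gate = std::make_unique<gate>();
        _gate = std::move(new_gate);  // nothrow from here on
    }
    bool engaged() const { return _gate != nullptr; }
};
```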
In (almost) all SSTable write paths, we need to inform the monitor that
the write has failed as well. The monitor will remove the SSTable from
controller's tracking at that point.
Except there is one place where we are not doing that: streaming of big
mutations. Streaming of big mutations is an interesting use case, in
which the write is done in two parts: if the writing of the SSTable
fails right away, then we do the correct thing.
But the SSTables are not committed at that point and the monitors are
still kept around with the SSTables until a later time, when they are
finally committed. Between those two points in time, it is possible that
the streaming code will detect a failure and manually call
fail_streaming_mutations(), which marks the SSTable for deletions. At
that point we should propagate that information to the monitor as well,
but we don't.
Fixes #3732 (hopefully)
Tests: unit (release)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181114213618.16789-1-glauber@scylladb.com>
(cherry picked from commit 9f403334c8)
In commit a33f0d6, we changed the way we handle arrays during the write
and parse code to avoid reactor stalls. Some potentially big loops were
transformed into futurized loops, and also some calls to vector resizes
were replaced by a reserve + push_back idiom.
The latter broke parsing of the estimated histogram. The reason being
that the vectors that are used here are already initialized internally
by the estimated_histogram object. Therefore, when we push_back, we
don't fill the array all the way from index 0, but end up with a zeroed
beginning and only push back some of the elements we need.
We could revert this array to a resize() call. After all, the reason we
are using reserve + push_back is to avoid calling the constructor member
for each element, but we don't really expect the integer specialization
to do any of that.
However, to avoid confusing future developers who may feel tempted to
convert this as well for the sake of consistency, it is safer to just
make sure these arrays are zeroed.
Fixes #3918
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181116130853.10473-1-glauber@scylladb.com>
(cherry picked from commit c6811bd877)
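The bug described above can be reproduced with a minimal sketch (hypothetical parse functions, not the estimated_histogram code itself): when the destination vector is already initialized, reserve + push_back appends after the existing zeroes instead of filling from index 0.

```cpp
#include <cstddef>
#include <vector>

// Buggy pattern: the buckets vector is pre-initialized, so push_back
// lands after a zeroed prefix, doubling the size.
std::vector<long> parse_buggy(const std::vector<long>& input) {
    std::vector<long> buckets(input.size(), 0);  // pre-initialized buckets
    buckets.reserve(input.size());
    for (long v : input) {
        buckets.push_back(v);  // appends *after* the existing zeroes
    }
    return buckets;
}

// Correct pattern: overwrite the pre-initialized slots in place.
std::vector<long> parse_fixed(const std::vector<long>& input) {
    std::vector<long> buckets(input.size(), 0);
    for (std::size_t i = 0; i < input.size(); ++i) {
        buckets[i] = input[i];
    }
    return buckets;
}
```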
get_ranges() is supposed to return ranges in sorted order. However, a35136533d
broke this and returned the range that was supposed to be last in the second
position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). This broke cleanup,
which relied on the sort order to perform a binary search. Other users of the
get_ranges() family did not rely on the sort order.
Fixes #3872.
Message-Id: <20181019113613.1895-1-avi@scylladb.com>
(cherry picked from commit 1ce52d5432)
Fixes #3798
Fixes #3694
Tests:
unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)
* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
gms/gossiper: Replicate endpoint states in add_saved_endpoint()
gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
gms/gossiper: Always override states from older generations
(cherry picked from commit 48ebe6552c)
Int types in json will be serialized to int types in C++. They will then
only be able to handle 4GB, and we tend to store more data than that.
Without this patch, listsnapshots is broken in all versions.
Fixes: #3845
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181012155902.7573-1-glauber@scylladb.com>
(cherry picked from commit 98332de268)
The Antlr3 exception class has a null dereference bug that crashes
the system when trying to extract the exception message using the
ANTLR_Exception<...>::displayRecognitionError(...) function. When
a parsing error occurs, the CqlParser throws an exception which is in
turn processed for some special cases in Scylla to generate a custom
message. The default case, however, creates the message using
displayRecognitionError, causing the system to crash.
The fix is a simple workaround, making sure the pointer is not null
before the call to the function. A "proper" fix can't be implemented
because the exception class itself is implemented outside Scylla,
in antlr headers that reside on the host machine OS.
Manually tested two cases: a typo causing Scylla to crash, and a cql
comment without a newline at the end, which also caused Scylla to
crash.
Ran unit tests (release).
Fixes #3740
Fixes #3764
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>
(cherry picked from commit 20f49566a2)
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).
However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.
Fix by adding the correct incantation to the file.
Fixes #3799.
Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
(cherry picked from commit aaab8a3f46)
When validating assignment between two types, it's possible one of
them is wrapped in a reversed_type, if it comes, for example, from the
type associated with a clustering column. When checking for weak
assignment the types are correctly unwrapped, but not when checking
for an exact match, which this patch fixes.
Technically, the receiver is never a reversed_type for the current
callers, but this is the morally correct implementation, as the type
being reversed or not plays no role in assignment.
Tests: unit(release)
Fixes #3789
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-1-duarte@scylladb.com>
(cherry picked from commit 5e7bb20c8a)
We need to validate before calling query_options::prepare() whether
the set of prepared statement values sent in the query matches the
amount of names we need to bind, otherwise we risk an out-of-bounds
access if the client also specified names together with the values.
Refs #3688
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814225607.14215-1-duarte@scylladb.com>
(cherry picked from commit 805ce6e019)
Currently, both scylla-housekeeping-daily/-restart services mistakenly
specify the repo file path as "@@REPOFILES@@", which is copied from the
.in template and needs to be replaced with the actual path.
Fixes #3776
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180921031605.9330-1-syuu@scylladb.com>
(cherry picked from commit 21a12aa458)
The non-TLS RPC server has an rpc::resource_limits configuration that limits
its memory consumption, but the TLS server does not. That means a many-node
TLS configuration can OOM if all nodes gang up on a single replica.
Fix by passing the limits to the TLS server too.
Fixes #3757.
Message-Id: <20180907192607.19802-1-avi@scylladb.com>
(cherry picked from commit 4553238653)
Secondary index queries do not work correctly when multiple
restrictions are present - the rest of the restrictions is simply
ignored, which results in too many rows returned to the client.
This 2.3 fix makes these unsafe queries return an error instead.
Refs #3754
Message-Id: <7e470052d8ffc5bd8dc12e0d7f2705f0754afdbb.1536243391.git.sarna@scylladb.com>
When measuring_output_stream is used to calculate a result element's
size, it incorrectly takes into account not only the serialized element
size, but also a placeholder that the
ser::qr_partition__rows/qr_partition__static_row__cells constructors
put in the beginning. Fix it by taking the starting point in the stream
before element serialization and subtracting it afterwards.
Fixes #3755
Message-Id: <20180906153609.GJ2326@scylladb.com>
(cherry picked from commit d7674288a9)
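The fix direction can be sketched with a hypothetical toy measuring stream (not the actual ser:: code): snapshotting the position before serialization and subtracting it afterwards yields the element size alone, excluding any placeholder already in the stream.

```cpp
#include <cstddef>
#include <string>

// A stream that only tracks how many bytes would be written.
struct measuring_stream {
    std::size_t pos = 0;
    void write(const std::string& bytes) { pos += bytes.size(); }
};

std::size_t measure_element(measuring_stream& out, const std::string& element) {
    auto start = out.pos;    // snapshot *before* serializing the element
    out.write(element);
    return out.pos - start;  // element size only; placeholder bytes excluded
}
```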
Incorrect column_kind was passed, which may cause the wrong type to be
used for comparison if the schema contains static columns. Affects only
tests.
Spotted during code review.
Message-Id: <1531144991-2658-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 1336744a05)
The reloading flow may hold items in the underlying
loading_shared_values after they have been removed (e.g. via the
remove(key) API), so loading_shared_values.size() doesn't represent the
correct size of the loading_cache. lru_list.size(), on the other hand,
does.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
(cherry picked from commit 1e56c7dd58)
Reloading may hold value in the underlying loading_shared_values while
the corresponding cache values have already been deleted.
This may create weird situations like this:
    <populate cache with 10 entries>
    cache.remove(key1);
    for (auto& e : cache) {
        std::cout << e << std::endl;
    }
    <all 10 entries are printed, including the one for "key1">
In order to avoid such situations we are going to make
loading_cache::iterator a transform_iterator over lru_list::iterator
instead of loading_shared_values::iterator, because lru_list contains
entries only for cached items.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
(cherry picked from commit 945d26e4ee)
The code uses the incorrect output stream when only a digest is
requested, and thus gets an incorrect data size. Failing to correctly
account for the static row size while calculating the digest may cause
a mismatch between digest and data queries.
Fixes #3753.
Message-Id: <20180905131219.GD2326@scylladb.com>
(cherry picked from commit 98092353df)
Change the validity timeout from 1s to 1h in order to avoid false alarms
on busy systems: with a short value there is a chance that the
(loading_cache.size() == num_loaders) check is going to run after some
elements of the cache have already been evicted.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20180904193026.7304-1-vladz@scylladb.com>
(cherry picked from commit dae70e1166)
Commit e664f9b0c6 transitioned internal
CQL queries in the auth. sub-system to be executed with finite time-outs
instead of infinite ones.
It should have also modified the functions in `auth/roles-metadata.cc`
to have finite time-outs.
This change fixes some previously failing dtests, particularly around
repair. Without this change, the QUORUM query fails to terminate when
the necessary consistency level cannot be achieved.
Fixes #3736.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <e244dc3e731b4019f3be72c52a91f23ee4bb68d1.1536163859.git.jhaberku@scylladb.com>
(cherry picked from commit 682805b22c)
When a joining node announces its join status through gossip, other
existing nodes will send writes to the joining node. At this time, it
is possible the joining node hasn't learnt the tokens of other nodes,
which causes errors like the ones below:
    token_metadata - sorted_tokens is empty in first_token_index!
    storage_proxy - Failed to apply mutation from 127.0.4.1#0:
    std::runtime_error (sorted_tokens is empty in first_token_index!)
To fix, wait for the token range setup before announcing the join
status.
Fixes: #3382
Tests: 60 run of materialized_views_test.py:TestMaterializedViews.add_dc_during_mv_update_test
Message-Id: <01abb21ae3315ae275297e507c5956e5774557ef.1536128531.git.asias@scylladb.com>
(cherry picked from commit 89b769a073)
When test.py is run with --jenkins flag Boost UTF is asked to generate
an XML file with the test results. This automatically disables the
human-readable output printed to stdout. There is no real reason to do
so and it is actually less confusing when the Boost UTF messages are in
the test output together with Scylla logger messages.
Message-Id: <20180704172913.23462-1-pdziepak@scylladb.com>
(cherry picked from commit 07a429e837)
When /etc/systemd/system/scylla-server.service.d/capabilities.conf is
not installed, we don't have /etc/systemd/system/scylla-server.service.d/,
so we need to create it.
Fixes #3738
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180904015841.18433-1-syuu@scylladb.com>
(cherry picked from commit bd8a5664b8)
This ensures that row::external_memory_usage() is invariant to
insertion order of cells.
It should be, so that the accounting of a clustering_row, merged from
multiple MVCC versions by the partition_snapshot_flat_reader on behalf
of a memtable flush, doesn't give a greater result than what is used
by the memtable region. Overaccounting leads to an assertion failure in
~flush_memory_accounter.
Fixes #3625 (hopefully).
Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 4fb3f7e8eb)
"This series introduces a few improvements related to a reload flow.
From now on the callback may assume that the "key" parameter value
is kept alive till the end of its execution in the reloading flow.
It may also safely evict as many items from the cache as needed."
Fixes #3606
* 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla:
utils::loading_cache: hold a shared_value_ptr to the value when we reload
utils::loading_cache::on_timer(): remove not needed capture of "this"
utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
(cherry picked from commit f6aadd8077)
When periodically reloading the values in the loading_cache, we would
iterate over the list of entries and call the load() function for
those which need to be reloaded.
For some concrete caches, load() can remove the entry from the LRU set,
and can be executed inline from the parallel_for_each(). This means we
could potentially keep iterating using an invalidated iterator.
Fix this by using a temporary container to hold those entries to be
reloaded.
Spotted when reading the code.
Also use if constexpr and fix the comment in the function containing
the changes.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712124143.13638-1-duarte@scylladb.com>
(cherry picked from commit 63b63b0461)
The continuation attached to _load() needs the key of the loaded entry
to check whether it was disposed during the load. However, if _load()
invalidates the entry, the continuation's capture line will access
invalid memory while trying to obtain the key.
To avoid this save a copy of the key before calling _load() and pass it
to both _load() and the continuation.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>
(cherry picked from commit 2e7bf9c6f9)
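The copy-the-key-first pattern can be sketched in isolation (a hypothetical helper, not the loading_cache code): the copy is taken before the load function runs, so even if loading destroys the original entry, the continuation still works with valid memory.

```cpp
#include <functional>
#include <string>

// Copy the key *before* invoking the load function, so the code running
// after load() (the "continuation") never touches the original entry,
// which load() may have invalidated.
std::string load_and_track(const std::function<void(const std::string&)>& load,
                           const std::string& entry_key) {
    auto key = entry_key;  // safe copy taken up front
    load(key);             // may destroy the original entry and its key
    return key;            // continuation uses the copy, not the entry
}
```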
This error is transient, since as soon as the node is up we will be able
to send the migration request. Downgrade it to a warning to reduce anxiety
among people who actually read the logs (like QA).
The message is also badly worded as no one can guess what a migration
request is, but that is left to another patch.
Fixes #3706.
Message-Id: <20180821070200.18691-1-avi@scylladb.com>
(cherry picked from commit 5792a59c96)
memtable flushes for system and regular region groups run under the
memtable_scheduling_group, but the controller adjusts shares based on
the occupancy of the regular region group.
It can happen that regular is not under pressure, but system is. In
this case the controller will incorrectly assign low shares to the
memtable flush of system. This may result in high latency and low
throughput for writes in the system group.
I observed writes to the system keyspace timing out (on scylla-2.3-rc2)
in the dtest: limits_test.py:TestLimits.max_cells_test, which went
away after this.
Fixes #3717.
Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 10f6b125c8)
There could be soft pressure, but soft-pressure flusher may not be
able to make progress (Refs #3716). It will keep trying to flush empty
memtables, which block on earlier flushes to complete, and thus
allocate continuations in memory. Those continuations accumulate in
memory and can cause OOM.
Meanwhile, each flush will take longer to complete. Due to scheduling
group isolation, the soft-pressure flusher will keep getting the CPU.
This causes bad_alloc and crashes of dtest:
limits_test.py:TestLimits.max_cells_test
Fixes #3717
Message-Id: <1535102520-23039-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2afce13967)
The flusher picks the memtable list which contains the largest region
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). But only the latest
memtable in the list can start flushing. It can happen that the
memtable corresponding to the largest region was already flushed to an
sstable (flush permit released), but not yet fsynced or moved to
cache, so it's still in the memtable list.
The latest memtable in the winning list may be small, or empty, in
which case the soft pressure flusher will not be able to make much
progress. There could be other memtable lists with non-empty
(flushable) latest memtables. This can lead to writes unnecessarily
blocking on dirty.
I observed this for the system memtable group, where it's easy for the
memtables to overshoot small soft pressure limits. The flusher kept
trying to flush empty memtables, while the previous non-empty memtable
was still in the group.
The CPU scheduler makes this worse, because it runs memtable_to_cache
in a separate scheduling group, so it further defers in time the
removal of the flushed memtable from the memtable list.
This patch fixes the problem by making regions corresponding to
memtables which started flushing report evictable_occupancy() as 0, so
that they're picked by the flusher last.
Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 1e50f85288)
When the list of values in the IN list of a single column contains
duplicates, multiple executors are activated, since the assumption
is that each value in the IN list corresponds to a different partition.
This results in the same row appearing in the result as many times as
the partition value is duplicated.
Added queries to the IN-restriction unit test, together with a check
for the previously bad result.
Fixes #2837
Tests: queries as in the use case from the GitHub issue, in both
prepared and plain form (using the Python driver), plus the unit test.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>
(cherry picked from commit d734d316a6)
* dist/ami/files/scylla-ami c7e5a70...b7db861 (2):
> scylla-ami-setup.service: run only on first startup
> Use fstab to mount RAID volume on every reboot
(cherry picked from commit 54ac334f4b)
Since Linux aborts booting when it fails to mount fstab entries, the
user may not be able to see an error message when we use fstab to mount
/var/lib/scylla on the AMI.
Instead of aborting the boot, we can simply refuse to start
scylla-server.service when the RAID volume is not mounted, using the
RequiresMountsFor directive of the systemd unit file.
See #3640
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180824185511.17557-1-syuu@scylladb.com>
(cherry picked from commit ff55e3c247)
Fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.
This is the code that was removed from some list operations:
    std::experimental::optional<clustering_key> row_key;
    if (!column.is_static()) {
        row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
    }
    ...
    auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);
Put it back, in the form of common code in the update_parameters class.
Fixes #3703
* https://github.com/duarten/scylla cql-list-fixes/v1:
tests/cql_query_test: Test multi-cell static list updates with ckeys
cql3/lists: Fix multi-cell static list updates in the presence of ckeys
keys: Add factory for an empty clustering_key_prefix_view
(cherry picked from commit 6937cc2d1c)
_value_views is the authoritative data structure for the
client-specified values. Indeed, the ctor called
transport::request::read_options() leaves _values completely empty.
In query_options::prepare() we were, however, using _values to
associate values with the client-specified column names, and not
_value_views. Fix this by using _value_views instead.
As for the reasons we didn't see this bug earlier, I assume it's
because very few drivers set the 0x04 query options flag, which means
column names are omitted. This is the right thing to do since most
drivers have enough information to correctly position the values.
Fixes #3688
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814234605.14775-1-duarte@scylladb.com>
(cherry picked from commit a4355fe7e7)
After ac27d1c93b, if a read executor has just enough targets to
achieve the request's CL and a connection to one of them is dropped
during execution, a ReadFailed error is returned immediately and the
client does not get a chance to issue a speculative read (retry). The
patch changes the code to not return the ReadFailed error immediately,
but to wait for the timeout instead, giving the client a chance to
issue a speculative read in case the read executor does not have
additional targets to send speculative reads to by itself.
Fixes #3699.
Message-Id: <20180819131646.GK2326@scylladb.com>
(cherry picked from commit 7277ee2939)
When emplace_back() fails, value is already moved-from into a
temporary, which breaks monotonicity expected from
apply_monotonically(). As a result, writes to that cell will be lost.
The fix is to avoid the temporary by in-place construction of
cell_and_hash. To do that, appropriate cell_and_hash constructor was
added.
Found by mutation_test.cc::test_apply_monotonically_is_monotonic with
some modifications to the random mutation generator.
Introduced in 99a3e3a.
Fixes #3678.
Message-Id: <1533816965-27328-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 024b3c9fd9)
In a previous commit we moved debian/scylla-server.service to
debian/scylla-server.scylla-server.service to explicitly specify the
subpackage name, but this doesn't work for dh_installinit without the
'--name' option.
As a result, the current scylla-server .deb package is missing
scylla-server.service, so we need to rename the service back to the
original file name.
Fixes #3675
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180810221944.24837-1-syuu@scylladb.com>
(cherry picked from commit f30b701872)
"
This series addresses SELECT/INSERT JSON support issues, namely
handling null values properly and parsing decimals from strings.
It also comes with updated cql tests.
Tests: unit (release)
"
Fixes #3666
Fixes #3664
Fixes #3667
* 'json_fixes_3' of https://github.com/psarna/scylla:
cql3: remove superfluous null conversions in to_json_string
tests: update JSON cql tests
cql3: enable parsing decimal JSON values from string
cql3: add missing return for dead cells
cql3: simplify parsing optional JSON values
cql3: add handling null value in to_json
cql3: provide to_json_string for optional bytes argument
(cherry picked from commit 95677877c2)
When we use str.format() to pass variables in the message, it always
causes an exception like "KeyError: 'red'", since the message contains
color variables that are not passed to str.format().
To avoid the error we need to pass all format variables to colorprint()
and run str.format() inside the function.
Fixes #3649
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180803015216.14328-1-syuu@scylladb.com>
(cherry picked from commit ad7bc313f7)
Currently scylla_ec2_check exits silently when the EC2 instance is
optimized for Scylla; the result of the check is not visible, so we
need to output a message.
Note that this change affects the AMI login prompt too.
Fixes #3655
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180808024256.9601-1-syuu@scylladb.com>
(cherry picked from commit 15825d8bf1)
Since our scripts were converted to Python, we can no longer
source them from a shell. Execute them directly instead. Also,
we now need to import configuration variables ourselves, since
scylla_prepare, being an independent process, won't do it for
us.
Fixes #3647
Message-Id: <20180802153017.11112-1-avi@scylladb.com>
(cherry picked from commit c9caaa8e6e)
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.
This is consistent with the documentation of the function in its man
page.
As of Fedora 28, the function's behavior changed so that the encrypted
text is not `nullptr` on error, but is instead the string "*0".
The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):
Some implementations return `NULL` on failure, and others return an
_invalid_ hashed passphrase, which will begin with a `*` and will
not be the same as SALT.
Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.
With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.
Fixes #3637.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
(cherry picked from commit fce10f2c6e)
"
There is an exception safety problem in imr::utils::object. If multiple
memory allocations are needed and one of them fails, the main object is
going to be freed (as expected). However, at this stage it is not
constructed yet, so when the LSA asks its migrator for the size it may
get a meaningless value. The solution is to remember the size until the
object is fully created and to use sized deallocation in case of failure.
Fixes #3618.
Tests: unit(release, debug/imr_test)
"
(cherry picked from commit 3b42fcfeb2)
Currently rpc::closed_error is not counted towards replica failure
during read and thus read operation waits for timeout even if one
of the nodes dies. Fix this by counting rpc::closed_error towards
failed attempts.
Fixes #3590.
Message-Id: <20180708123522.GC28899@scylladb.com>
(cherry picked from commit ac27d1c93b)
The calculation consists of several parts with preemption points
between them, so a table can be added while the calculation is ongoing.
Do not assume that the table exists in the intermediate data structure.
Fixes #3636
Message-Id: <20180801093147.GD23569@scylladb.com>
(cherry picked from commit 44a6afad8c)
"
This series replaces infinite time-outs in internal distributed
(non-local) CQL queries with finite ones.
The implementation of tracing, which also performs internal queries,
already has finite time-outs, so it is unchanged.
Fixes #3603.
"
* 'jhk/finite_time_outs/v2' of https://github.com/hakuch/scylla:
Use finite time-outs for internal auth. queries
Use finite query time-outs for `system_distributed`
(cherry picked from commit 620e950fc8)
mock outputs files owned by root. This causes attempts
by scripts that want to junk the working directory (typically
continuous integration) to fail on permission errors.
Fixup those permissions after the fact.
Message-Id: <20180719163553.5186-1-avi@scylladb.com>
(cherry picked from commit b167647bf6)
"
This mini-series covers a regression caused by newest versions
of jsoncpp library, which changed the way of quoting UTF-8 strings.
Tests: unit (release)
"
* 'add_json_quoting_3' of https://github.com/psarna/scylla:
tests: add JSON unit test
types: use value_to_quoted_string in JSON quoting
json: add value_to_quoted_string helper function
Ref #3622.
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit d6ef74fe36)
Previously CQL grammar wrongfully required INSERT JSON queries
to provide a list of columns, even though they are already
present in JSON itself.
Unfortunately, tests were written with this false assumption as well,
so they are updated here too.
Message-Id: <33b496cba523f0f27b6cbf5539a90b6feb20269e.1532514111.git.sarna@scylladb.com>
Fixes #3631.
(cherry picked from commit f66aace685)
"
The problem happens under the following circumstances:
- we have a partially populated partition in cache, with a gap in the middle
- a read with no clustering restrictions trying to populate that gap
- eviction of the entry for the lower bound of the gap concurrent with population
The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals by clearing cache.
Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.
The problem is in ensure_population_lower_bound(), which returns true if
the current clustering range covers all rows, meaning that the populator
has the right to set the continuity flag to true on the row it inserts.
This is correct only if the current population range actually starts
before all clustering rows. Otherwise, we're populating from _last_row
and should consult it.
Fixes #3608.
"
* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
row_cache: Fix violation of continuity on concurrent eviction and population
position_in_partition: Introduce is_before_all_clustered_rows()
(cherry picked from commit 31151cadd4)
In case population of the vector throws, the vector object would not
be destroyed. It's a managed object, so in addition to causing a leak,
it would corrupt memory if later moved by the LSA, because it would
try to fixup forward references to itself.
Caused sporadic failures and crashes of row_cache_test, especially
with allocation failure injector enabled.
Introduced in 27014a23d7.
Message-Id: <1531757764-7638-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 3f509ee3a2)
`query_partition_key_range()` does the final result merging and trimming
(if necessary) to make sure we don't send more rows to the client than
requested. This merging and trimming is done by a continuation attached
to the `query_partition_key_range_concurrent()` which does the actual
querying. The continuations captures via value the `row_limit` and
`partition_limit` fields of the `query::read_command` object of the
query. This has an unexpected consequence. The lambda object is
constructed after the call to `query_partition_key_range_concurrent()`
returns. If this call doesn't defer, any modifications to the read
command object made by `query_partition_key_range_concurrent()` will be
visible to the lambda. This is undesirable because
`query_partition_key_range_concurrent()` updates the read command object
directly as the vnodes are traversed, which in turn will result in the
lambda doing the final trimming according to a decremented `row_limits`,
which will cause the paging logic to declare the query as exhausted
prematurely because the page will not be full.
To avoid all this make a copy of the relevant limit fields before
`query_partition_key_range_concurrent()` is called and pass these copies
to the continuation, thus ensuring that the final trimming will be done
according to the original page limits.
Spotted while investigating a dtest failure on my 1865/range-scans/v2
branch. On that branch the way range scans are executed on replicas is
completely refactored. These changes apparently reduce the number of
continuations in the read path to the point where an entire page can be
filled without deferring and thus causing the problem to surface.
Fixes #3605.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>
(cherry picked from commit cc4acb6e26)
Since some AMIs use consistent network device naming, the primary NIC
ifname is not 'eth0'.
But we hardcoded the NIC name as 'eth0' in scylla_ec2_check, so we need
to add a --nic option to specify a custom NIC ifname.
Fixes #3584
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712142446.15909-1-syuu@scylladb.com>
(cherry picked from commit ee61660b76)
Drop scylla_lib.sh, since all bash scripts depending on the library
have already been converted to Python 3, and all scylla_lib.sh features
are implemented in scylla_util.py.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180711114756.21823-1-syuu@scylladb.com>
(cherry picked from commit 58e6ad22b2)
"Converted more scripts to python3."
* 'script_python_conversion2_v2' of https://github.com/syuu1228/scylla:
dist/common/scripts/scylla_util.py: make run()/out() functions shorter
dist/ami: install python34 to run scylla_install_ami
dist/common/scripts/scylla_ec2_check: move ec2 related code to class aws_instance
dist/common/scripts: drop class concolor, use colorprint()
dist/ami/files/.bash_profile: convert almost all lines to python3
dist/common/scripts: convert node_exporter_install to python3
dist/common/scripts: convert scylla_stop to python3
dist/common/scripts: convert scylla_prepare to python3
(cherry picked from commit 693cf77022)
In the removenode operation, if message servicing is stopped, e.g., due
to disk I/O error isolation, the node can keep retrying the
REPLICATION_FINISHED verb infinitely.
A Scylla log full of messages like the following was observed:
[shard 0] storage_service - Fail to send REPLICATION_FINISHED to $IP:0:
seastar::rpc::closed_error (connection is closed)
To fix, limit the number of retries.
Tests: update_cluster_layout_tests.py
Fixes #3542
Message-Id: <638d392d6b39cc2dd2b175d7f000e7fb1d474f87.1529927816.git.asias@scylladb.com>
(cherry picked from commit bb4d361cf6)
drop_column_family now waits for both writes and reads in progress.
It solves possible liveness issues with row cache, when column_family
could be dropped prematurely, before the read request was finished.
The phaser operation is passed inside the database::query() call.
There are other places where reading logic is applied (e.g. view
replicas), but these are guarded with different synchronization
mechanisms, while _pending_reads_phaser applies to regular reads only.
Fixes #3357
Reported-by: Duarte Nunes <duarte@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <d58a5ee10596d0d62c765ee2114ac171b6f087d2.1529928323.git.sarna@scylladb.com>
(cherry picked from commit 03753cc431)
Currently sysconfig_parser.get() returns the parameter including double
quotes, which causes a problem when appending text using
sysconfig_parser.set().
Fixes #3587
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180706172219.16859-1-syuu@scylladb.com>
(cherry picked from commit 929ba016ed)
In bash, a local variable declaration is a separate operation with its
own exit status (always 0), therefore constructs like
    local var=`cmd`
will always result in a 0 exit status ($? value) regardless of the
actual result of the "cmd" invocation.
To overcome this we should split the declaration and the assignment,
like this:
    local var
    var=`cmd`
Fixes #3508
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1529702903-24909-3-git-send-email-vladz@scylladb.com>
(cherry picked from commit 7495c8e56d)
We broke build_ami.sh when we dropped Ubuntu support: the
scylla_current_repo command does not finish because of a missing
argument ('--target' with no distribution name, since $TARGET is always
blank now).
It needs to be hardcoded as centos.
Fixes #3577
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180705035251.29160-1-syuu@scylladb.com>
(cherry picked from commit 3bcc123000)
Use is_debian()/is_ubuntu() to detect the target distribution, and also
install pystache by path, since the package name differs between Fedora
and CentOS.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180703193224.4773-1-syuu@scylladb.com>
(cherry picked from commit 3cb7ddaf68)
Gentoo Linux was not supported by the node_health_check script
which resulted in the following error message displayed:
"This s a Non-Supported OS, Please Review the Support Matrix"
This patch adds support for Gentoo Linux while adding a TODO note
to add support for authenticated clusters which the script does
not support yet.
Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180703124458.3788-1-ultrabug@gentoo.org>
(cherry picked from commit 8c03c1e2ce)
"
In the same way that drivers can route requests to a coordinator that
is also a replica of the data used by the request, we can allow
drivers to route requests directly to the shard. This patchset
adds and documents a way for drivers to know which shard a connection
is connected to, and how to perform this routing.
"
* tag 'shard-info-alt/v1' of https://github.com/avikivity/scylla:
doc: documented protocol extension for exposing sharding
transport: expose more information about sharding via the OPTIONS/SUPPORTED messages
dht: add i_partitioner::sharding_ignore_msb()
(cherry picked from commit 33d7de0805)
Use query::is_single_partition() to check whether the queried ranges are
singular or not. The current method of using
`dht::partition_range::is_singular()` is incorrect, as it is possible to
build a singular range that doesn't represent a single partition.
`query::is_single_partition()` correctly checks for this so use it
instead.
Found during code-review.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>
(cherry picked from commit 8084ce3a8e)
After the transition to the new in-memory representation in
aab6b0ee27 'Merge "Introduce new in-memory
representation for cells" from Paweł'
atomic_cell_or_collection::external_memory_usage() stopped accounting
for the externally stored data. Since it wasn't covered by the unit
tests, the bug remained unnoticed until now.
This series fixes the memory usage calculation and adds proper unit
tests.
* https://github.com/pdziepak/scylla.git fix-external-memory-usage/v1:
tests/mutation: properly mark atomic_cells that are collection members
imr::utils::object: expose size overhead
data::cell: expose size overhead of external chunks
atomic_cell: add external chunks and overheads to external_memory_usage()
tests/mutation: test external_memory_usage()
(cherry picked from commit 2ffb621271)
do_fetch_page() checks in the beginning whether there is a saved query
state already, meaning this is not the first page. If there is not, it
checks whether the query is for a single partition or a range scan,
to decide whether to enable stateful queries or not. This check
assumed that there is at least one range in _ranges, which does not
hold under some circumstances. Add a check for _ranges being empty.
Fixes: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>
(cherry picked from commit 59a30f0684)
"
Added NIC / disk existence checks and a --force-raid mode to
scylla_raid_setup.
"
* 'scylla_setup_fix4' of https://github.com/syuu1228/scylla:
dist/common/scripts/scylla_raid_setup: verify specified disks are unused
dist/common/scripts/scylla_raid_setup: add --force-raid to construct raid even only one disk is specified
dist/common/scripts/scylla_setup: don't accept disk path if it's not block device
dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device
dist/common/scripts/scylla_sysconfig_setup: verify NIC existance
(cherry picked from commit a36b1f1967)
scylla_install_pkg was initially written for the one-liner installer,
but now it is only used for creating the AMI, and it is just a few
lines of code, so it should be merged into the scylla_install_ami
script.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-2-syuu@scylladb.com>
(cherry picked from commit 084c824d12)
"
I found problems in the previously submitted patchsets 'scylla_setup
fixes' and 'more fixes for scylla_setup', so I fixed them and merged
them into one patchset.
Also added a few more patches.
"
* 'scylla_setup_fix3' of https://github.com/syuu1228/scylla:
dist/common/scripts/scylla_setup: allow input multiple disk paths on RAID disk prompt
dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk specified
dist/common/scripts/scylla_raid_setup: fix module import
dist/common/scripts/scylla_setup: check disk is used in MDRAID
dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer on scylla_fstrim_setup
dist/common/scripts/scylla_setup: use print() instead of logging.error()
dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux
dist/common/scripts/scylla_coredump_setup: run os.remove() when deleting directory is symlink
dist/common/scripts/scylla_setup: don't include the disk on unused list when it contains partitions
dist/common/scripts/scylla_setup: skip running rest of the check when the disk detected as used
dist/common/scripts/scylla_setup: add a disk to selected list correctly
dist/common/scripts/scylla_setup: fix wrong indent
dist/common/scripts: sync instance type list for detect NIC type to latest one
dist/common/scripts: verify systemd unit existance using 'systemctl cat'
(cherry picked from commit 0b148d0070)
"
If a coordinator sends write requests with ID=X and restarts, it may get a reply to
the request after it restarts and sends another request with the same ID (but to
different replicas). This condition will trigger an assert in a coordinator. Drop
the assertion in favor of a warning and initialize handler id in a way to make
this situation less likely.
Fixes: #3153
"
* 'gleb/write-handler-id' of github.com:scylladb/seastar-dev:
storage_proxy: initialize write response id counter from wall clock value
storage_proxy: drop virtual from signal(gms::inet_address)
storage_proxy: do not assert on getting an unexpected write reply
(cherry picked from commit a45c3aa8c7)
When nodetool repair is used with the combination of the "-pr" (primary
range) and "-local" (only repair with nodes in the same DC) options,
Scylla needs to define the "primary ranges" differently: Rather than
assign one node in the entire cluster to be the primary owner of every
token, we need one node in each data-center - so that a "-local"
repair will cover all the tokens.
Fixes #3557.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180701132445.21685-1-nyh@scylladb.com>
(cherry picked from commit 3194ce16b3)
Introduced in 5b59df3761.
It is incorrect to erase entries from the memtable being moved to
cache if partition update can be preempted because a later memtable
read may create a snapshot in the memtable before memtable writes for
that partition are made visible through cache. As a result the read
may miss some of the writes which were in the memtable. The code was
checking for presence of snapshots when entering the partition, but
this condition may change if update is preempted. The fix is to not
allow erasing if update is preemptible.
This also caused SIGSEGVs because we were assuming that no such
snapshots will be created and hence were not invalidating iterators on
removal of the entries, which results in undefined behavior when such
snapshots are actually created.
Fixes SIGSEGV in dtest: limits_test.py:TestLimits.max_cells_test
Fixes #3532
Message-Id: <1530129009-13716-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit b464b66e90)
"
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate cache, which will appear as partition deletion or writes
to the static row being lost. Until node restart or eviction of the
partition entry.
There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads cache populator to assume there is no
partition in sstables and it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to that
partition until it is evicted from cache or node is restarted.
Disable until a more elaborate fix is implemented.
Fixes #3552
Fixes #3553
"
* tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla:
tests: Add test for slicing a mutation source with date tiered compaction strategy
tests: Check that database conforms to mutation source
database: Disable sstable filtering based on min/max clustering key components
(cherry picked from commit e1efda8b0c)
Fixes #3546
Both older Origin and Scylla write "known" compressor names (i.e. those
in the Origin namespace) unqualified (i.e. LZ4Compressor).
This behaviour was not preserved in the virtualization change, but
probably should be.
Message-Id: <20180627110930.1619-1-calle@scylladb.com>
(cherry picked from commit 054514a47a)
"
Cache tracker is a thread-local global object that indirectly depends on
the lifetimes of other objects. In particular, a member of
cache_tracker: mutation_cleaner may extend the lifetime of a
mutation_partition until the cleaner is destroyed. The
mutation_partition itself depends on LSA migrators which are
thread-local objects. Since there is no direct dependency between
LSA migrators and cache_tracker, it is not guaranteed that the former
won't be destroyed before the latter. The easiest (barring some unit
tests that repeat the same code several billion times) solution is to
stop using globals.
This series also improves the part of LSA sanitiser that deals with
migrators.
Fixes #3526.
Tests: unit(release)
"
* tag 'deglobalise-cache-tracker/v1-rebased' of https://github.com/pdziepak/scylla:
mutation_cleaner: add disclaimer about mutation_partition lifetime
lsa: enhance sanitizer for migrators
lsa: formalise migrator id requirements
row_cache: deglobalise row cache tracker
This works around a problem of std::terminate() being called in debug
mode build if initialization of _current throws.
Backtrace:
Thread 2 "row_cache_test_" received signal SIGABRT, Aborted.
0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
#1 0x00007ffff17d077d in abort () from /lib64/libc.so.6
#2 0x00007ffff5773025 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007ffff5770c16 in ?? () from /lib64/libstdc++.so.6
#4 0x00007ffff576fb19 in ?? () from /lib64/libstdc++.so.6
#5 0x00007ffff5770508 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
#6 0x00007ffff3ce4ee3 in ?? () from /lib64/libgcc_s.so.1
#7 0x00007ffff3ce570e in _Unwind_Resume () from /lib64/libgcc_s.so.1
#8 0x0000000003633602 in reader::reader (this=0x60e0001160c0, r=...) at flat_mutation_reader.cc:214
#9 0x0000000003655864 in std::make_unique<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (__args#0=...)
at /usr/include/c++/7/bits/unique_ptr.h:825
#10 0x0000000003649a63 in make_flat_mutation_reader<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (args#0=...)
at flat_mutation_reader.hh:440
#11 0x000000000363565d in make_forwardable (m=...) at flat_mutation_reader.cc:270
#12 0x000000000303f962 in memtable::make_flat_reader (this=0x61300001d540, s=..., range=..., slice=..., pc=..., trace_state_ptr=..., fwd=..., fwd_mr=...)
at memtable.cc:592
Message-Id: <1528792447-13336-1-git-send-email-tgrabiec@scylladb.com>
"
The read path on coordinator involves a lot of passing around buffers
and some occasional processing. We start with query::result obtained
from the storage_proxy which is then transformed into a
cql3::result_set, which is then used to write a response. Buffers are
copied and linearised quite excessively.
This series attempts to remedy that by using view of fragmented buffers
as much as possible. The first part deals with reading from
query::result. ser::buffer_view is introduced which enables the IDL
infrastructure to read a buffer without copying or linearising it.
The second part is switching native protocol layer to use bytes_ostream
instead of std::vector<char> to hold the generated response to the
client. The last part introduces cql3::result_generator which is an
alternative to cql3::result_set that passes buffer views without copying
or linearising anything from query::result to the native protocol layer
(or Thrift). It is only used in simple cases, when no processing at the
CQL layer is required, except for paged queries which require some
simple interpretation of the results and are supported by the result
generator.
Tests: unit(release), dtests(paging_test.py paging_additional_test.py
cql_additional_tests.py cql_tracing_test.py cql_prepared_test.py
cql_cast_test.py cql_tests.py)
"
* tag 'buffer-views-query-result/v2' of https://github.com/pdziepak/scylla: (34 commits)
cql3: select_statement: use fetch_page_generator() if possible
pager: add fetch_page_generator()
pager: make the visitor handle_result() accept a template parameter
pager: make query_result_visitor base class a template parameter
pager: make myvistor a member class of query_pager
pager: make shared pointers to selection constant
pager: merge query_pager and query_pagers::impl
cql3: select_statement: use result_generator if possible
cql3: selection: add is_trivial()
cql3: result: support result_generator
cql3: add lazy result_generator
cql3: add result class
cql3::result_set: fix encapsulation
thrift: use cql3::result_set visiting interface
transport: use cql3::result_set visiting interface
cql3::result_set: add visit()
transport: response: add write_int_placeholder()
transport: steal response buffers and make send zero-copy
transport: use reusable_buffer for compression
transport: response: use bytes_ostream
...
mutation_cleaner has already caused problems by extending lifetime of
mutation_partition past the lifetime of LSA migrators that it uses (due
to the fact that both the cleaner and migrators were thread-local
globals). Since the long-term goal is to make the mutation_partition
internal representation depend more and more on the schema, that lifetime
extension may again cause problems in the future, so let's add a
disclaimer that will hopefully help avoid them.
The current LSA sanitizer performs only basic checks on the migrators'
use, without doing any additional reporting in case an error is detected.
This patch enhances it so that when a problem is detected the relevant
stack traces get printed.
object_descriptor uses special encoding for migrator ids which assumes
that the valid ones are in a range smaller than uint32_t. Let's add some
static asserts that make this fact more visible.
Row cache tracker has numerous implicit dependencies on other objects
(e.g. LSA migrators for data held by mutation_cleaner). The fact that
both cache tracker and some of those dependencies are thread local
objects makes it hard to guarantee correct destruction order.
Let's deglobalise the cache tracker and put it in the database class.
So far query_result_visitor was tied to result_set_builder. The goal is
to enable result_generator to work with paged queries as well so we need
to decouple them.
Shared pointers make code harder to reason about. It is not easy to get
rid of them in this piece of the code, but we can restore at least a bit
of sanity by adding consts.
There is just a single implementation of query_pager and there is no
reason to make anything virtual. Devirtualising this code will allow
higher layers to pass visitors via templates.
cql3::result can now hold either a result_set or a result_generator.
Some code that is not performance critical expects to get result_set so
a way of converting the result_generator to a result_set is added.
result_generator is a restricted alternative to result_set. It supports
only the simplest cases, but is much cheaper as it passes data almost
directly from query::result to its visitor bypassing much of the CQL
layer.
So far the only way of returning the result of a CQL query was to build a
result_set. An alternative lazy result generator is going to be
introduced for the simple cases when no transformations at the CQL layer
are needed. To do that we need to hide the fact that there are going to
be multiple representations of CQL results from the users.
This visiting interface for result_set satisfies most of its users (at
least all of those which are in the hot path). It will allow having an
alternative to result_set (i.e. a lazy result generator) which would
provide exactly the same interface.
This allows the response writer to defer writing integers until later
time. It will be used by lazy response generator which will know the
number of rows in the response only after they are all written.
Compression algorithms require us to linearise bytes_ostream. This may
cause an excessive number of large allocations. Using reusable_buffers
can avoid that.
std::vector<char> is not a very good container for incrementally
building a response. It may cause excessive copies and allocations. If
the response is large it will put more pressure on the memory allocator
by requiring the buffer to be contiguous.
We already have bytes_ostream which avoids all of these problems, so
let's use it.
So far cql_server::response was passed around using shared pointers.
They have the very big cost of making it hard to reason about the code.
All that is not necessary and we can easily switch to using the much more
sensible std::unique_ptr.
There are some other translation units which right now are satisfied
with the response being an incomplete type. This means that
std::unique_ptr can't be used for it. Let's move the class declaration
to a header that can be included where needed.
This commit adds a helper class reusable_buffer which can be used to
avoid excessive memory allocations of large buffers when bytes_ostream
needs to be linearised. The idea is that reusable_buffer in most cases
is going to be thread local so that multiple continuation chains can
reuse the same large buffer.
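The reusable-buffer idea described above can be sketched as follows. This is a minimal illustration, not Scylla's actual class: the names, interface, and use of std::string fragments are all assumptions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Minimal sketch of a reusable linearisation buffer: keep one long-lived
// allocation that grows to the largest size ever requested, and linearise
// fragmented data into it instead of allocating a fresh contiguous buffer
// for every request.
class reusable_buffer {
    std::vector<char> _buf; // retained between uses; grows, never shrinks
public:
    // Linearise a list of fragments into the retained buffer and return a
    // view over the resulting contiguous bytes.
    std::string_view linearise(const std::vector<std::string>& fragments) {
        size_t total = 0;
        for (const auto& f : fragments) {
            total += f.size();
        }
        if (_buf.size() < total) {
            _buf.resize(total); // grow to the largest size seen so far
        }
        size_t off = 0;
        for (const auto& f : fragments) {
            std::memcpy(_buf.data() + off, f.data(), f.size());
            off += f.size();
        }
        return std::string_view(_buf.data(), total);
    }
    size_t retained_capacity() const { return _buf.size(); }
};
```

As the commit message notes, such a buffer would typically be thread_local so that multiple continuation chains reuse the same large allocation.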
query::result_view already operates on views of a serialised
query::result. However, until now the value of a cell was always
linearised and copied. This patch makes use of ser::buffer_view to avoid
that.
ser::buffer_view is a view of a fragmented buffer in a stream of
IDL-serialised data. It can be used to deserialise IDL objects without
needless copying and linearisation of large blobs.
"
Converted all setup scripts from bash to python3.
"
* 'scripts_python_conversion_v1' of https://github.com/syuu1228/scylla:
dist/common/scripts: convert scylla_kernel_check to python3
dist/common/scripts: convert scylla_ec2_check to python3
dist/common/scripts: convert scylla_sysconfig_setup to python3
dist/common/scripts: convert scylla_setup to python3
dist/common/scripts: convert scylla_selinux_setup to python3
dist/common/scripts: convert scylla_raid_setup to python3
dist/common/scripts: convert scylla_ntp_setup to python3
dist/common/scripts: convert scylla_fstrim_setup to python3
dist/common/scripts: convert scylla_dev_mode_setup to python3
dist/common/scripts: convert scylla_cpuset_setup to python3
dist/common/scripts: convert scylla_cpuscaling_setup to python3
dist/common/scripts: convert scylla_coredump_setup to python3
dist/common/scripts: convert scylla_bootparam_setup to python3
dist/common/scripts: extend scylla_util.py to convert setup scripts to python3
dist/common/scripts: convert scylla_io_setup and scylla_util.py to python3
The name "column_family" is both awkward and obsolete. Rename to
the modern and accurate "table".
An alias is kept to avoid huge code churn.
To prevent a One Definition Rule violation, a preexisting "table"
type is moved to a new namespace row_cache_stress_test.
Tests: unit (release)
Message-Id: <20180624065238.26481-1-avi@scylladb.com>
This patchset brings support for writing range tombstones to SSTables
3.x ('mc' format).
In SSTables 3.x, range tombstones are represented by so-called range
tombstone markers (hereafter RT markers) that denote range tombstone
start and end bounds. So each range tombstone is represented in the data
file by two ordered RT markers.
There are also markers that both close the previous range tombstone and
open the new one in case two range tombstones are adjacent. This is
done to consume less disk space on such occasions.
Range tombstones written as RT markers are naturally non-overlapping.
* github.com:argenet/scylla projects/sstables-30/write-range-tombstones/v6
range_tombstone_stream: Remove an unused boolean flag.
Revert "Add missing enum values to bound_kind."
sstables: Move to_deletion_time helper up and make it static.
sstables: Write end-of-partition byte before flushing the last index
block.
sstables: Add support for writing range tombstones in SSTables 3.x
format.
tests: Add unit test covering simple range tombstone.
tests: Add unit test covering adjacent range tombstones.
tests: Add test to cover non-adjacent RTs.
tests: Add test covering mixed rows and range tombstones.
tests: Add test covering SSTables 3.x with many RTs.
tests: Add unit test covering overlapping RTs and rows.
tests: Add tests writing a range tombstone and a row overlapping with
its start.
tests: Add tests writing a range tombstone and a row overlapping with
its end.
tests: Add function that writes from multiple memtables into SSTables.
tests: Add test where 2nd range tombstone covers the remainder of the
1st one.
tests: Add test writing two non-adjacent range tombstones with same
clustering key prefix at their bounds.
tests: Add test covering overlapped range tombstones.
"
This series addresses issue #3516 and enhances space watchdog to make it
device-aware. It's needed because since last MV-related changes, space
watchdog can be responsible for multiple hints managers, which means
multiple directories, which may mean multiple devices.
Hence, having a single static space size limit is not enough anymore
and the watchdog should take into account that different managers
may work on different disks, while yet other managers can share
the same device.
Tests: unit (release)
"
* 'enhance_space_watchdog_4' of https://github.com/psarna/scylla:
hints: reserve more space for dedicated storage
hints: add is_mountpoint function
hints: make space_watchdog device-aware
hints: add device_id to manager
hints: add get_device_id function
Reserving 10% of space for hints managers makes sense if the device
is shared with other components (like /data or /commitlog).
But, if hints directory is mounted on a dedicated storage, it makes
sense to reserve much more - 90% was chosen as a sane limit.
Whether storage is 'dedicated' or not is based on a simple check of
whether the given hints directory is a mount point.
Fixes #3516
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
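The mount-point check mentioned above can be sketched like this. It is a hedged approximation, not the actual hints-manager code: a directory is a mount point when it lives on a different device than its parent, or when it is its own parent (the filesystem root).

```cpp
#include <string>
#include <sys/stat.h>

// Sketch (assumed logic): compare the device of a directory with that of
// its parent. A differing st_dev means a filesystem boundary; equal inode
// and device for "." and ".." means we are at the root.
bool is_mountpoint(const std::string& path) {
    struct stat self{};
    struct stat parent{};
    if (::stat(path.c_str(), &self) != 0) {
        return false; // nonexistent path: not a mount point
    }
    if (::stat((path + "/..").c_str(), &parent) != 0) {
        return false;
    }
    return self.st_dev != parent.st_dev || self.st_ino == parent.st_ino;
}
```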
Instead of having one static space limit for all directories,
space_watchdog now keeps a per-device limit, shared among
hints managers residing on the same disks.
References #3516
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
In order to make space_watchdog device-aware, device_id field
is added to the hints manager. It's an equivalent of stat.st_dev
and it identifies the disk that contains the manager's root directory.
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
In order to distinguish which directories reside on which devices,
get_device_id function is added to resource manager.
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
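A minimal sketch of such a helper, based on the stat.st_dev description above (the signature and error handling are assumptions, not the resource manager's real API):

```cpp
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Sketch: return st_dev for the given directory, so that directories
// residing on the same disk map to the same device id.
dev_t get_device_id(const std::string& path) {
    struct stat st{};
    if (::stat(path.c_str(), &st) != 0) {
        return 0; // treat an unreachable path as "unknown device"
    }
    return st.st_dev;
}
```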
To port the setup scripts to python3, the following utility
functions/classes are introduced:
- run(): execute command line, returns return code
- out(): execute command line, returns stdout as string
- is_debian_variant() / is_redhat_variant() / is_gentoo_variant()
/ is_ec2() / is_systemd(): detect specific environment
- hex2list(): implement hex2list.py code as a function
- makedirs(): same as os.makedirs() but does nothing when the dir already exists
- dist_name() / dist_ver(): alias of platform.dist()
- class systemd_unit: a utility to control a systemd unit using systemctl
- class sysconfig_parser: reader/writer of /etc/sysconfig files
- class concolor: ANSI color escape sequences list
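The hex2list() helper can be sketched as follows. This is a hedged illustration of the idea (the real script's exact output format, e.g. range compression like "0-3", is an assumption): turn a hex CPU mask such as "0x5" into a comma-separated list of set bit positions.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Sketch: parse the hex mask and emit the index of every set bit,
// comma-separated, in ascending order.
std::string hex2list(const std::string& hex_mask) {
    uint64_t mask = std::strtoull(hex_mask.c_str(), nullptr, 16);
    std::string out;
    for (int cpu = 0; cpu < 64; ++cpu) {
        if (mask & (uint64_t(1) << cpu)) {
            if (!out.empty()) {
                out += ",";
            }
            out += std::to_string(cpu);
        }
    }
    return out;
}
```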
Very often people use the issue tracker to just ask questions. We have
been telling them to close the bug and move the discussion somewhere
else but it would be better if people were already directed to the right
place before they even get it wrong.
This would be easier for everybody.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180621135051.3254-1-glauber@scylladb.com>
This comes in handy when we want to test overlapping range tombstones
because memtable would otherwise de-overlap them internally.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Tests three cases:
- a row lying inside a range tombstone
- a row that has the same clustering key as range tombstone start
- a row that has the same clustering key as range tombstone end
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
These are two RTs where one's RT end clustering is the same as another
one's RT start bound but they are both exclusive.
In this case those bounds should not (and cannot) be merged into a
single RT boundary when writing RT markers.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
For SSTables 3.x ('mc' format), range tombstones are represented by
their bounds that are written to the data file as so-called RT markers.
For adjacent range tombstones, an RT marker can be of a 'boundary' type
which means it closes the previous range tombstone and opens the new
one.
Internally, sstable_writer_m relies on range_tombstone_stream to both
de-overlap incoming range tombstones and order them so that when they
are drained they can be easily thought of as just pairs of their bounds.
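The boundary-marker decision described above and in the non-adjacent RT test can be sketched as follows. This is an assumed model for illustration, not Scylla's sstable writer: bounds at the same clustering position can merge into one boundary marker unless both are exclusive, since an exclusive end and an exclusive start at the same position leave that point uncovered.

```cpp
#include <string>

// Simplified bound: a clustering prefix plus inclusiveness.
struct bound {
    std::string clustering;
    bool inclusive;
};

enum class marker_emission { two_markers, one_boundary };

// Decide how to emit the end of one range tombstone followed by the start
// of the next. Assumes the input stream was already de-overlapped, as
// range_tombstone_stream guarantees.
marker_emission plan_markers(const bound& prev_end, const bound& next_start) {
    if (prev_end.clustering != next_start.clustering) {
        return marker_emission::two_markers; // not adjacent at all
    }
    if (!prev_end.inclusive && !next_start.inclusive) {
        return marker_emission::two_markers; // both exclusive: cannot merge
    }
    return marker_emission::one_boundary; // close-and-open boundary marker
}
```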
By default Scylla docker runs without the security features.
This patch adds support for the user to supply different params values for the
authenticator and authorizer classes and allowing to setup a secure Scylla in
Docker.
For example if you want to run a secure Scylla with password and authorization:
docker run --name some-scylla -d scylladb/scylla --authenticator
PasswordAuthenticator --authorizer CassandraAuthorizer
Update the Docker documentation with the new command line options.
Signed-off-by: Noam Hasson <noam@scylladb.com>
Message-Id: <20180620122340.30394-1-noam@scylladb.com>
The current .bash_profile prints "Constructing RAID volume..." while
scylla_ami_setup is still running, even when running on unsupported
instance types.
To avoid that we need to run the instance type check first, then we can
run the rest of the script.
Fixes #2739
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180613111539.30517-1-syuu@scylladb.com>
"
Make sure we properly handle row marker and row tombstone
when reading a row.
Tests: unit {release}
"
* 'haaawk/sstables3/read-liveness-info-v4' of ssh://github.com/scylladb/seastar-dev:
sstable: consume row marker in data_consume_rows_context_m
sstable: Add consumer_m::consume_row_marker_and_tombstone
sstable: add is_set and to_row_marker to liveness_info
* https://github.com/vladzcloudius/scylla.git tracing_prepared_parameters-v6:
cql3::query_options: add get_names() method
tracing::trace_state: hide the internals of params_values
tracing: store queries statements for BATCH
tracing: store the prepared statements parameters values
"
A few fixes in scripts that were found when debugging #3508.
This series fixes this issue.
"
Fixes #3508
* 'ami_scripts_fixes-v1' of https://github.com/vladzcloudius/scylla:
scylla_io_setup: properly define the disk_properties YAML hierarchy
scylla_io_setup: fix a typo: s/write_bandwdith/write_bandwidth/
scylla_io_setup: hardcode the "mountpoint" YAML node to "/var/lib/scylla" for AMIs
scylla_io_setup: print the io_properties.yaml file name and not its handle info
scylla_lib.sh: tolerate perftune.py errors
"
We are seeing some workloads with large datasets where the compaction
controller ends up with a lot of shares. Regardless of whether or not
we'll change the algorithm, this patchset handles a more basic issue,
which is the fact that the current controller doesn't set a maximum
explicitly, so if the input is larger than the maximum it will keep
growing without bounds.
It also pushes the maximum input point of the compaction controller from
10 to 30, allowing us to err on the side of caution for the 2.2 release.
"
* 'tame-controller' of github.com:glommer/scylla:
controller: do not increase shares of controllers for inputs higher than the maximum
controller: adjust constants for compaction controller
"
This mini series fixes some querier-cache related issues discovered
while working on stateful range-scans.
1) A problem in the memory based cache eviction test that is as yet
unexposed (#3529).
2) Possible usage of invalidated iterators in querier_cache (#3424).
3) lookup() possibly returning a querier with the wrong read range
(#3530).
Tests: unit(release)
"
* 'fix-querier-cache-invalid-iterators-master' of https://github.com/denesb/scylla:
querier: find_querier(): return end() when no querier matches the range
querier_cache: restructure entries storage
tests/querier_cache: fix memory based eviction test
batch_statement::verify_batch_size() verifies that the total size of
mutations generated by the batch statement is smaller than certain
configurable thresholds. This is done by a custom mutation_partition
visitor, which violates atomic_cell_view::value() preconditions by
calling it even for dead cells.
The simplest solution is to use
mutation_partition::external_memory_usage() instead.
Message-Id: <20180619131405.12601-1-pdziepak@scylladb.com>
When dropping a table, wait for the column family to quiesce so that
no pending writes compete with the truncate operation, possibly
allowing data to be left on disk.
Fixes #2562
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180618193134.31971-1-duarte@scylladb.com>
Patch f39891a999 fixed #3443,
but also introduced a regression in dtest - a new column
was unconditionally added to views during ALTER TABLE ADD,
while it should only be the case for "include all columns" views.
This patch fixes the regression (spotted by query_new_column_test).
References #3443
Message-Id: <7410d965255a514d78cf0ce941a3236b9d8ddbbd.1529399135.git.sarna@scylladb.com>
When none of the queriers found for the lookup key match the lookup
range `_entries.end()` should be returned as the search failed. Instead
the iterator returned from the failed `std::find_if()` is returned
which, if the find failed, will be the end iterator returned by the
previous call to `_entries.equal_range()`. This is incorrect because as
long as `equal_range()`'s end iterator is not also `_entries.end()` the
search will always return an iterator to a querier regardless of whether
any of them actually matches the read range.
Fix by returning `_entries.end()` when it is detected that no queriers
match the range.
Fixes: #3530
Currently querier_cache uses a `std::unordered_map<utils::UUID, querier>`
to store cache entries and an `std::list<meta_entry>` to store meta
information about the querier entries, like insertion order, expiry
time, etc.
All cache eviction algorithms use the meta-entry list to evict entries
in reverse insertion order (LRU order). To make this possible
meta-entries keep an iterator into the entry map so that given a
meta-entry one can easily erase the querier entry. This however poses a
problem as std::unordered_map can possibly invalidate all its iterators
when new items are inserted. This is a use-after-free waiting to happen.
Another disadvantage of the current solution is that it requires the
meta-entry to use a weak pointer to the querier entry so that in case
it is removed (as a result of a successful lookup) it doesn't try to
access it. This has an impact on all cache eviction algorithms as they
have to be prepared to deal with stale meta-entries. Stale meta-entries
also unnecessarily consume memory.
To solve these problems redesign how querier_cache stores entries
completely. Instead of storing the entries in an `std::unordered_map`
and storing the meta-entries in an `std::list`, store the entries in an
`std::list` and an intrusive-map (index) for lookups. This new design
has several advantages over the old one:
* The entries will now be in insert order, so eviction strategies can
work on the entry list itself, no need to involve additional data
structures for this.
* All data related to an entry is stored in one place, no data
duplication.
* Removing an entry automatically removes it from the index as intrusive
containers support auto unlink. This means there is no need to store
iterators long-term, risking use-after-free when the container
invalidates its iterators.
Additional changes:
* Modify eviction strategies so that they work with the `entry`
interface rather than the stored value directly.
Ref #3424
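The redesign can be sketched with standard containers. The types and names below are illustrative (the real code uses boost::intrusive with auto-unlink hooks): entries live in a std::list in insertion order, so eviction walks the list directly, and a separate index maps keys to list iterators. Unlike std::unordered_map, std::list iterators stay valid across insertions, which removes the invalidation hazard described above.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <string>

struct querier_entry {
    int key;
    std::string state;
};

class querier_cache_sketch {
    std::list<querier_entry> _entries; // insertion (LRU) order
    std::multimap<int, std::list<querier_entry>::iterator> _index;
public:
    void insert(int key, std::string state) {
        _entries.push_back({key, std::move(state)});
        _index.emplace(key, std::prev(_entries.end()));
    }
    // A successful lookup removes the entry: queriers are single-use.
    bool lookup(int key, std::string& out) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return false;
        }
        out = std::move(it->second->state);
        _entries.erase(it->second);
        _index.erase(it);
        return true;
    }
    // Evict the oldest entry straight off the entry list; no stale
    // meta-entries can exist because entry and index are removed together.
    bool evict_oldest() {
        if (_entries.empty()) {
            return false;
        }
        auto range = _index.equal_range(_entries.front().key);
        for (auto i = range.first; i != range.second; ++i) {
            if (i->second == _entries.begin()) {
                _index.erase(i);
                break;
            }
        }
        _entries.pop_front();
        return true;
    }
    std::size_t size() const { return _entries.size(); }
};
```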
Do increment the key counter after inserting the first querier into the
cache. Otherwise two queriers with the same key will be inserted and
will fail the test. This problem is exposed by the changes the next
patches make to the querier-cache but is fixed beforehand to maintain
bisectability of the code.
Fixes: #3529
This is to stay compliant with the Origin for SSTables 3.x.
It differs from SSTables 2.x (ka/la) as for those the last promoted
index block is pushed first and the end-of-partition byte is written
after.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Right now there is no limit to how much the shares of the controllers
can grow. That is not a big problem for the memtable flush controller,
since it has a natural maximum in the dirty limit.
But the compaction controller, the way it's written today, can grow
forever and end up with a very large value for shares. We'll cap that at
adjust() time by not allowing shares to grow indefinitely.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Right now the controller adjusts its shares based on how big the backlog
is in comparison to shard memory. We have seen in some tests that if the
dataset becomes too big, this may cause compactions to dominate.
While we may change the input altogether in future versions, I'd like to
propose a quick change for the time being: move the high point from 10x
memory size to 30x memory size. This will cause compactions to increase
in shares more slowly.
While this is as magic as the 10 before, it will allow us to err on
the side of caution, with compactions not becoming aggressive enough to
overly disrupt workloads.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
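The adjustment described in the two commits above can be sketched as a clamped linear mapping. All constants and the exact shape are assumptions for illustration: shares grow linearly with the backlog relative to shard memory, the input saturates at the 30x-memory high point, and the output is capped so shares can never grow without bound.

```cpp
#include <algorithm>

// Sketch: map backlog into [min_shares, max_shares], saturating the input
// at 30x the shard's memory so shares stop growing past the high point.
float compaction_shares(float backlog, float shard_memory,
                        float min_shares = 10, float max_shares = 1000) {
    float max_input = 30 * shard_memory; // high point: 30x memory size
    float x = std::min(backlog, max_input) / max_input; // normalised [0, 1]
    return min_shares + x * (max_shares - min_shares);
}
```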
"
This patchset runs the protocol servers under the "statement" scheduling
group, and makes all execution_stages in that path scheduling aware.
I used inheriting_concrete_execution_stage instead of passing the
scheduling group to concrete_execution_stage's constructor for two
reasons:
1. For cql statements, there is no easily accessible object that
can host the concrete_execution_stage and be reached from both
main.cc and the statements,
2. In the future, we will want to assign users to different
scheduling_groups, thus providing performance isolation for
service-level agreements (SLAs). Using an inheriting
execution_stage allows us to make the scheduling_group decision
in one place.
Depends on two unmerged patches in seastar, one fixing
inheriting_concrete_execution_stage compilation with reference parameters,
and one making smp::submit_to() scheduling aware.
"
* tag 'cql-sched/v1' of https://github.com/avikivity/scylla:
cql: make modification_statement execution_stage scheduling aware
cql: make batch_statement execution_stage scheduling aware
cql: make select_statement execution_stage scheduling aware
transport: make native protocol request processing execution_stage scheduling aware
main: start client protocol servers under the statement scheduling group
* seastar e7275e4...6422ece (7):
> build: enable concepts whenever they are supported by compiler
> shared_ptr: Enable releasing ownership of the object stored in lw_shared_ptr
> reactor: change way of calculating task quota violations
> Merge "Add metrics for steal time and task quota violations" from Glauber
> bitops.hh/log2ceil(): add special case for n == 1
> circular_buffer: add clear()
> build: add core/execution_stage.{cc,hh} to core_files
"
Tests: unit (release)
Before merging the LCS controller, we merged patches that would
guarantee that LCS would move towards zero backlog - otherwise the
backlog could get too high.
We didn't do the same for STCS, our first controlled strategy. So we may
end up with a situation where there are many SSTables inducing a large
backlog, but they are not yet meeting the minimum criteria for
compaction. The backlog, then, never goes down.
This patch changes the SSTable selection criteria so that if there is
nothing to do, we'll keep pushing towards reaching a state of zero
backlog. Very similar to what we did for LCS.
"
* 'stcs-min-threshold-v4' of github.com:glommer/scylla:
STCS: bypass min_threshold unless configured to enforce strictly
compaction_strategy: allow the user to tell us if min_threshold has to be strict
If we fail to produce a SizeTiered compaction with the configured
min_threshold, we can try again to compact any two - unless there is a
global bypass telling us not to.
This will still privilege doing larger compactions in size buckets where
that is possible, but if we are idle we will try to compact any two.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
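The selection fallback can be sketched as follows. This is a deliberately simplified model (buckets are just groups of sstable sizes; the real STCS bucketing is richer): prefer the largest bucket meeting min_threshold, and if none qualifies and strict enforcement is off, fall back to any bucket with at least two sstables so the backlog keeps shrinking.

```cpp
#include <cstddef>
#include <vector>

// Sketch: pick a group of sstables to compact. Returns an empty vector
// when there is nothing to do.
std::vector<int> pick_bucket(const std::vector<std::vector<int>>& buckets,
                             std::size_t min_threshold, bool strict) {
    // Prefer the largest bucket that meets min_threshold.
    const std::vector<int>* best = nullptr;
    for (const auto& b : buckets) {
        if (b.size() >= min_threshold && (!best || b.size() > best->size())) {
            best = &b;
        }
    }
    if (best) {
        return *best;
    }
    if (strict) {
        return {}; // min_threshold strictly enforced: nothing to do
    }
    // Fallback: compact any two, to keep pushing the backlog towards zero.
    for (const auto& b : buckets) {
        if (b.size() >= 2) {
            return b;
        }
    }
    return {};
}
```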
Now that we have the controller, we would like to take min_threshold as
a hint. If there is nothing to compact, we can ignore that and start
compacting less than min_threshold SSTables so that the backlog keeps
reducing.
But there are cases in which we don't want min_threshold to be a hint
and we want to enforce it strictly. For instance, if write amplification
is more of a concern than space amplification.
This patch adds a YAML option that allows the user to tell us that. We will
default to false, meaning min_threshold is not strictly enforced.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"
Implement and test support for reading counters in SSTables 3.
"
* 'haaawk/sstables3/read-counters-v2' of ssh://github.com/scylladb/seastar-dev:
sstable_3_x_test: add test for counters
data_consume_rows_context_m: support reading counters
Add consumer_m::consume_counter_column
Extract make_counter_cell
row.hh & mp_row_consumer.hh: Add required includes
Use serialization_header::adjust in read_statistics
sstables 3: add serialization_header::adjust
data_consume_rows_context_m: add is_column_counter
data_consume_rows_context_m: Remove unused CELL_PATH_SIZE state
column_translation: add is_counter
Currently the SSTable test is failing (at least for me and Raphael),
complaining about the file it tries to write already existing. We have
helpers now to generate temporary directories, so we should use them.
The test passes after that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180614210036.16662-1-glauber@scylladb.com>
In SSTables 3, min timestamp and min deletion time in serialization
header are not stored normally but instead the difference between
their value and the cassandra "epoch" is stored.
This is supposed to make SSTables smaller. As a consequence, we have
to add the "epoch" after reading the values to obtain the actual
values of min timestamp and min deletion time.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
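The adjustment described above can be sketched as follows. The struct is illustrative, and the epoch constants (Cassandra's EncodingStats epoch, 2015-09-22 00:00:00 UTC) are quoted from memory and should be treated as assumptions: the deltas stored on disk are added back to the epoch to recover the real values.

```cpp
#include <cstdint>

// Assumed Cassandra "epoch" constants used as the baseline for the deltas.
constexpr int64_t timestamp_epoch_us = 1442880000000000LL; // microseconds
constexpr int32_t deletion_time_epoch_s = 1442880000;      // seconds

// Sketch: the serialization header stores differences from the epoch,
// and adjust() adds the epoch back to obtain the actual values.
struct serialization_header_sketch {
    int64_t min_timestamp_delta;
    int32_t min_deletion_time_delta;

    int64_t min_timestamp() const {
        return timestamp_epoch_us + min_timestamp_delta;
    }
    int32_t min_deletion_time() const {
        return deletion_time_epoch_s + min_deletion_time_delta;
    }
};
```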
Memtable entries should be cleaned using the memtable cleaner, which
unlike the cache's cleaner is not associated with the cache
tracker. It's an error to clean a snapshot using a tracker which doesn't
own the entries. This will corrupt the cache tracker's row counter.
Fixes failure of test_exception_safety_of_update_from_memtable from
row_cache.cc in debug mode and with allocation failure injection
enabled.
Introduced in "cache: Defer during partition merging"
(70c72773be).
Message-Id: <1528988256-20578-1-git-send-email-tgrabiec@scylladb.com>
Previously max_shard_disk_space_size was unconditionally initialized
with the capacity of hints_directory. But, it's likely that
hints_directory doesn't exist at all if hinted handoff is not enabled,
which results in Scylla failing to boot.
So, max_shard_disk_space_size is now initialized with the capacity
of hints_for_views directory, which is always present.
This commit also moves max_shard_disk_space_size to the .cc file
where it belongs - resource_manager.cc.
Tests: unit (release)
Message-Id: <9f7b86b6452af328c05c5c6c55bfad3382e12445.1528977363.git.sarna@scylladb.com>
"
This series adds the following datetime functions to CQL:
- currentTimestamp
- currentDate
- currentTime
- currentTimeUUID
- timeUUIDToDate
- timestampToDate
- timeUUIDToTimestamp
- dateToTimestamp
- timeUUIDToUnixTimestamp
- timestampToUnixTimestamp
- dateToUnixTimestamp
It also comes with datetime conversions test added to cql_query_test.
Note: issue #2949 also mentioned queries like:
$ SELECT * FROM myTable WHERE date >= currentDate() - 2d;
but it's a broader topic of supporting arithmetic operations in general,
so it's moved to #3499.
Tests: unit (release)
"
* 'support_datetime_functions_3' of https://github.com/psarna/scylla:
tests: add datetime conversions to cql_query_tests
cql3: add time conversion functions
cql3: add current* time functions
types: add time_native_type
CentOS 7.4 supports using ambient capabilities in systemd unit
files, but some other RHEL7-compatible environments don't, which causes
a Scylla startup failure.
To avoid the issue, move the AmbientCapabilities line to
/etc/systemd/system/scylla.server.service.d/, and install the .conf only
when both systemd and the kernel support the feature.
Fixes #3486
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180613232327.7839-1-syuu@scylladb.com>
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.
This degrades scan performance of Leveled Compaction Strategy tables.
Fixes #3513 (as a workaround).
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>
"
After issue #3501 it turned out that IDL generates incorrect
serialization code for fragmented buffers. This series addresses
the problem by:
* providing serialization code for FragmentRange
* changing IDL generation rules for fragmented buffers, so they
expect a lower layer to iterate over fragments
* adding a test to cql_query_test suite that covers #3501
* adding a test to idl_tests suite that covers fragmented serialization
"
* 'fix_fragmented_serialization_3' of https://github.com/psarna/scylla:
tests: add fragmented serialization test to idl_tests
tests: add long text value test
idl: remove for_each from fragmented serialization
serializer: add FragmentRange serialization
Previously fragmented buffers of bytes were serialized
with a for_each loop. Since serializing bytes involves writing
the size first and then the data, only the first fragment (and its size)
would be taken into account.
This commit changes fragmented code generation so it expects
that serialized range has a serialize(output, T) specification
and expects it to iterate over fragments on its own (just like
serializer for basic_value_view does).
Fixes #3501
Serialization for FragmentRange classes is added to the serialization
suite. It first serializes the total length to a 32-bit field and then
writes each fragment to the output.
References #3501
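The wire layout described above can be sketched like this. Container types and byte order are simplifying assumptions (fragments modelled as strings, little-endian length prefix): write the total length once, then append every fragment's bytes, so the receiver sees one logical value regardless of fragmentation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch: 32-bit length prefix followed by the bytes of each fragment.
void serialize_fragments(std::string& out,
                         const std::vector<std::string>& fragments) {
    uint32_t total = 0;
    for (const auto& f : fragments) {
        total += f.size();
    }
    // Little-endian length prefix (the on-disk byte order is an assumption).
    out.push_back(char(total & 0xff));
    out.push_back(char((total >> 8) & 0xff));
    out.push_back(char((total >> 16) & 0xff));
    out.push_back(char((total >> 24) & 0xff));
    for (const auto& f : fragments) {
        out.append(f);
    }
}
```

Note how this avoids the bug described above: the length covers all fragments, not just the first one.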
disk_properties map should be an entry in the 'disk' list hierarchy.
Currently this list is going to contain a single element.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
In order to get a file name from the given file() handle one should use
a file_handle.name property.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
When we check the currently configured tuning mode, perftune.py is
allowed to return an error. get_tune_mode() has to be able to tolerate
that.
Store the prepared statement positional parameters values in the
corresponding system_traces.sessions entry in the 'parameters' column
(which has a map<text,text> type).
Parameters are stored as a pair of "param[X]" : "value", where X is
the index of the parameter starting from 0 and the "value" is the first
64 characters of the parameter's value string representation.
If parameters were given with their names attached (see the description
on bit 0x40 of QUERY flags in the CQL binary protocol specification) then
parameters are going to be stored in the "param[X](<bound variable name>)" : "value"
form.
If the value's string representation is longer than 64 characters then the "value" will
contain only the first 64 characters of it and will have "..." at
the end.
For a BATCH of prepared statements the parameter "name" will have a form of
param[Y][X] where Y is the index of the corresponding prepared statement
in the BATCH and X is the index of the parameter. Both X and Y start from
0.
Note:
Had to switch to boost::range::find() in sstables::big_sstable_set in order to
address the "ambiguous overload" compilation error.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Similarly to a regular QUERY or EXECUTE, we want to see the actual
statements of the queries that were part of the BATCH.
If a traced query has only a single statement to execute then its statement will be stored in the form 'query':'<statement>'.
If there are two or more queries (BATCH) then the statement of each query in the BATCH will be stored in the form 'query[X]':'<statement>', where X is the index of the query in the
BATCH, starting from 0.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Hide it inside the trace_state.cc in order to avoid future circular
dependencies with other .hh files.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
"
Many components limit their internal memory pools/caches/queues depending
on the amount of memory present in the system. Each of them uses the seastar
memory interface to get information about memory availability,
which makes it harder to (1) test the components with various memory
configurations and (2) see which components reserve memory and how
much each one reserves.
The patch changes all the components that rely on memory size to get this
information through configuration parameter during creation instead of
checking it directly with seastar, so only main interacts with seastar
allocator.
"
* 'gleb/memory-config-v2' of github.com:scylladb/seastar-dev:
Provide available memory size to compaction_manager object during creation
Configure authorized_prepared_statement_cache memory limit during object creation
Configure logalloc memory size during initialization
Provide cql max request limit to cql server object during creation
Configure query result memory limiter size limit during object creation
Configure querier_cache size limit during object creation
Provide available memory size to messaging_service object during creation
Provide available memory size to hinted handoff resource manager during creation
Provide available memory size to storage_proxy object during creation
Provide available memory size to commitlog during creation
Provide available memory size to database object during creation
Configure prepared_statements_cache memory limit from outside
The test uses random mutations. We saw it failing with bad_alloc from time to time.
Reduce concurrency to reduce memory footprint.
Message-Id: <20180611090304.16681-1-tgrabiec@scylladb.com>
It compares only timestamps, but it should use the intrinsic ordering of
the tombstone, which takes deletion time into consideration as well.
If we have two range tombstones with the same timestamp but different
deletion time (odd case, but still), then the one with the higher
deletion time should win. That's what all other parts of the system
use to resolve merges, in particular range_tombstone_list and
compact_mutation_state (the fragment stream compactor).
Not respecting this ordering violates the following equality:
do_compact(do_compact(m1) + m2) == do_compact(m1 + m2)
which may result in some clustered rows being missing in the
right-hand side, but not in the left-hand side, due to differences in
range tombstones.
This impacts only tests currently.
Message-Id: <1528705602-7218-1-git-send-email-tgrabiec@scylladb.com>
When I came across db/legacy_schema_migrator.cc, I had no idea what it
does and though I had obvious guesses (it somehow migrates old schemas,
right?) I didn't know what it really does. So after I figured this out,
I wrote this comment so the next person doesn't need to guess.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180605120225.25173-1-nyh@scylladb.com>
* 'systemd-coredump-debian9' of https://github.com/syuu1228/scylla:
dist/debian: fix pystache package name on Debian / Ubuntu
dist/debian: switch to systemd-coredump on Debian 9
dist/debian: rename 99-scylla.conf to 99-scylla-coredump.conf
It is faster than gossiper::is_normal because it avoids doing a search in
the std::map<application_state, versioned_value>. It is useful for the
code in the fast path which needs to query if a node is in NORMAL
status.
Fixes #3500
Message-Id: <42db91fa4108f9f4fcf94fed3ec403ccf35d15e9.1528354644.git.asias@scylladb.com>
"
Implement and test support for reading collections in SSTables 3.
Tests: unit {release}
"
* 'haaawk/sstables3/read-collections-v1' of ssh://github.com/scylladb/seastar-dev:
sstables 3: Add tests for reading collections
flat_mutation_reader_assertions: add more flexible asserts
data_consume_rows_context_m: add support for collections
mp_row_consumer_m: Add support for collections
data_consume_rows_context_m: introduce cell_path
Use column_translation::*_is_collection in reading
column_translation: add *_column_is_collection()
column_flags_m: add HAS_COMPLEX_DELETION
Use read_unsigned_vint_length_bytes for COLUMN_VALUE
Use read_unsigned_vint_length_bytes for CK_BLOCKS
Implement read_unsigned_vint_length_bytes
"
This series is for nodetool getsstables.
This patch is based on:
8daaf9833a
With some minor adjustments because of the code change in sstables.
The idea is to allow searching for all the sstables that contain a
given key.
After this patch, if there is a table t1 in keyspace k1 and it has a key
called aa, then
curl -X GET "http://localhost:10000/column_family/sstables/by_key/k1%3At1?key=aa"
will return the list of sstable file names that contain that key.
"
* 'amnon/sstable_for_key_v4' of github.com:scylladb/seastar-dev:
Add the API implementation to get_sstables_by_key
api: column_family.json make the get_sstables_for_key doc clearer
column_family: Add the get_sstables_by_partition_key method
sstable test: add has_partition_key test
sstable: Add has_partition_key method
keys_test: add a test for nodetool_style string
keys: Add from_nodetool_style_string factory method
The get_sstables_by_partition_key method is used by the API to return the set of
sstable names that hold a given partition key.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds a test for the has_partition_key method: it creates an
sstable with a partition key and then uses that key with
has_partition_key to verify that it is there.
It also creates a different key and uses it to verify that a non-existent
key returns false.
This reverts part of commit 364c2551c8. I mistakenly
changed the scylla-ami submodule in addition to applying the patch. The revert
keeps the intended part of the patch and undoes the scylla-ami change.
In 4b1034b (storage_service: Remove the stream_hints), we removed the
only user of the api with the column_families parameter.
std::vector column_families = { db::system_keyspace::HINTS };
streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint),
column_families);
We can simplify the code range_streamer a bit by removing it.
Fixes #3476
Tests: dtest update_cluster_layout_tests.py
Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>
It is useful for the client driver to know which shard is serving a
particular connection, so it can only send requests through that connection
which will be served by the same shard, eliminating a hop.
Support that by advertising a "SCYLLA_SHARD" option, with a value
corresponding to the shard number.
Acked-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180606203437.1198-1-avi@scylladb.com>
* seastar 12cffef...e7275e4 (9):
> tests: execution_stage_test: capture sg by value
> Merge "Add in-path parameter suport to the code generation" from Amnon
> Merge "Add scheduling_group inheritance to execution_stage" from Avi
> tutorial: explain how to find origin of exception
> tls: Ensure handshake always drains output before return/throw
> build: cmake: correct stdc++fs library name once more
> perftune.py: make sure config file existing before write
> Update travis-ci integration
> build: fix compilation issues on cmake. missing stdc++-fs
"
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.
Fixes #3483
"
* 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla:
tests/virtual_reader_test: Add test for built indexes virtual reader
db/system_keyspace: Add virtual reader for IndexInfo table
db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
index/secondary_index_manager: Expose index_table_name()
db/legacy_schema_migrator: Don't migrate indexes
If the reader's buffer is small enough, or preemption happens often
enough, fill_buffer() may not make enough progress to advance
_lower_bound. If, in addition, iterators are constantly invalidated across
fill_buffer() calls, the reader will not be able to make progress.
See row_cache_test.cc::test_reading_progress_with_small_buffer_and_invalidation()
for an example scenario.
Also reproduced in debug-mode row_cache_test.cc::test_concurrent_reads_and_eviction
Message-Id: <1528283957-16696-1-git-send-email-tgrabiec@scylladb.com>
There is no reason to use an std::set for it since we don't care about
the ordering - only about the existence of a particular entry.
A hash table will be more efficient for this use case.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528220892-5784-2-git-send-email-vladz@scylladb.com>
"
As in #3423, ensuring token order on secondary index queries can be done
by adding an additional column to views that back secondary indexes.
This column is a first clustering column and contains token value,
computed on updates.
This series also updates tests and comments referring to issue 3423.
Tests: unit (release, debug)
"
* 'order_by_token_in_si_5' of https://github.com/psarna/scylla:
cql3: update token order comments
index, tests: add token column to secondary index schema
view: add handling of a token column for secondary indexes
view: add is_index method
ec2_snitch::gossiper_starting() calls the base class (default) method
that sets _gossip_started to TRUE and thereby prevents the following
reconnectable_snitch_helper registration.
Fixes #3454
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>
In 455d5a5 (streaming memtables: coalesce incoming writes), we
introduced the delayed flush to coalesce incoming streaming mutations
from different stream_plan.
However, most of the time there will be one stream plan at a time; the
next stream plan won't start until the previous one is finished. So, the
current coalescing does not really work.
The delayed flush adds 2s of delay for each stream session. If we have lots
of tables to stream, we will waste a lot of time.
We stream a keyspace in around 10 stream plans, i.e., 10% of ranges at a
time. If we have 5000 tables, even if the tables are almost empty, the
delay will waste 5000 * 10 * 2 = 27 hours.
To stream a keyspace with 4 tables, each table has 1000 rows.
Before:
[shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master
[shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 125.21 KiB/s
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 8.233 seconds
After:
[shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master
[shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 4772.32 KiB/s
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 0.216 seconds
Fixes #3436
Message-Id: <cb2dde263782d2a2915ddfe678c74f9637ffd65b.1526979175.git.asias@scylladb.com>
Additional token column is now present in every view schema
that backs a secondary index. This column is always a first part
of the clustering key, so it forces token order on queries.
The column's name is ideally idx_token, but it can be suffixed
with a number to ensure its uniqueness.
It also updates tests to make them acknowledge the new token order.
Fixes #3423
In order to ensure token order on secondary index queries,
first clustering column for each view that backs a secondary index
is going to store a token computed from base's partition keys.
After this commit, if there exists a column that is not present
in base schema, it will be filled with computed token.
After 70c72773be it's possible that
open_version() is called with a phase which is smaller than the phase
of the latest version, because latest version belongs to the
in-progress cache update. In such a case we must return the existing
non-latest snapshot and not create a new version on top of the
in-progress update. Not doing this violates several invariants, and
may lead to inconsistencies, including violation of write atomicity or
temporary loss of writes.
partition_entry::read() was already adjusted by the aforementioned
commit. Do a similar adjustment for open_version().
Fixes sporadic failures of row_cache_test.cc::test_concurrent_reads_and_eviction
Message-Id: <1528211847-22825-1-git-send-email-tgrabiec@scylladb.com>
We mistakenly only added network-online.target, which doesn't promise to
wait for the /var/lib/scylla mount.
To do this we need local-fs.target.
Fixes #3441
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180521083349.8970-1-syuu@scylladb.com>
"
It turns out that compression just works for SSTables 3.x, thanks to
the previous work done on the write path.
This series cleans up tests a bit and introduces test for compression
on the read path.
"
* 'haaawk/sstables3/read-compression-v1' of ssh://github.com/scylladb/seastar-dev:
Add test for compression in sstables 3.x
Extract test_partition_key_with_values_of_different_types_read
sstable_3_x_test: use SEASTAR_THREAD_TEST_CASE
Drop UNCOMPRESSD_ when code will be used for compressed too
"
This patch adds nr_shards, msb_ignore, and the actual sharding algorithm to the
system.local table. Drivers and other tools can then make use of this
information to talk to scylla in an optimal way
"
* 'system_tables-v3' of github.com:glommer/scylla:
system_keyspace: add sharding information to local table
partitioner: export the name of the algorithm used to do intra-node sharding
We would like the clients to be able to route work directly to the right
shards. To do that, they need to know the sharding algorithm and its
parameters.
The algorithm can be copied into the client, but the parameters need to
be exported somewhere. Let's use the local table for that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
---
v2: force msb to zero on non-murmur
We will export this on system tables. To avoid hard-coding it in the system
table level, keep it at least in the dht layer where it belongs.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Currently, build_deb.sh looks very complicated because each distribution
requires different parameters, and we are applying them by sed commands one-by-one.
This patch will replace them with Mustache, a template language with a simple
and easy syntax.
Both .rpm distributions and .deb distributions have pystache (a Python
implementation of Mustache), so we will use it.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180604104026.22765-1-syuu@scylladb.com>
"
This series introduces a separate hinted handoff manager for materialized views.
Steps:
* decouple resource limits from hinted handoff, so multiple instances can share space
and throughput limits in order to avoid internal fragmentation for every instance's
reservations
* add a subdirectory to data/, responsible for storing materialized view hints
* decouple registering global metrics from hinted handoff constructor, now that there
can be more than one instance - otherwise 'registering metrics twice' errors are going to occur
* add a hints_for_views_manager to storage proxy and route failed view updates to use it
instead of the original hints_manager
* restore previous semantics for enabling/disabling hinted handoff - regular hinted handoff
can be disabled or enabled just for specific datacenters without influencing materialized
views flow
"
* 'separate_hh_for_mv_4' of https://github.com/psarna/scylla:
storage_proxy: restore optional hinted handoff
storage_proxy: add hints manager for views
hints: decouple hints manager metrics from constructor
db, config: add view_pending_updates directory
hints: move space_watchdog to resource manager
hints: move send limiter to resource manager
hints: move constants to resource_manager
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.
Fixes #3483
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the same comment that exists in Apache Cassandra,
explaining that the table_name column in the IndexInfo system table
actually refers to the keyspace name. Don't be fooled.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Expose secondary_index::index_table_name() so knowledge on how to
built an index name can remain centralized.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Because the authorized_prepared_statements_cache caches information that comes from
the permissions cache and from the prepared statements cache, it should have its entry
expiration period set to the minimum of the expiration periods of these caches.
The same goes for the entry refresh period, but since the prepared statements cache doesn't
have a refresh period, the authorized_prepared_statements_cache's entry refresh period
is simply equal to the one of the permissions cache.
Fixes #3473
Tests: dtest{release} auth_test.py
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1527789716-6206-1-git-send-email-vladz@scylladb.com>
Now that more than one instance of hints manager can be present
at the same time, registering metrics is moved out of the constructor
to prevent 'registering metrics twice' errors.
Hints for materialized view updates need to be kept somewhere,
because their dedicated hints manager has to have a root directory.
view_pending_updates directory resides in /data and is used
for that purpose.
Constants related to managing resources are moved to newly created
resource_manager class. Later, this class will be used to manage
(potentially shared) resources of hints managers.
"
In preparation, we change LCS so that it tries harder to push data
to the last level, where the backlog is supposed to be zero.
The backlog is defined as:
backlog_of_stcs_in_l0 + Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out
where:
* the fan_out is the amount of SSTables we usually compact with the
next level (usually 10).
* max_levels is the number of levels currently populated
* sizeof(L) is the total amount of data in a particular level.
Tests: unit (release)
"
* 'lcs-backlog-v2' of github.com:glommer/scylla:
LCS: implement backlog tracker for compaction controller
LCS: don't construct property in the body of constructor
LCS: try harder to move SSTables to highest levels.
leveled manifest: turn 10 into a constant
backlog: add level to write progress monitor
This is the last missing tracker among the major strategies. After
this, only DTCS is left.
To calculate the backlog, we will define the point of zero-backlog
as having all data in the last level. The backlog is then:
Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out,
where:
* the fan_out is the amount of SSTables we usually compact with the
next level (usually 10).
* max_levels is the number of levels currently populated
* sizeof(L) is the total amount of data in a particular level.
Care is taken for the backlog not to jump when a new level has been just
recently created.
Aside from that, SSTables that accumulate in L0 can be subject to STCS.
We will then add a STCS backlog in those SSTables to represent that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Right now we are constructing the _max_sstable_size_in_mb property in
the body of the constructor, which it makes it hard for us to use from
other properties.
We are doing that because we'd like to test for bounds of that value. So
a cleaner way is to have a helper function for that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Our current implementation of LCS can end up with situations in which
just a bit of data is in the highest levels, with the majority in the
lowest levels. That happens because we will only promote things to
highest levels if the amount of data in the current level is higher than
the maximum.
This is a pre-existing problem in itself, but became even clearer when
we started trying to define what is the backlog for LCS.
We have discussed ways to fix this by redefining the criteria on when
to move data to the next levels. That would require us to change the way
things are today considerably, allowing parallel compactions, etc. There
is significant risk that we'll increase write amplification and we would
need to carefully validate that.
For now I will propose a simpler change, that essentially solves the
"inverted pyramid" problem of current LCS without major disruption:
keep selecting compaction candidates with the same criteria that we do
today, which should help make sure we are not compacting high levels for no
reason; but if there is nothing to do, use the idle time to push data to
higher levels. As an added benefit, old data that is in the higher levels
can also be compacted away faster.
With this patch we see that in an idle, post-load system all data is
eventually pushed to the last level. Systems under constant writes keep
behaving the same way they did before.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We increase levels in powers of 10 but that is a parameter
of the algorithm. At least make it into a constant so that we can
reuse it somewhere else.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"
SSTables 3.x format ('m') stores the size of previous row or RT marker
inside each row/marker. That potentially allows traversing rows/markers
in reverse order.
The previous code calculating those sizes appeared to produce invalid
values for all rows except the first one. The problem with detecting
this bug was that neither Cassandra itself nor the sstabledump tool use
those values, they are simply rejected on reading.
From UnfilteredSerializer.deserializeRowBody() method,
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java#L562
:
if (header.isForSSTable())
{
in.readUnsignedVInt(); // Skip row size
in.readUnsignedVInt(); // previous unfiltered size
}
So while the previous test files were technically correct in that they
contained valid data readable by Cassandra/sstabledump, they didn't
follow the format specification.
This patchset fixes the code to produce correct values and replaces
incorrect data files with correct ones. The newly generated data files
have been validated to be identical to files generated with Cassandra
using same data and timestamps as unit tests.
Tests: Unit {release}
"
* 'projects/sstables-30/fix-prev-row_size/v1' of https://github.com/argenet/scylla:
tests: Fix test files to use correct previous row sizes.
sstables: Fix calculation of previous row size for SSTables 3.x
sstables: Factor out code building promoted index blocks into separate helpers.
"
This patchset contains two fixes to the clustering key prefixes
serialization logic for SSTables 3.x.
First, it fixes a vexing typo: a bitwise-and (&) has been used instead
of a remainder operator (%) for truncating the shift value.
This did not show up in existing tests because they all had non-empty
clustering columns values.
Added tests to cover empty clustering columns values.
Second, it fixes the logic of serialization to write values up to the
prefix length, not the length of the clustering key as defined by
schema. This matches the way it is done by the Origin.
There is, however, a special case where the prefix size is smaller than
that of a clustering key but we still need to serialize up to the full
size. This is the case when a compact table is being used and some
rows in it are added using incomplete clustering keys (containing null
for trailing columns).
In Cassandra, these prefixes still have a full length and missing
columns are just set to 'null'. In our code those prefixes have their
real length, but since we need to serialize beyond it, we pass a flag to
indicate this.
"
* 'projects/sstables-30/fix-clustering-blocks/v1' of https://github.com/argenet/scylla:
tests: Add test covering compact table with non-full clustering key.
sstables: Improve clustering blocks writing, use logical clustering prefix size.
tests: Add test covering large clustering keys (>32 columns) for SSTables 3.x
tests: Add unit test covering empty values in clustering key.
sstables: Fix typo in clustering blocks write helper.
"
Add handling for missing columns and tests for it.
There are 3 cases:
1. Number of columns in a table is smaller than 64
2. Number of columns in a table is greater than 64
2a. and less than half of all possible columns are present in sstable
2b. and at least half of all possible columns are present in sstable
Case 1 is implemented using a bit mask; a column is present if mask & (1 << <column number>) == 0.
Case 2a is implemented by storing a list of column numbers for each present column.
Case 2b is implemented by storing a list of column numbers for each absent column.
"
* 'haaawk/sstables3/read-missing-columns-v3' of ssh://github.com/scylladb/seastar-dev:
sstables 3: add test for reading big dense subset of columns
sstables 3: support reading big dense subsets of columns
sstables 3: add test for reading big sparse subset of columns
sstables 3: support reading big sparse subsets of columns
sstables 3: add test for reading small subset of columns
sstables 3: support reading small subsets of columns
Debug mode view_schema_test sometimes complains that a bool member
doesn't contain in-range values, apparently in the move constructor.
Initialize them for its benefit to avoid false-positive test
failures.
Message-Id: <20180602184934.31258-1-avi@scylladb.com>
untyped_result_set_row's cell data type is bytes_opt, and the
get_blob() accessor accesses the value assuming it's engaged
(relying on the caller to call has()).
has_unsalted_hash() calls get_blob() without calling has() beforehand,
potentially triggering undefined behavior.
Fix by using get_or() instead, which also simplifies the caller.
I observed failures in Jenkins in this area. It's hard to be sure
this is the root cause, since the failures triggered an internal
consistency assertion in asan rather than an asan report. However,
the error is hard to reproduce and the fix makes sense even if it
doesn't prevent the error.
See #3480 for the asan error.
Fixes #3480 (hopefully).
Message-Id: <20180602181919.29204-1-avi@scylladb.com>
A small subset contains no more than 63 elements.
Support for large subsets will come in the following
patches.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
For SSTables being written, we don't know their level yet. Add that
information to the write monitor. New SSTables will always be at L0.
Compacted SSTables will have their level determined by the compaction
process.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In the Origin, the size of the clustering key prefix used during
serialization is the actual length of the prefix and not the full size
as defined in schema. So the code is fixed to align with that logic.
This, in particular, is needed to write clustering blocks for RT
markers.
There is, however, a special case where the prefix size is smaller than
that of a clustering key but we still need to serialize up to the full
size. This is the case when a compact table is being used and some
rows in it are added using incomplete clustering keys (containing null
for trailing columns).
In Cassandra, these prefixes still have a full length and missing
columns are just set to 'null'. In our code those prefixes have their
real length, but since we need to serialize beyond it, we pass a flag to
indicate this.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
What was supposed to be an operation taking a remainder turned out to be a
bitwise 'and'. This didn't show up in existing tests only because they
all had non-empty clustering values.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
This is the first part of the first step of switching Scylla. It covers
converting cells to the new serialisation format. The actual structure
of the cells doesn't differ much from the original one with a notable
exception of the fact that large values are now fragmented and
linearisation needs to be explicit. Counters and collections still
partially rely on their old, custom serialisation code and their
handling is not optimal (although not significantly worse than it used
to be).
The new in-memory representation allows objects to be of varying size
and makes it possible to provide deserialisation context so that we
don't need to keep in each instance of an IMR type all the information
needed to interpret it. The structure of IMR types is described in C++
using some metaprogramming, with the hope of making it much easier to
modify the serialisation format than it would be in the case of open-coded
serialisation functions.
Moreover, IMR types can own memory thanks to a limited support for
destructors and movers (the latter are not exactly the same thing as C++
move constructors, hence a different name). This makes it (relatively)
easy to ensure that there is an upper bound on the size of all allocations.
For now the only thing that is converted to the IMR are atomic_cells
and collections which means that the reduction in the memory footprint
is not as big as it can be, but introducing the IMR is a big step on its
own and also paves the way towards complete elimination of unbounded
memory allocations.
The first part of this patchset contains miscellaneous preparatory
changes to various parts of the Scylla codebase. They are followed by
introduction of the IMR infrastructure. Then structure of cells is
defined and all helper functions are implemented. Next are several
treewide patches that mostly deal with propagating type information to
the cell-related operations. Finally, atomic_cell and collections are
switched to use the new IMR-based cell implementation.
The IMR is described in much more detail in imr/IMR.md added in "imr:
add IMR documentation".
Refs #2031.
Refs #2409.
perf_simple_query -c4, medians of 30 results:
./perf_base ./perf_imr diff
read 308790.08 309775.35 0.3%
write 402127.32 417729.18 3.9%
The same with 1 byte values:
./perf_base1 ./perf_imr1 diff
read 314107.26 314648.96 0.2%
write 463801.40 433255.96 -6.6%
The memory footprint is reduced, but that is partially due to removal of
small buffer optimisation (whether it will be restored depends on the
exact measurements of the performance impact). Generally, this series was
not expected to make a huge difference as this would require converting
whole rows to the IMR.
Memory footprint:
Before:
mutation footprint:
- in cache: 1264
- in memtable: 986
After:
mutation footprint:
- in cache: 1104
- in memtable: 866
Tests: unit (release, debug)
"
* tag 'imr-cells/v3' of https://github.com/pdziepak/scylla: (37 commits)
tests/mutation: add test for changing column type
atomic_cell: switch to new IMR-based cell representation
atomic_cell: explicitly state when atomic_cell is a collection member
treewide: require type for creating collection_mutation_view
treewide: require type for comparing cells
atomic_cell: introduce fragmented buffer value interface
treewide: require type to compute cell memory usage
treewide: require type to copy atomic_cell
treewide: require type info for copying atomic_cell_or_collection
treewide: require type for creating atomic_cell
atomic_cell: require column_definition for creating atomic_cell views
tests: test imr representation of cells
types: provide information for IMR
data: introduce cell
data: introduce type_info
imr/utils: add imr object holder
imr: introduce concepts
imr: add helper for allocating objects
imr: allow creating lsa migrators for IMR objects
imr: introduce placeholders
...
Scylla now exposes the prometheus API by default. This patch changes
scyllatop to use the Prometheus API; the collectd API is still available.
The main changes in the patch:
* Move collectd specific logic inside collectd.
* Add support for help information.
* Add command line options to configure the prometheus endpoint and to enable
collectd.
* Add a prometheus class that collects information from prometheus.
Fixes: #1541
Message-Id: <20180531124156.26336-1-amnon@scylladb.com>
Only libjsoncpp >= 1.6.0 offers a safe name() method for value
iterators. For older versions, deprecated memberName() is used
instead. Note that memberName() was deprecated because of its
inability to deal with embedded null characters.
Fixes #3471
Message-Id: <e64a62bfc24ef06daee238d79d557fe6ec8979d3.1527758708.git.sarna@scylladb.com>
With the introduction of the new in-memory representation changing
column type has become a more complex operation since it needs to handle
switch from fixed-size to variable-size types. This commit adds an
explicit test for such cases.
This patch changes the implementation of atomic_cell and
atomic_cell_or_collection to use the data::cell implementation which is
based on the new in-memory representation infrastructure.
Collections are not going to be fully converted to the IMR just yet and
still use the old serialisation format. This means that they still don't
support fragmented values very well. This patch passes the information
when an atomic_cell is created as a member of a collection so that later
we can avoid fragmenting the value in such cases.
As a preparation for the switch to the new cell representation this
patch changes the type returned by atomic_cell_view::value() to one that
requires explicit linearisation of the cell value. Even though the value
is still implicitly linearised (and only when managed by the LSA) the
new interface is the same as the target one so that no more changes to
its users will be needed.
This commit introduces cell serializers and views based on the in-memory
representation infrastructure. The code doesn't assume anything about
how the cells are stored, they can be either a part of another IMR
object (once the rows are converted to the IMR) or separate objects
(just like current atomic_cell).
A view schema's view_info contains the id of the base regular column
that the view includes in its primary key. Since the column id of a
particular column can potentially change with a new schema version, we
need to refresh the stored column id. We weren't doing that when
unselected base columns are added, and this patch fixes it by
triggering an update of the view schema when base columns are added
and the view contains a base regular column in its PK.
Fixes#3443
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180530194536.51202-1-duarte@scylladb.com>
IMR objects may own memory. object_allocator takes care of allocating
memory for all owned objects during the serialisation of their owner.
In practice a writer of the parent object would accept a helper object
created by object_allocator. That helper object would be either
responsible for computing the size of buffers that have to be allocated
or perform the actual serialisation in the same two phase manner as it
is done for the parent IMR object.
In some cases the actual value of an IMR object is not known at
serialisation time. If the type is fixed-size we can use a placeholder
to defer writing it to a more convenient moment.
This patch introduces destructors and movers for IMR objects which
enables them to own memory. Custom destructors and methods can be
defined by specialising appropriate classes.
This patch adds a new way of serialising bytes and sstring objects in the
IDL. Using write_fragmented_<field-name>() the caller can pass a range
of fragments that would be serialised without linearising the buffer.
Since sstabledump and Cassandra do not use row size values, the new
files have been validated to be identical to files generated by
Cassandra with the same data inserted at the same timestamps.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
The previous code incorrectly calculated sizes of previous rows while
writing SSTables in 3.x ('m') format.
The problem with detecting this issue was that neither sstabledump nor
Cassandra 3.x itself uses those values; as of today, they are simply
ignored when data is read from files.
Still, we want to be compatible and write correct values as they may be
of use in the future.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
tests/view_complex_test.cc contained a #ifdef'ed-out test claiming to
be a reproducer for issue #3362. Unfortunately, it is not - after
earlier commits the only reason this test still fails is a mistake in
the test, which expects 0 rows in a case where the real result is 1 row.
Issue #3362 does *not* have to be fixed to fix this test.
So this patch fixes the broken test, and enables it. It also adds comments
explaining what this test is supposed to do, and why it works the way it
does.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180530142214.29398-1-nyh@scylladb.com>
"
Add handling for static rows and tests for it.
"
* 'haaawk/sstables3/read-static-v1' of ssh://github.com/scylladb/seastar-dev:
sstable_3_x_test: Add test_uncompressed_compound_static_row_read
sstable_3_x_test: add test_uncompressed_static_row_read
flat_mutation_reader_assertions: improve static row assertions
data_consume_rows_context_m: Implement support for static rows
mp_row_consumer_m: Implement support for static rows
mp_row_consumer_m: Extract fill_cells
"
We currently suffer from reactor stalls caused by non-preemptible processing
of large partitions in the following places:
(1) dropping partition entries from cache or memtables does not defer
(2) dropping partition versions abandoned by detached snapshots does not defer
(3) merging of partition versions when snapshots go away does not defer
(4) cache update from memtable processes partition entries without deferring (#2578)
(5) partition entries are upgraded to new schema atomically
This series fixes problems (1), (2) and (4), but not (3) and (5).
(1) and (2) are fixed by introducing mutation_cleaner objects which are
containers for garbage partition versions which are delaying actual freeing.
Freeing happens from memory reclaimers and is incremental.
(3) and (5) are not solved yet.
(4) is solved by having partition merging process partitions with row
granularity and defer in the middle of a partition. In order to preserve
update atomicity at the partition level as perceived by reads, when an update
starts we create a snapshot of the current version of the partition and
process the memtable entry by inserting data into a separate partition
version. This way, if the upgrade defers in the middle of a partition, reads
can still go to the old version and not see partial writes. Snapshots are
marked with phase numbers, and reads will use the previous phase until the
whole partition is upgraded. When the partition is finally merged, the
snapshots go away and the new version will eventually be merged into the old
version. Due to (3), however, this merging may still add latency to the
upgrade path.
Remaining work:
- Solving problem (3). I think the approach to take here would be to
move the task of merging versions to the background, maybe into mutation_cleaner.
- Merging range tombstones incrementally.
Performance
===========
Performance improvements were evaluated using tests/perf_row_cache_update -c1 -m1G,
which measures time it takes to update cache from memtable for various workloads
and schemas.
For large partition with lots of small rows we see a significant reduction of
scheduling latency from ~550ms to ~23ms. The cause of the remaining latency is
problem (3) stated above. The run time is reduced by 70%.
For small partition case without clustering columns we see no degradation.
For small partition case with clustering key, but only 3 small rows per partition,
we see a 30% degradation in run time.
For large partition with lots of range tombstones we see degradation of 15% in
run time and scheduling latency.
Below you can see full statistics for cache update run time:
=== Small partitions, no overwrites:
Before:
avg = 433.965155
stdev = 35.958024
min = 340.093201
max = 468.564514
After:
avg = 436.929447 (+1%)
stdev = 37.130237
min = 349.410339
max = 489.953400
=== Small partition with a few rows:
Before:
avg = 315.379316
stdev = 30.059120
min = 240.340561
max = 342.408295
After:
avg = 407.232691 (+30%)
stdev = 53.918717
min = 269.514648
max = 444.846649
=== Large partition, lots of small rows:
Before:
avg = 412.870689
stdev = 227.411317
min = 286.990631
max = 1263.417847
After:
avg = 124.351705 (-70%)
stdev = 4.705762
min = 110.063255
max = 129.643387
=== Large partition, lots of range tombstones:
Before:
avg = 601.172644
stdev = 121.376866
min = 223.502136
max = 874.111572
After:
avg = 695.627588 (+15%)
stdev = 135.057004
min = 337.173950
max = 784.838745
"
* tag 'tgrabiec/clear-gently-all-partitions-v3' of github.com:tgrabiec/scylla:
mvcc: Use small_vector<> in partition_snapshot_row_cursor
utils: Extract small_vector.hh
mvcc: Erase rows gradually in apply_to_incomplete()
mvcc: partition_snapshot_row_cursor: Avoid row copying in consume() when possible
cache: real_dirty_memory_accounter: Move unpinning out of the hot path
mvcc: partition_snapshot_row_cursor: Reduce lookups in ensure_entry_if_complete()
mutation_partition: Reduce row lookups in apply_monotonically()
cache: Release dirty memory with row granularity
cache: Defer during partition merging
mvcc: partition_snapshot_row_cursor: Introduce consume_row()
mvcc: partition_snapshot_row_cursor: Introduce maybe_refresh_static()
mvcc: Make apply_to_incomplete() work with attached versions
cache: Propagate phase to apply_to_incomplete()
cache: Prepare for incremental apply_to_incomplete()
Introduce a coroutine wrapper
tests: mvcc: Encapsulate memory management details
tests: cache: Take into account that update() may defer
cache: real_dirty_memory_accounter: Allow construction without memtable
cache: Extract real_dirty_memory_accounter
mvcc: Destroy memtable partition versions gently
memtable: Destroy partitions incrementally from clear_gently()
mvcc: Remove rows from tracker gently
cache: Destroy partition versions incrementally
Introduce mutation_cleaner
mvcc: Introduce partition_version_list
mvcc: Fix move constructor of partition_version_ref() not preserving _unique_owner
database: Add API for incremental clearing of partition entries
cache: Define trivial methods inline
tests: Improve perf_row_cache_update
mutation_reader: Make empty mutation source advertize no partitions
Leverage the fact that it is called with monotonically increasing
positions, and avoid lookups in case the current target entry is the
successor of the desired position. Reduces cache update latency by 40%
for large partition in a time-series workload.
This change speeds up merging of partition versions with many rows in
case the merged version has many rows which fall between existing rows
in the target version. This is often the case for time-series
workloads, which insert rows at the front. Lookup can be avoided for
all but the first row in the stride because we already have a
reference to the successor in the target tree, we only need to check
that the current entry in the target tree is still the successor.
This change greatly reduces the number of lookups per row during version
merging of large partitions in time-series workloads.
Incremental merging will be implemented by the means of resumable
functions, which return stop_iteration::no when not yet
finished. We're not using futures, so that the caller can do work
around preemption points as well.
Represents a deferring operation which defers cooperatively with the caller.
The operation is started and resumed by calling run(), which returns
with stop_iteration::no whenever the operation defers and is not
completed yet. When the operation is finally complete, run() returns
with stop_iteration::yes.
This allows the caller to:
1) execute some post-defer and pre-resume actions atomically
2) have control over when the operation is resumed and in which context,
in particular the caller can cancel the operation at deferring points.
It will be used to implement deferring partition_version::apply_to_incomplete().
Currently tests have a single LSA region lock around construction of
managed objects, their manipulation, and access. This way we avoid the
complexity of dealing with allocating sections. That will not be
possible once apply_to_incomplete() is changed to enter an allocating
section itself, because this requires the region to be unlocked at
entry. The tests will have to take more fine-grained locks. That is
somewhat tricky and would add a lot of noise to tests. This patch will
make things easier by abstracting LSA management, among other things,
inside mvcc_container and mvcc_partition classes.
The test incorrectly assumed that once update() is started the
cache will return only versions from last_generation. This will not
hold once we start to defer during partition merging.
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.
Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by cache's region, because memtable
versions must be cleared without a cache_tracker.
Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.
Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
Instead of destroying whole partition_versions at once, we will do that
gently using mutation_cleaner to avoid reactor stalls.
Large deletions could happen when large partition gets invalidated,
upgraded to a new schema, or when it's abandoned by a detached snapshot.
Refs #3289.
Partitions can get very large. Destroying them all at once can stall
the reactor for a significant amount of time. We want to avoid that by
doing destruction incrementally, deferring in between. A new API is
added for that at various levels:
stop_iteration clear_gently() noexcept;
It returns stop_iteration::yes when the object is fully cleared and
can be now destroyed quickly. So a deferring destruction can look like
this:
return repeat([this] { return clear_gently(); });
The reason why clear_gently() doesn't return a future<> itself is that some
contexts cannot defer, like memory reclamation.
"
This series provides reasoning and clarification for the current
structure of mutate_MV(), and how we handle some scenarios related to
range movements.
"
* 'materialized-views/clarifications/v3' of github.com:duarten/scylla:
db/view: Remove ifdef'd Java code
db/view: Ignore scenario where base replica hasn't joined the ring
db/view: Handle case when base has no paired view replica
"
Add handling for clustering columns and tests for it.
"
* 'haaawk/sstables3/read-ck-v3' of ssh://github.com/scylladb/seastar-dev:
Add test_uncompressed_compound_ck_read for SSTables 3.x
Add test_uncompressed_simple_read for SSTables 3.x
Implement reading clustering key from SSTables 3.x
column_translation: cache fixed value lengths for ck
data_consume_rows_context_m: use cached fixed column value lengths
column_translation: store fixed lengths of column values
consume_row_start: change type of clustering key
Rename ROW_BODY state to CLUSTERING_ROW
We don't need to parse the type every time.
It's better to cache the fixed lengths of column values
for the sstable.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The clustering key in the 3.x format is stored differently,
so it's easier to create a vector of temporary buffers
instead of a single block of concatenated bytes.
Each temporary buffer stores a value of a single
clustering column.
This is because the way the clustering key is stored on disk
in SSTables 3.x is not the same as the way we store it
internally.
This means that we have to first read the value of every
clustering column into a temporary_buffer and only then
can we create the clustering key using a vector of those
temporary buffers.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Based on:
8daaf9833a
This patch adds a from_nodetool_style_string factory method to partition_key.
The string format follows the nodetool format, where the columns of the
partition key are separated by ':'.
For example, if a partition key has two columns, col1 and col2, the
partition key with col1 = val1 and col2 = val2 is written as:
val1:val2
execute_internal() duplicates several code paths, especially in
the select path, for no good reason. It boils down to timeout and
consistency level selection which can be done based on
client_state::is_internal().
This patchset eliminates the duplication and execute_internal(),
simplifying the code.
* github.com:avikivity/scylla cql-no-execute_internal/v2:
cql: schema_altering_statement: make execute() and execute_internal()
equivalent
cql: select_statement: make execute() and execute_internal()
equivalent
cql: query_processor: don't call cql_statement::execute_internal() any
more
cql: cql_statement: remove execute_internal()
As each test completes, report it. This prevents a long-running
test in the beginning of the list from stalling output.
Message-Id: <20180526173517.23078-1-avi@scylladb.com>
"
This series introduces frozen_mutation_fragment which can be used to
send mutation_fragments over the wire to a remote node. The main
intended user is going to be the new streaming implementation.
The first part of the series fixes some IDL issues related to empty
structures and variant being the first member of a structure. Both these
problems make the generated code fail to build and they do not, in any
way, affect the existing on-wire protocol.
Logic responsible for freezing and unfreezing of mutation_fragments is
heavily based on the existing code for freezing mutations and shares the
same drawbacks (for example, unnecessary copy during unfreezing). These
preexisting performance problems can be fixed incrementally.
Another performance problem (which affects frozen_mutations as well, but
to a lesser extent) is that since the batching is done at a different
layer each frozen mutation fragment is a separate bytes_ostream object
owning at least one memory buffer. If the mutation fragments are small
this will cause an excessive number of allocations. This could be solved
either by freezing fragments in batches (though it goes against the RPC
layer doing its own batching) or using bytes_ostream or an equivalent
object with a buffer allocation policy more suitable for such use cases.
This also is something that probably could be an incremental fix.
Tests: unit (release)
"
* tag 'frozen_mutation_fragment/v1-rebased' of https://github.com/pdziepak/scylla:
idl: add idl description of frozen_mutation_fragments
tests: add test for frozen_mutation_fragments
frozen_mutation: introduce frozen_mutation_fragment
tests/idl: test variant being the first member of a structure
idl: create variant state in root node
tests/idl: test serialising and deserialising empty structures
idl-compiler: avoid unused variable in empty struct deserialisers
tests/mutation_reader: disambiguate freeze() overload
Apache Cassandra handles a case where the node hasn't joined the ring
and may consequently have an outdated view of it. Following the same
reasoning as with the previous patch, we ignore this scenario. It
happens when there are range movements, and this node is bootstrapping,
but there are already other mechanisms in the cluster, such as hinted
handoff and dual-writing to replicas during range movements, that
contribute to this update eventually making its way to the view.
This patch doesn't change any behavior, but it provides the reasoning
why we won't use the batchlog as Cassandra does, or the hinted handoff
log as we will, to later send the update when the node is joined (note
that Cassandra just sends the mutations "later", and doesn't check
again for any condition or change).
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
If no view replica is paired with the current base replica, it means
there's a range movement going on (decommission or move), such that
this base replica is gaining new token ranges. The current node is
thus a pending_endpoint from the POV of the coordinator that sent the
request.
Sending view updates to the view replica this base will eventually be
paired with only makes a difference when the base update didn't make
it to the node which is currently being decommissioned or moved-from.
The update will, however, make it to that node if HH is enabled at the
coordinator, before the range movement finishes, or later to this node
when it becomes a natural endpoint for the token.
We still ensure we send to any pending view endpoints though, at least
until we handle that case more optimally.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
All cql_statement::execute_internal() overrides now either throw or
call execute(). Since we shouldn't be calling the throwing overrides
internally, we can safely call execute() instead. This allows us to
get rid of execute_internal().
execute_internal(), for some code paths, differs from execute by the
following:
1. it uses CL_ONE unconditionally
2. it has no query timeout
3. it doesn't use execution stages
for other code paths, it just calls execute.
As preparation for getting rid of execute_internal(), unify the two
code paths.
Commit 4859b759b9 caused the consistency level and timeouts
to be provided by the caller, so using the caller provided parameters
instead of overriding them does not change behavior.
"
This patchset makes all users of query_processor specify their timeouts
explicitly, in preparation for the removal of
cql_statement::execute_internal() (whose main function was to override
timeouts).
"
* tag 'cql-explicit-timeouts/v1' of https://github.com/avikivity/scylla:
query_processor: require clients to specify timeout configuration
query_processor: un-default consistency level in make_internal_options
"
Firstly, this patchset removes the is_fixed_length() function of
abstract_type in favour of value_length_if_fixed().
Secondly, it fixes the byte_type to be compatible with Cassandra, which
erroneously treats it as a variable-length data type.
Lastly, it adds a unit test covering all non-composite CQL data types
for writing.
Tests: unit {release}
"
* 'projects/sstables-30/different-data-types/v1' of https://github.com/argenet/scylla:
tests: Add a unit test for writing different data types to SSTables 3.x format.
types: Treat byte_type as a variable-length type for compatibility reasons.
types: Remove is_value_fixed() and use value_length_if_fixed() instead.
Although values of the byte_type that corresponds to CQL TINYINT type
always occupy only a single byte, Cassandra treats it as a
variable-length type for SSTables 3.0 reading and writing.
While it is clearly a mistake on Cassandra's side, we have to stay
compatible.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This patch introduces IDL definition as well as serialisers and
deserialisers for freezing mutation_fragment so that they can be
transferred between nodes in a cluster.
Each non-final IDL object is preceded by a frame containing its size.
In case of boost::variant there is a frame for the variant itself, an
integer determining the active alternative of the variant and a frame of
that active alternative.
However, if a variant was the first member of a writable stub object the
IDL would generate code that would not write the frame for the variant.
This is not a very severe issue since there are no such cases right now,
as the C++ type system would not allow such generated code to compile.
Deserialisers generated by IDL compiler first create a substream
covering the deserialised structure and then skip and read appropriate
members. If there are no members the substream will be unused and prompt
the compiler to emit a warning.
"
This patch series fixes #3405: secondary-index search only provided
correct results in certain cases, where entire partitions or contiguous
partition slices matched the query. When this was not the case, and
individual clustering rows match or do not match the query, the wrong
results were returned.
To fix this bug, we need to fix the two stages of secondary-index search:
1. In the first stage, we read from the index MV a list of row keys
(i.e., primary keys) matching the query. We can no longer remember
just the partition keys, and need to keep the list of full primary keys.
2. In the second stage, we have a list of rows (not partitions) and need
to read their selected contents to return to the user. Since CQL queries
do not have a syntax to select an arbitrary list of rows, we have to
add new code to do such a selection.
Because we provide an ad-hoc, inefficient implementation for the row
selection described in stage 2, these patches leave two paths in the code:
The old path, efficiently selecting entire partitions, and the new path,
selecting individual rows. The old path is still used when it is applicable,
which is when a partition key column or the first clustering key column
is searched.
"
* 'si-fix-v4' of http://github.com/nyh/scylla:
secondary index: test multiple clustering column
secondary index: fix wrong results returned in certain cases
secondary index: method for fetching list of rows from base table
secondary index: method for fetching list of rows from index
select_statement.cc: refactor find_index_partition_ranges()
select_statement.cc: fix variable lifetime errors
This patch adds a test for secondary indexes on a table which has many
columns - two partition key column, two clustering key columns, and two
regular columns. We add a bunch of data in various rows and partitions,
index all columns and search on this data and verify the results.
This test exposed various bugs in secondary index search, including
issue #3405. After we fixed those bugs, the test now passes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The current secondary-index search code, in
indexed_table_select_statement::do_execute(), begins by fetching a list
of partitions, and then the content of these partitions from the base
table. However, in some cases, when the table has clustering columns and
not searching on the first one of them, doing this work in partition
granularity is wrong, and yields wrong results as demonstrated in
issue #3405.
So in this patch, we recognize the cases where we need to work in
clustering row granularity, and in those cases use the new functions
introduced in the previous patches - find_index_clustering_rows() and
the execute() variant taking a list of primary-keys of rows.
Fixes #3405.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We add a new variant of select_statement::execute() which allows selecting
an arbitrary list of clustering rows. The existing execute() variant can't
do that - it can only take a list of *partitions*, and read the same
clustering rows from all of them.
The new select variant is not needed for regular CQL queries (which do
not have a syntax allowing reading a list of rows with arbitrary primary
keys), but we will need it for secondary index search, for solving
issue #3405.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have a method find_index_partition_ranges(), to fetch a list
of partition keys from the secondary index. However, as we shall see in
the following patches (and see also issue #3405), getting a list of entire
partitions is not always enough - the secondary index actually holds a list
of primary keys, which includes clustering keys, and in some queries we
can't just ignore them.
So this patch provides a new method find_index_clustering_rows(), to
query the secondary index and get a list of matching clustering keys.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The function find_index_partition_ranges() is used in secondary index
searches for fetching a list of matching partitions. In a following patch,
we want to add a similar function for getting a list of *rows*. To avoid
duplicate code, in this patch we split parts of find_index_partition_ranges()
into two new functions:
1. get_index_schema() returns a pointer to the index view's schema.
2. read_posting_list() reads from this view the posting list (i.e., list
of keys) for the current searched value.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
do_with() provides code a *reference* to an object which will be kept
alive. It is a mistake to make a copy of this object or of parts of it,
because then the lifetime of this copy will have to be maintained as well.
In particular, it is a mistake to do do_with(..., [] (auto x) { ... }) -
note how "auto x" appears instead of the correct "auto& x". This causes
the object to be copied, and its lifetime not maintained.
This patch fixes several cases where this rule was broken in
select_statement.cc. I could not reproduce actual crashes caused by
these mistakes, but in theory they could have happened.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
"
This patchset implements reading row columns from SSTable 3 format data file.
Tests: units (release)
"
* 'haaawk/sstables3/read-columns-v4' of ssh://github.com/scylladb/seastar-dev: (21 commits)
Add test for reading column values of different types.
Support all fixed size column types from SSTable 3.x
Add abstract_type::value_length_if_fixed
Add test for simple table with value
flat_reader_assertions: Add produces_row taking column values
Implement reading rows and columns in data_consume_rows_context_m
Introduce column_flags_m
Add column_translation to data_consume_rows_context_m
Pass schema to data_consume_context
Add column_translation.hh
consumer_m: Add consume methods for consuming rows and columns
Extract make_atomic_cell from mp_row_consumer_k_l
Rename NON_STATIC_ROW_* states to ROW_BODY_*
Add liveness_info and use it in reading sstables
Add helper methods for parsing simple types.
Add unfiltered_flags_m::has_all_columns
data_consume_context: use make_unique instead of new
Pass serialization_header to data_consume_rows_context*
Use disk_string_vint_size for bytes_array_vint_size
Introduce disk_string_vint_size type
...
We need to specify --configfile on pdebuild too, otherwise we will
always fail to build .deb on a newly created build environment.
The only reason we are still able to build .deb is that we already
copied .pbuilderrc to the home directory on existing build environments.
Fixes #3456
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180523204112.24669-1-syuu@scylladb.com>
It will be needed to obtain column_translation that will
be added to data_consume_context in the next patch.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
New name describes the states in a better way as those states
will be used both for static and non-static rows.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
"
This series introduces a cache of already authenticated prepared statements which
is meant to optimize the prepared statement lookup when authentication is enabled.
This cache allows performing a single cache lookup per EXECUTE operation, as opposed
to at least two lookups: one in the prepared statements cache and one in the authentication
cache.
Tests:
- cql_query_test {debug, release}.
- cassandra-stress with authentication enabled and with short eviction timeout.
- Manual (with printouts) checks:
- Tested the eviction caused by eviction in the prepared_statements_cache:
- Artificially decreased the prepared_statements_cache size and ran c-s with different keyspaces.
- Verified that the corresponding authorized_prepared_statements_cache entry is evicted and re-populated.
- Tested the BATCH of prepared statements (with dtest infrastructure):
- Verified that for each prepared statement authorized_prepared_statements_cache is updated only once:
- The batch contained a few entries of the same prepared statement.
"
* 'authorized_prepared_statements_cache-v3' of https://github.com/vladzcloudius/scylla:
cql3: use authorized_prepared_statements_cache in the BATCH processing
cql3::statements::batch_statement: introduce a single_statement class
cql3: introduce the authorized_prepared_statements_cache class
loading_shared_values: introduce the templated find() overload
tests: loading_cache_test: add a tests for a loading_cache::remove(key)/remove(iterator)
utils::loading_cache: add remove(key)/remove(iterator) methods
cql3::query_processor: properly stop() prepared_statements_cache object
Since Seastar no longer (1f005fb434) requires libunwind, we can
drop it from our dependency list. This helps the power build, for
which no libunwind is available.
Fixes#3453.
Message-Id: <20180523114750.10753-1-avi@scylladb.com>
"
This series implements the backlog tracker for TWCS, allowing it to
be controlled. The backlog for a TWCS column family is just the sum of
the SizeTiered backlogs for all the windows that we know about.
A possible optimization for this is to stop tracking windows after
they become old enough and revert to zero backlog. I reverted that
last minute, though, since this will probably cause the backlog to
completely misrepresent reality if we import SSTables into old buckets
with things like repairs or nodetool refresh.
"
* 'twcs-backlog-v4.1' of github.com:glommer/scylla:
backlog: implement backlog tracker for the TWCS
STCS_backlog: allow users to query for the total bytes managed
backlog: keep track of maximum timestamp in write monitor
memtable: also keep track of max timestamp
The TWCS backlog is relatively simple: we just need to keep track of
which SSTable belong to which time window (and actually as usual,
just their sizes). That is an easy thing to do since we can statically
calculate the time bound from the timestamp.
Once we do that we can just sum the backlogs for each individual window.
Time windows that are far enough in the past can at some point be
discarded when their backlogs become zero.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The exploded_clustering_prefix type has a convenient is_empty() method
and an even more convenient "operator bool" shortcut. Unfortunately,
the other clustering prefix types (clustering_key_prefix,
clustering_key_prefix_view) have, for historic reasons, an is_empty
method which takes a schema parameter. That also means they can't
have an "operator bool" shortcut.
But checking if a prefix is empty doesn't really need the schema - all we need to
check is whether the byte representation is empty. The result is simpler
and more efficient code, and easier to use. It is also more consistent -
all clustering-key-related types will have an "operator bool" instead of
just some of them.
To avoid massive code changes, we leave a is_empty(schema) variant, which
simply calls is_empty(). There's already precedent for that - various
methods which have a variant taking schema (and ignoring it) and one
taking nothing.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180521174220.13262-1-nyh@scylladb.com>
Like with the EXECUTE command avoid authorizing the same prepared
statement twice - this time in the context of processing the BATCH
command.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This is a helper class needed to control the handling process of a single
statement in the current batch. In particular, it holds a boolean defining
whether authorization is needed for this statement.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Add a cache that stores a checked weak pointer to already authorized prepared statements,
and whose key is a tuple of an authenticated_user and the key of the prepared_statements_cache.
The entries will be held as long as the corresponding prepared statement is valid (cached)
and will be discarded with the period equal to the refresh period of the permissions cache.
Entries are also going to be discarded after 60 minutes if not used.
The purpose of this new cache is to save the lookup in the permissions cache for already authenticated
resource (whatever is needed to be authenticated for the particular prepared statement).
This is meant to improve the cache coherency as well (since we are going to look in a single cache
instead of two).
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This overload allows searching for elements by an arbitrary key, as long
as it hashes to the same values as the default key and there is a
comparator for this new key.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
remove(key): removes the entry with the given key if it exists, otherwise does nothing.
remove(iterator): removes an entry by a given iterator (returned from loading_cache::find()).
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This commit makes sure that hints manager is always initialized,
including creating hints directories and starting it.
This is needed because the hints manager is internally used
to store failed materialized view replicas.
Fixes#3451
Message-Id: <44532fd3704e20cabeb9c4985dace5650fd22d2c.1527018865.git.sarna@scylladb.com>
"
This series addresses issue #3202 about dropping a table with secondary
indexes present. Previously dropping such tables was impossible due to
materialized view restrictions (which is an implementation detail
of Scylla's secondary indexes).
Implemented:
* fixing 'DROP KEYSPACE' with active materialized views
* adapting schema_builder to make it easy to drop indexes
* dropping all dependent SI before dropping a table
* a test case for dropping a table with secondary indexes
"
* 'drop_si_before_drop_table_3' of https://github.com/psarna/scylla:
tests: add test for dropping a table with secondary indexes
migration_manager: allow dropping table with secondary indexes
schema: add clearing indexes to schema builder
database: do not truncate already removed views
This commit adds a test case for dropping a table with dependent
secondary indexes. Dependent materialized views prohibit the table
from being dropped, but dropping a table with dependent SI is legal.
References #3202
Previously dropping a table with secondary indexes failed, because
SI are internally backed by materialized views.
This commit triggers dropping dependent secondary indexes before
dropping a table.
Fixes#3202
This commit clears table's views before truncating it
in drop_column_family function. The only case when
views are not empty during drop is when they're backing secondary
indexes of a base table and they are all atomically dropped
in the same go as the base table itself.
This change will prevent trying to truncate views that were
already dropped, which used to result in no_such_column_family error.
References #3202
"
This series introduces materialized view statistics, as stated in issue #3385:
- updates pushed
- updates failed
- row lock stats
It also addresses issue #3416 by decoupling user write stats from view
update stats.
"
* 'materialized_view_metrics_9' of https://github.com/psarna/scylla:
view: adapt view_stats to act as write stats
storage_proxy: decouple write_stats from stats
db: add row locking metrics
view: add view metrics
We would like to know whether there is still backlog at rest in a
particular STCS object. This is useful, for instance, in the TWCS
backlog, which uses STCS, so it can delete old windows that are no longer
used.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
For sealed SSTables we can get the maximum timestamp from the statistics
component. But for partially written SSTables, the metadata is not yet
available.
One way to solve this would be to make the SSTable statistics available
earlier. But we would end up with a maximum timestamp that potentially
changes all the time as we write more cells.
A better approach is to take note of what's the maximum timestamp in a
memtable before we start to flush, and when time comes for us to flush
we will use the progress manager to inform the consumers about the
maximum timestamp.
For SSTables being compacted, we can't know for sure what is the maximum
timestamp as some entries could be TTLd already. But the maximum of all
SSTables present in the compaction is a good enough estimate for this
purpose.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are now keeping track of the minimum timestamp in a memtable. Also
keep track of the max timestamp so we can know what it is before we
finish flushing the entire memtable to an SSTable. Will be used by
partially written SSTables undergoing TWCS.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"
This was sent before as two separate patchsets. It is now unified
because it has a lot of common infrastructure.
In this patchset I am aiming at two goals:
1) Provide a minimum amount of shares for user-initiated operations like
nodetool compact and nodetool cleanup
2) Be more robust with exceptions in the backlog tracker
For the first, the main difference is that I now made the compaction
controller a part of the compaction manager. It then becomes easy to
consult with the compaction controller for the correct amount of shares
those operations should have.
In compaction_strategy.cc, the major_compaction_strategy object was
actually already unused before. So instead of making use of it, which
would require some form of information flow downwards about the backlog
we need to export, I am creating a user-initiated backlog type inside
the compaction manager.
With the two changes described above everything is very well
self-contained within the compaction manager and the implementation
becomes trivial.
For the second, I am now handling exceptions in two places:
1) the backlog computation. Those are const functions so if we just have
a transient exception when computing the backlog, all we need to do is
return some fixed amount of shares and try again in the next adjustment
window.
2) the process of adding / removing SSTables. Those are harder, since if
we fail to manipulate the list we'll be left in an inconsistent state.
The best approach is then to disable the backlog tracker and return a
fixed amount of shares globally.
Tests: unit (release)
"
* 'backlog-improvements-v3' of github.com:glommer/scylla:
compaction_manager: disable backlog tracker if we see an exception
backlog tracker: protect against exceptions in backlog calculation.
STCS_backlog: protect against negative backlog
STCS_backlog: remove unused attribute
compaction strategy: move size tiered backlog to a header
compaction_strategy: delete major_compaction_strategy class
compaction: make sure that user-initiated compactions always have a minimum priority
backlog_controller: add constants to represent a globally disabled controller
backlog_controller: move compaction controller to the compaction manager
backlog_controller: allow users to compute inverse function of shares
This commit adapts view_stats structure so it can be passed
to storage_proxy as write stats. Thanks to that, mv replica updates
will not interfere with user write metrics. As a side effect it also
provides more stats to replica view updates.
Closes #3385
Closes #3416
This commit extracts metrics related to writes from stats structure,
so it can be easily replaced later, e.g. for materialized view metrics.
References #3385
References #3416
This commit adds statistics to row_locker class. Metrics are
independently counted for all lock types: row<->partition and
exclusive<->shared.
Metrics gathered:
- total acquisitions
- operations that wait on the lock
- histogram of the time spent on waiting on this type of lock
References #3385
References #3416
This commit introduces view statistics:
- updates pushed to local/remote replicas
- updates failed to be pushed to local/remote replicas
Metrics are kept on a per-table basis, i.e. updates_pushed_remote
shows the number of total updates (mutations) pushed to all paired
mv replicas that this particular table has.
Every single update is taken into consideration, so if view update
requires removing a row from one view and adding a row to another,
it will be counted as 2 updates.
References #3385
References #3416
Compactor collects all currently active memtables and later replaces
them with the merged result. The problem is that active memtable
belongs to the input set during compaction and as a result mutations
applied concurrently with compaction could be lost once compaction
replaces the memtables. The fix is to open a new active memtable when
compaction starts.
Caused sporadic failures of row_cache_test.cc:test_continuity_is_populated_when_read_overlaps_with_older_version()
Message-Id: <1526997724-13037-1-git-send-email-tgrabiec@scylladb.com>
If we see an exception when adding or removing SSTables from the backlog
tracker, the backlog tracker can be inconsistent forever. It would be
best if we act before that happens and disable the backlog tracker. Once
the backlog tracker is disabled it will default to returning a fixed
number of shares.
We can either disable the backlog tracker or remove it. But if we remove
it we can end up with a backlog of zero if that's the only tracker with
a backlog. We then keep it registered but mark it as disabled. This also
leaves room for recovery in some situations: we can recover the backlog
by a doing a schema change in the column family that had the backlog
disabled, for instance.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Backlog calculations should be exception-free, but there are cases in
which I can see them happening. One example is if some backlog tracker
that uses temporary objects fails an allocation.
Memory shortages can be especially pernicious: if we leave the
responsibility of catching those to the individual backlog tracker, we
will keep trying to make more allocations in the other backlog trackers
if we have many column families. By handling it here we can stop that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
A negative backlog can be interpreted as a very large backlog.
Part of that is because we keep the total_size as an unsigned type,
which is what we expect. But in case there is an issue, like an
exception that causes some SSTable not to be tracked, then this size
can become negative. Returning a zero backlog is better than allowing
it to be interpreted as a giant number.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This attribute ended up being unused in the final version.
Spotted now while reading the code for other purposes.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
It's very common for other strategies to include a SizeTiered
step somehow inside their algorithms: LCS will do SizeTiered on
L0, TWCS will do SizeTiered within a window, etc.
To make it easier for those strategies to consume the SizeTiered
backlog tracker, we will move that to its own file.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
It was already unused before this series. In an earlier version I have
used it to provide an ad-hoc backlog for major compactions. But now that
this is done by the compaction manager, this class really isn't being
used.
And it is likely it won't be: major compaction is not a compaction
strategy a user can choose, unlike the others that need to be built
through make_compaction_strategy.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have observed the following behavior with user initiated compactions,
like major compactions:
- if there are no writes, the backlog doesn't increase.
- as compaction progresses the backlog decreases.
- at some point, the backlog is so low that compaction barely makes any
progress.
Going forward, we should allow one to read from the generated partial
SSTables, in which case this doesn't matter that much. But for
user-initiated compactions we would like to guarantee a minimum baseline.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are situations in which we want the controllers to stop working
altogether. Usually that's when we have an unimplemented controller or
some exception.
We want to return fixed shares in this case, but this is a very
different situation from when we want fixed shares for *one* backlog
tracker: we want to return fixed shares, yes, but if we disable 200
backlog trackers (because they all failed, for instance), we don't want
that fixed number x 200 to be our backlog.
So the mechanism to globally disable the controller is still warranted,
and infinity is a good way to represent that. It's a float that the
controller can easily test against. But actually using infinity in the
code is confusing: people reading it may interpret it the other way
around, as just meaning "a very large backlog".
Let's turn that into a constant instead. It will help us convey meaning.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There was recently an attempt to add minimum shares to major compactions
which ended up being harder than it should be due to all the plumbing
necessary to call the compaction controller from inside the compaction
manager-- since it is currently a database object. We had this problem
again when trying to return fixed shares in case of an exception.
Taking a step back, all of those problems stem from the fact that the
compaction controller really shouldn't be a part of the database: as it
deals with compactions and its consequences it is a lot more natural to
have it inside the compaction manager to begin with.
Once we do that, all the aforementioned problems go away. So let's move
it there, where it belongs.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Fixes#3446
Previously, only shutdown-synced objects were actually closed,
which is wrong.
This introduces yet another queue, processed together with the
deletion objects, which ensures we explicitly close all objects
that have been discarded.
Message-Id: <20180521140456.32100-1-calle@scylladb.com>
"This series leverages hinted handoff for failed view replica
updates."
* 'materialized_view_updates_with_hh_5' of https://github.com/psarna/scylla:
storage_proxy: enable hinted handoff for materialized views
storage_proxy: make view updates use consistency_level::ANY
row::find_cell() may be called for cells that do not exist in that row.
In such a case nullptr is returned; this patch makes sure that
it is not dereferenced.
Message-Id: <20180522091726.24396-1-pdziepak@scylladb.com>
There are some situations in which we want to force a specific amount of
shares and don't have a backlog. We can provide a function to get that
from the controller.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
* seastar a6cb005...5da5d4e (6):
> append_challenged_posix_file_impl: Ensure continuation uses non-stale object
> utils: make make_visitor() public
> tcp: Adjust receive window
> tcp: Fix allowed sending size calculation in can_send
> tcp: Fix assert in tcp::tcb::output_one
> be more descriptive with failed syscalls for filesystem operations
Contains alternative fix for #3446 (will also be fixed directly).
This commit initializes and enables hinted handoff for materialized
views, even if HH is not explicitly turned on in config.
User writes still use hinted handoff only if it is explicitly enabled,
while materialized views are allowed to use it unconditionally
in order to store failed replica updates somewhere.
Fixes#3383
This commit makes view replica updates internally use consistency
level ANY, so in case an update fails it will fall back to hinted
handoff.
References #3383
install(1) creates missing directories on recent Fedora, but not
on CentOS 7. This causes the RPM build (which installs to a pristine
tree, without an existing /etc) to fail.
Fix by setting up /etc.
Tests: rpm (Fedora, CentOS)
Message-Id: <20180520124937.20466-1-avi@scylladb.com>
There is no need to call dht::split_ranges_to_shards to split the token
range into <shard> : <a lot of small ranges> mapping and create a flat
mutation reader with a lot of small ranges.
Because:
1) The flat mutation reader on each shard only returns data that belongs
to the local shard, so there is no correctness issue if we do not split
and feed only the sub-ranges that belong to the local shard.
2) With murmur3_partitioner_ignore_msb_bits = 12, it is almost certain
that given a token range, all the shards will have data for the range
anyway. Even if we ask all the shards to work on the token range and
some of the shards have no data for it, it is fine. We simply send no
data from this shard.
Tests: update_cluster_layout_tests.py
Message-Id: <ac00cd21d6156c47b74451dd415d627481e48212.1526864222.git.asias@scylladb.com>
In streaming, the sender sends the mutations on all the local shards in
parallel, so it is possible that the receiver handles more than one such
connection on the same shard. This is determined by where the tcp
connection goes: the current rpc ignores the destination shard id when
sending the rpc message.
For instance, say node1 has 2 shards, node2 has 2 shards. Currently, we
can end up with something like this:
Node 1 shard 0 -> Node 2 shard 1
Node 1 shard 1 -> Node 2 shard 1
It is better if we do:
Node 1 shard 0 -> Node 2 shard 0
Node 1 shard 1 -> Node 2 shard 1
This patch solves this problem by letting the handler always handle on
shard = src_cpu_id % smp::count.
If the sender and receiver have the same shard config, the work is
distributed completely evenly.
If they do not have the same shard config, it is unavoidable that some
of the shards will do more work than the others.
Tests: dtest update_cluster_layout_tests.py
Message-Id: <911827bcf67459a07ec92623a9ed4c4fbba195ca.1524622375.git.asias@scylladb.com>
Fixes#2793
Prints error handle class (commitlog or "other/disk") + exception
type and message. While not exhaustive, at least gives a correlation
point to (hopefully) other log printouts.
Message-Id: <20180509081040.7676-1-calle@scylladb.com>
"
For compression, SSTables 3.x format uses CRC32 for checksumming
compressed chunks as well as for calculating the full file checksum.
Also, while for older formats "full checksum" of a compressed data file
means a combination of checksums of its compressed chunks, in SSTables
3.x this now reads literally and assumes the checksum of all bytes
written, including per-chunk digests.
Tests: unit {debug, release}
"
* 'projects/sstables-30/write-compression/v3' of https://github.com/argenet/scylla:
tests: Add unit tests for writing compressed SSTables 3.x.
tests: Validate Digest32.crc for SSTables 3.x write tests.
tests: Fix invalid Digest file for write_counter_table test.
sstables: Support writing compressed SSTables 3.0.
sstables: Make compressed streams customizable on checksumming.
sstables: Move checksum calculation logic to compressed_output_stream.
Previously, compressed_output_stream used to calculate the checksum of the
supplied chunk and pass it to the 'compression' object to combine with
the full checksum calculated on prior writes.
Now, all the checksum calculation happens inside
compressed_output_stream and 'compression' only stores the result.
This is done to loosen ties between two classes and simplify
compressed_output_stream customisation with various checksum algorithms.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
We are currently moving the pointer we acquired to the segment into
the lambda in which we'll handle the cycle.
The problem is, we also use that same pointer inside the exception
handler. If an exception happens we'll access it and we'll crash.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180518125820.10726-1-glauber@scylladb.com>
* tag 'tgrabiec/fixes-and-improvements-for-gdb-scripts-v1' of github.com:tgrabiec/scylla:
gdb: Print live object size from 'scylla lsa-segment'
gdb: Extend 'scylla segment-descs' output with full occupancy info
gdb: Print allocated object's type name instead of full LSA migrator
gdb: Fix LSA migrator discovery
gdb: Drop code related to LSA zones
gdb: Fix uses of removed segment_desctriptor::_lsa_managed
lsa: Add use for debug::static_migrators
Move code to a traditional install.sh script (more traditional would be
a "make install", but this is close enough).
This allows testing installation independently of packaging. In addition,
non-Red Hat-packaging can share much of the code in install.sh.
Ref #3243.
Tests: build+install rpm
Message-Id: <20180517114147.30863-1-avi@scylladb.com>
This parameter is not available on recent Red Hat kernels or on
non-Red Hat kernels (it was removed on 3.10.0-772.el7,
RHBZ 1455932). The presence of the parameter on kernels that don't
support it causes the module load to fail, with the result that the
storage is not available.
Fix by removing the parameter. For someone running an older Red Hat
kernel the effect will be that discard is disabled, but they can fix
that by updating the kernel. For someone running a newer kernel, the
effect will be that they can access their data.
Fixes#3437.
Message-Id: <20180516134913.6540-1-avi@scylladb.com>
"
Main optimization is in the patch titled "lsa: Reduce amount of segment compactions".
I measured a 50% reduction of cache update run time in a steady state for an
append-only workload with large partition, in perf_row_cache_update version from:
c3f9e6ce1f/tests/perf_row_cache_update.cc
Other workloads and other allocation sites could probably also see the
improvement.
"
* tag 'tgrabiec/reduce-lsa-segment-compactions-v1' of github.com:tgrabiec/scylla:
lsa: Expose counters for allocation and compaction throughput
lsa: Reduce amount of segment compactions
lsa: Avoid the call to segment_pool::descriptor() in compact()
lsa: Make reclamation on reserve refill more efficient
Fixes#3339
* seastar 840002c...0a1a327 (7):
> Merge "fix perftune.py issues with cpu-masks on big machines" from Vlad
> Merge 'Handle Intel's NICs in a special way' from Vlad
> reactor: fix calculation of idle ticks
> log: streamline logging internals a little
> Merge "CMake imrovements and compatibility" from Jesse
> iotune: fix typo in property name
> cmake: do not find_package(Boost ...) if Boost is a target
"
This patchset adds support for writing counter cells in SSTables 3.x
format ('m'). The logic of writing counters is almost identical to that
used for the old 2.x format ('k'/'l') with the only difference that the
data length preceding serialised shards is written as a vint.
Tests: unit {release}.
Generated SSTables are verified to be processed fine by sstabledump
(note that sstabledump only outputs the binary data for counters, not
their actual values, same as sstable2json).
Verified with Cassandra 3.11 to get the expected values from the
counters table:
cqlsh> SELECT * from sst3.counter_table;
pk | ck | rc1 | rc2
-----+-----+-----+-----
key | ck1 | 10 | 1
(1 rows)
Verified that the deleted counter can no longer be updated:
cqlsh> use sst3 ;
cqlsh:sst3> UPDATE counter_table SET rc1 = rc1 + 2 WHERE pk = 'key' AND ck = 'ck2';
cqlsh:sst3> SELECT * from sst3.counter_table;
pk | ck | rc1 | rc2
-----+-----+-----+-----
key | ck1 | 10 | 1
(1 rows)
"
* 'projects/sstables-30/write_counters/v1' of https://github.com/argenet/scylla:
tests: Unit tests to cover writing counters in SSTables 3.x format.
sstables: Support writing counters for SSTables 3.x.
sstables: Move code writing counter value into a separate helper.
The sstable tests fail when run concurrently (for example, in release and
debug mode) because they use a static temporary dir in lots of tests.
Let's fix it by switching to a dynamic temporary dir, created using
mkdtemp(). The sstable tests will now also run in /tmp, which makes them
much faster.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180516042044.15336-1-raphaelsc@scylladb.com>
Reclaiming memory through segment compaction is expensive. For
occupancy of 85%, in order to reclaim one free segment, we need to
compact 7 segments, by migrating 6 segments worth of data. This results
in significant amplification. Compaction involves moving objects,
which in some cases is expensive in itself as well
(See https://github.com/scylladb/scylla/issues/3247).
This patch reduces the amount of segment compactions in favor of doing
more eviction. It especially helps workloads in which LRU order
matches allocation order, in which case there will be no segment
compaction, and just eviction.
In perf_row_cache_update test case for large partition with lots of
rows, which simulates appending workload, I measured that for each new
object allocated, 2 need to be migrated, before the patch. After the
patch, only 0.003 objects are migrated. This reduces run time of
cache update part by 50%.
"
The memory estimations we have when using the chunked vector
are usually slightly wrong. We can make them more accurate by
exporting the memory usage directly as a chunked_vector API.
"
* 'chunked_memory-v2' of github.com:glommer/scylla:
large_bitset: be more accurate with memory usage
chunked_vector: exports its current memory usage
We are slightly underestimating the amount of memory we use. Now that
the chunked vector can export its internal memory usage, we can use that
directly.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are times in which we would like to estimate how much memory
a chunked_vector is using. We have two strategies to do it:
1) multiply the size by the size of the elements. That is wrong, because
the chunked_vector can allocate larger chunks in anticipation of more
elements to come.
2) multiply the number of chunks by 128kB. That is also wrong, because
the chunked_vector will not always allocate the entire chunk if there are
only a few elements in it.
The best way to deal with it is to allow the chunked_vector to export
its current memory usage.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Commit 9eb8ea8b11 installed
scylla_blocktune.py as part of preparing the rpm, but forgot
to add it to the installed file list, breaking the rpm build.
Fix by listing the file in the %files section.
Message-Id: <20180506202807.5719-1-avi@scylladb.com>
"
SSTables 3.x (format 'm') use CRC32 instead of Adler32 for calculating
checksums. This patchset introduces support for CRC32 along with Adler32
in checksummed_file_writer to be used for SSTables written in 'mc'
format.
Structures and helpers introduced for CRC32 will be later used for
calculating checksums for compressed files as well (not a part of this
patchset).
Tests: unit {release}
"
* 'projects/sstables-30/write-digest-crc/v3' of https://github.com/argenet/scylla:
tests: Add test covering checksumming SSTables 3.0 with CRC32.
sstables: Support CRC32 checksum for SSTables 3.x.
sstables: Move adler32 routines under the scope of a class.
sstables: Move checksum utils into separate header.
sstables: Remove unused 'checksum_file' flag from checksummed_file_writer.
"
The native protocol server generates many reactor tasks that
can be easily eliminated. I measured a read workload with 100%
cache hit rate, seeing the number of tasks per request drop
from ~31 to ~27, and an increase of 3% in throughput.
"
* tag 'transport-optimize-1/v1' of https://github.com/avikivity/scylla:
transport: remove unused capture of flags variable
transport: merge response write and error handling continuations
transport: make write_repsonse() return void
transport: de-template a lambda
transport: merge memory-management and logging continuations
transport: remove gate continuation
transport: merge two response processing continuations
transport: simplify response processing continuation
transport: remove gratuitous continuation from process_request_one()
Remove implicit timeouts and replace with caller-specified timeouts.
This allows removing the ambiguity about what timeout a statement is
executed with, and allows removing cql_statement::execute_internal(),
which mostly overrode timeouts and consistency levels.
Timeout selection is now as follows:
query_processor::*_internal: infinite timeout, CL=ONE
query_processor::process(), execute(): user-specified consistency level and timeout
All callers were adjusted to specify an infinite timeout. This can be
further adjusted later to use the "other" timeout for DCL and the
read or write timeout (as needed) for authentication in the normal
query path.
Note that infinite timeouts don't mean that the query will hang; as
soon as the failure detector decides that the node is down, RPC
responses will terminate with a failure and the query will fail.
Make the consistency level explicit in the caller in order to clarify
what is going on.
An "internal" query used to mean that it was accessing local tables,
so infinite timeouts and a consistency level of ONE were indicated,
but authentication accesses non-local tables so explicit consistency
level and timeouts are needed.
It just schedules the response, and returns immediately.
(I thought about calling it schedule_response(), but usually it will
write the response immediately, since waiting for network writes is
rare in a local network).
with_gate() generates a continuation if the protected function defers.
Avoid that by merging a gate::leave() call with another, preexisting,
continuation.
We have one continuation transforming the result, and another shutting
down tracing. Since the first cannot defer, we can merge the two, reducing
the number of tasks processed by the reactor.
This patch fixes a bug where queries using a secondary index would, in
some cases, produce the same rows multiple times.
The problem was that the code begins by finding a list of primary keys
that match the search, and then works on the partitions containing them.
If multiple rows matched in the same partition, the partition was considered
multiple times, and the same rows were output multiple times.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180510203141.17157-1-nyh@scylladb.com>
If we're given a single reader (can be common in a low-write-rate table,
where most of the data will be in a single large sstable, or in leveled
tables) then we can avoid the overhead of the combining reader by returning
the single input.
Tests: unit (release)
Message-Id: <20180513130333.15424-1-avi@scylladb.com>
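The pass-through idea can be sketched like this (a minimal sketch with hypothetical names, not Scylla's actual reader API):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a mutation reader; the 'combined' flag only exists
// here so the sketch can show which path was taken.
struct reader { bool combined = false; };

// Sketch of the optimization: when asked to combine a single reader,
// return it directly instead of wrapping it, so the common
// one-sstable case pays no combining overhead.
std::unique_ptr<reader>
make_combined_reader(std::vector<std::unique_ptr<reader>> readers) {
    if (readers.size() == 1) {
        return std::move(readers.front());  // pass through untouched
    }
    auto r = std::make_unique<reader>();
    r->combined = true;  // real code would merge the inputs here
    return r;
}
```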
"
This patchset implements separate timeouts for range queries, and lays
the foundations for separate timeouts for other query types.
While the feature in itself is worthy, the real motivation is to have
the timeouts decided by the caller, instead of storage_proxy. This in
turn is required to disentangle each layer behaving differently
depending on whether the query is internal or not; instead, the goal
is to have each caller declare its needs in terms of consistency level
and timeouts, and have the lower layers implement those requirements
instead of making their own decisions.
Fixes #3013.
Tests: unit (release)
"
* tag '3013/v1.1' of https://github.com/avikivity/scylla:
storage_proxy: remove default_query_timeout()
storage_proxy: don't use default timeouts
query_options: augment with timeout_config
thrift: configure thrift transport and handler with a timeout_config
transport: configure native transport with a timeout_config
cql3: define and populate timeout_config_selector
timeout_config: introduce timeout configuration
Before we accept running while not in developer mode, we verify that
the I/O Scheduler is properly configured. Up until now, that meant
verifying that --max-io-requests is properly set and that the number
of I/O Queues is enough to leave at least 4 requests per I/O Queue.
Systems that move to newer versions of Scylla may continue doing that,
so we need to be backwards compatible and keep testing for that.
However, newer systems will not set that option, but pass a YAML
property file (or string) instead. So we need to make sure that
either one of those is set.
If the property file is set, I am deciding here not to test for
number of I/O queues. scylla_io_setup will usually configure that
anyway, plus we plan on soon moving to all-shards-dispatch making
that less important.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180509163737.5907-1-glauber@scylladb.com>
There is an ongoing discussion in issue 2678 about the right time to
release permits. Right now we are releasing the permit after we flush
all data for the memtable plus the SSTables accompanying components -
plus flushing them, closing them, etc.
During all that time, we are increasing virtual dirty by adding more
data to the buffers but we are not able to decrease it; until we
release the permit we can't start flushing the next memtable. This is
much more of a concern than I/O overlapping as described in the issue.
We have a hook in the SSTable write process that is (should be) called
as soon as data is written. We should move the permit release there.
We aren't, though, calling that as early as we could. The call to the
data-written hook happens only after the Index is closed, the summary is
sealed, etc.
This patch fixes that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180508182746.28310-2-glauber@scylladb.com>
Currently reserve refill allocates segments repeatedly until the
reserve threshold is met. If single segment allocation needs to
reclaim memory, it will ask the reclaimer for one segment. The
reclaimer could make better decisions if it knew the total number of
segments we try to allocate. In particular, it would not attempt to
compact any segment until it evicts the total amount of memory first,
which may reduce the total amount of segment compactions during
refill.
This patch changes refill to increase reclamation step used by
allocate_segment() so that it matches the total amount of memory we
refill.
We have a conflict between scylla-libgcc72/scylla-libstdc++72 and
scylla-libgcc73/scylla-libstdc++73; we need to replace the *72 packages
with the scylla-2.2 metapackage to prevent it.
Fixes#3373
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180510081246.17928-1-syuu@scylladb.com>
"Previously, partition tombstone was not written for partitions with no
rows causing corrupted data files.
This is now fixed and covered with tests.
In addition, we now track partition tombstones while collecting encoding
statistics."
* 'projects/sstables-30/fix-partition-tombstone/v3' of https://github.com/argenet/scylla:
tests: Don't use deprecated schema constructor.
tests: Add tests to cover partitions consisting only of partition keys.
sstables: Make sure partition level tombstone is written for partitions with no rows.
memtable: Collect statistics from partition-level tombstone.
"
This is preparatory cleanup series with fixes/cleanup of miscellaneous
issues that I discovered while working on the stateful range-scans.
Since the stateful range-scans series, even without these patches, is a
20+ patches strong series I'd like to fast-track this, to ease reviewing
the former.
Most of the changes here are related to code hygiene and efficiency,
and there is a patch that is correctness-related ("querier: check only
the end bound of ranges when matching them") and one that is related to
ease-of-use ("range: clean the deduced transformed type").
Note that although these changes were made in the context of working on
the stateful range-scans they make sense on their own as well.
Tests: unit(release, debug)
"
* '1865/pre-range-scans-cleanup/v1' of https://github.com/denesb/scylla:
multishard_combining_reader: use optimized optional for the shard reader
Use dht::token_range alias for last/preferred replicas
storage_proxy::coordinator_query_result: merge constructors into one w/ default params
querier: check only the end bound of ranges when matching them
querier: take range and slice by value
querier: remove const params from make_compaction_state()
querier: make _range and _slice const
flat_multi_range_mutation_reader: optimize for non-plural range vectors
range: clean the deduced transformed type
"
Fixes #3420.
Tests: dtest (`auth_test.py`), unit (release)
"
* 'jhk/fix_3420/v2' of https://github.com/hakuch/scylla:
cql3: Include custom options in LIST ROLES
auth: Query custom options from the `authenticator`
auth: Add type alias for custom auth. options
"
These patches were extracted from a much larger series that introduces a
new in-memory representation of cells. They contain various enhancements
and fixes that, to a varying degree, make sense on their own. Sending them
separately will hopefully ease the review and merging process of the whole
IMR effort.
Tests: unit(release).
"
* tag 'pre-imr/v1' of https://github.com/pdziepak/scylla:
tests/perf: add microbenchmarks for basic row operations
tests: simple_schema: add make_row_from_serialized_value()
row: add clear_hash()
types: move compare_unsigned() to bytes.hh
lsa: provide migrator with the object size
lsa: add free() that does not require object size
db/view/build_progress: avoid copying mutation fragment
mutation_partition: enable ADL for cell swap
types: make some collection_type_impl functions non-static
counters: drop revertability of apply()
mutable_view: add default constructor and const_iterator
tests/mutation_reader: do not apply mutations created on another shard
sstables: do not call atomic_cell::value() for dead cells
lsa: sanitize use of migrators
lsa: reuse registered migrator ids
lsa: make migrators table thread-local
The querier provides a `matches(const nonwrapping_range&)` member to
allow for checking whether a range matches that with which the querier
was originally created. The check for a match is more lax than a strict
equality check, as ranges are shrunk as the query progresses.
Because of this the above member only checked that one of the bounds of
the examined ranges matches. This is adequate for this purpose
because, in the context of a single query, it is guaranteed that no
two read requests to the same replica will have overlapping ranges.
However Avi pointed out in a recent, related review, that this check can
be made a little more strict by requiring that the end-bounds of the
two ranges *always* match, instead of allowing any of the bounds to
match.
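The idea can be sketched with plain integer bounds (a hedged sketch of the principle, not Scylla's querier or nonwrapping_range code):

```cpp
#include <cassert>
#include <optional>

// A range is [start, end]; nullopt models an open (infinite) bound.
struct range {
    std::optional<int> start;
    std::optional<int> end;
};

// As a query progresses the start of the stored range is shrunk
// forward past already-returned keys, so it no longer equals the
// start of the next request's range. The end bound stays fixed, so
// requiring the *end* bounds to be equal is a stricter check that
// still accepts legitimately resumed ranges.
bool matches(const range& stored, const range& requested) {
    return stored.end == requested.end;
}
```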
Don't create a flat_multi_range_mutation_reader when the range vector
has 0 or 1 element. In the former case create an empty reader and in the
latter just create a reader with the mutation-source with the only range
in the vector.
wrapping_range and nonwrapping_range offer a transform() member function
which allows creating a new range by applying a transformer function to
the bounds of the current range. The type of bounds of the new range is
deduced from the return type for this transformer function. However the
return type is used as-is, with any CV or reference attached to it.
Since it doesn't make sense to create a range of references or a type
with CV qualifiers strip these off the deduced type.
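The stripping can be sketched with a type alias (a sketch of the idea; the alias name and helper function are illustrative, not the actual range code):

```cpp
#include <cassert>
#include <type_traits>

// Deduce the transformed bound type from the transformer's return
// type, but strip references and CV qualifiers, so a transformer
// returning `const T&` yields a range of plain T rather than a
// (nonsensical) range of references or CV-qualified values.
template <typename Bound, typename Transformer>
using transformed_bound_t = std::remove_cv_t<
    std::remove_reference_t<std::invoke_result_t<Transformer, Bound>>>;

// A transformer returning a const reference, for illustration.
inline const int& identity_ref(const int& x) { return x; }
```

Without the `remove_cv_t`/`remove_reference_t` pair, the deduced bound type for `identity_ref` would be `const int&`; with it, the result is plain `int`.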
An implementation of `authenticator` can support custom options for
each role.
If, to make up an example, the authenticator supported the `region` key,
then a role would be created as follows:
CREATE ROLE jsmith WITH OPTIONS = { 'region': 'north_america' }
AND PASSWORD = 'super_secure';
LIST ROLES will now print this custom option map as an additional column
with the heading "options".
However, none of the implementations of `authenticator` in Scylla
currently support OPTIONS, so LIST ROLES will in practice, for now,
print the empty set:
role | super | login | options
-----------+-------+-------+---------
cassandra | True | True | {}
None of the `authenticator` implementations we have support custom
options, but we should support this operation to support the relevant
CQL statements.
simple_schema::make_row() is not very well suited for performance tests
of row and cell creation since it serialises the value. This patch
introduces a new function that performs only minimal actions.
compare_unsigned() is a general utility function that compares two
bytes_view objects byte-by-byte. There is no need to include the whole
type.hh in order to make it available.
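The byte-wise comparison can be sketched as follows (a hedged sketch using std::string_view in place of bytes_view; Scylla's actual implementation may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string_view>

// Compare two byte sequences as unsigned bytes, memcmp-style:
// returns <0, 0, or >0. memcmp already compares bytes as unsigned,
// so a shared prefix decides the order, and otherwise the shorter
// sequence sorts first.
int compare_unsigned(std::string_view a, std::string_view b) {
    auto n = std::min(a.size(), b.size());
    if (n) {
        int d = std::memcmp(a.data(), b.data(), n);
        if (d) return d;
    }
    return (a.size() > b.size()) - (a.size() < b.size());
}
```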
While the migration function should have enough information to obtain
the object size itself, the LSA logic needs to compute it as well.
IMR is going to make calculating object sizes more expensive, so by
providing the information to the migrator we can avoid some needless
operations.
It is non-trivial to get the size of an IMR object. However, the
standard allocator doesn't really need it and LSA can compute it itself
by asking the migrator.
Calling fully qualified std::swap() prohibits the cell objects from
using their own swap implementations. This patch invokes std::swap in
the usual ADL-friendly way.
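The ADL-friendly idiom looks like this (a minimal standalone sketch; the `demo::cell` type is made up for illustration):

```cpp
#include <cassert>
#include <utility>

namespace demo {
struct cell {
    int v;
    bool swapped_via_adl = false;
};
// A type-specific swap, found via argument-dependent lookup.
void swap(cell& a, cell& b) {
    std::swap(a.v, b.v);
    a.swapped_via_adl = b.swapped_via_adl = true;
}
}

// The idiom: bring std::swap in as a fallback, then make an
// *unqualified* call so ADL can pick demo::swap for demo::cell.
// A fully qualified std::swap(a, b) would bypass demo::swap.
template <typename T>
void adl_swap(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```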
The switch to the new in-memory representation will require a larger
parts of the logic be aware of the type of the values they are dealing
with. In most cases it is not a significant burden for the users.
Scylla uses a shared-nothing architecture and communication between the
shards is supposed to be very restricted. Applying mutations created on
another shard to a memtable is far too complex an operation to be
allowed. Using frozen mutations is a much safer option.
Having migrators dynamically registered and deregistered opens a new
class of bugs. This patch adds some additional checks in the debug mode
with the hopes of catching any misuse early.
With the introduction of the new in-memory representation we will get
type- and schema-dependent migrators. Since there is no bound how many
times they can be created and destroyed it is better to be safe and
reuse registered migrator ids.
"
The SSTable 3.0 format introduces a serialization header which is used in reading SSTables in that format.
This patchset implements loading of this new component of Statistics.db.
Tests: units (release)
"
* 'haaawk/sstables3/load_serialization_header_v2' of ssh://github.com/scylladb/seastar-dev:
Load serialization_header from statistics
Add parse for disk_array_vint_size
Add helpers to read/parse vints
Add signed_vint::serialized_size_from_first_byte
Add sstable::get_serialization_header
Move random_access_reader to separate header
250ms is too high of a period for the memtable controller. Since memtable
flushes are relatively efficient, especially in comparison to
compactions, if the shares are high we can flush a lot of data down with
the high shares - so in the next adjustment period our shares will be
minuscule and we won't flush much at all.
This leads to oscillating behavior that is mitigated by adjusting
faster.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180508182746.28310-3-glauber@scylladb.com>
"Fixes a bug in partition_snapshot::merge_partition_versions(), which would not
attempt merging if the snapshot is attached to the latest version (in which
case _version is nullptr and _entry is != nullptr). This would cause
partition_version objects to accumulate if there was an older snapshot and it
went away before the latest snapshot. Versions will be removed when the whole
entry goes away (flush or eviction).
May cause performance problems.
Fixes #3402."
* 'tgrabiec/fix-merge_partition_versions' of github.com:tgrabiec/scylla:
mvcc: Test version merging when snapshots go away
anchorless_list: Make ranges conform to SinglePassRange
anchorless_list: Drop deprecated use of std::iterator
mvcc: Fix partition_snapshot::merge_partition_versions() to not leave latest versions unmerged
The newer version of iotune, recently merged to Seastar, accepts
a new parameter that tells us where we should store the properties
about the disk.
We are already generating that properties file for the AMI case.
Let's also pass that parameter when calling iotune.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180507175757.9144-1-glauber@scylladb.com>
Starting from 2018.1 and 2.2 there was a change in the repository path.
It was made to support multiple products (like manager) and to place the
enterprise in a different path.
As a result, the regular expression that looks for the repository fails.
This patch changes the way the path is searched: both rpm and debian
variations are combined and both options of the repository path are
unified.
See scylladb/scylla-enterprise#527
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180429151926.20431-1-amnon@scylladb.com>
"
In SSTables 3.0, the base and increment fields have been swapped in
Bloom filters to reduce collisions (see CASSANDRA-8413). This affects
the resulting values written to Filter.db.
This patchset adds support for reading/writing Filter.db in the format
corresponding to the version of SSTables.
Tests: unit {release}
Filter.db files have been generated using Cassandra 3.11 with same data
as in unit tests and are validated to match those generated by Scylla.
"
* 'projects/sstables-30/write-filter/v1-2' of https://github.com/argenet/scylla:
Fix mistakes and typos in comments (minor clean-up)
Check Filter.db in SSTables 3.x write tests.
Support Bloom filter format used in SSTables 3.0.
Remove unused overload of i_filter::get_filter().
The two hash values, base and increment, used to produce indices for
setting bits in the filter, have been swapped in SSTables 3.0.
See CASSANDRA-8413 for details.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
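The effect of the swap can be sketched with the standard double-hashing index computation (bit i set at `(base + i * increment) mod num_bits`). The helper below is illustrative only; the names, and which of the two hashes plays the base role in each format, are assumptions rather than Scylla's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Double-hashing bit-index generation for a Cassandra-style bloom
// filter: from two 64-bit hashes, bit i is (base + i * increment)
// mod num_bits. Swapping which hash is the base and which is the
// increment (CASSANDRA-8413) changes every index, and therefore the
// bytes written to Filter.db.
std::vector<uint64_t> filter_indexes(uint64_t h0, uint64_t h1,
                                     int hash_count, uint64_t num_bits,
                                     bool swapped) {
    uint64_t base = swapped ? h1 : h0;
    uint64_t increment = swapped ? h0 : h1;
    std::vector<uint64_t> idx;
    for (int i = 0; i < hash_count; ++i) {
        idx.push_back(base % num_bits);
        base += increment;  // wraps modulo 2^64 by design
    }
    return idx;
}
```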
Fixes crash in cql_tests.StorageProxyCQLTester.table_test
"avoid race condition when deleting sstable on behalf..." changed
discard_sstables behaviour to only return replay positions for sstables
owned and submitted for deletion (not all matching the timestamp),
which can in some cases result in zero replay positions being returned.
Message-Id: <20180508070003.1110-1-calle@scylladb.com>
When node is decommissioned/removed it will drain all its hints and all
remote nodes that have hints to it will drain their hints to this node.
What does "drain" mean? The node that "drains" hints to a specific
destination will ignore failures and will continue sending hints until
the end of the current segment, erase it, and move on to the next one
until there are no more segments left.
After all hints are drained the corresponding hints directory is removed.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Returning a future with an exception from end_point_manager::stop()
is practically useless: the best the caller can do is log the exception
and continue as if it didn't happen, because it has other things to
shut down.
Therefore, to simplify the caller, we log the exception if it happens
and always return a non-exceptional future.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This ensures we respect the write timeout set by the client when
applying base writes, in case a write takes too long to acquire the
row lock for the read-before-write phase of a materialized view
update.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180507132755.8751-1-duarte@scylladb.com>
* seastar ac02df7...840002c (20):
> dpdk: protect against missing statistics
> alien: make visible in documentation
> Merge "rewrite iotune to conform to the new ioscheduler" from Glauber
> app_template: Correct outdated comment
> apps, tests: Catch polymorphic exceptions by reference
> configure.py: Enhance detection for gcc -fvisibility=hidden bug
> reactor: add rudimentary task histogram reporting
> Revert "Merge "rewrite iotune to conform to the new ioscheduler" from Glauber"
> Merge "rewrite iotune to conform to the new ioscheduler" from Glauber
> build: Use the same warning name for Clang and GCC
> core/rwlock: Add support for timeouts
> fs qualification: protect against EINTR
> Docker: Fix failing build due to missing GNU make
> reactor: move optional to experimental so we compile with c++14
> future: remove allocation from future::get() thread context switch
> Merge "rpc streaming" from Gleb
> reactor: put mountpoint_params in seastar namespace
> Tutorial: in PDF version of tutorial, better backtick typesetting
> tutorial: support, and start using, links to other sections
> tutorial: improve second half of semaphores section
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a simple and naive mechanism to ensure a base replica
doesn't overwhelm a potentially overloaded view replica by sending too
many concurrent view updates. We add a semaphore to limit to 100 the
number of outstanding view updates. We limit globally per shard, and
not per destination view replica. We also limit statically.
Refs #2538
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180426134457.21290-2-duarte@scylladb.com>
The functions in cql_assertions.hh are very convenient, but have one
frustrating drawback: When you have many of those assertions in one
test, it's very hard to know *which* of the similar assertions failed.
The problem is that an error often looks like this:
unknown location(0): fatal error: in "test_many_columns":
std::runtime_error: Expected 2 row(s) but got 0
tests/cql_assertions.cc(131): last checkpoint
Which of the many similar checks in "test_many_columns" failed? Note the
unhelpful "unknown location" and also the "last checkpoint" points to code
in cql_assertions.cc, not in the actual test, so it is useless.
The root cause of these problems is that the Boost macros use the C
preprocessor's __FILE__ and __LINE__, which, inside actual C++ functions
like is_rows(), record that function's location instead of the caller's.
Fixing this will not be simple. But this patch has a much simpler
solution: fixing the
"last checkpoint". What ruins the last checkpoint is the use of BOOST_REQUIRE
inside is_rows() in cql_assertions.cc: when that succeeds, it records
the location inside cql_assertions.cc (!) as the last success.
If we just replace BOOST_REQUIRE by our own test (just like in the rest of
the cql_assertions.cc code), this code will not override the last checkpoint.
The user can see the last real successful BOOST_REQUIRE, or use
BOOST_TEST_PASSPOINT() to set their own checkpoints between different parts of
the same test.
After this patch, and with adding BOOST_TEST_PASSPOINT() calls between
different parts of my test, the failure above now looks like:
unknown location(0): fatal error: in "test_many_columns":
std::runtime_error: Expected 2 row(s) but got 0
tests/secondary_index_test.cc(299): last checkpoint
The "last checkpoint" now shows me exactly where my failing check was.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501152638.26238-1-nyh@scylladb.com>
"This patchset adds ostream operators to result_message and uses them
in cql_assertions."
* tag 'result_message-print/v1.1' of https://github.com/avikivity/scylla:
tests: cql_assersions: improve error message when a row is not found
transport: add ostream support to result_message
transport: const correctness for result_message::accept()
Use utf8_type where warranted.
Fixes view_schema_test failure where the rows did not match. I don't
understand exactly why the failure happened (using the wrong type
should not cause a failure here), but the change fixes the problem.
Tests: view_schema_test (release)
Message-Id: <20180506130015.7450-1-avi@scylladb.com>
The visitor does not alter the result_message it is visiting (and
its signature indicates that) so accept() should be const-qualified
to indicate that and to allow visiting const result_message objects.
"
This patchset adds support for writing Statistics.db in the SSTables
'mc' (3.x) format. This file is essential for reading data stored in
Data.db as it contains base values used for delta encoding and types of
columns.
This patchset also fixes several bugs found in writing data and index
files as well as bugs in a statistics-related structure definition.
Tests: unit {debug, release}
All SSTables files for write unit tests are validated to be processed by
sstabledump and output is verified to show the expected data.
"
* 'projects/sstables-30/write-statistics/v1' of https://github.com/argenet/scylla:
Add test covering the composite partition key case.
Add Statistics.db files to write tests for SSTables 3.0.
Do not check rows and cells for expiration when writing them to the data file.
Fix promoted index serialization.
Fix the order of items in stats_metadata.
Fix timestamp_epoch value which was truncated on exceeding int32_t type limit.
Write serialization header to Statistics.db for SSTables 3.x.
Do not pass schema to metadata_collector::update(column_stats)
Collect metadata statistics when writing SSTables 3.0.
Call get_metadata_collector() instead of referencing sstable::_collector directly.
Fix logic of writing TTLed cells in SSTable 3.0 format.
Separate statistics for count of cells, columns and rows in column_stats.
Deserialize collection in a way that doesn't incur shared_ptr counter increment and is generally shorter.
Track both min & max values for timestamp, TTL and local deletion time in metadata_collector.
Add class for tracking both extremum values (min and max) on updates.
Mainly to check that the composite type is properly serialized when
writing serialization header to Statistics.db.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
For these tests to work, all time-related values are now fixed as these
are stored in Statistics.db files.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Although this logic may be seen as a useful optimization, it hinders
unit tests writing SSTables 3.0 as those need to have fixed time-related
values to produce Statistics.db files with the same content on each run.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
There is a new field introduced in the SSTables 3.0 index file format
named 'partition_header_length' that can be used to skip over to the
first clustering row in a wide partition. It had not previously been
written, which caused malformed indices.
Updated the corresponding test to include a static row and write
multiple wide partitions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
The serialization header is a new component in Statistics.db introduced in
the SSTables 3.0 ('ma') format. It is essential for reading the data file as it
contains the base values used for delta-encoded values (timestamps,
TTLs, local deletion times) and description of column types.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
After reboot, all existing sstables are considered shared. That's a safe default.
The reader used by compaction decides to use a filtering reader (which
filters out data that doesn't belong to this shard) if the sstable is
considered shared, even though it may actually be unshared.
By avoiding the filtering reader we avoid an extra check for each key, which
may be meaningful for compaction of tons of small partitions and even for
range reads of such. We do so by fixing sstable::_shared, which is now set
properly for existing sstables at start.
quick check using microbenchmark which extends perf_sstable with compaction mode:
before: 69407.61 +- 37.03 partitions / sec (30 runs, 1 concurrent ops)
after: 70161.09 +- 40.35 partitions / sec (30 runs, 1 concurrent ops)
Fixes #3042.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180504182158.21130-1-raphaelsc@scylladb.com>
"
This series introduces a system.large_partitions table,
used to gather information on the largest partitions in the cluster.
Schema below allows easy extraction of most offending keys and removal
by sstable name, which happens when a table is compacted away.
Schema: (
keyspace_name text,
table_name text,
sstable_name text,
partition_size bigint,
key text,
compaction_time timestamp,
PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key)
) WITH CLUSTERING ORDER BY (partition_size DESC);
"
Closes #3292.
* 'large_partition_table_3' of https://github.com/psarna/scylla:
database, sstables, tests: add large_partition_handler
db: add large_partition_handler interface with implementations
docs: init system_keyspace entry with system.large_partitions
db: add system.large_partitions table
This commit makes database, sstables and tests aware
of which large_partition_handler they use.
The proper large_partition_handler is retrieved from configuration
information and is based on the existing
compaction_large_partition_warning_threshold_mb entry. Right now the CQL
TABLE variant of large_partition_handler is used
in the database.
Tests use a NOP version of large_partition_handler, which does not
depend on CQL queries at all.
This commit introduces large_partition_handler class, which can be used
to take additional action when large partitions are written.
It comes with two implementations:
* NOP, used in tests, which does nothing on large partition
update/delete
* CQL TABLE, which inserts/deletes information on particular sstable
to system.large_partitions table, in order to be retrievable from
cqlsh later.
References #3292
This commit adds a system.large_partitions table, which can be used
to trace largest partitions of a cluster.
Schema: (
keyspace_name text,
table_name text,
sstable_name text,
partition_size bigint,
key text,
compaction_time timestamp,
PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key)
) WITH CLUSTERING ORDER BY (partition_size DESC);
References #3292
After the removal of the deletion manager, the caller is now responsible
for properly submitting the deletion of a shared sstable. That's because
the deletion manager was responsible for holding the deletion until all
owners agreed on it.
Resharding, for example, was changed to delete the shared sstables at the
end, but truncate wasn't changed, so a race condition could happen when
deleting the same sstable on more than one shard in parallel. Change the
operation to submit a shared sstable for deletion from only one owner.
Fixes dtest migration_test.TestMigration.migrate_sstable_with_schema_change_test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180503193427.24049-1-raphaelsc@scylladb.com>
A step to untie classes sstable_writer_m and sstable so that eventually
we could stop them being friends.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
SSTables 3.0 format makes a distinction between count of cells and count
of columns. In that sense, a column of a collection type counts as one
column but every atomic cell in it counts as a separate cell.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
test_multishard_combining_reader_destroyed_with_pending_create_reader
was failing because it relied on smp == 3 and thus the shard on which
the reader creation is blocked being shard 2. Since the test requires
smp >= 3 to run, we can hardcode this shard to be 2, because if the
test runs at all we are guaranteed to have smp >= 3.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <38883a1f4c18ca0cd065aa13826a4f1858353289.1525328233.git.bdenes@scylladb.com>
These tests are quite complicated and require intimate knowledge of how
foreign_reader and multishard_combining_reader operate. Knowing these
two objects is still required to understand the tests, but explaining how
they were designed to test what they test makes it that much easier.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8de580131a8652924de920c2bc68a98e579398ee.1525328226.git.bdenes@scylladb.com>
'shard' is a short-lived on-stack variable that gets captured by
reference by a continuation that gets executed on another shard.
Fixes a race condition that leads to a heap-use-after-free.
Message-Id: <20180502150507.2776-1-pdziepak@scylladb.com>
The test_foreign_reader_destroyed_with_pending_read_ahead test currently
doesn't ensure that the objects in its scope are destroyed in the
correct order. This is necessary as there are several foreign pointers
to objects that live on remote shards and use each other. Since
foreign pointers destroy their managed object in the background we
cannot rely on them to reliably destroy objects in order, nor can we be
sure when the object they manage is actually destroyed.
So to work around that, ensure that the puppet_reader is destroyed before
the remote_control it references even has a chance of being destroyed.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <232eaa899878b03fb2a765c2916e4f05841472a3.1525269726.git.bdenes@scylladb.com>
Test for Scylla's default choice of secondary index name (we found one
small problem, see issue #3403, and left it commented out). Also test
the ability to give indices non-default names.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501153439.26619-1-nyh@scylladb.com>
Add a test that adding a secondary-index for an only partition key column
is not allowed (it would be redundant), but indexing one of several partition
key columns *is* allowed. This reproduced issue #3404, and verifies that
it was fixed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501121544.22869-2-nyh@scylladb.com>
Indexing an only partition key component is not allowed (because it would
be redundant), but it should be allowed to index one of several partition
key components. We had a bug in that case: the underlying materialized view
we created had the same column as both a partition key and a clustering
key, which resulted in an assertion failure. This patch fixes that.
Fixes#3404.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501121544.22869-1-nyh@scylladb.com>
The db/index directory contains just a few lines of code that exists
there for historical reasons. It's confusing that we have both db/index
and index/ directory related to secondary-indexing.
This patch moves what little is still in db/index/ to index/. In the
future we should probably get rid of the "secondary_index" class we had
there, but for now, let's at least not have a whole new directory for it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501101246.21143-1-nyh@scylladb.com>
Fixes a bug in partition_snapshot::merge_partition_versions(), which
would not attempt merging if the snapshot is attached to the latest
version (in which case _version is nullptr and _entry is !=
nullptr). This would cause partition_version objects to accumulate if
there was an older snapshot and it went away before the latest
snapshot. Versions will be removed when the whole entry goes away
(flush or eviction).
May have caused performance problems.
Fixes#3402.
"
Both multishard_combining_reader and foreign_reader use read-ahead in the
background to avoid blocking consumers. These read-aheads can still be
pending when the reader is destroyed and hence extra attention is needed
to avoid memory errors. Recent manual testing, done in the context of
testing code that is using the multishard reader, proved that these
cases were not handled correctly in the initial series introducing it
(2d126a79b).
This series introduces fixes and comprehensive tests for all problematic
scenarios:
1) multishard_combining_reader is destroyed with pending reader creation
on a remote shard.
2) foreign_reader is destroyed with pending read-ahead.
3) multishard_combining_reader is destroyed with pending read-ahead.
"
* 'multishard-reader-read-ahead-fixes/v2' of https://github.com/denesb/scylla:
test.py: add custom seastar flags for mutation_reader_test
test.py: move custom seastar flags for tests declarative
mutation_reader_test: add read-ahead related multishard reader tests
tests/mutation_reader_test: change recommended smp to 3
mutation_reader_test: fix name of existing multishard reader tests
simple_schema: add global_simple_schema
simple_schema.hh: remove unused include
multishard_combining_reader: prepare for read-ahead outliving the reader
foreign_reader: prepare for read-ahead outliving the reader
multishard_combining_reader: avoid creating the shard reader twice
multishard_combining_reader: read_ahead: don't assume reader is created
multishard_combining_reader: move read-ahead related methods
multishard_combining_reader: avoid looking up the shard reader twice
multishard_combining_reader: use optional for maybe created reader
Add tests for foreign_reader and multishard_combining_reader that check
that readers destroyed while there is a pending read-ahead will not
result in use-after-free.
Specifically check that:
* multishard_combining_reader destroyed with pending reader creation
* foreign_reader destroyed with pending read-ahead
* multishard_combining_reader destroyed with pending read-ahead
does not result in use-after-free or SEGFAULT.
These tests try to do their best to check for correct behaviour with
various BOOST_REQUIRE* checks but they still heavily rely on ASAN to
detect any use-after-free, SEGFAULT or similar errors.
Of the test_multishard_combining_reader_reading_empty_table test.
Running this test with smp=3 instead of smp=2 helps detect additional
read-ahead related memory problems.
Which allows a simple_schema instance to be transferred to another
shard. In fact a new simple_schema instance will be created on the
remote shard but it will use the same schema instance as the original
one.
When the multishard reader is destroyed there might be several pending
read-aheads running in the background. These read-aheads need their
associated reader to stay alive until after the read-ahead completes.
To solve this move the flat_mutation_reader into a struct and manage
this struct's lifetime through a shared pointer. Fibers associated with
read-aheads that might outlive the multishard reader will hold on to a
copy of the shared pointer, keeping the underlying reader alive until
they complete. To avoid doing any extra work, a flag is added to this
state which is set when the multishard reader is destroyed. When this
flag is set, pending continuations will return early. All this is
encapsulated in multishard_combining_reader::shard_reader, so the
multishard reader code itself need not be changed.
The foreign reader keeps track of ongoing read-aheads via a
foreign_ptr to the read-ahead's future on the remote shard. This pointer
is overwritten after each "remote call" to the remote reader with a
pointer to the new read-ahead's future.
There are several problems with the current implementation:
1) A new read-ahead is launched after each "remote call"
unconditionally, even if the remote reader is at EOS. This starts an
unnecessary read-ahead when the reader is already finished and may
soon be (legally) destroyed by the client.
2) The pointer to the remote read-ahead future is not set to nullptr
when a remote call is issued. Thus in the destructor, where we
attach a continuation to the read-ahead's future to extend the
reader's lifetime until after the read-ahead finishes, we might attach
a continuation to a future that already has one and run into a failed
assert().
To fix these issues, reset the read-ahead pointer to nullptr each time a
remote call is issued and don't start a new read-ahead if the remote
reader is at EOS. This way we can ensure that when the reader is
destroyed we either have a valid and non-stale read-ahead future or none
at all and can reliably make a decision about whether we need to extend
the lifetime of the remote reader or not.
The multishard reader creates its shard readers on demand, when they are
first used. However at this time the reader might already be in the
process of being created, initiated by a previous read-ahead.
To avoid creating the shard reader twice, before creating the reader
check whether there are any read-aheads in progress. If there are, the
read-ahead already created (is creating, or will create) the reader, so
synchronise with it instead. Synchronisation happens via a promise:
the read-ahead creates a promise which will be fulfilled when the reader
is created. A concurrent create_reader() call will wait on this promise
instead of attempting to create a new reader.
Currently it is assumed that when read_ahead is called the reader is
already created. Under most circumstances this will not be true. It was
blind (bad) luck that we didn't hit this before (during testing).
To the group of methods that do not assume the reader is already
created. A patch will follow that will update read_ahead() to not assume
that the reader is created.
After a little "research" [1] it turns out my initial fears were
completely without ground: std::optional::operator->() and
std::optional::operator*() don't involve an unnecessary branch and
thus there is no need to hand-roll an optional with a separate bool.
[1] http://en.cppreference.com/w/cpp/utility/optional/operator*
Determine which timeout we need to apply at prepare time. We
don't know the numerical value (since it depends on whoever is
executing the query, not just the statement type), but we know
which member of timeout_config we need, so determine and remember
that.
The mutation forwarding intermediary (src_addr) may not always know
about the schema which was used by the original coordinator. I think
this may be the cause of the "Schema version ... not found" error seen
in one of the clusters which entered some pathological state:
storage_proxy - Failed to apply mutation from 1.1.1.1#5: std::_Nested_exception<schema_version_loading_failed> (Failed to load schema version 32893223-a911-3a01-ad70-df1eb2a15db1): std::runtime_error (Schema version 32893223-a911-3a01-ad70-df1eb2a15db1 not found)
Fixes#3393.
Message-Id: <1524639030-1696-1-git-send-email-tgrabiec@scylladb.com>
Confirm that issue #2991 is indeed fixed - creating a secondary index
with IF NOT EXISTS ignores an already existing index, and dropping with
IF EXISTS ignores a non-existent index.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180430071714.10154-1-nyh@scylladb.com>
The existing test_secondary_index_case_sensitive only tested the
case-sensitive case of the column being indexed, and only in some
scenarios. Further testing exposed more bugs - issue #3388, issue #3391,
issue #3401. This patch adds tests which reproduced those bugs, and now
verifies their fix.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-9-nyh@scylladb.com>
test_case_sensitivity from tests/view_schema_test.cc was well-intentioned,
aiming to test from different angles the issue of non-lowercase (quoted)
column names and their interaction with materialized views.
But unfortunately, it didn't test anything! This is because the quotation
marks were forgotten, so all the identifiers in this test were folded to
lowercase, and the test didn't test non-lowercase identifiers like it
intended.
So this patch adds the missing quotes, to make this test great again.
After the patches for issues #3388 and #3391 which I sent earlier, the
test *passes* (before those patches, the fixed test did not pass -
the unfixed test trivially passed).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-8-nyh@scylladb.com>
When the secondary index code builds a "%s IS NOT NULL" clause for a
CQL statement, it needs to quote the column name if the name requires
quoting (i.e., it contains anything other than lowercase letters,
digits and _).
Fixes#3401.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-7-nyh@scylladb.com>
We had another case-sensitivity bug in materialized views, where if
a case-sensitive (quoted) column name was listed explicitly on "SELECT"
(instead of implicitly, e.g., in "SELECT *") the column name was
incorrectly folded to lower-case and inserts would fail.
This patch fixes the code, where a "SELECT" statement was built using
the desired column names, but column names that needed quoting were
not being quoted. The bug was in a helper function build_select_statement()
which took column name strings and failed to quote them. We clean up this
function to take column definitions instead of strings - and take care
of the quoting itself. It also needs to quote the table's name in the
select statement being built.
Fixes#3391.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-6-nyh@scylladb.com>
Before this patch, if a materialized view is defined with the restriction
IS NOT NULL on a case-sensitive (quoted) column name, inserts fail with
a "restriction 'foobar IS NOT null' unknown column foobar" error, where
foobar is the lowercased version of the case-sensitive column name.
The problem is that the code uses single_column_relation::to_string()
to convert the relation into a CQL where clause. And indeed, this method
generates a CQL expression; But it calls column_identifier::raw::to_string()
to print identifiers. This is the wrong function - it doesn't quote
identifiers that need quoting because they are not lowercase.
So this patch uses column_identifier::raw::to_cql_string() (a method we
added in the previous patch) to generate the properly quoted CQL relation.
Fixes#3388
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-5-nyh@scylladb.com>
Implement a method column_identifier::raw::to_cql_string(). Exactly like
the one without "raw", this method quotes the identifier name as needed
for CQL. We'll need this method in a later patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-4-nyh@scylladb.com>
There is no reason for to_cql_string() and maybe_quote() to both
implement the same quoting algorithm. Use the latter to implement the
former.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-3-nyh@scylladb.com>
The utility function maybe_quote() is supposed to quote identifier names
(name of keyspace, table, or column) according to CQL rules, e.g., if the
name has any uppercase or non-alphanumeric characters, it needs to be
quoted. Unfortunately, it didn't quite do the right thing, so this patch
fixes that. This patch also adds a comment explaining what maybe_quote()
is supposed to do (until now, users could only guess).
Fixes#3400.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-2-nyh@scylladb.com>
In commit d674b6f672, I fixed a case-
sensitive column name bug by avoiding CQL quoting of a column name
in create_index_statement.cc when building a "targets" option string.
However, there is also matching code in target_parser.hh to unquote
that option string. So this unquoting code is no longer necessary, and
should be dropped.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-1-nyh@scylladb.com>
Different request types have different timeouts (for example,
read requests have shorter timeouts than truncate requests), and
also different request sources have different timeouts (for example,
an internal local query wants infinite timeout while a user query
has a user-defined timeout).
To allow for this, define two types: timeout_config represents the
timeout configuration for a source (e.g. user), while
timeout_config_selector represents the request type, and is used
to select a timeout within a timeout configuration. The latter is
implemented as a pointer-to-member.
Also introduce an infinite timeout configuration for internal
queries.
In the current code, if the base table has a compound partition key (i.e.,
multiple partition-key columns), searching its secondary indexes didn't work.
There is no real reason for this; it was just a bug in preparing the
second query:
Every SI query is converted to two queries. The first queries the associated
materialized view, to find a list of primary keys. Those we need to use in a
second query, of the base table. The second query needs to list, as
restrictions, the keys found above. When a partition key is compound, its
components build one key and one restriction. But in the buggy code, we
incorrectly used each component as a separate (improperly formatted) key
and restriction, and obviously this didn't work.
This patch also adds a test that reproduces this problem and confirms its fix.
In the fixed code I also found another incorrect use of to_cql_string() (which
could break case-sensitive primary key column names) and changed it to
to_string().
Fixes#3210.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429124138.24406-1-nyh@scylladb.com>
We make multiple attempts to mark a node as alive. We do that by
sending an EchoMessage, and marking the node as alive upon receiving a
successful answer. In case there's a network partition and the nodes
can't reach each other, multiple messages may be delivered and
processed.
We can avoid processing duplicate EchoMessage replies by checking
whether we had already marked the node as alive.
Fixes#1184
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180428191942.31990-1-duarte@scylladb.com>
"
Recently many changes have landed in seastar for the I/O Scheduler. We
can now describe the I/O storage of a machine by its visible properties
like throughput and bandwidth instead of relying on an indirect
calculation.
For the instances we support, we can just measure that and start using
them right away.
A version of iotune that computes those properties is not yet ready, but
while working on it I have noticed that we aren't really setting the
nomerges and scheduler properties of the disks under testing. We definitely
should, since that can influence the results. So this patchset also
starts doing that.
The commandline for iotunev2 shouldn't change much. When it is ready we
will just adjust this script once more.
"
* 'scylla_io_setup' of github.com:glommer/scylla:
scylla_io_setup: preconfigure i3 and i2 instances with new I/O scheduler properties
scylla_lib: drop support for m3 and c3 AWS instance types
io_setup: call blocktune before tuning I/O
blocktune: allow it to be called as a library.
scripts: move scylla-blocktune to scripts location
* seastar 70aecca...ac02df7 (5):
> Merge "Prefix preprocessor definitions" from Jesse
> cmake: Do not enable warnings transitively
> posix: prevent unused variable warning
> build: Adjust DPDK options to fix compilation
> io_scheduler: adjust property names
DEBUG, DEFAULT_ALLOCATOR, and HAVE_LZ4_COMPRESS_DEFAULT macro
references prefixed with SEASTAR_. Some may need to become
Scylla macros.
iterator incorrectly dereferenced when timestamp resolution not
explicitly specified.
following dtests are fixed:
compaction_additional_test.CompactionAdditionalStrategyTests_with_TimeWindowCompactionStrategy.compaction_is_started_on_boot_test
compaction_additional_test.CompactionAdditionalTest.compact_data_by_time_window_test
compaction_additional_test.CompactionAdditionalTest.compaction_removes_ttld_data_by_time_windows_test
compaction_test.TestCompaction_with_DateTieredCompactionStrategy.compaction_strategy_switching_test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180427192545.17440-1-raphaelsc@scylladb.com>
We can use iotunev2 (or any other I/O generator) to test for the limits
of the disks for the i2 and i3 instance classes. The values I got here
are the values I got from ~5 invocations of the (yet to be upstreamed)
iotune v2, with the IOPS numbers rounded for convenience of reading.
During the execution, I verified that the disks were saturated so we
can trust these numbers even if iotunev2 is merged in a different form.
The numbers are very consistent, unlike what we usually saw with the
first version of iotune.
Previously, we were just multiplying the concurrency number by the
number of disks. Now that we have better infrastructure, we will
manually test i3.large and i3.xlarge, since their disks are smaller
and slower.
For the other i3, and all instances in the i2 family storage scales up
by adding more disks. So we can keep multiplying the characteristics of
one known disk by the number of disks and assuming perfect scaling.
Example for i3, obtained with i3.2xlarge:
read_iops = 411k
read_bandwidth = 1.9GB/s
So for i3.16xlarge, we would have read_iops = 3.28M and 15GB/s - very
close to the numbers advertised by AWS.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
m3 has 80GB SSDs in its largest form and I doubt anybody has ever
used it with Scylla.
I am also not aware of any c3 deployments. Since it is past generation,
it doesn't even show up in the default instance selector anymore.
I propose we drop AMI support for it. In practice, what that means is
that we won't auto-tune its I/O properties and people that want to use
it will have to run scylla_io_setup - like they do today with the EBS
instances.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are not configuring the disks the way we want them with respect to
scheduler and nomerges. This is an oversight that became clear now that
I started rewriting iotune, since I will explicitly test for that. But
since this can affect the results, it should be here all along.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch makes the functions in scylla-blocktune available as a
library for other scripts - namely scylla_io_setup.
The filename, scylla-blocktune, is not the most convenient thing to call
from python so instead of just wrapping it in the usual test for
__main__ I am just splitting the file into two.
Another option would be to patch all callers to call
scylla_blocktune.py, but because we are usually not using extensions in
scripts that are meant to be called directly I decided for the split.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
scylla-blocktune currently lives in the top level but this is mostly
historical. When time comes for us to install it, the packaging systems
will copy it to /usr/lib/scylla with the others.
So for consistency let's make sure that it also lives in the scripts
directory.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
After upgrade from 1.7 to 2.0, nodes will record a per-table schema
version which matches that on 1.7 to support the rolling upgrade. Any
later schema change (after the upgrade is done) will drop this record
from affected tables so that the per-table schema version is
recalculated. If nodes perform a schema pull (they detect schema
mismatch), then the merge will affect all tables and will wipe the
per-table schema version record from all tables, even if their schema
did not change. If then only some nodes get restarted, the restarted
nodes will load tables with the new (recalculated) per-table schema
version, while not restarted nodes will still use the 1.7 per-table
schema version. Until all nodes are restarted, writes or reads between
nodes from different groups will involve a needless exchange of schema
definition.
This will manifest in logs with repeated messages indicating schema
merge with no effect, triggered by writes:
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
The sync will be performed if the receiving shard forgets the foreign
version, which happens if it doesn't process any request referencing
it for more than 1 second.
This may impact latency of writes and reads.
The fix is to treat schema changes which drop the 1.7 per-table schema
version marker as an alter, which will switch in-memory data
structures to use the new per-table schema version immediately,
without the need for a restart.
Fixes#3394
Tests:
- dtest: schema_test.py, schema_management_test.py
- reproduced and validated the fix with run_upgrade_tests.sh from git@github.com:tgrabiec/scylla-dtest.git
- unit (release)
Message-Id: <1524764211-12868-1-git-send-email-tgrabiec@scylladb.com>
"
This patch series introduces initial support for writing SSTables in
'mc' format (aka SSTables 3.0).
Currently, the following components are written in 3.0 format:
- Data.db
- Index.db
- Summary.db
(there were no changes to summary files format compared to ka/la)
Other SSTables components are written in the old format for now as they
still need to exist to satisfy post-flush processing.
For now, only rows are written to the data file and indexed. Range
tombstones are not supported.
Writing rows is supported in full with the only exception being counter
cells. All the other features (TTLed data, row/cell level tombstones,
collections, etc) are supported.
Unit tests rely on producing files and binary-comparing them with
'golden' copies that are produced using Cassandra 3.11. This is done to
not block until reading SSTables 3.0 format is implemented.
=======================================
Implementation notes
=======================================
Internally, sstable_writer has been refactored to support multiple
implementations that are instantiated in its constructor based on the
sstable version. Little to no code is shared between sstable_writer_v2 and
sstable_writer_v3, as we only intend to support sstable_writer_v2
alongside sstable_writer_v3 for a single release (to be able to do
rollback on rolling upgrade failure) and then plan to get rid of it
entirely and switch to always writing SSTables in the new format.
The design of sstable_writer_v3 mostly follows that of its precursors
sstable_writer(_v2) and components_writer. Some refactoring and further
code rearrangements are expected in the future but the main code is
there.
"
* 'projects/sstables-30/write-rows/v2' of https://github.com/argenet/scylla:
Add tests for writing data and index files in SSTables 3.0 ('mc') format.
Support for writing SSTables 3.0 ('mc') Data.db and Index.db files - rows only.
Add missing enum values to bound_kind.
Add building blocks for writing data in SSTables 3.0 format.
Refactor sstable_writer to support various internal implementations.
Add is_fixed_length() to data types.
Add mutation_partition::apply_insert() overload that accepts TTL and expiry for row marker.
bound_kind::clustering, bound_kind::excl_end_incl_start and
bound_kind::incl_end_excl_start are used during SSTables 3.0 writing.
bound_kind::static_clustering is not used yet but added for completeness
and parity with the Origin.
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
For any given CQL data type, this member returns whether its values are
of fixed or variable length. This is used by SSTables 3.0 format to only
store the length value for variable-length cells.
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
This patchset prepares everything for support of both 2.x and 3.x formats and implements reading from sstable 3.x
very simple table with just partition keys.
Tests: units (release)
"
* 'haaawk/sstables3/read_only_partitions_v4' of ssh://github.com/scylladb/seastar-dev: (22 commits)
Test for reading sstable in MC format with no columns
Use new mp_row_consumer_m and data_consume_rows_context_m
Introduce mp_row_consumer_m
Rename mp_row_consumer to mp_row_consumer_k_l
Introduce consumer_m and data_consume_rows_context_m
Use read_short_length_bytes in RANGE_TOMBSTONE
Use read_short_length_bytes in ATOM_START
Use read_short_length_bytes in ROW_START
Add continuous_data_consumer::read_short_length_bytes
Reduce duplication with continuous_data_consumer::read_partial_int
Add test for a simple table with just partition key
Add test for reading index
Extract mp_row_consumer to separate header
Make sstable_mutation_reader independent from mp_row_consumer
Make sstable_mutation_reader a template
Make data_consume_context a template
Move data_consume_rows_context from row.cc to row.hh
Decouple sstable.hh and row.hh
Reduce visibility of sstable::data_consume_*
Move data_consume_context to separate header
...
Take DataConsumeRowsContext type as parameter.
This will allow us to implement different context
for reading 3.x files.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Parametrize it with the type of data consume rows context.
There will be different implementations used for different
sstable file formats.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It will be used as a template parameter for sstable_mutation_reader
once it's turned into a template. This means the definition has
to be accessible.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
They are used just in partition.cc, row.cc and sstables_test.cc
so it is useful to cut their scope by moving them
to data_consume_context.hh.
This will make it much easier to turn data_consume_context into
a template.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It's used only in row.cc, partition.cc and sstables_test.cc
so it's better to reduce the dependency just to those files.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
On some build environments we may want to limit the number of parallel
jobs: ninja-build runs ncpus jobs by default, which may be too many
since g++ uses a huge amount of memory.
So support --jobs <njobs> just like on rpm build script.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180425205439.30053-1-syuu@scylladb.com>
"
This patchset brings in a statistics collector that tracks minimal
values for timestamps, TTLs and local deletion times for all the updates
made to a given memtable.
This statistics is later used when flushing memtables into SSTables
using 3.x ('mc') format to delta-encode corresponding values using
collected minimums as bases (that is why it is called encoding
statistics).
This patchset is sent out apart from other changes that introduce
writing SSTables 3.x to facilitate read path implementation that also
needs the encoding_stats structure.
The tests for write path implicitly cover this functionality as any rows
written to a SSTable 3.0 file make use of delta-encoding.
"
* 'projects/sstables-30/collect-encoding-statistics-v4' of https://github.com/argenet/scylla:
Collect encoding statistics for memtable updates.
Factor out min_tracker and max_tracker as common helpers.
Always pass mutation_partitions to partition_entry::apply()
We keep track of all updates and store the minimal values of timestamps,
TTLs and local deletion times across all the inserted data.
These values are written as a part of serialization_header for
Statistics.db and used for delta-encoding values when writing Data.db
file in SSTables 3.0 (mc) format.
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
They will be re-used for collecting encoding statistics which is needed
to write SSTables 3.0.
Part of #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Previously it was also possible to pass a frozen_mutation to it.
Now we de-serialize frozen mutations at the calling side.
This is a pre-requisite for collecting memtable statistics needed for
writing into the SSTables 3.0 format.
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
When provisioning a Scylla docker image with --developer-mode 0 (disabled)
scylla_raid_setup is not invoked. As a consequence the "data" directory is not
created and scylla_io_setup fails (steps to reproduce and error message provided
at the end).
This patch adds the same verifications present in scylla_io_setup to docker's
scyllasetup.py and creates the data directory in the case it is not present.
--
Steps to reproduce on AWS i3.2xlarge with Ubuntu 16.04:
sudo -s
apt update && apt upgrade -y && apt-get install docker.io -y
mdadm --create --verbose --force --run /dev/md0 --level=0 -c1024 --raid-devices=1 /dev/nvme0n1
mkfs.xfs /dev/md0 -f -K
mkdir /var/lib/scylla
mount -t xfs /dev/md0 /var/lib/scylla
docker run --name some-scylla \
--volume /var/lib/scylla:/var/lib/scylla \
-p 9042:9042 -p 7000:7000 -p 7001:7001 -p 7199:7199 \
-p 9160:9160 -p 9180:9180 -p 10000:10000 \
-d scylladb/scylla --overprovisioned 1 --developer-mode 0
docker logs some-scylla
running: (['/usr/lib/scylla/scylla_dev_mode_setup', '--developer-mode', '0'],)
running: (['/usr/lib/scylla/scylla_io_setup'],)
terminate called after throwing an instance of 'std::system_error'
what(): open: No such file or directory
ERROR:root:/var/lib/scylla/data did not pass validation tests, it may not be on XFS and/or has limited disk space.
This is a non-supported setup, and performance is expected to be very bad.
For better performance, placing your data on XFS-formatted directories is required.
To override this error, enable developer mode as follow:
sudo /usr/lib/scylla/scylla_dev_mode_setup --developer-mode 1
failed!
Traceback (most recent call last):
File "/docker-entrypoint.py", line 15, in <module>
setup.io()
File "/scyllasetup.py", line 34, in io
self._run(['/usr/lib/scylla/scylla_io_setup'])
File "/scyllasetup.py", line 23, in _run
subprocess.check_call(*args, **kwargs)
File "/usr/lib64/python3.4/subprocess.py", line 558, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/lib/scylla/scylla_io_setup']' returned non-zero exit status 1
ls -latr /var/lib/scylla
total 4
drwxr-xr-x 44 root root 4096 Abr 24 13:02 ..
drwxr-xr-x 2 root root 6 Abr 24 13:10 .
Signed-off-by: Moreno Garcia <moreno@scylladb.com>
Message-Id: <20180424173729.22151-1-moreno@scylladb.com>
Fixes #3187
Requires seastar "inet_address: Add constructor and conversion function
from/to IPv4"
Implements IPv6 support for the CQL inet data type. The actual data stored will
now vary between 4 and 16 bytes. gms::inet_address has been augmented
to interop with seastar::inet_address, though of course actually trying
to use an IPv6 address there or in any of its tables will throw badly.
Tests assuming IPv4 were changed. Storing an ipv4_address should be
transparent, as it now "widens". However, since every ipv4_address is an
inet_address, but not vice versa, there is no implicit overloading on
the read paths. I.e. tests and system_keyspace (where we read ip
addresses from tables explicitly) are modified to use the proper type.
Message-Id: <20180424161817.26316-1-calle@scylladb.com>
CQL normally folds identifiers such as column names to lowercase. However,
if the column name is quoted, case-sensitive column names and other strange
characters can be used. We had a bug where such columns could be indexed,
but then, when trying to use the index in a SELECT statement, it was not
found.
The existing code remembered the index's column after converting it to CQL
format (adding quotes). But such conversion was unnecessary, and wrong,
because the rest of the code works with bare strings and does not involve
actual CQL statements. So the fix avoids this mistaken conversion.
This patch also includes a test to reproduce this problem.
Fixes #3154.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180424154920.15924-1-nyh@scylladb.com>
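The folding rule described above can be illustrated with a standalone sketch (the helper name and behavior are illustrative, not Scylla's actual code): unquoted identifiers fold to lowercase, quoted ones keep their case, and the fix amounts to looking up the index's column by this bare form instead of a re-quoted one.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Sketch of CQL identifier handling: unquoted identifiers are folded
// to lowercase; quoted identifiers keep their case, with the
// surrounding quotes stripped off.
std::string cql_identifier(const std::string& raw) {
    if (raw.size() >= 2 && raw.front() == '"' && raw.back() == '"') {
        return raw.substr(1, raw.size() - 2); // case-sensitive, keep as-is
    }
    std::string folded = raw;
    std::transform(folded.begin(), folded.end(), folded.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return folded;
}
```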
This commit fixes two closely related issues with handling
case-sensitive column names in JSON:
* according to doc, case-sensitive names should be wrapped with
additional pair of double quotes during JSON SELECT
* logic error in parse_json() prevented INSERT JSON from working
properly on case-sensitive column names
This commit is followed by updated cql_query_test, which checks
case-sensitive cases as well.
Message-Id: <82d9d5e193a656e99bc86b297c00662a6fb808a0.1524576066.git.sarna@scylladb.com>
"
Pass sstable version to parse, write and describe_type methods to make it possible to handle different versions.
For now serialization header from 3.x format is ignored.
Tests: units (release)
"
* 'haaawk/sstables3/loading_v4' of ssh://github.com/scylladb/seastar-dev:
Add test for loading the whole sstable
Add test for loading statistics
Add support for 3_x stats metadata
Pass sstable version to describe_type
Pass sstable version to write methods
metadata_type: add Serialization type
Pass sstable_version_types to parse methods
Add test for reading filter
Add test for read_summary
sstables 3.x: Add test for reading TOC
sstable: Make component_map version dependent
sstable::component_type: add operator<<
Extract sstable::component_type to separate header
Remove unused sstable::get_shared_components
sstable_version_types: add mc version
Introduce sstable_version_constants that will be a proxy
serving correct constants depending on the format version.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
I forgot that I also need to update test.py for the new test.
It's unfortunate that this script doesn't pick up the list of
tests automatically (perhaps with a black-list of tests we don't
want to run). I wonder if there are additional tests we are
forgetting to run.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180424085911.29732-1-nyh@scylladb.com>
Move the two tests we have for the secondary indexing feature from the
huge tests/cql_query_test.cc to a new file, secondary_index_test.cc.
Having these tests in a separate file will make it easier and faster to
write more tests for this feature, and to run these tests together.
This patch doesn't change anything in the tests' code - it's just a code
move.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180424084700.28816-1-nyh@scylladb.com>
Old versions of JsonCpp declare the following typedefs for internally
used aliases:
typedef long long int Int64;
typedef unsigned long long int UInt64;
In newer versions (1.8.x), those are declared as:
typedef int64_t Int64;
typedef uint64_t UInt64;
Those base types are not identical so in cases when a type has
constructors overloaded only for specific integral types (such as
Json::Value in JsonCpp or data_value in Scylla), an attempt to
pack/unpack an integer from/to a JSON object causes ambiguous calls.
Fixes #3208
Tests: unit {release}.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <e9fff9f41e0f34b15afc90b5439be03e4295623e.1524556258.git.vladimir@scylladb.com>
We were feeding the total estimated partition count of an input shared
sstable to the output unshared ones.
So the sstable writer thinks, *from that estimate*, that each sstable created
by resharding will have the same data amount as the shared sstable it
is being created from. That's a problem because the estimate is fed to
bloom filter creation, which directly influences its size.
So if we're resharding all sstables that belong to all shards, the
disk usage taken by filter components will be multiplied by the number
of shards. That becomes more of a problem with #3302.
Partition count estimation for a shard S will now be done as follows:
//
// TE, the total estimated partition count for a shard S, is defined as
// TE = Sum(i = 0...N) { Ei / Si }.
//
// where i is an input sstable that belongs to shard S,
// Ei is the estimated partition count for sstable i,
// Si is the total number of shards that own sstable i.
Fixes #2672.
Refs #3302.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180423151001.9995-1-raphaelsc@scylladb.com>
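The estimate above can be sketched directly from the formula (the types below are illustrative, not Scylla's): each input sstable i that belongs to shard S contributes Ei / Si.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of TE = Sum(i) { Ei / Si } for one output shard:
// Ei is the estimated partition count of input sstable i,
// Si the number of shards that own it.
struct input_sstable {
    uint64_t estimated_partitions; // Ei
    unsigned owning_shards;        // Si
};

uint64_t estimate_for_shard(const std::vector<input_sstable>& inputs) {
    uint64_t total = 0;
    for (const auto& in : inputs) {
        total += in.estimated_partitions / in.owning_shards;
    }
    return total;
}
```

For example, two input sstables with 1000 partitions shared by 4 shards and 500 partitions shared by 2 shards yield an estimate of 250 + 250 = 500 for the shard, instead of the old 1500.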
"
Fixes to several issues around view update generation, pertaining to
timestamp and TTL management.
Fixes #3361
Fixes #3360
Fixes #3140
Refs #3362
Tests: unit(release, debug), dtest(materialized_views.py)
"
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
* 'materialized-views/fixes-galore/v2' of http://github.com/duarten/scylla:
mutation_partition: Clarify comment about emptiness
tests: Add view_complex_test
tests/view_schema_test: Complete test
db/view: Move cells instead of copying in add_cells_to_view()
db/view: Handle unselected base columns and corner cases
mutation_partition: Regular base column in view determines row liveness
db/view: Don't avoid read-before-write when view PK matches base
db/view: Process base updates to column unselected by its views
db/view: Consider partition tombstone when generating updates
tests/view_schema_test: Remove unneeded test
mutation_fragment: Allow querying if row is live
view_info: Add view_column() overload
view_info: Explicitly initialize base-dependent fields
cql3/alter_table_statement: Forbid dropping columns of MV base tables
This patch fixes several cases where it was disallowed to create
a materialized view with a filter ("where ..."), for no good reason.
After this patch, these cases will be allowed. Fixes #2367.
In ordinary SELECT queries, certain types of filtering which is known to
be deceptively inefficient is now allowed. For example, trying to query
a range of partition keys cannot be done without reading the entire
database (because the murmur3 tokenizer randomizes the order of partitions).
Restricting two partition key components also cannot be done without
reading excessive amount of the entire partition. So Scylla, following
Cassandra, chooses to disallow such SELECT queries, and give an error
message.
However, the same SELECT statements *should* be allowed when defining a
materialized view. In this case, the filter is just used to check an
individual row - not to search for one - so there is no performance
concern.
Unfortunately the existing code did these validations while building the
SELECT statement's "restrictions", in code shared by both uses of SELECT
(query and MV definition). It was easy to move one of the validations
to later code which runs after the restriction has already been built (and
knows if it is working for query or MV), but because of the way the
"restrictions" objects (translated from Cassandra 2's code) hide what they
contain, many of the checks are harder to perform after having built the
restrictions object. So instead, we add a new "allow_filtering" flag in
strategic places in the restriction-handling code. If restrictions
are built with allow_filtering=true, the extra performance-oriented tests
on the filtering restrictions are not done. The materialized view code sets
allow_filtering=true.
The allow_filtering flag will also be useful later when we want to support
the "ALLOW FILTERING" query option which is currently not supported properly
(we have several open issues on that). However, note that this patch doesn't
complete that support: I left a FIXME in the spot where we set
allow_filtering in the Materialized Views case, but in the future we also need
to set it if the user specified "ALLOW FILTERING" in the query.
This patch also enables several unit tests written by Duarte which used to
fail because of this bug, and now pass. These tests verify that the
restrictions are now allowed and filter the view as desired; but I also
added test code to verify that the same restrictions are still forbidden,
as before, when used in ordinary SELECT queries.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180423124343.17591-1-nyh@scylladb.com>
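The effect of the flag can be sketched in isolation (the names below are hypothetical, not Scylla's restriction classes): the performance-oriented validation is skipped when the restrictions were built for a materialized-view filter.

```cpp
#include <cassert>
#include <stdexcept>

// Illustrative sketch: the validation that normally rejects
// inefficient filtering is bypassed when allow_filtering is set,
// as it is for materialized-view definitions.
struct restrictions_sketch {
    bool restricts_partition_range = false;
};

void validate(const restrictions_sketch& r, bool allow_filtering) {
    if (!allow_filtering && r.restricts_partition_range) {
        throw std::invalid_argument(
            "partition key range queries require ALLOW FILTERING");
    }
}
```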
"
Make sure install_dependencies.sh installs all the right dependencies
and that the example `configure.py` invocation can just be copy-pasted
into the terminal and will "just work".
Ref: #3208
"
* 'fix_centos_compile/v2' of https://github.com/denesb/scylla:
install_dependencies.sh: update centos package list and example
configure.py: add --with-ragel option
configure.py: add --with-antlr3
configure.py: check compiler version first
Add missing packages to `yum install` list:
* scylla-boost163-static
* scylla-python34-pyparsing20
Update the configure.py example so that it just works:
* Change g++ to 7.3
* Add --with-antlr3 pointing to antlr3 installed from scylla 3rdparty
Before checking anything else (presence of boost, its version, etc.)
check that the compiler is present and can compile and link a simple c++
program.
Previously, if the compiler was not set up correctly, configure.py would fail
at one of the other try_compile checks, whichever came first (usually
the one checking for boost). This led the user into chasing some
false-positive error when in fact the compiler wasn't working.
Debian 8 produces "Invalid argument" errors when we use AmbientCapabilities in the
systemd unit file, so drop the line when we build the .deb package for Debian 8.
For other distributions, keep using the feature.
Fixes #3344
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180423102041.2138-1-syuu@scylladb.com>
"
This series complements JSON support with INSERT JSON and fromJson
cql function.
INSERT JSON implementation tries hard to interfere as little as possible
with regular INSERT path. So, after being parsed, insertJsonStatement
exists as a separate statement and is handled in a special way.
Overridden add_update_for_key extracts values from JSON map and applies
them to columns.
Converting from insert_json_statement to insert_statement uses auxiliary
from_json_object methods to convert JSON-encoded types to bytes.
Then, terms are matched to appropriate column names and cells are
updated.
fromJson CQL function uses the same from_json_object helper methods,
but applies them to single arguments, not whole rows.
Existing json handling functions from json.hh and libjsoncpp were used
where possible.
Things implemented:
* expanding CQL grammar to accept INSERT JSON
* converting JSON representation of cql values to cql terms
* serving 'INSERT INTO xxx JSON yyy' clause
* tests for INSERT JSON and fromJson()
"
* 'json_ops_2' of https://github.com/psarna/scylla:
tests: add cql unit tests for INSERT JSON
cql3: add fromJson() function
cql3: add INSERT JSON parsing to CQL grammar
cql3: add support for INSERT JSON clause
cql3: decouple execute from term binding in setters
cql3: change operation::make_* functions to static
cql3: add from_json_object function to types
cql3: Make literals::NULL_VALUE public
This commit adds tests for INSERT JSON clause, which is expected
to accept JSON strings and insert appropriate values to columns
defined there.
The tests also cover fromJson function calls and inserting prepared
batch statements with INSERT JSON inside.
References #2058
This function extends JSON support with fromJson() function,
which can be used in UPDATE clause to transform JSON value
into a value with proper CQL type.
fromJson() accepts strings and may return any type, so its instances,
like toJson(), are generated during calls.
This commit also extends functions::get() with an additional
'receiver' parameter. This parameter is used to extract the receiver type
information needed to generate a proper fromJson instance.
Receiver is known only during insert/update, so functions::get() also
accepts a nullptr if receiver is not known (e.g. during selection).
References #2058
This commit adds the implementation of INSERT JSON clause
which accepts JSON object as parameter and inserts appropriate
values into appropriate columns, as defined in given JSON.
Example:
INSERT INTO testme JSON '{
"id" : 77,
"name" : "Jones",
"ranking" : 8.5
}'
References #2058
This commit makes it possible to pass values to setters,
instead of having to pass cql3::term instances.
Thanks to that previously prepared terminals can be directly
used in a setter execution.
References #2058
This commit makes operation::make* functions static, because they
don't access any instance-specific data anyway. It is later needed
to decouple setter execution from binding a cql3::term.
This commit adds a 'from_json_object' method which will be used
for converting JSON representation of a value to raw bytes representing
the same value. This functionality will be needed by 'INSERT JSON'
clause implementation, which can turn these raw bytes into cql3::term.
References #2058
continuous_data_consumer_test takes an unreasonable amount of
time to run, especially in debug mode. Reduce the run time by
reducing the number of loops.
Message-Id: <20180422150938.29143-1-avi@scylladb.com>
This patch introduces view_complex_test and adds more test coverage
for materialized views.
A new file was introduced to avoid making view_schema_test slower.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns.
This patch ensures that unselected columns are considered as much as
possible, even though some limitations will still exist. In
particular, we need to represent multiple timestamps (from all the
unselected columns), but have only mechanisms to record a single
timestamp.
We also have some issues when dealing with selected columns, and the
way we currently delete them. Consider the following:
create table cf (p int, c int, a int, b int, primary key (p, c))
create materialized view vcf as select a, b
from cf where p is not null and c is not null
primary key (p, c)
1) update cf using timestamp 10 set a = 1 where p = 1 and c = 1
2) delete a from cf using timestamp 11 where p = 1 and c = 1
3) update cf using timestamp 1 set a = 2 where p = 1 and c = 1
After 1), the MV should include a row with row marker @ ts10,
p = 1, c = 1, a = 1. After 2), this row should be removed.
At 3), we should add a row with row marker @ ts1, p = 1, c = 1, a = 2,
with a lower timestamp. This means that the delete should not
insert a row tombstone with timestamp @ 11, as we do now, but it should
just delete the view's row marker (which exists) with ts1.
Refs #3362
Fixes #3140
Fixes #3361
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When views contain a primary key column that is not part of the base
table primary key, that column determines whether the row is live or
not. We need to ensure that when that cell, and thus the derived row
marker, is dead, whether by normal deletion or by TTL, so is the
rest of the row.
This patch introduces the idea of a shadowing row marker. We map the
status of the regular base column in the view's PK to the view row's
marker. If this marker is dead, so is that cell in the base table, and
so should the view row become. To enforce that, a view row's dead
marker shadows the whole row if that view includes a base regular
column in its PK.
Fixes #3360
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns. When calculating the view's row marker we need
to access those unselected columns, so we can't avoid the
read-before-write as we were doing.
Refs #3362
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns. So, process base updates to columns unselected by
any of its views.
Refs #3362
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Not adding the partition tombstone to the current list of tombstones
may cause updates to be incorrectly generated.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Instead of lazily-initializing the regular base column in the view's
PK field, explicitly initialize it. This will be used by future
patches that don't have access to the schema when wanting to obtain
that column.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns.
The fact that unselected columns can keep a view row alive also
requires that users cannot drop columns of base tables with
materialized views, which this patch implements.
Refs #3362
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
Pass sstable version to parse, write and describe_type methods to make it possible to handle different versions.
For now serialization header from 3.x format is ignored.
Tests: units (release)
"
* 'haaawk/sstables3/loading_v3' of ssh://github.com/scylladb/seastar-dev:
Add test for loading the whole sstable
Add test for loading statistics
Add support for 3_x stats metadata
Pass sstable version to describe_type
Pass sstable version to write methods
metadata_type: add Serialization type
Pass sstable_version_types to parse methods
Add test for reading filter
Add test for read_summary
sstables 3.x: Add test for reading TOC
sstable: Make component_map version dependent
sstable::component_type: add operator<<
Extract sstable::component_type to separate header
Remove unused sstable::get_shared_components
sstable_version_types: add mc version
Introduce sstable_version_constants that will be a proxy
serving correct constants depending on the format version.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The compression_parameter constructor is called with an extra level of
parentheses. Presumably this caused a temporary object to be constructed
and then moved into the argument being initialized, but gcc 8 complains
about ambiguity.
Make it happy by stripping off the redundant parentheses.
Message-Id: <20180421121854.12314-1-avi@scylladb.com>
The token constructor is called with an extra level of parentheses. Presumably
this caused a temporary object to be constructed and then moved into the
variable being initialized, but gcc 8 complains about ambiguity.
Make it happy by stripping off the redundant parentheses.
Message-Id: <20180421121736.12136-1-avi@scylladb.com>
The parameters to the MutationFragmentConsumer concept must be concrete
types, not decltype(auto).
Reported by gcc 8.
Message-Id: <20180421110738.7574-1-avi@scylladb.com>
"
Prints info about sstables used by readers
Example:
(gdb) scylla active-sstables
sstable "keyspace1"."standard1"#5, readers=3 data_file_size=39393952
sstable "keyspace1"."standard1"#6, readers=3 data_file_size=127513304
sstable_count=2, total_index_lists_size=0
"
* 'tgrabiec/gdb-scylla-active-sstables' of github.com:tgrabiec/scylla:
gdb: Introduce "scylla active-sstables" command
gdb: Make list_unordered_map() more general
gdb: Improve compatibility with python2.7
Prints info about sstables used by readers
Example:
(gdb) scylla active-sstables
sstable "keyspace1"."standard1"#5, readers=3 data_file_size=39393952
sstable "keyspace1"."standard1"#6, readers=3 data_file_size=127513304
sstable_count=2, total_index_lists_size=0
1) vt.name returns None for some types, use str() instead
2) some unordered_maps use 'false' as the second Hash_node template parameter
3) some consumers will prefer a reference to the value instead of its address
"
Enhance continuous_data_consumer to use existing vint serialization for reading
variable-length integers from SSTables.
Also available at:
https://github.com/scylladb/seastar-dev/commits/haaawk/sstables3/unsigned-vint-v6
Tests: units (release)
"
* 'haaawk/sstables3/unsigned-vint-v6' of ssh://github.com/scylladb/seastar-dev:
sstables: add test for continuous_data_consumer::read_unsigned_vint
buffer_input_stream: make it possible to specify chunk size
Add tests for make_limiting_data_source
Introduce make_limiting_data_source
sstables: add continuous_data_consumer::read_unsigned_vint
Cover serialized_size_from_first_byte in tests
core: add unsigned_vint::serialized_size_from_first_byte
sstables: add all dependant headers to consumer.hh
sstables: add all dependant headers to exceptions.hh
core: add #pragma once to vint-serialization.hh
When 'always_set_home' is specified in /etc/sudoers, pbuilder won't read
.pbuilderrc from the current user's home directory, and we don't have a way
to change that behavior via a sudo command parameter.
So let's use ~root/.pbuilderrc and switch to HOME=/root when sudo is executed;
this works both in environments where always_set_home is specified and in
those where it isn't.
Fixes #3366
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1523926024-3937-1-git-send-email-syuu@scylladb.com>
This method takes a data_source and returns another data_source
that returns data from the input source but in chunks of limited
size.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
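The limiting behavior can be illustrated independently of the Seastar data_source API (the helper below is a standalone sketch, not the actual interface): the data passes through unchanged, but is handed out in chunks of at most `limit` bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Standalone sketch: split an input buffer into chunks no larger
// than `limit`, preserving the byte sequence. A limiting data
// source applies the same idea lazily, per read.
std::vector<std::string> limit_chunks(const std::string& input,
                                      std::size_t limit) {
    std::vector<std::string> out;
    for (std::size_t pos = 0; pos < input.size(); pos += limit) {
        out.push_back(input.substr(pos, limit));
    }
    return out;
}
```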
This method takes the first byte and determines how many bytes
are used to represent an unsigned variable-length integer.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
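In the Cassandra-style unsigned vint encoding, the number of leading one-bits in the first byte equals the number of continuation bytes that follow, so the total size can be computed from that byte alone. A sketch of the computation (not the actual Scylla/Seastar signature):

```cpp
#include <cassert>
#include <cstdint>

// Sketch: count the leading 1-bits of the first byte; the vint
// occupies that many extra bytes plus the first byte itself.
unsigned vint_size_from_first_byte(uint8_t first) {
    unsigned extra = 0;
    while (first & 0x80) {
        ++extra;
        first <<= 1;
    }
    return extra + 1;
}
```

For example, a first byte of 0x00 means a 1-byte vint, while 0xFF (eight leading ones) means a 9-byte vint, enough for any 64-bit value.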
Previously it depended on byteorder.hh, which just happened
to be included in all compilation units that were using consumer.hh.
This change makes the header compile when used in new compilation units.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Previously it depended on print.hh, which just happened
to be included in all compilation units that were using
exceptions.hh. This change makes the header compile
when used in new compilation units.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Since storage_proxy provides access to the entire cluster, a local shard
reference is sufficient. Adjust query_processor to store a reference to
just the local shard, rather than a seastar::sharded<storage_proxy> and
adjust callers.
This simplifies the code a little.
Message-Id: <20180415142656.25370-3-avi@scylladb.com>
The storage_proxy represents the entire cluster, so there's never a need
to access it on a remote shard; the local shard instance will contact
remote shard or remote nodes as needed.
Simplify the API by passing storage_proxy references instead of
seastar::sharded<storage_proxy> references. query_processor and
other callers are adjusted to call seastar::sharded::local() first.
Message-Id: <20180415142656.25370-2-avi@scylladb.com>
build_deb.sh relies on pbuilder picking up a ~/.pbuilderrc which we
copy from the script. According to the pbuilder manual, "~" will refer
to the root directory (since pbuilder is run via sudo). In practice
we've observed this working with "~" referring to the current user's
home directory, but also sometimes failing, while complaining
about /root/.pbuilderrc failing. When it fails, it fails to set
the correct distribution.
To be extra sure, also copy .pbuilderrc to root's home directory. This
way, whatever behavior pbuilder chooses to follow, it will have a
configuration file to read.
Message-Id: <20180410134508.9415-1-avi@scylladb.com>
The patch fixes a bug introduced by commit
089b54f2d2.
When sstable files are stored in the .../upload directory
and a refresh is initiated with `nodetool`, it fails
because Scylla doesn't expect .../upload to be a part of the path.
Fixes #3334.
Signed-off-by: Daniel Fiala <daniel@scylladb.com>
Message-Id: <20180413132019.17779-1-daniel@scylladb.com>
This commit extends JSON support with toJson() function,
which can be used in SELECT clause to transform a single argument
to JSON form.
toJson() accepts any type including nested collection types,
so instead of being declared with concrete types,
proper toJson() instances are generated during calls.
This commit also supplements JSON CQL query tests with toJson calls.
Finally, it refactors JSON tests so they use do_with_cql_env_thread.
References #2058
Message-Id: <a7833650428e9ef590765a14e91c4d42532588f4.1523528698.git.sarna@scylladb.com>
There is a race between cql connection closure and notifier
registration. If a connection is closed before notification registration
is complete, a stale pointer to the connection will remain in the
notification list, since the attempt to unregister the connection will
happen too early.
The fix is to move notifier unregistration to after the connection's gate
is closed, which ensures that there is no outstanding registration
request. But this means that now a connection with a closed gate can be in
the notifier list, so with_gate() may throw and abort the notifier loop. Fix
that by replacing with_gate() with a call to is_closed().
Fixes: #3355
Tests: unit(release)
Message-Id: <20180412134744.GB22593@scylladb.com>
That's blocking KairosDB users because it uses TWCS with millisecond
timestamp resolution.
Also older drivers use millisecond instead of the default microsecond.
Fixes #3152.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180411171244.19958-1-raphaelsc@scylladb.com>
In my well intentioned attempt to use fewer magic numbers in the loading
code I replaced "64" with something calculated automatically from the
type being used.
Except I did it wrong, because sizeof(uint64_t) is 8, not 64.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180411155903.27665-1-glauber@scylladb.com>
"
This series introduces 'SELECT JSON' clause support for CQL.
Things implemented:
* expanding CQL grammar with JSON keyword
* converting values to JSON format
* serving 'SELECT JSON *' clauses
* tests for 'SELECT JSON'
"
* 'json_ops' of https://github.com/psarna/scylla:
tests: add cql unit tests for SELECT JSON
cql3: Add JSON token to CQL grammar
cql3: add support for SELECT JSON clause
cql3: add to_json_string function to types
This commit adds JSON keyword to CQL grammar and allows parsing
'SELECT JSON' command in CQL. Additionally, it will be useful
in implementing 'INSERT JSON(...)'.
References #2058
This commit adds the implementation of SELECT JSON clause
which returns rows in JSON format. Each returned row has a single
'[json]' column.
References #2058
"
The multishard combined reader provides a convenient
flat_mutation_reader implementation that takes care of efficiently
reading a range from all shards that own data belonging to the range.
All this happens transparently, the user of the reader need only pass a
factory function to the multishard reader which it uses to create
remote readers when needed. These remote readers will then be managed
through a foreign_reader, which abstracts away the fact that the reader is
located on a remote shard.
Sub readers are created for the entire read range, meaning they are free
to cross shard-range limits to fill their buffer. The output of these
sub readers is merged in a round-robin manner, the same way data is
distributed among shards. The multishard reader will move to the next
shard's reader whenever it encounters a partition whose token is after
the delimiter token.
To improve throughput and latency, two levels of read-ahead are employed:
one in foreign_reader, which will try to fill the remote shard reader's
buffer in the background, in parallel with processing the results on the
local shard; and one in the multishard reader itself, which will
exponentially increase concurrency whenever a sub-reader's buffer
becomes empty, but only if this happened after crossing a shard
boundary. This is important because there is no point in increasing
concurrency if a single sub reader can fill the multishard reader's
buffer.
"
* 'multishard-reader/v3' of https://github.com/denesb/scylla:
Add unit tests for multishard_combined_reader
Add multishard_combined_reader
flat_mutation_reader: add peek_buffer()
Add unit tests for foreign_reader
forwardable reader: implement fast_forward_to(position_in_partition)
Add foreign_reader
flat_mutation_reader: add detach_buffer()
Asias reported in issue #3351 that a floating point exception was seen
while loading SSTables. Looking at the trace, that seems to be because
we tried to issue a modulo operation with something that was likely 0.
That field comes from the nr_bits attribute in the large bitset, and our
current code should set it to whatever we read from the Filter file -
something that has been working for ages.
The difference is that after the patch that Asias identified as culprit,
we are moving the array from which we compute the size in the same
parameter list where we are computing the size.
This works for me and passed all my tests - likely because my compiler
was doing left-to-right evaluation as I would expect it to do. But the
standard doesn't guarantee that at all, and it reads:
"Order of evaluation of the operands of almost all C++ operators
(including the order of evaluation of function arguments in a
function-call expression and the order of evaluation of the
subexpressions within any expression) is unspecified. The compiler can
evaluate operands in any order, and may choose another order when the
same expression is evaluated again."
This likely fixes the bug, but even if it doesn't we should patch it,
since what we currently have is technically undefined behavior.
Fixes #3351.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180411144036.24748-1-glauber@scylladb.com>
The patch fixes a bug introduced by commit 089b54f2d2.
This bug manifested when master was deployed in an attempt to populate
materialized views. The nodes restarted in the middle and were not able
to come back.
The fix is to remember formats and versions of sstables for every generation.
Fixes: #3324.
Signed-off-by: Daniel Fiala <daniel@scylladb.com>
Message-Id: <20180410083114.17315-1-daniel@scylladb.com>
This commit adds a 'to_json_string' method which will be used
for converting values to JSON strings. In several cases it's not
sufficient to use 'to_string', e.g. actual strings need to be
surrounded with double quotes.
References #2058
Some tests escaped the --overprovisioned flag, causing them to
compete over cpu 0. Add the flag to all tests.
Message-Id: <20180410181606.8341-1-avi@scylladb.com>
Takes care of reading a range from all shards that own a subrange in the
range. The read happens sequentially, reading from one shard at a time.
Under the scenes it uses combined_mutation_reader and foreign_reader,
the former providing the merging logic and the latter taking care of
transferring the output of the remote readers to the local shard.
Readers are created on-demand by a reader-selector implementation that
creates readers for yet unvisited shards as the read progresses.
The read starts with a concurrency of one, that is, the reader reads from
a single shard at a time. The concurrency is exponentially increased (to
a maximum of the number of shards) when a reader's buffer is empty after
moving to the next shard. This condition is important as we only want to
increase concurrency for sparse tables that have little data, where the
reader has to move between shards often. When concurrency is > 1, the
reader issues background read-aheads to the next shards so that by the
time it needs to move to them they have the data ready.
For dense tables (where we rarely cross shards) we rely on the
foreign_reader to issue sufficient read-aheads on its own to avoid
blocking.
Allows peeking at the next mutation fragment in the buffer. As opposed
to the existing `peek()` it assumes there's at least one fragment in the
buffer. Useful for code that already ensured that the buffer is not
empty and doesn't want to introduce a continuation (via `peek()`).
Instead of throwing std::bad_function_call. Needed by the foreign_reader
unit test. Not sure how other tests didn't hit this before as the test
is using `run_mutation_source_tests()`.
Local representative of a reader located on a remote shard. Manages the
lifecycle and takes care of seamlessly transferring fragments produced
by the remote reader. Fragments are *copied* between the shards in
batches, a bufferful at a time.
To maximize throughput read-ahead is used. After each fill_buffer() or
fast_forward_to() a read-ahead (a fill_buffer() on the remote reader) is
issued. This read-ahead runs in the background and is brought back to
foreground on the next fill_buffer() or fast_forward_to() call.
Allows for detaching the internal buffer of the reader. Enables
convenient transferring of buffered fragments in a single batch but
will force the reader to reallocate its buffer on the next
fill_buffer() call.
Introduced for foreign_reader which favours quick transferring of the
fragments between shards in a single batch, over minimizing allocations,
which can be amortized by background read-aheads.
After the change to serialize compaction on compaction weight (eff62bc61e),
the LCS invariant may break because parallel compaction can start, and it's
not currently supported for LCS.
The condition is that the weight is deregistered right before the last sstable
for a leveled compaction is sealed, so it may happen that a new compaction
starts for the same column family meanwhile, which will promote an sstable to
an overlapping token range.
That leads to the strategy restoring the invariant when it finds the overlap,
which means wasted resources.
The fix removes a fast-path check, which is incorrect now that we release the
weight early, and also fixes a check for ongoing compaction which prevented
compaction from starting for LCS whenever the weight tracker was not empty.
Fixes #3279.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180410034538.30486-1-raphaelsc@scylladb.com>
_lsa_managed is always 1:1 with _region, so we can remove it, saving
some space in the segment descriptor vector.
Tests: unit (release), logalloc_test (debug)
Message-Id: <20180410122606.10671-1-avi@scylladb.com>
Problem:
Start node 1 2 3
Shutdown node2
Shutdown node1 node3
Start node1 node3
Try to replace_address for node 2
The replace operation fails with the error:
seastar - Exiting on unhandled exception: std::runtime_error
(Cannot replace_address node2 because it doesn't exist in gossip)
This is because, after all nodes shut down, the other nodes do not have the
tokens and host_id info of node2 until node2 boots up and talks to the cluster.
If node2 cannot boot up for whatever reason, currently the only way to
recover node2 is to `nodetool removenode` and bootstrap node2 again. This will
change tokens in the cluster and cause more data movement than just replacing
node2.
To fix, we add the tokens and host_id gossip application state in add_saved_endpoint
during boot up.
This is pretty safe because the generation for application state added by
add_saved_endpoint is zero; if node2 actually boots, other nodes will update
with node2's version.
Before:
$ curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
{
    "addrs": "127.0.0.2",
    "generation": 0,
    "is_alive": false,
    "update_time": 1523344828953,
    "version": 0
}
Node 2 can not be replaced.
After:
$ curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
{
    "addrs": "127.0.0.2",
    "application_state": [
        {
            "application_state": 12,
            "value": "31284090-2557-4036-9367-7bb4ef49c35a",
            "version": 2
        },
        {
            "application_state": 13,
            "value": "... a lot of tokens ...",
            "version": 1
        }
    ],
    "generation": 0,
    "is_alive": false,
    "update_time": 1523344828953,
    "version": 0
}
Node 2 can be replaced.
Tests: dtest/replace_address_test.py
Fixes: #3347
Message-Id: <117fd6649939e0505847335791be8d7a96e7d273.1523346805.git.asias@scylladb.com>
save and load functions for the large_bitset were introduced by Avi with
d590e327c0.
In that commit, Avi says:
"... providing iterator-based load() and save() methods. The methods
support partial load/save so that access to very large bitmaps can be
split over multiple tasks."
The only user of this interface is SSTables. And it turns out we don't really
split the access like that. What we do instead is create a chunked vector
and then pass its begin() method with position = 0 and let it write everything.
The problem here is that this requires the chunked vector to be fully
initialized, not just reserved. If the bitmap is large enough, that in itself
can take a long time without yielding (up to 16ms seen in my setup).
We can simplify things considerably by moving the large_bitset to use a
chunked vector internally: it already uses a poor man's version of it
by allocating chunks internally (it predates the chunked_vector).
By doing that, we can turn save() into a simple copy operation, and do
away with load altogether by adding a new constructor that will just
copy an existing chunked_vector.
Fixes #3341
Tests: unit (release)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180409234726.28219-1-glauber@scylladb.com>
"This patchset removes zones and replaces them with a simpler system. LSA tries
to allocate segments at higher addresses, so that we'll end up with the standard
allocator using lower addresses and LSA using higher addresses, allowing for easier
allocation from std."
* tag 'lsa-no-zones/v6' of https://github.com/avikivity/scylla:
tests: add logalloc_test for large contiguous allocations in a challenging environment
logalloc: limit std segment allocations in debug mode
logalloc: introduce prime_segment_pool()
logalloc: limit non-contiguous reclaims
logalloc: pre-allocate all memory as lsa on startup
tests: add random test for dynamic_bitset
dynamic_bitset: optimize for large sets
dynamic_bitset: get rid of resize()
dynamic_bitset: remove find_*_clear() variants
logalloc: reduce segment size to 128k
logalloc: get rid of the emergency reserve stack
logalloc: replace zones with segment-at-a-time alloc/free
Commit 1671d9c433 (not on any release branch)
accidentally bumped the idle memtable flush cpu shares to 100 (representing
10%), causing flushes to be too aggressive when they don't need to consume
much cpu.
Fixes #3243.
Message-Id: <20180408104601.9607-1-avi@scylladb.com>
* seastar 33d8f74...2da7d46 (4):
> http routes: Add parameters to path when adding alias
> future: compile-time optimize futurize<void>::apply()
> memory: remove unneeded union 'pla'
> queue: not_empty()/not_full() should throw when called after abort
This state does not read any data and is used only to perform an
action when finishing reading a primitive type.
According to comment on continuous_data_consumer::non_consuming
such states should be marked as non_consuming.
Tests: units (release)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <55a5c9b76268b50312ecd044291f28dcd8179a22.1523005293.git.piotr@scylladb.com>
Test large std allocations in an environment that has seen many persistent
std allocations interspersed with lsa allocations, causing memory fragmentation.
Address Sanitizer has a global limit on the number of allocations
(note: not number of allocations less number of frees, but cumulative
number of allocations). Running some tests in debug mode on a machine
with sufficient memory can break that limit.
Work around that limit by restricting the amount of memory the
debug mode segment_pool can allocate. It's also nicer for running
the test on a workstation.
To segregate std and lsa allocations, we prime the segment pool
during initialization so that lsa will release lower-addressed
memory to std, rather than lsa and std competing for memory at
random addresses.
However, tests often evict all of lsa memory for their own
purposes, which defeats this priming.
Extract the functionality into a new prime_segment_pool()
function for use in tests that rely on allocation segregation.
We may fail to reclaim because a region has reclaim disabled (usually because
it is in an allocating_section). Failed reclaims can cause high CPU usage
if all of the lower addresses happen to be in a reclaim-disabled region (this
is somewhat mitigated by the fact that checking for reclaim disabled is very
cheap), but worse, failing a segment reclaim can lead to reclaimed memory
being fragmented. This results in the original allocation continuing to fail.
To combat that, we limit the number of failed reclaims. If we reach the limit,
we fail the reclaim. The surrounding allocating_section will release the
reclaim_lock, and increase reserves, which will result in reclaim being
retried with all regions being reclaimable, and succeed in allocating
contiguous memory.
Since lsa tries to keep some non-lsa memory as reserve, we end up
with three blocks of memory: at low addresses, non-lsa memory that was
allocated during startup or subsequently freed by lsa; at middle addresses,
lsa; and at the top addresses, memory that lsa left alone during initial
cache population due to the reserve.
After time passes, both std and lsa will allocate from the top section,
causing a mix of lsa and non-lsa memory. Since lsa tries to free from
lower addresses, this mix will stay there forever, increasing fragmentation.
Fix that by disabling the reserve during startup and allocating all of memory
for lsa. Any further allocation will then have to be satisfied by lsa first
freeing memory from the low addresses, so we will now have just two sections
of memory: low addresses for std, and top addresses for lsa.
Note that this startup allocation does not page in lsa segments, since the
segment constructor does not touch memory.
They are no longer used, and cannot be efficiently implemented
for large bitsets using a summary vector approach without slowing
down the find_*_set() variants, which are used.
Also remove find_previous_set() for the same reason.
Reducing the segment size reduces the time needed to compact segments,
and increases the number of segments that can be compacted (and so
the probability of finding low-occupancy segments).
128k is the size of I/O buffers and of thread stacks, so we can't
go lower than that without more significant changes.
This patch replaces the zones mechanism with something simpler: a
single segment is moved from the standard allocator to lsa and vice
versa, at a time. Fragmentation resistance is (hopefully) achieved
by having lsa prefer high addresses for lsa data, and return segments
at low address to the standard allocator. Over time, the two will move
apart.
Moving just one segment at a time reduces the latency costs of
transferring memory between lsa and std.
"Together with the already merged patch, we reduce the object file
from 114MB to 81MB."
* tag 'api-diet-1/v1' of https://github.com/avikivity/scylla:
api: type-erase all-column_family map_reduce variant
api: simplify 6-argument map_reduce_cf() variant
* seastar 7328d17...33d8f74 (3):
> memory: switch to buddy allocation
> tls: Ensure we always pass through semaphores on shutdown
> memory: replace placement-new in unions with member construction
See scylladb/seastar#426.
After f59f423f3c, an sstable is loaded only at the shards
that own it, so as to reduce the sstable load overhead.
The problem is that an sstable may no longer be forwarded to a shard that needs to
be aware of its existence, which would result in that sstable's generation being
reallocated for a write request.
That would result in a failure as follows:
"SSTable write failed due to existence of TOC file for generation..."
This can be fixed by forwarding any sstable at load to all its owner shards
*and* the shard responsible for its generation, which is determined as follows:
s = generation % smp::count
Fixes #3273.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180405035245.30194-1-raphaelsc@scylladb.com>
"
Fixes to the view building process, discovered from field experience.
Tests: dtest(materialized_view_tests.py, smp=2)
"
* 'views/view-build-fixes/v1' of https://github.com/duarten/scylla:
db/view: Start view building after schema agreement
db/system_keyspace: scylla_views_builds_in_progress writes are user mem
db/view: Require configuration option to enable view building
Empty partition keys are not supported on normal tables - they cannot
be inserted or queried (surprisingly, the rules for composite
partition keys are different: all components are then allowed to be
empty). However, the (non-composite) partition key of a view could end
up being empty if that column is: a base table regular column, a
base table clustering key column, or a base table partition key column
that is part of a composite key.
Fixes #3262
Refs CASSANDRA-14345
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180403122244.10626-1-duarte@scylladb.com>
If a base table or view has been dropped in one node, but another
one hasn't yet learned about it, it starts the view build process
immediately on boot, possibly calculating unneeded view updates and
causing errors at the view replica, if that replica has already
processed the schema changes. We should thus wait for schema
agreement, even if the node is a seed.
Fixes #3328
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Treat writes to scylla_views_builds_in_progress as user memory, as the
number of writes is dependent on the amount of user data on views
(times the number of views, divided by the view building batch size).
Fixes #3325
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
View building, enabled by default, can contain or expose issues that
prevent the node from starting. In those cases, it is necessary to
disable view building such that the node can be submitted to
maintenance operations.
Fixes #3329
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The 6-argument map_reduce_cf function is identical to the 5-argument
version, except that it performs an extra cast (by calling
the 6th argument's operator=()).
Simplify the code by calling the 5-argument version from the 6-argument
version.
Reduces binary size by ~10%.
map_reduce_cf() is called with varying template parameters which each
have to be compiled separately. Unifying the internals to use types based
on std::any reduced the object size by 15% (115MB->99MB) with presumably
a commensurate decrease in compile time.
A version that used "I" instead of "std::any" (and thus merged the
internals only for callers that used the same result type) delivered
a 10% decrease in object size. While std::any is less safe, in this
case it is completely encapsulated.
Message-Id: <20180402213732.432-1-avi@scylladb.com>
test_large_allocation attempts to allocate almost half of memory.
With a buddy allocator, even if more than half of memory is free,
and even if it is contiguous, it is unlikely to be available as a
single allocation because the allocator inserts boundaries at powers-
of-two addresses.
Relax the test by allocating smaller chunks (but still the same amount,
and still with challenging sizes); allocating half of memory contiguously
is not a goal.
Also use a vector instead of a deque, and reserve it, so we don't get
intervening non-lsa allocations. I'm not sure there's a problem there
but let's not depend on the allocation patterns.
Message-Id: <20180401150828.13921-1-avi@scylladb.com>
While building with -O1, I saw that the linker could not find
the vtable for named_value<log_level>. Rather than fixing up the
includes (and likely lengthening build time), fix by defining
the class as an extern template, preventing it from being
instantiated at the call site.
Message-Id: <20180401150235.13451-1-avi@scylladb.com>
-O1 complains that client_state::_remote_addr is not initialized
(and it is right). The call site is tracing, which likely won't be
invoked for internal queries, but still.
Message-Id: <20180401150410.13651-1-avi@scylladb.com>
Since bytes is used to encapsulate blobs, not strings, there's no
need for a NUL terminator. It will never be passed to a function
that expects a C string.
Message-Id: <20180401151009.14108-1-avi@scylladb.com>
* seastar a66cc34...7328d17 (5):
> sstring: add support for non-nul-terminated sstrings
> core/sharded: Make async_sharded_service dtor virtual
> reactor: pass naked pointer to submit_io
> Merge http: "Add alias support to the API" from Amnon
> systemwide_memory_barrier: use madvise(MADV_DONTNEED) instead of mprotect()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
By default, overprovisioned is not enabled on docker unless it is
explicitly set. I have come to believe that this is a mistake.
If the user is running alone in the machine, and there are no other
processes pinned anywhere - including interrupts - not running
overprovisioned is the best choice.
But everywhere else, it is not: even if a user runs 2 docker containers
in the same machine and statically partitions CPUs with --smp (but
without cpuset) the docker containers will pin themselves to the same
sets of CPU, as they are totally unaware of each other.
It is also very common, especially in some virtualized environments, for
interrupts not to be properly distributed - being particularly keen on
being delivered on CPU0, a CPU which Scylla will pin by default.
Lastly, environments like Kubernetes simply don't support pinning at the
moment.
This patch enables the overprovisioned flag if it is explicitly set -
like we did before - but also by default unless --cpuset is set.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180331142131.842-1-glauber@scylladb.com>
Unused options are not exposed as command line options and will prevent
Scylla from booting when present, although they can still be passed via
YAML, for Cassandra compatibility.
That has never been a problem, but we have been adding options to i3
(and others) that are now deprecated, but were previously marked as
Used. Systems with those options may have issues upgrading.
While this problem is common to all Unused options, the likelihood for
any other unused option to appear in the command line is near zero,
except for those two - since we put them there ourselves.
There are two ways to handle this issue:
1) Mark them as Used, and just ignore them.
2) Add them explicitly to boost program options, and then ignore them.
The second option is preferred here, because we can add them as hidden
options in program_options, meaning they won't show up in the help. We
can then just print a discreet message saying that those options are,
from now on, ignored.
v2: mark set as const (Botond)
v3: rebase on top of master, indentation suggested by Duarte.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180329145517.8462-1-glauber@scylladb.com>
This reverts commit 3b53f922a3. It's broken
in two ways:
1. concrete_allocating_function::allocate()'s caller,
region_group::start_releaser() loop, will delete the object
as soon as it returns; however we scheduled some work depending
on `this` in a separate continuation (via with_scheduling_group())
2. the calling loop's termination condition depends on the work being
done immediately, not later.
start node 1 2 3
shutdown node2
shutdown node1 and node3
start node1 and node3
nodetool removenode node2
clean up all scylla data on node2
bootstrap node2 as a new node
I saw node2 could not bootstrap; it was stuck waiting for schema information to complete forever:
On node1, node3
[shard 0] gossip - received an invalid gossip generation for peer 127.0.0.2; local generation = 2, received generation = 1521779704
On node2
[shard 0] storage_service - JOINING: waiting for schema information to complete
This is because in the nodetool removenode operation, the generation of node2 was increased from 0 to 2.
gossiper::advertise_removing () calls eps.get_heart_beat_state().force_newer_generation_unsafe();
gossiper::advertise_token_removed() calls eps.get_heart_beat_state().force_newer_generation_unsafe();
Each force_newer_generation_unsafe increases the generation by 1.
Here is an example,
Before nodetool removenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
{
    "addrs": "127.0.0.2",
    "generation": 0,
    "is_alive": false,
    "update_time": 1521778757334,
    "version": 0
},
```
After nodetool removenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
{
    "addrs": "127.0.0.2",
    "application_state": [
        {
            "application_state": 0,
            "value": "removed,146b52d5-dc94-4e35-b7d4-4f64be0d2672,1522038476246",
            "version": 214
        },
        {
            "application_state": 6,
            "value": "REMOVER,14ecc9b0-4b88-4ff3-9c96-38505fb4968a",
            "version": 153
        }
    ],
    "generation": 2,
    "is_alive": false,
    "update_time": 1521779276246,
    "version": 0
},
```
In gossiper::apply_state_locally, we have this check:
```
if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
// assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",ep, local_generation, remote_generation);
}
```
to skip the gossip update.
To fix, we relax the max generation difference check to allow the
generation of a removed node.
After this patch, the removed node bootstraps successfully.
Tests: dtest:update_cluster_layout_tests.py
Fixes #3331
Message-Id: <678fb60f6b370d3ca050c768f705a8f2fd4b1287.1522289822.git.asias@scylladb.com>
Just saw this today during a crash when creating Materialized Views.
It is still unclear why this happened. But the message says:
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: scylla: sstables/sstables.cc:2973: sstables::sstable::remove_sstable_with_temp_toc(seastar::sstring, seastar::sstring, seastar::sstring, int64_t, sstables::sstable::version_types, sstables::sstable::format_types)::<lambda()>: Assertion `tmptoc == true' failed.
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: Aborting on shard 0.
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: Backtrace:
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000005b4b4c
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000005b4df5
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000005b4ea3
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libpthread.so.0+0x000000000000f0ff
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x00000000000355f6
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x0000000000036ce7
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x000000000002e565
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x000000000002e611
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000015969d0
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x0000000001596f7a
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x000000000051ca8d
I can't even guess which table caused the problem, let alone which SSTable.
That's because those asserts are the very first thing we do. We can discuss
whether or not assert is the right behaviour (usually we can't guarantee the
state is sane if that is missing, so I don't see a problem).
But it would be nice to see which SSTable we are processing before we assert.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180328160856.10717-1-glauber@scylladb.com>
"
The configuration API is part of scylla v2 configuration.
It uses the new definition capabilities of the API to dynamically create
the swagger definition for the configuration.
This means that the swagger will contain an entry with a description and
type for each of the config values.
To get the v2 of the swagger file:
http://localhost:10000/v2
If using with swagger ui, change http://localhost:10000/api-doc to http://localhost:10000/v2
It takes longer to load because the file is much bigger now.
"
* 'amnon/config_api_v5' of github.com:scylladb/seastar-dev:
Explanation about the API V2
API: add the config API as part of the v2 API.
Defining the config api
The config API is created dynamically from the config. This means that
the swagger definition file will contain the descriptions and types based on the
configuration.
The config.json file is used by the code generator to define a path that is
used to register the handler function.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
"
This series introduces the view_builder class, a sharded service
responsible for building all defined materialized views. This process
entails walking over the existing data in a given base table, and using
it to calculate and insert the respective entries for one or more views.
The view_builder uses the migration_manager to subscribe to schema
change events, and update its bookkeeping accordingly. We prefer this
to having the database call into the view_builder, as that would
create a cyclic dependency.
We serialize changes to the views of a particular base table, such
that schema changes do not interfere with the view building process.
We employ a flat_mutation_reader for each base table for which we're
building views.
We consume from the reader associated with each base table until all
its views are built. If the reader reaches the end and there are
incomplete views, then a view was added while others were being built.
In such cases, we restart the reader to the beginning of the current
token, but not to the beginning of the token range, when the view is
added. Then, when we exhaust the reader, we simply create a new one
for the whole token range, and resume building the pending views.
We aim to be resource-conscious. On a given shard, at any given moment,
we consume at most from one reader. We also strive for fairness, in that
each build step inserts entries for the views of a different base. Each
build step reads and generates updates for batch_size rows. We lack a
controller, which could potentially allow us to go faster (to execute
multiple steps at the same time, or consume more rows per batch), and
also which would apply backpressure, so we could, for example, delay
executing a build step.
Interaction with the system tables:
- When we start building a view, we add an entry to the
scylla_views_builds_in_progress system table. If the node restarts
at this point, we'll consider these newly inserted views as having
made no progress, and we'll treat them as new views;
- When we finish a build step, we update the progress of the views
that we built during this step by writing the next token to the
scylla_views_builds_in_progress table. If the node restarts here,
we'll start building the views at the token in the next_token
column.
- When we finish building a view, we mark it as completed in the
built views system table, and remove it from the in-progress system
table. Under failure, the following can happen:
* When we fail to mark the view as built, we'll redo the last
step upon node reboot;
* When we fail to delete the in-progress record, upon reboot
we'll remove this record.
A view is marked as completed only when all shards have finished
their share of the work, that is, if a view is not built, then all
shards will still have an entry in the in-progress system table;
- A view that a shard finished building, but not all other shards,
remains in the in-progress system table, with first_token ==
next_token.
Interaction with the distributed system tables:
- When we start building a view, we mark the view build as being
in-progress;
- When we finish building a view, we mark the view as being built.
Upon failure, we ensure that if the view is in the in-progress
system table, then it may not have been written to this table. We
don't load the built views from this table when starting. When
starting, the following happens:
* If the view is in the system.built_views table and not the
in-progress system table, then it will be in this one;
* If the view is in the system.built_views table and not in
this one, it will still be in the in-progress system table -
we detect this and mark it as built in this table too,
keeping the invariant;
* If the view is in this table but not in system.built_views,
then it will also be in the in-progress system table - we
don't detect this and will redo the missing step, for
simplicity.
View building is necessarily a sharded process. That means that on
restart, if the number of shards has changed, we need to calculate
the most conservative token range that has been built, and build
the remainder.
When building view updates, we consider that everything is new and
nothing pre-existing is there (which means no tombstones will be sent
out to the paired view replicas).
Tests:
unit (debug)
dtest (materialized_view_test.py(smp=1, smp=2))
"
* 'view-building/v4' of https://github.com/duarten/scylla: (22 commits)
tests/view_build_test: Add tests for view building
tests/cql_test_env: Move eventually() to this file
tests/cql_assertions: Assert result set is not empty
tests/cql_test_env: Start the view_builder
db/view/view_builder: Allow synchronizing with the end of a build
db/view/view_builder: Actually build views
flat_mutation_reader: Make reader from mutation fragments
db/view/view_builder: React to schema changes
service/migration_listener: Add class for view notifications
db/view: Introduce view_builder
column_family: Add function to populate views
column_family: Allow synchronizing with in-progress writes
database: Compare view id instead of name in find_views()
database: Add get_views() function
db/view: Return a future when sending view updates
service/storage_service: Allow querying the view build status
db: Introduce system_distributed_keyspace
tests: Add unit test for build_progress_virtual_reader
db/system_keyspace: Add API for MV-related system tables
db/system_keyspace: Add virtual reader for MV in-progress build status
...
This is a separate file from view_schema_test because that one is
already becoming too long to run; also, having multiple test files
means they can be executed in parallel.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Intended for use by unit tests, this patch allows synchronizing with
the end of a build for a particular view.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the missing view building code to the eponymous class.
We consume from the reader associated with each base table until all
its views are built. If the reader reaches the end and there are
incomplete views, then a view was added while others were being built.
In such cases, we restart the reader to the beginning of the current
token, but not to the beginning of the token range, when the view is
added. Then, when we exhaust the reader, we simply create a new one
for the whole token range, and resume building the pending views.
We aim to be resource-conscious. On a given shard, at any given moment,
we consume at most from one reader. We also strive for fairness, in that
each build step inserts entries for the views of a different base. Each
build step reads and generates updates for batch_size rows. We lack a
controller, which could potentially allow us to go faster (to execute
multiple steps at the same time, or consume more rows per batch), and
also which would apply backpressure, so we could, for example, delay
executing a build step.
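The pacing described above can be sketched standalone: each build step consumes at most batch_size rows from a single reader before yielding. This is a minimal illustrative model, not Scylla's future-based loop; all names here are hypothetical.

```cpp
#include <vector>

constexpr int batch_size = 128;

// Consume up to batch_size rows per build step, advancing the cursor.
// Returns how many rows were processed in this step.
int build_step(const std::vector<int>& rows, int& cursor) {
    int n = 0;
    while (cursor < (int)rows.size() && n < batch_size) {
        // ... generate and send view updates for rows[cursor] ...
        ++cursor;
        ++n;
    }
    return n;
}
```

Bounding each step keeps any single base table from monopolizing the shard, at the cost of the missing controller noted above.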
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Builds a reader from a set of ordered mutations fragments. This is
useful for building a reader out of a subset of segments returned by a
different reader. It is equivalent to building a mutation out of the
set of mutation fragments, and calling
make_flat_mutation_reader_from_mutations, except that it does not yet
support fast-forwarding.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The view_builder now uses the migration_manager to subscribe to schema
change events, and update its bookkeeping accordingly. We prefer this
to having the database call into the view_builder, as that would
create a cyclic dependency.
We serialize changes to the views of a particular base table, such
that schema changes do not interfere with the upcoming view building
code.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Add a convenience base class for view notifications, which provides
a default implementation for all other types of notifications.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the view_builder class, a sharded service
responsible for building all defined materialized views. This process
entails walking over the existing data in a given base table, and using
it to calculate and insert the respective entries for one or more views.
This patch introduces only the bootstrap functionality, which is
responsible for loading the data stored in the system tables and
filling the in-memory data structures with the relevant information,
to be used in subsequent patches for the actual view building. The
interaction with the system tables is as follows.
Interaction with the tables in system_keyspace:
- When we start building a view, we add an entry to the
scylla_views_builds_in_progress system table. If the node restarts
at this point, we'll consider these newly inserted views as having
made no progress, and we'll treat them as new views;
- When we finish a build step, we update the progress of the views
that we built during this step by writing the next token to the
scylla_views_builds_in_progress table. If the node restarts here,
we'll start building the views at the token in the next_token
column.
- When we finish building a view, we mark it as completed in the
built views system table, and remove it from the in-progress system
table. Under failure, the following can happen:
* When we fail to mark the view as built, we'll redo the last
step upon node reboot;
* When we fail to delete the in-progress record, upon reboot
we'll remove this record.
A view is marked as completed only when all shards have finished
their share of the work, that is, if a view is not built, then all
shards will still have an entry in the in-progress system table;
- A view that a shard finished building, but not all other shards,
remains in the in-progress system table, with first_token ==
next_token.
Interaction with the distributed system table (view_build_status):
- When we start building a view, we mark the view build as being
in-progress;
- When we finish building a view, we mark the view as being built.
Upon failure, we ensure that if the view is in the in-progress
system table, then it may not have been written to this table. We
don't load the built views from this table when starting. When
starting, the following happens:
* If the view is in the system.built_views table and not the
in-progress system table, then it will be in view_build_status;
* If the view is in the system.built_views table and not in
this one, it will still be in the in-progress system table -
we detect this and mark it as built in this table too,
keeping the invariant;
* If the view is in this table but not in system.built_views,
then it will also be in the in-progress system table - we
don't detect this and will redo the missing step, for
simplicity.
View building is necessarily a sharded process. That means that on
restart, if the number of shards has changed, we need to calculate
the most conservative token range that has been built, and build
the remainder.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The populate_views() function takes a set of views to update, a
token to select base table partitions, and the set of sstables to
query. This lays the foundation for a view building mechanism to exist,
which walks over a given base table, reads data token-by-token,
calculates view updates (in a simplified way, compared to the existing
functions that push view updates), and sends them to the paired view
replicas.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a mechanism to class column_family through which we
can synchronize with in-progress writes. This is useful for code that,
after some modification, needs to ensure that new writes will see it
before it can proceed.
In particular, this will be used by the view building code, which needs
to wait until in-progress writes, which may have missed that there
is now a view, become observable to it.
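The synchronization idea can be modeled in a few lines. Scylla's actual mechanism is a phased barrier over futures; the counter-plus-waiters version below is only an illustrative stand-in with made-up names.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy model: a callback passed to await_writes() runs only once every
// write that was already in flight at await time has completed.
struct write_barrier {
    int in_flight = 0;
    std::vector<std::function<void()>> waiters;

    void start_write() { ++in_flight; }

    void end_write() {
        if (--in_flight == 0) {
            auto ws = std::move(waiters);
            waiters.clear();
            for (auto& w : ws) w();   // release everyone waiting
        }
    }

    // Invoke cb once all currently in-flight writes have finished.
    void await_writes(std::function<void()> cb) {
        if (in_flight == 0) cb();
        else waiters.push_back(std::move(cb));
    }
};
```

The view builder would call the await operation after registering the view, guaranteeing that any write started before the view existed has either completed or will be seen.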
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
While we now send view mutations asynchronously in the normal view
write path, other processes interested in sending view updates, such
as streaming or view building, may wish to do it synchronously.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds support for the nodetool viewbuildstatus command,
which shows the progress of a materialized view build across the
cluster.
A view can be absent from the result, successfully built, or
currently being built.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces a distributed system keyspace, used to hold
system tables that need to be replicated across a set of replicas
(that is, can't use the LocalStrategy).
In following patches, we will use this keyspace to hold a table
containing view building status updates for each node, used to support
range movements and a new nodetool command.
Fixes #3237
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch implements an API to access the MV-related system tables,
which pertain to the view building process.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Provide a virtual reader so users can query the in-progress view table
in a way compatible with Apache Cassandra.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When building a materialized view, we divide our work by shard, so we
need to register which shard did what work in the in-progress system
table. We also add the token we started at, which will enable some
optimizations in the view building code.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
Additional extension points.
* Allows wrapping commitlog file io (including hinted handoff).
* Allows system schema modification on boot, allowing extensions
to inject extensions into hardcoded schemas.
Note: to make commitlog file extensions work, we need to both
enforce we can be notified on segment delete, and thus need to
fix the old issue of hard ::unlink call in segment destructor.
Segment delete is therefore moved to a batch routine, run at
intervals/flush. Replay segments and hints are also deleted via
the commitlog object, ensuring an extension is notified (metadata).
Configurable listeners are now allowed to inject configuration
objects into the main config. I.e. a local object can, either
by becoming a "configurable" or manually, add references to
self-describing values that will be parsed from the scylla.yaml
file, effectively extending it.
All these wonderful abstractions courtesy of encryption of course.
But super generalized!
"
* 'calle/commitlog_ext' of github.com:scylladb/seastar-dev:
db::extensions: Allow extensions to modify (system) schemas
db::commitlog: Add commitlog/hints file io extension
db::commitlog: Do segment delete async + force replay delete go via CL
main/init: Change configurable callbacks and calls to allow adding opts
util::config_file: Add "add" config item overload
Allows extensions/config listeners to potentially augment
(system) schemas at boot time. This is only useful for schemas
that do not pass through system_schema tables.
Refs #2858
Push segment files to be deleted to a pending list, and process at
intervals or flush-requests (or shutdown). Note that we do _not_
indiscriminately do deletes in non-anchored tasks, because we need
to guarantee that finished segments are fully deleted and gone on CL
shutdown, not to be mistaken for replayables.
Also make sure we delete segments replayed via commitlog call,
so IFF we add metadata processing for CL, we can clear it out.
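The batched-delete scheme amounts to deferring unlinks into a pending list that is drained at intervals, on flush, or on shutdown. A minimal sketch, with illustrative names standing in for the real commitlog types and for ::unlink:

```cpp
#include <string>
#include <vector>

struct commitlog {
    std::vector<std::string> pending_deletes;
    std::vector<std::string> unlinked;   // stands in for actual ::unlink calls

    // Called where the segment destructor used to unlink directly.
    void mark_for_delete(std::string path) {
        pending_deletes.push_back(std::move(path));
    }

    // Run periodically, on flush, and on shutdown, so an extension can
    // be notified (e.g. to clear metadata) before each file disappears.
    void process_deletes() {
        for (auto& p : pending_deletes) {
            unlinked.push_back(p);
        }
        pending_deletes.clear();
    }
};
```

Draining on shutdown preserves the guarantee that finished segments are fully gone and can't be mistaken for replayables.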
Since we just keep retrying, this can cause Scylla to not shutdown for
a while.
The data will be safe in the commit log.
Note that this patch doesn't fix the issue when shutdown goes through
storage_service::drain_on_shutdown - more work is required to handle
that case.
Ref #3318.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-3-duarte@scylladb.com>
In column_family::try_flush_memtable_to_sstable, the handle_exception()
block is on the inside of the continuations to
write_memtable_to_sstable(), which, if it fails, will leave the
sstable in the compaction_backlog_tracker::_ongoing_writes map, which
will waste disk space, and that sstable will map to a dangling pointer
to a destroyed database_sstable_write_monitor, which causes a seg
fault when accessed (for example, through the backlog_controller,
which accounts the _ongoing_writes when calculating the backlog).
Fix this by increasing the scope of handle_exception().
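The shape of the fix can be shown with plain exception handling: the cleanup that unregisters the sstable must cover the write itself, not just the continuations after it. Names below (backlog_tracker, flush_memtable) are illustrative, not Scylla's actual types.

```cpp
#include <map>
#include <stdexcept>

struct backlog_tracker {
    std::map<int, int> ongoing_writes;   // sstable id -> monitor handle
};

void flush_memtable(backlog_tracker& t, int sst_id, bool fail_write) {
    t.ongoing_writes.emplace(sst_id, 42);
    try {
        if (fail_write) {
            // models write_memtable_to_sstable() failing
            throw std::runtime_error("write failed");
        }
        // ... seal sstable, update metadata ...
    } catch (...) {
        // widened scope: unregister on *any* failure, so no dangling
        // entry (and no dangling monitor pointer) is left behind
        t.ongoing_writes.erase(sst_id);
        throw;
    }
    t.ongoing_writes.erase(sst_id);
}
```

With the narrower scope, a failed write left the entry in the map forever; widening it restores the invariant on both paths.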
Fixes #3315
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-2-duarte@scylladb.com>
* Don't dump output of failed tests immediately, print the output
for failed tests in the end instead.
* Fix exception printing in run_test(): don't assume passed in error
object is a `bytes` (or bytes-like) object, call the object's str
operator instead and let callers encode bytes objects instead.
* Don't assume Exception object has an `out` member, use operator str
instead to convert it to string.
* Don't print progress in run_test() directly because it results in
incomprehensible output as the executors race to print to stdout. Leave
progress report to the caller who can serialize progress prints.
* Automatically detect non-tty stdout and don't try to edit already
printed text.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <7bb7e0003ded9b28710250bff851ea849bb99f7d.1522062795.git.bdenes@scylladb.com>
"
This series does not add or change any features of access-control and
roles, but addresses some bugs and finalizes the switch to roles.
"auth: Wait for schema agreement" and the patch prior help avoid false
negatives for integration tests and error messages in logs.
"auth: Remove ordering dependence" fixes an important bug in `auth` that
could leave the default superuser in a corrupted state when it is first
created.
Since roles are feature-complete (to the best of the author's knowledge
as of this writing), the final patch in the series removes any warnings
about them being unimplemented.
Tests: unit (release), dtest (PENDING)
"
* 'jhk/auth_fixes/v1' of https://github.com/hakuch/scylla:
Roles are implemented
auth: Increase delay before background tasks start
auth: Remove ordering dependence
auth: Don't warn on rescheduled task
auth: Wait for schema agreement
Single-node clusters can agree on schema
I've observed failures due to "missing" the peer nodes by about 1
second. Adding 5 seconds to the existing delay should cover most false
negative test results.
Fixes #3320.
If `auth::password_authenticator` also creates `system_auth.roles` and
we fix the existence check for the default superuser in
`auth::standard_role_manager` to only search for the columns that it
owns (instead of the column itself), then both modules' initialization
are independent of one another.
Fixes #3319.
Apache Cassandra also prints at the `info` level. This change prevents
tasks which we expect to be rescheduled from failing tests and scaring
users.
A good example of the importance of this change is when queries with a
quorum consistency level (for the default superuser) fail because a
quorum is not available. We will try again in this case, and this should
not cause integration tests to fail.
Some modules of `auth` create a default superuser if it does not already
exist.
The existence check is through a SELECT query with quorum consistency
level. If the schema for the applicable tables has not yet propagated to
a peer node at the time that it processes this query, then the
`storage_proxy` will print an error message to the log and the query
will be retried.
Eventually, the schema will propagate and the default superuser will be
created. However, the error message in the log causes integration tests
to fail (and is somewhat annoying).
Now, prior to querying for existing data, we wait for all gossip peers
to have the same schema version as we do.
Fixes #2852.
At some points while bootstrapping [1], new non-seed Scylla nodes wait
for schema agreement among all known endpoints in the cluster.
The check for schema agreement was in
`service::migration_manager::is_ready_for_bootstrap`. This function
would return `true` if, at the time of its invocation, the node was
aware of at least one `UP` peer (not itself) and that all `UP` peers had
the same schema version as the node.
We wish to re-use this check in the `auth` sub-system to ensure that
the schema for internal system tables used for access-control have
propagated to the entire cluster.
Unlike in `service/storage_service.cc`, where `is_ready_for_bootstrap`
was only invoked for seed nodes, we wish to wait for schema agreement
for all nodes regardless of whether or not they are seeds.
For a single-node cluster with itself as a seed,
`is_ready_for_bootstrap` would always return `false`.
We therefore change the conditions for schema agreement. Schema
agreement is now reached when there are no known peers (so the endpoint
map of the gossiper consists only of ourselves), or when there is at
least one `UP` peer and all `UP` peers have the same schema version as
us.
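The revised agreement rule can be written as a small predicate. Types here are simplified stand-ins for the gossiper's endpoint state map, not the actual API.

```cpp
#include <map>
#include <string>

struct peer {
    bool up;
    std::string schema_version;
};

// Agreement holds when the endpoint map contains no peers at all
// (single-node cluster: only ourselves), or when at least one peer is
// UP and every UP peer reports our schema version.
bool have_schema_agreement(const std::map<std::string, peer>& peers,
                           const std::string& our_version) {
    if (peers.empty()) {
        return true;
    }
    bool any_up = false;
    for (const auto& e : peers) {
        const peer& p = e.second;
        if (!p.up) continue;
        any_up = true;
        if (p.schema_version != our_version) return false;
    }
    return any_up;
}
```

The old rule returned false for the empty-map case, which is what made single-node clusters unable to agree with themselves.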
This change should not impact any bootstrap behavior in
`storage_service` because seed nodes do not invoke the function and
non-seed nodes wait for peer visibility before checking for schema
agreement.
Since this function is no longer checking for schema agreement only in
the context of bootstrapping non-seed nodes, we rename it to reflect its
generality.
[1] http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html
I see the following error:
seastar/core/future-util.hh:597:10: note: constraints not satisfied
seastar/core/future-util.hh:597:10: note: with ‘sstables::sstable_version_types* c’
seastar/core/future-util.hh:597:10: note: with ‘sub_partitions_read::run_test_case()::<lambda(sstables::sstable::version_types)> aa’
seastar/core/future-util.hh:597:10: note: the required expression ‘seastar::futurize_apply(aa, (* c.begin()))’ would be ill-formed
seastar/core/future-util.hh:597:10: note: ‘seastar::futurize_apply(aa, (* c.begin()))’ is not implicitly convertible to ‘seastar::future<>’
The C array all_sstable_versions decayed to a pointer (see second gcc note)
and of course doesn't support std::begin().
Fix by replacing the C array with an std::array<>, which supports std::begin().
Not clear what made this break again, or why it worked before.
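The difference is easy to demonstrate: once a C array decays to a pointer (e.g. when passed to a function), its size is lost and std::begin()/std::end() no longer apply, whereas std::array keeps value semantics. A minimal sketch:

```cpp
#include <array>
#include <iterator>

// A C array such as `sstable_version_types all_sstable_versions[]`
// decays to a pointer in many contexts, and std::begin() is ill-formed
// for a pointer. std::array never decays.
static const std::array<int, 3> all_versions = {1, 2, 3};

int sum_versions() {
    int s = 0;
    for (auto it = std::begin(all_versions); it != std::end(all_versions); ++it) {
        s += *it;
    }
    return s;
}
```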
Message-Id: <20180325095239.12407-1-avi@scylladb.com>
"
This patchset removes unneeded object files from the test link,
reducing unnecessary links and reducing link time and executable
size.
Tests: build (release)
"
* tag 'test-link/v1' of https://github.com/avikivity/scylla:
build: link release.o into scylla and perf_fast_forward binaries only
build: don't link api/ into tests
release.o depends on the release date and git hash, and therefore changes
every time ./configure.py is executed. In turn, this causes all tests to
relink.
Improve the situation by only linking release.o into binaries that require
it.
This helps continuous integration scripts, which call configure.py
unconditionally. Developers usually won't, so they will not see significant
savings.
Tests: build (release)
"
This fixes an abort in an sstable reader when querying a partition with no
clustering ranges (happens on counter table mutation with no live rows) which
also doesn't have any static columns. In such case, the
sstable_mutation_reader will setup the data_consume_context such that it only
covers the static row of the partition, knowing that there is no need to read
any clustered rows. See partition.cc::advance_to_upper_bound(). Later when
the reader is done with the range for the static row, it will try to skip to
the first clustering range (missing in this case). If clustering_ranges_walker
tells us to skip to after_all_clustering_rows(), we will hit an assert inside
continuous_data_consumer::fast_forward_to() due to an attempt to skip past the
original data file range. If clustering_ranges_walker returns
before_all_clustering_rows() instead, all is fine because we're still at the
same data file position.
Fixes #3304.
"
* 'tgrabiec/fix-counter-read-no-static-columns' of github.com:scylladb/seastar-dev:
tests: mutation_source_test: Test reads with no clustering ranges and no static columns
tests: simple_schema: Allow creating schema with no static column
clustering_ranges_walker: Stop after static row in case no clustering ranges
When there are no clustering ranges, stop at position which is right
after the static row instead of position which is after all clustered
rows.
This fixes an abort in sstable reader when querying a partition with
no clustering ranges (happens with counter tables) which also doesn't
have any static columns. In such case, the sstable_mutation_reader
will setup the data_consume_context such that it only covers the
static row of the partition, knowing that there is no need to read
any clustering row. See partition.cc::advance_to_upper_bound(). Later
when we're done with reading the static row (which is absent), we will
try to skip to the first clustering range, which in this case is
missing. If clustering_ranges_walker tells us to skip to
after_all_clustering_rows(), we will hit an assert inside
continuous_data_consumer::fast_forward_to() due to an attempt to skip
past the original data file range. If clustering_ranges_walker returns
before_all_clustering_rows() instead, all is fine, because we end up
at the same data file position.
Fixes #3304.
"
There are a lot of things that we should be grouping in scheduling
groups that we aren't yet. The write path is not tagged at all,
mutation_query isn't either. Some, like streaming, are used - but not in
all places where they are needed.
Tests: unit (release)
"
* 'split-scheduling-groups-v2' of github.com:glommer/scylla:
database: group statements in their own scheduling group
database: apply streaming mutations with streaming priority
logalloc: capture current scheduling group for deferring function
Add a unit test for reproducing issue #2720 (and verifying its fix)
If a user tries to create a view whose primary key is missing any of the
base table's primary key columns, the creation should fail.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180320161121.13392-3-nyh@scylladb.com>
Changed the order to check a couple of error conditions *after* checking
for too many or missing primary key columns. This order (showing the
too many or missing key columns first) is more useful, and is the order
in Cassandra's code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180320161121.13392-2-nyh@scylladb.com>
A view's primary key must include all the columns of the base's primary
key. If we don't check this and fail the table's creation, we can discover
problems later on when using the table, as demonstrated in issue #2720.
We had such checking code (translated from the same code in Java) but it
had an extra "else" which caused nothing to be put in "missing_pk_columns"
so the error was never recognized.
Also, when the error does happen, we should print the column's name_as_text(),
not name() which is (surprisingly) just a number.
Fixes #2720.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180320161121.13392-1-nyh@scylladb.com>
One of the tests created a base table with 5 primary key columns, but
put only 4 of them in the view. This is not allowed, but prior to fixing
issue #2720 this error was silently ignored. Let's fix the error instead
of relying on this silence.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180321094352.22329-1-nyh@scylladb.com>
When we introduced the CPU scheduler, we have also introduced a group
for commitlog - but never used it. There is also doubtful value in
separating reads from writes, since they are often part of the same
workload.
To accommodate that, let's rename the query group to "statement"
(query is not incorrect, just confusing), and move the write path,
currently ungrouped, inside it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are flushing the streaming memtables with streaming priority, but
applying the mutations themselves is still done with normal priorities.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
When we call run_when_memory_available, it is entirely possible that
the caller is doing that inside a scheduling_group. If we don't defer
we will execute correctly. But if we do defer, the current code will
execute - in the future - with the default scheduling group.
This patch fixes that by capturing the caller scheduling group and
making sure the function is executed later using it.
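The capture-and-restore idea can be shown without Seastar: in Scylla this is seastar::current_scheduling_group() plus seastar::with_scheduling_group(); below a thread-local int stands in for the reactor's notion of the current group, and all names are illustrative.

```cpp
#include <functional>
#include <utility>

thread_local int current_group = 0;   // stand-in for the reactor's group

// Capture the caller's scheduling group at deferral time, and restore
// it around the deferred invocation so the work is accounted to the
// right group even if it runs much later.
std::function<void()> defer_with_group(std::function<void()> fn) {
    int captured = current_group;
    return [captured, fn = std::move(fn)]() {
        int saved = current_group;
        current_group = captured;     // run under the caller's group
        fn();
        current_group = saved;
    };
}
```

Without the capture, a deferred callback would observe whatever group the reactor happens to be in when memory becomes available, typically the default one.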
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"
Since f8613a8415 we have reader-caching
on replicas for single-partition queries. This caching works best when
all pages of a query are sent to the same replicas consistently and thus
they can reuse the cached readers there.
The probability-based nature of read-repair works against this as on any
given page a read-repair will be attempted or not based on probability.
This will cause high drop rates on the replicas used for read-repair as
the cached reader will not be reusable if the replica was skipped for
one or more pages.
To fix this make the repair-decision once, on the first page of the
query and store the decision in the paging-state. On all remaining
pages of the query use this stored decision.
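The sticky-decision logic reduces to a memoized field in the paging state. The field and helper names below are illustrative, not the actual paging-state schema.

```cpp
#include <optional>

enum class repair_decision { none, global };

struct paging_state {
    // set on the first page, reused on all later pages of the query
    std::optional<repair_decision> query_read_repair;
};

repair_decision decide_read_repair(paging_state& ps, double chance,
                                   double roll) {
    if (!ps.query_read_repair) {
        // first page: make the probability-based decision once
        ps.query_read_repair = (roll < chance) ? repair_decision::global
                                               : repair_decision::none;
    }
    return *ps.query_read_repair;   // later pages: stored decision wins
}
```

Because every page now takes the same branch, the replica set stays stable and cached readers on the replicas remain reusable.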
Tests: unit-tests(release, debug), dtest(paging_advanced_tests.py)
Refs: #1865
"
* 'per_query_repair_decision/v2' of https://github.com/denesb/scylla:
Make the read-repair decision only once
storage_proxy: add coordinator_query_options and coordinator_query_result
Add query_read_repair_decision to paging-state
For several reasons that I cannot fit in the margin, when a view is
created, at most ONE regular column from the base table may be added
to the view's key.
This small new test verifies that if we try to add two columns, the
view creation fails.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180319235453.1613-1-nyh@scylladb.com>
We had a unit test, test_primary_key_is_not_null, for testing that
we correctly complain - or don't complain - on missing "IS NOT NULL"
restrictions, as expected.
However, this test missed the actual bug we had regarding IS NOT NULL
checking - see issue #2628 - because it thought a silly syntax error
which caused an exception, was the exception we expected to see :-)
So in this patch, I rewrote this test. It fixes the test's bug and
demonstrates issue #2628 (and verifies its fix), and also tests a few
more corner cases.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180319235000.1399-1-nyh@scylladb.com>
When creating a materialized view, the user must provide a "IS NOT NULL"
restriction for each of the created view's primary columns. If such a
restriction is missing, the view creation should fail. In #2628 we noticed
that sometimes it wasn't failing, but later updates to such table would fail,
which is a bug.
There is actually one special case where "IS NOT NULL" is optional:
It is optional on the base's partition key column (when there is just
one of these) because it is already assumed that the partition key in
its entirety can never be null.
Our "IS NOT NULL" test, validate_primary_key(), had two logic errors
which caused it to miss some cases of missing "IS NOT NULL":
1. Instead of checking whether a certain column is the base's only
partition-key column, and avoid testing IS NOT NULL just for that
specific column, the code tested whether the schema *has* such a
column, and if it did, the test was skipped for all columns.
2. When the code found the one new column in the view's primary key, it
was so happy to find it that it immediately returned, and forgot to
test the IS NOT NULL on that column :-)
Both errors are fixed by this patch.
See the next patch for a unit test.
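The corrected validation can be sketched in isolation: the exemption applies per column (only to the base's sole partition-key column), and no column short-circuits the check. The types here are illustrative stand-ins for the schema metadata.

```cpp
#include <string>
#include <vector>

struct column {
    std::string name;
    bool is_sole_base_pk;     // the base's only partition-key column
    bool has_is_not_null;     // user provided IS NOT NULL for it
};

std::vector<std::string> missing_is_not_null(const std::vector<column>& view_pk) {
    std::vector<std::string> missing;
    for (const auto& c : view_pk) {
        if (c.is_sole_base_pk) {
            continue;   // fix 1: exempt only this column, not all columns
        }
        if (!c.has_is_not_null) {
            missing.push_back(c.name);   // fix 2: no early return, every
        }                                // column (incl. the new one) checked
    }
    return missing;
}
```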
Fixes #2628.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180319233657.522-1-nyh@scylladb.com>
Right now we have yield points between partition processing guaranteed
by the fact that there are .get()s around the code, and those include
an yield point.
We have been discussing removing the implicit yield point from get and
pushing that to the caller. In that spirit, let's yield explicitly here
if needed.
It should be the loop's responsibility to ensure it doesn't hurt
latency, either because it is bounded by a small number of
iterations or because it yields. In other words, that loop should have
a yield point on every iteration (like the non-thread variant does).
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180319173051.8918-1-glauber@scylladb.com>
"
These patches add support for C* 2.2 file(name) format.
Namely:
* It forces Scylla to write files in la format.
* Adds storage-service feature for them.
* cf and ks are determined from directory, not from file-name (for 2.2 format).
* Adds some other fixes to make dtest happy.
* Unit tests work with la format or with both formats.
"
* 'danfiala/filename-format-2.2-v4' of https://github.com/hagrid-the-developer/scylla:
tests/sstables: Tests use la format or iterate over both formats.
tests/sstables: Helper functions support 2.2 format directory structure.
sstables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits.
storage_service: Support la sstable storage format as a feature.
sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format.
sstables: Throw a more detailed exception for unknown item in reverse_map.
sstables/compaction: Suppress NaN in a report of a throughput.
Make the read-repair decision on the first page of a paged-query and use
it for all the remaining pages. This helps querier-cache hit-rates as
reads to nodes will be sent consistently throughout the query.
"
Fixes to gossip pertaining to the shadow round.
In particular, an issue preventing a node from being marked as alive is
fixed: After the shadow round and the feature checking, we remove any
endpoints from the state - namely, those that contacted us -, before
re-adding them again. This is because those nodes that replied would
have been marked as alive in the endpoint state map (but not fully,
they'd be absent from the live endpoints list), and re-adding them marks
them as dead.
If the shadow round failed, after doing the feature checking against the
system tables, we were not clearing the state map and re-adding the
endpoints. This leaves the alive marker set, and prevents
real_mark_alive() from eventually being called.
Fixes #3301
"
* 'gossip/shadow-round-fixes/v3' of https://github.com/duarten/scylla:
gms/gossiper: Remove superfluous check
service/storage_service: Always re-add loaded endpoints
gms/gossiper: Check for shadow round completion before throwing
As yet more parameters and return-values are about to be added to all
storage_proxy::query_* methods we need a way that scales better than
changing the signatures every time. To this end we aggregate all
non-mandatory query parameters into `coordinator_query_options` and all
return values into `coordinator_query_result`.
This way new fields can be simply added to the respective structs while
the signatures of the methods themselves and their client code can
remain unchanged.
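The aggregation pattern itself is simple: optional inputs and return values live in structs, so growing either side is a struct edit rather than a signature change at every call site. Field names below are illustrative, not the actual members.

```cpp
#include <optional>

struct coordinator_query_options {
    long timeout_ms = 0;
    std::optional<int> page_size;   // new optional inputs are added here...
};

struct coordinator_query_result {
    int rows_fetched = 0;           // ...and new outputs here, leaving
};                                  // query()'s signature untouched

coordinator_query_result query(const coordinator_query_options& opts) {
    coordinator_query_result r;
    r.rows_fetched = opts.page_size.value_or(100);
    return r;
}
```

Call sites that don't care about a new field keep compiling unchanged, which is the point of the refactoring.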
This new field will store the repair-decision made on the first page of
the query. This decision will be sticky to all pages of the query.
In mixed clusters the decision might not happen on the first page and
it might even change during the query, as old coordinators will
neither store nor respect the decision.
After the shadow round and the feature checking, we remove any
endpoints from the state - namely, those that contacted us -, before
re-adding them again. This is because those nodes that replied would
have been marked as alive in the endpoint state map (but not fully,
they'd be absent from the live endpoints list), and re-adding them
marks them as dead.
If the shadow round failed, after doing the feature checking against
the system tables, we were not clearing the state map and re-adding
the endpoints. This left the alive marker set, and prevented
real_mark_alive() from eventually being called.
Fixes #3301
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patchset reduces the time required to run the tests, mostly by
running them in parallel.
I measured a reduction of 3.5X on a 1s4c4t desktop (release mode).
Tests: unit (release)
* tag 'faster-tests/v2' of https://github.com/avikivity/scylla:
tests: run tests in parallel
tests: simplify timeout handling
tests: don't require crash integrity
tests: allow sharing the machine with other tests
tests: extract seastar options to a separate variable
tests: reduce memory for tests
tests: add "--" unconditionally for boost tests
tests: start cql_test_env without binding to messaging port
storage_service: allow starting gossiper without binding to messaging port
gms: allow gossiper to start_gossiping() without binding to the port
tests: close file correctly in loading_file_test
By using the overprovisioned flag, we reduce polling and pinning, so
less CPU time is wasted and the scheduler has more options to schedule
reactor threads.
Now that we have a minimum boost version, we don't need to check whether
boost requires "--" before test-specific command line arguments. Removing
the check speeds up the test a little.
This is useful in tests, which don't communicate. Binding to a port can
fail if the system is running something else.
It would be better to prevent even more of the gossiper from starting up,
but that is more difficult.
In gossiper::handle_major_state_change() we set the endpoint_state for
a particular endpoint and replicate the changes to other cores.
This is totally unsynchronized with the execution of
gossiper::evict_from_membership(), which can happen concurrently, and
can remove the very same endpoint from the map (in all cores).
Replicating the changes to other cores in handle_major_state_change()
can interleave with replicating the changes to other cores in
evict_from_membership(), and result in an undefined final state.
Another issue happened in debug mode dtests, where a fiber executes
handle_major_state_change(), calls into the subscribers, of which
storage_service is one, and ultimately lands on
storage_service::update_peer_info(), which iterates over the
endpoint's application state with deferring points in between (to
update a system table). gossiper::evict_from_membership() was executed
concurrently by another fiber, which freed the state the first one is
iterating over.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180318123211.3366-1-duarte@scylladb.com>
Summary has a function, memory_size(), that estimates the amount of
memory the summary takes. It is my understanding that this is called
to serve information to tooling.
First, this function is inaccurate because it doesn't take into account
the tokens per each entry, just the keys. But more importantly, it has
to iterate over all keys which can be pretty expensive if the entries
list is long. We are now keeping that in a memory area, with just
pointers in the entry. So instead of iterating through the entries, we
can iterate through the memory areas, which is much cheaper.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180316120915.16809-1-glauber@scylladb.com>
When the cluster is changed (nodes added or removed), ranges of tokens
are moved between nodes. Scylla initiates a streaming process between an
old and a new owner of the range, which can take a long time. During
that streaming time, the new owner of the range is known as a "pending node"
for this range, and all updates must go to both the old owner (in case the
movement fails!) and the pending node (in case the movement succeeds).
For materialized views, because they are ordinary tables, streaming moves
all the view's data that existed before the streaming started. But we did
not send updates done to the view *during* the streaming. A dtest
demonstrates that the new node will miss some of the view updates, and will
require a repair of the view tables immediately after the cluster change
ends, which is not good. To fix that, we need to send every new update
that happens during the streaming also to the "pending node". We already
did this properly for base-table updates, but not for view updates:
Each base table replica wrote to only one paired view table replica,
and nobody wrote to the new pending node (in cases where there is one,
for the particular view token involved).
In this patch, we make sure that all view updates go also to the "pending
nodes" when there are any. We do the same thing that Cassandra does, which
is - *all* base replicas write the update to the pending node(s).
Arguably, it is inefficient that all replicas send the update to the same
node. In most cases it is enough to send it from just one base replica -
the one who is slated to be the new node's pair. I opened
https://issues.apache.org/jira/browse/CASSANDRA-14262 about this idea.
But that is an optimization. The patch as-is already fixes the bug.
Fixes #3211
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180313171853.17283-1-nyh@scylladb.com>
The functional change in this series is in the last patch
("auth: Grant all permissions to object creator").
The first patch addresses `const` correctness in `auth`. This change
allowed the new code added in the last patch to be written with the
correct `const` specifiers, and also some code to be removed.
The second-to-last patch addresses error-handling in the authorizer for
unsupported operations and is a prerequisite for the last patch (since
we now always grant permissions for new database objects).
Tests: unit (release)
* 'jhk/default_permissions/v3' of https://github.com/hakuch/scylla:
auth: Grant all permissions to object creator
auth: Unify handling for unsupported errors
auth: Fix life-time issue with parameter
auth: Fix `const` correctness
"
This is an improvement on my latest series. Instead of just
dealing with the problem of destroying the Summary that I have
identified in a previous test, I have tried to find other sources
of stalls.
Some of them are on readers and would affect early processes and
operations like nodetool refresh.
Others are on writers, which can affect any SSTable being written.
Two of those stalls (on large filter, on summary read), I saw in a
synthetic benchmark where I used very small values + nodetool compact
to generate one SSTable with many keys. They were 80ms and 20ms
respectively, and now they are totally gone.
For others, I just tried to be safe (for instance, if we know
reading/writing large vectors can be costly, just always insert
preemption points in them).
With all of these patches applied, I no longer see stalls coming from
the SSTable code in those tests (although given enough time, I am sure I
can find more).
Tests: unit (release)
Fixes: #3282, Fixes #3281, Fixes #3269
"
* 'sstables-stalls-v3-updated' of github.com:glommer/scylla:
large_bitset/bloom filter: add preemption points in loops
sstables: read filter in a thread
abstract summary entry version of the token with a token view
add a token_view
sstables: rework summary entries reading
sstables: avoid calls to resize for vectors
sstables: replace potentially large for loop with do_until
summary_entry: do not store key bytes in each summary entry
tests: change tests to make summary non-copyable
chunked_vector: do not iterate to destruct trivially destructible types
SSTables that contain many keys - a common case with small partitions in
long lived nodes - can generate filters that are quite large.
I have seen stalls over 80ms when reading a filter that was the result
of a 6h write load of very small keys after nodetool compact (filter was
in the 100s of MB)
Similar care should be taken when creating the filter, as if the
estimated number of partitions is big, the resulting large_bitset can be
quite big as well.
If we treat the i_filter.hh and large_bitset.hh interfaces as truly
generic, then maybe we should have an in_thread version along with a
common version. But the bloom filter is the only user for both and even
if that changes in the future, it is still a good idea to run something
with a massive loop in a thread.
So for simplicity, I am just asserting that we are on a thread to avoid
surprises, and inserting preemption points in the loops.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Constructing filter objects can be quite expensive. We will insert some
yield points around, and that is made a lot easier if we are calling
things from a thread.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
dht::token doesn't have a trivial destructor, so destroying an array
full of those can be quite expensive. If we use the same trick as we
used for the summary - storing the token data in a stable memory
location - we can leave the entries with a trivial destructor and destroy
the chunks themselves. Those being larger, they will be more efficient
to delete.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Ideally we would like tokens to be trivially destructible, so that we
can easily dispose of giant vectors holding them. While that is hard to
do with our current infrastructure, we can introduce a token_view, which
holds a bytes_view instead of the real data - making it
trivially destructible.
The comparators are then changed to take a token_view, and an implicit
conversion function is provided from tokens so they get compared.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Currently Ubuntu 18.04 uses the distribution-provided g++ and boost, but it's
easier to maintain the Scylla package by building with the same version of the
toolchain/libraries, so switch to them.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1521075576-12064-1-git-send-email-syuu@scylladb.com>
* 'debian-ubuntu-build-fixes-v2' of https://github.com/syuu1228/scylla:
dist/debian: build only scylla, iotune
dist/debian: switch to boost-1.65
dist/debian: switch to gcc-7.3
Like we did for generic arrays, let's move away from resize() in trying
to read summary entries and move to a reserve/push pattern.
I have tested this patch reading a summary file that is a couple of MB big.
Stalls up to 20ms were seen. After applying this patch, no such stalls
are present.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
resize is considered harmful, since it will attempt to allocate memory
and initialize each element of the vector. This can cause reactor stalls
that correlates to latency peaks.
A better idiom is reserve first - so we know we will have enough memory
and won't have to move contents - and push_back/emplace_back each
individual member.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are pushing ints here, so it shouldn't be that bad in practice.
But a potentially gigantic for loop is just asking for a stall since we won't
need_preempt() it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
If we store a bytes_view instead of bytes, that has a trivial destructor
and then we don't need to destroy each element individually. To do that,
we allocate the data in a couple of large arrays which can be disposed of
easily and point to it.
We still can't destroy trivially because of the token.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Right now the summary can be copied, but in real life there is no reason
for this to be a requirement. Tests want it, so we can destroy a summary,
load another, and compare the two. We can achieve this by allowing the first
summary to be moved, and then we can still have a reference to the second.
I am about to make a change that will make the summary not copyable as a
requirement, so we need to do this first.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We get following compile error on Debian/Ubuntu with boost-1.63:
/opt/scylladb/include/boost/intrusive/pointer_plus_bits.hpp:76:48: error: '*((void*)& __tmp +136)' is used uninitialized in this function [-Werror=uninitialized]
n = pointer(uintptr_t(p) | (uintptr_t(n) & Mask));
~~~~~~~~~~~~~~^~~~~~~
This is known issue (https://github.com/boostorg/intrusive/issues/29), fixed
on boost-1.65.
Switch to boost-1.65 to fix the issue.
Fixes #3090
* seastar bcfbe0c...a66cc34 (3):
> reactor: fix sleep mode
> cpu scheduler: don't penalize first group to run
> Simple shellscript to find out which logical CPU's shards are mapped to
If there are a lot of ranges, e.g., num_tokens=2048, 10 ranges per
stream plan will cause tons of stream plans to be created to stream data,
each carrying very little data. This causes each stream plan to have low
transfer bandwidth, so that the total time to complete the streaming
increases. It makes more sense to send a percentage of the total ranges
per stream plan than a fixed number of ranges.
Here is an example to stream a keyspace with 513 ranges in
total, 10 ranges v.s. 10% ranges:
Before:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 51
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 107 seconds
After:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 10
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 22 seconds
Message-Id: <a890b84fbac0f3c3cc4021e30dbf4cdf135b93ea.1520992228.git.asias@scylladb.com>
This reverts commit f792c78c96.
With the "Use range_streamer everywhere" (7217b7ab36) series,
all the users of streaming now stream relatively small ranges
and can retry streaming at a higher level.
Reduce the time-to-recover from 5 hours to 10 minutes per stream session.
Even if the 10-minute idle detection might cause more false positives,
it is fine, since we can retry the "small" stream session anyway. In the
long term, we should replace the whole idle detection logic so that
whenever the stream initiator goes away, the stream slave goes away.
Message-Id: <75f308baf25a520d42d884c7ef36f1aecb8a64b0.1520992219.git.asias@scylladb.com>
When a table, keyspace, or role is created, the creator now is
automatically granted all applicable permissions on the object.
This behavior is consistent with Apache Cassandra.
Fixes #3216.
Instead of some functions in `allow_all_authorizer` throwing exceptions
and others being silently pass-through, we consistently return exception
futures with `auth::unsupported_authorization_operation`. These errors
are converted to `invalid_request_exception` in the CQL error and
ignored where appropriate in the auth subsystem.
This patch came about because of an important (and obvious, in
hindsight) realization: instances of the authorizer, role manager, and
authenticator are clients for access-control state and not the state
itself. This is reflected directly in Scylla: `auth::service` is
sharded across cores and this is possible because each instance queries
and modifies the same global state.
To give more examples, the value of an instance of `std::vector<int>` is
the structure of the container and its contents. The value of `int
file_descriptor` is an identifier for state maintained elsewhere.
Having watched an excellent talk by Herb Sutter [1] and having read an
informative blog post [2], it's clear that a member function marked
`const` communicates that the observable state of the instance is not
modified.
Thus, the member functions of the role-manager, authenticator, and
authorizer clients should not be marked `const` only if the state of the
client itself is observably changed. By this principle, member functions
which do not change the state of the client, but which mutate the global
state the client is associated with (for example, by creating a role)
are marked `const`.
The `start` (and `stop`) functions of the client have the dual role of
initializing (finalizing) both the local client state and the
external state; they are not marked `const`.
[1] https://herbsutter.com/2013/01/01/video-you-dont-know-const-and-mutable/
[2] http://talesofcpp.fusionfenix.com/post-2/episode-one-to-be-or-not-to-be-const
"
Terms
-----
querier: A class encapsulating all the logic and state needed to fill a
page. This includes the reader, the compact_mutation object and all
associated state.
Preamble
--------
Currently for paged-queries we throw away all readers, compactors and
all associated state that contributed to filling the page and on the
next page we create them from scratch again. Thus on each page we throw
away a considerable amount of work, only to redo it again on the next
page. This has been one of the major contributors to latencies as from
the point of view of a replica each page is as much work as a fresh
query.
Solution
--------
The solution presented in this patch-series is to save queriers after
filling a page and reuse them on the next pages, thus doing the
considerable amount of work involved with creating them only once.
On each page the coordinator will generate a UUID that identifies this
page. This UUID is used as the key, under which the contributing
queriers will be saved in the cache. On the next page the UUID from the
previous page will be used to lookup saved queriers, and the one from
the current one to save them afterwards (if the query isn't finished).
These UUIDs (reader_recall_uuid and reader_save_uuid) are attached to
the page-state. Also attached to the page state is the list of replicas
hit on the last page. On the next page this list will be consulted to
hit the same replicas again, thus reusing the queriers saved on them.
Cached queriers will be evicted after a certain period of time to avoid
unnecessary resource consumption by abandoned reads.
Cached queriers may also be evicted when the shard faces
resource-pressure, to free up resources.
Splitting up the work
---------------------
This series only fixes the singular-mutation query path, that is queries
that either fetch a single partition, or several single partitions (IN
queries). The fix for the scanning query path will be done in a
follow-up series, however much of the infrastructure needed for the
general querier reuse is already introduced by this series.
Ref #1865
Tests: unit-tests(debug, release), dtests(paging_test, paging_additional_test)
Benchmarking summary (read-from-disk)
-------------------------------------
1) Latency
BEFORE
latency mean : 58.0
latency median : 57.4
latency 95th percentile : 68.8
latency 99th percentile : 79.9
latency 99.9th percentile : 93.6
latency max : 93.6
AFTER
latency mean : 41.3
latency median : 40.5
latency 95th percentile : 50.8
latency 99th percentile : 68.9
latency 99.9th percentile : 89.2
latency max : 89.2
2) Throughput (single partition query)
sum(scylla_cql_reads):
BEFORE: 173'567
AFTER: 427'774
+246%
3) Throughput (IN query, 2 partitions)
sum(scylla_cql_reads):
BEFORE: 85'637
AFTER: 127'431
+148%
"
* '1865/singular-mutations/v8.2' of https://github.com/denesb/scylla: (23 commits)
Add unit test for resource based cache eviction
Add unit tests for querier_cache
Add counters to monitor querier-cache efficiency
Memory based cache eviction
Add buffer_size() to flat_mutation_reader
Resource-based cache eviction
Time-based cache eviction
Save and restore queriers in mutation_query() and data_query()
Add the querier_cache_context helper
Add querier_cache
Add querier
Add are_limits_reached() compact_mutation_state
Add start_new_page() to compact_mutation_state
Save last key of the page and method to query it
Make compact_mutation reusable
Add the CompactedFragmentsConsumer
Use the last_replicas stored in the page_state
query_singular(): return the used replicas
Consider preferred replicas when choosing endpoints for query_singular()
Add preferred and last replicas to the signature of query()
...
Specifically for the reader-permit based eviction. This test lives in a
separate executable as it uses with_cql_test_env() and thus needs a
main() of its own.
"
This patchset is a part of a bigger effort for bringing our
microbenchmarking tests from the source tree to be used for regression
testing purposes with CI.
Now, it is possible to export results of tests run into JSON format that
can be stored in ElasticSearch and compared among runs to detect
performance degradation should it happen.
Example of JSON output (formatted for readability):
{
"results" :
{
"parameters" :
{
"read" : "64",
"read,skip,test_run_count" : "64,256,1",
"skip" : "256",
"test_run_count" : 1
},
"stats" :
{
"(KiB)" : 126960,
"aio" : 993,
"blocked" : 208,
"c blk" : 1,
"c hit" : 0,
"c miss" : 1,
"cpu" : 99.779365539550781,
"dropped" : 0,
"frag/s" : 311939.61559016741,
"frags" : 200000,
"idx blk" : 0,
"idx hit" : 0,
"idx miss" : 0,
"time (s)" : 0.641149729
}
},
"test_group_properties" :
{
"message" : "Testing scanning large partition with skips.\nReads whole range interleaving reads with skips according to read-skip pattern",
"name" : "large-partition-skips",
"needs_cache" : false,
"partition_type" : "large"
},
"versions" :
{
"scylla-server" :
{
"commit_id" : "4acfa17f4",
"date" : "20180306",
"run_date_time" : "2018-16-06 12:16:41",
"version" : "666.development"
}
}
}
"
* 'issues/2947/v6' of https://github.com/argenet/scylla:
Add support for JSON output format for perf_fast_forward results.
Wrap output for customization. Move all output handling to a single managing class.
Add the following counters:
(1) querier_cache_lookups
(2) querier_cache_misses
(3) querier_cache_drops
(4) querier_cache_time_based_evictions
(5) querier_cache_resource_based_evictions
(6) querier_cache_memory_based_evictions
(7) querier_cache_population
(1) counts the total number of querier cache lookups. Not all
page-fetches will result in a querier lookup. For example the first page
of a query will not do a lookup as there was no previous page to reuse
the querier from. The second, and all subsequent pages however should
attempt to reuse the querier from the previous page.
(2) counts the subset of (1) where the read missed the querier
cache (failed to find a matching saved querier).
(3) counts the subset of (1) where the querier was recalled and dropped
immediately. This can happen for example if the querier was at the wrong
position.
(4) counts the cached queriers that were evicted due to their TTL
expiring.
(5) counts the cached queriers that were evicted due to reader-resource
(those limited by reader-concurrency limits) shortage.
(6) counts the cached queriers that were evicted due to reaching the
cache's memory limits (currently set to 4% of the shards' memory).
(7) is the current number of entries in the cache
Note:
* The count of cache hits can be derived from these counters as
(1) - (2).
* cache_drop (3) also implies a cache hit (see above). This means that
the number of actually reused queriers is:
(1) - (2) - (3)
To bound the memory consumption of the querier-cache the total memory
consumption of the cached queriers is limited to 4% of the shard's total
memory.
When inserting a new querier it is first checked whether its insertion
would cause the limit to be crossed. If this is the case, existing
entries are evicted until the memory consumption is sufficiently reduced
so that after inserting the querier it stays below the limit.
Cached queriers are evicted in LRU order as the oldest queriers are the
most likely to be evicted based on their TTL anyway.
To calculate the memory consumption of the cached queriers
flat_mutation_reader::buffer_size() is used. While this is not very
precise, as it doesn't include object sizes and member containers, it
gives a good picture of the memory consumption of the queriers.
Memory based cache eviction overlaps with resource-based cache eviction
but only to some degree as that only accounts the memory consumption of
sstable readers.
buffer_size() exposes the collective size of the external memory
consumed by the mutation fragments in the flat reader's buffer. This
provides a basis to build basic memory accounting on. Although this is
not the entire memory consumption of any given reader it is the most
volatile component and usually by far the largest one too.
Readers serving user-reads need to obtain a permit to start reading.
There exists a restriction on how many active readers can be admitted,
based on their count and their memory consumption.
Since the saved readers of cached queriers are technically active (they
hold a permit) they can block new readers from obtaining a permit.
New readers have a higher priority because a cached reader might be
abandoned, or used later at best, so in the face of memory pressure we
evict cached readers to free up permits for new readers.
Cached queriers are evicted in LRU order as the oldest queriers are the
most likely to be evicted based on their TTL anyway.
Cached queriers should not sit in the cache indefinitely otherwise
abandoned reads would cause excess and unnecessary resource usage. Attach
an expiry timer to each cache-entry which evicts it after the TTL
passes.
Use the querier_cache (represented by the passed-in
querier_cache_context) object to lookup saved queriers at the start of
the page and save them at the end of it if it is likely that there will
be more page requests.
querier_cache_context is supposed to make propagating the cache and the
key down the layers easier. It comes bundled with some of the required
parameters (the lookup and save state) and also hides all of the
boiler-plate of dealing with the cache (checking whether the key is
non-empty, etc.). It also makes it possible to not use the cache and
hide this from the lower layers.
This is the cache where suspended queriers are going to be saved between
pages. This is not a general purpose cache. It caters to the specific
needs of the querier recall mechanism. More specifically:
(1) Cache entries are single-use: they are inserted once and the first
lookup removes them. Multiple items may be stored under a single key.
Identifying the correct one happens based on additional information like
the query range. Lookup knows to drop queriers when they cannot be used
to serve the next page.
(2) Cache entries are evicted after a certain time to avoid the
depletion of resources due to abandoned reads.
(3) Cache entries are evicted when facing reader-permit shortage, until
either enough permits are freed up or all entries are evicted.
(4) A memory limiter is set up which keeps the total memory consumption
of the cache under a limit (4% of memory) by evicting the oldest entries
when inserting a new one would cause the total memory consumption to go
above the limit.
(5) It updates the relevant counters of the db_stats.
This patch only implements (1), the other features will be implemented
in their own patches.
The querier encapsulates all objects needed to serve queries, except
result builders. It is designed to be suspendable, savable and
resumable. It contains all logic needed to suspend, resume and determine
whether the querier can be resumed or not.
It is the foundation upon which the "reader-reuse" mechanism is built.
are_limits_reached() allows querying whether the compactor reached
the page's limits. This is needed to determine whether there will be
more pages and thus whether the compact_mutation_state has to be kept
around.
start_new_page() resets the limits to the current page's ones and
sets the _empty_partition flag so that the partition header (if the last
page finished inside a partition) will be reemitted.
Make a copy of the current decorated-key in consume_end_of_stream() so
that it persists while the compaction state is suspended.
Also add current_partition() to allow client code to query the partition
the compaction is positioned in. This is needed to determine whether
the start position of the next page matches that of the
compact_mutation_state.
Currently compact_mutation is used as a use-once-then-throw-away object.
After it satisfies its consumer it's destroyed together with the
consumer. This conflicts with the effort to save and reuse readers and
associated infrastructure between pages of a query.
To resolve this conflict compact_mutation is split into two classes:
(1) compact_mutation_state
(2) compact_mutation
compact_mutation_state encapsulates all the compaction logic and state,
while compact_mutation continues to provide the same API using
compact_mutation_state behind the scenes.
compact_mutation_state doesn't store the consumer, instead its
consume_* methods are templated on the consumer and take it as an
argument. This allows compact_mutation_state to be independent of the
consumer's type.
Additionally compact_mutation can now be constructed from a shared
pointer to compact_mutation_state. This allows client code to
pre-construct a compaction state and retain it after the
compact_mutation object is destroyed.
These changes allow the state of a compaction to be saved and restored
later while code that is only interested in storing the saved state
can stay independent of the consumer's type.
This patch only contains the splitting of compact_mutation into
compact_mutation and compact_mutation_state. The next patches will add
the missing functionality that is needed to make compact_mutation_state
truly reusable across pages.
Pass the last_replicas from the page_state as the preferred_replicas
for query() and save the returned last_replicas as the last_replicas
field of the next page_state. The circle is now complete. The first page
of any query will pass an empty list as the preferred replicas (having
no previous paging_state) so the replicas will be selected according to
the load-balancing strategy. Any subsequent page will use the last
replicas from the last page as the preferred ones for the current one.
Thus if all goes well all pages of a query will hit the same replicas.
This patch implements the last_replicas returning part of the query()
signature changes for singular queries. It allows for client code to
save the last returned replicas and pass it to query() on the next page
as the preferred-replicas parameter, thus facilitating the read requests
for the next page hitting the same replicas.
Propagate the preferred_replicas to db::filter_for_query() and consider
them when selecting the endpoints. The algorithm for selecting the
endpoints is as follows:
* Compute the intersection of the endpoint candidates and the
preferred endpoints.
* If this yields a set of endpoints that already satisfies the CL
requirements use this set.
* Otherwise select the remaining endpoints according to the
load-balancing strategy, just like before.
preferred_replicas are added to the parameters and last_replicas are
added to the return type. The preferred replicas will be used as a hint
for the selection of the replicas to send the read requests to. The last
replicas (returned) are the replicas actually selected for the read.
This will allow queries to consistently hit the same replicas for each
page thus reusing readers created on these replicas.
For convenience a query() overload is provided that doesn't take or
return the preferred and last replicas.
This patch only adds the parameters and propagates them down to
query_singular() and query_partition_key_range(). The code to actually
use these preferred-replicas will be added in later patches.
The reason for separating this is to reduce noise and improve the
reviewability of those later functional changes.
Helps paged queries consistently hit the same replicas for each
subsequent page. Replicas that already served a page will keep the
readers used for filling it around in a cache. Subsequent page requests
hitting the same replicas can reuse these readers to fill the pages
avoiding the work of creating these readers from scratch on every page.
In a mixed cluster older coordinators will ignore this value.
The value of last_replicas may change between pages as nodes may become
available/unavailable or the coordinator may decide to send the read
requests to different replicas at its discretion.
Replicas are identified by an opaque uuid which should only make sense
to the storage-proxy.
This patch adds the parameter to read_command which is needed for
caching of readers during multiple pages of a paged query, which
we will introduce in the next patches.
The query_uuid is a UUID of a previously saved reader, which
the replica is now asked to recall and resume (if this saved reader is
no longer in the cache, it is fine, a new reader will be started).
Additionally a helper flag is_first_page is added so that the replica
can avoid doing any cache lookups (and incrementing miss counters) for
the first page.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds to the "paging_state", the opaque cookie that clients are
supposed to provide when asking for the next page on a paged query, a
unique id field. This new field will be used to tell that a new request
for a page really continues the previous page, and doesn't just by chance
start at the same position the previous page stopped.
We need to support setups with mixed versions - a client may get a paging
state from a coordinator running a new version of Scylla and send it to
a different coordinator running an old version - or vice versa. So the new
uuid field is set up to have a default uuid of UUID() (a recognizable
invalid uuid 0), so new versions receiving no uuid from an old version will
set this invalid uuid, and old versions receiving a uuid from a new version
will simply ignore it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Refactor some includes to reduce dependencies on messaging_service.hh,
which can change quite a lot as it includes many unrelated items itself.
Tests: build
* tag 'includes/messaging_service.hh/v1' of https://github.com/avikivity/scylla:
tests: reduce dependencies in test_services.hh
migration_manager: remove dependency on messaging_service.hh in header
messaging_service: move msg_addr into its own header file
Convert storage_service_for_test to a pimpl implementation to
reduce dependencies. Tests that depended on those includes were
fixed to include their dependencies directly.
Try to avoid recompilations by reducing inclusions of system_keyspace.hh
in other header files.
Tests: unit (release)
* tag 'system_keyspace.hh/v1' of https://github.com/avikivity/scylla:
storage_service: remove system_keyspace.hh include
locator: de-inline reconnectable_snitch_helper
locator: de-inline production_snitch_base
cql3: remove #include of system_keyspace.hh
De-inlining allows us to remove some dependencies, and those functions
are too complex to inline anyway.
A few always-throwing functions get the [[noreturn]] attribute to
avoid damaging code generation.
* seastar 08e02dc...42159d4 (9):
> memory: avoid unconditional calls to __tls_init
> io_tester: bring back information about think time
> Merge "Avoid continuations in I/O Scheduler path" from Glauber
> Merge "Extend io_tester to support CPU loads" from Glauber
> tutorial: fix undue complication in semaphore get_units() example
> Tutorial: in HTML target, inline code snippets shouldn't be gray
> tutorial: add build target for split HTML file
> tutorial: mention seastar::thread as option for object lifetime management
> tutorial: document new seastar::future::wait()
* dist/ami/files/scylla-ami 3aa87a7...5170011 (3):
> scylla_install_ami: install enhanced networking NIC drivers
> scylla_install_ami: set kernel-ml as default kernel
> scylla_install_ami: fix NIC down with enhanced networking on new base AMI
already called in update_info_for_opened_data() which is called by
open_data(); no need for clustering components to be set early
either.
Found it when auditing the code.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180310225213.26017-1-raphaelsc@scylladb.com>
An unsigned type was incorrectly used for keeping track of min and max
timestamp, so a negative number would be treated as a very high
number that would *incorrectly* end up as max timestamp in sstable
metadata.
Fixes #3000.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180308162217.18963-1-raphaelsc@scylladb.com>
"
Refs #2692. Fixes #3246
The current restricting algorithm [1] restricts the active-reader queue
based on the memory consumption of the existing active readers. When
this memory consumption is above the limit new readers are not admitted.
The inactive reader queue on the other hand has a fixed length.
This caused performance regressions on two workloads:
* read-only: since the inactive-reader queue length is severely limited
(compared to the previous situation) reads will time out at loads
comfortably handled before.
* mixed: since the memory consumption happens only at admission time
(already created active readers are not limited) memory consumption
grew significantly, causing problems when compactions kicked in.
The solution is to reintroduce the old limit of 100 active concurrent
user-reads while still keeping the memory-based limit as well. For
workloads that don't consume a lot of memory or on large boxes with lots
of memory the count-based limit will be reached first, reverting to the
old, well-known behaviour. For memory-hungry workloads or on small boxes
with little memory the memory-based limit will kick in sooner, avoiding
memory overconsumption.
[1] introduced by bdbbfe9390
"
* 'restricted-reader-dual-limit/v3' of https://github.com/denesb/scylla:
Modify unit tests so that they test the dual-limits
Use the reader_concurrency_semaphore to limit reader concurrency
Add reader_concurrency_semaphore
Add reader_resource_tracker param to mutation_source
mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh
This semaphore implements the new dual, count and memory based active
reader limiting. As purely memory-based limiting proved to cause
problems on big boxes admitting a large number of readers (more than any
disk could handle) the previous count-based limit is reintroduced in
addition to the existing memory-based limit.
When creating new readers, the count-based limit is checked first. If
that clears, the memory limit is checked before admitting the reader.
reader_concurrency_semaphore wraps the two semaphores that implement
these limits and enforces the correct order of limit checking.
This class also completely replaces the restricted_reader_config struct,
it encapsulates all the data and related functionality of the latter, making
client code simpler.
Soon, reader_resource_tracker will only be constructible after the
reader has been admitted. This means that the resource tracker cannot be
preconstructed and just captured by the lambda stored in the mutation
source, and instead has to be passed in along with the other parameters.
In preparation to reader_concurrency_semaphore being added to the file.
The reader_resource_tracker is really only a helper class for
reader_concurrency_semaphore so the latter is better suited to provide
the name of the file.
"
This series switches granularity of memory-pressure-induced eviction in cache
from a partition to a row.
Since 9b21a9b cache can store partial partitions with row granularity but they
were still evicted as a unit. This is problematic for the following reasons:
- more is evicted than necessary, which decreases cache efficiency. In the
worst case, whole cache gets evicted at once
- evicting large amounts of memory (large partitions) at once may impact
latency badly
Fixes #2576.
See the documentation added in patch titled "doc: Document row cache eviction"
for details on how eviction works.
Open issues to be fixed incrementally:
- range tombstones are not evictable
- cache update still has partition granularity, which
causes bad latency on memtable flush with large partitions
"
* tag 'tgrabiec/row-level-eviction-v3' of github.com:scylladb/seastar-dev: (43 commits)
doc: Document row cache eviction
tests: cache: Add tests for row-level eviction
tests: cache: Check that data is evictable after schema change
tests: cache: Move definitions to the top
tests: perf_cache_eviction: Switch eviction counter to row granularity
tests: row_cache_alloc_stress: Avoid quadratic behavior
cache: Introduce unlink_from_lru()
cache: Add row-level stats about cache update from memtable
mvcc: Propagate information if insertion happened from ensure_entry_if_complete()
cache: Track number of rows and row invalidations
cache: Evict with row granularity
cache: Track static row insertions separately from regular rows
tests: mvcc: Use apply_to_incomplete() to create versions
tests: mvcc: Fix test_apply_to_incomplete()
tests: cache: Do not depend on particular granularity of eviction
tests: cache: Make sure readers touch rows in test_eviction()
mvcc: Store complete rows in each version in evictable entries
mvcc: Introduce partition_snapshot_row_cursor::ensure_entry_in_latest()
tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction
cache: Ensure all evictable partition_versions have a dummy after all rows
...
Partitions corresponding to keys have 40k rows. With row-level
eviction touching them inside the loop became a serious performance
issue, because touch() now needs to walk over all rows.
Will be used in row_cache_alloc_stress to unlink partitions which we
don't want to get evicted, instead of repeatedly calling touch() on
them after each subsequent population. After switching to row-level
LRU, doing so greatly increases run time of the test due to quadratic
behavior.
The address and keyspace should be swapped.
Before:
range_streamer - Bootstrap with ks3 for keyspace=127.0.0.1 succeeded,
took 56 seconds
After:
range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded,
took 56 seconds
Message-Id: <5c49646f1fbe45e3a1e7545b8470e04b166922c4.1520416042.git.asias@scylladb.com>
The JSON output is arranged in a way that makes it easier to upload
results to ElasticSearch.
All the test results are placed under the perf_forward_data_output/ directory
For test groups, we create separate subdirectories where we save results
from runs of tests in those groups.
For each test run, we store results in a separate file named:
<dash-separated-param-list>.<run-number>.json
where
<dash-separated-param-list> is a dash-separated list of parameters of the current
test, e.g., 1-64 (for read-skip pattern).
<run-number> is the number of run of this test with the specified
parameters. This is needed as the same list of parameters can be
used more than once (for instance, when cache is enabled).
Those numbers start with 1, i.e., 1, 2, 3.
So, the path to a resulting JSON file may look like:
perf_fast_forward_output/large-partition-skips/64-4096.1.json
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Instead of passing the output parameters to std::cout straight away, use
helper wrappers. This will allow us to add more formats for gathered
test results.
Introduce a helper writer class hierarchy that can be extended to
support different output formats (JSON, XML, etc).
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Instead of evicting whole partitions, evicts whole rows.
As part of this, invalidation of partition entries was changed to not
evict from snapshots right away, but unlink them and let them be
evicted by the reclaimer.
For row-level eviction we need to ensure that each version has
complete rows so that eviction from older versions doesn't affect the
value of the row in newer snapshots.
This is achieved by copying the row from an older version before
applying the increment in the new version.
Only affects evictable entries, memtables are not affected.
Every evictable version will have a dummy entry at the end so that it can be
tracked in the LRU.
It is also needed to allow old versions to stay around (with
tombstones and static rows) after all rows are evicted. Such versions
must be fully discontinuous, and we need some entry to mark that.
We will need to propagate a cache_tracker reference to evict(). Instead
of evicting from destructor, do so before cache_entry gets unlinked
from the tree. Entries which are not linked, don't need to be
explicitly evicted.
This change is a preparation for introducing row-level eviction, such that entries
can be evicted from older versions without having to touch other versions.
Currently continuity flags on entries are interpreted relative to the
combined view merged from all entries. For example:
v2: <key=2, cont=1>
v1: <key=1, cont=1>
In v2, the flag on entry key=2 marks the range (1, 2) as
continuous. This is problematic because if the old version is evicted, continuity
will change in an incorrect way:
v2: <key=2, cont=1>
Here, the range (-inf, 1) would be marked as continuous, which is not true.
To solve this problem, we change the rules for continuity
interpretation in MVCC. Each version will have its own continuity,
fully specified in that version, independent of continuity of other
versions. Continuity of the snapshot will be a union of continuous
ranges in each version.
It is assumed that continuous intervals in different versions are non-
overlapping, except for points corresponding to complete rows, in
which case a later version may overlap with an older version
(overwrite). We make use of this assumption, in
mutation_partition::apply_monotonically(), to simplify the calculation
of the union of intervals on merging.
MVCC population of incomplete entries already almost maintains the
non-overlapping invariant, because population intervals correspond to
intervals which are incomplete in the old snapshot. The only change
needed is to ensure that both population bounds will have entries in
the latest version. Population from memtables doesn't mark any
intervals as continuous, so also conforms. The only change needed
there is to not inherit continuity flags from the old snapshot,
effectively making the new version internally discontinuous except for
row points.
The example from the beginning will become:
v2: <key=1, cont=0> <key=2, cont=1>
v1: <key=1, cont=1>
When marking a range as continuous with some rows present only in
older versions, we need to insert entries in the latest version, so
that we can mark the range as continuous. The easiest solution is to
copy the entry from the old version. Another option would be to add
support for incomplete rows and insert such instead. This way we would
avoid duplicating row contents. This optimization is deferred.
Simply copying mutations which are not fully continuous may violate
MVCC invariants, like the one about non-overlapping continuity which
will be added later. Use apply_to_incomplete() instead.
This unfortunately reduces the strength of the test, since the continuity
of the entry is now completely determined by the first version. We should
use populate() instead, but it doesn't exist yet. It could be extracted
from cache_streamed_mutation, but that's not an easy change.
This is alleviated by adding a similar test to row_cache_test_g, in a
later patch.
In commit 8af0b501a2 (gossip: wait for stabilized gossip on bootstrap)
the force_after variable was changed from int32_t to stdx::optional<int32_t>:
- if (force_after > 0 && total_polls > force_after) {
+ if (force_after && total_polls > *force_after) {
The check force_after > 0 was dropped, which is wrong because force_after
is set to -1 by default. So the if branch will always be executed after
1 poll.
We always see:
[shard 0] gossip - Gossip not settled but startup forced by
skip_wait_for_gossip_to_settle. Gossp total polls: 1
even if skip_wait_for_gossip_to_settle is not set at all.
Fixes #3257
Message-Id: <845d219cea6101a7c507c13879c850a5c882e510.1520297548.git.asias@scylladb.com>
In Crypto++ v6, the `byte` typedef has been moved from the global
namespace to the CryptoPP:: namespace.
To make Scylla code compile with both old and new versions, bring the
namespace in so that the code works regardless of the scope of `byte`
definition.
Fixes #3252
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <60e7bfe868b778b1c9bbe15d7247db64b61bd406.1520272198.git.vladimir@scylladb.com>
"-I$full_builddir/{mode}/xxhash" doesn't resolve to a valid path, because
full_builddir is a Python variable, not a Ninja variable. In build.ninja
it appears as "-I/release/xxhash".
Since the build nevertheless works, we can remove the broken flag instead
of fixing it.
Message-Id: <20180305135919.13634-1-avi@scylladb.com>
This patch fixes an issue with test_propagation(), where the test
assumed that after the future returned from wait_for_pending(0)
resolved, the continuations set for the post operation had already
run, which is not true.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180305131908.7667-1-duarte@scylladb.com>
reader_wrapper's _timeout defaults to now(), which means timing out
immediately rather than having no timeout.
Fix by switching to a time_point defaulting to no_timeout, and
providing a compatible constructor (with a duration parameter) for
callers that do want a duration-based timeout.
Tests: mutation_reader_test (debug, release)
Message-Id: <20180305111739.31972-1-avi@scylladb.com>
* seastar f841d2d...08e02dc (3):
> future: make future::wait() a supported function
> scripts: perftune.py: don't allow cpu-mask that does't include any IRQ CPU
> Tutorial: show nice dashes in HTML
The message in question is printed with printf(), which is bad by itself.
And most importantly this test uses a single .property file so this message
doesn't add any interesting information to begin with. Therefore it makes
more sense to drop it than to fix it.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1519661059-13325-1-git-send-email-vladz@scylladb.com>
This patch series ties up some loose ends around CQL syntax for access-control statements.
The USER-based syntax statements are all backwards compatible. ROLE-specific statements have a new syntax which is described in "cql: Make role syntax more consistent". Other statements (like GRANT) have been updated to accept role names (instead of the more restrictive `username` rule).
Fixes #3217.
Tests: unit (debug)
* 'jhk/roles_syntax/v2' of https://github.com/hakuch/scylla:
tests: Rename test for consistency
cql: Eliminate uses of legacy `username` rule
cql: Elaborate error for quoted user names
cql: Allow role names to be string literals
cql: Make role syntax more consistent
tests: Add CQL syntax tests for access-control
Since quoted names are allowed for role names, we add a more descriptive
error message when a quoted name is (erroneously) used for a user name.
This behavior is consistent with Apache Cassandra.
This patch changes the syntax for CQL statements related to roles to
favor a form like
CREATE ROLE sam WITH PASSWORD = 'shire' AND LOGIN = false;
instead of
CREATE ROLE sam WITH PASSWORD 'shire' NOLOGIN;
This new syntax has the benefit of not imposing any ordering constraints
on the modifiers for roles and being consistent with other parts of the
CQL grammar. It is also consistent with syntax in Apache Cassandra.
The old USER-based statements (CREATE USER and ALTER USER) still have
the old forms for backwards compatibility.
A previous change modified the USER-related statements to allow for the
OPTIONS option. However, this was a mistake; only the PASSWORD option
should have been allowed. This patch also corrects this mistake.
These are quick-running tests for verifying the accepted forms of CQL
statements (and fragments) related to access-control: users, roles, and
permissions.
Establishing the allowed forms of statements is helpful for reference,
but also makes syntax changes (like those expected in later patches)
clearer and safer.
Since we split the scylla-housekeeping service into two different services for systemd, we no longer share the same service name between systemd and upstart.
So handle it independently for each distribution, try to install
/etc/init/scylla-housekeeping.conf on Ubuntu 14.04.
Fixes #3239
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1519852659-10688-1-git-send-email-syuu@scylladb.com>
Support POWER architecture on Scylla.
Since DPDK is not fully supported on POWER (no PMD supported on it yet),
it is disabled for now.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180228203048.21593-1-syuu@scylladb.com>
swap_tree() doesn't change the color of the header, and because the header
was not initialized, it is undefined (it can be either red or black). One
problem this causes is that algo::is_header() expects the header to be
always red. It is used by unlink(), which would infinite-loop for trees
that have a black header.
The fix is to initialize the header.
Fixes #3242.
Message-Id: <1519815091-13111-1-git-send-email-tgrabiec@scylladb.com>
Debian and Ubuntu list files come in two variations.
The housekeeping should support both.
This patch changes the regexp that matches the OS in the repository file.
After the introduction of the second list variation, the OS name can be in the middle of the path, not only at the end.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180227092543.19538-1-amnon@scylladb.com>
Currently serializing and deserializing singular ranges is asymmetric.
When serializing a range we use the start() and end() functions to
obtain _start and _end respectively. However for singular ranges end()
will return _start and therefore the serialized range will have two
engaged optionals for bounds whereas the in-memory version will have only
one. The immediate consequence of this is that after serializing and
deserializing a range it will not compare equal to the original
serialized range. Needless to say this is *very* surprising behaviour.
To remedy the issue we fix the wrapping_range's constructor to not set
_end to the passed in value when the range is singular.
This way the on-wire format can stay compatible with how the range is
perceived by client code (when is_singular(): start() == end()), but
constructing the range from the wire format will yield a range that will
always compare equal to the original one.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <e5f20b7b45f65ca1f7b347dcccd2ac462869e7ff.1519652739.git.bdenes@scylladb.com>
"
Adds extension points to schema/sstables to enable hooking in
stuff, like, say, something that modifies how sstable disk io
works. (Cough, cough, *encryption*)
Extensions are processed as property keywords in CQL. To add
an extension, a "module" must register it into the extensions
object on boot time. To avoid globals (and yet don't),
extensions are reachable from config (and thus from db).
Table/view tables already contain an extension element, so
we utilize this to persist config.
schema_tables tables/views from mutations now require a "context"
object (currently only extensions, but abstracted for easier
further changes).
Because of how schemas currently operate, there is a super
lame workaround to allow "schema_registry" access to config
and, by extension, extensions. DB, upon instantiation, calls
a thread-local global "init" in schema_registry and registers
the config. It, in turn, can then call table_from_mutations
as required.
Includes the (modified) patch to encapsulate compression
into objects, mainly because it is nice to encapsulate, and
isolate a little.
"
* 'calle/extensions-v5' of github.com:scylladb/seastar-dev:
extensions: Small unit test
sstables: Process extensions on file open
sstables::types: Add optional extensions attribute to scylla metadata
sstables::disk_types: Add hash and comparator(sstring) to disk_string
schema_tables: Load/save extensions table
cql: Add schema extensions processing to properties
schema_tables: Require context object in schema load path
schema_tables: Add opaque context object
config_file_impl: Remove ostream operators
main/init: Formalize configurables + add extensions to init call
db::config: Add extensions as a config sub-object
db::extensions: Configuration object to store various extensions
cql3::statements::property_definitions: Use std::variant instead of any
sstables: Add extension type for wrapping file io
schema: Add opaque type to represent extensions
sstables::compress/compress: Make compression a virtual object
LSA, being an allocator built on top of the standard one, may hide some
erroneous usage from AddressSanitizer. Moreover, it has its own classes
of bugs that could be caused by incorrect user behaviour (e.g. migrator
returning wrong object size).
This patch adds a basic sanitizer for the LSA that is active in debug
mode and verifies that the allocator is used correctly; if a problem is
found, it prints information about the affected object that it has collected
earlier. That includes the address and size of the object as well as the
backtrace of the allocation site. At the moment the following errors are
being checked for:
* leaks, objects not freed at region destructor
* attempts to free objects at invalid address
* mismatch between object size at allocation and free
* mismatch between object size at allocation and as reported by the
migrator
* internal LSA error: attempt to allocate object at already used
address
* internal LSA error: attempt to merge regions containing allocated
objects at conflicting addresses
Message-Id: <20180226122314.32049-1-pdziepak@scylladb.com>
Make sure idx will not be equal to _control_points.size() (and thus
overflow the vector) when looking for the first control-point with
a backlog not smaller than the current one, by stopping when it's equal
to _control_points.size() - 1.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <47841592792573d820650d570fa1ab7e58bdac2c.1518700405.git.bdenes@scylladb.com>
* seastar 383ccd6...f841d2d (8):
> Merge "Randomize task queue in debug mode" from Duarte
> tutorial: document seastar::thread
> tutorial: add missing seastar namespace
> tutorial: note about asynchronous functions throwing exceptions
> thread: stop backtraces on aarch64 from underflowing the stack
> Revert "core:🧵 ARM64 version of annotating the frame"
> core:🧵 ARM64 version of annotating the frame
> core/future-util: Release exception in repeater
Release mode flags are properly propagated through seastar --optflags
flag, but debug mode flags aren't. This is problematic since they are
used to enable additional debugging features.
After this patch we will end up with some duplicate flags, but that's
not really a problem.
Message-Id: <20180223173617.15199-1-pdziepak@scylladb.com>
Before 312bd9ce25, boot had to call all shards for each sstable
such that they would agree/disagree on their deletion, an atomic
deletion manager requirement.
After its removal, we can afford to call only the shards that own
a given sstable.
This reduces the total work from (SSTABLES) * (SHARD_COUNT) operations
to usually (SSTABLES). It may be the same as before after resharding,
but resharding is a one-off operation.
Boot time should be significantly reduced for nodes with a high smp
count and column family using leveled strategy (which can end up with
thousands of sstables).
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180220032554.17776-1-raphaelsc@scylladb.com>
This patch takes a modified version of the Ubuntu 14.04 housekeeping
service script and uses it in Docker to validate the current version.
To disable the version validation, pass the --disable-version-check flag
when running the container.
Message-Id: <20180220161231.1630-1-amnon@scylladb.com>
With the changes introduced in #2981 and #3189, the lifetime management
of the objects used by index_reader became more complicated.
This patchset addresses the immediate problems caused by lack of proper
handling.
The more holistic approach to this will take more time and is to be
implemented under #3220. The current fix, however, should be good
enough as a stop-gap solution.
* 'issues/3213/v3' of https://github.com/argenet/scylla:
Close promoted index streams when closing index_readers.
Support proper closing of prepended_input_stream.
Promoted index input streams must be explicitly closed when closing the
index_reader in order to ensure all the pending read-aheads are
completed.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"This series adds the GoogleCloudSnitch.
Fixes #1619"
* 'google-cloud-snitch-v4' of https://github.com/vladzcloudius/scylla:
config: uncomment/add the supported snitches description
tests: added gce_snitch_test
locator::gce_snitch: implementation of the GoogleCloudSnitch
locator::snitch_base: properly log the failure during the snitch startup
The test inserts some values with a TTL of 1 second and then
reads them back expecting them not to be expired yet. That may not
always be the case if the machine is slow and we are running in the
debug mode. Increasing the TTLs by x100 should help avoid these
false positives.
Message-Id: <20180219133816.17452-1-pdziepak@scylladb.com>
"This series adds an API to return the active repairs by their IDs.
After this series a call to:
curl -X GET --header "Accept: application/json" "http://localhost:10000/storage_service/active_repair/"
Will return an array with the ids of the active repairs.
Fixes #3193"
* 'amnon/get_active_repairs_v3' of github.com:scylladb/seastar-dev:
API: Add get active repair api
repair: Add a get_active_repairs function to return the active repair
stream_throughput_outbound_megabits_per_sec is not supported and is
found in the unsupported part of scylla.yaml.
This patch removes it from the supported part of the file.
Fixes #2876
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180219111421.30687-1-amnon@scylladb.com>
Operations on an append_challenged_posix_file_impl schedule asynchronous
operations when they are executed, which capture the file object. To
synchronize with them and prevent use-after-free, we need to call
close() before destroying the file.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180217170556.27330-1-duarte@scylladb.com>
This test relied on task execution order to work correctly. Namely, it
relied on parent regions being reclaimed before child regions
(reclaiming is an asynchronous process started by a call to
start_reclaiming()). This order is necessary because child regions
don't know about parent regions when calculating the biggest region
that should be reclaimed.
We fix this by forcing the reclaim order.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180217121655.26057-1-duarte@scylladb.com>
Operations on a segment's underlying append_challenged_posix_file_impl,
such as truncate(), schedule asynchronous operations when they are
executed, which capture the file object. To synchronize with them and
prevent use-after-free, we need to call close() and only delete the
segment and file when the returned future resolves.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180216235754.24257-1-duarte@scylladb.com>
When shutting down the commitlog we try to block all new requests by
acquiring all available resources. We were, however, letting go of the
semaphore permits too early, before closing the gate and shutting down
the active segments.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180216234826.24111-1-duarte@scylladb.com>
This series takes Scylla most of the way to supporting roles, and
eliminates old user-based code. All the old user-based CQL statements
and functionality should exist as they did before, except now they are
backed internally by roles.
While all the functionality for supporting roles should be present,
role-specific features like granting a role to another role still warn
as "unimplemented". This will continue until the next series addresses
the final touches. These remaining items are:
- A slightly revised CQL syntax consistent with Apache Cassandra's
revised role syntax.
- A user is automatically granted permissions on resources they create.
Users running a previous version of Scylla should be able to seamlessly
upgrade to a version of Scylla with this series merged. When a newly
upgraded node starts, it detects the presence of old metadata and copies
it to the new metadata tables if no nondefault new metadata yet exists.
A new gossiper feature flag, ROLES, also ensures that access-control
data is not modified while a cluster is in a partially-upgraded state.
If, when the cluster is in a partially upgraded state, a client connects
to an un-upgraded node, then the change will likely not be propagated to
the new metadata table. We will document that changes to access-control
are not supported while upgrading in order to account for both cases
(a client connecting to an upgraded and a non-upgraded node).
All unit tests pass (except those which also fail on `master`).
I've run auth-related dtests and they all pass, except for tests which
depend on the old security model and which are therefore invalid.
Upstream dtests have been updated to account for this new security model,
and I will open an appropriate pull request to similarly update our
own version.
I have also done a test-run cluster upgrade procedure with ccm
consisting of a 3 node cluster. I began by creating the cluster from
`master` and increasing the replication factor of the `system_auth`
keyspace to 3 and repairing the nodes. I then created several users and
granted them permissions on some resources. I then stopped a node,
updated its hardlinked executable to Scylla built from this patch
series, and restarted the node. I observed the migration of legacy data
starting and finishing. Connecting to the node, I observed all the new
roles functionality was working correctly. I verified that attempting to
change access-control information failed with a message about an
upgrading cluster. I repeated the process, node by node, with the
remaining two nodes and finally observed that the entire cluster had
upgraded and that I could modify access-control information freely. I
will encapsulate this test into a dtest if possible.
Fixes #1941.
* 'jhk/switch_to_roles/v6' of https://github.com/hakuch/scylla: (83 commits)
cql3: Remove some unimplemented warnings
cql3: Prevent unhandled exception for anonymous user
auth: Add alias for set of role names
auth: Revoke permissions on dropped role resources
auth: Move definition to corresponding .cc file
cql3: Fix life-time of `user` from `client_state`
auth: Migrate legacy data on boot
auth: Check protected resources of the role-manager
auth: Protect authenticator resources
service/client_state: Correct erroneous comment
client_state: Fix error message
cql3: Fix error handling for GRANT and REVOKE
auth: Remove unnecessary `sstring` allocation
cql3: Rename variables to reflect roles
auth: Decouple authorization and role management
auth: Add code to expand a resource family
cql: Also add `username` col. for LIST PERMISSIONS
cql3: Fix error handling in LIST PERMISSIONS
auth: Change error messages to pass dtests
cql3: Handle errors more precisely for roles
...
Commit 6ccd317 introduced a bug in partition_entry::evict() where a
partition entry may be partially evicted if there are non-evictable
snapshots in it. Partially evicting some of the versions may violate
consistency of a snapshot which includes evicted versions. For one,
continuity flags are interpreted relative to the merged view, not
within a version, so evicting from some of the versions may mark
ranges as continuous when before they were discontinuous. Also, range
tombstones of the snapshot are taken from all versions, so we can't
partially evict some of them without marking all affected ranges as
discontinuous.
The fix is to revert to full eviction, and avoid moving
non-evictable snapshots to cache. When moving whole partition entry to
cache, we first create a neutral empty partition entry and then merge
the memtable entry into it just like we would if the entry already
existed.
Fixes #3215.
Tests: unit (release)
Message-Id: <1518710592-21925-2-git-send-email-tgrabiec@scylladb.com>
"Fixes two issues:
- update may abort if allocation of an empty partition_version fails
- LSA region construction is not exception safe, it may leave the misconstructed
region registered if allocation inside region_group::add() fails."
* tag 'tgrabiec/exception-safety-cache-update-v2' of github.com:scylladb/seastar-dev:
tests: row_cache: Add test for exception safety of updates from memtable
tests: flat_reader_assertions: Improve failure message
cache: Handle exceptions from make_evictable()
tests: Disable failure injection around background compactor
lsa: Disable allocation failure injection inside merge()
lsa: Make region deregistration robust against duplicates
lsa: Make region allocation exception safe
While there are some small remaining features for roles, all the old
user-based statements still exist as they did before (except now they're
backed by roles) and should not log warnings.
Previously, when a table or keyspace was dropped, the
authorizer (through a `migration_listener`) automatically dropped all
permissions granted on that resource.
Likewise, when a role is granted permissions and the role is dropped,
all permissions granted to the role are dropped.
In this change, we now treat role resources just like table and keyspace
resources: if a permission is granted on a role (like "GRANT AUTHORIZE
ON ROLE qa TO phil") and the "qa" role is dropped, then all permissions
on the "qa" role resource are also dropped.
This change allows for seamless migration of the legacy users metadata
to the new role-based metadata tables. This process is summarized in
`docs/migrating-from-users-to-roles.md`.
In general, if any nondefault metadata exists in the new tables, then
no migration happens. If, in this case, legacy metadata still exists
then a warning is written to the log.
If no nondefault metadata exists in the new tables and the legacy tables
exist, then each node will copy the data from the legacy tables to the
new tables, performing transformations as necessary. An informational
message is written to the log when the migration process starts, and
when the process ends. During the process of copying, data is
overwritten so that multiple nodes racing to migrate data do not
conflict.
Since Apache Cassandra's auth. schema uses the same table for managing
roles and authentication information, some useful functions in
`roles-metadata.hh` have been added to avoid code duplication.
Because a superuser should be able to drop the legacy users tables from
`system_auth` once the cluster has migrated to roles and is functioning
correctly, we remove the restriction on altering anything in the
"system_auth" keyspace. Individual tables in `system_auth` are still
protected later in the function.
When a cluster is upgrading from one that does not support roles to one
that does, some nodes will be running old code which accesses old
metadata and some will be running new code which access new metadata.
With the help of the gossiper `feature` mechanism, clients connecting to
upgraded nodes will be notified (through code in the relevant CQL
statements) that modifications are not allowed until the entire cluster
has upgraded.
auth: Decouple authorization and role management
Access control in Scylla consists of three main modules: authentication,
authorization, and role-management.
Each of these modules is intended to be interchangeable with alternative
implementations. The `auth::service` class composes these modules
together to perform all access-control functionality, including caching.
This architecture implies two main properties of the individual
access-control modules:
- Independence of modules. An implementation of authentication should
have no dependence or knowledge of authorization or role-management,
for example.
- Simplicity of implementing the interface. Functionality that is common
to all implementations should not have to be duplicated in each
implementation. The abstract interface for a module should capture
only the differences between particular implementations.
Previously, the authorization interface depended on an instance of
`auth::service` for certain operations, since it required aggregation
over all the roles granted to a particular role or required checking if
a given role had superuser.
This change decouples authorization entirely from role-management: the
authorizer now manages only permissions granted directly to a role, and
not those inherited through other roles.
When a query needs to be authorized, `auth::service::get_permissions`
first uses the role manager to check if the role has superuser. Then, it
aggregates calls to `auth::authorizer::authorize` for each role granted
to the role (again, from the role-manager) to determine the sum-total
permission set. This information is cached for future queries.
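The aggregation described here can be sketched in plain C++. All names and types below are illustrative stand-ins (the real `auth::service::get_permissions` is asynchronous and resolves the transitive closure of granted roles through the role-manager); this shows only the union-of-permission-sets idea:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical types: a permission set is just a set of permission names,
// and role grants / direct permissions are plain maps keyed by role name.
using permission_set = std::set<std::string>;
using role_grants = std::unordered_map<std::string, std::set<std::string>>;
using direct_grants = std::unordered_map<std::string, permission_set>;

permission_set get_permissions(const std::string& role,
                               const role_grants& granted_roles,
                               const direct_grants& direct_permissions) {
    permission_set result;
    // Start with the role's own directly-granted permissions...
    if (auto it = direct_permissions.find(role); it != direct_permissions.end()) {
        result.insert(it->second.begin(), it->second.end());
    }
    // ...then union in the permissions of every role granted to it
    // (a single level here; the real role-manager resolves transitively).
    if (auto g = granted_roles.find(role); g != granted_roles.end()) {
        for (const auto& r : g->second) {
            if (auto p = direct_permissions.find(r); p != direct_permissions.end()) {
                result.insert(p->second.begin(), p->second.end());
            }
        }
    }
    return result;
}
```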
This structure allows for easier error handling and
management (something I hope to improve in the future for both the
authorizer and authenticator interfaces), easier system testing, easier
implementation of the abstract interfaces, and clearer system
boundaries (so the code is easier to grok).
Some authorizers, like the "TransitionalAuthorizer", grant permissions
to anonymous users. Therefore, we could not unconditionally authorize an
empty permission set in `auth::service` for anonymous users. To account
for this, the interface of the authorizer has changed to accept an
optional name in `authorize`.
One additional notable change to the authorizer is the
`auth::authorizer::list`: previously, the filtering happened at the CQL
query layer and depended on the roles granted to the role in question.
I've changed the function to simply query for all roles and I do the
filtering in `auth::system` in-memory with the STL. This was necessary
to allow the authorizer to be decoupled from role-management. This
function is only called for LIST PERMISSIONS (so performance is not a
concern), and it significantly reduces demand on the implementation.
Finally, we unconditionally create a user in `cql_test_env` since
authorization requires its existence.
the value for the `role` column is equal to the value for the `username`
column.
This change makes LIST PERMISSIONS backwards compatible with clients
that expect the `username` column to exist. This functionality also
exists in Apache Cassandra.
This patch replaces duplicated code for checking the existence of a user
with the same mechanism for doing so as elsewhere: by checking for
`auth::nonexistent_role` being thrown during the course of checking
access-control.
This patch also ensures that exceptions thrown while querying the list
of permissions on a resource get handled correctly.
The fixed dtests which only failed due to differences in wording and
grammar for error messages are:
- altering_nonexistent_user_throws_exception_test
- cant_create_existing_user_test
- dropping_nonexistent_user_throws_exception_test
- users_cant_alter_their_superuser_status_test
This patch ensures that all the CQL statements for managing roles
correctly catch exceptions in the underlying `role_manager` and re-throw
them as top-level exceptions (like "invalid request").
This patch also refines exception handling so that only the applicable
errors are explicitly caught. This should allow easier auditing in the
future and help to reveal faulty assumptions.
Previously, a "data" auth. resource knew how to check its own existence by
accessing a global variable.
This patch accomplishes two things: it adds existence checking to all
kinds of resources, and moves these checks outside of `auth::resource`
itself and into `auth::service` (so that global variables are no longer
accessed).
According to the Seastar convention, a parameter passed to a function
taking a reference parameter must live for the duration of the execution
of the returned future.
When possible, variables are statically allocated. When this is not
possible, we use `do_with`.
When a user executes GRANT or REVOKE, Scylla ensures that they
themselves are granted the permissions they are changing.
The code previously checked a static list of permissions, which we could
have replaced with `auth::permissions::ALL`. Even better, we now expand
the set of filtered permissions into an iterable container.
Sometimes it is useful to be able to query for all the members of an
`enum_set`, rather than just add, remove, and query for membership. (The
patch following this one makes use of this in the auth. sub-system).
We use the bitset iterator in Seastar to help with the implementation.
`super_enum::valid_is_valid_sequence` determines if the numeric index
corresponding to an enumeration value is valid. This is important,
because it is undefined behavior to cast an invalid index into an
enumeration value.
This function is used to check the validity of the `enum_set` mask when
an `enum_set` is constructed in `enum_set::from_mask`. If the mask has
set bits that correspond to invalid enumeration indices, then we throw
`bad_enum_set_mask`.
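The mask check can be illustrated with a minimal sketch. The enum, the maximum index, and the function name here are hypothetical stand-ins for Scylla's `super_enum`/`enum_set` machinery; only the validity logic matches the description above:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Thrown when a bit mask has set bits outside the valid enum indices,
// mirroring the bad_enum_set_mask described in the commit message.
struct bad_enum_set_mask : std::runtime_error {
    bad_enum_set_mask()
        : std::runtime_error("bit mask contains invalid enumeration indices") {}
};

// Illustrative enum: three valid indices, 0 through 2.
enum class permission : uint8_t { read = 0, write = 1, alter = 2 };
constexpr uint64_t max_index = 2;

// Reject masks with bits beyond the highest valid enum index: casting
// such an index back into the enumeration would be undefined behavior.
uint64_t checked_mask(uint64_t mask) {
    const uint64_t valid_bits = (uint64_t(1) << (max_index + 1)) - 1;
    if (mask & ~valid_bits) {
        throw bad_enum_set_mask();
    }
    return mask;
}
```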
This has the dual benefit of not enforcing copying on implementations of
the abstract interface and also limiting unnecessary copies.
As usual with Seastar, we follow the convention that a reference
parameter to a function is assumed valid for the duration of the
`future` that is returned. `do_with` helps here.
By adding some constants for root resources, we can avoid using
`seastar::do_with` at some call-sites involving `resource` instances.
All authorization checking lives in the CQL layer. The individual
authenticator, authorizer, and role-manager enforce no access-checks.
It may be a good idea to move these checks a level downward in the
future for ease of testing, but for now we aim for consistency.
While it's undefined behavior to pass an unsupported option to a
specific authenticator directly, the `auth::service` layer will check
options and throw this exception. It is turned into a
`invalid_request_exception` by the CQL layer.
The motivation behind this change is the idea that constructing a new
instance of an object is the job of the constructor.
One big benefit of this structure (with the addition of helpers for
convenience) is that calls for emplacing instances (like
`std::make_shared`, or `std::vector::emplace_back`) work without any
difficulty. This would not be true for static construction functions.
According to previous discussions on the mailing-list with Avi, using
both has the benefits of making virtual functions stand out and also
warning about functions which unintentionally do not override.
All we require are value semantics.
`client_state` still stores `authenticated_user` in a `shared_ptr`, but
the behavior of that class is complex enough to warrant its own
discussion/design/refactor.
The most important change is replacing `auth::authenticated_user::name`
with a public `std::optional<sstring>` member. Anonymous users have no
name. This replaces the insecure and bug-prone special-string of
"anonymous" for anonymous users, which does unfortunate things with the
authorizer.
The new `auth::is_anonymous` function exists for convenience since
checking the absence of a `std::optional` value can be tedious.
When a caller really wants a name unconditionally, a new stream output
function is also available.
Checking if the role to be dropped has superuser requires that the role
exists, which means `auth::nonexistent_role` was thrown even when IF
EXISTS was specified.
This is a large change, but it's a necessary evil.
This change brings us to a minimally-functional implementation of roles.
There are many additional changes that are necessary, including refined
grammar, bug fixes, code hygiene, and internal code structure changes.
In the interest of keeping this patch somewhat read-able, those changes
will come in subsequent patches. Until that time, roles are still marked
"unimplemented".
IMPORTANT: This code does not include any mechanism for transitioning a
cluster from user-based access-control to role-based access control. All
existing access-control metadata will be ignored (though not deleted).
Specific changes:
- All user-specific CQL statements now delegate to their roles
equivalent. The statements are effectively the same, but CREATE USER
will include LOGIN automatically. Also, LIST USERS only lists roles
with LOGIN.
- A call to LIST PERMISSIONS will now also list permissions of roles
that have been granted to the caller, in addition to permissions which
have been granted directly.
- Much of the logic of creating, altering, and deleting roles has been
moved to `auth::service`, since these operations require cooperation
between the authenticator, authorizer, and role-manager.
- LIST USERS actually works as expected now (fixes #2968).
The previous code has an off-by-one error since the iterator is
incremented unconditionally prior to being compared to the end of the
collection.
This new version is also shorter thanks to `seastar::do_until`.
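A minimal stand-alone illustration of that off-by-one, using plain iterators (the actual fix used `seastar::do_until`; both functions below are contrived examples, not the real code):

```cpp
#include <cassert>
#include <vector>

// Buggy pattern: the iterator is incremented unconditionally before the
// end comparison, so the last element is never processed.
int count_buggy(const std::vector<int>& v) {
    int n = 0;
    for (auto it = v.begin(); it != v.end();) {
        ++it;                   // incremented unconditionally...
        if (it == v.end()) {
            break;              // ...before the comparison, skipping
        }                       // the final element
        ++n;
    }
    return n;
}

// Fixed pattern: compare first, then advance.
int count_fixed(const std::vector<int>& v) {
    int n = 0;
    for (auto it = v.begin(); it != v.end(); ++it) {
        ++n;
    }
    return n;
}
```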
The components of access-control (authentication, authorization, and
role-management) are designed as abstract interfaces, but due to
decisions of Apache Cassandra, certain implementations are dependent on
other particular implementations.
This change throws a new exception,
`auth::incompatible_module_combination`, when a dependency is not
satisfied.
The set of allowed options is quite small, so we benefit from a static
representation (member variables) over a dynamic map.
We also logically move the "OPTIONS" option to the domain of the
authenticator (from user management), since this is where it is applied.
This refactor also aims to reduce compilation time by moving
`authentication_options` into its own header file.
While changes to `user_options` were necessary to accommodate the new
structure, that class will be deprecated shortly in the switch to roles.
Therefore, the changes are strictly temporary.
cache_entry constructor was marked noexcept, yet make_evictable() may
fail in rare cases due to allocation in add_version(). Lift the
annotation and make sure that construction has strong exception
guarantees for the moved-in state so that it can be retried without
data loss inside allocating section.
Failure could be injected into the compactor if the main code under
test defers before reaching allocation failure point, and compactor
gets hit. This is not what the test is supposed to stress, and it
causes abort when memtable_snapshot_source is destroyed, so disable
failure injection there.
This patch adds a function that returns an array with the ids of the
active repairs by filtering the RUNNING ones in the repair tracker status.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
When the cluster is large or num_tokens is big, calculate_pending_ranges
can take a long time to complete. It runs in the gossip thread, so it can
block gossip processing. Another problem is that it runs in a plain for
loop and can cause reactor stalls.
Users see this stall during decommission operations.
I can reproduce stalls of up to 4 seconds in a two-node cluster, each node
with `--num-tokens 3072`, during decommission.
Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout
Fixes #3203
* tag 'asias/issue_3203_v2.1' of github.com:scylladb/seastar-dev:
storage_service: Do not wait for update_pending_ranges in handle_state_leaving
token_metadata: Handle affected_ranges with do_for_each
token_metadata: Split token_metadata::calculate_pending_ranges
token_metadata: Futurize calculate_pending_ranges
storage_service: Futurize storage_service::do_update_pending_ranges
token_metadata: Speed up token_metadata::get_endpoint
The call chain is:
storage_service::on_change() -> storage_service::handle_state_leaving()
-> storage_service::update_pending_ranges()
Listeners run as part of gossip message processing, which is
serialized. This means we won't be processing any gossip messages until
update_pending_ranges completes. update_pending_ranges takes time to
complete.
Since we do not wait for update_pending_ranges to complete any more,
multiple update_pending_ranges operations can run at the same time; we
use serialized_action to serialize them.
Tested with update_cluster_layout_tests.py
affected_ranges can be very large in a big cluster or on a node with a
large num_tokens count. calculate_natural_endpoints takes more time to
process in this case as well.
Futurize calculate_pending_ranges_for_leaving and handle the loop with
do_for_each to give the reactor some room to breathe, so it does not
block.
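The pattern can be sketched without Seastar as a loop with an explicit preemption check. Everything here is an illustrative stand-in: the real code uses `seastar::do_for_each`, which yields to the reactor at preemption points via `need_preempt()`:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Plain-C++ stand-in for the reactor-friendly loop: instead of one
// blocking for loop, process one element at a time and ask a preemption
// check whether to yield before continuing. The signature is
// hypothetical; in Seastar the yield is an actual return to the reactor.
template <typename T>
void do_for_each_chunked(std::vector<T>& items,
                         const std::function<void(T&)>& action,
                         const std::function<bool()>& need_preempt,
                         const std::function<void()>& yield) {
    for (auto& item : items) {
        action(item);
        if (need_preempt()) {
            yield();  // let other tasks run before processing more items
        }
    }
}
```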
token_metadata::calculate_pending_ranges is a complicated function.
Split it into three parts, for the leaving, moving, and bootstrap
operations.
Now, do_update_pending_ranges is futurized. We can finally futurize
token_metadata::calculate_pending_ranges in order to convert the loops
inside it to do_for_each instead of plain for loops to avoid reactor
stall.
Preparation work for the futurizing of the time consuming
token_metadata::calculate_pending_ranges.
In addition, we use do_for_each for the loop. It is better than a plain
for loop because the reactor can yield, avoiding stalls when there are
tons of keyspaces.
The token_to_endpoint map can get so big that converting it to a vector
would trigger a large-allocation warning.
This patch replaces the implementation so that the returned JSON array
is created directly from the map using the stream_range_as_array helper
function.
Fixes #3185
Message-Id: <20180207153306.30921-1-amnon@scylladb.com>
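The streaming idea can be sketched as follows. This is a simplified stand-in for Seastar's `stream_range_as_array` helper, not its real signature; the JSON formatting callback is illustrative. The point is that each entry is serialized straight into the output, so the map never has to be materialized as one large contiguous vector:

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Serialize each element of a range directly into the stream as a JSON
// array, avoiding an intermediate vector (and its single big allocation).
template <typename Range, typename Fn>
void stream_range_as_array(std::ostream& os, const Range& range, Fn&& to_json) {
    os << '[';
    bool first = true;
    for (const auto& e : range) {
        if (!first) {
            os << ',';
        }
        first = false;
        to_json(os, e);  // caller decides how one entry is rendered
    }
    os << ']';
}
```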
Container indices are size_t, and in other places we gratuitously
declare a limit as unsigned and the loop index as signed.
Tests: unit (release)
Message-Id: <20180212121642.10525-1-avi@scylladb.com>
All of the adjustments to _remain already ensure it is greater than 0,
and indeed a negative _remain doesn't make sense.
Switching to unsigned types allows us to re-enable -Wsign-compare.
Tests: unit (release)
Message-Id: <20180212121636.10463-1-avi@scylladb.com>
Commit cce1a2bce8 ("Use the CPU scheduler")
placed some compaction manager code in a scheduling_group. Unfortunately,
downstream code relied on the callers not deferring, so it could rely
on the column_family's existence. That doesn't happen if the column_family
is removed quickly, as with_scheduling_group() always defers.
Fix by applying the scheduling group after we've taken the lock and guaranteed
the stability of the column_family object.
Fixes #3196.
Message-Id: <20180211165155.18179-1-avi@scylladb.com>
71495691aa removed sstable::get_index_reader(),
but forgot to update its callers in tests/. Update the callers to construct
a temporary shared_index_list and create the index_reader directly.
This is none too clean, but shared_index_lists needs to be retired, and then
the changes in this patch can go away too.
Tests: unit (release)
Message-Id: <20180211164739.17862-1-avi@scylladb.com>
With the changes introduced in #2981, it is no longer safe to share
index_entries among multiple sstable_mutation_readers.
The original intent behind sharing index_entries among index_readers was
to avoid re-reading same pages twice as we have two index readers -
lower and upper bound - for every sstable_mutation_reader. In fact, the
shared entries were held at the sstable object level so index_readers
from different sstable_mutation_readers could have accessed them.
Now, with calls to index_reader::advance_to(pos)/index_reader::advance_past(pos),
index_entry can be accessed in a way that modifies its state if we need
to read more promoted index blocks. It is safe to keep sharing them
between two index_readers within the same sstable_mutation_reader as the
invariant is maintained that readers can be only moved forward.
We cannot safely assume, however, that this invariant holds for multiple
sstable_mutation_readers as it may happen that one of them has read and
thrown away some promoted index blocks that another one needs. So we
restrict sharing to per-sstable_mutation_reader level.
Fixes #3189.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <83957d007621fe4c62af49aebf1838bb2f32ee55.1518226793.git.vladimir@scylladb.com>
"The motivation is that it's no longer needed after the new resharding
algorithm, which is solely responsible for working with shared
sstables; regular compaction will not work with those!
So resharding will schedule deletion of shared sstables once it's
certain that shards that own them have the new unshared sstables.
The manager was needed for orchestrating deletion of shared sstable
across shards. It brings extra complexity that's not longer needed,
and it was also overloading shard 0, but the latter could have
been fixed.
Tests:
- unit: release mode
- dtest: resharding_test.py"
* 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla:
Remove SSTable's atomic deletion manager
Stop using SSTable's atomic deletion manager
database: split column_family::rebuild_sstable_list
"We have noticed in the past that the compiler is too conservative when it comes
to deciding which functions to inline. Since inlining functions enables further
optimisations such as const folding in some cases the difference in performance
was significant enough to force us to add the [[gnu::always_inline]] attribute
in numerous places. However, this is neither a practical nor an elegant solution.
A better way to deal with the problem is to adjust the compiler tunables that
control the heuristics used for making inlining decisions. In particular,
inline-unit-growth seems to affect the performance of the emitted code most.
Apart from making the compiler more eager to inline functions bumping the
optimisation level to -O3 also seems to have a positive impact on the
performance.
Fixes#1644.
Tests: unit-test (release)
Performance tested with gcc 7.3.
Macrobenchmark
perf_simple_query
Flags: -c4 --duration 60
All results are medians.
./before ./after diff
read 338662.12 405377.80 19.7%
write 387378.89 466744.15 20.5%
Microbenchmarks
single run duration: 1.000s
number of runs: 5
BEFORE
test iterations median mad min max
combined.one_row 858933 536.389ns 0.819ns 534.823ns 537.208ns
combined.single_active 8469 77.131us 11.000ns 77.118us 77.145us
combined.many_overlapping 1199 664.105us 160.807ns 663.818us 668.527us
combined.disjoint_interleaved 8100 75.522us 22.254ns 75.500us 75.732us
combined.disjoint_ranges 8288 72.580us 10.571ns 72.568us 72.599us
memtable.one_partition_one_row 1216233 825.581ns 0.446ns 821.450ns 826.027ns
memtable.one_partition_many_rows 127336 7.855us 2.153ns 7.853us 7.898us
memtable.many_partitions_one_row 57919 17.356us 6.028ns 17.259us 17.362us
memtable.many_partitions_many_rows 4751 210.496us 102.339ns 210.393us 211.188us
AFTER
test iterations median mad min max
combined.one_row 1002321 450.292ns 0.313ns 447.202ns 450.605ns
combined.single_active 9605 67.086us 8.620ns 67.073us 67.115us
combined.many_overlapping 1476 519.554us 5.334ns 519.549us 519.953us
combined.disjoint_interleaved 9280 64.363us 5.328ns 64.335us 64.369us
combined.disjoint_ranges 9481 61.893us 3.620ns 61.885us 61.903us
memtable.one_partition_one_row 1432668 699.775ns 0.106ns 696.023ns 699.918ns
memtable.one_partition_many_rows 153692 6.536us 6.885ns 6.501us 6.543us
memtable.many_partitions_one_row 63319 15.879us 5.080ns 15.793us 15.884us
memtable.many_partitions_many_rows 5659 176.717us 66.770ns 176.650us 177.778us"
* tag 'optimise-and-inline/v2' of https://github.com/pdziepak/scylla:
configure.py: set optimisation level to -O3
configure.py: set inline-unit-growth to 300
configure.py: flag_supported: support flags with spaces
configure.py: rename warning_supported to flag_supported
configure.py: pass optimisation flags to seastar/configure.py
cql3/select_statement: do not capture stack variables by reference
In this patchset I am resubmitting Avi's enablement of the CPU scheduler
on his behalf. I've done a ton of testing on the series and there are
some improvements / changes that I had previously sent as a separate series.
What you see here is the result of merging that work.
After this patchset is applied, workloads are smoother and we are able to
uphold the pre-defined shares among the various actors.
We also finally have everything we need to merge the CPU and I/O controllers.
After that is done the code is now much simpler. But also, as a bonus,
controllers that were previously available for I/O only (compactions) are
enabled for CPU as well.
* git@github.com:glommer/scylla.git cpusched-v7:
Avi Kivity (4):
database, sstables, compaction: convert use of thread_scheduling_group
to seastar cpu scheduler
memtable, database: make memtable::clear_gently() inherit
scheduling_group
config: mark background_writer_scheduling_quota as Unused
database: place data_query execution stage into scheduling_group
Glauber Costa (9):
database, main: set up scheduling_groups for our main tasks
row_cache: actually use the scheduling group for update_cache
allow update_cache and clear_gently to use the entire task quota.
database: remove cpu_flush_quota metric
controllers: retire auto_adjust_flush_quota
controllers: allow memtable I/O controller to have shares statically
set
controllers: update control points for memtable I/O controller
controllers: allow a static priority to override the controller output
controllers: unify the I/O and CPU controllers
It has been discovered that the compiler is too conservative when
deciding which functions to inline. In particular, the limiting tunable
turned out to be inline-unit-growth which limits inlining in large
translation units.
* seastar 6d02263...2b0a81d (7):
> configure.py: add -Wno-stringop-overflow
> configure.py: add --optflags for specifying optimisation flags
> build: add protobuf-compiler to docker dev image
> build: update docker builder to newer Fedora
> json_element: stream_object to get its parameter by value
> json_element: stream range object
> build: add yaml-cpp-devel installation to Dockerfile
The motivation is that it's no longer needed after the new resharding
algorithm, which is solely responsible for working with shared
sstables; regular compaction will not work with those!
So resharding will schedule deletion of shared sstables once it's
certain that shards that own them have the new unshared sstables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The motivation is that resharding will not want the code that is
specific to regular compaction after atomic deletion is removed.
Resharding will eventually only need to replace old tables with
new ones, and it will be in charge of deletion of old tables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We have had so far an I/O controller, for compactions and memtables, and
a CPU controller, for memtables only -- since the scheduling was still
quota-based.
Now that the CPU scheduler is fully functional, it is time to do away
with the differences and integrate them both into one. We now have a
memtable controller and a compaction controller, and they control both
CPU and I/O.
In the future, we may want to control processes that don't do one of
them, like cache updates. If that ever happens, we'll try to make
controlling one of them optional. But for now, since the I/O and CPU
controllers for our main two processes would look exactly the same we
should integrate them.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have merged the I/O controller without this, but we want to integrate
the CPU and I/O controllers into one. Currently, the quota can be
statically set for the CPU controller. For now, until we gain more
experience with it we should allow a static value to override the
controller's output as well.
That is particularly important since we don't yet control some
strategies like LCS and the time-based ones. Users in the field may be
using one of those strategies with a static value for background quota.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Right now CPU and I/O controllers have slightly different control points
for no good reason. Let's use the CPU controller ones as the standard, as
we have been using it in the field for longer and trust it more.
The end goal is to fully integrate them.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
It no longer makes sense now that we have the full scheduler +
controllers. In its lieu, we will provide an option to statically set
the controller's shares as a safe guard against us getting this wrong.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have had a quota of partitions to process in clear_gently /
update_cache, so that we don't overwork. However, with those things now
being in their own task group there is no harm in allowing it to run
until we reach a natural preemption point.
While we are at it, clear_gently did not check for need_preempt()
before, so this patch fixes it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have moved clear_gently from using a seastar::thread's scheduling_group to
using the CPU scheduler's. However, update_cache was forgotten.
This patch fixes that and gets rid of the old group just in case.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Because execution stages defer and batch processing of the function
they run, they escape their fiber's context and therefore the
scheduling group.
Fix (for data_query) by initializing the execution_stage with the
query scheduling_group. To do that we have to move the execution
stage into the database object, so it has access to the scheduling
group during initialization.
Set up scheduling groups for streaming, compaction, memtable flush, query,
and commitlog.
The background writer scheduling group is retired; it is split into
the memtable flush and compaction groups.
Comments from Glauber:
This patch is based on a patch from Avi with the same subject, but the
differences are significant enough that I reset authorship. In
particular:
1) A bug/regression is fixed with the boundary calculations for the
memtable controller sampling function.
2) A leftover is removed, where after flushing a memtable we would
go back to the main group before going to the cache group again
3) As per Tomek's suggestion, the submission of compactions
themselves is now run in the compaction scheduling group. Having that
working is what changes this patch the most: we now store the
scheduling group in the compaction manager and let the compaction
manager itself enforce the scheduling group.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
thread_scheduling_groups are converted to plain scheduling_group. Due to
differences in initialization (scheduling_group initialization defers), we
create the scheduling_groups in main.cc and propagate them to users via
a new class database_config.
The sstable writer loses its thread_scheduling_group parameter and instead
inherits scheduling from its caller.
Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas,
the flush controller was adjusted to return values within the higher range.
The SSTable tests are a bit fragile now because they rely on min_threshold
having a particular value. That is the default value, but if I change that
default - which I am planning to do - the test breaks.
Right now the test is not broken, but if we are planning on relying on a
property having a particular value in tests, we should explicitly set it.
So I am proactively changing min_threshold in the tests to have the value
of 4 explicitly, so we can change that in the future without breaking anything.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180207155513.12498-1-glauber@scylladb.com>
Since we unconditionally run blkdiscard on disks, we may get an ioctl error
message on disks which do not support TRIM.
This can be ignored, but it's bad UX, so let's skip running blkdiscard when TRIM
is not supported on the disk.
Fixes #2774
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1517992904-13838-1-git-send-email-syuu@scylladb.com>
Parses the extension map in tables/views using the registered extension.
If a schema row contains an unknown extension, we just preserve the data
in a placeholder.
Automatically accept registered schema extensions into the properties
set, and when building, generate the corresponding extension object into
the resulting schema.
Requires a "workaround" fix for schema_registry and frozen_mutation, since
the former is a free-floating thread local, and the latter is a pure data
carrier. frozen_schema can take a parameter for unfreeze, but the schema
registry requires being told what the system extensions are.
Move the configurables to init so tests can link this as well.
Add extensions object to db config in main and provide to
configurables. These can then add extensions at this phase.
The idea being that we should have config be a global, immutable
singleton, set up by startup/test then owned/referenced by db etc.
Extensions are read-only in this context, so init code should set it up
before handing to the config. Or keep a ref to the ext param.
Make a "compressor" an actual class, that can be implemented and
registered via class registry.
For "common" compressors, the objects will be shared, but complex
implementors can be semi-stateful.
sstable compression is split into two parts: The "static" config
which is shared across shards, and a "local" one, which holds
a compressor pointer. The latter is encapsulated, along with
actual compressed data writers, in sstables/compress.cc.
For compression (write), the compression writer is instantiated
with the settings active in the table metadata.
For decompression (read), the compression reader is instantiated
with the settings stored in the sstable metadata, which can
differ from the currently active table metadata.
v2:
* Structured patch sets differently (dependencies)
* Added more comments/api descs
* Added patch to move all sstable compression into compress.cc,
effectively separating top-level virtual compressor object
from sstable io knowledge
v3:
* Rebased
v4:
* Moved all sstable compression logic/knowledge into
compress.cc (local compression). Merged the two patches
(separation just confuses reader).
Fix clustering column indexing by lifting the limitation of only
considering non-primary key restrictions in
select_statement::find_index_partition_ranges().
"When moving whole partition entries from memtable to cache, we move
snapshots as well. It is incorrect to evict from such snapshots
though, because associated readers would miss data.
The solution is to record the evictability of partition version references
(snapshots) and avoid eviction from non-evictable snapshots.
Could affect scanning reads, if the reader uses partition entry from
memtable, and the partition is too large to fit in reader's buffer,
and that entry gets moved to cache (was absent in cache), and then
gets evicted (memory pressure). The reader will not see the remainder
of that entry. Found during code review.
Introduced in ca8e3c4, so affects 2.1+
Fixes #3186.
Tests: unit (release)"
* 'tgrabiec/do-not-evict-memtable-snapshots' of github.com:tgrabiec/scylla:
tests: mvcc: Add test for eviction with non-evictable snapshots
mutation_partition: Define + operator on tombstones
tests: mvcc: Check that partition is fully discontinuous after eviction
tests: row_cache: Add test for memtable readers surviving flush and eviction
memtable: Make printable
mvcc: Take partition_entry by const ref in operator<<()
mvcc: Do not evict from non-evictable snapshots
mvcc: Drop unnecessary assignment to partition_snapshot::_version
tests: Use partition_entry::make_evictable() where appropriate
mvcc: Encapsulate construction of evictable entries
When moving whole partition entries from memtable to cache, we move
snapshots as well. It is incorrect to evict from such snapshots
though, because associated readers would miss data.
The solution is to record the evictability of partition version references
(snapshots) and avoid eviction from non-evictable snapshots.
Could affect scanning reads, if the reader uses partition entry from
memtable, and the partition is too large to fit in reader's buffer,
and that entry gets moved to cache (was absent in cache), and then
gets evicted (memory pressure). The reader will not see the remainder
of that entry.
Introduced in ca8e3c4, so affects 2.1+
Fixes #3186.
merge_partition_versions() is responsible for merging versions
unpinned by the current snapshot. If that fails, we don't need to set
_version back since versions must be still referenced by someone else,
this snapshot is not a unique owner.
This change makes it easier to add tracking of evictability.
A race condition was introduced by commit 028c7a0888, which introduced chunk offset
compression: reading state is kept in the compress structure, which is
supposed to be immutable and can be shared among shards owning the same sstable.
So it may happen that shard A updates state while shard B relies on information
previously set, which leads to incorrect decompression, which in turn leads to
reads misbehaving.
We could serialize access to at(), which would only lead to contention issues for
shared sstables, but that can be avoided by moving state out of the compress structure,
which is expected to be immutable after the sstable is loaded and fed to shards that
own it. A sequential accessor (wrapping state and a reference to segmented_offset) is
added to prevent the at() and push_back() interfaces from being polluted.
Tests: release mode.
Fixes #3148.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180205192432.23405-1-raphaelsc@scylladb.com>
Internal invariants of MVCC are better preserved by partition_entry
methods, so move construction of partition entries out of cache_entry
constructors.
Uncomment descriptions of Ec2SnitchXXX, which have been supported for a long
time already.
Add the description of the new GoogleCloudSnitch.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Tests the GoogleCloudSnitch.
Uses the dummy GCE meta server that would be listening on 127.0.0.1:80 by default.
To change the IP of the dummy server one can use the DUMMY_META_SERVER_IP
environment macro.
To use the real GCE meta server (from inside the GCE VM) one should define
the USE_GCE_META_SERVER environment macro.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This is a snitch that should be used when Scylla runs in GCE VMs in both
single and multi data center (DC) configurations.
This snitch interacts with the GCE (instance metadata) API as
described here: https://cloud.google.com/compute/docs/storing-retrieving-metadata,
similarly to how ec2_snitchXXX interacts with the AWS API.
However unlike ec2_multi_region_snitch the GCE snitch only gets the instance's zone and sets
the DC and the RACK based on it, e.g. for us-central1-a the DC is set to 'us-central'
and the RACK - to 'a'.
GCE snitch doesn't have to learn the internal and external IPs of the instance because in
GCE instances from different regions can interact using internal IPs (in the AWS they can't).
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
* seastar 21badbd...6d02263 (4):
> build: detect name of ninja executable
> queue: pop_eventually/push_eventually should throw when called after abort
> build: compile libfmt out-of-line
> core/gate: Ensure with_gate leaves gate on exception
We use the AmbientCapabilities directive in the systemd unit, but it does not work
on older kernels, causing the following error:
"systemd[5370]: Failed at step CAPABILITIES spawning /usr/bin/scylla: Invalid argument"
It only works on kernel-3.10.0-514 (CentOS 7.3) or later, so block installing the rpm
to prevent the error.
Fixes #3176
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1517822764-2684-1-git-send-email-syuu@scylladb.com>
* seastar 19efbd9...21badbd (4):
> reactor: change adjustment method for tasks becoming active
> Merge 'Update ARM port' from Avi
> http: Do not wait for close connection on stop if listen did not completed
> core/future-util: Don't allow rvalues in do_for_each()
cql_query_test contains many continuations that are generic lambdas:
foo().then([] (auto x) { ... })
These templates prevent Eclipse's indexer from inferring the type of x,
and so everything below that point is one big error as far as Eclipse is
concerned.
De-template these lambdas by specifying the real types.
Unfortunately, compile time decrease was not observed.
Tests: cql_query_test (release)
Message-Id: <20180204113503.23297-1-avi@scylladb.com>
This patch adds an engine().on_exit cleanup for the prometheus server,
similar to other components in the system.
It will stop the server when shutting down.
Fixes #2520
Message-Id: <20180201132647.17638-1-amnon@scylladb.com>
These patches deal with the remaining exception safety issues in the
memtable partition range readers. That includes moving the assignment
to iterator_reader::_last outside of the allocating section to avoid
problems caused by an exception-unsafe assignment operator. Memory
accounting code is also moved out of the retryable context to improve
the code's robustness and avoid potential problems in the future.
Fixes #3172.
Tests: unit-test (release)
* https://github.com/pdziepak/scylla.git memtable-range-read-exception-safety/v1:
memtable: do not update iterator_reader::_last in alloc section
memtable: do not change accounting state in alloc section
tests/memtable: add more reader exception safety tests
Even if a shared_ptr is const, that doesn't mean its internal state is
immutable; it still cannot be freely shared across shards.
Fixes assertion failure in build/debug/tests/cql_roles_query_test.
Message-Id: <20180201125221.30531-1-pdziepak@scylladb.com>
Shared pointers don't like being shared across shards.
Fixes assertion failure in build/debug/tests/mutation_reader_test.
Message-Id: <20180201125017.30259-1-pdziepak@scylladb.com>
Set the option that enables the underlying memtable and cache readers
to request caching of a cell's hash, for requests that require a
digest.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We add a cluster feature that informs whether the xxHash algorithm is
supported, and allow nodes to switch to it. We use a cluster feature
because older versions are not ready to receive a different digest
algorithm than MD5 when answering a data request.
If we ever should add a new hash algorithm, we would also need to
add a new cluster feature for that algorithm. The alternative would be
to add code so a coordinator could negotiate what digest algorithm to
use with the set of replicas it is contacting.
Fixes #2884
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
While not strictly needed, specify which algorithm to use when requesting
a digest from a remote node. This is more flexible than relying on a
cluster wide feature, although that's what we'll do in subsequent
patches. It also makes the verb more consistent with the data request.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Introduce the digest_algorithm() function, which encapsulates the
decision of which digest algorithm to use. Right now it is set to MD5,
but future patches will change this.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When digest is requested, pre-calculate the cell's hash. We consider
the case when the cell is already in the cache, and the case when it is
added by the underlying reader.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When digest is requested, pre-calculate the cell's hash. A downside of
this approach is that more work will be done when there are multiple
versions of a row that contain values for the same cell, but we expect
these cases to be rare and the upside of caching a cell's hash to
compensate for the extra work.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Having this option enables us to communicate from the upper to the
lower layers whether a digest was requested, so that we can pre-calculate
and cache a cell's hash in the readers that have access to the actual
in-memory cells (within the memtable and the row cache).
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This entails doing the cell hash calculation slightly differently:
the cell is hashed individually, with the resulting hash then added
to the running one.
Instead of propagating a flag all through the call chain, we detect
whether we are in the new mode by the employed hash algorithm.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This enables us to only branch once per row on the actual hash
algorithm, instead of once per row data item.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We add storage to a row to hold the cached hashes of each individual
cell. We don't store the hash in each cell because that would a)
change the cell equality function, and b) require us to change a cell
in a potentially fragmented buffer.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch forces the size of vector_storage's internal storage to 5,
meaning that the underlying managed_vector will ensure it doesn't need
to externally allocate a buffer to hold the row, if only its first 5
cells are set.
We define this size explicitly so we can change the vector's value
type in upcoming patches without affecting the optimization.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Replace the atomic_cell_or_collection::feed_hash() member function
with the specialization of appending_hash, and use that instead.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Use the digester class instead of md5_hasher to encapsulate the
decision of which hash algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Replace range_tombstone::feed_hash() with the specialization of
appending_hash, so that we can use the general feed_hash() function.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Replace the feed_hash() member function of partition_key and
clustering_key_prefix with the specialization of appending_hash,
so that we can use the general feed_hash() function.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Introduce class result_options to carry result options through the
request pipeline, which at this point mean the result type and the
digest algorithm. This class allows us to encapsulate the concrete
digest algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch paves the way for us to encapsulate the actual digest
algorithm used for a query. The digester class dispatches to a
concrete implementation based on the digest algorithm being used. It
wraps the xxHash algorithm to provide a 128 bit hash, which is the
size of digest expected by the inter-node protocol.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces xx_hasher, a class conforming to the Hasher
concept, which will be used to calculate the data digest in subsequent
patches. It is expected to be an order of magnitude faster than md5.
We use the 64 bit variant of the algorithm, the 128 bit one still
being under development.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"This series adds the base for the V2 Swagger definition file.
After the series, the definition file will be at:
http://localhost:10000/v2
It can be used with the swagger ui, by replacing the url in the search
path."
* 'amnon/swagger_20' of github.com:scylladb/seastar-dev:
Register the API V2 swagger file
Adding the header part of the swagger2.0 API
"Before this patch set, our Materialized Views implementation can produce
incorrect results when given concurrent updates of the same base-table
row. Such concurrent updates may result, in certain cases, in two
different rows in the view table, instead of just one with the latest
data. In this series we add locking which serializes the two conflicting
updates, and solves this problem.
I explain in more detail why such locking is needed, and what kinds of
locks are needed, in the third patch."
* 'master' of https://github.com/nyh/scylla:
Materialized views: serialize read-modify-update of base table
Materialized views: test row_locker class
Materialized views: implement row and partition locking mechanism
These patches change the memtable reader implementation (in particular
partition_snapshot_reader) so that the existing exception safety
problems are fixed, but also in a way that, hopefully, would make it
easier to reason about the error handling and avoid future bugs in that
area.
The main difficulty related to exception safety is that when an
exception is thrown out of an allocating section that code is run again
with increased memory reserved. If the retryable code has side effects
it is very easy to get incorrect behaviour.
In addition to that, entering an allocating section is not exactly cheap
which encourages doing so rarely and having large sections.
The approach taken by this series is to, first, make entering allocating
sections cheaper and then reducing the amount of logic that runs inside
of them to a minimum.
This means that instead of entering a section once per call to
flat_mutation_reader::fill_buffer() the allocation section is entered
once for each emitted row. The only state modified from within the
section are cached iterators to the current row, which are dropped on
retry. Hopefully, this would make the reader code easier to reason
about.
The optimisations to the allocating sections and managed_bytes
linearisation context have successfully eliminated any penalty caused by
the much more fine-grained allocating sections.
Fixes #3123.
Fixes #3133.
Tests: unit-tests (release)
BEFORE
test iterations median mad min max
memtable.one_partition_one_row 1155362 869.139ns 0.282ns 868.465ns 873.253ns
memtable.one_partition_many_rows 127252 7.871us 15.252ns 7.851us 7.886us
memtable.many_partitions_one_row 58715 17.109us 2.765ns 17.013us 17.112us
memtable.many_partitions_many_rows 4839 206.717us 212.385ns 206.505us 207.448us
AFTER
test iterations median mad min max
memtable.one_partition_one_row 1194453 839.223ns 0.503ns 834.952ns 842.841ns
memtable.one_partition_many_rows 133785 7.477us 4.492ns 7.473us 7.507us
memtable.many_partitions_one_row 60267 16.680us 18.027ns 16.592us 16.700us
memtable.many_partitions_many_rows 4975 201.048us 144.929ns 200.822us 201.699us
./before_sq ./after_sq diff
read 337373.86 353694.24 4.8%
write 388759.99 394135.78 1.4%
* https://github.com/pdziepak/scylla.git memtable-exception-safety/v2:
tests/perf: add microbenchmarks for memtable reader
flat_mutation_reader: add allocation point in push_mutation_fragment
linearization_context: remove non-trivial operations from fast path
lsa: split alloc section into reserving and reclamation-disabled parts
lsa: optimise disabling reclamation and invalidation counter
mutation_fragment: allow creating clustering row in place
partition_snapshot_reader: minimise amount of retryable code
memtable: drop memtable_entry::read()
tests/memtable: add test for reader exception safety
Retryable code that has side effects is a recipe for bugs. This patch
reworks the snapshot reader so that the amount of logic run with
reclamation disabled is minimal and has very limited side effects.
Moving a clustering_row is expensive due to the amount of data stored
internally. Adding a mutation_fragment constructor that builds a
clustering_row in place saves some of that moving.
Most of the lsa gory details are hidden in utils/logalloc.cc. That
includes the actual implementation of a lsa region: region_impl.
However, there is code in the hot path that often accesses the
_reclaiming_enabled member as well as its base class
allocation_strategy.
In order to optimise those accesses another class is introduced:
basic_region_impl that inherits from allocation_strategy and is a base
of region_impl. It is defined in utils/logalloc.hh so that it is
publicly visible and its member functions are inlineable from anywhere
in the code. This class is supposed to be as small as possible, but
contain all members and functions that are accessed from the fast path
and should be inlined.
An allocating section reserves a certain amount of memory, then disables
reclamation and attempts to perform the given operation. If that fails due
to std::bad_alloc the reserve is increased and the operation is retried.
Reserving memory is expensive while just disabling reclamation isn't.
Moreover, the code that runs inside the section needs to be safely
retryable. This means that we want the amount of logic running with
reclamation disabled to be as small as possible, even if it means entering
and leaving the section multiple times.
In order to reduce the performance penalty of such a solution, the memory
reserving and reclamation disabling parts of the allocating sections are
separated.
Since linearization_context is thread_local every time it is accessed
the compiler needs to emit code that checks if it was already
constructed and does so if it wasn't. Moreover, upon leaving the context
from the outermost scope the map needs to be cleared.
All these operations impose some performance overhead and aren't really
necessary if no buffers were linearised (the expected case). This patch
rearranges the code so that linearization_context is trivially
constructible and the map is cleared only if it was modified.
Exception safety tests inject a failure at every allocation and verify
whether the error is handled properly.
push_mutation_fragment() adds a mutation fragment to a circular_buffer,
in theory any call to that function can result in a memory allocation,
but in practice that depends on the implementation details. In order to
improve the effectiveness of the exception safety tests this patch adds
an explicit allocation point in push_mutation_fragment().
"This patchset makes index_reader consume promoted index incrementally
on demand as the reader advances through the current partition instead
of storing the entire promoted index which can be huge.
When the current page is parsed, data for promoted indices are turned
into input streams that are only read and parsed if a particular
position within a partition is sought. This avoids potentially large
allocations for big partitions.
* 'issues/2981/v10' of https://github.com/argenet/scylla:
Use advance_past for single partition upper bound.
Remove obsolete types and methods.
Simplify continuous_data_consumer::consume_input() interface.
Parse promoted index entries lazily upon request rather than immediately.
Add helper input streams: buffer_input_stream and prepended_input_stream.
Support skipping over bytes from input stream in parsers based on continuous_data_consumer
Add performance tests for large partition slicing using clustering keys.
Before this patch, our Materialized Views implementation can produce
incorrect results when given concurrent updates of the same base-table
row. Such concurrent updates may result, in certain cases, in two
different rows added to the view table, instead of just one with the latest
data. In this patch we add locking which serializes the two conflicting
updates, and solves this problem. The locking for a single base-table
column_family is implemented by the row_locker class introduced in a
previous patch.
A long comment in the code of this patch explains in more detail why
this locking is needed, when, and what types of locks are needed: We
sometimes need to lock a single clustering row, sometimes an entire
partition, sometimes an exclusive lock and sometimes a shared lock.
Fixes #3168
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This is a unit test for the row_locker facility. It tests various
combinations of shared and exclusive locks on rows and on partitions,
some should succeed immediately and some should block.
This tests the row_locker's API only, it does not use or test anything
in Materialized Views.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a "row_locker" class providing locking (shard-locally) of
individual clustering rows or entire partitions, and both exclusive and
shared locks (a.k.a. reader/writer lock).
As we'll see in a following patch, we need this locking capability for
materialized views, to serialize the read-modify-update modifications
which involve the same rows or partitions.
The new row_locker is significantly different from the existing cell_locker.
The two main differences are that 1. row_locker also supports locking the
entire partition, not just individual rows (or cells in them), and that
2. row_locker supports also shared (reader) locks, not just exclusive locks.
For this reason we opted for a new implementation, instead of making large
modifications to the existing cell_locker. And we put the source files
in the view/ directory, because row_locker's requirements are pretty
specific to the needs of materialized views.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Dropping a user type requires that all tables using that type also be
dropped. However, a type may appear to be dropped at the same time as
a table, for instance due to the order in which a node receives schema
notifications, or when dropping a keyspace.
When dropping a table, if we build a schema in a shard through a
global_schema_pointer, then we'll check for the existence of any user
type the schema employs. We thus need to ensure types are only dropped
after tables, similarly to how it's done for keyspaces.
Fixes#3068
Tests: unit-tests (release)
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180129114137.85149-1-duarte@scylladb.com>
* seastar 770c450...19efbd9 (3):
> configure.py: add --static-yaml-cpp option to link libyaml-cpp statically
> Merge 'Avoid kernel stalls due to fsync' from Avi
> rwlock: add exception-safe lock/unlock alternative
This patch adds a scripts/find-maintainer script, similar to
script/get_maintainer.pl in Linux, which looks up maintainers and
reviewers for a specific file from a MAINTAINERS file.
Example usage looks as follows:
$ ./scripts/find-maintainer cql3/statements/create_view_statement.cc
CQL QUERY LANGUAGE
Tomasz Grabiec <tgrabiec@scylladb.com> [maintainer]
Pekka Enberg <penberg@scylladb.com> [maintainer]
MATERIALIZED VIEWS
Duarte Nunes <duarte@scylladb.com> [maintainer]
Pekka Enberg <penberg@scylladb.com> [maintainer]
Nadav Har'El <nyh@scylladb.com> [reviewer]
Duarte Nunes <duarte@scylladb.com> [reviewer]
The main objective of this script is to make it easier for people to
find reviewers and maintainers for their patches.
Message-Id: <20180119075556.31441-1-penberg@scylladb.com>
Instead of advancing to the next partition, first try to find a more
precise position using promoted index blocks.
advance_past() only seeks within currently available PI blocks (or reads
the first batch, if never read before) and uses the position if found,
otherwise resorts to advance_to_next_partition()
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
These types and methods are no longer in use since the index_reader is
now consuming promoted index incrementally.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Remove redundant input parameter as continuous_data_consumer derivatives
would only use themselves as a context. So take it internally and make
the function regular (non-template) with no parameters.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Now the promoted index is converted into an input_stream and skipped over
instead of being consumed immediately and stored as a single buffer.
The only part that is read right away is the deletion time, as it is
likely to be in the already-read buffer, and reading it should both
be cheap and avoid reading the whole promoted index if only the
deletion time mark is needed.
When accessed, the promoted index is parsed in chunks, buffer by buffer, to
limit memory consumption.
Fixes #2981
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
buffer_input_stream is a simple input_stream wrapping a single
temporary_buffer.
prepended_input_stream suits the case when some data has been read
into a buffer and the rest is still in a stream. It accepts a buffer and
a data_source, first reads from the buffer and then, when the buffer ends,
proceeds reading from the data_source.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Currently we don't check data_file_directories existence before running iotune,
so it shows an unclear error message.
To make the message better, check the directory existence in scylla_io_setup.
Fixes #3137
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1517200647-6347-1-git-send-email-syuu@scylladb.com>
* seastar d03896d...770c450 (10):
> tls_test: Fix echo test not setting server trust store
> tls: Do not restrict re-handshake to client
> tls: Actually verify client certificate if requested
> rwlock: add method for determining if an rwlock is locked
> metrics: Add missing `break` to metric_value::operator+()
> memory: fix error injector throwing from noexcept memory allocator functions
> systemwide_memory_barrier: don't use mprotect() on ARM
> sharded: Add const version of sharded::local()
> Add const overloads of front() and back() to the circular_buffer.
> Remove unused lambda captures
Fixes #3072
The exponential_backoff_retry instance is captured by move and is then
indirectly moved again as repeat_until_value() moves the lambda it's
passed into its internal state. This caused problems as internal
lambdas store references to the instance and these references go stale
after the move.
To fix this, keep hold of the exponential_backoff_retry instance in an
enclosing do_with() to make it safe for internal lambdas to reference
it.
Indentation will be fixed by the next patch.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <adc49d25a6176756d60e092f3713c0c897732382.1517222195.git.bdenes@scylladb.com>
It tests mutation_from_streamed_mutation that is no longer
used and will be removed in the next patch.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It tests freeze(streamed_mutation) which is no longer used
and will be removed in the next patch.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The labels in the database active_reads metrics were not defined correctly.
Labels should be created so that it is possible to select based on their
value.
The current implementation defines a label "class" with three instances:
user, streaming, system.
Fixes: #2770
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180123125206.23660-1-amnon@scylladb.com>
When the buffer of a flat reader is small, the reader can't
handle range_tombstones correctly.
This is not a problem in production, where the buffer is large.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Before, when the buffer was so small that it could fit only a single
range_tombstone, cache_flat_mutation_reader would keep returning
the same tombstone over and over again.
The fix is to set _lower_bound to the next fragment we want to return.
Fixes #3139
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
timeout parameter was captured by reference, and could be accessed out
of scope in case the repeat loop deferred.
Fixes debug-mode failure of flat_mutation_reader_test.
Message-Id: <1516699230-19545-1-git-send-email-tgrabiec@scylladb.com>
The reason sstable key estimation is inaccurate is that it doesn't account for
the fact that index sampling is now dynamic.
The estimation is done as follows:
    uint64_t get_estimated_key_count() const {
        return ((uint64_t)_components->summary.header.size_at_full_sampling + 1) *
            _components->summary.header.min_index_interval;
    }
The biggest problem is that _components->summary.header.min_index_interval isn't
actually the minimum interval, but instead the default interval value set in the
schema.
So the estimation gets worse the larger the average partition, because the larger
the average partition the lower the index sampling interval.
One of the problems is that estimation has a big influence on bloom filter size,
and so for large partitions we were generating bigger filters than we had to.
From now on, size at full sampling is calculated as if sampling were static
(which was the case until commit 8726ee937d which introduced size-based
sampling), using minimum index as a strict sampling interval.
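As a concrete illustration of the static-sampling estimate described above (the inputs in the test are made up, not taken from a real sstable):

```cpp
#include <cstdint>

// Mirrors the quoted estimate: with min_index_interval treated as a
// strict sampling interval, a summary holding N entries at full
// sampling covers roughly (N + 1) * interval keys.
uint64_t estimated_key_count(uint64_t size_at_full_sampling,
                             uint64_t min_index_interval) {
    return (size_at_full_sampling + 1) * min_index_interval;
}
```

For example, a summary with 999 entries at full sampling and the default interval of 128 yields an estimate of (999 + 1) * 128 = 128000 keys.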
Tests: units (release)
Fixes #3113.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180122233612.11147-1-raphaelsc@scylladb.com>
"It turned out that decimal numbers that were obtained as a cast from integers
should always contain just one decimal place, 0.
This can be recognised especially when calculating avg(.) over such numbers,
because the result contains just one decimal place.
Fixes #3111."
* 'danfiala/integers-to-decimal' of github.com:hagrid-the-developer/scylla:
tests: Add test that decimal obtained as CAST from integer always contain one decimal place.
types: Decimal that is obtained from integer always contain one decimal place.
this patch fixes the following remarks:
./defaults.py:2:9: E126 continuation line over-indented for hanging indent
./fake.py:15:1: E305 expected 2 blank lines after class or function definition, found 1
./livedata.py:49:17: F402 import 'metric' from line 5 shadowed by loop variable
./scyllatop.py:44:1: E305 expected 2 blank lines after class or function definition, found 1
Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180119162939.17866-1-ultrabug@gentoo.org>
After the new compaction controller code, the monitor has to be kept
alive until the sstable is added to the SSTable set.
This is correctly handled for all the writers, except the big streaming
one. That flusher is a bit confusing, as it builds an sstable list first
and only later adds the elements in the list to the sstable set. The
monitors are destroyed at the end of phase 1, so we will SIGSEGV later
when calling add_sstable().
The fix for this is to make sure the lifetime of the monitors is tied
to the lifetime of the sstables being handled by the big streaming
flush process.
Caught by dtests, update_cluster_layout_tests.py:TestUpdateClusterLayout.add_node_with_large_partition3_test
Fixes #3131
Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout.add_node_with_large_partition3_test now passes.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180118202230.17107-1-glauber@scylladb.com>
This adds a registration of the V2 swagger file.
V2 uses the Swagger 2.0 format; the initial definition is empty and can
be reached at:
http://localhost:10000/v2
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
In Swagger 2.0 the whole API is exported as a single file.
The header part of the file contains general information. It is stored
as an external file so it will be easy to modify when needed.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
"Changes merging in MVCC to apply newer version to older instead of older to
newer.
Before (v0 = oldest):
(((v3 + v2) + v1) + v0)
After:
(v0 + (v1 + (v2 + v3)))
or:
(((v0 + v1) + v2) + v3)
There are several reasons to do this:
1) When continuity merging will change semantics to support eviction
from older versions, it will be easier to implement apply() if we
can assume that we merge newer to older instead of older to
newer, since newer version may have entries falling into a
continuous interval in older, but not the other way around. If we
didn't revert the order, apply() would have to keep track of
lower bound of a continuous interval in the right-hand side
argument (older version) as it is applied and update continuity
flags in the left hand side by scanning all entries overlapping
with it. If order is reversed, merging only needs to deal with
the current entry. Also, if we were to keep the old order, we
cannot simply move entries from the left hand side as we merge
because we need to keep track of the lower bound of a continuous
interval, and we need to provide monotonic exception
guarantees. So merging would be both more complicated and slower.
2) With large partitions older versions are typically larger than
newer versions, and since merging is O(N_right*(1 + log(N_left))),
it's better to merge newer into older.
This fixes latency spikes seen in perf_cache_eviction.
Fixes #2715."
* tag 'tgrabiec/reverse-order-of-mvcc-version-merging-v1' of github.com:scylladb/seastar-dev:
mvcc: Reverse order of version merging
anchorless_list: Introduce last()
mvcc: Implement partition_entry::upgrade() using squashed()
mvcc: Extract version merging functions
mutation_partition: Add rows_entry::set_dummy()
position_in_partition: Introduce after_key()
$ gcc --version
gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
The following code
struct S
{
S(int i = 42);
};
void f()
{
S( {} );
}
produces this assembly with g++ --std=c++14
lea rax, [rbp-1]
mov esi, 0
mov rdi, rax
call S::S(int)
and this one with g++ --std=c++17
lea rax, [rbp-1]
mov esi, 42
mov rdi, rax
call S::S(int)
For more details about compiler bug, check:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83937
NOTE: clang isn't affected by it.
The test relied on braced initialization of compressor (an enum class)
working properly when used as an argument to compression_parameters's
ctor. Braced initialization of an integer-based type should yield zero,
but the default argument (lz4) was used instead, which means compression
was enabled when it shouldn't have been.
The course of action is to work around the bug by explicitly setting
compressor type to none.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180119013655.32564-1-raphaelsc@scylladb.com>
Change merging to apply newer version to older instead of older to
newer.
Before:
(((v3 + v2) + v1) + v0)
After:
(v0 + (v1 + (v2 + v3)))
or equivalent:
(((v0 + v1) + v2) + v3)
There are several reasons to do this:
1) When continuity merging will change semantics to support eviction
from older versions, it will be easier to implement apply() if we
can assume that we merge newer to older instead of older to
newer, since newer version may have entries falling into a
continuous interval in older, but not the other way around. If we
didn't revert the order, apply() would have to keep track of
lower bound of a continuous interval in the right-hand side
argument (older version) as it is applied and update continuity
flags in the left hand side by scanning all entries overlapping
with it. If order is reversed, merging only needs to deal with
the current entry. Also, if we were to keep the old order, we
cannot simply move entries from the left hand side as we merge
because we need to keep track of the lower bound of a continuous
interval, and we need to provide monotonic exception
guarantees. So merging would be both more complicated and slower.
2) With large partitions older versions are typically larger than
newer versions, and since merging is O(N_right*(1 + log(N_left))),
it's better to merge newer into older.
Fixes #2715.
This patch changes single_column_primary_key_restrictions::values() to
return values obtained via components() instead of the serialized form
that's returned by representation(). We need this to turn clustering key
restriction keys into partition keys for clustering key indexed queries.
Raphael recently caught this test failing. I can't really reproduce it,
but it seems to me that it is a timing issue: we execute two different
statements, each of which should time out after 10ms. After 20ms, we make sure
that they both timed out.
They don't (on his system), which is explained by the fact that we are
no longer using high resolution clocks for the timeouts. Expirations for
lowres clocks will only happen every 10ms, and in the worst case we
will miss two.
So the fix I am proposing here is to just account for potential
inaccuracies in the clocks and calculations by waiting a bit longer.
Ideally, we would use the manual clock for this. But in this case, this
would mean adding template parameters to pretty much all of the
mutation_reader path.
Currently, not only does the test fail, it also has a use-after-free
SIGSEGV. That happens because we give up on the reader while the
timeout is still to happen.
It is the caller's responsibility to ensure the lifetime of the reader
is correct. Dealing with that cleanly would require a cancellation
mechanism that we don't have, so we'll just add an assertion that will
fail more gracefully than the SIGSEGV.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are not propagating timeouts to fast_forward_to in the
mutation_reader_test. This is not currently causing any issue, but I
noticed it while chasing one - so let's fix it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This will make boost UTF abort execution on SIGABRT rather than trying
to continue running other test cases, which doesn't work well with the
seastar integration: the suite will hang.
Message-Id: <1516205469-16378-1-git-send-email-tgrabiec@scylladb.com>
Since we sometimes recommend that the user update to a newer kernel,
it's good to compile support for features that the new kernel supports.
Rather than play games with build-time dependencies, just #define
those features in. It's ugly, but better than depending on third-party
repositories and handling package conflicts.
Message-Id: <20180115143129.22190-1-avi@scylladb.com>
"The previous series handled passing a copy of the client_state from process_request(...)
to process_request_one(...). However, the modified copy of the client_state is returned by
process_request_one(...) back to process_request(...), and handling of this direction was missing
in the previous series.
This series completes the #2351 fix."
* 'fix-round-robin-cont-v2' of https://github.com/vladzcloudius/scylla:
transport::cql_server::process_request_one: return only the required information instead of the whole client_state object
service::client_state: move auth_state from cql_server::connection to service::client_state
transport::cql_server: don't cache sasl_challenge object in the cql_server::connection
service::client_state::merge(): remove not needed timestamp merge
"_free_segments_in_zones is not adjusted by
segment_pool::reclaim_segments() for empty zones on reclaim under some
conditions. For instance when some zone becomes empty due to regular
free() and then reclaiming is called from the std allocator, and it is
satisfied from a zone after the one which is empty. This would result
in the free memory in such a zone appearing to be leaked, due to a corrupted
free segment count, which may cause a later reclaim to fail. This
could result in bad_allocs.
The fix is to always collect such zones.
Fixes #3129
Refs #3119
Refs #3120"
* 'tgrabiec/fix-free_segments_in_zones-leak' of github.com:scylladb/seastar-dev:
tests: lsa: Test _free_segments_in_zones is kept correct on reclaim
lsa: Expose max_zone_segments for tests
lsa: Expose tracker::non_lsa_used_space()
lsa: Fix memory leak on zone reclaim
_free_segments_in_zones is not adjusted by
segment_pool::reclaim_segments() for empty zones on reclaim under some
conditions. For instance when some zone becomes empty due to regular
free() and then reclaiming is called from the std allocator, and it is
satisfied from a zone after the one which is empty. This would result
in the free memory in such a zone appearing to be leaked, due to a corrupted
free segment count, which may cause a later reclaim to fail. This
could result in bad_allocs.
The fix is to always collect such zones.
Fixes #3129
Refs #3119
Refs #3120
The call chain is:
storage_service::on_change() -> storage_service::handle_state_removing()
-> storage_service::restore_replica_count() -> streamer->stream_async()
Listeners run as part of gossip message processing, which is serialized.
This means we won't be processing any gossip messages until streaming
completes.
In fact, there is no need to wait for restore_replica_count to complete,
which can take a long time, since when it completes, this node will send
a notification to tell the removal_coordinator that the restore process is
finished on this node. This node will be removed from _replicating_nodes
on the removal_coordinator.
Tested with update_cluster_layout_tests.py
Fixes #2886
Message-Id: <8b4fe637dfea6c56167ddde3ca86fefb8438ce96.1516088237.git.asias@scylladb.com>
Commit 2d5fb9d109 (gms/gossiper: Replicate changes incrementally to
other shards) changes the way we replicate _token_metadata and
endpoint_state_map. Before, they were replicated at the same time; after,
they are not any more. As a result, a shard in NORMAL status can still
have an empty _token_metadata.
We saw errors:
[shard 12] token_metadata - sorted_tokens is empty in first_token_index!
during CorruptThenRepairNemesis.
Fix by setting the gossip status to NORMAL after replication of
_token_metadata, so that once a node is in NORMAL, we can do repair. The
commit 69c81bcc87 (repair: Do not allow repair until node is in NORMAL
status) prevents the early repair operation by checking if a node is in
NORMAL status.
Fixes #3121
Message-Id: <af6a223733d2e11351f1fa35f59eacfa7d65dd30.1516065564.git.asias@scylladb.com>
That's required after fa5a26f12d, because sstable write fails when sharding
metadata is empty due to a lack of keys that belong to the current shard.
make_local_key* were moved to a header to avoid compiling sstable_utils.cc into
all those tests that rely on simple_schema.hh, which is a lot of them.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180116052052.7819-1-raphaelsc@scylladb.com>
client_state used in process_request_one(...) contains all sorts of information irrelevant
to the caller (process_request(...)), e.g. tracing state. Therefore, instead of returning
the whole client_state object (which becomes an even bigger problem if process_request(...) and process_request_one(...)
are executed on different shards), we will return only the pieces of information we really need.
To do that we introduce a new class, processing_result, which is cross-shard-access-ready to begin with.
We are going to return an instance of this new class from process_request_one(...).
Fixes #2351
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Move the requests-handling-related state into the client_state. This is needed to properly
define the interface between the process_request(...) and process_request_one(...).
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
The benefit of such caching is rather limited because the object is likely to be used exactly once
and then destroyed anyway (in case of a successful authentication).
If the authentication has failed, no harm is done if we create this object again when
needed.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Since the connection::_client_state is now the only generator of new timestamps,
there is no need for this merge.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
"After this patchset it's only possible to create a mutation_source with a function that produces flat_mutation_reader."
* 'haaawk/mutation_source_v1' of ssh://github.com/scylladb/seastar-dev:
Merge flat_mutation_reader_mutation_source into mutation_source
Remove unused mutation_reader_mutation_source
Remove unused mutation_source constructor.
Migrate make_source to flat reader
Migrate run_conversion_to_mutation_reader_tests to flat reader
flat_mutation_reader_from_mutations: add support for slicing
Remove unused mutation_source constructor.
Migrate partition_counting_reader to flat reader
Migrate throttled_mutation_source to flat reader
Extract delegating_reader from make_delegating_reader
row_cache_test: call row_cache::make_flat_reader in mutation_sources
Remove unused friend declaration in flat_mutation_reader::impl
Migrate make_source_with to flat reader
Migrate make_empty_mutation_source to flat reader
Remove unused mutation_source constructor
Migrate test_multi_range_reader to flat reader
Remove unused mutation_source constructors
* seastar a7a3e6f...d03896d (11):
> Update dpdk submodule
> Merge "C++17 aligned allocations" from Avi
> Prometheus should check that the iterator is valid before using it
> future-util: failure to allocate internal state is unrecoverable
> Merge "Introduce simple microbenchmarking framework" from Paweł
> tutorial: document debuging ignored exceptions
> Revert "Merge "Introduce simple microbenchmarking framework" from Paweł"
> Merge "Introduce simple microbenchmarking framework" from Paweł
> tests/futures: add more tests for parallel_for_each()
> Add a prometheus.md file
> prometheus: Support metric family name parameter
"Added support for min/max functions over date/timestamp/timeuuid.
There was one issue with Scylla's type system internals: no C++ type
was mapped to these types. So special "native_types" were added for them.
It required some changes to native functions because these types don't support
the same operations as their real native counterparts.
Fixes #3104."
* 'danfiala/3104-v1' of https://github.com/hagrid-the-developer/scylla:
tests: Tests for min/max aggregate functions over date/timestamp and timeuuid.
functions: Added min/max functions for date/timestamp/timeuuid.
types: Added native types for timestamp and timeuuid.
Advertise compatibility with CQL Version 3.3.2, since CAST functions are supported.
Fixes argument misquoting at $SRPM_OPTS expansion for the mock commands
and makes the --jobs argument work as intended.
Signed-off-by: Mika Eloranta <mel@aiven.io>
Message-Id: <20180113212904.85907-1-mel@aiven.io>
Test fails after fa5a26f12d because generated sstable doesn't contain data for the
shard it was created at, so sharding metadata is empty, resulting in exception
added in the aforementioned commit. That's fixed by using the new make_local_key()
to generate data that belongs to current shard.
make_local_keys(), on top of which make_local_key() is built, will be useful
to make the sstable tests work again with any smp count.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180114032025.26739-1-raphaelsc@scylladb.com>
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
want no timeout there at all.
We already apply such configuration for requests waiting in the queued
sstable queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.
This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.
In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
the the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.
We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.
After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.
Fixes #2462
* git@github.com:glommer/scylla.git timeouts-v8.1:
database: delete unused function
consolidate timeout_clock
mutation_query: add a timeout to the mutation query path
flat_mutation_reader: pass timeout down to consume()
add a timeout to fill_buffer
add a timeout to fast forward to
restricted_mutation_reader: don't pass timeouts through the config
structure
allow request-specific read timeouts in storage proxy reads
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
want no timeout there at all.
We already apply such configuration for requests waiting in the queued
sstable queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.
This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.
In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
the the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.
We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.
After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.
Fixes #2462
Implementation notes, and general comments about open discussion in 2462:
* Because we are not bypassing the timeout, just setting it high enough,
I consider the concerns about the batchlog moot: if we fail for any
other reason, that will be propagated. As a last resort, because the timeout
is per-CF, we could do what we do for the dirty memory manager and
move the batchlog alone to use a different timeout setting.
* Storage proxy likes specifying its timeouts as a time_point, whereas
when we get low enough as to deal with the read_concurrency_config,
we are talking about deltas. So at some point we need to convert time_points
to durations. We do that in the database query functions.
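The conversion itself is the usual chrono subtraction; a minimal sketch (the clock alias and function name are illustrative, standing in for db::timeout_clock and the database query path):

```cpp
#include <chrono>

// Illustrative stand-in for db::timeout_clock.
using timeout_clock = std::chrono::steady_clock;

// The storage proxy hands down an absolute deadline (a time_point);
// at the database layer we turn it into the delta still remaining.
timeout_clock::duration remaining(timeout_clock::time_point deadline,
                                  timeout_clock::time_point now) {
    return deadline - now;
}
```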
v2:
- use per-request instead of per-table timeouts.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch enables passing a timeout to the restricted_mutation_reader
through the read path interface -- using fill_buffer and friends. This
will serve as a basis for having per-request timeouts.
The config structure still has a timeout, but that is so far only used
to actually pass the value to the query interface. Once that starts
coming from the storage proxy layer (next patch) we will remove it.
The query callers are patched so that we pass the timeout down. We patch
the callers in database.cc, but leave the streaming ones alone. That can
be safely done because the default for the query path is now no_timeout,
and that is what the streaming code wants. So there is no need to
complicate the interface to allow for passing a timeout that we intend
to disable.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In the last patch, as part of enabling per-request timeouts, we enabled
timeouts in fill_buffer. There are many places, though, in which we
fast_forward_to before we fill_buffer, so in order to make that
effective we need to propagate the timeouts to fast_forward_to as well.
In the same way as fill_buffer, we make the argument optional wherever
possible in the high level callers, making them mandatory in the
implementations.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
As part of the work to enable per-request timeouts, we enable timeouts
in fill_buffer.
The argument is made optional at the main classes, but mandatory in all
the ::impl versions. This way we'll make sure we didn't forget anything.
At this point we're still mostly passing that information around and
don't have any entity that will act on those timeouts. In the next patch
we will wire that up.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We pass the timeout that we received from data_query/mutation_query
down to consume, which is responsible for actually reading the data.
To make those timeouts actionable, though, we'll have to patch
fill_buffer(). This will happen in the next patch.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
data_query and mutation_query are patched so that they start accepting a
per-query timeout. We will default to no timeout, so no callers
need to be changed yet.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
At the moment, various subsystems each use their own idea of what a
timeout_clock is. This makes it a bit harder to pass
timeouts between them because although most are actually a lowres_clock,
that is not guaranteed to be the case. As a matter of fact, the timeout
for restricted reads is expressed as nanoseconds, which is not a valid
duration in the lowres_clock.
As a first step towards fixing this, we'll consolidate all of the
existing timeout_clocks in one, now called db::timeout_clock. Other
things that tend to be expressed in terms of that clock--like the fact
that the maximum time_point means no timeout, and a semaphore that
wait()s with that resolution--are also moved to the common header.
In the upcoming patch we will fix the restricted reader timeouts to
be expressed in terms of the new timeout_clock.
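A sketch of the resulting convention (names illustrative; the real clock is db::timeout_clock, which is lowres-based rather than steady_clock):

```cpp
#include <chrono>

// Illustrative stand-in for db::timeout_clock.
using timeout_clock = std::chrono::steady_clock;

// By convention, the maximum time_point means "no timeout".
constexpr timeout_clock::time_point no_timeout = timeout_clock::time_point::max();

// A deadline check that honours the no-timeout sentinel.
bool timed_out(timeout_clock::time_point deadline, timeout_clock::time_point now) {
    return deadline != no_timeout && now >= deadline;
}
```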
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Scylla's configure.py calls seastar/configure.py and uses seastar.pc
that it produces to generate Scylla's build.ninja. However, there is no
appropriate dependency in build.ninja and changes to
seastar/configure.py alone do not trigger regeneration of Scylla's
build.ninja. This patch remedies that problem.
Message-Id: <20180111144237.5259-1-pdziepak@scylladb.com>
On Debian 9, 'pbuilder create' fails because of lack of GPG key for
3rdparty repo, so we need --allow-untrusted on 'pbuilder create' and
'pbuilder update'.
Also, apt-key adv --fetch-keys does not work correctly on it, but we can use
"curl <URL> | apt-key add -" as a workaround.
Fixes #3088
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1513797714-18067-1-git-send-email-syuu@scylladb.com>
If the compaction_backlog_manager's lifetime ends before the linked
compaction_backlog_tracker's, the latter's _manager pointer is not
cleared, which can lead to a use-after-free when running
~compaction_backlog_tracker(), as evidenced by failing unit tests.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180111004914.25796-2-duarte@scylladb.com>
The legacy mutation_reader/streamed_mutation design made it very easy
to skip the partition merging logic if there was only one underlying
reader that has emitted it.
That optimisation was lost after the conversion to flat mutation readers,
which impacted performance. This patch mostly recovers it by
bypassing most of mutation_reader_merger logic if there is only a single
active reader for a given partition.
The performance regression was introduced in
8731c1bc66 "Flatten the implementation of
combined_mutation_reader".
perf_simple_query -c4 read results (medians of 60):

original regression:
          before 8731c1   after 8731c1   diff
    read  326241.02       300244.09      -8.0%

this patch:
          before          after          diff
    read  313882.59       325148.05      3.6%
Message-Id: <20180103121019.764-1-pdziepak@scylladb.com>
"This series revives the round-robin load balancing added by Pekka back in 2015.
If somebody tries to enable it with the current master it would quite quickly
lead to a crash due to a few unresolved issues in the corresponding code.
Fixes #2351
Fixes #3118"
* 'fix-round-robin-balancing-v2' of github.com:vladzcloudius/scylla:
transport::server::process_request(): avoid extra copy of the client_state
service::cql_server::connection::process_request: use client_state "request copy" constructor
service::client_state: introduce "request copy" copy-constructor
service::storage_service: add the get_local_auth_service() accessor
service::client_state: remove the unused _tracing_session_id field
Don't use submit_to(...) when we are going to handle the request on the local
shard. Otherwise there is an unneeded copy of the _client_state in the submit_to(...)
lambda capture list.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Create a cross-shard copy of the client_state object, give it to the single-request handling
function, and give it a timestamp generated by the original client_state instance (which is promised
to be monotonic).
Fixes #3118
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
A new constructor creates a copy of the current client_state to be
used in the context of handling a single request.
The copy may take place on a shard different from the one where the
request has been received.
In order to ensure the monotonicity of the timestamps used by requests handled
on the same connection, the created copy of the client_state is going to use the timestamp provided by the
caller instead of generating one.
It's the caller's responsibility to ensure the monotonicity of the given timestamps.
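The monotonic timestamp generation the caller is expected to provide can be sketched like this (types and names are illustrative, not the actual client_state interface):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative timestamp generator: each timestamp is strictly greater
// than the last one handed out, even if the wall clock goes backwards.
struct timestamp_source {
    int64_t _last = 0;

    int64_t next(int64_t wall_clock_micros) {
        // Take the wall clock if it moved forward, otherwise bump by one.
        _last = std::max(_last + 1, wall_clock_micros);
        return _last;
    }
};
```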
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
sprint() may need to allocate a significant amount of memory if the mutation
is large, and cause bad_alloc in
row_cache_test::test_concurrent_reads_and_eviction.
Message-Id: <1515486454-4913-1-git-send-email-tgrabiec@scylladb.com>
The uninitialized session has no peer associated with it yet. There is
no point sending the failed message when aborting the session. Sending the
failed message in this case will send it to a peer with an uninitialized
dst_cpu_id, which will cause the receiver to pass a bogus shard id to
smp::submit_to, which causes a segfault.
In addition, to be safe, initialize dst_cpu_id to zero, so that an
uninitialized session will send messages to shard zero instead of a random
bogus shard id.
Fixes the segfault issue found by
repair_additional_test.py:RepairAdditionalTest.repair_abort_test
Fixes #3115
Message-Id: <9f0f7b44c7d6d8f5c60d6293ab2435dadc3496a9.1515380325.git.asias@scylladb.com>
After 611774b, we're blind again on which sstable caused a compaction
to fail, leaving us with a cryptic message as follows:
compaction_manager - compaction failed: std::runtime_error (compressed
chunk failed checksum)
After this change, a read failure in either compaction or a regular read
will report the guilty sstable, see:
compaction_manager - compaction failed: std::runtime_error (SSTable reader
found an exception when reading sstable ./data/.../keyspace1-standard1
ka-1-Data.db : std::runtime_error(compressed chunk failed checksum))
Fixes#3006.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180102230752.14701-1-raphaelsc@scylladb.com>
"This patchset implements the compaction controller for I/O shares. The
goal is to automatically adjust compaction shares based on a
strategy-specific backlog. A higher backlog will translate into higher
shares.
As compaction progresses, that reduces the backlog. As new data is
flushed, that increases the backlog. The goal of the controller is to
keep the backlog at a certain level, so that we go neither
too fast nor too slow.
Tracking reads and writes:
==========================
Tracking of reads and writes happens through the read_monitor and the
write_monitor. The write monitor is an existing interface that has the
purpose of releasing the write permit at particular points of the write
process. We enhance it to get a reference to an instance that tracks
the current offset inside the sstables::file_writer. This way the
backlog tracker can always know for sure what the offset of the
current write is.
A similar thing is done for reads. The data_consumer already tracks the
position of the current read, and we isolate that into a structure to
which we can get a reference. A read_monitor allows us to connect the
compaction to that reference.
Lifetime management:
====================
In general, tracking objects will be owned by their callers and passed
down as references. The compaction object will own the read monitors and
the compaction write monitors and the memtable flush write monitor will
be kept alive in a do_with block around the flush itself.
The backlog_{write,read}_progress_manager needs to be kept alive until
the SSTable is no longer in progress. For writes, that means until we
are able to add the SSTable charges in full, and for reads (compaction)
that means until we are able to remove the charges in full.
It is important to do that to avoid spikes in the graph. If we remove
the progress managers in a different operation than updating the SSTable
list we will be left in a temporary state where charges appear or
disappear abruptly, to be fixed when the final
add_sstable/remove_sstable happens. So we want those things to happen
together.
The compaction_backlog_tracker is kept alive until the strategy changes,
for example, through ALTER TABLE. Current charges are transferred to the
new strategy's compaction_backlog_tracker object when we do that. If the
type of strategy changes, the current read charges are forgotten. We can
do that because those running compaction will not really contribute to
decrease the backlog of the new compaction strategy.
Transfer of Charges
==================
When ALTER TABLE happens, we need to transfer ongoing writes to the new
backlog manager. Ongoing reads will still be tracked by the
backlog_manager that originated them.
The rationale for that is that reads still belong to the current
compaction, with the strategy that generated them. But new SSTables being
written will add to the backlog of the new strategy.
Note that ALTER TABLE operations do not necessarily cause a change of
strategy. We can be using the same strategy but just changing
properties. If that is the case, we expect no discontinuity in the
backlog graph (tested).
Resharding
==========
Resharding compactions are more complex than normal compactions because
the SSTables are created in one shard and later sent to another shard.
It is better, then, to track resharding compactions separately and let
them have their own backlog tracker, which will insert backlog in
proportion to the amount of data to be resharded.
Memtable Flush I/O Controller
=============================
With the current infrastructure it becomes trivial to add a new
controller, for either I/O or CPU. This patchset then adds an I/O
controller for memtable flushes, using the same backlog algorithm that
we already used for CPU."
* 'compaction-controller-io-v5' of github.com:glommer/scylla:
database: add a controller for I/O on memtable flushes.
document the compaction controller
compaction: adjust shares for compactions
backlog_controllers: implement generic I/O controller
factor out some of the controller code
io shares: multiply all shares by 10
compaction_strategy: implement backlog manager for the SizeTiered strategy
infrastructure for backlog estimator for compaction work.
sstables: notify about end of data component write
sstables: add read_monitor_generator
sstables: add read_monitor
sstables: enhance data consumer with a position tracker
sstables: enhance the file_writer with an offset tracker
sstables: pass references instead of pointers for write_monitor
compaction: control destruction of readers
'"The issue is triggered by compaction of sstables of level higher than 0.
The problem happens when the interval map of the partitioned sstable set stores
intervals such as the following:
[-9223362900961284625 : -3695961740249769322 ]
(-3695961740249769322 : -3695961103022958562 ]
When the selector is called for the first interval above, the exclusive lower
bound of the second interval is returned as the next token, but the
inclusiveness info is not returned.
So reader_selector was reporting that there *were* new readers when
the current token was -3695961740249769322, because it was stored in
the selector's position field as inclusive, but it's actually exclusive.
This false positive was leading to infinite recursion in the combined
reader, because the sstable set's incremental selector itself knew that
there were actually *no* new readers, and therefore *no* progress
could be made."
Fixes #2908.'
* 'high_level_compaction_infinite_recursion_fix_v4' of github.com:raphaelsc/scylla:
tests: test for infinite recursion bug when doing high-level compaction
Fix potential infinite recursion when combining mutations for leveled compaction
dht: make it easier to create ring_position_view from token
dht: introduce is_min/max for ring_position
If a node shuts itself down due to an I/O error (such as ENOSPC), then
nodetool status will show the cluster status at the time the shutdown
occurred.
In fact the node will be in shutdown status (nodetool gossipinfo shows
the correct status); however, `nodetool status` does not interpret the
shutdown status. Instead it uses the output of:
curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/gossiper/endpoint/live"
to decide if a node is in UN status.
To fix, do not include the node itself in the output of get_live_members.
Without this patch, when a node is shutdown due to I/O error:
UN 127.0.0.1 296.2 MB 256 ? 056ff68e-615c-4412-8d35-a4626569b9fd rack1
With this patch, when a node is shutdown due to I/O error:
?N 127.0.0.1 296.2 MB 256 ? 056ff68e-615c-4412-8d35-a4626569b9fd rack1
Fixes #1629
Message-Id: <039196a478b5b1a8749b3fdaf7e16cfe2eb73a2f.1498528642.git.asias@scylladb.com>
The algorithm and principle of operation is the same as the CPU
controller. It is, however, always enabled and we will operate on
I/O shares.
I/O-bound workloads are expected to hit the maximum once virtual
dirty fills up and stay there while the load is steady.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Compactions can be heavy disk users, and the I/O scheduler can always
guarantee that compaction gets its fair share of the disk.
Such fair share can, however, be a lot more than what compaction indeed
needs. This patch draws on the controllers infrastructure to adjust the
I/O shares that the compaction class will get so that compaction
bandwidth is dynamically adjusted.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The control algorithm we are using for memtables has proven itself
quite successful. We will very likely use the same for other processes,
like compactions.
Make the code a bit more generic, so that a new controller only has to
set the desired parameters.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The issue is triggered by compaction of sstables of level higher than 0.
The problem happens when the interval map of the partitioned sstable set stores
intervals such as the following:
[-9223362900961284625 : -3695961740249769322 ]
(-3695961740249769322 : -3695961103022958562 ]
When the selector is called for the first interval above, the exclusive lower
bound of the second interval is returned as the next token, but the
inclusiveness info is not returned.
So reader_selector was reporting that there *were* new readers when
the current token was -3695961740249769322, because it was stored in
the selector's position field as inclusive, but it's actually exclusive.
This false positive was leading to infinite recursion in the combined
reader, because the sstable set's incremental selector itself knew that
there were actually *no* new readers, and therefore *no* progress
could be made.
The fix is to use ring_position in reader_selector, so that
inclusiveness is respected
and reader_selector::has_new_readers() won't return a false positive
under the conditions described above.
Fixes #2908
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Fixes #3096
The credentials processing for transitional auth was broken
in ba6a41d, "auth: Switch to sharded service", which effectively removed
the "virtualization" of underlying auth in the SASL challenge.
As a quick workaround, add the permissive exception handling to the
sasl object as well.
Message-Id: <20180103102724.1083-1-calle@scylladb.com>
Returning token_endpoints when there are many tokens and endpoints can
take a long time.
This patch uses an output stream to return the result.
Instead of returning a vector, it uses the streaming functionality in
the json layer.
Fixes #2476
Message-Id: <20180103081907.5175-1-amnon@scylladb.com>
Technically all that matters is the proportion among the shares, so this
change is functionally a noop. However, the CPU scheduler being proposed
has shares that go all the way up to 1000. In the hopes of being able to
unify the I/O and CPU controllers one day, this patch brings the I/O shares
more in line with what Avi is doing for the CPU scheduler.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The SizeTiered backlog for a single SSTable is defined as:
Bi = Ei * log4(T / Si)
Where:
- Si is the size of this individual SSTable
- T is the sum of sizes for all individual SSTables
- Ei is the effective bytes in this SSTable.
The Effective size of an SSTable is:
- The uncompacted size for an SSTable under compaction
- The partially written size for an SSTable being written
- The SSTable size for an SSTable that is not undergoing
any of those processes.
The Aggregate Backlog for the entire Table is just the sum of
all individual SSTable backlogs, including the SSTables currently
being written.
Care is taken to avoid iterating over all SSTables, by separating
the aggregate backlog into a static component (sstables not changing) and
a component of SSTables that are undergoing change.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch adds infrastructure at various points in the system to allow
us to determine the amount of work present as backlog from compactions.
What needs to be done can be explained in three major pieces:
1) Add hooks in the points where sstables are added to or removed from a
column family (or more precisely, a compaction_strategy object).
2) Add hooks in the read and write monitors that allow a compaction
backlog estimator (tracker) to become aware of bytes that are
partially written and compacted away.
3) Add a per-column-family class (compaction_backlog_tracker) that
can be used to track work that is done and relevant to compactions
(like the two above), and a compaction manager to provide a
system-wide backlog based on the response of the individual trackers.
The definition of how much backlog one has is strategy-specific. The
Null strategy is easy, as it never really has any backlog, and so is the
major strategy - since what really matters is the backlog of the
underlying compaction strategy.
Although backlogs are strategy-specific, they should be "compatible", in
the sense that if a particular strategy has more work to do, it should
yield a higher number than its counterparts.
All the others are presented in this patch as unimplemented: they will
always advertise a mild backlog that should yield a constant
CPU-utilization if used alone.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We need to notify the monitor that the offset tracker that we are using is
about to be destroyed and will no longer be valid.
While we could modify the file_writer interface so that we could capture
the offset_tracker and take ownership of it - guaranteeing it is alive
until we reach the existing on_write_completed(), this feels like a
layer violation.
It is also potentially useful in general to offer monitor callers
the knowledge that writing the data portion is done.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Passing the read monitor down to the sstable readers is tricky. The
points of interest - like compaction - are usually very far from the
interfaces that register the monitor, like read_rows. Between the two,
there is usually a mutation_reader, which is and ought to be totally
unaware of the read monitor: technically, a mutation_reader may not even
know it is backed by sstables.
The solution is to create a read_monitor_generator, that can be passed
from the upper layers, like compaction, to the layers that are actually
making the decision of which sstables to create readers for.
Note that we don't need an equivalent piece of infrastructure for
writes, because writes don't happen through hidden layers and have all
the information they need to initialize their monitors.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Similar to the write_monitor, it will track the progress of an sstable
being read. In the current interface, we will notify interested users
about the current position in the data file.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Callers, like compactions, will be able to know at any time the current
progress of a read.
As we do that, the currently unimplemented position() method of
data_consume_context becomes redundant and is removed.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Callers, like the memtable flusher or compactions, will be able to find
out the current number of bytes written at any time.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This came from Avi's review on the read_monitors. He suggested we
shouldn't keep shared pointers, and should instead have the caller
ensure lifetime. That makes sense, but having the writer interface
use shared_ptr and the read interface use references would lead to
an inconsistent interface.
For the sake of consistency we will change the write monitor to take
references before we do that. From database.cc's perspective, we could
now keep the monitors in a do_with() block, but we will keep the
shared_ptrs to manage their lifetime in anticipation of upcoming patches
in this series, where we'll have to pass them somewhere else.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Compactions run from a seastar::thread, in run(). They will either fail
or succeed, and from the point of view of ordering of destruction
between the compaction object and its readers:
- if compaction succeeds, we have no control over who gets destructed
first, since both objects will be going out of scope.
- if it fails, we will forcibly destruct the compaction object, at
which point the readers are still alive.
From the point of view of lifetime management, it would be nice to make
sure that the compaction object outlives whichever other objects it
needs during compaction.
This nice-to-have will become paramount when we start adding
read_monitors to the compaction object, which have to themselves outlive
the readers.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"When we get two range tombstones with the same lower bound from
different data sources (e.g. two sstables), which need to be combined
into a single stream, they need to be de-overlapped, because each
mutation fragment in the stream must have a different position. If we
have range tombstones [1, 10) and [1, 20), the result of that
de-overlapping will be [1, 10) and [10, 20). The problem is that if
the stream corresponds to a clustering slice with upper bound greater
than 1, but lower than 10, the second range tombstone would appear as
being out of the query range. This is currently violating assumptions
made by some consumers, like cache populator.
One effect of this may be that a reader will miss rows which are in
the range (1, 10) (after the start of the first range tombstone, and
before the start of the second range tombstone), if the second range
tombstone happens to be the last fragment which was read for a
discontinuous range in cache and we stopped reading at that point
because of a full buffer and cache was evicted before we resumed
reading, so we went to reading from the sstable reader again. There
could be more cases in which this violation may resurface.
There is also a related bug in mutation_fragment_merger. If the reader
is in forwarding mode, and the current range is [1, 5], the reader
would still emit range_tombstone([10, 20]). If that reader is later
fast forwarded to another range, say [6, 8], it may produce fragments
with smaller positions which were emitted before, violating
monotonicity of fragment positions in the stream.
A similar bug was also present in partition_snapshot_flat_reader.
Possible solutions:
1) relax the assumption (in cache) that streams contain only relevant
range tombstones, and only require that they contain at least all
relevant tombstones
2) allow subsequent range tombstones in a stream to share the same
starting position (position is weakly monotonic), then we don't need
to de-overlap the tombstones in readers.
3) teach combining readers about query restrictions so that they can drop
fragments which fall outside the range
4) force leaf readers to trim all range tombstones to query restrictions
This patch implements solution no 2. It simplifies combining readers,
which don't need to accumulate and trim range tombstones.
I don't like solution 3, because it makes combining readers more
complicated, slower, and harder to properly construct (currently
combining readers don't need to know restrictions of the leaf
streams).
Solution 4 is confined to implementations of leaf readers, but also
has disadvantage of making those more complicated and slower.
There is only one consumer which needs the tombstones with monotonic positions, and
that is the sstable writer.
Fixes #3093."
* tag 'tgrabiec/fix-out-of-range-tombstones-v1' of github.com:scylladb/seastar-dev:
tests: row_cache: Introduce test for concurrent read, population and eviction
tests: sstables: Add test for writing combined stream with range tombstones at same position
tests: memtable: Test that combined mutation source is a mutation source
tests: memtable: Test that memtable with many versions is a mutation source
tests: mutation_source: Add test for stream invariants with overlapping tombstones
tests: mutation_reader: Test fast forwarding of combined reader with overlapping range tombstones
tests: mutation_reader: Test combined reader slicing on random mutations
tests: mutation_source_test: Extract random_mutation_generator::make_partition_keys()
mutation_fragment: Introduce range()
clustering_interval_set: Introduce overlaps()
clustering_interval_set: Extract private make_interval()
mutation_reader: Allow range tombstones with same position in the fragment stream
sstables: Handle consecutive range_tombstone fragments with same position
tests: streamed_mutation_assertions: Merge range_tombstones with the same position in produces_range_tombstone()
streamed_mutation: Introduce peek()
mutation_fragment: Extract mergeable_with()
mutation_reader: Move definition of combining mutation reader to source file
mutation_reader: Use make_combined_reader() to create combined reader
"delayed_tasks has a bug that if the object is destroyed while a timer
callback is queued, the callback will then try to access freed memory.
This series replaces the whole thing with sleep_abortable()."
* 'auth-delayed-tasks/v2' of https://github.com/duarten/scylla:
auth: Replace delayed_tasks with sleep_abortable
utils/exponential_backoff_retry: Add helper to automate retries
utils/exponential_backoff_retry: Add abort_source-based retry
Missed opportunity to feed the shard id to the sstable being written when
working on 67c5c8dc67, so that when the sstable is reopened after being sealed,
its shard doesn't need to be recomputed by the open procedure.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171231024529.13664-1-raphaelsc@scylladb.com>
delayed_tasks has a bug that if the object is destroyed while a timer
callback is queued, the callback will then try to access freed memory.
This could be fixed by providing a stop() function that waits for
pending callbacks, but we can just replace the whole thing by leveraging
the abort_source-enabled exponential_backoff_retry.
This patch adds the do_until_value static member function to
exponential_backoff_retry, which retries the specified function until
it returns an engaged optional.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Scylla metadata could be empty due to bugs like the one introduced by
115ff10. Let's make shard computation resilient to empty sharding
metadata by falling back to the approach that uses the first and last
keys to compute shards.
Refs #2932.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171223120140.3642-2-raphaelsc@scylladb.com>
An SSTable can end up with empty sharding metadata after a bug like
the one introduced in 115ff10, which results in tokens being
generated using the base table for the view table. That leads to the
sstable being deleted on a subsequent boot, because all shards will
agree on its deletion given that it will not belong to anybody,
and also to compaction crashing, because it relies on the resulting
sstable belonging to at least one shard.
I wouldn't like to spend days debugging it again because an sstable
write silently generated empty sharding metadata, so let's make the
write fail when it happens (see issue #2932 for details).
Refs #2932.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171223120140.3642-1-raphaelsc@scylladb.com>
Class optimized_optional was moved into seastar, and its usage
simplified so that move_and_disengage() is replaced in favour of
std::exchange(_, { }).
* seastar adaca37...b0f5591 (9):
> Merge "core: Introduce cancellation mechanism" from Duarte
> Fix Seastar build that no longer builds with --enable-dpdk after the recent commit fd87ea2
> noncopyable_function: support function objects whose move constructors throw
> Adding new hardware options to new config format, using new config format for dpdk device
> Fix check for Boost version during pre-build configuration.
> variant_utils: add variant_visitor constructor for C++17 mode
> Merge "Allows json object to be stream to an" from Amnon
> Merge 'Default to C++17' from Avi
> Add const version of subscript operator to circular_buffer
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171228112126.18142-1-duarte@scylladb.com>
After materialized views have been implemented (although not enabled by
default), unimplemented::cause::VIEWS is no longer used. I think we can
drop it.
By the way, there are other no-longer-used unimplemented reasons; we
should probably drop them too.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20171224131318.4893-1-nyh@scylladb.com>
We provided the "boost1.63" package for Debian 8 since we couldn't build the
"scylla-boost163" package which is available on Ubuntu 14/16, but I fixed the
problem and now we have it for Debian 8 too, so switch to it.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1514220163-25985-1-git-send-email-syuu@scylladb.com>
and make it a template to enable using it both with reference_wrapper
and flat_mutation_reader directly.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
When we get two range tombstones with the same lower bound from
different data sources (e.g. two sstables), which need to be combined
into a single stream, they need to be de-overlapped, because each
mutation fragment in the stream must have a different position. If we
have range tombstones [1, 10) and [1, 20), the result of that
de-overlapping will be [1, 10) and [10, 20). The problem is that if
the stream corresponds to a clustering slice with upper bound greater
than 1, but lower than 10, the second range tombstone would appear as
being out of the query range. This is currently violating assumptions
made by some consumers, like cache populator.
One effect of this may be that a reader will miss rows which are in
the range (1, 10) (after the start of the first range tombstone, and
before the start of the second range tombstone), if the second range
tombstone happens to be the last fragment which was read for a
discontinuous range in cache and we stopped reading at that point
because of a full buffer and cache was evicted before we resumed
reading, so we went to reading from the sstable reader again. There
could be more cases in which this violation may resurface.
There is also a related bug in mutation_fragment_merger. If the reader
is in forwarding mode, and the current range is [1, 5], the reader
would still emit range_tombstone([10, 20]). If that reader is later
fast forwarded to another range, say [6, 8], it may produce fragments
with smaller positions which were emitted before, violating
monotonicity of fragment positions in the stream.
A similar bug was also present in partition_snapshot_flat_reader.
Possible solutions:
1) relax the assumption (in cache) that streams contain only relevant
range tombstones, and only require that they contain at least all
relevant tombstones
2) allow subsequent range tombstones in a stream to share the same
starting position (position is weakly monotonic), then we don't need
to de-overlap the tombstones in readers.
3) teach combining readers about query restrictions so that they can drop
fragments which fall outside the range
4) force leaf readers to trim all range tombstones to query restrictions
This patch implements solution no 2. It simplifies combining readers,
which don't need to accumulate and trim range tombstones.
I don't like solution 3, because it makes combining readers more
complicated, slower, and harder to properly construct (currently
combining readers don't need to know restrictions of the leaf
streams).
Solution 4 is confined to implementations of leaf readers, but also
has disadvantage of making those more complicated and slower.
Fixes #3093.
"Remove old overloads that use mutation_reader."
* 'haaawk/combined_reader_clients_migration_v1_after_comments_2' of github.com:scylladb/seastar-dev:
Remove unused make_combined_reader overload.
Migrate test_fast_forwarding_combining_reader to flat reader
flat_mutation_reader_from_mutations: support partition_range
Don't pass fwd to flat_mutation_reader_from_mutations if it's no
Remove unused make_combined_reader overload.
Migrate test_combining_two_partially_overlapping_readers to flat reader
Migrate test_combining_two_non_overlapping_readers to flat reader
Migrate combined_mutation_reader_test to flat reader
Migrate test_sm_fast_forwarding_combining_reader to flat reader
Migrate test_combining_one_empty_reader to flat reader
Migrate test_combining_two_empty_readers to flat reader
Migrate test_combining_two_readers_with_one_reader_empty to flat reader
Migrate test_combining_one_reader_with_many_partitions to flat reader
Migrate test_combining_two_readers_with_the_same_row to flat reader
Migrate make_combined_mutation_source to flat reader
mutation_source: Add constructors for sources that ignore forwarding
This is needed to make it possible for
flat_mutation_reader_from_mutations to replace
make_reader_returning_many.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The default value for fwd is no, so there's no need to pass it explicitly.
This is important because we will add additional parameter to
flat_mutation_reader_from_mutations in next patch.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Migrate all the places that used memtable::make_reader to use
memtable::make_flat_reader and remove memtable::make_reader.
* seastar-dev.git haaawk/remove_memtable_make_reader_v2_rebased:
Remove memtable::make_reader
Stop using memtable::make_reader in row_cache_stress_test
Stop using memtable::make_reader in row_cache_test
Stop using memtable::make_reader in mutation_test
Stop using memtable::make_reader in streamed_mutation_test
Stop using memtable::make_reader in memtable_snapshot_source.hh
Stop using memtable::make_reader in memtable::apply
Add consume_partitions(flat_mutation_reader& reader, Consumer consumer)
Add default parameter values in make_combined_reader
Migrate test_virtual_dirty_accounting_on_flush to flat reader
Migrate test_adding_a_column_during_reading_doesnt_affect_read_result
Simplify flat_reader_assertions& produces(const mutation& m)
Migrate test_partition_version_consistency_after_lsa_compaction_happens
flat_mutation_reader: Allow setting buffer capacity
Add next_mutation() to flat_mutation_reader_assertions
cf::for_all_partitions::iteration_state: don't store schema_ptr
read_mutation_from_flat_mutation_reader: don't take schema_ptr
Migrate test_fast_forward_to_after_memtable_is_flushed to flat reader
Needed in tests to limit the amount of prefetching done by readers, so
that it's easier to test the interleaving of various events.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The following patches contain fixes for skipping to the next partition
in multi_range_reader and completely disable support for fast
forwarding inside a single partition, which is not needed and would only
add unnecessary complexity.
* https://github.com/pdziepak/scylla.git fix-multi_range_reader/v1:
flat_multi_range_mutation_reader: disallow
streamed_mutation::forwarding
flat_multi_range_mutation_reader: clear buffer on next_partition()
tests/flat_multi_range_mutation_reader: test skipping to next
partition
With Materialized Views, if the base table has static columns,
and an update to the base table mutates static and non-static rows,
the streamed_mutation is stopped before processing the non-static row.
The patch avoids stopping the streamed_mutation and adds a test case.
Message-Id: <20171220173434.25091-1-tavares.george@gmail.com>
end_bound was not updated in one of the cases in which end and
end_kind were changed; as a result, later merging decisions using
end_bound were incorrect. end_bound was using the new key, but the old
end_kind.
Fixes #3083.
Message-Id: <1513772083-5257-1-git-send-email-tgrabiec@scylladb.com>
"4b9a34a85425d1279b471b2ff0b0f2462328929c "Merge sstable_data_source
into sstable_mutation_reader" has introduced unintentional changes, some
of them causing excessive read amplification during empty range reads.
The following patches restore the previous behaviour."
* tag 'fix-read-amplification/v1' of https://github.com/pdziepak/scylla:
sstables: set _read_enabled to false if possible
sstables: set _single_partition_read for single partition reads
"Currently, compaction manager will serialize compaction of same size tier
(or weight) if they belong to the same column family. However, it fails to
do so if the compaction jobs belong to different column families.
That can lead to an ungodly amount of running compactions, which gets worse
the higher the number of shards and active column families. The problem
is that it may affect overall system performance due to excessive resource
usage. It's easy to trigger during bootstrapping after loading a node with
new sstables or repairing, or if lots of cfs are being actively written."
Fixes #1295.
* 'similar_sized_compaction_serialization_v4' of github.com:raphaelsc/scylla:
sstables: remove column_family from compaction_weight_registration
compaction_manager: serialize compaction of same size tier for different cfs
sstables: introduces deregister() and weight() to compaction_weight_registration
sstables: move compaction_weight_registration to its own header
sstables: improve compact_sstables() interface
compiler: gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
Problems introduced in f6a461c7a4
and 37b19ae6ba, respectively.
They both fail to compile due to use of a method in a lambda without
explicit mention of this. Some of the failures are fixed by not using
auto in the lambda parameter.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171218222144.12297-1-raphaelsc@scylladb.com>
* seastar 2b23547...adaca37 (7):
> Merge "Support for skipping over bytes from input stream in input_stream::consume" from Vladimir
> build: enforce Boost >= 1.58 during configuration.
> Tutorial: beginning of documentation of CPU scheduling et al.
> circular_buffer: make move-constructor noexcept
> circular_buffer: convert existing documentation to doxygen format
> build: fix detection of membarrier syscall support
> Merge "Improve systemwide_memory_barrier() on newer Linuces" from Avi
We had switched to our own CentOS base image since we couldn't make the
built AMI public due to base image settings, probably because the image
was provided via the AWS marketplace.
However, I've found an official image outside of the marketplace, and I
succeeded in making the built AMI public based on that image.
URL: https://wiki.centos.org/Cloud/AWS
Now that we are able to use the official image, we probably should use it.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1513659164-28029-1-git-send-email-syuu@scylladb.com>
It creates a flat_mutation_reader from a reference to another reader.
This makes it easier to compose code in a more elegant way.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Starting with commit fb0866ca20, tests
do not have to, and MUST NOT, define the disk error handlers. If they
do, we get a re-definition of variables already defined in
disk-error-handler.cc.
tests/hint_test.cc was apparently written before that commit, so we
need to remove the duplicate variables to get it to link.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20171218133635.20500-1-nyh@scylladb.com>
In commit 878d58d23a, a new parameter was
added to commitlog::descriptor. The commit message says that "It's default
value is a descriptor::FILENAME_PREFIX." while in reality, it did not have
a default value and compilation of tests/commitlog_test.cc broke, because
it didn't specify a value.
So this patch adds a default value for this parameter, as was suggested
by the original commit message.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20171218131020.17883-1-nyh@scylladb.com>
In commit 1f4f71e619, a
stdx::optional<std::vector<sstring>> parameter was added to storage_proxy's
constructor. However, this parameter was not made optional, and
tests/cql_test_env.cc failed to compile because it didn't provide this
parameter.
This patch makes this parameter optional (if missing, it's like an empty
stdx::optional) so cql_test_env.cc compiles.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20171218132121.18782-1-nyh@scylladb.com>
and add read_context::enter_flat_partition. This will
temporarily coexist with read_context::enter_partition,
but after everything in cache is migrated to the flat reader,
the new method will replace the old one.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
called autoupdating_underlying_flat_reader. It will be modified
in the next patch to use a flat reader for the underlying reader.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Currently, the compaction manager will serialize compactions of the same size
tier (or weight) if they belong to the same column family. However, it fails to
do so if the compaction jobs belong to different column families.
That can lead to an ungodly amount of running compactions, which gets worse
the higher the number of shards and active column families. The problem
is that it may affect overall system performance due to excessive resource
usage. It's easy to trigger during bootstrapping after loading a node with
new sstables or repairing, or if lots of cfs are being actively written.
That being said, compaction jobs of the same size tier are now serialized
on a given shard, such that the maximum number of compactions (system-wide)
is now:
(SHARDS) * (SIZE TIERS)
instead of:
(SHARDS) * (COLUMN FAMILIES) * (SIZE TIERS)
We'll work hard to release a size tier (weight) for a column family
waiting on it as fast as possible, given that we wouldn't like to
underutilize resources available for compaction. We want one starting
after the other. Compaction for a column family that cannot run now
because the size tier is taken will be postponed. There's a worker
sleeping on a condition variable that will be signalled
whenever a compaction completes. FIFO ordering is used on the postponed
list for fairness.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That will be needed for using it in compaction.hh. We can't declare
compaction_weight_registration in compaction_manager.hh, because
compaction.hh can't include the former due to cyclic dependency,
so compaction_weight_registration will be declared in its own
header.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The motivation is that a new field in the descriptor can be forwarded
to the compaction procedure without extending the parameter list even
more. This also beautifies the interface, making it concise and easier
to play with.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The summary positions are defined to be in 'native' byte order.
Unfortunately this makes sharing files between big- and little-endian
machines much more difficult. For example, test files need to be
generated for both potential byte orders.
This change sets the byte order of the affected data to little-endian.
Ideally there would still be a way to deal with files generated on
big-endian systems using the 'native' byte order (see #3056).
Message-Id: <20171212183652.87881-1-mike.munday@ibm.com>
Don't force outgoing connections to use the 'listen_address'
interface only.
If 'local_address' is given to connect(), it will force the connection
to be made from a particular interface, even if the destination address
should be accessed from a different interface. If we don't specify the
'local_address', the source interface will be chosen according to the
routing configuration.
Fixes #3066
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1513372688-21595-1-git-send-email-vladz@scylladb.com>
"Fixes #3057."
* 'summary_recreation_fixes_v2' of github.com:raphaelsc/scylla:
tests: sstable summary recreation sanity test
sstables: make loading of sstable without summary to work again
sstables: fix summary generation with dynamic index sampling
Currently, scylla-housekeeping-daily.service/-restart.service hardcode
"--repo-files '/etc/yum.repos.d/scylla*.repo'" to specify the CentOS .repo file,
but we use the same .service for Ubuntu/Debian.
This doesn't work correctly; we need to specify a .list file for Debian variants.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1513385159-15736-1-git-send-email-syuu@scylladb.com>
"This series is the first part of hinted handoff implementation.
It includes:
- Minor adjustment of commitlog layer.
- Generation of hints when storage_proxy calls hint_to_dead_endpoints(...).
- Sending the hints to the Node that becomes UP.
It doesn't include:
- Node decommissioning.
- Resharding."
* 'hinted_handoff-v7-1' of github.com:vladzcloudius/scylla:
main + storage_service: wire up hints generation
config: add hints related options
db::hints::manager: initial commit
tracing: make the session state modifying methods and tracing::trace(...) noexcept
utils::fb_utilities: add is_me(addr) method
tests: hint_test: initial commit
db::commitlog::replay_position: added std::hash<replay_position>
db::commitlog: truncate segments to their actual sizes during shutdown
db::commitlog: allow defining a metrics category name
db/commitlog/commitlog::descriptor: add a filename_prefix parameter
db::commitlog::descriptor::descriptor(filename): pass a filename as a const ref
docs: hinted_handoff_design.md: high level design of a Hinted Handoff feature
- hints_directory:
- This option allows defining the directory where hint files are going
to be stored if hinted handoff is enabled.
- hinted_handoff_enabled:
- May receive either a boolean value or a list of DCs. In the latter case this
will define the DCs for whose Nodes hints are going to be generated.
- max_hint_window_in_ms:
- The maximum number of milliseconds during which hints are going to be generated
for a Node that is DOWN. After this time period, hints are no longer generated
until the Node is seen UP again.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
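A scylla.yaml fragment using the three options above might look like the following sketch; the option names come from this patch, while the values (and the path) are purely illustrative, not defaults.

```yaml
# Illustrative hinted-handoff configuration (values are examples only).
hints_directory: /var/lib/scylla/hints
hinted_handoff_enabled: ["DC1", "DC2"]   # or simply: true / false
max_hint_window_in_ms: 10800000          # stop hinting after 3 hours of DOWN
```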
Make the session state creation, stop_foreground() and tracing::trace(...)
methods noexcept.
Most of them are already implemented in a way that won't throw,
but this patch makes it official...
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Add a widely used method that returns TRUE if a given address is a broadcast
address of the local node.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Add a new field to db::commitlog::config that defines the metrics category name.
If it is not given, metrics are not going to be registered.
Set it to "commitlog" in db::commitlog::config(const db::config&).
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This parameter is used when creating a new segment.
Its default value is descriptor::FILENAME_PREFIX.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Hinted Handoff is a feature that allows replaying failed writes.
The mutation and the destination replica are saved in a log and replayed
later according to the feature configuration.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Boot failed when loading an sstable with a missing summary because an
internal procedure failed to take into account that an sstable
can have its summary recreated from the index. Make it work again
by making that procedure aware of this.
Fixes #3057.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When recreating the summary, the data length was passed as the data
offset to the procedure that decides whether to sample or not. The
problem is that the procedure decides to sample an index entry if the
data offset is beyond a threshold, so the resulting summary will contain
only N sequential index entries starting from the first one,
which makes it quite inefficient. What should be done instead
is to pass the position of the current index entry, so the summary
content will be as if it was created by a regular sstable write.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
stream_session.cc:417:62: error: cannot call member function ‘utils::UUID streaming::stream_session::plan_id()’ without object
sslog.warn("[Stream #{}] Failed to send: {}", plan_id(), ep);
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171214022621.19442-1-raphaelsc@scylladb.com>
Some users are still using the original default cluster_name, 'Test
Cluster'; they will fail to start the node due to the cluster_name change
if they use the new scylla.yaml.
'ScyllaDB Cluster' isn't more beautiful than 'Test Cluster', so reset it
back to the original value to avoid problems for users.
Fixes #3060
Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <8c9dab8a64d0f4ab3a5d6910b87af696c60e5076.1513072453.git.amos@scylladb.com>
"While aa8c2cbc16 'Merge "Migrate sstables
to flat_mutation_reader" from Piotr' has converted the low-level sstable
reader to the new flat_mutation_reader interface there were still
multiple readers related to sstables that required converting,
including:
- restricted reader
- filtering reader
- single partition sstable reader
This series completes their conversion to the flat stream interface."
* tag 'flat_mutation_reader-sstable-readers/v2' of https://github.com/pdziepak/scylla:
db: convert single_key_sstalbe_reader to flat streams
db: fully convert incremental_reader_selector to flat readers
db: make make_range_sstable_reader() return flat reader
db: make column_family::make_reader() return flat reader
db: make column_family::make_sstable_reader() return a flat reader
filtering_reader: switch to flat mutation fragment streams
filtering_reader: pass a const dht::decorated_key& to the callback
mutation_reader: drop make_restricted_reader()
db: use make_restricted_flat_reader
mutation_reader: convert restricted reader to flat streams
Before flat mutation readers, sstable::read_row() returned a
future<streamed_mutation>. That required a helper reader that would wait
for the streamed_mutations from all relevant sstables to be created and
then construct a mutation merger.
With flat mutation readers, sstable::read_row_flat() returns a
flat_mutation_reader (no futures) so that the code can be simplified by
collecting all the relevant readers and creating a combined reader
without suspension points.
The unfortunate disadvantage of the flat_mutation_reader-based approach
is the fact that combined reader now needlessly compares the partition
keys even though we know that we read only a single partition, but
optimising that is out of scope of this patch.
All users of the filtering reader need only the decorated key of a
partition, but currently the predicate is given a reference to a
streamed_mutation, which is obsolete now.
In the case there is a large number of column families, the sender will
send all of them in parallel. We allow 20% of shard memory
for streaming on the receiver, so each column family will have 1/N of
that memory for its memtable, where N is the number of in-flight column
families. A large N causes a lot of small sstables to be generated.
It is possible for there to be multiple senders to a single receiver,
e.g., when a new node joins the cluster, in which case the maximum number
of in-flight column families is the number of peer nodes. The column
families are sent in the order of cf_id. It is not guaranteed that all
peers have the same speed and are sending the same cf_id at the same
time, though; we still have a chance that some of the peers are sending
the same cf_id.
Fixes #3065
Message-Id: <46961463c2a5e4f1faff232294dc485ac4f1a04e.1513159678.git.asias@scylladb.com>
We have had an issue recently where failed SSTable writes left the
generated SSTables dangling in a potentially invalid state. If the write
had, for instance, started and generated tmp TOCs but not finished,
those files would be left for dead.
We had fixed this in commit b7e1575ad4,
but streaming memtables still have the same issue.
Note that we can't fix this in the common function
write_memtable_to_sstable because different flushers have different
retry policies.
Fixes #3062
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171213011741.8156-1-glauber@scylladb.com>
"This patch series refines the security model for the upcoming switch to
roles-based access control. Roles are still do not have any function,
but CQL statements related to roles manipulate metdata. The next major
patch series after this one will switch the system to roles.
Previously, most operations around roles required superuser, but this
violates an important idea in security called the "principal of least
privilege": that a user should have only the minimum access possible to
resources in order to achieve their objective.
To that end, this patch series introduces permissions on role resources.
For example, to grant a role to a user, the performing user must have
been granted AUTHORIZE on the role being granted.
In the table below, a user (role) that has been granted the permission
in the left-most column can perform the CQL query in the right columns
depending on if the permission has been granted to the root role
resource (all roles), or a particular role resource.
Perm.       All roles               Specific role (r)
---------------------------------------------------------
CREATE      CREATE ROLE
ALTER       ALTER ROLE *            ALTER ROLE r
DROP        DROP ROLE *             DROP ROLE r
AUTHORIZE   GRANT ROLE * /          GRANT ROLE r /
            REVOKE ROLE *           REVOKE ROLE r
DESCRIBE    LIST ROLES
The following restrictions around superuser exist:
- CREATE ROLE: Only a superuser can create a superuser role.
- ALTER ROLE: Only a superuser can alter the superuser status of a role.
- ALTER ROLE: You cannot alter the superuser status of yourself or of a
role granted to you.
- DROP ROLE: Only a superuser can drop a role that has superuser.
The following additional "escape hatches" apply:
- ALTER ROLE: You can alter yourself (except to give yourself
superuser).
- LIST ROLES: You can list your own roles and list the roles of any role
granted to you.
Finally, a note on terminology: I like to say a role (or user) "is"
superuser if the role (user) has directly been marked as a superuser. A
role (user) "has" superuser if they have been granted a role that is a
superuser. The second statement encompasses the first, since a role can
always be said to have been granted to itself.
Fixes #2988."
* 'jhk/role_permissions/v2' of https://github.com/hakuch/scylla: (24 commits)
auth: Move permissions cache instance to service
auth: Add roles query function to service
cql3: Update access checks for `revoke_role_statement`
cql3: Update access checks in `grant_role_statement`
cql3: Update access checks in `list_roles_statement`
cql3: Update access checks in `drop_role_statement`
cql3: Update access checks in `alter_role_statement`
cql3: Update access checks in `create_role_statement`
tests: Switch to dedicated testing superuser
auth: Publicize enforcing check for service
tests: Expose client state from test env
Allow checking permissions from `client_state`
auth: Support querying for granted superuser
auth/service.hh: Document the class
cql3: Change `create_role_statement` base
cql3/Cql.g: Add role resources to grammar
cql3/Cql.g: Avoid extra copy of `auth::resource`
auth:resource.cc: Use `string_view` in reverse map
auth: Add `role` resource kind
auth: Add the DESCRIBE permission
...
Instead of a single sharded service shared all by all instances of
`auth::service`, it makes more sense for each instance of
`auth::service` to own its own instance of the permissions cache.
While it just calls into the underlying role manager, this level of
indirection allows us to add a roles cache in the future (which is
consistent with the behavior of Apache Cassandra).
A user with DESCRIBE on the root role resource can list the roles of any
role, and also all the roles in the system.
Otherwise, a user can list all the roles it has been granted and can
list all the roles granted to those roles.
A role can be dropped if the performer has DROP permission on the role.
A role that has superuser (either directly or through another role
it has been granted) cannot be dropped except by a superuser.
Only superusers can alter superuser status, and only for roles not
granted to them. You can always alter your own role. You can alter
another role if you have ALTER permission on the role.
CREATE ROLE requires CREATE on <ALL ROLES>. Creating a superuser role
requires that the performer is a superuser.
This change also forms the beginning of a test suite for the CQL
interface to roles. We start with verifying access-control properties of
CREATE ROLE as written in this patch.
The auth service will eventually add the default
superuser ("cassandra"), but the current code does so after a delay.
Using a dedicated superuser for unit tests side-steps the issue and
allows the user to be created immediately.
Previously, this function was private and only `ensure_has_permission`
was public. `ensure_has_permission` throws in the absence of a
permission, but it can also be useful to query a permission without it
being an error.
This functionality is useful for implementing CQL statements and will
replace `auth::is_super_user` once roles have replaced users in Scylla.
Since the auth service will eventually have a roles cache, this function
lives here rather than being part of `role_manager`.
When a user is granted DESCRIBE on all roles (a resource kind that
doesn't exist yet in the code, but will exist soon), they gain the
ability to execute LIST ROLES queries.
Different kinds of resources support different permissions. For example,
a keyspace supports the CREATE permission, which allows a user to
create tables in that keyspace. However, a table does not have an
applicable CREATE permission.
If a non-applicable permission is requested, an
`invalid_request_exception` is thrown.
"Fixes #2866Fixes#2894
Changes gossip propagation to allow "atomic" grouping of values to ensure
their respective order.
Modifies gossip bootstrap startup to potentially wait longer in cases
where stabilization (messages done) takes time, to avoid data loss
in repair."
* 'calle/gossip' of github.com:scylladb/seastar-dev:
gossip: wait for stabilized gossip on bootstrap
gossiper: Prevent race condition in propagation
utils::to_string: Add printers for pairs+maps
utils::in: Add helper type for perfect forwarding initializer lists
This set is equal to `permissions::ALL`. When we switch over to
resource-specific permission sets, we will filter the set of all
permissions to only those that are applicable for the resource in
question.
Applicable permission sets will soon be specific to each kind of
resource. This change prepares us for dynamic querying of permission
sets by resource.
* seastar ac78eec...2b23547 (10):
> Merge "update shares for I/O classes" from Glauber
> Merge "Resumable tasks" from Avi
> input_stream: un-unroll input_stream::consume()
> net: adding yaml-based parser for network configuration supporting multiple interfaces
> scripts: perftune.py: don't attempt to set IRQs' affinity when IRQs list is empty
> tutorial: fix example code
> http: api_docs add swagger 2.0 support
> Support custom function for reading of config-files.
> Revert "provide an interface for updating the shares of an I/O class"
> provide an interface for updating the shares of an I/O class
The encoding logic was incorrect for big endian systems (shift needed
to be in the opposite direction). Rather than fix that issue I have
re-written the relevant code to restrict the storage format to little
endian byte order on all systems. My hope is that this will be a bit
easier to maintain.
Message-Id: <20171211124454.77488-1-mike.munday@ibm.com>
Most distros on s390x don't currently have gold installed by default.
Rather than disable gold on the platform add a check to see if gold
is installed and switch back to using the default system linker if it
isn't. The try_compile_and_link functionality is copied from the
seastar project.
Message-Id: <20171211122156.77385-1-mike.munday@ibm.com>
"The changes in this series fall into one of the following:
1) improve unit tests
2) improve code reuse in mvcc so that later changes will be easier
3) fix minor issues which were exposed by the above"
* tag 'tgrabiec/improve-and-fix-mvcc-tests-v4' of github.com:scylladb/seastar-dev:
tests: mvcc: Add more tests for consistency of continuity merging
tests: mvcc: Fix test_apply_is_atomic()
tests: mvcc: Do not assume that continuity of current row is updated on partition_snapshot_row_cursor::maybe_refresh()
mvcc: Reuse partition_snapshot_row_cursor in apply_to_incomplete()
mvcc: Propagate region reference to partition_entry::apply_to_incomplete()
mvcc: Introduce partition_snapshot_row_cursor::ensure_entry_if_complete()
mvcc: partition_snapshot_row_cursor: Extract prepare_heap()
mvcc: Add const-qualified partition_version_ref::operator*()
tests: mvcc: Use mutation_partition_assertions
tests: Introduce mutation_partition_assertions
tests: Randomize static row continuity in random_mutation_generator
tests: mutation_assertion: Introduce is_continuous()
mvcc: Introduce partition_snapshot_row_cursor::read_partition()
mutation_partition: Introduce deletable_row::apply() from a clustering_row fragment
mutation_partition: Extract sliced() from mutation into mutation_partition
mvcc: Introduce partition_snapshot::static_row_continuous()
mvcc: Introduce partition_snapshot::range_tombstones() for full range
mvcc: Don't require external schema in parition_snapshot::range_tombstones()
mutation_partition: Define equal_continuity() using get_continuity()
mutation_partition: Make check_continuity() const-qualified
mutation_partition: Make check_continuity() public
mutation_partition: Introduce mutation_partition::get_continuity()
Introduce clustering_interval_set
mutation_partition: Leave moved-from row in an empty state
mutation_partition: Fix upgrade() not preserving static row continuity
"This series includes a few patches from Michael Munday <mike.munday@ibm.com> (Z-project)
and a few from me. The most significant is PATCH10 that introduces a vectorized version
of CRC32 calculation (based on the Anton Blanchard's work)."
* 'scylla-power64-port-v2-1' of https://github.com/vladzcloudius/scylla:
test.py: limit the tests to run on 2 shards with 4GB of memory
tests: sstable_datafile_test: fix the compilation error on Power
tests: compound_test: fix the 'narrowing' compilation error on Power
cql3::constants::literal: fix the empty string parser
utils::crc32: add power64 crc32 HW accelerated implementation
repair: use seastar::cache_line_size for aligning to the cache line size
build: add -lcryptopp to libs
utils/allocation_strategy: force alignment to be at least sizeof(void*)
utils::crc: introduce process_le/be(T) methods
utils/crc: use zlib for crc32 on non-x86 platforms
main: only perform SSE 4.2 check on x86-family CPUs
configure.py: don't use 'gold' linker on Power
configure.pu: add --target flag to override -march value
'char' and int8_t ('signed char') are different types. The 'bytes' base type
is int8_t - use the correct type for casting.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
'bytes' has int8_t as a base type, and the 0xff value is out of this type's
range. Use the corresponding signed value instead.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Don't assume that 'char' is signed - this is implementation-dependent.
Compare to the '\xFF' value, which is the actual intent.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Use seastar::cache_line_size for cache line alignment instead of a hard-coded
value (64), which is not always correct, e.g. on the PPC64 platform, where the
cache line size is 128B.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
It is currently updated only when iterators are invalidated. Better
not to assume that, because it's not really needed, and
maintaining it would complicate maybe_refresh() after the continuity
merging rules change later.
The alignment of packed structs can be 1. The system¹ posix_memalign
function will return EINVAL when passed this alignment. This fix
forces the alignment to be at least sizeof(void*).
¹ The seastar implementation of posix_memalign does not appear to
have this limitation currently.
Replace the oblique process(T) overloads for integer types with
explicit process_le/be(T) methods that interpret the given integer
as a stream of bytes using the corresponding endianness.
For instance,
process_le(0x11223344) would treat this integer as the following array of bytes:
{0x44, 0x33, 0x22, 0x11}.
process_be(0x11223344), on the other hand, would treat this integer as if it's
{0x11, 0x22, 0x33, 0x44}.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This is probably the simplest way to make the build work on other
architectures. --target can be set to an empty string to allow
the compiler's default to be used.
If --target is not set then the default is going to be 'nehalem' on
x86 machines and the compiler's default on all other platforms.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Signed-off-by: Michael Munday <mike.munday@ibm.com>
This fixes the problem of equal_continuity() being prone to false
positives due to redundant information (extra dummy rows) present in
one of the partitions. get_continuity() is minimized, so it is not prone
to this.
Will make it easy to represent and manipulate continuity in tests.
Could also replace clustering_row_ranges in the future, which is
currently a naked vector<> with no semantic methods.
"Fixes cache reader to not skip over data in some cases involving overlapping
range tombstones in different partition versions and discontinuous cache.
Introduced in 2.0
Fixes #3053."
* tag 'tgrabiec/fix-range-tombstone-slicing-v2' of github.com:scylladb/seastar-dev:
tests: row_cache: Add reproducer for issue #3053
tests: mvcc: Add test for partition_snapshot::range_tombstones()
mvcc: Optimize partition_snapshot::range_tombstones() for single version case
mvcc: Fix partition_snapshot::range_tombstones()
tests: random_mutation_generator: Do not emit dummy entries at clustering row positions
The issue is that partition_snapshot::range_tombstones() is
deoverlapping tombstones coming from different versions, and it may
happen that, due to range tombstone splitting, that function will return
a tombstone which starts after the requested range. This breaks
assumptions made by the cache reader. It keeps track of the maximum
fragment position, and if the cache reader then needs to read from
sstables due to a miss, it will do so starting from the position
marked by that out-of-range tombstone, possibly skipping over some
rows.
partition_snapshot::range_tombstones() is deoverlapping tombstones
coming from different versions, and it may happen that, due to range
tombstone splitting, the method will return a tombstone which starts
after the requested range, i.e. a tombstone which doesn't overlap with
the requested range.
This breaks assumptions made by the cache reader. It keeps track of the
maximum fragment position, and if the cache reader then needs to read
from sstables due to a miss, it will do so starting from the position
marked by that out-of-range tombstone, possibly skipping over some
rows.
Exposed by a change in row_cache_test.cc::test_mvcc() which fills the
buffer of sm5 reader after it is created.
Fixes #3053.
It is assumed that dummy entries are only at !is_clustering_row() positions.
Causes cache_streamed_mutation to assert when trying to trim a range tombstone.
"Didn't affect any release. Regression introduced in 301358e.
Fixes#3041"
* 'resharding_fix_v4' of github.com:raphaelsc/scylla:
tests: add sstable resharding test to test.py
tests: fix sstable resharding test
sstables: Fix resharding by not filtering out mutation that belongs to other shard
db: introduce make_range_sstable_reader
rename make_range_sstable_reader to make_local_shard_sstable_reader
db: extract sstable reader creation from incremental_reader_selector
db: reuse make_range_sstable_reader in make_sstable_reader
"In time-series, it's common for tables in a given time window to be eventually
fully expired. The deletion of such tables is done by compaction, but there's
*no* need to *actually* compact such fully expired sstables *iff* their full
deletion will not cause older data to be ressurected. In other words, a fully
expired table can be actually skipped (but deleted in the end) by compaction
*iff* it doesn't contain newer data than its overlapping counterparts. So there
may be false negatives, but never false positives.
All that said, the goal behind this patchset is to save read bandwidth of disk
in such scenarios. Given that fully expired sstables will not be read by
compaction process anymore, read amplification will be greatly reduced too.
Fixes #2620."
* 'time_series_performance_improvement_v2_2' of github.com:raphaelsc/scylla:
tests: check sstable auto correct bad max deletion time
tests: add test for compaction with fully expired table
sstables/compaction: do not actually compact fully expired sstables
sstables: make sstable auto correct max_local_deletion_time
sstables: switch to const ref wherever possible
sstables: use gc_clock::time_point for gc_before
gc_clock: introduce operator<<(ostream&, gc_clock::time_point)
sstables: introduce sstable::get_max_local_deletion_time
sstables: remove unnecessary copy in time series strategies
sstables: change return value type of get_fully_expired_sstables
dtcs: make code to extract non expired tables faster
sstables: add has_correct_max_deletion_time to sstable
"Soon we will have resources beyond just keyspaces and table names. There
will be resources for roles, for user-defined functions (UDFs), and
possibly resources for REST end-points. This change generalizes the
implementation of a `data_resource` to many different kinds of
resources, though there is still only one kind (`data`).
The most important patch is 2/5 ("auth/resource: Generalize to different
kinds"), which re-writes `auth::data_resource`. The patch message should
sufficiently explain the design decisions involved.
The other patches rename files and identifiers based on the expanded
role of this class, except for 5/5 ("auth/resource.hh: Rename
`resource_ids`"): this patch gives a more appropriate name to a type
alias.
Fixes #3027."
* 'jhk/generalize_resource/v3' of https://github.com/hakuch/scylla:
auth/resource.hh: Rename `resource_ids`
auth: Rename `data_resource` files
cql3/authorization_statement: Fix typo
auth/resource: Generalize to different kinds
auth: Rename `data_resource` to `resource`
The wrong sstable was used when checking for content, and the storage
service for tests was missing.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
After 301358e, sstable resharding stopped working because shared sstables
would use a filtering reader, which excludes mutations that belong to other
shards. That completely breaks resharding, which relies on compacting
mutations that belong to different shards. The fix is to use the recently
introduced non-local shard reader.
Fixes #3041.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Introduce a reader variant that will allow its caller to read a range
in a given table without any filter applied.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Tomek says:
"I think that the least surprising behavior for a function named like this
is to read the sstables unfiltered (it just reads them), and the filtering
should be indicated specially in the name or by accepting a parameter."
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There's no need to actually compact an sstable which is fully expired
and whose deletion will not resurrect older data.
To that end, an sstable will only be considered fully expired if it
doesn't contain data newer than its overlapping counterparts.
That way, there could be a false negative, but never a false positive.
Currently, a fully expired sstable would unnecessarily waste disk read
bandwidth. This will help a lot with time series workloads in
which data for a given time window is all deleted at once using TTL.
Fixes #2620.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
sstables created prior to cc6c383 can contain a bad max deletion time stat,
which would make get_fully_expired_sstables return sstables that aren't
actually fully expired. Let's make the sstable invalidate the stat if it
is potentially incorrect.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
unordered_set will allow us to quickly extract fully expired tables
from a set of compacting sstables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
since it's O(n) and not O(n log n).
The change is also needed for the new interface of the function that retrieves
fully expired tables; otherwise the sort lambda would need to be parametrized.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Commit cc6c38324 fixes the stat. It was only updated for range
tombstones prior to the fix, so an sstable that had a regular cell with
no expiration time could be considered fully expired, which can
lead to bad decisions in compaction for time series workloads.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This change generalizes the implementation of a `resource` to many
different kinds of resources, though there is still only one
kind (`data`). In the future, we also expect resource kinds for roles,
user-defined functions (UDFs), and possibly for particular REST
end-points.
I considered several approaches to generalizing to different kinds of
resources.
One approach is to have a base class that is inherited from by different
resource kinds. The common functionality would be accessed through
virtual member functions and kind-specific functions would exist in
sub-classes. I rejected this approach because dealing with different
kinds of resources uniformly requires storage and life-time management
through something like `std::unique_ptr<auth::resource>`, which means
that we lose value semantics (including comparison) and must deal with
complications around ownership.
Another option was to use `boost::variant` (or, in future,
`std::variant`). This is closer to what we want, since there is a static
set of resource kinds that we support. I rejected this approach for two
reasons. The first is that all resource kinds share the same data (a
list of segments and a root identifier), which would be duplicated in
each type that composed the variant. The second is that the complexity
and source-code overhead of `boost::variant` didn't seem warranted.
The solution I ended up with is a home-grown variant. All resources are
described in the same `final` class: `auth::resource`. This class has
value semantics, supports equality comparison, and has a strict
ordering. All resources have in common a tag ("kind") and a list of
parts. Most operations on resources don't care about the kind of
resource (like getting its name, parsing a name, querying for the
parent, etc). These are just member functions of the class.
When we care about a kind-specific interpretation of a resource, we can
produce a "view" of the resource. For example, `data_resource_view`
allows for accessing the (optional) keyspace and table names.
I anticipate adding, in the future, functions for creating role
resources (`auth::resource::role`) and also a `role_resource_view`.
The functional behaviour of the system should be unchanged with this
patch.
I've added new unit tests in `auth_resource_test.cc` and removed the old
test from `auth_test.cc`.
Fixes #3027.
"This is CASSANDRA-7886 and CASSANDRA-8592. The patch series detects
that CL of a request can no longer be reached due to errors and fails
the request earlier. New type of errors are reported: read/write failure
which were introduced in cql v4 protocol. For compatibility if older
protocol is used the error is translated to timeout error."
* 'gleb/request-failure_v2' of github.com:scylladb/seastar-dev:
storage_proxy: fail read/write requests early if they cannot be completed due to errors
storage_service: add WRITE_FAILURE_REPLY_FEATURE feature
gossiper: add node_has_feature() function
cql: add read/write failure exceptions
storage_proxy: fix data presence reporting in read timeout error during
storage_proxy: remove inheritance from enable_shared_from_this for abstract_write_response_handler
storage_proxy: remove unneeded field in abstract_write_response_handler
storage_proxy: fix pending endpoint accounting for EACH_QUORUM
consistency_level: constify quorum_for() and local_quorum_for()
When fast forwarding is enabled and all readers positioned inside the
current partition return EOS, return EOS from the combined-reader too,
instead of skipping to the next partition if there are idle readers
(positioned at some later partition) available. The latter would cause
rows to be skipped in some cases.
The fix is to distinguish EOS'd readers that are only halted (waiting
for a fast-forward) from those really out of data. To achieve this we
track the last fragment-kind the reader emitted. If that was a
partition-end then the reader is out of data, otherwise it might emit
more fragments after a fast-forward. Without this additional information
it is impossible to determine why a reader reached EOS and the code
later may make the wrong decision about whether the combined-reader as
a whole is at EOS or not.
Also when fast-forwarding between partition-ranges or calling
next_partition() we set the last fragment-kind of forwarded readers
because they should emit a partition-start, otherwise they are out of
data.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <6f0b21b1ec62e1197de6b46510d5508cdb4a6977.1512569218.git.bdenes@scylladb.com>
Since the pbuilder chroot environment does not install CA certificates by
default, accessing https://download.opensuse.org will cause a certificate
verification error.
So we need to install them before installing the 3rdparty repo GPG key.
Also, checking the existence of gpgkeys_curl is not needed, since it's
never installed when we are running the script in a clean chroot environment.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1512517001-27524-1-git-send-email-syuu@scylladb.com>
"This fix for issue #2989 first adds unit tests for caching_options, which
is the only class that uses the helpers from json.hh. This is done to
have regression tests in place for the main change.
The second commit adds conditional use of new recommended JsonCpp API
where available. For older versions of the library, it uses the old
code."
* 'issues/2989/v1' of https://github.com/argenet/scylla:
Use CharReaderBuilder/CharReader and StreamWriterBuilder from JsonCpp.
tests: Add unit tests for caching_options.
"This series makes sstable tests use flat stream interface. The main
motivation is to allow eventual removal of mutation_reader and
streamed_mutation and ensuring that the conversion between the
interfaces doesn't hide any bugs that would be otherwise found."
* tag 'flat_mutation_reader-sstable-tests/v1' of https://github.com/pdziepak/scylla:
sstables: drop read_range_rows()
tests/mutation_reader: stop using read_range_rows()
incremental_reader_selector: do not use read_range_rows()
tests/sstable: stop using read_range_rows()
sstables: drop read_row()
tests/sstables: use read_row_flat() instead of read_row()
database: use read_row_flat() instead of read_row()
tests/sstable_mutation_test: get flat_mutation_readers from mutation sources
tests/sstables: make sstable_reader return flat_mutation_reader
sstable: drop read_row() overload accepting sstable::key
tests/sstable: stop using read_row() with sstable::key
tests/flat_mutation_reader_assertions: add has_monotonic_positions()
tests/flat_mutation_reader_assertions: add produces(Range)
tests/flat_mutation_reader_assertions: add produces(mutation)
tests/flat_mutation_reader_assertions: add produces(dht::decorated_key)
tests/flat_mutation_reader_assertions: add produces(mutation_fragment::kind)
tests/flat_mutation_reader_assertions: fix fast forwarding
The assertions already have produces(mutation) and
produces(dht::decorated_key) overloads. An additional overload that accepts
a range of elements will allow checking whether a range of mutations or
decorated keys is produced.
The same interface is exposed by mutation_reader_assertions.
Fixes #2866
Instead of a raw 30s sleep waiting for gossip to stabilize/set up
ranges on bootstrap, use logic similar to 'wait_for_gossip_to_settle'
and loop for said 30s or more until we neither grow nor shrink the
endpoint set, nor are processing ACKs.
Fixes #2894
Allow applying certain application states as monotonic sets,
i.e. allow set of states as input, and ensure the values are
re-versioned and all applied together.
Then do so for certain states that are by design coupled
(status/tokens).
Similar solution as Origin's, as the issue is a copy of the same.
produces(mutation_fragment::kind) is provided by
streamed_mutation_assertions and is going to be needed in order to
fully convert tests to the flat mutation readers.
Both fast_forward_to() overloads return a future which should be waited
for. Additionally, fast_forward_to(const dht::partition_range&) expects
the range to remain valid at least until the next call to
fast_forward_to(). The original mutation_reader_assertions guaranteed
that and so should flat_mutation_reader_assertions.
* seastar dc44656...ac78eec (3):
> json formatter: Add unsigned support to the json formatter
> Add missing usual smart-pointer methods to foreign_ptr
> future-util: remove use of forward references in some primitives
Recently, memtable flush in test requires storage service for tests,
or it fails with "Assertion `local_is_initialized()' failed".
storage_service_for_tests needs to run in a thread, that's why
flush_memtable was flattened.
Last but not least, we need to revert the flushed memory accounting because
the same memtable is used for all sstables in the perf test, so as not
to trigger `_mt._flushed_memory <= _mt.occupancy().used_space()'
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171205012853.21559-1-raphaelsc@scylladb.com>
In version 1.8.3 of JsonCpp shipped with Fedora 27, old FastWriter and
Reader classes from JsonCpp have been deprecated in favour of
newer/better ones: CharReaderBuilder/CharReader and
StreamWriterBuilder/StreamWriter.
This fix uses the new classes where available or resorts to old ones for
older versions of the library.
Fixes #2989
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"Convert combined_mutation_reader into a flat_mutation_reader impl. For
now - in the name of incremental progress - all consumers are updated to
use the combined reader through the
mutation_reader_from_flat_mutation_reader adaptor. The combined reader also
uses all its sub mutation_readers through the
flat_mutation_reader_from_mutation_reader adaptor."
* 'bdenes/flatten-combined-reader-v8' of https://github.com/denesb/scylla:
Add unit tests for the combined reader - selector interactions
Add flat_mutation_reader overload of make_combined_reader
Flatten the implementation of combined_mutation_reader
Add mutation_fragment_merger
mutation_fragment::apply(): handle partition start and end too
Add non-const overload of partition_start::partition_tombstone()
Make combined_mutation_reader a flat_mutation_reader
Move the mutation merging logic to combined_mutation_reader
Remove the unnecessary indirection of mutation_reader_merger::next()
Move the implementation of combined_mutation_reader into mutation_reader_merger
Remove unused mutation_and_reader::less_compare and operator<
Our external repos are already signed, so let's enable secure-apt.
It seems that more recent versions of Ubuntu (tested on 18.04) do not accept
skipping the GPG check, so we will need it anyway in the near future.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Now we can cross build our .rpm/.deb packages, so let's extend the AMI build
script to support cross building, too.
Also, Ubuntu 16.04 support is added, since it's the latest Ubuntu LTS release.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1510247204-2899-1-git-send-email-syuu@scylladb.com>
There are a few edge cases that were untested, and as this patch-series
completely reworks how the combined-reader works, these should be tested
as well to ensure they keep working.
This is the mutation fragment level equivalent of mutation_merger.
It merges fragments produced by different sources. Mutation
fragments are not as self-contained as streamed mutations; they have
external context, e.g. the partition they belong to. To support this,
mutation_fragment_merger operates on a producer instead of a vector of
fragments. The producer can have internal state and can do side-actions as
fragments are consumed.
For now only the interface is converted; behind the scenes the previous
implementation remains, and its output is simply converted by
flat_mutation_reader_from_mutation_reader. The implementation will be
converted in the following patches.
This simple code-movement patch lays the groundwork for splitting
the logic in combined_mutation_reader into two blocks:
* one that takes care of moving the readers in lockstep and emits their
output as a non-decreasing stream of streamed_mutations and
* one that takes care of merging the above stream into a
strictly-increasing stream of streamed_mutations.
This in turn is preparation-work to the transformation of
combined_mutation_reader into a flat_mutation_reader::impl.
"A delayed task can fail to execute, for example if the consistency
level the task required can't be achieved, so we should ensure it is
retried.
Fixes #3038"
* 'auth-retry/v2' of https://github.com/duarten/scylla:
auth/standard_role_manager: Extend exception handling
auth/common: Add exception handling and retry to task scheduling
auth/standard_role_manager: Lift async block to caller
Also handle exceptions thrown by has_existing_roles(), and print a
similar message to Apache Cassandra in case of error.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This follows the implementation in Apache Cassandra. The auth tasks
executed by delay_until_system_ready() usually perform a query with
QUORUM consistency level, which can fail if some nodes are
unavailable. So, we provide both exception handling and a retry
mechanism.
Fixes #3038
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
has_existing_roles() creates a seastar thread, but that can be
lifted to the caller for prettier code.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We are getting package build error on dh_auto_install which is invoked by
pybuild.
But since we handle all installation on debian/scylla-server.install, we can
simply skip running dh_auto_install.
Fixes #3036
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1512065117-15708-1-git-send-email-syuu@scylladb.com>
"This series makes it easier to comprehend assertion failures which
involve printing mutation contents."
* 'tgrabiec/mutation-printout' of github.com:scylladb/seastar-dev:
tests: Introduce mutation_diff script
mutation: Make printout more concise
mutation_partition: Don't print absent elements
mutation_partition: Make row_marker printout similar to other partition elements
database: Move operator<<() overloads to appropriate source files
mutation_partition: Use multi-line printout
position_in_partition: Improve printout
Convert to a multi line output, which is easier to read for a human.
After:
{ks.cf key {key: pk{000c706b30303030303030303030}, token:-2018791535786252460} data {mutation_partition: {tombstone: none},
range_tombstones: {},
static: cont=1 {row: },
clustered: {
{rows_entry: cont=true dummy=false {position: clustered,ckp{000c636b30303030303030303030},0} {deletable_row: {row: }}},
{rows_entry: cont=true dummy=true {position: clustered,ckp{000c636b30303030303030303031},0} {deletable_row: {row: }}}}}}
Before:
{position: type clustered, bound_weight -1, key ckp{000c636b30303030303030303033}}
After:
{position: clustered,ckp{000c636b30303030303030303033},-1}
Benefits:
- most significant parts appear first.
bound_weight, which is least significant, was in the middle before.
- shorter, so a bit easier to parse assertion failures.
The universal reference was introduced so we could bind an rvalue to
the argument, but it would have sufficed to make the argument a const
reference. This is also more consistent with the function's other
overload.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171129132758.19654-1-duarte@scylladb.com>
Fix two issues with serializing non-compound range tombstones as
compound: convert a non-compound clustering element to compound and
actually advertise the issue to other nodes.
* git@github.com:duarten/scylla.git rt-compact-fixes/v1:
compound_compact: Allow rvalues in size()
sstables/sstables: Convert non-compound clustering element to compound
tests/sstable_mutation_test: Verify we can write/read non-correct RTs
service/storage_service: Export non-compound RT feature
Add test to verify we can write and read non-compound tombstones and
compound ones for backward compatibility.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
576ea421dc introduced a regression
as it didn't change the assumption that all clustering elements were
compound when writing a range tombstone, compound or non-compound, as
compound. Thus, we serialized a non-compound element while we should
have serialized a compound one.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
After 7f8b62bc0b, its move operator and ctor broke. That potentially
leads to errors because the data_consume_context dtor moves the sstable ref
to a continuation when waiting for in-flight reads from the input stream.
Otherwise, the sstable can be destroyed in the meanwhile and the file
descriptor would be invalid, leading to EBADF.
Fixes #3020.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171129014917.11841-1-raphaelsc@scylladb.com>
"This simplifies implementation of mutation_partition merging by relaxing
exception guarantees it needs to provide. This allows reverters to be dropped.
Direct motivation for this is to make it easier to implement new semantics
for merging of clustering range continuity.
Implementation details:
We only need strong exception guarantees when applying to the memtable, which is
using MVCC. Instead of calling apply() with strong exception guarantees on the latest
version, we will move the incoming mutation to a new partition_version and then
use monotonic apply() to merge them. If that merging fails, we attach the version with
the remainder, which cannot fail. This way apply() always succeeds if the allocation
of partition_version object succeeds.
Results of `perf_simple_query_g -c1 -m1G --write` (high overwrite rate):
Before:
101011.13 tps
102498.07 tps
103174.68 tps
102879.55 tps
103524.48 tps
102794.56 tps
103565.11 tps
103018.51 tps
103494.37 tps
102375.81 tps
103361.65 tps
After:
101785.37 tps
101366.19 tps
103532.26 tps
100834.83 tps
100552.11 tps
100891.31 tps
101752.06 tps
101532.00 tps
100612.06 tps
102750.62 tps
100889.16 tps
Fixes #2012."
* tag 'tgrabiec/drop-reversible-apply-v1' of github.com:scylladb/seastar-dev:
mutation_partition: Drop apply_reversibly()
mutation_partition: Relax exception guarantees of apply()
mutation_partition: Introduce apply_weak()
tests: mvcc: Add test for atomicity of partition_entry::apply()
tests: Move failure_injecting_allocation_strategy to a header
tests: mutation_partition: Test exception guarantees of apply_monotonically()
mvcc: Use apply_monotonically() where sufficient
mvcc: partition_version: Use apply_monotonically() to provide atomicity
mvcc: Extract partition_entry::add_version()
mutation_partition: Introduce apply_monotonically()
mutation_partition: Introduce row::consume_with()
The uses which needed strong or weak exception guarantees were
switched to a solution involving apply_monotonically(). All remaining
uses don't need any exception guarantees.
This patch drops the use of apply_reversibly(). We move the mutation
to be applied into a new version and then use apply_monotonically() to
merge it (if no snapshot) with the current version. This guarantees
that apply() is atomic even if apply_monotonically() throws.
Fixes #2012.
Has weaker exception guarantees than apply(), which allows for simpler
implementation. Intended to replace the apply() with strong exception
guarantees.
This series converts memtable flush reader to the new flat mutation
readers. Just like the scanning reader, flush reader concatenates
multiple partition snapshot readers in order to provide a stream
of all partitions in the memtable.
* https://github.com/pdziepak/scylla.git flat_mutation_reader-memtable-flush/v1
tests/flat_mutation_reader_assertion: add produces_partition()
memtable: make make_flush_reader() return flat_mutation_reader
flat_mutation_reader: add optimised flat_mutation_reader_opt
memtable: switch flush reader implementation to flat streams
tests/memtable: add test for flush reader
"This series adds the role-management interface, the primary implementation, and the corresponding CQL.
Importantly, this series does not integrate the system with roles, nor does it remove user-based access control. Several new CQL statements are available and should function, but these modify metadata only and have no functional impact on the actual
system.
The new statements are:
- CREATE ROLE
- ALTER ROLE
- DROP ROLE
- GRANT ROLE
- REVOKE ROLE
- LIST ROLES
The security model of the role manager is simple at this point: only superusers can create and drop roles. The next patch series will introduce fine-grained role permissions and also slightly change the CQL syntax to be more consistent with the
rest of the grammar. This patch series is a starting point for evolving the roles feature and integrating it.
Fixes #2987."
* 'jhk/role_management/v5' of https://github.com/hakuch/scylla:
auth: Add `alter_role_statement`
auth: Add `create_role_statement`
auth: Add `drop_role_statement`
auth: Add 'revoke_role_statement'
auth: Add `grant_role_statement`
auth: Add `list_roles_statement`
auth: Add dormant role manager to `service`
auth/service.cc: Remove redundant declarations
cql3: Add `role_name` and parser rules
auth: Add role manager
auth: Unconditionally create the `system_auth` keyspace
unimplemented.hh: Use [[noreturn]] instead of GCC attribute
New `unimplemented` feature: roles
Unlike Apache Cassandra, the role manager does not write data related to
password authentication in the metadata tables, and the rest of the
system does not yet integrate with the role manager.
Therefore, executing `CREATE ROLE` currently ignores all
authentication-related options (`PASSWORD` and `OPTIONS`).
Dropping a role removes all references to it from other roles.
As with the role-management statements, executing this statement updates
metadata but has no functional impact yet.
While granting a role updates the necessary metadata, since roles do not
interact with the rest of the system yet, there is no functional impact
of doing so.
The role manager still does not interact with the rest of the system,
but the role manager is now sharded on all cores and metadata is
created.
The following metadata tables are created:
- `system_auth.roles`
- `system_auth.role_members`
The default superuser, "cassandra", is also created, but has no function.
The `userOrRoleName` parser rule is important for future CQL
role-related statements.
`cql3::role_name` is a small utility for role-related CQL statements
that enforce an important property of role names: that they are always
lower-case unless quoted appropriately.
The role manager is responsible for creating, removing, querying for,
granting, and revoking roles.
The role manager does not yet run in production, and is not connected to
the rest of the system.
Included in this patch is the definition of the abstract role management
interface, and also the implementation of the standard role manager.
The standard role manager is tested fully in the `role_manager_test`.
The `system_auth` keyspace is used to store tables for authentication
and authorization metadata.
Previously, this keyspace would only be created if a non-default
authenticator or authorizer was activated in the configuration.
The upcoming role-management system is enabled unconditionally and also
uses the `system_auth` keyspace for its metadata.
These tests now require having the storage service initialized, which
is needed to decide whether correct non-compound range tombstones
should be emitted or not.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171126152921.5199-1-duarte@scylladb.com>
* seastar 7f87529...3b09bad (7):
> Extend Travis CI to cover Clang 5.0 builds.
> fair_queue: disallow zeroed shares.
> Multiple fixes to io_tester to make it compile with GCC 5:
> transformers: Create tuple explicitely for older compiler support
> core/sstring: Add construction from `string_view`
> io_tester: enhanced fair queue tester
> fstream: do not ignore dma_write return value
The following patches convert sstable writers to use flat mutation
readers instead of the legacy mutation_reader interface.
Writers were already using flat consumer interface and used
consume_flattened_in_thread(), so most of the work was limited to
providing an appropriate equivalent for flat mutation readers.
* https://github.com/pdziepak/scylla.git flat_mutation_reader-sstable-write/v1:
flat_mutation_reader: move consumer_adapter out of consume()
flat_mutation_reader: introduce consume_in_thread()
tests/flat_mutation_reader: test consume_in_thread()
sstables: switch write_components() to flat_mutation_reader
streamed_mutation: drop streamed_mutation_returning()
sstables: convert compaction to flat_mutation_reader
mutation_reader: drop consume_flattened_in_thread()
This series mainly fixes issues with the serialization of promoted
index entries for non-compound schemas and with the serialization of
range tombstones, also for non-compound schemas.
We lift the correct cell name writing code into its own function,
and direct all users to it. We also ensure backward compatibility with
incorrectly generated promoted indexes and range tombstones.
Fixes #2995
Fixes #2986
Fixes #2979
Fixes #2992
Fixes #2993
* git@github.com:duarten/scylla.git promoted-index-serialization/v3:
sstables/sstables: Unify column name writers
sstables/sstables: Don't write index entry for a missing row maker
sstables/sstables: Reuse write_range_tombstone() for row tombstones
sstables/sstables: Lift index writing for row tombstones
sstables/sstables: Leverage index code upon range tombstone consume
sstables/sstables: Move out tombstone check in write_range_tombstone()
sstables/sstables: A schema with static columns is always compound
sstables/sstables: Lift column name writing logic
sstables/sstables: Use schema-aware write_column_name() for collections
sstables/sstables: Use schema-aware write_column_name() for row marker
sstables/sstables: Use schema-aware write_column_name() for static row
sstables/sstables: Writing promoted index entry leverages column_name_writer
sstables/sstables: Add supported feature list to sstables
sstables/sstables: Don't use incorrectly serialized promoted index
cql3/single_column_primary_key_restrictions: Implement is_inclusive()
cql3/delete_statement: Constrain range deletions for non-compound schemas
tests/cql_query_test: Verify range deletion constraints
sstables/sstables: Correctly deserialize range tombstones
service/storage_service: Add feature for correct non-compound RTs
tests/sstable_*: Start the storage service for some cases
sstables/sstable_writer: Prepare to control range tombstone serialization
sstables/sstables: Correctly serialize range tombstones
tests/sstable_assertions: Fix monotonicity check for promoted indexes
tests/sstable_assertions: Assert a promoted index is empty
tests/sstable_mutation_test: Verify promoted index serializes correctly
tests/sstable_mutation_test: Verify promoted index repeats tombstones
tests/sstable_mutation_test: Ensure range tombstone serializes correctly
tests/sstable_datafile_test: Add test for incorrect promoted index
tests/sstable_datafile_test: Verify reading of incorrect range tombstones
sstables/sstable: Rename schema-oblivious write_column_name() function
sstables/sstables: No promoted index without clustering keys
tests/sstable_mutation_test: Verify promoted index is not generated
sstables/sstables: Optimize column name writing and indexing
compound_compat: Don't assume compoundness
A TTL of 1 second may cause the cell to expire right after we write it,
if the seconds component of the current time changes right after that. Use
a larger TTL to avoid spurious failures due to this.
Message-Id: <1511463392-1451-1-git-send-email-tgrabiec@scylladb.com>
This patch changes some factory functions so that they don't assume
the schema is compound.
This enables some code simplification in
sstables::write_column_name().
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Instead of serializing the column name twice, serialize it once into a
buffer which gets used for index bookkeeping and to write to disk.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
flat_mutation_reader provides a replacement for the old
consume_flattened*() interface and therefore an 'in-thread' variant is
also necessary. It expects to be executed in a seastar::thread context
and guarantees that the consumer member functions will be invoked inside
that thread as well (which is why it cannot be easily replaced by the
non-thread version).
In addition, just like the old consume_flattened_in_thread(), its
replacement allows specifying a filter function that causes selected
partitions to be skipped entirely and never reach the consumer.
This function is now called write_compound_non_dense_column_name() so
callers are aware of the cases in which it can be called.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Add a test to verify that we can still read incorrectly written range
tombstones for non-compound schemas, for previous Scylla versions.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch ensures we correctly serialize range tombstones for dense
non-compound schemas, which until now assumed the bounds were compound
composite. We also fix the reading function, which assumed the same
thing. This affected Apache Cassandra compatibility.
Fixes #2986
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds support to sstable_writer for controlling correct
range tombstone serialization.
When range tombstone serialization is fixed in subsequent patches, it
will only be enabled once the whole cluster supports the feature, to
allow for rollbacks.
The feature needs to be enabled for an sstable as a whole, to prevent
problems with it being enabled during an sstable write.
Thus, the sstable writer will pass on this information to the sstable
methods that carry out the actual file writing.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a cluster feature to enable correct serialization of
non-compound range tombstones. We thus support rollbacks during an
upgrade, as we will only change range tombstone serialization when the
cluster is fully upgraded and all nodes are capable of reading the new
format.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the range tombstone read path to deal with
correctly written non-compound range tombstones, while also
maintaining backward compatibility and reading old Scylla-generated
range tombstones.
The fix for the write path will activate an sstable feature which will
connect with this patch.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We cannot represent ranged deletions with non-inclusive bounds on our
current storage format for schemas that are non-compound, since the
clustering key won't include the EOC byte.
Refs #2986
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Promoted indexes generated before this patch by Scylla are considered
incorrect if they belong to a non-compound schema, due to #2993.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds additional metadata to the scylla sstable component.
Namely, it adds a list of features that the current sstable supports.
The upcoming usages of the feature list are meant for backward
compatibility, but the implementation makes no such assumptions.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch refactors writing a promoted index entry to leverage the
column_name_writer. It not only reduces code duplication, but also
solves two important bugs:
1) Column names for schema types other than compound non-dense were
not correctly serialized, as the wrong overload of
write_column_name() was being called, which assumed the specified
composite to be compound.
2) Before, for some schema types we were passing an empty
clustering_key to maybe_flush_pi_block(), which caused it to bypass
appending open range tombstones to the data file, causing wrong
query results to be returned.
Fixes #2979
Fixes #2992
Fixes #2993
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch lifts the logic to write a column name depending on the
schema's denseness and compoundness into a function, so that it may
later be reused in other places. We still duplicate the same logic
when writing a clustered row because the index writer requires it for
now.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
A schema can only have static columns if it has at least one
clustering column. A schema with a clustering column is always
compound, unless it is created with compact storage. A schema created
with compact storage cannot have static columns, so we can remove dead
code from the sstable write path.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Encapsulate the decision to write the row_marker and to write a
corresponding entry in the promoted index. We now avoid writing the
index entry if there is no row marker, and just start indexing the row
at the first cell.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Making consumer_adapter a member of flat_mutation_reader::impl instead
of being a local class in consume() will make it possible to reuse that
helper in other functions.
It's hard to make sense of the metric transport.requests_blocked_memory
because it shows a queue size. Especially in production setups scraping
every 15 seconds, that doesn't tell us much.
We solve that in other layers that record blocking by providing both
requests_blocked_memory and requests_blocked_memory_current metrics.
Fixes#3010
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171123033329.32596-1-glauber@scylladb.com>
Prometheus histograms have 3 embedded metrics: count, buckets, and sum.
Currently we fill up count and buckets but sum is left at 0. This is
particularly bad, since according to the prometheus documentation, the
best way to calculate histogram averages is to write:
rate(metric_sum[5m]) / rate(metric_count[5m])
One way of keeping track of the sum is to add the value we sampled
every time we sample. However, the estimated histogram interface has a
method, add_nano(), that adds a value while also adjusting the count
for missing samples.
That makes accumulating a sum inaccurate, as we will have no values for
the points that were added. To overcome that, when we call add_nano(),
we pretend we are introducing new_count - _count samples, all with the
same value.
Long term, doing away with sampling may help us provide more accurate
results.
After this patch, we are able to correctly calculate latency averages
through the data exported in prometheus.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171122144558.7575-1-glauber@scylladb.com>
* seastar-dev.git haaawk/flat_reader_remove_read_rows:
sstable_mutation_test: use read_rows_flat instead of read_rows
perf_sstable: use read_rows_flat instead of read_rows
Remove sstable::read_rows
Introduce sstable::read_row_flat and sstable::read_range_rows_flat methods
and use them in sstable::as_mutation_source.
* https://github.com/scylladb/seastar-dev/tree/haaawk/flat_reader_sstables_v3:
Introduce conversion from flat_mutation_reader to streamed_mutation
Add sstables::read_rows_flat and sstables::read_range_rows_flat
Turn sstable_mutation_reader into a flat_mutation_reader
sstable: add getter for filter_tracker
Move mp_row_consumer methods implementations to the bottom
Remove unused sstable_mutation_reader constructor
Replace "sm" with "partition" in get_next_sm and on_sm_finished
Move advance_to_upper_bound above sstable_mutation_reader
Store sstable_mutation_reader pointer in mp_row_consumer
Stop using streamed_mutation in consumer and reader
Stop using streamed_mutation in sstable_data_source
Delete sstable_streamed_mutation
Introduce sstable::read_row_flat
Migrate sstable::as_mutation_source to flat_mutation_reader
Remove single_partition_reader_adaptor
Merge data_consume_context::impl into data_consume_context
Create data_consume_context_opt.
Merge on_partition_finished into mark_partition_finished
Check _partition_finished instead of _current_partition_key
Merge sstable_data_source into sstable_mutation_reader
Remove sstable_data_source
Remove get_next_partition and partition_header
to check whether the partition is finished. In the next patch,
_current_partition_key will be merged with sstable_data_source::_key
and won't be cleared any more.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This will be used in sstable_mutation_reader before
first fill_buffer is called and a proper data_consume_context
is created.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Since we want to support cross building, we shouldn't hardcode the GPG
file path, even though these files are provided by recent versions of mock.
This fixes a build error on some older build environments such as CentOS-7.2.
Fixes#3002
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1511277722-22917-1-git-send-email-syuu@scylladb.com>
These patches convert queries (data, mutation and counter) to flat
mutation readers. All of them already use consume_flattened() to
consume a flat stream of data, so the only major missing thing
was adding support for reversed partitions to
flat_mutation_reader::consume().
* pdziepak flat_mutation_reader-queries/v3-rebased:
flat_mutation_reader: keep reference to decorated key valid
flat_muation_reader: support consuming reversed partitions
tests/flat_mutation_reader: add test for
flat_mutation_reader::consume()
mutation_partition: convert queries to flat_mutation_readers
tests/row_cache_stress_test: do not use consume_flattened()
mutation_reader: drop consume_flattened()
streamed_mutation: drop reverse_streamed_mutation()
Some queries may need the fragments that belong to a partition to be
emitted in reversed order. Current support for that is very limited
(see #1413), but should work reasonably well for small partitions.
consume_flattened() guarantees that partition key (passed by reference)
will be valid until the end of partition.
flat_mutation_reader::consume() provides the same interface for consumer
so it also should make sure that the key remains valid.
For a time, a range tombstone that was already removed from the tree
is owned by a raw pointer. This doesn't end well if creation of
a mutation fragment or a call to push_mutation_fragment() throws.
Message-Id: <20171121105749.16559-1-pdziepak@scylladb.com>
Don't std::move() the "query" string inside the parallel_for_each() lambda.
parallel_for_each is going to invoke the given callback object for each
element of the range, and as a result the first call of the lambda that
std::move()s "query" is going to destroy it for all subsequent calls.
Fixes #2998
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1511225744-1159-1-git-send-email-vladz@scylladb.com>
This will be used together with sstables::read_range_rows
to migrate sstables::as_mutation_source().
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Don't use streamed_mutation in mp_row_consumer
and sstable_mutation_reader.
Also use sstable_mutation_reader in sstable::read_row.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
* seastar 78cd87f...7f87529 (3):
> exception: use phdr hash on reactor threads only
> tests: httpd use noncopyable_function
> Merge "fixes of issues found by seastar's unit tests" (ppc) from Vlad
Fixes #2967.
Currently flat_mutation_reader_from_mutation_reader()'s
converting_reader will throw std::runtime_error if fast_forward_to() is
called when its internal streamed_mutation_opt is disengaged. This can
create problems if this reader is a sub-reader of a combined reader as the
latter has no way to determine the source of a sub-reader EOS. A reader
can be in EOS either because it reached the end of the current
position_range or because it doesn't have any more data.
To avoid this, instead of throwing we just silently ignore the fact that
the streamed_mutation_opt is disengaged and set _end_of_stream to true
which is still correct.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <83d309b225950bdbbd931f1c5e7fb91c9929ba1c.1511180262.git.bdenes@scylladb.com>
The exception handling code inspects server state, which could be
destroyed before the handle_exception() task runs since it runs after
exiting the gate. Move the exception handling inside the gate and
avoid scheduling another accept if the server has been stopped.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171116122921.21273-1-duarte@scylladb.com>
We get the following warning from an antlr3 header when we compile Scylla with gcc-7.2:
/opt/scylladb/include/antlr3bitset.inl: In member function 'antlr3::BitsetList<AllocatorType>::BitsetType* antlr3::BitsetList<AllocatorType>::bitsetLoad() [with ImplTraits = antlr3::TraitsBase<antlr3::CustomTraitsBase>]':
/opt/scylladb/include/antlr3bitset.inl:54:2: error: nonnull argument 'this' compared to NULL [-Werror=nonnull-compare]
To make it compile, we need to add '-Wno-nonnull-compare' to cflags.
Message-Id: <1510952411-20722-2-git-send-email-syuu@scylladb.com>
Switch Debian 3rdparty packages to our OBS repo
(https://build.opensuse.org/project/subprojects/home:scylladb).
We don't use 3rdparty packages in dist/debian/dep, so they were dropped.
We also switch Debian to gcc-7.2/boost-1.63 at the same time.
Due to packaging issues, the following packages don't follow our 3rdparty
package naming rule for now:
- gcc-7: renamed as 'xxx-scylla72', instead of scylla-xxx-72.
- boost1.63: not renamed, and its prefix was not changed to /opt/scylladb
Message-Id: <1510952411-20722-1-git-send-email-syuu@scylladb.com>
This series reworks handling of range tombstones in reversed queries
so that they are applied to correct rows. Additionally, the concept
of flipped range tombstones is removed, since it only made it harder
to reason about the code.
Fixes #2982.
* https://github.com/pdziepak/scylla fix-reverse-query-range-tombstone/v2:
streamed_mutation: fix reversing range tombstones
range_tombstone: drop flip()
tests/cql_query_test: test range tombstones and reverse queries
tests/range_tombstone_list: add test for range_tombstone_accumulator
It will be used in sstable_mutation_reader when the reader
will be used to implement sstable::read_row.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Streamed mutation won't be used any more so get_next_partition
and on_partition_finished are more suitable names.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Those methods have to be below sstable_mutation_reader because
they will be using the reader instead of streamed_mutation.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This is the first step which still uses streamed_mutation.
Next step will be to get rid of streamed_mutation.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Right now a reversed streamed mutation emits range tombstones after the
mutation fragments affected by them. This breaks queries.
This patch reworks the way range tombstones are handled in reversed
streams:
- range tombstones are no longer flipped -- the invariant that the start
bound is smaller than the end bound always holds
- in reversed streams they are ordered by their end_position()
Fixes #2982.
The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread.
All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:
1. Before generating patches, make sure your Git configuration points to `.gitorderfile`. You can do it by running
```bash
$ git config diff.orderfile .gitorderfile
```
2. If you are sending more than a single patch, push your changes into a new branch of your fork of Scylla on GitHub and add a URL pointing to this branch to your cover letter.
3. If you are sending a new revision of an earlier patchset, add a brief summary of changes in this version, for example:
```
In v3:
- declared move constructor and move assignment operator as noexcept
- used std::variant instead of a union
...
```
4. Add information about the tests run with this fix. It can look like
```
"Tests: unit ({mode}), dtest ({smp})"
```
Usually this is "Tests: unit (release)", although running debug tests is encouraged.
5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.
### Finding a person to review and merge your patches
You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows: