Compare commits


478 Commits

Avi Kivity
b2eb0810a2 build: support for alternative versions of libsystemd pkgconfig
While pkgconfig is supposed to be a distribution and version neutral way
of detecting packages, it doesn't always work this way.  The sd_notify()
manual page documents that sd_notify is available via the libsystemd
package, but on centos 7.0 it is only available via the libsystemd-daemon
package (on centos 7.1+ it works as expected).

Fix by allowing for alternate versions of package names, testing each one
until a match is found.

Fixes #879.

Message-Id: <1454858862-5239-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 8b0a26f06d)
2016-02-07 17:38:10 +02:00
Avi Kivity
14d029bf71 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"

(cherry picked from commit f3ca597a01)
2016-02-04 16:43:09 +02:00
Pekka Enberg
38470b4d28 release: prepare for 0.17 2016-01-28 14:44:40 +02:00
Raphael S. Carvalho
3b7970baff compaction: delete generated sstables in event of an interrupt
Generated sstables may be either fully or partially written.
Compaction is interrupted if it was deliberately asked to stop (stop API)
or it was forced to do so in the event of a failure, e.g. running out of disk
space. There is a need to explicitly delete sstables generated by a compaction
that was interrupted. Otherwise, such sstables will waste disk space and
even worsen read performance, which degrades as the number of generations
to look at increases.
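
A rough sketch of the idea (hypothetical names; the actual Scylla interfaces
differ), hooking the cleanup into the compaction's failure path:

    // Sketch only: on any compaction failure or deliberate stop, remove the
    // partially/fully written output sstables. do_compaction() and
    // remove_sstable_files() are illustrative, not real Scylla functions.
    future<> compact(std::vector<sstables::shared_sstable> candidates) {
        auto new_sstables = make_lw_shared<std::vector<sstables::shared_sstable>>();
        return do_compaction(candidates, new_sstables).handle_exception(
            [new_sstables] (std::exception_ptr ep) {
                for (auto& sst : *new_sstables) {
                    remove_sstable_files(sst); // reclaim space, keep generation count low
                }
                return make_exception_future<>(ep); // keep propagating the failure
            });
    }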

Fixes #852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <49212dbf485598ae839c8e174e28299f7127f63e.1453912119.git.raphaelsc@scylladb.com>
2016-01-28 14:05:57 +02:00
Pekka Enberg
3c3c819280 Merge "api: Fix stream_manager" from Asias
"Fix the metrics for bytes sent and received"
2016-01-28 13:57:59 +02:00
Raphael S. Carvalho
ba4260ea8f api: print proper compaction type
There are several compaction types, and we should print the correct
one when listing ongoing compactions. Currently, we only support
two compaction types: COMPACTION and CLEANUP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>
2016-01-28 13:47:00 +02:00
Tomasz Grabiec
9fa62af96b database: Move implementation to .cc
Message-Id: <1453980679-27226-1-git-send-email-tgrabiec@scylladb.com>
2016-01-28 13:35:33 +02:00
Tomasz Grabiec
ca6bafbb56 canonical_mutation: Remove commented out junk 2016-01-28 12:29:20 +01:00
Tomasz Grabiec
41dc98bb79 Merge branch 'cleanup_improvements' from git@github.com:raphaelsc/scylla.git
Compaction cleanup improvements from Raphael.
2016-01-27 18:30:46 +01:00
Avi Kivity
873deb5808 Merge "move paging_state to use idl" from Gleb 2016-01-27 19:06:04 +02:00
Takuya ASADA
03caacaad0 dist: enable collectd client by default
Fixes #838

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-2-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:45 +02:00
Takuya ASADA
f33656ef03 dist: eliminate startup script
Fixes #373

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-1-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:35 +02:00
Gleb Natapov
b065e2003f Move paging_state to use idl 2016-01-27 18:39:43 +02:00
Raphael S. Carvalho
45c446d6eb compaction: pass dht::token by reference
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:25:41 -02:00
Raphael S. Carvalho
fc541e2f08 compaction: remove code to sort local ranges
storage_service::get_local_ranges returns sorted ranges, which are
non-overlapping and do not wrap around. As a result, there is no need for
the consumer to do anything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:15:36 -02:00
Avi Kivity
c75d1c4eeb Merge "Ubuntu 'expect stop' related fixes" from Takuya 2016-01-27 17:00:23 +02:00
Gleb Natapov
65bd429a0b Add serialization helper to use outside of rpc. 2016-01-27 16:43:06 +02:00
Takuya ASADA
4162fb158c main: raise SIGSTOP only when scylla becomes ready
supervisor_notify() is called periodically to log a message to systemd,
so raise(SIGSTOP) would be called multiple times; upstart does not expect that.
We need to call it just one time.
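
A minimal sketch of the raise-once guard (function name illustrative, not the
actual code):

    #include <csignal>

    // Raise SIGSTOP only on the first readiness notification; upstart's
    // "expect stop" protocol expects exactly one SIGSTOP from the daemon.
    void notify_ready_upstart() {
        static bool raised = false;
        if (!raised) {
            raised = true;
            raise(SIGSTOP);
        }
    }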

Fixes #846

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:26 +09:00
Takuya ASADA
851951d32d dist: run upstart job as 'scylla' user
Don't use sudo when launching scylla; run it directly from upstart.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:02 +09:00
Takuya ASADA
89f0fc89b4 dist: set ulimit in upstart job
An upstart job is able to specify ulimits like systemd, so drop ubuntu's scylla_run and merge it with the redhat one.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:52 +09:00
Takuya ASADA
b4accd8904 main: autodetect systemd/upstart
We can autodetect systemd/upstart from environment variables; a program argument is not needed.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:32 +09:00
Takuya ASADA
559f913494 dist: use nightly for prebuilt 3rdparty packages (CentOS)
Developers probably want to use the latest dependency packages, so switch to nightly.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453904521-2716-1-git-send-email-syuu@scylladb.com>
2016-01-27 16:24:49 +02:00
Gleb Natapov
19c55693fd idl: add missing header to serializer.hh 2016-01-27 15:49:29 +02:00
Amnon Heiman
7b53b99968 idl-compiler: split the idl list
Not all the idls are used by the messaging service. This patch removes
the auto-generated single include file that holds all the files and
replaces it with individual includes of the generated files.
The patch does the following:
* It removes the auto-generated inc file and cleans it out of
  configure.py.
* It places an explicit include for each generated file in
  messaging_service.
* It adds a dependency of the generated code on the idl-compiler, so a
change in the compiler will trigger recreation of the generated files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453900241-13053-1-git-send-email-amnon@scylladb.com>
2016-01-27 15:23:00 +02:00
Pekka Enberg
86173fb8cc db/commitlog: Fix debug log format string in commitlog_replayer::recover()
I saw the following Boost format string related warning during commitlog
replay:

  INFO  [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
  WARN  [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)

While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure was triggered,
I wasn't able to verify that it was the only one, though.

Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
2016-01-27 13:40:19 +02:00
Pekka Enberg
c4dafe24f5 Update scylla-ami submodule
* dist/ami/files/scylla-ami 77cde04...e284bcd (2):
  > Run scylla.yaml construction only once
  > Revert "Run scylla.yaml construction only once"
2016-01-27 13:30:04 +02:00
Asias He
9fee1cc43a api: Use get_bytes_{received,sent} in stream_manager
The data in session_info is not correctly updated.

Tested while decommissioning a node:

$ curl -X GET  --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/incoming";echo

$ curl -X GET --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/outgoing";echo
2016-01-27 18:17:36 +08:00
Asias He
03aced39c4 streaming: Account number of bytes sent and received per session
The API will consume it soon.
2016-01-27 18:16:58 +08:00
Asias He
36829c4c87 api: Fix stream_manager total_incoming/outgoing bytes
Any stream, whether initiated by us or by a peer node,
can send and receive data. We should account incoming/outgoing bytes in
all streams.
2016-01-27 18:15:09 +08:00
Asias He
08f703ddf6 streaming: Add get_all_streams in stream_manager
Get all streams, both those initiated by us and those initiated by a peer node.
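
A sketch of the shape of such an accessor (member names illustrative):

    // Collect every stream_result_future, whether we initiated the plan or a
    // peer did. _initiated_streams/_receiving_streams are illustrative names.
    std::vector<shared_ptr<stream_result_future>> get_all_streams() const {
        std::vector<shared_ptr<stream_result_future>> all;
        for (auto& x : _initiated_streams) {
            all.push_back(x.second);
        }
        for (auto& x : _receiving_streams) {
            all.push_back(x.second);
        }
        return all;
    }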
2016-01-27 18:15:09 +08:00
Tomasz Grabiec
c971544e83 bytes_ostream: Adapt to Output concept used in serializer.hh
Message-Id: <1453888242-2086-1-git-send-email-tgrabiec@scylladb.com>
2016-01-27 12:13:34 +02:00
Gleb Natapov
6a581bb8b6 messaging_service: replace rpc::type with boost::type
RPC moved to boost::type to make serializers less rpc centric. Scylla
should follow.

Message-Id: <20160126164450.GA11706@scylladb.com>
2016-01-27 11:57:45 +02:00
Gleb Natapov
6f6b231839 Make serializer use new simple stream location
Message-Id: <20160127093045.GG9236@scylladb.com>
2016-01-27 11:37:37 +02:00
Raphael S. Carvalho
d54c77d5d0 change abstract_replication_strategy::get_ranges to not return wrap-arounds
The main motivation behind this change is to make it easier for
consumers to work with the ranges returned by get_ranges(), e.g. to
binary search for the range in which a token is contained. In addition, a
wrap-around range introduces corner cases, so we should avoid it altogether.

Suppose that a node owns three tokens: -5, 6, 8

get_ranges() would return the following ranges:
(8, -5], (-5, 6], (6, 8]
get_ranges() will now return the following ranges:
(-inf, -5], (-5, 6], (6, 8], (8, +inf)
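
Conceptually, unwrapping replaces the single wrap-around range with its two
non-wrapping halves; a sketch (type and factory names illustrative):

    // (start, end] with start > end wraps around the ring; split it into
    // (start, +inf) and (-inf, end]. range_t and its factories are not the
    // actual Scylla interfaces.
    std::pair<range_t, range_t> unwrap(token start, token end) {
        return { range_t::open_ended_after(start),    // (start, +inf)
                 range_t::open_ended_before(end) };   // (-inf, end]
    }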

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <4bda1428d1ebbe7c8af25aa65119edc5b97bc2eb.1453827605.git.raphaelsc@scylladb.com>
2016-01-27 09:48:31 +01:00
Avi Kivity
b9ab28a0e6 Merge "storage_service: add drain on shutdown logic & fix" from Asias
"Fixes:
- storage_service::handle_state_removing() doesn't call drain() #825
https://github.com/scylladb/scylla/issues/825

- nodetool gossipinfo is out of sync #790
https://github.com/scylladb/scylla/issues/790"
2016-01-27 10:38:56 +02:00
Avi Kivity
1d7144ac14 Merge seastar upstream
* seastar bdb273a...ec468ba (1):
  > Move simple streams used for serialization into separate header
2016-01-27 10:38:09 +02:00
Amnon Heiman
fd94009d0e Fix API init process
The last patch of the API init process had a bug: the wrong init
function was called.

This solves the issue.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453879042-26926-1-git-send-email-amnon@scylladb.com>
2016-01-27 10:03:24 +02:00
Asias He
8b4275126d storage_service: Shutdown messaging_service in decommission
It is commented out.
2016-01-27 11:48:49 +08:00
Asias He
b2f2c1c28c storage_service: Add drain on shutdown logic
We register engine().at_exit() callbacks when we initialize the services. We
do not actually call the callbacks at the moment due to #293.

It is pretty hard to see the whole picture of the order in which the services
are shut down. Instead of having each service register its own at_exit()
callback, I propose a single at_exit() callback which does the
shutdown for all the services. In cassandra, the shutdown work is done
in the storage_service::drain_on_shutdown callback.

In this patch, drain_on_shutdown is executed during shutdown.

As a result, the proper gossip shutdown is executed, which fixes #790.
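
A sketch of the single callback (assuming the usual Seastar engine().at_exit()
registration; `ss` stands in for the sharded storage_service):

    // One place to express the shutdown ordering instead of scattered
    // per-service callbacks.
    engine().at_exit([&ss] {
        return ss.local().drain_on_shutdown();
    });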

With this patch, when Ctrl-C on a node, it looks like:

INFO  [shard 0] storage_service - Drain on shutdown: starts
INFO  [shard 0] gossip - Announcing shutdown
INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
INFO  [shard 0] storage_service - Drain on shutdown: stop_gossiping done
INFO  [shard 0] storage_service - CQL server stopped
INFO  [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done
INFO  [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO  [shard 0] storage_service - Drain on shutdown: done
2016-01-27 11:45:52 +08:00
Asias He
e733930dff storage_service: Call drain inside handle_state_removing
Now that drain is implemented, call it.

Fixes #825
2016-01-27 11:45:52 +08:00
Asias He
5003c6e78b config: Introduce shutdown_announce_in_ms option
Time, in milliseconds, that a node waits after sending the gossip shutdown message.

Reduces ./cql_query_test execution time

from
   real    2m24.272s
   user    0m8.339s
   sys     0m10.556s

to
   real    1m17.765s
   user    0m3.698s
   sys     0m11.578s
2016-01-27 11:19:38 +08:00
Paweł Dziepak
490201fd1c row_cache: protect against stale entries
row_cache::update() does not explicitly invalidate the entries it failed
to update in case of a failure. This could lead to inconsistency between
row cache and sstables.

In practice that's not a problem because before row_cache::update()
fails it will cause all entries in the cache to be invalidated during
memory reclaim, but it's better to be safe and explicitly remove entries
that should have been updated but could not be.
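
The defensive shape of the fix, as a sketch (helper names illustrative; the
real code is future-based rather than a plain try/catch):

    // If the update fails partway, explicitly invalidate the entries that
    // were not brought up to date instead of relying on memory reclaim.
    try {
        apply_update(m);          // illustrative update step
    } catch (...) {
        invalidate_remaining(m);  // illustrative cleanup helper
        throw;
    }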

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>
2016-01-26 20:34:41 +01:00
Takuya ASADA
9b66d00115 dist: fix scylla_bootparam_setup for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453836012-6436-1-git-send-email-syuu@scylladb.com>
2016-01-26 21:24:20 +02:00
Erich Keane
c836c88850 Replace deprecated BOOST_MESSAGE with BOOST_TEST_MESSAGE
Boost Unit Test deprecated BOOST_MESSAGE as early as 1.34 and has since
permanently removed it.  This patch replaces all uses of BOOST_MESSAGE
with BOOST_TEST_MESSAGE.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1453783854-4274-1-git-send-email-erich.keane@verizon.net>
2016-01-26 19:01:40 +02:00
Amnon Heiman
b1845cddec Breaking the API initialization into stages
The API needs to be available at an early stage of the initialization;
on the other hand, not all the specific APIs are available at that time.

This patch breaks the API initialization into stages; in each stage
additional commands become available.

While doing that, the api header file was broken into api_init.hh, which
is relevant to main, and api.hh, which holds the different
api helper functions.

Fixes #754

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>
2016-01-26 17:41:31 +02:00
Calle Wilund
e6b792b2ff commitlog bugfix: Fix batch mode
The last series accidentally broke batch mode.
With the new, fancy, potentially blocking code paths, we need to treat
batch mode differently, since in this case sync should always
come _after_ alloc-write.
The previous patch caused an infinite loop and broke jenkins.

Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
2016-01-26 17:13:14 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
At the request of Avi, who reminds us that auto& is better suited for
situations in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Avi Kivity
71eb79aedd main: exit with code 0 on shutdown
To avoid confusing systemd.

Fixes #823.

Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>
2016-01-26 16:26:53 +02:00
Calle Wilund
89dc0f7be3 commitlog: wait for writes (if needed) on new segment as well
Also check the closed status in allocate, since waiting in the alloc queue
could lead to us re-allocating in a segment that gets closed between
entering the queue and running the continuation.

Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
2016-01-26 15:05:12 +02:00
Shlomi Livne
0a553dae1f Fix test.py invocation of sstable_test
Invoking sstable_test as "./test.py  --name sstable_test --mode
release --jenkins a"
ran ... --log_sink=a.release.sstable_test -c1.boost.xml", which caused
the test to fail "with error code -11"; fix that.

In addition, the boost test printout was bad; fix that as well.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3af8c4b55beae673270f5302822d7b9dbba18c0f.1453809032.git.shlomi@scylladb.com>
2016-01-26 12:56:26 +01:00
Avi Kivity
fbf56b3d98 Merge "Commit log threshold / back pressure" from Calle
"Adds flush + write thresholds/limits that, when reached, causes
operations to wait before being issued.
Write ops waiting also causes further allocations to queue up,
i.e. limiting throughput.

Adds getters for some useful "backlog" measurements:

* Pending (ongoing) writes/flush
* Pending (queued, wating) allocations
* Num times write/flush threshold has been exceeded (i.e. waits occured)
* Finished, dirty segments
* Unused (preallocated) segments"
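
A sketch of how such a threshold can translate into waiting, using a
seastar::semaphore for illustration (the actual commitlog keeps its own
counters; names are illustrative):

    // Writers queue on the semaphore once max_pending_writes are in flight,
    // which in turn back-pressures new allocations.
    semaphore _write_limit{max_pending_writes};

    future<> write_segment(temporary_buffer<char> buf) {
        return with_semaphore(_write_limit, 1, [this, buf = std::move(buf)] () mutable {
            return do_write(std::move(buf)); // illustrative write call
        });
    }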
2016-01-26 13:19:58 +02:00
Avi Kivity
a53788d61d Merge "More streaming cleanup and fix" from Asias
"- Drop compression_info/stream_message
- Cleanup outgoing_file_message/prepare_message
- Fix stream manager API (more to come)"
2016-01-26 13:17:58 +02:00
Avi Kivity
486d937111 Merge seastar upstream
* seastar 97f418a...bdb273a (6):
  > rpc: alias rpc::type to boost::type
  > Fix warning_supported to properly work with Clang
  > rpc: change 'overflow' to 'underflow' in input stream processing
  > rpc: log an error that caused connection to be closed.
  > rpc: clarify deserialization error message
  > rpc: do not append new line in a logger
2016-01-26 12:58:32 +02:00
Avi Kivity
5ad4b59f99 Update scylla-ami submodule
* dist/ami/files/scylla-ami 188781c...77cde04 (1):
  > Run scylla.yaml construction only once
2016-01-26 12:58:11 +02:00
Takuya ASADA
12748cf1b9 dist: support CentOS AMI on scylla_ntp_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453801319-26072-1-git-send-email-syuu@scylladb.com>
2016-01-26 12:55:08 +02:00
Calle Wilund
f2c5315d33 commitlog: Add write/flush limits
Configured on start (for now - and with dummy values at that).
When the shard write/flush count reaches the limit, incoming ops will queue
until previous ones finish.

Consequently, if an allocation op forces a write, which blocks, any
other incoming allocations will also queue up to provide back pressure.
2016-01-26 10:19:24 +00:00
Calle Wilund
7628a4dfe0 commitlog: Add some feedback/measurement methods
Suitable to derive "back pressure" from.
2016-01-26 09:47:14 +00:00
Calle Wilund
4f5bd4b64b commitlog: split write/flush counters 2016-01-26 09:47:14 +00:00
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in eclipse 2016-01-26 09:42:26 +00:00
Calle Wilund
61c7235c11 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-26 09:42:08 +00:00
Avi Kivity
0de7d1fc1b Merge "Add priority classes to our I/O path" from Glauber
"After the patch, all of our relevant I/O is placed on a specific priority class.
The ones which are not are left in Seastar's default priority, which will
effectively work as an idle class.

Examples of such I/O are commitlog replay and initial SSTable loading. Since they
will happen during initialization, they will run uncontended, and do not justify
having a class on their own."
2016-01-26 10:46:13 +02:00
Asias He
750573ca0c configure: Fix idl indentation 2016-01-26 15:04:45 +08:00
Asias He
cc6d928193 api: Fix peer -> streaming_plan id in stream_manager
It is wrong to get a stream plan id like below:

   utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);

We should look at all stream_sessions with the peer in question.
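
A sketch of the corrected lookup (member names and the predicate are
illustrative):

    // Collect the plan ids of every stream session involving the given peer,
    // instead of misusing the gossiper's host id as a plan id.
    std::vector<utils::UUID> get_plan_ids_for_peer(gms::inet_address peer) {
        std::vector<utils::UUID> ids;
        for (auto& x : _streams) {          // plan_id -> stream_result_future
            if (x.second->involves(peer)) { // illustrative predicate
                ids.push_back(x.first);
            }
        }
        return ids;
    }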
2016-01-26 15:00:44 +08:00
Asias He
384e81b48a streaming: Add get_peer_session_info
Like get_all_session_info, but only gets the session_info for a specific
peer.
2016-01-26 14:52:40 +08:00
Asias He
c7b156ed65 api: Fix get_{all}total_outgoing_byte in stream_manager
We should call get_total_size_sent instead of get_total_size_received
for outgoing bytes.
2016-01-26 14:22:43 +08:00
Asias He
2e69d50c0c streaming: Cleanup prepare_message
- Drop empty prepare_message.cc
- Drop #if 0'ed code
2016-01-26 13:14:04 +08:00
Asias He
bbf025968b streaming: Cleanup outgoing_file_message
- Drop the unused headers
- Drop the outgoing_file_message.cc file which is empty
2016-01-26 13:12:01 +08:00
Asias He
e8b8b454df streaming: Flatten streaming messages class namespace
There are only two messages: prepare_message and outgoing_file_message.
Actually, only prepare_message is sent on the wire.
Flatten the namespace.
2016-01-26 13:04:29 +08:00
Asias He
cab36a450b streaming: Remove stream_message
It is not useful to keep stream_message as the base class for stream
messages. Scylla uses RPC verbs to distinguish different message types.
2016-01-26 12:32:17 +08:00
Asias He
6a067bcc23 streaming: Drop unused compression_info 2016-01-26 11:55:36 +08:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

There are five available classes, defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
261c272178 introduce a priority manager
After the introduction of the Fair I/O Queueing mechanism in Seastar,
it is possible to add requests to a specific priority class, that will
end up being serviced fairly.

This patch introduces a Priority Manager service, that manages the priority
each class of request will get. At this moment, having a class for that may
sound like an overkill. However, the most interesting feature of the Fair I/O
queue comes from being able to adjust the priorities dynamically as workloads
changes: so we will benefit from having them all in the same place.

This is designed to behave like one of our services, with the exception that
it won't use the distributed interface. This is mainly because there is no
reason to introduce that complexity at this point - since we can do thread local
registration as we have been doing in Seastar, and because that would require us
to change most of our tests to start a new service.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
The whole SSTable read path can now take an io_priority. The public functions
take a default parameter, which is Seastar's default priority.
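
An illustrative shape of such a default-priority signature (not the actual
function):

    // Callers that don't care run at Seastar's default priority; internal
    // callers thread an explicit io_priority_class through.
    future<index_list> read_indexes(uint64_t summary_idx,
            const io_priority_class& pc = default_priority_class());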

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
03d5a89b90 sstables: mandate a buffer size parameter for data_stream_at
The only user for the default size is data_read, sitting at row.cc.
That reader wants to read and process a chunk all at once. So there's
really no reason to use the default buffer size - except that this code
is old.

We should do as we do in other single-key / single-range readers and
try to read all at once if possible, by looking at the size we received
as a parameter. Cleaning up the data_stream_at interface then comes as
a nice side effect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
15336e7eb7 key_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
58fdae33bd mutation_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.
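
A sketch of the difference (types illustrative):

    // A lambda stored in a bare std::function cannot declare default
    // arguments for its callers; a class wrapping the function can.
    class mutation_source {
        using fn_type = std::function<mutation_reader(schema_ptr, const query::partition_range&)>;
        fn_type _fn;
    public:
        explicit mutation_source(fn_type fn) : _fn(std::move(fn)) {}
        mutation_reader operator()(schema_ptr s,
                const query::partition_range& range = query::full_partition_range) const {
            return _fn(std::move(s), range);
        }
    };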

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Gleb Natapov
c9bd069815 messaging_service: log rpc errors
Message-Id: <20160125155005.GC23862@scylladb.com>
2016-01-25 17:59:26 +02:00
Avi Kivity
91b57c7e20 Merge "Move streaming to use IDL" from Asias 2016-01-25 17:10:22 +02:00
Asias He
f027a9babe streaming: Drop unused serialization code 2016-01-25 22:39:13 +08:00
Asias He
ad80916905 messaging_service: Add streaming implementation for idl
- stream_request
- stream_summary
- prepare_message

2016-01-25 22:36:58 +08:00
Asias He
b299cc3bee idl: Add streaming.idl.hh
- stream_request
- stream_summary
- prepare_message
2016-01-25 22:29:25 +08:00
Asias He
5e100b3426 streaming: Drop unused repaired_at in stream_request 2016-01-25 22:28:48 +08:00
Avi Kivity
6fade0501b Update test/message.cc for MESSAGE verb rename 2016-01-25 14:47:55 +02:00
Nadav Har'El
db19a43d98 repair: try harder to repair, even when some nodes are unreachable
In the existing code, when we fail to reach one of the replicas of some
range being repaired, we would give up, and not continue to repair the
living replicas of this range. The thinking behind this was that since the
repair should be considered failed anyway, there's no point in trying
to do half a job better.

However, in a discussion I had with Shlomi, he raised the following
alternative thinking, which convinced me: in a large cluster, having
one node or another temporarily dead has a high probability. In that
case, even if the repair is doomed to be considered "failed",
we want it at least to do as much as it possibly can to repair the
data on the living part of the cluster. This is what this patch does:
if we can only reach some of the replicas of a given range, the repair
will be considered failed (as before), but we will still repair the
reachable replicas of this range, if they have different checksums.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453724443-29320-1-git-send-email-nyh@scylladb.com>
2016-01-25 14:37:39 +02:00
Amnon Heiman
039e627b32 idl-compiler: Fix an issue with default values
This patch fixes an issue where a parameter with a version attribute had a
default value.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453723251-9797-1-git-send-email-amnon@scylladb.com>
2016-01-25 14:32:00 +02:00
Takuya ASADA
e9fdb426b6 dist: add pyparsing as a CentOS build time dependency
Port pyparsing from Fedora 23 and build it for python34, which is provided by epel.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720780-21105-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:26:58 +02:00
Takuya ASADA
b8b0ff0482 dist: add pyparsing as an Ubuntu build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720081-15979-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:08:48 +02:00
Avi Kivity
5c5207f122 Merge "Another round of streaming cleanup" from Asias
"- Merge stream_init_message and stream_parepare_message
- Drop  session_index / keep_ss_table_level / file_message_header"
2016-01-25 12:54:30 +02:00
Asias He
77684a5d4c messaging_service: Drop STREAM_INIT_MESSAGE
The verb is not used anymore.
Message-Id: <1453719054-29584-1-git-send-email-asias@scylladb.com>
2016-01-25 12:53:08 +02:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not let this inconsistency
stand. Change it while we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Takuya ASADA
67d2aa677e dist: add pyparsing as a Fedora build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715594-32675-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:32 +02:00
Takuya ASADA
78d107ccaa dist: add missing dependencies for scylla-gdb
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715339-32296-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:31 +02:00
Asias He
51fa717b8e streaming: Get rid of file_message_header
Again, we do not send sstable files, and thus no header info for
sstable files either.

TODO: Estimate mutation size we sent.
2016-01-25 17:56:43 +08:00
Asias He
eba9820b22 streaming: Remove stream_session::file_sent
It is the callback after sending file_message_header. In scylla, we do
not send the file_message_header. Drop it.
2016-01-25 17:25:34 +08:00
Asias He
592683650a streaming: Remove unused serialization code for file_message_header 2016-01-25 17:16:57 +08:00
Calle Wilund
2d1e332fba Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-25 09:11:12 +00:00
Asias He
fa4e94aa27 streaming: Get rid of keep_ss_table_level
We stream mutation instead of files, so keep_ss_table_level is not
relevant for us.
2016-01-25 16:58:57 +08:00
Asias He
2cc31ac977 streaming: Get rid of the stream_index
It is always zero.
2016-01-25 16:58:57 +08:00
Asias He
6b30f08a38 streaming: Always return zero for session_index in api/stream_manager
We will remove session_index soon. It will always be zero. Do not drop
it in the api so that the api stays compatible with c*.
2016-01-25 16:58:51 +08:00
Asias He
ad4a096b80 streaming: Get rid of stream_init_message
Unlike streaming in c*, scylla does not need to open tcp connections in
the streaming service for both incoming and outgoing messages;
seastar::rpc does the work. There is no need for a standalone
stream_init_message in the streaming negotiation stage; we can merge the
stream_init_message into stream_prepare_message.
2016-01-25 16:24:16 +08:00
Asias He
048965ea02 streaming: Do not print session_index in handle_session_prepared
session_index is always 0. It will be removed soon.
2016-01-25 16:24:16 +08:00
Avi Kivity
449b81f5d3 Merge "streaming cleanup" from Asias
"No mercy to the unused parameters and messages.
This will help the upcoming IDL serialize/deserialize work."
2016-01-25 10:21:16 +02:00
Avi Kivity
9ebd3f8098 Merge "Move gossip to use IDL" from Asias
"This changes gossip to use IDL based serialization code."
2016-01-25 10:18:34 +02:00
Asias He
20496ed9a8 tests: Stop gossip during shutdown in cql_test_env
Fixes the heap-use-after-free error in build/debug/tests/auth_test

==1415==ERROR: AddressSanitizer: heap-use-after-free on address
0x62200032cfa8 at pc 0x00000350701d bp 0x7fec96df8d40 sp
0x7fec96df8d30
READ of size 8 at 0x62200032cfa8 thread T1
    #0 0x350701c in
_ZZN3gms8gossiper3runEvENKUlOT_E0_clI6futureIJEEEEDaS2_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x350701c)
    #1 0x35795b1 in apply<gms::gossiper::run()::<lambda(auto:40&&)>,
future<> > /home/penberg/scylla/seastar/core/future.hh:1203
    #2 0x369103d in
_ZZN6futureIJEE12then_wrappedIZN3gms8gossiper3runEvEUlOT_E0_S0_EET0_S5_ENUlS5_E_clI12future_stateIJEEEEDaS5_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x369103d)
    #3 0x369182a in run /home/penberg/scylla/seastar/core/future.hh:399
    #4 0x435f24 in
reactor::run_tasks(circular_buffer<std::unique_ptr<task,
std::default_delete<task> >, std::allocator<std::unique_ptr<task,
std::default_delete<task> > > >&) core/reactor.cc:1368
    #5 0x43a44f in reactor::run() core/reactor.cc:1672
    #6 0x952e4b in app_template::run_deprecated(int, char**,
std::function<void ()>&&) core/app-template.cc:123
    #7 0x58dc79d in test_runner::start(int,
char**)::{lambda()#1}::operator()()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x58dc79d)
    #8 0x58e6cd6 in _M_invoke /usr/include/c++/5.3.1/functional:1871
    #9 0x688639 in std::function<void ()>::operator()() const
/usr/include/c++/5.3.1/functional:2271
    #10 0x8d939c in posix_thread::start_routine(void*) core/posix.cc:51
    #11 0x7feca02a4609 in start_thread (/lib64/libpthread.so.0+0x7609)
    #12 0x7fec9ffdea4c in clone (/lib64/libc.so.6+0x102a4c)

0x62200032cfa8 is located 5800 bytes inside of 5808-byte region
[0x62200032b900,0x62200032cfb0)
freed by thread T1 here:
    #0 0x7feca4f76472 in operator delete(void*, unsigned long)
(/lib64/libasan.so.2+0x9a472)
    #1 0x3740772 in gms::gossiper::~gossiper()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x3740772)
    #2 0x2588ba1 in shared_ptr<gms::gossiper>::~shared_ptr()
seastar/core/shared_ptr.hh:389
    #3 0x4fc908c in
seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}::~stop()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4fc908c)
    #4 0x4ff722a in future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}::~then()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4ff722a)
    #5 0x509a28c in continuation<future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}>::~continuation()
seastar/core/future.hh:395
    #6 0x509a40d in continuation<future<>
Message-Id: <f8f1c92c1eb88687ab0534f5e7874d53050a5b93.1453446350.git.asias@scylladb.com>
2016-01-25 08:19:18 +02:00
Asias He
bc4ac5004e streaming: Kill stream_result_future::create_and_register
The helper is used only once, in init_sending_side; in
init_receiving_side we do not use create_and_register to create the
stream_result_future. Kill the trivial helper to make the code more
consistent.

In addition, rename the variables "future" and "f" to sr (streaming_result).
2016-01-25 11:38:13 +08:00
Asias He
face74a8f2 streaming: Rename stream_result_future::init to ::init_sending_side
So we have:

- init_sending_side
  called when the node initiates a stream_session

- init_receiving_side
  called when the node is a receiver of a stream_session initiated by a peer
2016-01-25 11:38:13 +08:00
Asias He
dc94c5e42e streaming: Rename get_or_create_next_session to get_or_create_session
There is only one session for each peer in stream_coordinator.
2016-01-25 11:38:13 +08:00
Asias He
e46d4166f2 streaming: Refactor host_streaming_data
In scylla, in each stream_coordinator, there will be only one
stream_session for each remote peer. Drop the code supporting multiple
stream_sessions in host_streaming_data.

We now have

   shared_ptr<stream_session> _stream_session

instead of

   std::map<int, shared_ptr<stream_session>> _stream_sessions
2016-01-25 11:38:13 +08:00
Asias He
8a4b563729 streaming: Drop the get_or_create_session_by_id interface
The session index will always be 0 in stream_coordinator. Drop the api for it.
2016-01-25 11:38:13 +08:00
Asias He
9a346d56b9 streaming: Drop unnecessary parameters in stream_init_message
- from
  We can get it from the rpc::client_info

- session_index
  There will always be one session in stream_coordinator::host_streaming_data with a peer.

- is_for_outgoing
  In cassandra, it initiates two tcp connections, one for incoming stream and one for outgoing stream.
  logger.debug("[Stream #{}] Sending stream init for incoming stream", session.planId());
  logger.debug("[Stream #{}] Sending stream init for outgoing stream", session.planId());
  In scylla, it only initiates one "connection" for sending; the peer initiates another "connection" for receiving.
  So is_for_outgoing would always be true in scylla, and we can drop it.

- keep_ss_table_level
  In scylla, again, we stream mutations instead of sstable files. It is
  not relevant to us.
2016-01-25 11:38:13 +08:00
Asias He
1bc5cd1b22 streaming: Drop streaming/messages/session_failed_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
2a04e8d70e streaming: Drop streaming/messages/incoming_file_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
26ba21949e streaming: Drop streaming/messages/retry_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
4b4363b62d streaming: Drop streaming/messages/received_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
b3e00472ed streaming: Drop streaming/streaming.cc
It was used in the early stage of development to make sure things compiled.
2016-01-25 11:38:13 +08:00
Asias He
5a0bf10a0b streaming: Drop streaming/messages/complete_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
bdd6a69af7 streaming: Drop unused parameters
- int connections_per_host

Scylla does not create connections per stream_session; instead it uses
rpc, so connections_per_host is not relevant to scylla.

- bool keep_ss_table_level
- int repaired_at

Scylla does not stream sstable files. They are not relevant to scylla.
2016-01-25 11:38:13 +08:00
Asias He
7b633ad127 gossip: Drop unused serialization code
- heart_beat_state
2016-01-25 11:28:29 +08:00
Asias He
4ce08ff251 messaging_service: Add heart_beat_state implementation 2016-01-25 11:28:29 +08:00
Asias He
d7c7994f37 gossip: Drop unused serialization code
- versioned_value
2016-01-25 11:28:29 +08:00
Asias He
8098ba10b7 gossip: Drop unused serialization code
- endpoint_state
2016-01-25 11:28:29 +08:00
Asias He
ecca969adf messaging_service: Add gossip::endpoint_state implementation 2016-01-25 11:28:29 +08:00
Asias He
2a0b6589dd messaging_service: Add versioned_value implementation 2016-01-25 11:28:29 +08:00
Asias He
6660658742 gossip: Drop unused serialization code
- gossip_digest_serialization_helper
- gossip_digest
2016-01-25 11:28:29 +08:00
Asias He
15f2b353b9 messaging_service: Add gossip_digest implementation 2016-01-25 11:28:29 +08:00
Asias He
736d21a912 gossip: Drop unused serialization code
- gossip_digest_syn
- gossip_digest_ack
- gossip_digest_ack2
2016-01-25 11:28:29 +08:00
Asias He
d81fc12af3 messaging_service: Add gossip_digest_ack2 implementation 2016-01-25 11:28:29 +08:00
Asias He
e67cecaee1 messaging_service: Add gossip_digest_syn implementation 2016-01-25 11:28:29 +08:00
Asias He
d94b7e49d2 idl: Add gossip_digest_syn
Added get_partioner and get_cluster_id
2016-01-25 11:28:28 +08:00
Asias He
60f5891c3f idl: Add gossip_digest_ack2 2016-01-25 11:28:26 +08:00
Takuya ASADA
0f0d1c7aed dist: don't depend on libvirtd, since we are not using it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453470970-31036-1-git-send-email-syuu@scylladb.com>
2016-01-24 17:15:13 +02:00
Avi Kivity
65a140481c Merge " streaming COMPLETE_MESSAGE failure and message retry logic fix" from Asias
"This series:

- Add more debug info to stream session
- Fail session if we fail to send COMPLETE_MESSAGE
- Handle message retry logic for verbs used by streaming

See commit log for details."
2016-01-24 16:41:06 +02:00
Avi Kivity
6135e0ae78 Merge "Move read/write mutation path to use IDL" from Gleb 2016-01-24 13:35:04 +02:00
Avi Kivity
b415f87324 Merge "Serializer Deserializer code generation" from Amnon
"The series do the following:
It adds the code generation
Perform the needed changes in the current classes so each would have getter for
each of its serializable value and a constructor from the serialized values.
It adds a schema definition that cover gossip_diget_ack
It changes the messaging_service to use the generated code.

An overall explanation of the solution with a description of the schema IDL can
be found on the wiki page:

https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation
"
2016-01-24 12:56:42 +02:00
Gleb Natapov
b9b6f703c3 Remove old serializer for frozen_mutation and reconcilable_result 2016-01-24 12:45:41 +02:00
Gleb Natapov
067bdb23cd Move reconcilable_result and frozen_mutation to idl 2016-01-24 12:45:41 +02:00
Gleb Natapov
18dff5ebc8 Move smart pointer serialization helpers to .cc file.
They are not used outside of the .cc file, so should not be in the
header.
2016-01-24 12:45:41 +02:00
Gleb Natapov
93da9b2725 Remove redundant vector serialization code.
IDL serializer has the code to serialize vectors, so use it instead.
2016-01-24 12:45:41 +02:00
Gleb Natapov
ab6703f9bc Remove old query::result serializer 2016-01-24 12:45:41 +02:00
Gleb Natapov
afc407c6e5 Move query::result to use idl. 2016-01-24 12:45:41 +02:00
Gleb Natapov
be4e68adbf Add bytes_ostream serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
043d132ba9 Remove no longer used serializers. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4ae906b204 Add serializer overload for query::partition_range.
From now on query::partition_range will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
2d1b2765e6 Add serializer overload for query::read_command.
From now on query::read_command will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
49ce2b83df Add ring_position constructor needed by serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
6cc5b15a9c Fix read_command constructor to not copy parameters. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4384c7fe85 un-nest range::bound class.
Serializer does not support nested classes yet, so move bound outside.
2016-01-24 12:45:41 +02:00
Gleb Natapov
7357b1ddfe Move specific_ranges to .hh and un-nest it.
The serializer requires the class to be defined, so it has to be in a .hh
file. It also does not support nested types yet, so move it outside of the
containing class.
2016-01-24 12:45:41 +02:00
Gleb Natapov
9ae7dc70da Prepare partition_slice to be used by serializer.
Add missing _specific_ranges getter and setter.
2016-01-24 12:45:41 +02:00
Gleb Natapov
48ab0bd613 Make constructor from bytes for partition_key and clustering_key_prefix public
Make constructor from bytes public since serializer will use it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
8deb5e424c Add idl files for more types.
Add idl for uuid/range/read_command/token/ring_position/clustering_key_prefix/partition_key.
2016-01-24 12:45:41 +02:00
Gleb Natapov
11299aa3db Add serializers for more basic types.
We will need them in following patches.
2016-01-24 12:45:41 +02:00
Gleb Natapov
a643f3d61f Reorder bool and uint8_t serializers
The bool serializer uses the uint8_t one, so it should be defined after it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
cba31eb4f8 cleanup gossip_digest.idl
Remove uuid class, nonexistent application states and add ';'.
2016-01-24 12:45:37 +02:00
Avi Kivity
a3efecb8fe Merge seastar upstream
* seastar 5c2660b...97f418a (4):
  > io_queues: register individual classes with collectd
  > reactor: destroy the I/O queues explicitly
  > rpc_impl: add pragma once
  > rpc: add skip to simple_input_stream
2016-01-24 12:35:24 +02:00
Takuya ASADA
b5029dae7e dist: remove abrt from AMI, since it's not able to work with Scylla
The new CentOS Base Image contains abrt by default, so remove it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453416479-28553-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:31:50 +02:00
Amnon Heiman
0006f236a6 Add an IDL definition file
This adds the IDL definition file.
It is also covered in the wiki page:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
f266c2ed42 README.md: Add dependency for pyparsing python3
For python3, pyparsing needs to be installed explicitly. This adds the
installation of python3-pyparsing to the required dependencies in the
README.md.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
577ce0d231 Adding a specific template specialization in messaging_service to use
the serializer

This patch adds a specific template specialization so that the rpc will
use the serializer and deserializer that are auto-generated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
b625363072 Adding the serializer declaration and implementation files.
This patch adds the serializer and serializer_impl files. They hold
the functions that are not auto-generated: primitives and templates (map
and vector). It also holds the include of the auto-generated code.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
451cf2692c configure.py Add serializer code generation from schema
This patch adds rules and the idl schema to configure, which will call
the code generation to create the serialization and deserialization
functions.

There is also a rule to create the header file that includes the
auto-generated header files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
0715dcd6ba A schema definition for gossip_digest_ack
This is a definition example for gossip_digest_ack with all its
subclasses.

It can be used by the code generator to create the serializer and
deserializer functions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:14 +02:00
Amnon Heiman
d27734b9be Add a constructor to inet_address from uint32_t
inet_address uses uint32_t to store the ip address, but its constructor
takes int32_t.
So this patch adds a uint32_t constructor.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
8a4d211a99 Changes versioned_value to make it serializable
This patch contains two changes: it makes the constructor with parameters
public, and it removes the dependency on messaging_service.hh from the
header file by moving some of the code to the .cc file.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
ddc3fe1328 endpoint_state adds a constructor for all serialized parameters
An external deserialize function needs a constructor with all the
serialized parameters.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
4a34ed82a3 Add code generation for serializer and deserializer
The code generation takes a schema file and creates two files from it,
one with a dist.hh extension containing the forward declarations and a second
with dist.impl.hh containing the actual implementation.

Because the rpc uses templating for the input and output streams, the
generated functions are templates.

For each class, struct or enum, two functions are created:
serialize - gets the output buffer as a template parameter and
serializes the object to it. There must be a public way to get to each of
the parameters in the class (either a getter, or the parameter should be
public).

deserialize - gets an input buffer and returns the deserialized
object (and, by reference, the number of chars it read).
To create the returned object, the class must have a public constructor
taking all of its parameters.
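
An illustrative shape of the generated pair of functions for some type T,
following the description above:

    // Generated per class/struct/enum; Output and Input are the stream
    // types the rpc layer instantiates them with.
    template <typename Output>
    void serialize(Output& out, const T& v);

    template <typename Input>
    T deserialize(Input& in, size_t& consumed); // consumed: chars read, by reference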

The solution description can be found here:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:12:51 +02:00
Takuya ASADA
aef1e67a9b dist: remove mdadm,xfsprogs from dependencies, install it when constructing RAID by scylla_raid_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Takuya ASADA
b92a075a34 main: support supervisor_notify() on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Asias He
7ac3e835a6 messaging_service: Fix send_message_timeout_and_retry
When a verb times out and we resend the message, the peer could receive
the message more than once. This would confuse the receiver. Currently, only
the streaming code uses the retry logic.

- In case of rpc:timeout_error:

Instead of using a relatively short timeout and resending a few
times, we make the timeout big enough and let tcp do the resending.
Thus we avoid resending the message more than once, and of course the
receiver will not receive the message more than once.

- In case of rpc::closed_error:

There are two cases:
1) Failing to establish a connection.

For instance, the peer is down. It is safe to resend since we know for
sure the receiver hasn't received the message yet.

2) The connection is established.

We cannot figure out whether the remote peer has received the message
already upon receiving the rpc::closed_error exception.

Currently, we still sleep & resend the message, so the receiver
might receive the message more than once. We do not have a better choice
in this case, if we want the resend to recover from a sending error due to
a temporary network issue, since failing the whole stream_session due to
failing to send a single message is not wise.

NOTE: If the duplicated message is received after the stream_session is done,
it will be ignored since it cannot find the stream_manager anymore.
For a message like STREAM_MUTATION, it is ok to receive it twice (we apply
the mutation twice).

TODO: For other messages which use the retry logic, we need
to make sure it is ok to receive them more than once.
2016-01-22 08:20:48 +08:00
Asias He
864c7f636c streaming: Fail the session if fails to send COMPLETE_MESSAGE
We retry sending COMPLETE_MESSAGE; if it fails even with the
retry, something must be wrong. Abort the stream_session in this
case.
2016-01-22 07:44:21 +08:00
Asias He
9be671e7f5 streaming: Simplify send_complete_message
The send-once logic was open-coded. Move it into
send_complete_message() so we can simplify the caller.
2016-01-22 07:43:39 +08:00
Asias He
88e99e89d6 streaming: Add more debug info
- Add debug for the peer address info
- Add debug in stream_transfer_task and stream_receive_task
- Add debug when cancelling the keep_alive timer
- Add debug for has_active_sessions in stream_result_future::maybe_complete
2016-01-22 07:43:16 +08:00
Pekka Enberg
81996bd10b Merge "Improvements to compaction manager" from Raphael 2016-01-21 20:54:49 +02:00
Raphael S. Carvalho
bb909798bc compaction_manager: introduce can_submit
Purpose is to reuse code and also make it easier to read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:42:23 -02:00
Raphael S. Carvalho
653a07d75d compaction_manager: introduce signal_less_busy_task
Purpose is to reuse code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:31:44 -02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
Compaction manager was initially created in /utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Pekka Enberg
b5833e8002 Merge "Enable incremental backups option" from Vlad
"This series moves the "backup" logic into the sstable::write_components()
methods, adds a support for enabling backup for sstables flushed in the
compaction flow (in addition to a regular flushing flow which had this support
already) and enables the "incremental_backups" configuration option."

I fixed up a merge conflict with commit 5e953b5 ("Merge "Add support to
stop ongoing compaction" from Raphael").
2016-01-21 18:52:07 +02:00
Pekka Enberg
5e953b5e47 Merge "Add support to stop ongoing compaction" from Raphael
"stop compaction is about temporarily interrupting all ongoing compaction
 of a given type.
 That will also be needed for 'nodetool stop <compaction_type>'.

 The test was about starting scylla, stressing it, stopping compaction using
 the API and checking that scylla was able to recover.

 Scylla will print a message as follow for each compaction that was stopped:
 ERROR [shard 0] compaction_manager - compaction failed: read exception:
 std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.)
 INFO  [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"
2016-01-21 18:34:10 +02:00
Takuya ASADA
fae47ee4a8 dist: fetch CentOS dependencies from koji, update them to latest version
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453378765-19596-1-git-send-email-syuu@scylladb.com>
2016-01-21 15:04:45 +02:00
Asias He
755d792c78 gossip: Wait for gossip timer callback to finish in do_stop_gossiping
Also, do not rearm the timer if we have stopped gossiping.

Message-Id: <73765857b554d9914e87b24d287ff35ab0af6fce.1453378191.git.asias@scylladb.com>
2016-01-21 14:15:57 +02:00
Vlad Zolotarov
e3d7db5e57 ec2_snitch: complete the EC2Snitch -> Ec2Snitch renaming
The rename started in 72b27a91fe
was not complete. This patch fixes the places that were missed
in the above patch.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-3-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:30 +02:00
Vlad Zolotarov
9951edde1a locator::ec2_multi_region_snitch: add a get_name() implementation
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-2-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:29 +02:00
Avi Kivity
43c81db74e Update ami submodule
* dist/ami/files/scylla-ami eb1fdd4...188781c (1):
  > Switch SimpleSnitch to Ec2Snitch
2016-01-21 13:13:23 +02:00
Vlad Zolotarov
de3bb01582 config: allow enabling the incremental backup via .yaml
Enable the incremental_backups/--incremental-backups option.
When enabled there will be a hard link created in the
<column family directory>/backup directory for every flushed
sstable.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:24 +02:00
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed if
incremental backup has been requested.

It has been enabled in the regular flushing flow before but
wasn't in the compaction flow.

This patch enables it in both places and does it using a
backup capability of sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Vlad Zolotarov
cb5c66f264 sstable::write_components(): add a 'backup' parameter
When the 'backup' parameter is TRUE, create backup hard
links for newly written sstables in the <sstable dir>/backups/
subdirectory.
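
A sketch of the backup step (helper shape illustrative; seastar's link_file()
does the hard linking):

    // After sealing a component on disk, optionally hard-link it into the
    // backups/ subdirectory of the sstable directory.
    future<> maybe_backup(sstring dir, sstring component, bool backup) {
        if (!backup) {
            return make_ready_future<>();
        }
        return link_file(dir + "/" + component,
                         dir + "/backups/" + component);
    }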

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:04:45 +02:00
Amnon Heiman
e33710d2ca API: storage_service get_logging_level
This patch adds the get_logging_level command, which returns a map between
each log name and its level.
To test the API do:
curl -X GET "http://localhost:10000/storage_service/logging_level"

This enables the `nodetool getlogginglevels` command.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-3-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:54 +02:00
Amnon Heiman
ba80121e49 migration_task: rename logger name
Logger names should not contain a space; it causes issues when trying to
modify their level from nodetool.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-2-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:42 +02:00
Calle Wilund
980681d28e auth: Add a simplistic "schedule" for auth db setup
The only difference from the previous sleep is that we will
explicitly delete the objects if the process terminates
before the tasks are run, i.e. make ASan happier.

Message-Id: <1453295521-29580-1-git-send-email-calle@scylladb.com>
2016-01-20 19:31:14 +02:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
Arguments buffer_size and true were accidentally inverted.
GCC wasn't complaining because implicit conversion of bool to
int, and vice versa, is valid.
However, this conversion is not very safe because we could
accidentally invert parameters.
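
The pitfall in isolation (a sketch with a hypothetical signature, for
illustration only):

    #include <cstddef>

    // factory taking (buffer_size, checksummed); assumed shape
    void make_stream(std::size_t buffer_size, bool checksummed) {}

    int main() {
        make_stream(4096, true);  // intended call
        make_stream(true, 4096);  // also compiles: bool converts to size_t
                                  // and int converts to bool implicitly,
                                  // so the compiler stays silent
    }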

This should fix the last problem with sstable_test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
Calle Wilund
07f992e42a Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 13:31:33 +00:00
Calle Wilund
63b17be4f0 auth_test: Modify yet another case to use "normal" continuation.
test_cassandra_hash also sort of expects exceptions. ASan causes false
positives here as well with seastar::thread, so do it with a normal continuation.
Message-Id: <1453295521-29580-2-git-send-email-calle@scylladb.com>
2016-01-20 15:15:45 +02:00
Takuya ASADA
2eb12681b0 dist: add 'scylla-gdb' package for CentOS
Fixes #831

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453291407-12232-1-git-send-email-syuu@scylladb.com>
2016-01-20 15:12:33 +02:00
Avi Kivity
7bc3e6ffd0 Merge seastar upstream
* seastar 0516ed0...5c2660b (4):
  > reactor: block all signals early
  > reactor: replace sigprocmask() with pthread_sigmask()
  > fstream: remove unused interface
  > foreign_ptr: remove make_local_and_release()

Fixes #601.
2016-01-20 14:59:52 +02:00
Asias He
1c2d95f2b0 streaming: Remove unused verb handlers
They are never used in scylla.
Message-Id: <1453283955-23691-2-git-send-email-asias@scylladb.com>
2016-01-20 13:58:59 +02:00
Asias He
767e25a686 streaming: Remove the _handlers helper
It was introduced to help run invoke_on_all; we can reuse the
distributed<database> db for it.
Message-Id: <1453283955-23691-1-git-send-email-asias@scylladb.com>
2016-01-20 13:58:44 +02:00
Paweł Dziepak
33892943d9 sstables: do not drop row marker when reading mutation
Since 581271a243 "sstables: ignore data
belonging to dropped columns" we silently drop cells if there is no
column in the current schema that they belong to or their timestamp is
older than the column dropped_at value. Originally this check was
applied to row markers as well which caused them to be always dropped
since there is no column in the schema representing these markers.
This patch makes sure that the check whether a column is alive is performed
only if the cell is not a row marker.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>
2016-01-20 12:35:41 +01:00
Calle Wilund
9197a886f8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 09:44:38 +00:00
Takuya ASADA
79b218eb1c dist: use our own CentOS7 Base image
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-4-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:56 +02:00
Takuya ASADA
b9cb91e934 dist: stop ntpd before running ntpdate
The new CentOS Base Image runs ntpd by default, so shut it down before running ntpdate.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-3-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:35 +02:00
Takuya ASADA
98e61a93ef dist: disable SELinux only when it is enabled
The new CentOS7 Base Image disables SELinux by default, and running 'setenforce 0' on the image causes an error, so we wouldn't be able to build the AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-2-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:01 +02:00
Raphael S. Carvalho
c318f3baa3 sstables: fix sstable::data_stream_at
After 63967db8, the offset is ignored when creating an input stream.
Found the problem after sstable_test failed recently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>
2016-01-20 09:35:57 +02:00
Raphael S. Carvalho
ff9b1694fe api: implement stop_compaction
stop_compaction is implemented by calling stop_compaction() of
compaction manager for each database.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
5cceb7d249 api: fix paramType of parameter of stop_compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compactions. The implementation informs an ongoing compaction
that it was asked to stop, so the compaction itself will trigger an
exception. The compaction manager will catch this exception and re-schedule
the compaction.
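
A minimal sketch of the mechanism (hypothetical names, not the actual code):

    #include <exception>

    struct compaction_stopped : std::exception {};

    // each ongoing compaction carries an info object with a stop flag;
    // the compaction loop checks it and bails out with an exception that
    // the manager catches in order to re-schedule
    struct compaction_info {
        bool stop_requested = false;
    };

    void check_for_stop(const compaction_info& info) {
        if (info.stop_requested) {
            throw compaction_stopped();
        }
    }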

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about an ongoing compaction. Soon, we will add
information to it about whether or not a user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Tomasz Grabiec
bd34adcf22 tests: memory_footprint: Show canonical_mutation size
Message-Id: <1453227147-21918-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:59 +02:00
Tomasz Grabiec
b8c3fa4d46 cql3: Print only column name in error message
Printing a column_definition prints all fields of the struct; we want
only the name here.
Message-Id: <1453207531-16589-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:37 +02:00
Tomasz Grabiec
0596455dc2 Merge branch 'pdziepak/date-timestamp-fixes/v2'
From Paweł:

These patches contain fixes for date and timestamp types:
 - date and timestamp are considered compatible types
 - date type is added to abstract_type::parse_type()
2016-01-19 18:35:09 +01:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writers
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Glauber Costa
c3ac5257b5 sstables: don't repeat file_writer creation all the time
When this code was originally written, we used to operate on a generic
output_stream. We created a file output stream, and then moved it into
the generic object.

Many patches and reworks later, we now have a file_writer object, but
that pattern was never reworked.

So in a couple of places we have something like this:

    f = file_object acquired by open_file_dma
    auto out = file_writer(std::move(f), 4096);
    auto w = make_shared<file_writer>(std::move(out));

The last statement is just totally redundant. make_shared can create
an object from its parameters without trouble, so we can just pass
the parameter list directly to it.
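
With make_shared constructing the object in place, the pattern presumably
collapses to:

    f = file_object acquired by open_file_dma
    auto w = make_shared<file_writer>(std::move(f), 4096);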

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:36 +02:00
Calle Wilund
59bf54d59a commitlog_replayer: Modify logging to better match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>
2016-01-19 17:19:52 +02:00
Avi Kivity
07e0f0a31f Merge "Support schema changes in batchlog manager" from Tomasz
"We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions are kept
in-memory so a node forgets all past versions when it restarts. This
was not implemented yet; replay would fail with an exception if the
version is unknown."
2016-01-19 17:17:47 +02:00
Calle Wilund
3f4c8d9eea commitlog_replayer: Modify logging to better match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

v2:
* Fixed broken + operator
* Use map_reduce instead of easily readable code
2016-01-19 15:14:21 +00:00
Paweł Dziepak
db30ac8d2d tests/types: add test for timestamp and date compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 15:34:45 +01:00
Avi Kivity
a2953833dc Merge seastar upstream
* seastar e93cd9d...0516ed0 (9):
  > http: use default file input stream options in file_handler
  > linecount: use default file input stream options
  > fstream: do not pass offset as part options member
  > net: move posix network stack registration to reactor.cc
  > net: throw a human-readable error if using an unregistered network stack
  > io_queue: remove pending_io counter
  > Revert "Merge "Improve rpc server-side statistics""
  > tests: corrections regarding Boost.Test 1.59 compilation failures
  > Merge "Improve rpc server-side statistics"
2016-01-19 16:33:35 +02:00
Calle Wilund
1b4b7aeb66 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-19 13:51:00 +00:00
Paweł Dziepak
900f5338e7 types: make timestamp_type and date_type compatible
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 14:03:15 +01:00
Tomasz Grabiec
ec12b75426 batchlog_manager: Store canonical_mutations
We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions are kept
in-memory so a node forgets all past versions when it restarts.

To solve this, let's store canonical_mutations which, like data in
sstables, can be read using any later schema version of given table.
2016-01-19 13:46:28 +01:00
Tomasz Grabiec
e21049328f batchlog_manager: Add more debug logging 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
608b606434 canonical_mutation: Introduce column_family_id() getter 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
06d1f4b584 database: Print table name when printing mutation 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
52073d619c database: Add trace-level logging of applied mutations 2016-01-19 13:46:28 +01:00
Paweł Dziepak
a6171d3e99 types: add date type to parse_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:43:36 +01:00
Paweł Dziepak
f77ab67809 types: use correct name for date_type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:42:53 +01:00
Tomasz Grabiec
d7cb88e0af Merge branch 'pdziepak/fixes-for-alter-table/v1'
From Paweł:

"This series contains some more fixes for issues related to alter table,
namely: incorrect parsing of collection information in comparator, missing
schema::_raw._collections in equality check, missing compatibility
information for utf8->blob, ascii->blob and ascii->utf8 casts."
2016-01-19 13:22:10 +01:00
Calle Wilund
de9f9308a5 auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is a false positive, caused by a combination of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now, let's just rewrite the test in question
to not use seastar::async, but a normal continuation. Less easy to read,
but passes ASan.
Message-Id: <1453205136-10308-1-git-send-email-calle@scylladb.com>
2016-01-19 13:11:20 +01:00
Calle Wilund
79a5f7b19d auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is a false positive, caused by a combination of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now, let's just rewrite the test in question
to not use seastar::async, but a normal continuation. Less easy to read,
but passes ASan.
2016-01-19 12:02:50 +00:00
Raphael S. Carvalho
0c67b1d22b compaction: filter out mutation that doesn't belong to shard
When compacting an sstable, mutations that don't belong to the current shard
should be filtered out. Otherwise, mutations would be duplicated in
all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for the sstables to be compacted, so the test could fail because of mutations
being filtered out.

Fixes #527.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
2016-01-19 10:16:41 +01:00
Vlad Zolotarov
922eb218b1 locator::reconnectable_snitch_helper: don't check messaging_service version
Don't demand the messaging_service version to be the same on both
sides of the connection in order to use internal addresses.

Upstream has a similar change for CASSANDRA-6702 in commit a7cae32 ("Fix
ReconnectableSnitch reconnecting to peers during upgrade").

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452686729-32629-1-git-send-email-vladz@cloudius-systems.com>
2016-01-19 11:04:37 +02:00
Paweł Dziepak
7c9708953e tests/cql3: add tests for ALTER TABLE with multiple collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:24 +01:00
Paweł Dziepak
e249d4eab5 tests/type: add test for simple type compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:20 +01:00
Paweł Dziepak
440b6d058e types: fix compatibility for text types
bytes_type is_compatible_with utf8_type and ascii_type
utf8_type is_compatible_with ascii_type

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:16 +01:00
Paweł Dziepak
17ca7e06f3 schema: print collection info
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:12 +01:00
Paweł Dziepak
2e2de35dfb schema: add _raw._collections check to operator==()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:08 +01:00
Paweł Dziepak
92dc95b73b schema: fix comparator parsing
The correct format of collection information in comparator is:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>, <name2>:<type2>, ...)

not:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>),
o.a.c.db.m.ColumnToCollection(<name2>:<type2>) ...

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:05 +01:00
Amnon Heiman
9be42bfd7b API: Add version to application state in failure_detection
The upstream of origin adds the version to the application_state in the
get_endpoints in the failure detector.

In our implementation we return an object to the jmx proxy and the proxy
does the string formatting.

This patch adds the version to the return object which is both useful as
an API and will allow the jmx proxy to add it to its output when we move
forward with the jmx version.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1448962889-19611-1-git-send-email-amnon@scylladb.com>
2016-01-19 10:23:56 +02:00
Tomasz Grabiec
5a1587353f tests: Don't depend on partition_key representation
Representation format is an implementation detail of
partition_key. Code which compares a value to representation makes
assumptions about key's representation. Compare keys to keys instead.
Message-Id: <1453136316-18125-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 19:01:56 +02:00
Pekka Enberg
2ca8606b4e streaming/stream_session: Don't stop stream manager
We cannot stop the stream manager because it's accessible via the API
server during shutdown, for example, which can cause a SIGSEGV.

Spotted by ASan.
Message-Id: <1453130811-22540-1-git-send-email-penberg@scylladb.com>
2016-01-18 16:34:19 +01:00
Pekka Enberg
422cff5e00 api/messaging_service: Fix heap-buffer-overflows in set_messaging_service()
Fix various issues in set_messaging_service() that caused
heap-buffer-overflows when JMX proxy connects to Scylla API:

  - Off-by-one error in 'num_verb' definition

  - Call to an initializer-list std::vector constructor variant that caused
    the vector to be two elements long (see the sketch below).

  - Missing verb definitions from the Swagger definition that caused
    response vector to be too small.
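
The brace-initialization pitfall in isolation (a sketch, not the actual
Scylla code):

    #include <cassert>
    #include <vector>

    int main() {
        std::vector<int> a(10, 0);  // ten zero-initialized elements
        std::vector<int> b{10, 0};  // initializer_list wins: two elements, 10 and 0
        assert(a.size() == 10);
        assert(b.size() == 2);
    }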

Spotted by ASan.
Message-Id: <1453125439-16703-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:29 +01:00
Pekka Enberg
3723beb302 service/storage_service: Fix typos in logger messages
Message-Id: <1453128076-18613-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:04 +01:00
Takuya ASADA
d5d5857b62 dist: extend coredump size limit
16GB is not enough for some larger machines, so extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-2-git-send-email-syuu@scylladb.com>
2016-01-18 13:38:43 +02:00
Takuya ASADA
023c6dc620 dist: preserve environment variable when running scylla_prepare on sudo
sysconfig parameters are passed via environment variables, but sudo resets them by default.
We need to keep them across sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-1-git-send-email-syuu@scylladb.com>
2016-01-18 13:22:14 +02:00
Gleb Natapov
dde2e80a20 storage_proxy: remove batchlog synchronously
Wait for batchlog removal before completing a query; otherwise batchlog
removal queries may accumulate. Still ignore an error if it happens,
since it is not critical, but log it.

Message-Id: <20160118095642.GB6705@scylladb.com>
2016-01-18 12:38:12 +02:00
Avi Kivity
221ef4536c messaging service: limit rpc server resources
Otherwise, a slow node can be overwhelmed by other nodes and run out of
memory.

Fixes #596.
Message-Id: <1452776394-13682-1-git-send-email-avi@scylladb.com>
2016-01-18 11:16:45 +02:00
Avi Kivity
a881e596fa Merge "Ubuntu dependency packages fix" from Takuya 2016-01-18 11:13:18 +02:00
Gleb Natapov
f97eed0c94 fix batch size checking
warn_threshold is in kbytes, v.size is in bytes, and size is in kbytes.
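
In other words, both sides must be brought to the same unit before
comparing (a sketch with hypothetical names):

    #include <cstddef>

    // warn when a batch exceeds the configured threshold; the size arrives
    // in bytes while the threshold is configured in kbytes
    bool exceeds_warn_threshold(std::size_t batch_bytes, std::size_t warn_threshold_kb) {
        return batch_bytes / 1024 > warn_threshold_kb;
    }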

Message-Id: <20160118090620.GZ6705@scylladb.com>
2016-01-18 11:08:13 +02:00
Avi Kivity
5313a28044 Merge "Fix re-addinig collections" from Paweł
"This series makes sure that Scylla rejects adding a collections if
its column name is the same as a collection that existed before and
their types are incompatible.

Fixes #782"
2016-01-18 10:58:40 +02:00
Tomasz Grabiec
237819c31f logalloc: Exclude zones' free segments in lsa/bytes-non_lsa_used_space
Historically, the purpose of the metric was to show how much memory is
in standard allocations. After zones were introduced, this would also
include free space in lsa zones, which is almost all memory, and thus
the metric lost its original meaning. This change brings it back to
its original meaning.

Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 10:48:14 +02:00
Takuya ASADA
5270da1eef dist: prevent use of abrt with scylla
scylla should be used with systemd-coredump, not abrt.
Fixes #762

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452713081-32492-1-git-send-email-syuu@scylladb.com>
2016-01-18 10:47:11 +02:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support to column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? Remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Avi Kivity
d5050e4c6a storage_proxy: make MUTATION and MUTATION_DONE verbs synchronous at the server side
While MUTATION and MUTATION_DONE are asynchronous by nature (when a MUTATION
completes, it sends a MUTATION_DONE message instead of responding
synchronously), we still want them to be synchronous at the server side
wrt. the RPC server itself.  This is because RPC accounts for resources
consumed by the handler only while the handler is executing; if we return
immediately, and let the code execute asynchronously, RPC believes no
resources are consumed and can instantiate more handlers than the shard
has resources for.

Fix by changing the return type of the handlers to future<no_wait_type>
(from a plain no_wait_type), and making that future complete when local
processing is over.
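
A sketch of the shape of the fix, assuming seastar's rpc::no_wait
convention (the handler registration and apply_locally() helper are
hypothetical, not the actual Scylla code):

    // before: returned rpc::no_wait immediately, so RPC considered the
    // handler finished while the mutation was still being applied.
    // after: the returned future resolves only when local processing ends,
    // keeping the handler's resources accounted for until then.
    register_mutation_handler([] (frozen_mutation fm) -> future<rpc::no_wait_type> {
        return apply_locally(std::move(fm)).then([] {
            return rpc::no_wait;
        });
    });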

Ref #596.
Message-Id: <1453048967-5286-1-git-send-email-avi@scylladb.com>
2016-01-18 09:59:34 +02:00
Nadav Har'El
d97cbbbe43 repair: forbid repair with "-dc" not including the current host
Theoretically, one could want to repair a single host *and* all the hosts
in one or more other data centers which don't include this host. However,
Cassandra's "nodetool repair" explicitly does not allow this, and fails if
given a list of data centers (via the "-dc" option) which doesn't include
the host starting the repair. So we need to behave like "nodetool repair"
and fail in this case too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453037016-25775-1-git-send-email-nyh@scylladb.com>
2016-01-18 09:54:16 +02:00
Paweł Dziepak
fa7bef72d4 tests/cql3: add tests for ALTER TABLE validation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:50 +01:00
Paweł Dziepak
b7e58db7ec tests: allow any future in assert_that_failed()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:44 +01:00
Paweł Dziepak
00f7a873a5 cql3: forbid re-adding collection with incompatible type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:38 +01:00
Paweł Dziepak
4927ff95da schema: read collections from comparator
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:33 +01:00
Paweł Dziepak
725129deb7 type_parser: accept sstring_view
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:27 +01:00
Paweł Dziepak
6372a22064 schema: use _raw._collections to generate comparator name
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:03 +01:00
Paweł Dziepak
84840c1c98 schema: keep track of removed collections
Cassandra disallows adding a column with the same name as a collection
that existed in the past in that table if the types aren't compatible.
To enforce that, Scylla needs to keep track of all collections that ever
existed in the column family.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:34:29 +01:00
Avi Kivity
c4cf4e0bcd Merge seastar upstream
* seastar a8183c1...e93cd9d (2):
  > rpc: make sure we serialize on _resources_available sempahore
  > rpc: fix support for handlers returning future<no_wait_type>
2016-01-17 18:36:22 +02:00
Avi Kivity
249dbc1d8e Merge seastar upstream
* seastar 6f9453d...a8183c1 (2):
  > rpc: fix server losing handler
  > Merge "Fair I/O Queue" from Glauber
2016-01-17 14:21:53 +02:00
Takuya ASADA
01309c0dd8 dist: add missing dependency (xfslibs-dev) for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
9ad3365353 dist: use gdebi to resolve install-time dependencies
Since we switched to using mk-build-deps, it only resolves build-time dependencies.
We also need to install install-time dependencies.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
705285cf27 dist: resolve build-time dependencies with the mk-build-deps command, do not install them manually
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
90be81f9ba dist: add missing build time dependency for thrift package on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Tomasz Grabiec
d332fcaefc row_cache: Restore indentation 2016-01-15 15:33:17 +01:00
Tomasz Grabiec
6b3cd35109 Merge branch 'pdziepak/multi-schema-sstables/v1'
From Paweł:

This series adds support for reading sstables using a different schema than
the one that was used to write them.
2016-01-15 14:23:18 +01:00
Paweł Dziepak
dbf23fdff5 tests/sstable: add test for multi schema
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
cfc0a132a9 sstable: handle multi-cell vs atomic incompatibilities
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
581271a243 sstables: ignore data belonging to dropped columns
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Asias He
e10580f474 cql_server: Fix connection shutdown
_fd is of type connected_socket. shutdown_input() and shutdown_output()
return future<>. Do not ignore the future.

Message-Id: <786eee890541a18d3501ecd52415f2900c545157.1452835922.git.asias@scylladb.com>
2016-01-15 11:37:30 +02:00
Tomasz Grabiec
ccd609185f sstables: Add ability to wait for async sstable cleanup tasks
This patch adds a function which waits for the background cleanup work
which is started from sstable destructors.

We wait for those cleanups on reactor exit so that unit tests don't
leak. This fixes an erratic ASan complaint about a memory leak when running
schema_change_test in debug mode:

    Indirect leak of 64 byte(s) in 1 object(s) allocated from:
         0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912)
         0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765
         0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513
        0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690
        0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880
        0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012
        0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619

The leak is about allocations related to close() syscall tasks invoked
from sstable destructor, which were not waited for.

Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>
2016-01-15 11:32:15 +02:00
Calle Wilund
e935c9cd34 select_statement: Make sure all aggregate queries use paging
Mainly to make sure we respect row limits, since normal result
generation does not respect them for aggregates.

Fixes #752 

Message-Id: <1452681048-30171-2-git-send-email-calle@scylladb.com>
2016-01-14 19:03:37 +02:00
Calle Wilund
1dc5937f40 query_pagers: fix log message in requires_paging
The message would state that all queries required paging, even when
returning the opposite to the caller.

Message-Id: <1452681048-30171-1-git-send-email-calle@scylladb.com>
2016-01-14 19:03:16 +02:00
Asias He
cc3073b42d gossip: cleanup application_state
Drop the unused one.

Message-Id: <4cc45164d55742951b618d2c7b1e8bdb997f005a.1452771260.git.asias@scylladb.com>
2016-01-14 19:01:51 +02:00
Avi Kivity
d47a58cc32 README: add libxml2 and libpciaccess packages to list of required packages
Needed for link stage.
2016-01-14 17:47:48 +02:00
Avi Kivity
cf7e6cede2 README: add hwloc and numactl to install recommendations 2016-01-14 17:30:43 +02:00
Tomasz Grabiec
b7976f3b82 config: Set default logging level to info
Commit d7b403db1f changed the default in
logging::logger. It affected tests but not the scylla binary, where it's
overwritten in main.cc.
Message-Id: <1452777008-21708-1-git-send-email-tgrabiec@scylladb.com>
2016-01-14 15:11:58 +02:00
Pekka Enberg
9306f4eb22 Merge "Disable ALTER TABLE statement unless --experimental=on" from Tomek 2016-01-14 14:30:20 +02:00
Avi Kivity
cf8ab65fbc Merge seastar upstream
* seastar 43e64c2...6f9453d (2):
  > Merge "rpc resource accounting"
  > core: Introduce smp::invoke_on_all()
2016-01-14 14:28:27 +02:00
Asias He
826b6ed877 gossip: Print node status in handle_major_state_change
Message-Id: <1452768680-32355-1-git-send-email-asias@scylladb.com>
2016-01-14 14:22:37 +02:00
Asias He
e7a899f5f3 gossip: Enable debug msg for convict
Kill one FIXME in convict

Message-Id: <1452768680-32355-2-git-send-email-asias@scylladb.com>
2016-01-14 14:22:36 +02:00
Tomasz Grabiec
054f1df0a5 cql3: Disable ALTER TABLE unless experimental features are on 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
1fd03ea1d2 tests: cql_test_env: Enable experimental features 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
a13aaa62df config: Add 'experimental' switch 2016-01-14 13:21:13 +01:00
Gleb Natapov
647a09cd7b storage_proxy: improve mutation timeout logging
Message-Id: <20160114105359.GY6705@scylladb.com>
2016-01-14 12:00:35 +01:00
Pekka Enberg
733584c44d main: Start the API service as the last step
This reverts commit f0d68e4 ("main: start the http server in the first
step"). The service layer is not ready to serve clients before it's
fully up and running which causes early startup crashes everywhere.
Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>
2016-01-14 12:55:50 +02:00
Tomasz Grabiec
1daaf909d7 Merge branch 'tgrabiec/row_cache_invalidate_fix'
Fixes for wrap-around range handling in row_cache.
2016-01-14 11:38:26 +01:00
Takuya ASADA
7479cde28b dist: extend root disk size to 10GB
Since the default root disk size is too small for our purpose, it's better to extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452762325-5620-1-git-send-email-syuu@scylladb.com>
2016-01-14 11:29:00 +02:00
Pekka Enberg
90123197e1 service/client_state: Use anonymous user when authentication is disabled
If authentication is disabled, nobody calls login() to set the current
user. There's untranslated code in the client_state constructor to do just
that.

Fixes "You have not logged in" errors when USE statement is executed
with authentication disabled.
Message-Id: <1452759946-13998-1-git-send-email-penberg@scylladb.com>
2016-01-14 09:29:33 +01:00
Avi Kivity
4143cf6385 Merge "Initial authenticator support" from Calle
"Add implementation of cassandra password authenticator, and user
password checking to CQL connections.

User/pwd are stored in system_auth table. Passwords are hashed
using glibc 'crypt_r'.

The latter is worth noting, as this is a difference compared to origin;
Origin uses Java bcrypt library for salt/hash, i.e. blowfish hashing.
Most glibc variants do _not_ have support for blowfish. To be 100%
compatible with imported origin tables we might need to add
bcrypt/blowfish sources into scylla (no packaged libs available afaict)

The code currently first attempts to use blowfish, if we happen to run
centos or Openwall, which has it compiled in. Otherwise we will fall
back to sha512, sha256 or even md5 depending on lib support.

To use:
* scylla.conf: authenticator=PasswordAuthenticator
* cqlsh -u cassandra -p cassandra

Not implemented (yet):
* "Authorizer", thus no KS/CF access checking
* CQL create/alter/delete user (create_user_statement etc). I.e. there is
  only a single user name; default "cassandra:cassandra" user/pwd combo"
2016-01-13 19:13:05 +02:00
Takuya ASADA
0511b02f90 dist: run scylla_prepare, scylla_stop on sudo
Since we changed uid on scylla-server.service to scylla, we need sudo for these scripts.

Fixes #783

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452704598-5292-1-git-send-email-syuu@scylladb.com>
2016-01-13 19:06:33 +02:00
Tomasz Grabiec
6b059fd828 row_cache: Guard against wrap-around range in make_reader() 2016-01-13 17:50:55 +01:00
Tomasz Grabiec
7fb0bc4e15 row_cache: Take the reclaim lock in invalidate()
It's needed to keep the iterators valid in case eviction is triggered
somewhere in between. It probably isn't, because destructors should not
allocate, but better be safe.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
5e05f63ee7 tests: Add more tests for row_cache::invalidate()
Refs #785.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
50cc0c162e row_cache: Make invalidate() handle wrap-around ranges
Currently for wrap around the "begin" iterator would not meet with the
"end" iterator, invoking undefined behavior in erase_and_dispose()
which results in a crash.

Fixes #785
2016-01-13 17:50:55 +01:00
Calle Wilund
8192384338 auth_test: Unit tests for auth objects 2016-01-13 15:37:39 +00:00
Calle Wilund
9e3295bc69 cql_test_env: Allow specifying db::config for the env 2016-01-13 15:35:37 +00:00
Calle Wilund
9ef05993ff config: Mark "authenticator" used + update description 2016-01-13 15:35:36 +00:00
Calle Wilund
1d811f1e8f transport::server: Add authentication support
If the system authenticator object requires authentication, issue
a challenge to the client, and process the response.
2016-01-13 15:35:36 +00:00
Calle Wilund
1c30d37285 client_state: Add user object + login
Note: all actual authorization methods are still unimplemented.
2016-01-13 15:35:36 +00:00
Calle Wilund
4692f46b8d storage_service: Initialize auth system on start 2016-01-13 15:35:36 +00:00
Calle Wilund
9a4d45e19d auth::auth/authenticator: user storage and authentication
User db storage + login/pwd db using system tables.

Authenticator object is a global shard-shared singleton, assumed
to be completely immutable, thus safe.
Actual login authentication is done via locally created stateful object
(sasl challenge), that queries db.

Uses "crypt_r" for password hashing, vs. origins use of bcrypt.
Main reason is that bcrypt does not exist as any consistent package
that can be consumed, so to guarantee full compatibility we'd have
to include the source. Not hard, but at least initially more work than
worth.
2016-01-13 15:35:35 +00:00
Calle Wilund
00de63c920 cql3::query_processor: Add processing helpers for internal usage
syntactical sugar + "process" for internal, similar to 
execute_internal, but allowing querying the whole cluster, and optional
statement caching.
2016-01-13 15:35:21 +00:00
Calle Wilund
6a5f075107 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
Message-Id: <1452615759-23213-1-git-send-email-calle@scylladb.com>
2016-01-13 12:26:49 +01:00
Paweł Dziepak
218898b297 commitlog: upgrade mutations during commitlog replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:26 +01:00
Paweł Dziepak
661849dbc3 commitlog: learn about schema versions during replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:23 +01:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk need to be read since any of them may
contain a column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
0254c3e30b commitlog: add commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:49 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. In order to avoid
duplication, db::commitlog::segment remembers all schema versions already
written in the current chunk.
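
A minimal sketch of that bookkeeping (hypothetical member names; assumes
table_schema_version is hashable):

    // write a column mapping only the first time a schema version shows up
    // in the current chunk; reset the set whenever a new chunk begins
    std::unordered_set<table_schema_version> _versions_in_chunk;

    bool needs_column_mapping(const table_schema_version& v) {
        return _versions_in_chunk.insert(v).second;  // true on first sight
    }

    void on_new_chunk() {
        _versions_in_chunk.clear();  // each chunk must be self-describing
    }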

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
The current commitlog interface requires writers to specify the size of a
new entry, which cannot depend on the segment to which the entry is
written.
If column mappings are going to be stored in the commitlog, that's not
enough, since we don't know whether a column mapping needs to be written
until we know in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
Calle Wilund
32e480025f cql3::query_options: Add constructors for internal processing 2016-01-13 08:49:01 +00:00
Calle Wilund
2e9ab3aff1 types.hh: Add data_type_for<bool> 2016-01-13 08:49:01 +00:00
Calle Wilund
40efd231b1 auth::authenticated_user: Object representing a named or anon user 2016-01-13 08:49:01 +00:00
Calle Wilund
51af2bcafd auth::permission: permissions for authorization
Not actually used yet. But some day...
2016-01-13 08:49:01 +00:00
Calle Wilund
6f708eae1c auth::data_resource: resource identifier for auth permissions 2016-01-13 08:49:01 +00:00
Calle Wilund
9c1d088718 exceptions: add authorization exceptions 2016-01-13 08:49:01 +00:00
Calle Wilund
cd4ae7a81e Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-13 08:48:43 +00:00
Tomasz Grabiec
e88f41fb3f messaging_service: Move REPAIR_CHECKSUM_RANGE verb out of the streaming verbs group
Message-Id: <1452620321-17223-1-git-send-email-tgrabiec@scylladb.com>
2016-01-12 20:17:08 +02:00
Calle Wilund
8de95cdee8 paging bugfix: Allow reset/removal of "specific ck range"
Refs #752

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass.

[tgrabiec: commit log cleanup, fixed issue ref]

Message-Id: <1452616259-23751-1-git-send-email-calle@scylladb.com>
2016-01-12 17:45:57 +01:00
Calle Wilund
7d7d592665 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
2016-01-12 16:30:31 +00:00
Calle Wilund
81e9dc0c2a paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set the row limit for a query to be the min of the page size and the
remaining limit, but with a multinode query we might end up with more rows
than asked for, so we must do this again in post-processing.
2016-01-12 16:30:30 +00:00
Calle Wilund
ea92d7d4fd paging bugfix: Allow reset/removal of "specific ck range"
Refs #792

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass. 

v2: 
* Changed to schema-equality checks so we sort of maintain a 
  sane api and behaviour, even with the 1-entry map
 
v3: 
* Renamed remove "contains" in specific_ranges, and made the calling
  code use more map-like logic, again to keep things cleaner
2016-01-12 16:30:30 +00:00
Calle Wilund
e50d8b6895 paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set the row limit for a query to be the min of the page size and the
remaining limit, but with a multinode query we might end up with more rows
than asked for, so we must do this again in post-processing.
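
In effect (a sketch with hypothetical names):

    #include <algorithm>
    #include <cstdint>

    // clamp when building the page request, and again after merging
    // results from multiple nodes
    uint32_t page_limit(uint32_t page_size, uint32_t rows_left_in_limit) {
        return std::min(page_size, rows_left_in_limit);
    }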

Message-Id: <1452606935-12899-2-git-send-email-calle@scylladb.com>
2016-01-12 17:23:04 +02:00
Vlad Zolotarov
9232ad927f messaging_service::get_rpc_client(): fix the encryption logic
According to specification
(here https://wiki.apache.org/cassandra/InternodeEncryption)
when the internode encryption is set to `dc`, the data passed between
DCs should be encrypted and similarly, when it's set to `rack`,
the inter-rack traffic should be encrypted.

Currently Scylla would encrypt the traffic inside the local DC in the
first case and inside the local RACK in the latter one.

This patch fixes the encryption logic to follow the specification
above.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452501794-23232-1-git-send-email-vladz@cloudius-systems.com>
2016-01-12 16:22:26 +02:00
Avi Kivity
4693197e37 Merge seastar upstream
* seastar fe7a49c...43e64c2 (1):
  > resource: fix failures on low-memory machines

Fixes #734.
2016-01-12 14:45:43 +02:00
Calle Wilund
5b9f196115 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 11:46:40 +00:00
Avi Kivity
39f81b95d6 main: make --developer-mode relax dma requirements
With Docker we might be running on a filesystem that does not support DMA
(aufs; or tmpfs on boot2docker), so let --developer-mode allow running
on those file systems.
Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>
2016-01-12 13:34:46 +02:00
Avi Kivity
d68026716e Merge seastar upstream
* seastar ad3577b...fe7a49c (2):
  > reactor: workaround tmpfs O_DIRECT vs O_EXCL bug
  > rpc: fix reordering between sending client's negotiation frame and user's data
2016-01-12 13:27:16 +02:00
Takuya ASADA
a1d1d0bd06 Revert "dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update"
This reverts commit b28b8147a0.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-2-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:09 +02:00
Takuya ASADA
5459df1e9e dist: renumber development version as 666.development
yum command think "development-xxxx.xxxx" is older than "0.x", so nightly package mistakenly update with release version.
To prevent this problem, we should add greater number prior to "development".
Also same on Ubuntu package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-1-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:08 +02:00
Calle Wilund
fdda880920 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 10:17:22 +00:00
Avi Kivity
5809ed476f Merge "Orderly service startup for systemd"
Use systemd Type=notify to tell systemd about startup progress.

We can now use 'systemctl status scylla-server' to see where we are
in service startup, and 'systemctl start scylla-server' will wait until
either startup is complete, or we fail to start up.
2016-01-12 12:01:32 +02:00
Avi Kivity
3d5f6de683 main: notify systemd of startup progress
Send current startup stage via sd_notify STATUS variable; let it know that
startup is complete via READY=1.

Fixes #760.
2016-01-12 11:58:24 +02:00
Calle Wilund
1b54b9c2d8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 09:02:05 +00:00
Calle Wilund
7f4985a017 commit log reader bugfix: Fix trying to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof), and also give false errors about corruption.
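
The missing guard, sketched (hypothetical names, not the actual code):

    // if the remainder of the chunk cannot hold even a minimal entry,
    // it is slack: jump straight to the next chunk instead of parsing it
    if (chunk_end - pos < minimal_entry_size) {
        pos = chunk_end;
    }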
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
Tzach Livyatan
c5b332716c Fix AMI prompt from "nodetool --help" to "nodetool help"
Fixes #775

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1452586945-28738-1-git-send-email-tzach@scylladb.com>
2016-01-12 10:27:05 +02:00
Takuya ASADA
fc13b9eb66 dist: yum install epel-release before installing CentOS dependencies
Fixes #779

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452586442-19777-1-git-send-email-syuu@scylladb.com>
2016-01-12 10:24:56 +02:00
Raphael S. Carvalho
fc6a1934b0 api: implement force_keyspace_cleanup
This will add support for a user to clean up an entire keyspace
or some of its column families.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:22 -02:00
Raphael S. Carvalho
a5c90194f5 db: add support to clean up a column family
Cleanup is a procedure that will discard irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla will clean up an sstable by using the compaction code, in
which this sstable will be the only input used.
Compaction manager was changed to become aware of cleanup, such
that it will be able to schedule cleanup requests and also know
how to handle them properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:04 -02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation stores the generations of compacting sstables
in an unordered set per column family, so that before the strategy is called,
the compaction manager can filter out compacting sstables.
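
A sketch of the filtering (hypothetical names, not the actual code):

    // generations of sstables currently being compacted, per column family
    std::unordered_set<int64_t> _compacting;

    // hand the strategy only sstables that are not already being compacted
    std::vector<shared_sstable> candidates;
    for (auto& sst : all_sstables) {
        if (!_compacting.count(sst->generation())) {
            candidates.push_back(sst);
        }
    }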

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, the compaction strategy is responsible for both selecting the
sstables for compaction and running the compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make possible for the compaction manager
to keep track of which sstables are being compacted at a moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
68619211f5 tests: add test for sstable rewrite
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:41 -02:00
Raphael S. Carvalho
ed80ed82ef sstables: prepare compact_sstables to work with cleanup
Cleanup is about rewriting an sstable, discarding any keys that
are irrelevant, i.e. keys that don't belong to the current node.
Parameter cleanup was added to compact_sstables.
If set to true, irrelevant code such as the one that updates
compaction history will be skipped. Logic was also added to
discard irrelevant keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:40 -02:00
Raphael S. Carvalho
5c674091dc db: move code that rebuilds sstable list to a function
That code will be used by column family cleanup, so let's put
that code into a function. This change also improves the code
readability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:04 -02:00
Raphael S. Carvalho
58189dd489 db: move generation calculation code to a function
Code that calculates generation should be put in a function.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:02 -02:00
Avi Kivity
678bdd5c79 Merge "Change AMI base image to CentOS7, use systemd-coredump for Fedora/CentOS, make AMI rootfs as XFS" from Takuya 2016-01-11 18:43:57 +02:00
Tzach Livyatan
8a4f7e211b Add REST API server ip:port parameters to scylla.yaml
api_port and api_address are already valid configuration options.
Adding them to scylla.yaml lets users know they exist.

Solves issue #704

Signed-off-by: Tzach Livyatan <tzach@cloudius-systems.com>
Message-Id: <1452527028-13724-1-git-send-email-tzach@cloudius-systems.com>
2016-01-11 18:00:48 +02:00
Avi Kivity
f917f73616 Merge "Handling of schema changes" from Tomasz
"Our domain objects have schema version dependent format, for efficiency
reasons. The data structures which map between columns and values rely on
column ids, which are consecutive integers. For example, we store cells in a
vector where index into the vector is an implicit column id identifying table
column of the cell. When columns are added or removed the column ids may
shift. So, to access mutations or query results one needs to know the version
of the schema corresponding to it.

In case of query results, the schema version to which it conforms will always
be the version which was used to construct the query request. So there's no
change in the way query result consumers operate to handle schema changes. The
interfaces for querying needed to be extended to accept schema version and do
the conversions if necessary.

Shard-local interfaces work with a full definition of schema version,
represented by the schema type (usually passed as schema_ptr). Schema versions
are identified across shards and nodes with a UUID (table_schema_version
type). We maintain schema version registry (schema_registry) to avoid fetching
definitions we already know about. When we get a request using unknown schema,
we need to fetch the definition from the source, which must know it, to obtain
a shard-local schema_ptr for it.

Because mutation representation is schema version dependent, mutations of
different versions don't necessarily commute. When a column is dropped from
schema, the dropped column is no longer representable in the new schema. It is
generally fine to not hold data for dropped columns, the intent behind
dropping a column is to lose the data in that column. However, when merging an
incoming mutation with an existing mutation both of which have different
schema versions, we'd have to choose which schema should be considered
"latest" in order not to loose data. Schema changes can be made concurrently
in the cluster and initiated on different nodes so there is not always a
single notion of latest schema. However, schema changes are commutative and by
merging changes nodes eventually agree on the version.  For example adding
column A (version X) on one node and adding column B (version Y) on another
eventually results in a schema version with both A and B (version Z). We
cannot tell which version among X and Y is newer, but we can tell that version
Z is newer than both X and Y. So the solution to the problem of merging
conflicting mutations could be to ensure that such merge is performed using
the schema which is superior to schemas of both mutations.

The approach taken in the series for ensuring this is as follows. When a node
receives a mutation of an unknown schema version it first performs a schema
merge with the source of that mutation. Schema merge makes sure that current
node's version is superior to the schema of incoming mutation. Once the
version is synced with, it is remembered as such and won't be synced with on
later mutations. Because of this bookkeeping, schema versions must be
monotonic; we don't want table altering to result in any earlier version
because that would cause nodes to avoid syncing with them. The version is a
cryptographically-secure hash of schema mutations, which should fulfill this
purpose in practice.

TODO: It's possible that the node is already performing a sync triggered by
broadcasted schema mutations. To avoid triggering a second sync needlessly, the
schema merging should mark incoming versions as being synced with.

Each table shard keeps track of its current schema version, which is
considered to be superior to all versions which are going to be applied to it.
All data sources for given column family within a shard have the same notion
of current schema version. Individual entries in cache and memtables may be at
earlier versions but this is hidden behind the interface. The entries are
upgraded to current version lazily on access. Sstables are immutable, so they
don't need to track current version. Like any other data source, they can be
queried with any schema version.

Note, the series triggered a bug in demangler:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68700"
2016-01-11 17:59:14 +02:00
Avi Kivity
3092c1ebb5 Update scylla-ami submodule
* ami/files/scylla-ami 07b7118...eb1fdd4 (2):
  > move log file to /var/lib/scylla
  > move config file to /etc/scylla
2016-01-11 17:58:47 +02:00
Avi Kivity
9182ce1f61 Merge seastar upstream
* seastar d0bf6f8...ad3577b (9):
  > httpd: close connection before deleting it
  > reactor: support for non-O_DIRECT capable filesystems
  > tests: modernize linecount
  > IO queues: destruct within reactor's destructor
  > tests: Use dnsdomainname in mkcert.gmk
  > tests: memcached: workaround a possible race between flush_all and read
  > apps: memcached: reduce the error during the expiration time translation
  > timer: add missing #include
  > core: do not call open_file_dma directly

Fixes #757.
2016-01-11 17:41:39 +02:00
Takuya ASADA
6a457da969 dist: add ignore files for AMI 2016-01-11 14:22:20 +00:00
Takuya ASADA
b28b8147a0 dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update
Since the yum command thinks the development version is older than the release version, we need this.
2016-01-11 14:22:13 +00:00
Takuya ASADA
dd9894a7b6 dist: cleanup build directory before creating rpms for AMI
To prevent AMI build failures, clean up the build directory first.
2016-01-11 14:21:02 +00:00
Takuya ASADA
8886fe7393 dist: use systemd-coredump on Fedora/CentOS, create symlink /var/lib/scylla/coredump -> /var/lib/systemd/coredump when we mounted RAID
Use systemd-coredump for coredumps if the distribution is CentOS/RHEL/Fedora, and make a symlink from the RAID to /var/lib/systemd/coredump if the RAID is mounted.
2016-01-11 14:20:50 +00:00
Takuya ASADA
927957d3b9 dist: since AMI uses XFS rootfs, we don't need to warn extra disks not attached to the AMI instance
Even if extra disks are not supplied, it's still valid since we have an XFS rootfs now.
2016-01-11 14:19:35 +00:00
Takuya ASADA
47be3fd866 dist: split scylla_install script into two parts: scylla_install_pkg installs .rpm/.deb packages, scylla_setup sets up the environment after the package is installed
This enables setting up RAID/NTP/NIC after the .rpm/.deb package is installed.
2016-01-11 14:19:29 +00:00
Takuya ASADA
76f0191382 dist: remove scylla_local.json, merge it to scylla.json
We can share one packer config file for both build settings.
2016-01-11 14:18:55 +00:00
Takuya ASADA
8721e27978 dist: fetch CentOS dependencies from our yum repository by default
Only rebuild dependencies when passing -R option to build_rpm.sh
2016-01-11 14:18:49 +00:00
Takuya ASADA
b3c85aea89 dist: switch AMI base image from Fedora to CentOS
Move AMI to CentOS, use XFS for rootfs
2016-01-11 14:18:30 +00:00
Takuya ASADA
202389b2ec dist: don't need yum install and mv scylla-ami before scylla_install
This fixes 'amazon-ebs: mv: cannot stat ‘/home/fedora/scylla-ami’: No such file or directory' on build_ami_local.sh
2016-01-11 14:18:08 +00:00
Takuya ASADA
f3c32645d3 dist: add build time dependency to scylla-libstdc++-static for CentOS
This fixes link error on CentOS

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-11 14:17:52 +00:00
Takuya ASADA
780d9a26b2 configure.py: add --python option to specify python3 command path, for CentOS
Since the python3 path is /usr/bin/python3.4 on CentOS, we need to modify its path
2016-01-11 14:17:27 +00:00
Takuya ASADA
b0980ef0c4 dist: use scylla-boost instead of boost to fix compile error on CentOS
The boost package isn't usable on CentOS; use scylla-boost instead.
2016-01-11 14:17:02 +00:00
Lucas Meneghel Rodrigues
94c3c5c1e9 dist/ami: Print newline at the end of MOTD banner
The MOTD banner now printed upon .bash_profile execution,
if scylla is running, ends with a 'tput sgr0'. That command
appends an extra '[m' at the beginning of the output of any
following command. The automation scripts don't like this.

So let's add an 'echo' at the end of that path to add a newline,
avoiding the condition described above, and another one at the
'ScyllaDB is not started' path, for symmetry. I'm doing this
as it seems easier than having to develop heuristics to know
whether to remove or not that character.

CC: Shlomi Livne <slivne@scylladb.com>
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1452216044-28374-1-git-send-email-lmr@scylladb.com>
2016-01-11 15:40:43 +02:00
Calle Wilund
244cd62edb commit log reader bugfix: Fix trying to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof), and also give false errors about corruption.
2016-01-11 13:07:26 +00:00
Vlad Zolotarov
0ed210e117 storage_proxy::query(): intercept exceptions coming from trace()
Exceptions originating from unimplemented to_string() methods
may interrupt the query() flow if not intercepted. Don't let that
happen.

Fixes issue #768

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-11 12:29:50 +01:00
Tomasz Grabiec
e62857da48 schema_tables: Wait for make_directory_for_column_family() to finish in merge_tables() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
71bbbceced schema_tables: Notify about table creation after it is fully inited
I'm not aware of any issues it could cause, but it makes more sense
that way.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
b6c6ee5360 tests: Add test for statement invalidation 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
036eec295f query_processor: Invalidate statements synchronously
We want the statements to be removed before we ack the schema change,
otherwise they will race with all future operations.

Since the subscriber will be invoked on each shard, there is no need
to broadcast to all shards; we can just handle the current shard.
2016-01-11 10:34:55 +01:00
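A minimal sketch of the shard-local, synchronous invalidation this describes; the class, container, and text-scanning predicate are hypothetical stand-ins, not the actual query_processor API:
```
#include <iterator>
#include <string>
#include <unordered_map>

// Hypothetical shard-local prepared-statement cache; each shard owns its
// own instance, so no cross-shard broadcast is needed.
class statement_cache {
    std::unordered_map<std::string, std::string> _prepared; // id -> CQL text
public:
    // Called synchronously from the schema-change subscriber, before the
    // schema change is acknowledged, so a stale statement cannot race
    // with operations that follow the change.
    void invalidate_table(const std::string& ks, const std::string& cf) {
        for (auto it = _prepared.begin(); it != _prepared.end();) {
            it = references(it->second, ks, cf) ? _prepared.erase(it)
                                                : std::next(it);
        }
    }
private:
    // Crude illustration only: real code would track which tables a
    // prepared statement touches instead of scanning its text.
    static bool references(const std::string& cql, const std::string& ks,
                           const std::string& cf) {
        return cql.find(ks + "." + cf) != std::string::npos;
    }
};
```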
Tomasz Grabiec
8deb3f18d3 query_processor: Invalidate prepared statements when columns change
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :

"Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
facc549510 schema: Introduce equal_columns() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
0ea045b654 tests: Add notification test to schema_change_test 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
d80ffc580f schema_tables: Notify about table schema update 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
40858612e5 db: Make column_family::schema() return const& to avoid copy 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
8817e9613d migration_manager: Simplify notifications
Currently the notify_*() method family broadcasts to all shards, so
schema merging code invokes them only on shard 0, to avoid doubling
notifications. We can simplify this by making the notify_*() methods
per-instance and thus shard-local.
2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5d38614f51 tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5689a1b08b tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Paweł Dziepak
21bbc65f3f tests/cql: add tests for ALTER TABLE
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
0276919819 cql3: complete translation of alter table statement
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
f24f677dde db/schema_tables: simplify column difference computation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
ae3acd0f9c system_tables: store schema::dropped_columns in system tables
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
b5bee9c36a schema_builder: force column id recomputation in build()
If the schema_builder is constructed from an existing schema we need to
make sure that the original column ids of regular and static columns are
*not* used since they may become invalid if columns are added or
removed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
da0f999123 schema_builder: add with_altered_column_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9807ddd158 schema_builder: add with_column_rename()
Columns that are part of the primary key can be renamed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9bf13ed09b mutation_partition: drop cells from dropped_columns at upgrade
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
[tgrabiec: Merged the changes into converting_mutation_partition_applied]
2016-01-11 10:34:53 +01:00
Paweł Dziepak
3cbfa0e52f schema: add column_definition::_dropped_at
When a column is dropped its name and deletion timestamp are added
to schema::_raw._dropped_columns to prevent data resurrection in case a
column with the same name is added. To reduce the number of lookups in
_dropped_columns, this patch makes each instance of column_definition
cache this information (i.e. the timestamp of the latest removal of a
column with the same name).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
Paweł Dziepak
42dc4ce715 schema: keep track of dropped columns
Knowing which columns were dropped (and when) is important to prevent
data from the dropped columns from reappearing if a new column is added
with the same name.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
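A minimal sketch of the resurrection guard this enables, assuming a per-schema map from column name to drop timestamp (names simplified, not the actual Scylla types):
```
#include <cstdint>
#include <map>
#include <string>

using timestamp_type = int64_t;

// Per-schema record of drops: column name -> timestamp of the drop.
std::map<std::string, timestamp_type> dropped_columns;

// When applying a cell for a (possibly re-added) column, ignore writes
// older than the latest drop of a column with the same name, so old
// data cannot resurrect under the new column.
bool cell_survives(const std::string& column_name, timestamp_type write_ts) {
    auto it = dropped_columns.find(column_name);
    return it == dropped_columns.end() || write_ts > it->second;
}
```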
Tomasz Grabiec
a81fa1727b tests: Add schema_change_test 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
d8ff9ee441 schema_tables: Make merge_tables() compare by mutations
Schema version is calculated from mutations, so merge_schema should
also look at mutation changes to detect schema changes whenever
version changes.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
5707c5e7ca schema_tables: Simplify merge_tables() and merge_keyspaces()
read_schema_for_keyspaces() drops empty results so the emptiness
checks are always false and we can remove some redundancy.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
bfefe5a546 schema_tables: Calculate digest from mutations
We want the node's schema version to change whenever
table_schema_version of any table changes. The latter is calculated by
hashing mutations so we should also use mutation hash when calculating
schema digest.
2016-01-11 10:34:53 +01:00
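Conceptually the digest calculation becomes the sketch below. The real code uses md5_hasher and the feed_hash() helpers introduced later in this series; a stand-in hasher is used here only to keep the sketch self-contained:
```
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in hasher with a simple mixing update(); the real code uses
// md5_hasher and feed_hash() over hashable mutations.
struct hasher {
    size_t state = 0;
    void update(const std::string& bytes) {
        state ^= std::hash<std::string>{}(bytes) + 0x9e3779b9
                 + (state << 6) + (state >> 2);
    }
};

// Node-level schema digest: feed every table's schema mutations into one
// hasher, so the digest changes exactly when some table's mutations (and
// hence its table_schema_version) change.
size_t digest_schema(const std::vector<std::string>& serialized_mutations) {
    hasher h;
    for (const auto& m : serialized_mutations) {
        h.update(m);
    }
    return h.state;
}
```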
Tomasz Grabiec
b91c92401f migration_manager: Implement migration_manager::announce_column_family_update 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
c6a52bed73 db: Fail when attempting to mutate using not synced schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
a2cdbff965 storage_proxy: Log failures of definitions update handler
Fixes #769.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
e1e8858ed1 service: Fetch and sync schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
cdca20775f messaging_service: Introduce get_source() 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
f0d886893d db: Mark new schemas as synced 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
fb5658ede1 schema_registry: Track synced state of schema
We need to track which schema versions were synced on the current node
to avoid triggering the sync on every mutation. We need to sync before
mutating to be able to apply the incoming mutation using the current
node's schema, possibly applying irreversible transformations to it to
make it conform.
2016-01-11 10:34:52 +01:00
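A sketch of the synced-state tracking, reduced to its state machine; the real registry entry is per schema version and future-based, so the names and synchronous shape here are assumptions:
```
#include <functional>

// Per-version entry: remember whether this node already synced the
// version, so only the first mutation carrying an unknown schema
// version pays the synchronization cost.
class schema_version_state {
    bool _synced = false;
public:
    void maybe_sync(const std::function<void()>& do_sync) {
        if (!_synced) {
            do_sync();   // fetch/merge the schema before mutating
            _synced = true;
        }
    }
};
```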
Tomasz Grabiec
311e3733e0 service: migration_task: Implement using migration_manager::merge_schema_from()
To avoid duplication.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
dee0bbf3f3 migration_manager: Introduce merge_schema_from() 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
be2bdb779a tests: Introduce canonical_mutation_test 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a63971ee4c tests: memtable_test: Add test for concurrent reading and schema changes 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
8164902c84 schema_tables: Change column_family schema on schema sync
Notifications are not implemented yet.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
d81a46d7b5 column_family: Add schema setters
There is one current schema for a given column_family. Entries in
memtables and the cache can be at any of the previous schemas, but
they're always upgraded to the current schema on access.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
da3a453003 service: Add GET_SCHEMA_VERSION remote call
The verb belongs to a separate client to avoid potential deadlocks
should connection-level throttling be introduced in the future. Another
reason is to reduce latency for version requests, since a blocked
version request can potentially block many other requests.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a9c00cbc11 batchlog_manager: Use requested schema version 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws a runtime
exception).
2016-01-11 10:34:52 +01:00
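A simplified sketch of the contract described above, with stand-in types: the point is that the same pinned schema_ptr covers the request, any schema pull the remote node may issue, and the parsing of the result:
```
#include <memory>
#include <string>

struct schema { std::string version; };
using schema_ptr = std::shared_ptr<const schema>;

struct read_command { std::string schema_version; /* ranges, slice, ... */ };
struct query_result { std::string payload; };

// The requestor pins a live schema_ptr for the whole exchange: the
// version travels in the command, holding `s` guarantees the local
// registry can answer if the remote node pulls the schema definition,
// and the same version is used to interpret the result.
query_result execute_read(const schema_ptr& s,
                          query_result (*send)(const read_command&)) {
    read_command cmd{s->version};
    query_result r = send(cmd);
    // ... parse r using *s, the same version used for the request ...
    return r;
}
```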
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtables and cache per entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to the
table's current schema on a given shard.

Mutating nodes need to keep a schema_ptr alive in case the schema
version is requested by the target node.
2016-01-11 10:34:51 +01:00
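A sketch of the lazy per-entry upgrade, with stand-in types; the real conversion is mutation::upgrade(), added further down this list:
```
#include <memory>
#include <string>

struct schema { std::string version; };
using schema_ptr = std::shared_ptr<const schema>;

// Entries remember the schema they were written with; a read under a
// newer schema upgrades the entry in place, so readers only ever see
// data conforming to the table's current schema on this shard.
struct mutation_entry {
    schema_ptr written_with;
    // ... cells ...

    void upgrade_if_needed(const schema_ptr& current) {
        if (written_with->version != current->version) {
            // convert cells to the new column layout, drop cells of
            // dropped columns, etc. (see "mutation: Implement upgrade()")
            written_with = current;
        }
    }
};
```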
Tomasz Grabiec
9eef4d1651 db: Learn schema versions when adding tables 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
175be4c2aa cql_query_test: Disable test_user_type 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f9ae1ed1c6 frozen_mutation: Add schema_version field 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8c6480fc46 Introduce global_schema_ptr 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f25487bc1e Introduce schema_registry 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
533aec84b3 schema: Enable shared_from_this() 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8a05b61d68 memtable: Read under _read_section 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read, and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
0a9436fc1a schema: Introduce frozen_schema
For passing schema across shards/nodes. Also, for keeping in
schema_registry when there's no live schema_ptr.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
060f93477b Make schema_mutations serializable
We must use the canonical_mutation form to allow for changes in the
schema of schema tables. The node which deserializes schema mutations
may not have the same version of the schema tables, so we cannot use
frozen_mutation, which is a schema-dependent form.
2016-01-11 10:34:50 +01:00
Tomasz Grabiec
e84f3717b5 Introduce canonical_mutation
frozen_schema will transfer schema definition across nodes with schema
mutations. Because different nodes may have different versions of
schema tables, we cannot use frozen_mutations to transfer these
because frozen_mutation can only be read using the same version of the
schema it was frozen with. To solve this problem, a new form of
mutation is introduced, called canonical_mutation, which can be read
using any version of the schema.
2016-01-11 10:34:50 +01:00
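The difference can be sketched with stand-in types: the canonical form keys cells by a stable identifier rather than by positional column id, which is what ties frozen_mutation to one exact schema version (both structs here are illustrative, not the real representations):
```
#include <map>
#include <string>

// frozen_mutation: cells keyed by positional column id, meaningful only
// together with the exact schema version used when freezing.
struct frozen_mutation_sketch {
    std::map<int, std::string> cells_by_column_id;
};

// canonical_mutation: cells keyed by a stable identifier (e.g. the
// column name), so any schema version of the table can interpret it.
struct canonical_mutation_sketch {
    std::map<std::string, std::string> cells_by_column_name;
};
```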
Tomasz Grabiec
3e447e4ad1 tests: mutation_test: Add tests for equality and hashing 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
48f1db5ffa mutation_assertions: Add is_not_equal_to() 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
88a6a17f72 tests: Use mutation generators in frozen_mutation_test 2016-01-11 10:34:50 +01:00
Avi Kivity
6185744312 dist: redhat: drop 'sudo' in scylla_run
Systemd will change the user for us, and the extra process created by
'sudo' confuses sd_notify().
2016-01-10 18:46:43 +02:00
Avi Kivity
dd271b77b0 build: add support for optional pkg-config managed packages 2016-01-10 18:24:12 +02:00
Vlad Zolotarov
19e275be1f tests: gossip_test: initialize a broadcast address and a snitch
This patch fixes a regression introduced by
a commit ca935bf "tests: Fix gossip_test".

database service initializes a replication_strategy
object and a replication_strategy requires a snitch
service to be initialized.

A snitch service requires a broadcast address to be
set.

If any of the above is not initialized we are going
to hit the corresponding assert().

Set a snitch to a SimpleSnitch and a broadcast
address to 127.0.0.1.

Fixes issue #770

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452421748-9605-1-git-send-email-vladz@cloudius-systems.com>
2016-01-10 13:13:37 +02:00
Tomasz Grabiec
d7b403db1f log: Change default level from warn to info
Logging at 'warn' level leaves us with too silent logs, not as helpful
as they could be in case of failure.

Message-Id: <1452283669-11675-1-git-send-email-tgrabiec@scylladb.com>
2016-01-09 09:24:22 +02:00
Tomasz Grabiec
9b2cc557c5 mutation_source_test: Add mutation generators
The goal is to provide various test cases with a way of iterating over
many combinations of mutations. It's good to have this in one place to
avoid duplication and increase coverage.
2016-01-08 21:10:27 +01:00
Tomasz Grabiec
4b92ef01fc test: Add tests for mutation upgrade 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f59ec59abc mutation: Implement upgrade()
Converts mutation to a new schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
0edfe138f8 mutation_partition_view: Make visitable also with column_mapping 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
2cfdfe261d Introduce converting_mutation_partition_applier 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
b17cbc23ab schema: Introduce column_mapping
Encapsulates information needed to convert mutation representations
between schema versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
9a3db10b85 db/serializer: Implement skip() for bytes and sstring 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13974234a4 db/serializer: Spread serializers to relax header dependencies 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d13c6d7008 types: Introduce is_atomic()
Matches column_definition::is_atomic()
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f3556ebfc2 schema: Introduce column_count_type
Right now in some places we use column_id, and in some places size_t.
Solve this by using column_count_type, whose meaning is "an integer
sufficiently large for indexing columns". Note that we cannot use
column_id because it carries more meaning than that.
2016-01-08 21:10:26 +01:00
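A plausible shape for the alias, as a sketch (the exact underlying type is an assumption):
```
#include <cstdint>

// An integer sufficiently large for indexing columns; a distinct alias,
// because column_id carries more meaning than a plain count.
using column_count_type = uint32_t;
```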
Tomasz Grabiec
f58c2dec1e schema: Make schema objects versioned
The version needs to change not only on structural changes but also on
temporal ones. This is needed for nodes to detect whether or not the
version they see was already synchronized with, even if it has the same
structure as past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.

For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13295563e0 schema_builder: Move compact_storage setting outside build()
Properties of the schema are set using methods of schema_builder and
different variants of build() are for different forms of the final
schema object.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
dbb7b7ebe3 db: Move system keyspace initialization to init_system_keyspace() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
fdb9e01eb4 schema_tables: Use schema_mutations for schema_ptr translations
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d07e32bc32 schema_tables: Simplify schema building invocation chain 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
3c3ea20640 schema_tables: Drop pkey parameter from add_table_to_schema_mutation()
It simplifies the add_table_to_schema_mutation() interface.

The current code is also a bit confusing: partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works, since the types are the same, but it
looks a bit scary.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
22254e94cc query::result_set: Add constructor from mutation 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a861b74b7e Introduce schema_mutations 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a6084ee007 mutation: Make hashable
The computed hash is independent of any internal representation and
thus can be used as a digest across nodes and versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
c009fe5991 keys: Add missing clustering_key_prefix_view::get_compound_type() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
ade5cf1b4b mutation_partition: Make visitable with mutation_partition_visitor 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
bc9ee083dd db: Move atomic_cell_or_collection to separate header
To break future cyclic dependency:

  atomic_cell.hh -> schema.hh (new) -> types.hh -> atomic_cell.hh
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
6f955e1290 mutation_partition: Make equal() work with different schemas 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
75caba5b8a schema: Guarantee that column id order matches name order
For static and regular (row) columns it is very convenient in some
cases to rely on the fact that columns ordered by id are also ordered
by name. This currently holds, so make schema export this guarantee and
enable consumers to rely on it.

The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and to be easily
discoverable in case we ever have to relax it.
2016-01-08 21:10:25 +01:00
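Consumers can then make the dependency explicit along these lines, assuming the field is a constexpr boolean (a sketch, not the verified declaration):
```
// Stand-in for the real schema class, which exposes the guarantee as a
// static field.
struct schema {
    static constexpr bool row_column_ids_are_ordered_by_name = true;
};

// Fails to compile the moment the guarantee is relaxed, making every
// dependent call site easy to find.
static_assert(schema::row_column_ids_are_ordered_by_name,
              "this code assumes column ids ordered by name");
```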
Tomasz Grabiec
14d0482efa Introduce md5_hasher 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb1b21eb4b Introduce hashing helpers 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
ff3a2e1239 mutation_partition: Drop row tombstones in do_compact() 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb9b383531 service: migration_manager: Fix announce order to match C*
Current logic differs from C*: we first push to other nodes and then
initiate the sync locally, while C* does the opposite.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
0768deba74 query_processor: Add trace-level logging of processed statements 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
dae531554a create_index_statement: Use textual column name in all messages
As pointed out by Pawel, we can rely on operator<<()
Message-Id: <1452243656-3376-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 11:06:09 +02:00
Tomasz Grabiec
5d6d039297 create_index_statement: Use textual representation of column name
Before:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column 736368656d615f76657273696f6e"

After:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column schema_version"
Message-Id: <1452243156-2923-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 10:53:37 +02:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Avi Kivity
3377739fa3 main: wait for API http server to start
Wait for the future returned by the http server start process to resolve,
so we know it is started.  If it doesn't, we'll hit the or_terminate()
further down the line and exit with an error code.
Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>
2016-01-07 16:44:07 +02:00
Avi Kivity
fbe3283816 snitch: intentionally leak snitch singleton
Because our shutdown process is crippled (refs #293), we won't shutdown the
snitch correctly, and the sharded<> instance can assert during shutdown.
This interferes with the next patch, which adds orderly shutdown if the http
server fails to start.

Leak it intentionally to work around the problem.
Message-Id: <1452092806-11508-2-git-send-email-avi@scylladb.com>
2016-01-07 16:43:37 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs one extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Vlad Zolotarov
07f8549683 database: filter out manifest.json files
Filter out manifest.json files when reading sstables during
bootup and when loading new sstables ('nodetool refresh').

Fixes issue #529

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-3-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:02 +02:00
Vlad Zolotarov
c5aa2d6f1a database: lister: add a filtering option
Add the possibility to pass a filter functor that receives the full
path of a directory entry and returns a boolean value: TRUE if the
entry should be enumerated and FALSE if it should be filtered out.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-2-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:01 +02:00
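A self-contained sketch of the filtering hook (types simplified; the real lister works asynchronously on directory entries), followed by a usage matching the manifest.json commit above:
```
#include <functional>
#include <string>
#include <vector>

// The filter receives the full path of a directory entry and returns
// true to enumerate it, false to skip it.
using entry_filter = std::function<bool(const std::string&)>;

std::vector<std::string> list_dir(const std::vector<std::string>& entries,
                                  const entry_filter& keep) {
    std::vector<std::string> out;
    for (const auto& e : entries) {
        if (keep(e)) {
            out.push_back(e);
        }
    }
    return out;
}

// Usage matching the commit above: skip manifest.json files during boot
// and 'nodetool refresh'.
// auto sstables = list_dir(entries, [] (const std::string& p) {
//     return p.find("manifest.json") == std::string::npos;
// });
```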
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait for cql connection with node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will do the bootstrap process since its data,
including the system tables, is wiped. It will consider itself a
completely new node and can possibly stream from the wrong node and
violate consistency.

To fix this, we reject the boot if we find that the node was in
SHUTDOWN or STATUS_NORMAL.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
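Reduced to its core, the check looks roughly like this; the status strings are placeholders for the real gossip states, and the real check runs against the endpoint state learned in the shadow round:
```
#include <string>

// A booting node whose peers last saw it as NORMAL or in shutdown must
// not bootstrap as if it were brand new, or it may stream from the
// wrong nodes and violate consistency.
bool is_safe_for_bootstrap(const std::string& status_seen_by_peers) {
    return status_seen_by_peers != "NORMAL"
        && status_seen_by_peers != "shutdown";
}
```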
Asias He
933614bdf9 main: Change API server starting message
It comes from the Seastar HTTP server and is inaccurate.

Message-Id: <6a634437d2bd4368400010e25969e215894c2df9.1452162686.git.asias@scylladb.com>
2016-01-07 15:53:28 +02:00
Asias He
6439f4d808 storage_service: Fix load_broadcaster in get_load_map
If get_load_map is called from the API while load_broadcaster is not
set yet, we dereference a nullptr.

Fixes #763.
Message-Id: <6f8d554f4976aea85d5cec5a76a3848234138b0a.1452152148.git.asias@scylladb.com>
2016-01-07 10:36:36 +02:00
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Using shard_id as the destination of the messaging_service is
confusing, since shard_id is used in the context of a CPU id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait for gossip to settle logic in the bootup process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait for cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5, I sometimes saw that in node2's shadow round it gets
node2's status as BOOT from other nodes in the cluster instead of
NORMAL. The problem is that we do not wait for gossip to settle before
we start the CQL server; as a result, when we stop node2 in step 3),
other nodes in the cluster have not received node2's status update to
NORMAL.
2016-01-07 10:09:25 +02:00
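A simplified, synchronous sketch of the settle condition; the real implementation is Seastar-future-based, and the round count and polling interval here are assumptions:
```
#include <chrono>
#include <functional>
#include <thread>

// Poll until the gossip view stops changing for a few consecutive
// rounds before starting the CQL server. This shows the settle
// condition only.
void wait_for_gossip_to_settle(const std::function<int()>& live_endpoints) {
    constexpr int required_stable_rounds = 3;
    constexpr auto poll_interval = std::chrono::seconds(1);
    int stable_rounds = 0;
    int last_count = -1;
    while (stable_rounds < required_stable_rounds) {
        int count = live_endpoints();
        stable_rounds = (count == last_count) ? stable_rounds + 1 : 0;
        last_count = count;
        std::this_thread::sleep_for(poll_interval);
    }
}
```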
Benoît Canet
8f725256e1 config: Mark ssl_storage_port as Used
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452082041-6117-1-git-send-email-benoit@scylladb.com>
2016-01-06 17:34:53 +02:00
299 changed files with 12080 additions and 5368 deletions

3
.gitignore vendored

@@ -5,3 +5,6 @@ build
build.ninja
cscope.*
/debian/
dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh

103
IDL.md Normal file

@@ -0,0 +1,103 @@
# IDL definition
The schema we use is similar to C++ syntax.
Use a class or struct similar to the object you need the serializer for.
Use namespaces when applicable.
## Keywords
* class/struct - a class or a struct, as in C++; a class/struct can have a final or stub marker
* namespace - has the same meaning as in C++
* enum class - has the same meaning as in C++
* final modifier for class - when a class is marked as final it will not contain a size parameter. Note that a final class cannot be extended by a future version, so use with care
* stub class - when a class is marked as stub, no code will be generated for it and it is only there as documentation
* version attributes - marked with [[version id]]; marks that a field is available from a specific version
* template - a template class definition, as in C++
## Syntax
### Namespace
```
namespace ns_name { namespace-body }
```
* ns_name: either a previously unused identifier, in which case this is an original-namespace-definition, or the name of a namespace, in which case this is an extension-namespace-definition
* namespace-body: a possibly empty sequence of declarations of any kind (including class and struct definitions as well as nested namespaces)
### class/struct
`
class-key class-name final(optional) stub(optional) { member-specification } ;(optional)
`
* class-key: one of class or struct.
* class-name: the name of the class that's being defined, optionally followed by the keyword final, optionally followed by the keyword stub
* final: when a class is marked as final, it means it cannot be extended and there is no need to serialize its size; use with care.
* stub: when a class is marked as stub, it means no code will be generated for it and it is added for documentation only.
* member-specification: a list of access specifiers and public member accessors; see class member below.
* to be compatible with C++, a class definition can be followed by a semicolon.
### enum
`enum-key identifier enum-base { enumerator-list(optional) }`
* enum-key: only enum class is supported
* identifier: the name of the enumeration that's being declared.
* enum-base: a colon (:), followed by a type-specifier-seq that names an integral type (see the C++ standard for the full list of all possible integral types).
* enumerator-list: a comma-separated list of enumerator definitions, each of which is either simply an identifier, which becomes the name of the enumerator, or an identifier with an initializer: identifier = integral value.
Note that though C++ allows a constexpr as an initializer value, it makes the documentation less readable, hence it is not permitted.
### class member
`type member-access attributes(optional) default-value(optional);`
* type: any valid C++ type, following the C++ notation. Note that there should be a serializer for the type, but declaration order is not mandatory
* member-access: the way the member can be accessed. If the member is public, it can be the name itself; if not, it can be a getter function, which should be followed by parentheses. Note that getters can (and probably should) be const methods.
* attributes: attributes are defined by square brackets. Currently they are used to mark the version in which a specific member was added: [ [ version version-number ] ] marks that the specific member was added in the given version number.
### template
`template < parameter-list > class-declaration`
* parameter-list - a non-empty comma-separated list of the template parameters.
* class-declaration - (see the class section) the class name declared becomes a template name.
## IDL example
Comments starting with forward slashes are ignored until the end of the line.
```
namespace utils {
// An example of a stub class
class UUID stub {
int64_t most_sig_bits;
int64_t least_sig_bits;
}
}
namespace gms {
//an enum example
enum class application_state:int {STATUS = 0,
LOAD,
SCHEMA,
DC};
// example of final class
class versioned_value final {
// getter and setter as public member
int version;
sstring value;
}
class heart_beat_state {
//getter as function
int32_t get_generation();
//default value example
int32_t get_heart_beat_version() = 1;
}
class endpoint_state {
heart_beat_state get_heart_beat_state();
std::map<application_state, versioned_value> get_application_state_map();
}
class gossip_digest {
inet_address get_endpoint();
int32_t get_generation();
//mark that a field was added on a specific version
int32_t get_max_version() [ [version 0.14.2] ];
}
class gossip_digest_ack {
std::vector<gossip_digest> digests();
std::map<inet_address, gms::endpoint_state> get_endpoint_state_map();
}
}
```


@@ -15,7 +15,7 @@ git submodule update --recursive
* Installing required packages:
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
```
* Build Scylla


@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=0.15
VERSION=0.17
if test -f version
then


@@ -106,7 +106,7 @@
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"string"
"paramType":"query"
}
]
}


@@ -196,6 +196,10 @@
"value": {
"type": "string",
"description": "The version value"
},
"version": {
"type": "int",
"description": "The application state version"
}
}
}


@@ -234,12 +234,12 @@
"type":"string",
"enum":[
"CLIENT_ID",
"ECHO",
"MUTATION",
"MUTATION_DONE",
"READ_DATA",
"READ_MUTATION_DATA",
"READ_DIGEST",
"GOSSIP_ECHO",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK2",
"GOSSIP_SHUTDOWN",
@@ -247,13 +247,13 @@
"TRUNCATE",
"REPLICATION_FINISHED",
"MIGRATION_REQUEST",
"STREAM_INIT_MESSAGE",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"LAST"
"REPAIR_CHECKSUM_RANGE",
"GET_SCHEMA_VERSION"
]
}
}


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -52,67 +52,98 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
return std::make_unique<reply>();
}
future<> set_server(http_context& ctx) {
future<> set_server_init(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
r.register_exeption_handler(exception_reply);
httpd::directory_handler* dir = new httpd::directory_handler(ctx.api_dir,
new content_replace("html"));
r.put(GET, "/ui", new httpd::file_handler(ctx.api_dir + "/index.html",
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), dir);
rb->set_api_doc(r);
rb->register_function(r, "storage_service",
"The storage service API");
set_storage_service(ctx,r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "gossiper",
"The gossiper API");
set_gossiper(ctx,r);
rb->register_function(r, "column_family",
"The column family API");
set_column_family(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "messaging_service",
"The messaging service API");
set_messaging_service(ctx, r);
rb->register_function(r, "storage_proxy",
"The storage proxy API");
set_storage_proxy(ctx, r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "stream_manager",
"The stream manager API");
set_stream_manager(ctx, r);
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
rb->set_api_doc(r);
});
}
static future<> register_api(http_context& ctx, const sstring& api_name,
const sstring api_desc,
std::function<void(http_context& ctx, routes& r)> f) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, api_name, api_desc, f](routes& r) {
rb->register_function(r, api_name, api_desc);
f(ctx,r);
});
}
future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_gossip(http_context& ctx) {
return register_api(ctx, "gossiper",
"The gossiper API", set_gossiper);
}
future<> set_server_load_sstable(http_context& ctx) {
return register_api(ctx, "column_family",
"The column family API", set_column_family);
}
future<> set_server_messaging_service(http_context& ctx) {
return register_api(ctx, "messaging_service",
"The messaging service API", set_messaging_service);
}
future<> set_server_storage_proxy(http_context& ctx) {
return register_api(ctx, "storage_proxy",
"The storage proxy API", set_storage_proxy);
}
future<> set_server_stream_manager(http_context& ctx) {
return register_api(ctx, "stream_manager",
"The stream manager API", set_stream_manager);
}
future<> set_server_gossip_settle(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
});
}
future<> set_server_done(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
});
}


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -21,31 +21,17 @@
#pragma once
#include "http/httpd.hh"
#include "json/json_elements.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "http/exception.hh"
#include "api_init.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db, distributed<service::storage_proxy>&
_sp) : db(_db), sp(_sp) {}
};
future<> set_server(http_context& ctx);
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;

51
api/api_init.hh Normal file

@@ -0,0 +1,51 @@
/*
* Copyright 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database.hh"
#include "service/storage_proxy.hh"
#include "http/httpd.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
}
};
future<> set_server_init(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);
future<> set_server_done(http_context& ctx);
}


@@ -49,7 +49,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = "compaction";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
s.total = c->total_partitions;
summaries.push_back(std::move(s));
@@ -67,11 +67,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
cm::stop_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>("");
cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<request> req) {
auto type = req->get_query_param("type");
return ctx.db.invoke_on_all([type] (database& db) {
auto& cm = db.get_compaction_manager();
cm.stop_compaction(type);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {


@@ -44,6 +44,7 @@ void set_failure_detector(http_context& ctx, routes& r) {
// method that the state index are static but the name can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);
version_val.value = a.second.value;
version_val.version = a.second.version;
val.application_state.push(version_val);
}
res.push_back(val);


@@ -32,9 +32,9 @@ using namespace net;
namespace api {
using shard_info = messaging_service::shard_info;
using shard_id = messaging_service::shard_id;
using msg_addr = messaging_service::msg_addr;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST) + 1;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST);
std::vector<message_counter> map_to_message_counters(
const std::unordered_map<gms::inet_address, unsigned long>& map) {
@@ -58,7 +58,7 @@ future_json_function get_client_getter(std::function<uint64_t(const shard_info&)
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_client([&map, f] (const shard_id& id, const shard_info& info) {
ms.foreach_client([&map, f] (const msg_addr& id, const shard_info& info) {
map[id.addr] = f(info);
});
return map;
@@ -124,7 +124,7 @@ void set_messaging_service(http_context& ctx, routes& r) {
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb, 0);
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
@@ -137,8 +137,12 @@ void set_messaging_service(http_context& ctx, routes& r) {
for (auto i : verb_counter::verb_wrapper::all_items()) {
verb_counter c;
messaging_verb v = i; // for type safety we use messaging_verb values
if ((*map)[static_cast<int32_t>(v)] > 0) {
c.count = (*map)[static_cast<int32_t>(v)];
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];
c.verb = i;
res.push_back(c);
}


@@ -30,6 +30,7 @@
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
namespace api {
@@ -271,15 +272,21 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::force_keyspace_cleanup.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// the nodetool clean up is used in many tests
// this workaround will let it work until
// a cleanup is implemented
warn(unimplemented::cause::API);
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(0);
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto entry : column_families) {
column_family* cf = &db.find_column_family(keyspace, entry);
cm.submit_cleanup_job(cf);
}
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -398,9 +405,13 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_logging_levels.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<ss::mapper> res;
for (auto i : logging::logger_registry().get_all_logger_names()) {
ss::mapper log;
log.key = i;
log.value = logging::level_name(logging::logger_registry().get_logger_level(i));
res.push_back(log);
}
return make_ready_future<json::json_return_type>(res);
});


@@ -47,7 +47,7 @@ static hs::progress_info get_progress_info(const streaming::progress_info& info)
res.direction = info.dir;
res.file_name = info.file_name;
res.peer = boost::lexical_cast<std::string>(info.peer);
res.session_index = info.session_index;
res.session_index = 0;
res.total_bytes = info.total_bytes;
return res;
}
@@ -70,7 +70,7 @@ static hs::stream_state get_state(
for (auto info : result_future.get_coordinator().get()->get_all_session_info()) {
hs::stream_info si;
si.peer = boost::lexical_cast<std::string>(info.peer);
si.session_index = info.session_index;
si.session_index = 0;
si.state = info.state;
si.connecting = si.peer;
set_summaries(info.receiving_summaries, si.receiving_summaries);
@@ -109,14 +109,16 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["peer"]);
utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);
return streaming::get_stream_manager().map_reduce0([plan_id](streaming::stream_manager& stream) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
int64_t res = 0;
streaming::stream_result_future* s = stream.get_receiving_stream(plan_id).get();
if (s != nullptr) {
for (auto si: s->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
if (session->peer == peer) {
res += session->get_bytes_received();
}
}
}
}
return res;
@@ -126,12 +128,12 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_all_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
int64_t res = 0;
for (auto s : stream.get_receiving_streams()) {
if (s.second.get() != nullptr) {
for (auto si: s.second.get()->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
res += session->get_bytes_received();
}
}
}
@@ -142,14 +144,16 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["peer"]);
utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);
return streaming::get_stream_manager().map_reduce0([plan_id](streaming::stream_manager& stream) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
int64_t res = 0;
streaming::stream_result_future* s = stream.get_sending_stream(plan_id).get();
if (s != nullptr) {
for (auto si: s->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
if (session->peer == peer) {
res += session->get_bytes_sent();
}
}
}
}
return res;
@@ -159,12 +163,12 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_all_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
int64_t res = 0;
for (auto s : stream.get_initiated_streams()) {
if (s.second.get() != nullptr) {
for (auto si: s.second.get()->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
res += session->get_bytes_sent();
}
}
}


@@ -272,45 +272,6 @@ template<typename T>
class serializer;
}
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
void linearize() {
_data.linearize();
}
void unlinearize() {
_data.scatter();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);

57
atomic_cell_hash.hh Normal file

@@ -0,0 +1,57 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "atomic_cell.hh"
#include "hashing.hh"
template<typename Hasher>
void feed_hash(collection_mutation_view cell, Hasher& h, const data_type& type) {
auto&& ctype = static_pointer_cast<const collection_type_impl>(type);
auto m_view = ctype->deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second);
}
}
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
}
feed_hash(h, cell.value());
} else {
feed_hash(h, cell.deletion_time());
}
}
};


@@ -0,0 +1,73 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "atomic_cell.hh"
#include "schema.hh"
#include "hashing.hh"
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell());
} else {
::feed_hash(as_collection_mutation(), h, def.type);
}
}
void linearize() {
_data.linearize();
}
void unlinearize() {
_data.scatter();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};

292
auth/auth.cc Normal file

@@ -0,0 +1,292 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.keyspace(ksName));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.columnFamily(ksName, cfName));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
};
static auth_migration_listener auth_migration;
/**
* Poor man's job schedule. For a maximum of 2 jobs. Sic.
* Still does nothing more clever than waiting 10 seconds
* like origin, then runs the submitted tasks.
*
* Only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static clean up) we delete the promise + cont
*
* Should be abstracted to some sort of global server function
* probably.
*/
void auth::auth::schedule_when_up(scheduled_func f) {
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static thread_local std::vector<waiter_ptr> waiters;
logger.trace("Adding scheduled task");
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
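// A hedged usage sketch (hypothetical call site, not part of this change):
// the submitted task runs once, roughly SUPERUSER_SETUP_DELAY after startup,
// or is dropped with an exception on early shutdown.
#if 0
auth::auth::schedule_when_up([] {
    logger.trace("system is up, running deferred setup");
    return make_ready_future();
});
#endif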
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
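// I.e. a configured short name matches the trailing component of the fully
// qualified Origin class name. Illustrative (hypothetical) inputs:
#if 0
assert(auth::auth::is_class_type("PasswordAuthenticator",
        "org.apache.cassandra.auth.PasswordAuthenticator"));
assert(auth::auth::is_class_type("org.apache.cassandra.auth.PasswordAuthenticator",
        "org.apache.cassandra.auth.PasswordAuthenticator"));
assert(!auth::auth::is_class_type("AllowAllAuthenticator",
        "org.apache.cassandra.auth.PasswordAuthenticator"));
#endif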
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
auto type = cfg.authenticator();
if (is_class_type(type, authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)) {
return authenticator::setup(type).discard_result(); // just create the object
}
future<> f = make_ready_future();
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
f = service::get_local_migration_manager().announce_new_keyspace(ksm, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([type] {
return authenticator::setup(type).discard_result();
}).then([] {
// TODO authorizer
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution that is fine, but since tests set up and tear down the system over and over,
// we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor's statement caching instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::cf_statement> parsed = static_pointer_cast<
cql3::statements::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
// Origin sets "Legacy Cf Id" for the new table. We have no need to be
// pre-2.1 compatible (afaik), so let's skip that whole hullabaloo.
return statement->announce_migration(qp.proxy(), false).then([statement](bool) {});
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
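// Fast path: look for the default superuser row at CL.ONE first; if it is not
// visible there, retry at CL.QUORUM, and finally fall back to asking whether
// any user row exists at all.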
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}

120
auth/auth.hh Normal file

@@ -0,0 +1,120 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include "exceptions/exceptions.hh"
namespace auth {
class auth {
public:
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
#if 0
public static Set<Permission> getPermissions(AuthenticatedUser user, IResource resource)
{
return permissionsCache.getPermissions(user, resource);
}
#endif
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true if the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param isSuper User's new status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
/**
* Set up table from given CREATE TABLE statement under system_auth keyspace, if not already done so.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}


@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
+ * Copyright 2016 Cloudius Systems
+ *
+ * Modified by Cloudius Systems.
- * Copyright 2015 Cloudius Systems.
- * Modified by Cloudius Systems
 */
/*
@@ -36,39 +39,23 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
-#pragma once
-#include "utils/UUID.hh"
-#include "streaming/messages/stream_message.hh"
+#include "authenticated_user.hh"
-namespace streaming {
-namespace messages {
+const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
-class retry_message : public stream_message {
-public:
-using UUID = utils::UUID;
-UUID cf_id;
-int sequence_number;
-retry_message() = default;
-retry_message(UUID cf_id_, int sequence_number_)
-: stream_message(stream_message::Type::RECEIVED)
-, cf_id(cf_id_)
-, sequence_number(sequence_number_) {
-}
-#if 0
-@Override
-public String toString()
-{
-final StringBuilder sb = new StringBuilder("Retry (");
-sb.append(cfId).append(", #").append(sequenceNumber).append(')');
-return sb.toString();
-}
-#endif
-public:
-void serialize(bytes::iterator& out) const;
-static retry_message deserialize(bytes_view& v);
-size_t serialized_size() const;
-};
+auth::authenticated_user::authenticated_user()
+: _anon(true)
+{}
-} // namespace messages
-} // namespace streaming
+auth::authenticated_user::authenticated_user(sstring name)
+: _name(name), _anon(false)
+{}
+const sstring& auth::authenticated_user::name() const {
+return _anon ? ANONYMOUS_USERNAME : _name;
+}
+bool auth::authenticated_user::operator==(const authenticated_user& v) const {
+return _anon ? v._anon : _name == v._name;
+}


@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
+ * Copyright 2016 Cloudius Systems
+ *
+ * Modified by Cloudius Systems.
- * Copyright 2015 Cloudius Systems.
- * Modified by Cloudius Systems
 */
/*
@@ -38,35 +41,39 @@
#pragma once
-#include "streaming/messages/stream_message.hh"
-#include "streaming/messages/file_message_header.hh"
-#include "sstables/sstables.hh"
-#include "mutation_reader.hh"
+#include <seastar/core/sstring.hh>
-namespace streaming {
-namespace messages {
+namespace auth {
-/**
- * IncomingFileMessage is used to receive part (or the whole) of an SSTable data file.
- */
-class incoming_file_message : public stream_message {
+class authenticated_user {
public:
-file_message_header header;
+static const sstring ANONYMOUS_USERNAME;
-incoming_file_message() = default;
-incoming_file_message(file_message_header header_, mutation_reader mr_)
-: stream_message(stream_message::Type::FILE)
-, header(std::move(header_)) {
-}
+authenticated_user();
+authenticated_user(sstring name);
+const sstring& name() const;
+/**
+ * Checks the user's superuser status.
+ * Only a superuser is allowed to perform CREATE USER and DROP USER queries.
+ * In most cases, though not necessarily, a superuser will have Permission.ALL on every resource
+ * (depends on the IAuthorizer implementation).
+ */
+bool is_super() const;
+/**
+ * If the IAuthenticator doesn't require authentication, this method may return true.
+ */
+bool is_anonymous() const {
+return _anon;
+}
-#if 0
-@Override
-public String toString()
-{
-return "File (" + header + ", file: " + sstable.getFilename() + ")";
-}
-#endif
+bool operator==(const authenticated_user&) const;
+private:
+sstring _name;
+bool _anon;
};
-} // namespace messages
-} // namespace streaming
+}

110
auth/authenticator.cc Normal file

@@ -0,0 +1,110 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "password_authenticator.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
/**
 * The authenticator is assumed to be a fully stateless, immutable object (note all the const).
 * We thus store a single global instance, since that should be safe/ok.
 */
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
resource_ids protected_resources() const override {
return resource_ids();
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}
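// Hedged sketch of the intended startup sequence (the type string is
// illustrative): run setup() once, then use the global accessor.
#if 0
auth::authenticator::setup("org.apache.cassandra.auth.PasswordAuthenticator").then([] {
    auto& a = auth::authenticator::get();
    assert(a.require_authentication());
});
#endif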

198
auth/authenticator.hh Normal file

@@ -0,0 +1,198 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <memory>
#include <unordered_map>
#include <set>
#include <stdexcept>
#include <boost/any.hpp>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/enum.hh>
#include "bytes.hh"
#include "data_resource.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
namespace db {
class config;
}
namespace auth {
class authenticated_user;
class authenticator {
public:
static const sstring USERNAME_KEY;
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
/**
* Supported CREATE USER/ALTER USER options.
* Currently only PASSWORD is available.
*/
enum class option {
PASSWORD
};
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual ~authenticator()
{}
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
* If false, the user will be instantiated as AuthenticatedUser.ANONYMOUS_USER.
*/
virtual bool require_authentication() const = 0;
/**
* Set of options supported by CREATE USER and ALTER USER queries.
* Should never return null - always return an empty set instead.
*/
virtual option_set supported_options() const = 0;
/**
* Subset of supportedOptions that users are allowed to alter when performing ALTER USER [themselves].
* Should never return null - always return an empty set instead.
*/
virtual option_set alterable_options() const = 0;
/**
* Authenticates a user given a Map<String, String> of credentials.
* Should never return null - always throw AuthenticationException instead.
* Returning AuthenticatedUser.ANONYMOUS_USER is an option as well if authentication is not required.
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
* If the authenticator is static, the body of the method should be left blank, but don't throw an exception.
* options are guaranteed to be a subset of supportedOptions().
*
* @param username Username of the user to create.
* @param options Options the user will be created with.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of ALTER USER query.
* options are always guaranteed to be a subset of supportedOptions(). Furthermore, if the user performing the query
* is not a superuser and is altering himself, then options are guaranteed to be a subset of alterableOptions().
* Keep the body of the method blank if your implementation doesn't support any options.
*
* @param username Username of the user that will be altered.
* @param options Options to alter.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of DROP USER query.
*
* @param username Username of the user that will be dropped.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual resource_ids protected_resources() const = 0;
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**
* Provide a sasl_challenge to be used by the CQL binary protocol server. If
* the configured authenticator requires authentication but does not implement this
* interface, we refuse to start the binary protocol server, as it will have no way
* of authenticating clients.
* @return sasl_challenge implementation
*/
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
}

175
auth/data_resource.cc Normal file

@@ -0,0 +1,175 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "data_resource.hh"
#include <regex>
#include "service/storage_proxy.hh"
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _ks(ks), _cf(cf)
{
if (l != get_level()) {
throw std::invalid_argument("level/keyspace/column mismatch");
}
}
auth::data_resource::data_resource()
: data_resource(level::ROOT)
{}
auth::data_resource::data_resource(const sstring& ks)
: data_resource(level::KEYSPACE, ks)
{}
auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
: data_resource(level::COLUMN_FAMILY, ks, cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
if (!_cf.empty()) {
assert(!_ks.empty());
return level::COLUMN_FAMILY;
}
if (!_ks.empty()) {
return level::KEYSPACE;
}
return level::ROOT;
}
auth::data_resource auth::data_resource::from_name(
const sstring& s) {
static std::regex slash_regex("/");
auto i = std::regex_token_iterator<sstring::const_iterator>(s.begin(),
s.end(), slash_regex, -1);
auto e = std::regex_token_iterator<sstring::const_iterator>();
auto n = std::distance(i, e);
if (n > 3 || ROOT_NAME != sstring(*i++)) {
throw std::invalid_argument(sprint("%s is not a valid data resource name", s));
}
if (n == 1) {
return data_resource();
}
auto ks = *i++;
if (n == 2) {
return data_resource(ks.str());
}
auto cf = *i++;
return data_resource(ks.str(), cf.str());
}
sstring auth::data_resource::name() const {
switch (get_level()) {
case level::ROOT:
return ROOT_NAME;
case level::KEYSPACE:
return sprint("%s/%s", ROOT_NAME, _ks);
case level::COLUMN_FAMILY:
default:
return sprint("%s/%s/%s", ROOT_NAME, _ks, _cf);
}
}
auth::data_resource auth::data_resource::get_parent() const {
switch (get_level()) {
case level::KEYSPACE:
return data_resource();
case level::COLUMN_FAMILY:
return data_resource(_ks);
default:
throw std::invalid_argument("Root-level resource can't have a parent");
}
}
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}
return _cf;
}
bool auth::data_resource::has_parent() const {
return !is_root_level();
}
bool auth::data_resource::exists() const {
switch (get_level()) {
case level::ROOT:
return true;
case level::KEYSPACE:
return service::get_local_storage_proxy().get_db().local().has_keyspace(_ks);
case level::COLUMN_FAMILY:
default:
return service::get_local_storage_proxy().get_db().local().has_schema(_ks, _cf);
}
}
sstring auth::data_resource::to_string() const {
return name();
}
bool auth::data_resource::operator==(const data_resource& v) const {
return _ks == v._ks && _cf == v._cf;
}
bool auth::data_resource::operator<(const data_resource& v) const {
return _ks < v._ks ? true : (v._ks < _ks ? false : _cf < v._cf);
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.name();
}
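// Hedged round-trip sketch of the hierarchical names handled above
// (keyspace/table values illustrative):
#if 0
auto cf = auth::data_resource::from_name("data/system_auth/users");
assert(cf.is_column_family_level());
assert(cf.keyspace() == "system_auth" && cf.column_family() == "users");
assert(cf.get_parent() == auth::data_resource("system_auth"));
assert(cf.name() == "data/system_auth/users");
#endif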

146
auth/data_resource.hh Normal file

@@ -0,0 +1,146 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <iosfwd>
#include <seastar/core/sstring.hh>
namespace auth {
class data_resource {
private:
enum class level {
ROOT, KEYSPACE, COLUMN_FAMILY
};
static const sstring ROOT_NAME;
sstring _ks;
sstring _cf;
data_resource(level, const sstring& ks = {}, const sstring& cf = {});
level get_level() const;
public:
/**
* Creates a DataResource representing the root-level resource.
* @return the root-level resource.
*/
data_resource();
/**
* Creates a DataResource representing a keyspace.
*
* @param keyspace Name of the keyspace.
*/
data_resource(const sstring& ks);
/**
* Creates a DataResource instance representing a column family.
*
* @param keyspace Name of the keyspace.
* @param columnFamily Name of the column family.
*/
data_resource(const sstring& ks, const sstring& cf);
/**
* Parses a data resource name into a DataResource instance.
*
* @param name Name of the data resource.
* @return DataResource instance matching the name.
*/
static data_resource from_name(const sstring&);
/**
* @return Printable name of the resource.
*/
sstring name() const;
/**
* @return Parent of the resource, if any. Throws IllegalStateException if it's the root-level resource.
*/
data_resource get_parent() const;
bool is_root_level() const {
return get_level() == level::ROOT;
}
bool is_keyspace_level() const {
return get_level() == level::KEYSPACE;
}
bool is_column_family_level() const {
return get_level() == level::COLUMN_FAMILY;
}
/**
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.
*/
bool has_parent() const;
/**
* @return Whether or not the resource exists in scylla.
*/
bool exists() const;
sstring to_string() const;
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
};
std::ostream& operator<<(std::ostream&, const data_resource&);
}


@@ -0,0 +1,357 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <chrono>
#include <seastar/core/reactor.hh>
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static const sstring SALTED_HASH = "salted_hash";
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger logger("password_authenticator");
auth::password_authenticator::~password_authenticator()
{}
auth::password_authenticator::password_authenticator()
{}
// TODO: blowfish
// Origin uses the Java bcrypt library, i.e. blowfish salt
// generation and hashing, which is arguably a "better"
// password hash than the sha/md5 versions usually available in
// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algorithm,
// which should be of the same order of safety, so the only
// real issue is salted-hash compatibility with
// origin when importing system tables from there.
//
// Since bcrypt/blowfish is _not_ (afaict) available
// as a dev package/lib on most linux distros, we'd have to
// copy and compile for example OWL crypto
// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
// to be fully bit-compatible.
//
// Until we decide this is needed, let's just use crypt_r,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static sstring hashpw(const sstring& pass, const sstring& salt) {
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
return res;
}
static bool checkpw(const sstring& pass, const sstring& salted_hash) {
auto tmp = hashpw(pass, salted_hash);
return tmp == salted_hash;
}
static sstring gensalt() {
static sstring prefix;
std::random_device rd;
std::default_random_engine e1(rd());
std::uniform_int_distribution<char> dist;
sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
sstring input(rand_bytes, 0);
for (char&c : input) {
c = valid_salt[dist(e1) % valid_salt.size()];
}
sstring salt;
if (!prefix.empty()) {
// a usable prefix was determined on a previous call; pair it with the fresh random input
return prefix + input;
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
}
throw std::runtime_error("Could not initialize hashing algorithm");
}
static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
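// Hedged sketch of the round trip the helpers above implement: verification
// re-hashes the candidate with the stored salted hash as the crypt_r "setting"
// string (the salt prefix is parsed back out of it), so equal output means a
// match. Values illustrative:
#if 0
auto stored = hashpw("cassandra"); // gensalt() picks e.g. a "$6$" + random-salt setting
assert(checkpw("cassandra", stored));
assert(!checkpw("not-the-password", stored));
#endif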
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
});
});
}
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool auth::password_authenticator::require_authentication() const {
return true;
}
auth::authenticator::option_set auth::password_authenticator::supported_options() const {
return option_set::of<option::PASSWORD>();
}
auth::authenticator::option_set auth::password_authenticator::alterable_options() const {
return option_set::of<option::PASSWORD>();
}
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
auto& password = credentials.at(PASSWORD_KEY);
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution that is fine, but since tests set up and tear down the system over and over,
// we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor's statement caching instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
auto& qp = cql3::get_local_query_processor();
return qp.process(
sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), { username }, true).then_wrapped(
[=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>(username));
} catch (std::system_error &) {
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
}
});
}
future<> auth::password_authenticator::create(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::alter(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
auth::authenticator::resource_ids auth::password_authenticator::protected_resources() const {
return { data_resource(auth::AUTH_KS, CREDENTIALS_CF) };
}
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
public:
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is: {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
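// Scan the token backwards: the first NUL-delimited field found from the end
// is the password, the next one the username; any authzId prefix is ignored.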
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}
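// For reference, the matching client side of the PLAIN exchange sends the
// NUL-delimited token decoded above. A hedged sketch (values illustrative):
#if 0
const char raw[] = "\0cassandra\0cassandra"; // empty authzId, authnId, password
bytes_view token(reinterpret_cast<const signed char*>(raw), sizeof(raw) - 1);
auto challenge = auth::authenticator::get().new_sasl_challenge();
challenge->evaluate_response(token);
assert(challenge->is_complete());
#endif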


@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
+ * Copyright 2016 Cloudius Systems
+ *
+ * Modified by Cloudius Systems.
- * Copyright 2015 Cloudius Systems.
- * Modified by Cloudius Systems
 */
/*
@@ -38,54 +41,33 @@
#pragma once
-namespace streaming {
-namespace messages {
+#include "authenticator.hh"
-/**
- * StreamMessage is an abstract base class that every message in the streaming protocol inherits.
- *
- * Every message carries its message type ({@link Type}) and the streaming protocol version byte.
- */
-class stream_message {
+namespace auth {
+class password_authenticator : public authenticator {
public:
-enum class Type {
-PREPARE,
-FILE,
-RECEIVED,
-RETRY,
-COMPLETE,
-SESSION_FAILED,
-};
+static const sstring PASSWORD_AUTHENTICATOR_NAME;
-Type type;
-int priority;
+password_authenticator();
+~password_authenticator();
-stream_message() = default;
+future<> init();
-stream_message(Type type_)
-: type(type_) {
-if (type == Type::PREPARE) {
-priority = 5;
-} else if (type == Type::FILE) {
-priority = 0;
-} else if (type == Type::RECEIVED) {
-priority = 4;
-} else if (type == Type::RETRY) {
-priority = 4;
-} else if (type == Type::COMPLETE) {
-priority = 1;
-} else if (type == Type::SESSION_FAILED) {
-priority = 5;
-}
-}
+const sstring& class_name() const override;
+bool require_authentication() const override;
+option_set supported_options() const override;
+option_set alterable_options() const override;
+future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
+future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
+future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
+future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
+resource_ids protected_resources() const override;
+::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
-/**
- * @return priority of this message. higher value, higher priority.
- */
-int get_priority() {
-return priority;
-}
+static db::consistency_level consistency_for_user(const sstring& username);
};
-} // namespace messages
-} // namespace streaming
+}

49
auth/permission.cc Normal file

@@ -0,0 +1,49 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "permission.hh"
const auth::permission_set auth::ALL_DATA = auth::permission_set::of<
        auth::permission::CREATE, auth::permission::ALTER,
        auth::permission::DROP, auth::permission::SELECT,
        auth::permission::MODIFY, auth::permission::AUTHORIZE>();
const auth::permission_set auth::ALL = auth::ALL_DATA;
const auth::permission_set auth::NONE;

81
auth/permission.hh Normal file

@@ -0,0 +1,81 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "enum_set.hh"
namespace auth {
enum class permission {
//Deprecated
READ,
//Deprecated
WRITE,
// schema management
CREATE, // required for CREATE KEYSPACE and CREATE TABLE.
ALTER, // required for ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX.
DROP, // required for DROP KEYSPACE and DROP TABLE.
// data access
SELECT, // required for SELECT.
MODIFY, // required for INSERT, UPDATE, DELETE, TRUNCATE.
// permission management
AUTHORIZE, // required for GRANT and REVOKE.
};
typedef enum_set<super_enum<permission,
permission::READ,
permission::WRITE,
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE>> permission_set;
extern const permission_set ALL_DATA;
extern const permission_set ALL;
extern const permission_set NONE;
}
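// Hedged usage sketch (assuming enum_set's of()/contains() interface):
#if 0
auto read_only = auth::permission_set::of<auth::permission::SELECT>();
assert(read_only.contains(auth::permission::SELECT));
assert(!read_only.contains(auth::permission::MODIFY));
assert(auth::ALL.contains(auth::permission::AUTHORIZE));
#endif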


@@ -22,6 +22,7 @@
#pragma once
#include "core/sstring.hh"
#include "hashing.hh"
#include <experimental/optional>
#include <iosfwd>
#include <functional>
@@ -57,3 +58,20 @@ std::ostream& operator<<(std::ostream& os, const bytes_view& b);
}
template<>
struct appending_hash<bytes> {
template<typename Hasher>
void operator()(Hasher& h, const bytes& v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.cbegin()), v.size() * sizeof(bytes::value_type));
}
};
template<>
struct appending_hash<bytes_view> {
template<typename Hasher>
void operator()(Hasher& h, bytes_view v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.begin()), v.size() * sizeof(bytes_view::value_type));
}
};
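// Hedged sketch of the Hasher contract these specializations rely on: any type
// with update(const char*, size_t) fits, and length-prefixing via feed_hash
// keeps adjacent variable-length fields from colliding. Toy (illustrative) hasher:
#if 0
struct fnv1a_hasher {
    uint64_t state = 1469598103934665603ull;
    void update(const char* data, size_t size) {
        while (size--) {
            state = (state ^ static_cast<unsigned char>(*data++)) * 1099511628211ull;
        }
    }
};
fnv1a_hasher h;
feed_hash(h, b); // for some bytes b; dispatches to appending_hash<bytes> above
#endif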


@@ -24,6 +24,7 @@
#include "types.hh"
#include "net/byteorder.hh"
#include "core/unaligned.hh"
#include "hashing.hh"
/**
* Utility for writing data into a buffer when its final size is not known up front.
@@ -205,6 +206,10 @@ public:
}
}
void write(const char* ptr, size_t size) {
write(bytes_view(reinterpret_cast<const signed char*>(ptr), size));
}
// Writes given sequence of bytes with a preceding length component encoded in big-endian format
inline void write_blob(bytes_view v) {
assert((size_type)v.size() == v.size());
@@ -332,3 +337,13 @@ public:
_current->offset = pos._offset;
}
};
template<>
struct appending_hash<bytes_ostream> {
template<typename Hasher>
void operator()(Hasher& h, const bytes_ostream& b) const {
for (auto&& frag : b.fragments()) {
feed_hash(h, frag);
}
}
};

103
canonical_mutation.cc Normal file

@@ -0,0 +1,103 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "canonical_mutation.hh"
#include "mutation.hh"
#include "mutation_partition_serializer.hh"
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
template class db::serializer<canonical_mutation>;
//
// Representation layout:
//
// <canonical_mutation> ::= <column_family_id> <table_schema_version> <partition_key> <column-mapping> <partition>
//
// For <partition> see mutation_partition_serializer.cc
// For <column-mapping> see db::serializer<column_mapping>
//
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
{ }
canonical_mutation::canonical_mutation(const mutation& m)
: _data([&m] {
bytes_ostream out;
db::serializer<utils::UUID>(m.column_family_id()).write(out);
db::serializer<table_schema_version>(m.schema()->version()).write(out);
db::serializer<partition_key_view>(m.key()).write(out);
db::serializer<column_mapping>(m.schema()->get_column_mapping()).write(out);
mutation_partition_serializer ser(*m.schema(), m.partition());
ser.write(out);
return to_bytes(out.linearize());
}())
{ }
utils::UUID canonical_mutation::column_family_id() const {
data_input in(_data);
return db::serializer<utils::UUID>::read(in);
}
mutation canonical_mutation::to_mutation(schema_ptr s) const {
data_input in(_data);
auto cf_id = db::serializer<utils::UUID>::read(in);
if (s->id() != cf_id) {
throw std::runtime_error(sprint("Attempted to deserialize canonical_mutation of table %s with schema of table %s (%s.%s)",
cf_id, s->id(), s->ks_name(), s->cf_name()));
}
auto version = db::serializer<table_schema_version>::read(in);
auto pk = partition_key(db::serializer<partition_key_view>::read(in));
mutation m(std::move(pk), std::move(s));
if (version == m.schema()->version()) {
db::serializer<column_mapping>::skip(in);
auto partition_view = mutation_partition_serializer::read_as_view(in);
m.partition().apply(*m.schema(), partition_view, *m.schema());
} else {
column_mapping cm = db::serializer<column_mapping>::read(in);
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
auto partition_view = mutation_partition_serializer::read_as_view(in);
partition_view.accept(cm, v);
}
return m;
}
template<>
db::serializer<canonical_mutation>::serializer(const canonical_mutation& v)
: _item(v)
, _size(db::serializer<bytes>(v._data).size())
{ }
template<>
void
db::serializer<canonical_mutation>::write(output& out, const canonical_mutation& v) {
db::serializer<bytes>(v._data).write(out);
}
template<>
canonical_mutation db::serializer<canonical_mutation>::read(input& in) {
return canonical_mutation(db::serializer<bytes>::read(in));
}

canonical_mutation.hh Normal file

@@ -0,0 +1,65 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "bytes.hh"
#include "schema.hh"
#include "database_fwd.hh"
#include "db/serializer.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
// Safe to pass serialized across nodes.
class canonical_mutation {
bytes _data;
canonical_mutation(bytes);
public:
explicit canonical_mutation(const mutation&);
canonical_mutation(canonical_mutation&&) = default;
canonical_mutation(const canonical_mutation&) = default;
canonical_mutation& operator=(const canonical_mutation&) = default;
canonical_mutation& operator=(canonical_mutation&&) = default;
// Create a mutation object interpreting this canonical mutation using
// given schema.
//
// Data which is not representable in the target schema is dropped. If this
// is not intended, the user should sync the schema first.
mutation to_mutation(schema_ptr) const;
utils::UUID column_family_id() const;
friend class db::serializer<canonical_mutation>;
};
namespace db {
template<> serializer<canonical_mutation>::serializer(const canonical_mutation&);
template<> void serializer<canonical_mutation>::write(output&, const canonical_mutation&);
template<> canonical_mutation serializer<canonical_mutation>::read(input&);
extern template class serializer<canonical_mutation>;
}
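
A hedged usage sketch of the class above (the helper name is illustrative; it uses only the constructor and to_mutation() declared in this header). The point of the design: the serialized form embeds the writer's column mapping, so the reader may be on any schema version of the same table:

// Sketch: carry a mutation across a schema-version boundary.
mutation reinterpret(const mutation& m, schema_ptr receiver_schema) {
    canonical_mutation cm(m);   // captures schema version + column mapping
    // to_mutation() throws if cm belongs to a different table; otherwise,
    // on a version match the column mapping is skipped, and on a mismatch
    // cells are converted and columns unknown to receiver_schema are dropped.
    return cm.to_mutation(std::move(receiver_schema));
}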


@@ -34,6 +34,8 @@ enum class compaction_strategy_type {
};
class compaction_strategy_impl;
class sstable;
struct compaction_descriptor;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -46,7 +48,9 @@ public:
compaction_strategy(compaction_strategy&&);
compaction_strategy& operator=(compaction_strategy&&);
future<> compact(column_family& cfs);
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
static sstring name(compaction_strategy_type type) {
switch (type) {
case compaction_strategy_type::null:

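A hedged sketch of the intended call pattern for the new method (strategy, cf and candidates are illustrative variables of the types in the signature; compaction_descriptor's members are not shown in this hunk, so it is treated as an opaque work order):

// The strategy inspects the candidates and returns the subset worth
// compacting now (possibly none).
compaction_descriptor desc = strategy.get_sstables_for_compaction(cf, std::move(candidates));
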

@@ -169,6 +169,17 @@ rpc_address: localhost
# port for Thrift to listen for clients on
rpc_port: 9160
# port for REST API server
api_port: 10000
# IP for the REST API server
api_address: 127.0.0.1
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
###################################################
## Not currently supported, reserved for future use
###################################################
@@ -205,7 +216,7 @@ rpc_port: 9160
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Authentication backend, implementing IAuthenticator; used to identify users
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
@@ -599,10 +610,6 @@ commitlog_total_space_in_mb: -1
# column_index_size_in_kb: 64
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
# batch_size_warn_threshold_in_kb: 5
# Number of simultaneous compactions to allow, NOT including
# validation "compactions" for anti-entropy repair. Simultaneous
# compactions can help preserve read performance in a mixed read/write


@@ -50,6 +50,9 @@ def apply_tristate(var, test, note, missing):
return False
return False
def have_pkg(package):
return subprocess.call(['pkg-config', package]) == 0
def pkg_config(option, package):
output = subprocess.check_output(['pkg-config', option, package])
return output.decode('utf-8').strip()
@@ -134,6 +137,7 @@ modes = {
scylla_tests = [
'tests/mutation_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
'tests/keys_test',
@@ -151,6 +155,7 @@ scylla_tests = [
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/key_reader_test',
'tests/mutation_query_test',
@@ -184,6 +189,7 @@ scylla_tests = [
'tests/crc_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
]
apps = [
@@ -222,6 +228,8 @@ arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_t
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0) compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
help = 'Python3 path')
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
add_tristate(arg_parser, name = 'xen', dest = 'xen', help = 'Xen support')
args = arg_parser.parse_args()
@@ -235,11 +243,15 @@ cassandra_interface = Thrift(source = 'interface/cassandra.thrift', service = 'C
scylla_core = (['database.cc',
'schema.cc',
'frozen_schema.cc',
'schema_registry.cc',
'bytes.cc',
'mutation.cc',
'row_cache.cc',
'canonical_mutation.cc',
'frozen_mutation.cc',
'memtable.cc',
'schema_mutations.cc',
'release.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
@@ -257,6 +269,7 @@ scylla_core = (['database.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_manager.cc',
'log.cc',
'transport/event.cc',
'transport/event_notifier.cc',
@@ -304,6 +317,7 @@ scylla_core = (['database.cc',
'utils/big_decimal.cc',
'types.cc',
'validation.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/storage_proxy.cc',
'cql3/operator.cc',
@@ -341,7 +355,6 @@ scylla_core = (['database.cc',
'utils/bloom_filter.cc',
'utils/bloom_calculations.cc',
'utils/rate_limiter.cc',
'utils/compaction_manager.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'gms/version_generator.cc',
@@ -375,13 +388,13 @@ scylla_core = (['database.cc',
'locator/ec2_snitch.cc',
'locator/ec2_multi_region_snitch.cc',
'message/messaging_service.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'service/pending_range_calculator_service.cc',
'service/load_broadcaster.cc',
'service/pager/paging_state.cc',
'service/pager/query_pagers.cc',
'streaming/streaming.cc',
'streaming/stream_task.cc',
'streaming/stream_session.cc',
'streaming/stream_request.cc',
@@ -394,13 +407,6 @@ scylla_core = (['database.cc',
'streaming/stream_coordinator.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/messages/stream_init_message.cc',
'streaming/messages/retry_message.cc',
'streaming/messages/received_message.cc',
'streaming/messages/prepare_message.cc',
'streaming/messages/file_message_header.cc',
'streaming/messages/outgoing_file_message.cc',
'streaming/messages/incoming_file_message.cc',
'streaming/stream_session_state.cc',
'gc_clock.cc',
'partition_slice_builder.cc',
@@ -408,6 +414,12 @@ scylla_core = (['database.cc',
'repair/repair.cc',
'exceptions/exceptions.cc',
'dns.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -447,7 +459,21 @@ api = ['api/api.cc',
'api/system.cc'
]
scylla_tests_dependencies = scylla_core + [
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
'idl/keys.idl.hh',
'idl/read_command.idl.hh',
'idl/token.idl.hh',
'idl/ring_position.idl.hh',
'idl/result.idl.hh',
'idl/frozen_mutation.idl.hh',
'idl/reconcilable_result.idl.hh',
'idl/streaming.idl.hh',
'idl/paging_state.idl.hh',
]
scylla_tests_dependencies = scylla_core + api + idls + [
'tests/cql_test_env.cc',
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
@@ -460,15 +486,15 @@ scylla_tests_seastar_deps = [
]
deps = {
'scylla': ['main.cc'] + scylla_core + api,
'scylla': idls + ['main.cc'] + scylla_core + api,
}
tests_not_using_seastar_test_framework = set([
'tests/types_test',
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
'tests/frozen_mutation_test',
'tests/canonical_mutation_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -530,6 +556,32 @@ else:
args.pie = ''
args.fpie = ''
# a list element means a list of alternative packages to consider
# the first element becomes the HAVE_pkg define
# a string element is a package name with no alternatives
optional_packages = [['libsystemd', 'libsystemd-daemon']]
pkgs = []
def setup_first_pkg_of_list(pkglist):
# The HAVE_pkg symbol is taken from the first alternative
upkg = pkglist[0].upper().replace('-', '_')
for pkg in pkglist:
if have_pkg(pkg):
pkgs.append(pkg)
defines.append('HAVE_{}=1'.format(upkg))
return True
return False
for pkglist in optional_packages:
if isinstance(pkglist, str):
pkglist = [pkglist]
if not setup_first_pkg_of_list(pkglist):
if len(pkglist) == 1:
print('Missing optional package {pkglist[0]}'.format(**locals()))
else:
alternatives = ':'.join(pkglist[1:])
print('Missing optional package {pkglist[0]} (or alternatives {alternatives})'.format(**locals()))
defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
@@ -562,7 +614,7 @@ elif args.dpdk_target:
seastar_cflags = args.user_cflags + " -march=nehalem"
seastar_flags += ['--compiler', args.cxx, '--cflags=%s' % (seastar_cflags)]
status = subprocess.call(['./configure.py'] + seastar_flags, cwd = 'seastar')
status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')
if status != 0:
print('Seastar configuration failed')
@@ -591,7 +643,10 @@ for mode in build_modes:
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem'
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem' + ' -lcrypt'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
user_cflags = args.user_cflags
user_ldflags = args.user_ldflags
if args.staticcxx:
@@ -623,6 +678,9 @@ with open(buildfile, 'w') as f:
rule swagger
command = seastar/json/json2code.py -f $in -o $out
description = SWAGGER $out
rule serializer
command = ./idl-compiler.py --ns ser -f $in -o $out
description = IDL compiler $out
rule ninja
command = {ninja} -C $subdir $target
restat = 1
@@ -659,6 +717,7 @@ with open(buildfile, 'w') as f:
compiles = {}
ragels = {}
swaggers = {}
serializers = {}
thrifts = set()
antlr3_grammars = set()
for binary in build_artifacts:
@@ -712,6 +771,9 @@ with open(buildfile, 'w') as f:
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.idl.hh'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
serializers[hh] = src
elif src.endswith('.json'):
hh = '$builddir/' + mode + '/gen/' + src + '.hh'
swaggers[hh] = src
@@ -730,6 +792,7 @@ with open(buildfile, 'w') as f:
for g in antlr3_grammars:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
f.write('build {}: cxx.{} {} || {} \n'.format(obj, mode, src, ' '.join(gen_headers)))
if src in extra_cxxflags:
f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode = mode, extra_cxxflags = extra_cxxflags[src], **modeval))
@@ -739,6 +802,9 @@ with open(buildfile, 'w') as f:
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {}\n'.format(hh,src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh,src))
for thrift in thrifts:
outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))
@@ -758,7 +824,7 @@ with open(buildfile, 'w') as f:
f.write('build {}: phony\n'.format(seastar_deps))
f.write(textwrap.dedent('''\
rule configure
command = python3 configure.py $configure_args
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py
rule cscope

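Concretely, the new serializer rule and the .idl.hh branch above mean that, for example, idl/paging_state.idl.hh is run through ./idl-compiler.py --ns ser and lands in $builddir/<mode>/gen/idl/paging_state.dist.hh; the generated headers are then appended to gen_headers, so every object file order-depends on them and the IDL compiler runs before any compilation that might include its output.
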

@@ -0,0 +1,119 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "mutation_partition_view.hh"
#include "schema.hh"
// Mutation partition visitor which applies the visited data to an existing
// mutation_partition. The visited data may be of a different schema.
// Data which is not representable in the new schema is dropped.
// Weak exception guarantees.
class converting_mutation_partition_applier : public mutation_partition_visitor {
const schema& _p_schema;
mutation_partition& _p;
const column_mapping& _visited_column_mapping;
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return new_def.kind == kind && new_def.type->is_value_compatible_with(*old_type);
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
dst.apply(new_def, atomic_cell_or_collection(cell));
}
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = ctype->deserialize_mutation_form(cell);
collection_type_impl::mutation_view new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(std::move(c));
}
}
dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
}
public:
converting_mutation_partition_applier(
const column_mapping& visited_column_mapping,
const schema& target_schema,
mutation_partition& target)
: _p_schema(target_schema)
, _p(target)
, _visited_column_mapping(visited_column_mapping)
{ }
virtual void accept_partition_tombstone(tombstone t) override {
_p.apply(t);
}
virtual void accept_static_cell(column_id id, atomic_cell_view cell) override {
const column_mapping::column& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
}
}
virtual void accept_static_cell(column_id id, collection_mutation_view collection) override {
const column_mapping::column& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
}
}
virtual void accept_row_tombstone(clustering_key_prefix_view prefix, tombstone t) override {
_p.apply_row_tombstone(_p_schema, prefix, t);
}
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {
deletable_row& r = _p.clustered_row(_p_schema, key);
r.apply(rm);
r.apply(deleted_at);
_current_row = &r;
}
virtual void accept_row_cell(column_id id, atomic_cell_view cell) override {
const column_mapping::column& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
}
}
virtual void accept_row_cell(column_id id, collection_mutation_view collection) override {
const column_mapping::column& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
};
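
A hedged sketch of how this visitor is driven, mirroring the else branch of canonical_mutation::to_mutation() earlier in this series (the free-function wrapper is illustrative):

// Replay a partition recorded under an old column mapping into a
// partition owned by the current schema.
void apply_converted(const column_mapping& old_mapping,
                     mutation_partition_view view,
                     const schema& new_schema,
                     mutation_partition& target) {
    converting_mutation_partition_applier v(old_mapping, new_schema, target);
    view.accept(old_mapping, v);  // incompatible or dropped cells are skipped
}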


@@ -99,9 +99,9 @@ query_options::query_options(query_options&& o, std::vector<std::vector<bytes_vi
_batch_options = std::move(tmp);
}
query_options::query_options(std::vector<bytes_opt> values)
query_options::query_options(db::consistency_level cl, std::vector<bytes_opt> values)
: query_options(
db::consistency_level::ONE,
cl,
{},
std::move(values),
{},
@@ -120,6 +120,11 @@ query_options::query_options(std::vector<bytes_opt> values)
}
}
query_options::query_options(std::vector<bytes_opt> values)
: query_options(
db::consistency_level::ONE, std::move(values))
{}
db::consistency_level query_options::get_consistency() const
{
return _consistency;

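The net effect of this refactoring, as a sketch (the variables are illustrative): the old constructor keeps its behaviour by delegating with consistency ONE, while internal callers can now pick the level explicitly:

std::vector<bytes_opt> a_vals, b_vals;
query_options defaulted(std::move(a_vals));                                // CL = ONE
query_options at_quorum(db::consistency_level::QUORUM, std::move(b_vals)); // explicit CL
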

@@ -112,6 +112,7 @@ public:
// forInternalUse
explicit query_options(std::vector<bytes_opt> values);
explicit query_options(db::consistency_level, std::vector<bytes_opt> values);
db::consistency_level get_consistency() const;
bytes_view_opt get_value_at(size_t idx) const;


@@ -109,6 +109,7 @@ future<> query_processor::stop()
future<::shared_ptr<result_message>>
query_processor::process(const sstring_view& query_string, service::query_state& query_state, query_options& options)
{
log.trace("process: \"{}\"", query_string);
auto p = get_statement(query_string, query_state.get_client_state());
options.prepare(p->bound_names);
auto cql_statement = p->statement;
@@ -299,8 +300,9 @@ query_processor::parse_statement(const sstring_view& query)
}
query_options query_processor::make_internal_options(
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values) {
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values,
db::consistency_level cl) {
if (p->bound_names.size() != values.size()) {
throw std::invalid_argument(sprint("Invalid number of values. Expecting %d but got %d", p->bound_names.size(), values.size()));
}
@@ -316,13 +318,12 @@ query_options query_processor::make_internal_options(
bound_values.push_back({n->type->decompose(v)});
}
}
return query_options(bound_values);
return query_options(cl, bound_values);
}
::shared_ptr<statements::parsed_statement::prepared> query_processor::prepare_internal(
const std::experimental::string_view& query_string) {
auto& p = _internal_statements[sstring(query_string.begin(), query_string.end())];
const sstring& query_string) {
auto& p = _internal_statements[query_string];
if (p == nullptr) {
auto np = parse_statement(query_string)->prepare(_db.local());
np->statement->validate(_proxy, *_internal_state);
@@ -332,22 +333,54 @@ query_options query_processor::make_internal_options(
}
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
const std::experimental::string_view& query_string,
const sstring& query_string,
const std::initializer_list<data_value>& values) {
if (log.is_enabled(logging::log_level::trace)) {
log.trace("execute_internal: \"{}\" ({})", query_string, ::join(", ", values));
}
auto p = prepare_internal(query_string);
return execute_internal(p, values);
}
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values) {
auto opts = make_internal_options(p, values);
return do_with(std::move(opts),
[this, p = std::move(p)](query_options & opts) {
return p->statement->execute_internal(_proxy, *_internal_state, opts).then(
[](::shared_ptr<transport::messages::result_message> msg) {
[p](::shared_ptr<transport::messages::result_message> msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});
}
future<::shared_ptr<untyped_result_set>> query_processor::process(
const sstring& query_string,
db::consistency_level cl, const std::initializer_list<data_value>& values, bool cache)
{
auto p = cache ? prepare_internal(query_string) : parse_statement(query_string)->prepare(_db.local());
if (!cache) {
p->statement->validate(_proxy, *_internal_state);
}
return process(p, cl, values);
}
future<::shared_ptr<untyped_result_set>> query_processor::process(
::shared_ptr<statements::parsed_statement::prepared> p,
db::consistency_level cl, const std::initializer_list<data_value>& values)
{
auto opts = make_internal_options(p, values, cl);
return do_with(std::move(opts),
[this, p = std::move(p)](query_options & opts) {
return p->statement->execute(_proxy, *_internal_state, opts).then(
[p](::shared_ptr<transport::messages::result_message> msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});
}
future<::shared_ptr<transport::messages::result_message>>
query_processor::process_batch(::shared_ptr<statements::batch_statement> batch, service::query_state& query_state, query_options& options) {
auto& client_state = query_state.get_client_state();
@@ -388,8 +421,12 @@ void query_processor::migration_subscriber::on_update_keyspace(const sstring& ks
{
}
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name)
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed)
{
if (columns_changed) {
log.info("Column definitions for {}.{} changed, invalidating related prepared statements", ks_name, cf_name);
remove_invalid_prepared_statements(ks_name, cf_name);
}
}
void query_processor::migration_subscriber::on_update_user_type(const sstring& ks_name, const sstring& type_name)
@@ -439,9 +476,7 @@ void query_processor::migration_subscriber::remove_invalid_prepared_statements(s
}
}
for (auto& id : invalid) {
get_query_processor().invoke_on_all([id] (auto& qp) {
qp.invalidate_prepared_statement(id);
});
_qp->invalidate_prepared_statement(id);
}
}
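
A hedged usage sketch of the new process() overload declared below in query_processor.hh (the keyspace/table names and the data_value construction are illustrative):

qp.process("SELECT v FROM ks.tbl WHERE k = ?",
           db::consistency_level::QUORUM,
           { data_value(sstring("key1")) },
           true /* cache the prepared statement for reuse */)
  .then([] (::shared_ptr<untyped_result_set> rs) {
      // consume rows from rs; unlike execute_internal(), this path honors
      // the requested consistency level via statement->execute().
  });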


@@ -322,14 +322,25 @@ public:
}
#endif
private:
::shared_ptr<statements::parsed_statement::prepared> prepare_internal(const std::experimental::string_view& query);
query_options make_internal_options(::shared_ptr<statements::parsed_statement::prepared>, const std::initializer_list<data_value>&);
query_options make_internal_options(::shared_ptr<statements::parsed_statement::prepared>, const std::initializer_list<data_value>&, db::consistency_level = db::consistency_level::ONE);
public:
future<::shared_ptr<untyped_result_set>> execute_internal(
const std::experimental::string_view& query_string,
const sstring& query_string,
const std::initializer_list<data_value>& = { });
::shared_ptr<statements::parsed_statement::prepared> prepare_internal(const sstring& query);
future<::shared_ptr<untyped_result_set>> execute_internal(
::shared_ptr<statements::parsed_statement::prepared>,
const std::initializer_list<data_value>& = { });
future<::shared_ptr<untyped_result_set>> process(
const sstring& query_string,
db::consistency_level, const std::initializer_list<data_value>& = { }, bool cache = false);
future<::shared_ptr<untyped_result_set>> process(
::shared_ptr<statements::parsed_statement::prepared>,
db::consistency_level, const std::initializer_list<data_value>& = { });
/*
* This function provides a timestamp that is guaranteed to be higher than any timestamp
* previously used in internal queries.
@@ -486,7 +497,7 @@ public:
virtual void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;
virtual void on_update_keyspace(const sstring& ks_name) override;
virtual void on_update_column_family(const sstring& ks_name, const sstring& cf_name) override;
virtual void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override;
virtual void on_update_user_type(const sstring& ks_name, const sstring& type_name) override;
virtual void on_update_function(const sstring& ks_name, const sstring& function_name) override;
virtual void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;


@@ -42,6 +42,7 @@
#include "cql3/statements/alter_table_statement.hh"
#include "service/migration_manager.hh"
#include "validation.hh"
#include "db/config.hh"
namespace cql3 {
@@ -77,216 +78,199 @@ void alter_table_statement::validate(distributed<service::storage_proxy>& proxy,
// validated in announce_migration()
}
static const sstring ALTER_TABLE_FEATURE = "ALTER TABLE";
future<bool> alter_table_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
{
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
CFMetaData meta = validateColumnFamily(keyspace(), columnFamily());
CFMetaData cfm = meta.copy();
auto& db = proxy.local().get_db().local();
db.get_config().check_experimental(ALTER_TABLE_FEATURE);
CQL3Type validator = this.validator == null ? null : this.validator.prepare(keyspace());
ColumnIdentifier columnName = null;
ColumnDefinition def = null;
if (rawColumnName != null)
{
columnName = rawColumnName.prepare(cfm);
def = cfm.getColumnDefinition(columnName);
auto schema = validation::validate_column_family(db, keyspace(), column_family());
auto cfm = schema_builder(schema);
shared_ptr<cql3_type> validator;
if (_validator) {
validator = _validator->prepare(db, keyspace());
}
shared_ptr<column_identifier> column_name;
const column_definition* def = nullptr;
if (_raw_column_name) {
column_name = _raw_column_name->prepare_column_identifier(schema);
def = get_column_definition(schema, *column_name);
}
switch (oType)
switch (_type) {
case alter_table_statement::type::add:
{
case ADD:
assert columnName != null;
if (cfm.comparator.isDense())
throw new InvalidRequestException("Cannot add new column to a COMPACT STORAGE table");
assert(column_name);
if (schema->is_dense()) {
throw exceptions::invalid_request_exception("Cannot add new column to a COMPACT STORAGE table");
}
if (isStatic)
{
if (!cfm.comparator.isCompound())
throw new InvalidRequestException("Static columns are not allowed in COMPACT STORAGE tables");
if (cfm.clusteringColumns().isEmpty())
throw new InvalidRequestException("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
if (_is_static) {
if (!schema->is_compound()) {
throw exceptions::invalid_request_exception("Static columns are not allowed in COMPACT STORAGE tables");
}
if (!schema->clustering_key_size()) {
throw exceptions::invalid_request_exception("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
}
}
if (def) {
if (def->is_partition_key()) {
throw exceptions::invalid_request_exception(sprint("Invalid column name %s because it conflicts with a PRIMARY KEY part", column_name));
} else {
throw exceptions::invalid_request_exception(sprint("Invalid column name %s because it conflicts with an existing column", column_name));
}
}
// Cannot re-add a dropped counter column. See #7831.
if (schema->is_counter() && schema->dropped_columns().count(column_name->text())) {
throw exceptions::invalid_request_exception(sprint("Cannot re-add previously dropped counter column %s", column_name));
}
auto type = validator->get_type();
if (type->is_collection() && type->is_multi_cell()) {
if (!schema->is_compound()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with a non-composite PRIMARY KEY");
}
if (schema->is_super()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with super column families");
}
if (def != null)
{
switch (def.kind)
{
case PARTITION_KEY:
case CLUSTERING_COLUMN:
throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with a PRIMARY KEY part", columnName));
default:
throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with an existing column", columnName));
auto it = schema->collections().find(column_name->name());
if (it != schema->collections().end() && !type->is_compatible_with(*it->second)) {
throw exceptions::invalid_request_exception(sprint("Cannot add a collection with the name %s "
"because a collection with the same name and a different type has already been used in the past", column_name));
}
}
cfm.with_column(column_name->name(), type, _is_static ? column_kind::static_column : column_kind::regular_column);
break;
}
case alter_table_statement::type::alter:
{
assert(column_name);
if (!def) {
throw exceptions::invalid_request_exception(sprint("Column %s was not found in table %s", column_name, column_family()));
}
auto type = validator->get_type();
switch (def->kind) {
case column_kind::partition_key:
if (type->is_counter()) {
throw exceptions::invalid_request_exception(sprint("counter type is not supported for PRIMARY KEY part %s", column_name));
}
if (!type->is_value_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are incompatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
case column_kind::clustering_key:
if (!schema->is_cql3_table()) {
throw exceptions::invalid_request_exception(sprint("Cannot alter clustering column %s in a non-CQL3 table", column_name));
}
// Note that CFMetaData.validateCompatibility already validates the change we're about to do. However, the error message it
// sends is a bit cryptic for a CQL3 user, so we validate here for the sake of returning a better error message.
// Do note that we need isCompatibleWith here, not just isValueCompatibleWith.
if (!type->is_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are not order-compatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
case column_kind::compact_column:
case column_kind::regular_column:
case column_kind::static_column:
// Thrift allows changing a column validator, so CFMetaData.validateCompatibility will let it slide
// if we change to an incompatible type (contrary to the comparator case). But we don't want to
// allow it for CQL3 (see #5882), so we validate it explicitly here. We only care about value
// compatibility, though, since we won't compare values (except when there is an index, but that is
// validated by ColumnDefinition already).
if (!type->is_value_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are incompatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
}
// In any case, we update the column definition
cfm.with_altered_column_type(column_name->name(), type);
break;
}
case alter_table_statement::type::drop:
assert(column_name);
if (!schema->is_cql3_table()) {
throw exceptions::invalid_request_exception("Cannot drop columns from a non-CQL3 table");
}
if (!def) {
throw exceptions::invalid_request_exception(sprint("Column %s was not found in table %s", column_name, column_family()));
}
if (def->is_primary_key()) {
throw exceptions::invalid_request_exception(sprint("Cannot drop PRIMARY KEY part %s", column_name));
} else {
for (auto&& column_def : boost::range::join(schema->static_columns(), schema->regular_columns())) { // find
if (column_def.name() == column_name->name()) {
cfm.without_column(column_name->name());
break;
}
}
}
break;
// Cannot re-add a dropped counter column. See #7831.
if (meta.isCounter() && meta.getDroppedColumns().containsKey(columnName))
throw new InvalidRequestException(String.format("Cannot re-add previously dropped counter column %s", columnName));
case alter_table_statement::type::opts:
if (!_properties) {
throw exceptions::invalid_request_exception("ALTER COLUMNFAMILY WITH invoked, but no parameters found");
}
AbstractType<?> type = validator.getType();
if (type.isCollection() && type.isMultiCell())
{
if (!cfm.comparator.supportCollections())
throw new InvalidRequestException("Cannot use non-frozen collections with a non-composite PRIMARY KEY");
if (cfm.isSuper())
throw new InvalidRequestException("Cannot use non-frozen collections with super column families");
_properties->validate();
// If there used to be a collection column with the same name (that has been dropped), it will
// still appear in the ColumnToCollectionType for reasons explained in #6276. The same
// reasons mean that we can't allow adding a new collection with that name (see the ticket for details).
if (cfm.comparator.hasCollections())
{
CollectionType previous = cfm.comparator.collectionType() == null ? null : cfm.comparator.collectionType().defined.get(columnName.bytes);
if (previous != null && !type.isCompatibleWith(previous))
throw new InvalidRequestException(String.format("Cannot add a collection with the name %s " +
"because a collection with the same name and a different type has already been used in the past", columnName));
}
if (schema->is_counter() && _properties->get_default_time_to_live() > 0) {
throw exceptions::invalid_request_exception("Cannot set default_time_to_live on a table with counters");
}
cfm.comparator = cfm.comparator.addOrUpdateCollection(columnName, (CollectionType)type);
_properties->apply_to_builder(cfm);
break;
case alter_table_statement::type::rename:
for (auto&& entry : _renames) {
auto from = entry.first->prepare_column_identifier(schema);
auto to = entry.second->prepare_column_identifier(schema);
auto def = schema->get_column_definition(from->name());
if (!def) {
throw exceptions::invalid_request_exception(sprint("Cannot rename unknown column %s in table %s", from, column_family()));
}
Integer componentIndex = cfm.comparator.isCompound() ? cfm.comparator.clusteringPrefixSize() : null;
cfm.addColumnDefinition(isStatic
? ColumnDefinition.staticDef(cfm, columnName.bytes, type, componentIndex)
: ColumnDefinition.regularDef(cfm, columnName.bytes, type, componentIndex));
break;
case ALTER:
assert columnName != null;
if (def == null)
throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
AbstractType<?> validatorType = validator.getType();
switch (def.kind)
{
case PARTITION_KEY:
if (validatorType instanceof CounterColumnType)
throw new InvalidRequestException(String.format("counter type is not supported for PRIMARY KEY part %s", columnName));
if (cfm.getKeyValidator() instanceof CompositeType)
{
List<AbstractType<?>> oldTypes = ((CompositeType) cfm.getKeyValidator()).types;
if (!validatorType.isValueCompatibleWith(oldTypes.get(def.position())))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
oldTypes.get(def.position()).asCQL3Type(),
validator));
List<AbstractType<?>> newTypes = new ArrayList<AbstractType<?>>(oldTypes);
newTypes.set(def.position(), validatorType);
cfm.keyValidator(CompositeType.getInstance(newTypes));
}
else
{
if (!validatorType.isValueCompatibleWith(cfm.getKeyValidator()))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
cfm.getKeyValidator().asCQL3Type(),
validator));
cfm.keyValidator(validatorType);
}
break;
case CLUSTERING_COLUMN:
if (!cfm.isCQL3Table())
throw new InvalidRequestException(String.format("Cannot alter clustering column %s in a non-CQL3 table", columnName));
AbstractType<?> oldType = cfm.comparator.subtype(def.position());
// Note that CFMetaData.validateCompatibility already validate the change we're about to do. However, the error message it
// sends is a bit cryptic for a CQL3 user, so validating here for a sake of returning a better error message
// Do note that we need isCompatibleWith here, not just isValueCompatibleWith.
if (!validatorType.isCompatibleWith(oldType))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are not order-compatible.",
columnName,
oldType.asCQL3Type(),
validator));
cfm.comparator = cfm.comparator.setSubtype(def.position(), validatorType);
break;
case COMPACT_VALUE:
// See below
if (!validatorType.isValueCompatibleWith(cfm.getDefaultValidator()))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
cfm.getDefaultValidator().asCQL3Type(),
validator));
cfm.defaultValidator(validatorType);
break;
case REGULAR:
case STATIC:
// Thrift allows to change a column validator so CFMetaData.validateCompatibility will let it slide
// if we change to an incompatible type (contrarily to the comparator case). But we don't want to
// allow it for CQL3 (see #5882) so validating it explicitly here. We only care about value compatibility
// though since we won't compare values (except when there is an index, but that is validated by
// ColumnDefinition already).
if (!validatorType.isValueCompatibleWith(def.type))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
def.type.asCQL3Type(),
validator));
// For collections, if we alter the type, we need to update the comparator too since it includes
// the type too (note that isValueCompatibleWith above has validated that the new type doesn't
// change the underlying sorting order, but we still don't want to have a discrepancy between the type
// in the comparator and the one in the ColumnDefinition as that would be dodgy).
if (validatorType.isCollection() && validatorType.isMultiCell())
cfm.comparator = cfm.comparator.addOrUpdateCollection(def.name, (CollectionType)validatorType);
break;
if (schema->get_column_definition(to->name())) {
throw exceptions::invalid_request_exception(sprint("Cannot rename column %s to %s in table %s; another column of that name already exist", from, to, column_family()));
}
// In any case, we update the column definition
cfm.addOrReplaceColumnDefinition(def.withNewType(validatorType));
break;
case DROP:
assert columnName != null;
if (!cfm.isCQL3Table())
throw new InvalidRequestException("Cannot drop columns from a non-CQL3 table");
if (def == null)
throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
switch (def.kind)
{
case PARTITION_KEY:
case CLUSTERING_COLUMN:
throw new InvalidRequestException(String.format("Cannot drop PRIMARY KEY part %s", columnName));
case REGULAR:
case STATIC:
ColumnDefinition toDelete = null;
for (ColumnDefinition columnDef : cfm.regularAndStaticColumns())
{
if (columnDef.name.equals(columnName))
toDelete = columnDef;
}
assert toDelete != null;
cfm.removeColumnDefinition(toDelete);
cfm.recordColumnDrop(toDelete);
break;
if (def->is_part_of_cell_name()) {
throw exceptions::invalid_request_exception(sprint("Cannot rename non PRIMARY KEY part %s", from));
}
break;
case OPTS:
if (cfProps == null)
throw new InvalidRequestException(String.format("ALTER COLUMNFAMILY WITH invoked, but no parameters found"));
cfProps.validate();
if (meta.isCounter() && cfProps.getDefaultTimeToLive() > 0)
throw new InvalidRequestException("Cannot set default_time_to_live on a table with counters");
cfProps.applyToCFMetadata(cfm);
break;
case RENAME:
for (Map.Entry<ColumnIdentifier.Raw, ColumnIdentifier.Raw> entry : renames.entrySet())
{
ColumnIdentifier from = entry.getKey().prepare(cfm);
ColumnIdentifier to = entry.getValue().prepare(cfm);
cfm.renameColumn(from, to);
if (def->is_indexed()) {
throw exceptions::invalid_request_exception(sprint("Cannot rename column %s because it is secondary indexed", from));
}
break;
cfm.with_column_rename(from->name(), to->name());
}
break;
}
MigrationManager.announceColumnFamilyUpdate(cfm, false, isLocalOnly);
return true;
#endif
return service::get_local_migration_manager().announce_column_family_update(cfm.build(), false, is_local_only).then([] {
return true;
});
}
shared_ptr<transport::event::schema_change> alter_table_statement::change_event()

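For example, with the checks ported above, ALTER TABLE ... ADD of a non-frozen collection on a table whose primary key is not compound is now rejected with "Cannot use non-frozen collections with a non-composite PRIMARY KEY", and re-adding a previously dropped counter column fails per #7831, matching the removed Java paths.
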

@@ -38,6 +38,7 @@
*/
#include "batch_statement.hh"
#include "db/config.hh"
namespace cql3 {
@@ -55,6 +56,50 @@ bool batch_statement::depends_on_column_family(const sstring& cf_name) const
return false;
}
void batch_statement::verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = service::get_local_storage_proxy().get_db().local().get_config().batch_size_warn_threshold_in_kb();
class my_partition_visitor : public mutation_partition_visitor {
public:
void accept_partition_tombstone(tombstone) override {}
void accept_static_cell(column_id, atomic_cell_view v) override {
size += v.value().size();
}
void accept_static_cell(column_id, collection_mutation_view v) override {
size += v.data.size();
}
void accept_row_tombstone(clustering_key_prefix_view, tombstone) override {}
void accept_row(clustering_key_view, tombstone, const row_marker&) override {}
void accept_row_cell(column_id, atomic_cell_view v) override {
size += v.value().size();
}
void accept_row_cell(column_id, collection_mutation_view v) override {
size += v.data.size();
}
size_t size = 0;
};
my_partition_visitor v;
for (auto& m : mutations) {
m.partition().accept(*m.schema(), v);
}
auto size = v.size / 1024;
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
_logger.warn(
"Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}",
join(", ", ks_cf_pairs), size, warn_threshold,
size - warn_threshold, "");
}
}
}
}
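
Worked example: if a batch's cell values total 8192 bytes, size = v.size / 1024 = 8 KB; with the default batch_size_warn_threshold_in_kb of 5 (see the scylla.yaml hunk above), the warning reports the batch as exceeding the threshold by 3.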


@@ -196,27 +196,8 @@ public:
* Checks batch size to ensure threshold is met. If not, a warning is logged.
* @param cfs ColumnFamilies that will store the batch's mutations.
*/
static void verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = 1000; // FIXME: database_descriptor::get_batch_size_warn_threshold();
size_t fail_threshold = 2000; // FIXME: database_descriptor::get_batch_size_fail_threshold();
static void verify_batch_size(const std::vector<mutation>& mutations);
size_t size = mutations.size();
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
const char* format = "Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}";
if (size > fail_threshold) {
// FIXME: Tracing.trace(format, new Object[] {ksCfPairs, size, failThreshold, size - failThreshold, " (see batch_size_fail_threshold_in_kb)"});
_logger.error(format, join(", ", ks_cf_pairs), size, fail_threshold, size - fail_threshold, " (see batch_size_fail_threshold_in_kb)");
throw exceptions::invalid_request_exception("Batch too large");
} else {
_logger.warn(format, join(", ", ks_cf_pairs), size, warn_threshold, size - warn_threshold, "");
}
}
}
virtual future<shared_ptr<transport::messages::result_message>> execute(
distributed<service::storage_proxy>& storage, service::query_state& state, const query_options& options) override {
return execute(storage, state, options, false, options.get_timestamp(state));


@@ -81,7 +81,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
auto cd = schema->get_column_definition(target->column->name());
if (cd == nullptr) {
throw exceptions::invalid_request_exception(sprint("No column definition found for column %s", target->column->name()));
throw exceptions::invalid_request_exception(sprint("No column definition found for column %s", *target->column));
}
bool is_map = dynamic_cast<const collection_type_impl *>(cd->type.get()) != nullptr
@@ -93,7 +93,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
throw exceptions::invalid_request_exception(
sprint("Cannot create index on %s of frozen<map> column %s",
index_target::index_option(target->type),
target->column->name()));
*target->column));
}
} else {
// validateNotFullIndex
@@ -107,7 +107,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
sprint(
"Cannot create index on %s of column %s; only non-frozen collections support %s indexes",
index_target::index_option(target->type),
target->column->name(),
*target->column,
index_target::index_option(target->type)));
}
// validateTargetColumnIsMapIfIndexInvolvesKeys
@@ -118,7 +118,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
sprint(
"Cannot create index on %s of column %s with non-map type",
index_target::index_option(target->type),
target->column->name()));
*target->column));
}
}
@@ -132,9 +132,9 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
"Cannot create index on %s(%s): an index on %s(%s) already exists and indexing "
"a map on more than one dimension at the same time is not currently supported",
index_target::index_option(target->type),
target->column->name(),
*target->column,
index_target::index_option(prev_type),
target->column->name()));
*target->column));
}
if (_if_not_exists) {
return;
@@ -164,12 +164,13 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
throw exceptions::invalid_request_exception(
sprint(
"Cannot create secondary index on partition key column %s",
target->column->name()));
*target->column));
}
}
future<bool>
cql3::statements::create_index_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) {
throw std::runtime_error("Indexes are not supported yet");
auto schema = proxy.local().get_db().local().find_schema(keyspace(), column_family());
auto target = _raw_target->prepare(schema);


@@ -270,7 +270,7 @@ modification_statement::read_required_rows(
for (auto&& pk : *keys) {
pr.emplace_back(dht::global_partitioner().decorate_key(*s, pk));
}
query::read_command cmd(s->id(), ps, std::numeric_limits<uint32_t>::max());
query::read_command cmd(s->id(), s->version(), ps, std::numeric_limits<uint32_t>::max());
// FIXME: ignoring "local"
return proxy.local().query(s, make_lw_shared(std::move(cmd)), std::move(pr), cl).then([this, ps] (auto result) {
// FIXME: copying


@@ -218,22 +218,24 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), make_partition_slice(options), limit, to_gc_clock(now));
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit, to_gc_clock(now));
int32_t page_size = options.get_page_size();
// An aggregation query will never be paged for the user, but we always page it internally to avoid OOM.
// If the user provided a page_size, we'll use that to page internally (because why not); otherwise we use our default
// Note that if there are some nodes in the cluster with a version less than 2.0, we can't use paging (CASSANDRA-6707).
if (_selection->is_aggregate() && page_size <= 0) {
auto aggregate = _selection->is_aggregate();
if (aggregate && page_size <= 0) {
page_size = DEFAULT_COUNT_PAGE_SIZE;
}
auto key_ranges = _restrictions->get_partition_key_ranges(options);
if (page_size <= 0
if (!aggregate && (page_size <= 0
|| !service::pager::query_pagers::may_need_paging(page_size,
*command, key_ranges)) {
*command, key_ranges))) {
return execute(proxy, command, std::move(key_ranges), state, options,
now);
}
@@ -241,7 +243,7 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
auto p = service::pager::query_pagers::pager(_schema, _selection,
state, options, command, std::move(key_ranges));
if (_selection->is_aggregate()) {
if (aggregate) {
return do_with(
cql3::selection::result_set_builder(*_selection, now,
options.get_serialization_format()),
@@ -308,7 +310,8 @@ future<::shared_ptr<transport::messages::result_message>>
select_statement::execute_internal(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), make_partition_slice(options), limit);
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit);
auto partition_ranges = _restrictions->get_partition_key_ranges(options);
if (needs_post_query_ordering() && _limit) {

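The upshot of the aggregate change: a query such as count(*) is now always driven through the pager, using DEFAULT_COUNT_PAGE_SIZE when the client supplied no page size, so an unpaged aggregate over a large table can no longer build the whole result set in memory, while non-aggregate queries keep the old may_need_paging() fast path.
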

@@ -59,7 +59,7 @@ bool update_statement::require_full_clustering_key() const {
void update_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (s->is_dense()) {
if (!prefix || (prefix.size() == 1 && prefix.components().front().empty())) {
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", *s->clustering_key_columns().begin()));
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", s->clustering_key_columns().begin()->name_as_text()));
}
// An empty name for the compact value is what we use to recognize the case where there is not column


@@ -23,6 +23,7 @@
#include "database.hh"
#include "unimplemented.hh"
#include "core/future-util.hh"
#include "db/commitlog/commitlog_entry.hh"
#include "db/system_keyspace.hh"
#include "db/consistency_level.hh"
#include "db/serializer.hh"
@@ -57,6 +58,8 @@
#include <seastar/core/enum.hh>
#include "utils/latency.hh"
#include "utils/flush_queue.hh"
#include "schema_registry.hh"
#include "service/priority_manager.hh"
using namespace std::chrono_literals;
@@ -126,9 +129,9 @@ column_family::make_partition_presence_checker(lw_shared_ptr<sstable_list> old_s
mutation_source
column_family::sstables_as_mutation_source() {
return [this] (const query::partition_range& r) {
return make_sstable_reader(r);
};
return mutation_source([this] (schema_ptr s, const query::partition_range& r, const io_priority_class& pc) {
return make_sstable_reader(std::move(s), r, pc);
});
}
// define in .cc, since sstable is forward-declared in .hh
@@ -153,10 +156,14 @@ class range_sstable_reader final : public mutation_reader::impl {
const query::partition_range& _pr;
lw_shared_ptr<sstable_list> _sstables;
mutation_reader _reader;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class* _pc;
public:
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr)
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr, const io_priority_class& pc)
: _pr(pr)
, _sstables(std::move(sstables))
, _pc(&pc)
{
std::vector<mutation_reader> readers;
for (const lw_shared_ptr<sstables::sstable>& sst : *_sstables | boost::adaptors::map_values) {
@@ -183,11 +190,15 @@ class single_key_sstable_reader final : public mutation_reader::impl {
mutation_opt _m;
bool _done = false;
lw_shared_ptr<sstable_list> _sstables;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class* _pc;
public:
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key)
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key, const io_priority_class& pc)
: _schema(std::move(schema))
, _key(sstables::key::from_partition_key(*_schema, key))
, _sstables(std::move(sstables))
, _pc(&pc)
{ }
virtual future<mutation_opt> operator()() override {
@@ -206,26 +217,26 @@ public:
};
mutation_reader
column_family::make_sstable_reader(const query::partition_range& pr) const {
column_family::make_sstable_reader(schema_ptr s, const query::partition_range& pr, const io_priority_class& pc) const {
if (pr.is_singular() && pr.start()->value().has_key()) {
const dht::ring_position& pos = pr.start()->value();
if (dht::shard_of(pos.token()) != engine().cpu_id()) {
return make_empty_reader(); // range doesn't belong to this shard
}
return make_mutation_reader<single_key_sstable_reader>(_schema, _sstables, *pos.key());
return make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key(), pc);
} else {
// range_sstable_reader is not movable so we need to wrap it
return make_mutation_reader<range_sstable_reader>(_schema, _sstables, pr);
return make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, pc);
}
}
key_source column_family::sstables_as_key_source() const {
return [this] (const query::partition_range& range) {
return key_source([this] (const query::partition_range& range, const io_priority_class& pc) {
std::vector<key_reader> readers;
readers.reserve(_sstables->size());
std::transform(_sstables->begin(), _sstables->end(), std::back_inserter(readers), [&] (auto&& entry) {
auto& sst = entry.second;
auto rd = sstables::make_key_reader(_schema, sst, range);
auto rd = sstables::make_key_reader(_schema, sst, range, pc);
if (sst->is_shared()) {
rd = make_filtering_reader(std::move(rd), [] (const dht::decorated_key& dk) {
return dht::shard_of(dk.token()) == engine().cpu_id();
@@ -234,14 +245,14 @@ key_source column_family::sstables_as_key_source() const {
return rd;
});
return make_combined_reader(_schema, std::move(readers));
};
});
}
// Exposed for testing, not performance critical.
future<column_family::const_mutation_partition_ptr>
column_family::find_partition(const dht::decorated_key& key) const {
return do_with(query::partition_range::make_singular(key), [this] (auto& range) {
return do_with(this->make_reader(range), [] (mutation_reader& reader) {
column_family::find_partition(schema_ptr s, const dht::decorated_key& key) const {
return do_with(query::partition_range::make_singular(key), [s = std::move(s), this] (auto& range) {
return do_with(this->make_reader(s, range), [] (mutation_reader& reader) {
return reader().then([] (mutation_opt&& mo) -> std::unique_ptr<const mutation_partition> {
if (!mo) {
return {};
@@ -253,13 +264,13 @@ column_family::find_partition(const dht::decorated_key& key) const {
}
future<column_family::const_mutation_partition_ptr>
column_family::find_partition_slow(const partition_key& key) const {
return find_partition(dht::global_partitioner().decorate_key(*_schema, key));
column_family::find_partition_slow(schema_ptr s, const partition_key& key) const {
return find_partition(s, dht::global_partitioner().decorate_key(*s, key));
}
future<column_family::const_row_ptr>
column_family::find_row(const dht::decorated_key& partition_key, clustering_key clustering_key) const {
return find_partition(partition_key).then([clustering_key = std::move(clustering_key)] (const_mutation_partition_ptr p) {
column_family::find_row(schema_ptr s, const dht::decorated_key& partition_key, clustering_key clustering_key) const {
return find_partition(std::move(s), partition_key).then([clustering_key = std::move(clustering_key)] (const_mutation_partition_ptr p) {
if (!p) {
return make_ready_future<const_row_ptr>();
}
@@ -274,8 +285,8 @@ column_family::find_row(const dht::decorated_key& partition_key, clustering_key
}
mutation_reader
column_family::make_reader(const query::partition_range& range) const {
if (query::is_wrap_around(range, *_schema)) {
column_family::make_reader(schema_ptr s, const query::partition_range& range, const io_priority_class& pc) const {
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
}
@@ -304,21 +315,22 @@ column_family::make_reader(const query::partition_range& range) const {
// https://github.com/scylladb/scylla/issues/185
for (auto&& mt : *_memtables) {
readers.emplace_back(mt->make_reader(range));
readers.emplace_back(mt->make_reader(s, range));
}
if (_config.enable_cache) {
readers.emplace_back(_cache.make_reader(range));
readers.emplace_back(_cache.make_reader(s, range, pc));
} else {
readers.emplace_back(make_sstable_reader(range));
readers.emplace_back(make_sstable_reader(s, range, pc));
}
return make_combined_reader(std::move(readers));
}
// Not performance critical. Currently used for testing only.
template <typename Func>
future<bool>
column_family::for_all_partitions(Func&& func) const {
column_family::for_all_partitions(schema_ptr s, Func&& func) const {
static_assert(std::is_same<bool, std::result_of_t<Func(const dht::decorated_key&, const mutation_partition&)>>::value,
"bad Func signature");
@@ -329,13 +341,13 @@ column_family::for_all_partitions(Func&& func) const {
bool empty = false;
public:
bool done() const { return !ok || empty; }
iteration_state(const column_family& cf, Func&& func)
: reader(cf.make_reader())
iteration_state(schema_ptr s, const column_family& cf, Func&& func)
: reader(cf.make_reader(std::move(s)))
, func(std::move(func))
{ }
};
return do_with(iteration_state(*this, std::move(func)), [] (iteration_state& is) {
return do_with(iteration_state(std::move(s), *this, std::move(func)), [] (iteration_state& is) {
return do_until([&is] { return is.done(); }, [&is] {
return is.reader().then([&is](mutation_opt&& mo) {
if (!mo) {
@@ -351,8 +363,8 @@ column_family::for_all_partitions(Func&& func) const {
}
future<bool>
column_family::for_all_partitions_slow(std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const {
return for_all_partitions(std::move(func));
column_family::for_all_partitions_slow(schema_ptr s, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const {
return for_all_partitions(std::move(s), std::move(func));
}
class lister {
@@ -462,7 +474,15 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
}
update_sstables_known_generation(comps.generation);
assert(_sstables->count(comps.generation) == 0);
{
auto i = _sstables->find(comps.generation);
if (i != _sstables->end()) {
auto new_toc = sstdir + "/" + fname;
throw std::runtime_error(sprint("Attempted to add sstable generation %d twice: new=%s existing=%s",
comps.generation, new_toc, i->second->toc_filename()));
}
}
auto fut = sstable::get_sstable_key_range(*_schema, _schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
return std::move(fut).then([this, sstdir = std::move(sstdir), comps] (range<partition_key> r) {
@@ -584,27 +604,20 @@ column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old) {
_config.cf_stats->pending_memtables_flushes_bytes += memtable_size;
newtab->set_unshared();
dblog.debug("Flushing to {}", newtab->get_filename());
return newtab->write_components(*old).then([this, newtab, old] {
return newtab->open_data().then([this, newtab] {
// Note that due to our sharded architecture, it is possible that
// in the face of a value change some shards will back up sstables
// while others won't.
//
// This is, in theory, possible to mitigate through a rwlock.
// However, this doesn't differ from the situation where all tables
// are coming from a single shard and the toggle happens in the
// middle of them.
//
// The code as-is guarantees that we'll never partially back up a
// single sstable, so that is enough of a guarantee.
if (!incremental_backups_enabled()) {
return make_ready_future<>();
}
auto dir = newtab->get_dir() + "/backups/";
return touch_directory(dir).then([dir, newtab] {
return newtab->create_links(dir);
});
});
// Note that due to our sharded architecture, it is possible that
// in the face of a value change some shards will back up sstables
// while others won't.
//
// This is, in theory, possible to mitigate through a rwlock.
// However, this doesn't differ from the situation where all tables
// are coming from a single shard and the toggle happens in the
// middle of them.
//
// The code as-is guarantees that we'll never partially back up a
// single sstable, so that is enough of a guarantee.
auto&& priority = service::get_local_memtable_flush_priority();
return newtab->write_components(*old, incremental_backups_enabled(), priority).then([this, newtab, old] {
return newtab->open_data();
}).then_wrapped([this, old, newtab, memtable_size] (future<> ret) {
_config.cf_stats->pending_memtables_flushes_count--;
_config.cf_stats->pending_memtables_flushes_bytes -= memtable_size;
@@ -709,68 +722,119 @@ column_family::reshuffle_sstables(int64_t start) {
});
}
void
column_family::rebuild_sstable_list(const std::vector<sstables::shared_sstable>& new_sstables,
const std::vector<sstables::shared_sstable>& sstables_to_remove) {
// Build a new list of _sstables: We remove from the existing list the
// tables we compacted (by now, there might be more sstables flushed
// later), and we add the new tables generated by the compaction.
// We create a new list rather than modifying it in-place, so that
// on-going reads can continue to use the old list.
auto current_sstables = _sstables;
_sstables = make_lw_shared<sstable_list>();
// zeroing live_disk_space_used and live_sstable_count because the
// sstable list is re-created below.
_stats.live_disk_space_used = 0;
_stats.live_sstable_count = 0;
std::unordered_set<sstables::shared_sstable> s(
sstables_to_remove.begin(), sstables_to_remove.end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
}
}
for (const auto& newtab : new_sstables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab->data_size());
_sstables->emplace(newtab->generation(), newtab);
}
for (const auto& oldtab : sstables_to_remove) {
oldtab->mark_for_deletion();
}
}
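
rebuild_sstable_list() is a copy-on-write publish: readers already holding the old lw_shared_ptr<sstable_list> keep a consistent snapshot, while new readers pick up the rebuilt list. A minimal sketch of the same pattern, with std::shared_ptr standing in for lw_shared_ptr and plain ids for sstables (names are illustrative):

#include <memory>
#include <unordered_set>
#include <vector>

using sstable_id = unsigned long;
using sstable_list_ptr = std::shared_ptr<const std::vector<sstable_id>>;

// Build a fresh list instead of mutating in place; readers of the old
// snapshot are unaffected until they re-fetch the pointer.
sstable_list_ptr rebuild(const sstable_list_ptr& current,
                         const std::vector<sstable_id>& added,
                         const std::vector<sstable_id>& removed) {
    std::unordered_set<sstable_id> gone(removed.begin(), removed.end());
    auto next = std::make_shared<std::vector<sstable_id>>();
    for (auto id : *current) {
        if (!gone.count(id)) {      // keep everything not being removed
            next->push_back(id);
        }
    }
    next->insert(next->end(), added.begin(), added.end());
    return next;
}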
future<>
column_family::compact_sstables(sstables::compaction_descriptor descriptor) {
column_family::compact_sstables(sstables::compaction_descriptor descriptor, bool cleanup) {
if (!descriptor.sstables.size()) {
// if there is nothing to compact, just return.
return make_ready_future<>();
}
return with_lock(_sstables_lock.for_read(), [this, descriptor = std::move(descriptor)] {
return with_lock(_sstables_lock.for_read(), [this, descriptor = std::move(descriptor), cleanup] {
auto sstables_to_compact = make_lw_shared<std::vector<sstables::shared_sstable>>(std::move(descriptor.sstables));
auto new_tables = make_lw_shared<std::vector<
std::pair<unsigned, sstables::shared_sstable>>>();
auto new_tables = make_lw_shared<std::vector<sstables::shared_sstable>>();
auto create_sstable = [this, new_tables] {
// FIXME: this generation calculation should be in a function.
auto gen = _sstable_generation++ * smp::count + engine().cpu_id();
auto gen = this->calculate_generation_for_new_table();
// FIXME: use "tmp" marker in names of incomplete sstable
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), _config.datadir, gen,
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
sst->set_unshared();
new_tables->emplace_back(gen, sst);
new_tables->emplace_back(sst);
return sst;
};
return sstables::compact_sstables(*sstables_to_compact, *this,
create_sstable, descriptor.max_sstable_bytes, descriptor.level).then([this, new_tables, sstables_to_compact] {
// Build a new list of _sstables: We remove from the existing list the
// tables we compacted (by now, there might be more sstables flushed
// later), and we add the new tables generated by the compaction.
// We create a new list rather than modifying it in-place, so that
// on-going reads can continue to use the old list.
auto current_sstables = _sstables;
_sstables = make_lw_shared<sstable_list>();
// zeroing live_disk_space_used and live_sstable_count because the
// sstable list is re-created below.
_stats.live_disk_space_used = 0;
_stats.live_sstable_count = 0;
std::unordered_set<sstables::shared_sstable> s(
sstables_to_compact->begin(), sstables_to_compact->end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
create_sstable, descriptor.max_sstable_bytes, descriptor.level, cleanup).then([this, new_tables, sstables_to_compact] {
this->rebuild_sstable_list(*new_tables, *sstables_to_compact);
}).then_wrapped([this, new_tables] (future<> f) {
try {
f.get();
} catch (...) {
// Delete either partially or fully written sstables of a compaction that
// was either stopped abruptly (e.g. out of disk space) or deliberately
// (e.g. nodetool stop COMPACTION).
for (auto& sst : *new_tables) {
dblog.debug("Deleting sstable {} of interrupted compaction for {}/{}", sst->get_filename(), _schema->ks_name(), _schema->cf_name());
sst->mark_for_deletion();
}
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
throw;
}
});
});
}
static bool needs_cleanup(const lw_shared_ptr<sstables::sstable>& sst,
const lw_shared_ptr<std::vector<range<dht::token>>>& owned_ranges,
schema_ptr s) {
auto first = sst->get_first_partition_key(*s);
auto last = sst->get_last_partition_key(*s);
auto first_token = dht::global_partitioner().get_token(*s, first);
auto last_token = dht::global_partitioner().get_token(*s, last);
range<dht::token> sst_token_range = range<dht::token>::make(first_token, last_token);
// return true iff sst partition range isn't fully contained in any of the owned ranges.
for (auto& r : *owned_ranges) {
if (r.contains(sst_token_range, dht::token_comparator())) {
return false;
}
}
return true;
}
future<> column_family::cleanup_sstables(sstables::compaction_descriptor descriptor) {
std::vector<range<dht::token>> r = service::get_local_storage_service().get_local_ranges(_schema->ks_name());
auto owned_ranges = make_lw_shared<std::vector<range<dht::token>>>(std::move(r));
auto sstables_to_cleanup = make_lw_shared<std::vector<sstables::shared_sstable>>(std::move(descriptor.sstables));
return parallel_for_each(*sstables_to_cleanup, [this, owned_ranges = std::move(owned_ranges), sstables_to_cleanup] (auto& sst) {
if (!owned_ranges->empty() && !needs_cleanup(sst, owned_ranges, _schema)) {
return make_ready_future<>();
}
std::vector<sstables::shared_sstable> sstable_to_compact({ sst });
return this->compact_sstables(sstables::compaction_descriptor(std::move(sstable_to_compact)), true);
});
}
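
needs_cleanup() reduces to interval containment: the sstable can be skipped iff its [first_token, last_token] span lies entirely within one locally owned range. A self-contained sketch over plain integer tokens, assuming non-wrapping ranges for simplicity:

#include <vector>

struct token_range {
    long first, last;   // inclusive bounds
    bool contains(const token_range& o) const {
        return first <= o.first && o.last <= last;
    }
};

bool needs_cleanup_sketch(const token_range& sst, const std::vector<token_range>& owned) {
    for (const auto& r : owned) {
        if (r.contains(sst)) {
            return false;   // fully owned: every key is still relevant
        }
    }
    return true;            // spills outside owned ranges: worth cleaning
}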
future<>
column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tables) {
return parallel_for_each(new_tables, [this] (auto comps) {
@@ -816,12 +880,9 @@ void column_family::trigger_compaction() {
}
}
future<> column_family::run_compaction() {
sstables::compaction_strategy strategy = _compaction_strategy;
return do_with(std::move(strategy), [this] (sstables::compaction_strategy& cs) {
return cs.compact(*this).then([this] {
_stats.pending_compactions--;
});
future<> column_family::run_compaction(sstables::compaction_descriptor descriptor) {
return compact_sstables(std::move(descriptor)).then([this] {
_stats.pending_compactions--;
});
}
@@ -975,8 +1036,6 @@ database::database(const db::config& cfg)
if (!_memtable_total_space) {
_memtable_total_space = memory::stats().total_memory() / 2;
}
bool durable = cfg.data_file_directories().size() > 0;
db::system_keyspace::make(*this, durable, _cfg->volatile_system_keyspace_for_testing());
// Start compaction manager with two tasks for handling compaction jobs.
_compaction_manager.start(2);
setup_collectd();
@@ -1119,6 +1178,9 @@ future<> database::parse_system_tables(distributed<service::storage_proxy>& prox
future<>
database::init_system_keyspace() {
bool durable = _cfg->data_file_directories().size() > 0;
db::system_keyspace::make(*this, durable, _cfg->volatile_system_keyspace_for_testing());
// FIXME support multiple directories
return touch_directory(_cfg->data_file_directories()[0] + "/" + db::system_keyspace::NAME).then([this] {
return populate_keyspace(_cfg->data_file_directories()[0], db::system_keyspace::NAME).then([this]() {
@@ -1184,6 +1246,8 @@ void database::drop_keyspace(const sstring& name) {
}
void database::add_column_family(schema_ptr schema, column_family::config cfg) {
schema = local_schema_registry().learn(schema);
schema->registry_entry()->mark_synced();
auto uuid = schema->id();
lw_shared_ptr<column_family> cf;
if (cfg.enable_commitlog && _commitlog) {
@@ -1209,17 +1273,6 @@ void database::add_column_family(schema_ptr schema, column_family::config cfg) {
_ks_cf_to_uuid.emplace(std::move(kscf), uuid);
}
future<> database::update_column_family(const sstring& ks_name, const sstring& cf_name) {
auto& proxy = service::get_storage_proxy();
auto old_cfm = find_schema(ks_name, cf_name);
return db::schema_tables::create_table_from_name(proxy, ks_name, cf_name).then([old_cfm] (auto&& new_cfm) {
if (old_cfm->id() != new_cfm->id()) {
return make_exception_future<>(exceptions::configuration_exception(sprint("Column family ID mismatch (found %s; expected %s)", new_cfm->id(), old_cfm->id())));
}
return make_exception_future<>(std::runtime_error("update column family not implemented"));
});
}
future<> database::drop_column_family(db_clock::time_point dropped_at, const sstring& ks_name, const sstring& cf_name) {
auto uuid = find_uuid(ks_name, cf_name);
auto& ks = find_keyspace(ks_name);
@@ -1483,13 +1536,17 @@ compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
}
struct query_state {
explicit query_state(const query::read_command& cmd, const std::vector<query::partition_range>& ranges)
: cmd(cmd)
explicit query_state(schema_ptr s,
const query::read_command& cmd,
const std::vector<query::partition_range>& ranges)
: schema(std::move(s))
, cmd(cmd)
, builder(cmd.slice)
, limit(cmd.row_limit)
, current_partition_range(ranges.begin())
, range_end(ranges.end()){
}
schema_ptr schema;
const query::read_command& cmd;
query::result::builder builder;
uint32_t limit;
@@ -1503,21 +1560,21 @@ struct query_state {
};
future<lw_shared_ptr<query::result>>
column_family::query(const query::read_command& cmd, const std::vector<query::partition_range>& partition_ranges) {
column_family::query(schema_ptr s, const query::read_command& cmd, const std::vector<query::partition_range>& partition_ranges) {
utils::latency_counter lc;
_stats.reads.set_latency(lc);
return do_with(query_state(cmd, partition_ranges), [this] (query_state& qs) {
return do_with(query_state(std::move(s), cmd, partition_ranges), [this] (query_state& qs) {
return do_until(std::bind(&query_state::done, &qs), [this, &qs] {
auto&& range = *qs.current_partition_range++;
qs.reader = make_reader(range);
qs.reader = make_reader(qs.schema, range, service::get_local_sstable_query_read_priority());
qs.range_empty = false;
return do_until([&qs] { return !qs.limit || qs.range_empty; }, [this, &qs] {
return qs.reader().then([this, &qs](mutation_opt mo) {
return do_until([&qs] { return !qs.limit || qs.range_empty; }, [&qs] {
return qs.reader().then([&qs](mutation_opt mo) {
if (mo) {
auto p_builder = qs.builder.add_partition(*mo->schema(), mo->key());
auto is_distinct = qs.cmd.slice.options.contains(query::partition_slice::option::distinct);
auto limit = !is_distinct ? qs.limit : 1;
mo->partition().query(p_builder, *_schema, qs.cmd.timestamp, limit);
mo->partition().query(p_builder, *qs.schema, qs.cmd.timestamp, limit);
qs.limit -= p_builder.row_count();
} else {
qs.range_empty = true;
@@ -1538,21 +1595,21 @@ column_family::query(const query::read_command& cmd, const std::vector<query::pa
mutation_source
column_family::as_mutation_source() const {
return [this] (const query::partition_range& range) {
return this->make_reader(range);
};
return mutation_source([this] (schema_ptr s, const query::partition_range& range, const io_priority_class& pc) {
return this->make_reader(std::move(s), range, pc);
});
}
future<lw_shared_ptr<query::result>>
database::query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges) {
database::query(schema_ptr s, const query::read_command& cmd, const std::vector<query::partition_range>& ranges) {
column_family& cf = find_column_family(cmd.cf_id);
return cf.query(cmd, ranges);
return cf.query(std::move(s), cmd, ranges);
}
future<reconcilable_result>
database::query_mutations(const query::read_command& cmd, const query::partition_range& range) {
database::query_mutations(schema_ptr s, const query::read_command& cmd, const query::partition_range& range) {
column_family& cf = find_column_family(cmd.cf_id);
return mutation_query(cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.timestamp);
return mutation_query(std::move(s), cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.timestamp);
}
std::unordered_set<sstring> database::get_initial_tokens() {
@@ -1597,7 +1654,8 @@ std::ostream& operator<<(std::ostream& out, const atomic_cell_or_collection& c)
}
std::ostream& operator<<(std::ostream& os, const mutation& m) {
fprint(os, "{mutation: schema %p key %s data ", m.schema().get(), m.decorated_key());
const ::schema& s = *m.schema();
fprint(os, "{%s.%s key %s data ", s.ks_name(), s.cf_name(), m.decorated_key());
os << m.partition() << "}";
return os;
}
@@ -1616,28 +1674,74 @@ std::ostream& operator<<(std::ostream& out, const database& db) {
return out;
}
future<> database::apply_in_memory(const frozen_mutation& m, const db::replay_position& rp) {
void
column_family::apply(const mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
void
column_family::apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
active_memtable().apply(m, m_schema, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
void
column_family::seal_on_overflow() {
++_mutation_count;
if (active_memtable().occupancy().total_space() >= _config.max_memtable_size) {
// FIXME: if sparse, do some in-memory compaction first
// FIXME: maybe merge with other in-memory memtables
_mutation_count = 0;
seal_active_memtable();
}
}
void
column_family::check_valid_rp(const db::replay_position& rp) const {
if (rp < _highest_flushed_rp) {
throw replay_position_reordered_exception();
}
}
future<> database::apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
try {
auto& cf = find_column_family(m.column_family_id());
cf.apply(m, rp);
cf.apply(m, m_schema, rp);
} catch (no_such_column_family&) {
dblog.error("Attempting to mutate non-existent table {}", m.column_family_id());
}
return make_ready_future<>();
}
future<> database::do_apply(const frozen_mutation& m) {
future<> database::do_apply(schema_ptr s, const frozen_mutation& m) {
// I'm doing a nullcheck here since the init code path for db etc
// is a little in flux and commitlog is created only when db is
// initialized from datadir.
auto& cf = find_column_family(m.column_family_id());
auto uuid = m.column_family_id();
auto& cf = find_column_family(uuid);
if (!s->is_synced()) {
throw std::runtime_error(sprint("attempted to mutate using not synced schema of %s.%s, version=%s",
s->ks_name(), s->cf_name(), s->version()));
}
if (cf.commitlog() != nullptr) {
auto uuid = m.column_family_id();
bytes_view repr = m.representation();
auto write_repr = [repr] (data_output& out) { out.write(repr.begin(), repr.end()); };
return cf.commitlog()->add_mutation(uuid, repr.size(), write_repr).then([&m, this](auto rp) {
commitlog_entry_writer cew(s, m);
return cf.commitlog()->add_entry(uuid, cew).then([&m, this, s](auto rp) {
try {
return this->apply_in_memory(m, rp);
return this->apply_in_memory(m, s, rp);
} catch (replay_position_reordered_exception&) {
// expensive, but we're assuming this is super rare.
// if we failed to apply the mutation due to future re-ordering
@@ -1645,11 +1749,11 @@ future<> database::do_apply(const frozen_mutation& m) {
// let's just try again, add the mutation to the CL once more,
// and assume success is inevitable, eventually.
dblog.debug("replay_position reordering detected");
return this->apply(m);
return this->apply(s, m);
}
});
}
return apply_in_memory(m, db::replay_position());
return apply_in_memory(m, s, db::replay_position());
}
future<> database::throttle() {
@@ -1683,9 +1787,12 @@ void database::unthrottle() {
}
}
future<> database::apply(const frozen_mutation& m) {
return throttle().then([this, &m] {
return do_apply(m);
future<> database::apply(schema_ptr s, const frozen_mutation& m) {
if (dblog.is_enabled(logging::log_level::trace)) {
dblog.trace("apply {}", m.pretty_printer(s));
}
return throttle().then([this, &m, s = std::move(s)] {
return do_apply(std::move(s), m);
});
}
@@ -2226,3 +2333,15 @@ std::ostream& operator<<(std::ostream& os, const keyspace_metadata& m) {
os << "}";
return os;
}
void column_family::set_schema(schema_ptr s) {
dblog.debug("Changing schema version of {}.{} ({}) from {} to {}",
_schema->ks_name(), _schema->cf_name(), _schema->id(), _schema->version(), s->version());
for (auto& m : *_memtables) {
m->set_schema(s);
}
_cache.set_schema(s);
_schema = std::move(s);
}


@@ -64,7 +64,7 @@
#include "mutation_reader.hh"
#include "row_cache.hh"
#include "compaction_strategy.hh"
#include "utils/compaction_manager.hh"
#include "sstables/compaction_manager.hh"
#include "utils/exponential_backoff_retry.hh"
#include "utils/histogram.hh"
#include "sstables/estimated_histogram.hh"
@@ -172,6 +172,9 @@ private:
int _compaction_disabled = 0;
class memtable_flush_queue;
std::unique_ptr<memtable_flush_queue> _flush_queue;
// Stores the generations of sstables being compacted at the moment. Needed to
// prevent an sstable from being compacted twice.
std::unordered_set<unsigned long> _compacting_generations;
private:
void update_stats_for_new_sstable(uint64_t new_sstable_data_size);
void add_sstable(sstables::sstable&& sstable);
@@ -185,11 +188,20 @@ private:
void update_sstables_known_generation(unsigned generation) {
_sstable_generation = std::max<uint64_t>(_sstable_generation, generation / smp::count + 1);
}
uint64_t calculate_generation_for_new_table() {
return _sstable_generation++ * smp::count + engine().cpu_id();
}
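
calculate_generation_for_new_table() hands each shard a disjoint arithmetic progression, so concurrently flushing or compacting shards can never pick the same generation: with smp::count == 8, shard 0 yields 0, 8, 16, ... while shard 3 yields 3, 11, 19, .... The arithmetic on its own:

#include <cstdint>

// gen = counter++ * shard_count + shard_id; the sequences are disjoint per shard.
// update_sstables_known_generation() above inverts this with
// generation / shard_count + 1 to keep the counter ahead of anything on disk.
uint64_t next_generation(uint64_t& counter, unsigned shard_count, unsigned shard_id) {
    return counter++ * shard_count + shard_id;
}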
// Rebuild existing _sstables with new_sstables added to it and sstables_to_remove removed from it.
void rebuild_sstable_list(const std::vector<sstables::shared_sstable>& new_sstables,
const std::vector<sstables::shared_sstable>& sstables_to_remove);
private:
// Creates a mutation reader which covers sstables.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// The 'range' parameter must be live as long as the reader is used.
mutation_reader make_sstable_reader(const query::partition_range& range) const;
// Mutations returned by the reader will all have given schema.
mutation_reader make_sstable_reader(schema_ptr schema, const query::partition_range& range, const io_priority_class& pc) const;
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
@@ -200,7 +212,12 @@ public:
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// Note: for data queries use query() instead.
// The 'range' parameter must be live as long as the reader is used.
mutation_reader make_reader(const query::partition_range& range = query::full_partition_range) const;
// Mutations returned by the reader will all have given schema.
// If I/O needs to be issued to read anything in the specified range, the operations
// will be scheduled under the priority class given by pc.
mutation_reader make_reader(schema_ptr schema,
const query::partition_range& range = query::full_partition_range,
const io_priority_class& pc = default_priority_class()) const;
mutation_source as_mutation_source() const;
@@ -225,16 +242,21 @@ public:
column_family(schema_ptr schema, config cfg, no_commitlog, compaction_manager&);
column_family(column_family&&) = delete; // 'this' is being captured during construction
~column_family();
schema_ptr schema() const { return _schema; }
const schema_ptr& schema() const { return _schema; }
void set_schema(schema_ptr);
db::commitlog* commitlog() { return _commitlog; }
future<const_mutation_partition_ptr> find_partition(const dht::decorated_key& key) const;
future<const_mutation_partition_ptr> find_partition_slow(const partition_key& key) const;
future<const_row_ptr> find_row(const dht::decorated_key& partition_key, clustering_key clustering_key) const;
void apply(const frozen_mutation& m, const db::replay_position& = db::replay_position());
future<const_mutation_partition_ptr> find_partition(schema_ptr, const dht::decorated_key& key) const;
future<const_mutation_partition_ptr> find_partition_slow(schema_ptr, const partition_key& key) const;
future<const_row_ptr> find_row(schema_ptr, const dht::decorated_key& partition_key, clustering_key clustering_key) const;
// Applies given mutation to this column family
// The mutation is always upgraded to current schema.
void apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& = db::replay_position());
void apply(const mutation& m, const db::replay_position& = db::replay_position());
// Returns at most "cmd.limit" rows
future<lw_shared_ptr<query::result>> query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
future<lw_shared_ptr<query::result>> query(schema_ptr,
const query::read_command& cmd,
const std::vector<query::partition_range>& ranges);
future<> populate(sstring datadir);
@@ -283,7 +305,15 @@ public:
// not a real compaction policy.
future<> compact_all_sstables();
// Compact all sstables provided in the vector.
future<> compact_sstables(sstables::compaction_descriptor descriptor);
// If cleanup is set to true, compact_sstables will run on behalf of a cleanup job,
// meaning that irrelevant keys will be discarded.
future<> compact_sstables(sstables::compaction_descriptor descriptor, bool cleanup = false);
// Performs a cleanup on each sstable of this column family, excluding
// those that are irrelevant to this node or are being compacted.
// Cleanup is about discarding keys that are no longer relevant to a
// given sstable, e.g. after the node loses part of its token range to
// a newly added node.
future<> cleanup_sstables(sstables::compaction_descriptor descriptor);
future<bool> snapshot_exists(sstring name);
@@ -306,7 +336,7 @@ public:
void start_compaction();
void trigger_compaction();
future<> run_compaction();
future<> run_compaction(sstables::compaction_descriptor descriptor);
void set_compaction_strategy(sstables::compaction_strategy_type strategy);
const sstables::compaction_strategy& get_compaction_strategy() const {
return _compaction_strategy;
@@ -337,6 +367,10 @@ public:
}
});
}
std::unordered_set<unsigned long>& compacting_generations() {
return _compacting_generations;
}
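
A hedged sketch of how a busy-set like _compacting_generations is typically used so that no sstable lands in two jobs at once; claim()/release() are illustrative names, not Scylla's API:

#include <unordered_set>
#include <vector>

using generation = unsigned long;

// Claim candidate generations for a job; returns only those not already busy.
std::vector<generation> claim(std::unordered_set<generation>& busy,
                              const std::vector<generation>& candidates) {
    std::vector<generation> claimed;
    for (auto g : candidates) {
        if (busy.insert(g).second) {   // newly inserted => was not compacting
            claimed.push_back(g);
        }
    }
    return claimed;
}

void release(std::unordered_set<generation>& busy, const std::vector<generation>& gens) {
    for (auto g : gens) {
        busy.erase(g);
    }
}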
private:
// One does not need to wait on this future if all we are interested in, is
// initiating the write. The writes initiated here will eventually
@@ -360,14 +394,14 @@ private:
// so that iteration can be stopped by returning false.
// Func signature: bool (const decorated_key& dk, const mutation_partition& mp)
template <typename Func>
future<bool> for_all_partitions(Func&& func) const;
future<bool> for_all_partitions(schema_ptr, Func&& func) const;
future<sstables::entry_descriptor> probe_file(sstring sstdir, sstring fname);
void seal_on_overflow();
void check_valid_rp(const db::replay_position&) const;
public:
// Iterate over all partitions. Protocol is the same as std::all_of(),
// so that iteration can be stopped by returning false.
future<bool> for_all_partitions_slow(std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
future<bool> for_all_partitions_slow(schema_ptr, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
friend std::ostream& operator<<(std::ostream& out, const column_family& cf);
// Testing purposes.
@@ -541,7 +575,7 @@ class database {
circular_buffer<promise<>> _throttled_requests;
future<> init_commitlog();
future<> apply_in_memory(const frozen_mutation&, const db::replay_position&);
future<> apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position&);
future<> populate(sstring datadir);
future<> populate_keyspace(sstring datadir, sstring ks_name);
@@ -553,7 +587,7 @@ private:
friend void db::system_keyspace::make(database& db, bool durable, bool volatile_testing_only);
void setup_collectd();
future<> throttle();
future<> do_apply(const frozen_mutation&);
future<> do_apply(schema_ptr, const frozen_mutation&);
void unthrottle();
public:
static utils::UUID empty_version;
@@ -584,7 +618,6 @@ public:
void add_column_family(schema_ptr schema, column_family::config cfg);
future<> update_column_family(const sstring& ks_name, const sstring& cf_name);
future<> drop_column_family(db_clock::time_point changed_at, const sstring& ks_name, const sstring& cf_name);
/* throws std::out_of_range if missing */
@@ -619,9 +652,9 @@ public:
unsigned shard_of(const dht::token& t);
unsigned shard_of(const mutation& m);
unsigned shard_of(const frozen_mutation& m);
future<lw_shared_ptr<query::result>> query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
future<reconcilable_result> query_mutations(const query::read_command& cmd, const query::partition_range& range);
future<> apply(const frozen_mutation&);
future<lw_shared_ptr<query::result>> query(schema_ptr, const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
future<reconcilable_result> query_mutations(schema_ptr, const query::read_command& cmd, const query::partition_range& range);
future<> apply(schema_ptr, const frozen_mutation&);
keyspace::config make_keyspace_config(const keyspace_metadata& ksm);
const sstring& get_snitch_name() const;
future<> clear_snapshot(sstring tag, std::vector<sstring> keyspace_names);
@@ -669,53 +702,6 @@ public:
// FIXME: stub
class secondary_index_manager {};
inline
void
column_family::apply(const mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
inline
void
column_family::seal_on_overflow() {
++_mutation_count;
if (active_memtable().occupancy().total_space() >= _config.max_memtable_size) {
// FIXME: if sparse, do some in-memory compaction first
// FIXME: maybe merge with other in-memory memtables
_mutation_count = 0;
seal_active_memtable();
}
}
inline
void
column_family::check_valid_rp(const db::replay_position& rp) const {
if (rp < _highest_flushed_rp) {
throw replay_position_reordered_exception();
}
}
inline
void
column_family::apply(const frozen_mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
future<> update_schema_version_and_announce(distributed<service::storage_proxy>& proxy);
#endif /* DATABASE_HH_ */


@@ -31,12 +31,19 @@ class mutation_partition;
// schema.hh
class schema;
class column_definition;
class column_mapping;
// schema_mutations.hh
class schema_mutations;
// keys.hh
class exploded_clustering_prefix;
class partition_key;
class partition_key_view;
class clustering_key_prefix;
class clustering_key_prefix_view;
using clustering_key = clustering_key_prefix;
using clustering_key_view = clustering_key_prefix_view;
// memtable.hh
class memtable;


@@ -45,6 +45,7 @@
#include <boost/range/adaptor/sliced.hpp>
#include "batchlog_manager.hh"
#include "canonical_mutation.hh"
#include "service/storage_service.hh"
#include "service/storage_proxy.hh"
#include "system_keyspace.hh"
@@ -57,6 +58,7 @@
#include "db/config.hh"
#include "gms/failure_detector.hh"
#include "service/storage_service.hh"
#include "schema_registry.hh"
static logging::logger logger("batchlog_manager");
@@ -116,14 +118,14 @@ mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<muta
auto key = partition_key::from_singular(*schema, id);
auto timestamp = api::new_timestamp();
auto data = [this, &mutations] {
std::vector<frozen_mutation> fm(mutations.begin(), mutations.end());
std::vector<canonical_mutation> fm(mutations.begin(), mutations.end());
const auto size = std::accumulate(fm.begin(), fm.end(), size_t(0), [](size_t s, auto& m) {
return s + serializer<frozen_mutation>{m}.size();
return s + serializer<canonical_mutation>{m}.size();
});
bytes buf(bytes::initialized_later(), size);
data_output out(buf);
for (auto& m : fm) {
serializer<frozen_mutation>{m}(out);
serializer<canonical_mutation>{m}(out);
}
return buf;
}();
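
This write side pairs with the decoding loop in replay_all_failed_batches() below (data_input in(data); while (in.has_next()) ...). A round-trip sketch that assumes only the serializer<canonical_mutation> surface visible in this diff — size(), operator()(data_output&), and read(data_input&) — plus <numeric> and <deque> for the std parts:

// Mirrors the buffer construction above.
bytes serialize_batch(const std::vector<canonical_mutation>& ms) {
    const auto size = std::accumulate(ms.begin(), ms.end(), size_t(0), [](size_t s, auto& m) {
        return s + serializer<canonical_mutation>{m}.size();
    });
    bytes buf(bytes::initialized_later(), size);
    data_output out(buf);
    for (auto& m : ms) {
        serializer<canonical_mutation>{m}(out);
    }
    return buf;
}

// Mirrors the replay-side decoding loop below.
std::deque<canonical_mutation> deserialize_batch(const bytes& buf) {
    std::deque<canonical_mutation> ms;
    data_input in(buf);
    while (in.has_next()) {
        ms.emplace_back(serializer<canonical_mutation>::read(in));
    }
    return ms;
}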
@@ -151,23 +153,24 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
auto batch = [this, limiter](const cql3::untyped_result_set::row& row) {
auto written_at = row.get_as<db_clock::time_point>("written_at");
auto id = row.get_as<utils::UUID>("id");
// enough time for the actual write + batchlog entry mutation delivery (two separate requests).
auto timeout = get_batch_log_timeout();
if (db_clock::now() < written_at + timeout) {
logger.debug("Skipping replay of {}, too fresh", id);
return make_ready_future<>();
}
// not used currently. ever?
//auto version = row.has("version") ? row.get_as<uint32_t>("version") : /*MessagingService.VERSION_12*/6u;
auto id = row.get_as<utils::UUID>("id");
auto data = row.get_blob("data");
logger.debug("Replaying batch {}", id);
auto fms = make_lw_shared<std::deque<frozen_mutation>>();
auto fms = make_lw_shared<std::deque<canonical_mutation>>();
data_input in(data);
while (in.has_next()) {
fms->emplace_back(serializer<frozen_mutation>::read(in));
fms->emplace_back(serializer<canonical_mutation>::read(in));
}
auto mutations = make_lw_shared<std::vector<mutation>>();
@@ -179,11 +182,10 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
}
auto& fm = fms->front();
auto mid = fm.column_family_id();
return system_keyspace::get_truncated_at(mid).then([this, &fm, written_at, mutations](db_clock::time_point t) {
auto schema = _qp.db().local().find_schema(fm.column_family_id());
return system_keyspace::get_truncated_at(mid).then([this, mid, &fm, written_at, mutations](db_clock::time_point t) {
schema_ptr s = _qp.db().local().find_schema(mid);
if (written_at > t) {
auto schema = _qp.db().local().find_schema(fm.column_family_id());
mutations->emplace_back(fm.unfreeze(schema));
mutations->emplace_back(fm.to_mutation(s));
}
}).then([fms] {
fms->pop_front();


@@ -64,6 +64,8 @@
#include "utils/crc.hh"
#include "utils/runtime.hh"
#include "log.hh"
#include "commitlog_entry.hh"
#include "service/priority_manager.hh"
static logging::logger logger("commitlog");
@@ -155,6 +157,9 @@ public:
bool _shutdown = false;
semaphore _new_segment_semaphore;
semaphore _write_semaphore;
semaphore _flush_semaphore;
scollectd::registrations _regs;
// TODO: verify that we're ok with not-so-great granularity
@@ -170,7 +175,11 @@ public:
uint64_t bytes_slack = 0;
uint64_t segments_created = 0;
uint64_t segments_destroyed = 0;
uint64_t pending_operations = 0;
uint64_t pending_writes = 0;
uint64_t pending_flushes = 0;
uint64_t pending_allocations = 0;
uint64_t write_limit_exceeded = 0;
uint64_t flush_limit_exceeded = 0;
uint64_t total_size = 0;
uint64_t buffer_list_bytes = 0;
uint64_t total_size_on_disk = 0;
@@ -178,33 +187,73 @@ public:
stats totals;
void begin_op() {
future<> begin_write() {
_gate.enter();
++totals.pending_operations;
++totals.pending_writes; // redundant, given the semaphore, but easier to read
if (totals.pending_writes >= cfg.max_active_writes) {
++totals.write_limit_exceeded;
logger.trace("Write ops overflow: {}. Will block.", totals.pending_writes);
}
return _write_semaphore.wait();
}
void end_op() {
--totals.pending_operations;
void end_write() {
_write_semaphore.signal();
--totals.pending_writes;
_gate.leave();
}
future<> begin_flush() {
_gate.enter();
++totals.pending_flushes;
if (totals.pending_flushes >= cfg.max_active_flushes) {
++totals.flush_limit_exceeded;
logger.trace("Flush ops overflow: {}. Will block.", totals.pending_flushes);
}
return _flush_semaphore.wait();
}
void end_flush() {
_flush_semaphore.signal();
--totals.pending_flushes;
_gate.leave();
}
bool should_wait_for_write() const {
return _write_semaphore.waiters() > 0 || _flush_semaphore.waiters() > 0;
}
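
begin_write()/begin_flush() gate I/O through two independent semaphores while keeping gauge counters alongside. Seastar's semaphore is future-based, so the sketch below captures only the synchronous shape of the idea, using C++20 std::counting_semaphore; state is per shard, so plain counters suffice:

#include <semaphore>
#include <cstdint>

struct op_gate {
    std::counting_semaphore<> slots;
    uint64_t pending = 0;          // gauge, mirrors totals.pending_writes
    uint64_t limit_exceeded = 0;   // counter, mirrors totals.write_limit_exceeded
    const uint64_t max_ops;

    explicit op_gate(uint64_t max) : slots(max), max_ops(max) {}

    void begin() {
        if (++pending >= max_ops) {
            ++limit_exceeded;      // about to block: record the overflow
        }
        slots.acquire();           // blocks while max_ops are already in flight
    }
    void end() {
        slots.release();
        --pending;
    }
};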
segment_manager(config c)
: cfg(c), max_size(
std::min<size_t>(std::numeric_limits<position_type>::max(),
std::max<size_t>(cfg.commitlog_segment_size_in_mb,
1) * 1024 * 1024)), max_mutation_size(
max_size >> 1), max_disk_size(
size_t(
std::ceil(
cfg.commitlog_total_space_in_mb
/ double(smp::count))) * 1024 * 1024)
: cfg([&c] {
config cfg(c);
if (cfg.commit_log_location.empty()) {
cfg.commit_log_location = "/var/lib/scylla/commitlog";
}
if (cfg.max_active_writes == 0) {
cfg.max_active_writes = // TODO: call someone to get an idea...
25 * smp::count;
}
cfg.max_active_writes = std::max(uint64_t(1), cfg.max_active_writes / smp::count);
if (cfg.max_active_flushes == 0) {
cfg.max_active_flushes = // TODO: call someone to get an idea...
5 * smp::count;
}
cfg.max_active_flushes = std::max(uint64_t(1), cfg.max_active_flushes / smp::count);
return cfg;
}())
, max_size(std::min<size_t>(std::numeric_limits<position_type>::max(), std::max<size_t>(cfg.commitlog_segment_size_in_mb, 1) * 1024 * 1024))
, max_mutation_size(max_size >> 1)
, max_disk_size(size_t(std::ceil(cfg.commitlog_total_space_in_mb / double(smp::count))) * 1024 * 1024)
, _write_semaphore(cfg.max_active_writes)
, _flush_semaphore(cfg.max_active_flushes)
{
assert(max_size > 0);
if (cfg.commit_log_location.empty()) {
cfg.commit_log_location = "/var/lib/scylla/commitlog";
}
logger.trace("Commitlog {} maximum disk size: {} MB / cpu ({} cpus)",
cfg.commit_log_location, max_disk_size / (1024 * 1024),
smp::count);
_regs = create_counters();
}
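
The lambda-initialized config above first scales the zero defaults by core count and then divides back per shard: with smp::count == 8 and max_active_writes left at 0, the default becomes 25 * 8 = 200 node-wide, i.e. 25 concurrent writes per shard (and likewise 5 flushes per shard). The same arithmetic, standalone:

#include <algorithm>
#include <cstdint>

uint64_t per_shard_limit(uint64_t configured, uint64_t default_per_shard, unsigned shards) {
    uint64_t total = configured ? configured : default_per_shard * shards;
    return std::max<uint64_t>(1, total / shards);   // floor, clamped to at least 1
}
// per_shard_limit(0, 25, 8) == 25;  per_shard_limit(12, 25, 8) == 1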
~segment_manager() {
@@ -238,6 +287,8 @@ public:
}
std::vector<sstring> get_active_names() const;
uint64_t get_num_dirty_segments() const;
uint64_t get_num_active_segments() const;
using buffer_type = temporary_buffer<char>;
@@ -341,9 +392,39 @@ class db::commitlog::segment: public enable_lw_shared_from_this<segment> {
std::unordered_map<cf_id_type, position_type> _cf_dirty;
time_point _sync_time;
seastar::gate _gate;
uint64_t _write_waiters = 0;
semaphore _queue;
std::unordered_set<table_schema_version> _known_schema_versions;
friend std::ostream& operator<<(std::ostream&, const segment&);
friend class segment_manager;
future<> begin_flush() {
// This is maintaining the semantics of only using the write-lock
// as a gate for flushing, i.e. once we've begun a flush for position X
// we are ok with writes to positions > X
return _dwrite.write_lock().then(std::bind(&segment_manager::begin_flush, _segment_manager)).finally([this] {
_dwrite.write_unlock();
});
}
void end_flush() {
_segment_manager->end_flush();
}
future<> begin_write() {
// This is maintaining the semantics of only using the write-lock
// as a gate for flushing, i.e. once we've begun a flush for position X
// we are ok with writes to positions > X
return _dwrite.read_lock().then(std::bind(&segment_manager::begin_write, _segment_manager));
}
void end_write() {
_segment_manager->end_write();
_dwrite.read_unlock();
}
public:
struct cf_mark {
const segment& s;
@@ -365,7 +446,7 @@ public:
segment(segment_manager* m, const descriptor& d, file && f, bool active)
: _segment_manager(m), _desc(std::move(d)), _file(std::move(f)), _sync_time(
clock_type::now())
clock_type::now()), _queue(0)
{
++_segment_manager->totals.segments_created;
logger.debug("Created new {} segment {}", active ? "active" : "reserve", *this);
@@ -383,9 +464,19 @@ public:
}
}
bool is_schema_version_known(schema_ptr s) {
return _known_schema_versions.count(s->version());
}
void add_schema_version(schema_ptr s) {
_known_schema_versions.emplace(s->version());
}
void forget_schema_versions() {
_known_schema_versions.clear();
}
bool must_sync() {
if (_segment_manager->cfg.mode == sync_mode::BATCH) {
return true;
return false;
}
auto now = clock_type::now();
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
@@ -401,8 +492,9 @@ public:
*/
future<sseg_ptr> finish_and_get_new() {
_closed = true;
sync();
return _segment_manager->active_segment();
return maybe_wait_for_write(sync()).then([](sseg_ptr s) {
return s->_segment_manager->active_segment();
});
}
void reset_sync_time() {
_sync_time = clock_type::now();
@@ -417,7 +509,7 @@ public:
logger.trace("Sync not needed {}: ({} / {})", *this, position(), _flush_pos);
return make_ready_future<sseg_ptr>(shared_from_this());
}
return cycle().then([](auto seg) {
return cycle().then([](sseg_ptr seg) {
return seg->flush();
});
}
@@ -440,16 +532,14 @@ public:
// This is not 100% necessary; we really only need the ones below our flush pos,
// but since we pretty much assume that task ordering will make this the case anyway...
return _dwrite.write_lock().then(
return begin_flush().then(
[this, me, pos]() mutable {
_dwrite.write_unlock(); // release it already.
pos = std::max(pos, _file_pos);
if (pos <= _flush_pos) {
logger.trace("{} already synced! ({} < {})", *this, pos, _flush_pos);
return make_ready_future<sseg_ptr>(std::move(me));
}
_segment_manager->begin_op();
return _file.flush().then_wrapped([this, pos, me](auto f) {
return _file.flush().then_wrapped([this, pos, me](future<> f) {
try {
f.get();
// TODO: retry/ignore/fail/stop - optional behaviour in origin.
@@ -462,16 +552,50 @@ public:
logger.error("Failed to flush commits to disk: {}", std::current_exception());
throw;
}
}).finally([this, me] {
_segment_manager->end_op();
});
});
}).finally([this] {
end_flush();
});
}
/**
* Allocate a new buffer
*/
void new_buffer(size_t s) {
assert(_buffer.empty());
auto overhead = segment_overhead_size;
if (_file_pos == 0) {
overhead += descriptor_header_size;
}
auto a = align_up(s + overhead, alignment);
auto k = std::max(a, default_size);
for (;;) {
try {
_buffer = _segment_manager->acquire_buffer(k);
break;
} catch (std::bad_alloc&) {
logger.warn("Could not allocate {} k bytes output buffer ({} k required)", k / 1024, a / 1024);
if (k > a) {
k = std::max(a, k / 2);
logger.debug("Trying reduced size: {} k", k / 1024);
continue;
}
throw;
}
}
_buf_pos = overhead;
auto * p = reinterpret_cast<uint32_t *>(_buffer.get_write());
std::fill(p, p + overhead, 0);
_segment_manager->totals.total_size += k;
}
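
The allocation loop in new_buffer() degrades gracefully under memory pressure: start at the preferred size, halve toward the minimum aligned size on std::bad_alloc, and rethrow only once even the minimum fails. The retry skeleton in isolation; acquire is a placeholder for acquire_buffer():

#include <algorithm>
#include <cstddef>
#include <new>

template <typename AcquireFn>
auto allocate_with_fallback(size_t minimum, size_t preferred, AcquireFn acquire) {
    for (size_t k = preferred;;) {
        try {
            return acquire(k);                 // may throw std::bad_alloc
        } catch (const std::bad_alloc&) {
            if (k <= minimum) {
                throw;                         // already at the floor: give up
            }
            k = std::max(minimum, k / 2);      // halve, but never below the floor
        }
    }
}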
/**
* Send any buffer contents to disk and get a new tmp buffer
*/
// See class comment for info
future<sseg_ptr> cycle(size_t s = 0) {
future<sseg_ptr> cycle() {
auto size = clear_buffer_slack();
auto buf = std::move(_buffer);
auto off = _file_pos;
@@ -479,36 +603,6 @@ public:
_file_pos += size;
_buf_pos = 0;
// if we need new buffer, get one.
// TODO: keep a queue of available buffers?
if (s > 0) {
auto overhead = segment_overhead_size;
if (_file_pos == 0) {
overhead += descriptor_header_size;
}
auto a = align_up(s + overhead, alignment);
auto k = std::max(a, default_size);
for (;;) {
try {
_buffer = _segment_manager->acquire_buffer(k);
break;
} catch (std::bad_alloc&) {
logger.warn("Could not allocate {} k bytes output buffer ({} k required)", k / 1024, a / 1024);
if (k > a) {
k = std::max(a, k / 2);
logger.debug("Trying reduced size: {} k", k / 1024);
continue;
}
throw;
}
}
_buf_pos = overhead;
auto * p = reinterpret_cast<uint32_t *>(_buffer.get_write());
std::fill(p, p + overhead, 0);
_segment_manager->totals.total_size += k;
}
auto me = shared_from_this();
assert(!me.owned());
@@ -545,13 +639,15 @@ public:
out.write(uint32_t(_file_pos));
out.write(crc.checksum());
forget_schema_versions();
// acquire read lock
return _dwrite.read_lock().then([this, size, off, buf = std::move(buf), me]() mutable {
return begin_write().then([this, size, off, buf = std::move(buf), me]() mutable {
auto written = make_lw_shared<size_t>(0);
auto p = buf.get();
_segment_manager->begin_op();
return repeat([this, size, off, written, p]() mutable {
return _file.dma_write(off + *written, p + *written, size - *written).then_wrapped([this, size, written](auto&& f) {
auto&& priority_class = service::get_local_commitlog_priority();
return _file.dma_write(off + *written, p + *written, size - *written, priority_class).then_wrapped([this, size, written](future<size_t>&& f) {
try {
auto bytes = std::get<0>(f.get());
*written += bytes;
@@ -575,20 +671,59 @@ public:
});
}).finally([this, buf = std::move(buf)]() mutable {
_segment_manager->release_buffer(std::move(buf));
_segment_manager->end_op();
});
}).then([me] {
return make_ready_future<sseg_ptr>(std::move(me));
}).finally([me, this]() {
_dwrite.read_unlock(); // release
end_write(); // release
});
}
future<sseg_ptr> maybe_wait_for_write(future<sseg_ptr> f) {
if (_segment_manager->should_wait_for_write()) {
++_write_waiters;
logger.trace("Too many pending writes. Must wait.");
return f.finally([this] {
if (--_write_waiters == 0) {
_queue.signal(_queue.waiters());
}
});
}
return make_ready_future<sseg_ptr>(shared_from_this());
}
/**
* If an allocation causes a write, and the write causes a block,
* any allocations after that point need to wait for it to finish;
* otherwise we will just continue building up more write queue
* (and lose more ordering) eventually.
*
* Some caution here: since maybe_wait_for_write actually
* releases _all_ queued-up ops when finishing, we could get
* "bursts" of alloc->write, causing build-ups anyway.
* This should be measured properly. For now I am hoping this
* will work out as these should "block as a group". However,
* buffer memory usage might grow...
*/
bool must_wait_for_alloc() {
return _write_waiters > 0;
}
future<sseg_ptr> wait_for_alloc() {
auto me = shared_from_this();
++_segment_manager->totals.pending_allocations;
logger.trace("Previous allocation is blocking. Must wait.");
return _queue.wait().then([me] { // TODO: do we need a finally?
--me->_segment_manager->totals.pending_allocations;
return make_ready_future<sseg_ptr>(me);
});
}
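
maybe_wait_for_write() and wait_for_alloc() together release parked allocations as a group: when the last blocked writer drains, _queue.signal(_queue.waiters()) wakes every waiter at once, exactly the behaviour the comment above warns may cause bursts. A counters-only condensation of that release rule (no futures, illustrative only):

struct alloc_backpressure {
    unsigned write_waiters = 0;  // mirrors _write_waiters
    unsigned parked = 0;         // allocations queued behind blocked writes
    unsigned released = 0;

    bool must_wait() const { return write_waiters > 0; }

    void write_blocked()  { ++write_waiters; }
    void write_finished() {
        if (--write_waiters == 0) {
            released += parked;  // mirrors _queue.signal(_queue.waiters())
            parked = 0;          // everyone proceeds together ("block as a group")
        }
    }
};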
/**
* Add a "mutation" to the segment.
*/
future<replay_position> allocate(const cf_id_type& id, size_t size,
serializer_func func) {
future<replay_position> allocate(const cf_id_type& id, shared_ptr<entry_writer> writer) {
const auto size = writer->size(*this);
const auto s = size + entry_overhead_size; // total size
if (s > _segment_manager->max_mutation_size) {
return make_exception_future<replay_position>(
@@ -597,23 +732,26 @@ public:
+ " bytes is too large for the maxiumum size of "
+ std::to_string(_segment_manager->max_mutation_size)));
}
// would we make the file too big?
for (;;) {
if (position() + s > _segment_manager->max_size) {
// do this in next segment instead.
return finish_and_get_new().then(
[id, size, func = std::move(func)](auto new_seg) {
return new_seg->allocate(id, size, func);
});
}
// enough data?
if (s > (_buffer.size() - _buf_pos)) {
// TODO: if we have too many writes running, maybe we should
// wait for this?
cycle(s);
continue; // re-check file size overflow
}
break;
std::experimental::optional<future<sseg_ptr>> op;
if (must_sync()) {
op = sync();
} else if (must_wait_for_alloc()) {
op = wait_for_alloc();
} else if (!is_still_allocating() || position() + s > _segment_manager->max_size) { // would we make the file too big?
// do this in next segment instead.
op = finish_and_get_new();
} else if (_buffer.empty()) {
new_buffer(s);
} else if (s > (_buffer.size() - _buf_pos)) { // enough data?
op = maybe_wait_for_write(cycle());
}
if (op) {
return op->then([id, writer = std::move(writer)] (sseg_ptr new_seg) mutable {
return new_seg->allocate(id, std::move(writer));
});
}
_gate.enter(); // this might throw. I guess we accept this?
@@ -634,7 +772,7 @@ public:
out.write(crc.checksum());
// actual data
func(out);
writer->write(*this, out);
crc.process_bytes(p + 2 * sizeof(uint32_t), size);
@@ -645,9 +783,8 @@ public:
_gate.leave();
// finally, check if we're required to sync.
if (must_sync()) {
return sync().then([rp](auto seg) {
if (_segment_manager->cfg.mode == sync_mode::BATCH) {
return sync().then([rp](sseg_ptr) {
return make_ready_future<replay_position>(rp);
});
}
@@ -736,7 +873,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
return make_ready_future<std::experimental::optional<directory_entry_type>>(de.type);
};
return entry_type(de).then([this, de](auto type) {
return entry_type(de).then([this, de](std::experimental::optional<directory_entry_type> type) {
if (type == directory_entry_type::regular && de.name[0] != '.') {
try {
_result.emplace_back(de.name);
@@ -753,7 +890,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
};
return engine().open_directory(dirname).then([this, dirname](auto dir) {
return engine().open_directory(dirname).then([this, dirname](file dir) {
auto h = make_lw_shared<helper>(std::move(dirname), std::move(dir));
return h->done().then([h]() {
return make_ready_future<std::vector<db::commitlog::descriptor>>(std::move(h->_result));
@@ -762,7 +899,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
future<> db::commitlog::segment_manager::init() {
return list_descriptors(cfg.commit_log_location).then([this](auto descs) {
return list_descriptors(cfg.commit_log_location).then([this](std::vector<descriptor> descs) {
segment_id_type id = std::chrono::duration_cast<std::chrono::milliseconds>(runtime::get_boot_time().time_since_epoch()).count() + 1;
for (auto& d : descs) {
id = std::max(id, replay_position(d.id).base_id());
@@ -832,9 +969,23 @@ scollectd::registrations db::commitlog::segment_manager::create_counters() {
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "queue_length", "pending_operations")
, make_typed(data_type::GAUGE, totals.pending_operations)
, per_cpu_plugin_instance, "queue_length", "pending_writes")
, make_typed(data_type::GAUGE, totals.pending_writes)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "queue_length", "pending_flushes")
, make_typed(data_type::GAUGE, totals.pending_flushes)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "total_operations", "write_limit_exceeded")
, make_typed(data_type::DERIVE, totals.write_limit_exceeded)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "total_operations", "flush_limit_exceeded")
, make_typed(data_type::DERIVE, totals.flush_limit_exceeded)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "memory", "total_size")
, make_typed(data_type::GAUGE, totals.total_size)
@@ -963,7 +1114,7 @@ std::ostream& db::operator<<(std::ostream& out, const db::replay_position& p) {
void db::commitlog::segment_manager::discard_unused_segments() {
logger.trace("Checking for unused segments ({} active)", _segments.size());
auto i = std::remove_if(_segments.begin(), _segments.end(), [=](auto s) {
auto i = std::remove_if(_segments.begin(), _segments.end(), [=](sseg_ptr s) {
if (s->can_delete()) {
logger.debug("Segment {} is unused", *s);
return true;
@@ -1057,7 +1208,7 @@ void db::commitlog::segment_manager::on_timer() {
return this->allocate_segment(false).then([this](sseg_ptr s) {
if (!_shutdown) {
// insertion sort.
auto i = std::upper_bound(_reserve_segments.begin(), _reserve_segments.end(), s, [](auto s1, auto s2) {
auto i = std::upper_bound(_reserve_segments.begin(), _reserve_segments.end(), s, [](sseg_ptr s1, sseg_ptr s2) {
const descriptor& d1 = s1->_desc;
const descriptor& d2 = s2->_desc;
return d1.id < d2.id;
@@ -1069,7 +1220,7 @@ void db::commitlog::segment_manager::on_timer() {
--_reserve_allocating;
});
});
}).handle_exception([](auto ep) {
}).handle_exception([](std::exception_ptr ep) {
logger.warn("Exception in segment reservation: {}", ep);
});
arm();
@@ -1086,6 +1237,19 @@ std::vector<sstring> db::commitlog::segment_manager::get_active_names() const {
return res;
}
uint64_t db::commitlog::segment_manager::get_num_dirty_segments() const {
return std::count_if(_segments.begin(), _segments.end(), [](sseg_ptr s) {
return !s->is_still_allocating() && !s->is_clean();
});
}
uint64_t db::commitlog::segment_manager::get_num_active_segments() const {
return std::count_if(_segments.begin(), _segments.end(), [](sseg_ptr s) {
return s->is_still_allocating();
});
}
db::commitlog::segment_manager::buffer_type db::commitlog::segment_manager::acquire_buffer(size_t s) {
auto i = _temp_buffers.begin();
auto e = _temp_buffers.end();
@@ -1128,8 +1292,44 @@ void db::commitlog::segment_manager::release_buffer(buffer_type&& b) {
*/
future<db::replay_position> db::commitlog::add(const cf_id_type& id,
size_t size, serializer_func func) {
return _segment_manager->active_segment().then([=](auto s) {
return s->allocate(id, size, std::move(func));
class serializer_func_entry_writer final : public entry_writer {
serializer_func _func;
size_t _size;
public:
serializer_func_entry_writer(size_t sz, serializer_func func)
: _func(std::move(func)), _size(sz)
{ }
virtual size_t size(segment&) override { return _size; }
virtual void write(segment&, output& out) override {
_func(out);
}
};
auto writer = ::make_shared<serializer_func_entry_writer>(size, std::move(func));
return _segment_manager->active_segment().then([id, writer] (auto s) {
return s->allocate(id, writer);
});
}
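
For callers, add() keeps the pre-existing surface: an id, a payload size, and a function that renders the payload into a data_output. A usage sketch patterned on the add_mutation() call site replaced earlier in this diff; append_blob is an illustrative wrapper, not an existing API, and cf_id_type is assumed to be the member typedef used in the signatures above:

future<db::replay_position> append_blob(db::commitlog& log,
                                        const db::commitlog::cf_id_type& id,
                                        bytes_view repr) {
    auto write_repr = [repr] (data_output& out) { out.write(repr.begin(), repr.end()); };
    return log.add(id, repr.size(), write_repr);
}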
future<db::replay_position> db::commitlog::add_entry(const cf_id_type& id, const commitlog_entry_writer& cew)
{
class cl_entry_writer final : public entry_writer {
commitlog_entry_writer _writer;
public:
cl_entry_writer(const commitlog_entry_writer& wr) : _writer(wr) { }
virtual size_t size(segment& seg) override {
_writer.set_with_schema(!seg.is_schema_version_known(_writer.schema()));
return _writer.size();
}
virtual void write(segment& seg, output& out) override {
if (_writer.with_schema()) {
seg.add_schema_version(_writer.schema());
}
_writer.write(out);
}
};
auto writer = ::make_shared<cl_entry_writer>(cew);
return _segment_manager->active_segment().then([id, writer] (auto s) {
return s->allocate(id, writer);
});
}
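
cl_entry_writer is where schema deduplication happens: size() is evaluated against the target segment and asks whether that segment already knows this schema version, sizing the entry with or without the schema accordingly; write() then records the version. Collapsed to its essence, the per-segment cache is a version set — a standalone sketch with simplified types (the real code splits the check across size() and write()):

#include <string>
#include <unordered_set>

struct segment_schema_cache {
    std::unordered_set<std::string> known;   // stand-in for table_schema_version

    // True when the entry must embed the full schema; records it as known.
    bool needs_schema(const std::string& version) {
        return known.insert(version).second;
    }
};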
@@ -1200,11 +1400,18 @@ future<> db::commitlog::shutdown() {
return _segment_manager->shutdown();
}
size_t db::commitlog::max_record_size() const {
return _segment_manager->max_mutation_size - segment::entry_overhead_size;
}
uint64_t db::commitlog::max_active_writes() const {
return _segment_manager->cfg.max_active_writes;
}
uint64_t db::commitlog::max_active_flushes() const {
return _segment_manager->cfg.max_active_flushes;
}
future<> db::commitlog::clear() {
return _segment_manager->clear();
}
@@ -1386,10 +1593,6 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
return skip(slack);
}
if (start_off > pos) {
return skip(size - entry_header_size);
}
return fin.read_exactly(size - entry_header_size).then([this, size, crc = std::move(crc), rp](temporary_buffer<char> buf) mutable {
advance(buf);
@@ -1459,7 +1662,28 @@ uint64_t db::commitlog::get_flush_count() const {
}
uint64_t db::commitlog::get_pending_tasks() const {
return _segment_manager->totals.pending_operations;
return _segment_manager->totals.pending_writes
+ _segment_manager->totals.pending_flushes;
}
uint64_t db::commitlog::get_pending_writes() const {
return _segment_manager->totals.pending_writes;
}
uint64_t db::commitlog::get_pending_flushes() const {
return _segment_manager->totals.pending_flushes;
}
uint64_t db::commitlog::get_pending_allocations() const {
return _segment_manager->totals.pending_allocations;
}
uint64_t db::commitlog::get_write_limit_exceeded_count() const {
return _segment_manager->totals.write_limit_exceeded;
}
uint64_t db::commitlog::get_flush_limit_exceeded_count() const {
return _segment_manager->totals.flush_limit_exceeded;
}
uint64_t db::commitlog::get_num_segments_created() const {
@@ -1470,6 +1694,14 @@ uint64_t db::commitlog::get_num_segments_destroyed() const {
return _segment_manager->totals.segments_destroyed;
}
uint64_t db::commitlog::get_num_dirty_segments() const {
return _segment_manager->get_num_dirty_segments();
}
uint64_t db::commitlog::get_num_active_segments() const {
return _segment_manager->get_num_active_segments();
}
future<std::vector<db::commitlog::descriptor>> db::commitlog::list_existing_descriptors() const {
return list_existing_descriptors(active_config().commit_log_location);
}

View File

@@ -48,6 +48,7 @@
#include "core/stream.hh"
#include "utils/UUID.hh"
#include "replay_position.hh"
#include "commitlog_entry.hh"
class file;
@@ -114,6 +115,10 @@ public:
// Max number of segments to keep in pre-alloc reserve.
// Not (yet) configurable from scylla.conf.
uint64_t max_reserve_segments = 12;
// Max active writes/flushes. Default value
// zero means try to figure it out ourselves
uint64_t max_active_writes = 0;
uint64_t max_active_flushes = 0;
sync_mode mode = sync_mode::PERIODIC;
};
@@ -181,6 +186,13 @@ public:
});
}
/**
* Add an entry to the commit log.
*
* @param entry_writer a writer responsible for writing the entry
*/
future<replay_position> add_entry(const cf_id_type& id, const commitlog_entry_writer& entry_writer);
/**
* Modifies the per-CF dirty cursors of any commit log segments for the column family according to the position
* given. Discards any commit log segments that are no longer used.
@@ -233,14 +245,37 @@ public:
uint64_t get_completed_tasks() const;
uint64_t get_flush_count() const;
uint64_t get_pending_tasks() const;
uint64_t get_pending_writes() const;
uint64_t get_pending_flushes() const;
uint64_t get_pending_allocations() const;
uint64_t get_write_limit_exceeded_count() const;
uint64_t get_flush_limit_exceeded_count() const;
uint64_t get_num_segments_created() const;
uint64_t get_num_segments_destroyed() const;
/**
* Get number of inactive (finished) segments lingering
* due to still being dirty
*/
uint64_t get_num_dirty_segments() const;
/**
* Get number of active segments, i.e. still being allocated to
*/
uint64_t get_num_active_segments() const;
/**
* Returns the largest amount of data that can be written in a single "mutation".
*/
size_t max_record_size() const;
/**
* Return max allowed pending writes (per this shard)
*/
uint64_t max_active_writes() const;
/**
* Return max allowed pending flushes (per this shard)
*/
uint64_t max_active_flushes() const;
future<> clear();
const config& active_config() const;
@@ -283,6 +318,11 @@ public:
const sstring&, commit_load_reader_func, position_type = 0);
private:
commitlog(config);
struct entry_writer {
virtual size_t size(segment&) = 0;
virtual void write(segment&, output&) = 0;
};
};
}

View File

@@ -0,0 +1,88 @@
/*
* Copyright 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/optional>
#include "frozen_mutation.hh"
#include "schema.hh"
namespace stdx = std::experimental;
class commitlog_entry_writer {
schema_ptr _schema;
db::serializer<column_mapping> _column_mapping_serializer;
const frozen_mutation& _mutation;
bool _with_schema = true;
public:
commitlog_entry_writer(schema_ptr s, const frozen_mutation& fm)
: _schema(std::move(s)), _column_mapping_serializer(_schema->get_column_mapping()), _mutation(fm)
{ }
void set_with_schema(bool value) {
_with_schema = value;
}
bool with_schema() {
return _with_schema;
}
schema_ptr schema() const {
return _schema;
}
size_t size() const {
size_t size = data_output::serialized_size<bool>();
if (_with_schema) {
size += _column_mapping_serializer.size();
}
size += _mutation.representation().size();
return size;
}
void write(data_output& out) const {
out.write(_with_schema);
if (_with_schema) {
_column_mapping_serializer.write(out);
}
auto bv = _mutation.representation();
out.write(bv.begin(), bv.end());
}
};
class commitlog_entry_reader {
frozen_mutation _mutation;
stdx::optional<column_mapping> _column_mapping;
public:
commitlog_entry_reader(const temporary_buffer<char>& buffer)
: _mutation(bytes())
{
data_input in(buffer);
bool has_column_mapping = in.read<bool>();
if (has_column_mapping) {
_column_mapping = db::serializer<::column_mapping>::read(in);
}
auto bv = in.read_view(in.avail());
_mutation = frozen_mutation(bytes(bv.begin(), bv.end()));
}
const stdx::optional<column_mapping>& get_column_mapping() const { return _column_mapping; }
const frozen_mutation& mutation() const { return _mutation; }
};
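commitlog_entry_writer/commitlog_entry_reader above frame each entry as a leading bool ("schema included?"), an optional serialized column mapping, and the raw mutation bytes. A standalone sketch of that optional-header framing, with std::string payloads standing in for the column mapping and frozen mutation:

#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <utility>

// Frame: [u8 has_header][u32 header_len][header?][payload...]
static std::string encode(const std::optional<std::string>& header,
                          const std::string& payload) {
    std::string out;
    out.push_back(header ? 1 : 0);
    if (header) {
        uint32_t len = header->size();
        out.append(reinterpret_cast<const char*>(&len), sizeof(len));
        out.append(*header);
    }
    out.append(payload);
    return out;
}

static std::pair<std::optional<std::string>, std::string>
decode(const std::string& buf) {
    size_t pos = 1;
    std::optional<std::string> header;
    if (buf[0]) {
        uint32_t len;
        std::memcpy(&len, buf.data() + pos, sizeof(len));
        pos += sizeof(len);
        header = buf.substr(pos, len);
        pos += len;
    }
    return {header, buf.substr(pos)};
}

int main() {
    auto buf = encode(std::string("column-mapping"), "mutation-bytes");
    auto [header, payload] = decode(buf);
    assert(header && *header == "column-mapping");
    assert(payload == "mutation-bytes");
}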

View File

@@ -56,10 +56,14 @@
#include "db/serializer.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
#include "converting_mutation_partition_applier.hh"
#include "schema_registry.hh"
#include "commitlog_entry.hh"
static logging::logger logger("commitlog_replayer");
class db::commitlog_replayer::impl {
std::unordered_map<table_schema_version, column_mapping> _column_mappings;
public:
impl(seastar::sharded<cql3::query_processor>& db);
@@ -70,6 +74,19 @@ public:
uint64_t skipped_mutations = 0;
uint64_t applied_mutations = 0;
uint64_t corrupt_bytes = 0;
stats& operator+=(const stats& s) {
invalid_mutations += s.invalid_mutations;
skipped_mutations += s.skipped_mutations;
applied_mutations += s.applied_mutations;
corrupt_bytes += s.corrupt_bytes;
return *this;
}
stats operator+(const stats& s) const {
stats tmp = *this;
tmp += s;
return tmp;
}
};
future<> process(stats*, temporary_buffer<char> buf, replay_position rp);
@@ -148,8 +165,6 @@ future<> db::commitlog_replayer::impl::init() {
future<db::commitlog_replayer::impl::stats>
db::commitlog_replayer::impl::recover(sstring file) {
logger.info("Replaying {}", file);
replay_position rp{commitlog::descriptor(file)};
auto gp = _min_pos[rp.shard_id()];
@@ -182,19 +197,29 @@ db::commitlog_replayer::impl::recover(sstring file) {
}
future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char> buf, replay_position rp) {
auto shard = rp.shard_id();
if (rp < _min_pos[shard]) {
logger.trace("entry {} is less than global min position. skipping", rp);
s->skipped_mutations++;
return make_ready_future<>();
}
try {
frozen_mutation fm(bytes(reinterpret_cast<const int8_t *>(buf.get()), buf.size()));
commitlog_entry_reader cer(buf);
auto& fm = cer.mutation();
auto cm_it = _column_mappings.find(fm.schema_version());
if (cm_it == _column_mappings.end()) {
if (!cer.get_column_mapping()) {
throw std::runtime_error(sprint("unknown schema version %s", fm.schema_version()));
}
logger.debug("new schema version {} in entry {}", fm.schema_version(), rp);
cm_it = _column_mappings.emplace(fm.schema_version(), *cer.get_column_mapping()).first;
}
auto shard_id = rp.shard_id();
if (rp < _min_pos[shard_id]) {
logger.trace("entry {} is less than global min position. skipping", rp);
s->skipped_mutations++;
return make_ready_future<>();
}
auto uuid = fm.column_family_id();
auto& map = _rpm[shard];
auto& map = _rpm[shard_id];
auto i = map.find(uuid);
if (i != map.end() && rp <= i->second) {
logger.trace("entry {} at {} is younger than recorded replay position {}. skipping", fm.column_family_id(), rp, i->second);
@@ -203,14 +228,15 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
}
auto shard = _qp.local().db().local().shard_of(fm);
return _qp.local().db().invoke_on(shard, [fm = std::move(fm), rp, shard, s] (database& db) -> future<> {
return _qp.local().db().invoke_on(shard, [this, cer = std::move(cer), cm_it, rp, shard, s] (database& db) -> future<> {
auto& fm = cer.mutation();
// TODO: might need better verification that the deserialized mutation
// is schema compatible. My guess is that just applying the mutation
// will not do this.
auto& cf = db.find_column_family(fm.column_family_id());
if (logger.is_enabled(logging::log_level::debug)) {
logger.debug("replaying at {} {}:{} at {}", fm.column_family_id(),
logger.debug("replaying at {} v={} {}:{} at {}", fm.column_family_id(), fm.schema_version(),
cf.schema()->ks_name(), cf.schema()->cf_name(), rp);
}
// Removed forwarding "new" RP. Instead give none/empty.
@@ -218,7 +244,15 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
// The end result should be that once sstables are flushed out
// their "replay_position" attribute will be empty, which is
// lower than anything the new session will produce.
cf.apply(fm);
if (cf.schema()->version() != fm.schema_version()) {
const column_mapping& cm = cm_it->second;
mutation m(fm.decorated_key(*cf.schema()), cf.schema());
converting_mutation_partition_applier v(cm, *cf.schema(), m.partition());
fm.partition().accept(cm, v);
cf.apply(std::move(m));
} else {
cf.apply(fm, cf.schema());
}
s->applied_mutations++;
return make_ready_future<>();
}).handle_exception([s](auto ep) {
@@ -258,32 +292,41 @@ future<db::commitlog_replayer> db::commitlog_replayer::create_replayer(seastar::
}
future<> db::commitlog_replayer::recover(std::vector<sstring> files) {
return parallel_for_each(files, [this](auto f) {
return this->recover(f);
logger.info("Replaying {}", join(", ", files));
return map_reduce(files, [this](auto f) {
logger.debug("Replaying {}", f);
return _impl->recover(f).then([f](impl::stats stats) {
if (stats.corrupt_bytes != 0) {
logger.warn("Corrupted file: {}. {} bytes skipped.", f, stats.corrupt_bytes);
}
logger.debug("Log replay of {} complete, {} replayed mutations ({} invalid, {} skipped)"
, f
, stats.applied_mutations
, stats.invalid_mutations
, stats.skipped_mutations
);
return make_ready_future<impl::stats>(stats);
}).handle_exception([f](auto ep) -> future<impl::stats> {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.", f);
throw;
} catch (...) {
throw;
}
});
}, impl::stats(), std::plus<impl::stats>()).then([](impl::stats totals) {
logger.info("Log replay complete, {} replayed mutations ({} invalid, {} skipped)"
, totals.applied_mutations
, totals.invalid_mutations
, totals.skipped_mutations
);
});
}
future<> db::commitlog_replayer::recover(sstring f) {
return _impl->recover(f).then([f](impl::stats stats) {
if (stats.corrupt_bytes != 0) {
logger.warn("Corrupted file: {}. {} bytes skipped.", f, stats.corrupt_bytes);
}
logger.info("Log replay of {} complete, {} replayed mutations ({} invalid, {} skipped)"
, f
, stats.applied_mutations
, stats.invalid_mutations
, stats.skipped_mutations
);
}).handle_exception([f](auto ep) {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.");
throw;
} catch (...) {
throw;
}
});;
return recover(std::vector<sstring>{ f });
}
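recover() above depends on impl::stats being summable, so map_reduce can fold per-file counters into a single total. The same shape, sketched with std::accumulate over plain values instead of seastar futures (names here are illustrative):

#include <cstdint>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

struct stats {                     // mirrors impl::stats' summable design
    uint64_t invalid_mutations = 0;
    uint64_t skipped_mutations = 0;
    uint64_t applied_mutations = 0;

    stats& operator+=(const stats& s) {
        invalid_mutations += s.invalid_mutations;
        skipped_mutations += s.skipped_mutations;
        applied_mutations += s.applied_mutations;
        return *this;
    }
    stats operator+(const stats& s) const {
        stats tmp = *this;
        tmp += s;
        return tmp;
    }
};

int main() {
    std::vector<stats> per_file = {{0, 1, 10}, {2, 0, 5}, {0, 0, 7}};
    stats totals = std::accumulate(per_file.begin(), per_file.end(),
                                   stats{}, std::plus<stats>());
    std::cout << totals.applied_mutations << " replayed mutations ("
              << totals.invalid_mutations << " invalid, "
              << totals.skipped_mutations << " skipped)\n";
}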

View File

@@ -30,6 +30,7 @@
#include "core/shared_ptr.hh"
#include "core/fstream.hh"
#include "core/do_with.hh"
#include "core/print.hh"
#include "log.hh"
#include <boost/any.hpp>
@@ -432,3 +433,9 @@ boost::filesystem::path db::config::get_conf_dir() {
return confdir;
}
void db::config::check_experimental(const sstring& what) const {
if (!experimental()) {
throw std::runtime_error(sprint("%s is currently disabled. Start Scylla with --experimental=on to enable.", what));
}
}
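check_experimental() is a plain throwing guard: call sites for experimental features invoke it and let the exception propagate. A trimmed standalone sketch of the same guard, with std::string concatenation in place of sprint and a made-up feature name:

#include <iostream>
#include <stdexcept>
#include <string>

struct config {
    bool experimental = false;     // defaults to off, as in config.hh
    void check_experimental(const std::string& what) const {
        if (!experimental) {
            throw std::runtime_error(what +
                " is currently disabled. Start Scylla with --experimental=on to enable.");
        }
    }
};

int main() {
    config cfg;
    try {
        cfg.check_experimental("user defined functions");  // hypothetical feature
    } catch (const std::runtime_error& e) {
        std::cout << e.what() << '\n';
    }
}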

View File

@@ -102,6 +102,9 @@ public:
config();
// Throws exception if experimental feature is disabled.
void check_experimental(const sstring& what) const;
boost::program_options::options_description
get_options_description();
@@ -265,7 +268,7 @@ public:
"Counter writes read the current values before incrementing and writing them back. The recommended value is (16 × number_of_drives)." \
) \
/* Common automatic backup settings */ \
val(incremental_backups, bool, false, Unused, \
val(incremental_backups, bool, false, Used, \
"Backs up data updated since the last snapshot was taken. When enabled, Cassandra creates a hard link to each SSTable flushed or streamed locally in a backups/ subdirectory of the keyspace data. Removing these links is the operator's responsibility.\n" \
"Related information: Enabling incremental backups" \
) \
@@ -383,7 +386,7 @@ public:
"This setting has been removed from default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. When initializing a fresh cluster with no data, add auto_bootstrap: false.\n" \
"Related information: Initializing a multiple node cluster (single data center) and Initializing a multiple node cluster (multiple data centers)." \
) \
val(batch_size_warn_threshold_in_kb, uint32_t, 5, Unused, \
val(batch_size_warn_threshold_in_kb, uint32_t, 5, Used, \
"Log WARN on any batch size exceeding this value in kilobytes. Caution should be taken on increasing the size of this threshold as it can lead to node instability." \
) \
val(broadcast_address, sstring, /* listen_address */, Used, \
@@ -638,8 +641,8 @@ public:
) \
/* Security properties */ \
/* Server and client security settings. */ \
val(authenticator, sstring, "org.apache.cassandra.auth.AllowAllAuthenticator", Unused, \
"The authentication backend. It implements IAuthenticator, which is used to identify users. The available authenticators are:\n" \
val(authenticator, sstring, "org.apache.cassandra.auth.AllowAllAuthenticator", Used, \
"The authentication backend, used to identify users. The available authenticators are:\n" \
"\n" \
"\torg.apache.cassandra.auth.AllowAllAuthenticator : Disables authentication; no checks are performed.\n" \
"\torg.apache.cassandra.auth.PasswordAuthenticator : Authenticates users with user names and hashed passwords stored in the system_auth.credentials table. If you use the default, 1, and the node with the lone replica goes down, you will not be able to log into the cluster because the system_auth keyspace was not replicated.\n" \
@@ -690,7 +693,7 @@ public:
val(ssl_storage_port, uint32_t, 7001, Used, \
"The SSL port for encrypted communication. Unused unless enabled in encryption_options." \
) \
val(default_log_level, sstring, "warn", Used, \
val(default_log_level, sstring, "info", Used, \
"Default log level for log messages. Valid values are trace, debug, info, warn, error.") \
val(logger_log_level, string_map, /* none */, Used,\
"map of logger name to log level. Valid values are trace, debug, info, warn, error. " \
@@ -715,7 +718,10 @@ public:
val(replace_address_first_boot, sstring, "", Used, "Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.") \
val(override_decommission, bool, false, Used, "Set true to force a decommissioned node to join the cluster") \
val(ring_delay_ms, uint32_t, 30 * 1000, Used, "Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.") \
val(shutdown_announce_in_ms, uint32_t, 2 * 1000, Used, "Time a node waits after sending gossip shutdown message in milliseconds. Same as -Dcassandra.shutdown_announce_in_ms in cassandra.") \
val(developer_mode, bool, false, Used, "Relax environment checks. Setting to true can reduce performance and reliability significantly.") \
val(skip_wait_for_gossip_to_settle, int32_t, -1, Used, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.") \
val(experimental, bool, false, Used, "Set to true to unlock experimental features.") \
/* done! */
#define _make_value_member(name, type, deflt, status, desc, ...) \
@@ -732,5 +738,4 @@ private:
int _dummy;
};
}

View File

@@ -50,16 +50,20 @@ namespace db {
namespace marshal {
type_parser::type_parser(const sstring& str, size_t idx)
: _str{str}
type_parser::type_parser(sstring_view str, size_t idx)
: _str{str.begin(), str.end()}
, _idx{idx}
{ }
type_parser::type_parser(const sstring& str)
type_parser::type_parser(sstring_view str)
: type_parser{str, 0}
{ }
data_type type_parser::parse(const sstring& str) {
return type_parser(sstring_view(str)).parse();
}
data_type type_parser::parse(sstring_view str) {
return type_parser(str).parse();
}
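The type_parser change above widens the constructors from const sstring& to sstring_view, so callers can parse without first materializing an sstring. A minimal analogue using std::string_view; the parse() body is invented for illustration, the real one builds a data_type:

#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

class type_parser {
    std::string _str;              // parser keeps its own copy, as in the patch
    size_t _idx;
    type_parser(std::string_view str, size_t idx)
        : _str{str.begin(), str.end()}, _idx{idx} {}
public:
    explicit type_parser(std::string_view str) : type_parser{str, 0} {}
    // Illustrative only: the real parse() builds a data_type.
    std::string parse() const { return _str.substr(_idx); }
    static std::string parse(std::string_view str) {
        return type_parser(str).parse();
    }
};

int main() {
    std::string_view def = "org.apache.cassandra.db.marshal.UTF8Type";
    assert(type_parser::parse(def) == std::string(def));
}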

View File

@@ -62,14 +62,15 @@ class type_parser {
public static final TypeParser EMPTY_PARSER = new TypeParser("", 0);
#endif
type_parser(const sstring& str, size_t idx);
type_parser(sstring_view str, size_t idx);
public:
explicit type_parser(const sstring& str);
explicit type_parser(sstring_view str);
/**
* Parse a string containing an type definition.
*/
static data_type parse(const sstring& str);
static data_type parse(sstring_view str);
#if 0
public static AbstractType<?> parse(CharSequence compareWith) throws SyntaxException, ConfigurationException

View File

@@ -46,6 +46,7 @@
#include "system_keyspace.hh"
#include "query_context.hh"
#include "query-result-set.hh"
#include "query-result-writer.hh"
#include "schema_builder.hh"
#include "map_difference.hh"
#include "utils/UUID_gen.hh"
@@ -53,9 +54,12 @@
#include "core/thread.hh"
#include "json.hh"
#include "log.hh"
#include "frozen_schema.hh"
#include "schema_registry.hh"
#include "db/marshal/type_parser.hh"
#include "db/config.hh"
#include "md5_hasher.hh"
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/adaptor/map.hpp>
@@ -70,6 +74,36 @@ namespace schema_tables {
logging::logger logger("schema_tables");
struct qualified_name {
sstring keyspace_name;
sstring table_name;
qualified_name(sstring keyspace_name, sstring table_name)
: keyspace_name(std::move(keyspace_name))
, table_name(std::move(table_name))
{ }
qualified_name(const schema_ptr& s)
: keyspace_name(s->ks_name())
, table_name(s->cf_name())
{ }
bool operator<(const qualified_name& o) const {
return keyspace_name < o.keyspace_name
|| (keyspace_name == o.keyspace_name && table_name < o.table_name);
}
bool operator==(const qualified_name& o) const {
return keyspace_name == o.keyspace_name && table_name == o.table_name;
}
};
static future<schema_mutations> read_table_mutations(distributed<service::storage_proxy>& proxy, const qualified_name& table);
static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after);
std::vector<const char*> ALL { KEYSPACES, COLUMNFAMILIES, COLUMNS, TRIGGERS, USERTYPES, /* not present in 2.1.8: FUNCTIONS, AGGREGATES */ };
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
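The qualified_name struct introduced above sorts by keyspace first, then table, so a std::map<qualified_name, schema_mutations> naturally groups tables by keyspace. Its hand-written operator< is equivalent to the std::tie idiom, as this sketch (not the patch's code) shows:

#include <cassert>
#include <string>
#include <tuple>

struct qualified_name {
    std::string keyspace_name;
    std::string table_name;
    bool operator<(const qualified_name& o) const {
        // Equivalent to: ks < o.ks || (ks == o.ks && table < o.table)
        return std::tie(keyspace_name, table_name)
             < std::tie(o.keyspace_name, o.table_name);
    }
};

int main() {
    assert((qualified_name{"ks1", "t2"} < qualified_name{"ks2", "t1"}));
    assert((qualified_name{"ks1", "t1"} < qualified_name{"ks1", "t2"}));
}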
@@ -95,7 +129,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"keyspace definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::yes);
builder.with(schema_builder::compact_storage::yes);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return keyspaces;
}
@@ -147,7 +183,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"table definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return columnfamilies;
}
@@ -176,7 +214,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"column definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return columns;
}
@@ -200,7 +240,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"trigger definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return triggers;
}
@@ -225,7 +267,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined type definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return usertypes;
}
@@ -254,7 +298,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined function definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return functions;
}
@@ -283,7 +329,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined aggregate definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return aggregates;
}
@@ -295,10 +343,11 @@ future<> save_system_keyspace_schema() {
// delete old, possibly obsolete entries in schema tables
return parallel_for_each(ALL, [ksm] (sstring cf) {
return db::execute_cql("DELETE FROM system.%s WHERE keyspace_name = ?", cf, ksm->name()).discard_result();
auto deletion_timestamp = schema_creation_timestamp() - 1;
return db::execute_cql(sprint("DELETE FROM system.%%s USING TIMESTAMP %s WHERE keyspace_name = ?",
deletion_timestamp), cf, ksm->name()).discard_result();
}).then([ksm] {
// (+1 to timestamp to make sure we don't get shadowed by the tombstones we just added)
auto mvec = make_create_keyspace_mutations(ksm, qctx->next_timestamp(), true);
auto mvec = make_create_keyspace_mutations(ksm, schema_creation_timestamp(), true);
return qctx->proxy().mutate_locally(std::move(mvec));
});
}
@@ -326,36 +375,30 @@ future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>&
auto map = [&proxy] (sstring table) {
return db::system_keyspace::query_mutations(proxy, table).then([&proxy, table] (auto rs) {
auto s = proxy.local().get_db().local().find_schema(system_keyspace::NAME, table);
std::vector<query::result> results;
std::vector<mutation> mutations;
for (auto&& p : rs->partitions()) {
auto mut = p.mut().unfreeze(s);
auto partition_key = value_cast<sstring>(utf8_type->deserialize(mut.key().get_component(*s, 0)));
if (partition_key == system_keyspace::NAME) {
continue;
}
auto slice = partition_slice_builder(*s).build();
results.emplace_back(mut.query(slice));
mutations.emplace_back(std::move(mut));
}
return results;
return mutations;
});
};
auto reduce = [] (auto& hash, auto&& results) {
for (auto&& rs : results) {
for (auto&& f : rs.buf().fragments()) {
hash.Update(reinterpret_cast<const unsigned char*>(f.begin()), f.size());
}
auto reduce = [] (auto& hash, auto&& mutations) {
for (const mutation& m : mutations) {
feed_hash_for_schema_digest(hash, m);
}
return make_ready_future<>();
};
return do_with(CryptoPP::Weak::MD5{}, [map, reduce] (auto& hash) {
return do_with(md5_hasher(), [map, reduce] (auto& hash) {
return do_for_each(ALL.begin(), ALL.end(), [&hash, map, reduce] (auto& table) {
return map(table).then([&hash, reduce] (auto&& results) {
return reduce(hash, results);
return map(table).then([&hash, reduce] (auto&& mutations) {
reduce(hash, mutations);
});
}).then([&hash] {
bytes digest{bytes::initialized_later(), CryptoPP::Weak::MD5::DIGESTSIZE};
hash.Final(reinterpret_cast<unsigned char*>(digest.begin()));
return make_ready_future<utils::UUID>(utils::UUID_gen::get_name_UUID(digest));
return make_ready_future<utils::UUID>(utils::UUID_gen::get_name_UUID(hash.finalize()));
});
});
}
@@ -398,6 +441,28 @@ read_schema_for_keyspaces(distributed<service::storage_proxy>& proxy, const sstr
return map_reduce(keyspace_names.begin(), keyspace_names.end(), map, schema_result{}, insert);
}
static
future<mutation> query_partition_mutation(service::storage_proxy& proxy,
schema_ptr s,
lw_shared_ptr<query::read_command> cmd,
partition_key pkey)
{
auto dk = dht::global_partitioner().decorate_key(*s, pkey);
return do_with(query::partition_range::make_singular(dk), [&proxy, dk, s = std::move(s), cmd = std::move(cmd)] (auto& range) {
return proxy.query_mutations_locally(s, std::move(cmd), range)
.then([dk = std::move(dk), s](foreign_ptr<lw_shared_ptr<reconcilable_result>> res) {
auto&& partitions = res->partitions();
if (partitions.size() == 0) {
return mutation(std::move(dk), s);
} else if (partitions.size() == 1) {
return partitions[0].mut().unfreeze(s);
} else {
assert(false && "Results must have at most one partition");
}
});
});
}
future<schema_result_value_type>
read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name)
{
@@ -409,16 +474,18 @@ read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, c
});
}
future<schema_result_value_type>
future<mutation>
read_schema_partition_for_table(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name, const sstring& table_name)
{
auto schema = proxy.local().get_db().local().find_schema(system_keyspace::NAME, schema_table_name);
auto keyspace_key = dht::global_partitioner().decorate_key(*schema,
partition_key::from_singular(*schema, keyspace_name));
auto clustering_range = query::clustering_range(clustering_key_prefix::from_clustering_prefix(*schema, exploded_clustering_prefix({utf8_type->decompose(table_name)})));
return db::system_keyspace::query(proxy, schema_table_name, keyspace_key, clustering_range).then([keyspace_name] (auto&& rs) {
return schema_result_value_type{keyspace_name, std::move(rs)};
});
auto keyspace_key = partition_key::from_singular(*schema, keyspace_name);
auto clustering_range = query::clustering_range(clustering_key_prefix::from_clustering_prefix(
*schema, exploded_clustering_prefix({utf8_type->decompose(table_name)})));
auto slice = partition_slice_builder(*schema)
.with_range(std::move(clustering_range))
.build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(), std::move(slice), query::max_rows);
return query_partition_mutation(proxy.local(), std::move(schema), std::move(cmd), std::move(keyspace_key));
}
static semaphore the_merge_lock;
@@ -452,7 +519,7 @@ future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mu
}
future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations, bool do_flush)
{
{
return merge_lock().then([&proxy, mutations = std::move(mutations), do_flush] () mutable {
return do_merge_schema(proxy, std::move(mutations), do_flush);
}).finally([] {
@@ -460,6 +527,35 @@ future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mu
});
}
// Returns names of live table definitions of given keyspace
future<std::vector<sstring>>
static read_table_names_of_keyspace(distributed<service::storage_proxy>& proxy, const sstring& keyspace_name) {
auto s = columnfamilies();
auto pkey = dht::global_partitioner().decorate_key(*s, partition_key::from_singular(*s, keyspace_name));
return db::system_keyspace::query(proxy, COLUMNFAMILIES, pkey).then([] (auto&& rs) {
std::vector<sstring> result;
for (const query::result_set_row& row : rs->rows()) {
result.emplace_back(row.get_nonnull<sstring>("columnfamily_name"));
}
return result;
});
}
// Call inside a seastar thread
static
std::map<qualified_name, schema_mutations>
read_tables_for_keyspaces(distributed<service::storage_proxy>& proxy, const std::set<sstring>& keyspace_names)
{
std::map<qualified_name, schema_mutations> result;
for (auto&& keyspace_name : keyspace_names) {
for (auto&& table_name : read_table_names_of_keyspace(proxy, keyspace_name).get0()) {
auto qn = qualified_name(keyspace_name, table_name);
result.emplace(qn, read_table_mutations(proxy, qn).get0());
}
}
return result;
}
future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations, bool do_flush)
{
return seastar::async([&proxy, mutations = std::move(mutations), do_flush] () mutable {
@@ -474,7 +570,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// current state of the schema
auto&& old_keyspaces = read_schema_for_keyspaces(proxy, KEYSPACES, keyspaces).get0();
auto&& old_column_families = read_schema_for_keyspaces(proxy, COLUMNFAMILIES, keyspaces).get0();
auto&& old_column_families = read_tables_for_keyspaces(proxy, keyspaces);
/*auto& old_types = */read_schema_for_keyspaces(proxy, USERTYPES, keyspaces).get0();
#if 0 // not in 2.1.8
/*auto& old_functions = */read_schema_for_keyspaces(proxy, FUNCTIONS, keyspaces).get0();
@@ -494,7 +590,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// with new data applied
auto&& new_keyspaces = read_schema_for_keyspaces(proxy, KEYSPACES, keyspaces).get0();
auto&& new_column_families = read_schema_for_keyspaces(proxy, COLUMNFAMILIES, keyspaces).get0();
auto&& new_column_families = read_tables_for_keyspaces(proxy, keyspaces);
/*auto& new_types = */read_schema_for_keyspaces(proxy, USERTYPES, keyspaces).get0();
#if 0 // not in 2.1.8
/*auto& new_functions = */read_schema_for_keyspaces(proxy, FUNCTIONS, keyspaces).get0();
@@ -502,7 +598,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
#endif
std::set<sstring> keyspaces_to_drop = merge_keyspaces(proxy, std::move(old_keyspaces), std::move(new_keyspaces)).get0();
merge_tables(proxy, std::move(old_column_families), std::move(new_column_families)).get0();
merge_tables(proxy, std::move(old_column_families), std::move(new_column_families));
#if 0
mergeTypes(oldTypes, newTypes);
mergeFunctions(oldFunctions, newFunctions);
@@ -512,15 +608,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// it is safe to drop a keyspace only when all nested ColumnFamilies were deleted
for (auto&& keyspace_to_drop : keyspaces_to_drop) {
db.drop_keyspace(keyspace_to_drop);
}
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
return do_for_each(keyspaces_to_drop, [] (auto& ks_name) {
return service::migration_manager::notify_drop_keyspace(ks_name);
});
} else {
return make_ready_future<>();
service::get_local_migration_manager().notify_drop_keyspace(keyspace_to_drop);
}
}).get0();
});
@@ -551,138 +639,84 @@ future<std::set<sstring>> merge_keyspaces(distributed<service::storage_proxy>& p
}
for (auto&& key : diff.entries_only_on_right) {
auto&& value = after[key];
if (!value->empty()) {
created.emplace_back(schema_result_value_type{key, std::move(value)});
}
created.emplace_back(schema_result_value_type{key, std::move(value)});
}
for (auto&& key : diff.entries_differing) {
sstring keyspace_name = key;
auto&& pre = before[key];
auto&& post = after[key];
if (!pre->empty() && !post->empty()) {
altered.emplace_back(keyspace_name);
} else if (!pre->empty()) {
dropped.emplace(keyspace_name);
} else if (!post->empty()) { // a (re)created keyspace
created.emplace_back(schema_result_value_type{key, std::move(post)});
}
altered.emplace_back(key);
}
return do_with(std::move(created), [&proxy, altered = std::move(altered)] (auto& created) {
return proxy.local().get_db().invoke_on_all([&created, altered = std::move(altered)] (database& db) {
return do_for_each(created, [&db] (auto&& val) {
return do_for_each(created, [&db](auto&& val) {
auto ksm = create_keyspace_from_schema_partition(val);
return db.create_keyspace(std::move(ksm));
return db.create_keyspace(ksm).then([ksm] {
service::get_local_migration_manager().notify_create_keyspace(ksm);
});
}).then([&altered, &db] () mutable {
for (auto&& name : altered) {
db.update_keyspace(name);
}
return make_ready_future<>();
});
}).then([&created] {
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
return do_for_each(created, [] (auto&& partition) {
auto ksm = create_keyspace_from_schema_partition(partition);
return service::migration_manager::notify_create_keyspace(ksm);
});
} else {
return make_ready_future<>();
}
});
}).then([dropped = std::move(dropped)] () {
return make_ready_future<std::set<sstring>>(dropped);
});
}
static void update_column_family(database& db, schema_ptr new_schema) {
column_family& cfm = db.find_column_family(new_schema->id());
bool columns_changed = !cfm.schema()->equal_columns(*new_schema);
auto s = local_schema_registry().learn(new_schema);
s->registry_entry()->mark_synced();
cfm.set_schema(std::move(s));
service::get_local_migration_manager().notify_update_column_family(cfm.schema(), columns_changed);
}
// see the comments for merge_keyspaces()
future<> merge_tables(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after)
static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after)
{
return do_with(std::make_pair(std::move(after), std::move(before)), [&proxy] (auto& pair) {
auto& after = pair.first;
auto& before = pair.second;
auto changed_at = db_clock::now();
return proxy.local().get_db().invoke_on_all([changed_at, &proxy, &before, &after] (database& db) {
return seastar::async([changed_at, &proxy, &db, &before, &after] {
std::vector<schema_ptr> created;
std::vector<schema_ptr> altered;
std::vector<schema_ptr> dropped;
auto diff = difference(before, after, [](const auto& x, const auto& y) -> bool {
return *x == *y;
auto changed_at = db_clock::now();
std::vector<global_schema_ptr> created;
std::vector<global_schema_ptr> altered;
std::vector<global_schema_ptr> dropped;
auto diff = difference(before, after);
for (auto&& key : diff.entries_only_on_left) {
auto&& s = proxy.local().get_db().local().find_schema(key.keyspace_name, key.table_name);
dropped.emplace_back(s);
}
for (auto&& key : diff.entries_only_on_right) {
created.emplace_back(create_table_from_mutations(after.at(key)));
}
for (auto&& key : diff.entries_differing) {
altered.emplace_back(create_table_from_mutations(after.at(key)));
}
proxy.local().get_db().invoke_on_all([&created, &dropped, &altered, changed_at] (database& db) {
return seastar::async([&] {
for (auto&& gs : created) {
schema_ptr s = gs.get();
auto& ks = db.find_keyspace(s->ks_name());
auto cfg = ks.make_column_family_config(*s);
db.add_column_family(s, cfg);
ks.make_directory_for_column_family(s->cf_name(), s->id()).get();
service::get_local_migration_manager().notify_create_column_family(s);
}
for (auto&& gs : altered) {
update_column_family(db, gs.get());
}
parallel_for_each(dropped.begin(), dropped.end(), [changed_at, &db](auto&& gs) {
schema_ptr s = gs.get();
return db.drop_column_family(changed_at, s->ks_name(), s->cf_name()).then([s] {
service::get_local_migration_manager().notify_drop_column_family(s);
});
for (auto&& key : diff.entries_only_on_left) {
auto&& rs = before[key];
for (const query::result_set_row& row : rs->rows()) {
auto ks_name = row.get_nonnull<sstring>("keyspace_name");
auto cf_name = row.get_nonnull<sstring>("columnfamily_name");
dropped.emplace_back(db.find_schema(ks_name, cf_name));
}
}
for (auto&& key : diff.entries_only_on_right) {
auto&& value = after[key];
if (!value->empty()) {
auto&& tables = create_tables_from_tables_partition(proxy, value).get0();
boost::copy(tables | boost::adaptors::map_values, std::back_inserter(created));
}
}
for (auto&& key : diff.entries_differing) {
sstring keyspace_name = key;
auto&& pre = before[key];
auto&& post = after[key];
if (!pre->empty() && !post->empty()) {
auto before = db.find_keyspace(keyspace_name).metadata()->cf_meta_data();
auto after = create_tables_from_tables_partition(proxy, post).get0();
auto delta = difference(std::map<sstring, schema_ptr>{before.begin(), before.end()}, after, [](const schema_ptr& x, const schema_ptr& y) -> bool {
return *x == *y;
});
for (auto&& key : delta.entries_only_on_left) {
dropped.emplace_back(before[key]);
}
for (auto&& key : delta.entries_only_on_right) {
created.emplace_back(after[key]);
}
for (auto&& key : delta.entries_differing) {
altered.emplace_back(after[key]);
}
} else if (!pre->empty()) {
auto before = db.find_keyspace(keyspace_name).metadata()->cf_meta_data();
boost::copy(before | boost::adaptors::map_values, std::back_inserter(dropped));
} else if (!post->empty()) {
auto tables = create_tables_from_tables_partition(proxy, post).get0();
boost::copy(tables | boost::adaptors::map_values, std::back_inserter(created));
}
}
for (auto&& cfm : created) {
auto& ks = db.find_keyspace(cfm->ks_name());
auto cfg = ks.make_column_family_config(*cfm);
db.add_column_family(cfm, cfg);
}
parallel_for_each(altered.begin(), altered.end(), [&db] (auto&& cfm) {
return db.update_column_family(cfm->ks_name(), cfm->cf_name());
}).get();
parallel_for_each(dropped.begin(), dropped.end(), [changed_at, &db] (auto&& cfm) {
return db.drop_column_family(changed_at, cfm->ks_name(), cfm->cf_name());
}).get();
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
for (auto&& cfm : created) {
service::migration_manager::notify_create_column_family(cfm).get0();
auto& ks = db.find_keyspace(cfm->ks_name());
ks.make_directory_for_column_family(cfm->cf_name(), cfm->id());
}
for (auto&& cfm : dropped) {
service::migration_manager::notify_drop_column_family(cfm).get0();
}
}
});
}).get();
});
});
}).get();
}
#if 0
@@ -871,7 +905,7 @@ std::vector<mutation> make_create_keyspace_mutations(lw_shared_ptr<keyspace_meta
addTypeToSchemaMutation(type, timestamp, mutation);
#endif
for (auto&& kv : keyspace->cf_meta_data()) {
add_table_to_schema_mutation(kv.second, timestamp, true, pkey, mutations);
add_table_to_schema_mutation(kv.second, timestamp, true, mutations);
}
}
return mutations;
@@ -997,17 +1031,19 @@ std::vector<mutation> make_create_table_mutations(lw_shared_ptr<keyspace_metadat
{
// Include the serialized keyspace in case the target node missed a CREATE KEYSPACE migration (see CASSANDRA-5631).
auto mutations = make_create_keyspace_mutations(keyspace, timestamp, false);
schema_ptr s = keyspaces();
auto pkey = partition_key::from_singular(*s, keyspace->name());
add_table_to_schema_mutation(table, timestamp, true, pkey, mutations);
add_table_to_schema_mutation(table, timestamp, true, mutations);
return mutations;
}
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, const partition_key& pkey, std::vector<mutation>& mutations)
schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers)
{
// When adding new schema properties, don't set cells for default values so that
// both old and new nodes will see the same version during rolling upgrades.
// For properties that can be null (and can be changed), we insert tombstones, to make sure
// we don't keep a property the user has removed
schema_ptr s = columnfamilies();
auto pkey = partition_key::from_singular(*s, table->ks_name());
mutation m{pkey, s};
auto ckey = clustering_key::from_singular(*s, table->cf_name());
m.set_clustered_cell(ckey, "cf_id", table->id(), timestamp);
@@ -1066,16 +1102,24 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
if (table->compact_columns_count() == 1) {
m.set_clustered_cell(ckey, "value_alias", table->compact_column().name_as_text(), timestamp);
} // null if none
#if 0
for (Map.Entry<ColumnIdentifier, Long> entry : table.getDroppedColumns().entrySet())
adder.addMapEntry("dropped_columns", entry.getKey().toString(), entry.getValue());
#endif
map_type_impl::mutation dropped_columns;
auto dropped_columns_column = s->get_column_definition("dropped_columns");
assert(dropped_columns_column);
auto dropped_columns_type = static_pointer_cast<const map_type_impl>(dropped_columns_column->type);
for (auto&& entry : table->dropped_columns()) {
dropped_columns.cells.emplace_back(dropped_columns_type->get_keys_type()->decompose(data_value(entry.first)),
atomic_cell::make_live(timestamp, dropped_columns_type->get_values_type()->decompose(entry.second)));
}
m.set_clustered_cell(ckey, *dropped_columns_column,
atomic_cell_or_collection::from_collection_mutation(dropped_columns_type->serialize_mutation_form(std::move(dropped_columns))));
m.set_clustered_cell(ckey, "is_dense", table->is_dense(), timestamp);
mutation columns_mutation(pkey, columns());
if (with_columns_and_triggers) {
for (auto&& column : table->all_columns_in_select_order()) {
add_column_to_schema_mutation(table, column, timestamp, pkey, mutations);
add_column_to_schema_mutation(table, column, timestamp, columns_mutation);
}
#if 0
@@ -1083,42 +1127,51 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
addTriggerToSchemaMutation(table, trigger, timestamp, mutation);
#endif
}
mutations.emplace_back(std::move(m));
return schema_mutations{std::move(m), std::move(columns_mutation)};
}
#if 0
public static Mutation makeUpdateTableMutation(KSMetaData keyspace,
CFMetaData oldTable,
CFMetaData newTable,
long timestamp,
boolean fromThrift)
{
Mutation mutation = makeCreateKeyspaceMutation(keyspace, timestamp, false);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, std::vector<mutation>& mutations)
{
make_table_mutations(table, timestamp, with_columns_and_triggers).copy_to(mutations);
}
addTableToSchemaMutation(newTable, timestamp, false, mutation);
std::vector<mutation> make_update_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace,
schema_ptr old_table,
schema_ptr new_table,
api::timestamp_type timestamp,
bool from_thrift)
{
// Include the serialized keyspace in case the target node missed a CREATE KEYSPACE migration (see CASSANDRA-5631).
auto mutations = make_create_keyspace_mutations(keyspace, timestamp, false);
MapDifference<ByteBuffer, ColumnDefinition> columnDiff = Maps.difference(oldTable.getColumnMetadata(),
newTable.getColumnMetadata());
add_table_to_schema_mutation(new_table, timestamp, false, mutations);
// columns that are no longer needed
for (ColumnDefinition column : columnDiff.entriesOnlyOnLeft().values())
{
// Thrift only knows about the REGULAR ColumnDefinition type, so don't assume
// columns of other types are being deleted just because they are not here.
if (fromThrift && column.kind != ColumnDefinition.Kind.REGULAR)
continue;
mutation columns_mutation(partition_key::from_singular(*columns(), old_table->ks_name()), columns());
dropColumnFromSchemaMutation(oldTable, column, timestamp, mutation);
auto diff = difference(old_table->all_columns(), new_table->all_columns());
// columns that are no longer needed
for (auto&& name : diff.entries_only_on_left) {
// Thrift only knows about the REGULAR ColumnDefinition type, so don't assume
// columns of other types are being deleted just because they are not here.
const column_definition& column = *old_table->all_columns().at(name);
if (from_thrift && !column.is_regular()) {
continue;
}
// newly added columns
for (ColumnDefinition column : columnDiff.entriesOnlyOnRight().values())
addColumnToSchemaMutation(newTable, column, timestamp, mutation);
drop_column_from_schema_mutation(old_table, column, timestamp, mutations);
}
// old columns with updated attributes
for (ByteBuffer name : columnDiff.entriesDiffering().keySet())
addColumnToSchemaMutation(newTable, newTable.getColumnDefinition(name), timestamp, mutation);
// newly added columns and old columns with updated attributes
for (auto&& name : boost::range::join(diff.entries_differing, diff.entries_only_on_right)) {
const column_definition& column = *new_table->all_columns().at(name);
add_column_to_schema_mutation(new_table, column, timestamp, columns_mutation);
}
mutations.emplace_back(std::move(columns_mutation));
warn(unimplemented::cause::TRIGGERS);
#if 0
MapDifference<String, TriggerDefinition> triggerDiff = Maps.difference(oldTable.getTriggers(), newTable.getTriggers());
// dropped triggers
@@ -1129,9 +1182,9 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
for (TriggerDefinition trigger : triggerDiff.entriesOnlyOnRight().values())
addTriggerToSchemaMutation(newTable, trigger, timestamp, mutation);
return mutation;
}
#endif
return mutations;
}
std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp)
{
@@ -1159,13 +1212,39 @@ std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata>
return mutations;
}
static future<schema_mutations> read_table_mutations(distributed<service::storage_proxy>& proxy, const qualified_name& table)
{
return read_schema_partition_for_table(proxy, COLUMNFAMILIES, table.keyspace_name, table.table_name)
.then([&proxy, table] (mutation cf_m) {
return read_schema_partition_for_table(proxy, COLUMNS, table.keyspace_name, table.table_name)
.then([cf_m = std::move(cf_m)] (mutation col_m) {
return schema_mutations{std::move(cf_m), std::move(col_m)};
});
#if 0
// FIXME:
Row serializedTriggers = readSchemaPartitionForTable(TRIGGERS, ksName, cfName);
try
{
for (TriggerDefinition trigger : createTriggersFromTriggersPartition(serializedTriggers))
cfm.addTriggerDefinition(trigger);
}
catch (InvalidRequestException e)
{
throw new RuntimeException(e);
}
#endif
});
}
future<schema_ptr> create_table_from_name(distributed<service::storage_proxy>& proxy, const sstring& keyspace, const sstring& table)
{
return read_schema_partition_for_table(proxy, COLUMNFAMILIES, keyspace, table).then([&proxy, keyspace, table] (auto partition) {
if (partition.second->empty()) {
throw std::runtime_error(sprint("%s:%s not found in the schema definitions keyspace.", keyspace, table));
}
return create_table_from_table_partition(proxy, std::move(partition.second));
return do_with(qualified_name(keyspace, table), [&proxy] (auto&& qn) {
return read_table_mutations(proxy, qn).then([qn] (schema_mutations sm) {
if (!sm.live()) {
throw std::runtime_error(sprint("%s:%s not found in the schema definitions keyspace.", qn.keyspace_name, qn.table_name));
}
return create_table_from_mutations(std::move(sm));
});
});
}
@@ -1194,18 +1273,6 @@ future<std::map<sstring, schema_ptr>> create_tables_from_tables_partition(distri
}
#endif
void create_table_from_table_row_and_columns_partition(schema_builder& builder, const query::result_set_row& table_row, const schema_result::value_type& serialized_columns)
{
create_table_from_table_row_and_column_rows(builder, table_row, serialized_columns.second);
}
future<schema_ptr> create_table_from_table_partition(distributed<service::storage_proxy>& proxy, lw_shared_ptr<query::result_set>&& partition)
{
return do_with(std::move(partition), [&proxy] (auto& partition) {
return create_table_from_table_row(proxy, partition->row(0));
});
}
/**
* Deserialize table metadata from low-level representation
*
@@ -1215,31 +1282,18 @@ future<schema_ptr> create_table_from_table_row(distributed<service::storage_prox
{
auto ks_name = row.get_nonnull<sstring>("keyspace_name");
auto cf_name = row.get_nonnull<sstring>("columnfamily_name");
auto id = row.get_nonnull<utils::UUID>("cf_id");
return read_schema_partition_for_table(proxy, COLUMNS, ks_name, cf_name).then([&row, ks_name, cf_name, id] (auto serialized_columns) {
schema_builder builder{ks_name, cf_name, id};
create_table_from_table_row_and_columns_partition(builder, row, serialized_columns);
return builder.build();
});
#if 0
// FIXME:
Row serializedTriggers = readSchemaPartitionForTable(TRIGGERS, ksName, cfName);
try
{
for (TriggerDefinition trigger : createTriggersFromTriggersPartition(serializedTriggers))
cfm.addTriggerDefinition(trigger);
}
catch (InvalidRequestException e)
{
throw new RuntimeException(e);
}
#endif
return create_table_from_name(proxy, ks_name, cf_name);
}
void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_column_definitions)
schema_ptr create_table_from_mutations(schema_mutations sm, std::experimental::optional<table_schema_version> version)
{
auto table_rs = query::result_set(sm.columnfamilies_mutation());
query::result_set_row table_row = table_rs.row(0);
auto ks_name = table_row.get_nonnull<sstring>("keyspace_name");
auto cf_name = table_row.get_nonnull<sstring>("columnfamily_name");
auto id = table_row.get_nonnull<utils::UUID>("cf_id");
schema_builder builder{ks_name, cf_name, id};
#if 0
AbstractType<?> rawComparator = TypeParser.parse(result.getString("comparator"));
@@ -1257,11 +1311,12 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
AbstractType<?> fullRawComparator = CFMetaData.makeRawAbstractType(rawComparator, subComparator);
#endif
std::vector<column_definition> column_defs = create_columns_from_column_rows(serialized_column_definitions,
ks_name,
cf_name,/*,
fullRawComparator, */
cf == cf_type::super);
std::vector<column_definition> column_defs = create_columns_from_column_rows(
query::result_set(sm.columns_mutation()),
ks_name,
cf_name,/*,
fullRawComparator, */
cf == cf_type::super);
bool is_dense;
if (table_row.has("is_dense")) {
@@ -1272,8 +1327,10 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
bool is_compound = cell_comparator::check_compound(table_row.get_nonnull<sstring>("comparator"));
auto comparator = table_row.get_nonnull<sstring>("comparator");
bool is_compound = cell_comparator::check_compound(comparator);
builder.set_is_compound(is_compound);
cell_comparator::read_collections(builder, comparator);
#if 0
CellNameType comparator = CellNames.fromAbstractType(fullRawComparator, isDense);
@@ -1365,13 +1422,22 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
builder.set_bloom_filter_fp_chance(builder.get_bloom_filter_fp_chance());
}
#if 0
if (result.has("dropped_columns"))
cfm.droppedColumns(convertDroppedColumns(result.getMap("dropped_columns", UTF8Type.instance, LongType.instance)));
#endif
if (table_row.has("dropped_columns")) {
auto map = table_row.get_nonnull<map_type_impl::native_type>("dropped_columns");
for (auto&& entry : map) {
builder.without_column(value_cast<sstring>(entry.first), value_cast<api::timestamp_type>(entry.second));
};
}
for (auto&& cdef : column_defs) {
builder.with_column(cdef);
}
if (version) {
builder.with_version(*version);
} else {
builder.with_version(sm.digest());
}
return builder.build();
}
#if 0
@@ -1391,12 +1457,9 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
void add_column_to_schema_mutation(schema_ptr table,
const column_definition& column,
api::timestamp_type timestamp,
const partition_key& pkey,
std::vector<mutation>& mutations)
mutation& m)
{
schema_ptr s = columns();
mutation m{pkey, s};
auto ckey = clustering_key::from_exploded(*s, {utf8_type->decompose(table->cf_name()), column.name()});
auto ckey = clustering_key::from_exploded(*m.schema(), {utf8_type->decompose(table->cf_name()), column.name()});
m.set_clustered_cell(ckey, "validator", column.type->name(), timestamp);
m.set_clustered_cell(ckey, "type", serialize_kind(column.kind), timestamp);
if (!column.is_on_all_components()) {
@@ -1407,7 +1470,6 @@ void add_column_to_schema_mutation(schema_ptr table,
adder.add("index_type", column.getIndexType() == null ? null : column.getIndexType().toString());
adder.add("index_options", json(column.getIndexOptions()));
#endif
mutations.emplace_back(std::move(m));
}
sstring serialize_kind(column_kind kind)
@@ -1448,14 +1510,14 @@ void drop_column_from_schema_mutation(schema_ptr table, const column_definition&
mutations.emplace_back(m);
}
std::vector<column_definition> create_columns_from_column_rows(const schema_result::mapped_type& rows,
std::vector<column_definition> create_columns_from_column_rows(const query::result_set& rows,
const sstring& keyspace,
const sstring& table, /*,
AbstractType<?> rawComparator, */
bool is_super)
{
std::vector<column_definition> columns;
for (auto&& row : rows->rows()) {
for (auto&& row : rows.rows()) {
columns.emplace_back(std::move(create_column_from_column_row(row, keyspace, table, /*, rawComparator, */ is_super)));
}
return columns;

View File

@@ -43,6 +43,8 @@
#include "service/storage_proxy.hh"
#include "mutation.hh"
#include "schema.hh"
#include "hashing.hh"
#include "schema_mutations.hh"
#include <vector>
#include <map>
@@ -92,17 +94,24 @@ std::vector<mutation> make_drop_keyspace_mutations(lw_shared_ptr<keyspace_metada
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& partition);
future<> merge_tables(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after);
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& partition);
mutation make_create_keyspace_mutation(lw_shared_ptr<keyspace_metadata> keyspace, api::timestamp_type timestamp, bool with_tables_and_types_and_functions = true);
std::vector<mutation> make_create_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp);
std::vector<mutation> make_update_table_mutations(
lw_shared_ptr<keyspace_metadata> keyspace,
schema_ptr old_table,
schema_ptr new_table,
api::timestamp_type timestamp,
bool from_thrift);
schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers = true);
future<std::map<sstring, schema_ptr>> create_tables_from_tables_partition(distributed<service::storage_proxy>& proxy, const schema_result::mapped_type& result);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, const partition_key& pkey, std::vector<mutation>& mutations);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, std::vector<mutation>& mutations);
std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp);
@@ -110,13 +119,11 @@ future<schema_ptr> create_table_from_name(distributed<service::storage_proxy>& p
future<schema_ptr> create_table_from_table_row(distributed<service::storage_proxy>& proxy, const query::result_set_row& row);
void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_columns);
future<schema_ptr> create_table_from_table_partition(distributed<service::storage_proxy>& proxy, lw_shared_ptr<query::result_set>&& partition);
schema_ptr create_table_from_mutations(schema_mutations, std::experimental::optional<table_schema_version> version = {});
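(Illustration, not part of the diff: taken together with make_table_mutations above, these signatures suggest a round trip for table schemas. A minimal sketch, with the schema and timestamp as placeholders:)
schema_ptr s = make_example_schema(); // hypothetical helper producing a test schema
auto ts = api::new_timestamp();
schema_mutations sm = db::schema_tables::make_table_mutations(s, ts);
// Rebuilding from the captured mutations should yield an equivalent schema;
// the optional argument can force a specific version instead of recomputing it.
schema_ptr restored = db::schema_tables::create_table_from_mutations(std::move(sm));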
void drop_column_from_schema_mutation(schema_ptr table, const column_definition& column, long timestamp, std::vector<mutation>& mutations);
std::vector<column_definition> create_columns_from_column_rows(const schema_result::mapped_type& rows,
std::vector<column_definition> create_columns_from_column_rows(const query::result_set& rows,
const sstring& keyspace,
const sstring& table,/*,
AbstractType<?> rawComparator, */
@@ -129,11 +136,25 @@ column_definition create_column_from_column_row(const query::result_set_row& row
bool is_super);
void add_column_to_schema_mutation(schema_ptr table, const column_definition& column, api::timestamp_type timestamp, const partition_key& pkey, std::vector<mutation>& mutations);
void add_column_to_schema_mutation(schema_ptr table, const column_definition& column, api::timestamp_type timestamp, mutation& mutation);
sstring serialize_kind(column_kind kind);
column_kind deserialize_kind(sstring kind);
data_type parse_type(sstring str);
schema_ptr columns();
schema_ptr columnfamilies();
template<typename Hasher>
void feed_hash_for_schema_digest(Hasher& h, const mutation& m) {
// Cassandra is skipping tombstones from digest calculation
// to avoid disagreements due to tombstone GC.
// See https://issues.apache.org/jira/browse/CASSANDRA-6862.
// We achieve similar effect with compact_for_compaction().
mutation m_compacted(m);
m_compacted.partition().compact_for_compaction(*m.schema(), api::max_timestamp, gc_clock::time_point::max());
feed_hash(h, m_compacted);
}
} // namespace schema_tables
} // namespace db
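(Illustration, not part of the change: the digest helper above can be folded over a set of schema mutations with the md5_hasher used elsewhere in this series; the mutation container here is a stand-in.)
md5_hasher h;
for (const mutation& m : mutations_to_digest) { // hypothetical input
    db::schema_tables::feed_hash_for_schema_digest(h, m);
}
// Because tombstones are compacted away before hashing, two nodes that
// GC'd their tombstones at different times still agree on this digest.
auto digest = h.finalize();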


@@ -69,6 +69,11 @@ void db::serializer<bytes>::read(bytes& b, input& in) {
b = in.read<bytes>();
}
template<>
void db::serializer<bytes>::skip(input& in) {
in.read<bytes>(); // FIXME: Avoid reading
}
template<>
db::serializer<bytes_view>::serializer(const bytes_view& v)
: _item(v), _size(output::serialized_size(v)) {
@@ -104,6 +109,11 @@ void db::serializer<sstring>::read(sstring& s, input& in) {
s = in.read<sstring>();
}
template<>
void db::serializer<sstring>::skip(input& in) {
in.read<sstring>(); // FIXME: avoid reading
}
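(Both skip() specializations above still materialize the value, as the FIXMEs note. The partition_key_view::skip() removed further down shows the intended shape; a sketch of the same idea for bytes, assuming — hypothetically — a 32-bit length prefix on the wire:)
template<>
void db::serializer<bytes>::skip(input& in) {
    auto len = in.read<uint32_t>(); // assumed length prefix; actual encoding may differ
    in.skip(len);                   // skip the payload without copying it
}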
template<>
db::serializer<tombstone>::serializer(const tombstone& t)
: _item(t), _size(sizeof(t.timestamp) + sizeof(decltype(t.deletion_time.time_since_epoch().count()))) {
@@ -157,81 +167,6 @@ void db::serializer<collection_mutation_view>::read(collection_mutation_view& c,
c = collection_mutation_view::from_bytes(bytes_view_serializer::read(in));
}
template<>
db::serializer<partition_key_view>::serializer(const partition_key_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<partition_key_view>::write(output& out, const partition_key_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<partition_key_view>::read(partition_key_view& b, input& in) {
auto len = in.read<uint16_t>();
b = partition_key_view::from_bytes(in.read_view(len));
}
template<>
partition_key_view db::serializer<partition_key_view>::read(input& in) {
auto len = in.read<uint16_t>();
return partition_key_view::from_bytes(in.read_view(len));
}
template<>
void db::serializer<partition_key_view>::skip(input& in) {
auto len = in.read<uint16_t>();
in.skip(len);
}
template<>
db::serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<clustering_key_prefix_view>::write(output& out, const clustering_key_prefix_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view& b, input& in) {
auto len = in.read<uint16_t>();
b = clustering_key_prefix_view::from_bytes(in.read_view(len));
}
template<>
clustering_key_prefix_view db::serializer<clustering_key_prefix_view>::read(input& in) {
auto len = in.read<uint16_t>();
return clustering_key_prefix_view::from_bytes(in.read_view(len));
}
template<>
db::serializer<frozen_mutation>::serializer(const frozen_mutation& mutation)
: _item(mutation), _size(sizeof(uint32_t) /* size */ + mutation.representation().size()) {
}
template<>
void db::serializer<frozen_mutation>::write(output& out, const frozen_mutation& mutation) {
bytes_view v = mutation.representation();
out.write(v);
}
template<>
void db::serializer<frozen_mutation>::read(frozen_mutation& m, input& in) {
m = read(in);
}
template<>
frozen_mutation db::serializer<frozen_mutation>::read(input& in) {
return frozen_mutation(bytes_serializer::read(in));
}
template<>
db::serializer<db::replay_position>::serializer(const db::replay_position& rp)
: _item(rp), _size(sizeof(uint64_t) * 2) {
@@ -256,7 +191,4 @@ template class db::serializer<sstring> ;
template class db::serializer<atomic_cell_view> ;
template class db::serializer<collection_mutation_view> ;
template class db::serializer<utils::UUID> ;
template class db::serializer<partition_key_view> ;
template class db::serializer<clustering_key_prefix_view> ;
template class db::serializer<frozen_mutation> ;
template class db::serializer<db::replay_position> ;


@@ -28,9 +28,7 @@
#include "utils/data_output.hh"
#include "bytes_ostream.hh"
#include "bytes.hh"
#include "keys.hh"
#include "database_fwd.hh"
#include "frozen_mutation.hh"
#include "db/commitlog/replay_position.hh"
namespace db {
@@ -180,6 +178,7 @@ template<> utils::UUID serializer<utils::UUID>::read(input&);
template<> serializer<bytes>::serializer(const bytes &);
template<> void serializer<bytes>::write(output&, const type&);
template<> void serializer<bytes>::read(bytes&, input&);
template<> void serializer<bytes>::skip(input&);
template<> serializer<bytes_view>::serializer(const bytes_view&);
template<> void serializer<bytes_view>::write(output&, const type&);
@@ -189,6 +188,7 @@ template<> bytes_view serializer<bytes_view>::read(input&);
template<> serializer<sstring>::serializer(const sstring&);
template<> void serializer<sstring>::write(output&, const type&);
template<> void serializer<sstring>::read(sstring&, input&);
template<> void serializer<sstring>::skip(input&);
template<> serializer<tombstone>::serializer(const tombstone &);
template<> void serializer<tombstone>::write(output&, const type&);
@@ -203,22 +203,6 @@ template<> serializer<collection_mutation_view>::serializer(const collection_mut
template<> void serializer<collection_mutation_view>::write(output&, const type&);
template<> void serializer<collection_mutation_view>::read(collection_mutation_view&, input&);
template<> serializer<frozen_mutation>::serializer(const frozen_mutation &);
template<> void serializer<frozen_mutation>::write(output&, const type&);
template<> void serializer<frozen_mutation>::read(frozen_mutation&, input&);
template<> frozen_mutation serializer<frozen_mutation>::read(input&);
template<> serializer<partition_key_view>::serializer(const partition_key_view &);
template<> void serializer<partition_key_view>::write(output&, const partition_key_view&);
template<> void serializer<partition_key_view>::read(partition_key_view&, input&);
template<> partition_key_view serializer<partition_key_view>::read(input&);
template<> void serializer<partition_key_view>::skip(input&);
template<> serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view &);
template<> void serializer<clustering_key_prefix_view>::write(output&, const clustering_key_prefix_view&);
template<> void serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view&, input&);
template<> clustering_key_prefix_view serializer<clustering_key_prefix_view>::read(input&);
template<> serializer<db::replay_position>::serializer(const db::replay_position&);
template<> void serializer<db::replay_position>::write(output&, const db::replay_position&);
template<> void serializer<db::replay_position>::read(db::replay_position&, input&);
@@ -235,9 +219,6 @@ extern template class serializer<bytes>;
extern template class serializer<bytes_view>;
extern template class serializer<sstring>;
extern template class serializer<utils::UUID>;
extern template class serializer<partition_key_view>;
extern template class serializer<clustering_key_view>;
extern template class serializer<clustering_key_prefix_view>;
extern template class serializer<db::replay_position>;
typedef serializer<tombstone> tombstone_serializer;
@@ -247,10 +228,6 @@ typedef serializer<sstring> sstring_serializer;
typedef serializer<atomic_cell_view> atomic_cell_view_serializer;
typedef serializer<collection_mutation_view> collection_mutation_view_serializer;
typedef serializer<utils::UUID> uuid_serializer;
typedef serializer<partition_key_view> partition_key_view_serializer;
typedef serializer<clustering_key_view> clustering_key_view_serializer;
typedef serializer<clustering_key_prefix_view> clustering_key_prefix_view_serializer;
typedef serializer<frozen_mutation> frozen_mutation_serializer;
typedef serializer<db::replay_position> replay_position_serializer;
}


@@ -63,6 +63,8 @@
#include "partition_slice_builder.hh"
#include "db/config.hh"
#include "schema_builder.hh"
#include "md5_hasher.hh"
#include "release.hh"
#include <core/enum.hh>
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
@@ -73,6 +75,23 @@ std::unique_ptr<query_context> qctx = {};
namespace system_keyspace {
static const api::timestamp_type creation_timestamp = api::new_timestamp();
api::timestamp_type schema_creation_timestamp() {
return creation_timestamp;
}
// Increase whenever changing schema of any system table.
// FIXME: Make automatic by calculating from schema structure.
static const uint16_t version_sequence_number = 1;
table_schema_version generate_schema_version(utils::UUID table_id) {
md5_hasher h;
feed_hash(h, table_id);
feed_hash(h, version_sequence_number);
return utils::UUID_gen::get_name_UUID(h.finalize());
}
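(To make the intent concrete — a hedged sketch, with the table id a placeholder: the version is a name-based UUID over (table_id, version_sequence_number), so every node derives the same version, and bumping the sequence number re-versions all system tables at once.)
utils::UUID id{}; // placeholder; in practice a system table's fixed id
table_schema_version v1 = generate_schema_version(id);
table_schema_version v2 = generate_schema_version(id);
// v1 == v2 on any node, at any time, until version_sequence_number changes.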
// Currently, the type variables (uuid_type, etc.) are thread-local reference-
// counted shared pointers. This forces us to also make the built in schemas
// below thread-local as well.
@@ -101,6 +120,7 @@ schema_ptr hints() {
)));
builder.set_gc_grace_seconds(0);
builder.set_compaction_strategy_options({{ "enabled", "false" }});
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::yes);
}();
return hints;
@@ -126,6 +146,7 @@ schema_ptr batchlog() {
// .compactionStrategyOptions(Collections.singletonMap("min_threshold", "2"))
)));
builder.set_gc_grace_seconds(0);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return batchlog;
@@ -150,6 +171,7 @@ schema_ptr batchlog() {
// operations on resulting CFMetaData:
// .compactionStrategyClass(LeveledCompactionStrategy.class);
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return paxos;
@@ -171,6 +193,7 @@ schema_ptr built_indexes() {
// comment
"built column indexes"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::yes);
}();
return built_indexes;
@@ -212,6 +235,7 @@ schema_ptr built_indexes() {
// comment
"information about the local node"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return local;
@@ -242,6 +266,7 @@ schema_ptr built_indexes() {
// comment
"information about known peers in the cluster"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return peers;
@@ -265,6 +290,7 @@ schema_ptr built_indexes() {
// comment
"events related to peers"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return peer_events;
@@ -286,6 +312,7 @@ schema_ptr built_indexes() {
// comment
"ranges requested for transfer"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return range_xfers;
@@ -311,6 +338,7 @@ schema_ptr built_indexes() {
// comment
"unfinished compactions"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return compactions_in_progress;
@@ -340,6 +368,7 @@ schema_ptr built_indexes() {
"week-long compaction history"
)));
builder.set_default_time_to_live(std::chrono::duration_cast<std::chrono::seconds>(days(7)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return compaction_history;
@@ -368,6 +397,7 @@ schema_ptr built_indexes() {
// comment
"historic sstable read rates"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return sstable_activity;
@@ -393,6 +423,7 @@ schema_ptr size_estimates() {
"per-table primary range size estimates"
)));
builder.set_gc_grace_seconds(0);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return size_estimates;
@@ -513,7 +544,6 @@ future<> setup(distributed<database>& db, distributed<cql3::query_processor>& qp
return ms.init_local_preferred_ip_cache();
});
});
return make_ready_future<>();
}
typedef std::pair<replay_positions, db_clock::time_point> truncation_entry;
@@ -985,8 +1015,9 @@ query_mutations(distributed<service::storage_proxy>& proxy, const sstring& cf_na
database& db = proxy.local().get_db().local();
schema_ptr schema = db.find_schema(db::system_keyspace::NAME, cf_name);
auto slice = partition_slice_builder(*schema).build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query_mutations_locally(cmd, query::full_partition_range);
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(),
std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query_mutations_locally(std::move(schema), std::move(cmd), query::full_partition_range);
}
future<lw_shared_ptr<query::result_set>>
@@ -994,7 +1025,8 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name) {
database& db = proxy.local().get_db().local();
schema_ptr schema = db.find_schema(db::system_keyspace::NAME, cf_name);
auto slice = partition_slice_builder(*schema).build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), std::numeric_limits<uint32_t>::max());
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(),
std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query(schema, cmd, {query::full_partition_range}, db::consistency_level::ONE).then([schema, cmd] (auto&& result) {
return make_lw_shared(query::result_set::from_raw_result(schema, cmd->slice, *result));
});
@@ -1008,7 +1040,7 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name, const
auto slice = partition_slice_builder(*schema)
.with_range(std::move(row_range))
.build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), query::max_rows);
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(), std::move(slice), query::max_rows);
return proxy.local().query(schema, cmd, {query::partition_range::make_singular(key)}, db::consistency_level::ONE).then([schema, cmd] (auto&& result) {
return make_lw_shared(query::result_set::from_raw_result(schema, cmd->slice, *result));
});


@@ -84,6 +84,8 @@ extern schema_ptr hints();
extern schema_ptr batchlog();
extern schema_ptr built_indexes(); // TODO (from Cassandra): make private
table_schema_version generate_schema_version(utils::UUID table_id);
// Only for testing.
void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp);
@@ -673,5 +675,7 @@ future<> set_bootstrap_state(bootstrap_state state);
executeInternal(String.format(cql, SSTABLE_ACTIVITY), keyspace, table, generation);
}
#endif
api::timestamp_type schema_creation_timestamp();
} // namespace system_keyspace
} // namespace db


@@ -251,39 +251,6 @@ std::ostream& operator<<(std::ostream& out, const ring_position& pos) {
return out << "}";
}
size_t ring_position::serialized_size() const {
size_t size = serialize_int32_size; /* _key length */
if (_key) {
size += _key.value().representation().size();
} else {
size += sizeof(int8_t); /* _token_bound */
}
return size + _token.serialized_size();
}
void ring_position::serialize(bytes::iterator& out) const {
_token.serialize(out);
if (_key) {
auto v = _key.value().representation();
serialize_int32(out, v.size());
out = std::copy(v.begin(), v.end(), out);
} else {
serialize_int32(out, 0);
serialize_int8(out, static_cast<int8_t>(_token_bound));
}
}
ring_position ring_position::deserialize(bytes_view& in) {
auto token = token::deserialize(in);
auto size = read_simple<uint32_t>(in);
if (size == 0) {
auto bound = dht::ring_position::token_bound(read_simple<int8_t>(in));
return ring_position(std::move(token), bound);
} else {
return ring_position(std::move(token), partition_key::from_bytes(to_bytes(read_simple_bytes(in, size))));
}
}
unsigned shard_of(const token& t) {
return global_partitioner().shard_of(t);
}


@@ -338,6 +338,12 @@ public:
, _key(std::experimental::make_optional(std::move(key)))
{ }
ring_position(dht::token token, token_bound bound, std::experimental::optional<partition_key> key)
: _token(std::move(token))
, _token_bound(bound)
, _key(std::move(key))
{ }
ring_position(const dht::decorated_key& dk)
: _token(dk._token)
, _key(std::experimental::make_optional(dk._key))
@@ -379,10 +385,6 @@ public:
// "less" comparator corresponding to tri_compare()
bool less_compare(const schema&, const ring_position&) const;
size_t serialized_size() const;
void serialize(bytes::iterator& out) const;
static ring_position deserialize(bytes_view& in);
friend std::ostream& operator<<(std::ostream&, const ring_position&);
};


@@ -107,7 +107,7 @@ public:
, _tokens(std::move(tokens))
, _address(address)
, _description(std::move(description))
, _stream_plan(_description, true) {
, _stream_plan(_description) {
}
range_streamer(distributed<database>& db, token_metadata& tm, inet_address address, sstring description)

dist/ami/build_ami.sh:

@@ -5,15 +5,22 @@ if [ ! -e dist/ami/build_ami.sh ]; then
exit 1
fi
TARGET_JSON=scylla.json
if [ "$1" != "" ]; then
TARGET_JSON=$1
fi
if [ ! -f dist/ami/$TARGET_JSON ]; then
echo "dist/ami/$TARGET_JSON does not found"
print_usage() {
echo "build_ami.sh -l"
echo " -l deploy locally built rpms"
exit 1
fi
}
LOCALRPM=0
while getopts lh OPT; do
case "$OPT" in
"l")
LOCALRPM=1
;;
"h")
print_usage
;;
esac
done
cd dist/ami
@@ -30,4 +37,12 @@ if [ ! -d packer ]; then
cd -
fi
packer/packer build -var-file=variables.json $TARGET_JSON
if [ $LOCALRPM = 0 ]; then
echo "sudo yum remove -y abrt; sudo sh -x -e /home/centos/scylla_install_pkg; sudo sh -x -e /usr/lib/scylla/scylla_setup -a" > scylla_deploy.sh
else
echo "sudo yum remove -y abrt; sudo sh -x -e /home/centos/scylla_install_pkg -l /home/centos; sudo sh -x -e /usr/lib/scylla/scylla_setup -a" > scylla_deploy.sh
fi
chmod a+rx scylla_deploy.sh
packer/packer build -var-file=variables.json scylla.json


@@ -5,26 +5,27 @@ if [ ! -e dist/ami/build_ami_local.sh ]; then
exit 1
fi
rm -rf build/*
sudo yum -y install git
if [ ! -f dist/ami/scylla-server.x86_64.rpm ]; then
if [ ! -f dist/ami/files/scylla-server.x86_64.rpm ]; then
dist/redhat/build_rpm.sh
cp build/rpms/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/scylla-server.x86_64.rpm
cp build/rpmbuild/RPMS/x86_64/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-server.x86_64.rpm
fi
if [ ! -f dist/ami/scylla-jmx.noarch.rpm ]; then
if [ ! -f dist/ami/files/scylla-jmx.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
sh -x -e dist/redhat/build_rpm.sh
sh -x -e dist/redhat/build_rpm.sh $*
cd ../..
cp build/scylla-jmx/build/rpms/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/scylla-jmx.noarch.rpm
cp build/scylla-jmx/build/rpmbuild/RPMS/noarch/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-jmx.noarch.rpm
fi
if [ ! -f dist/ami/scylla-tools.noarch.rpm ]; then
if [ ! -f dist/ami/files/scylla-tools.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
sh -x -e dist/redhat/build_rpm.sh
cd ../..
cp build/scylla-tools-java/build/rpms/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/scylla-tools.noarch.rpm
cp build/scylla-tools-java/build/rpmbuild/RPMS/noarch/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools.noarch.rpm
fi
exec dist/ami/build_ami.sh scylla_local.json
exec dist/ami/build_ami.sh -l


@@ -23,7 +23,7 @@ echo ' |___/ '
echo ''
echo ''
echo 'Nodetool:'
echo ' nodetool --help'
echo ' nodetool help'
echo 'CQL Shell:'
echo ' cqlsh'
echo 'More documentation available at: '
@@ -35,6 +35,7 @@ if [ "`systemctl is-active scylla-server`" = "active" ]; then
tput bold
echo " ScyllaDB is active."
tput sgr0
echo
else
tput setaf 1
tput bold
@@ -42,4 +43,5 @@ else
tput sgr0
echo "Please wait for startup. To see status of ScyllaDB, run "
echo " 'systemctl status scylla-server'"
echo
fi

dist/ami/scylla.json:

@@ -8,34 +8,34 @@
"security_group_id": "{{user `security_group_id`}}",
"region": "{{user `region`}}",
"associate_public_ip_address": "{{user `associate_public_ip_address`}}",
"source_ami": "ami-a51564c0",
"source_ami": "ami-8ef1d6e4",
"user_data_file": "user_data.txt",
"instance_type": "{{user `instance_type`}}",
"ssh_username": "fedora",
"ssh_username": "centos",
"ssh_timeout": "5m",
"ami_name": "scylla_{{isotime | clean_ami_name}}"
"ami_name": "scylla_{{isotime | clean_ami_name}}",
"launch_block_device_mappings": [
{
"device_name": "/dev/sda1",
"volume_size": 10
}
]
}
],
"provisioners": [
{
"type": "file",
"source": "files/scylla-ami",
"destination": "/home/fedora/scylla-ami"
"source": "files/",
"destination": "/home/centos/"
},
{
"type": "file",
"source": "files/.bash_profile",
"destination": "/home/fedora/.bash_profile"
},
{
"type": "file",
"source": "../../scripts/scylla_install",
"destination": "/home/fedora/scylla_install"
"source": "../../scripts/scylla_install_pkg",
"destination": "/home/centos/scylla_install_pkg"
},
{
"type": "shell",
"inline": [
"sudo sh -x -e /home/fedora/scylla_install -a"
]
"script": "scylla_deploy.sh"
}
],
"variables": {


@@ -1,67 +0,0 @@
{
"builders": [
{
"type": "amazon-ebs",
"access_key": "{{user `access_key`}}",
"secret_key": "{{user `secret_key`}}",
"subnet_id": "{{user `subnet_id`}}",
"security_group_id": "{{user `security_group_id`}}",
"region": "{{user `region`}}",
"associate_public_ip_address": "{{user `associate_public_ip_address`}}",
"source_ami": "ami-a51564c0",
"instance_type": "{{user `instance_type`}}",
"ssh_username": "fedora",
"ssh_timeout": "5m",
"ami_name": "scylla_{{isotime | clean_ami_name}}"
}
],
"provisioners": [
{
"type": "file",
"source": "files/scylla-ami",
"destination": "/home/fedora/scylla-ami"
},
{
"type": "file",
"source": "files/.bash_profile",
"destination": "/home/fedora/.bash_profile"
},
{
"type": "file",
"source": "../../scripts/scylla_install",
"destination": "/home/fedora/scylla_install"
},
{
"type": "file",
"source": "scylla-server.x86_64.rpm",
"destination": "/home/fedora/scylla-server.x86_64.rpm"
},
{
"type": "file",
"source": "scylla-jmx.noarch.rpm",
"destination": "/home/fedora/scylla-jmx.noarch.rpm"
},
{
"type": "file",
"source": "scylla-tools.noarch.rpm",
"destination": "/home/fedora/scylla-tools.noarch.rpm"
},
{
"type": "shell",
"inline": [
"sudo yum install -y /home/fedora/scylla-server.x86_64.rpm /home/fedora/scylla-jmx.noarch.rpm /home/fedora/scylla-tools.noarch.rpm",
"sudo mv /home/fedora/scylla-ami /usr/lib/scylla/scylla-ami",
"sudo sh -x -e /home/fedora/scylla_install -a -l /home/fedora"
]
}
],
"variables": {
"access_key": "",
"secret_key": "",
"subnet_id": "",
"security_group_id": "",
"region": "",
"associate_public_ip_address": "",
"instance_type": ""
}
}

dist/ami/user_data.txt (new file):

@@ -0,0 +1,2 @@
#!/bin/sh
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers


@@ -2,47 +2,22 @@
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla_bootparam_setup -a"
echo " -a AMI instance mode"
exit 1
}
AMI=0
while getopts a OPT; do
case "$OPT" in
"a")
AMI=1
;;
"h")
print_usage
;;
esac
done
. /etc/os-release
if [ $AMI -eq 1 ]; then
. /etc/sysconfig/scylla-server
sed -e "s#append #append clocksource=tsc tsc=reliable hugepagesz=2M hugepages=$NR_HUGEPAGES #" /boot/extlinux/extlinux.conf > /tmp/extlinux.conf
mv /tmp/extlinux.conf /boot/extlinux/extlinux.conf
else
. /etc/sysconfig/scylla-server
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
fi
if [ "`grep hugepagesz /etc/default/grub`" != "" ] || [ "`grep hugepages /etc/default/grub`" != "" ]; then
sed -e "s#hugepagesz=2M ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
sed -e "s#hugepages=[0-9]* ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
fi
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ]; then
grub2-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
fi
if [ "`grep hugepagesz /etc/default/grub`" != "" ] || [ "`grep hugepages /etc/default/grub`" != "" ]; then
sed -e "s#hugepagesz=2M ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
sed -e "s#hugepages=[0-9]* ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
fi
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ]; then
grub-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi


@@ -2,15 +2,43 @@
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla_coredump_setup -s"
echo " -s store coredump to /var/lib/scylla"
exit 1
}
SYMLINK=0
while getopts sh OPT; do
case "$OPT" in
"s")
SYMLINK=1
;;
"h")
print_usage
;;
esac
done
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
apt-get remove -y apport-noui
sysctl -p /etc/sysctl.d/99-scylla.conf
else
if [ -f /etc/systemd/coredump.conf ]; then
mv /etc/systemd/coredump.conf /etc/systemd/coredump.conf.save
systemctl daemon-reload
cat << EOS > /etc/systemd/coredump.conf
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=1024G
ExternalSizeMax=1024G
EOS
if [ $SYMLINK = 1 ]; then
rm -rf /var/lib/systemd/coredump
ln -sf /var/lib/scylla/coredump /var/lib/systemd/coredump
fi
systemctl daemon-reload
echo "kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e" > /etc/sysctl.d/99-coredump.conf
sysctl -p /etc/sysctl.d/99-coredump.conf
fi
sysctl -p /etc/sysctl.d/99-scylla.conf


@@ -29,10 +29,13 @@ if [ "$NAME" = "Ubuntu" ]; then
else
yum install -y ntp ntpdate || true
if [ $AMI -eq 1 ]; then
sed -e s#fedora.pool.ntp.org#amazon.pool.ntp.org# /etc/ntp.conf > /tmp/ntp.conf
sed -e s#centos.pool.ntp.org#amazon.pool.ntp.org# /etc/ntp.conf > /tmp/ntp.conf
mv /tmp/ntp.conf /etc/ntp.conf
fi
systemctl enable ntpd.service
if [ "`systemctl is-active ntpd`" = "active" ]; then
systemctl stop ntpd.service
fi
ntpdate `cat /etc/ntp.conf |grep "^server"|head -n1|awk '{print $2}'`
systemctl enable ntpd.service
systemctl start ntpd.service
fi


@@ -30,8 +30,6 @@ if [ "$AMI" = "yes" ]; then
if [ "$DISKS" != "" ]; then
/usr/lib/scylla/scylla_raid_setup -d $DISKS
else
echo "ERROR: Scylla is not using XFS to store data. The scylla service will refuse to start." > /home/fedora/SCYLLA_SETUP_ERROR.LOG
fi
/usr/lib/scylla/scylla-ami/ds2_configure.py


@@ -43,6 +43,13 @@ if [ "`mount|grep /var/lib/scylla`" != "" ]; then
echo "/var/lib/scylla is already mounted"
exit 1
fi
. /etc/os-release
if [ "$NAME" = "Ubuntu" ]; then
apt-get -y install mdadm xfsprogs
else
yum -y install mdadm xfsprogs
fi
mdadm --create --verbose --force --run $RAID --level=0 -c256 --raid-devices=$NR_DISK $DISKS
blockdev --setra 65536 $RAID
mkfs.xfs $RAID -f

dist/common/scripts/scylla_setup (new executable file):

@@ -0,0 +1,76 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
if [ "`id -u`" -ne 0 ]; then
echo "Requires root permission."
exit 1
fi
print_usage() {
echo "scylla_setup -d /dev/hda,/dev/hdb... -n eth0 -a"
echo " -d specify disks for RAID"
echo " -n specify NIC"
echo " -a setup AMI instance"
exit 1
}
NIC=eth0
RAID=/dev/md0
AMI=0
while getopts d:n:al:h OPT; do
case "$OPT" in
"n")
NIC=$OPTARG
;;
"d")
DISKS=$OPTARG
;;
"a")
AMI=1
;;
"h")
print_usage
;;
esac
done
SYSCONFIG_SETUP_ARGS="-n $NIC"
. /etc/os-release
if [ "$ID" != "ubuntu" ]; then
if [ "`sestatus | awk '{print $3}'`" != "disabled" ]; then
setenforce 0
sed -e "s/enforcing/disabled/" /etc/sysconfig/selinux > /tmp/selinux
mv /tmp/selinux /etc/sysconfig/
fi
if [ $AMI -eq 1 ]; then
SYSCONFIG_SETUP_ARGS="$SYSCONFIG_SETUP_ARGS -N -a"
if [ "$LOCAL_PKG" = "" ]; then
yum update -y
else
SYSCONFIG_SETUP_ARGS="$SYSCONFIG_SETUP_ARGS -k"
fi
grep -v ' - mounts' /etc/cloud/cloud.cfg > /tmp/cloud.cfg
mv /tmp/cloud.cfg /etc/cloud/cloud.cfg
mv /home/centos/scylla-ami /usr/lib/scylla/scylla-ami
chmod a+rx /usr/lib/scylla/scylla-ami/ds2_configure.py
fi
systemctl enable scylla-server.service
systemctl enable scylla-jmx.service
fi
if [ $AMI -eq 0 ]; then
/usr/lib/scylla/scylla_ntp_setup
/usr/lib/scylla/scylla_bootparam_setup
if [ "$DISKS" != "" ]; then
/usr/lib/scylla/scylla_raid_setup -d $DISKS -u
/usr/lib/scylla/scylla_coredump_setup -s
else
/usr/lib/scylla/scylla_coredump_setup
fi
else
/usr/lib/scylla/scylla_coredump_setup -s
/usr/lib/scylla/scylla_ntp_setup -a
/usr/lib/scylla/scylla_bootparam_setup -a
fi
/usr/lib/scylla/scylla_sysconfig_setup $SYSCONFIG_SETUP_ARGS

dist/common/sudoers.d/scylla (new file):

@@ -0,0 +1 @@
scylla ALL=(ALL) NOPASSWD:SETENV: /usr/lib/scylla/scylla_prepare,/usr/lib/scylla/scylla_stop


@@ -34,8 +34,11 @@ SCYLLA_HOME=/var/lib/scylla
# scylla config dir
SCYLLA_CONF=/etc/scylla
# additional arguments
SCYLLA_ARGS=""
# scylla arguments (for posix mode)
SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --collectd-address=127.0.0.1:25826 --collectd=1 --collectd-poll-period 3000 --network-stack posix"
## scylla arguments (for dpdk mode)
#SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --collectd-address=127.0.0.1:25826 --collectd=1 --collectd-poll-period 3000 --network-stack native --dpdk-pmd"
# setup as AMI instance
AMI=no


@@ -1,5 +1,22 @@
#!/bin/sh -e
print_usage() {
echo "build_rpm.sh -R"
echo " -R rebuild dependency packages (CentOS)"
exit 1
}
REBUILD=0
while getopts Rh OPT; do
case "$OPT" in
"R")
REBUILD=1
;;
"h")
print_usage
;;
esac
done
RPMBUILD=`pwd`/build/rpmbuild
if [ ! -e dist/redhat/build_rpm.sh ]; then
@@ -22,7 +39,12 @@ if [ ! -f /usr/bin/git ]; then
fi
mkdir -p $RPMBUILD/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
if [ "$ID" = "centos" ]; then
./dist/redhat/centos_dep/build_dependency.sh
sudo yum install -y epel-release
if [ $REBUILD = 1 ]; then
./dist/redhat/centos_dep/build_dependency.sh
else
sudo curl https://s3.amazonaws.com/downloads.scylladb.com/rpm/unstable/centos/master/latest/scylla.repo -o /etc/yum.repos.d/scylla.repo
fi
fi
VERSION=$(./SCYLLA-VERSION-GEN)
SCYLLA_VERSION=$(cat build/SCYLLA-VERSION-FILE)
@@ -37,7 +59,7 @@ if [ "$ID" = "fedora" ]; then
rpmbuild -bs --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
mock rebuild --resultdir=`pwd`/build/rpms $RPMBUILD/SRPMS/scylla-server-$VERSION*.src.rpm
else
. /etc/profile.d/scylla.sh
sudo yum-builddep -y $RPMBUILD/SPECS/scylla-server.spec
. /etc/profile.d/scylla.sh
rpmbuild -ba --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
fi


@@ -1,5 +1,5 @@
--- binutils.spec 2015-10-19 05:45:55.106745163 +0000
+++ binutils.spec.1 2015-10-19 05:45:55.807742899 +0000
--- binutils.spec.orig 2015-09-30 14:48:25.000000000 +0000
+++ binutils.spec 2016-01-20 14:42:17.856037134 +0000
@@ -17,7 +17,7 @@
%define enable_deterministic_archives 1
@@ -7,7 +7,7 @@
-Name: %{?cross}binutils%{?_with_debug:-debug}
+Name: scylla-%{?cross}binutils%{?_with_debug:-debug}
Version: 2.25
Release: 5%{?dist}
Release: 15%{?dist}
License: GPLv3+
@@ -29,6 +29,7 @@
# instead.
@@ -17,7 +17,7 @@
Source2: binutils-2.19.50.0.1-output-format.sed
Patch01: binutils-2.20.51.0.2-libtool-lib64.patch
@@ -82,6 +83,9 @@
@@ -89,6 +90,9 @@
BuildRequires: texinfo >= 4.0, gettext, flex, bison, zlib-devel
# BZ 920545: We need pod2man in order to build the manual pages.
BuildRequires: /usr/bin/pod2man
@@ -27,7 +27,7 @@
# Required for: ld-bootstrap/bootstrap.exp bootstrap with --static
# It should not be required for: ld-elf/elf.exp static {preinit,init,fini} array
%if %{run_testsuite}
@@ -105,8 +109,8 @@
@@ -112,8 +116,8 @@
%if "%{build_gold}" == "both"
Requires(post): coreutils
@@ -38,7 +38,7 @@
%endif
# On ARM EABI systems, we do want -gnueabi to be part of the
@@ -131,11 +135,12 @@
@@ -138,11 +142,12 @@
%package devel
Summary: BFD and opcodes static and dynamic libraries and header files
Group: System Environment/Libraries
@@ -50,10 +50,10 @@
Requires: zlib-devel
-Requires: binutils = %{version}-%{release}
+Requires: scylla-binutils = %{version}-%{release}
# BZ 1215242: We need touch...
Requires: coreutils
%description devel
This package contains BFD and opcodes static and dynamic libraries.
@@ -411,11 +416,11 @@
@@ -426,11 +431,11 @@
%post
%if "%{build_gold}" == "both"
%__rm -f %{_bindir}/%{?cross}ld
@@ -68,7 +68,7 @@
%endif
%if %{isnative}
/sbin/ldconfig
@@ -433,8 +438,8 @@
@@ -448,8 +453,8 @@
%preun
%if "%{build_gold}" == "both"
if [ $1 = 0 ]; then


@@ -1,5 +1,5 @@
--- boost.spec 2015-05-03 17:32:13.000000000 +0000
+++ boost.spec.1 2015-10-19 06:03:12.670534256 +0000
--- boost.spec.orig 2016-01-15 18:41:47.000000000 +0000
+++ boost.spec 2016-01-20 14:46:47.397663246 +0000
@@ -6,6 +6,11 @@
# We should be able to install directly.
%define boost_docdir __tmp_docdir
@@ -20,9 +20,9 @@
+Name: scylla-boost
+%define orig_name boost
Summary: The free peer-reviewed portable C++ source libraries
Version: 1.57.0
%define version_enc 1_57_0
Release: 6%{?dist}
Version: 1.58.0
%define version_enc 1_58_0
Release: 11%{?dist}
License: Boost and MIT and Python
-%define toplev_dirname %{name}_%{version_enc}
@@ -93,8 +93,8 @@
+Requires: scylla-boost-wave%{?_isa} = %{version}-%{release}
BuildRequires: m4
BuildRequires: libstdc++-devel%{?_isa}
@@ -151,6 +159,7 @@
BuildRequires: libstdc++-devel
@@ -156,6 +164,7 @@
%package atomic
Summary: Run-Time component of boost atomic library
Group: System Environment/Libraries
@@ -102,7 +102,7 @@
%description atomic
@@ -162,7 +171,8 @@
@@ -167,7 +176,8 @@
%package chrono
Summary: Run-Time component of boost chrono library
Group: System Environment/Libraries
@@ -112,7 +112,7 @@
%description chrono
@@ -171,6 +181,7 @@
@@ -176,6 +186,7 @@
%package container
Summary: Run-Time component of boost container library
Group: System Environment/Libraries
@@ -120,7 +120,7 @@
%description container
@@ -183,6 +194,7 @@
@@ -188,6 +199,7 @@
%package context
Summary: Run-Time component of boost context switching library
Group: System Environment/Libraries
@@ -128,7 +128,7 @@
%description context
@@ -192,6 +204,7 @@
@@ -197,6 +209,7 @@
%package coroutine
Summary: Run-Time component of boost coroutine library
Group: System Environment/Libraries
@@ -136,7 +136,7 @@
%description coroutine
Run-Time support for Boost.Coroutine, a library that provides
@@ -203,6 +216,7 @@
@@ -208,6 +221,7 @@
%package date-time
Summary: Run-Time component of boost date-time library
Group: System Environment/Libraries
@@ -144,7 +144,7 @@
%description date-time
@@ -212,7 +226,8 @@
@@ -217,7 +231,8 @@
%package filesystem
Summary: Run-Time component of boost filesystem library
Group: System Environment/Libraries
@@ -154,7 +154,7 @@
%description filesystem
@@ -223,7 +238,8 @@
@@ -228,7 +243,8 @@
%package graph
Summary: Run-Time component of boost graph library
Group: System Environment/Libraries
@@ -164,7 +164,7 @@
%description graph
@@ -243,9 +259,10 @@
@@ -248,9 +264,10 @@
%package locale
Summary: Run-Time component of boost locale library
Group: System Environment/Libraries
@@ -178,7 +178,7 @@
%description locale
@@ -255,6 +272,7 @@
@@ -260,6 +277,7 @@
%package log
Summary: Run-Time component of boost logging library
Group: System Environment/Libraries
@@ -186,7 +186,7 @@
%description log
@@ -265,6 +283,7 @@
@@ -270,6 +288,7 @@
%package math
Summary: Math functions for boost TR1 library
Group: System Environment/Libraries
@@ -194,7 +194,7 @@
%description math
@@ -274,6 +293,7 @@
@@ -279,6 +298,7 @@
%package program-options
Summary: Run-Time component of boost program_options library
Group: System Environment/Libraries
@@ -202,7 +202,7 @@
%description program-options
@@ -284,6 +304,7 @@
@@ -289,6 +309,7 @@
%package python
Summary: Run-Time component of boost python library
Group: System Environment/Libraries
@@ -210,7 +210,7 @@
%description python
@@ -298,6 +319,7 @@
@@ -303,6 +324,7 @@
%package python3
Summary: Run-Time component of boost python library for Python 3
Group: System Environment/Libraries
@@ -218,7 +218,7 @@
%description python3
@@ -310,8 +332,9 @@
@@ -315,8 +337,9 @@
%package python3-devel
Summary: Shared object symbolic links for Boost.Python 3
Group: System Environment/Libraries
@@ -230,7 +230,7 @@
%description python3-devel
@@ -322,6 +345,7 @@
@@ -327,6 +350,7 @@
%package random
Summary: Run-Time component of boost random library
Group: System Environment/Libraries
@@ -238,7 +238,7 @@
%description random
@@ -330,6 +354,7 @@
@@ -335,6 +359,7 @@
%package regex
Summary: Run-Time component of boost regular expression library
Group: System Environment/Libraries
@@ -246,7 +246,7 @@
%description regex
@@ -338,6 +363,7 @@
@@ -343,6 +368,7 @@
%package serialization
Summary: Run-Time component of boost serialization library
Group: System Environment/Libraries
@@ -254,7 +254,7 @@
%description serialization
@@ -346,6 +372,7 @@
@@ -351,6 +377,7 @@
%package signals
Summary: Run-Time component of boost signals and slots library
Group: System Environment/Libraries
@@ -262,7 +262,7 @@
%description signals
@@ -354,6 +381,7 @@
@@ -359,6 +386,7 @@
%package system
Summary: Run-Time component of boost system support library
Group: System Environment/Libraries
@@ -270,7 +270,7 @@
%description system
@@ -364,6 +392,7 @@
@@ -369,6 +397,7 @@
%package test
Summary: Run-Time component of boost test library
Group: System Environment/Libraries
@@ -278,7 +278,7 @@
%description test
@@ -373,7 +402,8 @@
@@ -378,7 +407,8 @@
%package thread
Summary: Run-Time component of boost thread library
Group: System Environment/Libraries
@@ -288,7 +288,7 @@
%description thread
@@ -385,8 +415,9 @@
@@ -390,8 +420,9 @@
%package timer
Summary: Run-Time component of boost timer library
Group: System Environment/Libraries
@@ -300,7 +300,7 @@
%description timer
@@ -397,11 +428,12 @@
@@ -402,11 +433,12 @@
%package wave
Summary: Run-Time component of boost C99/C++ pre-processing library
Group: System Environment/Libraries
@@ -318,7 +318,7 @@
%description wave
@@ -412,27 +444,20 @@
@@ -417,27 +449,20 @@
%package devel
Summary: The Boost C++ headers and shared development libraries
Group: Development/Libraries
@@ -352,7 +352,7 @@
%description static
Static Boost C++ libraries.
@@ -443,11 +468,7 @@
@@ -448,11 +473,7 @@
%if 0%{?rhel} >= 6
BuildArch: noarch
%endif
@@ -365,7 +365,7 @@
%description doc
This package contains the documentation in the HTML format of the Boost C++
@@ -460,7 +481,7 @@
@@ -465,7 +486,7 @@
%if 0%{?rhel} >= 6
BuildArch: noarch
%endif
@@ -374,19 +374,18 @@
%description examples
This package contains example source files distributed with boost.
@@ -471,9 +492,10 @@
@@ -476,8 +497,9 @@
%package openmpi
Summary: Run-Time component of Boost.MPI library
Group: System Environment/Libraries
+Requires: scylla-env
Requires: openmpi%{?_isa}
BuildRequires: openmpi-devel
-Requires: boost-serialization%{?_isa} = %{version}-%{release}
+Requires: scylla-boost-serialization%{?_isa} = %{version}-%{release}
%description openmpi
@@ -483,10 +505,11 @@
@@ -487,10 +509,11 @@
%package openmpi-devel
Summary: Shared library symbolic links for Boost.MPI
Group: System Environment/Libraries
@@ -402,7 +401,7 @@
%description openmpi-devel
@@ -496,9 +519,10 @@
@@ -500,9 +523,10 @@
%package openmpi-python
Summary: Python run-time component of Boost.MPI library
Group: System Environment/Libraries
@@ -416,7 +415,7 @@
%description openmpi-python
@@ -508,8 +532,9 @@
@@ -512,8 +536,9 @@
%package graph-openmpi
Summary: Run-Time component of parallel boost graph library
Group: System Environment/Libraries
@@ -428,12 +427,11 @@
%description graph-openmpi
@@ -526,11 +551,11 @@
@@ -530,10 +555,10 @@
%package mpich
Summary: Run-Time component of Boost.MPI library
Group: System Environment/Libraries
+Requires: scylla-env
Requires: mpich%{?_isa}
BuildRequires: mpich-devel
-Requires: boost-serialization%{?_isa} = %{version}-%{release}
-Provides: boost-mpich2 = %{version}-%{release}
@@ -443,7 +441,7 @@
%description mpich
@@ -540,12 +565,12 @@
@@ -543,12 +568,12 @@
%package mpich-devel
Summary: Shared library symbolic links for Boost.MPI
Group: System Environment/Libraries
@@ -462,7 +460,7 @@
%description mpich-devel
@@ -555,11 +580,11 @@
@@ -558,11 +583,11 @@
%package mpich-python
Summary: Python run-time component of Boost.MPI library
Group: System Environment/Libraries
@@ -479,7 +477,7 @@
%description mpich-python
@@ -569,10 +594,10 @@
@@ -572,10 +597,10 @@
%package graph-mpich
Summary: Run-Time component of parallel boost graph library
Group: System Environment/Libraries
@@ -494,7 +492,7 @@
%description graph-mpich
@@ -586,7 +611,8 @@
@@ -589,7 +614,8 @@
%package build
Summary: Cross platform build system for C++ projects
Group: Development/Tools
@@ -504,7 +502,7 @@
BuildArch: noarch
%description build
@@ -600,6 +626,7 @@
@@ -613,6 +639,7 @@
%package jam
Summary: A low-level build tool
Group: Development/Tools
@@ -512,7 +510,7 @@
%description jam
Boost.Jam (BJam) is the low-level build engine tool for Boost.Build.
@@ -1134,7 +1161,7 @@
@@ -1186,7 +1213,7 @@
%files devel
%defattr(-, root, root, -)
%doc LICENSE_1_0.txt


@@ -12,33 +12,40 @@ sudo yum install -y wget yum-utils rpm-build rpmdevtools gcc gcc-c++ make patch
mkdir -p build/srpms
cd build/srpms
if [ ! -f binutils-2.25-5.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/b/binutils-2.25-5.fc22.src.rpm
if [ ! -f binutils-2.25-15.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/binutils/2.25/15.fc23/src/binutils-2.25-15.fc23.src.rpm
fi
if [ ! -f isl-0.14-3.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/i/isl-0.14-3.fc22.src.rpm
if [ ! -f isl-0.14-4.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/isl/0.14/4.fc23/src/isl-0.14-4.fc23.src.rpm
fi
if [ ! -f gcc-5.1.1-4.fc22.src.rpm ]; then
wget https://s3.amazonaws.com/scylla-centos-dep/gcc-5.1.1-4.fc22.src.rpm
if [ ! -f gcc-5.3.1-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/gcc/5.3.1/2.fc23/src/gcc-5.3.1-2.fc23.src.rpm
fi
if [ ! -f boost-1.57.0-6.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/b/boost-1.57.0-6.fc22.src.rpm
if [ ! -f boost-1.58.0-11.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/boost/1.58.0/11.fc23/src/boost-1.58.0-11.fc23.src.rpm
fi
if [ ! -f ninja-build-1.5.3-2.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/n/ninja-build-1.5.3-2.fc22.src.rpm
if [ ! -f ninja-build-1.6.0-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/ninja-build/1.6.0/2.fc23/src/ninja-build-1.6.0-2.fc23.src.rpm
fi
if [ ! -f ragel-6.8-3.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/r/ragel-6.8-3.fc22.src.rpm
if [ ! -f ragel-6.8-5.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/ragel/6.8/5.fc23/src/ragel-6.8-5.fc23.src.rpm
fi
if [ ! -f gdb-7.10.1-30.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/gdb/7.10.1/30.fc23/src/gdb-7.10.1-30.fc23.src.rpm
fi
if [ ! -f pyparsing-2.0.3-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/pyparsing/2.0.3/2.fc23/src/pyparsing-2.0.3-2.fc23.src.rpm
fi
cd -
sudo yum install -y epel-release
sudo yum install -y cryptopp cryptopp-devel jsoncpp jsoncpp-devel lz4 lz4-devel yaml-cpp yaml-cpp-devel thrift thrift-devel scons gtest gtest-devel python34
sudo ln -sf /usr/bin/python3.4 /usr/bin/python3
@@ -47,6 +54,8 @@ sudo yum install -y flex bison dejagnu zlib-static glibc-static sharutils bc lib
sudo yum install -y gcc-objc
sudo yum install -y asciidoc
sudo yum install -y gettext
sudo yum install -y rpm-devel python34-devel guile-devel readline-devel ncurses-devel expat-devel texlive-collection-latexrecommended xz-devel libselinux-devel
sudo yum install -y dos2unix
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
cd dist/redhat/centos_dep
@@ -56,48 +65,62 @@ if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
fi
do_install scylla-env-1.0-1.el7.centos.noarch.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-binutils-2.25-5.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/binutils-2.25-5.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-binutils-2.25-15.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/binutils-2.25-15.fc23.src.rpm
patch $RPMBUILD/SPECS/binutils.spec < dist/redhat/centos_dep/binutils.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/binutils.spec
fi
do_install scylla-binutils-2.25-5.el7.centos.x86_64.rpm
do_install scylla-binutils-2.25-15.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-isl-0.14-3.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/isl-0.14-3.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-isl-0.14-4.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/isl-0.14-4.fc23.src.rpm
patch $RPMBUILD/SPECS/isl.spec < dist/redhat/centos_dep/isl.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/isl.spec
fi
do_install scylla-isl-0.14-3.el7.centos.x86_64.rpm
do_install scylla-isl-devel-0.14-3.el7.centos.x86_64.rpm
do_install scylla-isl-0.14-4.el7.centos.x86_64.rpm
do_install scylla-isl-devel-0.14-4.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gcc-5.1.1-4.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gcc-5.1.1-4.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gcc-5.3.1-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gcc-5.3.1-2.fc23.src.rpm
patch $RPMBUILD/SPECS/gcc.spec < dist/redhat/centos_dep/gcc.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/gcc.spec
fi
do_install scylla-*5.1.1-4*
do_install scylla-*5.3.1-2*
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-boost-1.57.0-6.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/boost-1.57.0-6.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-boost-1.58.0-11.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/boost-1.58.0-11.fc23.src.rpm
patch $RPMBUILD/SPECS/boost.spec < dist/redhat/centos_dep/boost.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/boost.spec
fi
do_install scylla-boost*
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ninja-build-1.5.3-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ninja-build-1.5.3-2.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ninja-build-1.6.0-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ninja-build-1.6.0-2.fc23.src.rpm
patch $RPMBUILD/SPECS/ninja-build.spec < dist/redhat/centos_dep/ninja-build.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/ninja-build.spec
fi
do_install scylla-ninja-build-1.5.3-2.el7.centos.x86_64.rpm
do_install scylla-ninja-build-1.6.0-2.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ragel-6.8-3.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ragel-6.8-3.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ragel-6.8-5.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ragel-6.8-5.fc23.src.rpm
patch $RPMBUILD/SPECS/ragel.spec < dist/redhat/centos_dep/ragel.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/ragel.spec
fi
do_install scylla-ragel-6.8-3.el7.centos.x86_64.rpm
do_install scylla-ragel-6.8-5.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gdb-7.10.1-30.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gdb-7.10.1-30.fc23.src.rpm
patch $RPMBUILD/SPECS/gdb.spec < dist/redhat/centos_dep/gdb.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/gdb.spec
fi
do_install scylla-gdb-7.10.1-30.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/noarch/python34-pyparsing-2.0.3-2.el7.centos.noarch.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/pyparsing-2.0.3-2.fc23.src.rpm
patch $RPMBUILD/SPECS/pyparsing.spec < dist/redhat/centos_dep/pyparsing.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/pyparsing.spec
fi
do_install python34-pyparsing-2.0.3-2.el7.centos.noarch.rpm
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-antlr3-tool-3.5.2-1.el7.centos.noarch.rpm ]; then
mkdir build/scylla-antlr3-tool-3.5.2


@@ -1,30 +1,14 @@
--- gcc.spec 2015-10-19 06:31:44.889189647 +0000
+++ gcc.spec.1 2015-10-19 07:56:17.445991665 +0000
@@ -1,22 +1,15 @@
%global DATE 20150618
%global SVNREV 224595
%global gcc_version 5.1.1
--- gcc.spec.orig 2015-12-08 16:03:46.000000000 +0000
+++ gcc.spec 2016-01-21 08:47:49.160667342 +0000
@@ -1,6 +1,7 @@
%global DATE 20151207
%global SVNREV 231358
%global gcc_version 5.3.1
+%define _prefix /opt/scylladb
# Note, gcc_release must be integer, if you want to add suffixes to
# %{release}, append them after %{gcc_release} on Release: line.
%global gcc_release 4
%global _unpackaged_files_terminate_build 0
%global _performance_build 1
%global multilib_64_archs sparc64 ppc64 ppc64p7 s390x x86_64
-%ifarch %{ix86} x86_64 ia64 ppc ppc64 ppc64p7 alpha %{arm} aarch64
-%global build_ada 1
-%else
%global build_ada 0
-%endif
-%ifarch %{ix86} x86_64 ppc ppc64 ppc64le ppc64p7 s390 s390x %{arm} aarch64
-%global build_go 1
-%else
%global build_go 0
-%endif
%ifarch %{ix86} x86_64 ia64
%global build_libquadmath 1
%else
@@ -82,7 +75,8 @@
%global gcc_release 2
@@ -84,7 +85,8 @@
%global multilib_32_arch i686
%endif
Summary: Various compilers (C, C++, Objective-C, Java, ...)
@@ -34,7 +18,7 @@
Version: %{gcc_version}
Release: %{gcc_release}%{?dist}
# libgcc, libgfortran, libgomp, libstdc++ and crtstuff have
@@ -97,6 +91,7 @@
@@ -99,6 +101,7 @@
%global isl_version 0.14
URL: http://gcc.gnu.org
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
@@ -42,7 +26,7 @@
# Need binutils with -pie support >= 2.14.90.0.4-4
# Need binutils which can omit dot symbols and overlap .opd on ppc64 >= 2.15.91.0.2-4
# Need binutils which handle -msecure-plt on ppc >= 2.16.91.0.2-2
@@ -108,7 +103,7 @@
@@ -110,7 +113,7 @@
# Need binutils which support .cfi_sections >= 2.19.51.0.14-33
# Need binutils which support --no-add-needed >= 2.20.51.0.2-12
# Need binutils which support -plugin
@@ -51,7 +35,7 @@
# While gcc doesn't include statically linked binaries, during testing
# -static is used several times.
BuildRequires: glibc-static
@@ -143,15 +138,15 @@
@@ -145,15 +148,15 @@
BuildRequires: libunwind >= 0.98
%endif
%if %{build_isl}
@@ -71,7 +55,7 @@
# Need .eh_frame ld optimizations
# Need proper visibility support
# Need -pie support
@@ -166,7 +161,7 @@
@@ -168,7 +171,7 @@
# Need binutils that support .cfi_sections
# Need binutils that support --no-add-needed
# Need binutils that support -plugin
@@ -80,7 +64,7 @@
# Make sure gdb will understand DW_FORM_strp
Conflicts: gdb < 5.1-2
Requires: glibc-devel >= 2.2.90-12
@@ -174,17 +169,15 @@
@@ -176,17 +179,15 @@
# Make sure glibc supports TFmode long double
Requires: glibc >= 2.3.90-35
%endif
@@ -102,7 +86,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
AutoReq: true
@@ -226,12 +219,12 @@
@@ -228,12 +229,12 @@
The gcc package contains the GNU Compiler Collection version 5.
You'll need this package in order to compile C code.
@@ -117,7 +101,7 @@
%endif
Obsoletes: libmudflap
Obsoletes: libmudflap-devel
@@ -239,17 +232,19 @@
@@ -241,17 +242,19 @@
Obsoletes: libgcj < %{version}-%{release}
Obsoletes: libgcj-devel < %{version}-%{release}
Obsoletes: libgcj-src < %{version}-%{release}
@@ -141,7 +125,7 @@
Autoreq: true
%description c++
@@ -257,50 +252,55 @@
@@ -259,50 +262,55 @@
It includes support for most of the current C++ specification,
including templates and exception handling.
@@ -209,7 +193,7 @@
Autoreq: true
%description objc
@@ -311,29 +311,32 @@
@@ -313,29 +321,32 @@
%package objc++
Summary: Objective-C++ support for GCC
Group: Development/Languages
@@ -249,7 +233,7 @@
%endif
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
@@ -343,260 +346,286 @@
@@ -345,260 +356,286 @@
The gcc-gfortran package provides support for compiling Fortran
programs with the GNU Compiler Collection.
@@ -608,7 +592,7 @@
Cpp is the GNU C-Compatible Compiler Preprocessor.
Cpp is a macro processor which is used automatically
by the C compiler to transform your program before actual
@@ -621,8 +650,9 @@
@@ -623,8 +660,9 @@
%package gnat
Summary: Ada 83, 95, 2005 and 2012 support for GCC
Group: Development/Languages
@@ -620,7 +604,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Autoreq: true
@@ -631,40 +661,44 @@
@@ -633,82 +671,90 @@
GNAT is a GNU Ada 83, 95, 2005 and 2012 front-end to GCC. This package includes
development tools, the documents and Ada compiler.
@@ -674,8 +658,13 @@
+Requires: scylla-libgo-devel = %{version}-%{release}
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Requires(post): %{_sbindir}/update-alternatives
@@ -675,38 +709,42 @@
-Requires(post): %{_sbindir}/update-alternatives
-Requires(postun): %{_sbindir}/update-alternatives
+Requires(post): /sbin/update-alternatives
+Requires(postun): /sbin/update-alternatives
Autoreq: true
%description go
The gcc-go package provides support for compiling Go programs
with the GNU Compiler Collection.
@@ -728,7 +717,7 @@
Requires: gmp-devel >= 4.1.2-8, mpfr-devel >= 2.2.1, libmpc-devel >= 0.8.1
%description plugin-devel
@@ -726,7 +764,8 @@
@@ -728,7 +774,8 @@
Summary: Debug information for package %{name}
Group: Development/Debug
AutoReqProv: 0
@@ -738,21 +727,21 @@
%description debuginfo
This package provides debug information for package %{name}.
@@ -961,11 +1000,10 @@
@@ -958,11 +1005,11 @@
--enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu \
--enable-plugin --enable-initfini-array \
--disable-libgcj \
-%if 0%{fedora} >= 21 && 0%{fedora} <= 22
--with-default-libstdcxx-abi=c++98 \
--with-default-libstdcxx-abi=gcc4-compatible \
-%endif
%if %{build_isl}
- --with-isl \
--with-isl \
+ --with-isl-include=/opt/scylladb/include/ \
+ --with-isl-lib=/opt/scylladb/lib64/ \
%else
--without-isl \
%endif
@@ -974,11 +1012,9 @@
@@ -971,11 +1018,9 @@
%else
--disable-libmpx \
%endif
@@ -764,7 +753,7 @@
%ifarch %{arm}
--disable-sjlj-exceptions \
%endif
@@ -1009,9 +1045,6 @@
@@ -1006,9 +1051,6 @@
%if 0%{?rhel} >= 7
--with-cpu-32=power8 --with-tune-32=power8 --with-cpu-64=power8 --with-tune-64=power8 \
%endif
@@ -774,7 +763,7 @@
%endif
%ifarch ppc
--build=%{gcc_target_platform} --target=%{gcc_target_platform} --with-cpu=default32
@@ -1273,16 +1306,15 @@
@@ -1270,16 +1312,15 @@
mv %{buildroot}%{_prefix}/%{_lib}/libmpx.spec $FULLPATH/
%endif
@@ -797,7 +786,7 @@
%endif
%ifarch ppc
rm -f $FULLPATH/libgcc_s.so
@@ -1816,7 +1848,7 @@
@@ -1819,7 +1860,7 @@
chmod 755 %{buildroot}%{_prefix}/bin/c?9
cd ..
@@ -806,7 +795,7 @@
%find_lang cpplib
# Remove binaries we will not be including, so that they don't end up in
@@ -1866,11 +1898,7 @@
@@ -1869,11 +1910,7 @@
# run the tests.
make %{?_smp_mflags} -k check ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ \
@@ -818,7 +807,7 @@
echo ====================TESTING=========================
( LC_ALL=C ../contrib/test_summary || : ) 2>&1 | sed -n '/^cat.*EOF/,/^EOF/{/^cat.*EOF/d;/^EOF/d;/^LAST_UPDATED:/d;p;}'
echo ====================TESTING END=====================
@@ -1897,13 +1925,13 @@
@@ -1900,13 +1937,13 @@
--info-dir=%{_infodir} %{_infodir}/gcc.info.gz || :
fi
@@ -834,7 +823,21 @@
if [ $1 = 0 -a -f %{_infodir}/cpp.info.gz ]; then
/sbin/install-info --delete \
--info-dir=%{_infodir} %{_infodir}/cpp.info.gz || :
@@ -1954,7 +1982,7 @@
@@ -1945,19 +1982,19 @@
fi
%post go
-%{_sbindir}/update-alternatives --install \
+/sbin/update-alternatives --install \
%{_prefix}/bin/go go %{_prefix}/bin/go.gcc 92 \
--slave %{_prefix}/bin/gofmt gofmt %{_prefix}/bin/gofmt.gcc
%preun go
if [ $1 = 0 ]; then
- %{_sbindir}/update-alternatives --remove go %{_prefix}/bin/go.gcc
+ /sbin/update-alternatives --remove go %{_prefix}/bin/go.gcc
fi
# Because glibc Prereq's libgcc and /sbin/ldconfig
# comes from glibc, it might not exist yet when
# libgcc is installed
@@ -843,7 +846,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1964,7 +1992,7 @@
@@ -1967,7 +2004,7 @@
end
end
@@ -852,7 +855,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1974,120 +2002,120 @@
@@ -1977,120 +2014,120 @@
end
end
@@ -1011,7 +1014,7 @@
%defattr(-,root,root,-)
%{_prefix}/bin/cc
%{_prefix}/bin/c89
@@ -2409,7 +2437,7 @@
@@ -2414,7 +2451,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -1020,7 +1023,7 @@
%defattr(-,root,root,-)
%{_prefix}/lib/cpp
%{_prefix}/bin/cpp
@@ -2420,10 +2448,10 @@
@@ -2425,10 +2462,10 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1
@@ -1034,7 +1037,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -2461,7 +2489,7 @@
@@ -2469,7 +2506,7 @@
%endif
%doc rpm.doc/changelogs/gcc/cp/ChangeLog*
@@ -1043,7 +1046,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libstdc++.so.6*
%dir %{_datadir}/gdb
@@ -2473,7 +2501,7 @@
@@ -2481,7 +2518,7 @@
%dir %{_prefix}/share/gcc-%{gcc_version}/python
%{_prefix}/share/gcc-%{gcc_version}/python/libstdcxx
@@ -1052,7 +1055,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/include/c++
%dir %{_prefix}/include/c++/%{gcc_version}
@@ -2488,7 +2516,7 @@
@@ -2507,7 +2544,7 @@
%endif
%doc rpm.doc/changelogs/libstdc++-v3/ChangeLog* libstdc++-v3/README*
@@ -1061,7 +1064,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2509,7 +2537,7 @@
@@ -2528,7 +2565,7 @@
%endif
%if %{build_libstdcxx_docs}
@@ -1070,7 +1073,7 @@
%defattr(-,root,root)
%{_mandir}/man3/*
%doc rpm.doc/libstdc++-v3/html
@@ -2548,7 +2576,7 @@
@@ -2567,7 +2604,7 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1objplus
@@ -1079,7 +1082,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libobjc.so.4*
@@ -2602,11 +2630,11 @@
@@ -2621,11 +2658,11 @@
%endif
%doc rpm.doc/gfortran/*
@@ -1093,7 +1096,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2652,12 +2680,12 @@
@@ -2671,12 +2708,12 @@
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/gnat1
%doc rpm.doc/changelogs/gcc/ada/ChangeLog*
@@ -1108,7 +1111,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2683,7 +2711,7 @@
@@ -2702,7 +2739,7 @@
%exclude %{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/adalib/libgnarl.a
%endif
@@ -1117,7 +1120,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2707,7 +2735,7 @@
@@ -2726,7 +2763,7 @@
%endif
%endif
@@ -1126,7 +1129,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libgomp.so.1*
%{_prefix}/%{_lib}/libgomp-plugin-host_nonshm.so.1*
@@ -2715,14 +2743,14 @@
@@ -2734,14 +2771,14 @@
%doc rpm.doc/changelogs/libgomp/ChangeLog*
%if %{build_libquadmath}
@@ -1143,7 +1146,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2735,7 +2763,7 @@
@@ -2754,7 +2791,7 @@
%endif
%doc rpm.doc/libquadmath/ChangeLog*
@@ -1152,7 +1155,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2754,12 +2782,12 @@
@@ -2773,12 +2810,12 @@
%endif
%if %{build_libitm}
@@ -1167,7 +1170,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2772,7 +2800,7 @@
@@ -2791,7 +2828,7 @@
%endif
%doc rpm.doc/libitm/ChangeLog*
@@ -1176,7 +1179,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2791,11 +2819,11 @@
@@ -2810,11 +2847,11 @@
%endif
%if %{build_libatomic}
@@ -1190,7 +1193,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2815,11 +2843,11 @@
@@ -2834,11 +2871,11 @@
%endif
%if %{build_libasan}
@@ -1204,7 +1207,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2841,11 +2869,11 @@
@@ -2860,11 +2897,11 @@
%endif
%if %{build_libubsan}
@@ -1218,7 +1221,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2867,11 +2895,11 @@
@@ -2886,11 +2923,11 @@
%endif
%if %{build_libtsan}
@@ -1232,7 +1235,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2883,11 +2911,11 @@
@@ -2902,11 +2939,11 @@
%endif
%if %{build_liblsan}
@@ -1246,7 +1249,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2899,11 +2927,11 @@
@@ -2918,11 +2955,11 @@
%endif
%if %{build_libcilkrts}
@@ -1260,7 +1263,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2923,12 +2951,12 @@
@@ -2942,12 +2979,12 @@
%endif
%if %{build_libmpx}
@@ -1275,7 +1278,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2990,12 +3018,12 @@
@@ -3009,12 +3046,12 @@
%endif
%doc rpm.doc/go/*
@@ -1290,7 +1293,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3023,7 +3051,7 @@
@@ -3042,7 +3079,7 @@
%{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/libgo.so
%endif
@@ -1299,7 +1302,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3041,12 +3069,12 @@
@@ -3060,12 +3097,12 @@
%endif
%endif

dist/redhat/centos_dep/gdb.diff

@@ -0,0 +1,29 @@
--- gdb.spec.orig 2015-12-06 04:10:30.000000000 +0000
+++ gdb.spec 2016-01-20 14:49:12.745843903 +0000
@@ -16,7 +16,10 @@
}
Summary: A GNU source-level debugger for C, C++, Fortran, Go and other languages
-Name: %{?scl_prefix}gdb
+Name: %{?scl_prefix}scylla-gdb
+%define orig_name gdb
+Requires: scylla-env
+%define _prefix /opt/scylladb
# Freeze it when GDB gets branched
%global snapsrc 20150706
@@ -572,12 +575,8 @@
BuildRequires: rpm-devel%{buildisa}
BuildRequires: zlib-devel%{buildisa} libselinux-devel%{buildisa}
%if 0%{!?_without_python:1}
-%if 0%{?rhel:1} && 0%{?rhel} <= 7
-BuildRequires: python-devel%{buildisa}
-%else
-%global __python %{__python3}
-BuildRequires: python3-devel%{buildisa}
-%endif
+BuildRequires: python34-devel%{?_isa}
+%global __python /usr/bin/python3.4
%if 0%{?rhel:1} && 0%{?rhel} <= 7
# Temporarily before python files get moved to libstdc++.rpm
# libstdc++%{bits_other} is not present in Koji, the .spec script generating


@@ -1,5 +1,5 @@
--- isl.spec 2015-01-06 16:24:49.000000000 +0000
+++ isl.spec.1 2015-10-18 12:12:38.000000000 +0000
--- isl.spec.orig 2016-01-20 14:41:16.891802146 +0000
+++ isl.spec 2016-01-20 14:43:13.838336396 +0000
@@ -1,5 +1,5 @@
Summary: Integer point manipulation library
-Name: isl


@@ -1,34 +1,56 @@
1c1
< Name: ninja-build
---
> Name: scylla-ninja-build
8d7
< Source1: ninja.vim
10a10
> Requires: scylla-env
14,16c14,15
< BuildRequires: re2c >= 0.11.3
< Requires: emacs-filesystem
< Requires: vim-filesystem
---
> #BuildRequires: scylla-re2c >= 0.11.3
> %define _prefix /opt/scylladb
35,37c34
< # TODO: Install ninja_syntax.py?
< mkdir -p %{buildroot}/{%{_bindir},%{_datadir}/bash-completion/completions,%{_datadir}/emacs/site-lisp,%{_datadir}/vim/vimfiles/syntax,%{_datadir}/vim/vimfiles/ftdetect,%{_datadir}/zsh/site-functions}
<
---
> mkdir -p %{buildroot}/opt/scylladb/bin
39,43d35
< install -pm644 misc/bash-completion %{buildroot}%{_datadir}/bash-completion/completions/ninja-bash-completion
< install -pm644 misc/ninja-mode.el %{buildroot}%{_datadir}/emacs/site-lisp/ninja-mode.el
< install -pm644 misc/ninja.vim %{buildroot}%{_datadir}/vim/vimfiles/syntax/ninja.vim
< install -pm644 %{SOURCE1} %{buildroot}%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
< install -pm644 misc/zsh-completion %{buildroot}%{_datadir}/zsh/site-functions/_ninja
53,58d44
< %{_datadir}/bash-completion/completions/ninja-bash-completion
< %{_datadir}/emacs/site-lisp/ninja-mode.el
< %{_datadir}/vim/vimfiles/syntax/ninja.vim
< %{_datadir}/vim/vimfiles/ftdetect/ninja.vim
< # zsh does not have a -filesystem package
< %{_datadir}/zsh/
--- ninja-build.spec.orig 2016-01-20 14:41:16.892802134 +0000
+++ ninja-build.spec 2016-01-20 14:44:42.453227192 +0000
@@ -1,19 +1,18 @@
-Name: ninja-build
+Name: scylla-ninja-build
Version: 1.6.0
Release: 2%{?dist}
Summary: A small build system with a focus on speed
License: ASL 2.0
URL: http://martine.github.com/ninja/
Source0: https://github.com/martine/ninja/archive/v%{version}.tar.gz#/ninja-%{version}.tar.gz
-Source1: ninja.vim
# Rename mentions of the executable name to be ninja-build.
Patch1000: ninja-1.6.0-binary-rename.patch
+Requires: scylla-env
BuildRequires: asciidoc
BuildRequires: gtest-devel
BuildRequires: python2-devel
-BuildRequires: re2c >= 0.11.3
-Requires: emacs-filesystem
-Requires: vim-filesystem
+#BuildRequires: scylla-re2c >= 0.11.3
+%define _prefix /opt/scylladb
%description
Ninja is a small build system with a focus on speed. It differs from other
@@ -32,15 +31,8 @@
./ninja -v ninja_test
%install
-# TODO: Install ninja_syntax.py?
-mkdir -p %{buildroot}/{%{_bindir},%{_datadir}/bash-completion/completions,%{_datadir}/emacs/site-lisp,%{_datadir}/vim/vimfiles/syntax,%{_datadir}/vim/vimfiles/ftdetect,%{_datadir}/zsh/site-functions}
-
+mkdir -p %{buildroot}/opt/scylladb/bin
install -pm755 ninja %{buildroot}%{_bindir}/ninja-build
-install -pm644 misc/bash-completion %{buildroot}%{_datadir}/bash-completion/completions/ninja-bash-completion
-install -pm644 misc/ninja-mode.el %{buildroot}%{_datadir}/emacs/site-lisp/ninja-mode.el
-install -pm644 misc/ninja.vim %{buildroot}%{_datadir}/vim/vimfiles/syntax/ninja.vim
-install -pm644 %{SOURCE1} %{buildroot}%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
-install -pm644 misc/zsh-completion %{buildroot}%{_datadir}/zsh/site-functions/_ninja
%check
# workaround possible too low default limits
@@ -50,12 +42,6 @@
%files
%doc COPYING HACKING.md README doc/manual.html
%{_bindir}/ninja-build
-%{_datadir}/bash-completion/completions/ninja-bash-completion
-%{_datadir}/emacs/site-lisp/ninja-mode.el
-%{_datadir}/vim/vimfiles/syntax/ninja.vim
-%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
-# zsh does not have a -filesystem package
-%{_datadir}/zsh/
%changelog
* Mon Nov 16 2015 Ben Boeckel <mathstuf@gmail.com> - 1.6.0-2

dist/redhat/centos_dep/pyparsing.diff

@@ -0,0 +1,40 @@
--- pyparsing.spec.orig 2016-01-25 19:11:14.663651658 +0900
+++ pyparsing.spec 2016-01-25 19:12:49.853875369 +0900
@@ -1,4 +1,4 @@
-%if 0%{?fedora}
+%if 0%{?centos}
%global with_python3 1
%endif
@@ -15,7 +15,7 @@
BuildRequires: dos2unix
BuildRequires: glibc-common
%if 0%{?with_python3}
-BuildRequires: python3-devel
+BuildRequires: python34-devel
%endif # if with_python3
%description
@@ -30,11 +30,11 @@
The package contains documentation for pyparsing.
%if 0%{?with_python3}
-%package -n python3-pyparsing
+%package -n python34-pyparsing
Summary: An object-oriented approach to text processing (Python 3 version)
Group: Development/Libraries
-%description -n python3-pyparsing
+%description -n python34-pyparsing
pyparsing is a module that can be used to easily and directly configure syntax
definitions for any number of text parsing applications.
@@ -90,7 +90,7 @@
%{python_sitelib}/pyparsing.py*
%if 0%{?with_python3}
-%files -n python3-pyparsing
+%files -n python34-pyparsing
%doc CHANGES README LICENSE
%{python3_sitelib}/pyparsing*egg-info
%{python3_sitelib}/pyparsing.py*


@@ -1,11 +1,11 @@
--- ragel.spec 2014-08-18 11:55:49.000000000 +0000
+++ ragel.spec.1 2015-10-18 12:18:23.000000000 +0000
--- ragel.spec.orig 2015-06-18 22:12:28.000000000 +0000
+++ ragel.spec 2016-01-20 14:49:53.980327766 +0000
@@ -1,17 +1,20 @@
-Name: ragel
+Name: scylla-ragel
+%define orig_name ragel
Version: 6.8
Release: 3%{?dist}
Release: 5%{?dist}
Summary: Finite state machine compiler
Group: Development/Tools


@@ -1,14 +0,0 @@
#!/bin/sh -e
args="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info $SCYLLA_ARGS"
if [ "$NETWORK_MODE" = "posix" ]; then
args="$args --network-stack posix"
elif [ "$NETWORK_MODE" = "virtio" ]; then
args="$args --network-stack native"
elif [ "$NETWORK_MODE" = "dpdk" ]; then
args="$args --network-stack native --dpdk-pmd"
fi
export HOME=/var/lib/scylla
exec sudo -E -u $USER /usr/bin/scylla $args


@@ -8,10 +8,11 @@ License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{name}-@@VERSION@@-@@RELEASE@@.tar
BuildRequires: libaio-devel boost-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel
%{?fedora:BuildRequires: ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan}
%{?rhel:BuildRequires: scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1}
Requires: systemd-libs xfsprogs mdadm hwloc
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1, python34-pyparsing}
Requires: systemd-libs hwloc
Conflicts: abrt
%description
@@ -28,30 +29,29 @@ Requires: systemd-libs xfsprogs mdadm hwloc
./configure.py --with scylla --disable-xen --enable-dpdk --mode=release
%endif
%if 0%{?rhel}
./configure.py --with scylla --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=/opt/scylladb/bin/g++
python3.4 ./configure.py --with scylla --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=/opt/scylladb/bin/g++ --python python3.4
%endif
ninja-build -j2
%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT%{_bindir}
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sysctl.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sysconfig/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sudoers.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_docdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_unitdir}
mkdir -p $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m644 dist/common/sysctl.d/99-scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/sysctl.d/
install -m644 dist/common/sysconfig/scylla-server $RPM_BUILD_ROOT%{_sysconfdir}/sysconfig/
install -m644 dist/common/limits.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
install -m644 dist/common/sudoers.d/scylla $RPM_BUILD_ROOT%{_sysconfdir}/sudoers.d/
install -d -m755 $RPM_BUILD_ROOT%{_sysconfdir}/scylla
install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 dist/redhat/systemd/scylla-server.service $RPM_BUILD_ROOT%{_unitdir}/
install -m755 dist/common/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 dist/redhat/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/scripts/posix_net_conf.sh $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/dpdk/tools/dpdk_nic_bind.py $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
@@ -128,7 +128,7 @@ rm -rf $RPM_BUILD_ROOT
%config(noreplace) %{_sysconfdir}/sysconfig/scylla-server
%{_sysconfdir}/security/limits.d/scylla.conf
%{_sysconfdir}/sysctl.d/99-scylla.conf
%{_sysconfdir}/sudoers.d/scylla
%attr(0755,root,root) %dir %{_sysconfdir}/scylla
%config(noreplace) %{_sysconfdir}/scylla/scylla.yaml
%config(noreplace) %{_sysconfdir}/scylla/cassandra-rackdc.properties
@@ -140,9 +140,8 @@ rm -rf $RPM_BUILD_ROOT
%{_unitdir}/scylla-server.service
%{_bindir}/scylla
%{_prefix}/lib/scylla/scylla_prepare
%{_prefix}/lib/scylla/scylla_run
%{_prefix}/lib/scylla/scylla_stop
%{_prefix}/lib/scylla/scylla_save_coredump
%{_prefix}/lib/scylla/scylla_setup
%{_prefix}/lib/scylla/scylla_coredump_setup
%{_prefix}/lib/scylla/scylla_raid_setup
%{_prefix}/lib/scylla/scylla_sysconfig_setup


@@ -1,20 +1,23 @@
[Unit]
Description=Scylla Server
After=network.target libvirtd.service
After=network.target
[Service]
Type=simple
Type=notify
LimitMEMLOCK=infinity
LimitNOFILE=200000
LimitAS=infinity
LimitNPROC=8096
WorkingDirectory=/var/lib/scylla
Environment="HOME=/var/lib/scylla"
EnvironmentFile=/etc/sysconfig/scylla-server
ExecStartPre=/usr/lib/scylla/scylla_prepare
ExecStart=/usr/lib/scylla/scylla_run
ExecStopPost=/usr/lib/scylla/scylla_stop
ExecStartPre=/usr/bin/sudo -E /usr/lib/scylla/scylla_prepare
ExecStart=/usr/bin/scylla $SCYLLA_ARGS
ExecStopPost=/usr/bin/sudo -E /usr/lib/scylla/scylla_stop
TimeoutStartSec=900
KillMode=process
Restart=no
User=scylla
[Install]
WantedBy=multi-user.target
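
The unit file above switches from Type=simple to Type=notify and runs /usr/bin/scylla directly, so the daemon itself must tell systemd when startup has completed. A minimal sketch of that handshake, assuming libsystemd's sd-daemon.h is available (illustrative only, not Scylla's actual startup code; link with whatever pkg-config reports for libsystemd, or libsystemd-daemon on older distributions):

#include <systemd/sd-daemon.h>
#include <cstdio>

int main() {
    // ... perform initialization: bind sockets, load configuration, etc. ...

    // Report readiness to systemd. With Type=notify, systemd does not
    // consider the service started (and TimeoutStartSec keeps ticking)
    // until READY=1 arrives on the notification socket.
    // sd_notify() returns > 0 on success, 0 if NOTIFY_SOCKET is unset
    // (i.e. not running under systemd), and < 0 on error.
    if (sd_notify(0, "READY=1") <= 0)
        std::fprintf(stderr, "sd_notify failed or NOTIFY_SOCKET not set\n");

    // ... enter main event loop ...
    return 0;
}

With Type=simple, systemd treats the service as started the moment the process is forked; Type=notify defers that until the daemon signals readiness, which is why the unit also sets TimeoutStartSec=900 to allow for a long initialization.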


@@ -9,6 +9,19 @@ if [ -e debian ] || [ -e build/release ]; then
rm -rf debian build
mkdir build
fi
sudo apt-get -y update
if [ ! -f /usr/bin/git ]; then
sudo apt-get -y install git
fi
if [ ! -f /usr/bin/mk-build-deps ]; then
sudo apt-get -y install devscripts
fi
if [ ! -f /usr/bin/equivs-build ]; then
sudo apt-get -y install equivs
fi
if [ ! -f /usr/bin/add-apt-repository ]; then
sudo apt-get -y install software-properties-common
fi
RELEASE=`lsb_release -r|awk '{print $2}'`
CODENAME=`lsb_release -c|awk '{print $2}'`
@@ -21,9 +34,6 @@ fi
VERSION=$(./SCYLLA-VERSION-GEN)
SCYLLA_VERSION=$(cat build/SCYLLA-VERSION-FILE)
SCYLLA_RELEASE=$(cat build/SCYLLA-RELEASE-FILE)
if [ "$SCYLLA_VERSION" = "development" ]; then
SCYLLA_VERSION=0development
fi
echo $VERSION > version
./scripts/git-archive-all --extra version --force-submodules --prefix scylla-server ../scylla-server_$SCYLLA_VERSION-$SCYLLA_RELEASE.orig.tar.gz
@@ -34,27 +44,13 @@ sed -i -e "s/@@VERSION@@/$SCYLLA_VERSION/g" debian/changelog
sed -i -e "s/@@RELEASE@@/$SCYLLA_RELEASE/g" debian/changelog
sed -i -e "s/@@CODENAME@@/$CODENAME/g" debian/changelog
sudo apt-get -y update
./dist/ubuntu/dep/build_dependency.sh
DEP="libyaml-cpp-dev liblz4-dev libsnappy-dev libcrypto++-dev libjsoncpp-dev libaio-dev ragel ninja-build git liblz4-1 libaio1 hugepages software-properties-common libgnutls28-dev libhwloc-dev libnuma-dev libpciaccess-dev"
if [ "$RELEASE" = "14.04" ]; then
DEP="$DEP libboost1.55-dev libboost-program-options1.55.0 libboost-program-options1.55-dev libboost-system1.55.0 libboost-system1.55-dev libboost-thread1.55.0 libboost-thread1.55-dev libboost-test1.55.0 libboost-test1.55-dev libboost-filesystem1.55-dev libboost-filesystem1.55.0 libsnappy1"
else
DEP="$DEP libboost-dev libboost-program-options-dev libboost-system-dev libboost-thread-dev libboost-test-dev libboost-filesystem-dev libboost-filesystem-dev libsnappy1v5"
fi
if [ "$RELEASE" = "15.10" ]; then
DEP="$DEP libjsoncpp0v5 libcrypto++9v5 libyaml-cpp0.5v5 antlr3"
else
DEP="$DEP libjsoncpp0 libcrypto++9 libyaml-cpp0.5"
fi
sudo apt-get -y install $DEP
if [ "$RELEASE" != "15.10" ]; then
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get -y update
fi
sudo apt-get -y install g++-4.9
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot -us -uc


@@ -4,11 +4,11 @@ Homepage: http://scylladb.com
Section: database
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-4.9, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-4.9, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing
Package: scylla-server
Architecture: amd64
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, mdadm, xfsprogs, hwloc-nox
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, hwloc-nox
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.

Some files were not shown because too many files have changed in this diff.